Description

Fix segfault when PeerConnection is destroyed during stats collection.
RTCStatsCollector relies on PeerConnection and its WebRtcSession. If the
PeerConnection is destroyed, reference counting keeps the
RTCStatsCollector alive until the request has completed. But the request
is using PeerConnection/WebRtcSession resources that are destroyed in
~PeerConnection().
To get around this problem, RTCStatsCollector::WaitForPendingRequest()
is added, which is invoked at ~PeerConnection().
Integration test added; before this change it caused a segmentation fault or an
EXPECT failure.
BUG=chromium:627816
Review-Url: https://codereview.webrtc.org/2583613003
Cr-Commit-Position: refs/heads/master@{#15674}
Committed: https://chromium.googlesource.com/external/webrtc/+/b78306a7d339bfcc3c4a6043023ab57d95444b99
Patch Set 1 #
Total comments: 8
Patch Set 2 : Fix possible nullptr dereferencing #
Patch Set 3 : Addressed comments #
Total comments: 4
Patch Set 4 : DCHECK in destructor added #
Messages
Total messages: 27 (18 generated)
Description was changed: "destroyed in ~PeerConnection." now reads "destroyed in ~PeerConnection()."; the rest of the description is unchanged.
The CQ bit was checked by hbos@webrtc.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
Patchset #1 (id:1) has been deleted
hbos@webrtc.org changed reviewers: + deadbeef@webrtc.org, hta@webrtc.org
Please take a look, hta and beefy
https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstatscollec...
File webrtc/api/rtcstatscollector.h (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstatscollec...
webrtc/api/rtcstatscollector.h:73: // If there are any |GetStatsReport| requests in-flight, waits until it has
nit: Will update the comment to speak in the singular; as written it sounds like there could be multiple concurrent requests, but there can only be one at a time. (Multiple GetStatsReport calls are piggybacked onto the same request.)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on the following builders:
  linux_msan on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_msan/builds/16106)
  mac_asan on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/mac_asan/builds/20374)
A few comments. Good find!

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/peerconnection.cc
File webrtc/api/peerconnection.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/peerconnection...
webrtc/api/peerconnection.cc:611: stats_collector_->WaitForPendingRequest();
Doing waiting in a destructor makes me itchy. The destructor seems to be marked "protected" in the .h files - is it possible to do the waiting in the function that calls the destructor instead, and just assert that there are no pending requests when you get to the destructor level?

There's another wait below (the Invoke()), where the same comments apply, so you wouldn't make the destructor more irritating to me than it already is, but it's nice to not cause excessive itches.

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstats_integ...
File webrtc/api/rtcstats_integrationtest.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstats_integ...
webrtc/api/rtcstats_integrationtest.cc:550: DestroyCallerAndCallee();
In the interest of minimizing the amount of side effects, should you only destroy the caller here, or do they hold references to each other?

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstatscollec...
File webrtc/api/rtcstatscollector.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstatscollec...
webrtc/api/rtcstatscollector.cc:438: rtc::Thread::Current()->SleepMs(1);
This is a busy-wait. Is it possible to wait on something asynchronous instead (like a callback)?
The CQ bit was checked by hbos@webrtc.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
PTAL hta, deadbeef.

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/peerconnection.cc
File webrtc/api/peerconnection.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/peerconnection...
webrtc/api/peerconnection.cc:611: stats_collector_->WaitForPendingRequest();
On 2016/12/16 14:05:31, hta-webrtc wrote:
> Doing waiting in a destructor makes me itchy. The destructor seems to be
> marked "protected" in the .h files - is it possible to do the waiting in the
> function that calls the destructor instead, and just assert that there are no
> pending requests when you get to the destructor level?
>
> There's another wait below (the Invoke()), where the same comments apply, so
> you wouldn't make the destructor more irritating to me than it already is,
> but it's nice to not cause excessive itches.

I'm afraid not; PeerConnectionInterface is reference counted, so the only thing we can be sure of is that we're on the signaling thread.

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstats_integ...
File webrtc/api/rtcstats_integrationtest.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstats_integ...
webrtc/api/rtcstats_integrationtest.cc:550: DestroyCallerAndCallee();
On 2016/12/16 14:05:31, hta-webrtc wrote:
> In the interest of minimizing the amount of side effects, should you only
> destroy the caller here, or do they hold references to each other?

Good point, changed to "caller_ = nullptr" and removed the helper function.

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstatscollec...
File webrtc/api/rtcstatscollector.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/rtcstatscollec...
webrtc/api/rtcstatscollector.cc:438: rtc::Thread::Current()->SleepMs(1);
On 2016/12/16 14:05:31, hta-webrtc wrote:
> This is a busy-wait. Is it possible to wait on something asynchronous instead
> (like a callback)?

We need to ProcessMessages (ProducePartialResultsOnSignalingThread, AddPartialResults_s, or any other part of our code base that might get executed can post to the signaling thread), and we need to do it blockingly; otherwise the PeerConnection destructor would proceed to delete resources that must not be torn down until the request has completed.
The CQ bit was checked by hbos@webrtc.org to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
Description was changed: "caused a segmentation fault before this change." now reads "caused a segmentation fault before this change / EXPECT failure."; the rest of the description is unchanged.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm with nits. I'd prefer if the SleepMs could be avoided.

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/peerconnection.cc
File webrtc/api/peerconnection.cc (right):

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/peerconnection...
webrtc/api/peerconnection.cc:612: stats_collector_->WaitForPendingRequest();
nit: I'd just put this in the RTCStatsCollector destructor instead of making it a public method. But I do see the reasoning of "a destructor shouldn't be doing things like waiting for other threads".

If you choose to leave it like this, could you add something like:

RTC_DCHECK(num_pending_partial_reports_ == 0) << "You must call WaitForPendingRequest before safely destroying the RTCStatsCollector."

... in the RTCStatsCollector destructor? And leave a comment in the header saying WaitForPendingRequest must be called before safely destroying?

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/rtcstatscollec...
File webrtc/api/rtcstatscollector.cc (right):

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/rtcstatscollec...
webrtc/api/rtcstatscollector.cc:438: rtc::Thread::Current()->SleepMs(1);
nit: Could use an rtc::Event that's set when a partial stats report completes instead of SleepMs.
lgtm

rtc::Event seems like it might fit the bill for avoiding the busy-wait. A little tricky to get the sequencing right....

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/peerconnection.cc
File webrtc/api/peerconnection.cc (right):

https://codereview.webrtc.org/2583613003/diff/20001/webrtc/api/peerconnection...
webrtc/api/peerconnection.cc:611: stats_collector_->WaitForPendingRequest();
On 2016/12/16 14:34:52, hbos wrote:
> On 2016/12/16 14:05:31, hta-webrtc wrote:
> > Doing waiting in a destructor makes me itchy. The destructor seems to be
> > marked "protected" in the .h files - is it possible to do the waiting in
> > the function that calls the destructor instead, and just assert that there
> > are no pending requests when you get to the destructor level?
> >
> > There's another wait below (the Invoke()), where the same comments apply,
> > so you wouldn't make the destructor more irritating to me than it already
> > is, but it's nice to not cause excessive itches.
>
> I'm afraid not, PeerConnectionInterface is reference counted, the only thing
> we can be sure of is that we're on the signaling thread.

OK, the only thing that calls the destructor is the subclass constructed by RefCounted<PeerConnection>..... but you have no real control over when the refcount hits zero. Then this makes sense.
https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/peerconnection.cc
File webrtc/api/peerconnection.cc (right):

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/peerconnection...
webrtc/api/peerconnection.cc:612: stats_collector_->WaitForPendingRequest();
On 2016/12/16 18:53:27, Taylor Brandstetter wrote:
> nit: I'd just put this in the RTCStatsCollector destructor instead of making
> it a public method. But I do see the reasoning of "a destructor shouldn't be
> doing things like waiting for other threads".
>
> If you choose to leave it like this, could you add something like:
>
> RTC_DCHECK(num_pending_partial_reports_ == 0) << "You must call
> WaitForPendingRequest before safely destroying the RTCStatsCollector."
>
> ... in the RTCStatsCollector destructor? And leave a comment in the header
> saying WaitForPendingRequest must be called before safely destroying?

It's not that you have to wait before destroying the collector; in fact the collector is reference counted and, by design, will not be destroyed during a pending request. The wait is to ensure that resources the collector relies on (that are not reference counted or owned by the collector) are not destroyed, e.g. making sure PeerConnection and friends are not destroyed before the collector. I added a DCHECK, but if it is ever hit we have a bug, not just a problem of not calling WaitForPendingRequest. That said, now that we have WaitForPendingRequest we could remove the reference counting if we wanted to (not in this CL).

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/rtcstatscollec...
File webrtc/api/rtcstatscollector.cc (right):

https://codereview.webrtc.org/2583613003/diff/60001/webrtc/api/rtcstatscollec...
webrtc/api/rtcstatscollector.cc:438: rtc::Thread::Current()->SleepMs(1);
On 2016/12/16 18:53:27, Taylor Brandstetter wrote:
> nit: Could use an rtc::Event that's set when a partial stats report completes
> instead of SleepMs.

rtc::Event's wait does not process messages. In order for num_pending_partial_reports_ to decrease we have to do message processing, due to jumping to other threads for stats collection and back to the signaling thread for merging the stats report and invoking the callback.
The CQ bit was checked by hbos@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from hta@webrtc.org, deadbeef@webrtc.org Link to the patchset: https://codereview.webrtc.org/2583613003/#ps80001 (title: "DCHECK in destructor added")
CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
CQ is committing the patch. Bot data: {"patchset_id": 80001, "attempt_start_ts": 1482151990556130, "parent_rev": "0c8c5388355f5df085595d9ea24fa38995708120", "commit_rev": "b78306a7d339bfcc3c4a6043023ab57d95444b99"}
Message was sent while issue was closed.
Description was changed: the Review-Url, Cr-Commit-Position, and Committed lines were appended to the description; the rest is unchanged.
Message was sent while issue was closed.
Committed patchset #4 (id:80001) as https://chromium.googlesource.com/external/webrtc/+/b78306a7d339bfcc3c4a60430... |