|
|
Created:
3 years, 7 months ago by Taylor Brandstetter Modified:
3 years, 4 months ago Reviewers:
nisse-webrtc CC:
webrtc-reviews_webrtc.org, tterriberry_mozilla.com, kwiberg-webrtc Target Ref:
refs/heads/master Project:
webrtc Visibility:
Public. |
DescriptionFixing potential AsyncInvoker deadlock that occurs for "reentrant" invocations.
The deadlock occurs if the AsyncInvoker is destroyed on thread A while
a task on thread B is running, which AsyncInvokes a task back on thread
A.
This was causing pending_invocations_ to end up negative, because
an AsyncClosure that's never added to a thread's message queue (due to
the "destroying_" flag) caused the count to be decremented but not
incremented.
BUG=None
Review-Url: https://codereview.webrtc.org/2885143006
Cr-Commit-Position: refs/heads/master@{#18225}
Committed: https://chromium.googlesource.com/external/webrtc/+/ef37ca5fb3431864130de3c2fd0ff865f9eb47dd
Patch Set 1 #
Total comments: 10
Patch Set 2 : Move "MessageQueueManager::Clear" back to where it was. Makes no difference. #Patch Set 3 : Addressing TSan race warning #
Messages
Total messages: 21 (9 generated)
deadbeef@webrtc.org changed reviewers: + nisse@webrtc.org
https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc File webrtc/base/asyncinvoker.cc (right): https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:33: MessageQueueManager::Clear(this); This is subtle (maybe too subtle...). So at the end of this loop, pending_invocations is zero. Is destruction safe then? I think it is; in the "reentrant" case count starts at 1 (or larger), goes up to 2 when the new invocation is posted, and gets down to zero only when both the parent closure and the new closure are destroyed. (And pending_invocations_ going from 0 to 1 indicates improper "top-level" invocations racing against the destructor, which why can't take any responsibility for here). As for the deadlock, I'm having some difficulty understanding what difference this change makes. Let us see if we can sort this out... MessageQueueManager::Clear is expected to destroy some references, with the AsyncClosure destructor being called, and for each decrement, the event will be signalled. Say we race against another thread executing an AsyncClosure. When it's done, it will signal the event, terminating our Wait(). By this time, it has added the new invocation to our queue, so MessageQueueManager::Clear will remove it, make pending_invocations_ zero (and signal the event again, which we don't care about). So how is this race different from the old code before this change? Even if the new invocation gets posted between Clear and Wait, that means that the parent invocation on the other thread was alive by the time we called Clear, and when it finishes, our Wait wakes up. Clear in the next iteration through the loop should destroy that AsyncClosure and signal the event again. So I'm having some difficulty understanding the deadlock. https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:113: AtomicOps::Increment(&invoker_->pending_invocations_); I take it moving the increment here from DoInvoke* is a stylistic rather than functional change, it is still done *before* anything gets added to the MessageQueue? Or am I missing some subtlety? (Diversion: I no longer think incrementing from 0 to 1 is our problem, but I'll keep this note anyhow in case some veriant of it is useful): If we (1) change pending_invocations_ to be a refcount which is initialized to 1 and decremented also in the destructor, and (2) ensure that we never increment it from zero, would that help? Could be done with something like while (true) { int refs = atomic::AcquireLoad(&refcount); if (!refs) return; if (Atomic::CompareAndSwap(&refcount, refs, refs+1)) break; } We would at least get the helpful invariant that whenever refcount gets to zero, it's stuck at zero. (But on the other hand, if it *is* zero, we shouldn't have accessed it in the first place, since it's a likely use-after-free). https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/thread_unittest.cc File webrtc/base/thread_unittest.cc (right): https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/thread_unittest.c... webrtc/base/thread_unittest.cc:505: } I'm trying to understand what happens here... Scope ends, so AsyncInvoker is destroyed. And its destructor is going to block until |functor| has returned, which takes about 1s? That |reentrant_functor| never runs, is that due to the |destroying_| flag (seems a bit scary, since it's set by the destructor running on main thread, and read by the AsyncInvoke call on the other thread, with no synchronization), or due to the MessageQueueManager::Clear call? To me, it seems the synchronization primitive needed to make things easy is to atomically {check that destroying_ is false and if so, increment the reference count}. But seems difficult to implement without locks.
https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc File webrtc/base/asyncinvoker.cc (right): https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:117: AtomicOps::Decrement(&invoker_->pending_invocations_); Thinking a bit more, I'm afraids this approach is fundamentally unsafe. Assume the AsyncInvoker destructor gets to run between this Decrement call and the next line. Then pending_invocations_ is zero, the AsyncInvoker object is destroyed, and the access to the invocation_complete_ event below is a use-after-free. When using a reference count, decrementing the count to zero has to be the final access to the object. Also note that using Event::Set involves both locking operations and a condition variable, so attempting to avoid locks and instead doing an Event.Set per invocation seems pointless. I see two approaches that might work: * Use an explicit lock, which could protect both the count, |destroying_ | flag and any other needed state. * Introduce a refcounted object, referenced (via plain scoped_refptr) from both AsyncInvoker and AsyncClosure, and put all synchronization state there. Then I think AsyncClosure could be implemented without lock operations, only atomics. With heavier synchronization only at AsyncInvoker destruction time.
https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc File webrtc/base/asyncinvoker.cc (right): https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:33: MessageQueueManager::Clear(this); I must have had a brain lapse and forgot that if "Wait" is called after an event is already set, it returns immediately. So you're right, this makes no difference. https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:113: AtomicOps::Increment(&invoker_->pending_invocations_); On 2017/05/18 08:04:55, nisse-webrtc wrote: > I take it moving the increment here from DoInvoke* is a stylistic rather than > functional change, it is still done *before* anything gets added to the > MessageQueue? Or am I missing some subtlety? It's still done before anything gets added to the MessageQueue. But it's not just stylistic; it prevents the reference count from getting messed up in the "tried to invoke while destroying" case. In that case, the destructor was getting run, decrementing pending_invocations_, even though it wasn't incremented. Resulting in a refcount of -1. https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:117: AtomicOps::Decrement(&invoker_->pending_invocations_); On 2017/05/18 11:12:39, nisse-webrtc wrote: > Thinking a bit more, I'm afraids this approach is fundamentally unsafe. Assume > the AsyncInvoker destructor gets to run between this Decrement call and the next > line. Then pending_invocations_ is zero, the AsyncInvoker object is destroyed, > and the access to the invocation_complete_ event below is a use-after-free. That's exactly the issue I was trying to fix in this CL: https://codereview.webrtc.org/2876273002/ ... which just uses a busy loop, since I wanted to avoid adding a new lock for every async task. I also made an alternative that uses a ref-counted object just like you suggest: https://codereview.webrtc.org/2885143005/ But *this* CL isn't about fixing that race condition. It's just about adding a test for the "reentrant" invocations. > Also note that using Event::Set involves both locking operations and a condition > variable, so attempting to avoid locks and instead doing an Event.Set per > invocation seems pointless. I didn't notice that... I'll try to come up with a solution that doesn't use the event. But I think we can go ahead with this CL, at least so that the unit test gets added.
https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc File webrtc/base/asyncinvoker.cc (right): https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:113: AtomicOps::Increment(&invoker_->pending_invocations_); On 2017/05/18 20:54:52, Taylor Brandstetter wrote: > On 2017/05/18 08:04:55, nisse-webrtc wrote: > > I take it moving the increment here from DoInvoke* is a stylistic rather than > > functional change, it is still done *before* anything gets added to the > > MessageQueue? Or am I missing some subtlety? > > It's still done before anything gets added to the MessageQueue. But it's not > just stylistic; it prevents the reference count from getting messed up in the > "tried to invoke while destroying" case. In that case, the destructor was > getting run, decrementing pending_invocations_, even though it wasn't > incremented. Resulting in a refcount of -1. I see, I was under the impression that the closure object was created by DoInvoke* (and hence this change would move the increment slightly later). But in fact, it moves the increment earlier, before the if (destroying_) check. https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:117: AtomicOps::Decrement(&invoker_->pending_invocations_); On 2017/05/18 20:54:52, Taylor Brandstetter wrote: > But *this* CL isn't about fixing that race condition. It's just about adding a > test for the "reentrant" invocations. Makes sense. lgtm, then.
https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc File webrtc/base/asyncinvoker.cc (right): https://codereview.webrtc.org/2885143006/diff/1/webrtc/base/asyncinvoker.cc#n... webrtc/base/asyncinvoker.cc:117: AtomicOps::Decrement(&invoker_->pending_invocations_); On 2017/05/19 06:43:21, nisse-webrtc wrote: > On 2017/05/18 20:54:52, Taylor Brandstetter wrote: > > But *this* CL isn't about fixing that race condition. It's just about adding a > > test for the "reentrant" invocations. > > Makes sense. lgtm, then. But please update cl title accordingly...
Description was changed from ========== Fixing potential AsyncInvoker deadlock. The deadlock occurs if the AsyncInvoker is destroyed on thread A while a task on thread B is running, which AsyncInvokes a task back on thread A. BUG=None ========== to ========== Fixing potential AsyncInvoker deadlock that occurs for "reentrant" invocations. The deadlock occurs if the AsyncInvoker is destroyed on thread A while a task on thread B is running, which AsyncInvokes a task back on thread A. This was causing pending_invocations_ to end up negative, because an AsyncClosure that's never added to a thread's message queue (due to the "destroying_" flag) caused the count to be decremented but not incremented. BUG=None ==========
The CQ bit was checked by deadbeef@webrtc.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_tsan2 on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_tsan2/builds/22667)
The CQ bit was checked by deadbeef@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from nisse@webrtc.org Link to the patchset: https://codereview.webrtc.org/2885143006/#ps40001 (title: "Addressing TSan race warning")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1495490425435520, "parent_rev": "f472699bbd955062dcf1413316d0810eb3e2dea9", "commit_rev": "ef37ca5fb3431864130de3c2fd0ff865f9eb47dd"}
Message was sent while issue was closed.
Description was changed from ========== Fixing potential AsyncInvoker deadlock that occurs for "reentrant" invocations. The deadlock occurs if the AsyncInvoker is destroyed on thread A while a task on thread B is running, which AsyncInvokes a task back on thread A. This was causing pending_invocations_ to end up negative, because an AsyncClosure that's never added to a thread's message queue (due to the "destroying_" flag) caused the count to be decremented but not incremented. BUG=None ========== to ========== Fixing potential AsyncInvoker deadlock that occurs for "reentrant" invocations. The deadlock occurs if the AsyncInvoker is destroyed on thread A while a task on thread B is running, which AsyncInvokes a task back on thread A. This was causing pending_invocations_ to end up negative, because an AsyncClosure that's never added to a thread's message queue (due to the "destroying_" flag) caused the count to be decremented but not incremented. BUG=None Review-Url: https://codereview.webrtc.org/2885143006 Cr-Commit-Position: refs/heads/master@{#18225} Committed: https://chromium.googlesource.com/external/webrtc/+/ef37ca5fb3431864130de3c2f... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/external/webrtc/+/ef37ca5fb3431864130de3c2f...
Message was sent while issue was closed.
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.webrtc.org/2904543003/ by deadbeef@webrtc.org. The reason for reverting is: Causes a new TSan race warning. Will reland after fixing. Note this is the same race as will be fixed by https://codereview.webrtc.org/2876273002/..
Message was sent while issue was closed.
deadbeef@webrtc.org changed reviewers: - nisse@webrtc.org
Message was sent while issue was closed.
This CL will be relanded as https://codereview.webrtc.org/2885143005/ |