Created: 3 years, 7 months ago by Taylor Brandstetter
Modified: 3 years, 4 months ago
Reviewers: kwiberg-webrtc
CC: webrtc-reviews_webrtc.org, tterriberry_mozilla.com, kwiberg-webrtc
Target Ref: refs/heads/master
Project: webrtc
Visibility: Public.
Description

Fixing race between ~AsyncInvoker and ~AsyncClosure, using ref-counting.
The AsyncInvoker destructor waits for all invoked tasks to be complete
(in other words, for all AsyncClosures to be destructed). The previous
implementation used an event to wake up the destructor, but a race made
it possible for this event to be dereferenced after it was destroyed.
This CL makes the event reference counted, such that if the destructor
runs right after AsyncClosure decrements "pending_invocations_",
setting the event will be a no-op, and the event will be destructed
in the AsyncClosure destructor.
This CL also fixes a deadlock that may occur for "re-entrant"
invocations. The deadlock occurs if the AsyncInvoker is destroyed on
thread A while a task on thread B is running, which AsyncInvokes a task
back on thread A.
This was causing pending_invocations_ to end up negative, because
an AsyncClosure that's never added to a thread's message queue (due to
the "destroying_" flag) caused the count to be decremented but not
incremented.
BUG=webrtc:7656
Review-Url: https://codereview.webrtc.org/2885143005
Cr-Commit-Position: refs/heads/master@{#19278}
Committed: https://chromium.googlesource.com/external/webrtc/+/3af63b0dc9d8444ca5480dfcccd1f87fa05eb232
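The ownership pattern the CL describes can be illustrated with a small, self-contained sketch. Everything below is hypothetical stand-in code, not the actual WebRTC implementation: std::shared_ptr plays the role of scoped_refptr<RefCountedObject<Event>>, and a tiny condition-variable Event plays the role of rtc::Event. The point is that the closure and the waiting destructor co-own the completion event, so signaling it after one side has finished can never touch freed memory.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Minimal stand-in for rtc::Event: one-shot, waitable.
class Event {
 public:
  void Set() {
    std::lock_guard<std::mutex> lock(mutex_);
    signaled_ = true;
    cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Stand-in for AsyncClosure: shares ownership of the completion event,
// so its destructor can always safely signal it. The event is freed by
// whichever side drops the last reference.
class Closure {
 public:
  explicit Closure(std::shared_ptr<Event> done) : done_(std::move(done)) {}
  ~Closure() { done_->Set(); }  // Safe: we co-own the event.

 private:
  std::shared_ptr<Event> done_;
};

// One invocation: a worker runs (and destroys) the closure while the
// "destructor" side blocks until the closure is gone.
bool RunOneInvocation() {
  auto done = std::make_shared<Event>();
  std::thread worker([closure = std::make_unique<Closure>(done)]() mutable {
    closure.reset();  // Task finishes; ~Closure signals completion.
  });
  done->Wait();  // Invoker-destructor side waits for the task.
  worker.join();
  return true;
}
```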
Patch Set 1
Patch Set 2: Rebase on revision that still had CancelInvoke, which is a test we need.
Patch Set 3: .
Patch Set 4: Remove extra increments of pending_invocations_
Patch Set 5: Merging in https://codereview.webrtc.org/2885143006/
Patch Set 6: Only clear current thread's message queue in destructor loop
Total comments: 13
Patch Set 7: Adding comments and switching to std::atomic.
Total comments: 15
Patch Set 8: Adding comments and addressing nit.
Patch Set 9: Add another TODO.
Messages
Total messages: 18 (8 generated)
deadbeef@webrtc.org changed reviewers: + kwiberg@webrtc.org
kwiberg: I don't know how familiar you are with AsyncInvoker, but would you be able to review this? It fixes a potential crash and a potential deadlock (both of which people have reported encountering at least once), at the cost of an additional AtomicOps::Decrement (in RefCountedObject). There probably are ways to accomplish this with fewer locks/atomic operations, but I'm having a hard time coming up with anything elegant.
lgtm, but see comments

I'm not an AsyncInvoker expert by any means, but it certainly looks like this change does what you claim it does.

File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:33: Thread::Current()->Clear(this);
What does the change on this line do?

webrtc/rtc_base/asyncinvoker.cc:49: return;
A comment explaining why this is a no-op if we're being destroyed?

webrtc/rtc_base/asyncinvoker.cc:74: // to use it.
Doesn't that mean that the warning message is flaky? If we don't know if a situation is an error or not, should we warn about it?

File webrtc/rtc_base/asyncinvoker.h (right):

webrtc/rtc_base/asyncinvoker.h:129: scoped_refptr<RefCountedObject<Event>> invocation_complete_;
Umm... isn't this exactly how scoped_refptr isn't intended to be used? The ref counting is supposed to be an inalienable part of the class, so that it's impossible to create non-refcounted instances. But that isn't the case with Event.

Adding reference counting to a class that lacks native support for it is precisely the intended job of std::shared_ptr, on the other hand.

I'm thinking we should consider deviating from Chromium's ban on shared_ptr for cases where non-intrusive reference counting is what we want; I'll write something up. In the mean time, I guess using this construction in this CL is OK. Having to create a RefCountedEventContainer class would be painful.

webrtc/rtc_base/asyncinvoker.h:133: // terminates.
That's why you need it. Could you also explain what the value means? Is it a count of the number of currently running destructors, or what?

webrtc/rtc_base/asyncinvoker.h:134: int destroying_ = 0;
Ooohhh... when reading the .cc file, I found that you use atomic operations on this member. That isn't apparent from its type! Consider using std::atomic<int> instead. (Side note: We should probably remove our home-grown atomics and use the std:: atomics everywhere.)

Hmm. No, it should be std::atomic<bool>, right? It isn't a counter, but with our homegrown atomics, int is the only available type.
Switched to std::atomic, take another look.

File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:33: Thread::Current()->Clear(this);
On 2017/08/04 09:28:09, kwiberg-webrtc wrote:
> What does the change on this line do?

It only clears events on the current thread, which is all that's necessary (see comment above). This is something nisse@ pointed out in a previous review: https://codereview.webrtc.org/2876273002/diff/1/webrtc/base/asyncinvoker.cc#n...

webrtc/rtc_base/asyncinvoker.cc:49: return;
On 2017/08/04 09:28:09, kwiberg-webrtc wrote:
> A comment explaining why this is a no-op if we're being destroyed?

I don't know exactly. I went back to the CL where this was added, in 2014, but didn't find a clear explanation. Though it makes sense that if you're destroying, you don't want to execute even more tasks. The goal is to finish all tasks in progress, not start executing new ones. I'll add a comment.

webrtc/rtc_base/asyncinvoker.cc:74: // to use it.
On 2017/08/04 09:28:09, kwiberg-webrtc wrote:
> Doesn't that mean that the warning message is flaky? If we don't know if a situation is an error or not, should we warn about it?

My thinking was "I'd rather have a flaky warning that directs people to this comment than have no warning at all." Our guidelines say "warning" log levels should be used "for undesired but relatively expected events, which *may* indicate a problem." Though, are there any chromium-specific log level guidelines? I couldn't find any.

File webrtc/rtc_base/asyncinvoker.h (right):

webrtc/rtc_base/asyncinvoker.h:129: scoped_refptr<RefCountedObject<Event>> invocation_complete_;
On 2017/08/04 09:28:09, kwiberg-webrtc wrote:
> Umm... isn't this exactly how scoped_refptr isn't intended to be used? The ref counting is supposed to be an inalienable part of the class, so that it's impossible to create non-refcounted instances. But that isn't the case with Event.
>
> Adding reference counting to a class that lacks native support for it is precisely the intended job of std::shared_ptr, on the other hand.
>
> I'm thinking we should consider deviating from Chromium's ban on shared_ptr for cases where non-intrusive reference counting is what we want; I'll write something up. In the mean time, I guess using this construction in this CL is OK. Having to create a RefCountedEventContainer class would be painful.

Sorry, I still don't understand what the problem is. I'd use shared_ptr if I could, but I don't see what practical difference it makes whether Event inherits from RefCountInterface or not. Can you explain more/give an example?

webrtc/rtc_base/asyncinvoker.h:134: int destroying_ = 0;
On 2017/08/04 09:28:09, kwiberg-webrtc wrote:
> Ooohhh... when reading the .cc file, I found that you use atomic operations on this member. That isn't apparent from its type! Consider using std::atomic<int> instead. (Side note: We should probably remove our home-grown atomics and use the std:: atomics everywhere.)
>
> Hmm. No, it should be std::atomic<bool>, right? It isn't a counter, but with our homegrown atomics, int is the only available type.

Oh, apparently <atomic> is a feature chromium allows now! I'll switch to that.
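The switch discussed here, from an int driven through home-grown AtomicOps to a std::atomic<bool>, can be shown in isolation. This is a hypothetical sketch with made-up names, not the CL's code; its point is that the concurrent access is now visible at the declaration site, and relaxed ordering is enough when the flag only needs to propagate eventually (as argued later in the thread).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch: the destroying_ member as std::atomic<bool>.
// Relaxed ordering is used because the flag is not synchronizing any
// other data; it only has to become visible to other threads eventually.
class InvokerFlag {
 public:
  void MarkDestroying() {
    destroying_.store(true, std::memory_order_relaxed);
  }
  bool IsDestroying() const {
    return destroying_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<bool> destroying_{false};
};
```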
Lots of subtleties with std::atomic... :-) I'm by no means an expert, so I tried to flag everything I thought *might* be wrong, and provide a rationale; it's quite possible that all you need to do is explain why I'm wrong in each case.

File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:74: // to use it.
On 2017/08/04 19:14:22, Taylor Brandstetter wrote:
> My thinking was "I'd rather have a flaky warning that directs people to this comment than have no warning at all." Our guidelines say "warning" log levels should be used "for undesired but relatively expected events, which *may* indicate a problem."

Hmm, OK.

> Though, are there any chromium-specific log level guidelines? I couldn't find any.

Not that I'm aware of.

File webrtc/rtc_base/asyncinvoker.h (right):

webrtc/rtc_base/asyncinvoker.h:129: scoped_refptr<RefCountedObject<Event>> invocation_complete_;
On 2017/08/04 19:14:22, Taylor Brandstetter wrote:
> Sorry, I still don't understand what the problem is. I'd use shared_ptr if I could, but I don't see what practical difference it makes whether Event inherits from RefCountInterface or not. Can you explain more/give an example?

No, it's just me who's confused. The requirement on a refcounted type is that it must only be owned by folks who use its reference count correctly (which almost inevitably implies it must be on the heap). Event isn't such a type, but RefCountedObject<Event> is, and the fact that RefCountedObject<Event> is a subclass of Event doesn't matter. However,

1. The reference counter methods are virtual for no good reason. (This could be fixed by making a template like RefCountedObject that has non-virtual reference counter methods.)
2. The contained class is required to be non-final. (Not a problem in this case, because Event is non-final. Could be fixed by making a refcounted template that has the given class as a member instead of as a subclass.)
3. The contained class is required to be concrete. (Not a problem in this case, because Event is concrete. Could be fixed by having a unique_ptr<T> instead of a plain T as member.)

...at which point we've essentially reinvented a primitive form of std::shared_ptr. So keep doing what you're doing for now (it *is* safe), and we'll see if we can get a consensus on allowing std::shared_ptr in cases like this where it would obviously be a good fit.

File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:24: destroying_.store(true, std::memory_order_relaxed);
Doesn't this need to be std::memory_order_release, so that no reads or writes in the current thread can be reordered after this store?

webrtc/rtc_base/asyncinvoker.cc:30: while (pending_invocations_.load(std::memory_order_acquire)) {
For clarity, avoid implicit int -> bool conversion:

  while (pending_invocations_.load(std::memory_order_acquire) > 0)

webrtc/rtc_base/asyncinvoker.cc:52: if (destroying_.load(std::memory_order_relaxed))
std::memory_order_acquire, so that no reads or writes in the current thread can be reordered before this load?

webrtc/rtc_base/asyncinvoker.cc:74: if (destroying_.load(std::memory_order_relaxed)) {
Again, std::memory_order_acquire?

webrtc/rtc_base/asyncinvoker.cc:91: if (destroying_.load(std::memory_order_relaxed)) {
And again...

webrtc/rtc_base/asyncinvoker.cc:125: invoker_->pending_invocations_.fetch_add(1, std::memory_order_relaxed);
std::memory_order_acquire? (Because unlike a reference count, where you never increment from 0 to 1, pending_invocations_ can be incremented from 0 to 1 by this operation, which makes it unsafe to reorder reads and writes from this thread before this increment.)
File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:24: destroying_.store(true, std::memory_order_relaxed);
On 2017/08/06 04:17:12, kwiberg-webrtc wrote:
> Doesn't this need to be std::memory_order_release, so that no reads or writes in the current thread can be reordered after this store?

I don't think so. "destroying_" doesn't need to be synchronized with anything, it just needs to propagate *eventually* to every thread, to potentially break an endless loop of AsyncInvokes. See the comment in the header:

  // This flag is used to ensure that if an application AsyncInvokes tasks that
  // recursively AsyncInvoke other tasks ad infinitum, the cycle eventually
  // terminates.

For example, suppose I own an AsyncInvoker, but I need to do some work on a different thread, so I do:

  invoker_.AsyncInvoke(RTC_FROM_HERE, worker_thread, rtc::Bind(this, Foo::DoWork));

And DoWork does this:

  void Foo::DoWork() {
    work_item = work_queue_.pop();
    DoWorkInternal(work_item);
    if (!work_queue_.empty()) {
      invoker_.AsyncInvoke(RTC_FROM_HERE, worker_thread, rtc::Bind(this, Foo::DoWork));
    }
  }

If Foo tries to destroy the invoker on the "main" thread, then without the "destroying" flag, it would be blocked until the work queue is empty. I'm not saying this is a great pattern in the first place, but it's one AsyncInvoker supports.

webrtc/rtc_base/asyncinvoker.cc:30: while (pending_invocations_.load(std::memory_order_acquire)) {
On 2017/08/06 04:17:12, kwiberg-webrtc wrote:
> For clarity, avoid implicit int -> bool conversion:
>
>   while (pending_invocations_.load(std::memory_order_acquire) > 0)

Done.

webrtc/rtc_base/asyncinvoker.cc:52: if (destroying_.load(std::memory_order_relaxed))
On 2017/08/06 04:17:12, kwiberg-webrtc wrote:
> std::memory_order_acquire, so that no reads or writes in the current thread can be reordered before this load?

Not needed, see above. Same for below comments around destroying_.

webrtc/rtc_base/asyncinvoker.cc:125: invoker_->pending_invocations_.fetch_add(1, std::memory_order_relaxed);
On 2017/08/06 04:17:12, kwiberg-webrtc wrote:
> std::memory_order_acquire? (Because unlike a reference count, where you never increment from 0 to 1, pending_invocations_ can be incremented from 0 to 1 by this operation, which makes it unsafe to reorder reads and writes from this thread before this increment.)

This would only be an issue if one thread is incrementing the count from 0->1 while another thread is destroying the AsyncInvoker. But that means one thread is calling "AsyncInvoke" while another thread is destroying the invoker, which is inherently unsafe.

The only time when a "safe" usage of this class could result in AsyncInvoke being called during destruction is when the AsyncInvoke is happening from *within* an AsyncInvoked functor (the so-called "re-entrant"/"recursive" invocations). But in that case, the outer invocation is holding a reference, so it could only be incremented from 1->2. And it's not possible for the decrement of pending_invocations_ to be reordered relative to the increment, so we don't have to worry about, say, it being decremented from 1->0 before being incremented from 1->2.

I added some more comments to the header to make it more clear what things are "safe" and how the class should be properly used.
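The counter protocol being debated above can be sketched in isolation. This is our illustration, not the CL's code: the names are invented, and it assumes the completion-side decrement carries release semantics (acq_rel here) so that the destructor's acquire load synchronizes with the last completed task; only the increment uses relaxed ordering, per Taylor's argument that a safe caller can never race a 0->1 increment against destruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of the pending_invocations_ protocol.
class PendingCounter {
 public:
  // Called before a closure is posted. Relaxed is enough: in any safe
  // usage a 0->1 increment cannot race with destruction, and re-entrant
  // invocations can only go 1->2 while the outer task holds a count.
  void OnClosureCreated() {
    pending_.fetch_add(1, std::memory_order_relaxed);
  }

  // Called when a closure completes. Returns true if this was the last
  // pending invocation. The release half of acq_rel publishes the
  // task's writes to whoever observes the count reaching zero.
  bool OnClosureDestroyed() {
    return pending_.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }

  // The destructor loops on this until it returns false. The acquire
  // load pairs with the release decrement above. Explicit "> 0"
  // comparison avoids the implicit int -> bool conversion flagged in
  // the review.
  bool HasPending() const {
    return pending_.load(std::memory_order_acquire) > 0;
  }

 private:
  std::atomic<int> pending_{0};
};
```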
lgtm

File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:24: destroying_.store(true, std::memory_order_relaxed);
On 2017/08/07 20:28:12, Taylor Brandstetter wrote:
> I don't think so. "destroying_" doesn't need to be synchronized with anything, it just needs to propagate *eventually* to every thread, to potentially break an endless loop of AsyncInvokes. See the comment in the header:
>
>   // This flag is used to ensure that if an application AsyncInvokes tasks that
>   // recursively AsyncInvoke other tasks ad infinitum, the cycle eventually
>   // terminates.
>
> For example, suppose I own an AsyncInvoker, but I need to do some work on a different thread, so I do:
>
>   invoker_.AsyncInvoke(RTC_FROM_HERE, worker_thread, rtc::Bind(this, Foo::DoWork));
>
> And DoWork does this:
>
>   void Foo::DoWork() {
>     work_item = work_queue_.pop();
>     DoWorkInternal(work_item);
>     if (!work_queue_.empty()) {
>       invoker_.AsyncInvoke(RTC_FROM_HERE, worker_thread, rtc::Bind(this, Foo::DoWork));
>     }
>   }
>
> If Foo tries to destroy the invoker on the "main" thread, then without the "destroying" flag, it would be blocked until the work queue is empty.
>
> I'm not saying this is a great pattern in the first place, but it's one AsyncInvoker supports.

Ah, OK. Yeah, that seems good.

webrtc/rtc_base/asyncinvoker.cc:52: if (destroying_.load(std::memory_order_relaxed))
On 2017/08/07 20:28:12, Taylor Brandstetter wrote:
> Not needed, see above. Same for below comments around destroying_.

Acknowledged.

webrtc/rtc_base/asyncinvoker.cc:125: invoker_->pending_invocations_.fetch_add(1, std::memory_order_relaxed);
On 2017/08/07 20:28:12, Taylor Brandstetter wrote:
> This would only be an issue if one thread is incrementing the count from 0->1 while another thread is destroying the AsyncInvoker. But that means one thread is calling "AsyncInvoke" while another thread is destroying the invoker, which is inherently unsafe.
>
> The only time when a "safe" usage of this class could result in AsyncInvoke being called during destruction is when the AsyncInvoke is happening from *within* an AsyncInvoked functor (the so-called "re-entrant"/"recursive" invocations). But in that case, the outer invocation is holding a reference, so it could only be incremented from 1->2. And it's not possible for the decrement of pending_invocations_ to be reordered relative to the increment, so we don't have to worry about, say, it being decremented from 1->0 before being incremented from 1->2.
>
> I added some more comments to the header to make it more clear what things are "safe" and how the class should be properly used.

Acknowledged. Ach, this is subtle...
I have a strong feeling that the right thing to do from a code maintenance perspective is to not have naked atomic operations mixed in with a heap of other nontrivial code like this, but to package them up in a clearly documented class. Compare with the current best practice of handling heap object ownership in C++: always use a dedicated smart pointer class, except when implementing smart pointers. Not because raw new and delete are impossible to use correctly, but because they have sharp edges that we'd rather not deal with at the same time we're dealing with other nontrivial things.
File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:125: invoker_->pending_invocations_.fetch_add(1, std::memory_order_relaxed);
On 2017/08/08 22:22:39, kwiberg-webrtc wrote:
> On 2017/08/07 20:28:12, Taylor Brandstetter wrote:
> > On 2017/08/06 04:17:12, kwiberg-webrtc wrote:
> > > std::memory_order_acquire? (Because unlike a reference count, where you never increment from 0 to 1, pending_invocations_ can be incremented from 0 to 1 by this operation, which makes it unsafe to reorder reads and writes from this thread before this increment.)
> >
> > This would only be an issue if one thread is incrementing the count from 0->1 while another thread is destroying the AsyncInvoker. But that means one thread is calling "AsyncInvoke" while another thread is destroying the invoker, which is inherently unsafe.
> >
> > The only time when a "safe" usage of this class could result in AsyncInvoke being called during destruction is when the AsyncInvoke is happening from *within* an AsyncInvoked functor (the so-called "re-entrant"/"recursive" invocations). But in that case, the outer invocation is holding a reference, so it could only be incremented from 1->2. And it's not possible for the decrement of pending_invocations_ to be reordered relative to the increment, so we don't have to worry about, say, it being decremented from 1->0 before being incremented from 1->2.
> >
> > I added some more comments to the header to make it more clear what things are "safe" and how the class should be properly used.
>
> Acknowledged. Ach, this is subtle...
> I have a strong feeling that the right thing to do from a code maintenance perspective is to not have naked atomic operations mixed in with a heap of other nontrivial code like this, but to package them up in a clearly documented class. Compare with the current best practice of handling heap object ownership in C++: always use a dedicated smart pointer class, except when implementing smart pointers. Not because raw new and delete are impossible to use correctly, but because they have sharp edges that we'd rather not deal with at the same time we're dealing with other nontrivial things.

Agreed. I can't think of a great way to split this functionality into an independent class at the moment, but I'll add a TODO comment for now and think about it.
The CQ bit was checked by deadbeef@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from kwiberg@webrtc.org Link to the patchset: https://codereview.webrtc.org/2885143005/#ps160001 (title: "Add another TODO.")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
CQ is committing da patch. Bot data: {"patchset_id": 160001, "attempt_start_ts": 1502235750569520, "parent_rev": "42f96d53f3acf7a721d456b6eae872bb6f37a0a0", "commit_rev": "3af63b0dc9d8444ca5480dfcccd1f87fa05eb232"}
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as https://chromium.googlesource.com/external/webrtc/+/3af63b0dc9d8444ca5480dfcc...
Message was sent while issue was closed.
File webrtc/rtc_base/asyncinvoker.cc (right):

webrtc/rtc_base/asyncinvoker.cc:125: invoker_->pending_invocations_.fetch_add(1, std::memory_order_relaxed);
On 2017/08/08 23:42:16, Taylor Brandstetter wrote:
> On 2017/08/08 22:22:39, kwiberg-webrtc wrote:
> > On 2017/08/07 20:28:12, Taylor Brandstetter wrote:
> > > On 2017/08/06 04:17:12, kwiberg-webrtc wrote:
> > > > std::memory_order_acquire? (Because unlike a reference count, where you never increment from 0 to 1, pending_invocations_ can be incremented from 0 to 1 by this operation, which makes it unsafe to reorder reads and writes from this thread before this increment.)
> > >
> > > This would only be an issue if one thread is incrementing the count from 0->1 while another thread is destroying the AsyncInvoker. But that means one thread is calling "AsyncInvoke" while another thread is destroying the invoker, which is inherently unsafe.
> > >
> > > The only time when a "safe" usage of this class could result in AsyncInvoke being called during destruction is when the AsyncInvoke is happening from *within* an AsyncInvoked functor (the so-called "re-entrant"/"recursive" invocations). But in that case, the outer invocation is holding a reference, so it could only be incremented from 1->2. And it's not possible for the decrement of pending_invocations_ to be reordered relative to the increment, so we don't have to worry about, say, it being decremented from 1->0 before being incremented from 1->2.
> > >
> > > I added some more comments to the header to make it more clear what things are "safe" and how the class should be properly used.
> >
> > Acknowledged. Ach, this is subtle...
> > I have a strong feeling that the right thing to do from a code maintenance perspective is to not have naked atomic operations mixed in with a heap of other nontrivial code like this, but to package them up in a clearly documented class. Compare with the current best practice of handling heap object ownership in C++: always use a dedicated smart pointer class, except when implementing smart pointers. Not because raw new and delete are impossible to use correctly, but because they have sharp edges that we'd rather not deal with at the same time we're dealing with other nontrivial things.
>
> Agreed. I can't think of a great way to split this functionality into an independent class at the moment, but I'll add a TODO comment for now and think about it.

I was thinking along these lines:

  // Instances of this class are initially unset. The flag is
  // set and tested with atomic instructions, which makes it
  // thread-safe, but there are no memory barriers that prevent
  // the setting and testing of the flag being reordered relative
  // to surrounding memory accesses. Informally, use this to
  // signal a condition when it doesn't matter if the signal
  // arrives a little early or a little late.
  class NonsynchronizingAtomicOneWayFlag {
    void Set();    // Atomically set the flag.
    bool IsSet();  // Has the flag been set?
  };

  // Instances of this class initially have a reference count of 1.
  // Incref() and Decref() calls can be freely mixed and are
  // thread-safe. Once the refcount goes to zero, the counter is
  // spent, and you may no longer call either Incref() or Decref();
  // this means that the caller needs to statically ensure that
  // the refcount is at least 1.
  class RefCounter {
    void Incref();  // Atomically increases the refcount by 1.
    bool Decref();  // Atomically decreases the refcount by 1,
                    // returning true if the refcount reached zero.
  };

pending_invocations_ is more complicated, so I won't try to design it here.
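The RefCounter interface sketched in the message above can be filled in with std::atomic in a few lines. The interface is kwiberg's; the body below is our illustrative implementation, not anything committed to WebRTC.

```cpp
#include <atomic>
#include <cassert>

// One possible std::atomic-based implementation of the RefCounter
// interface sketched in the review comment above.
class RefCounter {
 public:
  // Relaxed is sufficient for increments: the caller already holds a
  // reference, so this can never race a 0->1 transition.
  void Incref() { count_.fetch_add(1, std::memory_order_relaxed); }

  // Returns true if the refcount reached zero. acq_rel ordering makes
  // the final Decref() synchronize with all earlier ones, so the caller
  // that sees true may safely tear down the shared state.
  bool Decref() {
    return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }

 private:
  std::atomic<int> count_{1};  // Starts at 1, as the sketch specifies.
};
```

This mirrors how reference counts are conventionally implemented (e.g. the classic shared_ptr control-block pattern): cheap relaxed increments, and an acquire/release decrement that fences the destruction.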