Issue 2350663002: Change thread check to race check

the sun

Description was changed from ========== Change thread check to non-re-entrancy check. BUG=webrtc:6345 ========== to ========== ...

4 years, 3 months ago (2016-09-19 20:30:28 UTC) #1

the sun

solenberg@webrtc.org changed reviewers: + kwiberg@webrtc.org, tommi@webrtc.org

4 years, 3 months ago (2016-09-19 20:45:00 UTC) #2

the sun

Tommi, Karl, this is a proof of concept and I'd like your input before adding ...

4 years, 3 months ago (2016-09-19 20:45:01 UTC) #3

tommi

On 2016/09/19 20:45:01, the sun wrote: > Tommi, Karl, this is a proof of concept ...

4 years, 3 months ago (2016-09-20 08:51:58 UTC) #4

the sun

On 2016/09/20 08:51:58, tommi (webrtc) wrote: > On 2016/09/19 20:45:01, the sun wrote: > > ...

4 years, 3 months ago (2016-09-20 09:02:38 UTC) #5

tommi

On 2016/09/20 09:02:38, the sun wrote: > On 2016/09/20 08:51:58, tommi (webrtc) wrote: > > ...

4 years, 3 months ago (2016-09-20 09:10:35 UTC) #6

kwiberg-webrtc

I like the idea. https://codereview.webrtc.org/2350663002/diff/20001/webrtc/base/thread_checker.h File webrtc/base/thread_checker.h (right): https://codereview.webrtc.org/2350663002/diff/20001/webrtc/base/thread_checker.h#newcode211 webrtc/base/thread_checker.h:211: __RTC_DCHECK_NON_REENTRANT_NAME(checker, __LINE__)(checker); The semicolon shouldn't ...

4 years, 3 months ago (2016-09-20 10:55:22 UTC) #7

the sun

On 2016/09/20 10:55:22, kwiberg-webrtc wrote: > I like the idea. > > https://codereview.webrtc.org/2350663002/diff/20001/webrtc/base/thread_checker.h > File ...

4 years, 3 months ago (2016-09-20 12:34:46 UTC) #8

kwiberg-webrtc

On 2016/09/20 12:34:46, the sun wrote: > perkj@ made me aware of > https://cs.chromium.org/chromium/src/third_party/webrtc/base/race_checker.h Ew! ...

4 years, 3 months ago (2016-09-20 12:50:56 UTC) #9

the sun

On 2016/09/20 12:50:56, kwiberg-webrtc wrote: > On 2016/09/20 12:34:46, the sun wrote: > > perkj@ ...

4 years, 3 months ago (2016-09-20 17:53:31 UTC) #10

kwiberg-webrtc

4 years, 3 months ago (2016-09-20 18:26:18 UTC) #11

On 2016/09/20 17:53:31, the sun wrote:
> On 2016/09/20 12:50:56, kwiberg-webrtc wrote:
> > On 2016/09/20 12:34:46, the sun wrote:
> > > perkj@ made me aware of
> > > https://cs.chromium.org/chromium/src/third_party/webrtc/base/race_checker.h
> >
> > Ew! That one uses volatile int instead of real atomics. I guess it makes
> > sense, since it can only ever misbehave if there was indeed a race, but
> > still... it's only a few lines of code, and five minutes later I'm still not
> > sure that it can't misbehave in a way that wouldn't be OK.
>
> Yeah, I think you're right. In C++, volatile only marks memory as "may change
> at any time", it does not say anything about memory ordering, like it does in
> Java, and is therefore not so useful for multithreaded situations. But these
> things are *tricky*.

Yes. It's difficult enough to get right if you use real atomic operations.
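
To illustrate the distinction, here is a minimal C++11 sketch. It is not taken from the CL, and the names EntryCounter, Enter and Leave are made up for this example. With a plain or volatile int, a test like "access_count_++ == 0" is a separate read and write that two threads can interleave; std::atomic's fetch_add performs the read-modify-write as one indivisible step.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical illustration only; this is not the WebRTC implementation.
class EntryCounter {
 public:
  // Atomically increments the counter and reports whether the caller was the
  // first one in (i.e. the value before the increment was zero).
  bool Enter() {
    return count_.fetch_add(1, std::memory_order_acq_rel) == 0;
  }

  // Atomically decrements the counter when the caller leaves.
  void Leave() { count_.fetch_sub(1, std::memory_order_acq_rel); }

  int count() const { return count_.load(std::memory_order_acquire); }

 private:
  std::atomic<int> count_{0};
};
```

This only makes the counter itself race free; whether the extra synchronization cost is acceptable is a separate question.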

> To me it looks like the "if (access_count_++ == 0)" line introduces a race.
> [...] If I'm right that'd be a race in the RaceChecker.

Yes (trivially, because the code writes to memory without proper
synchronization). It's clear enough that if there is no race, the checker will
reliably fail to report a race while doing no damage to the system. It's also
obvious that if we experience a race, the checker may (1) report the race
without doing any damage, or (2) fail to report the race without doing any
damage. What's in question is if there's a third possibility in case we
experience a race: (3) the checker damages the system.

If we just stick to the standard, then (3) may trivially occur, because a race
is undefined behavior, so the compiler is free to e.g. insert its own race
checker that starts downloading cat videos if it detects a race. It's very
likely that (3) cannot occur in practice, but as I said I'm not sure.

> 
> > > I'll update the CL and use that since it is almost exactly what I was
> > > trying to do here (% an int or two left in release builds).
> >
> > Yes, that implementation attempts to be usable both as CHECK and DCHECK, so
> > the struct can't be empty in release builds. I would be much more
> > comfortable with something that was a no-op in release builds, and used
> > proper synchronization to detect races when DCHECK_IS_ON.
> >
> > I won't object if you write a CL to copy it into WebRTC, though.
>
> It's already in webrtc/base/. I'll ask the author for an explanation and
> suggest we use a crit sect if we figure out it is indeed broken.

Oh, somewhy I got the impression that this was a Chromium thing, but it's a
WebRTC-specific thing.

Yes, unless the author can give a convincing argument why this code is
unconditionally safe, we should fix it. One way to do so is to use real atomics,
but as you said even those are tricky to use correctly. A real mutex would be
much better, but the performance penalty would probably make this checker
useless unless checking is disabled in release builds.

the sun

4 years, 3 months ago (2016-09-20 18:40:04 UTC) #12

On 2016/09/20 18:26:18, kwiberg-webrtc wrote:
> On 2016/09/20 17:53:31, the sun wrote:
> > On 2016/09/20 12:50:56, kwiberg-webrtc wrote:
> > > On 2016/09/20 12:34:46, the sun wrote:
> > > > perkj@ made me aware of
> > > > https://cs.chromium.org/chromium/src/third_party/webrtc/base/race_checker.h
> > >
> > > Ew! That one uses volatile int instead of real atomics. I guess it makes
> > > sense, since it can only ever misbehave if there was indeed a race, but
> > > still... it's only a few lines of code, and five minutes later I'm still
> > > not sure that it can't misbehave in a way that wouldn't be OK.
> >
> > Yeah, I think you're right. In C++, volatile only marks memory as "may
> > change at any time", it does not say anything about memory ordering, like
> > it does in Java, and is therefore not so useful for multithreaded
> > situations. But these things are *tricky*.
> 
> Yes. It's difficult enough to get right if you use real atomic operations.
> 
> > To me it looks like the "if (access_count_++ == 0)" line introduces a race.
> > [...] If I'm right that'd be a race in the RaceChecker.
> 
> Yes (trivially, because the code writes to memory without proper
> synchronization). It's clear enough that if there is no race, the checker will
> reliably fail to report a race while doing no damage to the system. It's also
> obvious that if we experience a race, the checker may (1) report the race
> without doing any damage, or (2) fail to report the race without doing any
> damage. What's in question is if there's a third possibility in case we
> experience a race: (3) the checker damages the system.
> 
> If we just stick to the standard, then (3) may trivially occur, because a race
> is undefined behavior, so the compiler is free to e.g. insert its own race
> checker that starts downloading cat videos if it detects a race. It's very
> likely that (3) cannot occur in practice, but as I said I'm not sure.
> 
> > 
> > > > I'll update the CL and use that since it is almost exactly what I was
> > > > trying to do here (% an int or two left in release builds).
> > >
> > > Yes, that implementation attempts to be usable both as CHECK and DCHECK,
> > > so the struct can't be empty in release builds. I would be much more
> > > comfortable with something that was a no-op in release builds, and used
> > > proper synchronization to detect races when DCHECK_IS_ON.
> > >
> > > I won't object if you write a CL to copy it into WebRTC, though.
> >
> > It's already in webrtc/base/. I'll ask the author for an explanation and
> > suggest we use a crit sect if we figure out it is indeed broken.
>
> Oh, somewhy I got the impression that this was a Chromium thing, but it's a
> WebRTC-specific thing.
>
> Yes, unless the author can give a convincing argument why this code is
> unconditionally safe, we should fix it. One way to do so is to use real
> atomics, but as you said even those are tricky to use correctly. A real mutex
> would be much better, but the performance penalty would probably make this
> checker useless unless checking is disabled in release builds.

According to the author (pbos@ - https://codereview.webrtc.org/2097403002) the
RaceChecker is intentionally racy for performance reasons. I've added a comment
trying to explain why it will still reliably report the races it is there to
find, but it does not address your (3) concern.

the sun

Description was changed from ========== Change thread check to non-re-entrancy check. Introduces rtc::NonReentrantChecker. BUG=webrtc:6345 ========== ...

4 years, 3 months ago (2016-09-20 18:41:06 UTC) #13

the sun

solenberg@webrtc.org changed reviewers: + pbos@webrtc.org

4 years, 3 months ago (2016-09-20 18:41:17 UTC) #14

kwiberg-webrtc

https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc File webrtc/base/race_checker.cc (right): https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc#newcode20 webrtc/base/race_checker.cc:20: // twice, causing the IsThreadRefEqual() check to return false ...

4 years, 3 months ago (2016-09-20 19:29:08 UTC) #15

pbos-webrtc

4 years, 3 months ago (2016-09-20 19:36:02 UTC) #16

https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc
File webrtc/base/race_checker.cc (right):

https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker....
webrtc/base/race_checker.cc:20: // twice, causing the IsThreadRefEqual() check
to return false at a later point.
On 2016/09/20 19:29:07, kwiberg-webrtc wrote:
> It may be incremented once instead of twice, possibly followed by both threads
> writing to accessing_thread_. A similar race may occur if two threads try to
> leave the critical section at the same time, or if one leaves at the same time
> as one enters. The only possible bad effect of a race should be that we fail
> to detect a race.
>
> However, this assumes that all reads of accessing_thread_ retrieve a value
> that was previously written. This is not guaranteed to be the case! E.g. if
> PlatformThreadRef is 64 bits and we have a 32-bit system, the write may happen
> in two parts.
>
> I'd be more comfortable if RaceChecker used atomics so that it itself was race
> free, or if it was only enabled in debug builds. I don't expect that
> problematic behavior is very likely, though, so fixing this isn't a high
> priority.

If two threads use it, and anyone at any point writes garbage into access_count_
(or generate -1 by double-decrementing or whatever) then the thread ref check
will fail for either thread A or B in the future since they won't set
accessing_thread_ even though they use the critical section.

Atomics cause synchronization and could also potentially make a data race less
likely to be found, plus it adds non-zero overhead, especially on ARM platforms
etc.

The idea was that this should be used in release as well, and the cost being
that this is more probabilistic than deterministic in detection. Though I doubt
it has a very high false negative rate, unless a compiler manages to completely
miscompile it since it's not supposed to be racy.
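
For readers without the file at hand, the mechanism being debated can be sketched roughly as follows. This is a reconstruction from the description in this thread, not a copy of webrtc/base/race_checker.cc: std::thread::id stands in for PlatformThreadRef, and the class and method names are assumptions. The counter and the thread field are deliberately left unsynchronized, which is exactly the property under discussion.

```cpp
#include <cassert>
#include <thread>

// Rough sketch of the intentionally racy design described in this thread.
// NOT the actual WebRTC code; names and types here are assumptions.
class RacyCheckerSketch {
 public:
  // Returns false if a different thread appears to be inside already.
  bool Acquire() {
    const std::thread::id current = std::this_thread::get_id();
    const int previous_count = access_count_;  // unsynchronized read
    access_count_ = previous_count + 1;        // unsynchronized write
    if (previous_count == 0)
      accessing_thread_ = current;  // only the apparent first entrant writes
    // If another thread entered first (or the counter was corrupted by a
    // race), the stored thread differs from ours and the check fails.
    return accessing_thread_ == current;
  }

  void Release() { access_count_ = access_count_ - 1; }

 private:
  volatile int access_count_ = 0;
  std::thread::id accessing_thread_;
};
```

Used from a single thread, including nested Acquire/Release pairs, this always reports success. Used from two threads at once it has a data race by construction, so everything said in this thread about undefined behavior applies to the sketch as well.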

pbos-webrtc

ell-gee-tee-emm lgtm https://codereview.webrtc.org/2350663002/diff/40001/webrtc/media/engine/webrtcvoiceengine.cc File webrtc/media/engine/webrtcvoiceengine.cc (right): https://codereview.webrtc.org/2350663002/diff/40001/webrtc/media/engine/webrtcvoiceengine.cc#newcode1321 webrtc/media/engine/webrtcvoiceengine.cc:1321: webrtc::AudioTransport* const voe_audio_transport_ = nullptr; GUARDED_BY(audio_capture_racer_checker_) if ...

4 years, 3 months ago (2016-09-20 19:38:03 UTC) #17

kwiberg-webrtc

4 years, 3 months ago (2016-09-20 20:01:32 UTC) #18

https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc
File webrtc/base/race_checker.cc (right):

https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker....
webrtc/base/race_checker.cc:20: // twice, causing the IsThreadRefEqual() check
to return false at a later point.
On 2016/09/20 19:36:02, pbos-webrtc wrote:
> On 2016/09/20 19:29:07, kwiberg-webrtc wrote:
> > It may be incremented once instead of twice, possibly followed by both
> > threads writing to accessing_thread_. A similar race may occur if two
> > threads try to leave the critical section at the same time, or if one leaves
> > at the same time as one enters. The only possible bad effect of a race
> > should be that we fail to detect a race.
> >
> > However, this assumes that all reads of accessing_thread_ retrieve a value
> > that was previously written. This is not guaranteed to be the case! E.g. if
> > PlatformThreadRef is 64 bits and we have a 32-bit system, the write may
> > happen in two parts.
> >
> > I'd be more comfortable if RaceChecker used atomics so that it itself was
> > race free, or if it was only enabled in debug builds. I don't expect that
> > problematic behavior is very likely, though, so fixing this isn't a high
> > priority.
>
> If two threads use it, and anyone at any point writes garbage into
> access_count_ (or generate -1 by double-decrementing or whatever) then the
> thread ref check will fail for either thread A or B in the future since they
> won't set accessing_thread_ even though they use the critical section.

Yes, garbage in access_count_ shouldn't be a problem. At most, it'll cause us to
fail to detect a race. It's garbage in accessing_thread_ that could be bad. On
Windows, we end up comparing such values as plain integers, which is safe even
if they contain garbage, but where we use pthreads we compare them with
pthread_equal(). If that function e.g. interprets (parts of) its two pthread_t
arguments as indexes and uses those indexes to look stuff up in memory, we have
a problem.

> Atomics cause synchronization and could also potentially make a data race less
> likely to be found, plus it adds non-zero overhead, especially on ARM
> platforms etc.

Yes on both counts, although I wouldn't suspect that atomic instructions would
cause data races to be all that much less likely to occur. Did you try it and
find a nonnegligible effect?

Implementing this with proper atomics would make it exactly equivalent to a
mutex that explodes whenever there is contention (i.e. in exactly the
circumstances where a real mutex would busy-wait or sleep). That seems like it
ought to work just fine.
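
A minimal sketch of that idea, assuming a non-recursive checker; the name ExplodingLock and its methods are invented for this example and do not appear in the CL. TryEnter() never waits: it reports failure in exactly the situation where a real mutex would block, and the caller can turn that into a DCHECK failure.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical "mutex that explodes on contention"; not WebRTC code.
class ExplodingLock {
 public:
  // Returns true if nobody else was inside. Returns false immediately
  // (instead of waiting) if the lock was already held.
  bool TryEnter() {
    // test_and_set returns the previous value: false means we got in.
    return !busy_.test_and_set(std::memory_order_acquire);
  }

  // Must only be called by the holder after a successful TryEnter().
  void Leave() { busy_.clear(std::memory_order_release); }

 private:
  std::atomic_flag busy_ = ATOMIC_FLAG_INIT;
};
```

Unlike the existing checker, this sketch also fails if the same thread enters twice; allowing same-thread re-entry would need a thread identifier stored alongside the flag.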

> The idea was that this should be used in release as well, and the cost being
> that this is more probabilistic than deterministic in detection.

OK. That may serve as a convincing argument for why using a mutex or even
atomics might be too expensive.

> Though I doubt it has a very high false negative rate, unless a compiler
> manages to completely miscompile it since it's not supposed to be racy.

The compiler doesn't need to miscompile this code---if you have a race, that's
undefined behavior, so the compiler is fully within its rights to produce code
that behaves very badly if a race occurs. This code relies on compiler
implementations and hardware to be nice. As a result, you have to know a bunch
of low-level details about the various processors, OSes, and compilers that we
use in order to assess if it's safe.

kwiberg-webrtc

https://codereview.webrtc.org/2350663002/diff/40001/webrtc/media/engine/webrtcvoiceengine.cc File webrtc/media/engine/webrtcvoiceengine.cc (right): https://codereview.webrtc.org/2350663002/diff/40001/webrtc/media/engine/webrtcvoiceengine.cc#newcode1321 webrtc/media/engine/webrtcvoiceengine.cc:1321: webrtc::AudioTransport* const voe_audio_transport_ = nullptr; On 2016/09/20 19:38:03, pbos-webrtc ...

4 years, 3 months ago (2016-09-20 20:05:25 UTC) #19

pbos-webrtc

4 years, 3 months ago (2016-09-20 20:15:29 UTC) #20

On 2016/09/20 20:01:32, kwiberg-webrtc wrote:
> https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc
> File webrtc/base/race_checker.cc (right):
>
> https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker....
> webrtc/base/race_checker.cc:20: // twice, causing the IsThreadRefEqual() check
> to return false at a later point.
> On 2016/09/20 19:36:02, pbos-webrtc wrote:
> > On 2016/09/20 19:29:07, kwiberg-webrtc wrote:
> > > It may be incremented once instead of twice, possibly followed by both
> > > threads writing to accessing_thread_. A similar race may occur if two
> > > threads try to leave the critical section at the same time, or if one
> > > leaves at the same time as one enters. The only possible bad effect of a
> > > race should be that we fail to detect a race.
> > >
> > > However, this assumes that all reads of accessing_thread_ retrieve a value
> > > that was previously written. This is not guaranteed to be the case! E.g.
> > > if PlatformThreadRef is 64 bits and we have a 32-bit system, the write may
> > > happen in two parts.
> > >
> > > I'd be more comfortable if RaceChecker used atomics so that it itself was
> > > race free, or if it was only enabled in debug builds. I don't expect that
> > > problematic behavior is very likely, though, so fixing this isn't a high
> > > priority.
> >
> > If two threads use it, and anyone at any point writes garbage into
> > access_count_ (or generate -1 by double-decrementing or whatever) then the
> > thread ref check will fail for either thread A or B in the future since they
> > won't set accessing_thread_ even though they use the critical section.
>
> Yes, garbage in access_count_ shouldn't be a problem. At most, it'll cause us
> to fail to detect a race. It's garbage in accessing_thread_ that could be bad.
> On Windows, we end up comparing such values as plain integers, which is safe
> even if they contain garbage, but where we use pthreads we compare them with
> pthread_equal(). If that function e.g. interprets (parts of) its two pthread_t
> arguments as indexes and uses those indexes to look stuff up in memory, we
> have a problem.

If this is a crash, we're fine. If it ends up reformatting the drive that's less
fine. But remember that this code is *only* racy when the program using it is
racy, and if the program running this is racy we have the same problems, just
outside this class, right?

> > Atomics cause synchronization and could also potentially make a data race
> > less likely to be found, plus it adds non-zero overhead, especially on ARM
> > platforms etc.
> 
> Yes on both counts, although I wouldn't suspect that atomic instructions would
> cause data races to be all that much less likely to occur. Did you try it and
> find a nonnegligible effect?

Nah, just know that it might flush cache lines etc, which might align threads I
guess. This is very hand-wavy, theoretical and mostly bogus. Possibly.

> Implementing this with proper atomics would make it exactly equivalent to a
> mutex that explodes whenever there is contention (i.e. in exactly the
> circumstances where a real mutex would busy-wait or sleep). That seems like it
> ought to work just fine.
> 
> > The idea was that this should be used in release as well, and the cost being
> > that this is more probabilistic than deterministic in detection.
> 
> OK. That may serve as a convincing argument for why using a mutex or even
> atomics might be too expensive.

It was also shut down in the review landing this RaceChecker, I originally did
it with atomics. :)

> > Though I doubt it has a very high false negative rate, unless a compiler
> > manages to completely miscompile it since it's not supposed to be racy.
> 
> The compiler doesn't need to miscompile this code---if you have a race, that's
> undefined behavior, so the compiler is fully within its rights to produce code
> that behaves very badly if a race occurs. This code relies on compiler
> implementations and hardware to be nice. As a result, you have to know a bunch
> of low-level details about the various processors, OSes, and compilers that we
> use in order to assess if it's safe.

These optimizations are what I mean as miscompiling. But they're only
miscompilations if the underlying program is racy (regardless of the race
checker), since it should protect thread-unsafe areas. In that hypothetical case
we're not guaranteed to be able to detect races (since the whole class could be
optimized away, theoretically).

the sun

4 years, 3 months ago (2016-09-20 20:25:00 UTC) #21

On 2016/09/20 20:15:29, pbos-webrtc wrote:
> On 2016/09/20 20:01:32, kwiberg-webrtc wrote:
> > https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc
> > File webrtc/base/race_checker.cc (right):
> >
> > https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker....
> > webrtc/base/race_checker.cc:20: // twice, causing the IsThreadRefEqual()
> > check to return false at a later point.
> > On 2016/09/20 19:36:02, pbos-webrtc wrote:
> > > On 2016/09/20 19:29:07, kwiberg-webrtc wrote:
> > > > It may be incremented once instead of twice, possibly followed by both
> > > > threads writing to accessing_thread_. A similar race may occur if two
> > > > threads try to leave the critical section at the same time, or if one
> > > > leaves at the same time as one enters. The only possible bad effect of a
> > > > race should be that we fail to detect a race.
> > > >
> > > > However, this assumes that all reads of accessing_thread_ retrieve a
> > > > value that was previously written. This is not guaranteed to be the
> > > > case! E.g. if PlatformThreadRef is 64 bits and we have a 32-bit system,
> > > > the write may happen in two parts.
> > > >
> > > > I'd be more comfortable if RaceChecker used atomics so that it itself
> > > > was race free, or if it was only enabled in debug builds. I don't expect
> > > > that problematic behavior is very likely, though, so fixing this isn't a
> > > > high priority.
> > >
> > > If two threads use it, and anyone at any point writes garbage into
> > > access_count_ (or generate -1 by double-decrementing or whatever) then the
> > > thread ref check will fail for either thread A or B in the future since
> > > they won't set accessing_thread_ even though they use the critical
> > > section.
> >
> > Yes, garbage in access_count_ shouldn't be a problem. At most, it'll cause
> > us to fail to detect a race. It's garbage in accessing_thread_ that could be
> > bad. On Windows, we end up comparing such values as plain integers, which is
> > safe even if they contain garbage, but where we use pthreads we compare them
> > with pthread_equal(). If that function e.g. interprets (parts of) its two
> > pthread_t arguments as indexes and uses those indexes to look stuff up in
> > memory, we have a problem.
>
> If this is a crash, we're fine. If it ends up reformatting the drive that's
> less fine. But remember that this code is *only* racy when the program using
> it is racy, and if the program running this is racy we have the same problems,
> just outside this class, right?
>
> > > Atomics cause synchronization and could also potentially make a data race
> > > less likely to be found, plus it adds non-zero overhead, especially on ARM
> > > platforms etc.
> >
> > Yes on both counts, although I wouldn't suspect that atomic instructions
> > would cause data races to be all that much less likely to occur. Did you try
> > it and find a nonnegligible effect?
>
> Nah, just know that it might flush cache lines etc, which might align threads
> I guess. This is very hand-wavy, theoretical and mostly bogus. Possibly.
>
> > Implementing this with proper atomics would make it exactly equivalent to a
> > mutex that explodes whenever there is contention (i.e. in exactly the
> > circumstances where a real mutex would busy-wait or sleep). That seems like
> > it ought to work just fine.
> >
> > > The idea was that this should be used in release as well, and the cost
> > > being that this is more probabilistic than deterministic in detection.
> >
> > OK. That may serve as a convincing argument for why using a mutex or even
> > atomics might be too expensive.
>
> It was also shut down in the review landing this RaceChecker, I originally did
> it with atomics. :)
>
> > > Though I doubt it has a very high false negative rate, unless a compiler
> > > manages to completely miscompile it since it's not supposed to be racy.
> >
> > The compiler doesn't need to miscompile this code---if you have a race,
> > that's undefined behavior, so the compiler is fully within its rights to
> > produce code that behaves very badly if a race occurs. This code relies on
> > compiler implementations and hardware to be nice. As a result, you have to
> > know a bunch of low-level details about the various processors, OSes, and
> > compilers that we use in order to assess if it's safe.
>
> These optimizations are what I mean as miscompiling. But they're only
> miscompilations if the underlying program is racy (regardless of the race
> checker), since it should protect thread-unsafe areas. In that hypothetical
> case we're not guaranteed to be able to detect races (since the whole class
> could be optimized away, theoretically).

I'm mostly ok with the tradeoffs in the RaceChecker design. Finding races using
run time primitives is by nature a probabilistic business, and the RaceChecker
seems more likely to trip over and signal something, rather than hide it,
whenever it is hit by its own raciness. So, yeah, it's a bit scary, but I
believe it does the job.

What do you guys think about the comment I added? Does it improve understanding
to the extent that later users won't have to go through similar discussions
again?

kwiberg-webrtc

4 years, 3 months ago (2016-09-20 20:33:02 UTC) #22

On 2016/09/20 20:15:29, pbos-webrtc wrote:

> > Yes, garbage in access_count_ shouldn't be a problem. At most,
> > it'll cause us to fail to detect a race. It's garbage in
> > accessing_thread_ that could be bad. On Windows, we end up
> > comparing such values as plain integers, which is safe even if
> > they contain garbage, but where we use pthreads we compare them
> > with pthread_equal(). If that function e.g. interprets (parts of)
> > its two pthread_t arguments as indexes and uses those indexes to
> > look stuff up in memory, we have a problem.
>
> If this is a crash, we're fine. If it ends up reformatting the drive
> that's less fine. But remember that this code is *only* racy when
> the program using it is racy, and if the program running this is
> racy we have the same problems, just outside this class, right?

Yes. But there's no guarantee that adding more racy code to an already
racy situation won't make it worse. (Not saying that I think this
implementation is *likely* to cause bad things to happen, just that I
don't know it doesn't.)

> > The compiler doesn't need to miscompile this code---if you have a
> > race, that's undefined behavior, so the compiler is fully within
> > its rights to produce code that behaves very badly if a race
> > occurs. This code relies on compiler implementations and hardware
> > to be nice. As a result, you have to know a bunch of low-level
> > details about the various processors, OSes, and compilers that we
> > use in order to assess if it's safe.
>
> These optimizations are what I mean as miscompiling. But they're
> only miscompilations if the underlying program is racy (regardless
> of the race checker), since it should protect thread-unsafe areas.
> In that hypothetical case we're not guaranteed to be able to detect
> races (since the whole class could be optimized away,
> theoretically).

Oh, OK. The meaning of "miscompilation" that I'm used to seeing is
"the compiler produces code that violates the standard".

The compiler is unlikely to optimize the whole class away. To do so,
it would need to prove (at compile time) that a race is sure to occur.
And maybe not even then---I don't think the code is allowed to do
arbitrary stuff just because undefined behavior will occur in the
future. Although technically, the compiler is allowed to produce code
that will change the past if undefined behavior occurs... not sure if
the standard mentions this. :-)

pbos-webrtc

On 2016/09/20 20:33:02, kwiberg-webrtc wrote: > Yes. But there's no guarantee that adding more racy ...

4 years, 3 months ago (2016-09-20 20:37:33 UTC) #23

pbos-webrtc

On 2016/09/20 20:37:33, pbos-webrtc wrote: > On 2016/09/20 20:33:02, kwiberg-webrtc wrote: > > Yes. But ...

4 years, 3 months ago (2016-09-20 20:38:08 UTC) #24

kwiberg-webrtc

4 years, 3 months ago (2016-09-20 20:40:05 UTC) #25

On 2016/09/20 20:25:00, the sun wrote:
> On 2016/09/20 20:15:29, pbos-webrtc wrote:
> > On 2016/09/20 20:01:32, kwiberg-webrtc wrote:
> > >
> >
>
https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker.cc
> > > File webrtc/base/race_checker.cc (right):
> > > 
> > >
> >
>
https://codereview.webrtc.org/2350663002/diff/40001/webrtc/base/race_checker....
> > > webrtc/base/race_checker.cc:20: // twice, causing the IsThreadRefEqual()
> check
> > > to return false at a later point.
> > > On 2016/09/20 19:36:02, pbos-webrtc wrote:
> > > > On 2016/09/20 19:29:07, kwiberg-webrtc wrote:
> > > > > It may be incremented once instead of twice, possibly followed by both
> > > threads
> > > > > writing to accessing_thread_. A similar race may occur if two threads
> try
> > to
> > > > > leave the critical section at the same time, or if one leaves at the
> same
> > > time
> > > > > as one enters. The only possible bad effect of a race should be that
we
> > fail
> > > > to
> > > > > detect a race.
> > > > > 
> > > > > However, this assumes that all reads of accessing_thread_ retrieve a
> value
> > > > that
> > > > > was previously written. This is not guaranteed to be the case! E.g. if
> > > > > PlatformThreadRef is 64 bits and we have a 32-bit system, the write
may
> > > happen
> > > > > in two parts.
> > > > > 
> > > > > I'd be more comfortable if RaceChecker used atomics so that it itself
> was
> > > race
> > > > > free, or if it was only enabled in debug builds. I don't expect that
> > > > problematic
> > > > > behavior is very likely, though, so fixing this isn't a high priority.
> > > > 
> > > > If two threads use it, and anyone at any point writes garbage into
> > > > access_count_ (or generate -1 by double-decrementing or whatever) then
> > > > the thread ref check will fail for either thread A or B in the future
> > > > since they won't set accessing_thread_ even though they use the
> > > > critical section.
> > > 
> > > Yes, garbage in access_count_ shouldn't be a problem. At most, it'll
> > > cause us to fail to detect a race. It's garbage in accessing_thread_ that
> > > could be bad. On Windows, we end up comparing such values as plain
> > > integers, which is safe even if they contain garbage, but where we use
> > > pthreads we compare them with pthread_equal(). If that function e.g.
> > > interprets (parts of) its two pthread_t arguments as indexes and uses
> > > those indexes to look stuff up in memory, we have a problem.
> > 
> > If this is a crash, we're fine. If it ends up reformatting the drive that's
> > less fine. But remember that this code is *only* racy when the program
> > using it is racy, and if the program running this is racy we have the same
> > problems, just outside this class, right?
> > 
> > > > Atomics cause synchronization and could also potentially make a data
> > > > race less likely to be found, plus it adds non-zero overhead,
> > > > especially on ARM platforms etc.
> > > 
> > > Yes on both counts, although I wouldn't suspect that atomic instructions
> > > would cause data races to be all that much less likely to occur. Did you
> > > try it and find a nonnegligible effect?
> > 
> > Nah, just know that it might flush cache lines etc, which might align
> > threads I guess. This is very hand-wavy, theoretical and mostly bogus.
> > Possibly.
> > 
> > > Implementing this with proper atomics would make it exactly equivalent to
> > > a mutex that explodes whenever there is contention (i.e. in exactly the
> > > circumstances where a real mutex would busy-wait or sleep). That seems
> > > like it ought to work just fine.
> > > 
> > > > The idea was that this should be used in release as well, and the cost
> > > > being that this is more probabilistic than deterministic in detection.
> > > 
> > > OK. That may serve as a convincing argument for why using a mutex or even
> > > atomics might be too expensive.
> > 
> > It was also shut down in the review landing this RaceChecker, I originally
> > did it with atomics. :)
> > 
> > > > Though I doubt it has a very high false negative rate, unless a
> > > > compiler manages to completely miscompile it since it's not supposed to
> > > > be racy.
> > > 
> > > The compiler doesn't need to miscompile this code---if you have a race,
> > > that's undefined behavior, so the compiler is fully within its rights to
> > > produce code that behaves very badly if a race occurs. This code relies
> > > on compiler implementations and hardware to be nice. As a result, you
> > > have to know a bunch of low-level details about the various processors,
> > > OSes, and compilers that we use in order to assess if it's safe.
> > 
> > These optimizations are what I mean as miscompiling. But they're only
> > miscompilations if the underlying program is racy (regardless of the race
> > checker), since it should protect thread-unsafe areas. In that hypothetical
> > case we're not guaranteed to be able to detect races (since the whole class
> > could be optimized away, theoretically).
> 
> I'm mostly ok with the tradeoffs in the RaceChecker design. Finding races
> using run time primitives is by nature a probabilistic business, and the
> RaceChecker seems more likely to trip over and signal something, rather than
> hide it, whenever it is hit by its own raciness. So, yeah, it's a bit scary,
> but I believe it does the job.

Me too.
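
For reference, the atomics-based alternative discussed in the quoted text (a mutex that explodes on contention instead of waiting) could look roughly like the sketch below. `ExplodingMutex`, `TryEnter()` and `Leave()` are hypothetical names used only for illustration, not anything in the WebRTC tree, and a real version would presumably CHECK on failure rather than return a bool.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch: a lock that never waits. Contention is reported as a
// failure instead of being resolved by spinning or sleeping.
class ExplodingMutex {
 public:
  // Returns true if the caller now holds the lock, false if someone else
  // (or the caller itself, since this is not re-entrant) already holds it.
  bool TryEnter() {
    bool expected = false;
    return held_.compare_exchange_strong(expected, true,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed);
  }

  // Releases the lock. Must only be called after a successful TryEnter().
  void Leave() {
    assert(held_.load(std::memory_order_relaxed));
    held_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> held_{false};
};
```

Unlike the non-atomic checker, this one is itself race free, at the price of an atomic read-modify-write on every entry, which is the overhead the quoted discussion wanted to avoid in release builds.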

> What do you guys think about the comment I added? Does it improve
> understanding to the extent that later users won't have to go through these
> similar discussions?

It's a bit too specific. Maybe something like

// Note that the implementation here is in itself racy, but we pretend it does
// not matter because we don't want to pay the cost of using atomics. A race
// may cause the checker to scream bloody murder in an unexpected way, or (less
// likely) cause it to fail to detect a race. Technically, a race could even
// cause the checker to crash or worse, but this doesn't seem likely to happen
// in practice.
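
To make the context of that comment concrete, here is a minimal sketch of the kind of non-atomic checker being discussed. It is not the actual webrtc/base/race_checker.cc: the class and function names are made up, `std::thread::id` stands in for PlatformThreadRef, and `assert` stands in for RTC_DCHECK. Only `access_count_` and `accessing_thread_` are taken from the discussion.

```cpp
#include <cassert>
#include <thread>

// Hypothetical sketch of a deliberately racy race checker.
class RacyRaceChecker {
 public:
  // Returns true if the calling thread appears to be the only one inside.
  bool Acquire() {
    const std::thread::id current = std::this_thread::get_id();
    // Plain, non-atomic read-modify-write: this is the racy part.
    const int accesses = access_count_++;
    if (accesses == 0)
      accessing_thread_ = current;
    // If another thread got in first, the stored id will not match ours.
    return accessing_thread_ == current;
  }

  void Release() { --access_count_; }

 private:
  int access_count_ = 0;
  std::thread::id accessing_thread_;
};

// RAII helper so that every Acquire() is paired with a Release().
class RaceCheckerScope {
 public:
  explicit RaceCheckerScope(RacyRaceChecker* checker)
      : checker_(checker), race_free_(checker->Acquire()) {}
  ~RaceCheckerScope() { checker_->Release(); }
  bool RaceDetected() const { return !race_free_; }

 private:
  RacyRaceChecker* const checker_;
  const bool race_free_;
};

// Example use: guard a section that must not run on two threads at once.
void DoGuardedWork(RacyRaceChecker* checker) {
  RaceCheckerScope scope(checker);
  assert(!scope.RaceDetected());  // stand-in for RTC_DCHECK
  // ... thread-unsafe work would go here ...
}
```

Nested scopes on one thread pass the check, while a second thread entering during a held scope would find a mismatching `accessing_thread_`. Because the counter is not atomic, two threads entering at exactly the same time can corrupt it, which is the probabilistic behavior the thread is debating.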

kwiberg-webrtc

On 2016/09/20 20:37:33, pbos-webrtc wrote: > But I think the bikeshed is sufficiently red by ...

4 years, 3 months ago (2016-09-20 20:41:44 UTC) #26

kwiberg-webrtc

lgtm, with a suggestion https://codereview.webrtc.org/2350663002/diff/60001/webrtc/base/race_checker.cc File webrtc/base/race_checker.cc (right): https://codereview.webrtc.org/2350663002/diff/60001/webrtc/base/race_checker.cc#newcode22 webrtc/base/race_checker.cc:22: // spot where a race ...

4 years, 3 months ago (2016-09-22 20:12:18 UTC) #28

the sun

The patchset sent to the CQ was uploaded after l-g-t-m from pbos@webrtc.org, tommi@webrtc.org, kwiberg@webrtc.org Link ...

4 years, 3 months ago (2016-09-23 08:09:52 UTC) #31