|
|
Created:
3 years, 10 months ago by tommi Modified:
3 years, 10 months ago Reviewers:
Taylor Brandstetter CC:
webrtc-reviews_webrtc.org, tterriberry_mozilla.com, qiang.lu, niklas.enbom, yujie_mao (webrtc), peah-webrtc Target Ref:
refs/heads/master Project:
webrtc Visibility:
Public. |
DescriptionFix potential deadlock in TaskQueue's libevent PostTaskAndReply implementation
BUG=webrtc:7188
Review-Url: https://codereview.webrtc.org/2709603002
Cr-Commit-Position: refs/heads/master@{#16786}
Committed: https://chromium.googlesource.com/external/webrtc/+/8c80c6e389d235d259a39e0114316447aa228092
Patch Set 1 #Patch Set 2 : Revert webrtc.gni #Patch Set 3 : Retain the correct (reply) queue on Mac for PostTaskAndReply #Patch Set 4 : Add missing include #
Total comments: 2
Patch Set 5 : Update comment #
Total comments: 14
Patch Set 6 : Address comments #
Messages
Total messages: 29 (13 generated)
Revert webrtc.gni
tommi@webrtc.org changed reviewers: + deadbeef@webrtc.org
The CQ bit was checked by tommi@webrtc.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_compile_dbg on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_compile_dbg/build...) linux_libfuzzer_rel on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_libfuzzer_rel/bui...) linux_more_configs on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_more_configs/buil...)
On 2017/02/20 22:36:09, tommi (webrtc) wrote: eh, looks like the trybots are blushing. I'll ping back when I've fixed things :)
Retain the correct (reply) queue on Mac for PostTaskAndReply
Add missing include
The CQ bit was checked by tommi@webrtc.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
OK, Taylor - please take a look
If you're going with the ref-counted approach, this seems more complex than it needs to be. The ref-counting prevents TaskQueue from accessing PostAndReplyTask after it's deleted, and the lock + "OnReplyQueueGone" prevents PostAndReplyTask from accessing TaskQueue after it's deleted. So is there a reason we can't just do this: https://codereview.webrtc.org/2706923005 https://codereview.webrtc.org/2709603002/diff/60001/webrtc/base/task_queue_li... File webrtc/base/task_queue_libevent.cc (right): https://codereview.webrtc.org/2709603002/diff/60001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:97: // * The ReplyTaskOwner owns the reply_ task. nit: Enclose variables in '|'s consistently
Update comment
On 2017/02/21 20:24:45, Taylor Brandstetter wrote: > If you're going with the ref-counted approach, this seems more complex than it > needs to be. The ref-counting prevents TaskQueue from accessing PostAndReplyTask > after it's deleted, and the lock + "OnReplyQueueGone" prevents PostAndReplyTask > from accessing TaskQueue after it's deleted. So is there a reason we can't just > do this: https://codereview.webrtc.org/2706923005 At first I thought that looked like a simpler approach but after taking a closer look, I'm not sure it actually is. If I compare the two approaches, let's call this one A and the other one B: * A and B incorporate a refcounted object shared between both queues * A uses the pipe to signal that the prepared reply task should be run. (PostTask would do this internally anyway) * B uses posttask for the reply task which includes grabbing the reply queue's lock. * B additionally needs PostAndReplyTaskDone which is one more lock (A removes ReplyTaskDone) * B adds a new teardown state that needs to be checked outside of the destructor * B makes destruction of pending_replies_ more complex whereas A simplifies it * With B, PostAndReplyTaskDone now sometimes removes the task but not always (perhaps checking destroying_ would be in order to trap errors). * A removes the lock from PostAndReplyTask * A adds a new message type, kRunReplyTask. The SIGPIPE problem with A isn't ideal, but it's common practice to ignore this signal (in fact some OSen don't signal it when closing()). However, I think approach A actually simplifies several things and I kinda like reducing the number of lock operations. Wdyt? > > https://codereview.webrtc.org/2709603002/diff/60001/webrtc/base/task_queue_li... > File webrtc/base/task_queue_libevent.cc (right): > > https://codereview.webrtc.org/2709603002/diff/60001/webrtc/base/task_queue_li... > webrtc/base/task_queue_libevent.cc:97: // * The ReplyTaskOwner owns the reply_ > task. > nit: Enclose variables in '|'s consistently
https://codereview.webrtc.org/2709603002/diff/60001/webrtc/base/task_queue_li... File webrtc/base/task_queue_libevent.cc (right): https://codereview.webrtc.org/2709603002/diff/60001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:97: // * The ReplyTaskOwner owns the reply_ task. On 2017/02/21 20:24:44, Taylor Brandstetter wrote: > nit: Enclose variables in '|'s consistently Done.
https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... File webrtc/base/task_queue_libevent.cc (right): https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:36: // We can run into that situation and handle it (e.g. reply tasks that don't It's not clear to me what "We can run into that situation and handle it" means. Does that mean "we handle this by turning off the signal"? https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:123: RTC_DCHECK(reply_); Any reason the DCHECK is in Run and not the constructor? https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:157: write(reply_pipe_, &message, sizeof(message)); Could you leave a comment here explaining that if the PostAndReplyTask is destroyed before running, the reply task won't be run due to "set_should_run_task" not being called? Also, possibly rename "kRunReplyTask" to "kMaybeRunReplyTask" or something that indicates that it won't always result in the task being run. https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:230: IgnoreSigPipeSignalOnCurrentThread(); So, does this mean after the first TaskQueue is destroyed, these signals will be permanently turned off? What if an application using webrtc is depending on these signals? Or is that extremely unlikely? https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... File webrtc/base/task_queue_unittest.cc (right): https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... webrtc/base/task_queue_unittest.cc:46: Event event(false, false); These tests still should put the Event above the TaskQueue, so that if "event.Wait(1000)" fails, the event isn't accessed after being destroyed. Though maybe, since the test will fail anyway, it doesn't matter if it crashes? https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... webrtc/base/task_queue_unittest.cc:215: static const char kPostQueue[] = "PostQueue"; Out of curiosity, what's the purpose of having "static const char kPostQueue[]" in each test, as opposed to just 'TaskQueue post_queue("PostQueue");'? https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... webrtc/base/task_queue_unittest.cc:217: std::unique_ptr<TaskQueue> post_queue(new TaskQueue(kPostQueue)); These don't need to be unique_ptrs
Address comments
The CQ bit was checked by tommi@webrtc.org to run a CQ dry run
https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... File webrtc/base/task_queue_libevent.cc (right): https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:36: // We can run into that situation and handle it (e.g. reply tasks that don't On 2017/02/22 00:24:27, Taylor Brandstetter wrote: > It's not clear to me what "We can run into that situation and handle it" means. > Does that mean "we handle this by turning off the signal"? thanks, yeah this isn't clear. "handle" should be "ignore". Comment rephrased. In case you're curious about how this is usually "handled": http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-t... https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:123: RTC_DCHECK(reply_); On 2017/02/22 00:24:27, Taylor Brandstetter wrote: > Any reason the DCHECK is in Run and not the constructor? In case Run() ever gets called twice and reply_.release() ran. I added an explicit call to .reset() to make the intention clearer. https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:157: write(reply_pipe_, &message, sizeof(message)); On 2017/02/22 00:24:27, Taylor Brandstetter wrote: > Could you leave a comment here explaining that if the PostAndReplyTask is > destroyed before running, the reply task won't be run due to > "set_should_run_task" not being called? Done. > Also, possibly rename "kRunReplyTask" to "kMaybeRunReplyTask" or something that > indicates that it won't always result in the task being run. I renamed it, but then figured that whether or not the task runs, is an implementation detail of ReplyTaskOwner. So where kRunReplyTask is handled, we always call the Run() method and it could be confusing to see that combined with a 'maybe' in the constant. So, for now keeping as is. https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_li... webrtc/base/task_queue_libevent.cc:230: IgnoreSigPipeSignalOnCurrentThread(); On 2017/02/22 00:24:27, Taylor Brandstetter wrote: > So, does this mean after the first TaskQueue is destroyed, these signals will be > permanently turned off? What if an application using webrtc is depending on > these signals? Or is that extremely unlikely? Permanently turned off for the current thread, not the entire application. I did consider turning them off for the entire process, but went with the approach with fewer side effects. Having said that, the common thing to do, is to ignore this signal and instead handle error return values from functions. All multithreaded code that I've seen that uses libevent, ignores it (see e.g. https://sourceforge.net/p/levent/bugs/148/). So I think it's unlikely that an application will want to handle the signal in a signal handler rather then by return value, but still opted for the more conservative approach of only affecting threads on which TaskQueues run. https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... File webrtc/base/task_queue_unittest.cc (right): https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... webrtc/base/task_queue_unittest.cc:46: Event event(false, false); On 2017/02/22 00:24:27, Taylor Brandstetter wrote: > These tests still should put the Event above the TaskQueue, so that if > "event.Wait(1000)" fails, the event isn't accessed after being destroyed. Though > maybe, since the test will fail anyway, it doesn't matter if it crashes? Ah, I was wondering why you moved the event objects in the other cl. Good point. If we crash, we could prevent other tests from running, so Done. https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... webrtc/base/task_queue_unittest.cc:215: static const char kPostQueue[] = "PostQueue"; On 2017/02/22 00:24:27, Taylor Brandstetter wrote: > Out of curiosity, what's the purpose of having "static const char kPostQueue[]" > in each test, as opposed to just 'TaskQueue post_queue("PostQueue");'? When I started writing these tests, the constant variable name was the same (kQueueName) but I changed its value from test to test and I checked that the name of the queue was the same as the constant specified (just so that the literal string would only be in one place). Then that pattern was repeated as more tests were added. For many of these tests today, that's not necessary. https://codereview.webrtc.org/2709603002/diff/80001/webrtc/base/task_queue_un... webrtc/base/task_queue_unittest.cc:217: std::unique_ptr<TaskQueue> post_queue(new TaskQueue(kPostQueue)); On 2017/02/22 00:24:28, Taylor Brandstetter wrote: > These don't need to be unique_ptrs Done. I made them unique_ptrs while debugging so that I could make the deletion explicit rather than implicit and be able to put a breakpoint on either the .reset() or the line immediately following.
lgtm
The CQ bit was checked by tommi@webrtc.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
CQ is committing da patch. Bot data: {"patchset_id": 100001, "attempt_start_ts": 1487835949696600, "parent_rev": "64b6d0781fbfce17c98e1dfaf54f31d8f776ba29", "commit_rev": "8c80c6e389d235d259a39e0114316447aa228092"}
Message was sent while issue was closed.
Description was changed from ========== Fix potential deadlock in TaskQueue's libevent PostTaskAndReply implementation BUG=webrtc:7188 ========== to ========== Fix potential deadlock in TaskQueue's libevent PostTaskAndReply implementation BUG=webrtc:7188 Review-Url: https://codereview.webrtc.org/2709603002 Cr-Commit-Position: refs/heads/master@{#16786} Committed: https://chromium.googlesource.com/external/webrtc/+/8c80c6e389d235d259a39e011... ==========
Message was sent while issue was closed.
Committed patchset #6 (id:100001) as https://chromium.googlesource.com/external/webrtc/+/8c80c6e389d235d259a39e011... |