|
|
Created:
4 years, 3 months ago by aluebs-webrtc Modified:
4 years, 3 months ago CC:
webrtc-reviews_webrtc.org, kwiberg-webrtc, Andrew MacDonald, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, peah-webrtc, minyue-webrtc, the sun, bjornv1 Target Ref:
refs/pending/heads/master Project:
webrtc Visibility:
Public. |
DescriptionCompensate for the IntelligibilityEnhancer processing delay in high bands
Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this.
BUG=b/30780909
R=henrik.lundin@webrtc.org, peah@webrtc.org
Committed: https://chromium.googlesource.com/external/webrtc/+/ef00925cd0d41afbc8b7599704ddbf399ad82950
Patch Set 1 #
Total comments: 12
Patch Set 2 : Fix glitches #
Total comments: 27
Patch Set 3 : Add delay test #
Total comments: 12
Patch Set 4 : Implement DelayBuffer helper class #
Total comments: 8
Patch Set 5 : optimize #
Total comments: 4
Patch Set 6 : Only wrap once #Patch Set 7 : Rebasing #
Total comments: 2
Patch Set 8 : Improve comment #Messages
Total messages: 39 (16 generated)
aluebs@webrtc.org changed reviewers: + henrik.lundin@webrtc.org, peah@webrtc.org
https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:119: << 100.f * static_cast<float>(num_active_chunks_) / num_chunks_ This will cause an exception if IE is destroyed right after having been created, right (due to the division by 0)? Please add proper handling of that, and/or add a checked division. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:136: int sample_rate_hz) { Is it really necessary to pass the sample rate to this call? It is only used for checking that is invariant. No other submodule inside APM does this and since you anyway change the parameters of the method it would make sense to change that at the same time. Also, should it not now be possible to use the sample_rate() method from AudioBuffer? https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:138: RTC_CHECK_EQ(num_render_channels_, audio->num_channels()); This looks like something that there should be a DCHECK for instead. Same thing goes for the sample rate CHECK. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:145: render_mangler_->ProcessChunk(in_low_band, out_low_band); With this approach, you only let the audio pass through the IE processing when is_active is true, right? And is_active is true when the SNR is small, right? So am I correct in that IE can toggle between applying and not applying the IE effect during a call? The drawback with that approach is that each time this toggling happens, there is a click. The correct way is instead to apply the processing all the time, but with a fully transparent effect if is_active is false. That will not cause clicks. If you want to go back and forth between having the effect on and off you need to do that in a smooth manner so that there is no click. But since the delay is changed, I'd definitely prefer the above approach. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:146: if (is_active_) { You cannot turn this on/off during the call as that will cause a glitch. Either you have it on or off, but do not toggle between those states. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:378: if (num_bands != high_bands_buffers_.size() + 1u) { Please put the initialization of this in the constructor instead. As it is now, it constitutes such a big part of the method that it hides what it really does.
Patchset #2 (id:20001) has been deleted
https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:119: << 100.f * static_cast<float>(num_active_chunks_) / num_chunks_ On 2016/09/09 09:12:16, peah-webrtc wrote: > This will cause an exception if IE is destroyed right after having been created, > right (due to the division by 0)? > Please add proper handling of that, and/or add a checked division. > Done. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:136: int sample_rate_hz) { On 2016/09/09 09:12:16, peah-webrtc wrote: > Is it really necessary to pass the sample rate to this call? It is only used for > checking that is invariant. > No other submodule inside APM does this and since you anyway change the > parameters of the method it would make sense to change that at the same time. > > Also, should it not now be possible to use the sample_rate() method from > AudioBuffer? > AudioBuffer doesn't have a sample_rate() method. But dropped the checking and input parameter. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:138: RTC_CHECK_EQ(num_render_channels_, audio->num_channels()); On 2016/09/09 09:12:16, peah-webrtc wrote: > This looks like something that there should be a DCHECK for instead. Same thing > goes for the sample rate CHECK. > Done. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:145: render_mangler_->ProcessChunk(in_low_band, out_low_band); On 2016/09/09 09:12:16, peah-webrtc wrote: > With this approach, you only let the audio pass through the IE processing when > is_active is true, right? And is_active is true when the SNR is small, right? > So am I correct in that IE can toggle between applying and not applying the IE > effect during a call? > > The drawback with that approach is that each time this toggling happens, there > is a click. The correct way is instead to apply the processing all the time, but > with a fully transparent effect if is_active is false. That will not cause > clicks. > > If you want to go back and forth between having the effect on and off you need > to do that in a smooth manner so that there is no click. But since the delay is > changed, I'd definitely prefer the above approach. Good catch. I am not sure why I wrongly assumed this would be smoothed out by the windowing. Now it always goes a transparent processing that introduces the blocking delay. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:146: if (is_active_) { On 2016/09/09 09:12:16, peah-webrtc wrote: > You cannot turn this on/off during the call as that will cause a glitch. Either > you have it on or off, but do not toggle between those states. Same as above. https://codereview.webrtc.org/2320833002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:378: if (num_bands != high_bands_buffers_.size() + 1u) { On 2016/09/09 09:12:16, peah-webrtc wrote: > Please put the initialization of this in the constructor instead. As it is now, > it constitutes such a big part of the method that it hides what it really does. Good point. I thought it needed to be dynamic, but actually it doesn't, because the APM will rebuild the whole IE if the sample_rate changes. My bad. Moved to constructor.
https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:115: const size_t initial_delay = render_mangler_->initial_delay(); Have you verified that this is indeed the delay introduced by the IE processing? https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:128: LOG(LS_INFO) << "Intelligibility Enhancer was active for " I think it would be good to have a log line for the case when num_chunks_ == 0 as well. When analyzing logs to find errors, missing log-lines is a source of confusion. WDYT? https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:190: SnrBasedEffectActivation(); I think it makes sense. But please comment in the CL description on why you moved this, as it impacts the processing and is not at all mentioned in the description of the CL. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:205: const int kSamples = 10000; What is the motivation behind the changed number of samples? https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:206: const int kSampleRate = 4000; What is the purpose of the samplerate constant of 4000? https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:336: bool CheckUpdate() { I'd like a more descriptive name here. This method actually processes the audio using the IE and checks whether the resulting output has been processed. I think the readability of the test code below would benefit a lot from having a more descriptive name that matches what is actually done in the method. WDYT? https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:339: float* clear_cursor = clear_data_.data(); The name cursor I think is quite misleading as I cannot see the connection between a cursor and a buffer position. I know it was present in the former code but I think that it would be good to change it here. What about about clear_buffer_position? WDYT? https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:348: if (std::fabs(clear_data_[i] - orig_data_[i - initial_delay_]) > As far as I can see, this does not verify that the delay of the upper bands is correct as well as the delay for the lower bands. Please add a test for that. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:357: AudioBuffer clear_buffer_; What is the reason for this name? Please explain, either in comments or by a more descriptive variable name what it is that makes this buffer "clear". I guess you mean that it is noise-free but then it is better to name it "speech", "clean_speech", or "render_audio_buffer". https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:437: clear_buffer_.CopyFrom(&clear_cursor, stream_config_); Why don't you update the clear cursor counter here, in the same way as you do the line 345?
https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:115: const size_t initial_delay = render_mangler_->initial_delay(); On 2016/09/13 13:29:59, peah-webrtc wrote: > Have you verified that this is indeed the delay introduced by the IE processing? Yes. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:128: LOG(LS_INFO) << "Intelligibility Enhancer was active for " On 2016/09/13 13:29:59, peah-webrtc wrote: > I think it would be good to have a log line for the case when num_chunks_ == 0 > as well. When analyzing logs to find errors, missing log-lines is a source of > confusion. > WDYT? Done. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:190: SnrBasedEffectActivation(); On 2016/09/13 13:29:59, peah-webrtc wrote: > I think it makes sense. But please comment in the CL description on why you > moved this, as it impacts the processing and is not at all mentioned in the > description of the CL. Moved it back. This was left from when is_active_ was needed in ProcessRenderAudio to decide what pointer to pass in. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:205: const int kSamples = 10000; On 2016/09/13 13:30:00, peah-webrtc wrote: > What is the motivation behind the changed number of samples? The number samples before where not enough for the IE to adapt. The test passed because the test did not compensate for the delay introduced by the IE. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:206: const int kSampleRate = 4000; On 2016/09/13 13:30:00, peah-webrtc wrote: > What is the purpose of the samplerate constant of 4000? To set a sample rate for the tests. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:336: bool CheckUpdate() { On 2016/09/13 13:30:00, peah-webrtc wrote: > I'd like a more descriptive name here. This method actually processes the audio > using the IE and checks whether the resulting output has been processed. I think > the readability of the test code below would benefit a lot from having a more > descriptive name that matches what is actually done in the method. WDYT? I think that it is a great idea, but I don't think we should clutter this CL with naming, since it is not related. It should be done in a separate CL. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:339: float* clear_cursor = clear_data_.data(); On 2016/09/13 13:30:00, peah-webrtc wrote: > The name cursor I think is quite misleading as I cannot see the connection > between a cursor and a buffer position. I know it was present in the former code > but I think that it would be good to change it here. What about about > clear_buffer_position? WDYT? See above about naming. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:348: if (std::fabs(clear_data_[i] - orig_data_[i - initial_delay_]) > On 2016/09/13 13:30:00, peah-webrtc wrote: > As far as I can see, this does not verify that the delay of the upper bands is > correct as well as the delay for the lower bands. Please add a test for that. Added test. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:357: AudioBuffer clear_buffer_; On 2016/09/13 13:30:00, peah-webrtc wrote: > What is the reason for this name? Please explain, either in comments or by a > more descriptive variable name what it is that makes this buffer "clear". I > guess you mean that it is noise-free but then it is better to name it "speech", > "clean_speech", or "render_audio_buffer". To be consistent with the naming in the test I am leaving it as is. But added comment. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:437: clear_buffer_.CopyFrom(&clear_cursor, stream_config_); On 2016/09/13 13:30:00, peah-webrtc wrote: > Why don't you update the clear cursor counter here, in the same way as you do > the line 345? Because it is irrelevant to the Noise PSD estimation, but the number of frames to process is higher than the actual buffer size.
https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:206: const int kSampleRate = 4000; On 2016/09/14 00:35:54, aluebs-webrtc wrote: > On 2016/09/13 13:30:00, peah-webrtc wrote: > > What is the purpose of the samplerate constant of 4000? > > To set a sample rate for the tests. You mean that the test is running at a sample rate of 4 kHz? Why not use one of the native rates instead? 4 kHz is never used in practice when using the IE. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:336: bool CheckUpdate() { On 2016/09/14 00:35:54, aluebs-webrtc wrote: > On 2016/09/13 13:30:00, peah-webrtc wrote: > > I'd like a more descriptive name here. This method actually processes the > audio > > using the IE and checks whether the resulting output has been processed. I > think > > the readability of the test code below would benefit a lot from having a more > > descriptive name that matches what is actually done in the method. WDYT? > > I think that it is a great idea, but I don't think we should clutter this CL > with naming, since it is not related. It should be done in a separate CL. Acknowledged. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:437: clear_buffer_.CopyFrom(&clear_cursor, stream_config_); On 2016/09/14 00:35:54, aluebs-webrtc wrote: > On 2016/09/13 13:30:00, peah-webrtc wrote: > > Why don't you update the clear cursor counter here, in the same way as you do > > the line 345? > > Because it is irrelevant to the Noise PSD estimation, but the number of frames > to process is higher than the actual buffer size. I see, so basically you don't really care about what values are in clear_buffer since you only need to perform a number of ProcessRenderAudio calls in order to verify that the noise gain is what it should, right? I think this test code really would benefit from renaming or more commenting. The fact that the same buffer is passed several times to ProcessRenderAudio looks (incorrectly) like a clear bug as it is not the way IE is supposed to work in practice (e.g., being called multiple times with the same render input data). Another thing with this code that makes it looks buggy as that kNumFramesToProcess does not match the length of clear_buffer_. A comment would clarify that, or even more, a variable name of clear_buffer_ to dummy_input_frame_ or empty_input_frame_ would make this clearer. Furthermore, it would make sense not reuse clear_buffer for this purpose as the fact that it is much longer indicates that it should be used as a buffer and not as a dummy frame. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/common_audio/block... File webrtc/common_audio/blocker.cc (right): https://codereview.webrtc.org/2320833002/diff/60001/webrtc/common_audio/block... webrtc/common_audio/blocker.cc:214: if (output != nullptr) { Is this change still needed in this CL? https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:119: num_render_channels_, chunk_length_ + initial_delay))); It is a bit wasteful to throw up 2 AudioRingBuffers just for the purpose of delaying the signal. The overhead of that is that chunk_length_-1 more samples for each channel than needed are stored for each buffer. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:385: RTC_DCHECK_EQ(audio->num_bands(), high_bands_buffers_.size() + 1u); Suggestion: Since in this case high_bands_buffers_.size is 1 or two, I think it would be faster to not set up a for-loop but instead just make two if-statements. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:454: const size_t kTestFragmentSize = rtc::CheckedDivExact(kTestSampleRate, 100); In the IE you call this chunksize. Another common name is frame. But I've never heard fragment before. Is this something new? Please use a common naming scheme for this. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:455: const size_t kTestSplitSize = rtc::CheckedDivExact(kTestSplitRate, 100); Here as well, please use a more descriptive name, like split_chunk_size or something else. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:459: std::vector<float> tmp_buf(kTestFragmentSize); Please give this a more proper name as well. It is used for one purpose only, so it should be possible to rename it to a descriptive name.
https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:206: const int kSampleRate = 4000; On 2016/09/15 15:06:20, peah-webrtc wrote: > On 2016/09/14 00:35:54, aluebs-webrtc wrote: > > On 2016/09/13 13:30:00, peah-webrtc wrote: > > > What is the purpose of the samplerate constant of 4000? > > > > To set a sample rate for the tests. > > You mean that the test is running at a sample rate of 4 kHz? Why not use one of > the native rates instead? 4 kHz is never used in practice when using the IE. Other test (like the bitexactness and the one I just added) use native rates. I don't see anything wrong with also using different sample rates than used in practice. Plus it means having less samples until convergence, making the test quicker. But anyway, if we would decide to change it, this is not the CL for doing that. It is unrelated. https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:437: clear_buffer_.CopyFrom(&clear_cursor, stream_config_); On 2016/09/15 15:06:20, peah-webrtc wrote: > On 2016/09/14 00:35:54, aluebs-webrtc wrote: > > On 2016/09/13 13:30:00, peah-webrtc wrote: > > > Why don't you update the clear cursor counter here, in the same way as you > do > > > the line 345? > > > > Because it is irrelevant to the Noise PSD estimation, but the number of frames > > to process is higher than the actual buffer size. > > I see, so basically you don't really care about what values are in clear_buffer > since you only need to perform a number of ProcessRenderAudio calls in order to > verify that the noise gain is what it should, right? > > I think this test code really would benefit from renaming or more commenting. > The fact that the same buffer is passed several times to ProcessRenderAudio > looks (incorrectly) like a clear bug as it is not the way IE is supposed to work > in practice (e.g., being called multiple times with the same render input data). > > Another thing with this code that makes it looks buggy as that > kNumFramesToProcess does not match the length of clear_buffer_. > > > A comment would clarify that, or even more, a variable name of clear_buffer_ to > dummy_input_frame_ or empty_input_frame_ would make this clearer. > Furthermore, it would make sense not reuse clear_buffer for this purpose as the > fact that it is much longer indicates that it should be used as a buffer and not > as a dummy frame. > Agreed. But it is unrelated to this CL, so we should do it in a different one. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/common_audio/block... File webrtc/common_audio/blocker.cc (right): https://codereview.webrtc.org/2320833002/diff/60001/webrtc/common_audio/block... webrtc/common_audio/blocker.cc:214: if (output != nullptr) { On 2016/09/15 15:06:20, peah-webrtc wrote: > Is this change still needed in this CL? No, I just decided to leave it since I thought it was a nice validation to have. Plus the possibility to be able to have blocker to be used only for analysis. But you are right, this change is now unrelated and should not be on this CL, so I removed it. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:119: num_render_channels_, chunk_length_ + initial_delay))); On 2016/09/15 15:06:20, peah-webrtc wrote: > It is a bit wasteful to throw up 2 AudioRingBuffers just for the purpose of > delaying the signal. The overhead of that is that chunk_length_-1 more samples > for each channel than needed are stored for each buffer. Good point. Implemented DelayBuffer helper class. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:385: RTC_DCHECK_EQ(audio->num_bands(), high_bands_buffers_.size() + 1u); On 2016/09/15 15:06:20, peah-webrtc wrote: > Suggestion: Since in this case high_bands_buffers_.size is 1 or two, I think it > would be faster to not set up a for-loop but instead just make two > if-statements. That is not scalable if we ever decide to add support for any other number of bands and the gain in performance is negligible. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:454: const size_t kTestFragmentSize = rtc::CheckedDivExact(kTestSampleRate, 100); On 2016/09/15 15:06:20, peah-webrtc wrote: > In the IE you call this chunksize. Another common name is frame. But I've never > heard fragment before. Is this something new? Please use a common naming scheme > for this. I agree that this should probably be named chunk (we are using frame for one sample of each channel), but I want to be consistent with the rest of the test. We can rename all the instances of fragment to chunk in another CL. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:455: const size_t kTestSplitSize = rtc::CheckedDivExact(kTestSplitRate, 100); On 2016/09/15 15:06:20, peah-webrtc wrote: > Here as well, please use a more descriptive name, like split_chunk_size or > something else. Agreed that it makes sense, to specify it is a chunk size, so added "Fragment" to it. We can rename all the instances of fragment to chunk in another CL. https://codereview.webrtc.org/2320833002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:459: std::vector<float> tmp_buf(kTestFragmentSize); On 2016/09/15 15:06:20, peah-webrtc wrote: > Please give this a more proper name as well. It is used for one purpose only, so > it should be possible to rename it to a descriptive name. It is only a temporary buffer to generate a signal and then copy it into the original and audio buffer, so that is why I named it tmp_buf. Now I renamed it to rand_gen_buf, but I am happy to rename it to a better suggestion you might have.
https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:206: const int kSampleRate = 4000; On 2016/09/15 23:45:25, aluebs-webrtc wrote: > On 2016/09/15 15:06:20, peah-webrtc wrote: > > On 2016/09/14 00:35:54, aluebs-webrtc wrote: > > > On 2016/09/13 13:30:00, peah-webrtc wrote: > > > > What is the purpose of the samplerate constant of 4000? > > > > > > To set a sample rate for the tests. > > > > You mean that the test is running at a sample rate of 4 kHz? Why not use one > of > > the native rates instead? 4 kHz is never used in practice when using the IE. > > Other test (like the bitexactness and the one I just added) use native rates. I > don't see anything wrong with also using different sample rates than used in > practice. Plus it means having less samples until convergence, making the test > quicker. > But anyway, if we would decide to change it, this is not the CL for doing that. > It is unrelated. It makes sense to test for other sample rates. But in this case the IE is tested for a case which it never (being located beneath APM) will operate in. I'd claim that IE cannot support arbitrary sample rates. For instance, one requirement is that the sample rate produces 10 ms frames of an even number of samples. Same goes for the VAD. 4 kHz fulfills that requirement though. The problem that I see with testing for sample rates that will never be used in practice is that if the code is changed in a way that does not support that sample rate, the test will need to be changed in vain. Furthermore, the added test coverage by using a different sample rate is not useful as it tests the code in a way that it will not be used. But you are correct that it is not related to this CL. https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc (right): https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:75: for (size_t i = 0u; i < length; ++i) { I think this implementation is more expensive than needed. 1) Since buffer.size() is 1 or 2, and each channel is stored in a linear manner, it is better to flip the nesting of the loop and have the inner loop looping for the samples. 2) When doing it in a loop with many iterations, the % operator is slower than using an if statement. 3) I know that the code is not written like that and that it handles an arbitrary delay, but that is actually not really necessary, and it has a potential impact on the complexity. It is a bit unclear to me but I think you are using 256 point ffts, right? That should mean that the delay should be fixed to 174 samples in 8 kHz and 94 samples otherwise, right? That should mean that the delay is fixed for all sample rates above 8 kHz, this can be realize using memcpy's which are significantly faster. https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h (right): https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h:72: dd https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h:74: dd
https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc (right): https://codereview.webrtc.org/2320833002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer_unittest.cc:206: const int kSampleRate = 4000; On 2016/09/16 13:35:56, peah-webrtc wrote: > On 2016/09/15 23:45:25, aluebs-webrtc wrote: > > On 2016/09/15 15:06:20, peah-webrtc wrote: > > > On 2016/09/14 00:35:54, aluebs-webrtc wrote: > > > > On 2016/09/13 13:30:00, peah-webrtc wrote: > > > > > What is the purpose of the samplerate constant of 4000? > > > > > > > > To set a sample rate for the tests. > > > > > > You mean that the test is running at a sample rate of 4 kHz? Why not use one > > of > > > the native rates instead? 4 kHz is never used in practice when using the IE. > > > > Other test (like the bitexactness and the one I just added) use native rates. > I > > don't see anything wrong with also using different sample rates than used in > > practice. Plus it means having less samples until convergence, making the test > > quicker. > > But anyway, if we would decide to change it, this is not the CL for doing > that. > > It is unrelated. > > It makes sense to test for other sample rates. But in this case the IE is tested > for a case which it never (being located beneath APM) will operate in. > > I'd claim that IE cannot support arbitrary sample rates. For instance, one > requirement is that the sample rate produces 10 ms frames of an even number of > samples. Same goes for the VAD. 4 kHz fulfills that requirement though. > > The problem that I see with testing for sample rates that will never be used in > practice is that if the code is changed in a way that does not support that > sample rate, the test will need to be changed in vain. Furthermore, the added > test coverage by using a different sample rate is not useful as it tests the > code in a way that it will not be used. > > But you are correct that it is not related to this CL. > > > Acknowledged. https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc (right): https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:75: for (size_t i = 0u; i < length; ++i) { On 2016/09/16 13:35:56, peah-webrtc wrote: > I think this implementation is more expensive than needed. > 1) Since buffer.size() is 1 or 2, and each channel is stored in a linear manner, > it is better to flip the nesting of the loop and have the inner loop looping for > the samples. > 2) When doing it in a loop with many iterations, the % operator is slower than > using an if statement. > 3) I know that the code is not written like that and that it handles an > arbitrary delay, but that is actually not really necessary, and it has a > potential impact on the complexity. It is a bit unclear to me but I think you > are using 256 point ffts, right? That should mean that the delay should be fixed > to 174 samples in 8 kHz and 94 samples otherwise, right? > That should mean that the delay is fixed for all sample rates above 8 kHz, this > can be realize using memcpy's which are significantly faster. 1) This makes the code less readable, because it requires an intermediate channel_index variable, but done. 2) This makes the code less readable, since a simple modulo operation requires an if-statement, but done. 3) You are right that only fixed delays are possible in today's setup, but the FFT depends on the sample rate, so the delay is 112 for 8kHz and 224 for the rest. Anyway, I don't see why it is not possible to use memcpy for arbitrary delays, it is done by the AudioRingBuffer I was using originally. But this would require double the memory, which was your argument for building this DelayBuffer class in the first place. So I am leaving it as is. https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h (right): https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h:72: On 2016/09/16 13:35:56, peah-webrtc wrote: > dd Acknowledged. https://google.github.io/styleguide/cppguide.html#Vertical_Whitespace https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h:74: On 2016/09/16 13:35:56, peah-webrtc wrote: > dd Acknowledged. https://google.github.io/styleguide/cppguide.html#Vertical_Whitespace
Great! lgtm with some nits. https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc (right): https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:75: for (size_t i = 0u; i < length; ++i) { On 2016/09/17 00:48:48, aluebs-webrtc wrote: > On 2016/09/16 13:35:56, peah-webrtc wrote: > > I think this implementation is more expensive than needed. > > 1) Since buffer.size() is 1 or 2, and each channel is stored in a linear > manner, > > it is better to flip the nesting of the loop and have the inner loop looping > for > > the samples. > > 2) When doing it in a loop with many iterations, the % operator is slower than > > using an if statement. > > 3) I know that the code is not written like that and that it handles an > > arbitrary delay, but that is actually not really necessary, and it has a > > potential impact on the complexity. It is a bit unclear to me but I think you > > are using 256 point ffts, right? That should mean that the delay should be > fixed > > to 174 samples in 8 kHz and 94 samples otherwise, right? > > That should mean that the delay is fixed for all sample rates above 8 kHz, > this > > can be realize using memcpy's which are significantly faster. > > 1) This makes the code less readable, because it requires an intermediate > channel_index variable, but done. > 2) This makes the code less readable, since a simple modulo operation requires > an if-statement, but done. > 3) You are right that only fixed delays are possible in today's setup, but the > FFT depends on the sample rate, so the delay is 112 for 8kHz and 224 for the > rest. Anyway, I don't see why it is not possible to use memcpy for arbitrary > delays, it is done by the AudioRingBuffer I was using originally. But this would > require double the memory, which was your argument for building this DelayBuffer > class in the first place. So I am leaving it as is. It may be less readable, but definitely faster. You are right about the memcpys, they could always be used. But I don't think the doubling of memory is needed though. But I'm fine with this implementation though. https://codereview.webrtc.org/2320833002/diff/100001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc (right): https://codereview.webrtc.org/2320833002/diff/100001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:76: size_t channel_index = read_index_; Please consider changing this name. To me, a channel index would be the index of the channel rather than the sample index in the channel which is what it is here. What about next_read_index, or sample_index? https://codereview.webrtc.org/2320833002/diff/100001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:87: while (read_index_ >= buffer_.size()) { This part is not necessary. You have the wraparound for free if you instead do like below: size_t channel_index = 0; for (size_t i = 0u; i < buffer_.size(); ++i) { channel_index = read_index_; for (size_t j = 0u; j < length; ++j) { ... } } read_index_ = channel_index;
The CQ bit was checked by aluebs@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from peah@webrtc.org Link to the patchset: https://codereview.webrtc.org/2320833002/#ps120001 (title: "Only wrap once")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc (right): https://codereview.webrtc.org/2320833002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:75: for (size_t i = 0u; i < length; ++i) { On 2016/09/17 21:31:29, peah-webrtc wrote: > On 2016/09/17 00:48:48, aluebs-webrtc wrote: > > On 2016/09/16 13:35:56, peah-webrtc wrote: > > > I think this implementation is more expensive than needed. > > > 1) Since buffer.size() is 1 or 2, and each channel is stored in a linear > > manner, > > > it is better to flip the nesting of the loop and have the inner loop looping > > for > > > the samples. > > > 2) When doing it in a loop with many iterations, the % operator is slower > than > > > using an if statement. > > > 3) I know that the code is not written like that and that it handles an > > > arbitrary delay, but that is actually not really necessary, and it has a > > > potential impact on the complexity. It is a bit unclear to me but I think > you > > > are using 256 point ffts, right? That should mean that the delay should be > > fixed > > > to 174 samples in 8 kHz and 94 samples otherwise, right? > > > That should mean that the delay is fixed for all sample rates above 8 kHz, > > this > > > can be realize using memcpy's which are significantly faster. > > > > 1) This makes the code less readable, because it requires an intermediate > > channel_index variable, but done. > > 2) This makes the code less readable, since a simple modulo operation requires > > an if-statement, but done. > > 3) You are right that only fixed delays are possible in today's setup, but the > > FFT depends on the sample rate, so the delay is 112 for 8kHz and 224 for the > > rest. Anyway, I don't see why it is not possible to use memcpy for arbitrary > > delays, it is done by the AudioRingBuffer I was using originally. But this > would > > require double the memory, which was your argument for building this > DelayBuffer > > class in the first place. So I am leaving it as is. > > It may be less readable, but definitely faster. > > You are right about the memcpys, they could always be used. But I don't think > the doubling of memory is needed though. But I'm fine with this implementation > though. There is a tradeoff between memory footprint and speed. One extreme is to only use one float to swap each at a time (what is implemented). The other is to double the memory and ensure everything is memcpy-ed at once. And then there is anything in between, using certain memory and doing the necessary amount of memcpys. https://codereview.webrtc.org/2320833002/diff/100001/webrtc/modules/audio_pro... File webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc (right): https://codereview.webrtc.org/2320833002/diff/100001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:76: size_t channel_index = read_index_; On 2016/09/17 21:31:30, peah-webrtc wrote: > Please consider changing this name. > To me, a channel index would be the index of the channel rather than the sample > index in the channel which is what it is here. > What about next_read_index, or sample_index? Good point. Done. https://codereview.webrtc.org/2320833002/diff/100001/webrtc/modules/audio_pro... webrtc/modules/audio_processing/intelligibility/intelligibility_utils.cc:87: while (read_index_ >= buffer_.size()) { On 2016/09/17 21:31:30, peah-webrtc wrote: > This part is not necessary. You have the wraparound for free if you instead do > like below: > size_t channel_index = 0; > for (size_t i = 0u; i < buffer_.size(); ++i) { > channel_index = read_index_; > for (size_t j = 0u; j < length; ++j) { > ... > } > } > read_index_ = channel_index; Good point. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_compile_x86_dbg on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/android_compile_x86_dbg...) linux_baremetal on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_baremetal/builds/...) linux_ubsan on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_ubsan/builds/5184) mac_asan on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/mac_asan/builds/17768) mac_baremetal on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/mac_baremetal/builds/14475) mac_gyp_dbg on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/mac_gyp_dbg/builds/753) mac_gyp_rel on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/mac_gyp_rel/builds/756) mac_rel on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/mac_rel/builds/18388)
The CQ bit was checked by aluebs@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from peah@webrtc.org Link to the patchset: https://codereview.webrtc.org/2320833002/#ps140001 (title: "Rebasing")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: presubmit on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/presubmit/builds/8491)
Description was changed from ========== Compensate for the IntelligibilityEnhancer processing delay in high bands Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this. ========== to ========== Compensate for the IntelligibilityEnhancer processing delay in high bands Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this. BUG=b/30780909 ==========
The CQ bit was checked by aluebs@webrtc.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: presubmit on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/presubmit/builds/8492)
henrik.lundin, could you please have a look at webrtc/common_audio, since I need OWNERs approval?
RS-LGTM with one nit. https://codereview.webrtc.org/2320833002/diff/140001/webrtc/common_audio/lapp... File webrtc/common_audio/lapped_transform.h (right): https://codereview.webrtc.org/2320833002/diff/140001/webrtc/common_audio/lapp... webrtc/common_audio/lapped_transform.h:89: // Get the initial delay. Get -> Returns
https://codereview.webrtc.org/2320833002/diff/140001/webrtc/common_audio/lapp... File webrtc/common_audio/lapped_transform.h (right): https://codereview.webrtc.org/2320833002/diff/140001/webrtc/common_audio/lapp... webrtc/common_audio/lapped_transform.h:89: // Get the initial delay. On 2016/09/20 09:50:32, hlundin-webrtc wrote: > Get -> Returns I used "Get" to be consistent with the other getters in this file, but I agree that "Returns" is a better wording. Changed.
The CQ bit was checked by aluebs@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from henrik.lundin@webrtc.org, peah@webrtc.org Link to the patchset: https://codereview.webrtc.org/2320833002/#ps160001 (title: "Improve comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_arm64_rel on master.tryserver.webrtc (JOB_TIMED_OUT, no build URL) android_dbg on master.tryserver.webrtc (JOB_TIMED_OUT, no build URL) android_rel on master.tryserver.webrtc (JOB_TIMED_OUT, no build URL)
Description was changed from ========== Compensate for the IntelligibilityEnhancer processing delay in high bands Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this. BUG=b/30780909 ========== to ========== Compensate for the IntelligibilityEnhancer processing delay in high bands Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this. BUG=b/30780909 R=henrik.lundin@webrtc.org, peah@webrtc.org Committed: https://chromium.googlesource.com/external/webrtc/+/ef00925cd0d41afbc8b759970... ==========
Message was sent while issue was closed.
Description was changed from ========== Compensate for the IntelligibilityEnhancer processing delay in high bands Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this. BUG=b/30780909 R=henrik.lundin@webrtc.org, peah@webrtc.org Committed: https://chromium.googlesource.com/external/webrtc/+/ef00925cd0d41afbc8b759970... ========== to ========== Compensate for the IntelligibilityEnhancer processing delay in high bands Before this CL, the IntelligibilityEnhancer introduced a processing delay to the lower band, without compensating for it in the higher bands. This CL corrects this. BUG=b/30780909 R=henrik.lundin@webrtc.org, peah@webrtc.org Committed: https://crrev.com/ef00925cd0d41afbc8b7599704ddbf399ad82950 Cr-Commit-Position: refs/heads/master@{#14311} ==========
Message was sent while issue was closed.
Committed patchset #8 (id:160001) manually as ef00925cd0d41afbc8b7599704ddbf399ad82950 (presubmit successful). |