Created: 4 years, 7 months ago by aluebs-webrtc
Modified: 4 years, 2 months ago
CC: webrtc-reviews_webrtc.org, Andrew MacDonald, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, kwiberg-webrtc, minyue-webrtc, the sun, bjornv1
Base URL: https://chromium.googlesource.com/external/webrtc.git@master
Target Ref: refs/pending/heads/master
Project: webrtc
Visibility: Public
Description:
Pull out the PostFilter to its own NonlinearBeamformer API
This is done to avoid having a nonlinear component in the AEC path.
Now the linear delay-and-sum is run before the AEC and the postfilter after it.
R=henrik.lundin@webrtc.org, peah@webrtc.org
Committed: https://crrev.com/b983112bc7686ed4276def4c9215c498444c6bdd
Cr-Commit-Position: refs/heads/master@{#13314}
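For orientation, here is a minimal sketch of the call pattern the description implies. The beamformer and capture-audio names are taken from the diff excerpts quoted in the messages below; the surrounding APM logic is assumed and not part of this CL.

    // Sketch (assumed context): linear pass before the AEC, postfilter after.
    // ProcessChunk() runs the linear step (initially the delay-and-sum; later
    // patch sets make it analysis-only).
    private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(),
                                                  ca->split_data_f());
    // Discard all channels but the leftmost one.
    ca->set_num_channels(1);
    // ... the AEC and other linear processing run here ...
    // PostFilter() applies the nonlinear mask after the AEC.
    private_submodules_->beamformer->PostFilter(ca->split_data_f());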
Patch Set 1 #
Total comments: 47
Patch Set 2 : Drop delay and sum #
Total comments: 20
Patch Set 3 : Add dummy out #
Total comments: 6
Patch Set 4 : Change Beamformer::ProcessChunk() to have one parameter #
Total comments: 20
Patch Set 5 : Drop Beamformer Interface #
Total comments: 20
Patch Set 6 : Rebasing #
Patch Set 7 : Add multichannel postfilter support #
Patch Set 8 : Rebasing #
Patch Set 9 : Remove ChannelBuffer fix #
Total comments: 6
Patch Set 10 : Rebasing #
Patch Set 11 : Fix comments #
Patch Set 12 : Make NonlinearBeamformer interface virtual #

Messages
Total messages: 43 (12 generated)
aluebs@webrtc.org changed reviewers: + peah@webrtc.org
Friendly ping. peah, could you PTAL.
On 2016/05/18 15:09:00, aluebs-webrtc wrote:
> Friendly ping. peah, could you PTAL.

peah, could you please take a look?
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
Why not change this to have one input/output parameter, as that is how it is used? That would mean less code and would also save one memcopy.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
Is there a reason why this is not implemented in the .cc file?

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:183: // static
Please correct this comment as well while you are at it.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:186: class PostFilterTransform : public LappedTransform::Callback {
This class is a way to be able to use the callback functionality of the LappedTransform for two callbacks in the same class, right?

It makes the code more complicated than it needs to be, I think. Why not skip the friend class approach, and instead create a new class that implements the Callback interface? Then you could call that to apply the postfilter effect in such a way that the kCompensationGain and final_mask_ variables are passed as input to that call. That would achieve a separation of concerns, data and functionality encapsulation, and would avoid the need for the friend class approach.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:192: // Process one frequency-domain block of audio. This is where the fun
Please describe this more thoroughly. I'm not sure the comment is needed, but if there is to be a comment it should be more descriptive, e.g., what kind of processing is done, how does the callback work, etc.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:248: float freq_hz = (static_cast<float>(i) / kFftSize) * sample_rate_hz_;
Please change to use a precomputed 1/kFftSize * sample_rate_hz_. This approach results in kNumFreqBins unnecessary divisions.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:249: wave_numbers_[i] = 2 * M_PI * freq_hz / kSpeedOfSoundMeterSeconds;
Please change to use a precomputed 2*pi/kSpeedOfSoundMeterSeconds. This approach results in kNumFreqBins unnecessary divisions.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:403: // Copy over only the first channel of each band.
Have you checked the impact on the signal when this is only done in the lowest band? In theory the perfect reconstruction property of the signal could be destroyed through this, causing aliasing distortions.
Furthermore, this could in theory cause a sharp increase in the signal power at the first band-split frequency.

Why don't you perform the linear beamforming in the time-domain instead, as you anyway do not pre-steer the signals before applying that step (right?)? That would have the benefits:
1) Be cheaper, as you don't need to do the IFFT to go back to the time-domain.
2) Have lower delay, as the filter-bank delay is avoided.
3) Have zero impact on the perfect-reconstruction property of the splitting filter.
4) Have no edge effect at the first band-split frequency.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:407: memcpy(output->channels(i)[0],
What happens if the output is dual channel? Since the ProcessChunk method locks the content of the output to be single-channel, it should either assert that the number of output channels already is one, or explicitly set the number of channels to 1 in the output.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:419: // Ramp up/down for smoothing. 1 mask per 10ms results in audible
I guess what you mean is that smoothing is needed in order to avoid discontinuities in the transitions between 10 ms frames, right? I.e., the problem is not 1 mask per 10 ms, but rather that the transition between two masks cannot be done instantaneously.

If the above holds, this comment is a bit misleading.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:59: // Applies the postfilter mask to one chunk of audio. The audio is expected to
I think that this description is longer than required. The description of the band splitting should not be needed here. It is rather a description of ChannelBuffer, and if it needs to be clarified further it would be better to do it in a unittest which illustrates the usage of the method. I think the requirements on the frame size and the number of channels are better illustrated by a DCHECK inside the code.
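The two precomputation comments above amount to hoisting the constant factors out of the initialization loop. A minimal sketch, assuming the member names from the quoted lines (the local constant names are hypothetical):

    // Both factors are loop-invariant, so compute them once instead of
    // performing kNumFreqBins divisions.
    const float hz_per_bin = sample_rate_hz_ / static_cast<float>(kFftSize);
    const float two_pi_over_c =
        2.f * static_cast<float>(M_PI) / kSpeedOfSoundMeterSeconds;
    for (size_t i = 0; i < kNumFreqBins; ++i) {
      const float freq_hz = i * hz_per_bin;
      wave_numbers_[i] = freq_hz * two_pi_over_c;  // k = 2*pi*f / c.
    }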
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Why not change this to have one input/output parameter, as that is how it is used? That would mean less code and would also save one memcopy.

To be consistent with the ProcessChunk interface and give the user the freedom to *not* overwrite the input. But I see your point, so I changed it.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Is there a reason why this is not implemented in the .cc file?

Beamformer is an interface; it has no .cc file.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:183: // static
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Please correct this comment as well while you are at it.

What is wrong with it?

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:186: class PostFilterTransform : public LappedTransform::Callback {
On 2016/05/22 21:06:48, peah-webrtc wrote:
> This class is a way to be able to use the callback functionality of the LappedTransform for two callbacks in the same class, right?
>
> It makes the code more complicated than it needs to be, I think. Why not skip the friend class approach, and instead create a new class that implements the Callback interface? Then you could call that to apply the postfilter effect in such a way that the kCompensationGain and final_mask_ variables are passed as input to that call. That would achieve a separation of concerns, data and functionality encapsulation, and would avoid the need for the friend class approach.

I don't see how this simplifies the code, but I agree that it avoids the friend class, which is not desired. Anyway, I implemented it.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:192: // Process one frequency-domain block of audio. This is where the fun
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Please describe this more thoroughly. I'm not sure the comment is needed, but if there is to be a comment it should be more descriptive, e.g., what kind of processing is done, how does the callback work, etc.

Removed comment. It was just to be consistent with the other callback.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:248: float freq_hz = (static_cast<float>(i) / kFftSize) * sample_rate_hz_;
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Please change to use a precomputed 1/kFftSize * sample_rate_hz_. This approach results in kNumFreqBins unnecessary divisions.

This adds unrelated changes to the CL, but if you think that is ok: Done.
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:249: wave_numbers_[i] = 2 * M_PI * freq_hz / kSpeedOfSoundMeterSeconds;
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Please change to use a precomputed 2*pi/kSpeedOfSoundMeterSeconds. This approach results in kNumFreqBins unnecessary divisions.

This adds unrelated changes to the CL, but if you think that is ok: Done.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:403: // Copy over only the first channel of each band.
On 2016/05/22 21:06:48, peah-webrtc wrote:
> Have you checked the impact on the signal when this is only done in the lowest band? In theory the perfect reconstruction property of the signal could be destroyed through this, causing aliasing distortions. Furthermore, this could in theory cause a sharp increase in the signal power at the first band-split frequency.
>
> Why don't you perform the linear beamforming in the time-domain instead, as you anyway do not pre-steer the signals before applying that step (right?)? That would have the benefits:
> 1) Be cheaper, as you don't need to do the IFFT to go back to the time-domain.
> 2) Have lower delay, as the filter-bank delay is avoided.
> 3) Have zero impact on the perfect-reconstruction property of the splitting filter.
> 4) Have no edge effect at the first band-split frequency.

This is no longer relevant, since we decided offline to not do any linear beamforming because it is negligible compared to the postfilter. But a few points just to be clear:
* The perfect reconstruction of the filter bank is lost with any processing done.
* The worst case 3dB increase of power in the higher bands is probably not perceivable. And I am not sure what the problem with a "sharp" edge here is.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:407: memcpy(output->channels(i)[0],
On 2016/05/22 21:06:48, peah-webrtc wrote:
> What happens if the output is dual channel? Since the ProcessChunk method locks the content of the output to be single-channel, it should either assert that the number of output channels already is one, or explicitly set the number of channels to 1 in the output.

This can't be done as is, since the input and output can be the same ChannelBuffer and the ChannelBuffer has a fixed number of channels.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:419: // Ramp up/down for smoothing. 1 mask per 10ms results in audible
On 2016/05/22 21:06:48, peah-webrtc wrote:
> I guess what you mean is that smoothing is needed in order to avoid discontinuities in the transitions between 10 ms frames, right? I.e., the problem is not 1 mask per 10 ms, but rather that the transition between two masks cannot be done instantaneously.
>
> If the above holds, this comment is a bit misleading.

This adds unrelated changes to the CL, but if you think that is ok. I find the comment easy to understand, having "1 [constant] mask per 10ms results in audible discontinuities [in the transitions]". But if you find it confusing, I am happy to change it. WDYT of the current one?

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:59: // Applies the postfilter mask to one chunk of audio. The audio is expected to
On 2016/05/22 21:06:48, peah-webrtc wrote:
> I think that this description is longer than required.
>
> The description of the band splitting should not be needed here. It is rather a description of ChannelBuffer, and if it needs to be clarified further it would be better to do it in a unittest which illustrates the usage of the method.
>
> I think the requirements on the frame size and the number of channels are better illustrated by a DCHECK inside the code.

I personally think adding more documentation than required is not a problem. It is also consistent (band splitting and requirements on input) with the ProcessChunk documentation. There is a DCHECK in place as well.
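To summarize where the interface discussion stands at this point, here is a hedged sketch of the Beamformer interface after these changes. The single in/out PostFilter parameter and the default implementations are confirmed by later excerpts in this thread; the exact surrounding declarations are assumptions.

    // Sketch, not the verbatim header.
    class Beamformer {
     public:
      virtual ~Beamformer() {}
      // May modify the signal (e.g. a linear beamformer would).
      virtual void ProcessChunk(const ChannelBuffer<float>& input,
                                ChannelBuffer<float>* output) = 0;
      // Default no-op, since not every beamformer has a postfilter.
      virtual void PostFilter(ChannelBuffer<float>* data) {}
      // Also ships with a default implementation, as noted below.
      virtual bool IsInBeam(const SphericalPointf& spherical_point) {
        return true;
      }
    };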
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > Is there a reason why this is not implemented in the .cc file?
>
> Beamformer is an interface; it has no .cc file.

My skills on this are limited, but to me, an interface is something that the user of the interface needs to implement. If it has a default implementation, it is not a pure interface. Is there a reason why this method cannot be declared pure virtual, so that this is a proper interface that forces the implementer of the interface to supply an implementation?

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:183: // static
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > Please correct this comment as well while you are at it.
>
> What is wrong with it?

It is not a proper sentence, and not terminated by a period. But maybe this is a standard way of commenting static definitions?

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:186: class PostFilterTransform : public LappedTransform::Callback {
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > This class is a way to be able to use the callback functionality of the LappedTransform for two callbacks in the same class, right?
> >
> > It makes the code more complicated than it needs to be, I think. Why not skip the friend class approach, and instead create a new class that implements the Callback interface? Then you could call that to apply the postfilter effect in such a way that the kCompensationGain and final_mask_ variables are passed as input to that call. That would achieve a separation of concerns, data and functionality encapsulation, and would avoid the need for the friend class approach.
>
> I don't see how this simplifies the code, but I agree that it avoids the friend class, which is not desired. Anyway, I implemented it.

The simplification is separation of concerns and data and functionality encapsulation, as well as avoiding the use of an interface.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:403: // Copy over only the first channel of each band.
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > Have you checked the impact on the signal when this is only done in the lowest band? In theory the perfect reconstruction property of the signal could be destroyed through this, causing aliasing distortions. Furthermore, this could in theory cause a sharp increase in the signal power at the first band-split frequency.
> >
> > Why don't you perform the linear beamforming in the time-domain instead, as you anyway do not pre-steer the signals before applying that step (right?)? That would have the benefits:
> > 1) Be cheaper, as you don't need to do the IFFT to go back to the time-domain.
> > 2) Have lower delay, as the filter-bank delay is avoided.
> > 3) Have zero impact on the perfect-reconstruction property of the splitting filter.
> > 4) Have no edge effect at the first band-split frequency.
>
> This is no longer relevant, since we decided offline to not do any linear beamforming because it is negligible compared to the postfilter. But a few points just to be clear:
> * The perfect reconstruction of the filter bank is lost with any processing done.
> * The worst case 3dB increase of power in the higher bands is probably not perceivable. And I am not sure what the problem with a "sharp" edge here is.

I think the perfect reconstruction may actually be preserved if you average two PR filterbanks in the time-domain (which the linear beamforming is identical to if no pre-steering is performed).

A 3 dB "edge" is perceivable, but I also doubt it will matter in general for most noise types (at least the ones with tilted spectral shapes). The sharp edge is dependent on the sharpness of the splitting filter, as the subsequent merge will merge two bands where the noise levels differ by 3 dB after the beamforming.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:59: // Applies the postfilter mask to one chunk of audio. The audio is expected to
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > I think that this description is longer than required.
> >
> > The description of the band splitting should not be needed here. It is rather a description of ChannelBuffer, and if it needs to be clarified further it would be better to do it in a unittest which illustrates the usage of the method.
> >
> > I think the requirements on the frame size and the number of channels are better illustrated by a DCHECK inside the code.
>
> I personally think adding more documentation than required is not a problem. It is also consistent (band splitting and requirements on input) with the ProcessChunk documentation. There is a DCHECK in place as well.

Great!

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/audio_processing_impl.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(),
If the ProcessChunk is made input-only, you should be able to pass ca->split_bands_const_f here instead, right?

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/audio_processing_impl.cc:685: ca->set_num_channels(1);
Please add a comment that this operation discards all channels but the leftmost one.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/beamformer.h:36: virtual void PostFilter(ChannelBuffer<float>* data) {};
I'm not that familiar with interfaces, but why can't you make this method pure virtual, similar to the other interface methods?

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:383: ChannelBuffer<float>* output) {
Since the linear beamforming is no longer done, output could be removed, right? Alternatively, if you want to maintain this method for when the linear beamforming is to be used, I think it would make sense to add another method which is input-only.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:392: for (size_t i = 0; i < input.num_bands(); ++i) {
If the output is removed from ProcessChunk, these memcopies can be removed.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:484: memcpy(output[0], input[0], kNumFreqBins * sizeof(input[0][0]));
This memcopy is not necessary; why not instead discard the output parameter from this method? The memcopy done here is actually reversed in the calling method by another num_bands memcopies that copy the output back to the input, so discarding the output parameter would save (1+num_bands) memcopies. Furthermore, it would avoid the delay of the filterbank transform that is imposed by these two memcopies (and that delay is > 7ms, right)?
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
On 2016/05/26 08:48:52, peah-webrtc wrote:
> On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > Is there a reason why this is not implemented in the .cc file?
> >
> > Beamformer is an interface; it has no .cc file.
>
> My skills on this are limited, but to me, an interface is something that the user of the interface needs to implement. If it has a default implementation, it is not a pure interface. Is there a reason why this method cannot be declared pure virtual, so that this is a proper interface that forces the implementer of the interface to supply an implementation?

It already is not a pure interface, because IsInBeam() has a default implementation. This has a default empty implementation because not all Beamformers have postfilters.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:183: // static
On 2016/05/26 08:48:52, peah-webrtc wrote:
> On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > Please correct this comment as well while you are at it.
> >
> > What is wrong with it?
>
> It is not a proper sentence, and not terminated by a period. But maybe this is a standard way of commenting static definitions?

I think the static definition is clearer like this than in a sentence. Also, I haven't touched this in this CL, so it would add one more unrelated change. So I will leave it as is.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:186: class PostFilterTransform : public LappedTransform::Callback {
On 2016/05/26 08:48:52, peah-webrtc wrote:
> On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > This class is a way to be able to use the callback functionality of the LappedTransform for two callbacks in the same class, right?
> > >
> > > It makes the code more complicated than it needs to be, I think. Why not skip the friend class approach, and instead create a new class that implements the Callback interface? Then you could call that to apply the postfilter effect in such a way that the kCompensationGain and final_mask_ variables are passed as input to that call. That would achieve a separation of concerns, data and functionality encapsulation, and would avoid the need for the friend class approach.
> >
> > I don't see how this simplifies the code, but I agree that it avoids the friend class, which is not desired. Anyway, I implemented it.
>
> The simplification is separation of concerns and data and functionality encapsulation, as well as avoiding the use of an interface.

I think the separation/encapsulation is almost the same, but the call order is more complicated since you have to pass in the needed arguments.
But I see that having friend classes is not desirable, so I implemented it already.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:403: // Copy over only the first channel of each band.
On 2016/05/26 08:48:52, peah-webrtc wrote:
> On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > Have you checked the impact on the signal when this is only done in the lowest band? In theory the perfect reconstruction property of the signal could be destroyed through this, causing aliasing distortions. Furthermore, this could in theory cause a sharp increase in the signal power at the first band-split frequency.
> > >
> > > Why don't you perform the linear beamforming in the time-domain instead, as you anyway do not pre-steer the signals before applying that step (right?)? That would have the benefits:
> > > 1) Be cheaper, as you don't need to do the IFFT to go back to the time-domain.
> > > 2) Have lower delay, as the filter-bank delay is avoided.
> > > 3) Have zero impact on the perfect-reconstruction property of the splitting filter.
> > > 4) Have no edge effect at the first band-split frequency.
> >
> > This is no longer relevant, since we decided offline to not do any linear beamforming because it is negligible compared to the postfilter. But a few points just to be clear:
> > * The perfect reconstruction of the filter bank is lost with any processing done.
> > * The worst case 3dB increase of power in the higher bands is probably not perceivable. And I am not sure what the problem with a "sharp" edge here is.
>
> I think the perfect reconstruction may actually be preserved if you average two PR filterbanks in the time-domain (which the linear beamforming is identical to if no pre-steering is performed).
>
> A 3 dB "edge" is perceivable, but I also doubt it will matter in general for most noise types (at least the ones with tilted spectral shapes). The sharp edge is dependent on the sharpness of the splitting filter, as the subsequent merge will merge two bands where the noise levels differ by 3 dB after the beamforming.

I meant that other components are already non-linear, losing the perfect reconstruction. I don't think the 3dB (in the worst case) would be perceivable in the higher bands at all. In any case it would sound slightly highpass, but the sharpness is not something that would matter too much. Anyways, now I am only taking one channel, so this is no longer relevant.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/audio_processing_impl.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(),
On 2016/05/26 08:48:53, peah-webrtc wrote:
> If the ProcessChunk is made input-only, you should be able to pass ca->split_bands_const_f here instead, right?

Because other Beamformers might do some processing (even all of it) in ProcessChunk(), I think we should leave it as is.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/audio_processing_impl.cc:685: ca->set_num_channels(1);
On 2016/05/26 08:48:53, peah-webrtc wrote:
> Please add a comment that this operation discards all channels but the leftmost one.

Done.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/beamformer.h:36: virtual void PostFilter(ChannelBuffer<float>* data) {};
On 2016/05/26 08:48:53, peah-webrtc wrote:
> I'm not that familiar with interfaces, but why can't you make this method pure virtual, similar to the other interface methods?

Addressed in the other comment.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:383: ChannelBuffer<float>* output) {
On 2016/05/26 08:48:53, peah-webrtc wrote:
> Since the linear beamforming is no longer done, output could be removed, right? Alternatively, if you want to maintain this method for when the linear beamforming is to be used, I think it would make sense to add another method which is input-only.

Adding yet another interface increases code complexity. I think we want to leave it as is: one processing method and a non-linear postfilter method.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:392: for (size_t i = 0; i < input.num_bands(); ++i) {
On 2016/05/26 08:48:53, peah-webrtc wrote:
> If the output is removed from ProcessChunk, these memcopies can be removed.

See above for why I think the output should not be removed.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:484: memcpy(output[0], input[0], kNumFreqBins * sizeof(input[0][0]));
On 2016/05/26 08:48:53, peah-webrtc wrote:
> This memcopy is not necessary; why not instead discard the output parameter from this method? The memcopy done here is actually reversed in the calling method by another num_bands memcopies that copy the output back to the input, so discarding the output parameter would save (1+num_bands) memcopies. Furthermore, it would avoid the delay of the filterbank transform that is imposed by these two memcopies (and that delay is > 7ms, right)?

See above for why I don't think we should drop the output parameter. It was necessary when input and output were the same ChannelBuffer (like in APM), because otherwise this would have put zeros in the lower band. But now I added a dummy buffer to avoid this memcpy.
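A minimal sketch of the resulting ProcessChunk with the dummy buffer described above. The dummy_out name matches the patch-set-3 excerpt quoted later in the thread; the member spelling and the exact copy sizes are assumptions.

    void NonlinearBeamformer::ProcessChunk(const ChannelBuffer<float>& input,
                                           ChannelBuffer<float>* output) {
      // Run the mask-estimation transform on the lowest band; the synthesis
      // result goes to a scratch buffer so the lower band of |output| is not
      // zeroed when input and output alias the same ChannelBuffer.
      process_transform_->ProcessChunk(input.channels(0),
                                       dummy_out_->channels(0));
      // Copy over only the first channel of each band.
      for (size_t i = 0; i < input.num_bands(); ++i) {
        memcpy(output->channels(i)[0], input.channels(i)[0],
               input.num_frames_per_band() * sizeof(input.channels(i)[0][0]));
      }
    }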
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> On 2016/05/26 08:48:52, peah-webrtc wrote:
> > On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > > Is there a reason why this is not implemented in the .cc file?
> > >
> > > Beamformer is an interface; it has no .cc file.
> >
> > My skills on this are limited, but to me, an interface is something that the user of the interface needs to implement. If it has a default implementation, it is not a pure interface. Is there a reason why this method cannot be declared pure virtual, so that this is a proper interface that forces the implementer of the interface to supply an implementation?
>
> It already is not a pure interface, because IsInBeam() has a default implementation. This has a default empty implementation because not all Beamformers have postfilters.

Do you mean that there are more beamformers that use this interface?

Would it not be better, then, to let the actual beamformer implementing the interface explicitly provide an empty implementation, rather than have it pre-provided?

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:183: // static
On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> On 2016/05/26 08:48:52, peah-webrtc wrote:
> > On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > > Please correct this comment as well while you are at it.
> > >
> > > What is wrong with it?
> >
> > It is not a proper sentence, and not terminated by a period. But maybe this is a standard way of commenting static definitions?
>
> I think the static definition is clearer like this than in a sentence. Also, I haven't touched this in this CL, so it would add one more unrelated change. So I will leave it as is.

I think the guidelines should be applied regardless of that. But I won't insist any further on this.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:186: class PostFilterTransform : public LappedTransform::Callback {
On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> On 2016/05/26 08:48:52, peah-webrtc wrote:
> > On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > > This class is a way to be able to use the callback functionality of the LappedTransform for two callbacks in the same class, right?
> > > >
> > > > It makes the code more complicated than it needs to be, I think. Why not skip the friend class approach, and instead create a new class that implements the Callback interface? Then you could call that to apply the postfilter effect in such a way that the kCompensationGain and final_mask_ variables are passed as input to that call.
> > > > That would achieve a separation of concerns, data and functionality encapsulation, and would avoid the need for the friend class approach.
> > >
> > > I don't see how this simplifies the code, but I agree that it avoids the friend class, which is not desired. Anyway, I implemented it.
> >
> > The simplification is separation of concerns and data and functionality encapsulation, as well as avoiding the use of an interface.
>
> I think the separation/encapsulation is almost the same, but the call order is more complicated since you have to pass in the needed arguments. But I see that having friend classes is not desirable, so I implemented it already.

Acknowledged.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:192: // Process one frequency-domain block of audio. This is where the fun
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > Please describe this more thoroughly. I'm not sure the comment is needed, but if there is to be a comment it should be more descriptive, e.g., what kind of processing is done, how does the callback work, etc.
>
> Removed comment. It was just to be consistent with the other callback.

Acknowledged.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:248: float freq_hz = (static_cast<float>(i) / kFftSize) * sample_rate_hz_;
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > Please change to use a precomputed 1/kFftSize * sample_rate_hz_. This approach results in kNumFreqBins unnecessary divisions.
>
> This adds unrelated changes to the CL, but if you think that is ok: Done.

Acknowledged.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:249: wave_numbers_[i] = 2 * M_PI * freq_hz / kSpeedOfSoundMeterSeconds;
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > Please change to use a precomputed 2*pi/kSpeedOfSoundMeterSeconds. This approach results in kNumFreqBins unnecessary divisions.
>
> This adds unrelated changes to the CL, but if you think that is ok: Done.

Acknowledged.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:403: // Copy over only the first channel of each band.
On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> On 2016/05/26 08:48:52, peah-webrtc wrote:
> > On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > > Have you checked the impact on the signal when this is only done in the lowest band? In theory the perfect reconstruction property of the signal could be destroyed through this, causing aliasing distortions. Furthermore, this could in theory cause a sharp increase in the signal power at the first band-split frequency.
> > > >
> > > > Why don't you perform the linear beamforming in the time-domain instead, as you anyway do not pre-steer the signals before applying that step (right?)? That would have the benefits:
> > > > 1) Be cheaper, as you don't need to do the IFFT to go back to the time-domain.
> > > > 2) Have lower delay, as the filter-bank delay is avoided.
> > > > 3) Have zero impact on the perfect-reconstruction property of the splitting filter.
> > > > 4) Have no edge effect at the first band-split frequency.
> > >
> > > This is no longer relevant, since we decided offline to not do any linear beamforming because it is negligible compared to the postfilter. But a few points just to be clear:
> > > * The perfect reconstruction of the filter bank is lost with any processing done.
> > > * The worst case 3dB increase of power in the higher bands is probably not perceivable. And I am not sure what the problem with a "sharp" edge here is.
> >
> > I think the perfect reconstruction may actually be preserved if you average two PR filterbanks in the time-domain (which the linear beamforming is identical to if no pre-steering is performed).
> >
> > A 3 dB "edge" is perceivable, but I also doubt it will matter in general for most noise types (at least the ones with tilted spectral shapes). The sharp edge is dependent on the sharpness of the splitting filter, as the subsequent merge will merge two bands where the noise levels differ by 3 dB after the beamforming.
>
> I meant that other components are already non-linear, losing the perfect reconstruction. I don't think the 3dB (in the worst case) would be perceivable in the higher bands at all. In any case it would sound slightly highpass, but the sharpness is not something that would matter too much. Anyways, now I am only taking one channel, so this is no longer relevant.

Acknowledged.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:407: memcpy(output->channels(i)[0],
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > What happens if the output is dual channel? Since the ProcessChunk method locks the content of the output to be single-channel, it should either assert that the number of output channels already is one, or explicitly set the number of channels to 1 in the output.
>
> This can't be done as is, since the input and output can be the same ChannelBuffer and the ChannelBuffer has a fixed number of channels.

So this means that the input could have 2 channels, and then the output gets only 1 channel, but the meta-information (about the number of channels) in the output still does not reflect that, which means that the caller needs to know this and explicitly set the number of channels in the meta-information to 1! That sounds error-prone and more complicated than necessary.

It would be better, then, to replace this ProcessChunk call by two other calls: one similar to this one, but that does not allow in-place calls, and another that is explicitly in-place and takes only one ChannelBuffer pointer. That would allow the ProcessChunk method to change the number of channels.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:419: // Ramp up/down for smoothing. 1 mask per 10ms results in audible
On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> On 2016/05/22 21:06:48, peah-webrtc wrote:
> > I guess what you mean is that smoothing is needed in order to avoid discontinuities in the transitions between 10 ms frames, right? I.e., the problem is not 1 mask per 10 ms, but rather that the transition between two masks cannot be done instantaneously.
> >
> > If the above holds, this comment is a bit misleading.
>
> This adds unrelated changes to the CL, but if you think that is ok. I find the comment easy to understand, having "1 [constant] mask per 10ms results in audible discontinuities [in the transitions]". But if you find it confusing, I am happy to change it. WDYT of the current one?

Sounds awesome!

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/audio_processing_impl.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(),
On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> On 2016/05/26 08:48:53, peah-webrtc wrote:
> > If the ProcessChunk is made input-only, you should be able to pass ca->split_bands_const_f here instead, right?
>
> Because other Beamformers might do some processing (even all of it) in ProcessChunk(), I think we should leave it as is.

Are there other beamformer implementations, apart from the one reviewed in this CL, that could be called here?

I don't think it is a good idea to keep it like this, because it gives the impression that the beamformer is actually changing the signal, which it is not. Furthermore, it unnecessarily makes the APM sensitive to any bugs in the beamformer implementation that accidentally change the signal.

If there are other beamformer implementations that could actually change the signal, they should not be allowed to be used together with the AEC, so in that case it would make sense to place a DCHECK that the AEC is not activated if they are active.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:383: ChannelBuffer<float>* output) {
On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> On 2016/05/26 08:48:53, peah-webrtc wrote:
> > Since the linear beamforming is no longer done, output could be removed, right? Alternatively, if you want to maintain this method for when the linear beamforming is to be used, I think it would make sense to add another method which is input-only.
>
> Adding yet another interface increases code complexity. I think we want to leave it as is: one processing method and a non-linear postfilter method.

That depends, as there is currently no beamformer (?) that uses the output functionality of this method, right? And if there is, if another method is added to the interface, the memcopy done below could be removed, so I think the amount of code would possibly decrease with another interface.

Furthermore, there are clear benefits with another interface that could be argued to outweigh the potential drawback of code complexity for the beamformer:
1) it increases code readability for APM, as it is clear that this is an input-only functionality.
2) it increases code security for the APM, as adaptive processing of the signal prior to the AEC is explicitly prevented.
3) it reduces computational complexity, as the below memcopy is not needed and furthermore the APM can assume that the signal is not modified by the call to the beamformer.

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:389: process_transform_->ProcessChunk(input.channels(0), dummy_out->channels(0));
I think changing the interface would be much preferable to avoiding this by using a dummy output. Apart from avoiding changing/extending the interface, I cannot see the point of this.

And changing the interface to reflect how the beamformer actually performs seems like the natural way to address this issue, rather than to mock a beamformer behavior that is not there.

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:439: size_t num_output_channels,
With this implementation, the last two parameters are unused. Is there no other way to achieve this? Can the LappedTransform only be used in an input-output manner? With this solution, this is more complex than needed, since it actually does the synthesis unnecessarily on dummy data.

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:443: RTC_CHECK_EQ(1u, num_output_channels);
I guess this CHECK can be removed, right?
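For contrast, a sketch of the input-only split peah is arguing for here. The method name AnalyzeChunk is hypothetical; the CL keeps the single ProcessChunk instead.

    // Analysis only: estimates the postfilter mask and never writes output,
    // so the signal in the AEC path is untouched by construction.
    virtual void AnalyzeChunk(const ChannelBuffer<float>& data) = 0;
    // Applies the estimated mask in place, after the AEC has run.
    virtual void PostFilter(ChannelBuffer<float>* data) = 0;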
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input,
On 2016/05/30 11:49:25, peah-webrtc wrote:
> On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> > On 2016/05/26 08:48:52, peah-webrtc wrote:
> > > On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > > > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > > > Is there a reason why this is not implemented in the .cc file?
> > > >
> > > > Beamformer is an interface; it has no .cc file.
> > >
> > > My skills on this are limited, but to me, an interface is something that the user of the interface needs to implement. If it has a default implementation, it is not a pure interface. Is there a reason why this method cannot be declared pure virtual, so that this is a proper interface that forces the implementer of the interface to supply an implementation?
> >
> > It already is not a pure interface, because IsInBeam() has a default implementation. This has a default empty implementation because not all Beamformers have postfilters.
>
> Do you mean that there are more beamformers that use this interface?
>
> Would it not be better, then, to let the actual beamformer implementing the interface explicitly provide an empty implementation, rather than have it pre-provided?

At this point there is only the non-linear Beamformer in WebRTC, but there were plans of porting other types if we think they provide a benefit. I think having a default implementation is more intuitive and adds less code complexity. First, because it doesn't force all Beamformers, even the ones that have nothing to do with postfilters, to implement it. But also because, without it, every Beamformer would need its own empty implementation.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:183: // static
On 2016/05/30 11:49:25, peah-webrtc wrote:
> On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> > On 2016/05/26 08:48:52, peah-webrtc wrote:
> > > On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > > > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > > > Please correct this comment as well while you are at it.
> > > >
> > > > What is wrong with it?
> > >
> > > It is not a proper sentence, and not terminated by a period. But maybe this is a standard way of commenting static definitions?
> >
> > I think the static definition is clearer like this than in a sentence. Also, I haven't touched this in this CL, so it would add one more unrelated change. So I will leave it as is.
>
> I think the guidelines should be applied regardless of that. But I won't insist any further on this.

Acknowledged.

https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:407: memcpy(output->channels(i)[0],
On 2016/05/30 11:49:25, peah-webrtc wrote:
> On 2016/05/26 01:04:45, aluebs-webrtc wrote:
> > On 2016/05/22 21:06:48, peah-webrtc wrote:
> > > What happens if the output is dual channel?
> > > Since the ProcessChunk method locks the content of the output to be single-channel, it should either assert that the number of output channels already is one, or explicitly set the number of channels to 1 in the output.
> >
> > This can't be done as is, since the input and output can be the same ChannelBuffer and the ChannelBuffer has a fixed number of channels.
>
> So this means that the input could have 2 channels, and then the output gets only 1 channel, but the meta-information (about the number of channels) in the output still does not reflect that, which means that the caller needs to know this and explicitly set the number of channels in the meta-information to 1! That sounds error-prone and more complicated than necessary.
>
> It would be better, then, to replace this ProcessChunk call by two other calls: one similar to this one, but that does not allow in-place calls, and another that is explicitly in-place and takes only one ChannelBuffer pointer. That would allow the ProcessChunk method to change the number of channels.

I don't think that adding an additional interface is a good idea. It only increases code complexity. But by modifying the ChannelBuffer behavior, we can set the number of output channels to 1.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/audio_processing_impl.cc (right):

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(),
On 2016/05/30 11:49:25, peah-webrtc wrote:
> On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> > On 2016/05/26 08:48:53, peah-webrtc wrote:
> > > If the ProcessChunk is made input-only, you should be able to pass ca->split_bands_const_f here instead, right?
> >
> > Because other Beamformers might do some processing (even all of it) in ProcessChunk(), I think we should leave it as is.
>
> Are there other beamformer implementations, apart from the one reviewed in this CL, that could be called here?
>
> I don't think it is a good idea to keep it like this, because it gives the impression that the beamformer is actually changing the signal, which it is not. Furthermore, it unnecessarily makes the APM sensitive to any bugs in the beamformer implementation that accidentally change the signal.
>
> If there are other beamformer implementations that could actually change the signal, they should not be allowed to be used together with the AEC, so in that case it would make sense to place a DCHECK that the AEC is not activated if they are active.

There are no other beamformers in WebRTC at the moment, but there were plans to support multiple different ones if there is a use case for them. If going from stereo to mono is to modify the signal, it actually does modify it. And also, other linear beamformers that actually change the signal here should totally be allowed to be used with the AEC.

https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:383: ChannelBuffer<float>* output) {
On 2016/05/30 11:49:25, peah-webrtc wrote:
> On 2016/05/28 03:00:00, aluebs-webrtc wrote:
> > On 2016/05/26 08:48:53, peah-webrtc wrote:
> > > Since the linear beamforming is no longer done, output could be removed, right? Alternatively, if you want to maintain this method for when the linear beamforming is to be used, I think it would make sense to add another method which is input-only.
> >
> > Adding yet another interface increases code complexity. I think we want to leave it as is: one processing method and a non-linear postfilter method.
>
> That depends, as there is currently no beamformer (?) that uses the output functionality of this method, right? And if there is, if another method is added to the interface, the memcopy done below could be removed, so I think the amount of code would possibly decrease with another interface.
>
> Furthermore, there are clear benefits with another interface that could be argued to outweigh the potential drawback of code complexity for the beamformer:
> 1) it increases code readability for APM, as it is clear that this is an input-only functionality.
> 2) it increases code security for the APM, as adaptive processing of the signal prior to the AEC is explicitly prevented.
> 3) it reduces computational complexity, as the below memcopy is not needed and furthermore the APM can assume that the signal is not modified by the call to the beamformer.

I think a good balance is to have only one input-output parameter, which lets us get rid of the memcpy but lets it be modified by other types of beamformers.

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right):

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:389: process_transform_->ProcessChunk(input.channels(0), dummy_out->channels(0));
On 2016/05/30 11:49:25, peah-webrtc wrote:
> I think changing the interface would be much preferable to avoiding this by using a dummy output. Apart from avoiding changing/extending the interface, I cannot see the point of this.
>
> And changing the interface to reflect how the beamformer actually performs seems like the natural way to address this issue, rather than to mock a beamformer behavior that is not there.

The necessity of having a dummy_out has nothing to do with the Beamformer interface, but with the LappedTransform. But changing that class (which I agree we probably should) should probably be in its own CL.

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:439: size_t num_output_channels,
On 2016/05/30 11:49:25, peah-webrtc wrote:
> With this implementation, the last two parameters are unused. Is there no other way to achieve this? Can the LappedTransform only be used in an input-output manner? With this solution, this is more complex than needed, since it actually does the synthesis unnecessarily on dummy data.

Good point! Using 0 output channels achieves this.

https://codereview.webrtc.org/1982183002/diff/40001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:443: RTC_CHECK_EQ(1u, num_output_channels); On 2016/05/30 11:49:25, peah-webrtc wrote: > I guess this CHECK can be removed, right? I changed it to 0 to make sure it only does the analysis.
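For illustration, here is a minimal sketch of the zero-output-channels idea discussed above. The class and method names are hypothetical stand-ins in the style of a LappedTransform callback, not the actual WebRTC API; the point is only that with num_output_channels == 0 the synthesis step has nothing to produce, so the callback degenerates into pure analysis:

#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical block callback; mask update rule is a placeholder.
class AnalysisOnlyCallback {
 public:
  explicit AnalysisOnlyCallback(size_t num_freq_bins)
      : mask_(num_freq_bins, 1.f) {}

  // Called once per windowed FFT block. num_output_channels is expected to
  // be 0, so out_block is never written and no synthesis is performed.
  void ProcessAudioBlock(const std::complex<float>* const* in_block,
                         size_t num_input_channels,
                         size_t num_freq_bins,
                         size_t /*num_output_channels*/,
                         std::complex<float>* const* /*out_block*/) {
    for (size_t bin = 0; bin < num_freq_bins; ++bin) {
      float power = 0.f;
      for (size_t ch = 0; ch < num_input_channels; ++ch)
        power += std::norm(in_block[ch][bin]);
      // Placeholder per-bin mask update; the real estimator is more involved.
      mask_[bin] = 1.f / (1.f + power);
    }
  }

  const std::vector<float>& mask() const { return mask_; }

 private:
  std::vector<float> mask_;  // Per-bin postfilter mask estimate.
};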
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/beamformer/beamformer.h (right): https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input, On 2016/06/01 00:16:34, aluebs-webrtc wrote: > On 2016/05/30 11:49:25, peah-webrtc wrote: > > On 2016/05/28 03:00:00, aluebs-webrtc wrote: > > > On 2016/05/26 08:48:52, peah-webrtc wrote: > > > > On 2016/05/26 01:04:45, aluebs-webrtc wrote: > > > > > On 2016/05/22 21:06:48, peah-webrtc wrote: > > > > > > Is there a reason why this is not implemented in the .cc file? > > > > > > > > > > Beamformer is an interface, it has no .cc file. > > > > > > > > My skills on this are limited, but to me, an interface is something that > the > > > > user of the interface needs to implement. If it has a default > > implementation, > > > it > > > > is not a pure interface. Is there a reason for why this method cannot be > > > > declared pure virtual so that this is a proper interface that enforces the > > > > implementer of the interface to supply a interface implementation? > > > > > > It already is not a pure interface, because IsInBeam() has a default > > > implementation. This has a default empty implementation because not all > > > Beamformers have postfilters. > > > > Do you mean that there are more beamformers that use this interface? > > > > Would it not be better then to let the actual beamformer implementing the > > interface to explicitly provide an empty implementation, rather to have it > > pre-provided? > > > > At this point there is only the non-linear Beamformer in WebRTC, but there were > plans of porting other types if we think they provide a benefit. I think having > a default implementation is more intuitive and adds less code-complexity. First, > because it doesn't force all Beamformers, even the ones that have nothing to do > with postfilters, implement it. But also, because this means that every > Beamformer needs it's own empty implementation. If there are no other beamformer solutions that should implement this interface at the moment, I really don't think we should impose restrictions on the interface or add any code at all based on the possible behavior of future beamformer solutions. As it is now, the default implementation only adds code-complexity (as it is not needed). It may possibly reduce code complexity in a future implementation, but we won't know that until we have seen that implementation. https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right): https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
> > > > > > This can't be done as is, since the input and output can be the same > > > ChannelBuffer and the ChannelBuffer has a fixed number of channels. > > > > So this means that the input could have 2 channels, and then the output gets > > only 1 channel, but the meta-information (about the number of channels) in the > > output does still not reflect that, which means that the caller needs to know > > this and explicitly set the number of channels in the meta-information to 1! > > That sounds error prone and more complicated than necessary. > > > > It would be better then to replace this ProcessChunk call by two other calls, > > One similar to this one, but that does not allow in-place calls, and another > > that is explicitly in-place and takes only one ChannelBuffer pointer. > > That would allow the ProcessChunk method to change the number of channels. > > I don't think that adding an additional interface is a good idea. It only > increases code complexity. But, modifying the ChannelBuffer behavior, we can set > the number of output channels to 1. It actually does not add code complexity, as it seems that only one method is needed by the current beamformer. And I don't think we should keep any code only to support (possible) future beamformers. That code could instead be added when those are introduced. https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_processing_impl.cc (right): https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(), On 2016/06/01 00:16:34, aluebs-webrtc wrote: > On 2016/05/30 11:49:25, peah-webrtc wrote: > > On 2016/05/28 03:00:00, aluebs-webrtc wrote: > > > On 2016/05/26 08:48:53, peah-webrtc wrote: > > > > If the ProcessChunk is made input-only, you should be able to pass > > > > ca->split_bands_const_f here instead, right? > > > > > > Because other Beamformers might do some processing (even all of it) in > > > ProcessChunk(), I think we should leave it as is. > > > > Are there other beamformer implementations, apart from the one reviewed in > this > > CL, that could be called here? > > > > I don't think it is a good idea to keep it like this, because it gives the > > impression that the beamformer is actually changing the signal, which it is > not. > > Furthermore, it unnecessarily makes the APM sensitive to any bugs in the > > beamformer implementation that accidentally changes the signal. > > > > If there are other beamformer implementations that could actually change the > > signal, they should not be allowed to be used together with the AEC, so in > that > > case it would make sense to place a DCHECK that the AEC is not activated if > they > > are active. > > There are no other beamformers in WebRTC at the moment, but there were plans to > support multiple different ones if there is a use case for them. If going from > stereo to mono is to modify the signal, it actually does modify it. And also, > other linear beamformers that actually change the signal here should totally be > allowed to be used with the AEC. I don't think we should write/keep code to support possibly upcoming beamformer solutions. That code can instead be (re-)added at that point. You are correct that going from stereo to mono is a signal change, but I would prefer to keep that outside of the beamformer as it is more explicit.
To me, downmixing from stereo to mono is not something that is the task of a beamformer, in particular when this is the only signal modification done by a beamformer. Another drawback with that solution is that the downmixing unnecessarily forces the rest of the audio pipeline to be mono. This is not at all needed, and if this restriction is imposed by the beamformer, the beamformer will need to be turned off whenever a stereo signal is to be processed by the rest of the components. I'm not sure what you mean by the linear beamformer and the AEC? Do you mean a fixed linear beamformer? That I definitely agree should be fine, but until such a beamformer is developed, I don't think it makes sense to keep/add code that supports that. Adding that code is very easily done when that beamformer solution is developed. https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right): https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:383: ChannelBuffer<float>* output) { On 2016/06/01 00:16:34, aluebs-webrtc wrote: > On 2016/05/30 11:49:25, peah-webrtc wrote: > > On 2016/05/28 03:00:00, aluebs-webrtc wrote: > > > On 2016/05/26 08:48:53, peah-webrtc wrote: > > > > Since the linear beamforming is no longer done, output could be removed, > > > right? > > > > Alternative, if you want to maintain this method for when the linear > > > beamforming > > > > is to be used, I think it would make sense to add another method which is > > > > input-only. > > > > > > Adding yet another interface, increases code complexity. I think we want to > > > leave it as is: one processing method and a non-linear postfilter method. > > > > That depends, as there is currently no beamformer (?) that uses the output > > functionality of this method, right? And if there is, if another method is > added > > to the interface, the memcopy done below could be removed so I think the > amount > > of code would possibly decrease with another interface. > > > > > > Furthermore, there are clear benefits with another interface that could be > > argued outweights the potential drawback of code complexity for the > beamformer: > > 1) increases code readability for APM as it is clear that this is an > input-only > > functionality. > > 2) increases code security for the APM, as adaptive processing of the signal > > prior to the AEC is explicitly prevented. > > 3) reduces computational complexity as the below memcopy is not needed and > > furthermore the APM can assume that the signal is not modified by the call to > > the beamformer. > > > > I think a good balance is to have only one input-output parameters, which let > us > > get rid of the memcpy, but let it be modified for other types of beamformers. Since ProcessChunk does not alter the signal, I see no reason for not having it input-only. The exception is supporting upcoming beamformer solutions, but then the code for that can be altered at that point. The likelihood is very high that any code that is now customized to match upcoming beamformer solutions will need to be changed anyway according to new requirements in them. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... File webrtc/common_audio/channel_buffer.h (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann...
webrtc/common_audio/channel_buffer.h:92: // 0 <= channel < |num_official_channels_| This seems very confusing to me. What is the difference between num_official_channels and num_channels? Could you please explain further why this change in ChannelBuffer is needed? https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_buffer.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_buffer.cc:187: data_->set_num_channels(num_proc_channels_); Could you please explain why this change is needed? https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_buffer.cc:353: if (split_data_.get()) { Could you please explain why this change is needed? https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/splitting_filter.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); Could you please explain why this change is needed? https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:76: for (size_t i = 0; i < data->num_channels(); ++i) { Could you please explain why this change is needed? https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:98: RTC_DCHECK_LE(data->num_channels(), three_band_filter_banks_.size()); Could you please explain why this change is needed?
https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/beamformer/beamformer.h (right): https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/beamformer/beamformer.h:37: virtual void PostFilter(const ChannelBuffer<float>& input, On 2016/06/01 14:51:01, peah-webrtc wrote: > On 2016/06/01 00:16:34, aluebs-webrtc wrote: > > On 2016/05/30 11:49:25, peah-webrtc wrote: > > > On 2016/05/28 03:00:00, aluebs-webrtc wrote: > > > > On 2016/05/26 08:48:52, peah-webrtc wrote: > > > > > On 2016/05/26 01:04:45, aluebs-webrtc wrote: > > > > > > On 2016/05/22 21:06:48, peah-webrtc wrote: > > > > > > > Is there a reason why this is not implemented in the .cc file? > > > > > > > > > > > > Beamformer is an interface, it has no .cc file. > > > > > > > > > > My skills on this are limited, but to me, an interface is something that > > the > > > > > user of the interface needs to implement. If it has a default > > > implementation, > > > > it > > > > > is not a pure interface. Is there a reason for why this method cannot be > > > > > declared pure virtual so that this is a proper interface that enforces > the > > > > > implementer of the interface to supply a interface implementation? > > > > > > > > It already is not a pure interface, because IsInBeam() has a default > > > > implementation. This has a default empty implementation because not all > > > > Beamformers have postfilters. > > > > > > Do you mean that there are more beamformers that use this interface? > > > > > > Would it not be better then to let the actual beamformer implementing the > > > interface to explicitly provide an empty implementation, rather to have it > > > pre-provided? > > > > > > > At this point there is only the non-linear Beamformer in WebRTC, but there > were > > plans of porting other types if we think they provide a benefit. I think > having > > a default implementation is more intuitive and adds less code-complexity. > First, > > because it doesn't force all Beamformers, even the ones that have nothing to > do > > with postfilters, implement it. But also, because this means that every > > Beamformer needs it's own empty implementation. > > If there are no other beamformer solutions that should implement this interface > at the moment, I really don't think we should imposing restrictions on the > interface or add any code at all at the moment based on the possible behavior of > future beamformer solutions. > > As it is now, the default implementation only adds code-complexity (as it is not > needed). It may possibly reduce code complexity in a future implementation, but > that we don't know until we have seen that implementation. I think we should keep the code scalable, especially if the "code-complexity addition" is literally "{}". But if you insist, I dropped the whole interface, since it makes no sense to have one for only one Beamformer implementation in WebRTC. https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right): https://codereview.webrtc.org/1982183002/diff/1/webrtc/modules/audio_processi...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:407: memcpy(output->channels(i)[0], On 2016/06/01 14:51:01, peah-webrtc wrote: > On 2016/06/01 00:16:34, aluebs-webrtc wrote: > > On 2016/05/30 11:49:25, peah-webrtc wrote: > > > On 2016/05/26 01:04:45, aluebs-webrtc wrote: > > > > On 2016/05/22 21:06:48, peah-webrtc wrote: > > > > > what happens if the output is dual channel? Since the ProcessChunk > method > > > > locks > > > > > the content of the output to be single-channel, it should either assert > > that > > > > the > > > > > number of output channels allready is one, or explicitly set the number > of > > > > > channels to 1 in the output. > > > > > > > > This can't be done as is, since the input and output can be the same > > > > ChannelBuffer and the ChannelBuffer has a fixed number of channels. > > > > > > So this means that the input could have 2 channels, and then the output gets > > > only 1 channel, but the meta-information (about the number of channels) in > the > > > output does still not reflect that, which means that the caller needs to > know > > > this and explicitly set the number of channels in the meta-information to 1! > > > That sounds error prone and more complicated than necessary. > > > > > > It would be better then to replace this ProcessChunk call by two other > calls, > > > One similar to this one, but that does not allow in-place calls, and another > > > that is explicitly in-place and takes only one ChannelBuffer pointer. > > > That would allow the ProcessChunk method to change the number of channels. > > > > I don't think that adding an additional interface is a good idea. It only > > increases code complexity. But, modifying the ChannelBuffer behavior, we can > set > > the number of output channels to 1. > > It actually does not add code complexity, as it seems that only one method is > needed by the current beamformer. And I don't think we should keep any code only > to support (possible) future beamformers. That code could instead then be added > when those are introduced. Interface removed and the method changed to be input-only. https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_processing_impl.cc (right): https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->ProcessChunk(*ca->split_data_f(), On 2016/06/01 14:51:01, peah-webrtc wrote: > On 2016/06/01 00:16:34, aluebs-webrtc wrote: > > On 2016/05/30 11:49:25, peah-webrtc wrote: > > > On 2016/05/28 03:00:00, aluebs-webrtc wrote: > > > > On 2016/05/26 08:48:53, peah-webrtc wrote: > > > > > If the ProcessChunk is made input-only, you should be able to pass > > > > > ca->split_bands_const_f here instead, right? > > > > > > > > Because other Beamformers might do some processing (even all of it) in > > > > ProcessChunk(), I think we should leave it as is. > > > > > > Are there other beamformer implementations, apart from the one reviewed in > > this > > > CL, that could be called here? > > > > > > I don't think it is a good idea to keep it like this, because it gives the > > > impression that the beamformer is actually changing the signal, which it is > > not. > > > Furthermore, it unnecessarily makes the APM sensitive to any bugs in the > > > beamformer implementation that accidentally changes the signal.
> > > > > > If there are other beamformer implementations that could actually change the > > > signal, they should not be allowed to be used together with the AEC, so in > > that > > > case it would make sense to place a DCHECK that the AEC is not activated if > > they > > > are active. > > > > There are no other beamformers in WebRTC at the moment, but there were plans > to > > support multiple different ones if there is a use case for them. If going from > > stereo to mono is to modify the signal, it actually does modify it. And also, > > other linear beamformers that actually change the signal here should totally > be > > allowed to be used with the AEC. > > I don't think we should write/keep code to support possibly upcoming beamformer > solutions. That code can instead be (re-)added at that point. > > You are correct that going from mono to stereo is a signal change, but that I > would then prefer to keep outside of the beamformer as it is more explicit. > To me, downmixing from stereo to mono is not something that is that is the task > of a beamformer, in particular when this is the only signal modification done by > a beamformer. > Another drawback with that solution is that the downmixing unnecessarily forces > the rest of the audio pipeline to be mono. This is not at all needed and if this > restriction is imposed by the beamformer, the beamformer will need to be turned > off whenever a stereo signal is to be processed by the rest of the components. > > I'm not sure what you mean with the linear beamformer and the AEC? Do you mean a > fixed linear beamformer? That I definitely agree should be fine, but until such > a beamformer is developed, I don't think it makes sense to keep/add code that > supports that. Adding that code is very easily done when that beamformer > solution is developed. This whole CL only exists to support a "possibly upcoming AEC", since the current setup seems to be working perfectly fine with today's overly aggressive AEC. I also find it hard to follow your comments, which are contradictory: should I or should I not force the output ChannelBuffer to be mono? https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right): https://codereview.webrtc.org/1982183002/diff/20001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:383: ChannelBuffer<float>* output) { On 2016/06/01 14:51:01, peah-webrtc wrote: > On 2016/06/01 00:16:34, aluebs-webrtc wrote: > > On 2016/05/30 11:49:25, peah-webrtc wrote: > > > On 2016/05/28 03:00:00, aluebs-webrtc wrote: > > > > On 2016/05/26 08:48:53, peah-webrtc wrote: > > > > > Since the linear beamforming is no longer done, output could be removed, > > > > right? > > > > > Alternative, if you want to maintain this method for when the linear > > > > beamforming > > > > > is to be used, I think it would make sense to add another method which > is > > > > > input-only. > > > > > > > > Adding yet another interface, increases code complexity. I think we want > to > > > > leave it as is: one processing method and a non-linear postfilter method. > > > > > > That depends, as there is currently no beamformer (?) that uses the output > > > functionality of this method, right? And if there is, if another method is > > added > > > to the interface, the memcopy done below could be removed so I think the > > amount > > > of code would possibly decrease with another interface.
> > > > > > > > Furthermore, there are clear benefits with another interface that could be > > > argued outweights the potential drawback of code complexity for the > > beamformer: > > > 1) increases code readability for APM as it is clear that this is an > > input-only > > > functionality. > > > 2) increases code security for the APM, as adaptive processing of the signal > > > prior to the AEC is explicitly prevented. > > > 3) reduces computational complexity as the below memcopy is not needed and > > > furthermore the APM can assume that the signal is not modified by the call > to > > > the beamformer. > > > > > > > I think a good balance is to have only one input-output parameters, which let > us > > get rid of the memcpy, but let it be modified for other types of beamformers. > > Since ProcessChunk does not alter the signal, I see no reason for not having it > input-only. The exception is supporting upcoming beamformer solutions but then > the code for that can be altered at that point. The likelihood is anyway very > high that any code that is now customized to match upcoming beamformer solutions > will anyway need to be changed according to new requirements in these. If we had used my initial suggestion, upcoming Beamformers would not have needed to change the "customizations", since they would not have needed to implement the PostFilter method (default empty) and could have implemented ProcessChunk directly with an input and an output like any normal Beamformer. But with your suggestion of specializing the whole interface to this specific implementation, a new Beamformer would be a completely different component. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... File webrtc/common_audio/channel_buffer.h (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... webrtc/common_audio/channel_buffer.h:92: // 0 <= channel < |num_official_channels_| On 2016/06/01 14:51:01, peah-webrtc wrote: > This seems very confusing to me. What is the difference between > num_official_channels and num_channels? > > Could you please explain further why this change in ChannelBuffer is needed? It is needed to change the number of channels of a ChannelBuffer dynamically. num_channels is the actual space allocated in memory, and num_official_channels is the count surfaced to the users of the ChannelBuffer. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_buffer.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_buffer.cc:187: data_->set_num_channels(num_proc_channels_); On 2016/06/01 14:51:01, peah-webrtc wrote: > Could you please explain why this change is needed? To re-init the ChannelBuffers to their original number of channels. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_buffer.cc:353: if (split_data_.get()) { On 2016/06/01 14:51:01, peah-webrtc wrote: > Could you please explain why this change is needed? To change the ChannelBuffers to the new number of channels when the AudioBuffer number of channels is changed. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/splitting_filter.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc...
webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); On 2016/06/01 14:51:01, peah-webrtc wrote: > Could you please explain why this change is needed? When the number of channels changes between splitting and merging, the number of filter states and the number of channels in the ChannelBuffer are not the same; the latter only needs to be less than or equal to the former. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:76: for (size_t i = 0; i < data->num_channels(); ++i) { On 2016/06/01 14:51:01, peah-webrtc wrote: > Could you please explain why this change is needed? Please see above. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:98: RTC_DCHECK_LE(data->num_channels(), three_band_filter_banks_.size()); On 2016/06/01 14:51:01, peah-webrtc wrote: > Could you please explain why this change is needed? Please see above.
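To make the num_official_channels idea above concrete, here is a hedged sketch of a channel buffer whose advertised channel count can shrink and later be restored while the underlying allocation stays fixed. This illustrates only the concept; the real webrtc::ChannelBuffer differs in layout and API, and all names here are illustrative:

#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class ResizableChannelBuffer {
 public:
  ResizableChannelBuffer(size_t num_frames, size_t max_channels)
      : num_frames_(num_frames),
        max_channels_(max_channels),
        num_channels_(max_channels),          // The "official" channel count.
        data_(num_frames * max_channels) {}   // Storage never shrinks.

  // Changes only the advertised channel count; the allocation stays intact,
  // so the count can later be restored (e.g. on re-initialization).
  void set_num_channels(size_t num_channels) {
    assert(num_channels <= max_channels_);
    num_channels_ = num_channels;
  }

  size_t num_channels() const { return num_channels_; }
  size_t num_frames() const { return num_frames_; }

  T* channel(size_t i) {
    assert(i < num_channels_);  // 0 <= i < official channel count.
    return &data_[i * num_frames_];
  }

 private:
  const size_t num_frames_;
  const size_t max_channels_;
  size_t num_channels_;
  std::vector<T> data_;
};

A consumer such as the splitting filter would then check that num_channels() is less than or equal to the number of filter states it owns, mirroring the RTC_DCHECK_LE change discussed above.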
Friendly ping
peah, could you please have a look at this?
https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... File webrtc/common_audio/channel_buffer.h (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... webrtc/common_audio/channel_buffer.h:92: // 0 <= channel < |num_official_channels_| On 2016/06/01 22:13:20, aluebs-webrtc wrote: > On 2016/06/01 14:51:01, peah-webrtc wrote: > > This seems very confusing to me. What is the difference between > > num_official_channels and num_channels? > > > > Could you please explain further why this change in ChannelBuffer is needed? > > It is needed to change the number of channels of a ChannelBuffer dynamically. > The num_channels is the actual space in memory and the num_official_channels is > the one surfaced to the users of ChannelBuffer. I still don't see why this change is here. But I'll post the question in the latest patch instead. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_buffer.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_buffer.cc:187: data_->set_num_channels(num_proc_channels_); On 2016/06/01 22:13:20, aluebs-webrtc wrote: > On 2016/06/01 14:51:01, peah-webrtc wrote: > > Could you please explain why this change is needed? > > To re-init the ChannelBuffers to their original number of channels. I still don't understand why this change is needed in this CL. But I'll post the question in the latest patch instead. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/splitting_filter.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); On 2016/06/01 22:13:20, aluebs-webrtc wrote: > On 2016/06/01 14:51:01, peah-webrtc wrote: > > Could you please explain why this change is needed? > > When there is a change in number of channels between splitting and merging, the > size and channels in the ChannelBuffer are not the same, but it only needs to be > lower or equal than the former. Do you mean that before this CL, the size in the channels was changed by the beamformer, but that the number of channels needed to be changed inside APM by setting it to 1? And with the changes in this CL, which no longer applies downmixing inside the beamformer, the downmixing instead needs to be done as part of the synthesis? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/common_audio/chann... File webrtc/common_audio/channel_buffer.h (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/common_audio/chann... webrtc/common_audio/channel_buffer.h:51: num_channels_(num_channels), I see that the changes in audio_buffer are needed for the changes in lapped_transform. What I cannot see though, is why these changes are needed for this CL? As I see it, the difference with this in this CL is that the downmixing has been moved from within the beamformer to being done inside APM (but outside the beamformer). I thought before this CL this downmixing was partly done by the call ca->set_num_channels(1); done inside APM and partly by the linear beamforming done inside the beamformer, right? 
And the way I interpret it, these changes that are now done in the audio_buffer, channel_buffer and splitting_filter are needed to perform all of the downmixing at the call to ca->set_num_channels(1); Is that correct? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_processing_impl.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f()); What has changed here is that now all the downmixing is done by the set_num_channels call, in contrast to before, when the downmixing was done inside the beamformer and the set_num_channels call was needed to correct for the new number of channels? Is that correct? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:191: RTC_CHECK_EQ(1u, num_input_channels); As it is now, the beamformer will crash the application if the number of input and output channels is other than 1. Is it necessary to have this restriction here? There is not really anything that prevents final_mask from being applied to multichannel input/output, right? If needed, it would be nice to place it higher up in the beamformer code, and also to DCHECK instead of CHECK. If possible, it would be nice to remove the constraint of the postfilter as the other components do not have the single-channel restriction and there is no theoretical constraint that limits the number of channels to 1. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:233: dummy_out_.reset(new ChannelBuffer<float>(chunk_length_, 0u)); Why do we need to allocate a dummy output of a full buffer length? Is it not possible to allocate it with a zero length, and zero number of channels, as the zero number of channels should cause no output to be produced, and therefore a zero output buffer length would be feasible? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:388: process_transform_->ProcessChunk(data.channels(0), dummy_out_->channels(0)); Do you really need to pass a dummy output buffer here? Would it not be sufficient to pass a null pointer, or a pointer to a local scalar? (What I'm looking at is whether it would be possible to move away from having the dummy_out member variable which is only used to make the input/output functionality input-only in the lapped transform.) https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:405: data->channels(i)[0][j] *= smoothed_mask; This code assumes single-channel processing, right? But the CHECKS for that are only in ProcessAudioBlock. Since mono-channel is assumed here, it would be nice to have the CHECKS here as well. Or even better, adjust the postfilter to support multichannel data. Afaics that should be quite easily done. WDYT? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:154: std::unique_ptr<ChannelBuffer<float>> dummy_out_; If possible, it would be nice to avoid this one. Please see my other comments about that. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_test.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_test.cc:74: bf.PostFilter(&buf); Does this really work? The AnalyzeChunk does not downmix the signal, right? But afaics the PostFilter assumes a mono signal and CHECKS otherwise. Or am I missing something? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_unittest.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_unittest.cc:66: capture_audio_buffer->set_num_channels(1); With the changes in this CL, the set_num_channels(1) call performs a downmix, right?
Patchset #6 (id:100001) has been deleted
https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... File webrtc/common_audio/channel_buffer.h (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/common_audio/chann... webrtc/common_audio/channel_buffer.h:92: // 0 <= channel < |num_official_channels_| On 2016/06/08 12:04:55, peah-webrtc wrote: > On 2016/06/01 22:13:20, aluebs-webrtc wrote: > > On 2016/06/01 14:51:01, peah-webrtc wrote: > > > This seems very confusing to me. What is the difference between > > > num_official_channels and num_channels? > > > > > > Could you please explain further why this change in ChannelBuffer is needed? > > > > It is needed to change the number of channels of a ChannelBuffer dynamically. > > The num_channels is the actual space in memory and the num_official_channels > is > > the one surfaced to the users of ChannelBuffer. > > I still don't see why this change is here. But I'll post the question in the > latest patch instead. Acknowledged. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_buffer.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_buffer.cc:187: data_->set_num_channels(num_proc_channels_); On 2016/06/08 12:04:55, peah-webrtc wrote: > On 2016/06/01 22:13:20, aluebs-webrtc wrote: > > On 2016/06/01 14:51:01, peah-webrtc wrote: > > > Could you please explain why this change is needed? > > > > To re-init the ChannelBuffers to their original number of channels. > > I still don't understand why this change is needed in this CL. But I'll post the > question in the latest patch instead. Acknowledged. https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/splitting_filter.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); On 2016/06/08 12:04:55, peah-webrtc wrote: > On 2016/06/01 22:13:20, aluebs-webrtc wrote: > > On 2016/06/01 14:51:01, peah-webrtc wrote: > > > Could you please explain why this change is needed? > > > > When there is a change in number of channels between splitting and merging, > the > > size and channels in the ChannelBuffer are not the same, but it only needs to > be > > lower or equal than the former. > > Do you mean that before this CL, the size in the channels was changed by the > beamformer, but that the number of channels needed to be changed inside APM by > setting it to 1? > > And with the changes in this CL, which no longer applies downmixing inside the > beamformer, the downmixing instead needs to be done as part of the synthesis? > No. This was a bug. Before this CL the number of channels of the AudioBuffer was changed, but this was not reflected in the number of channels of the ChannelBuffers, so the synthesis filter was merging the original number of channels, which were later discarded in CopyTo (or InterleaveTo). So the bug had no effects except adding complexity. Now that the number of channels of AudioBuffer is also reflected in the ChannelBuffer, it triggered this overly strict CHECK. So I fixed it. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/common_audio/chann... File webrtc/common_audio/channel_buffer.h (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/common_audio/chann... 
webrtc/common_audio/channel_buffer.h:51: num_channels_(num_channels), On 2016/06/08 12:04:55, peah-webrtc wrote: > I see that the changes in audio_buffer are needed for the changes in > lapped_transform. > What I cannot see though, is why these changes are needed for this CL? > > As I see it, the difference with this in this CL is that the downmixing has been > moved from within the beamformer to being done inside APM (but outside the > beamformer). > > I thought before this CL this downmixing was partly done by the call > ca->set_num_channels(1); done inside APM and partly by the linear beamforming > done inside the beamformer, right? > > And the way I interpret it, these changes that are now done in the audio_buffer, > channel_buffer and splitting_filter are needed to perform all of the downmixing > at the call to ca->set_num_channels(1); > Is that correct? > Please see my other comment about the bug this solves. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_processing_impl.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f()); On 2016/06/08 12:04:55, peah-webrtc wrote: > What has changed here, is that now all the downmixing is done by the > set_num_channels call, in contrast to that before the downmixing was done inside > the beamformer, but the set_num_channels call was needed to correct for the new > number of channels? Is that correct? What do you mean by downmixing? Before, there was a frequency-domain fixed delay-and-sum beamformer, and then the number of channels was corrected in the AudioBuffer. Now all channels except the first one are discarded. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:191: RTC_CHECK_EQ(1u, num_input_channels); On 2016/06/08 12:04:55, peah-webrtc wrote: > As it is now, the beamformer will crash the application if the number of input > and output channels is other than 1. Is it necessary to have this restriction > here? > There is not really anything that prevents final_mask to be applied on > multichannel input/output, right? If needed, it would be nice to place it higher > up in the beamformer code, and also to DCHECK instead of CHECK. > > If possible, it would be nice to remove the constraint of the postfilter as the > other components do not have the single-channel restriction and there is no > theoretical constraint that limits the number of channels to 1. Constraint removed. Changed to DCHECK. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:233: dummy_out_.reset(new ChannelBuffer<float>(chunk_length_, 0u)); On 2016/06/08 12:04:55, peah-webrtc wrote: > Why do we need to allocate a dummy output of a full buffer length? Is it not > possible to allocate it with a zero length, and zero number of channels, as the > zero number of channels should cause no output to be produced, and therefore a > zero output buffer length would be feasible? It doesn't allocate anything, since with 0 channels the length doesn't matter.
https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:388: process_transform_->ProcessChunk(data.channels(0), dummy_out_->channels(0)); On 2016/06/08 12:04:55, peah-webrtc wrote: > Do you really need to pass a dummy output buffer here? Would it not be > sufficient with a null pointer, or a pointer to a local scalar? (What I'm > looking at is whether it would be possible to move away from having the > dummy_out member variable which is only used to make the input/output > functionality input-only in the lapped transform.) Good point, since I modified lapped_transform to accept 0 as a number of channels I can pass in a nullptr safely. Done. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.cc:405: data->channels(i)[0][j] *= smoothed_mask; On 2016/06/08 12:04:55, peah-webrtc wrote: > This code assumes single-channel processing, right? But the CHECKS for that is > only in the ProcessAudioBlock. Since mono-channel is assumed hereing, it would > be nice to have the CHECKS here as well. > > Or even better, adjust the postfilter to support multichannel data. Afaics that > should be quite easily done. WDYT? Adjusted the postfilter to support multichannel data. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:154: std::unique_ptr<ChannelBuffer<float>> dummy_out_; On 2016/06/08 12:04:55, peah-webrtc wrote: > If possible, it would be nice to avoid this one. Please see my other comments > about that. Removed. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_test.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_test.cc:74: bf.PostFilter(&buf); On 2016/06/08 12:04:55, peah-webrtc wrote: > Does this really work? The AnalyzeChunk does not downmix the signal, right? But > afaics the PostFilter assumes a mono signal and CHECKS otherwise. Or am I > missing something? It did, because it doesn't use the number of channels of the ChannelBuffer at all. Instead it uses the number of channels set in the constructor. But added a DCHECK to make it more restrictive. https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_unittest.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/beamformer/nonlinear_beamformer_unittest.cc:66: capture_audio_buffer->set_num_channels(1); On 2016/06/08 12:04:55, peah-webrtc wrote: > With the changes in this CL, the set_num_channels(1) calls performs a downmix, > right? If by downmix you mean it just discards all channels except the first one, then yes.
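As a rough sketch of what the multichannel postfilter support amounts to, the same per-bin mask can simply be applied to every channel in the frequency domain. The function and parameter names below are illustrative, not the actual implementation:

#include <complex>
#include <cstddef>

// Applies a smoothed per-bin gain to all channels in place. The one shared
// mask is reused across channels, so supporting multichannel data costs only
// the extra multiplies.
void ApplyPostFilterMask(std::complex<float>* const* blocks,
                         size_t num_channels,
                         size_t num_freq_bins,
                         const float* mask,
                         float compensation_gain) {
  for (size_t ch = 0; ch < num_channels; ++ch) {
    for (size_t bin = 0; bin < num_freq_bins; ++bin) {
      blocks[ch][bin] *= mask[bin] * compensation_gain;
    }
  }
}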
https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/splitting_filter.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); On 2016/06/09 02:11:46, aluebs-webrtc wrote: > On 2016/06/08 12:04:55, peah-webrtc wrote: > > On 2016/06/01 22:13:20, aluebs-webrtc wrote: > > > On 2016/06/01 14:51:01, peah-webrtc wrote: > > > > Could you please explain why this change is needed? > > > > > > When there is a change in number of channels between splitting and merging, > > the > > > size and channels in the ChannelBuffer are not the same, but it only needs > to > > be > > > lower or equal than the former. > > > > Do you mean that before this CL, the size in the channels was changed by the > > beamformer, but that the number of channels needed to be changed inside APM by > > setting it to 1? > > > > And with the changes in this CL, which no longer applies downmixing inside the > > beamformer, the downmixing instead needs to be done as part of the synthesis? > > > > No. This was a bug. Before this CL the number of channels of the AudioBuffer was > changed, but this was not reflected in the number of channels of the > ChannelBuffers, so the synthesis filter was merging the original number of > channels, which were later discarded in CopyTo (or InterleaveTo). So the bug had > no effects except adding complexity. Now that the number of channels of > AudioBuffer is also reflected in the ChannelBuffer, it triggered this overly > strict CHECK. So I fixed it. Great find!! It would, however, be good to have that bugfix as a separate CL as it 1) Is not at all related to the beamformer. 2) Will also affect all calls where the beamformer is not active. 3) Will change the complexity (reduce). Would it be possible to extract that part into another CL, and land it before this CL is landed so that the changes in the first CL can be rebased into the basis for this CL? https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_processing_impl.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f()); On 2016/06/09 02:11:46, aluebs-webrtc wrote: > On 2016/06/08 12:04:55, peah-webrtc wrote: > > What has changed here, is that now all the downmixing is done by the > > set_num_channels call, in contrast to that before the downmixing was done > inside > > the beamformer, but the set_num_channels call was needed to correct for the > new > > number of channels? Is that correct? > > What do you mean with downmixing? Before there was a frequency domain fixed > delay and sum beamformer and then the number of channels was corrected in the > AudioBuffer. Now all channels except the first one are discarded. By downmixing I mean going from 2 to 1 channel, so discarding the leftmost channel definitely counts as that.
https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/splitting_filter.cc (right): https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); On 2016/06/09 09:36:30, peah-webrtc wrote: > On 2016/06/09 02:11:46, aluebs-webrtc wrote: > > On 2016/06/08 12:04:55, peah-webrtc wrote: > > > On 2016/06/01 22:13:20, aluebs-webrtc wrote: > > > > On 2016/06/01 14:51:01, peah-webrtc wrote: > > > > > Could you please explain why this change is needed? > > > > > > > > When there is a change in number of channels between splitting and > merging, > > > the > > > > size and channels in the ChannelBuffer are not the same, but it only needs > > to > > > be > > > > lower or equal than the former. > > > > > > Do you mean that before this CL, the size in the channels was changed by the > > > beamformer, but that the number of channels needed to be changed inside APM > by > > > setting it to 1? > > > > > > And with the changes in this CL, which no longer applies downmixing inside > the > > > beamformer, the downmixing instead needs to be done as part of the > synthesis? > > > > > > > No. This was a bug. Before this CL the number of channels of the AudioBuffer > was > > changed, but this was not reflected in the number of channels of the > > ChannelBuffers, so the synthesis filter was merging the original number of > > channels, which were later discarded in CopyTo (or InterleaveTo). So the bug > had > > no effects except adding complexity. Now that the number of channels of > > AudioBuffer is also reflected in the ChannelBuffer, it triggered this overly > > strict CHECK. So I fixed it. > > Great find!! It would, however, be good to have that bugfix as a separate CL as > it > 1) Is not at all related to the beamformer. > 2) Will also affect all calls where the beamformer is not active. > 3) Will change the complexity (reduce). > > Would it be possible to extract that part into another CL, and land it before > this CL is landed so that the changes in the first CL can be rebased into the > basis for this CL? 1) It is related to the Beamformer, since... 2) It is only triggered when there is a num_channels change, which only happens when Beamformer is enabled. But I agree it is better to break it out in a different CL. Done here: https://codereview.webrtc.org/2053773002/ https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/audio_processing_impl.cc (right): https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f()); On 2016/06/09 09:36:30, peah-webrtc wrote: > On 2016/06/09 02:11:46, aluebs-webrtc wrote: > > On 2016/06/08 12:04:55, peah-webrtc wrote: > > > What has changed here, is that now all the downmixing is done by the > > > set_num_channels call, in contrast to that before the downmixing was done > > inside > > > the beamformer, but the set_num_channels call was needed to correct for the > > new > > > number of channels? Is that correct? > > > > What do you mean with downmixing? Before there was a frequency domain fixed > > delay and sum beamformer and then the number of channels was corrected in the > > AudioBuffer. Now all channels except the first one are discarded. 
> > By downmixing I mean going from 2 to 1 channel, so discarding the leftmost > channel definitely counts as that. > Personally I find "downmixing" to have a blending implication. But acknowledged.
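To make the terminology point concrete, here is a hypothetical side-by-side of the two notions of going from N channels to 1: a blending downmix versus keeping only the first channel, which is what set_num_channels(1) effectively does here. Both functions are illustrative sketches, not code from the CL:

#include <cstddef>

// Blending downmix: every input channel contributes to the output.
void DownmixByAveraging(const float* const* in, size_t num_channels,
                        size_t num_frames, float* out) {
  for (size_t i = 0; i < num_frames; ++i) {
    float sum = 0.f;
    for (size_t ch = 0; ch < num_channels; ++ch) sum += in[ch][i];
    out[i] = sum / static_cast<float>(num_channels);
  }
}

// Channel selection: all channels except the first are discarded.
void KeepFirstChannel(const float* const* in, size_t num_frames, float* out) {
  for (size_t i = 0; i < num_frames; ++i) out[i] = in[0][i];
}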
On 2016/06/09 19:21:42, aluebs-webrtc wrote: > https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... > File webrtc/modules/audio_processing/splitting_filter.cc (right): > > https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc... > webrtc/modules/audio_processing/splitting_filter.cc:75: > RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size()); > On 2016/06/09 09:36:30, peah-webrtc wrote: > > On 2016/06/09 02:11:46, aluebs-webrtc wrote: > > > On 2016/06/08 12:04:55, peah-webrtc wrote: > > > > On 2016/06/01 22:13:20, aluebs-webrtc wrote: > > > > > On 2016/06/01 14:51:01, peah-webrtc wrote: > > > > > > Could you please explain why this change is needed? > > > > > > > > > > When there is a change in number of channels between splitting and > > merging, > > > > the > > > > > size and channels in the ChannelBuffer are not the same, but it only > needs > > > to > > > > be > > > > > lower or equal than the former. > > > > > > > > Do you mean that before this CL, the size in the channels was changed by > the > > > > beamformer, but that the number of channels needed to be changed inside > APM > > by > > > > setting it to 1? > > > > > > > > And with the changes in this CL, which no longer applies downmixing inside > > the > > > > beamformer, the downmixing instead needs to be done as part of the > > synthesis? > > > > > > > > > > No. This was a bug. Before this CL the number of channels of the AudioBuffer > > was > > > changed, but this was not reflected in the number of channels of the > > > ChannelBuffers, so the synthesis filter was merging the original number of > > > channels, which were later discarded in CopyTo (or InterleaveTo). So the bug > > had > > > no effects except adding complexity. Now that the number of channels of > > > AudioBuffer is also reflected in the ChannelBuffer, it triggered this overly > > > strict CHECK. So I fixed it. > > > > Great find!! It would, however, be good to have that bugfix as a separate CL > as > > it > > 1) Is not at all related to the beamformer. > > 2) Will also affect all calls where the beamformer is not active. > > 3) Will change the complexity (reduce). > > > > Would it be possible to extract that part into another CL, and land it before > > this CL is landed so that the changes in the first CL can be rebased into the > > basis for this CL? > > 1) It is related to the Beamformer, since... > 2) It is only triggered when there is a num_channels change, which only happens > when Beamformer is enabled. > > But I agree it is better to break it out in a different CL. Done here: > https://codereview.webrtc.org/2053773002/ > > https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... > File webrtc/modules/audio_processing/audio_processing_impl.cc (right): > > https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc... > webrtc/modules/audio_processing/audio_processing_impl.cc:683: > private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f()); > On 2016/06/09 09:36:30, peah-webrtc wrote: > > On 2016/06/09 02:11:46, aluebs-webrtc wrote: > > > On 2016/06/08 12:04:55, peah-webrtc wrote: > > > > What has changed here, is that now all the downmixing is done by the > > > > set_num_channels call, in contrast to that before the downmixing was done > > > inside > > > > the beamformer, but the set_num_channels call was needed to correct for > the > > > new > > > > number of channels? Is that correct? > > > > > > What do you mean with downmixing? 
Before there was a frequency domain fixed > > > delay and sum beamformer and then the number of channels was corrected in > the > > > AudioBuffer. Now all channels except the first one are discarded. > > > > By downmixing I mean going from 2 to 1 channel, so discarding the leftmost > > channel definitely counts as that. > > > > Personally I find "downmixing" to have a blending implication. But acknowledged. What is the status of this CL? Will it be updated once the preceding CL is landed? Or will there be another CL that is linked to the newly created CL, and thereby excludes the code changes in that CL?
Patchset #8 (id:160001) has been deleted
On 2016/06/10 11:42:44, peah-webrtc wrote:
> On 2016/06/09 19:21:42, aluebs-webrtc wrote:
> > https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc...
> > File webrtc/modules/audio_processing/splitting_filter.cc (right):
> >
> > https://codereview.webrtc.org/1982183002/diff/60001/webrtc/modules/audio_proc...
> > webrtc/modules/audio_processing/splitting_filter.cc:75: RTC_DCHECK_LE(data->num_channels(), two_bands_states_.size());
> > On 2016/06/09 09:36:30, peah-webrtc wrote:
> > > On 2016/06/09 02:11:46, aluebs-webrtc wrote:
> > > > On 2016/06/08 12:04:55, peah-webrtc wrote:
> > > > > On 2016/06/01 22:13:20, aluebs-webrtc wrote:
> > > > > > On 2016/06/01 14:51:01, peah-webrtc wrote:
> > > > > > > Could you please explain why this change is needed?
> > > > > >
> > > > > > When there is a change in number of channels between splitting and merging, the size and channels in the ChannelBuffer are not the same, but it only needs to be lower or equal than the former.
> > > > >
> > > > > Do you mean that before this CL, the size in the channels was changed by the beamformer, but that the number of channels needed to be changed inside APM by setting it to 1?
> > > > >
> > > > > And with the changes in this CL, which no longer applies downmixing inside the beamformer, the downmixing instead needs to be done as part of the synthesis?
> > > >
> > > > No. This was a bug. Before this CL the number of channels of the AudioBuffer was changed, but this was not reflected in the number of channels of the ChannelBuffers, so the synthesis filter was merging the original number of channels, which were later discarded in CopyTo (or InterleaveTo). So the bug had no effects except adding complexity. Now that the number of channels of AudioBuffer is also reflected in the ChannelBuffer, it triggered this overly strict CHECK. So I fixed it.
> > >
> > > Great find!! It would, however, be good to have that bugfix as a separate CL as it
> > > 1) Is not at all related to the beamformer.
> > > 2) Will also affect all calls where the beamformer is not active.
> > > 3) Will change the complexity (reduce).
> > >
> > > Would it be possible to extract that part into another CL, and land it before this CL is landed so that the changes in the first CL can be rebased into the basis for this CL?
> >
> > 1) It is related to the Beamformer, since...
> > 2) It is only triggered when there is a num_channels change, which only happens when Beamformer is enabled.
> >
> > But I agree it is better to break it out in a different CL. Done here: https://codereview.webrtc.org/2053773002/
> >
> > https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc...
> > File webrtc/modules/audio_processing/audio_processing_impl.cc (right):
> >
> > https://codereview.webrtc.org/1982183002/diff/80001/webrtc/modules/audio_proc...
> > webrtc/modules/audio_processing/audio_processing_impl.cc:683: private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f());
> > On 2016/06/09 09:36:30, peah-webrtc wrote:
> > > On 2016/06/09 02:11:46, aluebs-webrtc wrote:
> > > > On 2016/06/08 12:04:55, peah-webrtc wrote:
> > > > > What has changed here, is that now all the downmixing is done by the set_num_channels call, in contrast to that before the downmixing was done inside the beamformer, but the set_num_channels call was needed to correct for the new number of channels? Is that correct?
> > > >
> > > > What do you mean with downmixing? Before there was a frequency domain fixed delay and sum beamformer and then the number of channels was corrected in the AudioBuffer. Now all channels except the first one are discarded.
> > >
> > > By downmixing I mean going from 2 to 1 channel, so discarding the leftmost channel definitely counts as that.
> >
> > Personally I find "downmixing" to have a blending implication. But acknowledged.
>
> What is the status of this CL? Will it be updated once the preceding CL is landed? Or will there be another CL that is linked to the newly created CL, and thereby excludes the code changes in that CL?

I updated this CL to exclude the changes of the other one. It can land before or after the other CL. So please let me know if you have any additional comments on this change.
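As background for the bugfix that was split out above, a hedged illustration of the relaxed precondition. Only the check itself (RTC_DCHECK_LE on num_channels versus two_bands_states_.size()) is taken from the diff quoted in this thread; the surrounding types and function are simplified stand-ins, not the real splitting_filter.cc code, and assert is used in place of RTC_DCHECK_LE to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for this sketch only; not the real WebRTC types.
struct TwoBandsStates {};
struct AudioBufferLike { size_t num_channels; };

// Simplified merge step. Checking for equality would be too strict, because a
// submodule such as the beamformer may shrink the buffer's channel count
// between the analysis and synthesis passes. The merge therefore only
// requires that the current channel count not exceed the number of
// preallocated per-channel filter states.
void TwoBandsSynthesis(const AudioBufferLike& data,
                       const std::vector<TwoBandsStates>& two_bands_states) {
  assert(data.num_channels <= two_bands_states.size());
  // ... merge only the first data.num_channels channels ...
}
```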
peah, do you have any additional comments on this?
https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/audio_processing_impl.cc (right):

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/audio_processing_impl.cc:691: ca->set_num_channels(1);
The handling of the number of channels is a bit messy.

As it is now, the whole pipeline supports multichannel signals (maybe not in practice, but at least in theory).

Would it not make sense to set the number of channels here to the desired number of output channels, i.e. formats_.api_format.output_stream().num_channels()?

As it is now, that format is overridden when the beamformer is activated, which is not that nice. It may have been necessary to do it like that before this CL, but now that the pipeline supports multichannel processing, there is a case for changing this.

WDYT?

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:75: // Applies the postfilter mask to one chunk of audio. The audio is expected to
Suggestion: since the comments are quite long, readability would benefit from a blank line between AnalyzeChunk and the following comment.

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:78: // channels is expected to be 1, since that is the output number of channels
Please rewrite this comment: the constructor takes num_postfilter_channels as an input, so the number of channels can be more than 1. Furthermore, the ProcessChunk method is now called AnalyzeChunk and has no outputs.
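For concreteness, a hedged one-statement sketch of the suggestion above, not a tested change. The accessor chain is copied verbatim from the comment; everything else (including the idea that this line simply replaces the current one) is an assumption.

```cpp
// Today the capture path hard-codes mono after the beamformer analysis:
//   ca->set_num_channels(1);
// The proposal is to follow the requested output format instead (accessor
// chain copied verbatim from the comment above; untested):
ca->set_num_channels(formats_.api_format.output_stream().num_channels());
```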
Patchset #10 (id:220001) has been deleted
https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/audio_processing_impl.cc (right):

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/audio_processing_impl.cc:691: ca->set_num_channels(1);
On 2016/06/23 22:02:40, peah-webrtc wrote:
> The handling of the number of channels is a bit messy.
>
> As it is now, all the pipeline supports multichannel signals (maybe not in practice, but at least in theory).
>
> Would it not make sense to here set the number of channels equal to the desired number of output channels? I.e.: formats_.api_format.output_stream().num_channels()?
>
> As it is now, that format is overridden when the beamformer is activated which is not that nice. It may have been necessary to do it like that before this CL, but now the pipeline supports multichannel processing which would motivate changing this.
>
> WDYT?

That makes sense, but it should be a different CL. It has nothing to do with pulling out the PostFilter from the Beamformer.

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right):

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:75: // Applies the postfilter mask to one chunk of audio. The audio is expected to
On 2016/06/23 22:02:40, peah-webrtc wrote:
> Suggestion: Since the comments are quite long, I think the code would benefit in readability if there is a space between AnalyzeChunk and the following comment.

Done.

https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:78: // channels is expected to be 1, since that is the output number of channels
On 2016/06/23 22:02:40, peah-webrtc wrote:
> Please rewrite this comment as the constructor takes num_postfilter_channels as an input which means that the number of channels could be more than 1. Furthermore, the ProcessChunk method is now called AnalyzeChunk and has no outputs.

Done.
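To summarize the shape the capture path takes after this CL, a hedged sketch of the call ordering around the AEC. The AnalyzeChunk and set_num_channels calls are quoted from this thread; the PostFilter call site, its single in/out-parameter form (per the earlier patch set title), and everything between the calls are illustrative assumptions, not the actual audio_processing_impl.cc wiring.

```cpp
// Before the echo canceller: analysis only, so no nonlinear component enters
// the AEC path (call quoted from the review).
private_submodules_->beamformer->AnalyzeChunk(*ca->split_data_f());
// Discard all capture channels except the first one (call quoted from the
// review; see the downmixing discussion above).
ca->set_num_channels(1);

// ... the AEC and the other capture-side submodules process the chunk here ...

// After the echo canceller: apply the nonlinear postfilter mask in place
// (illustrative call site, assuming the one-parameter PostFilter signature).
private_submodules_->beamformer->PostFilter(ca->split_data_f());
```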
On 2016/06/24 03:00:54, aluebs-webrtc wrote:
> https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
> File webrtc/modules/audio_processing/audio_processing_impl.cc (right):
>
> https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
> webrtc/modules/audio_processing/audio_processing_impl.cc:691: ca->set_num_channels(1);
> On 2016/06/23 22:02:40, peah-webrtc wrote:
> > The handling of the number of channels is a bit messy.
> >
> > As it is now, all the pipeline supports multichannel signals (maybe not in practice, but at least in theory).
> >
> > Would it not make sense to here set the number of channels equal to the desired number of output channels? I.e.: formats_.api_format.output_stream().num_channels()?
> >
> > As it is now, that format is overridden when the beamformer is activated which is not that nice. It may have been necessary to do it like that before this CL, but now the pipeline supports multichannel processing which would motivate changing this.
> >
> > WDYT?
>
> That makes sense, but it should be a different CL. It has nothing to do with pulling out the PostFilter from the Beamformer.
>
> https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
> File webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h (right):
>
> https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
> webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:75: // Applies the postfilter mask to one chunk of audio. The audio is expected to
> On 2016/06/23 22:02:40, peah-webrtc wrote:
> > Suggestion: Since the comments are quite long, I think the code would benefit in readability if there is a space between AnalyzeChunk and the following comment.
>
> Done.
>
> https://codereview.webrtc.org/1982183002/diff/200001/webrtc/modules/audio_pro...
> webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h:78: // channels is expected to be 1, since that is the output number of channels
> On 2016/06/23 22:02:40, peah-webrtc wrote:
> > Please rewrite this comment as the constructor takes num_postfilter_channels as an input which means that the number of channels could be more than 1. Furthermore, the ProcessChunk method is now called AnalyzeChunk and has no outputs.
>
> Done.

lgtm
The CQ bit was checked by aluebs@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from peah@webrtc.org
Link to the patchset: https://codereview.webrtc.org/1982183002/#ps280001 (title: "Make NonlinearBeamforer interface virtual")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
aluebs@webrtc.org changed reviewers: + henrik.lundin@webrtc.org
henrik.lundin, could you please have a look at webrtc/common_audio/lapped_transform.cc. Thank you very much!
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_dbg on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/android_dbg/builds/14522)
lgtm
The CQ bit was checked by aluebs@webrtc.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.webrtc.org/...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: android_dbg on master.tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/android_dbg/builds/14565)
Description was changed from
==========
Pull out the PostFilter to its own NonlinearBeamformer API

This is done to avoid having a nonlinear component in the AEC path.
Now the linear delay and sum is run before the AEC and the postfilter after it.
==========
to
==========
Pull out the PostFilter to its own NonlinearBeamformer API

This is done to avoid having a nonlinear component in the AEC path.
Now the linear delay and sum is run before the AEC and the postfilter after it.

R=henrik.lundin@webrtc.org, peah@webrtc.org

Committed: https://chromium.googlesource.com/external/webrtc/+/b983112bc7686ed4276def4c9...
==========
Description was changed from
==========
Pull out the PostFilter to its own NonlinearBeamformer API

This is done to avoid having a nonlinear component in the AEC path.
Now the linear delay and sum is run before the AEC and the postfilter after it.

R=henrik.lundin@webrtc.org, peah@webrtc.org

Committed: https://chromium.googlesource.com/external/webrtc/+/b983112bc7686ed4276def4c9...
==========
to
==========
Pull out the PostFilter to its own NonlinearBeamformer API

This is done to avoid having a nonlinear component in the AEC path.
Now the linear delay and sum is run before the AEC and the postfilter after it.

R=henrik.lundin@webrtc.org, peah@webrtc.org

Committed: https://crrev.com/b983112bc7686ed4276def4c9215c498444c6bdd
Cr-Commit-Position: refs/heads/master@{#13314}
==========
Committed patchset #12 (id:280001) manually as b983112bc7686ed4276def4c9215c498444c6bdd (presubmit successful).
Patchset 12 (id:??) landed as https://crrev.com/b983112bc7686ed4276def4c9215c498444c6bdd
Cr-Commit-Position: refs/heads/master@{#13314}