|
|
Created:
5 years ago by minyue-webrtc Modified:
4 years, 11 months ago Reviewers:
peah-webrtc CC:
webrtc-reviews_webrtc.org, Andrew MacDonald, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, kwiberg-webrtc, the sun, aluebs-webrtc, bjornv1 Base URL:
https://chromium.googlesource.com/external/webrtc.git@master Target Ref:
refs/pending/heads/master Project:
webrtc Visibility:
Public. |
DescriptionCalculate audio levels in AEC in time domain.
In AEC, audio levels are calculated in frequency domain. This makes the calculation dependent on FFT. We now make the calculation performed in time domain. The complexity is the same, but the dependence on FFT is removed.
BUG=
Committed: https://crrev.com/9846845da6ee88bf16cb5fc62c6839ed7aafe04c
Cr-Commit-Position: refs/heads/master@{#11357}
Patch Set 1 #Patch Set 2 : removing test code #
Total comments: 6
Patch Set 3 : normalizing |noisePower| #Messages
Total messages: 19 (6 generated)
Description was changed from ========== Merge branch 'master' into UpdateLevelInTime Merge branch 'ace_farend_timedomain' into UpdateLevelInTime patch finished should get time domain signal BUG= ========== to ========== Calculate audio levels in AEC in time domain. In AEC, audio levels are calculated in frequency domain. This makes the calculation dependent on FFT. We now make the calculation performed in time domain. The complexity is the same, but the dependence on FFT is removed. BUG= ==========
minyue@webrtc.org changed reviewers: + peah@webrtc.org
Patchset #2 (id:20001) has been deleted
Hi Per, I changed the way of calculating audio level in AEC from frequency domain to time domain. Patch set 1 contains a comparison between the old way and the new way. There can be numerical differences, but the difference is always smaller than 0.001% of the old result. Patch set 2 removes the checking code, and removed a common factor on them. Because it is a common factor, the metrics (ERLE etc) will be unaffected (with certain precision) Sorry for not being able to make Patch set 1 success on all platforms. Release codes failed to compile because assert() is used. I cannot use RTC_CHECK since it is not for C-code. All debug bots succeeded.
https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/aec/aec_core.c (right): https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/aec/aec_core.c:575: return energy / num_samples; I'm a bit concerned with this computation, while it is an accurate power computation, it differs very much from how it was used before in the sense that: 1) Before it was an energy computation, now it is a power computation. 2) The energy was not correctly computed before, and neither does not match the way the power is computed. While this may be fine, I would like a more clear explanation to why the current implementation differs, why it is ok that it does that, and what any potential differences would be. In matlab: x = [1:8]; X = fft(x); true_time_domain_power = sum(x.^2)/8 true_time_domain_energy = sum(x.^2) true_frequency_domain_energy = sum(abs(X).^2)/8 previous_frequency_domain_energy_approximation = (sum(abs(X(1:4).^2)) + abs(X(5).^2))/8 To summarize: -In the new function, you compute true_time_domain_power -Before previous_frequency_domain_energy_approximation was used https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/aec/aec_core.c:1291: if (aec->metricsMode == 1) { I think it is better to bundle the UpdateLevel calls to happen in the same place as it improves readability. It will also save some lines in the code.
https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/aec/aec_core.c (right): https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/aec/aec_core.c:575: return energy / num_samples; On 2015/12/22 10:54:32, peah-webrtc wrote: > I'm a bit concerned with this computation, while it is an accurate power > computation, it differs very much from how it was used before in the sense that: > 1) Before it was an energy computation, now it is a power computation. > 2) The energy was not correctly computed before, and neither does not match the > way the power is computed. > > While this may be fine, I would like a more clear explanation to why the current > implementation differs, why it is ok that it does that, and what any potential > differences would be. > > > > In matlab: > x = [1:8]; > X = fft(x); > true_time_domain_power = sum(x.^2)/8 > true_time_domain_energy = sum(x.^2) > true_frequency_domain_energy = sum(abs(X).^2)/8 > previous_frequency_domain_energy_approximation = (sum(abs(X(1:4).^2)) + > abs(X(5).^2))/8 > > To summarize: > -In the new function, you compute > true_time_domain_power > -Before > previous_frequency_domain_energy_approximation > was used Your finding is very true, and that is the reason, if you look at the patch set 1, the power I calculate now is (2.0f / length) times the old value. 2.0f is because that old calculation is half the energy, which is unnatural. Using power (rather than energy of a predefined length) for various audio levels makes more sense to me. The only problem is that these audio levels will have different values. But looking at where they are used, we can see that they all are used in a form of "A / B" to obtain various metrics. And that is why normalizing these audio levels in this CL does not cause any unittest to fail. I have tested using different normalization on A and B, and by that our unittest fails.
https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/aec/aec_core.c (right): https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/aec/aec_core.c:575: return energy / num_samples; On 2015/12/22 11:20:01, minyue-webrtc wrote: > On 2015/12/22 10:54:32, peah-webrtc wrote: > > I'm a bit concerned with this computation, while it is an accurate power > > computation, it differs very much from how it was used before in the sense > that: > > 1) Before it was an energy computation, now it is a power computation. > > 2) The energy was not correctly computed before, and neither does not match > the > > way the power is computed. > > > > While this may be fine, I would like a more clear explanation to why the > current > > implementation differs, why it is ok that it does that, and what any potential > > differences would be. > > > > > > > > In matlab: > > x = [1:8]; > > X = fft(x); > > true_time_domain_power = sum(x.^2)/8 > > true_time_domain_energy = sum(x.^2) > > true_frequency_domain_energy = sum(abs(X).^2)/8 > > previous_frequency_domain_energy_approximation = (sum(abs(X(1:4).^2)) + > > abs(X(5).^2))/8 > > > > To summarize: > > -In the new function, you compute > > true_time_domain_power > > -Before > > previous_frequency_domain_energy_approximation > > was used > > Your finding is very true, and that is the reason, if you look at the patch set > 1, the power I calculate now is (2.0f / length) times the old value. 2.0f is > because that old calculation is half the energy, which is unnatural. > > Using power (rather than energy of a predefined length) for various audio levels > makes more sense to me. > > The only problem is that these audio levels will have different values. But > looking at where they are used, we can see that they all are used in a form of > "A / B" to obtain various metrics. And that is why normalizing these audio > levels in this CL does not cause any unittest to fail. I have tested using > different normalization on A and B, and by that our unittest fails. > > > I totally agree that it is better to use the power measure. And it is also good that the unittests do not fail with this change. I have some concerns, however: 1) Do you have a good understanding of the unittest coverage? Is it sufficient to ensure that the fact that the unittests don't fail stands as a good guarantee that this is a valid change? 2) Regarding the former validation code (that was removed by this patch): float power = CalculatePower(e, PART_LEN) * PART_LEN2 / 4.0f; assert(fabs(CalculatePowerOld(e_fft) - power) <= power * kEpsilonRatio); UpdateLevel(linout_level, power); What kind of test data did you validate it on? CalculatePowerOld(e_fft) and CalculatePower(e, PART_LEN) * PART_LEN2 / 4.0f computes the powers based on two different data sets, one 64 and the other 128 samples long, and this is correctly compensated for using the scaling of the output of CalculatePower. I cannot, however, see how it can be ensured that the difference could be lower than kEpsilonRatio as the data sets are different (albeit overlapping). I would expect the assert to trigger on any speech signal. Are you sure you have tested it with speech data using aec->metricsMode == 1? 3) Regarding the statement that they are all used in a form of A/B I found one place where that is not the case: line 620 if (aec->farlevel.minlevel < noisyPower) { noisePower needs to be corrected to ensure that that statement is correct with the new code. 2) https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/aec/aec_core.c:1291: if (aec->metricsMode == 1) { On 2015/12/22 10:54:32, peah-webrtc wrote: > I think it is better to bundle the UpdateLevel calls to happen in the same place > as it improves readability. It will also save some lines in the code. PTAL
On 2016/01/08 13:12:59, peah-webrtc wrote: > https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... > File webrtc/modules/audio_processing/aec/aec_core.c (right): > > https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... > webrtc/modules/audio_processing/aec/aec_core.c:575: return energy / num_samples; > On 2015/12/22 11:20:01, minyue-webrtc wrote: > > On 2015/12/22 10:54:32, peah-webrtc wrote: > > > I'm a bit concerned with this computation, while it is an accurate power > > > computation, it differs very much from how it was used before in the sense > > that: > > > 1) Before it was an energy computation, now it is a power computation. > > > 2) The energy was not correctly computed before, and neither does not match > > the > > > way the power is computed. > > > > > > While this may be fine, I would like a more clear explanation to why the > > current > > > implementation differs, why it is ok that it does that, and what any > potential > > > differences would be. > > > > > > > > > > > > In matlab: > > > x = [1:8]; > > > X = fft(x); > > > true_time_domain_power = sum(x.^2)/8 > > > true_time_domain_energy = sum(x.^2) > > > true_frequency_domain_energy = sum(abs(X).^2)/8 > > > previous_frequency_domain_energy_approximation = (sum(abs(X(1:4).^2)) + > > > abs(X(5).^2))/8 > > > > > > To summarize: > > > -In the new function, you compute > > > true_time_domain_power > > > -Before > > > previous_frequency_domain_energy_approximation > > > was used > > > > Your finding is very true, and that is the reason, if you look at the patch > set > > 1, the power I calculate now is (2.0f / length) times the old value. 2.0f is > > because that old calculation is half the energy, which is unnatural. > > > > Using power (rather than energy of a predefined length) for various audio > levels > > makes more sense to me. > > > > The only problem is that these audio levels will have different values. But > > looking at where they are used, we can see that they all are used in a form of > > "A / B" to obtain various metrics. And that is why normalizing these audio > > levels in this CL does not cause any unittest to fail. I have tested using > > different normalization on A and B, and by that our unittest fails. > > > > > > > > I totally agree that it is better to use the power measure. And it is also good > that the unittests do not fail with this change. > > I have some concerns, however: > > 1) Do you have a good understanding of the unittest coverage? Is it sufficient > to ensure that the fact that the unittests don't fail stands as a good guarantee > that this is a valid change? Good point. I will consider trying more signals in the unittest. > > 2) Regarding the former validation code (that was removed by this patch): > float power = CalculatePower(e, PART_LEN) * PART_LEN2 / 4.0f; > assert(fabs(CalculatePowerOld(e_fft) - power) <= power * kEpsilonRatio); > UpdateLevel(linout_level, power); > > What kind of test data did you validate it on? CalculatePowerOld(e_fft) > and > CalculatePower(e, PART_LEN) * PART_LEN2 / 4.0f > computes the powers based on two different data sets, one 64 and the other 128 > samples long, and this is correctly compensated for using the scaling of the > output of CalculatePower. I cannot, however, see how it can be ensured that the > difference could be lower than kEpsilonRatio as the data sets are different > (albeit overlapping). I would expect the assert to trigger on any speech signal. > It holds because we know that e_fft is FFT on a zero padded version of e. > Are you sure you have tested it with speech data using aec->metricsMode == 1? Yes, changing kEpsilonRatio can make the test fail. > > > 3) Regarding the statement that they are all used in a form of A/B I found one > place where that is not the case: line 620 > if (aec->farlevel.minlevel < noisyPower) { > noisePower needs to be corrected to ensure that that statement is correct with > the new code. good finding. It seems noisyPower was a heuristic large number. But any ways, I will take caution there. > > 2) > > https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... > webrtc/modules/audio_processing/aec/aec_core.c:1291: if (aec->metricsMode == 1) > { > On 2015/12/22 10:54:32, peah-webrtc wrote: > > I think it is better to bundle the UpdateLevel calls to happen in the same > place > > as it improves readability. It will also save some lines in the code. > > PTAL
> > > It holds because we know that e_fft is FFT on a zero padded version of e. > Ah, good point. Then it should be fine, and I expect that any need for kEpsilonRatio larger than 0 (approx) should be due to computational inaccuracies in the FFT.
On 2016/01/08 13:59:16, peah-webrtc wrote: > > > > > It holds because we know that e_fft is FFT on a zero padded version of e. > > > > Ah, good point. Then it should be fine, and I expect that any need for > kEpsilonRatio larger than 0 (approx) > should be due to computational inaccuracies in the FFT. Thanks for the comments. I did a more careful check on how current unittests protect the changes. I also did a normalization on |noisyPower| so that the change should give the same result almost every time. Unit tests (ApmTest.Process) have a good coverage on the metrics. They run nontrivial far-end and near-end signals (resourcs/far**_stereo.pcm and resourcs/near**_stereo.pcm) as input and check with stored references (data/audio_processing/output_data_float.pb). The test can only pass when the metrics equal to the reference. ~~~~ Why don't tests fail after this CL? This CL makes only tiny changes to the audio levels (without normalization). The changes is within 0.00001 times the original numbers, as shown in Patch set 1. The metrics are converted to integer, and therefore, it is rare that the metrics can change. To verify that the test does a protection, I have tried to introduce some error in the calculation, then the test fails. ~~~~ Why don't tests fail even when audio levels are normalized? This is because the metrics are based on devision between audio levels. The normalization factor cancels out. To verify that the test does a protection, I have tried to normalize them with different factors, then the test fails. ~~~~ Why don't tests fail even when |noisyPower| is not normalized accordingly? |noisyPower| plays as a threshold for noise. The number is very large and is thus indifferent from being normalized, in normal cases. To make |noisyPower| effectively unchanged, I have applied the normalization as on audio levels on it. Now it is true "noisePower" instead of "noiseEnergy" In addition, to verify that the test does a protection, I have tried to make |noisyPower| very small, then the test fails.
BTW, putting level updates in one "if (aec->metricsMode == 1)" is also made. https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... File webrtc/modules/audio_processing/aec/aec_core.c (right): https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... webrtc/modules/audio_processing/aec/aec_core.c:1291: if (aec->metricsMode == 1) { On 2016/01/08 13:12:59, peah-webrtc wrote: > On 2015/12/22 10:54:32, peah-webrtc wrote: > > I think it is better to bundle the UpdateLevel calls to happen in the same > place > > as it improves readability. It will also save some lines in the code. > > PTAL sure, will do.
On 2016/01/14 16:53:51, minyue-webrtc wrote: > BTW, putting level updates in one "if (aec->metricsMode == 1)" is also made. > > https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... > File webrtc/modules/audio_processing/aec/aec_core.c (right): > > https://codereview.webrtc.org/1542573002/diff/40001/webrtc/modules/audio_proc... > webrtc/modules/audio_processing/aec/aec_core.c:1291: if (aec->metricsMode == 1) > { > On 2016/01/08 13:12:59, peah-webrtc wrote: > > On 2015/12/22 10:54:32, peah-webrtc wrote: > > > I think it is better to bundle the UpdateLevel calls to happen in the same > > place > > > as it improves readability. It will also save some lines in the code. > > > > PTAL > > sure, will do. a friendly reminder for review on this
Nice! Great work! lgtm
The CQ bit was checked by minyue@webrtc.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1542573002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1542573002/60001
Message was sent while issue was closed.
Description was changed from ========== Calculate audio levels in AEC in time domain. In AEC, audio levels are calculated in frequency domain. This makes the calculation dependent on FFT. We now make the calculation performed in time domain. The complexity is the same, but the dependence on FFT is removed. BUG= ========== to ========== Calculate audio levels in AEC in time domain. In AEC, audio levels are calculated in frequency domain. This makes the calculation dependent on FFT. We now make the calculation performed in time domain. The complexity is the same, but the dependence on FFT is removed. BUG= ==========
Message was sent while issue was closed.
Committed patchset #3 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== Calculate audio levels in AEC in time domain. In AEC, audio levels are calculated in frequency domain. This makes the calculation dependent on FFT. We now make the calculation performed in time domain. The complexity is the same, but the dependence on FFT is removed. BUG= ========== to ========== Calculate audio levels in AEC in time domain. In AEC, audio levels are calculated in frequency domain. This makes the calculation dependent on FFT. We now make the calculation performed in time domain. The complexity is the same, but the dependence on FFT is removed. BUG= Committed: https://crrev.com/9846845da6ee88bf16cb5fc62c6839ed7aafe04c Cr-Commit-Position: refs/heads/master@{#11357} ==========
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/9846845da6ee88bf16cb5fc62c6839ed7aafe04c Cr-Commit-Position: refs/heads/master@{#11357} |