|
|
Created:
4 years, 8 months ago by aluebs-webrtc Modified:
4 years, 8 months ago CC:
webrtc-reviews_webrtc.org, peah-webrtc, Andrew MacDonald, tterriberry_mozilla.com, audio-team_agora.io, hlundin-webrtc, kwiberg-webrtc, minyue-webrtc, the sun, bjornv1 Base URL:
https://chromium.googlesource.com/external/webrtc.git@master Target Ref:
refs/pending/heads/master Project:
webrtc Visibility:
Public. |
DescriptionDisable Intelligibility Enhancer for high SNRs
Committed: https://crrev.com/2fae89ed0d7dd54d4649b6dcbf5a6f0a33804469
Cr-Commit-Position: refs/heads/master@{#12352}
Patch Set 1 #
Total comments: 6
Patch Set 2 : Naming #Patch Set 3 : Fix division by zero #
Messages
Total messages: 19 (7 generated)
aluebs@webrtc.org changed reviewers: + henrik.lundin@webrtc.org, peah@webrtc.org, turaj@webrtc.org
lgtm, but with the additional comment that I have concerns about whether it makes sense to control the IE effect based on the estimate of the digital SNR without having any mapping from this to the acoustic SNR at the ear of the listener. https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:168: void IntelligibilityEnhancer::UpdateActivity() { What you are updating here is the is_active flag and gains, right? And the is_active flag is a flag for whether the effect of the IE should be active or not, and not for whether there is activity in the render signal, right? If I did not get that wrong, I think this method should be modified or renamed. since to me, activity means speech activity. What about renaming it to ControlEffectApplication, or SnrBasedEffectActivation which in my mind describe in more detail what is being done. https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:175: snr_ = kDecayRate * snr_ + (1.f - kDecayRate) * clear_power / noise_power; This SNR estimate is an average of the instantaneous SNR. An alternative could be to use the average of the overall SNR. Have you considered that (I'm not saying this is wrong). https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:175: snr_ = kDecayRate * snr_ + (1.f - kDecayRate) * clear_power / noise_power; This SNR estimate is assuming that the ratio of the clear_psd and noise_psd matches the ratio at the ear of the listener. What happens if the listener is using headphones? Then this ratio is very different from what the SNR is at the ear of the listener. The same is the case if a device is used in speaker mode. I don't really see any point in adding functionality for tuning the application of the IE effect based on the digital SNR until a mapping is in place to map this to the acoustic SNR at the ear. But I'm fine with the code change.
hlundin, turaj, could you please take a look when you have some time? Thanks! :) https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... File webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc (right): https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:168: void IntelligibilityEnhancer::UpdateActivity() { On 2016/04/12 13:39:21, peah-webrtc wrote: > What you are updating here is the is_active flag and gains, right? And the > is_active flag is a flag for whether the effect of the IE should be active or > not, and not for whether there is activity in the render signal, right? > > If I did not get that wrong, I think this method should be modified or renamed. > since to me, activity means speech activity. > What about renaming it to ControlEffectApplication, or SnrBasedEffectActivation > which in my mind describe in more detail what is being done. > > Yes, your understanding is completely right. And I agree that your naming suggestion is more intuitive. Done. https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:175: snr_ = kDecayRate * snr_ + (1.f - kDecayRate) * clear_power / noise_power; On 2016/04/12 13:39:21, peah-webrtc wrote: > This SNR estimate is an average of the instantaneous SNR. An alternative could > be to use the average of the overall SNR. Have you considered that (I'm not > saying this is wrong). That is an interesting point. Because the PSDs are already filtered over time already, this is not exactly an average of the instantaneous SNR, but more of an average of an averaged SNR, if that makes some sense. This additional filtering is just to ensure its smoothness. https://codereview.webrtc.org/1878133002/diff/1/webrtc/modules/audio_processi... webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.cc:175: snr_ = kDecayRate * snr_ + (1.f - kDecayRate) * clear_power / noise_power; On 2016/04/12 13:39:21, peah-webrtc wrote: > This SNR estimate is assuming that the ratio of the clear_psd and noise_psd > matches the ratio at the ear of the listener. What happens if the listener is > using headphones? Then this ratio is very different from what the SNR is at the > ear of the listener. The same is the case if a device is used in speaker mode. > > I don't really see any point in adding functionality for tuning the application > of the IE effect based on the digital SNR until a mapping is in place to map > this to the acoustic SNR at the ear. > > But I'm fine with the code change. As discussed offline at the beginning of this project, with some broad assumptions we can estimate acoustic SNRs from the digital one good enough for the IE to improve the intelligibility of the signal. But also, I am testing right now on a real device if this holds true and at the same time working on a mapping I suggested to see if it improves the relation between the SNRs. On the other hand, what we decided was to enable this feature first only for headphones and phone mode (no speaker phone), so we can focus on that and delay the additional tweaking to later on the process. I think this code is valuable as of today, but I agree that the thresholds will need to be adjusted if we apply any mapping.
I thought the method you added should be called within intelligibility to automatically switch the effect on/off. I didn't see anything of that sort. Are you planning to leave that to the client of IE? I still think having an API that one can change these values is useful. For instance we can have a two ctor one that takes a larger number of the tuning parameters, and one that takes subset/or none and calls the other one with the default values. Obviously not for this CL, just something to think of. It came to my mind again, as these parameters might be more device dependent that the rest.
On 2016/04/12 19:33:30, turaj wrote: > I thought the method you added should be called within intelligibility to > automatically switch the effect on/off. I didn't see anything of that sort. Are > you planning to leave that to the client of IE? > > I still think having an API that one can change these values is useful. For > instance we can have a two ctor one that takes a larger number of the tuning > parameters, and one that takes subset/or none and calls the other one with the > default values. Obviously not for this CL, just something to think of. It came > to my mind again, as these parameters might be more device dependent that the > rest. SnrBasedEffectActivation is called every block and updates is_active_, which is used in an if-statement to calculate the gains. The noise and PSD estimation has to continue running to avoid having to converge each time the component is re-enabled. I'll consider re-adding a constructor with more parameters in another CL, but hopefully this isn't too device specific.
Sorry, I missed that line. I was almost sure I was missing something. LGTM.
Neat and nice! LGTM.
The CQ bit was checked by aluebs@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from peah@webrtc.org Link to the patchset: https://codereview.webrtc.org/1878133002/#ps20001 (title: "Naming")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1878133002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1878133002/20001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_ubsan on tryserver.webrtc (JOB_FAILED, http://build.chromium.org/p/tryserver.webrtc/builders/linux_ubsan/builds/1194)
The CQ bit was checked by aluebs@webrtc.org
The patchset sent to the CQ was uploaded after l-g-t-m from henrik.lundin@webrtc.org, peah@webrtc.org, turaj@webrtc.org Link to the patchset: https://codereview.webrtc.org/1878133002/#ps40001 (title: "Fix division by zero")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1878133002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1878133002/40001
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Description was changed from ========== Disable Intelligibility Enhancer for high SNRs ========== to ========== Disable Intelligibility Enhancer for high SNRs Committed: https://crrev.com/2fae89ed0d7dd54d4649b6dcbf5a6f0a33804469 Cr-Commit-Position: refs/heads/master@{#12352} ==========
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/2fae89ed0d7dd54d4649b6dcbf5a6f0a33804469 Cr-Commit-Position: refs/heads/master@{#12352} |