webrtc/modules/audio_processing/vad/voice_activity_detector.h - Issue 1181933002: Pull the Voice Activity Detector out from the AGC

Side by Side Diff: webrtc/modules/audio_processing/vad/voice_activity_detector.h

Issue 1181933002: Pull the Voice Activity Detector out from the AGC (Closed) Base URL: https://chromium.googlesource.com/external/webrtc.git@master

Patch Set: Created 5 years, 6 months ago


OLD	NEW
(Empty)
	1 /*
	Andrew MacDonald 2015/06/15 04:32:25 After all reviewers have LGTM'd, please run git cl format on your totally new files.
	aluebs-webrtc 2015/06/16 01:17:52 Acknowledged.
	2 * Copyright (c) 2015 The WebRTC project authors. All Rights Reserved.

	3 *

	4 * Use of this source code is governed by a BSD-style license

	5 * that can be found in the LICENSE file in the root of the source

	6 * tree. An additional intellectual property rights grant can be found

	7 * in the file PATENTS. All contributing project authors may

	8 * be found in the AUTHORS file in the root of the source tree.

	9 */

	10

	11 #ifndef WEBRTC_MODULES_AUDIO_PROCESSING_VAD_VOICE_ACTIVITY_DETECTOR_H_

	12 #define WEBRTC_MODULES_AUDIO_PROCESSING_VAD_VOICE_ACTIVITY_DETECTOR_H_

	13

	14 #include <vector>

	15

	16 #include "webrtc/base/scoped_ptr.h"

	17 #include "webrtc/common_audio/resampler/include/resampler.h"

	18 #include "webrtc/modules/audio_processing/vad/vad_audio_proc.h"

	19 #include "webrtc/modules/audio_processing/vad/common.h"

	20 #include "webrtc/modules/audio_processing/vad/pitch_based_vad.h"

	21 #include "webrtc/modules/audio_processing/vad/standalone_vad.h"

	22

	23 namespace webrtc {

	24

	25 // A Voice Activity Detector (VAD) that combines the voice probability from the

	26 // StandaloneVad and PitchBasedVad to get a more robust estimation.

	27 class VoiceActivityDetector {

	28 public:

	29 VoiceActivityDetector();

	30

	31 // Processes each capture audio chunk and estimates the voice probability. The

	32 // maximum supported sample rate is 32kHz.

	33 void ProcessCaptureAudio(const int16_t* audio, int length);
	Andrew MacDonald 2015/06/15 04:32:25 Perhaps ProcessChunk instead? It doesn't have to be captured audio.
	aluebs-webrtc 2015/06/16 01:17:53 Good point! Done.
	34

	35 // Because ISAC has a different chunk length, it returns a zero size vector
	Andrew MacDonald 2015/06/15 04:32:25 This is an implementation detail; you could put it in the implementation of ProcessCaptureAudio. Here, I think you should discuss what this method returns exactly (and for RMS below).
	aluebs-webrtc 2015/06/16 01:17:52 Done.
	36 // every time there is no new data and then return a few chunk's data at once.
	bloch 2015/06/12 23:34:38 I'm thinking about what we discussed; I think I (personally) prefer this method. It's closer to honestly reporting chunk-by-chunk data. However, will the returned vector have all identical values? If so, it might make sense to just return one. If not, it'll likely be too late to update the probability of a past chunk, assuming you're using software that works in real time.
	aluebs-webrtc 2015/06/16 01:17:52 It will not have identical values. As you point out, in real time, you can't update the probability of a past chunk. But it is used by the AGC for updating a histogram with all the values.
	37 std::vector<double> chunkwise_voice_probabilities() const {

	38 return chunkwise_voice_probabilities_;
	Andrew MacDonald 2015/06/15 04:32:26 Since this is called on every chunk, you don't want to return by value here. Return a "const std::vector<double>&" instead.
	aluebs-webrtc 2015/06/16 01:17:52 Done.
	39 }

	40 std::vector<double> chunkwise_rms() const { return chunkwise_rms_; }
	Andrew MacDonald 2015/06/15 04:32:25 As above, return a const reference.
	aluebs-webrtc 2015/06/16 01:17:52 Done.
	41

	42 // Returns the last voice probability, regardless of the internal

	43 // implementation, although it has a few chunks of delay.

	44 float last_voice_probability() const { return last_voice_probability_; }
	Andrew MacDonald 2015/06/15 04:32:25 Should definitely convert things to float later given this inconsistency :)
	aluebs-webrtc 2015/06/16 01:17:53 Fixed this inconsistency in this CL, since it makes no sense, but will convert everything to float on another CL.
	45

	46 private:

	47 std::vector<double> chunkwise_voice_probabilities_;

	48 std::vector<double> chunkwise_rms_;

	49

	50 double last_voice_probability_;
	bloch 2015/06/12 23:34:38 I know you're just refactoring old code, but is there a reason we're returning doubles instead of floats? There's no way we have enough significant digits from the calculation to need double the precision of just a floating-point number, right?
	Andrew MacDonald 2015/06/15 04:32:26 Agreed, I'm sure these could be floats. However it involves touching the moved code as well. Alex, do you want to do that in a follow-up?
	aluebs-webrtc 2015/06/16 01:17:52 Yes, I can do this in a follow up CL. But I changed to float here, to avoid a cast.
	51

	52 Resampler resampler_;

	53 VadAudioProc audio_processing_;

	54

	55 rtc::scoped_ptr<StandaloneVad> standalone_vad_;

	56 PitchBasedVad pitch_based_vad_;

	57

	58 int16_t resampled_[kLength10Ms];

	59 AudioFeatures features_;

	60 };

	61

	62 } // namespace webrtc

	63

	64 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_VAD_VOICE_ACTIVITY_DETECTOR_H_
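
The comments on this header ask for the chunk-wise accessors to return const references, and they note that the returned vector can be empty on some calls and hold several differing values on others, all of which the AGC feeds into a histogram. A minimal sketch of that pattern follows. `FakeDetector`, `SetChunkwiseProbabilities`, and `Histogram` are hypothetical stand-ins invented for illustration; they are not part of this CL or of WebRTC, and the ten-bin layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the detector under review. It models only the
// accessor shape discussed in the comments, not the real VAD logic.
class FakeDetector {
 public:
  // Replaces the stored per-chunk values. An empty vector models a call
  // that produced no new data.
  void SetChunkwiseProbabilities(const std::vector<double>& values) {
    chunkwise_voice_probabilities_ = values;
  }

  // Returning a const reference avoids copying the vector on every chunk.
  const std::vector<double>& chunkwise_voice_probabilities() const {
    return chunkwise_voice_probabilities_;
  }

 private:
  std::vector<double> chunkwise_voice_probabilities_;
};

// Hypothetical consumer: counts probabilities into ten equal-width bins.
class Histogram {
 public:
  Histogram() : bins_(10, 0) {}

  // Accepts zero, one, or several values per call, so the caller does not
  // need to special-case calls that returned no new data.
  void Update(const std::vector<double>& probabilities) {
    for (double p : probabilities) {
      if (p < 0.0 || p > 1.0) continue;  // Ignore out-of-range values.
      std::size_t bin = static_cast<std::size_t>(p * 10.0);
      if (bin == bins_.size()) bin = bins_.size() - 1;  // p == 1.0
      ++bins_[bin];
    }
  }

  int count(std::size_t bin) const {
    assert(bin < bins_.size());
    return bins_[bin];
  }

 private:
  std::vector<int> bins_;
};
```

A caller would pass `detector.chunkwise_voice_probabilities()` straight to `histogram.Update(...)` after each processed chunk. The reference is only valid until the detector next overwrites the vector, so a caller that needs the values later has to copy them explicitly.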
