webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.h - Issue 1207353002: Add new variance update option and unittests for intelligibility

Side by Side Diff: webrtc/modules/audio_processing/intelligibility/intelligibility_enhancer.h

Issue 1207353002: Add new variance update option and unittests for intelligibility (Closed) Base URL: https://chromium.googlesource.com/external/webrtc.git@master

Patch Set: Renamed tests + minor changes Created 5 years, 5 months ago


OLD	NEW
1 /*	1 /*

2 * Copyright (c) 2014 The WebRTC project authors. All Rights Reserved.	2 * Copyright (c) 2014 The WebRTC project authors. All Rights Reserved.

3 *	3 *

4 * Use of this source code is governed by a BSD-style license	4 * Use of this source code is governed by a BSD-style license

5 * that can be found in the LICENSE file in the root of the source	5 * that can be found in the LICENSE file in the root of the source

6 * tree. An additional intellectual property rights grant can be found	6 * tree. An additional intellectual property rights grant can be found

7 * in the file PATENTS. All contributing project authors may	7 * in the file PATENTS. All contributing project authors may

8 * be found in the AUTHORS file in the root of the source tree.	8 * be found in the AUTHORS file in the root of the source tree.

9 */	9 */

10	10

(...skipping 24 matching lines...)
35 // Construct a new instance with the given filter bank resolution,	35 // Construct a new instance with the given filter bank resolution,

36 // sampling rate, number of channels and analysis rates.	36 // sampling rate, number of channels and analysis rates.

37 // \|analysis_rate\| sets the number of input blocks (containing speech!)	37 // \|analysis_rate\| sets the number of input blocks (containing speech!)

38 // to elapse before a new gain computation is made. \|variance_rate\| specifies	38 // to elapse before a new gain computation is made. \|variance_rate\| specifies

39 // the number of gain recomputations after which the variances are reset.	39 // the number of gain recomputations after which the variances are reset.

40 // \|cv_*\| are parameters for the VarianceArray constructor for the	40 // \|cv_*\| are parameters for the VarianceArray constructor for the

41 // clear speech stream.	41 // clear speech stream.

42 // TODO(bercic): the \|cv_\|, \|_rate\| and \|gain_limit\| parameters should	42 // TODO(bercic): the \|cv_\|, \|_rate\| and \|gain_limit\| parameters should

43 // probably go away once fine tuning is done. They override the internal	43 // probably go away once fine tuning is done. They override the internal

44 // constants in the class (kGainChangeLimit, kAnalyzeRate, kVarianceRate).	44 // constants in the class (kGainChangeLimit, kAnalyzeRate, kVarianceRate).

45 IntelligibilityEnhancer(int erb_resolution,	45 IntelligibilityEnhancer(int erb_resolution,
	hlundin-webrtc 2015/06/30 14:00:52
	This is a very complicated constructor call. Consider using a config struct instead. See for instance NetEq::Config in neteq.h. The config struct should typically have a constructor which populates the config with default values. A user will then only have to modify the values if they differ from the default. You can also add convenience methods to the struct, for instance std::string ToString() const for logging and bool IsOk() const for validating a configuration before sending it to the class constructor. (An extra benefit of including an IsOk() method in the Config struct is that the class constructor can simply CHECK(config.IsOk()) and go down in a ball of fire if that fails.)

	ekm 2015/07/01 23:48:25
	You're right. To stay uniform with the other APM components, how about using Constructor + Initialize for creation and SetExtraOptions to modify defaults? Might this change be better to include in the next cl, which integrates into the apm pipeline?

	Andrew MacDonald 2015/07/02 02:24:49
	Don't worry too much about uniformity with other components; some of the older ones may be less than ideal :) I think initializing everything in the constructor is fine, but is it really useful to expose all of these parameters? Do you intend to use them in a tool for optimizing them? If so, or if you want to keep some of them, an alternative is to use default arguments, which were recently permitted by the style guide in constructors. (Henrik's Config is a slick workaround for the ban on default arguments :) I'll send you an example off-review. But I'd make anything you're not intending to modify externally an internal const.

	hlundin-webrtc 2015/07/02 10:53:13
	I still think that a Config struct is a better option. Don't worry about the legacy code. Config structs are on the rise in the WebRTC code base, e.g., the new Call API and friends, new AudioEncoder classes. But, I'm fine with you doing that in a follow-up.

	ekm 2015/07/07 21:57:02
	Acknowledged. Some of these params should not be exposed at all; a few I'd like to keep for optimizing in a tool. I'll go for the Config in the next cl.
46 int sample_rate_hz,	46 int sample_rate_hz,

47 int channels,	47 int channels,

48 int cv_type,	48 int cv_type,

49 float cv_alpha,	49 float cv_alpha,

50 int cv_win,	50 int cv_win,

51 int analysis_rate,	51 int analysis_rate,

52 int variance_rate,	52 int variance_rate,

53 float gain_limit);	53 float gain_limit);
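
For reference, the config-struct approach raised in the review thread on this constructor could look roughly like the sketch below. It is not code from this CL or from NetEq: the struct name, the stand-in class `EnhancerSketch`, and every default value are hypothetical placeholders, and `assert` stands in for WebRTC's `CHECK`. Only the field names are taken from the constructor parameters above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical config struct. Field names mirror a subset of the constructor
// parameters in this patch; the defaults are placeholders, not WebRTC's values.
struct IntelligibilityEnhancerConfig {
  IntelligibilityEnhancerConfig()
      : erb_resolution(2),
        sample_rate_hz(16000),
        channels(1),
        analysis_rate(800),
        variance_rate(2),
        gain_limit(0.1f) {}

  // Returns true if every field is in a usable range.
  bool IsOk() const {
    return erb_resolution > 0 && sample_rate_hz > 0 && channels > 0 &&
           analysis_rate > 0 && variance_rate > 0 && gain_limit > 0.0f;
  }

  // Human-readable dump for logging.
  std::string ToString() const {
    std::ostringstream ss;
    ss << "erb_resolution=" << erb_resolution
       << ", sample_rate_hz=" << sample_rate_hz
       << ", channels=" << channels
       << ", analysis_rate=" << analysis_rate
       << ", variance_rate=" << variance_rate
       << ", gain_limit=" << gain_limit;
    return ss.str();
  }

  int erb_resolution;
  int sample_rate_hz;
  int channels;
  int analysis_rate;
  int variance_rate;
  float gain_limit;
};

// Stand-in for the real class: validates the config once, then stores it.
class EnhancerSketch {
 public:
  explicit EnhancerSketch(const IntelligibilityEnhancerConfig& config)
      : config_(config) {
    assert(config_.IsOk());  // The review suggests CHECK(config.IsOk()).
  }
  const IntelligibilityEnhancerConfig& config() const { return config_; }

 private:
  const IntelligibilityEnhancerConfig config_;
};
```

A caller would default-construct the config, override only the fields that differ (for example `config.channels = 2;`), and pass it to the constructor.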

54 ~IntelligibilityEnhancer();	54 ~IntelligibilityEnhancer();

55	55

(...skipping 20 matching lines...)
76 int in_channels,	76 int in_channels,

77 int frames,	77 int frames,

78 int out_channels,	78 int out_channels,

79 std::complex<float>* const* out_block);	79 std::complex<float>* const* out_block);

80	80

81 private:	81 private:

82 IntelligibilityEnhancer* parent_;	82 IntelligibilityEnhancer* parent_;

83 AudioSource source_;	83 AudioSource source_;

84 };	84 };

85 friend class TransformCallback;	85 friend class TransformCallback;

	86 FRIEND_TEST_ALL_PREFIXES(IntelligibilityEnhancerTest, TestErbCreation);

	87 FRIEND_TEST_ALL_PREFIXES(IntelligibilityEnhancerTest, TestSolveForGains);

86	88

87 // Sends streams to ProcessClearBlock or ProcessNoiseBlock based on source.	89 // Sends streams to ProcessClearBlock or ProcessNoiseBlock based on source.

88 void DispatchAudio(AudioSource source,	90 void DispatchAudio(AudioSource source,

89 const std::complex<float>* in_block,	91 const std::complex<float>* in_block,

90 std::complex<float>* out_block);	92 std::complex<float>* out_block);

91	93

92 // Updates variance computation and analysis with \|in_block_\|,	94 // Updates variance computation and analysis with \|in_block_\|,

93 // and writes modified speech to \|out_block\|.	95 // and writes modified speech to \|out_block\|.

94 void ProcessClearBlock(const std::complex<float>* in_block,	96 void ProcessClearBlock(const std::complex<float>* in_block,

95 std::complex<float>* out_block);	97 std::complex<float>* out_block);

96	98

97 // Computes and sets modified gains.	99 // Computes and sets modified gains.

98 void AnalyzeClearBlock(float power_target);	100 void AnalyzeClearBlock(float power_target);

99	101

	102 // Bisection search for optimal \|lambda\|.

	103 void SolveForLambda(float power_target, float power_bot, float power_top);
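
The body of SolveForLambda lives in the .cc file and is not shown in this diff. As a generic illustration of a bisection search toward a target value, here is a minimal sketch. The function name `BisectForTarget`, its parameters, and the assumption that the searched function is continuous and monotonically increasing on the interval are all hypothetical; the real method may search in the opposite direction or use a different stopping rule.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Generic bisection: returns x in [lo, hi] such that f(x) is within
// |tolerance| of |target|, or the last midpoint after |max_iterations|.
// Assumes f is continuous and monotonically increasing on [lo, hi].
float BisectForTarget(const std::function<float(float)>& f,
                      float target,
                      float lo,
                      float hi,
                      float tolerance,
                      int max_iterations) {
  assert(lo < hi);
  float mid = 0.5f * (lo + hi);
  for (int i = 0; i < max_iterations; ++i) {
    mid = 0.5f * (lo + hi);
    const float value = f(mid);
    if (std::fabs(value - target) <= tolerance) {
      break;  // Close enough to the target.
    }
    if (value < target) {
      lo = mid;  // Target lies in the upper half.
    } else {
      hi = mid;  // Target lies in the lower half.
    }
  }
  return mid;
}
```

In the enhancer, the role of `f` would presumably be played by the output power as a function of lambda, with kLambdaBot and kLambdaTop as the initial interval, but that mapping is an assumption.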

	104

	105 // Transforms freq gains to ERB gains.

	106 void UpdateErbGains();

	107

100 // Updates variance calculation for noise input with \|in_block\|.	108 // Updates variance calculation for noise input with \|in_block\|.

101 void ProcessNoiseBlock(const std::complex<float>* in_block,	109 void ProcessNoiseBlock(const std::complex<float>* in_block,

102 std::complex<float>* out_block);	110 std::complex<float>* out_block);

103	111

104 // Returns number of ERB filters.	112 // Returns number of ERB filters.

105 static int GetBankSize(int sample_rate, int erb_resolution);	113 static int GetBankSize(int sample_rate, int erb_resolution);

106	114

107 // Initializes ERB filterbank.	115 // Initializes ERB filterbank.

108 void CreateErbBank();	116 void CreateErbBank();

109	117

110 // Analytically solves quadratic for optimal gains given \|lambda\|.	118 // Analytically solves quadratic for optimal gains given \|lambda\|.

111 // Negative gains are set to 0. Stores the results in \|sols\|.	119 // Negative gains are set to 0. Stores the results in \|sols\|.

112 void SolveForGainsGivenLambda(float lambda, int start_freq, float* sols);	120 void SolveForGainsGivenLambda(float lambda, int start_freq, float* sols);

113	121

114 // Computes variance across ERB filters from freq variance \|var\|.	122 // Computes variance across ERB filters from freq variance \|var\|.

115 // Stores in \|result\|.	123 // Stores in \|result\|.

116 void FilterVariance(const float* var, float* result);	124 void FilterVariance(const float* var, float* result);

117	125

118 // Returns dot product of vectors specified by size \|length\| arrays \|a\|,\|b\|.	126 // Returns dot product of vectors specified by size \|length\| arrays \|a\|,\|b\|.

119 static float DotProduct(const float* a, const float* b, int length);	127 static float DotProduct(const float* a, const float* b, int length);
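
A helper with this signature can be written in one line with the standard library. The sketch below is an assumption about one reasonable implementation, not the code in the .cc file; the name `DotProductSketch` is hypothetical.

```cpp
#include <cassert>
#include <numeric>

// Returns sum(a[i] * b[i]) for i in [0, length). Both arrays must hold at
// least |length| elements.
float DotProductSketch(const float* a, const float* b, int length) {
  assert(length >= 0);
  return std::inner_product(a, a + length, b, 0.0f);
}
```

Passing `0.0f` as the initial value keeps the accumulation in float; an `int` literal there would truncate every partial sum.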

120	128

121 static const int kErbResolution;	129 static const int kErbResolution;
	hlundin-webrtc 2015/06/30 14:00:52
	Static const data members should go before methods. http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Declaration_O...

	ekm 2015/07/01 23:48:25
	Done.
122 static const int kWindowSizeMs;	130 static const int kWindowSizeMs;

123 static const int kChunkSizeMs;	131 static const int kChunkSizeMs;

124 static const int kAnalyzeRate; // Default for \|analysis_rate_\|.	132 static const int kAnalyzeRate; // Default for \|analysis_rate_\|.

125 static const int kVarianceRate; // Default for \|variance_rate_\|.	133 static const int kVarianceRate; // Default for \|variance_rate_\|.

126 static const float kClipFreq;	134 static const float kClipFreq;

127 static const float kConfigRho; // Default production and interpretation SNR.	135 static const float kConfigRho; // Default production and interpretation SNR.

128 static const float kKbdAlpha;	136 static const float kKbdAlpha;

129 static const float kGainChangeLimit;	137 static const float kGainChangeLimit;

	138 static const float kLambdaBot; // Extreme values in bisection

	139 static const float kLambdaTop; // search for lambda.

130	140

131 const int freqs_; // Num frequencies in frequency domain.	141 const int freqs_; // Num frequencies in frequency domain.

132 const int window_size_; // Window size in samples; also the block size.	142 const int window_size_; // Window size in samples; also the block size.

133 const int chunk_length_; // Chunk size in samples.	143 const int chunk_length_; // Chunk size in samples.

134 const int bank_size_; // Num ERB filters.	144 const int bank_size_; // Num ERB filters.

135 const int sample_rate_hz_;	145 const int sample_rate_hz_;

136 const int erb_resolution_;	146 const int erb_resolution_;

137 const int channels_; // Num channels.	147 const int channels_; // Num channels.

138 const int analysis_rate_; // Num blocks before gains recalculated.	148 const int analysis_rate_; // Num blocks before gains recalculated.

139 const int variance_rate_; // Num recalculations before history is cleared.	149 const int variance_rate_; // Num recalculations before history is cleared.

(...skipping 29 matching lines...)
169 // Note: VAD currently does not affect anything in IntelligibilityEnhancer.	179 // Note: VAD currently does not affect anything in IntelligibilityEnhancer.

170 VadInst* vad_high_;	180 VadInst* vad_high_;

171 VadInst* vad_low_;	181 VadInst* vad_low_;

172 rtc::scoped_ptr<int16_t[]> vad_tmp_buffer_;	182 rtc::scoped_ptr<int16_t[]> vad_tmp_buffer_;

173 bool has_voice_low_; // Whether voice detected in speech stream.	183 bool has_voice_low_; // Whether voice detected in speech stream.

174 };	184 };

175	185

176 } // namespace webrtc	186 } // namespace webrtc

177	187

178 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_ENHANCER_H_	188 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_ENHANCER_H_
