webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h - Issue 1693823004: Use VAD to get a better speech power estimation in the IntelligibilityEnhancer

Side by Side Diff: webrtc/modules/audio_processing/intelligibility/intelligibility_utils.h

Issue 1693823004: Use VAD to get a better speech power estimation in the IntelligibilityEnhancer (Closed) Base URL: https://chromium.googlesource.com/external/webrtc.git@pow

Patch Set: Make gain change limit relative Created 4 years, 10 months ago

OLD	NEW
1 /*	1 /*

2 * Copyright (c) 2014 The WebRTC project authors. All Rights Reserved.	2 * Copyright (c) 2014 The WebRTC project authors. All Rights Reserved.

3 *	3 *

4 * Use of this source code is governed by a BSD-style license	4 * Use of this source code is governed by a BSD-style license

5 * that can be found in the LICENSE file in the root of the source	5 * that can be found in the LICENSE file in the root of the source

6 * tree. An additional intellectual property rights grant can be found	6 * tree. An additional intellectual property rights grant can be found

7 * in the file PATENTS. All contributing project authors may	7 * in the file PATENTS. All contributing project authors may

8 * be found in the AUTHORS file in the root of the source tree.	8 * be found in the AUTHORS file in the root of the source tree.

9 */	9 */

10	10

11 //

12 // Specifies helper classes for intelligibility enhancement.

13 //

14

15 #ifndef WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_UTILS_H_	11 #ifndef WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_UTILS_H_

16 #define WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_UTILS_H_	12 #define WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_UTILS_H_

17	13

18 #include <complex>	14 #include <complex>

	15 #include <vector>

19	16

20 #include "webrtc/base/scoped_ptr.h"	17 #include "webrtc/base/scoped_ptr.h"

21	18

22 namespace webrtc {	19 namespace webrtc {

23	20

24 namespace intelligibility {	21 namespace intelligibility {

25	22

26 // Return |current| changed towards |target|, with the change being at most	23 // Internal helper for computing the power of a stream of arrays.

27 // |limit|.	24 // The result is an array of power per position: the i-th power is the power of

28 float UpdateFactor(float target, float current, float limit);	25 // the stream of data on the i-th positions in the input arrays.

	26 class PowerEstimator {

	27 public:

	28 // Construct an instance for the given input array length (|freqs|), with the

	29 // appropriate parameters. |decay| is the forgetting factor.

	30 PowerEstimator(size_t freqs, float decay);

29	31

30 // Apply a small fudge to degenerate complex values. The numbers in the array	32 // Add a new data point to the series.

31 // were chosen randomly, so that even a series of all zeroes has some small	33 template <typename T>
turaj 2016/02/19 16:48:47
I'm sure you have compiled this code; just out of my curiosity, don't templates usually need instantiations? Wasn't that necessary here? And why is the implementation in the header file, while the implementation of the constructor is in the .cc file?
I'm not experienced with templates, but I would have templated the class and instantiated the possible templates. I guess in this case the instantiations would be <float> and <std::complex<float>>.

aluebs-webrtc 2016/02/19 19:30:48
I am not experienced with templates either, but I decided to templatize only the function, since the implementation is the same and power_ has the same type; only the input type of that function changes. But maybe it is preferable to templatize the whole class, I am not sure. Templates only need explicit instantiations when implemented in the .cc file instead of the header, so I implemented the template function in the header to get the additional flexibility of not having to instantiate it explicitly. If you prefer a class template, I am happy to change it, since I am not sure myself what the criteria are for choosing between a function template and a class template. Or, if you prefer, I can put the implementation in the .cc file and restrict the template with explicit instantiations.

turaj 2016/02/22 17:36:53
Like you, I really don't know what is preferred with respect to templating a member function vs. templating the class, nor header-file vs. source-file implementation. However, the reason I proposed a template, besides saving a couple of lines of code, was to prevent a potential bug from mixing complex and float averaging. If that concern is valid (which I'm not sure of, as this is an internal class and only intelligibility is using it), then templating like this does not serve the purpose. Things can still go wrong.
Templating the class, on the other hand, prevents the above issue, as intelligibility will have
  PowerEstimator<float> noise_power_;
  PowerEstimator<std::complex<float>> speech_power_;
Given the above, I slightly prefer class templating, but I don't have a strong opinion, so I leave it up to you.

aluebs-webrtc 2016/02/22 20:41:03
That is an excellent point! Changed it to a template class.
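The review thread above settles on templating the class instead of Step(). A minimal sketch of that class-template form, assuming the same recursive update as the member-function version in this patch set. It lives in a hypothetical `sketch` namespace, uses `std::size_t` and `std::vector` to stay self-contained, and makes `power()` const; it illustrates the idea and is not the code that eventually landed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical class-template PowerEstimator: the input element type T is
// fixed per instance, so an estimator declared for float data rejects
// complex input at compile time (and vice versa).
template <typename T>
class PowerEstimator {
 public:
  // |freqs| is the input array length; |decay| is the forgetting factor.
  PowerEstimator(std::size_t freqs, float decay)
      : power_(freqs, 0.f), decay_(decay) {
    assert(decay >= 0.f && decay <= 1.f);
  }

  // Exponentially weighted update of each position:
  // power[i] = decay * power[i] + (1 - decay) * |data[i]|^2.
  void Step(const T* data) {
    for (std::size_t i = 0; i < power_.size(); ++i) {
      const float magnitude = std::abs(data[i]);  // float or complex overload
      power_[i] = decay_ * power_[i] + (1.f - decay_) * magnitude * magnitude;
    }
  }

  // The current power array.
  const std::vector<float>& power() const { return power_; }

 private:
  std::vector<float> power_;
  const float decay_;
};

}  // namespace sketch
```

With this form, `noise_power_` and `speech_power_` from the thread are distinct types: passing a `const std::complex<float>*` to `PowerEstimator<float>::Step()` fails to compile because there is no implicit conversion to `const float*`. Since the whole template is in the header, no explicit instantiation is needed.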
32 // variability.	34 void Step(const T* data) {

33 std::complex<float> zerofudge(std::complex<float> c);	35 for (size_t i = 0; i < power_.size(); ++i) {

	36 power_[i] = decay_ * power_[i] +

	37 (1.f - decay_) * std::abs(data[i]) * std::abs(data[i]);

	38 }

	39 }

34	40

35 // Incremental mean computation. Return the mean of the series with the	41 // The current power array.

36 // mean |mean| with added |data|.	42 const std::vector<float>& power() const { return power_; }

37 std::complex<float> NewMean(std::complex<float> mean,

38 std::complex<float> data,

39 size_t count);

40

41 // Updates |mean| with added |data|.

42 void AddToMean(std::complex<float> data,

43 size_t count,

44 std::complex<float>* mean);
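The removed NewMean()/AddToMean() declarations describe an incremental mean. A minimal sketch of the standard running-mean update m_n = m_{n-1} + (x_n - m_{n-1}) / n, assuming `count` is the number of samples including `data` (the header does not say). The names mirror the deleted declarations, but the bodies are a reconstruction for illustration, not the deleted .cc implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

namespace sketch {

// Running mean after the count-th sample: m_n = m_{n-1} + (x_n - m_{n-1}) / n.
// Assumes |count| already includes |data|, i.e. count >= 1.
std::complex<float> NewMean(std::complex<float> mean,
                            std::complex<float> data,
                            std::size_t count) {
  assert(count > 0);
  return mean + (data - mean) / static_cast<float>(count);
}

// In-place variant, mirroring the removed AddToMean() signature.
void AddToMean(std::complex<float> data,
               std::size_t count,
               std::complex<float>* mean) {
  *mean = NewMean(*mean, data, count);
}

}  // namespace sketch
```

Updating the mean in place avoids accumulating a large sum before dividing, which is also the first half of Welford's algorithm referenced by the VarianceArray members below.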

45

46 // Internal helper for computing the variances of a stream of arrays.

47 // The result is an array of variances per position: the i-th variance

48 // is the variance of the stream of data on the i-th positions in the

49 // input arrays.

50 // There are four methods of computation:

51 // * kStepInfinite computes variances from the beginning onwards

52 // * kStepDecaying uses a recursive exponential decay formula with a

53 // settable forgetting factor

54 // * kStepWindowed computes variances within a moving window

55 // * kStepBlocked is similar to kStepWindowed, but history is kept

56 // as a rolling window of blocks: multiple input elements are used for

57 // one block and the history then consists of the variances of these blocks

58 // with the same effect as kStepWindowed, but less storage, so the window

59 // can be longer

60 class VarianceArray {

61 public:

62 enum StepType {

63 kStepInfinite = 0,

64 kStepDecaying,

65 kStepWindowed,

66 kStepBlocked,

67 kStepBlockBasedMovingAverage

68 };

69

70 // Construct an instance for the given input array length (|freqs|) and

71 // computation algorithm (|type|), with the appropriate parameters.

72 // |window_size| is the number of samples for kStepWindowed and

73 // the number of blocks for kStepBlocked. |decay| is the forgetting factor

74 // for kStepDecaying.

75 VarianceArray(size_t freqs, StepType type, size_t window_size, float decay);

76

77 // Add a new data point to the series and compute the new variances.

78 // TODO(bercic) |skip_fudge| is a flag for kStepWindowed and kStepDecaying,

79 // whether they should skip adding some small dummy values to the input

80 // to prevent problems with all-zero inputs. Can probably be removed.

81 void Step(const std::complex<float>* data, bool skip_fudge = false) {

82 (this->*step_func_)(data, skip_fudge);

83 }

84 // Reset variances to zero and forget all history.

85 void Clear();

86 // Scale the input data by |scale|. Effectively multiply variances

87 // by |scale^2|.

88 void ApplyScale(float scale);

89

90 // The current set of variances.

91 const float* variance() const { return variance_.get(); }

92

93 // The mean value of the current set of variances.

94 float array_mean() const { return array_mean_; }

95	43

96 private:	44 private:

97 void InfiniteStep(const std::complex<float>* data, bool dummy);	45 // The current power array.

98 void DecayStep(const std::complex<float>* data, bool dummy);	46 std::vector<float> power_;

99 void WindowedStep(const std::complex<float>* data, bool dummy);

100 void BlockedStep(const std::complex<float>* data, bool dummy);

101 void BlockBasedMovingAverage(const std::complex<float>* data, bool dummy);

102	47

103 // TODO(ekmeyerson): Switch the following running means

104 // and histories from rtc::scoped_ptr to std::vector.

105

106 // The current average X and X^2.

107 rtc::scoped_ptr<std::complex<float>[]> running_mean_;

108 rtc::scoped_ptr<std::complex<float>[]> running_mean_sq_;

109

110 // Average X and X^2 for the current block in kStepBlocked.

111 rtc::scoped_ptr<std::complex<float>[]> sub_running_mean_;

112 rtc::scoped_ptr<std::complex<float>[]> sub_running_mean_sq_;

113

114 // Sample history for the rolling window in kStepWindowed and block-wise

115 // histories for kStepBlocked.

116 rtc::scoped_ptr<rtc::scoped_ptr<std::complex<float>[]>[]> history_;

117 rtc::scoped_ptr<rtc::scoped_ptr<std::complex<float>[]>[]> subhistory_;

118 rtc::scoped_ptr<rtc::scoped_ptr<std::complex<float>[]>[]> subhistory_sq_;

119

120 // The current set of variances and sums for Welford's algorithm.

121 rtc::scoped_ptr<float[]> variance_;

122 rtc::scoped_ptr<float[]> conj_sum_;

123

124 const size_t num_freqs_;

125 const size_t window_size_;

126 const float decay_;	48 const float decay_;

127 size_t history_cursor_;

128 size_t count_;

129 float array_mean_;

130 bool buffer_full_;

131 void (VarianceArray::*step_func_)(const std::complex<float>*, bool);

132 };	49 };
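This hunk replaces VarianceArray, which kept running means of X and X^2 (`running_mean_`, `running_mean_sq_`), with PowerEstimator, which keeps only a decayed mean of |X|^2. A minimal sketch of what that drops, assuming a DecayStep-like exponential update (the removed .cc is not shown here, so the class `DecayingStats` and its details are hypothetical): the variance is E|x|^2 - |E x|^2, so the two estimates agree for zero-mean bins and differ by |E x|^2 when a bin carries a persistent offset.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical exponentially weighted mean/power tracker per bin.
// power(i) is what PowerEstimator reports; variance(i) is what a
// decaying VarianceArray-style estimator would report.
class DecayingStats {
 public:
  DecayingStats(std::size_t freqs, float decay)
      : mean_(freqs), power_(freqs, 0.f), decay_(decay) {}

  void Step(const std::complex<float>* data) {
    for (std::size_t i = 0; i < power_.size(); ++i) {
      mean_[i] = decay_ * mean_[i] + (1.f - decay_) * data[i];
      power_[i] = decay_ * power_[i] + (1.f - decay_) * std::norm(data[i]);
    }
  }

  // Exponentially weighted power, E|x|^2.
  float power(std::size_t i) const { return power_[i]; }

  // Exponentially weighted variance, E|x|^2 - |E x|^2. May come out
  // marginally negative due to float rounding.
  float variance(std::size_t i) const {
    return power_[i] - std::norm(mean_[i]);
  }

 private:
  std::vector<std::complex<float>> mean_;  // value-initialized to 0
  std::vector<float> power_;
  const float decay_;
};

}  // namespace sketch
```

For a constant input the variance decays towards zero while the power settles at |x|^2; for a zero-mean alternating input both stay close to |x|^2.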

133	50

134 // Helper class for smoothing gain changes. On each applicatiion step, the	51 // Helper class for smoothing gain changes. On each application step, the

135 // currently used gains are changed towards a set of settable target gains,	52 // currently used gains are changed towards a set of settable target gains,

136 // constrained by a limit on the magnitude of the changes.	53 // constrained by a limit on the relative changes.

137 class GainApplier {	54 class GainApplier {

138 public:	55 public:

139 GainApplier(size_t freqs, float change_limit);	56 GainApplier(size_t freqs, float relative_change_limit);

140	57

141 // Copy |in_block| to |out_block|, multiplied by the current set of gains,	58 // Copy |in_block| to |out_block|, multiplied by the current set of gains,

142 // and step the current set of gains towards the target set.	59 // and step the current set of gains towards the target set.

143 void Apply(const std::complex<float>* in_block,	60 void Apply(const std::complex<float>* in_block,

144 std::complex<float>* out_block);	61 std::complex<float>* out_block);

145	62

146 // Return the current target gain set. Modify this array to set the targets.	63 // Return the current target gain set. Modify this array to set the targets.

147 float* target() const { return target_.get(); }	64 float* target() const { return target_.get(); }

148	65

149 private:	66 private:

150 const size_t num_freqs_;	67 const size_t num_freqs_;

151 const float change_limit_;	68 const float relative_change_limit_;

152 rtc::scoped_ptr<float[]> target_;	69 rtc::scoped_ptr<float[]> target_;

153 rtc::scoped_ptr<float[]> current_;	70 rtc::scoped_ptr<float[]> current_;

154 };	71 };
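This patch set ("Make gain change limit relative") renames `change_limit_` to `relative_change_limit_`, but the header does not show how Apply() uses it. A minimal sketch, assuming "relative" means the largest step per Apply() call is `relative_change_limit` times the magnitude of the current gain, and that gains multiply the complex bins directly (the real .cc may apply them differently, for example as power gains). `UpdateFactor()` follows the contract documented for the removed helper; `std::vector` replaces the `rtc::scoped_ptr` arrays, and the `current()` accessor is hypothetical, added for inspection.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

namespace sketch {

// Returns |current| moved towards |target| by at most |limit| (absolute).
float UpdateFactor(float target, float current, float limit) {
  const float step = std::max(-limit, std::min(limit, target - current));
  return current + step;
}

// Hypothetical GainApplier with a relative change limit.
class GainApplier {
 public:
  GainApplier(std::size_t freqs, float relative_change_limit)
      : relative_change_limit_(relative_change_limit),
        target_(freqs, 1.f),
        current_(freqs, 1.f) {
    assert(relative_change_limit >= 0.f);
  }

  // Copies |in_block| to |out_block| scaled by the current gains, then steps
  // each gain towards its target by at most limit * |current gain|.
  void Apply(const std::complex<float>* in_block,
             std::complex<float>* out_block) {
    for (std::size_t i = 0; i < current_.size(); ++i) {
      out_block[i] = current_[i] * in_block[i];
      const float limit = relative_change_limit_ * std::abs(current_[i]);
      current_[i] = UpdateFactor(target_[i], current_[i], limit);
    }
  }

  // Writable target gains.
  std::vector<float>& target() { return target_; }
  float current(std::size_t i) const { return current_[i]; }

 private:
  const float relative_change_limit_;
  std::vector<float> target_;
  std::vector<float> current_;
};

}  // namespace sketch
```

Under this reading the per-call gain ratio stays within [1 - r, 1 + r], so ramps are geometric rather than linear: going from g1 up to g2 takes roughly log(g2 / g1) / log(1 + r) calls. One consequence of the sketch is that a gain that reaches exactly 0 can never move again, because its allowed step is 0.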

155	72

156 } // namespace intelligibility	73 } // namespace intelligibility

157	74

158 } // namespace webrtc	75 } // namespace webrtc

159	76

160 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_UTILS_H_	77 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_INTELLIGIBILITY_INTELLIGIBILITY_UTILS_H_
