webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h - Issue 1394103003: Make the nonlinear beamformer steerable

Side by Side Diff: webrtc/modules/audio_processing/beamformer/nonlinear_beamformer.h

Issue 1394103003: Make the nonlinear beamformer steerable (Closed) Base URL: https://chromium.googlesource.com/external/webrtc.git@highfreq

Patch Set: Created 5 years, 2 months ago


OLD	NEW
1 /*	1 /*

2 * Copyright (c) 2014 The WebRTC project authors. All Rights Reserved.	2 * Copyright (c) 2014 The WebRTC project authors. All Rights Reserved.

3 *	3 *

4 * Use of this source code is governed by a BSD-style license	4 * Use of this source code is governed by a BSD-style license

5 * that can be found in the LICENSE file in the root of the source	5 * that can be found in the LICENSE file in the root of the source

6 * tree. An additional intellectual property rights grant can be found	6 * tree. An additional intellectual property rights grant can be found

7 * in the file PATENTS. All contributing project authors may	7 * in the file PATENTS. All contributing project authors may

8 * be found in the AUTHORS file in the root of the source tree.	8 * be found in the AUTHORS file in the root of the source tree.

9 */	9 */

10	10

(...skipping 13 matching lines...)
24 // Enhances sound sources coming directly in front of a uniform linear array	24 // Enhances sound sources coming directly in front of a uniform linear array

25 // and suppresses sound sources coming from all other directions. Operates on	25 // and suppresses sound sources coming from all other directions. Operates on

26 // multichannel signals and produces single-channel output.	26 // multichannel signals and produces single-channel output.

27 //	27 //

28 // The implemented nonlinear postfilter algorithm taken from "A Robust Nonlinear	28 // The implemented nonlinear postfilter algorithm taken from "A Robust Nonlinear

29 // Beamforming Postprocessor" by Bastiaan Kleijn.	29 // Beamforming Postprocessor" by Bastiaan Kleijn.

30 class NonlinearBeamformer	30 class NonlinearBeamformer

31 : public Beamformer<float>,	31 : public Beamformer<float>,

32 public LappedTransform::Callback {	32 public LappedTransform::Callback {

33 public:	33 public:

34 explicit NonlinearBeamformer(const std::vector<Point>& array_geometry);	34 explicit NonlinearBeamformer(const std::vector<Point>& array_geometry,

	35 float target_angle_radians = M_PI / 2.f);
	Andrew MacDonald 2015/10/14 22:12:31
	Since we're exposing a setter, I think it's redundant to also take a target angle at construction time. WDYT?

	aluebs-webrtc 2015/10/20 00:04:20 (replying to Andrew MacDonald)
	Since it has a default value, it is not necessary to specify it. I just felt it would be inefficient to initialize all matrices twice, once with the default value and once with the real one, just because the user needs to set the angle with the setter. If you still think it is redundant, I am happy to remove it; I don't have a strong opinion.
35	36

36 // Sample rate corresponds to the lower band.	37 // Sample rate corresponds to the lower band.

37 // Needs to be called before the NonlinearBeamformer can be used.	38 // Needs to be called before the NonlinearBeamformer can be used.

38 void Initialize(int chunk_size_ms, int sample_rate_hz) override;	39 void Initialize(int chunk_size_ms, int sample_rate_hz) override;

39	40

40 // Process one time-domain chunk of audio. The audio is expected to be split	41 // Process one time-domain chunk of audio. The audio is expected to be split

41 // into frequency bands inside the ChannelBuffer. The number of frames and	42 // into frequency bands inside the ChannelBuffer. The number of frames and

42 // channels must correspond to the constructor parameters. The same	43 // channels must correspond to the constructor parameters. The same

43 // ChannelBuffer can be passed in as |input| and |output|.	44 // ChannelBuffer can be passed in as |input| and |output|.

44 void ProcessChunk(const ChannelBuffer<float>& input,	45 void ProcessChunk(const ChannelBuffer<float>& input,

45 ChannelBuffer<float>* output) override;	46 ChannelBuffer<float>* output) override;

46	47

	48 void SteerBeam(float target_angle_radians);
	peah-webrtc 2015/10/13 12:53:41
	My gut feeling is that it would be simpler for an outside user to use degrees when steering the beam. At least I need to think a second time to get a feeling for what angle a certain value in radians corresponds to.

	Andrew MacDonald 2015/10/14 22:12:31
	This seems like a fairly generic beamformer task. We should probably add it to Beamformer. And since we have an existing method taking a SphericalPointf, this should as well for consistency. (For comparison, consider AimAt in MR)

	aluebs-webrtc 2015/10/20 00:04:20 (replying to Andrew MacDonald)
	Agreed. Added.

	aluebs-webrtc 2015/10/20 00:04:20 (replying to peah-webrtc)
	That is true for human input, but this will probably be set automatically by some BeamformingManager which uses a Localizer (returns radians) to decide where to steer towards. Plus, all trigonometric functions take radians by default. I would make it a standard to always use radians except for human user input, as I did for the test.

	peah-webrtc 2015/10/20 21:22:33 (replying to aluebs-webrtc)
	Acknowledged.
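	The convention settled on in this thread (radians inside the API, degrees only where a human types the value) could be kept at the boundary with a thin conversion layer. The following is only a sketch: `kPi`, `DegreesToRadians` and `RadiansToDegrees` are hypothetical names and are not part of the patch under review.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; not part of the patch.
constexpr float kPi = 3.14159265358979323846f;

// Converts a human-supplied angle in degrees to the radians used internally.
inline float DegreesToRadians(float degrees) {
  assert(std::isfinite(degrees));
  return degrees * kPi / 180.f;
}

// Converts an internal angle in radians back to degrees, e.g. for logging.
inline float RadiansToDegrees(float radians) {
  assert(std::isfinite(radians));
  return radians * 180.f / kPi;
}
```

	With the signature shown in this patch set, a test or command-line tool that accepts degrees would then call something like `beamformer.SteerBeam(DegreesToRadians(90.f))`, while a localizer that already reports radians would pass its value through unchanged.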
	49

47 bool IsInBeam(const SphericalPointf& spherical_point) override;	50 bool IsInBeam(const SphericalPointf& spherical_point) override;

48	51

49 // After processing each block |is_target_present_| is set to true if the	52 // After processing each block |is_target_present_| is set to true if the

50 // target signal es present and to false otherwise. This methods can be called	53 // target signal es present and to false otherwise. This methods can be called

51 // to know if the data is target signal or interference and process it	54 // to know if the data is target signal or interference and process it

52 // accordingly.	55 // accordingly.

53 bool is_target_present() override { return is_target_present_; }	56 bool is_target_present() override { return is_target_present_; }

54	57

55 protected:	58 protected:

56 // Process one frequency-domain block of audio. This is where the fun	59 // Process one frequency-domain block of audio. This is where the fun

57 // happens. Implements LappedTransform::Callback.	60 // happens. Implements LappedTransform::Callback.

58 void ProcessAudioBlock(const complex<float>* const* input,	61 void ProcessAudioBlock(const complex<float>* const* input,

59 int num_input_channels,	62 int num_input_channels,

60 size_t num_freq_bins,	63 size_t num_freq_bins,

61 int num_output_channels,	64 int num_output_channels,

62 complex<float>* const* output) override;	65 complex<float>* const* output) override;

63	66

64 private:	67 private:

65 typedef Matrix<float> MatrixF;	68 typedef Matrix<float> MatrixF;

66 typedef ComplexMatrix<float> ComplexMatrixF;	69 typedef ComplexMatrix<float> ComplexMatrixF;

67 typedef complex<float> complex_f;	70 typedef complex<float> complex_f;

68	71

69 void InitFrequencyCorrectionRanges();	72 void InitLowFrequencyCorrectionRanges();

	73 void InitHighFrequencyCorrectionRanges();

70 void InitInterfAngles();	74 void InitInterfAngles();

71 void InitDelaySumMasks();	75 void InitDelaySumMasks();

72 void InitTargetCovMats();	76 void InitTargetCovMats();

	77 void InitDifuseCovMats();

73 void InitInterfCovMats();	78 void InitInterfCovMats();

	79 void NormalizeCovMats();

74	80

75 // Calculates postfilter masks that minimize the mean squared error of our	81 // Calculates postfilter masks that minimize the mean squared error of our

76 // estimation of the desired signal.	82 // estimation of the desired signal.

77 float CalculatePostfilterMask(const ComplexMatrixF& interf_cov_mat,	83 float CalculatePostfilterMask(const ComplexMatrixF& interf_cov_mat,

78 float rpsiw,	84 float rpsiw,

79 float ratio_rxiw_rxim,	85 float ratio_rxiw_rxim,

80 float rmxi_r);	86 float rmxi_r);

81	87

82 // Prevents the postfilter masks from degenerating too quickly (a cause of	88 // Prevents the postfilter masks from degenerating too quickly (a cause of

83 // musical noise).	89 // musical noise).

(...skipping 42 matching lines...)
126 size_t high_mean_start_bin_;	132 size_t high_mean_start_bin_;

127 size_t high_mean_end_bin_;	133 size_t high_mean_end_bin_;

128	134

129 // Quickly varying mask updated every block.	135 // Quickly varying mask updated every block.

130 float new_mask_[kNumFreqBins];	136 float new_mask_[kNumFreqBins];

131 // Time smoothed mask.	137 // Time smoothed mask.

132 float time_smooth_mask_[kNumFreqBins];	138 float time_smooth_mask_[kNumFreqBins];

133 // Time and frequency smoothed mask.	139 // Time and frequency smoothed mask.

134 float final_mask_[kNumFreqBins];	140 float final_mask_[kNumFreqBins];

135	141

	142 // For both target and interference angles, PI / 2 is perpendicular to the

	143 // microphone array, facing forwards. The positive direction goes

	144 // counterclockwise.
	Andrew MacDonald 2015/10/14 22:12:31
	This is really about our coordinate system, and specifically how we convert from Cartesian to spherical (since the microphone positions are Cartesian). I think this documentation belongs with the declaration of SphericalPoint. Just as a reference, see this for the definition of Cartesian mic position coordinates in Chromium: https://code.google.com/p/chromium/codesearch#chromium/src/media/audio/audio_...

	aluebs-webrtc 2015/10/20 00:04:20 (replying to Andrew MacDonald)
	I am not sure we want to impose a specific convention on the general Point class, but I don't have a strong opinion. Removed this comment and added documentation for CartesianPoint and SphericalPoint following the Chromium example you suggested.
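	The convention described in the code comment above (PI / 2 faces forwards, positive angles go counterclockwise) can be illustrated with a small Cartesian-to-azimuth conversion. This is a sketch under stated assumptions: the microphones are taken to lie along the x axis with +y facing forwards, and `Vec3` and `AzimuthRadians` are hypothetical names, not the Point or SphericalPoint types from the WebRTC tree.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a Cartesian point; not the WebRTC Point class.
struct Vec3 {
  float x;
  float y;
  float z;
};

// Azimuth in radians, measured counterclockwise from the +x axis in the
// x-y plane. Under the assumed layout (array along x, +y forwards), a
// source straight ahead has an azimuth of PI / 2.
inline float AzimuthRadians(const Vec3& p) {
  // The azimuth is undefined for points directly on the z axis.
  assert(p.x != 0.f || p.y != 0.f);
  return std::atan2(p.y, p.x);
}
```

	Under these assumptions a source directly in front of the array maps to PI / 2, which matches the default `target_angle_radians` in the constructor shown in this patch set.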
	145 float target_angle_radians_;

136 // Angles of the interferer scenarios.	146 // Angles of the interferer scenarios.

137 std::vector<float> interf_angles_radians_;	147 std::vector<float> interf_angles_radians_;

138	148

139 // Array of length |kNumFreqBins|, Matrix of size |1| x |num_channels_|.	149 // Array of length |kNumFreqBins|, Matrix of size |1| x |num_channels_|.

140 ComplexMatrixF delay_sum_masks_[kNumFreqBins];	150 ComplexMatrixF delay_sum_masks_[kNumFreqBins];

141 ComplexMatrixF normalized_delay_sum_masks_[kNumFreqBins];	151 ComplexMatrixF normalized_delay_sum_masks_[kNumFreqBins];

142	152

143 // Array of length |kNumFreqBins|, Matrix of size |num_input_channels_| x	153 // Arrays of length |kNumFreqBins|, Matrix of size |num_input_channels_| x

144 // |num_input_channels_|.	154 // |num_input_channels_|.

145 ComplexMatrixF target_cov_mats_[kNumFreqBins];	155 ComplexMatrixF target_cov_mats_[kNumFreqBins];

146	156 ComplexMatrixF uniform_cov_mat_[kNumFreqBins];

147 // Array of length |kNumFreqBins|, Matrix of size |num_input_channels_| x	157 // Array of length |kNumFreqBins|, Matrix of size |num_input_channels_| x

148 // |num_input_channels_|. ScopedVector has a size equal to the number of	158 // |num_input_channels_|. ScopedVector has a size equal to the number of

149 // interferer scenarios.	159 // interferer scenarios.

150 ScopedVector<ComplexMatrixF> interf_cov_mats_[kNumFreqBins];	160 ScopedVector<ComplexMatrixF> interf_cov_mats_[kNumFreqBins];

151	161

152 // Of length |kNumFreqBins|.	162 // Of length |kNumFreqBins|.

153 float wave_numbers_[kNumFreqBins];	163 float wave_numbers_[kNumFreqBins];

154	164

155 // Preallocated for ProcessAudioBlock()	165 // Preallocated for ProcessAudioBlock()

156 // Of length |kNumFreqBins|.	166 // Of length |kNumFreqBins|.

(...skipping 12 matching lines...)
169 // Number of blocks after which the data is considered interference if the	179 // Number of blocks after which the data is considered interference if the

170 // mask does not pass |kMaskSignalThreshold|.	180 // mask does not pass |kMaskSignalThreshold|.

171 size_t hold_target_blocks_;	181 size_t hold_target_blocks_;

172 // Number of blocks since the last mask that passed |kMaskSignalThreshold|.	182 // Number of blocks since the last mask that passed |kMaskSignalThreshold|.

173 size_t interference_blocks_count_;	183 size_t interference_blocks_count_;

174 };	184 };

175	185

176 } // namespace webrtc	186 } // namespace webrtc

177	187

178 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_BEAMFORMER_NONLINEAR_BEAMFORMER_H_	188 #endif // WEBRTC_MODULES_AUDIO_PROCESSING_BEAMFORMER_NONLINEAR_BEAMFORMER_H_
