webrtc/modules/audio_coding/codecs/audio_decoder.h - Issue 2326953003: Added a ParsePayload method to AudioDecoder.

Side by Side Diff: webrtc/modules/audio_coding/codecs/audio_decoder.h

Issue 2326953003: Added a ParsePayload method to AudioDecoder. (Closed)

Patch Set: Created 4 years, 3 months ago



OLD	NEW
1 /*	1 /*

2 * Copyright (c) 2012 The WebRTC project authors. All Rights Reserved.	2 * Copyright (c) 2012 The WebRTC project authors. All Rights Reserved.

3 *	3 *

4 * Use of this source code is governed by a BSD-style license	4 * Use of this source code is governed by a BSD-style license

5 * that can be found in the LICENSE file in the root of the source	5 * that can be found in the LICENSE file in the root of the source

6 * tree. An additional intellectual property rights grant can be found	6 * tree. An additional intellectual property rights grant can be found

7 * in the file PATENTS. All contributing project authors may	7 * in the file PATENTS. All contributing project authors may

8 * be found in the AUTHORS file in the root of the source tree.	8 * be found in the AUTHORS file in the root of the source tree.

9 */	9 */

10	10

11 #ifndef WEBRTC_MODULES_AUDIO_CODING_NETEQ_INCLUDE_AUDIO_DECODER_H_	11 #ifndef WEBRTC_MODULES_AUDIO_CODING_NETEQ_INCLUDE_AUDIO_DECODER_H_

12 #define WEBRTC_MODULES_AUDIO_CODING_NETEQ_INCLUDE_AUDIO_DECODER_H_	12 #define WEBRTC_MODULES_AUDIO_CODING_NETEQ_INCLUDE_AUDIO_DECODER_H_

13	13

14 #include <stdlib.h> // NULL	14 #include <stdlib.h> // NULL

15	15

	16 #include "webrtc/base/array_view.h"

	17 #include "webrtc/base/buffer.h"

16 #include "webrtc/base/constructormagic.h"	18 #include "webrtc/base/constructormagic.h"

	19 #include "webrtc/base/optional.h"

17 #include "webrtc/typedefs.h"	20 #include "webrtc/typedefs.h"

18	21

19 namespace webrtc {	22 namespace webrtc {

20	23

21 // This is the interface class for decoders in NetEQ. Each codec type will have	24 // This is the interface class for decoders in NetEQ. Each codec type will have

22 // an implementation of this class.	25 // an implementation of this class.

23 class AudioDecoder {	26 class AudioDecoder {

24 public:	27 public:

25 enum SpeechType {	28 enum SpeechType {

26 kSpeech = 1,	29 kSpeech = 1,

27 kComfortNoise = 2	30 kComfortNoise = 2

28 };	31 };

29	32

30 // Used by PacketDuration below. Save the value -1 for errors.	33 // Used by PacketDuration below. Save the value -1 for errors.

31 enum { kNotImplemented = -2 };	34 enum { kNotImplemented = -2 };

32	35

33 AudioDecoder() = default;	36 AudioDecoder() = default;

34 virtual ~AudioDecoder() = default;	37 virtual ~AudioDecoder() = default;

35	38

	39 class Frame {
	hlundin-webrtc 2016/09/09 12:11:50
	Frame is too general, and already claimed for too many different concepts in audio processing and coding. Consider EncodedFrame.

	ossu 2016/09/12 10:31:37
	I agree - EncodedFrame is better. EncodedAudioFrame? I'm not sure. It's a bit long, especially when prefixed with AudioDecoder::. Hmm...

	ossu 2016/09/13 13:37:46
	I've gone with EncodedAudioFrame. It makes more sense for naming with the changes I'm making to LegacyFrame in the next CL.
	40 public:

	41 struct DecodeResult {

	42 size_t num_decoded_samples;

	43 SpeechType speech_type;

	44 };

	45

	46 virtual ~Frame() = default;

	47

	48 // Returns the duration in samples-per-channel of this audio frame.

	49 // If no duration can be ascertained, returns zero.

	50 virtual size_t Duration() const = 0;

	51

	52 // Decodes this frame of audio and writes the result in |decoded|.
	hlundin-webrtc 2016/09/09 12:11:49
	What can be expected of the state of the Frame after having called Decode? Is it dead then? What if I call Decode twice on the same object?

	ossu 2016/09/12 10:31:36
	I'm not sure. Practically, it currently acts as if doing AudioDecoder::Decode() twice with the same payload, but that's just the implementation and not really a contract. Besides this issue, I'm wondering about the contract for the size of decoded versus the length of a frame. How will the implementer of an AudioDecoder know how long a Frame is expected to be? How large can it be? Some guidance like:
	- An AudioDecoder::Frame is expected to be about 20 ms long, but must not be larger than X ms.
	- If the input payload is longer than X ms, it must be split up into multiple frames.
	We'll either need to decide on rules such as these, or that Decode() can be called several times, each returning a new set of samples until the Frame is actually completely decoded, at which point it will return zero decoded samples and the Frame will be discarded.

	ossu 2016/09/13 13:37:46
	I've clarified that Decode should only be called once per frame. It's not yet enforced in any way, but I guess the implementations could keep track and DCHECK if we really wanted to.

	hlundin-webrtc 2016/09/15 08:12:16
	Acknowledged.
	53 // Returns rtc::Optional containing the total number of samples across all

	54 // channels, as well as whether the decoder produced comfort noise or

	55 // speech.

	56 virtual rtc::Optional<DecodeResult> Decode(
	hlundin-webrtc 2016/09/09 12:11:50
	How will error codes from the decoder be handled? Today we are storing them for later reporting. Will this change?

	ossu 2016/09/12 10:31:36
	No idea! You tell me! :) Are these the error codes gotten from AudioDecoder::ErrorCode? If so, I think that information should be put either as a getter on the Frame, or as a return value in DecodeResult - which in turn means it no longer makes sense to make it an Optional. It could still contain some internal Optional stuff, but by then it's probably easier to just have a |status| field and let the rest of the fields be valid only if |status| is kOK.

	ossu 2016/09/12 12:37:35
	From what I can see, the decoder error code is free-form and logged at one point in neteq_impl.cc. It is also possible to get the latest decoder error through NetEq::LastDecoderError(). This is only used in four places, all of which are in NetEq unit tests. I see no reason why this information could not be put into DecodeResult directly, even if we want to retain the current functionality (i.e. LastDecoderError()). The question is: do we, if it's only used by a couple of unit tests?

	hlundin-webrtc 2016/09/15 08:12:16
	Right. Since no production code seems to care about the decoder error codes, we might just scrap that. However, this interface should still define how a decoder error is flagged, which it currently does not. Does an empty return value imply an error?
	57 rtc::ArrayView<int16_t> decoded) const = 0;

	58 };
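A minimal sketch of how a concrete frame might honor the two contracts discussed in the threads on Decode(): the method is called only once per frame (checked with assert, standing in for the DCHECK ossu mentions), and, assuming an empty optional is what signals a decoder error (the thread leaves this open), failures return std::nullopt. Everything here is hypothetical and not part of this CL: std::optional and std::vector<int16_t>& stand in for rtc::Optional and rtc::ArrayView<int16_t>, and PcmFrame is an invented frame type carrying big-endian 16-bit mono PCM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the types declared in this patch.
enum SpeechType { kSpeech = 1, kComfortNoise = 2 };

struct DecodeResult {
  size_t num_decoded_samples;
  SpeechType speech_type;
};

class EncodedAudioFrame {
 public:
  virtual ~EncodedAudioFrame() = default;
  // Samples per channel; zero if the duration cannot be determined.
  virtual size_t Duration() const = 0;
  // Must be called at most once. An empty result signals a decode error
  // (an assumption; the patch does not specify this yet).
  virtual std::optional<DecodeResult> Decode(
      std::vector<int16_t>& decoded) const = 0;
};

// Invented example frame carrying big-endian 16-bit mono PCM.
class PcmFrame : public EncodedAudioFrame {
 public:
  explicit PcmFrame(std::vector<uint8_t> payload)
      : payload_(std::move(payload)) {}

  size_t Duration() const override {
    return payload_.size() % 2 == 0 ? payload_.size() / 2 : 0;
  }

  std::optional<DecodeResult> Decode(
      std::vector<int16_t>& decoded) const override {
    // Enforce the once-per-frame contract in debug builds.
    assert(!decode_called_ && "Decode() must be called only once per frame");
    decode_called_ = true;

    const size_t samples = Duration();
    if (samples == 0 || decoded.size() < samples) {
      return std::nullopt;  // Malformed payload or output buffer too small.
    }
    for (size_t i = 0; i < samples; ++i) {
      const uint16_t hi = payload_[2 * i];
      const uint16_t lo = payload_[2 * i + 1];
      decoded[i] = static_cast<int16_t>(static_cast<uint16_t>((hi << 8) | lo));
    }
    return DecodeResult{samples, kSpeech};
  }

 private:
  std::vector<uint8_t> payload_;
  // Decode() is const in the interface, so the flag has to be mutable.
  mutable bool decode_called_ = false;
};
```

Because Decode() is const in the proposed interface, any once-only bookkeeping has to live in a mutable member, as above; in release builds the assert compiles away and a second call simply decodes again.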

	59

	60 struct ParseResult {

	61 ParseResult();

	62 ParseResult(uint32_t timestamp, bool primary, std::unique_ptr<Frame> frame);

	63 ParseResult(ParseResult&& b);

	64 ~ParseResult();

	65

	66 ParseResult& operator=(ParseResult&& b);

	67

	68 uint32_t timestamp;
	hlundin-webrtc 2016/09/09 12:11:50
	rtp_timestamp? Or timestamp_in_samples? G.722...

	ossu 2016/09/12 10:31:36
	I think it should be in samples, however I'm really not sure that that is the case right now. If we want to decouple the decoders from the specifics of the network transport (which I strongly think we do), then we shouldn't be using RTP timestamps to communicate between AudioDecoder and NetEq, just as we're not sending whole (parsed) RTP packets into the AudioDecoder.

	ossu 2016/09/13 13:37:46
	I've looked through the calling code and the timestamp is in samples already. Yay!

	hlundin-webrtc 2016/09/15 08:12:16
	Acknowledged.
	69 bool primary;

	70 std::unique_ptr<Frame> frame;

	71 };
	kwiberg-webrtc 2016/09/10 07:34:59
	Why do you need ParseResult and Frame to be two different classes?

	... Ah. So that Frame can be a pure interface.

	ossu 2016/09/12 10:31:36
	Also, the stuff in ParseResult is used to create new Packet objects by cloning the original packet and updating timestamp, primary and (the newly added) frame. This could be changed by having a (simpler) Packet type that is shared by NetEq and AudioDecoder, so that the AudioDecoder can create fully-fledged packets to be put into NetEq's PacketBuffer directly, rather than having NetEq fill in the missing details.
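The G.722 remark in the timestamp thread refers to the common codec whose RTP clock rate differs from its sampling rate: RFC 3551 assigns G.722 an 8000 Hz RTP clock even though the audio is sampled at 16000 Hz, so an RTP timestamp delta is not a sample count for that codec. The helper below is a hypothetical illustration of the conversion, not a WebRTC function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a difference between two RTP timestamps
// (computed with unsigned 32-bit subtraction, so wraparound is handled)
// into samples per channel at the codec's actual sampling rate.
// Precondition: rtp_clock_rate_hz > 0.
constexpr uint32_t RtpDeltaToSamples(uint32_t rtp_delta,
                                     uint32_t rtp_clock_rate_hz,
                                     uint32_t sample_rate_hz) {
  // Widen to 64 bits so the multiplication cannot overflow.
  return static_cast<uint32_t>(static_cast<uint64_t>(rtp_delta) *
                               sample_rate_hz / rtp_clock_rate_hz);
}

// For most codecs the two rates match, so the delta is already in samples.
static_assert(RtpDeltaToSamples(160, 8000, 8000) == 160, "PCMU: 20 ms");
// G.722: 20 ms advances the RTP clock by 160 but spans 320 samples.
static_assert(RtpDeltaToSamples(160, 8000, 16000) == 320, "G.722: 20 ms");
```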
	72

	73 // Let the decoder parse this payload and prepare zero or more decodable

	74 // frames. The decoder is free to steal the contents of the payload and retain

	75 // them for as long as necessary.

	76 virtual std::vector<ParseResult> ParsePayload(rtc::Buffer* payload,

	77 uint32_t timestamp,

	78 bool is_primary);
	kwiberg-webrtc 2016/09/10 07:34:59
	"and retain them for as long as necessary" is redundant; it follows from being allowed to steal the contents of |payload|.

	Maybe it's good to be specific, and say that the callee is allowed to swap or move the buffer.

	ossu 2016/09/12 10:31:37
	Yeah, I'll change it to something more specific.

	ossu 2016/09/13 13:37:46
	Also tried to address the contract of EncodedAudioFrame here, i.e. how big a frame is supposed to be. Should we mention an ideal frame size as well? 10 ms? 20 ms?

	hlundin-webrtc 2016/09/15 08:12:16
	kMaxFrameSize in neteq_impl.h defines that 120 ms per frame is the maximum allowed. And lower than 10 ms is not recommended either. Other than that, I would argue that the ideal frame size is between 10 and 30 ms.
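A minimal sketch of what a ParsePayload implementation could do for a codec with fixed-size frames: split the payload into frames and advance each ParseResult timestamp, in samples, by the frame duration. Following the guidance in the thread above, each frame would be no longer than 120 ms (kMaxFrameSize) and preferably no shorter than 10 ms. Assumptions: std::vector<uint8_t> stands in for rtc::Buffer, the EncodedAudioFrame here is a simplified concrete class rather than the patch's pure interface, and SplitPayload, bytes_per_frame and samples_per_frame are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simplified frame: owns its encoded bytes and knows how many
// samples per channel they decode to.
class EncodedAudioFrame {
 public:
  EncodedAudioFrame(std::vector<uint8_t> bytes, size_t duration)
      : bytes_(std::move(bytes)), duration_(duration) {}
  size_t Duration() const { return duration_; }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
  size_t duration_;
};

// Mirrors the fields of AudioDecoder::ParseResult in the patch.
struct ParseResult {
  uint32_t timestamp;  // In samples, as confirmed in the review thread.
  bool primary;
  std::unique_ptr<EncodedAudioFrame> frame;
};

// Hypothetical helper for a codec with fixed-size frames. Splits |payload|
// into frames of |bytes_per_frame| bytes, each |samples_per_frame| samples
// long, and drops any trailing partial frame.
std::vector<ParseResult> SplitPayload(std::vector<uint8_t>* payload,
                                      uint32_t timestamp,
                                      bool is_primary,
                                      size_t bytes_per_frame,
                                      size_t samples_per_frame) {
  std::vector<ParseResult> results;
  if (bytes_per_frame == 0) {
    return results;  // Avoid an infinite loop on a bad configuration.
  }
  for (size_t offset = 0; offset + bytes_per_frame <= payload->size();
       offset += bytes_per_frame) {
    std::vector<uint8_t> chunk(payload->begin() + offset,
                               payload->begin() + offset + bytes_per_frame);
    results.push_back(ParseResult{
        timestamp, is_primary,
        std::make_unique<EncodedAudioFrame>(std::move(chunk),
                                            samples_per_frame)});
    // Each frame starts where the previous one ended (wraps like RTP time).
    timestamp += static_cast<uint32_t>(samples_per_frame);
  }
  payload->clear();  // The payload has been consumed.
  return results;
}
```

This version copies each chunk and then clears the input. Since the callee is allowed to steal the payload, a real decoder could instead move or swap the whole buffer into a single frame when the payload holds exactly one frame.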
	79

36 // Decodes |encoded_len| bytes from |encoded| and writes the result in	80 // Decodes |encoded_len| bytes from |encoded| and writes the result in

37 // |decoded|. The maximum bytes allowed to be written into |decoded| is	81 // |decoded|. The maximum bytes allowed to be written into |decoded| is

38 // |max_decoded_bytes|. Returns the total number of samples across all	82 // |max_decoded_bytes|. Returns the total number of samples across all

39 // channels. If the decoder produced comfort noise, |speech_type|	83 // channels. If the decoder produced comfort noise, |speech_type|

40 // is set to kComfortNoise, otherwise it is kSpeech. The desired output	84 // is set to kComfortNoise, otherwise it is kSpeech. The desired output

41 // sample rate is provided in |sample_rate_hz|, which must be valid for the	85 // sample rate is provided in |sample_rate_hz|, which must be valid for the

42 // codec at hand.	86 // codec at hand.

43 int Decode(const uint8_t* encoded,	87 int Decode(const uint8_t* encoded,

44 size_t encoded_len,	88 size_t encoded_len,

45 int sample_rate_hz,	89 int sample_rate_hz,

(...skipping 69 matching lines...)
115 int sample_rate_hz,	159 int sample_rate_hz,

116 int16_t* decoded,	160 int16_t* decoded,

117 SpeechType* speech_type);	161 SpeechType* speech_type);

118	162

119 private:	163 private:

120 RTC_DISALLOW_COPY_AND_ASSIGN(AudioDecoder);	164 RTC_DISALLOW_COPY_AND_ASSIGN(AudioDecoder);

121 };	165 };

122	166

123 } // namespace webrtc	167 } // namespace webrtc

124 #endif // WEBRTC_MODULES_AUDIO_CODING_NETEQ_INCLUDE_AUDIO_DECODER_H_	168 #endif // WEBRTC_MODULES_AUDIO_CODING_NETEQ_INCLUDE_AUDIO_DECODER_H_

