webrtc/modules/audio_processing/test/py_conversational_speech/README.md - Issue 2722173003: Conversational Speech generator tool

Issue 2722173003: Conversational Speech generator tool (Closed)

Patch Set: Remarks Created 3 years, 9 months ago

OLD	NEW
(Empty)
	1 # Conversational Speech generator tool

	2

	3 Python tool to generate two-end audio track pairs to simulate conversational

	4 speech.

	5

	6 The input to the tool is a directory containing a number of audio tracks and

	7 a text file indicating how to time the sequence of speech turns (see the Example

	8 section).

	9

	10 Since the timing of the speaking turns is specified by the user, the generated

	11 tracks may not be suitable for testing scenarios in which there is unpredictable

	12 network delay (e.g., end-to-end RTC assessment).

	13 Instead, the generated pairs can be used when the delay is constant (obviously

	14 including the case in which there is no delay).

	15 For instance, echo cancellation in the APM module can be evaluated using the

	16 generated tracks as input and reverse input.

	17

	18 By indicating negative and positive time offsets, one can reproduce cross-talk

	19 and silence in the conversation.

	20

	21 IMPORTANT: the whole code has not been landed yet.

	22

	23 ### Example

	24

	25 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A)

	26 and b1, b2 (speaker B).

	27 The text file with the timing information may look like this:

	28 ``` a1 0

	hlundin-webrtc 2017/03/03 14:01:10
	Is there an implicit assumption that the speakers switch after each file? That is, it is not possible to pass two files in a row to A, and then one file to B, right? If so, please document this.

	AleBzk 2017/03/03 15:27:08
	Good point. I think it's easier to allow freedom here (also in terms of coding). This requires adding a third column to the text file to indicate which end is active. This also makes it possible to support an arbitrary number of ends (not just two).

	29 b1 0

	30 a2 100

	31 b2 -200

	32 a3 0```

	33 The first column contains the audio track file names, the second the offsets (in

	34 milliseconds) used to concatenate the chunks.
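
The two-column format described above is simple enough to parse in a few lines. The following is a minimal sketch, not the tool's actual parser: the names `Turn` and `parse_timing` are hypothetical, and both skipping blank lines and rejecting lines that do not have exactly two columns are assumptions.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One speech turn: audio track name and offset in milliseconds."""
    track_name: str
    offset_ms: int


def parse_timing(text: str) -> List[Turn]:
    """Parses whitespace-separated 'name offset' lines (hypothetical helper)."""
    turns = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # Assumption: blank lines are ignored.
        if len(fields) != 2:
            raise ValueError(
                f'line {line_number}: expected 2 columns, got {len(fields)}')
        name, offset = fields
        turns.append(Turn(name, int(offset)))
    return turns


EXAMPLE = """\
a1 0
b1 0
a2 100
b2 -200
a3 0
"""

turns = parse_timing(EXAMPLE)
for turn in turns:
    print(turn.track_name, turn.offset_ms)
```

Keeping the offset as a signed integer preserves the distinction between silence (positive) and cross-talk (negative) for later stages.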

	35

	36 Assume that all the audio tracks in the example above are 1000 ms long.

	37 The tool will then generate two tracks that look like this:

	38

	39 ```Track A:

	40 a1 (1000 ms)

	41 silence (1100 ms)

	42 a2 (1000 ms)

	43 silence (800 ms)

	44 a3 (1000 ms)```

	45

	46 ```Track B:

	47 silence (1000 ms)

	48 b1 (1000 ms)

	49 silence (900 ms)

	50 b2 (1000 ms)

	51 silence (1000 ms)```
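
The two layouts above follow from simple arithmetic on the offsets. The sketch below reproduces them under three assumptions that the README does not state explicitly: the ends alternate (A, B, A, ...) from one line of the timing file to the next, each offset is relative to the end of the preceding turn, and the shorter track is padded with trailing silence. `build_tracks` is a hypothetical name, not a function of the tool.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int]  # (label, duration in ms)


def build_tracks(turns: List[Tuple[str, int]],
                 durations_ms: Dict[str, int]) -> Dict[str, List[Segment]]:
    """Lays out alternating turns on two tracks, filling gaps with silence."""
    ends = ('A', 'B')
    placed = {end: [] for end in ends}  # end -> [(name, start_ms, stop_ms)]
    previous_stop_ms = 0
    for index, (name, offset_ms) in enumerate(turns):
        # Assumption: the offset is applied to the end of the preceding turn.
        start_ms = previous_stop_ms + offset_ms
        if start_ms < 0:
            raise ValueError(f'{name} would start before time zero')
        stop_ms = start_ms + durations_ms[name]
        # Assumption: ends alternate with every line of the timing file.
        placed[ends[index % 2]].append((name, start_ms, stop_ms))
        previous_stop_ms = stop_ms

    total_ms = max(
        (stop for items in placed.values() for _, _, stop in items), default=0)
    tracks = {}
    for end, items in placed.items():
        segments = []
        cursor_ms = 0
        for name, start_ms, stop_ms in items:
            if start_ms < cursor_ms:
                raise ValueError(f'{name} overlaps the previous turn of end {end}')
            if start_ms > cursor_ms:
                segments.append(('silence', start_ms - cursor_ms))
            segments.append((name, stop_ms - start_ms))
            cursor_ms = stop_ms
        if total_ms > cursor_ms:
            segments.append(('silence', total_ms - cursor_ms))  # Trailing pad.
        tracks[end] = segments
    return tracks


turns = [('a1', 0), ('b1', 0), ('a2', 100), ('b2', -200), ('a3', 0)]
durations_ms = {name: 1000 for name, _ in turns}
for end, segments in build_tracks(turns, durations_ms).items():
    print(f'Track {end}:', ', '.join(f'{label} ({ms} ms)' for label, ms in segments))
```

With the example timing and 1000 ms chunks, this yields the same silence durations as listed above (1100 and 800 ms on track A; 1000, 900 and 1000 ms on track B).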

	52

	53 The two tracks can also be visualized as follows (one character represents

	54 100 ms, "." is silence and "*" is speech).

	55

	56 ```t: 0         1         2         3         4         5 (s)

	57 A: **********...........**********........**********

	58 B: ..........**********.........**********..........```

	59                                 ^ 200 ms cross-talk

	60         100 ms silence ^
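
A chart in this style can be produced mechanically from the chunk positions. The sketch below is illustrative only: `render_track` is a hypothetical helper, the start and stop times are the ones implied by the example (all chunks 1000 ms long), and it assumes every boundary is a multiple of the 100 ms cell size.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, stop_ms) of one speech chunk


def render_track(speech: List[Interval], total_ms: int, cell_ms: int = 100) -> str:
    """Returns one character per cell: '*' for speech, '.' for silence."""
    cells = ['.'] * (total_ms // cell_ms)
    for start_ms, stop_ms in speech:
        # Assumption: chunk boundaries fall on multiples of cell_ms.
        for i in range(start_ms // cell_ms, stop_ms // cell_ms):
            cells[i] = '*'
    return ''.join(cells)


# Chunk positions implied by the example timing file.
speech_a = [(0, 1000), (2100, 3100), (3900, 4900)]
speech_b = [(1000, 2000), (2900, 3900)]
total_ms = 4900

row_a = render_track(speech_a, total_ms)
row_b = render_track(speech_b, total_ms)

# Cells where both ends speak (cross-talk) or both are silent.
cross_talk_ms = 100 * sum(a == '*' and b == '*' for a, b in zip(row_a, row_b))
mutual_silence_ms = 100 * sum(a == '.' and b == '.' for a, b in zip(row_a, row_b))

print('A:', row_a)
print('B:', row_b)
print('cross-talk:', cross_talk_ms, 'ms; mutual silence:', mutual_silence_ms, 'ms')
```

For the example, the computed cross-talk is 200 ms and the interval in which both ends are silent is 100 ms, matching the two annotations under the chart.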
