# Conversational Speech generator tool

Python tool to generate pairs of audio tracks, one for each end of the
conversation, that simulate conversational speech.

The input to the tool is a directory containing a number of audio tracks and
a text file indicating how to time the sequence of speech turns (see the
Example section).

Since the timing of the speaking turns is specified by the user, the generated
tracks may not be suitable for testing scenarios in which there is
unpredictable network delay (e.g., end-to-end RTC assessment).
Instead, the generated pairs can be used when the delay is constant (including,
of course, the case in which there is no delay).
For instance, echo cancellation in the APM module can be evaluated using the
generated tracks as input and reverse input.

By indicating negative and positive time offsets, one can reproduce cross-talk
and silence in the conversation.

IMPORTANT: **the whole code has not been landed yet.**

### Example

For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A)
and b1, b2 (speaker B).
The text file with the timing information may look like this:
```
a1 0
b1 0
a2 100
b2 -200
a3 0
```
The first column contains the audio track file names; the second contains the
offsets (in milliseconds) used to concatenate the chunks.
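
As a reference for this format, here is a minimal parsing sketch (the `Turn`
tuple and the `load_timing` function are illustrative names, not part of the
tool; for simplicity, it assumes that the first character of each file name
identifies the speaker):

```python
import collections

# Illustrative container for one speech turn (not part of the tool).
Turn = collections.namedtuple('Turn', ['speaker', 'track_filename', 'offset_ms'])


def load_timing(filepath):
  """Parses a timing file into a list of Turn instances.

  Each non-empty line contains an audio track file name and an integer
  offset in milliseconds, separated by whitespace.
  """
  turns = []
  with open(filepath) as f:
    for line in f:
      fields = line.split()
      if not fields:
        continue  # Skip empty lines.
      track_filename, offset_ms = fields[0], int(fields[1])
      # Simplifying assumption: the first character of the file name
      # identifies the speaker (e.g., "a1" -> speaker "a").
      turns.append(Turn(track_filename[0], track_filename, offset_ms))
  return turns
```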

Assume that all the audio tracks in the example above are 1000 ms long.
The tool will then generate two tracks that look like this:

```
Track A:
  a1 (1000 ms)
  silence (1100 ms)
  a2 (1000 ms)
  silence (800 ms)
  a3 (1000 ms)
```

```
Track B:
  silence (1000 ms)
  b1 (1000 ms)
  silence (900 ms)
  b2 (1000 ms)
  silence (1000 ms)
```
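
The concatenation rule implied by this example can be sketched as follows: a
cursor marks the end of the previous turn, and each new turn starts at the
cursor position shifted by its offset. This is a simplified model built on the
`Turn` tuple from the parsing sketch above (names are illustrative, sample-level
details are ignored):

```python
def schedule_turns(turns, durations_ms):
  """Computes the start time of each turn on its speaker's track.

  Positive offsets insert extra silence before the turn; negative offsets
  make it start before the previous turn has ended (cross-talk).
  """
  cursor_ms = 0
  schedule = []  # (speaker, track file name, start time) tuples.
  for turn in turns:
    start_ms = cursor_ms + turn.offset_ms
    schedule.append((turn.speaker, turn.track_filename, start_ms))
    cursor_ms = start_ms + durations_ms[turn.track_filename]
  return schedule
```

With the timing file above and all durations set to 1000 ms, this yields start
times of 0 (a1), 1000 (b1), 2100 (a2), 2900 (b2) and 3900 ms (a3), which
matches the two tracks shown above once each track is padded with silence
outside its own turns.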

The two tracks can also be visualized as follows (one character represents
100 ms, "." is silence and "*" is speech).

```
t: 0         1         2         3         4         5 (s)
A: **********...........**********........**********
B: ..........**********.........**********..........
                                ^ 200 ms cross-talk
      100 ms silence ^
```
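
A diagram like the one above can be derived from such a schedule with a small
helper (again just a sketch, not part of the tool; one character per 100 ms):

```python
def render(schedule, durations_ms, resolution_ms=100):
  """Prints one '.'/'*' line per speaker, one character per resolution_ms."""
  total_ms = max(start_ms + durations_ms[name]
                 for _, name, start_ms in schedule)
  num_chars = total_ms // resolution_ms
  lines = {}
  for speaker, name, start_ms in schedule:
    line = lines.setdefault(speaker, ['.'] * num_chars)
    begin = start_ms // resolution_ms
    end = (start_ms + durations_ms[name]) // resolution_ms
    line[begin:end] = '*' * (end - begin)  # Mark the speech region.
  for speaker in sorted(lines):
    print('%s: %s' % (speaker.upper(), ''.join(lines[speaker])))
```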