| OLD | NEW |
|---|---|
| (Empty) | |
| 1 #Conversational Speech generator tool | |
| 2 | |
| 3 Python tool that generates pairs of audio tracks, one per end of the call, to | |
| 4 simulate conversational speech. | |
| 5 | |
| 6 The input to the tool is a directory containing a number of audio tracks and | |
| 7 a text file indicating how to time the sequence of speech turns (see the Example | |
| 8 section). | |
| 9 | |
| 10 Since the timing of the speaking turns is specified by the user, the generated | |
| 11 tracks may not be suitable for testing scenarios in which there is unpredictable | |
| 12 network delay (e.g., end-to-end RTC assessment). | |
| 13 Instead, the generated pairs can be used when the delay is constant (including | |
| 14 the case in which there is no delay). | |
| 15 For instance, echo cancellation in the APM module can be evaluated using the | |
| 16 generated tracks as input and reverse input. | |
| 17 | |
| 18 A negative time offset starts a turn before the previous one ends (cross-talk); | |
| 19 a positive offset leaves silence between the two turns. | |
| 20 | |
| 21 IMPORTANT: **not all of the code has landed yet.** | |
| 22 | |
| 23 ###Example | |
| 24 | |
| 25 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A) | |
| 26 and b1, b2 (speaker B). | |
| 27 The text file with the timing information may look like this: | |
| 28 ``` a1 0 | |
|
hlundin-webrtc
2017/03/03 14:01:10
Is this implicit assumption that the speakers swit
AleBzk
2017/03/03 15:27:08
Good point. I think it's easier to allow freedom h
| |
| 29 b1 0 | |
| 30 a2 100 | |
| 31 b2 -200 | |
| 32 a3 0``` | |
| 33 The first column contains the audio track file names; the second contains the | |
| 34 offsets (in milliseconds), relative to the end of the previous turn, used to concatenate the chunks. | |
| 35 | |
| 36 Assume that all the audio tracks in the example above are 1000 ms long. | |
| 37 The tool will then generate two tracks that look like this: | |
| 38 | |
| 39 ```Track A: | |
| 40 a1 (1000 ms) | |
| 41 silence (1100 ms) | |
| 42 a2 (1000 ms) | |
| 43 silence (800 ms) | |
| 44 a3 (1000 ms)``` | |
| 45 | |
| 46 ```Track B: | |
| 47 silence (1000 ms) | |
| 48 b1 (1000 ms) | |
| 49 silence (900 ms) | |
| 50 b2 (1000 ms) | |
| 51 silence (1000 ms)``` | |
| 52 | |
| 53 The two tracks can also be visualized as follows (one character represents | |
| 54 100 ms, "." is silence and "*" is speech). | |
| 55 | |
| 56 ```t: 0 1 2 3 4 5 (s) | |
| 57 A: **********...........**********........********** | |
| 58 B: ..........**********.........**********..........``` | |
| 59 ^ 200 ms cross-talk | |
| 60 100 ms silence ^ | |
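The timing rule in the README example can be sketched in a few lines of Python. This is only an illustration, not the code under review: the names `parse_timing`, `layout` and `render` are hypothetical, each offset is assumed to be relative to the end of the previous turn, and the speaker is assumed to be identified by the first letter of the track name (the README does not state how tracks are mapped to speakers).

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, int]  # (label, duration in ms); "silence" marks a gap

EXAMPLE_TIMING = """\
a1 0
b1 0
a2 100
b2 -200
a3 0
"""


def parse_timing(text: str) -> List[Tuple[str, int]]:
    """Parse lines of the form '<track name> <offset in ms>'."""
    turns = []
    for line in text.splitlines():
        if line.strip():
            name, offset = line.split()
            turns.append((name, int(offset)))
    return turns


def layout(turns: List[Tuple[str, int]],
           durations_ms: Dict[str, int],
           speaker_of: Callable[[str], str]) -> Dict[str, List[Segment]]:
    """Place each turn on its speaker's track and fill the gaps with silence.

    Assumption: a turn starts at (end of previous turn + offset).
    """
    speakers = sorted({speaker_of(name) for name, _ in turns})
    segments: Dict[str, List[Segment]] = {s: [] for s in speakers}
    cursor = {s: 0 for s in speakers}  # end of the audio placed so far, per track
    previous_end = 0
    for name, offset in turns:
        speaker = speaker_of(name)
        start = previous_end + offset
        if start < cursor[speaker]:
            raise ValueError(f"{name} would overlap the same speaker's previous turn")
        if start > cursor[speaker]:
            segments[speaker].append(("silence", start - cursor[speaker]))
        segments[speaker].append((name, durations_ms[name]))
        previous_end = start + durations_ms[name]
        cursor[speaker] = previous_end
    # Pad the shorter tracks so that all tracks have the same length.
    total = max(cursor.values())
    for s in speakers:
        if cursor[s] < total:
            segments[s].append(("silence", total - cursor[s]))
    return segments


def render(track: List[Segment], step_ms: int = 100) -> str:
    """One character per step_ms: '.' is silence, '*' is speech."""
    return "".join(("." if label == "silence" else "*") * (duration // step_ms)
                   for label, duration in track)


# Usage with the README example, assuming every track is 1000 ms long.
turns = parse_timing(EXAMPLE_TIMING)
durations = {name: 1000 for name, _ in turns}
tracks = layout(turns, durations, speaker_of=lambda name: name[0])
for speaker in sorted(tracks):
    print(speaker.upper() + ":", render(tracks[speaker]))
```

Under these assumptions the sketch yields the same segment lists as the Track A and Track B listings in the README. Passing `speaker_of` as a parameter keeps the speaker mapping explicit, so a different convention can be swapped in without touching the timing logic.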