| OLD | NEW |
| (Empty) |
| 1 #Conversational Speech generator tool | |
| 2 | |
| 3 Python tool to generate multiple-end audio tracks to simulate conversational | |
| 4 speech with two or more participants. | |
| 5 | |
| 6 The input to the tool is a directory containing a number of audio tracks and | |
| 7 a text file indicating how to time the sequence of speech turns (see the Example | |
| 8 section). | |
| 9 | |
| 10 Since the timing of the speaking turns is specified by the user, the generated | |
| 11 tracks may not be suitable for testing scenarios in which there is unpredictable | |
| 12 network delay (e.g., end-to-end RTC assessment). | |
| 13 | |
| 14 Instead, the generated tracks can be used when the delay is constant (including | |
| 15 the case in which there is no delay). | |
| 16 For instance, echo cancellation in the APM module can be evaluated using two-end | |
| 17 audio tracks as input and reverse input. | |
| 18 | |
| 19 By indicating negative and positive time offsets, one can reproduce cross-talk | |
| 20 and silence in the conversation. | |
| 21 | |
| 22 IMPORTANT: **the whole code has not been landed yet.** | |
| 23 | |
| 24 ###Example | |
| 25 | |
| 26 For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4 (speaker A) | |
| 27 and b1, b2 (speaker B). | |
| 28 The text file with the timing information may look like this: | |
| 29 | |
| 30 ``` | |
| 31 A a1 0 | |
| 32 B b1 0 | |
| 33 A a2 100 | |
| 34 B b2 -200 | |
| 35 A a3 0 | |
| 36 A a4 0 | |
| 37 ``` | |
| 38 | |
| 39 The first column indicates the speaker name, the second contains the audio track | |
| 40 file names, and the third the offsets (in milliseconds) used to concatenate the | |
| 41 chunks. | |
| 42 | |
| 43 Assume that all the audio tracks in the example above are 1000 ms long. | |
| 44 The tool will then generate two tracks (A and B) that look like this: | |
| 45 | |
| 46 **Track A** | |
| 47 ``` | |
| 48 a1 (1000 ms) | |
| 49 silence (1100 ms) | |
| 50 a2 (1000 ms) | |
| 51 silence (800 ms) | |
| 52 a3 (1000 ms) | |
| 53 a4 (1000 ms) | |
| 54 ``` | |
| 55 | |
| 56 **Track B** | |
| 57 ``` | |
| 58 silence (1000 ms) | |
| 59 b1 (1000 ms) | |
| 60 silence (900 ms) | |
| 61 b2 (1000 ms) | |
| 62 silence (2000 ms) | |
| 63 ``` | |
| 64 | |
| 65 The two tracks can also be visualized as follows (one character represents | |
| 66 100 ms, "." is silence and "*" is speech). | |
| 67 | |
| 68 ``` | |
| 69 t: 0         1         2         3         4         5         6 (s) | |
| 70 A: **********...........**********........******************** | |
| 71 B: ..........**********.........**********.................... | |
| 72                                 ^ 200 ms cross-talk | |
| 73         100 ms silence ^ | |
| 74 ``` | |
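The timing rules in the example can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code (which has not fully landed): the function names `parse_timing`, `build_tracks` and `render` are hypothetical, and the sketch assumes that each offset is measured from the end of the previous turn, which is the reading that reproduces the tracks shown above.

```python
SILENCE = "silence"  # label used for gaps; assumes no audio track has this name


def parse_timing(text):
    """Parse '<speaker> <track> <offset_ms>' lines into a list of tuples."""
    turns = []
    for line in text.splitlines():
        if not line.strip():
            continue
        speaker, track, offset = line.split()
        turns.append((speaker, track, int(offset)))
    return turns


def build_tracks(turns, durations_ms):
    """Return {speaker: [(label, length_ms), ...]} with silence filled in.

    Assumption: a turn starts at (end of the previous turn + offset).
    """
    segments = {}
    written = {}   # how many ms have been laid out so far on each track
    prev_end = 0   # end time of the previous turn, whoever spoke
    for speaker, track, offset in turns:
        segments.setdefault(speaker, [])
        written.setdefault(speaker, 0)
        start = prev_end + offset
        gap = start - written[speaker]
        if gap < 0:
            raise ValueError(f"{track} would overlap earlier audio of {speaker}")
        if gap > 0:
            segments[speaker].append((SILENCE, gap))
        segments[speaker].append((track, durations_ms[track]))
        prev_end = start + durations_ms[track]
        written[speaker] = prev_end
    # Pad every track with trailing silence so that all have the same length.
    total = max(written.values(), default=0)
    for speaker, end in written.items():
        if end < total:
            segments[speaker].append((SILENCE, total - end))
    return segments


def render(track_segments, step_ms=100):
    """Draw one character per step: '.' for silence, '*' for speech."""
    return "".join(
        ("." if label == SILENCE else "*") * (length // step_ms)
        for label, length in track_segments
    )


EXAMPLE = """
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
"""

turns = parse_timing(EXAMPLE)
# As in the example, every audio track is assumed to be 1000 ms long.
durations = {track: 1000 for _, track, _ in turns}
tracks = build_tracks(turns, durations)
for speaker, track_segments in tracks.items():
    print(f"{speaker}: {render(track_segments)}")
```

A real implementation would read the durations from the audio files instead of hard-coding them, and would have to decide how to treat a negative offset that makes a speaker overlap with their own previous turn; the sketch simply raises an error in that case.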