# Conversational Speech Generator Tool

Python tool to generate multiple-end audio tracks to simulate conversational
speech with two or more participants.

The input to the tool is a directory containing a number of audio tracks and
a text file indicating how to time the sequence of speech turns (see the Example
section).
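
To place the turns in time, the duration of each audio track must be known. The sketch below shows one way to collect these durations with the standard-library `wave` module. It assumes the tracks are WAV files named after the entries of the timing file (e.g., `a1.wav`). The function `load_track_durations_ms` is a hypothetical helper, not part of the tool.

```python
import os
import wave


def load_track_durations_ms(audio_dir):
    """Return {track name: duration in ms} for every .wav file in audio_dir.

    Hypothetical helper: assumes that the audio tracks are WAV files whose
    names (without extension) match the second column of the timing file.
    """
    durations = {}
    for entry in sorted(os.listdir(audio_dir)):
        name, ext = os.path.splitext(entry)
        if ext.lower() != ".wav":
            continue  # Skip the timing file and any other non-WAV entry.
        with wave.open(os.path.join(audio_dir, entry), "rb") as wav:
            # Duration = number of frames / sample rate, expressed in ms.
            durations[name] = 1000 * wav.getnframes() // wav.getframerate()
    return durations
```

Integer division drops fractions of a millisecond. This is harmless here because the timing file also expresses offsets in whole milliseconds.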

Since the timing of the speaking turns is specified by the user, the generated
tracks may not be suitable for testing scenarios in which there is unpredictable
network delay (e.g., end-to-end RTC assessment).

Instead, the generated tracks can be used when the delay is constant (including
the case in which there is no delay).
For instance, echo cancellation in the APM module can be evaluated by using the
track of one end as input and the track of the other end as reverse input.

By indicating negative and positive time offsets, one can reproduce cross-talk
and silence in the conversation.

IMPORTANT: **the complete code has not landed yet.**

### Example

For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4 (speaker A)
and b1 and b2 (speaker B).
The text file with the timing information may look like this:

```
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
```

The first column indicates the speaker name, the second contains the audio track
file names, and the third the offsets (in milliseconds) used to concatenate the
chunks. Each offset is measured from the end of the previous turn, regardless of
which speaker produced it.
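
The sketch below parses this format. It assumes whitespace-separated columns, one turn per line, and blank lines that can be ignored. The `Turn` and `parse_timing` names are hypothetical and are not the tool's actual API.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    speaker: str      # First column, e.g. "A".
    audio_track: str  # Second column, e.g. "a1".
    offset_ms: int    # Third column, e.g. -200.


def parse_timing(text: str) -> List[Turn]:
    """Parse the content of a timing file into a list of turns."""
    turns = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # Ignore blank lines.
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 columns, got {len(fields)}")
        speaker, audio_track, offset = fields
        turns.append(Turn(speaker, audio_track, int(offset)))
    return turns
```

Parsing the example above yields six turns. The fourth turn has `speaker == "B"`, `audio_track == "b2"` and `offset_ms == -200`.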

Assume that all the audio tracks in the example above are 1000 ms long.
The tool will then generate two tracks (A and B) that look like this:

**Track A**
```
a1 (1000 ms)
silence (1100 ms)
a2 (1000 ms)
silence (800 ms)
a3 (1000 ms)
a4 (1000 ms)
```

**Track B**
```
silence (1000 ms)
b1 (1000 ms)
silence (900 ms)
b2 (1000 ms)
silence (2000 ms)
```
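
The arithmetic behind the two tracks above can be sketched as follows. The sketch makes three assumptions that are consistent with the example:

- Each turn starts `offset_ms` after the end of the previous turn, regardless of speaker.
- The first turn is placed relative to t = 0.
- Every speaker's track is padded with trailing silence up to the common total length.

`layout_tracks` is a hypothetical helper, not the tool's implementation.

```python
from typing import Dict, Iterable, List, Tuple

# A segment is (label, duration_ms); the label "silence" marks a pause.
Segment = Tuple[str, int]


def layout_tracks(turns: Iterable[Tuple[str, str, int]],
                  durations_ms: Dict[str, int]) -> Dict[str, List[Segment]]:
    """Turn (speaker, audio_track, offset_ms) entries into per-speaker segments."""
    # 1) Place every turn on a common timeline.
    placed = []  # (speaker, audio_track, start_ms, end_ms)
    previous_end = 0
    for speaker, audio_track, offset_ms in turns:
        start = previous_end + offset_ms
        if start < 0:
            raise ValueError(f"{audio_track} would start before t = 0")
        end = start + durations_ms[audio_track]
        placed.append((speaker, audio_track, start, end))
        previous_end = end
    total = max((end for _, _, _, end in placed), default=0)

    # 2) Build each speaker's track, filling the gaps with silence.
    tracks: Dict[str, List[Segment]] = {}
    cursor: Dict[str, int] = {}  # End of the last segment written per speaker.
    for speaker, audio_track, start, end in placed:
        segments = tracks.setdefault(speaker, [])
        position = cursor.get(speaker, 0)
        if start < position:
            raise ValueError(f"{audio_track} overlaps a turn of speaker {speaker}")
        if start > position:
            segments.append(("silence", start - position))
        segments.append((audio_track, end - start))
        cursor[speaker] = end

    # 3) Pad all tracks with trailing silence to the same total length.
    for speaker, segments in tracks.items():
        if cursor[speaker] < total:
            segments.append(("silence", total - cursor[speaker]))
    return tracks
```

Consider the six turns of the example, with every track lasting 1000 ms. For these inputs the function reproduces the segment lists of Track A and Track B above. For instance, speaker B gets `[("silence", 1000), ("b1", 1000), ("silence", 900), ("b2", 1000), ("silence", 2000)]`.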

The two tracks can also be visualized as follows (one character represents
100 ms, "." is silence and "*" is speech).

```
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
```
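
The visualization, and the cross-talk and silence it highlights, can be derived from the per-track segment lists. In this sketch, `render` and `measure_overlaps` are hypothetical helpers. Segments are `(label, duration_ms)` pairs, where the label `"silence"` marks a pause. Durations are assumed to be multiples of the 100 ms resolution, and any remainder is truncated.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (label, duration_ms); "silence" marks a pause.


def render(segments: List[Segment], ms_per_char: int = 100) -> str:
    """Draw a track with one character per ms_per_char: '.' silence, '*' speech."""
    return "".join(
        ("." if label == "silence" else "*") * (duration_ms // ms_per_char)
        for label, duration_ms in segments)


def measure_overlaps(row_a: str, row_b: str,
                     ms_per_char: int = 100) -> Tuple[int, int]:
    """Return (cross-talk ms, mutual silence ms) for two equally long rows."""
    crosstalk = sum(a == "*" and b == "*" for a, b in zip(row_a, row_b))
    silence = sum(a == "." and b == "." for a, b in zip(row_a, row_b))
    return crosstalk * ms_per_char, silence * ms_per_char


track_a = [("a1", 1000), ("silence", 1100), ("a2", 1000),
           ("silence", 800), ("a3", 1000), ("a4", 1000)]
track_b = [("silence", 1000), ("b1", 1000), ("silence", 900),
           ("b2", 1000), ("silence", 2000)]
row_a, row_b = render(track_a), render(track_b)
print("A:", row_a)
print("B:", row_b)
print(measure_overlaps(row_a, row_b))  # (200, 100)
```

The printed rows match the A and B lines of the diagram. The two totals correspond to the 200 ms of cross-talk and the 100 ms of silence it marks.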