OLD | NEW |
1 #Conversational Speech generator tool | 1 # Conversational Speech generator tool |
2 | 2 |
3 Python tool to generate multiple-end audio tracks to simulate conversational | 3 Tool to generate multiple-end audio tracks to simulate conversational speech |
4 speech with two or more participants. | 4 with two or more participants. |
5 | 5 |
6 The input to the tool is a directory containing a number of audio tracks and | 6 The input to the tool is a directory containing a number of audio tracks and |
7 a text file indicating how to time the sequence of speech turns (see the Example | 7 a text file indicating how to time the sequence of speech turns (see the Example |
8 section). | 8 section). |
9 | 9 |
10 Since the timing of the speaking turns is specified by the user, the generated | 10 Since the timing of the speaking turns is specified by the user, the generated |
11 tracks may not be suitable for testing scenarios in which there is unpredictable | 11 tracks may not be suitable for testing scenarios in which there is unpredictable |
12 network delay (e.g., end-to-end RTC assessment). | 12 network delay (e.g., end-to-end RTC assessment). |
13 | 13 |
14 Instead, the generated pairs can be used when the delay is constant (obviously | 14 Instead, the generated pairs can be used when the delay is constant (obviously |
15 including the case in which there is no delay). | 15 including the case in which there is no delay). |
16 For instance, echo cancellation in the APM module can be evaluated using two-end | 16 For instance, echo cancellation in the APM module can be evaluated using two-end |
17 audio tracks as input and reverse input. | 17 audio tracks as input and reverse input. |
18 | 18 |
19 By indicating negative and positive time offsets, one can reproduce cross-talk | 19 By indicating negative and positive time offsets, one can reproduce cross-talk |
20 and silence in the conversation. | 20 and silence in the conversation. |
21 | 21 |
22 IMPORTANT: **the whole code has not been landed yet.** | 22 IMPORTANT: **the whole code has not been landed yet.** |
23 | 23 |
24 ###Example | 24 ### Example |
25 | 25 |
26 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A) | 26 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A) |
27 and b1, b2 (speaker B). | 27 and b1, b2 (speaker B). |
28 The text file with the timing information may look like this: | 28 The text file with the timing information may look like this: |
29 | 29 |
30 ``` | 30 ``` |
31 A a1 0 | 31 A a1 0 |
32 B b1 0 | 32 B b1 0 |
33 A a2 100 | 33 A a2 100 |
34 B b2 -200 | 34 B b2 -200 |
(...skipping 30 matching lines...)
65 The two tracks can also be visualized as follows (one character represents | 65 The two tracks can also be visualized as follows (one character represents |
66 100 ms, "." is silence and "*" is speech). | 66 100 ms, "." is silence and "*" is speech). |
67 | 67 |
68 ``` | 68 ``` |
69 t: 0 1 2 3 4 5 6 (s) | 69 t: 0 1 2 3 4 5 6 (s) |
70 A: **********...........**********........******************** | 70 A: **********...........**********........******************** |
71 B: ..........**********.........**********.................... | 71 B: ..........**********.........**********.................... |
72 ^ 200 ms cross-talk | 72 ^ 200 ms cross-talk |
73 100 ms silence ^ | 73 100 ms silence ^ |
74 ``` | 74 ``` |
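
As a rough illustration of how the timing file and the visualization relate, here is a minimal Python sketch. It is not the tool's actual code, which the README says has not fully landed. It rests on three assumptions: each line reads `<speaker> <track> <offset in ms>`, each turn starts at the end of the previous turn plus its offset, and every track lasts 1000 ms (a value read off the diagram, not stated in the README). The names `Turn`, `schedule`, `render` and `DURATIONS_MS` are hypothetical. Only the four turns visible in this diff are used.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    track: str
    start_ms: int
    end_ms: int


def schedule(timing_text, durations_ms):
    """Place each turn at (end of previous turn + offset). Assumed semantics."""
    turns = []
    prev_end = 0
    for line in timing_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        speaker, track, offset = line.split()
        start = prev_end + int(offset)  # negative offset -> cross-talk
        end = start + durations_ms[track]
        turns.append(Turn(speaker, track, start, end))
        prev_end = end
    return turns


def render(turns, speaker, total_ms, step_ms=100):
    """Draw one track: '*' is speech, '.' is silence, one cell per step_ms."""
    cells = ["."] * (total_ms // step_ms)
    for turn in turns:
        if turn.speaker != speaker:
            continue
        for i in range(turn.start_ms // step_ms, turn.end_ms // step_ms):
            cells[i] = "*"
    return "".join(cells)


# The four turns visible in the timing file above.
TIMING = """A a1 0
B b1 0
A a2 100
B b2 -200
"""

# Hypothetical track lengths (assumption: 1 s each, as the diagram suggests).
DURATIONS_MS = {"a1": 1000, "b1": 1000, "a2": 1000, "b2": 1000}

turns = schedule(TIMING, DURATIONS_MS)
total = max(turn.end_ms for turn in turns)
for speaker in ("A", "B"):
    print(f"{speaker}: {render(turns, speaker, total)}")
```

Under these assumptions the output reproduces the first 3.9 s of the diagram: the positive offset of 100 leaves a 100 ms gap before `a2`, and the negative offset of -200 makes `b2` overlap the last 200 ms of `a2`.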