OLD | NEW |
1 # Conversational Speech generator tool | 1 # Conversational Speech generator tool |
2 | 2 |
3 Tool to generate multiple-end audio tracks to simulate conversational speech | 3 Tool to generate multiple-end audio tracks to simulate conversational speech |
4 with two or more participants. | 4 with two or more participants. |
5 | 5 |
6 The input to the tool is a directory containing a number of audio tracks and | 6 The input to the tool is a directory containing a number of audio tracks and |
7 a text file indicating how to time the sequence of speech turns (see the Example | 7 a text file indicating how to time the sequence of speech turns (see the Example |
8 section). | 8 section). |
9 | 9 |
10 Since the timing of the speaking turns is specified by the user, the generated | 10 Since the timing of the speaking turns is specified by the user, the generated |
11 tracks may not be suitable for testing scenarios in which there is unpredictable | 11 tracks may not be suitable for testing scenarios in which there is unpredictable |
12 network delay (e.g., end-to-end RTC assessment). | 12 network delay (e.g., end-to-end RTC assessment). |
13 | 13 |
14 Instead, the generated pairs can be used when the delay is constant (obviously | 14 Instead, the generated pairs can be used when the delay is constant (obviously |
15 including the case in which there is no delay). | 15 including the case in which there is no delay). |
16 For instance, echo cancellation in the APM module can be evaluated using two-end | 16 For instance, echo cancellation in the APM module can be evaluated using two-end |
17 audio tracks as input and reverse input. | 17 audio tracks as input and reverse input. |
18 | 18 |
19 By indicating negative and positive time offsets, one can reproduce cross-talk | 19 By indicating negative and positive time offsets, one can reproduce cross-talk |
20 and silence in the conversation. | 20 (aka double-talk) and silence in the conversation. |
21 | |
22 IMPORTANT: **the whole code has not been landed yet.** | |
23 | 21 |
24 ### Example | 22 ### Example |
25 | 23 |
26 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A) | 24 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A) |
27 and b1, b2 (speaker B). | 25 and b1, b2 (speaker B). |
28 The text file with the timing information may look like this: | 26 The text file with the timing information may look like this: |
29 | 27 |
30 ``` | 28 ``` |
31 A a1 0 | 29 A a1 0 |
32 B b1 0 | 30 B b1 0 |
(...skipping 32 matching lines...)
65 The two tracks can also be visualized as follows (one character represents | 63 The two tracks can also be visualized as follows (one character represents |
66 100 ms, "." is silence and "*" is speech). | 64 100 ms, "." is silence and "*" is speech). |
67 | 65 |
68 ``` | 66 ``` |
69 t: 0 1 2 3 4 5 6 (s) | 67 t: 0 1 2 3 4 5 6 (s) |
70 A: **********...........**********........******************** | 68 A: **********...........**********........******************** |
71 B: ..........**********.........**********.................... | 69 B: ..........**********.........**********.................... |
72 ^ 200 ms cross-talk | 70 ^ 200 ms cross-talk |
73 100 ms silence ^ | 71 100 ms silence ^ |
74 ``` | 72 ``` |
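
The relation between time offsets and the diagram above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: it assumes each turn is a `(speaker, track, offset_ms)` triple and that the offset is measured from the end of the previous turn (negative for cross-talk, positive for silence). The track durations and the offsets after the first two turns are hypothetical values chosen to reproduce the diagram; `schedule` and `render` are made-up helper names.

```python
from typing import Dict, List, Tuple

STEP_MS = 100  # one character represents 100 ms, as in the diagram

# Hypothetical turns: (speaker, track, offset in ms). Assumption: the offset
# is relative to the end of the previous turn.
TURNS: List[Tuple[str, str, int]] = [
    ("A", "a1", 0),
    ("B", "b1", 0),
    ("A", "a2", 100),   # positive offset: 100 ms of silence
    ("B", "b2", -200),  # negative offset: 200 ms of cross-talk
    ("A", "a3", 0),
]

# Hypothetical track durations in ms (not taken from real audio files).
DURATIONS_MS: Dict[str, int] = {
    "a1": 1000, "b1": 1000, "a2": 1000, "b2": 1000, "a3": 2000,
}


def schedule(turns, durations_ms):
    """Return one (speaker, start_ms, end_ms) interval per turn."""
    intervals = []
    prev_end = 0
    for speaker, track, offset_ms in turns:
        start = prev_end + offset_ms
        if start < 0:
            raise ValueError(f"turn {track} would start before t=0")
        end = start + durations_ms[track]
        intervals.append((speaker, start, end))
        prev_end = end
    return intervals


def render(intervals):
    """Draw one row per speaker: '*' is speech, '.' is silence."""
    total_ms = max(end for _, _, end in intervals)
    width = -(-total_ms // STEP_MS)  # ceiling division
    rows: Dict[str, List[str]] = {}
    for speaker, start, end in intervals:
        row = rows.setdefault(speaker, ["."] * width)
        for i in range(start // STEP_MS, -(-end // STEP_MS)):
            row[i] = "*"
    return {speaker: "".join(row) for speaker, row in rows.items()}


ROWS = render(schedule(TURNS, DURATIONS_MS))
for name, row in ROWS.items():
    print(f"{name}: {row}")
```

With these hypothetical values, the printed rows have the same speech and silence pattern as the A and B rows in the diagram, including the 100 ms gap before a2 and the 200 ms overlap between a2 and b2.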