# Conversational Speech generator tool

Python tool to generate multiple-end audio tracks to simulate conversational
speech with two or more participants.

The input to the tool is a directory containing a number of audio tracks and
a text file indicating how to time the sequence of speech turns (see the Example
section).

Since the timing of the speaking turns is specified by the user, the generated
tracks may not be suitable for testing scenarios in which there is unpredictable
network delay (e.g., end-to-end RTC assessment).

Instead, the generated tracks can be used when the delay is constant (including
the case in which there is no delay).
For instance, echo cancellation in the APM module can be evaluated using two-end
audio tracks as input and reverse input.

By indicating negative and positive time offsets, one can reproduce cross-talk
and silence, respectively, in the conversation.

IMPORTANT: **the whole code has not landed yet.**

### Example

For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4
(speaker A) and b1, b2 (speaker B).
The text file with the timing information may look like this:

```
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
```

The first column indicates the speaker name, the second contains the audio track
file names, and the third the offsets (in milliseconds) used to concatenate the
chunks. Each offset is relative to the end of the previous turn: a positive
value inserts silence, and a negative value makes the turn overlap the previous
one.
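
To make the three-column format concrete, here is a minimal parsing sketch. It is not the tool's own parser (that code has not landed): the `Turn` type and the `parse_timing` function are hypothetical names, and the sketch assumes whitespace-separated fields and integer offsets.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One speaking turn (hypothetical structure, not the tool's own)."""
    speaker: str
    track: str
    offset_ms: int


def parse_timing(text: str) -> List[Turn]:
    """Parse whitespace-separated `speaker track offset` lines."""
    turns = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # Skip blank lines.
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}")
        speaker, track, offset = fields
        turns.append(Turn(speaker, track, int(offset)))
    return turns


EXAMPLE = """\
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
"""

turns = parse_timing(EXAMPLE)
print(turns[3])  # Turn(speaker='B', track='b2', offset_ms=-200)
```

Keeping the offset as a signed integer preserves the distinction between silence (positive) and cross-talk (negative) for later processing.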

Assume that all the audio tracks in the example above are 1000 ms long.
The tool will then generate two tracks (A and B) that look like this:

**Track A**
```
  a1 (1000 ms)
  silence (1100 ms)
  a2 (1000 ms)
  silence (800 ms)
  a3 (1000 ms)
  a4 (1000 ms)
```

**Track B**
```
  silence (1000 ms)
  b1 (1000 ms)
  silence (900 ms)
  b2 (1000 ms)
  silence (2000 ms)
```
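
The segment lists above can be reproduced with a short sketch. It assumes that each turn starts at the end of the turn listed just before it plus its offset, and that every track is padded with silence to a common length. The `layout` function and the `"silence"` label are illustrative choices, not the tool's actual implementation.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, int]  # (label, duration in ms)


def layout(turns: Sequence[Tuple[str, str, int]],
           durations_ms: Dict[str, int]) -> Dict[str, List[Segment]]:
    """Return, for each speaker, the segments that make up its track."""
    # Pass 1: place every turn on a shared time line.
    placed = []
    previous_end = 0
    for speaker, track, offset_ms in turns:
        start = previous_end + offset_ms
        if start < 0:
            raise ValueError(f"{track} would start before t=0")
        previous_end = start + durations_ms[track]
        placed.append((speaker, track, start, previous_end))
    total_ms = max((end for _, _, _, end in placed), default=0)

    # Pass 2: per speaker, pad the gaps between its own turns with silence.
    tracks: Dict[str, List[Segment]] = {}
    cursor: Dict[str, int] = {}
    for speaker, track, start, end in placed:
        segments = tracks.setdefault(speaker, [])
        gap = start - cursor.get(speaker, 0)
        if gap < 0:
            # Assumption: a speaker cannot overlap its own earlier turn.
            raise ValueError(f"{track} overlaps an earlier turn of {speaker}")
        if gap > 0:
            segments.append(("silence", gap))
        segments.append((track, end - start))
        cursor[speaker] = end

    # Pad every track with trailing silence up to the common length.
    for speaker, segments in tracks.items():
        tail = total_ms - cursor[speaker]
        if tail > 0:
            segments.append(("silence", tail))
    return tracks


TURNS = [("A", "a1", 0), ("B", "b1", 0), ("A", "a2", 100),
         ("B", "b2", -200), ("A", "a3", 0), ("A", "a4", 0)]
DURATIONS_MS = {name: 1000 for name in ("a1", "a2", "a3", "a4", "b1", "b2")}

result = layout(TURNS, DURATIONS_MS)
for speaker, segments in result.items():
    print(speaker, segments)
```

Splitting the work into two passes keeps the shared time line (who speaks when) separate from the per-track padding, which is where the silence durations come from.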

The two tracks can also be visualized as follows (one character represents
100 ms, "." is silence and "*" is speech).

```
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
```
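
As a cross-check of the diagram, the sketch below renders each track from its speech intervals and measures cross-talk and mutual silence. The `render` helper is hypothetical, and the interval lists are derived by hand from the example under the assumption that all tracks are 1000 ms long.

```python
from typing import Iterable, Tuple


def render(speech_ms: Iterable[Tuple[int, int]], total_ms: int,
           step_ms: int = 100) -> str:
    """Draw one character per `step_ms`: "*" for speech, "." for silence."""
    cells = ["."] * -(-total_ms // step_ms)  # Ceiling division.
    for start, end in speech_ms:
        for index in range(start // step_ms, -(-end // step_ms)):
            cells[index] = "*"
    return "".join(cells)


# Speech intervals (start, end) in ms, taken from the example above.
SPEECH_A = [(0, 1000), (2100, 3100), (3900, 5900)]
SPEECH_B = [(1000, 2000), (2900, 3900)]
TOTAL_MS = 5900
STEP_MS = 100

line_a = render(SPEECH_A, TOTAL_MS, STEP_MS)
line_b = render(SPEECH_B, TOTAL_MS, STEP_MS)
print("A:", line_a)
print("B:", line_b)

# Cross-talk: both ends speak. Mutual silence: neither end speaks.
cross_talk_ms = STEP_MS * sum(a == b == "*" for a, b in zip(line_a, line_b))
silence_ms = STEP_MS * sum(a == b == "." for a, b in zip(line_a, line_b))
print(cross_talk_ms, silence_ms)  # 200 100
```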