OLD | NEW |
---|---|
(Empty) | |
1 #Conversational Speech generator tool | |
2 | |
3 Python tool to generate pairs of audio tracks, one track per end, that | |
4 simulate conversational speech. | |
5 | |
6 The input to the tool is a directory containing a number of audio tracks and | |
7 a text file indicating how to time the sequence of speech turns (see the Example | |
8 section). | |
9 | |
10 Since the timing of the speaking turns is specified by the user, the generated | |
11 tracks may not be suitable for testing scenarios in which there is unpredictable | |
12 network delay (e.g., end-to-end RTC assessment). | |
13 Instead, the generated pairs can be used when the delay is constant | |
14 (including the case in which there is no delay). | |
15 For instance, echo cancellation in the APM module can be evaluated using the | |
16 generated tracks as input and reverse input. | |
17 | |
18 By indicating negative and positive time offsets, one can reproduce cross-talk | |
19 and silence, respectively, in the conversation. | |
20 | |
21 IMPORTANT: **not all of the code has landed yet.** | |
22 | |
23 ###Example | |
24 | |
25 For each end, there is a set of audio tracks, e.g., a1, a2 and a3 (speaker A) | |
26 and b1, b2 (speaker B). | |
27 The text file with the timing information may look like this: | |
28 ``` a1 0 | |
hlundin-webrtc
2017/03/03 14:01:10
Is this implicit assumption that the speakers swit
AleBzk
2017/03/03 15:27:08
Good point. I think it's easier to allow freedom h
| |
29 b1 0 | |
30 a2 100 | |
31 b2 -200 | |
32 a3 0``` | |
33 The first column contains the audio track file names; the second contains the | |
34 offsets (in milliseconds) used to concatenate the chunks. | |
35 | |
36 Assume that all the audio tracks in the example above are 1000 ms long. | |
37 The tool will then generate two tracks that look like this: | |
38 | |
39 ```Track A: | |
40 a1 (1000 ms) | |
41 silence (1100 ms) | |
42 a2 (1000 ms) | |
43 silence (800 ms) | |
44 a3 (1000 ms)``` | |
45 | |
46 ```Track B: | |
47 silence (1000 ms) | |
48 b1 (1000 ms) | |
49 silence (900 ms) | |
50 b2 (1000 ms) | |
51 silence (1000 ms)``` | |
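
The layout above can be reproduced with a short sketch. It is not the tool's code (which has not fully landed); `parse_timing` and `layout` are hypothetical names. It assumes that each offset is measured from the end of the previous turn, as the example suggests, and that the first letter of a track name identifies the speaker, which the README does not state.

```python
def parse_timing(text):
    """Parse 'name offset_ms' lines into a list of (name, offset_ms) tuples."""
    turns = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, offset = line.split()
        turns.append((name, int(offset)))
    return turns


def layout(turns, durations_ms):
    """Return {speaker: [(label, length_ms), ...]} describing each track.

    Assumptions (not confirmed by the README): offsets are relative to the
    end of the previous turn, and the speaker is the first letter of the
    track name. Turns of the same speaker are assumed not to overlap.
    """
    placed = []  # (speaker, name, begin_ms, end_ms)
    prev_end = 0
    for name, offset in turns:
        begin = prev_end + offset
        end = begin + durations_ms[name]
        placed.append((name[0].upper(), name, begin, end))
        prev_end = end

    total = max(end for _, _, _, end in placed)
    tracks = {}
    for speaker in sorted({p[0] for p in placed}):
        segments, cursor = [], 0
        for spk, name, begin, end in placed:
            if spk != speaker:
                continue
            if begin > cursor:
                segments.append(("silence", begin - cursor))
            segments.append((name, end - begin))
            cursor = end
        if cursor < total:
            # Pad with silence so both tracks have the same length.
            segments.append(("silence", total - cursor))
        tracks[speaker] = segments
    return tracks


timing = "a1 0\nb1 0\na2 100\nb2 -200\na3 0\n"
durations = {name: 1000 for name in ("a1", "a2", "a3", "b1", "b2")}
tracks = layout(parse_timing(timing), durations)
```

With 1000 ms chunks, `tracks["A"]` and `tracks["B"]` list the same speech and silence segments as Track A and Track B above.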
52 | |
53 The two tracks can also be visualized as follows (one character represents | |
54 100 ms, "." is silence and "*" is speech): | |
55 | |
56 ```t: 0         1         2         3         4         5 (s) | |
57 A: **********...........**********........********** | |
58 B: ..........**********.........**********..........``` | |
59                                 ^ 200 ms cross-talk | |
60         100 ms silence ^ | |
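
As a rough illustration of how such a diagram could be derived from the timing file, the sketch below renders one row per speaker. `render` is a hypothetical helper, not part of the tool. It relies on the same assumptions as the example: offsets are relative to the end of the previous turn, and the first letter of the track name identifies the speaker.

```python
def render(turns, durations_ms, step_ms=100):
    """Return {speaker: row} where '*' is speech and '.' is silence.

    Each character covers step_ms milliseconds. Assumes offsets are
    relative to the end of the previous turn and that the speaker is the
    first letter of the track name (both are assumptions).
    """
    spans = []  # (speaker, begin_ms, end_ms)
    prev_end = 0
    for name, offset in turns:
        begin = prev_end + offset
        prev_end = begin + durations_ms[name]
        spans.append((name[0].upper(), begin, prev_end))

    width = max(end for _, _, end in spans) // step_ms
    rows = {}
    for speaker, begin, end in spans:
        row = rows.setdefault(speaker, ["."] * width)
        for i in range(begin // step_ms, end // step_ms):
            row[i] = "*"
    return {speaker: "".join(row) for speaker, row in rows.items()}


turns = [("a1", 0), ("b1", 0), ("a2", 100), ("b2", -200), ("a3", 0)]
durations = {name: 1000 for name, _ in turns}
rows = render(turns, durations)

# Columns where both ends are speaking (cross-talk) or both are silent.
cross_talk = [i for i, (a, b) in enumerate(zip(rows["A"], rows["B"])) if a == b == "*"]
both_silent = [i for i, (a, b) in enumerate(zip(rows["A"], rows["B"])) if a == b == "."]
```

Here `cross_talk` holds two columns (200 ms) and `both_silent` holds one column (100 ms), matching the annotations in the diagram.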