Commit 81d4409

Update README.md and upload Korean samples
1 parent 8732aa0 commit 81d4409

10 files changed, 26 additions and 86 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,7 @@

 - 2021/04/09 The [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) branch provides [PWG](https://arxiv.org/abs/1910.11480) / [MelGAN](https://arxiv.org/abs/1910.06711) / [Multi-band MelGAN](https://arxiv.org/abs/2005.05106) vocoders!
 - 2021/04/05 Supports [ParallelText2Mel](https://github.com/atomicoo/ParallelTTS/blob/main/models/parallel.py) + [MelGAN](https://arxiv.org/abs/1910.06711) vocoder!
+- [ Key Info ] [Speed metrics](#速度指标)[Samples](https://github.com/atomicoo/ParallelTTS/tree/main/samples/)[Web demo](#)[Get in touch](#欢迎交流) ...

 ## Repo Structure

@@ -126,6 +127,7 @@ $ tensorboard --logdir logdir/[DIR]/
 - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): English, female, 22050 Hz, ~24 h
 - [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): Japanese, female, 48000 Hz, ~10 h
 - [BiaoBei](https://www.data-baker.com/open_source.html): Mandarin, female, 48000 Hz, ~12 h
+- [KSS](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset): Korean, female, 44100 Hz, ~12 h
 - [RuLS](https://www.openslr.org/96/): Russian, multi-speaker (only a single speaker's audio is used), 16000 Hz, ~98 h
 - [TWLSpeech](#) (non-public, poor quality): Tibetan, female (multi-speaker, similar voices), 16000 Hz, ~23 h

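@@ -152,6 +154,7 @@ TODO: to be added

 - In the [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) branch, the `vocoder` code is taken from [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN). Because the acoustic feature extraction methods are incompatible, the features must be converted; see the conversion code [here](https://github.com/atomicoo/ParallelTTS/blob/4eb44679271494f1d478da281ae474a07dfe77c6/synthesize.wave.py#L79-L85).
 - The Mandarin model takes pinyin sequences as text input. Because the original pinyin sequences in [BiaoBei](https://www.data-baker.com/open_source.html) contain no punctuation and the alignment model was not fully trained, the rhythm of the synthesized speech is somewhat off.
+- No dedicated vocoder was trained for the Korean model; it directly uses the LJSpeech vocoder (also 22050 Hz), which may slightly affect the quality of the synthesized speech.

 ## References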

README_en.md

Lines changed: 4 additions & 1 deletion
@@ -4,10 +4,11 @@

 [TOC]

-## What's New
+## What's New !

 - 2021/04/09 The [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) branch supports [PWG](https://arxiv.org/abs/1910.11480) / [MelGAN](https://arxiv.org/abs/1910.06711) / [Multi-band MelGAN](https://arxiv.org/abs/2005.05106) vocoders!
 - 2021/04/05 Supports [ParallelText2Mel](https://github.com/atomicoo/ParallelTTS/blob/main/models/parallel.py) + [MelGAN](https://arxiv.org/abs/1910.06711) vocoder!
+- [ Key Info ] [Speed indicator](#Speed)[Samples](https://github.com/atomicoo/ParallelTTS/tree/main/samples/)[Web Demo](#)[Communication](#Communication) ......

 ## Repo Structure

@@ -125,6 +126,7 @@ It is highly recommended to use [Wandb](https://wandb.ai/)(Weights & Biases)
 - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): English, Female, 22050 Hz, ~24 h
 - [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): Japanese, Female, 48000 Hz, ~10 h
 - [BiaoBei](https://www.data-baker.com/open_source.html): Mandarin, Female, 48000 Hz, ~12 h
+- [KSS](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset): Korean, Female, 44100 Hz, ~12 h
 - [RuLS](https://www.openslr.org/96/): Russian, Multi-speaker (only a single speaker's audio is used), 16000 Hz, ~98 h
 - [TWLSpeech](#) (non-public, poor quality): Tibetan, Female (multi-speaker, similar voices), 16000 Hz, ~23 h

@@ -151,6 +153,7 @@ Attention, no multiple tests, for reference only.

 - In the [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) branch, the `vocoder` code is from [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN). Since the acoustic feature extraction methods are not compatible, the features need to be converted. See [here](https://github.com/atomicoo/ParallelTTS/blob/4eb44679271494f1d478da281ae474a07dfe77c6/synthesize.wave.py#L79-L85).
 - The input of the Mandarin model is pinyin. Because [BiaoBei](https://www.data-baker.com/open_source.html)'s raw pinyin sequences lack punctuation and the alignment model was not fully trained, the rhythm of the synthesized samples is somewhat off.
+- I haven't trained a dedicated Korean vocoder and simply reuse the LJSpeech vocoder (22050 Hz), which might slightly affect the quality of the synthesized audio.

 ## References

config/mbspeech.yaml renamed to config/kospeech.yaml

Lines changed: 14 additions & 11 deletions
@@ -1,9 +1,9 @@
 data:
   datasets_path: './datasets'
-  dataset: 'mbspeech'
-  dataset_dir: 'MBSpeech-1.0'
+  dataset: 'ksspeech'
+  dataset_dir: 'KSSpeech-1.4'
 text:
-  graphemes: &gs !!python/object/apply:eval ['list("абвгдеёжзийклмноөпрстуүфхцчшъыьэюя")']
+  graphemes: &gs !!python/object/apply:eval ['list([chr(_) for _ in range(0x1100, 0x1113)]+[chr(_) for _ in range(0x1161, 0x1176)]+[chr(_) for _ in range(0x11A8, 0x11C3)])']
   phonemes: &ps !!python/object/apply:eval ['[]']
   specials: &sp !!python/object/apply:eval ['["<pad>", "<unk>"]']
   punctuations: &pt !!python/object/apply:eval ['[".", ",", "?", "!", " ", "-"]']
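
The new `graphemes` expression in the hunk above spans three Unicode ranges of conjoining Hangul jamo: leading consonants, vowels, and trailing consonants. The sketch below rebuilds the same list in plain Python and checks a sample sentence against it. The helper names `to_jamo` and `out_of_vocabulary` are hypothetical, and decomposing input text with Unicode NFD normalization is an assumption about preprocessing; this commit does not show how the repository actually converts text to symbols.

```python
import unicodedata

# Same three ranges as the `graphemes` expression in config/kospeech.yaml.
INITIALS = [chr(c) for c in range(0x1100, 0x1113)]  # leading consonants
MEDIALS = [chr(c) for c in range(0x1161, 0x1176)]   # vowels
FINALS = [chr(c) for c in range(0x11A8, 0x11C3)]    # trailing consonants
GRAPHEMES = INITIALS + MEDIALS + FINALS

# Copied from the `punctuations` entry of the same config.
PUNCTUATIONS = [".", ",", "?", "!", " ", "-"]


def to_jamo(text: str) -> str:
    """Decompose precomposed Hangul syllables into conjoining jamo.

    Assumption: NFD normalization is used here for illustration only;
    the repository may rely on a different decomposition routine.
    """
    return unicodedata.normalize("NFD", text)


def out_of_vocabulary(text: str, allowed: set) -> list:
    """Return the sorted symbols of `text` that are not in `allowed`."""
    return sorted({ch for ch in to_jamo(text) if ch not in allowed})


if __name__ == "__main__":
    vocabulary = set(GRAPHEMES) | set(PUNCTUATIONS)
    sentence = "나는 살아오면서 감기를 앓은 적이 한 번도 없다."
    print(len(INITIALS), len(MEDIALS), len(FINALS), len(GRAPHEMES))
    print(out_of_vocabulary(sentence, vocabulary))
```

Under this assumption, any symbol reported by `out_of_vocabulary` would have to be added to the vocabulary or mapped to `<unk>` before synthesis.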
@@ -14,18 +14,21 @@ audio:
   filter_length: 1024
   hop_length: 256 # WARNING: this can't be changed.
   win_length: 1024
-  sampling_rate: &sr 22050
+  sampling_rate: &sr 22050 # 44100 (raw)
   segment_length: *sr
   pad_short: 2000
   mel_fmin: 0.0
   mel_fmax: 8000.0
   # Precomputed statistics for log-mel-spectrs for speech dataset
-  spec_mean: -5.522 # for LJSpeech dataset
-  spec_std: 2.063 # for LJSpeech dataset
-  spec_min: -11.5129 # for LJSpeech dataset
-  spec_max: 2.0584 # for LJSpeech dataset
+  spec_mean: -4.855 # for KSSpeech dataset
+  spec_std: 2.036 # for KSSpeech dataset
+  spec_min: -11.5129 # for KSSpeech dataset
+  spec_max: 1.9256 # for KSSpeech dataset
   # Others
-  normalize: false
+  force_frame_rate: false
+  normalize:
+    match_volume: true
+    trim_silence: true
   reduction_rate: 4
 parallel:
   ground_truth: false
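
The audio settings in the hunk above imply a few derived quantities that are easy to check by hand. The sketch below works them out from the values in `config/kospeech.yaml`. The functions `standardize` and `destandardize` are hypothetical: that `spec_mean` and `spec_std` are applied as a plain mean/std standardization of log-mel values is an assumption, as is the reading of `spec_min` as the natural log of a 1e-5 clamp floor.

```python
import math

# Values copied from config/kospeech.yaml in this commit.
RAW_RATE = 44100            # raw KSS sampling rate, per the config comment
SAMPLING_RATE = 22050       # rate the model is configured for
HOP_LENGTH = 256
SEGMENT_LENGTH = SAMPLING_RATE
SPEC_MEAN, SPEC_STD = -4.855, 2.036
SPEC_MIN, SPEC_MAX = -11.5129, 1.9256

# The raw audio is an exact integer multiple of the target rate.
downsample_factor = RAW_RATE // SAMPLING_RATE

# Number of whole hops in one training segment, and the hop duration.
frames_per_segment = SEGMENT_LENGTH // HOP_LENGTH
frame_ms = 1000 * HOP_LENGTH / SAMPLING_RATE


def standardize(log_mel: float) -> float:
    """Assumed use of the precomputed statistics: zero mean, unit variance."""
    return (log_mel - SPEC_MEAN) / SPEC_STD


def destandardize(value: float) -> float:
    """Inverse of `standardize`, recovering the log-mel value."""
    return value * SPEC_STD + SPEC_MEAN


if __name__ == "__main__":
    print(downsample_factor, frames_per_segment, round(frame_ms, 2))
    # spec_min is numerically close to ln(1e-5).
    print(math.log(1e-5), SPEC_MIN)
    print(standardize(SPEC_MIN), standardize(SPEC_MAX))
```

If the statistics are used this way, features produced with the KSS values would need to be mapped back with `destandardize` before being handed to a vocoder trained on differently normalized inputs.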
@@ -70,5 +73,5 @@ trainer:
   disable_progress_bar: false
   logdir: './logdir'
 synthesizer:
-  inputs_file_path: './samples/text/synthesize-mb.txt'
-  outputs_dir: './samples/audio'
+  inputs_file_path: './outputs/text/synthesize-ko.txt'
+  outputs_dir: './outputs/audio'

config/ruspeech.yaml

Lines changed: 0 additions & 74 deletions
This file was deleted.

samples/korean/001-syn.wav

287 KB
Binary file not shown.

samples/korean/002-syn.wav

287 KB
Binary file not shown.

samples/korean/003-syn.wav

287 KB
Binary file not shown.

samples/korean/004-syn.wav

287 KB
Binary file not shown.

samples/korean/005-syn.wav

287 KB
Binary file not shown.

samples/korean/synthesize.txt

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+나는 살아오면서 감기를 앓은 적이 한 번도 없다.
+아무리 그 질문에 대해 생각해 봐도 모르겠어요.
+저는 어릴 때부터 샴푸 대신 비누로 머리를 감았어요.
+많은 아빠들이 집에 돌아가는 길에 빵집에 들른다.
+오늘 아침 폭설로 열차 운행이 한 시간 동안 중단되었다.
