atomicoo
diff --git a/README.md b/README.md
Lines changed: 3 additions & 0 deletions
diff --git a/README_en.md b/README_en.md
Lines changed: 4 additions & 1 deletion
diff --git a/config/mbspeech.yaml b/config/kospeech.yaml (renamed)
Lines changed: 14 additions & 11 deletions
diff --git a/config/ruspeech.yaml b/config/ruspeech.yaml
Lines changed: 0 additions & 74 deletions
diff --git a/samples/korean/001-syn.wav b/samples/korean/001-syn.wav
287 KB
diff --git a/samples/korean/002-syn.wav b/samples/korean/002-syn.wav
287 KB
diff --git a/samples/korean/003-syn.wav b/samples/korean/003-syn.wav
287 KB
diff --git a/samples/korean/004-syn.wav b/samples/korean/004-syn.wav
287 KB
diff --git a/samples/korean/005-syn.wav b/samples/korean/005-syn.wav
287 KB
diff --git a/samples/korean/synthesize.txt b/samples/korean/synthesize.txt
Lines changed: 5 additions & 0 deletions
@@ -8,6 +8,7 @@
 
 - 2021/04/09 [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) 分支 提供 [PWG](https://arxiv.org/abs/1910.11480) / [MelGAN](https://arxiv.org/abs/1910.06711) / [Multi-band MelGAN](https://arxiv.org/abs/2005.05106) 声码器！
 - 2021/04/05 支持 [ParallelText2Mel](https://github.com/atomicoo/ParallelTTS/blob/main/models/parallel.py) + [MelGAN](https://arxiv.org/abs/1910.06711) 声码器！
+- [ 关键信息 ]  [速度指标](#速度指标)，[合成样例](https://github.com/atomicoo/ParallelTTS/tree/main/samples/)，[网页演示](#)，[欢迎交流](#欢迎交流) ……
 
 ## 目录结构
 
@@ -126,6 +127,7 @@ $ tensorboard --logdir logdir/[DIR]/
 - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)：英语，女性，22050 Hz，约 24 小时
 - [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut)：日语，女性，48000 Hz，约 10 小时
 - [BiaoBei](https://www.data-baker.com/open_source.html)：普通话，女性，48000 Hz，约 12 小时
+- [KSS](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset)：韩语，女性，44100 Hz，约 12 小时
 - [RuLS](https://www.openslr.org/96/)：俄语，多说话人（仅使用单一说话人音频），16000 Hz，约 98 小时
 - [TWLSpeech](#)（非公开，质量较差）：藏语，女性（多说话人，音色相近），16000 Hz，约 23 小时
 
@@ -152,6 +154,7 @@ TODO：待补充
 
 - 在 [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) 分支中，`vocoder` 代码取自 [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)，由于声学特征提取方式不兼容，需要进行转化，具体转化代码见[这里](https://github.com/atomicoo/ParallelTTS/blob/4eb44679271494f1d478da281ae474a07dfe77c6/synthesize.wave.py#L79-L85)。
 - 普通话模型的文本输入选择拼音序列，因为 [BiaoBei](https://www.data-baker.com/open_source.html) 的原始拼音序列不包含标点、以及对齐模型训练不完全，所以合成语音的节奏会有点问题。
+- 韩语模型没有专门训练对应的声码器，而是直接使用 LJSpeech（同为 22050 Hz）的声码器，可能稍微影响合成语音的质量。
 
 ## 参考资料
 
 
@@ -4,10 +4,11 @@
 
 [TOC]
 
-## What's New
+## What's New !
 
 - 2021/04/09 [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) branch support [PWG](https://arxiv.org/abs/1910.11480) / [MelGAN](https://arxiv.org/abs/1910.06711) / [Multi-band MelGAN](https://arxiv.org/abs/2005.05106) vocoder!
 - 2021/04/05 Support [ParallelText2Mel](https://github.com/atomicoo/ParallelTTS/blob/main/models/parallel.py) + [MelGAN](https://arxiv.org/abs/1910.06711) vocoder!
+- [ Key Info ]  [Speed indicator](#Speed)，[Samples](https://github.com/atomicoo/ParallelTTS/tree/main/samples/)，[Web Demo](#)，[Communication](#Communication) ......
 
 ## Repo Structure
 
@@ -125,6 +126,7 @@ It is highly recommended to use [Wandb](https://wandb.ai/)（Weights & Biases）
 - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): English, Female, 22050 Hz, ~24 h
 - [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut): Japanese, Female, 48000 Hz, ~10 h
 - [BiaoBei](https://www.data-baker.com/open_source.html): Mandarin, Female, 48000 Hz, ~12 h
+- [KSS](https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset): Korean, Female, 44100 Hz, ~12 h
 - [RuLS](https://www.openslr.org/96/): Russian, Multi-speakers (only use audios of single speaker), 16000 Hz, ~98 h
 - [TWLSpeech](#) (non-public, poor quality): Tibetan, Female (multi-speakers, sound similar), 16000 Hz，~23 h
 
@@ -151,6 +153,7 @@ Attention, no multiple tests, for reference only.
 
 - In [wavegan](https://github.com/atomicoo/ParallelTTS/tree/wavegan) branch, code of `vocoder` is from [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN). Since the method of acoustic feature extraction is not compatible, it needs to be transformed. See [here](https://github.com/atomicoo/ParallelTTS/blob/4eb44679271494f1d478da281ae474a07dfe77c6/synthesize.wave.py#L79-L85).
 - The input of mandarin model is pinyin. Because of the lack of punctuations in [BiaoBei](https://www.data-baker.com/open_source.html)'s raw pinyin sequence and the incomplete alignment model training, there's something wrong with the rhythm of synthesized samples.
+- I haven't trained a Korean vocoder specially, and just use the vocoder of LJSpeech (22050 Hz), which might slightly affect the quality of synthesized audio.
 
 ## References
 
 
@@ -1,9 +1,9 @@
 data:
   datasets_path: './datasets'
-  dataset: 'mbspeech'
-  dataset_dir: 'MBSpeech-1.0'
+  dataset: 'ksspeech'
+  dataset_dir: 'KSSpeech-1.4'
 text:
-  graphemes: &gs !!python/object/apply:eval ['list("абвгдеёжзийклмноөпрстуүфхцчшъыьэюя")']
+  graphemes: &gs !!python/object/apply:eval ['list([chr(_) for _ in range(0x1100, 0x1113)]+[chr(_) for _ in range(0x1161, 0x1176)]+[chr(_) for _ in range(0x11A8, 0x11C3)])']
   phonemes: &ps !!python/object/apply:eval ['[]']
   specials: &sp !!python/object/apply:eval ['["<pad>", "<unk>"]']
   punctuations: &pt !!python/object/apply:eval ['[".", ",", "?", "!", " ", "-"]']
@@ -14,18 +14,21 @@ audio:
   filter_length: 1024
   hop_length: 256  # WARNING: this can't be changed.
   win_length: 1024
-  sampling_rate: &sr 22050
+  sampling_rate: &sr 22050  # 44100 (raw)
   segment_length: *sr
   pad_short: 2000
   mel_fmin: 0.0
   mel_fmax: 8000.0
   # Precomputed statistics for log-mel-spectrs for speech dataset
-  spec_mean: -5.522  # for LJSpeech dataset
-  spec_std: 2.063  # for LJSpeech dataset
-  spec_min: -11.5129  # for LJSpeech dataset
-  spec_max: 2.0584  # for LJSpeech dataset
+  spec_mean: -4.855  # for KSSpeech dataset
+  spec_std: 2.036  # for KSSpeech dataset
+  spec_min: -11.5129  # for KSSpeech dataset
+  spec_max: 1.9256  # for KSSpeech dataset
   # Others
-  normalize: false
+  force_frame_rate: false
+  normalize:
+    match_volume: true
+    trim_silence: true
   reduction_rate: 4
 parallel:
   ground_truth: false
@@ -70,5 +73,5 @@ trainer:
   disable_progress_bar: false
   logdir: './logdir'
 synthesizer:
-  inputs_file_path: './samples/text/synthesize-mb.txt'
-  outputs_dir: './samples/audio'
+  inputs_file_path: './outputs/text/synthesize-ko.txt'
+  outputs_dir: './outputs/audio'
@@ -0,0 +1,5 @@
+나는 살아오면서 감기를 앓은 적이 한 번도 없다.
+아무리 그 질문에 대해 생각해 봐도 모르겠어요.
+저는 어릴 때부터 샴푸 대신 비누로 머리를 감았어요.
+많은 아빠들이 집에 돌아가는 길에 빵집에 들른다.
+오늘 아침 폭설로 열차 운행이 한 시간 동안 중단되었다.
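
Review note: the new `graphemes` entry in `config/kospeech.yaml` lists conjoining Hangul jamo (U+1100–U+1112, U+1161–U+1175, U+11A8–U+11C2), not precomposed syllables, so the sentences in `samples/korean/synthesize.txt` have to be decomposed before they can be mapped onto that symbol set. The sketch below rebuilds the same ranges and checks one of the sample sentences against them. It assumes Unicode NFD normalization is the decomposition step; the diff does not show how the repository actually does this, and `to_jamo` and `out_of_vocabulary` are hypothetical helper names, not functions from the codebase.

```python
import unicodedata

# Same three code-point ranges as the `graphemes` entry in config/kospeech.yaml.
LEADING = [chr(c) for c in range(0x1100, 0x1113)]   # initial consonants
VOWELS = [chr(c) for c in range(0x1161, 0x1176)]    # medial vowels
TRAILING = [chr(c) for c in range(0x11A8, 0x11C3)]  # final consonants
GRAPHEMES = LEADING + VOWELS + TRAILING

# Same list as the `punctuations` entry in the config.
PUNCTUATIONS = [".", ",", "?", "!", " ", "-"]


def to_jamo(text):
    """Hypothetical helper: split precomposed Hangul syllables into
    conjoining jamo using NFD normalization (an assumption, see above)."""
    return list(unicodedata.normalize("NFD", text))


def out_of_vocabulary(text):
    """Hypothetical helper: return the decomposed characters that are
    covered by neither the grapheme list nor the punctuation list."""
    allowed = set(GRAPHEMES) | set(PUNCTUATIONS)
    return [ch for ch in to_jamo(text) if ch not in allowed]


# First sentence of samples/korean/synthesize.txt.
sample = "나는 살아오면서 감기를 앓은 적이 한 번도 없다."

print(len(GRAPHEMES))                            # size of the jamo symbol set
print([hex(ord(ch)) for ch in to_jamo("한")])    # code points of one decomposed syllable
print(out_of_vocabulary(sample))                 # characters the config would not cover
```

The three ranges contribute 19, 21, and 27 symbols, 67 in total. A check like `out_of_vocabulary` is a cheap way to confirm that an input file contains no digits, Latin letters, or other symbols that would fall back to `<unk>`.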