Commit 2b9994c

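✨ feat(main.py, requirements.txt, pyproject.toml): add dependencies and manage packages via pyproject.toml
🐛 fix(main.py): use librosa for audio file normalization and silence trimming
✨ feat(main.py): add command-line argument validation
✨ feat(main.py): add version information
🐛 fix(main.py): fix the SOFA model path
🐛 fix(main.py): fix TextGrid creation
✨ feat(main.py): change how the audio file duration is obtained
♻️ refactor(main.py): tidy up the code and remove unused imports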
1 parent f16e54e commit 2b9994c
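
The commit message mentions argument validation and version information, but the main.py diff is not shown on this page. The following is only a sketch of what such validation could look like using the standard-library `argparse`; the function names and the hard-coded version string are assumptions, not the commit's actual code:

```python
import argparse
from pathlib import Path

# Assumption: mirrors the version declared in pyproject.toml in this commit.
__version__ = "0.1.0"


def existing_dir(value):
    """argparse type callable: accept only paths that are existing directories."""
    path = Path(value)
    if not path.is_dir():
        raise argparse.ArgumentTypeError(f"not a directory: {value}")
    return path


def build_parser():
    """Build a parser that takes one or more voicebank folders (hypothetical CLI)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    parser.add_argument(
        "folders",
        nargs="+",
        type=existing_dir,
        help="one or more voicebank (pitch) folders",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    for folder in args.folders:
        print(folder)
```

With this shape, a missing folder makes `argparse` print a usage error and exit with status 2 before any processing starts.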

9 files changed: +2459 -70 lines changed

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -161,9 +161,9 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/

-src/cktp/**
-!src/cktp/**/
-!src/cktp/**/.gitkeep
+src/ckpt/**
+!src/ckpt/**/
+!src/ckpt/**/.gitkeep

 src/dictionaries/**
 !src/dictionaries/**/

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.11

README.md

Lines changed: 39 additions & 15 deletions
@@ -1,31 +1,55 @@
 # Voicebank2DiffSinger
-Generates a pre-training dataset from UTAU voicebank files using SOFA and MakeDiffSinger
+Creates a training dataset for DiffSinger from a UTAU voicebank using SOFA and MakeDiffSinger

 ## Prerequisites
+- Windows
 - Desktop development with C++ (Visual Studio)
 - CMake
-- Python earlier than 3.12 (tested with 3.10.11)
+- Python earlier than 3.12 (tested with 3.11.11)

-## Usage (Windows)
-1. Clone this repository, including its submodules
-```sh
+## Installation (uv (fast))
+1. Set up uv (optional)
+```powershell
+powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+2. Clone this repository, including its submodules, and change into the directory
+```powershell
+git clone --recursive
+cd Voicebank2DiffSinger
+```
+3. Install the required modules
+```powershell
+uv sync
+```
+4. From the [Japanese SOFA model](https://github.com/Greenleaf2001/SOFA_Models/releases/tag/JPN_Test2) release, download "step.100000.ckpt" and "japanese-extension-sofa.txt",
+place "step.100000.ckpt" in "src/ckpt",
+and place "japanese-extension-sofa.txt" in "src/dictionaries"
+
+## Installation (pip)
+1. Clone this repository, including its submodules, and change into the directory
+```powershell
 git clone --recursive
+cd Voicebank2DiffSinger
 ```
 2. Create a virtual environment and activate it
-```sh
+```powershell
 python -m venv .venv
 .venv/scripts/activate
 ```
 3. Install the required modules
-```sh
+```powershell
 pip install -r requirements.txt
-pip install -r src/SOFA/requirements.txt
-pip install -r src/MakeDiffSinger/acoustic_forced_alignment/requirements.txt
-pip install -r src/MakeDiffSinger/variance-temp-solution/requirements.txt
-```
-4. Set up PyTorch by following the [official PyTorch site](https://pytorch.org/get-started/locally/)
-5. Download the [Japanese SOFA model](https://github.com/colstone/SOFA_Models/releases/tag/JPN-V0.0.2b), extract it, place "japanese-v2.0-45000.ckpt" in "src/cktp", and place "japanese-dictionary.txt" in "src/dictionaries"
-6. Run src/main.py, passing one or more voicebank folders as arguments
-```sh
+```
+4. From the [Japanese SOFA model](https://github.com/Greenleaf2001/SOFA_Models/releases/tag/JPN_Test2) release, download "step.100000.ckpt" and "japanese-extension-sofa.txt",
+place "step.100000.ckpt" in "src/ckpt",
+and place "japanese-extension-sofa.txt" in "src/dictionaries"
+
+## Usage
+1. Activate the virtual environment (optional)
+```powershell
+.venv/scripts/activate
+```
+2. Run src/main.py, passing one or more voicebank (pitch) folders as arguments
+```powershell
 python src/main.py example/A3 example/A2 example/A4
 ```
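
The README's final install step places two downloaded files at fixed paths. A minimal sketch of a pre-flight check, assuming the paths given in that step; `missing_model_files` is a hypothetical helper and is not part of main.py:

```python
from pathlib import Path

# Relative paths taken from the README install step in this commit.
REQUIRED_FILES = (
    Path("src/ckpt/step.100000.ckpt"),
    Path("src/dictionaries/japanese-extension-sofa.txt"),
)


def missing_model_files(repo_root):
    """Return the required files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when both files are present.
    for rel in missing_model_files("."):
        print(f"missing: {rel.as_posix()}")
```

Running it from the repository root before `python src/main.py ...` would surface a misplaced checkpoint or dictionary early, which matters here because the commit also renames the directory from `src/cktp` to `src/ckpt`.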

pyproject.toml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+[project]
+name = "voicebank2diffsinger"
+version = "0.1.0"
+description = "Convert the UTAU Voicebank to a configuration compatible with DiffSinger Dataset"
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+"beautifulsoup4>=4.13.3",
+"biopython==1.78",
+"chardet>=5.2.0",
+"click>=8.1.8",
+"einops==0.6.1",
+"h5py>=3.13.0",
+"librosa<0.10.0",
+"lightning>=2.0.0",
+"matplotlib~=3.7.3",
+"numba>=0.61.0",
+"numpy~=1.24.1",
+"pandas~=2.0.3",
+"praat-parselmouth>=0.4.5",
+"praatio<6.0.0",
+"pyopenjtalk-plus>=0.3.4.post10",
+"pyyaml~=6.0.1",
+"soundfile>=0.13.1",
+"sox>=1.5.0",
+"sqlalchemy==1.4.46",
+"tensorboard>=2.19.0",
+"tensorboardx>=2.6.2.2",
+"textgrid>=1.6.1",
+"torch>=2.6.0",
+"torchaudio>=2.6.0",
+"tqdm~=4.66.1",
+"utaupy>=1.19.1",
+]
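
pyproject.toml sets `requires-python = ">=3.11"` with no upper bound, while the README asks for a Python earlier than 3.12. A small sketch of a runtime guard combining the two, assuming both constraints are intended; `is_supported` and the two constants are hypothetical names:

```python
import sys

# Assumed bounds: lower from pyproject.toml (>=3.11), upper from the README (<3.12).
MIN_VERSION = (3, 11)
MAX_EXCLUSIVE = (3, 12)


def is_supported(version_info=None):
    """Return True when (major, minor) lies in [MIN_VERSION, MAX_EXCLUSIVE)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    if not is_supported():
        current = f"{sys.version_info.major}.{sys.version_info.minor}"
        print(f"unsupported Python {current}; expected 3.11.x")
```

If the upper bound is really required, expressing it in `requires-python` itself would let the installer enforce it instead of a runtime check.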

requirements.txt

Lines changed: 271 additions & 4 deletions
@@ -1,4 +1,271 @@
-pyopenjtalk==0.3.3
-pydub==0.25.1
-beautifulsoup4==4.12.3
-utaupy==1.18.3
+# This file was autogenerated by uv via the following command:
+# uv pip compile pyproject.toml -o requirements.txt
+absl-py==2.1.0
+# via tensorboard
+aiohappyeyeballs==2.4.6
+# via aiohttp
+aiohttp==3.11.13
+# via fsspec
+aiosignal==1.3.2
+# via aiohttp
+attrs==25.1.0
+# via aiohttp
+audioread==3.0.1
+# via librosa
+beautifulsoup4==4.13.3
+# via voicebank2diffsinger (pyproject.toml)
+biopython==1.78
+# via voicebank2diffsinger (pyproject.toml)
+certifi==2025.1.31
+# via requests
+cffi==1.17.1
+# via soundfile
+chardet==5.2.0
+# via voicebank2diffsinger (pyproject.toml)
+charset-normalizer==3.4.1
+# via requests
+click==8.1.8
+# via voicebank2diffsinger (pyproject.toml)
+colorama==0.4.6
+# via
+# click
+# tqdm
+coloredlogs==15.0.1
+# via onnxruntime
+contourpy==1.3.1
+# via matplotlib
+cycler==0.12.1
+# via matplotlib
+decorator==5.2.1
+# via librosa
+einops==0.6.1
+# via voicebank2diffsinger (pyproject.toml)
+filelock==3.17.0
+# via torch
+flatbuffers==25.2.10
+# via onnxruntime
+fonttools==4.56.0
+# via matplotlib
+frozenlist==1.5.0
+# via
+# aiohttp
+# aiosignal
+fsspec==2025.2.0
+# via
+# lightning
+# pytorch-lightning
+# torch
+greenlet==3.1.1
+# via sqlalchemy
+grpcio==1.70.0
+# via tensorboard
+h5py==3.13.0
+# via voicebank2diffsinger (pyproject.toml)
+humanfriendly==10.0
+# via coloredlogs
+idna==3.10
+# via
+# requests
+# yarl
+jinja2==3.1.5
+# via torch
+joblib==1.4.2
+# via
+# librosa
+# scikit-learn
+kiwisolver==1.4.8
+# via matplotlib
+librosa==0.9.2
+# via voicebank2diffsinger (pyproject.toml)
+lightning==2.5.0.post0
+# via voicebank2diffsinger (pyproject.toml)
+lightning-utilities==0.12.0
+# via
+# lightning
+# pytorch-lightning
+# torchmetrics
+llvmlite==0.44.0
+# via numba
+markdown==3.7
+# via tensorboard
+markupsafe==3.0.2
+# via
+# jinja2
+# werkzeug
+matplotlib==3.7.5
+# via voicebank2diffsinger (pyproject.toml)
+mpmath==1.3.0
+# via sympy
+multidict==6.1.0
+# via
+# aiohttp
+# yarl
+networkx==3.4.2
+# via torch
+numba==0.61.0
+# via
+# voicebank2diffsinger (pyproject.toml)
+# librosa
+# resampy
+numpy==1.24.4
+# via
+# voicebank2diffsinger (pyproject.toml)
+# biopython
+# contourpy
+# h5py
+# librosa
+# matplotlib
+# numba
+# onnxruntime
+# pandas
+# praat-parselmouth
+# pyopenjtalk-plus
+# resampy
+# scikit-learn
+# scipy
+# soundfile
+# sox
+# tensorboard
+# tensorboardx
+# torchmetrics
+onnxruntime==1.20.1
+# via pyopenjtalk-plus
+packaging==24.2
+# via
+# librosa
+# lightning
+# lightning-utilities
+# matplotlib
+# onnxruntime
+# pooch
+# pytorch-lightning
+# tensorboard
+# tensorboardx
+# torchmetrics
+pandas==2.0.3
+# via voicebank2diffsinger (pyproject.toml)
+pillow==11.1.0
+# via matplotlib
+platformdirs==4.3.6
+# via pooch
+pooch==1.8.2
+# via librosa
+praat-parselmouth==0.4.5
+# via voicebank2diffsinger (pyproject.toml)
+praatio==5.1.1
+# via voicebank2diffsinger (pyproject.toml)
+propcache==0.3.0
+# via
+# aiohttp
+# yarl
+protobuf==5.29.3
+# via
+# onnxruntime
+# tensorboard
+# tensorboardx
+pycparser==2.22
+# via cffi
+pyopenjtalk-plus==0.3.4.post10
+# via voicebank2diffsinger (pyproject.toml)
+pyparsing==3.2.1
+# via matplotlib
+pyreadline3==3.5.4
+# via humanfriendly
+python-dateutil==2.9.0.post0
+# via
+# matplotlib
+# pandas
+pytorch-lightning==2.5.0.post0
+# via lightning
+pytz==2025.1
+# via pandas
+pyyaml==6.0.2
+# via
+# voicebank2diffsinger (pyproject.toml)
+# lightning
+# pytorch-lightning
+requests==2.32.3
+# via pooch
+resampy==0.4.3
+# via librosa
+scikit-learn==1.6.1
+# via librosa
+scipy==1.15.2
+# via
+# librosa
+# scikit-learn
+setuptools==75.8.0
+# via
+# lightning-utilities
+# tensorboard
+six==1.17.0
+# via
+# python-dateutil
+# tensorboard
+soundfile==0.13.1
+# via
+# voicebank2diffsinger (pyproject.toml)
+# librosa
+soupsieve==2.6
+# via beautifulsoup4
+sox==1.5.0
+# via voicebank2diffsinger (pyproject.toml)
+sqlalchemy==1.4.46
+# via voicebank2diffsinger (pyproject.toml)
+sudachidict-core==20250129
+# via pyopenjtalk-plus
+sudachipy==0.6.10
+# via
+# pyopenjtalk-plus
+# sudachidict-core
+sympy==1.13.1
+# via
+# onnxruntime
+# torch
+tensorboard==2.19.0
+# via voicebank2diffsinger (pyproject.toml)
+tensorboard-data-server==0.7.2
+# via tensorboard
+tensorboardx==2.6.2.2
+# via voicebank2diffsinger (pyproject.toml)
+textgrid==1.6.1
+# via voicebank2diffsinger (pyproject.toml)
+threadpoolctl==3.5.0
+# via scikit-learn
+torch==2.6.0
+# via
+# voicebank2diffsinger (pyproject.toml)
+# lightning
+# pytorch-lightning
+# torchaudio
+# torchmetrics
+torchaudio==2.6.0
+# via voicebank2diffsinger (pyproject.toml)
+torchmetrics==1.6.1
+# via
+# lightning
+# pytorch-lightning
+tqdm==4.66.6
+# via
+# voicebank2diffsinger (pyproject.toml)
+# lightning
+# pytorch-lightning
+typing-extensions==4.12.2
+# via
+# beautifulsoup4
+# lightning
+# lightning-utilities
+# praatio
+# pytorch-lightning
+# sox
+# torch
+tzdata==2025.1
+# via pandas
+urllib3==2.3.0
+# via requests
+utaupy==1.19.1
+# via voicebank2diffsinger (pyproject.toml)
+werkzeug==3.1.3
+# via tensorboard
+yarl==1.18.3
+# via aiohttp
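
The compiled file pins every package with `==` and records where each pin came from in `# via` comments. As a rough illustration of reading those pins back, the sketch below parses `name==version` lines and skips comments; `parse_pins` is a hypothetical helper, and it deliberately ignores extras, environment markers, and hashes:

```python
def parse_pins(text):
    """Map lower-cased package names to pinned versions from 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments such as the "# via ..." annotations.
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip().lower()] = version.strip()
    return pins


# A short excerpt in the same shape as the compiled requirements.txt.
SAMPLE = """\
# This file was autogenerated by uv via the following command:
#    uv pip compile pyproject.toml -o requirements.txt
librosa==0.9.2
    # via voicebank2diffsinger (pyproject.toml)
numpy==1.24.4
    # via
    #   voicebank2diffsinger (pyproject.toml)
    #   librosa
"""

if __name__ == "__main__":
    for name, version in parse_pins(SAMPLE).items():
        print(f"{name} {version}")
```

A mapping like this makes it straightforward to compare the resolved pins against the ranges declared in pyproject.toml, for example confirming that `librosa<0.10.0` resolved to 0.9.2.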
File renamed without changes.
