Default Podcast Generation Fails Due to Hardcoded Voice Types in tts_node #65

ZhiyuXu0124 · 2025-05-11T13:52:39Z

Description

The BV002_streaming and BV001_streaming voice types appear to be no longer available in the Volcengine TTS service; a check against the official voice type list (Volcengine TTS Voice Types) confirms this. As a result, the default podcast generation process fails.

Initially, I attempted to update the VOLCENGINE_TTS_VOICE_TYPE in the .env file to resolve the issue. However, the error persisted, as shown in the logs below:

2025-05-11 20:45:27,796 - src.podcast.graph.tts_node - ERROR - {'reqid': 'xxxxxxxxxxxxxxxx', 'code': 3001, 'message': '[resource_id=volc.tts.default] requested resource not granted'}

Upon further investigation, I found that the voice types for different speakers are hardcoded in the src/podcast/graph/tts_node.py file. This is causing the issue since the unavailable voice types are directly referenced in the code.

Relevant Code

Here is the relevant code block from src/podcast/graph/tts_node.py:

def tts_node(state: PodcastState):
    logger.info("Generating audio chunks for podcast...")
    tts_client = _create_tts_client()
    for line in state["script"].lines:
        tts_client.voice_type = (
            "BV002_streaming" if line.speaker == "male" else "BV001_streaming"
        )
        result = tts_client.text_to_speech(line.paragraph, speed_ratio=1.05)
        if result["success"]:
            audio_data = result["audio_data"]
            audio_chunk = base64.b64decode(audio_data)
            state["audio_chunks"].append(audio_chunk)
        else:
            logger.error(result["error"])
    return {
        "audio_chunks": state["audio_chunks"],
    }
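
The override can be reproduced without calling the real service. The sketch below is only an illustration: `FakeTTSClient` and `voices_used` are hypothetical stand-ins, not part of the project, and `my_cloned_voice` is a placeholder value. It assumes the client is initially created with the voice type from `VOLCENGINE_TTS_VOICE_TYPE`, then applies the same per-line assignment as `tts_node`.

```python
import os
from dataclasses import dataclass


@dataclass
class FakeTTSClient:
    """Hypothetical stand-in for the real TTS client; it only records the voice type."""

    voice_type: str

    def text_to_speech(self, text: str) -> dict:
        # Report which voice type would have been sent to the service.
        return {"success": True, "voice_type": self.voice_type}


def voices_used(speakers: list[str]) -> list[str]:
    # Assumption: the client starts with the voice type configured in .env.
    client = FakeTTSClient(
        voice_type=os.getenv("VOLCENGINE_TTS_VOICE_TYPE", "BV700_V2_streaming")
    )
    used = []
    for speaker in speakers:
        # Same assignment as in tts_node: it replaces the configured value on every line.
        client.voice_type = (
            "BV002_streaming" if speaker == "male" else "BV001_streaming"
        )
        used.append(client.text_to_speech("hello")["voice_type"])
    return used


os.environ["VOLCENGINE_TTS_VOICE_TYPE"] = "my_cloned_voice"  # placeholder value
print(voices_used(["male", "female"]))  # ['BV002_streaming', 'BV001_streaming']
```

The configured value never reaches `text_to_speech`, which matches the observation that editing `.env` alone did not change the error.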

Suggestions for Improvement

To address this issue and make the system more flexible, I propose the following changes:

Configuration via .env: Allow all voice type configurations to be managed through environment variables. For example:
- VOLCENGINE_TTS_VOICE_TYPE_MALE
- VOLCENGINE_TTS_VOICE_TYPE_FEMALE

Enhanced Podcast Configuration: Introduce a more dynamic configuration system for podcast generation. This could include:
- The number of speakers.
- Role assignments for each speaker.
- Customizable voice types for each role.
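
As a rough sketch of the first suggestion, voice selection could be moved into a small helper that reads the proposed variables. The helper name `resolve_voice_type`, the `env` parameter, and the lookup order are assumptions for illustration; none of this exists in the project today.

```python
import os

# Fallbacks are the voice types currently hardcoded in tts_node. They may no
# longer be available, so setting the variables explicitly is the safer path.
DEFAULT_VOICES = {"male": "BV002_streaming", "female": "BV001_streaming"}


def resolve_voice_type(speaker: str, env=None) -> str:
    """Pick a voice type for a speaker from environment variables.

    Lookup order (an assumption, not existing project behaviour):
      1. VOLCENGINE_TTS_VOICE_TYPE_MALE / _FEMALE (proposed in this issue)
      2. VOLCENGINE_TTS_VOICE_TYPE (the existing generic setting)
      3. the currently hardcoded default for the role
    """
    env = os.environ if env is None else env
    # Mirror the existing logic: anything other than "male" is treated as female.
    role = "male" if speaker == "male" else "female"
    per_role = env.get(f"VOLCENGINE_TTS_VOICE_TYPE_{role.upper()}")
    if per_role:
        return per_role
    generic = env.get("VOLCENGINE_TTS_VOICE_TYPE")
    if generic:
        return generic
    return DEFAULT_VOICES[role]
```

Inside the loop in `tts_node`, the hardcoded conditional could then become `tts_client.voice_type = resolve_voice_type(line.speaker)`. The same lookup could later be extended to arbitrary role names if the number of speakers becomes configurable.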


jizhi0v0 · 2025-05-12T01:22:04Z

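https://console.volcengine.com/speech/app

Edit the app you are using here and grant it the "Large-model speech synthesis" (大模型语音合成) permission listed under "Speech synthesis large model" (语音合成大模型).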

ZhiyuXu0124 · 2025-05-13T01:18:12Z

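> https://console.volcengine.com/speech/app
>
> Edit the app you are using here and grant it the "Large-model speech synthesis" (大模型语音合成) permission listed under "Speech synthesis large model" (语音合成大模型).

I have been using my Volcengine app all along. From my testing so far, it looks like Volcengine has taken down your default voices; once I switched to a voice I had cloned earlier, everything worked. The main problem is that the male and female voices are hardcoded in your code, so the .env configuration has no effect.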

hahazei · 2025-05-15T09:44:51Z

Configure the voice type you have purchased:
VOLCENGINE_TTS_CLUSTER=volcano_tts # Optional, default is volcano_tts
VOLCENGINE_TTS_VOICE_TYPE=BV005_streaming # Optional, default is BV700_V2_streaming
