Default Podcast Generation Fails Due to Hardcoded Voice Types in tts_node #65

Open
ZhiyuXu0124 opened this issue May 11, 2025 · 3 comments

Description

The BV002_streaming and BV001_streaming voice types appear to be no longer available in the Volcengine TTS service: neither shows up in the official voice type list (Volcengine TTS Voice Types). As a result, the default podcast generation process fails.

Initially, I attempted to update the VOLCENGINE_TTS_VOICE_TYPE in the .env file to resolve the issue. However, the error persisted, as shown in the logs below:

2025-05-11 20:45:27,796 - src.podcast.graph.tts_node - ERROR - {'reqid': 'xxxxxxxxxxxxxxxx', 'code': 3001, 'message': '[resource_id=volc.tts.default] requested resource not granted'}

Upon further investigation, I found that the voice types for the speakers are hardcoded in src/podcast/graph/tts_node.py. tts_node assigns tts_client.voice_type for every script line, so the unavailable voice types are always requested and the VOLCENGINE_TTS_VOICE_TYPE value in .env has no effect.

Relevant Code

Here is the relevant code block from src/podcast/graph/tts_node.py:

def tts_node(state: PodcastState):
    logger.info("Generating audio chunks for podcast...")
    tts_client = _create_tts_client()
    for line in state["script"].lines:
        tts_client.voice_type = (
            "BV002_streaming" if line.speaker == "male" else "BV001_streaming"
        )
        result = tts_client.text_to_speech(line.paragraph, speed_ratio=1.05)
        if result["success"]:
            audio_data = result["audio_data"]
            audio_chunk = base64.b64decode(audio_data)
            state["audio_chunks"].append(audio_chunk)
        else:
            logger.error(result["error"])
    return {
        "audio_chunks": state["audio_chunks"],
    }
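
To illustrate why changing .env does not help, the loop above can be reduced to a stand-in client. This is only a sketch: StubTTSClient and run_hardcoded are hypothetical names, and it assumes the real client is initialized from VOLCENGINE_TTS_VOICE_TYPE (the body of _create_tts_client is not shown in this issue).

```python
class StubTTSClient:
    """Hypothetical stand-in for the real TTS client; it only records voice_type."""

    def __init__(self, voice_type: str):
        self.voice_type = voice_type
        self.used = []

    def text_to_speech(self, text: str) -> None:
        # Record which voice would have been requested for this line.
        self.used.append(self.voice_type)


def run_hardcoded(client, speakers):
    # Same assignment as in tts_node: the configured voice is replaced on every line.
    for speaker in speakers:
        client.voice_type = (
            "BV002_streaming" if speaker == "male" else "BV001_streaming"
        )
        client.text_to_speech("...")
    return client.used


# "BV005_streaming" plays the role of a value read from .env (assumption).
client = StubTTSClient(voice_type="BV005_streaming")
used_voices = run_hardcoded(client, ["male", "female"])
print(used_voices)  # ['BV002_streaming', 'BV001_streaming']
```

The configured voice never reaches text_to_speech, which matches the behavior described above.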

Suggestions for Improvement

To address this issue and make the system more flexible, I propose the following changes:

  1. Configuration via .env: Allow all voice type configurations to be managed through environment variables. For example:
    • VOLCENGINE_TTS_VOICE_TYPE_MALE
    • VOLCENGINE_TTS_VOICE_TYPE_FEMALE
  2. Enhanced Podcast Configuration: Introduce a more dynamic configuration system for podcast generation. This could include:
    • The number of speakers.
    • Role assignments for each speaker.
    • Customizable voice types for each role.
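
A minimal sketch of the first suggestion, which also extends to per-role voices. The function name resolve_voice_type, the lookup order, and the per-role variable naming scheme are my assumptions, not existing project code; only VOLCENGINE_TTS_VOICE_TYPE and the two hardcoded voice types come from the current code and configuration.

```python
import os
from typing import Mapping, Optional

# Values currently hardcoded in tts_node, kept only as a last-resort fallback.
_LEGACY_DEFAULTS = {"male": "BV002_streaming", "female": "BV001_streaming"}


def resolve_voice_type(speaker: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a voice type for a speaker role.

    Lookup order (hypothetical, following the proposal above):
      1. VOLCENGINE_TTS_VOICE_TYPE_<ROLE>, e.g. VOLCENGINE_TTS_VOICE_TYPE_MALE
      2. VOLCENGINE_TTS_VOICE_TYPE (the existing generic setting)
      3. the legacy hardcoded value for that role
    """
    env = os.environ if env is None else env
    role = speaker.strip().upper()
    for key in (f"VOLCENGINE_TTS_VOICE_TYPE_{role}", "VOLCENGINE_TTS_VOICE_TYPE"):
        value = env.get(key, "").strip()
        if value:
            return value
    # Mirrors the current code: anything that is not "male" gets the female voice.
    return _LEGACY_DEFAULTS.get(speaker, _LEGACY_DEFAULTS["female"])
```

Inside tts_node, the hardcoded assignment could then become tts_client.voice_type = resolve_voice_type(line.speaker). Because the variable name is derived from the role, adding more speakers would only require more VOLCENGINE_TTS_VOICE_TYPE_<ROLE> entries.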

jizhi0v0 commented May 12, 2025

https://console.volcengine.com/speech/app

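Edit the app you are using here, and grant it the "Large-Model Speech Synthesis" (大模型语音合成) permission under "Speech Synthesis Large Model" (语音合成大模型).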

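ZhiyuXu0124 (Author) commented, quoting jizhi0v0's reply above

I have been using my Volcengine app all along. From my testing so far, it looks like Volcengine has taken down the default voices you use; once I switched to a voice I had cloned earlier, everything worked normally. The main problem is that the male and female voices are hardcoded in your code, so the .env setting has no effect.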

hahazei commented May 15, 2025

Configure the voice type you have purchased:
VOLCENGINE_TTS_CLUSTER=volcano_tts # Optional, default is volcano_tts
VOLCENGINE_TTS_VOICE_TYPE=BV005_streaming # Optional, default is BV700_V2_streaming
