
SD2-1327-ll-ms-make-it-possible-to-generate-audio-files #344


Conversation

@Daggx (Contributor) commented Mar 26, 2025

Summary by CodeRabbit

  • New Features
  • Enhanced chat functionality now supports multiple response modalities, including audio.
  • Users can request combined text and audio output for a more flexible interaction experience (see the usage sketch below).
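
For illustration, a call that requests audio output might look like the following sketch. The api object, model name, and audio values are assumptions (the audio options mirror OpenAI's documented voice/format settings, not anything confirmed in this repository); modalities and audio are the parameters introduced by this PR.

# Hypothetical usage sketch: "api" stands in for any of the provider API
# classes touched by this PR; the model name and audio values follow
# OpenAI's documented chat-completions options and are assumptions here.
response = api.llm__chat(
    messages=[{"role": "user", "content": "Read this sentence aloud."}],
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],  # new parameter from this PR
    audio={"voice": "alloy", "format": "wav"},  # new parameter from this PR
)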


coderabbitai bot commented Mar 26, 2025

Walkthrough

The changes update several API classes by adding two new optional parameters, modalities and audio, to the llm__chat method signatures; these parameters are then passed along to the underlying completion or client method calls. The completion method in llm_engine.py is likewise modified to accept the parameters, conditionally include them in the parameters dictionary, and reformat its error-handling messages. Type-hint imports are added where needed.

Changes

File(s) and change summary:

  • .../apis/{amazon,anthropic,cohere,deepseek,google,groq,meta,microsoft,mistral,openai,replicate,together_ai,xai}_llm_api.py: Added the optional parameters modalities: Optional[List[Literal["text", "audio"]]] and audio: Optional[Dict] to the llm__chat method signatures and passed them through to the completion calls (a signature sketch follows this list); type imports were added where necessary.
  • .../features/llm/llm_interface.py: Updated the llm__chat method signature to include the new optional parameters modalities and audio.
  • .../llmengine/llm_engine.py: Extended the completion method to accept modalities and audio; the method now conditionally adds them to the completion_params dictionary and reformats its error-handling messages.
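
A rough sketch of the per-provider change, with a placeholder class name and the surrounding parameters elided (the exact signatures live in the files listed above):

from typing import Dict, List, Literal, Optional


class ExampleLLMApi:
    """Stand-in for a provider API class such as AmazonLLMApi."""

    def __init__(self, llm_client):
        self.llm_client = llm_client  # object exposing a completion() method

    def llm__chat(
        self,
        messages: Optional[List[Dict]] = None,
        model: Optional[str] = None,
        # ...other existing parameters elided...
        modalities: Optional[List[Literal["text", "audio"]]] = None,  # new
        audio: Optional[Dict] = None,  # new
        **kwargs,
    ):
        # Both new parameters are forwarded unchanged to the LLM client,
        # which decides whether and how to use them.
        return self.llm_client.completion(
            messages=messages,
            model=model,
            modalities=modalities,
            audio=audio,
            **kwargs,
        )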

Sequence Diagram(s)

sequenceDiagram
    participant U as User Request
    participant API as API Class (e.g. AmazonLLMApi)
    participant LC as LLM Client
    U->>API: Call llm__chat(..., modalities, audio)
    API->>LC: Invoke completion(..., modalities, audio)
    LC-->>API: Return completion result
    API-->>U: Return chat response
sequenceDiagram
    participant U as User Request
    participant LE as LLMEngine.completion
    participant LC as LLM Client
    U->>LE: Call completion(..., modalities, audio)
    LE->>LE: Add modalities & audio to parameters (if provided)
    LE->>LC: Call completion with updated parameters
    LC-->>LE: Return completion result
    LE-->>U: Return result (or handle ValueError)


Suggested reviewers

  • juandavidcruzgomez

Poem

Hopping through the code with glee,
I found new options—so wild and free!
modalities and audio join the dance,
Making our chat API enhanced.
With every hop and joyful bound,
The rabbit cheers for improvements found! 🐰🎉


coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
edenai_apis/features/llm/llm_interface.py (1)

30-31: LGTM: New parameters for audio capabilities added to the interface

The addition of modalities and audio parameters aligns well with the PR objective of enabling audio file generation. The typing is appropriately defined with modalities restricted to specific literal values.

However, it would be beneficial to update the method docstring (lines 56-73) to include descriptions for these new parameters to improve developer understanding.
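
A possible shape for those docstring entries, with the rest of the signature stripped away (the wording is a suggestion, not taken from the repository):

from typing import Dict, List, Literal, Optional


def llm__chat(
    modalities: Optional[List[Literal["text", "audio"]]] = None,
    audio: Optional[Dict] = None,
) -> None:
    """Reduced signature showing only the suggested docstring additions.

    Args:
        modalities: Output modalities the model should produce, e.g.
            ["text", "audio"]. None keeps the provider default (text only).
        audio: Audio generation options (such as voice and output format),
            applied when "audio" is among the requested modalities.
            None disables audio output.
    """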

edenai_apis/llmengine/llm_engine.py (1)

746-747: Consider standardizing the audio parameter type across API classes.

There's a type inconsistency between API classes (which use Optional[Dict]) and LLMEngine (which uses Optional[ChatCompletionAudioParam]). While this might not cause runtime issues if the dictionary structure matches the expected format, standardizing the type would improve type safety.

For consistency, consider either:

  1. Using ChatCompletionAudioParam in all API classes (a sketch follows below)
  2. Adding documentation to clarify the expected dictionary structure in the API classes
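
A minimal sketch of option 1; the import path is an assumption based on llm_engine.py pulling the type from litellm, and should be verified against the actual code:

from typing import List, Literal, Optional

# Assumed import path: llm_engine.py adds this type from litellm's OpenAI
# type definitions; verify the exact module before reusing it.
from litellm.types.llms.openai import ChatCompletionAudioParam


def llm__chat(
    modalities: Optional[List[Literal["text", "audio"]]] = None,
    audio: Optional[ChatCompletionAudioParam] = None,  # instead of Optional[Dict]
) -> None:
    ...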
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 8b2bc24 and 0cd4218.

📒 Files selected for processing (15)
  • edenai_apis/apis/amazon/amazon_llm_api.py (3 hunks)
  • edenai_apis/apis/anthropic/anthropic_api.py (2 hunks)
  • edenai_apis/apis/cohere/cohere_api.py (2 hunks)
  • edenai_apis/apis/deepseek/deepseek_api.py (2 hunks)
  • edenai_apis/apis/google/google_llm_api.py (3 hunks)
  • edenai_apis/apis/groq/groq_api.py (2 hunks)
  • edenai_apis/apis/meta/meta_api.py (2 hunks)
  • edenai_apis/apis/microsoft/microsoft_llm_api.py (3 hunks)
  • edenai_apis/apis/mistral/mistral_api.py (2 hunks)
  • edenai_apis/apis/openai/openai_llm_api.py (3 hunks)
  • edenai_apis/apis/replicate/replicate_api.py (2 hunks)
  • edenai_apis/apis/together_ai/together_ai_api.py (2 hunks)
  • edenai_apis/apis/xai/xai_llm_api.py (3 hunks)
  • edenai_apis/features/llm/llm_interface.py (2 hunks)
  • edenai_apis/llmengine/llm_engine.py (4 hunks)
🔇 Additional comments (26)
edenai_apis/apis/cohere/cohere_api.py (1)

349-350: LGTM: New audio-related parameters properly implemented

The parameters added to the method signature match those in the interface and are correctly passed to the underlying llm_client.completion method. Implementation maintains consistency with the interface design.

Also applies to: 407-408

edenai_apis/apis/google/google_llm_api.py (1)

28-29: LGTM: Audio parameters correctly implemented

The new parameters match the interface definition and are properly passed to the completion method. Implementation is consistent with the other API classes being updated.

Also applies to: 86-87

edenai_apis/apis/amazon/amazon_llm_api.py (1)

28-29: LGTM: Audio capability parameters added consistently

The implementation follows the same pattern as the other API classes, maintaining consistency across the codebase. The parameters are properly defined and passed to the underlying completion method.

Also applies to: 86-87

edenai_apis/apis/microsoft/microsoft_llm_api.py (3)

1-1: Import modifications are properly handled.

The import statement has been updated to include Literal and Dict which are required for the new parameters.


27-28: LGTM! Parameter additions for audio support.

The new parameters modalities and audio have been properly defined with appropriate type hints and default values.


85-87: Parameters correctly passed to the completion method.

The newly added parameters are properly passed to the underlying llm_client.completion method.

edenai_apis/apis/groq/groq_api.py (2)

77-78: LGTM! Parameter additions for audio support.

The new parameters modalities and audio have been properly defined with appropriate type hints and default values.


135-137: Parameters correctly passed to the completion method.

The newly added parameters are properly passed to the underlying llm_client.completion method.

edenai_apis/apis/deepseek/deepseek_api.py (2)

76-77: LGTM! Parameter additions for audio support.

The new parameters modalities and audio have been properly defined with appropriate type hints and default values.


134-136: Parameters correctly passed to the completion method.

The newly added parameters are properly passed to the underlying llm_client.completion method.

edenai_apis/apis/together_ai/together_ai_api.py (2)

80-81: LGTM! Parameter additions for audio support.

The new parameters modalities and audio have been properly defined with appropriate type hints and default values.


138-140: Parameters correctly passed to the completion method.

The newly added parameters are properly passed to the underlying llm_client.completion method.

edenai_apis/apis/replicate/replicate_api.py (1)

261-262: Appropriate implementation for multimodal support.

The addition of modalities and audio parameters enables audio generation capabilities. The parameters are properly typed and aligned with the PR objective to support audio file generation.

Also applies to: 319-320

edenai_apis/apis/anthropic/anthropic_api.py (1)

150-151: Implementation follows API pattern consistently.

The addition of modalities and audio parameters correctly extends the API to support audio generation. The parameters are properly typed and consistently passed to the underlying completion method.

Also applies to: 208-209

edenai_apis/apis/mistral/mistral_api.py (1)

184-185: Clean implementation of multimodal support.

The addition of modalities and audio parameters to the Mistral API follows the same pattern as other provider implementations, ensuring consistency across the codebase.

Also applies to: 242-243

edenai_apis/apis/openai/openai_llm_api.py (2)

1-1: Type import correctly updated.

The Literal import has been appropriately added to support the new parameter typing.


27-28: Audio generation implementation properly added.

The OpenAI LLM API implementation correctly adds support for audio generation with the new parameters. The implementation is consistent with other API classes in the system.

Also applies to: 84-85

edenai_apis/apis/meta/meta_api.py (2)

143-144: Implementation of multimodal support with audio parameter looks correct.

The added parameters modalities and audio extend the method to support multimodal chat capabilities, specifically audio responses.


201-202: Properly passing new parameters to the completion method.

The new parameters are correctly passed to the underlying llm_client.completion method.

edenai_apis/apis/xai/xai_llm_api.py (3)

1-1: Updated imports to support new parameter types.

The import statement has been correctly updated to include Literal and Dict from the typing module, which are necessary for the type annotations of the new parameters.


28-29: Added multimodal support parameters with appropriate type hints.

The implementation correctly adds the modalities and audio parameters with proper type annotations.


86-87: Properly passing new parameters to the completion method.

New parameters are correctly passed to the underlying completion method.

edenai_apis/llmengine/llm_engine.py (4)

12-12: Added proper import for audio parameter type.

The import for ChatCompletionAudioParam is correctly added, providing a specific type for the audio parameter.


114-116: Improved error message formatting.

The error message has been reformatted for better readability while maintaining the same error handling logic.


746-747: Implemented core parameters for multimodal support.

The modalities and audio parameters are correctly added to the method signature, allowing for multimodal capabilities.


784-787: Properly handling new parameters in request preparation.

The code correctly checks if the parameters are provided and adds them to the completion parameters dictionary only when they're not None.
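
A standalone sketch of that None-guarded merge; the helper function is illustrative, since in the real code this logic sits inline in LLMEngine.completion:

from typing import Any, Dict, List, Literal, Optional


def add_audio_params(
    completion_params: Dict[str, Any],
    modalities: Optional[List[Literal["text", "audio"]]] = None,
    audio: Optional[Dict] = None,
) -> Dict[str, Any]:
    """Add modalities and audio to the params dict only when provided."""
    if modalities is not None:
        completion_params["modalities"] = modalities
    if audio is not None:
        completion_params["audio"] = audio
    return completion_params

Guarding on None keeps the request payload unchanged for callers that do not ask for audio, so providers that reject unknown parameters are unaffected.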

@juandavidcruzgomez merged commit 398dd74 into master on Mar 26, 2025 (4 of 5 checks passed).