NOVA AI is an intelligent VRChat assistant that brings conversational AI directly into your VRChat experience. Using advanced speech recognition, natural language processing, text-to-speech technology, computer vision, and VRChat API integration, NOVA can listen to your voice, understand what you're saying, see your VRChat world, manage social interactions, and respond intelligently through VRChat's chatbox. With native support for 29+ languages, NOVA can communicate naturally in your preferred language with automatic language detection and multilingual voice synthesis.
- What is NOVA AI?
- Features
- Configuration System
- Vision System
- VRChat API Integration
- Together AI Integration
- Multilingual Support
- Prerequisites
- Installation Guide
- Setup Instructions
- Configuration
- Usage
- Troubleshooting
- Contributing
- License
NOVA AI is a sophisticated VRChat companion that:
- Listens to your voice using advanced speech recognition (OpenAI Whisper)
- Thinks using powerful AI language models (OpenAI GPT, Together AI, or local models via LM Studio)
- Sees your VRChat world using computer vision and screenshot analysis
- Responds through VRChat's chatbox using OSC (Open Sound Control)
- Speaks back to you using text-to-speech technology (Microsoft Edge TTS)
- Manages VRChat social interactions via API integration (friend requests, notifications)
- Monitors system performance with a real-time resource dashboard
- Communicates in 29+ languages with automatic language detection and multilingual TTS voices
Perfect for content creators, VRChat enthusiasts, or anyone who wants an intelligent AI companion in their virtual world!
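For a concrete picture of the OSC half of that pipeline: VRChat accepts chatbox text at the OSC address `/chatbox/input` over UDP (port 9000 by default). The sketch below hand-encodes such a message with only the standard library — NOVA itself likely uses an OSC library, so treat this as an illustration of the wire format rather than the project's actual code:

```python
import socket

def osc_pad(data: bytes) -> bytes:
    # OSC strings are null-terminated, then padded to a 4-byte boundary
    data += b"\x00"
    while len(data) % 4:
        data += b"\x00"
    return data

def build_chatbox_message(text: str, send_immediately: bool = True) -> bytes:
    # VRChat's /chatbox/input endpoint takes a string plus a bool
    # (True = post directly, skipping the in-game keyboard)
    address = osc_pad(b"/chatbox/input")
    type_tags = osc_pad(b",s" + (b"T" if send_immediately else b"F"))
    return address + type_tags + osc_pad(text.encode("utf-8"))

def send_chatbox(text: str, ip: str = "127.0.0.1", port: int = 9000) -> None:
    # VRChat listens for OSC over UDP on port 9000 by default
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_chatbox_message(text), (ip, port))
```

In practice a library such as python-osc hides this encoding behind a one-line `send_message` call; the point here is just that "responding through the chatbox" is a single small UDP datagram.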
- 🎤 Voice Recognition: Advanced speech-to-text using OpenAI Whisper with configurable models
- 🧠 AI Conversation: Powered by OpenAI's language models, Together AI, or local models via LM Studio
- 💬 VRChat Integration: Seamlessly displays responses in VRChat chatbox via OSC
- 🔊 Text-to-Speech: Speaks responses back using Microsoft Edge TTS with customizable voices
- 🎚️ Voice Activity Detection: Automatically detects when you start and stop speaking with WebRTC VAD
- ⚙️ Customizable: Configurable system prompts, voices, and behavior through centralized constants
- 📊 Resource Monitoring: Built-in performance monitoring with customizable GUI dashboard
- 🔧 Modular Design: Easy to extend and customize with class-based architecture
- ⚙️ Centralized Configuration: All tunable settings in one location (`constants.py`)
- 🎯 Easy Tuning: Comprehensive configuration system for all aspects of NOVA
- 👁️ Vision System: Advanced computer vision capabilities for VRChat world analysis
- 🤝 VRChat API Integration: Automatic friend request handling and notification management
- 🔒 Secure Configuration: Environment variable system for sensitive credentials
- 🎮 Avatar Movement: Automated VRChat avatar positioning and movement capabilities
- 🌐 Multilingual Support: Native support for 29+ languages with automatic language detection and multilingual TTS voices
NOVA AI features a centralized configuration system that makes customization and tuning straightforward. Instead of hunting through multiple files to change settings, everything is organized in one place: `constants.py`.
Before the constants system, changing NOVA's behavior meant:
- ❌ Searching through multiple Python files
- ❌ Finding hardcoded values scattered everywhere
- ❌ Risk of breaking the code with incorrect edits
- ❌ No clear documentation of what each value does
Now with the constants system:
- ✅ One file controls everything - `constants.py`
- ✅ Organized by purpose - Network, Audio, AI, Voice, etc.
- ✅ Clear documentation - Every setting has explanatory comments
- ✅ Safe to modify - Designed specifically for user customization
- ✅ Easy to backup - Save your perfect configuration in one file
The `constants.py` file contains configuration classes grouped by functionality:
```python
class Network:          # All networking settings (IPs, ports)
class Audio:            # Audio device configuration
class Voice:            # Text-to-speech settings
class LanguageModel:    # AI model configuration
class WhisperSettings:  # Speech recognition tuning
class LLM_API:          # Language model API settings (OpenAI, Together AI, LM Studio)
class Vision_API:       # Vision model API settings (OpenAI, Together AI, LM Studio)
class TTSSettings:      # Text-to-speech engine options
class FilePaths:        # All file and folder locations
# ... and more!
```
- 🎚️ Fine-tune Performance: Adjust speech recognition sensitivity, AI creativity, response speed
- 🎭 Customize Personality: Modify system prompts and AI behavior
- 🔊 Perfect Audio: Set exact audio device indices and voice preferences
- ⚡ Optimize Speed: Balance between accuracy and response time
- 🌐 Network Flexibility: Easy port and IP configuration for different setups
- 📊 Monitor Control: Customize the resource monitor appearance and behavior
Want to make NOVA more creative? Just open `constants.py` and change:
```python
class LanguageModel:
    LM_TEMPERATURE = 0.9  # Changed from 0.7 to 0.9 for more creativity
```
Want better speech recognition? Update:
```python
class WhisperSettings:
    MODEL_SIZE = "small"  # Changed from "base" for better accuracy
```
That's it! The entire codebase automatically uses your new settings.
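The pattern behind this is plain class attributes read at call time. A minimal sketch, simplified from the real `constants.py` (`build_chat_settings` is a hypothetical consumer, not a function from the project):

```python
# One module of plain classes holds every setting...
class LanguageModel:
    MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct-Turbo"
    LM_TEMPERATURE = 0.7  # raise toward 0.9 for more creative replies

# ...and the rest of the codebase reads from it instead of hardcoding values
def build_chat_settings() -> dict:
    return {
        "model": LanguageModel.MODEL_ID,
        "temperature": LanguageModel.LM_TEMPERATURE,
    }
```

Because every module imports the same class attributes, editing one line in `constants.py` changes the behavior everywhere that setting is used.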
NOVA AI includes an advanced Vision System that brings computer vision capabilities to your VRChat experience. This powerful feature allows NOVA to "see" your VRChat world, analyze what's happening, and provide contextual responses based on visual information.
The Vision System is an optional module that:
- 📸 Captures Screenshots: Automatically takes screenshots of your VRChat window
- 🔍 Analyzes Content: Uses AI vision models to understand what's in the image
- 👥 Identifies Players: Recognizes avatars, usernames, and player interactions
- 🌍 Describes Environments: Understands world themes, lighting, and atmosphere
- 💬 Provides Context: Gives NOVA visual context for more intelligent responses
- Identifies visible avatars and their appearance
- Reads usernames when visible
- Counts how many players are in view
- Describes avatar styles, outfits, and accessories
- Recognizes world themes (nightclub, forest, city, etc.)
- Describes lighting conditions and atmosphere
- Identifies notable objects and structures
- Understands the overall vibe of the space
- Observes player actions (dancing, sitting, emoting)
- Detects social interactions
- Notices movement and activities
The Vision System is controlled through the `VisionSystem` class in `constants.py`:
```python
class VisionSystem:
    # Enable or disable the vision system
    ENABLED = False  # Set to True to enable vision

    # How often to analyze screenshots (seconds)
    ANALYSIS_INTERVAL = 15

    # Image processing settings
    MAX_IMAGE_SIZE = 1024  # Max resolution for AI processing
    IMAGE_QUALITY = 85     # JPEG quality (1-100)

    # AI model settings
    VISION_MODEL = "qwen/qwen2.5-vl-7b"  # Vision model to use
    MAX_VISION_TOKENS = 150              # Response length limit
    VISION_TEMPERATURE = 0.3             # Creativity level (0.0-1.0)

    # File locations
    STATE_FILE = "json_files/vision_state.json"
    LOG_FILE = "json_files/vision_log.json"
    VISION_PROMPT_PATH = "prompts/vision_prompt.txt"
```
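An aside on `MAX_IMAGE_SIZE`: capping the longer side keeps vision-token costs predictable while preserving aspect ratio. The resizing math is simple (a sketch only, not NOVA's actual code):

```python
def fit_within(width: int, height: int, max_size: int = 1024) -> tuple[int, int]:
    # Scale dimensions down so the longer side is at most max_size,
    # preserving aspect ratio; a no-op when the image is already small enough
    longest = max(width, height)
    if longest <= max_size:
        return width, height
    scale = max_size / longest
    return round(width * scale), round(height * scale)
```

A 1920×1080 VRChat screenshot would be downscaled to 1024×576 before being sent to the vision model.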
- Open `constants.py` in a text editor
- Find the `VisionSystem` class
- Change `ENABLED = False` to `ENABLED = True`
The Vision System supports both local and cloud-based AI vision models:
Option A: Local Models (LM Studio)
- Keep the default `VISION_MODEL = "qwen/qwen2.5-vl-7b"`
- Ensure your LM Studio setup supports vision models
Option B: OpenAI Vision API
- Change `VISION_MODEL = "gpt-4-vision-preview"`
- Update `constants.py` to use OpenAI for vision:

  ```python
  class Vision_API:
      API_TYPE = "openai"  # Change to "openai"
      BASE_URL = "https://api.openai.com/v1"
  ```
- Ensure your OpenAI API key has vision access
Option C: Together AI Vision API (Default)
- Keep the default `VISION_MODEL = "meta-llama/Llama-Vision-Free"`
- The default settings in `constants.py` are already configured for Together AI:

  ```python
  class Vision_API:
      API_TYPE = "together"  # Default setting
      BASE_URL = "https://api.together.xyz/v1"
  ```
- Ensure your Together AI API key is set in your `.env` file as `VISION_API_KEY`
Test the vision system by:
- Ensuring VRChat is running and visible
- Starting NOVA with vision enabled
- Monitoring console output for vision updates
- Checking the `json_files/vision_log.json` file for logged vision data
- Automatic Detection: The system automatically finds your VRChat window
- Periodic Analysis: Takes screenshots at regular intervals (configurable)
- AI Processing: Sends images to the vision model for analysis
- Context Integration: Provides visual context to NOVA for better responses
- Logging: Keeps a log of recent visual observations
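The "AI Processing" step boils down to an OpenAI-compatible chat request with the screenshot inlined as a base64 data URL. A hedged sketch assuming the standard multimodal message format (`build_vision_request` is a hypothetical helper, not a function from the project):

```python
import base64

def build_vision_request(jpeg_bytes: bytes, prompt: str,
                         model: str = "meta-llama/Llama-Vision-Free",
                         max_tokens: int = 150) -> dict:
    # OpenAI-compatible chat payload with the screenshot inlined as base64
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    }
```

The model's text reply is then appended to NOVA's conversation context so later answers can reference what was on screen.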
When the Vision System is active, NOVA can make responses like:
- "I can see you're in a beautiful cyberpunk nightclub with neon lights everywhere!"
- "There are 3 other players here - someone with butterfly wings is dancing near the center."
- "This cozy forest world has such peaceful vibes with those cherry blossom trees."
- "I notice xX_Gamer_Xx just joined in that golden armor avatar - pretty cool look!"
- Vision analysis adds processing overhead
- Adjust `ANALYSIS_INTERVAL` to balance responsiveness vs. performance
- Lower `MAX_IMAGE_SIZE` for faster processing
- Use `IMAGE_QUALITY` to balance file size vs. detail
- Screenshots are processed locally (unless using cloud APIs)
- No images are permanently stored
- Vision logs can be cleared at any time
- System can be disabled instantly
Edit `prompts/vision_prompt.txt` to change how the AI interprets images:
- Modify what details to focus on
- Change the response style
- Add specific instructions for your use case
```python
# For better accuracy (slower)
ANALYSIS_INTERVAL = 10    # More frequent analysis
MAX_IMAGE_SIZE = 1920     # Higher resolution
VISION_TEMPERATURE = 0.1  # More consistent results

# For better performance (faster)
ANALYSIS_INTERVAL = 30    # Less frequent analysis
MAX_IMAGE_SIZE = 512      # Lower resolution
VISION_TEMPERATURE = 0.5  # More varied results
```
The Vision System can be configured to only analyze when:
- NOVA is directly spoken to
- Specific keywords are mentioned
- Manual triggers are activated
Vision System Not Starting:
- Verify `ENABLED = True` in `constants.py`
- Check that the VRChat window is visible and active
- Monitor console output for vision system startup messages
- Ensure your AI model supports vision capabilities
Poor Recognition Quality:
- Increase `MAX_IMAGE_SIZE` for better detail
- Adjust `IMAGE_QUALITY` for clearer images
- Check lighting in your VRChat world
- Try different vision models
Performance Issues:
- Increase `ANALYSIS_INTERVAL` for less frequent analysis
- Decrease `MAX_IMAGE_SIZE` for faster processing
- Consider using a faster vision model
- Monitor system resources during operation
API Errors:
- Verify your AI model supports vision capabilities
- Check API key permissions for vision access
- Ensure sufficient API credits/quota
- Monitor console output for specific error messages
- Start Conservative: Begin with longer analysis intervals and smaller image sizes
- Monitor Performance: Watch system resources when vision is enabled
- Customize Prompts: Tailor the vision prompt for your specific VRChat activities
- Test Different Models: Try various vision models to find the best balance
- Privacy Awareness: Remember that the system can see everything in your VRChat window
The Vision System transforms NOVA from a voice-only assistant into a truly aware VRChat companion that can see and understand your virtual world!
NOVA AI includes comprehensive VRChat API Integration that enables advanced social features and automation within VRChat. This powerful system allows NOVA to interact with VRChat's official API to manage social interactions, handle notifications, and provide a more integrated VRChat experience.
The VRChat API Integration is an optional module that:
- 👥 Friend Management: Automatically accepts friend requests based on your preferences
- 🔔 Notification Handling: Monitors and processes VRChat notifications in real-time
- 📊 Social Analytics: Tracks friend interactions and social metrics
- 🛡️ Rate Limiting: Implements proper rate limiting to respect VRChat's API guidelines
- 🔐 Secure Authentication: Uses your VRChat credentials securely via environment variables
- Automatically accept incoming friend requests
- Configurable auto-accept behavior
- Rate-limited processing to prevent API abuse
- Retry logic for failed operations
- Real-time notification checking
- Processes various notification types
- Configurable check intervals
- Duplicate notification filtering
- Respects VRChat's API usage policies
- Implements exponential backoff for retries
- Connection timeout handling
- Comprehensive error logging
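The retry behavior described above (cf. the `MAX_RETRY_ATTEMPTS` and `RETRY_DELAY` settings) is classic exponential backoff. A minimal sketch of the pattern — illustrative, not NOVA's actual implementation:

```python
import time

def with_retries(call, max_attempts: int = 3, base_delay: float = 5.0):
    # Retry a flaky API call, doubling the wait after each failure:
    # base_delay, then 2x, then 4x, ...
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Backing off exponentially gives VRChat's API room to recover from transient failures instead of hammering it at a fixed rate.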
The VRChat API system is controlled through the `VRChatAPI` class in `constants.py`:
```python
class VRChatAPI:
    # Master switch to enable/disable all VRChat API functionality
    USING_API = False  # Set to True to enable API usage

    # VRChat account credentials (loaded from environment variables)
    USERNAME = os.getenv('VRCHAT_EMAIL')
    PASSWORD = os.getenv('VRCHAT_PASSWORD')

    # User agent string as per VRChat Usage Policy
    USER_AGENT = f"NOVA-AI/2025.1.1 {os.getenv('VRCHAT_EMAIL')}"

    # API check intervals (seconds)
    FRIEND_REQUEST_CHECK_INTERVAL = 60  # 1 minute
    NOTIFICATION_CHECK_INTERVAL = 120   # 2 minutes

    # Rate limiting and cooldown settings
    API_COOLDOWN = 30  # Seconds to wait between API calls

    # Feature toggles
    AUTO_ACCEPT_FRIEND_REQUESTS = True
    ENABLE_NOTIFICATION_CHECKS = True
    ENABLE_FRIEND_REQUEST_CHECKS = True

    # Connection timeout settings
    CONNECTION_TIMEOUT = 30
    REQUEST_TIMEOUT = 15

    # Retry settings for failed operations
    MAX_RETRY_ATTEMPTS = 3
    RETRY_DELAY = 5  # Seconds between retries

    # Debug settings
    VERBOSE_LOGGING = False  # Set to True for detailed API logs
```
- Open `constants.py` in a text editor
- Find the `VRChatAPI` class
- Change `USING_API = False` to `USING_API = True`
The VRChat API requires your VRChat account credentials:
- Ensure your `.env` file exists (copy from `.env.example` if needed)
- Add your VRChat credentials:

  ```
  # VRChat Login Credentials
  VRCHAT_EMAIL=your-email@example.com
  VRCHAT_PASSWORD=your-vrchat-password
  ```
Adjust the settings in `constants.py` based on your preferences:
```python
# Enable/disable specific features
AUTO_ACCEPT_FRIEND_REQUESTS = True   # Automatically accept friend requests
ENABLE_NOTIFICATION_CHECKS = True    # Monitor notifications
ENABLE_FRIEND_REQUEST_CHECKS = True  # Check for new friend requests

# Adjust timing
FRIEND_REQUEST_CHECK_INTERVAL = 60  # How often to check for friend requests
NOTIFICATION_CHECK_INTERVAL = 120   # How often to check notifications
```
- VRChat credentials are stored in environment variables (`.env` file)
- Never hardcode credentials directly in the code
- The `.env` file is automatically ignored by Git for security
- Follows VRChat's API Usage Policy and Terms of Service
- Implements proper User-Agent strings as required
- Uses appropriate rate limiting to prevent API abuse
- Respects VRChat's API guidelines and best practices
- API integration can be disabled completely
- Verbose logging can be turned off
- No personal data is logged or transmitted beyond VRChat's API
- Background Processing: The API system runs in the background while NOVA operates
- Friend Request Handling: Automatically processes incoming friend requests
- Notification Monitoring: Checks for new notifications at regular intervals
- Integration with NOVA: Can inform NOVA about VRChat events for contextual responses
- Rate Limiting: Automatically manages API call frequency to stay within limits
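That rate-limiting step is essentially a cooldown gate in front of every API call. A sketch of how the `API_COOLDOWN` setting could be enforced (illustrative, not the project's code):

```python
import time

class Cooldown:
    # Enforce a minimum gap between API calls (cf. the API_COOLDOWN setting)
    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last = float("-inf")  # first call is never delayed

    def wait(self) -> None:
        # Sleep out whatever remains of the cooldown window, then stamp the call
        now = time.monotonic()
        remaining = self.seconds - (now - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `cooldown.wait()` before each request guarantees at least `seconds` between calls, regardless of how fast the surrounding loop runs.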
When the VRChat API is active, NOVA can provide responses like:
- "I just accepted a friend request from VRChatUser123!"
- "You have 3 new notifications in VRChat."
- "Your friend list has been updated with 2 new friends."
- Connection status tracking
- API call success/failure rates
- Rate limiting status
- Error logging and recovery
```python
# For better responsiveness (more frequent checks)
FRIEND_REQUEST_CHECK_INTERVAL = 30  # Check every 30 seconds
NOTIFICATION_CHECK_INTERVAL = 60    # Check every minute

# For better performance (less frequent checks)
FRIEND_REQUEST_CHECK_INTERVAL = 300  # Check every 5 minutes
NOTIFICATION_CHECK_INTERVAL = 600    # Check every 10 minutes
```
API Not Connecting:
- Verify `USING_API = True` in `constants.py`
- Check VRChat credentials in the `.env` file
- Ensure your VRChat account has API access enabled
Authentication Errors:
- Verify email and password are correct in `.env`
- Check if two-factor authentication is enabled on your VRChat account
- Ensure your VRChat account is in good standing
Rate Limiting Issues:
- Increase `API_COOLDOWN` for longer delays between calls
- Lengthen check intervals so fewer API requests are made
- Monitor console output for rate limiting warnings
Connection Timeouts:
- Increase `CONNECTION_TIMEOUT` and `REQUEST_TIMEOUT` values
values - Check internet connection stability
- Verify VRChat API status
- Start Conservatively: Begin with longer check intervals and increase frequency as needed
- Monitor Performance: Watch console output for API errors or warnings
- Respect Rate Limits: Don't set intervals too aggressively
- Keep Credentials Secure: Never share your `.env` file or commit it to version control
- Test Thoroughly: Verify API functionality before relying on automated features
The VRChat API Integration makes NOVA a more complete VRChat companion by bridging the gap between your AI assistant and VRChat's social features! (We are not liable if your account is suspended or banned as a result of VRChat API usage.)
NOVA AI now includes first-class support for Together AI, providing access to cutting-edge open-source language models with fast inference and competitive pricing.
Together AI offers several advantages for NOVA users:
- 🔓 Open Source Models: Access to the latest open-source language models like Llama 3.3, Qwen, and more
- ⚡ Fast Inference: Optimized infrastructure for quick response times
- 💰 Cost Effective: Competitive pricing compared to other cloud AI providers
- 🧠 Advanced Models: Support for both text and vision models
- 🔄 Easy Integration: Drop-in replacement for OpenAI API with minimal configuration changes
Together AI provides access to a wide range of models suitable for different use cases:
- Llama 3.3 70B Instruct Turbo (Default): High-quality responses with good speed
- Qwen 2.5 72B Instruct: Excellent for general conversation and reasoning
- Mixtral 8x7B: Fast responses with good quality
- And many more: Browse available models at api.together.xyz
- Llama Vision Free: Multimodal understanding for VRChat screenshot analysis
- Qwen VL models: Advanced vision-language understanding
- Custom vision models: Support for specialized vision tasks
Together AI is configured as the default API provider in NOVA AI. The configuration is handled through two main classes in `constants.py`:
```python
class LLM_API:
    API_TYPE = "together"  # Uses Together AI for text generation
    BASE_URL = "https://api.together.xyz/v1"
    API_KEY = os.getenv('LLM_API_KEY')  # Your Together AI API key

class Vision_API:
    API_TYPE = "together"  # Uses Together AI for vision tasks
    BASE_URL = "https://api.together.xyz/v1"
    API_KEY = os.getenv('VISION_API_KEY')  # Your Together AI API key
```
- Create Account: Sign up at api.together.xyz
- Get API Key: Generate your API key from the dashboard
- Set Environment Variables: Add your key to the `.env` file:

  ```
  LLM_API_KEY=your-together-ai-api-key-here
  VISION_API_KEY=your-together-ai-api-key-here
  ```
- Choose Models: Update model names in `constants.py` if desired
- Start NOVA: The system will automatically use Together AI for inference
- Model Selection: Choose models based on your performance vs. quality needs
- Temperature Settings: Lower values (0.3-0.7) for more focused responses
- Token Limits: Monitor usage to stay within your preferred budget
- Rate Limits: Together AI has generous rate limits for most use cases
NOVA AI includes comprehensive multilingual support, allowing you to interact with NOVA in over 29 languages with natural conversation flow and appropriate voice responses.
NOVA can understand and respond in the following languages:
European Languages:
- English (US, UK, AU, CA) - Primary language
- Spanish (ES, MX, AR, CO, and more)
- French (FR, CA, BE, CH)
- German (DE, AT, CH)
- Italian (IT, CH)
- Portuguese (PT, BR)
- Dutch (NL, BE)
- Russian (RU)
- Polish (PL)
- Turkish (TR)
- Swedish (SE)
- Norwegian (NO)
- Finnish (FI)
- Ukrainian (UA)
- Romanian (RO)
- Hungarian (HU)
- Greek (GR)
- Czech (CZ)
- Hebrew (IL)
Asian Languages:
- Chinese (Mandarin - CN, TW)
- Japanese (JP)
- Korean (KR)
- Hindi (IN)
- Bengali (IN, BD)
- Urdu (PK, IN)
- Thai (TH)
- Vietnamese (VN)
- Indonesian (ID)
Middle Eastern & African:
- Arabic (Multiple dialects: SA, AE, EG, MA, TN, and more)
NOVA uses OpenAI Whisper for speech recognition, which provides:
- Automatic language detection - Whisper can detect what language you're speaking
- High accuracy across all supported languages
- Real-time processing with configurable model sizes
- Robust performance with accents and dialects
NOVA's TTS system supports 142+ languages through Microsoft Edge TTS:
NOVA includes access to advanced multilingual neural voices that can speak multiple languages naturally:
- EmmaMultilingualNeural (Default) - English with multilingual capabilities
- VivienneMultilingualNeural - French with multilingual capabilities
- SeraphinaMultilingualNeural - German with multilingual capabilities
- GiuseppeMultilingualNeural - Italian with multilingual capabilities
- HyunsuMultilingualNeural - Korean with multilingual capabilities
- And many more specialized voices for each language
- Native character support - Displays text in original scripts (Chinese: 你好, Arabic: مرحبا, etc.)
- Proper pronunciation - Each language uses appropriate phonetic models
- Cultural context - Responses adapt to cultural norms and expressions
- Regional variants - Support for different regional accents and dialects
To change NOVA's voice to another language, modify `constants.py`:
```python
class Voice:
    # Examples of multilingual voices:
    VOICE_NAME = "en-US-EmmaMultilingualNeural"  # English (Default)
    # VOICE_NAME = "es-ES-ElviraNeural"                 # Spanish
    # VOICE_NAME = "fr-FR-VivienneMultilingualNeural"   # French
    # VOICE_NAME = "de-DE-SeraphinaMultilingualNeural"  # German
    # VOICE_NAME = "ja-JP-NanamiNeural"                 # Japanese
    # VOICE_NAME = "ko-KR-HyunsuMultilingualNeural"     # Korean
    # VOICE_NAME = "zh-CN-XiaoyiNeural"                 # Chinese
```
Whisper automatically detects languages, but you can optimize settings in `constants.py`:
```python
class WhisperSettings:
    MODEL_SIZE = "base"  # "base" for multilingual, "small" for better accuracy
    # Larger models (small, medium, large) provide better multilingual support
```
- You speak in any supported language
- Whisper detects your language automatically
- NOVA understands and processes in the detected language
- NOVA responds in the same language (or English if configured)
- TTS speaks the response using the appropriate voice
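That last step — picking a TTS voice to match the detected language — can be as simple as a lookup table keyed by Whisper's language code. A hypothetical helper (the mapping and function are illustrative; the voice ShortNames are the ones this README mentions):

```python
# Map Whisper's detected language code to an edge-tts voice ShortName
VOICE_BY_LANGUAGE = {
    "en": "en-US-EmmaMultilingualNeural",
    "es": "es-ES-ElviraNeural",
    "fr": "fr-FR-VivienneMultilingualNeural",
    "de": "de-DE-SeraphinaMultilingualNeural",
    "ja": "ja-JP-NanamiNeural",
    "ko": "ko-KR-HyunsuMultilingualNeural",
    "zh": "zh-CN-XiaoyiNeural",
}

DEFAULT_VOICE = "en-US-EmmaMultilingualNeural"

def voice_for(language_code: str) -> str:
    # Fall back to the multilingual default for any unmapped language
    return VOICE_BY_LANGUAGE.get(language_code, DEFAULT_VOICE)
```

Using a multilingual default as the fallback means unmapped languages still get intelligible speech rather than an error.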
- Use larger Whisper models (`small` or `medium`) for better language detection
- Speak clearly - especially important for non-native languages
- Choose appropriate TTS voices - use multilingual voices for mixed-language conversations
- Consider context - NOVA maintains conversation context across languages
- NOVA can handle code-switching (switching between languages mid-conversation)
- Each response maintains the language of the input
- Multilingual voices can naturally handle multiple languages in one conversation
- Greetings and expressions adapt to cultural norms
- Formality levels adjust based on language conventions
- Regional references include local context when appropriate
- Latin scripts (English, Spanish, French, German, etc.)
- Cyrillic (Russian, Ukrainian, etc.)
- CJK characters (Chinese, Japanese, Korean)
- Arabic script (Arabic, Urdu, etc.)
- Devanagari (Hindi, Bengali, etc.)
You can browse and select from 400+ voices across 142 languages. To see available voices:
```python
# Add this to a Python script to list available voices
import edge_tts
import asyncio

async def list_voices():
    voices = await edge_tts.list_voices()
    for voice in voices:
        if 'your_language_code' in voice['Locale']:
            print(f"{voice['ShortName']} - {voice['FriendlyName']}")

asyncio.run(list_voices())
```
- Whisper model size affects multilingual accuracy:
  - `tiny`: Fast but limited multilingual support
  - `base`: Good balance for most languages
  - `small`: Better accuracy for non-English languages
  - `medium`/`large`: Best multilingual performance
Before installing NOVA AI, you'll need the following on your Windows machine:
- Download from python.org
- Important: During installation, check "Add Python to PATH"
- Verify installation by opening Command Prompt and typing:

  ```
  python --version
  ```
- Download from git-scm.com
- This allows you to easily download and update NOVA AI
- Have VRChat installed and an account
- OSC must be enabled in VRChat settings (we'll cover this in setup)
- A microphone for voice input
- Audio output device (speakers/headphones)
- Optional: Virtual audio cables for advanced audio routing (VB-Audio Virtual Cable)
Choose one of these options for AI processing:
Option A: Local AI Models (Recommended for Privacy)
- LM Studio - Free local AI model runner
- At least 8GB RAM (16GB recommended for larger models)
- Compatible with many open-source models
Option B: OpenAI API
- OpenAI account with API access
- API credits for usage
- Internet connection for API calls
- RAM: 8GB minimum (16GB recommended with vision system)
- Storage: 2GB free space (more for local AI models)
- GPU: Optional but recommended for better performance with local models
- Windows: Windows 10 or later (for compatibility with all features)
Option A: Using Git (Recommended)
- Open Command Prompt or PowerShell
- Navigate to where you want to install NOVA AI:

  ```
  cd C:\Users\%USERNAME%\Documents
  ```
- Clone the repository:

  ```
  git clone https://github.com/S0L0GUY/NOVA-AI.git
  cd NOVA-AI
  ```
Option B: Manual Download
- Go to the NOVA AI GitHub page
- Click the green "Code" button
- Select "Download ZIP"
- Extract the ZIP file to a folder like `C:\Users\%USERNAME%\Documents\NOVA-AI`
- Open Command Prompt or PowerShell as Administrator
- Navigate to the NOVA-AI folder:

  ```
  cd C:\Users\%USERNAME%\Documents\NOVA-AI
  ```
- Install required packages:

  ```
  pip install -r requirements.txt
  ```

If you encounter errors, try:

```
python -m pip install --upgrade pip
pip install -r requirements.txt
```
Some packages may require additional setup:
For Audio Processing: If PyAudio installation fails:
- Download a prebuilt PyAudio `.whl` file matching your Python version
- Install it with:

  ```
  pip install path\to\downloaded\file.whl
  ```
For Windows-specific features: The installation includes Windows-specific packages for:
- System resource monitoring (`psutil`, `GPUtil`)
- Window management (`pywin32`)
- Custom GUI components (`customtkinter`)
Test that all components are properly installed:
- Test audio device detection:

  ```
  python list_audio_devices.py
  ```
- Verify Python packages:

  ```
  python -c "import openai, whisper, edge_tts, PIL, customtkinter; print('All packages installed successfully')"
  ```
Before starting setup, NOVA AI includes example configuration files to help you get started:
- Environment Variables: Copy `.env.example` to `.env` for your API keys and VRChat credentials
- Configuration Reference: Review `constants.py` for all available settings
- System Prompts: Check the `prompts/` folder for personality customization options
You'll configure these files in the following setup steps.
- Find your audio device indices:

  ```
  python list_audio_devices.py
  ```
- Note the index numbers for your microphone (input) and speakers (output)
- Edit the `constants.py` file:
  - Open `constants.py` in a text editor (Notepad, VS Code, etc.)
  - Navigate to the `Audio` class and update the device indices:

    ```python
    class Audio:
        AUDIO_OUTPUT_INDEX = 7  # Replace with your speaker index
        AUDIO_INPUT_INDEX = 2   # Replace with your microphone index
    ```
NOVA AI uses environment variables to securely store sensitive information like API keys and VRChat credentials. This keeps your login information separate from the code.
- Create a `.env` file from the example:
  - In the NOVA-AI folder, you'll find a `.env.example` file
  - Copy this file and rename it to `.env`:

    ```
    copy .env.example .env
    ```
  - Or manually create a new file called `.env` (note the dot at the beginning)
- Edit your `.env` file:
  - Open the `.env` file in any text editor
  - Replace the example values with your actual credentials:

    ```
    # VRChat Login Credentials (required for VRChat API features)
    VRCHAT_EMAIL=your-email@example.com
    VRCHAT_PASSWORD=your-actual-vrchat-password

    # OpenAI API Key (use "lm-studio" for local models, or your actual key for OpenAI)
    OPENAI_API_KEY=lm-studio
    ```
- Configure based on your AI setup:

  For Local AI Models (LM Studio):

  ```
  OPENAI_API_KEY=lm-studio
  ```

  For OpenAI API:

  ```
  OPENAI_API_KEY=sk-your-actual-openai-api-key-here
  ```
- Important Security Notes:
  - Never share your `.env` file or commit it to version control
  - The `.env` file is automatically ignored by Git for your security
  - NOVA will automatically load these credentials when it starts
  - The `.env.example` file shows the format but contains no real credentials
Choose your AI backend and configure accordingly:
Option A: Local Models with LM Studio (Recommended)
- Download and install LM Studio
- Download a compatible model (e.g., Llama 3.1 8B Instruct)
- Start the local server in LM Studio (default: `http://localhost:1234`)
- Keep the default settings in `constants.py`:

  ```python
  class OpenAI:
      BASE_URL = "http://localhost:1234/v1"
      API_KEY = os.getenv('OPENAI_API_KEY')  # Will use "lm-studio" from .env
  ```
Option B: OpenAI API
- Get an OpenAI API key from platform.openai.com
- Update your `.env` file with your real API key
- Update `constants.py` for OpenAI:

  ```python
  class LLM_API:
      API_TYPE = "openai"  # Change to "openai"
      BASE_URL = "https://api.openai.com/v1"  # Change to OpenAI's API
      API_KEY = os.getenv('LLM_API_KEY')
  ```
Option C: Together AI API (Default)
- Get a Together AI API key from api.together.xyz
- Update your `.env` file with your Together API key:

  ```
  LLM_API_KEY=your-together-api-key-here
  ```
- The default settings in `constants.py` are already configured for Together AI:

  ```python
  class LLM_API:
      API_TYPE = "together"  # Default setting
      BASE_URL = "https://api.together.xyz/v1"  # Together AI API endpoint
      API_KEY = os.getenv('LLM_API_KEY')
  ```
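Once the key is in place, every chat completion goes through Together's OpenAI-compatible `/chat/completions` endpoint with a Bearer token. As an illustration of what that request looks like on the wire, here is a stdlib-only sketch (NOVA itself likely uses the `openai` client library; `build_chat_request` is a hypothetical helper):

```python
import json
import os
import urllib.request

def build_chat_request(messages: list,
                       model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo",
                       base_url: str = "https://api.together.xyz/v1") -> urllib.request.Request:
    # Together AI exposes an OpenAI-compatible /chat/completions endpoint,
    # authenticated with a Bearer token read from LLM_API_KEY
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.getenv('LLM_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# Sending it (requires a valid key and network access):
# with urllib.request.urlopen(build_chat_request([{"role": "user", "content": "Hi"}])) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the endpoint shape matches OpenAI's, switching providers is just a matter of changing `base_url`, the model name, and the key — which is exactly what the `LLM_API` class encapsulates.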
- Enable OSC in VRChat:
  - Launch VRChat
  - Go to Settings → OSC
  - Enable "Enabled"
  - Note the port number (usually 9000)
- Update network settings:
  - The `constants.py` file should automatically detect your IP
  - Verify the `VRC_PORT` in the `Network` class matches VRChat's OSC port:

    ```python
    class Network:
        LOCAL_IP = socket.gethostbyname(socket.gethostname())  # Auto-detected
        VRC_PORT = 9000  # Should match VRChat's OSC port
    ```
- Find your audio device indices:

  ```
  python list_audio_devices.py
  ```

  This will show all available audio devices with their index numbers.
- Configure audio devices:
  - Open `constants.py` in a text editor
  - Navigate to the `Audio` class and update the device indices:

    ```python
    class Audio:
        AUDIO_OUTPUT_INDEX = 6  # Replace with your speaker/headphone index
        AUDIO_INPUT_INDEX = 2   # Replace with your microphone index
    ```
- Test audio functionality:
  - Test microphone recording and TTS playback by running NOVA
  - Adjust device indices if audio doesn't work properly
NOVA AI supports 29+ languages out of the box. To optimize for your preferred language:
- Choose a multilingual TTS voice:
  - Edit `constants.py` and find the `Voice` class
  - The default voice `en-US-EmmaMultilingualNeural` already supports multiple languages
  - For better language-specific pronunciation, choose a native voice:

    ```python
    class Voice:
        # Multilingual voices (recommended):
        VOICE_NAME = "en-US-EmmaMultilingualNeural"  # English + multilingual
        # VOICE_NAME = "fr-FR-VivienneMultilingualNeural"   # French + multilingual
        # VOICE_NAME = "de-DE-SeraphinaMultilingualNeural"  # German + multilingual

        # Native language voices:
        # VOICE_NAME = "es-ES-ElviraNeural"              # Spanish
        # VOICE_NAME = "ja-JP-NanamiNeural"              # Japanese
        # VOICE_NAME = "ko-KR-HyunsuMultilingualNeural"  # Korean
        # VOICE_NAME = "zh-CN-XiaoyiNeural"              # Chinese
    ```
- Optimize speech recognition for your language:
  - For non-English languages, consider using a larger Whisper model:

```python
class WhisperSettings:
    MODEL_SIZE = "small"  # Better for multilingual (default is "base")
    # Options: "tiny", "base", "small", "medium", "large"
```
- Language detection is automatic:
  - Whisper automatically detects the language you're speaking
  - No additional configuration is needed for language detection
  - NOVA will respond in the language you use
NOVA AI uses a centralized configuration system in `constants.py` that makes tuning and customization simple. All adjustable settings are organized into logical classes with clear documentation.
Open `constants.py` in any text editor to modify NOVA's behavior. Here are the main configuration classes:
Configure networking and communication:
```python
class Network:
    LOCAL_IP = socket.gethostbyname(socket.gethostname())  # Auto-detected local IP
    VRC_PORT = 9000  # VRChat OSC port
```
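If the `gethostbyname` auto-detection above resolves to `127.0.0.1` on multi-interface machines (a known quirk of that call, not specific to NOVA), a common workaround is the UDP-connect trick, sketched here:

```python
import socket

def detect_local_ip() -> str:
    """Best-effort LAN IP detection; falls back to loopback."""
    try:
        # Connecting a UDP socket sends no packets, but makes the OS pick
        # the outbound interface, whose address getsockname() then reports.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"

print(detect_local_ip())
```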
Set up your audio devices:
```python
class Audio:
    AUDIO_OUTPUT_INDEX = 6  # Speaker/headphone device index
    AUDIO_INPUT_INDEX = 2   # Microphone device index
```
Customize text-to-speech:
```python
class Voice:
    VOICE_NAME = "en-US-EmmaMultilingualNeural"  # Default multilingual voice
    # Other options:
    # VOICE_NAME = "es-ES-ElviraNeural"  # Spanish
    # VOICE_NAME = "fr-FR-VivienneMultilingualNeural"  # French
    # VOICE_NAME = "de-DE-SeraphinaMultilingualNeural"  # German
    # VOICE_NAME = "ja-JP-NanamiNeural"  # Japanese
    # VOICE_NAME = "ko-KR-HyunsuMultilingualNeural"  # Korean
    # VOICE_NAME = "zh-CN-XiaoyiNeural"  # Chinese

class TTSSettings:
    ENGINE = "edge-tts"  # TTS engine (currently only edge-tts is supported)
    AUDIO_CONVERSION_FACTOR = 2**15  # Audio processing factor
    QUEUE_SLEEP_INTERVAL = 0.1  # Queue processing interval (seconds)
```
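The value `2**15` for `AUDIO_CONVERSION_FACTOR` suggests conversion between float audio samples in [-1.0, 1.0] and signed 16-bit PCM (an inference from the constant's value, not confirmed from NOVA's source). The arithmetic would look like:

```python
FACTOR = 2 ** 15  # int16 full scale (32768)

def float_to_int16(samples):
    """Scale float samples in [-1.0, 1.0] to the signed 16-bit PCM range."""
    return [max(-FACTOR, min(FACTOR - 1, int(s * FACTOR))) for s in samples]

print(float_to_int16([0.0, 0.5, -1.0, 1.0]))  # note 1.0 clamps to 32767
```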
Adjust AI behavior and performance:
```python
class LanguageModel:
    MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct-Turbo"  # AI model to use
    LM_TEMPERATURE = 0.7  # Creativity (0.0-1.0)

class LLM_API:
    API_TYPE = "together"  # API provider: "openai", "together"
    BASE_URL = "https://api.together.xyz/v1"  # API endpoint URL
    API_KEY = os.getenv('LLM_API_KEY')  # Your API key from the .env file

class Vision_API:
    API_TYPE = "together"  # API provider: "openai", "together"
    BASE_URL = "https://api.together.xyz/v1"  # API endpoint URL
    API_KEY = os.getenv('VISION_API_KEY')  # Your vision API key from the .env file
```
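The `os.getenv` calls above require the `.env` file to have been loaded into the environment first (typically done with the python-dotenv package). A minimal stdlib-only equivalent, just to show what that step does:

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines into os.environ, skipping comments."""
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # Don't overwrite variables already set in the real environment
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # No .env file is fine; variables may be set elsewhere
```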
Fine-tune voice detection:
```python
class WhisperSettings:
    MODEL_SIZE = "base"  # Whisper model: tiny, base, small, medium, large
    SAMPLE_RATE = 16000  # Audio sample rate (Hz)
    FRAME_DURATION_MS = 30  # Frame duration for voice detection (ms)
    NUM_PADDING_FRAMES = 10  # Voice detection padding frames
    VOICE_THRESHOLD = 0.9  # Speech detection threshold (0.0-1.0)
    MAX_RECORDING_DURATION = 30  # Maximum recording time (seconds)
    VAD_AGGRESSIVENESS = 0  # Voice detection sensitivity (0-3)
```
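These frame settings compose in a simple way: at 16 kHz, a 30 ms frame holds 480 samples, and 10 padding frames give 300 ms of context around detected speech:

```python
SAMPLE_RATE = 16000      # Hz
FRAME_DURATION_MS = 30   # ms per voice-detection frame
NUM_PADDING_FRAMES = 10

samples_per_frame = SAMPLE_RATE * FRAME_DURATION_MS // 1000
padding_ms = NUM_PADDING_FRAMES * FRAME_DURATION_MS
print(samples_per_frame, padding_ms)  # 480 300
```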
Control computer vision capabilities:
```python
class VisionSystem:
    ENABLED = False  # Enable/disable the vision system
    ANALYSIS_INTERVAL = 15  # Screenshot analysis frequency (seconds)
    MAX_IMAGE_SIZE = 1024  # Maximum image resolution for processing
    VISION_MODEL = "qwen/qwen2.5-vl-7b"  # AI vision model to use
    VISION_TEMPERATURE = 0.3  # Vision analysis creativity (0.0-1.0)
```
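`MAX_IMAGE_SIZE = 1024` implies screenshots are downscaled so their longest side fits that bound before being sent to the vision model. The aspect-ratio math (a hypothetical helper, not NOVA's actual function) looks like:

```python
def fit_within(width: int, height: int, max_size: int = 1024) -> tuple:
    """Scale dimensions down so the longest side is at most max_size."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height  # already small enough
    scale = max_size / longest
    return round(width * scale), round(height * scale)

print(fit_within(1920, 1080))  # (1024, 576)
```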
Customize the performance monitor window:
```python
class ResourceMonitor:
    WINDOW_TITLE = "Nova Resource Monitor"  # Monitor window title
    WINDOW_WIDTH = 400  # Window width (pixels)
    WINDOW_HEIGHT = 745  # Window height (pixels)
    WINDOW_SIZE = f"{WINDOW_WIDTH}x{WINDOW_HEIGHT}"  # Combined size string
    UPDATE_INTERVAL = 1000  # Update frequency (milliseconds)
    ALWAYS_ON_TOP = True  # Keep the window on top
    APPEARANCE_MODE = "dark"  # GUI theme
    COLOR_THEME = "dark-blue"  # Color scheme
    CORNER_RADIUS = 15  # Window corner radius
    BORDER_WIDTH = 2  # Border width
```
Configure VRChat-specific features:
```python
class NovaPlacement:
    STARTUP_DELAY = 15  # Initial delay before starting placement (seconds)
    DEFAULT_SPEED = 1  # Default movement speed

class VRChatAPI:
    USING_API = False  # Enable/disable VRChat API features
    USERNAME = os.getenv('VRCHAT_EMAIL')  # VRChat email from .env
    PASSWORD = os.getenv('VRCHAT_PASSWORD')  # VRChat password from .env
    AUTO_ACCEPT_FRIEND_REQUESTS = True  # Auto-accept friend requests
    FRIEND_REQUEST_CHECK_INTERVAL = 60  # Check interval (seconds)
    NOTIFICATION_CHECK_INTERVAL = 120  # Notification check interval (seconds)
    API_COOLDOWN = 30  # Cooldown between API calls (seconds)
```
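`API_COOLDOWN` implies that calls to the VRChat API are rate-limited. A minimal sketch of such a gate (hypothetical; NOVA's actual implementation may differ):

```python
import time

class Cooldown:
    """Allow an action at most once per `seconds` of wall-clock time."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last = 0.0  # monotonic timestamp of the last allowed call

    def ready(self) -> bool:
        now = time.monotonic()
        if now - self._last >= self.seconds:
            self._last = now
            return True
        return False

api_gate = Cooldown(30)  # mirrors API_COOLDOWN = 30
```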
Configure status messages:
```python
class SystemMessages:
    INITIAL_USER_MESSAGE = "Who are you?"  # First conversation starter
    SYSTEM_STARTING = "System Starting"  # VRChat startup message
    THINKING_MESSAGE = "Thinking"  # Processing message
    LISTENING_MESSAGE = "Listening"  # Voice input message
```
Customize console output colors:
```python
class ConsoleColors:
    # ANSI color codes for different types of console output
    AI_LABEL = "\033[93m"  # AI response label color
    AI_TEXT = "\033[92m"  # AI response text color
    HUMAN_LABEL = "\033[93m"  # Human input label color
    HUMAN_TEXT = "\033[92m"  # Human input text color
    # ... and many more color options
```
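These are standard ANSI escape sequences; wrapping text in a color code plus a reset produces the colored console output:

```python
RESET = "\033[0m"
AI_LABEL = "\033[93m"  # bright yellow
AI_TEXT = "\033[92m"   # bright green

def colorize(text: str, code: str) -> str:
    """Wrap text in an ANSI color code and reset formatting afterwards."""
    return f"{code}{text}{RESET}"

print(colorize("NOVA:", AI_LABEL), colorize("Hello there!", AI_TEXT))
```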
All file locations are centralized:
```python
class FilePaths:
    HISTORY_PATH = "json_files/history.json"  # Conversation history
    NORMAL_SYSTEM_PROMPT_PATH = "prompts/normal_system_prompt.txt"  # Main system prompt

# Vision system file paths live in the VisionSystem class:
# STATE_FILE = "json_files/vision_state.json"
# LOG_FILE = "json_files/vision_log.json"
# VISION_PROMPT_PATH = "prompts/vision_prompt.txt"
```
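As an illustration of how a JSON history file like `history.json` works (the role/content schema here follows the common chat-message convention and is an assumption, not NOVA's confirmed format):

```python
import json
import os

def append_history(role: str, content: str, path: str = "json_files/history.json") -> list:
    """Append one chat turn to the JSON history file, creating it if missing."""
    history = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            history = json.load(f)
    history.append({"role": role, "content": content})
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f, ensure_ascii=False, indent=2)
    return history
```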
Edit the system prompt files in the `prompts/` folder:
- `normal_system_prompt.txt` - Default personality and behavior
- `snapchat_system_prompt.txt` - Alternative casual personality mode
- `additional_system_prompt.txt` - Extra context and instructions
- `vision_prompt.txt` - Instructions for the vision system AI model
Making NOVA More Creative:
```python
class LanguageModel:
    LM_TEMPERATURE = 0.9  # Higher = more creative/random
```
Improving Speech Recognition:
```python
class WhisperSettings:
    MODEL_SIZE = "small"  # Better accuracy than "base"
    VAD_AGGRESSIVENESS = 2  # More aggressive filtering of non-speech (0-3)
    VOICE_THRESHOLD = 0.8  # Lower = more sensitive to speech
    NUM_PADDING_FRAMES = 15  # More padding for better detection
```
Reducing Response Time:
```python
class WhisperSettings:
    MODEL_SIZE = "tiny"  # Faster but less accurate

class ResourceMonitor:
    UPDATE_INTERVAL = 2000  # Less frequent GUI updates
```
Using OpenAI Instead of Local Models:
```python
class LLM_API:
    API_TYPE = "openai"
    BASE_URL = "https://api.openai.com/v1"  # Official OpenAI API
    # API_KEY is loaded from the .env file (set it to your real OpenAI key)
```
Enabling VRChat API Features:
```python
class VRChatAPI:
    USING_API = True  # Enable VRChat API integration
    AUTO_ACCEPT_FRIEND_REQUESTS = True  # Automatically accept friend requests
    FRIEND_REQUEST_CHECK_INTERVAL = 30  # Check more frequently
```
Enabling Vision Capabilities:
```python
class VisionSystem:
    ENABLED = True  # Enable the vision system
    ANALYSIS_INTERVAL = 10  # More frequent analysis
    VISION_MODEL = "gpt-4-vision-preview"  # Use an OpenAI vision model
```
Custom Voice Selection:
- NOVA uses Microsoft Edge TTS voices (no separate script needed)
- Update the `VOICE_NAME` in `constants.py` with your preferred voice
- Common voices: "en-US-JennyNeural", "en-US-GuyNeural", "en-US-AriaNeural"
VRChat API Configuration:
- Set up the `.env` file with VRChat credentials for API features
- Enable `USING_API = True` in the `VRChatAPI` class
- Customize friend request and notification handling intervals
Vision System Setup:
- Enable `ENABLED = True` in the `VisionSystem` class
- Adjust `ANALYSIS_INTERVAL` for screenshot frequency
- Configure `VISION_MODEL` for your preferred AI vision model
Network Troubleshooting:
- `VRC_PORT` defaults to 9000 and can be set manually if needed
- `LOCAL_IP` is determined automatically from the system network configuration
Performance Optimization:
- Adjust `UPDATE_INTERVAL` in `ResourceMonitor` for monitoring frequency
- Modify Whisper settings to balance accuracy against speed
- Configure vision system intervals based on your hardware capabilities
- Start with defaults - The included settings work well for most users
- Change one setting at a time - This helps identify what each change does
- Test thoroughly - Restart NOVA after making changes to see effects
- Keep backups - Save a copy of working configurations
- Use comments - Add your own notes in `constants.py` for custom settings
- Open Command Prompt or PowerShell
- Navigate to the NOVA AI folder:

  ```
  cd C:\Users\%USERNAME%\Documents\NOVA-AI
  ```
- Launch VRChat and join a world
- Start NOVA AI:

  ```
  python main.py
  ```
- Wait for startup: You'll see colored messages indicating NOVA is loading
- Speak clearly: NOVA will automatically detect when you start speaking
- Wait for response: NOVA will process your speech and respond in VRChat's chatbox
- Continue conversation: NOVA remembers conversation context
- Press `Ctrl+C` in the Command Prompt to stop the program
"No module named" errors:
```
pip install --upgrade pip
pip install -r requirements.txt
```
Audio device not found:
- Run `python list_audio_devices.py` to find the correct device indices
- Update the `Audio` class in `constants.py` with the correct `AUDIO_INPUT_INDEX` and `AUDIO_OUTPUT_INDEX` values
VRChat not receiving messages:
- Ensure OSC is enabled in VRChat settings
- Check that VRChat is running and you're in a world
- Verify the `VRC_PORT` in the `Network` class matches VRChat's OSC port (default 9000)
API Connection errors:
- Verify your API keys are set correctly in the `.env` file
- For the OpenAI API: check that your account has available credits and model access
- For Together AI: verify your API key is valid and has sufficient credits
- For LM Studio: ensure LM Studio is running and the server is accessible at `http://localhost:1234`
- Verify the `BASE_URL` in the `LLM_API` class matches your chosen provider
- Check console output for specific error messages
- Ensure your internet connection is stable for cloud API providers
Microphone not working:
- Check Windows microphone permissions
- Verify the microphone works in other applications
- Run `python list_audio_devices.py` and update the indices in the `Audio` class
- Adjust `WhisperSettings.VAD_AGGRESSIVENESS` for better voice detection
- Try different `VOICE_THRESHOLD` values (lower = more sensitive)
VRChat API Issues:
- Ensure `USING_API = True` in `constants.py` if you want API features
- Verify VRChat credentials are correct in the `.env` file
- Check whether your VRChat account has two-factor authentication enabled
- Monitor console output for API-specific error messages
- Disable API features by setting `USING_API = False` if not needed
Vision System Issues:
- Ensure `ENABLED = True` in the `VisionSystem` class to use vision features
- Verify the VRChat window is visible and active during operation
- Check that your AI model supports vision capabilities
- Monitor console output for vision-specific error messages
- Test with different `ANALYSIS_INTERVAL` values for your hardware
Resource Monitor Issues:
- The resource monitor runs as a separate process
- If it fails to start, check console output for error messages
Multilingual Issues:
- Speech not recognized in your language: Try a larger Whisper model (`small`, `medium`, or `large`) in `WhisperSettings.MODEL_SIZE`
- TTS voice not working: Ensure the voice name is correct in `Voice.VOICE_NAME`; run the voice listing script to see available voices
- Wrong language detection: Whisper may detect the wrong language if audio quality is poor; try speaking more clearly or adjusting microphone settings
- Mixed language responses: For better multilingual support, use multilingual TTS voices (ones with "Multilingual" in the name)
- Character encoding issues: Ensure your terminal supports UTF-8 encoding for non-Latin scripts (Chinese, Arabic, etc.)
- Check the console output for error messages - NOVA provides detailed colored output
- Verify all dependencies are installed correctly using `pip list`
- Ensure your `.env` file is properly configured with valid credentials
- Test each component individually:
  - Audio devices: `python list_audio_devices.py`
  - Basic functionality: start with a minimal configuration
- Check that your AI backend (LM Studio or OpenAI) is accessible
- Review the configuration in `constants.py` for any syntax errors
For advanced audio routing and streaming setups:
- Install VB-Audio Virtual Cable
- Configure audio routing through virtual cables
- Update the audio device indices in the `Audio` class in `constants.py`
- This lets you separate NOVA's audio from your main audio streams
Create custom personalities by editing the files in the `prompts/` directory:
- `normal_system_prompt.txt`: Main personality and behavior instructions
- `snapchat_system_prompt.txt`: Alternative casual personality mode
- `additional_system_prompt.txt`: Extra context and instructions
- `vision_prompt.txt`: Instructions for the vision AI model
Using Together AI (Default - Recommended):
- Sign up at api.together.xyz
- Get your API key from the dashboard
- Set `LLM_API_KEY=your-together-key` in your `.env` file
- Keep the default `constants.py` settings (already configured for Together AI)
- Enjoy fast inference with competitive pricing
Using Local Models (LM Studio):
- Download and install LM Studio
- Download compatible models (Llama, Mistral, etc.)
- Start the local server (default: `http://localhost:1234`)
- Update `constants.py` to use LM Studio:

```python
class LLM_API:
    API_TYPE = "openai"  # LM Studio exposes an OpenAI-compatible API
    BASE_URL = "http://localhost:1234/v1"
```

- Set `LLM_API_KEY=lm-studio` in your `.env` file
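Together AI, LM Studio, and OpenAI all speak the same chat-completions shape, which is why switching providers is mostly a `BASE_URL` change. A sketch of the request such a client builds (endpoint path and payload per the OpenAI API convention; the model name is NOVA's default):

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> dict:
    """Assemble the URL, headers, and body for an OpenAI-compatible chat call."""
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages, "temperature": 0.7}),
    }

req = build_chat_request(
    "http://localhost:1234/v1",
    "lm-studio",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    [{"role": "user", "content": "Who are you?"}],
)
print(req["url"])
```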
Using OpenAI Cloud API:
- Set up OpenAI account and API key
- Update the `.env` file with your real API key
- Update `constants.py` for OpenAI:

```python
class LLM_API:
    API_TYPE = "openai"
    BASE_URL = "https://api.openai.com/v1"
```
- Consider cost implications for usage
Using Other Compatible APIs:
- Any OpenAI-compatible API can be used
- Update `BASE_URL` in the `LLM_API` class
- Ensure the API supports streaming responses
Running multiple NOVA instances:
- Create separate project folders
- Use different OSC ports for each instance
- Configure different audio devices if needed
- Modify `VRC_PORT` in `constants.py` for each instance
For developers wanting to extend NOVA:
- Install development dependencies: `pip install flake8`
- Follow Python PEP 8 style guidelines
- Use the modular class structure in the `classes/` folder
- Test changes thoroughly before deployment
We welcome contributions from the community! NOVA AI is open-source and benefits from community input.
- Fork the repository on GitHub
- Create a new branch for your feature: `git checkout -b feature-name`
- Make your changes and test them thoroughly
- Commit your changes: `git commit -m "Add feature: description of changes"`
- Push to your branch: `git push origin feature-name`
- Create a pull request on GitHub
- Follow Python PEP 8 style guidelines
- Add comments and documentation for new features
- Test your changes thoroughly before submitting
- Update the README if you add new features or change setup procedures
- 🐛 Bug fixes and improvements
- ✨ New features and enhancements
- 📚 Documentation improvements
- 🧪 Testing and quality assurance
- 🎨 UI/UX improvements
- 🌐 Multi-language support
Special thanks to all the contributors who have helped make NOVA AI better:
- Evan Grinnell - Project Lead & Core Developer
- Duck Song - Core Contributor
- Viscrimson - Core Contributor
This project is licensed under the MIT License. See the LICENSE file for more details.
If you find NOVA AI helpful, consider:
- ⭐ Starring the repository on GitHub
- 🐛 Reporting bugs and issues
- 💡 Suggesting new features
- 🤝 Contributing to the codebase
For support, questions, or feature requests, please open an issue on GitHub.