This repository contains experiments and tutorials for understanding VampNet, a masked acoustic token modeling approach for music generation.
VampNet applies masked token modeling on top of a neural audio codec to generate, compress, and transform music. This project breaks down VampNet's operation into understandable components, showing both high-level usage and low-level implementation details.
- `vampnet_tutorial.ipynb` - Main tutorial notebook demonstrating:
  - High-level VampNet interface usage
  - Step-by-step breakdown of the generation pipeline
  - Low-level implementation matching the high-level interface
  - Visualization of tokens, masks, and generation process
- `assets/` - Audio files for testing:
  - `stargazing.wav` - Example input audio
  - `example.wav` - Additional test audio
  - `vampnet.png` - VampNet architecture diagram
See `requirements.txt` for dependencies. Key packages include:

- `vampnet` - The VampNet model implementation
- `audiotools` - Audio processing utilities
- `torch` - PyTorch for model inference
- `matplotlib` - For visualizations
- `ipython` - For notebook audio playback
1. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
3. Download pretrained models:
   - All pretrained models (trained by Hugo) are stored at: https://huggingface.co/hugggof/vampnet
   - Download the models from this link: https://zenodo.org/record/8136629
   - Extract the models to the `models/` folder
   - Licensing for pretrained models: the weights are licensed CC BY-NC-SA 4.0, and any VampNet models fine-tuned from these pretrained models are likewise licensed CC BY-NC-SA 4.0.
4. Run the notebook:

   ```bash
   jupyter notebook vampnet_tutorial.ipynb
   ```
VampNet operates on discrete audio tokens obtained from a neural codec. It uses masking strategies to:
- Preserve periodic prompts (e.g., every 13th token)
- Mask upper codebooks while preserving lower ones
- Generate new tokens in masked positions
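The masking strategies above can be sketched in plain Python. This is an illustrative toy, not VampNet's actual API: the function name, argument names, and the convention that 1 = masked / 0 = preserved are all assumptions made for the example.

```python
# Sketch of the two masking strategies described above.
# Tokens are laid out as [n_codebooks][n_timesteps];
# 1 = masked (to be regenerated), 0 = preserved.
def build_mask(n_codebooks, n_timesteps, periodic_prompt=13, upper_codebook_mask=3):
    mask = [[1] * n_timesteps for _ in range(n_codebooks)]
    for cb in range(n_codebooks):
        if cb < upper_codebook_mask:
            # Preserve the lower codebooks entirely.
            mask[cb] = [0] * n_timesteps
        else:
            # Periodic prompt: keep every `periodic_prompt`-th token.
            for t in range(0, n_timesteps, periodic_prompt):
                mask[cb][t] = 0
    return mask

mask = build_mask(n_codebooks=4, n_timesteps=26)
```

With these settings, codebooks 0-2 are fully preserved, while codebook 3 keeps only the tokens at positions 0 and 13; everything else is left for the model to regenerate.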
- Preprocessing: Normalize audio to -24 dBFS
- Encoding: Convert audio to discrete tokens using neural codec
- Masking: Apply strategic masking patterns
- Coarse Generation: Generate coarse tokens with transformer
- Fine Generation: Refine with coarse-to-fine model
- Decoding: Convert tokens back to audio
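The preprocessing step can be illustrated with a minimal sketch. Note this is a simplification: VampNet's pipeline normalizes via `audiotools`, which measures perceptual loudness, whereas this example uses plain RMS level as a stand-in; the function name and signature are hypothetical.

```python
import math

def normalize_loudness(samples, target_dbfs=-24.0):
    # Scale a mono signal so its RMS level sits at target_dbfs.
    # (RMS is an approximation of the perceptual loudness measure
    # used in the real pipeline.)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    target_rms = 10 ** (target_dbfs / 20.0)
    gain = target_rms / max(rms, 1e-12)
    return [s * gain for s in samples]

# A quiet 440 Hz test tone, boosted to the -24 dBFS target level.
quiet = [0.01 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(4410)]
loud = normalize_loudness(quiet)
```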
- `periodic_prompt`: Interval for keeping unmasked tokens (e.g., 13)
- `upper_codebook_mask`: Number of lower codebooks to preserve
- `temperature`: Sampling temperature for generation
- `typical_filtering`: Whether to use typical sampling
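To make the `temperature` parameter concrete, here is a generic temperature-scaled softmax sampler. This is a standard technique, not code from VampNet; typical filtering (not shown) would additionally prune the candidate set to tokens of near-typical information content before sampling.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, seed=None):
    # Higher temperature flattens the distribution (more diverse tokens);
    # lower temperature sharpens it toward the argmax.
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

At very low temperatures this behaves almost greedily, which is why low `temperature` values produce more conservative, repetitive generations.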
- Paper: VampNet: Music Generation via Masked Acoustic Token Modeling (García et al., 2023)
- Original implementation: https://github.com/hugofloresgarcia/vampnet