Vintix: Action Model via In-Context Reinforcement Learning


An official implementation of Vintix: Action Model via In-Context Reinforcement Learning, a cross-domain model capable of learning behaviors through in-context reinforcement learning.

Load Training Data

The Vintix training dataset is hosted on a public S3 bucket and is freely available to everyone under the CC BY-SA 4.0 License.

You can download the dataset using the curl utility (or alternatives like wget). Be sure to unzip the downloaded file before use.

# the archive is approx. 130 GB
curl -L -o VintixDataset.zip https://tinyurl.com/426ckafn

unzip VintixDataset.zip
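
If you would rather script the download in Python, here is a minimal sketch using the requests library (assuming it is installed; the chunk size is arbitrary):

import requests

URL = "https://tinyurl.com/426ckafn"  # redirects to the public S3 bucket

with requests.get(URL, stream=True) as response:
    response.raise_for_status()
    with open("VintixDataset.zip", "wb") as f:
        # stream in 1 MiB chunks rather than loading ~130 GB into memory
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)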

Dataset Structure

The dataset consists of multiple .h5 files, each corresponding to a single trajectory in a specific environment. Each file is divided into groups of 10,000 steps (the last group in a trajectory may contain fewer). Each group contains the following keys (a reading sketch follows the list):

  • proprio_observation: The sequence of observations (np.float32)
  • action: The sequence of actions taken in the environment (np.float32)
  • reward: The sequence of rewards received after each action (np.float32)
  • step_num: The sequence of step numbers within each episode (np.int32)
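
Each trajectory file can be inspected with h5py. The sketch below is illustrative: the file path is a placeholder, and iterating over the file's top-level groups is an assumption about the layout.

import h5py

with h5py.File("path/to/trajectory.h5", "r") as f:
    for group_name in f:  # each group holds up to 10,000 steps
        group = f[group_name]
        observations = group["proprio_observation"][:]  # np.float32
        actions = group["action"][:]                    # np.float32
        rewards = group["reward"][:]                    # np.float32
        step_nums = group["step_num"][:]                # np.int32
        print(group_name, observations.shape, rewards.sum())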

For more details on the collected trajectories, please refer to our paper.

Training Vintix

Train your Vintix model (run on 8xH100):

  1. Clone this repository
git clone https://github.com/dunnolab/vintix.git
  2. Prepare the Python environment following this instruction
  3. Update the data_dir parameter in the train configuration file to the directory where the downloaded dataset was unpacked
  4. Update the save_dir parameter in the train configuration file to the directory where you want to save the model checkpoints
  5. Run the following command from the vintix directory:
# launch one training process per visible GPU
export WORLD_SIZE=$(nvidia-smi -L | wc -l)

cd vintix
OMP_NUM_THREADS=1 torchrun \
  --standalone \
  --nnodes=1 \
  --nproc-per-node=$WORLD_SIZE \
  --module scripts.train \
  --config_path vintix/scripts/train/configs/train_config.yaml
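
Before launching, it may be worth checking that data_dir really points at the unpacked trajectories. A minimal sanity check, assuming the .h5 files sit somewhere under that directory:

from pathlib import Path

data_dir = Path("/data/VintixDataset")  # must match data_dir in train_config.yaml
h5_files = sorted(data_dir.rglob("*.h5"))  # one .h5 file per trajectory
print(f"found {len(h5_files)} trajectory files")
assert h5_files, "no .h5 files found; check the data_dir path"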

Model Performance

Detailed information about the model’s performance on each task across all domains (MuJoCo, Meta-World, Bi-DexHands, Industrial Benchmark) can be found in EVAL.md.

Usage Examples

To get started with Vintix, follow these steps:

  1. Prepare the Python environment following this instruction
  2. Clone this repository
git clone https://github.com/dunnolab/vintix.git
  3. Install Vintix
cd vintix
pip3 install -e .
  4. Download the checkpoint from Hugging Face
pip3 install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(repo_id="dunnolab/Vintix",
                  local_dir="/path/to/checkpoint")
  5. Use it. You can find a simple usage example below, or more examples here
import torch
import metaworld  # needed for Meta-World environments such as shelf-place-v2
import gymnasium as gym
from vintix import Vintix


PATH_TO_CHECKPOINT = "/path/to/checkpoint"
model = Vintix()
model.load_model(PATH_TO_CHECKPOINT)
model.to(torch.device('cuda'))
model.eval()

# task_name = "Humanoid-v4"
task_name = "shelf-place-v2"
env = gym.make(task_name)
model.reset_model(task_name,
                  use_cache=True,
                  torch_dtype=torch.float16)
num_episodes = 50  # each iteration of the loop below runs one full episode

episode_rewards = []
for episode in range(num_episodes):
    cur_ep_rews = []
    observation, info = env.reset()
    reward = None
    done = False
    while not done:
        action = model.get_next_action(observation=observation,
                                       prev_reward=reward)
        observation, reward, terminated, truncated, info = env.step(action)

        done = terminated or truncated
        cur_ep_rews.append(reward)
    episode_rewards.append(sum(cur_ep_rews))
print(f"Rewards per episode for {task_name}: {episode_rewards}")

Note that Vintix was trained and tested on MuJoCo environments in version v4 (e.g., Ant-v4, Pusher-v4) and Meta-World environments in version v2 (e.g., assembly-v2, shelf-place-v2).
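
The rollout loop above extends naturally to evaluating several tasks in sequence. Here is a sketch reusing the same Vintix calls; the task list and episode count are illustrative:

def evaluate(model, task_name, num_episodes=10):
    env = gym.make(task_name)
    # re-initialize the model for the new task, as in the example above
    model.reset_model(task_name, use_cache=True, torch_dtype=torch.float16)
    returns = []
    for _ in range(num_episodes):
        observation, info = env.reset()
        reward, done, ep_return = None, False, 0.0
        while not done:
            action = model.get_next_action(observation=observation,
                                           prev_reward=reward)
            observation, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            ep_return += reward
        returns.append(ep_return)
    return returns

for task in ["assembly-v2", "shelf-place-v2"]:  # Meta-World v2 names, per the note above
    print(task, evaluate(model, task))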

Other Domains

To validate the model on other domains, please use the links below.

Citation

If you would like to cite our work, please use the following BibTeX:

@article{polubarov2025vintix,
  author={Andrey Polubarov and Nikita Lyubaykin and Alexander Derevyagin and Ilya Zisman and Denis Tarasov and Alexander Nikulin and Vladislav Kurenkov},
  title={Vintix: Action Model via In-Context Reinforcement Learning},
  journal={arXiv},
  volume={2501.19400},
  year={2025}
}