Vintix: Action Model via In-Context Reinforcement Learning


An official implementation of Vintix: Action Model via In-Context Reinforcement Learning, a cross-domain model capable of learning behaviors through in-context reinforcement learning.

Load Training Data

The Vintix training dataset is hosted on a public S3 bucket and is freely available to everyone under the CC BY-SA 4.0 License.

You can download the dataset using the curl utility (or alternatives like wget). Be sure to unzip the downloaded file before use.

# the archive is approx. 130 GB
curl -L -o VintixDataset.zip https://tinyurl.com/426ckafn

unzip VintixDataset.zip
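
If you would rather script the download in Python, here is a minimal sketch using the requests library (assuming it is installed; the chunk size is arbitrary):

import requests

URL = "https://tinyurl.com/426ckafn"  # redirects to the public S3 bucket

with requests.get(URL, stream=True) as response:
    response.raise_for_status()
    with open("VintixDataset.zip", "wb") as f:
        # stream in 1 MiB chunks rather than loading ~130 GB into memory
        for chunk in response.iter_content(chunk_size=1 << 20):
            f.write(chunk)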

Dataset Structure

The dataset consists of multiple .h5 files, each corresponding to a single trajectory in a specific environment. Each file is divided into groups of 10,000 steps (the last group in a trajectory may contain fewer). Each group contains the following keys (a reading sketch follows the list):

  • proprio_observation: The sequence of observations (np.float32)
  • action: The sequence of actions taken in the environment (np.float32)
  • reward: The sequence of rewards received after each action (np.float32)
  • step_num: The sequence of step numbers within each episode (np.int32)
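
Each trajectory file can be inspected with h5py. The sketch below is illustrative: the file path is a placeholder, and iterating over the file's top-level groups is an assumption about the layout.

import h5py

with h5py.File("path/to/trajectory.h5", "r") as f:
    for group_name in f:  # each group holds up to 10,000 steps
        group = f[group_name]
        observations = group["proprio_observation"][:]  # np.float32
        actions = group["action"][:]                    # np.float32
        rewards = group["reward"][:]                    # np.float32
        step_nums = group["step_num"][:]                # np.int32
        print(group_name, observations.shape, rewards.sum())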

For more details on the collected trajectories, please refer to our paper.

Training Vintix

Train your Vintix model (run on 8xH100):

  1. Clone this repository
git clone https://github.com/dunnolab/vintix.git
  2. Prepare the Python environment following this instruction
  3. Update the data_dir parameter in the train configuration file to the directory where the downloaded dataset was unpacked
  4. Update the save_dir parameter in the train configuration file to the directory where you want to save the model checkpoints
  5. Run the following command from the vintix directory:
# launch one training process per visible GPU
export WORLD_SIZE=$(nvidia-smi -L | wc -l)

cd vintix
OMP_NUM_THREADS=1 torchrun \
  --standalone \
  --nnodes=1 \
  --nproc-per-node=$WORLD_SIZE \
  --module scripts.train \
  --config_path vintix/scripts/train/configs/train_config.yaml
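
Before launching, it may be worth checking that data_dir really points at the unpacked trajectories. A minimal sanity check, assuming the .h5 files sit somewhere under that directory:

from pathlib import Path

data_dir = Path("/data/VintixDataset")  # must match data_dir in train_config.yaml
h5_files = sorted(data_dir.rglob("*.h5"))  # one .h5 file per trajectory
print(f"found {len(h5_files)} trajectory files")
assert h5_files, "no .h5 files found; check the data_dir path"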

Model Performance

Detailed information about the model’s performance on each task across all domains (MuJoCo, Meta-World, Bi-DexHands, Industrial Benchmark) can be found in EVAL.md.

Usage Examples

To get started with Vintix, follow these steps:

  1. Prepare the Python environment following this instruction
  2. Clone this repository
git clone https://github.com/dunnolab/vintix.git
  3. Install Vintix
cd vintix
pip3 install -e .
  4. Download the checkpoint from Hugging Face
pip3 install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(repo_id="dunnolab/Vintix",
                  local_dir="/path/to/checkpoint")
  5. Use it. You can find a simple usage example below, or more examples here
import torch
import metaworld  # needed for Meta-World environments such as shelf-place-v2
import gymnasium as gym
from vintix import Vintix


PATH_TO_CHECKPOINT = "/path/to/checkpoint"
model = Vintix()
model.load_model(PATH_TO_CHECKPOINT)
model.to(torch.device('cuda'))
model.eval()

# task_name = "Humanoid-v4"
task_name = "shelf-place-v2"
env = gym.make(task_name)
model.reset_model(task_name,
                  use_cache=True,
                  torch_dtype=torch.float16)
num_episodes = 50  # each iteration of the loop below runs one full episode

episode_rewards = []
for episode in range(num_episodes):
    cur_ep_rews = []
    observation, info = env.reset()
    reward = None
    done = False
    while not done:
        action = model.get_next_action(observation=observation,
                                       prev_reward=reward)
        observation, reward, terminated, truncated, info = env.step(action)

        done = terminated or truncated
        cur_ep_rews.append(reward)
    episode_rewards.append(sum(cur_ep_rews))
print(f"Rewards per episode for {task_name}: {episode_rewards}")

Note that Vintix was trained and tested on MuJoCo environments in version v4 (e.g., Ant-v4, Pusher-v4) and Meta-World environments in version v2 (e.g., assembly-v2, shelf-place-v2).
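
The rollout loop above extends naturally to evaluating several tasks in sequence. Here is a sketch reusing the same Vintix calls; the task list and episode count are illustrative:

def evaluate(model, task_name, num_episodes=10):
    env = gym.make(task_name)
    # re-initialize the model for the new task, as in the example above
    model.reset_model(task_name, use_cache=True, torch_dtype=torch.float16)
    returns = []
    for _ in range(num_episodes):
        observation, info = env.reset()
        reward, done, ep_return = None, False, 0.0
        while not done:
            action = model.get_next_action(observation=observation,
                                           prev_reward=reward)
            observation, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            ep_return += reward
        returns.append(ep_return)
    return returns

for task in ["assembly-v2", "shelf-place-v2"]:  # Meta-World v2 names, per the note above
    print(task, evaluate(model, task))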

Other Domains

To validate the model on other domains, please use the links below.

Citation

If you would like to cite our work, please use the following BibTeX:

@article{polubarov2025vintix,
  author={Andrey Polubarov and Nikita Lyubaykin and Alexander Derevyagin and Ilya Zisman and Denis Tarasov and Alexander Nikulin and Vladislav Kurenkov},
  title={Vintix: Action Model via In-Context Reinforcement Learning},
  journal={arXiv},
  volume={2501.19400},
  year={2025}
}