Environment
- Add Metadrive environment and its configurations (#192)
- Add Sampled MuZero/UniZero and DMC environment with related configurations (#260)
- Polish Chess environment and its render method; add unit tests and configurations (#272)
- Add Jericho environment and its related configurations (#307)
Algorithm
- Add Harmony Dream loss balancing in MuZero (#242)
- Adapt AlphaZero for non-zero-sum games (#245)
- Add AlphaZero CTree unit tests (#306)
- Add recent MCTS-related papers (#324)
- Introduce RoPE to use the true timestep index as pos_index (#266)
- Add Jericho DDP configuration (#337)
Enhancement
- Add LightZero Sphinx documentation (#237)
- Add Wandb support (#294)
- Add Atari100k metric utilities (#295)
- Add eval_benchmark tests (#296)
- Include save_replay and collect_episode_data options in Jericho (#333)
- Add an MCTS TicTacToe demo in a single file (#315)
Polish
- Polish efficiency and performance on Atari and DMC (#292)
- Update requirements (#298)
- Optimize reward/value/policy_head_hidden_channels (#314)
- Update configuration and log instructions in tutorials (#330)
Fix
- Fix DownSample issues for different observation shapes (#254)
- Fix the wrong chance values in Stochastic MuZero (#275)
- Use display_frames_as_gif in CartPole (#288)
- Fix the chance encoder in stochastic_muzero_model_mlp.py (#284)
- Correct typo in model/utils.py (#290)
- Fix SMZ compile_args and num_simulations bug in world_model (#297)
- Fix reward type bug in 2048 and OS import issue in CartPole (#304)
- Switch to macos-13 in GitHub Actions (#319)
- Fix SMZ & SEZ config for pixel-based DMC (#322)
- Fix update_per_collect in DDP setting (#321)
- Fix bug with obs_shape tuple in initialize_zeros_batch (#327)
- Fix prepare_obs_stack_for_unizero issue (#328)
- Fix random_policy when len(ready_env_id) < collector_env_num (#335)
- Fix timestep compatibility issues (#339)
Full Changelog: v0.1.0...v0.2.0
Contributors: @ruiheng123 @TuTuHuss @HarryXuancy @ShivamKumar2002 @Roland0511 @cmarlin @xiongjyu @PaParaZz1 @puyuan1996