Runze Liu (RyanLiu112)

Incoming Ph.D. @ HKU & Master's student @ THU

compute-optimal-tts Public

Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".

Python 267 21

GenPRM Public

Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".

Python 79 2

Awesome-Process-Reward-Models Public

A comprehensive collection of process reward models.

95 1

wizard-III/ArcherCodeR Public

ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning.

Python 5

MRN Public

[NeurIPS 2022] Official codebase for "Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning".

Python 23 5

ChangWinde/RAT Public

[AAAI 2025 Oral] Official code for "RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors".

Python 14