Aneesh Muppidi

I am an incoming Stanford CS PhD student. I am currently a Rhodes Scholar at the University of Oxford, where I am co-advised by Jakob Foerster and Joao Henriques.

Email / Scholar / Follow on X / GitHub



Open Source

  • Policy Iteration via Search Distillation — A clean JAX/XLA implementation that combines Policy Iteration with Gumbel MuZero search and distills the search into a fast neural policy; it outperforms PPO.
  • NNX-Control — High-performance JAX control environments with end-to-end PPO training in a single file.

Research

Learning Planning Budgets in Real-Time RL
Aneesh Muppidi*, Firas Darwish*, Dylan Cope, Joao F. Henriques, Jakob Nicolaus Foerster
* equal contribution
OpenReview, 2026
project page / code / paper

Run RL in real time and choose how long to think.

Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
Katrina Brown*, Aneesh Muppidi*, Rana Shahout
* equal contribution
ICML ES-FoMo, 2025
project page / code / arXiv

Train lightweight predictors to estimate reasoning requirements before generation, then greedily allocate tokens where they matter most.

Parameter-free Optimization for Reinforcement Learning
Aneesh Muppidi, Zhiyu Zhang, Hank Yang
NeurIPS, 2024
project page / code / arXiv

Mitigate plasticity loss, accelerate forward transfer, and avoid policy collapse with just one line of code.

RL Projects

Let's Learn Agency: Learning Emergent Agent and Non-Agent Trajectory Representations
MIT 6.8200 with Pulkit Agrawal
Project Report, Video Presentation

Generating Suboptimal Expert Demonstrations with Large Language Models
MIT 6.4212 with Russ Tedrake, project advised by Lirui Wang
Project Report, Video Presentation

Rapid Learning Mechanisms and Neural Representations in Reinforcement Learning
Harvard PSY 2350R with Sam Gershman, project advised by Jay Henning
Project Report, Code, Research Notebook

Diffusion Policy for Classical Control Problems
Harvard ES158 with Heng Yang
Project Report, Code, Slides

Visualizing Collaborative Multi-Agent Reinforcement Learning
Harvard CS271 with Johanna Beyer and Hanspeter Pfister
Project Report, Slides

Website template adapted from Jon Barron's.