Aneesh C. Muppidi

I am an incoming Stanford CS PhD student and currently a Rhodes Scholar at Oxford University, co-advised by Jakob Foerster and Joao Henriques.

Email / Scholar / Follow on X / GitHub

Open Source

Policy Iteration via Search Distillation: A clean JAX/XLA implementation that combines policy iteration with Gumbel MuZero search and distills the result into a fast neural policy; beats PPO.
NNX-Control: High-performance JAX control environments with end-to-end PPO training in a single file.

Research

Finding the Time to Think in Real-Time RL
Aneesh Muppidi*, Firas Darwish*, Dylan Cope, Joao F. Henriques, Jakob Nicolaus Foerster
* equal contribution
Preprint, 2026
project page / code / paper

Run RL in real time and choose how long to think.

Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
Katrina Brown*, Aneesh Muppidi*, Rana Shahout
* equal contribution
ICML ES-FoMo, 2025
project page / code / arXiv

Train lightweight predictors to estimate reasoning requirements before generation, then greedily allocate tokens where they matter most.

Parameter-free Optimization for Reinforcement Learning
Aneesh Muppidi, Zhiyu Zhang, Hank Yang
NeurIPS, 2024
project page / code / arXiv

Mitigate plasticity loss, accelerate forward transfer, and avoid policy collapse with just one line of code.

RL Projects

Let's Learn Agency: Learning Emergent Agent and Non-Agent Trajectory Representations
MIT 6.8200 with Pulkit Agrawal
Project Report / Video Presentation

Generating Suboptimal Expert Demonstrations with Large Language Models
MIT 6.4212 with Russ Tedrake, project advised by Lirui Wang
Project Report / Video Presentation

Rapid Learning Mechanisms and Neural Representations in Reinforcement Learning
Harvard PSY 2350R with Sam Gershman, project advised by Jay Henning
Project Report / Code / Research Notebook

Diffusion Policy for Classical Control Problems
Harvard ES158 with Heng Yang
Project Report / Code / Slides

Visualizing Collaborative Multi-Agent Reinforcement Learning
Harvard CS271 with Johanna Beyer and Hanspeter Pfister
Project Report / Slides