My first-author work was accepted to the NeurIPS main conference (Poster), and to RLC (Spotlight), RSS (Spotlight),
and TTIC workshops. Check out the website and code.
Policy Iteration via Search Distillation — A clean implementation combining Policy Iteration with Gumbel MuZero, distilled into a fast
neural policy. Written in JAX/XLA; beats PPO.
NNX-Control — High-performance JAX control environments with end-to-end PPO training in a single file.
What is the optimal order of training data? Particle filters can be invariant to permutations of the training data, mitigating plasticity loss and catastrophic forgetting.