I'm an undergrad researcher in the LARG Lab at UT Austin (Turing Scholar, CS Honors), advised by Prof. Peter Stone. I study how large models retain previously learned capabilities during sequential fine-tuning — a problem at the intersection of continual learning, reinforcement learning, and safe deployment.
Jiaheng Hu*, Jay Shim*, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
We show that simple sequential LoRA fine-tuning with RL avoids catastrophic forgetting in vision-language-action models (VLAs), matching or outperforming dedicated continual learning methods (EWC, Experience Replay, Weight Merge). I built a distributed training framework that extends RLinf with custom post-training techniques and efficient dataloaders. I also scaled JAX training infrastructure 1000x through XLA profiling and a redesign of JIT compilation.
Jay Shim et al. · SoCal NLP Symposium 2024
Zero-shot inference-time decoding method achieving ~6% improvement on reasoning benchmarks (GSM8K, HotpotQA, CommonsenseQA) with Mistral-7B and Phi-1.5.
| Project | Description |
| --- | --- |
| continual-vla-rl | Continual RL for VLAs: PPO, GRPO, and 5 CRL baselines on LIBERO. 196 commits, open-sourced. |
| MJX-PureJaxRL | GPU-accelerated RL: 50M env steps in 15 min via MJX + PureJaxRL integration. |
| AttentionIsAllYouNeed | Full Transformer reimplementation from scratch: 15.34 BLEU on WMT'14 DE→EN. |
B.S. Computer Science Honors (Turing Scholar) · UT Austin · GPA 3.97/4.00

