ShimBoi/README.md

Jay Shim

ML Researcher · Continual Learning · Post-Training · VLA Models

LinkedIn · Google Scholar · arXiv · Blog · Email


I'm an undergrad researcher in the LARG Lab at UT Austin (Turing Scholar, CS Honors), advised by Prof. Peter Stone. I study how large models retain previously learned capabilities during sequential fine-tuning — a problem at the intersection of continual learning, reinforcement learning, and safe deployment.


Research

Simple Recipe Works: VLAs are Natural Continual Learners with RL

Jiaheng Hu*, Jay Shim*, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin

Paper · Code

We show that simple sequential LoRA fine-tuning with RL avoids catastrophic forgetting in VLAs, matching or outperforming dedicated continual learning methods (EWC, Experience Replay, Weight Merge). Built a distributed training framework extending RLinf with custom post-training techniques and efficient dataloaders. Scaled JAX training infrastructure by 1000x via XLA profiling and JIT compilation redesign.
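For readers unfamiliar with LoRA, the sketch below illustrates the general idea behind a low-rank adapter: the base weight `W` stays frozen and only the small matrices `A` and `B` are trained, so each new task adds the update `(alpha / r) * B @ A` on top of `W`. This is a generic illustration in plain Python, not the training code from the paper; the function names, shapes, and numbers are assumptions chosen for readability, and the RL update that would actually train `A` and `B` is omitted.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """Return W x + (alpha / r) * B (A x), where r is the adapter rank.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    base = matvec(w, x)              # output of the frozen model
    delta = matvec(b, matvec(a, x))  # low-rank correction
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]


# Hypothetical toy values: a 2x2 identity base weight and a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]
x = [2.0, 3.0]

# With B initialised to zero the adapter is a no-op, so the base behaviour is kept.
B_init = [[0.0], [0.0]]
out_before = lora_forward(W, A, B_init, x, alpha=1.0)

# After (hypothetical) training, B is non-zero and shifts the output.
B_trained = [[1.0], [0.0]]
out_after = lora_forward(W, A, B_trained, x, alpha=1.0)

print(out_before, out_after)
```

Because `W` itself is never modified, the adapter only perturbs the base model through a low-rank term.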

Contrastive Decoding for Improved CoT Reasoning in LLMs

Jay Shim et al. · SoCal NLP Symposium 2024

Zero-shot inference-time decoding method achieving ~6% improvement on reasoning benchmarks (GSM8K, HotpotQA, CommonsenseQA) with Mistral-7B and Phi-1.5.
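The summary above does not spell out the scoring rule, so the following is only a sketch of one common formulation of contrastive decoding, which may differ from the method in this work. It assumes an "expert" and an "amateur" next-token distribution, scores each token by the difference of their log-probabilities, and masks tokens the expert itself finds implausible. The token strings, probabilities, and the `alpha` cutoff are made up for illustration.

```python
import math


def contrastive_scores(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Score tokens as expert log-prob minus amateur log-prob.

    Tokens whose expert probability is below alpha times the expert's
    top probability are masked with -inf (a plausibility cutoff).
    """
    cutoff = math.log(alpha) + max(expert_logprobs.values())
    scores = {}
    for token, lp_expert in expert_logprobs.items():
        if lp_expert < cutoff:
            scores[token] = float("-inf")
        else:
            scores[token] = lp_expert - amateur_logprobs[token]
    return scores


def pick_token(scores):
    """Greedy choice over contrastive scores."""
    return max(scores, key=scores.get)


# Hypothetical next-token probabilities for a single decoding step.
expert = {"12": 0.45, "10": 0.53, "banana": 0.02}
amateur = {"12": 0.10, "10": 0.8999, "banana": 0.0001}

expert_lp = {t: math.log(p) for t, p in expert.items()}
amateur_lp = {t: math.log(p) for t, p in amateur.items()}

scores = contrastive_scores(expert_lp, amateur_lp, alpha=0.1)
choice = pick_token(scores)
print(choice)
```

In this toy step, plain greedy decoding on the expert would pick "10", while the contrastive score prefers "12", the token the expert favours far more than the amateur does. The cutoff matters here: without it, "banana" would win purely because the amateur assigns it almost no probability.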


Selected Projects

continual-vla-rl: Continual RL for VLAs with PPO, GRPO, and 5 CRL baselines on LIBERO. 196 commits, open-sourced.
MJX-PureJaxRL: GPU-accelerated RL, 50M env steps in 15 min via MJX + PureJaxRL integration.
AttentionIsAllYouNeed: Full Transformer reimplementation from scratch, 15.34 BLEU on WMT'14 DE→EN.
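To put the MJX-PureJaxRL figure in more familiar units, the quick calculation below converts "50M env steps in 15 min" into an average rate. It is simple arithmetic on the two numbers quoted above; the helper name is hypothetical, and the result says nothing about hardware or peak throughput.

```python
def steps_per_second(total_steps, minutes):
    """Average environment steps per second over the whole run."""
    return total_steps / (minutes * 60)


rate = steps_per_second(50_000_000, 15)
print(f"{rate:,.0f} env steps/sec")  # prints "55,556 env steps/sec"
```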

Tech Stack

Python · JAX · PyTorch · CUDA · C++ · Docker · Slurm · Ray


B.S. Computer Science Honors (Turing Scholar) · UT Austin · GPA 3.97/4.00

Pinned

  1. UT-Austin-RobIn/continual-vla-rl (Public)

    Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

    Python · 59 stars · 1 fork

  2. AttentionIsAllYouNeed (Public)

    Recreating the original encoder-decoder Transformer architecture from scratch

    Python

  3. RLBook2020 (Public)

    My answers to the exercises and code to experiment

    Python