- [2026.05.01] ETS is accepted as a poster at ICML 2026.
- [2026.01.29] We have uploaded ETS paper to arXiv, and open-sourced ETS on Github!
- [2026.05.15] Support vllm for ETS and AIME24 avg@32 evaluation for all methods.
We introduce ETS (Energy-Guided Test-Time Scaling), a training-free inference method that samples directly from the optimal RL policy under a unified Masked Language Modeling (MLM) framework that covers both:
- Autoregressive Models (ARMs)
- Diffusion Language Models (DLMs)
Core idea:
For RL objective, the optimal policy admits a closed-form structure. ETS leverages this to construct an optimal transition kernel that factorizes into:
- a reference transition given by a base model
$p_{\mathrm{ref}}$ , and - an energy term that is a conditional expectation of exponentiated rewards.
Run the following script to setup environment.
git clone https://github.com/sheriyuo/ETS.git
cd ETS
pip install -e .ETS compute is dominated by three hyperparameters:
-
$M$ : number of candidates per guidance step -
$K$ : number of Monte Carlo estimation,$K=3$ works best in most cases -
$I$ : number of guidance steps
For evaluating autoregressive models (Qwen), the ETS compute parameters map to:
-
$M$ :m_candidates -
$K$ :k_monte_carlo -
$I$ is implicit . It is determined by the total decoding length and block granularity:-
max_length= total generation length$d_x$ -
block_size= block length$B$ - so
$I = \lceil \mathrm{max_length} / \mathrm{block_size} \rceil$
-
For evaluating diffusion language models (LLaDA), the mapping is explicit:
-
$I$ :guide_steps -
$M$ :num_candidates -
$K$ :monte_carlo_num
We evaluate in a pass@1 setting on:
-
Math/Reasoning: MATH500, GSM8K
-
Coding: HumanEval
-
STEM: GPQA (Diamond)
Evaluate ETS with transformers:
cd qwen
bash eval.shEvaluate ETS with vllm:
cd qwen_vllm
bash eval.shcd llada
bash eval.sh- Download BytedTsinghua-SIA/AIME-2024 to a local directory and update the data path in
aime24/aime24.yamlto point to your local dataset. - Replace the existing
utils.pyandaime24.yamlinlm_eval/tasks/aime/. For example:
rm -rf /usr/local/miniconda3/lib/python3.10/site-packages/lm_eval/tasks/aime/aime24.yaml
rm -rf /usr/local/miniconda3/lib/python3.10/site-packages/lm_eval/tasks/aime/utils.py
cp aime24/aime24.yaml /usr/local/miniconda3/lib/python3.10/site-packages/lm_eval/tasks/aime/aime24.yaml
cp aime24/utils.py /usr/local/miniconda3/lib/python3.10/site-packages/lm_eval/tasks/aime/utils.py- Execute
eval_aime.sh.
@inproceedings{li2026ets,
title={ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment},
author={Li, Xiuyu and Zhang, Jinkai and Yi, Mingyang and Li, Yu and Wang, Longqiang and Wang, Yue and Fan, Ju},
booktitle={Forty-third International Conference on Machine Learning},
year={2026}
}