CPQL: Peng's Q(λ) for Conservative Value Estimation in Offline Reinforcement Learning

🧵 This paper introduces CPQL (Conservative Peng's Q($\lambda$)), which mitigates overly pessimistic value estimation, achieves performance greater than or equal to that of the behavior policy, and provides near-optimal performance guarantees. This codebase is heavily inspired by CORL, an offline RL codebase.
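
For readers unfamiliar with the underlying return, here is a minimal sketch of the classic Peng's Q($\lambda$) target recursion, $G_t = r_t + \gamma\,[(1-\lambda)\max_a Q(s_{t+1}, a) + \lambda\, G_{t+1}]$, computed backward along a single trajectory. It is not the implementation in `algorithms/cpql.py` and covers only the multi-step target, not the conservative part of CPQL. The function name, its arguments, and the default values of `gamma` and `lam` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Sequence


def peng_q_lambda_targets(
    rewards: Sequence[float],
    next_max_q: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.7,
) -> List[float]:
    """Backward recursion for Peng's Q(lambda) targets (illustrative only).

    rewards[t]    : reward r_t
    next_max_q[t] : max_a Q(s_{t+1}, a) from some critic (assumed given)
    dones[t]      : True if s_{t+1} is terminal
    """
    n = len(rewards)
    targets = [0.0] * n
    next_target: Optional[float] = None  # G_{t+1}; unknown past the last step
    for t in reversed(range(n)):
        if dones[t]:
            # Terminal transition: nothing to bootstrap from.
            bootstrap = 0.0
        elif next_target is None:
            # Trajectory cut off here: fall back to the one-step bootstrap.
            bootstrap = next_max_q[t]
        else:
            # Mix the greedy one-step value with the longer lambda-return.
            bootstrap = (1.0 - lam) * next_max_q[t] + lam * next_target
        targets[t] = rewards[t] + gamma * bootstrap
        next_target = targets[t]
    return targets
```

With `lam=0` this reduces to the usual one-step Q-learning target, and with `lam=1` it becomes the discounted Monte Carlo return up to the end of the trajectory.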

Getting started

For first-time installation, please follow the installation instructions provided in the CORL GitHub repository.

git clone https://github.com/tinkoff-ai/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt

Training

To train on a D4RL dataset, run:

python algorithms/cpql.py --config configs/cpql/hopper/random_v2.yaml

Citing CPQL

If you use this code in your work, please cite it with the following BibTeX entry:

@inproceedings{kim2026peng,
  title={Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning},
  author={Kim, Byeongchan and Oh, Min-hwan},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
