GitHub - Hsiang-1/MiniOpt: Code for MiniOpt paper.

MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources

Ke Zhao¹,* · Zixiang Di¹,* · Hong Qian¹,† · Xiang Shu² · Yaolin Wen¹ · Qitao Shi² · Bingdong Li¹
Xingyu Lu² · Xiangfeng Wang¹ · Jun Zhou² · Ke Tang³ · Yang Yu⁴

*Equal Contribution, †Corresponding Authors.

¹East China Normal University | ²Ant Group | ³Southern University of Science and Technology | ⁴Nanjing University

MiniOpt is an end-to-end optimization solving paradigm based on reinforcement learning with verifiable reward (RLVR). It enables small language models (1.5B-14B parameters) to achieve state-of-the-art performance in solving optimization problems from natural language descriptions, significantly reducing computational costs while maintaining competitive accuracy.

📊 Performance

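Averaged over 8 optimization benchmarks, MiniOpt-7B obtains the highest SA (64.76%) and ER (91.17%) among the compared models and methods, and MiniOpt-3B comes within half a point of LLMOPT-14B on SA (59.65% vs. 60.10%).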

Category	Model / Method	SA Avg. (%)	ER Avg. (%)
General Models	Qwen2.5-3B-Instruct	11.23	16.57
	Qwen2.5-7B-Instruct	33.20	41.86
	Qwen2.5-14B-Instruct	47.46	60.64
	DeepSeek-V3	60.14	81.86
General Models (Thinking)	Qwen3-4B	11.16	14.02
	Qwen3-8B	21.79	25.43
	Qwen3-14B	23.78	30.04
	DeepSeek-R1	60.85	82.24
	Gemini-2.5-Pro	57.39	88.87
	GPT-5	57.54	84.73
Prompt-based Methods	Chain-of-Experts	45.78	60.33
	OptiMUS	20.65	49.43
	Reflexion	45.54	78.28
Learning-based Models	Step-OPT-Qwen2.5-3B	39.76	54.65
	Step-OPT-Qwen2.5-7B	52.22	69.76
	OptMATH-7B	54.62	83.39
	LLMOPT-14B	60.10	89.75
Ours	MiniOpt-3B	59.65	87.92
	MiniOpt-7B	64.76	91.17

🛠️ Installation

Prerequisites

Python 3.10+
Conda package manager

Setup

# Clone the repository
# git clone https://github.com/xxxxx/MiniOpt.git
# cd MiniOpt

# Create a conda environment
conda create -n MiniOpt python=3.10 -y
conda activate MiniOpt

# Install the required packages
bash init.sh

🔍 File System

.
├── init.sh
├── README.md
├── datasets
│   ├── rl_dataset
│   │   └── example.parquet
│   └── sft_dataset
│       └── example.jsonl
├── inference
│   └── inference.py
├── prompts
│   ├── code_conversion.py
│   ├── question_scenario_labeling.py
│   ├── question_type_labeling.py
│   └── rl_prompt.py
├── rl
│   ├── configs
│   │   ├── rl_example.sh
│   │   ├── rl_phase1.sh
│   │   └── rl_phase2.sh
│   ├── opt_reward.py
│   ├── pyomo_executor.py
│   └── rl.sh
└── sft
    ├── configs
    │   ├── merge_config.yaml
    │   └── sft_config.yaml
    ├── data
    │   └── dataset_info.json
    └── sft.sh

datasets: Example SFT and RL training datasets.
inference: An example of using the fine-tuned model to solve an optimization problem.
prompts: All prompts used and mentioned in our paper.
rl: The reward function (opt_reward.py) and the executor for generated Pyomo code (pyomo_executor.py). The configs folder holds the configuration files for the 2-stage RL training plus a configuration example. rl.sh shows how to use these scripts.
sft: Code for SFT based on LLaMAFactory, including the dataset configuration (./sft/data/dataset_info.json), the fine-tuning configuration (./sft/configs/sft_config.yaml), and the post-training model merge configuration (./sft/configs/merge_config.yaml). sft.sh shows how to use these files.
init.sh: Sets up the environment by installing the required packages.
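To give a feel for what executing generated solver code involves, here is a minimal sketch that runs a code string in a separate Python process with a timeout and captures its output. It is an illustration only and is not the code in ./rl/pyomo_executor.py; the function name `run_generated_code`, the returned dictionary keys, and the 10-second default timeout are all assumptions made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, timeout_s: float = 10.0) -> dict:
    """Run a code string in a fresh Python process and capture its output.

    Hypothetical helper for illustration; not the repository's executor.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,  # assumed limit; tune for real solver runs
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": "timeout"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout.strip(),
        "stderr": proc.stderr.strip(),
    }


if __name__ == "__main__":
    # A stand-in for model-generated code that prints an objective value.
    print(run_generated_code("print(3 * 40 + 2 * 30)"))
```

Running the candidate in its own process keeps a crash or an infinite loop in the generated code from taking down the caller, which is one common reason to isolate execution this way.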

🚦 Usage

SFT Warm-up

Prepare the SFT training dataset. An example of the SFT training dataset format is in ./datasets/sft_dataset/example.jsonl.
Configure the dataset. An example LLaMAFactory dataset configuration is in ./sft/data/dataset_info.json.
Run SFT and merge the LoRA model. The hyperparameters used for the SFT warm-up in our paper are listed in ./sft/configs/sft_config.yaml.

cd sft
bash sft.sh
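Before launching SFT on your own data, it can help to confirm that the JSONL file parses and that every record carries the same fields as ./datasets/sft_dataset/example.jsonl. The sketch below does that with the standard library only. The helper name `summarize_jsonl` is hypothetical, and the script makes no assumption about which field names the dataset uses; it only reports the ones it finds.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str) -> Counter:
    """Count how often each top-level key appears across a JSONL file's records."""
    key_counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises json.JSONDecodeError if malformed
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            key_counts.update(record.keys())
    return key_counts


if __name__ == "__main__":
    example = Path("./datasets/sft_dataset/example.jsonl")
    if example.exists():
        for key, count in summarize_jsonl(str(example)).items():
            print(f"{key}: present in {count} record(s)")
```

If a key appears in fewer records than the others, some records are probably missing a field that the dataset configuration expects.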

RL Training

Prepare your RL training and evaluation datasets. MiniOpt uses a 2-stage RL training approach: Paradigm Acquisition (phase 1) and Optimization Generalization (phase 2). The two stages use different training data, but the format and attributes are the same. An example of the RL training dataset format is in ./datasets/rl_dataset/example.parquet.
Run RL training. The training parameters for both stages are fully listed in ./rl/configs/rl_phase1.sh and ./rl/configs/rl_phase2.sh.

cd rl
bash rl.sh
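As a rough illustration of what a verifiable reward can look like for optimization problems, the sketch below scores a predicted objective value against a reference value within a tolerance. This is an assumption-laden example, not the reward implemented in ./rl/opt_reward.py: the function name `objective_reward`, the binary 0/1 scoring, and the tolerance defaults are all hypothetical, and the actual reward may include other terms.

```python
import math
from typing import Optional


def objective_reward(
    predicted: Optional[float],
    reference: float,
    rel_tol: float = 1e-4,  # assumed tolerance, not taken from the paper
    abs_tol: float = 1e-6,  # assumed tolerance, not taken from the paper
) -> float:
    """Return 1.0 if the predicted objective matches the reference, else 0.0.

    `predicted` is None when the generated code failed to produce a value.
    """
    if predicted is None or not math.isfinite(predicted):
        return 0.0
    matches = math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=abs_tol)
    return 1.0 if matches else 0.0


if __name__ == "__main__":
    print(objective_reward(180.0, 180.0))  # exact match
    print(objective_reward(179.0, 180.0))  # off by one unit
    print(objective_reward(None, 180.0))   # execution produced no value
```

A reward of this kind can be checked automatically from the solver output, so no learned reward model or human labeling is needed during RL training.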

Inference

Run python ./inference/inference.py to perform inference. This script shows the system prompt used for inference and runs the first case of the nl4opt_test benchmark.


💭 Citation

If you find this repository useful in your research, please cite:

@misc{zhao2026minioptreasoningmodelsolve,
      title={MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources}, 
      author={Ke Zhao and Zixiang Di and Hong Qian and Xiang Shu and Yaolin Wen and Qitao Shi and Bingdong Li and Xingyu Lu and Xiangfeng Wang and Jun Zhou and Ke Tang and Yang Yu},
      year={2026},
      eprint={2606.25832},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.25832}, 
}
