Hsiang-1/MiniOpt

MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources

Ke Zhao1,* · Zixiang Di1,* · Hong Qian1,† · Xiang Shu2 · Yaolin Wen1 · Qitao Shi2 · Bingdong Li1
Xingyu Lu2 · Xiangfeng Wang1 · Jun Zhou2 · Ke Tang3 · Yang Yu4

*Equal contribution. †Corresponding author.

1East China Normal University | 2Ant Group | 3Southern University of Science and Technology | 4Nanjing University

Framework

MiniOpt is an end-to-end optimization solving paradigm based on reinforcement learning with verifiable reward (RLVR). It enables small language models (1.5B-14B parameters) to achieve state-of-the-art performance in solving optimization problems from natural language descriptions, significantly reducing computational costs while maintaining competitive accuracy.
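
To make the idea of a verifiable reward concrete, here is a minimal sketch of an outcome check that compares the objective value obtained from a model's generated program with a reference value. It is an illustration only and is not the reward implemented in `rl/opt_reward.py`; the function name `objective_reward`, the binary 0/1 scoring, and the tolerance values are all assumptions.

```python
import math
from typing import Optional


def objective_reward(
    predicted: Optional[float],
    reference: float,
    rel_tol: float = 1e-4,  # assumed tolerance, not taken from the paper
    abs_tol: float = 1e-6,  # assumed tolerance, not taken from the paper
) -> float:
    """Return 1.0 if the solved objective matches the reference, else 0.0.

    `predicted` is None when the generated code failed to produce a value.
    """
    if predicted is None or math.isnan(predicted):
        return 0.0
    matches = math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=abs_tol)
    return 1.0 if matches else 0.0
```

Because the check depends only on the numeric outcome, it can be computed automatically for every rollout without a learned reward model, which is the property RLVR relies on.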

📊 Performance

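Averaged across 8 optimization benchmarks, MiniOpt-7B achieves the highest SA (64.76%) and ER (91.17%) among all compared models and methods, and MiniOpt-3B (59.65% SA, 87.92% ER) is close to the much larger DeepSeek-V3 and LLMOPT-14B.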

| Category | Model / Method | SA Avg. (%) | ER Avg. (%) |
|---|---|---|---|
| General Models | Qwen2.5-3B-Instruct | 11.23 | 16.57 |
| | Qwen2.5-7B-Instruct | 33.20 | 41.86 |
| | Qwen2.5-14B-Instruct | 47.46 | 60.64 |
| | DeepSeek-V3 | 60.14 | 81.86 |
| General Models (Thinking) | Qwen3-4B | 11.16 | 14.02 |
| | Qwen3-8B | 21.79 | 25.43 |
| | Qwen3-14B | 23.78 | 30.04 |
| | DeepSeek-R1 | 60.85 | 82.24 |
| | Gemini-2.5-Pro | 57.39 | 88.87 |
| | GPT-5 | 57.54 | 84.73 |
| Prompt-based Methods | Chain-of-Experts | 45.78 | 60.33 |
| | OptiMUS | 20.65 | 49.43 |
| | Reflexion | 45.54 | 78.28 |
| Learning-based Models | Step-OPT-Qwen2.5-3B | 39.76 | 54.65 |
| | Step-OPT-Qwen2.5-7B | 52.22 | 69.76 |
| | OptMATH-7B | 54.62 | 83.39 |
| | LLMOPT-14B | 60.10 | 89.75 |
| Ours | MiniOpt-3B | 59.65 | 87.92 |
| | MiniOpt-7B | 64.76 | 91.17 |

🛠️ Installation

Prerequisites

  • Python 3.10+

  • Conda package manager

Setup

# Clone the repository
# git clone https://github.com/xxxxx/MiniOpt.git
# cd MiniOpt

# Create a conda environment
conda create -n MiniOpt python=3.10 -y
conda activate MiniOpt

# Install the required packages
bash init.sh

🔍 File System

.
├── init.sh
├── README.md
├── datasets
│   ├── rl_dataset
│   │   └── example.parquet
│   └── sft_dataset
│       └── example.jsonl
├── inference
│   └── inference.py
├── prompts
│   ├── code_conversion.py
│   ├── question_scenario_labeling.py
│   ├── question_type_labeling.py
│   └── rl_prompt.py
├── rl
│   ├── configs
│   │   ├── rl_example.sh
│   │   ├── rl_phase1.sh
│   │   └── rl_phase2.sh
│   ├── opt_reward.py
│   ├── pyomo_executor.py
│   └── rl.sh
└── sft
    ├── configs
    │   ├── merge_config.yaml
    │   └── sft_config.yaml
    ├── data
    │   └── dataset_info.json
    └── sft.sh
  • datasets: Examples of the SFT and RL training datasets.
  • inference: An example of using the fine-tuned model to solve an optimization problem.
  • prompts: All the prompts used and mentioned in our paper.
  • rl: Contains the reward function (opt_reward.py) and the executor for generated Pyomo code (pyomo_executor.py). The configs folder holds the configuration files for the 2-stage RL training and a configuration example. rl.sh shows how to use these scripts.
  • sft: Contains the code for SFT based on LLaMAFactory, including the dataset configuration (./sft/data/dataset_info.json), the fine-tuning configuration (./sft/configs/sft_config.yaml), and the post-training model merge configuration (./sft/configs/merge_config.yaml). sft.sh shows how to use these files.
  • init.sh: Sets up the environment.
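
As a rough illustration of what executing generated Pyomo code involves, the sketch below runs a candidate script in a separate Python process with a timeout and captures its output. It is not the code in `rl/pyomo_executor.py`; the function name `run_generated_code`, the 30-second default timeout, and the convention of reading the result from standard output are assumptions made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def run_generated_code(code: str, timeout_s: float = 30.0) -> Tuple[bool, str]:
    """Run `code` in a fresh interpreter; return (success, stdout-or-error)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"  # hypothetical file name
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,  # keep any files the script writes inside the temp dir
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()
```

Running each candidate in its own process means that a crash or an infinite loop in the generated code cannot take down the training loop. A subprocess alone is not a security sandbox, so untrusted code still needs stronger isolation.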

🚦 Usage

SFT Warm-up

  1. Prepare the SFT training dataset. An example of the expected format is in ./datasets/sft_dataset/example.jsonl.
  2. Configure the dataset. An example LLaMAFactory dataset configuration is in ./sft/data/dataset_info.json.
  3. Run SFT and merge the LoRA model. The hyperparameters used for the SFT warm-up in our paper are listed in ./sft/configs/sft_config.yaml.
cd sft
bash sft.sh
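
Before launching a run, it can help to check that every line of the SFT JSONL file parses and carries the expected fields. The sketch below is a generic validator, not part of this repository. The default key names `instruction` and `output` are an assumption (an Alpaca-style layout); replace them with the keys actually used in ./datasets/sft_dataset/example.jsonl.

```python
import json
from typing import Iterable, List, Sequence, Tuple


def find_bad_records(
    lines: Iterable[str],
    required: Sequence[str] = ("instruction", "output"),  # assumed key names
) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for malformed JSONL records."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        missing = [key for key in required if key not in record]
        if missing:
            problems.append((lineno, "missing keys: " + ", ".join(missing)))
    return problems
```

A file object can be passed directly, for example `find_bad_records(open(path, encoding="utf-8"))`; an empty list means no problems were found.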

RL Training

  1. Prepare your RL training and evaluation datasets. MiniOpt uses a 2-stage RL training approach: Paradigm Acquisition (phase 1) and Optimization Generalization (phase 2). The two stages use different training data, but the format and attributes are the same. An example of the RL training dataset format is in ./datasets/rl_dataset/example.parquet.
  2. Run RL training. The full training parameters for the two stages are listed in ./rl/configs/rl_phase1.sh and ./rl/configs/rl_phase2.sh.
cd rl
bash rl.sh

Inference

Run python ./inference/inference.py to perform inference. The script shows the system prompt used for inference and runs the model on the first case of the nl4opt_test benchmark.
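
After the model responds, the generated solver code usually has to be separated from the surrounding reasoning before it can be executed. The sketch below shows one way to do that, under the assumption that the model wraps its code in a Markdown code fence. That output format, and the helper name `extract_last_code_block`, are hypothetical and not taken from inference.py.

```python
import re
from typing import Optional

# Built programmatically so this example can itself be shown inside a fence.
FENCE = "`" * 3

# Matches an optional "python" tag, then captures everything up to the next fence.
_CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_last_code_block(response: str) -> Optional[str]:
    """Return the last fenced code block in `response`, or None if absent."""
    matches = _CODE_BLOCK.findall(response)
    return matches[-1].strip() if matches else None
```

Taking the last block rather than the first is a design choice for reasoning-style outputs, where earlier blocks may be drafts that the model later revises.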

💭 Citation

If you find this repository useful in your research, please cite:

@misc{zhao2026minioptreasoningmodelsolve,
      title={MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources}, 
      author={Ke Zhao and Zixiang Di and Hong Qian and Xiang Shu and Yaolin Wen and Qitao Shi and Bingdong Li and Xingyu Lu and Xiangfeng Wang and Jun Zhou and Ke Tang and Yang Yu},
      year={2026},
      eprint={2606.25832},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.25832}, 
}
