AllIn is a production-grade artificial intelligence for heads-up No-Limit Texas Hold'em, built on Monte Carlo CFR+ (Counterfactual Regret Minimization), the same family of self-play, regret-minimization algorithms behind championship-level poker bots. It approximates game-theory-optimal (GTO) strategy through millions of iterations of self-play, serves that strategy through a Flask API, and exposes it in an interactive React platform.
- Monte Carlo CFR+ with external sampling: each iteration samples chance and opponent actions, walking one trajectory through the game tree instead of the full exponential tree, which makes millions of training iterations tractable.
- Discounted CFR+ (Linear-CFR-style): time-discounted updates (regret discount α = 1.5, strategy-sum discount γ = 2.0) for faster, more stable convergence toward a Nash equilibrium. (CFR+ with a `((t-1)/t)^α` discount on floored regrets and a separate `t^γ`-weighted average strategy, not the canonical DCFR α/β/γ scheme.)
- Self-play reinforcement learning: no human data and no hand-crafted heuristics; the strategy emerges purely from regret minimization.
- Multi-layer abstraction: a hierarchical state representation built from decoupled 30-fine / 10-coarse equity-based preflop buckets + distribution-aware (potential-aware) postflop buckets (20 flop / 16 turn / 10 river) clustered by Earth Mover's Distance over equity distributions.
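
The discounted update described in the list above can be sketched for a single information set as follows. This is a minimal illustration, not AllIn's actual code: the function names, the array layout, and the order of operations (discount the stored regrets, add the new regrets, then floor at zero) are assumptions. Only the exponents 1.5 and 2.0 and the general shape of the update come from the description.

```python
import numpy as np

ALPHA = 1.5  # regret discount exponent (from the description above)
GAMMA = 2.0  # strategy-sum weight exponent (from the description above)


def regret_matching(regret_sum: np.ndarray) -> np.ndarray:
    """Turn cumulative regrets into a strategy; uniform if none are positive."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regret_sum), 1.0 / len(regret_sum))


def update_info_set(regret_sum, strategy_sum, instant_regret, t):
    """Hypothetical per-info-set update at iteration t (1-based)."""
    strategy = regret_matching(regret_sum)
    discount = ((t - 1) / t) ** ALPHA
    # Assumed order: discount old regrets, add new ones, floor at 0 (CFR+).
    new_regret = np.maximum(regret_sum * discount + instant_regret, 0.0)
    # The average strategy weights iteration t by t^GAMMA.
    new_strategy_sum = strategy_sum + (t ** GAMMA) * strategy
    return new_regret, new_strategy_sum


def average_strategy(strategy_sum: np.ndarray) -> np.ndarray:
    """Normalize the weighted strategy sum into the blueprint strategy."""
    total = strategy_sum.sum()
    if total > 0.0:
        return strategy_sum / total
    return np.full(len(strategy_sum), 1.0 / len(strategy_sum))
```

Because later iterations receive larger weights, early (noisier) strategies contribute less to the averaged blueprint.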
Served blueprint (capped run, 25M-iteration snapshot; see Deployment):
├── Algorithm: Monte Carlo CFR+ with external sampling + Linear-CFR-style discount
├── Training iterations: 25,550,000 (the least-exploitable snapshot; served as blueprint_final.db)
├── Info sets: 128,177 (trained situations stored)
├── Game: Heads-up NLHE, 100 BB effective stacks (SB 1 / BB 2)
└── Storage: SQLite (incremental checkpoint + resume)
Training Pipeline:
Random self-play deal → Monte Carlo CFR+ traversal → regret/strategy update →
SQLite checkpoint → automatic active-blueprint selection → API inference
Core technologies (actually used):
- NumPy for vectorized numerical computing (regret matching, the exploitability evaluator)
- phevaluator: high-performance C hand-strength library
- SQLite (WAL mode) for incremental, resumable strategy storage
- Flask REST API · React + Vite frontend
- Hypothesis property-based testing for engine correctness
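
As an illustration of incremental, resumable storage on SQLite in WAL mode, here is a minimal sketch using only the standard library. The table names (`meta`, `info_sets`), the JSON serialization, and the function names are hypothetical and are not taken from AllIn's schema.

```python
import json
import sqlite3


def open_blueprint(path: str) -> sqlite3.Connection:
    """Open (or create) a blueprint database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    # WAL lets readers (e.g. an API process) query while a trainer writes.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS info_sets ("
        "key TEXT PRIMARY KEY, regrets TEXT, strategy_sum TEXT)"
    )
    conn.commit()
    return conn


def checkpoint(conn: sqlite3.Connection, iteration: int, tables: dict) -> None:
    """Write info sets and the iteration counter in one transaction."""
    rows = [(k, json.dumps(r), json.dumps(s)) for k, (r, s) in tables.items()]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO info_sets VALUES (?, ?, ?)", rows)
        conn.execute(
            "INSERT OR REPLACE INTO meta VALUES ('iterations', ?)", (iteration,)
        )


def last_iteration(conn: sqlite3.Connection) -> int:
    """Return the iteration to resume from (0 for a fresh database)."""
    row = conn.execute("SELECT value FROM meta WHERE key = 'iterations'").fetchone()
    return row[0] if row else 0
```

Writing the info sets and the iteration counter in the same transaction means a crash mid-checkpoint cannot leave the counter ahead of the stored data.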
- External sampling turns a full game-tree traversal into a single sampled path per iteration, which is the key to scaling to millions of iterations.
- CFR+ regret flooring (clamping cumulative regrets at 0) accelerates convergence over vanilla CFR.
- Position-aware information sets learn in-position and out-of-position play separately.
- Stack-aware game engine models real chip costs, all-ins, and side-stack constraints rather than a toy abstraction.
- Exploitability evaluator measures how far the blueprint is from unexploitable (best-response, in milli-big-blinds/hand) so convergence is measured, not assumed.
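
For readers unfamiliar with the unit, the sketch below shows how a chip amount converts to milli-big-blinds per hand with the 2-chip big blind stated above. The averaging over both seats is a common convention and an assumption here, not a description of how `tests/run_evaluation.py` computes its number. The function names are hypothetical.

```python
BIG_BLIND_CHIPS = 2.0  # from the SB 1 / BB 2 blind structure


def to_mbb_per_hand(chips_per_hand: float, big_blind: float = BIG_BLIND_CHIPS) -> float:
    """Convert a per-hand chip amount to milli-big-blinds per hand."""
    return chips_per_hand / big_blind * 1000.0


def exploitability_mbb(br_gain_vs_sb: float, br_gain_vs_bb: float,
                       big_blind: float = BIG_BLIND_CHIPS) -> float:
    """Average the best responder's gain against each seat (assumed convention)."""
    return to_mbb_per_hand((br_gain_vs_sb + br_gain_vs_bb) / 2.0, big_blind)
```

For example, a best response that wins 0.5 chips per hand on average corresponds to 250 mbb/hand at these blinds.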
- Python 3.12: core development language
- NumPy: vectorized regret matching and best-response evaluation
- phevaluator: O(1) hand evaluation via precomputed tables
- SQLite: blueprint persistence with checkpoint/resume + read-while-writing
- Monte Carlo CFR+ with external sampling and Linear-CFR-style discounting
- Nash-equilibrium approximation through iterative self-play
- Feature engineering: equity-based card bucketing, action abstraction, and position-aware information-set keys
- Flask API: strategy lookup + live game endpoints
- React + Vite frontend: strategy explorer and play-vs-bot table
- PyPokerEngine: used in the test harness for bot-vs-bot simulation
- phevaluator: fast showdown evaluation
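
To make the equity-based card bucketing concrete, here is a small sketch of the Earth Mover's Distance between two equity histograms and a nearest-centroid bucket assignment. It assumes both histograms use the same equal-width equity bins and measures distance in units of one bin. The function names and the nearest-centroid step are illustrative assumptions, not AllIn's clustering code.

```python
import numpy as np


def emd_1d(hist_a, hist_b) -> float:
    """EMD between two histograms over the same equal-width bins.

    In one dimension the EMD equals the summed absolute difference of the
    two cumulative distributions (here in units of one bin width).
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.sum()  # normalize so both histograms carry the same mass
    b = b / b.sum()
    return float(np.abs(np.cumsum(a - b)).sum())


def assign_bucket(hist, centroids) -> int:
    """Return the index of the centroid histogram closest in EMD."""
    distances = [emd_1d(hist, c) for c in centroids]
    return int(np.argmin(distances))
```

Unlike a comparison of mean equity alone, this distance separates a hand that is usually mediocre from one that is sometimes very strong and sometimes very weak, which is the point of a potential-aware abstraction.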
- Fast inference: direct blueprint lookup from SQLite, no per-decision search.
- Distribution-aware abstractions: 30-fine/10-coarse decoupled preflop + 20/16/10 potential-aware postflop buckets (EMD-clustered equity distributions).
- Mixed-strategy output: probability distributions over fold / call / bet / raise / all-in, sampled at play time.
- Honest "unknown" handling: situations never reached in training report `found: false` rather than guessing.
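
The lookup-then-sample behaviour described in the list above might look like the sketch below. The in-memory dictionary stands in for the SQLite blueprint, and the response shape beyond the `found` flag, the key string, and the function names are all hypothetical.

```python
import random


def lookup(blueprint: dict, key: str) -> dict:
    """Return the stored mixed strategy, or found=False for unseen spots."""
    strategy = blueprint.get(key)
    if strategy is None:
        return {"found": False}  # never reached in training: do not guess
    return {"found": True, "strategy": strategy}


def sample_action(strategy: dict, rng: random.Random) -> str:
    """Draw one action according to the strategy's probabilities."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Usage with a made-up key and made-up probabilities:
blueprint = {"example-key": {"fold": 0.1, "call": 0.6, "raise": 0.3}}
result = lookup(blueprint, "example-key")
if result["found"]:
    action = sample_action(result["strategy"], random.Random(0))
```

Sampling at play time, rather than always taking the most likely action, is what keeps a mixed strategy from becoming predictable.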
- Strategy Explorer, for looking up the blueprint's play in any spot:
  - Hand Explorer: enter real cards + a betting line, see the resulting info-set key and strategy.
  - Key Explorer: build an info-set key from abstraction dropdowns (or paste one) and see the strategy.
- Play vs the Bot: an interactive heads-up table against the trained AI, 100 BB deep, with full action and pot tracking.
- Exploitability scoring via a vectorized best-response walk of the public game tree (`tests/run_evaluation.py`).
- Property-based testing (Hypothesis) over the engine's semantic invariants (chip conservation, call/all-in arithmetic, legal-action shape), backed by a documented bug log.
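
To show what an invariant such as chip conservation means in practice, here is a standard-library stand-in for a property-based check. It uses a seeded `random` loop instead of Hypothesis so that it runs without extra dependencies; with Hypothesis, the loop would typically become a test decorated with `@given(st.integers(...))`. The `apply_call` function is a hypothetical toy, not AllIn's engine.

```python
import random


def apply_call(stack: int, to_call: int, pot: int):
    """Toy call logic: a short stack calls all-in for less."""
    paid = min(stack, to_call)
    return stack - paid, pot + paid, paid


def check_call_invariants(trials: int = 1000, seed: int = 0) -> int:
    """Check the invariants on many random inputs; return the trial count."""
    rng = random.Random(seed)
    for _ in range(trials):
        stack = rng.randint(0, 200)
        to_call = rng.randint(0, 400)
        pot = rng.randint(0, 400)
        new_stack, new_pot, paid = apply_call(stack, to_call, pot)
        # Chip conservation: no chips are created or destroyed.
        assert new_stack + new_pot == stack + pot
        # Call/all-in arithmetic: never pay more than the stack or the bet.
        assert 0 <= paid <= min(stack, to_call)
        assert new_stack >= 0
    return trials
```

The value of this style is that the assertions describe what must hold for every input, so edge cases such as a zero stack or an oversized bet are exercised without being listed by hand.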
- Python 3.12
- Node.js 18+ (frontend)
- Git
git clone https://github.com/jianrontan/AllIn.git
cd AllIn

# Install Python dependencies
cd backend
pip install -r requirements.txt
# Start the inference API (must run from backend/api/)
cd api
python strategy_api.py # http://localhost:5000

cd frontend
npm install
npm run dev # http://localhost:5173

cd backend/bot
# Quick smoke run (seconds)
python -c "from tests.run_blueprint_trainer import run_training; run_training(100)"
# A real run: checkpoints as it goes; resume any time with resume='<db>.db'
python -c "from tests.run_blueprint_trainer import run_training; run_training(5000000)"

Training writes a timestamped `backend/bot/analysis/blueprints/blueprint_*.db`. The API and bot automatically use the blueprint with the most iterations, so there is no manual promotion step.
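
One way such automatic selection could work is sketched below. It assumes each blueprint file records its iteration count in a hypothetical `meta` table. How AllIn actually stores and reads that count is not described here, so treat the schema and function names as placeholders.

```python
import glob
import os
import sqlite3


def iterations_in(path: str) -> int:
    """Read the iteration count from a hypothetical meta table (0 if absent)."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT value FROM meta WHERE key = 'iterations'"
        ).fetchone()
    except sqlite3.OperationalError:
        row = None  # the file has no meta table
    finally:
        conn.close()
    return row[0] if row else 0


def pick_active_blueprint(directory: str):
    """Return the blueprint_*.db path with the most iterations, or None."""
    candidates = glob.glob(os.path.join(directory, "blueprint_*.db"))
    if not candidates:
        return None
    return max(candidates, key=iterations_in)
```

Selecting by recorded iterations rather than by file timestamp means a resumed older run can still win if it has trained longer.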
- Open the frontend at http://localhost:5173.
- Strategy Explorer: enter a hand + betting line (or build an info-set key) and get the GTO strategy with probabilities.
- Play vs the Bot: play heads-up against the AI and watch how it responds.
cd backend/bot
python tests/run_evaluation.py --samples 1000 # exploitability in mbb/hand (lower = better)

- ✅ Blueprint training: Monte Carlo CFR+ with SQLite checkpoint/resume
- ✅ Serving + Play-vs-bot: Flask API + React platform
- ✅ Exploitability evaluation: best-response convergence scoreboard
- ✅ Hand-level Bayesian range tracker: opponent-range belief with confidence
- ✅ River subgame solving: real-time re-solving of the river with full pot/stack information and the live range (the shippable real-time-solving feature)
- 🚧 Turn/flop depth-limited solving: built and validated in the lab, but shelved. It lowered exploitability yet did not beat the blueprint in real games (a cross-street consistency problem needing continual re-solving). See ROADMAP.
- 📋 Online 1v1 play on AWS: DynamoDB session store, Cloudflare Pages frontend, +EV leaderboard (unrestricted human bet sizing already shipped)
See docs/ROADMAP.md for detail, and docs/DEVELOPER_GUIDE.md for the architecture.
| Doc | Purpose |
|---|---|
| USER_GUIDE.md | Install, train, run, play, evaluate |
| docs/DEVELOPER_GUIDE.md | Architecture and module reference |
| docs/ROADMAP.md | Phase status and what's next |
| docs/TRAININGFLOW.md | One CFR+ iteration, end to end |
| CLAUDE.md | Canonical short reference for contributors |
| backend/bot/docs/BUG_LOG.md | Correctness bug history |