
jianrontan/AllIn


AllIn: Game-Theory-Optimal Heads-Up Poker AI

AllIn is a production-grade artificial intelligence for heads-up No-Limit Texas Hold'em, built on Monte Carlo CFR+ (Counterfactual Regret Minimization), the same family of self-play, regret-minimization algorithms behind championship-level poker bots. It approximates game-theory-optimal (GTO) strategy through millions of iterations of self-play, serves that strategy through a Flask API, and exposes it in an interactive React platform.


🎯 AI & Machine Learning Overview

🧠 The Intelligence Engine

  • Monte Carlo CFR+ with external sampling: each iteration samples chance and opponent actions, walking one trajectory through the game tree instead of the full exponential tree, which makes millions of training iterations tractable.
  • Discounted CFR+ (Linear-CFR-style): time-discounted updates (regret discount α = 1.5, strategy-sum discount γ = 2.0) for faster, more stable convergence toward a Nash equilibrium. This is CFR+ with a ((t-1)/t)^α discount on floored regrets and a separate t^γ-weighted average strategy, not the canonical DCFR α/β/γ scheme.
  • Self-play reinforcement learning: no human data and no hand-crafted heuristics; the strategy emerges purely from regret minimization.
  • Multi-layer abstraction: a hierarchical state representation built from decoupled 30-fine / 10-coarse equity-based preflop buckets + distribution-aware (potential-aware) postflop buckets (20 flop / 16 turn / 10 river) clustered by Earth Mover's Distance over equity distributions.
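
The discounted CFR+ update described above can be sketched for a single information set as follows. This is a minimal illustration, not the repository's code: the function names and the choice to accumulate the pre-update strategy are assumptions; only α = 1.5, γ = 2.0, the ((t-1)/t)^α regret discount, the zero floor, and the t^γ strategy weight come from the description.

```python
import numpy as np

ALPHA = 1.5  # regret discount exponent (value quoted in this README)
GAMMA = 2.0  # strategy-sum weight exponent (value quoted in this README)


def regret_matching(regrets):
    """Current strategy: positive regrets normalized; uniform if none are positive."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))


def update_info_set(regrets, strategy_sum, instant_regrets, t):
    """One update at iteration t (1-based). Hypothetical helper for illustration."""
    strategy = regret_matching(regrets)
    discount = ((t - 1) / t) ** ALPHA
    # Discount the stored regrets, add this iteration's regrets, floor at zero (CFR+).
    new_regrets = np.maximum(regrets * discount + instant_regrets, 0.0)
    # Weight this iteration's strategy by t^gamma in the running average.
    new_strategy_sum = strategy_sum + (t ** GAMMA) * strategy
    return new_regrets, new_strategy_sum


def average_strategy(strategy_sum):
    """Normalized average strategy; uniform if nothing has been accumulated."""
    total = strategy_sum.sum()
    if total > 0.0:
        return strategy_sum / total
    return np.full(len(strategy_sum), 1.0 / len(strategy_sum))
```

The average strategy, not the current one, is what converges toward equilibrium, which is why the two accumulators are kept separately.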

📊 Trained Blueprint (active model)

Served blueprint (capped run, 25M-iteration snapshot; see Deployment):
├── Algorithm:           Monte Carlo CFR+ with external sampling + Linear-CFR-style discount
├── Training iterations: 25,550,000 (the least-exploitable snapshot; served as blueprint_final.db)
├── Info sets:           128,177 (trained situations stored)
├── Game:                Heads-up NLHE, 100 BB effective stacks (SB 1 / BB 2)
└── Storage:             SQLite (incremental checkpoint + resume)

🔬 Algorithmic Architecture

Training Pipeline:
Random self-play deal → Monte Carlo CFR+ traversal → regret/strategy update →
SQLite checkpoint → automatic active-blueprint selection → API inference

Core technologies (actually used):

  • NumPy for vectorized numerical computing (regret matching, the exploitability evaluator)
  • phevaluator: high-performance C hand-evaluation library
  • SQLite (WAL mode) for incremental, resumable strategy storage
  • Flask REST API · React + Vite frontend
  • Hypothesis property-based testing for engine correctness

🚀 Why CFR+? (Algorithmic Highlights)

  • External sampling turns a full game-tree traversal into a single sampled path per iteration, the key to scaling to millions of iterations.
  • CFR+ regret flooring (clamping cumulative regrets at 0) accelerates convergence over vanilla CFR.
  • Position-aware information sets learn in-position and out-of-position play separately.
  • Stack-aware game engine models real chip costs, all-ins, and side-stack constraints rather than a toy abstraction.
  • Exploitability evaluator measures how far the blueprint is from unexploitable (best-response, in milli-big-blinds/hand) so convergence is measured, not assumed.
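
To make the exploitability metric concrete, here is a toy sketch on a zero-sum matrix game rather than the poker game tree. The function names are hypothetical, and the real evaluator walks the public tree; what carries over is the idea (how much a best response gains against each fixed strategy) and the mbb/hand unit, using the 2-chip big blind quoted above.

```python
import numpy as np

BIG_BLIND = 2.0  # chips; the game described here uses SB 1 / BB 2


def exploitability(payoff, row_strategy, col_strategy):
    """Average best-response gain against each player in a zero-sum matrix game.

    payoff[i, j] is the row player's payoff in chips; the column player
    receives the negative of it.
    """
    # Best the row player can do against the fixed column strategy.
    row_br_value = np.max(payoff @ col_strategy)
    # Best the column player can do against the fixed row strategy.
    col_br_value = np.max(-(row_strategy @ payoff))
    # At an exact equilibrium the two values cancel and the result is zero.
    return (row_br_value + col_br_value) / 2.0


def chips_to_mbb(chips):
    """Convert a per-hand chip amount to milli-big-blinds per hand."""
    return chips / BIG_BLIND * 1000.0


# Rock-paper-scissors with a 1-chip stake.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1.0 / 3.0)
always_rock = np.array([1.0, 0.0, 0.0])

balanced = exploitability(RPS, uniform, uniform)
lopsided = exploitability(RPS, always_rock, uniform)
```

Here `balanced` is zero because uniform play is the equilibrium, while `lopsided` is positive because always playing rock loses a full chip to a best response.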

🛠 Technical Stack

🐍 AI / ML Backend

  • Python 3.12: core development language
  • NumPy: vectorized regret matching and best-response evaluation
  • phevaluator: O(1) hand evaluation via precomputed tables
  • SQLite: blueprint persistence with checkpoint/resume + read-while-writing
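
As an illustration of checkpoint/resume on SQLite in WAL mode, the sketch below uses only the standard-library `sqlite3` module. The table names, column layout, and JSON encoding are assumptions made for the example and are not the repository's actual schema.

```python
import json
import sqlite3


def open_blueprint(path):
    """Open (or create) a blueprint database. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    # WAL lets readers (e.g. an API process) keep reading while a trainer writes.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS info_sets ("
        "key TEXT PRIMARY KEY, regrets TEXT NOT NULL, strategy_sum TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn


def checkpoint(conn, iteration, nodes):
    """Write all info sets and the iteration counter in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO info_sets (key, regrets, strategy_sum) VALUES (?, ?, ?)",
            [(key, json.dumps(r), json.dumps(s)) for key, (r, s) in nodes.items()],
        )
        conn.execute(
            "INSERT OR REPLACE INTO meta (name, value) VALUES ('iterations', ?)",
            (iteration,),
        )


def resume(conn):
    """Load the iteration counter and every stored info set."""
    row = conn.execute("SELECT value FROM meta WHERE name = 'iterations'").fetchone()
    iteration = row[0] if row else 0
    rows = conn.execute("SELECT key, regrets, strategy_sum FROM info_sets")
    nodes = {key: (json.loads(r), json.loads(s)) for key, r, s in rows}
    return iteration, nodes
```

Writing the counter and the info sets in the same transaction means a crash mid-checkpoint cannot leave an iteration count that disagrees with the stored regrets.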

🧮 Algorithms

  • Monte Carlo CFR+ with external sampling and Linear-CFR-style discounting
  • Nash-equilibrium approximation through iterative self-play
  • Feature engineering: equity-based card bucketing, action abstraction, and position-aware information-set keys
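
The distance behind distribution-aware bucketing can be sketched in a few lines. This assumes each hand is summarized as a histogram of its equity over sampled future boards, on equal-width bins; the function names and bin count are illustrative, and the clustering step that would group hands by this distance is not shown.

```python
import numpy as np


def equity_histogram(equities, bins=10):
    """Normalized histogram of a hand's equity over sampled future boards."""
    counts, _ = np.histogram(equities, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two histograms on the same equal-width bins.

    In one dimension this equals the L1 distance between the two CDFs,
    measured in units of one bin width.
    """
    return float(np.abs(np.cumsum(hist_a) - np.cumsum(hist_b)).sum())


# Two hands with the same average equity (0.5) but very different shapes:
made_hand = equity_histogram([0.55, 0.55, 0.55, 0.55])  # always near 50%
draw = equity_histogram([0.05, 0.05, 0.95, 0.95])       # misses or hits big
distance = emd_1d(made_hand, draw)
```

A bucketing based on mean equity alone would merge these two hands; the nonzero `distance` is what lets a potential-aware abstraction keep them apart.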

🌐 Full-Stack Integration

  • Flask API: strategy lookup + live game endpoints
  • React + Vite frontend: strategy explorer and play-vs-bot table
  • PyPokerEngine: used in the test harness for bot-vs-bot simulation
  • phevaluator: fast showdown evaluation

🎯 Key Features

🤖 Strategy Engine

  • Fast inference: direct blueprint lookup from SQLite, no per-decision search.
  • Distribution-aware abstractions: 30-fine/10-coarse decoupled preflop + 20/16/10 potential-aware postflop buckets (EMD-clustered equity distributions).
  • Mixed-strategy output: probability distributions over fold / call / bet / raise / all-in, sampled at play time.
  • Honest "unknown" handling: situations never reached in training report found: false rather than guessing.
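
A minimal sketch of how a caller might consume a mixed-strategy lookup: sample an action from the returned probabilities, and treat found: false as "no answer" instead of inventing one. The response shape (a dict with `found` and `strategy` keys) and the function name are assumptions for illustration; only the found: false behavior and the action names come from the description above.

```python
import random


def choose_action(lookup, rng):
    """Sample one action from a lookup result, or return None if nothing was found.

    `lookup` is a hypothetical response such as
    {"found": True, "strategy": {"fold": 0.1, "call": 0.6, "raise": 0.3}}.
    """
    if not lookup.get("found"):
        return None  # never reached in training: the caller decides the fallback
    actions = list(lookup["strategy"])
    weights = [lookup["strategy"][action] for action in actions]
    # Draw one action in proportion to its probability in the mixed strategy.
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the example repeatable
unknown = choose_action({"found": False}, rng)
forced = choose_action({"found": True, "strategy": {"fold": 0.0, "call": 1.0}}, rng)
```

Sampling rather than always taking the most likely action matters here: a mixed equilibrium strategy is only unexploitable if it is actually played at its stated frequencies.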

📊 Interactive Platform

  • Strategy Explorer, for looking up the blueprint's play in any spot:
    • Hand Explorer: enter real cards + a betting line, see the resulting info-set key and strategy.
    • Key Explorer: build an info-set key from abstraction dropdowns (or paste one) and see the strategy.
  • Play vs the Bot: an interactive heads-up table against the trained AI, 100 BB deep, with full action and pot tracking.

🔬 Quality & Correctness

  • Exploitability scoring via a vectorized best-response walk of the public game tree (tests/run_evaluation.py).
  • Property-based testing (Hypothesis) over the engine's semantic invariants (chip conservation, call/all-in arithmetic, legal-action shape), backed by a documented bug log.

🛠 Getting Started

Prerequisites

  • Python 3.12
  • Node.js 18+ (frontend)
  • Git

1. Clone

git clone https://github.com/jianrontan/AllIn.git
cd AllIn

2. Backend + API

# Install Python dependencies
cd backend
pip install -r requirements.txt

# Start the inference API (must run from backend/api/)
cd api
python strategy_api.py        # http://localhost:5000

3. Frontend

cd frontend
npm install
npm run dev                   # http://localhost:5173

🎓 Train your own blueprint

cd backend/bot

# Quick smoke run (seconds)
python -c "from tests.run_blueprint_trainer import run_training; run_training(100)"

# A real run: checkpoints as it goes; resume any time with resume='<db>.db'
python -c "from tests.run_blueprint_trainer import run_training; run_training(5000000)"

Training writes a timestamped backend/bot/analysis/blueprints/blueprint_*.db. The API and bot automatically use the blueprint with the most iterations, so there is no manual promotion step.

📊 Using the platform

  1. Open the frontend at http://localhost:5173.
  2. Strategy Explorer: enter a hand + betting line (or build an info-set key) and get the GTO strategy with probabilities.
  3. Play vs the Bot: play heads-up against the AI and watch how it responds.

📈 Measure blueprint quality

cd backend/bot
python tests/run_evaluation.py --samples 1000   # exploitability in mbb/hand (lower = better)

🗺 Roadmap

  • ✅ Blueprint training: Monte Carlo CFR+ with SQLite checkpoint/resume
  • ✅ Serving + Play-vs-bot: Flask API + React platform
  • ✅ Exploitability evaluation: best-response convergence scoreboard
  • ✅ Hand-level Bayesian range tracker: opponent-range belief with confidence
  • ✅ River subgame solving: real-time re-solving of the river with full pot/stack information and the live range (the shippable real-time-solving feature)
  • 🧊 Turn/flop depth-limited solving: built and validated in the lab, but shelved. It lowered exploitability yet did not beat the blueprint in real games (a cross-street consistency problem needing continual re-solving). See ROADMAP.
  • 📅 Online 1v1 play on AWS: DynamoDB session store, Cloudflare Pages frontend, +EV leaderboard (unrestricted human bet sizing already shipped)

See docs/ROADMAP.md for detail, and docs/DEVELOPER_GUIDE.md for the architecture.
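
For readers unfamiliar with the range-tracker idea on the roadmap, the core Bayesian step can be sketched as below. This is a generic illustration, not the tracker's implementation: the function name, the hand labels, and the likelihood numbers are all made up for the example.

```python
def update_range(prior, likelihood):
    """Bayes update of an opponent range after observing one action.

    prior:      {hand: P(hand)} before the action
    likelihood: {hand: P(observed action | hand)}, e.g. taken from a strategy
    Returns the posterior {hand: P(hand | observed action)}.
    """
    unnormalized = {hand: p * likelihood.get(hand, 0.0) for hand, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The action was impossible under the model; keep the prior unchanged.
        return dict(prior)
    return {hand: weight / total for hand, weight in unnormalized.items()}


# Hypothetical two-hand range and raise frequencies.
prior = {"AA": 0.5, "72o": 0.5}
raise_likelihood = {"AA": 0.9, "72o": 0.1}
posterior = update_range(prior, raise_likelihood)
```

After the observed raise, the posterior shifts toward the hand that raises more often, which is the belief a river re-solve would then take as its starting range.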


📚 Documentation

| Doc | Purpose |
| --- | --- |
| USER_GUIDE.md | Install, train, run, play, evaluate |
| docs/DEVELOPER_GUIDE.md | Architecture and module reference |
| docs/ROADMAP.md | Phase status and what's next |
| docs/TRAININGFLOW.md | One CFR+ iteration, end to end |
| CLAUDE.md | Canonical short reference for contributors |
| backend/bot/docs/BUG_LOG.md | Correctness bug history |

About

Poker bot built using Counterfactual Regret Minimization, implementing game theory concepts and Monte Carlo methods to achieve optimal decision-making.
