cryptic-bench-monorepo

A monorepo containing a CLI tool for benchmarking LLMs on cryptic crossword clues and a Next.js web visualizer.

Project Structure

Root: CLI tool and benchmark logic
web/: Next.js web application for visualizing results
packages/shared/: Shared TypeScript types and utilities

Setup

# Install all dependencies at once
bun install

This uses Bun workspaces to manage dependencies across all packages.

To run:

bun run index.ts

This project was created using bun init in bun v1.3.5. Bun is a fast all-in-one JavaScript runtime.

cryptic-bench

This tool benchmarks LLMs (via OpenRouter) on cryptic crossword clues. It runs each model over the full clue set and reports accuracy and average score per model.
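
As a rough illustration of the two reported metrics, the sketch below aggregates per-clue results into an accuracy and an average score. The `ClueResult` shape and the `summarize` function are hypothetical names introduced here, not taken from the repository, and treating accuracy as the fraction of results flagged `correct` is an assumption.

```typescript
// Hypothetical per-clue result; field names are assumptions for illustration.
interface ClueResult {
  clueId: string;
  correct: boolean; // whether the model's answer was accepted
  score: number;    // similarity score in the range 0..1
}

// Aggregate one model's results into the two headline metrics.
function summarize(results: ClueResult[]): { accuracy: number; averageScore: number } {
  if (results.length === 0) {
    // Avoid dividing by zero when no clues were run.
    return { accuracy: 0, averageScore: 0 };
  }
  const correctCount = results.filter((r) => r.correct).length;
  const scoreSum = results.reduce((sum, r) => sum + r.score, 0);
  return {
    accuracy: correctCount / results.length,
    averageScore: scoreSum / results.length,
  };
}

const summary = summarize([
  { clueId: "c1", correct: true, score: 1 },
  { clueId: "c2", correct: false, score: 0.5 },
]);
console.log(summary);
```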

Setup

Install dependencies with Bun:

bun install

Set your OpenRouter API key in the environment:

export OPENROUTER_API_KEY="sk-..."

The CLI will automatically load variables from a .env file if present. Copy .env.example to .env and update the value.
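
For readers wiring this up themselves, here is a minimal sketch of reading the key once the environment is populated. The helper name `getApiKey` is hypothetical, and failing fast with an error when the variable is missing is an assumption about desirable behavior, not a description of what the CLI does.

```typescript
// Hypothetical helper: read OPENROUTER_API_KEY from an environment map.
// Bun populates process.env from a .env file in the working directory.
function getApiKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env["OPENROUTER_API_KEY"]?.trim();
  if (!key) {
    throw new Error("OPENROUTER_API_KEY is not set; export it or add it to .env");
  }
  return key;
}

// Usage with an explicit map, so the example does not depend on the real environment.
console.log(getApiKey({ OPENROUTER_API_KEY: "sk-example" }));
```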

Usage

CLI Tool

Run benchmarks from the root:

# Run a set of models explicitly:
bun run src/index.ts --models=openai/gpt-4o,google/gemini-3

# Or use the built-in `models.json` list (default). To point at another list:
bun run src/index.ts --models-file=./my-models.json

# Quick smoke test mode (runs a single small model):
bun run src/index.ts --mode=test
# or, equivalently:
bun run src/index.ts --test
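
The flags shown above could be parsed along the following lines. This is a sketch using `parseArgs` from `node:util`, not the repository's actual argument handling; the function name `parseCliArgs` and the returned field names are assumptions.

```typescript
import { parseArgs } from "node:util";

// Hypothetical parser for the flags used in this README.
function parseCliArgs(argv: string[]) {
  const { values } = parseArgs({
    args: argv,
    options: {
      models: { type: "string" },
      "models-file": { type: "string" },
      mode: { type: "string" },
      test: { type: "boolean" },
      "api-key": { type: "string" },
    },
  });

  return {
    // "--models=a,b" becomes ["a", "b"]; undefined means "use the models file".
    models: values.models
      ? values.models.split(",").map((m) => m.trim()).filter((m) => m.length > 0)
      : undefined,
    modelsFile: values["models-file"],
    // Both --test and --mode=test select the smoke-test mode.
    testMode: values.test === true || values.mode === "test",
    apiKey: values["api-key"],
  };
}

console.log(parseCliArgs(["--models=openai/gpt-4o,google/gemini-3"]));
```

In a real entry point the argument list would typically come from `process.argv.slice(2)`.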

Web Visualizer

Start the Next.js development server:

bun run dev:web

Open http://localhost:3000 to view results (the app reads results_test.json if present, otherwise results.json).
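
The fallback order described above (results_test.json first, then results.json) can be sketched as below. The function name `resolveResultsPath` and the injectable `exists` check are hypothetical; the web app's actual lookup code and the directory it searches are not shown in this README.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: prefer results_test.json when it exists, else results.json.
// The `exists` parameter defaults to a real file-system check and can be stubbed.
function resolveResultsPath(
  dir: string,
  exists: (path: string) => boolean = existsSync,
): string {
  const testPath = join(dir, "results_test.json");
  return exists(testPath) ? testPath : join(dir, "results.json");
}

console.log(resolveResultsPath("."));
```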

Development Scripts

# Install dependencies for all packages
bun run install:all

# Run CLI tool only
bun run dev:cli

# Run web visualizer only
bun run dev:web

# Run both CLI and web (in parallel)
bun run dev:all

# Build web application
bun run build:web

# Run all builds
bun run build:all

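The API key can also be passed inline with --api-key instead of the OPENROUTER_API_KEY environment variable (not recommended, since the key can end up in shell history):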

bun run src/index.ts --api-key=sk-... --models=openai/gpt-4o

Clues

Edit clues/clues.json to provide cryptic clues and answers. The format is an array of objects:

{ "id": "c1", "clue": "Not yes (2)", "answer": "NO" }
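
A type for this format, plus a small validation step, might look like the sketch below. The `Clue` interface is inferred from the single example entry above, and `parseClues` is a hypothetical helper rather than a function from the repository.

```typescript
// Shape inferred from the example entry; the real shared type may carry more fields.
interface Clue {
  id: string;
  clue: string;
  answer: string;
}

// Type guard: true only for objects with string id, clue, and answer.
function isClue(value: unknown): value is Clue {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["id"] === "string" &&
    typeof v["clue"] === "string" &&
    typeof v["answer"] === "string"
  );
}

// Validate parsed JSON and report the index of the first malformed entry.
function parseClues(raw: unknown): Clue[] {
  if (!Array.isArray(raw)) {
    throw new Error("clues file must contain a JSON array");
  }
  raw.forEach((item, i) => {
    if (!isClue(item)) {
      throw new Error(`clue at index ${i} needs string id, clue, and answer`);
    }
  });
  return raw as Clue[];
}

const clues = parseClues(
  JSON.parse('[{ "id": "c1", "clue": "Not yes (2)", "answer": "NO" }]'),
);
console.log(clues.length);
```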

Notes:

This is a starting scaffold — replace the example clues with an authoritative set of cryptic clues and expected answers.
Scoring is a simple normalized comparison with some tolerance via Levenshtein distance; you can extend src/scorer.ts for richer evaluation.
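
To make the scoring note concrete, here is one way a normalized comparison with Levenshtein tolerance could be written. It is an illustrative sketch, not the contents of src/scorer.ts: the function names, the normalization rule (uppercase, letters only), and the formula `1 - distance / maxLength` are all assumptions.

```typescript
// Uppercase and drop everything except A-Z so " no. " and "NO" compare equal.
function normalize(s: string): string {
  return s.toUpperCase().replace(/[^A-Z]/g, "");
}

// Classic dynamic-programming edit distance, keeping one previous row.
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Score in 0..1: 1 for an exact match after normalization, lower as edits increase.
function scoreAnswer(expected: string, actual: string): number {
  const e = normalize(expected);
  const a = normalize(actual);
  const longest = Math.max(e.length, a.length);
  if (longest === 0) return 1;
  return 1 - levenshtein(e, a) / longest;
}

console.log(scoreAnswer("NO", " no. "));
```

A threshold on this score (for example, treating anything below 1 as incorrect) would then decide what counts toward accuracy; where that threshold sits is a design choice this sketch leaves open.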


