Mesh-LLM/mesh-llm

Mesh LLM

Mesh LLM pools GPUs and memory across machines and exposes the result as one OpenAI-compatible API at http://localhost:9337/v1. Start one node, add more nodes later, and let the mesh decide whether a model runs locally, routes to a peer, or uses Skippy stage splits for models that are too large for one box.

Quick start

Install the latest release:

curl -fsSL https://raw.githubusercontent.com/Mesh-LLM/mesh-llm/main/install.sh | bash

Join the public mesh and start serving:

mesh-llm serve --auto

That command chooses a backend flavor, downloads a suitable model if needed, joins the best discovered public mesh, starts the local API on port 9337, and starts the web console on port 3131.

Check available models:

curl -s http://localhost:9337/v1/models | jq '.data[].id'
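The same listing can be done from a script instead of jq. This is a minimal Python sketch, assuming the endpoint returns the usual OpenAI-style list shape (an object whose data array holds entries with an id field). The helper names model_ids and fetch_model_ids are illustrative and not part of Mesh LLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9337/v1"  # local API address from the quick start


def model_ids(payload: dict) -> list[str]:
    """Return the id of every entry in an OpenAI-style model list."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = BASE_URL) -> list[str]:
    """GET <base_url>/models from a running node and extract the ids."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return model_ids(json.load(resp))

# With a node running (mesh-llm serve ...), call fetch_model_ids()
# to get the same ids that the jq filter above prints.
```

Keeping the parsing in model_ids separate from the network call makes it easy to check against a saved response.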

Send an OpenAI-compatible request:

curl http://localhost:9337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"GLM-4.7-Flash-Q4_K_M","messages":[{"role":"user","content":"hello"}]}'
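For callers that prefer a script over curl, the request above can be sketched in Python with only the standard library. The function names are hypothetical, and the response parsing assumes the standard OpenAI chat completion shape (choices[0].message.content), which this README does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9337/v1"  # local API address from the quick start


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Extract the assistant text, assuming the OpenAI response shape."""
    return response["choices"][0]["message"]["content"]


def chat(model: str, prompt: str) -> str:
    """Send the request to a running node and return the reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        return first_reply(json.load(resp))

# With a node running: chat("GLM-4.7-Flash-Q4_K_M", "hello")
```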

For server deployments, add --headless to hide the web UI while keeping the management API on the --console port:

mesh-llm serve --auto --headless

Pick the workflow you need

| Goal | Command | Full guide |
|---|---|---|
| Try the public mesh | mesh-llm serve --auto | docs/MESHES.md |
| Start a private mesh | mesh-llm serve --model Qwen3-8B-Q4_K_M | docs/MESHES.md |
| Publish your own mesh | mesh-llm serve --model Qwen3-8B-Q4_K_M --publish | docs/MESHES.md |
| Join by invite token | mesh-llm serve --join <token> | docs/MESHES.md |
| Run an API-only client | mesh-llm client --auto | docs/MESHES.md |
| Run a big model with splits | mesh-llm serve --model hf://meshllm/<repo>@<rev> --split | docs/SKIPPY_SPLITS.md |
| Attach a Flash-MoE SSD backend | mesh-llm serve with [[plugin]] name = "flash-moe" | docs/plugins/flash-moe.md |
| Fan out one prompt to every model in the mesh | curl ... -d '{"model":"mesh", ...}' | docs/design/MOA_GATEWAY.md |
| Use Goose, OpenCode, Claude Code, or Pi | mesh-llm goose, mesh-llm opencode, mesh-llm claude, mesh-llm pi | docs/AGENTS.md |
| Build or contribute | just build | CONTRIBUTING.md |

How the mesh works

  • Single-machine fit first. If one node can host the full model, it serves the model locally without stage traffic.
  • Mesh routing. Every node exposes the same /v1 API. Requests are routed by the model field to the peer that can serve that model.
  • Owner-control plane. Operator config and inventory actions use an additive mesh-llm-control/1 lane with explicit endpoint bootstrap, while public mesh join, gossip, routing, and inference stay on the public mesh plane for mixed-version compatibility.
  • Skippy stage splits. Large dense models can load as package-backed layer stages. The coordinator plans contiguous layer ranges, starts downstream stages first, waits for readiness, then publishes the stage-0 route.
  • Layer packages. Package repositories contain model-package.json plus GGUF fragments so peers fetch only the pieces needed for their assigned stage.
  • Public discovery. Published meshes advertise through Nostr discovery; private meshes stay invite-token based.
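To make the stage-split bullet concrete, here is a toy planner in Python. It is a sketch under stated assumptions, not the coordinator's actual policy: it sizes contiguous layer ranges in proportion to a per-peer capacity number, and it starts stages in reverse index order so that stage 0 comes up last. The Stage type, the function names, and the proportional rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    index: int        # 0 is the entry stage whose route gets published
    peer: str
    first_layer: int  # inclusive
    last_layer: int   # exclusive


def plan_stages(n_layers: int, peer_capacity: dict[str, float]) -> list[Stage]:
    """Split layers [0, n_layers) into contiguous ranges, one per peer.

    Assumption: each range is sized in proportion to the peer's capacity
    (for example free memory), with at least one layer per peer.
    """
    peers = list(peer_capacity.items())
    total = sum(peer_capacity.values())
    if not peers or total <= 0 or n_layers < len(peers):
        raise ValueError("need positive capacity and at least one layer per peer")
    stages, start, cumulative = [], 0, 0.0
    for i, (peer, capacity) in enumerate(peers):
        cumulative += capacity
        later_peers = len(peers) - i - 1
        if later_peers == 0:
            end = n_layers  # the last stage always closes the range
        else:
            end = round(n_layers * cumulative / total)
            end = max(end, start + 1)               # at least one layer here
            end = min(end, n_layers - later_peers)  # leave one per later peer
        stages.append(Stage(i, peer, start, end))
        start = end
    return stages


def start_order(stages: list[Stage]) -> list[int]:
    """Downstream stages first, stage 0 last (reverse order is an assumption)."""
    return [s.index for s in sorted(stages, key=lambda s: s.index, reverse=True)]
```

For example, plan_stages(32, {"a": 24.0, "b": 8.0}) gives peer a layers 0 to 24 and peer b layers 24 to 32, and start_order returns [1, 0].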

For a deeper operator guide, see docs/USAGE.md. For every CLI command and switch, see docs/CLI.md.

Mixture-of-Agents (model: "mesh") — experimental

⚠️ Experimental. The MoA gateway is new in this release. Behavior, routing heuristics, error shapes, and tuning knobs may change between versions while we tune it. Treat model: "mesh" as a preview feature rather than a stable production path; use a specific model id when you need stable semantics.

Send a request with "model": "mesh" and the proxy fans it out to every model available in the mesh in parallel, arbitrates their responses with deterministic logic, and returns one OpenAI-compatible reply. The arbiter runs in code (not as another model call) and only escalates to a reducer LLM on genuine conflict. Tool calls flow through the full pipeline.

curl http://localhost:9337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"mesh","messages":[{"role":"user","content":"What is the capital of Japan?"}]}'

Fan-out requires at least two distinct models in the mesh. See docs/design/MOA_GATEWAY.md for the architecture, arbitration rules, and tuning knobs.
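The "fan out, arbitrate in code, escalate only on conflict" flow can be illustrated with a toy Python sketch. The arbitration rule used here (a strict majority over case-insensitive, whitespace-trimmed replies) is an assumption chosen for illustration; the gateway's real rules are documented in docs/design/MOA_GATEWAY.md. ModelFn, fan_out, and arbitrate are hypothetical names, and models are stand-in callables rather than real mesh peers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

ModelFn = Callable[[str], str]  # hypothetical stand-in: prompt -> reply text


def fan_out(prompt: str, models: dict[str, ModelFn]) -> dict[str, str]:
    """Ask every model in parallel and collect replies keyed by model id."""
    if len(models) < 2:
        raise ValueError("fan-out needs at least two distinct models")
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def arbitrate(replies: dict[str, str], reducer: ModelFn) -> str:
    """Return the strict-majority reply; otherwise escalate to the reducer."""
    counts = Counter(text.strip().lower() for text in replies.values())
    winner, votes = counts.most_common(1)[0]
    if votes * 2 > len(replies):
        # Deterministic path: return the first reply that matches the winner.
        return next(t for t in replies.values() if t.strip().lower() == winner)
    # No majority: hand all candidate replies to the reducer model.
    return reducer("\n".join(f"{name}: {text}" for name, text in replies.items()))
```

With three stand-in models answering "Tokyo", "tokyo ", and "Kyoto", arbitrate returns "Tokyo" without calling the reducer; with two models that disagree, the reducer is called.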

Supported model families

Mesh LLM's Skippy runtime tracks llama.cpp family parity with reviewed GGUF representatives. The current reviewed support set covers 72 P0/P1 family rows, with 89 certified rows in the full parity inventory, including Qwen, Llama, Gemma, Mistral, DeepSeek, GLM, MiniMax, Phi, Granite, Hunyuan, EXAONE, Cohere, Falcon, RWKV, and many others.

Split multimodal serving is certified for Qwen2-VL, Qwen3-VL, Qwen3-VL-MoE, HunyuanOCR/Hunyuan-VL, and DeepSeek-OCR using real GGUF plus projector fixtures. DeepSeek3 and EXAONE-MoE use package-backed stages because the full GGUFs are too large for the cheap local baseline.

See docs/skippy/FAMILY_STATUS.md for the full artifact, split, wire dtype, cache policy, and exception matrix. See docs/skippy/LLAMA_PARITY.md for the remaining llama.cpp parity queue.

Install and build notes

Tagged releases publish macOS bundles plus Linux CPU, Linux ARM64 CPU, Linux ARM64 CUDA, Linux CUDA, Linux CUDA Blackwell, Linux ROCm, Linux Vulkan, Windows CPU, Windows CUDA, Windows ROCm, and Windows Vulkan bundles. Metal is macOS-only. The Linux ARM64 CPU artifact is mesh-llm-aarch64-unknown-linux-gnu.tar.gz; the Linux ARM64 CUDA artifact is mesh-llm-aarch64-unknown-linux-gnu-cuda.tar.gz. In install and release contexts, arm64 and aarch64 mean the same 64-bit ARM target.

Build from source with just:

git clone https://github.com/Mesh-LLM/mesh-llm
cd mesh-llm
just build

Source builds require just, cmake, Rust, and Node.js 24 + npm. CUDA builds need nvcc, ROCm builds need ROCm/HIP, and Vulkan builds need Vulkan dev files plus glslc.

The shipped mesh-llm executable carries an embedded release attestation, used for provenance and admission hardening only. The attestation does not cover the SDK, XCFramework, or other native artifacts, and it is not a runtime integrity proof.

Verify a stamped packaged executable with cargo run -p xtask -- release-attestation inspect --binary <path-to-packaged-mesh-llm> --public-key-file <release-signing-public-key.json>. A packaged release binary reports valid, an unstamped local or dev build reports missing, and a binary that changed after packaging reports invalid. A bare inspect --binary ... is only enough to classify an unstamped binary as missing; a stamped binary requires --public-key-file and otherwise reports invalid with an explicit error. Mutation after download can flip a stamped binary to invalid, but default startup still allows the binary to run.
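The reporting rules above reduce to a small decision table. The Python function below only transcribes that paragraph for quick reference. It is not the xtask implementation, the function name is hypothetical, and it assumes that any key file supplied is the correct release signing key.

```python
def expected_inspect_status(stamped: bool,
                            has_public_key: bool,
                            modified_after_packaging: bool = False) -> str:
    """Status that inspect is described as reporting for a given binary."""
    if not stamped:
        # Unstamped local or dev builds report missing, with or without a key.
        return "missing"
    if not has_public_key:
        # Stamped binaries need --public-key-file; without it they are invalid.
        return "invalid"
    # Stamped and checked against the key: any later change invalidates it.
    return "invalid" if modified_after_packaging else "valid"
```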

Documentation hub

| Doc | Use it for |
|---|---|
| docs/MESHES.md | Private meshes, public discovery, publishing, invite tokens, API-only clients |
| docs/SKIPPY_SPLITS.md | Running big models with package-backed Skippy stage splits |
| docs/LAYER_PACKAGE_REPOS.md | Contributing and publishing layer package repositories |
| docs/AGENTS.md | Goose, Claude Code, OpenCode, Pi, curl, and blackboard |
| docs/EXO_COMPARISON.md | Balanced comparison with Exo |
| docs/CLI.md | Command reference and JSON automation |
| docs/USAGE.md | Longer operational usage guide, runtime control, owner-control operator flows |
| docs/design/TESTING.md | Testing playbook, mixed-version QA, remote deploy checks |
| docs/plugins/flash-moe.md | Optional Flash-MoE SSD expert streaming backend setup |
| docs/skippy/FAMILY_STATUS.md | Certified Skippy model-family status |
| docs/specs/layer-package-repos.md | Manifest and artifact format spec |

Community

Mesh LLM is experimental distributed-systems software. When you report bugs, include the command you ran, platform/backend flavor, /api/status output if available, and whether the node was private, published, or joined with --auto.

About

Distributed AI/LLM for the people. Share compute privately or publicly to power your agents and chat.
