Observable, Traceable, Data-Driven
TriageCore is an observable, traceable, data-driven control-plane harness for evaluating local AI routing decisions. It records structured evidence about model and runtime choices, agent-group behavior, token use, latency, quality gates, and energy-measurement tier before making stronger orchestration claims. The project remains an early research workbench for AI-assisted software work that keeps local control, reviewable artifacts, and privacy boundaries visible to the operator. It can generate preflight and handoff packets, inspect privacy-safe route audit events, run local benchmark/report workflows, and support a bounded Qwen Cloud path for external-safe packets.
Status TriageCore is active as a local-first prototype/workbench. Current capabilities, supporting docs, tests, and demo paths are in-repo now. Broader governance, release polish, and long-term environmental-edge integrations should be treated as ongoing work, not completed product claims.
- Verifies operator environment and local repo state with
tc doctor. - Generates reviewable preflight and handoff artifacts with
tc preflightandtc handoff. - Records and inspects privacy-safe route audit events with
tc audit. - Validates and renders offline Task Envelope and Admission Evidence contracts via the
tc task-envelopeandtc admissionCLI tools. - Validates static agent authority manifests with
tc authority checkwithout granting execution authority. - Supports local benchmark fixtures and benchmark reports without hiding the evidence trail.
- Enforces local-only privacy boundaries before any optional external-safe Qwen Cloud path is considered.
- It makes local vs cloud execution explicit instead of burying that decision inside an agent loop.
- It preserves human review, permission boundaries, and fail-closed local-only handling as core workflow rules.
- It produces inspectable artifacts and route evidence instead of relying on vague autonomy claims.
- It is useful today as safer AI-assisted SDLC framing and useful later as a control pattern for environmental edge workflows.
TriageCore includes a local-first Workspace Unifier for orientation, focus selection, review artifacts, and bounded handoffs. It works from local YAML files, keeps previews reviewable, and does not turn agent coordination into an approval surface.
1. Create your private configuration:
mkdir ~/.triagecore
# Create ~/.triagecore/work_items.yaml (your full registry)
# Create ~/.triagecore/today.yaml (your daily focus list)2. Generate the dashboard:
tc workspace dashboard --items ~/.triagecore/work_items.yaml --today ~/.triagecore/today.yaml --output ~/.triagecore/dashboard.html3. Open it (Windows/PowerShell):
Invoke-Item ~/.triagecore/dashboard.html
# Or just: ii ~/.triagecore/dashboard.html| Command | Purpose |
|---|---|
tc workspace board --items <path> |
Read-only board view grouped by status. |
tc workspace wbs --items <path> |
Read-only work breakdown view by area, project, and component. |
tc workspace now --items <path> --today <path> |
Read-only daily focus view combining the registry and today list. |
tc workspace dashboard --items <path> --today <path> --output <path> |
Writes a static HTML dashboard with no external dependencies. |
tc workspace handoff --items <path> --id <id> --tool <tool> |
Exports a bounded handoff packet for a selected item. |
tc workspace import-github --repo <owner/repo> --output <path> |
Imports open GitHub issues into a preview file for review. |
tc workspace review-import --preview <path> |
Reviews imported preview items before promotion. |
tc workspace promote --items <path> --preview <path> --id <id> --output <path> |
Explicitly promotes selected preview items into a live board file. |
tc workspace close --items <path> --id <id> |
Generates a closing packet and can persist closure metadata only when explicitly directed. |
tc workspace review --items <path> |
Shows a weekly review view of stale, active, and blocked work. |
tc workspace touch --items <path> --id <id> |
Updates review metadata for an item with explicit write intent. |
tc workspace export-eval --items <path> --id <id> --output <path> |
Writes a static evaluator-input packet for a selected work item without scoring it. |
Note on GitHub imports: You can import open GitHub issues via
tc workspace import-github --repo owner/repo --output preview.yaml. Generated previews are review artifacts, not the live board. Usetc workspace promoteto select which items to pull into your realwork_items.yaml.
- TriageCore = policy, contracts, state, and CLI engine.
- TriageDesk = human control cockpit.
- Meta-harness = agent coordination layer.
- Independent evaluator = external assessment layer.
- See Workspace Evaluator Preview for the file-contract-based workspace export that feeds external assessment without importing or invoking the evaluator.
- See Fluidic Signal Paths for the architecture note on how context, handoffs, approvals, evaluator outputs, and evidence should flow between these layers.
- Local-first.
- Read-only by default.
- Explicit mutation only.
- Backup support for in-place writes.
- Generated previews are review artifacts, not the live board.
- Dashboard has no external dependencies.
- Handoffs omit private notes by default.
- Evaluator must not become approval authority.
Capture → Clarify → Promote → Focus → Handoff → Execute → Close → Weekly Review
- Does not replace TriageDesk.
- Does not approve actions automatically.
- Does not execute agent work.
- Does not mutate GitHub.
- Does not import everything into the live board.
- Does not make the meta-harness the source of truth.
TriageDesk should become the human-facing cockpit for approvals, evidence, review, and dashboard operation. Meta-harness should coordinate agents and sessions. Independent evaluator should assess whether observed behavior matched expected control boundaries. TriageCore remains the stable contract/evidence substrate.
Current capabilities
- local-first operator workflow
- route audit inspection
- benchmark scaffolding and reports
- bounded Qwen Cloud escalation for external-safe packets only
Planned / future-facing
- public release polish such as release metadata upkeep and GitHub metadata
- deeper environmental-edge packaging around Clear Lake Watch style workflows
Research framing
- methodology, evidence schema, and benchmark comparison docs remain first-class because the project is also an evaluation workbench, not only a tool wrapper
Install locally:
git clone https://github.com/coreytshaffer/TriageCore
cd TriageCore
pip install -e .Then run:
tc doctor
tc demo --dry-run
tc preflight CR-017
tc handoff latest --print
tc audit --self-test
tc audit --kind route_audit --last 10
tc audit --kind demo_dry_run --last 5
tc audit --privacy-invariants
triagecore benchmark --list-onlyOptional deeper verification:
tc model check --manifest docs\security\examples\model_route_manifest_local_ollama.json
tc model warn --manifest docs\security\examples\model_route_manifest_local_ollama.json --route docs\security\examples\route_payload_local_ollama.json
tc model warn --manifest docs\security\examples\model_route_manifest_cloud_qwen.json --route docs\security\examples\route_payload_local_ollama.json
tc authority check --manifest docs\security\examples\agent_authority_manifest_reviewer.jsonExpected outputs:
tc doctorconfirms repo root, Python, CLI path, ledger path, and pytest visibility.tc demo --dry-runshows the offline safety-control loop from packet summary through human review and writes one metadata-only demo event.tc preflight CR-017writes a handoff artifact under.triagecore/handoffs/.tc handoff latest --printprints a reviewable handoff packet.tc audit --self-testwrites one privacy-saferoute_auditevent.tc audit --kind route_audit --last 10shows routing metadata without raw prompt/data leakage.tc audit --kind demo_dry_run --last 5shows the deterministic demo evidence without raw request or proposed-output content.tc audit --privacy-invariantsscans the persistent ledger for forbidden raw-content keys and confirms the CR-021 invariant still holds.triagecore benchmark --list-onlyshows the benchmark fixture set without contacting a backend.tc model checkvalidates the documented manifest example locally.tc model warnprovides warning-only route/manifest comparison visibility and remains non-blocking when mismatches exist.tc authority checkvalidates the static authority-manifest example without writing ledger or identity state.
For a hop-by-hop walkthrough of how a task's route decision, evidence record, review state, and verification evidence link together, see Reviewer Traceability.
Sample audit transcript:
> tc audit --self-test
Success: Wrote privacy-safe route_audit self-test event to ...\.triagecore\ledger.jsonl.
> tc audit --kind route_audit --last 10
[2026-06-11T03:39:17.292773+00:00] Task: audit-self-test | Type: route_audit
Decision: allowed | Reason: audit_self_test
Privacy: public (Scan Passed: True)
Local Only: False | Route: self_test | Backend: self_test
The deterministic demo runs offline, calls no model backend, and changes no source files. It demonstrates the current workflow structure and review gates; it is not evidence of production safety certification.
The manifest warning commands are optional deeper verification only. They demonstrate route/manifest comparison visibility, not runtime enforcement, backend probing, or production certification.
Start here if you want the shortest guided path:
- Daily-Driver Quickstart
- Hackathon Demo
- Judge Submission Bundle
- Verification Guide
- Evidence Schema
- Agent Identity Provenance Boundary
- External Runtime Admission Governance
- Benchmark Fixtures
- Public Evidence Example
TriageCore models external agent actions using a tri-part governance model:
- Task Envelope (The Contract)
- Admission Evidence (The Proof)
- Execution Sidecar (The Enforcer)
The execution sidecar is the future/runtime integration boundary; the current CLI tools provide deterministic preflight governance evidence.
These commands validate and render operator-facing governance artifacts. They do not execute external runtimes, write to the ledger, or mutate approval state.
# Draft or preview envelopes
tc task-envelope wizard
tc task-envelope draft --from-json docs/examples/task-envelope.example.json
# Validate strict schemas
tc task-envelope validate --from-json docs/examples/task-envelope.example.json
tc admission validate --from-json docs/examples/admission-evidence.example.json
# Render for operator review
tc admission render --from-json docs/examples/admission-evidence.example.jsonFor more details, refer to:
Current in-repo proof markers:
- a runnable reviewer path using existing commands
- a judge-facing submission bundle under
docs/submission/ - a privacy-safe route audit self-test and public evidence example
- persistent artifact privacy invariant audit via
tc audit --privacy-invariants - benchmark fixtures and benchmark-report scaffolding
- a full offline-oriented test suite runnable with
python -m pytest -q - a public README test badge backed by the GitHub Actions workflow
Proof markers that still depend on GitHub/release state rather than repository files:
- release metadata upkeep
- GitHub About description
- GitHub topics
You can install TriageCore locally for CLI access:
git clone https://github.com/coreytshaffer/TriageCore
cd TriageCore
pip install -e .For a bounded operator walkthrough that works with existing commands, see docs/workflows/hackathon_demo.md.
For the judge-facing submission bundle, start with docs/submission/README.md.
That demo is designed to support:
- TriageCore local-first plus optional Qwen Cloud escalation as the primary story
- safer AI-assisted SDLC as the secondary framing
- Clear Lake Watch or other environmental edge workflows as a future extension
Launch the local control plane GUI to actively review tasks, monitor telemetry, and perform context planning:
triagecore desk- Read-Only Operator Console (Baseline Tag:
triagedesk-daily-driver-baseline-2026-06-25): The GUI acts strictly as a read-only telemetry and context-planning tool. To ensure the UI does not become a hidden execution surface, it relies exclusively ontriagedesk_adapter.pyand performs zero LLM calls, file writes, or ledger mutations directly. - Energy-Aware Routing:
psutilintegration actively monitors your battery life. If your battery dips below 20% while unplugged, TriageCore refuses to run heavy LLM tasks and prompts you to plug in (Permacomputing in action). - Telemetry & Resource Accounting: Tracks measured or heuristic resource estimates for energy consumption (kWh/Joules) and carbon emissions (gCO2e) in a local append-only ledger (
.triagecore/ledger.jsonl). - Local-First Benefit Signals: The dashboard foregrounds accepted yield, local-first routing share, accepted local work, and review-light tasks so the bench encourages continued evidence collection while formal reports remain baseline-bound.
Audit the files your agents modify to ensure they didn't bypass the initial risk assessment:
triagecore audit <task_id> --files src/main.py- Scope Verification: Flags modified files that were not in the original target list.
- Profile Adherence: Blocks changes if the task was rated
read-only. - Escalation Detection: Static analysis checks for
requests,socket,subprocess, etc., if the task was classified as low-risk.
TriageCore supports pluggable backends so you can process tasks against any local runner without manually wrangling URLs. All local generations route through a unified OpenAICompatibleBackend adapter, and the Qwen Cloud path stays bounded behind explicit external-safe routing.
You can configure your TriageClient with the following presets:
from triage_core import TriageClient
client = TriageClient(backend_type="ollama", model="qwen2.5-coder:7b")client = TriageClient(backend_type="vllm", model="Qwen/Qwen2.5-Coder-7B-Instruct")client = TriageClient(backend_type="llama.cpp", model="local-model")from triage_core.backends import OpenAICompatibleBackend
from triage_core import TriageClient
backend = OpenAICompatibleBackend(
name="lmstudio",
base_url="http://localhost:1234/v1",
model="local-model"
)
client = TriageClient(backend=backend)TriageCore is also being developed as a scientific model evaluation and token-balancing workbench. Each task attempt can be treated as an experimental observation that records routing decisions, backend behavior, token use, validation outcomes, energy estimates, and human review results.
The project methodology is documented in docs/methodology.md. Supporting literature is collected in docs/references.md. Together, these describe the evidence loop for model evaluation, safety routing, mistake logging, and human-reviewed learning.
The shared evidence schema is documented in docs/evidence_schema.md. The first repeatable study plan is docs/study_001_local_model_baseline.md, model/backend comparison is planned in docs/study_002_model_backend_comparison.md, and Codex/Antigravity supervision is described in docs/codex_antigravity_bridge.md.
Use docs/verification_guide.md for practical code, UI, study-evidence, and human-review verification checks.
TriageCore is inspired by sustainable and permacomputing practices that emphasize sufficiency, repairability, visible infrastructure, and graceful operation under constraints.
Rather than optimizing for maximum automation, TriageCore optimizes for bounded, reviewable, locally controlled developer-agent work.
Design commitments:
- Prefer local files over remote services.
- Prefer Markdown, JSON, and TOML over opaque state.
- Prefer small task packets over broad autonomous sessions.
- Prefer explicit permission recommendations over silent execution.
- Prefer deferral or refusal when a task is too broad, risky, or wasteful.
- Preserve human review as a first-class part of the workflow.
- Treat compute, attention, battery, trust, and hardware lifespan as scarce resources.
TriageCore includes repeatable benchmark fixtures in benchmarks/tasks.jsonl. List them without contacting a backend:
triagecore benchmark --list-onlyRun them against a local backend and append model-evaluation evidence to .triagecore/ledger.jsonl:
triagecore benchmark --backend-type ollama --model qwen2.5-coder:7bTag formal study runs so reports can exclude exploratory ledger history:
triagecore benchmark --study-id study_001 --run-id trial_001Summarize benchmark evidence:
triagecore benchmark-report
triagecore benchmark-report --output reports/benchmark-report.md
triagecore benchmark-report --study-id study_001 --run-id trial_001 --output reports/study_001_benchmark_report.mdCompare backend/model pairs by giving each run a unique run_id under one study:
triagecore benchmark --study-id study_002 --run-id ollama_qwen25_coder_7b_trial_001 --backend-type ollama --model qwen2.5-coder:7b-triagecore
triagecore benchmark --study-id study_002 --run-id lmstudio_loaded_model_trial_001 --backend-type custom --base-url http://localhost:1234/v1 --model <loaded-model-name>
triagecore benchmark-report --study-id study_002 --output reports/study_002_model_backend_comparison.mdComparison reports include By Supervision, By Backend, By Model, and By Category sections. When supervised benchmark records exist, reports also include a Supervisor Reviews table with decision counts and estimated supervisor token totals under the same study/run filter.
TriageCore can generate pending learning proposals from ledger evidence, but it does not automatically change routing behavior:
triagecore propose-lessonsRecord an explicit human decision:
triagecore review-lesson <proposal_id> --decision accepted --notes "Evidence supports this routing change."Record a Codex or Antigravity supervisor decision for a task:
triagecore record-supervisor-review <task_id> --tool codex --decision needs_revision --notes "Local draft missed tests." --model gpt-5 --profile high
triagecore record-supervisor-review <task_id> --tool antigravity --decision accepted --notes "IDE supervisor accepted the local draft." --model gemini-3.1-pro-high --profile supervisorImport supervisor usage from a verified JSON or JSONL artifact:
triagecore scan-supervisor-usage supervisor_logs\
triagecore import-supervisor-usage supervisor_usage.jsonl --tool codex --token-source imported_exact --dry-run
triagecore import-supervisor-usage supervisor_usage.jsonl --tool codex --token-source imported_exactTriageCore provides a convenient CLI for generating agent task bundles offline:
Generate a default AGENTS.md file in your repository:
triagecore init-agentsCreate a standalone markdown task file (triage_tasks/codex_task_low.md) for Codex:
triagecore codex-task --prompt "Refactor the database connection string logic" --files src/db.pyCreate a robust multi-file bundle (.agent_tasks/my-slug/TASK.md, ACCEPTANCE_CRITERIA.md):
triagecore antigravity-task --prompt "Add pytest coverage for handoff.py" --files tests/test_handoff.py --slug add-testsTriageCore uses pytest to ensure all routing and safety logic operates completely offline without network calls. Tests actively mock the backend requests module to verify payload structures across Ollama, vLLM, and llama.cpp presets.
TriageCore is a research-stage orchestration and workflow-control project. It is designed to support privacy-aware routing, local-first execution, human approval gates, and auditable task records.
TriageCore is not a certified safety system, compliance system, medical device, legal decision system, emergency dispatch system, or critical infrastructure control system. It does not guarantee safe, lawful, complete, accurate, or compliant outcomes.
Operators are responsible for validating outputs, configuring policies, reviewing logs, and ensuring that any deployment satisfies applicable legal, security, privacy, safety, and sector-specific requirements.
pip install pytest
pytest tests/MIT