SFT OPD

A lightweight operations dashboard and CLI for SFT dataset inspection, training-log monitoring, and pass@K candidate workflows.

Screenshot: SFT OPD dashboard showing indexed files, row counts, data links, and runtime configuration.

Overview

SFT OPD is a small Go application for operating an SFT workspace. It serves a browser UI for browsing dataset artifacts and training tasks, and it ships a CLI for status checks, log inspection, candidate generation, grading, accepted-output merging, and remaining-ID preparation.

The project is intentionally self-contained: the web UI is embedded into the Go binary, the server uses the standard library HTTP stack, and the core dashboard works without a Node.js build step.
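As a rough illustration of the self-contained design, the sketch below serves static assets from an fs.FS using only the standard library. It is not the project's actual code: the names newMux, demoAssets, and fetch are hypothetical, and an in-memory fstest.MapFS stands in for the embed.FS that a //go:embed directive would normally provide.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// newMux serves static assets from any fs.FS. In a built binary the fs.FS
// would typically be an embed.FS filled by a //go:embed directive; how
// sftopd wires this up is an assumption, not taken from its source.
func newMux(assets fs.FS) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/", http.FileServer(http.FS(assets)))
	return mux
}

// demoAssets is an in-memory stand-in for an embedded asset tree.
func demoAssets() fs.FS {
	return fstest.MapFS{
		"index.html": &fstest.MapFile{Data: []byte("<h1>demo</h1>")},
	}
}

// fetch runs one GET request through the handler without opening a socket.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := fetch(newMux(demoAssets()), "/")
	fmt.Println(status, body)
}
```

Because the handler only depends on fs.FS, the same code could serve files from disk during development and from embedded assets in a release build.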

Features

  • Dataset cataloging for jsonl, json, csv, parquet, and txt files under a configured data root.
  • JSON and JSONL sample previews with math-friendly rendering in the browser.
  • Training task pages backed by log discovery, tail output, progress counters, pass@K blocks, and task detail views.
  • HTTP JSON APIs for health checks, catalog refreshes, file previews, training tasks, task events, and Asymptote rendering.
  • Go CLI commands for training status, logs, specs, full-coverage launch, grading, merging accepted JSONL, and remaining source ID generation.
  • Embedded static assets and templates, so a built binary can run the UI directly.
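
The cataloging step in the first bullet can be pictured as a directory walk that keeps files with the listed extensions. The following is a minimal sketch under that assumption; scanCatalog, catalogExts, and demoTree are hypothetical names, and the real catalog also tracks details such as row counts that are not modeled here.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"sort"
	"strings"
	"testing/fstest"
)

// catalogExts holds the extensions named in the feature list.
var catalogExts = map[string]bool{
	".jsonl": true, ".json": true, ".csv": true, ".parquet": true, ".txt": true,
}

// scanCatalog walks root and returns the slash-separated paths of files
// whose extension is supported, in sorted order.
func scanCatalog(root fs.FS) ([]string, error) {
	var out []string
	err := fs.WalkDir(root, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		// Compare extensions case-insensitively so "b.CSV" is also indexed.
		if catalogExts[strings.ToLower(path.Ext(p))] {
			out = append(out, p)
		}
		return nil
	})
	sort.Strings(out)
	return out, err
}

// demoTree is an in-memory data root used only for illustration.
// A real data root could be passed in as os.DirFS("/path/to/data") instead.
func demoTree() fs.FS {
	return fstest.MapFS{
		"train/a.jsonl":  &fstest.MapFile{},
		"train/notes.md": &fstest.MapFile{},
		"eval/b.CSV":     &fstest.MapFile{},
		"c.parquet":      &fstest.MapFile{},
	}
}

func main() {
	files, err := scanCatalog(demoTree())
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```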

Screenshots

Screenshot: SFT OPD data file browser with a filterable dataset table and a sample preview panel.

Data browser with indexed artifacts and preview workflow.

Quick Start

Prerequisites

  • Go 1.22 or newer.
  • A local or remote SFT workspace with dataset files and training logs.
  • Optional: Python and the open-r1 repository when using grading commands that call the math verification filter.

Run From Source

git clone https://github.com/Piping/sftopd.git
cd sftopd

go run ./cmd/sftopd \
  --addr 127.0.0.1:6060 \
  --data-root /data00/open-r1/data \
  --log-root /data00/open-r1/logs \
  --work-dir /data00/sftopd

Open http://127.0.0.1:6060.

For a remote host, forward the port first:

ssh -N -L 6060:127.0.0.1:6060 l20

Build A Binary

go build -o sftopd ./cmd/sftopd
./sftopd --addr 127.0.0.1:6060

Configuration

Server flags can also be supplied through environment variables.

| Flag | Environment | Default | Description |
| --- | --- | --- | --- |
| --addr | SFTOPD_ADDR | 127.0.0.1:6060 | HTTP listen address. |
| --data-root | SFTOPD_DATA_ROOT | /data00/open-r1/data | Directory scanned for dataset artifacts. |
| --log-root | SFTOPD_LOG_ROOT | /data00/open-r1/logs | Directory scanned for training logs. |
| --work-dir | SFTOPD_WORK_DIR | /data00/sftopd | Runtime working directory for generated files. |
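
One common way to combine flags with environment variables is to use the environment value as the flag's default, so an explicit flag wins over the environment, which in turn wins over the built-in default. The sketch below assumes that precedence; the README does not state which source takes priority, and config, envOr, and parseConfig are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config mirrors the four settings in the table above.
type config struct {
	Addr, DataRoot, LogRoot, WorkDir string
}

// envOr returns the value of key from getenv, or def when it is unset or empty.
func envOr(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// parseConfig resolves each setting as: explicit flag, then environment
// variable, then default. This precedence is an assumption.
func parseConfig(args []string, getenv func(string) string) (config, error) {
	var c config
	fs := flag.NewFlagSet("sftopd", flag.ContinueOnError)
	fs.StringVar(&c.Addr, "addr", envOr(getenv, "SFTOPD_ADDR", "127.0.0.1:6060"), "HTTP listen address")
	fs.StringVar(&c.DataRoot, "data-root", envOr(getenv, "SFTOPD_DATA_ROOT", "/data00/open-r1/data"), "dataset directory")
	fs.StringVar(&c.LogRoot, "log-root", envOr(getenv, "SFTOPD_LOG_ROOT", "/data00/open-r1/logs"), "training log directory")
	fs.StringVar(&c.WorkDir, "work-dir", envOr(getenv, "SFTOPD_WORK_DIR", "/data00/sftopd"), "runtime working directory")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

Passing getenv as a parameter keeps the parser easy to exercise without touching the process environment.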

CLI

sftopd training status --json
sftopd training logs --tail 120
sftopd training specs
sftopd training launch-full-coverage --dry-run
sftopd training grade --work-dir /data00/sftopd/run-001
sftopd training merge-accepted --inputs a.jsonl,b.jsonl --output accepted.jsonl
sftopd training remaining-ids --source-ids source_ids.txt --accepted-jsonl accepted.jsonl --output remaining.txt

The sftopd training specs command prints which legacy Python scripts the Go CLI already covers and which workflows are still pending.
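
To make the remaining-ids step concrete, here is a minimal sketch of the underlying set difference: every source ID that does not appear in the accepted JSONL is kept. It assumes one ID per line in the source file and a source_id field in each accepted row (the field name is borrowed from the demo record in the Development section). The function names are hypothetical and the real command may behave differently.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// acceptedIDs reads JSONL rows and collects their "source_id" values.
// The field name is an assumption, not confirmed by the CLI's source.
func acceptedIDs(r io.Reader) (map[string]bool, error) {
	seen := map[string]bool{}
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long rows
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var row struct {
			SourceID string `json:"source_id"`
		}
		if err := json.Unmarshal([]byte(line), &row); err != nil {
			return nil, err
		}
		if row.SourceID != "" {
			seen[row.SourceID] = true
		}
	}
	return seen, sc.Err()
}

// remainingIDs returns the source IDs that were not accepted, keeping input
// order and dropping blank lines and duplicates.
func remainingIDs(sourceIDs []string, accepted map[string]bool) []string {
	var out []string
	emitted := map[string]bool{}
	for _, id := range sourceIDs {
		id = strings.TrimSpace(id)
		if id == "" || accepted[id] || emitted[id] {
			continue
		}
		emitted[id] = true
		out = append(out, id)
	}
	return out
}

// remainingFromText is a convenience wrapper over in-memory inputs.
func remainingFromText(sourceText, acceptedJSONL string) ([]string, error) {
	accepted, err := acceptedIDs(strings.NewReader(acceptedJSONL))
	if err != nil {
		return nil, err
	}
	return remainingIDs(strings.Split(sourceText, "\n"), accepted), nil
}

func main() {
	ids, err := remainingFromText("demo-001\ndemo-002\n", `{"source_id":"demo-001"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```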

HTTP API

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/health | Runtime health and effective configuration. |
| GET | /api/catalog | Current catalog snapshot and aggregate stats. |
| POST | /api/catalog/refresh | Rescan the data root. |
| GET | /api/files/{id}/preview | Preview JSON or JSONL samples. |
| GET | /api/training/tasks | List discovered training tasks. |
| GET | /api/training/tasks/{id} | Read one task detail view. |
| DELETE | /api/training/tasks/{id} | Delete an inactive task artifact. |
| GET | /api/training/tasks/{id}/events | Stream task events. |
| GET | /api/training/specs | List implemented and pending training workflow specs. |
| POST | /api/render/asy | Render supported Asymptote snippets. |
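
The GET endpoints can be called from any HTTP client. The sketch below shows one way to do that from Go, decoding the response into a generic map because the response schema is not documented here. getJSON and stubServer are hypothetical names, and the stub's {"status":"ok"} payload is invented for the demo, not the real /api/health response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// getJSON issues a GET request and decodes a JSON object response.
// A generic map is used because the actual response fields are unknown.
func getJSON(client *http.Client, url string) (map[string]any, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

// stubServer imitates a running dashboard on a local loopback port.
// Its payload is made up for illustration only.
func stubServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok"}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := stubServer()
	defer srv.Close()

	// Against a real instance, the base URL would be http://127.0.0.1:6060.
	health, err := getJSON(srv.Client(), srv.URL+"/api/health")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println(health)
}
```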

Project Layout

cmd/sftopd/                 CLI entrypoint and command flags
internal/app/               HTTP server, routes, templates, embedded web assets
internal/app/web/static/    Browser JavaScript, CSS, and vendored KaTeX assets
internal/data/              Dataset cataloging and preview logic
internal/training/          Training task discovery, grading, generation, merge helpers
docs/                       Design notes, demos, and README screenshots

Development

go test ./...

mkdir -p /tmp/sftopd-demo/data /tmp/sftopd-demo/logs /tmp/sftopd-demo/work
printf '%s\n' '{"source_id":"demo-001","problem":"Find x if x+7=12.","answer":"5"}' \
  > /tmp/sftopd-demo/data/demo.jsonl

go run ./cmd/sftopd \
  --data-root /tmp/sftopd-demo/data \
  --log-root /tmp/sftopd-demo/logs \
  --work-dir /tmp/sftopd-demo/work

See CONTRIBUTING.md for the contributor workflow, screenshot refresh steps, and review checklist.

Status

SFT OPD is an operational tool for an active SFT workflow. The core dashboard, data browser, training monitor, and several training CLI replacements are implemented. Dataset preparation and some static repair or analysis flows are still tracked as pending by sftopd training specs.

License

SFT OPD is licensed under the Apache License 2.0.
