
feat: close --remote coverage gaps + ledger archive + auth-scoping fixes #1237

Open
bplatz wants to merge 9 commits into fix/explain-time-travel from feature/cli-drop-ledger

Conversation

bplatz (Contributor) commented May 11, 2026

Based on #1236

Brings remaining --remote commands to feature parity, lands the local + remote variants of fluree export --format ledger (the .flpack archive workflow), and fixes several auth-scoping / tracked-alias issues uncovered along the way. No protocol or wire-format changes — every endpoint these commands hit either already existed on the server or was added in earlier branches.

What's new for users

--remote on previously-local-only commands

| Command | Endpoint | Auth bracket |
| --- | --- | --- |
| `fluree drop <name> --remote <name>` | `POST /drop` | admin |
| `fluree log <ledger> --remote <name>` | `GET /log/*ledger` (new) | read |
| `fluree context get\|set --remote <name>` | `GET`/`PUT /context/*ledger` | |
| `fluree history --remote <name>` | `POST /query/*ledger` (ledger-scoped) | read |
| `fluree create <ledger> --remote <name>` | `POST /create` | admin |
| `fluree export <ledger> --remote <name>` (RDF) | `POST /export/*ledger` (new) | admin |

All follow the same three-mode dispatch the existing --remote commands use: explicit --remote <name> → named remote; otherwise auto-route through a locally running server (started with fluree server start) when server.meta.json is present; otherwise direct local execution. --direct skips auto-routing.
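The three-mode dispatch above can be sketched as a small resolver. This is a hypothetical illustration; the CLI's actual types and the way it probes server.meta.json will differ.

```rust
// Hypothetical sketch of the three-mode dispatch described above.
// `ExecMode` and `resolve_mode` are illustrative names, not the CLI's API.
#[derive(Debug, PartialEq)]
enum ExecMode {
    Remote(String), // explicit --remote <name>
    AutoRoute,      // local server detected via server.meta.json
    Direct,         // plain local execution
}

fn resolve_mode(remote: Option<&str>, direct: bool, server_meta_present: bool) -> ExecMode {
    if let Some(name) = remote {
        return ExecMode::Remote(name.to_string()); // explicit remote always wins
    }
    if !direct && server_meta_present {
        return ExecMode::AutoRoute; // --direct skips auto-routing
    }
    ExecMode::Direct
}

fn main() {
    assert_eq!(
        resolve_mode(Some("origin"), false, true),
        ExecMode::Remote("origin".to_string())
    );
    assert_eq!(resolve_mode(None, false, true), ExecMode::AutoRoute);
    assert_eq!(resolve_mode(None, true, true), ExecMode::Direct);
    println!("dispatch sketch ok");
}
```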

Ledger-archive export (.flpack)

docs/cli/server-integration.md claimed fluree export --format ledger -o mydb.flpack worked. The export side was never wired up; only fluree create --from <file>.flpack (import) actually existed. This branch closes that:

  • Local: fluree export mydb --format ledger -o mydb.flpack calls a new Fluree::archive_ledger API that streams pack frames through any AsyncWrite and appends a phase: "nameservice" manifest frame so fluree create --from <file>.flpack can reconstruct head pointers. Passing --no-indexes produces a smaller archive that reindexes on import. Writing the binary archive to a TTY stdout is refused.
  • Remote: fluree export mydb --remote origin --format ledger -o mydb.flpack fetches the remote NsRecord, streams POST /pack/*ledger, and swaps the terminal End frame for the synthesized nameservice manifest on the fly. Resulting .flpack is byte-compatible with a locally-generated archive — fluree create --from doesn't care which side produced it.
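The End-frame swap in the remote path can be modeled in miniature. The `Frame` enum and `splice_frames` below are illustrative stand-ins, not the CLI's real pack-frame types; the point is that every upstream frame is forwarded verbatim and only the terminal End is replaced.

```rust
// Hypothetical model of the End-frame substitution described above.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Header(Vec<u8>),
    Data(Vec<u8>),
    Manifest(String),
    End,
}

/// Forward every frame verbatim, but replace the server's terminal End
/// with a synthesized `phase: "nameservice"` manifest followed by a fresh
/// End, so the output matches a locally generated archive byte-for-byte.
fn splice_frames(upstream: Vec<Frame>, ns_manifest: String) -> Vec<Frame> {
    let mut out = Vec::new();
    for f in upstream {
        match f {
            Frame::End => {
                out.push(Frame::Manifest(ns_manifest.clone()));
                out.push(Frame::End);
            }
            other => out.push(other),
        }
    }
    out
}

fn main() {
    let spliced = splice_frames(
        vec![Frame::Header(vec![1]), Frame::Data(vec![2, 3]), Frame::End],
        r#"{"phase":"nameservice"}"#.to_string(),
    );
    assert_eq!(spliced.len(), 4);
    assert_eq!(spliced[3], Frame::End);
}
```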

fluree query --remote --at <t> (time-travel) auth fix

The previous remote time-travel path posted to the connection-level /query endpoint, where the server's can_read check used body.from for the ledger id. With from: "mydb:main@t:5", scoped read tokens (fluree.ledger.read.mydb:main) failed because they don't match the time-suffixed form. This branch routes all time-travel cases through the ledger-scoped POST /query/{ledger} instead — path drives auth, body's from / SPARQL FROM drives time-travel resolution.

Applies to JSON-LD --at, SPARQL --at, JSON-LD --at --explain, and SPARQL --at --explain (the last two via the server fix from the prerequisite PR).
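The scoping mismatch can be sketched as follows; `split_time_travel` and `token_authorizes` are hypothetical stand-ins for the server's auth logic, shown only to make the failure mode concrete.

```rust
// Why the old connection-level path failed: a scoped read token matches the
// bare ledger id, not the time-suffixed `from` string. Names are illustrative.
fn split_time_travel(from: &str) -> (&str, Option<u64>) {
    match from.split_once("@t:") {
        Some((ledger, t)) => (ledger, t.parse().ok()),
        None => (from, None),
    }
}

fn token_authorizes(token_scope: &str, ledger_id: &str) -> bool {
    token_scope == format!("fluree.ledger.read.{ledger_id}")
}

fn main() {
    let scope = "fluree.ledger.read.mydb:main";
    // Connection-level endpoint derived the id from the raw `from`: mismatch.
    assert!(!token_authorizes(scope, "mydb:main@t:5"));
    // Ledger-scoped path: the URL carries the bare id, the body carries `@t:N`.
    let (ledger, t) = split_time_travel("mydb:main@t:5");
    assert!(token_authorizes(scope, ledger));
    assert_eq!(t, Some(5));
}
```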

Notable engineering choices

  • Tracked-alias resolution for --remote. fluree-db-cli/src/context.rs::build_remote_mode now canonicalizes the ledger alias via to_ledger_id (so mydb → mydb:main on the URL path). Without this, scoped tokens 404 because the path differs from the auth identifier. The --format ledger remote archive also consults the tracked-config store: when <alias> is tracked at the same remote, it archives the upstream's tracked.remote_alias rather than the local alias literally.
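A minimal sketch of the canonicalization, assuming the default branch is main; the real to_ledger_id may handle more cases.

```rust
// Hypothetical stand-in for `to_ledger_id`: append the default branch when
// the alias carries no `:branch` suffix, so the URL path matches token scopes.
fn to_ledger_id(alias: &str) -> String {
    if alias.contains(':') {
        alias.to_string()
    } else {
        format!("{alias}:main")
    }
}

fn main() {
    assert_eq!(to_ledger_id("mydb"), "mydb:main");
    assert_eq!(to_ledger_id("mydb:dev"), "mydb:dev");
}
```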

  • Archive splicer error handling. RemoteLedgerClient::archive_ledger_to_writer (extracted as splice_archive_stream for testability) is careful to distinguish PackError::Incomplete(_) (need more bytes) from fatal pack-decoder errors. Previously every decoder error was swallowed as "need more", so a corrupt FPK1 magic or oversize payload would buffer until EOF and surface as "ended before End frame". Five unit tests cover End-frame substitution, chunk-boundary splits inside frames, manifest field selection on --no-indexes, server Error frame propagation, and prompt rejection of bad magic.
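The Incomplete-vs-fatal distinction looks roughly like this. `PackError`, `try_decode`, and `step` are illustrative; the point is that only Incomplete means "read more bytes", and every other decoder error fails fast instead of buffering to EOF.

```rust
// Sketch of the splicer's error discipline (hypothetical types).
#[derive(Debug, PartialEq)]
enum PackError {
    Incomplete(usize),      // need at least this many more bytes
    BadMagic,               // corrupt FPK1 magic: fatal
    OversizePayload(usize), // frame exceeds the max-payload guard: fatal
}

fn try_decode(buf: &[u8]) -> Result<usize, PackError> {
    if buf.len() < 4 {
        return Err(PackError::Incomplete(4 - buf.len()));
    }
    if &buf[..4] != b"FPK1" {
        return Err(PackError::BadMagic); // don't silently buffer until EOF
    }
    Ok(4) // bytes consumed
}

/// Ok(true): a frame was decoded. Ok(false): await more input.
/// Err: fatal decoder error, surfaced immediately.
fn step(buf: &[u8]) -> Result<bool, PackError> {
    match try_decode(buf) {
        Ok(_) => Ok(true),
        Err(PackError::Incomplete(_)) => Ok(false), // loop for more bytes
        Err(fatal) => Err(fatal),                   // InvalidResponse upstream
    }
}

fn main() {
    assert_eq!(step(b"FP"), Ok(false));
    assert_eq!(step(b"XXXX"), Err(PackError::BadMagic));
    assert_eq!(step(b"FPK1"), Ok(true));
}
```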

  • create --remote doesn't require a project .fluree/. Falls back to global config ($FLUREE_HOME or platform default) for remote registration lookups, so the command works from any directory. The local-only fluree create still requires a project .fluree/ so new ledgers land in a discoverable place.

  • fluree drop --remote active-ledger handling. Explicit --remote <name> never touches local state. Auto-route (no --remote, server detected) operates against the same on-disk storage as --direct, so it also clears the local active-ledger pointer when it matched the dropped name.
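The active-ledger rule reduces to a small predicate; `DropMode` and `should_clear_active` are hypothetical names for illustration only.

```rust
// Sketch of the active-ledger pointer rule described above.
enum DropMode {
    ExplicitRemote, // --remote <name>: never touches local state
    AutoRoute,      // routed through the detected local server
    Direct,         // --direct local execution
}

fn should_clear_active(mode: &DropMode, active: Option<&str>, dropped: &str) -> bool {
    match mode {
        DropMode::ExplicitRemote => false,
        // Auto-route and --direct share on-disk storage, so both clear the
        // pointer when it matches the dropped ledger.
        DropMode::AutoRoute | DropMode::Direct => active == Some(dropped),
    }
}

fn main() {
    assert!(!should_clear_active(&DropMode::ExplicitRemote, Some("mydb"), "mydb"));
    assert!(should_clear_active(&DropMode::AutoRoute, Some("mydb"), "mydb"));
    assert!(!should_clear_active(&DropMode::Direct, Some("other"), "mydb"));
}
```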

  • History prefix expansion is client-side. fluree history --remote ex:alice ... expands the compact IRI against the project's stored prefix map before the request leaves the CLI, so the server never has to consult the local prefix table. The body still ships its @context (derived from local prefixes) for response display.
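Client-side compact-IRI expansion against a stored prefix map can be sketched like this; `expand_iri` is an illustrative name, not the CLI's actual function.

```rust
use std::collections::HashMap;

// Hypothetical sketch: expand `ex:alice` against the project's prefix map
// before the request leaves the CLI.
fn expand_iri(compact: &str, prefixes: &HashMap<&str, &str>) -> String {
    match compact.split_once(':') {
        Some((prefix, local)) if prefixes.contains_key(prefix) => {
            format!("{}{}", prefixes[prefix], local)
        }
        // Already absolute (e.g. `http:` isn't a registered prefix) or unknown.
        _ => compact.to_string(),
    }
}

fn main() {
    let mut prefixes = HashMap::new();
    prefixes.insert("ex", "http://example.org/");
    assert_eq!(expand_iri("ex:alice", &prefixes), "http://example.org/alice");
    assert_eq!(expand_iri("http://a/b", &prefixes), "http://a/b");
}
```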

Limitations / known caveats

  • fluree export --format ledger --remote requires storage permissions. Both /storage/ns/:ledger-id and /pack/*ledger sit in the replication-grade bracket (fluree.storage.*), same auth as fluree clone / pull. Without those permissions the server returns 404 to avoid existence leaks.
  • fluree export --remote (RDF) is admin-protected. RDF export reads from the binary index without per-flake policy filtering, so it lives alongside /create, /drop, /reindex rather than the data-read bracket. Adding policy-filtered streaming export would let it move to read-auth in the future.
  • fluree export --format ledger doesn't support --at / --all-graphs / --graph / --context*. Archives capture the current head; the other flags apply only to RDF formats.

Tests

  • cargo test -p fluree-db-cli: 41 lib + 73 integration tests pass. The 5 new lib tests cover the .flpack remote archive splicer (End substitution, chunk boundaries, index-field gating, error-frame propagation, magic validation).
  • cargo test -p fluree-db-server --test integration explain: 4 explain tests pass (from the prerequisite PR).
  • cargo clippy -p fluree-db-cli --all-features --all-targets -- -D warnings: clean.

Docs

docs/cli/server-integration.md is the canonical reference for implementers building custom servers against the CLI. This PR adds/updates:

  • ### fluree create <ledger> --remote <name> section
  • ### fluree context get|set --remote section
  • ### fluree history --remote section (with note about ledger-scoped routing for token auth)
  • ### fluree log --remote section
  • ### fluree export --remote section (RDF)
  • ### fluree export --format ledger section (local + remote modes)
  • "Remote time travel (--at)" callout under the data-API section
  • "Remote --at --explain" callout following the time-travel note
  • ## Commit Log Contract (full schema + required server semantics)
  • ## RDF Export Contract (request body, content-types, error responses)
  • Expanded "Data API" intro list, terminology cleanup on the history section, more precise active-ledger pointer rules under fluree drop --remote
  • Validation script gains log, export (RDF + ledger), context get/set, history, query --at, query --at --explain, create --remote, drop --remote lines

docs/operations/pack-archive-restore.md is updated to drop the old "no dedicated CLI command" stub and document the local + remote archive flows.

docs/api/endpoints.md gains GET /log/*ledger and POST /export/*ledger entries.

bplatz added 9 commits May 11, 2026 05:59

Wires the existing POST /drop endpoint into the CLI's drop command so
remote/auto-routed drops work the same way as list, reindex, and the
other admin operations. The server-side endpoint already handled the
ledger -> graph-source fallback; this just exposes it through the CLI.

Adds `fluree log --remote` and `fluree export --remote` so users can
browse a remote's commit history and export RDF directly without
cloning. Both follow the same three-mode dispatch (explicit remote,
auto-route via local server, local execution) used by list/reindex/
iceberg drop.

New endpoints:
- GET /v1/fluree/log/*ledger — paginated CommitSummary list, read-auth.
- POST /v1/fluree/export/*ledger — RDF export (Turtle/NT/NQuads/TriG/
  JSON-LD), admin-protected. Export bypasses per-flake policy filtering
  today, so it lives alongside /create, /drop, /reindex rather than the
  data-read bracket of /query and /show.

Fix drop auto-route: when `fluree drop` ran via auto-route to a local
server, the active-ledger pointer was not cleared, leaving CLI state
pointing at a deleted ledger. Explicit `--remote <name>` still leaves
local state untouched (remote storage is separate).

Docs: contracts in server-integration.md plus full endpoint entries in
api/endpoints.md.

`docs/cli/server-integration.md` has long claimed `fluree export
--format ledger -o mydb.flpack` exists, and `fluree create --from
<file>.flpack` already imports the format, but the export side was
never wired up — `parse_format` only accepted RDF formats and there
was no `-o` flag. This closes that gap.

API:
- `pack::stream_archive` mirrors `stream_pack` but injects a
  `phase: "nameservice"` manifest frame before End. Unlike
  `stream_pack`, on producer failure it drops the sender and returns
  `Err(message)` instead of emitting an Error frame so the caller
  never persists a partial archive.
- `Fluree::archive_ledger(ledger_id, include_indexes, writer)`
  resolves the ledger record, sources the manifest *and* pack heads
  from the same `LedgerView` snapshot (so they cannot disagree under
  cache lag), and writes frames to any `AsyncWrite` sink. The
  manifest's `index_head_id` / `index_t` are emitted only when index
  artifacts are actually archived, so `--no-indexes` no longer
  produces an archive that points at missing index data.

CLI:
- `fluree export` accepts `--format ledger` (alias `flpack`) and a new
  `-o, --output <FILE>` flag that works for any format. `--no-indexes`
  produces a smaller archive that the importer reindexes on load.
- Refuses TTY stdout for binary archives and rejects `--remote`,
  `--at`, `--all-graphs`, `--graph`, and `--context*` for
  `--format ledger` since they don't apply to whole-ledger archives.
- On producer-side archive failure, the partial output file is
  removed before the error is returned.

Docs:
- `docs/cli/server-integration.md`: `fluree export --format ledger`
  section now reflects what's implemented.
- `docs/operations/pack-archive-restore.md`: replaces the "no
  dedicated command" stub with the actual CLI invocation; the Rust
  API section continues to cover non-CLI use cases (S3 upload, etc.).

Round-trip verified: `fluree create flptest && fluree insert ... &&
fluree export flptest --format ledger -o flptest.flpack && fluree
create restored --from flptest.flpack && fluree query restored ...`
returns the original triple. Same with `--no-indexes`.

Remote archive (`--format ledger --remote <name>`) is intentionally
deferred: it requires fetching the remote nameservice record and
intercepting the `/pack` stream's End frame to inject the manifest.

Closes the remaining gaps from the original `--remote` audit:

- `fluree context get|set --remote` rides the existing `GET`/`PUT
  /context/*ledger` endpoints. New `RemoteLedgerClient::get_context` /
  `set_context` methods, three-mode dispatch in `commands/context_cmd.rs`.
- `fluree history --remote` posts the existing JSON-LD history body to
  `POST /query/{ledger}` (ledger-scoped, not connection-level) so
  scoped read tokens authorize. Compact-IRI expansion still happens
  client-side; the body's `@context` is preserved for response display.
- `fluree create <ledger> --remote <name>` calls `POST /create` for the
  empty-ledger case. Refuses combinations with `--from`/`--memory`
  (those need local data ingestion) and points at `fluree publish` for
  the create-and-push workflow. Falls back to global config so the
  command works without a project-local `.fluree/`.

Also addresses several reviewer findings from this branch:

- `fluree query --remote --at <t>` now uses ledger-scoped query/explain
  endpoints (`POST /query/{ledger}`, `POST /explain/{ledger}`). The
  path drives `can_read`, the body's `from`/SPARQL `FROM` carries the
  `@t:N` suffix for snapshot resolution. Posting to the connection-
  level endpoint forced auth to derive the ledger ID from `from` and
  rejected scoped tokens.
- `build_remote_mode` canonicalizes `ledger_alias` via `to_ledger_id`
  before storing as `LedgerMode::Tracked.remote_alias`, so one-shot
  `--remote` always sends the full `name:branch` form on the URL path.
  A token scoped to `mydb:main` would 404 if we sent `mydb`.
- `--at --explain --remote` is refused outright rather than silently
  returning a HEAD-snapshot plan: the server's explain handler loads
  the ledger at HEAD regardless of any time-travel `from`. Run with
  `--direct` for a local time-travel explain, or drop `--at` to
  explain the HEAD plan against the remote.

Open server-side items (out of scope here):

- Both `/explain` and `/explain/{ledger}` need to honor body's `from`
  time-travel (delegate to the same `execute_dataset_query`-style
  path the regular query uses). Once that lands, the CLI's
  `--at --explain --remote` bail-out can be lifted.
- Ledger-scoped `/explain` rejects SPARQL `FROM/FROM NAMED` outright;
  relaxing to accept same-ledger time-travel `FROM` is needed for the
  SPARQL flavor of the same fix.

`fluree export --format ledger -o file.flpack` already worked locally
(via `Fluree::archive_ledger`); this lifts the remote sub-gap so the
same command also archives remote ledgers, e.g. cold-archiving a
production ledger to local disk.

Implementation:

- `RemoteLedgerClient::archive_ledger_to_writer` fetches the remote
  pack stream via the existing `fetch_pack_response` (`POST /pack/...`),
  decodes it frame-by-frame as bytes arrive, forwards Header/Data/inner
  Manifest frames to the user's writer verbatim, and **swaps the
  terminal End frame** for a synthesized `phase: "nameservice"`
  manifest + End. The manifest is built from the supplied `NsRecord`
  so the on-disk byte stream is byte-compatible with
  `Fluree::archive_ledger`'s local output. Server `Error` frames are
  surfaced as a `RemoteLedgerError` and stop the copy without writing
  the End — the CLI cleans up the partial file.

- `commands/export.rs::run_ledger_archive_remote` orchestrates the
  remote path: fetch the NsRecord (so we know the head CIDs and `t`
  values), build a `PackRequest` mirroring `Fluree::archive_ledger`'s
  index policy (commits-only when `--no-indexes` or the remote has no
  index root), then drive the streaming copy. On error the partial
  output file is removed.

Both endpoints sit in the replication-grade auth bracket
(`fluree.storage.*`), same as `fluree clone` / `pull`. Without those
permissions the server returns `404 Not Found` for the NsRecord lookup
to avoid existence leaks; the CLI surfaces this as
`not found: ledger '...' not found on remote '...'`.

Docs:

- `server-integration.md`: replaces the "remote not yet supported"
  caveat with a section spelling out the two endpoints, the auth
  bracket, and the byte-compat guarantee.
- `pack-archive-restore.md`: drops the "Local-only today" note and
  adds the `--remote` example. Rust API section continues to cover
  non-CLI flows (S3 upload, etc.).
- Validation script gains an `export --remote ... --format ledger`
  line.

…ases, tests

Three follow-ups on the remote `--format ledger` archive added in the
previous commit:

- Distinguish `PackError::Incomplete` from fatal pack-decoder errors in
  the archive splicer. Previously every decoder error was treated as
  "need more bytes", so a corrupt FPK1 magic, an oversize payload, or
  an invalid frame type would buffer until EOF and surface as a
  misleading "ended before End frame". Now Incomplete loops, every
  other variant returns `InvalidResponse` immediately and the
  max-payload guard actually fires.

- Resolve tracked aliases for `fluree export <alias> --remote <name>
  --format ledger`. If `<alias>` is tracked at `<name>`, archive the
  upstream copy under its `tracked.remote_alias`. Without this, a
  ledger tracked as `local -> upstream:main` would look up
  `local:main` on the remote and 404. Falls back to using the alias
  literally when it isn't tracked or `--remote` points elsewhere —
  matches the existing `resolve_ledger_mode` semantics.

- Split the splicer out as `splice_archive_stream` and
  `build_archive_manifest` so the End-frame substitution and manifest
  synthesis are unit-testable without a live server. Five new tests
  cover: End → manifest+End substitution, chunk boundaries inside
  frames (single chunk vs many small ones produce identical output),
  index fields omitted when `archived_index = false`, server `Error`
  frame surfaced as `ServerError`, and corrupt magic surfaced as
  `InvalidResponse` rather than buffered until EOF.

The defensive refusal added when remote explain silently dropped
time-travel `from` is no longer needed: the server-side fix in
`fix(server): accept time-travel from in explain endpoints` (parent
commit on this branch) accepts the request and routes it through the
dataset-aware explain path.

Both SPARQL `--at --explain --remote` and JSON-LD `--at --explain
--remote` now flow through the same ledger-scoped paths the non-explain
`--at` cases already use:

- SPARQL: injects `FROM <ledger@t:N>` before WHERE, POSTs to
  `/explain/{ledger}` (which now accepts same-ledger time-travel FROM
  rather than rejecting all FROM clauses).
- JSON-LD: injects `from: "ledger@t:N"` into the body, POSTs to
  `/explain/{ledger}`.
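The SPARQL injection step above can be sketched naively; `inject_from` is a hypothetical helper, and a real implementation would tokenize rather than string-search for the WHERE keyword.

```rust
// Illustrative sketch of injecting `FROM <ledger@t:N>` before WHERE,
// as the commit describes. Naive: assumes an uppercase WHERE keyword.
fn inject_from(query: &str, ledger: &str, t: u64) -> String {
    match query.find("WHERE") {
        Some(pos) => format!("{}FROM <{ledger}@t:{t}> {}", &query[..pos], &query[pos..]),
        None => query.to_string(),
    }
}

fn main() {
    let q = inject_from("SELECT * WHERE { ?s ?p ?o }", "mydb:main", 5);
    assert_eq!(q, "SELECT * FROM <mydb:main@t:5> WHERE { ?s ?p ?o }");
}
```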

Plan content for a given query text is largely independent of `t`
because Fluree maintains a single set of index stats (latest), and
the planner uses them regardless of query `t`. The value here is
consistency with the query path and honoring an explicit request
parameter, not producing meaningfully different plans.

Doc updates in `docs/cli/server-integration.md`: replace the "known
limitation: refused" callout with a note explaining the actual flow
and the stats-singularity reality.

…er detail

- Expand the "Data API" intro list to reflect what's actually supported
  via --remote now (log, history, context, explain, etc.) plus the
  admin operations.
- Drop the "resolve the snapshot" phrasing in the history --remote
  section; Fluree builds a historical *view* at the requested t, not
  a point-in-time snapshot (singular index, view does the
  time-traveling).
- Spell out the active-ledger-pointer behavior on `fluree drop` more
  precisely: explicit --remote leaves local state alone; auto-route
  and --direct both clear the pointer when it matches the dropped
  ledger.
- Add a `fluree query --remote --at --explain` line to the validation
  script to exercise the now-working combination.

`fluree drop <name>` resolves the name as a ledger first and falls
back to a graph source, both locally and against `--remote` (the
server's `/drop` does the same). The CLI's top-line help still said
"Drop (delete) a ledger", giving no hint that the same command works
for an Iceberg/BM25/etc. graph source — users were reaching for
`fluree iceberg drop` instead.

Update the about text and the <NAME> arg help to mention graph
sources, and point at `fluree iceberg drop` as the explicit variant.