Skip to content

vsov/strata-k

Repository files navigation

Strata/K

CI License: MIT OR Apache-2.0

A stratified, semiring-parameterized Datalog and its execution engine — the symbolic core of a neuro-symbolic logic system. This repository is the CPU reference stack: frontend, IR, checker, interpreter, and reference implementations of the probabilistic (режим B) and answer-set layers. The GPU engine and the knowledge-compilation / incremental optimizations are later phases (see ARCHITECTURE.md).

Status. The executable core runs end-to-end: text → parse → check → Core-IR → interpret → result. Positive Datalog, stratified negation, and aggregates over the Bool and Trop (tropical, min-plus) semirings; exact probabilistic queries (?prob, distribution semantics) and an answer-set (stable model) solver for @asp modules. Cross-checked against Soufflé and fuzzed (10k random programs, naive vs semi-naive and vs Soufflé). Everything beyond this — the GPU backend, Prov/Prov_k provenance, neural, @terms, ?gradparses into valid IR and returns a stable "not implemented in Phase 0" diagnostic.

Quick start

cargo build
cargo run -p strata-cli -- run examples/tc.strata

examples/tc.strata computes a transitive closure:

% transitive closure
domain node.
pred edge(node, node): Bool.
pred path(node, node): Bool.

path(X, Y) :- edge(X, Y).
path(X, Z) :- edge(X, Y), path(Y, Z).

edge(a, b).
edge(b, c).
edge(c, d).
$ strata run examples/tc.strata
edge(a, b)
edge(b, c)
edge(c, d)
path(a, b)
path(a, c)
path(a, d)
path(b, c)
path(b, d)
path(c, d)

The language

Grammar priority: optimized for LLMs to write, human readability second. Familiar Datalog :-, Prolog lexical convention (variables Uppercase, constants lowercase), and mandatory predicate signatures so a typo becomes a compile error, not a silently empty relation.

pred edge(node, node): Bool.          % a predicate signature: types + semiring
5 :: edge(a, b).                       % a tropical (weighted) fact
reach(X, Z) :- edge(X, Y), reach(Y, Z). % a recursive rule
unreach(X) :- node(X), not reach(X).    % stratified negation
outdeg(X, count<Y>) :- edge(X, Y).      % an aggregate
input edge from "edges.tsv".            % load facts from a Soufflé-compatible TSV
Feature Notes
Semirings Bool (boolean Datalog), Trop (min-plus: shortest paths, Viterbi)
Weights Trop weights are i64 with a distinct +∞ — comparisons are bit-exact
Negation not p(..) — stratified in the deductive core; unstratified under @asp
Aggregates min/max/sum/count (prob_or reserved); non-recursive
Probabilistic 0.87 :: fact. + ?prob q(..) → exact marginals (distribution semantics)
Answer sets @asp. module → stable models via a reference solver
EDB inline facts, or input p from "file.tsv"
$ strata run examples/prob.strata          # ?prob path(a, c)
0.625 :: path(a, c)

$ strata run examples/asp.strata           # @asp in/out choice over {x, y}
Answer 1: {in(x), in(y), node(x), node(y)}
Answer 2: {in(x), node(x), node(y), out(y)}
Answer 3: {in(y), node(x), node(y), out(x)}
Answer 4: {node(x), node(y), out(x), out(y)}

The full language is described in docs/language.md; the formal grammar is in docs/grammar.ebnf.

CLI

strata check <file.strata> [--error-format=text|json]   # parse + type-check, no run
strata run   <file.strata> [--semi-naive]               # evaluate and print relations
strata fmt   <file.strata> [--check]                    # canonical formatter
strata ir    <file>        --to json|surface            # convert surface <-> JSON IR

Diagnostics carry a stable code (E0xxx front-end, E1xxx checker), a source span, and (where possible) a machine-applicable fix:

$ strata check bad.strata
error[E1001]: predicate `edge` is used but never declared
  --> 2:1
   | path(X, X) :- edge(X, X).
   | ^^^^^^^^^^^^^^^^^^^^^^^^^

Exit codes: 0 ok · 1 diagnostics · 2 usage · 4 runtime fault.

Two representations

The High-IR JSON document is the source of truth; the surface syntax is a canonical projection of it (strata ir converts both ways, and fmt is parse → print). An LLM can author either. The JSON Schema is published at schema/high-ir.schema.json; the encoding convention is in docs/ir-encoding.md.

Architecture

A Cargo workspace of six crates, layered so the base has no sibling dependencies:

strata-ir      IR data model (High-IR + Core-IR), symbol dictionary, diagnostics,
               JSON schema, tropical weight — the shared base (no sibling deps)
strata-front   lexer, parser (surface → High-IR), canonical printer, fmt, E0xxx diagnostics
strata-check   dependency graph, stratification, type/semiring checks (table 2.4),
               normalization High-IR → Core-IR, E1xxx diagnostics
strata-eval    the reference interpreter over Core-IR — naive T_P, semi-naive,
               and exact probabilistic marginals (режим B); the differential oracle
strata-asp     the reference answer-set solver (grounding + stable models)
strata-cli     the `strata` binary

strata-front and strata-check are siblings that both depend only on strata-ir. The GPU backend (later phase) will sit beside strata-eval, consuming the same Core-IR. See ARCHITECTURE.md for the full picture and CONTRIBUTING.md to build and test.

Correctness

The reference interpreter is the oracle. It ships two engines — the obviously-correct naive T_P fixpoint and a semi-naive delta engine — that are cross-checked against each other, and against Soufflé for the Bool fragment:

cargo test                                     # unit + integration + corpus diffs
cargo test -p strata-eval --test fuzz          # naive == semi-naive over 10k random programs
cargo test -p strata-cli  --test souffle_diff  # our engine vs Soufflé (needs `souffle`)
STRATA_SOUFFLE_FUZZ_N=10000 cargo test -p strata-cli --test souffle_diff fuzz_bool_vs_souffle

Trop is validated against an independent shortest-path oracle rather than Soufflé. If souffle is not installed, the differential tests skip cleanly. Probabilistic queries are computed by exact possible-world enumeration (the obviously-correct режим-B reference); answer sets by the Gelfond–Lifschitz reduct — both the slow, exact oracles a compiled/GPU method must later match.

The book

book/ holds Programs That Know Why — a short book on logic programming in the age of LLMs and the design of Strata/K, built with mdBook. 📖 Read it online: https://vsov.github.io/strata-k/ Work in progress. Every runnable listing lives in examples/book/ and runs under the current strata CLI (CI-checked — see crates/strata-cli/tests/book_listings.rs). Text is licensed CC BY-SA 4.0; example code is MIT/Apache like the rest of the repo.

mdbook serve book   # if you have mdBook installed

Documentation

License

Licensed under either of

at your option. The book text under book/ is licensed CC BY-SA 4.0; its example code follows the repo's MIT/Apache dual license.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

About

A stratified, semiring-parameterized Datalog and its reference engine — the symbolic core of a neuro-symbolic logic system. Bool/Trop semirings, stratified negation, aggregates, exact probabilistic (?prob) queries, and a stable-model (@asp) solver.

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages