PhiSQL is the declarative query language for PII privacy operations across the Philterd toolkit.
This repository is the home of two things that evolve together:
- The redaction policy schema (schema/) - the canonical, versioned JSON Schema that defines a valid Phileas redaction policy. It is published to https://philterd.ai/schemas/redaction-policy/<version>/schema.json and is the contract that PhiSQL compiles to and that Phileas executes against.
- PhiSQL - the authoring language that compiles to that schema: its specification (spec/) and the reference parser/compiler (reference/).
They live in one repository because they change together: adding an entity type or strategy means updating the schema and PhiSQL's grammar and catalog in the same pull request, and CI validates every PhiSQL example against the schema under schema/.
Important
PhiSQL v1.0 is stable. The grammar and semantics of the v1.0 surface are frozen, and conforming implementations may claim conformance to v1.0. Subsequent changes follow the versioning policy: additive features land in minor versions, breaking changes require a new major version.
PhiSQL v1.0 is a complete authoring surface for the Phileas redaction policy schema. Discovery, monitoring, and cross-tool query verbs are scoped for later versions.
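As a rough illustration of what the versioning policy implies for tooling, the sketch below checks whether an implementation built for one spec version could be expected to accept source written for another. It assumes semantic-versioning-style compatibility (same major version, implementation minor at or above the source minor), which is an inference from the policy rather than a rule stated in this repository. The function names `parse_version` and `can_process` are hypothetical and are not part of any reference implementation.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse 'v1.0', '1.0', or '1.0.0' into a (major, minor) pair."""
    parts = text.lstrip("vV").split(".")
    return int(parts[0]), int(parts[1])


def can_process(implementation: str, source: str) -> bool:
    """Hypothetical check: same major version, and the implementation's
    minor version is at least the minor version the source targets.

    Assumption: minor versions only add features, so a newer minor can
    read older source, while a new major version may break it.
    """
    impl_major, impl_minor = parse_version(implementation)
    src_major, src_minor = parse_version(source)
    return impl_major == src_major and impl_minor >= src_minor


print(can_process("v1.1", "v1.0"))  # newer minor, same major
print(can_process("v1.0", "v1.1"))  # source uses a later minor
print(can_process("v2.0", "v1.0"))  # different major
```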
The spec is the set of machine-readable artifacts under spec/. There is no prose specification document; the artifacts are the spec.
There are three reference implementations under reference/, which produce identical Phileas JSON from the same input:
- reference/java/ generates a Java parser from spec/v1.0/grammar/PhiSQL.g4 at build time. It is published as ai.philterd:phisql and consumed by other Philterd projects (Phileas, Phinder, the future PhiSQL CLI).
- reference/python/ is a Python parser and compiler. Its parser is generated from spec/v1.0/grammar/PhiSQL.g4 with ANTLR (committed under phisql/_generated/, regenerated by scripts/generate_parser.sh).
- reference/dotnet/ is a .NET 10 / C# parser and compiler (published as Philterd.PhiSql). Like the Python reference, its parser is generated from spec/v1.0/grammar/PhiSQL.g4 with ANTLR (committed under PhiSql/Generated/, regenerated by scripts/generate_parser.sh).
All are driven by the catalog YAML under spec/v1.0/catalog/; none keeps a copy of the grammar or catalog.
| Version | Status | Tag |
|---|---|---|
| v1.0 | Stable | v1.0.0 |
The reference implementation versions and the schema version are independent. An implementation may receive bug fixes and improvements without a schema change. Use this table to find the right implementation version for your target schema.
| Schema version | Java jar (ai.philterd:phisql) | Python package (phisql) | .NET package (Philterd.PhiSql) |
|---|---|---|---|
| 1.0.0 | 1.0.0 | 1.0.0 | 1.0.0 |
| 1.1.0 | 1.1.0 | 1.1.0 | 1.1.0 |
The redaction JSON policy schema is the canonical execution contract for redaction. PhiSQL is a convenience authoring layer that compiles to it.
PhiSQL source -> Compiler -> Phileas JSON policy -> Phileas runtime
The governance posture:
- The policy JSON schema leads; PhiSQL follows. Anything PhiSQL can express must be representable as Phileas JSON.
- The runtime does not change. Phileas continues to execute against the JSON schema it already understands.
- The policy library stays in JSON. philterd/pii-redaction-policies remains the source of truth for distributable policies.
- No proprietary extensions. PhiSQL must not introduce constructs that have no Phileas JSON equivalent.
- Backward compatible forever. Existing JSON policies remain canonical. There is no migration; PhiSQL is additive.
The Phileas JSON schema has no top-level name or description fields; policy identity comes from the JSON filename, and human-readable description lives in a sibling Markdown file. PhiSQL POLICY <name> is optional; when present, its name must match the file basename after hyphen/underscore normalization (the filename can be hipaa-safe-harbor.phisql while the PhiSQL identifier is hipaa_safe_harbor). The full rule is documented in spec/v1.0/catalog/policy.yaml. DESCRIPTION '<text>' compiles to a sibling <basename>.md file.
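The naming rule can be sketched in a few lines of Python. This is a minimal illustration that only treats hyphens and underscores as interchangeable; the authoritative rule is the one in spec/v1.0/catalog/policy.yaml, which may cover cases (such as letter case) that this sketch ignores. The helper names `normalize`, `policy_name_matches`, and `description_path` are hypothetical, and the `policies/` directory in the usage lines is made up for the example.

```python
from pathlib import PurePosixPath


def normalize(name: str) -> str:
    """Treat hyphens and underscores as equivalent (assumed rule)."""
    return name.replace("-", "_")


def policy_name_matches(policy_name: str, source_path: str) -> bool:
    """Compare a POLICY <name> identifier with the source file's basename."""
    basename = PurePosixPath(source_path).stem  # drops the .phisql suffix
    return normalize(policy_name) == normalize(basename)


def description_path(source_path: str) -> str:
    """Path of the sibling Markdown file that DESCRIPTION text would go to."""
    return str(PurePosixPath(source_path).with_suffix(".md"))


source = "policies/hipaa-safe-harbor.phisql"
print(policy_name_matches("hipaa_safe_harbor", source))
print(policy_name_matches("gdpr_basic", source))
print(description_path(source))
```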
PERSON is deferred to a later spec version. The Phileas schema replaced person with a pheyes block whose configuration surface is not yet settled; PhiSQL v1.0 exposes FIRST_NAME, SURNAME, and PHYSICIAN_NAME instead.
Two CI workflows keep the spec and the reference implementations from drifting apart:
- .github/workflows/validate.yml runs scripts/validate_spec.py to verify (a) the catalog YAML files are well-formed, (b) every Phileas field referenced by the catalogs exists in the canonical Phileas schema, (c) every example JSON file validates against the same Phileas schema, (d) discovery examples reference known findings columns, (e) PhiSQL covers the schema - every schema identifier, strategy, and top-level block is either exposed by PhiSQL or recorded as a deliberate deferral - and (f) PhiSQL covers every schema leaf field, descending into each policy object so no individual property can silently fall behind the schema.
- .github/workflows/reference.yml builds all three reference implementations (Java, Python, and .NET), each of which parses every .phisql example file as part of its test suite; the Python and .NET jobs also compile and schema-validate the examples. Any grammar change that breaks an example, or any new example an implementation can't handle, fails this job.
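The idea behind check (e) can be illustrated as a set difference: anything the schema defines that is neither exposed by PhiSQL nor listed as a deferral counts as a gap. The sketch below is not how scripts/validate_spec.py is implemented; the function name `uncovered` and the identifier strings are made-up placeholders, not names taken from the Phileas schema or the catalogs.

```python
from typing import Iterable, Set


def uncovered(schema_items: Iterable[str],
              exposed: Iterable[str],
              deferred: Iterable[str]) -> Set[str]:
    """Return schema items that are neither exposed nor deliberately deferred."""
    return set(schema_items) - set(exposed) - set(deferred)


# Placeholder identifiers, invented for this example only.
schema_items = {"first-name", "surname", "physician-name", "pheyes"}
exposed = {"first-name", "surname", "physician-name"}
deferred = {"pheyes"}

# Every schema item is accounted for, so there are no gaps.
print(uncovered(schema_items, exposed, deferred))

# A new schema item without a catalog entry or deferral shows up as a gap.
print(uncovered(schema_items | {"new-identifier"}, exposed, deferred))
```

A check along these lines would fail the build whenever the returned set is non-empty, which is what keeps a schema addition from landing without a matching PhiSQL change or an explicit deferral.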
Run them locally:
# Spec checks
python3 -m venv .venv
.venv/bin/pip install -r scripts/requirements.txt
.venv/bin/python scripts/validate_spec.py
# Reference implementation (Java)
cd reference/java && mvn verify
# Reference implementation (Python)
cd reference/python && pip install -e ".[test]" && pytest
# Reference implementation (.NET)
cd reference/dotnet && dotnet test PhiSql.Tests

The published spec reference lives at https://philterd.github.io/phisql/. It is generated from the spec artifacts (spec/<version>/): the grammar, the catalog YAML, and the example pairs. Because of this, the rendered reference cannot drift from the artifacts it documents.
- scripts/gen_docs.py renders the catalogs, grammar, and examples into Markdown pages (run via the mkdocs-gen-files plugin at build time; no generated pages are committed).
- mkdocs.yml configures the MkDocs Material site, search, and the mike version selector.
- .github/workflows/docs.yml checks the build on every pull request and publishes the versioned site to GitHub Pages (the gh-pages branch) on every push to main.
Build and preview locally:
python3 -m venv .venv
.venv/bin/pip install -r docs/requirements.txt
.venv/bin/mkdocs serve # live preview at http://127.0.0.1:8000

Note
Publishing requires GitHub Pages to be enabled for this repository with the source set to the gh-pages branch (Settings → Pages).
See CONTRIBUTING.md for the RFC process, lifecycle, decision criteria, and versioning policy. RFCs are filed and tracked as GitHub issues using the RFC proposal form (the phisql-rfc label).
Bug fixes, documentation tweaks, and new examples exercising already-specified grammar do not need an RFC - open a normal pull request. Feedback on PhiSQL v1.0 is welcome via GitHub issues.
"PhiSQL" is a registered trademark of Philterd, LLC. The specification is freely readable and implementable, but the name is reserved for implementations that pass the conformance test suite (forthcoming at philterd/phisql-conformance).
The specification, reference implementation, and all artifacts in this repository are licensed under the Apache License, Version 2.0.