
philterd/phisql


PhiSQL

PhiSQL is the declarative query language for PII privacy operations across the Philterd toolkit.

This repository is the home of two things that evolve together:

  • The redaction policy schema (schema/) - the canonical, versioned JSON Schema that defines a valid Phileas redaction policy. It is published to https://philterd.ai/schemas/redaction-policy/<version>/schema.json and is the contract that PhiSQL compiles to and that Phileas executes against.
  • PhiSQL - the authoring language that compiles to that schema: its specification (spec/) and the reference parser/compiler (reference/).

They live in one repository because they change together: adding an entity type or strategy means updating the schema and PhiSQL's grammar and catalog in the same pull request, and CI validates every PhiSQL example against the schema under schema/.

Status

Important

PhiSQL v1.0 is stable. The grammar and semantics of the v1.0 surface are frozen, and conforming implementations may claim conformance to v1.0. Subsequent changes follow the versioning policy: additive features land in minor versions, breaking changes require a new major version.

PhiSQL v1.0 is a complete authoring surface for the Phileas redaction policy schema. Discovery, monitoring, and cross-tool query verbs are scoped for later versions.
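The versioning policy implies a simple rule for which source an implementation can accept. The following is a minimal sketch of that rule, not part of the spec. It assumes versions are written as major.minor(.patch), and that an implementation accepts source written against its own major version at the same or an earlier minor version. `parse_version` and `can_accept` are hypothetical helpers, not APIs of any reference implementation.

```python
def parse_version(v: str) -> tuple[int, int, int]:
    """Parse "v1.0", "1.0", or "1.0.0" into (major, minor, patch)."""
    parts = v.lstrip("v").split(".")
    nums = [int(p) for p in parts] + [0] * (3 - len(parts))  # pad missing parts with 0
    return nums[0], nums[1], nums[2]


def can_accept(implementation: str, source_spec: str) -> bool:
    """Assumed rule: same major version, and the implementation's minor
    version is at least the one the source was written against."""
    impl, src = parse_version(implementation), parse_version(source_spec)
    return impl[0] == src[0] and impl[1] >= src[1]


print(can_accept("1.1", "1.0"))  # True: minor versions only add features
print(can_accept("1.0", "1.1"))  # False: 1.1 source may use features 1.0 lacks
print(can_accept("2.0", "1.0"))  # False: a major version may break 1.x source
```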

Repository layout

The spec is the set of machine-readable artifacts under spec/. There is no prose specification document; the artifacts are the spec.

There are three reference implementations under reference/, which produce identical Phileas JSON from the same input:

  • reference/java/ generates a Java parser from spec/v1.0/grammar/PhiSQL.g4 at build time. It is published as ai.philterd:phisql and consumed by other Philterd projects (Phileas, Phinder, the future PhiSQL CLI).
  • reference/python/ is a Python parser and compiler. Its parser is generated from spec/v1.0/grammar/PhiSQL.g4 with ANTLR (committed under phisql/_generated/, regenerated by scripts/generate_parser.sh).
  • reference/dotnet/ is a .NET 10 / C# parser and compiler (published as Philterd.PhiSql). Like the Python reference, its parser is generated from spec/v1.0/grammar/PhiSQL.g4 with ANTLR (committed under PhiSql/Generated/, regenerated by scripts/generate_parser.sh).

All three are driven by the catalog YAML under spec/v1.0/catalog/, and none keeps its own copy of the grammar or catalog.

Versions

| Version | Status | Tag |
| --- | --- | --- |
| v1.0 | Stable | v1.0.0 |

Reference implementation compatibility

The reference implementation versions and the schema version are independent. An implementation may receive bug fixes and improvements without a schema change. Use this table to find the right implementation version for your target schema.

| Schema version | Java jar (ai.philterd:phisql) | Python package (phisql) | .NET package (Philterd.PhiSql) |
| --- | --- | --- | --- |
| 1.0.0 | 1.0.0 | 1.0.0 | 1.0.0 |
| 1.1.0 | 1.1.0 | 1.1.0 | 1.1.0 |
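For scripted setups, the compatibility table can be encoded as a lookup. This is a hypothetical helper (`implementation_for` is not part of any published package). The rows mirror the table, and the helper refuses schema versions the table does not list rather than guessing.

```python
# Rows mirror the compatibility table: schema version -> implementation version.
COMPATIBILITY = {
    "1.0.0": {"java": "1.0.0", "python": "1.0.0", "dotnet": "1.0.0"},
    "1.1.0": {"java": "1.1.0", "python": "1.1.0", "dotnet": "1.1.0"},
}

# Published package names for each reference implementation.
PACKAGES = {"java": "ai.philterd:phisql", "python": "phisql", "dotnet": "Philterd.PhiSql"}


def implementation_for(schema_version: str, platform: str) -> tuple[str, str]:
    """Return (package, version) of the implementation that targets schema_version."""
    if schema_version not in COMPATIBILITY:
        raise ValueError(f"no implementation listed for schema {schema_version}")
    return PACKAGES[platform], COMPATIBILITY[schema_version][platform]


print(implementation_for("1.1.0", "python"))  # ('phisql', '1.1.0')
```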

Relationship to the redaction policy schema

The redaction policy JSON schema is the canonical execution contract for redaction. PhiSQL is a convenience authoring layer that compiles to it.

PhiSQL source  ->  Compiler  ->  Phileas JSON policy  ->  Phileas runtime

The governance posture:

  • The policy JSON schema leads; PhiSQL follows. Anything PhiSQL can express must be representable as Phileas JSON.
  • The runtime does not change. Phileas continues to execute against the JSON schema it already understands.
  • The policy library stays in JSON. philterd/pii-redaction-policies remains the source of truth for distributable policies.
  • No proprietary extensions. PhiSQL must not introduce constructs that have no Phileas JSON equivalent.
  • Backward compatible forever. Existing JSON policies remain canonical. There is no migration; PhiSQL is additive.

The Phileas JSON schema has no top-level name or description fields; policy identity comes from the JSON filename, and a human-readable description lives in a sibling Markdown file. The PhiSQL POLICY <name> clause is optional; when present, its name must match the file basename after hyphen/underscore normalization (for example, the file can be named hipaa-safe-harbor.phisql while the PhiSQL identifier is hipaa_safe_harbor). The full rule is documented in spec/v1.0/catalog/policy.yaml. DESCRIPTION '<text>' compiles to a sibling <basename>.md file.
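The naming rule and the DESCRIPTION output path can be sketched as follows. This is a simplified illustration only: it assumes normalization treats hyphens and underscores as interchangeable and leaves case untouched, while the authoritative rule is the one in spec/v1.0/catalog/policy.yaml. `normalize`, `check_policy_name`, and `description_path` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def normalize(identifier: str) -> str:
    # Assumption: hyphens and underscores are interchangeable, so both fold to "_".
    # Case is left untouched; the authoritative rule lives in policy.yaml.
    return identifier.replace("-", "_")


def check_policy_name(source_path: str, policy_name: Optional[str]) -> None:
    """Raise ValueError if an optional POLICY <name> does not match the file basename."""
    if policy_name is None:  # POLICY <name> may be omitted entirely
        return
    basename = Path(source_path).stem  # "hipaa-safe-harbor.phisql" -> "hipaa-safe-harbor"
    if normalize(policy_name) != normalize(basename):
        raise ValueError(f"POLICY {policy_name} does not match file basename {basename!r}")


def description_path(source_path: str) -> Path:
    """Sibling Markdown file that DESCRIPTION '<text>' compiles to."""
    return Path(source_path).with_suffix(".md")


check_policy_name("policies/hipaa-safe-harbor.phisql", "hipaa_safe_harbor")  # accepted
print(description_path("policies/hipaa-safe-harbor.phisql").as_posix())
# policies/hipaa-safe-harbor.md
```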

PERSON is deferred to a later spec version. The Phileas schema replaced person with a pheyes block whose configuration surface is not yet settled; PhiSQL v1.0 exposes FIRST_NAME, SURNAME, and PHYSICIAN_NAME instead.

Validation

Two CI workflows ensure that the spec and the reference implementations cannot drift apart:

  • .github/workflows/validate.yml runs scripts/validate_spec.py to verify (a) the catalog YAML files are well-formed, (b) every Phileas field referenced by the catalogs exists in the canonical Phileas schema, (c) every example JSON file validates against the same Phileas schema, (d) discovery examples reference known findings columns, (e) PhiSQL covers the schema - every schema identifier, strategy, and top-level block is either exposed by PhiSQL or recorded as a deliberate deferral - and (f) PhiSQL covers every schema leaf field, descending into each policy object so no individual property can silently fall behind the schema.

  • .github/workflows/reference.yml builds all three reference implementations (Java, Python, and .NET). Each one parses every .phisql example file as part of its test suite, and the Python and .NET jobs also compile the examples and validate the output against the schema. Any grammar change that breaks an example, or any new example an implementation can't handle, fails this job.

Run them locally:

# Spec checks
python3 -m venv .venv
.venv/bin/pip install -r scripts/requirements.txt
.venv/bin/python scripts/validate_spec.py

# Reference implementation (Java); run from the repository root.
# Each subshell changes directory only for its own command.
(cd reference/java && mvn verify)

# Reference implementation (Python)
(cd reference/python && pip install -e ".[test]" && pytest)

# Reference implementation (.NET)
(cd reference/dotnet && dotnet test PhiSql.Tests)
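Checks (e) and (f) in validate_spec.py boil down to a set difference: every schema leaf must be either exposed by PhiSQL or recorded as a deliberate deferral. The sketch below illustrates that idea and is not the script's actual code. It only follows nested "properties" (no $ref, arrays, or combinators), and the toy schema's field names are placeholders rather than the real Phileas schema, apart from the pheyes block mentioned above.

```python
from typing import Any


def leaf_paths(schema: dict[str, Any], prefix: str = "") -> set[str]:
    """Return dotted paths to every leaf property of a JSON Schema object.

    Only nested "properties" are followed; $ref, arrays, and combinators
    are ignored in this sketch.
    """
    props = schema.get("properties")
    if not props:
        return {prefix} if prefix else set()
    paths: set[str] = set()
    for name, sub in props.items():
        paths |= leaf_paths(sub, f"{prefix}.{name}" if prefix else name)
    return paths


def uncovered(schema: dict[str, Any], exposed: set[str], deferred: set[str]) -> set[str]:
    """Leaf fields that PhiSQL neither exposes nor records as a deliberate deferral."""
    return leaf_paths(schema) - exposed - deferred


# Illustrative toy schema; field names are placeholders, not the real Phileas schema.
enabled = {"type": "object", "properties": {"enabled": {"type": "boolean"}}}
toy_schema = {
    "type": "object",
    "properties": {
        "identifiers": {
            "type": "object",
            "properties": {"pheyes": enabled, "surname": enabled, "newEntity": enabled},
        },
    },
}

exposed = {"identifiers.surname.enabled"}
deferred = {"identifiers.pheyes.enabled"}  # e.g. the pheyes block is deferred in v1.0

print(sorted(uncovered(toy_schema, exposed, deferred)))
# ['identifiers.newEntity.enabled']
```

A non-empty result is what would fail the check: a new schema field has landed without a matching PhiSQL construct or an explicit deferral.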

Documentation site

The published spec reference lives at https://philterd.github.io/phisql/. It is generated from the spec artifacts under spec/<version>/ (the grammar, the catalog YAML, and the example pairs), so the rendered reference cannot drift from the artifacts it documents.

  • scripts/gen_docs.py renders the catalogs, grammar, and examples into Markdown pages (run via the mkdocs-gen-files plugin at build time; no generated pages are committed).
  • mkdocs.yml configures the MkDocs Material site, search, and the mike version selector.
  • .github/workflows/docs.yml checks the build on every pull request and publishes the versioned site to GitHub Pages (the gh-pages branch) on every push to main.

Build and preview locally:

python3 -m venv .venv
.venv/bin/pip install -r docs/requirements.txt
.venv/bin/mkdocs serve        # live preview at http://127.0.0.1:8000
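The reason generated pages cannot drift is that they are rebuilt from the catalogs on every build. As a rough illustration only, the sketch below renders catalog entries into a Markdown table. The catalog shape (a keyword mapped to a `field` and a `description`) and the field names are hypothetical, and `render_catalog` is not a function from gen_docs.py. In the real build, gen_docs.py runs under the mkdocs-gen-files plugin.

```python
def render_catalog(title: str, entries: dict[str, dict[str, str]]) -> str:
    """Render catalog entries as a Markdown page: one table row per keyword."""
    lines = [
        f"# {title}",
        "",
        "| Keyword | Phileas field | Description |",
        "| --- | --- | --- |",
    ]
    for keyword in sorted(entries):  # sorted so regenerated pages are stable
        entry = entries[keyword]
        lines.append(f"| `{keyword}` | `{entry['field']}` | {entry['description']} |")
    return "\n".join(lines) + "\n"


# Hypothetical catalog entries; the real catalog YAML may be shaped differently.
entities = {
    "SURNAME": {"field": "surname", "description": "Surnames."},
    "FIRST_NAME": {"field": "firstName", "description": "Given names."},
}

print(render_catalog("Entities", entities))
```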

Note

Publishing requires GitHub Pages to be enabled for this repository with the source set to the gh-pages branch (Settings → Pages).

Contributing

See CONTRIBUTING.md for the RFC process, lifecycle, decision criteria, and versioning policy. RFCs are filed and tracked as GitHub issues using the RFC proposal form (the phisql-rfc label).

Bug fixes, documentation tweaks, and new examples exercising already-specified grammar do not need an RFC - open a normal pull request. Feedback on PhiSQL v1.0 is welcome via GitHub issues.

License

"PhiSQL" is a registered trademark of Philterd, LLC. The specification is freely readable and implementable, but the name is reserved for implementations that pass the conformance test suite (forthcoming at philterd/phisql-conformance).

The specification, reference implementations, and all other artifacts in this repository are licensed under the Apache License, Version 2.0.

About

A declarative language for PII redaction and discovery that compiles to Phileas JSON policies.
