Sanitize PII before it hits an LLM. Replace it with placeholders, get the original back on the way out.
Works with any LLM backend — OpenAI, Anthropic, Gemini, Mistral, Llama, local models, your own inference server. The core nullpii library is provider-agnostic: you call sanitize() before your existing API call, then restore() on the response. The @lbroth/nullpii-gateway package is just a ready-made HTTP proxy for the Anthropic Messages API — handy with Claude Code, but optional. For anything else, drop the lib in wherever you call your model.
🧪 Hobby / experiment. A nights-and-weekends project, not a product. No SLA, no roadmap commitments, no enterprise pitch. If it helps you, great. If you find a bug, file an issue.
```bash
npm install nullpii onnxruntime-node
```

Node ≥ 22. First run downloads the model (~1.2 GB) into ~/.cache/nullpii/. Pre-warm with npx nullpii prefetch.
```ts
import { sanitize, restore, wrapForLLM } from 'nullpii';

const safe = await sanitize('Email John Smith at john@acme.io about SSN 123-45-6789');
// safe.sanitized → 'Email {{PII_PRIVATE_PERSON_0_…}} at {{PII_PRIVATE_EMAIL_0_…}} about SSN {{PII_ACCOUNT_NUMBER_0_…}}'

// Optional: prefix prompt with the built-in preservation hint
const prompt = wrapForLLM(safe, 'Translate to Italian');

// … your LLM call here: OpenAI, Anthropic, Gemini, Ollama, anything …
// The model only ever sees placeholders. Your real PII never leaves the box.
// `reply` below is the text your LLM returned for `prompt`.

const back = restore(reply, safe.sessionId);
// back.restored → original text
```

Long-lived engine (e.g. gateway):
```ts
import { NullPii } from 'nullpii';

const np = new NullPii({ backend: 'auto' });
const { sessionId, sanitized } = await np.sanitize(text);
const { restored } = np.restore(reply, sessionId);
await np.dispose();
```

Streaming restore buffers placeholders that straddle SSE chunk boundaries:
```ts
import { RestoreStream } from 'nullpii';

const stream = new RestoreStream(np, sessionId);
for await (const chunk of upstreamSse) emit(stream.push(chunk));
emit(stream.end().restored);
```

Placeholders look like {{PII_PRIVATE_PERSON_0_…}}. Each one is bound to the session that minted it, so a placeholder from one conversation can't be restored against another.
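To make the chunk-boundary problem concrete, here is a minimal sketch of the buffering idea. It is not the library's `RestoreStream` implementation: the `ChunkRestorer` class, the plain `Map` standing in for the session vault, and the shortened placeholder in the usage note are all hypothetical. The approach is to hold back any trailing text that could still turn into a placeholder until its closing `}}` arrives.

```typescript
// Hypothetical stand-in for a session vault: placeholder -> original value.
type Vault = Map<string, string>;

class ChunkRestorer {
  private pending = '';
  private vault: Vault;

  constructor(vault: Vault) {
    this.vault = vault;
  }

  // Feed one chunk; returns the text that is safe to emit right now.
  push(chunk: string): string {
    this.pending += chunk;
    const cut = this.safeLength(this.pending);
    const ready = this.pending.slice(0, cut);
    this.pending = this.pending.slice(cut);
    return this.restore(ready);
  }

  // Flush the remainder (an unterminated placeholder is emitted unchanged).
  end(): string {
    const rest = this.restore(this.pending);
    this.pending = '';
    return rest;
  }

  // Length of the prefix that cannot contain a partial placeholder.
  private safeLength(text: string): number {
    const open = text.lastIndexOf('{{');
    if (open !== -1 && text.indexOf('}}', open) === -1) return open; // unclosed "{{..."
    if (text.endsWith('{')) return text.length - 1; // a lone "{" may become "{{"
    return text.length;
  }

  // Swap every known placeholder back; unknown ones pass through untouched.
  private restore(text: string): string {
    return text.replace(/\{\{PII_[^{}]+\}\}/g, (m) => this.vault.get(m) ?? m);
  }
}
```

Feeding `'Ciao {{PII_PRIV'`, `'ATE_PERSON_0_ab12}'` and `'}!'` through `push()` emits `'Ciao '` first, then nothing, then the restored tail once the placeholder is complete.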
A small HTTP proxy that sits in front of the Anthropic API. Your client (Claude Code, the Anthropic SDK, anything that talks to api.anthropic.com) points its baseURL at the gateway and works as before — but the prompts get sanitized before leaving your machine and the response gets restored before it reaches you. Streaming works too.
```bash
# 1. boot the gateway (first run downloads the GLiNER model into a named volume)
docker compose -f examples/claude-code/docker-compose.yml up -d

# 2. point Claude Code at it (or any Anthropic SDK)
export ANTHROPIC_BASE_URL=http://localhost:8787
export ANTHROPIC_API_KEY=sk-ant-… # your real key, passed through

# 3. use Claude Code normally
claude "summarise the email I just wrote to John Doe at john@acme.io"
```

Subscription works too: if you're logged into Claude Code with a Pro / Max subscription instead of an API key (claude /login), the OAuth Bearer token is forwarded verbatim. Just set ANTHROPIC_BASE_URL to the gateway and skip ANTHROPIC_API_KEY entirely: same routing, same endpoint, no extra config. Subscription quota applies normally.
Prefer a per-project or per-user config file over exports? Drop the
same vars into Claude Code's settings file — they're picked up
automatically on every claude invocation, no shell wiring needed.
Project-local (checked into the repo, or git-ignored if it holds the key) — .claude/settings.local.json:
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8787",
    "ANTHROPIC_API_KEY": "sk-ant-…"
  }
}
```

User-global: ~/.claude/settings.json uses the same shape. Project-local wins on conflict. Add .claude/settings.local.json to .gitignore if you keep the API key inline.
The gateway sees the raw prompt, replaces names and emails with placeholders, forwards the cleaned text to api.anthropic.com, then puts the originals back in the response before Claude Code prints them.
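The same three steps can be sketched for any backend. The `PiiEngine` interface below only mirrors the two `NullPii` methods shown earlier (`sanitize` and `restore`); the `Upstream` type and the `roundTrip` helper are hypothetical names for illustration, not part of the package.

```typescript
// Shape of the two NullPii methods used in the earlier snippets.
interface PiiEngine {
  sanitize(text: string): Promise<{ sessionId: string; sanitized: string }>;
  restore(text: string, sessionId: string): { restored: string };
}

// Hypothetical upstream call: takes a prompt, resolves to the model's reply.
type Upstream = (prompt: string) => Promise<string>;

async function roundTrip(
  engine: PiiEngine,
  upstream: Upstream,
  prompt: string,
): Promise<string> {
  // 1. Replace PII with session-bound placeholders.
  const { sessionId, sanitized } = await engine.sanitize(prompt);
  // 2. The model only ever receives the sanitized text.
  const reply = await upstream(sanitized);
  // 3. Put the originals back before handing the reply to the caller.
  return engine.restore(reply, sessionId).restored;
}
```

Passing the engine and the upstream call as parameters keeps the wrapper provider-agnostic: swapping OpenAI for a local model only changes the `upstream` function.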
Verify it's working by tailing the log — counts only, never PII values:
```bash
docker compose -f examples/claude-code/docker-compose.yml logs -f gateway
# {"msg":"anthropic.messages.streamed","replacements":3,"replacementsByLabel":{"private_person":1,"private_email":1,"private_address":1},...}
```

Full walk-through (host-mounted-model variant for air-gapped / pre-release, GPU notes, troubleshooting, multi-replica caveats): examples/claude-code/.
| Label | Examples | Source |
|---|---|---|
| `private_person` | names | model |
| `private_email` | emails | model + regex |
| `private_phone` | int'l + IT / FR / ES / HIPAA-fax domestic | model + regex |
| `private_address` | street, city, ZIP | model |
| `private_date` | birth / hire dates | model |
| `private_url` | http(s)://, www. | model + regex |
| `private_ip` | IPv4, IPv6 (RFC 1918 / 5737 / loopback filtered) | regex post-pass |
| `private_mac` | MAC addresses (broadcast / multicast filtered) | regex post-pass |
| `private_passport` | US / IT / FR / ES / DE / UK + context-anchored generic (30 countries) | model (zero-shot) + regex post-pass |
| `private_driver_license` | US per-state + IT / EU per-country (context-anchored) | model (zero-shot) + regex post-pass |
| `private_vehicle_id` | VIN (ISO 3779 mod-11), plates IT / FR / DE / UK / ES / US | model (zero-shot) + regex (validated) |
| `private_geolocation` | lat/lon decimal pairs (range-validated) + DMS notation | model (zero-shot) + regex (validated) |
| `account_number` | IBAN mod-97, cards (Luhn), SSN, MRN, BTC / ETH, DNI / CPF / CF / EIN, Medicare MBI / HIC, NPI, insurance policy, IMEI | model + regex (validated) |
| `secret` | API keys (AWS / GitHub / OpenAI / Anthropic / Stripe / 30+), JWT, PEM, base64-wrapped PII | regex (50+) + base64 |

Out of scope: things that look like opinions or implications (race, religion, health conditions). Those need a different kind of model — this one only finds explicit text spans.
Add your own via np.addRecognizer({ id, pattern, label, confidence, validate? }). Validator-passing matches (iban97, luhn, base58check, cpf, codiceFiscale, vin, latLonPair) win cross-label dedupe over ML mislabels.
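For readers unfamiliar with checksum validators, here is a rough sketch of what two of them compute. The function names echo the `luhn` and `iban97` validators listed above, but the bodies are standalone illustrations of the standard algorithms, not the package's code; the input cleaning and length bounds are assumptions. The source does not say whether `validate` takes a function or a validator name, so the sketch stops short of calling `addRecognizer`.

```typescript
// Luhn checksum for card-like digit strings (spaces and dashes tolerated).
function luhn(input: string): boolean {
  const clean = input.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(clean)) return false; // assumed length bounds
  let sum = 0;
  let double = false; // the rightmost digit is not doubled
  for (let i = clean.length - 1; i >= 0; i--) {
    let d = clean.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// IBAN mod-97 check: move the first four characters to the end,
// map letters to numbers (A=10 … Z=35), and expect remainder 1.
function iban97(input: string): boolean {
  const clean = input.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(clean)) return false;
  const rearranged = clean.slice(4) + clean.slice(0, 4);
  let rem = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48;
    // Letters contribute two decimal digits, digits contribute one.
    rem = (rem * (value > 9 ? 100 : 10) + value) % 97;
  }
  return rem === 1;
}
```

A regex alone would accept any 27-character string starting with `IT`; a checksum rejects most typos and random look-alikes, which is why a validated match is a reasonable tie-breaker against a model label.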
Mac M5 Pro, IoU ≥ 0.5 macro F1 (sklearn-standard — labels with no gt support are excluded, symmetric for every tool). Cap 5,000 / dataset, --parallel-tools 1 fair-serial. 16-dataset matrix at packages/eval/published-bench/matrix.csv.
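As a reading aid for the metric, here is a small sketch of span-level macro F1 with an IoU ≥ 0.5 match rule. It is not the bench harness: the `Span` shape, the greedy one-to-one matching, and the function names are assumptions made for illustration. Labels without gold support are skipped, as described above.

```typescript
interface Span {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  label: string;
}

// Intersection-over-union of two character spans.
function iou(a: Span, b: Span): number {
  const inter = Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
  const union = (a.end - a.start) + (b.end - b.start) - inter;
  return union === 0 ? 0 : inter / union;
}

function macroF1(gold: Span[], pred: Span[], minIou = 0.5): number {
  // Only labels that occur in the gold annotations are scored.
  const labels = gold.map((g) => g.label).filter((l, i, all) => all.indexOf(l) === i);
  const scores = labels.map((label) => {
    const g = gold.filter((s) => s.label === label);
    const p = pred.filter((s) => s.label === label);
    const used = new Set<number>();
    let tp = 0;
    for (const ps of p) {
      // Greedy matching: each gold span can satisfy at most one prediction.
      const hit = g.findIndex((gs, i) => !used.has(i) && iou(ps, gs) >= minIou);
      if (hit !== -1) {
        used.add(hit);
        tp++;
      }
    }
    const precision = p.length > 0 ? tp / p.length : 0;
    const recall = tp / g.length;
    return precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  });
  return scores.length > 0 ? scores.reduce((a, b) => a + b, 0) / scores.length : 0;
}
```

Because the average is taken per label rather than per span, a rare label that a tool misses entirely costs as much as a frequent one.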
Two nullpii rows + one upstream-GLiNER row let readers isolate the model from the runtime:
- `nullpii-bare`: the published `lBroth/nullpii` ONNX (project-fine-tuned weights) consumed via the bare `gliner_v2_predictor`: GLiNER decoder + chunking, no recognizer pack, no preprocessor, no base64 decoder, no boundary refine, no never-PII filter. What the HF artifact alone delivers.
- `gliner-onnx-pii-fp32`: the unmodified upstream `onnx-community/gliner_multi_pii-v1` ONNX, same bare consumer. Baseline before any project fine-tuning.
- `nullpii`: the npm package (full runtime): published model + recognizer pack + adversarial preprocessor + base64 decoder + reversible vault.
v0.3.0 bench (M5 Pro CPU, 2026-05-18 + opf 2026-05-20, full 9×16 matrix). OOD macro for nullpii = 0.7784 (presidio-synthetic + isotonic-{en,de,fr,it}-heldout + ai4privacy-300k-heldout + tab-echr).
| Dataset | n | nullpii | `nullpii-bare` | `nemotron-pii-raw` | `gliner-pii-large-v1` | `gliner-onnx-pii-fp32` | `deberta` | `piiranha` | `presidio` | `opf` |
|---|---|---|---|---|---|---|---|---|---|---|
| `presidio-synthetic` | 5,000 | **0.9137** | 0.8487 | 0.7154 | 0.6749 | 0.5254 | 0.5111 | 0.3853 | 0.5511 § | 0.6530 |
| `isotonic-en-heldout` | 1,900 | 0.7197 | 0.5969 | **0.7518** | 0.6662 | 0.5485 | 0.6224 | 0.4124 | 0.4472 | 0.4095 |
| `isotonic-de-heldout` | 2,400 | **0.7297** | 0.6191 | 0.7271 | 0.6325 | 0.5432 | 0.3969 | 0.4112 | 0.3859 | 0.4155 |
| `isotonic-fr-heldout` | 2,800 | 0.7254 | 0.6001 | **0.7276** | 0.6663 | 0.5393 | 0.4824 | 0.4172 | 0.4042 | 0.4257 |
| `isotonic-it-heldout` | 2,200 | **0.7395** | 0.6148 | 0.7273 | 0.6605 | 0.5519 | 0.4509 | 0.4176 | 0.4057 | 0.4420 |
| `tab-echr` ⚠ | 127 | 0.9239 | **0.9275** | 0.6026 | 0.6346 | 0.6463 | 0.2908 | 0.3163 | 0.7761 | 0.4166 |
| `nemotron-pii-test` ⚠ | 5,000 | 0.8063 | 0.6814 | **0.9286** ‡ | 0.7675 | 0.7352 | 0.4153 | 0.3286 | 0.4236 | 0.4005 |
| `ai4privacy-400k` ⚠ | 5,000 | 0.6410 | 0.6339 | 0.5962 | 0.6624 | 0.6256 | 0.4508 | **0.9532** ‡ | 0.3897 | 0.6367 |
| `ai4privacy-300k` ⚠ | 5,000 | **0.7094** | 0.5303 | 0.6554 | 0.3930 | 0.4691 | 0.3015 | 0.3203 | 0.5553 | 0.4583 |
| `ai4privacy-300k-heldout` | 5,000 | **0.6966** | 0.5241 | 0.6608 | 0.4306 | 0.5131 | 0.2183 | 0.3266 | 0.4882 | 0.4630 |
| `argilla-pii` | 2,096 | 0.6465 | 0.5549 | **0.6820** | 0.6035 | 0.5047 | 0.5694 | 0.4149 | 0.4506 | 0.3939 |
| `isotonic-en` ⚠ | 5,000 | 0.7428 | 0.6226 | **0.7720** | 0.6784 | 0.5573 | 0.6216 | 0.4235 | 0.4535 | 0.4178 |
| `isotonic-de` ⚠ | 5,000 | 0.7293 | 0.6300 | **0.7337** | 0.6510 | 0.5556 | 0.4069 | 0.4144 | 0.3913 | 0.4243 |
| `isotonic-fr` ⚠ | 5,000 | 0.7199 | 0.5970 | **0.7340** | 0.6714 | 0.5503 | 0.4728 | 0.4137 | 0.4029 | 0.4233 |
| `isotonic-it` ⚠ | 5,000 | **0.7306** | 0.6215 | 0.7225 | 0.6647 | 0.5697 | 0.4531 | 0.4137 | 0.4052 | 0.4333 |
| `nullpii-internal-bench` ⚐ self-authored, regression cell | 2,361 | **0.4228** | 0.3090 | 0.3065 | 0.2851 | 0.2936 | 0.1711 | 0.1669 | 0.1436 | 0.2488 |

Legend:
- bold = best F1 in the row
- ⚠ = the dataset overlaps the training distribution of at least one competitor in the row — read those cells with caution
- ⚐ = in-distribution for `nullpii` itself: regression cell, not counted in the OOD headline. The held-out OOD macro (0.7784) is computed over `presidio-synthetic` + `isotonic-{en,de,fr,it}-heldout` + `ai4privacy-300k-heldout` + `tab-echr` only. The `nullpii-internal-bench` row sits at the bottom of the table and is shown only as a regression watcher across releases; read it that way.
- ‡ = competitor benched on its own training distribution (best-case self-report)
- § = Presidio benched on its own evaluator dataset (best-case self-report)
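
The 0.7784 headline can be re-derived from the table as a plain unweighted mean of the seven held-out `nullpii` cells. The snippet below is only that arithmetic check; the variable names are arbitrary and the values are copied from the `nullpii` column.

```typescript
// nullpii F1 on the seven datasets that make up the OOD headline.
const oodCells: Array<[string, number]> = [
  ['presidio-synthetic', 0.9137],
  ['isotonic-en-heldout', 0.7197],
  ['isotonic-de-heldout', 0.7297],
  ['isotonic-fr-heldout', 0.7254],
  ['isotonic-it-heldout', 0.7395],
  ['ai4privacy-300k-heldout', 0.6966],
  ['tab-echr', 0.9239],
];

// Unweighted mean: every dataset counts equally, regardless of its n.
const oodMacro = oodCells.reduce((sum, cell) => sum + cell[1], 0) / oodCells.length;

console.log(oodMacro.toFixed(4)); // prints "0.7784"
```

Note that the mean is unweighted, so the 127-document `tab-echr` set moves the headline as much as a 5,000-document one.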
How long a single sanitize() call takes against the published lBroth/nullpii ONNX, M5 Pro CPU, Node 24:
| Input size | p50 | p95 | p99 |
|---|---|---|---|
| 100 chars | 23 ms | 25 ms | 27 ms |
| 1,000 chars | 95 ms | 113 ms | 114 ms |
| 10,000 chars | 938 ms | 972 ms | 1,122 ms |
Cold start (first call, ONNX load included): ~756 ms. Numbers from packages/eval/scripts/bench_latency_public.mjs against the public runtime — no LoRA, no router, just new NullPii({ backend: 'cpu' }).
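
For anyone measuring their own hardware, a minimal timing loop might look like the sketch below. It is not `bench_latency_public.mjs`: the nearest-rank percentile definition, the default run count, and the `measure` helper are assumptions, and the function under test is passed in so the sketch does not depend on the package.

```typescript
// Nearest-rank percentile over an ascending-sorted array of samples.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error('no samples');
  const rank = Math.ceil((p * sortedMs.length) / 100);
  return sortedMs[Math.max(0, rank - 1)];
}

// Time `fn` repeatedly and summarise the wall-clock distribution.
async function measure(
  fn: () => Promise<unknown>,
  runs = 100,
): Promise<{ p50: number; p95: number; p99: number }> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await fn();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

To compare against the table, one would call something like `measure(() => np.sanitize(input))` after a warm-up call, so the one-off model load is not counted in the steady-state percentiles.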
Methodology disclosures (read these before drawing conclusions):
- Threshold parity: every GLiNER-family tool (`nullpii`, `nullpii-bare`, `gliner-pii-large-v1`, `gliner-onnx-pii-fp32`) runs at threshold 0.5. `nemotron-pii-raw` runs at 0.3, the production decision boundary its upstream model card prescribes. Running nemotron at 0.5 for parity would disadvantage it relative to its published characteristic (~0.07 F1 drop on average across the matrix). Both thresholds are disclosed so readers can adjust.
- DeBERTa aggregation: `first` strategy, A/B-logged against `simple` in `adapters.py`. No tuning, just picking the one HuggingFace ships as the documented default.
- Per-tool chunking: each tool uses its upstream maintainers' recommended chunker (`gliner_multi_pii-v1` model card → 140-word/30 for `nullpii`; `gliner` package default → 1400-char/200 for the upstream GLiNERs; piiranha model-card §Limitations → 1000-char/200 to dodge 256-token truncation). Full breakdown + rationale in `packages/eval/README.md`. This is NOT hand-tuned in nullpii's favour: forcing a single normalised window would silently truncate piiranha, break DeBERTa's continuation handling, and drop Presidio's NER+anchor coordination, so every baseline would lose F1.
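
To illustrate what a word-window chunker does, here is a minimal sketch. It assumes the "140-word/30" setting means a 140-word window with a 30-word overlap, which the text above does not spell out; the `Chunk` shape and `chunkByWords` name are hypothetical and this is not the harness code.

```typescript
interface Chunk {
  text: string;
  offset: number; // character index of the chunk start in the original input
}

function chunkByWords(input: string, windowWords = 140, overlapWords = 30): Chunk[] {
  if (overlapWords >= windowWords) throw new Error('overlap must be smaller than the window');

  // Record the character range of every whitespace-delimited word.
  const words: Array<{ start: number; end: number }> = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    words.push({ start: m.index, end: m.index + m[0].length });
  }

  const chunks: Chunk[] = [];
  const step = windowWords - overlapWords;
  for (let i = 0; i < words.length; i += step) {
    const last = Math.min(i + windowWords, words.length) - 1;
    chunks.push({ text: input.slice(words[i].start, words[last].end), offset: words[i].start });
    if (last === words.length - 1) break; // the final window reached the end
  }
  return chunks;
}
```

Keeping the `offset` lets detections from each chunk be mapped back to positions in the full input, and the overlap gives an entity that straddles a window edge a second chance to appear whole.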
Reproduce:
```bash
# CPU run: portable, slower; matches the M5 Pro headline numbers above.
NULLPII_MODEL_DIR=/path/to/lBroth-nullpii \
python -u packages/eval/scripts/bench_full.py \
  --tools nullpii,nullpii-bare,deberta,piiranha,presidio,gliner-pii-large-v1,gliner-onnx-pii-fp32,nemotron-pii-raw,openai-privacy-filter \
  --datasets all --backend cpu \
  --out-dir packages/eval/results/$(date +%Y%m%d)-bench

# CUDA run: bench_full.py default; what RunPod 4090 / 5090 nodes use.
# `nullpii` itself stays on CPU (onnxruntime CUDA EP can't run the
# GLiNER MoE node on SM_120); transformer baselines benefit from GPU.
NULLPII_MODEL_DIR=/path/to/lBroth-nullpii \
python -u packages/eval/scripts/bench_full.py \
  --tools nullpii,nullpii-bare,deberta,piiranha,presidio,gliner-pii-large-v1,gliner-onnx-pii-fp32,nemotron-pii-raw,openai-privacy-filter \
  --datasets all \
  --out-dir packages/eval/results/$(date +%Y%m%d)-bench
```

Where the preprocessor + recognizer pack pulls PII the model alone would miss:
| Surface | Input | Detected as |
|---|---|---|
| base64-wrapped secret | (base64-encoded) c2stYW50LWFwaTAzLWFCY0RlRmcw… | sk-ant-api03-aBcDeFg012345… (Anthropic key) |
| HTML-entity-encoded secret | sk-ant… | sk-ant-… (Anthropic key) |
| double-URL-encoded email | bob.jones%2540company.io | bob.jones@company.io (email) |
| zero-width-obfuscated address | 221B Baker StU+200BreU+200Bet U+200BLondon | 221B Baker Street London (address) |
| spaced-out email | u s e r . 1 2 3 @ g m a i l . c o m | user.123@gmail.com (email) |
| Cyrillic-homoglyph email | pаyments@bank.com (а = U+0430) | payments@bank.com (email) |
| fullwidth ASCII email | USER.NAME@example.com | USER.NAME@example.com (email) |
| Italian IBAN in prose | IT60X0542811101000000123456 | IT60X0542811101000000123456 (account_number, mod-97 verified) |
| Stripe live key in code | api_key = 'sk_live_<24+ alphanumeric chars>' | flagged as secret (Stripe sk_live_ prefix + length check). Real example omitted to avoid tripping GitHub push-protection scanners on the docs themselves. |

Roughly five passes: Unicode normalisation, base64 decoding, percent + HTML-entity decoding, zero-width strip, regex pack.
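
A few of those passes are easy to picture in code. The sketch below covers only Unicode normalisation, percent decoding, numeric HTML-entity decoding and zero-width stripping; base64 decoding, homoglyph folding, spaced-out text and the regex pack are left out. Function names, the round limit, and the exact zero-width character set are assumptions, not the package's internals.

```typescript
// Zero-width characters commonly used to break up tokens (assumed set).
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

// Decode percent-escapes repeatedly so double encoding (%2540 -> %40 -> @) unwinds.
function decodePercent(text: string, maxRounds = 3): string {
  let current = text;
  for (let i = 0; i < maxRounds; i++) {
    let next: string;
    try {
      next = decodeURIComponent(current);
    } catch {
      return current; // malformed escape such as a stray "%": keep what we have
    }
    if (next === current) break;
    current = next;
  }
  return current;
}

// Decode numeric HTML entities such as &#115; or &#x73;.
function decodeNumericEntities(text: string): string {
  return text.replace(/&#(x[0-9a-f]+|\d+);/gi, (match: string, code: string) => {
    const isHex = code[0] === 'x' || code[0] === 'X';
    const point = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    return point <= 0x10ffff ? String.fromCodePoint(point) : match;
  });
}

function normalise(text: string): string {
  const folded = text.normalize('NFKC'); // fullwidth ASCII becomes plain ASCII
  const decoded = decodeNumericEntities(decodePercent(folded));
  return decoded.replace(ZERO_WIDTH, '');
}
```

NFKC folds compatibility forms such as fullwidth letters, but it does not map Cyrillic look-alikes to Latin, so homoglyph handling needs a separate confusables table. A real pipeline also has to keep an offset map so that detections on the normalised text can be applied to the original string.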
```ts
new NullPii({ backend: 'cpu' });  // ['cpu']
new NullPii({ backend: 'cuda' }); // ['cuda', 'cpu']: NVIDIA, falls back to CPU
new NullPii({ backend: 'mps' });  // ['coreml', 'cpu']: Apple Silicon
new NullPii({ backend: 'auto' }); // currently 'cpu'
```

CPU thread tuning: pass intraOpNumThreads (parallelism inside a single op) and interOpNumThreads (parallelism across ops) to new NullPii({...}). Both are forwarded to the underlying ONNX Runtime session config.
- It's not a HIPAA tool. Medical diagnoses, dosages, that kind of thing — out of scope.
- IPs and MAC addresses are caught by regex, not the model.
- Inputs over 1 MB are refused — chunk them yourself.
- Detection is best-effort. Don't make it your only privacy control.
- Detection runs entirely on your machine. The only network call is the one-time model download.
- The vault lives in memory and goes away when you call `dispose()`.
- Logs never contain PII, just counts and short ids. See SECURITY.md.
Apache-2.0 — see LICENSE and NOTICE. Model weights have their own licence (see Credits).
- `CHANGELOG.md`: release notes
- `CONTRIBUTING.md`: dev setup, architecture rules, release checklist
- `packages/eval/README.md`: bench harness
- `packages/eval/datasets/README.md`: dataset schema + licences
The detection model builds on urchade/gliner_multi_pii-v1 (GLiNER, Zaratiana et al., NAACL 2024, mDeBERTa-v3 base). Model artifact + attribution: lBroth/nullpii. Licence notes: NOTICE.
