feat(nullpii): mask template syntax ranges from PII detection #44

Open
lBroth wants to merge 1 commit into main from fix/template-skip-ranges

lBroth (Owner) commented May 21, 2026

Summary

After #43 stripped the PUA (Unicode Private Use Area) sentinels around user-authored `{{...}}`, the GLiNER model still tagged template variable names (`short-kebab-case-slug`, `user_name`, …) as `private_person`. Vault substitution then nested the PII placeholder inside the user's template, producing bracket-broken output like `{{{{PII_PRIVATE_PERSON_*}}}}`.
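To make the failure concrete, here is a toy sketch of placeholder substitution. The `substitute` helper and the `{{PII_<LABEL>_<n>}}` placeholder shape are hypothetical, inferred only from the broken output quoted above; the project's real vault code is not part of this PR.

```typescript
interface Span {
  start: number;
  end: number;
  label: string;
}

// Toy vault substitution (hypothetical): replace each span with a numbered placeholder.
// The `{{PII_<LABEL>_<n>}}` shape is an assumption inferred from the broken output above.
function substitute(text: string, spans: Span[]): string {
  // Replace right-to-left so each replacement leaves the earlier offsets valid.
  const ordered = spans.slice().sort((a, b) => b.start - a.start);
  let out = text;
  ordered.forEach((span, i) => {
    const n = ordered.length - i; // number placeholders in document order
    const placeholder = `{{PII_${span.label.toUpperCase()}_${n}}}`;
    out = out.slice(0, span.start) + placeholder + out.slice(span.end);
  });
  return out;
}

const systemPrompt = "Write a {{short-kebab-case-slug}} for the post.";
const slug = "short-kebab-case-slug";
const slugStart = systemPrompt.indexOf(slug);
// The false positive: the model tags the template variable name itself.
const falsePositive: Span = { start: slugStart, end: slugStart + slug.length, label: "private_person" };
console.log(substitute(systemPrompt, [falsePositive]));
// Write a {{{{PII_PRIVATE_PERSON_1}}}} for the post.
```

The placeholder's own braces land inside the user's braces, so any later template renderer sees four opening and four closing braces instead of two.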

This adds a post-filter that finds the syntactic ranges of common templating dialects and drops any span overlapping one of them:

  • `{{ ... }}` (Mustache / Handlebars / Vue / Jinja2 expression)
  • `${ ... }` (JS / TS template literal)
  • `<% ... %>` (ERB / EJS)
  • `{% ... %}` (Jinja2 / Twig statement)
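A minimal sketch of what `findTemplateRanges` could look like, assuming one regular expression per dialect and half-open `[start, end)` offsets. The `CharRange` type, this signature, and the exact patterns are assumptions; the PR names the function but does not show its body.

```typescript
// Half-open character range [start, end) in the input string.
interface CharRange {
  start: number;
  end: number;
}

// One pattern per dialect listed above. Each is a "tempered" non-greedy match: the
// body may not contain another opener of the same dialect, so `{{ {{x}} }}` yields
// only the inner `{{x}}`. These exact regexes are illustrative assumptions.
const TEMPLATE_PATTERNS: RegExp[] = [
  /\{\{(?:(?!\{\{)[\s\S])*?\}\}/, // {{ ... }}  Mustache / Handlebars / Vue / Jinja2 expression
  /\$\{(?:(?!\$\{)[\s\S])*?\}/, // ${ ... }   JS / TS template literal
  /<%(?:(?!<%)[\s\S])*?%>/, // <% ... %>  ERB / EJS
  /\{%(?:(?!\{%)[\s\S])*?%\}/, // {% ... %}  Jinja2 / Twig statement
];

// Hypothetical signature: return every template range, sorted by start offset.
function findTemplateRanges(text: string): CharRange[] {
  const ranges: CharRange[] = [];
  for (const pattern of TEMPLATE_PATTERNS) {
    // Fresh global regex per call so lastIndex never leaks between calls.
    const re = new RegExp(pattern.source, "g");
    let m: RegExpExecArray | null;
    // No pattern can match the empty string, so this loop always advances.
    while ((m = re.exec(text)) !== null) {
      ranges.push({ start: m.index, end: m.index + m[0].length });
    }
  }
  return ranges.sort((a, b) => a.start - b.start);
}

const sample = "Hi {{user_name}}, total ${n} <% x %> {% if a %}";
console.log(findTemplateRanges(sample).map((r) => sample.slice(r.start, r.end)));
// [ '{{user_name}}', '${n}', '<% x %>', '{% if a %}' ]
```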

Partial overlap counts too: a span crossing `}}` would also corrupt the bracket count after vault substitution.
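The overlap rule can be sketched as a half-open interval test. `CharRange`, `Span`, and this signature for `dropSpansInsideTemplates` are hypothetical; the point is that any shared character, not only full containment, drops the span.

```typescript
// Half-open character range [start, end).
interface CharRange {
  start: number;
  end: number;
}

// Hypothetical span shape; real detector spans presumably carry more fields.
interface Span extends CharRange {
  label: string;
}

// Two half-open ranges share at least one character iff each starts before the
// other ends. This covers full containment and partial overlap alike, e.g. a
// span that starts inside `{{ ... }}` and runs past the closing `}}`.
function overlaps(a: CharRange, b: CharRange): boolean {
  return a.start < b.end && b.start < a.end;
}

// Keep only spans that touch no template range.
function dropSpansInsideTemplates<T extends CharRange>(spans: T[], templateRanges: CharRange[]): T[] {
  return spans.filter((span) => !templateRanges.some((r) => overlaps(span, r)));
}

const text = "Hi {{user_name}}, I'm Alice.";
const templates: CharRange[] = [{ start: 3, end: 16 }]; // "{{user_name}}"
const spans: Span[] = [
  { start: 5, end: 14, label: "private_person" }, // "user_name", fully inside
  { start: 10, end: 19, label: "private_person" }, // "name}}, I", crosses the closing }}
  { start: 22, end: 27, label: "private_person" }, // "Alice", outside
];
console.log(dropSpansInsideTemplates(spans, templates).map((s) => text.slice(s.start, s.end)));
// [ 'Alice' ]
```

Because the ranges are half-open, a span that merely touches a template (ends exactly where `{{` begins) shares no character with it and is kept.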

Files

  • `src/template-mask.ts` (new): `findTemplateRanges`, `dropSpansInsideTemplates`
  • `src/nullpii.ts`: wires the filter in after `applyThresholds` and before `refineSpanBoundaries`
  • `test/template-mask.test.ts` (new): 11 unit tests
  • `test/nullpii.test.ts`: integration test checking that a recognizer false positive inside `{{...}}` gets masked
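The ordering described for `src/nullpii.ts` can be sketched as a three-stage pipeline. Only the order comes from the PR; `applyThresholds` and `refineSpanBoundaries` below are simplified stand-ins (a score cutoff and whitespace trimming), not the project's implementations, and `postProcess` is a hypothetical name.

```typescript
interface CharRange {
  start: number;
  end: number;
}

interface Span extends CharRange {
  label: string;
  score: number;
}

// Stand-in for applyThresholds: keep spans whose score meets the cutoff.
function applyThresholds(spans: Span[], threshold: number): Span[] {
  return spans.filter((s) => s.score >= threshold);
}

// Stand-in for dropSpansInsideTemplates: drop any span sharing a character with a template range.
function dropSpansInsideTemplates(spans: Span[], ranges: CharRange[]): Span[] {
  return spans.filter((s) => !ranges.some((r) => s.start < r.end && r.start < s.end));
}

// Stand-in for refineSpanBoundaries: shrink each span so it neither starts nor ends on whitespace.
function refineSpanBoundaries(text: string, spans: Span[]): Span[] {
  return spans.map((s) => {
    let start = s.start;
    let end = s.end;
    while (start < end && /\s/.test(text.charAt(start))) start++;
    while (end > start && /\s/.test(text.charAt(end - 1))) end--;
    return { ...s, start, end };
  });
}

// Hypothetical wrapper showing the order: thresholds, then template mask, then refinement.
function postProcess(text: string, spans: Span[], templateRanges: CharRange[], threshold: number): Span[] {
  const confident = applyThresholds(spans, threshold);
  const outsideTemplates = dropSpansInsideTemplates(confident, templateRanges);
  return refineSpanBoundaries(text, outsideTemplates);
}

const input = "Hi {{user_name}} and Alice";
const result = postProcess(
  input,
  [
    { start: 5, end: 14, label: "private_person", score: 0.9 }, // "user_name", inside the template
    { start: 20, end: 26, label: "private_person", score: 0.8 }, // " Alice", leading space
    { start: 0, end: 2, label: "private_person", score: 0.1 }, // "Hi", below threshold
  ],
  [{ start: 3, end: 16 }], // "{{user_name}}"
  0.5,
);
console.log(result.map((s) => input.slice(s.start, s.end))); // [ 'Alice' ]
```

In this arrangement the template check only runs on spans that survived thresholding, and it compares template ranges against the detector's own boundaries rather than the refined ones.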

Test plan

  • `npm test`: 283 passing (was 271)
  • `npm run typecheck` / `lint` / `build`: clean
  • Repro of the user's gateway report: a system prompt containing `{{short-kebab-case-slug}}` survives the round-trip without nested-brace garbage

Notes

  • Template ranges are computed on the original input. Escaping is length-preserving, so the offsets carry over unchanged to the escaped coordinate space in which spans live.
  • The regex match is non-greedy, so a nested `{{ {{x}} }}` resolves on the inner pair. This is an acceptable trade-off: the outer pair is left as bare braces, which GLiNER no longer confuses with sentinels.
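Both notes can be illustrated with one toy example. It assumes a tempered non-greedy `{{ ... }}` pattern (the body may not contain another `{{`) and a hypothetical escape that maps `<` and `>` to single Private Use Area code points; neither is taken from the project's source.

```typescript
interface CharRange {
  start: number;
  end: number;
}

// Assumed tempered non-greedy pattern: a match can never span another `{{`,
// so nested input resolves on the inner pair.
function findMustacheRanges(text: string): CharRange[] {
  const re = /\{\{(?:(?!\{\{)[\s\S])*?\}\}/g;
  const ranges: CharRange[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    ranges.push({ start: m.index, end: m.index + m[0].length });
  }
  return ranges;
}

// Hypothetical escape: each `<` / `>` becomes one BMP Private Use Area code point,
// i.e. one UTF-16 code unit in and one out, so the string length never changes.
function escapeLengthPreserving(text: string): string {
  return text.replace(/</g, "\uE000").replace(/>/g, "\uE001");
}

const original = "<b>{{ {{x}} }}</b>";
const ranges = findMustacheRanges(original); // computed on the original input
const escaped = escapeLengthPreserving(original);

console.log(escaped.length === original.length); // true
// The original offsets address the same characters in the escaped string.
console.log(ranges.map((r) => escaped.slice(r.start, r.end))); // [ '{{x}}' ]
```

The outer `{{ ` and ` }}` stay outside every range. A plain lazy pattern such as `/\{\{[\s\S]*?\}\}/` behaves differently on the same input: it matches from the outer `{{` through the inner `}}`, leaving only the trailing ` }}` bare.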

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>