feat: UK postcode / ITL support via ONS NSPL (#7) #135

Merged: bk86a merged 18 commits into main from feat/uk-itl-support on Jul 3, 2026
bk86a (Owner) commented Jul 3, 2026

Closes #7. Implements UK postcode → ITL support per the approved design spec and 15-task plan.

What this adds

UK postcodes now resolve to ITL (International Territorial Level) codes — the UK's post-Brexit successor to NUTS — sourced from the ONS NSPL dataset. UK is a parallel data channel: it reuses the same in-memory lookup, SQLite cache, and lookup waterfall as TERCET, with isolated failure handling (an NSPL failure never blocks TERCET serving).

Highlights:

  • code_system field ("NUTS" | "ITL") on /lookup responses — additive, non-breaking. ITL diverges from NUTS-2016 UK at L2/L3, so consumers branch on it.
  • country=GB accepted as an alias for UK (like GR → EL).
  • Outward-code lookup: outward-only input (SW1A) or an unlisted full postcode resolves to the majority-vote ITL3 for that outward code (estimated/medium confidence). Placed before the generic prefix tier — the outward code is the meaningful UK boundary, and a prefix match would otherwise shadow it and return approximate.
  • NSPL loader: doterm filter (live postcodes only), pcds/itl column aliases, dedicated 1 GB extraction cap (the CSV far exceeds the 100 MB TERCET limit), conditional-GET wrapper.
  • ITL region names from the ONS "Names and Codes" CSVs.
  • New config: PC2NUTS_NSPL_URL, PC2NUTS_ITL_NAMES_URLS. UK is not added to settings.json countries (that would trigger wasted GISCO URL guesses, per the Codex review on #52, "docs: UK/ITL support spec and implementation plan (#7)"); it registers automatically once NSPL rows land in _lookup.
  • Crown Dependencies (JE/GG/IM) and Gibraltar (GI) are out of scope → 400.
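
The doterm filter and the outward-code majority vote described in the highlights can be sketched roughly as follows. This is a minimal illustration, not the PR's loader: the function name `build_outward_index` is hypothetical, the column names `pcds`, `doterm` and `itl` are taken from the description above, and the sample rows and `TLX..` codes are invented placeholders. Tie-breaking (first value seen wins) is an assumption.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented sample in an NSPL-like layout; the TLX.. codes are placeholders.
SAMPLE_CSV = """pcds,doterm,itl
SW1A 1AA,,TLX11
SW1A 2AA,,TLX11
SW1A 2AB,,TLX12
SW1A 9ZZ,202001,TLX12
SW1A 9ZY,202001,TLX12
EC1A 1BB,,TLX20
"""


def build_outward_index(csv_text: str) -> dict:
    """Map each outward code to the most common ITL code among live postcodes."""
    votes = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A non-empty doterm marks a terminated postcode; keep live ones only.
        if row["doterm"].strip():
            continue
        # pcds separates outward and inward parts with a space.
        outward = row["pcds"].split()[0].upper()
        votes[outward][row["itl"]] += 1
    # most_common(1) returns the top (value, count) pair for each outward code.
    return {out: counter.most_common(1)[0][0] for out, counter in votes.items()}


index = build_outward_index(SAMPLE_CSV)
print(index)
```

In this sample the two terminated rows would have tipped SW1A towards TLX12 had they been counted, which is why the filter runs before the vote.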

Notes on plan adaptation

The plan predated the Albania Tier 2b resolver, the /resolve geocoding path, and other changes, so line numbers and tier ordinals were stale. The main substantive deviation: the plan placed the outward tier after prefix estimation, but the prefix index already indexes outward-length strings, which made that placement dead and returned approximate. Moved it ahead of the prefix tier to honour the intended estimated/medium behaviour.
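
The ordering issue can be illustrated with a toy waterfall. This is a sketch under stated assumptions: the tables, the `lookup` function and the `TLX..` codes are hypothetical, and only the match-type labels (estimated, approximate) come from the description above. The point is that a prefix index holding outward-length keys answers first if it is consulted first.

```python
from typing import Optional, Tuple

# Hypothetical in-memory tables; codes are placeholders.
EXACT = {"SW1A1AA": "TLX11"}
OUTWARD = {"SW1A": "TLX11"}
# The generic prefix index also contains outward-length strings.
PREFIX = {"SW1A": "TLX12", "SW": "TLX12", "S": "TLX30"}

Result = Optional[Tuple[str, str]]


def outward_tier(compact: str) -> Result:
    # Full postcodes have a 3-character inward part; shorter input is outward-only.
    outward = compact[:-3] if len(compact) >= 5 else compact
    hit = OUTWARD.get(outward)
    return (hit, "estimated") if hit is not None else None


def prefix_tier(compact: str) -> Result:
    # Longest-prefix match, shrinking one character at a time.
    for n in range(len(compact), 0, -1):
        hit = PREFIX.get(compact[:n])
        if hit is not None:
            return hit, "approximate"
    return None


def lookup(postcode: str, outward_first: bool) -> Result:
    compact = postcode.replace(" ", "").upper()
    if compact in EXACT:
        return EXACT[compact], "exact"
    tiers = [outward_tier, prefix_tier] if outward_first else [prefix_tier, outward_tier]
    for tier in tiers:
        result = tier(compact)
        if result is not None:
            return result
    return None


planned = lookup("SW1A 2ZZ", outward_first=False)  # plan's original order
shipped = lookup("SW1A 2ZZ", outward_first=True)   # order used in this PR
print(planned, shipped)
```

With the prefix tier first, the outward tier is never reached for this input, which matches the "dead placement" described above.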

Release

Bumps to v1.1.0 (app/__init__.py + CHANGELOG). patterns_version → 1.3. This release also ships the already-merged Albania completeness work (#118) currently sitting in [Unreleased].

Tests

388 passing (up from 333). New coverage: UK regex/outward extraction, NSPL parsing + doterm filter, outward index majority vote, GB alias, outward lookup tier, code_system tagging, ITL names loader, and NSPL-failure isolation. ruff clean.

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6088941b16

Comment thread app/data_loader.py
Comment on lines +1307 to +1309
    if outward is not None:
        outward_hit = _outward_lookup.get((cc, outward))
        if outward_hit is not None:

P2: Stop UK outward misses before prefix fallback

For a valid-format UK postcode whose outward code is not present but shares a shorter prefix with loaded NSPL data (for example SW99 9ZZ when SW1A... exists), this branch falls through to _estimate_by_prefix, so the service can return an arbitrary approximate ITL based only on S/SW. Since the outward code is the meaningful UK boundary, an outward miss for outward_only countries should stop rather than use the generic prefix tier.
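
One way to read this suggestion, as a minimal sketch rather than the repository's code: the `estimate` function, the `OUTWARD_ONLY` set and the table contents below are hypothetical, and the codes are placeholders. An outward miss returns early for outward-only countries, while other countries still reach the prefix tier.

```python
from typing import Optional, Tuple

# Hypothetical tables keyed by (country, key); codes are placeholders.
OUTWARD_ONLY = {"UK"}
OUTWARD = {("UK", "SW1A"): "TLX11"}
PREFIX = {("UK", "SW"): "TLX12", ("DE", "10"): "DEX00"}


def estimate(cc: str, compact: str, outward: Optional[str]) -> Optional[Tuple[str, str]]:
    if outward is not None:
        hit = OUTWARD.get((cc, outward))
        if hit is not None:
            return hit, "estimated"
        if cc in OUTWARD_ONLY:
            # The outward code is the meaningful boundary: a miss is a miss.
            return None
    # Generic prefix fallback for every other country.
    for n in range(len(compact) - 1, 0, -1):
        hit = PREFIX.get((cc, compact[:n]))
        if hit is not None:
            return hit, "approximate"
    return None


uk_miss = estimate("UK", "SW999ZZ", "SW99")
uk_hit = estimate("UK", "SW1A2ZZ", "SW1A")
de_prefix = estimate("DE", "10115", None)
print(uk_miss, uk_hit, de_prefix)
```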

Comment thread app/data_loader.py
Comment on lines +1155 to +1156
    if not timed_out and settings.nspl_url:
        nspl_count = _load_nspl(client, settings.nspl_url, cache_dir)

P2: Invalidate the DB when NSPL config changes

When PC2NUTS_NSPL_URL is added, removed, or changed while the SQLite cache is still valid, load_data() returns from the cache fast path above before reaching this block because _db_is_valid() only considers the NUTS version and extra_sources_hash. That leaves UK unsupported after enabling NSPL, or keeps stale UK rows after disabling/changing it, until the cache expires or is manually deleted; include the NSPL/ITL-name configuration in cache metadata or bypass the fast path when it changes.
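
A config fingerprint stored in the cache metadata is one way to do this. The sketch below is an assumption about shape, not the project's implementation: `nspl_config_hash`, `db_is_valid`, the metadata key and the example.invalid URLs are all hypothetical. An empty hash for the unconfigured case keeps caches written without any NSPL metadata valid.

```python
import hashlib
import json
from typing import Mapping, Optional, Sequence


def nspl_config_hash(nspl_url: Optional[str], itl_names_urls: Sequence[str]) -> str:
    """Fingerprint the UK-related config; empty string when nothing is configured."""
    if not nspl_url and not itl_names_urls:
        return ""
    payload = json.dumps(
        {"nspl_url": nspl_url or "", "itl_names_urls": list(itl_names_urls)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def db_is_valid(stored_meta: Mapping[str, str], nspl_url: Optional[str],
                itl_names_urls: Sequence[str]) -> bool:
    """The fast path is usable only if the stored fingerprint matches the current config."""
    current = nspl_config_hash(nspl_url, itl_names_urls)
    return stored_meta.get("nspl_config_hash", "") == current


old_meta = {}  # cache written before any NSPL URL was configured
url_a = "https://example.invalid/nspl-a.zip"
url_b = "https://example.invalid/nspl-b.zip"
meta_a = {"nspl_config_hash": nspl_config_hash(url_a, [])}

print(db_is_valid(old_meta, None, []))   # unconfigured: cache stays valid
print(db_is_valid(old_meta, url_a, []))  # NSPL enabled: rebuild needed
print(db_is_valid(meta_a, url_b, []))    # URL swapped: rebuild needed
```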

Comment thread app/data_loader.py
        return 0
    cache_path = cache_dir / "nspl.zip"
    try:
        resp = _download_zip_conditional(client, url, {})

P2: Reuse cached NSPL data on transient fetch failures

In a rebuild where TERCET downloads succeed but the NSPL request temporarily fails, _load_nspl() returns 0 and load_data() later saves the rebuilt _lookup without any UK rows, even though this function writes nspl.zip to disk on successful runs and never reads it back. For configured UK deployments, a transient ONS outage can therefore drop UK support until a later successful rebuild; parse the cached NSPL ZIP or preserve old UK rows when the fetch fails.
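
The fallback could look roughly like this. It is a self-contained sketch with hypothetical names (`load_with_fallback`, `parse_zip`, `make_zip`) and a simulated outage; the real loader's HTTP client, conditional GET handling and CSV parsing are not modelled, and the row count stands in for actual parsing.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def parse_zip(data: bytes) -> int:
    """Count data rows in the first CSV member (stand-in for real parsing)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        text = zf.read(zf.namelist()[0]).decode("utf-8")
    return max(len(text.splitlines()) - 1, 0)  # minus the header row


def load_with_fallback(fetch: Callable[[], bytes], cache_path: Path) -> int:
    try:
        data = fetch()
    except OSError:  # ConnectionError and similar network errors derive from OSError
        data = None
    if data is not None:
        cache_path.write_bytes(data)  # refresh the on-disk copy
        return parse_zip(data)
    if cache_path.exists():
        # Transient failure: reuse the last good download instead of dropping rows.
        return parse_zip(cache_path.read_bytes())
    return 0


def make_zip(csv_text: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("nspl.csv", csv_text)
    return buf.getvalue()


def failing_fetch() -> bytes:
    raise ConnectionError("simulated outage")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "nspl.zip"
    good = make_zip("pcds,itl\nSW1A 1AA,TLX11\nSW1A 2AA,TLX11\n")
    first = load_with_fallback(lambda: good, cache)
    second = load_with_fallback(failing_fetch, cache)
    cold = load_with_fallback(failing_fetch, Path(tmp) / "missing.zip")

print(first, second, cold)
```

The cold case still returns 0, so a deployment that has never fetched NSPL successfully behaves as before.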

- outward miss for outward_only countries no longer falls through to generic
  prefix estimation (would answer from an arbitrary 1-2 char prefix)
- reuse cached nspl.zip when the NSPL fetch transiently fails, so an ONS
  outage does not silently drop UK support on a rebuild
- bust the SQLite fast-path cache when NSPL/ITL-names config changes, so
  enabling/disabling/swapping the URL takes effect without waiting for TTL
bk86a force-pushed the feat/uk-itl-support branch from 4890e5e to 9050047 on July 3, 2026 20:27
bk86a (Owner, Author) commented Jul 3, 2026

Thanks — all three P2 findings were valid and are now fixed in 9050047.

  1. Outward miss falling through to prefix: for outward_only countries, an outward-code miss now returns None instead of falling through to _estimate_by_prefix. The outward code is the authoritative UK boundary; a miss means the code isn't in NSPL, so answering from an arbitrary 1–2 char prefix (e.g. SW for an unknown SW99) would mix distinct ITL3s. Regression test: SW99 9ZZ → None.

  2. Cache invalidation on NSPL config change: added _nspl_config_hash() (over nspl_url + itl_names_urls), stored in the DB metadata and checked in _db_is_valid() alongside extra_sources_hash. Enabling/disabling/swapping the URL now busts the fast path so UK rows are added or dropped on the next load instead of after TTL expiry. Empty hash when unconfigured keeps TERCET-only caches valid.

  3. Reuse cached NSPL on transient fetch failure: _load_nspl now falls back to the on-disk nspl.zip (via a new _load_nspl_from_cache) when the fetch/parse fails or returns 304, so a transient ONS outage during a rebuild doesn't silently drop UK support. Parsing was split into _parse_nspl_zip for reuse.

5 new tests added (393 total, green).

bk86a merged commit 706a7df into main on Jul 3, 2026
10 checks passed
bk86a deleted the feat/uk-itl-support branch on July 3, 2026 20:30
Linked issue: Implement support for UK postal codes and the ITL (International Territorial Level)