feat: UK postcode / ITL support via ONS NSPL (#7) #135
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6088941b16
    if outward is not None:
        outward_hit = _outward_lookup.get((cc, outward))
        if outward_hit is not None:
Stop UK outward misses before prefix fallback
For a valid-format UK postcode whose outward code is not present but shares a shorter prefix with loaded NSPL data (for example SW99 9ZZ when SW1A... exists), this branch falls through to _estimate_by_prefix, so the service can return an arbitrary approximate ITL based only on S/SW. Since the outward code is the meaningful UK boundary, an outward miss for outward_only countries should stop rather than use the generic prefix tier.
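A minimal sketch of the control flow this comment asks for, under stated assumptions. `_outward_lookup` and `_estimate_by_prefix` are the names from the diff; `OUTWARD_ONLY`, `_prefix_index`, `estimate`, and the placeholder region codes are hypothetical stand-ins, not the repo's actual code.

```python
from typing import Optional

# Hypothetical module state; the region codes are placeholders, not real ITL/NUTS values.
OUTWARD_ONLY = {"UK"}  # countries where the outward code is the meaningful boundary
_outward_lookup = {("UK", "SW1A"): "ITL3_A"}
_prefix_index = {("UK", "SW"): "ITL3_A", ("DE", "10"): "NUTS3_B"}


def _estimate_by_prefix(cc: str, postcode: str) -> Optional[str]:
    """Generic tier (simplified): the longest indexed prefix wins."""
    for length in range(len(postcode), 0, -1):
        hit = _prefix_index.get((cc, postcode[:length]))
        if hit is not None:
            return hit
    return None


def estimate(cc: str, postcode: str, outward: Optional[str]) -> Optional[str]:
    if outward is not None:
        outward_hit = _outward_lookup.get((cc, outward))
        if outward_hit is not None:
            return outward_hit
        if cc in OUTWARD_ONLY:
            # Outward miss: stop here instead of guessing from "S" or "SW".
            return None
    return _estimate_by_prefix(cc, postcode)
```

With this shape, `estimate("UK", "SW999ZZ", "SW99")` returns `None` even though `SW` is indexed, while countries outside `OUTWARD_ONLY` still reach the prefix tier.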
    if not timed_out and settings.nspl_url:
        nspl_count = _load_nspl(client, settings.nspl_url, cache_dir)
Invalidate the DB when NSPL config changes
When PC2NUTS_NSPL_URL is added, removed, or changed while the SQLite cache is still valid, load_data() returns from the cache fast path above before reaching this block because _db_is_valid() only considers the NUTS version and extra_sources_hash. That leaves UK unsupported after enabling NSPL, or keeps stale UK rows after disabling/changing it, until the cache expires or is manually deleted; include the NSPL/ITL-name configuration in cache metadata or bypass the fast path when it changes.
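One way to include the NSPL/ITL-name configuration in the cache metadata is to store a digest of it next to the existing fields. The sketch below is an assumption about shape only: `nspl_config_hash`, `db_is_valid`, and the metadata keys are hypothetical, and the real `_db_is_valid()` may read its metadata differently.

```python
import hashlib
import json
from typing import Optional, Sequence


def nspl_config_hash(nspl_url: Optional[str], itl_names_urls: Sequence[str]) -> str:
    """Stable digest of the UK-related settings (hypothetical helper)."""
    payload = json.dumps(
        {"nspl_url": nspl_url or "", "itl_names_urls": list(itl_names_urls)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def db_is_valid(meta: dict, nuts_version: str, extra_sources_hash: str, nspl_hash: str) -> bool:
    """Allow the cache fast path only if every recorded input still matches."""
    return (
        meta.get("nuts_version") == nuts_version
        and meta.get("extra_sources_hash") == extra_sources_hash
        and meta.get("nspl_config_hash") == nspl_hash
    )
```

Because an unset URL hashes differently from any configured URL, adding, removing, or changing `PC2NUTS_NSPL_URL` would all invalidate the cached DB in this sketch.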
        return 0
    cache_path = cache_dir / "nspl.zip"
    try:
        resp = _download_zip_conditional(client, url, {})
Reuse cached NSPL data on transient fetch failures
In a rebuild where TERCET downloads succeed but the NSPL request temporarily fails, _load_nspl() returns 0 and load_data() later saves the rebuilt _lookup without any UK rows, even though this function writes nspl.zip to disk on successful runs and never reads it back. For configured UK deployments, a transient ONS outage can therefore drop UK support until a later successful rebuild; parse the cached NSPL ZIP or preserve old UK rows when the fetch fails.
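The "parse the cached NSPL ZIP" option could look roughly like the sketch below. `load_nspl_bytes` and the `fetch` callable are hypothetical; the real `_load_nspl()` goes through `_download_zip_conditional`, whose return type is not shown in this diff, and would catch the HTTP client's own error types rather than a bare `Exception`.

```python
from pathlib import Path
from typing import Callable, Optional


def load_nspl_bytes(fetch: Callable[[], bytes], cache_path: Path) -> Optional[bytes]:
    """Return fresh ZIP bytes, or the last good copy on disk if the fetch fails."""
    try:
        data = fetch()
    except Exception:  # sketch only: narrow this to the client's error types
        # Transient failure: fall back to the ZIP written by a previous run.
        if cache_path.exists():
            return cache_path.read_bytes()
        return None
    # Successful fetch: refresh the on-disk copy for future fallbacks.
    cache_path.write_bytes(data)
    return data
```

Returning `None` only when there is neither a fresh download nor a cached file keeps the "no UK rows" outcome limited to first-run failures.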
- outward miss for outward_only countries no longer falls through to generic prefix estimation (would answer from an arbitrary 1-2 char prefix)
- reuse cached nspl.zip when the NSPL fetch transiently fails, so an ONS outage does not silently drop UK support on a rebuild
- bust the SQLite fast-path cache when NSPL/ITL-names config changes, so enabling/disabling/swapping the URL takes effect without waiting for TTL
4890e5e to 9050047
Thanks — all three P2 findings were valid and are now fixed in
5 new tests added (393 total, green).
Closes #7. Implements UK postcode → ITL support per the approved design spec and 15-task plan.
What this adds
UK postcodes now resolve to ITL (International Territorial Level) codes — the UK's post-Brexit successor to NUTS — sourced from the ONS NSPL dataset. UK is a parallel data channel: it reuses the same in-memory lookup, SQLite cache, and lookup waterfall as TERCET, with isolated failure handling (an NSPL failure never blocks TERCET serving).
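The isolation property described above (an NSPL failure never blocks TERCET serving) can be illustrated with a small sketch. `build_lookup` and the two loader callables are hypothetical names for illustration; the actual `load_data()` flow has more steps than this.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger(__name__)

Lookup = Dict[Tuple[str, str], str]  # (country, postcode) -> region code


def build_lookup(load_tercet: Callable[[], Lookup], load_nspl: Callable[[], Lookup]) -> Lookup:
    """Merge both channels; a UK failure leaves the TERCET rows untouched."""
    lookup = dict(load_tercet())  # TERCET errors propagate as they did before
    try:
        lookup.update(load_nspl())
    except Exception:  # sketch only: narrow this in real code
        # Isolated failure: log it and keep serving the TERCET data.
        logger.exception("NSPL load failed; continuing without UK rows")
    return lookup
```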
Highlights:
- code_system field ("NUTS" | "ITL") on /lookup responses: additive, non-breaking. ITL diverges from NUTS-2016 UK at L2/L3, so consumers branch on it.
- country=GB accepted as an alias for UK (like GR → EL).
- An outward code (SW1A) or an unlisted full postcode resolves to the majority-vote ITL3 for that outward code (estimated/medium confidence). Placed before the generic prefix tier: the outward code is the meaningful UK boundary, and a prefix match would otherwise shadow it and return approximate.
- doterm filter (live postcodes only), pcds/itl column aliases, dedicated 1 GB extraction cap (the CSV far exceeds the 100 MB TERCET limit), conditional-GET wrapper.
- PC2NUTS_NSPL_URL, PC2NUTS_ITL_NAMES_URLS. UK is not added to settings.json countries (would trigger wasted GISCO URL guesses, per the Codex review on "docs: UK/ITL support spec and implementation plan (#7)" #52); it registers automatically once NSPL rows land in _lookup.
- … 400.
Notes on plan adaptation
The plan predated the Albania Tier 2b resolver, the /resolve geocoding path, and other changes, so line numbers and tier ordinals were stale. The main substantive deviation: the plan placed the outward tier after prefix estimation, but the prefix index already indexes outward-length strings, which made that placement dead: the prefix tier matched first and returned approximate. Moved it ahead of the prefix tier to honour the intended estimated/medium behaviour.
Release
Bumps to v1.1.0 (app/__init__.py + CHANGELOG). patterns_version → 1.3. This release also ships the already-merged Albania completeness work (#118) currently sitting in [Unreleased].
Tests
388 passing (up from 333). New coverage: UK regex/outward extraction, NSPL parsing + doterm filter, outward index majority vote, GB alias, outward lookup tier, code_system tagging, ITL names loader, and NSPL-failure isolation. ruff clean.
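For readers unfamiliar with the outward index majority vote listed in the new coverage, a minimal sketch follows. `build_outward_index`, the `(postcode, itl3)` row shape, and the placeholder region codes are assumptions for illustration; the PR's own index builder may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def build_outward_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each outward code to the ITL3 that most of its postcodes belong to."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for postcode, itl3 in rows:
        outward = postcode.split(" ")[0]  # assumes "SW1A 1AA" style spacing
        votes[outward][itl3] += 1
    # most_common(1) yields the top (value, count) pair for each outward code.
    return {outward: counts.most_common(1)[0][0] for outward, counts in votes.items()}


rows = [
    ("SW1A 1AA", "A"),  # placeholder region codes, not real ITL3 values
    ("SW1A 2AA", "A"),
    ("SW1A 2AB", "B"),
    ("EH1 1AA", "C"),
]
index = build_outward_index(rows)
```

Here `SW1A` maps to `A` because two of its three postcodes vote for it, which is the value an outward-only lookup would report at estimated/medium confidence.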