feat: authoritative Albania block resolver (#118) #133

Merged: bk86a merged 7 commits into main from feat/albania-block-resolver on Jul 3, 2026

@bk86a bk86a commented Jul 3, 2026

Owner

Closes #118.

Problem

Albania has no Eurostat TERCET file, so v0.21.0 shipped AL as incomplete GeoNames estimates (489 codes). Officially valid codes that GeoNames omitted returned 404: for example Tirana 1055/1057/1060–1065, and whole districts GeoNames never covered (Gramsh 33xx, Peqin 35xx, Tepelenë 63xx, Përmet 64xx). The gap spanned all 12 qarks.

Approach

Albanian postal codes are block-allocated by district: the first two digits identify one of ~33 postal districts, each belonging to exactly one of the 12 NUTS3 qarks. This branch replaces the GeoNames enumeration with an authoritative in-code block resolver (app/albania_blocks.py) that maps any well-formed 4-digit code to its qark by the block it falls into — covering the gaps by construction, at NUTS3 granularity.

  • New app/albania_blocks.py: ~35-row block table + resolve_al_block() (bisect).
  • lookup() gains an AL-only "Tier 2b" consulting it; get_loaded_countries() keeps AL supported.
  • The 489 GeoNames AL rows and scripts/build_albania_estimates.py are removed — the block map (code, not data) is now AL's sole resolver, so a PC2NUTS_ESTIMATES_REFRESH_URL full-replace can no longer clobber AL coverage.
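A minimal sketch of the bisect-based lookup described above, for illustration only. The table below is a hypothetical six-row subset (the real module is described as having ~35 rows), the block start values and NUTS3 assignments are assumptions, and the validation in front of the bisect is a guess at what "well-formed 4-digit code" means rather than the module's actual checks.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical subset: (block start, NUTS3 code). Values are assumptions
# chosen to match the districts named in this PR, not the real table.
_BLOCKS = [
    (1000, "AL022"),  # Tirana
    (3300, "AL021"),  # Gramsh
    (3500, "AL021"),  # Peqin
    (6300, "AL033"),  # Tepelenë
    (6400, "AL033"),  # Përmet
    (9700, "AL035"),  # Sarandë (assumed start; open-ended at the top)
]
_STARTS = [start for start, _ in _BLOCKS]
_NUTS3 = [code for _, code in _BLOCKS]


def resolve_al_block(postal_code: str) -> Optional[str]:
    """Map a 4-digit AL postal code to the NUTS3 code of its enclosing block."""
    # Reject anything that is not exactly four ASCII digits.
    if len(postal_code) != 4 or not (postal_code.isascii() and postal_code.isdigit()):
        return None
    n = int(postal_code)
    # Codes below the first block start have no enclosing block.
    if n < _STARTS[0]:
        return None
    # bisect_right finds the first start greater than n; the block before it
    # is the one that contains n.
    return _NUTS3[bisect_right(_STARTS, n) - 1]


print(resolve_al_block("1055"))  # AL022
print(resolve_al_block("9999"))  # AL035
print(resolve_al_block("0999"))  # None
```

Because every value at or above the first start falls into some block, this form never returns no-match for a numeric code of 1000 or more.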

Continuity

match_type="estimated" and high confidence (nuts3 0.9) are unchanged, so AL's /lookup and /resolve behavior is identical for the 489 known codes (in particular /resolve still won't geocode AL). The only observable change: gap codes now resolve instead of 404.

Validation

A golden regression test captures all 489 previously shipped (code → NUTS3) pairs as a fixture and asserts that the block resolver reproduces every one, with 0 mismatches. Additional tests cover gap codes, service codes (1700/1800), the top open range (9800–9999 → AL035), and malformed or out-of-range input. Docs (README + lookup() docstring + estimates statistics) updated to match.
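The golden-test idea can be sketched roughly as follows. The fixture here is a made-up two-entry stand-in for the 489 real pairs, and `find_mismatches` and `stub_resolver` are hypothetical names introduced for this example; the actual test layout in the repository may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the real fixture of 489 (code -> NUTS3) pairs.
GOLDEN: Dict[str, str] = {
    "1001": "AL022",
    "1002": "AL022",
}


def find_mismatches(
    golden: Dict[str, str],
    resolver: Callable[[str], Optional[str]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (code, expected, actual) for every pair the resolver gets wrong."""
    return [
        (code, expected, resolver(code))
        for code, expected in sorted(golden.items())
        if resolver(code) != expected
    ]


def stub_resolver(postal_code: str) -> Optional[str]:
    # Placeholder resolver used only to make the sketch runnable.
    return "AL022" if postal_code.startswith("10") else None


# The regression assertion: the new resolver must reproduce every old pair.
assert find_mismatches(GOLDEN, stub_resolver) == []
```

Collecting the mismatches into a list, rather than asserting pair by pair, means a failing run reports every divergent code at once.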

Suite: 48 passed across the AL + data_loader tests, ruff clean.

@bk86a bk86a merged commit 2da6a8f into main Jul 3, 2026
10 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec2290baa2

Comment thread app/albania_blocks.py
    n = int(postal_code)
    if n < _STARTS[0]:
        return None
    return _NUTS3[bisect_right(_STARTS, n) - 1]

P2: Reject unallocated AL prefixes instead of range-filling them

This lookup uses the previous center code for every numeric value until the next center, so an AL input like 1900 or 9999 is returned as a high-confidence Tirana/Sarandë estimate even though the module's own allocation describes district identity by the first two digits and BLOCKS only lists prefixes such as 10, 15, 17, 18, 20, etc. The new tier therefore turns unallocated 4-digit AL codes in the gaps between listed prefixes into successful NUTS matches; use the first-two-digit block keys (or explicit valid ranges) rather than a continuous bisect interval if these should keep returning no match.

Owner Author

Correct that the bisect range-fills: any code ≥ 1000 resolves to its enclosing district block, so unallocated-prefix inputs like 1900 or 9999 return a best-effort region rather than no-match. This is a deliberate design choice, not an oversight — the goal of #118 was to stop valid AL codes 404-ing, and the resolver leans toward best-effort coverage (documented in the docstring's "9800–9999 → Sarandë as best-effort" note).

Worth stating precisely, since it bounds the concern: all 489 real AL codes — and every gap code #118 was about — sit in allocated 2-digit prefixes, so they resolve identically under either approach (the golden regression test covers this). The only inputs affected by range-fill vs. prefix-strict are ~5,500 codes in unallocated prefixes, which are almost certainly non-existent codes. Range-fill returns a plausible neighbor region for those; prefix-strict would 404 them.
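The ~5,500 figure can be checked with quick arithmetic, assuming the 35 allocated 2-digit prefixes cited in this thread and that each prefix spans 100 codes (xx00 to xx99):

```python
# Four-digit codes at or above 1000 span the 2-digit prefixes 10..99.
total_prefixes = len(range(10, 100))  # 90 prefixes
allocated_prefixes = 35               # figure quoted in this thread
codes_per_prefix = 100                # xx00..xx99

unallocated_codes = (total_prefixes - allocated_prefixes) * codes_per_prefix
print(unallocated_codes)  # 5500
```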

Your alternative (key on the 35 allocated 2-digit prefixes, 404 the gaps) is a reasonable and arguably more honest tradeoff — it stops asserting confidence 0.9 for codes with no allocated district — and it's free for real coverage. I've flagged it to the maintainer as a follow-up decision, since switching reverses the approved best-effort behavior. If we adopt it, it'll be a small follow-up PR against the _STARTS/bisect logic. Thanks for the catch.

Owner Author

Addressed in #134: the resolver now keys on the allocated 2-digit district prefix, so codes in unallocated prefixes (1900, 9999, …) return not-found instead of a fabricated region. All 489 real codes and every #118 gap code still resolve identically (golden test unchanged).
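For comparison, a prefix-strict lookup of the kind described here might look like the sketch below. This is not the code from #134: the function name `resolve_al_prefix` is hypothetical, and the mapping is an assumed five-entry subset of the allocated prefixes, with NUTS3 values chosen for illustration.

```python
from typing import Optional

# Hypothetical subset of allocated 2-digit district prefixes -> NUTS3.
_PREFIX_TO_NUTS3 = {
    "10": "AL022",  # Tirana
    "33": "AL021",  # Gramsh
    "35": "AL021",  # Peqin
    "63": "AL033",  # Tepelenë
    "64": "AL033",  # Përmet
}


def resolve_al_prefix(postal_code: str) -> Optional[str]:
    """Resolve only codes whose first two digits are an allocated prefix."""
    # Reject anything that is not exactly four ASCII digits.
    if len(postal_code) != 4 or not (postal_code.isascii() and postal_code.isdigit()):
        return None
    # Unallocated prefixes are absent from the dict, so .get() returns None.
    return _PREFIX_TO_NUTS3.get(postal_code[:2])


print(resolve_al_prefix("1055"))  # AL022
print(resolve_al_prefix("1900"))  # None
```

Keying on an exact prefix turns the gaps between districts into explicit misses, at the cost of having to list every allocated prefix individually.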

@bk86a bk86a deleted the feat/albania-block-resolver branch July 4, 2026 07:07

Development

Successfully merging this pull request may close these issues.

Albania (AL) postal-code coverage completeness (GeoNames gaps)
