Tier 0.5 - Federal+NY ZIP USPS-completeness audit (2026-05-01)
Status: Complete. 4,363 docs inserted across 3 classes. Closes Issue #80 (parent context: Issue #79).
Purpose: Catch the structural blind-spot in Tier 0. Tier 0 used U.S. Census 2020 ZCTA as its universe; Census ZCTA only catalogs ZIPs with significant residential population. PO-Box-only, business-only, single-building, and other USPS-only ZIPs are CMS-recognized but Census-blind. Tier 0.5 closes that gap by using a USPS-derived universe.
Trigger: 2026-05-01 user report - co-founder entered ZIP 85001 (downtown Phoenix) on the prod calculator and got a 404. CMS Marketplace API correctly identifies 85001 as AZ/Maricopa County (Rating Area 4). Tier 0 had no 85001 doc because Census 2020 ZCTA doesn't track PO-Box-only ZIPs.
Summary
| Metric | Count |
|---|---|
| USPS universe (federal+NY filter, zipcodes npm) | 24,945 distinct ZIPs |
| DB before audit (federal+NY clean) | 20,618 distinct ZIPs (across 30,695 docs) |
| DB after audit (federal+NY clean) | 24,965 distinct ZIPs (across 35,058 docs) |
| Gap zips (USPS \ DB) - Tier 0's blind spot | 4,347 |
| Insertable gaps inserted | 3,842 docs (across 3,829 unique ZIPs; 13 multi-county) |
| Discrepancy docs inserted (cross-jurisdiction) | 3 |
| Non-residential docs inserted (corporate ZIPs) | 518 |
| Total Tier 0.5 marker docs | 4,363 |
| Extras (DB has, USPS-npm doesn't) | 20 (left untouched - see "Extras" section) |
| Needs-PUF | 0 (every CMS-confirmed county had an existing DB sibling for regionId derivation) |
_seedSource: "federal-tier-0-5-audit-2026-05-01" on every inserted doc.
What was inserted
Class 1 - Insertable (3,842 docs across 3,829 unique ZIPs)
Standard federal+NY county doc with regionId derived from existing same-county DB siblings (identical pattern to Tier 0). Each entry was independently CMS-confirmed via https://marketplace.api.healthcare.gov/api/v1/counties/by/zip/{zip} before classification.
The 85001 user-reported case is in this class. Inserted as:
json
{
"zip": "85001",
"countyFips": "04013",
"county": "Maricopa County",
"state": "AZ",
"regionId": "Rating Area 4",
"_seedSource": "federal-tier-0-5-audit-2026-05-01"
}
Verified resolution: curl https://askflorence.health/api/counties?zip=85001 returns {"counties":[{"fips":"04013","name":"Maricopa County","state":"AZ"}]}. End-to-end plan lookup returns 86 plans.
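The sibling-based regionId derivation used for this class can be sketched as follows. This is a minimal illustration, assuming DB docs are plain objects with the countyFips and regionId fields shown in the example doc; the helper name deriveRegionId, the sample sibling ZIPs, and the choice to return null when siblings disagree are assumptions for illustration, not the seed script's actual code.

```javascript
// Hypothetical helper: derive the regionId for a new (zip, countyFips) doc
// from existing docs in the same county. Returns null when there is no
// sibling or when siblings disagree (assumed here to mean "needs-PUF").
function deriveRegionId(countyFips, existingDocs) {
  const regionIds = new Set(
    existingDocs
      .filter((doc) => doc.countyFips === countyFips && doc.regionId)
      .map((doc) => doc.regionId)
  );
  if (regionIds.size !== 1) return null; // no sibling, or conflicting siblings
  return [...regionIds][0];
}

// Illustrative sibling docs (sample data, not taken from the audit output).
const siblings = [
  { zip: "85003", countyFips: "04013", regionId: "Rating Area 4" },
  { zip: "85004", countyFips: "04013", regionId: "Rating Area 4" },
];
console.log(deriveRegionId("04013", siblings)); // "Rating Area 4"
```

Returning null rather than guessing keeps a county without an unambiguous sibling out of the insertable class.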
Per-state insertable breakdown:
| State | Count | State | Count | State | Count |
|---|---|---|---|---|---|
| AK | 28 | KS | 42 | OR | 50 |
| AL | 143 | LA | 181 | SC | 110 |
| AR | 90 | MI | 166 | SD | 10 |
| AZ | 100 | MO | 118 | TN | 149 |
| DE | 28 | MS | 89 | TX | 606 |
| FL | 445 | MT | 36 | UT | 45 |
| HI | 40 | NC | 226 | WI | 94 |
| IA | 86 | ND | 19 | WV | 113 |
| IN | 149 | NE | 30 | WY | 15 |
| NH | 34 | | | | |
| NY | 321 | | | | |
| OH | 180 | | | | |
| OK | 99 | | | | |
The 13 multi-county ZIPs (each gets 2 docs, one per county CMS returned) are distributed across states without any anomaly clustering.
Class 2 - Discrepancy (3 docs)
ZIPs where USPS classifies the ZIP under a federal-30 state but CMS routes it to a different jurisdiction. Inserted with the existing platform shapes (sbeRedirect for SBE-state routing, unsupported for territory/non-marketplace cases) so the calculator surfaces a meaningful response instead of a 404.
| ZIP | USPS state | CMS routes to | Doc shape | Why |
|---|---|---|---|---|
| 45275 | OH | KY (Boone County) | sbeRedirect to kynect | Cincinnati/Northern Kentucky International Airport (CVG); physically in Boone County, KY |
| 45999 | OH | KY (Kenton County) | sbeRedirect to kynect | IRS Service Center, Covington KY; postal classification points to OH |
| 96898 | HI | MH (Marshall Islands - Kwajalein Atoll) | unsupported: us_territory_no_marketplace | US Army installation under Compact of Free Association; ACA Marketplace not available |
Verified resolution:
- curl ?zip=45275 → {"sbeRedirect":{"state":"KY","marketplace":"kynect (kynect.ky.gov)"}}
- curl ?zip=96898 → {"unsupported":{"reason":"us_territory_no_marketplace","message":"ACA Marketplace coverage isn't available in the Marshall Islands. If you live or work at the Kwajalein Atoll installation, contact your sponsor's HR or your Tricare benefits administrator for coverage options."},"alternateCounties":[]}
Class 3 - Non-residential (518 docs)
Corporate, business-only, or single-recipient mailing ZIPs that USPS recognizes but CMS does NOT (CMS returns {"counties":null} for each). Examples: 10046 NY "Contest Mail", 10072 NY "Philip Morris", 10094 NY "Marden Kane Inc", 10197 NY "Citicorp Services Inc", 19889 DE "Beneficial Natl Bank".
CMS itself rejects these with "invalid zipcode or fips provided" if you POST a plans-search. Healthcare.gov's user flow surfaces "this ZIP isn't recognized" for the same set. Pre-Tier-0.5 our calculator returned bare 404 "Zip code not found".
Inserted shape (uniform across all 518):
json
{
"zip": "10197",
"countyFips": "",
"county": "",
"state": "NY",
"regionId": "",
"unsupported": {
"reason": "non_residential",
"message": "This ZIP code is registered as a corporate, business-only, or single-recipient mailing address and isn't used for residential health-insurance lookups. Please enter the ZIP code for the address where you actually live."
},
"_seedSource": "federal-tier-0-5-audit-2026-05-01"
}
The route's unsupported branch short-circuits before referencing countyFips/county/regionId, so the empty values are safe.
Why no alternateCounties: unlike PO-Box ZIPs (which serve real residential populations who picked the wrong ZIP), corporate ZIPs have no associated residential population - any "nearest county" computation would invent counties unrelated to where the corporation's employees actually live. CMS doesn't surface alternates for these either; the right UX nudge is "use your home ZIP," not "pick a nearby county."
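To illustrate why the empty countyFips/county/regionId values are safe, here is a minimal sketch of a resolver that checks the unsupported branch first. The function name resolveZipDocs, the HTTP status values, and the 404 body shape are hypothetical; only the counties, sbeRedirect, and unsupported shapes come from the verified responses in this document, and the real route handler may differ.

```javascript
// Hypothetical resolver over the docs found for one ZIP. The unsupported
// check runs first, so county fields are never read for those docs.
function resolveZipDocs(docs) {
  if (docs.length === 0) {
    // Assumed 404 shape; the document only quotes the message text.
    return { status: 404, body: { error: "Zip code not found" } };
  }
  const unsupportedDoc = docs.find((doc) => doc.unsupported);
  if (unsupportedDoc) {
    return { status: 200, body: { unsupported: unsupportedDoc.unsupported } };
  }
  const redirectDoc = docs.find((doc) => doc.sbeRedirect);
  if (redirectDoc) {
    return { status: 200, body: { sbeRedirect: redirectDoc.sbeRedirect } };
  }
  // Only standard county docs reach this point.
  return {
    status: 200,
    body: {
      counties: docs.map((doc) => ({
        fips: doc.countyFips,
        name: doc.county,
        state: doc.state,
      })),
    },
  };
}

const corporateDoc = {
  zip: "10197", countyFips: "", county: "", state: "NY", regionId: "",
  unsupported: { reason: "non_residential", message: "Please enter your home ZIP." },
};
console.log(resolveZipDocs([corporateDoc]).body.unsupported.reason); // "non_residential"
```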
Per-state non-residential breakdown:
| State | Count | State | Count | State | Count |
|---|---|---|---|---|---|
| AK | 1 | KS | 10 | OR | 12 |
| AL | 40 | LA | 6 | SC | 6 |
| AR | 7 | MI | 12 | SD | 9 |
| AZ | 51 | MO | 20 | TN | 11 |
| DE | 2 | MS | 17 | TX | 65 |
| FL | 36 | MT | 1 | UT | 5 |
| HI | 2 | NC | 12 | WI | 21 |
| IA | 9 | ND | 1 | WV | 6 |
| IN | 32 | NE | 5 | WY | 1 |
| NH | 3 | | | | |
| NY | 67 | | | | |
| OH | 35 | | | | |
| OK | 13 | | | | |
Concentration matches business-density pattern (NY 67, TX 65, AZ 51, AL 40, FL 36 lead).
What was NOT inserted (documented exclusions)
Extras (20 ZIPs in DB but not in USPS-npm universe)
20135 42223 42602 56144 56136 56219 56220 56164 56257 80737
72405 72713 75036 75072 83342 89421 99362 30555 30559 88240
Two well-understood subclasses:
16 of 20 - Cross-state border ZIPs USPS classifies under SBE host state:
These ZIPs serve residents on BOTH sides of a state line. USPS picks ONE state for postal classification; CMS knows about every county the ZIP covers. Our DB carries multi-county truth. When USPS picks an SBE state for postal classification, the npm-derived universe filters out the ZIP entirely (we filter to federal+NY) - so the federal-side entry in our DB looks "extra" relative to the npm filtered set.
Examples:
- 20135 Bluemont - USPS=VA (SBE), CMS returns VA-Clarke + WV-Jefferson + VA-Loudoun. DB has all 3.
- 42223 Fort Campbell military base - USPS=KY (SBE since 2024 kynect), CMS returns KY-Christian + TN-Montgomery. DB has both.
- 56144 Jasper MN - USPS=MN (SBE), CMS returns MN-Rock + MN-Pipestone + SD-Moody. DB has all 3.
- 30555 30559 88240 - the 3 Path 1 cross-state border fixes (commit 843bdf7).
4 of 20 - ZIPs missing from zipcodes npm package (data freshness gap):
75036, 72405, 72713, 75072 - all in our DB (sourced from CMS at insert time, fully canonical) but not in the zipcodes npm package. Likely USPS additions after the npm package's last data refresh; high-growth corridors like Frisco/McKinney TX often see new ZIPs.
Action: leave alone. Existing DB data is canonical. The "extra" classification is an artifact of incomplete USPS-npm universe, not a data defect. If we ever upgrade to the HUD ZIP-County crosswalk (richer, refreshed quarterly), Tier 0.5 audits will catch any future cases of this pattern automatically.
Methodology
Universe choice
We considered two USPS-derived universe sources:
- HUD ZIP-County crosswalk - refreshed quarterly, requires HUD account (auth click-through), authoritative.
- zipcodes npm package - MIT-licensed, ~44K USPS-derived records, zero auth friction, slightly stale.
Chose zipcodes npm for v1 of Tier 0.5 for zero auth friction. Sanity-gated against 6 known PO-Box-only ZIPs (85001 AZ, 10008 NY, 33101 FL, 78201 TX, 73101 OK, 84101 UT). All 6 present.
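The sanity gate amounts to a membership check over the snapshot. A minimal sketch, assuming the universe is available as an array of ZIP strings; the helper name missingSanityZips is hypothetical, while the six ZIPs are the ones listed above.

```javascript
// The six known PO-Box-only ZIPs used as the sanity gate.
const KNOWN_PO_BOX_ZIPS = ["85001", "10008", "33101", "78201", "73101", "84101"];

// Hypothetical helper: return the gate ZIPs that are absent from the universe.
function missingSanityZips(universeZips, required = KNOWN_PO_BOX_ZIPS) {
  const universe = new Set(universeZips);
  return required.filter((zip) => !universe.has(zip));
}

// Illustrative universe that lacks one gate ZIP.
const sampleUniverse = ["85001", "10008", "33101", "78201", "73101"];
const missing = missingSanityZips(sampleUniverse);
if (missing.length > 0) {
  console.log(`sanity gate failed, missing: ${missing.join(", ")}`);
}
```

A snapshot build would presumably abort on a non-empty result instead of logging, so a universe source that has lost PO-Box-only ZIPs never reaches the audit.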
The 4 npm-stale extras (75036, 72405, 72713, 75072) are the cost of using a static package vs a quarterly-refreshed crosswalk. Acceptable trade-off for Tier 0.5 v1 since they don't introduce inconsistency (CMS knows about them too; our DB is already correct for them). HUD upgrade is the right next step for ongoing refresh cadence.
Structural difference from Tier 0
Tier 0 universe was Census ZCTA, which carries (zip, state, countyFips) tuples natively. Universe membership and DB membership were both checked at the (zip, countyFips) tuple level.
Tier 0.5 universe is (zip, state) only - no countyFips. So gap detection is at the ZIP level, and CMS becomes the authoritative source for (state, countyFips, county) assignment per gap ZIP. This is more correct for the Tier 0.5 thesis (CMS uses USPS data to assign counties, so CMS will agree with USPS on what ZIPs exist).
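Zip-level gap detection is a plain set difference in both directions. A minimal sketch, assuming both sides are arrays of ZIP strings; the function name diffZipSets and the sample inputs are illustrative only.

```javascript
// Hypothetical helper: gaps are ZIPs in the USPS universe but not in the DB;
// extras are ZIPs in the DB but not in the USPS universe.
function diffZipSets(uspsZips, dbZips) {
  const usps = new Set(uspsZips);
  const db = new Set(dbZips);
  return {
    gaps: [...usps].filter((zip) => !db.has(zip)).sort(),
    extras: [...db].filter((zip) => !usps.has(zip)).sort(),
  };
}

// Illustrative input: 85001 and 10197 are gaps, 20135 is an extra.
const result = diffZipSets(["85001", "85003", "10197"], ["85003", "20135"]);
console.log(result.gaps);   // [ '10197', '85001' ]
console.log(result.extras); // [ '20135' ]
```

Because the comparison is keyed on the ZIP alone, a ZIP that is present in the DB with only some of its counties is not reported as a gap.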
Pipeline
- scripts/db/build-usps-snapshot.js - filter zipcodes npm to federal-30 + NY, emit data/usps-zip-state-2026-05-01.csv (24,945 ZIPs).
- scripts/db/audit-federal-completeness-tier-0-5.js - load universe, load DB state, set-difference at zip-level, query CMS for each gap zip, classify each (zip, countyFips) CMS returns into insertable / needs-PUF / discrepancy / cms-error.
- scripts/db/retry-cms-errors-tier-0-5.js - retry pass for HTTP 429s at lower concurrency with exponential backoff, then re-classify (initial run at concurrency=10 hit CMS rate limits; retry at concurrency=3 with backoff cleared 2,064 of 2,352 transient failures).
- scripts/db/seed-federal-tier-0-5.js - apply user-greenlighted batches with --state / --class filters, three-mode CLI (--dry-run / --apply / --rollback), idempotency guard, marker tagging.
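The classification step of the audit script can be sketched roughly as below. This is an assumption-laden illustration, not the script's actual code: the function name classifyGapZip, the shape of cmsResult (an error flag, or a counties array whose entries carry fips and state), and the two lookup sets are all hypothetical. The class names follow this document, with counties: null mapped to the non-residential class described under Class 3.

```javascript
// Hypothetical classifier for one gap ZIP.
// federalStates: Set of state codes in the federal+NY filter.
// countiesWithRegionId: Set of county FIPS that already have a DB sibling
// from which regionId can be derived.
function classifyGapZip(zip, cmsResult, federalStates, countiesWithRegionId) {
  if (cmsResult.error) return [{ zip, class: "cms-error" }];
  if (!cmsResult.counties || cmsResult.counties.length === 0) {
    return [{ zip, class: "non-residential" }];
  }
  // One classification per (zip, countyFips) tuple CMS returned.
  return cmsResult.counties.map((county) => {
    let cls;
    if (!federalStates.has(county.state)) cls = "discrepancy";
    else if (countiesWithRegionId.has(county.fips)) cls = "insertable";
    else cls = "needs-PUF";
    return { zip, countyFips: county.fips, class: cls };
  });
}

// Illustrative inputs (sample sets, not the full federal+NY list).
const federalStates = new Set(["AZ", "OH", "NY"]);
const countiesWithRegionId = new Set(["04013"]);
console.log(
  classifyGapZip("85001", { counties: [{ fips: "04013", state: "AZ" }] }, federalStates, countiesWithRegionId)
); // [ { zip: '85001', countyFips: '04013', class: 'insertable' } ]
```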
Constraints honored
Per the session plan:
Constraint 1 - PROD BACKUP BEFORE EVERY APPLY. Every batch was preceded by a fresh mongodump of the entire zip_county collection, verified by file size + record count + sha256 + sample round-trip. Three backups taken across three batches. Stored at ~/Documents/askflorence-db-backups/zip_county/<TAG>/ (local-only - S3 sync blocked by bucket policy on the SSO admin role; flagged as separate ops follow-up).
| Batch | Backup tag | Pre-apply count | sha256 |
|---|---|---|---|
| 1: AZ insertable (100 docs) | pre-tier-0-5-batch-az-insertable-20260501T220646Z | 48,232 | bd71519d... |
| 2: discrepancy (3 docs) | pre-tier-0-5-batch-discrepancy-20260501T222700Z | 48,332 | 7801459d... |
| 3: bulk-remaining (4,260 docs) | pre-tier-0-5-batch-bulk-remaining-20260501T222753Z | 48,335 | aa81da42... |
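Two of the backup checks, file size and sha256, can be sketched with Node's built-in crypto module. The helper names sha256Hex and verifyBackup are hypothetical, and the sketch covers neither the record-count check nor the sample round-trip; it is an illustration of the idea rather than the procedure actually used.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded sha256 digest of a buffer.
function sha256Hex(buffer) {
  return createHash("sha256").update(buffer).digest("hex");
}

// Hypothetical check: compare a backup's bytes against the size and digest
// recorded when the dump was taken.
function verifyBackup(buffer, expected) {
  const actual = { bytes: buffer.length, sha256: sha256Hex(buffer) };
  const ok = actual.bytes === expected.bytes && actual.sha256 === expected.sha256;
  return { ok, actual };
}

// Illustrative payload standing in for a dump file's contents.
const dump = Buffer.from("abc");
const recorded = { bytes: dump.length, sha256: sha256Hex(dump) };
console.log(verifyBackup(dump, recorded).ok); // true
```

For a real dump the buffer would come from reading the file on disk; a multi-gigabyte dump would be better hashed as a stream than loaded whole.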
Constraint 2 - ANALYZE → REPORT → PHASED USER DECISIONS, never auto-update. No write occurred without an explicit user greenlight. Three batches applied as: AZ insertable (smoke + 85001 quick-win) → discrepancy (validated new sbeRedirect + territory shapes) → bulk-remaining (rest of insertable + non_residential).
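The phased-apply model (dry-run, apply, rollback by marker) can be illustrated with an in-memory sketch. Everything here is hypothetical: planBatch and runBatch are illustrative names, the collection is a plain array rather than MongoDB, and the (zip, countyFips) idempotency key is an assumption about how the guard works.

```javascript
const MARKER = "federal-tier-0-5-audit-2026-05-01";

// Idempotency guard: skip candidates whose (zip, countyFips) already exists,
// and tag everything else with the marker.
function planBatch(existingDocs, candidates, marker = MARKER) {
  const key = (doc) => `${doc.zip}|${doc.countyFips}`;
  const present = new Set(existingDocs.map(key));
  const toInsert = [];
  for (const candidate of candidates) {
    if (present.has(key(candidate))) continue;
    present.add(key(candidate));
    toInsert.push({ ...candidate, _seedSource: marker });
  }
  return toInsert;
}

// Three modes: dry-run reports only, apply inserts, rollback removes by marker.
function runBatch(mode, existingDocs, candidates, marker = MARKER) {
  switch (mode) {
    case "dry-run":
      return { docs: existingDocs, changed: planBatch(existingDocs, candidates, marker).length };
    case "apply": {
      const toInsert = planBatch(existingDocs, candidates, marker);
      return { docs: [...existingDocs, ...toInsert], changed: toInsert.length };
    }
    case "rollback": {
      const kept = existingDocs.filter((doc) => doc._seedSource !== marker);
      return { docs: kept, changed: existingDocs.length - kept.length };
    }
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}

const before = [{ zip: "85003", countyFips: "04013" }];
const candidates = [
  { zip: "85001", countyFips: "04013" },
  { zip: "85003", countyFips: "04013" }, // already present, skipped
];
const applied = runBatch("apply", before, candidates);
console.log(applied.changed); // 1
```

Because rollback keys only on the marker, it removes exactly what a batch inserted and leaves pre-existing docs untouched.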
Verification - TRUE 100% match achieved
| Gate | Result |
|---|---|
| 85001 prod live API | Returns {"counties":[{"fips":"04013","name":"Maricopa County","state":"AZ"}]} ✓ |
| 85001 plan lookup end-to-end | 86 plans returned (Catastrophic Standard 68445AZ0590050 $338.26/mo first) ✓ |
| 50613 prod live API | Returns 4 counties (Black Hawk + Bremer + Butler + Grundy) ✓ |
| Calculator baseline diff (12 scenarios) | ZERO DIFFS post-batch-1 + post-batch-3 ✓ |
| Tier 0.5 re-run | 0 gap zips remaining (was 4,347) ✓ |
| Tier 1 audit (federal zip-county) | 22,302/22,302 = 100.00% exact match ✓ (after audit-script patch + 50613 fix + rate-limit retry validation) |
| Tier 1.5 audit (SBE zip-county) | 13,055/13,055 = 100.00% exact match ✓ (after rate-limit retry validation) |
| Smoke matrix on 10 inserted ZIPs across multiple states | 10/10 return correct counties ✓ |
| Smoke matrix on 5 non_residential ZIPs | 5/5 return generic unsupported message ✓ |
| Smoke matrix on 3 discrepancy ZIPs | 3/3 return correct sbeRedirect/unsupported shapes ✓ |
| Per-state DB count vs audit prediction | Exact match across all 31 federal+NY states + KY (2) + MH (1) ✓ |
Path to TRUE 100% (Phase 8b drive-to-100% effort)
Initial post-apply audits showed Tier 1 = 99.84% (2 mismatches + 33 rate-limit errors) and Tier 1.5 = 99.80% (0 mismatches + 26 rate-limit errors). Three separate issues emerged:
Issue A: 33+26 CMS rate-limit errors in initial audits (UNKNOWN status). Built scripts/audit/validate-cms-errors.js to retry each at concurrency=1 with exponential backoff (5s/10s/20s/40s/80s) and re-classify via the same DB-vs-CMS comparison the original audit does. Result: 33/33 Tier 1 retries = MATCH; 26/26 Tier 1.5 retries = MATCH. No real mismatches were hiding behind rate limits.
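The 5s/10s/20s/40s/80s schedule is a doubling backoff from a 5-second base. A minimal sketch of that retry loop follows; the names backoffDelaysMs and retryWithBackoff are hypothetical, and unlike the real script, which targets HTTP 429s, this version retries on any thrown error.

```javascript
// Doubling delay schedule: 5000, 10000, 20000, 40000, 80000 ms by default.
function backoffDelaysMs(baseMs = 5000, attempts = 5) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Run task once, then retry after each delay until it succeeds or the
// schedule is exhausted. sleep is injectable so callers can stub it.
async function retryWithBackoff(task, options = {}) {
  const delays = options.delays ?? backoffDelaysMs();
  const sleep = options.sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === delays.length) break; // schedule exhausted
      await sleep(delays[attempt]);
    }
  }
  throw lastError;
}

console.log(backoffDelaysMs()); // [ 5000, 10000, 20000, 40000, 80000 ]
```

Running the calls one at a time (concurrency=1) means each retry waits out its own delay without competing requests adding to the rate-limit pressure.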
Issue B: ZIP 96898 Marshall Islands - audit-script false positive. The Tier 1 script's $match filter excluded our MH/Kwajalein unsupported-class doc from the "ours" comparison but didn't subtract the corresponding (zip, fips) tuple from the CMS-side, so it surfaced as "extra in CMS / county-count mismatch." Patched the audit script to pre-fetch all (zip, fips) tuples from unsupported-class or non-federal-state docs and subtract them from the CMS comparison. Re-ran patched audit; 96898 now exact match. Patch is permanent in the audit script for future runs.
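The shape of the Issue B patch can be sketched as a symmetric exclusion: tuples belonging to excluded docs are removed from both sides before comparing. The function name compareZipTuples, the isExcluded predicate, and the placeholder FIPS value in the example are hypothetical; the actual audit script works against MongoDB and may be structured differently.

```javascript
// Hypothetical comparison of DB docs vs CMS (zip, fips) tuples.
// Docs matching isExcluded are dropped from "ours" AND their tuples are
// subtracted from the CMS side, so they cannot surface as mismatches.
function compareZipTuples(dbDocs, cmsTuples, isExcluded) {
  const key = (zip, fips) => `${zip}|${fips}`;
  const excluded = new Set(
    dbDocs.filter(isExcluded).map((doc) => key(doc.zip, doc.countyFips))
  );
  const ours = new Set(
    dbDocs.filter((doc) => !isExcluded(doc)).map((doc) => key(doc.zip, doc.countyFips))
  );
  const theirs = new Set(
    cmsTuples.map((t) => key(t.zip, t.fips)).filter((k) => !excluded.has(k))
  );
  return {
    missingFromDb: [...theirs].filter((k) => !ours.has(k)),
    extraInDb: [...ours].filter((k) => !theirs.has(k)),
  };
}

// Illustrative data; "99999" is a placeholder FIPS, not the real value.
const dbDocs = [
  { zip: "85001", countyFips: "04013" },
  { zip: "96898", countyFips: "99999", unsupported: { reason: "us_territory_no_marketplace" } },
];
const cmsTuples = [
  { zip: "85001", fips: "04013" },
  { zip: "96898", fips: "99999" },
];
const diff = compareZipTuples(dbDocs, cmsTuples, (doc) => Boolean(doc.unsupported));
console.log(diff); // { missingFromDb: [], extraInDb: [] }
```

Without the subtraction on the CMS side, the 96898 tuple would appear in missingFromDb, which is the false positive described above.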
Issue C: ZIP 50613 IA - real data gap (missing Bremer County). This was a tuple-level multi-county completeness gap that Tier 0.5's zip-level gap detection didn't catch. Validated via 5x CMS lookups (5/5 returned Bremer = consistent), regionMap availability (13 existing IA/19017 sibling docs all regionId: "Rating Area 7"). Built scripts/db/fix-tier-1-completeness-gaps.js (dedicated _seedSource: "tier-1-completeness-fix-2026-05-01" marker for surgical rollback), took fresh backup (pre-tier-1-completeness-fix-50613-20260501T231959Z), applied 1 doc. Smoke test: prod /api/counties?zip=50613 returns all 4 counties.
After A + B + C: re-ran patched Tier 1 fresh (cleared progress cache) → 22,302/22,302 = 100.00% exact match, 0 mismatches, 0 extras, 1 transient rate-limit error (validated as MATCH on retry).
Refresh cadence
Annual refresh playbook (extends Tier 0's):
- Update zipcodes npm package: run npm update zipcodes to pull the latest USPS data refresh.
- Rebuild USPS snapshot: node scripts/db/build-usps-snapshot.js (verify sanity gate).
- Re-run Tier 0.5 audit: MONGODB_URI=<prod-read> CMS_API_KEY=<key> node scripts/db/audit-federal-completeness-tier-0-5.js.
- Retry rate-limit pass if needed: MONGODB_URI=... CMS_API_KEY=... node scripts/db/retry-cms-errors-tier-0-5.js.
- Triage report per Constraint 2 (per-state, per-class breakdown, anomaly flags).
- Phased apply with backups per Constraint 1.
- Append change-log entry with timestamp + commit SHA + counts.
Recommended upgrade path: swap zipcodes npm for HUD ZIP-County crosswalk (https://www.huduser.gov/portal/datasets/usps_crosswalk.html) before next plan-year refresh. Quarterly refresh + richer county-fips data + catches the 4 npm-stale extras automatically. Free; requires HUD account.
Limitations + follow-ups
- Tier 0.5b - tuple-level multi-county completeness audit. ZIP 50613 was the only one surfaced by Tier 1 here, but the underlying audit shape (zip-level gap detection rather than tuple-level) means there could be other ZIPs in DB with missing multi-county tuples that Tier 1 just happened to score as 22,302/22,302 because none of those particular ZIPs got audited (or all of their additional CMS counties happen to be already in DB). A defensive follow-up audit could iterate every ZIP in DB, query CMS, and insert any missing (zip, countyFips) tuples - same machinery as Tier 0.5 itself but tuple-level instead of zip-level. Tier 1's clean state today is the empirical floor; the systematic check would be the ceiling.
- zipcodes npm staleness: 4 ZIPs in our DB (75036, 72405, 72713, 75072) aren't in the npm package. Already correct in DB; HUD upgrade fixes the audit-side completeness.
- Local-only backups: S3 sync to s3://askflorence-data/db-backups/ blocked because the SSO admin role lacks bucket-policy access (correct prod hardening - ECS task role + GitHub OIDC role have access, human admin doesn't). Separate ops issue needed for either a scoped bucket-policy allow or a dedicated assumable backup-role. Until then: ~/Documents/askflorence-db-backups/zip_county/ is the canonical location.
- non_residential message tweak: consider improving the route-handler's bare 404 "Zip code not found" message even for ZIPs not in DB at all (e.g., user typo) - "ZIP not recognized; check the digits and try your home address" reads better than "Zip code not found." Frontend-side change, separate from Tier 0.5 scope.
Cross-references
- Issue #80 - execution tracker (closes when this doc lands)
- Issue #79 - parent context (Tier 0.5 gap class scoping)
- Tier 0 audit doc - precursor Census-derived audit
- Commit 7b716d0 - Phase 5 dry-run audit + scripts
- Commit 749a13d - seed script
- ~/.claude/plans/users-tahaabbasi-developer-ask-florence-fizzy-gray.md - session plan