Tier 0 — Federal ZIP completeness audit (2026-05-01)

Status: Complete. 366 gaps inserted; 451 discrepancies logged for review; 1,353 extras logged. Closes Issue #73 Path 2.
Purpose: Systematic completeness check of federal-30 + NY ZIP coverage in zip_county against the U.S. Census 2020 ZCTA-County universe. Companion to the Tier 1 zip-county audit (which checks accuracy at 100% match) — Tier 0 checks coverage.

Summary

Metric	Count
Census federal+NY universe (zip, countyFips tuples)	29,793
DB before audit	30,329
DB after audit	30,695
Insertable gaps (Census has, DB doesn't, CMS confirms)	366
Discrepancies (Census says X, CMS says Y — log only, not inserted)	451
Extras (DB has, Census doesn't — informational, not modified)	1,353
Needs-PUF (county entirely missing — would need PUF re-ingest)	0
CMS errors during audit	0
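
The counts in this table reconcile as set arithmetic, which is a quick sanity check on the audit. The sketch below uses only the numbers reported above and assumes each DB doc corresponds to exactly one (zip, countyFips) tuple; the variable names are illustrative and not taken from the audit scripts.

```javascript
// Reconcile the summary table as set arithmetic.
// All numbers come from the table above; the names are illustrative.
const censusUniverse = 29793; // (zip, countyFips) tuples in the Census snapshot
const dbBefore = 30329;
const dbAfter = 30695;
const insertable = 366;
const discrepancies = 451;
const needsPuf = 0;
const cmsErrors = 0;
const extras = 1353;

// Census \ DB: every gap falls into exactly one classification bucket.
const censusOnly = insertable + discrepancies + needsPuf + cmsErrors; // 817

// Tuples present on both sides before the audit.
const shared = censusUniverse - censusOnly; // 28976

// DB before the audit = shared tuples + DB-only extras.
const reconstructedDbBefore = shared + extras;

console.log(reconstructedDbBefore === dbBefore); // true
console.log(dbBefore + insertable === dbAfter); // true
```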

What was inserted

All 366 inserts are NY-side multi-county additions. Every NY ZIP that gained a doc already had ≥1 sibling doc in the DB; the audit added the missing additional county docs. The original NY ingest (scripts/db/load-ny-2026.js, 2026-04-12) loaded the primary county per ZIP for many multi-county ZIPs but missed the secondary counties.

Examples of fixed ZIPs:

10463 → had only Bronx; added New York County (Manhattan)
10470 → had only Bronx; added Westchester County
10509 → had only Putnam; added Westchester County
10940 → had only Orange; added Sullivan County

User impact: residents of these NY ZIPs whose actual address is in the secondary county will now correctly see plans for that county. Pre-audit they may have been mapped to the wrong county's plan flow.

Per-state breakdown

Insertable (366 gaps inserted)

State	Count
NY	366 (all)

All other federal-30 states: 0 insertable gaps. Federal-30 ZIP coverage was already complete at the (zip, countyFips) tuple level; only NY showed the missing-secondary-county pattern.

Extras (1,353 — logged, not modified)

DB has these (zip, countyFips) tuples; Census 2020 ZCTA doesn't recognize them. Likely because:

- Our PUF Service Areas data (CMS, more current) tracks ZIPs that Census 2020 ZCTA didn't capture (Census ZCTA boundaries lag USPS by years)
- Some county boundaries have changed since Census 2020
- Our DB tracks multi-county ZIPs beyond what Census 2020 reflects

Distribution:

State	Extras	Notes
TX	155	Largest absolute count (consistent with state size)
IA	96
IN	91
OH	86
KS	77
MO	71
NE	68
AR	67
MI	61
WI	55
TN	51
OK	50
SD	50
NC	46
WV	46
FL	43
MS	40
ND	38
AL	33
LA	32
MT	27
NH	14
OR	14
SC	14
WY	11
AK	9
UT	4
AZ	3
HI	1

These extras are not corrected because:

- They serve users correctly today (users in these ZIPs get plans via existing data)
- Removing them risks stranding real users on USPS ZIPs Census 2020 doesn't yet recognize
- CMS-side validation hasn't been run on the extras (out of scope for Tier 0)

A future audit could spot-check the extras against CMS to confirm they're real ZIPs. Out of scope here.
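
For reference, extras are simply the DB-only side of a set difference keyed on (zip, countyFips). A minimal sketch of that computation plus the per-state tally follows; the row shape `{ zip, countyFips, state }` and the function names are assumptions for illustration, and the sample rows are made up rather than taken from the report.

```javascript
// Sketch: find DB-only tuples (extras) and tally them by state.
// Row shape { zip, countyFips, state } is an assumption for illustration.
const tupleKey = (row) => `${row.zip}|${row.countyFips}`;

function findExtras(dbRows, censusRows) {
  const censusKeys = new Set(censusRows.map(tupleKey));
  return dbRows.filter((row) => !censusKeys.has(tupleKey(row)));
}

function tallyByState(rows) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.state, (counts.get(row.state) || 0) + 1);
  }
  // Largest count first, matching the ordering of the distribution table.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Tiny made-up sample (not real audit data).
const censusRows = [{ zip: "00001", countyFips: "48001", state: "TX" }];
const dbRows = [
  { zip: "00001", countyFips: "48001", state: "TX" },
  { zip: "00002", countyFips: "48003", state: "TX" },
  { zip: "00003", countyFips: "19001", state: "IA" },
  { zip: "00004", countyFips: "48005", state: "TX" },
];

// Three DB-only rows remain: two in TX, one in IA.
console.log(tallyByState(findExtras(dbRows, censusRows)));
```

Swapping the arguments to `findExtras` gives the other direction (Census \ DB), which is the gap set that feeds classification.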

Discrepancies (451 — logged, not inserted)

Census 2020 places ZIP X in county A; CMS places it in a different county B. For these, CMS wins: it is the canonical source, so the audit does not insert anything based on the Census view.

The discrepancies skew heavily toward NY (~440 of 451). The likely cause is that Census 2020 boundaries differ from the current CMS Marketplace API view of the ZIP→county mapping, particularly for upstate NY ZIPs.

Sample:

ZIP	Census says	CMS says
03458	NH/Cheshire	NH/Hillsborough
11370	NY/Bronx	NY/Queens
12120	NY/Greene	NY/Albany
12763	NY/Ulster	NY/Sullivan
12785	NY/Orange	NY/Sullivan

For these ZIPs, our DB likely already has the CMS-correct county. The discrepancy is just "Census 2020 ZCTA is mildly stale relative to current CMS data." No action.

The full list is in scripts/db/data/federal-gap-report-2026-05-01.json under the discrepancy key.

Methodology

Input sources

- Census 2020 ZCTA-County relationship file: the universe of ZCTA→county mappings, used here as a proxy for every U.S. ZIP→county mapping. Free and federal. ZCTAs are delineated once per decennial census, so this file is not expected to change from year to year. Source: https://www2.census.gov/geo/docs/maps-data/data/rel2020/zcta520/tab20_zcta520_county20_natl.txt
- MongoDB zip_county collection: the current state of our ZIP coverage data. Filter: state ∈ federal-30 ∪ {NY} AND sbeRedirect: { $exists: false }.
- CMS Marketplace API /counties/by/zip/{zip}: canonical truth for each gap, used to verify the Census claim before inserting.
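
As an illustration of the first input, the sketch below turns the relationship file's text into deduplicated (zip, countyFips) tuples for an allowlist of states. It is not the code in build-federal-snapshot.js. It assumes the file is pipe-delimited with a header row and columns named `GEOID_ZCTA5_20` and `GEOID_COUNTY_20`, which should be verified against the downloaded file; the function name and sample rows are hypothetical.

```javascript
// Sketch: turn the Census relationship file text into (zip, countyFips) tuples.
// Assumptions (verify against the downloaded file): pipe-delimited, a header
// row, and columns named GEOID_ZCTA5_20 and GEOID_COUNTY_20.
function parseZctaCounty(text, allowedStateFips) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = lines[0].replace(/^\uFEFF/, "").split("|");
  const zipIdx = header.indexOf("GEOID_ZCTA5_20");
  const countyIdx = header.indexOf("GEOID_COUNTY_20");
  if (zipIdx === -1 || countyIdx === -1) {
    throw new Error("Expected ZCTA/county GEOID columns not found in header");
  }

  const seen = new Set();
  const tuples = [];
  for (const line of lines.slice(1)) {
    const cols = line.split("|");
    const zip = cols[zipIdx];
    const countyFips = cols[countyIdx];
    // Skip rows with an empty ZCTA or county field, if any.
    if (!zip || !countyFips) continue;
    // The first two digits of a county GEOID are the state FIPS code.
    if (!allowedStateFips.has(countyFips.slice(0, 2))) continue;
    const key = `${zip}|${countyFips}`;
    if (seen.has(key)) continue; // keep one row per tuple
    seen.add(key);
    tuples.push({ zip, countyFips });
  }
  return tuples;
}

// Made-up sample in the assumed layout.
const sample = [
  "GEOID_ZCTA5_20|GEOID_COUNTY_20",
  "10463|36005",
  "10463|36061",
  "|36061",
  "90210|06037",
].join("\n");

// 36 = NY, 48 = TX (state FIPS codes); California (06) is filtered out here.
console.log(parseZctaCounty(sample, new Set(["36", "48"])));
```

Looking columns up by header name rather than position keeps the sketch tolerant of column reordering, at the cost of failing loudly if the names differ from the assumption.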

Pipeline

Census ZCTA file ──→ build-federal-snapshot.js ──→ federal-zip-state-2020.csv (committed)
                                                            │
                                                            ▼
                                          audit-federal-completeness.js ──→ federal-gap-report-2026-05-01.json
                                                            │              (committed)
                                                            ▼
                                            seed-federal-completeness.js
                                                  --apply
                                                            │
                                                            ▼
                                                  zip_county collection
                                                  (366 new docs with
                                                  _seedSource marker)

Classification logic per gap

For each (zip, countyFips) tuple in Census \ DB:

Query CMS for the ZIP. If CMS returns counties:
- CMS confirms (state, fips) match → check regionId lookup
  - regionId found in DB (county already has siblings) → insertable
  - regionId not found (county entirely absent from our PUF) → needs-PUF
- CMS doesn't return Census's expected county → discrepancy (logged only)
- CMS state ≠ Census state → discrepancy (logged only)
CMS error → cms-errors (logged for retry)
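
The decision tree above can be sketched as a single pure function. The object shapes (`gap`, `cmsResult`, `regionIds`) and the function name are assumptions for illustration, not the actual interfaces in audit-federal-completeness.js. The source also does not say how an empty CMS response is handled; this sketch treats it as a discrepancy.

```javascript
// Sketch of the per-gap classification. Object shapes are assumptions:
//   gap:        { zip, state, countyFips }  (a Census tuple missing from the DB)
//   cmsResult:  { error } or { counties: [{ state, fips }] }
//   regionIds:  Map of "state|countyFips" -> regionId, built from existing DB docs
function classifyGap(gap, cmsResult, regionIds) {
  if (cmsResult.error) return "cms-errors"; // logged for retry

  const counties = cmsResult.counties || [];
  const confirmed = counties.some(
    (c) => c.state === gap.state && c.fips === gap.countyFips
  );
  // Covers both "different county" and "different state". An empty CMS
  // response also lands here (an assumption; the source does not cover it).
  if (!confirmed) return "discrepancy";

  // CMS agrees with Census; the insert still needs a regionId from a sibling doc.
  return regionIds.has(`${gap.state}|${gap.countyFips}`)
    ? "insertable"
    : "needs-PUF";
}

// The regionId value is a placeholder, not a real rating-area identifier.
const regionIds = new Map([["NY|36061", "hypothetical-region-id"]]);
const gap = { zip: "10463", state: "NY", countyFips: "36061" };
const bronx = { state: "NY", fips: "36005" };
const manhattan = { state: "NY", fips: "36061" };

console.log(classifyGap(gap, { counties: [bronx, manhattan] }, regionIds)); // insertable
console.log(classifyGap(gap, { counties: [bronx] }, regionIds)); // discrepancy
console.log(classifyGap(gap, { counties: [manhattan] }, new Map())); // needs-PUF
console.log(classifyGap(gap, { error: new Error("timeout") }, regionIds)); // cms-errors
```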

Safety guards

- Hard-coded state allowlist (FEDERAL_STATES ∪ {NY}), so SBE-state docs are never touched
- Inserts are keyed per (zip, countyFips) and are idempotent
- Marker tag _seedSource: "federal-completeness-audit-2026-05-01" on every insert, which makes rollback unambiguous
- regionId is sourced from existing DB siblings (any other ZIP in the same (state, countyFips)), which guarantees rating-area consistency within a county
- Existing docs are never modified
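
A minimal sketch of how these guards could combine when planning inserts is shown below. It is an illustration under stated assumptions, not the code in seed-federal-completeness.js: the function name, the result shape, the regionId values, and the choice to count a missing sibling regionId as a missing-field rejection are all hypothetical.

```javascript
// Sketch of the insert guards. The function shape and result fields are
// hypothetical; only the _seedSource value comes from the document.
const SEED_SOURCE = "federal-completeness-audit-2026-05-01";

function planInserts(gaps, existingDocs, allowedStates) {
  const key = (d) => `${d.zip}|${d.countyFips}`;
  const present = new Set(existingDocs.map(key));

  // regionId comes from any existing doc in the same (state, countyFips).
  const regionIdByCounty = new Map();
  for (const doc of existingDocs) {
    const countyKey = `${doc.state}|${doc.countyFips}`;
    if (!regionIdByCounty.has(countyKey)) regionIdByCounty.set(countyKey, doc.regionId);
  }

  const result = { inserts: [], skipped: 0, rejectedState: 0, rejectedFields: 0 };
  for (const gap of gaps) {
    if (!gap.zip || !gap.countyFips || !gap.state) { result.rejectedFields++; continue; }
    if (!allowedStates.has(gap.state)) { result.rejectedState++; continue; }
    if (present.has(key(gap))) { result.skipped++; continue; } // idempotent skip
    const regionId = regionIdByCounty.get(`${gap.state}|${gap.countyFips}`);
    // Assumption: no sibling regionId is treated as a missing-field rejection.
    if (regionId === undefined) { result.rejectedFields++; continue; }
    result.inserts.push({ ...gap, regionId, _seedSource: SEED_SOURCE });
    present.add(key(gap)); // a repeated gap in the same run is also skipped
  }
  return result; // existingDocs is never mutated
}

// Made-up docs; the regionId values are placeholders.
const existing = [
  { zip: "10463", countyFips: "36005", state: "NY", regionId: "r-bronx" },
  { zip: "10001", countyFips: "36061", state: "NY", regionId: "r-manhattan" },
];
const gaps = [
  { zip: "10463", countyFips: "36061", state: "NY" },
  { zip: "10463", countyFips: "36005", state: "NY" }, // already present
  { zip: "94105", countyFips: "06075", state: "CA" }, // not in the allowlist
];
const plan = planInserts(gaps, existing, new Set(["NY", "TX"]));
console.log(plan.inserts.length, plan.skipped, plan.rejectedState); // 1 1 1
```

Running the same plan against a DB that already contains the inserts would turn every gap into an idempotent skip, which is the property that makes a re-run safe.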

Verification

Apply results were identical on staging and prod:

Step	Result
Inserted	366
Already present (idempotent skip)	0
Rejected — state allowlist	0
Rejected — missing fields	0

Validation tier (Phase 8)

Test	Result
Calculator baseline diff (12 scenarios — UT, TX, FL, NY, SBE redirect, PO Box, Medicaid)	ZERO DIFFS
Prod consistency check (no-marker docs unchanged)	30,326 → 30,326 (verified)
Multi-county integrity check (sample 5 inserted ZIPs)	All return correct multi-county responses (e.g., 10463 → Bronx + New York County)

Smoke probe matrix on prod (post-deploy)

zip=10463 → counties:[{Bronx, 36005}, {New York County, 36061}]      ← multi-county now
zip=10470 → counties:[{Bronx, 36005}, {Westchester County, 36119}]   ← multi-county now
zip=10509 → counties:[{Putnam, 36079}, {Westchester County, 36119}]  ← multi-county now
zip=10512 → counties:[{Putnam, 36079}, {Dutchess County, 36027}]     ← multi-county now
zip=10940 → counties:[{Orange, 36071}, {Sullivan County, 36105}]     ← multi-county now
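
A probe like this can be turned into a repeatable check by comparing the FIPS codes in each response against an expected set. The sketch below does that for a stand-in response; the response shape `{ counties: [{ name, fips }] }` is an assumption inferred from the probe output, and the function name is hypothetical.

```javascript
// Sketch: check that a lookup response lists exactly the expected county FIPS codes.
// The response shape { counties: [{ name, fips }] } is an assumption based on
// the probe output above; the real endpoint may use different field names.
function checkCounties(response, expectedFips) {
  const actual = new Set((response.counties || []).map((c) => String(c.fips)));
  const missing = expectedFips.filter((f) => !actual.has(f));
  const unexpected = [...actual].filter((f) => !expectedFips.includes(f));
  return { ok: missing.length === 0 && unexpected.length === 0, missing, unexpected };
}

// Expected FIPS codes taken from the probe matrix.
const expectations = {
  "10463": ["36005", "36061"],
  "10470": ["36005", "36119"],
  "10509": ["36079", "36119"],
  "10512": ["36079", "36027"],
  "10940": ["36071", "36105"],
};

// Stand-in response; a real probe would fetch this from the deployed API.
const response10463 = {
  counties: [
    { name: "Bronx", fips: "36005" },
    { name: "New York County", fips: "36061" },
  ],
};
console.log(checkCounties(response10463, expectations["10463"]).ok); // true
```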

Rollback

MONGODB_WRITE_URI=$(aws --profile askflorence-prod secretsmanager get-secret-value \
  --secret-id prod/mongodb/app-write --query SecretString --output text) \
  node scripts/db/seed-federal-completeness.js --rollback

Rollback removes only docs with _seedSource: "federal-completeness-audit-2026-05-01". The 30,326 legacy federal/NY docs, the federal-gap-fix marker docs (3), and the SBE marker docs (17,537) are left untouched.
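
The selectivity of the rollback comes down to an exact-match filter on the marker field. The sketch below simulates that in memory; the `matches` helper stands in for the database's equality semantics, and the non-audit marker value is a placeholder rather than the real string.

```javascript
// Sketch of the rollback selection: only docs carrying this audit's marker match.
const ROLLBACK_FILTER = { _seedSource: "federal-completeness-audit-2026-05-01" };

// Minimal equality matcher standing in for the database's filter semantics.
const matches = (doc, filter) =>
  Object.entries(filter).every(([field, value]) => doc[field] === value);

// Made-up docs; "some-other-marker" is a placeholder value.
const docs = [
  { zip: "10463", countyFips: "36005" }, // legacy doc, no marker
  { zip: "10463", countyFips: "36061", _seedSource: "federal-completeness-audit-2026-05-01" },
  { zip: "00000", countyFips: "00000", _seedSource: "some-other-marker" },
];

const remaining = docs.filter((doc) => !matches(doc, ROLLBACK_FILTER));
console.log(remaining.length); // 2
```

With the MongoDB Node.js driver, the equivalent operation would presumably be `collection.deleteMany(ROLLBACK_FILTER)`; whether the script does exactly this is not confirmed by the source.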

Annual refresh

Add to the data-sources playbook:

- Re-pull the Census ZCTA file at plan-year transition
- Re-run build-federal-snapshot.js → updated CSV
- Re-run audit-federal-completeness.js → updated report
- Triage the classification counts; expect ~0 new gaps in steady state, since the federal-30 ingest picks up ZIP coverage directly from PUF
- If gaps surface, run seed-federal-completeness.js --apply after triage
- Append a change-log entry

Files

scripts/db/build-federal-snapshot.js (build the universe CSV)
scripts/db/data/federal-zip-state-2020.csv (committed snapshot, 29,793 rows)
scripts/db/audit-federal-completeness.js (run the audit)
scripts/db/data/federal-gap-report-2026-05-01.json (committed report, 467 KB)
scripts/db/seed-federal-completeness.js (apply the inserts)

Related

Issue #73: parent (Path 1: 3 known gaps fixed in commit aa2a97a; Path 2: this audit)
docs/validation/methodology.md — audit methodology reference
docs/infrastructure/data-sources.md — ingest pipeline overview
Tier 1 zip-county audit (scripts/audit/tier-1-zip-county.js) — companion accuracy check
Tier 1.5 SBE zip-county audit (Issue #70) — companion SBE-side audit