Appearance
Tier 0.5 federal ZIP completeness audit — fresh-session entry brief
STATUS: SESSION COMPLETE 2026-05-01. Outcome captured below; full reports linked.
Worktree:
/Users/tahaabbasi/Developer/ask-florence-tier-0-5-federalBranch:tier-0-5-federal-completeness-auditIssue (execution tracker): #80 (final close-out comment here) Parent context issue: #79 Reference plan (read for context, do NOT execute verbatim):/Users/tahaabbasi/.claude/plans/hey-so-okay-that-delightful-eich.mdSession plan (executed):/Users/tahaabbasi/.claude/plans/users-tahaabbasi-developer-ask-florence-fizzy-gray.md
Session outcome (2026-05-01)
4,364 docs landed on prod across 4 user-greenlighted batches (3 Tier 0.5 + 1 Tier 1 tuple-completeness). All under verified mongodump backups + per-batch greenlights per Constraints 1+2.
| Final audit | Result |
|---|---|
| Tier 0.5 re-run | 0 gap zips remaining (was 4,347) |
| Tier 1 (federal zip-county) | 22,302/22,302 = 100.00% exact match |
| Tier 1.5 (SBE zip-county) | 13,055/13,055 = 100.00% exact match |
| Calculator baseline diff | ZERO DIFFS on all 12 scenarios |
| 85001 + 50613 user-reported gaps | Both resolve correctly on prod |
Where to read the full story:
docs/validation/tier-0-5-federal-uspscompleteness.md— canonical Tier 0.5 audit reportdocs/validation/audit-operations-runbook.md— operational lessons captured for any future auditdocs/infrastructure/change-log.md— top two entries cover this session
Original entry brief follows below — preserved verbatim as the historical handoff that this session was given.
What you're solving
User report 2026-05-01: co-founder entered ZIP 85001 (downtown Phoenix) on prod calculator → got 404 / "Oops". CMS says 85001 is valid AZ/04013/Maricopa. Our zip_county collection has zero docs for it.
Root cause: the original 2026-04-13 federal-30 ingest (commit 330871e) AND the recent Tier 0 federal completeness audit (commit 2b24f2c) both used U.S. Census 2020 ZCTA-County as the universe. Census ZCTA only catalogs ZIPs with significant residential population. USPS recognizes a broader class — PO-Box-only ZIPs in business districts, single-building ZIPs (universities, military bases, large corporate campuses), industrial-zone ZIPs. CMS Marketplace API (which uses USPS data) recognizes them; Census doesn't. Tier 0 was structurally blind to them.
The single-doc fix for 85001 is staged in commit 843bdf7 (scripts/db/fix-federal-zip-gaps.js extended) but not applied — apply pending fresh AWS SSO. You may apply that fix as a Phase 0 quick-win for 85001 only, but only after honoring Constraint 1 (BACKUP) below. Or you may roll it into the broader Tier 0.5 sweep.
Goal: build Tier 0.5 — same audit pipeline as Tier 0 but with a USPS-derived universe — to systematically catch every PO-Box-only / business-only / USPS-only federal+NY ZIP missing from zip_county.
TWO HARD CONSTRAINTS (load-bearing — the user wrote these specifically for this work)
CONSTRAINT 1 — BACKUP PROD BEFORE ANYTHING
Before you write a single doc to the prod zip_county collection (any apply path: Tier 0.5 batch, single-doc 85001 patch, anything), take a fresh mongodump of the collection.
Required:
- Tag the snapshot with ISO-UTC timestamp + the planned
_seedSourcemarker - Store at a versioned, retrievable location (S3 in the prod account preferred — bucket
askflorence-dataworks; local with documented path acceptable for fast iteration) - Verify the snapshot by file size + record count + spot-check that
mongorestoreworks against a scratch DB - Document the restore command in your plan file
- Each batched apply gets its own immediately-preceding fresh backup
No apply step proceeds without a verified backup. Non-negotiable. If you find yourself about to call bulkWrite / insertOne / --apply on prod without a backup taken in the last few minutes, STOP.
Recommended pattern:
bash
# 1. Get prod read URI from Secrets Manager
aws sso login --profile askflorence-prod
PROD_READ_URI=$(aws --profile askflorence-prod secretsmanager get-secret-value \
--secret-id prod/mongodb/app-read --query SecretString --output text)
# 2. Dump zip_county to local with timestamp tag
TS=$(date -u +%Y%m%dT%H%M%SZ)
TAG="pre-tier-0-5-${TS}"
mkdir -p /tmp/backups/${TAG}
mongodump --uri="${PROD_READ_URI}" --collection=zip_county \
--out=/tmp/backups/${TAG}/dump
# 3. Verify
ls -lh /tmp/backups/${TAG}/dump/*/zip_county.bson
mongorestore --uri="<scratch-uri>" --nsFrom='*.zip_county' --nsTo='restored.zip_county_test' \
--dryRun /tmp/backups/${TAG}/dump
# 4. Upload to S3 (prod-account bucket)
aws --profile askflorence-prod s3 sync /tmp/backups/${TAG}/ \
s3://askflorence-data/db-backups/zip_county/${TAG}/
# 5. Document restore command in your plan fileCONSTRAINT 2 — ANALYZE → REPORT → PHASED DECISIONS — never auto-update
The audit DOES NOT auto-insert. Flow:
- Dry-run audit → produce a structured gap report:
- Per-state counts (insertable / needs-PUF / discrepancy / cms-error)
- Per-class counts
- Per-conflict samples (if a state has > 50 gaps, that's anomalous — sample 10)
- Full machine-readable JSON at
scripts/db/data/federal-tier-0-5-gap-report-<date>.json - Human-readable markdown summary in your status comment on issue #80
- STOP. Surface the report to the user via issue comment + a clear "awaiting user review" message in the chat. Do not proceed.
- User makes phased decisions explicitly. Examples:
- "Apply the AZ + TX classes only, hold the rest"
- "Hold the discrepancy class entirely until I review samples"
- "State X has 50 gaps — investigate before any apply"
- You execute ONLY what the user explicitly greenlights, in the batches they approve.
- Each batch gets a fresh backup per Constraint 1.
- After each batch: post a status comment on #80 with batch summary + verification matrix.
If at any point you're unsure whether the user has greenlighted a specific apply, ask — do not assume.
Scope
- PROD ONLY this run. Staging is locked out (massive in-flight data task there). Don't touch staging Atlas — no allowlist add, no script run, no DB write.
- States in scope: federal-30 ∪ {NY}. SBE states are out of scope (Tier 1.5 covers those at #70).
- Pure data layer. No app code changes, no route changes, no
STATE_BASED_MARKETPLACESchanges. - No app deploy needed. Vercel + AWS apex unchanged.
Step-by-step shape (your starting framework — adapt during planning)
This is the original 10-phase blueprint from the reference plan. Treat it as a starting reference. Your own plan will diverge per the two constraints — phase 5 becomes the explicit user-decision gate.
| # | Phase | What |
|---|---|---|
| 0 | Recon + create your own plan file | Read this prompt + reference plan + #80 + #79. Write your own plan file at ~/.claude/plans/<your-session-tag>.md (or use the existing one — your call). Bake Constraints 1+2 into every apply path. |
| 1 | HUD download + sanity | Click-through HUD account at huduser.gov; download ZIP_COUNTY_<YYYYqQ>.csv. Verify 85001 + 5 known PO-Box ZIPs are present. Fallback: zipcodes npm package if HUD auth is friction. |
| 2 | Build USPS snapshot | scripts/db/build-usps-snapshot.js — filter HUD CSV to federal-30 + NY; write scripts/db/data/usps-zip-state-<date>.csv (committed). |
| 3 | Build Tier 0.5 audit script | scripts/db/audit-federal-completeness-tier-0-5.js — mirrors Tier 0 (scripts/db/audit-federal-completeness.js); only universe input differs. Same 4-way classification (insertable / needs-PUF / discrepancy / cms-error). |
| 4 | Run dry-run audit against PROD READ replica | Generates federal-tier-0-5-gap-report-<date>.json. ~10 min runtime. |
| 5 | HARD STOP — TRIAGE | Surface report on #80. Wait for user phased-decision. No apply path runs in this phase. |
| 6 | Build seed script (per user direction) | scripts/db/seed-federal-tier-0-5.js (or extend seed-federal-completeness.js) — supports per-class / per-state filters so user can apply subsets. Marker: _seedSource: "federal-tier-0-5-audit-<date>". Three modes: --dry-run / --apply / --rollback. |
| 7 | Apply user-greenlighted batches (each with backup) | For each batch: BACKUP → dry-run → user approves → apply → smoke matrix → status comment on #80. |
| 8 | Validation tier | Calculator baseline diff (ZERO DIFFS gate), Tier 1 audit, multi-county integrity check, smoke matrix on user-approved sample of newly-inserted ZIPs. |
| 9 | Docs | docs/validation/tier-0-5-federal-uspscompleteness.md, docs/infrastructure/data-sources.md (annual refresh playbook), docs/infrastructure/change-log.md (dated entry per apply batch). |
| 10 | Status comments + close | Update #80 with final state. Update #79 / #47 Phase 11 checklist. |
Realistic actual end-to-end estimate: 30-60 min of script writing + dry-run, then potentially several user-paced phases for actual applies depending on what the report surfaces.
Critical files
To create:
scripts/db/build-usps-snapshot.jsscripts/db/data/usps-zip-state-<date>.csv(committed snapshot)scripts/db/audit-federal-completeness-tier-0-5.jsscripts/db/data/federal-tier-0-5-gap-report-<date>.json(committed audit-trail report)scripts/db/seed-federal-tier-0-5.js— must support--state=<list>,--class=<list>filters so user can apply subsets per Constraint 2docs/validation/tier-0-5-federal-uspscompleteness.md
To modify:
docs/infrastructure/data-sources.md— add HUD ZIP-County crosswalk; extend annual refresh playbookdocs/infrastructure/change-log.md— dated entry per apply batchdocs/security-compliance/vendor-register.md— note HUD as a data source (no BAA needed; public federal data)
To leave UNTOUCHED (hard guardrails):
- All 30,326 pre-existing federal+NY county docs (never modify)
- All 17,537 SBE-side docs with
_seedSource: "cms-2026-04-30"(out of scope) - All 3 docs with
_seedSource: "federal-gap-fix-2026-05-01"(Path 1 fixes) - All 366 docs with
_seedSource: "federal-completeness-audit-2026-05-01"(Tier 0 fixes) src/app/api/counties/route.ts(route logic correct)src/lib/hooks/use-calculator.ts(hook logic correct)STATE_BASED_MARKETPLACES,FEDERAL_STATES,SBE_STATESconstants- STAGING ATLAS entirely
Patterns to reuse (don't re-invent)
- Audit pipeline —
scripts/db/audit-federal-completeness.js(Tier 0). Copy structure, swap universe loader. - Seed pattern —
scripts/db/seed-federal-completeness.js. Copy structure, change marker tag, add per-state / per-class filter flags for Constraint 2. - CMS query — same
lookupZipViaCms()pattern used inscripts/db/build-cms-snapshot.js(5 concurrent, 25 req/sec validated). - regionId lookup — same in-memory map pattern from
audit-federal-completeness.js. - 3-mode CLI —
scripts/db/ingest-puf-augment.js. FEDERAL_PLUS_NYset — same constant as Tier 0 audit.- Calculator baseline diff —
scripts/audit/calculator-baseline-diff.tsruns against prod URI in.env.local; ZERO DIFFS gate.
Tooling + auth expectations
You'll need:
- AWS SSO refreshed:
aws sso login --profile askflorence-prod(andaskflorence-stagingif you ever need it — but you shouldn't). - Atlas CLI auth:
atlas auth loginif you need to add/remove your laptop IP from the prod Atlas project allowlist. - MongoDB tools:
mongodump,mongorestore,mongosh(Homebrew:brew install mongodb-database-tools mongosh). - Node 22+: should already be installed.
- HUD account: register at https://www.huduser.gov/portal/datasets/usps_crosswalk.html if you go that route. Free.
- gh CLI: for posting status comments on #80.
Prod Mongo URIs from Secrets Manager (only fetch when you need them; keep them out of .env.local since they rotate):
bash
PROD_READ_URI=$(aws --profile askflorence-prod secretsmanager get-secret-value \
--secret-id prod/mongodb/app-read --query SecretString --output text)
PROD_WRITE_URI=$(aws --profile askflorence-prod secretsmanager get-secret-value \
--secret-id prod/mongodb/app-write --query SecretString --output text)PROD_READ_URI is sufficient for Phases 0-5 (everything until apply). PROD_WRITE_URI only needed at apply time (Phase 7), and only for the specific batch you're applying.
Style + protocol reminders (from CLAUDE.md / AGENTS.md)
- No em dashes (—) or en dashes (–) in any new docs/comments. Hyphens only.
- This Next.js version has breaking changes from typical training data — don't touch app code in this session anyway, but if you ever need to, read
node_modules/next/dist/docs/first. - Don't push to main without explicit user sign-off. This work lives on the
tier-0-5-federal-completeness-auditbranch. - Don't deploy. No
vercel --prodruns from this session. - All commits in this worktree use the standard message format with
Co-Authored-By: Claude Opus 4.7 (1M context). - Track every batch boundary as a status comment on #80, not just in chat.
What "done" looks like
- Tier 0.5 audit run end-to-end at 100% match (every USPS-known federal+NY ZIP either accounted for in DB or documented as an exclusion)
- Every apply was preceded by a verified backup with documented restore command
- Every apply was explicitly user-greenlighted via the analyze → report → phased flow on #80
- Calculator baseline diff: ZERO DIFFS post-apply
- Tier 1 + Tier 1.5 audits: 100% post-apply
- 85001 specifically resolves on prod (curl
https://askflorence.health/api/counties?zip=85001returns Maricopa County response) - Markdown report committed at
docs/validation/tier-0-5-federal-uspscompleteness.md - Annual refresh playbook in
docs/infrastructure/data-sources.mdextended - #80 closed with final summary comment
Your first action when this session starts
- Read this entire file.
- Read
/Users/tahaabbasi/.claude/plans/hey-so-okay-that-delightful-eich.md(the original session's plan — comprehensive context). - Read
gh issue view 80andgh issue view 79. - Read
scripts/db/audit-federal-completeness.jsandscripts/db/seed-federal-completeness.jsto internalize the Tier 0 pattern you'll mirror. - Write your own plan file at
~/.claude/plans/<your-session-tag>.md. The plan must explicitly bake in Constraints 1 + 2 — every phase that touches prod has a backup precondition, and Phase 5 is a hard stop with no apply path. - Enter plan mode, share the plan via ExitPlanMode for user approval.
- Then execute Phase 0 through Phase 4 (recon → dry-run audit → report).
- Stop at Phase 5. Surface report on #80. Wait for user.
- Execute user-greenlighted batches one at a time, each with its own backup + post-apply verification + status comment.