Tier 0 — Federal ZIP completeness audit (2026-05-01) ​

Status: Complete. 366 gaps inserted; 451 discrepancies logged for review; 1,353 extras logged. Closes Issue #73 Path 2.

Purpose: Systematic completeness check of federal-30 + NY ZIP coverage in zip_county against the U.S. Census 2020 ZCTA-County universe. This is the companion to the Tier 1 zip-county audit: Tier 1 checks accuracy (at 100% match), while Tier 0 checks coverage.

Summary

| Metric | Count |
| --- | --- |
| Census federal+NY universe (zip, countyFips tuples) | 29,793 |
| DB before audit | 30,329 |
| DB after audit | 30,695 |
| Insertable gaps (Census has, DB doesn't, CMS confirms) | 366 |
| Discrepancies (Census says X, CMS says Y; logged only, not inserted) | 451 |
| Extras (DB has, Census doesn't; informational, not modified) | 1,353 |
| Needs-PUF (county entirely missing; would need PUF re-ingest) | 0 |
| CMS errors during audit | 0 |
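
As a quick sanity check, the summary counts can be reconciled against each other. The sketch below assumes that every Census-only tuple lands in exactly one bucket (insertable, discrepancy, needs-PUF, or CMS error) and that the DB counts use the same federal+NY filter as the Census universe; the variable names are illustrative, not taken from the audit scripts.

```javascript
// Counts copied from the summary table.
const summary = {
  censusUniverse: 29793,
  dbBefore: 30329,
  dbAfter: 30695,
  insertable: 366,
  discrepancies: 451,
  extras: 1353,
  needsPuf: 0,
  cmsErrors: 0,
};

// Tuples Census has but the DB lacked before the audit
// (assumption: each one is counted in exactly one bucket).
const censusOnly =
  summary.insertable + summary.discrepancies + summary.needsPuf + summary.cmsErrors;

// Tuples present in both Census and the DB before the audit.
const shared = summary.censusUniverse - censusOnly;

// DB before = shared tuples + extras; DB after = DB before + inserts.
const reconstructedBefore = shared + summary.extras;
const reconstructedAfter = summary.dbBefore + summary.insertable;

console.log({ censusOnly, shared, reconstructedBefore, reconstructedAfter });
// reconstructedBefore is 30329 and reconstructedAfter is 30695, matching the table.
```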

What was inserted ​

All 366 inserts are NY-side multi-county additions. Every NY ZIP that gained a doc already had ≥1 sibling doc in the DB; the audit added the missing additional county docs. The original NY ingest (scripts/db/load-ny-2026.js, 2026-04-12) loaded the primary county per ZIP for many multi-county ZIPs but missed the secondary counties.

Examples of fixed ZIPs:

  • 10463 → had only Bronx; added New York County (Manhattan)
  • 10470 → had only Bronx; added Westchester County
  • 10509 → had only Putnam; added Westchester County
  • 10940 → had only Orange; added Sullivan County

User impact: residents of these NY ZIPs whose actual address is in the secondary county will now correctly see plans for that county. Pre-audit they may have been mapped to the wrong county's plan flow.

Per-state breakdown ​

Insertable (366 gaps inserted)

| State | Count |
| --- | --- |
| NY | 366 (all) |

All other federal-30 states: 0 insertable gaps. Federal-30 ZIP coverage was already complete at the (zip, countyFips) tuple level — only NY had the multi-county-secondary-county gap pattern.

Extras (1,353 — logged, not modified) ​

DB has these (zip, countyFips) tuples; Census 2020 ZCTA doesn't recognize them. Likely because:

  • Our PUF Service Areas data (CMS, more current) tracks ZIPs Census 2020 ZCTA didn't capture (Census ZCTA boundaries lag USPS by years)
  • Some county-boundary changes since Census 2020
  • Multi-county tracking in our DB beyond what Census 2020 reflects

Distribution:

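| State | Extras | Notes |
| --- | --- | --- |
| TX | 155 | Largest absolute count (consistent with state size) |
| IA | 96 | |
| IN | 91 | |
| OH | 86 | |
| KS | 77 | |
| MO | 71 | |
| NE | 68 | |
| AR | 67 | |
| MI | 61 | |
| WI | 55 | |
| TN | 51 | |
| OK | 50 | |
| SD | 50 | |
| NC | 46 | |
| WV | 46 | |
| FL | 43 | |
| MS | 40 | |
| ND | 38 | |
| AL | 33 | |
| LA | 32 | |
| MT | 27 | |
| NH | 14 | |
| OR | 14 | |
| SC | 14 | |
| WY | 11 | |
| AK | 9 | |
| UT | 4 | |
| AZ | 3 | |
| HI | 1 | |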

These extras are not corrected because:

  1. They serve users correctly today (users in these ZIPs get plans via existing data)
  2. Removing them risks stranding real users on USPS ZIPs Census 2020 doesn't yet recognize
  3. CMS-side validation hasn't been run on the extras (out of scope for Tier 0)

A future audit could spot-check the extras against CMS to confirm they're real ZIPs. Out of scope here.

Discrepancies (451 — logged, not inserted) ​

Census 2020 says ZIP X is in county A; CMS says it's in county B (different county). For these, trust CMS. The audit doesn't insert based on Census's view because CMS is canonical.

The discrepancies skew heavily toward NY (~440 of 451). The likely cause is that Census 2020 boundaries differ from the current CMS Marketplace API view of the ZIP→county mapping, particularly for upstate NY ZIPs.

Sample:

| ZIP | Census says | CMS says |
| --- | --- | --- |
| 03458 | NH/Cheshire | NH/Hillsborough |
| 11370 | NY/Bronx | NY/Queens |
| 12120 | NY/Greene | NY/Albany |
| 12763 | NY/Ulster | NY/Sullivan |
| 12785 | NY/Orange | NY/Sullivan |

For these ZIPs, our DB likely already has the CMS-correct county. The discrepancy is just "Census 2020 ZCTA is mildly stale relative to current CMS data." No action.

The full list is in scripts/db/data/federal-gap-report-2026-05-01.json under the discrepancy key.

Methodology ​

Input sources ​

  1. Census 2020 ZCTA-County relationship file — universe of every U.S. ZIP→county mapping. Free, federal, refreshed annually. Source: https://www2.census.gov/geo/docs/maps-data/data/rel2020/zcta520/tab20_zcta520_county20_natl.txt
  2. MongoDB zip_county collection — current state of our ZIP coverage data. Filter: state ∈ federal-30 ∪ {NY} AND sbeRedirect: { $exists: false }.
  3. CMS Marketplace API /counties/by/zip/{zip} — canonical truth for each gap, used to verify Census's claim before inserting.
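
The comparison between the first two inputs amounts to a set difference over (zip, countyFips) tuples. The following is a minimal sketch under that assumption; `tupleKey` and `difference` are hypothetical helper names, not functions from audit-federal-completeness.js, and the `99999` row is made-up filler.

```javascript
// Build a stable string key for a (zip, countyFips) tuple.
// ZIPs and FIPS codes stay strings so leading zeros survive (e.g. "03458").
function tupleKey({ zip, countyFips }) {
  return `${zip}|${countyFips}`;
}

// Return the rows of `left` whose tuple key is absent from `right`.
function difference(left, right) {
  const rightKeys = new Set(right.map(tupleKey));
  return left.filter((row) => !rightKeys.has(tupleKey(row)));
}

// Tiny illustrative inputs; only the 10463 tuples come from this page.
const censusRows = [
  { zip: "10463", countyFips: "36005" },
  { zip: "10463", countyFips: "36061" },
];
const dbRows = [
  { zip: "10463", countyFips: "36005" },
  { zip: "99999", countyFips: "00000" }, // made-up tuple standing in for an "extra"
];

const gapCandidates = difference(censusRows, dbRows); // Census \ DB, sent to CMS for verification
const extras = difference(dbRows, censusRows);        // DB \ Census, logged only
console.log(gapCandidates, extras);
```

Keying on the full tuple rather than on the ZIP alone is what lets a multi-county ZIP show up as a gap even when one of its counties is already present.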

Pipeline ​

Census ZCTA file ──→ build-federal-snapshot.js ──→ federal-zip-state-2020.csv (committed)
                                                            │
                                                            ▼
                                          audit-federal-completeness.js ──→ federal-gap-report-2026-05-01.json
                                                            │              (committed)
                                                            ▼
                                            seed-federal-completeness.js
                                                  --apply
                                                            │
                                                            ▼
                                                  zip_county collection
                                                  (366 new docs with
                                                  _seedSource marker)

Classification logic per gap ​

For each (zip, countyFips) tuple in Census \ DB:

  1. Query CMS for the ZIP. If CMS returns counties:
    • CMS confirms (state, fips) match → check regionId lookup
      • regionId found in DB (county already has siblings) → insertable
      • regionId not found (county entirely absent from our PUF) → needs-PUF
    • CMS doesn't return Census's expected county → discrepancy (logged only)
    • CMS state ≠ Census state → discrepancy (logged only)
  2. CMS error → cms-errors (logged for retry)
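
The decision tree above could be expressed as a single pure function. This is a sketch, not the audit script's actual code: the `cmsResult` shape (`ok`, `counties[].state`, `counties[].fips`), the `regionIdByCounty` map, and the `"example-region"` value are assumptions made for illustration. An empty CMS county list is treated as a discrepancy here, which the text does not specify.

```javascript
// Classify one Census-only gap.
// gap:              { zip, state, countyFips } from Census \ DB
// cmsResult:        { ok: true, counties: [{ state, fips }] } or { ok: false } (assumed shape)
// regionIdByCounty: Map keyed "state|countyFips", built from existing DB docs
function classifyGap(gap, cmsResult, regionIdByCounty) {
  // Step 2: a CMS failure is parked for retry.
  if (!cmsResult.ok) return { bucket: "cms-errors" };

  // Step 1: does CMS list the exact (state, fips) pair that Census expects?
  // A state mismatch or a missing county both fail this test.
  const confirmed = cmsResult.counties.some(
    (c) => c.state === gap.state && c.fips === gap.countyFips
  );
  if (!confirmed) return { bucket: "discrepancy" };

  // CMS confirms; the gap is insertable only if a sibling doc supplies a regionId.
  const regionId = regionIdByCounty.get(`${gap.state}|${gap.countyFips}`);
  if (regionId === undefined) return { bucket: "needs-PUF" };
  return { bucket: "insertable", regionId };
}

// Usage with the 10463 example (the regionId value is a placeholder).
const regionIds = new Map([["NY|36061", "example-region"]]);
const cmsFor10463 = {
  ok: true,
  counties: [
    { state: "NY", fips: "36005" },
    { state: "NY", fips: "36061" },
  ],
};
const verdict = classifyGap(
  { zip: "10463", state: "NY", countyFips: "36061" },
  cmsFor10463,
  regionIds
);
console.log(verdict.bucket); // "insertable"
```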

Safety guards ​

  • Hard-coded state allowlist (FEDERAL_STATES ∪ {NY}) — won't touch SBE-state docs
  • Per-(zip, countyFips) keying for inserts; idempotent
  • Marker tag _seedSource: "federal-completeness-audit-2026-05-01" on every insert → unambiguous rollback
  • regionId sourced from existing DB siblings (any other ZIP in the same (state, countyFips)) — guarantees rating-area consistency within a county
  • Never modifies existing docs
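
To illustrate how these guards interact, here is a hedged sketch of an insert planner that never touches the database. `planInserts`, its parameters, the order of the checks, and the `"example-region"` value are assumptions for illustration; only the field names and the marker string come from this page.

```javascript
const SEED_SOURCE = "federal-completeness-audit-2026-05-01";

// Plan inserts without writing anything.
// gaps:          insertable gaps, each { zip, state, countyFips, regionId }
// existingKeys:  Set of "zip|countyFips" keys already present in zip_county
// allowedStates: Set of state codes the seed may write (FEDERAL_STATES plus NY)
function planInserts(gaps, existingKeys, allowedStates) {
  const result = { toInsert: [], alreadyPresent: 0, rejectedState: 0, rejectedFields: 0 };
  const seen = new Set(existingKeys); // copy, so the caller's set is not mutated
  for (const { zip, state, countyFips, regionId } of gaps) {
    if (!zip || !state || !countyFips || !regionId) {
      result.rejectedFields += 1; // incomplete gap record
      continue;
    }
    if (!allowedStates.has(state)) {
      result.rejectedState += 1; // outside the allowlist, e.g. an SBE state
      continue;
    }
    const key = `${zip}|${countyFips}`;
    if (seen.has(key)) {
      result.alreadyPresent += 1; // idempotent skip; existing docs are never modified
      continue;
    }
    seen.add(key);
    result.toInsert.push({ zip, state, countyFips, regionId, _seedSource: SEED_SOURCE });
  }
  return result;
}

// Usage: a second run that already sees the inserted key skips it.
const allowed = new Set(["NY"]); // stand-in for the real allowlist
const gaps = [{ zip: "10463", state: "NY", countyFips: "36061", regionId: "example-region" }];
const firstRun = planInserts(gaps, new Set(["10463|36005"]), allowed);
const secondRun = planInserts(gaps, new Set(["10463|36005", "10463|36061"]), allowed);
console.log(firstRun.toInsert.length, secondRun.alreadyPresent);
```

The four counters mirror the rows an apply run would report (inserted, already present, rejected by allowlist, rejected for missing fields).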

Verification ​

Apply results identical on staging + prod:

| Step | Result |
| --- | --- |
| Inserted | 366 |
| Already present (idempotent skip) | 0 |
| Rejected (state allowlist) | 0 |
| Rejected (missing fields) | 0 |

Validation tier (Phase 8) ​

| Test | Result |
| --- | --- |
| Calculator baseline diff (12 scenarios: UT, TX, FL, NY, SBE redirect, PO Box, Medicaid) | ZERO DIFFS |
| Prod consistency check (no-marker docs unchanged) | 30,326 → 30,326 (verified) |
| Multi-county integrity check (sample 5 inserted ZIPs) | All return correct multi-county responses (e.g., 10463 → Bronx + New York County) |

Smoke probe matrix on prod (post-deploy) ​

zip=10463 → counties:[{Bronx, 36005}, {New York County, 36061}]      ← multi-county now
zip=10470 → counties:[{Bronx, 36005}, {Westchester County, 36119}]   ← multi-county now
zip=10509 → counties:[{Putnam, 36079}, {Westchester County, 36119}]  ← multi-county now
zip=10512 → counties:[{Putnam, 36079}, {Dutchess County, 36027}]     ← multi-county now
zip=10940 → counties:[{Orange, 36071}, {Sullivan County, 36105}]     ← multi-county now

Rollback ​

```bash
MONGODB_WRITE_URI=$(aws --profile askflorence-prod secretsmanager get-secret-value \
  --secret-id prod/mongodb/app-write --query SecretString --output text) \
  node scripts/db/seed-federal-completeness.js --rollback
```

The rollback removes only docs with _seedSource: "federal-completeness-audit-2026-05-01". It leaves the 30,326 legacy federal/NY docs, the 3 federal-gap-fix marker docs, and the 17,537 SBE marker docs untouched.
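
For intuition, a marker-scoped rollback can be sketched as below. This is not the implementation behind `--rollback`, which this page does not show. `rollbackSeed` and its `dryRun` option are hypothetical; `countDocuments` and `deleteMany` are the MongoDB Node.js driver's collection methods.

```javascript
const ROLLBACK_MARKER = "federal-completeness-audit-2026-05-01";

// Delete only docs carrying this audit's marker.
// `collection` is a MongoDB driver Collection (or anything exposing
// countDocuments(filter) and deleteMany(filter) with the same semantics).
// dryRun defaults to true so that a bare call only reports what would be removed.
async function rollbackSeed(collection, { dryRun = true } = {}) {
  const filter = { _seedSource: ROLLBACK_MARKER };
  const matched = await collection.countDocuments(filter);
  if (dryRun) return { matched, deleted: 0 };
  const { deletedCount } = await collection.deleteMany(filter);
  return { matched, deleted: deletedCount };
}
```

Because the filter matches the marker value exactly, docs with no marker or with a different marker never match, so they are left in place.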

Annual refresh ​

Add to the data-sources playbook:

  1. Re-pull Census ZCTA file at plan-year transition
  2. Re-run build-federal-snapshot.js → updated CSV
  3. Re-run audit-federal-completeness.js → updated report
  4. Triage the classification counts. In steady state there should be ~0 new gaps, because the federal-30 ingest loads ZIP coverage directly from PUF
  5. If gaps surface, run seed-federal-completeness.js after triage
  6. Append change-log entry

Files ​

  • scripts/db/build-federal-snapshot.js (build the universe CSV)
  • scripts/db/data/federal-zip-state-2020.csv (committed snapshot, 29,793 rows)
  • scripts/db/audit-federal-completeness.js (run the audit)
  • scripts/db/data/federal-gap-report-2026-05-01.json (committed report, 467 KB)
  • scripts/db/seed-federal-completeness.js (apply the inserts)

Related ​

  • Issue #73 — parent (Path 1: 3 known gaps fixed in commit aa2a97a; Path 2: this audit)
  • docs/validation/methodology.md — audit methodology reference
  • docs/infrastructure/data-sources.md — ingest pipeline overview
  • Tier 1 zip-county audit (scripts/audit/tier-1-zip-county.js) — companion accuracy check
  • Tier 1.5 SBE zip-county audit (Issue #70) — companion SBE-side audit

AskFlorence Internal Documentation. Not for public distribution.
