CA Phase C/D: provider + drug coverage ingestion playbook

Linear: ENG-395 under M7 (CA Phase C/D).

Status (2026-05-26): discovery complete, methodology validated, ready to execute. POC PDF parse pending.

Reusable for other SBEs: the patterns documented below — CalHEERS-style anonymous APIs, SBP-standardized formulary PDFs, Symphony-style statewide provider directory utilities — are common across SBE states. NY has its own equivalents (NYSOH + DFS); WA / OR / CO / CT each have similar structures. Treat this playbook as the template for per-state execution.


Architecture overview

        ┌──────────────────────────────────────────────┐
        │  CalHEERS plan-detail endpoint               │
        │  (gethealthplansbyids, anon, needs handOffId)│
        │  → formularyUrl + providerUrl + tier copays  │
        └──────────────┬───────────────────────────────┘
                       │
              one POST per CA rating area (~19)
                       │
                       v
       ┌───────────────────────────────────────────────┐
       │ scripts/db/data/ca-plan-documents-2026.json   │
       │ map: planNumber → {formularyUrl, copays, ...} │
       └────────┬──────────────────┬───────────────────┘
                │                  │
                │                  v
                │      ┌────────────────────────────┐
                │      │  Carrier formulary PDFs    │
                │      │  (~15-20 unique URLs)      │
                │      │  Molina, Kaiser, Anthem... │
                │      └─────────┬──────────────────┘
                │                │
                │                v parse (pdftotext -layout + regex)
                │                │
                │      ┌─────────────────────────────┐
                │      │ formularies_staging         │
                │      │ {rxcui, planIds[], tier,    │
                │      │  requirements}              │
                │      └─────────────────────────────┘
                │
                v
       ┌────────────────────────────────────┐
       │ Update prod CA plan docs with      │
       │ puf.formularyId + puf.networkId    │
       │ (NEW additive fields only)         │
       └─────┬──────────────────────────────┘
             │
             v
   ┌─────────────────────────────────────┐
   │  CalHEERS anon provider endpoint    │
   │  (getproviderdetails, no auth)      │  ◄── Symphony Provider Directory (IHA/Availity)
   │  → 16,944 SF providers in one call  │      (SB 137 statewide source-of-truth)
   └────────────────┬────────────────────┘
                    │
                    v in-VPC proxy (Fargate or Next.js route handler)
                    │
   ┌──────────────────────────────────────┐
   │ providers_staging                    │  ◄── optional cache; the live proxy may be enough
   │ {npi, name, specialty, networkIds[]} │
   └──────────────────────────────────────┘
                    │
                    v
   ┌─────────────────────────────────────┐
   │ /api/providers/search (CA branch)   │
   │ /api/drugs/search    (CA branch)    │
   └─────────────────────────────────────┘
                    │
                    v
   ┌─────────────────────────────────────┐
   │ <CoveragePanel /> ⚕ + ℞ live for CA │
   │ Plan-card coverage pills            │
   └─────────────────────────────────────┘

Hard constraints

  1. PROD CA plan collection is touched ONLY for puf.formularyId + puf.networkId foreign-key population (Phase 4). All formulary + provider data lives in staging Mongo (cost design — formularies + providers are 10+ GB; prod uses PrivateLink to staging for these queries per ADR 0004).
  2. Existing FFM drug + provider entries must not be touched. providers_staging (2.14M docs, 9.27 GB) and formularies_staging (12.5K docs, 919 MB) keep all existing entries byte-identical. CA writes are additive via MongoDB $addToSet — the operator literally cannot modify existing array elements (deep-equality compare on insertion). Single unified collection, source-tagged per entry (source: "ca_<carrier>_<year>_marketplace_formulary" vs FFM's source: "ffm_1311_mrf"). Year is part of the _id natural key (<rxcui>:<year> / <npi>:<year>) so 2026 vs 2027 formularies are separate docs by construction. Pre/post FFM-entry count assertion in the ingest script catches any deviation immediately.
  3. Snapshot staging Atlas BEFORE any writes. Atlas Cloud Backup enabled 2026-05-26; on-demand snapshot taken pre-ingest; snapshot ID logged in the ingest run.
  4. Cluster identity guard in every script. Refuses to run if MongoDB URI host does not match expected env.
  5. AWS Fargate RunTask for bulk operations. Crawls run in-VPC, Atlas writes via PrivateLink — no Starlink/home-bandwidth dependency. Local POC parse of one PDF is allowed for validation.
  6. No HubSpot egress for any data this pipeline produces. Provider/drug data is not PHI under HIPAA in the form CalHEERS exposes (provider business addresses + drug catalog metadata).
  7. Rebuild drug_search_index after the formulary ingest completes (ENG-425). After CA (or any) formulary docs land in formularies_staging, run node scripts/db/derive-drug-search-index.js --apply so the drug-name search read-model reflects the new drugs. It re-derives from the WHOLE collection (FFM + CA), so CA meds become searchable with brand/generic strength parity + commonality ranking. Search-only; coverage stays per-rxcui. See docs/decisions/2026-05-09-refresh-cadence.md § "Post-ingest: rebuild derived collections".
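Constraint 2 (additive writes, source tags, year in the natural key, pre/post count assertion) can be sketched as a few pure helpers that build the update document before anything touches Mongo. This is a minimal sketch, not the ingest script itself: the array field name `entries`, the entry shape, and all function names are assumptions made for illustration. Only `$addToSet`, the `<rxcui>:<year>` key, and the `ca_<carrier>_<year>_marketplace_formulary` tag come from the constraints above.

```typescript
// Sketch only: `entries` and the entry fields are assumed names, not the real schema.
type FormularyEntry = {
  source: string; // e.g. "ca_molina_2026_marketplace_formulary"
  planIds: string[];
  tier: number;
  requirements: string[];
};

type AdditiveUpsert = {
  filter: { _id: string };
  update: {
    $setOnInsert: { rxcui: string; year: number };
    $addToSet: { entries: FormularyEntry };
  };
  options: { upsert: true };
};

// Year is part of the natural key, so 2026 and 2027 never share a document.
function naturalKey(id: string, year: number): string {
  return `${id}:${year}`;
}

// Source tag in the form "ca_<carrier>_<year>_marketplace_formulary".
function caSourceTag(carrier: string, year: number): string {
  const slug = carrier.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_");
  return `ca_${slug}_${year}_marketplace_formulary`;
}

// Builds an update that can only append: $addToSet adds the entry when no
// deep-equal element exists and never rewrites elements already in the array.
function buildAdditiveUpsert(rxcui: string, year: number, entry: FormularyEntry): AdditiveUpsert {
  return {
    filter: { _id: naturalKey(rxcui, year) },
    update: {
      $setOnInsert: { rxcui, year },
      $addToSet: { entries: entry },
    },
    options: { upsert: true },
  };
}

// Pre/post assertion: the FFM entry count must be identical after the CA ingest.
function assertFfmUnchanged(before: number, after: number): void {
  if (before !== after) {
    throw new Error(`FFM entry count changed: ${before} -> ${after}`);
  }
}
```

The three parts of the returned object are shaped to be passed to a driver call such as `updateOne(filter, update, options)`. Keeping the builder pure makes the additive-only property easy to unit test without a cluster.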

Source-of-truth references

Provider directory data

  • Upstream: Symphony Provider Directory operated by IHA (Integrated Healthcare Association), tech-partnered with Availity. Mandated by CA SB 137 (Hernandez, 2015). All 12 CA carriers contribute. Single statewide source-of-truth.
  • Operational source for us: CalHEERS anonymous endpoint POST https://apply.coveredca.com/enrollment/enrldriver/v1/alt/anon/getproviderdetails?size=50000
  • Headers required: Content-Type: application/json, Origin: https://apply.coveredca.com, Referer: https://apply.coveredca.com/static/lw-enrollment/anon/preferences/plan-preferences/
  • Request body: {"providerType":"P","zip":"94102","radius":"10","year":"2026"} — providerType "P" = Physician, "D" = Dentist
  • Response shape:
    json
    {
      "metaData": {"currentPageItems":16944, "totalItems":16944, "totalPages":1},
      "providers": [
        {
          "providerId": 8478620,
          "firstName": "Jonathan", "lastName": "Huynh",
          "networkId": "70285CAN011-2600|40513CAN001-2600",  // pipe-delimited HIOS network IDs
          "specialty": "Surgery",
          "address": "365 Hawthorne Ave", "city": "Oakland", "state": "CA", "zip": "94609",
          "latitude": 37.820616223766, "longitude": -122.263393337961
        }
      ]
    }
  • networkId field maps directly to our existing puf.networkId field on CA plan docs. No mapping translation required.
  • Legal: unintentionally-public endpoint, scraped without IHA license. Acceptable as backend interim solution. NOT marketable as "powered by Symphony" until we license directly from IHA. Symphony customer login at symphony.iha.org; contact IHA Oakland 510-208-1740 for downstream-data-consumer subscription pricing (no public price sheet; expect $5-20K/yr).
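The request body and the pipe-delimited networkId field described above can be handled with two small helpers. This is a hedged sketch: the function names are hypothetical, and the only facts taken from this page are the four string-valued body fields, the "P"/"D" provider types, and the `|` delimiter.

```typescript
type ProviderType = "P" | "D"; // "P" = Physician, "D" = Dentist

type ProviderSearchBody = {
  providerType: ProviderType;
  zip: string;
  radius: string;
  year: string;
};

// Subset of the documented response fields needed for network matching.
type CalheersProvider = {
  providerId: number;
  firstName: string;
  lastName: string;
  networkId: string; // pipe-delimited HIOS network IDs
  specialty: string;
};

// All body fields are strings in the documented request, so numbers are stringified.
function buildProviderSearchBody(
  zip: string,
  radius: number,
  year: number,
  providerType: ProviderType = "P",
): ProviderSearchBody {
  if (!/^\d{5}$/.test(zip)) {
    throw new Error(`expected a 5-digit ZIP, got "${zip}"`);
  }
  return { providerType, zip, radius: String(radius), year: String(year) };
}

// Splits "A|B" into ["A", "B"], dropping empty segments and duplicates.
function splitNetworkIds(networkId: string): string[] {
  const ids = networkId
    .split("|")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return Array.from(new Set(ids));
}

// True when the provider participates in the network referenced by a plan's puf.networkId.
function providerInNetwork(provider: CalheersProvider, planNetworkId: string): boolean {
  return splitNetworkIds(provider.networkId).indexOf(planNetworkId) !== -1;
}
```

Because networkId values are meant to match puf.networkId without translation, an exact string comparison per segment should be enough. Substring matching on the raw pipe-delimited value would risk false positives on shared prefixes.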

Drug formulary data

  • Per carrier, per metal tier: each CA carrier publishes a marketplace formulary PDF. Covered California's Standard Benefit Design mandate means plans within a metal tier (Silver / Bronze / Gold / Platinum / Catastrophic) share the same formulary structure for a given carrier.
  • URL source: extracted from CalHEERS gethealthplansbyids response's formularyUrl field per plan.
  • Discovery path (one-time, per CA rating area):
    POST https://apply.coveredca.com/shopandcompare/screening
    → returns handOffId + planNumbers list
    
    POST https://apply.coveredca.com/enrollment/enrollment-shopping/v1/alt/anon/gethealthplansbyids
    Body: {"planNumbers":["..."],"handOffId":"..."}
    → returns full plan-detail array including formularyUrl, providerUrl, brochureUrl, sbcDocName,
      drugs.{mostGenericDrugsInNetwork, preferredBrandDrugInNetwork, nonPreferredBrandDrugsInNetwork, specialtyDrugsInNetwork}
  • Expected unique PDF count: ~15-20 total across all 12 carriers
  • PDF format (validated against Molina CA 2026 Marketplace formulary, 166 pages, 2.5 MB):
    Drug Name                                          Tier    Requirements/Limits
    acetaminophen rectal suppository 120 mg            Tier 1  OTC
    abacavir sulfate oral tablet 300 mg                Tier 1  QL (2 EA per 1 day)
    VIREAD ORAL POWDER 40 MG/GM (Tenofovir Disoproxil) Tier 2  QL (7.5 GM per 1 day)
    • BRAND DRUGS in ALL CAPS (per the PDF's own stated convention)
    • generic drugs in lowercase (italic-bold visually, plain lowercase in text extraction)
    • Generic equivalent in parentheses after brand
    • Tier 1-5 (Tier 5 = preventative with $0 copay per ACA)
    • Requirements: PA (prior auth), ST (step therapy), QL N EA per N days (quantity limit), MAIL (mail order required), OTC, AGE LIMIT, LD (limited distribution)
    • Section headers between drug classes: *Antiretrovirals - Rti-Nucleoside Analogues-Pyrimidines***
  • Parse approach: pdftotext -layout + ~50 lines of regex. RxCUI resolution via existing FFM cache at scripts/db/data/rxcui-resolution-cache.json + CMS autocomplete for misses.
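The row format above lends itself to a line-oriented parser over pdftotext -layout output. The following is a minimal sketch of that idea, checked only against the three sample rows on this page. The regexes, type, and function name are illustrative assumptions, and real PDFs from other carriers will likely need extra cases (wrapped drug names, section headers, page furniture).

```typescript
type FormularyRow = {
  name: string; // drug name as printed, without the generic in parentheses
  isBrand: boolean; // ALL CAPS name = brand, per the PDF's stated convention
  genericName: string | null; // generic equivalent printed after a brand name
  tier: number; // 1-5
  requirements: string[]; // e.g. ["PA", "QL (2 EA per 1 day)"]
};

// "<drug name> Tier <1-5> <requirements>"; the column header has no tier digit.
const ROW_RE = /^(.+?)\s+Tier\s+([1-5])\b\s*(.*)$/;
// Requirement codes from the legend; QL may carry a parenthesized quantity.
const REQ_RE = /\b(?:QL\s*\([^)]*\)|AGE LIMIT|MAIL|OTC|PA|ST|LD|QL)(?![A-Za-z])/g;
// Trailing "(...)" after the drug name.
const TRAILING_PAREN_RE = /^(.*?)\s*\(([^)]+)\)$/;

function isAllCaps(s: string): boolean {
  return /[A-Z]/.test(s) && s === s.toUpperCase();
}

function parseFormularyLine(line: string): FormularyRow | null {
  const row = ROW_RE.exec(line.trim());
  if (!row) return null; // header, section heading, or wrapped text

  let name = row[1].trim();
  let genericName: string | null = null;

  // Only treat the parenthetical as a generic name when the part before it is a brand.
  const paren = TRAILING_PAREN_RE.exec(name);
  if (paren && isAllCaps(paren[1])) {
    name = paren[1];
    genericName = paren[2];
  }

  return {
    name,
    isBrand: isAllCaps(name),
    genericName,
    tier: Number(row[2]),
    requirements: row[3].match(REQ_RE) || [],
  };
}
```

Returning null for non-matching lines keeps the caller simple: it can count skipped lines per PDF and flag a file for manual review when the skip ratio looks unusual.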

Plan tier copays (bonus data)

  • Embedded in same CalHEERS gethealthplansbyids response
  • Already standardized per carrier (Tier 1 generic $/copay, Preferred Brand $/copay, etc.)
  • Can populate puf.copays.{primaryCare, specialist, genericDrugs, ...} for richer plan-detail pages without per-PDF parse

SBC + brochure URLs

  • Also embedded in same response (sbcDocName, brochureUrl)
  • Populate puf.urls.{sbc, brochure} on prod plan docs for plan-detail page Documents section
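The fields called out in the last three subsections (formularyUrl, tier copays, sbcDocName, brochureUrl) can be folded into the planNumber-keyed map that the architecture diagram shows as ca-plan-documents-2026.json. The sketch below assumes each plan-detail object carries a `planNumber` field and that the drug copay values are strings. Neither is confirmed on this page, and the output key names other than genericDrugs are hypothetical.

```typescript
// Subset of the plan-detail fields listed above; `planNumber` as the key is an assumption.
type PlanDetail = {
  planNumber: string;
  formularyUrl?: string;
  providerUrl?: string;
  brochureUrl?: string;
  sbcDocName?: string;
  drugs?: {
    mostGenericDrugsInNetwork?: string;
    preferredBrandDrugInNetwork?: string;
    nonPreferredBrandDrugsInNetwork?: string;
    specialtyDrugsInNetwork?: string;
  };
};

type PlanDocumentEntry = {
  formularyUrl: string | null;
  providerUrl: string | null;
  sbcDocName: string | null; // a document name, not yet a resolved URL
  brochureUrl: string | null;
  copays: {
    genericDrugs: string | null;
    preferredBrandDrugs: string | null; // hypothetical key name
    nonPreferredBrandDrugs: string | null; // hypothetical key name
    specialtyDrugs: string | null; // hypothetical key name
  };
};

function orNull(value: string | undefined): string | null {
  return value === undefined || value === "" ? null : value;
}

// Merges plan details from all rating areas; a plan seen twice keeps the last copy.
function buildPlanDocuments(plans: PlanDetail[]): Record<string, PlanDocumentEntry> {
  const out: Record<string, PlanDocumentEntry> = {};
  for (const p of plans) {
    const drugs = p.drugs || {};
    out[p.planNumber] = {
      formularyUrl: orNull(p.formularyUrl),
      providerUrl: orNull(p.providerUrl),
      sbcDocName: orNull(p.sbcDocName),
      brochureUrl: orNull(p.brochureUrl),
      copays: {
        genericDrugs: orNull(drugs.mostGenericDrugsInNetwork),
        preferredBrandDrugs: orNull(drugs.preferredBrandDrugInNetwork),
        nonPreferredBrandDrugs: orNull(drugs.nonPreferredBrandDrugsInNetwork),
        specialtyDrugs: orNull(drugs.specialtyDrugsInNetwork),
      },
    };
  }
  return out;
}

// The distinct PDFs to download; many plans share one formulary URL.
function uniqueFormularyUrls(docs: Record<string, PlanDocumentEntry>): string[] {
  const urls = new Set<string>();
  for (const key of Object.keys(docs)) {
    const url = docs[key].formularyUrl;
    if (url) urls.add(url);
  }
  return Array.from(urls).sort();
}
```

Deduplicating URLs before download is what keeps the PDF count near the expected ~15-20 rather than one per plan.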

Compute path: AWS Fargate vs local

| Operation | Where | Why |
| --- | --- | --- |
| One-PDF POC parse | Local (Mac) | Quick iteration, validates the regex |
| 19 CA rating-area screenings + handOffId harvest | Fargate RunTask in-VPC | Many short HTTPS POSTs, throttled to avoid CalHEERS rate limits |
| Download ~15-20 PDFs | Fargate (or local; they're small, ~2-5 MB each) | Negligible bandwidth |
| Parse all PDFs | Fargate | CPU-bound, parallelizable across PDFs |
| Mongo upsert to staging | Fargate via PrivateLink | Atlas writes in-VPC are fastest + most reliable |
| One-time prod plan doc update (foreign-key population) | Fargate with explicit prod approval gate | Touch prod only when explicitly authorized |

Fargate task definition pattern

Reuse the ENG-325 esbuild bundle pattern (scripts/preflight.ts already does this for ensure-indexes):

  • esbuild the ingest script + helpers into a single CJS bundle
  • COPY into the existing askflorence-app ECR image at /app/scripts/ca-ingest.cjs
  • New workflow_dispatch GitHub Action ca-ingest.yml with phase input (harvest / parse / upsert / linkfk)
  • ECS RunTask invokes the bundle with the phase argument
  • Atlas writes go through the existing staging PrivateLink endpoint
  • Secrets sourced from existing AWS Secrets Manager (staging/mongodb/app-write)
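The entry point of the bundled script could validate the phase argument and apply the cluster identity guard (hard constraint 4) before doing any work. This is a rough sketch under stated assumptions: the function names and the idea of matching on an expected host suffix are illustrative, and the real guard may compare hosts differently. The four phase names are the ones listed for the ca-ingest.yml input.

```typescript
const PHASES = ["harvest", "parse", "upsert", "linkfk"] as const;
type Phase = (typeof PHASES)[number];

// Validates the phase argument passed by the RunTask invocation.
function parsePhase(arg: string | undefined): Phase {
  const found = PHASES.find((p) => p === arg);
  if (!found) {
    throw new Error(`unknown phase "${arg}", expected one of: ${PHASES.join(", ")}`);
  }
  return found;
}

// Extracts the host(s) from a mongodb:// or mongodb+srv:// URI.
// Credentials are skipped, so they never end up in an error message.
function mongoHosts(uri: string): string[] {
  const m = /^mongodb(?:\+srv)?:\/\/(?:[^@/]*@)?([^/?]+)/.exec(uri);
  if (!m) throw new Error("not a MongoDB connection string");
  return m[1].split(",").map((h) => h.replace(/:\d+$/, "").toLowerCase());
}

// Refuses to continue unless every host in the URI belongs to the expected cluster.
function assertClusterIdentity(uri: string, expectedHostSuffix: string): void {
  const suffix = expectedHostSuffix.toLowerCase();
  const hosts = mongoHosts(uri);
  const ok = hosts.every((h) => h === suffix || h.endsWith("." + suffix));
  if (!ok) {
    throw new Error(`refusing to run: host ${hosts.join(",")} does not match ${suffix}`);
  }
}
```

In a real run the URI would come from the Secrets Manager value and the expected suffix from per-environment config. Running the guard before any connection is opened means a mis-wired secret fails fast instead of writing to the wrong cluster.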

Per-state reusability

This playbook is CA-specific in execution but the patterns repeat across SBEs:

| State | Marketplace | Equivalent of CalHEERS | Statewide provider directory | Per-carrier formulary PDFs |
| --- | --- | --- | --- | --- |
| CA | Covered California | CalHEERS (Accenture-built) | Symphony (IHA + Availity, SB 137) | Yes (SBP-standardized) |
| NY | NY State of Health | NYSOH | NYDFS provider directory + DOH plan adequacy reports | Yes (per carrier) |
| MA | Massachusetts Health Connector | MMIS/HIX | MA DOI provider data + Sapphire-backed search | Yes (per carrier) |
| WA | Washington Healthplanfinder | WAhealthPlanFinder (Deloitte) | OneHealthPort provider directory | Yes (per carrier) |
| CO | Connect for Health Colorado | CFHCO | Colorado DOI files | Yes (per carrier) |
| CT | Access Health CT | AHCT | CT Insurance Dept files | Yes (per carrier) |
| MD | Maryland Health Connection | MHBE | Maryland Insurance Administration | Yes (per carrier) |
| NJ | Get Covered NJ | GetCoveredNJ (Accenture, sibling to CalHEERS) | NJ DOI files | Yes (per carrier) |
| PA | Pennie | Pennie (Deloitte) | PA Insurance Dept files | Yes (per carrier) |

For each new state we add:

  1. Probe the state marketplace's SPA bundles for the equivalent anon endpoints (same technique as CalHEERS — pull the production config JSON, read the API URL constants)
  2. Identify the statewide provider directory utility (varies — some states have a centralized one like Symphony; others have per-carrier portals)
  3. Verify the formulary PDF format follows the same general 3-column pattern (most do — federal ACA guidance creates de-facto standardization)
  4. Reuse this playbook's parse code + RxCUI resolution layer

Phase-by-phase execution checklist

See ENG-395 for the live checklist. The phases mirror the architecture diagram above (Phase 0 pre-flight → Phase 1 provider proxy → Phase 2 URL harvest → Phase 3 PDF parse → Phase 4 staging writes → Phase 5 query layer + UI flip → Phase 6 docs + cleanup → Phase 7 Symphony license track).


Operational notes captured during execution

(this section appended as the work progresses — patterns, gotchas, deviations from plan, retroactive lessons)

AskFlorence Internal Documentation. Not for public distribution.