Data Classification Policy

Status: Active. Last updated April 12, 2026. Purpose: SOC 2 evidence for CC6.1 (Logical Access), CC6.5 (Data Protection), A1.2 (Availability)


Classification Levels

| Level | Definition | Examples | Encryption | Retention |
|---|---|---|---|---|
| Public | No restrictions. Intentionally published. | Plan names, metal levels, issuer names, premium amounts | At rest (AES-256) | Indefinite |
| Internal | Business-sensitive. Not for external sharing. | SLCSP calculations, data source URLs, API keys | At rest + in transit (TLS) | Duration of use |
| PII | Personally identifiable information. | Email, name, phone, address | At rest + in transit + field-level (CSFLE) | Per purpose + 7yr audit |
| PHI | Protected health information (HIPAA). | SSN, DOB, income (with health context), enrollment records | At rest + in transit + field-level (CSFLE + KMS) | Per purpose + 7yr audit |
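
The levels above can be encoded as a lookup so that tooling can report which controls a data store is missing. This is a minimal sketch, not existing AskFlorence code: the names `HANDLING`, `Control`, and `missingControls` are hypothetical, and the control lists are transcribed directly from the Encryption column.

```typescript
// Classification levels and their required controls, transcribed from the table.
// All identifiers here are illustrative assumptions, not part of the codebase.
type Level = "Public" | "Internal" | "PII" | "PHI";
type Control = "at-rest" | "in-transit" | "field-level" | "kms";

interface Handling {
  controls: readonly Control[];
  retention: string;
}

const HANDLING: Record<Level, Handling> = {
  Public: { controls: ["at-rest"], retention: "Indefinite" },
  Internal: { controls: ["at-rest", "in-transit"], retention: "Duration of use" },
  PII: {
    controls: ["at-rest", "in-transit", "field-level"],
    retention: "Per purpose + 7yr audit",
  },
  PHI: {
    controls: ["at-rest", "in-transit", "field-level", "kms"],
    retention: "Per purpose + 7yr audit",
  },
};

// Returns the controls required for `level` that `provided` does not include.
function missingControls(level: Level, provided: readonly Control[]): Control[] {
  return HANDLING[level].controls.filter((c) => provided.indexOf(c) === -1);
}
```

For example, a store that only offers encryption at rest and in transit would be reported as missing `field-level` and `kms` for PHI.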

Collection Classification

Phase 1 Collections (Active)

| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| plan_years | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| plans | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| regions | Public | No | At rest (Atlas default) | Per plan year (keep all years) | app-read, app-write |
| zip_county | Public | No | At rest (Atlas default) | Indefinite (geographic data) | app-read, app-write |
| audit_log | Internal | May contain IP addresses | At rest (Atlas default) | 7 years (TTL index) | audit-write (insert), admin (read) |

Key: Phase 1 collections contain NO PII or PHI. The four plan-data collections hold only publicly available plan information from government sources (DFS filings, marketplace data, CMS PUF); audit_log is classified Internal and may contain IP addresses.
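
The 7-year TTL retention on audit_log corresponds to a MongoDB TTL index, which expires documents a fixed number of seconds after the value of a date field. The sketch below only computes the index definition; it assumes 365-day years and a hypothetical `createdAt` date field, neither of which this policy specifies.

```typescript
// Assumption: a "year" is 365 days; the policy does not say how leap days count.
const SECONDS_PER_DAY = 24 * 60 * 60;
const AUDIT_RETENTION_YEARS = 7;

function ttlSeconds(years: number): number {
  return years * 365 * SECONDS_PER_DAY;
}

// Shape of a TTL index definition. "createdAt" is a hypothetical field name;
// MongoDB TTL indexes only expire documents whose indexed field holds a Date.
const auditLogTtlIndex = {
  key: { createdAt: 1 },
  options: { expireAfterSeconds: ttlSeconds(AUDIT_RETENTION_YEARS) },
};
```

With the Node.js driver, a definition like this would typically be passed as `collection.createIndex(auditLogTtlIndex.key, auditLogTtlIndex.options)`.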

Cross-cluster reference collections (live on staging Atlas, read by prod via AWS PrivateLink, Phase 11)

| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| formularies_staging | Public | No (CMS §1311 MRF formulary data: RxCUI → plan tier mappings) | At rest (Atlas default) + TLS in transit + AWS PrivateLink (network layer) | Per plan year | app_read_staging (read-only, prod) + ingest pipeline (write, staging account) |
| providers_staging | Public | No (NPPES public NPI directory: provider name, NPI, specialty, network membership) | At rest (Atlas default) + TLS in transit + AWS PrivateLink (network layer) | Per refresh cycle | app_read_staging (read-only, prod) + ingest pipeline (write, staging account) |

Where these live + read path: these collections live ONLY on the staging Atlas cluster (askflorence-staging, project_id 69e31af12fd2c0aef51bbb41). The prod app (askflorence.health) reads them via AWS PrivateLink endpoint vpce-0c81aea11e29bb928 using the read-only app_read_staging Atlas user. The §1311 ingest pipeline writes them from the staging AWS account; nothing on prod ever writes to these collections.

Why staging cluster, not prod cluster: hosting these collections on staging keeps the prod cluster on the M10 HIPAA tier ($56/mo) instead of requiring an upgrade to M30 ($382/mo) to handle the 2.14M-doc + 30M-tuple footprint. This saves ~$326/mo recurring and keeps prod's audit boundary clean (only PHI processing on the prod cluster). See ADR 0004 for the full decision.

Drift guard: #100 / ENG-239. Two-phase enforcement, both shipped:

  • Phase 1 (static CI guard) shipped 2026-05-08 — scripts/audit/staging-collections-guard.ts enforces the data-classification contract at PR time: fails the build if any getReferenceDb() call references a collection not on STAGING_ALLOWED_COLLECTIONS (defined in src/lib/db.ts). Workflow at .github/workflows/staging-collections-guard.yml. Allow-list duplicated in the script (defense-in-depth).
  • Phase 2 (live nightly drift check) shipped 2026-05-09 — scripts/audit/staging-cluster-drift.ts audits the actual Atlas state of app_read_staging (the cross-cluster reader) at 08:00 UTC daily via .github/workflows/staging-cluster-drift.yml. Verifies the user has exactly one custom role (role_reader_reference@admin) granting only FIND on exactly the expected 2 collections (formularies_staging + providers_staging) — opens a P1 GitHub issue on drift. As part of Phase 2 the user's role was tightened from built-in read@askflorence (whole-DB scope) to per-collection custom role role_reader_reference; verified prod cross-cluster reads remain healthy after the tightening.
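
The Phase 1 static check described above can be sketched as a source scan. This is not the contents of scripts/audit/staging-collections-guard.ts: the call shape `getReferenceDb().collection("<name>")` and the two-entry allow-list are assumptions made for illustration, and a regex scan is a simplification that a real guard might replace with AST parsing.

```typescript
// Assumed allow-list contents; the real list is defined in src/lib/db.ts.
const STAGING_ALLOWED_COLLECTIONS: readonly string[] = [
  "formularies_staging",
  "providers_staging",
];

// Hypothetical call shape: getReferenceDb().collection("<name>").
const REFERENCE_CALL = /getReferenceDb\(\)\s*\.collection\(\s*["'`]([^"'`]+)["'`]/g;

// Returns every collection name referenced through getReferenceDb()
// that is not on the allow-list. A non-empty result would fail the build.
function findViolations(source: string): string[] {
  const violations: string[] = [];
  REFERENCE_CALL.lastIndex = 0; // reset shared regex state between calls
  let match: RegExpExecArray | null;
  while ((match = REFERENCE_CALL.exec(source)) !== null) {
    const name = match[1];
    if (STAGING_ALLOWED_COLLECTIONS.indexOf(name) === -1) {
      violations.push(name);
    }
  }
  return violations;
}
```

A CI wrapper would run `findViolations` over each changed source file and exit non-zero if any file returns a non-empty list.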

Together these protect the classification claim above: Phase 1 catches code-level drift at PR time; Phase 2 catches runtime drift (privilege escalation via Atlas Admin UI, out-of-band role changes, etc.).
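
The Phase 2 comparison (exactly one custom role, FIND only, exactly the two expected collections) can be sketched as a pure function over the observed state. The `ReaderState` shape below is a hypothetical, flattened view of the user's roles and privileges; fetching that state from the Atlas Admin API and opening the P1 issue are deliberately out of scope.

```typescript
// Hypothetical, flattened view of what the audit reads back for app_read_staging.
interface Privilege {
  collection: string;
  actions: string[];
}

interface ReaderState {
  roles: string[];
  privileges: Privilege[];
}

const EXPECTED_ROLE = "role_reader_reference@admin";
const EXPECTED_COLLECTIONS = ["formularies_staging", "providers_staging"];

// Returns one human-readable finding per deviation; an empty array means no drift.
function detectDrift(state: ReaderState): string[] {
  const findings: string[] = [];

  if (state.roles.length !== 1 || state.roles[0] !== EXPECTED_ROLE) {
    findings.push("roles: expected [" + EXPECTED_ROLE + "], got [" + state.roles.join(", ") + "]");
  }

  const granted = state.privileges.map((p) => p.collection).sort();
  const expected = EXPECTED_COLLECTIONS.slice().sort();
  if (granted.join(",") !== expected.join(",")) {
    findings.push("collections: expected [" + expected.join(", ") + "], got [" + granted.join(", ") + "]");
  }

  for (const p of state.privileges) {
    if (p.actions.length !== 1 || p.actions[0] !== "FIND") {
      findings.push("actions on " + p.collection + ": expected [FIND], got [" + p.actions.join(", ") + "]");
    }
  }

  return findings;
}
```

Keeping the comparison separate from the API call makes the expected state easy to unit-test without touching a live cluster.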

Phase 2 Collections (Future, Not Yet Created)

| Collection | Classification | Contains PII/PHI? | Encryption | Retention | Access |
|---|---|---|---|---|---|
| consumers | PHI | Yes (SSN, name, DOB, address) | At rest + CSFLE + KMS | Per purpose + 7yr audit trail | Scoped (per-consumer access) |
| enrollments | PHI | Yes (links consumer to health plan) | At rest + CSFLE | Per purpose + 7yr audit trail | Broker (assigned only), consumer (own) |
| broker_assignments | Internal | No (broker business info only) | At rest | Duration of relationship | Admin |

Phase 2 requires: MongoDB Client-Side Field Level Encryption (CSFLE) with AWS KMS before these collections are created. See docs/security-compliance/encryption-policy.md for the encryption policy + CSFLE roadmap.
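
One way to make this precondition hard to bypass is a check that runs before any Phase 2 collection is created. The sketch below is an assumption about how such a gate could look; `EncryptionState` and `assertPhase2Ready` are hypothetical names, and how CSFLE readiness is actually detected is not covered by this policy.

```typescript
// Phase 2 collection names, taken from the table above.
const PHASE_2_COLLECTIONS: readonly string[] = [
  "consumers",
  "enrollments",
  "broker_assignments",
];

// Hypothetical summary of the deployment's encryption setup.
interface EncryptionState {
  csfleConfigured: boolean;
  kmsProvider: "aws" | null;
}

// Throws if a Phase 2 collection would be created before CSFLE with AWS KMS
// is in place. Other collections pass through unchanged.
function assertPhase2Ready(collection: string, enc: EncryptionState): void {
  const isPhase2 = PHASE_2_COLLECTIONS.indexOf(collection) !== -1;
  const ready = enc.csfleConfigured && enc.kmsProvider === "aws";
  if (isPhase2 && !ready) {
    throw new Error(
      "Refusing to create " + collection + ": CSFLE with AWS KMS is required first"
    );
  }
}
```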


Data Flow Classification

| Data Flow | Classification | Handling |
|---|---|---|
| User enters zip + age + income | Not stored | Stateless; used for calculation only; not persisted |
| Plan search results | Public | Returned to client; no PII |
| Waitlist email submission | PII | Stored via Resend API; not in MongoDB |
| Enrollment application (future) | PHI | Field-level encrypted in MongoDB; audit logged |
| Broker view of consumer data (future) | PHI access event | Decrypted on-demand; time-limited session; audit logged |

Source File Classification

| Source | Classification | Storage | Retention |
|---|---|---|---|
| DFS Final Exhibit ZIPs | Public (government filings) | S3 + local backup | Indefinite |
| NYSOH scraped HTML | Public (public marketplace data) | S3 + local backup | Indefinite |
| CMS PUF CSVs | Public (government data) | S3 + local backup | Indefinite |
| Official NY documents (PDFs) | Public | S3 + local backup | Indefinite |
| Data ingestion manifests | Internal | S3 (with source file checksums) | Indefinite |

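Role-to-Collection Access Matrix

| Role | plan_years | plans | regions | zip_county | audit_log |
|---|---|---|---|---|---|
| app-read | Read | Read | Read | Read | None |
| app-write | Read/Write | Read/Write | Read/Write | Read/Write | None |
| audit-write | None | None | None | None | Insert only |
| Atlas admin | Full | Full | Full | Full | Full |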
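
The matrix can also be expressed as a small typed lookup, which is useful for tests that assert the intended access model. This is an illustrative sketch only: the identifiers are hypothetical, `atlas-admin` stands in for the "Atlas admin" row, and treating Read/Write as including reads is an assumption.

```typescript
type Role = "app-read" | "app-write" | "audit-write" | "atlas-admin";
type Collection = "plan_years" | "plans" | "regions" | "zip_county" | "audit_log";
type Access = "none" | "read" | "read-write" | "insert-only" | "full";

const PLAN_DATA: readonly Collection[] = ["plan_years", "plans", "regions", "zip_county"];

// Returns the matrix cell for a role and collection.
function accessFor(role: Role, collection: Collection): Access {
  if (role === "atlas-admin") return "full";
  if (role === "audit-write") {
    return collection === "audit_log" ? "insert-only" : "none";
  }
  // app-read and app-write have no access to audit_log.
  if (PLAN_DATA.indexOf(collection) === -1) return "none";
  return role === "app-read" ? "read" : "read-write";
}

// True when the matrix cell permits reading documents.
function canRead(role: Role, collection: Collection): boolean {
  const access = accessFor(role, collection);
  return access === "read" || access === "read-write" || access === "full";
}
```

Note that `audit-write` cannot read back what it inserts, which matches the insert-only cell in the matrix.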

SOC 2 Control Mapping

| Control | Evidence |
|---|---|
| CC6.1 (Logical Access) | Role-to-collection matrix, minimum necessary access |
| CC6.5 (Data Protection) | Classification levels, encryption requirements per level |
| A1.2 (Availability) | Retention policies, backup configuration |
| P6.1 (Privacy: Data Use) | Data flow classification, "not stored" for anonymous queries |

AskFlorence Internal Documentation. Not for public distribution.