
Brief: Florence AI architecture & infrastructure

Status: input-of-record, not finalized architecture. This is the incoming brief handed to the agent on 2026-04-22 to frame the Florence AI research + design track. Authoritative design decisions are tracked in the parent GitHub issue linked below; sub-decisions spawn their own issues as they resolve.

Naming note: the name "Florence" has trademark exposure. Until legal clears the final brand, use Florence AI or AskFlorence AI in new work. The text below preserves the original "Florence" usage verbatim for fidelity to the input document.

Sequencing note: Florence AI integration ships after AWS migration completes and after the full deterministic lead → enrollment → member-servicing flow is live for both consumer and agent users. Research and architecture design run now, in parallel.

Tracking issue: #61 — Florence AI architecture + infrastructure design.


Brief for Claude Code: Florence Architecture & Infrastructure

Purpose of this brief

This brief frames the architecture and infrastructure work for Florence — the core conversational AI agent that is the primary user interface to the AskFlorence platform. Florence is not a support chatbot or a feature; Florence is the product. Every user interaction (pre-enrollment discovery, plan selection, enrollment assistance, member lifecycle, post-enrollment support, renewal) flows through Florence.

The goal of this brief is to provide Claude Code with the full context needed to architect Florence's LLM layer, infrastructure, deterministic API integration, and compliance posture in a way that is:

  1. Viable for immediate Phase 1 launch (2-week horizon) under HIPAA + BAA coverage
  2. Cleanly migratable to Phase 3 EDE compliance when AskFlorence pursues primary EDE entity status, without architectural rewrites
  3. Aligned with the product vision of Florence as a continuous conversational agent across the entire user journey
  4. Economically sustainable at launch scale and at growth scale

The architecture must be designed once, built in stages, and migrate cleanly between compliance postures without re-platforming the core logic.

Product vision context

Florence is the conversational interface for AskFlorence. Users interact with Florence via text and eventually voice. Florence:

  • Conducts intake conversationally (meds, doctors, pharmacies, household context, income)
  • Translates natural language to structured parameters for the deterministic API layer
  • Presents plan options with personalized explanations derived from deterministic plan data
  • Handles post-enrollment questions (benefits, drug coverage, provider network, claims, bills)
  • Identifies and initiates SEP workflows for life events
  • Supports renewal analysis and plan optimization
  • Escalates to human agents when a conversation requires human judgment

Florence is positioned as an AI agent operating under regulated agent/broker rules, supervised by licensed human agents, with the long-term goal of being the first AI agent licensed as a health insurance broker under an NPN (or operating under a human NPN in the interim). This positioning shapes architecture choices: Florence's decisions must be auditable, her outputs must be grounded in deterministic data where possible, and her operation must be defensible under CMS oversight.

The deterministic API layer is the foundation

Florence is not the source of truth for any computable fact. All factual claims (is a drug covered, what's a copay, which plan matches a user's doctors, what's the subsidy amount, what's the member's current deductible progress) come from the deterministic API layer that already exists in the AskFlorence platform. Florence's job is to:

  1. Translate user natural language into structured API calls via tool use
  2. Interpret structured API results and produce natural-language explanations
  3. Manage conversation context and state
  4. Escalate appropriately when she cannot ground an answer in deterministic data

This separation is critical for both accuracy (no hallucinated benefit details) and compliance (every factual claim traces to an auditable API call and data source). Claude Code should treat the deterministic API as a given and design Florence to consume it, not replicate its logic.
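One way to make the grounding rule checkable in code is to have every factual claim in Florence's response cite the tool call that produced it. Any claim without such a citation is then treated as ungrounded. The sketch below assumes the model can return claims in a structured form with a call reference. `ToolCallRecord`, `GroundedClaim`, and `checkGrounding` are hypothetical names for illustration, not the platform's actual schema.

```typescript
// Hypothetical record of each deterministic API call made during one turn.
interface ToolCallRecord {
  callId: string;
  tool: string;       // e.g. "check_drug_coverage"
  dataSource: string; // hypothetical label for the backing dataset
}

// Hypothetical structured claim emitted by the model alongside its prose.
interface GroundedClaim {
  text: string;
  sourceCallId?: string; // undefined means the model asserted it without a tool result
}

interface GroundingReport {
  grounded: GroundedClaim[];
  ungrounded: GroundedClaim[];
  mustEscalate: boolean;
}

// A claim counts as grounded only if it cites a tool call that actually ran in this turn.
function checkGrounding(claims: GroundedClaim[], calls: ToolCallRecord[]): GroundingReport {
  const knownIds = new Set(calls.map((c) => c.callId));
  const grounded: GroundedClaim[] = [];
  const ungrounded: GroundedClaim[] = [];
  for (const claim of claims) {
    if (claim.sourceCallId !== undefined && knownIds.has(claim.sourceCallId)) {
      grounded.push(claim);
    } else {
      ungrounded.push(claim);
    }
  }
  // An untraceable factual claim is a signal to escalate rather than answer.
  return { grounded, ungrounded, mustEscalate: ungrounded.length > 0 };
}
```

The check also catches a claim that cites a call ID which never ran, for example a hallucinated reference. That case matters as much as a claim with no citation at all.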

Compliance context Claude Code must design around

Phase 1 (launch through pre-EDE operation, HIPAA-governed)

AskFlorence is operating as a downstream agent/broker at launch. Enrollments are submitted through a primary EDE entity's environment (HealthSherpa or equivalent). Florence processes PHI under HIPAA Security Rule obligations. The compliance posture:

  • HIPAA Privacy Rule + Security Rule applies to Florence's handling of member data
  • BAA required with every vendor in the PHI flow
  • SOC 2 Type II work is underway, with a compliance-automation platform (Vanta or Drata) to be onboarded within 60 days of launch
  • Current infrastructure (Vercel, MongoDB Atlas commercial) is acceptable for Phase 1

Phase 3 EDE (future target, NIST 800-53 Moderate / MARS-E 2.2 governed)

When AskFlorence pursues primary EDE entity status (estimated 12-18 months out), the compliance bar rises dramatically:

  • Approximately 294 NIST 800-53 Moderate controls must be implemented and audited
  • MARS-E 2.2 controls for ACA marketplace data handling
  • Two independent audits required: Business Requirements Audit + Privacy and Security Audit
  • Interconnection Security Agreement (ISA) signed with CMS
  • All systems handling EDE-scoped data (PII, FTI, SSA/DHS verification data, full enrollment applications) must run on FedRAMP-authorized infrastructure or inherit controls from authorized providers
  • Subprocessors in the EDE scope must have FedRAMP authorization or be architecturally excluded from EDE data flows

Data classification Claude Code must enforce in the architecture

| Data class | Example fields | Phase 1 treatment | Phase 3 treatment |
|---|---|---|---|
| Marketing / recruiting | Name, email, ad source, engagement metrics | Non-scoped; commercial tools OK (HubSpot, etc.) | Unchanged |
| PHI (HIPAA) | Meds, doctors, health conditions mentioned | BAA-covered vendors only | BAA + FedRAMP Moderate if in EDE scope |
| PII (non-FTI) | Name, DOB, address, phone in marketplace context | HIPAA posture | FedRAMP Moderate when in EDE scope |
| FTI-derived | Household income, tax family composition, subsidy amount | HIPAA posture, strict controls | FedRAMP Moderate, in-platform only, never in external tools |
| Application payload | SSN, citizenship, full eligibility application | HIPAA posture | FedRAMP Moderate, in-platform only |
| CMS Hub responses | Eligibility determination, verification status | HIPAA posture | FedRAMP Moderate, in-platform only |

The architecture must enforce these boundaries through explicit data flow controls, not through convention. At Phase 3 audit time, the auditor will trace every field in every user interaction to its storage and processing location.
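Enforcing classification "through explicit data flow controls" starts with tagging fields in code. The minimal sketch below maps each field to one of the table's data classes and gives a record the most restrictive class among its fields. Several parts are assumptions for illustration: the field names, the `FIELD_CLASS` map, and the restrictiveness ranking (the three in-platform-only classes tie at the top). Untagged fields fail closed.

```typescript
// Data classes from the classification table.
type DataClass =
  | "marketing"
  | "pii_non_fti"
  | "phi"
  | "fti_derived"
  | "application_payload"
  | "cms_hub_response";

// Assumed restrictiveness ranking (higher = stricter).
const RANK: Record<DataClass, number> = {
  marketing: 0,
  pii_non_fti: 1,
  phi: 2,
  fti_derived: 3,
  application_payload: 3,
  cms_hub_response: 3,
};

// Hypothetical field tags; in practice these would live beside the schema definitions.
const FIELD_CLASS: Record<string, DataClass> = {
  email: "marketing",
  adSource: "marketing",
  dateOfBirth: "pii_non_fti",
  address: "pii_non_fti",
  medications: "phi",
  doctors: "phi",
  householdIncome: "fti_derived",
  subsidyAmount: "fti_derived",
  ssn: "application_payload",
  eligibilityDetermination: "cms_hub_response",
};

// Untagged fields are treated as the strictest class, so a new field cannot
// leak just because nobody labeled it yet. hasOwnProperty avoids matching
// inherited keys such as "toString".
function classifyField(field: string): DataClass {
  return Object.prototype.hasOwnProperty.call(FIELD_CLASS, field)
    ? FIELD_CLASS[field]
    : "application_payload";
}

// A record is as sensitive as its most sensitive field.
function classifyRecord(record: Record<string, unknown>): DataClass {
  let worst: DataClass = "marketing";
  for (const field of Object.keys(record)) {
    const cls = classifyField(field);
    if (RANK[cls] > RANK[worst]) worst = cls;
  }
  return worst;
}
```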

What Claude Code should design

1. Florence's LLM execution layer

Phase 1 target: Claude via the Anthropic direct API under a signed BAA. Model routing:

  • Default: Claude Haiku 4.5 for standard queries (benefit lookups, drug coverage questions, general explanations)
  • Escalate: Claude Sonnet 4.6 for complex plan comparisons, nuanced SEP scenarios, edge cases
  • Use sparingly: Claude Opus 4.7 only if Sonnet cannot resolve; measure necessity
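The three routing tiers can be expressed as a small, testable function. This is a sketch under stated assumptions. The per-turn signals (`plansCompared`, `isSepScenario`, `ambiguityFlag`, `priorTierFailed`) and the threshold of three plans are hypothetical placeholders to be tuned against evals. The tier labels are aliases, not exact provider model IDs.

```typescript
type ModelTier = "haiku-4.5" | "sonnet-4.6" | "opus-4.7";
type Route = ModelTier | "escalate_to_human";

// Hypothetical signals computed before the LLM call.
interface TurnSignals {
  plansCompared: number;       // number of plans the user wants compared
  isSepScenario: boolean;      // a life event / SEP question was detected
  ambiguityFlag: boolean;      // the intent classifier was unsure
  priorTierFailed?: ModelTier; // set when a tier already failed to resolve this turn
}

function selectModel(s: TurnSignals): Route {
  // Step up one tier at a time after a failure; past Opus, hand off to a human.
  switch (s.priorTierFailed) {
    case "opus-4.7":
      return "escalate_to_human";
    case "sonnet-4.6":
      return "opus-4.7";
    case "haiku-4.5":
      return "sonnet-4.6";
  }
  // First attempt: Haiku by default, Sonnet for complex comparisons or SEP nuance.
  const complex = s.plansCompared >= 3 || s.isSepScenario || s.ambiguityFlag;
  return complex ? "sonnet-4.6" : "haiku-4.5";
}
```

Logging each returned route alongside its eval outcome is one way to "measure necessity" for the Opus tier.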

Phase 3 target: Claude via AWS Bedrock in a FedRAMP Moderate-authorized region, or AWS GovCloud (FedRAMP High) if the EDE auditor requires it. The model family and routing logic remain the same; only the provider changes.

Design requirement: Florence's LLM calls must go through a provider-abstraction layer (Vercel AI SDK or equivalent thin wrapper) so the Phase 1 → Phase 3 migration is a configuration change, not a code rewrite. Every prompt, every tool definition, every response parsing path must be provider-agnostic.
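Here is a sketch of the seam this requirement implies. Application code depends only on a narrow interface, and the phase-specific provider is chosen from configuration. The `ChatProvider` interface, the `PROVIDERS` map, and the `FLORENCE_LLM_PROVIDER` variable are hypothetical. The `anthropic/...` and `bedrock/...` strings are placeholders for the real vendor model IDs. The Vercel AI SDK's provider packages could sit behind this same seam.

```typescript
type Tier = "haiku-4.5" | "sonnet-4.6" | "opus-4.7";

interface ChatProvider {
  readonly name: string;
  // Maps a logical tier to the vendor-specific model identifier.
  resolveModelId(tier: Tier): string;
}

type ProviderKey = "anthropic_direct" | "bedrock";

// Hypothetical configuration; model ID strings are placeholders, not real IDs.
const PROVIDERS: Record<ProviderKey, ChatProvider> = {
  anthropic_direct: {
    name: "Anthropic API (Phase 1, BAA)",
    resolveModelId: (tier) => `anthropic/${tier}`,
  },
  bedrock: {
    name: "AWS Bedrock (Phase 3, FedRAMP-authorized region)",
    resolveModelId: (tier) => `bedrock/${tier}`,
  },
};

function isProviderKey(k: string): k is ProviderKey {
  return k === "anthropic_direct" || k === "bedrock";
}

// The Phase 1 -> Phase 3 switch is a configuration value, not a code change.
function providerFromEnv(env: Record<string, string | undefined>): ChatProvider {
  const key = env["FLORENCE_LLM_PROVIDER"] ?? "anthropic_direct";
  if (!isProviderKey(key)) {
    throw new Error(`Unknown LLM provider: ${key}`);
  }
  return PROVIDERS[key];
}
```

Failing loudly on an unknown provider key keeps a typo in deployment config from silently routing PHI to the wrong vendor.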

Claude Code should design:

  • The abstraction layer (likely Vercel AI SDK given the stack)
  • The model routing logic (Haiku vs. Sonnet vs. Opus selection criteria)
  • The retry, fallback, and timeout policies
  • The token budgeting per conversation (context management, summarization triggers)
  • The cost monitoring hooks (per-conversation cost attribution, daily/monthly cost aggregates, alerting thresholds)
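The cost monitoring hooks can start as simple aggregation over per-call usage records. The sketch below attributes cost per conversation and flags days over an alert threshold. Assumptions: the `UsageRecord` shape is hypothetical, and prices are supplied as configuration in USD per million tokens. No real vendor rates are encoded, since pricing changes independently of the code.

```typescript
type Tier = "haiku-4.5" | "sonnet-4.6" | "opus-4.7";

// USD per million tokens, loaded from configuration.
type PriceTable = Record<Tier, { inputPerMTok: number; outputPerMTok: number }>;

// Hypothetical usage record written after every LLM call.
interface UsageRecord {
  conversationId: string;
  day: string; // ISO date, e.g. "2026-05-01"
  model: Tier;
  inputTokens: number;
  outputTokens: number;
}

function callCostUsd(u: UsageRecord, prices: PriceTable): number {
  const p = prices[u.model];
  return (u.inputTokens * p.inputPerMTok + u.outputTokens * p.outputPerMTok) / 1e6;
}

// Roll usage up per conversation (attribution) and per day (alerting).
function aggregate(usage: UsageRecord[], prices: PriceTable) {
  const byConversation = new Map<string, number>();
  const byDay = new Map<string, number>();
  for (const u of usage) {
    const cost = callCostUsd(u, prices);
    byConversation.set(u.conversationId, (byConversation.get(u.conversationId) ?? 0) + cost);
    byDay.set(u.day, (byDay.get(u.day) ?? 0) + cost);
  }
  return { byConversation, byDay };
}

// Days whose total spend exceeded the configured threshold.
function daysOverBudget(byDay: Map<string, number>, dailyLimitUsd: number): string[] {
  return Array.from(byDay.entries())
    .filter(([, total]) => total > dailyLimitUsd)
    .map(([day]) => day);
}
```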

2. Tool use architecture

Florence interacts with AskFlorence's deterministic APIs via LLM tool use. Claude Code should design:

  • The tool schema (function definitions Florence can call)
  • The tool execution layer (how tool calls route to internal APIs, auth, error handling)
  • The result serialization back into Florence's context
  • The tool versioning strategy (as APIs evolve, tool definitions evolve)
  • The tool access control (which tools Florence can use in which user contexts — e.g., authenticated member context unlocks member-specific tools; anonymous pre-enrollment context does not)

Initial tool surface likely includes:

  • search_plans(zip, household, income, desired_metal_tier, ...)
  • check_drug_coverage(plan_id, drug_name)
  • check_provider_network(plan_id, provider_npi)
  • get_member_profile(member_id) — authenticated only
  • get_member_plan_details(member_id) — authenticated only
  • initiate_sep_workflow(member_id, life_event_type) — authenticated only
  • escalate_to_human(conversation_id, reason, urgency)
  • create_ticket(member_id, summary, transcript) — the escalation queue interface

Additional tools will emerge; the architecture should make tool addition routine.
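Context-based access control can be enforced where the tool list is assembled for each request. That way the model is never shown a tool the current user context cannot use, and the check runs again at execution time. This sketch uses the tool names from the initial surface. The registry shape, the `version` field, and the user contexts are assumptions. The contexts mirror the anonymous / authenticated / mid-enrollment / post-enrollment split in question 8. "Authenticated only" is read here as "any signed-in context".

```typescript
type UserContext = "anonymous" | "authenticated" | "mid_enrollment" | "post_enrollment";

interface ToolSpec {
  name: string;
  version: number;                // bumped when the underlying API contract changes
  allowedContexts: UserContext[]; // contexts in which the tool is exposed to the model
}

const ALL: UserContext[] = ["anonymous", "authenticated", "mid_enrollment", "post_enrollment"];
const SIGNED_IN: UserContext[] = ["authenticated", "mid_enrollment", "post_enrollment"];

// Hypothetical registry; adding a tool means adding one entry here.
const TOOL_REGISTRY: ToolSpec[] = [
  { name: "search_plans", version: 1, allowedContexts: ALL },
  { name: "check_drug_coverage", version: 1, allowedContexts: ALL },
  { name: "check_provider_network", version: 1, allowedContexts: ALL },
  { name: "get_member_profile", version: 1, allowedContexts: SIGNED_IN },
  { name: "get_member_plan_details", version: 1, allowedContexts: SIGNED_IN },
  { name: "initiate_sep_workflow", version: 1, allowedContexts: SIGNED_IN },
  { name: "escalate_to_human", version: 1, allowedContexts: ALL },
  { name: "create_ticket", version: 1, allowedContexts: ALL },
];

// Only tools permitted in the current context are sent to the model.
function toolsForContext(ctx: UserContext): ToolSpec[] {
  return TOOL_REGISTRY.filter((t) => t.allowedContexts.indexOf(ctx) !== -1);
}

// Defense in depth: re-check at execution time in case the model names a tool it was not given.
function assertToolAllowed(toolName: string, ctx: UserContext): void {
  const spec = TOOL_REGISTRY.find((t) => t.name === toolName);
  if (!spec || spec.allowedContexts.indexOf(ctx) === -1) {
    throw new Error(`Tool ${toolName} is not permitted in context ${ctx}`);
  }
}
```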

3. Conversation state and context management

Florence maintains conversation context across sessions. Claude Code should design:

  • Conversation persistence (what gets stored, where, how long)
  • Context retrieval for returning users (authenticated member sees continuation of prior conversations)
  • Anonymous-to-authenticated transition (lead has pre-enrollment conversation; creates account; conversation continues under member identity)
  • Context window management (summarization strategy when conversations grow long)
  • Memory architecture — what Florence should remember long-term (medication list, provider preferences, past concerns) vs. what stays session-scoped
  • The storage layer for conversation data, including PHI handling at rest

Storage target: extends the existing MongoDB architecture. Conversation data is PHI in many cases and must follow the same encryption, access control, and retention policies as other PHI in the platform.
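One way to let a pre-enrollment conversation continue under the member's identity is to give each conversation an owner. The owner starts as an anonymous session and is re-bound at account creation, with the transition recorded for the audit trail. The `Conversation` shape and the in-memory array are hypothetical stand-ins for the MongoDB collection. The session ID is assumed to come from a server-verified session, so a user cannot claim someone else's conversation.

```typescript
type Owner =
  | { kind: "anonymous"; sessionId: string }
  | { kind: "member"; memberId: string };

interface Conversation {
  id: string;
  owner: Owner;
  messages: { role: "user" | "assistant"; text: string }[];
  // Ownership changes, kept so the audit trail shows when a lead became a member.
  ownerHistory: { from: Owner; to: Owner; at: string }[];
}

// Re-bind every conversation held by this anonymous session to the new member.
// Conversations already owned by a member are never reassigned.
function claimAnonymousConversations(
  store: Conversation[],
  sessionId: string,
  memberId: string,
  now: string,
): Conversation[] {
  const claimed: Conversation[] = [];
  for (const convo of store) {
    if (convo.owner.kind === "anonymous" && convo.owner.sessionId === sessionId) {
      const to: Owner = { kind: "member", memberId };
      convo.ownerHistory.push({ from: convo.owner, to, at: now });
      convo.owner = to;
      claimed.push(convo);
    }
  }
  return claimed;
}
```

Because the messages stay in place and only the owner changes, Florence's context carries over without being copied or re-summarized.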

4. Voice integration (Phase 2 feature, architect now)

Voice input via Whisper or equivalent. Claude Code should design:

  • The transcription service layer (OpenAI Whisper API + BAA for Phase 1; AWS Transcribe or self-hosted Whisper for Phase 3)
  • The provider abstraction for transcription (same pattern as LLM provider abstraction)
  • Audio handling (capture, encoding, streaming vs. batch)
  • Transcription correction and confirmation UX patterns (let Florence re-ask if confidence is low)
  • Voice output (text-to-speech) if bidirectional voice is planned

Voice is not a Phase 1 launch feature but the architecture should not preclude it.
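The "re-ask if confidence is low" pattern can sit behind the same kind of provider seam as the LLM. In this sketch the `Transcriber` interface, the per-segment `confidence` field, and the 0.85 threshold are assumptions. Transcription APIs do not all expose confidence in the same form, so each adapter would need to normalize it to a 0 to 1 score.

```typescript
interface TranscriptSegment {
  text: string;
  confidence: number; // normalized to 0..1 by the provider adapter
}

// Hypothetical seam: a Whisper adapter in Phase 1, an AWS Transcribe adapter in Phase 3.
interface Transcriber {
  readonly name: string;
  transcribe(audio: Uint8Array): Promise<TranscriptSegment[]>;
}

type TranscriptDecision =
  | { action: "accept"; text: string }
  | { action: "confirm"; text: string; uncertain: string[] };

const CONFIDENCE_THRESHOLD = 0.85; // placeholder, to be tuned with evals

// Accept clean transcripts; otherwise Florence reads back the uncertain parts
// (e.g. "Did you say metformin?") before acting on them.
function decide(segments: TranscriptSegment[]): TranscriptDecision {
  const text = segments.map((s) => s.text.trim()).join(" ");
  const uncertain = segments
    .filter((s) => s.confidence < CONFIDENCE_THRESHOLD)
    .map((s) => s.text.trim());
  return uncertain.length === 0 ? { action: "accept", text } : { action: "confirm", text, uncertain };
}
```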

5. Human escalation and ticketing

When Florence cannot resolve a conversation, she escalates to a human queue. There is no external support platform. The escalation system lives inside the admin dashboard. Claude Code should design:

  • The escalation trigger logic (confidence thresholds, explicit user request, specific intent categories that require human judgment)
  • The queue data model (conversation reference, member context, reason for escalation, urgency, SLA timer, assignee, status)
  • The admin UI for human responders (transcript view, member profile alongside, response composition, handoff back to Florence if appropriate)
  • The member-side continuation UX (human response appears in the same conversation thread; member doesn't feel context-switched)
  • The audit trail for escalated conversations (who handled, when, what was decided, outcome)

This is a feature of the admin dashboard, not a separate support product. Engineering scope is 1-2 weeks, not a full support platform build.
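A minimal sketch of the queue record and SLA timer, which helps keep the feature dashboard-sized. Field names follow the queue data model bullet. The urgency levels, status values, and SLA durations are placeholder assumptions, not agreed service levels.

```typescript
type Urgency = "low" | "normal" | "high";
type TicketStatus = "open" | "assigned" | "resolved" | "returned_to_florence";

interface EscalationTicket {
  id: string;
  conversationId: string; // the same thread the member sees, so replies land in place
  memberId?: string;      // absent for anonymous leads
  reason: string;
  urgency: Urgency;
  createdAt: Date;
  assignee?: string;
  status: TicketStatus;
}

// Placeholder SLA targets in minutes.
const SLA_MINUTES: Record<Urgency, number> = { low: 24 * 60, normal: 4 * 60, high: 30 };

function slaDeadline(t: EscalationTicket): Date {
  return new Date(t.createdAt.getTime() + SLA_MINUTES[t.urgency] * 60000);
}

function isBreached(t: EscalationTicket, now: Date): boolean {
  const closed = t.status === "resolved" || t.status === "returned_to_florence";
  return !closed && now.getTime() > slaDeadline(t).getTime();
}

// Admin dashboard ordering: open work only, earliest deadline first.
function sortQueue(tickets: EscalationTicket[]): EscalationTicket[] {
  return tickets
    .filter((t) => t.status === "open" || t.status === "assigned")
    .sort((a, b) => slaDeadline(a).getTime() - slaDeadline(b).getTime());
}
```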

6. Audit logging

Every Florence interaction produces audit-relevant data. Claude Code should design:

  • What gets logged for every conversation (user identity, tools called, tool parameters, tool results summary, model used, token counts, escalation events, any PHI access)
  • Where logs go (immutable audit store, retention policy, access control)
  • The audit log schema (query-able for compliance reviews, incident investigation, Phase 3 audit evidence)
  • The relationship between conversation transcripts and audit logs (they are distinct but cross-referenced)

This matters for HIPAA (Security Rule audit controls) and becomes critical for NIST 800-53 AU-family controls at Phase 3.
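One common way to make an audit store tamper-evident is to hash-chain entries. Each record includes the hash of the previous one, so editing or deleting a past entry breaks verification. This sketch uses Node's built-in `crypto` module. The `AuditEvent` fields are a hypothetical subset of the logging list for this section. The approach complements, rather than replaces, storage-level immutability and access control. Periodically copying the latest hash to a separate store would also expose a wholesale rewrite of the chain.

```typescript
import { createHash } from "node:crypto";

interface AuditEvent {
  conversationId: string; // cross-reference to the transcript, stored separately
  actor: string;          // user/member identity, or "florence"
  action: string;         // e.g. "tool_call", "escalation", "phi_access"
  tool?: string;
  model?: string;
  inputTokens?: number;
  outputTokens?: number;
  at: string;             // ISO timestamp
}

interface AuditEntry {
  seq: number;
  event: AuditEvent;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64);

// Assumes events are serialized with a consistent key order; a canonical JSON
// encoding would remove that assumption.
function hashEntry(seq: number, event: AuditEvent, prevHash: string): string {
  return createHash("sha256").update(`${seq}|${prevHash}|${JSON.stringify(event)}`).digest("hex");
}

function append(log: AuditEntry[], event: AuditEvent): AuditEntry {
  const prevHash = log.length === 0 ? GENESIS : log[log.length - 1].hash;
  const seq = log.length;
  const entry: AuditEntry = { seq, event, prevHash, hash: hashEntry(seq, event, prevHash) };
  log.push(entry);
  return entry;
}

// Recompute the chain; returns the index of the first bad entry, or -1 if intact.
function verify(log: AuditEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    if (e.seq !== i || e.prevHash !== prev || e.hash !== hashEntry(e.seq, e.event, e.prevHash)) {
      return i;
    }
    prev = e.hash;
  }
  return -1;
}
```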

7. Evaluation and quality assurance

Florence must be provably correct on factual questions, appropriately cautious on advisory questions, and reliably escalatory on out-of-scope questions. Claude Code should design:

  • The eval dataset structure (golden Q&A pairs, scenarios, edge cases)
  • The automated eval harness (runs on every prompt change or model upgrade)
  • Regression testing (changes to Florence's prompts or tools do not break prior behavior)
  • Human-in-the-loop review queue (sample of Florence conversations reviewed by licensed humans for quality and compliance)
  • Specific eval categories: drug coverage accuracy, plan comparison reasoning, SEP identification, escalation appropriateness, PHI handling, hallucination detection

Without evals, Florence cannot be safely iterated on. This is not optional infrastructure.
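A sketch of the harness's core loop. It runs golden cases tagged by the eval categories against a pluggable `respond` function that stands in for Florence, then gates a change on per-category pass rates. Per-category rates keep a regression in one area from hiding behind an overall average. Three things are simplifying assumptions: the `EvalCase` shape, the substring checks, and the synchronous `respond`. Real hallucination and escalation checks would be richer.

```typescript
type EvalCategory =
  | "drug_coverage"
  | "plan_comparison"
  | "sep_identification"
  | "escalation"
  | "phi_handling"
  | "hallucination";

interface EvalCase {
  id: string;
  category: EvalCategory;
  input: string;
  mustInclude: string[];    // facts the answer has to contain
  mustNotInclude: string[]; // e.g. invented copays, PHI echoed back
  expectEscalation: boolean;
}

interface FlorenceOutput {
  text: string;
  escalated: boolean;
}

type CategoryResult = { passed: number; total: number };

function passes(c: EvalCase, out: FlorenceOutput): boolean {
  const lower = out.text.toLowerCase();
  return (
    out.escalated === c.expectEscalation &&
    c.mustInclude.every((s) => lower.includes(s.toLowerCase())) &&
    c.mustNotInclude.every((s) => !lower.includes(s.toLowerCase()))
  );
}

function runEvals(cases: EvalCase[], respond: (input: string) => FlorenceOutput): Map<EvalCategory, CategoryResult> {
  const results = new Map<EvalCategory, CategoryResult>();
  for (const c of cases) {
    const r = results.get(c.category) ?? { passed: 0, total: 0 };
    r.total += 1;
    if (passes(c, respond(c.input))) r.passed += 1;
    results.set(c.category, r);
  }
  return results;
}

// Gate for prompt or model changes: every category must meet the minimum pass rate.
function gate(results: Map<EvalCategory, CategoryResult>, minRate: number): boolean {
  return Array.from(results.values()).every(({ passed, total }) => total === 0 || passed / total >= minRate);
}
```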

8. Compliance boundary enforcement

The architecture must enforce data classification boundaries in code, not by policy. Claude Code should design:

  • A data classification tagging system (types, fields, or records labeled by compliance class)
  • Egress controls (data classified as FTI or EDE-scoped cannot be sent to non-FedRAMP vendors even in principle — blocked at the wire)
  • Tool access control by user context (anonymous users cannot invoke tools that return FTI data)
  • The subprocessor data flow map (living documentation of which data class flows to which vendor)
  • Encryption in transit and at rest, with key management strategy that supports Phase 3 audit requirements

At Phase 3 audit time, the auditor will ask "can you prove FTI never reaches HubSpot?" The architecture must answer yes by construction.
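"Blocked at the wire" suggests a single chokepoint that every outbound vendor call passes through. In this sketch each vendor declares which data classes it may receive, and the guard throws before any request is sent. The vendor keys and their allowed classes are illustrative assumptions standing in for entries in the subprocessor data flow map. They are not a statement of any vendor's actual certifications.

```typescript
type DataClass =
  | "marketing"
  | "pii_non_fti"
  | "phi"
  | "fti_derived"
  | "application_payload"
  | "cms_hub_response";

// Hypothetical entries from the subprocessor data flow map.
const VENDOR_ALLOWED = new Map<string, ReadonlySet<DataClass>>([
  ["hubspot", new Set<DataClass>(["marketing"])],
  ["baa_vendor_example", new Set<DataClass>(["marketing", "pii_non_fti", "phi"])],
]);

// Refuses unknown vendors and any class not explicitly allowed for the vendor.
function assertEgressAllowed(vendor: string, payloadClasses: DataClass[]): void {
  const allowed = VENDOR_ALLOWED.get(vendor);
  if (!allowed) {
    throw new Error(`Egress blocked: ${vendor} is not in the subprocessor map`);
  }
  const blocked = payloadClasses.filter((c) => !allowed.has(c));
  if (blocked.length > 0) {
    throw new Error(`Egress blocked: ${vendor} may not receive ${blocked.join(", ")}`);
  }
}

// Hypothetical wrapper intended as the only sanctioned path to an external vendor.
async function sendToVendor<T>(vendor: string, payloadClasses: DataClass[], send: () => Promise<T>): Promise<T> {
  assertEgressAllowed(vendor, payloadClasses);
  return send();
}
```

Under this design, "can you prove FTI never reaches HubSpot?" becomes two checks. The HubSpot entry has no FTI class, and a lint rule or code review confirms that no outbound call bypasses `sendToVendor`.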

9. Phase 1 → Phase 3 migration plan

Claude Code should produce an explicit migration plan as part of the architecture, documenting:

  • What moves from Vercel + Atlas commercial to AWS FedRAMP Moderate (or GovCloud)
  • What stays on commercial cloud (marketing site, HubSpot integration, investor tools, non-scoped services)
  • The LLM provider switch (Anthropic direct → Bedrock)
  • The transcription provider switch (Whisper → AWS Transcribe or equivalent)
  • The data migration path for conversation history, member records, PHI
  • The sequencing (what migrates first, dependencies, rollback plan)
  • The estimated timeline and engineering cost
  • The operational cost delta (FedRAMP-authorized infrastructure is meaningfully more expensive; budget implications)

The migration plan does not need to be executed in Phase 1, but the Phase 1 architecture must not foreclose any of these migration paths.
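The sequencing item is essentially a dependency-ordering problem. This sketch applies Kahn's topological sort to migration steps. The step names and dependencies in `PLAN` are illustrative assumptions drawn from the migration checklist, not a decided plan.

```typescript
interface MigrationStep {
  id: string;
  dependsOn: string[];
}

// Illustrative plan only.
const PLAN: MigrationStep[] = [
  { id: "provision_fedramp_aws", dependsOn: [] },
  { id: "atlas_for_government", dependsOn: ["provision_fedramp_aws"] },
  { id: "kms_secrets", dependsOn: ["provision_fedramp_aws"] },
  { id: "migrate_phi_and_conversations", dependsOn: ["atlas_for_government", "kms_secrets"] },
  { id: "llm_to_bedrock", dependsOn: ["provision_fedramp_aws"] },
  { id: "transcription_to_aws", dependsOn: ["provision_fedramp_aws"] },
  { id: "cutover_app_hosting", dependsOn: ["migrate_phi_and_conversations", "llm_to_bedrock"] },
];

// Kahn's algorithm: every step appears after all of its dependencies.
// Throws on unknown dependencies or cycles.
function sequence(steps: MigrationStep[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    indegree.set(s.id, s.dependsOn.length);
    for (const dep of s.dependsOn) {
      if (!steps.some((x) => x.id === dep)) throw new Error(`${s.id} depends on unknown step ${dep}`);
      dependents.set(dep, (dependents.get(dep) ?? []).concat(s.id));
    }
  }
  const ready = steps.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== steps.length) throw new Error("Migration steps contain a dependency cycle");
  return order;
}
```

Reversing the list of completed steps gives a natural rollback order.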

Infrastructure requirements Claude Code should address

Phase 1 infrastructure (operating today)

  • Hosting: Vercel (Next.js application)
  • Database: MongoDB Atlas (commercial tier)
  • LLM: Claude via Anthropic direct API with signed BAA
  • Transcription (when added): OpenAI Whisper via OpenAI API with signed BAA
  • Email: Resend or Postmark (BAA available on paid tiers)
  • Auth: to be designed alongside Florence architecture
  • Observability: LLM call logging (Langfuse, Helicone, or in-house), application logs, infrastructure metrics
  • Secrets management: to be designed; supports Phase 3 requirements

Phase 3 infrastructure (future target)

  • Hosting: AWS (FedRAMP Moderate region) or AWS GovCloud (FedRAMP High) depending on auditor guidance and cost analysis
  • Database: MongoDB Atlas for Government or equivalent FedRAMP-authorized managed DB
  • LLM: Claude via AWS Bedrock in authorized region
  • Transcription: AWS Transcribe or self-hosted Whisper in authorized region
  • Email: BAA and FedRAMP-authorized provider (AWS SES in authorized region is an option)
  • Auth: same system, hosted in authorized region with appropriate configurations
  • Observability: tooling must have FedRAMP posture or be self-hosted in the authorized region
  • Secrets: AWS KMS in authorized region, HSM-backed

The Phase 1 infrastructure choices should not create lock-in that prevents Phase 3 migration. For example: MongoDB Atlas has a government tier, so it's a defensible Phase 1 choice. A vendor with no government offering would not be.

Specific questions Claude Code should answer through architecture

  1. How does Florence's conversation state survive authentication transitions without losing context?
  2. How does the escalation queue integrate into the admin dashboard without becoming a full support product?
  3. What is the exact tool use retry and error handling protocol when the deterministic API is slow, errors, or returns ambiguous results?
  4. How are conversations that touch FTI-derived data handled differently from conversations that don't? (They must be, at Phase 3.)
  5. What is the cost model at 100 members, 1,000 members, 10,000 members, 100,000 members — both Phase 1 and Phase 3?
  6. What is the fallback behavior when the LLM is unavailable, the deterministic API is unavailable, or both?
  7. What is the human review mechanism for Florence's quality, and how does it feed back into prompt improvements and eval updates?
  8. What is the security model for Florence's tool access when the user is anonymous vs. authenticated vs. mid-enrollment vs. post-enrollment?
  9. How does Florence's architecture change (or not) to accommodate voice in Phase 2?
  10. What does the Phase 3 migration look like as a sequenced engineering plan with estimated effort?
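Questions 3 and 6 can start from an explicit policy rather than ad hoc try/catch blocks. This sketch has two parts. The first is capped exponential backoff with full jitter for retryable tool errors. The second is a fallback decision for when the LLM, the deterministic API, or both are down. The error categories, limits, and fallback wording are assumptions for illustration. Ambiguous results are deliberately not retried; Florence should clarify with the user or escalate instead.

```typescript
type ToolErrorKind = "timeout" | "rate_limited" | "server_error" | "bad_request" | "ambiguous_result";

interface RetryPolicy {
  maxAttempts: number; // including the first attempt
  baseDelayMs: number;
  maxDelayMs: number;
}

const RETRYABLE: ReadonlySet<ToolErrorKind> = new Set<ToolErrorKind>(["timeout", "rate_limited", "server_error"]);

// `attempt` is the number of attempts already made (1-based). Returns the delay
// before the next attempt, or null to stop. Full jitter: uniform in [0, cap).
function nextDelayMs(kind: ToolErrorKind, attempt: number, policy: RetryPolicy, random: () => number = Math.random): number | null {
  if (!RETRYABLE.has(kind) || attempt >= policy.maxAttempts) return null;
  const cap = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** (attempt - 1));
  return Math.floor(random() * cap);
}

interface Health {
  llmUp: boolean;
  apiUp: boolean;
}

type Fallback =
  | { mode: "normal" }
  | { mode: "llm_only"; note: string }       // converse, but make no factual claims
  | { mode: "static_notice"; note: string }; // no LLM: fixed message plus ticket intake

function fallbackFor(h: Health): Fallback {
  if (h.llmUp && h.apiUp) return { mode: "normal" };
  if (h.llmUp) {
    return { mode: "llm_only", note: "Plan data is unavailable; take details and escalate, but do not state coverage or pricing." };
  }
  // Without the LLM there is no conversation, whether or not the API is up.
  return { mode: "static_notice", note: "Florence is temporarily unavailable. Leave a message and a licensed agent will follow up." };
}
```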

What is out of scope for this brief

  • Marketing, advertising, CRM tooling (HubSpot Free handles this, non-Florence)
  • Investor relations tooling (Foundersuite/Visible, non-Florence)
  • Agent recruiting pipeline (HubSpot, Ian's workflow, non-Florence)
  • Public help articles (handled via Florence-generated content within the platform; no separate help center tool required)
  • Broker portal UI design (adjacent project, informed by Florence architecture but separate)
  • Agent platform for enrollment submission (adjacent, references Florence for broker-mode queries but is its own system)

Deliverable expectations from Claude Code

Claude Code should produce, in collaboration with the human architect:

  1. An architecture document covering the nine design areas above
  2. A data flow diagram showing Florence's interactions with the deterministic API, storage, LLM provider, transcription provider, and escalation queue — with data classifications labeled on every flow
  3. A migration plan from Phase 1 to Phase 3 infrastructure with sequencing and estimated effort
  4. A cost model for Phase 1 and Phase 3 operation at defined member scales
  5. An implementation sequence that separates what ships at launch from what ships later (voice, advanced tool surface, multi-modal inputs, etc.)
  6. An explicit list of open questions for human decision, clearly marked

The output should be specific enough to build from, not generic architectural guidance. Where defensible defaults exist (Vercel AI SDK over custom wrapper, Bedrock over direct API at Phase 3), Claude Code should state them and justify them. Where trade-offs require human judgment (GovCloud vs. FedRAMP Moderate region, managed auth vs. self-hosted), Claude Code should lay out the trade-offs and flag for decision.

Closing context for Claude Code

Florence is the product. Every decision in this architecture should reinforce Florence as the primary interface, the continuous agent, the regulated entity in waiting. The architecture should make the Phase 3 EDE pursuit feel like a migration, not a rebuild. The architecture should keep Florence's core logic independent of infrastructure choices so the team can iterate on Florence's intelligence without fighting the plumbing.

Every dollar and hour of engineering investment should compound toward the long-term vision: Florence as the first AI agent to operate as a licensed health insurance broker, serving millions of members, at operational margins that make the MedVi-style lean team possible. The architecture is the substrate for that vision.


AskFlorence Internal Documentation. Not for public distribution.
