Brief — Florence AI architecture & infrastructure

Status: input-of-record, not finalized architecture. This is the incoming brief handed to the agent on 2026-04-22 to frame the Florence AI research + design track. Authoritative design decisions are tracked in the parent GitHub issue linked below; sub-decisions spawn their own issues as they resolve.
Naming note: the name "Florence" has trademark exposure. Until legal clears the final brand, use Florence AI or AskFlorence AI in new work. The text below preserves the original "Florence" usage verbatim for fidelity to the input document.
Sequencing note: Florence AI integration ships after AWS migration completes and after the full deterministic lead → enrollment → member-servicing flow is live for both consumer and agent users. Research and architecture design run now, in parallel.
Tracking issue: #61 — Florence AI architecture + infrastructure design.

Brief for Claude Code: Florence Architecture & Infrastructure

Purpose of this brief

This brief frames the architecture and infrastructure work for Florence — the core conversational AI agent that is the primary user interface to the AskFlorence platform. Florence is not a support chatbot or a feature; Florence is the product. Every user interaction (pre-enrollment discovery, plan selection, enrollment assistance, member lifecycle, post-enrollment support, renewal) flows through Florence.

The goal of this brief is to provide Claude Code with the full context needed to architect Florence's LLM layer, infrastructure, deterministic API integration, and compliance posture in a way that is:

Viable for immediate Phase 1 launch (2-week horizon) under HIPAA + BAA coverage
Cleanly migratable to Phase 3 EDE compliance when AskFlorence pursues primary EDE entity status, without architectural rewrites
Aligned with the product vision of Florence as a continuous conversational agent across the entire user journey
Economically sustainable at launch scale and at growth scale

The architecture must be designed once, built in stages, and migrate cleanly between compliance postures without re-platforming the core logic.

Product vision context

Florence is the conversational interface for AskFlorence. Users interact with Florence via text and eventually voice. Florence:

Conducts intake conversationally (meds, doctors, pharmacies, household context, income)
Translates natural language to structured parameters for the deterministic API layer
Presents plan options with personalized explanations derived from deterministic plan data
Handles post-enrollment questions (benefits, drug coverage, provider network, claims, bills)
Identifies and initiates SEP workflows for life events
Supports renewal analysis and plan optimization
Escalates to human agents when a conversation requires human judgment

Florence is positioned as an AI agent operating under regulated agent/broker rules, supervised by licensed human agents, with the long-term goal of being the first AI agent licensed as a health insurance broker under an NPN (or operating under a human NPN in the interim). This positioning shapes architecture choices: Florence's decisions must be auditable, her outputs must be grounded in deterministic data where possible, and her operation must be defensible under CMS oversight.

The deterministic API layer is the foundation

Florence is not the source of truth for any computable fact. All factual claims (is a drug covered, what's a copay, which plan matches a user's doctors, what's the subsidy amount, what's the member's current deductible progress) come from the deterministic API layer that already exists in the AskFlorence platform. Florence's job is to:

Translate user natural language into structured API calls via tool use
Interpret structured API results and produce natural-language explanations
Manage conversation context and state
Escalate appropriately when she cannot ground an answer in deterministic data

This separation is critical for both accuracy (no hallucinated benefit details) and compliance (every factual claim traces to an auditable API call and data source). Claude Code should treat the deterministic API as a given and design Florence to consume it, not replicate its logic.
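
A minimal sketch of the grounding rule above, assuming a hypothetical `ToolResult` record returned by the tool execution layer. None of these names come from the AskFlorence codebase. The idea is only that a factual answer leaves the system with its supporting API calls attached, or not at all.

```typescript
// Hypothetical shapes for illustration only.
type ToolResult = {
  callId: string;     // ID of the deterministic API call, used for audit tracing
  tool: string;       // e.g. "check_drug_coverage"
  dataSource: string; // where the API obtained the fact
  payload: Record<string, unknown>;
};

type SourceRef = { callId: string; tool: string; dataSource: string };

type GroundedAnswer =
  | { kind: "answer"; text: string; sources: SourceRef[] }
  | { kind: "escalate"; reason: string };

// A factual answer is released only if at least one tool result backs it;
// otherwise the turn escalates instead of answering from model memory.
function groundAnswer(draftText: string, results: ToolResult[]): GroundedAnswer {
  if (results.length === 0) {
    return { kind: "escalate", reason: "no deterministic data backs this claim" };
  }
  return {
    kind: "answer",
    text: draftText,
    sources: results.map((r) => ({
      callId: r.callId,
      tool: r.tool,
      dataSource: r.dataSource,
    })),
  };
}
```

A real check would also need to verify that the draft text agrees with the payload; this sketch only enforces that a source exists.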

Compliance context Claude Code must design around

Phase 1 (launch through pre-EDE operation, HIPAA-governed)

AskFlorence is operating as a downstream agent/broker at launch. Enrollments are submitted through a primary EDE entity's environment (HealthSherpa or equivalent). Florence processes PHI under HIPAA Security Rule obligations. The compliance posture:

HIPAA Privacy Rule + Security Rule apply to Florence's handling of member data
BAA required with every vendor in the PHI flow
SOC 2 Type II is planned (Vanta/Drata engagement to be initiated within 60 days of launch)
Current infrastructure (Vercel, MongoDB Atlas commercial) is acceptable for Phase 1

Phase 3 EDE (future target, NIST 800-53 Moderate / MARS-E 2.2 governed)

When AskFlorence pursues primary EDE entity status (estimated 12-18 months out), the compliance bar rises dramatically:

Approximately 294 NIST 800-53 Moderate controls must be implemented and audited
MARS-E 2.2 controls for ACA marketplace data handling
Two independent audits required: Business Requirements Audit + Privacy and Security Audit
Interconnection Security Agreement (ISA) signed with CMS
All systems handling EDE-scoped data (PII, FTI, SSA/DHS verification data, full enrollment applications) must run on FedRAMP-authorized infrastructure or inherit controls from authorized providers
Subprocessors in the EDE scope must have FedRAMP authorization or be architecturally excluded from EDE data flows

Data classification Claude Code must enforce in the architecture

Data class	Example fields	Phase 1 treatment	Phase 3 treatment
Marketing / recruiting	Name, email, ad source, engagement metrics	Non-scoped, commercial tools OK (HubSpot, etc.)	Unchanged
PHI (HIPAA)	Meds, doctors, health conditions mentioned	BAA-covered vendors only	BAA + FedRAMP Moderate if in EDE scope
PII (non-FTI)	Name, DOB, address, phone in marketplace context	HIPAA posture	FedRAMP Moderate when in EDE scope
FTI-derived	Household income, tax family composition, subsidy amount	HIPAA posture, strict controls	FedRAMP Moderate, in-platform only, never in external tools
Application payload	SSN, citizenship, full eligibility application	HIPAA posture	FedRAMP Moderate, in-platform only
CMS Hub responses	Eligibility determination, verification status	HIPAA posture	FedRAMP Moderate, in-platform only

The architecture must enforce these boundaries through explicit data flow controls, not through convention. At Phase 3 audit time, the auditor will trace every field in every user interaction to its storage and processing location.
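
One way the table above could be carried into code is a typed lookup from data class and phase to vendor requirements. This is a sketch: the identifiers are hypothetical, and it assumes "HIPAA posture" means BAA-covered vendors only, which the brief does not state explicitly.

```typescript
// Class names mirror the table rows; the identifiers are hypothetical.
type DataClass =
  | "marketing"
  | "phi"
  | "pii"
  | "fti_derived"
  | "application_payload"
  | "cms_hub_response";
type CompliancePhase = "phase1" | "phase3";

type Requirements = {
  baaRequired: boolean;
  fedrampModerateRequired: boolean;
  inPlatformOnly: boolean;
};

function requirementsFor(
  cls: DataClass,
  phase: CompliancePhase,
  inEdeScope: boolean
): Requirements {
  // Marketing / recruiting data is non-scoped in both phases.
  if (cls === "marketing") {
    return { baaRequired: false, fedrampModerateRequired: false, inPlatformOnly: false };
  }
  // Assumption: "HIPAA posture" in Phase 1 is read as "BAA-covered vendors only".
  if (phase === "phase1") {
    return { baaRequired: true, fedrampModerateRequired: false, inPlatformOnly: false };
  }
  // Phase 3: PHI and non-FTI PII need FedRAMP Moderate only when in EDE scope.
  if (cls === "phi" || cls === "pii") {
    return { baaRequired: true, fedrampModerateRequired: inEdeScope, inPlatformOnly: false };
  }
  // Phase 3: FTI-derived data, application payloads, and CMS Hub responses
  // are FedRAMP Moderate and in-platform only (BAA carried over as an assumption).
  return { baaRequired: true, fedrampModerateRequired: true, inPlatformOnly: true };
}
```

Keeping the rules in one function gives the auditor a single place to read the policy and gives the type checker a closed set of classes to enforce.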

What Claude Code should design

1. Florence's LLM execution layer

Phase 1 target: Claude via Anthropic direct API under signed BAA. Model routing:

Default: Claude Haiku 4.5 for standard queries (benefit lookups, drug coverage questions, general explanations)
Escalate: Claude Sonnet 4.6 for complex plan comparisons, nuanced SEP scenarios, edge cases
Use sparingly: Claude Opus 4.7 only if Sonnet cannot resolve; measure necessity

Phase 3 target: Claude via AWS Bedrock in a FedRAMP Moderate-authorized region, or AWS GovCloud (FedRAMP High) if the EDE auditor requires it. The model family and routing logic remain the same; only the provider changes.

Design requirement: Florence's LLM calls must go through a provider-abstraction layer (Vercel AI SDK or equivalent thin wrapper) so the Phase 1 → Phase 3 migration is a configuration change, not a code rewrite. Every prompt, every tool definition, every response parsing path must be provider-agnostic.
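
A sketch of what "configuration change, not a code rewrite" could look like. The `LlmProvider` interface and the stub adapters are hypothetical stand-ins; this is not the Vercel AI SDK API, only the shape a thin wrapper around it (or around vendor SDKs) might take.

```typescript
// Hypothetical thin wrapper. Real adapters would call the vendor SDK.
type ChatTurn = { role: "user" | "assistant"; content: string };
type LlmRequest = {
  modelTier: "haiku" | "sonnet" | "opus";
  system: string;
  turns: ChatTurn[];
};
type LlmReply = { text: string; inputTokens: number; outputTokens: number };

interface LlmProvider {
  readonly id: string;
  complete(req: LlmRequest): Promise<LlmReply>;
}

type ProviderName = "anthropic-direct" | "bedrock";

// Stub adapter: echoes which provider and tier would have been used.
function makeStubProvider(id: ProviderName): LlmProvider {
  return {
    id,
    complete: (req) =>
      Promise.resolve({
        text: `[${id}/${req.modelTier}] stub reply`,
        inputTokens: 0,
        outputTokens: 0,
      }),
  };
}

const llmProviders: Record<ProviderName, LlmProvider> = {
  "anthropic-direct": makeStubProvider("anthropic-direct"),
  bedrock: makeStubProvider("bedrock"),
};

// Phase 1 -> Phase 3 becomes a change to this one configured value.
function selectProvider(configured: string): LlmProvider {
  if (configured === "anthropic-direct" || configured === "bedrock") {
    return llmProviders[configured];
  }
  throw new Error(`unknown LLM provider: ${configured}`);
}
```

Prompts, tool definitions, and response parsing would depend only on `LlmRequest` and `LlmReply`, so nothing above the adapter needs to know which provider is active.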

Claude Code should design:

The abstraction layer (likely Vercel AI SDK given the stack)
The model routing logic (Haiku vs. Sonnet vs. Opus selection criteria)
The retry, fallback, and timeout policies
The token budgeting per conversation (context management, summarization triggers)
The cost monitoring hooks (per-conversation cost attribution, daily/monthly cost aggregates, alerting thresholds)
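
The routing and cost items above might start from something as small as the following sketch. The intent labels are hypothetical groupings of the examples in this section, and the prices are caller-supplied placeholders, not real vendor pricing.

```typescript
type ModelTier = "haiku" | "sonnet" | "opus";

// Hypothetical intent labels derived from the routing examples in the brief.
type QueryIntent =
  | "benefit_lookup"
  | "drug_coverage"
  | "general_explanation"
  | "plan_comparison"
  | "sep_scenario"
  | "edge_case";

// Default to the cheapest tier; escalate by intent; use Opus only after Sonnet failed.
function pickTier(intent: QueryIntent, sonnetFailed: boolean): ModelTier {
  if (sonnetFailed) return "opus";
  if (intent === "plan_comparison" || intent === "sep_scenario" || intent === "edge_case") {
    return "sonnet";
  }
  return "haiku";
}

// Prices are passed in from configuration; nothing here encodes real pricing.
type TierPrice = { inputPerMTok: number; outputPerMTok: number };
type CallUsage = { tier: ModelTier; inputTokens: number; outputTokens: number };

// Per-conversation cost attribution: sum the cost of every model call in it.
function conversationCostUsd(
  calls: CallUsage[],
  prices: Record<ModelTier, TierPrice>
): number {
  return calls.reduce((total, c) => {
    const p = prices[c.tier];
    return (
      total +
      (c.inputTokens / 1e6) * p.inputPerMTok +
      (c.outputTokens / 1e6) * p.outputPerMTok
    );
  }, 0);
}
```

Recording how often the `sonnetFailed` path is taken is one way to "measure necessity" for Opus before committing budget to it.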

2. Tool use architecture

Florence interacts with AskFlorence's deterministic APIs via LLM tool use. Claude Code should design:

The tool schema (function definitions Florence can call)
The tool execution layer (how tool calls route to internal APIs, auth, error handling)
The result serialization back into Florence's context
The tool versioning strategy (as APIs evolve, tool definitions evolve)
The tool access control (which tools Florence can use in which user contexts — e.g., authenticated member context unlocks member-specific tools; anonymous pre-enrollment context does not)

Initial tool surface likely includes:

search_plans(zip, household, income, desired_metal_tier, ...)
check_drug_coverage(plan_id, drug_name)
check_provider_network(plan_id, provider_npi)
get_member_profile(member_id) — authenticated only
get_member_plan_details(member_id) — authenticated only
initiate_sep_workflow(member_id, life_event_type) — authenticated only
escalate_to_human(conversation_id, reason, urgency)
create_ticket(member_id, summary, transcript) — the escalation queue interface

Additional tools will emerge; the architecture should make tool addition routine.
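
A sketch of a tool registry with context-based access control, using the tool names listed above. The `ToolSpec` shape and the version numbers are assumptions. `create_ticket` is left unrestricted here only because the brief does not mark it as authenticated-only, so treat that as an open question.

```typescript
type AuthLevel = "anonymous" | "authenticated";

// Hypothetical registry entry; parameter schemas are omitted for brevity.
type ToolSpec = { name: string; version: number; authenticatedOnly: boolean };

const TOOL_REGISTRY: ToolSpec[] = [
  { name: "search_plans", version: 1, authenticatedOnly: false },
  { name: "check_drug_coverage", version: 1, authenticatedOnly: false },
  { name: "check_provider_network", version: 1, authenticatedOnly: false },
  { name: "get_member_profile", version: 1, authenticatedOnly: true },
  { name: "get_member_plan_details", version: 1, authenticatedOnly: true },
  { name: "initiate_sep_workflow", version: 1, authenticatedOnly: true },
  { name: "escalate_to_human", version: 1, authenticatedOnly: false },
  { name: "create_ticket", version: 1, authenticatedOnly: false },
];

// Only these tool names are offered to the model for the given user context.
function toolsFor(auth: AuthLevel): string[] {
  return TOOL_REGISTRY
    .filter((t) => auth === "authenticated" || !t.authenticatedOnly)
    .map((t) => t.name);
}

// Second check at execution time, in case the model requests a tool it was not offered.
function authorizeToolCall(name: string, auth: AuthLevel): ToolSpec {
  const spec = TOOL_REGISTRY.filter((t) => t.name === name)[0];
  if (spec === undefined) throw new Error(`unknown tool: ${name}`);
  if (spec.authenticatedOnly && auth !== "authenticated") {
    throw new Error(`tool ${name} requires an authenticated member context`);
  }
  return spec;
}
```

Adding a tool is then one new registry entry plus its handler, which keeps tool addition routine.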

3. Conversation state and context management

Florence maintains conversation context across sessions. Claude Code should design:

Conversation persistence (what gets stored, where, how long)
Context retrieval for returning users (authenticated member sees continuation of prior conversations)
Anonymous-to-authenticated transition (lead has pre-enrollment conversation; creates account; conversation continues under member identity)
Context window management (summarization strategy when conversations grow long)
Memory architecture — what Florence should remember long-term (medication list, provider preferences, past concerns) vs. what stays session-scoped
The storage layer for conversation data, including PHI handling at rest

Storage target: extends the existing MongoDB architecture. Conversation data is PHI in many cases and must follow the same encryption, access control, and retention policies as other PHI in the platform.
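
Two of the items above, the anonymous-to-authenticated transition and the summarization trigger, can be sketched as pure functions over a conversation record. The record shape, the session-ownership check, and the 80% trigger ratio are all assumptions for illustration; persistence to MongoDB is out of scope for the sketch.

```typescript
type ConversationOwner =
  | { kind: "anonymous"; sessionId: string }
  | { kind: "member"; memberId: string };

// Hypothetical stored shape of a conversation.
type ConversationRecord = {
  conversationId: string;
  owner: ConversationOwner;
  turns: { role: "user" | "assistant"; text: string; tokens: number }[];
  summary: string | null;
};

// On account creation, re-key the existing conversation to the member
// rather than starting a new one, so context carries over.
function claimConversation(
  conv: ConversationRecord,
  sessionId: string,
  memberId: string
): ConversationRecord {
  if (conv.owner.kind !== "anonymous" || conv.owner.sessionId !== sessionId) {
    throw new Error("conversation is not owned by this anonymous session");
  }
  return { ...conv, owner: { kind: "member", memberId } };
}

// Summarize once the running token count crosses a fraction of the budget.
function needsSummarization(
  conv: ConversationRecord,
  tokenBudget: number,
  triggerRatio: number = 0.8
): boolean {
  const used = conv.turns.reduce((sum, t) => sum + t.tokens, 0);
  return used >= tokenBudget * triggerRatio;
}
```

Keeping the same `conversationId` across the transition means audit records and transcripts written before sign-up stay linked to the member afterwards.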

4. Voice integration (Phase 2 feature, architect now)

Voice input via Whisper or equivalent. Claude Code should design:

The transcription service layer (OpenAI Whisper API + BAA for Phase 1; AWS Transcribe or self-hosted Whisper for Phase 3)
The provider abstraction for transcription (same pattern as LLM provider abstraction)
Audio handling (capture, encoding, streaming vs. batch)
Transcription correction and confirmation UX patterns (let Florence re-ask if confidence is low)
Voice output (text-to-speech) if bidirectional voice is planned

Voice is not a Phase 1 launch feature but the architecture should not preclude it.
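
A sketch of the transcription abstraction and the low-confidence re-ask pattern. It assumes each provider adapter normalizes its output to a 0..1 confidence score, which providers do not necessarily report in that form. The interface and the 0.85 threshold are hypothetical.

```typescript
// Assumption: adapters map provider-specific scores onto a 0..1 confidence.
type TranscriptResult = { text: string; confidence: number };

// Same pattern as the LLM provider abstraction: one interface, swappable adapters.
interface TranscriptionProvider {
  readonly id: string;
  transcribe(audio: Uint8Array): Promise<TranscriptResult>;
}

type VoiceStep =
  | { action: "proceed"; text: string }
  | { action: "confirm"; prompt: string };

// Below the threshold, Florence reads the transcript back instead of acting on it.
function nextVoiceStep(t: TranscriptResult, minConfidence: number = 0.85): VoiceStep {
  if (t.confidence >= minConfidence) {
    return { action: "proceed", text: t.text };
  }
  return { action: "confirm", prompt: `I heard "${t.text}". Did I get that right?` };
}
```

Because the rest of the pipeline consumes plain text from `nextVoiceStep`, the text path and the voice path can share everything downstream of transcription.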

5. Human escalation and ticketing

When Florence cannot resolve a conversation, she escalates to a human queue. There is no external support platform. The escalation system lives inside the admin dashboard. Claude Code should design:

The escalation trigger logic (confidence thresholds, explicit user request, specific intent categories that require human judgment)
The queue data model (conversation reference, member context, reason for escalation, urgency, SLA timer, assignee, status)
The admin UI for human responders (transcript view, member profile alongside, response composition, handoff back to Florence if appropriate)
The member-side continuation UX (human response appears in the same conversation thread; member doesn't feel context-switched)
The audit trail for escalated conversations (who handled, when, what was decided, outcome)

This is a feature of the admin dashboard, not a separate support product. Engineering scope is 1-2 weeks, not a full support platform build.
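
The trigger logic and queue data model above could begin as the following sketch. The SLA minutes, the confidence threshold, and the list of human-only intents are placeholder values for the human architect to set, not figures from this brief.

```typescript
type Urgency = "low" | "normal" | "high";
type TicketStatus = "open" | "assigned" | "resolved" | "returned_to_florence";

// Fields follow the queue data model listed above.
type EscalationTicket = {
  conversationId: string;
  memberId: string | null; // null for anonymous conversations
  reason: string;
  urgency: Urgency;
  createdAtMs: number;
  slaDueAtMs: number;
  assignee: string | null;
  status: TicketStatus;
};

type TurnSignals = {
  userAskedForHuman: boolean;
  groundingConfidence: number; // 0..1, hypothetical score
  intent: string;
};

// Placeholder values, to be decided by the human architect.
const SLA_MINUTES: Record<Urgency, number> = { low: 1440, normal: 240, high: 30 };
const HUMAN_ONLY_INTENTS: string[] = ["complaint", "appeal"];

// Returns the reason to escalate, or null if Florence should continue.
function escalationReason(s: TurnSignals, minConfidence: number = 0.7): string | null {
  if (s.userAskedForHuman) return "user_requested";
  if (HUMAN_ONLY_INTENTS.indexOf(s.intent) !== -1) return "intent_requires_human";
  if (s.groundingConfidence < minConfidence) return "low_confidence";
  return null;
}

function openTicket(
  conversationId: string,
  memberId: string | null,
  reason: string,
  urgency: Urgency,
  nowMs: number
): EscalationTicket {
  return {
    conversationId,
    memberId,
    reason,
    urgency,
    createdAtMs: nowMs,
    slaDueAtMs: nowMs + SLA_MINUTES[urgency] * 60 * 1000,
    assignee: null,
    status: "open",
  };
}
```

The ticket stores only a conversation reference, so the admin UI reads the transcript from the conversation store rather than from a copy.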

6. Audit logging

Every Florence interaction produces audit-relevant data. Claude Code should design:

What gets logged for every conversation (user identity, tools called, tool parameters, tool results summary, model used, token counts, escalation events, any PHI access)
Where logs go (immutable audit store, retention policy, access control)
The audit log schema (queryable for compliance reviews, incident investigation, Phase 3 audit evidence)
The relationship between conversation transcripts and audit logs (they are distinct but cross-referenced)

This matters for HIPAA (Security Rule audit controls) and becomes critical for NIST 800-53 AU-family controls at Phase 3.
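
A sketch of an audit record carrying the fields listed above, with an append-only in-memory stand-in for the audit store. The schema is hypothetical. `Object.freeze` here is shallow and only illustrates intent; real immutability would have to come from the storage layer.

```typescript
// Hypothetical schema. turnId cross-references the transcript without copying it.
type AuditRecord = {
  readonly auditId: string;
  readonly conversationId: string;
  readonly turnId: string;
  readonly occurredAt: string; // ISO 8601 timestamp
  readonly userId: string | null; // null for anonymous users
  readonly model: string;
  readonly inputTokens: number;
  readonly outputTokens: number;
  readonly toolCalls: ReadonlyArray<{
    tool: string;
    params: Record<string, unknown>;
    resultSummary: string;
  }>;
  readonly phiAccessed: boolean;
  readonly escalated: boolean;
};

// In-memory stand-in: exposes append and query, but no update or delete.
class InMemoryAuditLog {
  private records: AuditRecord[] = [];

  append(entry: Omit<AuditRecord, "auditId">): AuditRecord {
    const record: AuditRecord = Object.freeze({
      ...entry,
      auditId: `audit-${this.records.length + 1}`,
    });
    this.records.push(record);
    return record;
  }

  forConversation(conversationId: string): AuditRecord[] {
    return this.records.filter((r) => r.conversationId === conversationId);
  }
}
```

Storing a result summary rather than the full tool result keeps the audit log smaller and limits how much PHI is duplicated outside the primary store.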

7. Evaluation and quality assurance

Florence must be provably correct on factual questions, appropriately cautious on advisory questions, and reliably escalatory on out-of-scope questions. Claude Code should design:

The eval dataset structure (golden Q&A pairs, scenarios, edge cases)
The automated eval harness (runs on every prompt change or model upgrade)
Regression testing (changes to Florence's prompts or tools do not break prior behavior)
Human-in-the-loop review queue (sample of Florence conversations reviewed by licensed humans for quality and compliance)
Specific eval categories: drug coverage accuracy, plan comparison reasoning, SEP identification, escalation appropriateness, PHI handling, hallucination detection

Without evals, Florence cannot be safely iterated on. This is not optional infrastructure.
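
A minimal sketch of the golden-case structure and harness loop. The categories are taken from the list above; everything else is an assumption. In particular, the substring grader is a deliberately crude placeholder, and the agent is modeled as a synchronous function for brevity where a real harness would call the live pipeline asynchronously.

```typescript
type EvalCategory =
  | "drug_coverage_accuracy"
  | "plan_comparison_reasoning"
  | "sep_identification"
  | "escalation_appropriateness"
  | "phi_handling"
  | "hallucination_detection";

// Hypothetical golden-case shape.
type GoldenCase = {
  id: string;
  category: EvalCategory;
  input: string;
  mustMention: string[]; // phrases the reply has to contain
  mustEscalate: boolean;
};

type AgentOutput = { text: string; escalated: boolean };

// Crude grader: case-insensitive substring match plus escalation check.
function passesCase(c: GoldenCase, out: AgentOutput): boolean {
  const text = out.text.toLowerCase();
  const mentionsAll = c.mustMention.every(
    (m) => text.indexOf(m.toLowerCase()) !== -1
  );
  return mentionsAll && out.escalated === c.mustEscalate;
}

// Runs every case; a non-empty failedIds list should block the change.
function runEvals(
  cases: GoldenCase[],
  agent: (input: string) => AgentOutput
): { passed: number; failedIds: string[] } {
  const failedIds = cases
    .filter((c) => !passesCase(c, agent(c.input)))
    .map((c) => c.id);
  return { passed: cases.length - failedIds.length, failedIds };
}
```

Wiring `runEvals` into CI so it runs on every prompt change or model upgrade is what turns the golden set into regression protection.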

8. Compliance boundary enforcement

The architecture must enforce data classification boundaries in code, not by policy. Claude Code should design:

A data classification tagging system (types, fields, or records labeled by compliance class)
Egress controls (data classified as FTI or EDE-scoped cannot be sent to non-FedRAMP vendors even in principle — blocked at the wire)
Tool access control by user context (anonymous users cannot invoke tools that return FTI data)
The subprocessor data flow map (living documentation of which data class flows to which vendor)
Encryption in transit and at rest, with key management strategy that supports Phase 3 audit requirements

At Phase 3 audit time, the auditor will ask "can you prove FTI never reaches HubSpot?" The architecture must answer yes by construction.
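
"By construction" could mean that no outbound call exists except through a guard like the one sketched here. The vendor profile fields and the tagging shape are hypothetical, and the rule for PHI and PII is simplified to "BAA required", without the EDE-scope condition from the classification table.

```typescript
type Sensitivity =
  | "marketing"
  | "phi"
  | "pii"
  | "fti_derived"
  | "application_payload"
  | "cms_hub_response";

// Hypothetical vendor registry entry and field tag.
type VendorProfile = { name: string; hasBaa: boolean; fedrampAuthorized: boolean };
type TaggedField = { value: unknown; cls: Sensitivity };

function allowedFor(vendor: VendorProfile, cls: Sensitivity): boolean {
  switch (cls) {
    case "marketing":
      return true;
    case "phi":
    case "pii":
      return vendor.hasBaa; // simplified: ignores the EDE-scope condition
    default:
      // FTI-derived, application payload, CMS Hub responses
      return vendor.fedrampAuthorized;
  }
}

// Names of the fields in the payload that this vendor may not receive.
function blockedFields(
  vendor: VendorProfile,
  payload: Record<string, TaggedField>
): string[] {
  return Object.keys(payload).filter((key) => {
    const field = payload[key];
    return field !== undefined && !allowedFor(vendor, field.cls);
  });
}

// The only path to the wire: refuses the whole send if any field is blocked.
function sendToVendor(
  vendor: VendorProfile,
  payload: Record<string, TaggedField>,
  transport: (body: Record<string, unknown>) => void
): void {
  const blocked = blockedFields(vendor, payload);
  if (blocked.length > 0) {
    throw new Error(`egress to ${vendor.name} blocked for: ${blocked.join(", ")}`);
  }
  const body: Record<string, unknown> = {};
  Object.keys(payload).forEach((key) => {
    const field = payload[key];
    if (field !== undefined) body[key] = field.value;
  });
  transport(body);
}
```

Because untagged values cannot be passed to `sendToVendor` at all, a missing classification surfaces as a type error at build time instead of as a finding at audit time.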

9. Phase 1 → Phase 3 migration plan

Claude Code should produce an explicit migration plan as part of the architecture, documenting:

What moves from Vercel + Atlas commercial to AWS FedRAMP Moderate (or GovCloud)
What stays on commercial cloud (marketing site, HubSpot integration, investor tools, non-scoped services)
The LLM provider switch (Anthropic direct → Bedrock)
The transcription provider switch (Whisper → AWS Transcribe or equivalent)
The data migration path for conversation history, member records, PHI
The sequencing (what migrates first, dependencies, rollback plan)
The estimated timeline and engineering cost
The operational cost delta (FedRAMP-authorized infrastructure is meaningfully more expensive; budget implications)

The migration plan does not need to be executed in Phase 1, but the Phase 1 architecture must not foreclose any of these migration paths.

Infrastructure requirements Claude Code should address

Phase 1 infrastructure (operating today)

Hosting: Vercel (Next.js application)
Database: MongoDB Atlas (commercial tier)
LLM: Claude via Anthropic direct API with signed BAA
Transcription (when added): OpenAI Whisper via OpenAI API with signed BAA
Email: Resend or Postmark (BAA available on paid tiers)
Auth: to be designed alongside Florence architecture
Observability: LLM call logging (Langfuse, Helicone, or in-house), application logs, infrastructure metrics
Secrets management: to be designed; must support Phase 3 requirements

Phase 3 infrastructure (future target)

Hosting: AWS (FedRAMP Moderate region) or AWS GovCloud (FedRAMP High) depending on auditor guidance and cost analysis
Database: MongoDB Atlas for Government or equivalent FedRAMP-authorized managed DB
LLM: Claude via AWS Bedrock in authorized region
Transcription: AWS Transcribe or self-hosted Whisper in authorized region
Email: BAA-covered, FedRAMP-authorized provider (AWS SES in authorized region is an option)
Auth: same system, hosted in authorized region with appropriate configurations
Observability: tooling must have FedRAMP posture or be self-hosted in the authorized region
Secrets: AWS KMS in authorized region, HSM-backed

The Phase 1 infrastructure choices should not create lock-in that prevents Phase 3 migration. For example: MongoDB Atlas has a government tier, so it's a defensible Phase 1 choice. A vendor with no government offering would not be.

Specific questions Claude Code should answer through architecture

How does Florence's conversation state survive authentication transitions without losing context?
How does the escalation queue integrate into the admin dashboard without becoming a full support product?
What is the exact tool-use retry and error-handling protocol when the deterministic API is slow, returns errors, or returns ambiguous results?
How are conversations that touch FTI-derived data handled differently from conversations that don't? (They must be, at Phase 3.)
What is the cost model at 100 members, 1,000 members, 10,000 members, 100,000 members — both Phase 1 and Phase 3?
What is the fallback behavior when the LLM is unavailable, the deterministic API is unavailable, or both?
What is the human review mechanism for Florence's quality, and how does it feed back into prompt improvements and eval updates?
What is the security model for Florence's tool access when the user is anonymous vs. authenticated vs. mid-enrollment vs. post-enrollment?
How does Florence's architecture change (or not) to accommodate voice in Phase 2?
What does the Phase 3 migration look like as a sequenced engineering plan with estimated effort?
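
As a starting point for the retry and fallback questions above, one possible shape of an answer is a pure decision function that the tool execution layer consults after each call. The outcome categories, attempt limit, and backoff base are assumptions, not decisions.

```typescript
// Hypothetical classification of a deterministic API call result.
type CallOutcome = "ok" | "timeout" | "server_error" | "client_error" | "ambiguous";

type NextStep =
  | { action: "return_result" }
  | { action: "retry"; delayMs: number }
  | { action: "ask_user_to_clarify" }
  | { action: "escalate" };

// attempt is 1-based: the number of the call that just finished.
function nextStep(
  outcome: CallOutcome,
  attempt: number,
  maxAttempts: number = 3,
  baseDelayMs: number = 250
): NextStep {
  if (outcome === "ok") return { action: "return_result" };
  // Ambiguous results go back to the user rather than being guessed at.
  if (outcome === "ambiguous") return { action: "ask_user_to_clarify" };
  // A client error will not improve on retry.
  if (outcome === "client_error") return { action: "escalate" };
  // Timeouts and server errors are treated as transient:
  // retry with exponential backoff until the attempt limit is reached.
  if (attempt < maxAttempts) {
    return { action: "retry", delayMs: baseDelayMs * Math.pow(2, attempt - 1) };
  }
  return { action: "escalate" };
}
```

Keeping the policy separate from the code that performs the calls makes it testable in isolation and easy to review against whatever protocol is finally agreed.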

What is out of scope for this brief

Marketing, advertising, CRM tooling (HubSpot Free handles this, non-Florence)
Investor relations tooling (Foundersuite/Visible, non-Florence)
Agent recruiting pipeline (HubSpot, Ian's workflow, non-Florence)
Public help articles (handled via Florence-generated content within the platform; no separate help center tool required)
Broker portal UI design (adjacent project, informed by Florence architecture but separate)
Agent platform for enrollment submission (adjacent, references Florence for broker-mode queries but is its own system)

Deliverable expectations from Claude Code

Claude Code should produce, in collaboration with the human architect:

An architecture document covering the nine design areas above
A data flow diagram showing Florence's interactions with the deterministic API, storage, LLM provider, transcription provider, and escalation queue — with data classifications labeled on every flow
A migration plan from Phase 1 to Phase 3 infrastructure with sequencing and estimated effort
A cost model for Phase 1 and Phase 3 operation at defined member scales
An implementation sequence that separates what ships at launch from what ships later (voice, advanced tool surface, multi-modal inputs, etc.)
An explicit list of open questions for human decision, clearly marked

The output should be specific enough to build from, not generic architectural guidance. Where defensible defaults exist (Vercel AI SDK over custom wrapper, Bedrock over direct API at Phase 3), Claude Code should state them and justify them. Where trade-offs require human judgment (GovCloud vs. FedRAMP Moderate region, managed auth vs. self-hosted), Claude Code should lay out the trade-offs and flag for decision.

Closing context for Claude Code

Florence is the product. Every decision in this architecture should reinforce Florence as the primary interface, the continuous agent, the regulated entity in waiting. The architecture should make the Phase 3 EDE pursuit feel like a migration, not a rebuild. The architecture should keep Florence's core logic independent of infrastructure choices so the team can iterate on Florence's intelligence without fighting the plumbing.

Every dollar and hour of engineering investment should compound toward the long-term vision: Florence as the first AI agent to operate as a licensed health insurance broker, serving millions of members, at operational margins that make the MedVi-style lean team possible. The architecture is the substrate for that vision.