AskFlorence
Voice

Voice ships in Phase 1.5, immediately after text Florence is validated in production. The architecture is designed so voice is an I/O affordance on the same text runtime, not a separate product.

Three independent streams, joined in FlorenceRuntime

Text is the source of truth. ASR turns the member's speech into text, the runtime handles the turn in text, and TTS turns the reply back into speech. Text is what the runtime consumes, what the audit log stores, what evals grade, and what the grounding check verifies. Voice does not bypass any quality or compliance control that applies to text.
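A minimal sketch of how the three streams meet only at text. `AsrStream`, `TextRuntime`, `TtsStream`, `AuditEntry` and `runVoiceTurn` are hypothetical names for illustration, not the actual FlorenceRuntime API; the stubs stand in for real vendors.

```ts
// Hypothetical sketch: the three voice streams meet only at text.
// AsrStream, TextRuntime, TtsStream, AuditEntry and runVoiceTurn are
// illustrative names, not the real FlorenceRuntime API.

interface AsrStream {
  transcribe(audio: Uint8Array): string; // speech in, text out
}

interface TextRuntime {
  respond(userText: string): string; // the same text turn typed chat uses
}

interface TtsStream {
  synthesize(text: string): Uint8Array; // text in, speech out
}

interface AuditEntry {
  userText: string; // the ASR transcript, not the audio, is the record
  assistantText: string;
}

function runVoiceTurn(
  audio: Uint8Array,
  asr: AsrStream,
  runtime: TextRuntime,
  tts: TtsStream,
  auditLog: AuditEntry[],
): Uint8Array {
  const userText = asr.transcribe(audio);
  const assistantText = runtime.respond(userText);
  // Audit (and, downstream, evals and grounding) operate on text only.
  auditLog.push({ userText, assistantText });
  return tts.synthesize(assistantText);
}

// Usage with stub streams standing in for real vendors.
const log: AuditEntry[] = [];
const replyAudio = runVoiceTurn(
  new Uint8Array(16),
  { transcribe: () => "what is my deductible" },
  { respond: (t) => `You asked: ${t}` },
  { synthesize: (t) => new Uint8Array(t.length) },
  log,
);
console.log(log[0].userText); // what is my deductible
```

Because the runtime only ever sees a string, any quality or compliance control that already wraps `respond` applies to voice turns unchanged.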

Why not voice-to-voice models (OpenAI Realtime, Gemini Live, etc.)

Tempting: one stream in, one stream out, visibly magical. But they are the wrong choice for Florence, for three reasons:

  1. They abstract away the tool loop, grounding check, and audit trail — exactly the things we cannot abstract under our compliance posture.
  2. Text is the legal record; voice is a UI affordance. Designs that make audio the primary artifact are harder to audit.
  3. Unit economics are opaque and vendor-controlled.

The three-stream design costs us ~100–200 ms of latency over voice-to-voice; that's the price of auditability and we accept it.

Latency budget

End-of-speech to first audio back, target ≤ 400 ms:

| Stage | Budget |
|---|---|
| ASR finalization | 100–250 ms |
| Intent classification + model routing | 50–80 ms |
| First LLM token (with prompt cache hit + pre-warmed tool calls where possible) | 100–250 ms |
| TTS first audio | 75–150 ms |
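The table's ranges can be checked with a few lines of arithmetic. This sketch simply sums the per-stage bounds against the 400 ms target; the `Stage` shape and `totalBudget` helper are illustrative, and the numbers are copied from the table.

```ts
// Sums the per-stage ranges from the latency-budget table to see how much
// of the ≤ 400 ms end-of-speech-to-first-audio target each bound leaves.
interface Stage {
  name: string;
  minMs: number;
  maxMs: number;
}

const TARGET_MS = 400;

const stages: Stage[] = [
  { name: "ASR finalization", minMs: 100, maxMs: 250 },
  { name: "Intent classification + model routing", minMs: 50, maxMs: 80 },
  { name: "First LLM token", minMs: 100, maxMs: 250 },
  { name: "TTS first audio", minMs: 75, maxMs: 150 },
];

function totalBudget(s: Stage[]): { minMs: number; maxMs: number } {
  return s.reduce(
    (acc, st) => ({ minMs: acc.minMs + st.minMs, maxMs: acc.maxMs + st.maxMs }),
    { minMs: 0, maxMs: 0 },
  );
}

const total = totalBudget(stages);
console.log(total); // { minMs: 325, maxMs: 730 }
console.log(`headroom at best case: ${TARGET_MS - total.minMs} ms`); // headroom at best case: 75 ms
console.log(`overrun at worst case: ${total.maxMs - TARGET_MS} ms`); // overrun at worst case: 330 ms
```

The best case leaves only 75 ms of slack, so every stage has to sit near its lower bound for the target to hold.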

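Summed, the stage budgets span 325–730 ms, so the ≤ 400 ms target holds only when most stages land near the low end of their ranges. The dominant term is ASR finalization; two moves compress it: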

  • Partial-transcript dispatch. Start intent classification (and optionally speculative tool calls) on partial ASR output; commit when ASR finalizes.
  • Warm connections. Persistent streaming connections to ASR + TTS; avoid TLS + DNS per turn.
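A minimal sketch of partial-transcript dispatch, assuming a synchronous stub classifier. `PartialDispatcher` and its methods are hypothetical. Classifying every partial is the simplest policy; a real implementation might debounce partials, and speculative tool calls are not shown.

```ts
// Hypothetical sketch of partial-transcript dispatch. `classify` is a stub for
// the intent-classification + model-routing stage; names are illustrative.
type Intent = string;

class PartialDispatcher {
  private readonly classify: (text: string) => Intent;
  private speculative: { text: string; intent: Intent } | null = null;
  classifyCalls = 0;

  constructor(classify: (text: string) => Intent) {
    this.classify = classify;
  }

  // Each partial ASR result starts classification early, so its cost
  // overlaps with ASR finalization instead of following it.
  onPartial(text: string): void {
    this.speculative = { text, intent: this.run(text) };
  }

  // On the final transcript, commit the speculative intent if it was computed
  // from identical text; otherwise reclassify from scratch.
  onFinal(text: string): Intent {
    const spec = this.speculative;
    this.speculative = null;
    if (spec !== null && spec.text === text) return spec.intent;
    return this.run(text);
  }

  private run(text: string): Intent {
    this.classifyCalls += 1;
    return this.classify(text);
  }
}

const classify = (t: string): Intent =>
  t.indexOf("deductible") >= 0 ? "plan_details" : "unknown";

const d = new PartialDispatcher(classify);
d.onPartial("what is my");
d.onPartial("what is my deductible");
console.log(d.onFinal("what is my deductible")); // plan_details (speculative result reused)
console.log(d.classifyCalls); // 2
```

When the final transcript differs from the last partial, the dispatcher pays for one extra classification; that is the cost of speculating.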

Vendor strategy through Phase 3

The principle: preserve premium quality through Phase 3 rather than degrade to basic FedRAMP options. The full strategy, including the dedicated-VPC and FedRAMP-reference-customer tracks, is in #61 and summarized below.

Phase 1.5 launch (commercial AWS + BAA)

| Role | Vendor | Languages | Cost | Latency | BAA |
|---|---|---|---|---|---|
| ASR | Deepgram Nova-3 | 36, incl. EN + ES with code-switching | ~$0.004/min | ~250 ms | Yes |
| TTS | Cartesia Sonic-2 | 15, incl. ES | ~$0.015/min | ~75 ms first audio | Confirming (Issue #57) |

Both are pluggable via adapter sinks (see tool surface); swapping either is a config change.
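A sketch of what "swap is a config change" can look like for the two Phase 1.5 vendors, assuming a simple keyed registry. `VoiceProvider`, `PROVIDERS`, `selectProvider` and `costPerVoiceMinute` are hypothetical; only the prices and BAA statuses come from the table.

```ts
// Hypothetical provider registry built from the Phase 1.5 vendor table.
// The record shape and the helpers are illustrative only.
type BaaStatus = "yes" | "confirming" | "no";

interface VoiceProvider {
  role: "asr" | "tts";
  vendor: string;
  usdPerMin: number; // approximate price from the table
  baa: BaaStatus;
}

const PROVIDERS: Record<string, VoiceProvider> = {
  deepgram: { role: "asr", vendor: "Deepgram Nova-3", usdPerMin: 0.004, baa: "yes" },
  cartesia: { role: "tts", vendor: "Cartesia Sonic-2", usdPerMin: 0.015, baa: "confirming" },
};

// Swapping a vendor means changing the key passed in (e.g. from config),
// not changing call sites.
function selectProvider(key: string, role: VoiceProvider["role"]): VoiceProvider {
  const p = PROVIDERS[key];
  if (p === undefined || p.role !== role) {
    throw new Error(`no ${role} provider registered as "${key}"`);
  }
  return p;
}

// Upper-bound vendor cost of one voice-minute, assuming both meters run
// for the full minute (real conversations split time between them).
function costPerVoiceMinute(asr: VoiceProvider, tts: VoiceProvider): number {
  return asr.usdPerMin + tts.usdPerMin;
}

const asr = selectProvider("deepgram", "asr");
const tts = selectProvider("cartesia", "tts");
console.log(costPerVoiceMinute(asr, tts).toFixed(3)); // 0.019
```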

Phase 3: keep premium quality without regression

Three tracks run in parallel; whichever lands first becomes primary.

Track A: Dedicated VPC / single-tenant deployment

Negotiate with Deepgram, Cartesia, and ElevenLabs to deploy their models inside our AWS account, on our GPUs. The pattern has precedent (MongoDB dedicated VPC, Anthropic via Bedrock). The vendor becomes a software license rather than a data subprocessor, so our FedRAMP posture covers them.

Cartesia is the most likely quick yes (Series A, hungry for enterprise logos). Deepgram already offers enterprise on-prem deployment. ElevenLabs is the hardest to win over, but worth the ask at our projected scale.

Track B: Named reference customer for FedRAMP

Deepgram is actively pursuing FedRAMP Moderate. Named regulated-healthcare reference customers can accelerate a vendor's 3PAO package by 6–12 months. The same conversation is opening with Cartesia and ElevenLabs. We become part of their regulatory roadmap rather than waiting passively.

Track C: Self-hosted fine-tuned Florence voice

This track is the fallback and, potentially, the authenticated-member experience by design. Open-weight models:

  • ASR: Whisper large-v3-turbo, hosted on AWS SageMaker in our FedRAMP account. Quality matches Deepgram on EN and is very close on ES. Latency 100–250 ms warm.
  • TTS: F5-TTS / Orpheus / StyleTTS-2, fine-tuned on a reference-audio corpus we collect from launch day forward (consent-captured). Produces a proprietary Florence voice no competitor can fingerprint.

Inherits our FedRAMP posture. No subprocessor. Unit economics flip favorable past ~100 voice-hours/day (GPU amortization).
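The break-even claim can be made explicit with a small formula. This is a sketch under stated assumptions: only the vendor prices come from the Phase 1.5 table; the fixed GPU cost and marginal self-hosted cost are hypothetical inputs, not figures from this page.

```ts
// Hypothetical break-even model for Track C. Only the vendor prices come from
// the Phase 1.5 vendor table; gpuFixedUsdPerDay and selfHostedUsdPerHour are
// inputs you must supply, not numbers from this page.
function breakEvenVoiceHoursPerDay(
  gpuFixedUsdPerDay: number, // assumed: amortized GPU + hosting cost per day
  vendorUsdPerVoiceHour: number, // ASR + TTS vendor cost per voice-hour
  selfHostedUsdPerHour: number, // assumed: marginal self-hosted cost per voice-hour
): number {
  const savingPerHour = vendorUsdPerVoiceHour - selfHostedUsdPerHour;
  // If self-hosting costs as much per hour as the vendors, it never pays off.
  if (savingPerHour <= 0) return Infinity;
  return gpuFixedUsdPerDay / savingPerHour;
}

// Vendor cost per voice-hour, assuming both ASR and TTS meters run the whole hour.
const vendorUsdPerVoiceHour = (0.004 + 0.015) * 60; // ≈ $1.14

// Consistency check: ~100 voice-hours/day at zero marginal cost implies a fixed
// cost of about 100 × $1.14 ≈ $114/day.
console.log(Math.round(breakEvenVoiceHoursPerDay(114, vendorUsdPerVoiceHour, 0))); // 100
```

Read the $114/day as a consistency check on the ~100 voice-hours/day figure, not as a quoted GPU price.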

Product framing: after enrollment, Florence becomes "your Florence" — slightly warmer, personalized, hers alone. Transition feels like an upgrade, not a compromise.

Compliance read that may unlock Track A+B wholesale

FTI (federal tax information) is IRS-sourced data received via the CMS Hub, not income the user self-attests. Pre-enrollment subsidy estimates computed from "I make $40k" are PII, not FTI. Post-enrollment confirmed APTC from the Hub is FTI.

If an EDE-literate compliance counsel confirms this reading, most of the voice surface is not in EDE scope, and Deepgram + Cartesia with HIPAA BAA covers it. Only the narrow post-enrollment authenticated-member FTI utterances need special handling. This would be the single biggest unlock on the whole voice track — #61 carries the action item.
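The proposed reading reduces to a rule keyed on the data's source. This sketch encodes that reading as it stands, pending counsel; `IncomeSource`, `classifyIncome` and `needsFtiHandling` are hypothetical names, and the dollar amounts are illustrative inputs.

```ts
// Hypothetical encoding of the proposed reading (pending counsel): income
// data is FTI only when it is IRS-sourced via the CMS Hub.
type DataClass = "Public" | "PII" | "PHI" | "FTI";

type IncomeSource =
  | { kind: "self-attested"; annualUsd: number } // e.g. "I make $40k"
  | { kind: "cms-hub"; confirmedAptcUsd: number }; // post-enrollment, from the Hub

function classifyIncome(src: IncomeSource): DataClass {
  if (src.kind === "cms-hub") {
    return "FTI"; // IRS-sourced data received via the CMS Hub
  }
  return "PII"; // user-provided, so not FTI under this reading
}

// Under this reading, only Hub-sourced utterances need FTI-specific handling;
// everything else can go through the BAA-covered commercial vendors.
function needsFtiHandling(src: IncomeSource): boolean {
  return classifyIncome(src) === "FTI";
}

console.log(classifyIncome({ kind: "self-attested", annualUsd: 40000 })); // PII
console.log(classifyIncome({ kind: "cms-hub", confirmedAptcUsd: 350 })); // FTI
```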

Multilingual: EN + ES from launch, more later

Multilingual is nearly free by architecture, because:

  1. Claude speaks Spanish fluently — no translation layer needed for the LLM turn.
  2. Tool results are language-agnostic JSON.
  3. Deepgram Nova-3 handles EN/ES natively with in-utterance code-switching (common for US Hispanic members).
  4. Cartesia has ES voices; Amazon Polly and Amazon Transcribe have ES at the Phase 3 fallback tier.
  5. The system prompt includes a one-line instruction: respond in the user's language.

Adding Mandarin / Vietnamese / Tagalog later = UI locale toggle + new eval set + TTS-voice check. No architectural change.
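A sketch of the three-gate checklist as data, assuming a simple locale registry. `LocaleSupport`, `LOCALES`, the eval-set ids and the exact `LANGUAGE_INSTRUCTION` wording are hypothetical; the page only says the system prompt carries a one-line instruction to respond in the user's language.

```ts
// Hypothetical locale registry. The three gates mirror the checklist above:
// UI locale toggle, a per-language eval set, and a TTS-voice check.
interface LocaleSupport {
  uiToggle: boolean;
  evalSet: string | null; // id of the eval set for this language (illustrative)
  ttsVoiceChecked: boolean;
}

const LOCALES: Record<string, LocaleSupport> = {
  en: { uiToggle: true, evalSet: "evals-en", ttsVoiceChecked: true },
  es: { uiToggle: true, evalSet: "evals-es", ttsVoiceChecked: true },
  zh: { uiToggle: false, evalSet: null, ttsVoiceChecked: false }, // Mandarin, not yet
};

// A locale is ready only when all three gates pass; no runtime code changes.
function isLocaleReady(code: string): boolean {
  const l = LOCALES[code];
  return l !== undefined && l.uiToggle && l.evalSet !== null && l.ttsVoiceChecked;
}

// The LLM turn needs no per-language logic: one prompt line covers it.
// (Exact wording is an assumption.)
const LANGUAGE_INSTRUCTION = "Respond in the user's language.";

console.log(isLocaleReady("es")); // true
console.log(isLocaleReady("zh")); // false
```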

Voice adapter sinks

Same pattern as every vendor integration:

```ts
// src/lib/adapters/voice-asr.ts
const provider = process.env.ASR_PROVIDER ?? "deepgram"; // or "aws-transcribe" | "self-whisper"

export const asrAdapter = defineAdapter({
  name: "voice-asr",
  provider,
  // fedramp: resolved from provider
  // baa: resolved from provider
  accepts: ["Public", "PHI", "PII", "FTI"] as const, // after compliance read
});
```

Swapping providers is a config change. The adapter enforces its declared data-class acceptance at both compile time and runtime.
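One way to get both compile-time and runtime enforcement is a generic over the accepted classes plus an explicit runtime check. This is a sketch, not the real `defineAdapter`; `ClassGatedAdapter` and the `web-demo-asr` name are hypothetical.

```ts
// Hypothetical sketch of data-class enforcement at compile time and runtime.
// ClassGatedAdapter is illustrative; it is not the real defineAdapter.
type DataClass = "Public" | "PII" | "PHI" | "FTI";

class ClassGatedAdapter<A extends DataClass> {
  readonly name: string;
  readonly accepts: readonly A[];

  constructor(name: string, accepts: readonly A[]) {
    this.name = name;
    this.accepts = accepts;
  }

  // Compile time: `cls` must be one of the declared classes A.
  // Runtime: re-check, since casts or untyped callers can bypass the type.
  send(cls: A, payload: string): string {
    if (this.accepts.indexOf(cls) === -1) {
      throw new Error(`${this.name} does not accept ${cls} data`);
    }
    return `${this.name} accepted ${payload.length} chars of ${cls}`;
  }
}

const publicOnly = new ClassGatedAdapter("web-demo-asr", ["Public"] as const);
console.log(publicOnly.send("Public", "hello")); // web-demo-asr accepted 5 chars of Public
// publicOnly.send("FTI", "...");  // compile error: "FTI" is not assignable to "Public"

// A cast defeats the compile-time check, but the runtime check still throws.
const widened = publicOnly as unknown as ClassGatedAdapter<DataClass>;
try {
  widened.send("FTI", "confirmed APTC");
} catch (e) {
  console.log((e as Error).message); // web-demo-asr does not accept FTI data
}
```

The `as const` on the accepted list is what lets the compiler narrow `A` to exact literals; without it, `A` would widen and the compile-time half of the check would disappear.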

On-device voice for the native app

When a React Native app ships:

  • iOS: SFSpeechRecognizer for ASR, AVSpeechSynthesizer (premium voices) for TTS. Audio never leaves the device.
  • Android: equivalent on-device APIs.

No voice subprocessor, and the strongest privacy posture; marketable as "your voice stays on your phone." Browser support for the Web Speech API is inconsistent, so skip it on web and use on-device voice only in native.

Quality on modern iPhones is competitive enough for the majority of conversations; fall back to server-side voice for the long tail where on-device is too weak.
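The fallback policy can be sketched as a routing function. The page does not define how "too weak" is measured, so the confidence signal and the 0.8 threshold here are assumptions, and `chooseVoicePath` is a hypothetical name.

```ts
// Hypothetical routing policy: on-device voice first in the native app,
// server-side voice for the long tail. The confidence signal and the 0.8
// threshold are assumptions; this page does not define "too weak".
type Platform = "ios" | "android" | "web";
type VoicePath = "on-device" | "server";

const MIN_ON_DEVICE_CONFIDENCE = 0.8; // assumed; tune against evals

function chooseVoicePath(
  platform: Platform,
  onDeviceAvailable: boolean,
  onDeviceConfidence: number | null, // null when no on-device result exists
): VoicePath {
  // Web Speech API support is inconsistent, so web uses server-side voice.
  if (platform === "web") return "server";
  if (!onDeviceAvailable || onDeviceConfidence === null) return "server";
  return onDeviceConfidence >= MIN_ON_DEVICE_CONFIDENCE ? "on-device" : "server";
}

console.log(chooseVoicePath("ios", true, 0.93)); // on-device
console.log(chooseVoicePath("android", true, 0.42)); // server
console.log(chooseVoicePath("web", true, 0.99)); // server
```

Note that a fallback sends that utterance's audio to the server-side vendor, so "your voice stays on your phone" holds only for turns that stay on the on-device path.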

What voice does NOT change

  • The tool surface (see tool surface): same tools, same schemas.
  • The audit log (see evals & observability): text transcript is the record; audio is stored encrypted with retention policy but not the primary artifact.
  • Evals: grade the text transcript produced by ASR + the text response Florence generates. Audio-level quality is a separate telemetry dimension (see evals & observability).
  • Guardrails: all five classifier layers run on the text path, before TTS.
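A minimal sketch of guardrails gating text before TTS. The page says five classifier layers run on the text path but does not name them here, so the layers below are placeholders, and stopping at the first failing layer is an assumed policy; `speakIfAllowed` is a hypothetical name.

```ts
// Hypothetical sketch: guardrails gate the response text before any audio is
// synthesized. The layers are placeholders, and stopping at the first
// failure is an assumed policy.
type Verdict = { ok: true } | { ok: false; reason: string };
type Guardrail = (text: string) => Verdict;

function speakIfAllowed(
  text: string,
  layers: Guardrail[],
  tts: (text: string) => void,
): Verdict {
  for (const layer of layers) {
    const verdict = layer(text);
    if (!verdict.ok) return verdict; // blocked: nothing reaches TTS
  }
  tts(text);
  return { ok: true };
}

const pass: Guardrail = () => ({ ok: true });
// Placeholder layer: block anything that looks like an SSN.
const noSsn: Guardrail = (t) =>
  /\b\d{3}-\d{2}-\d{4}\b/.test(t) ? { ok: false, reason: "ssn-pattern" } : { ok: true };

const layers: Guardrail[] = [pass, pass, pass, pass, noSsn]; // five layers
const spoken: string[] = [];
speakIfAllowed("Your plan starts January 1.", layers, (t) => spoken.push(t));
speakIfAllowed("Your SSN is 123-45-6789.", layers, (t) => spoken.push(t));
console.log(spoken.length); // 1
```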

Tracking

  • Phase 1.5 voice launch: tied to Florence text launch + 1 iteration window
  • Phase 3 voice track outcome: #61 + voice-specific sub-issue to be spawned when text launch nears
  • Vendor partnership work: #61 voice-vendor-partnership-track comment
  • BAA + FedRAMP status per vendor: #57

AskFlorence Internal Documentation. Not for public distribution.