Outage playbook ​

Anthropic's public status page shows substantial instability in recent months: ~15 incidents in ~10 days in mid-April 2026 alone, including a ~13-hour Opus 4.6 outage on 2026-04-20, repeated Sonnet 4.6 error spikes, and multiple 1–2 hour API/login incidents. 90-day uptime has ranged from 98.8 % to 99.91 %; at the lower bound, that is ~105 hours/year of customer-facing unavailability, which is unacceptable for a product where the LLM is the interface.
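The hours-per-year figure follows directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not part of any Florence codebase):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year of unavailability implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for uptime in (98.8, 99.91):
    hours = annual_downtime_hours(uptime)
    print(f"{uptime} % uptime -> ~{hours:.0f} hours/year unavailable")
```

At 98.8 % this gives roughly 105 hours/year; at 99.91 % it gives roughly 8 hours/year.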

This playbook specifies exactly how Florence stays up, and stays at quality, when a provider degrades or fails. The design goal is explicit:

Quality-preserving failover first, quality-regressing failover only as a last resort. Most Claude outages are regional or model-specific — a same-weights fallback (different region, different transport) preserves full quality and is invisible to users.

Provider risk & portability frames the strategic risks and the four tiers of vendor switch. This doc is the operational companion: exact thresholds, concrete runbook steps, graceful degradation modes, and the chaos-drill cadence that keeps all of it rehearsed.

Failure-domain analysis — what is actually independent? ​

Not all "Claude" endpoints share a failure domain. This determines which failovers preserve quality.

| Endpoint | Failure domain | Correlation with other endpoints |
| --- | --- | --- |
| Anthropic direct API (api.anthropic.com), us-east | Anthropic infra | Correlates with claude.ai; uncorrelated with Bedrock |
| Anthropic direct API, other regions | Anthropic infra | Same as above |
| AWS Bedrock Claude, us-east-1 | AWS region | Uncorrelated with Anthropic direct; correlated with other AWS services in region |
| AWS Bedrock Claude, us-west-2 | AWS region | Uncorrelated with us-east-1 |
| AWS Bedrock Claude, eu-west-1 / ap-northeast-1 | AWS region | Uncorrelated with other regions |
| OpenAI via Azure | Microsoft / OpenAI | Fully uncorrelated |
| Vertex AI Gemini | Google | Fully uncorrelated |
| Self-hosted SageMaker | Our AWS region | Correlates with own region |

The useful insight: Anthropic's Apr 20 Opus outage was at Anthropic's infra. Bedrock Claude Opus in us-east-1 was likely unaffected. A product that routes only via Anthropic direct feels the full 13-hour outage; a product with Bedrock as a parallel transport would most likely have felt nothing.

The six-tier quality-preserving cascade

At every turn, Florence attempts providers in order. Each tier is attempted with a tight per-tier timeout; failure or timeout promotes to the next tier. Tiers 1–3 preserve full quality (same model family, same weights). Tiers 4 and 5 accept measured quality drift. Tier 6 is graceful degradation, not a Florence turn.
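The promote-on-failure loop can be sketched as follows. This is a minimal illustration, not the runtime's actual code: the `Tier` type, the provider callables, and `AllTiersFailed` are hypothetical names, and the demo timeouts are shortened so the example runs quickly.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple


@dataclass(frozen=True)
class Tier:
    number: int
    name: str
    call: Callable[[str], Awaitable[str]]  # hypothetical provider call
    timeout_s: float                       # per-tier budget


class AllTiersFailed(Exception):
    """No LLM tier answered; the caller would engage Tier 6."""


async def run_cascade(prompt: str, tiers: List[Tier]) -> Tuple[int, str]:
    """Try each tier in order; an error or timeout promotes to the next."""
    for tier in tiers:
        try:
            reply = await asyncio.wait_for(tier.call(prompt), timeout=tier.timeout_s)
            return tier.number, reply
        except (asyncio.TimeoutError, ConnectionError):
            continue  # promote to the next tier
    raise AllTiersFailed(prompt)


# Simulated providers for the demo.
async def _down(prompt: str) -> str:
    raise ConnectionError("simulated 5xx")


async def _slow(prompt: str) -> str:
    await asyncio.sleep(1.0)
    return "too late"


async def _ok(prompt: str) -> str:
    return f"answer to: {prompt}"


tiers = [
    Tier(1, "primary transport, primary region", _down, 0.05),
    Tier(2, "same model, alternate region", _slow, 0.05),
    Tier(3, "alternate transport, same model", _ok, 0.05),
]
result = asyncio.run(run_cascade("hello", tiers))
print(result)
```

Here Tier 1 errors, Tier 2 times out, and Tier 3 answers, so the caller sees a normal reply along with the tier number for the dashboard.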

Tier 1 — Primary transport, primary region ​

Whatever is configured as FLORENCE_LLM_PRIMARY — e.g., bedrock-claude-sonnet-4-6@us-east-1. Timeout budget: 8 s for first token; 30 s total. Retries on 5xx: 1 retry with jittered backoff.

Tier 2 — Same model, alternate region ​

Same Claude model, different Bedrock region. bedrock-claude-sonnet-4-6@us-west-2 or @eu-west-1. Zero quality drift — identical weights, identical behavior. The only variable is cross-region latency (+50–150 ms typically).

This is the single most effective failover step. AWS region outages are rare and regionally scoped; we are not aware of any incident in which all Bedrock regions were down at once.

Tier 3 — Alternate transport, same model ​

Bedrock → Anthropic direct, or Anthropic direct → Bedrock. Same Claude model, same prompt, same tools — just a different vendor's infrastructure carrying the request. Zero quality drift.

This tier catches Anthropic-infra-specific outages, such as the Apr 15 login/API incident and the Apr 20 Opus outage at Anthropic, during which Bedrock was likely unaffected.

Tier 4 — Alternate model within Claude family ​

Sonnet failing? Try Haiku. Opus failing? Try Sonnet. Per-model circuit breaker trips only that model.

Measured quality drift. For ~85 % of turns (lookup + response synthesis, already Haiku-default) this is zero-impact. For the 14 % of Sonnet turns, Haiku-on-Sonnet-task is measurably weaker on complex plan comparison and SEP triage — but still better than "Florence is unavailable."

The eval harness maintains a "downshift" eval set that measures exactly how much quality we lose when Sonnet turns are forcibly run on Haiku; results are published to the outage-readiness dashboard.
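The downshift choice can be sketched as a small lookup. The short model labels and the chained Opus → Sonnet → Haiku fallback are assumptions; the text above only names the two single-step downshifts.

```python
from typing import Dict, Optional, Set

# Single-step downshifts named in this section.
DOWNSHIFT: Dict[str, str] = {"opus": "sonnet", "sonnet": "haiku"}


def pick_model(preferred: str, open_breakers: Set[str]) -> Optional[str]:
    """Return the preferred model, or the next one down whose breaker is closed.

    Returns None when every candidate is tripped, which is the signal to
    promote to Tier 5.
    """
    model: Optional[str] = preferred
    while model is not None and model in open_breakers:
        model = DOWNSHIFT.get(model)  # Haiku has no further downshift
    return model


print(pick_model("sonnet", {"sonnet"}))
print(pick_model("opus", {"opus", "sonnet"}))
print(pick_model("haiku", {"haiku"}))
```

Because breakers are per-model, a healthy Haiku keeps serving its own turns while Sonnet traffic is redirected to it.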

Tier 5 — Cross-vendor (Claude → OpenAI / Gemini) ​

Invokes the provider-risk Tier 2 switch. Adaptation prompts engage. Measured quality drift, possibly in both directions (OpenAI is stronger on some tasks, Claude on others).

Only engaged when Tiers 1–4 are all failing, i.e., broad Claude unavailability across transports. Rare, but Apr 15 (Anthropic login/API) combined with a hypothetical same-day Bedrock failure across all of our provisioned regions would qualify.

Tier 6 — Graceful deterministic-only mode ​

All LLM providers unreachable. Florence UI switches to a clearly-labeled degraded state:

  • "Florence is temporarily unavailable. You can still browse plans, check drug coverage, and use the full platform — just without the conversational assistant."
  • All UI flows (/plans, /agents, member dashboard) remain fully functional via the deterministic API — no LLM required.
  • Authenticated members see a banner offering to leave a message that Florence will answer when she's back; message queued to the audit-log collection.
  • Voice is disabled with a specific message; text-mode basic forms stay available.

Florence is fully degraded, but the product is not down. The deterministic platform exists for exactly this.

Circuit breakers and thresholds ​

Per-tier, per-model, per-region breakers. Separate state so one model failing doesn't trip an unrelated one.

| Signal | Window | Threshold | Action |
| --- | --- | --- | --- |
| 5xx / timeout rate on (tier, model, region) | 60 s rolling | > 10 % | Open breaker for that combo; promote to next tier automatically |
| Latency p95 on (tier, model, region) | 60 s rolling | > 3 × baseline | Same: open breaker, promote |
| Tool-call failure rate | 5 min rolling | > 5 % | Alert on-call; evaluate breaker |
| Grounding-check failure spike | 10 min rolling | > 3 × baseline | Alert on-call; possible model regression, consider Tier 4 downshift |
| Broad Claude unavailable (all tiers 1–4 tripped) | 30 s | All open | Auto-engage Tier 5; page on-call |
| All providers unavailable | 30 s | Tiers 1–5 open | Auto-engage Tier 6; page founder |

Breakers move to half-open after a cooldown (exponential backoff, from 60 s up to 5 min) and admit a small share of canary-probe traffic. They close fully on sustained success.

Request hedging — for turns where user experience dominates cost ​

For a small fraction of high-value turns, send the same request to two providers simultaneously and accept the first valid response. Costs ~2 × for hedged turns; kills tail latency during partial outages.

Default hedging policy:

  • Hedge always: first turn of a conversation (user's first-impression experience). ~1 % of turns at typical usage.
  • Hedge conditionally: when primary breaker is half-open (during recovery). < 0.1 % of turns in normal operation; higher during recovery.
  • Never hedge: routine mid-conversation turns. Keeps steady-state cost overhead to ~1 %.

Hedging is a weapon against partial degradation (some requests slow, others fast), which is harder to circuit-break on than a clean failure.

Response cache — the freebie during outages ​

Anthropic prompt caching is about prefix hits on the Anthropic side. Separately, Florence maintains a response cache keyed on (prompt_version, tool_results_hash, user_context_hash) for deterministic-grade queries — "what's the cheapest Silver plan for a family of 4 in Miami at $45k income?" is the same answer for any user who asks it.

Sizes:

  • Cache size target: ~10 000 most-common deterministic-grade answers
  • TTL: 24 hours (tool results themselves cache-invalidate on deterministic-data updates)
  • Hit rate in normal operation: ~15 % (bonus on cost)
  • Hit rate during LLM outage: effectively 100 % for any cached query — user asks something we've answered before, Florence serves from cache with a freshness marker

During a Tier 6 outage, cache serves what it can before the deterministic-only banner engages.
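A minimal in-process sketch of the cache key and TTL behavior is shown below. The class, the SHA-256 hashing of canonical JSON, and the LRU eviction are assumptions; the production layer would sit on shared storage. The user-context input is assumed to hold only answer-relevant fields (household size, location, income), never member identity, so identical questions from different users share an entry.

```python
import hashlib
import json
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple

Key = Tuple[str, str, str]


def digest(value: Any) -> str:
    """Stable hash of a JSON-serializable value (dict key order is ignored)."""
    payload = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self, ttl_s: float = 24 * 3600, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_s, self.max_entries, self.clock = ttl_s, max_entries, clock
        self._entries: "OrderedDict[Key, Tuple[str, float]]" = OrderedDict()

    @staticmethod
    def make_key(prompt_version: str, tool_results: Any, user_context: Any) -> Key:
        return (prompt_version, digest(tool_results), digest(user_context))

    def put(self, key: Key, answer: str) -> None:
        self._entries[key] = (answer, self.clock())
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def get(self, key: Key) -> Optional[Tuple[str, float]]:
        """Return (answer, age_seconds) or None; age feeds the freshness marker."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        answer, stored_at = entry
        age = self.clock() - stored_at
        if age > self.ttl_s:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)
        return answer, age


cache = ResponseCache()
key = ResponseCache.make_key(
    "v1", {"plan": "example-silver"},
    {"household": 4, "city": "Miami", "income": 45000})
cache.put(key, "cached answer")
print(cache.get(key) is not None)
```

Including `prompt_version` in the key means a prompt change naturally misses old entries rather than serving answers written under a previous prompt.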

Multi-region Bedrock — the operational detail ​

Currently staging has a bedrock-runtime VPC endpoint only in us-east-1. For the outage posture:

  • Provision Bedrock access in at least three regions pre-launch: us-east-1, us-west-2, and one of eu-west-1 / ap-northeast-1. Region-specific Bedrock access must be explicitly enabled per model (AWS Bedrock model-access UI).
  • Private endpoints (VPC interface endpoints for bedrock-runtime) in each of those regions, scoped to the ECS security group. Avoids relying on public internet egress during a regional degradation where routing is flaky.
  • Automatic region selection by the FlorenceLLMProvider adapter — starts with nearest region, promotes to alternate regions on breaker trip.
  • Region-level evals — daily eval suite runs against each region; a region failing eval (rare but possible during a regional incident) auto-excludes from the selection pool until it recovers.
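Region selection under these rules reduces to a filtered walk over a priority list. In this sketch the priority order and function name are assumptions; the inputs stand for the breaker state and the daily eval results described above.

```python
from typing import List, Optional, Set

# Assumed priority order, nearest region first.
REGION_PRIORITY: List[str] = ["us-east-1", "us-west-2", "eu-west-1"]


def select_region(priority: List[str], open_breakers: Set[str],
                  failed_eval: Set[str]) -> Optional[str]:
    """Pick the first region that is neither breaker-tripped nor eval-excluded."""
    for region in priority:
        if region not in open_breakers and region not in failed_eval:
            return region
    return None  # no healthy Bedrock region: promote to the next tier


print(select_region(REGION_PRIORITY, open_breakers={"us-east-1"}, failed_eval=set()))
```

With `us-east-1` tripped, this selects `us-west-2`; a region that fails its daily eval is skipped the same way until it recovers.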

Chaos drill schedule ​

Outage plans atrophy if not rehearsed. Cadence:

| Frequency | Drill |
| --- | --- |
| Weekly | Tier 2 region swap: force the us-east-1 breaker open for 10 min; verify seamless failover to us-west-2. Automated; runs in staging. |
| Monthly | Tier 3 transport swap: flip Anthropic direct ↔ Bedrock in production for 30 min. Monitor parity. |
| Quarterly | Tier 4 downshift: force the Sonnet circuit open for 60 min; measure quality delta on live traffic against the downshift eval set. |
| Quarterly | Tier 5 cross-vendor: route 1–2 % of traffic through OpenAI / Gemini for 1 hour; post quality-comparison report. |
| Semi-annual | Tier 6 "nuclear drill": in staging only, disable all LLM providers; verify deterministic-only mode engages cleanly, cache serves, UI banner renders, member messages queue correctly. |
| On any sustained production outage | Post-incident review within 48 hours: did the cascade engage as designed? Any tier skip? What to tune? |

Drills are not optional. A cascade that has never been exercised is a cascade that doesn't work when you need it.

SLO targets during outages ​

Outage behavior has its own SLO, distinct from normal operation:

| Metric | Normal SLO | Outage SLO (any tier engaged) | Hard floor (Tier 6 engaged) |
| --- | --- | --- | --- |
| Florence turn success rate | ≥ 99.5 % | ≥ 99 % | N/A (Florence unavailable, deterministic-only) |
| First-token latency p95 | ≤ 500 ms | ≤ 1000 ms | N/A |
| Grounding check pass rate | ≥ 99.5 % | ≥ 99 % | N/A |
| Deterministic platform availability | ≥ 99.9 % | ≥ 99.9 % | ≥ 99.9 % |
| User-visible degradation banner when Tier 6 engaged | N/A | N/A | Shown within 30 s |

The deterministic platform SLO never drops. Whatever happens to the LLM, the user can still browse plans and see accurate prices.

Customer-visible behavior across the cascade ​

Consistent framing across tiers; users should never see raw errors.

| Tier engaged | User-facing behavior |
| --- | --- |
| 1 | Normal operation |
| 2, 3 | Normal operation. No indicator. Latency may be slightly higher (~100–200 ms). |
| 4 | Normal operation. No indicator unless the downshift materially affects a complex comparison, in which case Florence offers "Want me to dig deeper? This might take a moment," a natural-sounding way to buy time for a retry on a full-quality tier (Tiers 1–3) or to ask for patience. |
| 5 | Normal operation. No indicator. Voice/style may feel very slightly different. |
| 6 | Explicit banner: "Florence is temporarily unavailable. The rest of the platform works normally: browse plans, check coverage, manage your account. [Leave a message for Florence]" |

Never show raw "provider error 503" to a user. Ever.

What the FlorenceRuntime needs to own ​

The runtime is where the cascade executes. Concrete additions to the runtime beyond what's already documented:

  1. Multi-region Bedrock discovery. Not a single LLM_PROVIDER env var — a config array of (provider, region, model) tuples in priority order.
  2. Per-(provider, region, model) circuit breaker state. Stored in-memory per server instance; shared via Redis or equivalent for coordinated ingress behavior. Breaker decisions propagate within ~1 second.
  3. Hedging policy module. Declarative: { always: [firstTurn], conditional: [breakerHalfOpen] }.
  4. Response-cache lookup layer. Sits before Tier 1 attempt; populates on successful response.
  5. Tier 6 "deterministic-only" mode toggle. Server-side flag; UI polls it and renders the banner.
  6. Outage-readiness dashboard. Rolls up: current tier in use, breaker state per combo, eval pass rate per region, downshift-quality-delta trend.
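Items 1 and 3 above are configuration shapes, which a short sketch can make concrete. The `provider-model@region` parsing rule is inferred from the `bedrock-claude-sonnet-4-6@us-east-1` example earlier in this page and is an assumption, as are the `Endpoint` and `CASCADE` names.

```python
from typing import Dict, List, NamedTuple


class Endpoint(NamedTuple):
    provider: str
    region: str
    model: str


def parse_endpoint(spec: str) -> Endpoint:
    """Parse 'bedrock-claude-sonnet-4-6@us-east-1' into (provider, region, model)."""
    target, _, region = spec.partition("@")
    provider, _, model = target.partition("-")
    if not (provider and model and region):
        raise ValueError(f"expected 'provider-model@region', got {spec!r}")
    return Endpoint(provider=provider, region=region, model=model)


# Priority-ordered cascade config: list position is the attempt order.
CASCADE: List[Endpoint] = [
    parse_endpoint("bedrock-claude-sonnet-4-6@us-east-1"),
    parse_endpoint("bedrock-claude-sonnet-4-6@us-west-2"),
    parse_endpoint("bedrock-claude-sonnet-4-6@eu-west-1"),
]

# Declarative hedging policy, mirroring item 3.
HEDGING_POLICY: Dict[str, List[str]] = {
    "always": ["firstTurn"],
    "conditional": ["breakerHalfOpen"],
}

for endpoint in CASCADE:
    print(endpoint)
```

Each `Endpoint` tuple also serves as the natural key for the per-combo breaker state in item 2.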

What this costs vs. what it buys ​

Ongoing cost:

  • Multi-region Bedrock: VPC endpoints in 3 regions (~$30–$40/region/month for endpoints + minor cross-region latency when failed over)
  • Hedging: ~1 % LLM spend overhead at baseline
  • Response cache infra: Redis or DynamoDB with TTL-based expiry; low hundreds of dollars per month at 10 k member scale
  • Chaos drills: engineering time monthly; production impact negligible
  • Daily region-level evals: ~$10–20/day at the full cascade

Estimated total: ~3–4 % of steady-state LLM spend as outage-preparedness overhead.

What it buys:

  • Zero user-visible impact on the majority of Anthropic-infra incidents: Apr 15 login/API would have been transparent, and the Apr 20 Opus outage would have been transparent via Tier 3, or bounded by a Tier 4 downshift if Bedrock had also been affected
  • Measurable-but-bounded quality impact on broad Claude outages (Tier 4 / 5)
  • Graceful product-survives mode on total-LLM-failure scenarios (Tier 6)
  • Defensible uptime story when members, agents, or EDE auditors ask "what happens when your AI goes down?"

Related ​

  • Provider risk & portability — strategic framing + the four tiers of vendor switch (Tiers 4–5 here correspond to those)
  • Runtime — where the cascade executes
  • Evals & observability — downshift eval set, outage-readiness dashboard
  • Voice — vendor strategy — the same cascade discipline applies to ASR + TTS
  • AWS Bedrock status: status.aws.amazon.com
  • Anthropic status: status.claude.com

Tracking ​

Open items to land before Florence text launch:

  • [ ] Provision Bedrock access in ≥ 3 regions with private VPC endpoints
  • [ ] Implement multi-region / multi-transport FlorenceLLMProvider adapter
  • [ ] Circuit-breaker module with per-combo state
  • [ ] Response-cache layer + Tier 6 mode toggle
  • [ ] Downshift eval set authored and integrated into CI
  • [ ] Chaos drill #1 (weekly region swap in staging) scheduled
  • [ ] Outage-readiness dashboard live before first user sees Florence

AskFlorence Internal Documentation. Not for public distribution.