Principles
Invariants that govern every Florence AI decision. Change these only by amending this document in a PR with explicit rationale.
1. Deterministic grounding — Florence never computes a fact
Florence is a natural-language interface, not a knowledge base. Every factual claim in a Florence response must trace to a tool call in the current turn. Enforced three ways:
- System prompt contract — the prompt explicitly forbids arithmetic, benefit derivation, or advisory claims without a backing tool result.
- Post-response grounding check — a cheap Haiku call after every assistant turn scans for factual claims and asserts each traces to a tool result ID in the same turn. Ungrounded claims → block + log + escalate.
- Hallucination dragnet in evals — CI eval regexes every number in every response against that turn's tool-result JSON. Unbacked number = failing test.
Consequence: model upgrades (Claude 4.7 → 5 → N) are safe, because knowledge lives in tools, not weights.
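The hallucination dragnet can be sketched as a pure function over one turn. This is a minimal illustration, not the actual eval: the `ToolResult` shape and the names `extractNumbers`, `collectNumbers`, and `findUnbackedNumbers` are hypothetical, and matching is by exact numeric value only, so a response that writes "$1.5k" for a tool value of 1500 would be flagged.

```typescript
// Hypothetical shape of a tool result; the real schema is not given here.
interface ToolResult {
  id: string;
  json: unknown;
}

// Pull numeric literals out of free text, dropping thousands separators.
function extractNumbers(text: string): number[] {
  const matches = text.match(/\d[\d,]*(?:\.\d+)?/g) ?? [];
  return matches.map((m) => Number(m.replace(/,/g, "")));
}

// Walk a JSON value and collect every number, including numbers inside strings.
function collectNumbers(value: unknown, out: number[] = []): number[] {
  if (typeof value === "number") {
    out.push(value);
  } else if (typeof value === "string") {
    out.push(...extractNumbers(value));
  } else if (Array.isArray(value)) {
    for (const item of value) collectNumbers(item, out);
  } else if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) collectNumbers(obj[key], out);
  }
  return out;
}

// Return every number in the response that no tool result in the turn backs.
function findUnbackedNumbers(response: string, results: ToolResult[]): number[] {
  const backed = collectNumbers(results.map((r) => r.json));
  return extractNumbers(response).filter((n) => backed.indexOf(n) === -1);
}
```

An eval built on this would fail the test whenever the returned list is non-empty.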
2. Text is the source of truth — voice is a UI affordance
Text transcripts are the legal record. Voice I/O (ASR on the way in, TTS on the way out) wraps the same text code path. We do not use integrated voice-to-voice models (OpenAI Realtime, Gemini Live, etc.) — they abstract away the tool loop, grounding check, and audit trail, exactly the things we cannot abstract.
3. Camouflage — raise the cost of fingerprinting
Competitors should have to work to know which model powers Florence. No "powered by Claude" badges. All model calls server-mediated. Tool-use blocks never stream to the client. Output runs through a style normalizer that enforces the Florence voice and strips model-family tells. Full detail in guardrails & camouflage.
4. Unit economics — committed targets
These are binding design goals, not aspirations. Architecture choices that break them require explicit review.
| Metric | Target |
|---|---|
| Text turn (LLM + guardrails + grounding) | ≤ $0.005 |
| Text conversation (~10 turns) | ≤ $0.05 |
| Voice turn (ASR + LLM + TTS) | ≤ $0.03 |
| Voice conversation (~5 min, ~15 turns) | ≤ $0.50 |
| Per-member-per-month Florence cost | ≤ $0.50 |
| LLM + voice + infra as % of PMPM revenue | ≤ 3 % at 10 k members, ≤ 2 % at 100 k+ |
| Escalation-to-human rate | ≤ 5 % |
| First-token latency (text) | ≤ 500 ms |
| End-of-speech to first audio (voice) | ≤ 400 ms |
Three moves make or break these targets:
- Prompt caching as a first-class design constraint. Fixed-order prompt structure; only the delta is fresh tokens. Target ≥ 85 % input-token cache-hit rate.
- Haiku-default model routing. ~85 % Haiku 4.5, ~14 % Sonnet 4.6, ~1 % Opus 4.7. Measured monthly; alert on drift.
- Tool-result caching with clear TTLs. Plan data, drug coverage, and provider network are cached per input hash for minutes, not seconds.
Missing any one of these means a 10× cost increase at scale. See runtime for implementation.
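As an illustration of tool-result caching keyed by input hash, here is a minimal in-memory sketch. The class name `ToolResultCache`, the SHA-256 key, the synchronous `compute` callback, and the injectable clock are all assumptions made for the example; the real implementation is described in runtime and may differ.

```typescript
import { createHash } from "crypto";

// Serialize with sorted object keys so equal inputs always hash the same.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class ToolResultCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without sleeping.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Cache key: SHA-256 over the tool name plus the canonicalized input.
  static keyFor(toolName: string, input: unknown): string {
    return createHash("sha256").update(toolName + ":" + stableStringify(input)).digest("hex");
  }

  // Return a fresh cached value, or call the tool and cache its result.
  getOrCompute(toolName: string, input: unknown, compute: () => T): T {
    const key = ToolResultCache.keyFor(toolName, input);
    const hit = this.entries.get(key);
    if (hit !== undefined && this.now() < hit.expiresAt) return hit.value;
    const value = compute();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

A TTL of a few minutes (for example `new ToolResultCache<string>(5 * 60 * 1000)`) matches the "minutes, not seconds" guidance; the right value per tool is a product decision this sketch does not make.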
5. Data classification is enforced in code, not policy
Every vendor integration is a typed adapter sink that declares the data classes it accepts. Routing FTI to HubSpot is a compile error, not a policy violation. Every stored field carries a compliance class; every MongoDB document is encrypted with the CMK for its class. Full detail in infrastructure/data-classification and applied here in tool surface.
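One way to get "compile error, not policy violation" in TypeScript is to carry the data class in the type of every record and in the type of every sink. The sketch below is illustrative only: the `DataClass` members, `Classified`, `Sink`, and `makeSink` are hypothetical names, and the real taxonomy lives in infrastructure/data-classification.

```typescript
// Hypothetical data classes for illustration.
type DataClass = "public" | "pii" | "phi" | "fti";

// A payload tagged with its compliance class at the type level.
interface Classified<C extends DataClass, T> {
  readonly dataClass: C;
  readonly payload: T;
}

// A sink declares the classes it accepts through its type parameter.
interface Sink<Accepts extends DataClass> {
  readonly name: string;
  send<C extends Accepts>(record: Classified<C, unknown>): void;
}

function makeSink<Accepts extends DataClass>(
  name: string,
  accepted: readonly Accepts[],
  log: string[]
): Sink<Accepts> {
  return {
    name,
    send(record) {
      // Runtime backstop for values that bypass the type system (e.g. `any`).
      if (accepted.indexOf(record.dataClass) === -1) {
        throw new Error(`${name} does not accept data class ${record.dataClass}`);
      }
      log.push(`${name}:${record.dataClass}`);
    },
  };
}

const sent: string[] = [];
const hubspot = makeSink<"public" | "pii">("hubspot", ["public", "pii"], sent);

const contact: Classified<"pii", { email: string }> = {
  dataClass: "pii",
  payload: { email: "member@example.com" },
};
hubspot.send(contact); // compiles: "pii" is in the sink's accepted classes

const taxRecord: Classified<"fti", { agi: number }> = {
  dataClass: "fti",
  payload: { agi: 0 },
};
// hubspot.send(taxRecord);
// Uncommenting the line above fails type-checking, because "fti" does not
// satisfy the constraint "public" | "pii".
```

The runtime check inside `send` is an addition for defense in depth; the principle itself only requires the type-level rejection.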
6. Tool access is scoped by user context
Every tool call carries an auth context: anonymous | authenticated_member | authenticated_agent | authenticated_admin. Tools declare which contexts they accept. An anonymous user cannot invoke member-specific tools. An agent cannot invoke member-data tools for members not assigned to them. Enforced in the tool wrapper, not the prompt.
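A minimal sketch of wrapper-level enforcement follows. The four context kinds come from the text above; the `ToolDef` shape, the `memberIdOf` hook, and `invokeTool` are hypothetical, and the check that a member may only touch their own data is an assumption not stated in this section.

```typescript
type AuthContext =
  | { kind: "anonymous" }
  | { kind: "authenticated_member"; memberId: string }
  | { kind: "authenticated_agent"; agentId: string; assignedMemberIds: string[] }
  | { kind: "authenticated_admin"; adminId: string };

type AuthKind = AuthContext["kind"];

interface ToolDef<I, O> {
  name: string;
  // Contexts this tool declares it accepts.
  accepts: AuthKind[];
  // Hypothetical hook: which member's data the call touches, if any.
  memberIdOf?: (input: I) => string;
  run: (input: I) => O;
}

// Every tool call goes through this wrapper; the prompt is never the gate.
function invokeTool<I, O>(tool: ToolDef<I, O>, input: I, ctx: AuthContext): O {
  if (tool.accepts.indexOf(ctx.kind) === -1) {
    throw new Error(`${tool.name}: context ${ctx.kind} is not accepted`);
  }
  if (tool.memberIdOf !== undefined) {
    const target = tool.memberIdOf(input);
    if (ctx.kind === "authenticated_agent" && ctx.assignedMemberIds.indexOf(target) === -1) {
      throw new Error(`${tool.name}: member ${target} is not assigned to this agent`);
    }
    // Assumption: a member may only read their own data.
    if (ctx.kind === "authenticated_member" && ctx.memberId !== target) {
      throw new Error(`${tool.name}: members may only access their own data`);
    }
  }
  return tool.run(input);
}
```

Because the check runs before `tool.run`, a prompt injection that convinces the model to request a member-specific tool from an anonymous session still fails at the wrapper.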
7. Evals are deployment gates
Florence's prompts, tool schemas, and model selections are code. Every change runs the eval suite. A > 2 % regression on any category blocks the merge. The bar: better than a licensed human health insurance agent on factual recall, appropriately deferential on advisory judgment, and reliably escalatory on edge cases. See evals & observability.
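The merge gate reduces to a per-category comparison between a baseline run and the candidate run. The sketch below assumes scores are pass rates in [0, 1] and reads "2 %" as two percentage points of absolute drop; both are assumptions, as are the function name `regressedCategories` and the category names in the usage example.

```typescript
// Pass rate per eval category, each in [0, 1].
type CategoryScores = Record<string, number>;

// Return the categories whose pass rate dropped by more than the threshold.
// A category missing from the candidate run is scored 0, so it regresses.
function regressedCategories(
  baseline: CategoryScores,
  candidate: CategoryScores,
  threshold: number = 0.02
): string[] {
  const failing: string[] = [];
  for (const category of Object.keys(baseline)) {
    const before = baseline[category];
    const after = category in candidate ? candidate[category] : 0;
    if (before - after > threshold) failing.push(category);
  }
  return failing;
}

// The merge is blocked when any category regresses.
function mergeAllowed(baseline: CategoryScores, candidate: CategoryScores): boolean {
  return regressedCategories(baseline, candidate).length === 0;
}
```

For example, a change that moves a hypothetical `factual_recall` category from 0.95 to 0.90 blocks the merge even if every other category improves, because the gate applies per category rather than to the average.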
8. Every turn is an audit record
Every Florence turn produces an immutable audit-log row: user identity, turn content (encrypted with the appropriate CMK), tools called, tool parameters, tool result summaries, model used, token counts, grounding-check outcome, any escalation, any PHI/FTI touched. Retention ≥ 6 years (HIPAA) or 10 years (the safer choice for EDE). See evals & observability.
9. Member and agent — one runtime, two prompts
Florence serves both sides. Same FlorenceRuntime, same Claude Agent SDK wiring, same grounding check, same audit log. The delta is system prompt + tool surface + auth context. Agent-side Florence has her own tool registry (draft_sep_letter, list_my_assigned_members, compose_member_message) and her own eval set (PHI boundary tests, compliance-language accuracy).
10. Provider independence — portable by construction
Florence's core intelligence runs on a third-party LLM. The specific vendor is a commodity choice, not a differentiator. Every LLM call goes through a provider abstraction; tool schemas are model-neutral Zod with per-provider renderers; prompts have per-provider adaptation layers; evals run against the primary AND at least one warm-standby provider daily, not quarterly. A vendor switch is a config change plus a known-quantity quality delta, not a platform rewrite.
The four switch tiers (same-vendor transport → model version → cross-vendor → self-hosted open-weight) and the risk register that justifies each are documented in provider risk & portability. Treat that document as binding; the enablers it requires (abstraction layer, warm-standby evals, adaptation prompts, kill-switch) are non-optional.
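To make "a vendor switch is a config change" concrete, here is a minimal sketch of model-neutral tool definitions with per-provider renderers. Several things are assumptions: a plain object stands in for the Zod schema to keep the example dependency-free, the choice of OpenAI as the standby is purely illustrative, the parameter `member_id` is invented, and the rendered shapes follow the tool-definition formats of the Anthropic Messages API and the OpenAI Chat Completions API as commonly documented, which should be checked against current vendor docs.

```typescript
// Model-neutral tool description (stand-in for a Zod schema).
interface NeutralTool {
  name: string;
  description: string;
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required: string[];
}

type ProviderId = "anthropic" | "openai";

// Shared JSON-Schema-style parameter block used by both renderers.
function toJsonSchema(tool: NeutralTool) {
  return { type: "object", properties: tool.properties, required: tool.required };
}

// Per-provider renderers: same neutral input, vendor-specific output.
const renderers: Record<ProviderId, (tool: NeutralTool) => unknown> = {
  anthropic: (tool) => ({
    name: tool.name,
    description: tool.description,
    input_schema: toJsonSchema(tool),
  }),
  openai: (tool) => ({
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      parameters: toJsonSchema(tool),
    },
  }),
};

// Hypothetical config: switching vendor means changing `primary`.
interface ProviderConfig {
  primary: ProviderId;
  warmStandby: ProviderId[];
}

function renderTools(config: ProviderConfig, tools: NeutralTool[]): unknown[] {
  const render = renderers[config.primary];
  return tools.map((tool) => render(tool));
}

const draftSepLetter: NeutralTool = {
  name: "draft_sep_letter",
  description: "Draft a special enrollment period letter for a member.",
  properties: { member_id: { type: "string" } },
  required: ["member_id"],
};
```

Calling `renderTools({ primary: "anthropic", warmStandby: ["openai"] }, [draftSepLetter])` and then flipping `primary` yields the other vendor's wire format without touching the tool definition, which is the property the daily warm-standby evals rely on.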
11. Florence ships after AWS migration + full deterministic flow
No Florence code runs in production until:
- AWS migration completes (#47)
- The full deterministic lead → enrollment → member-servicing flow is live for consumer + agent users
Research, architecture, eval harness, data-classification retrofit, and FlorenceRuntime spike can all run now, in parallel. Integration and launch wait. See roadmap.