ADR 0009 — Self-hosted analytics + observability: OpenPanel + GlitchTip, full-journey first-party

Status

Superseded by ADR 0010 — 2026-05-26. The portal-CSP argument below remains factually correct and is preserved as design context; ADR 0010 explicitly carves the portal surface out of the new PostHog HIPAA Cloud path because of it. The "self-hosted from day one" moat was conceded for v1 timing reasons; revisit triggers are documented in ADR 0010.

Originally accepted — 2026-05-16 (ENG-347). Superseded the "self-hosted Umami" direction recorded in ENG-217 / GitHub #75 (sub-deliverable A — PostHog rip — shipped 2026-05-12, PRs #184/#186). Was tracked for build under ENG-347 / GitHub #342.

Context

ENG-217 removed PostHog Cloud (cost: $250+/mo for the BAA tier; no FedRAMP path for EDE Phase 3; broken-on-apex via WAF). It named "self-hosted Umami" as the replacement. Two things invalidated that specific choice before build started:

  1. The product is going web → mobile. The first mobile app iteration is the full onboarding → pricing → plan → waitlist funnel plus Florence AI. Umami has no first-party mobile SDK — its documented mobile path is a hidden WebView, and the real-world workaround is hand-rolled per-platform HTTP against /api/send with a silent-data-loss footgun (missing/wrong User-Agent → 200 response, events discarded, no error). Umami is a website-traffic tool, not a product-analytics tool: thin funnels/paths, anonymous-only identity.

  2. The questions are product-analytics questions. "What are users doing, where do they struggle, are they shopping or poking the prefilled demo, how many hit a state-based exchange we don't serve yet" require funnels, path analysis, retention, and cross-surface identity — exactly Umami's documented weak spots.

A HIPAA BAA does not solve this: a BAA makes a third-party processor legally permitted to handle PHI, but the portal + mobile surfaces block third-party scripts/processors architecturally (CSP, per the Creative AdBundance subdomain-cut model). A legal BAA does not override a CSP. So PostHog Cloud — even on the $250+/mo Boost-for-BAA tier — structurally cannot see the app/portal, the exact surfaces we most want to measure. PostHog self-hosted can (first-party) but is the heaviest stack of all options (ClickHouse + Kafka + Redis + PG, 4 vCPU / 16 GB floor), is the deployment path PostHog actively de-emphasizes, and the Cloud→self-hosted migration direction is deprecated/unreliable.
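
To make the "a BAA does not override a CSP" point concrete, here is a minimal sketch of a first-party-only Content-Security-Policy and a simplified check of what it admits. The function names, directive values, and origins are assumptions for illustration; the portal's actual policy is not shown in this ADR.

```typescript
// Hypothetical sketch: directive values and function names are assumptions,
// not the portal's real Content-Security-Policy.
function buildFirstPartyCsp(firstPartyOrigins: string[]): string {
  const sources = ["'self'", ...firstPartyOrigins];
  const directives: Record<string, string[]> = {
    "default-src": ["'self'"],
    "script-src": sources, // no third-party script hosts
    "connect-src": sources, // no third-party ingest endpoints
  };
  return Object.entries(directives)
    .map(([name, values]) => `${name} ${values.join(" ")}`)
    .join("; ");
}

// Simplified model of browser enforcement: exact-origin matching only,
// and no fallback to default-src (real CSP matching is richer than this).
function isConnectAllowed(csp: string, pageOrigin: string, targetOrigin: string): boolean {
  const directive = csp.split("; ").find((d) => d.startsWith("connect-src "));
  if (directive === undefined) return false;
  const allowed = directive.split(" ").slice(1);
  return allowed.some((source) =>
    source === "'self'" ? targetOrigin === pageOrigin : source === targetOrigin,
  );
}

// Placeholder origins: a self-hosted ingest endpoint vs. an arbitrary vendor.
const portalCsp = buildFirstPartyCsp(["https://analytics.example.internal"]);
const firstPartyOk = isConnectAllowed(
  portalCsp,
  "https://portal.example",
  "https://analytics.example.internal",
);
const thirdPartyOk = isConnectAllowed(
  portalCsp,
  "https://portal.example",
  "https://vendor.example",
);
```

Under this model the vendor origin is refused whatever contract covers it, which is the structural point: the policy lists origins, and a signed BAA adds no origin to that list.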

The compliance posture is not a constraint to route around — it is the moat. First-party + self-hosted is the only analytics allowed across the full journey including the PHI portal and mobile. Tools Creative AdBundance brings (Meta CAPI, Hotjar, etc.) die at "the cut"; ours does not. "Log everything from everywhere, one continuous funnel" is the goal, and we are the only party who can have it.

Decision

Self-hosted from day one, two tools, full-journey first-party:

| Layer | Tool | Scope |
| --- | --- | --- |
| Product / behavioral analytics | OpenPanel (AGPL-3.0, self-hosted; Docker Compose; Postgres + ClickHouse + Redis; first-party SDKs Web / Swift / Kotlin / React Native) | Marketing apex → /plans → portal/enroll → mobile app, one funnel |
| Errors / crash / perf | GlitchTip (self-hosted, lightweight Django + Postgres + Redis; Sentry-SDK-compatible: web JS + iOS/Android/RN) | Same surfaces; web now, mobile when the app ships |
| Server health | Structured src/lib/logger.ts → CloudWatch Logs + CloudWatch dashboards/alarms + a synthetic canary running the ENG-275 critical-flow smoke on a schedule | All API surfaces |
| Florence AI interaction logging | Not an analytics tool. First-party encrypted transcript/tool-trace store + structured logs, designed under #61. Feeds derived florence_* events into OpenPanel only. | — |
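
As a rough illustration of the "Server health" row, the sketch below formats one JSON log line per event, assuming the service's stdout is forwarded to CloudWatch Logs. The real src/lib/logger.ts is not shown in this ADR, so `formatLogLine`, its field names, and the example route are all hypothetical.

```typescript
// Hypothetical sketch of a structured log line; not the contents of
// src/lib/logger.ts, which this ADR does not show.
type LogLevel = "info" | "warn" | "error";

function formatLogLine(
  level: LogLevel,
  msg: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // Caller fields are spread first so they cannot overwrite ts/level/msg.
  const record = { ...fields, ts: now.toISOString(), level, msg };
  return JSON.stringify(record);
}

// Example: a canary failure on a placeholder route.
const canaryLine = formatLogLine(
  "error",
  "canary step failed",
  { route: "/api/session/example", step: "pricing" },
  new Date(0),
);
```

One JSON object per line keeps fields such as `level` and `route` queryable, so dashboards and alarms can filter on them rather than pattern-matching free text.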

Architecture invariants:

  • Server-side event spine is the through-line. Business events fire server-side from the shared /api/session/* + /api/share/* routes (the ADR-adjacent #274 transport-agnostic session API). Web sends cookies, mobile sends Bearer — same routes, same events. This makes the whole stack tool-portable: swapping an analytics vendor is changing an ingest target, not re-instrumenting.
  • No "cut" for first-party. The browser-cookie boundary at the subdomain cut stays enforced (it fences third-party tools). Journey continuity is reconstructed server-side via the identity graph anonymous af_visitor_id → member_id (server-side join at login/enroll) → device session (mobile Bearer). The funnel is defined end-to-end through enrollment_submitted, not truncated at the cut.
  • Privacy by construction. Income always bucketed, never raw. No raw doctor/drug strings (identifying / health data) — only search patterns. Opaque per-session plan IDs resolved to real plan_id server-side at the fire site; real IDs never in a client URL/payload. Identity = af_visitor_id, not PII.
  • Self-hosted on our AWS = compliant via the data-control path. No analytics-vendor BAA. The infra sits under the existing AWS Organizations BAA. This is the EDE posture (operating history accrues now), not a deferral.
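
The invariants above can be sketched as a single server-side fire site. This is a sketch under stated assumptions, not the shipped code: the bucket edges, the `plan_viewed` event name, the `IngestTarget` interface, and the counter-based opaque IDs are all hypothetical.

```typescript
// Hypothetical sketch: names, bucket edges, and event fields are assumptions.
type IncomeBucket = "under_25k" | "25k_50k" | "50k_100k" | "100k_plus";

// Income is always bucketed before it reaches an event payload.
function bucketIncome(annualIncome: number): IncomeBucket {
  if (annualIncome < 25000) return "under_25k";
  if (annualIncome < 50000) return "25k_50k";
  if (annualIncome < 100000) return "50k_100k";
  return "100k_plus";
}

interface AnalyticsEvent {
  name: string;
  visitorId: string; // af_visitor_id, not PII
  props: Record<string, string>;
}

// Swapping analytics vendors means supplying a different IngestTarget.
interface IngestTarget {
  send(event: AnalyticsEvent): void;
}

// Per-session mapping from opaque client-visible tokens to real plan IDs.
// A counter keeps the sketch deterministic; a real system would presumably
// use unguessable tokens.
class SessionPlanIds {
  private readonly opaqueToReal = new Map<string, string>();
  private counter = 0;

  issue(realPlanId: string): string {
    this.counter += 1;
    const opaque = `p${this.counter}`;
    this.opaqueToReal.set(opaque, realPlanId);
    return opaque;
  }

  resolve(opaque: string): string | undefined {
    return this.opaqueToReal.get(opaque);
  }
}

// Fire site: resolves the opaque ID server-side and buckets income.
function firePlanViewed(
  target: IngestTarget,
  planIds: SessionPlanIds,
  visitorId: string,
  annualIncome: number,
  opaquePlanId: string,
): boolean {
  const planId = planIds.resolve(opaquePlanId);
  if (planId === undefined) return false; // unknown token: drop, do not guess
  target.send({
    name: "plan_viewed",
    visitorId,
    props: { plan_id: planId, income_bucket: bucketIncome(annualIncome) },
  });
  return true;
}
```

Because the route handler fires the event, web (cookie) and mobile (Bearer) callers produce the same payload, and the client only ever sees the opaque token.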

Consequences

Accepted:

  • OpenPanel adds ClickHouse as an ops component — medium weight (single container; no Kafka), far from the PostHog monster, but a new thing to operate/back up/upgrade.
  • OpenPanel is a younger project than Umami (longevity risk). Mitigated by the server-side spine: events fire from our API, so a future vendor swap is an ingest-target change, not a re-instrumentation.
  • Reverses the recorded "Umami" decision → doc-trail update (CLAUDE.md, Creative AdBundance brief, session-cookie + share-flows architecture docs, #274). The compliance argument is identical (both self-hosted first-party); only the tool name changes. The cross-surface-moat framing already in CLAUDE.md is correct and preserved verbatim, retargeted to the new tools.
  • Cross-surface identity stitching (visitor → member → device) is a new required design item. The join key must exist in the event schema from day one so marketing→portal→app stitching is not a retrofit (same discipline as the reserved florence_* slots). Coordinated with the Phase-5 portal work and #274's identity model.
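
A minimal sketch of the visitor → member → device stitching described above, assuming the join key travels on every event. The `IdentityGraph` shape, the `journeyKey` function, and the example IDs are hypothetical; the actual identity model is owned by #274 and the Phase-5 portal work.

```typescript
// Hypothetical sketch of the identity graph; shapes and names are assumptions.
interface IdentityGraph {
  memberByVisitor: Map<string, string>; // af_visitor_id -> member_id
  memberByDeviceSession: Map<string, string>; // device session -> member_id
}

interface JourneyEvent {
  name: string;
  afVisitorId?: string;
  deviceSessionId?: string;
}

function createIdentityGraph(): IdentityGraph {
  return { memberByVisitor: new Map(), memberByDeviceSession: new Map() };
}

// Resolve an event to one journey key: member_id when a join exists,
// otherwise the most specific anonymous identifier present.
function journeyKey(graph: IdentityGraph, event: JourneyEvent): string {
  const viaVisitor =
    event.afVisitorId !== undefined
      ? graph.memberByVisitor.get(event.afVisitorId)
      : undefined;
  const viaDevice =
    event.deviceSessionId !== undefined
      ? graph.memberByDeviceSession.get(event.deviceSessionId)
      : undefined;
  const memberId = viaVisitor ?? viaDevice;
  if (memberId !== undefined) return `member:${memberId}`;
  if (event.afVisitorId !== undefined) return `visitor:${event.afVisitorId}`;
  if (event.deviceSessionId !== undefined) return `device:${event.deviceSessionId}`;
  return "unknown";
}

const graph = createIdentityGraph();
graph.memberByVisitor.set("v-123", "m-9"); // server-side join at login/enroll
graph.memberByDeviceSession.set("d-77", "m-9"); // mobile Bearer session
```

With both joins recorded, a marketing-site event carrying `v-123` and a mobile event carrying `d-77` resolve to the same key, which is what lets the funnel run end to end without a retrofit.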

Gained:

  • ~$45-90/mo all-in with zero new BAAs, versus the $250+/mo PostHog-BAA path or the ~$2,500-4,000/12-mo phased-PostHog route (Cloud now → self-host at EDE). The phased route was ~3-5x more expensive and back-loaded the heaviest infra build to right before the Sept audit.
  • One continuous marketing→portal→app funnel under one identity graph — the report Creative AdBundance's third-party stack structurally cannot produce. The moat made visible.
  • Mobile is additive, not a re-platform: OpenPanel/Sentry SDKs drop onto the same server-side taxonomy when the app ships. This is the entire reason OpenPanel beat Umami.
  • The ECS task-def drift hazard that recurred four times in ten days (the MONGODB_WRITE_URI incident class) is closed by ADR 0007 (Terraform owns the task def, shipped 2026-05-13 / ENG-277). Adding OPENPANEL_* / GLITCHTIP_DSN env vars is now a normal terraform apply change, not a drift risk — no #163 mitigation needed.
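
The cost comparison above can be checked with a few lines of arithmetic. The dollar figures come from the bullets; pairing low with low and high with high is an assumption made here for a like-for-like ratio.

```typescript
// Worked arithmetic for the cost bullet. Figures are from the ADR text;
// the low/low and high/high pairing is an assumption of this sketch.
interface CostRange {
  low: number;
  high: number;
}

const selfHostedMonthly: CostRange = { low: 45, high: 90 };
const phasedPostHogTwelveMonths: CostRange = { low: 2500, high: 4000 };

function annualize(monthly: CostRange): CostRange {
  return { low: monthly.low * 12, high: monthly.high * 12 };
}

function likeForLikeRatio(numerator: CostRange, denominator: CostRange): CostRange {
  return {
    low: numerator.low / denominator.low,
    high: numerator.high / denominator.high,
  };
}

const selfHostedTwelveMonths = annualize(selfHostedMonthly); // 540 to 1080
const costRatio = likeForLikeRatio(phasedPostHogTwelveMonths, selfHostedTwelveMonths);
// costRatio.low is 2500 / 540 (about 4.6); costRatio.high is 4000 / 1080 (about 3.7)
```

Both ratios fall inside the quoted ~3-5x band.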

Alternatives considered

  • Self-hosted Umami (the prior ENG-217 direction) — rejected: no first-party mobile SDK (hand-rolled /api/send with silent-fail footgun), website-traffic tool not product-analytics, would force a re-platform exactly when mobile + Florence land.
  • PostHog Cloud + HIPAA BAA, then self-host at EDE — rejected: BAA ≠ CSP override, so it cannot see portal/mobile even while paid; ~3-5x cost over 12 months; back-loads the heaviest self-host build to right before the EDE audit (weaker evidence, since auditors weigh operating history); and depends on the Cloud→self-hosted migration direction, which is deprecated/unreliable.
  • PostHog self-hosted — rejected: heaviest stack of all options (ClickHouse + Kafka, 4 vCPU/16 GB floor), the path PostHog de-emphasizes, and we just removed PostHog for cost/compliance; reintroducing the monster for capabilities OpenPanel+GlitchTip already cover is a bad bootstrapped trade.
  • Single all-in-one (PostHog/Mixpanel/Amplitude) — rejected: either third-party (blocked post-cut regardless of BAA) or heavy/expensive; the OpenPanel + GlitchTip split covers ~90% of the relevant capability at a fraction of ops/cash. The 10% gap (integrated session replay, feature flags) is deferred (OpenReplay only-if-needed) or already solved (env-flag dormant→flip rollout, ENG-322 precedent).

References

  • ENG-347 / GitHub #342 — build plan-of-record (taxonomy, phasing, decision + cost thread)
  • ENG-217 / GitHub #75 — parent (sub-deliverable A, PostHog rip, shipped)
  • GitHub #274 — server-side session + clean URLs (the tool-portable through-line)
  • GitHub #298 / #163 — share flows / task-def drift root cause
  • GitHub #61 — Florence AI architecture (owns Layer-3 transcript logging)
  • ADR 0007 — Terraform owns the ECS task def (closes the env-var drift hazard)
  • docs/briefs/creative-adbundance-analytics-brief.md — the subdomain-cut model (fences third-party tools, not ours)

AskFlorence Internal Documentation. Not for public distribution.