Session log — 2026-04-21 — Phase 5 staging go-live

Scope

Stand up the AWS staging application stack end-to-end on top of the Phase 3 Terraform scaffolding + Phase 4 staging networking, deploy the Next.js app to stage.askflorence.health, validate every outbound integration (MongoDB Atlas, CMS Marketplace API, AWS SES, PostHog), and get the staging environment to a state where shipping the current Vercel-served app on AWS is a no-risk path. No production traffic moved. Vercel askflorence.health and www.askflorence.health continue to serve production users throughout the session.

Actor

  • Human: Taha Abbasi.
  • Agent: Claude Opus 4.7 (1M context), running in Claude Code CLI.

Tickets

  • Advances Issue #47 from Phase 3 (Terraform scaffolding) through Phase 5.6 (end-to-end SES send proven on the staging app).
  • Provisions the app_writer_waitlist user in the staging Atlas project — closes the staging-side gap tracked under Issue #56. Prod rollout remains deferred per the plan.

External systems touched

AWS (staging account 549136075525)

  • ECR repository askflorence-app created (was missing — Phase 4 had the networking/KMS/secrets only).
  • ECS cluster askflorence-staging. Fargate capacity providers FARGATE + FARGATE_SPOT. Container Insights enabled.
  • ECS task definition askflorence-staging-app-task — 0.25 vCPU / 0.5 GB, non-root user nextjs (UID 1001), port 3000, 14-day CloudWatch log retention under CMK alias/askflorence-staging-data. Revisions :1–:8 registered across the session; :8 is the live image on main@04cfd35. Task role policy limits runtime AWS actions to ses:SendEmail/ses:SendRawEmail on account identities + configuration sets.
  • ECS service askflorence-staging-app: desired count 1, minimum healthy percent 100 / maximum percent 200 for rolling deployments, deployment circuit breaker enabled, target group attached to the staging ALB.
  • ALB askflorence-staging-alb fronting the ECS service in public subnets. HTTPS listener with the stage.askflorence.health ACM certificate; HTTP redirects to HTTPS. Target group askflorence-staging-tg health-checks /api/health.
  • Secrets Manager — staging/mongodb/waitlist-write rotated to point at the new Atlas user (see Atlas section). All other staging/mongodb/* secrets left untouched.
  • Task role inline policy widened from identity/stage.askflorence.health to identity/* scoped to the staging account. Rationale: ses:SendEmail authorizes on every identity referenced in the call (From + To/CC/BCC). In SES sandbox, recipients must also be verified identities in the account, so the role needs permission on them too.
  • IAM: no new roles created. The GitHub Actions deploy role from Phase 3 is the only principal that pushes images + updates the service.
  • SES. Staging ses:SendEmail path exercised successfully from two call sites (direct AWS CLI, and /api/waitlist via the ECS task). The AWS/SES/Send CloudWatch metric shows 3 DeliveryAttempts, 0 bounces, 0 rejects. Still in sandbox mode; the production-access request ticket was filed separately.
  • CloudWatch Logs log group /aws/ecs/askflorence-staging-app captures container stdout/stderr. CMK-encrypted.
  • Route 53 subzone for stage.askflorence.health (delegated from Cloudflare in Phase 4) now has an A-record alias pointing stage.askflorence.health → staging ALB DNS name.

MongoDB Atlas (staging project 69e31af12fd2c0aef51bbb41)

  • New custom role role_writer_waitlist — 7 actions (FIND, INSERT, UPDATE, REMOVE, CREATE_INDEX, DROP_INDEX, COLL_MOD) scoped to askflorence.agent_waitlist_submissions only.
  • New database user app_writer_waitlist — bound to role_writer_waitlist. Password (32-char alphanumeric, generated locally, never echoed) written via a temp file to Secrets Manager + .env.staging.local.
  • Prod project (AskFlorence, 69dc20c64005b222804dafa4) — untouched. No Atlas CLI command in this session targeted the prod project.
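
For reference, the narrow role above can be written out as data. This is a hedged sketch: the object shape is modeled on the Atlas custom database role payload (roleName plus a list of action/resources pairs) as an assumption, and buildWaitlistRole is a hypothetical helper, not the command that was run in this session.

```typescript
// The seven actions listed for role_writer_waitlist in this log.
const WAITLIST_ACTIONS = [
  "FIND",
  "INSERT",
  "UPDATE",
  "REMOVE",
  "CREATE_INDEX",
  "DROP_INDEX",
  "COLL_MOD",
] as const;

type RoleResource = { db: string; collection: string };
type RoleAction = { action: string; resources: RoleResource[] };
type CustomRole = { roleName: string; actions: RoleAction[] };

// Hypothetical helper: every action is scoped to exactly one collection.
function buildWaitlistRole(): CustomRole {
  return {
    roleName: "role_writer_waitlist",
    actions: WAITLIST_ACTIONS.map((action) => ({
      action,
      resources: [{ db: "askflorence", collection: "agent_waitlist_submissions" }],
    })),
  };
}

console.log(JSON.stringify(buildWaitlistRole(), null, 2));
```

Because each action carries its own single-collection resource, a review of the role reduces to checking that no entry names a different database or collection.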

Cloudflare + Route 53

  • Unchanged from Phase 4. Cloudflare remains authoritative for apex askflorence.health; Route 53 holds the delegated stage.askflorence.health subzone. Cloudflare was not touched today.

Vercel

  • Untouched. No project settings, no env vars, no deployments. askflorence.health and www.askflorence.health continued to serve production traffic through every phase of this session. Four commits land on main today (e24c5ca, 44c1493, 90d05af, 04cfd35); none are promoted to Vercel in this session. A separate deploy step using vercel --prod from a dev machine will roll them forward as a discrete action with its own owner approval.

What shipped (chronological)

Phase 5.5 — email provider abstraction

Code (main@e24c5ca):

  • New src/lib/email.ts with sendEmail() + getEmailProvider(). Two providers behind a single typed API:
    • ResendProvider — existing behavior, unchanged; uses RESEND_API_KEY + fetch("https://api.resend.com/emails").
    • SesProvider — new; uses @aws-sdk/client-sesv2 with SESv2Client. Client is lazily constructed so Vercel builds don't require AWS creds at build time.
  • Provider selected once at module load via EMAIL_PROVIDER env var ("ses" vs "resend"; default is "resend").
  • Both providers return the same result shape { ok, messageId?, error?, provider } — sendEmail never throws on provider errors, callers inspect result.ok.
  • Refactored call sites:
    • src/app/api/waitlist/route.ts: 3 sends (consumer confirmation, agent confirmation, ops notification) + kept the Resend-specific audience REST sync, now gated behind emailProvider === "resend" so it's a no-op on SES.
    • src/app/api/agents/discovery/route.ts: 2 sends (agent confirmation, ops notification). sendResendEmail helper + RESEND_API_BASE constant deleted.
  • Added dep @aws-sdk/client-sesv2 ^3.1033.0 to package.json.

Vercel posture: EMAIL_PROVIDER is unset on Vercel → falls through to the Resend path, RESEND_API_KEY still read, unchanged behavior. Zero runtime change. Verified by npm run build producing a bundle that doesn't pull in the AWS SDK on the Resend code path (tree-shaking).
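
The contract described in this phase (one result shape, default to Resend, never throw on provider errors) can be sketched as follows. This is an illustration under stated assumptions, not the contents of src/lib/email.ts: the Transport type, the injected transports argument, the env parameter on getEmailProvider, and the stub transports are all hypothetical, and the real providers call Resend and SESv2 rather than stubs.

```typescript
type ProviderName = "ses" | "resend";

interface EmailMessage {
  from: string;
  to: string;
  subject: string;
  html: string;
}

// Same result shape for both providers, as described above.
interface EmailResult {
  ok: boolean;
  messageId?: string;
  error?: string;
  provider: ProviderName;
}

// Hypothetical: a transport does the network call and may throw.
type Transport = (msg: EmailMessage) => Promise<string>;

// Anything other than "ses" (including unset) falls through to Resend.
function getEmailProvider(env: Record<string, string | undefined>): ProviderName {
  return env.EMAIL_PROVIDER === "ses" ? "ses" : "resend";
}

// Provider errors are converted into { ok: false }, never rethrown.
async function sendEmail(
  msg: EmailMessage,
  provider: ProviderName,
  transports: Record<ProviderName, Transport>,
): Promise<EmailResult> {
  try {
    const messageId = await transports[provider](msg);
    return { ok: true, messageId, provider };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error, provider };
  }
}

// Stub transports stand in for the real Resend and SES calls.
const stubTransports: Record<ProviderName, Transport> = {
  resend: async () => "resend-stub-id",
  ses: async () => {
    throw new Error("simulated SES rejection");
  },
};

const sampleMessage: EmailMessage = {
  from: "sender@example.com",
  to: "recipient@example.com",
  subject: "Waitlist confirmation",
  html: "<p>Thanks for signing up.</p>",
};

sendEmail(sampleMessage, getEmailProvider({}), stubTransports).then((result) => {
  console.log(result.ok, result.provider);
});
```

Call sites then branch on result.ok instead of wrapping each send in try/catch, which is what lets the waitlist route keep going after one of its three sends fails.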

Phase 5.5a — EMAIL_FROM_DOMAIN override

Code (main@44c1493): After the first SES deploy, SES rejected sends from [email protected] (the Resend-verified prod sender, hardcoded in the route files) because staging SES only has stage.askflorence.health verified. Rather than touch the route files or add five separate env vars, extended sendEmail() with a single EMAIL_FROM_DOMAIN env override that rewrites the domain part of every From header at send time. Works for bare addresses (user@domain) and display-name form (Name <user@domain>). Unset on Vercel → no rewrite. Staging ECS sets EMAIL_FROM_DOMAIN=stage.askflorence.health.
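
A minimal sketch of the rewrite described above, covering both address forms. The function name rewriteFromDomain, the regular expressions, and the example.com addresses are assumptions for illustration; the shipped logic lives inside sendEmail() and may differ in detail.

```typescript
// Rewrites only the domain part of a From header.
// Returns the input unchanged when no override is set or the form is unrecognized.
function rewriteFromDomain(from: string, domain?: string): string {
  if (!domain) return from; // unset override (the Vercel case): no rewrite

  // Display-name form: Name <user@domain>
  const angle = from.match(/^(.*)<([^<>@\s]+)@[^<>\s]+>\s*$/);
  if (angle) {
    return `${angle[1]}<${angle[2]}@${domain}>`;
  }

  // Bare form: user@domain
  const bare = from.trim().match(/^([^<>@\s]+)@[^<>@\s]+$/);
  return bare ? `${bare[1]}@${domain}` : from;
}

console.log(rewriteFromDomain("hello@example.com", "stage.askflorence.health"));
console.log(rewriteFromDomain("Florence <hello@example.com>", "stage.askflorence.health"));
console.log(rewriteFromDomain("hello@example.com", undefined));
```

Leaving unrecognized input untouched means a malformed From header surfaces as a provider error rather than being silently reshaped.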

Phase 5.6 — end-to-end SES validation on /api/waitlist

Three layers of evidence accumulated before declaring the SES path green:

  1. Direct aws sesv2 send-email from a staging SSO session: MessageId 0100019daf13d623-07efec70-..., email delivered to [email protected]. Proves domain + DKIM + MAIL FROM + IAM at the account level.
  2. ECS task role policy widened from identity/stage.askflorence.health to identity/* (main@90d05af) after the ses:SendEmail call failed with "not authorized to perform ses:SendEmail on resource identity/[email protected]". Rationale + implementation in the change log entry below.
  3. POST /api/waitlist with [email protected] returned HTTP 200 + a real Mongo waitlist_submission_id. No error log in /aws/ecs/askflorence-staging-app. AWS/SES/Send metric incremented.

Blocker surfaced along the way: the staging Mongo secret staging/mongodb/waitlist-write was a placeholder string (PLACEHOLDER-REPLACE-ME-OUT-OF-BAND) because the parallel Mongo session hadn't provisioned app_writer_waitlist yet. Rather than hack around with a broader user (tried — app_admin_agents doesn't have createIndex on agent_waitlist_submissions either), ran the Atlas CLI flow described in the Atlas section above to create the narrow-scoped user properly.

Phase 5.7 — PostHog server fail-open + staging analytics opt-out

Code (main@04cfd35): Last blocker on the staging app code path was getPostHogClient() throwing on missing token, returning 500 to the caller AFTER the Mongo write + SES send had already succeeded. Two-part fix:

  1. Server client fail-open (src/lib/posthog-server.ts): returns a no-op client (same methods, no-op implementations) when the token is missing OR when DEPLOY_ENV === "staging". Contract is "capture-by-default unless we see a positive signal we're not prod" — critical ordering because Vercel prod doesn't set DEPLOY_ENV, so inverting the rule to "only capture on DEPLOY_ENV=prod" would have silently killed production analytics.
  2. Client host opt-out (instrumentation-client.ts): extended the existing syncNoTrackMode() toggle with an OPT_OUT_HOSTS set containing stage.askflorence.health. The opt-out condition is now hostOptOut || paramOptOut: either trigger causes opt_out_capturing(), and opt_in_capturing() only runs when both are false. Prod behavior of ?no_track=1 is preserved exactly: add param → opted out; remove param → opted back in (no reload needed). On staging, hostOptOut is always true, so the param is additive but cannot opt back in.
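
Both decisions in the two-part fix reduce to small predicates. The sketch below is illustrative only: the AnalyticsClient interface, the helper names shouldUseNoopClient and shouldOptOut, and the assumption that the server reads NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN are not confirmed by this log; OPT_OUT_HOSTS, DEPLOY_ENV, and the no_track parameter are.

```typescript
// Hypothetical minimal client surface; the real PostHog client has more methods.
interface AnalyticsClient {
  capture(event: { distinctId: string; event: string }): void;
  shutdown(): Promise<void>;
}

// Same methods, no-op implementations.
const noopClient: AnalyticsClient = {
  capture() {},
  async shutdown() {},
};

// Capture by default; go no-op only on a positive "not prod" signal
// or when there is no token to send with.
function shouldUseNoopClient(env: Record<string, string | undefined>): boolean {
  const token = env.NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN; // assumed variable name on the server
  return !token || env.DEPLOY_ENV === "staging";
}

const OPT_OUT_HOSTS = new Set(["stage.askflorence.health"]);

// Client side: either trigger opts out; opting back in requires both to be false.
function shouldOptOut(hostname: string, search: string): boolean {
  const hostOptOut = OPT_OUT_HOSTS.has(hostname);
  const paramOptOut = new URLSearchParams(search).get("no_track") === "1";
  return hostOptOut || paramOptOut;
}

console.log(shouldUseNoopClient({ NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN: "phc_example" }));
console.log(shouldOptOut("stage.askflorence.health", ""));
noopClient.capture({ distinctId: "anon", event: "waitlist_submitted" });
```

Note that an unset DEPLOY_ENV with a token present yields a real client, which is the Vercel prod case the ordering argument above is protecting.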

Infra wiring: NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN + NEXT_PUBLIC_POSTHOG_HOST threaded in two places because Next.js inlines NEXT_PUBLIC_* at build time:

  • Dockerfile: accepted as ARGs and exported as ENV before RUN npm run build so they're baked into the client bundle.
  • .github/workflows/deploy-staging.yml: passed as --build-args sourced from GitHub Actions variables (not secrets — PostHog project tokens are public and ship in every page load's browser bundle).
  • infra/envs/staging/ecs.tf: added as plain environment entries so server-side reads at runtime have them too; also makes future token rotation a task-def update, not an image rebuild.

Evidence that the wiring is correct: grepping /_next/static/chunks/0u92fl5tvujj9.js served from stage.askflorence.health finds both the exact token value and the literal string stage.askflorence.health. A follow-up SES send via POST /api/waitlist returned HTTP 200 with no PostHog crash.

Addressed from the Issue #47 docs comment

  • docs/infrastructure/aws-setup.md — created in this session as the general AWS runbook. Follows the established file naming + frontmatter pattern.
  • Reference in docs/infrastructure/cloudtrail-setup.md to aws-setup.md will be re-linked once that file's initial commit lands alongside this session log.
  • ignoreDeadLinks in docs/.vitepress/config.ts tightened to cover only the specific cross-repo Terraform source references that genuinely cannot be fixed without a pattern change. The repo-root SESSION_BRIEF_*.md issue is handled separately: a follow-up will move those artifacts into docs/session-log/ over time.

What this session does NOT do (explicit non-goals)

  • Does not move production traffic. Cloudflare apex DNS still points at Vercel. Nothing in this session affects what a real visitor hitting askflorence.health or www.askflorence.health experiences.
  • Does not touch prod Atlas. All Mongo operations targeted the staging project (69e31af12fd2c0aef51bbb41); no command, not even a read-only listing, was run against the prod project.
  • Does not retire Resend. ResendProvider + EMAIL_PROVIDER=resend code path stays live until Phase 11 post-cutover cleanup.
  • Does not provision prod AWS. Prod account askflorence-prod (039624954211) stays at Phase 2.5 baseline — no VPC, no ECS, no ALB. Phase 8 is the mirror-from-staging step.
  • Does not grant SES production access. Staging still needs verified sandbox recipients; taking SES out of sandbox is an AWS-side review on a ticket filed in Phase 5.4.
  • Does not touch /agents, /agent-onboarding, /agent-discovery page UIs. Route handler code was refactored to use the sendEmail() abstraction, but form flows, validation, copy, and styling are byte-for-byte unchanged from v0.14.0.

Verification

All exercised on the staging ALB hostname stage.askflorence.health, which is reachable globally. None of these steps touched Vercel prod.

  • GET /api/health → 200 {"status":"ok","commit":"04cfd35...","env":"staging"}.
  • POST /api/waitlist with {"email":"[email protected]","zip":"10001","interest":"consumer"} → 200 with real waitlist_submission_id; record visible in Atlas agent_waitlist_submissions; SES DeliveryAttempts metric +1; zero error logs.
  • aws sesv2 get-account shows sandbox still true (expected until production access is granted). SentLast24Hours: 3.
  • Client-side PostHog bundle verification: curl https://stage.askflorence.health/_next/static/chunks/0u92fl5tvujj9.js | grep -aoE '(phc_Azu[^"]+|stage\.askflorence\.health)' returns both expected strings.
  • Vercel prod regression sanity: npm run build green; route-handler diff shows no behavioral change when EMAIL_PROVIDER is unset (Resend path identical). Live Vercel deploy not modified.
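
The /api/health check above lends itself to a small validator for future smoke runs. A sketch under assumptions: isHealthyStaging and its commit-prefix argument are hypothetical, and only the three fields shown in the response above are checked.

```typescript
interface HealthBody {
  status: string;
  commit: string;
  env: string;
}

// Returns true only for a well-formed staging health body whose commit
// starts with the expected short SHA.
function isHealthyStaging(raw: string, expectedCommitPrefix: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // non-JSON body counts as unhealthy
  }
  if (typeof parsed !== "object" || parsed === null) return false;

  const body = parsed as Partial<HealthBody>;
  return (
    body.status === "ok" &&
    body.env === "staging" &&
    typeof body.commit === "string" &&
    body.commit.startsWith(expectedCommitPrefix)
  );
}

const sampleHealth = '{"status":"ok","commit":"04cfd35...","env":"staging"}';
console.log(isHealthyStaging(sampleHealth, "04cfd35"));
```

Checking the commit prefix as well as the status guards against a green response from a stale task revision during a rolling deploy.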

Next session priorities

  1. Phase 6 — staging CloudFront distribution + WAFv2 web ACL in front of the ALB. Cloudflare CNAME stage.askflorence.health swings from ALB DNS → CloudFront distribution. WAF managed rule sets: CommonRuleSet + KnownBadInputs + SQLiRuleSet + AmazonIpReputationList + AnonymousIpList + rate-based rule (2000 req / 5min / IP).
  2. Phase 7 — staging Atlas VPC peering. Replaces NAT EIP 54.164.140.5 currently on the Atlas allowlist with the staging VPC CIDR 10.40.0.0/16. Allowlist tightened to VPC-only.
  3. (Taha) Reply to the AWS SES production-access review email.
  4. (Taha) Fix the trailing \n on CMS_API_KEY on Vercel prod env (staging is already clean).
  5. Once Phase 6 + 7 are green, the staging stack is feature-complete — Phase 8 is mirroring that exact shape into askflorence-prod.

AskFlorence Internal Documentation. Not for public distribution.
