Brief uptime probe timeouts during monitoring rollout
2026-06-22 03:20 UTC → 2026-06-22 11:10 UTC
Uptime monitoring · Web App & API · 4 automated probe gaps grouped here
On 22 June 2026 (UTC) our new five-minute uptime sampler recorded four short gaps while probing the public /api/health endpoint from Convex. Each gap lasted about five to ten minutes. Customer-facing services stayed up; we have no confirmed user-facing outage tied to these windows.
- Customer impact
- No customer-reported outages. Launches, dashboard, API, and player traffic were not blocked by this event. The public status page and toolbar uptime figure dipped to ~96% for the day because failed probes count against our self-reported history even when the app was serving traffic.
- Root cause
- Real uptime sampling went live on 21 June (Convex cron + CONNECT_STATUS_PROBE_URL → https://app.allureconnect.com/api/health). On 22 June, several overnight and morning probes timed out or failed to complete within the five-second probe budget. That pattern is consistent with transient Vercel cold starts, deploy windows, or network latency between Convex and the app region, not with a sustained Convex or application failure. The automated log stores any failed end-to-end probe as a "data backend" gap, even when Convex itself was running.
- Resolution
- Each gap cleared on the next successful probe without manual intervention. Health checks recovered on their own and live traffic continued normally. We are watching probe reliability as baseline history accumulates and will tune timeouts and reporting thresholds if flakiness persists.
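As an illustration of the probe behaviour described under Root cause, here is a minimal TypeScript sketch of a single health check bounded by a five-second budget. It is not the production probe: the names probeOnce, ProbeResult, and FetchLike, the result shape, and the failure categories are all hypothetical. Only the five-second budget and the /api/health target come from this note.

```typescript
// Hypothetical sketch: names, result shape, and failure categories are
// assumptions for illustration, not the production Convex probe.
type ProbeResult =
  | { ok: true; status: number; latencyMs: number }
  | { ok: false; reason: "timeout" | "http_error" | "network_error"; latencyMs: number };

// Minimal fetch-compatible signature so the probe can be exercised without a network.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

async function probeOnce(
  url: string,
  timeoutMs: number = 5_000, // five-second probe budget from this note
  fetchImpl: FetchLike = fetch,
): Promise<ProbeResult> {
  const controller = new AbortController();
  // Abort the request once the budget is spent.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    const latencyMs = Date.now() - started;
    return res.ok
      ? { ok: true, status: res.status, latencyMs }
      : { ok: false, reason: "http_error", latencyMs };
  } catch {
    const latencyMs = Date.now() - started;
    // An aborted signal means our own timer fired; anything else is a network failure.
    const reason = controller.signal.aborted ? "timeout" : "network_error";
    return { ok: false, reason, latencyMs };
  } finally {
    clearTimeout(timer);
  }
}
```

A scheduled job could call probeOnce with the value of CONNECT_STATUS_PROBE_URL every five minutes and store the result. Recording a reason with each failed sample is one way a log could separate probe timeouts from genuine backend failures.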
Timeline
- 2026-06-22 03:30 UTC: First automated probe gap detected (~03:25–03:30 UTC). Monitoring marked a failed sample; /api/health recovered on the next five-minute check.
- 2026-06-22 07:10 UTC: Two additional brief gaps during the morning sampling window (~07:05–07:10 and ~07:30–07:40 UTC). No elevated error rates on customer API routes during these windows.
- 2026-06-22 11:05 UTC: Final recorded gap of the day (~11:00–11:05 UTC). Subsequent probes returned healthy.
- 2026-06-22 18:00 UTC: Published this incident note for transparency. 21 June (monitoring go-live) remained at 100% in the rollup; only 22 June shows as degraded in the 90-day bar because of these probe gaps.
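To show how failed samples pull down a daily figure, the following TypeScript sketch computes a per-day uptime percentage from five-minute samples. It is an illustration only: dailyUptimePercent, dayStatus, and the degradedBelow threshold are hypothetical and do not describe the actual rollup code. The five-minute cadence comes from this note, which gives 288 samples per day.

```typescript
// Hypothetical sketch of a per-day uptime rollup; not the production reporting code.
const SAMPLE_INTERVAL_MINUTES = 5; // sampling cadence from this note
const SAMPLES_PER_DAY = (24 * 60) / SAMPLE_INTERVAL_MINUTES; // 288

// Percentage of samples in a day that succeeded.
function dailyUptimePercent(
  failedSamples: number,
  totalSamples: number = SAMPLES_PER_DAY,
): number {
  if (totalSamples <= 0 || failedSamples < 0 || failedSamples > totalSamples) {
    throw new RangeError("failedSamples must be between 0 and totalSamples");
  }
  return ((totalSamples - failedSamples) / totalSamples) * 100;
}

// Assumed rule: any day below the threshold is drawn as degraded.
// The default threshold of 100 is a placeholder, not the real setting.
function dayStatus(
  uptimePercent: number,
  degradedBelow: number = 100,
): "operational" | "degraded" {
  return uptimePercent < degradedBelow ? "degraded" : "operational";
}

// Illustrative counts only; they are not taken from the probe log.
console.log(dailyUptimePercent(0).toFixed(1));  // "100.0"
console.log(dailyUptimePercent(12).toFixed(1)); // "95.8"
console.log(dayStatus(dailyUptimePercent(12))); // "degraded"
```

Because every failed sample counts equally, a handful of short probe gaps can lower the daily figure even when no customer request failed.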