Increased errors and latency for presence in FRA and BOM

Incident Report for PubNub

Postmortem

Problem Description, Impact, and Resolution 

Starting at 12:20 UTC on June 2, 2025, we observed an increase in errors and latency affecting the Presence service in the FRA and BOM regions. Our investigation identified the cause as a higher-than-normal traffic pattern. In response, we scaled up resources to handle the request load, and the Presence service was fully restored in the affected regions by 13:51 UTC on June 2, 2025.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent a similar issue from occurring in the future, we are tuning our autoscaling configuration and analyzing the system behavior and bottlenecks observed during the incident window to inform further adjustments.

Posted Jun 03, 2025 - 17:29 UTC

Resolved

Starting at 12:20 UTC, we noticed an increase in errors and latency affecting the Presence service in the BOM and FRA regions. Our engineers investigated the issue and fully restored the service, which has remained stable since 13:51 UTC.
Posted Jun 02, 2025 - 14:43 UTC
This incident affected: Points of Presence (European Points of Presence, Southern Asia Points of Presence) and Realtime Network (Presence Service).