Increased Latency and Errors in Presence in the US (East and West) and AP-North regions

Incident Report for PubNub

Postmortem

Problem Description, Impact, and Resolution

On May 12, 2025 at 23:22 UTC, we observed elevated errors and increased latency for customers using our Presence service in the Virginia (US-East), California (US-West), and Tokyo (AP-North) regions. After we identified the source of the errors and latency, a fix was prepared and ready to deploy, but the incident was resolved before the fix was applied.

This issue occurred because the fix could not be enabled safely: we could not be sure that it would not negatively impact some functionality of the Presence system.

Mitigation Steps

To prevent a similar issue from occurring in the future, we are ensuring that all Presence features work with channel sharding, so that we can safely enable this solution in response to any sudden usage changes.

Posted May 16, 2025 - 00:03 UTC

Resolved

This incident has been resolved.
Posted May 13, 2025 - 00:43 UTC

Monitoring

The issue has been resolved and we are monitoring the system for stability.
Posted May 13, 2025 - 00:12 UTC

Investigating

Beginning at 23:22 UTC, we detected increased errors and latency for the Presence service in the US (East and West) and AP-North regions. Our engineers are investigating the issue, and the service has stabilized as of 23:52 UTC.
Posted May 13, 2025 - 00:00 UTC
This incident affected: Realtime Network (Presence Service).