Presence API errors & latency in US and AP North

Incident Report for PubNub

Postmortem

Problem Description, Impact, and Resolution 

On May 16, 2025 at 09:05 UTC, some customers in the US and APAC regions experienced elevated error rates and increased latency when using the Presence service. This disruption was caused by an interruption in underlying infrastructure supported by a third-party provider. The same failure also delayed our resolution of the issue, which prolonged the impact.

We adjusted resource capacity to overcome this and return the service to normal. Services were fully restored on May 16, 2025 at 10:23 UTC.

Mitigation Steps and Recommended Future Preventative Measures 

We increased memory on the Presence pods and are working with the third-party provider to address the issues encountered during the incident. To prevent a recurrence, we will also work to make the Presence service more resilient to fluctuations in traffic.

Posted Jun 05, 2025 - 22:00 UTC

Resolved

This incident has been resolved.
Posted May 16, 2025 - 10:23 UTC

Monitoring

At 09:05 UTC we detected increased errors and latency for the Presence service in the US and AP-North regions. The service has stabilized as of 09:30 UTC. Our engineers are monitoring the system.
Posted May 16, 2025 - 09:59 UTC
This incident affected: Points of Presence (North America Points of Presence, Asia Pacific Points of Presence) and Realtime Network (Presence Service).