Increased latency and errors across all services in US East
Incident Report for PubNub
Postmortem

Problem Description, Impact, and Resolution 

On Sunday, May 12, 2024 at 14:16 UTC, customers using PubNub’s legacy Access Manager Version 2 may have experienced increased errors and latency across all services in our US East point of presence. After investigation, we discovered that several database nodes were failing. We were prepared to fail out of that region when the nodes recovered, and the errors stopped by 15:20 UTC.

Customers using Access Manager Version 3 were unaffected because Version 3 does not rely on a database.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent a similar issue from occurring in the future, we are increasing the resources allocated to the affected infrastructure and tuning the auto-scale threshold. Additionally, we continue to encourage and assist customers using Access Manager Version 2 to migrate to Version 3.

For information on migrating to Access Manager Version 3, please refer to our Migration Guide, or contact support@pubnub.com for assistance.

Posted May 16, 2024 - 00:50 UTC

Resolved
From 14:16 UTC to 15:20 UTC on May 12, 2024, users may have experienced increased latency and errors across all PubNub services in our US East PoP. Our Engineering teams applied a fix and the issue has been resolved since 15:20 UTC.

A root cause analysis will be posted soon.
Posted May 12, 2024 - 15:41 UTC
This incident affected: Realtime Network (Publish/Subscribe Service, Storage and Playback Service, Stream Controller Service, Presence Service, Access Manager Service, Realtime Analytics Service, DNS Service, Mobile Push Gateway, MQTT Gateway, App Context Service).