At 21:50 UTC on 2024-12-11 we observed increased latencies and error rates across all services in our US-East point-of-presence and, a few minutes later, in US-West as well. An investigation identified the PubNub Access Manager (PAM) as the center of the degradation and found that nodes in that service were severely memory constrained. We increased capacity, and the issue was mitigated in both points-of-presence at 22:10 UTC and declared resolved at 22:22 UTC. The issue occurred because a previously unseen pattern of customer behavior overwhelmed a cache in the PAM system, causing memory to become constrained and performance to degrade.
To prevent a similar issue from occurring in the future, we have adjusted the cache's capacity limits and updated our monitoring to alert on this and similar patterns of behavior.
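The remediation above, capping the cache so an unexpected access pattern cannot exhaust node memory, can be sketched as a size-bounded cache with least-recently-used eviction. This is a minimal illustration under stated assumptions: the class name, sizes, and eviction policy are hypothetical and are not a description of PAM's actual implementation.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """A cache with a hard entry limit. When full, it evicts the
    least-recently-used entry instead of growing without bound
    (hypothetical sketch, not PAM's actual design)."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the LRU entry


cache = BoundedLRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)      # capacity exceeded: "a" is evicted
print(cache.get("a"))  # None
print(cache.get("c"))  # 3
```

Pairing a bound like this with an alert on eviction rate or cache size gives early warning of the access pattern before memory pressure degrades the node.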