On October 20, 2025, at 07:06 UTC, our monitoring systems alerted us to elevated error rates across multiple PubNub services in the IAD (US-East) region. Some customers may have experienced increased error rates and latency, as well as intermittent Presence service availability issues across IAD (US-East), SJC (US-West), and HND (AP-Northeast).
We quickly determined the issue was caused by a broader infrastructure outage affecting our cloud provider (AWS) in the IAD region. We initiated regional failover procedures and re-routed new connections to alternate regions. However, because some of our failover procedures had undefined steps and the provider outage delayed access to some of our internal tools, existing connections for some services remained degraded for longer than expected.
To restore full service, we manually reset established connections, re-routed Presence traffic to Frankfurt (EU-Central), and brought additional infrastructure online in other regions to absorb traffic. Errors were mitigated by 09:20 UTC. Later in the day, additional regional load in US-West triggered a new wave of service degradation. We responded by isolating the US-East region again and scaling up load balancer capacity in US-West. PubNub services were stabilized by 13:20 UTC, and we continued monitoring while our infrastructure provider worked to fully resolve the underlying issue.
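As a generic illustration of the failover pattern described above, the sketch below shows how new connections can be steered toward a healthy region when the primary is degraded. The endpoint names, health-check path, and timeout are hypothetical, and this is not PubNub's actual routing logic, which operates at the infrastructure layer rather than in client code.

```typescript
// Illustrative only: hypothetical regional endpoints and thresholds,
// not PubNub's actual failover implementation.
const REGIONAL_ENDPOINTS = [
  "https://iad.example-origin.com", // US-East (primary)
  "https://sjc.example-origin.com", // US-West
  "https://fra.example-origin.com", // EU-Central
];

// Try each region's health endpoint in order and return the first
// origin that responds successfully within the timeout.
async function pickHealthyOrigin(timeoutMs = 2000): Promise<string> {
  for (const origin of REGIONAL_ENDPOINTS) {
    try {
      const res = await fetch(`${origin}/healthz`, {
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (res.ok) return origin;
    } catch {
      // Timeout or network error: treat the region as degraded and move on.
    }
  }
  throw new Error("No healthy regional origin available");
}

// New connections are established against whichever region is healthy,
// so a degraded primary region is bypassed automatically.
pickHealthyOrigin()
  .then((origin) => console.log(`Routing new connections to ${origin}`))
  .catch((err) => console.error(err));
```

The key property of this pattern, and of the production failover it loosely mirrors, is that only new connections are redirected; already-established connections must be drained or reset separately, which is why that step appears explicitly in the timeline above.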
By 22:35 UTC, our provider reported full restoration of service. After validating stability in US-East, we completed rebalancing traffic by 23:48 UTC, and declared the incident resolved.
While this incident was caused by an external infrastructure outage, we’ve identified several opportunities to strengthen our internal readiness and response procedures.
We are consolidating and centralizing our regional failover procedures so they are complete and immediately accessible for all production services. Gaps in the process documentation for newer services will be closed before those services are fully adopted into production. We are also reviewing and resolving issues with internal tooling, including the inventory and DNS resolution problems that made mitigation more difficult during this incident.
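To make the tooling point concrete, here is a minimal sketch of the kind of pre-flight check that can surface DNS resolution problems before a failover is attempted. The hostnames and structure are hypothetical and purely illustrative; they do not reflect our internal inventory or DNS tooling.

```typescript
// Illustrative readiness check, not our actual tooling: verifies that the
// regional hostnames referenced in a failover runbook resolve in DNS,
// so resolution problems surface before a failover is attempted.
import { promises as dns } from "node:dns";

// Hypothetical hostnames for illustration.
const RUNBOOK_HOSTS = [
  "iad.example-origin.com",
  "sjc.example-origin.com",
  "fra.example-origin.com",
];

async function checkDnsReadiness(hosts: string[]): Promise<void> {
  const results = await Promise.allSettled(
    hosts.map((host) => dns.resolve4(host)),
  );
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      console.log(`${hosts[i]} -> ${result.value.join(", ")}`);
    } else {
      console.error(`${hosts[i]} failed to resolve: ${result.reason}`);
    }
  });
}

checkDnsReadiness(RUNBOOK_HOSTS);
```

Running checks like this on a schedule, rather than only during an incident, is one way to ensure that failover runbooks remain executable when they are needed most.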
These improvements will ensure faster and more consistent responses to future infrastructure-level disruptions, and reduce potential impact on customer traffic across regions.