Increased errors and latency in FRA region

Incident Report for PubNub

Postmortem

Beginning on Friday, June 27, 2025 at 08:15 UTC, three of our services (Pub/Sub, History, and Presence) experienced intermittent increases in latency and errors. The root cause discussed in this analysis was identified and corrected on Monday, June 30.

Problem Description, Impact, and Resolution 

Recently, to ensure PubNub had access to more cloud server capacity across our many regions, we introduced new instance types, giving PubNub’s services a more heterogeneous set of instance types to run on. Over time, PubNub has built up many OS- and kernel-level configurations to optimize the performance of each server. However, with the more heterogeneous set of instance types, an underlying setting that we were explicitly specifying, which controls limits on network connectivity, was being silently overridden by our upstream load balancers. As a result, the new instance types would reach connectivity limits once they were introduced. Unfortunately, the errors we initially encountered pointed us in incorrect directions, and the investigation took longer than we normally strive for.

The issue was mitigated once we identified the cause, configured the affected services to run on other instance types, and launched more capacity.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent recurrence, we modified the new instance types to emit metrics related to these OS thresholds and limits, enabling us to detect when these limits are approached or exceeded, regardless of instance type. This change allows us to scale proactively and route traffic appropriately based on instance type, making our deployment configuration more adaptable across heterogeneous instance types.
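
As an illustration only, the kind of check described above could look like the following minimal Python sketch. It is not PubNub's implementation: the names LimitSample, classify, and to_metric, the metric naming scheme, and the 80% warning threshold are all assumptions made for this example, and the specific OS setting involved is not named in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitSample:
    """One reading of an OS-level limit on a host (all fields are hypothetical)."""
    instance_type: str  # e.g. the cloud instance type the host runs on
    name: str           # label for the limit being tracked
    current: int        # observed value right now
    limit: int          # configured ceiling actually in effect on the host


def classify(sample: LimitSample, warn_ratio: float = 0.8) -> str:
    """Return 'ok', 'approaching', or 'exceeded' for one sample.

    warn_ratio is an assumed threshold, not a value from the report.
    """
    if sample.limit <= 0:
        raise ValueError("limit must be positive")
    usage = sample.current / sample.limit
    if usage >= 1.0:
        return "exceeded"
    if usage >= warn_ratio:
        return "approaching"
    return "ok"


def to_metric(sample: LimitSample) -> dict:
    """Build a metric record tagged by instance type so it can be grouped per type."""
    return {
        "metric": f"os.limit.{sample.name}.utilization",
        "instance_type": sample.instance_type,
        "value": round(sample.current / sample.limit, 4),
        "status": classify(sample),
    }
```

Reporting the limit actually in effect on the host, rather than the value that was configured, is what would surface a silent override like the one described in this analysis.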

We apologize for the incident outlined above and are committed to maintaining transparency when issues affect our customers. Should you have any questions regarding this analysis, please reach out to our support team at support@pubnub.com.

Posted Jul 03, 2025 - 18:49 UTC

Resolved

With no further issues observed, the incident has been resolved. We will follow up soon with a root cause analysis.
If you believe you experienced an impact related to this incident, please report it to PubNub Support at support@pubnub.com.
Posted Jun 27, 2025 - 10:38 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jun 27, 2025 - 10:06 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Jun 27, 2025 - 10:00 UTC

Update

The PubNub Technical Staff continues to investigate. More updates will follow once available.
Posted Jun 27, 2025 - 09:50 UTC

Investigating

At approximately 08:15 UTC, PubNub services began experiencing elevated latencies and server errors in the Europe region. PubNub Technical Staff is currently investigating, and more updates will follow once available.
Posted Jun 27, 2025 - 09:21 UTC
This incident affected: Points of Presence (European Points of Presence).