Increased errors and latency in IAD region

Incident Report for PubNub

Postmortem

Beginning on Friday, June 27, 2025, at 08:15 UTC, three of our services (Pub/Sub, History, and Presence) experienced intermittent increases in latency and errors. The root cause discussed in this analysis was identified and corrected on Monday, June 30.

Problem Description, Impact, and Resolution 

Recently, to give PubNub access to more cloud server capacity across our many regions, we introduced new instance types, making the set of machines on which PubNub's services run more heterogeneous. Over time, PubNub has developed many OS/kernel-level configurations to optimize the performance of each server. However, with the new instance types in the mix, an underlying setting that we explicitly specify, which controls limits on network connectivity, was being silently overridden by our upstream load balancers. As a result, servers running on the new instance types reached these connectivity limits. Unfortunately, the errors we initially encountered pointed us in the wrong direction, so the investigation took longer than we normally strive for.

Once we identified the cause, we mitigated the issue by configuring the affected services to run on other instance types and launching additional capacity.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent recurrence, we modified the new instance types to emit metrics related to these OS thresholds and limits, enabling us to detect when these limits are approached or exceeded, regardless of instance type. This change allows us to scale proactively and route traffic appropriately based on instance type, making our deployments across heterogeneous instance types more adaptable.
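This report does not name the specific OS limit or the metrics pipeline involved, so the following is only a rough, hypothetical sketch of the kind of check described above: compare each host's current usage of a connectivity-related limit against the limit actually in effect on that host, and flag hosts approaching or exceeding it. All names (`LimitSample`, `classify`, `hosts_needing_action`) and the 80%/95% thresholds are illustrative assumptions, not PubNub's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitSample:
    """One observation of a connectivity limit on one host (hypothetical schema)."""
    host: str
    instance_type: str
    metric: str    # hypothetical name for the OS limit being tracked
    current: int   # usage reported by the host at sample time
    limit: int     # effective limit observed on the host, not the intended config value


def utilization(sample: LimitSample) -> float:
    """Return the fraction of the effective limit currently in use."""
    if sample.limit <= 0:
        raise ValueError(f"invalid limit {sample.limit} for {sample.host}")
    return sample.current / sample.limit


def classify(sample: LimitSample, warn: float = 0.80, critical: float = 0.95) -> str:
    """Map utilization to an alert level; thresholds are illustrative assumptions."""
    u = utilization(sample)
    if u >= critical:
        return "critical"
    if u >= warn:
        return "warning"
    return "ok"


def hosts_needing_action(samples, warn: float = 0.80, critical: float = 0.95):
    """List (host, instance_type, level) for every sample not at the 'ok' level."""
    flagged = []
    for s in samples:
        level = classify(s, warn, critical)
        if level != "ok":
            flagged.append((s.host, s.instance_type, level))
    return flagged
```

The key design choice in this sketch is that `limit` comes from what the host actually reports rather than from the value we meant to configure, since this incident showed the two can silently differ; grouping the flagged results by `instance_type` would then show whether one instance type is consistently closer to its limits than the others.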

We apologize for the incidents outlined above and remain committed to maintaining transparency when issues affect our customers. Should you have any questions regarding this analysis, please reach out to our support team at support@pubnub.com.

Posted Jul 03, 2025 - 18:50 UTC

Resolved

Between approximately 14:02 and 14:09 UTC, PubNub services experienced elevated latencies and server errors in the Europe region. PubNub Technical Staff investigated, and the issue has been resolved.
Posted Jun 27, 2025 - 15:52 UTC