Errors When Making API Calls To PubNub Services
Incident Report for PubNub
Postmortem

Problem Description & Impact

At 12:08 pm PDT on 2020/06/21, encrypted end-user requests into the PubNub Data Stream Network began failing for pubsub.pubnub.com and other subdomains of pubnub.com. Requests over TLS or SSL to any PubNub service on these domains failed until service was restored globally at approximately 1:25 pm PDT. All other origins, including custom origins and subdomains of pubnubapi.com and pndsn.com, were unaffected.

Root Cause

The root cause was determined to be a TLS/SSL certificate for *.pubnub.com origins that expired at 12:08 pm PDT on 2020/06/21.

Impact

Between 12:08 pm PDT and 1:25 pm PDT on 2020/06/21, all new end-user connections into the PubNub Data Stream Network failed for pubsub.pubnub.com and other subdomains of pubnub.com. This affected requests over TLS or SSL to every PubNub service on these origins, including Publish/Subscribe, Storage, Presence, Analytics, Functions, and Access Manager.

Other origins, including *.pubnubapi.com and *.pndsn.com, were unaffected. Existing long-lived connections to *.pubnub.com origins may also not have been impacted.

Resolution

Initial mitigation involved migrating pubsub.pubnub.com and other *.pubnub.com origins via DNS to our new edge systems, which serve an updated TLS certificate. This change was completed at 1:19 pm PDT and may have required an additional 300 seconds of DNS propagation. This restored service for all TLS end-users.
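As a rough check on the timing above, the following sketch adds the 300 seconds of propagation to the 1:19 pm PDT completion time. It assumes the 300 seconds correspond to the DNS record TTL and that resolvers honor it; neither detail is confirmed here beyond the figure itself.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset for PDT; sufficient for a single date in June 2020.
PDT = timezone(timedelta(hours=-7), "PDT")

# Time the DNS change was completed, as stated in this report.
dns_change_completed = datetime(2020, 6, 21, 13, 19, tzinfo=PDT)

# Assumption: the 300 seconds of propagation is the record TTL.
ttl = timedelta(seconds=300)

# Latest moment a resolver that cached the old record just before the
# change would still hand out the old address.
latest_cache_expiry = dns_change_completed + ttl

print(latest_cache_expiry.isoformat())
print(latest_cache_expiry.astimezone(timezone.utc).isoformat())
```

Under these assumptions the last stale answers would age out at 1:24 pm PDT (20:24 UTC), which lines up with service being restored globally at approximately 1:25 pm PDT.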

However, the new edge systems do not support SSLv3, which accounts for a very small fraction of overall traffic. Restoring SSLv3 service required a global deployment of a new certificate and a migration back to the original edge system, which was completed at 11:33 am PDT on 2020/06/22. Between 1:19 pm PDT on 2020/06/21 and 11:33 am PDT on 2020/06/22, all TLS 1.0, TLS 1.1, and TLS 1.2 requests should have succeeded, but SSLv3 requests would have failed.
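For client developers who want to confirm they do not depend on SSLv3, here is a minimal sketch using Python's standard ssl module. It is not taken from any PubNub SDK; the helper name make_client_context is hypothetical, and the choice of TLS 1.2 as the floor is an assumption for illustration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client TLS context that refuses anything older than TLS 1.2.

    Hypothetical helper for illustration only.
    """
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption: TLS 1.2 is an acceptable minimum for the client in question.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version)
```

A client built on such a context would have been on the TLS path that was restored by the initial mitigation, rather than on the SSLv3 path that stayed down until the following day.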

Mitigation Steps and Recommended Future Preventative Measures

  • Additional monitoring of all TLS/SSL endpoints
  • Validation of existing TLS/SSL validity alerts
  • Deprecation of SSLv3 and move to new edge systems with managed certificates
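The monitoring and alert-validation items above could be approached along the following lines. This is only a sketch using Python's standard library, not a description of PubNub's actual tooling; the function names, the 30-day threshold, and the example timestamp are all hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jun 21 19:08:00 2020 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_alert(not_after: str, now: datetime, threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) < threshold_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the served certificate's notAfter field.

    With the default context, an already expired certificate makes the
    handshake raise ssl.SSLCertVerificationError, which is itself an alert.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Illustrative timestamp: 12:08 pm PDT on 2020/06/21 is 19:08 UTC.
# The seconds value is an assumption.
example_not_after = "Jun 21 19:08:00 2020 GMT"
one_week_before = datetime(2020, 6, 14, 19, 8, 0, tzinfo=timezone.utc)
print(needs_alert(example_not_after, one_week_before))
```

Validating the alert itself, the second item in the list, amounts to feeding a check like needs_alert a timestamp known to be inside the threshold and confirming that a notification actually fires.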
Posted Jun 25, 2020 - 17:09 UTC

Resolved
This incident has been resolved.
Posted Jun 21, 2020 - 21:55 UTC
Monitoring
All traffic using modern security protocols is fully operational. We have moved into the monitoring phase and will update this status page if there are any issues.
Posted Jun 21, 2020 - 21:19 UTC
Update
We are seeing the vast majority of customer API traffic return to normal behavior. The very small percentage of consumers of our service using the very old SSLv3 protocol may continue to have connectivity issues.
Posted Jun 21, 2020 - 20:42 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jun 21, 2020 - 20:15 UTC
Update
An expired certificate has been identified as the root cause. We are presently re-directing traffic to reduce impact to customers' usage of PubNub as much as possible. We expect most API calls to recover successfully within the next 10-15 minutes.
Posted Jun 21, 2020 - 20:14 UTC
Investigating
We are currently experiencing service degradation, with some users seeing errors when making API calls to the PubNub service. We are investigating and will update ASAP.
Posted Jun 21, 2020 - 20:09 UTC