Elevated History Error/Latency in Tokyo Region
Incident Report for PubNub
Postmortem

Problem Description, Impact, and Resolution 

At 00:06 UTC on March 17, 2024, we observed increased error rates and latency for History calls in our Tokyo region. We traced the errors and latency to our third-party storage provider. We alerted the provider, which restarted the impacted storage nodes, and the issue was resolved at 00:47 UTC on March 17, 2024.

Mitigation Steps and Recommended Future Preventative Measures 

To prevent a similar issue from recurring, we have added monitoring of swap space usage on our servers, which will give us better alerting if such issues with our third-party provider occur again.

Posted Mar 20, 2024 - 18:28 UTC

Resolved
This incident has been resolved.
Posted Mar 17, 2024 - 01:24 UTC
Update
We have not seen any errors or increased latency for ~45 minutes. We will continue to monitor History to validate the resolution.
Posted Mar 17, 2024 - 01:24 UTC
Monitoring
History error rates and latency have returned to normal. We will continue to monitor the incident.
Posted Mar 17, 2024 - 01:06 UTC
Investigating
Around 00:06 UTC, we began to notice increasing errors and latency for History in the Tokyo region.
We are investigating this incident.
Posted Mar 17, 2024 - 00:43 UTC
This incident affected: Realtime Network (Storage and Playback Service).