Elevated History Error/Latency in Tokyo Region

Incident Report for PubNub

Postmortem

Problem Description, Impact, and Resolution

At 00:06 UTC on March 17, 2024, we observed increased error rates and latency for History calls in our Tokyo region. We identified that the errors and latency were caused by our third-party storage provider. We alerted the provider, which restarted the impacted storage nodes, and the issue was resolved at 00:47 UTC on March 17, 2024.

Mitigation Steps and Recommended Future Preventative Measures

To help prevent a similar issue in the future, we have added monitoring of swap space levels on our servers, so that we are alerted sooner if issues of this kind with our third-party provider recur.
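As a rough illustration of what a swap space check can look like, here is a minimal sketch. It is not our production monitoring. It assumes Linux hosts that expose /proc/meminfo, and the function names and the 0.8 alert threshold are hypothetical values chosen for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_fraction(meminfo):
    """Return the fraction of swap in use, or 0.0 if no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - meminfo.get("SwapFree", 0)) / total


def should_alert(meminfo, threshold=0.8):
    """Hypothetical alert rule: fire when swap usage reaches the threshold."""
    return swap_used_fraction(meminfo) >= threshold


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory statistics (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linux host, `should_alert(read_meminfo())` would evaluate this rule against the current swap usage. In practice the threshold would need tuning per workload.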

Posted Mar 20, 2024 - 18:28 UTC

Resolved

This incident has been resolved.

Posted Mar 17, 2024 - 01:24 UTC

Update

We have not seen any errors or increased latency for ~45 minutes. We will continue to monitor History to validate the resolution.

Posted Mar 17, 2024 - 01:24 UTC

Monitoring

History error rates and latency have returned to normal. We will continue to monitor the incident.

Posted Mar 17, 2024 - 01:06 UTC

Investigating

Around 00:06 UTC, we began to notice increasing errors and latency for History in the Tokyo region.
We are investigating this incident.

Posted Mar 17, 2024 - 00:43 UTC

This incident affected: Realtime Network (Storage and Playback Service).