At 15:45 UTC on 2022-06-27, we observed delays in push notifications sent and messages written to history, as well as excess presence join and leave events. In response, we scaled the underlying systems supporting these services, and the issue was resolved at 16:05 UTC. This issue occurred because our third-party service provider experienced an outage in the US-East PoP.
To prevent a similar issue from occurring in the future, we are updating our processes so that malfunctioning nodes are restarted in a way that preserves their state for analysis, and we are updating our runbook for scaling the system.