There were intermittent errors and increased latency in Functions executions (XHR, Vault, and PubNub library calls), as well as minor impact to Presence, Storage, and Mobile Push Webhook calls. The impact was isolated to our Southern Asia PoP.
The root cause was a large spike in outbound request errors, which put stress on our system out of proportion to the volume of traffic it was processing. We scaled to meet demand, and we have also improved our monitoring and alerting around this edge case to minimize any future impact. We are also working to make the system more resilient to these issues so that they are prevented entirely.