US East latencies have recovered. The affected services were History Writes, Presence Writes and Push Writes. History, Presence and Push read events were not affected. Publish and Subscribe, Stream Controller, BLOCKS and Access Manager were also not affected.
The root cause was traced to new hardware additions. As part of our standard operations practice of upgrading hardware and software, we replaced older server types in US East with new ones. The older hardware used local ephemeral SSDs; the new servers use network-mounted SSDs with Provisioned IOPS. After initially operating successfully, the new hardware exhibited variable disk IO performance compared with the local ephemeral SSDs. Over time, this variability caused a backlog to build up in our event pipeline engine. The slowdown did not occur instantly, and the change was not noticeable or alertable under the current expectations in our metrics.
We recovered to normal latencies by adding disk IO throughput. Going forward, we plan to monitor latency variability on network-mounted SSDs and to ensure that enough IO capacity is available, taking into account the variable IO throughput that network-mounted SSDs provide.
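As an illustration only, the two follow-up items (watching for latency variability, and sizing IO capacity against variable throughput) could be sketched as below. This is a minimal sketch and not our production monitoring: the function names, the p99/p50 ratio threshold of 3.0, and the choice of the 5th percentile are all assumptions made for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def latency_is_variable(latencies_ms, max_ratio=3.0):
    """Flag a disk whose tail latency is far above its median.

    max_ratio is a hypothetical threshold: a p99/p50 ratio above it is
    treated as "variable" even if the average latency still looks healthy.
    """
    p50 = percentile(latencies_ms, 50)
    p99 = percentile(latencies_ms, 99)
    if p50 <= 0:
        return False  # no meaningful baseline to compare against
    return p99 / p50 > max_ratio


def has_io_headroom(throughput_iops, required_iops, low_pct=5):
    """Check capacity against a low percentile of observed throughput.

    Sizing against the mean hides slow periods; the assumption here is that
    the 5th percentile is a reasonable proxy for "throughput on a bad day".
    """
    return percentile(throughput_iops, low_pct) >= required_iops
```

For example, a volume that averages well above the required throughput can still fail the headroom check if its slowest samples fall below it, which is the kind of gradual backlog risk described above.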
If you have questions, please email us at support@pubnub.com.