Incident date and time: 7/18/2018, intermittently between 21:05 and 21:14 UTC
Affected Services: History and Channel Groups
Problem Description, Impact and Resolution:
We received a large spike in traffic, which temporarily caused high latency and a small rate of errors for Channel Groups and History. Both automatic and manual operational resiliency measures were deployed, after which latency and error rates returned to normal.
Mitigation Steps and Recommended Future Preventative Measures:
As we continually strive to be the most reliable network, some of the manual interventions that took place will soon be automated to reduce the time to resolution. While we are happy this incident was identified and resolved through internal alerting and monitoring, we are actively improving automated traffic shaping and elasticity.