Earlier today, our monitoring detected irregularities in system performance. Initially, there was no noticeable impact to clients. However, starting at 8:45 AM, service performance began to degrade, resulting in slower response times and delays. During this period, we received multiple client reports related to service responsiveness.
Resolution
Our engineering team immediately investigated the issue and identified a problem within the autoscaling automation of one service component. A corrective action was applied, and by 9:20 AM, system performance began to stabilize. Subsequent monitoring indicated that a small number of previously provisioned instances were still affected, which led to intermittent message delivery issues for a subset of requests. The team conducted additional investigation to trace and identify the specific servers involved. These instances were isolated and replaced, and service stability continued to improve. By 12:40 PM, message delivery and overall system performance had fully returned to normal. We continue to closely monitor the system to ensure stability.
Current Status
All systems are now operating normally, and the platform remains stable under continuous monitoring.
Next Steps
We are conducting a post-incident review to strengthen autoscaling safeguards and improve early detection, in order to prevent similar issues in the future. We appreciate your patience and understanding.