On November 5, 2024, at 12:22 PM EST (5:22 PM UTC), partners on the Vidal (US East) platform experienced a service disruption that caused Managed Devices to go offline.
The incident was triggered by a slower-than-expected recovery from emergency maintenance required to address an infrastructure issue.
The Datto RMM Infrastructure team proactively identified, through its alerting systems, database resource exhaustion on the servers that manage device sessions. To resolve this, the team manually scaled the infrastructure and performed a failover, which necessitated emergency maintenance to prevent further device-offline alerts. During this time, a post was created on the Kaseya Status page to keep our partners informed.
The issue was confirmed resolved at 3:27 PM EST (8:27 PM UTC) on the same day.
To prevent a recurrence, the infrastructure team is currently reviewing platform utilization and growth projections to ensure sufficient resources are permanently allocated to support future demand.