On 25th April 2025 around 16:45 UTC, some Datto RMM partners on the Zinfandel (US WEST) platform experienced a service issue that caused Jobs to execute with a long delay.
The root cause of the incident was identified as resource exhaustion in the service that handles Job queuing and execution: despite automatic scaling, the available resources were insufficient to handle all requests in real time.
The Infrastructure team resolved the issue by cycling service tasks and manually scaling the infrastructure beyond the auto-scaling limit.
To reduce the risk of recurrence, the Infrastructure team increased the baseline of resources available to the service.
Further investigation will be carried out to identify opportunities to improve the efficiency of the supporting service.