Starting on April 10 at 7:10 PM UTC, a 5% subset of the BCDR On Prem Device Partner fleet experienced a service interruption in which partners received false positive hard drive failure alerts for devices containing certain models of Western Digital HDDs.
The root cause of this service interruption was a deployed update to the way drive health is reported from the SMART data of disk drives in partner devices. This data is used to determine when to send a partner an alert to replace a failing drive.
Efforts to improve overall drive health reporting and make it clear and useful to our partners remain ongoing.
Our Engineering team deployed a fix to correct the problem on April 16 at 7:00 PM UTC.
To prevent this issue from recurring, we have added exceptions so that raw values are uploaded for all SMART statistics that drive hard drive alerting in the portal.
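
As an illustration only, the exception described above might look something like the sketch below. It is not our production code. The attribute list, the `SmartAttribute` type, the `build_upload_payload` function, and the payload layout are all hypothetical, and the sketch assumes that only normalized values would otherwise be uploaded.

```python
from dataclasses import dataclass

# Hypothetical allowlist: SMART attribute IDs assumed to feed portal alerting.
# IDs follow common usage (5 = reallocated sectors, 187 = reported uncorrectable,
# 197 = current pending sectors, 198 = offline uncorrectable); the real list
# used for alerting is not stated in this report.
ALERTING_ATTRIBUTE_IDS = {5, 187, 197, 198}


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    normalized: int  # vendor-scaled value
    raw: int         # unscaled counter reported by the drive


def build_upload_payload(attributes):
    """Build the per-drive upload, always including the raw value for
    attributes on the alerting allowlist (the 'exception')."""
    payload = []
    for attr in attributes:
        entry = {
            "id": attr.attr_id,
            "name": attr.name,
            "normalized": attr.normalized,
        }
        if attr.attr_id in ALERTING_ATTRIBUTE_IDS:
            # Exception: alerting attributes carry their raw value too.
            entry["raw"] = attr.raw
        payload.append(entry)
    return payload


sample = [
    SmartAttribute(5, "Reallocated_Sector_Ct", 200, 0),
    SmartAttribute(194, "Temperature_Celsius", 110, 37),
]
payload = build_upload_payload(sample)
```

In this sketch the reallocated-sector entry includes a `raw` field while the temperature entry does not, so alerting logic can evaluate the unscaled counter rather than a vendor-specific normalized value.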