Thursday, June 23 • 5:05pm - 5:30pm
Getting Back Up: Understanding How Enterprise Data Backups Fail

In the enterprise world, retaining data backups is the de facto solution against data loss in the event of catastrophic failures. As backup software evolves to achieve faster backup and recovery times, however, the backup systems that deploy it become increasingly complex to administer. This complexity stems from optimizations targeted at specific applications, which increase the number of configuration parameters for the system. Still, no work in the literature attempts to study the error characteristics of enterprise backup systems, despite our reliance on the guarantees they provide.

With this study, we aim to help researchers and practitioners understand how backup system jobs fail and identify factors that can be used to predict these failures. Our results are derived from an analysis of data on 775 million jobs, collected from more than 20,000 backup software installations over a span of three years. We confirm that trends reported in the software reliability literature also hold for backup systems; for example, the majority of job errors are due to misconfigurations. For the systems in our dataset, we find that error rates remain stable across software versions and over time. To better understand these errors, we investigate the effect of several factors on the system’s error rate, such as job sizes and policy complexity, and demonstrate their predictive power for future errors.

Thursday June 23, 2016 5:05pm - 5:30pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
