Loading…
Wednesday, June 22 • 2:55pm - 3:20pm
FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication

Sign up or log in to save this to your schedule and see who's attending!

Content-Defined Chunking (CDC) has been playing a key role in data deduplication systems in the past 15 years or so due to its high redundancy detection abil- ity. However, existing CDC-based approaches introduce heavy CPU overhead because they declare the chunk cut- points by computing and judging the rolling hashes of the data stream byte by byte. In this paper, we pro- pose FastCDC, a Fast and efficient CDC approach, that builds and improves on the latest Gear-based CDC ap- proach, one of the fastest CDC methods to our knowl- edge. The key idea behind FastCDC is the combined use of three key techniques, namely, simplifying and enhanc- ing the hash judgment to address our observed challenges facing Gear-based CDC, skipping sub-minimum chunk cut-point to further speed up CDC, and normalizing the chunk-size distribution in a small specified region to ad- dress the problem of the decreased deduplication ratio stemming from the cut-point skipping. Our evaluation results show that, by using a combination of the three techniques, FastCDC is about 10x faster than the best of open-source Rabin-based CDC, and about 3x faster than the state-of-the-art Gear- and AE-based CDC, while achieving nearly the same deduplication ratio as the clas- sic Rabin-based approach.

Wednesday June 22, 2016 2:55pm - 3:20pm
Denver Marriott City Center 1701 California Street, Denver, CO 80202

Attendees (2)