Monday, June 20 • 9:50am - 10:15am
SEeSAW - Similarity Exploiting Storage for Accelerating Analytics Workflows


The key to successfully deploying big data solutions lies in the timely distillation of meaningful information. This is made difficult by the mismatch between the volume and velocity of data at scale and by the disparate speeds of the IO, CPU, memory and communication links in data storage and processing systems. Instead of viewing storage as a bottleneck in this pipeline, we believe that storage systems are best positioned to discover and exploit intrinsic data properties to enhance the information density of stored data. This has the potential to reduce the amount of new information that an analytics workflow needs to process. To explore this possibility, we propose SEeSAW, a Similarity Exploiting Storage for Accelerating Analytics Workflows that makes similarity a fundamental storage primitive. We show that SEeSAW transparently eliminates the need for applications to process uninformative data, thereby incurring substantially lower IO, memory, computation and communication costs while speeding up, by about 97% in our experiment, the rate at which actionable outcomes can be derived from the data. By increasing the capacity of analytics workloads to absorb more data within the same resource envelope, SEeSAW can open up rich opportunities to reap greater benefits from machine- and human-generated data accumulated from various sources.

Monday June 20, 2016 9:50am - 10:15am
Denver Marriott City Center 1701 California Street, Denver, CO 80202