Colorado G-J
Monday, June 20
 

9:00am MDT

Opening Remarks
Program Co-Chairs: Austin Clements, Google, and Tyson Condie, University of California, Los Angeles

Monday June 20, 2016 9:00am - 9:15am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

9:15am MDT

How Not to Bid the Cloud
Cloud providers have begun to allow users to bid for surplus servers on a spot market. These servers are allocated if a user’s bid price is higher than the market price and revoked otherwise. Thus, analyzing price data to derive optimal bidding strategies has become a popular research topic. In this paper, we argue that sophisticated bidding strategies, in practice, do not provide any advantages over simple strategies, for multiple reasons. First, due to price characteristics, there is a wide range of bid prices that yield the optimal cost and availability. Second, given the large number of spot markets, there is always a market with available surplus resources. Thus, if resources become unavailable due to a price spike, users need not wait until the spike subsides, but can instead provision a new spot resource elsewhere and migrate to it. Third, current spot market rules enable users to place maximum bids for resources without any penalty. Given bidding’s irrelevance, users can adopt trivial bidding strategies and focus instead on modifying applications to efficiently seek out and migrate to the lowest-cost resources.

Monday June 20, 2016 9:15am - 9:40am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
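The first argument in "How Not to Bid the Cloud" (that many different bids yield the same cost and availability) can be checked against any price trace. The sketch below uses a hypothetical hourly trace and assumes, as in EC2's spot market of that era, that a held server is charged the market price rather than the bid. `evaluate_bid` and `PRICES` are illustrative names, not part of the paper.

```python
# Hypothetical hourly spot prices in $/hour (illustrative, not real market data).
PRICES = [0.10, 0.11, 0.10, 0.12, 0.35, 0.10, 0.11, 0.10]


def evaluate_bid(bid, prices):
    """Return (availability, average $/hour while held) for a fixed bid.

    Assumption: the server is held in every hour whose market price is
    below the bid, and each held hour is charged the market price, not the bid.
    """
    held = [p for p in prices if bid > p]
    availability = len(held) / len(prices)
    avg_cost = sum(held) / len(held) if held else None
    return availability, avg_cost


if __name__ == "__main__":
    for bid in (0.11, 0.13, 0.20, 0.30, 1.00):
        availability, avg_cost = evaluate_bid(bid, PRICES)
        cost = "n/a" if avg_cost is None else f"{avg_cost:.4f}"
        print(f"bid={bid:.2f} availability={availability:.3f} avg_cost={cost}")
```

In this trace, every bid above 0.12 and up to 0.35 gives identical availability and cost, because no market price falls in that interval; only a bid above the spike changes the outcome.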

9:40am MDT

QoX: Quality of Service and Consumption in the Cloud
Cloud services today are increasingly built using functionality from other running services. In this paper, we question whether legacy Quality of Service (QoS) metrics and enforcement techniques are sufficient, as they are producer-centric. We argue that, similar to the customer rating systems found in banking and in many sharing-economy apps (e.g., Uber and Airbnb), Quality of Consumption (QoC) should be introduced to capture metrics about service consumers. We show how the combination of QoS and QoC, dubbed QoX, can be used by consumers and providers to improve the security and management of their infrastructure. In addition, we demonstrate how sharing information among consumers and providers increases the value of QoX. To address the main challenges of sharing information, namely Sybil attacks and misinformation, we describe how cloud providers can be leveraged as vouching authorities to ensure the integrity of shared information. We explore the motivations, challenges, and potential of introducing such a framework in the cloud environment.

Monday June 20, 2016 9:40am - 10:05am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

10:05am MDT

Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees
Computational spot markets enable users to bid on servers, and then continuously allocate them to the highest bidder: if a user is “outbid” for a server, the market revokes it and reallocates it to the new highest bidder. Spot markets are common when trading commodities to balance real-time supply and demand; cloud platforms use them to sell their idle capacity, which varies over time. However, server-time differs from other commodities in that it is “stateful”: losing a spot server incurs an overhead that decreases the useful work it performs. Thus, variations in the spot price affect the inherent value of server-time bought in the spot market. As the spot market matures, we argue that price volatility will significantly decrease the value of spot servers. Thus, somewhat counter-intuitively, spot markets may not maximize the value of idle server capacity. To address the problem, we propose a more sustainable alternative that offers a variable amount of idle capacity to users for a fixed price, but with transient guarantees.

Monday June 20, 2016 10:05am - 10:30am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
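The "stateful" argument in "Cloud Spot Markets are Not Sustainable" can be made concrete with a back-of-the-envelope model. The sketch assumes, purely for illustration, that every revocation wastes a fixed amount of paid time (for example, time spent restoring state on a new server). `effective_price` and its parameters are hypothetical and are not the paper's cost model.

```python
def effective_price(price_per_hour, hours, revocations, overhead_hours):
    """Price paid per hour of *useful* work on a revocable server.

    Hypothetical linear model: each revocation wastes `overhead_hours`
    of the `hours` that were paid for.
    """
    useful_hours = hours - revocations * overhead_hours
    if useful_hours <= 0:
        return float("inf")  # all paid time was lost to revocation overhead
    return price_per_hour * hours / useful_hours


if __name__ == "__main__":
    # Same nominal price, increasing volatility (more revocations).
    for revocations in (0, 10, 50, 100):
        print(revocations, round(effective_price(0.10, 100, revocations, 0.5), 4))
```

Under this model the nominal price never changes, yet the price per useful hour doubles once revocations consume half of the paid time, which is the sense in which volatility erodes the value of spot server-time.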

11:15am MDT

Interactive Debugging for Big Data Analytics
An abundance of data in many disciplines has accelerated the adoption of distributed technologies such as Hadoop and Spark, which provide simple programming semantics and an active ecosystem. However, the current cloud computing model lacks the kinds of expressive and interactive debugging features found in traditional desktop computing. We seek to address these challenges with the development of BIGDEBUG, a framework providing interactive debugging primitives and tool-assisted fault localization services for big data analytics. We showcase the data provenance and optimized incremental computation features to effectively and efficiently support interactive debugging, and investigate new research directions on how to automatically pinpoint and repair the root cause of errors in large-scale distributed data processing.

Monday June 20, 2016 11:15am - 11:40am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
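Record-level data provenance, one of the features BigDebug builds on, means being able to trace a suspicious output back to the input records that produced it. The sketch below is a toy, single-machine illustration of that idea for one-to-one `map` and `filter` stages only. `run_with_provenance` and `find_culprits` are hypothetical names, and BigDebug itself operates on Spark rather than on Python lists.

```python
def run_with_provenance(inputs, stages):
    """Run map/filter stages, tagging every record with its input index.

    inputs: list of raw records.
    stages: list of ("map", fn) or ("filter", predicate) pairs.
    Returns a list of (input_index, output_value) pairs.
    """
    tagged = list(enumerate(inputs))
    for kind, fn in stages:
        if kind == "map":
            tagged = [(i, fn(v)) for i, v in tagged]
        elif kind == "filter":
            tagged = [(i, v) for i, v in tagged if fn(v)]
        else:
            raise ValueError(f"unsupported stage: {kind}")
    return tagged


def find_culprits(outputs, is_valid):
    """Return the input indices whose outputs fail the validity check."""
    return [i for i, v in outputs if not is_valid(v)]


if __name__ == "__main__":
    raw = ["12", "7", "-5", "30"]
    out = run_with_provenance(raw, [("map", int), ("filter", lambda x: x != 7)])
    bad = find_culprits(out, lambda x: x >= 0)
    print([raw[i] for i in bad])  # the raw input behind the invalid output
```

Aggregations such as joins or reduces would need each output to carry a set of input indices rather than a single one, which this sketch does not attempt.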

11:40am MDT

Ovid: A Software-Defined Distributed Systems Framework
We present Ovid, a framework for building evolvable large-scale distributed systems that run in the cloud. Ovid constructs and deploys distributed systems as a collection of simple components, creating systems suited for containerization in the cloud. Ovid supports evolution of systems through transformations, which are automated refinements. Examples of transformations include replication, batching, sharding, and encryption. Ovid transformations guarantee that an evolving system still implements the same specification. Moreover, systems built with transformations can be combined with other systems to implement more complex infrastructure services. The result of this framework is a software-defined distributed system, in which a logically centralized controller specifies the components, their interactions, and their transformations.

Monday June 20, 2016 11:40am - 12:05pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
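Ovid's central idea, a transformation that refines a component without changing the specification it implements, can be illustrated with sharding. In the sketch below, `KVStore`, `Sharded`, and the CRC32 placement are hypothetical stand-ins rather than Ovid's API; the point is only that the transformed system exposes the same `put`/`get` behavior as the original.

```python
import zlib


class KVStore:
    """A single key-value component (the untransformed system)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class Sharded:
    """Hypothetical sharding transformation: same interface, state split
    across n independent components chosen by a deterministic hash."""

    def __init__(self, make_component, n):
        self.shards = [make_component() for _ in range(n)]

    def _shard_for(self, key):
        # crc32 is stable across processes, unlike Python's salted str hash.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key).put(key, value)

    def get(self, key):
        return self._shard_for(key).get(key)


if __name__ == "__main__":
    ops = [("put", "a", 1), ("put", "b", 2), ("get", "a"), ("put", "a", 3),
           ("get", "a"), ("get", "c")]
    single, sharded = KVStore(), Sharded(KVStore, 4)
    for system in (single, sharded):
        results = [getattr(system, op[0])(*op[1:]) for op in ops]
        print(type(system).__name__, results)
```

Because `Sharded` accepts any component factory, the same wrapper could in principle be applied to an already-transformed component, loosely mirroring how Ovid composes transformations.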

12:05pm MDT

Serverless Computation with OpenLambda
We present OpenLambda, a new, open-source platform for building next-generation web services and applications in the burgeoning model of serverless computation. We describe the key aspects of serverless computation, and present numerous research challenges that must be addressed in the design and implementation of such systems. We also include a brief study of current web applications, so as to better motivate some aspects of serverless application construction.

Monday June 20, 2016 12:05pm - 12:30pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

2:00pm MDT

Accelerating Complex Data Transfer for Cluster Computing
The ability to move data quickly between the nodes of a distributed system is important for the performance of cluster computing frameworks such as Hadoop and Spark. We show that in a cluster with modern networking technology, data serialization is the main bottleneck and source of overhead in the transfer of rich data in systems based on high-level programming languages such as Java. We propose a new data transfer mechanism that avoids serialization altogether by using a shared cluster-wide address space to store data. We describe the design and a prototype implementation of this approach. We show that our mechanism is significantly faster than serialized data transfer, and propose a number of possible applications for it.

Monday June 20, 2016 2:00pm - 2:25pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

2:25pm MDT

HyperOptics: A High Throughput and Low Latency Multicast Architecture for Datacenters
Multicast has long been a performance bottleneck for datacenters. Traditional solutions relying on IP multicast suffer from poor congestion control and loss recovery on the data plane, as well as slow and complex group membership and multicast tree management on the control plane. Some recent proposals have employed alternate optical circuit-switched paths to enable lossless multicast, together with a centralized control architecture to quickly configure multicast trees. However, the high circuit reconfiguration delay of optical switches has substantially limited multicast performance.

In this paper, we propose to eliminate this reconfiguration delay with an unconventional optical multicast architecture called HyperOptics, which directly interconnects top-of-rack (ToR) switches with low-cost optical splitters and thereby removes the need for optical switches. The ToRs are connected so as to form a regular graph. We analytically show that this architecture is scalable and efficient for multicast. Preliminary simulations show that multicasts on HyperOptics can run, on average, 2.1x faster than on an optical circuit-switched network.

Monday June 20, 2016 2:25pm - 2:50pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

2:50pm MDT

Graviton: Twisting Space and Time to Speed-up CoFlows
In this paper, we observe that Aalo, by using multiple priority queues and weighted fair sharing at each port, does a good job of approximating shortest-job-first (SJF) scheduling, but only at queue granularity: within each queue it schedules CoFlows in FIFO order, which is rather simplistic and bears no resemblance to SJF.

We then discuss three insights into Aalo’s scheduler, showing that exploiting the spatial dimension of the problem domain, i.e., the width (number of ports) of each CoFlow, can lead to better scheduling policies within each priority queue and thereby improve the overall CoFlow completion time (CCT).

Monday June 20, 2016 2:50pm - 3:15pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
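The gap between FIFO and SJF inside a single queue is easy to see in a deliberately simplified model: one bottleneck port, all CoFlows already queued at time zero, and CoFlows served one at a time at full bandwidth. These assumptions, and the `CoFlow` and `average_cct` names, are illustrative. They are not Aalo's or Graviton's actual scheduler, and Aalo in particular does not know CoFlow sizes in advance.

```python
from dataclasses import dataclass


@dataclass
class CoFlow:
    name: str
    size: float  # data to send through the bottleneck port (arbitrary units)


def average_cct(order, bandwidth=1.0):
    """Average CoFlow completion time when CoFlows are served back to back."""
    clock = 0.0
    total = 0.0
    for coflow in order:
        clock += coflow.size / bandwidth  # this CoFlow finishes at `clock`
        total += clock
    return total / len(order)


if __name__ == "__main__":
    queue = [CoFlow("A", 8), CoFlow("B", 1), CoFlow("C", 2)]  # arrival order
    fifo = average_cct(queue)
    sjf = average_cct(sorted(queue, key=lambda c: c.size))
    print(f"FIFO avg CCT = {fifo:.2f}, SJF avg CCT = {sjf:.2f}")
```

Here FIFO yields an average CCT of about 9.33 versus 5.00 for SJF, because the large CoFlow A delays everything queued behind it. The abstract's point is that observable properties such as CoFlow width can help close this gap within each queue.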

4:00pm MDT

Mlcached: Multi-level DRAM-NAND Key-value Cache
We present Mlcached, a multi-level DRAM-NAND key-value cache designed to enable independent resource provisioning of DRAM and NAND flash memory by completely decoupling the caching layers. Mlcached uses DRAM for the L1 cache and our new KV-cache device for the L2 cache. An index-integrated FTL is implemented in the KV-cache device to eliminate any in-memory indexes that would prevent independent resource provisioning. We show that, with an 80% L1 cache hit ratio, Mlcached's average RTT is only 12.8% higher than that of a DRAM-only Web caching service, while saving two-thirds of its TCO. Moreover, our model-based study shows that Mlcached can provide up to 6X lower cost or 4X lower latency at the same SLA or TCO, respectively.

Monday June 20, 2016 4:00pm - 4:25pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
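The lookup path of a two-level cache like Mlcached can be sketched as follows. This is a toy built on assumptions, not Mlcached's design: both levels are plain LRU dictionaries, L1 evictions are demoted into L2, and L2 hits are promoted back to L1 (an exclusive policy chosen for illustration). In Mlcached the L2 index lives in the KV-cache device's FTL rather than in host memory, which a Python dict cannot model.

```python
from collections import OrderedDict


class LRU:
    """Fixed-capacity LRU map built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def __contains__(self, key):
        return key in self.data

    def get(self, key):
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def pop(self, key):
        return self.data.pop(key)

    def put(self, key, value):
        """Insert and return the evicted (key, value) pair, or None."""
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            return self.data.popitem(last=False)
        return None


class TwoLevelCache:
    def __init__(self, l1_capacity, l2_capacity, backend):
        self.l1 = LRU(l1_capacity)  # stands in for the DRAM layer
        self.l2 = LRU(l2_capacity)  # stands in for the NAND KV-cache device
        self.backend = backend      # called on a miss in both levels
        self.stats = {"l1": 0, "l2": 0, "miss": 0}

    def get(self, key):
        if key in self.l1:
            self.stats["l1"] += 1
            return self.l1.get(key)
        if key in self.l2:
            self.stats["l2"] += 1
            value = self.l2.pop(key)  # promote: remove from L2 ...
        else:
            self.stats["miss"] += 1
            value = self.backend(key)
        evicted = self.l1.put(key, value)  # ... and insert into L1
        if evicted is not None:
            self.l2.put(*evicted)  # demote the L1 victim; L2 victims are dropped
        return value


if __name__ == "__main__":
    cache = TwoLevelCache(2, 4, backend=str.upper)
    for k in ["a", "b", "c", "a", "c"]:
        cache.get(k)
    print(cache.stats)
```

Keeping the two levels as separate objects with their own capacities mirrors, in a very loose way, the abstract's goal of provisioning DRAM and NAND independently.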

4:25pm MDT

When Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration
FPGA-enabled datacenters have shown great potential for improving performance and energy efficiency. In this paper, we aim to answer one key question: how can we efficiently integrate FPGAs into state-of-the-art big-data computing frameworks like Apache Spark? To provide a generalized methodology and insights for efficient integration, we conduct an in-depth analysis of the challenges at the single-thread, single-node multi-thread, and multi-node levels, and propose solutions, including batch processing and the FPGA-as-a-Service framework, to address them. With a step-by-step case study of a next-generation DNA sequencing application, we demonstrate how a straightforward integration with a 1,000x slowdown can be turned into an efficient integration with a 2.6x overall system speedup and a 2.4x energy efficiency improvement.

Monday June 20, 2016 4:25pm - 4:50pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
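The benefit of batch processing when offloading work to an FPGA comes from amortizing a fixed per-invocation cost (for example, transfer setup and driver calls) over many records. The cost model and numbers below are hypothetical, chosen only to show the shape of the effect; they are not measurements from the paper.

```python
import math


def total_time(n_items, batch_size, overhead_s, per_item_s):
    """Seconds to process n_items when every accelerator call pays a fixed
    overhead and each item adds per_item_s of compute/transfer time."""
    calls = math.ceil(n_items / batch_size)
    return calls * overhead_s + n_items * per_item_s


if __name__ == "__main__":
    # Hypothetical: 1 ms per FPGA invocation, 1 us of work per item.
    for batch in (1, 10, 100, 1000, 10000):
        t = total_time(10_000, batch, overhead_s=1e-3, per_item_s=1e-6)
        print(f"batch={batch:>5}: {t:.4f} s")
```

In this model the per-call overhead dominates at batch size 1 (about 10 s in total) and becomes negligible by batch size 1,000 (about 0.02 s), after which the per-item term sets the floor.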

4:50pm MDT

Unikernel Monitors: Extending Minimalism Outside of the Box
Recently, unikernels have emerged as an exploration of minimalist software stacks to improve the security of applications in the cloud. In this paper, we propose extending the notion of minimalism beyond an individual virtual machine to include the underlying monitor and the interface it exposes. We propose unikernel monitors: each unikernel is bundled with a tiny, specialized monitor that only contains what the unikernel needs, both in terms of interface and implementation. Unikernel monitors improve isolation through minimal interfaces, reduce complexity, and boot unikernels quickly. Our initial prototype, ukvm, is less than 5% the code size of a traditional monitor, and boots MirageOS unikernels in as little as 10ms (8× faster than a traditional monitor).

Monday June 20, 2016 4:50pm - 5:15pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
 
Tuesday, June 21
 

11:00am MDT

Mapping Cross-Cloud Systems: Challenges and Opportunities
Recent years have seen significant growth in the cloud computing market, both in terms of provider competition (including private offerings) and customer adoption. However, the cloud computing world still lacks widely adopted standard programming interfaces, which has a knock-on effect on the costs associated with interoperability and severely limits the flexibility and portability of applications and virtual infrastructures. This has brought about an increasing number of cross-cloud architectures, i.e., systems that span cloud provisioning boundaries. This paper condenses discussions from the CrossCloud event series to outline the types of cross-cloud systems and their associated design decisions, and discusses the challenges and opportunities they create.

Tuesday June 21, 2016 11:00am - 11:25am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

11:25am MDT

Towards a Network Marketplace in a Cloud
Virtually all public clouds today are run by single providers; this creates near-monopolies and inefficient markets, and hinders innovation at the infrastructure level. There are current proposals to change this by creating open architectures that allow providers of computing and storage resources to compete for tenant services at multiple levels, all the way down to the bare metal. Networking, however, is not part of these proposals and is viewed as a commodity, much like power or cooling. In this paper, we borrow ideas from the Internet architecture and propose to structure the cloud datacenter network as a marketplace where multiple service providers can offer connectivity services to tenants. Our marketplace, NetEx, divides the network into independently managed pods of resources, interconnected with multiple providers through special programmable switches that play a role analogous to that of an Internet exchange point (IXP). We demonstrate the feasibility of such an architecture with a prototype in Mininet, and argue that this can be a way to foster innovation, competition, and efficiency in future cloud datacenter networks.

Tuesday June 21, 2016 11:25am - 11:50am MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

11:50am MDT

Neutrality in Future Public Clouds: Implications and Challenges
With public cloud providers poised to become indispensable utility providers, neutrality-related mandates will likely emerge to ensure a level playing field among their customers (“tenants”). We draw an analogy with net neutrality to discuss: (i) what form cloud neutrality might take, (ii) what lessons the net neutrality debate might offer, and (iii) in what ways cloud neutrality would differ from (and be even more difficult than) net neutrality. We use idealized thought experiments and simple workload case studies to illustrate our points and conclude with a discussion of challenges and future directions. Our paper points to a rich and important area for future work.

Tuesday June 21, 2016 11:50am - 12:15pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

2:00pm MDT

Scalable Cloud Security via Asynchronous Virtual Machine Introspection
Software will always be vulnerable to attacks. Although techniques exist that could prevent or limit the risk of exploits, performance overhead blocks their adoption. Services deployed into the cloud are typically customer facing, leaving them even more exposed to attacks from malicious users. However, the use of virtual machines, and the economy of scale found in cloud platforms, provides an opportunity to offer strong security guarantees to tenants at low cost to the cloud provider. We present ScaaS, a security Scanning as a Service framework for cloud platforms that uses frequent virtual machine checkpointing coupled with memory introspection techniques to detect bugs and malicious behavior in real time. By buffering VM outputs (i.e., outgoing network packets and disk writes) until a scan has been completed, ScaaS gives strong guarantees about the amount of damage an attack can do, while minimizing overheads.

Tuesday June 21, 2016 2:00pm - 2:25pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
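The output-buffering guarantee in ScaaS rests on a familiar idea: externally visible effects produced during an epoch are held back until that epoch's scan passes, much as checkpoint-based replication holds output until a checkpoint commits. The sketch below models only that release logic. `OutputBuffer`, `emit`, and `end_epoch` are hypothetical names, the scan is a synchronous callback rather than ScaaS's asynchronous introspection, and VM rollback is out of scope.

```python
class OutputBuffer:
    """Holds a VM's outgoing packets and disk writes until a scan approves them."""

    def __init__(self):
        self.pending = []   # outputs produced in the current epoch
        self.released = []  # outputs that have left the buffer (externally visible)

    def emit(self, output):
        self.pending.append(output)

    def end_epoch(self, scan_is_clean):
        """Close the epoch; release its outputs only if the scan found nothing.

        scan_is_clean: zero-argument callable standing in for the
        introspection scan of this epoch's checkpoint.
        Returns True if outputs were released, False if they were discarded.
        """
        if scan_is_clean():
            self.released.extend(self.pending)
            self.pending.clear()
            return True
        self.pending.clear()  # an attack's effects never leave the buffer
        return False


if __name__ == "__main__":
    buf = OutputBuffer()
    buf.emit("packet: HTTP 200")
    buf.emit("disk write: log line")
    buf.end_epoch(lambda: True)
    buf.emit("packet: exfiltrated secrets")
    buf.end_epoch(lambda: False)
    print(buf.released)
```

The bound on damage comes from the epoch length: at most one epoch's worth of outputs can be pending when an attack is detected, and none of it becomes visible.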

2:25pm MDT

Low-Profile Source-side Deduplication for Virtual Machine Backup
This paper presents a source-side backup scheme with low resource usage, based on collaborative deduplication and approximate lazy deletion, for settings where frequent virtual machine snapshot backups are required in a large-scale cloud cluster. The key ideas are to orchestrate multi-round duplicate detection batches among machines in a partitioned, asynchronous manner, and to remove most unreferenced content chunks with approximate snapshot deletion. This paper discusses the challenges, the main design and strategies, and evaluation results.

Tuesday June 21, 2016 2:25pm - 2:50pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
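Source-side deduplication means that the machine being backed up fingerprints its data and ships only the chunks the backup store does not already hold. The sketch shows that core step with fixed-size chunks and SHA-256 fingerprints. The chunk size and the `ChunkStore` class are hypothetical, and the paper's multi-round, partitioned duplicate detection and approximate lazy deletion are not modeled.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


class ChunkStore:
    """Content-addressed backup store keyed by SHA-256 fingerprint."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def backup(self, snapshot: bytes):
        """Store a snapshot; return (recipe, number of newly stored chunks)."""
        recipe, new_chunks = [], 0
        for offset in range(0, len(snapshot), CHUNK_SIZE):
            chunk = snapshot[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:  # duplicate chunks are never re-sent
                self.chunks[fp] = chunk
                new_chunks += 1
            recipe.append(fp)
        return recipe, new_chunks

    def restore(self, recipe):
        return b"".join(self.chunks[fp] for fp in recipe)


if __name__ == "__main__":
    store = ChunkStore()
    day1 = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    day2 = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # one chunk changed
    _, sent1 = store.backup(day1)
    recipe2, sent2 = store.backup(day2)
    print(sent1, sent2, store.restore(recipe2) == day2)
```

The recipe (the ordered list of fingerprints) is all a snapshot needs to be restored, which is also why deleting a snapshot cannot simply delete its chunks: other recipes may still reference them.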

2:50pm MDT

Design Patterns for Container-based Distributed Systems
In the late 1980s and early 1990s, object-oriented programming revolutionized software development, popularizing the approach of building applications as collections of modular components. Today we are seeing a similar revolution in distributed system development, with the increasing popularity of microservice architectures built from containerized software components. Containers [15] [22] [1] [2] are particularly well-suited as the fundamental “object” in distributed systems by virtue of the walls they erect at the container boundary. As this architectural style matures, we are seeing the emergence of design patterns, much as we did for object-oriented programs, and for the same reason: thinking in terms of objects (or containers) abstracts away the low-level details of code, eventually revealing higher-level patterns that are common to a variety of applications and algorithms.

This paper describes three types of design patterns that we have observed emerging in container-based distributed systems: single-container patterns for container management, single-node patterns of closely cooperating containers, and multi-node patterns for distributed algorithms. Like object-oriented patterns before them, these patterns for distributed computation encode best practices, simplify development, and make the systems where they are used more reliable.

Tuesday June 21, 2016 2:50pm - 3:15pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

4:00pm MDT

An Experiment on Bare-Metal BigData Provisioning
Many BigData customers use on-demand platforms in the cloud, where they can get a dedicated virtual cluster in a couple of minutes and pay only for the time they use. Increasingly, there is demand for bare-metal BigData solutions for applications that cannot tolerate the unpredictability and performance degradation of virtualized systems. Existing bare-metal solutions can introduce delays of tens of minutes to provision a cluster, because they install operating systems and applications on the local disks of servers. This has motivated recent research into sophisticated mechanisms to optimize this installation. These approaches assume that network-mounted boot disks incur unacceptable run-time overhead. Our analysis suggests that while this assumption holds for application data, it is incorrect for operating systems and applications: network-mounting the boot disk and applications has negligible run-time impact while leading to faster provisioning.

Tuesday June 21, 2016 4:00pm - 4:25pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202

4:25pm MDT

The Tail at Scale: How to Predict It?
Scale-out applications have emerged as the dominant Internet services today. A request in a scale-out workload generally involves task partitioning and merging with barrier synchronization, making it difficult to predict the request tail latency and thus to meet stringent tail Service Level Objectives (SLOs). In this paper, we find that in the high-load region the request tail latency can be faithfully predicted by a model that takes only the mean and variance of the task response time as input. The prediction errors for the 99th-percentile request latency are consistently within 10% at 90% load, for both model-based and measurement-based test cases. Consequently, this work establishes an important link between request tail SLOs and low-order task statistics in the high-load region, where resource provisioning is desired. Finally, we discuss how the prediction model may facilitate highly scalable, tail-constrained resource provisioning for scale-out workloads.

Tuesday June 21, 2016 4:25pm - 4:50pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202
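One way to see how a tail percentile can follow from just a mean and a variance is the sketch below. It makes simplifying assumptions that may differ from the paper's model: a request fans out to `fanout` independent, identically distributed tasks and completes when the slowest task does, and each task's response time follows a shifted exponential distribution matched to the given mean and standard deviation. Under independence, P(max ≤ x) = F(x)^k, so the request's p-th percentile is the task distribution's p^(1/k) quantile. `predict_request_tail` is a hypothetical name.

```python
import math


def predict_request_tail(mean, std, fanout, percentile=0.99):
    """Predict a request's latency percentile from task mean and std dev.

    Assumptions (illustrative only): the request waits for `fanout` i.i.d.
    tasks, and each task's response time is shift + Exponential(scale=std),
    with shift = mean - std so that the mean and variance match.
    """
    if std <= 0 or mean < std:
        raise ValueError("this fit requires 0 < std <= mean")
    shift = mean - std
    # P(max of k tasks <= x) = F(x)**k, so we need the task quantile
    # q = percentile**(1/k); compute 1 - q accurately with expm1.
    tail_mass = -math.expm1(math.log(percentile) / fanout)
    # Shifted-exponential quantile: shift - std * ln(1 - q).
    return shift - std * math.log(tail_mass)


if __name__ == "__main__":
    # Hypothetical task statistics in milliseconds.
    for k in (1, 10, 100):
        print(k, round(predict_request_tail(mean=10.0, std=4.0, fanout=k), 2))
```

Even with fixed task statistics, the predicted request tail grows with the fanout, which is why barrier-synchronized requests are so sensitive to task-level variance.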

4:50pm MDT

On The [Ir]relevance of Network Performance for Data Processing
Modern data processing frameworks are used in a variety of settings for a diverse set of workloads such as sorting, indexing, iterative computations, and structured query processing. As these frameworks run in a distributed environment, a natural question to ask is: how important is the network to the performance of these frameworks? Recent research in this field has led to contradictory results. One camp argues that network performance has limited impact on the overall performance of these frameworks. On the other hand, there is a large body of work on networking optimizations for data processing frameworks.

In this paper, we seek a better understanding of the matter. While answering the basic question concerning the importance of network performance, our analysis raises new questions and points to previously unexplored or unnoticed avenues for performance optimization. We take Apache Spark as a representative modern data-processing framework. However, to broaden the scope of our investigation, we also experiment with other frameworks such as Flink, PowerGraph, and Timely. In our study, rather than analysing Spark-specific peculiarities, we look into procedures and subsystems that are common to all of these frameworks, such as networking IO, shuffle data management, object (de)serialization, copies, and job scheduling and coordination. Nonetheless, we are aware that the roles of these individual components differ across systems, and we exercise caution when making generalized statements about performance.

Tuesday June 21, 2016 4:50pm - 5:15pm MDT
Denver Marriott City Center 1701 California Street, Denver, CO 80202