
Scale-Out ccNUMA - Eurosys'18 Poster




Scale-Out ccNUMA: Exploiting Skew with Strongly Consistent Caching
Antonios Katsarakis, Vasilis Gavrielatos, Nicolai Oswald, Arpit Joshi, Boris Grot, Vijay Nagarajan
University of Edinburgh

Motivation

Large scale online services
- Massive datasets
- Many concurrent users
- Rely on multiple nodes for storage and performance

Need high performance
- Low latency: response time is critical to user satisfaction
- High throughput: must satisfy many concurrent requests

Skewed data accesses
- Real-world workloads exhibit skewed data accesses
- Leads to inter-server load imbalance
[Figure: skewed data accesses, 128 servers]

Emerging technologies
- Can be exploited to alleviate performance bottlenecks
- Remote Direct Memory Access (RDMA): low-latency remote memory access
- In-Memory Storage: avoids slow disk access

State of the Art

NUMA Abstraction
- Uniformly distribute the accesses across all servers
- Servers use RDMA to access data within the cluster
- No locality: most requests require inter-server communication
  - Increased latency
  - Bottlenecked by network b/w!
[Figure: NUMA Abstraction; labels: Overloaded, Local access, Remote access]

Our Solution

Symmetric Caching: enhance all servers with a cache
- Symmetric: store the same hottest objects on all nodes
- Skew: hottest objects are responsible for most accesses
- Exploit skew: small but effective cache
  - 50% hit rate by caching just 0.1% of the dataset
- Throughput scales with number of servers
- Less network b/w: most requests are served locally; only cache misses require remote access
- Challenge: must keep the caches consistent
[Figure: hit rate vs. % cache size (proportional to dataset)]

Keeping the Caches Consistent

Observations
- Most large scale workloads are read-intensive!
- Writes: performance vs. consistency tradeoff
  - Stronger consistency means more network traffic
- Typical consistency protocols serialize via a directory
  - Can lead to hot-spots due to skew

Fully Distributed Protocols
- Symmetric Caching does not need a directory
- Distributed write serialization via logical timestamps
  - Directly execute hot writes on any node
- Two strong (per-key) consistency flavours: Sequential Consistency (SC) & Linearizability (Lin)
- Efficient RDMA implementation

Results

Setup: 9 servers, 56 Gbit NICs, skew exponent = 0.99 (YCSB)
[Figure: results; labels: >3×, 2.2×, 1.6×]

Contrary to conventional wisdom: High-Performance & Strong Consistency with aggressive replication
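
The claim that a very small cache can serve about half of all requests follows from the shape of a skewed popularity distribution. The sketch below is an illustration, not the authors' evaluation code: it assumes key popularity is Zipfian with the poster's skew exponent of 0.99, and it assumes a dataset of one million keys (`NUM_KEYS`), a size the poster does not state. The function name `zipf_hit_rate` is hypothetical.

```python
def zipf_hit_rate(num_keys: int, cache_fraction: float, exponent: float) -> float:
    """Fraction of accesses served by a cache that holds the hottest keys.

    Assumes the key of popularity rank r is accessed with probability
    proportional to 1 / r**exponent (a Zipf distribution).
    """
    cached_keys = max(1, round(num_keys * cache_fraction))
    # Unnormalised access weight of every key, hottest first.
    weights = [1.0 / rank ** exponent for rank in range(1, num_keys + 1)]
    return sum(weights[:cached_keys]) / sum(weights)


if __name__ == "__main__":
    NUM_KEYS = 1_000_000  # assumed dataset size; not stated on the poster
    for fraction in (0.0001, 0.001, 0.01):
        rate = zipf_hit_rate(NUM_KEYS, fraction, exponent=0.99)
        print(f"cache = {fraction:.2%} of dataset -> hit rate = {rate:.1%}")
```

Under these assumptions a cache holding 0.1% of the keys has a hit rate of roughly one half. The exact value depends on the dataset size, so other sizes will give somewhat different numbers.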
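
Serializing writes without a directory can be pictured with per-key logical timestamps. The sketch below is a generic Lamport-style illustration and not the protocols from the poster: it assumes each write carries a `(version, node_id)` pair and that every replica keeps only the write with the largest pair, so all replicas agree on one per-key order whatever the delivery order. `CacheReplica`, `local_write` and `apply_remote` are hypothetical names, and the sketch omits everything else a real SC or Lin protocol needs, such as deciding when a write may be considered complete.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Timestamp = Tuple[int, int]  # (version, node_id); node_id breaks version ties
INITIAL: Tuple[Timestamp, str] = ((0, 0), "")


@dataclass
class CacheReplica:
    """One node's copy of the hot objects (illustrative only)."""
    node_id: int
    store: Dict[str, Tuple[Timestamp, str]] = field(default_factory=dict)

    def local_write(self, key: str, value: str) -> Tuple[str, Timestamp, str]:
        """Execute a write on this node and return the message to broadcast."""
        (version, _), _ = self.store.get(key, INITIAL)
        ts = (version + 1, self.node_id)
        self.store[key] = (ts, value)
        return key, ts, value

    def apply_remote(self, key: str, ts: Timestamp, value: str) -> None:
        """Apply another node's write only if it is newer than the local copy."""
        current_ts, _ = self.store.get(key, INITIAL)
        if ts > current_ts:  # tuple comparison: version first, then node_id
            self.store[key] = (ts, value)


if __name__ == "__main__":
    a, b, c = CacheReplica(1), CacheReplica(2), CacheReplica(3)
    # Two nodes write the same hot key concurrently, with no directory involved.
    msg_a = a.local_write("x", "from-a")
    msg_b = b.local_write("x", "from-b")
    # Messages arrive in different orders on different nodes.
    a.apply_remote(*msg_b)
    b.apply_remote(*msg_a)
    c.apply_remote(*msg_b)
    c.apply_remote(*msg_a)
    print(a.store["x"], b.store["x"], c.store["x"])
```

Both concurrent writes get version 1, so the node id decides the order and every replica ends up holding the write from node 2.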