1. Keeping the Caches Consistent
Motivation
Results
Scale-Out ccNUMA: Exploiting Skew with Strongly Consistent Caching
Antonios Katsarakis, Vasilis Gavrielatos, Nicolai Oswald, Arpit Joshi, Boris Grot, Vijay Nagarajan
University of Edinburgh
State of the Art
Our Solution
[Figure: hit rate vs. cache size (% of dataset)]
Symmetric Caching
Emerging technologies
- Can be exploited to alleviate performance bottlenecks
Remote Direct Memory Access (RDMA)
Low-latency remote memory access
In-Memory Storage
Avoids slow disk access
Need high performance
- Low latency:
Response time is critical to user satisfaction
- High throughput:
Must satisfy many concurrent requests
- Real-world workloads exhibit skewed data accesses
- Leads to inter-server load imbalance
Skewed data accesses
128 Servers
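The load imbalance caused by skew can be illustrated with a small model. This is only a sketch under stated assumptions: accesses follow a Zipfian distribution with exponent 0.99, the dataset holds 100,000 keys (a hypothetical size chosen for speed), and keys are placed round-robin by popularity rank as a stand-in for hashing. The names `zipf_weights` and `server_loads` are illustrative and not taken from the poster.

```python
def zipf_weights(num_keys, exponent):
    """Return normalized Zipfian access probabilities for key ranks 1..num_keys."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def server_loads(num_keys, num_servers, exponent):
    """Fraction of all accesses landing on each server.

    Assumption: keys are placed round-robin by popularity rank,
    which stands in for a uniform hash of the key space.
    """
    loads = [0.0] * num_servers
    for index, weight in enumerate(zipf_weights(num_keys, exponent)):
        loads[index % num_servers] += weight
    return loads


loads = server_loads(num_keys=100_000, num_servers=128, exponent=0.99)
average = sum(loads) / len(loads)
imbalance = max(loads) / average
print(f"hottest server receives {imbalance:.1f}x the average load")
```

Even though the keys are spread evenly, the server that happens to hold the single hottest key receives several times the average load in this model.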
Observations
- Most large-scale workloads are read-intensive!
- Writes: Performance vs Consistency tradeoff
Stronger consistency means more network traffic
- Typical consistency protocols serialize via a directory
Can lead to hot-spots due to skew
Large-scale online services
- Massive datasets
- Many concurrent users
- Rely on multiple nodes for storage and performance
Fully Distributed Protocols
- Symmetric Caching does not need a directory
- Distributed write serialization via logical timestamps
Directly execute hot writes on any node
- Two strong (per-key) consistency flavours
Sequential Consistency (SC) & Linearizability (Lin)
- Efficient RDMA implementation
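A minimal sketch of how logical timestamps can serialize writes without a directory. It is not the poster's protocol: invalidations, acknowledgements, and the RDMA transport are omitted, and the `Replica` class with its methods is hypothetical. It shows only the ordering rule, assuming each key carries a (version, node id) timestamp and a replica keeps whichever update has the larger timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's cache; every key maps to (timestamp, value)."""
    node_id: int
    store: dict = field(default_factory=dict)

    def local_write(self, key, value):
        """Execute a write locally and return the update to broadcast."""
        version, _ = self.store.get(key, ((0, 0), None))[0]
        timestamp = (version + 1, self.node_id)  # node id breaks ties
        self.apply(key, timestamp, value)
        return key, timestamp, value

    def apply(self, key, timestamp, value):
        """Install an update only if it is newer than the cached one."""
        current, _ = self.store.get(key, ((0, 0), None))
        if timestamp > current:
            self.store[key] = (timestamp, value)


replicas = [Replica(node_id=i) for i in range(3)]
update_a = replicas[0].local_write("hot", "A")  # timestamp (1, 0)
update_b = replicas[2].local_write("hot", "B")  # timestamp (1, 2), concurrent

# Deliver the two concurrent updates in a different order at each replica.
orders = [(update_b, update_a), (update_a, update_b), (update_a, update_b)]
for replica, order in zip(replicas, orders):
    for update in order:
        replica.apply(*update)

final_values = [r.store["hot"][1] for r in replicas]
print(final_values)  # ['B', 'B', 'B']
```

Because tuples compare lexicographically, all replicas agree on the same winner regardless of delivery order, so any node can start a write on a hot key without consulting a central serialization point.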
Enhance all servers with a cache
Skew: hottest objects responsible for most accesses
Small but effective cache
- 50% hit rate by caching just 0.1% of the dataset
Less B/W: only cache misses require remote access
Challenge: must keep the caches consistent
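The effect of a small cache under skew can be estimated with a short calculation. This is a sketch under assumptions: accesses are Zipfian with exponent 0.99, the dataset holds one million keys (a hypothetical size; the poster does not state one here), and the cache holds exactly the most popular keys. The function `zipf_hit_rate` is illustrative.

```python
def zipf_hit_rate(num_keys, cached_fraction, exponent):
    """Share of accesses served by caching the hottest keys.

    Assumption: key popularity follows a Zipfian distribution, so the
    key at popularity rank r is accessed with weight 1 / r**exponent.
    """
    cached_keys = max(1, round(num_keys * cached_fraction))
    hot_mass = 0.0
    total_mass = 0.0
    for rank in range(1, num_keys + 1):
        weight = 1.0 / (rank ** exponent)
        total_mass += weight
        if rank <= cached_keys:
            hot_mass += weight
    return hot_mass / total_mass


hit_rate = zipf_hit_rate(num_keys=1_000_000, cached_fraction=0.001, exponent=0.99)
print(f"hit rate when caching 0.1% of the keys: {hit_rate:.1%}")
```

In this model the result depends on the assumed dataset size, so the printed value should be read as an illustration of the trend rather than a reproduction of the measured hit rate.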
Enhance all servers with a cache
Symmetric: Store same hottest objects on all nodes
Exploit skew: small but effective cache
Throughput scales with number of servers
Less network b/w: most requests served locally
Challenge: must keep the caches consistent
Uniformly distribute the accesses across all servers
Servers use RDMA to access data within the cluster
No locality:
Most requests require inter-server communication
Increased latency
Bottlenecked by network b/w!
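The lack of locality follows from simple arithmetic. The sketch below assumes requests arrive at a uniformly random server and each key lives on exactly one of N servers, so a request finds its key locally with probability 1/N. The optional `cache_hit_rate` parameter models a per-server cache whose hits are always local. The function name and parameters are illustrative.

```python
def remote_fraction(num_servers, cache_hit_rate=0.0):
    """Fraction of requests that need inter-server communication.

    Assumptions: keys are spread uniformly over the servers, requests
    arrive at a uniformly random server, and cache hits never leave
    the server that received the request.
    """
    miss_rate = 1.0 - cache_hit_rate
    return miss_rate * (num_servers - 1) / num_servers


print(f"no cache, 9 servers: {remote_fraction(9):.0%} remote")
print(f"50% hit rate, 9 servers: {remote_fraction(9, 0.5):.0%} remote")
```

With nine servers, roughly eight of every nine requests go remote without a cache, and a 50% hit rate halves that share.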
9 servers, 56 Gbit/s NICs, skew exponent = 0.99 (YCSB)
Overloaded
NUMA Abstraction
Local access
Remote access
>3× 2.2×
1.6×
Contrary to conventional wisdom:
High-Performance & Strong Consistency with aggressive replication