Wednesday, November 17, 2010
10. Desired properties of a back-end
• Robust and fault-tolerant to both machine and human error.
• Low latency reads and updates.
• Scalable to increases in data or traffic.
• Extensible to support new features or related services.
• Generalizes to diverse types of data and requests.
• Allows ad hoc queries.
• Minimal maintenance.
• Debuggable: can trace how any value in the system came to be.
14. Batch Layer Views
• Arbitrary: a view can be any function of the complete dataset
• High latency: views are recomputed in batch
• No random access
15. Serving Layer
• Provide random access to batch-computed views
• Update in batch, no random writes
• High latency updates
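
The serving-layer contract above can be sketched in a few lines. This is a minimal illustration, not ElephantDB's code: the class `ServingView` and its method names are hypothetical, and an in-memory dict stands in for the stored view.

```python
class ServingView:
    """Read-only key/value view that each batch run replaces wholesale."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        # Random-access read against the current batch-computed view.
        return self._data.get(key, default)

    def load_batch(self, batch_output):
        # Build the new view off to the side, then swap the reference.
        # There is deliberately no put(): the view takes no random writes.
        self._data = dict(batch_output)


view = ServingView()
view.load_batch({"example.com": 120, "example.org": 45})
print(view.get("example.com"))
```

Because the only write path is a full reload, a reader sees either the old view or the new one, and fresh data becomes visible only after the next batch run.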
16. ElephantDB
• Our implementation of the serving layer
• Pre-shard key/value data via MapReduce
• ElephantDB ring pulls shards from HDFS on startup
• Read-only access to data
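
Pre-sharding can be sketched as assigning each key to a shard with a stable hash, so that a reader can later find the right shard from the key alone. This is an assumed scheme for illustration, not ElephantDB's actual partitioning; `NUM_SHARDS`, `shard_for`, `pre_shard`, and `lookup` are hypothetical names, and plain dicts stand in for shard files on HDFS.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, chosen only for this example


def shard_for(key, num_shards=NUM_SHARDS):
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so writer and reader would disagree on the shard.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def pre_shard(pairs, num_shards=NUM_SHARDS):
    # Batch side: group key/value pairs by shard number.
    shards = defaultdict(dict)
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return dict(shards)


def lookup(shards, key, num_shards=NUM_SHARDS):
    # Serving side: recompute the shard from the key, then read from it.
    return shards.get(shard_for(key, num_shards), {}).get(key)


shards = pre_shard([("example.com", 120), ("example.org", 45)])
print(lookup(shards, "example.com"))
```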
17. ElephantDB Flow
[Diagram: Batch Layer → Shards on HDFS (0, 1, 2, 3) → ElephantDB servers]
18. Batch and Serving Layers
[Diagram, left to right]
Batch Layer: Complete dataset (HDFS) → Tweet count view, Influencer scores view, Site affinity view
Serving Layer: ElephantDB Shards (one set per view) → ElephantDB Ring
19. Batch and Serving Layers
• Robust and fault-tolerant to both machine and human error.
• Low latency reads and updates.
• Scalable to increases in data or traffic.
• Extensible to support new features or related services.
• Generalizes to diverse types of data and requests.
• Allows ad hoc queries.
• Minimal maintenance.
• Debuggable: can trace how any value in the system came to be.
20. Speed Layer
• Compensates for the high latency of updates to the serving layer
22. Speed Layer
Key point: Only needs to compensate for
data not yet absorbed in serving layer
Hours of data instead of years of data
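
One way to picture the key point: a query combines the large batch-computed value with a small realtime value that covers only the data the serving layer has not yet absorbed. The sketch below assumes an additive metric such as a count; `query_count` and both dicts are hypothetical stand-ins for the two layers.

```python
def query_count(key, serving_view, speed_view):
    # Years of data, precomputed by the batch layer.
    batch_part = serving_view.get(key, 0)
    # Hours of data, maintained incrementally by the speed layer.
    recent_part = speed_view.get(key, 0)
    return batch_part + recent_part


serving_view = {"example.com": 1_000_000}
speed_view = {"example.com": 42, "new-site.example": 7}

print(query_count("example.com", serving_view, speed_view))
```

Simple addition works only for metrics that merge this way; other views would need their own merge function.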
24. Speed Layer
• The speed layer is transient
• The serving layer eventually corrects the speed layer
• Can make aggressive tradeoffs for performance
• Can even trade off accuracy
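
Transience can be sketched as realtime state that is thrown away once a batch run has absorbed the same data. The hourly bucketing and the names `SpeedLayerCounts`, `record`, `count`, and `expire_through` are assumptions made for this example, not details from the talk.

```python
from collections import defaultdict


class SpeedLayerCounts:
    """Per-hour counts that live only until the batch layer catches up."""

    def __init__(self):
        # hour bucket -> key -> count
        self._buckets = defaultdict(lambda: defaultdict(int))

    def record(self, key, hour):
        self._buckets[hour][key] += 1

    def count(self, key):
        return sum(bucket.get(key, 0) for bucket in self._buckets.values())

    def expire_through(self, absorbed_hour):
        # The serving layer now covers everything up to absorbed_hour, so
        # the realtime state for those hours (and any error in it) is dropped.
        for hour in [h for h in self._buckets if h <= absorbed_hour]:
            del self._buckets[hour]


speed = SpeedLayerCounts()
for hour in (1, 2, 3):
    speed.record("example.com", hour)
speed.expire_through(2)
print(speed.count("example.com"))
```

Since every bucket is eventually discarded, an approximation made in the speed layer affects results only until the next batch run replaces it.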
25. Example
Example: Unique visitors to a domain
• Batch/Serving layers
  • Compute exact count
• Speed layer
  • Keep set of visitors in a Bloom filter
  • Incrementally update count and Bloom filter
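
The speed-layer half of this example can be sketched as follows. It is a minimal illustration under assumed parameters: the `BloomFilter` and `UniqueVisitorCounter` classes, the bit-array size, and the number of hashes are all hypothetical choices, not values from the talk.

```python
import hashlib


class BloomFilter:
    """Fixed-size approximate set: no false negatives, rare false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class UniqueVisitorCounter:
    """Incrementally maintained approximate unique-visitor count."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def visit(self, visitor_id):
        # Count a visitor only if the filter has (probably) not seen them.
        if visitor_id not in self.seen:
            self.seen.add(visitor_id)
            self.count += 1


counter = UniqueVisitorCounter()
for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    counter.visit(visitor)
print(counter.count)
```

A false positive makes the counter skip a genuinely new visitor, so the realtime count can only undercount. That is the accuracy tradeoff: the exact count from the batch and serving layers replaces it later.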