
Designing for Massive Scalability at BackType #bigdatacamp


  1. Designing for Massive Scalability at BackType. Michael Montano / @michaelmontano
  2-10. Desired properties of a back-end (Wednesday, November 17, 2010)
     • Robust and fault-tolerant to both machine and human error.
     • Low latency reads and updates.
     • Scalable to increases in data or traffic.
     • Extensible to support new features or related services.
     • Generalizes to diverse types of data and requests.
     • Allows ad hoc queries.
     • Minimal maintenance.
     • Debuggable: can trace how any value in the system came to be.
  11-12. Layered Architecture
     • Speed layer
     • Serving layer
     • Batch layer
     Work in tandem to satisfy our desired properties.
  13. Batch Layer
     view = fn(complete dataset)
  14. Batch Layer Views
     • Arbitrary
     • High latency
     • No random access
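The formula on the Batch Layer slide can be read as "recompute the whole view from scratch on every run". A minimal Python sketch of that idea follows. The record layout, the `tweet_count_view` function, and the sample data are hypothetical and only illustrate the shape of the computation; they are not BackType's actual code.

```python
from collections import Counter


def tweet_count_view(complete_dataset):
    """Recompute the view from the complete dataset: view = fn(complete dataset).

    Nothing is updated incrementally; every run starts from the raw records.
    """
    return Counter(record["url"] for record in complete_dataset)


# Hypothetical raw records; the field names are assumptions for illustration.
dataset = [
    {"url": "example.com/a", "user": "alice"},
    {"url": "example.com/a", "user": "bob"},
    {"url": "example.com/b", "user": "alice"},
]

view = tweet_count_view(dataset)

# If a bad record is later found, remove it from the dataset and rerun the
# same function to obtain a corrected view.
corrected_view = tweet_count_view(dataset[:-1])
```

Because the view is a pure function of the raw data, correcting the input and rerunning the function yields a corrected view, which is one plausible reading of the "robust to human error" property.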
  15. Serving Layer
     • Provide random access to batch-computed views
     • Update in batch, no random writes
     • High latency updates
  16. ElephantDB
     • Our implementation of serving layer
     • Pre-shard key/value data via MapReduce
     • ElephantDB ring pulls shards from HDFS on startup
     • Read-only access to data
  17. ElephantDB Flow
     [Diagram: Batch Layer; shards 0, 1, 2, 3 on HDFS; ElephantDB nodes]
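The pre-sharding idea from the ElephantDB slides can be sketched in a few lines of Python. This is not ElephantDB's API: `shard_for`, `build_shards`, and `ReadOnlyRing` are hypothetical names, hash-modulo placement is an assumed sharding scheme, and in-memory dicts stand in for shard files on HDFS.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, matching shards 0-3 in the flow slide


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(view, num_shards=NUM_SHARDS):
    """Batch step: split a key/value view into shards ahead of time."""
    shards = [{} for _ in range(num_shards)]
    for key, value in view.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


class ReadOnlyRing:
    """Serving step: load prebuilt shards once and answer point reads only."""

    def __init__(self, shards):
        # Copy the shards on startup; there is deliberately no write method.
        self._shards = [dict(shard) for shard in shards]

    def get(self, key, default=None):
        shard = self._shards[shard_for(key, len(self._shards))]
        return shard.get(key, default)


view = {"example.com/a": 2, "example.com/b": 1, "example.com/c": 7}
shards = build_shards(view)
ring = ReadOnlyRing(shards)
```

Reads use the same `shard_for` function as the batch step, so a lookup touches exactly one shard. Updating the data means building a fresh set of shards and loading them, which matches the "update in batch, no random writes" property of the serving layer.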
  18. Batch and Serving Layers
     [Diagram. Batch Layer: complete dataset (HDFS) feeding a tweet count view, an influencer scores view, and a site affinity view. Serving Layer: ElephantDB shards for each view, loaded by the ElephantDB ring.]
  19. Batch and Serving Layers
     Robust and fault-tolerant to both machine and human error.
     Low latency reads and updates.
     Scalable to increases in data or traffic.
     Extensible to support new features or related services.
     Generalizes to diverse types of data and requests.
     Allows ad hoc queries.
     Minimal maintenance.
     Debuggable: can trace how any value in the system came to be.
  20. Speed Layer
     • Compensate for high latency of updates to serving layer
  21-22. Speed Layer
     Key point: Only needs to compensate for data not yet absorbed in serving layer
     Hours of data instead of years of data
  23. Application-level Queries
     [Diagram: Serving Layer Query and Speed Layer Query feed into Merge]
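The merge step in the Application-level Queries slide can be illustrated with a small Python sketch. It assumes the metric is a simple count, so merging is addition, and that the speed layer holds only data the serving layer has not yet absorbed. The function and variable names are hypothetical.

```python
def query_tweet_count(url, serving_view, speed_view):
    """Answer a query by merging a serving layer read and a speed layer read.

    Assumption: the two views cover disjoint slices of the data, so the
    counts can simply be added.
    """
    batch_count = serving_view.get(url, 0)   # years of data, hours stale
    recent_count = speed_view.get(url, 0)    # only the last few hours
    return batch_count + recent_count


# Batch-computed view served read-only, plus a small view of recent data.
serving_view = {"example.com/a": 1200, "example.com/b": 45}
speed_view = {"example.com/a": 3, "example.com/c": 1}

total_a = query_tweet_count("example.com/a", serving_view, speed_view)
total_c = query_tweet_count("example.com/c", serving_view, speed_view)
```

For other metrics the merge function would differ. Addition is only the simplest case.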
  24. Speed Layer
     • Speed layer is transient
     • Serving layer eventually corrects speed layer
     • Can make tradeoffs aggressively for performance
     • Can even trade off accuracy
  25. Example: Unique visitors to a domain
     • Batch/Serving layers
        • Compute exact count
     • Speed layer
        • Keep set of visitors in a Bloom filter
        • Incrementally update count and Bloom filter
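The speed layer half of the unique visitors example can be sketched as follows. This is a minimal illustration, not the implementation used at BackType: the `BloomFilter` and `SpeedLayerUniques` classes, the filter size, and the number of hash functions are all assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if it was possibly already present."""
        possibly_seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                possibly_seen = False
                self.bits[byte] |= 1 << bit
        return possibly_seen


class SpeedLayerUniques:
    """Approximate unique visitor count for one domain over recent data."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record_visit(self, visitor_id):
        # Increment only when the filter has definitely not seen this visitor.
        if not self.seen.add(visitor_id):
            self.count += 1


visits = ["alice", "bob", "alice", "carol", "bob"]
uniques = SpeedLayerUniques()
for visitor in visits:
    uniques.record_visit(visitor)

exact = len(set(visits))  # what the batch layer would compute
```

A false positive makes the filter skip an increment, so the approximate count can fall below the exact one but never above it. That is the kind of accuracy tradeoff slide 24 allows, since the batch-computed exact count eventually replaces the estimate.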
