Designing for Massive Scalability at BackType
   Michael Montano / @michaelmontano
Wednesday, November 17, 2010
Desired properties of a back-end
                   • Robust and fault-tolerant to both machine
                     and human error.
                   • Low latency reads and updates.
                   • Scalable to increases in data or traffic.
                   • Extensible to support new features or related
                     services.
                   • Generalizes to diverse types of data and
                     requests.
                   • Allows ad hoc queries.
                   • Minimal maintenance.
                   • Debuggable: can trace how any value in the
                     system came to be.

Layered Architecture
                               Speed layer
                               Serving layer
                               Batch layer
                   Work in tandem to satisfy our desired properties

Batch Layer


       view = fn(complete dataset)
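
A minimal sketch of what "view = fn(complete dataset)" can look like in practice. The record layout (a dict with a "url" field) and the function name tweet_count_view are illustrative assumptions, not BackType's actual schema or code. The point is only that the view is a pure function recomputed from the whole dataset each time.

```python
from collections import Counter


def tweet_count_view(complete_dataset):
    """Recompute the whole view from scratch: tweets per URL.

    The "url" field is a hypothetical record layout used for illustration.
    """
    return dict(Counter(record["url"] for record in complete_dataset))


# Tiny stand-in for the complete dataset (which would live on HDFS).
dataset = [
    {"url": "example.com/a", "user": "alice"},
    {"url": "example.com/a", "user": "bob"},
    {"url": "example.com/b", "user": "alice"},
]

view = tweet_count_view(dataset)
```

Because nothing is updated in place, a bad record or a buggy function can be corrected by fixing the input or the code and recomputing the view.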




Batch Layer Views

                   • Arbitrary
                   • High latency
                   • No random access


Serving Layer

                   • Provide random access to batch-computed views
                   • Update in batch, no random writes
                   • High latency updates


ElephantDB

                   • Our implementation of serving layer
                   • Pre-shard key/value data via MapReduce
                   • ElephantDB ring pulls shards from HDFS on startup
                   • Read-only access to data
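
The pre-sharding idea can be sketched in a few lines. This is not ElephantDB's API: shard_for, build_shards and ReadOnlyShardServer are hypothetical names, the CRC32-modulo placement is an assumed scheme, and in-memory dicts stand in for shard files on HDFS. It only illustrates that the batch job and the servers must agree on how a key maps to a shard, and that the serving side exposes reads only.

```python
import zlib

NUM_SHARDS = 4  # illustrative shard count, not a value from the talk


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard with a stable hash (assumed scheme: CRC32 mod N)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_shards(view, num_shards=NUM_SHARDS):
    """Batch side: split a key/value view into per-shard dictionaries."""
    shards = [{} for _ in range(num_shards)]
    for key, value in view.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


class ReadOnlyShardServer:
    """Serving side: loads shards once and offers random reads, no writes."""

    def __init__(self, shards):
        self._shards = shards

    def get(self, key, default=None):
        shard = self._shards[shard_for(key, len(self._shards))]
        return shard.get(key, default)


shards = build_shards({"example.com/a": 2, "example.com/b": 1})
server = ReadOnlyShardServer(shards)
```

Updating the data means building a new set of shards in batch and loading them, which is why updates to this layer have high latency.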

ElephantDB Flow

   [Diagram: Batch Layer -> Shards 0, 1, 2, 3 on HDFS -> ElephantDB servers]

Batch and Serving Layers

   [Diagram: in the batch layer, the complete dataset (HDFS) produces three
   views: tweet count, influencer scores, and site affinity. Each view is
   written as ElephantDB shards, which feed the ElephantDB ring in the
   serving layer.]

Batch and Serving Layers
                   • Robust and fault-tolerant to both machine
                     and human error.
                   • Low latency reads and updates.
                   • Scalable to increases in data or traffic.
                   • Extensible to support new features or related
                     services.
                   • Generalizes to diverse types of data and
                     requests.
                   • Allows ad hoc queries.
                   • Minimal maintenance.
                   • Debuggable: can trace how any value in the
                     system came to be.

Speed Layer


                   • Compensate for high latency of updates to serving layer




Speed Layer

             Key point: Only needs to compensate for
               data not yet absorbed in serving layer



                       Hours of data instead of years of data


Application-level Queries

   [Diagram: the application queries the serving layer and the speed layer,
   then merges the two results]

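
A rough sketch of the merge step for a counting view, assuming both layers can be read as key/value lookups. The function name query_tweet_count and the dict-based views are hypothetical. For counts the merge is simple addition; other kinds of views would need their own merge logic.

```python
def query_tweet_count(url, serving_view, speed_view):
    """Merge the batch-computed count with the recent count from the speed layer.

    Assumes the speed layer only holds data not yet absorbed by the
    serving layer, so the two counts never overlap.
    """
    return serving_view.get(url, 0) + speed_view.get(url, 0)


serving_view = {"example.com/a": 1200}                   # years of data, batch-computed
speed_view = {"example.com/a": 7, "example.com/new": 3}  # the last few hours

total_a = query_tweet_count("example.com/a", serving_view, speed_view)
total_new = query_tweet_count("example.com/new", serving_view, speed_view)
```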
Speed Layer

                   • Speed layer is transient
                      • Serving layer eventually corrects speed layer
                   • Can make tradeoffs aggressively for performance
                      • Can even trade off accuracy

Example: Unique visitors to a domain

                   • Batch/Serving layers
                      • Compute exact count
                   • Speed layer
                      • Keep set of visitors in a Bloom filter
                      • Incrementally update count and Bloom filter
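
The speed-layer half of this example might look like the sketch below. It is a toy Bloom filter written for illustration, not BackType's implementation: the class names, the bit-array size and the SHA-256-based hashing are all assumptions. It shows the accuracy tradeoff directly, since a false positive in the filter makes the count slightly too low and the filter never makes it too high.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if it was possibly seen before."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


class UniqueVisitorCounter:
    """Approximate unique-visitor count for one domain, updated per visit."""

    def __init__(self):
        self.filter = BloomFilter()
        self.count = 0

    def record_visit(self, visitor_id):
        # Increment only when the filter says the visitor is definitely new.
        if not self.filter.add(visitor_id):
            self.count += 1


counter = UniqueVisitorCounter()
for visitor in ["alice", "bob", "alice"]:
    counter.record_visit(visitor)
```

Since the speed layer is transient, this approximation only has to hold until the batch and serving layers absorb the same visits and produce the exact count.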
