Yahoo! Cloud Serving Benchmark
            Overview and results – February 3, 2010

                               Brian F. Cooper

                            cooperb@yahoo-inc.com


Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears

  System setup and tuning assistance from members of the Cassandra and HBase
                  committers, and the Sherpa engineering team




Versions of this deck
• V4.1 – Original set of results from
  benchmark
• V4.2 – Added Cassandra 0.5 versus 0.4.2
  comparison, Cassandra range query
  results, and varying scan size results




Motivation
•   There are many “cloud DB” and “nosql” systems out there
     – Sherpa/PNUTS
     – BigTable
          • HBase, Hypertable, HTable
     –   Megastore
     –   Azure
     –   Cassandra
     –   Amazon Web Services
          • S3, SimpleDB, EBS
     –   CouchDB
     –   Voldemort
     –   Dynomite
     –   Etc: Tokyo, Redis, MongoDB

•   How do they compare?
     – Feature tradeoffs
     – Performance tradeoffs
     – Not clear!


Goal
• Implement a standard benchmark
   – Evaluate different systems on common workloads
   – Focus on performance and scale out
      • Future additions – availability, replication


• Artifacts
   – Open source workload generator
   – Experimental study comparing several systems




Benchmark tool
•   Java application
      – Many systems have Java APIs
      – Other systems via HTTP/REST, JNI or some other solution

•   Command-line parameters
      – DB to use
      – Target throughput
      – Number of threads
      – …

•   Workload parameter file
      – R/W mix
      – Record size
      – Data set
      – …

•   YCSB client (diagram)
      – Components: workload executor, client threads, stats, DB client
      – The DB client connects the YCSB client to the cloud DB

•   Extensible: define new workloads
•   Extensible: plug in new clients
Workloads
•   Workload – a particular combination of workload parameters
    – Defines read/write mix, request distribution, record size, …
    – Two ways to define workloads:
         • Adjust parameters to an existing workload (via properties file)
         • Define a new kind of workload (by writing Java code)

•   Experiment – running a particular workload on a particular hardware
    setup to produce a single graph for 1 or N systems
    – Example – vary throughput and measure latency while running a
      workload against Cassandra and HBase

•   Workload package – A collection of related workloads
    – Example: CoreWorkload – a set of basic read/write workloads




Benchmark tiers
• Tier 1 – Performance
   – For constant hardware, increase offered throughput
     until saturation
   – Measure resulting latency/throughput curve
   – “Sizeup” in Wisconsin benchmark terminology

• Tier 2 – Scalability
   – Scaleup – Increase hardware, data size and workload
     proportionally. Measure latency; should be constant

   – Elastic speedup – Run workload against N servers;
     while workload is running, add an (N+1)th server; measure
     timeseries of latencies (should drop after adding
     server)
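
The Tier 1 procedure can be sketched as a simple sweep. This is a rough Python illustration under stated assumptions: the function names, the 95% saturation threshold and the toy system with its capacity and latency formula are all hypothetical, standing in for real benchmark runs.

```python
from typing import Callable, List, Tuple

# (offered ops/sec) -> (achieved ops/sec, average latency in ms)
Measure = Callable[[float], Tuple[float, float]]


def sweep_until_saturation(
    measure: Measure,
    start: float,
    step: float,
    max_offered: float,
    tolerance: float = 0.95,
) -> List[Tuple[float, float, float]]:
    """Raise offered throughput step by step and record the resulting curve.

    Stops once achieved throughput falls below `tolerance` times the offered
    throughput, which this sketch treats as saturation.
    """
    curve = []
    offered = start
    while offered <= max_offered:
        achieved, latency_ms = measure(offered)
        curve.append((offered, achieved, latency_ms))
        if achieved < tolerance * offered:
            break
        offered += step
    return curve


def toy_system(offered: float) -> Tuple[float, float]:
    """Stand-in for a real run: a made-up system with capacity 5000 ops/sec."""
    capacity = 5000.0
    achieved = min(offered, capacity)
    # Latency grows as the system nears capacity (illustrative formula only).
    latency_ms = 2.0 / max(1.0 - achieved / capacity, 0.01)
    return achieved, latency_ms


curve = sweep_until_saturation(toy_system, start=1000, step=1000, max_offered=10000)
for offered, achieved, latency_ms in curve:
    print(f"offered={offered:.0f} achieved={achieved:.0f} latency={latency_ms:.1f} ms")
```

Each tuple in `curve` is one point on the latency/throughput curve; in a real experiment `measure` would run the workload at the given target throughput on fixed hardware.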
Test setup
•   Setup
     –   Six server-class machines
            •   8 cores (2 x quadcore) 2.5 GHz CPUs, 8 GB RAM, 6 x 146GB 15K RPM SAS drives in RAID 1+0,
                Gigabit ethernet, RHEL 4
     –   Plus extra machines for clients, routers, controllers, etc.
     –   Cassandra 0.4.2
     –   HBase 0.20.2
     –   MySQL 5.1.32 organized into a sharded configuration
     –   Sherpa 1.8
     –   No replication; force updates to disk (except HBase, which does not yet support this)

•   Workloads
     –   120 million 1 KB records = 20 GB per server
     –   Reads retrieve whole record; updates write a single field
     –   100 or more client threads

•   Caveats
     –   Write performance would be improved for Sherpa, sharded MySQL and Cassandra with a
         dedicated log disk
     –   We tuned each system as well as we knew how, with assistance from the teams of
         developers
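
The per-server data size follows from simple arithmetic. The sketch below assumes the records are spread evenly over the six servers, takes 1 KB as 1000 bytes, and ignores any storage overhead; these are simplifying assumptions for illustration.

```python
RECORDS = 120_000_000      # total records in the data set
RECORD_SIZE_BYTES = 1_000  # 1 KB per record, taking 1 KB = 1000 bytes
SERVERS = 6                # server-class machines in the setup

total_gb = RECORDS * RECORD_SIZE_BYTES / 1e9
per_server_gb = total_gb / SERVERS

print(f"total: {total_gb:.0f} GB, per server: {per_server_gb:.0f} GB")
# prints "total: 120 GB, per server: 20 GB"
```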



Workload A – Update heavy
• 50/50 Read/update

[Figure: two panels, “Workload A - Read latency” and “Workload A - Update latency”: average read latency (ms) and update latency (ms) versus throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL]


Comment: Cassandra is optimized for writes, and has better write latency. However, Sherpa
  has pretty good write latency, comparable read latency, and comparable peak
  throughput. HBase has good write latency because it does not sync updates to disk, at
  the cost of lower durability; but read latency is very bad.
Workload B – Read heavy
• 95/5 Read/update

[Figure: two panels, “Workload B - Read latency” and “Workload B - Update latency”: average read latency (ms) and average update latency (ms) versus throughput (operations/sec) for Cassandra, HBase, Sherpa and MySQL]


Comment: Sherpa does very well here, with better read and write latency and peak
  throughput than Cassandra, and better read latency and peak throughput than HBase.
  Again, HBase write latency is very low because of no disk syncs. Buffer pool architecture
  is good for random reads.
Workload E – short scans
• Scans of 1-100 records of size 1KB

[Figure: “Workload E - Scan latency”: average scan latency (ms) versus throughput (operations/sec) for HBase, Sherpa and Cassandra]



Comment: HBase and Sherpa are roughly equivalent for latency and peak throughput,
  even though HBase is “meant” for scans. Cassandra’s performance is poor, but the
  development team notes that many optimizations still need to be done.
Workload E – range size
• Vary size of range scans

[Figure: “Range size versus latency (Workload E)”: average range scan latency (ms) versus max range size (records) for HBase and Sherpa]


Comment: For small ranges, queries are similar to random lookups; Sherpa is efficient for
  random lookups and does well. As range increases, HBase begins to perform better
  since it is optimized for large scans.
Scale-up
• Read heavy workload with varying hardware

[Figure: “Read latency during scale-up”: average read latency (ms) versus number of servers for Cassandra, HBase and Sherpa]



Comment: Sherpa scales well, with flat latency as system size increases.
  Cassandra scales less well, with more P2P communication. HBase is very
  unstable; with 3 or fewer servers it performs very poorly. More experiments are
  needed to get more data points on these curves.
Elasticity
• Run a read-heavy workload on 3 servers; add a 4th
  server after 5 minutes

[Figure: “Cassandra elastic read performance”: average read latency (ms) versus time (min)]



Comment: Cassandra shows nice elasticity; after a fourth server is added,
  average latency of requests quickly drops by 11% with little or no
  disruption.
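
A percentage drop like the one in the comment above can be derived from a latency time series. The Python sketch below is one possible way to compute it, not necessarily how the deck's figure was obtained; the function name, the optional settling window and the sample values are all hypothetical, and the samples are made-up placeholders rather than measurements.

```python
from statistics import mean
from typing import Sequence, Tuple


def latency_change_pct(
    samples: Sequence[Tuple[float, float]],
    added_at_min: float,
    settle_min: float = 0.0,
) -> float:
    """Percent change in average latency after a server is added.

    `samples` holds (time in minutes, average latency in ms) pairs.
    Samples within `settle_min` minutes after the addition are skipped.
    A negative result means latency dropped.
    """
    before = [lat for t, lat in samples if t < added_at_min]
    after = [lat for t, lat in samples if t >= added_at_min + settle_min]
    if not before or not after:
        raise ValueError("need samples both before and after the addition")
    base = mean(before)
    return 100.0 * (mean(after) - base) / base


# Made-up placeholder values, not measurements from the experiments.
samples = [(0, 10.0), (2, 10.0), (4, 10.0), (6, 9.0), (8, 9.0), (10, 9.0)]
change = latency_change_pct(samples, added_at_min=5)
print(f"latency change: {change:.1f}%")
```

A non-zero `settle_min` excludes the period right after the server joins, which helps when a system shows a temporary spike before latency settles.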
Elasticity
• Run a read-heavy workload on 3 servers; add a 4th
  server after 5 minutes

[Figure: “HBase elastic read performance (detail)”: average read latency (ms) versus time (min)]



Comment: HBase initially exhibits a large latency spike, with some requests
  taking as much as 1000 ms; then, latency settles down and eventually
  becomes 12% lower than latency before adding the server.
Cassandra 0.5 Results
[Figure: “Workload A - Update heavy”: average latency (ms) versus throughput (operations/sec) for Cassandra 0.5 read, Cassandra 0.5 update, Cassandra 0.4.2 read and Cassandra 0.4.2 update]




Cassandra 0.5 Results
[Figure: “Workload B - Read heavy”: average latency (ms) versus throughput (operations/sec) for Cassandra 0.5 read, Cassandra 0.5 update, Cassandra 0.4.2 read and Cassandra 0.4.2 update]




For more information
• Contact: Brian Cooper (cooperb@yahoo-inc.com)
• Detailed writeup of benchmark:
  http://www.brianfrankcooper.net/pubs/ycsb.pdf
• Open source YCSB tool coming soon




                                                  18

More Related Content

What's hot

MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorialMySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
Frederic Descamps
 
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
Kenny Gryp
 
MySQL 8.0 InnoDB Cluster - Easiest Tutorial
MySQL 8.0 InnoDB Cluster - Easiest TutorialMySQL 8.0 InnoDB Cluster - Easiest Tutorial
MySQL 8.0 InnoDB Cluster - Easiest Tutorial
Frederic Descamps
 
MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08
Kenny Gryp
 
MySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & OperationsMySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & Operations
Frederic Descamps
 
Oracle ACFS High Availability NFS Services (HANFS)
Oracle ACFS High Availability NFS Services (HANFS)Oracle ACFS High Availability NFS Services (HANFS)
Oracle ACFS High Availability NFS Services (HANFS)
Anju Garg
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
Kenny Gryp
 
Ironic - Vietnam OpenStack Technical Meetup #12
Ironic - Vietnam OpenStack Technical Meetup #12Ironic - Vietnam OpenStack Technical Meetup #12
Ironic - Vietnam OpenStack Technical Meetup #12
Vietnam Open Infrastructure User Group
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Mydbops
 
Memory access tracing [poug17]
Memory access tracing [poug17]Memory access tracing [poug17]
Memory access tracing [poug17]
Mahmoud Hatem
 
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle DatabaseOracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database
Sandesh Rao
 
InnoDb Vs NDB Cluster
InnoDb Vs NDB ClusterInnoDb Vs NDB Cluster
InnoDb Vs NDB Cluster
Mark Swarbrick
 
MySQL Group Replication: Handling Network Glitches - Best Practices
MySQL Group Replication: Handling Network Glitches - Best PracticesMySQL Group Replication: Handling Network Glitches - Best Practices
MySQL Group Replication: Handling Network Glitches - Best Practices
Frederic Descamps
 
Scaling for Performance
Scaling for PerformanceScaling for Performance
Scaling for Performance
ScyllaDB
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
Jean-François Gagné
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
Red_Hat_Storage
 
Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...
Michelle Holley
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
Cloudera, Inc.
 
Hybrid Data Guard to Cloud GEN2 ExaCS.pdf
Hybrid Data Guard to Cloud GEN2 ExaCS.pdfHybrid Data Guard to Cloud GEN2 ExaCS.pdf
Hybrid Data Guard to Cloud GEN2 ExaCS.pdf
ALI ANWAR, OCP®
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
Sveta Smirnova
 

What's hot (20)

MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorialMySQL InnoDB Cluster and Group Replication in a nutshell  hands-on tutorial
MySQL InnoDB Cluster and Group Replication in a nutshell hands-on tutorial
 
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
 
MySQL 8.0 InnoDB Cluster - Easiest Tutorial
MySQL 8.0 InnoDB Cluster - Easiest TutorialMySQL 8.0 InnoDB Cluster - Easiest Tutorial
MySQL 8.0 InnoDB Cluster - Easiest Tutorial
 
MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08MySQL Database Architectures - 2022-08
MySQL Database Architectures - 2022-08
 
MySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & OperationsMySQL InnoDB Cluster - Advanced Configuration & Operations
MySQL InnoDB Cluster - Advanced Configuration & Operations
 
Oracle ACFS High Availability NFS Services (HANFS)
Oracle ACFS High Availability NFS Services (HANFS)Oracle ACFS High Availability NFS Services (HANFS)
Oracle ACFS High Availability NFS Services (HANFS)
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
Ironic - Vietnam OpenStack Technical Meetup #12
Ironic - Vietnam OpenStack Technical Meetup #12Ironic - Vietnam OpenStack Technical Meetup #12
Ironic - Vietnam OpenStack Technical Meetup #12
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Memory access tracing [poug17]
Memory access tracing [poug17]Memory access tracing [poug17]
Memory access tracing [poug17]
 
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle DatabaseOracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database
Oracle AHF Insights 23c: Deeper Diagnostic Insights for your Oracle Database
 
InnoDb Vs NDB Cluster
InnoDb Vs NDB ClusterInnoDb Vs NDB Cluster
InnoDb Vs NDB Cluster
 
MySQL Group Replication: Handling Network Glitches - Best Practices
MySQL Group Replication: Handling Network Glitches - Best PracticesMySQL Group Replication: Handling Network Glitches - Best Practices
MySQL Group Replication: Handling Network Glitches - Best Practices
 
Scaling for Performance
Scaling for PerformanceScaling for Performance
Scaling for Performance
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...Accelerating Virtual Machine Access with the Storage Performance Development ...
Accelerating Virtual Machine Access with the Storage Performance Development ...
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Hybrid Data Guard to Cloud GEN2 ExaCS.pdf
Hybrid Data Guard to Cloud GEN2 ExaCS.pdfHybrid Data Guard to Cloud GEN2 ExaCS.pdf
Hybrid Data Guard to Cloud GEN2 ExaCS.pdf
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
 

Viewers also liked

Ycsb benchmarking
Ycsb benchmarkingYcsb benchmarking
Ycsb benchmarking
Sqrrl
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
DataStax Academy
 
Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Alexey Rusnak
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
Carlos Juzarte Rolo
 
Преимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBПреимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDB
UNETA
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
Antonio Severien
 
Big Data Benchmarking Tutorial
Big Data Benchmarking TutorialBig Data Benchmarking Tutorial
Big Data Benchmarking Tutorial
Tilmann Rabl
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
Nick Dimiduk
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. Redis
Tim Lossen
 
Big Data Benchmarking
Big Data BenchmarkingBig Data Benchmarking
Big Data Benchmarking
Venkata Naga Ravi
 
AWS re:Invent 2016: State of the Union: Amazon Alexa and Recent Advances in C...
AWS re:Invent 2016: State of the Union: Amazon Alexa and Recent Advances in C...AWS re:Invent 2016: State of the Union: Amazon Alexa and Recent Advances in C...
AWS re:Invent 2016: State of the Union: Amazon Alexa and Recent Advances in C...
Amazon Web Services
 
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Polly (MAC204)
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Polly (MAC204)AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Polly (MAC204)
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Polly (MAC204)
Amazon Web Services
 
Why Your Healthcare Business Intelligence Strategy Can't Win
Why Your Healthcare Business Intelligence Strategy Can't WinWhy Your Healthcare Business Intelligence Strategy Can't Win
Why Your Healthcare Business Intelligence Strategy Can't Win
Health Catalyst
 
AWS re:Invent 2016: Building a Smarter Home with Alexa(ALX303)
AWS re:Invent 2016: Building a Smarter Home with Alexa(ALX303)AWS re:Invent 2016: Building a Smarter Home with Alexa(ALX303)
AWS re:Invent 2016: Building a Smarter Home with Alexa(ALX303)
Amazon Web Services
 
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...
Amazon Web Services
 
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Rekognition (MAC203)
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Rekognition (MAC203)AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Rekognition (MAC203)
AWS re:Invent 2016: NEW LAUNCH! Introducing Amazon Rekognition (MAC203)
Amazon Web Services
 
Summary of "YCSB " paper for nosql summer reading in Tokyo" on Sep 15, 2010
Summary of "YCSB " paper for nosql summer reading in Tokyo" on Sep 15, 2010Summary of "YCSB " paper for nosql summer reading in Tokyo" on Sep 15, 2010
Summary of "YCSB " paper for nosql summer reading in Tokyo" on Sep 15, 2010
CLOUDIAN KK
 
AWS re:Invent 2016: Voice-enabling Your Home and Devices with Amazon Alexa an...
AWS re:Invent 2016: Voice-enabling Your Home and Devices with Amazon Alexa an...AWS re:Invent 2016: Voice-enabling Your Home and Devices with Amazon Alexa an...
AWS re:Invent 2016: Voice-enabling Your Home and Devices with Amazon Alexa an...
Amazon Web Services
 
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
AWS re:Invent 2016: Alexa in the Enterprise: How JPL Leverages Alexa to Furth...
Amazon Web Services
 
AWS CLOUD 2017 - 인공 지능과 클라우드와의 만남: Amazon의 신규 AI 서비스 (김무현 솔루션즈 아키텍트)
AWS CLOUD 2017 - 인공 지능과 클라우드와의 만남: Amazon의 신규 AI 서비스 (김무현 솔루션즈 아키텍트)AWS CLOUD 2017 - 인공 지능과 클라우드와의 만남: Amazon의 신규 AI 서비스 (김무현 솔루션즈 아키텍트)
AWS CLOUD 2017 - 인공 지능과 클라우드와의 만남: Amazon의 신규 AI 서비스 (김무현 솔루션즈 아키텍트)
Amazon Web Services Korea
 

Yahoo Cloud Serving Benchmark

  • 1. Yahoo! Cloud Serving Benchmark: Overview and results – February 3, 2010
      • Brian F. Cooper (cooperb@yahoo-inc.com)
      • Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears
      • System setup and tuning assistance from members of the Cassandra and HBase committers, and the Sherpa engineering team
  • 2. Versions of this deck
      • V4.1 – Original set of results from benchmark
      • V4.2 – Added Cassandra 0.5 versus 0.4.2 comparison, Cassandra range query results, and results for varying scan size
  • 3. Motivation
      • There are many “cloud DB” and “nosql” systems out there
          – Sherpa/PNUTS
          – BigTable
              • HBase, Hypertable, HTable
          – Megastore
          – Azure
          – Cassandra
          – Amazon Web Services
              • S3, SimpleDB, EBS
          – CouchDB
          – Voldemort
          – Dynomite
          – Etc: Tokyo, Redis, MongoDB
      • How do they compare?
          – Feature tradeoffs
          – Performance tradeoffs
          – Not clear!
  • 4. Goal
      • Implement a standard benchmark
          – Evaluate different systems on common workloads
          – Focus on performance and scale out
              • Future additions – availability, replication
      • Artifacts
          – Open source workload generator
          – Experimental study comparing several systems
  • 5. Benchmark tool
      • Java application
          – Many systems have Java APIs
          – Other systems via HTTP/REST, JNI or some other solution
      • Command-line parameters
          – DB to use
          – Target throughput
          – Number of threads
          – …
      • Workload parameter file
          – R/W mix
          – Record size
          – Data set
          – …
      • [Diagram: the workload parameter file and command-line parameters feed the YCSB client (workload executor, client threads, stats, DB client), which issues requests to the cloud DB]
      • Extensible: define new workloads
      • Extensible: plug in new clients
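The "plug in new clients" extension point on this slide can be pictured as a small interface that each database binding implements. The following is a minimal sketch in Python rather than the tool's own Java; the names `DBClient` and `InMemoryClient` and the method signatures are hypothetical and are not the actual YCSB API.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left


class DBClient(ABC):
    """Hypothetical plug-in interface; not the actual YCSB API."""

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def update(self, key, fields): ...

    @abstractmethod
    def scan(self, start_key, count): ...


class InMemoryClient(DBClient):
    """Toy backend used only to exercise the interface."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        # Return the whole record, or None if the key is absent.
        return self._rows.get(key)

    def update(self, key, fields):
        # Merge the given fields into the record, creating it if needed.
        self._rows.setdefault(key, {}).update(fields)

    def scan(self, start_key, count):
        # Return up to `count` records in key order, starting at start_key.
        keys = sorted(self._rows)
        first = bisect_left(keys, start_key)
        return [(k, self._rows[k]) for k in keys[first:first + count]]
```

A workload executor written against `DBClient` would not need to change when a binding for another system is added, which is the point of the extension hook.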
  • 6. Workloads
      • Workload – a particular combination of workload parameters
          – Defines read/write mix, request distribution, record size, …
          – Two ways to define workloads:
              • Adjust parameters to an existing workload (via properties file)
              • Define a new kind of workload (by writing Java code)
      • Experiment – running a particular workload on a particular hardware setup to produce a single graph for 1 or N systems
          – Example – vary throughput and measure latency while running a workload against Cassandra and HBase
      • Workload package – a collection of related workloads
          – Example: CoreWorkload – a set of basic read/write workloads
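To make "read/write mix" concrete, the sketch below draws operations at random according to a mix. The 50/50 and 95/5 proportions are those of Workloads A and B in this deck; the dictionary layout and the function names are illustrative assumptions, not the format of the tool's properties file.

```python
import random
from collections import Counter

# Mixes from the deck: Workload A is 50/50 read/update, Workload B is 95/5.
# The dictionary keys are illustrative, not actual property names.
WORKLOAD_A = {"read": 0.5, "update": 0.5}
WORKLOAD_B = {"read": 0.95, "update": 0.05}


def next_operation(mix, rng):
    """Pick one operation name with probability given by `mix`."""
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def operation_counts(mix, n, seed=0):
    """Draw n operations and count how often each kind occurs."""
    rng = random.Random(seed)
    return Counter(next_operation(mix, rng) for _ in range(n))
```

Calling `operation_counts(WORKLOAD_B, 10000)` should give roughly 95% reads, with the exact split depending on the seed.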
  • 7. Benchmark tiers
      • Tier 1 – Performance
          – For constant hardware, increase offered throughput until saturation
          – Measure resulting latency/throughput curve
          – “Sizeup” in Wisconsin benchmark terminology
      • Tier 2 – Scalability
          – Scaleup – Increase hardware, data size and workload proportionally. Measure latency; it should stay constant
          – Elastic speedup – Run workload against N servers; while the workload is running, add an (N+1)th server; measure the time series of latencies (should drop after adding the server)
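The Tier 1 procedure (raise the offered throughput until the system saturates, recording latency along the way) can be sketched as a simple sweep. Everything here is an assumption made for illustration: the `run_at` callback, the 95% cutoff used to detect saturation, and the `toy_system` latency model, which stands in for a real benchmark run.

```python
def sweep_until_saturation(run_at, targets, min_fraction=0.95):
    """Offer increasing target throughputs and record the latency curve.

    run_at(target) must return (achieved_ops_per_sec, avg_latency_ms).
    The sweep stops after the first target the system fails to keep up
    with; min_fraction is an assumed cutoff, not a value from the slides.
    """
    curve = []
    for target in targets:
        achieved, latency_ms = run_at(target)
        curve.append((target, achieved, latency_ms))
        if achieved < min_fraction * target:
            break
    return curve


def toy_system(target, capacity=5000.0, base_latency_ms=2.0):
    """Stand-in for a real run: a made-up queueing-style latency model."""
    achieved = min(target, capacity)
    utilization = min(achieved / capacity, 0.99)
    return achieved, base_latency_ms / (1.0 - utilization)
```

Plotting the second and third columns of the returned curve gives a latency-versus-throughput graph of the kind shown on the results slides.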
  • 8. Test setup
      • Setup
          – Six server-class machines
              • 8 cores (2 x quad-core) 2.5 GHz CPUs, 8 GB RAM, 6 x 146 GB 15K RPM SAS drives in RAID 1+0, Gigabit Ethernet, RHEL 4
          – Plus extra machines for clients, routers, controllers, etc.
          – Cassandra 0.4.2
          – HBase 0.20.2
          – MySQL 5.1.32 organized into a sharded configuration
          – Sherpa 1.8
          – No replication; force updates to disk (except HBase, which does not yet support this)
      • Workloads
          – 120 million 1 KB records = 20 GB per server
          – Reads retrieve whole record; updates write a single field
          – 100 or more client threads
      • Caveats
          – Write performance would be improved for Sherpa, sharded MySQL and Cassandra with a dedicated log disk
          – We tuned each system as well as we knew how, with assistance from the teams of developers
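The "20 GB per server" figure follows from the record count, record size and number of servers on this slide. A quick check, assuming decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes) and an even spread of records across the six machines:

```python
# Figures taken from the slide; decimal units and an even spread of
# records across servers are assumptions.
RECORDS = 120_000_000
RECORD_BYTES = 1_000
SERVERS = 6

total_gb = RECORDS * RECORD_BYTES / 1e9
per_server_gb = total_gb / SERVERS
print(f"{total_gb:.0f} GB total, {per_server_gb:.0f} GB per server")
```

At 20 GB per server, each machine holds 2.5 times as much data as its 8 GB of RAM.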
  • 9. Workload A – Update heavy
      • 50/50 Read/update
      • [Figures: “Workload A - Read latency” and “Workload A - Update latency”; average read latency (ms) and update latency (ms) versus throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL]
      • Comment: Cassandra is optimized for writes, and has better write latency. However, Sherpa has pretty good write latency, comparable read latency, and comparable peak throughput. HBase has good write latency because it does not sync updates to disk, at the cost of lower durability; but its read latency is very bad.
  • 10. Workload B – Read heavy
      • 95/5 Read/update
      • [Figures: “Workload B - Read latency” and “Workload B - Update latency”; average read latency (ms) and average update latency (ms) versus throughput (operations/sec) for Cassandra, HBase, Sherpa and MySQL]
      • Comment: Sherpa does very well here, with better read and write latency and peak throughput than Cassandra, and better read latency and peak throughput than HBase. Again, HBase write latency is very low because of no disk syncs. Buffer pool architecture is good for random reads.
  • 11. Workload E – short scans
      • Scans of 1-100 records of size 1 KB
      • [Figure: “Workload E - Scan latency”; average scan latency (ms) versus throughput (operations/sec) for HBase, Sherpa and Cassandra]
      • Comment: HBase and Sherpa are roughly equivalent for latency and peak throughput, even though HBase is “meant” for scans. Cassandra’s performance is poor, but the development team notes that many optimizations still need to be done.
  • 12. Workload E – range size
      • Vary size of range scans
      • [Figure: “Range size versus latency (Workload E)”; average range scan latency (ms) versus max range size (records) for HBase and Sherpa]
      • Comment: For small ranges, queries are similar to random lookups; Sherpa is efficient for random lookups and does well. As range size increases, HBase begins to perform better since it is optimized for large scans.
  • 13. Scale-up
      • Read heavy workload with varying hardware
      • [Figure: “Read latency during scale-up”; average read latency (ms) versus number of servers for Cassandra, HBase and Sherpa]
      • Comment: Sherpa scales well, with flat latency as system size increases. Cassandra scales less well, with more P2P communication. HBase is very unstable; with 3 or fewer servers it performs very poorly. More experiments are needed to get more data points on these curves.
  • 14. Elasticity
      • Run a read-heavy workload on 3 servers; add a 4th server after 5 minutes
      • [Figure: “Cassandra elastic read performance”; average read latency (ms) versus time (min)]
      • Comment: Cassandra shows nice elasticity; after a fourth server is added, average latency of requests quickly drops by 11% with little or no disruption.
  • 15. Elasticity
      • Run a read-heavy workload on 3 servers; add a 4th server after 5 minutes
      • [Figure: “HBase elastic read performance (detail)”; average read latency (ms) versus time (min)]
      • Comment: HBase initially exhibits a large latency spike, with some requests taking as much as 1000 ms; then latency settles down and eventually becomes 12% lower than the latency before adding the server.
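The percentage drops quoted on the two elasticity slides compare latency before and after the extra server joins. One way such a figure could be computed from a latency time series is sketched below; the function names, the `settle_min` parameter (which skips any transient spike right after the change) and the sample format are assumptions, and the deck does not say how its own percentages were derived.

```python
def mean(values):
    return sum(values) / len(values)


def latency_drop(series, added_at_min, settle_min=0.0):
    """Relative drop in mean latency after a server is added.

    series is a list of (minute, latency_ms) samples. Samples within
    settle_min minutes after the change are ignored so that a transient
    spike does not distort the comparison (an assumed convention).
    """
    before = [lat for t, lat in series if t < added_at_min]
    after = [lat for t, lat in series if t >= added_at_min + settle_min]
    return (mean(before) - mean(after)) / mean(before)
```

A result of 0.11 would correspond to an 11% drop. Without a settling window, a spike like the one described for HBase can make the computed drop negative even though latency ends up lower.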
  • 16. Cassandra 0.5 Results
      • [Figure: “Workload A - Update heavy”; average latency (ms) versus throughput (operations/sec) for Cassandra 0.5 read, Cassandra 0.5 update, Cassandra 0.4.2 read and Cassandra 0.4.2 update]
  • 17. Cassandra 0.5 Results
      • [Figure: “Workload B - Read heavy”; average latency (ms) versus throughput (operations/sec) for Cassandra 0.5 read, Cassandra 0.5 update, Cassandra 0.4.2 read and Cassandra 0.4.2 update]
  • 18. For more information
      • Contact: Brian Cooper (cooperb@yahoo-inc.com)
      • Detailed writeup of benchmark: http://www.brianfrankcooper.net/pubs/ycsb.pdf
      • Open source YCSB tool coming soon