NoSQL benchmarking

NoSQL datastores fall into the following categories: key-value stores, document databases, column-family stores and graph databases. The traditional TPC-* tests are not sufficient for these heterogeneous database systems. MongoDB, CouchDB, Cassandra, HBase, Memcached and others each belong to one of these four families, and YCSB can generate a common workload that simulates your use case so you can benchmark them.

Published in: Technology, Business

Transcript

  • 1. #CMGIndia NoSQL Database Benchmarking Prasoon Kumar MongoDB
  • 2. Agenda
      • What is NoSQL?
      • YCSB
      • Benchmarking parameters
      • Findings
      • Wrap up
  • 3. What is NoSQL?
      • There are many NoSQL systems: http://nosql-database.org/ lists 150 currently, up from 120 in 2012.
      • How do they compare?
          – Feature trade-off
          – Performance trade-off
          – Not clear!
      • (Slide graphic label: PNUTS)
  • 4. Scale-out vs Scale-up. Then: Scale Up. Now: Scale Out.
  • 5. Data Model
      • Key-Value Store: Riak, Memcached, Project Voldemort, Redis, BerkeleyDB
      • Document Database: MongoDB, CouchDB, OrientDB
      • Column-Family Stores: Amazon SimpleDB, Cassandra, HBase, Hypertable
      • Graph Databases: FlockDB, Neo4J, OrientDB
  • 6. Yahoo! Cloud Serving Benchmark, aka YCSB
  • 7. Origin
      • The paper that started it all: "Benchmarking Cloud Serving Systems with YCSB", 2010
      • Authors: Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears (Yahoo! Research)
      • Goal
          – Evaluate different systems on a common workload
          – Focus on performance and scale-out
          – Extensible through parameters, and even by writing different workloads
  • 8. Install
      • Download YCSB to the client node: git clone https://github.com/achille/YCSB
      • Install the Java driver for the particular database
      • Run the ycsb commands:
          – ./bin/ycsb load mongodb -P workloads/workloada -p recordcount=10000
          – ./bin/ycsb run mongodb -P workloads/workloada -p recordcount=10000
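The load and run commands on slide 8 differ only in the phase name, so they can be generated from one set of parameters. The following is a minimal sketch, not part of the original deck: the helper name `ycsb_command` and its arguments are hypothetical, and it only builds the command lines rather than executing them.

```python
import shlex


def ycsb_command(phase, db, workload, props=None, ycsb_bin="./bin/ycsb"):
    """Build the argument list for one YCSB phase ('load' or 'run').

    Hypothetical helper: it mirrors the command shape shown on slide 8
    and does not start YCSB itself.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    args = [ycsb_bin, phase, db, "-P", workload]
    # Each extra property becomes its own "-p key=value" pair.
    for key, value in (props or {}).items():
        args += ["-p", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Load the data first, then execute the workload against it.
    for phase in ("load", "run"):
        cmd = ycsb_command(phase, "mongodb", "workloads/workloada",
                           {"recordcount": 10000})
        print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would execute each phase; keeping command construction separate makes it easy to reuse the same parameters for both phases.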
  • 9. Steps. There are 6 steps to running a workload:
      1. Set up the database system to test
      2. Choose the appropriate DB interface layer
      3. Choose the appropriate workload
      4. Choose the appropriate runtime parameters (number of client threads, target throughput, etc.)
      5. Load the data
      6. Execute the workload
  • 10. Tool (architecture diagram)
      • YCSB client: workload executor, client threads, DB client, stats
      • Workload parameter file: R/W mix, record size, data set, …
      • Command-line parameters: DB to use, target throughput, number of threads, …
      • Extensible: plug in new clients; define new workloads
      • System under test: NoSQL DB
  • 11. Operations within workload. Operations against a data store were randomly selected and could be of the following types:
      1. Insert: Inserts a new record.
      2. Update: Updates a record by replacing the value of one field.
      3. Read: Reads a record, either one randomly selected field, or all fields.
      4. Scan: Scans records in order, starting at a randomly selected record key. The number of records to scan is also selected randomly from the range between 1 and 100.
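The random selection described on slide 11 can be sketched as a weighted choice over the four operation types. This is an illustration only, not YCSB's own implementation: the proportions in `OPERATION_MIX` are hypothetical, and the scan length is drawn uniformly from 1 to 100 as the slide states.

```python
import random

# Hypothetical operation mix (weights sum to 1.0); not taken from the slides.
OPERATION_MIX = {"read": 0.5, "update": 0.3, "insert": 0.1, "scan": 0.1}


def next_operation(rng, mix=OPERATION_MIX):
    """Pick one operation at random according to the mix.

    Returns (operation, scan_length); scan_length is None unless the
    operation is a scan, in which case it is an integer in [1, 100].
    """
    ops = list(mix)
    weights = [mix[op] for op in ops]
    op = rng.choices(ops, weights=weights, k=1)[0]
    if op == "scan":
        return op, rng.randint(1, 100)
    return op, None


def generate(count, seed=42):
    """Generate a reproducible sequence of operations."""
    rng = random.Random(seed)
    return [next_operation(rng) for _ in range(count)]


if __name__ == "__main__":
    for op, scan_length in generate(10):
        print(op, scan_length if scan_length is not None else "")
```

Seeding the generator keeps a run repeatable, which matters when the same operation sequence has to be replayed against several databases.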
  • 12. Workloads
      1. Workload A: Update heavy
      2. Workload B: Read mostly
      3. Workload C: Read only
      4. Workload D: Read latest
      5. Workload E: Scan short ranges
      6. Workload F: Read-modify-write
      7. Workload G: Write heavy
    Each workload is defined by:
      1. The number of records manipulated (read or written)
      2. The number of columns per record
      3. The total size of a record or the size of each column
      4. The number of threads used to load the system
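The first three parameters on slide 12 together fix the raw size of the data set, which is worth estimating before loading. The sketch below is a rough calculation under stated assumptions: the function names are hypothetical, the 10 columns of 100 bytes each are assumed example values, and record keys and storage overhead are ignored.

```python
def record_size_bytes(field_count, field_length):
    """Payload size of one record: columns times bytes per column."""
    return field_count * field_length


def dataset_size_bytes(record_count, field_count, field_length):
    """Raw payload size of the whole data set, ignoring keys and overhead."""
    return record_count * record_size_bytes(field_count, field_length)


if __name__ == "__main__":
    # recordcount=10000 comes from slide 8; the column count and column
    # size below are assumed example values, not figures from the slides.
    total = dataset_size_bytes(record_count=10000, field_count=10,
                               field_length=100)
    print(f"{total} bytes")
```

Comparing this figure with the RAM of the database nodes indicates whether a run exercises memory or disk, which can change the results considerably.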
  • 13. Measure It!
      • Availability: what is the uptime requirement?
      • Throughput
          – average read/write/users?
          – peak throughput?
          – OPS (operations per second)? per hour? per day?
      • Responsiveness
          – what is acceptable latency?
          – is higher latency during peak times acceptable?
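The throughput and responsiveness questions on slide 13 reduce to two calculations over raw measurements: operations per second and a latency percentile. The following is a minimal sketch with hypothetical function names and made-up sample latencies; it uses the nearest-rank percentile definition, which is one of several in common use.

```python
import math


def throughput_ops_per_sec(op_count, elapsed_seconds):
    """Average throughput over a measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return op_count / elapsed_seconds


def percentile(latencies_ms, pct):
    """Nearest-rank percentile of a list of latencies (0 < pct <= 100)."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up sample latencies in milliseconds, for illustration only.
    samples = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    print("ops/sec:", throughput_ops_per_sec(10000, 4.0))
    print("p50 ms:", percentile(samples, 50))
    print("p95 ms:", percentile(samples, 95))
```

Reporting a high percentile alongside the average makes the "is higher latency during peak times acceptable?" question concrete, since averages hide tail latency.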
  • 14. Results Throughput varies a lot depending on durability and consistency parameters, even for a single database.
  • 15. Disadvantages
      • Not all databases provide secondary indexes, so YCSB doesn't test them. Some other rich functionality isn't tested either.
      • It provides limited support for scaling across multiple clients.
      • Poor documentation: read the source code.
  • 16. Summary
      • YCSB is a versatile tool for testing many different workloads.
      • It allows apples-to-apples comparison of NoSQL and cloud databases.
      • It doesn't test for correctness.
      • A database can be configured for durability, consistency and other parameters. You should model your use cases and benchmark.
      • Polyglot persistence is here to stay! Use different data stores for different work within a single application.
  • 17. #CMGIndia Thank You Prasoon Kumar, @prasoonk MongoDB