Summary of "YCSB " paper for nosql summer reading in Tokyo" on Sep 15, 2010

These are the summary materials for the paper "Benchmarking Cloud Serving Systems with YCSB", prepared for the NoSQL Summer reading group in Tokyo, held on September 15, 2010 at Gemini Mobile Technologies in Shibuya, Tokyo.


  1. Benchmarking Cloud Serving Systems with YCSB
     by Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R.
     Gemini Mobile Technologies, Inc.
     NOSQL Tokyo Reading Group (http://nosqlsummer.org/city/tokyo)
     September 15, 2010
     Tags: #ycsb #nosql
  2. Benchmarking Cloud Serving Systems with YCSB
     Authors: Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R.
     Abstract: "... We present the Yahoo! Cloud Serving Benchmark (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework/tool is that it is extensible: it supports easy definition of new workloads, in addition to making it easy to benchmark new systems."
     Appeared in: ACM Symposium on Cloud Computing, ACM, Indianapolis, IN, USA (2010)
     http://research.yahoo.com/files/ycsb.pdf
  3. 1. Introduction
     Hard to compare non-relational DBs:
     - Data models vary: key-value vs. column-oriented vs. document-oriented.
     - Each DB's performance profile emphasizes different operations (writes vs. reads vs. updates).
     - Consistency models, replication, fault handling, etc. all differ.
     Goal: a standard benchmarking framework for evaluating "serving" systems that perform online read/write data operations.
     YCSB (Yahoo! Cloud Serving Benchmark):
     - Workload-generating client.
     - Package of standard workloads (e.g., read-heavy, scan, etc.).
     - Package of DB interface layers for Cassandra, HBase, MongoDB.
     - Extensible: add new workloads, add new DBs.
  4. 2.1 Cloud Serving System Characteristics
     Scale-out
     - To add capacity, add servers.
     - Goal is constant performance per node.
     Elasticity
     - Load is distributed by adding a server to a running system.
     - Temporary performance decrease while data is redistributed.
     High availability
     - System remains available in the face of failures.
  5. 2.2 Classifications of Systems and Tradeoffs
     Read vs. write performance
     - Write-optimized, log-structured systems append updates to a commit log; reads may need to merge update information (see the sketch below).
     Latency vs. durability
     - Whether writes are synced to disk before acknowledging.
     Synchronous vs. asynchronous replication
     Data partitioning
     - Row-based storage: a row's data is stored contiguously on disk.
     - Column storage: different columns can be stored separately.
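To make the read-versus-write tradeoff concrete, here is a minimal in-memory sketch of the log-structured pattern. The class and field names are invented for illustration; this is not code from Cassandra, HBase, or any other benchmarked system.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: writes are cheap appends, reads pay a merge cost.
public class LogStructuredStore {
    private final List<String> commitLog = new ArrayList<>();                  // append-only log (durability)
    private final Map<String, Map<String, String>> memtable = new HashMap<>(); // recent updates, per key
    private final Map<String, Map<String, String>> baseData = new HashMap<>(); // older, "flushed" records

    public void update(String key, Map<String, String> fields) {
        commitLog.add(key + "=" + fields);                                   // sequential append: fast writes
        memtable.computeIfAbsent(key, k -> new HashMap<>()).putAll(fields);  // no in-place disk update
    }

    public Map<String, String> read(String key) {
        // Reads merge the base record with any buffered updates for that key.
        Map<String, String> merged = new HashMap<>(baseData.getOrDefault(key, Map.of()));
        merged.putAll(memtable.getOrDefault(key, Map.of()));
        return merged;
    }
}

A read-optimized system makes the opposite choice: it keeps each record up to date in place, so reads are a single lookup but every write touches the stored record.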
  6. 3.1 Benchmark Tiers
     Tier 1: Performance (latency)
     - Measure latency as offered throughput is increased, until the system is saturated.
     Tier 2: Scaling
     - Scale-up: increase the number of servers while the amount of data and offered throughput grow proportionally; latency should stay constant.
     - Elastic speedup: add more servers to a running system; performance should improve.
  7. 4. Benchmark Workloads
     Operation types: insert, update, read, scan.
     Data size:
     - Number of fields (e.g., 10).
     - Field length (e.g., 100 bytes).
     Request distribution (sampler sketch below):
     - Uniform: all items equally likely.
     - Zipfian: some records are very popular, most records are unpopular.
     - Latest: like Zipfian, with the most recently inserted records as the most popular.
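The three request distributions can be illustrated with a small sampler. This is not YCSB's own generator code (the class and method names are invented for the sketch); it builds a Zipfian CDF over the key space and derives the "latest" distribution by mapping popularity onto recency.

import java.util.Random;

// Illustrative sampler for the uniform, Zipfian, and "latest" request distributions.
public class KeyChooser {
    private final double[] cdf;            // cumulative Zipfian probabilities, index 0 = most popular
    private final Random rng = new Random();

    public KeyChooser(int records, double zipfConstant) {   // a skew around 0.99 is a common choice
        double[] weight = new double[records];
        double norm = 0.0;
        for (int i = 0; i < records; i++) {
            weight[i] = 1.0 / Math.pow(i + 1, zipfConstant);
            norm += weight[i];
        }
        cdf = new double[records];
        double running = 0.0;
        for (int i = 0; i < records; i++) {
            running += weight[i] / norm;
            cdf[i] = running;
        }
    }

    /** Uniform: every record equally likely. */
    public int nextUniform() {
        return rng.nextInt(cdf.length);
    }

    /** Zipfian: a few records receive most of the traffic. */
    public int nextZipfian() {
        double u = rng.nextDouble();
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {                  // binary search over the CDF
            int mid = (lo + hi) / 2;
            if (cdf[mid] < u) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** "Latest": apply the Zipfian choice to recency, so the newest inserts are hottest. */
    public int nextLatest(int lastInsertedKey) {
        return Math.max(0, lastInsertedKey - nextZipfian());
    }
}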
  8. 4.2 Core Workloads
     The core workload package from the paper (a sample configuration sketch follows):
     - Workload A (update heavy): 50% reads / 50% updates, Zipfian.
     - Workload B (read heavy): 95% reads / 5% updates, Zipfian.
     - Workload C (read only): 100% reads, Zipfian.
     - Workload D (read latest): 95% reads / 5% inserts, "latest" distribution.
     - Workload E (short ranges): 95% scans / 5% inserts, Zipfian/uniform.
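As a concrete example, a read-heavy configuration in the spirit of Workload B might look like the following. The property keys mirror those used by YCSB's CoreWorkload (readproportion, requestdistribution, and so on); treat the exact names and defaults as assumptions to be checked against the YCSB version in use.

import java.util.Properties;

// Illustrative parameter set for a 95% read / 5% update, Zipfian workload.
// Property names follow YCSB's core workload conventions; verify against your version.
public class WorkloadBExample {
    public static Properties workloadB() {
        Properties p = new Properties();
        p.setProperty("recordcount", "1000000");      // records created in the load phase
        p.setProperty("operationcount", "1000000");   // operations run in the transaction phase
        p.setProperty("fieldcount", "10");            // 10 fields per record
        p.setProperty("fieldlength", "100");          // 100 bytes per field, about 1 KB per record
        p.setProperty("readproportion", "0.95");
        p.setProperty("updateproportion", "0.05");
        p.setProperty("requestdistribution", "zipfian");
        return p;
    }
}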
  9. 5.1 YCSB Client Architecture
     - Workload executor: traffic generation for both the "load" and "transaction" phases (skeleton below).
     - DB interface layer: custom for each DB.
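A hypothetical skeleton of one workload-executor thread is sketched below. The class and its nested KeyValueStore interface are invented for illustration and do not reproduce the real YCSB client code, but they show the load phase, the transaction-phase operation mix, and the latency measurement around each call.

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of one client worker thread; many such threads run in parallel.
public class WorkerThread implements Runnable {
    public interface KeyValueStore {                        // stands in for the DB interface layer
        void insert(String table, String key, String value);
        String read(String table, String key);
        void update(String table, String key, String value);
    }

    private final KeyValueStore db;
    private final boolean loadPhase;                        // true: populate the store; false: run the op mix
    private final int operations;
    private final AtomicLong totalLatencyNanos = new AtomicLong();

    public WorkerThread(KeyValueStore db, boolean loadPhase, int operations) {
        this.db = db;
        this.loadPhase = loadPhase;
        this.operations = operations;
    }

    @Override
    public void run() {
        for (int i = 0; i < operations; i++) {
            String key = "user" + ThreadLocalRandom.current().nextInt(operations);
            long start = System.nanoTime();
            if (loadPhase) {
                db.insert("usertable", "user" + i, "payload");       // load phase: bulk inserts
            } else if (ThreadLocalRandom.current().nextDouble() < 0.95) {
                db.read("usertable", key);                           // transaction phase: 95% reads...
            } else {
                db.update("usertable", key, "new-payload");          // ...and 5% updates (a Workload B-style mix)
            }
            totalLatencyNanos.addAndGet(System.nanoTime() - start);  // feed the latency statistics
        }
    }

    public double averageLatencyMillis() {
        return totalLatencyNanos.get() / 1e6 / operations;
    }
}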
  10. 5.2 Extensibility
      The YCSB package is open-source Java code.
      Workload executor:
      - Modify configuration (e.g., operation mix, distribution, data size, etc.).
      - Or write a custom Java class to define a workload.
      DB interface layer:
      - Implement the interface (read, update, insert, delete, scan) for the DB (example below).
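The sketch below shows the shape of a DB interface layer: an abstract class with the five operations named on the slide, plus a trivial in-memory binding showing where real data-store client calls (e.g., Cassandra or HBase APIs) would go. The real com.yahoo.ycsb.DB signatures differ by version, so this is an illustration of the pattern rather than the actual API.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.Vector;

// Illustrative abstract binding; return 0 for success, non-zero for an error (a convention
// invented for this sketch).
abstract class StoreBinding {
    public abstract int read(String table, String key, Set<String> fields,
                             Map<String, String> result);
    public abstract int scan(String table, String startKey, int recordCount,
                             Set<String> fields, Vector<Map<String, String>> result);
    public abstract int update(String table, String key, Map<String, String> values);
    public abstract int insert(String table, String key, Map<String, String> values);
    public abstract int delete(String table, String key);
}

// Toy in-memory binding: each method would normally wrap a call to the target store's client library.
class InMemoryBinding extends StoreBinding {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    public int read(String table, String key, Set<String> fields, Map<String, String> result) {
        Map<String, String> row = data.get(key);
        if (row == null) return 1;
        result.putAll(row);
        return 0;
    }

    public int scan(String table, String startKey, int recordCount,
                    Set<String> fields, Vector<Map<String, String>> result) {
        data.keySet().stream().sorted()
            .filter(k -> k.compareTo(startKey) >= 0)
            .limit(recordCount)
            .forEach(k -> result.add(new HashMap<>(data.get(k))));
        return 0;
    }

    public int update(String table, String key, Map<String, String> values) {
        data.computeIfAbsent(key, k -> new HashMap<>()).putAll(values);
        return 0;
    }

    public int insert(String table, String key, Map<String, String> values) {
        return update(table, key, values);
    }

    public int delete(String table, String key) {
        data.remove(key);
        return 0;
    }
}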
  11. 6. Results: Setup
      Four DBs tested:
      - Cassandra 0.5.0
      - HBase 0.20.3
      - PNUTS (on MySQL 5.1.24)
      - Sharded MySQL 5.1.32
      Hardware: 6 servers, each with dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8 GB RAM, a 6-disk RAID-10 array, and gigabit Ethernet.
      YCSB client ran on a separate 8-core server with up to 500 threads; the client was not the bottleneck.
      No replication.
      Data: 120 million 1 KB records (120 GB total), so each server stored 20 GB.
      Cassandra, PNUTS, and MySQL were configured to sync writes to disk; HBase was not.
      Periodic compaction operations.
  12. 6. Results: Read vs. Write Performance
      - Cassandra and HBase had better performance on the write-heavy workload.
      - PNUTS and MySQL had better performance on the read-heavy workload.
  13. 6. Results: Scalability
      - Vary the number of servers from 2 to 12; data size and request rate scale proportionally.
      - Cassandra and PNUTS scale well.
      - HBase is erratic.
  14. 6. Results: Elasticity
      - Start with 2 servers holding 120 GB of data, then add servers up to 6 (e.g., going from 5 to 6 servers at the 10-minute mark).
      - Cassandra, HBase, and PNUTS were all able to grow elastically.
      - HBase does not repartition data until the next compaction.
      - PNUTS was best, with the most stable latency while elastically repartitioning data.
  15. 7. Future Work
      - Tier 3: Availability
      - Tier 4: Replication
  16. Further Study
      - Main site: http://research.yahoo.com/Web_Information_Management/YCSB
      - Source code: http://github.com/brianfrankcooper/YCSB
      - Mailing list: http://tech.groups.yahoo.com/group/ycsb-users/
