NoSQL: Cassandra vs. HBase
  • 1. YCSB: Yahoo! Cloud Serving Benchmark • Scalable Distributed Systems • Antonio L. Severien (antonio.severien@gmail.com) • João Rosa (Joao.rui.rosa@gmail.com)
  • 2. Overview • Distributed Databases • Cassandra • HBase • YCSB General View • YCSB Details • Amazon EC2 • YCSB Results • YCSB Future • Conclusions • References
  • 3. Distributed Databases • Traditional RDBMS: ACID transactions; query language (SQL); data tied to the data model (hard to analyze); scalable only up to a limit • Distributed databases: not ACID; not relational; column-oriented (key-value); CAP (Consistency, Availability, Partition tolerance); Big Data (massively scalable)
  • 4. Distributed Databases • Sherpa/PNUTS • BigTable: HBase, Hypertable, HTable • Megastore • Azure • Cassandra • Amazon Web Services: S3, SimpleDB, EBS • CouchDB • Voldemort • Dynomite • Tokyo • Redis • MongoDB
  • 5. Distributed Databases • NoSQL databases differ in design and architecture: • Cassandra: Thrift, Gossip, Token ring, … • HBase: HDFS, ZooKeeper, Hadoop (MapReduce) • BigTable: GFS, Chubby (lock service), MapReduce
  • 6. Cassandra • Highlights: high availability; incremental scalability; eventually consistent; tradeoffs between consistency and latency; minimal administration; no SPOF (single point of failure)
  • 7. Cassandra • CAP-aware: Cassandra favors Availability and Partition tolerance (AP) and is eventually consistent • Providing strong consistency in Cassandra increases latency • Partitioning: token-oriented • Explicit replication: replication factor ≤ total nodes • High-level clients: Python, Java, C#, .NET, Scala, Ruby, PHP, Erlang, Haskell, etc. • Thrift: driver-level interface
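Slide 7's consistency/latency tradeoff can be made concrete with the usual quorum rule: if R replicas answer a read and W replicas acknowledge a write, and R + W > RF (the replication factor), every read overlaps the latest write. The sketch below assumes Cassandra-style level names (ONE, QUORUM, ALL) with QUORUM = floor(RF/2) + 1. It is an illustration, not Cassandra code. Higher levels wait on more replicas, which is where the extra latency comes from.

```python
# Sketch: does a read/write consistency-level pair give strong consistency?
# Assumption: Cassandra-style levels, with QUORUM = floor(RF / 2) + 1 replicas.

def replicas_for(level: str, rf: int) -> int:
    """Number of replicas that must respond at the given consistency level."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """Read and write replica sets must overlap: R + W > RF."""
    r = replicas_for(read_level, rf)
    w = replicas_for(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for rl, wl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(rl, wl, is_strongly_consistent(rl, wl, rf=3))
```

With RF = 3, ONE/ONE is only eventually consistent. QUORUM/QUORUM and ONE/ALL both satisfy the overlap rule, but each operation waits on more replicas.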
  • 8. Cassandra • Data Model • Cluster: the machines (nodes) in a logical Cassandra instance; can contain multiple keyspaces • Keyspace: namespace for ColumnFamilies • ColumnFamilies: contain multiple columns, each with a name, value and timestamp, referenced by row keys; analogous to a table in an RDBMS • SuperColumns: columns with subcolumns • Rows and columns, e.g. keyA → Column1, Column2, Column3; keyB → Column5, Column6, Column10 • Column: byte[] name, byte[] value, i64 timestamp
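The data model on slide 8 maps naturally onto nested maps: keyspace → column family → row key → column name → (value, timestamp). The toy sketch below reconciles two writes to the same column by keeping the higher timestamp (last write wins). All class names are hypothetical, and real Cassandra also handles ties, deletions (tombstones), and SuperColumns, none of which are modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int  # i64 timestamp, as in the slide's column layout


class ColumnFamily:
    """Toy column family: row key -> column name -> Column (hypothetical class)."""

    def __init__(self) -> None:
        self.rows: dict[bytes, dict[bytes, Column]] = {}

    def insert(self, row_key: bytes, column: Column) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Last write wins: keep the version with the strictly higher timestamp.
        if current is None or column.timestamp > current.timestamp:
            row[column.name] = column

    def get(self, row_key: bytes, name: bytes) -> bytes | None:
        column = self.rows.get(row_key, {}).get(name)
        return None if column is None else column.value


# A keyspace is a named container of column families.
keyspace: dict[str, ColumnFamily] = {"Users": ColumnFamily()}
users = keyspace["Users"]
users.insert(b"keyA", Column(b"Column1", b"old", timestamp=1))
users.insert(b"keyA", Column(b"Column1", b"new", timestamp=2))
users.insert(b"keyA", Column(b"Column1", b"stale", timestamp=0))  # ignored: older
```

After the three inserts, reading `keyA`/`Column1` returns `b"new"`, because the late-arriving write carries an older timestamp.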
  • 9. Cassandra: Partitioning and Replication (diagrams)
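Slides 7 and 9 describe token-oriented partitioning with explicit replication. The sketch below assumes a RandomPartitioner-like scheme (an MD5 hash of the key as the token) and a SimpleStrategy-like placement: the first replica is the node whose token is the first at or after the key's token, and the remaining replicas are the next nodes clockwise. Token values and node names are made up for illustration. Real Cassandra uses a different token range and supports other strategies.

```python
from __future__ import annotations

import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns the range (previous token, its token]."""

    def __init__(self, node_tokens: dict[str, int], replication_factor: int) -> None:
        if replication_factor > len(node_tokens):
            raise ValueError("replication factor must be <= number of nodes")
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]
        self.rf = replication_factor

    @staticmethod
    def token_for(key: bytes) -> int:
        # MD5 digest as an integer in [0, 2**128), in the spirit of RandomPartitioner.
        return int.from_bytes(hashlib.md5(key).digest(), "big")

    def replicas(self, key: bytes) -> list[str]:
        # Owner = first node whose token >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, self.token_for(key)) % len(self.ring)
        # Further replicas: the next nodes clockwise on the ring.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


SPACE = 2**128
ring = TokenRing(
    {"node1": SPACE // 3, "node2": 2 * SPACE // 3, "node3": SPACE - 1},
    replication_factor=2,
)
```

Adding a node only splits one existing range, so only the keys in that range move. This is the "incremental scalability" listed on slide 6.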
  • 10. HBase • “HBase is more a datastore than a database”: it lacks many of the features of an RDBMS • Distributed and scalable big data store • Regions model • Strong consistency
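The "regions model" on slide 10 splits a table into contiguous, sorted ranges of row keys. Each range is served by one RegionServer at a time, so every row has a single serving owner. The sketch below locates the region for a row key by binary search over split keys. The split keys and server names are hypothetical. Real HBase finds regions through its catalog table rather than a local list.

```python
from __future__ import annotations

import bisect


class RegionMap:
    """Toy region lookup: regions are sorted, contiguous [start, end) row-key ranges."""

    def __init__(self, split_keys: list[bytes], servers: list[str]) -> None:
        # n split keys give n + 1 regions: [b"", s0), [s0, s1), ..., [s_last, end)
        if split_keys != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.split_keys = split_keys
        self.servers = servers

    def region_for(self, row_key: bytes) -> int:
        # A region's start key is inclusive, hence bisect_right.
        return bisect.bisect_right(self.split_keys, row_key)

    def server_for(self, row_key: bytes) -> str:
        return self.servers[self.region_for(row_key)]


regions = RegionMap([b"g", b"p"], ["rs1", "rs2", "rs3"])
```

Sorted ranges make row-key scans cheap: consecutive keys sit in the same region, or in the next one.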
  • 11. HBase: built on top of the Hadoop Distributed Filesystem (HDFS)
  • 12. HBase • The NameNode is responsible for maintaining the filesystem metadata. • The DataNodes are responsible for storing HDFS blocks.
  • 13. HBase • The NameNode is responsible for maintaining the filesystem metadata. • The DataNodes are responsible for storing HDFS blocks. • Note: in our case study, we were only interested in the HDFS layer.
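Slides 12 and 13 divide HDFS into metadata (NameNode) and block storage (DataNodes). The sketch below models only the NameNode's side: for each file, the list of block IDs and the DataNodes holding each replica. Several values are assumptions: the 64 MB block size (a common default at the time, and configurable), the replication factor of 3, and the round-robin placement. Real HDFS uses a rack-aware placement policy, and the block bytes themselves live on the DataNodes, which this sketch does not model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class NameNodeSketch:
    """Toy NameNode: metadata only (file path -> [(block ID, DataNodes)])."""

    datanodes: list[str]
    block_size: int = 64 * 1024 * 1024  # assumed 64 MB; configurable in HDFS
    replication: int = 3                # assumed replication factor
    files: dict[str, list[tuple[int, list[str]]]] = field(default_factory=dict)
    next_block_id: int = 0

    def __post_init__(self) -> None:
        if self.replication > len(self.datanodes):
            raise ValueError("replication cannot exceed the number of DataNodes")

    def add_file(self, path: str, length: int) -> list[tuple[int, list[str]]]:
        """Record which DataNodes hold each block of a new file."""
        nodes = cycle(self.datanodes)
        n_blocks = -(-length // self.block_size)  # ceiling division
        blocks = []
        for _ in range(n_blocks):
            # Round-robin placement for illustration only (HDFS is rack-aware).
            targets = [next(nodes) for _ in range(self.replication)]
            blocks.append((self.next_block_id, targets))
            self.next_block_id += 1
        self.files[path] = blocks
        return blocks


nn = NameNodeSketch(["dn1", "dn2", "dn3", "dn4"])
blocks = nn.add_file("/hbase/example", 150 * 1024 * 1024)  # hypothetical path
```

A 150 MB file becomes three blocks, each stored on three distinct DataNodes. A client asks the NameNode where the blocks are and then reads the data directly from the DataNodes.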
  • 14. HBase
  • 15. HBase: DataNodes and NameNode (diagram)
  • 16. HBase • Data is stored in HBase tables. • Tables are made of rows and columns. • All columns belong to a particular column family. Important note: all members of a column family are stored together. • A query that stays within one column family therefore performs better.
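Slide 16's performance note comes from HBase keeping one store per column family. The toy table below does the same and counts how many per-family stores a read opens, so a query limited to one family touches one store. The class name and the counter are hypothetical. Real HBase stores are files (HFiles) plus an in-memory buffer, not Python dicts.

```python
from __future__ import annotations

from collections import defaultdict


class FamilyStoreTable:
    """Toy HBase-like table: one separate store per column family."""

    def __init__(self, families: list[str]) -> None:
        # family -> row key -> qualifier -> value
        self.stores: dict[str, dict[bytes, dict[bytes, bytes]]] = {
            f: defaultdict(dict) for f in families
        }
        self.stores_touched = 0  # how many family stores reads have opened

    def put(self, row: bytes, family: str, qualifier: bytes, value: bytes) -> None:
        # Families are fixed when the table is created; an unknown one raises KeyError.
        self.stores[family][row][qualifier] = value

    def get(self, row: bytes, families: list[str]) -> dict[str, dict[bytes, bytes]]:
        result = {}
        for f in families:
            self.stores_touched += 1
            result[f] = dict(self.stores[f].get(row, {}))
        return result


table = FamilyStoreTable(["info", "stats"])
table.put(b"row1", "info", b"name", b"alice")
table.put(b"row1", "stats", b"visits", b"3")
```

Reading only `info` opens one store. Reading `info` and `stats` opens two. This is why columns that are read together belong in the same family.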
  • 17. YCSB General View • Which is the best NoSQL DB? • How do we compare them? • Yahoo! Cloud Serving Benchmark (YCSB) • Benchmarking tool • Evaluates the performance of key-value and cloud DBs on a common set of workloads • Client: an extensible workload generator • Yahoo! Research • Brian F. Cooper (cooperb@yahoo-inc.com) • Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears
  • 18. YCSB Details • How does it work? • YCSB Client: Workload Executor, Client Threads, Statistics, DB Interface Layer → Cloud Serving Store • Workload file: read/write mix, record size, popularity distribution, … • Command line: DB to use, workload to use, target throughput, number of threads, …
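The client architecture on slide 18 separates a DB interface layer, with one binding per datastore, from a workload executor that generates the operation mix and collects statistics. The sketch below mirrors that split in simplified form. `DB`, `InMemoryDB`, `run_workload` and the `user<N>` key format are hypothetical names for illustration. YCSB's real bindings are Java classes with richer method signatures. Keys are chosen uniformly here rather than with a popularity distribution.

```python
from __future__ import annotations

import random
from abc import ABC, abstractmethod
from collections import Counter


class DB(ABC):
    """Hypothetical DB interface layer: one subclass per datastore binding."""

    @abstractmethod
    def read(self, key: str) -> dict[str, str] | None: ...

    @abstractmethod
    def update(self, key: str, values: dict[str, str]) -> bool: ...


class InMemoryDB(DB):
    """Stand-in binding backed by a dict (no real datastore involved)."""

    def __init__(self) -> None:
        self.data: dict[str, dict[str, str]] = {}

    def read(self, key):
        return self.data.get(key)

    def update(self, key, values):
        self.data.setdefault(key, {}).update(values)
        return True


def run_workload(db: DB, record_count: int, operation_count: int,
                 read_proportion: float, seed: int = 0) -> Counter:
    """Workload executor: issues reads/updates per the mix and counts them."""
    rng = random.Random(seed)
    stats: Counter = Counter()
    for _ in range(operation_count):
        key = f"user{rng.randrange(record_count)}"  # uniform key choice (simplification)
        if rng.random() < read_proportion:
            db.read(key)
            stats["READ"] += 1
        else:
            db.update(key, {"field0": "x" * 100})  # one 100-byte field
            stats["UPDATE"] += 1
    return stats


stats = run_workload(InMemoryDB(), record_count=1000, operation_count=1000,
                     read_proportion=0.5)
```

Adding a new datastore means writing another `DB` subclass. The executor and the statistics stay unchanged, which is the extensibility slide 17 refers to.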
  • 19. YCSB Details • Benchmark Tiers • Performance: measure the latency/throughput curve; increase throughput until saturation • Scalability: scale up (increase hardware, data size and throughput proportionally); elastic speedup (add servers while running a workload)
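The latency/throughput curve on slide 19 bends at saturation partly because a fixed pool of client threads is a closed loop: each thread issues one operation and waits for it to finish. Under that assumption, with no think time, Little's law bounds throughput at threads / average latency. A target rate, like YCSB's `-target`, caps it further. The function below is a back-of-the-envelope model, not part of YCSB.

```python
from __future__ import annotations


def expected_throughput(threads: int, avg_latency_ms: float,
                        target_ops_per_sec: float | None = None) -> float:
    """Approximate ops/sec of a closed-loop client (Little's law, no think time)."""
    if avg_latency_ms <= 0:
        raise ValueError("latency must be positive")
    # Each thread completes one operation per average-latency period.
    unthrottled = threads * 1000.0 / avg_latency_ms
    # An optional target rate caps the offered load.
    if target_ops_per_sec is None:
        return unthrottled
    return min(unthrottled, target_ops_per_sec)
```

For example, 250 threads at 50 ms average latency can sustain at most about 5,000 ops/sec. Pushing the target higher then mostly raises latency instead of throughput.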
  • 20. YCSB Details • Load phase (load the database): $ ycsb load cassandra-10 -p hosts=127.0.0.1 -P workloadX • Transaction phase (execute the workload): $ ycsb run cassandra-10 -p hosts=127.0.0.1 -P workloadX • Random load distribution (chart)
  • 21. YCSB Details
      # Yahoo! Cloud System Benchmark
      # Workload A: Update heavy workload
      # Application example: Session store recording recent actions
      #
      # Read/update ratio: 50/50
      # Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
      # Request distribution: zipfian
      recordcount=1000
      operationcount=1000
      workload=com.yahoo.ycsb.workloads.CoreWorkload
      readallfields=true
      readproportion=0.5
      updateproportion=0.5
      scanproportion=0
      insertproportion=0
      requestdistribution=zipfian
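Workload A combines a 50/50 read/update mix with `requestdistribution=zipfian`: a few records are very popular and most are rarely touched. The sketch below generates such a request stream. It assumes a Zipf exponent of 0.99, which we believe is YCSB's default constant. It samples by precomputing cumulative weights with `random.choices`, which is simpler than YCSB's own ZipfianGenerator and does not reproduce its exact key order.

```python
from __future__ import annotations

import random
from itertools import accumulate

ZIPFIAN_CONSTANT = 0.99  # assumed skew exponent


def zipfian_weights(n: int, s: float = ZIPFIAN_CONSTANT) -> list[float]:
    """Unnormalized Zipf weights: item i (0-based) has weight 1 / (i + 1) ** s."""
    return [1.0 / (i + 1) ** s for i in range(n)]


def make_request_stream(record_count: int, operation_count: int,
                        read_proportion: float, seed: int = 0) -> list[tuple[str, int]]:
    """Generate (operation, record index) pairs for a read/update mix."""
    rng = random.Random(seed)
    cum = list(accumulate(zipfian_weights(record_count)))
    ops = []
    for _ in range(operation_count):
        key = rng.choices(range(record_count), cum_weights=cum)[0]
        op = "READ" if rng.random() < read_proportion else "UPDATE"
        ops.append((op, key))
    return ops


# Mirrors recordcount=1000, operationcount=1000, readproportion=0.5.
ops = make_request_stream(record_count=1000, operation_count=1000, read_proportion=0.5)
```

Under this skew, record 0 receives over a tenth of all requests. That is the "session store" hot-spot pattern the workload comment describes.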
  • 22. YCSB Details • Execution parameters:
      $ ./bin/ycsb run cassandra-10 -P workloads/workloada -s -threads 10 -target 100 > transactions.dat
      [OVERALL],RunTime(ms), 10110
      [OVERALL],Throughput(ops/sec), 98.91196834817013
      [UPDATE], Operations, 491
      [UPDATE], AverageLatency(ms), 0.054989816700611
      [UPDATE], MinLatency(ms), 0
      [UPDATE], MaxLatency(ms), 1
      [UPDATE], 95thPercentileLatency(ms), 1
      [UPDATE], 99thPercentileLatency(ms), 1
      [UPDATE], Return=0, 491
      [UPDATE], 0, 464
      [UPDATE], 1, 27
      [UPDATE], 2, 0
      [UPDATE], 3, 0
      [UPDATE], 4, 0
      ...
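The output on slide 22 is one `[SECTION], metric, value` triple per line. Lines such as `[UPDATE], 0, 464` are histogram buckets, which we read as 1 ms wide: 464 updates finished in under 1 ms and 27 in the 1 ms bucket. Together they add up to the 491 operations reported. The hypothetical parser below loads that format into nested dicts. It then checks throughput against the workload's `operationcount=1000`: 1000 operations in 10.110 s is about 98.91 ops/sec, which matches the reported figure.

```python
from __future__ import annotations


def parse_ycsb_output(text: str) -> dict[str, dict[str, float]]:
    """Parse '[SECTION], metric, value' lines into {section: {metric: value}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # skip the command line, status text, "...", etc.
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        section = parts[0].strip("[]")
        result.setdefault(section, {})[parts[1]] = float(parts[2])
    return result


# Excerpt of the slide's output (the rest is elided on the slide).
SAMPLE = """\
[OVERALL],RunTime(ms), 10110
[OVERALL],Throughput(ops/sec), 98.91196834817013
[UPDATE], Operations, 491
[UPDATE], 0, 464
[UPDATE], 1, 27
"""

stats = parse_ycsb_output(SAMPLE)
runtime_s = stats["OVERALL"]["RunTime(ms)"] / 1000
throughput = 1000 / runtime_s  # operationcount=1000 from the workload file
```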
  • 23. YCSB Details
      $ ./bin/ycsb run basic -P workloads/workloada -P large.dat -s -threads 10 -target 100 -p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat
      [OVERALL],RunTime(ms), 10077
      [OVERALL],Throughput(ops/sec), 9923.58836955443
      [UPDATE], Operations, 50396
      [UPDATE], AverageLatency(ms), 0.04339630129375347
      [UPDATE], MinLatency(ms), 0
      [UPDATE], MaxLatency(ms), 338
      [UPDATE], Return=0, 50396
      [UPDATE], 0, 0.10264765784114054
      [UPDATE], 2000, 0.026989343690867442
      [UPDATE], 4000, 0.0352882703777336
      [UPDATE], 6000, 0.004238958990536277
      [UPDATE], 8000, 0.052813085033008175
      [UPDATE], 10000, 0.0
      [READ], Operations, 49604
      [READ], AverageLatency(ms), 0.038242883638416256
      [READ], MinLatency(ms), 0
      [READ], MaxLatency(ms), 230
      [READ], Return=0, 49604
      [READ], 0, 0.08997245741099663
      [READ], 2000, 0.02207505518763797
      [READ], 4000, 0.03188493260913297
      [READ], 6000, 0.004869141813755326
      [READ], 8000, 0.04355329949238579
      [READ], 10000, 0.005405405405405406
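With `measurementtype=timeseries` and `timeseries.granularity=2000`, the histogram lines on slide 23 become one average latency per 2000 ms window. For example, `[UPDATE], 2000, 0.0269…` is our reading of the mean update latency for the window starting at 2000 ms. The run lasted 10,077 ms, which is why the last window starts at 10000. The sketch below reproduces that bucketing from per-operation samples. The input format `(elapsed_ms, latency_ms)` is an assumption for illustration, not YCSB's internal representation.

```python
from __future__ import annotations

from collections import defaultdict


def timeseries(samples: list[tuple[float, float]],
               granularity_ms: int) -> list[tuple[int, float]]:
    """Average latency per time window.

    samples: (elapsed_ms since start, latency_ms) for each completed operation.
    Returns (window_start_ms, average_latency_ms) pairs in time order.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for elapsed_ms, latency_ms in samples:
        window = int(elapsed_ms // granularity_ms) * granularity_ms
        sums[window] += latency_ms
        counts[window] += 1
    return [(w, sums[w] / counts[w]) for w in sorted(sums)]


series = timeseries([(100, 1.0), (1500, 3.0), (2100, 4.0)], granularity_ms=2000)
```

Windows with no completed operations simply do not appear here. Whether YCSB prints them, and with what value, is not shown on the slide.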
  • 24. YCSB Details: Status Output (screenshot)
  • 25. Amazon EC2 Configuration • Large Instance: 7.5 GB memory; 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each); 850 GB instance storage; 64-bit platform; I/O performance: High; API name: m1.large • Experiment set-up: Cassandra cluster of 3 nodes + 1 node (elasticity); HBase cluster of 3 nodes
  • 26. Amazon EC2 Usage: Cassandra • Load phase: 60,000,000 records of 1 KB
  • 27. Amazon EC2 Usage: HBase • Load phase: 60,000,000 records of 1 KB
  • 28. Amazon EC2 Usage • Load phase: 60,000,000 records of 1 KB • Cassandra and HBase (charts)
  • 29. Amazon EC2 Usage • Load phase: 60,000,000 records of 1 KB • Cassandra and HBase (charts)
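The load phase on slides 26 to 29 writes 60,000,000 records. Using the default record size from the workload file (10 fields of 100 bytes), that is 60,000,000,000 bytes, or 60 GB of field data before keys and storage overhead. The replication factor used in the experiment is not stated on the slides, so the helper below takes it as a parameter. It also assumes data spreads evenly across nodes.

```python
RECORDS = 60_000_000
FIELDS, FIELD_BYTES = 10, 100               # default YCSB record: 10 fields x 100 bytes
RAW_BYTES = RECORDS * FIELDS * FIELD_BYTES  # excludes keys and storage overhead


def per_node_bytes(raw_bytes: int, replication_factor: int, nodes: int) -> float:
    """Average payload stored per node, assuming an even spread of data."""
    return raw_bytes * replication_factor / nodes
```

On a 3-node cluster this is about 20 GB per node with a replication factor of 1, and 60 GB per node with a replication factor of 3. Both fit comfortably within the 850 GB of instance storage on slide 25, before compaction and write-ahead overhead.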
  • 30. Amazon EC2 Usage: Cassandra • Transaction phase: 10,000 records; 1,000,000 operations; 250 threads
  • 31. YCSB Cassandra Results: Update-Heavy Workload (50/50) • Charts: average latency (ms) vs. throughput (ops/sec), Update and Read panels
  • 32. YCSB HBase Results (HBase 0.90.5) • Charts: average latency (ms) vs. throughput (ops/sec), Update and Read panels
  • 33. YCSB Cassandra Results: Elasticity (Cassandra 1.0) • Chart: latency (ms) over time (milliseconds)
  • 34. YCSB Cassandra Results: Elasticity (Cassandra 1.0) • Chart: latency (ms) over time (milliseconds)
  • 35. YCSB Future • Provide statistics for availability and replication • Support additional distributed databases • Currently supported: Cassandra, Mapkeeper, MongoDB, Redis, Voldemort, VMware vFabric GemFire, HBase
  • 36. Conclusions • YCSB provides a common ground for benchmarking cloud DB services • Good for learning and experimenting with different distributed databases • Open source and extensible to new databases • The Amazon EC2 lab gave good insight into setting up cloud services • Challenges: installation problems; hard-to-follow documentation; working in a distributed environment requires a lot of configuration
  • 37. References • YCSB (Yahoo! Cloud Serving Benchmark): https://github.com/brianfrankcooper/YCSB/wiki • Yahoo! Research: http://research.yahoo.com/Web_Information_Management/YCSB • BigTable: http://en.wikipedia.org/wiki/BigTable • Cassandra: http://wiki.apache.org/cassandra/ • HBase: http://hbase.apache.org/
  • 38. Questions
