Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Speaker: Mike Drob

Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers with previous versions and with other databases, and wrap up with a discussion of parameters that can be tweaked to improve them.

Published in: Technology, Business


Transcript

  • 1. Benchmarking Accumulo: How Fast is Fast? Mike Drob, Software Engineer, Cloudera
  • 2. Me: Cloudera Engineer • Accumulo Committer • Perpetual Tinkerer (Image: Victor Grigas, CC-BY-SA 3.0)
  • 3. Agenda: Methodology • Accumulo 1.4 to 1.6 • Accumulo to HBase • Conclusions (Image: Reuvenk, CC-BY-SA 2.5)
  • 4. Methodology: Measuring Performance: Task Latency (time), Throughput (bps) • Workloads: Read, Write, Mixed (Image: AngMoKio, CC-BY-SA 2.5)
  • 5. Methodology: Yahoo! Cloud Serving Benchmark • Workloads • Connectors • Highly configurable: # of Rows/Columns, Size of Value, # of Threads • Parallelizable number of clients (Image: Sfoskett, CC BY-SA 3.0)
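The two measurements named on the methodology slides, task latency and throughput, can be sketched in a few lines of Java. This is a minimal illustration and not the YCSB implementation: the class name `BenchmarkMetrics`, its method names, and the byte-array allocation standing in for a database operation are all hypothetical, and 1 MB is taken as 1,000,000 bytes.

```java
public class BenchmarkMetrics {

    /** Throughput in MB/sec, assuming 1 MB = 1,000,000 bytes. */
    public static double throughputMBps(long bytesMoved, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        double seconds = elapsedNanos / 1e9;
        return (bytesMoved / 1e6) / seconds;
    }

    /** Mean per-operation latency in milliseconds. */
    public static double meanLatencyMillis(long[] latenciesNanos) {
        if (latenciesNanos.length == 0) {
            throw new IllegalArgumentException("no latencies recorded");
        }
        long total = 0;
        for (long latency : latenciesNanos) {
            total += latency;
        }
        return (total / (double) latenciesNanos.length) / 1e6;
    }

    public static void main(String[] args) {
        int operations = 1000;
        int valueSize = 1024; // bytes per simulated value (illustrative)
        long[] latencies = new long[operations];
        long bytesMoved = 0;

        long runStart = System.nanoTime();
        for (int i = 0; i < operations; i++) {
            long opStart = System.nanoTime();
            // Hypothetical stand-in for a real read or write against a database.
            byte[] value = new byte[valueSize];
            bytesMoved += value.length;
            latencies[i] = System.nanoTime() - opStart;
        }
        // Guard against a zero reading on very coarse clocks.
        long elapsed = Math.max(1L, System.nanoTime() - runStart);

        System.out.printf("throughput: %.1f MB/sec%n", throughputMBps(bytesMoved, elapsed));
        System.out.printf("mean latency: %.4f ms%n", meanLatencyMillis(latencies));
    }
}
```

Latency is recorded per operation while throughput is computed over the whole run, so a run with many client threads can show high aggregate throughput even when individual operations are slow.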
  • 6. Accumulo across versions
  • 7. Accumulo across versions: Accumulo 1.4.4-cdh4.5.0 • Accumulo 1.6.0-cdh4.6.0-beta-1 • YCSB 0.14+50 • 80 node cluster • 10 clients • 5 racks (Image: Public Domain via USAF)
  • 8. Accumulo across versions. The Data: 200 GB • 2k Columns • Pre-Split Table 80x • Vary # of rows • Vary value size (we actually did a lot more, but it was hard to graph) (Image: Morio, CC BY-SA 3.0)
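To see how row count and value size trade off in a data set like the one on slide 8, here is a rough sizing sketch. It rests on assumptions the slides do not state: that the 200 GB total and 2k columns are held fixed while the row count varies, that "2k" means 2,000 and "GB" means 10^9 bytes, and that key and storage overhead can be ignored. The class and method names are hypothetical.

```java
public class DatasetSizing {

    /**
     * Approximate bytes per value when a fixed total is spread evenly
     * over rows x columns cells. Ignores key and storage overhead.
     */
    public static long valueSizeBytes(long totalBytes, long rows, long columns) {
        if (rows <= 0 || columns <= 0) {
            throw new IllegalArgumentException("rows and columns must be positive");
        }
        return totalBytes / (rows * columns);
    }

    public static void main(String[] args) {
        long totalBytes = 200_000_000_000L; // assumed: 200 GB as 200 * 10^9 bytes
        long columns = 2_000L;              // assumed: "2k" read as 2,000

        // Row counts matching the x-axis ticks of the charts in this section.
        long[] rowCounts = {10L, 100L, 1_000L, 10_000L, 100_000L};
        for (long rows : rowCounts) {
            System.out.printf("%7d rows -> %d bytes per value%n",
                    rows, valueSizeBytes(totalBytes, rows, columns));
        }
    }
}
```

Under these assumptions the value size spans four orders of magnitude across the tested row counts, which is worth keeping in mind when reading the throughput charts.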
  • 9. Accumulo across versions [Chart: Aggregate Read. Throughput (MB/sec) vs. # of Rows (10 to 100000); series: Accumulo 1.4, Accumulo 1.6]
  • 10. Accumulo across versions [Chart: Aggregate Mixed. Throughput (MB/sec) vs. # of Rows (10 to 100000); series: Accumulo 1.4, Accumulo 1.6]
  • 11. Accumulo across versions [Chart: Aggregate Write. Throughput (MB/sec) vs. # of Rows (10 to 100000); series: Accumulo 1.4, Accumulo 1.6]
  • 12. Accumulo across versions: Write speed improved! • Read speed about the same. • Something weird happens writing 1000 rows. (Image: Christopher Foster, CC BY-SA 3.0)
  • 13. Accumulo across versions: So, what happens at 1000 rows…? Nothing. Problem is at 100 rows. [Chart: Throughput (MB/sec) vs. # of Rows (10 to 100000)]
  • 14. Accumulo and HBase
  • 15. Accumulo and HBase: Accumulo 1.6.0-cdh4.6.0-beta-1 • HBase 0.94.15-cdh4.6.0 • YCSB 0.14+50 • 5 worker nodes • 5 split points • 5G Heap, 3G mem map (Image: Abdullah AlBargan, CC BY-ND 2.0)
  • 16. Accumulo and HBase: Single client (5 threads) • Workload sizes: In memory (15G), Force disk activity (30G) • Constant # of rows • Vary # of columns • Activity: 100% Write, 100% Read (Image: nahtanoj, CC-BY-2.0)
  • 17. Accumulo and HBase [Chart: Reading 15GB (500 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000); series: Accumulo, HBase]
  • 18. Accumulo and HBase [Chart: Reading 30GB (1000 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000); series: Accumulo, HBase]
  • 19. Accumulo and HBase [Chart: Writing 15GB (500 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000); series: Accumulo, HBase]
  • 20. Accumulo and HBase [Chart: Writing 30GB (1000 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000); series: Accumulo, HBase]
  • 21. Performance Tweaks
  • 22. Performance Tweaks – Client Side: Number of rows/columns • Batch Writer Threads • Batch Writer Buffer Size: use large buffer for small values, use small buffer for large values • ACCUMULO-2766 possible fix (Image: Public Domain via USN)
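To get a feel for the buffer-size knob on the client-side slide, the sketch below estimates how many values fit in a batch writer buffer before it fills. In Accumulo's Java client the corresponding settings are `BatchWriterConfig.setMaxMemory` and `BatchWriterConfig.setMaxWriteThreads`; the code here is plain arithmetic only and does not call Accumulo. The class name, the 50 MB buffer, and the value sizes are illustrative assumptions, and per-mutation overhead is ignored.

```java
public class BufferSizing {

    /** Whole values that fit in a buffer of the given size (overhead ignored). */
    public static long valuesPerBuffer(long bufferBytes, long valueBytes) {
        if (bufferBytes < 0 || valueBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return bufferBytes / valueBytes;
    }

    public static void main(String[] args) {
        long bufferBytes = 50L * 1024 * 1024; // illustrative 50 MB buffer

        // Illustrative value sizes: 1 KB, 1 MB, 10 MB.
        long[] valueSizes = {1024L, 1024L * 1024, 10L * 1024 * 1024};
        for (long valueBytes : valueSizes) {
            System.out.printf("%d-byte values: %d per buffer%n",
                    valueBytes, valuesPerBuffer(bufferBytes, valueBytes));
        }
    }
}
```

The same buffer holds tens of thousands of small values but only a handful of large ones, so the number of values sent per flush changes drastically with value size. That is the quantity the buffer-size advice on the slide adjusts.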
  • 23. Performance Tweaks – Server Side: Apply table splits liberally • Increase automatic split threshold • Some properties to play with: table.compaction.minor.logs.threshold, tserver.compaction.minor.concurrent.max, tserver.walog.max.size • If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
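As an illustration of pre-splitting a table, the sketch below generates evenly spaced split points for a known key space. The row-key format (fixed-width, zero-padded decimal numbers, uniformly distributed) is an assumption made for this example and is not YCSB's actual key format; the class and method names are hypothetical. In Accumulo, points like these would be handed to `TableOperations.addSplits` as a sorted set of `Text` values, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPoints {

    /**
     * Returns (tablets - 1) evenly spaced split points over keys in
     * [0, keySpace), formatted as zero-padded decimals of the given width.
     * Assumes keySpace * tablets fits in a long.
     */
    public static List<String> evenSplits(long keySpace, int tablets, int width) {
        if (keySpace <= 0 || tablets < 1 || width < 1) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < tablets; i++) {
            long point = i * keySpace / tablets;
            // Zero-padding keeps lexicographic order equal to numeric order.
            splits.add(String.format("%0" + width + "d", point));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical: 1,000,000 keys divided into 80 tablets.
        List<String> splits = evenSplits(1_000_000L, 80, 7);
        System.out.println(splits.size() + " split points, first: " + splits.get(0));
    }
}
```

Dividing the key space into N tablets takes N - 1 split points, and the points only balance load if the real keys are spread evenly across the assumed range.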
  • 24. Thank You! Please visit our booth! Mike Drob – madrob@cloudera.com, @mikhaildrob