Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Speaker: Mike Drob

Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers to previous versions and to other databases, and wrap up with a discussion of parameters that can be tweaked to improve them.

Published in: Technology, Business

  1. Benchmarking Accumulo: How Fast is Fast? Mike Drob, Software Engineer, Cloudera
  2. Me • Cloudera Engineer • Accumulo Committer • Perpetual Tinkerer (Image: Victor Grigas, CC-BY-SA 3.0)
  3. Agenda • Methodology • Accumulo 1.4 to 1.6 • Accumulo to HBase • Conclusions (Image: Reuvenk, CC-BY-SA 2.5)
  4. Methodology • Measuring performance: task latency (time), throughput (bps) • Workloads: read, write, mixed (Image: AngMoKio, CC-BY-SA 2.5)
  5. Methodology • Yahoo! Cloud Serving Benchmark: workloads, connectors • Highly configurable: # of rows/columns, size of value, # of threads • Parallelizable number of clients (Image: Sfoskett, CC BY-SA 3.0)
  6. Accumulo across versions
  7. Accumulo across versions • Accumulo 1.4.4-cdh4.5.0 • Accumulo 1.6.0-cdh4.6.0-beta-1 • YCSB 0.14+50 • 80-node cluster • 10 clients • 5 racks (Image: Public Domain via USAF)
  8. Accumulo across versions • The data: 200 GB, 2k columns, table pre-split 80x, vary # of rows, vary value size (We actually did a lot more, but it was hard to graph.) (Image: Morio, CC BY-SA 3.0)
  9. Accumulo across versions [Chart: Aggregate Read. Throughput (MB/sec) vs. # of rows (10 to 100,000); series: Accumulo 1.4, Accumulo 1.6]
  10. Accumulo across versions [Chart: Aggregate Mixed. Throughput (MB/sec) vs. # of rows (10 to 100,000); series: Accumulo 1.4, Accumulo 1.6]
  11. Accumulo across versions [Chart: Aggregate Write. Throughput (MB/sec) vs. # of rows (10 to 100,000); series: Accumulo 1.4, Accumulo 1.6]
  12. Accumulo across versions • Write speed improved! • Read speed about the same. • Something weird happens writing 1000 rows. (Image: Christopher Foster, CC BY-SA 3.0)
  13. Accumulo across versions • So, what happens at 1000 rows…? Nothing. Problem is at 100 rows. [Chart: Throughput (MB/sec) vs. # of rows (10 to 100,000)]
  14. Accumulo and HBase
  15. Accumulo and HBase • Accumulo 1.6.0-cdh4.6.0-beta-1 • HBase 0.94.15-cdh4.6.0 • YCSB 0.14+50 • 5 worker nodes • 5 split points • 5G heap, 3G mem map (Image: Abdullah AlBargan, CC BY-ND 2.0)
  16. Accumulo and HBase • Single client (5 threads) • Workload sizes: in memory (15G), force disk activity (30G) • Constant # of rows • Vary # of columns • Activity: 100% write, 100% read (Image: nahtanoj, CC-BY-2.0)
  17. Accumulo and HBase [Chart: Reading 15GB (500 rows). Throughput (MB/sec) vs. # of columns (100 to 1,000,000); series: Accumulo, HBase]
  18. Accumulo and HBase [Chart: Reading 30GB (1000 rows). Throughput (MB/sec) vs. # of columns (100 to 1,000,000); series: Accumulo, HBase]
  19. Accumulo and HBase [Chart: Writing 15GB (500 rows). Throughput (MB/sec) vs. # of columns (100 to 1,000,000); series: Accumulo, HBase]
  20. Accumulo and HBase [Chart: Writing 30GB (1000 rows). Throughput (MB/sec) vs. # of columns (100 to 1,000,000); series: Accumulo, HBase]
  21. Performance Tweaks
  22. Performance Tweaks – Client Side • Number of rows/columns • Batch Writer threads • Batch Writer buffer size: use large buffer for small values, use small buffer for large values • ACCUMULO-2766 possible fix (Image: Public Domain via USN)
  23. Performance Tweaks – Server Side • Apply table splits liberally • Increase automatic split threshold • Some properties to play with: table.compaction.minor.logs.threshold, tserver.compaction.minor.concurrent.max, tserver.walog.max.size • If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
  24. Thank You! Please visit our booth! Mike Drob – madrob@cloudera.com, @mikhaildrob
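
The two metrics named on the Methodology slide (task latency and throughput) can be made concrete with a small timing harness. The sketch below is not how YCSB measures anything internally; `Measurement`, `measure`, and the injectable `clock` parameter are hypothetical names chosen for illustration. It times each operation, then reports mean latency and aggregate throughput in MB/sec, the unit used on the chart axes.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Measurement:
    """Raw numbers collected from one benchmark run (hypothetical helper)."""
    operations: int
    total_bytes: int
    elapsed_sec: float
    latencies_sec: List[float]

    @property
    def throughput_mb_per_sec(self) -> float:
        # Aggregate throughput: bytes moved divided by wall-clock time.
        return self.total_bytes / 1_000_000 / self.elapsed_sec

    @property
    def mean_latency_sec(self) -> float:
        # Task latency: average time spent inside a single operation.
        return sum(self.latencies_sec) / len(self.latencies_sec)


def measure(op: Callable[[], int], count: int,
            clock: Callable[[], float] = time.perf_counter) -> Measurement:
    """Run op() `count` times; op must return the number of bytes it moved."""
    latencies = []
    total_bytes = 0
    start = clock()
    for _ in range(count):
        t0 = clock()
        total_bytes += op()
        latencies.append(clock() - t0)
    elapsed = clock() - start
    return Measurement(count, total_bytes, elapsed, latencies)
```

In a real run, `op` would wrap a single read or write against the database under test; the `clock` argument exists only so the arithmetic can be checked with a deterministic fake clock.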

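
The knobs listed on the YCSB slide (number of rows and columns, value size, number of threads) correspond to properties of YCSB's core workload. As a rough sketch, the hypothetical helper `render_workload` below renders such a properties file. The numbers in the usage note are illustrative and are not the settings used in the talk; modeling writes as updates is an assumption; and the property names and the `com.yahoo.ycsb` package name should be checked against the YCSB version in use.

```python
def render_workload(rows: int, columns: int, value_bytes: int,
                    threads: int, read_fraction: float) -> str:
    """Render a YCSB CoreWorkload properties file as text (illustrative only)."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    props = {
        # Package name as used by older YCSB releases (assumption).
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": rows,            # number of rows
        "operationcount": rows,         # one operation per row (assumption)
        "fieldcount": columns,          # number of columns per row
        "fieldlength": value_bytes,     # size of each value, in bytes
        "threadcount": threads,         # client threads
        "readproportion": read_fraction,
        # Writes are modeled as updates here (assumption).
        "updateproportion": round(1.0 - read_fraction, 6),
        "insertproportion": 0,
        "scanproportion": 0,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

For example, `render_workload(rows=1000, columns=2000, value_bytes=100, threads=5, read_fraction=0.5)` describes a mixed workload; the resulting file would be handed to the YCSB client with its `-P` option.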