Compaction and Splitting in Apache Accumulo
Billie Rinaldi
billie@hortonworks.com
October 24, 2012
© Hortonworks Inc. 2012

  1. Compaction and Splitting in Apache Accumulo. Billie Rinaldi, billie@hortonworks.com. October 24, 2012.
  2. What are compaction and splitting?
     • Accumulo tables are divided into non-overlapping key ranges called tablets
     • Compaction selects a set of sorted files for a single tablet and rewrites them into one file
     • Splitting divides a tablet into two tablets
  3. Tablet Overview
     • When memory fills, new sorted files are created by flushing
     • Sorted files are combined together into fewer sorted files
  4. How much data are you writing?
     • If you never compact – $O(N)$
     • If you always compact – $O(N^2)$
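
The two extremes on slide 4 can be counted directly. The sketch below assumes that every flush produces a file of size 1 and that "always compact" means rewriting all existing files into one after every flush once there are at least two files. The function names are illustrative and are not part of Accumulo.

```python
def written_never_compact(n: int) -> int:
    """Total units written after n flushes when files are never merged."""
    return n  # each flush writes exactly one unit


def written_always_compact(n: int) -> int:
    """Total units written after n flushes when every flush is followed
    by a compaction of all existing files into a single file."""
    total = 0
    for flushed in range(1, n + 1):
        total += 1            # the flush itself
        if flushed >= 2:
            total += flushed  # rewrite everything accumulated so far
    return total


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, written_never_compact(n), written_always_compact(n))
```

Under these assumptions the second count is n + n(n+1)/2 − 1, which is where the quadratic growth comes from.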
  5. Accumulo Compaction Algorithm
     • Compact a set of files when:
       (size of the largest file) × (compaction ratio) ≤ (sum of the sizes of files)
     • Property: table.compaction.major.ratio
  6. In Action (r = 3, N = 1, W = 1)
  7. In Action (r = 3, N = 2, W = 2)
  8. In Action (r = 3, N = 3, W = 3)
  9. In Action (r = 3, N = 3, W = 6)
  10. In Action (r = 3, N = 4, W = 7)
  11. In Action (r = 3, N = 5, W = 8)
  12. In Action (r = 3, N = 6, W = 9)
  13. In Action (r = 3, N = 6, W = 12)
  14. In Action (r = 3, N = 7, W = 13)
  15. In Action (r = 3, N = 8, W = 14)
  16. In Action (r = 3, N = 9, W = 15)
  17. In Action (r = 3, N = 9, W = 24)
  18. In Action (r = 3, N = 27, W = 90*)
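
The (N, W) pairs in the "In Action" slides can be reproduced with a small simulation. This is a sketch under stated assumptions, not Accumulo's implementation: r is read as the compaction ratio, N as the number of unit-sized files flushed, and W as the total data written; the selection rule is assumed to test the whole file set first and, if the ratio condition fails, to drop the largest file and retry. `select_files` and `simulate` are hypothetical names.

```python
def select_files(sizes, ratio):
    """Return the files to compact, or [] if no candidate set qualifies.

    Assumed rule: test the whole set; if it fails, drop the largest
    file and test the remaining files, and so on.
    """
    candidates = sorted(sizes, reverse=True)  # largest first
    while len(candidates) >= 2:
        if candidates[0] * ratio <= sum(candidates):
            return candidates
        candidates = candidates[1:]  # drop the largest and retry
    return []


def simulate(ratio, flushes):
    """Flush `flushes` unit-sized files; return (file sizes, total written)."""
    files, written = [], 0
    for _ in range(flushes):
        files.append(1)   # a flush creates one new file of size 1
        written += 1
        chosen = select_files(files, ratio)
        while chosen:
            for size in chosen:
                files.remove(size)
            merged = sum(chosen)
            files.append(merged)  # the compaction rewrites the chosen data
            written += merged
            chosen = select_files(files, ratio)
    return sorted(files, reverse=True), written


if __name__ == "__main__":
    for n in (1, 2, 3, 6, 9, 27):
        print(n, simulate(3, n))
```

The function reports the state after any compaction triggered by the last flush, so for N = 3, 6, and 9 it corresponds to the second of the two slides shown for that N.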
  19. Amount of data written
     • $W(r^k) = (k+1)\,r^k - (k-1)\,r^{k-1}$
     • Thus, $W(N) \approx O(N \log N)$
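
The closed form on slide 19 can be checked against the slide values for r = 3 (W = 6, 24, 90 at N = 3, 9, 27). Substituting N = r^k, so that k = log_r N, gives W = N·[(k+1) − (k−1)/r], which grows like N log N. The sketch below evaluates the formula and prints W divided by N·log_r N; the function name is illustrative.

```python
import math


def written_closed_form(r: int, k: int) -> int:
    """W(r^k) = (k+1) * r^k - (k-1) * r^(k-1), for integer k >= 1."""
    return (k + 1) * r**k - (k - 1) * r**(k - 1)


if __name__ == "__main__":
    r = 3
    for k in range(1, 8):
        n = r**k
        w = written_closed_form(r, k)
        # Ratio of W to N * log_r(N); it stays bounded as k grows.
        print(n, w, round(w / (n * math.log(n, r)), 3))
```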
  20. HBase Compaction Algorithm
     • Compact a set of files when:
       (size of the largest file) ≤ (sum of the sizes of smaller files) × (compaction ratio)
     • Property: hbase.hstore.compaction.ratio
  21. HBase Compaction Algorithm
     • Compact a set of files when:
       (size of the largest file) ≤ (sum of the sizes of smaller files) × (compaction ratio)
     • HBase ratio = 1 / (Accumulo ratio − 1)
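
The relation on slide 21 follows from moving the largest file to one side of the Accumulo condition: L × r ≤ L + S is the same as L ≤ S × 1/(r − 1), where L is the largest file and S is the sum of the smaller files. The sketch below encodes both conditions as stated on the slides and converts between the ratios. The function names are illustrative, and the conversion assumes an Accumulo ratio greater than 1.

```python
from fractions import Fraction


def accumulo_should_compact(sizes, ratio):
    """Slide 5: largest file * ratio <= sum of all file sizes."""
    return max(sizes) * ratio <= sum(sizes)


def hbase_should_compact(sizes, ratio):
    """Slide 20: largest file <= sum of the smaller files * ratio."""
    largest = max(sizes)
    return largest <= (sum(sizes) - largest) * ratio


def hbase_ratio_from_accumulo(accumulo_ratio):
    """Slide 21: HBase ratio = 1 / (Accumulo ratio - 1)."""
    return 1 / (Fraction(accumulo_ratio) - 1)


if __name__ == "__main__":
    r_acc = 3
    r_hb = hbase_ratio_from_accumulo(r_acc)
    for sizes in ([1, 1, 1], [3, 1, 1], [3, 3, 1, 1, 1], [9, 3, 3]):
        print(sizes,
              accumulo_should_compact(sizes, r_acc),
              hbase_should_compact(sizes, r_hb))
```

`Fraction` is used only so that the comparison is exact for non-integer ratios.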
  22. Other Compaction-related Properties
     • Accumulo
       – table.file.max
       – tserver.compaction.major.thread.files.open.max
       – tserver.compaction.major.delay
       – table.compaction.major.everything.idle
     • HBase
       – hbase.hstore.compactionThreshold
       – hbase.hstore.blockingStoreFiles
       – hbase.hstore.blockingWaitTime
       – hbase.hstore.compaction.min
       – hbase.hstore.compaction.max
       – hbase.hstore.compaction.min.size
       – hbase.hstore.compaction.max.size
  23. Accumulo Splitting
     • Always check to see if a split is needed before compacting
     • If it is needed, split first
     • File names stored in metadata table
     • (figure label: split threshold)
  24. Accumulo Splitting Process
     • Tablet closed, no new writes
     • Three writes to the metadata table
       – tablet made smaller & marked as splitting
       – new tablet added
       – original tablet's splitting marks removed
     • Tablet server swaps new tablets for old tablet in its online tablet list
     • Master informed
  25. Accumulo Splitting Recovery
     • Whenever a tablet is brought online, the tablet server checks to see if it has split marks.
     • If so, it assumes the splitting process was interrupted and finishes making changes to the metadata table.
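
The three metadata writes on slide 24 and the recovery check on slide 25 can be illustrated with a toy model. Everything below is hypothetical: the metadata table is a plain dictionary keyed by each tablet's end row, the "split mark" is a boolean plus the old lower bound, and the original tablet is assumed to keep the upper half of the range. It is a sketch of the idea, not Accumulo's metadata schema or code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TabletEntry:
    prev_end: Optional[str]             # exclusive lower bound; None = -infinity
    splitting: bool = False             # hypothetical "split mark"
    old_prev_end: Optional[str] = None  # lower bound before the split began


# Toy metadata table: end row of a tablet -> its entry (None = +infinity).
Metadata = Dict[Optional[str], TabletEntry]


def split(meta: Metadata, end: Optional[str], point: str,
          crash_after: Optional[int] = None) -> None:
    """Split the tablet ending at `end` at row `point` using three writes.
    `crash_after` simulates the tablet server dying after that many writes."""
    entry = meta[end]
    # Write 1: shrink the original tablet and mark it as splitting.
    entry.old_prev_end, entry.prev_end, entry.splitting = entry.prev_end, point, True
    if crash_after == 1:
        return
    # Write 2: add the new tablet covering the lower half of the range.
    meta[point] = TabletEntry(prev_end=entry.old_prev_end)
    if crash_after == 2:
        return
    # Write 3: remove the split marks from the original tablet.
    entry.splitting, entry.old_prev_end = False, None


def recover(meta: Metadata, end: Optional[str]) -> bool:
    """Run when a tablet is brought online; finish an interrupted split.
    Returns True if there was unfinished work."""
    entry = meta[end]
    if not entry.splitting:
        return False
    if entry.prev_end not in meta:      # write 2 never happened
        meta[entry.prev_end] = TabletEntry(prev_end=entry.old_prev_end)
    entry.splitting, entry.old_prev_end = False, None  # write 3
    return True


if __name__ == "__main__":
    for crash in (1, 2, None):
        meta = {None: TabletEntry(prev_end=None)}
        split(meta, None, "m", crash_after=crash)
        print(crash, recover(meta, None), meta)
```

In this model, an interruption after the first or second write leaves the mark in place, so the check at load time is enough to bring the table to the same final state as an uninterrupted split.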
  26. Hortonworks Data Platform
     • Simplify deployment to get started quickly and easily
     • Monitor, manage any size cluster with familiar console and tools
     • Only platform to include data integration services to interact with any data
     • Metadata services opens the platform for integration with existing applications
     • Dependable high availability architecture
     • Tested at scale to future proof your cluster growth
     ✓ Reduce risks and cost of adoption
     ✓ Lower the total cost to administer and provision
     ✓ Integrate with your existing ecosystem
  27. Hortonworks Training
     The expert source for Apache Hadoop training & certification
     Role-based Developer and Administration training
       – Coursework built and maintained by the core Apache Hadoop development team.
       – The “right” course, with the most extensive and realistic hands-on materials
       – Provide an immersive experience into real-world Hadoop scenarios
       – Public and Private courses available
     Comprehensive Apache Hadoop
  28. Next Steps?
     1. Download Hortonworks Data Platform: hortonworks.com/download
     2. Use the getting started guide: hortonworks.com/get-started
     3. Learn more… get support
        hortonworks.com/training
        • Expert role based training
        • Course for admins, developers and operators
        • Certification program
        • Custom onsite options
        Hortonworks Support, hortonworks.com/support
        • Full lifecycle technical support across four service levels
        • Delivered by Apache Hadoop Experts/Committers
        • Forward-compatible
  29. Questions? dev@accumulo.apache.org