Optimizing_hbase_scanner_performance
Transcript

  • 1. Optimizing HBase scanner performance. Mikhail Bautin, Software Engineer, 01/19/2012
  • 2. HBase scanners: what happens on a Get
      [Diagram: a RegionScanner holds one StoreScanner per column family (ColumnFamily1, ColumnFamily2, ...), where a Store = (Region, CF). Each StoreScanner holds one StoreFileScanner per store file. Each store file is a sorted run of (Row, Column, Timestamp) KeyValues such as (R1, C1, T3) and (R1, C2, T2).]
  • 3. HBase scanner state: what happens on a next()
      [Diagram: the RegionScanner keeps its StoreScanners (one per column family) in a priority queue. Each StoreScanner keeps its StoreFileScanners in a priority queue. Each StoreFileScanner holds its current KeyValue.]
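The priority-queue structure on slides 2 and 3 is a k-way merge of sorted store files. Below is a minimal Python sketch of one StoreScanner level. It assumes KeyValues are `(row, column, timestamp)` tuples with integer timestamps. `StoreFileScannerSketch` and `store_scan` are hypothetical names and are not HBase classes. A RegionScanner would apply the same merge over its StoreScanners.

```python
import heapq

def sort_key(kv):
    """HBase-style order: row and column ascending, timestamp descending (newest first)."""
    row, col, ts = kv
    return (row, col, -ts)

class StoreFileScannerSketch:
    """Hypothetical stand-in for a scanner over one sorted store file."""

    def __init__(self, kvs):
        self.kvs = sorted(kvs, key=sort_key)
        self.pos = 0

    def peek(self):
        return self.kvs[self.pos] if self.pos < len(self.kvs) else None

    def next(self):
        kv = self.peek()
        self.pos += 1
        return kv

def store_scan(files):
    """Yield KeyValues from all files in global order using a priority queue of scanners."""
    scanners = [StoreFileScannerSketch(f) for f in files]
    # Heap entries are (current key, scanner index); the index breaks ties.
    heap = [(sort_key(s.peek()), i) for i, s in enumerate(scanners) if s.peek() is not None]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        yield scanners[i].next()
        nxt = scanners[i].peek()
        if nxt is not None:
            heapq.heappush(heap, (sort_key(nxt), i))

files = [
    [("R1", "C1", 3), ("R1", "C2", 1)],
    [("R1", "C2", 2), ("R2", "C1", 2)],
    [("R1", "C1", 1), ("R1", "C2", 3), ("R2", "C2", 1)],
]
merged = list(store_scan(files))
```

Each next() on the merged stream advances exactly one file scanner, which is why a single next() can translate into disk I/O on whichever file is currently on top.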
  • 4. Avoiding next() on StoreFileScanner: every next() call may result in disk I/O
      ▪ HBASE-4433: avoid an extra next() when done with a row/column (Kannan)
          ▪ An optimization for queries specifying a column set
          ▪ INCLUDE_AND_SEEK_NEXT_COL
          ▪ INCLUDE_AND_SEEK_NEXT_ROW
      ▪ HBASE-4434: Don't do HFile scanner next() unless the next KV is needed (Kannan)
          ▪ Avoid aggressive prefetching
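Here is a minimal sketch of the HBASE-4433 idea. It assumes a query with an explicit column set that wants one version per column. Once a KeyValue is included, the scanner seeks past the rest of the column or row instead of calling next() on KeyValues it would discard. The match-code names mirror the slide. The matcher, `scan`, and the `(row, column, -timestamp)` key layout are hypothetical simplifications, not HBase's query matcher.

```python
import bisect
from enum import Enum, auto

class MatchCode(Enum):
    INCLUDE_AND_SEEK_NEXT_COL = auto()
    INCLUDE_AND_SEEK_NEXT_ROW = auto()
    SEEK_NEXT_COL = auto()

def match(col, requested):
    """Hypothetical matcher: explicit column set, one version per column."""
    if col not in requested:
        return MatchCode.SEEK_NEXT_COL
    if col == max(requested):
        # Last requested column: nothing else in this row is needed.
        return MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
    return MatchCode.INCLUDE_AND_SEEK_NEXT_COL

def scan(keys, requested):
    """keys: sorted (row, col, -ts) tuples. Returns (results, number of keys visited)."""
    results, visited, i = [], 0, 0
    while i < len(keys):
        row, col, neg_ts = keys[i]
        visited += 1
        code = match(col, requested)
        if code is not MatchCode.SEEK_NEXT_COL:
            results.append((row, col, -neg_ts))
        if code is MatchCode.INCLUDE_AND_SEEK_NEXT_ROW:
            # Jump past every remaining key of this row.
            i = bisect.bisect_right(keys, (row, chr(0x10FFFF), float("inf")))
        else:
            # Jump past every remaining version of this column.
            i = bisect.bisect_right(keys, (row, col, float("inf")))
    return results, visited

keys = sorted((r, c, -t) for r, c, t in [
    ("R1", "C1", 3), ("R1", "C1", 1), ("R1", "C2", 2), ("R1", "C3", 2),
    ("R1", "C3", 1), ("R1", "C4", 1), ("R2", "C1", 2), ("R2", "C3", 1),
])
results, visited = scan(keys, {"C1", "C3"})
```

On this input the scan visits 5 of the 8 KeyValues. A scanner that called next() after every inclusion would visit all 8.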
  • 5. Simple ROWCOL Bloom filters: do we have to read all of these files?
      Query: (R1, C3)
      [Diagram: three store files, each a sorted list of (Row, Col, TS) KeyValues for rows R1 and R2 and columns C1 to C3.]
  • 6. Simple ROWCOL Bloom filters: in some cases, we only have to read one file
      Query: (R1, C3)
      [Diagram: only one store file is read, the one containing (R1, C3, T2).]
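The file-skipping on slides 5 and 6 can be sketched with a toy ROWCOL Bloom filter per store file. This is an illustration only. The `BloomFilter` class, its size and hash choices, and the `row/col` key encoding are assumptions and do not reflect HBase's on-disk format.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch; not HBase's on-disk format."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(key))

def rowcol_key(row, col):
    return f"{row}/{col}"  # hypothetical encoding of the (row, column) pair

def build_file_bloom(kvs):
    bloom = BloomFilter()
    for row, col, _ts in kvs:
        bloom.add(rowcol_key(row, col))
    return bloom

def files_to_read(blooms, row, col):
    """Indices of files whose ROWCOL Bloom filter does not rule out (row, col)."""
    return [i for i, b in enumerate(blooms) if b.might_contain(rowcol_key(row, col))]

files = [
    [("R1", "C1", 4), ("R1", "C2", 1), ("R2", "C2", 1)],
    [("R1", "C1", 3), ("R1", "C3", 2), ("R2", "C1", 2)],
    [("R1", "C2", 2), ("R2", "C3", 1)],
]
blooms = [build_file_bloom(f) for f in files]
candidates = files_to_read(blooms, "R1", "C3")
```

A file that actually holds (R1, C3) is never excluded, because Bloom filters have no false negatives. A false positive can only add an extra file to `candidates`.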
  • 7. Multi-column Bloom filters (HBASE-2794): ROWCOL Bloom filters for multi-column queries
      Query: C1 and C3 in all rows
      [Diagram: the same three store files.]
  • 8. Multi-column Bloom filters (HBASE-2794): ROWCOL Bloom filters for multi-column queries
      Query: C1 and C3 in all rows; seek to (R1, C1)
      [Diagram: the three store files after the seek.]
  • 9. Multi-column Bloom filters (HBASE-2794): ROWCOL Bloom filters for multi-column queries
      Query: C1 and C3 in all rows; seek to (R1, C3)
      [Diagram: the file containing (R1, C3, T2) seeks to it. The two other files are positioned on the fake key (R1, end of C3).]
  • 10. Multi-column Bloom filters (HBASE-2794): ROWCOL Bloom filters for multi-column queries
      Query: C1 and C3 in all rows; seek to (R2, C1)
      [Diagram: the three store files after the seek.] (R2, C1, T2) wins by timestamp.
  • 11. Multi-column Bloom filters (HBASE-2794): ROWCOL Bloom filters for multi-column queries
      Query: C1 and C3 in all rows; seek to (R2, C3)
      [Diagram: two files are positioned on the fake key (R2, end of C3). The third yields (R2, C3, T1).]
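The fake-key trick on slides 7 to 11 can be sketched as follows. When a file's ROWCOL filter rules out (row, column), the file is positioned on a fake "end of column" key that sorts after every real version, so no block is read. This is a minimal sketch under stated assumptions. `FileScanner` and `newest` are hypothetical names. An exact set stands in for the Bloom filter, which assumes no false positives. Keys are `(row, column, -timestamp)`, and `float("inf")` marks the end of a column.

```python
import bisect

END = float("inf")  # (row, col, END) sorts after every real (row, col, -ts): "end of column"

class FileScanner:
    """Hypothetical store file scanner with a ROWCOL membership check."""

    def __init__(self, kvs):
        self.keys = sorted((r, c, -t) for r, c, t in kvs)  # newest version first
        # Exact set standing in for a ROWCOL Bloom filter (assumes no false positives).
        self.rowcols = {(r, c) for r, c, _ in kvs}
        self.current = None
        self.disk_seeks = 0

    def seek_rowcol(self, row, col):
        if (row, col) not in self.rowcols:
            self.current = (row, col, END)  # fake key: no block is read
            return
        self.disk_seeks += 1
        i = bisect.bisect_left(self.keys, (row, col, -END))
        self.current = self.keys[i]  # the filter guarantees (row, col) exists here

def newest(scanners, row, col):
    """Newest version of (row, col) across files, or None if every file ruled it out."""
    for s in scanners:
        s.seek_rowcol(row, col)
    top = min(s.current for s in scanners)
    return None if top[2] == END else (row, col, -top[2])

scanners = [
    FileScanner([("R1", "C1", 4), ("R1", "C2", 3), ("R2", "C2", 1)]),
    FileScanner([("R1", "C1", 1), ("R1", "C2", 2), ("R1", "C3", 2), ("R2", "C1", 1)]),
    FileScanner([("R2", "C1", 2), ("R2", "C3", 1)]),
]
results = [newest(scanners, r, c) for r in ("R1", "R2") for c in ("C1", "C3")]
total_seeks = sum(s.disk_seeks for s in scanners)
```

Here the four (row, column) lookups cost 6 real seeks instead of 12 (four lookups times three files).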
  • 12. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: three store files, each labeled with its timestamp range and holding (Row, Col, TS) KeyValues.]
      Fake keys, one per file: (R1, C1, T4), (R1, C1, T3), (R1, C1, T2)
  • 13. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      Real KeyValue (R1, C1, T4); fake keys (R1, C1, T3) and (R1, C1, T2) remain
  • 14. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      Fake keys: (R1, C3, T4), (R1, C3, T3), (R1, C3, T2)
  • 15. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      Real KeyValue (R2, C1, T1); fake keys (R1, C3, T3) and (R1, C3, T2) remain
  • 16. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      (R1, C3, T2) is next; (R2, C1, T1) and the fake key (R1, C3, T2) remain
  • 17. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      Real KeyValue (R2, C1, T1); fake key (R2, C1, T3), to be selected next; fake key (R2, C1, T2)
  • 18. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      (R2, C1, T2) wins by timestamp over (R2, C1, T1); fake key (R2, C1, T2) remains
  • 19. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      Fake keys: (R2, C3, T4), (R2, C3, T3), (R2, C3, T2)
  • 20. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      One file reaches EOF; real seek to (R2, C3, T3); fake key (R2, C3, T2) remains
  • 21. Lazy Seek (HBASE-4465): optimizing for reading recent data
      [Diagram: the three store files.]
      Two files reach EOF; the third yields (R2, C3, T1)
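The lazy seek walk-through on slides 12 to 21 can be sketched as a heap of fake and real keys. Each file is first represented by a fake key (row, column, newest timestamp in the file), and a real seek happens only when that fake key reaches the top of the heap. This is a minimal sketch, not HBase's code. It assumes each file's newest timestamp (`max_ts`) is known up front and that the query needs only the newest version. `LazyFileScanner` and `lazy_get_newest` are hypothetical names.

```python
import bisect
import heapq

class LazyFileScanner:
    """Hypothetical store file scanner that knows its newest timestamp."""

    def __init__(self, kvs):
        self.keys = sorted((r, c, -t) for r, c, t in kvs)  # newest version first
        self.max_ts = max(t for _, _, t in kvs)  # assumed per-file metadata
        self.real_seeks = 0

    def real_seek(self, row, col):
        """Position on the first key at or after (row, col, newest); None at EOF."""
        self.real_seeks += 1
        i = bisect.bisect_left(self.keys, (row, col, -float("inf")))
        return self.keys[i] if i < len(self.keys) else None

def lazy_get_newest(scanners, row, col):
    """Newest version of (row, col); real seeks happen only when a fake key is on top."""
    # Heap entries: (key, is_fake, scanner index). A real key sorts before an equal fake key.
    heap = [((row, col, -s.max_ts), True, i) for i, s in enumerate(scanners)]
    heapq.heapify(heap)
    while heap:
        key, is_fake, i = heapq.heappop(heap)
        if is_fake:
            real = scanners[i].real_seek(row, col)
            if real is not None:
                heapq.heappush(heap, (real, False, i))
            continue
        if key[:2] != (row, col):
            return None  # smallest real key is already past (row, col): no file has it
        # Remaining fake keys have max_ts <= this timestamp, so no file holds a newer version.
        return (row, col, -key[2])
    return None

scanners = [
    LazyFileScanner([("R1", "C1", 4), ("R2", "C2", 3)]),  # newest data
    LazyFileScanner([("R1", "C1", 3), ("R1", "C3", 2)]),
    LazyFileScanner([("R1", "C3", 1), ("R2", "C1", 2)]),  # oldest data
]
first = lazy_get_newest(scanners, "R1", "C1")
second = lazy_get_newest(scanners, "R1", "C3")
seeks = [s.real_seeks for s in scanners]
```

Fetching (R1, C1) and (R1, C3) costs three real seeks here rather than six, because the file holding the oldest data is never touched. That is the "reading recent data" case the slides target.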
  • 22. Top-of-the-row seek: some applications do not use DeleteFamily
      ▪ We always seek to the top of the row first
          ▪ DeleteFamily comes before all columns, i.e. at (R1, empty column)
          ▪ Even if we only need (R1, C1), there might be a DeleteFamily for R1
      ▪ Some applications do not even use DeleteFamily
      ▪ Two fixes by Liyin Tang:
          ▪ Utilize the existing ROWCOL Bloom filter (HBASE-4469)
          ▪ Add a separate ROW-only Bloom filter for DeleteFamily (HBASE-4532)
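The second fix on slide 22, a ROW-only Bloom filter for DeleteFamily, can be sketched as choosing where a Get starts reading. This sketch makes several assumptions. A DeleteFamily marker is stored as column `""` so that it sorts first in its row, as the slide describes. An exact set of rows stands in for the Bloom filter. `start_key` and `get` are hypothetical names.

```python
import bisect

TOP = -float("inf")  # third component that sorts before every real (row, col, -ts)

def start_key(row, col, delete_family_rows):
    """Where to begin reading (row, col) in one file.

    delete_family_rows is an exact set standing in for a ROW-only DeleteFamily
    Bloom filter (an assumption; a real Bloom filter can return false positives).
    """
    if row in delete_family_rows:
        return (row, "", TOP)  # top of the row, where a DeleteFamily marker would sit
    return (row, col, TOP)     # no DeleteFamily in this row: go straight to the column

def get(keys, row, col, delete_family_rows):
    """keys: sorted (row, col, -ts); a DeleteFamily marker is stored with column ''.

    Returns ((row, col, ts) or None, number of keys examined).
    """
    i = bisect.bisect_left(keys, start_key(row, col, delete_family_rows))
    family_delete_ts, examined = None, 0
    while i < len(keys) and keys[i][0] == row:
        _, c, neg_ts = keys[i]
        examined += 1
        if c == "":
            if family_delete_ts is None:
                family_delete_ts = -neg_ts  # newest DeleteFamily marker comes first
        elif c == col:
            ts = -neg_ts
            deleted = family_delete_ts is not None and ts <= family_delete_ts
            return (None if deleted else (row, col, ts)), examined
        elif c > col:
            break  # walked past the requested column
        i += 1
    return None, examined

keys = sorted([("R1", "", -5), ("R1", "A", -2), ("R1", "C1", -3), ("R2", "A", -1), ("R2", "C1", -2)])
```

With `{"R1"}` as the filter, reading (R2, C1) examines one key. If the filter also (falsely) reported R2, the same read would start at the top of the row and examine two.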
  • 23. Seek on deleted KV (HBASE-4585): what if the requested column has been deleted?
      ▪ We are requesting C1, C2, ..., Cn
      ▪ What if we see a delete marker for Ci?
      ▪ Previously, we would keep calling next()
      ▪ Now, we seek to the (i + 1)th requested column (also a fix by Liyin)
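The HBASE-4585 change on slide 23 can be sketched for a single row. The sketch assumes the delete marker is a DeleteColumn that hides every Put of its column with an equal or older timestamp. The cell layout and the `read_row` function are hypothetical. The `seek_on_delete` flag switches between the old behavior (step with next()) and the new one (seek to the next requested column).

```python
import bisect

def read_row(cells, requested, seek_on_delete):
    """Newest visible value per requested column in one row, plus cells examined.

    cells: sorted (col, -ts, kind, value) tuples, kind is "DeleteColumn" or "Put".
    A DeleteColumn marker at ts hides every Put of that column with timestamp <= ts.
    requested: sorted list of column names (C1, ..., Cn on the slide).
    """
    result, deleted_upto, examined, i = {}, {}, 0, 0
    while i < len(cells):
        col, neg_ts, kind, value = cells[i]
        examined += 1
        ts = -neg_ts
        if col in requested and kind == "DeleteColumn":
            deleted_upto[col] = max(deleted_upto.get(col, ts), ts)
            if seek_on_delete:
                later = [c for c in requested if c > col]
                if not later:
                    break
                # Seek straight to the next requested column, skipping the masked Puts.
                i = bisect.bisect_left(cells, (later[0],))
                continue
        elif col in requested and col not in result and ts > deleted_upto.get(col, 0):
            result[col] = value
        i += 1
    return result, examined

cells = sorted([
    ("C1", -5, "Put", "a"),
    ("C2", -4, "DeleteColumn", None),
    ("C2", -3, "Put", "x"),
    ("C2", -2, "Put", "y"),
    ("C2", -1, "Put", "z"),
    ("C3", -2, "Put", "c"),
])
requested = ["C1", "C2", "C3"]
```

Both modes return the same values, but seeking examines three cells instead of six. The gap grows with the number of masked versions.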
  • 24. Data block read requests (dark launch), Thu Sep 15 – Sun Sep 25, 2011
      Fri Sep 16th vs. Fri Sep 23rd: 45% savings in logical block read requests (cache hits + misses)
      Pushed on Tue Sep 20th:
      • No extra next() when done with column/row (HBASE-4433)
      • No KV prefetch (HBASE-4434)
      • Lazy Seek (HBASE-4465)
  • 25. Data block read requests (dark launch), Sun Sep 25 – Mon Oct 3, 2011
      Sun Sep 25th vs. Sun Oct 2nd: 33% savings in logical block read requests (cache hits + misses)
      Pushed on Fri Sep 30th:
      • Avoid top-of-the-row seek (HBASE-4469, Liyin)
      • Off-peak compactions (HBASE-4463, Karthik)
  • 26. Data block cache misses (dark launch)
      ▪ 20.6K (Mon Sep 19th) -> 11.8K (Mon Sep 26th) -> 9.8K (Mon Oct 3rd)
      ▪ 52% savings (42% and then 17% more)
          • No next KV prefetch
          • No next() when done with row/column
          • No top-of-the-row seek
          • Lazy Seek
          • Off-peak compactions
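The percentages on slide 26 can be checked from the three miss counts. The two weekly reductions compound rather than add: 42% plus 17% would overstate the total, while 1 - (1 - 0.427)(1 - 0.169) is about 0.524, matching the 52% on the slide. The first step works out to about 42.7%, which the slide reports as 42%.

```python
def savings(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before

# Daily data block cache misses from the slide, in thousands.
sep19, sep26, oct3 = 20.6, 11.8, 9.8

first = savings(sep19, sep26)   # first week
second = savings(sep26, oct3)   # second week
total = savings(sep19, oct3)    # both weeks

# Reductions compound multiplicatively: (1 - total) == (1 - first) * (1 - second)
print(f"{first:.1%} {second:.1%} {total:.1%}")  # 42.7% 16.9% 52.4%
```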
  • 27. Avoid loading the previous block (HBASE-4443): we sometimes go to the previous block on an exact match
      ▪ Future work
      ▪ Suppose the first key of a block matches (Row, Column)
      ▪ But maybe there is an earlier key that would also match?
      ▪ We load the previous block to find out
      ▪ Possible fixes:
          ▪ Track deletes and optimize the MAX_VERSIONS=1 case
          ▪ Add the last key in each block to the index (increases index size)
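Here is one plausible reading of slide 27 as a sketch. It assumes the block index stores only each block's first key, and that the seek key for (Row, Column) sorts before every version of that column. Under those assumptions, an exact match on a block's first key still lands the lookup in the previous block. The second function sketches the "last key in index" fix the slide lists. Both function names are hypothetical.

```python
import bisect

LATEST = float("inf")

def block_for_seek(first_keys, row, col):
    """Block chosen by an index of first keys: the last block whose first key <= seek key.

    Keys are (row, col, -ts); the seek key (row, col, -LATEST) sorts before every
    real version of (row, col).
    """
    seek = (row, col, -LATEST)
    return max(bisect.bisect_right(first_keys, seek) - 1, 0)

def block_for_seek_with_last_keys(first_keys, last_keys, row, col):
    """Hypothetical variant: an index that also stores each block's last key."""
    i = block_for_seek(first_keys, row, col)
    if last_keys[i] < (row, col, -LATEST) and i + 1 < len(first_keys):
        return i + 1  # block i ends before the seek key, so it cannot match
    return i

first_keys = [("R1", "C1", -3), ("R1", "C3", -2)]
last_keys = [("R1", "C2", -1), ("R2", "C1", -1)]
```

Here (R1, C3) is the first key of block 1, yet the first-key-only lookup picks block 0. With last keys in the index, block 0 can be skipped, at the cost of a larger index.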
  • 28. Top-of-the-column seek (HBASE-4962): some applications do not use DeleteColumn
      ▪ Future work
      ▪ DeleteColumn deletes all versions of a particular column
      ▪ It comes before all Puts for a (Row, Column)
      ▪ This slows down timestamp range queries
      ▪ Proposed solution:
          ▪ Add a (Row, Column) Bloom filter for DeleteColumn only
          ▪ Seek to (Row, Column, T2) for a [T1, T2] range query
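The proposal on slide 28 is listed as future work, so the following is only a sketch of the idea, not an implementation of HBASE-4962. It makes several assumptions. An exact set stands in for the proposed DeleteColumn-only Bloom filter. A DeleteColumn marker hides Puts with an equal or older timestamp. Keys are `(row, column, -timestamp, kind)`. `start_key` and `range_query` are hypothetical names.

```python
import bisect

LATEST = float("inf")

def start_key(row, col, t2, delete_column_rowcols):
    """Where a [t1, t2] range read of (row, col) begins in one file.

    delete_column_rowcols is an exact set standing in for the proposed
    DeleteColumn-only (Row, Column) Bloom filter.
    """
    if (row, col) in delete_column_rowcols:
        return (row, col, -LATEST)  # top of the column, so a DeleteColumn is not skipped
    return (row, col, -t2)          # no DeleteColumn here: jump straight to t2

def range_query(keys, row, col, t1, t2, delete_column_rowcols):
    """Visible timestamps of (row, col) in [t1, t2], plus keys examined.

    keys: sorted (row, col, -ts, kind) tuples; kind is "DeleteColumn" or "Put".
    """
    i = bisect.bisect_left(keys, start_key(row, col, t2, delete_column_rowcols))
    deleted_upto, out, examined = None, [], 0
    while i < len(keys) and keys[i][:2] == (row, col):
        _, _, neg_ts, kind = keys[i]
        ts = -neg_ts
        examined += 1
        if kind == "DeleteColumn":
            if deleted_upto is None:
                deleted_upto = ts  # newest marker comes first
        elif ts < t1:
            break  # older than the range: done
        elif ts <= t2 and (deleted_upto is None or ts > deleted_upto):
            out.append(ts)
        i += 1
    return out, examined

keys = sorted(("R1", "C1", -t, "Put") for t in range(1, 10))
```

Jumping to T2 reads 4 keys instead of 9 for the range [2, 4]. The jump is safe only when the filter rules out a DeleteColumn. Otherwise a marker newer than T2 would sort before the seek key and be missed.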
  • 29. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
