  1. 1. Tree Indexing on Flash Disks. Yinan Li, in collaboration with Bingsheng He, Qiong Luo, and Ke Yi. Hong Kong University of Science and Technology
  2. 2. Introduction <ul><li>Flash-based devices: the mainstream storage in mobile devices and embedded systems.</li></ul><ul><li>Recently, the flash disk, or flash Solid State Disk (SSD), has emerged as a viable alternative to the magnetic hard disk for non-volatile storage.</li></ul>“Tape is Dead, Disk is Tape, Flash is Disk” – Jim Gray
  3. 3. Flash SSD <ul><li>Intel X-25M 80GB SATA SSD</li></ul><ul><li>Mtron 64GB SATA SSD</li></ul><ul><li>Other manufacturers: Samsung, SanDisk, Seagate, Fusion-IO, …</li></ul>
  4. 4. Internal Structure of Flash Disk
  5. 5. Flash Memory <ul><li>Three basic operations of flash memory</li></ul><ul><li>Read: Page (512B-2KB), 80 µs</li></ul><ul><li>Write: Page (512B-2KB), 200 µs</li></ul><ul><ul><li>Writes can only change bits from 1 to 0.</li></ul></ul><ul><li>Erase: Block (128-512KB), 1.5 ms</li></ul><ul><ul><li>Clears all bits to 1.</li></ul></ul><ul><ul><li>Each block can be erased only a finite number of times before it wears out.</li></ul></ul>
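The write and erase constraints on this slide can be illustrated with a toy model. This is a minimal sketch, not real flash firmware: the `FlashBlock` class and its method names are hypothetical, and each page is reduced to a single byte.

```python
class FlashBlock:
    """Toy model of one flash block (hypothetical, for illustration only)."""

    def __init__(self, pages_per_block=4):
        # An erased block has every bit set to 1 (0xFF per one-byte page here).
        self.pages = [0xFF] * pages_per_block
        self.erase_count = 0

    def write(self, page_no, value):
        # A write can only change bits from 1 to 0, so the stored result
        # is the bitwise AND of the old and the new contents.
        self.pages[page_no] &= value
        return self.pages[page_no]

    def erase(self):
        # Erase works on the whole block and resets all bits to 1.
        self.pages = [0xFF] * len(self.pages)
        self.erase_count += 1


block = FlashBlock()
block.write(0, 0b10101010)           # first write to an erased page
stale = block.write(0, 0b01010101)   # in-place overwrite cannot set bits back to 1
block.erase()                        # only an erase restores the page
```

Because an in-place overwrite cannot restore cleared bits, an update has to go to a fresh page, which is what motivates the out-of-place update scheme of the FTL.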
  6. 6. Flash Translation Layer (FTL) <ul><li>Flash SSDs employ a firmware layer, called the FTL, to implement an out-of-place update scheme.</li></ul><ul><li>It maintains a mapping table between logical and physical pages:</li></ul><ul><ul><li>Address Translation</li></ul></ul><ul><ul><li>Garbage Collection</li></ul></ul><ul><ul><li>Wear Leveling</li></ul></ul><ul><li>Page-Level Mapping, Block-Level Mapping, Fragmentation</li></ul>
  7. 7. Advantages of Flash Disk <ul><li>Purely electronic device (no mechanical moving parts)</li></ul><ul><ul><li>Extremely fast random read speed</li></ul></ul><ul><ul><li>Low power consumption</li></ul></ul>[Figure: Magnetic Hard Disk; Flash Disk]
  8. 8. Challenge of Flash Disk <ul><li>Due to the physical characteristics of flash memory, flash disks exhibit relatively poor random write performance.</li></ul>
  9. 9. Bandwidth of Basic Access Patterns <ul><li>Random writes are 5.6x to 55x slower than random reads on flash SSDs [Intel, Mtron, Samsung SSDs].</li></ul><ul><li>Random accesses are significantly slower than sequential ones with multi-page optimization.</li></ul>[Figure panels: Access Unit Size 2KB; Access Unit Size 512KB]
  10. 10. Tree Indexing on Flash Disk <ul><li>Tree indexes are a primary access method in databases.</li></ul><ul><li>Tree indexes on flash disk</li></ul><ul><ul><li>exploit the fast random read speed.</li></ul></ul><ul><ul><li>suffer from the poor random write performance.</li></ul></ul><ul><li>We study how to adapt them to flash disks, exploiting the hardware features for efficiency.</li></ul>
  11. 11. B+-Tree <ul><li>Search I/O Cost: O(log_B N) Random Reads</li></ul><ul><li>Update I/O Cost: O(log_B N) Random Reads + O(1) Random Writes</li></ul>[Figure: B+-tree with O(log_B N) levels; Search Key: 48, Insert Key: 40]
  12. 12. LSM-Tree (Log-Structured Merge-Tree) <ul><li>Search I/O Cost: O(log_k N · log_B N) Random Reads</li></ul><ul><li>Update I/O Cost: O(log_k N) Sequential Writes</li></ul>[Figure: O(log_k N) B+-trees with size ratio k between adjacent trees, each with O(log_B N) levels; Search Key: X, Insert Key: Y, merges between adjacent trees] [1] P. E. O’Neil, E. Cheng, D. Gawlick, and E. J. O’Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica, 1996.
  13. 13. BFTL <ul><li>Search I/O cost: O(c · log_B N) Random Reads</li></ul><ul><li>Update I/O cost: O(1/c) Random Writes</li></ul>[Figure: table indexed by Pid; maximum length of each linked list: c] [2] Chin-Hsien Wu, Tei-Wei Kuo, and Li Ping Chang. An efficient B-tree layer implementation for flash memory storage systems. In RTCSA, 2003.
  14. 14. Designing an Index for Flash Disks <ul><li>Our goals:</li></ul><ul><ul><li>Reduce update cost</li></ul></ul><ul><ul><li>Preserve search efficiency</li></ul></ul><ul><li>Two ways to reduce random write cost:</li></ul><ul><ul><li>Transform random writes into sequential ones.</li></ul></ul><ul><ul><li>Limit them within a small area (< 512-8MB).</li></ul></ul>
  15. 15. Outline <ul><li>Introduction </li></ul><ul><li>Structure of FD-Tree </li></ul><ul><li>Cost Analysis </li></ul><ul><li>Experimental Results </li></ul><ul><li>Conclusion </li></ul>
  16. 16. FD-Tree <ul><li>Transforming random writes into sequential ones by the logarithmic method.</li></ul><ul><ul><li>Insertions are performed on a small tree first</li></ul></ul><ul><ul><li>and are gradually merged into larger ones.</li></ul></ul><ul><li>Improving search efficiency by fractional cascading.</li></ul><ul><ul><li>In each level, a special entry identifies the page in the next level that the search visits next.</li></ul></ul>
  17. 17. Data Structure of FD-Tree <ul><li>L levels:</li></ul><ul><ul><li>one head tree (B+-tree) on the top</li></ul></ul><ul><ul><li>L-1 sorted runs at the bottom</li></ul></ul><ul><li>Geometrically increasing sizes (capacities) of levels, following the logarithmic method</li></ul>
  18. 18. Data Structure of FD-Tree <ul><li>Entry: a pair of key and pointer</li></ul><ul><li>Fence: a special entry, used to improve search efficiency</li></ul><ul><ul><li>Its key equals the first key in the page it points to.</li></ul></ul><ul><ul><li>Its pointer is the ID of the page in the immediately next level that the search visits next.</li></ul></ul>
  19. 19. Data Structure of FD-Tree <ul><li>Each page is pointed to by one or more fences in the immediately upper level.</li></ul><ul><li>The first entry of each page is a fence. (If not, we insert one.)</li></ul>
  20. 20. Insertion on FD-Tree <ul><li>Insert the new entry into the head tree.</li></ul><ul><li>If the head tree is full, merge it into the next level and then empty it.</li></ul><ul><li>A merge may recursively trigger further merges into lower levels.</li></ul>
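The insertion steps above can be sketched as a cascade over sorted lists. This is a simplified illustration under stated assumptions, not the authors' implementation: the head tree is modeled as a plain sorted list, fences are ignored, level i is assumed to hold at most `head_capacity * k**i` entries, and the function and parameter names are hypothetical.

```python
import heapq


def insert(levels, key, head_capacity=2, k=2):
    """Insert key into level 0, cascading merges downward on overflow."""
    levels[0].append(key)
    levels[0].sort()
    i = 0
    # Assumption: level i holds at most head_capacity * k**i entries.
    while len(levels[i]) > head_capacity * k ** i:
        if i + 1 == len(levels):
            levels.append([])          # grow the index by one level
        # Merge two sorted runs with one sequential pass over each.
        levels[i + 1] = list(heapq.merge(levels[i], levels[i + 1]))
        levels[i] = []                 # the upper level is emptied after the merge
        i += 1
    return levels


levels = [[]]
for key in [5, 3, 8, 1, 9, 2, 7]:
    insert(levels, key)
```

Every write in this sketch is either an update of the small top level or a full sequential rewrite of a lower level, which is the property the logarithmic method is used for.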
  21. 21. Merge on FD-Tree <ul><li>Scan two sorted runs and generate new sorted runs.</li></ul>[Figure: merging Li and Li+1 into new Li and new Li+1; legend: fence, entry in Li, entry in Li+1]
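A rough sketch of the merge step, assuming entries are bare keys and that the new upper level consists only of fences pointing at the pages of the new lower level. The function name `merge_levels` and the `page_size` parameter are hypothetical, and fences that the lower level would itself carry for the level beneath it are left out for brevity.

```python
import heapq


def merge_levels(upper_entries, lower_entries, page_size=3):
    """Merge two sorted runs; return (fences for the new upper level, new lower pages)."""
    merged = list(heapq.merge(upper_entries, lower_entries))
    # Cut the merged run into fixed-size pages, as if written out sequentially.
    pages = [merged[i:i + page_size] for i in range(0, len(merged), page_size)]
    # One fence per new page: (first key in the page, page ID).
    fences = [(page[0], page_id) for page_id, page in enumerate(pages)]
    return fences, pages


fences, pages = merge_levels([2, 3, 19], [5, 6, 7, 9, 10, 12])
```

Each fence key equals the first key of the page it points to, matching the fence definition given on the data structure slides.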
  22. 22. Insertion & Merge on FD-Tree <ul><li>When the top L levels are full, merge them and replace them with new ones.</li></ul>[Figure: Insert; Merge]
  23. 23. Search on FD-Tree [Figure: search for key 81 descending from L0 (head tree) through L1 to L2]
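The fence-guided search can be sketched as follows. This is a simplified model under stated assumptions: the head tree is reduced to a single page, each page is a key-ordered list of `(key, pointer)` tuples, and `pointer` is `None` for a normal entry or a page ID in the next level for a fence. The layout and the function name are hypothetical, and the example keys are made up rather than taken from the figure.

```python
import bisect


def search(levels, key):
    """Follow fences from the top level down; return True if key is found."""
    page_id = 0
    for depth, level in enumerate(levels):
        page = level[page_id]                      # one page read per level
        if any(k == key and ptr is None for k, ptr in page):
            return True
        if depth + 1 == len(levels):
            return False
        # Next page: the fence with the largest key <= the search key.
        fences = [(k, ptr) for k, ptr in page if ptr is not None]
        pos = bisect.bisect_right([k for k, _ in fences], key) - 1
        page_id = fences[max(pos, 0)][1]
    return False


levels = [
    [[(10, 0), (72, None), (81, 1)]],                        # L0 (head, one page)
    [[(10, 0), (58, None), (63, 1)], [(81, 2), (93, None)]],  # L1
    [[(10, None), (20, None)], [(63, None), (71, None)],
     [(81, None), (86, None)]],                               # L2
]
found_86 = search(levels, 86)
found_75 = search(levels, 75)
```

The search touches exactly one page per level, which is how fences keep the search cost close to that of a B+-tree.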
  24. 24. Deletion on FD-Tree <ul><li>A deletion is handled in a way similar to an insertion.</li></ul><ul><li>Insert a special entry, called a filter entry, to mark that the original entry, called a phantom entry, has been deleted.</li></ul><ul><li>As merges occur, a filter entry eventually meets its corresponding phantom entry at some level, and both are then discarded.</li></ul>
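How a filter entry cancels its phantom entry during a merge can be sketched as below. This is an illustration only: entries are modeled as `(key, is_filter)` tuples, keys are assumed to be unique among normal entries, and the function name is hypothetical.

```python
def merge_with_deletes(upper, lower):
    """Merge two runs of (key, is_filter) entries, cancelling matched pairs.

    A filter entry in the upper run and a normal (phantom) entry with the
    same key in the lower run are both dropped. Unmatched filter entries
    are kept so they can cancel a phantom entry in a later, deeper merge.
    """
    filters = {key for key, is_filter in upper if is_filter}
    phantoms = {key for key, is_filter in lower
                if not is_filter and key in filters}
    merged = [(key, is_filter) for key, is_filter in upper + lower
              if key not in phantoms]
    return sorted(merged)


upper = [(16, True), (37, True), (50, False)]    # filters for 16 and 37, plus key 50
lower = [(16, False), (20, False), (45, False)]  # 16 is the phantom entry
result = merge_with_deletes(upper, lower)
```

In this example the filter for 16 meets its phantom entry and both disappear, while the filter for 37 survives because its phantom entry lives in a deeper level.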
  25. 25. Deletion on FD-Tree [Figure: states of L0, L1, and L2 when deleting three entries (16, 45, 37), after merging L0 and L1, and after merging L0, L1, and L2]
  26. 26. Outline <ul><li>Introduction </li></ul><ul><li>Structure of FD-Tree </li></ul><ul><li>Cost Analysis </li></ul><ul><li>Experimental Results </li></ul><ul><li>Conclusion </li></ul>
  27. 27. Cost Analysis of FD-Tree <ul><li>I/O cost of FD-Tree </li></ul><ul><ul><li>Search : </li></ul></ul><ul><ul><li>Insertion : </li></ul></ul><ul><ul><li>Deletion : Search + Insertion </li></ul></ul><ul><ul><li>Update : Deletion + Insertion </li></ul></ul>k : size ratio between adjacent levels f : # entries in a page N : # entries in index : # entries in the head tree
  28. 28. I/O Cost Comparison You may assume for simplicity of comparison, thus Search Insertion Rand Read Rand. Read Seq. Read Rand. Write Seq. Write FD-Tree B+-Tree 1 LSM-Tree BFTL
  29. 29. Cost Model <ul><li>Tradeoff of k value </li></ul><ul><ul><li>Large k value: high insertion cost </li></ul></ul><ul><ul><li>Small k value: high search cost </li></ul></ul><ul><li>We develop a cost model to calculate the optimal value of k , given the characteristics of both flash SSD and workload. </li></ul>
  30. 30. Cost Model <ul><li>Estimated cost for varying values of k</li></ul>
  31. 31. Outline <ul><li>Introduction </li></ul><ul><li>Structure of FD-Tree </li></ul><ul><li>Cost Analysis </li></ul><ul><li>Experimental Results </li></ul><ul><li>Conclusion </li></ul>
  32. 32. Implementation Details <ul><li>Storage Layout </li></ul><ul><ul><li>Fixed-length record page format </li></ul></ul><ul><ul><li>Disable OS disk buffering </li></ul></ul><ul><li>Buffer Manager </li></ul><ul><ul><li>LRU replacement policy </li></ul></ul>Flash SSDs Storage Layout Buffer Manager FD-tree LSM-tree BFTL B+-tree
  33. 33. Experimental Setup <ul><li>Platform </li></ul><ul><ul><li>Intel Quad Core CPU </li></ul></ul><ul><ul><li>2GB memory </li></ul></ul><ul><ul><li>Windows XP </li></ul></ul><ul><li>Three Flash SSDs: </li></ul><ul><ul><li>Intel X-25M 80GB, Mtron 64GB, Samsung 32GB. </li></ul></ul><ul><ul><li>SATA interface </li></ul></ul>
  34. 34. Experimental Settings <ul><li>Index Size: 128MB-8GB (8GB by default) </li></ul><ul><li>Entry Size: 8 Bytes (4 Bytes Key + 4 Bytes Ptr) </li></ul><ul><li>Buffer Size: 16MB </li></ul><ul><li>Warm up period: 10000 queries </li></ul><ul><li>Workload: 50% search + 50% insertion (by default) </li></ul>
  35. 35. Validation of the Cost Model <ul><li>The estimated costs are very close to the measured ones.</li></ul><ul><li>Our cost model lets us estimate a relatively accurate k value that minimizes the overall cost.</li></ul>[Figure panels: Mtron SSD; Intel SSD]
  36. 36. Overall Performance Comparison <ul><li>On the Mtron SSD, FD-tree is 24.2x, 5.8x, and 1.8x faster than B+-tree, BFTL, and LSM-tree, respectively.</li></ul><ul><li>On the Intel SSD, FD-tree is 3x, 3x, and 1.5x faster than B+-tree, BFTL, and LSM-tree, respectively.</li></ul>[Figure panels: Mtron SSD; Intel SSD]
  37. 37. Search Performance Comparison <ul><li>FD-tree has search performance similar to that of B+-tree.</li></ul><ul><li>FD-tree and B+-tree outperform the others on both SSDs.</li></ul>[Figure panels: Mtron SSD; Intel SSD]
  38. 38. Insertion Performance Comparison <ul><li>FD-tree has insertion performance similar to that of LSM-tree.</li></ul><ul><li>FD-tree and LSM-tree outperform the others on both SSDs.</li></ul>[Figure panels: Mtron SSD; Intel SSD]
  39. 39. Performance Comparison <ul><li>W_Search : 80% search + 10% insertion + 5% deletion + 5% update </li></ul><ul><li>W_Update : 20% search + 40% insertion + 20% deletion + 20% update </li></ul>
  40. 40. Outline <ul><li>Introduction </li></ul><ul><li>Structure of FD-Tree </li></ul><ul><li>Cost Analysis </li></ul><ul><li>Experimental Results </li></ul><ul><li>Conclusion </li></ul>
  41. 41. Conclusion <ul><li>We design a new index structure that transforms almost all random writes into sequential ones while preserving search efficiency.</li></ul><ul><li>We show empirically and analytically that FD-tree outperforms the other indexes on various flash SSDs.</li></ul>
  42. 42. Related Publication <ul><li>Yinan Li, Bingsheng He, Qiong Luo, Ke Yi. Tree Indexing on Flash Disks. ICDE 2009. Short Paper.</li></ul><ul><li>Yinan Li, Bingsheng He, Qiong Luo, Ke Yi. Tree Indexing on Flash Based Solid State Drives. In preparation for journal submission.</li></ul>
  43. 43. Q&A <ul><li>Thank You! </li></ul><ul><li>Q&A </li></ul>
  44. 44.
  45. 45. Additional Slides
  46. 46. Block-Level FTL <ul><li>Mapping Granularity: Block </li></ul><ul><li>Cost: 1 erase + N writes + N reads </li></ul>Logical Block ID Physical Block ID XXX
  47. 47. Page-Level FTL <ul><li>Mapping Granularity: Page </li></ul><ul><li>Larger mapping table </li></ul><ul><li>Cost: 1/N erase + 1 write + 1 read </li></ul>Logical Block ID Physical Block ID XXX YYY
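To make the two FTL cost expressions concrete, the sketch below plugs in the per-operation latencies from the Flash Memory slide (80 µs read, 200 µs write, 1.5 ms erase). The value N = 64 pages per block is an assumption (a 128KB block of 2KB pages, within the ranges given on that slide), and the function names are hypothetical.

```python
READ_US, WRITE_US, ERASE_US = 80, 200, 1500  # latencies from the Flash Memory slide
N = 64                                       # assumed pages per block (128KB / 2KB)


def block_level_update_cost(n=N):
    # Block-level mapping: 1 erase + N writes + N reads per page update.
    return ERASE_US + n * WRITE_US + n * READ_US


def page_level_update_cost(n=N):
    # Page-level mapping: 1/N erase + 1 write + 1 read (amortized).
    return ERASE_US / n + WRITE_US + READ_US


block_cost = block_level_update_cost()   # microseconds
page_cost = page_level_update_cost()     # microseconds
```

Under these assumed numbers a page update costs about 19.4 ms with block-level mapping and about 0.3 ms with page-level mapping, at the price of the larger mapping table.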
  48. 48. Fragmentation <ul><li>Cost of Recycling ONE block: N^2 reads, N*(N-1) writes, N erases. </li></ul>Flash Disk is full now…. We have to recycle space
  49. 49. Deamortized FD-Tree <ul><li>Normal FD-Tree</li></ul><ul><ul><li>High average insertion performance</li></ul></ul><ul><ul><li>Poor worst-case insertion performance</li></ul></ul><ul><li>Deamortized FD-Tree</li></ul><ul><ul><li>Reduces the worst-case insertion cost</li></ul></ul><ul><ul><li>Preserves the average insertion cost</li></ul></ul>
  50. 50. Deamortized FD-Tree <ul><li>Maintain two head trees, T0 and T0’</li></ul><ul><ul><li>Insert into T0’</li></ul></ul><ul><ul><li>Search on both T0 and T0’</li></ul></ul><ul><ul><li>Concurrent merge</li></ul></ul>[Figure: searches go to both T0 and T0’; inserts go to T0’]
  51. 51. Deamortized FD-Tree <ul><li>The high merge cost is amortized over all entries inserted into the head tree.</li></ul><ul><li>The overall cost (almost) does not increase.</li></ul>
  52. 52. FD-Tree vs. Deamortized FD-Tree <ul><li>Relatively high worst-case performance</li></ul><ul><li>Low overhead</li></ul>