Indexing Delight
   Thinking Cap of Fractal-tree Indexes
                    BohuTANG@2012/12
                 overred.shuttler@gmail.com
B-tree
Invented in 1972, 40 years ago!
B-tree

                                                                 Block0




                           Block1                                 Block2                                           Block3
                                                ....                                         ....




            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...
B-tree Insert
                                                               Insert x

                                                                 Block0

                                                                                                          seek



                           Block1                                 Block2                                           Block3
                                                ....                                         ....




            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...
B-tree Insert
                                                               Insert x

                                                                 Block0

                                                                                                          seek



                           Block1                                 Block2                                           Block3
                                                ....                                         ....

                                                                                                                                     seek



            Block4                                                                                                                   Block5
                                    .....................................................................................




File on disk:        ...       Block0                  ...             ...           Block3                 ...             Block5     ...

                           Inserting one item causes many random seeks!
B-tree Search
                                             Search x

                                                Block0

                                                                                         seek



          Block1                                 Block2                                           Block3
                               ....                                         ....

                                                                                                           seek



 Block4                                                                                                    Block5
                   .....................................................................................




                     Queries are fast: the I/O cost is O(log_B N)
B-tree Conclusions
●   Search: O(log_B N) block transfers.
●   Insert: O(log_B N) block transfers (slow).
●   B-tree range queries are slow.
●   IMPORTANT:
     -- Parent and child blocks are scattered sparsely across the disk.
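
To make the O(log_B N) bound concrete, here is a rough sketch that counts how many levels a lookup has to walk. The function name and the sample numbers are illustrative assumptions, not taken from the slides, and the fanout is treated as exactly B keys per block.

```python
def btree_levels(n_keys: int, fanout: int) -> int:
    """Approximate B-tree height, i.e. block transfers per lookup.

    Computes ceil(log_fanout(n_keys)) with integer arithmetic so there is
    no floating-point rounding at exact powers of the fanout.
    """
    levels = 1
    capacity = fanout          # keys reachable with `levels` levels
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical example: one billion keys, 1024 keys per block.
print(btree_levels(10**9, 1024))  # 3
```

Each of those levels may live in a different place on disk, which is why a single insert or lookup can cost one seek per level.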
A Simplified Fractal-tree
Cache-Oblivious Lookahead Array (COLA), invented by MITers
COLA


[Figure: a stack of log_2 N sorted arrays, one per level.]

Binary search within one level: O(log_2 N); over all log_2 N levels: O((log_2 N)^2)
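
The basic structure can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the authors' implementation: the class name SimpleCOLA is made up, level k holds exactly 2**k keys (so sizes 1, 2, 4, ...), and there is no fractional cascading, so a search does one binary search per level.

```python
from bisect import bisect_left
from heapq import merge


class SimpleCOLA:
    """Basic COLA sketch: level k is either empty or a sorted list of 2**k keys."""

    def __init__(self):
        self.levels = []  # levels[k] is [] or a sorted list of length 2**k

    def insert(self, key):
        carry = [key]
        k = 0
        # Works like incrementing a binary counter: a full level is merged
        # with the incoming run and the result is carried to the next level.
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry
                return
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # One binary search per level, so O((log N)^2) comparisons overall.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

Every merge reads and writes whole arrays sequentially, which is where the cheap amortized inserts come from.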
COLA (Using Fractional Cascading)


[Figure: the log_2 N levels of the COLA.]

●   Search: O(log_2 N) block transfers.
●   Insert: O((1/B) log_2 N) amortized block transfers.
●   Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
●   Balanced Binary Search Tree
COLA Conclusions

● Search: O(log_2 N) block transfers (using fractional
  cascading).
● Insert: O((1/B) log_2 N) amortized block transfers.
● Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
● Balanced Binary Search Tree
● Lookahead (prefetch), data-intensive!
● BUT the bottom level keeps growing, so merging into it
  gets expensive.
COLA vs B-tree
● Search:
  -- (log_2 N)/(log_B N)
     = log_2 B times slower than a B-tree (in theory)
● Insert:
  -- (log_B N)/((1/B) log_2 N)
     = B/(log_2 B) times faster than a B-tree (in theory)
If B = 4KB (B = 4096):
      COLA search is 12 times slower than a B-tree
      COLA insert is 341 times faster than a B-tree
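
The two ratios above can be checked with a few lines of arithmetic. This sketch assumes B = 4096 stands for a 4KB block; the function name is hypothetical.

```python
import math


def cola_vs_btree(block_size: int):
    """Return (search_slowdown, insert_speedup) of a COLA relative to a B-tree."""
    log2_b = math.log2(block_size)
    search_slowdown = log2_b              # (log_2 N)/(log_B N) = log_2 B
    insert_speedup = block_size / log2_b  # (log_B N)/((1/B) log_2 N) = B/log_2 B
    return search_slowdown, insert_speedup


slowdown, speedup = cola_vs_btree(4096)
print(slowdown, int(speedup))  # 12.0 341
```

Note that N cancels out of both ratios, so the comparison depends only on the block size.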
LSM-tree
LSM-tree
                                                       In memory
                                 buffer



               buffer             ...                    buffer



      buffer     ...    buffer          ...   buffer        ...    buffer




●   Lazy insertion, sorted beforehand
●   Level i is the buffer of Level i+1
●   Search: O(log_B N) * O(log N)
●   Insert: O((log_B N)/B)
LSM-tree (Using Fractional Cascading)
                                                     In memory
                               buffer



             buffer             ...                    buffer



    buffer     ...    buffer          ...   buffer        ...    buffer




● Search: O(log_B N) (using fractional cascading)
● Insert: O((log_B N)/B)
● A 0.618 fractal tree? But NOT cache-oblivious...
LSM-tree (Merging)
                                                           In memory
                                 buffer



             buffer               ...                        buffer
   merge              merge                   merge

    buffer     ...      buffer          ...       buffer        ...        buffer




A lot of I/O is wasted during merging!
Like a headless fly buzzing around aimlessly...                        Zzz...
Fractal-tree Indexes
Just Fractal. Patented by Tokutek...
Fractal-tree Indexes




Search: O(log_B N) Insert: O((log_B N)/B) (amortized)
Search costs the same as in a B-tree, but inserts are faster than in a B-tree
Fractal-tree Indexes (Block size)



                    ....


            ....     ....    ....




               B is 4MB...
Fractal-tree Indexes (Block size)


                    full


                     ....


            ....      ....   ....




               B is 4MB...
Fractal-tree Indexes (Block size)



            full     ....


            ....      ....   ....




                   B is 4MB...
Fractal-tree Indexes (Block size)

                                    ..

                            ..      ..         ..



                full




           ..                    ... ... ...             ..

      ..   ..          ..                           ..   ..   ..




           Fractal! One seek per 4MB block...
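
The "full, then push down" behaviour in these slides can be sketched as a tree whose internal nodes buffer pending inserts. This is a deliberately simplified illustration, not Tokutek's implementation: the Node class, its fields, and the tiny buffer capacity are all assumptions, the tree shape is fixed (no splits), and a full buffer is flushed to every child rather than to a single chosen one.

```python
from bisect import bisect_right


class Node:
    """Tree node that buffers inserts and flushes them downward when full."""

    def __init__(self, pivots=None, children=None, buffer_capacity=4):
        self.pivots = pivots or []       # sorted keys separating the children
        self.children = children or []   # empty list means this node is a leaf
        self.buffer = []                 # pending (key, value) messages
        self.buffer_capacity = buffer_capacity
        self.items = {}                  # key/value storage, used by leaves only

    def is_leaf(self):
        return not self.children

    def insert(self, key, value):
        if self.is_leaf():
            self.items[key] = value
            return
        self.buffer.append((key, value))
        if len(self.buffer) > self.buffer_capacity:
            self.flush()

    def flush(self):
        # Move the whole buffer down in one batch; oldest messages go first
        # so that a newer value for the same key overwrites an older one.
        pending, self.buffer = self.buffer, []
        for key, value in pending:
            child = self.children[bisect_right(self.pivots, key)]
            child.insert(key, value)

    def get(self, key):
        # The newest buffered message wins over anything stored below.
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        if self.is_leaf():
            return self.items.get(key)
        return self.children[bisect_right(self.pivots, key)].get(key)
```

With a large block, one flush moves many inserts with a single seek, which is the point of the 4MB block size on these slides.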
B^ε-tree
Just a constant factor on the block fanout...
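B^ε-tree
[Figure: the optimal tradeoff curve between search speed (vertical axis, slow to fast) and insert speed (horizontal axis, slow to fast). Marked points: B-tree (fast search, slow inserts), ε=1/2, and AOF (slow search, fast inserts).]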
B^ε-tree

                          insert               search

        B-tree (ε=1)      O(log_B N)           O(log_B N)

        ε=1/2             O((log_B N)/√B)      O(log_B N)

        ε=0               O((log N)/B)         O(log N)


 If we want optimal point queries plus very fast inserts, we
 should choose ε=1/2.
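
The rows of the table above follow from one pair of formulas. The sketch below evaluates them for any ε, assuming the commonly quoted B^ε-tree bounds (fanout B^ε, search cost equal to the tree height, insert cost equal to the height divided by B/fanout) and ignoring constant factors. The function name is hypothetical, and the fanout is clamped to 2 so that ε=0 gives a binary tree.

```python
import math


def be_tree_costs(n: int, b: int, eps: float):
    """Return (insert_cost, search_cost) in block transfers, up to constants."""
    fanout = max(2.0, b ** eps)
    height = math.log(n) / math.log(fanout)  # search walks one block per level
    insert = height / (b / fanout)           # each flush moves about B/fanout items per child
    return insert, height


for eps in (1.0, 0.5, 0.0):
    insert_cost, search_cost = be_tree_costs(2**40, 4096, eps)
    print(f"eps={eps}: insert ~ {insert_cost:.4f}, search ~ {search_cost:.2f}")
```

Going from ε=1 to ε=1/2 only doubles the search cost, while the insert cost drops by a factor on the order of √B, which is the argument for choosing ε=1/2.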
B^ε-tree




     So, if the block size is B, the fanout should be √B
Cache Oblivious Data Structure
All of the above are JUST cache-oblivious data structures...
Cache Oblivious Data Structure
Question:
   Reading a sequence of k consecutive blocks
at once is not much more expensive than
reading a single block. How can we take advantage
of this?
Cache Oblivious Data Structure
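My Questions:
Q1:
  With only 1MB of memory, how do you merge two sorted 64MB
files into one sorted file?

Q2:
  On most mechanical disks, reading several consecutive blocks costs
about the same as reading a single block. How can Q1 take advantage
of this?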
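
One possible answer to both questions is a streaming two-way merge with large sequential buffers. The sketch below is an assumption-laden illustration, not the author's answer: the function name and the three-way split of the memory budget are made up, records are assumed to be newline-terminated lines sorted as strings, and Python's real memory use will exceed the nominal budget.

```python
import heapq


def merge_sorted_files(path_a, path_b, path_out, memory_budget=1 << 20):
    """Merge two sorted line-oriented files without loading either into memory."""
    # Split the budget across two read buffers and one write buffer, so
    # each disk access transfers many consecutive blocks at once.
    buf = memory_budget // 3
    with open(path_a, "r", buffering=buf) as a, \
         open(path_b, "r", buffering=buf) as b, \
         open(path_out, "w", buffering=buf) as out:
        # heapq.merge is lazy: it holds only the current line of each input.
        out.writelines(heapq.merge(a, b))
```

Only the buffers live in memory at any time, and each buffer refill is one long sequential read instead of many single-block reads.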
nessDB
You should agree that the VFS caches better than a cache you write yourself!
https://github.com/shuttler/nessDB
nessDB


             ..         ... ... ...        ..

     ..      ..   ..                  ..   ..   ..




          Each block is a small, splittable tree
nessDB, What's going on?

                             ..

                     ..      ..         ..




         ..               ... ... ...             ..

    ..   ..     ..                           ..   ..   ..




              From the line to the plane..
Thanks!
Most of the references are from:
Tokutek & MIT CSAIL & Stony Brook.

Drafted By BohuTANG using Google Drive, @2012/12/12
