MongoDB and
     Fractal Tree® Indexes


              Tim Callaghan*
         VP/Engineering, Tokutek
            tim@tokutek.com

        MongoDB Boston 2012


         * not [yet] a MongoDB expert



B-trees




B-tree Definition




      In computer science, a B-tree is a tree data structure that keeps
      data sorted and allows searches, sequential access, insertions, and
      deletions in logarithmic time.

      http://en.wikipedia.org/wiki/B-tree
B-tree Overview




      I will use a simple single-pivot example throughout this presentation
Basic B-tree




      [Diagram: a basic B-tree, labeled as follows]
      •  Pivots
      •  Pointers
      •  Internal Nodes - Path to data
      •  Leaf Nodes - Actual Data

B-tree example

                        [22]
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 3, 4    10, 20      22, 25      99

      * Pivot Rule is >=
B-tree - insert


      “Insert 15”

                        [22]
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 3, 4   10, 15, 20   22, 25      99

      Value stored in leaf node
B-tree - search


      “Find 25”

                        [22]
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 3, 4    10, 20      22, 25      99
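
A minimal sketch of the insert and search walks on these slides, assuming the same two-level, single-pivot tree. The tuple layout and the names `make_tree`, `find_leaf`, `insert`, and `find` are illustrative choices, not MongoDB or Tokutek code, and node splitting is left out.

```python
import bisect

def make_tree():
    # Internal node = (pivot, left, right); leaf = sorted list of keys.
    return (22,
            (10, [2, 3, 4], [10, 20]),
            (99, [22, 25], [99]))

def find_leaf(node, key):
    """Follow pivots down to the leaf that owns `key` (pivot rule is >=)."""
    while isinstance(node, tuple):
        pivot, left, right = node
        node = right if key >= pivot else left
    return node

def insert(tree, key):
    """Store the key in its leaf; this sketch never splits a node."""
    leaf = find_leaf(tree, key)
    if key not in leaf:
        bisect.insort(leaf, key)

def find(tree, key):
    return key in find_leaf(tree, key)

tree = make_tree()
insert(tree, 15)
print(find_leaf(tree, 15))   # [10, 15, 20]
print(find(tree, 25))        # True
```

Both operations touch one node per level, which is where the logarithmic cost in the definition comes from.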
B-tree - storage


      Performance is IO limited when bigger than RAM:
      try to fit all internal nodes and some leaf nodes

      RAM                    [22]
                      /                \
                  [10]                  [99]
                /      \              /      \
      DISK  2, 3, 4    10, 20      22, 25      99   RAM
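
Rough arithmetic for why the internal nodes are the part worth keeping in RAM. Every number here is an assumption chosen for illustration: 8KB nodes, 32-byte index entries, fully packed nodes, and one billion keys. The function `btree_shape` is hypothetical.

```python
import math

def btree_shape(num_keys, node_bytes=8 * 1024, entry_bytes=32):
    """Estimate fanout, leaf count, and internal node count of a packed B-tree."""
    fanout = node_bytes // entry_bytes        # entries that fit in one node
    leaves = math.ceil(num_keys / fanout)
    internal = 0
    level = leaves
    while level > 1:                          # count each level above the leaves
        level = math.ceil(level / fanout)
        internal += level
    return fanout, leaves, internal

fanout, leaves, internal = btree_shape(1_000_000_000)
node_bytes = 8 * 1024
print("internal nodes: %.1f MB" % (internal * node_bytes / 1e6))
print("leaf nodes:     %.1f GB" % (leaves * node_bytes / 1e9))
```

Under these assumptions the internal nodes are well under 1% of the tree, so they fit in memory long after the leaves have spilled to disk.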
B-tree – serial insertions


      Serial insertion workloads are in-memory,
      think MongoDB’s “_id” index

      RAM                    [22]
                      /                \
                  [10]                  [99]
                /      \              /      \
      DISK  2, 3, 4    10, 20      22, 25      99   RAM
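
A small sketch of why serial keys stay in memory while scattered keys do not. It assumes a simplified model in which leaf `i` holds a fixed range of 100 keys; the function `leaves_touched` and all sizes are made up for illustration.

```python
import random

def leaves_touched(keys, keys_per_leaf=100):
    """Count distinct leaves written, assuming leaf i holds keys
    [i * keys_per_leaf, (i + 1) * keys_per_leaf)."""
    return len({k // keys_per_leaf for k in keys})

n = 10_000
serial = range(1_000_000, 1_000_000 + n)             # ever-increasing keys
rng = random.Random(42)
scattered = [rng.randrange(1_000_000_000) for _ in range(n)]

print(leaves_touched(serial))      # 100
print(leaves_touched(scattered))
```

The serial run writes each leaf 100 times in a row, so only the newest leaf needs to be cached. The scattered run lands almost every insert on a different leaf, and most of those leaves have to be read from disk first.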
Fractal Tree Indexes




Fractal Tree Indexes

      All internal nodes have message buffers

                        [message buffer]
                       /                \
          [message buffer]            [message buffer]

      similar to B-trees             different than B-trees
      - store data in leaf nodes     - message buffer in all internal nodes
      - use PK for ordering          - doesn’t need to update leaf node immediately
                                     - much larger nodes (4MB vs. 8KB*)
Fractal Tree Indexes – “insert 15”

                        [22]  buffer: insert(15)
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 3, 4    10, 20      22, 25      99

      No IO is required, all internal nodes usually fit in RAM




Fractal Tree Indexes – “find 25”

                        [22]  buffer: insert(15)
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 3, 4      10        22, 25      99

      Buffered at the second level: insert(20), insert(25), delete(3)





Fractal Tree Indexes – “insert 8”

                        [22]  buffer: insert(15)
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 3, 4      10        22, 25      99

      Buffered at the second level: insert(20), insert(25), delete(3)

      Buffer is full, push messages down to next level.




Fractal Tree Indexes – “insert 8”

                        [22]  buffer: insert(15)
                 /                \
             [10]                  [99]
           /      \              /      \
      2, 4, 8   10, 20, 25   22, 25      99

      Inserted 8, 20, 25. Deleted 3.
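
A toy model of the buffering shown on the last few slides, assuming a two-level tree, a buffer capacity of three messages, and no node splitting. The classes `Leaf` and `Internal` and the function `contains` are hypothetical names for illustration, not Tokutek's implementation. Every message is routed by the pivot rule, and the replay covers only insert(20), delete(3), insert(8), and insert(15).

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)

    def apply(self, op, key):
        """Apply one message directly to the stored keys."""
        i = bisect.bisect_left(self.keys, key)
        present = i < len(self.keys) and self.keys[i] == key
        if op == "insert" and not present:
            self.keys.insert(i, key)
        elif op == "delete" and present:
            self.keys.pop(i)

class Internal:
    def __init__(self, pivot, left, right, capacity=3):
        self.pivot, self.left, self.right = pivot, left, right
        self.capacity = capacity
        self.buffer = []                    # pending (op, key), oldest first

    def child_for(self, key):
        return self.right if key >= self.pivot else self.left

    def apply(self, op, key):
        """Buffer the message; push the buffer down first if it is full."""
        if len(self.buffer) >= self.capacity:
            self.flush()
        self.buffer.append((op, key))

    def flush(self):
        for op, key in self.buffer:         # oldest first keeps ordering
            self.child_for(key).apply(op, key)
        self.buffer = []

def contains(node, key):
    """Search root to leaf; the newest buffered message for `key` wins."""
    while isinstance(node, Internal):
        for op, k in reversed(node.buffer):
            if k == key:
                return op == "insert"
        node = node.child_for(key)
    i = bisect.bisect_left(node.keys, key)
    return i < len(node.keys) and node.keys[i] == key

left = Internal(10, Leaf([2, 3, 4]), Leaf([10]))
root = Internal(22, left, Internal(99, Leaf([22, 25]), Leaf([99])))
for op, key in [("insert", 20), ("delete", 3), ("insert", 8)]:
    root.apply(op, key)                     # all three sit in the root buffer
root.apply("insert", 15)                    # root is full: flush one level down
left.flush()                                # then push on to the leaves
print(left.left.keys, left.right.keys)      # [2, 4, 8] [10, 20]
```

Writes only touch a leaf when a buffer above it overflows, so many messages share one leaf write. The cost is that a search has to look at the buffers along its path before trusting the leaf.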





Fractal Tree Indexes – compression

      •  Large node size (4MB) leads to high compression
         ratios.
      •  Supports zlib, quicklz, and lzma compression
         algorithms.
      •  Compression is generally 5x to 25x, similar to what
         gzip and 7z can do to your data.
      •  Significantly less disk space needed
      •  Fewer writes, bigger writes
        •  Both of which are great for SSDs
      •  Reads are highly compressed, more data per IO
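
A quick way to see the node-size effect with Python's standard `zlib` and `lzma` modules (quicklz is not in the standard library, so it is left out). The synthetic records and the helper names `make_records` and `compressed_size` are assumptions for illustration; real ratios depend entirely on your data.

```python
import lzma
import zlib

def make_records(n):
    """Build n newline-separated, JSON-like rows of made-up data."""
    rows = [
        '{"uri": "http://example.com/item/%d", "name": "user%d", "origin": "site%d"}'
        % (i, i % 1000, i % 50)
        for i in range(n)
    ]
    return "\n".join(rows).encode("utf-8")

def compressed_size(data, block_bytes, compress=zlib.compress):
    """Total compressed bytes when `data` is split into independent blocks."""
    return sum(
        len(compress(data[i:i + block_bytes]))
        for i in range(0, len(data), block_bytes)
    )

data = make_records(20_000)
small = compressed_size(data, 8 * 1024)             # 8KB blocks
large = compressed_size(data, 4 * 1024 * 1024)      # 4MB blocks
print(len(data), small, large)
print(compressed_size(data, 4 * 1024 * 1024, lzma.compress))
```

Each block is compressed on its own, so small blocks keep restarting with no history to refer back to. That is the effect the first bullet describes.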



So what does this have to do with
                MongoDB?


      * Watch Tyler Brock’s presentation “Indexing
               and Query Optimization”




MongoDB Storage

      db.test.insert({foo:55})
      db.test.ensureIndex({foo:1})

      PK index (_id + pointer)

                            [25]
                    /                  \
                [10]                    [99]
              /      \                /      \
      (2,ptr2),   (10,ptr10)   (25,ptr25),   (101,ptr101)
      (4,ptr4)                 (98,ptr98)

      Secondary Index (foo + pointer)

                            [85]
                    /                  \
                [40]                    [120]
              /      \                /      \
      (2,ptr10),   (55,ptr4)     (90,ptr2)   (2599,ptr98)
      (35,ptr101)

The “pointer” tells MongoDB where to look in the data files for the actual
document data.
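
A sketch of the two-step lookup the diagram implies, using plain dictionaries as stand-ins. The names `heap`, `pk_index`, `foo_index`, `insert`, and `find_by_foo` are hypothetical; real MongoDB indexes are B-trees and the pointers are locations in the data files.

```python
heap = {}         # pointer -> document (stand-in for the data files)
pk_index = {}     # _id -> pointer
foo_index = {}    # foo -> list of pointers (secondary keys may repeat)

def insert(doc):
    """Store the document once and add one entry to each index."""
    ptr = "ptr%d" % doc["_id"]
    heap[ptr] = doc
    pk_index[doc["_id"]] = ptr
    foo_index.setdefault(doc["foo"], []).append(ptr)

def find_by_foo(value):
    """Index lookup first, then one fetch from the heap per pointer."""
    return [heap[ptr] for ptr in foo_index.get(value, [])]

insert({"_id": 4, "foo": 55})
insert({"_id": 2, "foo": 90})
print(pk_index[4], foo_index[55])   # ptr4 ['ptr4']
print(find_by_foo(55))              # [{'_id': 4, 'foo': 55}]
```

The second step is the expensive one when the data is larger than RAM, because each pointer can land on a different part of the data files.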

MongoDB Storage


      [Diagram repeated from the previous slide: the PK index and the
      secondary index, now labeled “B-trees”]

Who is Tokutek and what have we done?


      •  Tokutek’s Fractal Tree Index Implementations
           •  MySQL Storage Engine (TokuDB)
           •  BerkeleyDB API
           •  File System (TokuFS)
      •     Recently added Fractal Tree Indexes to
            MongoDB 2.2
      •     Existing indexes are still supported
      •     Source changes are available via our blog at
            www.tokutek.com/tokuview
      •     This is a work in progress (see roadmap
            slides)

MongoDB and Fractal Tree Indexes




                as simple as

      db.test.ensureIndex({foo:1}, {v:2})




Indexing Options #1


      db.test.ensureIndex({foo:1},{v:2,
         blocksize:4194304,
         basementsize:131072,
         compression:"quicklz",
         clustering:false})

      •  Node size, defaults to 4MB.




Indexing Options #2


      db.test.ensureIndex({foo:1},{v:2,
         blocksize:4194304,
         basementsize:131072,
         compression:"quicklz",
         clustering:false})

      •  Basement node size, defaults to 128K.
      •  Smallest retrievable unit of a leaf node,
         efficient point queries




Indexing Options #3


      db.test.ensureIndex({foo:1},{v:2,
         blocksize:4194304,
         basementsize:131072,
         compression:"quicklz",
         clustering:false})

      •  Compression algorithm, defaults to quicklz.
      •  Supports quicklz, lzma, zlib, and none.
      •  LZMA provides 40% additional compression
         beyond quicklz, needs more CPU.
      •  Decompression speed of quicklz and lzma is
         similar.
Indexing Options #4


      db.test.ensureIndex({foo:1},{v:2,
         blocksize:4194304,
         basementsize:131072,
         compression:"quicklz",
         clustering:false})

      •  Clustering indexes store data by key and
         include the entire document as the payload
         (rather than a pointer to the document)
      •  Always “cover” a query, no need to retrieve
         the document data
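
A sketch of the difference between a pointer-based secondary index and a clustering index, assuming a list of documents as the data store and sorted lists as the indexes. All four function names are hypothetical, and the linear scans stand in for real tree lookups.

```python
def build_pointer_index(docs, field):
    """Secondary index: (key, position of the document in `docs`)."""
    return sorted((d[field], pos) for pos, d in enumerate(docs))

def build_clustering_index(docs, field):
    """Clustering index: (key, full copy of the document)."""
    return sorted(((d[field], dict(d)) for d in docs), key=lambda e: e[0])

def query_pointer(index, docs, key):
    """Return matching documents and the number of data-store fetches."""
    out, fetches = [], 0
    for k, pos in index:
        if k == key:
            out.append(docs[pos])       # extra lookup in the data store
            fetches += 1
    return out, fetches

def query_clustering(index, key):
    """Same result, but the payload is already in the index entry."""
    return [doc for k, doc in index if k == key], 0

docs = [{"_id": 1, "foo": 55, "name": "a"}, {"_id": 2, "foo": 90, "name": "b"}]
print(query_pointer(build_pointer_index(docs, "foo"), docs, 55))
print(query_clustering(build_clustering_index(docs, "foo"), 55))
```

The clustering index trades space for fewer lookups, since it stores a copy of every document.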

How well does it perform?

      Three Benchmarks
      •  Benchmark 1 : Raw insertion performance
      •  Benchmark 2 : Insertion plus queries
      •  Benchmark 3 : Covered indexes vs. clustering
         indexes




Benchmarks…

      Race Results
      •  First Place = John
      •  Second Place = Tim
      •  Third Place = Frank

      Frank can say the following:
      “I finished third, but Tim was second to last.”



       Understand benchmark specifics and review all results.



Benchmark 1 : Overview

      •  Measure single threaded insertion performance
      •  Document is URI (character), name (character),
         origin (character), creation date (timestamp), and
         expiration date (timestamp)
      •  Secondary indexes on URI, name, origin, expiration
      •  Machine specifics:
       – Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek
         Controller (256MB, write-back), 4x10K SAS/RAID 0
       – Ubuntu 10.04 Server (64-bit), ext4 filesystem
       – MongoDB v2.2.RC0




Benchmark 1 : Without Journaling




Benchmark 1 : With Journaling




Benchmark 1 : Observations

      •  Fractal Tree Indexing insertion performance is 8x
         better than standard MongoDB indexing with
         journaling, and 11x without journaling
      •  Fractal Tree Indexing insertion performance
         reaches steady state, even at 200 million
         insertions. MongoDB insertion performance seems
         to be in continual decline at only 50 million
         insertions
      •  B-tree performance is great until the working data
         set > RAM




Benchmark 2 : Overview

      •  Measure single threaded insertion
         performance while querying for 1000
         documents with a URI greater than or equal
         to a randomly selected value once every 60
         seconds
      •  Document is same as benchmark 1
      •  Secondary indexes on URI, name, origin, expiration
      •  Fractal Tree Index on URI is clustering
       – clustering indexes store entire document inline
       – Compression controls disk usage
       – no need to get document data from elsewhere
       –  db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})

      •  Same hardware as benchmark 1
Benchmark 2 : Insertion Performance




Benchmark 2 : Query Latency




Benchmark 2 : Observations

      •  Fractal Tree Indexing insertion performance is 10x
         better than standard MongoDB indexing
      •  Fractal Tree Indexing query latency is 268x better
         than standard MongoDB indexing
      •  B-tree performance is great until the working data
         set > RAM
      •  Random lookups are bad



         ...but what about MongoDB’s covered indexes?



Benchmark 3 : Overview

      •  Same workload and hardware as benchmark 2
      •  Create a MongoDB covered index on URI to
         eliminate lookups in the data files.
       –  db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})




Benchmark 3 : Insertion Performance




Benchmark 3 : Query Latency




Benchmark 3 : Observations

      •  Fractal Tree Indexing insertion performance is still
         3.7x better than standard MongoDB indexing
      •  Fractal Tree Indexing query latency is 3.2x better
         than standard MongoDB indexing (although the
         MongoDB performance is highly variable)
      •  B-tree performance is great until the working data
         set > RAM
      •  MongoDB’s covered indexes can help a lot
       – But what happens when I add new fields to my
         document?
          o Do I drop and re-create by including my new field?
          o Do I live without it?
       – Clustered Fractal Tree Indexes keep on covering your
         queries!
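
The new-field question can be reduced to a set check. This is a simplified model that ignores MongoDB specifics such as how `_id` is handled in projections; `is_covered` and the field name `owner` are hypothetical.

```python
def is_covered(index_fields, query_fields, projection_fields):
    """A query is covered when every field it touches is in the index."""
    needed = set(query_fields) | set(projection_fields)
    return needed <= set(index_fields)

# Field list of the covered index from the benchmark 3 overview.
covered_index = ["URI", "creation", "name", "origin"]

print(is_covered(covered_index, ["URI"], ["name", "origin"]))            # True
print(is_covered(covered_index, ["URI"], ["name", "origin", "owner"]))   # False
```

A clustering index carries the whole document as its payload, so a newly added field is available without rebuilding the index.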
Roadmap : Continuing the Implementation

      •  Optimize Indexing Insert/Update/Delete Operations
       – Each of our secondary indexes is currently creating and
         committing a transaction for each operation
       – A single transaction envelope will improve performance




Roadmap : Continuing the Implementation

      •  Add Support for Parallel Array Indexes
       – MongoDB does not support indexing the following two
         fields:
          o {a: [1, 2], b: [1, 2]}
       – “it could get out of hand”
       – Ticketed on 3/24/2010,
         jira.mongodb.org/browse/SERVER-826
       – Benchmark coming soon…
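
One reading of “it could get out of hand”: a compound index over two array fields needs an entry for every combination of values, so the entry count is the product of the array lengths. The function `compound_keys` is a hypothetical illustration, not MongoDB's key generation code.

```python
from itertools import product

def compound_keys(doc, fields):
    """Index entries a compound index over `fields` would need for `doc`."""
    values = [doc[f] if isinstance(doc[f], list) else [doc[f]] for f in fields]
    return list(product(*values))

doc = {"a": [1, 2], "b": [1, 2]}
print(compound_keys(doc, ["a", "b"]))   # [(1, 1), (1, 2), (2, 1), (2, 2)]

big = {"a": list(range(100)), "b": list(range(100))}
print(len(compound_keys(big, ["a", "b"])))   # 10000
```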




Roadmap : Continuing the Implementation

      •  Add Crash Safety
       – Our implementation is not [yet] crash safe with the
         MongoDB PK/heap storage mechanism.
       – MongoDB journal is separate from Fractal Tree Index
         logs.
       – Need to create a transactional envelope around both of
         them




Roadmap : Continuing the Implementation

      •  Replace MongoDB data store and PK index
       – A clustering index on _id eliminates the need for two
         storage systems
       – Compression greatly reduces disk footprint
       – This is a large task




We are looking for evaluators!




       Email me at tim@tokutek.com


       See me after the presentation




Questions?



                 Tim Callaghan
               tim@tokutek.com
                 @tmcallaghan



      More detailed benchmark information
                 in my blogs at
          www.tokutek.com/tokuview
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

MongoDB and Fractal Tree Indexes

  • 1. MongoDB and Fractal Tree® Indexes. Tim Callaghan*, VP/Engineering, Tokutek, tim@tokutek.com. MongoDB Boston 2012. * not [yet] a MongoDB expert
  • 3. B-tree Definition In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. http://en.wikipedia.org/wiki/B-tree
  • 4. B-tree Overview I will use a simple single-pivot example throughout this presentation
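  • 5. Basic B-tree. Internal nodes hold pivots and pointers and form the path to the data; leaf nodes hold the actual data.
  • 6. B-tree example. Root pivot: 22. Second-level pivots: 10 and 99. Leaves: [2, 3, 4], [10, 20], [22, 25], [99]. * Pivot rule is >=
  • 7. B-tree - insert: “Insert 15”. Root pivot: 22. Second-level pivots: 10 and 99. Leaves: [2, 3, 4], [10, 15, 20], [22, 25], [99]. The value is stored in a leaf node.
  • 8. B-tree - search: “Find 25”. Root pivot: 22. Second-level pivots: 10 and 99. Leaves: [2, 3, 4], [10, 20], [22, 25], [99].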
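The example tree from slides 6 to 8 can be sketched in a few lines of JavaScript. This is a minimal illustration only: the object layout (`pivot`, `left`, `right`, `keys`) and the function names are assumptions made for this sketch, and it deliberately omits node splitting, which a real B-tree needs once a leaf fills up.

```javascript
// Hypothetical in-memory model of the slides' single-pivot tree.
// Internal nodes: { pivot, left, right }; leaves: { keys: [...] }.
const tree = {
  pivot: 22,
  left:  { pivot: 10, left: { keys: [2, 3, 4] }, right: { keys: [10, 20] } },
  right: { pivot: 99, left: { keys: [22, 25] },  right: { keys: [99] } },
};

// Walk from the root to the leaf that owns `key`.
// Pivot rule from the slides: keys >= pivot go right, smaller keys go left.
function findLeaf(node, key) {
  while (node.keys === undefined) {
    node = key >= node.pivot ? node.right : node.left;
  }
  return node;
}

// "Find 25": descend to the leaf, then look inside it.
function search(root, key) {
  return findLeaf(root, key).keys.includes(key);
}

// "Insert 15": the value is stored in the leaf, kept in sorted order.
// No splitting is done here, so leaves can grow without bound.
function insert(root, key) {
  const leaf = findLeaf(root, key);
  if (!leaf.keys.includes(key)) {
    leaf.keys.push(key);
    leaf.keys.sort((a, b) => a - b);
  }
}

insert(tree, 15);
console.log(search(tree, 25), findLeaf(tree, 15).keys);
```

Each lookup or insert touches one node per level, which is where the logarithmic cost in the B-tree definition comes from.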
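  • 9. B-tree - storage. Performance is IO limited when the tree is bigger than RAM: try to fit all internal nodes and some leaf nodes. In the diagram, the internal nodes (pivots 22, 10, 99) are in RAM, and the leaves [2, 3, 4], [10, 20], [22, 25], [99] are split between disk and RAM.
  • 10. B-tree – serial insertions. Serial insertion workloads are in-memory; think of MongoDB’s “_id” index. In the diagram, the internal nodes (pivots 22, 10, 99) are in RAM, and the leaves [2, 3, 4], [10, 20], [22, 25], [99] are split between disk and RAM.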
  • 12. Fractal Tree Indexes. All internal nodes have message buffers. Similar to B-trees: store data in leaf nodes; use PK for ordering. Different than B-trees: message buffer in all internal nodes; doesn’t need to update leaf node immediately; much larger nodes (4MB vs. 8KB*).
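  • 13. Fractal Tree Indexes – “insert 15”. The message insert(15) is buffered in the root node (pivot 22). Second-level pivots: 10 and 99. Leaves: [2, 3, 4], [10, 20], [22, 25], [99]. No IO is required; all internal nodes usually fit in RAM.
  • 14. Fractal Tree Indexes – “find 25”. Messages buffered in the internal nodes (pivots 22, 10, 99): insert(15), insert(20), insert(25), delete(3). Leaves: [2, 3, 4], [10], [22, 25], [99].
  • 15. Fractal Tree Indexes – “insert 8”. Messages buffered in the internal nodes (pivots 22, 10, 99): insert(15), insert(20), insert(25), delete(3). Leaves: [2, 3, 4], [10], [22, 25], [99]. Buffer is full, push messages down to next level.
  • 16. Fractal Tree Indexes – “insert 8”. Message still buffered in the internal nodes (pivots 22, 10, 99): insert(15). Leaves: [2, 4, 8], [10, 20, 25], [22, 25], [99]. Inserted 8, 20, 25. Deleted 3.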
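The buffering behavior on slides 13 to 16 can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not Tokutek's implementation: it keeps a single buffer at the root (the slides put a buffer in every internal node), flushes every message at once, uses a made-up capacity of four messages, and all names (`makeIndex`, `send`, `flush`, `find`, `leafWrites`) are hypothetical.

```javascript
const BUFFER_CAPACITY = 4; // assumed tiny capacity, for illustration only

// One root buffer over four sorted leaves (leaf contents as on slide 14).
function makeIndex() {
  return {
    buffer: [],            // pending messages: { op, key }
    pivots: [10, 22, 99],  // leaf i holds keys below pivots[i]; the last leaf holds the rest
    leaves: [[2, 3, 4], [10], [22, 25], [99]],
    leafWrites: 0,         // leaves modified so far (a stand-in for leaf IO)
  };
}

function leafFor(idx, key) {
  let i = 0;
  while (i < idx.pivots.length && key >= idx.pivots[i]) i++;
  return idx.leaves[i];
}

// Apply one message to a leaf; returns true if the leaf changed.
function apply(leaf, msg) {
  const pos = leaf.indexOf(msg.key);
  if (msg.op === "insert" && pos === -1) {
    leaf.push(msg.key);
    leaf.sort((a, b) => a - b);
    return true;
  }
  if (msg.op === "delete" && pos !== -1) {
    leaf.splice(pos, 1);
    return true;
  }
  return false;
}

// Push all buffered messages down; each modified leaf is written once.
function flush(idx) {
  const touched = new Set();
  for (const msg of idx.buffer) {
    const leaf = leafFor(idx, msg.key);
    if (apply(leaf, msg)) touched.add(leaf);
  }
  idx.leafWrites += touched.size;
  idx.buffer = [];
}

// Writes only append to the buffer; a full buffer is flushed first.
function send(idx, op, key) {
  if (idx.buffer.length >= BUFFER_CAPACITY) flush(idx);
  idx.buffer.push({ op, key });
}

// Point query: start from the leaf, then overlay pending messages in arrival order.
function find(idx, key) {
  let present = leafFor(idx, key).includes(key);
  for (const msg of idx.buffer) {
    if (msg.key === key) present = msg.op === "insert";
  }
  return present;
}

const demo = makeIndex();
for (const [op, key] of [["insert", 15], ["insert", 20], ["insert", 25], ["delete", 3], ["insert", 8]]) {
  send(demo, op, key);
}
console.log(demo.leaves, demo.buffer, demo.leafWrites);
```

The point of the design is visible in `leafWrites`: the first four messages cost no leaf writes at all, and the flush groups several changes to the same leaf into one write.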
  • 17. Fractal Tree Indexes – compression • Large node size (4MB) leads to high compression ratios. • Supports zlib, quicklz, and lzma compression algorithms. • Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data. • Significantly less disk space needed. • Fewer, bigger writes, both of which are great for SSDs. • Reads are highly compressed: more data per IO.
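The link between node size and compression ratio can be checked with a small Node.js experiment. This is only a rough sketch: the sample documents are invented, it uses Node's built-in `zlib` module rather than anything from Tokutek, and the ratios it prints say nothing about the 5x to 25x figure on the slide. It compresses the same data as independent 8KB blocks and as 4MB blocks and compares the totals.

```javascript
const zlib = require("zlib");

// Hypothetical sample data: 20,000 similar JSON documents.
const docs = [];
for (let i = 0; i < 20000; i++) {
  docs.push(JSON.stringify({
    URI: "http://example.com/page/" + i,
    name: "user" + (i % 100),
    origin: "web",
  }));
}
const data = Buffer.from(docs.join("\n"));

// Total compressed bytes when `buf` is cut into blocks of `blockSize`
// bytes and every block is compressed on its own, as a node would be.
function compressedSize(buf, blockSize) {
  let total = 0;
  for (let off = 0; off < buf.length; off += blockSize) {
    total += zlib.deflateSync(buf.subarray(off, off + blockSize)).length;
  }
  return total;
}

const small = compressedSize(data, 8 * 1024);        // 8KB nodes
const large = compressedSize(data, 4 * 1024 * 1024); // 4MB nodes
console.log("ratio with 8KB blocks:", (data.length / small).toFixed(1));
console.log("ratio with 4MB blocks:", (data.length / large).toFixed(1));
```

Small blocks pay a fixed header cost each time and restart with an empty history, so repeated strings near a block boundary cannot be referenced. Larger blocks avoid both costs.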
  • 18. So what does this have to do with MongoDB?
  • 19. So what does this have to do with MongoDB? * Watch Tyler Brock’s presentation “Indexing and Query Optimization”
  • 20. MongoDB Storage. db.test.insert({foo:55}); db.test.ensureIndex({foo:1}). PK index (_id + pointer): root pivot 25, second-level pivots 10 and 99, leaves [(2,ptr2), (4,ptr4)], [(10,ptr10)], [(25,ptr25), (98,ptr98)], [(101,ptr101)]. Secondary index (foo + pointer): root pivot 85, second-level pivots 40 and 120, leaves [(2,ptr10), (35,ptr101)], [(55,ptr4)], [(90,ptr2)], [(2599,ptr98)]. The “pointer” tells MongoDB where to look in the data files for the actual document data.
  • 21. MongoDB Storage: both indexes are B-trees. PK index: root pivot 25, second-level pivots 10 and 99, leaves [(2,ptr2), (4,ptr4)], [(10,ptr10)], [(25,ptr25), (98,ptr98)], [(101,ptr101)]. Secondary index: root pivot 85, second-level pivots 40 and 120, leaves [(2,ptr10), (35,ptr101)], [(55,ptr4)], [(90,ptr2)], [(2599,ptr98)].
  • 22. Who is Tokutek and what have we done? • Tokutek’s Fractal Tree Index implementations: MySQL Storage Engine (TokuDB), BerkeleyDB API, File System (TokuFS). • Recently added Fractal Tree Indexes to MongoDB 2.2: existing indexes are still supported; source changes are available via our blog at www.tokutek.com/tokuview; this is a work in progress (see roadmap slides).
  • 23. MongoDB and Fractal Tree Indexes: as simple as db.test.ensureIndex({foo:1}, {v:2})
  • 24. Indexing Options #1 db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false}) • Node size, defaults to 4MB.
  • 25. Indexing Options #2 db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false}) • Basement node size, defaults to 128K. • Smallest retrievable unit of a leaf node, for efficient point queries.
  • 26. Indexing Options #3 db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false}) • Compression algorithm, defaults to quicklz. • Supports quicklz, lzma, zlib, and none. • LZMA provides 40% additional compression beyond quicklz but needs more CPU. • Decompression performance of quicklz and lzma is similar.
  • 27. Indexing Options #4 db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false}) • Clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document). • Always “cover” a query, no need to retrieve the document data.
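The difference between a pointer-based secondary index and a clustering index can be sketched with plain JavaScript data structures. This is a toy model and not MongoDB or Tokutek code: the `dataFile` map, the document contents, and the function names are assumptions made for illustration, with the keys and pointers borrowed from the secondary index on slide 20.

```javascript
// Hypothetical "data file": pointer -> document.
const dataFile = new Map([
  ["ptr2",  { _id: 2,  foo: 90, name: "c" }],
  ["ptr4",  { _id: 4,  foo: 55, name: "b" }],
  ["ptr10", { _id: 10, foo: 2,  name: "a" }],
]);

// Build both index flavors, sorted by foo.
const byFoo = [...dataFile.entries()].sort((a, b) => a[1].foo - b[1].foo);
const secondaryIndex  = byFoo.map(([ptr, doc]) => ({ key: doc.foo, ptr })); // foo + pointer
const clusteringIndex = byFoo.map(([, doc]) => ({ key: doc.foo, doc }));    // foo + whole document

// Range query through the pointer-based index: one data-file fetch per match.
function rangeViaSecondary(min) {
  let fetches = 0;
  const docs = secondaryIndex
    .filter((entry) => entry.key >= min)
    .map((entry) => { fetches++; return dataFile.get(entry.ptr); });
  return { docs, fetches };
}

// Range query through the clustering index: the payload is already the document.
function rangeViaClustering(min) {
  const docs = clusteringIndex.filter((entry) => entry.key >= min).map((entry) => entry.doc);
  return { docs, fetches: 0 };
}

console.log(rangeViaSecondary(50).fetches, rangeViaClustering(50).fetches);
```

Both queries return the same documents. The clustering index trades extra index size for the absence of per-document lookups, which is why the slides pair it with compression.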
  • 28. How well does it perform? Three Benchmarks • Benchmark 1 : Raw insertion performance • Benchmark 2 : Insertion plus queries • Benchmark 3 : Covered indexes vs. clustering indexes
  • 29. Benchmarks… Race Results • First Place = John • Second Place = Tim • Third Place = Frank
  • 30. Benchmarks… Race Results • First Place = John • Second Place = Tim • Third Place = Frank. Frank can say the following: “I finished third, but Tim was second to last.”
  • 31. Benchmarks… Race Results • First Place = John • Second Place = Tim • Third Place = Frank. Frank can say the following: “I finished third, but Tim was second to last.” Understand benchmark specifics and review all results.
  • 32. Benchmark 1 : Overview • Measure single-threaded insertion performance. • Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp). • Secondary indexes on URI, name, origin, expiration. • Machine specifics: – Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0 – Ubuntu 10.04 Server (64-bit), ext4 filesystem – MongoDB v2.2.RC0
  • 33. Benchmark 1 : Without Journaling
  • 34. Benchmark 1 : With Journaling
  • 35. Benchmark 1 : Observations • Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x better without journaling. • Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions. • B-tree performance is great until the working data set > RAM.
  • 36. Benchmark 2 : Overview • Measure single-threaded insertion performance while, once every 60 seconds, querying for 1000 documents with a URI greater than or equal to a randomly selected value. • Document is the same as in benchmark 1. • Secondary indexes on URI, name, origin, expiration. • Fractal Tree Index on URI is clustering: – clustering indexes store the entire document inline – compression controls disk usage – no need to get document data from elsewhere – db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true}) • Same hardware as benchmark 1.
  • 37. Benchmark 2 : Insertion Performance
  • 38. Benchmark 2 : Query Latency
  • 39. Benchmark 2 : Observations • Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing. • Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing. • B-tree performance is great until the working data set > RAM. • Random lookups are bad. ...but what about MongoDB’s covered indexes?
  • 40. Benchmark 3 : Overview • Same workload and hardware as benchmark 2. • Create a MongoDB covered index on URI to eliminate lookups in the data files. – db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})
  • 41. Benchmark 3 : Insertion Performance
  • 42. Benchmark 3 : Query Latency
  • 43. Benchmark 3 : Observations • Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing. • Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable). • B-tree performance is great until the working data set > RAM. • MongoDB’s covered indexes can help a lot. – But what happens when I add new fields to my document? o Do I drop and re-create by including my new field? o Do I live without it? – Clustered Fractal Tree Indexes keep on covering your queries!
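The "new fields" question above comes down to a simple membership check, sketched here in JavaScript. The `isCovered` helper, the index descriptors, and the `owner` field are all hypothetical; they model the idea of coverage only and ignore details of MongoDB's real query planner.

```javascript
// A pointer-based index can answer a query by itself only if it holds every
// requested field; a clustering index holds the whole document, so it always can.
function isCovered(index, requestedFields) {
  if (index.clustering) return true;
  return requestedFields.every((field) => index.fields.includes(field));
}

// Field list taken from the benchmark 3 ensureIndex call.
const coveredIndex    = { fields: ["URI", "creation", "name", "origin"], clustering: false };
// Clustering index on URI, as in benchmark 2.
const clusteringIndex = { fields: ["URI"], clustering: true };

// "owner" is a made-up field added to the documents after the index was built.
const before = ["URI", "name", "origin"];
const after  = ["URI", "name", "origin", "owner"];

console.log(isCovered(coveredIndex, before), isCovered(coveredIndex, after), isCovered(clusteringIndex, after));
```

Once a query asks for the new field, the covered index has to be dropped and rebuilt with that field to stay covering, while the clustering index needs no change.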
  • 44. Roadmap : Continuing the Implementation • Optimize Indexing Insert/Update/Delete Operations – Each of our secondary indexes is currently creating and committing a transaction for each operation. – A single transaction envelope will improve performance.
  • 45. Roadmap : Continuing the Implementation • Add Support for Parallel Array Indexes – MongoDB does not support indexing the following two fields: o {a: [1, 2], b: [1, 2]} – “it could get out of hand” – Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826 – Benchmark coming soon…
  • 46. Roadmap : Continuing the Implementation • Add Crash Safety – Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism. – MongoDB journal is separate from Fractal Tree Index logs. – Need to create a transactional envelope around both of them.
  • 47. Roadmap : Continuing the Implementation • Replace MongoDB data store and PK index – A clustering index on _id eliminates the need for two storage systems. – Compression greatly reduces disk footprint. – This is a large task.
  • 48. We are looking for evaluators! Email me at tim@tokutek.com or see me after the presentation.
  • 49. Questions? Tim Callaghan, tim@tokutek.com, @tmcallaghan. More detailed benchmark information in my blogs at www.tokutek.com/tokuview