MongoDB and Fractal Tree Indexes
 


Interested in learning more about MongoDB? Sign up for MongoSV, the largest annual user conference dedicated to MongoDB. Learn more at MongoSV.com

    Presentation Transcript

    • MongoDB and Fractal Tree® Indexes. Tim Callaghan*, VP/Engineering, Tokutek, tim@tokutek.com. MongoDB Boston 2012. (* not [yet] a MongoDB expert)
    • B-trees
    • B-tree Definition: “In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.” http://en.wikipedia.org/wiki/B-tree
    • B-tree Overview: I will use a simple single-pivot example throughout this presentation
    • Basic B-tree: nodes hold pivots and pointers. Internal nodes are the path to the data; leaf nodes hold the actual data.
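    • B-tree example: root pivot 22; internal pivots 10 and 99; leaves [2, 3, 4], [10, 20], [22, 25], [99]. The pivot rule is >=.
    • B-tree - insert: “Insert 15” walks from root pivot 22 to internal pivot 10 and lands in the leaf [10, 20], which becomes [10, 15, 20]. The value is stored in the leaf node.
    • B-tree - search: “Find 25” walks from root pivot 22 to internal pivot 99 and finds 25 in the leaf [22, 25].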
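
The insert and search walks on the example tree can be sketched in a few lines of JavaScript. This is a minimal illustration of the >= pivot rule only, not how MongoDB implements its B-trees: the object layout and the names `makeExampleTree`, `findLeaf`, `search`, and `insert` are assumptions made for this sketch, and node splits are left out.

```javascript
// Example tree from the slides: root pivot 22, internal pivots 10 and 99,
// and four leaves. Layout and names are assumptions made for this sketch.
function makeExampleTree() {
  return {
    pivot: 22,
    left: { pivot: 10, left: { keys: [2, 3, 4] }, right: { keys: [10, 20] } },
    right: { pivot: 99, left: { keys: [22, 25] }, right: { keys: [99] } },
  };
}

// Walk from the root to the leaf that owns `key`.
// Pivot rule is ">=": keys greater than or equal to the pivot go right.
function findLeaf(node, key) {
  while (node.keys === undefined) {
    node = key >= node.pivot ? node.right : node.left;
  }
  return node;
}

// A search is a root-to-leaf walk followed by a lookup inside the leaf.
function search(tree, key) {
  return findLeaf(tree, key).keys.includes(key);
}

// Insert into the owning leaf, keeping its keys sorted (no node splitting).
function insert(tree, key) {
  const leaf = findLeaf(tree, key);
  if (!leaf.keys.includes(key)) {
    leaf.keys.push(key);
    leaf.keys.sort((a, b) => a - b);
  }
}

const tree = makeExampleTree();
insert(tree, 15);              // lands in the leaf that held [10, 20]
console.log(search(tree, 25)); // walks 22 -> 99 -> leaf [22, 25]
```

Every operation touches exactly one leaf, which is why the storage slides that follow care about which leaves are in RAM.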
    • B-tree - storage: performance is IO limited when the tree is bigger than RAM, so try to fit all internal nodes and some leaf nodes in RAM. In the example, the internal nodes (22, 10, 99) and some leaves are in RAM; the remaining leaves are on disk.
    • B-tree – serial insertions: serial insertion workloads are in-memory (think MongoDB’s “_id” index), because new keys keep landing in the same rightmost leaf, which stays in RAM.
    • Fractal Tree Indexes
    • Fractal Tree Indexes: all internal nodes have message buffers.
      • Similar to B-trees: store data in leaf nodes; use PK for ordering
      • Different than B-trees: message buffer in all internal nodes; doesn’t need to update the leaf node immediately; much larger nodes (4MB vs. 8KB*)
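    • Fractal Tree Indexes – “insert 15”: the insert(15) message is placed in the buffer of the root node (pivot 22); internal pivots 10 and 99 and leaves [2, 3, 4], [10, 20], [22, 25], [99] are untouched. No IO is required; all internal nodes usually fit in RAM.
    • Fractal Tree Indexes – “find 25”: the messages insert(15), insert(20), insert(25), and delete(3) are pending in internal-node buffers above the leaves [2, 3, 4], [10], [22, 25], [99].
    • Fractal Tree Indexes – “insert 8”: same tree, with insert(15), insert(20), insert(25), and delete(3) pending. The buffer is full, so push messages down to the next level.
    • Fractal Tree Indexes – “insert 8”, after the push: insert(15) remains buffered at the root; the leaves are now [2, 4, 8], [10, 20, 25], [22, 25], [99]. Inserted 8, 20, 25. Deleted 3.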
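
The buffering behaviour in these four slides can be sketched as follows. This is a deliberately simplified model, not Tokutek's implementation: it assumes a single buffered root sitting directly above the leaves (the slides have buffers at every internal level), a hypothetical buffer capacity of four messages, and a flush policy that drains the whole buffer at once. The class name `BufferedTree` and its methods are invented for the illustration.

```javascript
// Simplified model: one root with a message buffer directly above the leaves.
// Capacity, flush policy, and all names are assumptions made for this sketch.
class BufferedTree {
  constructor(pivots, leaves, bufferCapacity) {
    this.pivots = pivots;           // sorted pivot keys, ">=" rule
    this.leaves = leaves;           // pivots.length + 1 sorted arrays
    this.capacity = bufferCapacity; // max pending messages before a flush
    this.buffer = [];               // pending { op, key } messages, oldest first
    this.leafWrites = 0;            // leaves rewritten by flushes so far
  }

  // Index of the leaf that owns `key`.
  leafIndex(key) {
    let i = 0;
    while (i < this.pivots.length && key >= this.pivots[i]) i++;
    return i;
  }

  // Queue a message; only a full buffer forces work on the leaves.
  send(op, key) {
    if (this.buffer.length === this.capacity) this.flush();
    this.buffer.push({ op, key });
  }

  // Apply all pending messages in arrival order, counting each leaf once.
  flush() {
    const touched = new Set();
    for (const { op, key } of this.buffer) {
      const i = this.leafIndex(key);
      const leaf = this.leaves[i];
      const pos = leaf.indexOf(key);
      if (op === "insert" && pos === -1) {
        leaf.push(key);
        leaf.sort((a, b) => a - b);
      } else if (op === "delete" && pos !== -1) {
        leaf.splice(pos, 1);
      }
      touched.add(i);
    }
    this.leafWrites += touched.size;
    this.buffer = [];
  }

  // A read must consult pending messages first; the newest message wins.
  find(key) {
    for (let j = this.buffer.length - 1; j >= 0; j--) {
      if (this.buffer[j].key === key) return this.buffer[j].op === "insert";
    }
    return this.leaves[this.leafIndex(key)].includes(key);
  }
}

const ft = new BufferedTree([10, 22, 99], [[2, 3, 4], [10], [22], [99]], 4);
ft.send("insert", 15);
ft.send("insert", 20);
ft.send("insert", 25);
ft.send("delete", 3);  // buffer now full, but no leaf has been touched yet
ft.send("insert", 8);  // triggers one flush that applies all four messages
```

In this model four logical changes cost a single batch of leaf writes, which is the amortization the slides illustrate. The leaf contents after the flush differ slightly from the slide because of the simplified flush policy.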
    • Fractal Tree Indexes – compression
      • Large node size (4MB) leads to high compression ratios.
      • Supports zlib, quicklz, and lzma compression algorithms.
      • Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data.
      • Significantly less disk space needed
      • Fewer writes, bigger writes
      • Both of which are great for SSDs
      • Reads are highly compressed, more data per IO
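
The effect of block size on compression can be checked with a small experiment. The sketch below uses Node.js's built-in `zlib` module as a stand-in (it is not the Fractal Tree code path), and the document shape produced by `makeData` is invented for the test. It compresses the same bytes once in independent 8KB blocks and once in blocks of up to 4MB, then compares the totals.

```javascript
const zlib = require("zlib");

// Build deterministic, document-like data: field names repeat, values vary.
// The document shape is an assumption made for this experiment.
function makeData(docCount) {
  const parts = [];
  for (let i = 0; i < docCount; i++) {
    parts.push(JSON.stringify({
      uri: "http://example.com/page/" + i,
      name: "user" + (i % 97),
      origin: "crawler",
    }));
  }
  return Buffer.from(parts.join("\n"));
}

// Compress `data` in independent blocks of `blockSize` bytes and
// return the total number of compressed bytes.
function compressedSize(data, blockSize) {
  let total = 0;
  for (let off = 0; off < data.length; off += blockSize) {
    total += zlib.deflateSync(data.subarray(off, off + blockSize)).length;
  }
  return total;
}

const data = makeData(20000);
const small = compressedSize(data, 8 * 1024);        // B-tree-sized blocks
const large = compressedSize(data, 4 * 1024 * 1024); // Fractal-Tree-sized blocks
console.log({ raw: data.length, small, large });
```

Each independent block has to relearn the repeated field names and pays its own header overhead, so the small-block total comes out larger. The exact ratio depends heavily on the data and the algorithm.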
    • So what does this have to do with MongoDB? (* Watch Tyler Brock’s presentation “Indexing and Query Optimization”)
    • MongoDB Storage: db.test.insert({foo:55}) followed by db.test.ensureIndex({foo:1}) gives two indexes.
      • PK index (_id + pointer): root pivot 25; internal pivots 10 and 99; leaves [(2,ptr2), (4,ptr4)], [(10,ptr10)], [(25,ptr25), (98,ptr98)], [(101,ptr101)]
      • Secondary index (foo + pointer): root pivot 85; internal pivots 40 and 120; leaves [(2,ptr10), (35,ptr101)], [(55,ptr4)], [(90,ptr2)], [(2599,ptr98)]
      • The “pointer” tells MongoDB where to look in the data files for the actual document data.
    • MongoDB Storage: both the PK index and the secondary index shown on the previous slide are B-trees.
    • Who is Tokutek and what have we done?
      • Tokutek’s Fractal Tree Index implementations
        • MySQL storage engine (TokuDB)
        • BerkeleyDB API
        • File system (TokuFS)
      • Recently added Fractal Tree Indexes to MongoDB 2.2
        • Existing indexes are still supported
        • Source changes are available via our blog at www.tokutek.com/tokuview
        • This is a work in progress (see roadmap slides)
    • MongoDB and Fractal Tree Indexes: as simple as db.test.ensureIndex({foo:1}, {v:2})
    • Indexing Options #1: db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
      • blocksize: node size, defaults to 4MB.
    • Indexing Options #2: db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
      • basementsize: basement node size, defaults to 128K.
      • A basement node is the smallest retrievable unit of a leaf node, which keeps point queries efficient.
    • Indexing Options #3: db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
      • compression: compression algorithm, defaults to quicklz.
      • Supports quicklz, lzma, zlib, and none.
      • LZMA provides 40% additional compression beyond quicklz but needs more CPU.
      • Decompression performance of quicklz and lzma is similar.
    • Indexing Options #4: db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
      • clustering: clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document).
      • They always “cover” a query, so there is no need to retrieve the document data.
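
The difference between a pointer-based secondary index and a clustering index can be sketched with plain in-memory maps. Everything here is a hypothetical stand-in: `dataFile` plays the role of MongoDB's data files, the two `Map` objects play a secondary index on `foo`, and `dataFileReads` counts the extra lookups that would typically be random IO on disk.

```javascript
// Hypothetical stand-in for the data files, keyed by "pointer".
const dataFile = new Map([
  ["ptr1", { _id: 1, foo: 55, bar: "x" }],
  ["ptr2", { _id: 2, foo: 90, bar: "y" }],
]);

// Non-clustering index: key -> pointer into the data file.
const pointerIndex = new Map();
// Clustering index: key -> full copy of the document.
const clusteringIndex = new Map();
for (const [ptr, doc] of dataFile) {
  pointerIndex.set(doc.foo, ptr);
  clusteringIndex.set(doc.foo, { ...doc });
}

let dataFileReads = 0; // extra lookups outside the index

// Two steps: index lookup, then a second lookup in the data file.
function findViaPointer(foo) {
  const ptr = pointerIndex.get(foo);
  if (ptr === undefined) return null;
  dataFileReads++;
  return dataFile.get(ptr);
}

// One step: the document is the index payload.
function findViaClustering(foo) {
  return clusteringIndex.get(foo) || null;
}

console.log(findViaClustering(55), dataFileReads); // no data file read needed
console.log(findViaPointer(90), dataFileReads);    // one data file read
```

The clustering index trades extra storage (a full document copy per index) for the saved lookup, which is where the compression described earlier matters.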
    • How well does it perform? Three benchmarks
      • Benchmark 1 : Raw insertion performance
      • Benchmark 2 : Insertion plus queries
      • Benchmark 3 : Covered indexes vs. clustering indexes
    • Benchmarks… Race results
      • First place = John
      • Second place = Tim
      • Third place = Frank
      • Frank can say the following: “I finished third, but Tim was second to last.”
      • Understand benchmark specifics and review all results.
    • Benchmark 1 : Overview
      • Measure single-threaded insertion performance
      • Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp)
      • Secondary indexes on URI, name, origin, expiration
      • Machine specifics:
        • Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
        • Ubuntu 10.04 Server (64-bit), ext4 filesystem
        • MongoDB v2.2.RC0
    • Benchmark 1 : Without Journaling
    • Benchmark 1 : With Journaling
    • Benchmark 1 : Observations
      • Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x better without journaling
      • Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions
      • B-tree performance is great until the working data set exceeds RAM
    • Benchmark 2 : Overview
      • Measure single-threaded insertion performance while, once every 60 seconds, querying for 1000 documents with a URI greater than or equal to a randomly selected value
      • Document is the same as in benchmark 1
      • Secondary indexes on URI, name, origin, expiration
      • Fractal Tree Index on URI is clustering
        • Clustering indexes store the entire document inline
        • Compression controls disk usage
        • No need to get document data from elsewhere
        • db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})
      • Same hardware as benchmark 1
    • Benchmark 2 : Insertion Performance
    • Benchmark 2 : Query Latency
    • Benchmark 2 : Observations
      • Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing
      • Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing
      • B-tree performance is great until the working data set exceeds RAM
      • Random lookups are bad
      • ...but what about MongoDB’s covered indexes?
    • Benchmark 3 : Overview
      • Same workload and hardware as benchmark 2
      • Create a MongoDB covered index on URI to eliminate lookups in the data files.
        • db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})
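
Whether a query is answered from a compound index alone can be reasoned about with a small check. The helper below is a hypothetical sketch written for this transcript, not a MongoDB API: `isCovered` applies the usual rule that every queried field and every returned field must be in the index, and that `_id` is returned unless the projection excludes it.

```javascript
// Hypothetical helper: decide whether an index "covers" a query.
// indexFields: field names in the index, e.g. ["URI", "creation"]
// queryFields: field names used in the query predicate
// projection:  mongo-style projection, e.g. { URI: 1, _id: 0 }
function isCovered(indexFields, queryFields, projection) {
  const inIndex = (f) => indexFields.includes(f);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // No explicit inclusions means the whole document is returned.
  if (returned.length === 0) return false;
  // _id comes back unless it is explicitly excluded.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return queryFields.every(inIndex) && returned.every(inIndex);
}

const index = ["URI", "creation", "name", "origin"];

// Only indexed fields are queried and returned: covered.
console.log(isCovered(index, ["URI"], { URI: 1, creation: 1, name: 1, origin: 1, _id: 0 }));

// Asking for a field outside the index (expiration) forces a document fetch.
console.log(isCovered(index, ["URI"], { URI: 1, expiration: 1, _id: 0 }));
```

The second call shows the fragility of this approach: any field that is not part of the index sends the query back to the data files.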
    • Benchmark 3 : Insertion Performance
    • Benchmark 3 : Query Latency
    • Benchmark 3 : Observations
      • Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing
      • Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable)
      • B-tree performance is great until the working data set exceeds RAM
      • MongoDB’s covered indexes can help a lot
        • But what happens when I add new fields to my document?
          • Do I drop and re-create the index to include my new field?
          • Do I live without it?
        • Clustered Fractal Tree Indexes keep on covering your queries!
    • Roadmap : Continuing the Implementation
      • Optimize indexing insert/update/delete operations
        • Each of our secondary indexes is currently creating and committing a transaction for each operation
        • A single transaction envelope will improve performance
    • Roadmap : Continuing the Implementation
      • Add support for parallel array indexes
        • MongoDB does not support indexing the following two fields together: {a: [1, 2], b: [1, 2]}
        • “it could get out of hand”
        • Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826
        • Benchmark coming soon…
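
Why parallel arrays “could get out of hand” is easy to see by counting entries. The sketch below assumes that a compound index over two array fields would need one entry per combination of array elements. That is an assumption about how such an index would be built, not a description of MongoDB or Tokutek internals, and `compoundIndexKeys` is a name invented here.

```javascript
// Assumed model: one index entry per combination of values across the
// indexed fields (a Cartesian product). Scalars count as a single value.
function compoundIndexKeys(doc, fields) {
  let keys = [[]];
  for (const f of fields) {
    const values = Array.isArray(doc[f]) ? doc[f] : [doc[f]];
    const next = [];
    for (const k of keys) {
      for (const v of values) next.push([...k, v]);
    }
    keys = next;
  }
  return keys;
}

// The document from the slide: 2 x 2 = 4 index entries.
console.log(compoundIndexKeys({ a: [1, 2], b: [1, 2] }, ["a", "b"]));

// Two 100-element arrays in one document: 100 x 100 entries.
const big = { a: Array.from({ length: 100 }, (_, i) => i), b: Array.from({ length: 100 }, (_, i) => i) };
console.log(compoundIndexKeys(big, ["a", "b"]).length);
```

Under this model the entry count is the product of the array lengths, so a single document write can fan out into a very large number of index insertions.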
    • Roadmap : Continuing the Implementation
      • Add crash safety
        • Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism.
        • MongoDB journal is separate from Fractal Tree Index logs.
        • Need to create a transactional envelope around both of them
    • Roadmap : Continuing the Implementation
      • Replace MongoDB data store and PK index
        • A clustering index on _id eliminates the need for two storage systems
        • Compression greatly reduces disk footprint
        • This is a large task
    • We are looking for evaluators! Email me at tim@tokutek.com or see me after the presentation.
    • Questions? Tim Callaghan, tim@tokutek.com, @tmcallaghan. More detailed benchmark information is in my blogs at www.tokutek.com/tokuview