Fractal Tree Indexes
Theory to Practice
Percona Live London 2013
Tim Callaghan, Tokutek
tim@tokutek.com
@tmcallaghan

Tuesday, November 12, 13
Ever seen this?

[Figure: IO utilization graph; performance is IO limited]

Who is Tokutek?

Tokutek builds high-performance database software!
• TokuDB - storage engine for MySQL and MariaDB
• TokuMX - storage engine for MongoDB

[Figure: software stack: Developer Interface, Storage Engine, HDD & SSD storage]

Who am I?

• 17-year database consumer
– schema design, development, deployment
– database administration + infrastructure
– mostly Oracle
• 5-year database producer
– 2 years @ VoltDB
– 2+ years @ Tokutek

Housekeeping

• Feedback is important to me
• Ideas for Webinars or Presentations?
• Who’s using MongoDB?
• Anyone using TokuDB or TokuMX?
• Please ask questions

Agenda

• Why Fractal Tree indexes are cool
• What they enable in MySQL® (TokuDB)
• What they enable in MongoDB® (TokuMX)
• Q+A

Indexing:
B-trees and
Fractal Tree Indexes

B-trees

B-tree Overview - vocabulary

• Internal nodes: pivots and pointers, the path to the data
• Leaf nodes: the actual data, sorted

B-tree Overview - example
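[Figure: example B-tree]
• Root pivot: 22
• Left internal node: pivot 10, with leaves (2, 3, 4) and (10, 20)
• Right internal node: pivot 99, with leaves (22, 25) and (99)
* Pivot Rule is >= (a key equal to a pivot goes to the pivot's right child)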

B-tree Overview - search
“Find 25”
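[Figure: the example tree; the search for 25 goes right at the root (25 >= 22), left at node 99 (25 < 99), and finds 25 in leaf (22, 25)]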
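The search path above can be sketched in Python. This is a minimal, hypothetical in-memory model (the `Node` class and `search` function are illustrative names, not TokuDB code): internal nodes hold sorted pivots and one more child than pivots, and the slide's pivot rule (>=) sends a key equal to a pivot to that pivot's right child, which is exactly what `bisect_right` computes.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy B-tree node: internal nodes have pivots + children, leaves have keys."""
    pivots: list = field(default_factory=list)
    children: list = field(default_factory=list)
    keys: list = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: Node, key):
    """Return (found, path), where path lists the pivots/keys of each visited node."""
    node, path = root, []
    while not node.is_leaf:
        path.append(node.pivots)
        # Pivot rule ">=": bisect_right sends a key equal to a pivot to its right child.
        node = node.children[bisect_right(node.pivots, key)]
    path.append(node.keys)
    return key in node.keys, path


# The example tree from the slides: root 22, internal nodes 10 and 99.
example = Node(pivots=[22], children=[
    Node(pivots=[10], children=[Node(keys=[2, 3, 4]), Node(keys=[10, 20])]),
    Node(pivots=[99], children=[Node(keys=[22, 25]), Node(keys=[99])]),
])

print(search(example, 25))  # (True, [[22], [99], [22, 25]])
```

Each loop iteration corresponds to one node visit; once data > RAM, the final leaf visit is the one that typically costs a disk IO.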

B-tree Overview - insert
“Insert 15”
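[Figure: the example tree after the insert; 15 goes left at the root (15 < 22), right at node 10 (15 >= 10), and is added to leaf (10, 15, 20)]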
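A minimal sketch of the same insert, assuming a hypothetical compact layout where an internal node is a `(pivots, children)` tuple and a leaf is a sorted list. The leaf-capacity check only marks where a real B-tree would split a full leaf; splitting is deliberately left out.

```python
from bisect import bisect_right, insort

# Hypothetical compact layout of the example tree: an internal node is a
# (pivots, children) tuple and a leaf is a plain sorted list of keys.
tree = ([22], [
    ([10], [[2, 3, 4], [10, 20]]),
    ([99], [[22, 25], [99]]),
])


def find_leaf(node, key):
    """Descend with the '>=' pivot rule until a leaf list is reached."""
    while isinstance(node, tuple):
        pivots, children = node
        node = children[bisect_right(pivots, key)]
    return node


def insert(root, key, leaf_capacity=4):
    leaf = find_leaf(root, key)
    # In a real B-tree this leaf usually lives on disk once data > RAM, so this
    # step is the read/modify/write IO; a full leaf would be split (not shown).
    if len(leaf) >= leaf_capacity:
        raise NotImplementedError("leaf splits are omitted from this sketch")
    insort(leaf, key)  # keep the leaf sorted
    return leaf


print(insert(tree, 15))  # [10, 15, 20]
```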

B-tree Overview - performance
Performance is IO limited when data > RAM: one IO is needed for each insert/update (actually it’s one IO for every index on the table).
[Figure: the internal nodes (22, 10, 99) fit in RAM; the leaf nodes (2,3,4), (10,20), (22,25), (99) live on DISK]
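The slide's claim can be turned into back-of-the-envelope arithmetic. This sketch assumes the worst case where every leaf touched by an insert is a cache miss, and the 4-index table and roughly 200 random IOPS for a spinning disk are illustrative assumptions, not measurements.

```python
def btree_insert_ios(rows: int, indexes: int) -> int:
    """Worst case when data >> RAM: one random leaf IO per index per inserted row."""
    return rows * indexes


def max_inserts_per_sec(random_iops: float, indexes: int) -> float:
    """Insert-rate ceiling implied by the storage device's random IOPS."""
    return random_iops / indexes


# Assumptions: a primary key plus 3 secondary indexes (4 indexes in total) and a
# spinning disk delivering roughly 200 random IOPS.
print(btree_insert_ios(1_000_000, 4))  # 4000000
print(max_inserts_per_sec(200, 4))     # 50.0
```

Adding an index raises the IO cost of every insert, which is why secondary indexes hurt most once the table outgrows RAM.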

Fractal Tree Indexes

Fractal Tree Indexes
[Figure: a tree in which every internal node has a message buffer]
• All internal nodes have message buffers
• As buffers overflow, they cascade down the tree
• Messages are eventually applied to leaf nodes

Similar to B-trees
• store data in leaf nodes
• use index key for ordering

Different from B-trees
• message buffers
• big nodes (4MB vs. ~16KB)
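A minimal sketch of the buffering idea, assuming a single internal node whose children are leaves (a real Fractal Tree cascades through many levels). `ToyFractalNode` and the buffer capacity of 99 messages are hypothetical; the point is that each overflow moves a whole batch of messages to the leaf at once, and the sketch counts each such flush as one IO.

```python
from bisect import bisect_right


class ToyFractalNode:
    """One internal node whose children are leaves (hypothetical, in memory).

    Each child has its own message buffer. Inserts land in the buffer, and only
    when a buffer overflows are its messages flushed to the child leaf in one
    batch; each flush is what this sketch counts as one IO.
    """

    def __init__(self, pivots, buffer_capacity):
        self.pivots = pivots
        self.leaves = [[] for _ in range(len(pivots) + 1)]
        self.buffers = [[] for _ in range(len(pivots) + 1)]
        self.buffer_capacity = buffer_capacity
        self.flush_ios = 0

    def insert(self, key):
        child = bisect_right(self.pivots, key)
        self.buffers[child].append(("insert", key))
        if len(self.buffers[child]) > self.buffer_capacity:
            self.flush(child)

    def flush(self, child):
        # Apply the whole batch to the leaf at once instead of one IO per insert.
        keys = [k for _, k in self.buffers[child]]
        self.leaves[child] = sorted(self.leaves[child] + keys)
        self.buffers[child].clear()
        self.flush_ios += 1


node = ToyFractalNode(pivots=[500], buffer_capacity=99)
for k in range(1000):
    node.insert(k)
print(node.flush_ios)  # 10 flushes for 1000 inserts
```

Under the worst-case B-tree model sketched for the previous slide, the same 1000 inserts could cost up to 1000 leaf IOs; here the cost is divided by the batch size.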

Fractal Tree Indexes - sample data
[Figure: sample Fractal Tree index with root 25, internal nodes 10 and 99, and leaves (2,3,4), (10,20), (22,25), (99)]

Looks a lot like a B-tree!

Fractal Tree Indexes - insert
insert 15;
[Figure: the sample tree with the message insert(15) held in a message buffer; the leaves (2,3,4), (10,20), (22,25), (99) are not yet changed]

• Search operations must consider messages along the way
• Messages cascade down the tree as buffers fill up
• They are eventually applied to the leaf nodes, hundreds or thousands of operations for a single IO
• CPU and cache are conserved as important data is not ejected
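To show why a search "must consider messages along the way", here is a minimal sketch of a point query over a one-level toy tree. The layout (`pivots`, per-child `buffers` of `(op, key)` tuples, and leaf lists) is hypothetical: the leaf alone can give the wrong answer, so pending messages for that child are replayed in order and the newest one wins.

```python
from bisect import bisect_right


def point_query(pivots, buffers, leaves, key):
    """Look up `key` in a one-level toy Fractal Tree (hypothetical layout).

    `buffers[i]` holds pending messages for child i, oldest first, as
    ("insert", key) or ("delete", key) tuples that have not reached the leaf yet.
    The answer must combine the leaf contents with those pending messages.
    """
    child = bisect_right(pivots, key)
    present = key in leaves[child]
    for op, k in buffers[child]:  # replay in order; the newest message wins
        if k == key:
            present = op == "insert"
    return present


pivots = [25]
leaves = [[2, 3, 4, 10, 20], [25, 99]]
buffers = [[("insert", 15)], [("delete", 99)]]

print(point_query(pivots, buffers, leaves, 15))  # True: only in a buffer
print(point_query(pivots, buffers, leaves, 99))  # False: deleted by a message
```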

Fractal Tree Indexes - other operations
[Figure: message buffers holding a mix of operations: delete(8), delete(2), insert(8), add_column(c4 bigint), delete(99), increment(22,+5), insert(100), ...]

Lots of operations can be messages!
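A minimal sketch of applying several message types when a buffer is finally flushed into a leaf. The message tuples and the `apply_messages` helper are hypothetical; the point is that none of these operations needs to read the leaf when it is issued (an increment is a blind update), and all the work happens at flush time.

```python
def apply_messages(leaf: dict, messages: list) -> dict:
    """Apply buffered messages to a leaf (a dict of key -> value), oldest first.

    Message formats are hypothetical: ("insert", key, value), ("delete", key),
    ("increment", key, delta). None of them needs the leaf to be read when the
    message is created; the work happens here, when the buffer is flushed.
    """
    out = dict(leaf)
    for msg in messages:
        op, key = msg[0], msg[1]
        if op == "insert":
            out[key] = msg[2]
        elif op == "delete":
            out.pop(key, None)  # deleting a missing key is a no-op
        elif op == "increment":
            # Assumption: incrementing a missing key starts from 0.
            out[key] = out.get(key, 0) + msg[2]
        else:
            raise ValueError(f"unknown message type: {op}")
    return out


leaf = {22: 10, 25: 1, 99: 7}
msgs = [("delete", 99), ("increment", 22, 5), ("insert", 100, 3), ("delete", 8)]
print(apply_messages(leaf, msgs))  # {22: 15, 25: 1, 100: 3}
```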

TokuDB
Fractal Tree Indexing +
MySQL/MariaDB

What is TokuDB?

• Transactional MySQL Storage Engine - think InnoDB
• Available for MySQL 5.5 and MariaDB 5.5
• ACID and MVCC
• Free/OSS Community Edition
– http://github.com/Tokutek/ft-engine
• Enterprise Edition
– Commercial support + hot backup

Performance + Compression + Agility

TokuDB Performance
Warning - Benchmarks Ahead!

Indexed Insertion Performance
• High-performance insert/update/delete for large databases (> RAM) while maintaining indexes

* old numbers, now > 25K inserts/sec

Sysbench Performance
Sysbench read/write workload, > RAM

The fastest IO is the one you never have to do (compression)

Performance Advantages
• Efficient index maintenance, especially secondary indexes
• Clustered secondary indexes
– Additional copy of the row is stored in the index
– No additional IO to get row data from primary key
– Think better covering index (all non-indexed columns)
– Compression eliminates size concerns
• Big blocks = sequential IO for range scans
– Basement nodes are always co-located
• Multi-threaded bulk loader
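The clustered secondary index trade-off can be sketched with sorted Python lists standing in for indexes. The table, its columns, and the lookup counter are hypothetical: a regular secondary index returns primary keys, so each match needs a second lookup into the primary index (an IO when not cached), while a clustered secondary index stores a copy of the row in the index entry itself.

```python
from bisect import bisect_left

# Hypothetical rows: primary key -> (city, name, age)
primary = {1: ("Boston", "ann", 31), 2: ("Austin", "bob", 45), 3: ("Boston", "cy", 28)}

# Regular secondary index: sorted (city, pk) entries; row data needs a PK lookup.
secondary = sorted((row[0], pk) for pk, row in primary.items())

# Clustered secondary index: sorted (city, pk, full row copy); no PK lookup needed.
clustered = sorted((row[0], pk, row) for pk, row in primary.items())


def rows_by_city_secondary(city):
    """Range scan the secondary index, then fetch each row from the primary."""
    out, lookups = [], 0
    i = bisect_left(secondary, (city,))
    while i < len(secondary) and secondary[i][0] == city:
        out.append(primary[secondary[i][1]])  # extra lookup (an IO if not cached)
        lookups += 1
        i += 1
    return out, lookups


def rows_by_city_clustered(city):
    """Range scan the clustered index; the row is already in each entry."""
    out = []
    i = bisect_left(clustered, (city,))
    while i < len(clustered) and clustered[i][0] == city:
        out.append(clustered[i][2])
        i += 1
    return out, 0


print(rows_by_city_clustered("Boston"))
```

The price is storing each row twice, which is the size concern the slide says compression offsets.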

TokuDB Compression

Compression: TokuDB vs. InnoDB

• InnoDB compression failures (pages that do not compress to the target size) force page splits, which greatly reduce performance
– MySQL 5.6 adds “dynamic padding” (from Facebook), at the cost of less effective cache
• Larger block size and flexible on-disk size win!
• Multiple compression algorithms (lzma, quicklz, zlib)
• Larger, less frequent writes (much less IO)
• Why it matters on spinning disks:
– Compressed reads and amortized compressed writes overcome IO limitations
• Why it matters on flash/SSD:
– Buy less: 250GB at 10x compression stores as much as 2.5TB
– Large/less frequent writes are flash friendly
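The capacity arithmetic and the size-vs-speed choice between algorithms can be tried with Python's standard `zlib` and `lzma` modules. QuickLZ is not in the standard library, so this sketch compares only zlib and lzma; the log lines are synthetic and highly repetitive, so real ratios will depend heavily on your data.

```python
import lzma
import zlib


def effective_capacity_gb(physical_gb: float, ratio: float) -> float:
    """Logical data that fits on a device at a given compression ratio."""
    return physical_gb * ratio


def compression_ratio(data: bytes, compress) -> float:
    return len(data) / len(compress(data))


# Hypothetical, highly repetitive log lines (log data compresses very well).
logs = "".join(
    f"2013-11-12 10:{i % 60:02d}:00 INFO GET /api/items/{i % 50} status=200\n"
    for i in range(5000)
).encode()

print(effective_capacity_gb(250, 10))  # 2500 GB, i.e. 2.5TB of data on 250GB
for name, fn in [("zlib", zlib.compress), ("lzma", lzma.compress)]:
    print(name, round(compression_ratio(logs, fn), 1))
```

lzma generally trades more CPU for a smaller result than zlib, which is the size vs. speed choice the algorithm list refers to.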

Compression + IO Reduction

• Server was at 90% IO utilization with InnoDB, 10% IO utilization with TokuDB

Compression Performance
• iiBench benchmark

Compression Achieved
• log data (extremely compressible)

TokuDB Agility

The Challenge of MySQL Schema Changes
• Common schema changes can take hours in MySQL
– Adding, dropping, or expanding a column
– Adding an index
• And the table is unavailable for writes during the process
• As a workaround, people generally
– Use a replication slave, then swap with master
– Use helper tools: Percona OSC, MySQL 5.6 online DDL
o These have IO, CPU, RAM consequences

Schema Changes Without Downtime
• In TokuDB, column add/drop/expand is instantaneous
– “it’s just a message”
• Indexes can be created in the background while the table is fully available
– TokuDB just builds the index; it does not rebuild the table (MySQL is getting better)
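A minimal sketch of why adding a column can be instantaneous, assuming a hypothetical `ToyTable`: `add_column` only records the change (constant time, no row rewritten), and rows written under the old schema are upgraded lazily the next time they are read. This mimics the observable effect described on the slide, not TokuDB's implementation; dropping and expanding columns are not modelled.

```python
class ToyTable:
    """Hypothetical table where ADD COLUMN does not rewrite existing rows."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}     # pk -> tuple of stored values
        self.pending = []  # recorded schema changes: (column, default), oldest first

    def insert(self, pk, *values):
        # New rows are written in the current, full schema.
        self.rows[pk] = tuple(values)

    def add_column(self, name, default=None):
        # O(1): no row is touched; the change is just remembered.
        self.columns.append(name)
        self.pending.append((name, default))

    def get(self, pk):
        row = self.rows[pk]
        missing = len(self.columns) - len(row)
        if missing:
            # Upgrade an old-format row on read by appending the defaults
            # of the columns added after it was written.
            row = row + tuple(d for _, d in self.pending[-missing:])
            self.rows[pk] = row
        return dict(zip(self.columns, row))


t = ToyTable(["c1", "c2", "c3"])
t.insert(1, 10, 20, 30)
t.add_column("c4", 0)  # instantaneous; row 1 is not rewritten yet
print(t.rows[1])       # (10, 20, 30)
print(t.get(1))        # {'c1': 10, 'c2': 20, 'c3': 30, 'c4': 0}
```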

TokuMX
Fractal Tree Indexing +
MongoDB

What is TokuMX?

• TokuMX = MongoDB with improved storage (Fractal Tree indexes)
• Drop-in replacement for MongoDB v2.2 applications
– Including replication and sharding
– Same data model
– Same query language
– Drivers just work
• Open Source
– http://github.com/Tokutek/mongo

Performance + Compression + Transactions

MongoDB Storage
db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

[Figure: documents live in a memory-mapped heap; the PK index stores (_id, pointer) entries such as (1,ptr5), (4,ptr1), (12,ptr8), (19,ptr7), (10000,ptr2); the secondary index stores (foo, pointer) entries such as (2,ptr5), (22,ptr6), (50,ptr4), (100,ptr7), (222,ptr3)]

The “pointer” tells MongoDB where to look in the heap for the requested document (another IO)

TokuMX Storage
db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

[Figure: the memory-mapped heap is no longer used; the PK index stores (_id, document) entries such as (1,doc), (4,doc), (12,doc), (19,doc), (10000,doc); the secondary index stores (foo, _id) entries such as (2,4), (22,12), (50,19), (100,10000), (222,1)]

One less IO per _id lookup, document is clustered in the index
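The difference between the two layouts can be sketched by counting the structures an `_id` lookup touches; once data > RAM, each one can cost an IO. The documents, pointer names, and dict-based "indexes" are hypothetical stand-ins for the figures, not MongoDB or TokuMX internals.

```python
# Hypothetical documents modelled loosely on the slides: _id -> document.
docs = {1: {"_id": 1, "foo": 222}, 4: {"_id": 4, "foo": 2}, 12: {"_id": 12, "foo": 22}}

# MongoDB 2.x style: the _id index stores pointers into a separate heap.
heap = {f"ptr{i}": doc for i, doc in enumerate(docs.values())}
mongo_id_index = {doc["_id"]: ptr for ptr, doc in heap.items()}

# TokuMX style: the _id index entry holds the document itself (clustered).
tokumx_id_index = dict(docs)


def mongo_find_by_id(_id):
    """Index lookup, then follow the pointer into the heap: 2 structures touched."""
    ptr = mongo_id_index[_id]
    return heap[ptr], 2


def tokumx_find_by_id(_id):
    """The document is stored in the _id index: 1 structure touched."""
    return tokumx_id_index[_id], 1


print(mongo_find_by_id(4))   # ({'_id': 4, 'foo': 2}, 2)
print(tokumx_find_by_id(4))  # ({'_id': 4, 'foo': 1}, 1) if foo were 1; here foo is 2
```

In this layout a lookup through the TokuMX-style secondary index (foo + _id) still needs a second lookup in the _id index, unless the secondary index is clustered.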

TokuMX Performance

Performance - Indexed Insertion
• 100 million inserts into a collection with 3 secondary indexes

Performance - Inserts on Indexed Arrays
• Indexed insertion: multikey (100 index inserts per document)

Performance - Replication
• TokuMX replication allows secondary servers to process replication without IO
– Simply injecting messages into the Fractal Tree Indexes on the secondary server
– The “Hard Work” was done on the primary
o Uniqueness checking
o Transactional locking
o Update effort (read-before-write)
– Elimination of replication lag
• Your secondaries are fully available for read scaling!
– Wasn’t that the point?
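A minimal sketch of where the "hard work" happens, assuming hypothetical `Primary` and `Secondary` classes and a simplified oplog of `("put", key, value)` entries (not TokuMX's actual oplog format). The primary performs the reads (uniqueness check, read-before-write) and ships only results; the secondary applies them as blind writes without reading anything first.

```python
class Primary:
    """Hypothetical primary: does the reads (uniqueness checks, read-before-write)
    and records each result as a blind 'put' message in an oplog."""

    def __init__(self):
        self.data = {}
        self.oplog = []
        self.reads = 0

    def insert_unique(self, key, value):
        self.reads += 1  # uniqueness check must read the index
        if key in self.data:
            raise KeyError(f"duplicate key: {key}")
        self.data[key] = value
        self.oplog.append(("put", key, value))

    def update(self, key, fn):
        self.reads += 1  # read-before-write to compute the new value
        new_value = fn(self.data[key])
        self.data[key] = new_value
        self.oplog.append(("put", key, new_value))  # ship the result, not fn


class Secondary:
    """Applies shipped messages without reading anything first."""

    def __init__(self):
        self.data = {}

    def apply(self, oplog):
        for op, key, value in oplog:
            if op != "put":
                raise ValueError(f"unexpected op: {op}")
            self.data[key] = value  # blind write: no lookup needed


primary, secondary = Primary(), Secondary()
primary.insert_unique("a", 1)
primary.insert_unique("b", 2)
primary.update("a", lambda v: v + 10)
secondary.apply(primary.oplog)
print(primary.reads, secondary.data)  # 3 {'a': 11, 'b': 2}
```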

Performance - Lock Refinement
• TokuMX performs locking at the document level
– Extreme concurrency!
[Figure: lock granularity: MongoDB v2.0 locks the whole instance; MongoDB v2.2 locks per database; TokuMX locks individual documents within each collection]
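Lock granularity can be sketched with a lock table keyed at different levels. `LockManager` is hypothetical and uses exclusive locks only (real systems also have shared/read modes); it shows that with per-document keys two writers touching different documents never conflict, while a per-database key makes them collide.

```python
import threading
from collections import defaultdict


class LockManager:
    """Hypothetical lock table keyed at a chosen granularity.

    granularity="database" uses one lock per database (MongoDB v2.2 style);
    granularity="document" uses one lock per (db, collection, _id), like the
    document-level locking described for TokuMX.
    """

    def __init__(self, granularity):
        self.granularity = granularity
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself

    def _key(self, db, coll, _id):
        return (db,) if self.granularity == "database" else (db, coll, _id)

    def _lock_for(self, db, coll, _id):
        with self._guard:
            return self._locks[self._key(db, coll, _id)]

    def try_lock(self, db, coll, _id):
        """Non-blocking acquire; returns False if someone already holds it."""
        return self._lock_for(db, coll, _id).acquire(blocking=False)

    def unlock(self, db, coll, _id):
        self._lock_for(db, coll, _id).release()
```

With `LockManager("document")`, locking `("app", "users", 1)` and then `("app", "users", 2)` both succeed; with `LockManager("database")` the second call fails because both documents map to the same lock.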

Performance - Lock Refinement

Performance - Lock Refinement + Reduced IO
• Sysbench benchmark (> RAM)

Performance - Reduced IO
• Indexed insertion benchmark

Performance - Clustered Indexes
• Clustered secondary indexes
– Additional copy of the document is stored in the index
– No additional IO to get document data from the primary key
– Think better covered index (all non-indexed fields)
– Good for point queries, great for range scans
– Compression eliminates size concerns

Performance - Memory Management

• Two approaches to memory management
– MongoDB = memory-mapped files
o Operating system determines what data is important
– TokuMX = managed cache
o User-defined size
o TokuMX determines what data is important
• Run multiple TokuMX instances on a single server
– Each has its own fixed cache size
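A managed cache with a user-defined size can be sketched as a byte-bounded cache that chooses what to evict itself. `ManagedCache` is hypothetical, and plain least-recently-used eviction is an assumption for this sketch, not a claim about TokuMX's actual eviction policy.

```python
from collections import OrderedDict


class ManagedCache:
    """Fixed-size cache that decides for itself what stays in memory.

    LRU eviction is an assumption for this sketch, not TokuMX's documented policy.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # node_id -> bytes, least recently used first

    def get(self, node_id, load):
        """Return a cached node, calling load(node_id) on a miss (an IO)."""
        if node_id in self.entries:
            self.entries.move_to_end(node_id)  # mark as most recently used
            return self.entries[node_id]
        data = load(node_id)
        self.entries[node_id] = data
        self.used += len(data)
        while self.used > self.capacity:
            _, evicted = self.entries.popitem(last=False)  # drop the LRU entry
            self.used -= len(evicted)
        return data
```

Because the capacity is fixed per instance, several instances on one server each stay within their own budget instead of competing through the operating system's page cache.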

TokuMX Compression

Compression
• MongoDB does not offer compression
– Compressed file systems?
– Shortened field names?
o Remember: each field name is stored in every single document

• TokuMX easily achieves 5x-10x compression
– Buy less disk or flash
– Compressed reads and writes reduce overall IO
• TokuMX supports 3 compression types
– zlib, quicklz, lzma (size vs. speed)
– all data is compressed
• Use descriptive field names!
– They are easy to compress
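The advice to keep descriptive field names can be checked with a quick experiment. This sketch uses JSON lines as a stand-in for BSON and the standard `zlib` module; the documents are hypothetical. Long field names repeated in every document inflate the raw size a lot, but after compression most of that overhead disappears.

```python
import json
import zlib

# Hypothetical documents: same values, long descriptive vs. short cryptic field names.
descriptive = [{"customer_name": f"user{i}", "purchase_total": i % 100,
                "shipping_country": "GB"} for i in range(2000)]
short = [{"cn": d["customer_name"], "pt": d["purchase_total"],
          "sc": d["shipping_country"]} for d in descriptive]


def sizes(docs):
    """(raw bytes, zlib-compressed bytes) for documents serialized as JSON lines."""
    raw = "\n".join(json.dumps(d) for d in docs).encode()
    return len(raw), len(zlib.compress(raw))


for label, docs in [("descriptive", descriptive), ("short", short)]:
    print(label, sizes(docs))
```

Comparing the two printed tuples shows the raw gap between naming styles is much larger than the compressed gap.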

Compression
• 31 million documents, BitTorrent peer data
– http://cs.brown.edu/~pavlo/torrent/

TokuMX Transactions

ACID + MVCC
• ACID
– In MongoDB, multi-insertion operations allow for partial success
o Asked to store 5 documents, 3 succeeded

– We offer “all or nothing” behavior
– Document-level locking

• MVCC
– In MongoDB, queries can be interrupted by writers.
o The effects of these writers are visible to the reader

– TokuMX offers MVCC
o Reads are consistent as of the operation start

Multi-statement Transactions

• TokuMX brings the following to MongoDB
– db.runCommand({"beginTransaction": 1, "isolation": "mvcc"})
– ... perform 1 or more operations
– db.runCommand("rollbackTransaction") or db.runCommand("commitTransaction")
• Not allowed in sharded environments
– mongos will reject
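The semantics behind begin/commit/rollback and MVCC reads can be sketched with a hypothetical in-memory multi-version store (this does not call TokuMX; `ToyMVCCStore` and its methods are illustrative). Writes are buffered per transaction and published together at commit, so a rollback leaves nothing partially stored, and each reader sees the data as of its own start. The sketch is single-threaded and does no write-conflict detection.

```python
import itertools


class ToyMVCCStore:
    """Hypothetical multi-version store: all-or-nothing commits and reads that
    are consistent as of the transaction start (MVCC)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = itertools.count(1)
        self.last_commit = 0

    def begin(self):
        return {"snapshot": self.last_commit, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:  # a transaction sees its own writes
            return txn["writes"][key]
        visible = [v for ts, v in self.versions.get(key, []) if ts <= txn["snapshot"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        ts = next(self.clock)
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((ts, value))
        self.last_commit = ts  # all writes become visible together

    def rollback(self, txn):
        txn["writes"].clear()  # nothing was published
```

For example, a reader that begins before another transaction commits `a = 2` keeps reading the older value of `a` until it starts a new transaction.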

Questions?

Tim Callaghan
VP/Engineering, Tokutek
tim@tokutek.com
@tmcallaghan

 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Fractal Tree Indexes: From Theory to Practice

  • 1. Fractal Tree Indexes: Theory to Practice. Percona Live London 2013. Tim Callaghan, Tokutek (tim@tokutek.com, @tmcallaghan). Tuesday, November 12, 13
  • 2. Ever seen this? [IO utilization graph: performance is IO-limited]
  • 3. Who is Tokutek? Tokutek builds high-performance database software! • TokuDB: storage engine for MySQL and MariaDB • TokuMX: storage engine for MongoDB. [Diagram: Developer Interface, Storage Engine, HDD & SSD storage]
  • 4. Who am I? • 17-year database consumer • schema design, development, deployment • database administration + infrastructure • mostly Oracle • 5-year database producer • 2 years @ VoltDB • 2+ years @ Tokutek
  • 5. Housekeeping • Feedback is important to me • Ideas for webinars or presentations? • Who's using MongoDB? • Anyone using TokuDB or TokuMX? • Please ask questions
  • 6. Agenda • Why Fractal Tree indexes are cool • What they enable in MySQL® (TokuDB) • What they enable in MongoDB® (TokuMX) • Q+A
  • 7. Indexing: B-trees and Fractal Tree Indexes
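  • 9. B-tree Overview - vocabulary: internal nodes hold pivots and pointers (the path to data); leaf nodes hold the actual data, sorted.
  • 10. B-tree Overview - example: the root has pivot 22; its left child has pivot 10 over leaves [2, 3, 4] and [10, 20]; its right child has pivot 99 over leaves [22, 25] and [99]. Pivot rule is >= (a key equal to or greater than a pivot goes right).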
  • 11. B-tree Overview - search: "Find 25" on the same tree: 25 >= 22 goes right at the root, 25 < 99 goes left, and the key is found in leaf [22, 25].
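The search walk above can be sketched in a few lines. This is a minimal sketch, assuming the slide's tree is modeled as nested `(pivots, children)` tuples with plain lists as leaves; the names `ROOT`, `LEFT`, `RIGHT`, and `search` are hypothetical.

```python
from bisect import bisect_right

# The slide's example tree (hypothetical encoding): internal nodes are
# (pivots, children) tuples, leaves are sorted lists.
LEFT = ([10], [[2, 3, 4], [10, 20]])
RIGHT = ([99], [[22, 25], [99]])
ROOT = ([22], [LEFT, RIGHT])


def search(node, key):
    """Return (found, leaf) after walking from node down to a leaf."""
    while isinstance(node, tuple):
        pivots, children = node
        # bisect_right counts the pivots <= key, which is exactly the
        # child index to follow under the ">=" pivot rule.
        node = children[bisect_right(pivots, key)]
    return key in node, node


print(search(ROOT, 25))  # (True, [22, 25])
```

With the ">=" rule, `bisect_right` sends a key equal to a pivot to the right-hand child, so searching for 22 lands in leaf [22, 25].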
  • 12. B-tree Overview - insert: "Insert 15" on the same tree: the search path ends at leaf [10, 20], which becomes [10, 15, 20].
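An insert follows the same root-to-leaf path as a search and then places the key in sorted order. The sketch below reuses the hypothetical nested-tuple encoding of the example tree and, as a simplifying assumption, leaves out node splits.

```python
from bisect import bisect_right, insort

# Same hypothetical encoding as the search sketch.
LEFT = ([10], [[2, 3, 4], [10, 20]])
RIGHT = ([99], [[22, 25], [99]])
ROOT = ([22], [LEFT, RIGHT])


def insert(node, key):
    """Descend with the ">=" pivot rule, then insert key into its leaf.

    Splitting a full leaf is omitted in this sketch.
    """
    while isinstance(node, tuple):
        pivots, children = node
        node = children[bisect_right(pivots, key)]
    insort(node, key)  # keeps the leaf sorted; mutates it in place
    return node


print(insert(ROOT, 15))  # [10, 15, 20]
```

In a real B-tree, that leaf has to be read from disk first when it is not cached, which is where the IO cost discussed on the next slide comes from.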
  • 13. B-tree Overview - performance: performance is IO-limited when data > RAM, because one IO is needed for each insert/update (actually, one IO for every index on the table). [Diagram: internal nodes 22, 10, and 99 fit in RAM; leaves [2, 3, 4], [10, 20], [22, 25], and [99] live on disk]
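A back-of-the-envelope model of this slide: if every leaf access misses RAM, each insert pays one random IO per index, so the disk's random IO rate divided by the index count caps the insert rate. The 200 IOPS figure and the function name below are hypothetical, and the model ignores writing dirty pages back.

```python
def max_inserts_per_sec(random_iops: float, num_indexes: int) -> float:
    """Upper bound on indexed inserts/sec when every leaf read misses RAM.

    Assumes one random IO per index per insert; write-back of dirty
    pages is ignored, so a real system would do worse.
    """
    return random_iops / num_indexes


# Hypothetical disk sustaining 200 random IOs/sec; table with 4 indexes.
print(max_inserts_per_sec(200, 4))  # 50.0
```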
  • 15. Fractal Tree Indexes: all internal nodes have message buffers. As buffers overflow, they cascade down the tree, and messages are eventually applied to leaf nodes. Similar to B-trees: • store data in leaf nodes • use the index key for ordering. Different from B-trees: • message buffers • big nodes (4MB vs. ~16KB)
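To see why buffering helps, here is a toy with a single buffered root over three leaves. It is a sketch under stated assumptions, not Tokutek's implementation: real Fractal Tree indexes have many levels, every internal node buffers, and nodes are sized in megabytes. The class name `ToyBufferedTree`, the pivots, and the buffer capacity are all hypothetical.

```python
import random
from bisect import bisect_right, insort


class ToyBufferedTree:
    """A root node with a message buffer over sorted leaves.

    One level is enough to show how buffered inserts are batched.
    """

    def __init__(self, pivots, num_leaves, buffer_capacity):
        self.pivots = pivots                  # root pivots, ">=" rule
        self.leaves = [[] for _ in range(num_leaves)]
        self.buffer = []                      # pending insert messages
        self.capacity = buffer_capacity
        self.leaf_writes = 0                  # stand-in for leaf IOs

    def insert(self, key):
        self.buffer.append(key)               # no leaf is touched yet
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Group messages by destination leaf, then write each leaf once.
        groups = {}
        for key in self.buffer:
            groups.setdefault(bisect_right(self.pivots, key), []).append(key)
        for leaf_index, keys in groups.items():
            for key in keys:
                insort(self.leaves[leaf_index], key)
            self.leaf_writes += 1
        self.buffer.clear()


keys = list(range(300))
random.Random(0).shuffle(keys)                # random insert order
tree = ToyBufferedTree(pivots=[100, 200], num_leaves=3, buffer_capacity=50)
for key in keys:
    tree.insert(key)

# 300 inserts cause 6 flushes, each writing at most 3 leaves.
print(tree.leaf_writes)
```

A B-tree would touch a leaf on every one of the 300 inserts. Here each leaf write carries a whole batch of messages, which is the "many operations per IO" effect in miniature.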
  • 16. Fractal Tree Indexes - sample data: internal nodes with pivots 25, 10, and 99 over leaves [2, 3, 4], [10, 20], [22, 25], and [99]. Looks a lot like a B-tree!
  • 17. Fractal Tree Indexes - insert: "insert 15" appears as an insert(15) message in the top node's buffer instead of going straight to leaf [10, 20]. • Search operations must consider messages along the way • Messages cascade down the tree as buffers fill up • They are eventually applied to the leaf nodes: hundreds or thousands of operations for a single IO • CPU and cache are conserved because important data is not ejected
  • 18. Fractal Tree Indexes - other operations: the buffers in the sample tree hold messages such as delete(8), delete(2), insert(8), add_column(c4 bigint), delete(99), increment(22,+5), and insert(100). Lots of operations can be messages!
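When a buffer finally flushes into a leaf, its messages are applied oldest first. The sketch below borrows the message names from the slide, but the tuple layouts and semantics are assumptions; in particular, `increment` gets an explicit column name that the slide's `increment(22,+5)` leaves implicit.

```python
def apply_messages(leaf, messages):
    """Apply buffered messages to a leaf, oldest first.

    leaf: dict mapping primary key -> row (dict of column -> value).
    Message formats (hypothetical):
      ("insert", key, row), ("delete", key),
      ("increment", key, column, delta), ("add_column", column, default)
    """
    for msg in messages:
        op = msg[0]
        if op == "insert":
            _, key, row = msg
            leaf[key] = dict(row)
        elif op == "delete":
            _, key = msg
            leaf.pop(key, None)          # deleting a missing key is a no-op
        elif op == "increment":
            _, key, column, delta = msg
            if key in leaf:
                leaf[key][column] += delta
        elif op == "add_column":
            _, column, default = msg
            for row in leaf.values():
                row.setdefault(column, default)
        else:
            raise ValueError(f"unknown message {op!r}")
    return leaf


leaf = {22: {"c1": 1}, 25: {"c1": 7}}
messages = [
    ("increment", 22, "c1", 5),
    ("add_column", "c4", 0),
    ("insert", 23, {"c1": 3, "c4": 9}),
    ("delete", 25),
]
print(apply_messages(leaf, messages))
# {22: {'c1': 6, 'c4': 0}, 23: {'c1': 3, 'c4': 9}}
```

Order matters: applying the same messages in a different order can produce a different leaf, which is why a buffer preserves arrival order.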
  • 19. TokuDB: Fractal Tree Indexing + MySQL/MariaDB
  • 20. What is TokuDB? • Transactional MySQL storage engine (think InnoDB) • Available for MySQL 5.5 and MariaDB 5.5 • ACID and MVCC • Free/OSS Community Edition: http://github.com/Tokutek/ft-engine • Enterprise Edition: commercial support + hot backup. Performance + Compression + Agility
  • 21. TokuDB Performance: Warning - Benchmarks Ahead!
  • 22. Indexed Insertion Performance • High-performance insert/update/delete for large databases (> RAM) while maintaining indexes. [Chart; note: old numbers, now > 25K/sec]
  • 23. Sysbench Performance: Sysbench read/write workload, data > RAM. The fastest IO is the one you never have to do (compression).
  • 24. Performance Advantages • Efficient index maintenance, especially for secondary indexes • Clustered secondary indexes: an additional copy of the row is stored in the index, so no additional IO is needed to fetch row data via the primary key; think of it as a better covering index (it includes all non-indexed columns), and compression eliminates size concerns • Big blocks = sequential IO for range scans; basement nodes are always co-located • Multi-threaded bulk loader
  • 26. Compression: TokuDB vs. InnoDB • In InnoDB, compression misses force node splits, which greatly reduce performance; MySQL 5.6 adds "dynamic padding" (from Facebook), at the cost of less cache • Larger block size and flexible on-disk size win! • Multiple compression algorithms (lzma, quicklz, zlib) • Larger, less frequent writes (much less IO) • Why it matters on spinning disks: compressed reads and amortized compressed writes overcome IO limitations • Why it matters on flash/SSD: buy less (250GB at 10x compression holds as much as 2.5TB); large, less frequent writes are flash-friendly
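The "buy less" arithmetic is just physical capacity times compression ratio. The sketch below reproduces the slide's 250GB at 10x example. The function name is hypothetical, and the ratio that real data achieves depends entirely on that data.

```python
def logical_capacity_gb(physical_gb: float, compression_ratio: float) -> float:
    """Amount of uncompressed data that fits at a given compression ratio."""
    return physical_gb * compression_ratio


# The slide's example: a 250GB device at 10x compression.
print(logical_capacity_gb(250, 10))  # 2500, i.e. 2.5TB
```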
  • 27. Compression + IO Reduction • Server was at 90% IO utilization with InnoDB, 10% IO utilization with TokuDB
  • 28. Compression Performance • iiBench benchmark
  • 29. Compression Achieved • log data (extremely compressible)
  • 31. The Challenge of MySQL Schema Changes • Common schema changes can take hours in MySQL: adding, dropping, or expanding a column; adding an index • And the table is unavailable for writes during the process • As a workaround, people generally use a replication slave and then swap it with the master, or use helper tools such as Percona OSC or MySQL 5.6 (these have IO, CPU, and RAM consequences)
  • 32. Schema Changes Without Downtime • In TokuDB, column add/drop/expand is instantaneous: "it's just a message" • Indexes can be created in the background while the table is fully available: TokuDB just builds the index and does not rebuild the table (MySQL is getting better)
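Why can adding a column be instantaneous? The change is recorded once instead of rewriting every stored row, and rows pick up the new column later. The toy below simplifies the message to a table-level record that reads consult. In TokuDB the add_column message travels down the tree like any other message, so this shows the idea, not the engine's mechanism. `LazyColumnTable` and its methods are hypothetical.

```python
class LazyColumnTable:
    """Toy table where ADD COLUMN is recorded once instead of rewriting
    every row; reads fill in the default for rows stored before it."""

    def __init__(self):
        self.rows = {}             # primary key -> stored row
        self.added_columns = {}    # column name -> default value

    def insert(self, key, row):
        self.rows[key] = dict(row)

    def add_column(self, name, default):
        # Constant time: no stored row is touched.
        self.added_columns[name] = default

    def read(self, key):
        row = dict(self.rows[key])
        for name, default in self.added_columns.items():
            row.setdefault(name, default)
        return row


table = LazyColumnTable()
for i in range(1000):
    table.insert(i, {"c1": i})
table.add_column("c4", 0)
print(table.read(7))    # {'c1': 7, 'c4': 0}
print(table.rows[7])    # {'c1': 7}: the stored row was never rewritten
```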
  • 33. TokuMX: Fractal Tree Indexing + MongoDB
  • 34. What is TokuMX? • TokuMX = MongoDB with improved storage (Fractal Tree indexes) • Drop-in replacement for MongoDB v2.2 applications: including replication and sharding, same data model, same query language, drivers just work • Open Source: http://github.com/Tokutek/mongo. Performance + Compression + Transactions
  • 35. MongoDB Storage: db.test.insert({foo:55}); db.test.ensureIndex({foo:1}). [Diagram: a memory-mapped heap of documents; a PK index of (_id, pointer) entries such as (1,ptr5), (4,ptr1), (12,ptr8), (19,ptr7), (10000,ptr2); a secondary index of (foo, pointer) entries such as (2,ptr5), (22,ptr6), (50,ptr4), (100,ptr7), (222,ptr3)] The "pointer" tells MongoDB where to look in the heap for the requested document (another IO).
  • 36. TokuMX Storage: db.test.insert({foo:55}); db.test.ensureIndex({foo:1}). [Diagram: a PK index of (_id, document) entries such as (1,doc), (4,doc), (12,doc), (19,doc), (10000,doc); a secondary index of (foo, _id) entries such as (2,4), (22,12), (50,19), (100,10000), (222,1)] One less IO per _id lookup: the document is clustered in the index.
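The two layouts can be compared by counting the structures a lookup touches, which stands in for IOs once data exceeds RAM. This sketch models each index as a Python dict and takes the (foo, _id) pairs from the TokuMX slide. The pointer names are hypothetical and do not match the slide's ptrN labels.

```python
# Secondary index on foo from the TokuMX slide: foo -> _id.
FOO_TO_ID = {2: 4, 22: 12, 50: 19, 100: 10000, 222: 1}

# MongoDB-style layout: indexes hold pointers into a separate heap.
# The pointer names here are made up for the example.
heap = {f"ptr{_id}": {"_id": _id, "foo": foo} for foo, _id in FOO_TO_ID.items()}
mongo_pk = {_id: f"ptr{_id}" for _id in FOO_TO_ID.values()}

# TokuMX-style layout: the _id index stores the whole document.
toku_pk = {_id: {"_id": _id, "foo": foo} for foo, _id in FOO_TO_ID.items()}


def mongo_find_by_id(_id):
    """Return (document, structures touched): the _id index, then the heap."""
    pointer = mongo_pk[_id]
    return heap[pointer], 2


def toku_find_by_id(_id):
    """Return (document, structures touched): the index leaf holds the document."""
    return toku_pk[_id], 1


def toku_find_by_foo(foo):
    """Secondary lookup: the foo index gives the _id, then the clustered _id index."""
    doc, touched = toku_find_by_id(FOO_TO_ID[foo])
    return doc, touched + 1


print(mongo_find_by_id(12))   # ({'_id': 12, 'foo': 22}, 2)
print(toku_find_by_id(12))    # ({'_id': 12, 'foo': 22}, 1)
print(toku_find_by_foo(22))   # ({'_id': 12, 'foo': 22}, 2)
```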
  • 38. Performance - Indexed Insertion • 100 million inserts into a collection with 3 secondary indexes
  • 39. Performance - Inserts on Indexed Arrays • Indexed insertion: multikey (100 inserts per doc)
  • 40. Performance - Replication • TokuMX replication allows secondary servers to process replication without IO: they simply inject messages into the Fractal Tree indexes on the secondary server, because the "hard work" (uniqueness checking, transactional locking, update effort such as read-before-write) was done on the primary, eliminating replication lag • Your secondaries are fully available for read scaling! Wasn't that the point?
  • 41. Performance - Lock Refinement • TokuMX performs locking at the document level: extreme concurrency! [Diagram: the lock covers the whole instance in MongoDB v2.0, a database in MongoDB v2.2, and a single document in TokuMX]
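The granularity diagram can be turned into a conflict check: two writers conflict when they need the same lock. This model is deliberately simplified and hypothetical. It has exclusive locks only, with no shared or intent locks, and it does not describe how either system actually implements locking.

```python
def lock_resource(granularity, db, collection, doc_id):
    """Resource a writer locks exclusively under each scheme (simplified)."""
    if granularity == "instance":     # MongoDB v2.0
        return ("instance",)
    if granularity == "database":     # MongoDB v2.2
        return ("database", db)
    if granularity == "document":     # TokuMX
        return ("document", db, collection, doc_id)
    raise ValueError(f"unknown granularity: {granularity}")


def writers_conflict(granularity, write_a, write_b):
    return lock_resource(granularity, *write_a) == lock_resource(granularity, *write_b)


# Two writers updating different documents in the same collection.
a = ("shop", "orders", 1)
b = ("shop", "orders", 2)
for granularity in ("instance", "database", "document"):
    print(granularity, writers_conflict(granularity, a, b))
# instance True
# database True
# document False
```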
  • 42. Performance - Lock Refinement
  • 43. Performance - Lock Refinement + Reduced IO • Sysbench benchmark (> RAM)
  • 44. Performance - Reduced IO • Indexed insertion benchmark
  • 45. Performance - Clustered Indexes • Clustered secondary indexes: an additional copy of the document is stored in the index, so no additional IO is needed to get the document via the primary key; think of it as a better covering index (it includes all non-indexed fields) • Good for point queries, great for range scans • Compression eliminates size concerns
  • 46. Performance - Memory Management • Two approaches to memory management: MongoDB uses memory-mapped files, so the operating system determines what data is important; TokuMX uses a managed cache with a user-defined size, so TokuMX determines what data is important • Run multiple TokuMX instances on a single server: each has its own fixed cache size
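A managed cache with a user-defined size means the database chooses what to evict once its budget is full. The sketch below counts capacity in nodes and uses least-recently-used eviction. Both choices are assumptions made for illustration, since the slide does not say which eviction policy TokuMX uses. `FixedSizeCache` and `load_from_disk` are hypothetical.

```python
from collections import OrderedDict


class FixedSizeCache:
    """Toy node cache with a fixed, user-chosen capacity (LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = OrderedDict()    # node_id -> node, least recent first
        self.misses = 0

    def get(self, node_id, load_from_disk):
        if node_id in self.nodes:
            self.nodes.move_to_end(node_id)    # mark as most recently used
            return self.nodes[node_id]
        self.misses += 1
        node = load_from_disk(node_id)
        self.nodes[node_id] = node
        if len(self.nodes) > self.capacity:
            self.nodes.popitem(last=False)     # evict least recently used
        return node


def load_from_disk(node_id):
    return f"node-{node_id}"                   # stand-in for a disk read


cache = FixedSizeCache(capacity=2)
for node_id in ["root", "a", "root", "b", "root", "a"]:
    cache.get(node_id, load_from_disk)
print(cache.misses, list(cache.nodes))  # 4 ['root', 'a']
```

Because the capacity is a fixed parameter and not whatever the OS page cache happens to hold, several instances on one server can each be given a predictable share of RAM.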
  • 48. Compression • MongoDB does not offer compression: compressed file systems? shortened field names? Remember, each field name is stored in every single document • TokuMX easily achieves 5x-10x compression: buy less disk or flash; compressed reads and writes reduce overall IO • TokuMX supports 3 compression types: zlib, quicklz, lzma (size vs. speed); all data is compressed • Use descriptive field names! They are easy to compress
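Long field names repeat in every document, so a general-purpose compressor removes most of their cost. This sketch uses Python's standard `zlib` and `lzma` modules on JSON text as stand-ins for the engine's compressors and BSON storage. quicklz is not in the standard library. The documents and field names are hypothetical.

```python
import json
import lzma
import zlib

# Hypothetical documents with long, descriptive field names.
docs = [
    {"customer_identifier": i, "shipping_address_country": "UK",
     "order_total_pence": 1000 + i}
    for i in range(1000)
]
raw = "\n".join(json.dumps(d) for d in docs).encode()

for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    packed = compress(raw)
    print(f"{name}: {len(raw)} -> {len(packed)} bytes "
          f"({len(raw) / len(packed):.1f}x)")
```

Exact ratios depend on the data and settings. lzma typically compresses smaller but runs slower than zlib, which matches the size vs. speed trade-off on the slide.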
  • 49. Compression • 31 million documents, BitTorrent peer data: http://cs.brown.edu/~pavlo/torrent/
  • 51. ACID + MVCC • ACID: in MongoDB, multi-insertion operations allow for partial success (asked to store 5 documents, 3 succeeded); we offer "all or nothing" behavior, with document-level locking • MVCC: in MongoDB, queries can be interrupted by writers, and the effects of those writers are visible to the reader; TokuMX offers MVCC, so reads are consistent as of the operation start
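"Reads are consistent as of the operation start" is the core idea behind MVCC. Writers append new versions, and a reader sees only versions committed at or before its snapshot. This toy uses a single logical clock and keeps every version. It is a sketch of the concept, not TokuMX's implementation, and all names are hypothetical.

```python
class MVCCStore:
    """Toy multi-version store: each write appends (commit_ts, value), and
    a reader sees the newest version at or before its snapshot."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ascending
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self):
        return self.clock    # a reader's start time

    def read(self, key, snapshot_ts):
        visible = None
        for ts, value in self.versions.get(key, []):
            if ts > snapshot_ts:
                break
            visible = value
        return visible


store = MVCCStore()
store.write("a", 1)
store.write("b", 1)
start = store.snapshot()          # a long-running query starts here
store.write("a", 2)               # a concurrent writer
print(store.read("a", start), store.read("b", start))   # 1 1
print(store.read("a", store.snapshot()))                 # 2
```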
  • 52. Multi-statement Transactions • TokuMX brings the following to MongoDB: db.runCommand({beginTransaction: 1, isolation: "mvcc"}), then perform 1 or more operations, then db.runCommand("rollbackTransaction") or db.runCommand("commitTransaction") • Not allowed in sharded environments: mongos will reject them
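"All or nothing" means a failed multi-document write leaves no partial result, unlike the "asked to store 5 documents, 3 succeeded" behavior. This toy stages every insert in a private copy and publishes it only on success. Copying the whole collection is O(n) and only suits a sketch, since real engines rely on transaction logs and MVCC. The class and exception names are hypothetical, not TokuMX APIs.

```python
class DuplicateKeyError(Exception):
    """Raised when a document's _id is already present."""


class ToyCollection:
    def __init__(self):
        self.docs = {}   # _id -> document (committed state)

    def insert_many_all_or_nothing(self, docs):
        """Apply every insert to a private copy; publish it only if all
        succeed. On error the copy is dropped, which acts as a rollback."""
        pending = dict(self.docs)
        for doc in docs:
            if doc["_id"] in pending:
                raise DuplicateKeyError(doc["_id"])
            pending[doc["_id"]] = doc
        self.docs = pending   # commit


coll = ToyCollection()
coll.insert_many_all_or_nothing([{"_id": i} for i in (1, 2, 3)])
try:
    # Five documents; the third collides with an existing _id.
    coll.insert_many_all_or_nothing([{"_id": i} for i in (4, 5, 1, 6, 7)])
except DuplicateKeyError as err:
    print("rolled back, duplicate _id:", err)
print(sorted(coll.docs))   # [1, 2, 3]
```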