SlideShare a Scribd company logo
B+Tree Indexes and InnoDB
Ovais Tariq
Percona Live London 2011
www.percona.com
Agenda
•  Why Index
•  What is a B+Tree
•  B+Tree Characteristics
•  Cost Estimation Formulae
•  A Few Advantages
•  B+Tree Index in InnoDB
•  Primary and Secondary B+Tree Indexes
•  Characteristics of an Ideal Primary Index
www.percona.com
Agenda (Cont ..)
•  In-order INSERTs vs Random-order INSERTs
•  Composite Indexes
•  B+Tree and Index Prefix
•  Index Selectivity
•  Speeding up Secondary Indexes
•  Tips and Take-away
•  Percona Live London Sponsors
•  Annual MySQL Users Conference
www.percona.com
Why Index?
•  Linear search is very slow, complexity is O(n)
•  Indexes are used in variety of DBMS
•  Many different type of indexes
•  Hash Indexes (only MEMORY SE and NDB)
•  Bitmap Indexes (not available in MySQL)
•  BTree Indexes and derivates (MyISAM, InnoDB)
•  Indexes improve search performance
•  But add extra cost to INSERT/UPDATE/
DELETE
www.percona.com
What is a B+Tree
•  A generalized version of Binary Search Tree
http://en.wikipedia.org/wiki/Binary_search_tree
•  Classic disk based structure for indexing
records based on an ordered key set
•  Reading a single record from a very large table,
results in only a few pages of data being read
•  Any index structure other then B+Tree is
subject to overflow
www.percona.com
B+Tree Characteristics
•  Every node can have p – 1 key values and p
node pointers (p is called the order of the tree)
•  The leaf node contains data, internal nodes are
only used as guides
•  The leaf nodes are connected together as
doubly linked list
•  Keys are stored in the nodes in sorted order
•  All leaf nodes are at the same height, that’s
why it’s called a balanced tree
www.percona.com
Typical B+Tree structure
www.percona.com
Cost Estimation Formulae
•  Some Assumptions
•  Cost Calculations
•  A Few Extra Considerations
www.percona.com
Some Assumptions
•  h is the height of the tree
•  p is the branching factor of the tree
•  n is the number of rows in a table
•  p = (page size in bytes/key length in bytes) + 1
•  h > log n / log p
www.percona.com
Cost Calculations
•  Search cost for a single row
•  S = h I/O ops
•  Update cost for a single row
•  U = search cost + rewrite data page = h + 1 I/O ops
•  Insert cost for a single row
•  I = search cost + rewrite index page + rewrite data
page
•  I = h + 1 + 1 = h + 2 I/O ops
www.percona.com
Cost Calculations (Cont ..)
•  Delete cost for a single row
•  D = search cost + rewrite index page + rewrite data
page
•  D = h + 1 + 1 = h + 2 I/O ops
www.percona.com
A Few Extra Considerations
•  Updates are in place only if the new data is of
the same size, otherwise its delete plus insert
•  Inserts may require splits if the leaf node is full
•  Occasionally the split of a leaf node
necessitates split of the next higher node
•  In worst case scenarios the split may cascade
all the way up to the root node
•  Deletions may result in emptying a node that
necessitates the consolidation of two nodes
www.percona.com
A Few Advantages
•  Reduced I/O
•  Reduced Rebalancing
•  Extremely efficient range scans
•  Implicit sorting
www.percona.com
Reduced I/O
•  Height of a B+Tree is very small (and has a
very large branching factor)
•  Generally every node in a tree corresponds to a
page of data (page size ranges from 211 to 214
bytes)
•  A node read = read a page = 1 random I/O
•  So to reach leaf node, we need to read h pages
•  No matter if requested row is at the start or end
of table, same number of I/O is needed
www.percona.com
Reduced Rebalancing
•  A tree needs rebalancing after an insertion or
deletion
•  B+Tree is wide, more keys can fit in node, so
rebalancing needed few times on insertions
and deletions
•  Note that rebalancing means extra I/O, so
rebalancing saved is I/O saved
www.percona.com
Extremely Efficient Range Scans
•  Leaf nodes are linked together as doubly linked
list
•  So need to traverse from root -> leaf just once
•  Move from leaf -> leaf until you reach the end
of range
•  Entire tree may be scanned without visiting the
higher nodes at all
www.percona.com
Implicit Sorting
•  Nodes contain keys sorted in key-order
•  Therefore records can be implicitly returned in
sorted order
•  No external sorting needed hence memory and
CPU cycles saved
•  Sometimes sorted data cannot fit into buffer,
and data needs to be sorted in passes, needing
I/O, which can be avoided if you need data in
key order
www.percona.com
B+Tree Index in InnoDB
•  B+Tree Index in InnoDB is a typical B+Tree
structure, no strings attached!
•  Leaf nodes contain the data (what the data is
depends whether it’s a Primary Index or a
Secondary Index)
•  Root nodes and internal nodes contain only key
values
www.percona.com
A Typical Index
EMP_NO
1 to 1000000
EMP_NO
1 to 500000
Employee details
for
EMP_NO
1 to 1000
Employee details
for
EMP_NO
1001 to 2000
EMP_NO
500001 to 1000000
Employee details
for
EMP_NO
998001 to 999000
Employee details
for
EMP_NO
999001 to 1000000
Root node
Internal nodes
Leaf nodes
www.percona.com
Primary and Secondary B+Tree
Indexes
•  Primary index holds the entire row data in its
leaf nodes
•  Primary index can also be called a clustered
index, because data is clustered around PK
values
•  A single PK per table means, a single clustered
index per table
•  Secondary Indexes have the key values and
PK values in the index and no row data
www.percona.com
Primary and Secondary B+Tree
Indexes (Cont .. )
•  PK values stored in the leaf nodes of a
secondary index act as pointer to the data
•  This means secondary index lookups are two
lookups
•  Cost of secondary index lookup
•  C = Height of Secondary Index B+Tree + Height of
Primary Index B+Tree
www.percona.com
A Typical Secondary Index
FIRST_NAME
A to Z
FIRST_NAME
A to M
FIRST_NAME
&
PK Column values
(Anna, 1) …
FIRST_NAME
&
PK Column values
(Jacob, 10000) …
FIRST_NAME
N to Z
FIRST_NAME
&
PK Column values
(Nathan, 100000) …
FIRSY_NAME
&
PK Column values
(Zita, 1000000) …
EMP_NO
1 to 1000000
EMP_NO
1 to 500000
Employee details
for
EMP_NO
1 to 1000
Employee details
for
EMP_NO
1001 to 2000
EMP_NO
500001 to 1000000
Employee details
for
EMP_NO
998001 to 999000
Employee details
for
EMP_NO
999001 to 1000000
Secondary Index
Primary Index
www.percona.com
Characteristics of an Ideal
Primary Index
•  Create primary index on column(s) that are not
updated too often
•  Keep the size of the primary index as small as
possible
•  Select the column(s) to create primary index
on, that have sequentially increasing value
•  Random value columns, such as those that
store UUID, are very bad candidates for
primary index
www.percona.com
In-order INSERTs vs Random-
order INSERTs
•  In-order INSERTs result in good page fill
percentage, meaning InnoDB can keep on
inserting in the same page till its full
•  Good insert speed, good page fill percentage
•  Reduced page and extent fragmentation
Leaf Page 001
(1, ‘Alister’)
(2, ‘Anna’)
…..
(100, ‘Cathy’)
Leaf Page 002
(101, ‘Celvin’)
(102, ‘Donald’)
…..
(200, ‘Frank’)
Leaf Page 100
(10001, ‘Stan’)
(10002, ‘Steve’)
…..
(10100, ‘Suzzan’)
www.percona.com
In-order INSERTs vs Random-
order INSERTs (Cont ..)
•  Random-order inserts introduce overhead
•  Result in page and extent fragmentation
•  Bad insert speed and bad page fill percentage
(resulting in wasted space)
•  Data is not actually physically clustered
together
•  Scanning ranges do not result in pages read in
sequential order
•  That is why UUID is not a good PK candidate
www.percona.com
In-order INSERTs vs Random-
order INSERTs (Cont ..)
Leaf Page 101
(10101, ‘Alister’)
(10102, ‘Anna’)
…..
(10200, ‘Cathy’)
Leaf Page 96
(10201, ‘Celvin’)
(10202, ‘Donald’)
…..
(10300, ‘Frank’)
Leaf Page 102
(10301, ‘Stan’)
(10302, ‘Steve’)
…..
(10400, ‘Suzzan’)
Leaf Page 101
(10101, ‘Alister’)
Leaf Page 102
(10201, ‘Celvin’)
(10202, ‘Donald’)
Leaf Page 103
(10301, ‘Stan’)
(10302, ‘Steve’)
…..
(10400, ‘Suzzan’)
Extent Fragmentation
Page Fragmentation
www.percona.com
Example Schema
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL,
PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
www.percona.com
Composite Indexes
•  A single index can be defined on more than
one column
•  The index key is composed of more than one
key value
•  Let’s consider a query
•  SELECT emp_no, first_name, last_name
FROM employees
WHERE hire_date = '1985-03-22'
AND last_name = 'Peek';
www.percona.com
Composite Indexes (Cont ..)
•  One way of indexing is to create two separate
indexes on hire_date and last_name columns
•  Search cost would be
•  S = h(hire_date) + h(last_name) + Merge and
Intersect cost
•  S = h1 + h2 I/O ops + Merge and Intersect cost
•  Consider if you create a composite index
(hire_date, last_name)
•  The cost of composite index is one lookup
www.percona.com
Composite Indexes (Cont ..)
•  Composite index will not require extra merge-
intersect step, hence saving memory and CPU
cycles
•  To generalize, if a composite index has k
columns, then equivalent cost in single value
indexes is k index lookups
•  Similarly, k single value indexes will need more
pages to be read into memory
www.percona.com
Composite Indexes (Cont ..)
•  Suppose you have
•  1000 rows match hire_date = '1985-03-22’
•  1000 rows that match last_name = 'Peek’
•  4 rows that match both conditions
•  See less pages to load into memory when
using composite index
www.percona.com
B+Tree and Index Prefix
•  By design B+Tree can only work with filters that
filter at least on the prefix of the index
•  So an index idx(first_name, last_name) can
only be used for following searches
•  SELECT … WHERE first_name = x AND last_name = y
•  SELECT … WHERE first_name = x
•  And cannot be used for the following search
•  SELECT … WHERE last_name = y
www.percona.com
B+Tree and Index Prefix (Cont ..)
•  Same rule also hold for single column indexes
•  So an index idx(first_name) can only be used
for following searches
•  SELECT … WHERE first_name LIKE ‘ova%’
•  But cannot be used for following
•  SELECT … WHERE first_name LIKE ‘%ais’
•  If an index cannot be used that means a table
scan
www.percona.com
Index Selectivity
•  What is selectivity?
•  Selectivity = unique values / total no. of records
•  Primary Index is the most selective index
•  Suppose you index a column that stores
gender, meaning only two distinct values
•  Remember secondary index only store a
pointer to the data in the primary index
•  Indexing a gender column means each key
value with thousands of PK pointers
www.percona.com
Index Selectivity (Cont ..)
•  Each pointer lookup will be a random PK
lookup
•  Its much better to scan the PK in order and
filter by gender
•  But you can improve the selectivity of a column
by combining it with other columns and creating
a composite index
www.percona.com
Speeding up Secondary Indexes
•  Remember secondary indexes only store PK
pointers meaning two index lookups
•  Performance can be dramatically improved if
we avoid extra PK lookups
•  The trick is to include all the columns queried,
in the definition of the secondary index
•  Example query
•  SELECT emp_no, first_name, last_name
FROM employees WHERE hire_date = '1985-03-22'
AND last_name = 'Peek';
www.percona.com
Speeding up Secondary Indexes
(Cont ..)
•  Originally the index is idx(hire_date, last_name)
•  Let’s try to modify the index
•  No need to add the emp_no column as its PK
•  Add first_name column to right of index definition
•  idx(hire_date, last_name, first_name)
•  Note we add the column to the right of index
definition, remember B+Tree can only filter on
prefix of index
•  This is known as covering index optimization
www.percona.com
Tips and Take-away
•  Indexing should always be used to speed up
access
•  Index trade-off analysis can be done easily
using the cost estimation formulae discussed
•  Select optimal data types for columns,
especially ones that are to be indexed – int vs
bigint
•  When selecting columns for PK, select those
that would make the PK short, sequential and
with few updates
www.percona.com
Tips and Take-away (Cont ..)
•  Avoid using UUID style PK definitions
•  Insert speed is best when you insert in PK
order
•  When creating index on string columns, you
don’t need to index the entire column, you can
index a prefix of the column – idx(str_col(4))
•  B+Tree indexes are only suitable for columns
with good selectivity
•  Don’t shy away from creating composite
indexes
www.percona.com
Percona Live London Sponsors
Platinum Sponsor
Gold Sponsor
Silver Sponsors
www.percona.com
Percona Live London Sponsors
Exhibitor Sponsors
Friends of Percona Sponsors
Media Sponsors
www.percona.com
Annual MySQL Users Conference
Presented by Percona Live
The Hyatt Regency Hotel, Santa Clara, CA
April 10th-12th, 2012
Featured Speakers
Mark Callaghan, Facebook
Jeremy Zawodny, Craigslist
Marten Mickos, Eucalyptus Systems
Sarah Novotny, Blue Gecko
Peter Zaitsev, Percona
Baron Schwartz, Percona
The Call for Papers is Now Open!
Visit www.percona.com/live/mysql-conference-2012/
ovais.tariq@percona.com
@ovaistariq
We're Hiring! www.percona.com/about-us/careers/
www.percona.com/live

More Related Content

What's hot

InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
I Goo Lee.
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
Morgan Tocker
 
Efficient Use of indexes in MySQL
Efficient Use of indexes in MySQLEfficient Use of indexes in MySQL
Efficient Use of indexes in MySQL
Aleksandr Kuzminsky
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
Karwin Software Solutions LLC
 
MySQL Advanced Administrator 2021 - 네오클로바
MySQL Advanced Administrator 2021 - 네오클로바MySQL Advanced Administrator 2021 - 네오클로바
MySQL Advanced Administrator 2021 - 네오클로바
NeoClova
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
Mydbops
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
YoungHeon (Roy) Kim
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
Todd Palino
 
Local Apache NiFi Processor Debug
Local Apache NiFi Processor DebugLocal Apache NiFi Processor Debug
Local Apache NiFi Processor Debug
Deon Huang
 
Sga internals
Sga internalsSga internals
Sga internals
sergkosko
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
MIJIN AN
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking VN
 
MySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsMySQL Atchitecture and Concepts
MySQL Atchitecture and Concepts
Tuyen Vuong
 
Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)
Angel Boy
 
The PostgreSQL Query Planner
The PostgreSQL Query PlannerThe PostgreSQL Query Planner
The PostgreSQL Query Planner
Command Prompt., Inc
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance Optimisation
Mydbops
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
Scott Leberknight
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
JAX London
 
ASM
ASMASM

What's hot (20)

InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)InnoDB MVCC Architecture (by 권건우)
InnoDB MVCC Architecture (by 권건우)
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
Efficient Use of indexes in MySQL
Efficient Use of indexes in MySQLEfficient Use of indexes in MySQL
Efficient Use of indexes in MySQL
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
 
MySQL Advanced Administrator 2021 - 네오클로바
MySQL Advanced Administrator 2021 - 네오클로바MySQL Advanced Administrator 2021 - 네오클로바
MySQL Advanced Administrator 2021 - 네오클로바
 
Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0Redo log improvements MYSQL 8.0
Redo log improvements MYSQL 8.0
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
 
URP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to KnowURP? Excuse You! The Three Kafka Metrics You Need to Know
URP? Excuse You! The Three Kafka Metrics You Need to Know
 
Local Apache NiFi Processor Debug
Local Apache NiFi Processor DebugLocal Apache NiFi Processor Debug
Local Apache NiFi Processor Debug
 
Sga internals
Sga internalsSga internals
Sga internals
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 Grokking Techtalk #39: How to build an event driven architecture with Kafka ... Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
Grokking Techtalk #39: How to build an event driven architecture with Kafka ...
 
MySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsMySQL Atchitecture and Concepts
MySQL Atchitecture and Concepts
 
Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)Windows 10 Nt Heap Exploitation (English version)
Windows 10 Nt Heap Exploitation (English version)
 
The PostgreSQL Query Planner
The PostgreSQL Query PlannerThe PostgreSQL Query Planner
The PostgreSQL Query Planner
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance Optimisation
 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
ASM
ASMASM
ASM
 

Similar to B+Tree Indexes and InnoDB

Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Ontico
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashing
Ami Ranjit
 
Index management in shallow depth
Index management in shallow depthIndex management in shallow depth
Index management in shallow depth
Andrea Giuliano
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
MYXPLAIN
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index Cookbook
MYXPLAIN
 
SAG_Indexing and Query Optimization
SAG_Indexing and Query OptimizationSAG_Indexing and Query Optimization
SAG_Indexing and Query Optimization
Vaibhav Jain
 
Tunning overview
Tunning overviewTunning overview
Tunning overview
Hitesh Kumar Markam
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
BADR
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
sathish sak
 
Statistics and Indexes Internals
Statistics and Indexes InternalsStatistics and Indexes Internals
Statistics and Indexes Internals
Antonios Chatzipavlis
 
Ils on a shoe string budget
Ils on a shoe string budgetIls on a shoe string budget
Ils on a shoe string budget
Jolene81
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
ColdFusionConference
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Spark Summit
 
Mysql query optimization
Mysql query optimizationMysql query optimization
Mysql query optimization
Baohua Cai
 
DynamodbDB Deep Dive
DynamodbDB Deep DiveDynamodbDB Deep Dive
DynamodbDB Deep Dive
Amazon Web Services
 
PostgreSQL 9.0 & The Future
PostgreSQL 9.0 & The FuturePostgreSQL 9.0 & The Future
PostgreSQL 9.0 & The Future
Aaron Thul
 
SqlDay 2018 - Brief introduction into SQL Server Execution Plans
SqlDay 2018 - Brief introduction into SQL Server Execution PlansSqlDay 2018 - Brief introduction into SQL Server Execution Plans
SqlDay 2018 - Brief introduction into SQL Server Execution Plans
Marek Maśko
 
The Anatomy of a Large-Scale Hypertextual Web Search Engine
The Anatomy of a Large-Scale Hypertextual Web Search EngineThe Anatomy of a Large-Scale Hypertextual Web Search Engine
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Mehul Boricha
 
Hub102 - Lesson4 - Data Structure
Hub102 - Lesson4 - Data StructureHub102 - Lesson4 - Data Structure
Hub102 - Lesson4 - Data Structure
Tiểu Hổ
 
MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329
Douglas Duncan
 

Similar to B+Tree Indexes and InnoDB (20)

Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashing
 
Index management in shallow depth
Index management in shallow depthIndex management in shallow depth
Index management in shallow depth
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index Cookbook
 
SAG_Indexing and Query Optimization
SAG_Indexing and Query OptimizationSAG_Indexing and Query Optimization
SAG_Indexing and Query Optimization
 
Tunning overview
Tunning overviewTunning overview
Tunning overview
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
 
Statistics and Indexes Internals
Statistics and Indexes InternalsStatistics and Indexes Internals
Statistics and Indexes Internals
 
Ils on a shoe string budget
Ils on a shoe string budgetIls on a shoe string budget
Ils on a shoe string budget
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
 
Mysql query optimization
Mysql query optimizationMysql query optimization
Mysql query optimization
 
DynamodbDB Deep Dive
DynamodbDB Deep DiveDynamodbDB Deep Dive
DynamodbDB Deep Dive
 
PostgreSQL 9.0 & The Future
PostgreSQL 9.0 & The FuturePostgreSQL 9.0 & The Future
PostgreSQL 9.0 & The Future
 
SqlDay 2018 - Brief introduction into SQL Server Execution Plans
SqlDay 2018 - Brief introduction into SQL Server Execution PlansSqlDay 2018 - Brief introduction into SQL Server Execution Plans
SqlDay 2018 - Brief introduction into SQL Server Execution Plans
 
The Anatomy of a Large-Scale Hypertextual Web Search Engine
The Anatomy of a Large-Scale Hypertextual Web Search EngineThe Anatomy of a Large-Scale Hypertextual Web Search Engine
The Anatomy of a Large-Scale Hypertextual Web Search Engine
 
Hub102 - Lesson4 - Data Structure
Hub102 - Lesson4 - Data StructureHub102 - Lesson4 - Data Structure
Hub102 - Lesson4 - Data Structure
 
MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329
 

Recently uploaded

Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Muhammad Ali
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
moinahousna
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
alexjohnson7307
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Networks
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
Jimmy Lai
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
HackersList
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
Tatiana Al-Chueyr
 
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes..."Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
Anant Gupta
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
digitalxplive
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
LINUS PROJECTS (INDIA)
 

Recently uploaded (20)

Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
IPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite SolutionIPLOOK Remote-Sensing Satellite Solution
IPLOOK Remote-Sensing Satellite Solution
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes..."Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
The Rise of AI in Cybersecurity How Machine Learning Will Shape Threat Detect...
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
 

B+Tree Indexes and InnoDB

  • 1. B+Tree Indexes and InnoDB Ovais Tariq Percona Live London 2011
  • 2. www.percona.com Agenda •  Why Index •  What is a B+Tree •  B+Tree Characteristics •  Cost Estimation Formulae •  A Few Advantages •  B+Tree Index in InnoDB •  Primary and Secondary B+Tree Indexes •  Characteristics of an Ideal Primary Index
  • 3. www.percona.com Agenda (Cont ..) •  In-order INSERTs vs Random-order INSERTs •  Composite Indexes •  B+Tree and Index Prefix •  Index Selectivity •  Speeding up Secondary Indexes •  Tips and Take-away •  Percona Live London Sponsors •  Annual MySQL Users Conference
  • 4. www.percona.com Why Index? •  Linear search is very slow, complexity is O(n) •  Indexes are used in variety of DBMS •  Many different type of indexes •  Hash Indexes (only MEMORY SE and NDB) •  Bitmap Indexes (not available in MySQL) •  BTree Indexes and derivates (MyISAM, InnoDB) •  Indexes improve search performance •  But add extra cost to INSERT/UPDATE/ DELETE
  • 5. www.percona.com What is a B+Tree •  A generalized version of Binary Search Tree http://en.wikipedia.org/wiki/Binary_search_tree •  Classic disk based structure for indexing records based on an ordered key set •  Reading a single record from a very large table, results in only a few pages of data being read •  Any index structure other then B+Tree is subject to overflow
  • 6. www.percona.com B+Tree Characteristics •  Every node can have p – 1 key values and p node pointers (p is called the order of the tree) •  The leaf node contains data, internal nodes are only used as guides •  The leaf nodes are connected together as doubly linked list •  Keys are stored in the nodes in sorted order •  All leaf nodes are at the same height, that’s why it’s called a balanced tree
  • 8. www.percona.com Cost Estimation Formulae •  Some Assumptions •  Cost Calculations •  A Few Extra Considerations
  • 9. www.percona.com Some Assumptions •  h is the height of the tree •  p is the branching factor of the tree •  n is the number of rows in a table •  p = (page size in bytes/key length in bytes) + 1 •  h > log n / log p
  • 10. www.percona.com Cost Calculations •  Search cost for a single row •  S = h I/O ops •  Update cost for a single row •  U = search cost + rewrite data page = h + 1 I/O ops •  Insert cost for a single row •  I = search cost + rewrite index page + rewrite data page •  I = h + 1 + 1 = h + 2 I/O ops
  • 11. www.percona.com Cost Calculations (Cont ..) •  Delete cost for a single row •  D = search cost + rewrite index page + rewrite data page •  D = h + 1 + 1 = h + 2 I/O ops
  • 12. www.percona.com A Few Extra Considerations •  Updates are in place only if the new data is of the same size, otherwise its delete plus insert •  Inserts may require splits if the leaf node is full •  Occasionally the split of a leaf node necessitates split of the next higher node •  In worst case scenarios the split may cascade all the way up to the root node •  Deletions may result in emptying a node that necessitates the consolidation of two nodes
  • 13. www.percona.com A Few Advantages •  Reduced I/O •  Reduced Rebalancing •  Extremely efficient range scans •  Implicit sorting
  • 14. www.percona.com Reduced I/O •  Height of a B+Tree is very small (and has a very large branching factor) •  Generally every node in a tree corresponds to a page of data (page size ranges from 211 to 214 bytes) •  A node read = read a page = 1 random I/O •  So to reach leaf node, we need to read h pages •  No matter if requested row is at the start or end of table, same number of I/O is needed
  • 15. www.percona.com Reduced Rebalancing •  A tree needs rebalancing after an insertion or deletion •  B+Tree is wide, more keys can fit in node, so rebalancing needed few times on insertions and deletions •  Note that rebalancing means extra I/O, so rebalancing saved is I/O saved
  • 16. www.percona.com Extremely Efficient Range Scans •  Leaf nodes are linked together as doubly linked list •  So need to traverse from root -> leaf just once •  Move from leaf -> leaf until you reach the end of range •  Entire tree may be scanned without visiting the higher nodes at all
  • 17. www.percona.com Implicit Sorting •  Nodes contain keys sorted in key-order •  Therefore records can be implicitly returned in sorted order •  No external sorting needed hence memory and CPU cycles saved •  Sometimes sorted data cannot fit into buffer, and data needs to be sorted in passes, needing I/O, which can be avoided if you need data in key order
  • 18. www.percona.com B+Tree Index in InnoDB •  B+Tree Index in InnoDB is a typical B+Tree structure, no strings attached! •  Leaf nodes contain the data (what the data is depends whether it’s a Primary Index or a Secondary Index) •  Root nodes and internal nodes contain only key values
  • 19. www.percona.com A Typical Index EMP_NO 1 to 1000000 EMP_NO 1 to 500000 Employee details for EMP_NO 1 to 1000 Employee details for EMP_NO 1001 to 2000 EMP_NO 500001 to 1000000 Employee details for EMP_NO 998001 to 999000 Employee details for EMP_NO 999001 to 1000000 Root node Internal nodes Leaf nodes
  • 20. www.percona.com Primary and Secondary B+Tree Indexes •  Primary index holds the entire row data in its leaf nodes •  Primary index can also be called a clustered index, because data is clustered around PK values •  A single PK per table means, a single clustered index per table •  Secondary Indexes have the key values and PK values in the index and no row data
  • 21. www.percona.com Primary and Secondary B+Tree Indexes (Cont .. ) •  PK values stored in the leaf nodes of a secondary index act as pointer to the data •  This means secondary index lookups are two lookups •  Cost of secondary index lookup •  C = Height of Secondary Index B+Tree + Height of Primary Index B+Tree
  • 22. www.percona.com A Typical Secondary Index FIRST_NAME A to Z FIRST_NAME A to M FIRST_NAME & PK Column values (Anna, 1) … FIRST_NAME & PK Column values (Jacob, 10000) … FIRST_NAME N to Z FIRST_NAME & PK Column values (Nathan, 100000) … FIRSY_NAME & PK Column values (Zita, 1000000) … EMP_NO 1 to 1000000 EMP_NO 1 to 500000 Employee details for EMP_NO 1 to 1000 Employee details for EMP_NO 1001 to 2000 EMP_NO 500001 to 1000000 Employee details for EMP_NO 998001 to 999000 Employee details for EMP_NO 999001 to 1000000 Secondary Index Primary Index
  • 23. www.percona.com Characteristics of an Ideal Primary Index •  Create primary index on column(s) that are not updated too often •  Keep the size of the primary index as small as possible •  Select the column(s) to create primary index on, that have sequentially increasing value •  Random value columns, such as those that store UUID, are very bad candidates for primary index
  • 24. www.percona.com In-order INSERTs vs Random- order INSERTs •  In-order INSERTs result in good page fill percentage, meaning InnoDB can keep on inserting in the same page till its full •  Good insert speed, good page fill percentage •  Reduced page and extent fragmentation Leaf Page 001 (1, ‘Alister’) (2, ‘Anna’) ….. (100, ‘Cathy’) Leaf Page 002 (101, ‘Celvin’) (102, ‘Donald’) ….. (200, ‘Frank’) Leaf Page 100 (10001, ‘Stan’) (10002, ‘Steve’) ….. (10100, ‘Suzzan’)
  • 25. www.percona.com In-order INSERTs vs Random- order INSERTs (Cont ..) •  Random-order inserts introduce overhead •  Result in page and extent fragmentation •  Bad insert speed and bad page fill percentage (resulting in wasted space) •  Data is not actually physically clustered together •  Scanning ranges do not result in pages read in sequential order •  That is why UUID is not a good PK candidate
  • 26. www.percona.com In-order INSERTs vs Random- order INSERTs (Cont ..) Leaf Page 101 (10101, ‘Alister’) (10102, ‘Anna’) ….. (10200, ‘Cathy’) Leaf Page 96 (10201, ‘Celvin’) (10202, ‘Donald’) ….. (10300, ‘Frank’) Leaf Page 102 (10301, ‘Stan’) (10302, ‘Steve’) ….. (10400, ‘Suzzan’) Leaf Page 101 (10101, ‘Alister’) Leaf Page 102 (10201, ‘Celvin’) (10202, ‘Donald’) Leaf Page 103 (10301, ‘Stan’) (10302, ‘Steve’) ….. (10400, ‘Suzzan’) Extent Fragmentation Page Fragmentation
  • 27. www.percona.com Example Schema CREATE TABLE `employees` ( `emp_no` int(11) NOT NULL, `birth_date` date NOT NULL, `first_name` varchar(14) NOT NULL, `last_name` varchar(16) NOT NULL, `gender` enum('M','F') NOT NULL, `hire_date` date NOT NULL, PRIMARY KEY (`emp_no`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  • 28. www.percona.com Composite Indexes •  A single index can be defined on more than one column •  The index key is composed of more than one key value •  Let’s consider a query •  SELECT emp_no, first_name, last_name FROM employees WHERE hire_date = '1985-03-22' AND last_name = 'Peek';
  • 29. www.percona.com Composite Indexes (Cont ..) •  One way of indexing is to create two separate indexes on hire_date and last_name columns •  Search cost would be •  S = h(hire_date) + h(last_name) + Merge and Intersect cost •  S = h1 + h2 I/O ops + Merge and Intersect cost •  Consider if you create a composite index (hire_date, last_name) •  The cost of composite index is one lookup
  • 30. www.percona.com Composite Indexes (Cont ..) •  Composite index will not require extra merge- intersect step, hence saving memory and CPU cycles •  To generalize, if a composite index has k columns, then equivalent cost in single value indexes is k index lookups •  Similarly, k single value indexes will need more pages to be read into memory
  • 31. www.percona.com Composite Indexes (Cont ..) •  Suppose you have •  1000 rows match hire_date = '1985-03-22’ •  1000 rows that match last_name = 'Peek’ •  4 rows that match both conditions •  See less pages to load into memory when using composite index
  • 32. www.percona.com B+Tree and Index Prefix •  By design B+Tree can only work with filters that filter at least on the prefix of the index •  So an index idx(first_name, last_name) can only be used for following searches •  SELECT … WHERE first_name = x AND last_name = y •  SELECT … WHERE first_name = x •  And cannot be used for the following search •  SELECT … WHERE last_name = y
  • 33. www.percona.com B+Tree and Index Prefix (Cont ..) •  Same rule also hold for single column indexes •  So an index idx(first_name) can only be used for following searches •  SELECT … WHERE first_name LIKE ‘ova%’ •  But cannot be used for following •  SELECT … WHERE first_name LIKE ‘%ais’ •  If an index cannot be used that means a table scan
  • 34. www.percona.com Index Selectivity •  What is selectivity? •  Selectivity = unique values / total no. of records •  Primary Index is the most selective index •  Suppose you index a column that stores gender, meaning only two distinct values •  Remember secondary index only store a pointer to the data in the primary index •  Indexing a gender column means each key value with thousands of PK pointers
  • 35. www.percona.com Index Selectivity (Cont ..) •  Each pointer lookup will be a random PK lookup •  Its much better to scan the PK in order and filter by gender •  But you can improve the selectivity of a column by combining it with other columns and creating a composite index
  • 36. www.percona.com Speeding up Secondary Indexes •  Remember secondary indexes only store PK pointers meaning two index lookups •  Performance can be dramatically improved if we avoid extra PK lookups •  The trick is to include all the columns queried, in the definition of the secondary index •  Example query •  SELECT emp_no, first_name, last_name FROM employees WHERE hire_date = '1985-03-22' AND last_name = 'Peek';
  • 37. www.percona.com Speeding up Secondary Indexes (Cont ..) •  Originally the index is idx(hire_date, last_name) •  Let’s try to modify the index •  No need to add the emp_no column as its PK •  Add first_name column to right of index definition •  idx(hire_date, last_name, first_name) •  Note we add the column to the right of index definition, remember B+Tree can only filter on prefix of index •  This is known as covering index optimization
  • 38. www.percona.com Tips and Take-away •  Indexing should always be used to speed up access •  Index trade-off analysis can be done easily using the cost estimation formulae discussed •  Select optimal data types for columns, especially ones that are to be indexed – int vs bigint •  When selecting columns for PK, select those that would make the PK short, sequential and with few updates
  • 39. www.percona.com Tips and Take-away (Cont ..) •  Avoid using UUID style PK definitions •  Insert speed is best when you insert in PK order •  When creating index on string columns, you don’t need to index the entire column, you can index a prefix of the column – idx(str_col(4)) •  B+Tree indexes are only suitable for columns with good selectivity •  Don’t shy away from creating composite indexes
  • 40. www.percona.com Percona Live London Sponsors Platinum Sponsor Gold Sponsor Silver Sponsors
  • 41. www.percona.com Percona Live London Sponsors Exhibitor Sponsors Friends of Percona Sponsors Media Sponsors
  • 42. www.percona.com Annual MySQL Users Conference Presented by Percona Live The Hyatt Regency Hotel, Santa Clara, CA April 10th-12th, 2012 Featured Speakers Mark Callaghan, Facebook Jeremy Zawodny, Craigslist Marten Mickos, Eucalyptus Systems Sarah Novotny, Blue Gecko Peter Zaitsev, Percona Baron Schwartz, Percona The Call for Papers is Now Open! Visit www.percona.com/live/mysql-conference-2012/