
The InnoDB Storage Engine for MySQL

Morgan Tocker, MySQL Community Manager

http://www.tocker.ca/

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.

The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
4 Years of MySQL Innovation
• MySQL 5.5, MySQL 5.6, MySQL 5.7
• MySQL Cluster 7.1, 7.2 and 7.3; MySQL Cluster Manager
• MySQL Workbench 5.2 & 6.0; MySQL Migration Wizard
• Windows installer & Tools; MySQL Utilities
• MySQL Enterprise Monitor 2.3 & 3.0; MySQL Enterprise Backup
• MySQL Enterprise: Security, Scalability, HA, Audit
• MySQL Applier for Hadoop
• Oracle Certifications
Hello and Welcome!
• I will be talking about InnoDB’s internal behaviour.
• Not talking (much) about MySQL.
• The aim of this talk is to give you X-ray vision.
• i.e. not many direct takeaways, but one day it will help you debug a problem.
Prerequisites

Copyright © 2012 Oracle and/or its affiliates. All rights reserved.
MySQL Architecture
IO Performance
Operation                               Latency
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns
Compress 1K bytes with Zippy              3,000 ns
Send 2K bytes over 1 Gbps network        20,000 ns
Read 1 MB sequentially from memory      250,000 ns
Round trip within same datacenter       500,000 ns
Disk seek                            10,000,000 ns
Read 1 MB sequentially from disk     20,000,000 ns
Send packet CA->Netherlands->CA     150,000,000 ns

See: http://www.linux-mag.com/cache/7589/1.html and Google's LADIS 2009 keynote: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
IO Performance (cont.)
• 5-10 ms per disk IO.
• Maybe 50 µs for a high-end SSD.
• Still not “memory speed”.
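As a rough back-of-the-envelope check on these numbers, per-IO latency can be turned into the number of IOs a device completes per second when requests are issued one at a time. This is a minimal sketch that assumes a single outstanding request (no queueing, no parallelism inside the device); the helper name ios_per_second is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: synchronous IOs per second for a given per-IO
 * latency in microseconds, assuming only one request is in flight. */
static double ios_per_second(double latency_us)
{
    assert(latency_us > 0.0);
    return 1000000.0 / latency_us;   /* 1 second = 1,000,000 us */
}
```

With these figures, 5 ms gives 200 IOs per second, 10 ms gives 100, and 50 µs gives about 20,000: two orders of magnitude better than a disk, but still far from the 100 ns of a main memory reference.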
Buffered IO
• Operating systems already compensate well.
• Reads are cached using free memory.
• Writes don’t happen instantly.
• A step is introduced to reorder and merge writes.
Example: writes requested as blocks 9, 10, 1, 4, 200, 5 are issued as blocks 1, 4, 5, 9, 10, 200.
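The reordering in the example can be sketched as sorting the pending block numbers, in the spirit of an elevator-style IO scheduler, and then counting how many contiguous runs remain to be merged. This is a simplified illustration, not how any particular kernel implements it; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: ascending block numbers. */
static int cmp_block(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort pending writes by block number so the disk
 * head sweeps in one direction instead of seeking back and forth. */
static void reorder_pending_writes(unsigned long *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    qsort(blocks, n, sizeof blocks[0], cmp_block);
}

/* Count contiguous runs in a sorted array; each run could be issued
 * as one merged sequential write. */
static size_t count_runs(const unsigned long *blocks, size_t n)
{
    size_t runs = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || blocks[i] != blocks[i - 1] + 1)
            runs++;   /* gap before this block: a new run starts */
    }
    return runs;
}
```

For the blocks above, sorting gives 1, 4, 5, 9, 10, 200, which forms four runs (1; 4-5; 9-10; 200), so six scattered writes can be issued as four sequential ones.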
fsync

Synopsis
#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);


Description
fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by
the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks
until the device reports that the transfer has completed. It also flushes metadata information associated with the file
(see stat(2)).
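A minimal POSIX sketch of making a write durable: write() only hands the data to the operating system's buffered IO layer, and fsync() blocks until the device reports the transfer has completed. The helper name durable_append is hypothetical; it retries short and interrupted writes but is otherwise deliberately simple.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: append buf to the file at path and make it durable.
 * Returns 0 on success, -1 on error (errno set by the failing call). */
static int durable_append(const char *path, const void *buf, size_t len)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* data lands in the OS page cache */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Block until the device reports the data has been transferred. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call the function would usually return sooner, but a power loss could discard data that was still only in the page cache.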
Basic Operation

InnoDB High Level Overview
[Diagram: InnoDB components.
In memory (caching): Buffer Pool (Buffer Pool LRU, Buffer Pool Flush List, Page Cache), Adaptive Hash Indexes, Data Dictionary Cache, Additional Mem Pool, Log Buffer, Transaction System.
On disk (storage): system tablespace ibdata1 (space 0) containing IBUF_HEADER, IBUF_TREE, TRX_SYS, FIRST_RSEG, DICT_HDR, the Doublewrite Buffer (Block 1 and Block 2, 64 pages each) and the data dictionary (SYS_TABLES, SYS_COLUMNS, SYS_INDEXES, SYS_FIELDS); the log group (iblogfile0, iblogfile1, iblogfile2); tables with file_per_table (A.ibd, B.ibd, C.ibd).]
Query (pages not in buffer pool)
mysqld:
SELECT * FROM a
WHERE id = 10;
[Diagram: InnoDB high level overview. The page is "Not Found" in the buffer pool, so it has to be read from storage.]
Query (pages in buffer pool)
mysqld:
SELECT * FROM a
WHERE id = 10;
[Diagram: InnoDB high level overview. The page is already in the buffer pool, so no read from storage is needed.]
Update Query in a Transaction (simplified)
mysqld:
UPDATE a SET col1 =
'new' WHERE id = 10;
commit;
[Diagram: InnoDB high level overview, following the update and commit through InnoDB.]
Log files
• Provide recovery.
• Only written to in regular operation.
• Only need to be read if there is a crash.
• Are rewritten over and over again.
• Think of them like a tank tread.
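The tank tread can be modelled as a fixed-size circular log addressed by an ever-increasing log sequence number (LSN). Everything written since the last checkpoint is still needed for recovery; once a checkpoint advances (because the matching dirty pages have been flushed), that space can be reused. This is a simplified sketch, not InnoDB's redo log code; struct redo_log and its functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a circular redo log. LSNs only ever increase;
 * the physical offset is the LSN modulo the log capacity. */
struct redo_log {
    uint64_t capacity;        /* combined size of all log files, in bytes */
    uint64_t write_lsn;       /* next byte to be written */
    uint64_t checkpoint_lsn;  /* changes before this are in the tablespace */
};

/* Bytes that can be written before still-needed records are overwritten. */
static uint64_t log_free_space(const struct redo_log *redo)
{
    assert(redo->write_lsn >= redo->checkpoint_lsn);
    return redo->capacity - (redo->write_lsn - redo->checkpoint_lsn);
}

/* Append len bytes. Returns the physical start offset of the record, or -1
 * if a checkpoint (flushing dirty pages) must happen first. */
static int64_t log_append(struct redo_log *redo, uint64_t len)
{
    if (len > log_free_space(redo))
        return -1;
    int64_t offset = (int64_t)(redo->write_lsn % redo->capacity);
    redo->write_lsn += len;   /* the record may wrap past the physical end */
    return offset;
}

/* A checkpoint records that pages modified up to lsn are now on disk. */
static void log_checkpoint(struct redo_log *redo, uint64_t lsn)
{
    assert(lsn >= redo->checkpoint_lsn && lsn <= redo->write_lsn);
    redo->checkpoint_lsn = lsn;
}
```

When log_append returns -1, no new changes can be logged until a checkpoint frees space, which is why the log file size limits how many dirty pages can be held in memory.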
Log files (cont.)
• Are an optimization!
• Log writes are 512B-aligned sequential writes.
• Tablespace writes are 16KiB random writes.
• Tablespace writes to the same pages within a short time window can be merged.
• Just need a large enough log file.
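To see why merging matters, the sketch below counts how many 16KiB page writes a batch of modifications needs at checkpoint time, assuming each modification touches exactly one page and repeated changes to the same page are flushed once. The page numbers and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INNODB_PAGE_SIZE 16384u  /* default page size, 16KiB */

/* Hypothetical helper: number of distinct pages among n modifications.
 * O(n^2) for clarity; a real implementation would use a hash set. */
static size_t distinct_pages(const uint32_t *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && pages[j] != pages[i])
            j++;
        if (j == i)   /* first time this page number appears */
            count++;
    }
    return count;
}

/* Bytes written to the tablespace at checkpoint for these modifications. */
static uint64_t tablespace_bytes(const uint32_t *pages, size_t n)
{
    return (uint64_t)distinct_pages(pages, n) * INNODB_PAGE_SIZE;
}
```

For example, modifications to pages 7, 7, 9, 7, 9 and 12 cost only three page writes; the longer the log lets pages stay dirty, the more such repeats can be absorbed.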
Checkpoint (Background Activity)
mysqld: (nothing)
[Diagram: InnoDB high level overview. No query is running; modified pages are flushed from the buffer pool to the tablespaces in the background.]
FAQ
• Q: What do we write to the log - is it committed data
only, or can we write uncommitted data as well?
• A: Both.
FAQ
• Q: How do you unapply transactions?
• A: UNDO space.

Think of it like a hidden table internally stored in
ibdata1.
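A toy illustration of how UNDO lets a transaction be unapplied: before a row is modified in place, its previous version and transaction ID are copied to an undo record, and the row keeps a pointer to that record (compare the roll pointer in the row format). Rollback copies the old version back. InnoDB's real undo logs live in rollback segments and are considerably more involved; all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COL_LEN 16

struct undo_rec;

/* Hypothetical row: hidden transaction ID plus a pointer to the undo
 * record that holds its previous version. */
struct row {
    uint64_t trx_id;
    char col1[COL_LEN];
    struct undo_rec *roll_ptr;
};

/* Hypothetical undo record: a copy of the row before the change. */
struct undo_rec {
    uint64_t old_trx_id;
    char old_col1[COL_LEN];
    struct undo_rec *prev;   /* older version, if any */
};

/* Save the old version to undo, then modify the row in place. */
static void row_update(struct row *r, struct undo_rec *u,
                       uint64_t trx_id, const char *new_val)
{
    u->old_trx_id = r->trx_id;
    memcpy(u->old_col1, r->col1, COL_LEN);
    u->prev = r->roll_ptr;
    r->roll_ptr = u;
    r->trx_id = trx_id;
    strncpy(r->col1, new_val, COL_LEN - 1);
    r->col1[COL_LEN - 1] = '\0';
}

/* Unapply the most recent change by copying the undo image back. */
static void row_rollback(struct row *r)
{
    struct undo_rec *u = r->roll_ptr;
    assert(u != NULL);
    r->trx_id = u->old_trx_id;
    memcpy(r->col1, u->old_col1, COL_LEN);
    r->roll_ptr = u->prev;
}
```

Because the old version stays reachable through roll_ptr, the same record can also serve readers that still need to see the row as it was.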
Update Query (More Accurate*)
mysqld:
UPDATE a SET col1 =
'new' WHERE id = 10;
commit;
[Diagram: InnoDB high level overview with three callouts.]
• Modify row in place.
• Redirect older version of row to undo space.
• Update any indexes to hold both versions.
Update Query (cont.)
• A background purge process can clean old row versions out of UNDO as soon as the oldest transaction moves forward.
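The purge condition can be sketched with logical timestamps: an old row version in UNDO is only needed by transactions that started before the change replacing it was committed, so once the oldest active transaction started after that commit, the undo entry can be removed. This is a simplified model under that assumption (InnoDB tracks visibility with read views); the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical undo entry: logical commit time of the transaction that
 * replaced the old row version stored in this entry. */
struct undo_entry {
    uint64_t writer_commit_no;
};

/* The old version is needed by any transaction that started before the
 * writer committed; once the oldest active one started later, purge it. */
static bool undo_purgeable(const struct undo_entry *e,
                           uint64_t oldest_active_start_no)
{
    return e->writer_commit_no < oldest_active_start_no;
}

/* Count entries a background purge pass could remove right now. */
static size_t purge_pass(const struct undo_entry *undo, size_t n,
                         uint64_t oldest_active_start_no)
{
    assert(undo != NULL || n == 0);
    size_t purged = 0;
    for (size_t i = 0; i < n; i++)
        if (undo_purgeable(&undo[i], oldest_active_start_no))
            purged++;
    return purged;
}
```

A single long-running transaction keeps oldest_active_start_no low, so nothing newer can be purged and UNDO keeps growing until it finishes.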
Summarized Performance Characteristics
• Log Files:
  • Are short sequential writes.
  • Let InnoDB delay tablespace writes, enabling more merging and optimization.
• Buffer Pool:
  • “In-memory version of the tablespace”.
  • Pages are loaded and evicted via a modified LRU algorithm.
Index Structure
• Indexes and “data” in InnoDB are B+Trees.
• Clustered Index design means that data itself is
stored in an index.
Index Structure (cont.)
Empty root
[Diagram: root page 3 at level 0, containing only the Infimum and Supremum records, joined by a next-record pointer.]
Index Structure (cont.)
Insert: 1
Page size is 16KB; B* is 2KB.
[Diagram: root page 3 (level 0) now holds Infimum, record 1 with its B* value, and Supremum.]
Index Structure (cont.)
Insert: 1 to 7
[Diagram: root page 3 (level 0) holds records 1 to 7, each with a B* value, between Infimum (I) and Supremum (S).]
Index Structure (cont.)
Insert: 8
1. Allocate new page and link in root.
2. Move records to new page.
3. Split new page.
[Diagram: page 3 becomes the level 1 root, with an entry for key 1 pointing to page 4; page 4 is a new level 0 leaf holding records 1 to 7.]
Index Structure (cont.)
Insert: 8 (cont.)
Split at the middle of the original page.
[Diagram: root page 3 (level 1) has entries 1 → page 4 and 4 → page 5. Leaf page 4 keeps records 1 to 3; leaf page 5 holds records 4 to 7.]
Index Structure (cont.)
Insert: 9 and 10
[Diagram: root page 3 (level 1) has entries 1 → page 4 and 4 → page 5. Leaf page 4 holds records 1 to 3; leaf page 5 holds records 4 to 10.]
Index Structure (cont.)
Insert: 11
Insert leads to a split at the insertion point.
[Diagram: root page 3 (level 1) has entries 1 → page 4, 4 → page 5 and 11 → page 6. Leaf page 4 holds records 1 to 3, leaf page 5 holds records 4 to 10, and the new leaf page 6 holds record 11.]
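The two split behaviours above can be sketched as a choice of how many records stay on the original page. This simplified model assumes 7 records per page as in the diagrams, and splits at the insertion point only when the new record goes at the end of the page and the previous insert on that page did too (a sequential pattern); otherwise it splits in the middle. It mirrors the slides rather than InnoDB's exact code, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECS_PER_PAGE 7   /* illustrative: 16KB page, 2KB records */

/* Hypothetical: for a full page of n records, where the new record belongs
 * at slot insert_pos (0..n), return how many records stay on the original
 * page; the rest, plus possibly the new record, go to a new page. */
static size_t split_point(size_t n, size_t insert_pos, bool prev_insert_at_end)
{
    assert(n == RECS_PER_PAGE && insert_pos <= n);
    if (insert_pos == n && prev_insert_at_end)
        return n;        /* sequential: new page starts with the new record */
    return n / 2;        /* otherwise split at the middle of the page */
}
```

In this model, insert 8 lands on a page whose records were just moved there, with no insert history, so it splits in the middle (records 1 to 3 stay). Insert 11 follows 9 and 10 at the end of page 5, so it splits at the insertion point and record 11 starts page 6, leaving page 5 full.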
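Index Structure
[Diagram: a three-level B+Tree with keys 0 to 7 (values A to H).
Level 2 (root): page 3, entries ≥0 → page 4, ≥4 → page 5.
Level 1 (internal): page 4, entries ≥0 → page 6, ≥2 → page 7; page 5, entries ≥4 → page 8, ≥6 → page 9.
Level 0 (leaf): page 6 (0 A, 1 B), page 7 (2 C, 3 D), page 8 (4 E, 5 F), page 9 (6 G, 7 H).
Each page starts with an Infimum (I) and ends with a Supremum (S) record; records are chained by next-record pointers, and pages on the same level are linked by next/prev page pointers.]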
Page Format
[Diagram: layout of a 16,384-byte page.]
Bytes 0-37: FIL Header (38 bytes).
Bytes 38-16375: other headers and page data, depending on page type. Total usable space: 16,338 bytes.
Bytes 16376-16383: FIL Trailer (8 bytes).
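The offsets in the page format can be checked with simple arithmetic: the FIL trailer starts at 16384 - 8 = 16376, and the space between the 38-byte header and the trailer is 16376 - 38 = 16338 bytes. The constant and function names below are illustrative, not InnoDB's own identifiers.

```c
#include <assert.h>

/* Sizes from the page format diagram (default 16KiB page). */
enum {
    PAGE_BYTES        = 16384,  /* total page size */
    FIL_HEADER_BYTES  = 38,     /* FIL header at offset 0 */
    FIL_TRAILER_BYTES = 8,      /* FIL trailer at the end of the page */
};

/* Offset where page-type-specific headers and data begin. */
static int page_body_start(void) { return FIL_HEADER_BYTES; }

/* Offset where the FIL trailer begins. */
static int fil_trailer_start(void) { return PAGE_BYTES - FIL_TRAILER_BYTES; }

/* Bytes available between the FIL header and the FIL trailer. */
static int page_usable_bytes(void)
{
    assert(fil_trailer_start() > page_body_start());
    return fil_trailer_start() - page_body_start();
}
```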
Row Format
Before N-5   Variable field lengths (1-2 bytes per var. field)
N-5          Info Flags (4 bits), Number of Records Owned (4 bits)
N-4          Order (13 bits), Record Type (3 bits)
N-2          Next Record Offset (2 bytes)
N            Cluster Key Fields (k bytes)
N+k          Transaction ID (6 bytes)
N+k+6        Roll Pointer (7 bytes)
N+k+13       Non-Key Fields (j bytes), ending at N+k+13+j
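The 5-byte header in front of the record origin N can be decoded with shifts and masks. This sketch assumes the COMPACT row format layout: info flags in the high 4 bits and records-owned in the low 4 bits of byte N-5; the 13-bit order (heap number) and 3-bit record type packed big-endian into bytes N-4 and N-3; and a 2-byte big-endian next-record offset, relative to the origin, in bytes N-2 and N-1. The struct and function names are hypothetical; InnoDB's rem0rec.h has the authoritative definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the 5-byte record header. */
struct rec_header {
    unsigned info_flags;   /* 4 bits, e.g. the delete mark */
    unsigned n_owned;      /* 4 bits, number of records owned */
    unsigned order;        /* 13 bits, heap number */
    unsigned rec_type;     /* 3 bits */
    uint16_t next_offset;  /* 2 bytes, relative to the record origin */
};

/* origin points at byte N; the header occupies origin[-5] .. origin[-1]. */
static struct rec_header decode_rec_header(const uint8_t *origin)
{
    assert(origin != 0);
    struct rec_header h;
    uint16_t order_type = (uint16_t)((origin[-4] << 8) | origin[-3]);

    h.info_flags  = origin[-5] >> 4;
    h.n_owned     = origin[-5] & 0x0F;
    h.order       = order_type >> 3;
    h.rec_type    = order_type & 0x07;
    h.next_offset = (uint16_t)((origin[-2] << 8) | origin[-1]);
    return h;
}

/* Offsets (from N) of the fields after a clustered key of k bytes. */
static unsigned trx_id_offset(unsigned k)   { return k; }       /* 6 bytes */
static unsigned roll_ptr_offset(unsigned k) { return k + 6; }   /* 7 bytes */
static unsigned non_key_offset(unsigned k)  { return k + 13; }  /* j bytes */
```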
Conclusion
• The page is the basic unit of storage.
• Default page size is 16KiB.
• Rows are variable length.
Two more useful features
• Adaptive hash index: a partial hash index to accelerate secondary key lookups.
• Change buffering: when pages of non-unique indexes are not in memory, changes to them can be temporarily buffered until the pages are read in.
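Change buffering can be sketched as follows: if the secondary index page is in the buffer pool, apply the change directly; otherwise remember it and merge the pending changes when the page is eventually read in. Only non-unique indexes qualify because a unique index must read the page to check for duplicates. This toy model tracks one page and counts inserted entries; the names are hypothetical, and the real change buffer is itself a B-tree (IBUF_TREE) in the system tablespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFFERED 64   /* arbitrary limit for the toy buffer */

/* Hypothetical secondary index leaf page: only tracks its entry count. */
struct index_page {
    bool in_buffer_pool;
    size_t n_entries;
};

/* Hypothetical change buffer for a single page. */
struct change_buffer {
    size_t pending_inserts;
};

/* Insert into a non-unique secondary index. If the page is not cached,
 * no random read is done: the change is buffered instead. */
static void secondary_insert(struct index_page *p, struct change_buffer *cb)
{
    if (p->in_buffer_pool) {
        p->n_entries++;
    } else {
        assert(cb->pending_inserts < MAX_BUFFERED);
        cb->pending_inserts++;
    }
}

/* When the page is finally read into memory, merge buffered changes. */
static void page_read_in(struct index_page *p, struct change_buffer *cb)
{
    p->in_buffer_pool = true;
    p->n_entries += cb->pending_inserts;
    cb->pending_inserts = 0;
}
```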
Query (by secondary key)
mysqld:
SELECT * FROM a
WHERE b_key = 10;
[Diagram: InnoDB high level overview, following a lookup by secondary key.]
Update Query (large table)
mysqld:
UPDATE a SET col1 =
'new' WHERE id = 10;
commit;
[Diagram: InnoDB high level overview, with one step marked "Not Required".]
New Features

MySQL 5.5+
• IO Scalability
  • Async IO
  • Multiple Buffer Pools
  • Adaptive Flushing
  • Scan Resistant LRU
  • Compressed Pages
• CPU Scalability
  • Improved Atomics
  • Spin Loops with PAUSE
MySQL 5.6+
• LRU Dump and Restore
• Improved Group Commit
• Fulltext Search
• Fast Read-only Transactions
• Memcached Interface
• Information Schema metadata tables
• Persistent Statistics
• Variable Page Size
• Online DDL
• Transportable Tablespace
• Transactional Replication (using InnoDB)
MySQL 5.7+
• Faster Temporary Tables
• Index Lock Contention Reduction
• More Online DDL
• Extend VARCHAR
• Rename Index
• Improved Read-Only Transactions
• Improved CPU Scalability
Configuration

The Top 3
1. innodb-buffer-pool-size
2. innodb-log-file-size
3. innodb_flush_log_at_trx_commit
innodb-buffer-pool-size
• Really the only major buffer/cache setting to set.
• Holds all page types (data, indexes, undo, insert buffer, ...).
innodb-buffer-pool-size (cont.)
• Recommendation is 50-80% of RAM.
• Default is 128M.
• Allow an extra 5-10% on top for other metadata to grow.
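As a worked example of the guideline, the sketch computes the 50-80% range for a given amount of RAM and adds the upper 10% of the buffer pool to budget for other metadata. The percentages come from the slide; the function names are hypothetical, any unit works (MiB below), and the result is a starting point rather than a rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pct percent of total, rounded down. */
static uint64_t percent_of(uint64_t total, unsigned pct)
{
    assert(pct <= 100);
    return total * pct / 100;
}

/* Suggested innodb_buffer_pool_size range: 50-80% of RAM. */
static void buffer_pool_range(uint64_t ram, uint64_t *low, uint64_t *high)
{
    *low = percent_of(ram, 50);
    *high = percent_of(ram, 80);
}

/* Buffer pool plus up to 10% for other InnoDB metadata. */
static uint64_t with_overhead(uint64_t buffer_pool)
{
    return buffer_pool + percent_of(buffer_pool, 10);
}
```

For a server with 32768 MiB of RAM this suggests a buffer pool of roughly 16384 to 26214 MiB, and at the top of that range up to about 2.6 GiB more for metadata, for a total near 28835 MiB.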
innodb-log-file-size
• Log files are on disk, but their size limits how many unflushed (dirty) pages you can hold in memory.
• In theory, larger log files mean longer crash recovery.
• In MySQL 5.5: 2G max.
• In MySQL 5.6: 4G is usually safe.
• Earlier versions should use much smaller log files.
• Default is 48M log files.
innodb_flush_log_at_trx_commit
• Default is full ACID compliance (=1).
• Can be set to 0 or 2 if you can accept some data loss.
innodb_flush_log_at_trx_commit (cont.)
• 0 = Log buffer written + synced once per second.
Nothing done at commit.
• 1 = Log buffer written + synced once per second +
written and synced on commit.
• 2 = Log buffer written + synced once per second +
written (not synced) on commit.
2 is a slightly safer version of 0.
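The three settings differ only in what happens at commit; the once-per-second background write and sync is common to all of them. A minimal POSIX sketch of that commit step, assuming the log buffer is a plain byte array and the log is an ordinary file (struct log_buffer, log_open, log_write and on_commit are hypothetical; InnoDB's real log writer is far more involved):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical in-memory log buffer. */
struct log_buffer {
    char data[4096];
    size_t len;      /* bytes not yet written to the log file */
};

/* Open (or create) the log file for appending. */
static int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hand the buffered log to the OS (page cache). Returns 0 on success. */
static int log_write(int fd, struct log_buffer *lb)
{
    size_t off = 0;
    while (off < lb->len) {
        ssize_t n = write(fd, lb->data + off, lb->len - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    lb->len = 0;
    return 0;
}

/* Commit-time action for each innodb_flush_log_at_trx_commit value. */
static int on_commit(int mode, int fd, struct log_buffer *lb)
{
    assert(mode >= 0 && mode <= 2);
    switch (mode) {
    case 0:
        return 0;                   /* left to the once-per-second flush */
    case 1:
        if (log_write(fd, lb) != 0)
            return -1;
        return fsync(fd);           /* survives an OS crash or power loss */
    default:
        return log_write(fd, lb);   /* in OS cache: survives a mysqld crash */
    }
}
```

This is why 2 is slightly safer than 0: with 2 the log has already left the mysqld process at commit, so only an OS crash or power loss can lose it.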
Basic Configuration
[Diagram: InnoDB high level overview annotated with the basic settings.]
• innodb_buffer_pool_size: recommendation is 50-80% of RAM.
• Additional memory overhead of about 5-10% of buffer pool size (not directly configurable).
• innodb_log_buffer_size: typical values 1-8M. Flushing behaviour is influenced by innodb_flush_log_at_trx_commit.
• innodb_log_file_size: typical values 256M+. Default of 2 files (innodb_log_files_in_group).
• innodb_file_per_table (Default: ON in 5.6+).
The ~Top 10
1. innodb-buffer-pool-size
2. innodb-log-file-size
3. innodb-log-buffer-size
4. innodb_flush_log_at_trx_commit
5. innodb_flush_method
6. innodb_flush_neighbors
7. innodb_io_capacity, innodb_io_capacity_max,
innodb_lru_scan_depth
8. innodb-buffer-pool-instances
9. innodb_read_io_threads and innodb_write_io_threads
Less likely to need configuration
• innodb_thread_concurrency
• innodb_concurrency_tickets
• innodb_max_dirty_pages_pct
• innodb_use_native_aio (always on)
• innodb_old_blocks_time (5.6 default: 1000)
Deprecated Settings
• Typically “remove on sight” from config files:
• innodb_additional_mempool_size
• innodb_use_sys_malloc
Credits
• InnoDB Architecture Diagrams via https://github.com/jeremycole/innodb_diagrams
• Available under (3-clause) BSD license

Copyright (c) 2013, Twitter, Inc.

Copyright (c) 2013, Jeremy Cole <jeremy@jcole.us>

Copyright (c) 2013, Davi Arnaut <darnaut@gmail.com>
The InnoDB Storage Engine for MySQL

More Related Content

What's hot

The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François Gagné
 
[Pgday.Seoul 2020] SQL Tuning
[Pgday.Seoul 2020] SQL Tuning[Pgday.Seoul 2020] SQL Tuning
[Pgday.Seoul 2020] SQL TuningPgDay.Seoul
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMarco Tusa
 
MySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsMySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsTuyen Vuong
 
InnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsInnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsMIJIN AN
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsDB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsJohn Beresniewicz
 
What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1Satishbabu Gunukula
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer ManagementMIJIN AN
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PGConf APAC
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Jaime Crespo
 
mysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementmysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementlalit choudhary
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바NeoClova
 
Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Kyle Hailey
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLYoshinori Matsunobu
 
Oracle sql high performance tuning
Oracle sql high performance tuningOracle sql high performance tuning
Oracle sql high performance tuningGuy Harrison
 

What's hot (20)

AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
[Pgday.Seoul 2020] SQL Tuning
[Pgday.Seoul 2020] SQL Tuning[Pgday.Seoul 2020] SQL Tuning
[Pgday.Seoul 2020] SQL Tuning
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
MySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsMySQL Atchitecture and Concepts
MySQL Atchitecture and Concepts
 
InnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsInnoDB Flushing and Checkpoints
InnoDB Flushing and Checkpoints
 
Mysql-MHA
Mysql-MHAMysql-MHA
Mysql-MHA
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentalsDB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals
 
What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1What’s New in Oracle Database 19c - Part 1
What’s New in Oracle Database 19c - Part 1
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
Query Optimization with MySQL 5.6: Old and New Tricks - Percona Live London 2013
 
mysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementmysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancement
 
MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바MySQL Administrator 2021 - 네오클로바
MySQL Administrator 2021 - 네오클로바
 
Performance tuning in sql server
Performance tuning in sql serverPerformance tuning in sql server
Performance tuning in sql server
 
Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQL
 
Oracle sql high performance tuning
Oracle sql high performance tuningOracle sql high performance tuning
Oracle sql high performance tuning
 

Similar to The InnoDB Storage Engine for MySQL

Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎YUCHENG HU
 
jacobs_tuuri_performance
jacobs_tuuri_performancejacobs_tuuri_performance
jacobs_tuuri_performanceHiroshi Ono
 
MySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics ImprovementsMySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics ImprovementsMorgan Tocker
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)referenceFromDual GmbH
 
VLDB Administration Strategies
VLDB Administration StrategiesVLDB Administration Strategies
VLDB Administration StrategiesMurilo Miranda
 
cPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomycPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomyRyan Robson
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?Mydbops
 
MySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summaryMySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summaryLouis liu
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosqlthinkinlamp
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationTim Callaghan
 
Inno db 5_7_features
Inno db 5_7_featuresInno db 5_7_features
Inno db 5_7_featuresTinku Ajit
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structurezhaolinjnu
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramChris Adkin
 
Sql Health in a SharePoint environment
Sql Health in a SharePoint environmentSql Health in a SharePoint environment
Sql Health in a SharePoint environmentEnrique Lima
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)Ontico
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentalsChris Adkin
 
Inno db datafiles backup and retore
Inno db datafiles backup and retoreInno db datafiles backup and retore
Inno db datafiles backup and retoreVasudeva Rao
 
InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0Mydbops
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 

Similar to The InnoDB Storage Engine for MySQL (20)

Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎
 
jacobs_tuuri_performance
jacobs_tuuri_performancejacobs_tuuri_performance
jacobs_tuuri_performance
 
MySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics ImprovementsMySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics Improvements
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)reference
 
VLDB Administration Strategies
VLDB Administration StrategiesVLDB Administration Strategies
VLDB Administration Strategies
 
cPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomycPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB Anatomy
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?
 
MySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summaryMySQL 5.5&5.6 new features summary
MySQL 5.5&5.6 new features summary
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosql
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free Replication
 
Inno db 5_7_features
Inno db 5_7_featuresInno db 5_7_features
Inno db 5_7_features
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structure
 
Sql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ramSql server engine cpu cache as the new ram
Sql server engine cpu cache as the new ram
 
Sql Health in a SharePoint environment
Sql Health in a SharePoint environmentSql Health in a SharePoint environment
Sql Health in a SharePoint environment
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentals
 
Howmysqlworks
HowmysqlworksHowmysqlworks
Howmysqlworks
 
Inno db datafiles backup and retore
Inno db datafiles backup and retoreInno db datafiles backup and retore
Inno db datafiles backup and retore
 
InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0InnoDB Scalability improvements in MySQL 8.0
InnoDB Scalability improvements in MySQL 8.0
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 

More from Morgan Tocker

Introducing Spirit - Online Schema Change
Introducing Spirit - Online Schema ChangeIntroducing Spirit - Online Schema Change
Introducing Spirit - Online Schema ChangeMorgan Tocker
 
MySQL Usability Guidelines
MySQL Usability GuidelinesMySQL Usability Guidelines
MySQL Usability GuidelinesMorgan Tocker
 
My First 90 days with Vitess
My First 90 days with VitessMy First 90 days with Vitess
My First 90 days with VitessMorgan Tocker
 
FOSDEM MySQL and Friends Devroom
FOSDEM MySQL and Friends DevroomFOSDEM MySQL and Friends Devroom
FOSDEM MySQL and Friends DevroomMorgan Tocker
 
Introducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live FrankfurtIntroducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live FrankfurtMorgan Tocker
 
TiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup GroupTiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup GroupMorgan Tocker
 
TiDB Introduction - San Francisco MySQL Meetup
TiDB Introduction - San Francisco MySQL MeetupTiDB Introduction - San Francisco MySQL Meetup
TiDB Introduction - San Francisco MySQL MeetupMorgan Tocker
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMorgan Tocker
 
MySQL Server Defaults
MySQL Server DefaultsMySQL Server Defaults
MySQL Server DefaultsMorgan Tocker
 
MySQL Cloud Service Deep Dive
MySQL Cloud Service Deep DiveMySQL Cloud Service Deep Dive
MySQL Cloud Service Deep DiveMorgan Tocker
 
Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated TestingMorgan Tocker
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Morgan Tocker
 
MySQL Query Optimization
MySQL Query OptimizationMySQL Query Optimization
MySQL Query OptimizationMorgan Tocker
 
MySQL Performance Metrics that Matter
MySQL Performance Metrics that MatterMySQL Performance Metrics that Matter
MySQL Performance Metrics that MatterMorgan Tocker
 
MySQL For Linux Sysadmins
MySQL For Linux SysadminsMySQL For Linux Sysadmins
MySQL For Linux SysadminsMorgan Tocker
 
The InnoDB Storage Engine for MySQL

  • 1. The InnoDB Storage Engine for MySQL
 Morgan Tocker, MySQL Community Manager
 http://www.tocker.ca/

  • 2. Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.
 The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 3. 4 Years of MySQL Innovation [Timeline graphic: MySQL 5.5, MySQL 5.6, MySQL 5.7; MySQL Cluster 7.1, 7.2 and 7.3; MySQL Cluster Manager; MySQL Workbench 5.2 & 6.0; MySQL Migration Wizard; Windows installer & Tools; MySQL Enterprise Monitor 2.3 & 3.0; MySQL Enterprise Backup; MySQL Utilities; Applier for Hadoop; MySQL Enterprise Security, Scalability, HA and Audit; Oracle Certifications.]
  • 4. Hello and Welcome! • I will be talking about InnoDB’s internal behaviour. • Not talking (much) about MySQL. • The aim of this talk is to give you X-ray vision. • i.e. not many direct takeaways, but one day it will help you debug a problem.
  • 5. Prerequisites Copyright © 2012 Oracle and/or its affiliates. All rights reserved.
  • 7. IO Performance
 L1 cache reference: 0.5 ns
 Branch mispredict: 5 ns
 L2 cache reference: 7 ns
 Mutex lock/unlock: 25 ns
 Main memory reference: 100 ns
 Compress 1K bytes with Zippy: 3,000 ns
 Send 2K bytes over 1 Gbps network: 20,000 ns
 Read 1 MB sequentially from memory: 250,000 ns
 Round trip within same datacenter: 500,000 ns
 Disk seek: 10,000,000 ns
 Read 1 MB sequentially from disk: 20,000,000 ns
 Send packet CA->Netherlands->CA: 150,000,000 ns
 See: http://www.linux-mag.com/cache/7589/1.html and Google: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  • 8. IO Performance (cont.) • 5-10ms per disk IO. • Maybe 50us for a high end SSD. • Still not “memory speed”.
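To put slides 7 and 8 in proportion, here is a small Python calculation that uses only the figures quoted on those slides (the 50us SSD number is the slide's own rough estimate, not a measurement).

```python
# Latencies in nanoseconds, as quoted on slides 7 and 8.
LATENCY_NS = {
    "main_memory_reference": 100,
    "high_end_ssd_io": 50_000,    # "maybe 50us"
    "disk_seek": 10_000_000,      # 10 ms
}

def times_slower(op, baseline="main_memory_reference"):
    """How many baseline operations fit in the time of one `op`."""
    return LATENCY_NS[op] // LATENCY_NS[baseline]

for op in ("high_end_ssd_io", "disk_seek"):
    print(f"{op}: {times_slower(op):,}x a main memory reference")
```

Even the fast SSD is hundreds of times slower than a memory access, which is why InnoDB works so hard to keep pages in the buffer pool.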
  • 9. Buffered IO • Operating systems already compensate well. • Reads are cached using free memory. • Writes don’t happen instantly. • An extra step lets the OS reorder and merge writes: blocks 9, 10, 1, 4, 200, 5 are written as 1, 4, 5, 9, 10, 200.
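A toy model of the reordering described on this slide: buffered block writes are de-duplicated and issued in ascending block order, roughly what an elevator-style I/O scheduler aims for. This is a simplification for illustration, not how any particular kernel implements write-back; "head movement" here is just the distance between block numbers.

```python
def flush_order(pending_writes):
    """Order in which buffered block writes reach the disk in this toy:
    repeated writes to a block are merged, then blocks are sorted."""
    return sorted(set(pending_writes))

def seek_distance(order, start=0):
    """Total head movement (in blocks) to service writes in `order`."""
    distance, pos = 0, start
    for block in order:
        distance += abs(block - pos)
        pos = block
    return distance

issued = [9, 10, 1, 4, 200, 5]
print(flush_order(issued))                                    # [1, 4, 5, 9, 10, 200]
print(seek_distance(issued), seek_distance(flush_order(issued)))  # 413 200
```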
  • 10. fsync
 Synopsis
 #include <unistd.h>
 int fsync(int fd);
 int fdatasync(int fd);
 Description
 fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).
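Python exposes the same system call as os.fsync. A minimal sketch of the write-then-fsync pattern a database relies on for durability: the function only returns once fsync reports the data has been flushed. The file name and helper are illustrative, not part of InnoDB.

```python
import os
import tempfile

def durable_append(path, data: bytes):
    """Append `data` and return only after fsync() has flushed it
    (and the file's metadata) to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:                       # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                      # blocks until the device reports completion
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "redo.log")     # hypothetical file name
    durable_append(log, b"commit trx 42\n")
    with open(log, "rb") as f:
        print(f.read())
```

Without the fsync, the write would only be in the OS cache described on slide 9 and could be lost on an OS crash or power failure.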
  • 11. Basic Operation
  • 12. InnoDB High Level Overview [Diagram. In memory (caching): Buffer Pool (Buffer Pool LRU, Buffer Pool Flush List, Page Cache), Adaptive Hash Indexes, Data Dictionary Cache, Additional Mem Pool, Log Buffer, Transaction System. On disk (storage): the system tablespace ibdata1 (space 0) holding IBUF_HEADER, IBUF_TREE, TRX_SYS, FIRST_RSEG, DICT_HDR, the Doublewrite Buffer (Block 1 and Block 2, 64 pages each) and the data dictionary (SYS_TABLES, SYS_COLUMNS, SYS_INDEXES, SYS_FIELDS); the log group (ib_logfile0, ib_logfile1, ib_logfile2); and tables with file_per_table (A.ibd, B.ibd, C.ibd).]
  • 13. Query (pages not in buffer pool) [Diagram: the overview from slide 12, with mysqld running SELECT * FROM a WHERE id = 10; the page is “Not Found” in the buffer pool.]
  • 14. Query (pages in buffer pool) [Diagram: the overview from slide 12, with mysqld running SELECT * FROM a WHERE id = 10; the page is already in the buffer pool.]
  • 15. Update Query in a Transaction (simplified) [Diagram: the overview from slide 12, with mysqld running UPDATE a SET col1 = 'new' WHERE id = 10; commit;]
  • 16. Log files • Provide crash recovery. • Only written to during normal operation. • Only read if there is a crash. • Are rewritten over and over again, in a circle. • Think of it like a tank tread.
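The “tank tread” can be modelled as a fixed-size circular buffer addressed by an ever-increasing log sequence number (LSN). A sketch, assuming one contiguous log area of `capacity` bytes; real InnoDB spreads this over several log files with block headers, which are ignored here. The class and method names are hypothetical.

```python
class CircularRedoLog:
    """Toy redo log reused in a circle: LSNs grow forever, and the byte
    offset inside the log is lsn % capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lsn = 0             # next LSN to be written
        self.checkpoint_lsn = 0  # changes before this LSN are already in the tablespace

    def free_space(self):
        return self.capacity - (self.lsn - self.checkpoint_lsn)

    def append(self, nbytes):
        """Reserve space for a log record; return its offset in the log."""
        if nbytes > self.free_space():
            # Overwriting would destroy log needed for crash recovery.
            raise RuntimeError("log full: checkpoint must flush dirty pages first")
        start = self.lsn
        self.lsn += nbytes
        return start % self.capacity

    def checkpoint(self, lsn):
        """Dirty pages up to `lsn` were written, so that log space is reusable."""
        assert self.checkpoint_lsn <= lsn <= self.lsn
        self.checkpoint_lsn = lsn
```

The log can only advance as far as the checkpoint allows, which is why a larger log lets InnoDB hold more dirty pages in memory before it must write them out.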
  • 17. Log files (cont.) • Are an optimization! • Log writes are 512B-aligned sequential writes. • Tablespace writes are 16KiB random writes. • Tablespace writes to the same pages within a short time window can be merged. • This just needs a large enough log file.
  • 18. Checkpoint (Background Activity) [Diagram: the overview from slide 12; mysqld is doing nothing while InnoDB works in the background.]
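A sketch of what a background checkpoint does with the flush list, assuming the flush list is reduced to a map from page number to the LSN of its oldest unflushed change (the real flush list is a list kept in that order). It shows why the log is an optimization: several logged changes to one page cost a single 16KiB tablespace write, and the checkpoint LSN can only advance past pages that have been flushed. Names are hypothetical.

```python
class FlushList:
    def __init__(self):
        self.oldest = {}  # page_no -> LSN of its first change since last flush

    def modify(self, page_no, lsn):
        # Only the first change since the last flush fixes the page's position.
        self.oldest.setdefault(page_no, lsn)

    def checkpoint(self, upto_lsn, current_lsn):
        """Flush every page first dirtied before `upto_lsn`.
        Returns (pages written, new checkpoint LSN)."""
        flushed = sorted(p for p, lsn in self.oldest.items() if lsn < upto_lsn)
        for p in flushed:
            del self.oldest[p]   # one page write, however many changes it holds
        new_checkpoint = min(self.oldest.values(), default=current_lsn)
        return flushed, new_checkpoint

fl = FlushList()
for page_no, lsn in [(5, 10), (5, 20), (9, 30), (5, 40), (12, 50)]:
    fl.modify(page_no, lsn)
print(fl.checkpoint(upto_lsn=45, current_lsn=60))   # ([5, 9], 50)
```

Five logged changes turn into two page writes, and page 12 (still dirty) holds the checkpoint at LSN 50.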
  • 19. FAQ • Q: What do we write to the log: committed data only, or uncommitted data as well? • A: Both.
  • 20. FAQ • Q: How do you unapply transactions? • A: UNDO space.
 Think of it like a hidden table internally stored in ibdata1.
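To illustrate how UNDO lets InnoDB unapply a transaction, here is a toy where the table is a Python dict and the undo log is a list of before-images. It mirrors the idea only; InnoDB's undo records live in rollback segments inside ibdata1 and are considerably more involved. The class is hypothetical.

```python
_MISSING = object()  # marks "row did not exist before this change"

class Transaction:
    """Toy transaction over a dict-based table with an undo log."""

    def __init__(self, table):
        self.table = table
        self.undo = []  # (key, old value), newest last

    def update(self, key, value):
        self.undo.append((key, self.table.get(key, _MISSING)))  # save old version first
        self.table[key] = value                                  # then change the row in place

    def commit(self):
        # Toy simplification: InnoDB keeps undo records until purge,
        # because older read views may still need them.
        self.undo.clear()

    def rollback(self):
        while self.undo:
            key, old = self.undo.pop()   # undo the newest change first
            if old is _MISSING:
                del self.table[key]
            else:
                self.table[key] = old

table = {10: "old"}
trx = Transaction(table)
trx.update(10, "new")
trx.update(11, "inserted")
trx.rollback()
print(table)  # {10: 'old'}
```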
  • 21. Update Query (More Accurate) [Diagram: the overview from slide 12, with mysqld running UPDATE a SET col1 = 'new' WHERE id = 10; commit; annotated steps: modify row in place; redirect older version of row to undo space; update any indexes to hold both versions.]
  • 22. Update Query (cont.) • A background purge process can clean old row versions out of UNDO as soon as the oldest open transaction moves forward and no longer needs them.
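A simplified model of the version chain and purge described on slides 21 and 22. Assumptions, loudly: transactions commit in trx_id order, and a read view simply sees every version whose trx_id is at most its own number (real InnoDB read views track the set of active transactions). The class is hypothetical.

```python
class Row:
    """Toy row: versions[0] is the current row in the page; later entries
    stand in for older versions reachable through the roll pointer (UNDO)."""

    def __init__(self, value, trx_id):
        self.versions = [(trx_id, value)]

    def update(self, value, trx_id):
        self.versions.insert(0, (trx_id, value))   # new version in place, old one to UNDO

    def read(self, view_trx_id):
        """Newest version written by a transaction with id <= view_trx_id."""
        for trx_id, value in self.versions:
            if trx_id <= view_trx_id:
                return value
        return None  # the row did not exist for this view

    def purge(self, oldest_view):
        """Drop versions that no read view at or after `oldest_view` can see."""
        for i, (trx_id, _) in enumerate(self.versions):
            if trx_id <= oldest_view:
                del self.versions[i + 1:]   # everything older is unreachable
                return

r = Row("a", trx_id=1)
r.update("b", trx_id=5)
r.update("c", trx_id=9)
print(r.read(6))   # b
r.purge(oldest_view=6)
print(r.versions)  # [(9, 'c'), (5, 'b')]
```

Once the oldest view advances to 6, version "a" can never be read again, so purge reclaims it.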
  • 23. Summarized Performance Characteristics • Log Files: • Are short sequential writes. • Permit InnoDB to delay tablespace writes, enabling more merging/optimization. • Buffer Pool: • “In-memory version of the tablespace”. • Pages are loaded/evicted via a modified LRU algorithm.
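InnoDB's modified LRU uses midpoint insertion: a newly read page enters the head of an “old” sublist (sized by innodb_old_blocks_pct, default 37%) and moves to the “young” sublist only if accessed again at least innodb_old_blocks_time milliseconds later. The sketch below is a simplified model of that behaviour with time passed in explicitly; it is not InnoDB's code, and the class is hypothetical.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy LRU split into young and old sublists. In both OrderedDicts the
    last item is the most recently used; values are first-access times (ms)."""

    def __init__(self, capacity, old_pct=37, old_blocks_time_ms=1000):
        self.capacity = capacity
        self.old_target = max(1, capacity * old_pct // 100)
        self.old_blocks_time_ms = old_blocks_time_ms
        self.young = OrderedDict()
        self.old = OrderedDict()

    def __contains__(self, page):
        return page in self.young or page in self.old

    def access(self, page, now_ms):
        if page in self.young:
            self.young.move_to_end(page)              # ordinary LRU bump
        elif page in self.old:
            first = self.old[page]
            if now_ms - first >= self.old_blocks_time_ms:
                del self.old[page]
                self.young[page] = first              # promote to young head
        else:
            self.old[page] = now_ms                   # miss: insert at the midpoint
        self._rebalance()

    def _rebalance(self):
        # Evict from the LRU end, i.e. the tail of the old sublist.
        while len(self.young) + len(self.old) > self.capacity:
            (self.old if self.old else self.young).popitem(last=False)
        # Demote young pages so the old sublist keeps roughly its share.
        while (len(self.old) < self.old_target
               and len(self.young) > self.capacity - self.old_target):
            page, first = self.young.popitem(last=False)
            self.old[page] = first
```

A table scan touches each page a few times in quick succession, so in this model scanned pages stay in the old sublist and are evicted before the hot pages in the young sublist.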
  • 24. Index Structure • Indexes and “data” in InnoDB are B+Trees. • Clustered Index design means that data itself is stored in an index.
  • 25. Index Structure (cont.) [Diagram: empty root, page 3 at level 0, containing only the Infimum and Supremum records, linked by Next Record.]
  • 26. Index Structure (cont.) Insert: 1. Page size is 16KB; B* is 2KB. [Diagram: root page 3, level 0: Infimum, record 1 (with its 2KB B*), Supremum.]
  • 27. Index Structure (cont.) Insert: 1 to 7. [Diagram: root page 3, level 0: I, records 1-7 (each with B*), S.]
  • 28. Index Structure (cont.) Insert: 8. Allocate a new page and link it in the root; move the records to the new page; split the new page. [Diagram: page 3 becomes the level 1 root with node pointer 1 → page 4; leaf page 4 (level 0) holds records 1-7.]
  • 29. Index Structure (cont.) Insert: 8 (cont.) Split at the middle of the original page. [Diagram: level 1 root page 3 points 1 → page 4 and 4 → page 5; leaf page 4 holds 1-3, leaf page 5 holds 4-7.]
  • 30. Index Structure (cont.) Insert: 9 and 10. [Diagram: leaf page 4 holds 1-3; leaf page 5 holds 4-10.]
  • 31. Index Structure (cont.) Insert: 11. The insert leads to a split at the insertion point. [Diagram: root page 3 points 1 → 4, 4 → 5, 11 → 6; leaf page 4 holds 1-3, page 5 holds 4-10, new page 6 holds only 11.]
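The insert sequence on slides 26-31 can be replayed with a toy leaf-level model. Assumptions: 7 records per page (as drawn: 16KB pages, 2KB records); the root is just the ordered list of leaves; and the split rule is a simplification of InnoDB's: split in the middle by default, but split at the insertion point when a full page is receiving ascending inserts at its end. In this model the page created by the root raise has no insert history, which reproduces the middle split on slide 29.

```python
import bisect

CAPACITY = 7  # records per page in this toy (16KB page, ~2KB records)

class Page:
    def __init__(self, records=()):
        self.records = list(records)
        self.last_insert = None  # key of the most recent insert into this page

class ToyIndex:
    def __init__(self):
        self.leaves = [Page()]
        self.root_is_leaf = True  # until the first split, the root holds the records

    def _leaf_for(self, key):
        firsts = [p.records[0] if p.records else key for p in self.leaves]
        return max(bisect.bisect_right(firsts, key) - 1, 0)

    def insert(self, key):
        i = self._leaf_for(key)
        page = self.leaves[i]
        if len(page.records) < CAPACITY:
            bisect.insort(page.records, key)
            page.last_insert = key
            return
        if self.root_is_leaf:
            # Root raise: records move to a fresh page with no insert history.
            self.root_is_leaf = False
            page.last_insert = None
        if page.last_insert == page.records[-1] and key > page.last_insert:
            # Ascending inserts at the end: split at the insertion point.
            new = Page([key])
            new.last_insert = key
            self.leaves.insert(i + 1, new)
            return
        # Otherwise split in the middle and insert into the correct half.
        mid = len(page.records) // 2
        right = Page(page.records[mid:])
        page.records = page.records[:mid]
        self.leaves.insert(i + 1, right)
        target = page if key < right.records[0] else right
        bisect.insort(target.records, key)
        target.last_insert = key

idx = ToyIndex()
for k in range(1, 12):
    idx.insert(k)
print([p.records for p in idx.leaves])
# [[1, 2, 3], [4, 5, 6, 7, 8, 9, 10], [11]]
```

The insertion-point split matters for auto-increment keys: earlier pages stay full instead of being left half empty.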
  • 32. Index Structure [Diagram: a three-level B+tree. Level 2 root (page 3): ≥0 → page 6, ≥4 → page 5. Level 1 internal pages: page 6: ≥0 → 4, ≥2 → 7; page 5: ≥4 → 8, ≥6 → 9. Level 0 leaves, linked by Next Page / Prev Page: page 4: 0 A, 1 B; page 7: 2 C, 3 D; page 8: 4 E, 5 F; page 9: 6 G, 7 H. Within each page, records are linked by Next Record from Infimum (I) to Supremum (S).]
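A lookup descends this tree by following, on each internal page, the last node pointer whose minimum key is at most the search key. The sketch below hard-codes the pages from slide 32; the dictionary layout is an illustrative assumption, not InnoDB's on-disk format.

```python
import bisect

# Pages from slide 32. Internal pages hold (min_key, child_page) node pointers;
# leaf pages hold (key, value) records. The level 2 root is page 3.
PAGES = {
    3: {"level": 2, "ptrs": [(0, 6), (4, 5)]},
    6: {"level": 1, "ptrs": [(0, 4), (2, 7)]},
    5: {"level": 1, "ptrs": [(4, 8), (6, 9)]},
    4: {"level": 0, "recs": [(0, "A"), (1, "B")]},
    7: {"level": 0, "recs": [(2, "C"), (3, "D")]},
    8: {"level": 0, "recs": [(4, "E"), (5, "F")]},
    9: {"level": 0, "recs": [(6, "G"), (7, "H")]},
}

def lookup(key, root=3):
    """Return (value or None, list of pages visited)."""
    page_no, path = root, []
    while True:
        page = PAGES[page_no]
        path.append(page_no)
        if page["level"] == 0:
            return dict(page["recs"]).get(key), path
        mins = [k for k, _ in page["ptrs"]]
        i = max(bisect.bisect_right(mins, key) - 1, 0)
        page_no = page["ptrs"][i][1]

print(lookup(5))  # ('F', [3, 5, 8])
```

Every lookup reads one page per level, so the tree height, not the table size, bounds the number of page reads.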
  • 33. Page Format • Bytes 0-38: FIL Header (38 bytes). • Bytes 38-16376: other headers and page data, depending on page type; total usable space: 16,338 bytes. • Bytes 16376-16384: FIL Trailer (8 bytes).
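The 38-byte FIL header can be decoded with Python's struct module. The field layout below (checksum, page number, previous and next page, last-modification LSN, page type, flush LSN, space ID, all big-endian) follows commonly published descriptions of the InnoDB page format and is an assumption here; the sketch builds a synthetic page instead of reading a real .ibd file, and 17855 is assumed to be the INDEX page type.

```python
import struct

PAGE_SIZE = 16384
FIL_HEADER = struct.Struct(">IIIIQHQI")   # 38 bytes, big-endian, no padding
FIL_TRAILER_SIZE = 8
FIELDS = ("checksum", "page_no", "prev_page", "next_page",
          "lsn", "page_type", "flush_lsn", "space_id")

def parse_fil_header(page: bytes) -> dict:
    assert len(page) == PAGE_SIZE
    return dict(zip(FIELDS, FIL_HEADER.unpack_from(page, 0)))

usable = PAGE_SIZE - FIL_HEADER.size - FIL_TRAILER_SIZE
print(FIL_HEADER.size, usable)            # 38 16338

# Synthetic page: page 5 of space 7, linked between leaf pages 4 and 6.
page = bytearray(PAGE_SIZE)
FIL_HEADER.pack_into(page, 0, 0, 5, 4, 6, 123456, 17855, 0, 7)
print(parse_fil_header(bytes(page)))
```

The prev/next page fields are what link the leaf pages at each level, as drawn on slide 32.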
  • 34. Row Format (offsets relative to the record origin N)
 Before N-5: variable field lengths (1-2 bytes per variable-length field)
 N-5: Info Flags (4 bits), Number of Records Owned (4 bits)
 N-4: Order (13 bits), Record Type (3 bits)
 N-2: Next Record Offset (2 bytes)
 N: Cluster Key Fields (k bytes)
 N+k: Transaction ID (6 bytes)
 N+k+6: Roll Pointer (7 bytes)
 N+k+13: Non-Key Fields (j bytes), ending at N+k+13+j
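The 5-byte record header on this slide packs five fields into 40 bits (4 + 4 + 13 + 3 + 16). A minimal sketch of packing and unpacking it with struct, following the bit order drawn on the slide; the helper names are hypothetical, and the size helper ignores the NULL-flags bitmap that COMPACT records also carry for nullable columns.

```python
import struct

def pack_record_header(info_flags, n_owned, heap_no, rec_type, next_offset):
    """Pack the 5 header bytes that sit at N-5 .. N."""
    assert info_flags < 16 and n_owned < 16 and heap_no < 8192 and rec_type < 8
    return struct.pack(">BHH",
                       (info_flags << 4) | n_owned,   # 4 + 4 bits
                       (heap_no << 3) | rec_type,     # 13 + 3 bits ("Order", "Record Type")
                       next_offset)                   # 16 bits

def unpack_record_header(hdr: bytes):
    b0, order_type, next_offset = struct.unpack(">BHH", hdr)
    return b0 >> 4, b0 & 0x0F, order_type >> 3, order_type & 0x07, next_offset

def record_bytes(var_len_bytes, key_len, nonkey_len):
    """Clustered-index record size: var-length list, 5-byte header, key,
    6-byte transaction ID, 7-byte roll pointer, non-key fields."""
    return var_len_bytes + 5 + key_len + 6 + 7 + nonkey_len
```

The 13 bytes of transaction ID and roll pointer are the per-row cost of the MVCC and UNDO machinery from slides 20-22.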
  • 35. Conclusion • The page is the basic unit of storage. • Default page size is 16KiB. • Rows are of variable length.
  • 36. Two more useful features • Adaptive hash: a partial hash index, built automatically, to accelerate frequently repeated index lookups. • Change buffering: when the pages of non-unique secondary indexes are not in memory, changes to them can be buffered until the pages are read in.
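A toy model of change buffering: an insert into a secondary-index page that is not in memory is recorded instead of triggering a random read, and the buffered entries are merged when the page is eventually read. This only works for non-unique indexes, since a unique index must read the page anyway to check for duplicates. The class and page layout are hypothetical simplifications.

```python
import bisect

class ChangeBufferedIndex:
    """Toy non-unique secondary index whose leaf pages may live only on disk."""

    def __init__(self, disk_pages):
        self.disk = disk_pages   # page_no -> sorted list of (key, primary_key)
        self.buffer_pool = {}    # page_no -> list, pages currently in memory
        self.ibuf = {}           # page_no -> pending inserts (the change buffer)
        self.disk_reads = 0

    def insert(self, page_no, entry):
        if page_no in self.buffer_pool:
            bisect.insort(self.buffer_pool[page_no], entry)
        else:
            # Page not in memory: remember the change, skip the random read.
            self.ibuf.setdefault(page_no, []).append(entry)

    def read_page(self, page_no):
        if page_no not in self.buffer_pool:
            self.disk_reads += 1
            page = list(self.disk[page_no])
            for entry in self.ibuf.pop(page_no, []):   # merge buffered changes
                bisect.insort(page, entry)
            self.buffer_pool[page_no] = page
        return self.buffer_pool[page_no]

idx = ChangeBufferedIndex({1: [(5, 100)], 2: [(50, 200)]})
idx.insert(1, (7, 101))
idx.insert(1, (3, 102))
print(idx.disk_reads, idx.read_page(1))  # 0 [(3, 102), (5, 100), (7, 101)]
```

Two inserts were absorbed without any disk read; the single read later merges them both.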
  • 37. Query (by secondary key) [Diagram: the overview from slide 12, with mysqld running SELECT * FROM a WHERE b_key = 10;]
  • 38. Update Query (large table) [Diagram: the overview from slide 12, with mysqld running UPDATE a SET col1 = 'new' WHERE id = 10; commit; part of the diagram is annotated “Not Required”.]
  • 39. New Features
  • 40. MySQL 5.5+ • IO Scalability: • Async IO • Multiple Buffer Pools • Adaptive Flushing • Scan Resistant LRU • Compressed Pages • CPU Scalability: • Improved Atomics • Spin Loops with PAUSE
  • 41. MySQL 5.6+ • LRU Dump and Restore • Improved Group Commit • Fulltext Search • Fast Read-only Transactions • Memcached Interface • Information Schema metadata tables • Persistent Statistics • Variable Page Size • Online DDL • Transportable Tablespace • Transactional Replication Using InnoDB
  • 42. MySQL 5.7+ • Faster Temporary Tables • Index Lock Contention Reduction • More Online DDL • Extend VARCHAR • Rename Index • Improved Read-Only Transactions • Improved CPU Scalability
  • 43. Configuration
  • 44. The Top 3 1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb_flush_log_at_trx_commit
  • 45. innodb-buffer-pool-size • Really the only major buffer/cache setting to set. • Responsible for all page types (data, indexes, undo, insert buffer, ...).
  • 46. innodb-buffer-pool-size (cont.) • Recommendation is 50-80% of RAM. • Default is 128M. • Allow an extra 5-10% on top for other metadata to grow.
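The sizing rule on this slide is simple arithmetic. A sketch that applies it, with 75% of RAM and 10% overhead chosen as illustrative values inside the slide's ranges (they are not MySQL defaults, and the function is hypothetical):

```python
GiB = 1024 ** 3

def buffer_pool_plan(ram_bytes, pool_pct=75, overhead_pct=10):
    """Return (innodb_buffer_pool_size, estimated total InnoDB memory)."""
    if not 50 <= pool_pct <= 80:
        raise ValueError("slide recommends 50-80% of RAM")
    if not 5 <= overhead_pct <= 10:
        raise ValueError("slide suggests allowing 5-10% on top")
    pool = ram_bytes * pool_pct // 100
    total = pool * (100 + overhead_pct) // 100   # pool plus metadata overhead
    return pool, total

pool, total = buffer_pool_plan(64 * GiB)
print(f"innodb_buffer_pool_size = {pool // GiB}G, plan for ~{total / GiB:.1f}G")
# innodb_buffer_pool_size = 48G, plan for ~52.8G
```

The remaining RAM is left for the operating system, per-connection buffers and the OS cache.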
  • 47. innodb-log-file-size • Log files are on disk, but their size limits how many unflushed (dirty) pages you can hold in memory. • In theory, larger log files = longer crash recovery. • In MySQL 5.5: 2G max. • In MySQL 5.6: 4G is usually safe. • Earlier versions should use much smaller log files. • Default (5.6): 48M per log file.
  • 48. innodb_flush_log_at_trx_commit • Default (=1) is full ACID compliance. • Can be set to 0 or 2 if you can tolerate losing up to about one second of committed transactions.
  • 49. innodb_flush_log_at_trx_commit (cont.) • 0 = Log buffer written + synced once per second. Nothing done at commit. • 1 = Log buffer written + synced once per second + written and synced on commit. • 2 = Log buffer written + synced once per second + written (not synced) on commit. • 2 is a slightly safer version of 0: a commit survives a mysqld crash, but not an OS crash or power loss.
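The difference between 0, 1 and 2 comes down to where a just-committed transaction's log records are when a crash hits before the next once-per-second flush. A minimal model of the three settings, assuming a mysqld crash leaves the OS page cache intact while an OS crash or power loss does not. The function names are hypothetical.

```python
def commit_actions(setting):
    """What happens to the log buffer at COMMIT (the once-per-second
    background write + sync happens independently of this)."""
    return {0: (), 1: ("write", "fsync"), 2: ("write",)}[setting]

def survives(setting, failure):
    """Does a just-committed transaction survive a crash that happens
    before the next once-per-second background flush?"""
    actions = commit_actions(setting)
    if failure == "mysqld":   # process crash: data written to the OS cache still reaches disk
        return "write" in actions
    if failure == "os":       # OS crash / power loss: only fsync'ed data is safe
        return "fsync" in actions
    raise ValueError(failure)

for setting in (0, 1, 2):
    print(setting, survives(setting, "mysqld"), survives(setting, "os"))
```

Setting 1 pays for an fsync on every commit; settings 0 and 2 trade up to about a second of commits for fewer syncs.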
  • 50. Basic Configuration [Diagram: the overview from slide 12, annotated with settings.]
 • innodb_buffer_pool_size: recommendation is 50-80% of RAM. The buffer pool requires about 5-10% of its size as overhead (not directly configurable).
 • innodb_log_file_size: typical values 256M+. Default of 2 files (innodb_log_files_in_group).
 • innodb_log_buffer_size: typical values 1-8M. Flushing behaviour influenced by innodb_flush_log_at_trx_commit.
 • innodb_file_per_table (default: ON in 5.6+): tables stored as A.ibd, B.ibd, C.ibd.
  • 51. The ~Top 10 1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb-log-buffer-size 4. innodb_flush_log_at_trx_commit 5. innodb_flush_method 6. innodb_flush_neighbors 7. innodb_io_capacity, innodb_io_capacity_max, innodb_lru_scan_depth 8. innodb-buffer-pool-instances 9. innodb_read_io_threads and innodb_write_io_threads
  • 52. Less likely to need configuration • innodb_thread_concurrency • innodb_concurrency_tickets • innodb_max_dirty_pages_pct • innodb_use_native_aio (ON by default) • innodb_old_blocks_time (5.6 default: 1000)
  • 53. Deprecated Settings • Typically “remove on sight” from config files: • innodb_additional_mem_pool_size • innodb_use_sys_malloc
  • 54. Credits • InnoDB Architecture Diagrams via https://github.com/jeremycole/innodb_diagrams • Available under (3-clause) BSD license
 Copyright (c) 2013, Twitter, Inc.
 Copyright (c) 2013, Jeremy Cole <jeremy@jcole.us>
 Copyright (c) 2013, Davi Arnaut <darnaut@gmail.com>