InnoDB Anatomy
The InnoDB Engine
Introduction to InnoDB
• Currently the default in MySQL (as of 5.5)
• Referential/Structural Integrity
• Consistent data
• Transactional
InnoDB is atomic in that its transactions have only two possible outcomes: complete fully, or fail completely.
• SUCCESS: All changes committed.
• FAILURE: All changes rolled back.
InnoDB Anatomy – ACID Compliance
ATOMICITY
COMMIT: Changes Applied
ROLLBACK: Unchanged Data
• Data stays consistent before, during, and after a
transaction.
• No conflict of “versions”
• Successful transactions end with a commit.
InnoDB maintains optimism.
Introduction to InnoDB Structure
CONSISTENCY
Valid State
Work
performed
Still a valid
state
• Transactions cannot interact with each other
• Adjustable level of isolation.*
• Row-level locking
Introduction to InnoDB Structure
ISOLATION
* Isolation level changed via the transaction-isolation configuration option.
• Atomic transactions keep data durable.
• Changes are permanent once committed.
• Doublewrite buffer helps to recover from crashes
that occur during page writes.
Introduction to InnoDB Structure
DURABILITY
Explore the InnoDB structure within the file
system, and at its lower levels, to find out how
it can affect database operations.
InnoDB Anatomy
The Goal
Understanding the
Physical Structure of
Data in InnoDB
InnoDB Anatomy – Physical Structure
Physical File Structure
DataDirectory (Default: /var/lib/mysql)
  ibdata1        System Tablespace
  ib_logfile0    Redo/Transaction Log File
  ib_logfile1    Redo/Transaction Log File
  Database Folder
    table.ibd    Tablespace File
    table.frm    Format File
InnoDB Anatomy – Physical Structure
InnoDB        File System
Tablespace    Partition
Page          Disk Block
Extent        Allocation Unit
Segment       File
Inode         Inode
InnoDB Anatomy – Physical Structure
Tablespace (IBD File)
File Header
Insert Buffer Bitmap
First Inode Page Number
User Records (Index Pages)…
InnoDB Anatomy – Physical Structure
Page (16384 bytes)
FIL Header - 38 bytes
…
FIL Trailer - 8 bytes
InnoDB Anatomy – Physical Structure
Extent (1MB)
Page 1
…
Page 64
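The page and extent sizes above reduce to simple arithmetic. A minimal sketch, assuming the default 16384-byte page and 64 pages per extent shown on these slides; the function names are hypothetical helpers, not part of any MySQL tool:

```shell
# Assumed defaults from the slides: 16 KiB pages, 64 pages per extent.
PAGE_SIZE=16384
PAGES_PER_EXTENT=64

# page_offset PAGE_NO: starting byte offset of a page within the .ibd file
page_offset() { echo $(( $1 * PAGE_SIZE )); }

# extent_of_page PAGE_NO: zero-based index of the extent holding that page
extent_of_page() { echo $(( $1 / PAGES_PER_EXTENT )); }

extent_size=$(( PAGE_SIZE * PAGES_PER_EXTENT ))
echo "extent size: $extent_size bytes"   # 1048576 bytes = 1MB
page_offset 6                            # prints 98304
extent_of_page 130                       # prints 2
```

The same offset calculation is what the later demonstrations do by hand with expr.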
InnoDB Anatomy – Physical Structure
Segment
Extent
Extent
…
InnoDB Anatomy – Physical Structure
Inode
File segment ID
Extent Data (Free/Partial/Full Listing)
InnoDB Anatomy – Physical Structure
Tablespace
Segment
Segment
Segment
Segment
Extent
Extent
Extent
Extent
Extent
Extent
Extent
InnoDB Anatomy – System Tablespace
SYSTEM TABLESPACE (ibdata File)
Undo Logs
  Rollback Segment
  Rollback Segment
Data Dictionary
  SYS_TABLES
  SYS_INDEXES
  SYS_COLUMNS
  SYS_FIELDS
Doublewrite Buffer
  Block 1 (64 Pages)
  Block 2 (64 Pages)
Change Buffer
  Insert Buffering
  Update Buffering
  Purge Buffering
Undo Log Space
  Rollback Segment
  …
InnoDB Anatomy – System Tablespace
Separate your Undo Logs (5.6+ only)
innodb_undo_logs; innodb_undo_tablespaces; innodb_undo_directory
I/O to the undo logs is random, rather than sequential as in some other areas
of InnoDB. Because of this, it makes sense to move your undo log
tablespaces out of the system tablespace and onto a disk that handles
random reads and writes more effectively, such as an SSD.
Use the Information Schema, or innodb_table_monitor, to view
the Data Dictionary table data.
In 5.6, the information_schema database contains INNODB_SYS* tables that
allow you to view data dictionary information directly in MySQL.
Alternatively, you can create a table called “innodb_table_monitor” to
dump the data dictionary into the MySQL error logs.
How can you use this?
InnoDB Anatomy – System Tablespace
Do you need the Doublewrite buffer?
innodb_doublewrite
With the Doublewrite buffer enabled, there is a 5-10% impact on I/O. If you
operate on a transactional file system, you can disable it to avoid this impact.
Customizing Change Buffering for your Workload
innodb_change_buffering
Change buffering, by default in 5.5+, encompasses insert, update, and
delete buffering. If your workload consists almost entirely of one of
these, it can make sense to limit buffering to that one type.
How can you use this?
InnoDB in Memory and on Disk
Memory
Buffer Pool
Insert Buffer
Log Buffer
Additional Memory
Disk
System Tablespace
Doublewrite Buffer
Transaction Log Files
Insert Buffer
Undo Logs
Rollback Segment
Data Dictionary
Undo Buffering
Indexing
Thread Processing
Tablespace Files
InnoDB Anatomy – Pages
Page Headers/Trailers
FIL Header (38)
Name                              Byte Length  Offset  Description
FIL_PAGE_SPACE_OR_CHKSUM          4            0       Page checksum (space ID in versions before 4.0.14)
FIL_PAGE_OFFSET                   4            4       Page Number
FIL_PAGE_PREV                     4            8       Previous Page (in key order)
FIL_PAGE_NEXT                     4            12      Next Page (in key order)
FIL_PAGE_LSN                      8            16      LSN of page's latest log record
FIL_PAGE_TYPE                     2            24      Page Type
FIL_PAGE_FILE_FLUSH_LSN           8            26      Flushed-up-to LSN (only in space ID 0, page 0)
FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID  4            34      Space ID (archived log number in older versions)
FIL Trailer (8)
Name                              Byte Length  Offset  Description
FIL_PAGE_END_LSN_OLD_CHKSUM       8            16376   First 4 bytes: old-style checksum; last 4 bytes: low 32 bits of FIL_PAGE_LSN
storage/innobase/include/fil0fil.h
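The header offsets above can be read straight out of a tablespace file. A minimal sketch, assuming a 16384-byte page size and big-endian fields; read_be and fil_header are hypothetical helper names, and only fields at offsets 4 through 24 of the table are read:

```shell
PAGE_SIZE=16384   # assumed default page size

# read_be FILE OFFSET LENGTH: print LENGTH bytes at OFFSET as a
# big-endian unsigned integer (od is used so xxd is not required)
read_be() {
    hex=$(od -An -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n')
    echo $(( 0x$hex ))
}

# fil_header FILE PAGE_NO: print selected FIL header fields of one page
fil_header() {
    base=$(( $2 * PAGE_SIZE ))
    echo "page number: $(read_be "$1" $(( base + 4 )) 4)"
    # 4294967295 (0xFFFFFFFF) in prev/next means there is no neighbour
    echo "prev page:   $(read_be "$1" $(( base + 8 )) 4)"
    echo "next page:   $(read_be "$1" $(( base + 12 )) 4)"
    echo "page LSN:    $(read_be "$1" $(( base + 16 )) 8)"
    echo "page type:   $(read_be "$1" $(( base + 24 )) 2)"
}

# Usage (file name is an example): fil_header table.ibd 3
```

Run it against a copy of the file, or with MySQL stopped, so the pages are not changing underneath the reads.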
InnoDB Anatomy - Demonstration
Changing values directly
At the byte level, these values can be changed directly in many situations to
“trick” InnoDB in one way or another. One good example of this is to get
around a page checksum failure. You can change the stored checksum to
match the calculated checksum, bypassing the crash and often allowing you
sufficient access to your records.
How can you use this?
InnoDB: Page checksum 2047964429, prior-to-4.0.14-form checksum 4196043695
InnoDB: stored checksum 1873408413, prior-to-4.0.14-form stored checksum 1946395024
# printf '%X\n' 2047964429; printf '%X\n' 4196043695
7A11750D  <- Primary calculated checksum
FA1A8BAF  <- “Old-style” calculated checksum
# expr 16384 \* 6  <- Example Page 6
98304  <- Starting byte offset for Page 6
Writing the primary calculated checksum over the stored value of page 6:
# printf '\x7A\x11\x75\x0D' | dd of=table.ibd bs=1 seek=98304 count=4 conv=notrunc
Writing the “old-style” calculated checksum over the stored value in the FIL trailer of page 6 (98304 + 16376 = 114680):
# printf '\xFA\x1A\x8B\xAF' | dd of=table.ibd bs=1 seek=114680 count=4 conv=notrunc
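The offsets and byte strings for such a patch can be generated rather than worked out by hand. A rough sketch, assuming a 16384-byte page, the primary checksum in the first 4 bytes of the page, and the old-style checksum in the first 4 bytes of the FIL trailer (offset 16376, per the trailer table); the helper names are hypothetical. It only prints the dd commands and changes nothing itself:

```shell
PAGE_SIZE=16384        # assumed default page size
TRAILER_OFFSET=16376   # PAGE_SIZE - 8, start of the FIL trailer

# to_escapes DECIMAL: 32-bit value as four \xNN escapes, big-endian
to_escapes() {
    printf '%08X' "$1" | sed 's/../\\x&/g'
}

# checksum_patch_cmds FILE PAGE_NO NEW_CHECKSUM OLD_STYLE_CHECKSUM
# Dry run: prints the two dd commands instead of executing them.
checksum_patch_cmds() {
    start=$(( $2 * PAGE_SIZE ))
    printf "printf '%s' | dd of=%s bs=1 seek=%d count=4 conv=notrunc\n" \
        "$(to_escapes "$3")" "$1" "$start"
    printf "printf '%s' | dd of=%s bs=1 seek=%d count=4 conv=notrunc\n" \
        "$(to_escapes "$4")" "$1" "$(( start + TRAILER_OFFSET ))"
}

# Values taken from the error log lines in the demonstration
checksum_patch_cmds table.ibd 6 2047964429 4196043695
```

Review the printed commands before running them, and apply them to a copy of the .ibd file.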
InnoDB Anatomy – Redo Logs
Structure
The Redo Logs
The Logical Log File
•Stored in 2 files by default (ib_logfile0/1)
•Treated as single file
•Circular buffer
•Log blocks are 512 bytes
•Each block contains checkpoint data
ib_logfile0    ib_logfile1
LOG BLOCK: Header (12) | Log Records | Trailer (4)
LOG BLOCK: Header (12) | Log Records | Trailer (4)
LOG BLOCK: Header (12) | Log Records | Trailer (4)
…
InnoDB Anatomy – Redo Logs
Optimized log file size
innodb_log_file_size
A larger size means less checkpoint flushing is required, reducing I/O impact.
Balance this against the longer recovery time that a larger size implies (less of
an issue in 5.6).
General Formula: (LSN 60 seconds later – Current LSN) * 60 / 1024 / 1024 = MB of redo written per hour
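The log file size formula can be wrapped in a small helper. A minimal sketch, assuming the two LSN values are sampled 60 seconds apart (for example from the "Log sequence number" line of SHOW ENGINE INNODB STATUS) and the difference is taken as later minus earlier; the function name and the sample LSNs are made up for illustration:

```shell
# redo_mb_per_hour LSN_START LSN_AFTER_60S
# Bytes written in one minute, scaled to an hour, converted to MB.
redo_mb_per_hour() {
    echo $(( ($2 - $1) * 60 / 1024 / 1024 ))
}

# Illustrative values: 2097152 bytes of redo written in 60 seconds
redo_mb_per_hour 5000000000 5002097152   # prints 120
```

The result is the total redo volume for an hour at the sampled rate; it would be split across the log files (2 by default) when choosing innodb_log_file_size.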
Optimized log buffer size
innodb_log_buffer_size
Log buffer allows transactions to move forward without having to write the
log to disk before commit. Increased size allows larger transactions to run
without requiring writes to disk before a commit is performed.
How can you use this?
InnoDB Anatomy – Index Pages
INDEX Pages - B+Tree Structure
•Efficient method of storing data on disk in a tree format.
•Actual records stored in leaf pages (level 0).
•Root-level pages exist at the top of the tree structure.
•Non-leaf pages contain only pointers to child pages (other non-leaf pages or leaf pages).
Level 2: Root
Level 1: Non-Leaf, Non-Leaf
Level 0: Leaf, Leaf, Leaf
InnoDB Anatomy – Index Pages
B+Tree Structure – Basic Index Example
Root Node: Customer IDs 1-500
  Non-Leaf: 1-250
    Leaf Node: 1-10
    Leaf Node: 11-20
  Non-Leaf: 251-500
    Leaf Node: 251-260
    Leaf Node: 261-270
    …
InnoDB Anatomy – Index Pages
INDEX Pages
•Not physically in order
•User data “grows down”
•Page directory “grows up”
FIL Header (38)
INDEX Header (36)
FSEG Header (20)
System Records (26)
User Data …
EMPTY
… Page Directory
FIL Trailer (8)
InnoDB Anatomy – Index Pages
INDEX Page Header (after FIL Header)
Name               Byte Length  Offset   Description
PAGE_N_DIR_SLOTS   2            38 + 0   Number of slots in page directory
PAGE_HEAP_TOP      2            38 + 2   Pointer to record heap top
PAGE_N_HEAP        2            38 + 4   Number of records in heap
PAGE_FREE          2            38 + 6   Pointer to start of page's free-record list
PAGE_GARBAGE       2            38 + 8   Number of bytes in “deleted” records
PAGE_LAST_INSERT   2            38 + 10  Pointer to last inserted record, or NULL if this has been reset, e.g. by a delete
PAGE_DIRECTION     2            38 + 12  Last insert direction: PAGE_LEFT, PAGE_RIGHT …
PAGE_N_DIRECTION   2            38 + 14  Consecutive inserts in the same direction
PAGE_N_RECS        2            38 + 16  Number of user records on the page
PAGE_MAX_TRX_ID    8            38 + 18  Highest ID of a transaction that may have modified a record on the page
PAGE_LEVEL         2            38 + 26  Level of node in index tree
PAGE_INDEX_ID      8            38 + 28  Index ID that page belongs to
storage/innobase/include/page0page.h
InnoDB Anatomy – Demonstration
Demonstration
Determining page level on an INDEX page
First, find your page's start byte:
# expr 16384 \* 3
49152
The offset of the PAGE_LEVEL value is 26 after the FIL Header (38):
# expr 49152 + 38 + 26
49216
The byte-length is 2:
# xxd -ps -s 49216 -l 2 customer.ibd
0001
Page Level: 1
/var/lib/mysql/testdb/
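The same lookup works for any fixed-width INDEX header field. A minimal sketch, assuming a 16384-byte page, the 38-byte FIL header, and big-endian values; read_be, index_field, page_level and page_n_recs are hypothetical helper names, and od stands in for xxd:

```shell
PAGE_SIZE=16384   # assumed default page size
FIL_HEADER=38     # the INDEX header starts right after the FIL header

# read_be FILE OFFSET LENGTH: bytes at OFFSET as a big-endian unsigned integer
read_be() {
    hex=$(od -An -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n')
    echo $(( 0x$hex ))
}

# index_field FILE PAGE_NO FIELD_OFFSET LENGTH
index_field() {
    read_be "$1" $(( $2 * PAGE_SIZE + FIL_HEADER + $3 )) "$4"
}

page_n_recs() { index_field "$1" "$2" 16 2; }   # PAGE_N_RECS
page_level()  { index_field "$1" "$2" 26 2; }   # PAGE_LEVEL

# Usage, matching the demonstration: page_level customer.ibd 3
```

A level of 0 indicates a leaf page; anything higher is a non-leaf page.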
Additional Resources
Conclusion
• Jeremy Cole
• http://blog.jcole.us/
• https://github.com/jeremycole/
• Percona
• http://www.percona.com/files/percona-live/justin-innodb-internals.pdf
• MySQL Internals Documentation & Source
• http://dev.mysql.com/doc/internals/en/innodb.html
• https://launchpad.net/mysql
Sources and Thanks

cPanelCon 2014: InnoDB Anatomy

  • 1.
  • 3. The InnoDB Engine Introduction to InnoDB • Currently the default in MySQL (as of 5.5) • Referential/Structural Integrity • Consistent data • Transactional
  • 4. InnoDB is atomic in that its transactions have only two possible outcomes - complete fully, or fail completely. • SUCCESS: • FAILURE: All changes committed. All changes rolled back. InnoDB Anatomy – ACID Compliance ATOMICITY COMMITROLLBACK Unchanged Data Changes Applied
  • 5. • Data stays consistent before, during, and after a transaction. • No conflict of “versions” • Successful transactions end with a commit. InnoDB maintains optimism. Introduction to InnoDB Structure CONSISTENCY Valid State Work performed Still a valid state
  • 6. • Transactions cannot interact with each other • Adjustable level of isolation.* • Row-level locking Introduction to InnoDB Structure ISOLATION * Isolation level changed via the transaction-isolation configuration option.
  • 7. • Atomic transactions keep data durable. • Changes are permanent once committed. • Doublewrite buffer helps to recover from crashes that occur during page writes. Introduction to InnoDB Structure DURABILITY
  • 8. Explore the InnoDB structure within the file system, and at its lower levels, to find out how it can affect database operations. InnoDB Anatomy The Goal
  • 10. InnoDB Anatomy – Physical Structure Physical File Structure DataDirectory (Default:/var/lib/mysql) Ibdata1 System Tablespace ib_logfile0 Redo/Transaction Log File ib_logfile1 Redo/Transaction Log File Database Folder table.ibd Tablespace File table.frm Format File
  • 11. InnoDB Anatomy – Physical Structure InnoDB Tablespace Page Extent Segment Inode File System Partition Disk Block Allocation Unit File Inode
  • 12. InnoDB Anatomy – Physical Structure InnoDB Tablespace Page Extent Segment Inode User Records (Index Pages)… First Inode Page Number File Header Insert Buffer Bitmap IBD File
  • 13. InnoDB Anatomy – Physical Structure InnoDB Tablespace Page Extent Segment Inode PAGE… FIL Header - 38 bytes … FIL Trailer - 8 bytes 16384
  • 14. InnoDB Anatomy – Physical Structure InnoDB Tablespace Page Extent Segment Inode 1MB Page 1 … Page 64 EXTENT
  • 15. InnoDB Anatomy – Physical Structure InnoDB Tablespace Page Extent Segment Inode Extent Extent … SEGMENT
  • 16. InnoDB Anatomy – Physical Structure InnoDB Tablespace Page Extent Segment Inode File segment ID Extent Data (Free/Partial/Full Listing)
  • 17. InnoDB Anatomy – Physical Structure Tablespace Segment Segment Segment Segment Extent Extent Extent Extent Extent Extent Extent
  • 18. SYSTEM TABLESPACE ibdata File InnoDB Anatomy – System Tablespace Undo Logs Rollback Segment Rollback Segment Data Dictionary SYS_TABLES SYS_INDEXES SYS_COLUMNS SYS_FIELDS Doublewrite Buffer Block 1 (64 Pages) Block 2 (64 Pages) Change Buffer Insert Buffering Update Buffering Purge Buffering Undo Log Space Rollback Segment …
  • 19. InnoDB Anatomy – System Tablespace Separate your Undo Logs (5.6+ only) innodb_undo_logs; innodb_undo_tablespaces; innodb_undo_directory I/O to the undo logs is random, instead of sequential like some other areas of InnoDB. Because of this, it makes sense to separate your Undo Log tablespaces out from the system tablespace onto a disk that handles random reads and writes more effectively, such as a SSD. Use the Information Schema, or innodb_table_monitor, to view the Data Dictionary table data. In 5.6, the information_schema database contains INNODB_SYS* tables that allow you to view data dictionary information directly in MySQL. Alternatively, you can create a table called “innodb_table_monitor” to dump the data dictionary into the MySQL error logs. How can you use this?
  • 20. InnoDB Anatomy – System Tablespace Do you need the Doublewrite buffer? innodb_doublewrite With Doublewrite buffer enabled, there is a 5-10% impact on I/O. If you operate on a transactional file system, you disable this to avoid this impact. Customizing Change Buffering for your Workload innodb_change_buffering Change buffering, by default in 5.5+, encompasses insert, update, and delete buffering. If your workload consists almost entirely of one or the other, it can make sense to limit this down to only one type of buffering. How can you use this?
  • 21. InnoDB in Memory and on Disk Memory Buffer Pool Insert Buffer Log Buffer Additional Memory Disk System Tablespace Doublewrite Buffer Transaction Log Files Insert Buffer Undo Logs Rollback Segment Data Dictionary Undo Buffering Indexing Thread Processing Tablespace Files
  • 22. InnoDB Anatomy – Pages Page Headers/Trailers Name Byte Length Offset Description FIL_PAGE_SPACE 4 0 Space ID FIL_PAGE_OFFSET 4 4 Page Number FIL_PAGE_PREV 4 8 Previous Page (in key order) FIL_PAGE_NEXT 4 12 Next Page (in key order) FIL_PAGE_LSN 8 16 LSN of page’s latest log record FIL_PAGE_TYPE 2 24 Page Type FIL_PAGE_FILE_FLUSH_LSN 8 26 Flushed-up-to LSN (only in space ID 0, page 0) FIL_PAGE_ARCH_LOG_NO 4 34 Latest archived LSN (only in space ID 0, page 0) FIL Header (38) FIL Trailer (8) Name Byte Length Offset Description FIL_PAGE_END_LSN 8 16376 Low 4 bytes: Checksum, Last 4 bytes: FIL_PAGE_LSN storage/innobase/include/fil0fil.h
  • 23. InnoDB Anatomy - Demonstration Changing values directly At the byte level, these values can be changed directly in many situations to “trick” InnoDB in one way or another. One good example of this is to get around a page checksum failure. You can change the stored checksum to match the calculated checksum, bypassing the crash and often allowing you sufficient access to your records. How can you use this? InnoDB: Page checksum 2047964429, prior-to-4.0.14-form checksum 4196043695 InnoDB: stored checksum 1873408413, prior-to-4.0.14-form stored checksum 1946395024 # printf '%Xn' 2047964429; printf '%Xn' 4196043695 7A11750D  Primary calculated checksum FA1A8BAF  “Old-style” calculated checksum # expr 16384 * 6  Example Page 6 98304  Starting byte offset for Page 6 Writing the primary calculated checksum over the stored value of page 6: # printf ‘x7Ax11x75x0D’ | dd of=table.ibd bs=1 seek=98304 count=4 conv=notrunc Writing the “old-style” calculated checksum over the stored value of page 6: # printf ‘x7Ax11x75x0D’ | dd of=table.ibd bs=1 seek=98304 count=4 conv=notrunc
  • 24. •Stored in 2 files by default (ib_logfile0/1) •Treated as single file •Circular buffer LOG BLOCK Header (12) Log Records Trailer (4) … LOG BLOCK Header (12) Log Records Trailer (4) … LOG BLOCK Header (12) Log Records Trailer (4) … ib_logfile0ib_logfile1 InnoDB Anatomy – Redo Logs Structure •Log blocks are 512 bytes •Each block contains checkpoint data The Logical Log FileThe Redo Logs
  • 25. InnoDB Anatomy – Redo Logs Optimized log file size innodb_log_file_size Larger size means less checkpoint flushing required, reducing I/O impact. Balance with expected recovery time required as a result of the size (less of an issue in 5.6). General Formula: (Current LSN – LSN 60 seconds later) * 60 / 1024 / 1024 Optimized log buffer size innodb_log_buffer_size Log buffer allows transactions to move forward without having to write the log to disk before commit. Increased size allows larger transactions to run without requiring writes to disk before a commit is performed. How can you use this?
  • 26. InnoDB Anatomy – Index Pages INDEX Pages - B+Tree Structure •Efficient method of storing data on disk in a tree format. •Actual records stored in leaf pages (level 0). •Root-level pages exist at the top of the tree structure. •Non-leaf pages contain only pointers to leaf pages. Level 0 Level 1 Level 2 Root Non-Leaf Leaf Leaf Non-Leaf Leaf
  • 27. InnoDB Anatomy – Index Pages B+Tree Structure – Basic Index Example Root Node Customer IDs 1-500 Non-Leaf 1-250 Non-Leaf 251-500 Leaf Node 251-260 Leaf Node 261-270 Leaf Node 1-10 Leaf Node 11-20 …
  • 28. InnoDB Anatomy – Index Pages: INDEX Pages
    • Records are not physically in order
    • User data “grows down”
    • Page directory “grows up”
    Page layout, from start to end (sizes in bytes): FIL Header (38), INDEX Header (36), FSEG Header (20), System Records (26), User Data …, EMPTY, … Page Directory, FIL Trailer (8).
  • 29. InnoDB Anatomy – Index Pages: INDEX Page Header (after FIL Header). Source: storage/innobase/include/page0page.h
    Name | Byte Length | Offset | Description
    PAGE_N_DIR_SLOTS | 2 | 38 + 0 | Number of slots in the page directory
    PAGE_HEAP_TOP | 2 | 38 + 2 | Pointer to the record heap top
    PAGE_N_HEAP | 2 | 38 + 4 | Number of records in the heap
    PAGE_FREE | 2 | 38 + 6 | Pointer to the start of the page’s free-record list
    PAGE_GARBAGE | 2 | 38 + 8 | Number of bytes in “deleted” records
    PAGE_LAST_INSERT | 2 | 38 + 10 | Pointer to the last inserted record, or NULL if this has been reset, e.g. by a delete
    PAGE_DIRECTION | 2 | 38 + 12 | Last insert direction: PAGE_LEFT, PAGE_RIGHT, …
    PAGE_N_DIRECTION | 2 | 38 + 14 | Number of consecutive inserts in the same direction
    PAGE_N_RECS | 2 | 38 + 16 | Number of user records on the page
    PAGE_MAX_TRX_ID | 8 | 38 + 18 | Highest ID of a transaction that may have modified a record on the page
    PAGE_LEVEL | 2 | 38 + 26 | Level of the node in the index tree
    PAGE_INDEX_ID | 8 | 38 + 28 | ID of the index that the page belongs to
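The header table maps naturally onto a `struct` format string. This is a sketch under assumptions: the names `IndexHeader` and `parse_index_header` are hypothetical, all fields are read as unsigned big-endian integers, and PAGE_INDEX_ID is taken to occupy 8 bytes so that the header totals the 36 bytes shown for the INDEX Header on the page layout slide.

```python
import struct
from collections import namedtuple

FIL_HEADER_SIZE = 38
# Nine 2-byte fields, an 8-byte trx id, a 2-byte level, an 8-byte index id.
INDEX_HEADER_FORMAT = ">9HQHQ"

IndexHeader = namedtuple("IndexHeader", [
    "n_dir_slots", "heap_top", "n_heap", "free", "garbage",
    "last_insert", "direction", "n_direction", "n_recs",
    "max_trx_id", "level", "index_id",
])


def parse_index_header(page):
    """Decode the INDEX page header that follows the 38-byte FIL header."""
    values = struct.unpack_from(INDEX_HEADER_FORMAT, page, FIL_HEADER_SIZE)
    return IndexHeader(*values)
```

Given the bytes of one page, `parse_index_header(page).level` returns the same value that the byte-level demonstration reads by hand.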
  • 30. InnoDB Anatomy – Demonstration: Determining page level on an INDEX page
    Directory: /var/lib/mysql/testdb/
    First, find your page’s start byte (page 3 in this example):
      # expr 16384 \* 3
      49152
    The offset of the PAGE_LEVEL value is 26 bytes after the FIL Header (38 bytes):
      # expr 49152 + 38 + 26
      49216
    The byte length is 2:
      # xxd -ps -s 49216 -l 2 customer.ibd
      0001
    Page Level: 1
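The same lookup can be done in a few lines of Python instead of `expr` and `xxd`. This is a minimal sketch: the function names are hypothetical, and it assumes the default 16384-byte page size and a 2-byte big-endian PAGE_LEVEL field, as in the demonstration.

```python
import struct

PAGE_SIZE = 16384
FIL_HEADER_SIZE = 38
PAGE_LEVEL_OFFSET = 26  # offset of PAGE_LEVEL after the FIL header


def page_level_offset(page_no, page_size=PAGE_SIZE):
    """Absolute byte offset of PAGE_LEVEL for the given page number."""
    return page_no * page_size + FIL_HEADER_SIZE + PAGE_LEVEL_OFFSET


def read_page_level(path, page_no, page_size=PAGE_SIZE):
    """Read the 2-byte PAGE_LEVEL value of an INDEX page from a tablespace file."""
    with open(path, "rb") as f:
        f.seek(page_level_offset(page_no, page_size))
        raw = f.read(2)
    if len(raw) != 2:
        raise ValueError("page %d is beyond the end of the file" % page_no)
    return struct.unpack(">H", raw)[0]
```

Calling `read_page_level("customer.ibd", 3)` from the table's directory would correspond to the `xxd` command on the slide.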
  • 32. Conclusion: Sources and Thanks
    • Jeremy Cole: http://blog.jcole.us/ and https://github.com/jeremycole/
    • Percona: http://www.percona.com/files/percona-live/justin-innodb-internals.pdf
    • MySQL Internals Documentation & Source: http://dev.mysql.com/doc/internals/en/innodb.html and https://launchpad.net/mysql

Editor's Notes

  1. Referential integrity: ensuring validity via adherence to constraints and restrictions. Consistent data: enforced via checksum matching, by default. Transactional: grouping a series of operations into a single, logical, atomic unit of work.
  2. Core InnoDB Concepts after this
  3. The terms “transaction log” and “redo log” are interchangeable when referring to the ib_logfiles.
  4. Tablespaces: Divide the data; InnoDB’s way of holding data for individual tables. Pages: 16K data sections. Extents: Units of allocation that store groups of pages; extents hold up to 64 pages each. Segments: Divisions of data within the tablespaces. Inodes: Contain attributes and pointers to other sections of data.
  5–9. Tablespace: At least 3 initial header pages, each holding the standard page structure. Headers contain values about what to expect from the tablespace. Segment: Acts as a division of the tablespace; a logical group of extents.
  10. Additional log files can be used and/or relocated. Currently only one “group” is supported. Reliance on the log file allows InnoDB to delay flushes/writes to disk.
  11. Undo logs are composed of rollback segments (128), each able to support 1023 transactions; in total they can support up to 128K concurrent transactions, increased from the pre-5.5 value of just a single segment of 1023 transactions.
  12. Undo logs can be split off to be handled elsewhere, such as on an SSD, for optimal performance.
  13. ZFS is an example of a transactional file system; it makes sure writes are atomic.
  14. Additional log files can be used and/or relocated. The log buffer is flushed once per second. MySQL 5.6 adjusts for log file size changes performed while offline. Currently only one “group” is supported. Reliance on the log file allows InnoDB to delay flushes/writes to disk.
  15. Additional log files can be used and/or relocated. Page structure does not apply here (blocks of 512 bytes instead). Total size of the logical log file can be up to 512GB (file size * files in group). Reliance on the log file allows InnoDB to delay flushes/writes to disk. MySQL 5.6 adjusts for log file size changes performed while offline. The log buffer is flushed once per second.
  16. Allows for a consistent, determinable number of reads to access any record in an index. The root page is a “starting point” for accessing the tree; it tells InnoDB where to look and how far it will need to go (the root page’s level indicates how many levels must be descended to reach level 0, hence the bottom-to-top numbering). A tree can be as small as a single root page, or as big as millions of pages in a multi-level structure. Everything is an index in InnoDB; this is how all user records are stored. Records are not stored physically in order, but are linked by pointers. In a single 4-level index tree, you have the potential for 814 billion rows/25.9 TiB of data.
  17. “Non-leaf nodes” are also referred to as “internal nodes”. The root page is a “starting point” for accessing the tree; it tells InnoDB where to look and how far it will need to go (the root page’s level indicates how many levels must be descended to reach level 0, hence the bottom-to-top numbering). Everything is an index in InnoDB. In a single 4-level index tree, you have the potential for 814 billion rows/25.9 TiB of data.
  18. /* Directions of cursor movement */
      #define PAGE_LEFT 1
      #define PAGE_RIGHT 2
      #define PAGE_SAME_REC 3
      #define PAGE_SAME_PAGE 4
      #define PAGE_NO_DIRECTION 5
  19. New tablespaces: undo and temporary tables (non-compressed)
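
The 4-level capacity figure quoted in notes 16 and 17 can be reproduced with a short calculation. This is a sketch under loudly stated assumptions: the per-page counts (468 rows per leaf page and 1203 pointers per non-leaf page) are illustrative values for fully packed pages that are not given in the notes, and the function name is hypothetical.

```python
PAGE_SIZE = 16384  # default InnoDB page size in bytes


def tree_capacity(height, rows_per_leaf=468, pointers_per_non_leaf=1203):
    """Return (rows, leaf_pages, non_leaf_pages, size_bytes) for a full
    B+tree of the given height. The per-page counts are assumptions."""
    # Each non-leaf level multiplies the page count by the fan-out.
    leaf_pages = pointers_per_non_leaf ** (height - 1)
    non_leaf_pages = sum(pointers_per_non_leaf ** i for i in range(height - 1))
    rows = leaf_pages * rows_per_leaf
    size_bytes = (leaf_pages + non_leaf_pages) * PAGE_SIZE
    return rows, leaf_pages, non_leaf_pages, size_bytes


rows, leaves, non_leaves, size_bytes = tree_capacity(4)
print(rows, leaves, non_leaves, size_bytes / 2 ** 40)
```

Under these assumptions a height of 4 yields a little over 814 billion rows in just under 26 TiB, in line with the figure in the notes; real tables with wider rows or partially filled pages will hold fewer rows per page.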