IAC 2024 - IA Fast Track to Search Focused AI Solutions
Ā
Inno Db Internals Inno Db File Formats And Source Code Structure
1. Transactional Storage for MySQL
FAST. RELIABLE. PROVEN.
InnoDB Internals: InnoDB File
Formats and Source Code
Structure
MySQL Conference, April 2009
Heikki Tuuri Calvin Sun
CEO Innobase Oy Principal Engineer
Vice President, Development Oracle Corporation
Oracle Corporation
2. Todayās Topics
ā¢ Goals of InnoDB
ā¢ Key Functional Characteristics
ā¢ InnoDB Design Considerations
ā¢ InnoDB Architecture
ā¢ InnoDB On Disk Format
ā¢ Source Code Structure
ā¢ Q&A
3. Goals of InnoDB
ā¢ OLTP oriented
ā¢ Performance, Reliability, Scalability
ā¢ Data Protection
ā¢ Portability
4. InnoDB Key Functional
Characteristics
ā¢ Full transaction support
ā¢ Row-level locking
ā¢ MVCC
ā¢ Crash recovery
ā¢ Efficient IO
5. Design Considerations
ā¢ Modeled on Gray & Reuterās āTransactions
Processing: Concepts & Techniquesā
ā¢ Also emulated the Oracle architecture
ā¢ Added unique subsystems
ā¢ Doublewrite
ā¢ Insert buffering
ā¢ Adaptive hash index
ā¢ Designed to evolve with changing
hardware & requirements
6. InnoDB Architecture
Server Applications
Handler API Embedded InnoDB API
Transaction
Cursor / Row
Mini-
B-tree Lock
transaction
Page
Buffer
File Space Manager
IO
7. InnoDB On Disk Format
ā¢ InnoDB Database Files
ā¢ InnoDB Tablespaces
ā¢ InnoDB Pages / Extents
ā¢ InnoDB Rows
ā¢ InnoDB Indexes
ā¢ InnoDB Logs
ā¢ File Format Design Considerations
8. InnoDB Database Files
MySQL Data Directory
System tablespace
InnoDB
tables
internal
data .frm files
dictionary
innodb_file_per_table
OR
insert
buffer
undo
logs
.ibd files
ibdata files
9. InnoDB Tablespaces
ā¢ A tablespace consists of multiple files and/or
raw disk partitions.
file_name:file_size[:autoextend[:max:max_file_size]]
ā¢ A file/partition is a collection of segments.
ā¢ A segment consists of fixed-length pages.
ā¢ The page size is always 16KB in uncompressed
tablespaces, and 1KB-16KB in compressed
tablespaces (for both data and index).
10. System Tablespace
ā¢ Internal Data Dictionary
ā¢ Undo
ā¢ Insert Buffer
ā¢ Doublewrite Buffer
ā¢ MySQL Replication Info
11. InnoDB Tablespaces
Tablespace
Segment
Extent Extent
Leaf node segment
Non-leaf node segment Extent Extent
Extent
Rollback segment
Page
Row Row
Row
Trx id Row Row Row
Roll pointer
Row Row
Field pointers
Field 1 Field 2 Field n
an extent = 64 pages
12. InnoDB Pages
InnoDB Page Types
Symbol Value Notes
3 File segment inode
FIL_PAGE_INODE
17855 B-tree node
FIL_PAGE_INDEX
10 Uncompressed BLOB page
FIL_PAGE_TYPE_BLOB
1st compressed BLOB page
11
FIL_PAGE_TYPE_ZBLOB
12 Subsequent compressed BLOB page
FIL_PAGE_TYPE_ZBLOB2
6 System page
FIL_PAGE_TYPE_SYS
7 Transaction system page
FIL_PAGE_TYPE_TRX_SYS
i-buf bitmap, I-buf free list, file space
others header, extent desp page, new
allocated page
13. InnoDB Pages
A page consists of: a page header, a page
trailer, and a page body (rows or other
contents).
Page header
Row Row Row Row
Row Row
Row Row Row
Row Row
row offset array
Page trailer
14. Page Declares
typedef struct /* a space address */
{
ulint pageno; /* page number within the file */
ulint boffset; /* byte offset within the page */
} fil_addr_t;
typedef struct
{
ulint checksum; /*
checksum of the page (since 4.0.14) */
ulint page_offset; /*
page offset inside space */
fil_addr_t previous; /*
offset or fil_addr_t */
fil_addr_t next; /*
offset or fil_addr_t */
dulint page_lsn; /*
lsn of the end of the newest
modification log record to the page */
PAGE_TYPE page type; /* file page type */
dulint file_flush_lsn;/* the file has been flushed to disk
at least up to this lsn */
int space_id; /* space id of the page */
char data[]; /* will grow */
ulint page_lsn; /* the last 4 bytes of page_lsn */
ulint checksum; /* page checksum, or checksum magic, or 0 */
} PAGE, *PAGE;
15. InnoDB Compressed Pages
ā¢ InnoDB keeps a āmodification
Page header
logā in each page
ā¢ Updates & inserts of small
compressed data records are written to the log
w/o page reconstruction;
deletes donāt even require
uncompression
ā¢ Log also tells InnoDB if the
modification log
page will compress to fit page
empty space size
BLOB pointers ā¢ When log space runs out,
InnoDB uncompresses the
page directory
page, applies the changes and
Page trailer
recompresses the page
16. InnoDB Rows
ā¦ ā¦
prefix(768B)
COMACT format
overflow
20 bytes page
ā¦ ā¦
DYNAMIC format
overflow
page
Record hdr Trx ID Roll ptr Fld ptrs overflow-page ptr .. Field values
17. InnoDB Indexes - Primary
rows are stored
āData
PK values
001 - nnn
ā¦ ā¦ in the B-tree leaf
nodes of a clustered
index
001 ā 500 ā 801 ā
500 800 nnn
āB-tree is organized
xxx
by primary key or
-
501 631 769 801 950
001 276 ā nnn
- - - - -
- 500
630 768 800 949 xxx
275
non-null unique key
of table, if defined;
clustered
else, an internal
(primary key)
index
Key values
column with 6-byte
501-
501-630 Primary Index
+ data for
corresponding rows
ROW_ID is added.
18. InnoDB Indexes - Secondary
clustered
clustered
(primary key)
(primary PK values
key)
index - nnn
001
index
ā Secondary index B-
tree leaf nodes
contain, for each key
value, the primary B-tree leaf nodes, containing data
keys of the
corresponding rows,
used to access key values
AZ
clustering index to
obtain the data
Secondary Index
B-tree leaf nodes, containing PKs
Secondary index
Secondary index
20. InnoDB Redo Log
end of log start of log last checkpoint
min LSN
Redo log structure:
Space id PageNo OpCode Data
21. File Format Management
ā¢ Builtin InnoDB format: āAntelopeā
ā¢ New āBarracudaā format enables
compression,ROW_FORMAT=DYNAMIC
ā¢ Fast index creation, other features do not
.ibd
require Barracuda file format
data files
(file per
ā¢ Builtin InnoDB can access āAntelopeā
table)
databases, but not āBarracudaā
databases
ā¢ Check file format tag in system tablespace
on startup
ā¢ Enable a file format with new dynamic
parameter innodb_file_format
ā¢ Preserves ability to downgrade easily
24. Source Code Subdirectories
ā¢ buf ā¢ ibuf ā¢ que
ā¢ data ā¢ include ā¢ read
ā¢ db ā¢ lock ā¢ rem
ā¢ dict ā¢ log ā¢ row
ā¢ dyn ā¢ math ā¢ srv
ā¢ eval ā¢ mem ā¢ sync
ā¢ fil ā¢ mtr ā¢ thr
ā¢ fsp ā¢ os ā¢ trx
ā¢ fut ā¢ page ā¢ usr
ā¢ ha ā¢ pars ā¢ ut
ā¢ handler
25. Summary:
Durability, Performance,
Compatibility & Efficiency
ā¢ InnoDB is the leading transactional storage engine
for MySQL
ā¢ InnoDBās architecture is well-suited to modern, on-
line transactional applications; as well as embedded
applications.
ā¢ InnoDBās file format is designed for high durability,
better performance, and easy to manage
26. For More Information ā¦
2009 MySQL User Conference
InnoDB Birds of a Feather
Wed 7:30pm
Ballroom C
ā¢ Heikki Tuuri: Concurrency Control: How it Really Works, Thurs,
2:50pm
Please visit www.innodb.com,
blogs.innodb.com and forums.innodb.com
29. InnoDB Size Limits
Max # of tables: 4 G
ā¢
Max size of a table: 32TB
ā¢
Columns per table: 1000
ā¢
Max row size: n*4 GB
ā¢
ā¢ 8 kB if stored on the same page
ā¢ n*4 GB with n BLOBs
ā¢ Max key length: 3500
ā¢ Maximum tablespace size: 64 TB
ā¢ Max # of concurrent trxs: 1023