0
Transactional Storage for MySQL
FAST. RELIABLE. PROVEN.
InnoDB Internals: InnoDB File
Formats and Source Code
Structure
My...
Today’s Topics
• Goals of InnoDB
• Key Functional Characteristics
• InnoDB Design Considerations
• InnoDB Architecture
• I...
Goals of InnoDB
• OLTP oriented
• Performance, Reliability, Scalability
• Data Protection
• Portability
InnoDB Key Functional
Characteristics
• Full transaction support
• Row-level locking
• MVCC
• Crash recovery
• Efficient IO
Design Considerations
• Modeled on Gray & Reuter’s “Transactions
Processing: Concepts & Techniques”
• Also emulated the Or...
InnoDB Architecture
IO
Buffer
File Space Manager
Transaction
Handler API Embedded InnoDB API
Cursor / Row
Mini-
transactio...
InnoDB On Disk Format
• InnoDB Database Files
• InnoDB Tablespaces
• InnoDB Pages / Extents
• InnoDB Rows
• InnoDB Indexes...
InnoDB Database Files
ibdata files
Systemtablespace
internal
data
dictionary
MySQL Data Directory
InnoDB
tables
OR innodb_...
InnoDB Tablespaces
• A tablespace consists of multiple files and/or
raw disk partitions.
file_name:file_size[:autoextend[:...
System Tablespace
• Internal Data Dictionary
• Undo
• Insert Buffer
• Doublewrite Buffer
• MySQL Replication Info
InnoDB Tablespaces
Extent
Segment
Extent
Extent Extent
an extent = 64 pages
Extent
Trx id
Row
Field 1
Roll pointer
Field p...
InnoDB Pages
Symbol Value Notes
FIL_PAGE_INODE 3 File segment inode
FIL_PAGE_INDEX 17855 B-tree node
FIL_PAGE_TYPE_BLOB 10...
InnoDB Pages
A page consists of: a page header, a page
trailer, and a page body (rows or other
contents).
Page header
Page...
Page Declares
typedef struct /* a space address */
{
ulint pageno; /* page number within the file */
ulint boffset; /* byt...
InnoDB Compressed Pages
•InnoDB keeps a “modification
log” in each page
•Updates & inserts of small
records are written to...
InnoDB Rows
prefix(768B) ……
overflow
page
COMACT formatCOMACT format
Record hdr Trx ID Roll ptr Fld ptrs overflow-page ptr...
InnoDB Indexes - Primary
●Data rows are stored
in the B-tree leaf
nodes of a clustered
index
●B-tree is organized
by prima...
InnoDB Indexes - Secondary
● Secondary index B-
tree leaf nodes
contain, for each key
value, the primary
keys of the
corre...
DATA
InnoDB Logging
Rollback segments
Log Buffer Buffer Pool
redo
log
rollback
Log File
#1
Log File
#2
log thread
write th...
InnoDB Redo Log
Redo log structure:
Space id PageNo OpCode Data
end of log
min LSN
start of log last checkpoint
File Format Management
• Builtin InnoDB format: “Antelope”
• New “Barracuda” format enables
compression,ROW_FORMAT=DYNAMIC...
InnoDB File Format Design
Considerations
• Durability
• Logging, doublewrite, checksum;
• Performance
• Insert buffering, ...
Source Code Structure
• 31 subdirectories
• Relevant InnoDB source files on file
formats
• Tablespace: fsp0fsp {.c, .ic, ....
Source Code Subdirectories
• buf
• data
• db
• dict
• dyn
• eval
• fil
• fsp
• fut
• ha
• handler
• ibuf
• include
• lock
...
Summary:
Durability, Performance,
Compatibility & Efficiency
• InnoDB is the leading transactional storage engine
for MySQ...
Q U E S T I O N SQ U E S T I O N S
A N S W E R SA N S W E R S
InnoDB Size Limits
• Max # of tables: 4 G
• Max size of a table: 32TB
• Columns per table: 1000
• Max row size: n*4 GB
• 8...
Upcoming SlideShare
Loading in...5
×

Inno db internals innodb file formats and source code structure

3,431

Published on

InnoDB Internals InnoDB File Formats and Source Code Structure

Published in: Education
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,431
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
125
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Transcript of "Inno db internals innodb file formats and source code structure"

  1. 1. Transactional Storage for MySQL FAST. RELIABLE. PROVEN. InnoDB Internals: InnoDB File Formats and Source Code Structure MySQL University, October 2009 Calvin Sun Principal Engineer Oracle Corporation
  2. 2. Today’s Topics • Goals of InnoDB • Key Functional Characteristics • InnoDB Design Considerations • InnoDB Architecture • InnoDB On Disk Format • Source Code Structure • Q & A
  3. 3. Goals of InnoDB • OLTP oriented • Performance, Reliability, Scalability • Data Protection • Portability
  4. 4. InnoDB Key Functional Characteristics • Full transaction support • Row-level locking • MVCC • Crash recovery • Efficient IO
  5. 5. Design Considerations • Modeled on Gray & Reuter’s “Transactions Processing: Concepts & Techniques” • Also emulated the Oracle architecture • Added unique subsystems • Doublewrite • Insert buffering • Adaptive hash index • Designed to evolve with changing hardware & requirements
  6. 6. InnoDB Architecture IO Buffer File Space Manager Transaction Handler API Embedded InnoDB API Cursor / Row Mini- transaction LockB-tree Page Server Applications
  7. 7. InnoDB On Disk Format • InnoDB Database Files • InnoDB Tablespaces • InnoDB Pages / Extents • InnoDB Rows • InnoDB Indexes • InnoDB Logs • File Format Design Considerations
  8. 8. InnoDB Database Files ibdata files Systemtablespace internal data dictionary MySQL Data Directory InnoDB tables OR innodb_file_per_table .ibd files .frm files undo logs insert buffer
  9. 9. InnoDB Tablespaces • A tablespace consists of multiple files and/or raw disk partitions. file_name:file_size[:autoextend[:max:max_file_size]] • A file/partition is a collection of segments. • A segment consists of fixed-length pages. • The page size is always 16KB in uncompressed tablespaces, and 1KB-16KB in compressed tablespaces (for both data and index).
  10. 10. System Tablespace • Internal Data Dictionary • Undo • Insert Buffer • Doublewrite Buffer • MySQL Replication Info
  11. 11. InnoDB Tablespaces Extent Segment Extent Extent Extent an extent = 64 pages Extent Trx id Row Field 1 Roll pointer Field pointers Field 2 Field n Row Page Row Row Row Row Leaf node segment Tablespace Rollback segment Non-leaf node segment RowRow
  12. 12. InnoDB Pages Symbol Value Notes FIL_PAGE_INODE 3 File segment inode FIL_PAGE_INDEX 17855 B-tree node FIL_PAGE_TYPE_BLOB 10 Uncompressed BLOB page FIL_PAGE_TYPE_ZBLOB 11 1st compressed BLOB page FIL_PAGE_TYPE_ZBLOB2 12 Subsequent compressed BLOB page FIL_PAGE_TYPE_SYS 6 System page FIL_PAGE_TYPE_TRX_SYS 7 Transaction system page others i-buf bitmap, I-buf free list, file space header, extent desp page, new allocated page InnoDB Page TypesInnoDB Page Types
  13. 13. InnoDB Pages A page consists of: a page header, a page trailer, and a page body (rows or other contents). Page header Page trailer row offset array Row RowRow Row Row RowRow Row Row RowRow
  14. 14. Page Declares typedef struct /* a space address */ { ulint pageno; /* page number within the file */ ulint boffset; /* byte offset within the page */ } fil_addr_t; typedef struct { ulint checksum; /* checksum of the page (since 4.0.14) */ ulint page_offset; /* page offset inside space */ fil_addr_t previous; /* offset or fil_addr_t */ fil_addr_t next; /* offset or fil_addr_t */ dulint page_lsn; /* lsn of the end of the newest modification log record to the page */ PAGE_TYPE page type; /* file page type */ dulint file_flush_lsn;/* the file has been flushed to disk at least up to this lsn */ int space_id; /* space id of the page */ char data[]; /* will grow */ ulint page_lsn; /* the last 4 bytes of page_lsn */ ulint checksum; /* page checksum, or checksum magic, or 0 */ } PAGE, *PAGE;
  15. 15. InnoDB Compressed Pages •InnoDB keeps a “modification log” in each page •Updates & inserts of small records are written to the log w/o page reconstruction; deletes don’t even require uncompression •Log also tells InnoDB if the page will compress to fit page size •When log space runs out, InnoDB uncompresses the page, applies the changes and recompresses the page Page header modification log Page trailer page directory compressed data BLOB pointers empty space
  16. 16. InnoDB Rows prefix(768B) …… overflow page COMACT formatCOMACT format Record hdr Trx ID Roll ptr Fld ptrs overflow-page ptr .. Field values overflow page … … DYNAMIC formatDYNAMIC format 20 bytes
  17. 17. InnoDB Indexes - Primary ●Data rows are stored in the B-tree leaf nodes of a clustered index ●B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added. xxxxxxxxxxxx ---- nnnnnnnnnnnn001001001001 ---- 275275275275 276276276276 –––– 500500500500 clustered (primary key) index 501501501501 ---- 630630630630 631631631631 ---- 768768768768 769769769769 ---- 800800800800 801801801801 ---- 949949949949 950950950950 ---- xxxxxxxxxxxx 001001001001 –––– 500500500500 801801801801 –––– nnnnnnnnnnnn 500500500500 –––– 800800800800 PK valuesPK valuesPK valuesPK values 001001001001 ---- nnnnnnnnnnnn Key valuesKey valuesKey valuesKey values 501501501501----630630630630 + data for+ data for+ data for+ data for corresponding rowscorresponding rowscorresponding rowscorresponding rows …… Primary Index
  18. 18. InnoDB Indexes - Secondary ● Secondary index B- tree leaf nodes contain, for each key value, the primary keys of the corresponding rows, used to access clustering index to obtain the data clustered (primary key) index clustered (primary key) index Secondary index PK valuesPK valuesPK valuesPK values 001001001001 ---- nnnnnnnnnnnn B-tree leaf nodes, containing data key valueskey valueskey valueskey values A ZA ZA ZA Z B-tree leaf nodes, containing PKs Secondary index key valueskey valueskey valueskey values A ZA ZA ZA Z B-tree leaf nodes, containing PKs Secondary Index
  19. 19. DATA InnoDB Logging Rollback segments Log Buffer Buffer Pool redo log rollback Log File #1 Log File #2 log thread write thread log files ibdata files
  20. 20. InnoDB Redo Log Redo log structure: Space id PageNo OpCode Data end of log min LSN start of log last checkpoint
  21. 21. File Format Management • Builtin InnoDB format: “Antelope” • New “Barracuda” format enables compression,ROW_FORMAT=DYNAMIC • Fast index creation, other features do not require Barracuda file format • Builtin InnoDB can access “Antelope” databases, but not “Barracuda” databases • Check file format tag in system tablespace on startup • Enable a file format with new dynamic parameter innodb_file_format • Preserves ability to downgrade easily .ibd data files (file per table)
  22. 22. InnoDB File Format Design Considerations • Durability • Logging, doublewrite, checksum; • Performance • Insert buffering, table compression • Efficiency • Dynamic row format, table compression • Compatibility • File format management
  23. 23. Source Code Structure • 31 subdirectories • Relevant InnoDB source files on file formats • Tablespace: fsp0fsp {.c, .ic, .h} • Page: page0page, page0zip {.c, .ic, .h} • Log: log0log {.c, .ic, .h}
  24. 24. Source Code Subdirectories • buf • data • db • dict • dyn • eval • fil • fsp • fut • ha • handler • ibuf • include • lock • log • math • mem • mtr • os • page • pars • que • read • rem • row • srv • sync • thr • trx • usr • ut
  25. 25. Summary: Durability, Performance, Compatibility & Efficiency • InnoDB is the leading transactional storage engine for MySQL • InnoDB’s architecture is well-suited to modern, on- line transactional applications; as well as embedded applications. • InnoDB’s file format is designed for high durability, better performance, and easy to manage
  26. 26. Q U E S T I O N SQ U E S T I O N S A N S W E R SA N S W E R S
  27. 27. InnoDB Size Limits • Max # of tables: 4 G • Max size of a table: 32TB • Columns per table: 1000 • Max row size: n*4 GB • 8 kB if stored on the same page • n*4 GB with n BLOBs • Max key length: 3500 • Maximum tablespace size: 64 TB • Max # of concurrent trxs: 1023
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×