• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
The InnoDB Storage Engine for MySQL
 

The InnoDB Storage Engine for MySQL

on

  • 643 views

 

Statistics

Views

Total Views
643
Views on SlideShare
633
Embed Views
10

Actions

Likes
1
Downloads
26
Comments
0

1 Embed 10

https://twitter.com 10

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    The InnoDB Storage Engine for MySQL The InnoDB Storage Engine for MySQL Presentation Transcript

    • <Insert Picture Here> The InnoDB Storage Engine for MySQL
 Morgan Tocker, MySQL Community Manager
 http://www.tocker.ca/

    • Safe Harbor Statement The  following  is  intended  to  outline  our  general  product  direction.   It  is  intended  for  information  purposes  only,  and  may  not  be   incorporated  into  any  contract.  It  is  not  a  commitment  to  deliver   any  material,  code,  or  functionality,  and  should  not  be  relied  upon   in  making  purchasing  decisions.   
 The  development,  release,  and  timing  of  any  features  or   functionality  described  for  Oracle’s  products  remains  at  the  sole   discretion  of  Oracle.
    • 4 Years of MySQL Innovation MySQL Cluster 7.3 MySQL Workbench 6.0 MySQL Migration Wizard <Insert Picture Here> MySQL 5.5 MySQL 5.6 Windows installer & Tools MySQL 5.7 M y S Q L Cluster MySQL MySQL Enterprise Monitor 2.3 & e r M a n a g 3.0 Applier for Hadoop MySQL Enterprise Backup Security MySQL Utilities MySQL Workbench 5.2 & 6.0 Scalability MySQL Cluster 7.2 HA MySQL Enterprise MySQL Cluster 7.1 O r a c l e C e r t i f i c a t i o n s Audit
    • Hello and Welcome! • I will be talking about InnoDB’s internal behaviour. • Not talking (much) about MySQL. • Aim of this talk is to give you X-ray vision. • i.e. not so many direct takeaways, but one day it will help you debug a problem.
    • Prerequisites Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.
    • MySQL Architecture
    • IO Performance L1 cache reference Branch mispredict L2 cache reference Mutex lock/unlock Main memory reference Compress 1K bytes with Zippy Send 2K bytes over 1 Gbps network Read 1 MB sequentially from memory Round trip within same datacenter Disk seek Read 1 MB sequentially from disk Send packet CA->Netherlands->CA 0.5 ns! 5 ns! 7 ns! 25 ns! 100 ns! 3,000 ns! 20,000 ns! 250,000 ns! 500,000 ns! 10,000,000 ns! 20,000,000 ns! 150,000,000 ns See: http://www.linux-mag.com/cache/7589/1.html and Google http:// www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
    • IO Performance (cont.) • 5-10ms per disk IO. • Maybe 50us for a high end SSD. • Still not “memory speed”.
    • Buffered IO • Operating Systems compensate well already. • Reads are cached with free memory. • Writes don’t happen instantly. • A step is introduced to rewrite and merge. Block 9, 10, 1, 4, 200, 5. Block 1, 4, 5, 9, 10, 200
    • fsync Synopsis #include <unistd.h> int fsync(int fd); int fdatasync(int fd); ! Description fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).
    • Basic  Operation Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.
    • Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB High Level Overview IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd In Memory On Disk
    • Query (pages not in buffer pool) mysqld SELECT * FROM a WHERE id = 10; Buffer Pool Buffer Pool LRU Adaptive Hash Indexes Data Dictionary Cache Buffer Pool Flush List Page Cache Log Group ibdata1 space 0 Storage InnoDB Additional Mem Pool Log Buffer Transaction System Caching Not Found IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • Query (pages in buffer pool) mysqld SELECT * FROM a WHERE id = 10; Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • Update Query in a Transaction (simplified) mysqld UPDATE a SET col1 = ‘new’ WHERE id = 10; commit; Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • Log files • Provide recovery. • Only written to in regular operation. • Read only required if there is a crash. • Are rewritten over-and-over again. • Think of it like a tank tread.
    • Log files (cont.) • Are an optimization! • 512B aligned sequential writes. • Tablespace writes are 16KiB random writes. • Tablespace writes to same pages in close time window can be merged. • Just need a large enough log file.
    • Checkpoint (Background Activity) mysqld (nothing) Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • FAQ • Q: What do we write to the log - is it committed data only, or can we write uncommitted data as well? • A: Both.
    • FAQ • Q: How do you unapply transactions? • A: UNDO space.
 Think of it like a hidden table internally stored in ibdata1.
    • Update Query (More Accurate*) mysqld UPDATE a SET col1 = ‘new’ WHERE id = 10; commit; Page Cache Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Additional Mem Pool Redirect older version Update any indexes Modify row in place of row to undo space to hold both versions. Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • Update Query (cont.) • Background purge process is able to clean old rows from UNDO as soon as oldest transaction advances forward.
    • Summarized Performance Characteristics • Log Files: • Are short sequential writes. • They permit InnoDB to delay tablespace writes enabling more merging/optimization. • Buffer Pool: • “In memory version of the tablespace”. • Loading/unloading via modified LRU algorithm.
    • Index Structure • Indexes and “data” in InnoDB are B+Trees. • Clustered Index design means that data itself is stored in an index.
    • Index Structure (cont.) Empty root Infimum Level 0 Root Page 3 Next Record Supremum
    • Index Structure (cont.) Insert: 1 Infimum Level 0 Root Page size is 16KB B* is 2KB Page 3 1 B* Supremum
    • Index Structure (cont.) Insert: 1 to 7 I Level 0 Root S Page 3 1 B* 2 B* 3 B* 4 B* 5 B* 6 B* 7 B*
    • Index Structure (cont.) Insert: 8 Infimum Page 3 Level 1 Root 1 4 I Level 0 Leaf Allocate new page and link in root Move records to new page Split new page Supremum S Page 4 1 B* 2 B* 3 B* 4 B* 5 B* 6 B* 7 B*
    • Index Structure (cont.) Insert: 8 (Cont.) Infimum Level 1 Root 1 4 I Level 0 Leaf S Page 4 1 B* Split at the middle of original page 2 B* 3 B* Supremum Page 3 4 5 I S Page 5 4 B* 5 B* 6 B* 7 B*
    • Index Structure (cont.) Insert: 9 and 10 Infimum Level 1 Root 1 4 I Level 0 Leaf S Page 4 1 B* 2 B* 3 B* Supremum Page 3 4 5 I S Page 5 4 B* 5 B* 6 B* 7 B* 8 B* 9 B* 10 B*
    • Index Structure (cont.) Insert: 11 Infimum Level 1 Root 4 5 1 4 I Level 0 Leaf Page 3 S Page 4 I Supremum 11 6 S Page 5 I S Page 6 1 B* 2 B* 3 B* Insert leads to a split at the insertion point 4 B* 5 B* 6 B* 7 B* 8 B* 9 B* 10 B* 11 B*
    • Index Structure Level 2 Root Infimum Level 1 Internal Page 6 0 A 1 B S Page 4 ≥0 → 6 I Level 0 Leaf Next Record ≥0 → 4 I S Supremum Page 3 ≥4 → 5 I Next Page ≥2 → 7 ≥4 → 8 Prev Page I Page 7 2 C 3 D S I Page 8 4 E 5 F S Page 5 S ≥6 → 9 I Page 9 6 G 7 H S
    • Page Format 0 38 FIL Header (38) Other headers and page data, depending on page type. Total usable space: 16,338 bytes. 16376 16384 FIL Trailer (8)
    • Row Format N-5 N-4 N-2 N N+k N+k+6 N+k+13 N+k+13+j Variable field lengths (1-2 bytes per var. field) Info Flags (4 bits) Number of Records Owned (4 bits) Order (13 bits) Record Type (3 bits) Next Record Offset (2) Cluster Key Fields (k) Transaction ID (6) Roll Pointer (7) Non-Key Fields (j)
    • Conclusion • Page is basic unit of storage. • Default is 16KiB • Rows of variable length.
    • Two more useful features • Adaptive hash - Partial hash index to accelerate secondary key lookups. • Change buffering - when non-unique indexes are not in memory, changes can be temporarily buffered until they are.
    • Query (by secondary key) mysqld SELECT * FROM a WHERE b_key = 10; Buffer Pool Data Dictionary Cache Buffer Pool Flush List Adaptive Hash Indexes Log Group ibdata1 space 0 Storage Buffer Pool LRU Page Cache Additional Mem Pool Log Buffer Transaction System Caching InnoDB IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • Update Query (large table) mysqld UPDATE a SET col1 = ‘new’ WHERE id = 10; commit; Buffer Pool Caching InnoDB Buffer Pool LRU Adaptive Hash Indexes Data Dictionary Cache Buffer Pool Flush List Page Cache Additional Mem Pool Not Required Log Group ibdata1 space 0 Storage Transaction System Log Buffer IBUF_HEADER IBUF_TREE TRX_SYS FIRST_RSEG DICT_HDR iblogfile0 iblogfile1 Doublewrite Buffer Data Dict. Block 1 (64 pages) Block 2 (64 pages) SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS iblogfile2 Tables with file_per_table A.ibd B.ibd C.ibd
    • New  Features Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.
    • MySQL 5.5+ • • • • • • • • • IO Scalability Async IO Multiple Buffer Pools Adaptive Flushing Scan Resistant LRU Compressed Pages CPU Scalability Improved Atomics Spin Loops with PAUSE
    • MySQL 5.6+ • LRU Dump and Restore • Improved Group Commit • Fulltext Search • Fast Read-only Transactions • Memcached Interface • Information Schema metadata tables • • • • Persistent Statistics Variable Page Size Online DDL Transportable Tablespace • Transactional Replication
 Using InnoDB
    • MySQL 5.7+ • Faster Temporary Tables • Index Lock Contention Reduction • More Online DDL • Extend VARCHAR • Rename Index • Improved Read-Only Transactions • Improved CPU Scalability
    • Configuration Copyright  ©  2012  Oracle  and/or  its  affiliates.  All  rights  reserved.
    • The Top 3 1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb_flush_log_at_trx_commit
    • innodb-buffer-pool-size • Really only one major buffer/cache settings to set. • Responsible for all pages types (data, indexes, undo, insert buffer..)
    • innodb-buffer-pool-size (cont.) • Recommendation is 50-80% of RAM. • Default is 128M of RAM. • Please allow 5-10% on top for other meta data to grow.
    • innodb-log-file-size • Log files are on disk, but this contributes to how many unflushed (dirty) pages you can hold in memory. • In theory larger log files = longer crash recovery. • In MySQL 5.5 -2G max. • In MySQL 5.6 - 4G is usually safe. • Early versions should be much smaller. • Default is 48M Log Files.
    • innodb_flush_log_at_trx_commit • Default is full ACID Compliance (=1) • Can be set to 0/2 if you do not mind some data loss.
    • innodb_flush_log_at_trx_commit • 0 = Log buffer written + synced once per second. Nothing done at commit. • 1 = Log buffer written + synced once per second + written and synced on commit. • 2 = Log buffer written + synced once per second + written (not synced) on commit. ! 2 is a slightly safer version of 0.
    • Requires about 5-10% of buffer pool size as overhead (not directly configurable). Basic Configuration Buffer Pool InnoDB Buffer Pool LRU Adaptive Hash Indexes Additional Mem Pool innodb_log_file_size. Typical values 256M+. Default of 2 files (innodb_log_files_in_group). Log Group iblogfile0 ibdata1 space 0 innodb_log_buffer_size. Typical Doublewrite Buffer values 1-8M. Flushing IBUF_HEADER IBUF_TREE Block 1 (64 pages) behaviour influenced by TRX_SYS Block 2 (64 pages) FIRST_RSEG innodb_flush_log_at_trx_commit. DICT_HDR Storage Data Dictionary Cache Buffer Pool Flush List Page Cache Log Buffer Transaction System Caching Innodb_buffer_pool_size. Recommendation is 50-80% RAM. iblogfile1 iblogfile2 Tables with file_per_table Data Dict. SYS_TABLES SYS_COLUMNS SYS_INDEXES SYS_FIELDS A.ibd B.ibd C.ibd innodb_file_per_table (Default: ON in 5.6+)
    • The ~Top 10 1. innodb-buffer-pool-size 2. innodb-log-file-size 3. innodb-log-buffer-size 4. innodb_flush_log_at_trx_commit 5. innodb_flush_method 6. innodb_flush_neighbors 7. innodb_io_capacity, innodb_io_capacity_max, innodb_lru_scan_depth 8. innodb-buffer-pool-instances 9. innodb_read_io_threads and innodb_write_io_threads
    • Less-likely to need configuration • • • • • innodb_thread_concurrency innodb_concurrency_tickets innodb_max_pct_dirty_pages innodb_use_native_aio (always on) innodb_old_blocks_time (5.6 default: 1000)
    • Deprecated Settings • Typically “remove on sight” from config files: • innodb_additional_mempool_size • innodb_use_sys_malloc
    • Credits • InnoDB Architecture Diagrams via https://github.com/ jeremycole/innodb_diagrams • Available under (3-clause) BSD license
 Copyright (c) 2013, Twitter, Inc.
 Copyright (c) 2013, Jeremy Cole <jeremy@jcole.us>
 Copyright (c) 2013, Davi Arnaut <darnaut@gmail.com>