SlideShare a Scribd company logo
1 of 52
Innodb Architecture and Internals
                                  Peter Zaitsev
                  Percona Live, Washington DC
                               11 January 2012
-2-


               About Presentation
●   Brief Introduction in Innodb Architecture
    – This area would deserve many books




                                           www.percona.com
Innodb Versions

●   MySQL 5.5 Current GA
    – Lots of improvements compared to previous versions
●   MySQL 5.6 Current development release
    – Will mention some changes
●   Percona Server 5.5
    – Based on MySQL 5.5 with improvements




                                           www.percona.com
-4-



                General Architecture

●   Traditional OLTP Engine
    – “Emulates Oracle Architecture”
●   Implemented using MySQL Storage engine API
●   Row Based Storage. Row Locking. MVCC
●   Data Stored in Tablespaces
●   Log of changes stored in circular log files
●   Data pages as pages in “Buffer Pool”


                                          www.percona.com
-5-


         Storage Files Layout



Physical Structure of Innodb Tabespaces and
                    Logs




                                 www.percona.com
-6-



                 Innodb Tablespaces

●   All data stored in Tablespaces
    – Changes to these databases stored in Circular Logs
    – Changes has to be reflected in tablespace before log
      record is overwritten
●   Single tablespace or multiple tablespace
    – innodb_file_per_table=1
●   System information always in main tablespace
    – Main tablespace can consist of many files
       • They are concatenated

                                             www.percona.com
-7-



                  Tablespace Format

●   Collection of Segments
    – Segment is like a “file”
●   Segment is number of extents
    – Typically 64 of 16K page sizes
    – Smaller extents for very small objects
●   First Tablespace page contains header
    – Tablespace size
    – Tablespace id

                                               www.percona.com
-8-



                   Types of Segments

●   Each table is Set of Indexes
    – Innodb has “index organized tables”
●   Each index has
    – Leaf node segment
    – Non Leaf node segment
●   Special Segments
    – Rollback Segment(s)
    – Insert buffer, etc

                                            www.percona.com
-9-


                 Innodb Log Files

●   Set of log files (ib_logfile?)
    – 2 log files by default. Effectively concatenated
●   Log Header
    – Stores information about last checkpoint
●   Log is NOT organized in pages, but records
    – Records aligned 512 bytes, matching disk sector
●   Log record format “physiological”
    – Stores Page# and operation to do on it
●   Only REDO operations are stored in logs.
                                               www.percona.com
Separate Undo Tablespace
●   MySQL 5.6 allows to store unto tablespace in
    separate set of files
    – innodb_undo_directory
    – innodb_undo_tablespaces
    – innodb_undo_logs
●   Note once you enable these options you can't
    downgrade
●   Offers another flexibility of using fast storage
    (such as SSD)

                                           www.percona.com
-11-



    Innodb Threads Architecture




What threads are there and what they do




                               www.percona.com
-12-



            General Thread Architecture
●   Using MySQL Threads for execution
    – Normally thread per connection
●   Transaction executed mainly by such thread
    – Little benefit from Multi-Core for single query
●   innodb_thread_concurrency can be used to limit
    number of executing threads
    – Reduce contention
●   This limit is number of threads in kernel
    – Including threads doing Disk IO or storing data in TMP
      Table.
                                                 www.percona.com
-13-



                       Helper Threads
●   Main Thread
    – Schedules activities – flush, purge, checkpoint, insert
      buffer merge
●   IO Threads
    – Read – multiple threads used for read ahead
    – Write – multiple threads used for background writes
    – Insert Buffer thread used for Insert buffer merge
    – Log Thread used for flushing the log
●   Purge thread(s) (MySQL 5.5 and XtraDB)
●   Deadlock detection thread & Others
                                                 www.percona.com
-14-



           Memory Handling




How Innodb Allocates and Manages Memory




                              www.percona.com
-15-


           Memory Allocation Basics

●   Buffer Pool
    – Set by innodb_buffer_pool_size
    – Database cache; Insert Buffer; Locks
    – Takes More memory than specified
       • Extra space needed for Latches, LRU etc
●   Additional Memory Pool
    – Dictionary and other allocations
    – innodb_additional_mem_pool_size
       • Not used in newer releases
●   Log Buffer (innodb_log_buffer_size)
                                                   www.percona.com
-16-



          Disk IO




How Innodb Performs Disk IO




                         www.percona.com
-17-



                          Reads

●   Most reads done by executing threads
●   Read-Ahead performed by background threads
    – Linear
    – Random
    – Do not count on read ahead a lot
●   Insert Buffer merge process causes reads




                                         www.percona.com
-18-



                              Writes
●   Data Writes are Background in Most cases
    – As long as you can flush data fast enough you're good
●   Synchronous flushes can happen if no free
    buffers available
●   Log Writes can by sync or async depending on
    innodb_flush_log_at_trx_commit
    – 1 – fsync log on transaction commit
    – 0 – do not flush. Flushed in background ~ once/sec
    – 2 – Flush to OS cache but do not call fsync()
       • Data safe if MySQL Crashes but OS Survives
                                                  www.percona.com
Flush List Writes

●   Flushing to advance “earliest modify LSN”
    – To free log space so it can be reduced
●   Most of writes typically happen this way
●   Number of pages to flush per cycle depended on
    the load
    – “innodb_adaptive_flushing”
    – Percona Server has more flushing modes
       • See innodb_adaptive_flushing_method
●   If Flushing can't keep up stalls can happen
                                               www.percona.com
Example of Misbehavior
●   Data fits in memory and can be modified fast
    – Yet we can't flush data fast enough
●   Working on solution in XtraDB




                                            www.percona.com
LRU Flushes
●   Can happen in workloads with data sets larger
    than memory
●   If Innodb is unable to find clean page in 10% of
    LRU list
●   LRU Flushes happen in user threads
●   Hard to see exact number in standard Innodb
    – XtraDB adds
      Innodb_buffer_pool_pages_LRU_flushed



                                           www.percona.com
LRU Flushes in MySQL 5.6
●   MySQL 5.6 adds “page_cleaner” to avoid LRU
    flushes in User Threads
●   innodb_lru_scan_depth=N
    – Controlls how deeply page cleaner will examine Tail
      of LRU for dirty pages
    – Happens once per second




                                             www.percona.com
-23-



                     Page Checksums
●   Protection from corrupted data
    – Bad hardware, OS Bugs, Innodb Bugs
    – Are not completely replaced by Filesystem Checksums
●   Checked when page is Read to Buffer Pool
●   Updated when page is flushed to disk
●   Can be significant overhead
    – Especially for very fast storage
●   Can be disabled by innodb_checksums=0

                                            www.percona.com
-24-


                Double Write Buffer

●   Innodb log requires consistent pages for recovery
●   Page write may complete partially
    – Updating part of 16K and leaving the rest
●   Double Write Buffer is short term page level log
●   The process is:
    – Write pages to double write buffer; Sync
    – Write Pages to their original locations; Sync
    – Pages contain tablespace_id+page_id
●   On crash recovery pages in buffer are compared
    to their original location            www.percona.com
-25-



                   Direct IO Operation

●   Default IO mode for Innodb data is Buffered
●   Good
    – Faster flushes when no write cache
    – Faster warmup on restart
    – Reduce problems with inode locking on EXT3
●   Bad
    – Lost of effective cache memory due to double buffering
    – OS Cache could be used to cache other data
    – Increased tendency to swap due to IO pressure
●   innodb_flush_method=O_DIRECT
                                                   www.percona.com
-26-



                           Log IO

●
    Log are opened in buffered mode
    – Even with innodb_flush_method=O_DIRECT
    – XtraDB can use O_DIRECT for logs
       • innodb_flush_method=ALL_O_DIRECT
●   Flushed by fsync() - default or O_SYNC
●   Logs are often written in 512 byte blocks
    – innodb_log_block_size=4096 in XtraDB
●   Logs which fit in cache may improve performance
    – Small transactions and
      innodb_flush_log_at_trx_commit=1 or 2
                                                www.percona.com
-27-



                Indexes




How Indexes are Implemented in Innodb




                              www.percona.com
-28-



               Everything is the Index
●   Innodb tables are “Index Organized”
    – PRIMARY KEY contains data instead of data pointer
●   Hidden PRIMARY KEY is used if not defined (6b)
●   Data is “Clustered” by PRIMARY KEY
    – Data with close PK value is stored close to each other
    – Clustering is within page ONLY
●   Leaf and Non-Leaf nodes use separate Segments
    – Makes IO more sequential for ordered scans
●   Innodb system tables SYS_TABLES and SYS_INDEXES
    hold information about index “root”
                                                www.percona.com
-29-


                 Index Structure

●   Secondary Indexes refer to rows by Primary Key
    – No update when row is moved to different page
●   Long Primary Keys are expensive
    – Increase size of all Indexes
●   Random Primary Key Inserts are expensive
    – Cause page splits; Fragmentation
    – Make page space utilization low
●   AutoIncrement keys are often better than
    artificial keys, UUIDs, SHA1 etc.
                                           www.percona.com
-30-



           Multi Versioning




Implementation of Multi Versioning and
              Locking




                               www.percona.com
-31-



            Multi Versioning at Glance

●   Multiple versions of row exist at the same time
●   Read Transaction can read old version of row,
    while it is modified
    – No need for locking
●   Locking reads can be performed with SELECT
    FOR UPDATE and LOCK IN SHARE MODE
    Modifiers



                                         www.percona.com
-32-



          Transaction isolation Modes

●   SERIALIZABLE
    – Locking reads. Bypass multi versioning
●   REPEATABLE-READ (default)
    – Read commited data at it was on start of transaction
●   READ-COMMITED
    – Read commited data as it was at start of statement
●   READ-UNCOMMITED
    – Read non committed data as it is changing live

                                               www.percona.com
-33-



          Updates and Locking Reads

●   Updates bypass Multi Versioning
    – You can only modify row which currently exists
●   Locking Read bypass multi-versioning
    – Result from SELECT vs SELECT .. LOCK IN SHARE
      MODE will be different
●   Locking Reads are slower
    – Because they have to set locks
    – Can be 2x+ slower !
    – SELECT FOR UPDATE has larger overhead
                                             www.percona.com
-34-



          Multi Version Implementaition
●   The most recent row version is stored in the page
    – Even before it is committed
●   Previous row versions stored in undo space
    – Located in System tablespace
●   The number of versions stored is not limited
    – Can cause system tablespace size to explode.
●   Access to old versions require going through linked list
    – Long transactions with many concurrent updates can
      impact performance.

                                                   www.percona.com
-35-



               Multi Versioning Indexes
●   Indexes contain pointers to all versions
    – Index key 5 will point to all rows which were 5 in the
      past
●   Indexes contain TRX_ID
    – Easy to check entry is visible
    – Can use “Covering Indexes”
●   Many old versions is performance problem
    – Slow down accesses
    – Will leave many “holes” in pages when purged

                                                 www.percona.com
-36-



                Cleaning up the Garbage
●   Old Row and index entries need to be removed
    – When they are not needed for any active transaction
●   REPEATABLE READ
    – Need to be able to read everything at transaction start
●   READ-COMMITED
    – Need to read everything at statement start
●   Purge Thread(s) may be unable to keep up with intensive
    updates
    – Innodb “History Length” will grow high
●   innodb_max_purge_lag slows updates down
    – Not very reliable                            www.percona.com
-37-



     Innodb Locking




How Innodb Locking Works




                      www.percona.com
-38-



                  Innodb Locking Basics
●   Pessimistic Locking Strategy
●   Graph Based Deadlock Detection
    – Takes shortcut for very large lock graphs
●   Row Level lock wait timeout
    – innodb_lock_wait_timeout
●   Traditional “S” and “X” locks
●   Intention locks on tables “IS” “IX”
    – Restricting table operations
●   Locks on Rows AND Index Records
●   No Lock escalation
                                                  www.percona.com
-39-



                          Gap Locks
●   Innodb does not only locks rows but also gap between them
●   Needed for consistent reads in Locking mode
    – Also used by update statements
●   Innodb has no Phantoms even in Consistent Reads
●   Gap locks often cause complex deadlock situations
●   “infinum”, “supremum” records define bounds of data stored
    on the page
    – May not correspond to actual rows stored
●   Only record lock is needed for PK Update


                                                  www.percona.com
-40-



                       Lock Storage

●   Innodb locks storage is pretty compact
    – This is why there is no lock escalation !
●   Lock space needed depends on lock location
    – Locking sparse rows is more expensive
●   Each Page having locks gets bitmap allocated
    for it
    – Bitmap holds lock information for all records on the
      page
●   Locks typically take 3-8 bits per locked row
                                                  www.percona.com
-41-



                Auto Increment Locks
●   Major Changes in MySQL 5.1 !
●   MySQL 5.0 and before
    – Table level AUTO_INC lock for duration of INSERT
    – Even if INSERT provided key value !
    – Serious bottleneck for concurrent Inserts
●   MySQL 5.1 and later
    – innodb_autoinc_lock_mode – set lock behavior
    – “1” - Does not hold lock for simple Inserts
    – “2” - Does not hold lock in any case.
       • Only works with Row level replication
                                                    www.percona.com
-42-



       Latching




Innodb Internal Locks




                        www.percona.com
-43-



                                Innodb Latching
●   Innodb implements its own Mutexes and RW-Locks
    – For transparency not only Performance
●   Latching stats shown in SHOW INNODB STATUS
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 13569, signal count 11421
--Thread 1152170336 has waited at ./../include/buf0buf.ic line 630 for 0.00 seconds the semaphore:
Mutex at 0x2a957858b8 created file buf0buf.c line 517, lock var 0
waiters flag 0
wait is ending
--Thread 1147709792 has waited at ./../include/buf0buf.ic line 630 for 0.00 seconds the semaphore:
Mutex at 0x2a957858b8 created file buf0buf.c line 517, lock var 0
waiters flag 0
wait is ending
Mutex spin waits 5672442, rounds 3899888, OS waits 4719
RW-shared spins 5920, OS waits 2918; RW-excl spins 3463, OS waits 3163


                                                                               www.percona.com
-44-


             Latching Performance

●   Was improving over the years
●   Still is problem for certain workloads
    – Great improvements in MySQL 5.5,5.6 & XtaDB
    – Still hotspots remain
●   innodb_thread_concurrency
    – Limiting concurrency can reduce contention
    – Introduces contention on its own
●   innodb_sync_spin_loops
    – Trade Spinning for context switching
    – Typically limited production impact
                                              www.percona.com
-45-



     Page Replacement




Page Replacement Flushing and
       Checkpointing




                         www.percona.com
-46-



               Basic Page Replacement
●   Innodb uses LRU for page replacement
    – With Midpoint Insertion
●   Innodb Plugin and XtraDB configure
    – innodb_old_blocks_pct, innodb_old_blocks_time
    – Offers Scan resistance from large full table scans
●   Scan LRU Tail to find clean block for replacement
●   May schedule synchronous flush if no clean pages
    for replacement


                                                www.percona.com
-47-



                         Checkpointing
●   Fuzzy Checkpointing
    – Flush few pages to advance min unflushed LSN
    – Flush List is maintained in this order
●   MySQL 5.1 often has “hiccups”
    – No more space left in log files. Need to wait for flush to
      complete
●   Percona Patches for 5.0 and XtraDB
    – Adaptive checkpointing: innodb_adaptive_checkpoint
●   Innodb Plugin innodb_adaptive_flushing
    – Best behavior depends on worload
                                                    www.percona.com
Percona Live DC Sponsors

          Media Sponsor




     Friends of Percona Sponsor




                                  www.percona.com
MySQL Conference & Expo 2012
    Presented by Percona Live
    The Hyatt Regency Santa Clara & Santa Clara
                Convention Center

                   April 10th-12th, 2012
                     Featured Speakers
   Mark Callaghan (Facebook), Jeremy Zawodny (Craigslist), Marten Mickos
                          (Eucalyptus Systems)

Sarah Novotny (Blue Gecko), Peter Zaitsev (Percona), Baron Schwartz (Percona)



          Learn More at www.percona.com/live/
                                        www.percona.com
Want More MySQL Training?
   Percona Training in Washington DC Next Week

           January 16th-19th, 2012

            MySQL Workshops
             MySQL Developers Training - Monday

               MySQL DBA Training - Tuesday

            MySQL InnoDB / XtraDB - Wednesday

                MySQL Operations – Thursday


Use Discount code DCPL30 to save 30%
          Visit http://bit.ly/dc-training
                                                  www.percona.com
Upcoming Percona MySQL Training

      Washington, DC – January 16, 2012

     Vancouver, Canada - February 6, 2012

     Frankfurt, Germany - February 13, 2012

         Irvine, CA – February 14, 2012

        Denver, CO - February 20, 2012

      San Francisco, CA - March 12, 2012

          Austin, TX - March 19, 2012

     Visit http://percona.com/training
                                           www.percona.com
Percona Live MySQL Conference,
  April 10-12 , Santa Clara,CA




                www.percona.com/live

More Related Content

What's hot

Get More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBGet More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBTim Callaghan
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationTim Callaghan
 
TokuDB 高科扩展性 MySQL 和 MariaDB 数据库
TokuDB 高科扩展性 MySQL 和 MariaDB 数据库TokuDB 高科扩展性 MySQL 和 MariaDB 数据库
TokuDB 高科扩展性 MySQL 和 MariaDB 数据库YUCHENG HU
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXTim Callaghan
 
Dd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublinDd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublinStåle Deraas
 
Fractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to PracticeFractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to PracticeTim Callaghan
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performanceguest9912e5
 
TokuDB - What You Need to Know
TokuDB - What You Need to KnowTokuDB - What You Need to Know
TokuDB - What You Need to KnowJervin Real
 
TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)Ontico
 
MySQL 5.7: Core Server Changes
MySQL 5.7: Core Server ChangesMySQL 5.7: Core Server Changes
MySQL 5.7: Core Server ChangesMorgan Tocker
 
An introduction to SQL Server in-memory OLTP Engine
An introduction to SQL Server in-memory OLTP EngineAn introduction to SQL Server in-memory OLTP Engine
An introduction to SQL Server in-memory OLTP EngineKrishnakumar S
 
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...Maaz Anjum
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structureZhichao Liang
 
Storage talk
Storage talkStorage talk
Storage talkchristkv
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18Sergey Petrunya
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBBuilding Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBAshnikbiz
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good? Alkin Tezuysal
 

What's hot (19)

Get More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDBGet More Out of MySQL with TokuDB
Get More Out of MySQL with TokuDB
 
Introduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free ReplicationIntroduction to TokuDB v7.5 and Read Free Replication
Introduction to TokuDB v7.5 and Read Free Replication
 
TokuDB 高科扩展性 MySQL 和 MariaDB 数据库
TokuDB 高科扩展性 MySQL 和 MariaDB 数据库TokuDB 高科扩展性 MySQL 和 MariaDB 数据库
TokuDB 高科扩展性 MySQL 和 MariaDB 数据库
 
Get More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMXGet More Out of MongoDB with TokuMX
Get More Out of MongoDB with TokuMX
 
Dd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublinDd and atomic ddl pl17 dublin
Dd and atomic ddl pl17 dublin
 
Fractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to PracticeFractal Tree Indexes : From Theory to Practice
Fractal Tree Indexes : From Theory to Practice
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 
TokuDB - What You Need to Know
TokuDB - What You Need to KnowTokuDB - What You Need to Know
TokuDB - What You Need to Know
 
TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)TokuDB internals / Лесин Владислав (Percona)
TokuDB internals / Лесин Владислав (Percona)
 
MySQL 5.7: Core Server Changes
MySQL 5.7: Core Server ChangesMySQL 5.7: Core Server Changes
MySQL 5.7: Core Server Changes
 
An introduction to SQL Server in-memory OLTP Engine
An introduction to SQL Server in-memory OLTP EngineAn introduction to SQL Server in-memory OLTP Engine
An introduction to SQL Server in-memory OLTP Engine
 
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
Maaz Anjum - IOUG Collaborate 2013 - An Insight into Space Realization on ODA...
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Storage talk
Storage talkStorage talk
Storage talk
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18
 
Building Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDBBuilding Hybrid data cluster using PostgreSQL and MongoDB
Building Hybrid data cluster using PostgreSQL and MongoDB
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good?
 

Similar to Pldc2012 innodb architecture and internals

InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)Ontico
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architectureSHAJANA BASHEER
 
Tuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadTuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadMarius Adrian Popa
 
WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0MongoDB
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 
Percona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsPercona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsMarcelo Altmann
 
Perconalive feb-2011-share
Perconalive feb-2011-sharePerconalive feb-2011-share
Perconalive feb-2011-sharemdcallag
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseFITC
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance OptimisationMydbops
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionStreamNative
 
MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017Ivan Zoratti
 
Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎YUCHENG HU
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosqlthinkinlamp
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBMariaDB plc
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 
MongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerMongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerWiredTiger
 
We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT Group
 
Windows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementWindows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementKent Huang
 

Similar to Pldc2012 innodb architecture and internals (20)

InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
Tuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy WorkloadTuning Linux Windows and Firebird for Heavy Workload
Tuning Linux Windows and Firebird for Heavy Workload
 
WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
Percona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsPercona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and Improvements
 
Extlect03
Extlect03Extlect03
Extlect03
 
Perconalive feb-2011-share
Perconalive feb-2011-sharePerconalive feb-2011-share
Perconalive feb-2011-share
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance Optimisation
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless Evolution
 
MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017
 
Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎Percona 服务器与 XtraDB 存储引擎
Percona 服务器与 XtraDB 存储引擎
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosql
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
MongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTigerMongoDB World 2015 - A Technical Introduction to WiredTiger
MongoDB World 2015 - A Technical Introduction to WiredTiger
 
We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster
 
Windows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementWindows Internal - Ch9 memory management
Windows Internal - Ch9 memory management
 

More from mysqlops

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautifulmysqlops
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解mysqlops
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementmysqlops
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationmysqlops
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Clustermysqlops
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationmysqlops
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告mysqlops
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫mysqlops
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践mysqlops
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用mysqlops
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现mysqlops
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析mysqlops
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考mysqlops
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示mysqlops
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事mysqlops
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDLmysqlops
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护mysqlops
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规mysqlops
 

More from mysqlops (20)

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautiful
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-management
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDL
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护
 
Memcached
MemcachedMemcached
Memcached
 
DevOPS
DevOPSDevOPS
DevOPS
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规
 

Recently uploaded

Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...DianaGray10
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveIES VE
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox
 

Recently uploaded (20)

Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...Explore the UiPath Community and ways you can benefit on your journey to auto...
Explore the UiPath Community and ways you can benefit on your journey to auto...
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 

Pldc2012 innodb architecture and internals

  • 1. Innodb Architecture and Internals Peter Zaitsev Percona Live, Washington DC 11 January 2012
  • 2. -2- About Presentation ● Brief Introduction in Innodb Architecture – This area would deserve many books www.percona.com
  • 3. Innodb Versions ● MySQL 5.5 Current GA – Lots of improvements compared to previous versions ● MySQL 5.6 Current development release – Will mention some changes ● Percona Server 5.5 – Based on MySQL 5.5 with improvements www.percona.com
  • 4. -4- General Architecture ● Traditional OLTP Engine – “Emulates Oracle Architecture” ● Implemented using MySQL Storage engine API ● Row Based Storage. Row Locking. MVCC ● Data Stored in Tablespaces ● Log of changes stored in circular log files ● Data pages as pages in “Buffer Pool” www.percona.com
  • 5. -5- Storage Files Layout Physical Structure of Innodb Tabespaces and Logs www.percona.com
  • 6. -6- Innodb Tablespaces ● All data stored in Tablespaces – Changes to these databases stored in Circular Logs – Changes has to be reflected in tablespace before log record is overwritten ● Single tablespace or multiple tablespace – innodb_file_per_table=1 ● System information always in main tablespace – Main tablespace can consist of many files • They are concatenated www.percona.com
  • 7. -7- Tablespace Format ● Collection of Segments – Segment is like a “file” ● Segment is number of extents – Typically 64 of 16K page sizes – Smaller extents for very small objects ● First Tablespace page contains header – Tablespace size – Tablespace id www.percona.com
  • 8. -8- Types of Segments ● Each table is Set of Indexes – Innodb has “index organized tables” ● Each index has – Leaf node segment – Non Leaf node segment ● Special Segments – Rollback Segment(s) – Insert buffer, etc www.percona.com
  • 9. -9- Innodb Log Files ● Set of log files (ib_logfile?) – 2 log files by default. Effectively concatenated ● Log Header – Stores information about last checkpoint ● Log is NOT organized in pages, but records – Records aligned 512 bytes, matching disk sector ● Log record format “physiological” – Stores Page# and operation to do on it ● Only REDO operations are stored in logs. www.percona.com
  • 10. Separate Undo Tablespace ● MySQL 5.6 allows to store unto tablespace in separate set of files – innodb_undo_directory – innodb_undo_tablespaces – innodb_undo_logs ● Note once you enable these options you can't downgrade ● Offers another flexibility of using fast storage (such as SSD) www.percona.com
  • 11. -11- Innodb Threads Architecture What threads are there and what they do www.percona.com
  • 12. -12- General Thread Architecture ● Using MySQL Threads for execution – Normally thread per connection ● Transaction executed mainly by such thread – Little benefit from Multi-Core for single query ● innodb_thread_concurrency can be used to limit number of executing threads – Reduce contention ● This limit is number of threads in kernel – Including threads doing Disk IO or storing data in TMP Table. www.percona.com
  • 13. -13- Helper Threads ● Main Thread – Schedules activities – flush, purge, checkpoint, insert buffer merge ● IO Threads – Read – multiple threads used for read ahead – Write – multiple threads used for background writes – Insert Buffer thread used for Insert buffer merge – Log Thread used for flushing the log ● Purge thread(s) (MySQL 5.5 and XtraDB) ● Deadlock detection thread & Others www.percona.com
  • 14. -14- Memory Handling How Innodb Allocates and Manages Memory www.percona.com
  • 15. -15- Memory Allocation Basics ● Buffer Pool – Set by innodb_buffer_pool_size – Database cache; Insert Buffer; Locks – Takes More memory than specified • Extra space needed for Latches, LRU etc ● Additional Memory Pool – Dictionary and other allocations – innodb_additional_mem_pool_size • Not used in newer releases ● Log Buffer (innodb_log_buffer_size) www.percona.com
  • 16. -16- Disk IO How Innodb Performs Disk IO www.percona.com
  • 17. -17- Reads ● Most reads done by executing threads ● Read-Ahead performed by background threads – Linear – Random – Do not count on read ahead a lot ● Insert Buffer merge process causes reads www.percona.com
  • 18. -18- Writes ● Data Writes are Background in Most cases – As long as you can flush data fast enough you're good ● Synchronous flushes can happen if no free buffers available ● Log Writes can by sync or async depending on innodb_flush_log_at_trx_commit – 1 – fsync log on transaction commit – 0 – do not flush. Flushed in background ~ once/sec – 2 – Flush to OS cache but do not call fsync() • Data safe if MySQL Crashes but OS Survives www.percona.com
  • 19. Flush List Writes ● Flushing to advance “earliest modify LSN” – To free log space so it can be reduced ● Most of writes typically happen this way ● Number of pages to flush per cycle depended on the load – “innodb_adaptive_flushing” – Percona Server has more flushing modes • See innodb_adaptive_flushing_method ● If Flushing can't keep up stalls can happen www.percona.com
  • 20. Example of Misbehavior ● Data fits in memory and can be modified fast – Yet we can't flush data fast enough ● Working on solution in XtraDB www.percona.com
  • 21. LRU Flushes ● Can happen in workloads with data sets larger than memory ● If Innodb is unable to find clean page in 10% of LRU list ● LRU Flushes happen in user threads ● Hard to see exact number in standard Innodb – XtraDB adds Innodb_buffer_pool_pages_LRU_flushed www.percona.com
  • 22. LRU Flushes in MySQL 5.6 ● MySQL 5.6 adds “page_cleaner” to avoid LRU flushes in User Threads ● innodb_lru_scan_depth=N – Controlls how deeply page cleaner will examine Tail of LRU for dirty pages – Happens once per second www.percona.com
  • 23. -23- Page Checksums ● Protection from corrupted data – Bad hardware, OS Bugs, Innodb Bugs – Are not completely replaced by Filesystem Checksums ● Checked when page is Read to Buffer Pool ● Updated when page is flushed to disk ● Can be significant overhead – Especially for very fast storage ● Can be disabled by innodb_checksums=0 www.percona.com
  • 24. -24- Double Write Buffer ● Innodb log requires consistent pages for recovery ● Page write may complete partially – Updating part of 16K and leaving the rest ● Double Write Buffer is short term page level log ● The process is: – Write pages to double write buffer; Sync – Write Pages to their original locations; Sync – Pages contain tablespace_id+page_id ● On crash recovery pages in buffer are compared to their original location www.percona.com
  • 25. -25- Direct IO Operation ● Default IO mode for Innodb data is Buffered ● Good – Faster flushes when no write cache – Faster warmup on restart – Reduce problems with inode locking on EXT3 ● Bad – Lost of effective cache memory due to double buffering – OS Cache could be used to cache other data – Increased tendency to swap due to IO pressure ● innodb_flush_method=O_DIRECT www.percona.com
  • 26. -26- Log IO ● Log are opened in buffered mode – Even with innodb_flush_method=O_DIRECT – XtraDB can use O_DIRECT for logs • innodb_flush_method=ALL_O_DIRECT ● Flushed by fsync() - default or O_SYNC ● Logs are often written in 512 byte blocks – innodb_log_block_size=4096 in XtraDB ● Logs which fit in cache may improve performance – Small transactions and innodb_flush_log_at_trx_commit=1 or 2 www.percona.com
  • 27. -27- Indexes How Indexes are Implemented in Innodb www.percona.com
  • 28. -28- Everything is the Index ● Innodb tables are “Index Organized” – PRIMARY KEY contains data instead of data pointer ● Hidden PRIMARY KEY is used if not defined (6b) ● Data is “Clustered” by PRIMARY KEY – Data with close PK value is stored close to each other – Clustering is within page ONLY ● Leaf and Non-Leaf nodes use separate Segments – Makes IO more sequential for ordered scans ● Innodb system tables SYS_TABLES and SYS_INDEXES hold information about index “root” www.percona.com
  • 29. -29- Index Structure ● Secondary Indexes refer to rows by Primary Key – No update when row is moved to different page ● Long Primary Keys are expensive – Increase size of all Indexes ● Random Primary Key Inserts are expensive – Cause page splits; Fragmentation – Make page space utilization low ● AutoIncrement keys are often better than artificial keys, UUIDs, SHA1 etc. www.percona.com
  • 30. -30- Multi Versioning Implementation of Multi Versioning and Locking www.percona.com
  • 31. -31- Multi Versioning at Glance ● Multiple versions of row exist at the same time ● Read Transaction can read old version of row, while it is modified – No need for locking ● Locking reads can be performed with SELECT FOR UPDATE and LOCK IN SHARE MODE Modifiers www.percona.com
  • 32. -32- Transaction isolation Modes ● SERIALIZABLE – Locking reads. Bypass multi versioning ● REPEATABLE-READ (default) – Read commited data at it was on start of transaction ● READ-COMMITED – Read commited data as it was at start of statement ● READ-UNCOMMITED – Read non committed data as it is changing live www.percona.com
  • 33. -33- Updates and Locking Reads ● Updates bypass Multi Versioning – You can only modify row which currently exists ● Locking Read bypass multi-versioning – Result from SELECT vs SELECT .. LOCK IN SHARE MODE will be different ● Locking Reads are slower – Because they have to set locks – Can be 2x+ slower ! – SELECT FOR UPDATE has larger overhead www.percona.com
  • 34. -34- Multi Version Implementaition ● The most recent row version is stored in the page – Even before it is committed ● Previous row versions stored in undo space – Located in System tablespace ● The number of versions stored is not limited – Can cause system tablespace size to explode. ● Access to old versions require going through linked list – Long transactions with many concurrent updates can impact performance. www.percona.com
  • 35. -35- Multi Versioning Indexes ● Indexes contain pointers to all versions – Index key 5 will point to all rows which were 5 in the past ● Indexes contain TRX_ID – Easy to check entry is visible – Can use “Covering Indexes” ● Many old versions is performance problem – Slow down accesses – Will leave many “holes” in pages when purged www.percona.com
  • 36. -36- Cleaning up the Garbage ● Old Row and index entries need to be removed – When they are not needed for any active transaction ● REPEATABLE READ – Need to be able to read everything at transaction start ● READ-COMMITED – Need to read everything at statement start ● Purge Thread(s) may be unable to keep up with intensive updates – Innodb “History Length” will grow high ● innodb_max_purge_lag slows updates down – Not very reliable www.percona.com
  • 37. -37- Innodb Locking How Innodb Locking Works www.percona.com
  • 38. -38- Innodb Locking Basics ● Pessimistic Locking Strategy ● Graph Based Deadlock Detection – Takes shortcut for very large lock graphs ● Row Level lock wait timeout – innodb_lock_wait_timeout ● Traditional “S” and “X” locks ● Intention locks on tables “IS” “IX” – Restricting table operations ● Locks on Rows AND Index Records ● No Lock escalation www.percona.com
  • 39. -39- Gap Locks ● Innodb does not only locks rows but also gap between them ● Needed for consistent reads in Locking mode – Also used by update statements ● Innodb has no Phantoms even in Consistent Reads ● Gap locks often cause complex deadlock situations ● “infinum”, “supremum” records define bounds of data stored on the page – May not correspond to actual rows stored ● Only record lock is needed for PK Update www.percona.com
  • 40. -40- Lock Storage ● Innodb locks storage is pretty compact – This is why there is no lock escalation ! ● Lock space needed depends on lock location – Locking sparse rows is more expensive ● Each Page having locks gets bitmap allocated for it – Bitmap holds lock information for all records on the page ● Locks typically take 3-8 bits per locked row www.percona.com
  • 41. -41- Auto Increment Locks ● Major Changes in MySQL 5.1 ! ● MySQL 5.0 and before – Table level AUTO_INC lock for duration of INSERT – Even if INSERT provided key value ! – Serious bottleneck for concurrent Inserts ● MySQL 5.1 and later – innodb_autoinc_lock_mode – set lock behavior – “1” - Does not hold lock for simple Inserts – “2” - Does not hold lock in any case. • Only works with Row level replication www.percona.com
  • 42. -42- Latching Innodb Internal Locks www.percona.com
  • 43. -43- Innodb Latching ● Innodb implements its own Mutexes and RW-Locks – For transparency not only Performance ● Latching stats shown in SHOW INNODB STATUS ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 13569, signal count 11421 --Thread 1152170336 has waited at ./../include/buf0buf.ic line 630 for 0.00 seconds the semaphore: Mutex at 0x2a957858b8 created file buf0buf.c line 517, lock var 0 waiters flag 0 wait is ending --Thread 1147709792 has waited at ./../include/buf0buf.ic line 630 for 0.00 seconds the semaphore: Mutex at 0x2a957858b8 created file buf0buf.c line 517, lock var 0 waiters flag 0 wait is ending Mutex spin waits 5672442, rounds 3899888, OS waits 4719 RW-shared spins 5920, OS waits 2918; RW-excl spins 3463, OS waits 3163 www.percona.com
  • 44. -44- Latching Performance ● Was improving over the years ● Still is problem for certain workloads – Great improvements in MySQL 5.5,5.6 & XtaDB – Still hotspots remain ● innodb_thread_concurrency – Limiting concurrency can reduce contention – Introduces contention on its own ● innodb_sync_spin_loops – Trade Spinning for context switching – Typically limited production impact www.percona.com
  • 45. -45- Page Replacement Page Replacement Flushing and Checkpointing www.percona.com
  • 46. -46- Basic Page Replacement ● Innodb uses LRU for page replacement – With Midpoint Insertion ● Innodb Plugin and XtraDB configure – innodb_old_blocks_pct, innodb_old_blocks_time – Offers Scan resistance from large full table scans ● Scan LRU Tail to find clean block for replacement ● May schedule synchronous flush if no clean pages for replacement www.percona.com
  • 47. -47- Checkpointing ● Fuzzy Checkpointing – Flush few pages to advance min unflushed LSN – Flush List is maintained in this order ● MySQL 5.1 often has “hiccups” – No more space left in log files. Need to wait for flush to complete ● Percona Patches for 5.0 and XtraDB – Adaptive checkpointing: innodb_adaptive_checkpoint ● Innodb Plugin innodb_adaptive_flushing – Best behavior depends on worload www.percona.com
  • 48. Percona Live DC Sponsors Media Sponsor Friends of Percona Sponsor www.percona.com
  • 49. MySQL Conference & Expo 2012 Presented by Percona Live The Hyatt Regency Santa Clara & Santa Clara Convention Center April 10th-12th, 2012 Featured Speakers Mark Callaghan (Facebook), Jeremy Zawodny (Craigslist), Marten Mickos (Eucalyptus Systems) Sarah Novotny (Blue Gecko), Peter Zaitsev (Percona), Baron Schwartz (Percona) Learn More at www.percona.com/live/ www.percona.com
  • 50. Want More MySQL Training? Percona Training in Washington DC Next Week January 16th-19th, 2012 MySQL Workshops MySQL Developers Training - Monday MySQL DBA Training - Tuesday MySQL InnoDB / XtraDB - Wednesday MySQL Operations – Thursday Use Discount code DCPL30 to save 30% Visit http://bit.ly/dc-training www.percona.com
  • 51. Upcoming Percona MySQL Training Washington, DC – January 16, 2012 Vancouver, Canada - February 6, 2012 Frankfurt, Germany - February 13, 2012 Irvine, CA – February 14, 2012 Denver, CO - February 20, 2012 San Francisco, CA - March 12, 2012 Austin, TX - March 19, 2012 Visit http://percona.com/training www.percona.com
  • 52. Percona Live MySQL Conference, April 10-12 , Santa Clara,CA www.percona.com/live