MySQL Innovation Works -- InnoSQL

                 David Jiang
         jiangchengyao@gmail.com
           weibo.com/insidemysql
About Me
 7+ years of work on different databases
   SQL Server
   MySQL
   Oracle
 Now work at the Netease Development and Research Center Lab
   MySQL kernel development
 Author
   <<Inside MySQL: InnoDB Storage Engine>>
   <<Inside MySQL: SQL Programming>> (coming soon, 2012.3)
What is InnoSQL
 A new MySQL branch
   Open source
   High performance (flash cache)
   Ease of use
   Fully compatible with original MySQL
   Collect creative ideas for MySQL and make them happen
 MySQL Innovation Works
   http://www.innomysql.org
InnoSQL Features
 Flash Cache for InnoDB
   Provides higher performance than just using SSD as durable storage
 Shared memory (SHM) for InnoDB Buffer Pool
   Quick warm-up of the InnoDB buffer pool
   Less than 1 sec !!!
 InnoDB IO Statistics
   Get each SQL statement's physical and logical reads
 Page Cleaner Thread
   Removes blocking in user query threads
InnoSQL Flash Cache
 InnoSQL Flash Cache
   Using SSD as Cache
 Other flash cache solutions
   Facebook flash cache
   Oracle flash cache
   Secondary Buffer Pool for InnoDB ( InnoSQL 5.5.8 )
Facebook Flash Cache
 A general solution
 Open source
   https://github.com/facebook/flashcache
 Integration with file systems
   built using the Linux Device Mapper
 Not optimized for databases
 Good for read-intensive workloads
 Worse for write-intensive workloads
 Needs time to warm up
Oracle Flash Cache
 Works with Oracle 11g
 Page writes to flash cache are slow
   Not so aggressive
 Needs warm-up
Secondary Buffer Pool
 Supported in InnoSQL 5.5.8
 Good for read-intensive workloads
 Also not good for write-intensive workloads
   TPC-C
 Can warm up the database at startup
   Slow on every start
 Cache is not persistent storage
Why is warm-up needed?
 Capacity:
   SSD >> Memory
 Speed
   SSD << Memory
Flash Cache in InnoSQL 5.5.13
 Can cache both read & write operations
 Sequential write on SSD
   No random write
 Merge write
 Cache is persistent
Why not use SSD as durable storage?
 SSD is good for random read
   7000+ IOPS
   100 ~ 150 IOPS for disk
 SSD life cycle
 SSD write performance
   Write: page
   Erase: extent (128~256 pages)
 Database is not fully optimized for SSD
   Read-ahead algorithm
   512-byte aligned writes for the log file
   Random write
Why use SSD as Cache
 Cache is everywhere
   Register (volatile)
   L1 cache (volatile)
   L2 cache (volatile)
   L3 cache (volatile)
   Memory (volatile)
   SSD
   Disk (non-volatile)
   Tape (non-volatile)
Question
 Do you use your SSD as volatile or non-volatile storage?
Analyze
 If SSD is used as durable storage
    Non-volatile
    But the database is not yet fully optimized for it
 If Secondary Buffer Pool or Oracle Flash Cache is used
    Volatile
    Performance degrades
       Pages are written twice (flash cache & durable storage)
 If Facebook flash cache is used
    Volatile or non-volatile
       Depends on the cache mode
           Writethrough
           Writearound
           Writeback
    Performance degrades
      Still writes twice, but with some optimizations
    Not fully optimized for databases
Cache in MySQL InnoDB
 InnoDB Buffer Pool
   Caches pages
   Asynchronous page operations
     Pages are read in the buffer pool first
     Pages are modified in the buffer pool first
     Then a fuzzy or sharp checkpoint writes them to disk
     Needs a log manager for recovery
   Larger buffer pool, better performance
     Because of the speed gap between disk and memory
     However, there is rarely enough memory to cache the whole database
Cache in MySQL InnoDB
 Insert Buffer
    The insert buffer is a B+ tree
       MySQL < 4.1.x: one insert buffer tree per table
          (page_no, fields_type_info, actual record)
       MySQL >= 4.1: only one insert buffer tree
          (space_id, one-byte-marker, page_no, fields_type_info, actual record)
          Indexed by (space_id, page_no)
    Works for non-unique secondary indexes
       Writes go to the insert buffer if the page is not in the buffer pool
       The insert buffer bitmap page tracks the free space of each page
          2 bits per page
    Merges write operations
       Merge writes
       Delay page writes
       Raises write performance
       However, increases read operations
    MySQL 5.5 Change Buffer
       insert, purge, delete mark
InnoDB Insert Buffer
mysql> show engine innodb status\G
*************************** 1. row ***************************
Status:
=====================================
090922 11:52:51 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
……
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 2249, free list len 3346, seg size 5596,
374650 inserts, 51897 merged recs, 14300 merges
Hash table size 4980499, node heap has 1246 buffer(s)
1640.60 hash searches/s, 3709.46 non-hash searches/s

size = used pages, free list len = free pages, seg size = size + free list len + 1
merged recs : merges = insert buffer efficiency
Cache in MySQL InnoDB
 Cache can increase performance
 Delay write operation
   Gap between disk and cache
 However, there is another cache in InnoDB
   Doublewrite
What is Doublewrite ?
 Doublewrite
   Avoids the partial write problem
     A 512-byte write is always OK
     But a 16K page write is not
   Doublewrite buffer
     2M
   Doublewrite file
     2M
     In the shared tablespace: ibdata1
Doublewrite Architecture
 Stores all data twice: first to the doublewrite buffer, then to the actual data files
 Can be disabled with --skip-innodb_doublewrite

mysql> show global status like 'innodb_dbl%'\G
************** 1. row ************************
Variable_name: Innodb_dblwr_pages_written
     Value: 152362
************** 2. row ************************
Variable_name: Innodb_dblwr_writes
     Value: 1465
2 rows in set (0.00 sec)
Doublewrite Feature
 Size: 2M
 All pages are written here first
 Sequential write
 Caches writes

 So what about a 100G or 300G doublewrite?
 This is what makes flash cache possible
Flash Cache in InnoSQL 5.5.13
 Replaces the original doublewrite
 Users can now have a large doublewrite
 Page writes are sequential
   Matches SSD write characteristics
 Doublewrite can now be read
   Uses SSD random read performance
 Caches both read and write operations
 Persistent cache
 Merge write
   60 ~ 70% in workloads like TPC-C
 Supports AIO reads on flash cache
   Not supported in Secondary Buffer Pool
Flash Cache Architecture
Flash Cache Data Structure
/** Flash cache block struct */
struct trx_flashcache_block_struct{
   unsigned      space:32;       /*!< tablespace id */
   unsigned      offset:32;      /*!< page number */
   unsigned      fil_offset:32; /*!< flash cache page number */
   unsigned      state:2;        /*!< flash cache state*/
   trx_flashcache_block_t* hash; /*!< hash chain */
};

Four states:
  BLOCK_NOT_USED
  BLOCK_READY_FOR_FLUSH
  BLOCK_READ_CACHE
  BLOCK_FLUSHED
Flash Cache Data Structure
struct trx_flashcache_struct{
   mutex_t         fc_mutex;/*!< mutex protecting flash cache */
   hash_table_t* fc_hash; /*!< hash table of flash cache pages */
   ulint           fc_size; /*!< flash cache size */
   ulint           write_off; /*!< write to flash cache offset */
   ulint           flush_off; /*!< flush to disk this offset */
   ulint           write_round; /* write round */
   ulint           flush_round; /* flush round */
   trx_flashcache_block_t* block; /* flash cache block */
   byte*           read_buf_unalign; /* unalign read buf */
   byte*           read_buf;          /* read buf */
};
From a Developer's Perspective
 Flash cache file (diagram): a sequence of blocks; new pages are written at write_offset, and pages are flushed to disk from flush_offset
 Flash cache blocks (diagram): indexed by the flash cache hash table (in memory), which is used for lookup
 Flash cache log file (diagram): stores write_offset, flush_offset, write_round and flush_round
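The hash table lookup in the diagram can be sketched in a few lines of C. This is only an illustration, not the InnoSQL source: the type names `fc_block_t` and `fc_hash_t`, the table size `FC_HASH_CELLS` and the fold function `fc_fold` are all hypothetical. Only the fields `space`, `offset`, `fil_offset` and the `hash` chain pointer come from the struct shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FC_HASH_CELLS 64  /* hypothetical table size, for illustration only */

/* Simplified block: same idea as trx_flashcache_block_struct. */
typedef struct fc_block {
    uint32_t space;         /* tablespace id */
    uint32_t offset;        /* page number */
    uint32_t fil_offset;    /* flash cache page number */
    struct fc_block *hash;  /* next block in the same hash chain */
} fc_block_t;

typedef struct {
    fc_block_t *cells[FC_HASH_CELLS];
} fc_hash_t;

/* Hypothetical fold of (space, offset) into a cell index. */
static size_t fc_fold(uint32_t space, uint32_t offset)
{
    uint64_t v = ((uint64_t)space << 20) + space + offset;
    return (size_t)(v % FC_HASH_CELLS);
}

/* Push a block onto the front of its hash chain. */
static void fc_hash_insert(fc_hash_t *h, fc_block_t *b)
{
    assert(h != NULL && b != NULL);
    size_t i = fc_fold(b->space, b->offset);
    b->hash = h->cells[i];
    h->cells[i] = b;
}

/* Walk the chain for (space, offset); NULL means the page is not cached. */
static fc_block_t *fc_hash_lookup(const fc_hash_t *h,
                                  uint32_t space, uint32_t offset)
{
    assert(h != NULL);
    for (fc_block_t *b = h->cells[fc_fold(space, offset)]; b != NULL; b = b->hash) {
        if (b->space == space && b->offset == offset) {
            return b;
        }
    }
    return NULL;
}
```

A hit returns the block whose `fil_offset` says where the page sits in the flash cache file. A miss means the page has to be read from the data file.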
Flash Cache Flush Algorithms
    Flushes pages in flash cache to disk
    Takes over the flush from the master thread
    Flushes in the flash cache background thread
    Algorithm
      Less than innodb_flash_cache_write_cache_pct (default 10)
         No flush
      Less than innodb_flash_cache_do_full_io_pct (default 90)
         Flush 10% of innodb_io_capacity
      Else
         Flush 100% of innodb_io_capacity
      If idle
         Flush 100% of innodb_io_capacity
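The thresholds above can be read as a small decision function. The sketch below rests on two assumptions the slide does not state: that the compared value (`dirty_pct` here) is the percentage of the flash cache holding pages not yet flushed to disk, and that an idle server flushes at full rate regardless of that percentage. The function name `fc_pages_to_flush` is made up for illustration.

```c
#include <assert.h>

/* Pages to flush from flash cache to disk in one round.
 * dirty_pct       : assumed to be the share (0..100) of the flash cache
 *                   occupied by pages not yet flushed to disk
 * server_idle     : non-zero when the server is idle
 * write_cache_pct : innodb_flash_cache_write_cache_pct (default 10)
 * do_full_io_pct  : innodb_flash_cache_do_full_io_pct (default 90)
 * io_capacity     : innodb_io_capacity */
static unsigned long fc_pages_to_flush(unsigned dirty_pct, int server_idle,
                                       unsigned write_cache_pct,
                                       unsigned do_full_io_pct,
                                       unsigned long io_capacity)
{
    assert(dirty_pct <= 100);
    if (server_idle) {
        return io_capacity;       /* idle: flush 100% of innodb_io_capacity */
    }
    if (dirty_pct < write_cache_pct) {
        return 0;                 /* below the first threshold: no flush */
    }
    if (dirty_pct < do_full_io_pct) {
        return io_capacity / 10;  /* between thresholds: flush 10% */
    }
    return io_capacity;           /* above the second threshold: flush 100% */
}
```

With the defaults and `innodb_io_capacity = 200`, a busy server at 50% would flush 20 pages per round under these assumptions.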
Merge Write in Flash Cache

 Pages between flush_offset and write_offset:
   (7,7)  (2,6)  (0,6)  (3,7)  ……  (3,7)  (2,6)  (4,8)
 Pages (2,6) and (3,7) can be merged
 This is much like the insert buffer
 Delays write operations
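The merge idea can be illustrated by counting how many disk writes a run of cached pages actually needs. In this sketch a page is written to disk only at its last occurrence in the run. The names `page_id_t`, `fc_disk_writes` and `fc_merge_ratio` are hypothetical, and defining the merge ratio as merged writes divided by total cached writes is an assumption, since the slides do not give the formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cached page write; the pair mirrors the (x,y) pairs in the diagram. */
typedef struct {
    uint32_t space;
    uint32_t page_no;
} page_id_t;

/* Disk writes needed for the run: an entry is skipped when a newer copy
 * of the same page appears later in the run. O(n^2), fine for a sketch. */
static size_t fc_disk_writes(const page_id_t *run, size_t n)
{
    size_t writes = 0;
    assert(run != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int superseded = 0;
        for (size_t j = i + 1; j < n; j++) {
            if (run[j].space == run[i].space &&
                run[j].page_no == run[i].page_no) {
                superseded = 1;
                break;
            }
        }
        if (!superseded) {
            writes++;
        }
    }
    return writes;
}

/* Assumed definition: share of cached writes that never reach the disk. */
static double fc_merge_ratio(const page_id_t *run, size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    return (double)(n - fc_disk_writes(run, n)) / (double)n;
}
```

For the seven visible entries in the diagram, five disk writes are needed and two are merged away.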
Flash Cache Benchmark
 Sysbench OLTP
   Read intensive
 TPC-C
   Write intensive
 Blogbench
   Oriented to blog-like applications
   Developed by Netease
Sysbench OLTP




InnoDB Buffer Pool: 6G
DB Size: 19G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
TPC-C




InnoDB Buffer Pool: 12G
DB Size: 39G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
Flash Cache: 100G

SSD: 3607.183 Tpm
Flash Cache: 7230.05 Tpm
Merge Write Ratio: 65.47%
Blogbench




InnoDB Buffer Pool: 4G
DB Size: 21G
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
Merge write ratio: 60%
Conclusion
 Flash Cache works for both read and write workloads
 Works better than using SSD as durable storage
 Optimized for SSD in the database kernel
 No more writes in flash cache
 Merge write support
SHM for InnoDB Buffer Pool
 Uses shared memory to allocate the InnoDB buffer pool
 Why use shared memory?
   Speeds up warm-up
 Warm-up speed?
   Random read: 10~20M/sec
   A 30G buffer pool needs 30~60 minutes
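The warm-up estimate is simple arithmetic: buffer pool size divided by random read throughput. The helper below is a hypothetical illustration that assumes binary units (GiB and MiB); it gives figures of the same order as the 30~60 minutes quoted above.

```c
#include <assert.h>

/* Minutes needed to fill a buffer pool of pool_gib GiB
 * when pages arrive at mib_per_sec MiB per second. */
static double warmup_minutes(double pool_gib, double mib_per_sec)
{
    assert(mib_per_sec > 0.0);
    return pool_gib * 1024.0 / mib_per_sec / 60.0;
}
```

For a 30G pool this comes to about 51 minutes at 10M/sec and about 26 minutes at 20M/sec.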
Warm up Method
 Use SQL to warm up
   SELECT COUNT(*) FROM table FORCE INDEX (PRIMARY)
   Warm-up becomes sequential read
   But cannot restore the database to its previous workload state
 Dump buffer pool to file
   Supported in MySQL 5.6+
   Warm-up becomes sequential read
   Restores the database to its previous workload state
   Dump file is big
   Database crash ?
Warm up Method
 Percona Server
   Exports (space_id, page_no) in the LRU list to a file
   Loads this file ordered by (space_id, page_no) so that reads are sequential when MySQL starts up
   Restores the database to its previous workload state
   Still needs a long time to warm up
     if you have a big buffer pool: 128G, 256G
Warm up in InnoSQL
 Uses shared memory
   --innodb_use_shm_preload=1
 Shared memory configuration, as with Oracle
   /proc/sys/kernel/shmmax
   /proc/sys/kernel/shmall
 Warm-up takes less than 1 sec
   All pages are already in memory
SHM for InnoDB Buffer Pool
# list shared memory info
innosql@db-62:~$ ipcs -a
 ------ Shared Memory Segments --------
key       shmid owner perms bytes nattch status
0x0008c231 4653056 innosql 600          549715968 0
 ------ Semaphore Arrays --------
key       semid owner perms nsems
 ------ Message Queues --------
key       msqid owner perms used-bytes messages

# remove shared memory
innosql@db-62:~$ ipcrm -m 4653056
InnoDB IO Statistics
 Get read IO statistics
   Like SQL Server: SET STATISTICS IO ON
 InnoSQL implements it in the slow query log
   Both file and table
 Helps SQL developers
   10 reads may not be good in an OLTP application
 Helps DBAs
   Know the real IO statistics of each SQL statement
   Not only the time it consumes
 Still in development
   You can preview this feature
InnoDB IO Statistics
# Time: 111103 13:29:06
# User@Host: root[root] @ localhost [::1]
# Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
use tpcc;
SET timestamp=1320298146;
select * from warehouse where w_id=1;
# Time: 111103 13:31:28
# User@Host: root[root] @ localhost [::1]
# Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
SET timestamp=1320298288;
select * from history;
Configuration
 long_query_time
 io_slow_query
 slow_query_type
   0 long_query_time
   1 io_slow_query
   2 both
Page Cleaner Thread
 Flush page in Master Thread
   Adaptive Flush
   IO Capacity
 Problem
   The master thread has a lot to cope with
   Async flush can block a user query thread
 Page cleaner thread
   Supported in MySQL 5.6
   InnoSQL supports it on MySQL 5.5
   Can also help flush in FLUSH_LRU_LIST
Flush Algorithms in InnoDB
 checkpoint_age = current_lsn - checkpoint_lsn
 async_water_mark: ~78% * Log_Group_Size
 sync_water_mark: ~90% * Log_Group_Size
 For example:
   Log file size 1G, number of log files 2
   async_water_mark = ~1.5G
   sync_water_mark = ~1.8G
Flush Algorithms in InnoDB
 checkpoint_age < async_water_mark
   adaptive_flushing
   5% innodb_io_capacity
 async_water_mark < checkpoint_age < sync_water_mark
   Blocks one user query thread
   Async flush
 checkpoint_age > sync_water_mark
   Blocks all user query threads
   Sync flush
 n_dirty_pages > innodb_max_dirty_pages_pct
   Flush innodb_io_capacity
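The water marks and the resulting flush mode can be sketched as follows. The 78% and 90% factors are the approximate values from the slide. The names `water_marks_t`, `water_marks` and `flush_mode` are hypothetical, and the sketch assumes that an age exactly equal to a mark falls into the lower tier, which the slide leaves open. The separate dirty-page check is not modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FLUSH_ADAPTIVE, FLUSH_ASYNC, FLUSH_SYNC } flush_mode_t;

typedef struct {
    uint64_t async_mark;  /* ~78% of the log group size */
    uint64_t sync_mark;   /* ~90% of the log group size */
} water_marks_t;

/* Derive both water marks from the redo log configuration (sizes in bytes). */
static water_marks_t water_marks(uint64_t log_file_size, unsigned n_files)
{
    uint64_t group = log_file_size * n_files;
    water_marks_t wm;
    wm.async_mark = group / 100 * 78;
    wm.sync_mark = group / 100 * 90;
    return wm;
}

/* Pick the flush mode from checkpoint_age = current_lsn - checkpoint_lsn. */
static flush_mode_t flush_mode(uint64_t current_lsn, uint64_t checkpoint_lsn,
                               water_marks_t wm)
{
    assert(current_lsn >= checkpoint_lsn);
    uint64_t age = current_lsn - checkpoint_lsn;
    if (age > wm.sync_mark) {
        return FLUSH_SYNC;      /* blocks all user query threads */
    }
    if (age > wm.async_mark) {
        return FLUSH_ASYNC;     /* blocks one user query thread */
    }
    return FLUSH_ADAPTIVE;      /* normal adaptive flushing */
}
```

With two 1G log files, a checkpoint age of 1.8 billion bytes lands between the two marks and triggers an async flush in this model.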
Page Cleaner Thread
 Reduces the master thread's burden
 Async flush moves to this background thread
   No blocking in user query threads
However
 Flushing does not only happen in the master thread
 FLUSH_LRU_LIST
   Checks whether at least 64 pages are available
   In this situation, the flush happens almost entirely in the user query thread
   Adaptive flush and innodb_io_capacity do not help
   Happens in the user query thread
 InnoSQL also moves this flush to the page cleaner thread
   Not supported in MySQL 5.6
   Still needs more optimization
Q & A

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

My sql innovation work -innosql

  • 1. MySQL Innovation Works -- InnoSQL David Jiang jiangchengyao@gmail.com weibo.com/insidemysql
  • 2. About Me
      7+ years of work on different databases
        SQL Server
        MySQL
        Oracle
      Now working at the Netease Development and Research Center Lab
        MySQL kernel development
      Author
        <<Inside MySQL: InnoDB Storage Engine>>
        <<Inside MySQL: SQL Programming>> (coming soon, 2012.3)
  • 3. What is InnoSQL
      A new MySQL branch
        Open source
        High performance (flash cache)
        Ease of use
        Fully compatible with original MySQL
        Collects creative ideas for MySQL and makes them happen
      MySQL Innovation Works
        http://www.innomysql.org
  • 4. InnoSQL Features
      Flash Cache for InnoDB
        Provides higher performance than using the SSD only as durable storage
      Shared memory (SHM) for the InnoDB buffer pool
        Quick warm-up of the InnoDB buffer pool
        Less than 1 sec!
      InnoDB IO statistics
        Get the physical and logical reads of each SQL statement
      Page cleaner thread
        Removes blocking in the user query thread
  • 5. InnoSQL Flash Cache
      InnoSQL Flash Cache
        Uses SSD as a cache
      Other flash cache solutions
        Facebook flash cache
        Oracle flash cache
        Secondary Buffer Pool for InnoDB (InnoSQL 5.5.8)
  • 6. Facebook Flash Cache
      A general solution
      Open source
        https://github.com/facebook/flashcache
      Integration with file systems
        Built using the Linux Device Mapper
      Not optimized for databases
      Good for read-intensive workloads
      Worse for write-intensive workloads
      Needs time to warm up
  • 7. Oracle Flash Cache
      Works with Oracle 11g
      Page writes to the flash cache are slow
        Not so aggressive
      Needs warm-up
  • 8. Secondary Buffer Pool
      Supported in InnoSQL 5.5.8
      Good for read-intensive workloads
      Also not good for write-intensive workloads
        TPC-C
      Can warm up the database at startup
        Slow on every start
      The cache is not persistent storage
  • 9. Why is warm-up needed?
      Capacity
        SSD >> Memory
      Speed
        SSD << Memory
  • 10. Flash Cache in InnoSQL 5.5.13
      Can cache both read and write operations
      Sequential writes on SSD
        No random writes
      Merge write
      Cache is persistent
  • 11. Why not use SSD as durable storage
      SSD is good for random reads
        7000+ IOPS
        100 ~ 150 IOPS for disk
      SSD life cycle
      SSD write performance
        Write: page
        Wipe (erase): extent (128~256 pages)
      Databases are not fully optimized for SSD
        Read-ahead algorithm
        512-byte aligned writes for the log file
        Random writes
  • 12. Why use SSD as Cache
      Cache is everywhere
        Volatile: Register, L1 cache, L2 cache, L3 cache, Memory
        SSD
        Non-volatile: Disk, Tape
  • 13. Question
      Do you use your SSD as volatile or non-volatile storage?
  • 14. Analysis
      If the SSD is used as durable storage
        Non-volatile
        But databases are not yet fully optimized for it
      If Secondary Buffer Pool or Oracle Flash Cache is used
        Volatile
        Performance degrades
          Pages must be written twice (flash cache and durable storage)
      If Facebook flash cache is used
        Volatile or non-volatile
          Depends on the cache mode
            Writethrough
            Writearound
            Writeback
        Performance degrades
          Still needs to write twice, but with some optimizations
        Not fully optimized for databases
  • 15. Cache in MySQL InnoDB
      InnoDB Buffer Pool
        Caches pages
        Asynchronous page operations
          Pages are read from the buffer pool first
          Pages are modified in the buffer pool first
          They are then written to disk by a fuzzy or sharp checkpoint
        Needs a log manager for recovery
      A larger buffer pool gives better performance
        Because of the speed gap between disk and memory
      However, there is never enough memory to cache the whole database
  • 16. Cache in MySQL InnoDB
      Insert Buffer
        The insert buffer is a B+ tree
          MySQL < 4.1.x: one insert buffer tree per table
            (page_no, fields_type_info, actual record)
          MySQL >= 4.1: only one insert buffer tree
            (space_id, one-byte-marker, page_no, fields_type_info, actual record)
            Indexed by (space_id, page_no)
        Works for non-unique secondary indexes
        Writes go to the insert buffer if the page is not in the buffer pool
        The insert buffer bitmap page tracks the free space of each page
          2 bits per page
      Merge write operation
        Merge write
        Delays page writes
        Raises write performance
        However, increases read operations
      MySQL 5.5 Change Buffer
        insert, purge, delete mark
  • 17. InnoDB Insert Buffer
      mysql> show engine innodb status\G
      *************************** 1. row ***************************
      Status:
      =====================================
      090922 11:52:51 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 15 seconds
      ……
      -------------------------------------
      INSERT BUFFER AND ADAPTIVE HASH INDEX
      -------------------------------------
      Ibuf: size 2249, free list len 3346, seg size 5596,
      374650 inserts, 51897 merged recs, 14300 merges
      Hash table size 4980499, node heap has 1246 buffer(s)
      1640.60 hash searches/s, 3709.46 non-hash searches/s

      Annotations on the Ibuf line: size = used pages; free list len = free pages; seg size = size + free list len + 1
      merged recs : merges = insert buffer efficiency
  • 18. Cache in MySQL InnoDB
      A cache can increase performance
        Delays write operations
        Covers the speed gap between disk and cache
      However, there is another cache in InnoDB
        Doublewrite
  • 19. What is Doublewrite?
      Doublewrite
        Avoids the partial page write problem
          A 512-byte write is always OK
          But a 16K write is not
      Doublewrite buffer
        2M
      Doublewrite file
        2M
        In the shared tablespace: ibdata1
  • 20. Doublewrite Architecture
      Stores all data twice: first to the doublewrite buffer, and then to the actual data files
      --skip-innodb_doublewrite
      mysql> show global status like 'innodb_dbl%'\G
      ************** 1. row ************************
      Variable_name: Innodb_dblwr_pages_written
      Value: 152362
      ************** 2. row ************************
      Variable_name: Innodb_dblwr_writes
      Value: 1465
      2 rows in set (0.00 sec)
  • 21. Doublewrite Feature
      Size: 2M
      All pages are written here first
      Sequential write
      Caches writes
      So what about a 100G or 300G doublewrite? This is what makes the flash cache possible.
  • 22. Flash Cache in InnoSQL 5.5.13
      Replaces the original doublewrite
        Users can now have a large doublewrite
      Page writes are sequential
        Fits SSD write characteristics
      The doublewrite can now be read
        Fits SSD random read characteristics
      Caches both read and write operations
      Persistent cache
      Merge write
        60 ~ 70% in workloads like TPC-C
      Supports AIO reads on the flash cache
        Not supported in Secondary Buffer Pool
  • 24. Flash Cache Data Structure
      /** Flash cache block struct */
      struct trx_flashcache_block_struct{
          unsigned space:32;      /*!< tablespace id */
          unsigned offset:32;     /*!< page number */
          unsigned fil_offset:32; /*!< flash cache page number */
          unsigned state:2;       /*!< flash cache state */
          trx_flashcache_block_t* hash; /*!< hash chain */
      };
      Four states:
        BLOCK_NOT_USED
        BLOCK_READY_FOR_FLUSH
        BLOCK_READ_CACHE
        BLOCK_FLUSHED
  • 25. Flash Cache Data Structure
      struct trx_flashcache_struct{
          mutex_t fc_mutex;       /*!< mutex protecting flash cache */
          hash_table_t* fc_hash;  /*!< hash table of flash cache pages */
          ulint fc_size;          /*!< flash cache size */
          ulint write_off;        /*!< write to flash cache offset */
          ulint flush_off;        /*!< flush to disk this offset */
          ulint write_round;      /* write round */
          ulint flush_round;      /* flush round */
          trx_flashcache_block_t* block; /* flash cache block */
          byte* read_buf_unalign; /* unaligned read buf */
          byte* read_buf;         /* read buf */
      };
  • 26. From Developer Perspective View
      [Diagram. Labels: Write; Flash Cache File (flush_offset, write_offset); array of Flash Cache Blocks (in memory); Flash Cache Hash Table (lookup); Flash Cache Log File (write_offset, flush_offset, write_round, flush_round)]
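The write_offset, flush_offset, write_round and flush_round values suggest that the flash cache file is used as a circular log. The following is a minimal sketch of how the number of not-yet-flushed pages could be derived from them. It is not InnoSQL source: the struct `fc_pos`, the function `fc_pending_pages`, and the reading of the "round" fields as wrap-around counters are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the positions kept in
   trx_flashcache_struct; not the real InnoSQL definition. */
struct fc_pos {
    unsigned long fc_size;     /* flash cache size, in pages */
    unsigned long write_off;   /* next slot to write in the flash cache */
    unsigned long flush_off;   /* next slot to flush to disk */
    unsigned long write_round; /* assumed: times write_off has wrapped */
    unsigned long flush_round; /* assumed: times flush_off has wrapped */
};

/* Pages written to the flash cache but not yet flushed to disk. */
unsigned long fc_pending_pages(const struct fc_pos *p)
{
    assert(p->write_off < p->fc_size && p->flush_off < p->fc_size);

    if (p->write_round == p->flush_round) {
        /* Same lap: the writer is at or ahead of the flusher. */
        assert(p->write_off >= p->flush_off);
        return p->write_off - p->flush_off;
    }

    /* Otherwise the writer must be exactly one lap ahead. */
    assert(p->write_round == p->flush_round + 1);
    assert(p->write_off <= p->flush_off);
    return p->fc_size - p->flush_off + p->write_off;
}
```

Under these assumptions the writer can never overtake the flusher by more than one lap, which is what keeps unflushed pages from being overwritten.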
  • 27. Flash Cache Flush Algorithms
      Flush pages in the flash cache to disk
        Takes over the flush from the master thread
        Flushes in the flash cache background thread
      Algorithm
        Less than innodb_flash_cache_write_cache_pct
          No flush
          Default 10
        Less than innodb_flash_cache_do_full_io_pct
          Flush 10% of innodb_io_capacity
          Default 90
        Else
          Flush 100% of innodb_io_capacity
        If idle
          Flush 100% of innodb_io_capacity
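The thresholds in this slide can be read as a small decision function. The sketch below is one possible reading, not the InnoSQL implementation: the function name `fc_flush_quota` is hypothetical, the compared percentage is assumed to be the share of the flash cache holding pages not yet flushed to disk, and the idle case is assumed to take precedence over the other rules.

```c
#include <assert.h>

/* Hypothetical helper: pages the flash cache background thread may flush
   in one pass. unflushed_pct is assumed to be the percentage of the flash
   cache occupied by pages not yet written back to disk. */
unsigned long fc_flush_quota(unsigned long unflushed_pct,
                             unsigned long write_cache_pct, /* default 10 */
                             unsigned long do_full_io_pct,  /* default 90 */
                             unsigned long io_capacity,
                             int server_idle)
{
    assert(write_cache_pct <= do_full_io_pct);

    if (server_idle) {
        return io_capacity;       /* idle: flush 100% of innodb_io_capacity */
    }
    if (unflushed_pct < write_cache_pct) {
        return 0;                 /* below write_cache_pct: no flush */
    }
    if (unflushed_pct < do_full_io_pct) {
        return io_capacity / 10;  /* flush 10% of innodb_io_capacity */
    }
    return io_capacity;           /* else: flush 100% */
}
```

With the defaults and an io_capacity of 200, a cache that is 50% unflushed would get a quota of 20 pages per pass in this sketch.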
  • 28. Merge Write in Flash Cache
      flush_offset                                 write_offset
      (7,7) (2,6) (0,6) (3,7) …… (3,7) (2,6) (4,8)
      Pages (2,6) and (3,7) can be merged
      This is much like the insert buffer
      Delays write operations
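The merge write idea can be illustrated by counting how many entries between flush_offset and write_offset are superseded by a newer copy of the same page. This is a minimal sketch under assumptions: each pair is taken to be (space, offset), the names `page_id` and `fc_merged_writes` are hypothetical, and a quadratic scan is used for brevity where a real implementation would presumably use the hash table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page identifier: (tablespace id, page number). */
struct page_id {
    unsigned space;
    unsigned offset;
};

/* Counts entries in the unflushed region that have a newer copy of the
   same page later in the log. Those older copies need not be written to
   disk, so the count is the number of disk writes saved by merging. */
size_t fc_merged_writes(const struct page_id *log, size_t n)
{
    size_t merged = 0;

    assert(log != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (log[i].space == log[j].space &&
                log[i].offset == log[j].offset) {
                merged++; /* entry i is superseded by entry j */
                break;
            }
        }
    }
    return merged;
}
```

For the seven pages shown on the slide (ignoring the elided middle part), the older copies of (2,6) and (3,7) are superseded, so two of the seven disk writes would be saved.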
  • 29. Flash Cache Benchmark
      Sysbench OLTP
        Read intensive
      TPC-C
        Write intensive
      Blogbench
        Oriented to blog-like applications
        Developed by Netease
  • 30. Sysbench OLTP
      InnoDB Buffer Pool: 6G
      DB Size: 19G
      innodb_flush_method = O_DIRECT
      innodb_flush_log_at_trx_commit = 1
  • 31. TPC-C
      InnoDB Buffer Pool: 12G
      DB Size: 39G
      Flash Cache: 100G
      innodb_flush_method = O_DIRECT
      innodb_flush_log_at_trx_commit = 1
      Results
        SSD: 3607.183 Tpm
        Flash Cache: 7230.05 Tpm
        Merge Write Ratio: 65.47%
  • 32. Blogbench
      InnoDB Buffer Pool: 4G
      DB Size: 21G
      innodb_flush_method = O_DIRECT
      innodb_flush_log_at_trx_commit = 1
      Merge write ratio: 60%
  • 33. Conclusion
      Flash Cache works for both read and write workloads
      Works better than using the SSD as durable storage
      Optimized for SSD in the database kernel
        No additional writes with flash cache (it replaces doublewrite)
        Merge write support
  • 34. SHM for InnoDB Buffer Pool
      Uses shared memory to allocate the InnoDB buffer pool
      Why use shared memory?
        Faster warm-up
      Warm-up speed?
        Random read: 10~20M/sec
        A 30G buffer pool needs 30~60 minutes
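The warm-up estimate follows from dividing the buffer pool size by the random read throughput. A minimal sketch of that arithmetic, with a hypothetical helper name `warmup_minutes` and the assumption that 1G = 1024M:

```c
#include <assert.h>

/* Hypothetical helper: minutes needed to read pool_mb megabytes into the
   buffer pool at rate_mb_per_sec megabytes per second. */
double warmup_minutes(double pool_mb, double rate_mb_per_sec)
{
    assert(rate_mb_per_sec > 0.0);
    return pool_mb / rate_mb_per_sec / 60.0;
}
```

For a 30G pool (30720M), this gives about 51 minutes at 10M/sec and about 26 minutes at 20M/sec, the same order of magnitude as the 30~60 minutes quoted on the slide.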
  • 35. Warm-up Methods
      Use SQL to warm up
        SELECT count(*) FROM tbl_name FORCE INDEX (PRIMARY);
        Warm-up reads become sequential
        But it cannot restore the database to its previous workload state
      Dump the buffer pool to a file
        Supported in MySQL 5.6+
        Warm-up reads become sequential
        Restores the database to its previous workload state
        The dump file is big
        What if the database crashes?
  • 36. Warm-up Methods
      Percona Server
        Exports the (space_id, page_no) pairs in the LRU list to a file
        Loads this file ordered by (space_id, page_no) at MySQL startup to make reads sequential
        Restores the database to its previous workload state
        Still needs a long time to warm up
          If you have a big buffer pool: 128G, 256G
  • 37. Warm-up in InnoSQL
      Uses shared memory
        --innodb_use_shm_preload=1
      Shared memory configuration, like Oracle
        /proc/sys/kernel/shmmax
        /proc/sys/kernel/shmall
      Warm-up takes less than 1 sec
        All pages are already in memory
  • 38. SHM for InnoDB Buffer Pool
      # list shared memory info
      innosql@db-62:~$ ipcs -a

      ------ Shared Memory Segments --------
      key        shmid      owner      perms      bytes      nattch     status
      0x0008c231 4653056    innosql    600        549715968  0

      ------ Semaphore Arrays --------
      key        semid      owner      perms      nsems

      ------ Message Queues --------
      key        msqid      owner      perms      used-bytes   messages

      # remove shared memory
      innosql@db-62:~$ ipcrm -m 4653056
  • 39. InnoDB IO Statistics
      Get read IO statistics
        Like SQL Server: SET STATISTICS IO ON
        InnoSQL implements it in the slow query log
          Both file and table
      Helps SQL developers
        10 reads may not be good in an OLTP application
      Helps DBAs
        Know the real IO statistics of a SQL statement
        Not only the time it consumes
      Still in development
        You can preview this feature
  • 40. InnoDB IO Statistics
      # Time: 111103 13:29:06
      # User@Host: root[root] @ localhost [::1]
      # Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
      use tpcc;
      SET timestamp=1320298146;
      select * from warehouse where w_id=1;
      # Time: 111103 13:31:28
      # User@Host: root[root] @ localhost [::1]
      # Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
      SET timestamp=1320298288;
      select * from history;
  • 41. Configuration
      long_query_time
      io_slow_query
      slow_query_type
        0: long_query_time
        1: io_slow_query
        2: both
  • 42. Page Cleaner Thread
      Pages are flushed in the master thread
        Adaptive flush
        IO capacity
      Problem
        The master thread has a lot to cope with
        Async flush can block user query threads
      Page cleaner thread
        Supported in MySQL 5.6
        InnoSQL supports it in MySQL 5.5
        Can also help flush in FLUSH_LRU_LIST
  • 43. Flush Algorithms in InnoDB
      checkpoint_age: current_lsn - checkpoint_lsn
      async_water_mark: ~78% * Log_Group_Size
      sync_water_mark: ~90% * Log_Group_Size
      For example:
        Log file size 1G, number of log files 2
        async_water_mark = ~1.5G
        sync_water_mark = ~1.8G
  • 44. Flush Algorithms in InnoDB
      checkpoint_age < async_water_mark
        adaptive_flushing
        5% of innodb_io_capacity
      async_water_mark < checkpoint_age < sync_water_mark
        Blocks one user query thread
        Async flush
      checkpoint_age > sync_water_mark
        Blocks all user query threads
        Sync flush
      n_dirty_pages > innodb_max_dirty_pages_pct
        Flush innodb_io_capacity
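The water-mark rules can be sketched as a classification of checkpoint_age against the two thresholds. This is an illustration only, not InnoDB source: the enum and the function `classify_checkpoint_age` are hypothetical names, and the approximate 78% and 90% factors from the slides are treated as exact.

```c
#include <assert.h>

/* Hypothetical labels for the three regimes described on the slide. */
enum flush_mode {
    FLUSH_ADAPTIVE, /* below async_water_mark: adaptive flushing only */
    FLUSH_ASYNC,    /* between the marks: async flush, one thread blocked */
    FLUSH_SYNC      /* above sync_water_mark: sync flush, all threads blocked */
};

/* checkpoint_age and log_group_size must use the same unit (e.g. MB).
   The 78% and 90% factors are the approximate values given in the slides. */
enum flush_mode classify_checkpoint_age(unsigned long long checkpoint_age,
                                        unsigned long long log_group_size)
{
    unsigned long long async_water_mark = log_group_size * 78 / 100;
    unsigned long long sync_water_mark = log_group_size * 90 / 100;

    assert(log_group_size > 0);
    if (checkpoint_age > sync_water_mark) {
        return FLUSH_SYNC;
    }
    if (checkpoint_age > async_water_mark) {
        return FLUSH_ASYNC;
    }
    return FLUSH_ADAPTIVE;
}
```

With two 1G log files (2048 MB in total), the marks in this sketch land at 1597 MB and 1843 MB, in line with the rounded ~1.5G and ~1.8G figures of the example.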
  • 45. Page Cleaner Thread
      Reduces the master thread's burden
      Async flush moves to this background thread
      No blocking in user query threads
  • 46. However
      Flushing does not happen only in the master thread
      FLUSH_LRU_LIST
        Checks whether at least 64 pages can be used
        In this situation, the flush happens almost entirely in the user query thread
        Adaptive flush and innodb_io_capacity do not help
          The flush happens in the user query thread
      InnoSQL also moves this flush to the page cleaner thread
        Not supported in MySQL 5.6
      Still needs more optimization
  • 47. Q & A