MySQL Innovation Works: InnoSQL

Published in: Technology

  1. 1. MySQL Innovation Works -- InnoSQL
     David Jiang
     jiangchengyao@gmail.com
     weibo.com/insidemysql
  2. 2. About Me
     • 7+ years working on different databases
       • SQL Server
       • MySQL
       • Oracle
     • Now working at the Netease Development and Research Center Lab
       • MySQL kernel development
     • Author
       • <<Inside MySQL: InnoDB Storage Engine>>
       • <<Inside MySQL: SQL Programming>> (coming soon, 2012.3)
  3. 3. What is InnoSQL
     • A new MySQL branch
       • Open source
       • High performance (flash cache)
       • Ease of use
       • Fully compatible with the original MySQL
       • Collects creative ideas for MySQL and makes them happen
     • MySQL Innovation Works
       • http://www.innomysql.org
  4. 4. InnoSQL Features
     • Flash Cache for InnoDB
       • Provides higher performance than using SSD only as durable storage
     • Shared memory (SHM) for the InnoDB buffer pool
       • Quick warm-up of the InnoDB buffer pool
       • Less than 1 sec!
     • InnoDB IO statistics
       • Get each SQL statement's physical and logical reads
     • Page cleaner thread
       • Removes blocking in the user query thread
  5. 5. InnoSQL Flash Cache
     • InnoSQL Flash Cache
       • Uses SSD as a cache
     • Other flash cache solutions
       • Facebook flash cache
       • Oracle flash cache
       • Secondary Buffer Pool for InnoDB (InnoSQL 5.5.8)
  6. 6. Facebook Flash Cache
     • A general solution
     • Open source
       • https://github.com/facebook/flashcache
     • Integrates with file systems
       • Built using the Linux Device Mapper
     • Not optimized for databases
     • Good for read-intensive workloads
     • Worse for write-intensive workloads
     • Needs time to warm up
  7. 7. Oracle Flash Cache
     • Works for Oracle 11g
     • Page writes to the flash cache are slow
       • Not so aggressive
     • Needs warm-up
  8. 8. Secondary Buffer Pool
     • Supported in InnoSQL 5.5.8
     • Good for read-intensive workloads
     • Also not good for write-intensive workloads
       • TPC-C
     • Can warm up the database at startup
       • Slow on every start
     • The cache is not persistent storage
  9. 9. Why is warm-up needed?
     • Capacity
       • SSD >> Memory
     • Speed
       • SSD << Memory
  10. 10. Flash Cache in InnoSQL 5.5.13
      • Can cache both read and write operations
      • Sequential writes on SSD
        • No random writes
      • Merge write
      • The cache is persistent
  11. 11. Why not use SSD as durable storage
      • SSD is good for random reads
        • 7000+ IOPS
        • 100 ~ 150 IOPS for disk
      • SSD life cycle
      • SSD write performance
        • Write: page
        • Wipe: extent (128~256 pages)
      • Databases are not fully optimized for SSD
        • Read-ahead algorithm
        • 512-byte aligned writes for the log file
        • Random writes
  12. 12. Why use SSD as Cache
      • Cache is everywhere
        • Volatile: Register, L1 cache, L2 cache, L3 cache, Memory
        • SSD
        • Non-volatile: Disk, Tape
  13. 13. Question
      • Do you use your SSD as volatile or non-volatile storage?
  14. 14. Analysis
      • If SSD is used as durable storage
        • Non-volatile
        • But databases are not yet fully optimized for it
      • If the Secondary Buffer Pool or Oracle Flash Cache is used
        • Volatile
        • Performance degrades
        • Need to write twice (flash cache & durable storage)
      • If Facebook flash cache is used
        • Volatile or non-volatile
        • Based on the cache mode
          • Writethrough
          • Writearound
          • Writeback
        • Performance degrades
        • Still need to write twice, but with some optimization
        • Not fully optimized for databases
  15. 15. Cache in MySQL InnoDB
      • InnoDB Buffer Pool
        • Caches pages
        • Asynchronous operations on pages
        • Read the page from the buffer pool first
        • Modify the page in the buffer pool first
        • Then write it to disk with a fuzzy or sharp checkpoint
        • Needs a log manager for recovery
        • A larger buffer pool gives better performance
        • Because of the speed gap between disk and memory
        • However, we cannot get enough memory to cache the whole database
  16. 16. Cache in MySQL InnoDB
      • Insert Buffer
        • The insert buffer is a B+ tree
        • MySQL version < 4.1.x: one insert buffer tree per table
          • (page_no, fields_type_info, actual record)
        • >= 4.1: only one insert buffer tree
          • (space_id, one-byte-marker, page_no, fields_type_info, actual record)
          • Indexed by (space_id, page_no)
        • Works for non-unique secondary indexes
        • Writes go to the insert buffer if the page is not in the buffer pool
        • The insert buffer bitmap page tracks the free space of each page
          • 2 bits per page
        • Merge write operations
          • Merge write
          • Delays page writes
          • Raises write performance
          • However, increases read operations
        • MySQL 5.5 Change Buffer
          • insert, purge, delete mark
  17. 17. InnoDB Insert Buffer
      mysql> show engine innodb status\G
      *************************** 1. row ***************************
      Status:
      =====================================
      090922 11:52:51 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 15 seconds
      ……
      -------------------------------------
      INSERT BUFFER AND ADAPTIVE HASH INDEX
      -------------------------------------
      Ibuf: size 2249, free list len 3346, seg size 5596,
      374650 inserts, 51897 merged recs, 14300 merges
      Hash table size 4980499, node heap has 1246 buffer(s)
      1640.60 hash searches/s, 3709.46 non-hash searches/s
      • Annotations: size = used pages; free list len = free pages; seg size = size + free list len + 1
      • merged recs : merges = insert buffer efficiency
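The two annotations on this slide are simple arithmetic, so a minimal C sketch can check them against the sample output. The function names are hypothetical, and reading "merged recs : merges" as the average number of records applied per merge is an assumption.

```c
/* Segment size as annotated on the slide: size + free list len + 1. */
unsigned long ibuf_seg_size(unsigned long size, unsigned long free_list_len)
{
    return size + free_list_len + 1;
}

/* Average number of buffered records applied per merge (assumed reading of
 * "merged recs : merges"). Returns 0.0 when no merge has happened yet. */
double ibuf_recs_per_merge(unsigned long merged_recs, unsigned long merges)
{
    if (merges == 0)
        return 0.0;
    return (double)merged_recs / (double)merges;
}
```

With the numbers above, the segment size is 2249 + 3346 + 1 = 5596, which matches the output, and 51897 merged records over 14300 merges gives roughly 3.6 records per merge.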
  18. 18. Cache in MySQL InnoDB
      • Caches can increase performance
      • Delay write operations
        • Gap between disk and cache
      • However, there is another cache in InnoDB
        • Doublewrite
  19. 19. What is Doublewrite?
      • Doublewrite
        • Avoids the partial write problem
        • A 512-byte write is always OK
        • But a 16K write is not
        • Doublewrite buffer
          • 2M
        • Doublewrite file
          • 2M
          • In the shared tablespace: ibdata1
  20. 20. Doublewrite Architecture
      • Stores all data twice: first to the doublewrite buffer, then to the actual data files
      • Can be disabled with --skip-innodb_doublewrite
      mysql> show global status like 'innodb_dbl%'\G
      ************** 1. row ************************
      Variable_name: Innodb_dblwr_pages_written
              Value: 152362
      ************** 2. row ************************
      Variable_name: Innodb_dblwr_writes
              Value: 1465
      2 rows in set (0.00 sec)
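The write-twice idea can be sketched in a few lines of C. This is an illustration under stated assumptions, not InnoDB's code: pages live in plain memory buffers, the checksum is a toy additive sum stored in the first 4 bytes of the page, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DBLWR_PAGE_SIZE 16384u  /* 16K InnoDB page, as on the slides */

/* Toy additive checksum over bytes 4..end (assumption: this is NOT
 * InnoDB's real page checksum, only a stand-in for the sketch). */
uint32_t page_checksum(const uint8_t *page)
{
    uint32_t sum = 0;
    for (size_t i = 4; i < DBLWR_PAGE_SIZE; i++)
        sum += page[i];
    return sum;
}

/* Store the checksum in the first 4 bytes of the page. */
void page_seal(uint8_t *page)
{
    uint32_t sum = page_checksum(page);
    memcpy(page, &sum, sizeof sum);
}

/* A page is valid when its stored checksum matches its contents. */
int page_is_valid(const uint8_t *page)
{
    uint32_t stored;
    memcpy(&stored, page, sizeof stored);
    return stored == page_checksum(page);
}

/* Write path: the doublewrite copy first, then the page in place. */
void dblwr_write(uint8_t *dblwr_slot, uint8_t *data_slot, const uint8_t *page)
{
    assert(dblwr_slot && data_slot && page);
    memcpy(dblwr_slot, page, DBLWR_PAGE_SIZE); /* a real engine syncs here */
    memcpy(data_slot, page, DBLWR_PAGE_SIZE);
}

/* Recovery: 0 = page intact, 1 = restored from doublewrite, -1 = both bad. */
int dblwr_recover(const uint8_t *dblwr_slot, uint8_t *data_slot)
{
    if (page_is_valid(data_slot))
        return 0;
    if (!page_is_valid(dblwr_slot))
        return -1;
    memcpy(data_slot, dblwr_slot, DBLWR_PAGE_SIZE);
    return 1;
}
```

If a crash tears the in-place 16K write after the first 512-byte sector, the page checksum no longer matches and the intact doublewrite copy is used to repair it.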
  21. 21. Doublewrite Features
      • Size: 2M
      • All pages must be written here first
      • Sequential write
      • Cached write
      • Hence, what about having a 100G or 300G doublewrite?
      • This is what makes the flash cache possible
  22. 22. Flash Cache in InnoSQL 5.5.13
      • Replaces the original doublewrite work
      • Users can now have a large doublewrite
      • Page writes are sequential
        • SSD write feature
      • The doublewrite can now be read
        • SSD random read feature
      • Caches both read and write operations
      • Persistent cache
      • Merge write
        • 60 ~ 70% in workloads like TPC-C
      • Supports AIO reads on the flash cache
        • Not supported in the Secondary Buffer Pool
  23. 23. Flash Cache Architecture
  24. 24. Flash Cache Data Structure
      /** Flash cache block struct */
      struct trx_flashcache_block_struct{
          unsigned space:32;             /*!< tablespace id */
          unsigned offset:32;            /*!< page number */
          unsigned fil_offset:32;        /*!< flash cache page number */
          unsigned state:2;              /*!< flash cache state */
          trx_flashcache_block_t* hash;  /*!< hash chain */
      };
      • Four states: BLOCK_NOT_USED, BLOCK_READY_FOR_FLUSH, BLOCK_READ_CACHE, BLOCK_FLUSHED
  25. 25. Flash Cache Data Structure
      struct trx_flashcache_struct{
          mutex_t fc_mutex;               /*!< mutex protecting flash cache */
          hash_table_t* fc_hash;          /*!< hash table of flash cache pages */
          ulint fc_size;                  /*!< flash cache size */
          ulint write_off;                /*!< write to flash cache offset */
          ulint flush_off;                /*!< flush to disk this offset */
          ulint write_round;              /* write round */
          ulint flush_round;              /* flush round */
          trx_flashcache_block_t* block;  /* flash cache block */
          byte* read_buf_unalign;         /* unalign read buf */
          byte* read_buf;                 /* read buf */
      };
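How the block struct and the hash chain fit together can be sketched as a small chained hash table keyed by (space, offset). This is a sketch, not InnoSQL's implementation: the simplified `fc_block_t`, the bucket count, the fold function, and the numeric values of the four states are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State names come from the slide; their numeric values are assumed. */
enum fc_state {
    BLOCK_NOT_USED,
    BLOCK_READY_FOR_FLUSH,
    BLOCK_READ_CACHE,
    BLOCK_FLUSHED
};

/* Simplified stand-in for trx_flashcache_block_struct. */
typedef struct fc_block {
    uint32_t space;          /* tablespace id */
    uint32_t offset;         /* page number */
    uint32_t fil_offset;     /* flash cache page number */
    unsigned state : 2;      /* flash cache state */
    struct fc_block *hash;   /* next block in the same hash chain */
} fc_block_t;

#define FC_BUCKETS 1024u     /* hypothetical table size */

typedef struct fc_hash {
    fc_block_t *bucket[FC_BUCKETS];
} fc_hash_t;

/* Hypothetical fold function; the real one is not shown on the slides. */
unsigned fc_fold(uint32_t space, uint32_t offset)
{
    return (unsigned)(((space * 2654435761u) ^ offset) % FC_BUCKETS);
}

/* Push a block at the head of its chain. */
void fc_insert(fc_hash_t *h, fc_block_t *b)
{
    assert(h && b);
    unsigned i = fc_fold(b->space, b->offset);
    b->hash = h->bucket[i];
    h->bucket[i] = b;
}

/* Walk the chain for (space, offset); NULL means the page is not cached. */
fc_block_t *fc_lookup(const fc_hash_t *h, uint32_t space, uint32_t offset)
{
    for (fc_block_t *b = h->bucket[fc_fold(space, offset)]; b; b = b->hash)
        if (b->space == space && b->offset == offset)
            return b;
    return NULL;
}
```

A hit returns the block whose `fil_offset` says where the page sits in the flash cache file; a miss means the page has to be read from disk.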
  26. 26. From the Developer's Perspective
      [Diagram. Labels: Flash Cache File (blocks, flush_offset, write_offset); Flash Cache Block and Flash Cache Hash Table (in memory; write, lookup); Flash Cache Log File (write_offset, flush_offset, write_round, flush_round)]
  27. 27. Flash Cache Flush Algorithms
      • Flushes pages in the flash cache to disk
      • Takes over the flushing done in the master thread
      • Flushing runs in the flash cache background thread
      • Algorithm
        • Less than innodb_flash_cache_write_cache_pct
          • No flush
          • Default 10
        • Less than innodb_flash_cache_do_full_io_pct
          • Flush 10% of innodb_io_capacity
          • Default 90
        • Else
          • Flush 100% of innodb_io_capacity
        • If idle
          • Flush 100% of innodb_io_capacity
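The threshold rules above can be written as one small decision function. This is a sketch under assumptions: the slide does not say what the percentage measures, so `used_pct` is taken here to be the share of the flash cache holding pages not yet flushed to disk; the idle rule is assumed to override the others; and the function name is hypothetical.

```c
#include <assert.h>

/* Number of pages to flush in this round.
 * used_pct        : assumed share (0..100) of the flash cache holding
 *                   pages not yet flushed to disk
 * idle            : non-zero when the server is idle (assumed to win)
 * write_cache_pct : innodb_flash_cache_write_cache_pct (default 10)
 * do_full_io_pct  : innodb_flash_cache_do_full_io_pct (default 90)
 * io_capacity     : innodb_io_capacity */
unsigned long fc_flush_quota(unsigned used_pct, int idle,
                             unsigned write_cache_pct,
                             unsigned do_full_io_pct,
                             unsigned long io_capacity)
{
    assert(write_cache_pct <= do_full_io_pct);
    if (idle)
        return io_capacity;        /* idle: flush at 100% */
    if (used_pct < write_cache_pct)
        return 0;                  /* below the low mark: no flush */
    if (used_pct < do_full_io_pct)
        return io_capacity / 10;   /* in between: flush at 10% */
    return io_capacity;            /* above the high mark: 100% */
}
```

With the defaults (10 and 90) and an io_capacity of 200, a cache that is 50% used would flush 20 pages per round.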
  28. 28. Merge Write in Flash Cache
      [flush_offset] (7,7) (2,6) (0,6) (3,7) …… (3,7) (2,6) (4,8) [write_offset]
      • Pages (2,6) and (3,7) can be merged
      • This is much like the insert buffer
      • Delays write operations
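The merge idea can be illustrated by counting how many entries between flush_offset and write_offset are superseded by a later write of the same page. This is a sketch: the entry type and function names are hypothetical, the tuples are assumed to be (space, page number), and defining the merge ratio as merged entries over total entries is an assumption about how the benchmark figure is computed.

```c
#include <assert.h>
#include <stddef.h>

/* One logged page write in the flash cache, in write order. */
typedef struct fc_entry {
    unsigned space;    /* assumed: tablespace id */
    unsigned page_no;  /* assumed: page number */
} fc_entry_t;

/* Count entries that a later entry for the same page makes obsolete;
 * only the newest copy of each page has to go to disk. */
size_t fc_count_merged(const fc_entry_t *e, size_t n)
{
    assert(e != NULL || n == 0);
    size_t merged = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (e[i].space == e[j].space && e[i].page_no == e[j].page_no) {
                merged++;
                break;
            }
        }
    }
    return merged;
}

/* Assumed definition: share of logged writes that never reach the disk. */
double fc_merge_ratio(const fc_entry_t *e, size_t n)
{
    return n == 0 ? 0.0 : (double)fc_count_merged(e, n) / (double)n;
}
```

For the seven entries shown on the slide, two writes are merged away, so five pages are flushed instead of seven. The quadratic scan keeps the sketch short; a real implementation would more likely consult the hash table.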
  29. 29. Flash Cache Benchmark
      • Sysbench OLTP
        • Read intensive
      • TPC-C
        • Write intensive
      • Blogbench
        • Oriented to blog-like applications
        • Developed by Netease
  30. 30. Sysbench OLTP
      • InnoDB Buffer Pool: 6G
      • DB Size: 19G
      • innodb_flush_method = O_DIRECT
      • innodb_flush_log_at_trx_commit = 1
  31. 31. TPC-C
      • InnoDB Buffer Pool: 12G
      • DB Size: 39G
      • innodb_flush_method = O_DIRECT
      • innodb_flush_log_at_trx_commit = 1
      • Flash Cache: 100G
      • Results
        • SSD: 3607.183 Tpm
        • Flash Cache: 7230.05 Tpm
        • Merge Write Ratio: 65.47%
  32. 32. Blogbench
      • InnoDB Buffer Pool: 4G
      • DB Size: 21G
      • innodb_flush_method = O_DIRECT
      • innodb_flush_log_at_trx_commit = 1
      • Merge write ratio: 60%
  33. 33. Conclusion
      • Flash Cache works for both read and write workloads
      • Works better than using SSD as durable storage
      • Optimized for SSD in the database kernel
      • No extra writes with the flash cache
      • Merge write support
  34. 34. SHM for InnoDB Buffer Pool
      • Uses shared memory to allocate the InnoDB buffer pool
      • Why use shared memory?
        • Speeds up warm-up
      • Warm-up speed?
        • Random read: 10~20M/sec
        • A 30G buffer pool needs 30~60 minutes
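The warm-up estimate is straightforward arithmetic, sketched here in C. The function name is hypothetical, and 1G is taken as 1024M, which is an assumption about the slide's units.

```c
#include <assert.h>

/* Minutes needed to fill a buffer pool of pool_gb gigabytes when pages
 * arrive at mb_per_sec megabytes per second (1G = 1024M assumed). */
double warmup_minutes(double pool_gb, double mb_per_sec)
{
    assert(mb_per_sec > 0.0);
    return pool_gb * 1024.0 / mb_per_sec / 60.0;
}
```

For a 30G pool this gives about 51 minutes at 10M/sec and about 26 minutes at 20M/sec, which is in the same range as the 30~60 minutes quoted on the slide.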
  35. 35. Warm-up Methods
      • Use SQL to warm up
        • SELECT count(*) FROM tbl_name FORCE INDEX (PRIMARY)
        • Warm-up reads become sequential
        • But cannot restore the database to its previous workload state
      • Dump the buffer pool to a file
        • Supported in MySQL 5.6+
        • Warm-up reads become sequential
        • Restores the database to its previous workload state
        • The dump file is big
        • What if the database crashes?
  36. 36. Warm-up Methods
      • Percona Server
        • Exports the (space_id, page_no) pairs in the LRU list to a file
        • Loads this file ordered by (space_id, page_no) at MySQL startup to make reads sequential
        • Restores the database to its previous workload state
        • Still needs a long time to warm up
        • If you have a big buffer pool: 128G, 256G
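The ordering step in the Percona approach can be sketched with the standard `qsort`. The `page_id_t` type and function names are hypothetical; only the sort key, (space_id, page_no), comes from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One page reference exported from the LRU list. */
typedef struct page_id {
    uint32_t space_id;
    uint32_t page_no;
} page_id_t;

/* Order by space_id first, then by page_no. */
int page_id_cmp(const void *a, const void *b)
{
    const page_id_t *x = a;
    const page_id_t *y = b;
    if (x->space_id != y->space_id)
        return x->space_id < y->space_id ? -1 : 1;
    if (x->page_no != y->page_no)
        return x->page_no < y->page_no ? -1 : 1;
    return 0;
}

/* Sort the dump so that pages are read back in file order. */
void sort_for_sequential_load(page_id_t *ids, size_t n)
{
    assert(ids != NULL || n == 0);
    if (n > 1)
        qsort(ids, n, sizeof *ids, page_id_cmp);
}
```

Reading pages in this order turns the random reads of a cold buffer pool into mostly sequential reads within each tablespace file.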
  37. 37. Warm-up in InnoSQL
      • Uses shared memory
        • --innodb_use_shm_preload=1
      • Shared memory configuration like Oracle
        • /proc/sys/kernel/shmmax
        • /proc/sys/kernel/shmall
      • Warm-up takes less than 1 sec
        • All pages are already in memory
  38. 38. SHM for InnoDB Buffer Pool
      # list shared memory info
      innosql@db-62:~$ ipcs -a
      ------ Shared Memory Segments --------
      key        shmid      owner      perms      bytes      nattch     status
      0x0008c231 4653056    innosql    600        549715968  0
      ------ Semaphore Arrays --------
      key        semid      owner      perms      nsems
      ------ Message Queues --------
      key        msqid      owner      perms      used-bytes   messages
      # remove shared memory
      innosql@db-62:~$ ipcrm -m 4653056
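Why a shared memory segment survives a server restart can be sketched with the System V calls behind `ipcs` and `ipcrm`. This is a sketch, not InnoSQL's code: the function names are hypothetical, and how InnoSQL chooses its key and validates the old pool contents is not shown on the slides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach to the segment for `key`, creating it when it does not exist.
 * *created is 1 for a new (cold) segment and 0 for an existing (warm) one.
 * Returns NULL on failure. */
void *shm_pool_attach(key_t key, size_t size, int *created, int *shmid)
{
    assert(size > 0 && created && shmid);
    *created = 0;
    int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    if (id >= 0) {
        *created = 1;                 /* fresh segment: pool is cold */
    } else if (errno == EEXIST) {
        id = shmget(key, size, 0600); /* reuse: pages are still there */
    }
    if (id < 0)
        return NULL;
    *shmid = id;
    void *p = shmat(id, NULL, 0);
    return p == (void *)-1 ? NULL : p;
}

/* Mark the segment for removal, like `ipcrm -m <shmid>`. */
int shm_pool_remove(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

A process that detaches (or exits) leaves the segment in place, which is why `ipcs` above shows it with nattch 0. The next process to attach with the same key finds the old contents without reading anything from disk.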
  39. 39. InnoDB IO Statistics
      • Get read IO statistics
        • Like SQL Server: SET STATISTICS IO ON
      • InnoSQL implements it in the slow query log
        • Both file and table
      • Helps SQL developers
        • 10 reads may not be good in an OLTP application
      • Helps DBAs
        • Know the real IO statistics of each SQL statement
        • Not only the time it consumes
      • Still in development
        • You can preview this feature
  40. 40. InnoDB IO Statistics
      # Time: 111103 13:29:06
      # User@Host: root[root] @ localhost [::1]
      # Query_time: 119.293823 Lock_time: 119.274822 Rows_sent: 1 Rows_examined: 1 Logical_reads: 198 Physical_reads: 3
      use tpcc;
      SET timestamp=1320298146;
      select * from warehouse where w_id=1;
      # Time: 111103 13:31:28
      # User@Host: root[root] @ localhost [::1]
      # Query_time: 0.335019 Lock_time: 0.333019 Rows_sent: 1 Rows_examined: 1 Logical_reads: 164 Physical_reads: 50
      SET timestamp=1320298288;
      select * from history;
  41. 41. Configuration
      • long_query_time
      • io_slow_query
      • slow_query_type
        • 0: long_query_time
        • 1: io_slow_query
        • 2: both
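One possible reading of the three slow_query_type values is sketched below. Several points are assumptions, since the slides do not spell them out: io_slow_query is treated as a threshold on physical reads, both thresholds use a strict "greater than" comparison, and "both" is taken to mean that either criterion is enough to log the query. The function name is hypothetical.

```c
#include <assert.h>

enum {
    SLOW_BY_TIME = 0,  /* log by long_query_time */
    SLOW_BY_IO   = 1,  /* log by io_slow_query */
    SLOW_BY_BOTH = 2   /* assumed: log when either criterion is met */
};

/* Returns non-zero when the query should go to the slow query log. */
int should_log_slow(int slow_query_type,
                    double query_time, double long_query_time,
                    unsigned long reads, unsigned long io_slow_query)
{
    int slow_time = query_time > long_query_time;
    int slow_io = reads > io_slow_query;

    switch (slow_query_type) {
    case SLOW_BY_TIME:
        return slow_time;
    case SLOW_BY_IO:
        return slow_io;
    case SLOW_BY_BOTH:
        return slow_time || slow_io;
    default:
        assert(!"unknown slow_query_type");
        return 0;
    }
}
```

Under these assumptions, with hypothetical thresholds of 10 seconds and 20 reads, a 0.34 second query that does 50 physical reads is ignored by type 0 but logged by types 1 and 2.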
  42. 42. Page Cleaner Thread
      • Pages are flushed in the master thread
        • Adaptive flush
        • IO capacity
      • Problems
        • The master thread has a lot to cope with
        • Async flush can block the user query thread
      • Page cleaner thread
        • Supported in MySQL 5.6
        • InnoSQL supports it in MySQL 5.5
        • Can also help flush in FLUSH_LRU_LIST
  43. 43. Flush Algorithms in InnoDB
      • checkpoint_age: current_lsn - checkpoint_lsn
      • async_water_mark: ~78% * Log_Group_Size
      • sync_water_mark: ~90% * Log_Group_Size
      • For example:
        • Log file size 1G, log file number 2
        • Async_water_mark = ~1.5G
        • Sync_water_mark = ~1.8G
  44. 44. Flush Algorithms in InnoDB
      • checkpoint_age < async_water_mark
        • adaptive_flushing
        • 5% of innodb_io_capacity
      • async_water_mark < checkpoint_age < sync_water_mark
        • Blocks one user query thread
        • Async flush
      • checkpoint_age > sync_water_mark
        • Blocks all user query threads
        • Sync flush
      • n_dirty_pages > innodb_max_dirty_pages_pct
        • Flush 100% of innodb_io_capacity
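The water-mark rules can be sketched as a classification of checkpoint_age. This is a sketch under assumptions: the slides give the marks only approximately (~78% and ~90% of the log group size), so exact integer percentages are used here, and the type and function names are hypothetical rather than InnoDB's own.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FLUSH_ADAPTIVE,  /* below the async mark: adaptive flushing only */
    FLUSH_ASYNC,     /* between the marks: async flush */
    FLUSH_SYNC       /* above the sync mark: sync flush */
} flush_mode_t;

/* Approximate marks from the slide, taken as exact percentages here. */
uint64_t async_water_mark(uint64_t log_group_size)
{
    return log_group_size / 100 * 78;
}

uint64_t sync_water_mark(uint64_t log_group_size)
{
    return log_group_size / 100 * 90;
}

/* checkpoint_age = current_lsn - checkpoint_lsn, in bytes. */
flush_mode_t flush_mode(uint64_t checkpoint_age, uint64_t log_group_size)
{
    assert(log_group_size > 0);
    if (checkpoint_age > sync_water_mark(log_group_size))
        return FLUSH_SYNC;
    if (checkpoint_age > async_water_mark(log_group_size))
        return FLUSH_ASYNC;
    return FLUSH_ADAPTIVE;
}
```

For two 1G log files (a 2G log group) the marks land near 1.56G and 1.8G, so a checkpoint age of 1.7G falls in the async range.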
  45. 45. Page Cleaner Thread
      • Reduces the master thread's burden
      • Async flush moves to this background thread
        • No blocking happens in the user query thread
  46. 46. However
      • Flushing does not only happen in the master thread
      • FLUSH_LRU_LIST
        • Checks whether at least 64 pages can be used
        • In this situation, flushing mostly happens in the user query thread
        • Adaptive flush and innodb_io_capacity do not help
        • Happens in the user query thread
      • InnoSQL also moves this flush to the page cleaner thread
        • MySQL 5.6 does not support this
        • Still needs more optimization
  47. 47. Q & A
