SlideShare a Scribd company logo
1 of 165
Download to read offline
Who Am I?
• Jonathan
• MySQL Consultant
• Working with MySQL since 2007
• Specialize in SQL, Indexing and
 Reporting (Big Data)
Who is this for?




* Smilies indicate ability to control area
What Will I Cover?


               Domain Knowledge

 This Much

0%      20%                       100%
Solutions this
              tutorial will cover
Occurrences




              Problems
serenesimplycomplicated.blogspot.co.uk/2012/07/developing-direction.html
http://frabz.com/3bv6
What Will I Cover?

• The top 20% of the strategies to resolve 80% of
  your performance problems
• The strategies that are within reach of
  developers
• Strategies that are more common and more
 established
  • From my experience.
  • To reduce risk
Table of Contents

    Part One             Part Three
• Indexes             • Read Cache
• Finding             • Scaling Reads
 Bottlenecks          • Reporting
   Part Two           • Write Buffers
• Partitioning        • Scaling Writes
• Intensive Table     • Sharding
 Optimization
Indexes
What are Indexes?
Indexes
• Advantages
 • Speed – Use the right path
• Now used in NoSQL stores
• “A properly indexed database will give
  you very few problems” – me
• My blog – “Indexing and Caching”
B-Tree Indexes




(http://20bits.com/article/interview-questions-database-indexes)
(http://www.youtube.com/watch?v=coRJrcIYbF4)
Choosing the best Index

Prevent Reading
from Disk

                      *3



                           *2
                                       Prevent Extra
                                *1     Processing




Prevent Table Scans
Indexes
• 1 Star – EXPLAIN
 • Type
   • const - where id=1
   • ref - where location='london'
   • eq_ref - where t1.id = t2.id
 • Extra
   • using where
 • Limitation – type
   • range - where id in (1,2,3,4,5)
Indexes
• 2 Star – EXPLAIN
 • Extra
   • using where
   • Using index
 • And Not
   • Using filesort
   • Using temporary
 • Limitation
   • Using temporary - query contains different GROUP
    BY and ORDER BY columns
Indexes
• 3 Star – EXPLAIN
 • Type
   • index
 • Extra
   • Using index
   • Using index for group-by
Index Examples
Regular Usage



SELECT , FROM ₸
  WHERE = 121;

  (PRIMARY KEY )
Range Scan


  SELECT ▲, ■ FROM ₸
    WHERE = 121
AND BETWEEN 1 AND 100


       KEY ( , )
Range Scan


SELECT ▲, ■ FROM ₸
  WHERE = 121
AND IN (1,100,30,7)


     KEY ( , )
Covering Index



SELECT , FROM ₸
  WHERE = 121


   KEY ( , , )
Not Optimal


SELECT       FROM ₸
  WHERE = 121
AND ♪ IN (1,100,30,7)

     KEY ( ,       )
Broken Range


SELECT ▲, ■ FROM ₸
  WHERE = 121
AND IN (1,100,30,7);


      KEY ( , )
Sub Queries



   SELECT ▲, ■ FROM ₸
WHERE IN (SELECT ♪ FROM ♠)


          KEY ( )
Indexes for Sorting

SELECT ☼, ☺ FROM ₸
  WHERE = 121
    GROUP BY
    ORDER BY

        KEY ( )
Indexes for Sorting

SELECT ☼, ☺ FROM ₸
  WHERE = 121
    GROUP BY
    ORDER BY

      KEY ( ,   )
     or KEY ( , )
Indexes for Joins


 SELECT . , .      FROM
INNER JOIN ON . = .
       WHERE . = 232

            KEY ( )
         or KEY( )
WHERE ₸.Ω = 232;   INNER JOIN ♫
FULL SCAN ♫   INNER JOIN ₸




               FILTER Ω = 232
SELECT ▲, (SELECT .. FROM WHERE )
       FROM ₸ WHERE ♪ IN
   (SELECT ♪ FROM ♠ WHERE..);


 “I need help optimizing the my.cnf”
Clustered PK and
                         Secondary Indexes



                            KEY ( , , )

                           PRIMARY KEY ( )
                              KEY ( , )

                 (Can be used in GROUP / ORDER BY
                 SELECT variables to make Covering index)
http://www.dbasquare.com/2012/05/17/can-mysql-use-primary-key-values-from-a-secondary-index/
Index Merge



 SELECT ☺ FROM ₸
WHERE =1 OR =2;

      KEY ( )
      KEY ( )
Index Merge



    SELECT☺FROM ₸ WHERE =1
UNION (SELECT☺FROM ₸ WHERE =2)


            KEY ( )
            KEY ( )
* 5.6
Finding Bottlenecks
*


        *




*
Gathering Data

What you will need:
1. MySQL Slow log: MySQL >= 5.1 or
   Percona/MariaDB microslow patch
2. Set long_query_time = 0 - for 6 to 24 hours
   for decent sized slow log. (Make sure host
   has enough space)
Log Processing
Worst Response Queries
1. Echo ‘’ > slow.log
2. mysql> set global long_query_time=0; set
   long_query_time = 0; flush logs;
3. Wait X hours and return original value.
4. pt-query-digest slow.log > slow.txt
                                           Processing should be
Bulky Queries                              done on another host
--filter ‘($event->{Rows_examined} > 1000)’
Write Queries
--filter '($event->{Rows_affected} > 0)‘
Log Processing
• MySQL 5.6
 • Statement Digest
 • No need for log processing to get Digest.
Worst Response Queries
                 pt-query-digest slow.log > slow.txt

Rank   Response time    Calls   R/Call               Item

                                       SELECT dp_node dp_usernode
 1     480.9273 16.3%   600     0.8015 dp_buddylist dp_users
                                       dp_node_access

 2     322.4220 4.3% 129258 0.0025 ADMIN INIT DB
 3     314.8719 4.2%    30220   0.0104 UPDATE dp_users
 4     287.7109 3.8%    51606   0.0056 SET
                                       SELECT dp_node dp_usernode
 5     269.3434 3.6%    600     0.4489 dp_buddylist dp_users
                                       dp_node_access
 6     238.8571 3.1% 2902141 0.0001 SELECT dp_url_alias
* Gap Locking   * pt-stalk
mysql tables in use 4, locked 2
5289 lock struct(s), heap size 620984, 273785 row lock(s), undo log entries
363312
MySQL thread id 467, OS thread handle 0x7fceab7df700, query id 88914423
  SELECT r.exchange_rate INTO destination_exchange_rate
      FROM exchange_rate AS r WHERE r.currency_id =
  NAME_CONST('destination_currency_id',6) AND r.date =
  NAME_CONST('day',_latin1'2012-06-30' COLLATE
  'latin1_swedish_ci')
Trx read view will not see trx with id >= 10ECC92B0, sees < 10ECC916F
TABLE LOCK table `currency` trx id 10ECC90CC lock mode IS
RECORD LOCKS space id 0 page no 261 n bits 80 index `fk_currency_status1` of
table `currency` trx id 10ECC90CC lock mode S
TABLE LOCK table `daily_summary` trx id 10ECC90CC lock mode IS
RECORD LOCKS space id 0 page no 34829580 n bits 200 index `PRIMARY` of table
`daily_summary` trx id 10ECC90CC lock mode S
TABLE LOCK table `exchange_rate` trx id 10ECC90CC lock mode IS
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
*** (1) TRANSACTION:
TRANSACTION 13DCDF4D9, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s)
MySQL thread id 2438176, OS thread handle 0x7f9a37408700, query id
118341815748
UPDATE sys_doctrine_lock_tracking
SET timestamp_obtained = '1341839053'
 WHERE object_key = '1146' AND user_ident = '158' AND c_type = '137'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 48627 n bits 280 index `PRIMARY` of table
`sys_doctrine_lock_tracking` trx id 13DCDF4D9 lock_mode X locks rec but not
gap waiting
Record lock, heap no 207 PHYSICAL RECORD: n_fields 6; compact format;
 0: len 8; hex 43616d706169676e;
 1: len 4; hex 31313436; asc 1146;;
 2: len 6; hex 00013dc12d7a; asc = -z;;
 …
Bottlenecks
• Locking Queries
   • Try to make them complete as fast as possible
 • JOINs vs sub query
 • Function wrapped around index
 • UDF with SQL inside
 • Long Transactions
 • Looping with short queries
Misbehaving Optimizer
• Optimizer Hints:
  • USE INDEX
  • FORCE INDEX
  • IGNORE INDEX
  • STRIGHT_JOIN
• Joins
  • LEFT sometimes faster then INNER


* Too many indexes confuse the optimizer
Bottlenecks
• Virtualization
  • Increase (obscenely) innodb_log_file_size
    • Needs restart + deleting old log files
  • EXT3
• General I/O improvements
  • innodb-flush-log-at-trx-commit
  • Sync binlog
                              Group Commit
  • Xa support
Bottlenecks
• General I/O improvements
 • Percona server
   • Better flushing to disk
   • Less mutexes
 • Upgrade MySQL
   • Same reasons as above
 • Innodb-io-capacity
  http://www.wmarow.com/strcalc/
Database Upgrades
• My Secret Sauce for smooth MySQL
 Migrations
 1. Upgrade the dev/staging DBs with desired
    version
 2. Wait 1-2 months till silky-smooth
   •   All features have been tested
   •   All the query issues have been fixed
 3. Upgrade servers
   •   Down to Up – Slaves first
Bottlenecks
• Mutexes
 • Query Cache
   • “Freeing items” in processlist
• Network
                                       Batch Processes
 • Skip-name-resolve
 • net_write_timeout / read_timeout
 • thread_cache_size
                          Lots of connections
Homework
• Haven’t talked fully about:
   SQL and EXPLAIN
• Webinars
  • Indexes - percona.tv/percona-webinars/tools-and-
   techniques-for-index-design
  • Explain - percona.tv/percona-webinars/explain-demystified
• Websites
  • http://www.myxplain.net
Part Two
Partitioning
Columns




Rows
       Big Table
Columns

            Partition

            Partition

            Partition

            Partition
Algorithm
            Partition   Rows

            Partition

            Partition

            Partition

            Partition
Select
                       Algorithm         Inserting
                                                             Parallelize Data
 Unique Key                              Algorithm

                                                                   Reduce Data
Foreign Keys
                           Issues
                                                     Use Cases
 Primary Key
  Overhead
                                    Partitioning
                                                                       ID

                Benefits                             Automatic
                                                                         Time
B-Tree
Levels                                  Manual
                                                                  Hash


         Short Scans                                 Time             80-90%
                              Archive
                                                                      DB Usage
                                           Shards
Partitioning                    Parallelize Data

                                                     Reduce Data


• Use Cases                           Use Cases


 • Reducing Data – Only get the partition/table
   that you need
 • Parallelizing Data - Get an equal amount of
   data from each partition in parallel
• Benefits
                                                  Benefits
 • Shorter table scans
                                 B-Tree
 • Less levels for index scans   Levels


                                          Short Scans
Issues
• Algorithm                                  Select
                                                         Inserting
                                           Algorithm
 • INSERTs                                               Algorithm

 • SELECTs
                                             Issues
• Keys               Unique Key

 • Foreign Key
                            Foreign Keys
 • Unique Key                                     Primary Key
                                                   Overhead
 • Primary Key Overhead
   • Increases Table size
   • Can change indexes
Partition Types
• Range
• List                              ID

• Hash
                     Automatic
                                        Time
• Key
• Columns                        Hash

• Sub-partitioning
Partitioning by Usage
                        vs Partitioning by Maintenance


Rank   Response time      Calls                          Item

                             SELECT address FROM        WHERE
 1     480.9273 26.3% 129258     BETWEEN ‘2012-11-10’ and ‘2012-11-
                             17’
                                  SELECT total FROM         WHERE
 2     322.4220 14.3%     600
                                  BETWEEN ‘2012-11-01’ and ‘2012-11-30’
                                  UPDATE             SET active=1 WHERE   =
 3       34.8719 4.2% 30220
                                  17635376

                                  SELECT dispatch_time FROM
 4       28.7109 3.8% 51606
                                  WHERE = 7387612
CREATE TABLE orders
     id int unsigned not null auto_increment,
     `date` date not null,
     …
                                Can also do
PRIMARY KEY (date, id),
                                PRIMARY KEY (id, date)
KEY id (id),                    KEY date (date)
..) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY        (        (date))
(PARTITION   VALUES LESS THAN (      (‘2012-01-01’),
PARTITION       VALUES LESS THAN (      (‘2012-06-01’)),
PARTITION      VALUES LESS THAN           )
Manual Partitioning
• Archive - Main table & Archive Table
• Time – Create table per year, per month..
• Shards – Table per country


 * Foreign keys             Manual


                                        Time
                  Archive

                               Shards
Intensive Table Optimization
Intensive Table Optimization



  Once upon a time, I was researching
ways to make a database working set fit as
much as possible to memory…
mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
Intensive Table Optimization
1. People are usually very liberal with data type
   sizes
2. There were (usually) so many indexes that:
  1. They multiplied the table size
  2. Were not efficient compares to how the table is
     used
  3. Confused the optimizer
3. Discovered partitions were not used or misused
  1. Discovered sub partitions
  2. Primary Key alignment
4. Discovered InnoDB compression
Intensive Table Optimization




There are tools to help with this, but…
quickmeme.com
/meme/3rmy8y/
Intensive Table Optimization

                                                               Data Types
           Slowest Queries


                      Bottleneck
User Statistics                                 Optimizations
                        Tables
                                   Optional

    Table Sizes
                                                    Query Logs


                                       Partitions                  Indexes


                                                    Foreign Keys
Gathering Data

What you will need:
1. MySQL Slow log: MySQL >= 5.1 or
   Percona/MariaDB microslow patch
2. Set long_query_time = 0 - for 6 to 24 hours
   for decent sized slow log. (Make sure host
   has enough space)
                  Slowest Queries



                                      Bottleneck Tables
Gathering Data

Helpful (Optional):
1. Percona/Mariadb user_statistics patch
2. Get list of most read/written tables
3. Get list of used and un-used indexes
4. List of largest tables
                                     Table Sizes



                   User Statistics

                                      Bottleneck Tables
Worst Response Queries
             pt-query-digest slow.log > slow.txt

Rank   Response time      Calls               Item

 1     8589.9513 27.5% 231051 UPDATE dp_users
 2     4752.6688 15.2%    257235   SELECT dp_cache_menu
 3      1606.4946 5.1%    183542   SELECT community_chats
 4      1418.9034 4.5%    259939   SELECT dp_cache
 5       564.3305 1.8%   7970165   SELECT dp_url_alias
 6       495.0092 1.6%    44940    SELECT dp_event dp_node
Table Statistics

SELECT table_name,
FROM information_schema.table_statistics
ORDER BY            DESC LIMIT 5;
                        table_name        rows_read
                        dp_users          2302477894
ROWS_CHANGED            dp_node           1231318439
ROWS_CHANGED_X_INDEXES dp_comments        1071462211
                        dp_userpoints     1033073070
                        dp_search_index   260154684
Table Statistics

          Worst Response Tables
            --group-by tables

Rank Response time    Calls         Item
 1 7975.4487 6.5% 124384 advertisement
 2 5554.1435 4.5%     1834 info
 3   4915.4816 4.0%   208     placement
 4 4902.7644 4.0%     158     advert_summary
Table Sizes

             Table_Name                                Rows            Data            Idx Total_size Idxfrac

total_daily_summary                                   610M              77G           88G           165G            1.15

advert_summary                                        478M              57G           45G           102G            0.78

log_messages                                            92M             47G           10G             57G           0.21
SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_NAME,
   CONCAT(ROUND(TABLE_ROWS / 1000000, 2), 'M')                    ROWS,
   CONCAT(ROUND(DATA_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G')        DATA,
   CONCAT(ROUND(INDEX_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G')       IDX,
   CONCAT(ROUND(( DATA_LENGTH + INDEX_LENGTH ) / ( 1024 * 1024 * 1024 ), 2), 'G') TOTAL_SIZE,
   ROUND(INDEX_LENGTH / DATA_LENGTH, 2)                         IDXFRAC
FROM INFORMATION_SCHEMA.TABLES
ORDER BY DATA_LENGTH + INDEX_LENGTH DESC                                          http://www.mysqlperformanceblog.com
LIMIT 10;
                                                                            /2008/03/17/researching-your-mysql-table-sizes/
Which table needs
             your attention?




Table Size
Intensive Table Optimization
• Table Targeting
 • The most “worthy” table to focus your
   attention on
   • Biggest bang for your buck
• If you know which table is the most
 troublesome
 • Ignore most of the investigations
 • Apart from slow log
 • Investigations help understand DB usage
Optimizations


      Compression                         Data Types


                        Query Logs




          Partitions                   Indexes




Sub Partitioning                                 Don’t Need
                        Foreign Keys
Intensive Table Optimization
• Datatypes
 • SELECT * FROM table
                     G
 • Example: Tinyint instead of Bigint:
 (7 bytes row + 7bytes index) * 350million
 rows = 4.9Gb
 • Enum instead of Varchar
 • Remove NULLs when not needed
Intensive Table Optimization
• Compression
 • Best for tables with a lot of varchar/text
 • Compress table by x2, x4, x8..
   • Need to experiment with innodb_strict = on;
 • On my tests (5.5) – Very very slow
   • Alter tables
   • INSERTS/UPDATES/DELETES
                                        Optimizations


                        Compression
1.
                      Slow Log
                                  Filtered by
     Get Data                    Target Table
                  Query Digest


                                   New Results
                                                       EXPLAIN      3.

                 2.                                           Index-Usage
                     Make                           Test
                  Assumptions                   Assumptions




            4.        Deploy
Target Table Processing
Filter Log:
pt-query-digest slow.log
--filter '$event->{arg} =~ m/dp_users /'
--no-report --print >dp_users.log

Worst Queries from new log:
pt-query-digest dp_users.log --limit 100%
>tbl_dp_users.txt
Response
Rank   time     Calls                        Item
     209.2863         UPDATE dp_users SET access = 133******3 WHERE
 1             88850
      10.7%           = 23****01G
     162.2711
 3            1309010 SELECT access FROM dp_users WHERE       = 21***4G
       8.3%
     139.9009            SELECT uid, name FROM dp_users WHERE           =1
 4               197
       7.1%              ORDER BY          DESCG
     133.8691            SELECT * FROM dp_users u WHERE          =
 5               327
       6.8%              's******s'G
                      SELECT name, created, picture FROM dp_users WHERE
     109.6903
 6              29152 picture !='' AND      = '1' AND        BETWEEN
       5.6%
                      '133*****0' AND '133*****60'G
      92.9095            SELECT dp_node dp_users using (     )
 7              360642
       4.7%              dp_node_revisions
      74.2426            SELECT * FROM dp_users u WHERE          = hoa****rio'
 8               106
       3.8%              AND      = '3837********5f9b' AND        = 1G
Partitioning by Usage

Rank   Response time    Calls                     Item

                                SELECT address FROM orders WHERE date
 1     480.9273 26.3% 129258
                                BETWEEN ‘2012-11-10’ and ‘2012-11-17’

                                SELECT total FROM orders WHERE date
 2     322.4220 14.3%   600
                                BETWEEN ‘2012-11-01’ and ‘2012-11-30’
                                UPDATE order SET active=1 WHERE id =
 3       34.8719 4.2% 30220
                                17635376

                                SELECT dispatch_time FROM order WHERE
 4       28.7109 3.8% 51606
                                id = 7387612
Testing Assumptions

SELECT uid, name FROM             WHERE
       = 1 ORDER BY           DESCG
1.30secs

SELECT uid, name FROM
WHERE         = 1 ORDER BY
DESCG
0.56secs       Query Digest
                                  EXPLAIN
Test Environment
• Hardware environment similar to live
• Data size similar to live environment:
  • Replicating slave
   •   Cannot change datatypes on MIXED/ROW
       replication
   •   Create table2 and run queries against it
  • Xtrabackup – full replica                         New Results
  • Script with Mysqldump + WHERE
   •   mysqldump --databases main --tables table1 table2 –where “date >
       now() – interval 30 day” > dump.sql
   •   Mysqldump –all-database –ignore-table main.table1 main.table2 >>
       dump.sql
Final Tweaking

(Remember the table log file – dp_users.log ?)
pt-index-usage
• pt-index-usage dp_users.log --host
  127.0.0.1 --tables dp_users
  >idx_dp_users.txt
• Go over recommendations
                                      Index-Usage
                            Test
                        Assumptions
Deploy Strategies
1. Rolling Servers
2. pt-online-schema-change                                Deploy
3. Two-part move
 a. Create new table – table2
 b. Insert table rows that will not change – INSERT INTO
       table2 SELECT * FROM table1 WHERE date <= curdate() – interval 30 day;
 c.    Short downtime
 d. Rename table1 to table3; rename table2 to table1;
 e. INSERT IGNORE INTO table1 SELECT * FROM table3 WHERE date >=
      curdate() – interval 30 day;

4. Alter table – long downtime (pre 5.6, maybe)
Continuous Self-Learning
Slow Log
                           Filtered by
Get Data                  Target Table
           Query Digest


                                                EXPLAIN
                            New Results


                                                          Index-Usage
                                              Test
              Make
                                          Assumptions
           Assumptions




            Deploy
Part Three
Streaming
           Reverse         File
            Proxy                         Volatile


Browser                                                                    Shield
 Cache                                         Query Cache
                  Page Cache




                                                                  Denormalize
   2nd Level
                      Summary                    Column
                       Tables                     Cache
   3rd Level                                                         Subtotal


                                  Attributes
 Data Warehouse                                              Conditional
Read Cache
• Outside the database
  • Page Cache
  • Query Cache


• Inside the database
  • Column Cache
  • Summary Table

* Complexity
Page Cache
• Browser Cache
  • Etag, Expires, Last-modified
• Reverse Proxy
  • Squid, Varnish, Nginx, Apache, Proprietary.
• File/Full page cache
  • mod_file_cache, Zend_Cache_Backend
  • W3 Total Cache, sfSuperCache
* Stale
memegenerator.net/instance/23247230
Query Cache

• Volatile
  • Memcached, Redis, Hibernate Cache, Arrays..
  • On-Request, Time-to-Live, Stale and Cache
   Stampede
• Streaming
  • Interval / Async, Stale, Common Queries
• Shield – Mongo Shield
  • Script/Tool Replication, Dependency
  • Aggregation
  • Complexity / Layers
Mongo
             Shield




                                                147cm




img.photobucket.com/albums/v158/keris_hanuman/Afbeelding1455.jpg
Sticky          Sticky




Sessions                    Sessions




Memcached                      Memcached
                     Cart

      MySQL
Memcached
            Sessions   MySQL
             Cart
Manipulating Time




* Error Handling
Column Cache
• Denormalize
   • Additional Column(s) to prevent JOINs
   • Maintenance, Space on disk
   • Example: CustomerID, OrderID, OrderItemID
• Sub Total
   • Prevent additional slow GROUP BY queries
   • Maintenance, Generation, Space on disk
   • Example: totalPurchases, moneyOwed
* Space vs Speed
Column Cache

• Conditional
  •   Store conditional (True/False) logic
  •   Prevents recalculating result – another query
  •   Can prevent rewriting code
  •   Example: isDone, hasReview, aboveAvg
• Attributes
  •   ENUM datatype
  •   SET datatype - ARRAY of options
  •   Prevents JOINs
  •   May save space
Summary Tables

An additional table which consists
of an aggregation of another table or
several JOIN’d tables.



                           Summary
                            Tables
SELECT ...
FROM main_table t1
 INNER JOIN table2 t2 on t1.orderid = t2.id
 INNER JOIN table3 t3 on t1.customerid = t3.id
 INNER JOIN table4 t4 on t1.addressid = t4.id
 INNER JOIN table5 t5 on t2.supplierid = t5.id
 INNER JOIN table6 t6 on t2.warehouse = t6.id
 INNER JOIN table2 t7 on t6.addressid = t7.id
 INNER JOIN table8 t8 on t1.productid = t8.id
 INNER JOIN table9 t9 on t1.buyerid = t9.id
 INNER JOIN table10 t10 on t1.officeid = t10.id
WHERE
t1.date between '2012-11-01' and '2012-11-30'
GROUP BY t1.date
Summary Tables

Processed 1.2million rows
Returned 30 rows
Time 17.52 minutes
Summary Tables
CREATE TABLE summary_table
(primary key (date,addressid,productid)) as
SELECT ...
FROM main_table t1
INNER JOIN table2 t2 on t1.orderid = t2.id
INNER JOIN table3 t3 on t1.customerid =
t3.id
INNER JOIN table4 t4 on t1.addressid = t4.id
INNER JOIN table6 t6 on t2.warehouse = t6.id
GROUP BY t1.date, t1.addressid, t1.productid
Summary Tables

SELECT ...
FROM summary_table t1
INNER JOIN table5 t5 on t2.supplierid = t5.id
INNER JOIN table2 t7 on t6.addressid = t7.id
INNER JOIN table8 t8 on t1.productid = t8.id
INNER JOIN table9 t9 on t1.buyerid = t9.id
INNER JOIN table10 t10 on t1.officeid = t10.id
WHERE
t1.date between '2012-11-01' and '2012-11-30‘
GROUP BY t1.date
Summary Tables

Processed 35000 rows
Returned 30 rows
Time 0.75 seconds
Summary Tables
        as an
Analytics Sub-System
Database Design
                       Operational System          Analytic System
   Comparison


                      Execution of a business   Measurement of a
Purpose
                      process                   business process

                      Insert, Update, Query,
Primary Interaction                             Query
                      Delete

Design Optimization Update concurrency          High-performance query

                      Entity-relationship (ER) Dimensional design (Star
Design Principle
                      3rd Normal form (3NF) schema or cube)


     amazon.co.uk/Schema-Complete-Reference-Christopher-Adamson/
Data Warehouses

   Fact Tables           Dimension Tables

• Measurement        •    Context
• Narrow             •    Wide
• Long               •    Short
• Most of the
                     •    Filters and
                          descriptive
  data                    data
Operational Design
Operational Design
Customers


                          Orders




                                       Order
                                       Items



            Addresses
                            Products
Star Schema
Star Schema
Date dim




                               Address dim




                  OrderItems
                     Fact




 Products dim
                               Customers dim
Maintenance
• Hourly/Daily/Weekly/Monthly Aggregations
• Intervals
• Off Peak
• On-Insert
A lot more
                                InnoDB              settings
            Read Cache         Buffer Pool


                                                   Indexes



 Sharding                                                     Intensive
                           Scaling Reads                        Table
                                                             Optimization

Galera
               Read                           Partitioning
              Slaves
                               Better
                              Hardware                  Sub Partitioning

 Another      Read/Write
 Master        Splitting         IO          Memory
Scaling Reads
• InnoDB Buffer Pool
  • Cache Warming
• Read buffer                        A lot more
                                      settings
                        InnoDB
• Sort Buffer          Buffer Pool

• Join Buffer
• Temp Table size / on disk
Scaling Reads
  • Better Hardware
    • Disk I/O
    • Memory




 Better
Hardware



  IO       Memory
Scaling Reads
• Read Slaves
  • Read/Write Splitting
  • Master/Master
  • Galera
Server Architecture


                      MySQL
Server Architecture
db3   db1

  db2
Columnar                £££                                          A lot more
      Store                                                               settings
                                                      InnoDB
                                                     Buffer Pool
                         Summary
Hadoop                    Tables                                   Indexes



                                                                          Intensive
  OLAP                                   Reporting                          Table
  Cubes                                                                  Optimization


              Sharding                                    Partitioning

                             Reporting
Cross Shard                                       Better
                              Slaves
   Joins                                         Hardware              Sub Partitioning

                Different
                Indexes                              IO            Memory
Reporting
• Reporting Slaves
  • Different Indexes
  • No Foreign Keys
  • Partitioning
  • If ROW-replication:
     • Must have same data types
                                               Reporting
                                                Slaves


                                   Different
                                   Indexes
Reporting
• Sharding
  • Cross-shard JOINs
  • Go Fish
  • Aggregations               Hadoop
Reporting
  • Summary tables - aggregations
     • Scripts
     • Hadoop
       • Ready –Made reports
       • OLAP Cubes
  • Columnar Store
     • £££


* RDBMS very fast at GROUP BY
Battery                                           A lot more
                                                            settings


                      RAID                   Innodb Log
                    Write-back                 File Size



     Local Server                Write Buffers
       Storage

                                                               MySQL
              ETL                                              settings
                                        Queue

Hadoop


                                                 Summarized
     CRUSH             HandCode
                                                   Writes
Write Buffers
• RAID card
 • BBU
 • Write-back vs write-through
 • Battery Learning/Drain
Write Buffers
• Innodb log file size
  • Buffer pool * dirty read (%) * io capacity




                                             A lot more
                                              settings


                               Innodb Log
* Virtual Environment            File Size
Write Buffers
• innodb-flush-log-at-trx-commit
• Sync binlog
 • Support xa


                                   MySQL
                                   settings
Write Buffers
• ActiveMQ, RabbitMQ, ZeroMQ, Gearman
 • Not ACID, may need redundancy
• Summarized Writes
 • Memcached counters + interval writes
Write Buffers
• Local Storage (web/app servers)
 • Memcached
 • SQLite (disk/in-memory)
 • Log file                      Local Server
              <- Most popular      Storage
 • MySQL
• Independent, isolated                   ETL
• Need to fetch data
 • Prevent missed data and duplicates
Write Buffers
• Fetching Data
 • Hand code
 • ETL tool – Pentaho/Talend
 • Flume
• Aggregation/Processing
 • Hadoop   <- Very Popular
 • Google CRUSH
Google CRUSH Tools
gunzip oldlog.log.gz |
convdate -f 3 -i "%d/%b/%Y:%H:%M:%S
%z" -o "%Y-%m-%d" |
reorder -k 3,2 |
aggregate -p -k 3,2 -c 1 |
csvformat |
gzip - > newlog.csv.gz
Innodb Log
                                                    File Size


                             Write Buffers
      Sharding
                                                  Indexes



                                                             Intensive
                            Scaling Writes
                                                               Table
   Hardware
                                                            Optimization


                        Remove
                                             Partitioning
                       Bottlenecks
OS Settings

                                                       Sub Partitioning
                               MySQL
              Bypass SQL
                               Settings
                Layer
Scaling Writes
• Less Indexes
  • Less writes
• Partitioning
  • Less table maintenance
    • Less B-tree levels
    • Less need to organize blocks
  • Algorithm overhead
  • Mutex/Locks
Scaling Writes
• Bypass SQL Layer
                                SQL Parser
 • Innodb/Memcached
 • HandlerSocket
                                Optimizer




                            Storage Engine




                       * Raik
Scaling Writes
• IO Scheduler
• File System (+ nobarrier, noatime, nodiratime)
  • EXT 3
  • EXT 4
  • XFS
  • ZFS
• Block Sizes
Scaling Writes
• Faster I/O
 • Faster Disks
   • SAS
   • SSD
 • RAID + cache                  Hardware

 • PCIe SSD
   • FusionIO
   • Virident
Scaling Writes
• Master/Master
 • Does not scale writes
 • Writes still need to replicate
• Sharding
 • Does scale writes
App                                   Global Data          Child Data
  Function                  DB


                                                                       By Schema
Proxy                                            Partitioned
                   Lookup                           Data

                                                                    Shared
                                                                    Nothing
           Functional                Sharding
          Partitioning

                                                                            ID

Go Fish                  Reporting                  Key

                                                                       Area
                                                        Hash
Cross Shard              Hadoop
Sharding
• Partitioned Data
• Splitting the Data
  • Vertically
    • Main Tables to partition
  • Child Tables
  • Global Tables
• By Schema
• Shared Nothing
Sharding
• Partitioning by Key
 • ID – CustomerID, ProductID, App
 • Area – Country, City, Continent
 • Hash – Random for equal spread
Sharding
• Which shard has the data?
 • Store it in a DB
   • Flexible but slower
 • Some Function in your App
   • Faster, less flexible
 • Proxy config file
   • Faster, less flexible
   • Needs some app coding
Sharding
• Maintenance
 • Backups
 • Slaves
• Uptime
 • Loosely coupled system
Sharding
• Functional Partitioning
 • Different Apps
 • Share some tables


                             Functional
                            Partitioning
Sharding
• Reporting
 • Go Fish – One server / Shared nothing
 • Cross Shard – Many servers
 • Hadoop – Aggregate to one reporting server
The End
The End
• Questions & Answers
• Email: contact@jonathanlevin.co.uk
• Don’t forget to rate this tutorial
If we have time
• MySQL 5.6
• NoSQL
• ORM
• Beyond Hadoop
• Bring-Your-Own-Problems

More Related Content

What's hot

Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sSveta Smirnova
 
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaCassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaDuyhai Doan
 
Optimizing queries MySQL
Optimizing queries MySQLOptimizing queries MySQL
Optimizing queries MySQLGeorgi Sotirov
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Explaining the MySQL Explain
Explaining the MySQL ExplainExplaining the MySQL Explain
Explaining the MySQL ExplainMYXPLAIN
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL PerformanceSveta Smirnova
 
0888 learning-mysql
0888 learning-mysql0888 learning-mysql
0888 learning-mysqlsabir18
 
New features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionNew features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionSveta Smirnova
 
Mini Session - Using GDB for Profiling
Mini Session - Using GDB for ProfilingMini Session - Using GDB for Profiling
Mini Session - Using GDB for ProfilingEnkitec
 
Mysql Explain Explained
Mysql Explain ExplainedMysql Explain Explained
Mysql Explain ExplainedJeremy Coates
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query TuningSveta Smirnova
 
Moving to the NoSQL side: MySQL JSON functions
 Moving to the NoSQL side: MySQL JSON functions Moving to the NoSQL side: MySQL JSON functions
Moving to the NoSQL side: MySQL JSON functionsSveta Smirnova
 
Basic MySQL Troubleshooting for Oracle Database Administrators
Basic MySQL Troubleshooting for Oracle Database AdministratorsBasic MySQL Troubleshooting for Oracle Database Administrators
Basic MySQL Troubleshooting for Oracle Database AdministratorsSveta Smirnova
 
Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?Sveta Smirnova
 
Troubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-onsTroubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-onsSveta Smirnova
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL PerformanceSveta Smirnova
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 
Oracle JSON internals advanced edition
Oracle JSON internals advanced editionOracle JSON internals advanced edition
Oracle JSON internals advanced editionAlexander Tokarev
 

What's hot (20)

Introduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]sIntroduction to MySQL Query Tuning for Dev[Op]s
Introduction to MySQL Query Tuning for Dev[Op]s
 
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaCassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
 
Optimizing queries MySQL
Optimizing queries MySQLOptimizing queries MySQL
Optimizing queries MySQL
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Percona toolkit
Percona toolkitPercona toolkit
Percona toolkit
 
Explaining the MySQL Explain
Explaining the MySQL ExplainExplaining the MySQL Explain
Explaining the MySQL Explain
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL Performance
 
0888 learning-mysql
0888 learning-mysql0888 learning-mysql
0888 learning-mysql
 
New features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in actionNew features in Performance Schema 5.7 in action
New features in Performance Schema 5.7 in action
 
Mini Session - Using GDB for Profiling
Mini Session - Using GDB for ProfilingMini Session - Using GDB for Profiling
Mini Session - Using GDB for Profiling
 
Mysql Explain Explained
Mysql Explain ExplainedMysql Explain Explained
Mysql Explain Explained
 
Introduction into MySQL Query Tuning
Introduction into MySQL Query TuningIntroduction into MySQL Query Tuning
Introduction into MySQL Query Tuning
 
Moving to the NoSQL side: MySQL JSON functions
 Moving to the NoSQL side: MySQL JSON functions Moving to the NoSQL side: MySQL JSON functions
Moving to the NoSQL side: MySQL JSON functions
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
Basic MySQL Troubleshooting for Oracle Database Administrators
Basic MySQL Troubleshooting for Oracle Database AdministratorsBasic MySQL Troubleshooting for Oracle Database Administrators
Basic MySQL Troubleshooting for Oracle Database Administrators
 
Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?Optimizer Histograms: When they Help and When Do Not?
Optimizer Histograms: When they Help and When Do Not?
 
Troubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-onsTroubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance add-ons
 
Troubleshooting MySQL Performance
Troubleshooting MySQL PerformanceTroubleshooting MySQL Performance
Troubleshooting MySQL Performance
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Oracle JSON internals advanced edition
Oracle JSON internals advanced editionOracle JSON internals advanced edition
Oracle JSON internals advanced edition
 

Viewers also liked

Cleanliness is next to Godliness
Cleanliness is next to GodlinessCleanliness is next to Godliness
Cleanliness is next to GodlinessJonathan Levin
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLJonathan Levin
 
Reporting: From MySQL to Hadoop/Hive
Reporting: From MySQL to Hadoop/HiveReporting: From MySQL to Hadoop/Hive
Reporting: From MySQL to Hadoop/HiveManuel Aldana
 
Quick And Easy Guide To Speeding Up MySQL for web developers
Quick And Easy Guide To Speeding Up MySQL for web developersQuick And Easy Guide To Speeding Up MySQL for web developers
Quick And Easy Guide To Speeding Up MySQL for web developersJonathan Levin
 
Caching Business Logic in the Database
Caching Business Logic in the DatabaseCaching Business Logic in the Database
Caching Business Logic in the DatabaseJonathan Levin
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteriaAsis Mohanty
 

Viewers also liked (7)

Cleanliness is next to Godliness
Cleanliness is next to GodlinessCleanliness is next to Godliness
Cleanliness is next to Godliness
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETL
 
Reporting: From MySQL to Hadoop/Hive
Reporting: From MySQL to Hadoop/HiveReporting: From MySQL to Hadoop/Hive
Reporting: From MySQL to Hadoop/Hive
 
Quick And Easy Guide To Speeding Up MySQL for web developers
Quick And Easy Guide To Speeding Up MySQL for web developersQuick And Easy Guide To Speeding Up MySQL for web developers
Quick And Easy Guide To Speeding Up MySQL for web developers
 
Caching Business Logic in the Database
Caching Business Logic in the DatabaseCaching Business Logic in the Database
Caching Business Logic in the Database
 
How to Monitor MySQL
How to Monitor MySQLHow to Monitor MySQL
How to Monitor MySQL
 
ETL tool evaluation criteria
ETL tool evaluation criteriaETL tool evaluation criteria
ETL tool evaluation criteria
 

Similar to MySQL Performance Tuning and Optimization

10x Performance Improvements
10x Performance Improvements10x Performance Improvements
10x Performance ImprovementsRonald Bradford
 
10x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp0210x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp02promethius
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksMYXPLAIN
 
My SQL Skills Killed the Server
My SQL Skills Killed the ServerMy SQL Skills Killed the Server
My SQL Skills Killed the ServerdevObjective
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAiougVizagChapter
 
Real World Performance - Data Warehouses
Real World Performance - Data WarehousesReal World Performance - Data Warehouses
Real World Performance - Data WarehousesConnor McDonald
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best PracticesDavid Keener
 
30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practicesDavid Dhavan
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2Alex Zaballa
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2Alex Zaballa
 
2008 Collaborate IOUG Presentation
2008 Collaborate IOUG Presentation2008 Collaborate IOUG Presentation
2008 Collaborate IOUG PresentationBiju Thomas
 
My sql查询优化实践
My sql查询优化实践My sql查询优化实践
My sql查询优化实践ghostsun
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the ServerColdFusionConference
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingSveta Smirnova
 

Similar to MySQL Performance Tuning and Optimization (20)

10x Performance Improvements
10x Performance Improvements10x Performance Improvements
10x Performance Improvements
 
10x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp0210x improvement-mysql-100419105218-phpapp02
10x improvement-mysql-100419105218-phpapp02
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Query Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New TricksQuery Optimization with MySQL 5.6: Old and New Tricks
Query Optimization with MySQL 5.6: Old and New Tricks
 
My SQL Skills Killed the Server
My SQL Skills Killed the ServerMy SQL Skills Killed the Server
My SQL Skills Killed the Server
 
Sql killedserver
Sql killedserverSql killedserver
Sql killedserver
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Aioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_featuresAioug vizag oracle12c_new_features
Aioug vizag oracle12c_new_features
 
Real World Performance - Data Warehouses
Real World Performance - Data WarehousesReal World Performance - Data Warehouses
Real World Performance - Data Warehouses
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best Practices
 
30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices30334823 my sql-cluster-performance-tuning-best-practices
30334823 my sql-cluster-performance-tuning-best-practices
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2
 
DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2DBA Commands and Concepts That Every Developer Should Know - Part 2
DBA Commands and Concepts That Every Developer Should Know - Part 2
 
2008 Collaborate IOUG Presentation
2008 Collaborate IOUG Presentation2008 Collaborate IOUG Presentation
2008 Collaborate IOUG Presentation
 
My sql查询优化实践
My sql查询优化实践My sql查询优化实践
My sql查询优化实践
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the Server
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 

Recently uploaded

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfROWELL MARQUINA
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Why Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionWhy Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionDEEPRAJ PATHAK
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Recently uploaded (20)

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdf
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Why Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile EvolutionWhy Agile? - A handbook behind Agile Evolution
Why Agile? - A handbook behind Agile Evolution
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

MySQL Performance Tuning and Optimization

  • 1.
  • 2. Who Am I? • Jonathan • MySQL Consultant • Working with MySQL since 2007 • Specialize in SQL, Indexing and Reporting (Big Data)
  • 3. Who is this for? * Smilies indicate ability to control area
  • 4.
  • 5. What Will I Cover? Domain Knowledge This Much 0% 20% 100%
  • 6. Solutions this tutorial will cover Occurrences Problems
  • 9. What Will I Cover? • The top 20% of the strategies to resolve 80% of your performance problems • The strategies that are within reach of developers • Strategies that are more common and more established • From my experience. • To reduce risk
  • 10. Table of Contents Part One Part Three • Indexes • Read Cache • Finding • Scaling Reads Bottlenecks • Reporting Part Two • Write Buffers • Partitioning • Scaling Writes • Intensive Table • Sharding Optimization
  • 13. Indexes • Advantages • Speed – Use the right path • Now used in NoSQL stores • “A properly indexed database will give you very few problems” – me • My blog – “Indexing and Caching”
  • 16.
  • 17.
  • 18. Choosing the best Index Prevent Reading from Disk *3 *2 Prevent Extra *1 Processing Prevent Table Scans
  • 19. Indexes • 1 Star – EXPLAIN • Type • const - where id=1 • ref - where location='london' • eq_ref - where t1.id = t2.id • Extra • using where • Limitation – type • range - where id in (1,2,3,4,5)
  • 20. Indexes • 2 Star – EXPLAIN • Extra • using where • Using index • And Not • Using filesort • Using temporary • Limitation • Using temporary - query contains different GROUP BY and ORDER BY columns
  • 21. Indexes • 3 Star – EXPLAIN • Type • index • Extra • Using index • Using index for group-by
  • 23. Regular Usage SELECT , FROM ₸ WHERE = 121; (PRIMARY KEY )
  • 24. Range Scan SELECT ▲, ■ FROM ₸ WHERE = 121 AND BETWEEN 1 AND 100 KEY ( , )
  • 25. Range Scan SELECT ▲, ■ FROM ₸ WHERE = 121 AND IN (1,100,30,7) KEY ( , )
  • 26. Covering Index SELECT , FROM ₸ WHERE = 121 KEY ( , , )
  • 27. Not Optimal SELECT FROM ₸ WHERE = 121 AND ♪ IN (1,100,30,7) KEY ( , )
  • 28. Broken Range SELECT ▲, ■ FROM ₸ WHERE = 121 AND IN (1,100,30,7); KEY ( , )
  • 29. Sub Queries SELECT ▲, ■ FROM ₸ WHERE IN (SELECT ♪ FROM ♠) KEY ( )
  • 30. Indexes for Sorting SELECT ☼, ☺ FROM ₸ WHERE = 121 GROUP BY ORDER BY KEY ( )
  • 31. Indexes for Sorting SELECT ☼, ☺ FROM ₸ WHERE = 121 GROUP BY ORDER BY KEY ( , ) or KEY ( , )
  • 32. Indexes for Joins SELECT . , . FROM INNER JOIN ON . = . WHERE . = 232 KEY ( ) or KEY( )
  • 33. WHERE ₸.Ω = 232; INNER JOIN ♫
  • 34. FULL SCAN ♫ INNER JOIN ₸ FILTER Ω = 232
  • 35. SELECT ▲, (SELECT .. FROM WHERE ) FROM ₸ WHERE ♪ IN (SELECT ♪ FROM ♠ WHERE..); “I need help optimizing the my.cnf”
  • 36. Clustered PK and Secondary Indexes KEY ( , , ) PRIMARY KEY ( ) KEY ( , ) (Can be used in GROUP / ORDER BY SELECT variables to make Covering index) http://www.dbasquare.com/2012/05/17/can-mysql-use-primary-key-values-from-a-secondary-index/
  • 37. Index Merge SELECT ☺ FROM ₸ WHERE =1 OR =2; KEY ( ) KEY ( )
  • 38. Index Merge SELECT☺FROM ₸ WHERE =1 UNION (SELECT☺FROM ₸ WHERE =2) KEY ( ) KEY ( ) * 5.6
  • 39.
  • 41. * * *
  • 42.
  • 43. Gathering Data What you will need: 1. MySQL Slow log: MySQL >= 5.1 or Percona/MariaDB microslow patch 2. Set long_query_time = 0 - for 6 to 24 hours for decent sized slow log. (Make sure host has enough space)
  • 44. Log Processing Worst Response Queries 1. Echo ‘’ > slow.log 2. mysql> set global long_query_time=0; set long_query_time = 0; flush logs; 3. Wait X hours and return original value. 4. pt-query-digest slow.log > slow.txt Processing should be Bulky Queries done on another host --filter ‘($event->{Rows_examined} > 1000)’ Write Queries --filter '($event->{Rows_affected} > 0)‘
  • 45. Log Processing • MySQL 5.6 • Statement Digest • No need for log processing to get Digest.
  • 46. Worst Response Queries pt-query-digest slow.log > slow.txt Rank Response time Calls R/Call Item SELECT dp_node dp_usernode 1 480.9273 16.3% 600 0.8015 dp_buddylist dp_users dp_node_access 2 322.4220 4.3% 129258 0.0025 ADMIN INIT DB 3 314.8719 4.2% 30220 0.0104 UPDATE dp_users 4 287.7109 3.8% 51606 0.0056 SET SELECT dp_node dp_usernode 5 269.3434 3.6% 600 0.4489 dp_buddylist dp_users dp_node_access 6 238.8571 3.1% 2902141 0.0001 SELECT dp_url_alias
  • 47. * Gap Locking * pt-stalk
  • 48. mysql tables in use 4, locked 2 5289 lock struct(s), heap size 620984, 273785 row lock(s), undo log entries 363312 MySQL thread id 467, OS thread handle 0x7fceab7df700, query id 88914423 SELECT r.exchange_rate INTO destination_exchange_rate FROM exchange_rate AS r WHERE r.currency_id = NAME_CONST('destination_currency_id',6) AND r.date = NAME_CONST('day',_latin1'2012-06-30' COLLATE 'latin1_swedish_ci') Trx read view will not see trx with id >= 10ECC92B0, sees < 10ECC916F TABLE LOCK table `currency` trx id 10ECC90CC lock mode IS RECORD LOCKS space id 0 page no 261 n bits 80 index `fk_currency_status1` of table `currency` trx id 10ECC90CC lock mode S TABLE LOCK table `daily_summary` trx id 10ECC90CC lock mode IS RECORD LOCKS space id 0 page no 34829580 n bits 200 index `PRIMARY` of table `daily_summary` trx id 10ECC90CC lock mode S TABLE LOCK table `exchange_rate` trx id 10ECC90CC lock mode IS TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
  • 49. *** (1) TRANSACTION: TRANSACTION 13DCDF4D9, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s) MySQL thread id 2438176, OS thread handle 0x7f9a37408700, query id 118341815748 UPDATE sys_doctrine_lock_tracking SET timestamp_obtained = '1341839053' WHERE object_key = '1146' AND user_ident = '158' AND c_type = '137' *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 48627 n bits 280 index `PRIMARY` of table `sys_doctrine_lock_tracking` trx id 13DCDF4D9 lock_mode X locks rec but not gap waiting Record lock, heap no 207 PHYSICAL RECORD: n_fields 6; compact format; 0: len 8; hex 43616d706169676e; 1: len 4; hex 31313436; asc 1146;; 2: len 6; hex 00013dc12d7a; asc = -z;; …
  • 50. Bottlenecks • Locking Queries • Try to make them complete as fast as possible • JOINs vs sub query • Function wrapped around index • UDF with SQL inside • Long Transactions • Looping with short queries
  • 51. Misbehaving Optimizer • Optimizer Hints: • USE INDEX • FORCE INDEX • IGNORE INDEX • STRIGHT_JOIN • Joins • LEFT sometimes faster then INNER * Too many indexes confuse the optimizer
  • 52.
  • 53. Bottlenecks • Virtualization • Increase (obscenely) innodb_log_file_size • Needs restart + deleting old log files • EXT3 • General I/O improvements • innodb-flush-log-at-trx-commit • Sync binlog Group Commit • Xa support
  • 54. Bottlenecks • General I/O improvements • Percona server • Better flushing to disk • Less mutexes • Upgrade MySQL • Same reasons as above • Innodb-io-capacity http://www.wmarow.com/strcalc/
  • 55. Database Upgrades • My Secret Sauce for smooth MySQL Migrations 1. Upgrade the dev/staging DBs with desired version 2. Wait 1-2 months till silky-smooth • All features have been tested • All the query issues have been fixed 3. Upgrade servers • Down to Up – Slaves first
  • 56. Bottlenecks • Mutexes • Query Cache • “Freeing items” in processlist • Network Batch Processes • Skip-name-resolve • net_write_timeout / read_timeout • thread_cache_size Lots of connections
  • 57. Homework • Haven’t talked fully about: SQL and EXPLAIN • Webinars • Indexes - percona.tv/percona-webinars/tools-and- techniques-for-index-design • Explain - percona.tv/percona-webinars/explain-demystified • Websites • http://www.myxplain.net
  • 60. Columns Rows Big Table
  • 61. Columns Partition Partition Partition Partition Algorithm Partition Rows Partition Partition Partition Partition
  • 62. Select Algorithm Inserting Parallelize Data Unique Key Algorithm Reduce Data Foreign Keys Issues Use Cases Primary Key Overhead Partitioning ID Benefits Automatic Time B-Tree Levels Manual Hash Short Scans Time 80-90% Archive DB Usage Shards
  • 63. Partitioning Parallelize Data Reduce Data • Use Cases Use Cases • Reducing Data – Only get the partition/table that you need • Parallelizing Data - Get an equal amount of data from each partition in parallel • Benefits Benefits • Shorter table scans B-Tree • Less levels for index scans Levels Short Scans
  • 64. Issues • Algorithm Select Inserting Algorithm • INSERTs Algorithm • SELECTs Issues • Keys Unique Key • Foreign Key Foreign Keys • Unique Key Primary Key Overhead • Primary Key Overhead • Increases Table size • Can change indexes
  • 65. Partition Types • Range • List ID • Hash Automatic Time • Key • Columns Hash • Sub-partitioning
  • 66. Partitioning by Usage vs Partitioning by Maintenance Rank Response time Calls Item SELECT address FROM WHERE 1 480.9273 26.3% 129258 BETWEEN ‘2012-11-10’ and ‘2012-11- 17’ SELECT total FROM WHERE 2 322.4220 14.3% 600 BETWEEN ‘2012-11-01’ and ‘2012-11-30’ UPDATE SET active=1 WHERE = 3 34.8719 4.2% 30220 17635376 SELECT dispatch_time FROM 4 28.7109 3.8% 51606 WHERE = 7387612
  • 67. CREATE TABLE orders id int unsigned not null auto_increment, `date` date not null, … Can also do PRIMARY KEY (date, id), PRIMARY KEY (id, date) KEY id (id), KEY date (date) ..) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY ( (date)) (PARTITION VALUES LESS THAN ( (‘2012-01-01’), PARTITION VALUES LESS THAN ( (‘2012-06-01’)), PARTITION VALUES LESS THAN )
  • 68. Manual Partitioning • Archive - Main table & Archive Table • Time – Create table per year, per month.. • Shards – Table per country * Foreign keys Manual Time Archive Shards
  • 69.
  • 71. Intensive Table Optimization Once upon a time, I was researching ways to make a database working set fit as much as possible to memory…
  • 73. Intensive Table Optimization 1. People are usually very liberal with data type sizes 2. There were (usually) so many indexes that: 1. They multiplied the table size 2. Were not efficient compares to how the table is used 3. Confused the optimizer 3. Discovered partitions were not used or misused 1. Discovered sub partitions 2. Primary Key alignment 4. Discovered InnoDB compression
  • 74. Intensive Table Optimization There are tools to help with this, but…
  • 76. Intensive Table Optimization Data Types Slowest Queries Bottleneck User Statistics Optimizations Tables Optional Table Sizes Query Logs Partitions Indexes Foreign Keys
  • 77. Gathering Data What you will need: 1. MySQL Slow log: MySQL >= 5.1 or Percona/MariaDB microslow patch 2. Set long_query_time = 0 - for 6 to 24 hours for decent sized slow log. (Make sure host has enough space) Slowest Queries Bottleneck Tables
  • 78. Gathering Data Helpful (Optional): 1. Percona/Mariadb user_statistics patch 2. Get list of most read/written tables 3. Get list of used and un-used indexes 4. List of largest tables Table Sizes User Statistics Bottleneck Tables
  • 79. Worst Response Queries pt-query-digest slow.log > slow.txt Rank Response time Calls Item 1 8589.9513 27.5% 231051 UPDATE dp_users 2 4752.6688 15.2% 257235 SELECT dp_cache_menu 3 1606.4946 5.1% 183542 SELECT community_chats 4 1418.9034 4.5% 259939 SELECT dp_cache 5 564.3305 1.8% 7970165 SELECT dp_url_alias 6 495.0092 1.6% 44940 SELECT dp_event dp_node
  • 80. Table Statistics SELECT table_name, FROM information_schema.table_statistics ORDER BY DESC LIMIT 5; table_name rows_read dp_users 2302477894 ROWS_CHANGED dp_node 1231318439 ROWS_CHANGED_X_INDEXES dp_comments 1071462211 dp_userpoints 1033073070 dp_search_index 260154684
  • 81. Table Statistics Worst Response Tables --group-by tables Rank Response time Calls Item 1 7975.4487 6.5% 124384 advertisement 2 5554.1435 4.5% 1834 info 3 4915.4816 4.0% 208 placement 4 4902.7644 4.0% 158 advert_summary
  • 82. Table Sizes Table_Name Rows Data Idx Total_size Idxfrac total_daily_summary 610M 77G 88G 165G 1.15 advert_summary 478M 57G 45G 102G 0.78 log_messages 92M 47G 10G 57G 0.21 SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_NAME, CONCAT(ROUND(TABLE_ROWS / 1000000, 2), 'M') ROWS, CONCAT(ROUND(DATA_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') DATA, CONCAT(ROUND(INDEX_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') IDX, CONCAT(ROUND(( DATA_LENGTH + INDEX_LENGTH ) / ( 1024 * 1024 * 1024 ), 2), 'G') TOTAL_SIZE, ROUND(INDEX_LENGTH / DATA_LENGTH, 2) IDXFRAC FROM INFORMATION_SCHEMA.TABLES ORDER BY DATA_LENGTH + INDEX_LENGTH DESC http://www.mysqlperformanceblog.com LIMIT 10; /2008/03/17/researching-your-mysql-table-sizes/
  • 83. Which table needs your attention? Table Size
  • 84. Intensive Table Optimization • Table Targeting • The most “worthy” table to focus your attention on • Biggest bang for your buck • If you know which table is the most troublesome • Ignore most of the investigations • Apart from slow log • Investigations help understand DB usage
  • 85. Optimizations Compression Data Types Query Logs Partitions Indexes Sub Partitioning Don’t Need Foreign Keys
  • 86. Intensive Table Optimization • Datatypes • SELECT * FROM table G • Example: Tinyint instead of Bigint: (7 bytes row + 7bytes index) * 350million rows = 4.9Gb • Enum instead of Varchar • Remove NULLs when not needed
  • 87. Intensive Table Optimization • Compression • Best for tables with a lot of varchar/text • Compress table by x2, x4, x8.. • Need to experiment with innodb_strict = on; • On my tests (5.5) – Very very slow • Alter tables • INSERTS/UPDATES/DELETES Optimizations Compression
  • 88. 1. Slow Log Filtered by Get Data Target Table Query Digest New Results EXPLAIN 3. 2. Index-Usage Make Test Assumptions Assumptions 4. Deploy
  • 89. Target Table Processing Filter Log: pt-query-digest slow.log --filter '$event->{arg} =~ m/dp_users /' --no-report --print >dp_users.log Worst Queries from new log: pt-query-digest dp_users.log --limit 100% >tbl_dp_users.txt
  • 90. Response Rank time Calls Item 209.2863 UPDATE dp_users SET access = 133******3 WHERE 1 88850 10.7% = 23****01G 162.2711 3 1309010 SELECT access FROM dp_users WHERE = 21***4G 8.3% 139.9009 SELECT uid, name FROM dp_users WHERE =1 4 197 7.1% ORDER BY DESCG 133.8691 SELECT * FROM dp_users u WHERE = 5 327 6.8% 's******s'G SELECT name, created, picture FROM dp_users WHERE 109.6903 6 29152 picture !='' AND = '1' AND BETWEEN 5.6% '133*****0' AND '133*****60'G 92.9095 SELECT dp_node dp_users using ( ) 7 360642 4.7% dp_node_revisions 74.2426 SELECT * FROM dp_users u WHERE = hoa****rio' 8 106 3.8% AND = '3837********5f9b' AND = 1G
  • 91. Partitioning by Usage Rank Response time Calls Item SELECT address FROM orders WHERE date 1 480.9273 26.3% 129258 BETWEEN ‘2012-11-10’ and ‘2012-11-17’ SELECT total FROM orders WHERE date 2 322.4220 14.3% 600 BETWEEN ‘2012-11-01’ and ‘2012-11-30’ UPDATE order SET active=1 WHERE id = 3 34.8719 4.2% 30220 17635376 SELECT dispatch_time FROM order WHERE 4 28.7109 3.8% 51606 id = 7387612
  • 92. Testing Assumptions SELECT uid, name FROM WHERE = 1 ORDER BY DESCG 1.30secs SELECT uid, name FROM WHERE = 1 ORDER BY DESCG 0.56secs Query Digest EXPLAIN
  • 93. Test Environment • Hardware environment similar to live • Data size similar to live environment: • Replicating slave • Cannot change datatypes on MIXED/ROW replication • Create table2 and run queries against it • Xtrabackup – full replica New Results • Script with Mysqldump + WHERE • mysqldump --databases main --tables table1 table2 –where “date > now() – interval 30 day” > dump.sql • Mysqldump –all-database –ignore-table main.table1 main.table2 >> dump.sql
  • 94. Final Tweaking (Remember the table log file – dp_users.log ?) pt-index-usage • pt-index-usage dp_users.log --host 127.0.0.1 --tables dp_users >idx_dp_users.txt • Go over recommendations Index-Usage Test Assumptions
  • 95. Deploy Strategies 1. Rolling Servers 2. pt-online-schema-change Deploy 3. Two-part move a. Create new table – table2 b. Insert table rows that will not change – INSERT INTO table2 SELECT * FROM table1 WHERE date <= curdate() – interval 30 day; c. Short downtime d. Rename table1 to table3; rename table2 to table1; e. INSERT IGNORE INTO table1 SELECT * FROM table3 WHERE date >= curdate() – interval 30 day; 4. Alter table – long downtime (pre 5.6, maybe)
  • 97. Slow Log Filtered by Get Data Target Table Query Digest EXPLAIN New Results Index-Usage Test Make Assumptions Assumptions Deploy
  • 99. Streaming Reverse File Proxy Volatile Browser Shield Cache Query Cache Page Cache Denormalize 2nd Level Summary Column Tables Cache 3rd Level Subtotal Attributes Data Warehouse Conditional
  • 100. Read Cache • Outside the database • Page Cache • Query Cache • Inside the database • Column Cache • Summary Table * Complexity
  • 101. Page Cache • Browser Cache • Etag, Expires, Last-modified • Reverse Proxy • Squid, Varnish, Nginx, Apache, Proprietary. • File/Full page cache • mod_file_cache, Zend_Cache_Backend • W3 Total Cache, sfSuperCache * Stale
  • 103. Query Cache • Volatile • Memcached, Redis, Hibernate Cache, Arrays.. • On-Request, Time-to-Live, Stale and Cache Stampede • Streaming • Interval / Async, Stale, Common Queries • Shield – Mongo Shield • Script/Tool Replication, Dependency • Aggregation • Complexity / Layers
  • 104. Mongo Shield 147cm img.photobucket.com/albums/v158/keris_hanuman/Afbeelding1455.jpg
  • 105.
  • 106. Sticky Sticky Sessions Sessions Memcached Memcached Cart MySQL
  • 107. Memcached Sessions MySQL Cart
  • 109. Column Cache • Denormalize • Additional Column(s) to prevent JOINs • Maintenance, Space on disk • Example: CustomerID, OrderID, OrderItemID • Sub Total • Prevent additional slow GROUP BY queries • Maintenance, Generation, Space on disk • Example: totalPurchases, moneyOwed * Space vs Speed
  • 110. Column Cache • Conditional • Store conditional (True/False) logic • Prevents recalculating result – another query • Can prevent rewriting code • Example: isDone, hasReview, aboveAvg • Attributes • ENUM datatype • SET datatype - ARRAY of options • Prevents JOINs • May save space
  • 111. Summary Tables An additional table which consists of an aggregation of another table or several JOIN’d tables. Summary Tables
  • 112. SELECT ... FROM main_table t1 INNER JOIN table2 t2 on t1.orderid = t2.id INNER JOIN table3 t3 on t1.customerid = t3.id INNER JOIN table4 t4 on t1.addressid = t4.id INNER JOIN table5 t5 on t2.supplierid = t5.id INNER JOIN table6 t6 on t2.warehouse = t6.id INNER JOIN table2 t7 on t6.addressid = t7.id INNER JOIN table8 t8 on t1.productid = t8.id INNER JOIN table9 t9 on t1.buyerid = t9.id INNER JOIN table10 t10 on t1.officeid = t10.id WHERE t1.date between '2012-11-01' and '2012-11-30' GROUP BY t1.date
  • 113. Summary Tables Processed 1.2million rows Returned 30 rows Time 17.52 minutes
  • 114. Summary Tables CREATE TABLE summary_table (primary key (date,addressid,productid)) as SELECT ... FROM main_table t1 INNER JOIN table2 t2 on t1.orderid = t2.id INNER JOIN table3 t3 on t1.customerid = t3.id INNER JOIN table4 t4 on t1.addressid = t4.id INNER JOIN table6 t6 on t2.warehouse = t6.id GROUP BY t1.date, t1.addressid, t1.productid
  • 115. Summary Tables SELECT ... FROM summary_table t1 INNER JOIN table5 t5 on t2.supplierid = t5.id INNER JOIN table2 t7 on t6.addressid = t7.id INNER JOIN table8 t8 on t1.productid = t8.id INNER JOIN table9 t9 on t1.buyerid = t9.id INNER JOIN table10 t10 on t1.officeid = t10.id WHERE t1.date between '2012-11-01' and '2012-11-30‘ GROUP BY t1.date
  • 116. Summary Tables Processed 35000 rows Returned 30 rows Time 0.75 seconds
  • 117.
  • 118. Summary Tables as an Analytics Sub-System
  • 119. Database Design Operational System Analytic System Comparison Execution of a business Measurement of a Purpose process business process Insert, Update, Query, Primary Interaction Query Delete Design Optimization Update concurrency High-performance query Entity-relationship (ER) Dimensional design (Star Design Principle 3rd Normal form (3NF) schema or cube) amazon.co.uk/Schema-Complete-Reference-Christopher-Adamson/
  • 120. Data Warehouses Fact Tables Dimension Tables • Measurement • Context • Narrow • Wide • Long • Short • Most of the • Filters and descriptive data data
  • 122. Operational Design Customers Orders Order Items Addresses Products
  • 124. Star Schema Date dim Address dim OrderItems Fact Products dim Customers dim
  • 125. Maintenance • Hourly/Daily/Weekly/Monthly Aggregations • Intervals • Off Peak • On-Insert
  • 126. A lot more InnoDB settings Read Cache Buffer Pool Indexes Sharding Intensive Scaling Reads Table Optimization Galera Read Partitioning Slaves Better Hardware Sub Partitioning Another Read/Write Master Splitting IO Memory
  • 127. Scaling Reads • InnoDB Buffer Pool • Cache Warming • Read buffer A lot more settings InnoDB • Sort Buffer Buffer Pool • Join Buffer • Temp Table size / on disk
  • 128. Scaling Reads • Better Hardware • Disk I/O • Memory Better Hardware IO Memory
  • 129. Scaling Reads • Read Slaves • Read/Write Splitting • Master/Master • Galera
  • 132.
  • 133.
  • 134.
  • 135. db3 db1 db2
  • 136. Columnar £££ A lot more Store settings InnoDB Buffer Pool Summary Hadoop Tables Indexes Intensive OLAP Reporting Table Cubes Optimization Sharding Partitioning Reporting Cross Shard Better Slaves Joins Hardware Sub Partitioning Different Indexes IO Memory
  • 137. Reporting • Reporting Slaves • Different Indexes • No Foreign Keys • Partitioning • If ROW-replication: • Must have same data types Reporting Slaves Different Indexes
  • 138. Reporting • Sharding • Cross-shard JOINs • Go Fish • Aggregations Hadoop
  • 139. Reporting • Summary tables - aggregations • Scripts • Hadoop • Ready –Made reports • OLAP Cubes • Columnar Store • £££ * RDBMS very fast at GROUP BY
  • 140. Battery A lot more settings RAID Innodb Log Write-back File Size Local Server Write Buffers Storage MySQL ETL settings Queue Hadoop Summarized CRUSH HandCode Writes
  • 141. Write Buffers • RAID card • BBU • Write-back vs write-through • Battery Learning/Drain
  • 142. Write Buffers • Innodb log file size • Buffer pool * dirty read (%) * io capacity A lot more settings Innodb Log * Virtual Environment File Size
  • 143. Write Buffers • innodb-flush-log-at-trx-commit • Sync binlog • Support xa MySQL settings
  • 144. Write Buffers • ActiveMQ, RabbitMQ, ZeroMQ, Gearman • Not ACID, may need redundancy • Summarized Writes • Memcached counters + interval writes
  • 145. Write Buffers • Local Storage (web/app servers) • Memcached • SQLite (disk/in-memory) • Log file Local Server <- Most popular Storage • MySQL • Independent, isolated ETL • Need to fetch data • Prevent missed data and duplicates
  • 146. Write Buffers • Fetching Data • Hand code • ETL tool – Pentaho/Talend • Flume • Aggregation/Processing • Hadoop <- Very Popular • Google CRUSH
  • 147. Google CRUSH Tools gunzip oldlog.log.gz | convdate -f 3 -i "%d/%b/%Y:%H:%M:%S %z" -o "%Y-%m-%d" | reorder -k 3,2 | aggregate -p -k 3,2 -c 1 | csvformat | gzip - > newlog.csv.gz
  • 148. Innodb Log File Size Write Buffers Sharding Indexes Intensive Scaling Writes Table Hardware Optimization Remove Partitioning Bottlenecks OS Settings Sub Partitioning MySQL Bypass SQL Settings Layer
  • 149. Scaling Writes • Less Indexes • Less writes • Partitioning • Less table maintenance • Less B-tree levels • Less need to organize blocks • Algorithm overhead • Mutex/Locks
  • 150. Scaling Writes • Bypass SQL Layer SQL Parser • Innodb/Memcached • HandlerSocket Optimizer Storage Engine * Raik
  • 151. Scaling Writes • IO Scheduler • File System (+ nobarrier, noatime, nodiratime) • EXT 3 • EXT 4 • XFS • ZFS • Block Sizes
  • 152. Scaling Writes • Faster I/O • Faster Disks • SAS • SSD • RAID + cache Hardware • PCIe SSD • FusionIO • Virident
  • 153. Scaling Writes • Master/Master • Does not scale writes • Writes still need to replicate • Sharding • Does scale writes
  • 154. App Global Data Child Data Function DB By Schema Proxy Partitioned Lookup Data Shared Nothing Functional Sharding Partitioning ID Go Fish Reporting Key Area Hash Cross Shard Hadoop
  • 155. Sharding • Partitioned Data • Splitting the Data • Vertically • Main Tables to partition • Child Tables • Global Tables • By Schema • Shared Nothing
  • 156. Sharding • Partitioning by Key • ID – CustomerID, ProductID, App • Area – Country, City, Continent • Hash – Random for equal spread
  • 157. Sharding • Which shard has the data? • Store it in a DB • Flexible but slower • Some Function in your App • Faster, less flexible • Proxy config file • Faster, less flexible • Needs some app coding
  • 158. Sharding • Maintenance • Backups • Slaves • Uptime • Loosely coupled system
  • 159. Sharding • Functional Partitioning • Different Apps • Share some tables Functional Partitioning
  • 160. Sharding • Reporting • Go Fish – One server / Shared nothing • Cross Shard – Many servers • Hadoop – Aggregate to one reporting server
  • 162.
  • 163.
  • 164. The End • Questions & Answers • Email: contact@jonathanlevin.co.uk • Don’t forget to rate this tutorial
  • 165. If we have time • MySQL 5.6 • NoSQL • ORM • Beyond Hadoop • Bring-Your-Own-Problems