MySQL Performance Optimization
Part I

Abhijit Mondal
Software Engineer at HolidayIQ
Contents
InnoDB or MyISAM, or is there something better?
Choosing optimal data types
Normalization vs. Denormalization
Cache and Summary Tables
Explaining “EXPLAIN”
InnoDB vs. MyISAM
●   “You should use InnoDB for your tables unless you have a compelling need to use a
    different engine” - High Performance MySQL by Baron Schwartz, Peter Zaitsev and
    Vadim Tkachenko
●   InnoDB :
    Pros-
             1. Row-level locking, which lets insert and update queries scale. The whole
    table is not locked when one client writes to selected rows; only those rows are
    locked (along with any gaps between them, which prevents “phantom rows”).
             2. Clustering by primary key for faster lookups and ordering.
             3. High concurrency.
             4. Transactional, crash-safe, better online backup capability.
             5. Adaptive hash indexes built on top of B-Tree indexes for faster
    lookups from main memory.

    Cons-
            1. Slower writes (insert, update queries).
            2. Slower BLOB handling.
            3. COUNT(*) queries without a WHERE clause require a full table or index scan.
InnoDB vs. MyISAM
●   MyISAM :
    Pros-
            1. Faster reads and writes for small to medium sized tables.
            2. COUNT(*) queries without a WHERE clause are fast, because a separate
    field keeps track of the number of rows.
            3. Better for full-text searching. However, InnoDB tables can be searched
    with an external engine such as Sphinx, and InnoDB itself supports FULLTEXT indexes
    since MySQL 5.6.

    Cons-
             1. Non-transactional; data can be lost or corrupted during crashes.
             2. Table-level locking. The entire table is locked for reads and writes,
    although rows can be inserted while a select query is being processed (concurrent
    inserts).
             3. Insert and update queries do not scale because of concurrency issues.
●   Memory engine for temporary tables : Hash indexes for faster select queries. All
    data is stored in memory and is lost after a server restart. Example usage: mapping
    cities/attractions to regions/countries, caching data, temporary summary tables for
    joins.
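●   A minimal sketch of how the engine choice appears in DDL. The table and column
    names (city_region_map, city_id, region_id) are hypothetical and used only for
    illustration; the reviews table is the one used later in these slides.

```sql
-- Hypothetical lookup table held entirely in memory.
-- Its rows disappear when the server restarts, so it must be reloadable.
CREATE TABLE city_region_map (
    city_id   SMALLINT UNSIGNED NOT NULL,
    region_id SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY USING HASH (city_id)   -- hash index: fast equality lookups only
) ENGINE = MEMORY;

-- Convert an existing table to InnoDB (this rebuilds the table).
ALTER TABLE reviews ENGINE = InnoDB;

-- Check which engine each table in the current schema uses.
SELECT TABLE_NAME, ENGINE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = DATABASE();
```

    A hash index serves only equality comparisons; range scans and ORDER BY on the
    key need a B-Tree index instead.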
Choosing the optimal data type
●   Always choose the smallest data type that is large enough for the largest value it
    has to represent. Smaller data types take up less space on disk, in memory and in
    the CPU cache.
●   Given a choice between an integer and a character type, choose the integer, because
    character sets and collations (sorting rules) make character comparisons more
    complicated.
●   Unless a field really needs to store NULL, declare it NOT NULL. NULL values make
    index construction, index statistics and value comparisons more complicated. They
    also require more space: an indexed nullable column requires an extra byte per
    entry. InnoDB handles NULL better (only a single bit) than MyISAM.
●   TIMESTAMP vs. DATETIME: TIMESTAMP takes half as much space (4 bytes) as
    DATETIME (8 bytes; 5 bytes plus fractional seconds from MySQL 5.6.4 onwards) and
    also has an auto-updating feature.
●   Use UNSIGNED integer types for AUTO_INCREMENT primary key fields
    (unless negative integers are explicitly required). For storing cities in India (around
    1000 cities) use UNSIGNED SMALLINT: its range of 0 to 65535 is enough to hold all
    the cities in India, and it uses 16 bits where INT would use 32 bits.
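●   The guidelines above could be applied as in the following sketch. The table name
    city and its columns are assumptions made for illustration, not taken from the
    original schema.

```sql
-- Hypothetical city table following the data type guidelines.
CREATE TABLE city (
    -- 0..65535 comfortably covers about 1000 cities, in 2 bytes instead of 4
    city_id    SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    -- NOT NULL avoids the extra NULL bookkeeping in rows and indexes
    city_name  VARCHAR(60) NOT NULL,
    -- TIMESTAMP refreshes itself whenever the row changes
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                         ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (city_id)
) ENGINE = InnoDB;
```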
Choosing the optimal data type
●   VARCHAR vs. CHAR : VARCHAR is a variable-length data type while CHAR is
    fixed-length. For shorter strings VARCHAR saves space, but a row may grow or
    shrink when its value is updated. VARCHAR uses 1 extra byte to store the length of
    the value if the column needs at most 255 bytes, otherwise it uses 2 extra bytes.
    VARCHAR is therefore best suited to columns that are not updated frequently, as
    every update may require the row size to be adjusted.

●   VARCHAR is suitable for storing city/state/country/region/attraction names as these
    values are not updated much. CHAR may be suitable for storing MD5 password
    hashes (fixed length) or user names/activities (updated and inserted frequently).
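●   A small sketch of the CHAR vs. VARCHAR choice. The table user_account and its
    columns are hypothetical; MD5() is used only because the slide mentions MD5, not as
    a recommendation for password storage.

```sql
-- Hypothetical account table mixing fixed- and variable-length strings.
CREATE TABLE user_account (
    user_id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    -- an MD5 hex digest is always exactly 32 ASCII characters, so CHAR fits
    password_md5 CHAR(32) CHARACTER SET ascii NOT NULL,
    -- place names vary in length and rarely change, so VARCHAR fits
    home_city    VARCHAR(60) NOT NULL,
    PRIMARY KEY (user_id)
) ENGINE = InnoDB;

INSERT INTO user_account (password_md5, home_city)
VALUES (MD5('example-password'), 'Bangalore');

-- Every stored digest has the same length.
SELECT CHAR_LENGTH(password_md5) AS digest_length FROM user_account;
```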
Choosing the optimal data type
●   ENUM : For storing string values drawn from a small, fixed set, use the
    ENUM data type. Eg. Gender (M or F), is_active (1 or 0), day of week etc.
    Create table activity(id int unsigned not null auto_increment primary key,
    activity varchar(20), day_of_week enum('sun','mon','tue','wed','thu','fri','sat'));
    ENUM values are stored as integers (1 or 2 bytes, depending on the number of
    values) in the table, hence comparisons are faster and they take less space.
    But joins between ENUM and VARCHAR or CHAR columns are less efficient, as the
    ENUM needs to be converted into one of those types before the comparison is done.
●   BLOB and TEXT fields cannot be indexed in full; only a prefix of a specified length
    can be indexed.
●   Use SET to combine many true/false values into a single column.
    Create table test1(perms set('can_read','can_write','can_delete'));
    Insert into test1(perms) values('can_read,can_delete');
    Select perms from test1 where find_in_set('can_delete',perms);
●   Columns used to join tables should be of the same data type, which improves
    performance by avoiding type conversion.
    Select count(*) from destination join attractions using(destinationid, active,
    countryid);
Normalization vs. Denormalization
●   Normalization :
        Pros :
        1. Normalized updates are usually faster than denormalized updates.
        2. No duplicate data so there is less data to change.
        3. Tables are usually smaller so they fit better in memory and perform better.
        4. Lack of redundant data means less need for GROUP BY or DISTINCT
    queries.

        Cons :
        1. JOINs are required to retrieve values from normalized tables. Joins are
    usually expensive, and the same query might have been served by a single index on a
    denormalized table.
●   Eg. Find the users and their reviews for reviews given between 4th March and
    30th June, ordered by the user's age. An expensive join is required.

    SELECT u.user_name, r.review from user u join review r using(user_id) where
    r.date_reviewed between '2012-03-04' and '2012-06-30' order by u.age limit 100;
Normalization vs. Denormalization
●   Denormalization :
            Pros :
            1. No JOINs required for denormalized data. Even a full table scan without
    an index can be faster than a join that doesn't fit into memory.

             Cons :
             1. Duplicate data. A denormalized table has large rows that are almost
    identical except for a single column. This happens when a many-to-many relation is
    mapped into a single table.
             2. Inconsistencies in data may arise during updates, and updates are
    expensive.
●   Eg. Find the users and their reviews for reviews given between 4th March and
    30th June, ordered by the user's age. An index on (date_reviewed, age) can greatly
    improve the performance of this query, because the date range is resolved from the
    index, although the matching rows may still have to be sorted by age.

    SELECT user_name, review from user_review where date_reviewed between
    '2012-03-04' and '2012-06-30' order by age limit 100;
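●   A sketch of adding the index mentioned above and checking the plan. The index
    name idx_date_age is an assumption; the table and columns are those of the
    denormalized user_review example.

```sql
-- Composite index: the date range is resolved from the first column.
ALTER TABLE user_review ADD INDEX idx_date_age (date_reviewed, age);

-- Inspect the plan; the key column should name idx_date_age if the
-- optimizer picks it. Because the leading column is used for a range,
-- "Using filesort" may still appear in Extra for the ORDER BY age.
EXPLAIN
SELECT user_name, review
  FROM user_review
 WHERE date_reviewed BETWEEN '2012-03-04' AND '2012-06-30'
 ORDER BY age
 LIMIT 100;
```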
Normalization vs. Denormalization
●   When the same tables are joined frequently in queries it is better to denormalize one
    of the tables by duplicating data from the other table. Inserts, updates and deletes
    can be kept consistent by creating triggers on one of them. For eg. in the case of the
    user and reviews tables, copy review, review_id and date_reviewed from reviews to
    user. Then create triggers for insert, update and delete on the reviews table.
●   DELIMITER #
    CREATE TRIGGER `after_insert_in_reviews` after insert on reviews
    FOR EACH ROW BEGIN
        INSERT INTO user(user_id, review_id, review, date_reviewed)
        values(NEW.user_id, NEW.review_id, NEW.review, NEW.date_reviewed);
    END#
    DELIMITER ;
●   DELIMITER #
    CREATE TRIGGER `after_delete_in_reviews` after delete on reviews
    FOR EACH ROW BEGIN
       DELETE FROM user where review_id=OLD.review_id;
    END#
    DELIMITER ;
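●   The update case mentioned above is not shown on the slide. A possible sketch, in
    the same style as the insert and delete triggers and assuming the copied columns
    in user are named review_id, review and date_reviewed:

```sql
DELIMITER #
-- Keep the copied review columns in user in step with reviews.
CREATE TRIGGER `after_update_in_reviews` AFTER UPDATE ON reviews
FOR EACH ROW BEGIN
    UPDATE user
       SET review        = NEW.review,
           date_reviewed = NEW.date_reviewed
     WHERE review_id = OLD.review_id;   -- match the row copied at insert time
END#
DELIMITER ;
```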
Summary and Cache Tables
●   Consider the situation:
         1. There are 3 tables: user, reviews and destination. We want to analyze the
    number of reviews for all destinations in a particular city by users in a particular
    age range, grouped by destination in one case and by user gender in another. So we
    write the two queries as:
●   SELECT destination.destname, count(reviews.review_id) as review_count from user
    join reviews join destination where user.age between 20 and 30 and
    user.userid=reviews.user_id and reviews.destination_id=destination.destid and
    destination.city='Bangalore' group by destination.destid;
●   SELECT user.gender, count(reviews.review_id) as review_count from user join
    reviews join destination where user.age between 20 and 30 and
    user.userid=reviews.user_id and reviews.destination_id=destination.destid and
    destination.city='Bangalore' group by user.gender;
●   Instead of doing expensive joins on 3 large tables every time when only the
    summarization of the data differs, we can create a summary table and update it
    periodically using a cron job.
Summary and Cache Tables
●   CREATE table user_rev_dest_summary SELECT * from user join reviews join
    destination where user.userid=reviews.user_id and
    reviews.destination_id=destination.destid;
●   ALTER table user_rev_dest_summary add index city_index(age, city);
●   SELECT destname,count(review_id) as review_count from
    user_rev_dest_summary where age between 20 and 30 and city='Bangalore' group
    by destid;
●   SELECT gender,count(review_id) as review_count from user_rev_dest_summary
    where age between 20 and 30 and city='Bangalore' group by gender;
●   Using summary table our query performance has greatly improved but if
    user,destination or review tables are updated frequently our summary data may
    become stale. So need to decide at what interval to update the summary table.
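●   One way the periodic refresh could be done is to rebuild the summary into a shadow
    table and swap it in with RENAME TABLE, so readers never see a half-built table.
    This is a sketch only: the _new and _old table names are assumptions, and the
    statements would be run from the scheduled job.

```sql
-- Clear leftovers from a previous failed run.
DROP TABLE IF EXISTS user_rev_dest_summary_new, user_rev_dest_summary_old;

-- Same columns and indexes as the live summary table.
CREATE TABLE user_rev_dest_summary_new LIKE user_rev_dest_summary;

-- Rebuild from the base tables with the same join that created the summary.
INSERT INTO user_rev_dest_summary_new
SELECT *
  FROM user JOIN reviews JOIN destination
 WHERE user.userid = reviews.user_id
   AND reviews.destination_id = destination.destid;

-- Swap both names in a single atomic statement.
RENAME TABLE user_rev_dest_summary     TO user_rev_dest_summary_old,
             user_rev_dest_summary_new TO user_rev_dest_summary;

DROP TABLE user_rev_dest_summary_old;
```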
Explaining “Explain”
●   EXPLAIN output columns : The important columns are type, possible_keys, key, rows
    and Extra.
●   EXPLAIN extended select dest.`Destination_name`, attr.`attractionid`,
    attr.`attractionname` from destination as dest,attractions as attr,hotels_by_locality
    as hl where dest.`Destination_id`=attr.`destinationid` and dest.`CountryID`='1' and
    dest.`other_destination`='0' and attr.`active`='1' and hl.`typeid`=attr.`attractionid`;

●   Types of “type” : From best to worst
         1. const - The table has at most one matching row, which is read at the start of
    the query. const tables are very fast because they are read only once.
    const is used when you compare all parts of a PRIMARY KEY or UNIQUE index
    to constant values.
         SELECT * FROM attractions WHERE attractionid=8385;
Explaining “Explain”
●   Types of “type” : contd.
         2. eq_ref - One row is read from this table for each combination of rows from
    the previous tables. It is used when all parts of an index are used by the join and the
    index is a PRIMARY KEY or UNIQUE NOT NULL index.
    SELECT * from resort join city using(CityID); (CityId is primary key of city).

        3. ref - All rows with matching index values are read from this table for each
    combination of rows from the previous tables.
    SELECT * from resort join city using(StateID); (index on city.StateId but many
    rows in city having same state id).

        4. fulltext - The join is performed using a FULLTEXT index.
        5. range - Only rows that are in a given range are retrieved, using an index to
    select the rows. The key column in the output row indicates which index is used.
        SELECT * from reviews where date_reviewed between '2012-06-30' and
    '2012-08-07'; (Note: an index on date_reviewed is not used when the column is
    wrapped in a function such as DATE(date_reviewed).)
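●   To illustrate the note about DATE(): the two queries below ask for the same day of
    reviews, but only the second leaves the column bare so that a range scan on the
    index is possible. The index name idx_date_reviewed is an assumption.

```sql
-- Assumed index on the filtered column.
ALTER TABLE reviews ADD INDEX idx_date_reviewed (date_reviewed);

-- The function call hides the column from the index; expect a scan
-- of all rows rather than type = range.
EXPLAIN
SELECT * FROM reviews
 WHERE DATE(date_reviewed) = '2012-06-30';

-- Same rows, written as a half-open range on the bare column,
-- which allows type = range on idx_date_reviewed.
EXPLAIN
SELECT * FROM reviews
 WHERE date_reviewed >= '2012-06-30'
   AND date_reviewed <  '2012-07-01';
```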
Explaining “Explain”
●   Types of “type” : contd.
        6. index - This join type is the same as ALL, except that only the index tree is
    scanned. This usually is faster than ALL because the index file usually is smaller
    than the data file.
        SELECT StateID from resort; (covering index on StateID).
        7. ALL - A full table scan is done for each combination of rows from the
    previous tables. Avoid this by adding index to the appropriate table.
●   The common “Extra” 's :
         1. Using filesort - MySQL must do an extra pass to find out how to retrieve the
    rows in sorted order. The sort is done by going through all rows according to the
    join type and storing the sort key and pointer to the row for all rows that match the
    WHERE clause.
         SELECT resort.Location from resort order by resort.StateID; (no index on
    StateID, so the rows cannot be read in sorted order from an index).
Explaining “Explain”
●   The common “Extra” 's : contd.
         2. Using index - The column information is retrieved from the table using only
    information in the index tree without having to do an additional seek to read the
    actual row. This strategy can be used when the query uses only columns that are
    part of a single index. (covering indexes).
         SELECT resort.StateID from resort order by resort.Destination_id;       (index
    on Destination_id, StateID, StateID picked from index only after sorting by
    Destination_id ).
         3. Using temporary - To resolve the query, MySQL needs to create a temporary
    table to hold the result. This typically happens if the query contains GROUP BY
    and ORDER BY clauses that list columns differently.
         4. Using where - A WHERE clause is used to restrict which rows to match
    against the next table or send to the client. Even if you are using an index for all
    parts of a WHERE clause, you may see Using where if the column can be NULL.
         SELECT resort.Location from resort where StateID IS NOT NULL;
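●   A sketch of the GROUP BY / ORDER BY case behind “Using temporary”, using the resort
    table from the examples above. The alias resort_count is an assumption, and the
    exact Extra text depends on the MySQL version and the available indexes.

```sql
-- Grouping by one expression and ordering by another: the groups are
-- typically collected in a temporary table and then sorted, so Extra
-- tends to show "Using temporary; Using filesort".
EXPLAIN
SELECT StateID, COUNT(*) AS resort_count
  FROM resort
 GROUP BY StateID
 ORDER BY resort_count DESC;

-- Ordering by the grouping column itself lets an index on StateID,
-- if one exists, deliver the groups already in order.
EXPLAIN
SELECT StateID, COUNT(*) AS resort_count
  FROM resort
 GROUP BY StateID
 ORDER BY StateID;
```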
References
●   High Performance MySQL by Baron Schwartz, Peter Zaitsev and Vadim
    Tkachenko.
●   http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
●   http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-in
●   http://www.techrepublic.com/blog/10things/10-ways-to-screw-up-your-database-design/18



                                Thank You

"ML in Production",Oleksandr Bagan
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

MySQL Performance Optimization

  • 1. MySQL Performance Optimization, Part I. Abhijit Mondal, Software Engineer at HolidayIQ
  • 2. Contents
    ● InnoDB or MyISAM, or is there something better?
    ● Choosing optimal data types
    ● Normalization vs. Denormalization
    ● Cache and Summary Tables
    ● Explaining “EXPLAIN”
  • 3. InnoDB vs. MyISAM
    ● “You should use InnoDB for your tables unless you have a compelling need to use a different engine” (High Performance MySQL, by Baron Schwartz, Peter Zaitsev and Vadim Tkachenko)
    ● InnoDB:
      Pros:
        1. Row-level locking, which lets insert and update queries scale. When one client writes to selected rows, only those rows are locked (plus any gaps between them, to prevent “phantom rows”), not the whole table.
        2. Clustering by primary key for faster lookups and ordering.
        3. High concurrency.
        4. Transactional, crash-safe, better online backup capability.
        5. Adaptive hash indexes built from B-tree indexes for faster lookups from main memory.
      Cons:
        1. Slower writes (insert and update queries).
        2. Slower BLOB handling.
        3. COUNT(*) queries without a WHERE clause require a full table or index scan.
  • 4. InnoDB vs. MyISAM
    ● MyISAM:
      Pros:
        1. Faster reads and writes for small to medium-sized tables.
        2. COUNT(*) queries without a WHERE clause are fast, because MyISAM keeps a separate field that tracks the number of rows.
        3. Better for full-text searching, although InnoDB tables can use Sphinx (an external search engine) for full-text search, and InnoDB supports native FULLTEXT indexes from MySQL 5.6.
      Cons:
        1. Non-transactional; data can be lost during crashes.
        2. Table-level locking. The entire table is locked for reads and writes, although rows can be inserted while a SELECT query is being processed.
        3. Insert and update queries do not scale because of concurrency issues.
    ● MEMORY engine for temporary tables: hash indexes for faster SELECT queries from temporary tables. All data is stored in memory and is lost after a server restart. Example usage: mapping cities/attractions to regions/countries, caching data, temporary summary tables for joins.
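The engine choices above can be tried out with a few statements. This is a minimal sketch, not a migration recipe: the schema name `travel` and the table `city_region_map` are hypothetical, and `reviews` is assumed to be an existing MyISAM table.

```sql
-- List the storage engine currently used by each table in a schema.
-- 'travel' is a hypothetical schema name; substitute your own.
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'travel';

-- Convert a table to InnoDB. This rebuilds the table, so it can take
-- a long time on large tables.
ALTER TABLE reviews ENGINE = InnoDB;

-- A MEMORY table with a hash index for fast equality lookups.
-- Hypothetical lookup table; its rows are lost on server restart.
CREATE TABLE city_region_map (
    city_id   SMALLINT UNSIGNED NOT NULL,
    region_id SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY USING HASH (city_id)
) ENGINE = MEMORY;
```

Because MEMORY tables are emptied on restart, they are best filled from a durable source table that can repopulate them.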
  • 5. Choosing the optimal data type
    ● Always choose the smallest data type that is large enough for the largest value it must represent. Smaller data types take up less space in memory and in the CPU cache.
    ● Given a choice between integer and character types, choose integer: character sets and collation (sorting) rules make character comparisons more complicated.
    ● Unless a column actually needs to store NULL, declare it NOT NULL. NULL values make index construction, index statistics and value comparisons more complicated, and they require more space. An indexed nullable column requires an extra byte per entry. InnoDB handles NULL better (only a single bit) than MyISAM.
    ● TIMESTAMP vs. DATETIME: TIMESTAMP takes 4 bytes, half as much as DATETIME's 8 bytes (from MySQL 5.6.4 DATETIME takes 5 bytes plus fractional-second storage), and also has an auto-updating feature.
    ● Use UNSIGNED integer types for AUTO_INCREMENT primary key columns (unless negative integers are explicitly required). For storing cities in India (around 1000 cities) use SMALLINT UNSIGNED: it holds values from 0 to 65535, enough for all the cities in India, and uses 16 bits where INT (as per the current implementation) would use 32.
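The guidelines above could translate into a table definition like the following. This is an illustrative sketch only: the table and column names are assumptions and do not come from the slides.

```sql
-- Hypothetical city table applying the data type guidelines.
CREATE TABLE city (
    city_id    SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 2 bytes, 0..65535
    city_name  VARCHAR(64)       NOT NULL,                 -- NOT NULL unless NULL is really needed
    is_active  TINYINT UNSIGNED  NOT NULL DEFAULT 1,       -- 1 byte, 0..255
    updated_at TIMESTAMP         NOT NULL DEFAULT CURRENT_TIMESTAMP
                                 ON UPDATE CURRENT_TIMESTAMP,  -- auto-updating TIMESTAMP
    PRIMARY KEY (city_id)
) ENGINE = InnoDB;
```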
  • 6. Choosing the optimal data type
    ● VARCHAR vs. CHAR: VARCHAR is a variable-length type while CHAR is fixed-length. VARCHAR saves space for shorter strings, but a row may grow or shrink when the value is updated. VARCHAR uses 1 extra byte to store the length of the value if the column's maximum length is 255 bytes or less, otherwise 2 extra bytes. VARCHAR therefore suits columns that are not updated frequently, since every update may require a dynamic size adjustment.
    ● VARCHAR is suitable for storing city/state/country/region/attraction names, as these values are rarely updated. CHAR may be suitable for MD5 password hashes (fixed length) or user names/activities (inserted and updated frequently).
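A small sketch of the CHAR/VARCHAR split described above. The table `user_account` and its columns are hypothetical; the only fact relied on is that MySQL's MD5() returns a 32-character hexadecimal string.

```sql
-- Hypothetical table: fixed-length CHAR for the hash, VARCHAR for a name.
CREATE TABLE user_account (
    user_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    password_md5 CHAR(32)     NOT NULL,  -- an MD5 hex digest is always 32 characters
    home_city    VARCHAR(64)  NOT NULL   -- length varies, rarely updated
);

INSERT INTO user_account (password_md5, home_city)
VALUES (MD5('secret'), 'Bangalore');

-- The hash column always reports 32 characters; the name column varies.
SELECT CHAR_LENGTH(password_md5) AS hash_len,
       CHAR_LENGTH(home_city)    AS city_len
FROM user_account;
```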
  • 7. Choosing the optimal data type
    ● ENUM: use the ENUM data type for string values drawn from a small, fixed set, e.g. gender (M or F), is_active (1 or 0), day of week.
      CREATE TABLE activity(id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, activity VARCHAR(20), day_of_week ENUM('sun','mon','tue','wed','thu','fri','sat'));
      ENUM values are stored as integers (1 or 2 bytes) in the table, so comparisons are faster and they take less space. But joins between ENUM and VARCHAR or CHAR columns are less efficient, as the ENUM value must first be converted to a string before the comparison is done.
    ● BLOB and TEXT columns cannot be indexed in full; an ordinary index on them must specify a prefix length.
    ● Use SET to combine many true/false values into a single column.
      CREATE TABLE test1(perms SET('can_read','can_write','can_delete'));
      INSERT INTO test1(perms) VALUES('can_read,can_delete');
      SELECT perms FROM test1 WHERE FIND_IN_SET('can_delete', perms);
    ● Columns used to join tables should have the same data type in both tables, which improves performance by avoiding type conversion.
      SELECT COUNT(*) FROM destination JOIN attractions USING(destinationid, active, countryid);
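To see the integer storage behind ENUM and SET, a value can be evaluated in numeric context by adding 0. A minimal sketch, assuming a hypothetical table `activity_demo` that combines the two column definitions from the slide:

```sql
-- Hypothetical demo table with one ENUM and one SET column.
CREATE TABLE activity_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    day_of_week ENUM('sun','mon','tue','wed','thu','fri','sat') NOT NULL,
    perms       SET('can_read','can_write','can_delete') NOT NULL
);

-- A SET value is written as one comma-separated string.
INSERT INTO activity_demo (day_of_week, perms)
VALUES ('tue', 'can_read,can_delete');

-- Adding 0 forces numeric context:
--   ENUM yields its 1-based position in the list ('tue' is 3),
--   SET yields a bitmask (can_read = 1, can_delete = 4, so 5).
SELECT day_of_week,
       day_of_week + 0 AS enum_index,
       perms,
       perms + 0       AS set_bits
FROM activity_demo;
```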
  • 8. Normalization vs. Denormalization
    ● Normalization:
      Pros:
        1. Normalized updates are usually faster than denormalized updates.
        2. No duplicate data, so there is less data to change.
        3. Tables are usually smaller, so they fit better in memory and perform better.
        4. Lack of redundant data means less need for GROUP BY or DISTINCT queries.
      Cons:
        1. JOINs are required to retrieve values from normalized tables. These are usually expensive, and they can rule out indexing strategies that would work on a single denormalized table.
    ● E.g. find users and their reviews where the review was given between 4th March and 30th June, ordered by the user's age. An expensive join is required.
      SELECT u.user_name, r.review FROM user u JOIN review r USING(user_id) WHERE r.date_reviewed BETWEEN '2012-03-04' AND '2012-06-30' ORDER BY u.age LIMIT 100;
  • 9. Normalization vs. Denormalization
    ● Denormalization:
      Pros:
        1. No JOINs are required for denormalized data. A full table scan without an index can still be faster than a join that doesn't fit in memory.
      Cons:
        1. Duplicate data. A denormalized table has large rows that are almost identical except for a single column. This happens when a many-to-many relation is mapped into a single table.
        2. Data inconsistencies may arise during updates, and updates are expensive.
    ● E.g. find users and their reviews where the review was given between 4th March and 30th June, ordered by the user's age. An index on (date_reviewed, age) will greatly improve the performance of this query by limiting the scan to the date range; because date_reviewed is matched as a range, the ORDER BY age may still need a sort.
      SELECT user_name, review FROM user_review WHERE date_reviewed BETWEEN '2012-03-04' AND '2012-06-30' ORDER BY age LIMIT 100;
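As a sketch of how the (date_reviewed, age) index might be added and checked, assuming the denormalized user_review table from the slide exists; the index name `idx_date_age` is a made-up label:

```sql
-- Composite index on the filter column followed by the sort column.
ALTER TABLE user_review
    ADD INDEX idx_date_age (date_reviewed, age);

-- Check the plan: the key column should name idx_date_age and the type
-- should typically be "range". Extra may still show "Using filesort",
-- since a range on date_reviewed does not return rows in age order.
EXPLAIN
SELECT user_name, review
FROM user_review
WHERE date_reviewed BETWEEN '2012-03-04' AND '2012-06-30'
ORDER BY age
LIMIT 100;
```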
  • 10. Normalization vs. Denormalization
    ● When the same tables are joined frequently in queries, it is better to denormalize one of the tables by duplicating data from the other. Inserts, updates and deletes can be kept consistent by creating triggers on the source table. E.g. for the user and reviews tables, copy review, review_id and date_reviewed from reviews to user, then create triggers for insert, update and delete on the reviews table.
    ● DELIMITER #
      CREATE TRIGGER `after_insert_in_reviews` AFTER INSERT ON reviews FOR EACH ROW
      BEGIN
        INSERT INTO user(user_id, review_id, review, date_reviewed) VALUES(NEW.user_id, NEW.review_id, NEW.review, NEW.date_reviewed);
      END#
      DELIMITER ;
    ● DELIMITER #
      CREATE TRIGGER `after_delete_in_reviews` AFTER DELETE ON reviews FOR EACH ROW
      BEGIN
        DELETE FROM user WHERE review_id=OLD.review_id;
      END#
      DELIMITER ;
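The slide mentions triggers for insert, update and delete but shows only the first and last. A matching update trigger might look like the sketch below; it assumes the user table carries the copied review_id, review and date_reviewed columns described above, and the trigger name simply follows the same pattern.

```sql
-- DELIMITER is a mysql client command: it lets the trigger body
-- contain ';' without ending the CREATE TRIGGER statement early.
DELIMITER #
CREATE TRIGGER `after_update_in_reviews` AFTER UPDATE ON reviews
FOR EACH ROW
BEGIN
    -- Propagate the changed review text and date to the copy in user.
    UPDATE user
    SET review        = NEW.review,
        date_reviewed = NEW.date_reviewed
    WHERE review_id = OLD.review_id;
END#
DELIMITER ;
```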
  • 11. Summary and Cache Tables
    ● Consider the situation: there are 3 tables for user, reviews and destination. We want to analyze the number of reviews for all destinations in a particular city and a particular user age range, grouped by destination in one case and by user gender in another. So we write the two queries as:
    ● SELECT destination.destname, COUNT(reviews.review_id) AS review_count FROM user JOIN reviews JOIN destination WHERE user.age BETWEEN 20 AND 30 AND user.userid=reviews.user_id AND reviews.destination_id=destination.destid AND destination.city='Bangalore' GROUP BY destination.destid;
    ● SELECT user.gender, COUNT(reviews.review_id) AS review_count FROM user JOIN reviews JOIN destination WHERE user.age BETWEEN 20 AND 30 AND user.userid=reviews.user_id AND reviews.destination_id=destination.destid AND destination.city='Bangalore' GROUP BY user.gender;
    ● Instead of doing expensive joins on 3 large tables every time, when only the way the data is summarized differs, we can create a summary table and update it periodically using a cron job.
  • 12. Summary and Cache Tables
    ● CREATE TABLE user_rev_dest_summary SELECT * FROM user JOIN reviews JOIN destination WHERE user.userid=reviews.user_id AND reviews.destination_id=destination.destid;
    ● ALTER TABLE user_rev_dest_summary ADD INDEX city_index(city, age); (the equality column city comes first so that the range on age can also use the index)
    ● SELECT destname, COUNT(review_id) AS review_count FROM user_rev_dest_summary WHERE age BETWEEN 20 AND 30 AND city='Bangalore' GROUP BY destid;
    ● SELECT gender, COUNT(review_id) AS review_count FROM user_rev_dest_summary WHERE age BETWEEN 20 AND 30 AND city='Bangalore' GROUP BY gender;
    ● Using the summary table greatly improves query performance, but if the user, destination or reviews tables are updated frequently the summary data may become stale. So we need to decide at what interval to update the summary table.
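One common way to run the periodic refresh is to rebuild the summary into a shadow table and swap it in with RENAME TABLE, so readers never see a half-built table. This is a sketch under assumptions: the `_new` and `_old` table names are made up, and the summary table is assumed to have been created with the CREATE TABLE ... SELECT statement above so that the column order matches.

```sql
-- Clean up leftovers from a previous failed run.
DROP TABLE IF EXISTS user_rev_dest_summary_new, user_rev_dest_summary_old;

-- Empty copy with the same columns and indexes as the live summary.
CREATE TABLE user_rev_dest_summary_new LIKE user_rev_dest_summary;

-- Rebuild the summary from the source tables.
INSERT INTO user_rev_dest_summary_new
SELECT *
FROM user
JOIN reviews     ON user.userid = reviews.user_id
JOIN destination ON reviews.destination_id = destination.destid;

-- Swap both names in a single atomic statement, then drop the old copy.
RENAME TABLE user_rev_dest_summary     TO user_rev_dest_summary_old,
             user_rev_dest_summary_new TO user_rev_dest_summary;
DROP TABLE user_rev_dest_summary_old;
```

A cron job could run this script at whatever interval keeps the staleness acceptable.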
  • 13. Explaining “Explain” ● EXPLAIN output columns :
  • 14. Explaining “Explain”
    ● EXPLAIN output columns: the important columns are type, possible_keys, key, rows and Extra.
    ● EXPLAIN EXTENDED SELECT dest.`Destination_name`, attr.`attractionid`, attr.`attractionname` FROM destination AS dest, attractions AS attr, hotels_by_locality AS hl WHERE dest.`Destination_id`=attr.`destinationid` AND dest.`CountryID`='1' AND dest.`other_destination`='0' AND attr.`active`='1' AND hl.`typeid`=attr.`attractionid`;
    ● Types of “type”, from best to worst:
      1. const - The table has at most one matching row, which is read at the start of the query. const tables are very fast because they are read only once. const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values.
         SELECT * FROM attractions WHERE attractionid=8385;
  • 15. Explaining “Explain”
    ● Types of “type” (contd.):
      2. eq_ref - One row is read from this table for each combination of rows from the previous tables. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.
         SELECT * FROM resort JOIN city USING(CityID); (CityID is the primary key of city).
      3. ref - All rows with matching index values are read from this table for each combination of rows from the previous tables.
         SELECT * FROM resort JOIN city USING(StateID); (index on city.StateID, but many rows in city have the same state id).
      4. fulltext - The join is performed using a FULLTEXT index.
      5. range - Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used.
         SELECT * FROM reviews WHERE date_reviewed BETWEEN '2012-06-30' AND '2012-08-07'; (N.B. the index on date_reviewed is not used when the column is wrapped in a function such as DATE()).
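The note about DATE() can be checked by comparing two plans. A minimal sketch, assuming reviews.date_reviewed is a DATETIME column with an index on it:

```sql
-- Function applied to the indexed column: the index on date_reviewed
-- cannot be used for a range lookup, so expect a scan of all rows.
EXPLAIN
SELECT * FROM reviews
WHERE DATE(date_reviewed) = '2012-06-30';

-- Same rows expressed as a half-open range on the bare column:
-- the index can be used and the type is typically "range".
EXPLAIN
SELECT * FROM reviews
WHERE date_reviewed >= '2012-06-30'
  AND date_reviewed <  '2012-07-01';
```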
  • 16. Explaining “Explain”
    ● Types of “type” (contd.):
      6. index - This join type is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL because the index file is usually smaller than the data file.
         SELECT StateID FROM resort; (covering index on StateID).
      7. ALL - A full table scan is done for each combination of rows from the previous tables. Avoid this by adding an index to the appropriate table.
    ● The common “Extra” values:
      1. Using filesort - MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and a pointer to the row for all rows that match the WHERE clause.
         SELECT resort.Location FROM resort ORDER BY resort.StateID; (Location is not part of any index on StateID, so the rows are sorted in a separate pass).
  • 17. Explaining “Explain”
    ● The common “Extra” values (contd.):
      2. Using index - The column information is retrieved from the table using only information in the index tree, without an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index (a covering index).
         SELECT resort.StateID FROM resort ORDER BY resort.Destination_id; (composite index on (Destination_id, StateID): rows are read in Destination_id order and StateID is taken from the index alone).
      3. Using temporary - To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.
      4. Using where - A WHERE clause is used to restrict which rows to match against the next table or send to the client. Even if you are using an index for all parts of a WHERE clause, you may see Using where if the column can be NULL.
         SELECT resort.Location FROM resort WHERE StateID IS NOT NULL;
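A query that groups by one expression and orders by another is a quick way to see “Using temporary” in practice. The sketch below assumes the resort table from the slides; the exact Extra text depends on the MySQL version and the available indexes.

```sql
-- GROUP BY and ORDER BY use different expressions, so MySQL typically
-- builds a temporary table for the groups and then sorts it
-- (Extra: "Using temporary; Using filesort").
EXPLAIN
SELECT StateID, COUNT(*) AS resorts
FROM resort
GROUP BY StateID
ORDER BY resorts DESC;
```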
  • 18. References
    ● High Performance MySQL by Baron Schwartz, Peter Zaitsev and Vadim Tkachenko.
    ● http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
    ● http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-in
    ● http://www.techrepublic.com/blog/10things/10-ways-to-screw-up-your-database-design/18
    Thank You