Beyond PHP :
It's not (just) about the code



                              Wim Godden
                            Cu.be Solutions
Who am I ?
 Wim Godden (@wimgtr)
 Founder of Cu.be Solutions (http://cu.be)
 Open Source developer since 1997
 Developer of OpenX
 Zend Certified Engineer
 Zend Framework Certified Engineer
 MySQL Certified Developer
 Speaker at PHP and Open Source conferences
Cu.be Solutions ?
 Open source consultancy
 PHP-centered
 High-speed redundant network (BGP, OSPF, VRRP)
 High scalability development
    Nginx + extensions
    MySQL Cluster


 Projects :
    mostly IT & Telecom companies
    lots of public-facing apps/sites
Who are you ?
 Developers ?

 Anyone set up a MySQL master-slave ?

 Anyone set up a site/app on separate web and database servers ?
 → How much traffic between them ?
The topic
 Things we take for granted
 Famous last words : "It should work just fine"

 Works fine today
 → might fail tomorrow

 Most common mistakes

 PHP code ↔ PHP ecosystem

 How-to & How-NOT-to
It starts with...
  … code !




  First up : database
Database queries – complexity
       SELECT DISTINCT n.nid, n.uid, n.title, n.type, e.event_start, e.event_start AS
       event_start_orig, e.event_end, e.event_end AS event_end_orig, e.timezone,
       e.has_time, e.has_end_date, tz.offset AS offset, tz.offset_dst AS offset_dst,
       tz.dst_region, tz.is_dst, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst,
       tz.offset) HOUR_SECOND AS event_start_utc, e.event_end - INTERVAL
       IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND AS event_end_utc,
       e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND +
       INTERVAL 0 SECOND AS event_start_user, e.event_end - INTERVAL IF(tz.is_dst,
       tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0 SECOND AS
       event_end_user, e.event_start - INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset)
       HOUR_SECOND + INTERVAL 0 SECOND AS event_start_site, e.event_end -
       INTERVAL IF(tz.is_dst, tz.offset_dst, tz.offset) HOUR_SECOND + INTERVAL 0
       SECOND AS event_end_site, tz.name as timezone_name FROM node n INNER
       JOIN event e ON n.nid = e.nid INNER JOIN event_timezones tz ON tz.timezone =
       e.timezone INNER JOIN node_access na ON na.nid = n.nid LEFT JOIN
       domain_access da ON n.nid = da.nid LEFT JOIN node i18n ON n.tnid > 0 AND
       n.tnid = i18n.tnid AND i18n.language = 'en' WHERE (na.grant_view >= 1 AND
       ((na.gid = 0 AND na.realm = 'all'))) AND ((da.realm = "domain_id" AND da.gid = 4)
       OR (da.realm = "domain_site" AND da.gid = 0)) AND (n.language ='en' OR
       n.language ='' OR n.language IS NULL OR n.language = 'is' AND i18n.nid IS NULL)
       AND ( n.status = 1 AND ((e.event_start >= '2010-01-31 00:00:00' AND
       e.event_start <= '2010-03-01 23:59:59') OR (e.event_end >= '2010-01-31 00:00:00'
       AND e.event_end <= '2010-03-01 23:59:59') OR (e.event_start <= '2010-01-31
       00:00:00' AND e.event_end >= '2010-03-01 23:59:59')) ) GROUP BY n.nid HAVING
       (event_start >= '2010-02-01 00:00:00' AND event_start <= '2010-02-28 23:59:59')
       OR (event_end >= '2010-02-01 00:00:00' AND event_end <= '2010-02-28 23:59:59')
       OR (event_start <= '2010-02-01 00:00:00' AND event_end >= '2010-02-28
       23:59:59') ORDER BY event_start ASC;
Database - indexing
 'select id from stock where status = 2 order by qty'
    → composite index on (status, qty)
 'select id from stock where status > 2 order by qty'
    → composite index on (status, qty) ?
    → No : the range scan on status stops further use of the composite index
    → separate indexes on status and qty (see the sketch below)
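A hedged sketch of the two cases above (table and index names are assumed, not from the slides) :

    -- composite index : serves 'status = 2 ... order by qty' (filter + sort)
    ALTER TABLE stock ADD INDEX idx_status_qty (status, qty);

    -- for 'status > 2 ... order by qty' the range scan on status prevents the
    -- index from also delivering the qty order, so index qty separately
    ALTER TABLE stock ADD INDEX idx_qty (qty);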
Database - indexing
 Indexes make the database faster
    → Let's index everything !
    → DON'T :
       Insert/update/delete → Index modification
       Each query → evaluation of all indexes




 "Relational schema design is based on data
                         but index design is based on queries"
                                        (Bill Karwin, Percona)
Databases – detecting problematic queries
 Slow query log
    → SET GLOBAL slow_query_log = ON;
 Queries not using indexes
    → In my.cnf/my.ini : 'log_queries_not_using_indexes'
 General query log
    → SET GLOBAL general_log = ON;
    → Turn it off quickly !
 Percona Toolkit (formerly Maatkit)
    pt-query-digest
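A typical way to feed pt-query-digest, as a sketch (log path assumed) : lower the slow-log threshold, let it collect, then digest the file.

    -- enable the slow query log and catch anything over 1 second
    SET GLOBAL slow_query_log = ON;
    SET GLOBAL long_query_time = 1;

    pt-query-digest /var/log/mysql/mysql-slow.log > digest-report.txt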
Databases - pt-query-digest




#   Profile
#   Rank Query ID           Response time    Calls R/Call Apdx V/M    Item
#   ==== ================== ================ ===== ======= ==== ===== ==========
#      1 0x543FB322AE4330FF 16526.2542 62.0% 1208 13.6806 1.00 0.00 SELECT output_option
#      2 0xE78FEA32E3AA3221     0.8312 10.3% 6412 0.0001 1.00 0.00 SELECT poller_output poller_item
#      3 0x211901BF2E1C351E     0.6811 8.4% 6416 0.0001 1.00 0.00 SELECT poller_time
#      4 0xA766EE8F7AB39063     0.2805 3.5%    149 0.0019 1.00 0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships
#      5 0xA3EEB63EFBA42E9B     0.1999 2.5%     51 0.0039 1.00 0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary
#      6 0x94350EA2AB8AAC34     0.1956 2.4%     89 0.0022 1.00 0.01 UPDATE wp_options
#   MISC 0xMISC                 0.8137 10.0% 3853 0.0002    NS   0.0 <147 ITEMS>
Databases - pt-query-digest
        # Query 2: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
        # This item is included in the report because it matches --limit.
        # Scores: Apdex = 1.00 [1.0], V/M = 0.00
        # Query_time sparkline: |    _^    |
        # Time range: 2011-12-28 18:42:47 to 19:03:10
        # Attribute    pct   total      min    max     avg     95% stddev median
        # ============ === ======= ======= ======= ======= ======= ======= =======
        # Count           1    312
        # Exec time      50     4s      5ms   25ms    13ms    20ms      4ms    12ms
        # Lock time       3   32ms     43us  163us   103us   131us     19us    98us
        # Rows sent      59 62.41k      203    231 204.82 202.40       3.99 202.40
        # Rows examine 13 73.63k        238    296 241.67 246.02     10.15 234.30
        # Rows affecte    0       0        0      0      0       0        0       0
        # Rows read      59 62.41k      203    231 204.82 202.40       3.99 202.40
        # Bytes sent     53 24.85M 46.52k 84.36k 81.56k 83.83k       7.31k 79.83k
        # Merge passes    0       0        0      0      0       0        0       0
        # Tmp tables      0       0        0      0      0       0        0       0
        # Tmp disk tbl    0       0        0      0      0       0        0       0
        # Tmp tbl size    0       0        0      0      0       0        0       0
        # Query size      0 21.63k        71     71     71      71        0      71
        # InnoDB:
        # IO r bytes      0       0        0      0      0       0        0       0
        # IO r ops        0       0        0      0      0       0        0       0
        # IO r wait       0       0        0      0      0       0        0       0
        # pages distin 40 11.77k          34     44  38.62   38.53     1.87  38.53
        # queue wait      0       0        0      0      0       0        0       0
        # rec lock wai    0       0        0      0      0       0        0       0
        # Boolean:
        # Full scan    100% yes,    0% no
        # String:
        # Databases    wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more
        # Hosts
        # InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more
        # Last errno   0
        # Users        wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more
        # Query_time distribution
        #   1us
        # 10us
        # 100us
        #   1ms
        # 10ms ################################################################
        # 100ms
        #    1s
        # 10s+
        # Tables
        #    SHOW TABLE STATUS FROM `wp_blog_one` LIKE 'wp_options'\G
        #    SHOW CREATE TABLE `wp_blog_one`.`wp_options`\G
        # EXPLAIN /*!50100 PARTITIONS*/
        SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G
Databases – pt-query-digest – Digest UI
Databases – next step : explain
 explain <query>
 "How will MySQL execute the query"
Databases – next step : explain
 +----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
 | id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
 +----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
 |  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 299809 | Using where |
 +----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+


 +----+-------------+------------+-------+-------------------------------+---------+---------+-------+------+-------+
 | id | select_type | table      | type  | possible_keys                 | key     | key_len | ref   | rows | Extra |
 +----+-------------+------------+-------+-------------------------------+---------+---------+-------+------+-------+
 |  1 | SIMPLE      | itdevice   | const | PRIMARY,fk_device_devicetype1 | PRIMARY | 4       | const |    1 |       |
 |  1 | SIMPLE      | devicetype | const | PRIMARY                       | PRIMARY | 4       | const |    1 |       |
 +----+-------------+------------+-------+-------------------------------+---------+---------+-------+------+-------+
Databases – next step : explain
 explain <query>
 "How will MySQL execute the query"
 Shows :
    Indexes available
    Indexes used (do you see one ?)
    Number of rows scanned
    Type of lookup
        'system', 'const' and 'ref' = good
        'ALL' = bad
    Extra info
        Using index = good (query answered from the index alone)
        Using filesort = usually bad
        Using where = rows filtered after retrieval → bad when combined with ALL
Databases – when to use / not to use
 Good at :
    Fetching data
    Storing data
    Searching through data
 Bad at :
    select `someField` from `bigTable` where crc32(`field`) = "something"
    → full table scan
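One possible workaround, sketched here (column and index names are assumed) : precompute the checksum into an indexed column so the function no longer runs per row.

    ALTER TABLE bigTable ADD COLUMN field_crc INT UNSIGNED NOT NULL DEFAULT 0,
                         ADD INDEX idx_field_crc (field_crc);
    UPDATE bigTable SET field_crc = CRC32(field);

    -- the comparison now hits an index instead of forcing a full table scan
    SELECT someField FROM bigTable WHERE field_crc = CRC32('something');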
For / foreach




           $customers = CustomerQuery::create()
               ->filterByState('SC')
               ->find();
           foreach ($customers as $customer) {
               $contacts = ContactsQuery::create()
                   ->filterByCustomerid($customer->getId())
                   ->find();
               foreach ($contacts as $contact) {
                    doSomeStuffWith($contact);
               }
           }
Joins



        // assuming $db is a mysqli connection (the original slide used the
        // deprecated mysql_* extension and had no error handling)
        $contacts = $db->query("
            select
                 contact.*
            from
                 customer
                 join contact
                     on contact.customerid = customer.id
            where
                 customer.state = 'SC'
            ");
        while ($contact = $contacts->fetch_array()) {
            doSomeStuffWith($contact);
        }



                or the ORM equivalent
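With Propel (which the earlier foreach example appears to use), that ORM equivalent might look roughly like this : a sketch only, assuming the generated ContactsQuery has a Customer relation.

           $contacts = ContactsQuery::create()
               ->useCustomerQuery()
                   ->filterByState('SC')
               ->endUse()
               ->find();
           foreach ($contacts as $contact) {
               doSomeStuffWith($contact);
           }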
Better...
 10,001 queries → 1 query
 Sadly : people still produce code with query loops

 Usually :
    Growth not anticipated
    Internal app → Public app
The origins of this talk
 Customers :
    Projects we built
    Projects we didn't build, but got pulled into
        Fixes
        Changes
        Infrastructure migration



 15 years of 'how to cause mayhem with a few lines of code'
Client X
 Jobs search site
 Monitor job views :
    Daily hits
    Weekly hits
    Monthly hits
    Which user saw which job
Client X
 Originally : when user viewed job details
 Now : when job is in search result

 Search for 'php' → 50 jobs = 50 jobs to be updated
 → 50 updates for shown_today
 → 50 updates for shown_week
 → 50 updates for shown_month
 → 50 inserts for shown_user
Client X : the code

foreach ($jobs as $job) {
    $db->query("
        insert into shown_today(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_week(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_month(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    // `when` is a reserved word in MySQL, hence the backticks
    $db->query("
        insert into shown_user(jobId, userId, `when`)
        values(" . $job['id'] . ", " . $user['id'] . ", now())
    ");
}
Client X : the graph
Client X : the numbers
 600-1000 updates/sec (peaks up to 1600)
 400-1000 updates/sec (peaks up to 2600)
 16 core machine
Client X : panic !
 Mail : "MySQL slave is more than 5 minutes behind master"

 We set it up → who did they blame ?

 Wait a second !
Client X : what's causing those peaks ?
Client X : possible cause ?
 Code changes ?
    → According to developers : none
 Action : turn on general log, analyze with pt-query-digest
    → more than 50-fold increase in queries
    → Developers : 'Oops we did make a change'




 After 3 days : 2.5 days behind
 Every hour : 50 min extra lag
Client X : But why is the slave lagging ?



      (Replication diagram)
      Master : the binlog dump thread reads the file master-bin-xxxx.log
      → the slave I/O thread copies it into the slave's own master-bin-xxxx.log
      → the slave SQL thread replays it

      Master                                                               Slave
Client X : Master
Client X : Slave
Client X : fix ?

foreach ($jobs as $job) {
    $db->query("
        insert into shown_today(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_week(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_month(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_user(jobId, userId, `when`)
        values(" . $job['id'] . ", " . $user['id'] . ", now())
    ");
}
Client X : the code change
                       $todayQuery = "
                           insert into shown_today(
                               jobId,
                               number
                           ) values ";

                       foreach ($jobs as $job) {
                           $todayQuery .= "(" . $job['id'] . ", 1),";
                       }

                       // strip the trailing comma (offset 0, length -1)
                       $todayQuery = substr($todayQuery, 0, -1);

                       $todayQuery .= "
                           on duplicate key
                               update
                                   number = number + 1
                           ";

                       $db->query($todayQuery);



Result : insert into shown_today values (5, 1), (8, 1), (12, 1), (18, 1), ...

                          Careful : max_allowed_packet !
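To stay under max_allowed_packet on very large result sets, one hedged approach is to chunk the grouped insert (chunk size of 1000 is an assumption, tune it to your packet limit) :

    foreach (array_chunk($jobs, 1000) as $chunk) {
        $values = array();
        foreach ($chunk as $job) {
            $values[] = "(" . (int)$job['id'] . ", 1)";
        }
        $db->query("
            insert into shown_today(jobId, number)
            values " . implode(", ", $values) . "
            on duplicate key update number = number + 1
        ");
    }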
Client X : the chosen solution

$db->autocommit(false);
foreach ($jobs as $job) {
    $db->query("
        insert into shown_today(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_week(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_month(jobId, number)
        values(" . $job['id'] . ", 1)
        on duplicate key update number = number + 1
    ");
    $db->query("
        insert into shown_user(jobId, userId, `when`)
        values(" . $job['id'] . ", " . $user['id'] . ", now())
    ");
}
$db->commit();
Client X : conclusion
 For loops are bad (we already knew that)
 Add master/slave and it gets much worse

 Use transactions : they can provide a huge performance increase

 Result : slave caught up 5 days later
Database → Network
 Customer Y
 Top 10 site in Belgium
 Growing rapidly
 At peak traffic :
    Inexplicable latency on database
    Load on webservers : minimal
    Load on database servers : acceptable
Client Y : the network
Client Y : the network




       (Network diagram, traffic volumes : 60GB / 700GB / 700GB)
Client Y : network overload
 Cause : Drupal hooks → retrieving data that was not needed
 Only load data you actually need
 Don't know at the start ? → Use lazy loading

 Caching :
    Same story
    Memcached/Redis are fast
    But : data still needs to cross the network
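A minimal lazy-loading sketch (hypothetical class, not from the slides) : the related data only crosses the network when someone actually asks for it.

    class Customer
    {
        private $id;
        private $contacts = null;    // not fetched when the object is loaded

        public function getContacts()
        {
            if ($this->contacts === null) {    // first access → run the query now
                $this->contacts = ContactsQuery::create()
                    ->filterByCustomerid($this->id)
                    ->find();
            }
            return $this->contacts;
        }
    }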
Network trouble : more than just traffic
 Customer Z
 150,000 visits/day



 News ticker :
    XML feed from other site (owned by same customer)
    Cached for 15 min
Customer Z – fetching the feed




     if (filectime(APP_DIR . '/tmp/ScrambledSiteName.xml') < time() - 900) {
         unlink(APP_DIR . '/tmp/ScrambledSiteName.xml');
         file_put_contents(
             APP_DIR . '/tmp/ScrambledSiteName.xml',
             file_get_contents('http://www.scrambledsitename.be/xml/feed.xml')
         );
     }
     $xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/ScrambledSiteName.xml');




                       What's wrong with this code ?
Customer Z – no feed without the source




                        Feed source
Customer Z – no feed without the source




                        Feed source
Customer Z : timeout
 default_socket_timeout : 60 sec by default
 Each visitor : 60 sec wait time
 People keep hitting refresh → more load
 More active connections → more load
 Apache hits maximum connections → entire site down
Customer Z : timeout fix



$context = stream_context_create(
    array(
        'http' => array(
            'timeout' => 5
        )
    )
);

if (filectime(APP_DIR . '/tmp/ScrambledSiteName.xml') < time() - 900) {
    unlink(APP_DIR . '/tmp/ScrambledSiteName.xml');
    file_put_contents(
        APP_DIR . '/tmp/ScrambledSiteName.xml',
        file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context)
    );
}
$xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/ScrambledSiteName.xml');
Customer Z : don't delete from cache



$context = stream_context_create(
    array(
        'http' => array(
            'timeout' => 5
        )
    )
);

if (filectime(APP_DIR . '/tmp/ScrambledSiteName.xml') < time() - 900) {
    unlink(APP_DIR . '/tmp/ScrambledSiteName.xml');
    file_put_contents(
        APP_DIR . '/tmp/ScrambledSiteName.xml',
        file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context)
    );
}
$xmlfeed = ParseXmlFeed(APP_DIR . '/tmp/ScrambledSiteName.xml');
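A sketch of the "don't delete" idea (temp-file approach assumed, helper names as on the slide) : keep serving the stale file and only replace it once a fresh copy has actually arrived.

    $cacheFile = APP_DIR . '/tmp/ScrambledSiteName.xml';
    if (!file_exists($cacheFile) || filemtime($cacheFile) < time() - 900) {
        $feed = file_get_contents('http://www.scrambledsitename.be/xml/feed.xml', false, $context);
        if ($feed !== false) {                         // fetch failed ? keep the stale cache
            file_put_contents($cacheFile . '.tmp', $feed);
            rename($cacheFile . '.tmp', $cacheFile);   // atomic on the same filesystem
        }
    }
    $xmlfeed = ParseXmlFeed($cacheFile);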
Network resources
 Use timeouts for all :
    fopen
    curl
    SOAP
    …
 Data source trusted ?
    → setup a webservice
    → let them push updates when their feed changes
    → less load on data source
    → no timeout issues
 Add logging → early detection
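Hedged examples of explicit timeouts for the other clients (URLs assumed) :

    // curl : separate connect and total-transfer timeouts
    $ch = curl_init('http://www.scrambledsitename.be/xml/feed.xml');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $feed = curl_exec($ch);
    curl_close($ch);

    // SOAP : limit how long connecting to the endpoint may take
    $client = new SoapClient('http://www.scrambledsitename.be/service?wsdl',
                             array('connection_timeout' => 5));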
Logging
 Logging = good
 Logging in PHP using fopen
 → bad idea : locking issues
 → Use file_put_contents($filename, $data, FILE_APPEND)
 For Firefox : FirePHP (add-on for Firebug)
 Debug logging = bad on production

 Watch your logs !
 Don't log on slow disks → I/O bottlenecks
File system : I/O bottlenecks
 Causes :
    Excessive writes (database updates, logfiles, swapping, …)
    Excessive reads (non-indexed database queries, swapping, small file
    system cache, …)
 How to detect ?
    top
    iostat




 See iowait ? Stop worrying about PHP, fix the I/O problem !
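A quick way to check, assuming the sysstat package is installed :

    iostat -x 5    # extended stats every 5 seconds ; high %iowait / %util = disk-bound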
File system
 Worst of all : NFS
    PHP files → lstat calls
    Templates → same
    Sessions
    → locking issues
    → corrupt data
    → store sessions in database, Memcached, Redis, ...
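A php.ini sketch for moving sessions off NFS, assuming the memcached extension is installed :

    session.save_handler = memcached
    session.save_path    = "127.0.0.1:11211"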
Much more than code



              (Diagram : User → Network → Webserver → DB server,
               with the XML feed coming in over the network)
Questions ?
Contact
 Twitter          @wimgtr
 Web              http://techblog.wimgodden.be
 Slides           http://www.slideshare.net/wimg
 E-mail           wim.godden@cu.be



                          Please...
           Rate my talk : http://spkr8.com/t/21141
Thanks !
               Please...
Rate my talk : http://spkr8.com/t/21141

Editor's Notes

  • #5 5kbit/sec or 100Mbit/sec ?
  • #7 Let's talk about code. Without it, we don't exist. What are the most common mistakes in the ecosystem ? Let's start with the database.
  • #12 Time spent per query pattern ; how many queries match that query pattern.
  • #19 Get back to what I said. Lots of people use an ORM : easier, no need to write queries, object-oriented. But people start doing this. Imagine 10,000 customers → 10,001 queries.
  • #20 Not the best code : uses the deprecated mysql extension, no error handling.
  • #31 Master : 16 CPU cores (12 for SQL, 1 for the binlog dump thread, rest for the system). Slave : 16 CPU cores (1 for the slave I/O thread, 1 for the slave SQL thread).
  • #35 Grouping works fine, but : maximum size of the string ? PHP = no limit, MySQL = max_allowed_packet.
  • #36 All in a single commit. Note : a transaction has a maximum size. Possible : combine with the previous solution.
  • #39 Took a few moments to figure out. No network monitoring → iptraf → 100Mbit/sec limit → packets dropped → connections dropped. Customer : upgrade the switch. Us : why 100Mbit/sec ?
  • #41 Databases → network. What other network-related issues are there ?
  • #45 Server on which the feed is located : crashed. Fine for a few minutes (cache). After 15 minutes : file_get_contents uses default_socket_timeout.
  • #47 Better, not perfect. What else is wrong ? Multiple visitors hit the expiring cache → file deleted → XML feed hit a lot.
  • #48 Better, not perfect. What else is wrong ? Multiple visitors hit the expiring cache → file deleted → XML feed hit a lot.
  • #53 How do you treat your data : where do you get it, how long did you have to wait for it, how is it transported, how is it processed ? Minimize the amount of data retrieved, transported, processed, and sent to the DB and to users.