SDPHP - Percona Toolkit (It's Basically Magic)


Intro talk on the Percona Toolkit, a set of tools for managing things DBAs and developers need to do with MySQL.


1. Percona Toolkit (It's Basically Magic)
   SDPHP | Business.com | 05-28-14

   Notes:
   Who Am I?
   https://twitter.com/robertswisher
   https://plus.google.com/+RobertSwisher
   https://www.linkedin.com/in/robertswisher
   robert@business.com
2. Percona? Who the hell are they?!

   Notes:
   Formerly known as Maatkit & Aspersa
   Baron Schwartz literally wrote the book on MySQL
   Same code, same developers, new branding
   Source is now on Launchpad, like Percona Server (https://launchpad.net/percona-toolkit)

   What is Percona Toolkit? (You should use Percona Server too!)
   An open-source collection of scripts to help with the common tasks every DBA and developer has to do:
   - Development
   - Profiling
   - Configuration
   - Monitoring
   - Replication
3. Who Uses It?
   Basically anyone running MySQL who has lots of data
   (Or anyone smart and lazy like all of us)

   Notes:
   Installation
   As of this writing, the current version is 2.2.7
   Install via yum, apt, or from source
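   For reference, the package installs mentioned above usually look like this; a minimal sketch, assuming the Percona package repositories are already configured on the host:

   yum install percona-toolkit        # RHEL / CentOS, via the Percona yum repo
   apt-get install percona-toolkit    # Debian / Ubuntu, via the Percona apt repo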
4. Tools

   Notes:
   What Do You Use It For?
   - Schema changes
   - Data archival
   - Query optimization
   - Data consistency
   - Performance debugging
   - General maintenance
5. Schema Changes
   - Before MySQL 5.6, ALTER TABLE always creates a copy of the table
     (except fast index creation in 5.5, or 5.1 with the InnoDB plugin)
   - The table is locked during the change
   - BIG tables = BIG TROUBLE (millions of rows take hours or more)
   - Used to require trickery: ALTER on a slave, promote it to master, ALTER on the old master, promote it to master again
     (gets really ugly with master-master or tiered replication)

   Notes:
   pt-online-schema-change
   Triggers are trouble, but can be handled (dropped by default)
   Foreign keys are trouble, but can be handled (dropped and rebuilt)
   Takes longer than a plain ALTER TABLE (up to 4x)
   ALWAYS backup first
6. pt-online-schema-change
   --dry-run and --execute are mutually exclusive
   Use nohup with --password `cat /tmp/pass`
   Tune --max-lag and --max-load for busy systems

   Example:
   nohup pt-online-schema-change --dry-run \
     --alter 'CHANGE `foo` `foo` varchar(24) COLLATE latin1_bin NULL AFTER `bar`' \
     --password `cat /tmp/pass` --print --nocheck-replication-filters \
     --max-load "Threads_connected:60,Threads_running:20" \
     D=your_db,t=really_big_table &
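   Since --dry-run and --execute are mutually exclusive, the real change is a second run: once the dry run comes back clean, swap --dry-run for --execute and leave everything else the same (shown against the same hypothetical table as above):

   nohup pt-online-schema-change --execute \
     --alter 'CHANGE `foo` `foo` varchar(24) COLLATE latin1_bin NULL AFTER `bar`' \
     --password `cat /tmp/pass` --print --nocheck-replication-filters \
     --max-load "Threads_connected:60,Threads_running:20" \
     D=your_db,t=really_big_table &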
7. Data Archival
   - LOTS of writing to BIG tables = BAD
   - Pruning BIG tables down to only frequently accessed data = GOOD
   - BIG tables are more prone to corruption
   - Deleting from BIG tables = SLOOOOOOW
   - Long-running transactions = REALLY SLOOOOOOOOOW
   - DELETE locks MyISAM tables
8. pt-archiver
   Create the destination table first
   --dry-run exists, but --execute doesn't
   If you use an auto-increment column, edit the schema afterward
   --limit is good for sequential data, but be careful if bouncing around
   Use --progress to track progress
   You may want to archive from a slave, then purge from the master
   ALWAYS backup first
9. pt-archiver

   Slave (copy rows to the archive table, no deletes):
   pt-archiver --dry-run --ask-pass --progress 5000 \
     --statistics --bulk-insert --no-delete \
     --limit 5000 --source D=your_db,t=big_table \
     --dest D=your_db,t=big_table_archive \
     --where "timestamp < '2013-01-01'"

   Master (purge the archived rows):
   pt-archiver --dry-run --ask-pass --progress 5000 \
     --statistics --bulk-delete --purge --limit 5000 \
     --source D=your_db,t=big_table \
     --where "timestamp < '2013-01-01'"
10. Query Optimization

   Notes:
   pt-query-digest
   --filter takes Perl code that must return true for a query to appear in the report
   --limit shows only the top percentage of worst queries
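   A minimal sketch of running the digest against a slow query log (the log path is just an example):

   pt-query-digest /var/log/mysql/mysql-slow.log \
     --limit 95%:20 \
     --filter '$event->{fingerprint} =~ m/^select/'

   --limit 95%:20 reports queries until 95% of total response time is covered, capped at 20 queries; the --filter here keeps only SELECTs.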
11. pt-query-digest sample output

# Query 1: 0.00 QPS, 0.01x concurrency, ID 0x76F9EC92751F314A at byte 80096643
# This item is included in the report because it matches --limit.
# Scores: V/M = 188.68
# Time range: 2012-02-01 09:20:24 to 2013-10-04 10:47:56
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     490
# Exec time     14 384617s      9s   4869s    785s   1292s    385s    833s
# Lock time      2     11s   169us      6s    22ms     6ms   290ms   316us
# Rows sent      0 711.60k       0   4.82k   1.45k   4.27k   1.44k  685.39
# Rows examine  10  30.01G       0 123.80M  62.71M 117.57M  44.73M  75.78M
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read      0 711.60k       0   4.82k   1.45k   4.27k   1.44k  685.39
# Bytes sent     0  21.90M       0 167.52k  45.77k 143.37k  52.07k   8.46k
# Tmp tables     8   1.91k       2       4    3.99    3.89    0.16    3.89
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   3   3.36G       0   7.98M   7.02M   7.65M   1.26M   7.65M
# Query size     0 471.35k     982     986  985.03  964.41       0  964.41
# String:
# Databases    bdc_ccm
# Hosts
# InnoDB trxID 13E9F1B2 (1/0%), 1402493D (1/0%)... 488 more
# Last errno   0
# Users        semuser (488/99%), jackie.lam (1/0%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s  #
#  10s+  ################################################################
# Tables
#    SHOW TABLE STATUS FROM `bdc_ccm` LIKE 'click_log_inbound'\G
#    SHOW CREATE TABLE `bdc_ccm`.`click_log_inbound`\G
#    SHOW TABLE STATUS FROM `bdc_ccm` LIKE 'click_log_outbound_tp'\G
#    SHOW CREATE TABLE `bdc_ccm`.`click_log_outbound_tp`\G
#    SHOW TABLE STATUS FROM `bdc_ccm` LIKE 'click_log_outbound'\G
#    SHOW CREATE TABLE `bdc_ccm`.`click_log_outbound`\G
# EXPLAIN /*!50100 PARTITIONS*/
select date(a.timestamp), a.referrer, count(b.inbound_id), sum(b.cpc), 'AS' as tag
from click_log_inbound a, click_log_outbound_tp b
where a.id = b.inbound_id
  and a.timestamp between '2012-02-08 00:00:00' and '2012-02-09 23:59:59'
  and b.partner like 'adsense' and b.flag = 0
group by date(a.timestamp), a.referrer
union all
select date(a.timestamp), a.referrer, count(b.inbound_id), sum(b.cpc), 'FL' as tag
from click_log_inbound a, click_log_outbound b
where a.id = b.inbound_id
  and a.timestamp between '2012-02-08 00:00:00' and '2012-02-09 23:59:59'
  and flag = 0
group by date(a.timestamp), a.referrer
union all
select date(a.timestamp), a.referrer, count(b.inbound_id), sum(b.cpc), 'TP' as tag
from click_log_inbound a, click_log_outbound_tp b
where a.id = b.inbound_id
  and a.timestamp between '2012-02-08 00:00:00' and '2012-02-09 23:59:59'
  and b.partner in ('capterra', 'bdc_network') and b.flag = 0
group by date(a.timestamp), a.referrer\G

   Notes:
   Data Consistency
   - Replication isn't perfect
   - Replication filters
   - master-master replication
   - 1062 "DUPLICATE KEY ERROR"
   - Server crashes
   - Non-deterministic writes
12. pt-table-checksum
   Requires STATEMENT-based replication for tiered replication
   Replication filters are dangerous because a failed query can break replication
   May want to use nohup since it can be slow
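   A minimal sketch of a checksum run (the host name is hypothetical; percona.checksums is the tool's default --replicate table):

   pt-table-checksum --replicate=percona.checksums --ask-pass h=master1

   Afterward, this query from the tool's documentation can be run on each slave to find tables that differ from the master:

   SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
   FROM percona.checksums
   WHERE (
     master_cnt <> this_cnt
     OR master_crc <> this_crc
     OR ISNULL(master_crc) <> ISNULL(this_crc))
   GROUP BY db, tbl;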
13. pt-table-sync
   --dry-run and --execute are mutually exclusive
   ALWAYS backup first
   In a tiered or master-master replication setup, take extra care to think through what will be done

   Run on the master to sync all slaves:
   pt-table-sync --execute --replicate test.checksum master1

   Run on the master against each slave individually to sync it to the master:
   pt-table-sync --execute --sync-to-master slave1
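   Before trusting --execute, --print shows the SQL the tool would run without changing anything (slave name hypothetical, as above):

   pt-table-sync --print --sync-to-master slave1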
14. Performance Debugging
   - Problems can be random
   - Problems only last for a few seconds; you can't connect and observe fast enough
   - Problems like to happen at odd hours: ETL, rollups, reporting, etc.
   - You can't ALWAYS log on

   Notes:
   pt-stalk
   - Creates a lot of files
   - Output is inspected with pt-sift
15. pt-stalk
   Run as root
   --daemonize  fork and run in the background
   --sleep  length of time to sleep between collects
   --cycles  the number of cycles the variable must be true before collecting
   --variable  Threads_running and Execution_time are good ones
   --disk-bytes-free  don't collect if this threshold is hit
   (best practice is to point --log and --dest at a different disk than the one your data lives on, the same as other MySQL logs)
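   A minimal sketch of a background run; the --dest path is hypothetical, and the --variable, --threshold, --cycles, and --sleep values shown are the tool's defaults:

   pt-stalk --daemonize --dest /data/pt-stalk \
     --variable Threads_running --threshold 25 \
     --cycles 5 --sleep 300 --disk-bytes-free 100M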
16. pt-sift
   Pass it the path to the directory used with pt-stalk's --dest
   (default /var/lib/pt-stalk)
   Interactive program
   Lots of data points collected from the time of the incident
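   Usage is just the directory pt-stalk wrote to:

   pt-sift /var/lib/pt-stalk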
17. General Admin / Maintenance
   pt-slave-restart - try to restart a slave, skipping errors, if replication fails
   pt-summary - gives a general summary of the MySQL instance
   pt-upgrade - tests logged queries against a new MySQL version
   pt-config-diff - show a formatted diff of my.cnf files
   pt-heartbeat - update a heartbeat table on the master so slaves can measure replication lag
   pt-kill - kill MySQL threads according to filters
   pt-index-usage - report on index structure and usage
   pt-variable-advisor - looks at runtime variables and makes suggestions

   Notes:
   Percona Cloud Tools
   Sign up for the beta at https://cloud.percona.com
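   As one example from the list above, a pt-kill sketch that prints (rather than kills) queries that have been running for over a minute, checking every ten seconds:

   pt-kill --busy-time 60 --interval 10 --print

   Swapping --print for --kill makes it actually terminate the matching threads.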
18. QUESTIONS?
