Divide And Be Conquered? 04/23/2009  Brooks Johnson [email_address]
MySQL Conference presentation on the benefits and drawbacks of partitioning

Published in: Technology
Divide and Be Conquered?

  1. Divide And Be Conquered? 04/23/2009 Brooks Johnson [email_address]
  2. Performance not guaranteed
      - Does not automatically improve performance
      - Often degrades performance
      - In theory could improve concurrency
      - First cover InnoDB
      - Then MyISAM
      - Finish up with admin improvements
  8. No Partition - Classic Primary Key
      CREATE TABLE `test`.`SaleO` (
        `orderId` int(11) NOT NULL,
        `customerId` int(11) NOT NULL,
        `productId` int(11) NOT NULL,
        `productBigId` int(11) NOT NULL,
        `unit` int(11) NOT NULL,
        `purchaseAmount` decimal(16,2) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`orderId`),
        KEY `idx_sale_purchasedate` (`purchaseDate`),
        KEY `idx_sale_product` (`productId`),
        KEY `idx_sale_customer` (`customerId`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  9. Partitioning by Date (Month)
      CREATE TABLE `test`.`SaleP` (
        `orderId` int(11) NOT NULL,
        `customerId` int(11) NOT NULL,
        `productId` int(11) NOT NULL,
        `productBigId` int(11) NOT NULL,
        `unit` int(11) NOT NULL,
        `purchaseAmount` decimal(16,2) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`purchaseDate`,`orderId`),
        KEY `idx_sale_product` (`productId`),
        KEY `idx_sale_order` (`orderId`),
        KEY `idx_SaleP_orderId` (`orderId`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8
      PARTITION BY RANGE (to_days(purchaseDate))
      (PARTITION p0 VALUES LESS THAN (730882) ENGINE = InnoDB,
       PARTITION p1 VALUES LESS THAN (730910) ENGINE = InnoDB,
       PARTITION p2 VALUES LESS THAN (730941) ENGINE = InnoDB,
       PARTITION p3 VALUES LESS THAN (730971) ENGINE = InnoDB,
       PARTITION p4 VALUES LESS THAN (731002) ENGINE = InnoDB,
       PARTITION p5 VALUES LESS THAN (731032) ENGINE = InnoDB,
       PARTITION p6 VALUES LESS THAN (731063) ENGINE = InnoDB,
       PARTITION p7 VALUES LESS THAN (731094) ENGINE = InnoDB,
       PARTITION p8 VALUES LESS THAN (731124) ENGINE = InnoDB,
       PARTITION p9 VALUES LESS THAN (731155) ENGINE = InnoDB,
       PARTITION p10 VALUES LESS THAN (731185) ENGINE = InnoDB,
       PARTITION p11 VALUES LESS THAN (731216) ENGINE = InnoDB,
       PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
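      The integer bounds in the partition definition are TO_DAYS() day numbers, which are hard to read at a glance. A minimal sketch for decoding them with MySQL's built-in FROM_DAYS() and TO_DAYS() functions, using the values from the slide (the 2002-02-01 date in the second query is only an illustrative choice):

      ```sql
      -- Decode the first and last explicit range bounds of SaleP.
      SELECT FROM_DAYS(730882) AS p0_upper_bound,    -- 2001-02-01
             FROM_DAYS(731216) AS p11_upper_bound;   -- 2002-01-01

      -- Going the other way: compute the bound for a month starting 2002-02-01.
      SELECT TO_DAYS('2002-02-01') AS next_bound;
      ```

      Read this way, p0 holds rows before February 2001, p1 through p11 hold one month each (February through December 2001), and p12 catches everything from January 2002 onward.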
  10. Partition by Order
      CREATE TABLE `test`.`SaleOPO` (
        `orderId` int(11) NOT NULL,
        `customerId` int(11) NOT NULL,
        `productId` int(11) NOT NULL,
        `productBigId` int(11) NOT NULL,
        `unit` int(11) NOT NULL,
        `purchaseAmount` decimal(16,2) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`orderId`),
        KEY `idx_sale_purchasedate` (`purchaseDate`),
        KEY `idx_sale_product` (`productId`),
        KEY `idx_sale_customer` (`customerId`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8
      PARTITION BY HASH (orderID) PARTITIONS 12
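      With PARTITION BY HASH on an integer expression, MySQL chooses the partition as the expression modulo the number of partitions. A small sketch of that arithmetic for the 12-partition table above (the sample order IDs are arbitrary):

      ```sql
      -- HASH partitioning on an integer column: partition number = MOD(orderId, 12).
      SELECT MOD(1, 12)  AS partition_for_order_1,   -- 1, i.e. partition p1
             MOD(12, 12) AS partition_for_order_12,  -- 0, i.e. partition p0
             MOD(25, 12) AS partition_for_order_25;  -- 1, i.e. partition p1
      ```

      Consecutive order IDs therefore spread round-robin across all 12 partitions, while the rows for any one month end up scattered over every partition.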
  11. Sum Month without Partitioning
      explain partitions select sum(purchaseCost) from SaleO
      where purchaseDate >= '2001-12-01' and purchaseDate < '2002-01-01'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleO
         partitions: NULL
               type: range
      possible_keys: idx_sale_purchasedate
                key: idx_sale_purchasedate
            key_len: 8
                ref: NULL
               rows: 18219104
              Extra: Using where
      1 row in set (0.00 sec)
      - Takes 34 seconds to execute
  12. Sum Month with Date Partitions
      explain partitions select sum(purchaseCost) from SaleP
      where purchaseDate >= '2001-12-01' and purchaseDate < '2002-01-01'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleP
         partitions: p11,p12
               type: range
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 8
                ref: NULL
               rows: 4238758
              Extra: Using where
      1 row in set (0.00 sec)
      - Takes 20 seconds to run (34 seconds non-partitioned)
      - The improvement is due to clustering by date, not to partitioning
  14. Sum Month with Order Partitions
      mysql> explain partitions select sum(purchaseCost) from SaleOPO
          -> where purchaseDate >= '2001-12-01'
          -> and purchaseDate < '2002-01-01'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleOPO
         partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11
               type: range
      possible_keys: idx_sale_purchasedate
                key: idx_sale_purchasedate
            key_len: 8
                ref: NULL
               rows: 13863552
              Extra: Using where
      1 row in set (0.00 sec)
      - Takes 39 seconds to run
      - A bit longer than the non-partitioned table (34 seconds)
      - Partitioning might have caused a small overhead in this case
  17. Select orders 10,000 times
      explain partitions select unit from SaleO where orderId = 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleO
         partitions: NULL
               type: const
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 4
                ref: const
               rows: 1
              Extra:
      1 row in set (0.01 sec)
      - 55 seconds for the non-partitioned table
      - One process randomly selecting 10,000 orders
  19. Select orders 10,000 times
      explain partitions select unit from SaleP where orderId = 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleP
         partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12
               type: ref
      possible_keys: idx_sale_order,idx_SaleP_orderId
                key: idx_sale_order
            key_len: 4
                ref: const
               rows: 13
              Extra:
      1 row in set (0.17 sec)
      - 139 seconds for the date-partitioned table
      - Over twice as long as the non-partitioned table (55 seconds)
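      The lookup above touches all 13 partitions because its predicate does not mention purchaseDate, the partitioning column. As a hedged sketch, assuming the application already knows which month an order falls in (an assumption, not something the slides state), adding a date range to the predicate gives the optimizer a chance to prune partitions:

      ```sql
      -- Hypothetical variant of the lookup: the extra purchaseDate range is an
      -- assumption about what the caller knows, not part of the original benchmark.
      EXPLAIN PARTITIONS
      SELECT unit
      FROM SaleP
      WHERE orderId = 1
        AND purchaseDate >= '2001-12-01'
        AND purchaseDate <  '2002-01-01'\G
      ```

      The partitions column of the output shows which partitions would be read. On MySQL 5.7 and later, plain EXPLAIN already reports that column; the PARTITIONS keyword is deprecated there and was removed in 8.0.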
  21. Non-Partitioned Index
  22. Partitioned Index
  23. Select orders 10,000 times
      explain partitions select unit from SaleOPO where orderId = 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleOPO
         partitions: p1
               type: const
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 4
                ref: const
               rows: 1
              Extra:
      1 row in set (0.02 sec)
      - 57 seconds – about the same as no partition (2 seconds slower)
      - Partitioning might result in a small overhead for each query
  25. Insert 800,000 rows into table
      - 8 processes each inserting 100,000 rows in parallel
      - Non-partitioned – 173 seconds
      - Partitioned by date – 300 seconds
      - All the data was added to the last date partition
      - Partitioned by order – 237 seconds
      - Data was added to all 12 partitions
      - Partitioning “seemed” to add overhead
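      To check how rows are actually spread across partitions after a load like this, one option is the INFORMATION_SCHEMA.PARTITIONS table. A minimal sketch, assuming the test schema and table names from the slides; for InnoDB, TABLE_ROWS is only an estimate:

      ```sql
      -- Approximate row counts per partition for the two partitioned InnoDB tables.
      SELECT TABLE_NAME, PARTITION_NAME, TABLE_ROWS
      FROM INFORMATION_SCHEMA.PARTITIONS
      WHERE TABLE_SCHEMA = 'test'
        AND TABLE_NAME IN ('SaleP', 'SaleOPO')
      ORDER BY TABLE_NAME, PARTITION_ORDINAL_POSITION;
      ```

      Running it before and after the insert test would show the date-partitioned table growing only in its last partition and the order-partitioned table growing in all 12.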
  32. MyISAM without Partitioning
      CREATE TABLE `test`.`SaleI` (
        `orderId` int(11) NOT NULL AUTO_INCREMENT,
        `customerId` int(11) NOT NULL,
        `productId` int(11) NOT NULL,
        `productBigId` int(11) NOT NULL,
        `unit` int(11) NOT NULL,
        `purchaseAmount` decimal(16,2) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`orderId`),
        KEY `idx_sale_product` (`productId`),
        KEY `idx_sale_customer` (`customerId`),
        KEY `idx_SaleI_purchaseDate` (`purchaseDate`)
      ) ENGINE=MyISAM AUTO_INCREMENT=121900002 DEFAULT CHARSET=utf8;
  33. MyISAM Partitioning by Date
      CREATE TABLE `test`.`SaleIP` (
        `orderId` int(11) NOT NULL AUTO_INCREMENT,
        `customerId` int(11) NOT NULL,
        `productId` int(11) NOT NULL,
        `productBigId` int(11) NOT NULL,
        `unit` int(11) NOT NULL,
        `purchaseAmount` decimal(16,2) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`purchaseDate`,`orderId`),
        KEY `idx_sale_order` (`orderId`),
        KEY `idx_sale_product` (`productId`),
        KEY `idx_saleIP_customer` (`customerId`)
      ) ENGINE=MyISAM AUTO_INCREMENT=122200002 DEFAULT CHARSET=utf8
      PARTITION BY RANGE (to_days(purchaseDate))
      (PARTITION p0 VALUES LESS THAN (730882) ENGINE = MyISAM,
       PARTITION p1 VALUES LESS THAN (730910) ENGINE = MyISAM,
       PARTITION p2 VALUES LESS THAN (730941) ENGINE = MyISAM,
       PARTITION p3 VALUES LESS THAN (730971) ENGINE = MyISAM,
       PARTITION p4 VALUES LESS THAN (731002) ENGINE = MyISAM,
       PARTITION p5 VALUES LESS THAN (731032) ENGINE = MyISAM,
       PARTITION p6 VALUES LESS THAN (731063) ENGINE = MyISAM,
       PARTITION p7 VALUES LESS THAN (731094) ENGINE = MyISAM,
       PARTITION p8 VALUES LESS THAN (731124) ENGINE = MyISAM,
       PARTITION p9 VALUES LESS THAN (731155) ENGINE = MyISAM,
       PARTITION p10 VALUES LESS THAN (731185) ENGINE = MyISAM,
       PARTITION p11 VALUES LESS THAN (731216) ENGINE = MyISAM,
       PARTITION p12 VALUES LESS THAN MAXVALUE ENGINE = MyISAM)
  34. MyISAM Order Partitioned
      CREATE TABLE `test`.`SaleIPO` (
        `orderId` int(11) NOT NULL AUTO_INCREMENT,
        `customerId` int(11) NOT NULL,
        `productId` int(11) NOT NULL,
        `productBigId` int(11) NOT NULL,
        `unit` int(11) NOT NULL,
        `purchaseAmount` decimal(16,2) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`orderId`),
        KEY `idx_sale_purchaseDate` (`purchaseDate`),
        KEY `idx_sale_product` (`productId`),
        KEY `idx_sale_customer` (`customerId`)
      ) ENGINE=MyISAM AUTO_INCREMENT=120900003 DEFAULT CHARSET=utf8
      PARTITION BY HASH (orderID) PARTITIONS 12
  35. Sum Month without Partitioning
      explain partitions select sum(purchaseCost) from SaleI
      where purchaseDate >= '2001-12-01' and purchaseDate < '2002-01-01'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleI
         partitions: NULL
               type: range
      possible_keys: idx_SaleI_purchaseDate
                key: idx_SaleI_purchaseDate
            key_len: 8
                ref: NULL
               rows: 16447915
              Extra: Using where
      1 row in set (0.00 sec)
      - Takes 14 seconds to execute
  36. Sum Month with Date Partitioned
      explain partitions select sum(purchaseCost) from SaleIP
      where purchaseDate >= '2001-12-01' and purchaseDate < '2002-01-01'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleIP
         partitions: p11,p12
               type: ALL
      possible_keys: PRIMARY
                key: NULL
            key_len: NULL
                ref: NULL
               rows: 121300000
              Extra: Using where
      1 row in set (0.00 sec)
      - 5 seconds (14 seconds non-partitioned)
      - Real improvement
  38. Sum Month with Order Partition
      explain partitions select sum(purchaseCost) from SaleIPO
      where purchaseDate >= '2001-12-01' and purchaseDate < '2002-01-01'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleIPO
         partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11
               type: range
      possible_keys: idx_sale_purchaseDate
                key: idx_sale_purchaseDate
            key_len: 8
                ref: NULL
               rows: 11695184
              Extra: Using where
      1 row in set (0.00 sec)
      - 14 seconds
      - No difference from non-partitioned
  40. Select orders 10,000 times
      explain partitions select unit from SaleI where orderId = 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleI
         partitions: NULL
               type: const
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 4
                ref: const
               rows: 1
              Extra:
      1 row in set (0.02 sec)
      - 96 seconds for non-partitioned
      - One process selecting 10,000 random orders
  42. Select orders 10,000 times
      explain partitions select unit from SaleIP where orderId = 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleIP
         partitions: p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12
               type: ref
      possible_keys: idx_sale_order
                key: idx_sale_order
            key_len: 4
                ref: const
               rows: 13
              Extra:
      1 row in set (0.00 sec)
      - 111 seconds for date partitioned (96 seconds non-partitioned)
      - A bit worse than non-partitioned, but not that bad
  44. Select orders 10,000 times
      explain partitions select unit from SaleIPO where orderId = 1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: SaleIPO
         partitions: p1
               type: const
      possible_keys: PRIMARY
                key: PRIMARY
            key_len: 4
                ref: const
               rows: 1
              Extra:
      1 row in set (0.00 sec)
      - 102 seconds for order partitioned
      - Worse than non-partitioned (96 seconds)
      - Better than date partitioned (111 seconds)
      - Partitioning added overhead again
  48. Insert 800,000 rows into MyISAM
      - 8 processes each inserting 100,000 rows in parallel
      - Non-partitioned table takes 702 seconds
      - Date partitioned table takes 47 seconds
      - All the data was added to the last date partition
      - Order partitioned table takes 974 seconds
      - The data was added to all 12 orderId partitions
      - Date partitioned table was the fastest by far
      - So, real concurrency improvement in this case
  56. Partitioning Admin Improvements
      - ANALYZE PARTITION, CHECK PARTITION, OPTIMIZE PARTITION, REBUILD PARTITION, and REPAIR PARTITION
      - No longer use “OPTIMIZE TABLE SaleP”
      - Instead use “ALTER TABLE SaleP OPTIMIZE PARTITION p12”
      - Optimizing just the most recent partition can be over an order of magnitude faster than a full table optimization
      - One partition is far easier to fit into memory and much less data to sort
      - Dropping a partition is much faster than deleting rows
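      The per-partition maintenance statements listed above all go through ALTER TABLE. A short sketch against the date-partitioned SaleP table from earlier slides; choosing p12 as the newest partition and p0 as the one to purge is an illustrative assumption:

      ```sql
      -- Maintenance scoped to the newest partition of the date-partitioned table.
      ALTER TABLE SaleP ANALYZE PARTITION p12;
      ALTER TABLE SaleP CHECK PARTITION p12;
      ALTER TABLE SaleP OPTIMIZE PARTITION p12;

      -- Purge the oldest range by dropping its partition instead of DELETE-ing rows.
      -- This permanently removes every row stored in p0.
      ALTER TABLE SaleP DROP PARTITION p0;
      ```

      Behaviour varies by version and storage engine: the MySQL manual notes that for engines such as InnoDB, OPTIMIZE PARTITION may rebuild the whole table, in which case REBUILD PARTITION followed by ANALYZE PARTITION is the suggested alternative.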
  62. Partitioning or Disk Striping Partitioning on Different Disks
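      The slide above mentions placing partitions on different disks. A hedged sketch of how that could look for a MyISAM table, using the per-partition DATA DIRECTORY and INDEX DIRECTORY options; the table name SaleDisk, the reduced column list, the two-partition layout, and the /disk1 and /disk2 paths are all hypothetical:

      ```sql
      -- Hypothetical table: each partition's data and index files go to a
      -- different disk. The directories must already exist on the server.
      CREATE TABLE `test`.`SaleDisk` (
        `orderId` int(11) NOT NULL,
        `purchaseCost` decimal(16,2) NOT NULL,
        `purchaseDate` datetime NOT NULL,
        PRIMARY KEY (`purchaseDate`,`orderId`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8
      PARTITION BY RANGE (to_days(purchaseDate)) (
        PARTITION p0 VALUES LESS THAN (730882)
          DATA DIRECTORY = '/disk1/mysql/data'
          INDEX DIRECTORY = '/disk1/mysql/index',
        PARTITION p1 VALUES LESS THAN MAXVALUE
          DATA DIRECTORY = '/disk2/mysql/data'
          INDEX DIRECTORY = '/disk2/mysql/index'
      );
      ```

      This only applies to MySQL versions where partitioned MyISAM tables are available, and whether it beats plain disk striping depends on how the workload spreads across partitions.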
  63. Partitioning
      - Not a turbo button
      - Can improve performance
      - Can degrade performance
      - Will improve administrative tasks
      - Performance depends on what is partitioned
      - Performance also depends on data distribution
      - Still more to learn
      - Any questions?
