Addressing vendor weaknesses in user space (Robert Treat)

Transcript

  • 1. Addressing Vendor Weaknesses in User-Space ROBERT TREAT, OmniTI Highload++ 2011 @robtreat2 xzilla.net +Robert Treat (Monday, October 3, 11)
  • 2. Who Am I? OmniTI - Internet Scalability Consultants Lead Database Operations
  • 3. Who Am I? OmniTI - Internet Scalability Consultants Lead Database Operations “Large Scale”
  • 4. Who Am I? OmniTI - Internet Scalability Consultants Lead Database Operations “Large Scale” High Transactions TB+ Data
  • 5. Who Am I? OmniTI - Internet Scalability Consultants Lead Database Operations “Large Scale” High Transactions TB+ Data Mission Critical
  • 6. Who Am I? Database Operations @OmniTI Postgres MySQL Oracle & More
  • 7. Postgres for Scalability Traditional RDBMS Highly Extensible Runs Everywhere Talks To Everything “BSD” Licensed 15+ Years Development Open Development Community
  • 8. The Bloat Problem Data Footprint Can Be Critical To Performance
  • 9. The Bloat Problem Data Footprint Can Be Critical To Performance Size On Disk Affects The Needs Of RAM, Disk Speed, Storage
  • 10. The Bloat Problem Data Footprint Can Be Critical To Performance Size On Disk Affects The Needs Of RAM, Disk Speed, Storage “Bloat” is unused, wasted disk space, used by the database, but not needed for actual data storage
  • 11. The Bloat Problem Data Footprint Can Be Critical To Performance Size On Disk Affects The Needs Of RAM, Disk Speed, Storage “Bloat” is unused, wasted disk space, taken up by the database, but not needed for actual data storage Why?
  • 12. MVCC Architecture Multiversion Concurrency Control (MVCC) allows Postgres to offer high concurrency even during significant database read/write activity. MVCC specifically offers behavior where "readers never block writers, and writers never block readers".
  • 13. MVCC Architecture • Oracle • MySQL (InnoDB) • Informix • Firebird • MSSQL (optional)
  • 14. MVCC Architecture • Oracle • MySQL (InnoDB) • Informix • Firebird • MSSQL (optional) • CouchDB
  • 15. “Bloat” Manifests Differently, But Is Common • MongoDB (deletes, some updates) • dump/restore • mongod --repair • db.runCommand( { compact : mycollectionname } ) • Lucene (updates) • Hadoop / HDFS (small files)
  • 16. Postgres MVCC Architecture • Implemented Postgres 6.5 • 1999, Vadim Mikheev • MVCC Unmasked • http://momjian.us/main/writings/pgsql/mvcc.pdf
  • 17. Postgres MVCC Architecture • Postgres maintains global transaction counters • Keeps track of transaction counter per row for • creating transaction • removing transaction • Using these counters, Postgres allows different transactions to see different rows, based on visibility rules.
  • 18. Postgres MVCC Architecture • Postgres maintains global transaction counters • Keeps track of transaction counter per row for • creating transaction • removing transaction • Using these counters, Postgres allows different transactions to see different rows, based on visibility rules. Transaction Reading An Old Row Doesn’t Block Transaction Writing A Row
  • 19. MVCC Architecture INSERT: user_id X42, Create 32, Expire (empty)
  • 20. MVCC Architecture INSERT: user_id X42, Create 32, Expire (empty) DELETE: user_id X42, Create 32, Expire 38
  • 21. MVCC Architecture UPDATE: OLD (delete): user_id X69, Create 43, Expire 56 NEW (insert): user_id X69, Create 43, Expire (empty)
  • 22. MVCC Architecture Clean Up / Bloat: user_id X69, Create 43, Expire 56 <~~ DEAD ROW user_id X69, Create 43, Expire (empty) <~~ VISIBLE ROW
  • 23. MVCC Architecture Clean Up / Bloat: user_id X69, Create 43, Expire 56 <~~ DEAD ROW user_id X69, Create 43, Expire (empty) <~~ VISIBLE ROW Speed Up SQL Commands By Dealing With Clean Up Later
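The create/expire counters on the slides above can be sketched as a toy model. This is only an illustration under loud assumptions: every transaction is treated as committed, a reader's snapshot is a single transaction number, and the names `RowVersion`, `is_visible`, `update` and `dead_rows` are hypothetical, not Postgres internals. Real Postgres also tracks in-progress and aborted transactions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    user_id: str
    create_xid: int                    # transaction that created this version
    expire_xid: Optional[int] = None   # transaction that removed it, if any


def is_visible(row: RowVersion, snapshot_xid: int) -> bool:
    """A version is visible if it was created at or before the snapshot
    and not yet expired as of the snapshot (assumes all xids committed)."""
    created = row.create_xid <= snapshot_xid
    expired = row.expire_xid is not None and row.expire_xid <= snapshot_xid
    return created and not expired


def update(heap: List[RowVersion], old: RowVersion, xid: int) -> RowVersion:
    """An UPDATE expires the old version and appends a new one; nothing is
    overwritten in place, so readers of the old version are never blocked."""
    old.expire_xid = xid
    new = RowVersion(old.user_id, create_xid=xid)
    heap.append(new)
    return new


def dead_rows(heap: List[RowVersion], oldest_snapshot_xid: int) -> List[RowVersion]:
    """Versions no remaining snapshot can see: this is the space that
    becomes bloat until something cleans it up."""
    return [r for r in heap
            if r.expire_xid is not None and r.expire_xid <= oldest_snapshot_xid]


heap = [RowVersion("X1", create_xid=100)]
old = heap[0]
new = update(heap, old, xid=105)
```

A reader whose snapshot is 102 still sees `old`, a reader at 110 sees `new`, and `old` only counts as dead once the oldest snapshot has moved past 105.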
  • 24. How Postgres Deals With Bloat • Heap-Only-Tuples (HOT) • On-The-Fly, Per Page Cleanup • Marks Given Row’s Space Reusable • Update Only
  • 25. How Postgres Deals With Bloat • Heap-Only-Tuples (HOT) • On-The-Fly, Per Page Cleanup • Marks Given Row’s Space Reusable • Update Only • VACUUM • Non-Blocking Bulk Cleanup • Removes End-Of-File Pages • “autovacuum” Process Monitors Tables
  • 26. Problems With Automatic Cleanup • HOT • Update Only • Doesn’t Work When Changing Index Data
  • 27. Problems With Automatic Cleanup • HOT • Update Only • Doesn’t Work When Changing Index Data • VACUUM • Must Wait For Long Transactions To Complete • Costs I/O, Can Only Work So Fast • Can’t Remove Non End-Of-File Pages • Leaves A “High Water Mark”
  • 28. Dealing With Bloat - The Hard Way • VACUUM FULL / CLUSTER • The Good • Reclaims All “Dead Rows”
  • 29. Dealing With Bloat - The Hard Way • VACUUM FULL / CLUSTER • The Good • Reclaims All “Dead Rows” • The Bad • Exclusive Lock • Rewrite All Data In Tables • Needs Working Space • Heavy I/O
  • 30. Monitoring Your Bloat • check_postgres.pl • Nagios plugin • Compares physical size to row size estimates • http://bucardo.org/wiki/Check_postgres • “bloat report” • Script to measure table/index bloat • Compares physical size to row size estimates • http://labs.omniti.com/labs/pgtreats/browser/trunk/tools/
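The idea of comparing physical size to row size estimates can be sketched as simple arithmetic. This is not how check_postgres.pl or the bloat report script compute their numbers; it is a rough illustration that ignores page headers, tuple headers and alignment. The function name `estimate_bloat` and its parameters are hypothetical; 8192 bytes is the default Postgres block size.

```python
import math


def estimate_bloat(actual_pages: int, row_count: int, avg_row_bytes: int,
                   page_bytes: int = 8192):
    """Return (expected_pages, wasted_pages, wasted_fraction).

    expected_pages is how many pages the live rows would need if packed
    tightly; anything beyond that is treated as bloat. Header and
    alignment overhead are deliberately ignored in this sketch.
    """
    rows_per_page = max(1, page_bytes // avg_row_bytes)
    expected_pages = math.ceil(row_count / rows_per_page)
    wasted_pages = max(0, actual_pages - expected_pages)
    fraction = wasted_pages / actual_pages if actual_pages else 0.0
    return expected_pages, wasted_pages, fraction


# Hypothetical table: 1000 pages on disk holding 40,000 rows of ~100 bytes.
expected, wasted, fraction = estimate_bloat(1000, 40_000, 100)
```

With these made-up inputs, 81 rows fit per page, so the rows need 494 pages and roughly half of the 1000 pages on disk count as bloat.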
  • 31. Dealing With Bloat In Userspace • Solving MVCC Bloat Is A “Hard Problem” • Even a good solution would be hard to implement in core
  • 32. Dealing With Bloat In Userspace • Solving MVCC Bloat Is A “Hard Problem” • Even a good solution would be hard to implement in core • Can we build a tool in user space? • Develop solution quicker • Easier to deploy and maintain • Provide a prototype for future development
  • 33. Dealing With Bloat Redux • Updating A Row Rewrites Data To New Location
  • 34. Dealing With Bloat Redux • Updating A Row Rewrites Data To New Location • Use Vacuum To Mark Old Rows “Reusable”
  • 35. Dealing With Bloat Redux • Updating A Row Rewrites Data To New Location • Use Vacuum To Mark Old Rows “Reusable” • Update Row To Rewrite Data At “Front” Of Page
  • 36. Dealing With Bloat Redux • Updating A Row Rewrites Data To New Location • Use Vacuum To Mark Old Rows “Reusable” • Update Row To Rewrite Data At “Front” Of Page • Use Vacuum To Reclaim Space From End Of File
  • 37. Dealing With Bloat Redux • Updating A Row Rewrites Data To New Location • Use Vacuum To Mark Old Rows “Reusable” • Update Row To Rewrite Data At “Front” Of Page • Use Vacuum To Reclaim Space From End Of File • Put A Script On It • https://labs.omniti.com/pgtreats/trunk/tools/compact_table
  • 38. Dealing With Bloat Redux • “Compact Table” • Requires Lots of Time, I/O • Often Causes Heavy Index Bloat • Heavy Concurrency Bloats Faster Than We Can Recover It
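The update-then-vacuum trick described above, and the "high water mark" it works around, can be sketched with a toy table made of fixed-size pages. This is an assumption-laden model, not the compact_table script: a page is a list of slots, `None` is reusable space, and `vacuum_truncate`, `first_free_slot` and `compact` are hypothetical names.

```python
from typing import List, Optional, Tuple

Page = List[Optional[str]]


def vacuum_truncate(pages: List[Page]) -> List[Page]:
    """Plain VACUUM in this model: only trailing empty pages are given
    back, so an empty page in the middle of the file stays allocated."""
    end = len(pages)
    while end > 0 and all(slot is None for slot in pages[end - 1]):
        end -= 1
    return pages[:end]


def first_free_slot(pages: List[Page], before: int) -> Optional[Tuple[int, int]]:
    """Find reusable space in a page earlier than index `before`."""
    for p in range(before):
        for s, value in enumerate(pages[p]):
            if value is None:
                return p, s
    return None


def compact(pages: List[Page]) -> List[Page]:
    """Walk from the end of the file, 'updating' each row so its new
    version lands in free space nearer the front, then truncate."""
    pages = [list(p) for p in pages]
    for src in range(len(pages) - 1, 0, -1):
        for slot, row in enumerate(pages[src]):
            if row is None:
                continue
            target = first_free_slot(pages, before=src)
            if target is None:
                return vacuum_truncate(pages)
            dst_page, dst_slot = target
            pages[dst_page][dst_slot] = row   # new row version, up front
            pages[src][slot] = None           # old version, now reusable
    return vacuum_truncate(pages)


table = [["a", None], [None, None], [None, "b"], [None, None]]
after_vacuum = vacuum_truncate(table)   # row "b" pins the high water mark
after_compact = compact(table)          # "b" moves forward, file shrinks
```

In a real table every moved row also leaves old index entries behind, which is consistent with the index bloat the last slide warns about.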
  • 39. Dealing With Bloat For Real! • Enter “pg_reorg”
  • 40. Dealing With Bloat For Real! • Enter “pg_reorg” • Vacuum / Cluster Replacement
  • 41. Dealing With Bloat For Real! • Enter “pg_reorg” • Vacuum / Cluster Replacement • Command Line Tool
  • 42. Dealing With Bloat For Real! • Enter “pg_reorg” • Vacuum / Cluster Replacement • Command Line Tool • Online Table Rewrite • Uses Minimal Locking
  • 43. Dealing With Bloat For Real! • Enter “pg_reorg” • Vacuum / Cluster Replacement • Command Line Tool • Online Table Rewrite • Uses Minimal Locking • Developed By NTT
  • 44. Dealing With Bloat For Real! • Enter “pg_reorg” • Vacuum / Cluster Replacement • Command Line Tool • Online Table Rewrite • Uses Minimal Locking • Developed By NTT • BSD Licensed • C Code • http://pgfoundry.org/projects/reorg/
  • 45. How pg_reorg Works • Create a log table for changes • Create triggers on the old table to log changes (I/U/D) • Create a new table with a copy of all data in old table • Create all indexes on the new table • Apply all changes from the log table to the new table • Modify the system catalogs information about table files • Drop old table, leaving new table in its place
  • 46. How pg_reorg Works • Create a log table for changes • Create triggers on the old table to log changes • Create a new table with a copy of all data in old table • Create all indexes on the new table • Apply all changes from the log table to the new table • MODIFY THE SYSTEM CATALOGS INFORMATION ABOUT THE TABLE FILES (!!!) • Drop old table, leaving the new table in its place
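The sequence of steps listed above can be sketched in miniature. This is a toy simulation of the workflow as the slides describe it, not pg_reorg's implementation: tables are dicts keyed by primary key, the "catalog" is a dict from table name to table, and `online_rebuild` and its arguments are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, int, Optional[str]]   # (op, primary key, row); op is I, U or D


def online_rebuild(catalog: Dict[str, Dict[int, str]], name: str,
                   concurrent_writes: List[Change]) -> Dict[int, str]:
    old = catalog[name]
    log: List[Change] = []                 # 1. log table for changes

    def trigger(change: Change) -> None:   # 2. trigger on the old table
        log.append(change)

    new = dict(old)                        # 3. new table with a copy of the data
    # 4. indexes on the new table would be built here

    # Writes that arrive while the copy runs still hit the old table,
    # and the trigger records each one in the log.
    for op, pk, row in concurrent_writes:
        if op == "D":
            old.pop(pk, None)
        else:
            old[pk] = row
        trigger((op, pk, row))

    for op, pk, row in log:                # 5. apply logged changes to the new table
        if op == "D":
            new.pop(pk, None)
        else:
            new[pk] = row

    catalog[name] = new                    # 6. point the catalog at the new table
    return new                             # 7. the old table is dropped


catalog = {"orders": {1: "a", 2: "b"}}
writes = [("I", 3, "c"), ("U", 1, "a2"), ("D", 2, None)]
rebuilt = online_rebuild(catalog, "orders", writes)
```

Step 6 is a single assignment here; in the real tool it is the part that edits system catalogs directly, which is why the slide flags it.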
  • 47. Dealing With Bloat For Real! Open Source Code The Power Is In Your Hands Look At Code Examine the SQL (User Space Is Really Visible) TEST!
  • 48. Dealing With Bloat For Real! What Does Testing Look Like? Create Some Tables, Create Artificial Bloat, run pg_reorg
  • 49. Dealing With Bloat For Real! What Does Testing Look Like? Create Some Tables, Create Artificial Bloat, run pg_reorg WIN!
  • 50. Dealing With Bloat For Real! Test In “Prod”
  • 51. Dealing With Bloat For Real! Test In “Prod” Find Some Bloated Tables, Make Backup Of Tables, Cross Fingers, pg_reorg
  • 52. Dealing With Bloat For Real! Test In “Prod” Find Some Bloated Tables, Make Backup Of Tables, Cross Fingers, pg_reorg WIN!
  • 53. Dealing With Bloat For Real! Eventually You Have To Use It On Something That Matters
  • 54. pg_reorg In The Real World • Production Database (OLTP) • 540GB Size • 2000 TPS (off-peak time, multiple statements) • Largest Table (pre-reorg) 127GB
  • 55. pg_reorg In The Real World • Production Database (OLTP) • 540GB Size • 2000 TPS (off-peak time, multiple statements) • Largest Table (pre-reorg) 127GB • Rebuild Stats • 5.75 Hours To Rebuild • Reclaimed 52GB Disk Space • No outages reported for Website/APIs
  • 56. pg_reorg In The Real World
  • 57. pg_reorg In The Real World
  • 58. pg_reorg In The Real World
  • 59. pg_reorg In The Real World
  • 60. pg_reorg In The Real World
  • 61. pg_reorg In The Real World YAY!
  • 62. Return Of The Jedi
  • 63. “your overconfidence is your weakness.” -Luke Skywalker
  • 64. “your faith in your friends is yours.” -Emperor Palpatine
  • 65. Sometimes You Can Have Both Trust in NTT’s Code == faith in friends Success in production == overconfidence
  • 66. When Good pg_reorgs Go Bad! WARNING:  unexpected attrdef record found for attr 61 of rel orders WARNING:  1 attrdef record(s) missing for rel orders
  • 67. When Good pg_reorgs Go Bad! WARNING:  unexpected attrdef record found for attr 61 of rel orders WARNING:  1 attrdef record(s) missing for rel orders Yes, On A Production System Yes, Trying To Take 1000’s of Orders Per Second
  • 68. When Good pg_reorgs Go Bad! create table test ( a int4, b int4 default 2112, c bool );
  • 69. When Good pg_reorgs Go Bad! create table test ( a int4, b int4 default 2112, c bool ); Postgres internals track defaults / constraints based on column position “2”, not column name “b”
  • 70. When Good pg_reorgs Go Bad! create table test ( a int4, b int4 default 2112, c bool ); Postgres internals track defaults / constraints based on column position “2”, not column name “b” If you drop column “a” and then do pg_reorg, column “c” is now column “2”, and default 2112 is on boolean
  • 71. When Good pg_reorgs Go Bad! create table test ( a int4, b int4 default 2112, c bool ); Postgres internals track defaults / constraints based on column position “2”, not column name “b” If you drop column “a” and then do pg_reorg, column “c” is now column “2”, and default 2112 is on boolean This Is Fair - pg_reorg hacks the system tables
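The failure described above can be reproduced in a toy model of the catalogs. This is a simplified illustration of the mechanism as the slides state it, not pg_reorg or Postgres code: columns are tuples of (position, name, type, dropped), defaults are keyed by position, and `drop_column`, `rebuild` and `defaults_by_name` are hypothetical names. It assumes a dropped column keeps its position until the table is rebuilt.

```python
from typing import Dict, List, Tuple

Column = Tuple[int, str, str, bool]   # (position, name, type, dropped)


def drop_column(columns: List[Column], name: str) -> List[Column]:
    """Dropping marks the column as dropped; the others keep their positions."""
    return [(pos, n, t, dropped or n == name) for pos, n, t, dropped in columns]


def rebuild(columns: List[Column]) -> List[Column]:
    """A rebuilt table only has the live columns, renumbered from 1."""
    live = [(n, t) for _, n, t, dropped in columns if not dropped]
    return [(i + 1, n, t, False) for i, (n, t) in enumerate(live)]


def defaults_by_name(columns: List[Column], defaults: Dict[int, str]) -> Dict[str, str]:
    """Resolve position-keyed defaults to the column currently at that position."""
    return {n: defaults[pos] for pos, n, _, dropped in columns
            if not dropped and pos in defaults}


# create table test ( a int4, b int4 default 2112, c bool );
columns = [(1, "a", "int4", False), (2, "b", "int4", False), (3, "c", "bool", False)]
defaults = {2: "2112"}                # stored against position 2, not name "b"

after_drop = drop_column(columns, "a")
before = defaults_by_name(after_drop, defaults)      # default still on "b"

after_rebuild = rebuild(after_drop)                  # "b" is now 1, "c" is now 2
after = defaults_by_name(after_rebuild, defaults)    # stale default lands on "c"
```

The defaults dict is never touched, which is the whole problem: once the rebuilt table's positions are swapped in, the integer default 2112 is attached to the boolean column.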
  • 72. When Good pg_reorgs Go Bad! Basic Fix: Drop All Defaults And Recreate
  • 73. When Good pg_reorgs Go Bad! Basic Fix: Drop All Defaults And Recreate Alternative Fix: Hack System Catalogs Some More
  • 74. When Good pg_reorgs Go Bad! Basic Fix: Drop All Defaults And Recreate Alternative Fix: Hack System Catalogs Some More Haven’t we had enough system catalog hacking for now?
  • 75. When Good pg_reorgs Go Bad! “now, if you’ll excuse me, I’ll go away and have a heart attack.”
  • 76. What Next? Report Problem To Mailing List Submit A Patch Ultimately The Problem Is Fixed Everyone’s Happy?
  • 77. Hackers Discussion Postgres Development Community Is Funny Sometimes Hard To Get Them To Recognize Problems Not Everyone Sees Online Rebuild As A Big Problem
  • 78. Hackers Discussion Postgres Development Community Is Funny Sometimes Hard To Get Them To Recognize Problems Not Everyone Sees Online Rebuild As A Big Problem In All Fairness, Not Everyone Has This Problem
  • 79. Hackers Discussion Hackers Meeting 2011, Discussion On Internal Queuing System Could Be Used As Underlying Basis For On-Line Rebuilding Until Then...
  • 80. pg_reorg Is A Great Tool! Best Option For Difficult Situation Just Be Careful!
  • 81. THANKS! Highload++ NTT OmniTI Postgres Community Momjian, Depesz, Patel, Kocoloski xzilla.net @robtreat2 + Robert Treat