
Big Bad "Upgraded" Postgres

This talk covers a long running upgrade project of a multi-terabyte database from Postgres 8.3 to Postgres 9.1, by way of pg_migrator. We discuss both technical and non-technical reasons why the project took several years to complete.


  1. Big Bad “Upgraded” Postgres
     Robert Treat / Presentation
     Wednesday, May 23, 12
  2. Intro
     • Robert Treat
     • xzilla.net
     • @robtreat2
     • +RobertTreat
  3. Intro, part 2
     • COO @ OmniTI
       • Full Stack Tech Consulting
       • Remote Database Management and Consulting
       • Large Scale / Mission Critical
       • We’re Hiring!
  4. Philosophy
     OmniTI has a reputation for scalable web applications and architectures. We didn't learn this stuff overnight. Like many success stories, we acquired experience through trial and error, constant collaboration between development and operations teams, and an unwavering commitment to excellence. But we still lean on our friends and peers to see how things can be done better.
  5. Philosophy
  6. So, What Is Big Bad Postgres?
     • Original Project
       • Convert TB+ Sized Oracle ODS To Postgres
       • 2005, Postgres 8.1
  7. So, What Is Big Bad Postgres?
     [Diagram labels]
     OLTP Instance: Drives the Site
     Warm Standby
     Oracle / Oracle / mysql
     Data Warehouse: logging, bulk processing
     Oracle / mysql
  8. So, What Is Big Bad Postgres?
     [Diagram labels]
     OLTP Instance: Drives the Site
     Warm Standby
     Oracle / Oracle / mysql
     Data Warehouse: logging, bulk processing
     XXX / Postgres / mysql
  9. So, What Is Big Bad Postgres?
     ProTip
     • Remove Oracle Licensing Costs
     • Remove Oracle Add-On Costs
  10. So, What Is Big Bad Postgres?
      ProTip
      • Remove Oracle Licensing Costs
      • Remove Oracle Add-On Costs
      BroTip
      • Missing features
      • Upgrades are a problem
  11. So, What Is Big Bad Postgres?
      • Missing Features (Postgres 8.1)
        • Heterogeneous Replication (dbi-link)
        • Autonomous Transactions (dblink)
        • Backups (zfs)
        • Aggregate SQL (plpgsql)
        • Large Selects (cursors?)
        • Upgrades (?)
  12. So, What Is Big Bad Postgres?
      • Save $500K in Licenses
        • $100K in labor
      • Took Roughly 6 Months
      • Built lots of useful tools
      • Learned a lot about Postgres
      • Everybody’s happy
      http://lethargy.org/~jesus/writes/big-bad-postgresql
  13. Big Bad “Broken” Postgres
      • Feb 2008, disaster struck
        • disk failures
        • memory issues
        • software bugs
        • no failover box!
      • disaster recovery ensued
  14. Big Bad “Broken” Postgres
      Using ZFS snapshots, we were able to modify Postgres code, deploy / test on production copy of data, and eventually get a running system.
  15. Big Bad “Broken” Postgres
      • Fallout:
        • New Machines (2 of them!)
        • Upgrade to Postgres 8.3
          • given corrupt data, dump/restore was forced
          • made life easier operationally
        • Happiness restored!
  16. So, What Is Big Bad Postgres?
      [Diagram labels]
      OLTP Instance: Drives the Site
      Warm Standby
      Oracle / Oracle
      “identical” Data Warehouse
      Postgres 8.3 / Postgres 8.3
  17. Big Bad “Upgraded” Postgres
      robert@omniti.com at 12:24 on 2009-02-17
      8.4 is approaching beta, and has some good features we could take advantage of on pgods, if we could get it to upgrade.
  18. Big Bad “Upgraded” Postgres
      • Issues to handle
        • 3TB, compressed
        • Limited by hardware
        • Limited by spare time
  19. Big Bad “Upgraded” Postgres
  20. Big Bad “Upgraded” Postgres
      • Options?
  21. Big Bad “Upgraded” Postgres
      • Options?
        • SLONY - Well known, but mostly unusable solution (issues with partitioned tables are primary culprit, but lots of other dynamic ddl as well)
  22. Big Bad “Upgraded” Postgres
      • Options?
        • SLONY - Well known, but mostly unusable solution (issues with partitioned tables are primary culprit, but lots of other dynamic ddl as well)
        • Mammoth Replicator - can replicate between versions, but slave creation is a problem.
  23. Big Bad “Upgraded” Postgres
      • Options?
        • SLONY - Well known, but mostly unusable solution (issues with partitioned tables are primary culprit, but lots of other dynamic ddl as well)
        • Mammoth Replicator - can replicate between versions, but slave creation is a problem.
        • “8.4 dev tree has a pg_migrator script in it, for in place upgrades, this should be investigated in our environment.”
  24. Big Bad “Upgraded” Postgres
      • 8.4 dev pg_migrator
        • Ran into bugs with Solaris support during dev
          • "file.c", line 592: warning: argument #4 is incompatible with prototype:
        • Spent ~ 1 month on this before it got put on the back burner
  25. Big Bad “Upgraded” Postgres
      • 8.4.0 pg_migrator (Aug 2009)
        • packages all built without major issues
        • 8.4.0 had a bug with indexes & plperl
        • so, we delayed
        • actually, we postponed
  26. Big Bad “Upgraded” Postgres
      • 9.0 upgrade?
        • Jan 2011, 9.0 is in development
        • Given months of testing, we focused on 9.0
  27. Big Bad “Upgraded” Postgres
      • Some tools we use:
        • dblink
        • plperl / dbi-link
        • fuzzystring
        • freespacemap
        • pg_reorg
        • secure check postgres
  28. Big Bad “Upgraded” Postgres
      • SNAG #1
        • can’t symlink across filesystems
          • (aka zfs datasets)
  29. Big Bad “Upgraded” Postgres
      • SNAG #1
        • can’t symlink across filesystems
          • (aka zfs datasets)
      Given $PGDATA = /pgsql/main
      NO:  /pgsql/main     /pgsql/main9
      YES: /pgsql/main/83  /pgsql/main/90
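A quick pre-flight check for this snag can be sketched in Python. As documented, pg_upgrade's link mode creates hard links rather than copying files, and links of that kind cannot cross a filesystem boundary (each ZFS dataset is its own filesystem). The helper below is an illustration, not part of pg_upgrade; the function name and the demo directories are made up.

```python
import os
import tempfile


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True when both paths report the same device id (st_dev),
    i.e. they live on the same mounted filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Demo with two sibling directories under one parent, mirroring the
# "YES" layout from the slide (/pgsql/main/83 next to /pgsql/main/90).
with tempfile.TemporaryDirectory() as parent:
    old_dir = os.path.join(parent, "83")
    new_dir = os.path.join(parent, "90")
    os.mkdir(old_dir)
    os.mkdir(new_dir)
    print(same_filesystem(old_dir, new_dir))
```

Running the same check against the real old and new data directories before the maintenance window would flag the "NO" layout early.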
  30. Big Bad “Upgraded” Postgres
  31. Big Bad “Upgraded” Postgres
      • SNAG #2
        • un-upgradable data types
  32. Big Bad “Upgraded” Postgres
      • SNAG #2
        • un-upgradable data types
          • left over pg_reorg cruft
          • (special table data types)
  33. Big Bad “Upgraded” Postgres
  34. Big Bad “Upgraded” Postgres
      • SNAG #2, part b
        • un-upgradable data types
  35. Big Bad “Upgraded” Postgres
      • SNAG #2, part b
        • un-upgradable data types
          • “NAME” type
          • check_postgres.pg_stat_activity.datname
  36. Big Bad “Upgraded” Postgres
      • SNAG #2, part c
        • un-upgradable data types
          • “NAME” type
  37. Big Bad “Upgraded” Postgres
      • SNAG #2, part c
        • un-upgradable data types
          • “NAME” type
      create table x as
        select tablename, pg_relation_size(schemaname||'.'||tablename)
        from pg_tables
  38. Big Bad “Upgraded” Postgres
  39. Big Bad “Upgraded” Postgres
      -- plan of attack --
      swap in minimal cron / configs
      shut down 8.3 database
      zfs snapshot fs
      bring up 8.3 / 9.0 databases
      run pg_upgrade
      verify migration complete
      turn on snap replication
      sanity checking
      slow roll more services back on-line
      back in business
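The core of that plan (shut down, snapshot, check, upgrade) can be written down as a dry-run command list. This is a sketch under assumptions: every path, the dataset name, and the snapshot name are hypothetical, and the code only prints the commands rather than running them. The pg_upgrade options used are the documented short forms (`-b`/`-B` old/new bin dirs, `-d`/`-D` old/new data dirs, `-k` link mode, `-c` check only).

```python
import shlex


def upgrade_commands(old_bin, new_bin, old_data, new_data, dataset, snap):
    """Build the shutdown / snapshot / check / upgrade commands, in order.

    Nothing is executed here; the caller decides what to do with the list.
    """
    pg_upgrade = [
        f"{new_bin}/pg_upgrade",
        "-b", old_bin, "-B", new_bin,
        "-d", old_data, "-D", new_data,
        "-k",  # link mode instead of copying files
    ]
    return [
        [f"{old_bin}/pg_ctl", "-D", old_data, "stop", "-m", "fast"],
        ["zfs", "snapshot", f"{dataset}@{snap}"],  # rollback point
        pg_upgrade + ["-c"],                       # compatibility check only
        pg_upgrade,                                # the real run
    ]


# Hypothetical locations, for illustration only.
for cmd in upgrade_commands("/opt/pg_old/bin", "/opt/pg_new/bin",
                            "/pgsql/main/83", "/pgsql/main/90",
                            "tank/pgsql", "pre_upgrade"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the snapshot step ahead of anything that touches the old cluster is what makes the "discuss rollback" step on the next slide a short conversation.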
  40. Big Bad “Upgraded” Postgres
  41. Big Bad “Upgraded” Postgres
      -- actual attack --
      swap in minimal cron / configs
      shut down 8.3 database
      zfs snapshot fs
      bring up 8.3 / 9.0 databases
      run pg_upgrade
      **hit a bug in role creation**
      discuss rollback (rename the control file of the old cluster)
      bring back up 8.3 database
  42. Big Bad “Upgraded” Postgres
  43. Big Bad “Upgraded” Postgres
      CREATE ROLE asha;
      ALTER ROLE asha SET role TO omniti;
      .. sometime later ...
      CREATE ROLE omniti;
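The failure above is an ordering problem: the dump sets a per-role default that references a role which has not been created yet. A toy Python sketch of one way to make such a script restorable is to defer every `ALTER ROLE ... SET` until all other statements have run. This is an illustration only, not the patch that was eventually written, and the statement matching is deliberately naive.

```python
def defer_role_settings(statements):
    """Move 'ALTER ROLE <name> SET ...' statements to the end of the script,
    keeping the relative order within each group."""
    first, deferred = [], []
    for stmt in statements:
        words = stmt.upper().split()
        is_setting = (
            len(words) > 3
            and words[:2] == ["ALTER", "ROLE"]
            and words[3] == "SET"
        )
        (deferred if is_setting else first).append(stmt)
    return first + deferred


dump = [
    "CREATE ROLE asha;",
    "ALTER ROLE asha SET role TO omniti;",
    "CREATE ROLE omniti;",
]
for stmt in defer_role_settings(dump):
    print(stmt)
```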
  44. Big Bad “Upgraded” Postgres
      Added to TODO:
      Allow pg_dumpall to output restorable ALTER USER/DATABASE SET settings
      March 2011
  45. Big Bad “Upgraded” Postgres
      We could have done something where we dropped or modified all roles and recreated them after upgrade, but this didn’t seem like the right fix.
  46. Big Bad “Upgraded” Postgres
      We could have done something where we dropped or modified all roles and recreated them after upgrade, but this didn’t seem like the right fix.
      eventually, we started working on a patch
  47. Big Bad “Upgraded” Postgres
  48. Big Bad “Upgraded” Postgres
      Meanwhile, at another client, not far away...
  49. Big Bad “Upgraded” Postgres
      • more pg_upgrade bugs
        • Fix pg_upgrade's handling of TOAST tables (april)
        • Fix pg_upgrade to preserve toast tables' relfrozenxids during an upgrade from 8.3 (sept)
  50. Big Bad “Upgraded” Postgres
      • more pg_upgrade bugs
        • Fix pg_upgrade's handling of TOAST tables (april)
        • Fix pg_upgrade to preserve toast tables' relfrozenxids during an upgrade from 8.3 (sept)
      Not directly relevant to “Big Bad”, but did take time / energy away, and was not confidence inspiring
  51. Big Bad “Upgraded” Postgres
      • back to the role bug
        • after ~ 3 months, the patch was accepted (oct)
        • 9.1 is now just around the corner, so...
  52. Big Bad “Upgraded” Postgres
      • back to the role bug
        • after ~ 3 months, the patch was accepted (oct)
        • 9.1 is now just around the corner, so...
      we decided to go to 9.1
  53. Big Bad “Upgraded” Postgres
      • 9.1 released in September, 2011
        • 9.1 upgrade testing begins
          • schema tests
          • compatibility tests
  54. Big Bad “Upgraded” Postgres
      • 9.1 released in September, 2011
        • 9.1 upgrade testing begins
          • schema tests
          • compatibility tests
      2011-10-25 Judgement Day
  55. Big Bad “Upgraded” Postgres
      • doh #1
        • non-empty tablespace
        • left over from some 9.1 testing (pg_dump / pg_restore of schema only)
  56. Big Bad “Upgraded” Postgres
      • doh #1
        • non-empty tablespace
        • left over from some 9.1 testing (pg_dump / pg_restore of schema only)
      Creating databases in the new cluster
      psql:/pgdata/main/pg_upgrade_dump_globals.sql:247: NOTICE: schema "ods" does not exist
      psql:/pgdata/main/pg_upgrade_dump_globals.sql:255: NOTICE: schema "check_postgres" does not exist
      psql:/pgdata/main/pg_upgrade_dump_globals.sql:303: ERROR: directory "/pgdata/alldata1/PG_9.1_201105231" already in use as a tablespace
      There were problems executing "/opt/pgsql911/bin/psql" --set ON_ERROR_STOP=on --no-psqlrc --port 5491 --username "postgres" -f "/pgdata/main/pg_upgrade_dump_globals.sql" --dbname template1 >> "/dev/null"
      Failure, exiting
  57. Big Bad “Upgraded” Postgres
      • doh #2
        • new bad data types had snuck in (more name, some unknown)
        • drop tables, alter data types
  58. Big Bad “Upgraded” Postgres
  59. Big Bad “Upgraded” Postgres
      • doh #3
        • actually it worked*
  60. Big Bad “Upgraded” Postgres
      • doh #3
        • actually it worked*
        • begin bringing services back online
          • analyze, vacuum
          • turn on “snap job” replication
          • turn on regular replication
  61. Big Bad “Upgraded” Postgres
      • doh #3
        • actually it worked*
        • begin bringing services back online
          • analyze, vacuum
          • turn on “snap job” replication
          • turn on regular replication
            • aka “the point of no return”
      * don’t worry, it didn’t really work, it just looked like it did
  62. Big Bad “Upgraded” Postgres
      • Replication was running
        • 100’s of tables
      • ETL services not yet on
      • dinner time :-)
  63. Big Bad “Upgraded” Postgres
      Things Were Sailing Along Fine
  64. Big Bad “Upgraded” Postgres
      “if this were the movie titanic, this is the scene where you see the two guys up in the watch tower joking around before the iceberg hits.”
  65. Big Bad “Upgraded” Postgres
      NOTICE: ERROR: value too long for type character varying(40) at line 96.
  66. Big Bad “Upgraded” Postgres
      NOTICE: ERROR: value too long for type character varying(40) at line 96.
      normally we see this on (oracle) number -> (pgsql) integer
  67. Big Bad “Upgraded” Postgres
      • Verified function with debug output
        • function creates a temp table matching primary table
        • copies replicated data into temp table
        • replaces data in actual table with data from temp table
      • This was all working
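The three steps of that replication function follow a common staging-table pattern. The sketch below shows the pattern in Python, with the standard-library sqlite3 module standing in for Postgres so that it runs anywhere. The real function was PL/Perl pulling rows over dbi-link; the table name, columns, and rows here are made up.

```python
import sqlite3


def refresh_table(conn, table, rows):
    """Replace the contents of `table` with `rows` via a temp staging table."""
    staging = f"{table}_staging"
    cur = conn.cursor()
    # 1. Temp table matching the primary table (same columns, no rows).
    cur.execute(f"CREATE TEMP TABLE {staging} AS SELECT * FROM {table} WHERE 0")
    # 2. Copy the replicated data into the temp table.
    ncols = len(cur.execute(f"SELECT * FROM {staging}").description)
    placeholders = ",".join("?" * ncols)
    cur.executemany(f"INSERT INTO {staging} VALUES ({placeholders})", rows)
    # 3. Replace the data in the actual table with the staged data.
    cur.execute(f"DELETE FROM {table}")
    cur.execute(f"INSERT INTO {table} SELECT * FROM {staging}")
    cur.execute(f"DROP TABLE {staging}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, username TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'stale')")
refresh_table(conn, "users", [(1, "fresh"), (2, "new")])
print(conn.execute("SELECT * FROM users ORDER BY userid").fetchall())
```

Because the copy lands in a staging table first, a bad row (such as a value that is too long for its column) fails before the live table is emptied.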
  68. Big Bad “Upgraded” Postgres
      • function with same data working on the other server
      • verified sql ran fine using dblink (so, plperl specific)
  69. Big Bad “Upgraded” Postgres
      Comparing Bad Data - The “Good”
      -[ RECORD 1 ]----+---------------------------------
      userid           | 78184652
      username         | 78184652
      title            |
      firstname        | ????????????????
      lastname         | ??????????????????
      middleinitial    |
      email            | user@example.com
      address          | ???????????????? 34?? ????6
      address2         |
      city             | ??????????????
      state            | ?????????????????? ??????
      zipcode          | 211573
      country          | by
      active           | 1
      subscribed       | 1
      cookieusername   | ed9cd5fed817628ca5b052ebfe11925d
      partner          | 1059105
      actual_phone     |
      dob              |
      phone            |
      regdate          | 2011-10-26 15:32:07
      ipaddress        | 21.12.21.12
      last_open_ts     |
      last_click_ts    |
      last_play_ts     |
      last_delivery_ts |
  70. Big Bad “Upgraded” Postgres
      Comparing Bad Data - The “Bad”
      -[ RECORD 1 ]----+--------------------------------------------------------------------------
      userid           | 78184652
      username         | 78184652
      title            |
      firstname        | ����������������
      lastname         | ������������������
      middleinitial    |
      email            | user@example.com
      address          | ���������������� 34�� ����6
      address2         |
      city             | ��������������
      state            | ������������������ ������
      zipcode          | 211573
      country          | by
      active           | 1
      subscribed       | 1
      cookieusername   | ed9cd5fed817628ca5b052ebfe11925d
      partner          | 1059105
      actual_phone     |
      dob              |
      phone            |
      regdate          | 2011-10-26 15:32:07
      ipaddress        | 21.12.21.12
      last_open_ts     |
      last_click_ts    |
      last_play_ts     |
      last_delivery_ts |
  71. Big Bad “Upgraded” Postgres
      I hate encoding issues
  72. Big Bad “Upgraded” Postgres
  73. Big Bad “Upgraded” Postgres
      • Ruled out several config options (lc_*, locale on the machine)
      • plperl running on different version of libpq?
  74. Big Bad “Upgraded” Postgres
      • Ruled out several config options (lc_*, locale on the machine)
      • plperl running on different version of libpq?
        • install libpq5 built on 9.1
        • install dbdpg built on new libpq
  75. Big Bad “Upgraded” Postgres
      • Ruled out several config options (lc_*, locale on the machine)
      • plperl running on different version of libpq?
        • install libpq5 built on 9.1
        • install dbdpg built on new libpq
      ...no
  76. Big Bad “Upgraded” Postgres
      • wrote perl script that used dbd:pg to grab data from oracle
        • worked!
      • so, plperl eh?
  77. Big Bad “Upgraded” Postgres
  78. Big Bad “Upgraded” Postgres
      postgres=# show client_encoding;
       client_encoding
      -----------------
       UTF8
      (1 row)
  79. Big Bad “Upgraded” Postgres
      postgres=# show client_encoding;
       client_encoding
      -----------------
       UTF8
      (1 row)
      postgres=# select * from dbi_link.remote_select(3, 'select chr(255) as bar') t (bar text);
       bar
      -----
       ÿ
      (1 row)
  80. Big Bad “Upgraded” Postgres
      postgres=# show client_encoding;
       client_encoding
      -----------------
       UTF8
      (1 row)
      postgres=# select * from dbi_link.remote_select(3, 'select chr(255) as bar') t (bar text);
       bar
      -----
       ÿ
      (1 row)
      postgres=# set client_encoding = latin1;
      SET
      postgres=# select * from dbi_link.remote_select(3, 'select chr(255) as bar') t (bar text);
       bar
      -----
       ÿ
      (1 row)
  81. Big Bad “Upgraded” Postgres
      • check lang var
        • pg user
        • root
        • smf init script
      • everything checked out :-
  82. Big Bad “Upgraded” Postgres
      “But Wait, Maybe It Is The LC Stuff”
      -bash-3.00$ sudo pargs -e 17302
      Password:
      17302: /opt/pgsql8314/bin/postgres -D /pgdata/main
      envp[0]: LC_TIME=C
      envp[1]: LC_NUMERIC=C
      envp[2]: LC_MONETARY=C
      envp[3]: LC_MESSAGES=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[5]: LC_COLLATE=en_US.UTF-8
      envp[6]: LD_LIBRARY_PATH=/opt/oracle/amd64
      envp[7]: ORACLE_HOME=/opt/oracle
      envp[8]: PATH=/usr/sbin:/usr/bin
      envp[9]: PERL5LIB=/data/CPAN/lib/site_perl
      envp[10]: PGDATA=/pgdata/main
      envp[11]: PGPREFIX=/opt/pgsql
      envp[12]: PGSYSCONFDIR=/opt/pgsql8314/etc
      envp[13]: PGUSER=postgres
      envp[14]: SMF_FMRI=svc:/database/postgres:default
      envp[15]: SMF_METHOD=/opt/pgsql/bin/pg_ctl -D $PGDATA start -w
      envp[16]: SMF_RESTARTER=svc:/system/svc/restarter:default
      envp[17]: TNS_ADMIN=/opt/oracle/network/admin
      envp[18]: TZ=US/Eastern
  83. Big Bad “Upgraded” Postgres
      “But Wait, Maybe It Is The LC Stuff”
      -bash-3.00$ sudo pargs -e `pgrep postgres` | grep LC_CTYPE
      Password:
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[6]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
  84. Big Bad “Upgraded” Postgres
      “But Wait, Maybe It Is The LC Stuff”
      -bash-3.00$ sudo pargs -e `pgrep postgres` | grep LC_CTYPE
      Password:
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[6]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      envp[4]: LC_CTYPE=en_US.UTF-8
      note: this is actually from the “broken” one
  85. Big Bad “Upgraded” Postgres
      “But Wait, Maybe It Is The LC Stuff”
      After more experimentation, including restart of database with adjusted LANG and env vars, I was able to fix the odd LC settings, but not the remote data select with plperl... #DOH
  86. Big Bad “Upgraded” Postgres
      have you figured it out yet?
  87. Big Bad “Upgraded” Postgres
      pg_enable_utf8
  88. Big Bad “Upgraded” Postgres
      $dbh->{pg_enable_utf8} = 1;
      Force strings passed to and from plperl to be in UTF8 encoding. String are converted to UTF8 on the way into perl and to the database encoding on the way back. This avoids a number of observed anomalies, and ensures Perl a consistent view of the world.
      https://github.com/postgres/postgres/commit/50d89d422f9c68a52a6964e5468e8eb4f90b1d95
  89. Big Bad “Upgraded” Postgres
      $dbh->{pg_enable_utf8} = 1;
      Actually a change in 9.0
      http://www.postgresql.org/docs/9.1/static/release-9-0.html
      * Verify that PL/Perl return values are valid in the server encoding (Andrew Dunstan)
      Note: Perl may otherwise make assumptions that your data is Latin1
  90. Big Bad “Upgraded” Postgres
      My 2:00AM Summary...
      “In any case, this seems like a horrible backwards compatibility nightmare that's likely to eat people's data; I only noticed it because I was pulling data from a varchar(20) into a varchar(20) and it complained the data size was too long. Had I been using text (which is what I normally do), I think I would have screwed myself.”
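The "value too long" symptom is what one would expect when UTF-8 bytes are treated as Latin-1 characters somewhere along the way: every multi-byte character turns into several single-byte ones, so the string grows past its column limit. A minimal Python sketch of the effect, using a made-up Cyrillic value (the real data is masked in the slides):

```python
def misread_as_latin1(text: str) -> str:
    """Simulate a layer that is handed UTF-8 bytes but assumes Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "Владимир"  # hypothetical value: 8 characters, 2 bytes each in UTF-8
mangled = misread_as_latin1(original)

print(len(original), len(mangled))
# The damage is reversible as long as nothing truncated or rejected the value.
print(mangled.encode("latin-1").decode("utf-8") == original)
```

With a `text` column nothing would have complained about the longer value, which is the scenario the 2:00AM summary worries about.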
  91. Big Bad “Upgraded” Postgres
      One last broken job...
      SELECT 42, cntry_abbr as country,
             date_trunc('day', h.hitdate) as rollup_day,
             pg.price_group_id, count(1) as hits
      FROM tblhits h
        join tbladvertiser_campaign sc on h.partner = sc.source_code
        join tblcountry flc on promo.perl_geo_ip_country(h.ipaddress) = flc.cntry_abbr
        left join tbladvertiser_price_groups pg
          on pg.campaign_id = 42
          and h.hitdate between pg.start_date and coalesce(pg.end_date, h.hitdate)
          and get_bit(decode(pg.countries::text, 'hex'), flc.country_id::integer) > 0
      WHERE h.hitdate >= '2011-10-26'::date
        and sc.campaign_id = 42
        and h.hitdate < '2011-10-31'::date + '1 day'::interval
      GROUP BY flc.country_abbreviation, date_trunc('day', h.hitdate), pg.price_group_id;
  92. Big Bad “Upgraded” Postgres
      One last broken job...
      Default output of bytea columns changed (hex format became the default in Postgres 9.0, so the jump from 8.3 to 9.1 picked it up)
  93. Big Bad “Upgraded” Postgres
      One last broken job...
      SELECT 42, cntry_abbr as country,
             date_trunc('day', h.hitdate) as rollup_day,
             pg.price_group_id, count(1) as hits
      FROM tblhits h
        join tbladvertiser_campaign sc on h.partner = sc.source_code
        join tblcountry flc on promo.perl_geo_ip_country(h.ipaddress) = flc.cntry_abbr
        left join tbladvertiser_price_groups pg
          on pg.campaign_id = 42
          and h.hitdate between pg.start_date and coalesce(pg.end_date, h.hitdate)
          and get_bit(pg.countries, flc.country_id::integer) > 0
      WHERE h.hitdate >= '2011-10-26'::date
        and sc.campaign_id = 42
        and h.hitdate < '2011-10-31'::date + '1 day'::interval
      GROUP BY flc.country_abbreviation, date_trunc('day', h.hitdate), pg.price_group_id;
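To see why a query that casts bytea to text can break across this upgrade, it helps to compare the two text renderings Postgres can produce for the same bytes: the older "escape" format and the "hex" format that became the `bytea_output` default in 9.0. The Python below is a sketch of those two renderings as the Postgres docs describe them, not the server's own code, and the sample bytes are arbitrary since the real column contents are not shown in the slides.

```python
def bytea_hex(data: bytes) -> str:
    """Hex format: a literal backslash-x followed by two hex digits per byte."""
    return "\\x" + data.hex()


def bytea_escape(data: bytes) -> str:
    """Escape format: printable ASCII as-is, backslash doubled,
    everything else as a backslash plus three octal digits."""
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


sample = b"\x00A\\"
print(bytea_escape(sample))
print(bytea_hex(sample))
```

Any code that parses the text form of a bytea value sees a different string after the upgrade, so operating on the bytea value directly, as the fixed query does, sidesteps the output format entirely.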
  94. Aftermath
      • did utf8 issue cause data corruption?
        • extensive post-upgrade testing
        • “data diff” between servers
        • fixed dozens of problems
  95. Aftermath
      • did utf8 issue cause data corruption?
        • extensive post-upgrade testing
        • “data diff” between servers
        • fixed dozens of problems
      NONE attributable to the upgrade or plperl issues
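The slides do not say how the “data diff” was done. One simple way to compare the same table on two servers is to hash each row by primary key and compare the digests, sketched below in Python over in-memory rows. The function names and sample rows are made up; a real run would fetch the rows from each server.

```python
import hashlib


def row_digests(rows, key_index=0):
    """Map each row's key column to a SHA-256 digest of the whole row."""
    return {
        row[key_index]: hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        for row in rows
    }


def data_diff(rows_a, rows_b, key_index=0):
    """Report keys missing on either side and keys whose rows differ."""
    a = row_digests(rows_a, key_index)
    b = row_digests(rows_b, key_index)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


upgraded = [(1, "Владимир"), (2, "plain"), (3, "extra")]
reference = [(1, "Ð\x92Ð»Ð°"), (2, "plain")]
print(data_diff(upgraded, reference))
```

Comparing digests rather than full rows keeps the amount of data shipped between servers small, at the cost of not showing which column changed.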
  96. Aftermath
      pg_upgrade works pretty well
  97. All's Well That Ends Well
      We upgraded the other box
  98. All's Well That Ends Well
      Took ~ 45 minutes
  99. All's Well That Ends Well
      Nothing broke
  100. All's Well That Ends Well
       Yet? ;-)
  101. THE END
       Thanks!
       • PGCon
       • omniti dba team
       • postgres hackers
       Slides: http://www.xzilla.net/
       @robtreat2