Big Bad PostgreSQL @ Percona
Discuss building a multi-terabyte PostgreSQL instance in a high volume, mission-critical operational datastore that replaced Oracle. Learn about solving real-life problems such as a near-catastrophic hardware failure at terabyte level.

Published in: Technology
  • Transcript

    • 1. Big Bad PostgreSQL: A Case Study
      Moving a “large,” “complicated,” and mission-critical datawarehouse from Oracle to PostgreSQL for cost control.
    • 2. About the Speaker
      • Theo Schlossnagle, Principal @ OmniTI (www.omniti.com)
      • Open Source: mod_backhand, spreadlogd, OpenSSH+SecurID, Daiquiri, Wackamole, libjlog, Spread, Reconnoiter, etc.
      • Closed Source: Message Systems MTA, Message Central
      • Author: Scalable Internet Architectures (Developer’s Library)
    • 3. Overall Architecture (diagram)
      • OLTP instance: Oracle 8i drives the site
      • Log import and processing: Oracle 8i
      • MySQL log importer
      • OLTP warm backup / warm spare
      • Datawarehouse and Log Importer: MySQL 4.1
      • Data Exporter: bulk selects / data exports
      • Storage shown: 0.25–0.75 TB Hitachi JBOD, 0.5 TB / 1.5 TB Hitachi MTI, 1.2 TB SATA RAID, 1.2 TB IDE RAID
    • 8. Database Situation
      • The problems:
        • The database is growing.
        • The OLTP and ODS/warehouse are too slow.
        • A lot of application code against the OLTP system.
        • Minimal application code against the ODS system.
      • Oracle:
        • Licensed per processor.
        • Really, really, really expensive on a large scale.
      • PostgreSQL:
        • No licensing costs.
        • Good support for complex queries.
    • 12. Database Choices
      • Must keep Oracle on OLTP:
        • Complex, Oracle-specific web application.
        • Needs more processors.
      • ODS: Oracle not required.
        • Complex queries from limited sources.
        • Needs more space and power.
      • Result:
        • Move ODS Oracle licenses to OLTP.
        • Run PostgreSQL on ODS.
    • 17. PostgreSQL gotchas
      • For an OLTP system that does thousands of updates per second, vacuuming is a hassle.
      • No upgrades?!
      • Less community experience with large databases.
      • Replication features less evolved.
    • 23. PostgreSQL ♥ ODS
      • Mostly inserts.
      • Updates/deletes controlled, not real-time.
      • pl/perl (leverage DBI/DBD for remote database connectivity).
      • Monster queries.
      • Extensible.
    • 31. Choosing Linux
      • Popular, liked, good community support.
      • Chronic problems:
        • kernel panics
        • filesystems remounting read-only
        • filesystems don’t support snapshots
        • LVM is clunky on enterprise storage
        • 20 outages in 4 months
    • 40. Choosing Solaris 10
      • Switched to Solaris 10.
      • No crashes, better system-level tools:
        • prstat, iostat, vmstat, smf, fault management.
      • ZFS:
        • snapshots (persistent), BLI backups.
        • Excellent support for enterprise storage.
      • DTrace.
      • Free (too).
    • 47. Oracle features we need
      • Partitioning
      • Statistics and aggregations
        • rank over partition, lead, lag, etc.
      • Large selects (100 GB)
      • Autonomous transactions
      • Replication from Oracle (to Oracle)
    • 54. Partitioning
      For large data sets:

        pgods=# select count(1) from ods.ods_tblpick_super;
           count
        ------------
         1790994512
        (1 row)

      • Next biggest tables: 850m, 650m, 590m rows.
      • Allows us to cluster data over specific ranges (by date in our case).
      • Simple, cheap archiving and removal of data.
      • Can put ranges used less often in different tablespaces (slower, cheaper storage).
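Because each date range lives in its own table under this scheme, archiving and demotion stay cheap. A minimal sketch; the partition and tablespace names here are illustrative, not taken from the deck:

```sql
-- Move a rarely-queried range onto slower, cheaper storage.
ALTER TABLE ods.ods_maillog_2005q1 SET TABLESPACE slow_sata;

-- Retire an expired range entirely: dropping one child table is far
-- cheaper than DELETEing millions of rows from a monolithic table.
DROP TABLE ods.ods_maillog_2004q4;
```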
    • 62. Partitioning PostgreSQL style
      • PostgreSQL doesn’t support partitioning...
      • It supports inheritance... (what’s this?)
        • some crazy object-relational paradigm.
      • We can use it to implement partitioning:
        • One master table with no rows.
        • Child tables that have our partition constraints.
        • Rules on the master table for insert/update/delete.
    • 70. Partitioning PostgreSQL realized
      • Cheaply add new empty partitions.
      • Cheaply remove old partitions.
      • Migrate less-often-accessed partitions to slower storage.
      • Different index strategies per partition.
      • PostgreSQL >8.1 supports constraint checking on inherited tables:
        • smarter planning
        • smarter executing
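The master/child/rule recipe described above can be sketched as follows; the table name, columns, and date ranges are illustrative, not taken from the deck:

```sql
-- Master table: holds no rows of its own.
CREATE TABLE ods.picks (
    pickdate  date    NOT NULL,
    userid    integer NOT NULL,
    payload   text
);

-- Child tables inherit the master's columns and carry CHECK
-- constraints describing the range each one owns.
CREATE TABLE ods.picks_2006q1 (
    CHECK (pickdate >= DATE '2006-01-01' AND pickdate < DATE '2006-04-01')
) INHERITS (ods.picks);

-- A rule on the master redirects inserts into the right child.
CREATE RULE picks_insert_2006q1 AS
    ON INSERT TO ods.picks
    WHERE (NEW.pickdate >= DATE '2006-01-01' AND NEW.pickdate < DATE '2006-04-01')
    DO INSTEAD INSERT INTO ods.picks_2006q1 VALUES (NEW.*);

-- The 8.1+ "smarter planning" comes from constraint exclusion: the
-- planner skips children whose CHECK constraints rule them out.
SET constraint_exclusion = on;
SELECT count(*) FROM ods.picks WHERE pickdate = DATE '2006-02-14';
```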
    • 73. RANK OVER PARTITION
      • In Oracle:

        select userid, email
          from (select u.userid, u.email,
                       row_number() over (partition by u.email
                                          order by userid desc) as position
                  from (...))
         where position = 1

      • In PostgreSQL (emulated row-by-row in pl/pgsql):

        FOR v_row IN select u.userid, u.email from (...)
                      order by email, userid desc LOOP
          IF v_row.email != v_last_email THEN
            RETURN NEXT v_row;
            v_last_email := v_row.email;
            v_rownum := v_rownum + 1;
          END IF;
        END LOOP;

      • With 8.4, we have windowing functions.
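On 8.4, the Oracle form carries over nearly verbatim. A sketch, with ods.ods_users standing in for the subquery the slide elides:

```sql
-- Keep the highest userid per email address using a window function.
SELECT userid, email
  FROM (SELECT u.userid, u.email,
               row_number() OVER (PARTITION BY u.email
                                  ORDER BY u.userid DESC) AS position
          FROM ods.ods_users u) ranked
 WHERE position = 1;
```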
    • 78. Large SELECTs
      • Application code does:

        select u.*, b.browser, m.lastmess
          from ods.ods_users u,
               ods.ods_browsers b,
               (select userid, min(senddate) as senddate
                  from ods.ods_maillog group by userid) m,
               ods.ods_maillog l
         where u.userid = b.userid
           and u.userid = m.userid
           and u.userid = l.userid
           and l.senddate = m.senddate;

      • The width of these rows is about 2k.
      • 50 million row result set.
      • > 100 GB of data.
    • 79. The Large SELECT Problem
      • libpq will buffer the entire result in memory.
      • This affects language bindings (DBD::Pg).
      • This is an utterly deficient default behavior.
      • This can be avoided by using cursors:
        • Requires the app to be PostgreSQL-specific.
        • You open a cursor.
        • Then FETCH the row count you desire.
    • 82. Big SELECTs the Postgres way
      The previous “big” query becomes:

        DECLARE bigdump CURSOR FOR
        select u.*, b.browser, m.lastmess
          from ods.ods_users u,
               ods.ods_browsers b,
               (select userid, min(senddate) as senddate
                  from ods.ods_maillog group by userid) m,
               ods.ods_maillog l
         where u.userid = b.userid
           and u.userid = m.userid
           and u.userid = l.userid
           and l.senddate = m.senddate;

      Then, in a loop:

        FETCH FORWARD 10000 FROM bigdump;
    • 87. Autonomous Transactions
      • In Oracle we have over 2000 custom stored procedures.
      • During these procedures, we like to:
        • COMMIT incrementally: useful for long transactions (update/delete) that need not be atomic.
        • Start a new top-level transaction that can COMMIT: useful for logging progress in a stored procedure, so you know how far it progressed and how long each step took even if it rolls back.
    • 91. PostgreSQL shortcoming
      • PostgreSQL simply does not support autonomous transactions; to quote core developers, “that would be hard.”
      • When in doubt, use brute force.
      • Use pl/perl and DBD::Pg to connect to ourselves (a new backend) and execute a new top-level transaction.
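The loop-back trick can be sketched as a pl/perl function; the function name, log table, and connection details below are illustrative, and the untrusted plperlu variant is required because the function opens a database connection:

```sql
CREATE OR REPLACE FUNCTION log_progress(msg text) RETURNS void AS $$
    use DBI;
    # Connect back to our own database. This spawns a second backend
    # with its own top-level transaction, so the INSERT below commits
    # immediately -- even if the calling transaction later rolls back.
    my $dbh = DBI->connect('dbi:Pg:dbname=pgods', '', '',
                           { AutoCommit => 1, RaiseError => 1 });
    $dbh->do('INSERT INTO proc_log (logged_at, message) VALUES (now(), ?)',
             undef, $_[0]);
    $dbh->disconnect;
$$ LANGUAGE plperlu;
```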
    • 98. Replication
      • Cross-vendor database replication isn’t too difficult.
      • Helps a lot when you can do it inside the database.
      • Using dbi-link (based on pl/perl and DBI) we can:
        • Connect to any remote database.
        • INSERT into local tables directly from remote SELECT statements. [snapshots]
        • LOOP over remote SELECT statements and process them row-by-row. [replaying remote DML logs]
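The snapshot style can be sketched without dbi-link's helpers as a plain plperlu function; the DSN, credentials, and table names below are illustrative, and the quoting is deliberately naive:

```sql
CREATE OR REPLACE FUNCTION snapshot_remote_users() RETURNS integer AS $$
    use DBI;
    # Remote handle: pull rows straight out of the source database.
    my $ora = DBI->connect('dbi:Oracle:oltp', 'ods_reader', 'secret',
                           { RaiseError => 1 });
    my $sth = $ora->prepare('SELECT userid, email FROM users');
    $sth->execute;

    # spi_exec_query runs in our own backend, so the local writes are
    # part of the calling transaction: the snapshot swaps atomically.
    spi_exec_query('TRUNCATE ods.users_snapshot');
    my $n = 0;
    while (my ($userid, $email) = $sth->fetchrow_array) {
        (my $e = $email) =~ s/'/''/g;   # naive quoting, for the sketch only
        spi_exec_query("INSERT INTO ods.users_snapshot VALUES ($userid, '$e')");
        $n++;
    }
    $ora->disconnect;
    return $n;
$$ LANGUAGE plperlu;
```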
    • 101. Replication (really)
      • Through a combination of snapshotting and DML replay, we replicate over 2000 tables into PostgreSQL from Oracle:
        • snapshot replication for 200
        • DML replay logs for 1800
      • PostgreSQL to Oracle is a bit harder:
        • out-of-band exports and imports
    • 102. New Architecture
      • Master: Sun v890 and Hitachi AMS + warm standby running Oracle (1 TB).
      • Logs: several customs running MySQL instances (2 TB each).
      • ODS BI: 2x Sun v40 running PostgreSQL 8.3 (6 TB on Sun JBODs on ZFS each).
      • ODS archive: 2x custom running PostgreSQL 8.3 (14 TB internal storage on ZFS each).
    • 103. PostgreSQL is Lacking
      • No upgrades (AYFKM).
      • pg_dump is too intrusive.
      • Poor system-level instrumentation.
      • Poor methods to determine specific contention.
      • It relies on the operating system’s filesystem cache (which makes PostgreSQL inconsistent across its supported OS base).
    • 104. Enter Solaris
      • Solaris is a UNIX from Sun Microsystems.
      • Is it different than other UNIX/UNIX-like systems?
        • Mostly it isn’t different (hence the term UNIX).
        • It does have extremely strong ABI backward compatibility.
        • It’s stable and works well on large machines.
      • Solaris 10 shakes things up a bit:
        • DTrace
        • ZFS
        • Zones
    • 105. Solaris / ZFS
      • ZFS: Zettabyte Filesystem.
        • 2^64 snapshots, 2^48 files/directory, 2^64 bytes/filesystem, 2^78 bytes (256 ZiB) in a pool, 2^64 devices/pool, 2^64 pools/system.
      • Extremely cheap differential backups.
      • I have a 5 TB database, I need a backup!
        • No rollback in your database? What is this? MySQL?
        • No rollback in your filesystem?
        • ZFS has snapshots, rollback, clone and promote.
        • OMG! Life-altering features.
      • Caveat: ZFS is slower than alternatives, by about 10% with tuning.
    • 106. Solaris / Zones
      • Zones: virtual environments.
        • Shared kernel.
        • Can share filesystems.
        • Segregated processes and privileges.
      • No big deal for databases, right? But wait!
    • 107. Solaris / ZFS + Zones = Magic Juju
      https://labs.omniti.com/trac/pgsoltools/browser/trunk/pitr_clone/clonedb_startclone.sh
      • ZFS snapshot, clone, delegate to zone, boot and run.
      • When done, halt zone, destroy clone.
      • We get a point-in-time copy of our PostgreSQL database:
        • read-write,
        • low disk-space requirements,
        • NO LOCKS! Welcome back pg_dump, you don’t suck (as much) anymore.
      • Fast snapshot-to-usable-copy time:
        • On our 20 GB database: 1 minute.
        • On our 1.2 TB database: 2 minutes.
    • 108. ZFS: how I saved my soul.
      • Database crash. Bad. 1.2 TB of data... busted. (The reason Robert Treat looks a bit older than he should.)
      • xlogs corrupted. Catalog indexes corrupted.
      • Fault? PostgreSQL bug? Bad memory? Who knows?
      • Trial & error on a 1.2 TB data set is a cruel experience.
        • In real life, most recovery actions are destructive actions.
        • PostgreSQL is no different.
      • Rollback to last checkpoint (ZFS), hack postgres code, try, fail, repeat.
    • 109. Let DTrace open your eyes
      • DTrace: Dynamic Tracing.
      • Dynamically instrument “stuff” in the system:
        • system calls (like strace/truss/ktrace).
        • process/scheduler activity (on/off CPU, semaphores, conditions).
        • see signals sent and received.
        • trace kernel functions, networking.
        • watch I/O down to the disk.
        • user-space processes, each function... each machine instruction!
        • Add probes into apps where it makes sense to you.
    • 110. Can you see what I see?
      • There is EXPLAIN... when that isn’t enough...
      • There is EXPLAIN ANALYZE... when that isn’t enough,
      • There is DTrace.

        ; dtrace -q -n '
        postgresql*:::statement-start
        {
          self->query = copyinstr(arg0);
          self->ok = 1;
        }
        io:::start
        /self->ok/
        {
          @[self->query,
            args[0]->b_flags & B_READ ? "read" : "write",
            args[1]->dev_statname] = sum(args[0]->b_bcount);
        }'
        dtrace: description 'postgresql*:::statement-start' matched 14 probes
        ^C
        select count(1) from c2w_ods.tblusers
         where zipcode between 10000 and 11000;
            read  sd1       16384
        select division, sum(amount), avg(amount) from ods.billings
         where txn_timestamp between '2006-01-01 00:00:00'
           and '2006-04-01 00:00:00' group by division;
            read  sd2    71647232
    • 111. OmniTI Labs / pgsoltools
      • https://labs.omniti.com/trac/pgsoltools
      • Where we stick our PostgreSQL-on-Solaris goodies...
      • like pg_file_stress:

        FILENAME/DBOBJECT                               READS            WRITES
                                                     #  min avg max    #  min avg max
        alldata1__idx_remove_domain_external         1   12  12  12  398    0   0   0
        slowdata1__pg_rewrite                        1   12  12  12    0    0   0   0
        slowdata1__pg_class_oid_index                1    0   0   0    0    0   0   0
        slowdata1__pg_attribute                      2    0   0   0    0    0   0   0
        alldata1__mv_users                           0    0   0   0    4    0   0   0
        slowdata1__pg_statistic                      1    0   0   0    0    0   0   0
        slowdata1__pg_index                          1    0   0   0    0    0   0   0
        slowdata1__pg_index_indexrelid_index         1    0   0   0    0    0   0   0
        alldata1__remove_domain_external             0    0   0   0  502    0   0   0
        alldata1__promo_15_tb_full_2                19    0   0   0   11    0   0   0
        slowdata1__pg_class_relname_nsp_index        2    0   0   0    0    0   0   0
        alldata1__promo_177intaoltest_tb             0    0   0   0 1053    0   0   0
        slowdata1__pg_attribute_relid_attnum_index   2    0   0   0    0    0   0   0
        alldata1__promo_15_tb_full_2_pk              2    0   0   0    0    0   0   0
        alldata1__all_mailable_2                  1403    0   0 423    0    0   0   0
        alldata1__mv_users_pkey                      0    0   0   0    4    0   0   0
    • 117. Results
      • Move ODS Oracle licenses to OLTP.
      • Run PostgreSQL on ODS.
      • Save $800k in license costs.
      • Spend $100k in labor costs.
      • Learn a lot.
    • 118. Thanks!
      • Thank you.
      • http://omniti.com/does/postgresql
      • We’re hiring, but only if you love:
        • lots of data on lots of disks on lots of big boxes
        • smart people
        • hard problems
        • more than one database technology (including PostgreSQL)
        • responsibility
