
Big Bad PostgreSQL @ Percona

This talk discusses building a multi-terabyte PostgreSQL instance as a high-volume, mission-critical operational datastore that replaced Oracle. It also covers solving real-life problems, such as a near-catastrophic hardware failure at the terabyte level.

Big Bad PostgreSQL: A Case Study

Moving a “large,” “complicated,” and mission-critical datawarehouse from Oracle to PostgreSQL for cost control.

About the Speaker

 •   Principal @ OmniTI
 •   Open Source: mod_backhand, spreadlogd, OpenSSH+SecurID, Daiquiri, Wackamole, libjlog, Spread, Reconnoiter, etc.
 •   Closed Source: Message Systems MTA, Message Central
 •   Author: Scalable Internet Architectures

[Image: cover of "Scalable Internet Architectures" by Theo Schlossnagle (Developer's Library)]

Overall Architecture

 •   OLTP (Oracle 8i): 0.5 TB Hitachi + 0.25 TB JBOD. The OLTP instance drives the site.
 •   OLTP warm backup (Oracle 8i): 0.75 TB JBOD. Warm spare.
 •   Datawarehouse (Oracle 8i): 0.5 TB Hitachi + 1.5 TB MTI.
 •   Log Importer (MySQL log importer): 1.2 TB SATA RAID. Log import and processing.
 •   Data Exporter (MySQL 4.1): 1.2 TB IDE RAID. Bulk selects / data exports.

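
To get a feel for the scale of the architecture above, the storage figures on the diagram can be tallied per database engine. The following is only a rough sketch: the grouping of volumes under each node is one reading of the diagram, and the names `Node`, `NODES`, and `total_tb` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One box on the architecture diagram (grouping is an assumption)."""
    role: str
    engine: str
    volumes_tb: tuple  # pairs of (size in TB, storage label)


# Sizes and labels as they appear on the slide.
NODES = [
    Node("OLTP", "Oracle 8i", ((0.5, "Hitachi"), (0.25, "JBOD"))),
    Node("OLTP warm backup", "Oracle 8i", ((0.75, "JBOD"),)),
    Node("Datawarehouse", "Oracle 8i", ((0.5, "Hitachi"), (1.5, "MTI"))),
    Node("Log Importer", "MySQL", ((1.2, "SATA RAID"),)),
    Node("Data Exporter", "MySQL 4.1", ((1.2, "IDE RAID"),)),
]


def total_tb(nodes, engine_prefix=""):
    """Sum the storage (in TB) of nodes whose engine starts with engine_prefix."""
    total = sum(
        size
        for node in nodes
        if node.engine.startswith(engine_prefix)
        for size, _label in node.volumes_tb
    )
    return round(total, 2)


if __name__ == "__main__":
    print("Oracle storage (TB):", total_tb(NODES, "Oracle"))
    print("MySQL storage (TB):", total_tb(NODES, "MySQL"))
    print("All storage (TB):", total_tb(NODES))
```

Under this reading, the Oracle instances account for the majority of the provisioned storage, which is consistent with the talk's focus on moving the datawarehouse off Oracle.
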
Database Situation

 •   The problems:
     • The database is growing.
     • The OLTP and ODS/warehouse are too slow.
     • A lot of application code against the OLTP system.
     • Minimal application code against the ODS system.

Learning About GenAI Engineering with AWS PartyRock [AWS User Group Basel - F...Learning About GenAI Engineering with AWS PartyRock [AWS User Group Basel - F...
Learning About GenAI Engineering with AWS PartyRock [AWS User Group Basel - F...
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 

Big Bad PostgreSQL @ Percona

  • 1. Big Bad PostgreSQL: A Case Study. Moving a “large,” “complicated,” and mission-critical datawarehouse from Oracle to PostgreSQL for cost control.
  • 2. About the Speaker: Theo Schlossnagle • Principal @ OmniTI (www.omniti.com) • Open Source: mod_backhand, spreadlogd, OpenSSH+SecurID, Daiquiri, Wackamole, libjlog, Spread, Reconnoiter, etc. • Closed Source: Message Systems MTA, Message Central • Author: Scalable Internet Architectures [Book cover: Scalable Internet Architectures, Developer’s Library. Cover bio: Theo Schlossnagle is a principal at OmniTI Computer Consulting, where he provides expert consulting services related to scalable Internet architectures, database replication, and email infrastructure. He is the creator of the Backhand Project and the Ecelerity MTA, and spends most of his time solving the scalability problems that arise in high performance and highly distributed systems.]
  • 3. Overall Architecture [Diagram; labels as extracted] • OLTP instance: Oracle 8i, drives the site • OLTP warm backup (warm spare) • Log import and processing: Oracle 8i • Datawarehouse • MySQL log importer (Log Importer, MySQL 4.1) • Data Exporter: bulk selects / data exports • Storage labels: 0.5 TB Hitachi, 0.25 TB JBOD, 0.75 TB JBOD, 0.5 TB Hitachi, 1.5 TB MTI, 1.2 TB SATA RAID, 1.2 TB IDE RAID
  • 8. Database Situation • The problems: • The database is growing. • The OLTP and ODS/warehouse are too slow. • A lot of application code against the OLTP system. • Minimal application code against the ODS system. • Oracle: • Licensed per processor. • Really, really, really expensive on a large scale. • PostgreSQL: • No licensing costs. • Good support for complex queries.
  • 12. Database Choices • Must keep Oracle on OLTP • Complex, Oracle-specific web application. • Need more processors. • ODS: Oracle not required. • Complex queries from limited sources. • Needs more space and power. • Result: • Move ODS Oracle licenses to OLTP • Run PostgreSQL on ODS
  • 17. PostgreSQL gotchas • For an OLTP system that does thousands of updates per second, vacuuming is a hassle. • No upgrades?! • Less community experience with large databases. • Replication features less evolved.
  • 23. PostgreSQL ♥ ODS • Mostly inserts. • Updates/Deletes controlled, not real-time. • pl/perl (leverage DBI/DBD for remote database connectivity). • Monster queries. • Extensible.
  • 31. Choosing Linux • Popular, liked, good community support. • Chronic problems: • kernel panics • filesystems remounting read-only • filesystems don’t support snapshots • LVM is clunky on enterprise storage • 20 outages in 4 months
  • 40. Choosing Solaris 10 • Switched to Solaris 10 • No crashes, better system-level tools. • prstat, iostat, vmstat, smf, fault-management. • ZFS • snapshots (persistent), BLI backups. • Excellent support for enterprise storage. • DTrace. • Free (too).
  • 47. Oracle features we need • Partitioning • Statistics and Aggregations • rank over partition, lead, lag, etc. • Large selects (100GB) • Autonomous transactions • Replication from Oracle (to Oracle)
  • 54. Partitioning • For large data sets:
      pgods=# select count(1) from ods.ods_tblpick_super;
         count
      ------------
       1790994512
      (1 row)
    • Next biggest tables: 850m, 650m, 590m
    • Allows us to cluster data over specific ranges (by date in our case)
    • Simple, cheap archiving and removal of data.
    • Can put ranges used less often in different tablespaces (slower, cheaper storage)
  • 62. Partitioning PostgreSQL style • PostgreSQL doesn’t support partition... • It supports inheritance... (what’s this?) • some crazy object-relation paradigm. • We can use it to implement partitioning: • One master table with no rows. • Child tables that have our partition constraints. • Rules on the master table for insert/update/delete.
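The master/child/rule scheme on this slide can be sketched as a small Python helper that only builds the DDL strings for one monthly partition. This is a minimal sketch under stated assumptions, not the talk's actual schema: the `pickdate` column, the `_YYYY_MM` child naming, the `_ins` rule suffix, and the helper name are all hypothetical.

```python
from datetime import date
from typing import List


def month_partition_ddl(parent: str, column: str, year: int, month: int) -> List[str]:
    """Return DDL for one monthly child table plus an insert rule on the parent.

    Hypothetical helper: the naming scheme and partition column are assumptions.
    """
    start = date(year, month, 1)
    # The first day of the following month is the exclusive upper bound.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    child = f"{parent}_{start:%Y_%m}"
    # Rule names are not schema-qualified, so drop the schema prefix.
    rule = f"{child.split('.')[-1]}_ins"
    create = (
        f"CREATE TABLE {child} ("
        f"CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')"
        f") INHERITS ({parent});"
    )
    route = (
        f"CREATE RULE {rule} AS ON INSERT TO {parent} "
        f"WHERE (NEW.{column} >= DATE '{start}' AND NEW.{column} < DATE '{end}') "
        f"DO INSTEAD INSERT INTO {child} VALUES (NEW.*);"
    )
    return [create, route]


if __name__ == "__main__":
    for stmt in month_partition_ddl("ods.ods_tblpick_super", "pickdate", 2007, 12):
        print(stmt)
```

The child table carries the partition constraint as a CHECK, and the rule redirects matching inserts from the empty master table to that child. Dropping an old month then becomes a DROP TABLE on one child rather than a large DELETE.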
  • 70. Partitioning PostgreSQL realized • Cheaply add new empty partitions • Cheaply remove old partitions • Migrate less-often-accessed partitions to slower storage • Different index strategies per partition • PostgreSQL 8.1 and later support constraint checking (constraint exclusion) on inherited tables. • smarter planning • smarter executing
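The "smarter planning" bullet comes down to skipping child tables whose CHECK range cannot match the query. The following is only a toy model of that idea in Python, not the planner's actual logic; the partition names and date ranges are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# (child table name, inclusive lower bound, exclusive upper bound); all hypothetical.
Partition = Tuple[str, date, date]


def partitions_to_scan(parts: List[Partition], lo: date, hi: date) -> List[str]:
    """Keep only children whose CHECK range can overlap the query range [lo, hi)."""
    return [name for name, start, end in parts if start < hi and lo < end]


PARTS: List[Partition] = [
    ("tblpick_2007_10", date(2007, 10, 1), date(2007, 11, 1)),
    ("tblpick_2007_11", date(2007, 11, 1), date(2007, 12, 1)),
    ("tblpick_2007_12", date(2007, 12, 1), date(2008, 1, 1)),
]

if __name__ == "__main__":
    # A query for 2007-11-15 up to (not including) 2007-12-05 skips October.
    print(partitions_to_scan(PARTS, date(2007, 11, 15), date(2007, 12, 5)))
```

With date-ranged children, a query restricted to recent dates touches only the recent partitions, which is what makes a table of nearly 1.8 billion rows practical to query.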
  • 73. RANK OVER PARTITION
    • In Oracle:
        select userid, email from (
          select u.userid, u.email,
                 row_number() over (partition by u.email order by userid desc) as position
          from (...))
        where position = 1
    • In PostgreSQL:
        FOR v_row IN select u.userid, u.email from (...) order by email, userid desc
        LOOP
          IF v_row.email != v_last_email THEN
            RETURN NEXT v_row;
            v_last_email := v_row.email;
            v_rownum := v_rownum + 1;
          END IF;
        END LOOP;
    • With 8.4, we have windowing functions
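The PL/pgSQL loop emulates `row_number() ... = 1` by sorting and then emitting a row only when the email changes. A minimal Python sketch of the same technique, with hypothetical sample data and function name, makes the logic easy to check:

```python
from typing import Iterable, List, Optional, Tuple

User = Tuple[int, str]  # (userid, email)


def newest_user_per_email(rows: Iterable[User]) -> List[User]:
    """Emulate row_number() over (partition by email order by userid desc) = 1."""
    result: List[User] = []
    last_email: Optional[str] = None  # sentinel: no email seen yet
    # Same ordering as the slide's query: email ascending, userid descending.
    for userid, email in sorted(rows, key=lambda r: (r[1], -r[0])):
        if email != last_email:
            result.append((userid, email))
            last_email = email
    return result


if __name__ == "__main__":
    sample = [(1, "a@example.com"), (3, "a@example.com"), (2, "b@example.com")]
    print(newest_user_per_email(sample))
    # prints [(3, 'a@example.com'), (2, 'b@example.com')]
```

One caution when porting this back to PL/pgSQL: a comparison against a NULL `v_last_email` is not true, so the variable needs a non-NULL initial value (or an `IS DISTINCT FROM` test) for the first row to be returned.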
  • 78. Large SELECTs • Application code does:
        select u.*, b.browser, m.lastmess
          from ods.ods_users u, ods.ods_browsers b,
               ( select userid, min(senddate) as senddate
                   from ods.ods_maillog
                  group by userid ) m,
               ods.ods_maillog l
         where u.userid = b.userid
           and u.userid = m.userid
           and u.userid = l.userid
           and l.senddate = m.senddate;
    • The width of these rows is about 2k
    • 50 million row return set
    • > 100 GB of data
  • 79. The Large SELECT Problem • libpq will buffer the entire result in memory. • This affects language bindings (DBD::Pg). • This is an utterly deficient default behavior. • This can be avoided by using cursors • Requires the app to be PostgreSQL specific. • You open a cursor. • Then FETCH the row count you desire.
  • 82. Big SELECTs the Postgres way • The previous “big” query becomes:
        DECLARE bigdump CURSOR FOR
        select u.*, b.browser, m.lastmess
          from ods.ods_users u, ods.ods_browsers b,
               ( select userid, min(senddate) as senddate
                   from ods.ods_maillog
                  group by userid ) m,
               ods.ods_maillog l
         where u.userid = b.userid
           and u.userid = m.userid
           and u.userid = l.userid
           and l.senddate = m.senddate;
    • Then, in a loop:
        FETCH FORWARD 10000 FROM bigdump;
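The client side of the cursor approach is a loop that pulls a bounded batch, processes it, and repeats until a batch comes back empty. The sketch below shows that loop shape in Python. It uses the standard-library `sqlite3` module purely as a stand-in database so the example runs anywhere; the table and the batch size are hypothetical. Against PostgreSQL, each iteration would instead issue `FETCH FORWARD n FROM bigdump` inside the transaction that declared the cursor, since a cursor declared without `WITH HOLD` only exists within its transaction block.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_batches(cur: sqlite3.Cursor, size: int) -> Iterator[List[Tuple]]:
    """Yield result rows in batches of at most `size`, like repeated FETCH FORWARD."""
    while True:
        batch = cur.fetchmany(size)
        if not batch:  # an empty batch means the cursor is exhausted
            return
        yield batch


def demo() -> List[int]:
    """Return the batch sizes seen when reading 25 rows in batches of 10."""
    conn = sqlite3.connect(":memory:")  # stand-in database, not PostgreSQL
    conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in range(25)])
    cur = conn.execute("SELECT userid FROM users ORDER BY userid")
    sizes = [len(batch) for batch in fetch_in_batches(cur, 10)]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo())  # prints [10, 10, 5]
```

Only one batch is held in client memory at a time, which is the point of the technique for a result set of more than 100 GB.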
  • 87. Autonomous Transactions • In Oracle we have over 2000 custom stored procedures. • During these procedures, we like to: • COMMIT incrementally: useful for long transactions (update/delete) that need not be atomic. • Start a new top-level txn that can COMMIT: useful for logging progress in a stored procedure, so that you know how far you progressed and how long each step took even if it rolls back.
  • 91. PostgreSQL shortcoming • PostgreSQL simply does not support Autonomous transactions and to quote core developers “that would be hard.” • When in doubt, use brute force. • Use pl/perl to use DBD::Pg to connect to ourselves (a new backend) and execute a new top-level transaction.
  • 98. Replication • Cross vendor database replication isn’t too difficult. • Helps a lot when you can do it inside the database. • Using dbi-link (based on pl/perl and DBI) we can. • We can connect to any remote database. • INSERT into local tables directly from remote SELECT statements. [snapshots] • LOOP over remote SELECT statements and process them row-by-row. [replaying remote DML logs]
  • 101. Replication (really) • Through a combination of snapshotting and DML replay we: • replicate into over 2000 tables in PostgreSQL from Oracle • snapshot replication of 200 • DML replay logs for 1800 • PostgreSQL to Oracle is a bit harder • out-of-band export and imports
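The two replication modes differ in what is shipped: a snapshot recopies the whole remote table, while DML replay applies a log of individual changes row by row. The Python sketch below models both on in-memory dictionaries. It is an illustration only: the log entry format, the function names, and the sample rows are hypothetical, and the real system runs inside PostgreSQL through dbi-link and pl/perl.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# One replay-log entry: (operation, primary key, new row or None). Hypothetical format.
LogEntry = Tuple[str, int, Optional[Row]]


def snapshot(remote_rows: Iterable[Tuple[int, Row]]) -> Dict[int, Row]:
    """Snapshot replication: rebuild the local copy from a full remote SELECT."""
    return {pk: dict(row) for pk, row in remote_rows}


def replay(local: Dict[int, Row], log: Iterable[LogEntry]) -> None:
    """DML replay: apply logged remote changes to the local copy, in log order."""
    for op, pk, row in log:
        if op == "DELETE":
            local.pop(pk, None)
        elif op in ("INSERT", "UPDATE"):
            local[pk] = dict(row or {})
        else:
            raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    local = snapshot([(1, {"email": "a"}), (2, {"email": "b"})])
    replay(local, [("UPDATE", 1, {"email": "c"}), ("DELETE", 2, None)])
    print(local)  # prints {1: {'email': 'c'}}
```

Snapshots suit small or slowly changing tables, while replay keeps large tables current without recopying them, which matches the 200 versus 1800 split on the slide.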
  • 102. New Architecture • Master: Sun v890 and Hitachi AMS + warm standby running Oracle (1TB) • Logs: several customs running MySQL instances (2TB each) • ODS BI: 2x Sun v40 running PostgreSQL 8.3 (6TB on Sun JBODs on ZFS each) • ODS archive: 2x custom running PostgreSQL 8.3 (14TB internal storage on ZFS each)
  • 103. PostgreSQL is Lacking • No upgrades (AYFKM). • pg_dump is too intrusive. • Poor system-level instrumentation. • Poor methods to determine specific contention. • It relies on the operating system’s filesystem cache (which makes PostgreSQL inconsistent across its supported OS base).
  • 104. Enter Solaris • Solaris is a UNIX from Sun Microsystems. • Is it different than other UNIX/UNIX-like systems? • Mostly it isn’t different (hence the term UNIX) • It does have extremely strong ABI backward compatibility. • It’s stable and works well on large machines. • Solaris 10 shakes things up a bit: • DTrace • ZFS • Zones
  • 105. Solaris / ZFS • ZFS: Zettabyte File System. • 2^64 snapshots, 2^48 files/directory, 2^64 bytes/filesystem, 2^78 bytes (256 ZiB) in a pool, 2^64 devices/pool, 2^64 pools/system • Extremely cheap differential backups. • I have a 5 TB database, I need a backup! • No rollback in your database? What is this? MySQL? • No rollback in your filesystem? • ZFS has snapshots, rollback, clone and promote. • OMG! Life altering features. • Caveat: ZFS is slower than alternatives, by about 10% with tuning.
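The capacity figures on this slide are powers of two, and the "256 ZiB" in parentheses is simply the pool limit converted to binary units. A short arithmetic check in Python (the helper name is hypothetical):

```python
ZIB_EXP = 70  # 1 ZiB = 2**70 bytes
EIB_EXP = 60  # 1 EiB = 2**60 bytes


def in_units(limit_exp: int, unit_exp: int) -> int:
    """Express a limit of 2**limit_exp bytes in units of 2**unit_exp bytes."""
    return 2 ** (limit_exp - unit_exp)


if __name__ == "__main__":
    print(in_units(78, ZIB_EXP))  # pool limit in ZiB: prints 256
    print(in_units(64, EIB_EXP))  # per-filesystem limit in EiB: prints 16
```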
  • 106. Solaris / Zones • Zones: Virtual Environments. • Shared kernel. • Can share filesystems. • Segregated processes and privileges. • No big deal for databases, right? But Wait!
  • 107. Solaris / ZFS + Zones = Magic Juju https://labs.omniti.com/trac/pgsoltools/browser/trunk/pitr_clone/clonedb_startclone.sh • ZFS snapshot, clone, delegate to zone, boot and run. • When done, halt zone, destroy clone. • We get a point-in-time copy of our PostgreSQL database: • read-write, • low disk-space requirements, • NO LOCKS! Welcome back pg_dump, you don’t suck (as much) anymore. • Fast snapshot to usable copy time: • On our 20 GB database: 1 minute. • On our 1.2 TB database: 2 minutes.
  • 108. ZFS: how I saved my soul. • Database crash. Bad. 1.2 TB of data... busted. The reason Robert Treat looks a bit older than he should. • xlogs corrupted. catalog indexes corrupted. • Fault? PostgreSQL bug? Bad memory? Who knows? • Trial & error on a 1.2 TB data set is a cruel experience. • In real-life, most recovery actions are destructive actions. • PostgreSQL is no different. • Rollback to last checkpoint (ZFS), hack postgres code, try, fail, repeat.
  • 109. Let DTrace open your eyes • DTrace: Dynamic Tracing • Dynamically instrument “stuff” in the system: • system calls (like strace/truss/ktrace). • process/scheduler activity (on/off cpu, semaphores, conditions). • see signals sent and received. • trace kernel functions, networking. • watch I/O down to the disk. • user-space processes, each function... each machine instruction! • Add probes into apps where it makes sense to you.
  • 110. Can you see what I see? • There is EXPLAIN... when that isn’t enough... • There is EXPLAIN ANALYZE... when that isn’t enough. • There is DTrace.
      ; dtrace -q -n '
      postgresql*:::statement-start
      {
        self->query = copyinstr(arg0);
        self->ok = 1;
      }
      io:::start
      /self->ok/
      {
        @[self->query,
          args[0]->b_flags & B_READ ? "read" : "write",
          args[1]->dev_statname] = sum(args[0]->b_bcount);
      }'
      dtrace: description 'postgres*:::statement-start' matched 14 probes
      ^C
      select count(1) from c2w_ods.tblusers where zipcode between 10000 and 11000;
        read sd1 16384
      select division, sum(amount), avg(amount) from ods.billings where txn_timestamp between '2006-01-01 00:00:00' and '2006-04-01 00:00:00' group by division;
        read sd2 71647232
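The DTrace one-liner records the current query when a statement starts, then sums the bytes of each I/O request under a key of (query, read or write, device). The Python sketch below models only that aggregation over a list of simplified events for a single backend. The event tuples are a hypothetical stand-in for the probes; this does not invoke DTrace.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Simplified events (hypothetical shapes):
#   ("statement-start", query_text)
#   ("io-start", is_read, device_name, byte_count)
Event = Tuple
Key = Tuple[str, str, str]  # (query, "read" or "write", device)


def aggregate_io(events: Iterable[Event]) -> Dict[Key, int]:
    """Sum I/O bytes per (query, direction, device), like the @[...] = sum(...) clause."""
    totals: Dict[Key, int] = defaultdict(int)
    query: Optional[str] = None  # plays the role of self->query and self->ok
    for event in events:
        if event[0] == "statement-start":
            query = event[1]
        elif event[0] == "io-start" and query is not None:
            _, is_read, device, nbytes = event
            totals[(query, "read" if is_read else "write", device)] += nbytes
    return dict(totals)


if __name__ == "__main__":
    trace = [
        ("io-start", True, "sd1", 512),  # ignored: no statement seen yet
        ("statement-start", "select 1"),
        ("io-start", True, "sd1", 8192),
        ("io-start", True, "sd1", 8192),
    ]
    print(aggregate_io(trace))  # prints {('select 1', 'read', 'sd1'): 16384}
```

The output reads the same way as the slide's: each line attributes a byte total on one device to the statement that was running when the I/O started.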
  • 111. OmniTI Labs / pgsoltools • https://labs.omniti.com/trac/pgsoltools • Where we stick our PostgreSQL on Solaris goodies... • like pg_file_stress
      FILENAME/DBOBJECT                            READS                 WRITES
                                                   #     min  avg  max   #     min  avg  max
      alldata1__idx_remove_domain_external         1     12   12   12    398   0    0    0
      slowdata1__pg_rewrite                        1     12   12   12    0     0    0    0
      slowdata1__pg_class_oid_index                1     0    0    0     0     0    0    0
      slowdata1__pg_attribute                      2     0    0    0     0     0    0    0
      alldata1__mv_users                           0     0    0    0     4     0    0    0
      slowdata1__pg_statistic                      1     0    0    0     0     0    0    0
      slowdata1__pg_index                          1     0    0    0     0     0    0    0
      slowdata1__pg_index_indexrelid_index         1     0    0    0     0     0    0    0
      alldata1__remove_domain_external             0     0    0    0     502   0    0    0
      alldata1__promo_15_tb_full_2                 19    0    0    0     11    0    0    0
      slowdata1__pg_class_relname_nsp_index        2     0    0    0     0     0    0    0
      alldata1__promo_177intaoltest_tb             0     0    0    0     1053  0    0    0
      slowdata1__pg_attribute_relid_attnum_index   2     0    0    0     0     0    0    0
      alldata1__promo_15_tb_full_2_pk              2     0    0    0     0     0    0    0
      alldata1__all_mailable_2                     1403  0    0    423   0     0    0    0
      alldata1__mv_users_pkey                      0     0    0    0     4     0    0    0
  • 117. Results • Move ODS Oracle licenses to OLTP • Run PostgreSQL on ODS • Save $800k in license costs. • Spend $100k in labor costs. • Learn a lot.
  • 118. Thanks! • Thank you. • http://omniti.com/does/postgresql • We’re hiring, but only if you love: • lots of data on lots of disks on lots of big boxes • smart people • hard problems • more than one database technology (including PostgreSQL) • responsibility

Editor's Notes