SlideShare a Scribd company logo
1 of 69
Download to read offline
When it all Goes Wrong
Nordic PGDay 2019 — March 19 — Copenhagen
@leinweber
Will Leinweber
@leinweber

Citus Data (Microsoft)

bitfission.com

(warning autoplays midi)
@leinweber
coming from
citus cloud

heroku postgres
@leinweber
special thanks
citus cloud

— dan farina (@danfarina)

heroku postgres

— maciek sakrejda (@uhoh_itsmaciek)
@leinweber
same sorts of problems
from pages & alerts

from support tickets
@leinweber
this talk
more app dev who uses postgres

rather than dba
@leinweber
the problem with Postgres
it’s pretty good

you don’t get experience with how it breaks
@leinweber
what to do for a problem
@leinweber
what to do for a problem
@leinweber
complicated system
network

hardware

o/s

postgres
@leinweber
using the database (too much)
95% application

4% auto vacuum 

1% everything else
@leinweber
hard to convince
all the graphs saying DB is slow

and nothing has changed

…must be the database!
@leinweber
https://upload.wikimedia.org/wikipedia/commons/9/98/Survivorship-bias.png
@leinweber
“but I didn’t change anything”
no deploys!

no database migrations!

no scaling!
@leinweber
“but I didn’t change anything”
https://upload.wikimedia.org/wikipedia/commons/0/09/Redherring.gif
@leinweber
“but I didn’t change anything”
more traffic?

change in access patterns?

one big user logged in?
@leinweber
run out of a resource
@leinweber
snowball
@leinweber
example
manageable user 1s query => 2x expensive

frequent, small queries 3ms => 12ms
@leinweber
assumptions
hardware
maintenance
app
@leinweber
assumptions
postgres should not crash

…with overcommit off and no containers

large extensions increase chance
@leinweber
if not postgres, then what
@leinweber
system resources
cpu

memory

disk

parallelism / backends

locks
@leinweber
cpu mem disk parallelismcpu mem disk parallelism
@leinweber
cpu mem disk parallelism
credentials wrong

networking broken 

locking issue, check pg_locks

idle in transaction
@leinweber
cpu mem disk parallelism
application submitting backlogged workload

connection leak

pool sizes set too large

pg_lock issue + application backlog
@leinweber
cpu mem disk parallelism
workload skew causing thrashing

unusual sequential scan workload

failover or restart => no cache

pg_prewarm
@leinweber
cpu mem disk parallelism
same as just disk, 

but also the application is piling on
@leinweber
cpu mem disk parallelism
large GROUP BYs

high disk latency due to unusual page
dispersion pattern in the workload
@leinweber
cpu mem disk parallelism
workload has high mem (GROUP BY)

+ app adding backlog

lock contention slowing mem release
@leinweber
cpu mem disk parallelism
large GROUP BYs + paging in unusual data
@leinweber
cpu mem disk parallelism
Look for what is causing disk access
@leinweber
cpu mem disk parallelism
small, in-memory workload

lots of seq scans on small table

index scan w/ filter dropping lots
@leinweber
cpu mem disk parallelism
app backlog 

+ too much processing on small data

simply a lot of work
@leinweber
cpu mem disk parallelism
large seq scans
@leinweber
cpu mem disk parallelism
loading cold data + application backlog
@leinweber
cpu mem disk parallelism
small # of backends doing a lot more work
@leinweber
cpu mem disk parallelism
entity, workload, entity*workload

soft deletes and non-conditional indexes
@leinweber
cpu mem disk parallelism
reporting query
@leinweber
cpu mem disk parallelism
app backlog, but with CPU/mem problems
@leinweber
tools of the trade
@leinweber
tools of the trade
C symbols
@leinweber
tools of the trade: perf
perf record -p <pid> && perf report
@leinweber
tools of the trade: perf
perf top
@leinweber
tools of the trade: perf
www.brendangregg.com/perf.html
@leinweber
tools of the trade: gdb
gdb -batch -ex 'bt' -p <pid>
@leinweber
@leinweber
@leinweber
tools of the trade: iostat
iostat -xm 10
@leinweber
tools of the trade: iotop
@leinweber
tools of the trade: htop
@leinweber
Tools of the trade: bwm-ng
@leinweber
tools of the trade: backends
pgrep -lf postgres + grep + wc
select * from pg_stat_activity
@leinweber
tools of the trade: pg_s_s
select * from pg_stat_statements
@leinweber
tools of the trade: summary
cpu mem disk parallelism network
perf x
gdb x
iostat x
iotop x
htop x x
bwm x
pgrep x
@leinweber
what to do
@leinweber
what to do
configuration change
@leinweber
what to do
db change
@leinweber
what to do
code change
@leinweber
flirting with disaster
Velocity NY 2013: Richard Cook

"Resilience In Complex Adaptive Systems”

Jens Rasmussen:

Risk management in a dynamic society: a
modeling problem
@leinweber
flirting with disaster
economic
boundary
@leinweber
flirting with disaster
economic
boundary
workload
boundary
@leinweber
flirting with disaster
economic
boundary
workload
boundary
performance
boundary
@leinweber
flirting with disaster
economic
boundary
workload
boundary
performance
boundary
error
margin
@leinweber
flirting with disaster
economic
boundary
workload
boundary
performance
boundary
@leinweber
flirting with disaster
economic
boundary
workload
boundary
performance
boundary
error
margin
@leinweber
flirting with disaster
economic
boundary
workload
boundary
performance
boundary
error
margin
@leinweber
flirting with disaster
Velocity NY 2013: Richard Cook

"Resilience In Complex Adaptive Systems”

Jens Rasmussen: 

Risk management in a dynamic society: a
modeling problem
@leinweber
thank youWill Leinweber
@leinweber
citusdata.com

More Related Content

Similar to When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber

Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data project
Michael Peacock
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
HostedbyConfluent
 
Delivering Data - Social Networking Personal
Delivering Data - Social Networking PersonalDelivering Data - Social Networking Personal
Delivering Data - Social Networking Personal
iasaireland
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
programmermag
 
Tek12: Graphing real-time performance with Graphite
Tek12: Graphing real-time performance with GraphiteTek12: Graphing real-time performance with Graphite
Tek12: Graphing real-time performance with Graphite
nanderoo
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013
Server Density
 

Similar to When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber (20)

Evolution of a big data project
Evolution of a big data projectEvolution of a big data project
Evolution of a big data project
 
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
Application Quality Gates in Continuous Delivery: Deliver Better Software Fas...
 
Big data no company
Big data   no companyBig data   no company
Big data no company
 
Big Data - Need of Converged Data Platform
Big Data - Need of Converged Data PlatformBig Data - Need of Converged Data Platform
Big Data - Need of Converged Data Platform
 
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
Buckle Up! With Valerie Burchby and Xinran Waibe | Current 2022
 
Day 9 - PostgreSQL Application Architecture
Day 9 - PostgreSQL Application ArchitectureDay 9 - PostgreSQL Application Architecture
Day 9 - PostgreSQL Application Architecture
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
Delivering Data - Social Networking Personal
Delivering Data - Social Networking PersonalDelivering Data - Social Networking Personal
Delivering Data - Social Networking Personal
 
Copy data management
Copy data managementCopy data management
Copy data management
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics Hero
 
Tek12: Graphing real-time performance with Graphite
Tek12: Graphing real-time performance with GraphiteTek12: Graphing real-time performance with Graphite
Tek12: Graphing real-time performance with Graphite
 
DevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback LoopsDevOps Pipelines and Metrics Driven Feedback Loops
DevOps Pipelines and Metrics Driven Feedback Loops
 
Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data
Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data
Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from Storage
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 
High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013High performance Infrastructure Oct 2013
High performance Infrastructure Oct 2013
 
PyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive applicationPyConline AU 2021 - Things might go wrong in a data-intensive application
PyConline AU 2021 - Things might go wrong in a data-intensive application
 
Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!Big Data: fall seven times, stand up eight!
Big Data: fall seven times, stand up eight!
 

More from Citus Data

Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Citus Data
 
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Citus Data
 

More from Citus Data (20)

Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
Architecting peta-byte-scale analytics by scaling out Postgres on Azure with ...
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
JSONB Tricks: Operators, Indexes, and When (Not) to Use It | PostgresOpen 201...
 
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
Tutorial: Implementing your first Postgres extension | PGConf EU 2019 | Burak...
 
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig KerstiensWhats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
Whats wrong with postgres | PGConf EU 2019 | Craig Kerstiens
 
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise GrandjoncAmazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
Amazing SQL your ORM can (or can't) do | PGConf EU 2019 | Louise Grandjonc
 
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
What Microsoft is doing with Postgres & the Citus Data acquisition | PGConf E...
 
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff DavisDeep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
Deep Postgres Extensions in Rust | PGCon 2019 | Jeff Davis
 
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
Why Postgres Why This Database Why Now | SF Bay Area Postgres Meetup | Claire...
 
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise GrandjoncA story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
A story on Postgres index types | PostgresLondon 2019 | Louise Grandjonc
 
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
Why developers need marketing now more than ever | GlueCon 2019 | Claire Gior...
 
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine | Dimitri Fontaine
 
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
Optimizing your app by understanding your Postgres | RailsConf 2019 | Samay S...
 
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri FontaineThe Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
The Art of PostgreSQL | PostgreSQL Ukraine Meetup | Dimitri Fontaine
 
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
Using Postgres and Citus for Lightning Fast Analytics, also ft. Rollups | Liv...
 
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri FontaineHow to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine
 
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire GiordanoWhy PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
Why PostgreSQL Why This Database Why Now | Nordic PGDay 2019 | Claire Giordano
 
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
Scaling Multi-Tenant Applications Using the Django ORM & Postgres | PyCaribbe...
 
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
Data Modeling, Normalization, and Denormalisation | FOSDEM '19 | Dimitri Font...
 
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
Five data models for sharding and which is right | PGConf.ASIA 2018 | Craig K...
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 

When it all Goes Wrong |Nordic PGDay 2019 | Will Leinweber