PostgreSQL can be difficult to troubleshoot under pressure without the right knowledge and tools. Knowing where to find the information you need to improve performance is central to acting quickly. In this training, we'll discuss the query statistics views and log information available in PostgreSQL. Along the way, we'll highlight a handful of open-source and paid tools that can help you track data over time and provide better alerting, so that you know about problems before they become critical.
2. About me
Grant Fritchey
DevOps Advocate
Microsoft Data Platform MVP
AWS Community Builder
grant@scarydba.com
scarydba.com
@gfritchey
linkedin.com/in/grant-fritchey
3. About me
Ryan Booz
PostgreSQL & DevOps Advocate
@ryanbooz
/in/ryanbooz
www.softwareandbooz.com
youtube.com/@ryanbooz
5. Goals
• Understand common performance problems and how to spot them
• Develop knowledge about the tools within PostgreSQL that show performance issues
• Learn about open-source and third-party monitoring solutions
6. Most Common Database Problems
• Performance
• Scalability
• Security
• Multiple platforms
• Data integration & integrity
• Compliance
13. Statistics Views
• There are many, and each release tends to add more views or more detail
• Most standard views come in three variants:
• "all" view: contains a row for every object of that type
• "sys" view: contains a row only for system objects of that type
• "user" view: contains a row only for user objects of that type
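To see which of these views exist on a particular server, the catalog itself can be queried. A minimal sketch, assuming the built-in views live in the pg_catalog schema and share the pg_stat name prefix (the pattern also picks up the pg_statio_* views):

```sql
-- List the built-in statistics views available on this server version.
SELECT viewname
FROM pg_catalog.pg_views
WHERE schemaname = 'pg_catalog'
  AND viewname LIKE 'pg_stat%'
ORDER BY viewname;
```

Running this on two major versions side by side is a quick way to see what a release added.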
14. Standard Go-to Statistics Views
General
• pg_stat_database
• pg_stat_(all/sys/user)_tables
• pg_stat_(all/sys/user)_indexes
• pg_stat_io (PostgreSQL 16+)
For replication
• pg_stat_replication
• pg_stat_replication_slots
• pg_stat_subscription
• pg_stat_bgwriter
• pg_stat_archiver
• pg_stat_wal
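As an illustration of how two of these views might be used, the sketch below checks standby lag and WAL archiving health. It assumes it is run on a primary server (pg_current_wal_lsn() is not available during recovery) and that streaming replication or archiving is actually configured; otherwise the result sets are simply empty or zeroed.

```sql
-- Run on the primary: one row per connected standby / WAL sender.
SELECT application_name,
       state,
       sync_state,
       replay_lag,  -- time-based lag, reported as an interval
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- WAL archiving health: a growing failed_count deserves attention.
SELECT archived_count, last_archived_time, failed_count, last_failed_time
FROM pg_stat_archiver;
```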
15. pg_stat_database
• High-level statistics about each database
• Transaction counts
• Blocks used from cache or disk
• Tuples (rows) inserted, updated, deleted, etc.
• Deadlocks
• Session stats
• And more!
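The bullets above map onto columns that can be read directly. A minimal sketch of a per-database overview, where the cache hit percentage is a derived figure added here for illustration rather than a column of the view:

```sql
-- Per-database activity, with a derived buffer cache hit percentage.
SELECT datname,
       xact_commit,
       xact_rollback,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct,
       tup_inserted,
       tup_updated,
       tup_deleted,
       deadlocks,
       stats_reset
FROM pg_stat_database
WHERE datname IS NOT NULL   -- skip the row that holds shared-object statistics
ORDER BY datname;
```

Because the counters are cumulative, the stats_reset column matters when interpreting them: a high rollback count means something different over five minutes than over five months.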
16. pg_stat_*_tables
• Similar to the database statistics, but at the table level
• Tuple inserts, updates, deletes, HOT updates, live and dead tuple estimates, etc.
• Time of last:
• sequential scan and index scan (PostgreSQL 16+)
• vacuum/autovacuum
• analyze/autoanalyze
• And more!
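One common use of the table-level view is spotting tables where dead tuples are piling up. A minimal sketch, assuming the interesting tables are user tables in the current database; the LIMIT of 20 is arbitrary:

```sql
-- Tables with the most dead tuples, alongside their last maintenance times.
SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       n_tup_upd,
       n_tup_hot_upd,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```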
19. pg_stat_io
• New in PostgreSQL 16
• A built-in system view, not a 'contrib' module, so no extension is required
• I/O-related statistics broken down by backend type, object type, and context
• Helpful for tuning
• 'shared_buffers' (the shared buffer cache)
• Checkpointer inefficiency
• Issues with background jobs
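A first look at this view could be as simple as the sketch below, which assumes PostgreSQL 16 or later and only uses count columns; the filter and sort order are arbitrary choices for illustration.

```sql
-- Which backend types are doing the reading and writing, and in what context?
SELECT backend_type,
       object,
       context,
       reads,
       writes,
       extends,
       hits,
       evictions,
       fsyncs
FROM pg_stat_io
WHERE reads > 0 OR writes > 0
ORDER BY reads DESC NULLS LAST;
```

A large number of writes attributed to ordinary client backends rather than the checkpointer or background writer is one pattern that may point at the tuning areas listed above.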
20. What is pg_stat_statements?
• An extension included with PostgreSQL 8.4+
• Part of the contrib modules, but not enabled by default
• Must be loaded via 'shared_preload_libraries' in postgresql.conf (requires a server restart)
• Tracks aggregated statistics for all queries in the cluster
• Running CREATE EXTENSION in a database creates the views needed to query the data
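The setup steps above might look like the following on a self-managed server. This is a sketch: the configuration line is shown as a comment because it belongs in postgresql.conf, and hosted platforms usually expose the same setting through their own parameter interface.

```sql
-- In postgresql.conf (takes effect only after a server restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Confirm the library was loaded after the restart.
SHOW shared_preload_libraries;

-- Create the views and functions in the database you want to query from.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```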
21. How does it store aggregates?
• One row for every combination of dbid, userid, and queryid
• Stats are grouped by query structure, using a final ID determined by an internal hash calculation
22. How does it identify queries?
SELECT id, name FROM table1 WHERE id = 1000;
SELECT id, name FROM table1 WHERE id = $1;
SELECT id, name FROM table1 WHERE id IN (1000,2000,3000);
SELECT id, name FROM table1 WHERE id IN ($1,$2,$3);
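The normalization can be observed directly. A minimal sketch, assuming pg_stat_statements is installed and that a table like the slide's table1 (with id and name columns) exists:

```sql
-- Run the same statement shape with different literals...
SELECT id, name FROM table1 WHERE id = 1000;
SELECT id, name FROM table1 WHERE id = 2000;

-- ...then look at how the statements were recorded.
SELECT queryid, calls, query
FROM pg_stat_statements
WHERE query LIKE '%FROM table1 WHERE id =%'
ORDER BY calls DESC;
```

Both executions would be expected to share one entry whose text ends in "id = $1". How IN lists with different numbers of elements are grouped can differ between PostgreSQL versions, so it is worth checking on the version in use.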
23. pg_stat_statements statistics
• Execution Time (total/min/max/mean/stddev)
• Planning Time (total/min/max/mean/stddev)
• Calls (total)
• Rows (total)
• Buffers (shared/local/temp)
• read/hit/dirtied/written
• read/write time
• WAL
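A typical starting point is to rank statements by total execution time. The sketch below assumes PostgreSQL 13 or later, since it uses the *_exec_time and wal_bytes column names; the LIMIT is arbitrary.

```sql
-- Top statements by cumulative execution time.
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       shared_blks_hit,
       shared_blks_read,
       wal_bytes,
       left(query, 60) AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by mean_exec_time or shared_blks_read instead answers different questions (slowest individual statements, heaviest readers) from the same data.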
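26. All statistics are cumulative
since the last reset*
*A reset is triggered explicitly (for example, by a superuser) or by crash recovery; by default the counters survive a clean restart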
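Because the counters only ever grow, comparing two readings is usually more informative than a single one. A minimal sketch of that idea, assuming PostgreSQL 13+ column names; stmt_snapshot is a hypothetical table name chosen for this example:

```sql
-- Save a point-in-time copy so later readings can be compared against it.
CREATE TABLE stmt_snapshot AS
SELECT now() AS captured_at, dbid, userid, queryid, calls, total_exec_time
FROM pg_stat_statements;

-- Later: work done since the snapshot was taken.
-- (Newer versions also key entries on "toplevel"; it is ignored here for brevity.)
SELECT s.queryid,
       s.calls - p.calls                     AS calls_delta,
       s.total_exec_time - p.total_exec_time AS exec_ms_delta
FROM pg_stat_statements AS s
JOIN stmt_snapshot AS p USING (dbid, userid, queryid)
WHERE s.calls > p.calls
ORDER BY exec_ms_delta DESC
LIMIT 10;

-- Or start counting from zero again (restricted to superusers by default).
SELECT pg_stat_statements_reset();
```

The monitoring tools discussed later do essentially this on a schedule, which is how they turn cumulative counters into graphs over time.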
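27. Caveats
• PostgreSQL 13
• Renamed the timing columns (for example, total_time became total_exec_time) to make room for planning statistics
• PostgreSQL 14
• Query ID calculation is controlled by "compute_query_id" in postgresql.conf; the default, 'auto', enables it whenever pg_stat_statements is loaded, so only change it if it has been set to 'off'
• Adds the pg_stat_statements_info view with deallocation and "last reset" information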
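Both PostgreSQL 14 items can be checked from a SQL session. A short sketch, assuming PostgreSQL 14 or later with pg_stat_statements installed:

```sql
-- Expect 'auto' or 'on'; with 'off', pg_stat_statements has no query IDs to group by.
SHOW compute_query_id;

-- dealloc counts how often entries were evicted because more distinct
-- statements were seen than pg_stat_statements.max allows.
SELECT dealloc, stats_reset
FROM pg_stat_statements_info;
```

A steadily rising dealloc value suggests the statement statistics are churning and that pg_stat_statements.max may be too small for the workload.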
30. pg_stat_kcache
• Open-source extension that provides statistics about real reads and writes done at the filesystem layer
• Requires pg_stat_statements to be installed
• Must be added to the 'shared_preload_libraries' configuration parameter
• Query-level filesystem statistics
• *Not often available in hosted environments
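Setup follows the same pattern as pg_stat_statements. The sketch below assumes the extension's packages are already installed on the server; the exact columns of its views vary between pg_stat_kcache versions, so the query deliberately selects everything rather than naming columns.

```sql
-- In postgresql.conf (restart required), listed after pg_stat_statements:
--   shared_preload_libraries = 'pg_stat_statements,pg_stat_kcache'

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pg_stat_kcache;

-- Per-statement filesystem-level statistics.
SELECT *
FROM pg_stat_kcache_detail
LIMIT 10;
```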
31. Postgres Tools for Data Collection
• AWS/Azure/GCP have some limited tooling to track/display some of this data
33. Common Tools
• pgAdmin and other IDEs
• Numerous open-source, Grafana-based tools
• pgWatch
• pgDash
• pg_stat_monitor
• auto_explain
• Some recent Python-based solutions, but they aren't dynamic
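Of the tools listed, auto_explain is the easiest to try from a SQL session. A minimal sketch for a single session; the 250 ms threshold is an arbitrary value chosen for illustration, and LOAD requires superuser rights:

```sql
-- Session-level trial of auto_explain.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- log plans for statements slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts and timings
SET auto_explain.log_buffers = on;            -- include buffer usage (needs log_analyze)
```

Plans for qualifying statements then appear in the server log. Enabling it server-wide would mean adding the module to a preload setting in postgresql.conf instead, and log_analyze adds measurable overhead to every statement, so thresholds deserve some care.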
34. Pros/cons
• Native tools, where available, are widely used, well documented, and usually easy to get started with (at least for SQL Server)
• However, they can only go as far as the tool allows, and they rarely have exactly the right documentation to explain how to act on the data
• Not all of these tools help you correlate the graphs with the underlying issues
• Things are (at least slightly) disparate.
35. Cost-based Optimizer
• Statistics determine plan choice
• Statistics collection is customizable per server, table, and column
• An asynchronous process (autovacuum) maintains statistics
• Manually update with ANALYZE
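The three levels of control and the manual refresh could be exercised as follows. This is a sketch that reuses the table1 and name identifiers from the earlier query examples; the target of 500 is an arbitrary illustration, not a recommendation.

```sql
-- Server-wide default for how much detail ANALYZE collects per column.
SHOW default_statistics_target;

-- Collect more detail for one column whose value distribution is skewed.
ALTER TABLE table1 ALTER COLUMN name SET STATISTICS 500;

-- Refresh the statistics for one table now, rather than waiting for autovacuum.
ANALYZE table1;
```

Running ANALYZE after a bulk load is a common case, since the planner would otherwise work from statistics that predate the new data.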
36. Statistics
• Histogram of column values
• Defaults to 100 buckets (default_statistics_target)
• Helpful views:
• pg_catalog.pg_stats
• pg_catalog.pg_stat_user_tables
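The two views answer different questions: what the planner currently believes about a column, and how fresh that belief is. A minimal sketch, assuming a table named table1 in the public schema (both names are placeholders):

```sql
-- What the planner knows about each column of the table.
SELECT attname,
       null_frac,
       n_distinct,
       most_common_vals,
       histogram_bounds
FROM pg_catalog.pg_stats
WHERE schemaname = 'public'
  AND tablename = 'table1';

-- How recently those statistics were refreshed, and how much has changed since.
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_catalog.pg_stat_user_tables
WHERE relname = 'table1';
```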
37. EXPLAIN in Practice
• EXPLAIN = Estimated plan (the query is not run)
• EXPLAIN ANALYZE = Actual plan (the query is executed)
• EXPLAIN (ANALYZE,BUFFERS)
• Query plan with buffer usage (blocks hit in cache vs. read from disk)
• EXPLAIN (ANALYZE,BUFFERS,VERBOSE)
• Additional details on columns, schemas, etc.
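The variants above can be tried against any statement. A sketch using the table1 query from the earlier examples (the table and its columns are assumed to exist); the transaction wrapper illustrates one way to keep EXPLAIN ANALYZE from persisting changes, since it really executes the statement:

```sql
-- Estimated plan only; the statement is not executed.
EXPLAIN
SELECT id, name FROM table1 WHERE id = 1000;

-- Actual plan with timings, row counts, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name FROM table1 WHERE id = 1000;

-- For data-modifying statements, roll back so the change does not stick.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
UPDATE table1 SET name = name WHERE id = 1000;
ROLLBACK;
```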
38. EXPLAIN
• Inverted tree representation
• There is no built-in visual execution plan
• EXPLAIN provides a textual plan
• pgAdmin does attempt some visualizations
• Websites for visualizations and suggestions
• https://www.pgmustard.com/
• https://explain.depesz.com/
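When a plan is going to be handed to a tool rather than read by eye, a structured output format can help. A small sketch, again assuming the illustrative table1; which formats a given visualizer accepts is something to confirm on its site:

```sql
-- Same plan information, emitted as JSON instead of indented text.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT id, name FROM table1 WHERE id = 1000;
```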
49. Goals
• Understand the common performance problems and how to spot them
• Develop knowledge about the tools within PostgreSQL that show performance issues
• Learn about open-source and third-party monitoring solutions
Nagios is a log tool with a PostgreSQL add-on. pgWatch, pgHero, pgAdmin, pgCluu are database tools
DataDog and Better Stack are log monitor tools that add additional metrics. The other three are all focused database tools, but of the three, only SQL Monitor works on more than just PostgreSQL