To operate PostgreSQL efficiently, you need to have insight into database performance and make sure it is at optimal levels.
With that in mind, we dive into monitoring PostgreSQL for performance in this webinar replay.
PostgreSQL offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics? And what are some of the most common causes for performance problems in production?
We discuss this and more in ordinary, plain DBA language. We also have a look at some of the tools available for PostgreSQL monitoring and trending; and we’ll show you how to leverage ClusterControl’s PostgreSQL metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.
AGENDA
- PostgreSQL architecture overview
- Performance problems in production
- Common causes
- Key PostgreSQL metrics and their meaning
- Tuning for performance
- Performance monitoring tools
- Impact of monitoring on performance
- How to use ClusterControl to identify performance issues
- Demo
SPEAKER
Sebastian Insausti, Support Engineer at Severalnines, has loved technology since his childhood, when he did his first computer course (Windows 3.11). From that moment, he knew what his profession would be. He has since built up experience with MySQL, PostgreSQL, HAProxy, WAF (ModSecurity), Linux (RedHat, CentOS, OL, Ubuntu server), Monitoring (Nagios), Networking and Virtualization (VMWare, Proxmox, Hyper-V, RHEV).
Prior to joining Severalnines, Sebastian worked as a consultant to state companies in security, database replication and high availability scenarios. He’s also a speaker and has given a few talks locally on InnoDB Cluster and MySQL Enterprise together with an Oracle team. Before that, he worked for a Mexican company as head of the sysadmin department, as well as for a local ISP (Internet Service Provider), where he managed customers' servers and connectivity.
This webinar builds upon a related blog post by Sebastian: https://severalnines.com/blog/performance-cheat-sheet-postgresql.
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
1. August 2018
An Introduction to Performance
Monitoring for PostgreSQL
Sebastian Insausti
Presenter
sebastian@severalnines.com
2. Your host & some logistics
Copyright 2017 Severalnines AB
I'm Jean-Jérôme from the Severalnines Team and I'm your host for today's webinar!
Feel free to ask any questions in the Questions section of this application or via the Chat box.
You can also contact me directly via the chat box or via email: info@severalnines.com during or after the webinar.
8. Poll 1 - What databases do you currently
use?
(select one or more)
● PostgreSQL
● MySQL/MariaDB
● MongoDB
● Oracle and/or MS SQL
● Other
10. Agenda
● PostgreSQL architecture overview
● Key PostgreSQL metrics and their meaning
○ Troubleshooting performance problems in production
○ Tuning
● Performance monitoring tools
● Impact of monitoring on performance
● How to use ClusterControl to identify performance issues
○ Demo
12. Fundamental Parts
● Processes
○ Postgres Server Process
○ Backend Process
○ Background Process
○ Replication Associated Process
○ Background Worker Process
● Memory
○ Local memory area
○ Shared memory area
● Disk
○ Data Files
○ WAL Files
○ Log Files
19. System Monitoring
● CPU Usage: Percentage of CPU in use (%cpu)
● RAM Usage: Amount of free RAM (mem free)
● Network: Packet loss or high latency (packet time or packet loss)
● Disk Usage: Percentage of disk space in use (use%)
● Disk IOPS: Reads or writes per second, and I/O wait (r/s, w/s, iowait)
● SWAP usage: Amount of free swap space (swap free)
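As a rough illustration of turning one of these system metrics into an alert, the sketch below samples disk usage with Python's standard library and maps it to a severity. The function names and the 80/90 thresholds are assumptions chosen for the example, not values from the webinar, and the percentage is only roughly comparable to the use% column reported by df.

```python
import shutil


def disk_use_percent(path="/"):
    """Return the share of the filesystem holding `path` that is in use, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def alert_level(value, warning, critical):
    """Map a sampled value to 'ok', 'warning' or 'critical' (hypothetical helper)."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    pct = disk_use_percent("/")
    # Thresholds are illustrative; pick them from your own baseline.
    print(f"disk use: {pct:.1f}% -> {alert_level(pct, warning=80, critical=90)}")
```

The same `alert_level` helper could be reused for CPU, I/O wait or swap samples collected by whatever agent you already run.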
22. Caching (1 of 3)
Cache hits vs disk hits: Disk access is expensive, so we want to serve most of the data from memory.
Check queries to confirm whether they are using cache or disk (EXPLAIN (ANALYZE, BUFFERS)).
Related parameters:
● shared_buffers: The amount of memory that the database server uses for shared memory buffers. If this value is too low, the database has to read from disk more often, which slows queries down.
23. Caching (2 of 3)
● work_mem: Amount of memory used by internal operations such as ORDER BY, DISTINCT and JOIN before writing to temporary files on disk. If this value is too low, the database would use more disk.
● temp_buffers: Used to store the temporary tables used in each session. This parameter sets the maximum amount of memory for this task.
24. Caching (3 of 3)
● maintenance_work_mem: Maximum memory that an operation like
Vacuuming, adding indexes or foreign keys can consume.
● effective_cache_size: Used by the query planner to take into account
plans that may or may not fit in memory. A high value makes it more
probable that index scans are used and a low value makes it more
probable that sequential scans will be used.
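To make "cache hits vs disk hits" measurable, one common approach is to compute a hit ratio from the cumulative blks_hit and blks_read counters that PostgreSQL exposes per database in pg_stat_database. The sketch below only does the arithmetic; the counter values are hypothetical, and fetching them from the server is left out. Note that blks_read counts blocks requested from the operating system, which may still be served from the OS page cache rather than from disk.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from shared buffers.

    Both arguments are cumulative counters, as found in pg_stat_database.
    Returns None when there has been no activity yet.
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    sample = {"blks_hit": 98_500, "blks_read": 1_500}
    ratio = cache_hit_ratio(sample["blks_hit"], sample["blks_read"])
    print(f"cache hit ratio: {ratio:.3f}")
```

Because the counters are cumulative, trending the ratio over the delta between two samples is usually more informative than the lifetime value.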
25. Connections
Number of connections: Create a baseline and check for odd patterns.
○ Increasing: Poor use of connection pooling, locking, or an increase in activity.
○ Decreasing: Application problem or networking issue.
State of connections: Search for queries in a particular state. How we manage transactions in our applications can have an impact here.
Related parameters:
● max_connections: This parameter determines the maximum number
of simultaneous connections to our database.
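A minimal sketch of the two checks described above: how close we are to max_connections, and how the connections break down by state. It assumes you have already collected the state column of pg_stat_activity for client backends into a list; the sample states and the limit of 100 are made up for the example.

```python
from collections import Counter


def summarize_connections(states, max_connections):
    """Return (count per state, percentage of max_connections in use).

    `states` is a list of pg_stat_activity.state values for client backends,
    e.g. 'active', 'idle', 'idle in transaction'.
    """
    by_state = Counter(states)
    used_pct = 100.0 * len(states) / max_connections
    return by_state, used_pct


if __name__ == "__main__":
    # Hypothetical sample, for illustration only.
    sample_states = ["active", "idle", "idle", "idle in transaction", "active"]
    by_state, used_pct = summarize_connections(sample_states, max_connections=100)
    print(dict(by_state), f"{used_pct:.1f}% of max_connections")
```

A growing share of "idle in transaction" sessions is typically a sign that the application leaves transactions open, which ties back to how transactions are managed in the application.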
26. Checkpoints (1 of 2)
Checkpoints are points in the sequence of transactions at which all data files
have been updated with all information written before that checkpoint.
In the event of a crash, the crash recovery procedure looks at the latest
checkpoint record to determine the point in the log (known as the redo
record) from which it should start the REDO operation.
Checkpoint frequency: Frequency impacts disk I/O performance.
27. Checkpoints (2 of 2)
Related parameters:
● checkpoint_timeout: Maximum time between automatic WAL checkpoints, in seconds.
● max_wal_size: Maximum size that the WAL is allowed to grow to between automatic checkpoints.
● min_wal_size: As long as WAL disk usage stays below this value, old WAL files are recycled for future use at a checkpoint, instead of being deleted.
● wal_sync_method: The method used to force WAL updates out to disk.
● wal_buffers: Amount of shared memory used for WAL data that has not
yet been written to disk.
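One way to judge checkpoint frequency is to compare checkpoints triggered by checkpoint_timeout with those forced by WAL volume. In the PostgreSQL versions contemporary with this webinar, these counters are exposed as checkpoints_timed and checkpoints_req in pg_stat_bgwriter; the sketch below assumes you have read them already and uses hypothetical values.

```python
def requested_checkpoint_ratio(checkpoints_timed, checkpoints_req):
    """Share of checkpoints that were requested (e.g. forced by WAL growth)
    rather than triggered by checkpoint_timeout. Returns None with no data."""
    total = checkpoints_timed + checkpoints_req
    if total == 0:
        return None
    return checkpoints_req / total


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    ratio = requested_checkpoint_ratio(checkpoints_timed=90, checkpoints_req=30)
    print(f"requested checkpoints: {ratio:.0%}")
```

A persistently high share of requested checkpoints suggests that max_wal_size is being reached before checkpoint_timeout expires, which is a hint to review those two settings together.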
28. Commits (1 of 2)
High number of commits: Can be caused by inefficient bulk loads. Check the workload and what has changed.
Related parameters:
● synchronous_commit: Specifies whether a transaction commit waits for the WAL records to be written to disk before the command returns a "success" indication to the client.
Possible values: on, remote_apply, remote_write, local and off.
29. Commits (2 of 2)
[root@postgres1 /]# ./pgbench -c50 -N -Upgbtest pgbtest
 synchronous_commit | TPS
--------------------+------------
 on (default)       | 679.942166
 off                | 913.768318
 local              | 778.297985
 remote_write       | 719.684452
 remote_apply       | 630.358726
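To read the benchmark figures above more easily, the following sketch expresses each synchronous_commit setting as a percentage change in TPS relative to the default (on). The TPS numbers are the ones from the slide; the helper name is an assumption made for this example. Keep in mind that these are results from a single pgbench run on one system, so the relative differences will vary with hardware and workload.

```python
# TPS figures as reported on the slide for each synchronous_commit value.
TPS = {
    "on": 679.942166,
    "off": 913.768318,
    "local": 778.297985,
    "remote_write": 719.684452,
    "remote_apply": 630.358726,
}


def relative_change(tps_by_setting, baseline="on"):
    """Percentage change in TPS of each setting relative to the baseline."""
    base = tps_by_setting[baseline]
    return {k: 100.0 * (v - base) / base for k, v in tps_by_setting.items()}


if __name__ == "__main__":
    for setting, change in relative_change(TPS).items():
        print(f"{setting:>13}: {change:+.1f}%")
```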
30. Replication
Lag and state: The key metrics to monitor here are the replication lag and the replication state.
● Check for networking issues.
● Check for resource shortages or undersized servers.
Related parameters:
● max_wal_senders: Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. The parameter cannot be set higher than max_connections.
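Replication lag is often expressed in bytes as the distance between two WAL positions (LSNs), for example the primary's current position and a standby's replay position. On the server side PostgreSQL can do this subtraction for you; the sketch below only shows the arithmetic behind it, assuming LSNs in their usual textual form of two hexadecimal numbers separated by a slash. The sample LSN values are hypothetical.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN such as '0/3000060' to a byte position.

    The part before the slash is the high 32 bits, the part after is the low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(primary_lsn, standby_lsn):
    """Number of WAL bytes the standby is behind the primary."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    print(lag_bytes("0/3000060", "0/3000000"), "bytes behind")
```

Trending this value over time, together with the replication state, gives an early warning before a standby falls too far behind.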
31. Vacuum (1 of 3)
Vacuum process: It is responsible for several maintenance tasks in the database, one of them being recovering storage used by dead tuples. If VACUUM is taking too much time or too many resources, it means that we must run it more frequently.
To monitor the vacuum process, check the number of dead tuples and the time of the last vacuum execution. This information is available in the pg_stat_user_tables view:
SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables;
relname | n_dead_tup | last_autovacuum
-------------+------------------+-------------------------------
setups | 343688 | 2018-08-15 05:55:30.309274+00
users | 234865 | 2018-08-15 21:46:41.015965+00
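A raw dead tuple count is easier to interpret relative to table size. The sketch below flags tables whose share of dead tuples exceeds a threshold, assuming you also select n_live_tup from pg_stat_user_tables. The n_dead_tup values are the ones from the output above; the n_live_tup values and the 20% threshold are hypothetical and only serve the example.

```python
def dead_tuple_ratio(n_live_tup, n_dead_tup):
    """Share of a table's tuples that are dead (0.0 for an empty table)."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


def tables_needing_attention(rows, max_ratio=0.2):
    """Return the names of tables whose dead tuple ratio exceeds max_ratio."""
    return [
        r["relname"]
        for r in rows
        if dead_tuple_ratio(r["n_live_tup"], r["n_dead_tup"]) > max_ratio
    ]


if __name__ == "__main__":
    # n_dead_tup from the slide; n_live_tup is made up for illustration.
    rows = [
        {"relname": "setups", "n_live_tup": 1_000_000, "n_dead_tup": 343_688},
        {"relname": "users", "n_live_tup": 5_000_000, "n_dead_tup": 234_865},
    ]
    print(tables_needing_attention(rows))
```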
32. Vacuum (2 of 3)
If the autovacuum process is not running:
● Check process on the operating system:
[root@postgres1 /]# ps aux |grep autovacuum
postgres 283 0.0 0.8 435340 8768 ? Ss 00:44 0:01 postgres: autovacuum launcher process
● Check autovacuum status on the database:
SELECT name, setting FROM pg_settings WHERE name='autovacuum';
name | setting
------------+---------
autovacuum | on
(1 row)
33. Vacuum (3 of 3)
Related parameters:
● autovacuum_work_mem: Specifies the maximum amount of memory to be used by each autovacuum worker process. It defaults to -1, indicating that the value of maintenance_work_mem is used instead.
34. Monitoring the Error Log (1 of 2)
Check Error Log: Check your log for errors like ‘FATAL’ or ‘deadlock’, and also for recurring common errors as part of proactive maintenance.
In general, the error messages contain a description of the issue, detailed information, and a hint.
Examples:
2018-08-19 02:06:28.053 UTC [28856] FATAL: password authentication failed for user "username"
2018-08-19 01:59:02.998 UTC [28789] ERROR: duplicate key value violates unique constraint "sbtest21_pkey"
35. Monitoring the Error Log (2 of 2)
2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected
2018-08-18 12:56:38.520 -03 [1181] DETAIL: Process 1181 waits for ShareLock on transaction 579; blocked
by process 1148.
Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
Process 1181: UPDATE country SET population=18886001 WHERE code='AUS';
Process 1148: UPDATE country SET population=15864001 WHERE code='NLD';
2018-08-18 12:56:38.520 -03 [1181] HINT: See server log for query details.
2018-08-18 12:56:38.520 -03 [1181] CONTEXT: while updating tuple (0,15) in relation "country"
2018-08-18 12:56:38.520 -03 [1181] STATEMENT: UPDATE country SET population=18886001 WHERE
code='AUS';
2018-08-18 12:59:50.568 -03 [1181] ERROR: current transaction is aborted, commands ignored until end of
transaction block
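Checking the log for these messages can be automated. The sketch below counts log entries by severity and picks out deadlock reports, assuming each entry starts with a timestamp, a time zone and a process ID in brackets, as in the samples above. The regular expression and function names are assumptions for this example; a different log_line_prefix would need a different pattern. Continuation lines that do not start with that prefix are skipped.

```python
import re
from collections import Counter

# Matches lines such as:
# 2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<tz>\S+) \[(?P<pid>\d+)\] "
    r"(?P<severity>[A-Z]+):\s+(?P<message>.*)$"
)


def count_severities(lines):
    """Count log entries per severity (FATAL, ERROR, DETAIL, ...)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[m.group("severity")] += 1
    return counts


def deadlock_pids(lines):
    """Return the process IDs that reported 'deadlock detected'."""
    pids = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("severity") == "ERROR" and "deadlock detected" in m.group("message"):
            pids.append(int(m.group("pid")))
    return pids


if __name__ == "__main__":
    sample = [
        '2018-08-19 02:06:28.053 UTC [28856] FATAL: password authentication failed for user "username"',
        "2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected",
        "Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.",
    ]
    print(count_severities(sample), deadlock_pids(sample))
```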
36. Queries
Patterns: Check the patterns of the queries and look for differences in timing or frequency.
Operation: If you have a lot of reads, consider sending them to a standby server.
Locks or indexes: Understand how locking works, and check whether there are deadlocks. Look for unindexed queries or unused indexes.
37. Locks
● There are several types of locks.
● The important thing about them is how they conflict with each other.
38. Queries
Slow queries:
● Resources: Check for load somewhere, high CPU, or swapping.
● Inefficient plan: Check that the correct indexes are used, and look for bloat or out-of-date statistics.
● Locks: Check for queries waiting on another query.
Related parameters:
● default_statistics_target: PostgreSQL collects statistics from each of the tables to decide how queries will be executed on them. This value sets how much detail ANALYZE collects per column; larger values make ANALYZE take longer but may improve the planner's estimates.
39. Queries
world=# EXPLAIN SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
QUERY PLAN
--------------------------------------------------------------------------
Nested Loop (cost=0.00..734.81 rows=50662 width=144)
-> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31)
Filter: ((id > 100) AND (population > 700000))
-> Materialize (cost=0.00..8.72 rows=146 width=113)
-> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=113)
Filter: (population < 7000000)
(6 rows)
40. Queries
world=# EXPLAIN ANALYZE SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..734.81 rows=50662 width=143) (actual time=0.040..22.066 rows=51100 loops=1)
-> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.581 rows=350 loops=1)
Filter: ((id > 100) AND (population > 700000))
Rows Removed by Filter: 3729
-> Materialize (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
-> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.053 rows=146 loops=1)
Filter: (population < 7000000)
Rows Removed by Filter: 93
Planning time: 0.123 ms
Execution time: 24.052 ms
(10 rows)
41. Queries
world=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..734.81 rows=50662 width=143) (actual time=0.034..21.384 rows=51100 loops=1)
Buffers: shared hit=37
-> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.637 rows=350 loops=1)
Filter: ((id > 100) AND (population > 700000))
Rows Removed by Filter: 3729
Buffers: shared hit=32
-> Materialize (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
Buffers: shared hit=5
-> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.054 rows=146 loops=1)
Filter: (population < 7000000)
Rows Removed by Filter: 93
Buffers: shared hit=5
Planning time: 0.134 ms
Execution time: 23.881 ms
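One practical use of EXPLAIN ANALYZE output is to compare the planner's estimated row counts with the actual ones, since large gaps often point to out-of-date statistics. The sketch below extracts both numbers from plan lines in the text format shown above and flags nodes that are off by a given factor. The regular expression, the function names and the factor of 10 are assumptions made for this example, not part of PostgreSQL.

```python
import re

# Matches a plan node line that carries both estimates and actual figures.
PLAN_NODE = re.compile(
    r"^(?P<label>.*?)\s*\(cost=\S+ rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=\S+ rows=(?P<act>\d+) loops=\d+\)"
)


def row_estimates(plan_lines):
    """Return (node label, estimated rows, actual rows) for each plan node."""
    nodes = []
    for line in plan_lines:
        m = PLAN_NODE.match(line)
        if m:
            label = m.group("label").strip().lstrip("->").strip()
            nodes.append((label, int(m.group("est")), int(m.group("act"))))
    return nodes


def misestimated(nodes, factor=10.0):
    """Labels of nodes whose estimate is off by at least `factor` either way."""
    flagged = []
    for label, est, act in nodes:
        ratio = max(est, 1) / max(act, 1)  # avoid division by zero
        if ratio >= factor or ratio <= 1.0 / factor:
            flagged.append(label)
    return flagged


if __name__ == "__main__":
    plan = [
        "Nested Loop (cost=0.00..734.81 rows=50662 width=143) (actual time=0.034..21.384 rows=51100 loops=1)",
        "-> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.637 rows=350 loops=1)",
    ]
    nodes = row_estimates(plan)
    print(nodes, misestimated(nodes))
```

In the plan from the slides the estimates are close to the actual row counts, so nothing would be flagged.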
43. Poll 2 - What tools do you use to monitor
PostgreSQL?
(select one or more)
● On-prem (Nagios, Zabbix)
● SaaS solution (DataDog, NewRelic)
● Postgres centric (Postgres Enterprise Manager, pgwatch2, …)
● Polyglot (ClusterControl)
● Other
44. Built-in
● Error Log
Automating some monitoring of the error log, looking for keywords like FATAL, ERROR or DEADLOCK, is really useful.
● Statistics collector
The collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table.
45. Contributed / External
● pg_stat_statements
It helps you understand the query profile of your database. It tracks all the queries that are executed and exposes a lot of useful statistics through a view called pg_stat_statements.
● pg_stat_plans
This builds on pg_stat_statements and records query
plans for all executed queries.
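As an illustration of building a query profile from pg_stat_statements, the sketch below ranks statements by total time and derives the mean time per call. It assumes the rows were already fetched with the columns query, calls and total_time (in milliseconds, as the column was named in the PostgreSQL versions contemporary with this webinar; later releases call it total_exec_time). The sample rows are invented for the example.

```python
def top_queries(rows, n=3):
    """Return the n statements with the highest total time.

    Each result is (query, calls, mean time per call in ms).
    """
    ranked = sorted(rows, key=lambda r: r["total_time"], reverse=True)
    return [
        (r["query"], r["calls"], r["total_time"] / r["calls"] if r["calls"] else 0.0)
        for r in ranked[:n]
    ]


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        {"query": "SELECT * FROM city WHERE id = $1", "calls": 10_000, "total_time": 2_500.0},
        {"query": "UPDATE country SET population = $1 WHERE code = $2", "calls": 50, "total_time": 4_000.0},
        {"query": "SELECT count(*) FROM city", "calls": 5, "total_time": 100.0},
    ]
    for query, calls, mean_ms in top_queries(rows, n=2):
        print(f"{mean_ms:8.2f} ms/call  {calls:6d} calls  {query}")
```

Sorting by total time surfaces both slow statements and cheap statements that run very often, which is usually what you want to tune first.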
46. Contributed / External
● pgBadger
Analyzes PostgreSQL logs and presents the results in an HTML report.
pgBadger is able to autodetect your log file format.
It can parse huge log files as well as gzip-compressed files.
47. Contributed / External
● pg_buffercache
Lets you check what's happening in the shared buffer cache in real time, showing how many pages are currently held in the cache.
● pgstattuple
Generates statistics for tables and indexes, showing how much of the space used by each table or index is consumed by live tuples and dead tuples, and how much unused space is available in each relation.
48. Operating System
● top: Check CPU, Memory, Load and more
● ps: Check processes running
● free: Check memory (RAM & SWAP)
● netstat / ping / ifconfig: Check the network state
● iostat / iotop: Check the Disk access
50. Nagios
Nagios is an open source system and network monitoring application.
You can monitor network services, host resources, and more.
For monitoring PostgreSQL you can use:
● Plugins
● Your own scripts
51. Zabbix
Zabbix is software that can monitor both networks and servers.
It has a flexible notification mechanism.
It offers reports and data visualization based on the stored data.
Zabbix is accessed through a web interface.
52. ClusterControl
ClusterControl is a polyglot management and
monitoring system that helps to deploy,
manage, monitor and scale different databases.
Supports PostgreSQL, MySQL, MariaDB,
MongoDB, Galera Cluster and more.
53. More Information
For more information about how to monitor PostgreSQL with an external tool, you can check the following blog:
The Best Alert and Notification Tools for PostgreSQL
https://severalnines.com/blog/best-alert-and-notification-tools-postgresql
56. Poll 3 - How are your Postgres databases
performing?
(select one)
● Good, they are well tuned
● Poorly, we need to optimize them
● Poorly despite optimizing, we need a new DB architecture
● Good, but we might run into (traffic growth) issues
● Other