The document is a detailed exploration of PostgreSQL statistics covered in a presentation by Alexey Lesovsky at pgConf US 2016. It discusses the significance of PostgreSQL statistics for monitoring and performance tuning, the types of statistics available, and various methods to effectively utilize them, including queries and tools. The presentation emphasizes the importance of understanding these statistics to resolve issues and optimize database operations.
Overview of PostgreSQL statistics, presentation by Alexey Lesovsky at PGConf US 2016.
Defines PostgreSQL activity statistics, their effective use, and problem-solving goals.
Details process information in PostgreSQL, focusing on key processes and where time is spent.
Describes the overwhelming amount of information with statistics, lack of history, and absence of native tools.
Lists various built-in views, functions, and extensions available for gathering statistics.
Further discusses statistics sources and their implications on database performance.
Calculates the cache hit ratio from PostgreSQL statistics, aiming for a ratio of over 90%.
Examines database activity metrics, identifying potential anomalies in transaction statistics.
Statistics related to checkpoint activities and the role of background writers in PostgreSQL.
Details about PostgreSQL replication, including current states, locations, and lag analyzes.
Looks into statistics related to table activity, including sequential scans and autovacuum behavior.Explains how autovacuum settings impact dead tuples and maintenance thresholds.
Identifies the negatives of unused indexes on storage, performance, and additional workload.
Summarizes queries and transaction states, focusing on long-running queries and waiting clients.
Analyzes queries using pg_stat_statements, assessing performance, usage, and averages.
Discusses additional tools that enhance PostgreSQL statistics tracking and data performance.
Highlights the usefulness of PostgreSQL statistics and provides links for further reading.
$ ps hf-u postgres -o cmd
/usr/pgsql-9.5/bin/postgres -D /var/lib/pgsql/9.5/data
_ postgres: logger process
_ postgres: checkpointer process
_ postgres: writer process
_ postgres: wal writer process
_ postgres: autovacuum launcher process
_ postgres: stats collector process
_ postgres: postgres pgbench [local] idle in transaction
_ postgres: postgres pgbench [local] idle
_ postgres: postgres pgbench [local] UPDATE
_ postgres: postgres pgbench [local] UPDATE waiting
_ postgres: postgres pgbench [local] UPDATE
Black box
6.
Write Ahead Log
Shared
Buffers
BuffersIO Autovacuum Workers
Autovacuum Launcher
Background Workers
Indexes IO
Query Execution
Query Planning
Client Backends Postmaster
Relations IO
Logger Process Stats Collector
Logical
Replication
WAL Sender
Process
Archiver
Process
Background
Writer
Checkpointer
Process
Network Storage
Recovery Process
WAL Receiver Process
Tables/Indexes Data Files
Where PostgreSQL spends its time
7.
Too much information(more than 100 counters in 9.5).
Statistics are provided as an online counters.
No history (but reset functions are available).
No native handy stat tools in PostgreSQL.
A lot of 3rd party tools and programs.
Problems
8.
Too much information(more than 100 counters in 9.5).
Statistics are provided as an online counters.
No history (but reset functions are available).
No native handy stat tools in PostgreSQL.
A lot of 3rd party tools and programs.
Important to use stats directly from PostgreSQL.
Basic SQL skills are required.
Problems
9.
Counters in sharedmemory.
Functions.
Builtin views.
Official extensions in contribs package.
Unofficial extensions.
Statistics sources
$ select *from pg_stat_database;
...
blks_read | 7978770895
blks_hit | 9683551077519
...
$ select
sum(blks_hit)*100/sum(blks_hit+blks_read) as hit_ratio
from pg_stat_database;
More is better, and not less than 90%
Cache hit ratio
$ select *from pg_stat_replication;
...
sent_location | 1691/EEE65900
write_location | 1691/EEE65900
flush_location | 1691/EEE65900
replay_location | 1691/EEE658D0
...
1692/EEE65900 — location in transaction log (WAL)
All values are equal = ideal
Replication lag
22.
Lag causes:
Networking
Storage
CPU
How manybytes written in WAL
$ select
pg_xlog_location_diff(pg_current_xlog_location(),'0/00000000');
Replication lag in bytes
$ select
client_addr,
pg_xlog_location_diff(pg_current_xlog_location(), replay_location)
from pg_stat_replication;
Replication lag in seconds
$ select
extract(epoch from now() - pg_last_xact_replay_timestamp());
Replication lag
23.
$ select
client_addr asclient,
pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),sent_location)) as pending,
pg_size_pretty(pg_xlog_location_diff(sent_location,write_location)) as write,
pg_size_pretty(pg_xlog_location_diff(write_location,flush_location)) as flush,
pg_size_pretty(pg_xlog_location_diff(flush_location,replay_location)) as replay,
pg _size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),replay_location)) as total
from pg_stat_replication;
client | pending | network | written | flushed | total
-----------+----------+----------+---------+------------+------------
127.0.0.1 | 0 bytes | 0 bytes | 0 bytes | 48 bytes | 48 bytes
10.1.0.8 | 12 GB | 30 MB | 0 bytes | 156 kB | 12 GB
10.2.0.6 | 0 bytes | 48 bytes | 0 bytes | 551 MB | 552 MB
Replication lag
$ select
s.relname,
pg_size_pretty(pg_relation_size(relid)),
coalesce(n_tup_ins,0) +2 * coalesce(n_tup_upd,0) -
coalesce(n_tup_hot_upd,0) + coalesce(n_tup_del,0) AS total_writes,
(coalesce(n_tup_hot_upd,0)::float * 100 / (case when n_tup_upd > 0
then n_tup_upd else 1 end)::float)::numeric(10,2) AS hot_rate,
(select v[1] FROM regexp_matches(reloptions::text,E'fillfactor=(d+)') as
r(v) limit 1) AS fillfactor
from pg_stat_all_tables s
join pg_class c ON c.oid=relid
order by total_writes desc limit 50;
What is Heap-Only Tuples?
HOT does not cause index update.
HOT is only for non-indexed columns.
Big n_tup_hot_upd = good.
How to increase n_tup_hot_upd?
Write activity
$ select *from pg_stat_activity;
...
backend_start | 2015-10-14 15:18:03.01039+00
xact_start | 2015-10-14 15:21:15.336325+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
...
$ select
client_addr, usename, datname,
clock_timestamp() - xact_start as xact_age,
clock_timestamp() - query_start as query_age,
query
from pg_stat_activity order by xact_start, query_start;
Long queries and xacts
40.
$ select *from pg_stat_activity;
...
backend_start | 2015-10-14 15:18:03.01039+00
xact_start | 2015-10-14 15:21:15.336325+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
...
$ select
client_addr, usename, datname,
clock_timestamp() - xact_start as xact_age,
clock_timestamp() - query_start as query_age,
query
from pg_stat_activity order by xact_start, query_start;
clock_timestamp() for calculating query or transaction age.
Long queries: remember, terminate, optimize.
Long queries and xacts
41.
$ select *from pg_stat_activity where state in
('idle in transaction', 'idle in transaction (aborted)';
...
xact_start | 2015-10-14 15:21:21.128192+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
state | idle in transaction
...
Bad xacts
42.
$ select *from pg_stat_activity where state in
('idle in transaction', 'idle in transaction (aborted)';
...
xact_start | 2015-10-14 15:21:21.128192+00
query_start | 2015-10-14 15:21:30.336325+00
state_change | 2015-10-14 15:21:30.33635+00
state | idle in transaction
...
idle in transaction, idle in transaction (aborted) = bad
Warning value: > 5
clock_timestamp() for calculate xact age.
Bad xacts: remember, terminate, optimize app.
Bad xacts
$ select *from pg_stat_statements;
...
query | SELECT "id" FROM run_plan_xact(?)
calls | 11165832
total_time | 11743325.6880088
rows | 11165832
blk_read_time | 495425.535999976
blk_write_time | 0
Statements average time in ms
$ select (sum(total_time) / sum(calls))::numeric(6,3)
from pg_stat_statements;
The most writing (to shared_buffers) queries
$ select query, shared_blks_dirtied
from pg_stat_statements
where shared_blks_dirtied > 0 order by 2 desc;
pg_stat_statements
48.
query total time:15:43:07 (14.9%, CPU: 18.2%, IO: 9.0%)
сalls: 476 (0.00%) rows: 476,000
avg_time: 118881.54ms (IO: 21.2%)
user: app_user db: ustats
query: SELECT
filepath, type, deviceuid
FROM imvevents
WHERE
state = ?::eventstate
AND servertime BETWEEN $1 AND $2
ORDER BY servertime DESC LIMIT $3 OFFSET $4
https://goo.gl/6025wZ
Query reports
49.
query total time:15:43:07 (14.9%, CPU: 18.2%, IO: 9.0%)
сalls: 476 (0.00%) rows: 476,000
avg_time: 118881.54ms (IO: 21.2%)
user: app_user db: ustats
query: SELECT
filepath, type, deviceuid
FROM imvevents
WHERE
state = ?::eventstate
AND servertime BETWEEN $1 AND $2
ORDER BY servertime DESC LIMIT $3 OFFSET $4
Use sum() for calculating totals.
Calculate queries «contribution» in totals.
Resource usage (CPU, IO).
Query reports
50.
pg_statio_all_tables, pg_statio_all_indexes.
pg_stat_user_functions.
Size functions- df *size*
pgstattuple (in official contribs package)
● Bloat estimation for tables and indexes.
● Estimation time depends on table (or index) size.
pg_buffercache (in official contribs package)
● Shared buffers inspection.
● Heavy performance impact (buffers lock).
Behind this talk
51.
pgfincore (3rd partymodule)
● Low-level operations with tables using mincore().
● OS page cache inspection.
pg_stat_kcache (3rd party module)
● Using getrusage() before and after query.
● CPU usage and real filesystem operations stats.
● Requires pg_stat_statements and postgresql >= 9.4.
● No performance impact.
Behind this talk
52.
● The abilityto use statistics is useful.
● Statistics are not difficult.
● Statistics help to answer the questions.
● Do experiments.
Resume