▸ Sr. Database Architect at Medallia
▸ Recent fun employments:
▸ Principal Database Engineer @ WithMe
▸ Lead Database Architect @ OmniTI
▸ Expertise in PostgreSQL, Oracle, MySQL, NoSQL
▸ Contact : or
▸ Twitter: @DenishPatel
▸ Blog:
▸ Postgres Slack Channel (
▸ What to look for in a monitoring solution in general?
▸ Comparison of selected open source and commercial monitoring solutions
▸ Which metrics to collect and how?
▸ Which metrics to alert on and how to define thresholds?
▸ How to keep up with monitoring changes?
▸ How to react to alerts at 3 AM?
▸ Open discussion
▸ Blend of system monitoring with Postgres support
▸ Centralized monitoring
▸ Hosted vs On-premise
▸ Security concerns of clients
▸ Alerting and Dashboard/Graphs
▸ Easy installation and configuration
▸ Postgres Support
▸ pg_stat_statements
▸ Resource monitoring - CPU, RAM, DISK IO & Network
▸ pgbouncer support
▸ Open Source
▸ Sensu
▸ Zabbix
▸ Zenoss (Limited capabilities)
▸ Nagios
▸ (Stop using Nagios so it can die peacefully!!)
▸ SAAS Offerings
▸ Wavefront
▸ Circonus
▸ VividCortex
▸ Okmeter
▸ New Relic
Open source | Postgres support | Configuration                  | Reco | Confidence
Sensu       | Yes              |                                |      |
Zabbix      | Yes              | Plugin                         |      |
Zenoss      | Yes              |                                |      |
Nagios      | Yes              |                                |      |

SaaS        | Postgres support | Configuration                  | Confidence
Wavefront   | Yes              | collectd plugins               | HIGH
Circonus    | Yes              | Default checks                 | HIGH
VividCortex | Yes              | Default checks                 | HIGH
Okmeter     | Yes              | One-click install              |
New Relic   | Yes              | Plugins (some metrics missing) |
▸ Capacity Planning
▸ Real Time Analytics
▸ Anomaly Detection
▸ Data Retention
▸ Support Reviews
▸ Pricing
▸ Nice dashboards and alerting functionality
▸ Very scalable
▸ Works with existing metrics collection tools, e.g. collectd
▸ Real-time analytics
▸ Complete monitoring suite
▸ Agent-based: install the agent in your environment to monitor applications, databases, or any other servers
▸ Very easy to install and configure
▸ Easy-to-configure Postgres server monitoring that combines pg_stat_statements data with server stats; once the agent is installed, everything is collected with no further effort
▸ Built-in pgbouncer monitoring
▸ Built-in monitoring of all resources: disk, CPU, network, and memory
▸ 150+ DB clusters across the globe
▸ Easy installation
▸ Standardization
▸ Centralized solution
▸ Real time analytics
▸ Support for new infrastructure: Docker, Aurora, Mesos
create role collectd login encrypted password 'XXX';
create schema collectd;
set search_path = collectd,pg_catalog;
grant usage on schema collectd to collectd;
alter role collectd set search_path = collectd,pg_catalog;
▸ Things to consider:
▸ Separate role for monitoring
▸ Limited permissions
LoadPlugin postgresql
<Plugin postgresql>
  # The <Query> definitions that follow also live inside this <Plugin postgresql> block
  <Database dba>
    Host "localhost"
    Port "5432"
    User "collectd"
    Query backends          # queries without "# custom" ship with collectd
    Query transactions
    Query queries
    Query table_states
    Query disk_io
    Query disk_usage
    Query query_plans
    Query connections       # custom
    Query slow_queries      # custom
    Query txn_wraparound    # custom
    Query locks             # custom
    Query wal_files         # custom
    Query scans             # custom
    Query seq_scans         # custom
    Query avg_querytime     # custom
    Query checkpoints       # custom
    Query slave_lag         # custom
  </Database>
</Plugin>
<Query backends>
  Statement "SELECT count(*) AS count \
             FROM pg_stat_activity \
             WHERE datname = $1;"
  Param database
  <Result>
    Type "pg_numbackends"
    ValuesFrom "count"
  </Result>
</Query>
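-- Create as a superuser: SECURITY DEFINER runs the body with the owner's privileges,
-- so collectd sees every session's details without being a superuser itself.
-- (On PostgreSQL 10+, granting the built-in pg_monitor role is an alternative.)
create or replace function collectd.pg_stat_activity()
returns setof pg_catalog.pg_stat_activity
as $$
begin
  return query select * from pg_catalog.pg_stat_activity;
end $$
language plpgsql security definer set search_path = pg_catalog;
revoke all on function collectd.pg_stat_activity() from public;
grant execute on function collectd.pg_stat_activity() to collectd;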
<Query transactions>
  Statement "SELECT xact_commit, xact_rollback \
             FROM pg_stat_database \
             WHERE datname = $1;"
  Param database
  <Result>
    Type "pg_xact"
    InstancePrefix "commit"
    ValuesFrom "xact_commit"
  </Result>
  <Result>
    Type "pg_xact"
    InstancePrefix "rollback"
    ValuesFrom "xact_rollback"
  </Result>
</Query>
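Both xact_commit and xact_rollback are cumulative since the last statistics reset, so the raw values only ever climb; what is worth graphing and alerting on is the per-second rate and the rollback share over an interval. A minimal Python sketch of that arithmetic, assuming two samples taken `interval_s` seconds apart (`XactSample` and `xact_rates` are hypothetical names, not part of collectd):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class XactSample:
    """One reading of pg_stat_database (both columns are cumulative)."""
    xact_commit: int
    xact_rollback: int


def xact_rates(prev: XactSample, cur: XactSample, interval_s: float) -> Dict[str, Optional[float]]:
    """Commit/rollback rates per second and the rollback share over one interval."""
    d_commit = cur.xact_commit - prev.xact_commit
    d_rollback = cur.xact_rollback - prev.xact_rollback
    if d_commit < 0 or d_rollback < 0:
        # Counters went backwards: stats were reset, or we now talk to another server
        # (e.g. after a failover), so this interval cannot be measured.
        return {"commit_per_s": None, "rollback_per_s": None, "rollback_ratio": None}
    total = d_commit + d_rollback
    return {
        "commit_per_s": d_commit / interval_s,
        "rollback_per_s": d_rollback / interval_s,
        "rollback_ratio": d_rollback / total if total else 0.0,
    }
```

For example, samples of (1000 commits, 10 rollbacks) and (1600, 40) taken 60 s apart give 10 commits/s and 0.5 rollbacks/s. Returning None on a counter drop keeps a reset or failover from showing up as a huge negative spike.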
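<Query queries>
  Statement "SELECT sum(n_tup_ins) AS ins, \
                    sum(n_tup_upd) AS upd, \
                    sum(n_tup_del) AS del, \
                    sum(n_tup_hot_upd) AS hot_upd \
             FROM pg_stat_user_tables;"
  <Result>
    Type "pg_n_tup_c"
    InstancePrefix "ins"
    ValuesFrom "ins"
  </Result>
  # ...one <Result> per column; upd, del and hot_upd follow the same pattern
</Query>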
<Query table_states>
  Statement "SELECT sum(n_live_tup) AS live, sum(n_dead_tup) AS dead \
             FROM pg_stat_user_tables;"
  <Result>
    Type "pg_n_tup_g"
    InstancePrefix "live"
    ValuesFrom "live"
  </Result>
  <Result>
    Type "pg_n_tup_g"
    InstancePrefix "dead"
    ValuesFrom "dead"
  </Result>
</Query>
<Query query_plans>
  Statement "SELECT sum(seq_scan) AS seq, \
                    sum(seq_tup_read) AS seq_tup_read, \
                    sum(idx_scan) AS idx, \
                    sum(idx_tup_fetch) AS idx_tup_fetch \
             FROM pg_stat_user_tables;"
  <Result>
    Type "pg_scan"
    InstancePrefix "seq"
    ValuesFrom "seq"
  </Result>
  # ...similar <Result> blocks for seq_tup_read, idx and idx_tup_fetch
</Query>
<Query disk_io>
  Statement "SELECT coalesce(sum(heap_blks_read), 0) AS heap_read, \
                    coalesce(sum(heap_blks_hit), 0) AS heap_hit, \
                    coalesce(sum(idx_blks_read), 0) AS idx_read, \
                    coalesce(sum(idx_blks_hit), 0) AS idx_hit, \
                    coalesce(sum(toast_blks_read), 0) AS toast_read, \
                    coalesce(sum(toast_blks_hit), 0) AS toast_hit, \
                    coalesce(sum(tidx_blks_read), 0) AS tidx_read, \
                    coalesce(sum(tidx_blks_hit), 0) AS tidx_hit \
             FROM pg_statio_user_tables;"
  # <Result> blocks (one per column, as in the queries above) are not shown on the slide
</Query>
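The *_hit columns count block requests served from shared_buffers and the *_read columns those that had to be requested from the operating system (which may still come from the OS page cache). A common derived metric is the buffer cache hit ratio per category over an interval. A minimal Python sketch, assuming each sample is a dict keyed by the query's column aliases (`hit_ratios` and `CATEGORIES` are hypothetical names):

```python
from typing import Dict, Optional

# Column-alias prefixes used by the disk_io query.
CATEGORIES = ("heap", "idx", "toast", "tidx")


def hit_ratios(prev: Dict[str, int], cur: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Share of block requests served from shared_buffers, per category, between two samples.

    `prev` and `cur` map the query's column aliases ("heap_hit", "heap_read", ...) to values.
    A category with no block requests in the interval maps to None.
    """
    ratios: Dict[str, Optional[float]] = {}
    for cat in CATEGORIES:
        hits = cur[f"{cat}_hit"] - prev[f"{cat}_hit"]
        reads = cur[f"{cat}_read"] - prev[f"{cat}_read"]
        total = hits + reads
        ratios[cat] = hits / total if total > 0 else None
    return ratios
```

Working on deltas rather than the lifetime totals means a sudden drop in the heap hit ratio (for instance, a new query pattern that no longer fits in memory) shows up in the next sample instead of being averaged away.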
<Query disk_usage>
  Statement "SELECT pg_database_size($1) AS size;"
  Param database
  <Result>
    Type "pg_db_size"
    ValuesFrom "size"
  </Result>
</Query>
<Query connections>
  # pg_stat_activity.waiting exists up to 9.5; 9.6 replaced it with wait_event_type/wait_event
  Statement "SELECT COUNT(state) AS count, state FROM ( \
               SELECT CASE WHEN state = 'idle' THEN 'idle' \
                           WHEN state = 'idle in transaction' THEN 'idle_in_transaction' \
                           WHEN state = 'active' THEN 'active' \
                           ELSE 'unknown' END AS state \
               FROM collectd.pg_stat_activity()) AS s \
             GROUP BY state \
             UNION ALL \
             SELECT COUNT(*) AS count, 'waiting' AS state \
             FROM collectd.pg_stat_activity() WHERE waiting;"
  <Result>
    Type "pg_numbackends"
    InstancePrefix "state"
    InstancesFrom "state"
    ValuesFrom "count"
  </Result>
</Query>
<Query slow_queries>
  Statement "SELECT COUNT(*) AS count FROM collectd.pg_stat_activity() \
             WHERE state = 'active' \
               AND now() - query_start > '300 seconds'::interval \
               AND query ~* '^(insert|update|delete|select)';"
  <Result>
    Type "gauge"    # a point-in-time count that goes up and down, so not a counter
    InstancePrefix "pg_slow_queries"
    ValuesFrom "count"
  </Result>
</Query>
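<Query txn_wraparound>
  # max() yields one value; one row per database would otherwise share a single name
  Statement "SELECT max(age(datfrozenxid)) AS txn_wrap_age FROM pg_database;"
  <Result>
    Type "gauge"
    InstancePrefix "txn_wraparound"
    ValuesFrom "txn_wrap_age"
  </Result>
</Query>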
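Transaction IDs are 32-bit and compared modulo 2^32, so a database can fall at most 2^31 (about 2.1 billion) transactions behind its last full freeze before wraparound. Expressing the reported age as a share of that ceiling makes it clear how much headroom the 1.5B alert leaves. A minimal Python sketch (`wraparound_headroom` is a hypothetical name; the 1.5B default is the threshold from the alerting slide):

```python
# XIDs are 32-bit and compared modulo 2**32, so a database can fall at most
# 2**31 (about 2.1 billion) transactions behind before wraparound.
XID_WRAP_LIMIT = 2 ** 31


def wraparound_headroom(age: int, alert_age: int = 1_500_000_000) -> dict:
    """Summarise one age(datfrozenxid) reading against the wraparound limit."""
    return {
        "pct_of_limit": round(100.0 * age / XID_WRAP_LIMIT, 1),
        "xids_left": XID_WRAP_LIMIT - age,
        "alert": age > alert_age,
    }
```

At exactly 1.5B the age sits at roughly 70% of the limit with about 647 million XIDs left, which is the margin the alert buys for a manual or emergency VACUUM.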
<Query locks>
  Statement "SELECT COUNT(mode) AS count, mode FROM pg_locks GROUP BY mode \
             UNION SELECT COUNT(*) AS count, 'waiting' AS mode FROM pg_locks \
             WHERE granted IS false;"
  <Result>
    Type "gauge"
    InstancePrefix "pg_locks"
    InstancesFrom "mode"
    ValuesFrom "count"
  </Result>
</Query>
<Query wal_files>
  # pg_stat_archiver is available from PostgreSQL 9.4
  Statement "SELECT archived_count AS count, failed_count AS failed FROM pg_stat_archiver;"
  <Result>
    Type "gauge"
    InstancePrefix "pg_wal_count"
    ValuesFrom "count"
  </Result>
  <Result>
    Type "gauge"
    InstancePrefix "pg_wal_failed"
    ValuesFrom "failed"
  </Result>
</Query>
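archived_count and failed_count are cumulative since the archiver statistics were last reset, so a single failure last month keeps failed_count above zero forever. The alert should fire on growth between samples instead. A minimal Python sketch over a list of (archived_count, failed_count) samples, oldest first (`new_archive_failures` is a hypothetical name):

```python
from typing import List, Tuple


def new_archive_failures(samples: List[Tuple[int, int]]) -> int:
    """Failures added across consecutive (archived_count, failed_count) samples, oldest first.

    Both columns are cumulative, so alert on the increase, not on the absolute value.
    A drop means the counters were reset (e.g. pg_stat_reset_shared('archiver')), so that
    step contributes nothing instead of a negative number.
    """
    added = 0
    for (_, prev_failed), (_, cur_failed) in zip(samples, samples[1:]):
        if cur_failed >= prev_failed:
            added += cur_failed - prev_failed
    return added
```

Alerting when this returns a non-zero value for the last few samples catches a broken archive_command quickly without paging forever on old failures.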
<Query scans>
  Statement "SELECT sum(idx_scan) AS index_scans, sum(seq_scan) AS seq_scans, \
                    sum(idx_tup_fetch) AS index_tup_fetch, sum(seq_tup_read) AS seq_tup_read \
             FROM pg_stat_all_tables;"
  <Result>
    Type "pg_scan"
    InstancePrefix "index"
    ValuesFrom "index_scans"
  </Result>
  # ...similar <Result> blocks for seq_scans, index_tup_fetch and seq_tup_read
</Query>
<Query seq_scans>
  # 0 = OK, 1 = at least one large table is being sequentially scanned heavily
  Statement "SELECT CASE WHEN status = 'OK' THEN 0 ELSE 1 END AS status \
             FROM (SELECT get_seq_scan_on_large_tables AS status \
                   FROM collectd.get_seq_scan_on_large_tables()) AS foo;"
  <Result>
    Type "gauge"
    InstancePrefix "pg_seq_scans"
    ValuesFrom "status"
  </Result>
</Query>
CREATE MATERIALIZED VIEW collectd.seq_scan_on_large_tables AS
SELECT relid, schemaname, relname, seq_scan, seq_tup_read ,
pg_relation_size(relid) as relsize, now() as refreshed_at
FROM pg_stat_all_tables
WHERE pg_relation_size(relid) > 1073741824
AND schemaname not in ('pg_catalog', 'information_schema')
UNION ALL SELECT 0,'0','0','0',0,0,now();
ALTER materialized VIEW collectd.seq_scan_on_large_tables OWNER TO collectd;
CREATE OR REPLACE FUNCTION collectd.get_seq_scan_on_large_tables()
RETURNS text LANGUAGE plpgsql AS $$
DECLARE
  v_refreshed_at timestamptz;
  v_tables_with_seq_scan text[];
BEGIN
  SELECT refreshed_at INTO v_refreshed_at
    FROM collectd.seq_scan_on_large_tables WHERE relid = 0;
  -- refresh the baseline MV every 4 hours (a standby cannot refresh it)
  IF v_refreshed_at < now() - interval '4 hours' AND pg_is_in_recovery() IS false THEN
    REFRESH MATERIALIZED VIEW collectd.seq_scan_on_large_tables;
  END IF;
  -- large tables averaging more than 50,000 rows read per seq scan since the baseline
  SELECT ARRAY(SELECT base.relname || ':' || (cur.seq_scan - base.seq_scan)
    FROM collectd.seq_scan_on_large_tables AS base
    JOIN pg_stat_all_tables AS cur
      ON (base.schemaname = cur.schemaname AND base.relname = cur.relname)
    WHERE (cur.seq_scan - base.seq_scan) > 0
      AND (cur.seq_tup_read - base.seq_tup_read) / (cur.seq_scan - base.seq_scan) > 50000)
  INTO v_tables_with_seq_scan;
  IF v_tables_with_seq_scan = '{}' THEN
    RETURN 'OK';
  END IF;
  RETURN 'PROBLEM: Seq scan on table: ' || array_to_string(v_tables_with_seq_scan, '&');
END
$$;
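The check above boils down to comparing each large table's current pg_stat_all_tables counters with a baseline snapshot and flagging tables whose average rows read per new sequential scan exceed 50,000. The same comparison in Python, for running the check outside the database, is sketched below; it assumes snapshots keyed by (schemaname, relname), and `seq_scan_status` and `Snapshot` are hypothetical names:

```python
from typing import Dict, List, Tuple

# (schemaname, relname) -> (seq_scan, seq_tup_read), as read from pg_stat_all_tables
Snapshot = Dict[Tuple[str, str], Tuple[int, int]]


def seq_scan_status(base: Snapshot, cur: Snapshot, min_rows_per_scan: int = 50_000) -> str:
    """Flag large tables whose average rows read per seq scan since `base` exceeds the limit."""
    problems: List[str] = []
    for key, (base_scans, base_rows) in base.items():
        if key not in cur:
            continue  # table dropped since the baseline was taken
        cur_scans, cur_rows = cur[key]
        new_scans = cur_scans - base_scans
        # Integer division, matching bigint division in SQL for non-negative deltas.
        if new_scans > 0 and (cur_rows - base_rows) // new_scans > min_rows_per_scan:
            problems.append(f"{key[1]}:{new_scans}")
    if not problems:
        return "OK"
    return "PROBLEM: Seq scan on table: " + "&".join(problems)
```

Using the average rows per scan, rather than the raw scan count, keeps cheap scans that stop early (for example a LIMIT 1) from tripping the alert on a large table.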
<Query avg_querytime>
  # total_time is named total_exec_time from PostgreSQL 13 onward
  Statement "SELECT sum(total_time)/sum(calls) AS avg_querytime \
             FROM collectd.get_stat_statements();"
  <Result>
    Type "gauge"
    InstancePrefix "pg_avg_querytime"
    ValuesFrom "avg_querytime"
  </Result>
</Query>
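sum(total_time)/sum(calls) is a lifetime average since the last pg_stat_statements_reset(), so after weeks of history a regression in the last few minutes barely moves it. Differencing two samples gives the average for the interval itself. A minimal Python sketch, assuming each sample is the pair (sum(total_time) in ms, sum(calls)) and using the hypothetical name `interval_avg_ms`:

```python
from typing import Optional, Tuple


def interval_avg_ms(prev: Tuple[float, int], cur: Tuple[float, int]) -> Optional[float]:
    """Mean ms per call between two (sum(total_time), sum(calls)) samples."""
    d_time = cur[0] - prev[0]
    d_calls = cur[1] - prev[1]
    if d_calls <= 0 or d_time < 0:
        return None  # no calls in the interval, or the statistics were reset
    return d_time / d_calls
```

For example, going from (1,000,000 ms, 100,000 calls) to (1,050,000 ms, 101,000 calls) moves the lifetime average only from 10 ms to about 10.4 ms, while the 1,000 calls in that interval averaged 50 ms each.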
-- pg_stat_statements must also be listed in shared_preload_libraries
create extension IF NOT EXISTS pg_stat_statements WITH SCHEMA collectd;
alter schema collectd owner to collectd;
CREATE OR REPLACE FUNCTION collectd.get_stat_statements()
RETURNS SETOF collectd.pg_stat_statements LANGUAGE sql AS $$
  SELECT * FROM collectd.pg_stat_statements
  WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database());
$$;
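<Query checkpoints>
  # PostgreSQL 17+ moved these counters to pg_stat_checkpointer (num_timed, num_requested)
  Statement "SELECT (checkpoints_timed + checkpoints_req) AS total_checkpoints \
             FROM pg_stat_bgwriter;"
  <Result>
    Type "counter"    # monotonically increasing total
    InstancePrefix "pg_checkpoints"
    ValuesFrom "total_checkpoints"
  </Result>
</Query>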
<Query slave_lag>
  Statement "SELECT CASE WHEN pg_is_in_recovery = 'false' THEN 0 \
               ELSE COALESCE(ROUND(EXTRACT(epoch FROM now() - pg_last_xact_replay_timestamp())), 0) END \
             AS seconds \
             FROM pg_is_in_recovery();"
  <Result>
    Type "gauge"    # seconds behind; it goes back down, so not a counter
    InstancePrefix "slave_lag"
    ValuesFrom "seconds"
  </Result>
</Query>
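pg_last_xact_replay_timestamp() is the primary-side commit time of the last transaction replayed, so when the primary has had no writes for a while, now() minus that timestamp keeps growing even though nothing is pending on the standby. A common refinement, not part of the query above, reports 0 when the last received WAL position equals the last replayed one; on 9.x those positions come from pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), renamed pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() in 10. The Python sketch below shows only the decision logic on values such queries would return; `replica_lag_seconds` is a hypothetical name:

```python
from datetime import datetime
from typing import Optional


def replica_lag_seconds(
    in_recovery: bool,
    receive_lsn: Optional[str],
    replay_lsn: Optional[str],
    last_replay_ts: Optional[datetime],
    now: datetime,
) -> Optional[float]:
    """Replay lag in seconds that does not grow while an idle primary sends no WAL."""
    if not in_recovery:
        return 0.0  # a primary has no replay lag
    if receive_lsn is not None and receive_lsn == replay_lsn:
        return 0.0  # everything received so far has been replayed
    if last_replay_ts is None:
        return None  # nothing replayed since the standby started
    return max(0.0, (now - last_replay_ts).total_seconds())
```

One caveat with this design: if streaming stops altogether, the received and replayed positions stay equal and the function reports 0, so pair it with a check that the WAL receiver is actually connected.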
▸ Uptime
▸ Waiting Connections
▸ # of connections waiting > 5
▸ Slow queries
▸ # of slow queries > 5
▸ Seq scan on large tables
▸ TXN Wraparound
▸ Age over 1.5B
▸ Disk space usage
▸ 85%?
▸ Slave lag
▸ 5 minutes?
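The thresholds above can live in one table that the alerting side checks each collected sample against. A minimal Python sketch, assuming the metrics have already been gathered into a dict; the metric names are hypothetical (they do not match collectd's instance names), and uptime is left out because it is a reachability check rather than a threshold:

```python
from typing import Dict, List

# Starting points from the slide; the values marked "?" there are meant to be tuned.
THRESHOLDS: Dict[str, float] = {
    "waiting_connections": 5,
    "slow_queries": 5,
    "seq_scan_problem": 0,          # pg_seq_scans gauge: 0 = OK, 1 = problem
    "txn_wraparound_age": 1_500_000_000,
    "disk_used_pct": 85,
    "replica_lag_s": 300,           # 5 minutes
}


def breached(metrics: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one message per metric strictly above its threshold; missing metrics are skipped."""
    alerts: List[str] = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name}={value} > {limit}")
    return alerts
```

Keeping the limits in data rather than scattered through check scripts makes them easy to review before going on call and to deploy through the same config management as the collectors.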
▸ Design with failover in mind
▸ Keep eyes on new features for monitoring in latest DB or OS version
▸ Postgres 9.5 enhancements
▸ Commit timestamp tracking (requires track_commit_timestamp = on)
▸ SELECT * FROM pg_last_committed_xact();
▸ cluster_name: appears in the process titles, which tells clusters apart on a shared host
▸ $ ps -ef | grep checkpointer
▸ postgres 12181 12178 0 11:12 ? 00:00:00 postgres: personnel: checkpointer process
▸ postgres 12207 12204 0 11:12 ? 00:00:00 postgres: reportsdb: checkpointer process
▸ postgres 12233 12230 0 11:12 ? 00:00:00 postgres: management: checkpointer process
▸ Several monitoring changes coming in Postgres 9.6
▸ pg_stat_activity shows what a waiting backend is waiting on (new wait_event_type and wait_event columns replace waiting)
▸ Deploy monitoring through config management tools
▸ PagerDuty calendar:
▸ Document metrics
▸ URL for the dashboard
▸ Alert resolution procedure
▸ Clear SLAs (decision)
▸ Escalation policy
▸ Scenarios
▸ Wait for the server to come back up
▸ Failover
▸ Review alerts before going on call
▸ On-call notification
▸ Plan for the worst and document accordingly
▸ What if you are in a movie theatre, at the beach, etc.?
▸ What if you can’t get on the server?
▸ Keep the document up to date
▸ You!
▸ Conference committee
▸ Contact for further Q/A
▸ Twitter: @DenishPatel

Using SQL Standards? Database SQL comparitionUsing SQL Standards? Database SQL comparition
Using SQL Standards? Database SQL comparition
Oracle10g New Features I
Oracle10g New Features IOracle10g New Features I
Oracle10g New Features I
Yet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRepYet Another Replication Tool: RubyRep
Yet Another Replication Tool: RubyRep

Advanced Postgres Monitoring

  • 2. SPEAKER: WHO IS THIS GUY?
    ▸ Sr. Database Architect at Medallia
    ▸ Recent fun employments:
      ▸ Principal Database Engineer @ WithMe
      ▸ Lead Database Architect @ OmniTI
    ▸ Expertise in PostgreSQL, Oracle, MySQL, NoSQL
    ▸ Contact : or
    ▸ Twitter: @DenishPatel
    ▸ Blog:
    ▸ Postgres Slack Channel (
  • 3. AGENDA: DISCUSSION LIST
    ▸ What to look for in a monitoring solution in general?
    ▸ Comparison of selected open source and commercial monitoring solutions
    ▸ Which metrics to collect, and how?
    ▸ Which metrics to alert on, and how to define thresholds?
    ▸ How to keep up with monitoring changes?
    ▸ How to react to alerts at 3AM?
    ▸ Open discussion
  • 4. SELECTION CRITERIA: WHAT TO LOOK FOR IN A MONITORING SOLUTION?
    ▸ Blend of system monitoring with Postgres support
    ▸ Centralized monitoring
    ▸ Hosted vs. on-premise
    ▸ Security concerns of clients
    ▸ Alerting and dashboards/graphs
    ▸ Easy installation and configuration
    ▸ Postgres support
    ▸ pg_stat_statements
    ▸ Resource monitoring: CPU, RAM, disk I/O & network
    ▸ pgbouncer support
  • 5. COMPARISON: MONITORING SOLUTIONS
    ▸ Open source
      ▸ Sensu
      ▸ Zabbix
      ▸ Zenoss (limited capabilities)
      ▸ Nagios
        ▸ (Stop using Nagios so it can die peacefully!!)
    ▸ SaaS offerings
      ▸ Wavefront
      ▸ Circonus
      ▸ VividCortex
      ▸ OkMeter
      ▸ New Relic
  • 6. COMPARISON: OPEN SOURCE SOLUTIONS
      Solution | Postgres support?    | Configuration | Reco     | Confidence
      Sensu    | Yes (check_postgres) | Easy          | Graphite | HIGH
      Zabbix   | Yes (plugin)         | Easy          |          | MED
      Zenoss   | Yes (plugin)         | Easy          |          | MED
      Nagios   | Yes (check_postgres) | Difficult     |          | LOW
  • 7. COMPARISON: SAAS OFFERINGS
      Solution             | Postgres support | Configuration                  | Confidence
      Wavefront (collectd) | Yes              | collectd plugins               | HIGH
      Circonus             | Yes              | Default checks                 | HIGH
      VividCortex          | Yes              | Default checks                 | HIGH
      Okmeter              | Yes              | One-click install, pgbouncer   | HIGH
      New Relic            | Yes              | Plugins (missing some metrics) | MEDIUM
  • 8. COMPARISON: SAAS OFFERINGS
    ▸ Capacity planning
    ▸ Real-time analytics
    ▸ Anomaly detection
    ▸ Data retention
    ▸ Support reviews
    ▸ Pricing
  • 9. WAVEFRONT.COM: WAVEFRONT
    ▸ Nice dashboard and alerting functionality
    ▸ Very scalable solution
    ▸ Works with existing metrics collection tools, e.g. collectd
    ▸ Real-time analytics capability
    ▸ Complete monitoring suite
  • 10. OKMETER.IO: OKMETER
    ▸ It is an agent-based system, so you just need to install the agent in your environment to monitor applications, databases or any other servers
    ▸ Very easy to install and configure
    ▸ Provides easy-to-configure Postgres server monitoring using pg_stat_statements along with server stats; once you install the agent, you get everything without any extra effort
    ▸ Built-in pgbouncer monitoring
    ▸ Built-in monitoring of all resources: disk, CPU, network & memory
  • 12. USE CASE: MONITORING SOLUTION
    ▸ 150+ DB clusters across the globe
    ▸ Easy installation
    ▸ Standardization
    ▸ Centralized solution
    ▸ Real-time analytics
    ▸ Support for new infra: Docker/Aurora/Mesos
  • 13. METRICS COLLECTION: SETUP ROLE
      create role collectd login encrypted password 'XXX';
      create schema collectd;
      set search_path = collectd, pg_catalog;
      grant usage on schema collectd to collectd;
      alter role collectd set search_path = collectd, pg_catalog;
    ▸ Things to consider:
      ▸ Separate role for monitoring
      ▸ No superuser role
      ▸ Limited permissions
  • 14. METRICS COLLECTION: COLLECTD PLUGIN
      LoadPlugin postgresql
      <Plugin postgresql>
        <Database dba>
          Host "localhost"
          Port "5432"
          User "collectd"
          Query backends
          Query transactions
          Query queries
          Query table_states
          Query disk_io
          Query disk_usage
          Query query_plans
          Query connections      # custom
          Query slow_queries     # custom
          Query txn_wraparound   # custom
          Query locks            # custom
          Query wal_files        # custom
          Query scans            # custom
          Query seq_scans        # custom
          Query avg_querytime    # custom
          Query checkpoints      # custom
          Query slave_lag        # custom
        </Database>
      </Plugin>
  • 15. METRICS COLLECTION: BACKENDS
      <Query backends>
        Statement "SELECT count(*) AS count FROM pg_stat_activity WHERE datname = $1;"
        Param database
        <Result>
          Type "pg_numbackends"
          ValuesFrom "count"
        </Result>
      </Query>
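  • 16. METRICS COLLECTION: PG_STAT_ACTIVITY
      -- Wrapper that lets the non-superuser collectd role see every session's details.
      -- SECURITY DEFINER runs it with the owner's privileges, so create it as a superuser.
      create or replace function collectd.pg_stat_activity()
      returns setof pg_catalog.pg_stat_activity as $$
      begin
        return query select * from pg_catalog.pg_stat_activity;
      end;
      $$ language plpgsql security definer;
      revoke all on function collectd.pg_stat_activity() from public;
      grant execute on function collectd.pg_stat_activity() to collectd;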
  • 17. METRICS COLLECTION: TRANSACTIONS
      <Query transactions>
        Statement "SELECT xact_commit, xact_rollback FROM pg_stat_database WHERE datname = $1;"
        Param database
        <Result>
          Type "pg_xact"
          InstancePrefix "commit"
          ValuesFrom "xact_commit"
        </Result>
        <Result>
          Type "pg_xact"
          InstancePrefix "rollback"
          ValuesFrom "xact_rollback"
        </Result>
      </Query>
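xact_commit and xact_rollback are cumulative counts since the last statistics reset, so the number worth graphing or alerting on is the per-second rate between two polls. A minimal Python sketch of that derivation, assuming a hypothetical `Sample` record holding one poll's values; a counter that goes backwards (for example after pg_stat_reset() or crash recovery) is reported as no data rather than as a negative rate.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Sample:
    """One poll of pg_stat_database for a single database (hypothetical record)."""
    ts: float           # poll time in seconds
    xact_commit: int    # cumulative commits since the last stats reset
    xact_rollback: int  # cumulative rollbacks since the last stats reset


def per_second(prev: int, curr: int, dt: float) -> Optional[float]:
    """Rate of a monotonically increasing counter between two polls.

    Returns None if time did not advance or the counter went backwards,
    because that delta is meaningless.
    """
    if dt <= 0 or curr < prev:
        return None
    return (curr - prev) / dt


def xact_rates(a: Sample, b: Sample) -> Dict[str, Optional[float]]:
    dt = b.ts - a.ts
    return {
        "commit_per_s": per_second(a.xact_commit, b.xact_commit, dt),
        "rollback_per_s": per_second(a.xact_rollback, b.xact_rollback, dt),
    }


if __name__ == "__main__":
    before = Sample(ts=0.0, xact_commit=1000, xact_rollback=10)
    after = Sample(ts=10.0, xact_commit=1500, xact_rollback=30)
    print(xact_rates(before, after))  # {'commit_per_s': 50.0, 'rollback_per_s': 2.0}
```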
  • 18. METRICS COLLECTION: QUERIES (DML)
      <Query queries>
        Statement "SELECT sum(n_tup_ins) AS ins, sum(n_tup_upd) AS upd, sum(n_tup_del) AS del, sum(n_tup_hot_upd) AS hot_upd FROM pg_stat_user_tables;"
        <Result>
          Type "pg_n_tup_c"
          InstancePrefix "ins"
          ValuesFrom "ins"
        </Result>
        ...
      </Query>
  • 19. METRICS COLLECTION: TABLE_STATES
      <Query table_states>
        Statement "SELECT sum(n_live_tup) AS live, sum(n_dead_tup) AS dead FROM pg_stat_user_tables;"
        <Result>
          Type "pg_n_tup_g"
          InstancePrefix "live"
          ValuesFrom "live"
        </Result>
        <Result>
          Type "pg_n_tup_g"
          InstancePrefix "dead"
          ValuesFrom "dead"
        </Result>
      </Query>
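n_live_tup and n_dead_tup are estimates, and their sums are most useful as a ratio: a dead-tuple share that keeps climbing suggests autovacuum is not keeping up. A small Python sketch of that ratio; the 20% warning level and the function names are hypothetical examples, not values from this talk.

```python
def dead_tuple_pct(live: int, dead: int) -> float:
    """Percentage of all tuples (live + dead) that are dead; 0.0 when there are none."""
    total = live + dead
    if total == 0:
        return 0.0
    return 100.0 * dead / total


# Hypothetical warning level; tune it per workload.
DEAD_TUPLE_WARN_PCT = 20.0


def dead_tuple_warning(live: int, dead: int) -> bool:
    """True when the dead-tuple share exceeds the warning level."""
    return dead_tuple_pct(live, dead) > DEAD_TUPLE_WARN_PCT


if __name__ == "__main__":
    print(dead_tuple_pct(900_000, 100_000))  # 10.0
```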
  • 20. METRICS COLLECTION: QUERY_PLANS
      <Query query_plans>
        Statement "SELECT sum(seq_scan) AS seq, sum(seq_tup_read) AS seq_tup_read, sum(idx_scan) AS idx, sum(idx_tup_fetch) AS idx_tup_fetch FROM pg_stat_user_tables;"
        <Result>
          Type "pg_scan"
          InstancePrefix "seq"
          ValuesFrom "seq"
        </Result>
        ...
      </Query>
  • 21. METRICS COLLECTION: DISK_IO
      <Query disk_io>
        Statement "SELECT coalesce(sum(heap_blks_read), 0) AS heap_read, coalesce(sum(heap_blks_hit), 0) AS heap_hit, coalesce(sum(idx_blks_read), 0) AS idx_read, coalesce(sum(idx_blks_hit), 0) AS idx_hit, coalesce(sum(toast_blks_read), 0) AS toast_read, coalesce(sum(toast_blks_hit), 0) AS toast_hit, coalesce(sum(tidx_blks_read), 0) AS tidx_read, coalesce(sum(tidx_blks_hit), 0) AS tidx_hit FROM pg_statio_user_tables;"
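The *_read and *_hit columns are cumulative block counts, so a hit ratio computed over a recent interval is more telling than one computed since the last stats reset. A Python sketch, assuming each poll is a dict keyed by the column aliases from the query above. Note that a "read" here is a shared-buffers miss; the block may still have come from the OS page cache rather than from disk.

```python
from typing import Dict, Optional

# (read column, hit column) pairs, named after the aliases in the disk_io query.
PAIRS = {
    "heap": ("heap_read", "heap_hit"),
    "idx": ("idx_read", "idx_hit"),
    "toast": ("toast_read", "toast_hit"),
    "tidx": ("tidx_read", "tidx_hit"),
}


def hit_ratios(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Shared-buffer hit ratio per block type over the interval between two polls.

    None means there were no block accesses in the interval, or a counter reset.
    """
    ratios: Dict[str, Optional[float]] = {}
    for name, (read_col, hit_col) in PAIRS.items():
        reads = curr[read_col] - prev[read_col]
        hits = curr[hit_col] - prev[hit_col]
        if reads < 0 or hits < 0 or reads + hits == 0:
            ratios[name] = None
        else:
            ratios[name] = hits / (reads + hits)
    return ratios
```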
  • 22. METRICS COLLECTION: DISK USAGE / DB SIZE
      <Query disk_usage>
        Statement "SELECT pg_database_size($1) AS size;"
        Param database
        <Result>
          Type "pg_db_size"
          ValuesFrom "size"
        </Result>
      </Query>
  • 23. METRICS COLLECTION: CONNECTIONS (CUSTOM)
      <Query connections>
        Statement "SELECT COUNT(state) AS count, state FROM (SELECT CASE WHEN state = 'idle' THEN 'idle' WHEN state = 'idle in transaction' THEN 'idle_in_transaction' WHEN state = 'active' THEN 'active' ELSE 'unknown' END AS state FROM collectd.pg_stat_activity()) s GROUP BY state UNION SELECT COUNT(*) AS count, 'waiting' AS state FROM collectd.pg_stat_activity() WHERE waiting;"
        <Result>
          Type "pg_numbackends"
          InstancePrefix "state"
          InstancesFrom "state"
          ValuesFrom "count"
        </Result>
      </Query>
    ▸ The waiting column exists only up to Postgres 9.5; 9.6 replaced it with wait_event_type and wait_event
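To test an alert rule against a captured snapshot of pg_stat_activity without a live server, the same bucketing can be mirrored in Python. A sketch, assuming each row is a hypothetical (state, waiting) pair; as in the SQL, a NULL or unrecognized state counts as unknown, and a waiting session is also counted under waiting.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Same mapping as the CASE expression in the connections query.
_BUCKETS = {
    "idle": "idle",
    "idle in transaction": "idle_in_transaction",
    "active": "active",
}


def connection_counts(rows: Iterable[Tuple[Optional[str], bool]]) -> Counter:
    """Count sessions per state bucket, plus a separate 'waiting' total."""
    counts: Counter = Counter()
    for state, waiting in rows:
        counts[_BUCKETS.get(state, "unknown")] += 1
        if waiting:
            counts["waiting"] += 1
    return counts
```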
  • 25. METRICS COLLECTION: SLOW_QUERIES
      <Query slow_queries>
        Statement "SELECT COUNT(*) AS count FROM collectd.pg_stat_activity() WHERE state = 'active' AND now() - query_start > '300 seconds'::interval AND query ~* '^(insert|update|delete|select)';"
        <Result>
          Type "gauge"
          InstancePrefix "pg_slow_queries"
          ValuesFrom "count"
        </Result>
      </Query>
    ▸ Use the gauge type for point-in-time values; collectd treats the counter type as ever-increasing and stores it as a per-second rate
  • 26. METRICS COLLECTION: TXN_WRAPAROUND
      <Query txn_wraparound>
        Statement "SELECT max(age(datfrozenxid)) AS txn_wrap_age FROM pg_database;"
        <Result>
          Type "gauge"
          InstancePrefix "txn_wraparound"
          ValuesFrom "txn_wrap_age"
        </Result>
      </Query>
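Transaction IDs are 32-bit, and only about 2^31 (roughly 2.1 billion) of them can lie in the past at once, so the value above is best read as a fraction of that limit. A Python sketch that pairs the percentage with the 1.5 billion alert level from the alerting slide; the function name and return shape are illustrative assumptions.

```python
from typing import Tuple

XID_WRAP_LIMIT = 2**31         # ~2.1 billion XIDs of age before wraparound
ALERT_AGE = 1_500_000_000      # "Age over 1.5B" from the alerting slide


def wraparound_status(max_age: int) -> Tuple[str, float]:
    """Classify max(age(datfrozenxid)) and express it as a percentage of the limit."""
    pct = 100.0 * max_age / XID_WRAP_LIMIT
    status = "CRITICAL" if max_age > ALERT_AGE else "OK"
    return status, pct


if __name__ == "__main__":
    print(wraparound_status(2**30))  # ('OK', 50.0)
```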
  • 27. METRICS COLLECTION: LOCKS
      <Query locks>
        Statement "SELECT COUNT(mode) AS count, mode FROM pg_locks GROUP BY mode UNION SELECT COUNT(*) AS count, 'waiting' AS mode FROM pg_locks WHERE granted IS false;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_locks"
          InstancesFrom "mode"
          ValuesFrom "count"
        </Result>
      </Query>
  • 29. METRICS COLLECTION: WAL_FILES
      <Query wal_files>
        Statement "SELECT archived_count AS count, failed_count AS failed FROM pg_stat_archiver;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_wal_count"
          ValuesFrom "count"
        </Result>
        <Result>
          Type "gauge"
          InstancePrefix "pg_wal_failed"
          ValuesFrom "failed"
        </Result>
      </Query>
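archived_count and failed_count in pg_stat_archiver are cumulative, so a graph of the raw value shows history, not whether archiving is failing right now. A sketch of the check that is usually wanted, comparing two polls; the helper names are hypothetical, and a drop in the counter is treated as a reset (for example via pg_stat_reset_shared('archiver')).

```python
def new_archive_failures(prev_failed: int, curr_failed: int) -> int:
    """Archive failures since the previous poll of pg_stat_archiver.failed_count."""
    if curr_failed < prev_failed:
        # Counter went backwards: it was reset, so everything counted since then is new.
        return curr_failed
    return curr_failed - prev_failed


def archiver_alert(prev_failed: int, curr_failed: int) -> bool:
    """Alert if any WAL segment failed to archive during the last interval."""
    return new_archive_failures(prev_failed, curr_failed) > 0
```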
  • 30. METRICS COLLECTION: SCANS
      <Query scans>
        Statement "SELECT sum(idx_scan) AS index_scans, sum(seq_scan) AS seq_scans, sum(idx_tup_fetch) AS index_tup_fetch, sum(seq_tup_read) AS seq_tup_read FROM pg_stat_all_tables;"
        <Result>
          Type "pg_scan"
          InstancePrefix "index"
          ValuesFrom "index_scans"
        </Result>
        ...
      </Query>
  • 31. METRICS COLLECTION: SEQ_SCANS
      <Query seq_scans>
        Statement "SELECT CASE WHEN status = 'OK' THEN 0 ELSE 1 END AS status FROM (SELECT get_seq_scan_on_large_tables AS status FROM collectd.get_seq_scan_on_large_tables()) AS foo;"
        <Result>
          Type "gauge"
          InstancePrefix "pg_seq_scans"
          ValuesFrom "status"
        </Result>
      </Query>
  • 32. METRICS COLLECTION: SEQ_SCAN_ON_LARGE_TABLES
      -- Baseline snapshot of scan counters for tables larger than 1 GB (1073741824 bytes).
      CREATE MATERIALIZED VIEW collectd.seq_scan_on_large_tables AS
        SELECT relid, schemaname, relname, seq_scan, seq_tup_read,
               pg_relation_size(relid) AS relsize, now() AS refreshed_at
          FROM pg_stat_all_tables
         WHERE pg_relation_size(relid) > 1073741824
           AND schemaname NOT IN ('pg_catalog', 'information_schema')
        UNION ALL
        -- Dummy row (relid = 0) that records when the snapshot was taken.
        SELECT 0, '0', '0', '0', 0, 0, now();
      ALTER MATERIALIZED VIEW collectd.seq_scan_on_large_tables OWNER TO collectd;
  • 33. METRICS COLLECTION: GET_SEQ_SCAN_ON_LARGE_TABLES
      CREATE OR REPLACE FUNCTION collectd.get_seq_scan_on_large_tables()
      RETURNS text AS $$
      DECLARE
        v_refreshed_at timestamptz;
        v_tables_with_seq_scan text[];
      BEGIN
        -- The dummy row (relid = 0) records when the baseline snapshot was taken.
        SELECT refreshed_at INTO v_refreshed_at
          FROM collectd.seq_scan_on_large_tables WHERE relid = 0;
        -- Refresh the baseline every 4 hours (not possible on a standby).
        IF v_refreshed_at < now() - interval '4 hours' AND pg_is_in_recovery() IS false THEN
          REFRESH MATERIALIZED VIEW collectd.seq_scan_on_large_tables;
        END IF;
        -- Large tables whose seq scans since the baseline read > 50000 rows per scan on average.
        -- NULLIF guards the division: Postgres does not guarantee AND evaluation order.
        SELECT ARRAY(
          SELECT base.relname || ':' || (cur.seq_scan - base.seq_scan)
            FROM collectd.seq_scan_on_large_tables AS base
            JOIN pg_stat_all_tables AS cur
              ON cur.schemaname = base.schemaname AND cur.relname = base.relname
           WHERE cur.seq_scan - base.seq_scan > 0
             AND (cur.seq_tup_read - base.seq_tup_read)
                 / NULLIF(cur.seq_scan - base.seq_scan, 0) > 50000
        ) INTO v_tables_with_seq_scan;
        IF v_tables_with_seq_scan = '{}' THEN
          RETURN 'OK';
        ELSE
          RETURN 'PROBLEM: Seq scan on table: ' || array_to_string(v_tables_with_seq_scan, '&');
        END IF;
      END;
      $$ LANGUAGE plpgsql SECURITY DEFINER;
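The detection rule inside that function is easy to misread in PL/pgSQL, so here it is restated in Python: compare each large table's current counters with the baseline snapshot and flag tables whose sequential scans since the baseline read more than 50,000 rows each on average. A sketch only; the dict-of-tuples input shape is an assumption, tables are keyed by name alone, and integer division mirrors the bigint arithmetic in the SQL for non-negative deltas.

```python
from typing import Dict, List, Tuple

# relname -> (seq_scan, seq_tup_read); hypothetical input shape.
Counters = Dict[str, Tuple[int, int]]


def seq_scan_problems(base: Counters, current: Counters,
                      min_rows_per_scan: int = 50_000) -> List[str]:
    """Return 'relname:new_scans' for tables with large average seq scans since the baseline."""
    problems = []
    for rel, (base_scans, base_rows) in base.items():
        if rel not in current:
            continue  # table dropped since the baseline was taken
        cur_scans, cur_rows = current[rel]
        scans = cur_scans - base_scans
        if scans <= 0:
            continue  # no new sequential scans (also guards the division)
        if (cur_rows - base_rows) // scans > min_rows_per_scan:
            problems.append(f"{rel}:{scans}")
    return problems


def check(base: Counters, current: Counters) -> str:
    """Same OK / PROBLEM string format as collectd.get_seq_scan_on_large_tables()."""
    problems = seq_scan_problems(base, current)
    if not problems:
        return "OK"
    return "PROBLEM: Seq scan on table: " + "&".join(problems)
```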
  • 34. METRICS COLLECTION: AVG_QUERYTIME
      <Query avg_querytime>
        Statement "SELECT sum(total_time)/sum(calls) AS avg_querytime FROM collectd.get_stat_statements();"
        <Result>
          Type "gauge"
          InstancePrefix "pg_avg_querytime"
          ValuesFrom "avg_querytime"
        </Result>
      </Query>
  • 35. METRICS COLLECTION: GET_STAT_STATEMENTS
      -- pg_stat_statements must also be listed in shared_preload_libraries (needs a restart).
      CREATE EXTENSION IF NOT EXISTS pg_stat_statements WITH SCHEMA collectd;
      ALTER SCHEMA collectd OWNER TO collectd;
      -- SECURITY DEFINER runs with the owner's privileges (create it as a superuser).
      CREATE OR REPLACE FUNCTION collectd.get_stat_statements()
      RETURNS SETOF collectd.pg_stat_statements AS $$
        SELECT * FROM collectd.pg_stat_statements
         WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database());
      $$ LANGUAGE sql VOLATILE SECURITY DEFINER;
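sum(total_time)/sum(calls) is a call-weighted average over everything pg_stat_statements has recorded since its last reset, so on a long-running server it moves slowly. A sketch contrasting it with an interval average taken from the difference between two polls; the function names are illustrative, and times are in milliseconds as pg_stat_statements reports them (the column is called total_exec_time from Postgres 13 onward).

```python
from typing import Iterable, Optional, Tuple


def lifetime_avg_ms(rows: Iterable[Tuple[int, float]]) -> Optional[float]:
    """Call-weighted mean over (calls, total_time_ms) rows, like sum(total_time)/sum(calls)."""
    calls = 0
    total_ms = 0.0
    for c, t in rows:
        calls += c
        total_ms += t
    return None if calls == 0 else total_ms / calls


def interval_avg_ms(prev_calls: int, prev_total_ms: float,
                    curr_calls: int, curr_total_ms: float) -> Optional[float]:
    """Mean time per call between two polls of the summed counters."""
    d_calls = curr_calls - prev_calls
    d_ms = curr_total_ms - prev_total_ms
    if d_calls <= 0 or d_ms < 0:
        return None  # no new calls, or pg_stat_statements_reset() in between
    return d_ms / d_calls
```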
  • 38. METRICS COLLECTION: CHECKPOINTS
      <Query checkpoints>
        Statement "SELECT (checkpoints_timed + checkpoints_req) AS total_checkpoints FROM pg_stat_bgwriter;"
        <Result>
          Type "counter"
          InstancePrefix "pg_checkpoints"
          ValuesFrom "total_checkpoints"
        </Result>
      </Query>
  • 39. METRICS COLLECTION: SLAVE LAG
      <Query slave_lag>
        Statement "SELECT CASE WHEN pg_is_in_recovery = 'false' THEN 0 ELSE COALESCE(ROUND(EXTRACT(epoch FROM now() - pg_last_xact_replay_timestamp())), 0) END AS seconds FROM pg_is_in_recovery();"
        <Result>
          Type "gauge"
          InstancePrefix "slave_lag"
          ValuesFrom "seconds"
        </Result>
      </Query>
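The lag value is simply now() minus the timestamp of the last replayed transaction. A Python restatement, useful for reasoning about the alert: on a primary, or before anything has been replayed, it reports 0, and on a standby of an idle primary it keeps growing even when nothing is actually behind, because no new transactions arrive to replay. The function name is hypothetical, and exact half-second ties may round differently than in Postgres.

```python
from datetime import datetime, timezone
from typing import Optional


def replica_lag_seconds(in_recovery: bool, now: datetime,
                        last_replay: Optional[datetime]) -> int:
    """Mirror of the slave_lag query: 0 on a primary or when nothing was replayed yet."""
    if not in_recovery or last_replay is None:
        return 0
    return round((now - last_replay).total_seconds())


if __name__ == "__main__":
    replayed = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    checked = datetime(2024, 1, 1, 0, 5, 0, tzinfo=timezone.utc)
    print(replica_lag_seconds(True, checked, replayed))  # 300
```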
  • 40. ALERTING: SETUP ALERTS ON DB METRICS
    ▸ Uptime
    ▸ Waiting connections
      ▸ # of connections waiting > 5
    ▸ Slow queries
      ▸ # of slow queries > 5
    ▸ Seq scans on large tables
    ▸ TXN wraparound
      ▸ Age over 1.5B
    ▸ Disk space usage
      ▸ 85%?
    ▸ Slave lag
      ▸ 5 minutes?
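Most of these alerts are plain "value above threshold" rules, so they can be kept as data next to the collection config. A sketch with hypothetical metric names and the thresholds from this slide (the disk and lag values carry question marks on the slide, so treat them as starting points to tune). A missing metric is treated as not breached here; you may prefer the opposite so that a silent collector also raises an alert.

```python
from typing import Dict, List, Mapping

# Hypothetical metric names; thresholds taken from the alerting slide.
THRESHOLDS: Dict[str, float] = {
    "waiting_connections": 5,
    "slow_queries": 5,
    "txn_wraparound_age": 1_500_000_000,
    "disk_used_pct": 85,
    "slave_lag_seconds": 300,  # 5 minutes
}


def breached(metrics: Mapping[str, float],
             thresholds: Mapping[str, float] = THRESHOLDS) -> List[str]:
    """Names of metrics strictly above their threshold, sorted for stable output."""
    return sorted(
        name for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )
```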
  • 41. MONITORING CHANGES: HOW TO KEEP UP?
    ▸ Design with failover in mind
    ▸ Keep an eye on new monitoring features in the latest DB and OS versions
    ▸ Postgres 9.5 enhancements
      ▸ Commit timestamp tracking (requires track_commit_timestamp = on)
        ▸ SELECT * FROM pg_last_committed_xact();
      ▸ cluster_name (shown in the process titles)
        ▸ $ ps -ef | grep checkpointer
          postgres 12181 12178 0 11:12 ? 00:00:00 postgres: personnel: checkpointer process
          postgres 12207 12204 0 11:12 ? 00:00:00 postgres: reportsdb: checkpointer process
          postgres 12233 12230 0 11:12 ? 00:00:00 postgres: management: checkpointer process
    ▸ Several monitoring changes coming in Postgres 9.6
      ▸ pg_stat_activity reports which resource a backend is waiting on (wait_event_type, wait_event)
    ▸ Deploy monitoring through config management tools
  • 42. INCIDENT MANAGEMENT: HOW TO BE READY TO HANDLE A 3AM CALL?
    ▸ PagerDuty calendar:
    ▸ Document metrics
    ▸ URL for the dashboard
    ▸ Alert resolution procedure
    ▸ Clear SLAs (decision)
    ▸ Escalation policy
    ▸ Scenarios
      ▸ Wait for the server to come back up
      ▸ Failover
    ▸ Review alerts before going on call
    ▸ On-call notification
    ▸ Plan for the worst and document accordingly
      ▸ What if you are in a movie theatre, on the beach, etc.?
      ▸ What if you can't get onto the server?
    ▸ Keep the document up to date
  • 44. KEEP IN TOUCH: THANKS & Q/A
    ▸ You!
    ▸ Conference committee
    ▸ Contact for further Q/A
      ▸ Twitter: @DenishPatel