This document discusses advanced Postgres monitoring. It begins with an introduction of the speaker and an agenda for the discussion. It then covers selection criteria for monitoring solutions, compares open source and SAAS monitoring options, and provides examples of collecting specific Postgres metrics using CollectD. It also discusses alerting, handling monitoring changes, and being prepared to respond to incidents outside of normal hours.
The paperback version is available on lulu.com at http://goo.gl/fraa8o
This is the first volume of the PostgreSQL Database Administration book. The book covers the steps for installing, configuring, and administering PostgreSQL 9.3 on Debian Linux. It covers the logical and physical aspects of PostgreSQL, and two chapters are dedicated to the backup/restore topic.
There are many ways to run high availability with PostgreSQL. Here, we present a template for you to create your own customized high-availability solution using Python and, for maximum accessibility, a distributed configuration store like ZooKeeper or etcd.
Josh Berkus
You've heard that PostgreSQL is the highest-performance transactional open source database, but you're not seeing it on YOUR server. In fact, your PostgreSQL application is kind of poky. What should you do? While doing advanced performance engineering for really high-end systems takes years to learn, you can learn the basics to solve performance issues for 80% of PostgreSQL installations in less than an hour. In this session, you will learn:
* The parts of database application performance
* The performance setup procedure
* Basic troubleshooting tools
* The 13 postgresql.conf settings you need to know
* Where to look for more information
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
Presentation that I gave as a guest lecture for a summer intensive development course at nod coworking in Dallas, TX. The presentation targets beginning web developers with little to no experience in databases, SQL, or PostgreSQL. I cover the creation of a database, creating records, reading/querying records, updating records, destroying records, joining tables, and a brief introduction to transactions.
In 40 minutes the audience will learn a variety of ways to make a PostgreSQL database suddenly go out of memory on a box with half a terabyte of RAM.
Developers' and DBAs' best practices for preventing this will also be discussed, as well as a bit of Postgres and Linux memory management internals.
PostgreSQL is designed to be easily extensible. For this reason, extensions loaded into the database can function just like features that are built in. In this session, we will learn more about the PostgreSQL extension framework, how extensions are built, look at some popular extensions, and how to manage these extensions in your deployments.
PostgreSQL is one of the most advanced relational databases. It offers superb replication capabilities, the most important being streaming replication, Point-In-Time Recovery, and advanced monitoring.
This deck was used by Devrim at pgDay Asia 2017. He talked about some important facts about WAL - the transaction log, or xlog, in PostgreSQL. Some of these can really come in handy on a bad day.
Best Practices for Becoming an Exceptional Postgres DBA (EDB)
Drawing from our teams who support hundreds of Postgres instances and production database systems for customers worldwide, this presentation provides real-world best practices from the nation's top DBAs. Learn top-notch monitoring and maintenance practices, get resource planning advice that can help prevent, resolve, or eliminate common issues, learn top database tuning tricks for increasing system performance, and ultimately gain greater insight into how to improve your effectiveness as a DBA.
An immersive workshop at General Assembly, SF. I typically teach this workshop at General Assembly, San Francisco. To see a list of my upcoming classes, visit https://generalassemb.ly/instructors/seth-familian/4813
I also teach this workshop as a private lunch-and-learn or half-day immersive session for corporate clients. To learn more about pricing and availability, please contact me at http://familian1.com
This presentation is primarily focused on how to use collectd (http://collectd.org/) to gather data from the Postgres statistics tables. Examples of how to use collectd with Postgres will be shown. There is some hackery involved to make collectd do a little more and collect more meaningful data from Postgres. These small patches will be explored. A small portion of the discussion will be about how to visualize the data.
This presentation reviews the top ten new features that will appear in the Postgres 9.5 release.
Postgres 9.5 adds many features designed to enhance the productivity of developers: UPSERT, CUBE, ROLLUP, JSONB functions, and PostGIS improvements. For administrators, it has row-level security, a new index type, and performance enhancements for large servers.
AWS Webcast - Achieving consistent high performance with Postgres on Amazon W... (Amazon Web Services)
Postgres is a popular relational database and is the backend of a number of high traffic applications. Join AWS and PalominoDB, the company that helped Obama for America campaign optimize the database infrastructure on AWS, to learn about how you can run high throughput, I/O intensive Postgres clusters on the Amazon EBS storage platform. We will go over best practices including performance, durability and optimization related to deploying Postgres on AWS.
You will hear about the best practices learned and applied for the Obama for America campaign.
In this webinar, you will learn about:
- Amazon Elastic Block Store (EBS)
- Why Provisioned IOPS volumes fit the needs of high I/O intensive applications
- Best practices for deploying Postgres on AWS
- How to leverage Provisioned IOPS volumes for Postgres
Oracle Application Express and Oracle Row-Level Security (RLS) (aka Virtual Private Database) work very well together. Using RLS you can have one database serve different groups of users while virtually guaranteeing that no one will be able to view or update data they aren't supposed to. That was the sales pitch. This presentation will be a case study on one small Apex application with complex security requirements. The author started with a complex solution using views which performed poorly and didn't satisfy all the users' requirements. When he switched to a solution using RLS, the application became significantly simpler and faster, and all user requirements were met.
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL... (NETWAYS)
pg_stat_monitor is a statistics collection tool based on PostgreSQL's contrib module pg_stat_statements. PostgreSQL's pg_stat_statements provides only basic statistics, which is sometimes not enough. The major shortcoming of pg_stat_statements is that it accumulates all the queries and statistics but does not provide aggregated statistics or histogram information. In this case, a user needs to calculate the aggregate, which is quite expensive. pg_stat_monitor provides the pre-calculated aggregates: it collects and aggregates data on a bucket basis. The size and number of buckets are configured using GUC (Grand Unified Configuration) parameters. The buckets are used to collect the statistics and aggregate them per bucket. The talk will cover the usage of pg_stat_monitor and how it is better than pg_stat_statements.
• Answer questions Who, What, Where and When about any database activity by setting up an Oracle standard audit. The feature is free and available in every database edition.
• Stay on top of any possible performance and storage issues by choosing appropriate audit parameters.
• Build summary and detail reports to analyze audit events from multiple databases using APEX or SQL*Plus.
• Set up a data retention period and clean up audit records regularly.
• Create a honeypot to attract hackers' attention.
• Enable alerts and send email notifications using Oracle Enterprise Manager infrastructure.
The presentation focuses on the facilities available in Oracle 10g for SQL and database tuning, the identification of database problems using wait events, and some common configuration problems.
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck... (Ontico)
The new PL profiler allows you to easily get through the dark barrier that PL/pgSQL puts between tools like pgbadger and the queries you are looking for.
Query and schema tuning is tough enough by itself. But queries buried many call levels deep in PL/pgSQL functions make it torture. The reason is that the default monitoring tools like logs, pg_stat_activity and pg_stat_statements cannot penetrate into PL/pgSQL. All they report is that your query calling function X is slow. That is useful if function X has 20 lines of simple code. Not so useful if it calls other functions and the actual problem query is many call levels down in a dungeon of 100,000 lines of PL code.
Learn from the original author of PL/pgSQL and current maintainer of the plprofiler extension how you can easily analyze what is going on inside your PL code.
Conference: HP Big Data Conference 2015
Session: Real-world Methods for Boosting Query Performance
Presentation: "Extra performance out of thin air"
Presenter: Konstantine Krutiy, Principal Software Engineer / Vertica Whisperer
Company: Localytics
Description:
Learn how to get extra performance out of Vertica from areas you never expected.
This presentation will illustrate how you can improve performance of your Vertica cluster without extra budget.
All you need is ingenuity, knowledge of Vertica internals, and the ability to challenge conventional wisdom.
We will show you real world examples on gaining performance by eliminating unneeded work, eliminating unneeded system waits and making your system operate more efficiently.
Visit my blog http://www.dbjungle.com for more Vertica insights
Answer questions Who, What, When and Where about any database activity by setting up an Oracle audit. The infrastructure is free and available in every database edition.
Stay on top of any possible performance and storage issues by choosing appropriate audit parameters.
Build summary and detail reports to analyze audit events from multiple databases using APEX or SQL*Plus.
Set up a data retention period and clean up audit records regularly.
Create a honeypot to attract hackers' attention.
Enable alerts and send email notifications using Oracle Enterprise Manager infrastructure.
PostgreSQL Performance Problems: Monitoring and Alerting (Grant Fritchey)
PostgreSQL can be difficult to troubleshoot when the pressure is on without the right knowledge and tools. Knowing where to find the information you need to improve performance is central to your ability to act quickly and solve problems. In this training, we'll discuss the various query statistic views and log information that's available in PostgreSQL so that you can solve problems quickly. Along the way, we'll highlight a handful of open-source and paid tools that can help you track data over time and provide better alerting capabilities so that you know about problems before they become critical.
2. SPEAKER
WHO IS THIS GUY?
▸ Sr. Database Architect at Medallia
▸ Recent fun employments:
▸ Principal Database Engineer @ WithMe
▸ Lead Database Architect @ OmniTI
▸ Expertise in PostgreSQL, Oracle, MySQL, NoSQL
▸ Contact : denish.j.patel@gmail.com or dpatel@medallia.com
▸ Twitter: @DenishPatel
▸ Blog: http://www.pateldenish.com
▸ Postgres Slack Channel (https://postgres-slack.herokuapp.com/)
3. AGENDA
DISCUSSION LIST
▸ What to look for in a monitoring solution in general?
▸ Comparison of selected open source and commercial monitoring solutions
▸ Which metrics to collect, and how?
▸ Which metrics to alert on, and how to define thresholds?
▸ How to keep up with monitoring changes?
▸ How to react to alerts at 3AM?
▸ Open discussion
4. SELECTION CRITERIA
WHAT TO LOOK FOR IN A MONITORING SOLUTION?
▸ Blend of system monitoring with Postgres support
▸ Centralized monitoring
▸ Hosted vs On-premise
▸ Security concerns of clients
▸ Alerting and Dashboard/Graphs
▸ Easy installation and configuration
▸ Postgres Support
▸ pg_stat_statements
▸ Resource monitoring - CPU, RAM, DISK IO & Network
▸ pgbouncer support
5. COMPARISON
MONITORING SOLUTIONS
▸ Open Source
▸ Sensu
▸ Zabbix
▸ Zenoss (Limited capabilities)
▸ Nagios
▸ (Stop using Nagios so it can die peacefully!!)
▸ SAAS Offerings
▸ Wavefront
▸ Circonus
▸ Vividcortex
▸ OkMeter
▸ NewRelic
7. COMPARISON
SAAS OFFERINGS
Solution | Postgres support | Configuration | Confidence
Wavefront (collectd) | Yes | collectd plugins | HIGH
Circonus | Yes | Default checks | HIGH
Vividcortex | Yes | Default checks | HIGH
Okmeter | Yes | One-click install, pgbouncer | HIGH
New Relic | Yes | Plugins - missing some metrics | MEDIUM
9. WAVEFRONT.COM
WAVEFRONT
▸ Nice Dashboard and alerting functionality
▸ Very scalable solution
▸ Works with existing metrics collection tools, e.g. collectd
▸ Real time analytics capability
▸ Complete monitoring suite
10. OKMETER.IO
OKMETER
▸ It is an agent-based system, so you just need to install the agent in your environment
to monitor applications, databases, or any other servers
▸ Very easy to install and configure
▸ Provides easy-to-configure Postgres server monitoring using
pg_stat_statements along with server stats. Once you install the agent, you get
everything without any effort
▸ Built-in pgbouncer monitoring
▸ Built-in monitoring of all resources: disk, CPU, network & memory
12. USE CASE
MONITORING SOLUTION
▸ 150+ DB clusters across the globe
▸ Easy installation
▸ Standardization
▸ Centralized solution
▸ Real time analytics
▸ Support new Infra - Docker/Aurora/Mesos
13. METRICS COLLECTION
SETUP ROLE
create role collectd login encrypted password 'XXX';
create schema collectd;
set search_path = collectd,pg_catalog;
grant usage on schema collectd to collectd;
alter role collectd set search_path = collectd,pg_catalog;
▸ Things to consider:
▸ Separate role for monitoring
▸ No SUPER ROLE
▸ Limited permissions
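On Postgres 10 and later, the built-in pg_monitor role can replace much of this hand-granting; a sketch, reusing the collectd role created above:

```sql
-- Postgres 10+ only: pg_monitor bundles read access to the statistics
-- and configuration views that monitoring agents typically need,
-- without SUPERUSER.
grant pg_monitor to collectd;
```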
16. METRICS COLLECTION
PG_STAT_ACTIVITY
create or replace function pg_stat_activity()
returns setof pg_catalog.pg_stat_activity
as $$
begin
return query(select * from pg_catalog.pg_stat_activity);
end $$
language plpgsql security definer;
revoke all on function pg_stat_activity() from public;
grant execute on function pg_stat_activity() to collectd;
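The monitoring role can then read session information through the SECURITY DEFINER wrapper without any elevated rights; a minimal smoke test, assuming the role and function defined above:

```sql
-- run as the collectd role: counts all backends visible to the wrapper
select count(*) from collectd.pg_stat_activity();
```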
17. METRICS COLLECTION
TRANSACTIONS
<Query transactions>
Statement "SELECT xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = $1;"
Param database
<Result>
Type "pg_xact"
InstancePrefix "commit"
ValuesFrom "xact_commit"
</Result>
<Result>
Type "pg_xact"
InstancePrefix "rollback"
ValuesFrom "xact_rollback"
</Result>
</Query>
18. METRICS COLLECTION
QUERIES (DML)
<Query queries>
Statement "SELECT sum(n_tup_ins) AS ins,
sum(n_tup_upd) AS upd,
sum(n_tup_del) AS del,
sum(n_tup_hot_upd) AS hot_upd
FROM pg_stat_user_tables;"
<Result>
Type "pg_n_tup_c"
InstancePrefix "ins"
ValuesFrom "ins"
</Result>
..
.
.
.
</Query>
19. METRICS COLLECTION
TABLE_STATES
<Query table_states>
Statement "SELECT sum(n_live_tup) AS live, sum(n_dead_tup) AS dead
FROM pg_stat_user_tables;"
<Result>
Type "pg_n_tup_g"
InstancePrefix "live"
ValuesFrom "live"
</Result>
<Result>
Type "pg_n_tup_g"
InstancePrefix "dead"
ValuesFrom "dead"
</Result>
</Query>
20. METRICS COLLECTION
QUERY_PLANS
<Query query_plans>
Statement "SELECT sum(seq_scan) AS seq,
sum(seq_tup_read) AS seq_tup_read,
sum(idx_scan) AS idx,
sum(idx_tup_fetch) AS idx_tup_fetch
FROM pg_stat_user_tables;"
<Result>
Type "pg_scan"
InstancePrefix "seq"
ValuesFrom "seq"
.
.
</Query>
21. METRICS COLLECTION
DISK_IO
<Query disk_io>
Statement "SELECT coalesce(sum(heap_blks_read), 0) AS heap_read,
coalesce(sum(heap_blks_hit), 0) AS heap_hit,
coalesce(sum(idx_blks_read), 0) AS idx_read,
coalesce(sum(idx_blks_hit), 0) AS idx_hit,
coalesce(sum(toast_blks_read), 0) AS toast_read,
coalesce(sum(toast_blks_hit), 0) AS toast_hit,
coalesce(sum(tidx_blks_read), 0) AS tidx_read,
coalesce(sum(tidx_blks_hit), 0) AS tidx_hit
FROM pg_statio_user_tables;"
22. METRICS COLLECTION
DISK USAGE / DB SIZE
<Query disk_usage>
Statement "SELECT pg_database_size($1) AS size;"
Param database
<Result>
Type pg_db_size
ValuesFrom "size"
</Result>
</Query>
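Database size alone can hide which objects are growing. A companion query (not part of the original collectd config, shown here as a sketch) lists the largest user tables:

```sql
-- top 5 user tables by total on-disk size (including indexes and TOAST)
SELECT relname, pg_total_relation_size(relid) AS total_bytes
FROM pg_stat_user_tables
ORDER BY total_bytes DESC
LIMIT 5;
```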
23. METRICS COLLECTION
CONNECTIONS #CUSTOM
<Query connections>
Statement "SELECT COUNT(state) AS count, state FROM (SELECT CASE
WHEN state = 'idle' THEN 'idle'
WHEN state = 'idle in transaction' THEN 'idle_in_transaction'
WHEN state = 'active' THEN 'active'
ELSE 'unknown' END AS state
FROM collectd.pg_stat_activity) state
GROUP BY state
UNION
SELECT COUNT(*) AS count, 'waiting' AS state
FROM collectd.pg_stat_activity WHERE waiting ;"
<Result>
Type "pg_numbackends"
InstancePrefix "state"
InstancesFrom "state"
ValuesFrom "count"
</Result>
</Query>
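Note that the boolean waiting column used above was removed from pg_stat_activity in Postgres 9.6 in favor of the wait_event columns. A sketch of the equivalent "waiting" branch on 9.6+ (adjust the function call to match your wrapper):

```sql
-- Postgres 9.6+: wait_event is non-null whenever the backend is
-- waiting on a lock, I/O, or another resource.
SELECT COUNT(*) AS count, 'waiting' AS state
FROM collectd.pg_stat_activity()
WHERE wait_event IS NOT NULL;
```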
27. METRICS COLLECTION
LOCKS
<Query locks>
Statement "SELECT COUNT(mode) AS count, mode FROM pg_locks GROUP BY mode
UNION SELECT COUNT(*) AS count, 'waiting' AS mode FROM pg_locks
WHERE granted is false ;"
<Result>
Type "gauge"
InstancePrefix "pg_locks"
InstancesFrom "mode"
ValuesFrom "count"
</Result>
</Query>
29. METRICS COLLECTION
WAL_FILES
<Query wal_files>
Statement "SELECT archived_count AS count, failed_count AS failed FROM pg_stat_archiver;"
<Result>
Type "gauge"
InstancePrefix "pg_wal_count"
ValuesFrom "count"
</Result>
<Result>
Type "gauge"
InstancePrefix "pg_wal_failed"
ValuesFrom "failed"
</Result> </Query>
30. METRICS COLLECTION
SCANS
<Query scans>
Statement "SELECT sum(idx_scan) as index_scans, sum(seq_scan) as seq_scans,
sum(idx_tup_fetch) as index_tup_fetch, sum(seq_tup_read) as seq_tup_read
FROM pg_stat_all_tables ; "
<Result>
Type "pg_scan"
InstancePrefix "index"
ValuesFrom "index_scans"
</Result>
.
.
</Query>
31. METRICS COLLECTION
SEQ_SCANS
<Query seq_scans>
Statement "SELECT CASE WHEN status='OK' THEN 0 ELSE 1 END AS status
FROM ( SELECT get_seq_scan_on_large_tables AS status
FROM collectd.get_seq_scan_on_large_tables() ) AS foo;"
<Result>
Type "gauge"
InstancePrefix "pg_seq_scans"
ValuesFrom "status"
</Result>
</Query>
32. METRICS COLLECTION
SEQ_SCAN_ON_LARGE_TABLES
CREATE MATERIALIZED VIEW collectd.seq_scan_on_large_tables AS
SELECT relid, schemaname, relname, seq_scan, seq_tup_read ,
pg_relation_size(relid) as relsize, now() as refreshed_at
FROM pg_stat_all_tables
WHERE pg_relation_size(relid) > 1073741824
AND schemaname not in ('pg_catalog', 'information_schema')
UNION ALL SELECT 0,'0','0','0',0,0,now();
ALTER materialized VIEW collectd.seq_scan_on_large_tables OWNER TO collectd;
33. METRICS COLLECTION
GET_SEQ_SCAN_ON_LARGE_TABLES
CREATE OR REPLACE FUNCTION collectd.get_seq_scan_on_large_tables()
RETURNS text AS
$$
DECLARE
v_matview text;
v_refreshed_at timestamptz;
v_tables_with_seq_scan text[];
BEGIN
SELECT refreshed_at INTO v_refreshed_at
FROM collectd.seq_scan_on_large_tables WHERE relid=0;
-- refresh MV every 4 hours
IF v_refreshed_at < now() - interval '4 hours' and pg_is_in_recovery() is false THEN
REFRESH MATERIALIZED VIEW collectd.seq_scan_on_large_tables;
END IF;
SELECT ARRAY (SELECT base.relname ||':'|| (current.seq_scan-base.seq_scan)
FROM collectd.seq_scan_on_large_tables AS base
LEFT JOIN pg_stat_all_tables AS current ON (base.schemaname=current.schemaname AND base.relname=current.relname)
WHERE (current.seq_scan-base.seq_scan) > 0 AND ((current.seq_tup_read-base.seq_tup_read)/(current.seq_scan-base.seq_scan)) > 50000 )
INTO v_tables_with_seq_scan;
IF v_tables_with_seq_scan = '{}' THEN
RETURN 'OK';
ELSE
RETURN 'PROBLEM: Seq scan on table: '|| array_to_string(v_tables_with_seq_scan,'&');
END If;
END;
$$
LANGUAGE plpgsql SECURITY DEFINER;
34. METRICS COLLECTION
AVG_QUERYTIME
<Query avg_querytime>
Statement "SELECT sum(total_time)/nullif(sum(calls),0) AS avg_querytime FROM
collectd.get_stat_statements() ;"
<Result>
Type "gauge"
InstancePrefix "pg_avg_querytime"
ValuesFrom "avg_querytime"
</Result>
</Query>
35. METRICS COLLECTION
GET_STAT_STATEMENTS
create extension IF NOT EXISTS pg_stat_statements WITH SCHEMA collectd;
alter schema collectd owner to collectd;
CREATE OR REPLACE FUNCTION collectd.get_stat_statements() RETURNS SETOF
pg_stat_statements AS
$$
SELECT * FROM pg_stat_statements
WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database());
$$ LANGUAGE sql VOLATILE SECURITY DEFINER;
38. METRICS COLLECTION
CHECKPOINTS
<Query checkpoints>
Statement "SELECT (checkpoints_timed + checkpoints_req) AS total_checkpoints
FROM pg_stat_bgwriter ;"
<Result>
Type "counter"
InstancePrefix "pg_checkpoints"
ValuesFrom "total_checkpoints"
</Result>
</Query>
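Beyond the raw count, the ratio of requested to timed checkpoints is often the more useful alerting signal; a sketch (a persistently high ratio suggests the WAL-size checkpoint settings are too small for the write load):

```sql
-- fraction of checkpoints forced by WAL volume rather than the timeout
SELECT checkpoints_req::float
       / nullif(checkpoints_timed + checkpoints_req, 0) AS forced_ratio
FROM pg_stat_bgwriter;
```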
39. METRICS COLLECTION
SLAVE LAG
<Query slave_lag>
Statement "SELECT CASE WHEN NOT pg_is_in_recovery THEN 0
ELSE COALESCE(ROUND(EXTRACT(epoch FROM now() - pg_last_xact_replay_timestamp())),0) END
AS seconds
FROM pg_is_in_recovery();"
<Result>
Type "counter"
InstancePrefix "slave_lag"
ValuesFrom "seconds"
</Result>
</Query>
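Timestamp-based lag can read high on an idle primary simply because no new transactions have been replayed. A complementary byte-based check for a 9.x-era standby, as a sketch (on Postgres 10+ the functions are renamed pg_wal_lsn_diff, pg_last_wal_receive_lsn, and pg_last_wal_replay_lsn):

```sql
-- bytes received from the primary but not yet replayed on the standby
SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(),
                             pg_last_xlog_replay_location()) AS replay_lag_bytes;
```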
40. ALERTING
SETUP ALERTS ON DB METRICS
▸ Uptime
▸ Waiting Connections
▸ # of connections waiting > 5
▸ Slow queries
▸ # of slow queries > 5
▸ Seq scan on large tables
▸ TXN Wraparound
▸ Age Over 1.5B
▸ Disk space usage
▸ 85%?
▸ Slave lag
▸ 5 minutes?
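The transaction wraparound alert above can be driven directly from pg_database; a sketch, with the 1.5B threshold mirroring the slide:

```sql
-- databases closest to transaction ID wraparound; alert when
-- xid_age exceeds roughly 1.5 billion
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```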
41. MONITORING CHANGES
HOW TO KEEP UP?
▸ Design with failover in mind
▸ Keep an eye on new monitoring features in the latest DB or OS version
▸ Postgres 9.5 enhancements
▸ Commit timestamp tracking
▸ SELECT * FROM pg_last_committed_xact();
▸ cluster_name
▸ $ ps -ef | grep checkpointer
▸ postgres 12181 12178 0 11:12 ? 00:00:00 postgres: personnel: checkpointer process
▸ postgres 12207 12204 0 11:12 ? 00:00:00 postgres: reportsdb: checkpointer process
▸ postgres 12233 12230 0 11:12 ? 00:00:00 postgres: management: checkpointer process
▸ A bunch of changes coming in Postgres 9.6
▸ Improved pg_stat_activity view provides more detail about what resources sessions are waiting on
▸ Deploy monitoring through config management tools
42. INCIDENT MANAGEMENT
HOW TO BE READY TO HANDLE A 3AM CALL?
▸ PagerDuty calendar: https://www.pagerduty.com/
▸ Document metrics
▸ URL for the Dashboard
▸ Alert resolution procedure
▸ Clear SLAs (Decision)
▸ Escalation policy
▸ Scenarios
▸ Wait for the server to come back up
▸ Failover
▸ Review alerts before going on call
▸ On-call notification
▸ Plan for the worst and document accordingly
▸ What if you are in movie theatre/beach etc.?
▸ What if you can’t jump on the server?
▸ Keep the document up-to-date