Infrastructure Monitoring
(with Postgres, obviously)
Steve Simpson
StackHPC
steve@stackhpc.com
www.stackhpc.com
Overview
1) Background
2) Monitoring
Postgres for Metrics
3) Requirements
4) Data & Queries
5) Optimisation
Postgres for ...
6) Log Searching
7) Log Parsing
8) Queueing
Background
Background
Systems Software Engineer
C, C++, Python
Background
Based in Bristol, UK
Thriving Tech Industry
Background
● Gnodal
● 10GbE Ethernet
● ASIC Verification
● Embedded Firmware
● JustOne Database
● Agile “Big Data” RDBMS
● Based on PostgreSQL
● Storage Team Lead
Background
Consultancy for HPC on OpenStack
Multi-tenant massively parallel workloads
Monitoring complex infrastructure
Stack
HPC
Background
Cloud orchestration platform
IaaS through API and dashboard
Multi-tenancy throughout
Network, Compute, Storage
Background
Operational visibility is critical
OpenStack is a complex, distributed application
…to run your complex, distributed applications
Monitoring
Monitoring Requirements
Gain visibility into the operation of the
hardware and software
e.g. web site, database, cluster, disk drive
Monitoring Requirements
Fault finding and alerting
Notify me when a server or service is
unavailable, a disk needs replacing, ...
Fault post-mortem, pre-emption
Why did the outage occur and what can we
do to prevent it next time
Monitoring Requirements
Utilisation and efficiency analysis
Is all the hardware we own being used?
Is it being used efficiently?
Performance monitoring and profiling
How long are my web/database requests?
Monitoring Requirements
Auditing (security, billing)
Tracking users' use of the system
Auditing access to systems or resources
Decision making, future planning
What is expected growth in data, or users?
What of the current system is most used?
Monitoring
Existing Tools
Existing Tools
Checking and Alerting
Agents check on machines or services
Report centrally, notify users via dashboard
Store history of events in database
Existing Tools
Nagios / Icinga
ping -c 1 $host || mail -s "Help!" $me
Existing Tools
Kibana (+Elasticsearch/Logstash)
Existing Tools
Metrics
Periodically collect metrics, e.g. CPU%
Store in central database for visualization
Some systems allow checking on top
Existing Tools
Ganglia
Collector (gmond) + Aggregator (gmetad)
Existing Tools
https://ganglia.wikimedia.org/
Existing Tools
Grafana - visualization only
Existing Tools
Metrics Databases
● Ganglia (RRDtool)
● Graphite (Whisper)
● OpenTSDB (HBase)
● KairosDB (Cassandra)
● InfluxDB
● Prometheus
● Gnocchi
● Atlas
● Heroic
● Hawkular (Cassandra)
● MetricTank (Cassandra)
● Riak TS (Riak)
● Blueflood (Cassandra)
● DalmatinerDB
● Druid
● BTrDB
● Warp 10 (HBase)
● Tgres (PostgreSQL!)
Existing Tools
Metrics Databases
● Ganglia [Berkeley]
● Graphite [Orbitz]
● OpenTSDB [StumbleUpon]
● KairosDB
● InfluxDB
● Prometheus [SoundCloud]
● Gnocchi [OpenStack]
● Atlas [Netflix]
● Heroic [Spotify]
● Hawkular [Red Hat]
● MetricTank [Raintank]
● Riak TS [Basho]
● Blueflood [Rackspace]
● DalmatinerDB
● Druid
● BTrDB
● Warp 10
● Tgres
Existing Tools
2000
2010
2013 - 2015
Existing Tools
Software
Network
Storage
Servers
Existing Tools
Software
Network
Storage
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Metric API
Alerting
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Metric API
Alerting
MySQL
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
InfluxDB
Metric API
Alerting
MySQL
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
InfluxDB
Metric API
Alerting
Grafana
MySQL
SQLite
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
MySQL
SQLite
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
Kibana
MySQL
SQLite
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
Kibana
MySQL
SQLite
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
Kibana
MySQL
SQLite
Servers
Metrics
Logs
Zookeeper
Existing Tools
Commendable “right tool for the job” attitude, but…
How about Postgres?
Fewer points of failure
Fewer places to backup
Fewer redundancy protocols
One set of consistent data semantics
Re-use existing operational knowledge
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
Kibana
MySQL
SQLite
Servers
Metrics
Logs
Zookeeper
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
Kibana
SQLite
Servers
Metrics
Logs
Zookeeper
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash Elastic
InfluxDB
Metric API
Alerting
Grafana
Kibana
Servers
Metrics
Logs
Zookeeper
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash Elastic
Metric API
Alerting
Grafana
Kibana
Servers
Metrics
Logs
Zookeeper
Monasca
Existing Tools
Software
Network
Storage Log API
Kafka
Logstash
Metric API
Alerting
Grafana
Grafana?
Servers
Metrics
Logs
Zookeeper
Monasca
Existing Tools
Software
Network
Storage Log API
Logstash
Metric API
Alerting
Grafana
Grafana?
Servers
Metrics
Logs
Zookeeper
Monasca
Existing Tools
Software
Network
Storage Log API
Logstash
Metric API
Alerting
Grafana
Grafana?
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Metric API
Alerting
Grafana
Grafana?
Servers
Metrics
Logs
Monasca
Existing Tools
Software
Network
Storage Log API
Metric API
Alerting
Grafana
Servers
Metrics
Logs
Postgres for Metrics
Requirements
Postgres for Metrics
Requirements
● ~45M values/day
(80x196 per 30s)
● 6 month history
● <1TB disk footprint
● <100ms queries
Postgres for Metrics
Combine Series
average over all
for {series=cpu}
[time range/interval]
Read Series
for each {type}
for {series=cpu}
[time range/interval]
Postgres for Metrics
List Metric Names
List Dimension Names
List Dimension Values
"metrics": [
"cpu.percent",
"cpu.user_perc",
"net.out_bytes_sec",
"net.out_errors_sec",
"net.in_bytes_sec",
"net.in_errors_sec"
…
]
"dimensions": [
"device",
"hostname",
"instance",
"mount_point",
"process_name",
"process_user"
…
]
"hostname": [
"dev-01",
"dev-02",
"staging-01",
"staging-02",
"prod-01",
"prod-02"
…
]
Postgres for Metrics
Data & Queries
Postgres for Metrics
"metric": {
"timestamp": 1232141412,
"name": "cpu.percent",
"value": 42,
"dimensions": { "hostname": "dev-01" },
"value_meta": { … }
}
JSON Ingest Format
Known, well defined structure
Varying set of dimensions key/values
Postgres for Metrics
CREATE TABLE measurements (
timestamp TIMESTAMPTZ,
name VARCHAR,
value FLOAT8,
dimensions JSONB,
value_meta JSON
);
Basic Denormalised Schema
Straightforward mapping onto input data
Data model for all schemas
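For illustration, the JSON ingest message above maps onto a single row like this (a sketch; values taken from the example, with the timestamp assumed to be epoch seconds):

INSERT INTO measurements (timestamp, name, value, dimensions, value_meta)
VALUES (
    TO_TIMESTAMP(1232141412),
    'cpu.percent',
    42,
    '{"hostname": "dev-01"}'::JSONB,
    NULL
);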
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 60) AS timestamp,
AVG(value) AS avg
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z01:00:00'
AND name = 'cpu.percent'
AND dimensions @> '{"hostname": "dev-01"}'::JSONB
GROUP BY
timestamp
Single Series Query
One hour window | Single hostname
Measurements every 60 second interval
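Note: TIME_ROUND is not a PostgreSQL built-in; the slides assume a small helper that rounds a timestamp down to an interval boundary (in seconds). A minimal sketch of such a function, inferred from its usage here:

CREATE FUNCTION TIME_ROUND (ts TIMESTAMPTZ, interval_secs INT)
RETURNS TIMESTAMPTZ LANGUAGE SQL IMMUTABLE AS $$
    -- Round down to the nearest interval_secs boundary
    SELECT TO_TIMESTAMP(
        FLOOR(EXTRACT(EPOCH FROM ts) / interval_secs) * interval_secs);
$$;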
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 60) AS timestamp,
AVG(value) AS avg,
dimensions ->> 'hostname' AS hostname
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z01:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp, hostname
Group Multi-Series Query
One hour window | Every hostname
Measurements every 60 second interval
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 60) AS timestamp,
AVG(value) AS avg
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z01:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp
All Multi-Series Query
One hour window | Every hostname
Measurements every 60 second interval
Postgres for Metrics
SELECT DISTINCT
name
FROM
measurements
Metric Name List Query
:)
Postgres for Metrics
SELECT DISTINCT
JSONB_OBJECT_KEYS(dimensions)
AS d_name
FROM
measurements
WHERE
name = 'cpu.percent'
Dimension Name List Query
(for specific metric)
Postgres for Metrics
SELECT DISTINCT
dimensions ->> 'hostname'
AS d_value
FROM
measurements
WHERE
name = 'cpu.percent'
AND dimensions ? 'hostname'
Dimension Value List Query
(for specific metric and dimension)
Postgres for Metrics
Optimisation
Postgres for Metrics
CREATE TABLE measurements (
timestamp TIMESTAMPTZ,
name VARCHAR,
value FLOAT8,
dimensions JSONB,
value_meta JSON
);
CREATE INDEX ON measurements
(name, timestamp);
CREATE INDEX ON measurements USING GIN
(dimensions);
Indexes
Covers all necessary query terms
Using a single GIN index saves space, but is slower
Postgres for Metrics
● Series Queries
● All, Group, Specific
● Varying Time Window/Interval
5m|15s, 1h|15s, 1h|300s, 6h|300s, 24h|300s
● Listing Queries
● Metric Names, Dimension Names & Values
● All, Partial
Postgres for Metrics
[Bar chart: "Denormalised" Series Queries. Duration (ms) for Single / Group / All series over 5m (15s), 1h (15s), 1h (300s), 6h (300s) and 24h (300s) windows; scale 0-12,000 ms]
Postgres for Metrics
[Bar chart: "Denormalised" Series Queries, rescaled; duration (ms), scale 0-2,500 ms]
Postgres for Metrics
[Bar chart: "Denormalised" Listing Queries. Duration (ms) for Metric Names / Dimension Names / Dimension Values, All vs Partial; scale 0-60,000 ms]
Postgres for Metrics
[Bar chart: "Denormalised" Listing Queries, rescaled; duration (ms), scale 0-8,000 ms]
Postgres for Metrics
CREATE TABLE measurement_values (
timestamp TIMESTAMPTZ,
metric_id INT,
value FLOAT8,
value_meta JSON
);
CREATE TABLE metrics (
id SERIAL,
name VARCHAR,
dimensions JSONB
);
Normalised Schema
Reduces duplication of data
Pre-built set of distinct metric definitions
Postgres for Metrics
CREATE FUNCTION get_metric_id (in_name VARCHAR, in_dims JSONB)
RETURNS INT LANGUAGE plpgsql AS $_$
DECLARE
out_id INT;
BEGIN
SELECT id INTO out_id FROM metrics AS m
WHERE m.name = in_name AND m.dimensions = in_dims;
IF NOT FOUND THEN
INSERT INTO metrics ("name", "dimensions") VALUES
(in_name, in_dims) RETURNING id INTO out_id;
END IF;
RETURN out_id;
END; $_$;
Normalised Schema
Function to use at insert time
Finds existing metric_id or allocates new
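At ingest time, an insert might then look like this (a sketch; the timestamp and values are illustrative):

INSERT INTO measurement_values (timestamp, metric_id, value, value_meta)
VALUES (
    '2015-01-01T00:00:00Z',
    get_metric_id('cpu.percent', '{"hostname": "dev-01"}'::JSONB),
    42,
    NULL
);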
Postgres for Metrics
CREATE VIEW measurements AS
SELECT *
FROM measurement_values
INNER JOIN
metrics ON (metric_id = id);
CREATE INDEX metrics_idx ON
metrics (name, dimensions);
CREATE INDEX measurements_idx ON
measurement_values (metric_id, timestamp);
Normalised Schema
Same queries, use view to join
Extra index to help normalisation step
Postgres for Metrics
[Bar chart: "Normalised" Series Queries. Duration (ms) for Single / Group / All series over 5m (15s), 1h (15s), 1h (300s), 6h (300s) and 24h (300s) windows; scale 0-2,500 ms]
Postgres for Metrics
[Bar chart: "Normalised" Series Queries, rescaled; duration (ms), scale 0-1,000 ms]
Postgres for Metrics
[Bar chart: "Normalised" Listing Queries. Duration (ms) for Metric Names / Dimension Names / Dimension Values, All vs Partial; scale 0-1,000 ms]
Postgres for Metrics
● As time window grows
less detail is necessary, e.g.
● 30s interval at 1 hour
● 300s interval at 6 hour
Postgres for Metrics
Timestamp Metric Value
10:00:00 1 10
10:00:00 2 2
10:00:30 1 10
10:00:30 2 4
10:01:30 1 20
10:01:30 2 4
10:02:00 1 15
10:02:00 2 2
10:02:30 1 5
10:02:30 2 2
10:03:00 1 10
10:03:00 2 6
Timestamp Metric Value
10:00:00 1 40
10:00:00 2 10
10:02:00 1 30
10:02:00 2 8
Postgres for Metrics
CREATE TABLE summary_values_5m (
timestamp TIMESTAMPTZ,
metric_id INT,
value_sum FLOAT8,
value_count FLOAT8,
value_min FLOAT8,
value_max FLOAT8,
UNIQUE (metric_id, timestamp)
);
Summarised Schema
Pre-compute every 5m (300s) interval
Functions to be applied must be known
Postgres for Metrics
CREATE FUNCTION update_summarise () RETURNS TRIGGER
LANGUAGE plpgsql AS $_$
BEGIN
INSERT INTO summary_values_5m VALUES (
TIME_ROUND(NEW.timestamp, 300), NEW.metric_id,
NEW.value, 1, NEW.value, NEW.value)
ON CONFLICT (metric_id, timestamp)
DO UPDATE SET
value_sum = value_sum + EXCLUDED.value_sum,
value_count = value_count + EXCLUDED.value_count,
value_min = LEAST (value_min, EXCLUDED.value_min),
value_max = GREATEST(value_max, EXCLUDED.value_max);
RETURN NULL;
END; $_$;
Summarised Schema
Entry for each metric/rounded time period
Update existing entries by aggregating
Postgres for Metrics
CREATE TRIGGER update_summarise_trigger
AFTER INSERT ON measurement_values
FOR EACH ROW
EXECUTE PROCEDURE update_summarise ();
CREATE VIEW summary_5m AS
SELECT *
FROM
summary_values_5m INNER JOIN metrics
ON (metric_id = id);
Summarised Schema
Trigger applies row to summary table
View mainly for convenience when querying
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 300) AS timestamp,
AVG(value) AS avg
FROM
measurements
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z06:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp
Combined Series Query
Six hour window | Every hostname
Measurements every 300 second interval
Postgres for Metrics
SELECT
TIME_ROUND(timestamp, 300) AS timestamp,
SUM(value_sum) / SUM(value_count) AS avg
FROM
summary_5m
WHERE
timestamp BETWEEN '2015-01-01Z00:00:00'
AND '2015-01-01Z06:00:00'
AND name = 'cpu.percent'
GROUP BY
timestamp
Combined Series Query
Use pre-aggregated summary table
Mostly the same; extra fiddling for AVG
Postgres for Metrics
[Bar chart: "Summarised" Series Queries. Duration (ms) for Single / Group / All series over 5m (15s), 1h (15s), 1h (300s), 6h (300s) and 24h (300s) windows; scale 0-1,000 ms]
Postgres for Metrics
[Bar chart: "Summarised" Listing Queries. Duration (ms) for Metric Names / Dimension Names / Dimension Values, All vs Partial; scale 0-1,000 ms]
Postgres for Metrics
[Bar chart: Ingest Time (1 day / 45M rows). Seconds for Summarised / Normalised / Denormalised schemas; scale 0-90,000 s]
Postgres for Metrics
[Bar chart: Ingest Time (1 day / 45M rows), rescaled; scale 0-4,000 s]
Postgres for Metrics
[Bar chart: Disk Usage (1 day / 45M rows). MB for Summarised / Normalised / Denormalised schemas; scale 0-10,000 MB]
Postgres for Metrics
● Need coarser summaries for wider
queries (e.g. 30m summaries)
● Need to partition data by day to:
● Retain ingest rate due to indexes
● Optimise dropping old data
● Much better ways to produce summaries
to optimise ingest, specifically:
● Process rows in batches of interval size
● Process asynchronous to ingest transaction
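The daily partitioning mentioned above could look like this (a sketch using declarative partitioning, available from PostgreSQL 10; the original setup would likely have used table inheritance):

CREATE TABLE measurement_values (
    timestamp  TIMESTAMPTZ,
    metric_id  INT,
    value      FLOAT8,
    value_meta JSON
) PARTITION BY RANGE (timestamp);

-- One partition per day; indexes are created per partition
CREATE TABLE measurement_values_20150101
    PARTITION OF measurement_values
    FOR VALUES FROM ('2015-01-01') TO ('2015-01-02');

-- Dropping old data becomes a cheap metadata-only operation
DROP TABLE measurement_values_20150101;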
Postgres for…
Postgres for…
Log Searching
Postgres for Log Searching
Requirements
● Central log storage
● Trivially searchable
● Time bounded
● Filter ‘dimensions’
● Interactive query
times (<100ms)
Postgres for Log Searching
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy"
"hostname": "dev-controller-0"
},
}
Log Ingest Format
Typically sourced from rsyslog
Varying set of dimensions key/values
Postgres for Log Searching
CREATE TABLE logs (
timestamp TIMESTAMPTZ,
message VARCHAR,
dimensions JSONB
);
Basic Schema
Straightforward mapping of source data
Allow for maximum dimension flexibility
Postgres for Log Searching
connection AND program:haproxy
Query Example
Kibana/Elastic style using PG-FTS
SELECT *
FROM logs
WHERE
TO_TSVECTOR('english', message)
@@ TO_TSQUERY('connection')
AND dimensions @> '{"program":"haproxy"}';
Postgres for Log Searching
CREATE INDEX ON logs
USING GIN
(TO_TSVECTOR('english', message));
CREATE INDEX ON logs
USING GIN
(dimensions);
Indexes
Enables fast text search on ‘message’
& Fast filtering based on ‘dimensions’
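Putting the requirements together, a time-bounded search combining full-text matching and dimension filtering might look like this (a sketch; the time range is illustrative):

SELECT timestamp, message
FROM logs
WHERE
    timestamp BETWEEN '2017-01-03T00:00:00Z'
                  AND '2017-01-03T23:59:59Z'
    AND TO_TSVECTOR('english', message) @@ TO_TSQUERY('connection')
    AND dimensions @> '{"program": "haproxy"}'
ORDER BY timestamp DESC;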
Postgres for …
Log Parsing
Postgres for Log Parsing
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy",
"hostname": "dev-controller-0"
},
}
Postgres for Log Parsing
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy",
"hostname": "dev-controller-0",
"tags": [ "connect" ]
},
}
Postgres for Log Parsing
"log": {
"timestamp": 1232141412,
"message":
"Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)",
"dimensions": {
"severity": 6,
"facility": 16,
"pid": "39762",
"program": "haproxy",
"hostname": "dev-controller-0",
"tags": [ "connect" ],
"src_ip": "172.16.8.1",
"src_port": "52690",
"dest_ip": "172.16.8.10",
"dest_port": "5000",
"service_name": "keystone",
"protocol": "HTTP"
},
}
Postgres for Log Parsing
….regex!
# SELECT REGEXP_MATCHES(
'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
'Connect from '
|| '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)'
|| ' \((\w+)/(\w+)\)'
);
regexp_matches
---------------------------------------------------
{172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}
(1 row)
Postgres for Log Parsing
Garnish with JSONB
# SELECT JSONB_PRETTY(JSONB_OBJECT(
'{src_ip,src_port,dest_ip,dest_port,service, protocol}',
'{172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}'
));
jsonb_pretty
-------------------------------
{ +
"src_ip": "172.16.8.1", +
"dest_ip": "172.16.8.10",+
"service": "keystone", +
"protocol": "HTTP", +
"src_port": "52690", +
"dest_port": "5000" +
}
(1 row)
Postgres for Log Parsing
CREATE TABLE logs (
timestamp TIMESTAMPTZ,
message VARCHAR,
dimensions JSONB
);
Log Schema – Goals:
Parse message against set of patterns
Add extracted information as dimensions
Postgres for Log Parsing
Patterns Table
Store pattern to match and field names
CREATE TABLE patterns (
regex VARCHAR,
field_names VARCHAR[]
);
INSERT INTO patterns (regex, field_names) VALUES (
'Connect from '
|| '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)'
|| ' \((\w+)/(\w+)\)',
'{src_ip,src_port,dest_ip,dest_port,service,protocol}'
);
Postgres for Log Parsing
Log Processing
Apply all configured patterns to new rows
CREATE FUNCTION process_log () RETURNS TRIGGER
LANGUAGE PLPGSQL AS $_$
DECLARE
m JSONB; p RECORD;
BEGIN
FOR p IN SELECT * FROM patterns LOOP
m := JSONB_OBJECT(p.field_names,
REGEXP_MATCHES(NEW.message, p.regex));
IF m IS NOT NULL THEN
NEW.dimensions := NEW.dimensions || m;
END IF;
END LOOP;
RETURN NEW;
END; $_$;
Postgres for Log Parsing
CREATE TRIGGER process_log_trigger
BEFORE INSERT ON logs
FOR EACH ROW
EXECUTE PROCEDURE process_log ();
Log Processing Trigger
Apply patterns as messages are inserted into
the logs table, extending their dimensions
Postgres for Log Parsing
# INSERT INTO logs (timestamp, message, dimensions) VALUES (
'2017-01-03T06:29:09.043Z',
'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
'{"hostname": "dev-controller-0", "program": "haproxy"}');
# SELECT timestamp, message, JSONB_PRETTY(dimensions) FROM logs;
-[ RECORD 1 ]+------------------------------------------------------------------
timestamp | 2017-01-03 06:29:09.043+00
message | Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)
jsonb_pretty | { +
| "src_ip": "172.16.8.1", +
| "dest_ip": "172.16.8.10", +
| "program": "haproxy", +
| "service": "keystone", +
| "hostname": "dev-controller-0", +
| "protocol": "HTTP", +
| "src_port": "52690", +
| "dest_port": "5000" +
| }
Postgres for …
Queueing
Requirements
● Offload data burden
from producers
● Persist as soon as
possible to avoid loss
● Handle high velocity
burst loads
● Data does not need
to be queryable
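The slides focus on the ingest side, but for completeness, one common Postgres pattern for draining such a queue table is DELETE … RETURNING with SKIP LOCKED (a sketch; the table and batch size are assumptions, not from the talk):

CREATE TABLE ingest_queue (
    id       BIGSERIAL PRIMARY KEY,
    received TIMESTAMPTZ DEFAULT NOW(),
    payload  JSONB
);

-- Each consumer takes a batch without blocking other consumers
DELETE FROM ingest_queue
WHERE id IN (
    SELECT id FROM ingest_queue
    ORDER BY id
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
)
RETURNING payload;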
Postgres for Queueing
[Bar chart: Ingest Rate (1d / 45M rows). K-row/sec for WITH BINARY / VARCHAR / JSON / JSONB across Denormalised / Normalised / Summarised schemas; scale 0-400 K-row/sec]
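"WITH BINARY" here appears to refer to COPY using binary rather than text format. A minimal sketch of bulk ingest with COPY (paths and options are illustrative; client-side \copy works similarly):

-- Binary format avoids text parsing and serialisation overhead
COPY measurements (timestamp, name, value, dimensions, value_meta)
    FROM '/tmp/measurements.bin' WITH (FORMAT binary);

-- Equivalent CSV form for comparison
COPY measurements (timestamp, name, value, dimensions, value_meta)
    FROM '/tmp/measurements.csv' WITH (FORMAT csv);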
Conclusion.. ?
Conclusion… ?
● I view Postgres as a very flexible
“data persistence toolbox”
● ...which happens to use SQL
● Batteries not always included
● That doesn’t mean it’s hard
● Operational advantages of using
general purpose tools can be huge
● Use & deploy what you know & trust
More Related Content

What's hot

Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Gruter
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
Corley S.r.l.
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
DataWorks Summit
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Cloudera, Inc.
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
Great Wide Open
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Gruter
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
Miklos Christine
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
Douglas Bernardini
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
DataWorks Summit
 
Efficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoEfficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoHyunsik Choi
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
N Masahiro
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Databricks
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
DataWorks Summit/Hadoop Summit
 
Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14
Renato Javier Marroquín Mogrovejo
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
Taro L. Saito
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
Prasanna Rajaperumal
 

What's hot (20)

Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on HadoopBig Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
 
Efficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajoEfficient in situ processing of various storage types on apache tajo
Efficient in situ processing of various storage types on apache tajo
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14Giraph+Gora in ApacheCon14
Giraph+Gora in ApacheCon14
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 

Viewers also liked

PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL-Consulting
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
PostgreSQL-Consulting
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
PostgreSQL-Consulting
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
Federico Campoli
 
PostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and ProfitPostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and Profit
David Fetter
 
Backups
BackupsBackups
Backups
Payal Singh
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
Jimmy Angelakos
 
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure MonitoringDissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dietmar Offenhuber
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Jimmy Angelakos
 
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT  Infrastructure & ServicesIT Executive Survey: Strategies for Monitoring IT  Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
CA Technologies
 
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Ontico
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
PostgreSQL-Consulting
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Grier Johnson
 
Microsoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManagerMicrosoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManager
ManageEngine
 
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Badoo Development
 
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре..."Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
Badoo Development
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL-Consulting
 
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Badoo Development
 
Open Source Monitoring Tools Shootout
Open Source Monitoring Tools ShootoutOpen Source Monitoring Tools Shootout
Open Source Monitoring Tools Shootout
tomdc
 

Viewers also liked (20)

PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya KosmodemianskyPostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
PostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and ProfitPostgreSQL Hooks for Fun and Profit
PostgreSQL Hooks for Fun and Profit
 
Backups
BackupsBackups
Backups
 
Using PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic DataUsing PostgreSQL with Bibliographic Data
Using PostgreSQL with Bibliographic Data
 
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure MonitoringDissertation Defense Dec. 11, Participatory Infrastructure Monitoring
Dissertation Defense Dec. 11, Participatory Infrastructure Monitoring
 
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph DatabaseBringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database
 
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT  Infrastructure & ServicesIT Executive Survey: Strategies for Monitoring IT  Infrastructure & Services
IT Executive Survey: Strategies for Monitoring IT Infrastructure & Services
 
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
Как PostgreSQL работает с диском, Илья Космодемьянский (PostgreSQL-Consulting)
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
 
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
Metrics and Monitoring Infrastructure: Lessons Learned Building Metrics at Li...
 
Microsoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManagerMicrosoft Infrastructure Monitoring using OpManager
Microsoft Infrastructure Monitoring using OpManager
 
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
Облако в Badoo год спустя - работа над ошибками, Юрий Насретдинов (Badoo)
 
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре..."Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
"Выбраться из спама - как повысить CTR рассылки без потери активности". Андре...
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
Приём платежей в Badoo - взгляд изнутри, Анатолий Панов (Badoo)
 
Open Source Monitoring Tools Shootout
Open Source Monitoring Tools ShootoutOpen Source Monitoring Tools Shootout
Open Source Monitoring Tools Shootout
 

Similar to Infrastructure Monitoring with Postgres

How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17
Tom Arnfeld
 
Riga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWSRiga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWS
Antons Kranga
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
ScyllaDB
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
HostedbyConfluent
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
SATOSHI TAGOMORI
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudMongoDB
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
Nathaniel Braun
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
All Things Open
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
DataWorks Summit/Hadoop Summit
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
Eva Tse
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World Over
Alex Pinkin
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
Amazon Web Services
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
HBaseCon
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
Xiang Fu
 
Fom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-finalFom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-final
Luis Filipe Silva
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
Altinity Ltd
 
MySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and GraphiteMySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and Graphite
DB-Art
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Data
plumbee
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
Puppet
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
ScyllaDB
 

Similar to Infrastructure Monitoring with Postgres (20)

How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17How Cloudflare analyzes -1m dns queries per second @ Percona E17
How Cloudflare analyzes -1m dns queries per second @ Percona E17
 
Riga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWSRiga dev day: Lambda architecture at AWS
Riga dev day: Lambda architecture at AWS
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Running Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data PlatformRunning Presto and Spark on the Netflix Big Data Platform
Running Presto and Spark on the Netflix Big Data Platform
 
Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World Over
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Fom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-finalFom io t_to_bigdata_step_by_step-final
Fom io t_to_bigdata_step_by_step-final
 
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
OSA Con 2022 - Switching Jaeger Distributed Tracing to ClickHouse to Enable A...
 
MySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and GraphiteMySQL performance monitoring using Statsd and Graphite
MySQL performance monitoring using Statsd and Graphite
 
Transforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big DataTransforming Mobile Push Notifications with Big Data
Transforming Mobile Push Notifications with Big Data
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 

Recently uploaded

Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Infrastructure Monitoring with Postgres

  • 1. Infrastructure Monitoring (with Postgres, obviously) Steve Simpson StackHPC steve@stackhpc.com www.stackhpc.com
  • 2. Overview 1) Background 2) Monitoring Postgres for Metrics 3) Requirements 4) Data & Queries 5) Optimisation Postgres for ... 6) Log Searching 7) Log Parsing 8) Queueing
  • 5. Background Based in Bristol, UK Thriving Tech Industry
  • 6. Background ● Gnodal ● 10GbE Ethernet ● ASIC Verification ● Embedded Firmware ● JustOne Database ● Agile “Big Data” RDBMS ● Based on PostgreSQL ● Storage Team Lead
  • 7. Background Consultancy for HPC on OpenStack Multi-tenant massively parallel workloads Monitoring complex infrastructure Stack HPC
  • 8. Background Cloud orchestration platform IaaS through API and dashboard Multi-tenancy throughout Network, Compute, Storage
  • 9. Background Operational visibility is critical OpenStack is a complex, distributed application …to run your complex, distributed applications
  • 11. Monitoring Requirements Gain visibility into the operation of the hardware and software e.g. web site, database, cluster, disk drive
  • 12. Monitoring Requirements Fault finding and alerting Notify me when a server or service is unavailable, a disk needs replacing, ... Fault post-mortem, pre-emption Why did the outage occur and what can we do to prevent it next time
  • 13. Monitoring Requirements Utilisation and efficiency analysis Is all the hardware we own being used? Is it being used efficiently? Performance monitoring and profiling How long are my web/database requests?
  • 14. Monitoring Requirements Auditing (security, billing) Tracking users' use of the system Auditing access to systems or resources Decision making, future planning What is expected growth in data, or users? What of the current system is most used?
  • 16. Existing Tools Checking and Alerting Agents check on machines or services Report centrally, notify users via dashboard Store history of events in database
  • 17. Existing Tools Nagios / Icinga ping -c 1 $host || mail -s “Help!” $me
  • 19. Existing Tools Metrics Periodically collect metrics, e.g. CPU% Store in central database for visualization Some systems allow checking on top
  • 22. Existing Tools Grafana - visualization only
  • 23. Existing Tools Metrics Databases ● Ganglia (RRDtool) ● Graphite (Whisper) ● OpenTSDB (HBase) ● KairosDB (Cassandra) ● InfluxDB ● Prometheus ● Gnocchi ● Atlas ● Heroic ● Hawkular (Cassandra) ● MetricTank (Cassandra) ● Riak TS (Riak) ● Blueflood (Cassandra) ● DalmatinerDB ● Druid ● BTrDB ● Warp 10 (HBase) ● Tgres (PostgreSQL!)
  • 24. Existing Tools Metrics Databases ● Ganglia [Berkeley] ● Graphite [Orbitz] ● OpenTSDB [StumbleUpon] ● KairosDB ● InfluxDB ● Prometheus [SoundCloud] ● Gnocchi [OpenStack] ● Atlas [Netflix] ● Heroic [Spotify] ● Hawkular [Red Hat] ● MetricTank [Raintank] ● Riak TS [Basho] ● Blueflood [Rackspace] ● DalmatinerDB ● Druid ● BTrDB ● Warp 10 ● Tgres
  • 31. Monasca Existing Tools Software Network Storage Log API Metric API Alerting Servers Metrics Logs
  • 32. Monasca Existing Tools Software Network Storage Log API Metric API Alerting MySQL Servers Metrics Logs
  • 33. Monasca Existing Tools Software Network Storage Log API InfluxDB Metric API Alerting MySQL Servers Metrics Logs
  • 34. Monasca Existing Tools Software Network Storage Log API InfluxDB Metric API Alerting Grafana MySQL SQLite Servers Metrics Logs
  • 35. Monasca Existing Tools Software Network Storage Log API Logstash Elastic InfluxDB Metric API Alerting Grafana MySQL SQLite Servers Metrics Logs
  • 36. Monasca Existing Tools Software Network Storage Log API Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs
  • 37. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs
  • 38. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs Zookeeper
  • 39. Existing Tools Commendable “right tool for the job” attitude, but… How about Postgres? Fewer points of failure Fewer places to backup Fewer redundancy protocols One set of consistent data semantics Re-use existing operational knowledge
  • 40. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana MySQL SQLite Servers Metrics Logs Zookeeper
  • 41. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana SQLite Servers Metrics Logs Zookeeper
  • 42. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic InfluxDB Metric API Alerting Grafana Kibana Servers Metrics Logs Zookeeper
  • 43. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Elastic Metric API Alerting Grafana Kibana Servers Metrics Logs Zookeeper
  • 44. Monasca Existing Tools Software Network Storage Log API Kafka Logstash Metric API Alerting Grafana Grafana? Servers Metrics Logs Zookeeper
  • 45. Monasca Existing Tools Software Network Storage Log API Logstash Metric API Alerting Grafana Grafana? Servers Metrics Logs Zookeeper
  • 46. Monasca Existing Tools Software Network Storage Log API Logstash Metric API Alerting Grafana Grafana? Servers Metrics Logs
  • 47. Monasca Existing Tools Software Network Storage Log API Metric API Alerting Grafana Grafana? Servers Metrics Logs
  • 48. Monasca Existing Tools Software Network Storage Log API Metric API Alerting Grafana Servers Metrics Logs
  • 50. Postgres for Metrics Requirements ● ~45M values/day (80x196 per 30s) ● 6 month history ● <1TB disk footprint ● <100ms queries
  • 51. Postgres for Metrics Combine Series average over all for {series=cpu} [time range/interval] Read Series for each {type} for {series=cpu} [time range/interval]
  • 52. Postgres for Metrics List Dimension Values List Dimension Names List Metric Names "metrics": [ "cpu.percent", "cpu.user_perc", "net.out_bytes_sec", "net.out_errors_sec", "net.in_bytes_sec", "net.in_errors_sec" … ] "dimensions": [ "device", "hostname", "instance", "mount_point", "process_name", "process_user" … ] "hostname": [ "dev-01", "dev-02", "staging-01", "staging-02", "prod-01", "prod-02" … ]
  • 54. Postgres for Metrics "metric": { "timestamp": 1232141412, "name": "cpu.percent", "value": 42, "dimensions": { "hostname": "dev-01" }, "value_meta": { … } } JSON Ingest Format Known, well defined structure Varying set of dimensions key/values
  • 55. Postgres for Metrics CREATE TABLE measurements ( timestamp TIMESTAMPTZ, name VARCHAR, value FLOAT8, dimensions JSONB, value_meta JSON ); Basic Denormalised Schema Straightforward mapping onto input data Data model for all schemas
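Not shown on the slides: a minimal sketch of how one metric from the JSON ingest format above might be inserted into this denormalised table, assuming the epoch timestamp is converted with TO_TIMESTAMP. Illustrative only, not the talk's actual loader.

INSERT INTO measurements (timestamp, name, value, dimensions, value_meta)
VALUES (
    TO_TIMESTAMP(1232141412),          -- epoch seconds from the JSON payload
    'cpu.percent',
    42,
    '{"hostname": "dev-01"}'::JSONB,   -- dimensions stay schemaless
    NULL                               -- value_meta, if present
);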
  • 56. Postgres for Metrics SELECT TIME_ROUND(timestamp, 60) AS timestamp, AVG(value) AS avg FROM measurements WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00' AND name = 'cpu.percent' AND dimensions @> '{"hostname": "dev-01"}'::JSONB GROUP BY timestamp Single Series Query One hour window | Single hostname Measurements every 60 second interval
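TIME_ROUND is not a built-in Postgres function; the queries here presumably rely on a small helper that rounds a timestamp down to an N-second bucket. A minimal sketch of one possible definition (an assumption, not taken from the talk):

CREATE FUNCTION TIME_ROUND (ts TIMESTAMPTZ, width INT)
RETURNS TIMESTAMPTZ LANGUAGE SQL IMMUTABLE AS $_$
    -- round down to the nearest 'width'-second interval
    SELECT TO_TIMESTAMP(FLOOR(EXTRACT(EPOCH FROM ts) / width) * width);
$_$;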
  • 57. Postgres for Metrics SELECT TIME_ROUND(timestamp, 60) AS timestamp, AVG(value) AS avg, dimensions ->> 'hostname' AS hostname FROM measurements WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00' AND name = 'cpu.percent' GROUP BY timestamp, hostname Group Multi-Series Query One hour window | Every hostname Measurements every 60 second interval
  • 58. Postgres for Metrics SELECT TIME_ROUND(timestamp, 60) AS timestamp, AVG(value) AS avg FROM measurements WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00' AND name = 'cpu.percent' GROUP BY timestamp All Multi-Series Query One hour window | Every hostname Measurements every 60 second interval
  • 59. Postgres for Metrics SELECT DISTINCT name FROM measurements Metric Name List Query :)
  • 60. Postgres for Metrics SELECT DISTINCT JSONB_OBJECT_KEYS(dimensions) AS d_name FROM measurements WHERE name = 'cpu.percent' Dimension Name List Query (for specific metric)
  • 61. Postgres for Metrics SELECT DISTINCT dimensions ->> 'hostname' AS d_value FROM measurements WHERE name = 'cpu.percent' AND dimensions ? 'hostname' Dimension Value List Query (for specific metric and dimension)
  • 63. Postgres for Metrics CREATE TABLE measurements ( timestamp TIMESTAMPTZ, name VARCHAR, value FLOAT8, dimensions JSONB, value_meta JSON ); CREATE INDEX ON measurements (name, timestamp); CREATE INDEX ON measurements USING GIN (dimensions); Indexes Covers all necessary query terms Using single GIN saves space, but slower
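As a quick sanity check that these indexes are actually picked up, the plan for the earlier single-series query can be inspected with standard EXPLAIN (not from the slides):

EXPLAIN (ANALYZE, BUFFERS)
SELECT TIME_ROUND(timestamp, 60) AS timestamp, AVG(value) AS avg
FROM measurements
WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00'
  AND name = 'cpu.percent'
  AND dimensions @> '{"hostname": "dev-01"}'::JSONB
GROUP BY 1;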
  • 64. Postgres for Metrics ● Series Queries ● All, Group, Specific ● Varying Time Window/Interval 5m|15s, 1h|15s, 1h|300s, 6h|300s, 24h|300s ● Listing Queries ● Metric Names, Dimension Names & Values ● All, Partial
  • 65. Postgres for Metrics [chart] "Denormalised" Series Queries: duration (ms, axis 0 to 12,000) for Single / Group / All queries at 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s)
  • 66. Postgres for Metrics [chart] "Denormalised" Series Queries, zoomed (axis 0 to 2,500 ms): same queries and windows as above
  • 67. Postgres for Metrics [chart] "Denormalised" Listing Queries: duration (ms, axis 0 to 60,000) for Dimension Values / Dimension Names / Metric Names, All vs Partial
  • 68. Postgres for Metrics [chart] "Denormalised" Listing Queries, zoomed (axis 0 to 8,000 ms): same queries as above
  • 69. Postgres for Metrics CREATE TABLE measurement_values ( timestamp TIMESTAMPTZ, metric_id INT, value FLOAT8, value_meta JSON ); CREATE TABLE metrics ( id SERIAL, name VARCHAR, dimensions JSONB ); Normalised Schema Reduces duplication of data Pre-built set of distinct metric definitions
  • 70. Postgres for Metrics CREATE FUNCTION get_metric_id (in_name VARCHAR, in_dims JSONB) RETURNS INT LANGUAGE plpgsql AS $_$ DECLARE out_id INT; BEGIN SELECT id INTO out_id FROM metrics AS m WHERE m.name = in_name AND m.dimensions = in_dims; IF NOT FOUND THEN INSERT INTO metrics ("name", "dimensions") VALUES (in_name, in_dims) RETURNING id INTO out_id; END IF; RETURN out_id; END; $_$; Normalised Schema Function to use at insert time Finds existing metric_id or allocates new
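Under concurrent ingest, two sessions can both miss in the SELECT and then both insert the same metric definition. A possible variant using ON CONFLICT (PostgreSQL 9.5+) is sketched below; it assumes a unique constraint on (name, dimensions), which the slides do not declare, so treat it as an illustration rather than the talk's implementation.

ALTER TABLE metrics ADD UNIQUE (name, dimensions);

CREATE FUNCTION get_metric_id_safe (in_name VARCHAR, in_dims JSONB)
RETURNS INT LANGUAGE plpgsql AS $_$
DECLARE out_id INT;
BEGIN
    -- try to claim the definition; DO NOTHING returns no row if it already exists
    INSERT INTO metrics (name, dimensions) VALUES (in_name, in_dims)
        ON CONFLICT (name, dimensions) DO NOTHING
        RETURNING id INTO out_id;
    IF out_id IS NULL THEN
        SELECT id INTO out_id FROM metrics
            WHERE name = in_name AND dimensions = in_dims;
    END IF;
    RETURN out_id;
END;
$_$;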
  • 71. Postgres for Metrics CREATE VIEW measurements AS SELECT * FROM measurement_values INNER JOIN metrics ON (metric_id = id); CREATE INDEX metrics_idx ON metrics (name, dimensions); CREATE INDEX measurements_idx ON measurement_values (metric_id, timestamp); Normalised Schema Same queries, use view to join Extra index to help normalisation step
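A minimal sketch of ingest against the normalised schema, resolving the metric definition with the get_metric_id() helper from the previous slide (illustrative values, same sample metric as before):

INSERT INTO measurement_values (timestamp, metric_id, value, value_meta)
VALUES (
    TO_TIMESTAMP(1232141412),
    get_metric_id('cpu.percent', '{"hostname": "dev-01"}'::JSONB),
    42,
    NULL
);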
  • 72. Postgres for Metrics [chart] "Normalised" Series Queries: duration (ms, axis 0 to 2,500) for Single / Group / All queries at 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s)
  • 73. Postgres for Metrics [chart] "Normalised" Series Queries, zoomed (axis 0 to 1,000 ms): same queries and windows as above
  • 74. Postgres for Metrics [chart] "Normalised" Listing Queries: duration (ms, axis 0 to 1,000) for Dimension Values / Dimension Names / Metric Names, All vs Partial
  • 75. Postgres for Metrics ● As time window grows less detail is necessary, e.g. ● 30s interval at 1 hour ● 300s interval at 6 hour
  • 76. Postgres for Metrics
      Raw measurements:
      Timestamp  Metric  Value
      10:00:00   1       10
      10:00:00   2       2
      10:00:30   1       10
      10:00:30   2       4
      10:01:30   1       20
      10:01:30   2       4
      10:02:00   1       15
      10:02:00   2       2
      10:02:30   1       5
      10:02:30   2       2
      10:03:00   1       10
      10:03:00   2       6
      Pre-aggregated (summed per interval):
      Timestamp  Metric  Value
      10:00:00   1       40
      10:00:00   2       10
      10:02:00   1       30
      10:02:00   2       8
  • 77. Postgres for Metrics CREATE TABLE summary_values_5m ( timestamp TIMESTAMPTZ, metric_id INT, value_sum FLOAT8, value_count FLOAT8, value_min FLOAT8, value_max FLOAT8, UNIQUE (metric_id, timestamp) ); Summarised Schema Pre-compute every 5m (300s) interval Functions to be applied must be known
  • 78. Postgres for Metrics CREATE FUNCTION update_summarise () RETURNS TRIGGER LANGUAGE plpgsql AS $_$ BEGIN INSERT INTO summary_values_5m VALUES ( TIME_ROUND(NEW.timestamp, 300), NEW.metric_id, NEW.value, 1, NEW.value, NEW.value) ON CONFLICT (metric_id, timestamp) DO UPDATE SET value_sum = value_sum + EXCLUDED.value_sum, value_count = value_count + EXCLUDED.value_count, value_min = LEAST (value_min, EXCLUDED.value_min), value_max = GREATEST(value_max, EXCLUDED.value_max); RETURN NULL; END; $_$; Summarised Schema Entry for each metric/rounded time period Update existing entries by aggregating
  • 79. Postgres for Metrics CREATE TRIGGER update_summarise_trigger AFTER INSERT ON measurement_values FOR EACH ROW EXECUTE PROCEDURE update_summarise (); CREATE VIEW summary_5m AS SELECT * FROM summary_values_5m INNER JOIN metrics ON (metric_id = id); Summarised Schema Trigger applies row to summary table View mainly for convenience when querying
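One thing the trigger alone does not cover is data already sitting in measurement_values before the summary table was created. A hedged backfill sketch (not from the slides; assumes the TIME_ROUND helper from earlier and an empty summary table):

INSERT INTO summary_values_5m
    (timestamp, metric_id, value_sum, value_count, value_min, value_max)
SELECT TIME_ROUND(timestamp, 300), metric_id,
       SUM(value), COUNT(value), MIN(value), MAX(value)
FROM measurement_values
GROUP BY 1, 2;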
  • 80. Postgres for Metrics SELECT TIME_ROUND(timestamp, 300) AS timestamp, AVG(value) AS avg FROM measurements WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z06:00:00' AND name = 'cpu.percent' GROUP BY timestamp Combined Series Query Six hour window | Every hostname Measurements every 300 second interval
  • 81. Postgres for Metrics SELECT TIME_ROUND(timestamp, 300) AS timestamp, SUM(value_sum) / SUM(value_count) AS avg FROM summary_5m WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z06:00:00' AND name = 'cpu.percent' GROUP BY timestamp Combined Series Query Use pre-aggregated summary table Mostly the same; extra fiddling for AVG
  • 82. Postgres for Metrics [chart] "Summarised" Series Queries: duration (ms, axis 0 to 1,000) for Single / Group / All queries at 5m (15s), 1h (15s), 1h (300s), 6h (300s), 24h (300s)
  • 83. Postgres for Metrics [chart] "Summarised" Listing Queries: duration (ms, axis 0 to 1,000) for Dimension Values / Dimension Names / Metric Names, All vs Partial
  • 84. Postgres for Metrics [chart] Ingest Time (1 day / 45M rows), in seconds (axis 0 to 90,000), for the Summarised / Normalised / Denormalised schemas
  • 85. Postgres for Metrics [chart] Ingest Time (1 day / 45M rows), zoomed (axis 0 to 4,000 seconds): same schemas as above
  • 86. Postgres for Metrics [chart] Disk Usage (1 day / 45M rows), in MB (axis 0 to 10,000), for the Summarised / Normalised / Denormalised schemas
  • 87. Postgres for Metrics ● Need coarser summaries for wider queries (e.g. 30m summaries) ● Need to partition data by day to: ● Retain ingest rate due to indexes ● Optimise dropping old data ● Much better ways to produce summaries to optimise ingest, specifically: ● Process rows in batches of interval size ● Process asynchronous to ingest transaction
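The partitioning point above is not expanded on the slides; a minimal sketch of what daily partitioning of the measurement table could look like with declarative partitioning (PostgreSQL 10+, so an assumption relative to the talk, which would more likely have used inheritance-based partitioning; names are illustrative):

CREATE TABLE measurement_values (
    timestamp   TIMESTAMPTZ,
    metric_id   INT,
    value       FLOAT8,
    value_meta  JSON
) PARTITION BY RANGE (timestamp);

CREATE TABLE measurement_values_2015_01_01
    PARTITION OF measurement_values
    FOR VALUES FROM ('2015-01-01') TO ('2015-01-02');

-- expiring a whole day of data becomes a cheap metadata-only operation:
DROP TABLE measurement_values_2015_01_01;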
  • 90. Postgres for Log Searching Requirements ● Central log storage ● Trivially searchable ● Time bounded ● Filter ‘dimensions’ ● Interactive query times (<100ms)
  • 91. Postgres for Log Searching "log": { "timestamp": 1232141412, "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)", "dimensions": { "severity": 6, "facility": 16, "pid": "39762", "program": "haproxy", "hostname": "dev-controller-0" }, } Log Ingest Format Typically sourced from rsyslog Varying set of dimensions key/values
  • 92. Postgres for Log Searching CREATE TABLE logs ( timestamp TIMESTAMPTZ, message VARCHAR, dimensions JSONB ); Basic Schema Straightforward mapping of source data Allow for maximum dimension flexibility
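A minimal sketch of how a log in the ingest format above might land in this table (TO_TIMESTAMP for the epoch value; illustrative, not the talk's actual ingest path):

INSERT INTO logs (timestamp, message, dimensions)
VALUES (
    TO_TIMESTAMP(1232141412),
    'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)',
    '{"severity": 6, "facility": 16, "pid": "39762",
      "program": "haproxy", "hostname": "dev-controller-0"}'::JSONB
);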
  • 93. Postgres for Log Searching connection AND program:haproxy Query Example Kibana/Elastic style using PG-FTS SELECT * FROM logs WHERE TO_TSVECTOR('english', message) @@ TO_TSQUERY('connection') AND dimensions @> '{"program":"haproxy"}';
  • 94. Postgres for Log Searching CREATE INDEX ON logs USING GIN (TO_TSVECTOR('english', message)); CREATE INDEX ON logs USING GIN (dimensions); Indexes Enables fast text search on ‘message’ & Fast filtering based on ‘dimensions’
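The requirements also call for time-bounded searches; a sketch of the same haproxy query with a time window and result limit added (the plain btree index on timestamp is an extra assumption, not shown on the slides):

CREATE INDEX ON logs (timestamp);

SELECT timestamp, message
FROM logs
WHERE timestamp >= '2017-01-03' AND timestamp < '2017-01-04'
  AND TO_TSVECTOR('english', message) @@ TO_TSQUERY('connection')
  AND dimensions @> '{"program": "haproxy"}'::JSONB
ORDER BY timestamp DESC
LIMIT 100;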
  • 96. Postgres for Log Parsing "log": { "timestamp": 1232141412, "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)", "dimensions": { "severity": 6, "facility": 16, "pid": "39762", "program": "haproxy", "hostname": "dev-controller-0" }, }
  • 97. Postgres for Log Parsing "log": { "timestamp": 1232141412, "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)", "dimensions": { "severity": 6, "facility": 16, "pid": "39762", "program": "haproxy", "hostname": "dev-controller-0" }, }
  • 98. Postgres for Log Parsing "log": { "timestamp": 1232141412, "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)", "dimensions": { "severity": 6, "facility": 16, "pid": "39762", "program": "haproxy", "hostname": "dev-controller-0", "tags": [ "connect" ] }, }
  • 99. Postgres for Log Parsing "log": { "timestamp": 1232141412, "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)", "dimensions": { "severity": 6, "facility": 16, "pid": "39762", "program": "haproxy", "hostname": "dev-controller-0", "tags": [ "connect" ] }, }
  • 100. Postgres for Log Parsing "log": { "timestamp": 1232141412, "message": "Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)", "dimensions": { "severity": 6, "facility": 16, "pid": "39762", "program": "haproxy", "hostname": "dev-controller-0", "tags": [ "connect" ], "src_ip": "172.16.8.1", "src_port": "52690", "dest_ip": "172.16.8.10", "dest_port": "5000", "service_name": "keystone", "protocol": "HTTP" }, }
  • 101. Postgres for Log Parsing ….regex! # SELECT REGEXP_MATCHES( 'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)', 'Connect from ' || '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)' || ' \((\w+)/(\w+)\)' ); regexp_matches --------------------------------------------------- {172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP} (1 row)
  • 102. Postgres for Log Parsing Garnish with JSONB # SELECT JSONB_PRETTY(JSONB_OBJECT( '{src_ip,src_port,dest_ip,dest_port,service, protocol}', '{172.16.8.1,52690,172.16.8.10,5000,keystone,HTTP}' )); jsonb_pretty ------------------------------- { + "src_ip": "172.16.8.1", + "dest_ip": "172.16.8.10",+ "service": "keystone", + "protocol": "HTTP", + "src_port": "52690", + "dest_port": "5000" + } (1 row)
  • 103. Postgres for Log Parsing CREATE TABLE logs ( timestamp TIMESTAMPTZ, message VARCHAR, dimensions JSONB ); Log Schema – Goals: Parse message against set of patterns Add extracted information as dimensions
  • 104. Postgres for Log Parsing Patterns Table Store pattern to match and field names CREATE TABLE patterns ( regex VARCHAR, field_names VARCHAR[] ); INSERT INTO patterns (regex, field_names) VALUES ( 'Connect from ' || '(\d+\.\d+\.\d+\.\d+):(\d+) to (\d+\.\d+\.\d+\.\d+):(\d+)' || ' \((\w+)/(\w+)\)', '{src_ip,src_port,dest_ip,dest_port,service,protocol}' );
  • 105. Postgres for Log Parsing Log Processing Apply all configured patterns to new rows CREATE FUNCTION process_log () RETURNS TRIGGER LANGUAGE PLPGSQL AS $_$ DECLARE m JSONB; p RECORD; BEGIN FOR p IN SELECT * FROM patterns LOOP m := JSONB_OBJECT(p.field_names, REGEXP_MATCHES(NEW.message, p.regex)); IF m IS NOT NULL THEN NEW.dimensions := NEW.dimensions || m; END IF; END LOOP; RETURN NEW; END; $_$;
  • 106. Postgres for Log Parsing CREATE TRIGGER process_log_trigger BEFORE INSERT ON logs FOR EACH ROW EXECUTE PROCEDURE process_log (); Log Processing Trigger Apply patterns to messages and extend dimensions as rows are inserted into the logs table
  • 107. Postgres for Log Parsing # INSERT INTO logs (timestamp, message, dimensions) VALUES ( '2017-01-03T06:29:09.043Z', 'Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP)', '{"hostname": "dev-controller-0", "program": "haproxy"}'); # SELECT timestamp, message, JSONB_PRETTY(dimensions) FROM logs; -[ RECORD 1 ]+------------------------------------------------------------------ timestamp | 2017-01-03 06:29:09.043+00 message | Connect from 172.16.8.1:52690 to 172.16.8.10:5000 (keystone/HTTP) jsonb_pretty | { + | "src_ip": "172.16.8.1", + | "dest_ip": "172.16.8.10", + | "program": "haproxy", + | "service": "keystone", + | "hostname": "dev-controller-0", + | "protocol": "HTTP", + | "src_port": "52690", + | "dest_port": "5000" + | }
  • 109. Requirements ● Offload data burden from producers ● Persist as soon as possible to avoid loss ● Handle high velocity burst loads ● Data does not need to be queryable Postgres for Queueing
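The slides do not show the queueing schema itself; a minimal sketch of one common Postgres queueing pattern that fits the requirements above (persist immediately, no need to be queryable): a simple append-only table, with consumers claiming batches using SKIP LOCKED (PostgreSQL 9.5+). Table and column names are illustrative, not from the talk.

CREATE TABLE metric_queue (
    id      BIGSERIAL PRIMARY KEY,
    payload JSONB
);

-- producer: append raw payloads as fast as possible
INSERT INTO metric_queue (payload)
VALUES ('{"name": "cpu.percent", "value": 42}'::JSONB);

-- consumer: atomically claim and delete a batch; concurrent consumers skip
-- rows already locked by someone else instead of blocking
DELETE FROM metric_queue
WHERE id IN (
    SELECT id FROM metric_queue
    ORDER BY id
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
)
RETURNING payload;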
  • 110. Postgres for Queueing [chart] Ingest Rate (1d / 45M rows), in K-row/sec (axis 0 to 400); categories: WITH BINARY, VARCHAR, JSON, JSONB, Denormalised, Normalised, Summarised
  • 116. Conclusion… ? ● I view Postgres as a very flexible “data persistence toolbox” ● ...which happens to use SQL ● Batteries not always included ● That doesn’t mean it’s hard ● Operational advantages of using general purpose tools can be huge ● Use & deploy what you know & trust