
pgday.seoul 2019: TimescaleDB

pgday.seoul 2019
TimescaleDB: Building a scalable time-series database on PostgreSQL

Published in: Technology

  1. TimescaleDB: Building a scalable time-series database on PostgreSQL
     Chanshik Lim, Developer at NexCloud
     chanshik@gmail.com
  2. Agenda
     • Time-series Data?
     • TimescaleDB Overview
     • Using TimescaleDB
     • Q & A
  3. Time-series Data?
  4. Time-series Data? (1)

     timestamp            device_id  cpu_1m_avg  free_mem  temperature  location_id  dev_type
     2017-01-01 01:02:00  abc123     80          500MB     72           335          field
     2017-01-01 01:02:23  def456     90          400MB     64           335          roof
     2017-01-01 01:02:30  ghi789     120         0MB       56           77           roof
     2017-01-01 01:03:12  abc123     80          500MB     72           335          field
     2017-01-01 01:03:35  def456     95          350MB     64           335          roof
     2017-01-01 01:03:42  ghi789     100         100MB     56           77           roof
  5. Time-series Data? (2)
     • Time-centric
         • Data records always have a timestamp
     • Append-only
         • Data is almost solely append-only (INSERTs)
     • Recent
         • New data is typically about recent time intervals
  6. Time-series Data? (3)
     • Monitoring computer systems
         • VM, server, container metrics (CPU, free memory, net/disk IOPS)
         • Service and application metrics (request rates, request latency)
     • Financial trading systems
         • Classic securities, newer cryptocurrencies, payments, transaction events
     • Internet of Things
         • Data from sensors on industrial machines and equipment
     • Eventing applications
         • User/customer interaction data like clickstreams, pageviews, logins, signups
     • Environmental monitoring
         • Temperature, humidity, pressure, pH, pollen count, air flow, …
  7. TimescaleDB Overview
  8. Easy to Use
     • Full SQL interface for all SQL natively supported by PostgreSQL
         • Secondary indexes
         • Non-time-based aggregates
         • Sub-queries
         • Window functions
     • Connects to any client or tool that speaks PostgreSQL
     • Time-oriented features
     • Robust support for data retention policies
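
To make the "data retention policies" bullet concrete, here is a minimal Python sketch of the idea: data is grouped into time ranges, and any range that lies entirely before a cutoff is discarded as a unit. This is an illustration only, not TimescaleDB's implementation; `apply_retention`, the dictionary layout, and the sample rows are all hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(chunks, now, keep):
    """Keep only the time ranges that are not entirely older than now - keep.

    `chunks` maps (start, end) datetime ranges to lists of rows (hypothetical layout).
    """
    cutoff = now - keep
    # A range survives if any part of it is newer than the cutoff.
    return {rng: rows for rng, rows in chunks.items() if rng[1] > cutoff}


day = timedelta(days=1)
first = datetime(2019, 12, 1, tzinfo=timezone.utc)
# Six one-day ranges: 2019-12-01 through 2019-12-06.
chunks = {(first + i * day, first + (i + 1) * day): [f"row-{i}"] for i in range(6)}

now = datetime(2019, 12, 6, 20, 0, tzinfo=timezone.utc)
kept = apply_retention(chunks, now, keep=timedelta(days=3))
print(sorted(rng[0].date().isoformat() for rng in kept))
```

Dropping whole ranges rather than deleting individual rows is the design point: old data can be removed without scanning newer data.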
  9. Scalable
     • Transparent time/space partitioning
         • Scaling up (single node)
         • Scaling out (private beta)
     • High data write rates
     • Right-sized chunks
     • Parallelized operations across chunks and servers
  10. Reliable
      • Engineered up from PostgreSQL, packaged as an extension
      • Proven foundations
          • From 20+ years of PostgreSQL research
          • Streaming replication
          • Backups
      • Flexible management options
          • Compatible with existing PostgreSQL ecosystem and tooling
  11. Architecture
      • Hypertables
          • Abstraction of a single continuous table across all space and time intervals
      • Chunks
          • Each chunk corresponds to a specific time interval and a region of the partition key's space
  12. Using TimescaleDB
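
The relationship between a hypertable and its chunks can be sketched as a routing function: every row maps to one (time interval, partition-key region) pair. The Python below is a rough illustration under stated assumptions, not TimescaleDB's actual code: `chunk_key` is a hypothetical helper, time slots are assumed to be aligned to the Unix epoch, and `zlib.crc32` stands in for whatever hash function the extension really uses.

```python
import zlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_key(ts, partition_value, interval, n_partitions):
    """Map a row to a (time slot, space slot) pair identifying its chunk."""
    # Integer index of the time interval the timestamp falls into.
    time_slot = (ts - EPOCH) // interval
    # Stand-in hash (assumption): spreads partition keys over n_partitions regions.
    space_slot = zlib.crc32(partition_value.encode("utf-8")) % n_partitions
    return time_slot, space_slot


hour = timedelta(hours=1)
a = chunk_key(datetime(2017, 1, 1, 1, 2, 0, tzinfo=timezone.utc), "abc123", hour, 4)
b = chunk_key(datetime(2017, 1, 1, 1, 3, 12, tzinfo=timezone.utc), "abc123", hour, 4)
c = chunk_key(datetime(2017, 1, 1, 2, 3, 12, tzinfo=timezone.utc), "abc123", hour, 4)

print(a == b)  # same device, same hour: same chunk
print(a == c)  # same device, next hour: different chunk
```

Queries against the hypertable never need to name a chunk; the point of the abstraction is that this routing happens behind a single table name.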
  13. Installing
      • https://docs.timescale.com/latest/getting-started/installation
      • Using Docker Image
          • shm-size: set /dev/shm partition size
          • Mapping /var/lib/postgresql/data to host directory

      $ docker run -d --name timescaledb -p 5432:5432 \
          -e POSTGRES_PASSWORD=password \
          -v /mnt/timescaledb:/var/lib/postgresql/data \
          --shm-size 1G \
          timescale/timescaledb:1.5.1-pg11
  14. Setting up

      $ psql -U postgres -h localhost
      postgres=# create database tutorial;
      CREATE DATABASE
      postgres=# \c tutorial
      You are now connected to database "tutorial" as user "postgres".
      tutorial=# create extension if not exists timescaledb cascade;
      NOTICE: extension "timescaledb" already exists, skipping
      CREATE EXTENSION
  15. Creating a Hypertable

      tutorial=# CREATE TABLE conditions (
      tutorial(#   time        TIMESTAMPTZ      NOT NULL,
      tutorial(#   location    TEXT             NOT NULL,
      tutorial(#   temperature DOUBLE PRECISION NULL,
      tutorial(#   humidity    DOUBLE PRECISION NULL
      tutorial(# );
      CREATE TABLE
      tutorial=# SELECT create_hypertable('conditions', 'time', chunk_time_interval => interval '1 day');
          create_hypertable
      -------------------------
       (1,public,conditions,t)
      (1 row)
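
What `chunk_time_interval => interval '1 day'` means for a given row can be approximated in a few lines of Python: each timestamp belongs to exactly one half-open, one-day range. The helper `chunk_range` is hypothetical, and aligning ranges to the Unix epoch is an assumption of this sketch rather than a statement about TimescaleDB internals.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_range(ts, interval):
    """Return the half-open [start, end) range of width `interval` containing ts."""
    slot = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    start = EPOCH + slot * interval
    return start, start + interval


# Timestamp taken from the INSERT example in this deck.
ts = datetime(2019, 12, 6, 20, 12, 6, 987648, tzinfo=timezone.utc)
start, end = chunk_range(ts, timedelta(days=1))
print(start.isoformat(), end.isoformat())
```

A smaller interval yields more, smaller chunks; a larger one yields fewer, bigger chunks, which is the trade-off the parameter controls.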
  16. Inserting

      tutorial=# INSERT INTO conditions
      tutorial-# VALUES
      tutorial-#   (NOW(), 'office', 70.0, 50.0),
      tutorial-#   (NOW(), 'basement', 66.5, 60.0),
      tutorial-#   (NOW(), 'garage', 77.0, 65.2);
      INSERT 0 3
      tutorial=# select * from conditions;
                   time              | location | temperature | humidity
      -------------------------------+----------+-------------+----------
       2019-12-06 20:12:06.987648+00 | office   |          70 |       50
       2019-12-06 20:12:06.987648+00 | basement |        66.5 |       60
       2019-12-06 20:12:06.987648+00 | garage   |          77 |     65.2
      (3 rows)
  17. Querying

      tutorial=# SELECT time_bucket('15 minutes', time) AS fifteen_min,
      tutorial-#   location, COUNT(*),
      tutorial-#   MAX(temperature) AS max_temp,
      tutorial-#   MAX(humidity) AS max_hum
      tutorial-# FROM conditions
      tutorial-# WHERE time > NOW() - interval '3 hours'
      tutorial-# GROUP BY fifteen_min, location
      tutorial-# ORDER BY fifteen_min DESC, max_temp DESC;
            fifteen_min       | location | count | max_temp | max_hum
      ------------------------+----------+-------+----------+---------
       2019-12-06 20:00:00+00 | garage   |     1 |       77 |    65.2
       2019-12-06 20:00:00+00 | office   |     1 |       70 |      50
       2019-12-06 20:00:00+00 | basement |     1 |     66.5 |      60
      (3 rows)
  18. Q & A
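
The query above floors each timestamp to a 15-minute boundary and then aggregates per bucket and location. The Python sketch below mimics that logic on the three rows inserted earlier. It is an approximation for illustration: this `time_bucket` is a hypothetical re-implementation that aligns buckets to the Unix epoch (an assumption; the real function's default origin may differ for widths that do not divide a day evenly), and the `WHERE` filter on the last 3 hours is omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(width, ts):
    """Floor ts to the start of its bucket of the given width."""
    return ts - ((ts - EPOCH) % width)


ts = datetime(2019, 12, 6, 20, 12, 6, 987648, tzinfo=timezone.utc)
rows = [
    (ts, "office", 70.0, 50.0),
    (ts, "basement", 66.5, 60.0),
    (ts, "garage", 77.0, 65.2),
]

# GROUP BY fifteen_min, location
groups = defaultdict(list)
for t, location, temperature, humidity in rows:
    groups[(time_bucket(timedelta(minutes=15), t), location)].append((temperature, humidity))

# COUNT(*), MAX(temperature), MAX(humidity) per group
result = [
    (bucket, location, len(vals), max(v[0] for v in vals), max(v[1] for v in vals))
    for (bucket, location), vals in groups.items()
]
# ORDER BY fifteen_min DESC, max_temp DESC
result.sort(key=lambda r: (r[0], r[3]), reverse=True)

for r in result:
    print(r[0].isoformat(), r[1], r[2], r[3], r[4])
```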
  19. References
      • https://docs.timescale.com/latest/introduction
      • https://www.youtube.com/watch?v=F-UGFSGlzsk
      • https://blog.timescale.com/blog/building-columnar-compression-in-a-row-oriented-database/
