Managing Temporal Data in PostgreSQL
Anton Dignös
Free University of Bozen-Bolzano
SFScon 2022
November 11, 2022
Autonome Provinz Bozen - Südtirol / Provincia Autonoma di Bolzano - Alto Adige
Research Südtirol/Alto Adige 2019
Project ISTeP
CUP: I52F20000250003
EFRE 2014-2020
Project EFRE1164 PREMISE
CUP: I59C20000340009
Agenda
What is temporal data?
Period temporal data in Postgres
Time series data in Postgres
What are we working on?
This talk only provides a glimpse. If you are interested in more details,
I am happy to talk to you during the conference!
SFScon 2022 2/22 A. Dignös
Temporal data
Temporal data can be found in many applications
▶ HR contracts
▶ Insurance policies
▶ Tourism data
▶ Medical domain
▶ Stock market data
▶ Industrial data
What is temporal data?
Data with a “timestamp”
+
The “timestamp” indicates the validity of the data
Examples:
▶ A contract with a validity period
▶ A sensor reading with the measurement time
▶ An error event with its time of occurrence
Basic utilities for date/time in Postgres
▶ Postgres provides different date/time datatypes [1]
▶ Many functions
  ▶ Operators (+, -)
  ▶ Calendar functions (EXTRACT, date_trunc)
▶ Anyone who has worked with dates and time zones will appreciate these
[1] https://www.postgresql.org/docs/current/datatype-datetime.html
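A minimal sketch of the operators and calendar functions named above. The literal dates are arbitrary illustrations, not taken from the talk.

```sql
SELECT DATE '2022-11-11' + 7                        AS one_week_later,  -- date + integer adds days
       DATE '2022-11-11' - DATE '2022-01-01'        AS days_elapsed,    -- date - date yields a number of days
       TIMESTAMP '2022-11-11 15:42:00'
         + INTERVAL '90 minutes'                    AS later,           -- timestamp + interval
       EXTRACT(YEAR FROM DATE '2022-11-11')         AS year,            -- pick one calendar field
       date_trunc('month',
                  TIMESTAMP '2022-11-11 15:42:00')  AS month_start;     -- round down to the month
```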
Topic of today
Today’s topic is temporal data, not just storing dates or times
▶ Period temporal data
  ▶ Contracts
  ▶ Manufacturing periods
  ▶ Error states
▶ Time series data
  ▶ Sensor readings
  ▶ Stock market data
  ▶ Error events
Let’s take a peek at what Postgres and its ecosystem have to offer!
Highlights for period temporal data in Postgres
▶ Postgres provides range types [2] for managing period data
▶ What are range types?
  ▶ Datatypes for periods such as '[start, end)'
  ▶ Bounds can take different forms: '[ , )', '[ , ]', '( , ]', '( , )'
  ▶ Available for different base types, e.g., INT, NUMERIC, DATE
  ▶ Many predicates and functions
  ▶ Indices available (GiST, SP-GiST, btree_gist)
▶ Very easy to use
▶ Avoid many programming mistakes
[2] https://www.postgresql.org/docs/current/rangetypes.html
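As a small illustration of the bound forms and base types listed above, the following sketch builds a few range values from literals. The values themselves are made up for the example.

```sql
SELECT '[2022-01-01, 2022-02-01)'::daterange         AS half_open_dates,
       '[1, 5]'::int4range                           AS closed_ints,    -- discrete ranges are shown in canonical '[ , )' form
       '(1.5, 2.5)'::numrange                        AS open_numeric,
       isempty('[3, 3)'::int4range)                  AS is_empty,       -- a range containing no points
       lower('[2022-01-01, 2022-02-01)'::daterange)  AS period_start,
       upper_inf('[2022-01-01,)'::daterange)         AS is_open_ended;  -- no upper bound
```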
An example
Product prices that change over time
  CREATE TABLE prices(
    product INT,
    period  DATERANGE,
    value   FLOAT);
  INSERT INTO prices
  VALUES (1, '[2021-08-01, 2022-08-01)', 25),
         (1, '[2022-08-01,)', 30),
         (2, '[2021-08-01, 2022-04-01)', 10),
         (2, '[2022-04-01,)', 20);

   product |         period          | value
  ---------+-------------------------+-------
         1 | [2021-08-01,2022-08-01) |    25
         1 | [2022-08-01,)           |    30
         2 | [2021-08-01,2022-04-01) |    10
         2 | [2022-04-01,)           |    20
Common queries
▶ What are the prices of products today?
    WHERE period @> CURRENT_DATE
▶ What were the prices of products on 2021-10-30?
    WHERE period @> DATE '2021-10-30'
▶ What were the previous prices of products?
    WHERE period << daterange(CURRENT_DATE, NULL, '[)')
▶ What were the prices of products between 2021-10-30 and 2022-10-30?
    WHERE period && daterange('2021-10-30', '2022-10-30', '[]')
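The predicates above are only WHERE clauses. A complete query might look like the sketch below, which assumes the prices table from the earlier example. It additionally uses the range intersection operator (*) to show which part of each price period falls inside the query window; that extra column is an illustration, not part of the talk.

```sql
-- Assumes the prices table from the example above.
SELECT product,
       value,
       -- "*" is range intersection: the part of the price period inside the window
       period * daterange('2021-10-30', '2022-10-30', '[]') AS valid_in_window
FROM prices
WHERE period && daterange('2021-10-30', '2022-10-30', '[]')  -- "&&" tests for overlap
ORDER BY product, lower(period);
```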
Uniqueness Constraints
Ensure a product does not have two prices at the same time
  CREATE EXTENSION IF NOT EXISTS btree_gist;  -- needed for "product WITH =" in a GiST constraint
  CREATE TABLE prices(
    product INT,
    period  DATERANGE,
    value   FLOAT,
    EXCLUDE USING GIST (product WITH =, period WITH &&));

   product |         period          | value
  ---------+-------------------------+-------
         1 | [2021-08-01,2022-08-01) |    25
         1 | [2022-08-01,)           |    30
         2 | [2021-08-01,2022-04-01) |    10
         2 | [2022-04-01,)           |    20

  INSERT INTO prices VALUES (1, '[2022-08-04,)', 100);
  ERROR: conflicting key value violates exclusion constraint ...
  DETAIL: Key (product, period)=(1, [2022-08-04,)) conflicts ...
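One way to record a price change without tripping the constraint is to close the open-ended period first and then insert the new one. This is a sketch under the assumption that the prices table holds the four rows shown above; the change date and the new price are invented for illustration.

```sql
-- Assumes the prices table with the exclusion constraint and the four rows above.
BEGIN;

-- Close the currently open-ended period of product 1 at the change date ...
UPDATE prices
SET period = daterange(lower(period), DATE '2022-11-11', '[)')
WHERE product = 1
  AND upper_inf(period);

-- ... and start the new price exactly where the old one ends.
-- Adjacent half-open periods do not overlap, so the constraint is satisfied.
INSERT INTO prices VALUES (1, '[2022-11-11,)', 35);

COMMIT;
```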
Take home messages
▶ Range types are Postgres’ native period datatypes
  ▶ Convenient representation of periods
  ▶ Many base datatypes are supported
  ▶ Support different period definitions if needed
▶ Many convenient predicates and functions
  ▶ Less error-prone than custom solutions
  ▶ Can be sped up using GiST indices
▶ Uniqueness constraints available
  ▶ Avoid inconsistencies at the source
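A sketch of the GiST index mentioned above, assuming the prices table from the earlier example. The index name is hypothetical. Note that an exclusion constraint already creates its own GiST index, and on a table this small the planner may still prefer a sequential scan.

```sql
-- Hypothetical index name; assumes the prices table from the earlier example.
CREATE INDEX prices_period_idx ON prices USING GIST (period);

-- Inspect the plan to see whether the index is considered for a containment query.
EXPLAIN
SELECT product, value
FROM prices
WHERE period @> CURRENT_DATE;
```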
Highlights for time series data in Postgres
▶ TimescaleDB can be used to manage time series in Postgres
▶ What is TimescaleDB?
  ▶ TimescaleDB is a Postgres extension (based on UDFs)
  ▶ Runs on the server side
  ▶ License (two editions of TimescaleDB with different support) [3]
    ▶ TimescaleDB Apache 2 Edition (Apache 2.0 license)
    ▶ TimescaleDB Community Edition (Timescale License, TSL)
    ▶ See https://docs.timescale.com/timescaledb/latest/timescaledb-edition-comparison
  ▶ Available for most platforms as a binary or compiled from source
[3] Thanks to Chris Mair from 1006.org for pointing this out during a previous talk!
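Since TimescaleDB is an ordinary extension, enabling it in a database could look roughly like this. The sketch assumes the TimescaleDB packages are already installed on the server and that the library is preloaded. The timescaledb.license setting is, to my knowledge, how the active edition is reported, so treat that line as an assumption to verify against your installed version.

```sql
-- Assumes TimescaleDB is installed on the server and listed in shared_preload_libraries.
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Which version of the extension is active in this database?
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'timescaledb';

-- Assumption: this setting reports the active edition (Apache 2 vs. Community/TSL).
SHOW timescaledb.license;
```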
What does TimescaleDB do?
Eases time series data management
▶ Convenient time-series-specific functions (hyperfunctions)
  ▶ Gap-filling and interpolation
  ▶ Weighted averages
  ▶ ...
▶ Partitioning (hypertables)
  ▶ Access less data (faster runtime)
▶ Compression
  ▶ Make data smaller (also faster runtime)
Hyperfunctions/1
Hyperfunctions/2
Produce a value every five minutes and interpolate missing ones
  SELECT time_bucket_gapfill('5 minutes', time) AS five_min,
         avg(value) AS value,       -- average from data
         interpolate(avg(value))    -- interpolate average if missing
  FROM sensor_signal
  WHERE sensor_id = 3
    AND time BETWEEN now() - INTERVAL '20 min' AND now()
  GROUP BY five_min
  ORDER BY five_min;

        five_min       | value | interpolate
  ---------------------+-------+-------------
   2022-11-11 15:40:00 |  16.2 |        16.2
   2022-11-11 15:45:00 |       |          16
   2022-11-11 15:50:00 |  15.8 |        15.8
   2022-11-11 15:55:00 |       |        11.9
   2022-11-11 16:00:00 |     8 |           8
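Linear interpolation is not the only way to fill gaps. As a variation on the query above, locf ("last observation carried forward") repeats the previous bucket's value instead. The column layout of sensor_signal is not shown on the slides, so the schema in the comment is an assumption.

```sql
-- Assumed schema (not shown on the slides):
--   sensor_signal(time TIMESTAMPTZ, sensor_id INT, value DOUBLE PRECISION)
SELECT time_bucket_gapfill('5 minutes', time) AS five_min,
       avg(value)       AS value,            -- average from data, NULL for empty buckets
       locf(avg(value)) AS carried_forward   -- repeat the last known average if missing
FROM sensor_signal
WHERE sensor_id = 3
  AND time BETWEEN now() - INTERVAL '20 min' AND now()
GROUP BY five_min
ORDER BY five_min;
```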
Hypertables/1
[Figure omitted. Picture taken from timescale.com]
Hypertables/2
Transform our table into a hypertable
  SELECT create_hypertable(
    'sensor_signal',
    'time',
    chunk_time_interval => INTERVAL '2 days',
    partitioning_column => 'sensor_id',
    number_partitions   => 2,
    if_not_exists       => true,
    migrate_data        => true
  );
▶ Partition by range on time every two days
▶ Partition by hash on sensor_id using 2 partitions
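For context, the call above needs an existing regular table, and the resulting chunks can be inspected afterwards. The table definition below is an assumption (the slides do not show it), and the information view is the one I know from TimescaleDB 2.x.

```sql
-- Assumed definition of the table that is turned into a hypertable.
CREATE TABLE IF NOT EXISTS sensor_signal (
  time      TIMESTAMPTZ      NOT NULL,
  sensor_id INT              NOT NULL,
  value     DOUBLE PRECISION
);

-- After create_hypertable(...) has been run: list the chunks (partitions) ...
SELECT show_chunks('sensor_signal');

-- ... or look at their time ranges (view available in TimescaleDB 2.x).
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_signal'
ORDER BY range_start;
```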
Hypertables/3
▶ Be careful with the partitioning
  ▶ Relevant partitions are merged using UNION ALL
  ▶ New data keeps adding partitions
▶ Example: 100 sensors and 3 years of data
    chunk_time_interval => INTERVAL '7 days'
    number_partitions => 50
  Result: potentially 3 · 52 · 50 = 7800 tables!!
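The estimate is simply years times chunks per year times hash partitions. The sketch below redoes the arithmetic and compares it with a coarser configuration; the second configuration (roughly monthly chunks, 2 hash partitions) is an invented alternative for comparison, not a recommendation from the talk.

```sql
SELECT 3 * 52 * 50 AS weekly_chunks_50_partitions,  -- 3 years * 52 weeks * 50 partitions = 7800
       3 * 12 * 2  AS monthly_chunks_2_partitions;  -- 3 years * 12 months * 2 partitions = 72
```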
Compression
▶ Compression aims at reducing the size of the data
▶ Done at the chunk (partition) level
▶ Usually also improves query time
▶ Transparent to the user
▶ Done via a TimescaleDB function
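A minimal sketch of what enabling compression can look like with the TimescaleDB 2.x functions I am aware of. It assumes sensor_signal is already a hypertable; the segment-by column and the 7-day threshold are illustrative choices, and compression may require the Community Edition.

```sql
-- Assumes sensor_signal is already a hypertable.
-- Enable compression and group compressed rows by sensor (illustrative choice).
ALTER TABLE sensor_signal SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'sensor_id'
);

-- Compress chunks automatically once they are older than 7 days ...
SELECT add_compression_policy('sensor_signal', INTERVAL '7 days');

-- ... or compress existing old chunks by hand.
SELECT compress_chunk(c)
FROM show_chunks('sensor_signal', older_than => INTERVAL '7 days') AS c;
```

Queries against sensor_signal stay unchanged afterwards, which is what "transparent to the user" refers to.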
Take home messages
▶ TimescaleDB handles time series data transparently
  ▶ For you it is just a relation
  ▶ SQL will still work as before
▶ Use hyperfunctions
  ▶ Handy and much faster than custom solutions
  ▶ Keep improving
▶ Use hypertables
  ▶ Limit the search space
  ▶ But be careful with how you partition
▶ Use compression
  ▶ Improves performance substantially
  ▶ Should be used on (old) read-only data
What are we working on?
▶ Period temporal data (project: ISTeP [4])
  ▶ Temporal range and overlap joins
  ▶ Temporal anomalies in healthcare information systems
  ▶ Temporal key/foreign key constraints
  ▶ Temporal histograms for cardinality estimation
▶ Time series data (project: PREMISE [5])
  ▶ Predictive maintenance for industrial equipment
  ▶ Data ingestion infrastructure
  ▶ Data storage infrastructure
  ▶ Feature extraction
[4] https://dbs.inf.unibz.it/projects/istep/
[5] https://dbs.inf.unibz.it/projects/premise/
Thank you!
anton.dignoes@unibz.it
