PGDay SF 2020 - Timeseries data in Postgres with updates

One of the limiting factors of most timeseries databases is that, in order to get good read performance, they limit your ability to update data. That's fine if your data is an event stream, but if it's coming from pre-aggregated sources it might update past data, for example, data about online ad performance updated after click fraud is discovered. In this talk I'll show you how AdStage stores timeseries data in Postgres to allow fast reads and updates, using clever schema design and functions for speed.


  1. PGDay SF 2020, G Gordon Worley III
  2. ● Problem/use case
     ● Desiderata
     ● Possible solutions
     ● Limitations
     ● Naive approach
     ● Pivoted metrics
     ● Partitioning
     ● TimescaleDB
     ● "Columnar" storage
     ● Limit row count
     ● TOAST arrays
     ● Functions can be fast
  3. Date      Impressions  Clicks  Conversions  Spend
     2020-1-1  103          3       2            $8
     2020-1-2  124          4       1            $15
     2020-1-3  65           0       0            $5
  4. (same table as slide 3)
  5. (bullet-only slide; text not captured in the export)
  6. ● Cost effective
     ● Timeseries and aggregate queries
     ● Flexible schema
     (earlier bullet text not captured in the export)
  7. (bullet-only slide; text not captured in the export)
  8. (bullet-only slide; text not captured in the export)
  9. (bullet-only slide; text not captured in the export)
  10. (bullet-only slide; text not captured in the export)
  11. (bullet-only slide; text not captured in the export)
  12. (bullet-only slide; text not captured in the export)
  13. (bullet-only slide; text not captured in the export)
  14. Naive Approach
      create table entity_date_metric_value (
        entity integer not null,
        ts date not null,
        metric integer not null,
        value numeric
      );
  15. Naive Approach (same schema as slide 14; discussion bullets not captured in the export)
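To make the limitation concrete: with one row per (entity, day, metric), even a small report has to pivot rows back into columns, and Postgres scans one row per metric instead of one per day. A sketch of a weekly rollup against this schema (the metric ids 1 and 2 are hypothetical):

```sql
-- Each metric you want back costs another FILTER clause (or a self-join),
-- and the table holds 100x more rows than a pivoted layout would.
select entity,
       date_trunc('week', ts) as ts_trunc,
       sum(value) filter (where metric = 1) as impressions,
       sum(value) filter (where metric = 2) as clicks
from entity_date_metric_value
where ts between '2020-01-01' and '2020-01-31'
group by entity, ts_trunc;
```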
  16. Pivoted Metrics
      create table entity_date_metric (
        entity integer not null,
        ts date not null,
        metric_1 numeric,
        metric_2 numeric,
        ...
        metric_100 numeric
      );
  17. Pivoted Metrics (same schema as slide 16; discussion bullets not captured in the export)
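The pivoted layout also keeps the update path simple: restating a past day is a single row write. A sketch of how that could look, assuming a unique constraint on (entity, ts), which the slides don't show:

```sql
-- Hypothetical upsert: corrected data for a past day overwrites the old row.
-- Requires a unique index on (entity, ts) for ON CONFLICT to target.
insert into entity_date_metric (entity, ts, metric_1, metric_2)
values (42, '2020-01-02', 130, 5)
on conflict (entity, ts) do update
  set metric_1 = excluded.metric_1,
      metric_2 = excluded.metric_2;
```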
  18. Partitioned Pivoted Metrics
      create table entity_date_metric (
        entity integer not null,
        ts date not null,
        metric_1 numeric,
        metric_2 numeric,
        ...
        metric_100 numeric
      ) partition by range(ts);
  19. Partitioned Pivoted Metrics (same schema as slide 18; discussion bullets not captured in the export)
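One step the slides omit: with declarative partitioning (Postgres 10+), inserts are rejected until a matching child partition exists, so each range has to be created explicitly. A sketch for a single monthly partition:

```sql
-- Rows with ts in January 2020 route to this child; other months need
-- their own partitions (often created by a scheduled job).
create table entity_date_metric_2020_01
  partition of entity_date_metric
  for values from ('2020-01-01') to ('2020-02-01');
```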
  20. Partitioned Pivoted Metrics
      select entity, date_trunc('week', ts) as ts_trunc, sum(metric_*) ...
      from entity_date_metric
      where entity in (*list of 1000 random entities*)
        and ts between '2001-01-15' and '2001-02-15'
      group by entity, ts_trunc;
  21. TimescaleDB
      create table entity_date_metric (
        entity integer not null,
        ts date not null,
        metric_1 numeric,
        metric_2 numeric,
        ...
        metric_100 numeric
      );
      select create_hypertable('entity_date_metric', 'ts');
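create_hypertable chunks the table by time automatically (7-day chunks by default), and the interval can be tuned to the workload. A sketch, not from the slides:

```sql
-- Wider chunks mean fewer partitions for queries that span months,
-- at the cost of larger chunk indexes.
select create_hypertable('entity_date_metric', 'ts',
                         chunk_time_interval => interval '1 month');
```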
  22. TimescaleDB
      select entity, date_trunc('week', ts) as ts_trunc, sum(metric_*) ...
      from entity_date_metric
      where entity in (*list of 1000 random entities*)
        and ts between '2001-01-15' and '2001-02-15'
      group by entity, ts_trunc;
  23. (bullet-only slide; text not captured in the export)
  24. TOASTy Arrays
      create table entity_metric (
        entity integer not null,
        metric_1 numeric[],
        metric_2 numeric[],
        ...
        metric_100 numeric[]
      );
      (discussion bullets not captured in the export)
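A likely update path for this layout (an assumption; the slides don't show it): if array index i corresponds to day i of an entity's series, restating one day's metric is a single-element array write, and the whole series still reads back as one TOASTed row.

```sql
-- Hypothetical: element 15 = day 15 of this entity's series.
update entity_metric
set metric_1[15] = 42
where entity = 42;
```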
  25. TOASTy Arrays
      select entity, date_trunc('week', day) as ts_trunc, sum(metric_*) ...
      from (
        select entity,
               unnest(array(select generate_series(s, e, '1 day'))) as day,
               unnest(metric_*[start:end]) as metric_*,
               ...
        from entity_metric
        where entity in (*list of 1000 random entities*)
      ) as unnested_metrics
      group by entity, ts_trunc;
  26. TOASTy Arrays: What if I didn't have to UNNEST?
      (other bullet text not captured in the export)
  27. TOASTy Arrays
      create or replace function metric_array_sum(input numeric[])
      returns numeric as $fun$
      declare
        output numeric := 0;
      begin
        if array_length(input, 1) is null then
          return null;
        end if;
        for i in array_lower(input, 1)..array_upper(input, 1) loop
          output := output + coalesce(input[i], 0);
        end loop;
        return output;
      end
      $fun$ language plpgsql immutable strict parallel safe;
      (discussion bullets not captured in the export)
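Behavior of the function above, for reference: nulls inside the array count as 0, while an empty array returns null and (because the function is strict) so does a null input.

```sql
select metric_array_sum(array[1, 2, null, 3]::numeric[]);  -- 6
select metric_array_sum(array[]::numeric[]);               -- null (empty array)
select metric_array_sum(null::numeric[]);                  -- null (strict)
```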
  28. TOASTy Arrays
      select entity,
             metric_array_sum_by_date_part(metric_*, start, end, 'week')
      from entity_metric
      where entity in (*list of 1000 random entities*);
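The slides never show metric_array_sum_by_date_part itself. One possible shape, sketched under the assumption that index 1 of each array maps to a known epoch date (the epoch parameter and the set-returning signature here are my inventions, not the talk's implementation):

```sql
-- Sum array elements into date_trunc buckets without UNNESTing in the
-- caller. start_idx/end_idx mirror the slice bounds on slide 25.
create or replace function metric_array_sum_by_date_part(
  input numeric[],
  start_idx integer,
  end_idx integer,
  part text,
  epoch date default '2020-01-01'  -- hypothetical date of array index 1
) returns table (bucket timestamp, total numeric) as $fun$
begin
  return query
    select date_trunc(part, (epoch + (i - 1))::timestamp),
           sum(coalesce(input[i], 0))
    from generate_subscripts(input, 1) as i
    where i between start_idx and end_idx
    group by 1
    order by 1;
end
$fun$ language plpgsql immutable strict;
```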
  29. (bullet-only slide; text not captured in the export)
