Materialized views in PostgreSQL

Presentation introducing materialized views in PostgreSQL, with use cases. These slides were used for my talk at the Indian PostgreSQL Users Group meetup in Hyderabad on 28th March, 2014.


Materialized views in PostgreSQL: Presentation Transcript

  • © 2013 EDB All rights reserved. 1 Materialized views in PostgreSQL Ashutosh Bapat | 28th March, 2014
  • Theoretical background; PostgreSQL's support; Use cases
  • (SQL) View
    ● “Virtual relation” defined by a query
    ● Represents the result of the query
    ● Can be queried like a table
    ● Referencing a view in a query requires the defining query to be executed each time
    View: emp_with_good_salary
      SELECT emp_name FROM emp WHERE salary > 15000;
    Table: emp
       emp_name | salary
      ----------+--------
       Kiran    |  10000
       Mohan    |  20000
       Leela    |  30000
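A minimal sketch of the view described on this slide, for trying it out in a scratch database. The column types and the extra row ('Asha') are assumptions, not taken from the talk. Everything runs inside a transaction that is rolled back, so nothing is left behind.

```sql
BEGIN;

-- Column types are assumed; the slide only shows names and values.
CREATE TABLE emp (emp_name varchar(100), salary integer);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- The view stores only the query, not its result.
CREATE VIEW emp_with_good_salary AS
    SELECT emp_name FROM emp WHERE salary > 15000;

SELECT count(*) FROM emp_with_good_salary;   -- 2 (Mohan, Leela)

-- Hypothetical new row: the view reflects it at once,
-- because the defining query is re-run on every reference.
INSERT INTO emp VALUES ('Asha', 40000);
SELECT count(*) FROM emp_with_good_salary;   -- 3

ROLLBACK;
```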
  • Materialized View (MV)
    ● A “view” with the results of its associated query stored in the database
    ● Referencing a materialized view does not require execution of the query
    ● Needs to be “maintained” to keep up with changes in underlying objects (tables or views)
    ● Can be indexed, unlike a non-materialized view
    Table: emp
       emp_name | salary
      ----------+--------
       Kiran    |  10000
       Mohan    |  20000
       Leela    |  30000
    MV: emp_with_good_salary
       emp_name | salary
      ----------+--------
       Mohan    |  20000
       Leela    |  30000
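The same example as a materialized view, sketched to show the staleness and the need for maintenance mentioned above. Column types, the extra row and the index name are assumptions made for illustration.

```sql
BEGIN;

-- Column types are assumed; the slide only shows names and values.
CREATE TABLE emp (emp_name varchar(100), salary integer);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- The query result is stored at creation time.
CREATE MATERIALIZED VIEW emp_with_good_salary AS
    SELECT emp_name, salary FROM emp WHERE salary > 15000;

-- Hypothetical new row in the underlying table.
INSERT INTO emp VALUES ('Asha', 40000);

SELECT count(*) FROM emp_with_good_salary;   -- still 2: the MV is stale

REFRESH MATERIALIZED VIEW emp_with_good_salary;
SELECT count(*) FROM emp_with_good_salary;   -- 3 after the refresh

-- Unlike a plain view, the stored result can be indexed.
CREATE INDEX emp_with_good_salary_salary_idx
    ON emp_with_good_salary (salary);

ROLLBACK;
```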
  • Materialized Views in PostgreSQL
    ● Creation – CREATE MATERIALIZED VIEW
    ● Maintenance – REFRESH MATERIALIZED VIEW
    ● Destruction – DROP MATERIALIZED VIEW
    ● Supported from 9.3
    ● Enhancements in 9.4 – REFRESH MATERIALIZED VIEW CONCURRENTLY
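The three commands listed above can be strung together as a short lifecycle sketch. The table `mv_demo_orders`, the view `mv_demo_totals` and their contents are hypothetical names chosen for this illustration. The CONCURRENTLY form needs PostgreSQL 9.4 or later and a unique index on the materialized view.

```sql
-- Hypothetical demo table, not from the talk.
CREATE TABLE mv_demo_orders (
    order_no integer PRIMARY KEY,
    region   text,
    amount   numeric(13, 2)
);
INSERT INTO mv_demo_orders VALUES
    (1, 'north', 100.00), (2, 'south', 250.00), (3, 'north', 50.00);

-- Creation. WITH NO DATA defers running the query; the MV cannot be
-- queried until it has been refreshed once.
CREATE MATERIALIZED VIEW mv_demo_totals AS
    SELECT region, sum(amount) AS total
    FROM mv_demo_orders
    GROUP BY region
    WITH NO DATA;

-- Maintenance: a plain refresh recomputes the whole result and
-- blocks readers of the MV while it runs.
REFRESH MATERIALIZED VIEW mv_demo_totals;

-- The concurrent form requires a unique index on plain columns.
CREATE UNIQUE INDEX mv_demo_totals_region_idx ON mv_demo_totals (region);

INSERT INTO mv_demo_orders VALUES (4, 'south', 75.00);

-- 9.4+: readers can keep querying the old contents during the refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_demo_totals;

-- Destruction.
DROP MATERIALIZED VIEW mv_demo_totals;
DROP TABLE mv_demo_orders;
```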
  • Refreshing MV
    ● Lazy refresh
      – Materialized view usually contains stale data
      – REFRESH periodically or when suitable, independent of DML activity
    ● Aggressive refresh
      – Materialized view contains latest data in serializable transactions and nearly fresh data at other isolation levels
      – REFRESH using triggers/rules
  • What's not supported in 9.4
    ● Incremental refresh
      – Refreshing only those rows affected by changes to the underlying table
      – Being worked on in the community
    ● Using materialized views for query optimization
      – Using MVs automatically
    ● Auto-refresh
      – Refreshing materialized view automatically when the underlying tables change
  • Reporting using stale data
    ● Very frequently updated tables
    ● Approximate reports are fine
    ● Create materialized view(s) for reporting queries
    ● Refresh every night or on a weekly/monthly basis
  • Reporting region-wise sales
    ● Table schema
      CREATE TABLE salesman(salesman_no integer PRIMARY KEY,
                            name varchar(100),
                            region varchar(100));
      CREATE TABLE invoice (invoice_no integer PRIMARY KEY,
                            salesman_no integer REFERENCES salesman,
                            invoice_amt numeric(13, 2),
                            invoice_date date);
    ● Reporting Query
      SELECT sum(i.invoice_amt) region_sale, s.region region
        FROM salesman s, invoice i
        WHERE i.salesman_no = s.salesman_no
        GROUP BY s.region
        ORDER BY region_sale LIMIT 10;
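The talk does not show how the tables were populated. The sketch below is one way to generate comparable test data, assuming 10,000 salesmen, 1,000,000 invoices and 26 regions (the actual row counts reported in the talk's EXPLAIN ANALYZE output). The names, amounts and dates it generates are made up.

```sql
-- Schema as given on the slide.
CREATE TABLE salesman(salesman_no integer PRIMARY KEY,
                      name varchar(100),
                      region varchar(100));
CREATE TABLE invoice (invoice_no integer PRIMARY KEY,
                      salesman_no integer REFERENCES salesman,
                      invoice_amt numeric(13, 2),
                      invoice_date date);

-- Assumed data: 10,000 salesmen spread over 26 regions (region_A .. region_Z).
INSERT INTO salesman
    SELECT n, 'salesman_' || n, 'region_' || chr(65 + n % 26)
    FROM generate_series(1, 10000) AS n;

-- Assumed data: 1,000,000 invoices with random amounts below 1000.
INSERT INTO invoice
    SELECT n,
           1 + n % 10000,                          -- always an existing salesman_no
           round((random() * 1000)::numeric, 2),
           date '2014-01-01' + (n % 365)
    FROM generate_series(1, 1000000) AS n;

-- Refresh planner statistics after the bulk load.
ANALYZE salesman;
ANALYZE invoice;
```

Timings on this data will differ from the figures in the talk, since they depend on hardware, configuration and the generated values.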
  • Reporting region-wise sales
    EXPLAIN ANALYZE SELECT sum(i.invoice_amt) region_sale, s.region region
      FROM salesman s, invoice i
      WHERE i.salesman_no = s.salesman_no
      GROUP BY s.region
      ORDER BY region_sale LIMIT 10;
    QUERY PLAN
    ----------------------------------------------------------------------
     Limit (cost=44294.16..44294.18 rows=10 width=234) (actual time=2609.868..2609.870 rows=10 loops=1)
       -> Sort (cost=44294.16..44294.66 rows=200 width=234) (actual time=2609.860..2609.861 rows=10 loops=1)
            Sort Key: (sum(i.invoice_amt))
            Sort Method: top-N heapsort Memory: 26kB
            -> HashAggregate (cost=44287.84..44289.84 rows=200 width=234) (actual time=2609.347..2609.366 rows=26 loops=1)
                 -> Hash Join (cost=559.84..39828.84 rows=891800 width=234) (actual time=29.751..1374.305 rows=1000000 loops=1)
                      Hash Cond: (i.salesman_no = s.salesman_no)
                      -> Seq Scan on invoice i (cost=0.00..15288.00 rows=891800 width=20) (actual time=0.048..398.745 rows=1000000 loops=1)
                      -> Hash (cost=345.15..345.15 rows=5015 width=222) (actual time=29.602..29.602 rows=10000 loops=1)
                           Buckets: 1024 Batches: 2 Memory Usage: 685kB
                           -> Seq Scan on salesman s (cost=0.00..345.15 rows=5015 width=222) (actual time=0.009..5.221 rows=10000 loops=1)
     Total runtime: 2610.316 ms
  • Reporting region-wise sales
    CREATE MATERIALIZED VIEW sales_by_region AS
      SELECT sum(i.invoice_amt) region_sale, s.region region
        FROM salesman s, invoice i
        WHERE i.salesman_no = s.salesman_no
        GROUP BY s.region;
    EXPLAIN ANALYZE SELECT * FROM sales_by_region ORDER BY region_sale LIMIT 10;
    QUERY PLAN
    ----------------------------------------------------------------------
     Limit (cost=19.17..19.19 rows=10 width=250) (actual time=0.065..0.066 rows=10 loops=1)
       -> Sort (cost=19.17..19.89 rows=290 width=250) (actual time=0.064..0.064 rows=10 loops=1)
            Sort Key: region_sale
            Sort Method: top-N heapsort Memory: 26kB
            -> Seq Scan on sales_by_region (cost=0.00..12.90 rows=290 width=250) (actual time=0.007..0.013 rows=26 loops=1)
     Total runtime: 0.094 ms
    (6 rows)
  • Complex queries
    ● Relatively stable underlying tables
    ● Complex and slow-running queries
    ● Bonus
      – Stale data not tolerable: use triggers to refresh
      – Faster query results: use indexes on the MV
  • Shortest route problem
    ● Table schema
      CREATE TABLE roads (source char, dest char, length numeric(5, 2));
    ● Slow query
      WITH RECURSIVE paths (source, dest, length, path) AS (
          SELECT source, dest, length::float, '{}'::bpchar[]
            FROM roads
            WHERE source = 'A'
        UNION ALL
          SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
            FROM paths p, roads r
            WHERE p.dest = r.source AND not (r.dest = ANY(p.path))
      )
      SELECT * FROM paths WHERE dest = 'L' ORDER BY length LIMIT 1;
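To see what the recursive query computes, it can be run on a tiny graph. The five roads below are hypothetical sample data, not the data set used in the talk. The whole sketch runs in a transaction that is rolled back.

```sql
BEGIN;

CREATE TABLE roads (source char, dest char, length numeric(5, 2));

-- Hypothetical graph with three routes from A to L:
--   A -> L           (length 5)
--   A -> B -> L      (length 3)
--   A -> B -> C -> L (length 2.5)
INSERT INTO roads VALUES
    ('A', 'B', 1), ('A', 'L', 5),
    ('B', 'L', 2), ('B', 'C', 1),
    ('C', 'L', 0.5);

-- Query as on the slide: the path column collects the intermediate
-- nodes, and the ANY() test keeps a route from revisiting them.
WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[]
      FROM roads
      WHERE source = 'A'
  UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
      FROM paths p, roads r
      WHERE p.dest = r.source AND NOT (r.dest = ANY(p.path))
)
SELECT * FROM paths WHERE dest = 'L' ORDER BY length LIMIT 1;
-- Returns the route via B and C, with length 2.5.

ROLLBACK;
```

Every route out of 'A' is enumerated before the filter on dest is applied, which is why the query slows down quickly as the road network grows.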
  • SRP: without MV
    EXPLAIN ANALYZE output
    WITH RECURSIVE paths (source, dest, length, path) AS (
    ...
    ORDER BY length LIMIT 1;
    QUERY PLAN
    ----------------------------------------------------------------------
     Limit (cost=686.43..686.43 rows=1 width=56) (actual time=897.159..897.159 rows=1 loops=1)
       CTE paths
         -> Recursive Union (cost=0.00..581.31 rows=4667 width=76) (actual time=0.039..720.175 rows=138640 loops=1)
              -> Seq Scan on roads (cost=0.00..27.52 rows=7 width=28) (actual time=0.036..0.061 rows=5 loops=1)
                   Filter: (source = 'A'::bpchar)
                   Rows Removed by Filter: 75
              -> Hash Join (cost=2.28..46.04 rows=466 width=76) (actual time=9.528..38.388 rows=8665 loops=16)
                   Hash Cond: (r.source = p.dest)
                   Join Filter: (r.dest <> ALL (p.path))
                   -> Seq Scan on roads r (cost=0.00..24.00 rows=1400 width=28) (actual time=0.010..0.025 rows=80 loops=16)
                   -> Hash (cost=1.40..1.40 rows=70 width=56) (actual time=9.159..9.159 rows=8665 loops=16)
                        Buckets: 1024 Batches: 1 Memory Usage: 1kB
                        -> WorkTable Scan on paths p (cost=0.00..1.40 rows=70 width=56) (actual time=0.008..3.959 rows=8665 loops=16)
       -> Sort (cost=105.12..105.18 rows=23 width=56) (actual time=897.154..897.154 rows=1 loops=1)
            Sort Key: paths.length
            Sort Method: top-N heapsort Memory: 25kB
            -> CTE Scan on paths (cost=0.00..105.01 rows=23 width=56) (actual time=0.696..896.652 rows=912 loops=1)
                 Filter: (dest = 'L'::bpchar)
                 Rows Removed by Filter: 137728
     Total runtime: 900.970 ms
    (20 rows)
  • SRP: Materialized View
    CREATE MATERIALIZED VIEW paths AS
      WITH RECURSIVE paths (source, dest, length, path) AS (
          SELECT source, dest, length::float, '{}'::bpchar[]
            FROM roads
        UNION ALL
          SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
            FROM paths p, roads r
            WHERE p.dest = r.source AND not (r.dest = ANY(p.path))
      )
      SELECT * FROM paths;
    EXPLAIN ANALYZE SELECT * FROM paths WHERE source = 'A' and dest = 'L' ORDER BY length DESC LIMIT 1;
    QUERY PLAN
    ----------------------------------------------------------------------
     Limit (cost=10623.33..10623.33 rows=1 width=56) (actual time=125.326..125.327 rows=1 loops=1)
       -> Sort (cost=10623.33..10623.35 rows=10 width=56) (actual time=125.324..125.324 rows=1 loops=1)
            Sort Key: length
            Sort Method: top-N heapsort Memory: 25kB
            -> Seq Scan on paths (cost=0.00..10623.28 rows=10 width=56) (actual time=0.283..124.988 rows=912 loops=1)
                 Filter: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
                 Rows Removed by Filter: 281233
     Total runtime: 125.377 ms
    (8 rows)
  • SRP: MV with indexes
    CREATE INDEX i_paths_source on paths(source, dest);
    EXPLAIN ANALYZE SELECT * FROM paths WHERE source = 'A' and dest = 'L' ORDER BY length DESC LIMIT 1;
    QUERY PLAN
    ----------------------------------------------------------------------
     Limit (cost=31.80..31.80 rows=1 width=56) (actual time=1.265..1.265 rows=1 loops=1)
       -> Sort (cost=31.80..31.81 rows=7 width=56) (actual time=1.264..1.264 rows=1 loops=1)
            Sort Key: length
            Sort Method: top-N heapsort Memory: 25kB
            -> Bitmap Heap Scan on paths (cost=4.49..31.76 rows=7 width=56) (actual time=0.327..0.982 rows=912 loops=1)
                 Recheck Cond: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
                 -> Bitmap Index Scan on i_paths_source (cost=0.00..4.49 rows=7 width=0) (actual time=0.304..0.304 rows=912 loops=1)
                      Index Cond: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
     Total runtime: 1.317 ms
    (9 rows)
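A possible variation, not shown in the talk: adding length as a third index column lets PostgreSQL read the matching rows already ordered by length, so a shortest-route lookup can stop at the first row instead of sorting. The index name `i_paths_source_dest_length` and the three sample roads are hypothetical. The MV definition is the one from the talk.

```sql
BEGIN;

CREATE TABLE roads (source char, dest char, length numeric(5, 2));
-- Hypothetical sample data: A -> L directly (5) or via B (1 + 2 = 3).
INSERT INTO roads VALUES ('A', 'B', 1), ('B', 'L', 2), ('A', 'L', 5);

CREATE MATERIALIZED VIEW paths AS
  WITH RECURSIVE paths (source, dest, length, path) AS (
      SELECT source, dest, length::float, '{}'::bpchar[]
        FROM roads
    UNION ALL
      SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
        FROM paths p, roads r
        WHERE p.dest = r.source AND NOT (r.dest = ANY(p.path))
  )
  SELECT * FROM paths;

-- Assumed index: equality columns first, then the sort column.
CREATE INDEX i_paths_source_dest_length ON paths (source, dest, length);

-- Ascending order picks the shortest route.
SELECT * FROM paths
  WHERE source = 'A' AND dest = 'L'
  ORDER BY length
  LIMIT 1;
-- Returns the route via B, with length 3.

ROLLBACK;
```

On a table this small the planner will most likely still choose a sequential scan; whether the wider index pays off on real data is worth checking with EXPLAIN ANALYZE.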
  • SRP: latest data using triggers
    CREATE FUNCTION refresh_mvs() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        REFRESH MATERIALIZED VIEW paths;
        RETURN NULL;
    END;
    $$;
    CREATE TRIGGER paths_trig
        AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON roads
        FOR EACH STATEMENT EXECUTE PROCEDURE refresh_mvs();
  • SRP: latest data using triggers
    SELECT * FROM paths WHERE source = 'T';
     source | dest | length | path
    --------+------+--------+------
    (0 rows)
    EXPLAIN ANALYZE INSERT INTO roads VALUES ('T', 'Z', 100.4);
    QUERY PLAN
    ----------------------------------------------------------------------
     Insert on roads (cost=0.00..0.01 rows=1 width=0) (actual time=0.033..0.033 rows=0 loops=1)
       -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
     Trigger paths_trig: time=9080.960 calls=1
     Total runtime: 9081.028 ms
    (4 rows)
    SELECT * FROM paths WHERE source = 'T';
     source | dest | length | path
    --------+------+--------+------
     T      | Z    |  100.4 | {}
    (1 row)
  • Caching foreign data
    ● Materialized views on foreign tables
      – Data availability in case of foreign server failure
      – Faster data access
      – Possibly stale data
    ● Aggressive refresh
      – Triggers on foreign tables not supported
        ● Being discussed in the community
      – External method for firing REFRESH when foreign data changes
    ● Lazy refresh
      – Fire REFRESH periodically
  • Caching foreign data
    postgres=# \d+ remote_emp
                                    Foreign table "public.remote_emp"
     Column |         Type          | Modifiers | FDW Options | Storage  | Stats target | Description
    --------+-----------------------+-----------+-------------+----------+--------------+-------------
     empno  | numeric(4,0)          |           |             | main     |              |
     ename  | character varying(10) |           |             | extended |              |
     job    | character varying(10) |           |             | extended |              |
    Server: local_ppas
    FDW Options: (schema_name 'public', table_name 'emp')
    Has OIDs: no
    postgres=# create materialized view cached_remote_emp as select * from remote_emp;
    postgres=# explain analyze select * from cached_remote_emp;
    QUERY PLAN
    ----------------------------------------------------------------------
     Seq Scan on cached_remote_emp (cost=0.00..16.90 rows=690 width=88) (actual time=0.020..0.024 rows=14 loops=1)
     Planning time: 0.076 ms
     Total runtime: 0.068 ms
    (3 rows)
    postgres=# explain analyze select * from remote_emp;
    QUERY PLAN
    ----------------------------------------------------------------------
     Foreign Scan on remote_emp (cost=100.00..131.93 rows=731 width=88) (actual time=0.834..0.836 rows=14 loops=1)
     Planning time: 0.077 ms
     Total runtime: 1.451 ms
    (3 rows)
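The talk shows the foreign table but not how it was set up. The sketch below is one plausible setup, assuming the postgres_fdw extension (the schema_name and table_name options in the listing match the options it accepts). The host, port, database name and credentials are hypothetical placeholders, and the statements only work against a reachable remote server that has a public.emp table.

```sql
-- Assumption: the foreign data wrapper is postgres_fdw.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Hypothetical connection details; replace with real ones.
CREATE SERVER local_ppas
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5444', dbname 'edb');

-- Hypothetical remote credentials.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER local_ppas
    OPTIONS (user 'remote_user', password 'secret');

-- Columns and table options as in the \d+ listing.
CREATE FOREIGN TABLE remote_emp (
    empno numeric(4,0),
    ename varchar(10),
    job   varchar(10)
) SERVER local_ppas
  OPTIONS (schema_name 'public', table_name 'emp');

-- Local cache of the remote rows; it stays readable if the remote server is down.
CREATE MATERIALIZED VIEW cached_remote_emp AS
    SELECT * FROM remote_emp;

-- Lazy refresh: run this periodically from an external scheduler.
REFRESH MATERIALIZED VIEW cached_remote_emp;
```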
  • Thank you