PostgreSQL: Advanced features in practice

Transactional DDL, partial & function indexes, fuzzy string matching with trigram indexes, views, recursive/with queries and window functions.

    Presentation Transcript

    • PostgreSQL: Advanced features in practice. JÁN SUCHAL, 22.11.2011, @RUBYSLAVA
    • Why PostgreSQL? "The world’s most advanced open source database." Features!
      - Transactional DDL
      - Cost-based query optimizer + graphical EXPLAIN
      - Partial indexes
      - Function indexes
      - K-nearest search
      - Views
      - Recursive queries
      - Window functions
    • Transactional DDL
        class CreatePostsMigration < ActiveRecord::Migration
          def change
            create_table :posts do |t|
              t.string :name, null: false
              t.text :body, null: false
              t.references :author, null: false
              t.timestamps null: false
            end
            add_index :posts, :title, unique: true
          end
        end
      Where is the problem?
      - Column title does not exist!
      - Without transactional DDL, the table is created but the index is not. Oops!
      - On PostgreSQL, ActiveRecord runs each migration inside a transaction, so the failed migration rolls back as a whole, table included. Transactional DDL FTW!
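What "transactional" buys can be mimicked without a database. The Ruby sketch below is a toy: `ToySchema`, `broken_migration` and the snapshot-and-restore rollback are hypothetical illustrations, not ActiveRecord or PostgreSQL APIs (PostgreSQL gets this behavior because schema changes go through the same transaction machinery as ordinary writes). It replays the slide's mistake with and without a wrapping transaction.

```ruby
# ToySchema is a hypothetical in-memory stand-in for a database schema.
# It only illustrates all-or-nothing DDL; it is not an ActiveRecord or
# PostgreSQL API.
class ToySchema
  attr_reader :tables

  def initialize
    @tables = {}
  end

  def create_table(name, columns)
    raise ArgumentError, "relation #{name} already exists" if @tables.key?(name)

    @tables[name] = { columns: columns, indexes: [] }
  end

  def add_index(table, column)
    unless @tables.fetch(table)[:columns].include?(column)
      raise ArgumentError, "column #{column} does not exist"
    end

    @tables[table][:indexes] << column
  end

  # Runs the block against the live schema. If any step raises, the schema
  # is restored from a copy taken before the block started (the toy's
  # stand-in for ROLLBACK) and the error is re-raised.
  def transaction
    snapshot = Marshal.load(Marshal.dump(@tables))
    yield self
  rescue StandardError
    @tables = snapshot
    raise
  end
end

# Same mistake as the migration on the slide: the index references :title,
# but the table only has a :name column.
def broken_migration(schema)
  schema.create_table(:posts, %i[name body author_id created_at updated_at])
  schema.add_index(:posts, :title)
end

without_tx = ToySchema.new
begin
  broken_migration(without_tx)
rescue ArgumentError => e
  puts "without transaction: #{e.message}"
end
p without_tx.tables.keys # => [:posts] (table kept, index missing)

with_tx = ToySchema.new
begin
  with_tx.transaction { |s| broken_migration(s) }
rescue ArgumentError => e
  puts "with transaction: #{e.message}"
end
p with_tx.tables.keys # => [] (everything rolled back)
```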
    • Cost-based query optimizer
      What is the best plan to execute a given query?
      - Cost = estimated I/O + CPU work needed
      - Sequential vs. random page reads (seeks)
      - Join order
      - Join type (nested loop, hash join, merge join)
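A toy version of the trade-off between sequential and random reads, in Ruby. The five constants are PostgreSQL's default planner cost settings; the two cost formulas are simplified assumptions (the real index-scan estimate also accounts for caching and for how well the index order matches the table order), and the table size is made up.

```ruby
# PostgreSQL's default planner cost settings.
SEQ_PAGE_COST        = 1.0    # seq_page_cost
RANDOM_PAGE_COST     = 4.0    # random_page_cost
CPU_TUPLE_COST       = 0.01   # cpu_tuple_cost
CPU_INDEX_TUPLE_COST = 0.005  # cpu_index_tuple_cost
CPU_OPERATOR_COST    = 0.0025 # cpu_operator_cost

# Sequential scan: read every page in order, check the filter on every row.
def seq_scan_cost(pages, rows)
  pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)
end

# Simplified index scan (an assumption, not the planner's real formula):
# each matching row costs one random heap page read plus CPU work for its
# index entry and its heap row.
def index_scan_cost(matching_rows)
  matching_rows * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
end

# Pick whichever plan has the lower estimated cost.
def cheapest_plan(pages, rows, matching_rows)
  seq = seq_scan_cost(pages, rows)
  idx = index_scan_cost(matching_rows)
  idx < seq ? :index_scan : :seq_scan
end

# Hypothetical table: 1,000,000 rows stored in 10,000 pages.
p cheapest_plan(10_000, 1_000_000, 100)     # => :index_scan
p cheapest_plan(10_000, 1_000_000, 100_000) # => :seq_scan
```

With these numbers the index scan stops paying off at roughly 5,600 matching rows (22,500 / 4.015), well under 1% of the table, which is why the planner needs row-count estimates and not just the presence of an index.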
    • Graphical EXPLAIN in pgAdmin (www.pgadmin.org)
    • Partial indexes Conditional indexes Problem: Async job/queue table, find failed jobs  Create index on failed_at column  99% of index is never used
    • Partial indexes Conditional indexes Problem: Async job/queue table, find failed jobs  Create index on failed_at column  99% of index is never used Solution:CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at) WHERE failed_at IS NOT NULL;  smaller index  faster updates
    • Function Indexes Problem: Suffix search  SELECT … WHERE code LIKE ‘%123’
    • Function Indexes Problem: Suffix search  SELECT … WHERE code LIKE ‘%123’ “Solution”:  Add reverse_code column, populate, add triggers for updates, create index on reverse_code column  reverse queries WHERE reverse_code LIKE “321%”
    • Function Indexes Problem: Suffix search  SELECT … WHERE code LIKE ‘%123’ “Solution”:  Add reverse_code column, populate, add triggers for updates, create index on reverse_code column,  reverse queries WHERE reverse_code LIKE “321%” PostgreSQL solution: CREATE INDEX idx_reversed ON projects (reverse((code)::text) text_pattern_ops); SELECT … WHERE reverse(code) LIKE reverse(‘%123’)
    • K-nearest search
      Problem: fuzzy string matching, 900K rows
      Solution: n-gram/trigram search (pg_trgm extension)
      - johno = {"  j", " jo", "hno", "joh", "no ", "ohn"}
        CREATE INDEX idx_trgm_name ON subjects USING gist (name gist_trgm_ops);
        SELECT name, name <-> 'Michl Brla' AS dist FROM subjects ORDER BY dist ASC LIMIT 10;
      - <-> is the trigram distance (1 - similarity); the GiST index returns rows in distance order
      Result (312ms):
        "Michal Barla" ; 0.588235
        "Michal Bula"  ; 0.647059
        "Michal Broz"  ; 0.647059
        "Pavel Michl"  ; 0.647059
        "Michal Brna"  ; 0.647059
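A Ruby sketch of how trigram similarity is computed, approximating pg_trgm's documented rules (lowercase, split on non-alphanumeric characters, pad each word with two leading spaces and one trailing space). `trigrams`, `similarity`, `distance` and `nearest` are hypothetical helpers, and `nearest` is a brute-force scan rather than the GiST index's ordered search. Under these rules "Michl Brla" and "Michal Barla" share 7 of 17 distinct trigrams, giving the distance 1 - 7/17 ≈ 0.588235 on the slide; "Michal Bula" shares 6 of 17, giving ≈ 0.647059.

```ruby
require "set"

# Approximation of pg_trgm's trigram extraction: lowercase, split into
# alphanumeric words, pad each word as "  word ", take every 3-char window.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by all distinct trigrams of both strings.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

# Mirrors the <-> operator: distance = 1 - similarity.
def distance(a, b)
  1.0 - similarity(a, b)
end

# Brute-force k-nearest search (ties broken by name); the GiST index
# exists precisely to avoid computing the distance for every row.
def nearest(names, query, k)
  names.sort_by { |n| [distance(query, n), n] }.first(k)
end

names = ["Michal Barla", "Michal Bula", "Pavel Michl", "Johno Suchal", "Peter Novak"]
p trigrams("johno").sort                             # => ["  j", " jo", "hno", "joh", "no ", "ohn"]
puts distance("Michl Brla", "Michal Barla").round(6) # => 0.588235
p nearest(names, "Michl Brla", 3)                    # => ["Michal Barla", "Michal Bula", "Pavel Michl"]
```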
    • Views Constraints propagated down to viewsCREATE VIEW edges AS SELECT subject_id AS source_id, connected_subject_id AS target_id FROM raw_connections UNION ALL SELECT connected_subject_id AS source_id, subject_id AS target_id FROM raw_connections; SELECT * FROM edges WHERE source_id = 123; SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10 No materialization, 2x indexed select + 1x append/merge
    • Views Constraints propagated down to viewsCREATE VIEW edges AS SELECT subject_id AS source_id, connected_subject_id AS target_id FROM raw_connections UNION ALL SELECT connected_subject_id AS source_id, subject_id AS target_id FROM raw_connections; SELECT * FROM edges WHERE source_id = 123; SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10  No materialization, 2x indexed select + 1x append/merge
    • Recursive Queries Problem: Find paths between two nodes in graphWITH RECURSIVE search_graph(source,target,distance,path) AS( SELECT source_id, target_id, 1, ARRAY[source_id, target_id] FROM edges WHERE source_id = 552506 UNION ALL SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id] FROM search_graph sg JOIN edges e ON sg.target = e.source_id WHERE NOT e.target_id = ANY(path) AND distance < 4)SELECT * FROM search_graph LIMIT 100
    • Recursive Queries Problem: Find paths between two nodes in graphWITH RECURSIVE search_graph(source,target,distance,path) AS( SELECT source_id, target_id, 1, ARRAY[source_id, target_id] FROM edges WHERE source_id = 552506 UNION ALL SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id] FROM search_graph sg JOIN edges e ON sg.target = e.source_id WHERE NOT e.target_id = ANY(path) AND distance < 4)SELECT * FROM search_graph LIMIT 100
    • Recursive Queries Problem: Find paths between two nodes in graphWITH RECURSIVE search_graph(source,target,distance,path) AS( SELECT source_id, target_id, 1, ARRAY[source_id, target_id] FROM edges WHERE source_id = 552506 UNION ALL SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id] FROM search_graph sg JOIN edges e ON sg.target = e.source_id WHERE NOT e.target_id = ANY(path) AND distance < 4)SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
    • Recursive Queries Problem: Find paths between two nodes in graphWITH RECURSIVE search_graph(source,target,distance,path) AS( SELECT source_id, target_id, 1, ARRAY[source_id, target_id] FROM edges WHERE source_id = 552506 UNION ALL SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id] FROM search_graph sg JOIN edges e ON sg.target = e.source_id WHERE NOT e.target_id = ANY(path) AND distance < 4)SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
    • Recursive Queries Problem: Find paths between two nodes in graphWITH RECURSIVE search_graph(source,target,distance,path) AS( SELECT source_id, target_id, 1, ARRAY[source_id, target_id] FROM edges WHERE source_id = 552506 UNION ALL SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id] FROM search_graph sg JOIN edges e ON sg.target = e.source_id WHERE NOT e.target_id = ANY(path) AND distance < 4)SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
    • Recursive queries Graph with ~1M edges (61ms) source; target; distance; path 530556; 552506; 2; {530556,185423,552506}  JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek 530556; 552506; 2; {530556,183291,552506}  JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek 530556; 552506; 4; {530556,183291,552522,185423,552506}  JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Lena Sisková -> FoRest s.r.o. -> Ing. Ján Počiatek
    • Window functions “Aggregate functions without grouping”  avg, count, sum, rank, row_number, ntile… Problem: Find closest nodes to a given node Order by sum of path scores Path score = 0.9^<distance> / log(1 + <number of paths>)SELECT source, target FROM ( SELECT source, target, path, distance, 0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance,target) ) AS score FROM ( … ) AS paths) as scored_pathsGROUP BY source, target ORDER BY SUM(score) DESC
    • Window functions “Aggregate functions without grouping”  avg, count, sum, rank, row_number, ntile… Problem: Find closest nodes to a given node  Order by sum of path scores  Path score = 0.9^<distance> / log(1 + <number of paths>)SELECT source, target FROM ( SELECT source, target, path, distance, 0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance,target) ) AS score FROM ( … ) AS paths) as scored_pathsGROUP BY source, target ORDER BY SUM(score) DESC
    • Window functions “Aggregate functions without grouping”  avg, count, sum, rank, row_number, ntile… Problem: Find closest nodes to a given node  Order by sum of path scores  Path score = 0.9^<distance> / log(1 + <number of paths>)SELECT source, target FROM ( SELECT source, target, path, distance, 0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance, target) ) AS n FROM ( … ) AS paths) as scored_pathsGROUP BY source, target ORDER BY SUM(score) DESC
    • Window functions “Aggregate functions without grouping”  avg, count, sum, rank, row_number, ntile… Problem: Find closest nodes to a given node  Order by sum of path scores  Path score = 0.9^<distance> / log(1 + <number of paths>)SELECT source, target FROM ( SELECT source, target, path, distance, 0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance, target) ) AS score FROM ( … ) AS paths) as scored_pathsGROUP BY source, target ORDER BY SUM(score) DESC
    • Window functions “Aggregate functions without grouping”  avg, count, sum, rank, row_number, ntile… Problem: Find closest nodes to a given node  Order by sum of path scores  Path score = 0.9^<distance> / log(1 + <number of paths>)SELECT source, target FROM ( SELECT source, target, path, distance, 0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance, target) ) AS score FROM ( … ) AS paths) AS scored_pathsGROUP BY source, target ORDER BY SUM(score) DESC
    • Window functions Example: Closest to Róbert Kaliňák "Bussines Park Bratislava a.s." "JARABINY a.s." "Ing. Robert Pintér" "Ing. Ján Počiatek" "Bratislava trade center a.s.“ … 1M edges, 41ms
    • Additional resources
      - www.postgresql.org: read the docs, seriously
      - www.explainextended.com: SQL guru blog
      - explain.depesz.com: first aid for slow queries
      - www.wikivs.com/wiki/MySQL_vs_PostgreSQL: MySQL vs. PostgreSQL comparison
    • Real World Explain www.postgresql.org