Indexing Complex PostgreSQL Data Types
Upcoming SlideShare
Loading in...5
×
 

Indexing Complex PostgreSQL Data Types

on

  • 6,035 views

PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special ...

PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?

Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.

Statistics

Views

Total Views
6,035
Views on SlideShare
5,915
Embed Views
120

Actions

Likes
8
Downloads
106
Comments
0

2 Embeds 120

https://twitter.com 108
http://www.redditmedia.com 12

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • 001.sql

Indexing Complex PostgreSQL Data Types Indexing Complex PostgreSQL Data Types Presentation Transcript

  • Indexing  Complex  PostgreSQL   Data  Types Jonathan  S.  Katz  -­‐  PGDay  UK  2014  -­‐  July  9,  2014
  • About • Jonathan  S.  Katz   – CTO,  VenueBook   – Co-­‐Organizer,  NYC  PostgreSQL  User  Group   – PGConf  NYC  2015   • Mar  25  -­‐  27,  2015   • New  York  Marriott  Downtown   • http://nyc.pgconf.us   – @jkatz05 2
  • Quick  Overview • Introductory  Talk  with  demos  and  fun   • B-­‐trees   • GiST:  Generalized  Search  Trees   • GIN:  Generalized  Inverted  Index   • SP-­‐GiST:  Space  Partitioned  Generalized  Search   Trees 3 View slide
  • Assumptions • PostgreSQL  9.3+   • most  will  be  9.0+   • PostGIS  2.0+   • Believe  it  will  work  for  most  available   versions   • PostgreSQL  9.4  beta? 4 View slide
  • The  Beginning 5
  • The  Beginning 6 4 3 8 7 2 9 1 6 10 6
  • Find  3 7 4 3 8 7 2 9 1 6 10 5
  • Find  3 8 4 3 8 7 2 9 1 6 10 5
  • Find  3 9 4 3 8 7 2 9 1 6 10 5
  • Find  10 10 4 3 8 7 2 9 1 6 10 5
  • Find  10 11 4 3 8 7 2 9 1 6 10 5
  • Find  10 12 4 3 8 7 2 9 1 6 10 5
  • Find  10 13 4 3 8 7 2 9 1 6 10 5
  • Find  10 14 4 3 8 7 2 9 1 6 10 5
  • Find  10 15 4 3 8 7 2 9 1 6 10 5
  • Find  10 16 4 3 8 7 2 9 1 6 10 5
  • Find  10 17 4 3 8 7 2 9 1 6 10 5
  • Find  10 18 4 3 8 7 2 9 1 6 10 5
  • Find  10 19 4 3 8 7 2 9 1 6 10 5
  • What  We  Learned • Without  any  data  structure  around  search,  we   rely  on  "hope"   • Assumed  "unique  values"  and  "equality"   – would  have  to  scan  all  rows  otherwise   • …and  what  about:   – INSERT   – UPDATE   – DELETE 20
  • What  We  Need • Need  a  data  structure  for  search  that:   – allows  efficient  lookups   – plays  nicely  with  disk  I/O   – does  not  take  too  long  for  updates   21
  • B-­‐Trees • "default"  index   • quick  traversal  to  leaf  nodes   • leaf  nodes  pre-­‐sorted   • node  size  designed  to  fit  in  disk  block  size   – "degree"  of  nodes  has  max-­‐size   • theoretical  performance   – reads:  O(log  n)   – writes:  O(log  n)   – space:  O(n) 22
  • B-­‐Trees  and  PostgreSQL • supports   – <=,  <,  =,  >,  >=   – BETWEEN,  IN   – IS  NOT  NULL,  IS  NULL   – LIKE  in  specific  case  of  ‘plaintext%’   – ~  in  specific  case  of  ‘^plaintext’   – ILIKE  and  ~*  if  pattern  starts  with  nonalpha  characters   • does  not  support   • IS  NOT  DISTINCT  FROM 23
  • B-­‐Trees  and  PostgreSQL • data  types  supported   – any  data  type  with  all  the  equality  operators  defined   – number  types   • integer,  numeric,  decimal   – text   • char,  varchar,  text   – date  /  times   • timestamptz,  timestamp,  date,  time,  timetz,  interval   - arrays,  ranges 24
  • Demo  Specs  +  Configuration • Hardware  specs   – 2.3GHz  Intel  i7  2x4  core   – 16GB  RAM  DDR3  1600MHz   – Apple  SSD  6Gbps   • postgresql.conf  (9.3,  9.4  beta  1)   – shared_buffers  =  1GB   – work_mem  =  64MB   – maintenance_work_mem  =  1024MB   – effective_cache_size  =  8GB 25
  • Demo  #1:  Basic  Indexing  Plans 26
  • Demo  #1  Notes • Index  maintenance   • VACUUM  –  "cleans  up"  after  writes  on  table  / indexes   – ANALYZE  –  keeps  statistics  up-­‐to-­‐date  for  planner   ! VACUUM ANALYZE tablename;   ! • Good  idea  to  leave  autovacuum  on 27
  • Indexing  in  Production • CREATE  INDEX  CONCURRENTLY   • REINDEX   – corruption,  bloat,  invalid   • FILLFACTOR   – 10  –  100   – default:  90   – strategy:  lower  %  ::  write-­‐activity   • TABLESPACE   • NULLS  LAST,  NULLS  FIRST 28
  • Demo  #2:  Partial  Indexes 29 CREATE INDEX indexname ON tablename (columnname) WHERE somecondition;
  • Demo  #2  Notes • Partial  Indexes  are   – good  if  known  to  query  limited  subset  of  table   – take  up  less  space   – allow  for  much  quicker  writes   • Like  all  good  things,  do  not  overuse  and   saturate  your  I/O 30
  • Unique  Indexes • only  for  B-­‐trees   • NULL  not  unique   • use  UNIQUE  constraints  –  automatically  create   indexes   ! CREATE TABLE foo (bar int UNIQUE);   -- or   CREATE UNIQUE INDEX foo_bar_idx ON foo (bar);   ALTER TABLE foo ADD CONSTRAINT a_unique USING INDEX a_idx; 31
  • Multi-­‐Column  Indexes • Useful  for   – querying  two  columns  that  are  frequently  queried   together   – enforcing  UNIQUEness  across  columns   • n.b.  creating  UNIQUE  constraint  on  table  creates  UNIQUE   INDEX   • PostgreSQL  supports   – up  to  32  columns   – B-­‐tree,  GiST,  GIN   • Be  careful  of  how  you  choose  initial  column  order! 32
  • Multi-­‐Column  Indexes 33 CREATE INDEX multicolumn_idx ON tablename (col1, col2); ! ! ! CREATE UNIQUE INDEX pn_idx ON phone_numbers (country_code, national_number) WHERE extension IS NULL
  • Demo  #3  Notes • Multi-­‐column  indexes  can  be   – efficient  for  speed  +  space   – inefficient  with  performance   • Usage  depends  on  your  application  needs 34
  • Expression  Indexes • can  index  on  expressions  to  speed  up  lookups   – e.g.  case  insensitive  email  addresses   – can  use  functions  or  scalars   • (x * y) / 100   • COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')   • LOWER(email_address)   • tradeoff:  slower  writes 35
  • Demo  #4:  Expression  Indexes 36
  • Demo  #4  Notes • fast  lookups,  slow  updates   • size  on  disk   • cool  data  types   – geometric   – JSON 37
  • Geometric  Data  Types 38
  • Geometric  Data  Types CREATE TABLE points (coord point); ! CREATE INDEX points_idx ON points (coord); ERROR: data type point has no default operator class for access method "btree" HINT: You must specify an operator class for the index or define a default operator class for the data type. 39
  • GiST • "generalized  search  tree"   • infrastructure  that  provides  template  to  create  arbitrary   indexing  schemes   – supports  concurrency,  logging,  searching  –  only  have  to  define   behavior   – user-­‐defined  operator  class   • <<,  &<,  &>,  >>,  <<|,  &<|,  |&>,  |>>,  @>,  <@,  ~=,  &&   – have  to  implement  functions  in  interface   • supports  lossless  +  lossy  indexes   • provides  support  for  "nearest-­‐neighbor"  queries  –   "KNN-­‐Gist" 40 CREATE INDEX points_coord_gist_idx ON points USING gist(coord)
  • Demo  #5:  Geometric  GiST  +  KNN-­‐Gist 41
  • Demo  #5  Notes • GiST  indexes  on  geometric  types  radically   speedup  reads   • Writes  are  slower  due  to  distance  calculation   • Index  size  can  be  very  big 42
  • PostGIS • For  when  you  are  doing  real  things  with  shapes 43• (and  geographic  information  systems)
  • PostGIS  +  Indexes • B-­‐Tree?   • R-­‐Tree?   • PostGIS  docs  do  not  recommend  using  just   an  R-­‐Tree  index   • GiST   • overlaps!    containment!   • uses  a  combination  of  GiST  +  R-­‐Tree 44
  • PostGIS  +  GiST 45 2-­‐D CREATE INDEX zipcodes_geom_gist_idx ON zipcodes USING gist(geom); N-­‐D  (PostGIS  2.0+ CREATE INDEX zipcodes_geom_gist_idx ON zipcodes USING gist(geom gist_geometry_ops_nd);
  • Example  -­‐  USA  Zipcode  Boundaries • 33,120  rows   • geom:  MultiPolygon   • 52MB  without  indexes   • With  geometry  GiST  +  integer  B-­‐Tree:  869MB 46
  • What  Zipcode  Is  My  Office  In? • Geocoded  Address   • Lat,Long  =  40.7356197,-­‐73.9891102   • PostGIS:  POINT(-­‐73.9891102  40.7356197)   • 4269  -­‐  “SRID”  -­‐  unique  ID  for  coordinate  system   definitions 47 SELECT zcta5ce10 AS zipcode FROM zipcodes WHERE ST_Contains( geom, --MultiPolygon ST_GeomFromText('POINT(-73.9891102 40.7356197)', 4269) );
  • What  Zipcode  Is  My  Office  In? • No  Index 48 Seq Scan on zipcodes (cost=0.00..15382.00 rows=1 width=6) (actual time=64.780..5153.485 rows=1 loops=1) Filter: ((geom && '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry) AND _st_contains(geom, '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)) Rows Removed by Filter: 33119 Total runtime: 5153.505 ms
  • What  Zipcode  Is  My  Office  In? • Here’s  the  GiST: 49 Index Scan using zipcodes_geom_gist on zipcodes (cost=0.28..8.54 rows=1 width=6) (actual time=0.120..0.207 rows=1 loops=1) Index Cond: (geom && '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry) Filter: _st_contains(geom, '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry) Rows Removed by Filter: 1 ! Total runtime: 0.235 ms
  • PostGIS  +  GiST  Conclusion • 5153ms  =>  0.235ms  =  21,900x  speedup  :-­‐) 50 SELECT zcta5ce10 AS zipcode FROM zipcodes WHERE ST_Contains( geom, ST_GeomFromText('POINT(-73.9891102 40.7356197)', 4269) ); ! zipcode --------- 10003
  • Full  Text  Search • PostgreSQL  offers  full  text  search  with  the   tsearch2  engine   – algorithms  for  performing  FTS   – to_tsvector('english',  content)  @@   to_tsquery('irish  &  conference  |  meeting')   – provides  indexing  capabilities  for  efficient  search 51
  • Test  Data  Set • Wikipedia  English  category  titles  –  all   1,823,644  that  I  downloaded 52
  • Full-­‐Text  Search:  Basics 53 SELECT title FROM category WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom’); ! title ----- Lists of railway stations in the United Kingdom Political history of the United Kingdom Military of the United Kingdom United Kingdom constitution Television channels in the United Kingdom United Kingdom Roman Catholic secondary schools in the United Kingdom [results truncated] ! ! QUERY PLAN ------------ Seq Scan on category (cost=0.00..49262.77 rows=46 width=29) (actual time=21.900..16809.890 rows=8810 loops=1) Filter: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text)) Rows Removed by Filter: 1814834 ! Total runtime: 16811.108 ms
  • Full-­‐Text  Search  +  GiST 54 CREATE INDEX category_title_gist_idx ON category USING gist(to_tsvector('english', title)); ! SELECT title FROM category WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom'); QUERY PLAN ------------- Bitmap Heap Scan on category (cost=4.77..182.47 rows=46 width=29) (actual time=75.517..180.650 rows=8810 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text)) -> Bitmap Index Scan on category_title_gist_idx (cost=0.00..4.76 rows=46 width=0) (actual time=74.687..74.687 rows=8810 loops=1) Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom’::text)) ! Total runtime: 181.354 ms
  • Full  Text  Search  +  GiST • GiST  indexes  can  produce  false  positives   – "documents"  represented  by  fixed  length  signature   • words  are  hashed  into  single  bits  and  concatenated   – when  false  positive  occurs,  row  is  returned  and   checked  to  see  if  false  match   • Extra  validations  =  performance  degradation 55
  • Performance  Summary  with  GiST • initial  index  build  takes  awhile  =>  slow  writes   • reads  are  quick   • Table  size:  271MB   • Index  size:  83MB 56
  • GIN  Index • "generalized  inverted  index"   • supports  searching  within  composite  data   – arrays,  full-­‐text  documents,  hstore   • key  is  stored  once  and  points  to  composites  it  is   contained  in   • like  GiST,  provides  index  infrastructure  to  extend  GIN   based  on  behavior   – supports  operators  <@,  @>,  =,  &&   • GIN  performance  ⬄  log(#  unique  things) 57
  • Full  Text  Search  +  GIN 58 CREATE INDEX category_title_gin_idx ON category USING gin(to_tsvector('english', title)); ! EXPLAIN ANALYZE SELECT title FROM category WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom'); ! QUERY PLAN ------- Bitmap Heap Scan on category (cost=28.36..206.06 rows=46 width=29) (actual time=8.864..14.674 rows=8810 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text)) -> Bitmap Index Scan on category_title_gin_idx (cost=0.00..28.35 rows=46 width=0) (actual time=7.905..7.905 rows=8810 loops=1) Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text)) ! ! Total runtime: 15.157 ms
  • Performance  Summary  with  GIN • index  build  was  much  quicker   • significant  speedup  from  no  index   – (12,000ms  =>  15ms)   • significant  speedup  from  GiST   – (181ms  =>  15ms)   • Table  size:  271MB   • Index  size:   • 9.3:  71MB   • 9.4  beta  1:    40MB 59
  • What  Was  Not  Discussed • Word  density   – prior  to  9.3,  performance  issues  with  greater  word   density   • Type  of  text  data  –  phrases  vs  paragraphs 60
  • Full  Text  Search  –  GiST  vs  GIN • Reads   – overall,  GIN  should  win   • Writes   – traditionally,  GiST  has  better  performance  for   writes   – GIN   • FASTUPDATE   • 9.4:  compression 61
  • Regular  Expression  Indexes • Added  in  9.3   • Support  for  LIKE/ILIKE  wildcard  indexes  in  9.1   – title  LIKE  '%ab%e'   • Uses  pg_trgm  extension  +  GIN   ! CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX category_title_regex_idx ON category USING GIN(title gin_trgm_ops); 62
  • Regular  Expressions  -­‐  No  Index 63 EXPLAIN ANALYZE SELECT title FROM category WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$'; ! QUERY PLAN ---------- Seq Scan on category (cost=0.00..40144.55 rows=182 width=29) (actual time=2.509..4260.792 rows=5878 loops=1) Filter: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text) Rows Removed by Filter: 1817766 ! Total runtime: 4261.204 ms
  • Regular  Expressions  -­‐  Indexed 64 CREATE INDEX category_title_regex_idx ON category USING gin(title gin_trgm_ops); ! EXPLAIN ANALYZE SELECT title FROM category WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$'; QUERY PLAN ----------- Bitmap Heap Scan on category (cost=197.41..871.77 rows=182 width=29) (actual time=107.445..146.713 rows=5878 loops=1) Recheck Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text) Rows Removed by Index Recheck: 4712 -> Bitmap Index Scan on category_title_regex_idx (cost=0.00..197.37 rows=182 width=0) (actual time=106.645..106.645 rows=10590 loops=1) Index Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text) ! ! Total runtime: 147.026 ms
  • Range  Types • stores  range  data   –  1  to  6   – 2013-­‐10-­‐29  –  2013-­‐11-­‐2   • easy-­‐to-­‐use  operators  to  check  inclusion,   overlaps   • built-­‐in  types:  integers,  numerics,  dates,   timestamps   • extensible 65
  • Range  Type  Examples 66 --find all ranges that overlap with [100, 200) ! SELECT * FROM ranges WHERE int4range(100, 200) && range; ! range ----------- [10,102) [13,102) [18,108) [32,101) [34,134) [37,123) [43,111) [46,132) [48,107) [results trunctated] !QUERY PLAN ----------- Seq Scan on ranges (cost=0.00..14239.86 rows=7073 width=32) (actual time=0.018..185.411 rows=143 loops Filter: ('[100,200)'::int4range && range) Rows Removed by Filter: 999857 ! Total runtime: 185.439 ms
  • Range  Types  +  GiST 67 CREATE INDEX ranges_range_gist_idx ON ranges USING gist(range); ! EXPLAIN ANALYZE SELECT * FROM ranges WHERE int4range(100, 200) && range; ! QUERY PLAN ------------ Bitmap Heap Scan on ranges (cost=5.29..463.10 rows=130 width=13) (actual time=0.120..0.135 rows=144 loops=1) Recheck Cond: ('[100,200)'::int4range && range) -> Bitmap Index Scan on ranges_range_gist_idx (cost=0.00..5.26 rows=130 width=0) (actual time=0.109..0.109 rows=144 loops=1) Index Cond: ('[100,200)'::int4range && range) ! ! Total runtime: 0.168 ms
  • SP-­‐GiST • space-­‐partitioned  generalized  search  tree   • ideal  for  non-­‐balanced  data  structures   – k-­‐d  trees,  quad-­‐trees,  suffix  trees   – divides  search  space  into  partitions  of  unequal  size   • matching  partitioning  rule  =  fast  search   • traditionally  for  "in-­‐memory"  transactions,   converted  to  play  nicely  with  I/O 68
  • Range  Types:  GiST  vs  SP-­‐Gist CREATE TABLE ranges AS SELECT int4range( (random()*5)::int, (random()*5)::int + 5 ) AS range FROM generate_series(1,<N>) x; ! SELECT * FROM ranges WHERE range <operator> int4range(3,6); 69
  • N  =  1,000,000 70 CREATE INDEX ranges_range_spgist_idx ON ranges USING spgist(range); ERROR: unexpected spgdoinsert() failure Fixed in 9.3.2 GiST  Used GiST  Time SP-Gist Used SP-GiST Time= Yes 121 Yes 37 && No 257  No 260 @> No 223 No 223 <@ Yes 163  Yes 111 << Yes 95 Yes 5 >> Yes 95 Yes 25 &< No 184 No 185 &> No 203 No 203
  • Range  Types:  GiST  vs  SP-­‐GiST CREATE TABLE ranges AS SELECT int4range(x, x + (random()*5)::int + 5) AS range FROM generate_series(1,<N>) x; 71
  • N  =  250,000 72 GiST  Used GiST  Time SP-­‐GiST  Used SP-­‐GiST  Time = Yes 0.5 Yes 0.7 && Yes 0.3 Yes 0.3 @> Yes 0.3 Yes 0.3 <@ Yes 0.06 Yes 0.25 << No 40 Yes 0.2 >> No 60 No 60 &< Yes 0.3 Yes 0.2 &> No 74 No 61
  • GiST  vs  SP-­‐GiST:  Space 73 GiST  Clustered SP-­‐GiST  Clustered GiST  Sparse SP-­‐GiST  Sparse 100K  Size 6MB 5MB 6MB 11MB 100K  Time 0.5s .4s 2.5s 7.8s 250K  Size 15MB 12MB 15MB 28MB 250K  Time 1.5s 1.1s 6.3s 47.2s 500K  Size 30MB 25MB 30MB 55MB 500K  Time 3.1s 3.0s 13.9s 192s 1MM  Size 59MB 52MB! 60MB 110MB 1MM  Time 5.1s 5.7s 29.2 777s
  • Integer  Arrays 74 CREATE UNLOGGED TABLE int_arrays AS SELECT ARRAY[x, x + 1, x + 2] AS data FROM generate_series(1,1000000) x; ! CREATE INDEX int_arrays_data_idx ON int_arrays (data); ! CREATE INDEX int_arrays_data_gin_idx ON int_arrays USING GIN(data);
  • B-­‐Tree(?)  +  Integer  Arrays 75 EXPLAIN ANALYZE SELECT * FROM int_arrays WHERE 5432 = ANY (data); QUERY PLAN ----------- Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual time=1.260..159.197 rows=3 loops=1) Filter: (5432 = ANY (data)) Rows Removed by Filter: 999997 ! Total runtime: 159.222 ms
  • GIN  +  Integer  Arrays 76 EXPLAIN ANALYZE SELECT * FROM int_arrays WHERE ARRAY[5432] <@ data; QUERY PLAN ----------- Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33) (actual time=0.020..0.021 rows=3 loops=1) Recheck Cond: ('{5432}'::integer[] <@ data) -> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50 rows=5000 width=0) (actual time=0.014..0.014 rows=3 loops=1) Index Cond: ('{5432}'::integer[] <@ data) ! Total runtime: 0.045 ms
  • Hash  Indexes • only  work  with  "="  operator   • are  still  not  WAL  logged  as  of  9.4  beta  1   – not  crash  safe   – not  replicated 77
  • btree_gin 78 CREATE EXTENSION IF NOT EXISTS btree_gin; ! CREATE UNLOGGED TABLE numbers AS SELECT (random() * 2000)::int AS a FROM generate_series(1, 2000000) x; ! CREATE INDEX numbers_gin_idx ON numbers USING gin(a); ! EXPLAIN ANALYZE SELECT * FROM numbers WHERE a = 1000; ! QUERY PLAN ------------ Bitmap Heap Scan on numbers (cost=113.50..9509.26 rows=10000 width=4) (actual time=0.388..1.459 rows=991 loops=1) Recheck Cond: (a = 1000) -> Bitmap Index Scan on numbers_gin_idx (cost=0.00..111.00 rows=10000 width=0) (actual time=0.232..0.232 rows=991 loops=1) Index Cond: (a = 1000) ! Total runtime: 1.563 ms
  • btree_gin  vs  btree 79 -- btree SELECT pg_size_pretty(pg_total_relation_size('numbers_idx')); pg_size_pretty ---------------- 43 MB ! ! ! -- GIN SELECT pg_size_pretty(pg_total_relation_size('numbers_gin_idx')); pg_size_pretty ---------------- 16 MB • Only  use  GIN  over  btree  if  you  have  a  lot  of  duplicate  entries
  • hstore  -­‐  the  PostgreSQL  Key-­‐Value   Store 80 CREATE EXTENSION IF NOT EXISTS hstore; ! CREATE UNLOGGED TABLE keypairs AS SELECT (x || ' => ' || (x + (random() * 5)::int))::hstore AS data FROM generate_series(1,1000000) x; SELECT pg_size_pretty(pg_relation_size('keypairs')); ! ! SELECT * FROM keypairs WHERE data ? ‘3'; data ---------- "3"=>"4" ! EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? ‘3'; QUERY PLAN ----------- Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.065..208.808 rows=1 loops=1) Filter: (data ? '3'::text) Rows Removed by Filter: 999999 ! Total runtime: 208.825 ms
  • hstore  -­‐  the  PostgreSQL  Key-­‐Value   Store 81 CREATE INDEX keypairs_data_gin_idx ON keypairs USING gin(data); ! EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? ‘3'; ! QUERY PLAN ----------- Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual time=0.044..0.045 rows=1 loops=1) Recheck Cond: (data ? '3'::text) -> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.039..0.039 rows=1 loops=1) Index Cond: (data ? '3'::text) ! Total runtime: 0.071 ms
  • JSONB:  Coming  in  9.4 82 INSERT INTO documents SELECT row_to_json(ROW(x, x + 2, x + 3))::jsonb FROM generate_series(1,1000000) x; ! ! CREATE INDEX documents_data_gin_idx ON documents USING gin(data jsonb_path_ops); ! ! ! SELECT * FROM documents WHERE data @> '{ "f1": 10 }'; data -------------------------------- {"f1": 10, "f2": 12, "f3": 13} ! ! Execution time: 0.084 ms
  • Awesome  vs  WTF:   A  Note  On  Operator  Indexability 83 EXPLAIN ANALYZE SELECT * FROM documents WHERE data @> '{ "f1": 10 }'; ! QUERY PLAN ----------- Bitmap Heap Scan on documents (cost=27.75..3082.65 rows=1000 width=66) (actual time=0.029..0.030 rows=1 loops=1) Recheck Cond: (data @> '{"f1": 10}'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on documents_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.014..0.014 rows=1 loops=1) Index Cond: (data @> '{"f1": 10}'::jsonb) ! Execution time: 0.084 ms ! EXPLAIN ANALYZE SELECT * FROM documents WHERE '{ "f1": 10 }' <@ data; ! QUERY PLAN ----------- Seq Scan on documents (cost=0.00..24846.00 rows=1000 width=66) (actual time=0.015..245.924 rows=1 loops=1) Filter: ('{"f1": 10}'::jsonb <@ data) Rows Removed by Filter: 999999 ! Execution time: 245.947 ms
  • For  More  Information… • http://www.postgresql.org/docs/current/static/ indexes.html   • http://www.postgresql.org/docs/current/static/ gist.html   • http://www.postgresql.org/docs/current/static/ gin.html   • http://www.postgresql.org/docs/current/static/ spgist.html   • GiST  +  GIN  +  Full  Text  Search:   – http://www.postgresql.org/docs/current/static/textsearch-­‐ indexes.html 84
  • Conclusion • Postgres  has  *a  lot*  of  different  types  of   indexes,  and  variations  on  each  of  its  engines   • Extensions  make  use  of  PostgreSQL  indexes   – PostGIS   • Need  to  understand  where  index  usage  is   appropriate  in  your  application 85
  • Thanks! • http://nyc.pgconf.us   • Twitter:  @jkatz05 86