[Pgday.Seoul 2017] 6. GIN vs. GiST: An Index Story - Jinwoo Park (박진우)

  1. GIN vs. GiST: An Index Story. Gaia3D, Inc., Jinwoo Park (swat018@gmail.com), November 4, 2017
  2. Contents: 1. Index 2. Heap 3. B-tree and GIN 4. R-tree and GiST 5. Summary
  3. Why Index?
  4. Why Index? Spatial index, visibility index, full-text search
  5. Index
  6. Index
  7. Index
     An index holds mapping information for the specified column.
     Example: CREATE INDEX test1_id_index ON test1 (id);
  8. Index
     PostgreSQL supports the following index types:
     • B-tree: numbers, text, dates, etc.
     • Generalized Inverted Index (GIN)
     • Generalized Search Tree (GiST)
     • Space-partitioned GiST (SP-GiST)
     • Block Range Index (BRIN)
     • Hash
  9. Heap
     What is a heap? The form in which a table is stored, with rows kept in no particular sort order.
     [Figure: a table file laid out as Block 0 through Block 4, each block holding items 0 to 3]
  10. Heap
      TID: the physical location of a heap tuple.
      Example: Berlin is item 2 of block 0.
      Item pointer: Berlin → (0,2)
      [Figure: Block 0 through Block 4, each block holding items 0 to 3]
  11. Heap
      • A table file consists of n blocks.
      • The default page size per block is 8192 bytes (8 KB).
      • A page consists of header info, record data, and free space.
  12. Heap
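The TID idea from the heap slides can be sketched in a few lines of Python. This is a toy model, not PostgreSQL's storage code: the `Heap` class, the fixed `ITEMS_PER_BLOCK = 4`, and the city names other than Berlin are assumptions chosen to match the figure, and it follows the slides' 0-based item numbering. Real pages hold a variable number of tuples.

```python
from typing import List, Tuple

TID = Tuple[int, int]  # (block number, item number within the block)

ITEMS_PER_BLOCK = 4  # assumption: matches the 4 slots per block drawn on the slides


class Heap:
    """Toy heap: rows are appended in arrival order, with no sorting."""

    def __init__(self) -> None:
        self.blocks: List[List[str]] = []

    def insert(self, row: str) -> TID:
        # Open a new block when none exists yet or the last one is full.
        if not self.blocks or len(self.blocks[-1]) == ITEMS_PER_BLOCK:
            self.blocks.append([])
        self.blocks[-1].append(row)
        return (len(self.blocks) - 1, len(self.blocks[-1]) - 1)

    def fetch(self, tid: TID) -> str:
        # A TID goes straight to the tuple: pick the block, then the item.
        block, item = tid
        return self.blocks[block][item]


heap = Heap()
cities = ["Paris", "Rome", "Berlin", "Oslo", "Seoul"]  # hypothetical rows
tids = {city: heap.insert(city) for city in cities}

print(tids["Berlin"])        # (0, 2): block 0, item 2
print(heap.fetch((0, 2)))    # Berlin
```

An index stores pointers of this kind next to each key, which is why an index lookup can jump directly to the right block instead of reading the whole table.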
  13. Seq. Scan vs. Index Scan
  14. B-tree
      • The basic index type
      • Usage:
          postgres=# CREATE INDEX indexname ON tablename (columnname);
        Example: CREATE INDEX test1_id_index ON test1 (id);
  15. B-tree
  16. B-tree
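The difference between a sequential scan and an index scan can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a sorted list searched with `bisect` stands in for the B-tree (a real B-tree is a multi-level balanced tree of pages), and the `heap` contents and function names are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

TID = Tuple[int, int]  # (block, item)

# Toy heap: TID -> column value, stored in no particular order.
heap: Dict[TID, int] = {(0, 0): 42, (0, 1): 7, (0, 2): 19, (1, 0): 7, (1, 1): 88}


def seq_scan(key: int) -> List[TID]:
    """Inspect every heap tuple; the work grows linearly with table size."""
    return sorted(tid for tid, value in heap.items() if value == key)


# Stand-in for B-tree leaf entries: (key, TID) pairs kept sorted by key.
index: List[Tuple[int, TID]] = sorted((value, tid) for tid, value in heap.items())


def index_scan(key: int) -> List[TID]:
    """Binary-search the sorted entries, then collect only the matching TIDs."""
    pos = bisect.bisect_left(index, (key,))  # first entry whose key >= key
    result = []
    while pos < len(index) and index[pos][0] == key:
        result.append(index[pos][1])
        pos += 1
    return result


print(seq_scan(7))    # [(0, 1), (1, 0)]
print(index_scan(7))  # [(0, 1), (1, 0)]
```

Both functions return the same TIDs; the index version reaches them in a logarithmic number of comparisons instead of touching every row.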
  17. GIN
      • Generalized Inverted Index (GIN)
      [Figure: one index entry per row, Seoul (0,12), Seoul (4,2), Seoul (1,9), Seoul (4,1), Busan (2,2), contrasted with one posting list per key, Seoul: (0,12), (4,2), (1,9), (4,1), (2,2); Busan: (2,2)]
  18. GIN: Posting tree
  19. GIN: Posting list
  20. GIN
      1. Text retrieval
      Usage:
          postgres=# CREATE INDEX indexname ON tablename USING GIN (columnname);
      Example:
          postgres=# -- create a table with a text column
          postgres=# CREATE TABLE t1 (id serial, t text);
          CREATE TABLE
          postgres=# CREATE INDEX t1_idx ON t1 USING gin (to_tsvector('english', t));
          CREATE INDEX
          postgres=# INSERT INTO t1 VALUES (1, 'a fat cat sat on a mat and ate a fat rat');
          INSERT 0 1
          postgres=# INSERT INTO t1 VALUES (2, 'a fat dog sat on a mat and ate a fat chop');
          INSERT 0 1
          postgres=# -- is there a row where column t contains the two words?
          postgres=# -- (the WHERE expression must match the indexed expression to hit the index)
          postgres=# SELECT * FROM t1 WHERE to_tsvector('english', t) @@ to_tsquery('fat & rat');
           id |                    t
          ----+------------------------------------------
            1 | a fat cat sat on a mat and ate a fat rat
          (1 row)
  21. GIN
      2. Array
          postgres=# -- create a table where one column consists of an integer array
          postgres=# CREATE TABLE t2 (id serial, temperatures INTEGER[]);
          CREATE TABLE
          postgres=# CREATE INDEX t2_idx ON t2 USING gin (temperatures);
          CREATE INDEX
          postgres=# INSERT INTO t2 VALUES (1, '{11, 12, 13, 14}');
          INSERT 0 1
          postgres=# INSERT INTO t2 VALUES (2, '{21, 22, 23, 24}');
          INSERT 0 1
          postgres=# -- is there a row with the two array elements 12 and 11?
          postgres=# SELECT * FROM t2 WHERE temperatures @> '{12, 11}';
           id | temperatures
          ----+---------------
            1 | {11,12,13,14}
          (1 row)
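The posting-list idea behind both GIN examples can be sketched in Python. This is a minimal model, not GIN's implementation: the `InvertedIndex` class and the TIDs are hypothetical, each key maps to a set of TIDs, and a "contains all of these keys" query intersects the posting lists. Unlike `to_tsvector`, the text case below only splits on whitespace and does no stemming or stop-word removal.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

TID = Tuple[int, int]  # (block, item)


class InvertedIndex:
    """Toy inverted index: one posting list (set of TIDs) per distinct key."""

    def __init__(self) -> None:
        self.postings: Dict[Hashable, Set[TID]] = defaultdict(set)

    def add(self, tid: TID, keys: Iterable[Hashable]) -> None:
        # A row with many elements contributes its TID to many posting lists.
        for key in keys:
            self.postings[key].add(tid)

    def contains_all(self, keys: Iterable[Hashable]) -> List[TID]:
        wanted = list(keys)
        if not wanted:
            return []
        # .get avoids creating empty posting lists for unknown keys.
        lists = [self.postings.get(key, set()) for key in wanted]
        return sorted(set.intersection(*lists))


# Array case, mirroring: temperatures @> '{12, 11}'
arrays = InvertedIndex()
arrays.add((0, 0), [11, 12, 13, 14])
arrays.add((0, 1), [21, 22, 23, 24])
print(arrays.contains_all([12, 11]))  # [(0, 0)]

# Text case, mirroring the 'fat & rat' query (whitespace split only).
words = InvertedIndex()
words.add((0, 0), "a fat cat sat on a mat and ate a fat rat".split())
words.add((0, 1), "a fat dog sat on a mat and ate a fat chop".split())
print(words.contains_all(["fat", "rat"]))  # [(0, 0)]
```

Because a key that occurs in many rows is stored once with a long posting list, this layout suits columns with many duplicate values.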
  22. GiST
      • Supports operators such as “contains”, “left of”, and “overlaps”.
      • Full-text search, geometric operations (PostGIS, etc.), handling ranges (time, etc.)
      • KNN search; the structure is based on BRTree.
  23. R-tree (Rectangle-tree)
  24. R-tree (Rectangle-tree): Linear indexing
  25. R-tree (Rectangle-tree): Multi-dimensional
  26. R-tree (Rectangle-tree): Multi-dimensional
  27. GiST
      Usage:
          postgres=# CREATE INDEX indexname ON tablename USING GIST (columnname);
      Example:
          postgres=# -- create a table with a column of non-trivial type
          postgres=# CREATE TABLE t3 (id serial, c circle);
          CREATE TABLE
          postgres=# CREATE INDEX t3_idx ON t3 USING gist(c);
          CREATE INDEX
          postgres=# INSERT INTO t3 VALUES (1, circle '((0, 0), 0.5)');
          INSERT 0 1
          postgres=# INSERT INTO t3 VALUES (2, circle '((1, 0), 0.5)');
          INSERT 0 1
          postgres=# INSERT INTO t3 VALUES (3, circle '((0.3, 0.3), 0.3)');
          INSERT 0 1
          postgres=# -- which circles lie in the bounds of the unit circle?
          postgres=# SELECT * FROM t3 WHERE circle '((0, 0), 1)' @> c;
           id |        c
          ----+-----------------
            1 | <(0,0),0.5>
            3 | <(0.3,0.3),0.3>
          (2 rows)
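The rectangle-based pruning that the R-tree slides describe can be sketched in Python using the circles from the example above. This is a single-level illustration in the spirit of an R-tree, not GiST's actual algorithm: the `Leaf` class, the grouping of circles into leaves, and circles 4 and 5 are assumptions added to show a pruned group.

```python
import math
from typing import Dict, List, Tuple

Circle = Tuple[float, float, float]      # (center x, center y, radius)
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(c: Circle) -> Box:
    x, y, r = c
    return (x - r, y - r, x + r, y + r)


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlaps(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Circle, inner: Circle) -> bool:
    # inner lies within outer when center distance plus inner radius fits.
    d = math.hypot(outer[0] - inner[0], outer[1] - inner[1])
    return d + inner[2] <= outer[2]


class Leaf:
    """A group of entries summarized by one bounding rectangle."""

    def __init__(self, entries: Dict[int, Circle]) -> None:
        self.entries = entries
        boxes = [bbox(c) for c in entries.values()]
        self.box = boxes[0]
        for b in boxes[1:]:
            self.box = union(self.box, b)


def search_contained(leaves: List[Leaf], query: Circle) -> Tuple[List[int], int]:
    """Return ids of circles inside `query` and the number of leaves visited."""
    qbox = bbox(query)
    hits: List[int] = []
    visited = 0
    for leaf in leaves:
        # A circle inside the query lies inside both rectangles, so a leaf
        # whose rectangle misses the query's rectangle cannot hold a match.
        if not overlaps(leaf.box, qbox):
            continue
        visited += 1
        hits += [i for i, c in leaf.entries.items() if contains(query, c)]
    return sorted(hits), visited


near = Leaf({1: (0.0, 0.0, 0.5), 2: (1.0, 0.0, 0.5), 3: (0.3, 0.3, 0.3)})
far = Leaf({4: (10.0, 10.0, 0.5), 5: (11.0, 10.0, 0.5)})  # hypothetical extras

print(search_contained([near, far], (0.0, 0.0, 1.0)))  # ([1, 3], 1)
```

The result matches the rows returned by the SQL query, and the far-away group is skipped without examining any of its circles.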
  28. Supported data types
  29. Supported data types
  30. Supported data types
  31. Summary
      • B-tree is ideal for unique values
      • GIN is ideal for indexes with many duplicates
      • GiST for everything else
      Experiments lead to the following observations:
      • Creation time: GIN takes 3 times longer to build than GiST
      • Index size: GIN is 2-3 times bigger than GiST
      • Search time: GIN is 3 times faster than GiST
      • Update time: GIN is about 10 times slower than GiST
  32. Thank you for listening. swat018@gmail.com
