GIN vs. GiST Index Story
Gaia3D, Inc. (가이아쓰리디㈜)
Park Jinwoo (swat018@gmail.com)
2017. 11. 04

[Pgday.Seoul 2017] 6. GIN vs GiST Index Story - Park Jinwoo

  1. GIN vs. GiST Index Story. Gaia3D, Inc. (가이아쓰리디㈜), Park Jinwoo (swat018@gmail.com), 2017. 11. 04
  2. Contents: 1. Index 2. Heap 3. B-tree and GIN 4. R-tree and GiST 5. Summary
  3. Why Index?
  4. Why Index? Spatial Index, Visibility Index, Full Text Search
  5. Index
  6. Index
  7. Index: An index holds mapping information for the specified column.
      Ex) CREATE INDEX test1_id_index ON test1 (id);
  8. Index: PostgreSQL supports the following index types.
      • B-tree: numbers, text, dates, etc.
      • Generalized Inverted Index (GIN)
      • Generalized Search Tree (GiST)
      • Space-Partitioned GiST (SP-GiST)
      • Block Range Index (BRIN)
      • Hash
  9. Heap: What is a heap? The form in which a table is stored when its rows are kept in no particular sort order.
      [Figure: a table file divided into Block 0 through Block 4, each block holding items 0 through 3]
  10. Heap: TID is the physical location of a heap tuple.
      [Figure: Block 0 through Block 4, each block holding items 0 through 3]
      Example: Berlin is item 2 of block 0. Item pointer: Berlin → (0,2)
  11. Heap
      • A table file consists of n blocks.
      • The default page size of a block is 8192 bytes (8 KB).
      • A page consists of header info, record data, and free space.
  12. Heap
  13. Seq. Scan vs. Index Scan
  14. B-tree
      • The default index type
      • Usage:
      postgres=# CREATE INDEX indexname ON tablename (columnname);
      postgres=# CREATE INDEX test1_id_index ON test1 (id);
  15. B-tree
  16. B-tree
  17. GIN: Generalized Inverted Index (GIN)
      Heap rows (value, TID): Seoul (0,12); Seoul (4,2); Seoul (1,9); Seoul (4,1); Busan (2,2)
      Index entries (key → posting list):
        Seoul → (0,12), (4,2), (1,9), (4,1)
        Busan → (2,2)
  18. GIN: Posting tree
  19. GIN: Posting list
  20. GIN: 1. Text retrieval
      Usage:
      postgres=# CREATE INDEX indexname ON tablename USING GIN (columnname);
      Example:
      postgres=# -- create a table with a text column
      postgres=# CREATE TABLE t1 (id serial, t text);
      CREATE TABLE
      postgres=# CREATE INDEX t1_idx ON t1 USING gin (to_tsvector('english', t));
      CREATE INDEX
      postgres=# INSERT INTO t1 VALUES (1, 'a fat cat sat on a mat and ate a fat rat');
      INSERT 0 1
      postgres=# INSERT INTO t1 VALUES (2, 'a fat dog sat on a mat and ate a fat chop');
      INSERT 0 1
      postgres=# -- is there a row where column t contains the two words? (syntax contains some magic to hit index)
      postgres=# SELECT * FROM t1 WHERE to_tsvector('english', t) @@ to_tsquery('fat & rat');
       id |                    t
      ----+------------------------------------------
        1 | a fat cat sat on a mat and ate a fat rat
      (1 row)
  21. GIN: 2. Array
      postgres=# -- create a table where one column consists of an integer array
      postgres=# CREATE TABLE t2 (id serial, temperatures INTEGER[]);
      CREATE TABLE
      postgres=# CREATE INDEX t2_idx ON t2 USING gin (temperatures);
      CREATE INDEX
      postgres=# INSERT INTO t2 VALUES (1, '{11, 12, 13, 14}');
      INSERT 0 1
      postgres=# INSERT INTO t2 VALUES (2, '{21, 22, 23, 24}');
      INSERT 0 1
      postgres=# -- Is there a row with the two array elements 12 and 11?
      postgres=# SELECT * FROM t2 WHERE temperatures @> '{12, 11}';
       id | temperatures
      ----+---------------
        1 | {11,12,13,14}
      (1 row)
  22. GiST
      • Supports operators such as "contains", "left of", and "overlaps".
      • Full Text Search, geometric operations (PostGIS, etc.), handling ranges (time, etc.)
      • Built on KNN-search and BRTree.
  23. R-tree (Rectangle-tree)
  24. R-tree (Rectangle-tree): Linear Indexing
  25. R-tree (Rectangle-tree): Multi-Dimensional
  26. R-tree (Rectangle-tree): Multi-Dimensional
  27. GiST
      Usage:
      postgres=# CREATE INDEX indexname ON tablename USING GIST (columnname);
      Example:
      postgres=# -- create a table with a column of non-trivial type
      postgres=# CREATE TABLE t3 (id serial, c circle);
      CREATE TABLE
      postgres=# CREATE INDEX t3_idx ON t3 USING gist(c);
      CREATE INDEX
      postgres=# INSERT INTO t3 VALUES (1, circle '((0, 0), 0.5)');
      INSERT 0 1
      postgres=# INSERT INTO t3 VALUES (2, circle '((1, 0), 0.5)');
      INSERT 0 1
      postgres=# INSERT INTO t3 VALUES (3, circle '((0.3, 0.3), 0.3)');
      INSERT 0 1
      postgres=# -- which circles lie in the bounds of the unit circle?
      postgres=# SELECT * FROM t3 WHERE circle '((0, 0), 1)' @> c;
       id |        c
      ----+-----------------
        1 | <(0,0),0.5>
        3 | <(0.3,0.3),0.3>
      (2 rows)
  28. Supported data types
  29. Supported data types
  30. Supported data types
  31. Summary
      • B-tree is ideal for unique values
      • GIN is ideal for indexes with many duplicates
      • GiST for everything else
      Experiments lead to the following observations:
      • Creation time: GIN takes about 3 times longer to build than GiST
      • Index size: GIN is 2-3 times bigger than GiST
      • Search time: GIN is about 3 times faster than GiST
      • Update time: GIN is about 10 times slower than GiST
  32. Thank you for listening. swat018@gmail.com
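
The TID and page layout described in the Heap slides can be inspected from psql, because PostgreSQL exposes each row's TID as the system column ctid. The statements below are a minimal sketch, assuming the test1 table with an id column from the B-tree slide already exists. Note that the item number inside a real ctid starts at 1, so the values will not line up exactly with the 0-based items drawn in the figure.

```sql
-- ctid is the (block, item) address of each row version in the heap.
SELECT ctid, id FROM test1 LIMIT 5;

-- Page size in bytes (8192 unless the server was built with another block size).
SHOW block_size;

-- Number of heap blocks the table currently occupies.
SELECT pg_relation_size('test1') / current_setting('block_size')::int AS blocks;
```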
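
The "Seq. Scan vs. Index Scan" slide is easiest to explore with EXPLAIN, which prints the plan the planner chose. This is a sketch only, assuming test1 and test1_id_index from the B-tree slide exist; the literal 42 is an arbitrary value picked for illustration.

```sql
-- Show the chosen plan: "Seq Scan" reads every block, "Index Scan" follows TIDs from the index.
EXPLAIN SELECT * FROM test1 WHERE id = 42;

-- Discourage sequential scans for this session to see the index plan, then restore the default.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM test1 WHERE id = 42;
RESET enable_seqscan;
```

On a very small table the planner usually prefers a sequential scan even when an index exists, since reading a handful of blocks is cheaper than going through the index.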
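
The GIN text retrieval slide mentions that the query syntax "contains some magic to hit index". One reading, offered as an assumption about what the author meant: an expression index is only considered when the WHERE clause repeats the indexed expression, here to_tsvector('english', t). The sketch below assumes the t1 and t2 tables and indexes from the GIN slides exist.

```sql
-- Repeats the indexed expression, so t1_idx is a candidate for the planner.
EXPLAIN SELECT * FROM t1
WHERE to_tsvector('english', t) @@ to_tsquery('english', 'fat & rat');

-- Omits the configuration name, so the expression differs from the indexed one
-- and t1_idx cannot serve this predicate.
EXPLAIN SELECT * FROM t1
WHERE to_tsvector(t) @@ to_tsquery('english', 'fat & rat');

-- GIN on arrays also serves the overlap operator (any element in common).
SELECT * FROM t2 WHERE temperatures && '{12, 21}';
```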
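
The GiST slide lists "overlaps", range handling, and KNN search without showing them. The following sketch illustrates both with built-in types; the booking and place tables, their columns, and the index names are hypothetical and do not come from the slides.

```sql
-- Hypothetical table: bookings stored as timestamp ranges.
CREATE TABLE booking (id serial, during tsrange);
CREATE INDEX booking_during_idx ON booking USING gist (during);
INSERT INTO booking (during) VALUES
  ('[2017-11-04 10:00, 2017-11-04 11:00)'),
  ('[2017-11-04 13:00, 2017-11-04 14:00)');

-- "overlaps": bookings that intersect the window 10:30 to 13:30.
SELECT * FROM booking
WHERE during && tsrange('2017-11-04 10:30', '2017-11-04 13:30');

-- KNN search: hypothetical points table, the three points nearest the origin.
CREATE TABLE place (id serial, p point);
CREATE INDEX place_p_idx ON place USING gist (p);
SELECT * FROM place ORDER BY p <-> point '(0,0)' LIMIT 3;
```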
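
The size comparison in the summary can be checked on your own data by building both index types on the same expression. This is a sketch assuming the t1 table and t1_idx from the GIN slides exist; t1_gist_idx is a hypothetical name, and the two-row sample table is far too small to reproduce the ratios quoted in the summary.

```sql
-- Hypothetical second index on the same expression, for comparison only.
CREATE INDEX t1_gist_idx ON t1 USING gist (to_tsvector('english', t));

-- On-disk size of each index.
SELECT pg_size_pretty(pg_relation_size('t1_idx'))      AS gin_size,
       pg_size_pretty(pg_relation_size('t1_gist_idx')) AS gist_size;
```

Running \timing in psql before the CREATE INDEX and SELECT statements gives a rough view of build and search times as well.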
