Introduction to Modern PostgreSQL. Part 2

DZIANIS PIRSHTUK,
INDATA LABS
SLONIK

INDEX TYPES
postgres=# select amname from pg_catalog.pg_am;
• btree: balanced tree (default)
• hash
• gist: generalized search tree
• gin: generalized inverted index
• spgist: space-partitioned GiST
• brin: block range index
http://www.postgresql.org/docs/9.1/static/textsearch-indexes.html

CREATING AN INDEX
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
[ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ]
[ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ]
[, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ] [ WHERE predicate ]
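
As a hedged illustration of the synopsis above, the statement below combines several of the optional clauses. The index name, the fillfactor value and the predicate are assumptions made for this sketch. The columns created_at and event_public come from the github_events layout used elsewhere in this deck.

```sql
-- Hypothetical index: name, fillfactor and predicate are illustrative only.
-- CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS github_events_created_at_idx
    ON github_events USING btree (created_at DESC NULLS LAST)
    WITH (fillfactor = 90)
    WHERE event_public;  -- partial index: only public events are indexed
```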

QUERY WITHOUT AN INDEX
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events
WHERE 3488850707 < event_id AND event_id < 3488880707;
------------------------------------------------------------------
Seq Scan on github_events (cost=0.00..265213.33 rows=13185 width=8) (actual time=0.008..495.324 rows=12982 loops=1)
  Filter: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
  Rows Removed by Filter: 2040200
Planning time: 0.189 ms
Execution time: 504.053 ms

SIMPLE INDEX
CREATE UNIQUE INDEX event_id_idx ON github_events(event_id);
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events
WHERE 3488850707 < event_id AND event_id < 3488880707;
------------------------------------------------------------------
Index Scan using event_id_idx on github_events (cost=0.43..1921.28 rows=13187 width=8) (actual time=0.024..12.544 rows=12982 loops=1)
  Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))

REGULAR INDEX
CREATE UNIQUE INDEX event_id_idx ON github_events(event_id);
--------------------------------
Index Scan using event_id_idx on github_events (cost=0.43..1921.28 rows=13187 width=8) (actual time=0.037..12.485 rows=12982 loops=1)
  Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))

COMPOSITE INDEX
CREATE UNIQUE INDEX event_id_idx
ON github_events(event_id, repo_id);
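
A short sketch of when a multicolumn B-tree tends to help: entries are ordered by the first column, so conditions on event_id narrow the search, while a condition on repo_id alone usually does not. The literal repo_id value is hypothetical, and the plans actually chosen depend on the data and statistics.

```sql
-- Can use the (event_id, repo_id) index efficiently: the leading column is constrained.
EXPLAIN SELECT repo_id FROM github_events
WHERE event_id = 3488850707 AND repo_id > 0;

-- Constrains only the second column: the planner usually prefers another plan,
-- because the B-tree is ordered by event_id first.
EXPLAIN SELECT event_id FROM github_events
WHERE repo_id = 12345;  -- hypothetical repo_id value
```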

COVERING INDEX
• Smaller index size
• Lower update overhead
• Faster planning and lookup
• No opclass needed for included columns
• Filter on included columns
CREATE UNIQUE INDEX event_id_idx2 ON
github_events(event_id) INCLUDING (repo_id);
https://pgconf.ru/media/2016/02/19/4_Lubennikova_B-tree_pgconf.ru_3.0%20(1).pdf

COVERING INDEX
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM
github_events WHERE 3488850707 < event_id AND event_id < 3488880707;
---------------------------------------
Index Only Scan using event_id_idx2 on github_events (cost=0.43..23764.29 rows=13187 width=8) (actual time=0.032..12.533 rows=12982 loops=1)
  Index Cond: ((event_id > '3488850707'::bigint) AND (event_id < '3488880707'::bigint))
  Heap Fetches: 12982
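
In the plan above, Heap Fetches: 12982 equals the number of returned rows, so every index entry still required a heap visit. This usually means the visibility map is not up to date. The sketch below assumes PostgreSQL 11 or later, where the released keyword for included columns is INCLUDE. The index name is hypothetical.

```sql
-- Assumes PostgreSQL 11 or later; index name is hypothetical.
CREATE UNIQUE INDEX event_id_incl_idx
    ON github_events (event_id) INCLUDE (repo_id);

-- VACUUM updates the visibility map so that an index-only scan can skip heap visits.
VACUUM (ANALYZE) github_events;

EXPLAIN (ANALYZE, BUFFERS)
SELECT repo_id FROM github_events
WHERE 3488850707 < event_id AND event_id < 3488880707;
-- After a successful VACUUM, "Heap Fetches" is expected to drop towards 0.
```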

BRIN INDEX
CREATE INDEX event_id_brin_idx ON github_events USING brin (event_id);
--------------------------------
Bitmap Heap Scan on github_events (cost=175.16..42679.52 rows=13187 width=8) (actual time=0.824..15.489 rows=12982 loops=1)
  Recheck Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
  Rows Removed by Index Recheck: 13995
  Heap Blocks: lossy=3072
  -> Bitmap Index Scan on event_id_brin_idx (cost=0.00..171.87 rows=13187 width=0) (actual time=0.698..0.698 rows=30720 loops=1)
       Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))

DIFFERENCE?
Size:
Regular: 44 MB
BRIN: 80 kB
UPDATE COST???
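
A hedged sketch of how the size comparison could be reproduced and how BRIN granularity can be tuned. The 3072 lossy heap blocks in the BRIN plan correspond to 24 ranges of 128 pages, which is the default pages_per_range. The name event_id_brin_32_idx and the value 32 are assumptions chosen for illustration.

```sql
-- Compare on-disk sizes of the two indexes from the deck.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('event_id_idx', 'event_id_brin_idx');

-- Hypothetical finer-grained BRIN index: smaller ranges mean fewer rows to recheck,
-- at the price of a somewhat larger index. The default pages_per_range is 128.
CREATE INDEX event_id_brin_32_idx
    ON github_events USING brin (event_id) WITH (pages_per_range = 32);

-- In 9.5, page ranges filled after index creation stay unsummarized
-- until VACUUM or an explicit call summarizes them.
SELECT brin_summarize_new_values('event_id_brin_idx');
```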

CSTORE_FDW
• Inspired by Optimized Row Columnar (ORC) format
developed by Hortonworks.
• Compression: Reduces in-memory and on-disk data size
by 2-4x. Can be extended to support different codecs.
• Column projections: Only reads column data relevant to
the query. Improves performance for I/O bound queries.
• Skip indexes: Stores min/max statistics for row groups,
and uses them to skip over unrelated rows.

CSTORE_FDW
CREATE FOREIGN TABLE cstored_github_events (
event_id bigint,
event_type text,
event_public boolean,
repo_id bigint,
payload jsonb,
repo jsonb,
actor jsonb,
org jsonb,
created_at timestamp
)
SERVER cstore_server
OPTIONS(compression 'pglz');
INSERT INTO cstored_github_events (SELECT * FROM github_events);
ANALYZE cstored_github_events;
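
The foreign table above presupposes that the extension and the server object already exist. A minimal sketch of that setup follows, assuming cstore_fdw is installed and listed in shared_preload_libraries. The second table and its option values are hypothetical; they only show where the stripe and block sizes are set.

```sql
-- Prerequisites (run once per database).
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Hypothetical narrower table showing the tuning options; values are illustrative.
CREATE FOREIGN TABLE cstored_events_tuned (
    event_id   bigint,
    repo_id    bigint,
    created_at timestamp
)
SERVER cstore_server
OPTIONS (compression 'pglz',
         stripe_row_count '150000',  -- rows per stripe
         block_row_count '10000');   -- rows per column block (unit of min/max skipping,
                                     -- according to the cstore_fdw documentation)
```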

TYPICAL QUERY
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id, count(*) FROM cstored_github_events
WHERE created_at BETWEEN timestamp '2016-01-02 01:00:00' AND timestamp '2016-01-02 23:00:00'
GROUP BY repo_id ORDER BY 2 DESC;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Sort (cost=75153.59..75221.43 rows=27137 width=8) (actual time=950.085..1030.283 rows=106145 loops=1)
  Sort Key: (count(*)) DESC
  Sort Method: quicksort Memory: 8048kB
  -> HashAggregate (cost=72883.86..73155.23 rows=27137 width=8) (actual time=772.445..861.162 rows=106145 loops=1)
       Group Key: repo_id
       -> Foreign Scan on cstored_github_events (cost=0.00..70810.84 rows=414603 width=8) (actual time=4.762..382.302 rows=413081 loops=1)
            Filter: ((created_at >= '2016-01-02 01:00:00'::timestamp without time zone) AND (created_at <= '2016-01-02 23:00:00'::timestamp without time zone))
            Rows Removed by Filter: 46919
            CStore File: /var/lib/pgsql/9.5/data/cstore_fdw/18963/1236161
            CStore File Size: 1475036725
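
The foreign scan above examined 413081 + 46919 = 460000 rows, while the sequential scan earlier in the deck touched 12982 + 2040200 = 2053182 rows of github_events. This is consistent with the skip index discarding most row blocks. To compare directly, the same aggregate can be run against the row-oriented table. This sketch assumes both tables hold the same rows, as the INSERT ... SELECT suggests.

```sql
-- Same aggregate against the row-oriented table, for comparison.
EXPLAIN (ANALYZE, BUFFERS)
SELECT repo_id, count(*)
FROM github_events
WHERE created_at BETWEEN timestamp '2016-01-02 01:00:00'
                     AND timestamp '2016-01-02 23:00:00'
GROUP BY repo_id
ORDER BY 2 DESC;
```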

NOT ALWAYS AS ADVERTISED
SELECT pg_size_pretty(cstore_table_size('cstored_github_events'));
1407 MB
SELECT pg_size_pretty(pg_table_size('github_events'));
2668 MB

POSTGRESQL 9.5:
FOREIGN TABLE INHERITANCE
• Fast INSERT and look-ups into current table.
• Periodically move data to archive table for compression.
• Query both via main table.
• Combined row-based and columnar store
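
The bullets above could be sketched as follows. All table names are hypothetical, the 7-day cutoff is arbitrary, and cstore_server is assumed to exist already. How cstore_fdw behaves when the archiving transaction rolls back should be checked before relying on this pattern.

```sql
-- Hypothetical layout: parent table plus a row-based child and a columnar child.
CREATE TABLE events (
    event_id   bigint,
    repo_id    bigint,
    created_at timestamp
);
CREATE TABLE events_current () INHERITS (events);          -- fast INSERTs and look-ups
CREATE FOREIGN TABLE events_archive () INHERITS (events)   -- compressed columnar store
    SERVER cstore_server OPTIONS (compression 'pglz');

-- Periodic archiving: copy old rows to the columnar child, then delete them locally.
-- now() is fixed for the whole transaction, so both statements use the same cutoff.
BEGIN;
INSERT INTO events_archive
    SELECT * FROM events_current WHERE created_at < now() - interval '7 days';
DELETE FROM events_current WHERE created_at < now() - interval '7 days';
COMMIT;

-- Querying the parent scans both children.
SELECT count(*) FROM events WHERE created_at >= timestamp '2016-01-01';
```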

CLUSTERING
SELECT retweet_count FROM contest WHERE "user.id" = 13201312;
Time: 120.743 ms
CREATE INDEX user_id_post_id ON contest("user.id" ASC, "id" DESC);
CLUSTER contest USING user_id_post_id;
VACUUM contest;
Time: 4.128 ms
https://github.com/reorg/pg_repack
There is no CLUSTER statement in the SQL standard.
bloating
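
A hedged sketch for checking whether clustering is likely to pay off and for keeping statistics current afterwards. The correlation column in pg_stats is only populated once the table has been analyzed.

```sql
-- How well does the physical row order follow "user.id"?
-- Values near 1 or -1 mean the table is already well clustered on that column.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'contest' AND attname = 'user.id';

-- CLUSTER rewrites the table under an ACCESS EXCLUSIVE lock and is a one-time operation:
-- later writes are not kept in index order, so it has to be repeated periodically.
CLUSTER contest USING user_id_post_id;
ANALYZE contest;  -- refresh planner statistics after the rewrite
```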

WHAT ELSE?
• UPSERT: INSERT … ON CONFLICT DO NOTHING/UPDATE (9.5)
• Partial indexes
• Materialized views (9.3)
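
Minimal sketches of the three features listed above. The table repo_event_counts, the index and view names, and the event_type value 'PushEvent' are assumptions made for illustration.

```sql
-- UPSERT (9.5+). Hypothetical counter table with a primary key to conflict on.
CREATE TABLE repo_event_counts (
    repo_id bigint PRIMARY KEY,
    events  bigint NOT NULL
);
INSERT INTO repo_event_counts (repo_id, events) VALUES (42, 1)
ON CONFLICT (repo_id)
DO UPDATE SET events = repo_event_counts.events + EXCLUDED.events;

-- Partial index: only rows matching the predicate are indexed.
CREATE INDEX github_events_push_idx
    ON github_events (created_at) WHERE event_type = 'PushEvent';

-- Materialized view (9.3+): the result is stored and refreshed on demand.
CREATE MATERIALIZED VIEW repo_daily_counts AS
SELECT repo_id, date_trunc('day', created_at) AS day, count(*) AS events
FROM github_events
GROUP BY repo_id, date_trunc('day', created_at);

REFRESH MATERIALIZED VIEW repo_daily_counts;
```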

PROFILING AND DBA
• pg_stat_statements, pg_stat_activity, pg_buffercache
• https://github.com/PostgreSQL-Consulting/pg-utils
• https://github.com/ankane/pghero
• Many useful queries on the PostgreSQL wiki
• https://wiki.postgresql.org/wiki/Show_database_bloat
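
Two starting-point queries against the views named above, as a sketch. pg_stat_statements must be listed in shared_preload_libraries before the extension returns data, and the timing column name depends on the server version.

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top statements by cumulative execution time.
-- The column is total_time up to PostgreSQL 12; newer releases call it total_exec_time.
SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

-- Currently running (non-idle) sessions, longest first.
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;
```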

PG-UTILS
• query_stat_cpu_time.sql, query_stat_io_time.sql,
query_stat_rows.sql, query_stat_time.sql
• low_used_indexes
• seq_scan_tables
• table_candidates_from_ssd.sql / table_candidates_to_ssd.sql
• index_disk_activity.sql
• table_disk_activity
• table_index_write_activity.sql / table_write_activity.sql
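
The queries below are not the pg-utils scripts themselves. They are simplified sketches in the same spirit, built on the standard statistics views, and the idx_scan threshold is arbitrary.

```sql
-- Tables read mostly by sequential scans (candidates for a new index).
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Rarely used indexes, largest first (candidates for removal).
SELECT relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan < 10  -- arbitrary threshold
ORDER BY pg_relation_size(indexrelid) DESC;
```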

JSONB
CREATE INDEX login_idx ON github_events USING btree((org->>'login'));
CREATE INDEX login_idx2 ON github_events USING gin(org jsonb_value_path_ops);
jsonb_path_value_ops
(hash(path_item_1.path_item_2. ... .path_item_n); value)
jsonb_value_path_ops
(value; bloom(path_item_1) | bloom(path_item_2) | ... | bloom(path_item_n))
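
A sketch of queries that such indexes are meant to serve. The login value 'postgres' and the index name org_gin_idx are assumptions. The jsonb_value_path_ops and jsonb_path_value_ops opclasses come from the jsquery extension, so the last query requires it, and its query syntax should be checked against the jsquery README.

```sql
-- Served by the B-tree expression index login_idx.
SELECT count(*) FROM github_events
WHERE org->>'login' = 'postgres';

-- Built-in alternative: a GIN index with the core jsonb_path_ops opclass
-- supports the containment operator @>.
CREATE INDEX org_gin_idx ON github_events USING gin (org jsonb_path_ops);
SELECT count(*) FROM github_events
WHERE org @> '{"login": "postgres"}';

-- With the jsquery extension installed, its opclasses serve the @@ operator.
SELECT count(*) FROM github_events
WHERE org @@ 'login = "postgres"'::jsquery;
```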

JSQUERY
CREATE TABLE js (
id serial,
data jsonb,
CHECK (data @@ '
name IS STRING AND
similar_ids.#: IS NUMERIC AND
points.#:(x IS NUMERIC AND y IS NUMERIC)':: jsquery));
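
A sketch of how the constraint above is expected to behave, assuming the jsquery extension is installed. The sample documents are made up for illustration.

```sql
-- Expected to be accepted: the document matches the declared shape.
INSERT INTO js (data) VALUES
    ('{"name": "abc", "similar_ids": [1, 2, 3], "points": [{"x": 1, "y": 2}]}');

-- Expected to be rejected by the CHECK constraint: points[0].y is a string.
INSERT INTO js (data) VALUES
    ('{"name": "abc", "similar_ids": [1, 2, 3], "points": [{"x": 1, "y": "2"}]}');
```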

POSTGRESQL SCALABILITY

VERTICAL
(POSTGRESPRO, POWER 8)

YOU HAVE TO CHOOSE …

POSTGRES-XL
https://habrahabr.ru/post/253017/
http://www.postgres-xl.org/overview/

16 MACHINES VS 1 MACHINE
