JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course

PostgreSQL trails Oracle in features, but only marginally. It has not been on the market as long, yet it is still second best. It may not be as funky and shiny as the new NoSQL databases, but it is arguably the shiniest of the relational ones, and it has a colourful history. So let me tell you about PostgreSQL architecture and internals, walk you through the query path and optimization, and hint at why there is no hinting. In another thread we'll talk about MVCC and vacuum, and if there is time for more, we'll have a round of questions.

Published in: Technology

  1. PostgreSQL – Tomasz Borek
     Database for next project? Why, PostgreSQL of course!
     @LAFK_pl
     Consultant @
  2. About me
     @LAFK_pl
     Consultant @
     Tomasz Borek
  3. What will I tell you?
     ● Colourful history of PostgreSQL
       – So, DB wars
     ● Chosen features
     ● Architecture and internals
     ● Query path and optimization (no hinting)
     ● Multithreading (very briefly, too little time)
  4. Colourful history
  5. History
  6. In-/Postgres forks
  7. Support?
  8. Chosen features
  9. My Faves
     ● Error reporting / logging
     ● PL/xSQL – feel free to use Perl, Python, Ruby, Java, LISP...
     ● XML and JSON handling
     ● Foreign Data Wrappers (FDW)
     ● Windowing functions
     ● Common table expressions (CTE) and recursive queries
     ● Power of Indexes
  10. Will DB eat your cake?
      ● Thanks @anandology
  13. The cake is a lie!
  16. Will DB eat your cake?
      ● Thanks @anandology
      Consider password VARCHAR(8)
  17. Logging, ‘gotchas’
      ● Default is to stderr only
      ● Set on CLI or in config, not through SET
      ● Where is it?
      ● How to log queries… or turning logging_collector on
  18. Where is it?
      ● Default
        – data/pg_log
      ● Launchers can set it (Mac Homebrew/plist)
      ● Version and config dependent
  19. Ask DB
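The "Ask DB" slide carries no text in the transcript, so the following is only a sketch of what asking the server could look like. It uses the standard `SHOW` command and the `pg_settings` view; `pg_current_logfile()` assumes PostgreSQL 10 or later, and `SHOW data_directory` assumes a superuser (or similarly privileged) session.

```sql
-- Where does this server keep its data and its logs?
SHOW data_directory;
SHOW log_directory;      -- a relative path is resolved against data_directory
SHOW log_filename;
SHOW logging_collector;  -- 'off' means nothing is written to log_directory

-- The same information in a single result set.
SELECT name, setting
FROM pg_settings
WHERE name IN ('data_directory', 'log_directory', 'log_filename',
               'log_destination', 'logging_collector')
ORDER BY name;

-- PostgreSQL 10 and later: path of the log file currently in use
-- (NULL when the logging collector is disabled).
SELECT pg_current_logfile();
```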
  20. Logging, turn it on
      ● Default is to stderr only
      ● In PG:
          logging_collector = on
          log_filename = strftime-patterned filename
          [log_destination = [stderr|syslog|csvlog] ]
          log_statement = [none|ddl|mod|all]  # all
          log_min_error_statement = ERROR
          log_line_prefix = '%t %c %u '  # time sessionid user
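Why some of these settings cannot be changed with a plain `SET` can be read from the `context` column of `pg_settings`. The sketch below first inspects that column and then, as one possible way of editing the configuration from SQL, uses `ALTER SYSTEM` (an assumption: PostgreSQL 9.4 or later and a superuser session). `ALTER SYSTEM` writes to `postgresql.auto.conf`, so it still counts as "in config" rather than "through SET".

```sql
-- context shows how a parameter may be changed:
--   postmaster      = only at server start
--   sighup          = config file plus reload
--   superuser, user = additionally per session with SET
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('logging_collector', 'log_destination', 'log_filename',
               'log_line_prefix', 'log_statement', 'log_min_error_statement')
ORDER BY name;

-- Persist the values from the slide in postgresql.auto.conf.
ALTER SYSTEM SET logging_collector = on;        -- needs a server restart
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_line_prefix = '%t %c %u ';

-- Apply everything that does not require a restart.
SELECT pg_reload_conf();
```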
  21. Log line prefix
  24. PL/pgSQL
      ● Stored procedure dilemma
        – Where to keep your logic?
        – How your logic is NOT in your SCM
      ● Over a dozen options:
        – Perl, Python, Ruby,
        – pgSQL, Java,
        – TCL, LISP…
      ● DevOps, SysAdmins, DBAs… ETLs etc.
  26. Perl function example
          CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
              my ($x, $y) = @_;
              if (not defined $x) {
                  return undef if not defined $y;
                  return $y;
              }
              return $x if not defined $y;
              return $x if $x > $y;
              return $y;
          $$ LANGUAGE plperl;
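For comparison, here is a sketch of the same NULL-aware maximum written in PL/pgSQL, which needs no extra language to be installed. The function name `plpgsql_max` and its parameter names are hypothetical and do not come from the slides.

```sql
CREATE FUNCTION plpgsql_max (x integer, y integer) RETURNS integer AS $$
BEGIN
    -- If one argument is NULL, return the other (which may itself be NULL).
    IF x IS NULL THEN
        RETURN y;
    END IF;
    IF y IS NULL THEN
        RETURN x;
    END IF;
    -- Both arguments are present: return the larger one.
    IF x > y THEN
        RETURN x;
    END IF;
    RETURN y;
END;
$$ LANGUAGE plpgsql;

-- Usage: both calls return 7.
SELECT plpgsql_max(3, 7);
SELECT plpgsql_max(NULL, 7);
```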
  27. XML or JSON support
      ● Parsing and retrieving XML (functions)
      ● Valid JSON checks (type)
      ● Careful with encoding!
        – PG allows only one server encoding per database
        – Set it to UTF-8 or weep
      ● Document database instead of OO or relational
        – JSON, JSONB, HSTORE – noSQL fun welcome!
  30. HSTORE?
          CREATE EXTENSION IF NOT EXISTS hstore;  -- hstore ships as an extension

          CREATE TABLE example (
              id serial PRIMARY KEY,
              data hstore);

          INSERT INTO example (data) VALUES
              ('name => "John Smith", age => 28, gender => "M"'),
              ('name => "Jane Smith", age => 24');

          SELECT id, data->'name' FROM example;

          -- hstore values are text, so cast before comparing numerically
          SELECT id, data->'age' FROM example
          WHERE (data->'age')::int >= 25;
  31. XML and JSON datatype
          CREATE TABLE test (
              ...,
              xml_file xml,
              json_file json,
              ...
          );
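A sketch of how the JSON side might be used in practice, mirroring the hstore example with `jsonb` (an assumption: PostgreSQL 9.4 or later). The table name `json_example` and its contents are hypothetical.

```sql
CREATE TEMP TABLE json_example (
    id   serial PRIMARY KEY,
    data jsonb
);

INSERT INTO json_example (data) VALUES
    ('{"name": "John Smith", "age": 28, "gender": "M"}'),
    ('{"name": "Jane Smith", "age": 24}');

-- ->> returns text, so cast before comparing numerically.
SELECT id, data->>'name' AS name
FROM json_example
WHERE (data->>'age')::int >= 25;

-- @> tests containment: rows whose document includes this fragment.
SELECT id
FROM json_example
WHERE data @> '{"gender": "M"}';

-- The type validates its input. Uncommenting the next statement fails
-- with an "invalid input syntax for type json" error.
-- SELECT '{"name": }'::json;
```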
  32. XML functions example
          XMLROOT (
              XMLELEMENT (
                  NAME gazonk,
                  XMLATTRIBUTES (
                      'val' AS name,
                      1 + 1 AS num
                  ),
                  XMLELEMENT (
                      NAME qux,
                      'foo'
                  )
              ),
              VERSION '1.0',
              STANDALONE YES
          )

          <?xml version='1.0' standalone='yes' ?>
          <gazonk name='val' num='2'>
            <qux>foo</qux>
          </gazonk>

          xml '<foo>bar</foo>'
          '<foo>bar</foo>'::xml
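The slide shows how to build XML. Retrieving values from it could be sketched with `xpath()` as below. This assumes a server built with libxml support, and `xml_is_well_formed()` assumes PostgreSQL 9.1 or later; the sample document is the one produced on the slide.

```sql
-- Is the text well-formed XML at all?
SELECT xml_is_well_formed('<foo>bar</foo>');

-- xpath() returns an array of xml values matching the expression.
SELECT xpath('/gazonk/qux/text()',
             '<gazonk name="val" num="2"><qux>foo</qux></gazonk>'::xml);

-- Attributes are addressed with @; take the first match and cast it.
SELECT (xpath('/gazonk/@num',
              '<gazonk name="val" num="2"><qux>foo</qux></gazonk>'::xml))[1]::text::int
       AS num;
```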
  33. Foreign Data Wrappers (FDW)
      ● Stop ETL, start FDW
      ● Read AND write
      ● FS, Mongo, Hadoop, Redis…
      ● You can write your own!
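As an illustration of "read AND write", here is a minimal sketch using `postgres_fdw`, the wrapper for other PostgreSQL servers that ships with PostgreSQL 9.3 and later. The host, database, credentials, and the `orders` table are all hypothetical placeholders.

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Describe the remote server (hypothetical host and database).
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', port '5432', dbname 'sales');

-- Map the local user to a remote account (hypothetical credentials).
CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_pg
    OPTIONS (user 'report_reader', password 'change-me');

-- Expose the remote table public.orders as a local foreign table.
CREATE FOREIGN TABLE remote_orders (
    id     integer,
    amount numeric
)
    SERVER remote_pg
    OPTIONS (schema_name 'public', table_name 'orders');

-- Read: queried like any local table.
SELECT count(*), sum(amount) FROM remote_orders;

-- Write: the row is inserted on the remote server.
INSERT INTO remote_orders (id, amount) VALUES (1001, 49.90);
```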
  34. FDW vs ETL?
  35. Windowing functions
      ● Replacement for procedures (somewhat)
      ● In a nutshell:
        – Take row,
        – find related rows,
        – compute things over related rows,
        – return result along with the row
      ● Ranking, averaging, growth per time...
      http://www.craigkerstiens.com/2014/02/26/Tracking-MoM-growth-in-SQL/
      https://www.postgresql.org/docs/9.1/static/tutorial-window.html
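The "take row, find related rows, compute, return along with the row" recipe can be sketched as a month-over-month growth query. The table `monthly_signups` and its figures are made up for illustration.

```sql
CREATE TEMP TABLE monthly_signups (
    month_start date PRIMARY KEY,
    signups     integer NOT NULL
);

INSERT INTO monthly_signups VALUES
    ('2016-01-01', 100),
    ('2016-02-01', 120),
    ('2016-03-01', 90);

SELECT month_start,
       signups,
       -- related row: the previous month in date order
       lag(signups) OVER w AS previous_month,
       -- growth relative to the previous month, in percent
       round(100.0 * (signups - lag(signups) OVER w) / lag(signups) OVER w, 1)
           AS growth_pct,
       -- ranking over all rows, independent of the date order
       rank() OVER (ORDER BY signups DESC) AS rank_by_signups
FROM monthly_signups
WINDOW w AS (ORDER BY month_start)
ORDER BY month_start;
```

Every input row stays in the result. The first month has no predecessor, so its `previous_month` and `growth_pct` are NULL.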
  36. CTEs and recursive queries
      ● Common table expressions (CTE) and recursive queries
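A recursive CTE is easiest to see on a hierarchy. The sketch below walks a hypothetical `staff` table from the top manager downwards; the table and its rows are invented for this example.

```sql
CREATE TEMP TABLE staff (
    id         integer PRIMARY KEY,
    name       text NOT NULL,
    manager_id integer REFERENCES staff (id)
);

INSERT INTO staff VALUES
    (1, 'Ada', NULL),
    (2, 'Bob', 1),
    (3, 'Cyd', 2),
    (4, 'Dee', 1);

WITH RECURSIVE reports AS (
    -- anchor: people without a manager
    SELECT id, name, 0 AS depth
    FROM staff
    WHERE manager_id IS NULL
    UNION ALL
    -- recursive step: direct reports of rows found so far
    SELECT s.id, s.name, r.depth + 1
    FROM staff s
    JOIN reports r ON s.manager_id = r.id
)
SELECT depth, name
FROM reports
ORDER BY depth, name;
```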
  37. Index power
      ● Geo and spherical indexes
      ● Partial indexes (email like @company.com)
      ● Function indexes
      ● JSON(B) has index support
      ● You may create your own index
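Three of these index kinds can be sketched on one hypothetical table, `app_users`. The table, column, and index names are assumptions made for the example.

```sql
CREATE TEMP TABLE app_users (
    id      serial PRIMARY KEY,
    email   text NOT NULL,
    profile jsonb
);

-- Partial index: only rows for company addresses are indexed.
CREATE INDEX app_users_company_idx
    ON app_users (email)
    WHERE email LIKE '%@company.com';

-- Function (expression) index: supports case-insensitive lookups
-- such as WHERE lower(email) = 'jane@company.com'.
CREATE INDEX app_users_lower_email_idx
    ON app_users (lower(email));

-- GIN index on jsonb: supports containment queries
-- such as WHERE profile @> '{"role": "admin"}'.
CREATE INDEX app_users_profile_idx
    ON app_users USING gin (profile);
```

The planner considers a partial index only when the query's WHERE clause implies the index predicate, so queries need to repeat the `LIKE '%@company.com'` condition to benefit from it.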
  38. Architecture and internals
  39. Check out processes
      ● pgrep -l postgres
      ● htop > filter: postgres
      ● Whatever you like / use usually
      ● Careful with kill -9 on connections
        – kill -15 better
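The same inspection can be done from inside the database. This sketch assumes PostgreSQL 9.2 or later (for the `pid`, `state`, and `state_change` columns) and sufficient privileges. `pg_terminate_backend()` sends SIGTERM, the equivalent of `kill -15`. The one-hour threshold is an arbitrary choice for illustration.

```sql
-- One row per server process with a connection, excluding this session.
SELECT pid, usename, state, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid();

-- pg_cancel_backend(pid)    cancels the running query (SIGINT).
-- pg_terminate_backend(pid) closes the connection (SIGTERM).
-- Here: close sessions that have sat idle in a transaction for over an hour.
SELECT pid, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '1 hour';
```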
  40. Regions
  41. Query path and optimization (no hinting)
  42. Query Path
      http://www.slideshare.net/SFScon/sfscon15-peter-moser-the-path-of-a-query-postgresql-internals
  43. Parser
      ● Syntax checks, like FRIM is not a keyword
        – SELECT * FRIM myTable;
      ● Catalog lookup
        – MyTable may not exist
      ● In the end, a query tree is built
        – Query tokenization: SELECT (keyword) employeeName (field id) count (function call)...
  44. Grammar and a query tree
  45. Planner
      ● Where the plan tree is built
      ● Where the best execution is decided upon
        – Seq or index scan? Index or bitmap index?
        – Which join order?
        – Which join strategy (nested loop, hash, merge)?
        – Inner or outer?
        – Aggregation: plain, hashed, sorted…
      ● Heuristic, if finding all plans is too costly
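PostgreSQL has no query hints, but the planner's choices can still be explored with the `enable_*` settings, which discourage a plan type rather than forbid it. A sketch, assuming the regression-test table `tenk1` used in the talk's later examples:

```sql
-- Plan chosen with default settings.
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;

BEGIN;
-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;
-- The planner now shows its next-best alternative.
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;
ROLLBACK;  -- SET LOCAL ends with the transaction
```

This is a diagnostic aid for comparing estimated costs, not a substitute for hints in production queries.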
  46. Full query path
  47. Example to explain EXPLAIN
          EXPLAIN SELECT * FROM tenk1;

                                   QUERY PLAN
          ------------------------------------------------------------
           Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
  48. Explaining EXPLAIN - what
          EXPLAIN SELECT * FROM tenk1;

                                   QUERY PLAN
          ------------------------------------------------------------
           Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
      ● Startup cost – estimated cost expended before the output phase begins
      ● Total cost – in arbitrary units (by convention, sequential page fetches); may change; assumes the node runs to completion
      ● Rows – estimated number of rows output by the node (LIMIT etc. may stop it early)
      ● Width – estimated average width of rows output by that node (in bytes)
  49. Explaining EXPLAIN - how
          EXPLAIN SELECT * FROM tenk1;

                                   QUERY PLAN
          ------------------------------------------------------------
           Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

          SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';  -- 358 | 10000
      ● No WHERE, no index
      ● Cost = disk pages read * seq_page_cost + rows scanned * cpu_tuple_cost
      ● 358 * 1.0 + 10000 * 0.01 = 458 (default values)
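The arithmetic above can be reproduced from the catalog in a single query. This is a sketch that assumes the `tenk1` table exists and its statistics are current. It reads the two cost parameters with `current_setting()` rather than hard-coding the defaults.

```sql
-- Sequential-scan estimate: pages * seq_page_cost + tuples * cpu_tuple_cost
SELECT relpages,
       reltuples,
       relpages  * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8
           AS estimated_seq_scan_cost
FROM pg_class
WHERE relname = 'tenk1';
```

With 358 pages, 10000 tuples, and the default settings of 1.0 and 0.01, the last column comes out at 458, matching the total cost in the plan.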
  50. Analyzing EXPLAIN ANALYZE
          EXPLAIN ANALYZE SELECT *
          FROM tenk1 t1, tenk2 t2
          WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;

                                                                     QUERY PLAN
          ---------------------------------------------------------------------------------------------------------------------------------
           Nested Loop  (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
             ->  Bitmap Heap Scan on tenk1 t1  (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
                   Recheck Cond: (unique1 < 10)
                   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
                         Index Cond: (unique1 < 10)
             ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
                   Index Cond: (unique2 = t1.unique2)
           Planning time: 0.181 ms
           Execution time: 0.501 ms
      ● Actually runs the query
      ● More info: actual times, rows removed by filter, sort method used, disk/memory used...
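Because EXPLAIN ANALYZE really executes the statement, analysing a data-modifying query changes data unless it is rolled back. A sketch of the usual precaution, again assuming the regression-test table `tenk1` (which has a `hundred` column); the `BUFFERS` option assumes PostgreSQL 9.0 or later.

```sql
BEGIN;

-- Runs the UPDATE for real and reports timings plus buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100;

-- Undo the changes the analysed statement made.
ROLLBACK;
```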
  54. Multithreading and PostgreSQL
  55. Summary
  56. Battle-tested
      ● Has been maturing since 1987
      ● Comes in many flavours (forks)
      ● Largest cluster – petabytes at Yahoo
      ● Skype, NASA, Instagram
      ● Stable:
        – Many years on one version
        – Good version support
        – Every year something new
        – Follows ANSI SQL standards
      https://www.postgresql.org/about/users/
  57. Great features
      ● Java, Perl, Python for stored procedures
      ● Add CTEs and FDWs => great ETL or µservice
      ● Handles XML and JSON
      ● Error reporting / logging
      ● MVCC built-in
      ● Windowing functions
      ● ...
  58. Solid internals
      ● Well-thought-out processes
      ● Built-in security (a dozen solutions)
      ● WAL, stats collector, vacuum
      ● Good rule engine and clear query optimization
        – No hinting will bother some people
      ● Plethora of data types
  59. Disadvantages
      ● More like Python than Perl/PHP
      ● Some learning curve
      ● Some say:
        – replication(‘s performance)
      ● I can’t think of more, doesn’t mean none are present :-)
  60. PostgreSQL – Tomasz Borek
      Database for next project? Why, PostgreSQL of course!
      @LAFK_pl
      Consultant @
