Teaching PostgreSQL to new people

PostgreSQL – Tomasz Borek
@LAFK_pl
Consultant @

About me
@LAFK_pl
Consultant @
Tomasz Borek

What will I tell you?
● About me (done)
● Show of hands
● Who “new people” might be
– And usually, in my case, are
● About teaching
– Comfort zone, learners, stepping back
● Chosen approaches, features, gotchas and the like
● Why, why, why
● And yes, this’ll be about Postgres, but in an unusual way

Show of hands
● Developers (not PL/SQL ones)
● Developers (PL/SQL ones)
● DBA (Admin, Architect)
● DevOps
● SysAdmin
● Trainers / consultants
● Other?

Surprisingly
● Often your colleagues
● Sometimes older
● Sometimes more senior
● Experienced
● With success under their belts
● Basically: FORMED already
– Or MADE, if you will

Developers are problem solvers
● Your colleagues have certain problems
● Is Postgres the solution?
– Or “a solution” at least?
● And how steep is the learning curve?
– Including the time it takes

Developers are not SQL people!
● Not many know JOINs very well
● Not many know how indexes work
● Not many know the weaknesses of indexes
● CTEs, window functions, procedures, cursors…
● They “omit” this
● Comfort zone is nice

Do not abandon them
Or they’ll abandon you

Do not abandon them
● Docs
● Materials
● Tools
● Links to good content
● Pictures, pictures, pictures
● They can edit / comment (Wiki)
● Your (colleagues) time

What is YOUR problem?
● DBA wanting respite for your DB?
● Malpractice in SQL queries?
● Why don’t they use XYZ feature?
● From tomorrow on, teach them some SQL
● Migration from X to Postgres
● Guidelines creation

Xun Kuang once said
不闻不若闻之，闻之不若见之，见之不若知之，知之不若行之
Xunzi book 8: Ruxiao, chapter 11

Xun Kuang once said
“Not having heard something is not as good as
having heard it; having heard it is not as good as
having seen it; having seen it is not as good as
knowing it; knowing it is not as good as putting it
into practice.”

A paraphrase of Xun Kuang would be
“Not having heard something < having heard it;
having heard it < having seen it;
having seen it < knowing it;
knowing it < putting it into practice.”

How do they learn?
● “Practice makes perfect”
– Except it doesn’t
● Learning styles
● Docs still relevant
– If well-placed, accessible and easy to get into

Repetitio est mater studiorum
● Crash course
● Workshop
● Problem solving on their own
● Docs to help
● Code reviews

Comfort zone
● Setup / install
● Moving around
● Logs, timing queries
● EXPLAIN + ANALYZE
● Indexes
● PL/pgSQL and variants
● NoSQL + XML

Chosen features, gotchas etc.
so
How to teach Postgres?

In short
● History – battle-tested, feature-rich, widely used
● Basics – moving around, commands, etc.
● Prepare your bait accordingly
– My faves
– Advanced features
– NoSQL angle
– …
● Don’t just drink the Kool-Aid!

Battle-tested
● Maturing since 1987
● Comes in many flavours (forks)
● Largest cluster – 2 PB at Yahoo
● Skype, NASA, Instagram
● Stable:
– Many years on one version
– Good version support
– Every year something new
– Follows ANSI SQL standards
https://www.postgresql.org/about/users/

Great angles
● Procedures: Java, Perl, Python, CTEs...
● Enterprise / NoSQL – handles XML and JSON
● Index power – spatial or geo or your own
● CTEs and FDWs => great ETL or µservice
● Pure dev: error reporting / logging, MVCC (dirty reads gone), own index, plenty of data types, Java/Perl/… inside
● Solid internals: processes, security built in,

Basics
● Setup
● psql
– Moving around
– What’s inside
● Indexes
● Joins
● Query path
● Explain, Explain Analyze

Query Path
http://www.slideshare.net/SFScon/sfscon15-peter-moser-the-path-of-a-query-postgresql-internals

Parser
● Syntax checks, e.g. FRIM is not a keyword
– SELECT * FRIM myTable;
● Catalog lookup
– myTable may not exist
● In the end, a query tree is built
– Query tokenization: SELECT (keyword) employeeName (field id) count (function call)...

Planner
● Where the plan tree is built
● Where the best execution strategy is decided upon
– Seq or index scan? Index or bitmap index?
– Which join order?
– Which join strategy (nested loop, hash, merge)?
– Inner or outer?
– Aggregation: plain, hashed, sorted…
● Heuristics, if enumerating all plans is too costly

Example to explain EXPLAIN
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)

Explaining EXPLAIN - what
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
● Startup cost (0.00) – estimated cost spent before the output phase begins
● Total cost (458.00) – in arbitrary units, by convention sequential page fetches; assumes the node runs to completion
● Rows – estimated number of rows the node outputs (a LIMIT etc. may stop it early)
● Width – estimated average width of the node’s output rows (in bytes)

Explaining EXPLAIN - how
QUERY PLAN
------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; -- 358 | 10000
● No WHERE, no index
● Cost = disk pages read * seq_page_cost + rows scanned * cpu_tuple_cost
● 358 * 1.0 + 10000 * 0.01 = 458 (with default settings)
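
The arithmetic on this slide can be checked in a few lines of Python. This is only a sketch: the helper name seq_scan_cost is made up for illustration, and the two constants are the default values of seq_page_cost and cpu_tuple_cost, which a real server may override.

```python
# Sketch of the planner's estimate for a plain sequential scan
# (no WHERE clause, so no per-row filter cost is added).
# seq_scan_cost is a hypothetical helper, not a PostgreSQL API.

SEQ_PAGE_COST = 1.0    # default seq_page_cost
CPU_TUPLE_COST = 0.01  # default cpu_tuple_cost


def seq_scan_cost(relpages, reltuples,
                  seq_page_cost=SEQ_PAGE_COST,
                  cpu_tuple_cost=CPU_TUPLE_COST):
    """Return the estimated total cost of reading every page and row."""
    return relpages * seq_page_cost + reltuples * cpu_tuple_cost


# relpages=358 and reltuples=10000 are the pg_class values from the slide.
total = seq_scan_cost(358, 10000)
print(f"{total:.2f}")
```

Changing either constant in the call shows how retuning the cost settings shifts the estimate without any change to the table itself.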

Analyzing EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=4.65..118.62 rows=10 width=488) (actual time=0.128..0.377 rows=10 loops=1)
  -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.47 rows=10 width=244) (actual time=0.057..0.121 rows=10 loops=1)
       Recheck Cond: (unique1 < 10)
       -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.36 rows=10 width=0) (actual time=0.024..0.024 rows=10 loops=1)
            Index Cond: (unique1 < 10)
  -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)
       Index Cond: (unique2 = t1.unique2)
Planning time: 0.181 ms
Execution time: 0.501 ms
● Actually runs the query
● More info: actual times, rows removed by filter, sort method used, disk/memory used...
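
When reading such a plan, it helps to remember that the actual time and rows of a node are averages per loop, so they need multiplying by loops before comparing against the estimate. A minimal Python sketch of that bookkeeping follows; the regular expression and the helper name summarize are assumptions made for illustration and only handle node lines shaped like the ones on this slide.

```python
import re

# Hypothetical pattern for one EXPLAIN ANALYZE node line, e.g.
# "(cost=0.29..7.91 rows=1 width=244) (actual time=0.021..0.022 rows=1 loops=10)"
NODE_RE = re.compile(
    r"\(cost=(?P<startup_cost>\d+\.\d+)\.\.(?P<total_cost>\d+\.\d+)"
    r" rows=(?P<est_rows>\d+) width=(?P<width>\d+)\)"
    r" \(actual time=(?P<first_ms>\d+\.\d+)\.\.(?P<last_ms>\d+\.\d+)"
    r" rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)


def summarize(plan_line):
    """Extract estimate vs. actual numbers from a single plan node line."""
    m = NODE_RE.search(plan_line)
    if m is None:
        raise ValueError("not an EXPLAIN ANALYZE node line")
    loops = int(m["loops"])
    return {
        "est_rows_per_loop": int(m["est_rows"]),
        "actual_rows_per_loop": int(m["rows"]),
        # Per-loop averages multiplied out to totals for the whole query.
        "actual_rows_total": int(m["rows"]) * loops,
        "total_ms": float(m["last_ms"]) * loops,
    }


line = ("-> Index Scan using tenk2_unique2 on tenk2 t2 "
        "(cost=0.29..7.91 rows=1 width=244) "
        "(actual time=0.021..0.022 rows=1 loops=10)")
info = summarize(line)
print(info)
```

For the inner index scan the estimate of one row per loop matches the actual count, and the ten loops together account for the ten rows the nested loop returns.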

My Faves
● Error reporting
● PL/xSQL – feel free to use Perl, Python, Ruby, Java, LISP...
● Data types
– XML and JSON handling
● Foreign Data Wrappers (FDW)
● Window functions
● Common table expressions (CTE) and recursive queries
● Power of Indexes

Will DB eat your cake?
● Thanks @anandology
Consider password VARCHAR(8)
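
One reading of this slide: a database that silently cuts a nine-character password down to eight has eaten the cake. PostgreSQL's documented rule for VARCHAR(n) is stricter: an over-length value is rejected unless the excess characters are all spaces, and only an explicit cast truncates without complaint. The Python sketch below merely models that rule so it can be tried without a server; store_varchar and cast_varchar are made-up names, not PostgreSQL functions.

```python
def store_varchar(value, n):
    """Model of storing a string into a VARCHAR(n) column."""
    if len(value) <= n:
        return value
    excess = value[n:]
    if excess.strip(" ") == "":
        # Only trailing spaces overflow: they are trimmed silently.
        return value[:n]
    # Anything else is refused instead of being cut.
    raise ValueError(f"value too long for type character varying({n})")


def cast_varchar(value, n):
    """Model of an explicit cast, value::varchar(n): truncates silently."""
    return value[:n]


print(store_varchar("secret12", 8))
print(cast_varchar("secret123", 8))
```

The distinction matters when teaching: the insert path protects the data, while the explicit cast is an opt-in to truncation.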

Logging, ‘gotchas’
● Default is to stderr only
● Set on the command line or in the config file, not through SET
● Where is it?
● How to log queries… or turning logging_collector on

Where is it?
● Default
– data/pg_log
● Launchers can set it (Mac Homebrew/plist)
● Version and config dependent

Logging, turn it on
● Default is to stderr only
● In PG (postgresql.conf):
logging_collector = on
log_filename = strftime-patterned filename
[log_destination = [stderr|syslog|csvlog] ]
log_statement = [none|ddl|mod|all] # all logs every statement
log_min_error_statement = error
log_line_prefix = '%t %c %u ' # time sessionid user

PL/pgSQL
● Stored procedure dilemma
– Where to keep your logic?
– How your logic is NOT in your SCM

PL/pgSQL
● Over a dozen options:
– Perl, Python, Ruby,
– pgSQL, Java,
– TCL, LISP…
● DevOps, SysAdmins, DBAs… ETLs etc.

Perl function example
CREATE FUNCTION perl_max (integer, integer) RETURNS integer AS $$
    my ($x, $y) = @_;
    if (not defined $x) {
        return undef if not defined $y;
        return $y;
    }
    return $x if not defined $y;
    return $x if $x > $y;
    return $y;
$$ LANGUAGE plperl;
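
The same NULL-handling logic reads much the same in Python, another of the languages the deck lists. In PL/Python a SQL NULL argument arrives as None. The sketch below is plain Python so it runs without a database, and the name py_max is made up for illustration.

```python
def py_max(x, y):
    """Return the larger argument, treating None (SQL NULL) as 'missing'."""
    if x is None:
        return y  # also covers the case where both are None
    if y is None:
        return x
    return x if x > y else y


print(py_max(3, 7))
print(py_max(None, 7))
print(py_max(None, None))
```

As in the Perl version, the result is NULL (None) only when both inputs are missing.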

XML or JSON support
● Parsing and retrieving XML (functions)
● Valid JSON checks (type)
● Careful with encoding!
– PG allows only one server encoding per database
– Set it to UTF-8 or weep
● Document database instead of object-oriented or relational
– JSON, JSONB, HSTORE – NoSQL fun welcome!

HSTORE?
CREATE EXTENSION IF NOT EXISTS hstore; -- hstore ships as an extension
CREATE TABLE example (
    id serial PRIMARY KEY,
    data hstore);
INSERT INTO example (data) VALUES
    ('name => "John Smith", age => 28, gender => "M"'),
    ('name => "Jane Smith", age => 24');
SELECT id, data->'name'
FROM example;
-- hstore values are text, so cast before comparing numerically
SELECT id, data->'age'
FROM example
WHERE (data->'age')::int >= 25;
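
A gotcha worth showing here: hstore keeps every value as text, so comparing data->'age' with '25' directly is a string comparison, while casting first, as in (data->'age')::int >= 25, compares numbers. The Python sketch below shows how the two can disagree. The sample ages are made up, and Python compares strings by code point, which for plain digits gives the same order as typical collations.

```python
# Hypothetical 'age' values as they would come back from hstore: all text.
ages = ["28", "24", "9", "100"]

# String comparison: character by character, so "9" >= "25" but "100" is not.
as_text = [a for a in ages if a >= "25"]

# Numeric comparison after an explicit conversion.
as_int = [a for a in ages if int(a) >= 25]

print(as_text)
print(as_int)
```

With only 28 and 24 in the table, as on the slide, both comparisons happen to agree, which is exactly why the problem tends to surface late.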

XML and JSON datatype
CREATE TABLE test (
...,
xml_file xml,
json_file json,
...
);

XML functions example
SELECT XMLROOT (
    XMLELEMENT (
        NAME gazonk,
        XMLATTRIBUTES (
            'val' AS name,
            1 + 1 AS num
        ),
        XMLELEMENT (
            NAME qux,
            'foo'
        )
    ),
    VERSION '1.0',
    STANDALONE YES
);
Result:
<?xml version="1.0" standalone="yes"?>
<gazonk name="val" num="2">
  <qux>foo</qux>
</gazonk>
Creating xml values from text:
xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml

Check out processes
● pgrep -l postgres
● htop > filter: postgres
● Or whatever tool you usually use
● Careful with kill -9 on connections
– kill -15 is better

Before
● Who are they?
● What is your problem?
● How large is their comfort zone, and how to push them out of it?
● Materials, docs, workshop preparation
● How much time for training?
● How much time after?
● How many people will it be?
● What indicates that the problem is solved?

During
● Establish the goal
– And, if possible, learning styles
● Promise support (and say how!)
– Push them out of their comfort zone!
● Ask for hard work and stupid questions
● Show the documentation, do a live tour
● Do the workshop
● Involve them, find the best ones
– You will have them help you later
● Expect questions, make them ask
– Again, push them out of their comfort zone!

After
● Where are the docs?
– Are they using them?
● Answer the questions
– Again, and again
● Code reviews
– Deliver on support promise!
– Involve promising students
● Is the problem gone / better?

Don’t omit the basics
● Joins
● Indexes – how they work
● Query path (EXPLAIN, EXPLAIN ANALYZE)
● Moving around (psql)
● Setup and getting to DB

Postgres is cool
● Goodies like error reporting or log line prefix
● Well-thought-out processes
● Good for µservices and enterprise
● Not only SQL (XML, JSON, Perl, Python...)
● Ask DB
● Indexes
● Powerful: CTEs, recursive queries, FDWs...
● Battle-tested and always high
