PostgreSQL

Reuven M. Lerner (reuven@lerner.co.il)
IL-Techtalks
November 14th, 2012

Who am I?
• Web developer since 1993
• Linux Journal columnist since 1996
• Software architect, developer, consultant
• Mostly Ruby on Rails + PostgreSQL, but
also Python, PHP, Perl, JavaScript, MySQL,
MongoDB, and lots more...
• PostgreSQL user since (at least) 1997

What do I do?

• Web development, especially in Rails
• Teaching/training
• Coaching/consulting

What is a database?

Store data
confidently

Database

Retrieve data
flexibly

Relational databases

Define tables,
store data in them

Database

Retrieve data from
related tables

Lots of options!

• Oracle
• Microsoft SQL Server
• IBM DB2
• MySQL
• PostgreSQL

How do you choose?
• Integrity (ACID compliance)

• Data types

• Functionality

• Tools

• Extensibility

• Documentation

• Community

PostgreSQL
• Very fast, very scalable. (Just ask Skype.)
• Amazingly flexible, easily extensible.
• Rock-solid — no crashes, corruption,
security issues for years
• Ridiculously easy administration
• It also happens to be free (MIT/BSD)

What about MySQL?
• PostgreSQL has many more features
• Not nearly as popular as MySQL
• No single company behind it
• (A good thing, I think!)
• After using both, I prefer PostgreSQL
• I’ll be happy to answer questions later

Brief history
• Ingres (Stonebraker, Berkeley)
• Postgres (Stonebraker, Berkeley)
• PostgreSQL project = Postgres + SQL
• About one major release per year
• Version 8.x — Windows port, recovery
• Version 9.0 — hot replication, upgrades

ACID
• ACID — basic standard for databases
• Atomicity
• Consistency
• Isolation
• Durability
• Pg has always been ACID compliant

Data types
• Boolean
• Numeric (integer, float, decimal)
• (var)char, text (practically unlimited, up to 1 GB), binary
• sequences (guaranteed to be unique)
• Date/time and time intervals
• IP addresses, XML, enums, arrays

Or create your own!

CREATE TYPE Person AS
  (first_name TEXT, last_name TEXT);

CREATE TABLE Members
  (group_id INTEGER, member Person);
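
A minimal sketch of how the composite type might be used, assuming the Person type and Members table defined above; the inserted values are made up for illustration.

```sql
-- Build a Person value with a ROW constructor
INSERT INTO Members (group_id, member)
VALUES (1, ROW('Reuven', 'Lerner'));

-- Parentheses are needed around the column name to reach a field of the composite
SELECT group_id, (member).first_name, (member).last_name
FROM Members;
```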

Strong typing
• PostgreSQL won’t automatically change
types for you.
• This can be annoying at first, but it is
meant to protect your data!
• You can cast from one type to another with
CAST(value AS type) or the :: operator
• You can also define your own casts
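
A short sketch of both casting styles; the literal values are illustrative only.

```sql
-- Standard SQL syntax
SELECT CAST('42' AS INTEGER) + 1;

-- PostgreSQL shorthand with the :: operator
SELECT '2012-11-14'::DATE + 7;   -- date plus an integer number of days

-- length() expects text, so an integer has to be cast explicitly
SELECT length(12345::TEXT);
```

Without the explicit cast in the last query, PostgreSQL would refuse to guess which function you meant, which is exactly the strictness the slide describes.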

PostGIS
• Some people took this all the way
• Want to include geographical information?
• No problem — we’ve got PostGIS!
• Complete GIS solution, with data types and
functions
• Keeps pace with main PostgreSQL revisions

Object oriented tables

• Employee table inherits from People table:
CREATE TABLE Employee
(employee_id INTEGER,
department_id INTEGER)
INHERITS (People);
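
A sketch of how inheritance behaves in queries. The People definition is an assumption (the slides do not show it), and the table and column names below are hypothetical stand-ins so the example is self-contained.

```sql
-- Hypothetical parent and child tables
CREATE TABLE people_demo (id SERIAL, first_name TEXT, last_name TEXT);

CREATE TABLE employee_demo
  (employee_id INTEGER,
   department_id INTEGER)
  INHERITS (people_demo);

INSERT INTO employee_demo (first_name, last_name, employee_id, department_id)
VALUES ('Ada', 'Lovelace', 1, 10);

-- Querying the parent also returns rows stored in child tables
SELECT first_name, last_name FROM people_demo;

-- ONLY restricts the query to the parent table itself
SELECT first_name, last_name FROM ONLY people_demo;
```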

Foreign keys that work
CREATE TABLE DVDs (id SERIAL, title TEXT,
  store_id INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id)
VALUES ('Attack of the Killer Tomatoes', 500);

ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
DETAIL: Key (store_id)=(500) is not present in table "stores".

Custom validity checks
CREATE TABLE DVDs (id SERIAL,
  title TEXT CHECK (length(title) > 3),
  store_id INTEGER REFERENCES Stores);

INSERT INTO DVDs (title, store_id)
VALUES ('AB', 500);

ERROR: new row for relation "dvds" violates check constraint "dvds_title_check"

No more bad dates!
INSERT INTO updates (created_at)
VALUES ('32-feb-2008');

ERROR: date/time field value out of range: "32-feb-2008"
LINE 1: insert into updates (created_at) values ('32-feb-2008');

Timestamp vs. Interval
testdb=# select now();
              now
-------------------------------
 2010-10-31 08:58:23.365792+02
(1 row)
• Timestamp: a point in time

testdb=# select now() - interval '3 days';
           ?column?
-------------------------------
 2010-10-28 08:58:28.870011+02
(1 row)
• Interval: a difference between points in time
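
A few more combinations, as a sketch of how the two types interact; the literal dates are arbitrary.

```sql
-- timestamp - interval = timestamp
SELECT now() - interval '3 days';

-- timestamp - timestamp = interval
SELECT TIMESTAMP '2010-10-31 08:00' - TIMESTAMP '2010-10-28 06:30';

-- date + interval = timestamp
SELECT DATE '2010-10-31' + interval '1 month';
```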

Built-in functions
• Math
• Text processing (including regexps)
• Date/time calculations
• Conditionals (CASE, COALESCE, NULLIF)
for use in queries
• Extensive library of geometrical functions

Or write your own!
• PL/pgSQL
• PL/Perl
• PL/Python
• PL/Ruby
• PL/R
• PL/Tcl

Or write your own!
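CREATE OR REPLACE FUNCTION remove_cache_tables() RETURNS VOID AS $$
DECLARE
  r pg_catalog.pg_tables%rowtype;
BEGIN
  -- Loop over tables in the public schema whose names start with "cache_".
  -- ESCAPE makes the underscore literal instead of a one-character wildcard.
  FOR r IN SELECT * FROM pg_catalog.pg_tables
           WHERE schemaname = 'public'
             AND tablename ILIKE 'cache!_%' ESCAPE '!'
  LOOP
    RAISE NOTICE 'Now dropping table %', r.tablename;
    -- quote_ident() keeps unusual table names from breaking the statement
    EXECUTE 'DROP TABLE ' || quote_ident(r.tablename);
  END LOOP;
END;
$$ LANGUAGE plpgsql;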

Another example
CREATE OR REPLACE FUNCTION store_hostname() RETURNS TRIGGER AS $store_hostname$
BEGIN
  NEW.hostname := 'http://' ||
    substring(NEW.url, '(?:http://)?([^/]+)');
  RETURN NEW;
END;
$store_hostname$ LANGUAGE plpgsql;

Triggers

• Yes, that last function was a trigger
• Automatically execute functions upon
INSERT, UPDATE, and/or DELETE
• Can execute before or after
• Very powerful, very fast
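
A sketch of attaching the store_hostname() function to a table. The links table and the trigger name are hypothetical; only the url and hostname columns are implied by the function itself.

```sql
-- Hypothetical table with the two columns that store_hostname() touches
CREATE TABLE links (id SERIAL, url TEXT, hostname TEXT);

-- Run the function before each row is inserted or updated
CREATE TRIGGER store_hostname_trigger
  BEFORE INSERT OR UPDATE ON links
  FOR EACH ROW
  EXECUTE PROCEDURE store_hostname();

-- hostname is filled in by the trigger, not by the INSERT
INSERT INTO links (url) VALUES ('http://www.postgresql.org/docs/');
SELECT url, hostname FROM links;
```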

Function possibilities
• Computing values, strings
• Returning table-like sets of values
• Encapsulating queries
• Dynamically generating queries via strings
• Triggers: Modifying data before it is inserted
or updated

Why use a PL/lang?

• Other libraries (e.g., CPAN for Perl)
• Faster, optimized functions (e.g., R)
• Programmer familiarity
• Cached query plans

Views and rules
• Views are stored SELECT statements
• Pretend that something is a read-only table
• Rules let you turn it into a read/write table
• Intercept and rewrite incoming query
• Check or change data
• Change where data is stored
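
A sketch of a view plus a rule that makes it writable, assuming the DVDs table from the earlier slides; the view and rule names are made up.

```sql
-- A view is a stored SELECT that can be queried like a table
CREATE VIEW long_titles AS
  SELECT id, title FROM DVDs WHERE length(title) > 20;

SELECT * FROM long_titles;

-- A rule rewrites INSERTs on the view into INSERTs on the underlying table
CREATE RULE long_titles_insert AS
  ON INSERT TO long_titles
  DO INSTEAD
  INSERT INTO DVDs (title) VALUES (NEW.title);

INSERT INTO long_titles (title)
VALUES ('Return of the Killer Tomatoes');
```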

Full-text indexing

• Built into PostgreSQL
• Handles stop words, different languages,
synonyms, and even (often) stemming
• Very powerful, but it can take some time to
get configured correctly
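
A minimal sketch of the built-in full-text search, assuming the 'english' configuration and the DVDs table from earlier slides; the index name is hypothetical.

```sql
-- Stemming lets 'fox' and 'jump' match 'foxes' and 'jumped'
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs')
       @@ to_tsquery('english', 'fox & jump');

-- A GIN index on the computed tsvector speeds up searches
CREATE INDEX dvds_title_fts_idx
  ON DVDs USING gin (to_tsvector('english', title));

SELECT title
FROM DVDs
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'killer & tomato');
```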

Transactions
• In PostgreSQL from the beginning
• Use transactions for just about anything:
BEGIN;
DROP TABLE DVDs;
ROLLBACK;
SELECT * FROM DVDs; -- Works!

Savepoints
(or, sub-transactions)
BEGIN;
INSERT INTO table1 VALUES (1);
SAVEPOINT my_savepoint;
ROLLBACK TO SAVEPOINT my_savepoint;
COMMIT;
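
A slightly longer sketch showing what a savepoint actually undoes. The table name is a hypothetical single-column table created just for this example.

```sql
CREATE TABLE savepoint_demo (n INTEGER);

BEGIN;
INSERT INTO savepoint_demo VALUES (1);
SAVEPOINT my_savepoint;
INSERT INTO savepoint_demo VALUES (2);
ROLLBACK TO SAVEPOINT my_savepoint;   -- undoes only the second insert
INSERT INTO savepoint_demo VALUES (3);
COMMIT;

SELECT n FROM savepoint_demo;         -- rows 1 and 3 remain
```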

MVCC
• Readers and writers don’t block each other
• “Multi-version concurrency control”
• xmin, xmax on each tuple; rows are those
tuples with txid_current between them
• Old versions stick around until vacuumed
• Autovacuum removes even this issue

MVCC
• Look at a row’s xmin and xmax
• Look at txid_current()
• Start transaction; look at row’s xmin/xmax
• Look at xmin/xmax on that row from
another session
• Commit, and look again at both!
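
The steps above could be tried roughly as follows, assuming a DVDs table that contains a row with id = 1 (an assumption made only for this sketch).

```sql
-- Every row carries hidden system columns
SELECT xmin, xmax, id, title FROM DVDs WHERE id = 1;

-- The current transaction's ID
SELECT txid_current();

BEGIN;
UPDATE DVDs SET title = title || '!' WHERE id = 1;

-- This session sees the new row version, with a new xmin
SELECT xmin, xmax, id, title FROM DVDs WHERE id = 1;

-- Run the same SELECT from a second session now: it still sees the old
-- version, whose xmax has been set by the uncommitted update.

COMMIT;
```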

Downsides of MVCC
• MVCC is usually fantastic
• But if you insert or update many rows, and
then do a COUNT(*), things will be slow
• There are solutions — including more
aggressive auto-vacuuming
• 9.2 introduced index-only scans, which improve this

Indexing
• Regular, unique indexes
• Functional indexes
• Index calling a function on a column
• Partial indexes
• Index only rows matching criteria
• Cluster table on an index
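
A sketch of each index type from the list. The users table, its columns, and the index names are hypothetical.

```sql
CREATE TABLE users (
  id SERIAL,
  email TEXT,
  active BOOLEAN DEFAULT true,
  created_at TIMESTAMP DEFAULT now()
);

-- Regular and unique indexes
CREATE INDEX users_created_at_idx ON users (created_at);
CREATE UNIQUE INDEX users_email_idx ON users (email);

-- Functional index: indexes the result of a function call on a column
CREATE INDEX users_lower_email_idx ON users (lower(email));

-- Partial index: only rows matching the WHERE clause are indexed
CREATE INDEX users_active_created_idx ON users (created_at) WHERE active;

-- Physically reorder the table to follow an index
CLUSTER users USING users_created_at_idx;
```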

CTEs
• Adds a “WITH” statement, which defines a
sorta-kinda temp table
• You can then query that same temp table
• Makes many queries easier to read, write,
without a real temp table
• Better yet: CTEs can be recursive, for
everything from Fibonacci to org charts
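
A sketch of a recursive CTE that generates the first ten Fibonacci numbers; the names fib, n, a, and b are arbitrary.

```sql
WITH RECURSIVE fib(n, a, b) AS (
    -- Starting row: position 1, first two Fibonacci numbers
    SELECT 1, 0, 1
  UNION ALL
    -- Each step shifts the pair forward by one position
    SELECT n + 1, b, a + b
    FROM fib
    WHERE n < 10
)
SELECT n, a AS fibonacci
FROM fib;
```

The same pattern, with a join back to a table in the recursive part, walks hierarchies such as org charts.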

Speed and scalability
• MVCC + a smart query optimizer makes
PostgreSQL pretty fast and smart
• Statistics sampled from your tables (via
ANALYZE) inform the query planner
• Several scan types, join types are weighed
• Benchmarks consistently show excellent
performance with high mixes of read/write

WAL
• All activity in the database is put in “write-
ahead logs” before it happens
• If the database server fails, it replays the
WALs, then continues
• You can change how often WALs are
written, to improve performance
• PITR — restore database from WALs

Log shipping
• Copy WALs to a second, identical server —
known as “log shipping” — and you have a
backup
• If the primary server goes down, you can
bring the secondary up in its place
• This was known as “warm standby,” and
worked in 8.4

Hot standby,
streaming replication
• As of 9.0, you don’t have to do this
• You can have the primary stream the
information to the secondary
• Almost-instant updates
• The secondary machine can answer read-
only queries (“hot standby”), not just
handle failover

Extensions
• Provides a standardized mechanism for
downloading, installing, and versioning
extensions
• New data types, functions, languages are
possible
• Download, search via pgxn.org
• Similar to CPAN, PyPI, or RubyGems

SQL/MED

• SQL/MED was introduced in 9.1
• Query information from other databases
(and database-like interfaces)
• So if you have data in MySQL, Oracle,
CSV ... just install a wrapper, and you can
query it like a PostgreSQL table
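
A sketch using file_fdw, the CSV wrapper that ships with PostgreSQL. The server name, table definition, and file path are hypothetical.

```sql
-- Install the wrapper and define a "server" for it
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Describe the CSV file as a foreign table (path is a placeholder)
CREATE FOREIGN TABLE imported_dvds (title TEXT, store_id INTEGER)
  SERVER csv_files
  OPTIONS (filename '/tmp/dvds.csv', format 'csv', header 'true');

-- Query it like any other table
SELECT title FROM imported_dvds WHERE store_id = 500;
```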

Unlogged tables

• All actions are logged in WALs
• That adds some overhead, which isn’t
required by throwaway data
• Unlogged tables (different from temp
tables!) offer a speedup, in exchange for
less reliability
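
A sketch of an unlogged table for throwaway data; the table and column names are made up.

```sql
-- Skips the WAL: faster writes, but contents are lost after a crash
CREATE UNLOGGED TABLE page_cache (
  cache_key TEXT PRIMARY KEY,
  body TEXT,
  cached_at TIMESTAMP DEFAULT now()
);

INSERT INTO page_cache (cache_key, body)
VALUES ('/home', '<html>...</html>');
```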

New in 9.2
• JSON support
• Range types, for handling
• Much more scalable — from 24 cores and
75k queries/sec to 64 cores and 350k
queries/sec
• Index-only queries (“covering indexes”)
• Cascading replication
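
A small sketch of two of the 9.2 features, range types and the json type; all values are illustrative.

```sql
-- Does the range [10, 20) contain 15?
SELECT int4range(10, 20) @> 15;

-- Do two date ranges overlap?
SELECT daterange(DATE '2012-01-01', DATE '2012-07-01')
       && daterange(DATE '2012-06-01', DATE '2012-09-01');

-- The json type validates its input when the value is stored
SELECT '{"name": "PostgreSQL", "version": 9.2}'::json;
```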

Web problems
• PostgreSQL is great as a Web backend
• But if you use an ORM (e.g., ActiveRecord),
you are probably losing much of the power
• e.g., foreign keys, CTE, triggers, and views
• No good way to bridge this gap — for now
• There are always methods, but this is an
area that definitely needs some work

Tablespaces
• You can create any number of
“tablespaces,” separate storage areas
• Put tables, indexes on different tablespaces
• Most useful with multiple disks
• Separate tables (or parts of a partitioned
table)... or separate tables from indexes
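
A sketch of creating and using a tablespace. The tablespace name, directory, and table are hypothetical; the directory must already exist and be owned by the PostgreSQL system user.

```sql
-- Register a storage area on another disk (path is a placeholder)
CREATE TABLESPACE fast_disk LOCATION '/mnt/ssd/pgdata';

-- Table on the default tablespace, index on the fast disk
CREATE TABLE big_table (id SERIAL, payload TEXT);
CREATE INDEX big_table_id_idx ON big_table (id) TABLESPACE fast_disk;

-- Or place a whole table there
CREATE TABLE hot_table (id SERIAL, payload TEXT) TABLESPACE fast_disk;
```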

Partitioning
• Combine object-oriented tables, CHECK
clauses, and tablespaces for partitioning
• Example: Invoices from Jan-June go in table
“q12”, and July-December go in table “q34”
• Now PostgreSQL knows where to look
when you SELECT from the parent table
• Note that INSERT requires a trigger
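
A sketch of the invoices example using inheritance, CHECK clauses, and an INSERT trigger. The q12 and q34 names come from the slide; the parent table, columns, year, function, and trigger names are assumptions.

```sql
CREATE TABLE invoices (id SERIAL, issued_on DATE NOT NULL, amount NUMERIC);

-- Child tables, each restricted to half of the year
CREATE TABLE q12 (
  CHECK (issued_on >= DATE '2012-01-01' AND issued_on < DATE '2012-07-01')
) INHERITS (invoices);

CREATE TABLE q34 (
  CHECK (issued_on >= DATE '2012-07-01' AND issued_on < DATE '2013-01-01')
) INHERITS (invoices);

-- Route each new row to the matching child table
CREATE OR REPLACE FUNCTION invoices_insert_router() RETURNS TRIGGER AS $$
BEGIN
  IF NEW.issued_on < DATE '2012-07-01' THEN
    INSERT INTO q12 VALUES (NEW.*);
  ELSE
    INSERT INTO q34 VALUES (NEW.*);
  END IF;
  RETURN NULL;  -- the row has already been stored in a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoices_insert
  BEFORE INSERT ON invoices
  FOR EACH ROW
  EXECUTE PROCEDURE invoices_insert_router();

INSERT INTO invoices (issued_on, amount) VALUES (DATE '2012-03-15', 100);

-- With constraint exclusion, the planner can skip q34 for this query
SELECT * FROM invoices WHERE issued_on = DATE '2012-03-15';
```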

Reflection

• pg_catalog schema contains everything
about your database
• Tables, functions, views, etc.
• You can learn a great deal about
PostgreSQL by looking through the
pg_catalog schema
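
A few starting points for exploring the catalog, as a sketch; the filter values are arbitrary.

```sql
-- Tables in the public schema
SELECT schemaname, tablename
FROM pg_catalog.pg_tables
WHERE schemaname = 'public';

-- Functions whose names start with "store"
SELECT proname
FROM pg_catalog.pg_proc
WHERE proname LIKE 'store%';

-- Views (relkind 'v') known to the system
SELECT relname
FROM pg_catalog.pg_class
WHERE relkind = 'v';
```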

Advanced uses

• GridSQL: Split a query across multiple
PostgreSQL servers
• Very large-scale data warehousing:
Greenplum

Client libraries
• libpq (in C)
• Others by 3rd parties:
• Python
• Ruby
• Perl
• Java (JDBC)
• .NET (npgsql)
• ODBC
• JavaScript (!)
• Just about any language you can imagine

Tools
• Yeah, tools are more primitive
• If you love GUIs, and hate the command
line, then PostgreSQL will be hard for you
• PgAdmin and other tools are OK, but not
really up to the task for “real” work
• PgAdmin does provide some graphical
query building and “explain” output

Windows compatibility
• It works on Windows
• .NET drivers work, as well
• Logging is far from perfect (can go to the
Windows log tool, but not filtered well)
• Configuration is still in a text file, foreign to
most Windows people
• Windows is still a second-class citizen

Who uses it?
• Afilias
• IMDB
• Apple
• Skype
• BASF
• Sourceforge
• Cisco
• Heroku
• CD Baby
• Checkpoint
• Etsy

Who supports it?

• EnterpriseDB — products and services
• 2ndQuadrant

• Many freelancers (like me!)

PostgreSQL problems
• Tuning is still hard (but getting easier)
• Double quotes
• Lack of good GUI-based tools
• Some features (e.g., materialized views) that
people want without having to resort to
hacks and triggers/rules
• Multi-master (of course!)

Bottom line
• PostgreSQL: BSD licensed, easy to install,
easy to use, easy to administer
• Still not quite up to commercial databases
regarding features — but not far behind
• More than good enough for places like
Skype and Afilias; probably good enough
for you!

Want to learn more?
• Mailing lists, wikis, and blogs
• All at http://postgresql.org/
• http://planetpostgresql.org
• PostgreSQL training, consulting,
development, hand-holding, and general
encouragement

Thanks!
(Any questions?)

reuven@lerner.co.il
http://www.lerner.co.il/
054-496-8405
“reuvenlerner” on Skype/AIM
