PostgreSQL

This is the latest version of my PostgreSQL introduction for IL-TechTalks, a free service that introduces the Israeli hi-tech community to new and interesting technologies. In this talk, I describe the history and licensing of PostgreSQL, its built-in capabilities, and some of the features added in the 9.1 and 9.2 releases that make it an attractive option for many applications.

Published in: Technology
  • PostgreSQL

    1. PostgreSQL
        Reuven M. Lerner (reuven@lerner.co.il)
        IL-Techtalks, November 14th, 2012
    2. Who am I?
        • Web developer since 1993
        • Linux Journal columnist since 1996
        • Software architect, developer, consultant
        • Mostly Ruby on Rails + PostgreSQL, but also Python, PHP, Perl, JavaScript, MySQL, MongoDB, and lots more...
        • PostgreSQL user since (at least) 1997
    3. What do I do?
        • Web development, especially in Rails
        • Teaching/training
        • Coaching/consulting
    4. What is a database?
        Store data confidently → Database → Retrieve data flexibly
    5. Relational databases
        Define tables, store data in them → Database → Retrieve data from related tables
    6. Lots of options!
        • Oracle
        • Microsoft SQL Server
        • IBM DB2
        • MySQL
        • PostgreSQL
    7. How do you choose?
        • Integrity (ACID compliance)
        • Data types
        • Functionality
        • Tools
        • Extensibility
        • Documentation
        • Community
    8. PostgreSQL
        • Very fast, very scalable. (Just ask Skype.)
        • Amazingly flexible, easily extensible.
        • Rock-solid: no crashes, corruption, security issues for years
        • Ridiculously easy administration
        • It also happens to be free (MIT/BSD)
    9. PostgreSQL
    15. What about MySQL?
        • PostgreSQL has many more features
        • Not nearly as popular as MySQL
        • No single company behind it
            • (A good thing, I think!)
        • After using both, I prefer PostgreSQL
            • I’ll be happy to answer questions later
    16. Brief history
        • Ingres (Stonebraker, Berkeley)
        • Postgres (Stonebraker, Berkeley)
        • PostgreSQL project = Postgres + SQL
        • About one major release per year
        • Version 8.x: Windows port, recovery
        • Version 9.0: hot replication, upgrades
    17. ACID
        • ACID: basic standard for databases
            • Atomicity
            • Consistency
            • Isolation
            • Durability
        • Pg has always been ACID compliant
    18. Data types
        • Boolean
        • Numeric (integer, float, decimal)
        • (var)char, text (effectively unlimited, up to 1 GB per value), binary
        • Sequences (guaranteed to be unique)
        • Date/time and time intervals
        • IP addresses, XML, enums, arrays
    22. Or create your own!

            CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
            CREATE TABLE Members (group_id INTEGER, member Person);

    23. Strong typing
        • PostgreSQL won’t automatically change types for you.
        • This can be annoying at first, but it is meant to protect your data!
        • You can cast from one type to another with the CAST(value AS type) expression or the :: operator
        • You can also define your own casts
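
        A minimal sketch of the two casting syntaxes mentioned on this slide. The literal values are made up for illustration.

        ```sql
        -- Standard SQL syntax: CAST(value AS type)
        SELECT CAST('42' AS INTEGER) + 1;

        -- PostgreSQL shorthand: the :: operator does the same thing
        SELECT '42'::INTEGER + 1;

        -- length() is defined for text, not for integers, so the value
        -- has to be cast explicitly before the function will accept it.
        SELECT length(42::TEXT);
        ```

        Calling length(42) without the cast fails because PostgreSQL will not silently turn the integer into text, which is the strictness the slide describes.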
    24. PostGIS
        • Some people took this all the way
        • Want to include geographical information?
        • No problem: we’ve got PostGIS!
        • Complete GIS solution, with data types and functions
        • Keeps pace with main PostgreSQL revisions
    25. Object oriented tables
        • Employee table inherits from People table:

            CREATE TABLE Employee (
                employee_id   INTEGER,
                department_id INTEGER
            ) INHERITS (People);

    30. Foreign keys that work

            CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores);
            INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500);

            ERROR:  insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
            DETAIL:  Key (store_id)=(500) is not present in table "stores".

    32. Custom validity checks

            CREATE TABLE DVDs (id SERIAL, title TEXT CHECK (length(title) > 3), store_id INTEGER REFERENCES Stores);
            INSERT INTO DVDs (title, store_id) VALUES ('AB', 500);

            ERROR:  new row for relation "dvds" violates check constraint "dvds_title_check"

    35. No more bad dates!

            INSERT INTO UPDATES (created_at) VALUES ('32-feb-2008');

            ERROR:  date/time field value out of range: "32-feb-2008"
            LINE 1: insert into updates (created_at) values ('32-feb-2008');

    36. Timestamp vs. Interval

            testdb=# select now();
                          now
            -------------------------------
             2010-10-31 08:58:23.365792+02
            (1 row)

            testdb=# select now() - interval '3 days';
                       ?column?
            -------------------------------
             2010-10-28 08:58:28.870011+02
            (1 row)

        • now() returns a timestamp: a point in time
        • interval '3 days' is an interval: a difference between points in time
    37. Built-in functions
        • Math
        • Text processing (including regexps)
        • Date/time calculations
        • Conditionals (CASE, COALESCE, NULLIF) for use in queries
        • Extensive library of geometrical functions
    38. Or write your own!
        • PL/pgSQL
        • PL/Perl
        • PL/Python
        • PL/Ruby
        • PL/R
        • PL/Tcl
    39. Or write your own!

            CREATE OR REPLACE FUNCTION remove_cache_tables() RETURNS VOID AS $$
            DECLARE
                r pg_catalog.pg_tables%rowtype;
            BEGIN
                -- Visit every table in the public schema whose name starts with "cache"
                FOR r IN SELECT * FROM pg_catalog.pg_tables
                         WHERE schemaname = 'public'
                           AND tablename ILIKE 'cache_%'
                LOOP
                    RAISE NOTICE 'Now dropping table %', r.tablename;
                    -- quote_ident guards against unusual table names
                    EXECUTE 'DROP TABLE ' || quote_ident(r.tablename);
                END LOOP;
            END;
            $$ LANGUAGE plpgsql;

    40. Another example

            CREATE OR REPLACE FUNCTION store_hostname() RETURNS TRIGGER AS $store_hostname$
            BEGIN
                -- Extract the host part of the URL and store it in the row being written
                NEW.hostname := 'http://' || substring(NEW.url, '(?:http://)?([^/]+)');
                RETURN NEW;
            END;
            $store_hostname$ LANGUAGE plpgsql;

    41. Triggers
        • Yes, that last function was a trigger
        • Automatically execute functions upon INSERT, UPDATE, and/or DELETE
        • Can execute before or after
        • Very powerful, very fast
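
        A sketch of how the store_hostname() function from the previous slide might be attached to a table. The pages table and the trigger name are assumptions made up for this example; the function body is repeated so the block can run on its own.

        ```sql
        -- Hypothetical table; only the url and hostname columns matter here.
        CREATE TABLE pages (
            id       SERIAL PRIMARY KEY,
            url      TEXT NOT NULL,
            hostname TEXT
        );

        -- The trigger function from the previous slide.
        CREATE OR REPLACE FUNCTION store_hostname() RETURNS TRIGGER AS $store_hostname$
        BEGIN
            NEW.hostname := 'http://' || substring(NEW.url, '(?:http://)?([^/]+)');
            RETURN NEW;
        END;
        $store_hostname$ LANGUAGE plpgsql;

        -- Run the function before every INSERT or UPDATE, once per row,
        -- so the modified NEW row is what actually gets stored.
        CREATE TRIGGER store_hostname_trigger
            BEFORE INSERT OR UPDATE ON pages
            FOR EACH ROW EXECUTE PROCEDURE store_hostname();

        INSERT INTO pages (url) VALUES ('http://www.postgresql.org/docs/');
        SELECT url, hostname FROM pages;
        ```

        A BEFORE trigger is used because the function changes the row; an AFTER trigger would see the row only once it had already been written.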
    42. Function possibilities
        • Computing values, strings
        • Returning table-like sets of values
        • Encapsulating queries
        • Dynamically generating queries via strings
        • Triggers: Modifying data before it is inserted or updated
    43. Why use a PL/lang?
        • Other libraries (e.g., CPAN for Perl)
        • Faster, optimized functions (e.g., R)
        • Programmer familiarity
        • Cached query plans
    44. Views and rules
        • Views are stored SELECT statements
        • Pretend that something is a read-only table
        • Rules let you turn it into a read/write table
            • Intercept and rewrite incoming query
            • Check or change data
            • Change where data is stored
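
        A small sketch of a view plus a rule that makes it writable. The people table, the active_people view, and the rule name are hypothetical.

        ```sql
        CREATE TABLE people (
            id     SERIAL PRIMARY KEY,
            name   TEXT NOT NULL,
            active BOOLEAN NOT NULL DEFAULT true
        );

        -- A view is a stored SELECT that can be queried like a table.
        CREATE VIEW active_people AS
            SELECT id, name FROM people WHERE active;

        -- The rule intercepts INSERTs aimed at the view and rewrites
        -- them into INSERTs on the underlying table.
        CREATE RULE active_people_insert AS
            ON INSERT TO active_people
            DO INSTEAD
            INSERT INTO people (name) VALUES (NEW.name);

        INSERT INTO active_people (name) VALUES ('Reuven');
        SELECT * FROM active_people;
        ```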
    45. Full-text indexing
        • Built into PostgreSQL
        • Handles stop words, different languages, synonyms, and even (often) stemming
        • Very powerful, but it can take some time to get configured correctly
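
        A minimal sketch of the built-in full-text search functions, assuming the stock 'english' configuration. The articles table and index name are made up for illustration.

        ```sql
        -- Stemming in action: "foxes" and "jumped" are reduced to the same
        -- lexemes as the query terms "fox" and "jump", so this matches.
        SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs')
               @@ to_tsquery('english', 'fox & jump');

        -- Hypothetical table with a GIN index over the parsed document text.
        CREATE TABLE articles (
            id   SERIAL PRIMARY KEY,
            body TEXT NOT NULL
        );

        CREATE INDEX articles_body_fts_idx
            ON articles USING gin (to_tsvector('english', body));

        -- The WHERE clause repeats the indexed expression so the planner
        -- is able to use the index.
        SELECT id
        FROM articles
        WHERE to_tsvector('english', body) @@ to_tsquery('english', 'replication & standby');
        ```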
    46. Transactions
        • In PostgreSQL from the beginning
        • Use transactions for just about anything:

            BEGIN;
            DROP TABLE DVDs;
            ROLLBACK;
            SELECT * FROM DVDs; -- Works!

    47. Savepoints (or, sub-transactions)

            BEGIN;
            INSERT INTO table1 VALUES (1);
            SAVEPOINT my_savepoint;
            INSERT INTO table1 VALUES (2);
            ROLLBACK TO SAVEPOINT my_savepoint;
            INSERT INTO table1 VALUES (3);
            COMMIT;

    48. MVCC
        • Readers and writers don’t block each other
        • “Multi-version concurrency control”
        • xmin, xmax on each tuple; the visible rows are those tuples whose xmin and xmax bracket txid_current
        • Old versions stick around until vacuumed
            • Autovacuum removes even this issue
    49. MVCC
        • Look at a row’s xmin and xmax
        • Look at txid_current()
        • Start transaction; look at row’s xmin/xmax
        • Look at xmin/xmax on that row from another session
        • Commit, and look again at both!
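
        The demo outlined on this slide could be run roughly as follows. The mvcc_demo table is hypothetical, and the "other session" steps have to be typed into a second psql window.

        ```sql
        CREATE TABLE mvcc_demo (id INTEGER PRIMARY KEY, note TEXT);
        INSERT INTO mvcc_demo VALUES (1, 'first');

        -- xmin and xmax are hidden system columns; name them to see them.
        SELECT xmin, xmax, * FROM mvcc_demo;

        -- The ID of the current transaction.
        SELECT txid_current();

        BEGIN;
        UPDATE mvcc_demo SET note = 'second' WHERE id = 1;

        -- This session sees the new row version, stamped with its own
        -- transaction ID in xmin.
        SELECT xmin, xmax, * FROM mvcc_demo;

        -- In another session, run the same SELECT now: it still shows the
        -- old version, with xmax set to the updating transaction's ID.

        COMMIT;

        -- After the commit, both sessions see the new version.
        SELECT xmin, xmax, * FROM mvcc_demo;
        ```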
    50. Downsides of MVCC
        • MVCC is usually fantastic
        • But if you insert or update many rows, and then do a COUNT(*), things will be slow
        • There are solutions, including more aggressive auto-vacuuming
        • 9.2 introduced features that improved this
    51. Indexing
        • Regular, unique indexes
        • Functional indexes
            • Index calling a function on a column
        • Partial indexes
            • Index only rows matching criteria
        • Cluster table on an index
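
        A sketch of each index type from this slide, using a hypothetical users table and made-up index names.

        ```sql
        CREATE TABLE users (
            id         SERIAL PRIMARY KEY,
            email      TEXT NOT NULL,
            active     BOOLEAN NOT NULL DEFAULT true,
            created_at TIMESTAMP NOT NULL DEFAULT now()
        );

        -- Functional index: indexes the result of lower(email), and being
        -- unique it also enforces case-insensitive uniqueness.
        CREATE UNIQUE INDEX users_email_lower_idx ON users (lower(email));

        -- Partial index: only rows matching the WHERE clause are indexed.
        CREATE INDEX users_active_created_idx ON users (created_at) WHERE active;

        -- A regular index, used below to physically reorder the table.
        -- (CLUSTER does not accept a partial index.)
        CREATE INDEX users_created_idx ON users (created_at);
        CLUSTER users USING users_created_idx;

        -- A query has to use the same expression to benefit from the
        -- functional index.
        SELECT id FROM users WHERE lower(email) = 'reuven@lerner.co.il';
        ```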
    52. CTEs
        • Adds a “WITH” clause, which defines a sorta-kinda temp table
        • You can then query that same temp table
        • Makes many queries easier to read and write, without a real temp table
        • Better yet: CTEs can be recursive, for everything from Fibonacci to org charts
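
        A sketch of the org-chart case mentioned above, written as a recursive CTE. The employees table and its three rows are invented for the example.

        ```sql
        CREATE TABLE employees (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            manager_id INTEGER REFERENCES employees
        );

        INSERT INTO employees VALUES
            (1, 'Dana', NULL),
            (2, 'Avi',  1),
            (3, 'Noa',  2);

        WITH RECURSIVE chain (id, name, depth) AS (
            -- Starting point: the people with no manager
            SELECT id, name, 0
            FROM employees
            WHERE manager_id IS NULL
          UNION ALL
            -- Recursive step: anyone who reports to a row already in chain
            SELECT e.id, e.name, c.depth + 1
            FROM employees e
            JOIN chain c ON e.manager_id = c.id
        )
        SELECT repeat('  ', depth) || name AS org_chart
        FROM chain
        ORDER BY depth, name;
        ```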
    53. Speed and scalability
        • MVCC + a smart query optimizer make PostgreSQL pretty fast and smart
        • Statistics about table contents, gathered by ANALYZE, inform the query planner
        • Several scan types, join types are weighed
        • Benchmarks consistently show excellent performance with high mixes of read/write
    54. WAL
        • All activity in the database is put in “write-ahead logs” before it happens
        • If the database server fails, it replays the WALs, then continues
        • You can change how often WALs are written, to improve performance
        • PITR: restore database from WALs
    55. Log shipping
        • Copy WALs to a second, identical server (known as “log shipping”) and you have a backup
        • If the primary server goes down, you can bring the secondary up in its place
        • This was known as “warm standby,” and worked in 8.4
    56. Hot standby, streaming replication
        • As of 9.0, you don’t have to do this
        • You can have the primary stream the information to the secondary
            • Almost-instant updates
        • The secondary machine can answer read-only queries (“hot standby”), not just handle failover
    57. Extensions
        • Provides a standardized mechanism for downloading, installing, and versioning extensions
        • New data types, functions, languages are possible
        • Download, search via pgxn.org
        • Similar to CPAN, PyPI, or RubyGems
    58. SQL/MED
        • SQL/MED was introduced in 9.1
        • Query information from other databases (and database-like interfaces)
        • So if you have data in MySQL, Oracle, CSV ... just install a wrapper, and you can query it like a PostgreSQL table
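
        A sketch of the CSV case, assuming the file_fdw contrib module is installed. The server name, table definition, and file path are hypothetical, and the file must be readable by the database server process.

        ```sql
        -- file_fdw ships with PostgreSQL as a contrib module.
        CREATE EXTENSION file_fdw;

        -- A "server" is required by the SQL/MED model, even for local files.
        CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

        -- Describe the file's columns; the path is an assumption.
        CREATE FOREIGN TABLE imported_stores (
            id   INTEGER,
            name TEXT
        ) SERVER csv_files
          OPTIONS (filename '/tmp/stores.csv', format 'csv', header 'true');

        -- From here on it can be queried like any other table.
        SELECT * FROM imported_stores WHERE name ILIKE 'tel aviv%';
        ```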
    59. Unlogged tables
        • All actions are logged in WALs
        • That adds some overhead, which isn’t required by throwaway data
        • Unlogged tables (different from temp tables!) offer a speedup, in exchange for less reliability
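
        Creating an unlogged table takes a single extra keyword. The page_cache table here is a made-up example of throwaway data.

        ```sql
        -- Writes to this table skip the WAL. In exchange, its contents are
        -- emptied after a crash and are not copied to standby servers.
        CREATE UNLOGGED TABLE page_cache (
            key       TEXT PRIMARY KEY,
            body      TEXT NOT NULL,
            cached_at TIMESTAMP NOT NULL DEFAULT now()
        );

        INSERT INTO page_cache (key, body) VALUES ('/home', '<html>...</html>');
        ```

        Unlike a temp table, an unlogged table is visible to every session and survives a clean restart.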
    60. New in 9.2
        • JSON support
        • Range types, for handling
        • Much more scalable: from 24 cores and 75k queries/sec to 64 cores and 350k queries/sec
        • Index-only scans (“covering indexes”)
        • Cascading replication
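
        A few one-liners that exercise the JSON and range types, as a rough illustration only. All values are made up.

        ```sql
        -- The json type checks that the text is well-formed JSON.
        SELECT '{"name": "PostgreSQL", "version": 9.2}'::json;

        -- row_to_json turns a row into a JSON object.
        SELECT row_to_json(t)
        FROM (SELECT 1 AS id, 'Skype'::text AS name) AS t;

        -- Does the integer range [1,10) contain 5?
        SELECT int4range(1, 10) @> 5;

        -- Do two date ranges overlap?
        SELECT daterange('2012-01-01', '2012-07-01')
               && daterange('2012-06-01', '2013-01-01');
        ```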
    61. Web problems
        • PostgreSQL is great as a Web backend
        • But if you use an ORM (e.g., ActiveRecord), you are probably losing much of the power
            • e.g., foreign keys, CTEs, triggers, and views
        • No good way to bridge this gap, for now
        • There are always methods, but this is an area that definitely needs some work
    62. Tablespaces
        • You can create any number of “tablespaces,” separate storage areas
        • Put tables, indexes on different tablespaces
            • Most useful with multiple disks
        • Separate tables (or parts of a partitioned table)... or separate tables from indexes
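
        A sketch of separating a table from one of its indexes. The tablespace name, directory, and orders table are assumptions; the directory must already exist, be empty, and be owned by the PostgreSQL server user.

        ```sql
        -- Hypothetical directory on a second disk.
        CREATE TABLESPACE fast_disk LOCATION '/mnt/fast_disk/pgdata';

        -- Store the table's data on the new tablespace...
        CREATE TABLE orders (
            id        SERIAL PRIMARY KEY,
            placed_at TIMESTAMP NOT NULL
        ) TABLESPACE fast_disk;

        -- ...and keep this index on the default one.
        CREATE INDEX orders_placed_at_idx
            ON orders (placed_at)
            TABLESPACE pg_default;
        ```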
    63. Partitioning
        • Combine object-oriented tables, CHECK clauses, and tablespaces for partitioning
        • Example: Invoices from Jan-June go in table “q12”, and July-December go in table “q34”
        • Now PostgreSQL knows where to look when you SELECT from the parent table
        • Note that INSERT requires a trigger
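
        The q12/q34 example could be sketched like this with inheritance, CHECK constraints, and a routing trigger. The invoices parent table, its columns, the year 2012, and the function and trigger names are all assumptions.

        ```sql
        CREATE TABLE invoices (
            id        SERIAL,
            issued_on DATE NOT NULL,
            amount    NUMERIC(10,2) NOT NULL
        );

        -- Each child inherits the parent's columns and adds a CHECK clause
        -- describing which rows it is allowed to hold.
        CREATE TABLE q12 (
            CHECK (issued_on >= DATE '2012-01-01' AND issued_on < DATE '2012-07-01')
        ) INHERITS (invoices);

        CREATE TABLE q34 (
            CHECK (issued_on >= DATE '2012-07-01' AND issued_on < DATE '2013-01-01')
        ) INHERITS (invoices);

        -- INSERTs on the parent are redirected to the matching child.
        CREATE OR REPLACE FUNCTION invoices_insert_router() RETURNS TRIGGER AS $$
        BEGIN
            IF NEW.issued_on < DATE '2012-07-01' THEN
                INSERT INTO q12 VALUES (NEW.*);
            ELSE
                INSERT INTO q34 VALUES (NEW.*);
            END IF;
            RETURN NULL;  -- the row is already stored in a child table
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER invoices_insert
            BEFORE INSERT ON invoices
            FOR EACH ROW EXECUTE PROCEDURE invoices_insert_router();

        INSERT INTO invoices (issued_on, amount) VALUES ('2012-03-15', 100.00);

        -- With constraint exclusion, the planner can skip q34 for this query.
        SELECT * FROM invoices WHERE issued_on < DATE '2012-04-01';
        ```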
    64. Reflection
        • pg_catalog schema contains everything about your database
            • Tables, functions, views, etc.
        • You can learn a great deal about PostgreSQL by looking through the pg_catalog schema
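
        Two example queries against pg_catalog, offered as a starting point for exploring.

        ```sql
        -- Every user-defined table, by schema.
        SELECT schemaname, tablename
        FROM pg_catalog.pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        ORDER BY schemaname, tablename;

        -- Functions whose names start with "txid".
        SELECT proname
        FROM pg_catalog.pg_proc
        WHERE proname LIKE 'txid%'
        ORDER BY proname;
        ```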
    65. Advanced uses
        • GridSQL: Split a query across multiple PostgreSQL servers
        • Very large-scale data warehousing: Greenplum
    66. Client libraries
        • libpq (in C)
        • Others by 3rd parties:
            • Python
            • Ruby
            • Perl
            • Java (JDBC)
            • .NET (npgsql)
            • ODBC
            • JavaScript (!)
            • Just about any language you can imagine
    67. Tools
        • Yeah, tools are more primitive
        • If you love GUIs, and hate the command line, then PostgreSQL will be hard for you
        • PgAdmin and other tools are OK, but not really up to the task for “real” work
            • PgAdmin does provide some graphical query building and “explain” output
    68. Windows compatibility
        • It works on Windows
        • .NET drivers work, as well
        • Logging is far from perfect (can go to the Windows log tool, but not filtered well)
        • Configuration is still in a text file, foreign to most Windows people
        • Windows is still a second-class citizen
    69. Who uses it?
        • Afilias
        • Apple
        • BASF
        • Cisco
        • CD Baby
        • Etsy
        • IMDb
        • Skype
        • SourceForge
        • Heroku
        • Check Point
    70. Who supports it?
        • EnterpriseDB: products and services
        • 2ndQuadrant
        • Many freelancers (like me!)
    71. PostgreSQL problems
        • Tuning is still hard (but getting easier)
        • Double quotes
        • Lack of good GUI-based tools
        • Some features that people want (e.g., materialized views) are missing, so you have to resort to hacks and triggers/rules
        • Multi-master (of course!)
    72. Bottom line
        • PostgreSQL: BSD licensed, easy to install, easy to use, easy to administer
        • Still not quite up to commercial databases regarding features, but not far behind
        • More than good enough for places like Skype and Afilias; probably good enough for you!
    73. Want to learn more?
        • Mailing lists, wikis, and blogs
            • All at http://postgresql.org/
            • http://planetpostgresql.org
        • PostgreSQL training, consulting, development, hand-holding, and general encouragement
    74. Thanks! (Any questions?)
        reuven@lerner.co.il
        http://www.lerner.co.il/
        054-496-8405
        “reuvenlerner” on Skype/AIM