PostgreSQL worst practices, version FOSDEM PGDay 2017 by Ilya Kosmodemiansky

This talk is prepared as a bunch of slides, where each slide describes a really bad way people can screw up their PostgreSQL database and provides a weight: how frequently I have seen that kind of problem. Right before the talk I will reshuffle the deck, draw ten random slides and explain why such practices are bad and how to avoid running into them.

PostgreSQL worst practices
at FOSDEM PGDay Brussels 2017
Ilya Kosmodemiansky
ik@postgresql-consulting.com

Best practices are just boring
• Never follow them, try worst practices
• Only those practices can really help you to screw things up most effectively
• PostgreSQL consultants are nice people, so try to make them happy

How does it work?
• I have a list of a little more than 100 worst practices
• I do not make this stuff up, all of them are real-life examples
• I reshuffle my list every time before presenting and pick a number of examples
• Well, there are some things which I like more or less, so it is not a very honest shuffle

0. Do not use indexes (a test one!)
• Basically, there is no difference between a full table scan and an index scan
• You can check that: just insert 10 rows into a test table on your test server and compare
• Nobody deals with tables of more than 10 rows in production!
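In reality the difference shows up as soon as the table stops being tiny. A minimal sketch of comparing the two plans with EXPLAIN ANALYZE (the table name and data are invented for illustration):

    -- a table large enough that the planner has a real choice
    CREATE TABLE events (id bigserial PRIMARY KEY, payload text);
    INSERT INTO events (payload)
        SELECT md5(g::text) FROM generate_series(1, 1000000) AS g;

    -- no index on payload: sequential scan over the whole table
    EXPLAIN ANALYZE SELECT * FROM events WHERE payload = md5('42');

    -- with an index (and fresh statistics) the same query becomes an index scan
    CREATE INDEX ON events (payload);
    ANALYZE events;
    EXPLAIN ANALYZE SELECT * FROM events WHERE payload = md5('42');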
1. Use ORM
• All databases share the same syntax
• You must write database-independent code
• Are there any benefits that are based on database-specific features?
• It is always good to learn a new complicated technology

2. Move joins to your application
• Just select * from a couple of tables into the application written in your favorite programming language
• Then join them at the application level
• Now you only need to implement nested loop join, hash join and merge join, as well as a query optimizer and a page cache
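The database already implements all of that; a sketch of letting PostgreSQL do the join instead, with table names invented for the example:

    -- one round trip; the planner picks a nested loop, hash or merge join itself
    SELECT o.id, o.created_at, c.name
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.created_at >= now() - interval '1 day';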
3. Be in trend, be schema-less
• You do not need to design the schema
• You need only one table, two columns: id bigserial and extra jsonb
• The JSONB datatype is pretty effective in PostgreSQL, you can search in it just like in a well-structured table
• Even if you put 100M of JSON in it
• Even if you have 1000+ tps
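For contrast, a minimal sketch of what that anti-pattern looks like and the kind of index it needs to stay searchable at all (names are illustrative):

    -- the "one table to rule them all" design
    CREATE TABLE everything (id bigserial PRIMARY KEY, extra jsonb);

    -- containment searches need a GIN index once the table grows
    CREATE INDEX ON everything USING gin (extra);
    SELECT * FROM everything WHERE extra @> '{"type": "order", "status": "paid"}';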
4. Be agile, use EAV
• You need only 3 tables: entity, attribute, value
• At some point add the 4th: attribute_type
• When it starts to work slowly, just call those four tables The Core and add 1000+ tables with denormalized data
• If that is not enough, you can always add value_version
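A hedged illustration of why even a trivial question becomes a pile of joins in an EAV model (the schema is invented for the example, with the value table named attr_value):

    -- "give me name and email of entity 42" in an EAV schema
    SELECT e.id,
           max(v.value) FILTER (WHERE a.name = 'name')  AS name,
           max(v.value) FILTER (WHERE a.name = 'email') AS email
    FROM entity e
    JOIN attr_value v ON v.entity_id = e.id
    JOIN attribute a  ON a.id = v.attribute_id
    WHERE e.id = 42
    GROUP BY e.id;

    -- the same question against a normal table
    SELECT id, name, email FROM users WHERE id = 42;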
5. Try to create as many indexes as you can
• Indexes consume no disk space
• Indexes consume no shared_buffers
• There is no overhead on DML if each and every column in a table is covered with a bunch of indexes
• The optimizer will definitely choose your index once you have created it
• Keep calm and create more indexes
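In practice every extra index costs disk, shared_buffers and write amplification; a sketch of how to spot indexes that are never used, based on the standard pg_stat_user_indexes view:

    -- indexes never scanned since the statistics were last reset
    SELECT schemaname, relname, indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;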
6. Always keep all your time series data
• Time series data like tables with logs or session history should never be deleted, aggregated or archived; you always need to keep it all
• You will always know where to check if you run out of disk space
• You can always call that Big Data
• Solve the problem using partitioning... one partition per hour or per minute
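If time series data is partitioned at all, the sane counterpart is a reasonable partition granularity plus a retention job; a sketch assuming declarative partitioning, which exists from PostgreSQL 10 on (table and partition names are invented):

    -- one partition per month, not one per minute
    CREATE TABLE session_log (
        logged_at timestamptz NOT NULL,
        payload   jsonb
    ) PARTITION BY RANGE (logged_at);

    CREATE TABLE session_log_2017_01 PARTITION OF session_log
        FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');

    -- retention: old partitions get dropped or archived instead of kept forever
    DROP TABLE IF EXISTS session_log_2016_01;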
7. Turn autovacuum off
• It is quite an auxiliary process, you can easily stop it
• There is no problem at all in having 100 GB of data in a database which is 1 TB in size
• 2-3 TB RAM servers are cheap, IO is the fastest thing in modern computing
• Besides that, everyone likes Big Data
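What autovacuum would have cleaned up is easy to see; a sketch using the standard pg_stat_user_tables view:

    -- dead tuples piling up, and when each table was last vacuumed
    SELECT relname, n_live_tup, n_dead_tup,
           last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;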
8. Keep master and slave on different hardware
• That will maximize the possibility of unsuccessful failover
• To make things worse, change only slave-related parameters on the slave, leaving defaults for shared_buffers etc.

9. Put a synchronous replica in a remote DC
• Indeed! That will maximize availability!
• Especially if you put the replica on another continent
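The reason this hurts: with a synchronous standby every commit waits for the remote acknowledgment, so the cross-continent round trip is added to every write. A sketch of the primary-side settings involved (the standby name is invented):

    # postgresql.conf on the primary: each commit now waits for 'dc2_standby'
    synchronous_standby_names = 'dc2_standby'
    synchronous_commit = on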
10. Reinvent Slony
• If you need some data replication to another database, try to implement it from scratch
• That allows you to run into all the problems PostgreSQL has had since Slony was introduced

11. Use as many count(*) as you can
• The figure 301083021830123921 is very informative for the end user
• If it changes a second later to 30108302894839434020, it is still informative
• select count(*) from sometable is quite a lightweight query
• Tuple estimation from pg_catalog can never be precise enough for you
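When an approximate number is good enough, the planner's own estimate is nearly free; a sketch using pg_class (the table name is taken from the slide):

    -- exact, but has to scan the table (or an index) every time
    SELECT count(*) FROM sometable;

    -- approximate, maintained by (auto)analyze, costs almost nothing
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'sometable';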
12. Never use graphical monitoring
• You do not need graphs
• Because it is an easy task to guess what happened yesterday at 2 a.m. using only the command line and grep

13. Never use Foreign Keys (Use locally produced ones instead!)
• Consistency control at the application level always works as expected
• You will never get data inconsistency without constraints
• Even if you already have a bulletproof framework to maintain consistency, would that be a good enough reason to use it?
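For reference, a minimal sketch of declaring the constraint instead of re-checking it in application code (table names invented):

    CREATE TABLE customers (id bigserial PRIMARY KEY, name text NOT NULL);

    -- the database enforces the relationship on every insert, update and delete
    CREATE TABLE orders (
        id          bigserial PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customers (id)
    );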
14. Always use text type for all columns
• It is always fun to reimplement date or IP validation in your code
• You will never mistakenly convert "12-31-2015 03:01AM" to "15:01 12 of undef 2015" using text fields
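The built-in types already do that validation; a small sketch of letting the column types reject bad data (table and columns invented):

    CREATE TABLE audit_log (
        happened_at timestamptz NOT NULL,
        client_addr inet        NOT NULL
    );

    -- each of these fails with a clear error instead of silently storing garbage
    INSERT INTO audit_log VALUES ('2015-12-31 25:01', '10.0.0.1');   -- hour 25 is rejected
    INSERT INTO audit_log VALUES ('2015-12-31 23:01', '10.0.0.999'); -- not a valid inet address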
15. Always use improved "PostgreSQL"
• Postgres is not a perfect database and you are smart
• All that annoying MVCC stuff, the 32-bit xid and the autovacuum nightmare look the way they do because the hackers are old-school and lazy
• Hack it the hard way, do not bother yourself with submitting your patch to the community, just put it into production
• It is easy to maintain such a production system and keep it compatible with upcoming versions of the "not perfect" PostgreSQL

16. Postgres likes long transactions
• Always call external services from stored procedures (like sending emails)
• Oh, it is arguable... It could be, if 100% of developers were familiar with the word timeout
• Anyway, you can just start a transaction and go away for the weekend
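The usual safety net against forgotten transactions is a pair of server-side timeouts; a sketch with example values (idle_in_transaction_session_timeout exists from PostgreSQL 9.6 on; the database name is invented):

    -- abort any single statement that runs longer than 30 seconds
    ALTER DATABASE mydb SET statement_timeout = '30s';

    -- kill sessions that sit idle inside an open transaction (PostgreSQL 9.6+)
    ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '5min';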
17. Load your data into PostgreSQL in a smart manner
• Write your own loader, 100 parallel threads minimum
• Never use COPY - it is specially designed for the task
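For the record, the designed-for-the-task way is COPY, or \copy from psql when the file lives on the client; a minimal sketch with an invented file path and table:

    -- server-side bulk load, the fastest built-in way to ingest data
    COPY events (id, payload) FROM '/tmp/events.csv' WITH (FORMAT csv, HEADER true);

    -- client-side equivalent from psql
    \copy events (id, payload) from 'events.csv' with (format csv, header true)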
18. Even if you want to back up your database...
• Use replication instead of backup
• Use pg_dump instead of backup
• Write your own backup script
• Make it as complicated as possible, combine all the external tools you know
• Never perform a test recovery
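The boring alternative is a real base backup plus WAL archiving and, above all, a tested restore; a minimal sketch using the standard pg_basebackup tool (host name, user and paths are invented):

    # take a base backup together with the WAL needed to make it consistent
    pg_basebackup -h db-primary -U replicator -D /backups/base_2017_02_03 \
        -X stream --checkpoint=fast --progress

    # the step everyone skips: restore it somewhere and check that it starts
    pg_ctl -D /backups/base_2017_02_03 -l /tmp/restore_test.log start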
Do not forget
That was a WORST practices talk

Questions or ideas? Share your story!
ik@postgresql-consulting.com
(I am preparing this talk to be open sourced)