PostgreSQL
worst practices
Ilya Kosmodemiansky
ik@dataegret.com
Best practices are just boring
• Never follow them, try worst practices
• Only worst practices can really help you screw things up in a
most effective way
• PostgreSQL consultants are nice people - keep them happy!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
How it works?
• I have a list, a little bit more than 100 worst practices
• I do not make this stuff up, all of them are real-life examples
• I reshuffle my list every time before presenting and pick a few
examples
• Well, there are some things, which I like more or less, so it is
not a very honest shuffle
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
0. Do not use indexes (a test one!)
• Basically, there is no difference between full table scan and
index scan
• You can check that. Just insert 10 rows into a test table on
your test server and compare.
• Nobody deals with more than 10 row tables in production!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
1. Use as many count(*) as you can
• Figure 301083021830123921 is very informative for the end
user
• If it changes in a second to 30108302894839434020, it is still
informative
• select count(*) from sometable is a quite light-weighted query
• Tuple estimation from pg_catalog can never be precise
enough to you
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
2. Use ORM
• All databases share the same syntax
• You must write database-independent code
• Are there any benefits, which are based on database specific
features?
• It always good to learn a new complicated technology
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
3. Move joins to your application
• Just select * a couple of tables into the application written in
your favorite programming language
• Than join them at the application level
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
3. Move joins to your application
• Just select * a couple of tables into the application written in
your favorite programming language
• Than join them at the application level
• Now you only need to implement nested loop join, hash join
and merge join as well as query optimizer and page cache
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
4. Be in trend, be schema-less
• You do not need to design the schema
• You only need one table, two columns: id bigserial and extra
jsonb
• JSONB datatype is pretty effective in PostgreSQL, you can
query it just like a well-structured table
• Even if you put a 100M of JSON in it
• Even if you have 1000+ tps
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
5. Be agile, use EAV
• You only need 3 tables: entity, attribute, value
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
5. Be agile, use EAV
• You only need 3 tables: entity, attribute, value
• At some point add the 4th: attribute_type
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
5. Be agile, use EAV
• You only need 3 tables: entity, attribute, value
• At some point add the 4th: attribute_type
• When it starts slowing down, just call those four tables The
Core and add 1000+ tables with denormalized data
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
5. Be agile, use EAV
• You only need 3 tables: entity, attribute, value
• At some point add the 4th: attribute_type
• When it starts slowing down, just call those four tables The
Core and add 1000+ tables with denormalized data
• If it is not enough, you can always add value_version
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
6. Try to create as many indexes as you can
• Indexes consume no disk space
• Indexes consume no shared_bufers
• There is no overhead on DML if one and every column in a
table covered with bunch of indexes
• Optimizer will definitely choose your index once you created it
• Keep calm and create more indexes
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
7. Always keep all your time series data
• Time series data like tables with logs or session history should
never be deleted, aggregated or archived, you always need to
keep it all
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
7. Always keep all your time series data
• Time series data like tables with logs or session history should
never be deleted, aggregated or archived, you always need to
keep it all
• You will always know where to check, if you run out of disk
space
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
7. Always keep all your time series data
• Time series data like tables with logs or session history should
never be deleted, aggregated or archived, you always need to
keep it all
• You will always know where to check, if you run out of disk
space
• You can always call that Big Data
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
7. Always keep all your time series data
• Time series data like tables with logs or session history should
never be deleted, aggregated or archived, you always need to
keep it all
• You will always know where to check, if you run out of disk
space
• You can always call that Big Data
• Solve the problem using partitioning... one partition for an
hour or for a minute
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
8. Turn autovacuum off
• It is quite auxiliary process, you can easily stop it
• There is no problem at all to have 100Gb data in a database
which is 1Tb in size
• 2-3Tb RAM servers are cheap, IO is a fastest thing in modern
computing
• Besides, everyone likes BigData
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
9. Reinvent Slony
• If you need some data replication to another database, try to
implement it from scratch
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
9. Reinvent Slony
• If you need some data replication to another database, try to
implement it from scratch
• That allows you to run into all sorts of problems PostgreSQL
had since introducing Slony
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
10. Keep master and slave on different hardware
• That will maximize the possibility of unsuccessful failover
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
10. Keep master and slave on different hardware
• That will maximize the possibility of unsuccessful failover
• To make things even worse, you can change only slave-related
parameters at slave, leaving defaults for shared_buffers etc.
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
11. Put a synchronous replica to remote DC
• Indeed! That will maximize availability!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
11. Put a synchronous replica to remote DC
• Indeed! That will maximize availability!
• Especially, if you put the replica to another continent
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
12. Never use Foreign Keys
• Consistency control at application level always works as
expected
• You will never get data inconsistency without constraints
• Even if you already have a bullet proof framework to maintain
consistency, could it be good enough reason to use it?
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
13. Always use text type for all columns
• It is always fun to reimplement date or ip validation in your
code
• You will never mistakenly convert ”12-31-2015 03:01AM” to
”15:01 12 of undef 2015” using text fields
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
14. Always use improved ”PostgreSQL”
• Postgres is not a perfect database and you are smart
• All that annoying MVCC staff, 32 bit xid and autovacuum
nightmares look the way they do because hackers are oldschool
and lazy
• Hack it in a hard way, do not bother submitting your patch to
the community, just put it into production
• It is easy to maintain such production and keep it compatible
with ”not perfect” PostgreSQL upcoming versions
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
15. Postgres likes long transactions
• Always call external services from stored procedures (like
sending emails)
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
15. Postgres likes long transactions
• Always call external services from stored procedures (like
sending emails)
• Oh, it is arguable... It can be, if 100% of developers were
familiar with word timeout
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
15. Postgres likes long transactions
• Always call external services from stored procedures (like
sending emails)
• Oh, it is arguable... It can be, if 100% of developers were
familiar with word timeout
• Anyway, you can just start transaction and go away for a
weekend
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
16. Never read your code, write it!
genre_id IN
( SELECT id FROM genres WHERE genres.id IN
(SELECT * FROM unnest(array[155]))
)
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17. Have problems with you PostgreSQL installation?
• Move those problems to the container!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17. Have problems with you PostgreSQL installation?
• Move those problems to the container!
• It is always good to have something very stable inside
something very unstable!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17. Have problems with you PostgreSQL installation?
• Move those problems to the container!
• It is always good to have something very stable inside
something very unstable!
• Now your problems are both inside and outside!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17. Not only Slony should be reinvented!
• Need to convert timestamp? Stored procedure in C will help!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17. Not only Slony should be reinvented!
• Need to convert timestamp? Stored procedure in C will help!
• Need a message queue? Write it!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
17. Not only Slony should be reinvented!
• Need to convert timestamp? Stored procedure in C will help!
• Need a message queue? Write it!
• Won’t use ORM? Write your own in plpgsql
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
18. And never, never use exceptions
• Documentation says they are slow
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
18. And never, never use exceptions
• Documentation says they are slow
• Raise notice on errors - everyone reads logs constantly!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
18. And never, never use exceptions
• Documentation says they are slow
• Raise notice on errors - everyone reads logs constantly!
• Who cares about errors?
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
19. Application runs out of connections?
• Set max_connections to 1000
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
19. Application runs out of connections?
• Set max_connections to 1000
• Common, servers with 1000 CPUs are cheap now
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
19. Application runs out of connections?
• Set max_connections to 1000
• Common, servers with 1000 CPUs are cheap now
• Who said PostgreSQL workers have some overhead?
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
19. Application runs out of connections?
• Set max_connections to 1000
• Common, servers with 1000 CPUs are cheap now
• Who said PostgreSQL workers have some overhead?
• And never ever use pgbouncer!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
20. Use pgpool-II instead
• Pooling connections with pgpool is easy...
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
20. Use pgpool-II instead
• Pooling connections with pgpool is easy...
• Just like writing a code in that Emacs OS...
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
20. Use pgpool-II instead
• Pooling connections with pgpool is easy...
• Just like writing a code in that Emacs OS...
• Simple config, useful features
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
20. Use pgpool-II instead
• Pooling connections with pgpool is easy...
• Just like writing a code in that Emacs OS...
• Simple config, useful features
• Consulters are happy!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
21. Always start tuning PostgreSQL...
• From optimizer options in postgresql.conf
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
21. Always start tuning PostgreSQL...
• From optimizer options in postgresql.conf
• Forget about those shared_buffers and checkpoints!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
21. Always start tuning PostgreSQL...
• From optimizer options in postgresql.conf
• Forget about those shared_buffers and checkpoints!
• geqo options are a good candidate to be a silver bullet!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
22. Have heard about a cool new feature?
• Use it in production immediately!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
22. Have heard about a cool new feature?
• Use it in production immediately!
• Attend MVCC Unmasked by Bruce Momjian (Today, 15:50,
Liberty I)
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
22. Have heard about a cool new feature?
• Use it in production immediately!
• Attend MVCC Unmasked by Bruce Momjian (Today, 15:50,
Liberty I)
• Learn about xmin and xmax
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
22. Have heard about a cool new feature?
• Use it in production immediately!
• Attend MVCC Unmasked by Bruce Momjian (Today, 15:50,
Liberty I)
• Learn about xmin and xmax
• Use it in your applications logic!
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
Do not forget
That was WORST practices talk
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com
Send me your favourite!
ik@dataegret.com
dataegret.com
Why this talk
• Linux is a most common OS for databases
• DBAs often run into IO problems
• Most of the information on topic is written by kerneldevelopers
(for kernel developers) or is checklist-style
• Checklists are useful, but up to certain workload
dataegret.com

PostgreSQL worst practices, version PGConf.US 2017 by Ilya Kosmodemiansky

  • 1.
  • 2.
    Best practices arejust boring • Never follow them, try worst practices • Only worst practices can really help you screw things up in a most effective way • PostgreSQL consultants are nice people - keep them happy! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 3.
    How it works? •I have a list, a little bit more than 100 worst practices • I do not make this stuff up, all of them are real-life examples • I reshuffle my list every time before presenting and pick a few examples • Well, there are some things, which I like more or less, so it is not a very honest shuffle dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 4.
    0. Do notuse indexes (a test one!) • Basically, there is no difference between full table scan and index scan • You can check that. Just insert 10 rows into a test table on your test server and compare. • Nobody deals with more than 10 row tables in production! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 5.
    1. Use asmany count(*) as you can • Figure 301083021830123921 is very informative for the end user • If it changes in a second to 30108302894839434020, it is still informative • select count(*) from sometable is a quite light-weighted query • Tuple estimation from pg_catalog can never be precise enough to you dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 6.
    2. Use ORM •All databases share the same syntax • You must write database-independent code • Are there any benefits, which are based on database specific features? • It always good to learn a new complicated technology dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 7.
    3. Move joinsto your application • Just select * a couple of tables into the application written in your favorite programming language • Than join them at the application level dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 8.
    3. Move joinsto your application • Just select * a couple of tables into the application written in your favorite programming language • Than join them at the application level • Now you only need to implement nested loop join, hash join and merge join as well as query optimizer and page cache dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 9.
    4. Be intrend, be schema-less • You do not need to design the schema • You only need one table, two columns: id bigserial and extra jsonb • JSONB datatype is pretty effective in PostgreSQL, you can query it just like a well-structured table • Even if you put a 100M of JSON in it • Even if you have 1000+ tps dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 10.
    5. Be agile,use EAV • You only need 3 tables: entity, attribute, value dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 11.
    5. Be agile,use EAV • You only need 3 tables: entity, attribute, value • At some point add the 4th: attribute_type dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 12.
    5. Be agile,use EAV • You only need 3 tables: entity, attribute, value • At some point add the 4th: attribute_type • When it starts slowing down, just call those four tables The Core and add 1000+ tables with denormalized data dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 13.
    5. Be agile,use EAV • You only need 3 tables: entity, attribute, value • At some point add the 4th: attribute_type • When it starts slowing down, just call those four tables The Core and add 1000+ tables with denormalized data • If it is not enough, you can always add value_version dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 14.
    6. Try tocreate as many indexes as you can • Indexes consume no disk space • Indexes consume no shared_bufers • There is no overhead on DML if one and every column in a table covered with bunch of indexes • Optimizer will definitely choose your index once you created it • Keep calm and create more indexes dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 15.
    7. Always keepall your time series data • Time series data like tables with logs or session history should never be deleted, aggregated or archived, you always need to keep it all dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 16.
    7. Always keepall your time series data • Time series data like tables with logs or session history should never be deleted, aggregated or archived, you always need to keep it all • You will always know where to check, if you run out of disk space dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 17.
    7. Always keepall your time series data • Time series data like tables with logs or session history should never be deleted, aggregated or archived, you always need to keep it all • You will always know where to check, if you run out of disk space • You can always call that Big Data dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 18.
    7. Always keepall your time series data • Time series data like tables with logs or session history should never be deleted, aggregated or archived, you always need to keep it all • You will always know where to check, if you run out of disk space • You can always call that Big Data • Solve the problem using partitioning... one partition for an hour or for a minute dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 19.
    8. Turn autovacuumoff • It is quite auxiliary process, you can easily stop it • There is no problem at all to have 100Gb data in a database which is 1Tb in size • 2-3Tb RAM servers are cheap, IO is a fastest thing in modern computing • Besides, everyone likes BigData dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 20.
    9. Reinvent Slony •If you need some data replication to another database, try to implement it from scratch dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 21.
    9. Reinvent Slony •If you need some data replication to another database, try to implement it from scratch • That allows you to run into all sorts of problems PostgreSQL had since introducing Slony dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 22.
    10. Keep masterand slave on different hardware • That will maximize the possibility of unsuccessful failover dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 23.
    10. Keep masterand slave on different hardware • That will maximize the possibility of unsuccessful failover • To make things even worse, you can change only slave-related parameters at slave, leaving defaults for shared_buffers etc. dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 24.
    11. Put asynchronous replica to remote DC • Indeed! That will maximize availability! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 25.
    11. Put asynchronous replica to remote DC • Indeed! That will maximize availability! • Especially, if you put the replica to another continent dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 26.
    12. Never useForeign Keys • Consistency control at application level always works as expected • You will never get data inconsistency without constraints • Even if you already have a bullet proof framework to maintain consistency, could it be good enough reason to use it? dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 27.
    13. Always usetext type for all columns • It is always fun to reimplement date or ip validation in your code • You will never mistakenly convert ”12-31-2015 03:01AM” to ”15:01 12 of undef 2015” using text fields dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 28.
    14. Always useimproved ”PostgreSQL” • Postgres is not a perfect database and you are smart • All that annoying MVCC staff, 32 bit xid and autovacuum nightmares look the way they do because hackers are oldschool and lazy • Hack it in a hard way, do not bother submitting your patch to the community, just put it into production • It is easy to maintain such production and keep it compatible with ”not perfect” PostgreSQL upcoming versions dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 29.
    15. Postgres likeslong transactions • Always call external services from stored procedures (like sending emails) dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 30.
    15. Postgres likeslong transactions • Always call external services from stored procedures (like sending emails) • Oh, it is arguable... It can be, if 100% of developers were familiar with word timeout dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 31.
    15. Postgres likeslong transactions • Always call external services from stored procedures (like sending emails) • Oh, it is arguable... It can be, if 100% of developers were familiar with word timeout • Anyway, you can just start transaction and go away for a weekend dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 32.
    16. Never readyour code, write it! genre_id IN ( SELECT id FROM genres WHERE genres.id IN (SELECT * FROM unnest(array[155])) ) dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 33.
    17. Have problemswith you PostgreSQL installation? • Move those problems to the container! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 34.
    17. Have problemswith you PostgreSQL installation? • Move those problems to the container! • It is always good to have something very stable inside something very unstable! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 35.
    17. Have problemswith you PostgreSQL installation? • Move those problems to the container! • It is always good to have something very stable inside something very unstable! • Now your problems are both inside and outside! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 36.
    17. Not onlySlony should be reinvented! • Need to convert timestamp? Stored procedure in C will help! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 37.
    17. Not onlySlony should be reinvented! • Need to convert timestamp? Stored procedure in C will help! • Need a message queue? Write it! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 38.
    17. Not onlySlony should be reinvented! • Need to convert timestamp? Stored procedure in C will help! • Need a message queue? Write it! • Won’t use ORM? Write your own in plpgsql dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 39.
    18. And never,never use exceptions • Documentation says they are slow dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 40.
    18. And never,never use exceptions • Documentation says they are slow • Raise notice on errors - everyone reads logs constantly! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 41.
    18. And never,never use exceptions • Documentation says they are slow • Raise notice on errors - everyone reads logs constantly! • Who cares about errors? dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 42.
    19. Application runsout of connections? • Set max_connections to 1000 dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 43.
    19. Application runsout of connections? • Set max_connections to 1000 • Common, servers with 1000 CPUs are cheap now dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 44.
    19. Application runsout of connections? • Set max_connections to 1000 • Common, servers with 1000 CPUs are cheap now • Who said PostgreSQL workers have some overhead? dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 45.
    19. Application runsout of connections? • Set max_connections to 1000 • Common, servers with 1000 CPUs are cheap now • Who said PostgreSQL workers have some overhead? • And never ever use pgbouncer! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 46.
    20. Use pgpool-IIinstead • Pooling connections with pgpool is easy... dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 47.
    20. Use pgpool-IIinstead • Pooling connections with pgpool is easy... • Just like writing a code in that Emacs OS... dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 48.
    20. Use pgpool-IIinstead • Pooling connections with pgpool is easy... • Just like writing a code in that Emacs OS... • Simple config, useful features dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 49.
    20. Use pgpool-IIinstead • Pooling connections with pgpool is easy... • Just like writing a code in that Emacs OS... • Simple config, useful features • Consulters are happy! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 50.
    21. Always starttuning PostgreSQL... • From optimizer options in postgresql.conf dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 51.
    21. Always starttuning PostgreSQL... • From optimizer options in postgresql.conf • Forget about those shared_buffers and checkpoints! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 52.
    21. Always starttuning PostgreSQL... • From optimizer options in postgresql.conf • Forget about those shared_buffers and checkpoints! • geqo options are a good candidate to be a silver bullet! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 53.
    22. Have heardabout a cool new feature? • Use it in production immediately! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 54.
    22. Have heardabout a cool new feature? • Use it in production immediately! • Attend MVCC Unmasked by Bruce Momjian (Today, 15:50, Liberty I) dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 55.
    22. Have heardabout a cool new feature? • Use it in production immediately! • Attend MVCC Unmasked by Bruce Momjian (Today, 15:50, Liberty I) • Learn about xmin and xmax dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 56.
    22. Have heardabout a cool new feature? • Use it in production immediately! • Attend MVCC Unmasked by Bruce Momjian (Today, 15:50, Liberty I) • Learn about xmin and xmax • Use it in your applications logic! dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 57.
    Do not forget Thatwas WORST practices talk dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com
  • 58.
    Send me yourfavourite! ik@dataegret.com dataegret.com Why this talk • Linux is a most common OS for databases • DBAs often run into IO problems • Most of the information on topic is written by kerneldevelopers (for kernel developers) or is checklist-style • Checklists are useful, but up to certain workload dataegret.com