Bigger data with PostgreSQL 9

Companies today collect more and more data, and they want to ask more and more questions of their data warehouses. The days when a data warehouse only needed to be tuned for read queries, with the ETL / ELT running once a night, are over. With databases growing beyond 500GB, handling hundreds and sometimes thousands of inserts, updates and deletes every second, and serving select queries whose motto is "the more tables we join, the more fun", we faced new challenges in configuring and tuning those databases (and servers). DBA skills alone were not enough; good system engineering skills were needed too, to keep the database running smoothly under the heavy workload. We also discovered that we needed more than one server. Luckily, PostgreSQL 9 now provides streaming replication. In this talk I will discuss how we took on these challenges and how we set up our backup and replication strategy, all with as little effort as possible by using the right tools for the job.

Published in: Technology, Education

  1. Slide 1: Bigger data with PostgreSQL 9. Data warehousing in the 21st century. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. © by Numius nv. Open systems, Smarter people.
  2. Slide 2: The presenter • Bert Desmet • Consultant @ Deloitte • System Engineer / DBA for deloitteanalytics.eu • 'devop'?
  3. Slide 3: Agenda • Introduction • Release the elephants! • Impacting factors • Divide et impera • Basic configuration • Passing the speed limits • Keep your database fit
  4. Slide 4: Big data? • 44x data growth per year! • 80% of data is unstructured • About 35.2 zettabytes by 2020 • The volume will grow by a whopping 650% in the next 5 years • 80% of organisations will use cloud analytics • By 2014, 80% of enterprises will want a SaaS-based BI system
  5. Slide 5: Know your limits • DB2 • More load • Scaling • Speed • Data size • Pricing
  6. Slide 6: Release the elephants!
  7. Slide 7: PostgreSQL 9 • Good for big databases • Easy maintenance • Scales! • Very fast • Extendable
  8. Impacting factors
  9. Slide 9: Highly impacting operations • Data load • In bulk (ETL) • Row by row, up to 100k rows / minute • Data fetch (reporting) • We do like joins. The more the better.
  10. Slide 10: Extra problems • A lot of I/O • A lot of CPU power (index creation) • A lot of locks
  11. Slide 11: The solution? • Use at least 2 servers • Set up binary replication • Put a lot of RAM in your servers.
  12. Slide 12: Dataflow
  13. Slide 13: Divide et impera
  14. Slide 14: Replication with Postgres • 8.3 Warm standby • 9.0 Asynchronous binary replication • 9.1 Synchronous replication • 9.2 Cascading replication • 9.3 More improvements towards failovers / switching masters • 9.4 Multimaster binary replication?
  15. Slide 15: Configure replication • wal_level = hot_standby • checkpoint_segments >= 32 • checkpoint_completion_target >= 0.8 • hot_standby = on • hot_standby_feedback = on
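As a rough illustration of the replication settings on Slide 15, the Python sketch below renders them as a postgresql.conf fragment. It uses the lower bounds the slide gives, and writes wal_level as hot_standby, the value PostgreSQL 9.x uses for standbys that serve read queries. The max_wal_senders entry is not on the slide; it is added as an assumption because streaming replication needs at least one WAL sender. The names REPLICATION_SETTINGS and render_conf are hypothetical.

```python
# Settings from Slide 15, using the lower bounds the slide gives.
REPLICATION_SETTINGS = {
    "wal_level": "hot_standby",
    "checkpoint_segments": 32,            # slide: >= 32 (parameter exists up to 9.4)
    "checkpoint_completion_target": 0.8,  # slide: >= 0.8
    "hot_standby": "on",                  # takes effect on the standby
    "hot_standby_feedback": "on",         # takes effect on the standby (9.1+)
    "max_wal_senders": 3,                 # assumption, not from the slide
}


def render_conf(settings):
    """Return the settings as 'name = value' lines for postgresql.conf."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render_conf(REPLICATION_SETTINGS))
```

Keeping the same fragment on both servers is one possible choice: the primary ignores the standby-only settings, and a promoted standby is already configured as a primary.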
  16. Slide 16
  17. Slide 17: Keep it simple, stupid • 2ndQuadrant is pretty awesome • Barman for backups • repmgr for replication management
  18. Slide 18: Basic configuration
  19. Slide 19: Raise those memory limits! • shared_buffers = 1/8 to 1/4 of RAM • work_mem = 128MB to 1GB • maintenance_work_mem = 512MB to 1GB • temp_buffers = 128MB to 1GB • effective_cache_size = 3/4 of RAM • wal_buffers = 32MB
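The RAM-relative rules on Slide 19 are simple arithmetic, and a worked example may make them concrete. The sketch below is only an illustration: the function name memory_settings is hypothetical, and the 64 GB server in the usage line is an assumed example, not a figure from the talk.

```python
def memory_settings(ram_mb):
    """Derive Slide 19's RAM-relative settings for a server with ram_mb MB of RAM."""
    return {
        "shared_buffers_min": f"{ram_mb // 8}MB",        # 1/8 of RAM
        "shared_buffers_max": f"{ram_mb // 4}MB",        # 1/4 of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # 3/4 of RAM
        "wal_buffers": "32MB",                           # fixed value from the slide
    }


if __name__ == "__main__":
    # Assumed example: a server with 64 GB of RAM.
    for name, value in memory_settings(64 * 1024).items():
        print(f"{name} = {value}")
```

For the assumed 64 GB server this gives shared_buffers between 8192MB and 16384MB and effective_cache_size of 49152MB. Note that work_mem is allocated per sort or hash operation rather than once per server, so the upper end of the slide's range only suits workloads with few concurrent heavy queries.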
  20. Slide 20: Tune the planner for correct planning • random_page_cost = 3 • cpu_tuple_cost = 0.1 • constraint_exclusion = on • from_collapse_limit >= 12 • join_collapse_limit >= 12
  21. Slide 21: Passing the speed limits
  22. Slide 22: Use partitions • Think about the partition key! • Trigger-based for row-by-row inserts • Rule-based for bulk inserts • Make sure you add constraints
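To make the trigger-based variant on Slide 22 more concrete, here is a minimal sketch that generates the DDL for inheritance-based monthly partitions, the partitioning mechanism available in PostgreSQL 9. Everything specific in it is an assumption for illustration: the table name sales, the partition key sale_date, the monthly granularity, and the helper names. It also assumes plain identifiers that need no quoting and a non-empty list of months.

```python
from datetime import date


def month_bounds(year, month):
    """Return the first day of the month and the first day of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def child_table_ddl(parent, key, year, month):
    """DDL for one monthly child table with a CHECK constraint on the key."""
    start, end = month_bounds(year, month)
    return (
        f"CREATE TABLE {parent}_{year}_{month:02d} (\n"
        f"    CHECK ({key} >= DATE '{start}' AND {key} < DATE '{end}')\n"
        f") INHERITS ({parent});"
    )


def insert_trigger_ddl(parent, key, months):
    """DDL for a BEFORE INSERT trigger that routes each row to its child table."""
    branches = []
    for i, (year, month) in enumerate(months):
        start, end = month_bounds(year, month)
        keyword = "IF" if i == 0 else "ELSIF"
        branches.append(
            f"    {keyword} NEW.{key} >= DATE '{start}' AND NEW.{key} < DATE '{end}' THEN\n"
            f"        INSERT INTO {parent}_{year}_{month:02d} VALUES (NEW.*);"
        )
    body = "\n".join(branches)
    return (
        f"CREATE OR REPLACE FUNCTION {parent}_insert_router() RETURNS trigger AS $$\n"
        f"BEGIN\n{body}\n"
        f"    ELSE\n"
        f"        RAISE EXCEPTION 'no partition for {key} %', NEW.{key};\n"
        f"    END IF;\n"
        f"    RETURN NULL;  -- the row is already stored in a child table\n"
        f"END;\n$$ LANGUAGE plpgsql;\n"
        f"CREATE TRIGGER {parent}_insert_router BEFORE INSERT ON {parent}\n"
        f"    FOR EACH ROW EXECUTE PROCEDURE {parent}_insert_router();"
    )


if __name__ == "__main__":
    months = [(2013, 11), (2013, 12)]
    for year, month in months:
        print(child_table_ddl("sales", "sale_date", year, month))
    print(insert_trigger_ddl("sales", "sale_date", months))
```

The CHECK constraints are what the slide's last bullet refers to: with constraint_exclusion enabled, they let the planner skip child tables that cannot contain matching rows.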
  23. Slide 23: Use indexes • Learn to read query explains • Use http://explain.depesz.com/ • Don't over-index
  24. Slide 24: Other sane things to do • Use unique indexes • Auto-created when defining a primary key • Use clustered indexes • And cluster those tables regularly
  25. Slide 25: Use partial indexes • Supported by Postgres, but not by every database (MySQL has no direct equivalent) • Really useful on big tables • Disadvantage: no 'moving' indexes, e.g. an index for current_day.
  26. Keep your database fit
  27. Slide 27: Vacuum • Disable autovacuum for data warehouses • Vacuum once a day • Check regularly that the vacuums actually run! • Prevents data loss (transaction ID wraparound) • Keeps the database size from growing out of control
  28. Slide 28: Analyze • Analyze once a day • Together with vacuum • VACUUM ANALYZE <schema>.<table>; • default_statistics_target >= 300
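The daily vacuum and analyze routine from Slides 27 and 28 could be scripted along these lines. This is a sketch under assumptions: the function name nightly_maintenance_sql and the schema and table names in the usage example are hypothetical, and the statistics target is set per session with the slide's lower bound of 300 rather than in postgresql.conf.

```python
def nightly_maintenance_sql(tables, statistics_target=300):
    """Build a SQL script that runs VACUUM ANALYZE on each (schema, table) pair."""
    lines = [f"SET default_statistics_target = {statistics_target};"]
    lines += [f"VACUUM ANALYZE {schema}.{table};" for schema, table in tables]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical table list; replace with the warehouse's own tables.
    print(nightly_maintenance_sql([("dwh", "fact_sales"), ("dwh", "dim_customer")]))
```

The generated script has to run statement by statement, because VACUUM cannot be executed inside a transaction block.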
  29. Slide 29: Check for bloat! • Free space in tables • Indexes are not optimized anymore • Use Nagios check_postgres.pl
  30. Slide 30: Prevent bloat • Vacuum full • Offline! • Only when a PK is not available • Repack • Online! • Orders the tables (clustered index) • Needs a PK on the table • Reindex • Reindex regularly.
  31. Slide 31: Partial indexes? • Write a script • Use a cronjob • Recreate your time-aware indexes every day. Will be fast.
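Because a partial index predicate cannot follow the current day by itself (the disadvantage noted on Slide 25), the daily script from Slide 31 has to bake a literal date into the index definition. The sketch below shows one possible shape for such a script. The function name, the index naming scheme and the table and column names are all hypothetical, and creating the new index before dropping the old one is a design choice of this sketch, not something the slides specify.

```python
from datetime import date, timedelta


def rotate_partial_index_sql(table, indexed_column, date_column, day):
    """Create the partial index for `day` and drop the previous day's index."""
    previous = day - timedelta(days=1)
    new_name = f"{table}_{indexed_column}_{day:%Y%m%d}"
    old_name = f"{table}_{indexed_column}_{previous:%Y%m%d}"
    return [
        f"CREATE INDEX {new_name} ON {table} ({indexed_column}) "
        f"WHERE {date_column} = DATE '{day}';",
        f"DROP INDEX IF EXISTS {old_name};",
    ]


if __name__ == "__main__":
    # A cron job would call this once a day and feed the output to the database.
    for statement in rotate_partial_index_sql(
        "sales", "customer_id", "sale_date", date.today()
    ):
        print(statement)
```

Queries only benefit from such an index when their WHERE clause implies the index predicate, so reports on the current day need to filter on the same date column.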
  32. Slide 32
  33. Slide 33: Questions? • Postgres has an awesome community • IRC: #postgresql @ freenode • Check the mailing lists
  34. Slide 34
