Slide 1

Bigger data with PostgreSQL 9
Data warehousing in the 21st century.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

© by Numius nv

Open systems, Smarter people
Slide 2

The presenter

• Bert Desmet
• Consultant @ Deloitte
• System Engineer / DBA for deloitteanalytics.eu
• 'devop'?

Slide 3

Agenda

• Introduction
• Release the elephants!
• Impacting factors
• Divide et impera
• Basic configuration
• Passing the speed limits
• Keep your database fit

Slide 4

Big data?

• 44x data growth per year!
• 80% of data is unstructured
• About 35.2 zettabytes by 2020
• The volume will grow by a whopping 650% in the next 5 years
• 80% of organisations will use cloud analytics
• By 2014, 80% of enterprises will want a SaaS-based BI system

Slide 5

Know your limits

• DB2
• More load
• Scaling
• Speed
• Data size
• Pricing

Slide 6

Release the elephants!

Slide 7

PostgreSQL 9

• Good for big databases
• Easy maintenance
• Scales!
• Very fast
• Extendable

Slide 8

Impacting factors

Slide 9

Highly impacting operations

• Data load
  • In bulk (ETL)
  • Row by row: up to 100k rows / minute

• Data fetch (reporting)
  • We do like joins. The more the better.

Slide 10

Extra problems

• A lot of I/O
• A lot of CPU power (index creation)
• A lot of locks

Slide 11

The solution?

• Use at least 2 servers
• Set up binary replication
• Put a lot of RAM in your servers

Slide 12

Dataflow

Slide 13

Divide et impera

Slide 14

Replication with Postgres

• 8.3: warm standby
• 9.0: asynchronous binary replication
• 9.1: synchronous replication
• 9.2: cascading replication
• 9.3: more improvements towards failover / switching masters
• 9.4: multi-master binary replication?

Slide 15

Configure replication

• wal_level = 'hot_standby'
• checkpoint_segments >= 32
• checkpoint_completion_target >= 0.8
• hot_standby = on
• hot_standby_feedback = on
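
The settings above are easy to get subtly wrong (a typo in the wal_level value, a checkpoint_segments left at its default). A minimal sketch of a checker, assuming the configuration is available as text with one "name = value" pair per line; the helper names parse_conf and check_replication are made up for this example and are not part of PostgreSQL:

```python
# Recommended values and minimums, taken from the slide above.
RECOMMENDED = {
    "wal_level": "hot_standby",
    "checkpoint_segments": 32,
    "checkpoint_completion_target": 0.8,
    "hot_standby": "on",
    "hot_standby_feedback": "on",
}


def parse_conf(text):
    """Parse 'name = value' lines, ignoring comments and blank lines.

    Assumption: every setting uses the 'name = value' form.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip().lower()] = value.strip().strip("'\"")
    return settings


def check_replication(text):
    """Return a list of problems; an empty list means every check passed."""
    settings = parse_conf(text)
    problems = []
    for name, wanted in RECOMMENDED.items():
        actual = settings.get(name)
        if actual is None:
            problems.append(f"{name} is not set")
        elif isinstance(wanted, (int, float)):
            # Numeric settings are minimums on the slide (">=").
            if float(actual) < wanted:
                problems.append(f"{name} = {actual}, expected >= {wanted}")
        elif actual != wanted:
            problems.append(f"{name} = {actual}, expected {wanted}")
    return problems


sample = """
wal_level = 'hot_standby'
checkpoint_segments = 16   # too low
hot_standby = on
"""
for problem in check_replication(sample):
    print(problem)
```

Run against the master and the standby configuration alike; the hot_standby settings only take effect on the standby.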

Slide 17

Keep it simple, stupid

• 2ndQuadrant is pretty awesome
• Barman for backups
• repmgr for replication management

Slide 18

Basic configuration

Slide 19

Raise those memory limits!

• shared_buffers = 1/8 to ¼ of RAM
• work_mem = 128MB to 1GB
• maintenance_work_mem = 512MB to 1GB
• temp_buffers = 128MB to 1GB
• effective_cache_size = ¾ of RAM
• wal_buffers = 32MB
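
A rough way to turn the fractions above into concrete numbers, assuming a dedicated database server. The function names memory_settings and render are illustrative only; the fixed values are simply the low end of each range on the slide. Keep in mind that work_mem applies per sort or hash operation, so large values multiply quickly under concurrent queries.

```python
def memory_settings(ram_mb, shared_fraction=0.25):
    """Suggest memory settings for a server with ram_mb megabytes of RAM.

    shared_fraction is the share of RAM given to shared_buffers and must
    stay within the 1/8 to 1/4 range recommended on the slide.
    """
    if not 0.125 <= shared_fraction <= 0.25:
        raise ValueError("shared_fraction should stay between 1/8 and 1/4")
    return {
        "shared_buffers": f"{int(ram_mb * shared_fraction)}MB",
        "effective_cache_size": f"{int(ram_mb * 0.75)}MB",
        # Low end of the ranges on the slide; raise them for heavy ETL.
        "work_mem": "128MB",
        "maintenance_work_mem": "512MB",
        "temp_buffers": "128MB",
        "wal_buffers": "32MB",
    }


def render(settings):
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Example: a server with 64 GB of RAM.
print(render(memory_settings(64 * 1024)))
```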

Slide 20

Tune the planner for correct planning

• random_page_cost = 3
• cpu_tuple_cost = 0.1
• constraint_exclusion = on
• from_collapse_limit >= 12
• join_collapse_limit >= 12

Slide 21

Passing the speed limits

Slide 22

Use partitions

• Think about the partition key!
• Trigger-based for row-by-row inserts
• Rule-based for bulk inserts
• Make sure you add constraints
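
In PostgreSQL 9 a partition is a child table that inherits from the parent and carries a CHECK constraint on the partition key, which is what lets constraint exclusion skip it. A small sketch that generates the DDL for monthly partitions; the table name sales, the column sold_on and the helper names are hypothetical, and the routing trigger or rule still has to be written separately:

```python
from datetime import date


def next_month(d):
    """First day of the month following d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def partition_ddl(parent, key, first):
    """DDL for one monthly child table of `parent`, partitioned on `key`.

    `first` must be the first day of the month the partition covers.
    """
    last = next_month(first)
    child = f"{parent}_{first:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({key} >= DATE '{first}' AND {key} < DATE '{last}')\n"
        f") INHERITS ({parent});"
    )


# Example: the last two partitions of 2013 for a hypothetical sales table.
for month in (11, 12):
    print(partition_ddl("sales", "sold_on", date(2013, month, 1)))
```

The half-open range (>= first day, < first day of the next month) keeps neighbouring partitions from overlapping.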

Slide 23

Use indexes

• Learn to read EXPLAIN output
• Use http://explain.depesz.com/
• Don’t over-index

Slide 24

Other sane things to do

• Use unique indexes
  • Created automatically when you define a primary key

• Use clustered indexes
  • And cluster those tables regularly

Slide 25

Use partial indexes

• Not every database has them (MySQL, for one, does not)
• Really useful on big tables
• Disadvantage: no ‘moving’ indexes, e.g. an index for current_day

Slide 26

Keep your database fit

Slide 27

Vacuum

• Disable autovacuum for data warehouses
• Vacuum once a day
• Check regularly that the vacuums actually run!
• Prevents data loss (transaction ID wraparound)
• Prevents the database from growing out of control, size-wise
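
With autovacuum off, a missed cron run goes unnoticed unless something checks for it. A minimal sketch: the view pg_stat_user_tables and its last_vacuum column are standard PostgreSQL statistics, while the helper stale_tables and the one-day threshold are assumptions for illustration. Fetching the rows with a database driver is left out, so the example feeds in rows by hand.

```python
from datetime import datetime, timedelta

# Query to run against the database; each row is (schema, table, last_vacuum).
LAST_VACUUM_SQL = (
    "SELECT schemaname, relname, last_vacuum FROM pg_stat_user_tables;"
)


def stale_tables(rows, now, max_age=timedelta(days=1)):
    """Return 'schema.table' for tables never vacuumed or vacuumed too long ago."""
    return [
        f"{schema}.{table}"
        for schema, table, last_vacuum in rows
        if last_vacuum is None or now - last_vacuum > max_age
    ]


# Example with hand-written rows instead of a live query result.
now = datetime(2013, 12, 10, 6, 0)
rows = [
    ("dwh", "fact_sales", datetime(2013, 12, 10, 2, 0)),  # vacuumed last night
    ("dwh", "dim_customer", datetime(2013, 12, 7, 2, 0)),  # three days ago
    ("staging", "raw_load", None),  # never vacuumed manually
]
print(stale_tables(rows, now))
```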

Slide 28

Analyze

• Analyze once a day
  • Together with vacuum
  • VACUUM ANALYZE <schema>.<table>;

• default_statistics_target >= 300

Slide 29

Check for bloat!

• Free space in tables
• Indexes are no longer optimal
• Use Nagios with check_postgres.pl

Slide 30

Prevent bloat

• VACUUM FULL
  • Offline!
  • Only when no primary key is available

• Repack
  • Online!
  • Orders the tables (clustered index)
  • Needs a primary key on the table

• Reindex
  • Reindex regularly

Slide 31

Partial indexes?

• Write a script
• Use a cronjob
• Recreate your time-aware indexes every day. Will be fast.
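
An index predicate may only use immutable expressions, so it cannot refer to the current date directly; the date has to be written into the index definition as a literal and refreshed. A sketch of what the daily script could generate, assuming a hypothetical table fact_sales with a date column sold_on; the helper name and the index naming scheme are made up for this example:

```python
from datetime import date, timedelta


def daily_partial_index(table, column, day):
    """SQL statements that rebuild a partial index covering only `day`."""
    name = f"{table}_{column}_today_idx"
    next_day = day + timedelta(days=1)
    return [
        f"DROP INDEX IF EXISTS {name};",
        f"CREATE INDEX {name} ON {table} ({column}) "
        f"WHERE {column} >= DATE '{day}' AND {column} < DATE '{next_day}';",
    ]


# Example: the statements a cron job would run shortly after midnight.
for statement in daily_partial_index("fact_sales", "sold_on", date.today()):
    print(statement)
```

Because the index only covers one day of rows, rebuilding it stays cheap even on a very large table. Queries must filter on the same date range for the planner to consider the index.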

Slide 33

Questions?

• Postgres has an awesome community
• IRC: #postgresql @ freenode
• Check the mailing lists
