Great performance at scale: Approaching the real partitioning performance of the upcoming PostgreSQL 12

1. Great performance at scale: Approaching the real partitioning performance of PostgreSQL 12
2019/9/25
FUJITSU Limited
Sho Kato / Naoki Yotsunaga
db tech showcase Tokyo 2019
Copyright 2019 FUJITSU LIMITED
2. In this talk
■ Evolution of partitioning in PostgreSQL 11
  • What's new in PostgreSQL 11
  • Performance of partitioning in PostgreSQL 11
  • Problems of partitioning in PostgreSQL 11
■ Enhancement of partitioning in PostgreSQL 12
  • Performance of partitioning in PostgreSQL 12
■ Tips and pitfalls from our experience
  • Problems and solutions with over 1,000 partitions
3. What's new in PostgreSQL 11
■ Hash partitioning
  • Enables partitioning by a hashed key column
■ PRIMARY KEY, FOREIGN KEY, indexes, and triggers
■ Enhanced partition pruning
  • Partition pruning eliminates unneeded partitions from processing
  • Improves SELECT performance
■ Partitionwise join
  • Processes a join operation effectively
  • Improves join query performance
» Partitioning is strengthened more and more!
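The hash partitioning feature mentioned above can be sketched as follows; the table and column names are illustrative, not taken from the talk:

```sql
-- Hash partitioning (new in PostgreSQL 11); names are made up for illustration
CREATE TABLE measurements (
    device_id  int NOT NULL,
    reading    numeric,
    PRIMARY KEY (device_id)        -- the key must include the partition key
) PARTITION BY HASH (device_id);

-- Four hash partitions; each row is routed by hash(device_id) mod 4
CREATE TABLE measurements_p0 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE measurements_p1 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE measurements_p2 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE measurements_p3 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```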
4. But limitations still remain…
■ The PostgreSQL 11 manual says that:
5. User needs
■ We have users who have a table partitioned into 5,500 partitions in DBMS x
  • Oops, 5,500 partitions is a lot…
■ They are also attracted to PostgreSQL
■ Therefore, we verified whether their performance requirements can be satisfied by PostgreSQL 11!
6. Benchmark
■ Purpose
  • To see the improvements of PostgreSQL 11 compared to PostgreSQL 10
  • To confirm that PostgreSQL 11 can satisfy the performance requirements of users who own 1,000s of partitions
■ Workload
  • Read/write a single record (assumes an OLTP workload)
  • Execute each SELECT/UPDATE/DELETE/INSERT statement
  • Use JdbcRunner, an open source benchmark tool
■ Table definition
  • RANGE partitioning, increasing the number of partitions to 2, 4, …, 8,192
  • Each partition has 1,000 records
  • The partition key is an integer with a B-tree index on it
■ Comparing
  • PostgreSQL 11.1 vs. PostgreSQL 10.6
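A minimal sketch of the kind of schema described above; the names and exact range bounds are assumptions, not the talk's actual DDL:

```sql
-- RANGE-partitioned table with an integer partition key
CREATE TABLE accounts (aid int, abalance int) PARTITION BY RANGE (aid);

-- Generate 8,192 partitions of 1,000 records each
DO $$
BEGIN
  FOR i IN 1..8192 LOOP
    EXECUTE format(
      'CREATE TABLE accounts_%s PARTITION OF accounts
         FOR VALUES FROM (%s) TO (%s)',
      i, (i - 1) * 1000 + 1, i * 1000 + 1);
    EXECUTE format('CREATE INDEX ON accounts_%s (aid)', i);  -- B-tree on the key
  END LOOP;
END $$;

-- Single-record OLTP statements of the kind the benchmark issues
SELECT abalance FROM accounts WHERE aid = 42;
UPDATE accounts SET abalance = abalance + 1 WHERE aid = 42;
```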
7. Benchmark environment
■ Hardware
  • CPU: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (8 cores) * 2
  • Memory: 260GB
  • Disk: 2.2TB (SSD)
  • OS: Red Hat Enterprise Linux Server release 6.5, 64-bit
■ Database tuning
  • shared_buffers = 100GB (all records are buffered)
  • max_worker_processes = 0, max_parallel_workers_per_gather = 0, max_parallel_workers = 0
  • max_wal_size = 20GB, checkpoint_timeout = 1d
■ Other
  • Single client session
  • Client and server running on the same machine
12. Summary of comparison between PostgreSQL 11 and PostgreSQL 10
■ With few partitions
  • Performance of SELECT improves by up to 24%
  • Performance of INSERT improves by up to 10%
  • But performance of DELETE/UPDATE is almost the same
■ With too many partitions
  • With 1,000s of partitions, the same performance as PostgreSQL 10 for all SQL statements
  • In DBMS x, performance does not change even as the number of partitions increases
13. Why does such a difference occur?
■ Question
  • The algorithm for choosing one partition should not be so different
  • Perhaps there is a bottleneck in how PostgreSQL handles partitions?
■ Method for proving it
  • Execute each DELETE/INSERT/SELECT/UPDATE statement 10 times and measure the average elapsed time in each module, without records
■ How to measure
  • Use log_XXX_stats = on
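log_XXX_stats refers to a family of real PostgreSQL parameters (superuser-only); per-module timings can be captured per session like this:

```sql
-- Emit per-module timing/resource statistics to the server log
SET log_parser_stats   = on;  -- PARSER and PARSE ANALYSIS
SET log_planner_stats  = on;  -- PLANNER
SET log_executor_stats = on;  -- EXECUTOR

SELECT abalance FROM accounts WHERE aid = 42;  -- stats are logged for this statement
```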
14. Overview of query processing in PostgreSQL
■ PARSER
  • Analyzes SQL syntax and converts the SQL to a parse tree
  • SQL -> Parse tree
■ PARSE ANALYSIS
  • Adds information such as OIDs and converts the list of relations in the parse tree to the range table
  • Parse tree -> Query tree
■ REWRITER
  • Rewrites the query tree according to the rule system
  • Query tree -> Query tree
■ PLANNER
  • Refers to statistics and creates an access plan
  • Query tree -> Plan tree
■ EXECUTOR
  • Executes the access plan
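The intermediate trees in the pipeline above can be inspected with PostgreSQL's debug parameters, for example:

```sql
-- Dump the intermediate representations to the server log
SET debug_print_parse     = on;  -- query tree after parse analysis
SET debug_print_rewritten = on;  -- query tree after the rewriter
SET debug_print_plan      = on;  -- plan tree from the planner
SET debug_pretty_print    = on;  -- indent the output for readability

SELECT 1;  -- each tree for this query is now written to the log
```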
15. Elapsed time of each module in PostgreSQL 11 (1/2)
[Chart: 11 seconds vs. 150 milliseconds]
16. Elapsed time of each module in PostgreSQL 11 (2/2)
[Chart: 6.8 milliseconds]
17. Summary of the verification
■ Which module has the bottleneck?
  • SELECT/DELETE/UPDATE have the bottleneck in the PLANNER
  • INSERT has the bottleneck in the EXECUTOR
■ Why is it taking so long?
  • PostgreSQL 11 has faster partition pruning, so planning time should be shorter?
  • The executor processes only one record, so execution time should be short?
18. Why is the Planner slow?
■ Partition pruning
  • Creates plans for only the targeted partitions
  • But the planner is still slow. Why?
■ Reasons
  • Opening all partitions and processing them (SELECT/DELETE/UPDATE)
    • Creating the query's range table and PlannerInfo
  • Creating plans for all partitions (DELETE/UPDATE)
    • Partition pruning can currently be used only for SELECT queries

testdb=# explain select * from test.accounts where aid = 1 or aid = 8192;
                                QUERY PLAN
-------------------------------------------------------------------------
 Append  (cost=0.00..88.03 rows=46 width=8)
   ->  Seq Scan on account_part_1  (cost=0.00..43.90 rows=23 width=8)
         Filter: ((aid = 1) OR (aid = 8192))
   ->  Seq Scan on account_part_8192  (cost=0.00..43.90 rows=23 width=8)
         Filter: ((aid = 1) OR (aid = 8192))
19. Why is the Executor slow?
■ How the Executor works
  • Processes a plan in 4 steps:
    1. ExecutorStart(): collects information (e.g. system catalogs) needed to execute the plan
    2. ExecutorRun(): executes the plan
    3. ExecutorFinish(): handles processing such as AFTER triggers and CTEs
    4. ExecutorEnd(): releases resources
■ Are there any bottlenecks?
■ Reasons
  • All partitions are locked with RowExclusiveLock in ExecutorStart()
    • Performance degrades as the number of partitions increases
20. Can we improve performance in PostgreSQL 12?
■ Commits for improving partitioning performance
  • Speed up planning when partitions can be pruned at plan time (428b260f87e)
  • Delay lock acquisition for partitions until we route a tuple to them (9eefba181f7)
  • Redesign initialization of partition routing structures (3f2393edefa)
21. Speed up planning with partitions
■ Problems
  • Opening all partitions and processing them (SELECT/DELETE/UPDATE)
    • Creating the query's range table and PlannerInfo
  • Creating plans for all partitions (DELETE/UPDATE)
    • Partition pruning can currently be used only for SELECT queries
■ Solutions
  • Do partition pruning before opening partitions
  • Apply partition pruning to DELETE/UPDATE queries
22. How much does performance improve in PostgreSQL 12?
■ Purpose
  • To confirm that PostgreSQL 12 can satisfy the performance requirements of users who own 1,000s of partitions
■ Workload
  • Same as the one used for measuring PostgreSQL 11
■ Table definition
  • plan_cache_mode = force_custom_plan
  • Other definitions are the same as the ones used for measuring PostgreSQL 11
■ Comparing
  • PostgreSQL 12beta3 vs. PostgreSQL 11.1
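plan_cache_mode (new in PostgreSQL 12) controls whether prepared statements get generic or custom plans; force_custom_plan replans on every execution so pruning can use the bound parameter values. A sketch, with an assumed table name:

```sql
SET plan_cache_mode = force_custom_plan;

PREPARE get_balance(int) AS
  SELECT abalance FROM accounts WHERE aid = $1;

-- With a custom plan, only the one matching partition is planned and scanned;
-- a generic plan would have to cover every partition
EXECUTE get_balance(42);
```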
27. Performance improves a lot in PostgreSQL 12
■ Comparison with 8,192 partitions
  • Performance doesn't depend on the number of partitions
  • Great improvement over PostgreSQL 11

            PostgreSQL 12   PostgreSQL 11
  DELETE        5,447             0.1
  INSERT        6,705           118
  SELECT        7,979             8.1
  UPDATE        4,814             0.1
28. Tips and pitfalls from our experience
■ We are migrating a large system using another DBMS to PostgreSQL 12
  • Finally, we've achieved almost the same performance
  • But we encountered various problems…
  • We'll introduce some of the problems and provide tips
■ Cases
  • Two tables are divided into 1,000 partitions each, one with 200 million and the other with 20 million records
  • Online processing creates multiple temporary tables from these tables and joins them
  • 40x speedup (120s → 3s) by using tricks 1 and 2
    1. Conditions in a subquery do not prune partitions in the outer query
    2. Operator overloading blocks partition pruning
    3. TRUNCATE of one partition blocks access to other partitions
29. Conditions in a subquery do not prune partitions in the outer query
■ Problem
  • The outer query is very slow due to access to unexpectedly many partitions
■ Cause
  • Partition pruning in a subquery is not propagated to the outer query when the subquery uses GROUP BY
■ Workaround
  • Add the subquery's WHERE conditions to the parent query

-- Before: partitions of T2 are not pruned
SELECT ... FROM
  (SELECT ... FROM T11 WHERE T11.C1 = value1 GROUP BY C1, ...) T1, T2
WHERE T1.C1 = T2.C1;

-- After: the condition is repeated on T2 so its partitions can be pruned
SELECT ... FROM
  (SELECT ... FROM T11 WHERE T11.C1 = value1 GROUP BY C1, ...) T1, T2
WHERE T1.C1 = T2.C1
  AND T2.C1 = value1;
30. Operator overloading blocks partition pruning
■ Problem
  • Partition pruning does not work when comparing a char-type partition key with varchar-type values: partition_key (char) = value (varchar)
■ Cause
  • The = operator was overloaded (char = varchar instead of char::text = varchar::text) to avoid application changes for explicit casting
  • On comparing char and varchar, with col1 = 'aaa'::char(5) and col2 = 'aaa'::varchar(5):
    • PostgreSQL: col1 = col2 → 'aaa' = 'aaa' → true
    • DBMS x:    col1 = col2 → 'aaa  ' = 'aaa' → false
■ Workaround
  • Removed the overloading of the = operator that was causing the problem
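A minimal sketch of the kind of user-defined = operator described above; the function and operator definitions are illustrative, not the actual migration code:

```sql
-- Custom char = varchar comparison introduced to preserve the application's
-- existing comparison semantics without adding explicit casts
CREATE FUNCTION chareq_varchar(char, varchar) RETURNS boolean
  AS $$ SELECT rtrim($1) = rtrim($2) $$ LANGUAGE sql IMMUTABLE;

CREATE OPERATOR = (
  LEFTARG  = char,
  RIGHTARG = varchar,
  FUNCTION = chareq_varchar
);

-- Because this = is not the operator of the partition key's operator class,
-- the planner cannot prune: this scans ALL partitions
SELECT * FROM t WHERE partition_key = 'aaa'::varchar;
```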
31. TRUNCATE of one partition blocks access to other partitions
■ Problem
  • While one session TRUNCATEs a partition, other sessions cannot read/write other partitions of the same table
  • This occurs only when plan_cache_mode = force_generic_plan and an EXECUTE statement is executed
■ Cause
  • A lock conflict occurs on the TRUNCATEd partition because all partitions are accessed to create a generic plan
■ Workaround
  • Use DELETE instead of TRUNCATE

[Diagram: access through the parent table conflicts with the lock held by TRUNCATE on a child table]
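The workaround works because DELETE takes only a ROW EXCLUSIVE lock on the partition, while TRUNCATE takes ACCESS EXCLUSIVE, which conflicts with the locks taken on every partition when a generic plan is built. A sketch, with an assumed partition name:

```sql
-- Instead of: TRUNCATE accounts_42;   (ACCESS EXCLUSIVE lock)
BEGIN;
DELETE FROM accounts_42;  -- ROW EXCLUSIVE lock only; other sessions can
COMMIT;                   -- still read/write sibling partitions
```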
32. Conclusion
■ In practice, PostgreSQL 11 is limited to a few hundred partitions
■ PostgreSQL 12 can handle thousands of partitions with little overhead
■ But we still need to put effort into query tuning