Join Methods and 12c Adaptive Plans
In its quest to improve cardinality estimation, 12c has introduced Adaptive Execution Plans, which deal with cardinalities that are difficult to estimate before execution. Ever seen a query hang because a nested loop join is running on millions of rows?
This is the point addressed by Adaptive Joins. But this new feature is also a good occasion to look at the four join methods that have been available for years.
Oracle Parallel Distribution and 12c Adaptive Plans (Franck Pachot)
In the previous newsletter we saw how 12c can defer the choice of the join method to the first execution. We considered only serial execution plans. But besides the join method, the cardinality estimation is a key decision for parallel distribution when joining in a parallel query. Ever seen a parallel query consume huge tempfile space because a large table is broadcast to many parallel processes? This is the point addressed by Adaptive Parallel Distribution.
Once again, that new feature is a good occasion to look at the different distribution methods.
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory option (Franck Pachot)
Besides adaptive joins and adaptive parallel distribution, 12c comes with Adaptive Bitmap Pruning. I'll describe the case it applies to, one that is often not well known: the Star Transformation.
CBO choice between Index and Full Scan: the good, the bad and the ugly param... (Franck Pachot)
Usually, the conclusion comes at the end, but here I will state my goal up front: I wish never to see the optimizer_index_cost_adj parameter again, especially when going to 12c, where Adaptive Join can be completely fooled by it. Choosing between index access and full table scan is a key point when optimizing a query, and historically the CBO came with several ways to influence that choice. But on some systems the workarounds have accumulated on top of each other, completely biasing the CBO estimations. And we see nested loops on huge numbers of rows because of those false estimations.
Oracle In-Memory option to improve analytic queries
In-Memory is now the trend among database vendors. But they do not all implement in-memory storage for the same reasons or with the same architecture. The Oracle In-Memory option directly addresses BI reporting. Oracle has always favored a hybrid approach where we can query the OLTP database (since Oracle 6, reads do not block writes), and the In-Memory approach follows the same philosophy: run efficient analytic queries on OLTP databases. The first approach to columnar storage was bitmap indexes in 8i, which were very efficient at supporting ad-hoc queries but not compatible with an OLTP workload. Then came Exadata SmartScan and Hybrid Columnar Compression, which still addressed data warehouse loads and reporting.
The 12c In-Memory option now gives columnar and in-memory efficiency directly on the OLTP database, without changing any design or code. A demo will show how this option can be used.
The Query Optimizer is the “brain” of your Postgres database. It interprets SQL queries and determines the fastest method of execution. Using the EXPLAIN command, this presentation shows how the optimizer interprets queries and determines optimal execution.
This presentation will give you a better understanding of how Postgres optimally executes its queries and what steps you can take to understand and perhaps improve its behavior in your environment.
To listen to the webinar recording, please visit EnterpriseDB.com > Resources > Ondemand Webcasts
If you have any questions please email sales@enterprisedb.com
Percona XtraDB Cluster (PXC) non-blocking operations, what you need to know t... (Marco Tusa)
Performing simple DDL operations such as ADD/DROP INDEX in a tightly coupled cluster such as PXC can become a nightmare. A metadata lock will prevent data modifications for long periods of time, and to bypass this we need to get creative, for example using a rolling schema upgrade or Percona online-schema-change. With NBO, we can avoid such craziness, at least for a simple operation like adding an index. In this brief talk I will illustrate what you should do to see the negative effect of NOT using NBO, as well as what you should do to use it correctly and what to expect from it.
New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
By Amit Kapila at India PostgreSQL UserGroup Meetup, Bangalore at InMobi.
http://technology.inmobi.com/events/india-postgresql-usergroup-meetup-bangalore
Agenda:
- Introduction to Optimizer Hint
- Why Optimizer
- Hint Query
- Hint Statistics
- Hint Data
- Hint Drawback
By Kumar Rajiv Rastogi at India PG Day at InMobi.
http://technology.inmobi.com/events/india-postgresql-usergroup-meetup-bangalore
In this presentation I illustrate how and why InnoDB performs page merges and splits. I will also show what can be done to reduce the impact.
An overview presentation covering the use of Oracle's PX functionality including some tips and traps. Detailed white paper at http://oracledoug.com/px.html
In 40 minutes the audience will learn a variety of ways to make a PostgreSQL database suddenly go out of memory on a box with half a terabyte of RAM.
Developer's and DBA's best practices for preventing this will also be discussed, as well as a bit of Postgres and Linux memory management internals.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
Exadata X3 in action: Measuring Smart Scan efficiency with AWR (Franck Pachot)
Exadata comes with new statistics and wait events that can be used to measure the efficiency of its main features (Smart Scan offloading, Storage Indexes, Hybrid Columnar Compression and Smart Flash Cache). The goal of this article is not to compare Exadata with other platforms, but rather to help understand the few basic statistics that we must know in order to evaluate whether Exadata is a good solution for a specific workload, and how to measure that the Exadata features are well used. We will cover those statistics from the ‘Timed events’ and ‘System Statistics’ sections of the AWR report.
Migrating to Oracle Database 12c: 300 DBs in 300 days (Ludovico Caldara)
For a customer in Switzerland, we are in the process of migrating 400 databases to 12c. We have migrated 300 so far, and we have had good and bad surprises. This session will show a few scenarios that we faced during the upgrade project.
1 in 20 IT projects deliver on time and satisfy business management.
That means 95% of IT projects are partial or total failures.
This presentation explores the causes, and potential strategies for delivering a project in the 5% bracket.
Design and develop with performance in mind
Establish a tuning environment
Index wisely
Reduce parsing
Take advantage of Cost Based Optimizer
Avoid accidental table scans
Optimize necessary table scans
Optimize joins
Use array processing
Consider PL/SQL for “tricky” SQL
Are you an Oracle developer or a DBA?
Do you know the difference between aggregate and analytic functions?
Without complex sub-queries or self-joins, do you know how to:
Calculate running/cumulative totals and moving/centered averages?
List products with revenues above or below their peers or product groups?
Compute the ratio of one category’s sales to the total sales?
Select the Top-N or Top N % of the customers/products?
Classify advertisers into quartiles/n-tiles based on the revenue potential?
Compare period-over-period (year-over-year, month-over-month) growth and rank advancement?
Convert rows into columns (pivot), columns into rows (unpivot) or aggregate strings?
Perform what-if analysis and hypothetical ranking?
Analytic functions perform better because tables need to be scanned only once. They make you more productive because there is no need to write procedural code. No wonder Tom Kyte, a well-respected Oracle guru, says analytic functions are the best thing to happen since sliced bread.
In the first half, I will cover the basics of the various analytic functions:
Ranking: RANK, DENSE_RANK, ROW_NUMBER, NTILE, CUME_DIST, PERCENT_RANK
Windowing: SUM, AVG, MAX, MIN, FIRST_VALUE, LAST_VALUE
Reporting: RATIO_TO_REPORT
Others: FIRST/LAST, LEAD/LAG, hypothetical ranking
In the second half, I will show how powerful these functions are with a few examples.
If there is time, I will cover enhanced aggregation (ROLLUP, CUBE, and GROUPING SETS extensions to the GROUP BY clause).
This class should be useful for developers and DBAs alike, especially those working in analytics, business intelligence, and data warehouse environments.
Are you already an expert in analytic functions? Then come and help me refine the content.
For more info, read
http://download.oracle.com/docs/cd/E11882_01/server.112/e16579/analysis.htm
http://download.oracle.com/docs/cd/E11882_01/server.112/e16579/aggreg.htm
- Rollup and cross-tabulation across different dimensions using the ROLLUP, CUBE and GROUPING SETS extensions to the GROUP BY clause
- Most active time periods (e.g. days when the most tickets are open in BZ, hours with the most take-offs and landings, months with the highest sales, 5-minute periods with the maximum number of calls made)
- Data densification
- Rank last year vs. this year, rank growth, running/cumulative totals (Year-To-Date/Month-To-Date summation), moving averages, Year-Over-Year comparison, sales projection, average/min/max time between one sale and the next, products with above- and below-average sales
- Overall average and sum, departmental average and sum, ranking, and job-wise ranking in one SQL statement
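As an illustration of a few of the computations listed above, here is a sketch in SQL. It assumes the classic SCOTT.EMP table (DEPTNO, ENAME, HIREDATE, SAL), which is not given in this abstract; the column aliases are made up for the example.

```sql
-- Running total, moving average, ratio-to-total and ranking in one pass,
-- with no self-joins and no procedural code (classic SCOTT.EMP assumed)
select deptno, ename, hiredate, sal,
       sum(sal) over (partition by deptno order by hiredate) running_total,
       avg(sal) over (order by hiredate
                      rows between 2 preceding and current row) moving_avg,
       ratio_to_report(sal) over () ratio_to_total,
       rank() over (partition by deptno order by sal desc) sal_rank
from emp;
```

Each OVER clause defines its own window, so one scan of EMP produces all four measures at once, which is the performance point made above.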
This presentation is an INTRODUCTION to intermediate MySQL query optimization for the audience of PHP World 2017. It covers some of the more intricate features in a cursory overview.
Performance improvements in PostgreSQL 9.5 and beyondTomas Vondra
Let's see what major performance improvements PostgreSQL 9.5 brings, measure the impact on simple examples and also briefly look at improvements likely to appear in PostgreSQL 9.6 or some of the following releases.
Oracle Join Methods and 12c Adaptive Plans
12 Tips&techniques
Franck Pachot, dbi services
In its quest to improve cardinality estimation, 12c has introduced Adaptive Execution Plans, which deal with cardinalities that are difficult to estimate before execution. Ever seen a query hang because a nested loop join is running on millions of rows?
This is the point addressed by Adaptive Joins. But this new feature is also a good occasion to look at the four join methods that have been available for years.
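As a side note before looking at the join methods themselves: one easy way to see an adaptive plan is the ADAPTIVE format of DBMS_XPLAN, which also displays the operations that were considered but not chosen at runtime. The sketch below uses a SQL*Plus substitution variable for the SQL_ID rather than inventing one:

```sql
-- Show the full adaptive plan for a given cursor; operations rejected
-- at runtime are prefixed with '-' in the plan output.
select * from table(dbms_xplan.display_cursor(
  sql_id => '&sql_id', format => '+ADAPTIVE'));
```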
Nested, Hash, Merge… What is a join?
In a relational database, each business entity is stored in a separate table. That is its strength (we can query from different perspectives), but it involves an expensive operation when we need to combine rows from several tables: the join operation.
SQL> select * from DEPT , EMP where DEPT.DEPTNO=EMP.DEPTNO;
The Oracle legacy syntax lists all the tables in the FROM clause which, from a logical point of view only, does a cartesian product (combining each row from one source with each row from the other source) and then applies the join condition defined in the WHERE clause, like any other filtering predicate. The result – in the case of an inner join – is formed by the rows where the join condition evaluates to true.
Of course, this is not what is actually done, or it would be very inefficient. The cartesian product multiplies the cardinalities of both sources before filtering, but the key to optimizing a query is to filter as early as possible, before doing other operations.
SQL> select * from DEPT join EMP on DEPT.DEPTNO=EMP.DEPTNO;
The ANSI join syntax gives a better picture because it joins the tables one by one: the left table (which I will call the outer rowsource from now on) is joined to the right table (the inner table) on a specific condition. That join condition involves columns from both tables, and it can be an equality (such as outer.colA=inner.colA) – an equijoin – or an inequality (such as outer.colA>inner.colA) – known as a thetajoin.
A variation is the semijoin (such as EXISTS), which does not duplicate the outer rows even if multiple inner rows match. And there is the antijoin (such as NOT IN), which returns only the rows that do not match. In addition to that, an outer join adds the non-matching rows to an inner join.
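These join variants can be sketched in a few lines of Python. This is purely a conceptual illustration with made-up DEPT/EMP rows, not anything Oracle does internally:

```python
# Toy rowsources: DEPT is the outer side, EMP the inner side.
dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 40, "dname": "OPERATIONS"}]
emp = [{"ename": "KING", "deptno": 10}, {"ename": "CLARK", "deptno": 10}]

# Inner join: outer rows are duplicated for every matching inner row.
inner = [(d, e) for d in dept for e in emp if d["deptno"] == e["deptno"]]

# Semijoin (EXISTS): each outer row appears at most once, however many matches.
semi = [d for d in dept if any(e["deptno"] == d["deptno"] for e in emp)]

# Antijoin (NOT IN / NOT EXISTS): only the outer rows with no match at all.
anti = [d for d in dept if not any(e["deptno"] == d["deptno"] for e in emp)]

# Outer join: the inner join plus the non-matching outer rows padded with NULL.
outer = inner + [(d, None) for d in anti]
```

Here DEPTNO 10 produces two inner-join rows but only one semijoin row, and DEPTNO 40 appears only in the antijoin and outer join.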
SOUG Newsletter 2/2014
Of course, the SQL syntax is declarative and will not determine the actual join order unless we force it with the LEADING hint.
The Oracle optimizer will choose the join order, joining an outer rowsource to an inner table, the outer rowsource being either the first table or the result of previous joins, aggregations, etc.
For each join, the Oracle optimizer will choose the join method according to what is possible (a Hash Join cannot do a thetajoin, for example) and the estimated cost. We can force another join method with hints, as long as it is a possible one. When we want to force a join order and method, we need to set the order with the LEADING hint, listing all tables from outer to inner. Then, for each inner table (all the ones in the LEADING hint except the first one), we will define the join method, distribution, etc. Of course, this is not a recommendation for production. It is always better to improve cardinality estimations than to restrict the CBO with hints. But sometimes we want to see and test a specific execution plan.
The execution plans in this article were retrieved after execution in a session where statistics_level=all, using dbms_xplan:
select * from table(dbms_xplan.display_cursor(format=>'allstats'))
This shows the execution plan for the last SQL statement (we must be sure that no other statement is run in between, such as when serveroutput is set to on in sqlplus).
For better readability, I have reproduced only the relevant columns from the execution plans.
Nested Loop Join
All join methods read one rowsource and look up matching rows from the other table. What differs among the join methods is the structure used to access that other table efficiently. And an efficient access usually involves sorting or hashing.
■ Nested Loop uses a permanent access structure, such as an index that is already maintained sorted.
■ Sort Merge Join builds a sorted buffer from the inner table.
■ Hash Join builds a hash table either from the inner table or from the outer rowsource.
■ Merge Join Cartesian builds a buffer from the inner table, which can be scanned several times.
Nested Loop is special in that it has nothing to do before starting to iterate over the outer rowsource. Thus, it is the most efficient way to return the first row quickly. But because it does nothing on the inner table before starting the loop, it requires an efficient way to access the inner rows: usually an index access or a hash cluster.
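As a conceptual sketch (toy Python data standing in for the table and its index, not Oracle internals), a nested loop join is just a loop with one index lookup per outer row:

```python
# Hypothetical EMP rows (ename, deptno, sal); the dict plays the role of the
# PK_DEPT index, a pre-built structure mapping deptno to the DEPT row.
emp = [("SCOTT", 20, 3000), ("KING", 10, 5000), ("FORD", 20, 3000)]
pk_dept = {10: "ACCOUNTING", 20: "RESEARCH", 30: "SALES", 40: "OPERATIONS"}

def nested_loop_join(outer_rows, index):
    for ename, deptno, sal in outer_rows:       # one "Start" per outer row
        dname = index.get(deptno)               # unique lookup on the inner side
        if dname is not None:                   # inner join: keep matches only
            yield (ename, deptno, sal, dname)   # rows stream out immediately

rows = list(nested_loop_join(emp, pk_dept))
```

Nothing is built up front, which is why the first matching row can be returned immediately; the price is one lookup per outer row.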
Here is the shape of the Nested Loop execution plan in
10g:
EXPLAINED SQL STATEMENT:
------------------------
select * from DEPT join EMP using(deptno) where sal>=3000
------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 13 |
| 1 | NESTED LOOPS | | 1 | 3 | 13 |
|* 2 | TABLE ACCESS FULL | EMP | 1 | 3 | 8 |
| 3 | TABLE ACCESS BY INDEX ROWID | DEPT | 3 | 3 | 5 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 3 | 3 | 2 |
------------------------------------------------------------------------
Execution Plan 1: Nested Loop Join in 10g
The outer rowsource (EMP) returned 3 rows (A-Rows=3) after applying the 'sal>=3000' predicate, and for each of them (Starts=3) DEPT has been accessed by a rowid coming from the index entry. The hints to force that join are LEADING(EMP DEPT) for the join order and USE_NL(DEPT) to use a Nested Loop when joining to the inner table DEPT. Both have to be used, or the CBO may choose another plan.
The big drawback of Nested Loop is the time to access the inner table. Here we had to read 5 blocks (2 index branch+leaf and 2 table blocks) in order to retrieve only 3 rows. If the outer rowsource returns more than a few rows, then the Nested Loop is probably not an efficient method. This can be seen in the execution plan: the 'A-Rows' of the outer rowsource determines the 'Starts' of the access to the inner table.
The cost will be higher when the inner table is big (because of index depth) and when the clustering factor is bad (more logical reads to access the table). It can be much lower when we don't need to access the table at all (an index having all required columns – known as a covering index). But a nested loop that retrieves a lot of rows is always expensive.
Since 9i, prefetching is done when accessing the inner table, in order to lower the logical reads. A further optimization was introduced in 11g with Nested Loop Join Batching. A first loop retrieves a vector of rowids from the index and a second loop retrieves the table rows with batched (multiblock) I/O. It is faster, but possible only when we don't need the result to be in the same order as the outer rowsource.
Here is the plan in 11g:
--------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 13 |
| 1 | NESTED LOOPS | | 1 | 3 | 13 |
| 2 | NESTED LOOPS | | 1 | 3 | 10 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 3 | 8 |
|* 4 | INDEX UNIQUE SCAN | PK_DEPT | 3 | 3 | 2 |
| 5 | TABLE ACCESS BY INDEX ROWID | DEPT | 3 | 3 | 3 |
--------------------------------------------------------------------------
Execution Plan 2: Nested Loop Join in 11g
The 12c plan is the same, except that it shows the following note: 'this is an adaptive plan'. We will see that later.
Nested Loop can be used for all types of joins and is efficient when we have few rows from the outer rowsource that can be matched to the inner table through an efficient access path (usually an index). It is the fastest way to retrieve the first rows quickly (pagination, except when we need all rows first in order to sort them).
But when we have large tables to join, Nested Loop does not scale. The other join method that can do all kinds of joins on large tables is the Sort Merge Join.
Sort Merge Join
Nested Loop can be used for all types of joins, but it is probably not optimal for inequalities, because of the cost of the range scan to access the inner table:
EXPLAINED SQL STATEMENT:
------------------------
select /*+ leading(EMP DEPT) use_nl(DEPT) index(DEPT) */ * from
DEPT join EMP on(EMP.deptno between DEPT.deptno and DEPT.deptno+1 )
where sal>=3000
------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 13 |
| 1 | NESTED LOOPS | | 1 | 3 | 13 |
| 2 | NESTED LOOPS | | 1 | 3 | 11 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 3 | 8 |
|* 4 | INDEX RANGE SCAN | PK_DEPT | 3 | 3 | 3 |
| 5 | TABLE ACCESS BY INDEX ROWID| DEPT | 3 | 3 | 2 |
------------------------------------------------------------------------
Execution Plan 3: Nested Loop Join with inequality
Here I forced the plan to be a nested loop. But it can be very bad if the range scan returns a lot of rows.
So, without hinting, a Sort Merge Join is chosen by the CBO for that inequality join:
EXPLAINED SQL STATEMENT:
------------------------
select * from DEPT join EMP
on (EMP.deptno between DEPT.deptno and DEPT.deptno+10 ) where sal>=3000
------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | OMem |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 14 | |
| 1 | MERGE JOIN | | 1 | 5 | 14 | |
| 2 | SORT JOIN | | 1 | 3 | 7 | 2048 |
|* 3 | TABLE ACCESS FULL | EMP | 1 | 3 | 7 | |
|* 4 | FILTER | | 3 | 5 | 7 | |
|* 5 | SORT JOIN | | 3 | 5 | 7 | 2048 |
| 6 | TABLE ACCESS FULL | DEPT | 1 | 4 | 7 | |
------------------------------------------------------------------------
Execution Plan 4: Sort Merge Join
The sort merge join uses sorted rowsets in order to do the row matching without any additional access. The outer rowsource is read and, because both sides are sorted in the same way, it is easy to merge them. It is much quicker than a nested loop, but it has an additional overhead: the rows must be sorted. The hints to force that plan are LEADING(EMP DEPT) USE_MERGE(DEPT).
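The merge itself can be sketched in Python (toy data, with the two sorts made explicit as the blocking SORT JOIN steps; this illustrates the algorithm, not Oracle's implementation):

```python
from itertools import groupby
from operator import itemgetter

emp = [("KING", 10), ("SCOTT", 20), ("FORD", 20)]   # (ename, deptno)
dept = [(20, "RESEARCH"), (10, "ACCOUNTING")]       # (deptno, dname)

def sort_merge_join(outer_rows, inner_rows):
    outer_sorted = sorted(outer_rows, key=itemgetter(1))  # SORT JOIN (blocking)
    inner_sorted = sorted(inner_rows, key=itemgetter(0))  # SORT JOIN (blocking)
    result, i = [], 0
    for key, group in groupby(outer_sorted, key=itemgetter(1)):
        while i < len(inner_sorted) and inner_sorted[i][0] < key:
            i += 1                          # advance the inner side, never rescan
        matches, j = [], i
        while j < len(inner_sorted) and inner_sorted[j][0] == key:
            matches.append(inner_sorted[j][1]); j += 1
        for ename, deptno in group:         # re-read the buffered matches
            result.extend((ename, deptno, dname) for dname in matches)
    return result

rows = sort_merge_join(emp, dept)           # comes out ordered by deptno
```

Once both inputs are sorted, a single interleaved pass produces the join, and the result is naturally ordered on the join key.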
In this example, both rowsources had to be sorted (SORT JOIN operation). This has an additional cost, and it defers the retrieval of the first row because the SORT is a blocking operation (it needs to complete before the first result can be returned).
But when the outer rowsource is already ordered, and/or when the result must be ordered anyway, the Sort Merge Join is an efficient method:
EXPLAINED SQL STATEMENT:
------------------------
select * from DEPT join EMP using(deptno) where sal>=3000 order by deptno
-----------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | OMem |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 11 | |
| 1 | MERGE JOIN | | 1 | 3 | 11 | |
|* 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 3 | 4 | |
| 3 | INDEX FULL SCAN | EMP_DEPTNO | 1 | 14 | 2 | |
|* 4 | SORT JOIN | | 3 | 3 | 7 | 2048 |
| 5 | TABLE ACCESS FULL | DEPT | 1 | 4 | 7 | |
-----------------------------------------------------------------------
Execution Plan 5: Sort Merge Join without sorting
Here, because I have an index on EMP.DEPTNO, which is ordered, there is no need to sort the outer rowsource. And there is no need to sort the result of the join, because the Sort Merge Join returns rows in the same order as required by the ORDER BY.
However, even when already sorted, the inner rowsource must always go through a SORT JOIN operation, probably because only that sort structure (in memory or in a tempfile) has the ability to navigate backward when merging. So that join method always has an overhead before being able to return rows, even with sorted rowsources as input.
Merge Join is very good for large rowsources, when one rowsource is already ordered, and when we need an ordered result. For example, when joining multiple tables on the same join predicate, one sort benefits all the joins. It is the only efficient join method for inequality joins on large tables.
But when we do an equijoin, hashing is probably a faster algorithm than sorting.
Hash Join
We have seen that the Sort Merge Join is used on large tables because Nested Loop is not scalable. The problem is that when we have n rows from the outer rowsource, the Nested Loop has to access the inner table n times, and each access can involve 2 or 3 blocks through an index. The best we can do – in the case of an equijoin – is to access through a Single Table Hash Cluster, where each access requires only one block read. But a hash cluster, as a permanent structure, is difficult to maintain for large tables, especially when the number of rows is difficult to predict. This is why it is not used as much as indexes.
But in a similar way to the Sort Merge Join, which sorts at each execution instead of accessing an ordered index, the Hash Join can build a hash table at each execution. The smallest rowsource is used to build that hash table, which can then be probed efficiently a large number of times.
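Build and probe can be sketched as follows (toy Python data; the dict stands in for the in-memory hash table, an illustration of the principle rather than Oracle's workarea implementation):

```python
from collections import defaultdict

dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]       # small side
emp = [("KING", 10), ("SCOTT", 20), ("FORD", 20), ("BLAKE", 30)]   # large side

def hash_join(build_rows, probe_rows):
    # Build phase: hash the smaller rowsource on the join key (equijoin only).
    hash_table = defaultdict(list)
    for deptno, dname in build_rows:
        hash_table[deptno].append(dname)
    # Probe phase: one hash lookup per probe row, no permanent index needed.
    for ename, deptno in probe_rows:
        for dname in hash_table.get(deptno, []):
            yield (ename, deptno, dname)

rows = list(hash_join(dept, emp))
```

The hash table is rebuilt at each execution, so the probe cost stays constant per row no matter how large the probe side is.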
Here is an example of a Hash Join plan:
EXPLAINED SQL STATEMENT:
------------------------
select * from DEPT join EMP using(deptno)
-----------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers | OMem |
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 15 | |
|* 1 | HASH JOIN | | 1 | 14 | 15 | 1321K |
| 2 | TABLE ACCESS FULL | DEPT | 1 | 4 | 7 | |
| 3 | TABLE ACCESS FULL | EMP | 1 | 14 | 8 | |
-----------------------------------------------------------------------
Execution Plan 6: Hash Join
Note that this time DEPT is above EMP. With a Hash Join, the inner and outer rowsources can be swapped. The smaller one is used to build the hash table and is shown above the driving table.
The plan above, where EMP is still the outer rowsource, is obtained with the following hints: LEADING(EMP DEPT) USE_HASH(DEPT) and SWAP_JOIN_INPUTS(DEPT), telling that DEPT – the inner table – goes above to be the build (hash) table. We can read those hints as: we have EMP, we join it to DEPT using a Hash Join, and DEPT is the build table.
If we want EMP to be the build table, we can of course declare the opposite hints: LEADING(DEPT EMP) USE_HASH(EMP) SWAP_JOIN_INPUTS(EMP). But when we have more than 2 tables we cannot just change the LEADING order. Then, we use NO_SWAP_JOIN_INPUTS to let the outer rowsource be the build table.
Merge Join Cartesian
There is another join method that we rarely see. All the join methods above have an overhead coming from the access to the inner table: follow the index tree, sort, hash, etc. What if both tables are so small that we prefer to scan them as a whole instead of going through an elaborate access path? We could imagine doing a Nested Loop from a small outer rowsource and doing a FULL SCAN of the inner table for each loop. But there is better: FULL SCAN once, put the result in a buffer, and then read the whole buffer for each row of the outer rowsource.
This is the Merge Join Cartesian:
EXPLAINED SQL STATEMENT:
------------------------
select /*+ leading(DEPT EMP) use_merge_cartesian(EMP) */ * from
DEPT join EMP using(deptno) where sal>3000
------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Buffers | OMem |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 15 | |
| 1 | MERGE JOIN CARTESIAN | | 1 | 1 | 15 | |
| 2 | TABLE ACCESS FULL | DEPT | 1 | 4 | 8 | |
| 3 | BUFFER SORT | | 4 | 1 | 7 | 2048 |
|* 4 | TABLE ACCESS FULL | EMP | 1 | 1 | 7 | |
------------------------------------------------------------------------
Execution Plan 7: Merge join Cartesian
Now the outer rowsource is DEPT: we read 4 rows from it. For each of them we read the buffer ('Starts'=4), but the FULL TABLE SCAN occurred only the first time ('Starts'=1). Note that BUFFER SORT – despite its name – does not do any sorting. It is just a buffer. The hints for that plan are LEADING(DEPT EMP) USE_MERGE_CARTESIAN(EMP).
That is a good join method when the inner table is small but the join cardinality is high (e.g. when, for each outer row, most of the inner rows match).
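The principle can be sketched as follows (toy Python data; the list plays the role of the BUFFER SORT buffer, an illustration only):

```python
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]

def scan_emp():
    # Stands in for the one-time FULL TABLE SCAN of the filtered inner table.
    return [("KING", 5000)]

buffer, result = None, []
for deptno, dname in dept:            # 4 outer rows
    if buffer is None:
        buffer = scan_emp()           # the scan happens once (Starts=1)
    for ename, sal in buffer:         # the buffer is re-read each time (Starts=4)
        result.append((deptno, dname, ename, sal))
```

The inner table is read once, then every outer row pairs with the whole buffered content: no index, no hashing, no real sorting.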
12c new feature: Adaptive Join
We have seen at the beginning that in 12c a Nested Loop was chosen by the optimizer, with a note mentioning that the plan is adaptive. Here is the plan with the '+adaptive' format of dbms_xplan:
EXPLAINED SQL STATEMENT:
------------------------
select * from DEPT join EMP using(deptno) where sal>=3000
------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 12 |
|- * 1 | HASH JOIN | | 1 | 3 | 12 |
| 2 | NESTED LOOPS | | 1 | 3 | 12 |
| 3 | NESTED LOOPS | | 1 | 3 | 9 |
|- 4 | STATISTICS COLLECTOR | | 1 | 3 | 7 |
| * 5 | TABLE ACCESS FULL | EMP | 1 | 3 | 7 |
| * 6 | INDEX UNIQUE SCAN | PK_DEPT | 3 | 3 | 2 |
| 7 | TABLE ACCESS BY INDEX ROWID | DEPT | 3 | 3 | 3 |
|- 8 | TABLE ACCESS FULL | DEPT | 0 | 0 | 0 |
------------------------------------------------------------------------
Execution Plan 8: Adaptive Join resolved to Nested Loop
Most of the time, with an equijoin, the choice between Nested Loop and Hash Join depends on the size of the outer rowsource.
If it is overestimated, then we risk doing a Hash Join with a Full Table Scan of the inner table – which is not efficient when we match only a few rows. But underestimating can be worse: the risk is doing millions of nested loop iterations to access a table.
So 12c defers that choice to the first execution, where a STATISTICS COLLECTOR buffers and counts the rows coming from the outer rowsource until it reaches a threshold cardinality.
Then, if the threshold is reached, it switches to a Hash Join (the outer rowsource becoming the build table). The threshold is calculated at parse time, where the CBO computes the cost of Nested Loop and Hash Join for different cardinalities (a dichotomy until the Nested Loop cost is higher than the Hash Join cost).
The inflection point can be seen in the optimizer trace (event 10053). In my example it shows:
Found point of inflection for NLJ vs. HJ: card = 9.30
meaning that the execution will switch to a Hash Join when 10 rows or more are returned from EMP.
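The mechanism can be caricatured in a few lines of Python (a toy model with made-up data and a hypothetical threshold; it only illustrates the buffer-then-decide idea, not how the row source actually works):

```python
def adaptive_join(outer_rows, inner_index, threshold):
    outer_iter = iter(outer_rows)
    buffered = []
    for key in outer_iter:                  # STATISTICS COLLECTOR: buffer + count
        buffered.append(key)
        if len(buffered) > threshold:       # inflection point crossed
            break
    if len(buffered) <= threshold:
        # Resolved to NESTED LOOPS: one index probe per buffered row.
        return "NL", [(k, inner_index[k]) for k in buffered if k in inner_index]
    # Resolved to HASH JOIN: full-scan the inner table into a hash table, then
    # probe it with the buffered rows plus the rest of the outer rowsource.
    hash_table = dict(inner_index)          # stands in for the scan + build
    remaining = buffered + list(outer_iter)
    return "HJ", [(k, hash_table[k]) for k in remaining if k in hash_table]

pk_dept = {10: "ACCOUNTING", 20: "RESEARCH"}
method_small, _ = adaptive_join([10, 20, 10], pk_dept, threshold=9)   # few rows
method_large, _ = adaptive_join([10] * 1000, pk_dept, threshold=9)    # many rows
```

With 3 outer rows the buffer never crosses the threshold and the nested loop runs; with 1000 rows the collector gives up buffering and the hash join takes over.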
Here is the execution plan for the same query where the actual number of rows is 1000 instead of 3:
EXPLAINED SQL STATEMENT:
------------------------
select * from DEPT join EMP using(deptno) where sal>=3000
------------------------------------------------------------------------
| Id | Operation | Name | Starts | A-Rows | Buffers |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 14 |
| * 1 | HASH JOIN | | 1 | 1000 | 14 |
|- 2 | NESTED LOOPS | | 1 | 1000 | 7 |
|- 3 | NESTED LOOPS | | 1 | 1000 | 7 |
|- 4 | STATISTICS COLLECTOR | | 1 | 1000 | 7 |
| * 5 | TABLE ACCESS FULL | EMP | 1 | 1000 | 7 |
|- * 6 | INDEX UNIQUE SCAN | PK_DEPT | 0 | 0 | 0 |
|- 7 | TABLE ACCESS BY INDEX ROWID | DEPT | 0 | 0 | 0 |
| 8 | TABLE ACCESS FULL | DEPT | 1 | 4 | 7 |
------------------------------------------------------------------------
Execution Plan 9: Adaptive Join resolved to Hash Join
The first 10 rows were buffered, and then the nested loop was discarded in favor of the hash join. That inactivates the 1000 accesses to the table by index that would have been necessary with the nested loop.
Note that the choice occurs only on the first execution. Once the first execution is done, the same plan is used by subsequent executions. We can see that in V$SQL:
■ IS_RESOLVED_ADAPTIVE_PLAN=N when the plan is adaptive but the first execution has not occurred yet
■ IS_RESOLVED_ADAPTIVE_PLAN=Y when the choice has been made – after EXECUTIONS >= 1
So this is a solution when the estimation was bad, but not a solution when data changes. Another 12c feature, Automatic Reoptimization, is there to adapt the plan for subsequent executions.
Conclusion
Each join method has its use cases where it is efficient:
■ Nested Loop Join is good when we want to retrieve a few rows for which we have a fast access path. This is typical of OLTP. Proper indexing is the key to good performance.
■ Hash Join is good when we have large tables, typical of reporting or BI, especially when the smallest table fits in memory. But it is available only for equijoins.
■ Merge Join Cartesian should be seen only with small tables and with a high multiplicity for the join.
■ Sort Merge Join can be seen when the sorting is not a big overhead, or for thetajoins where a hash join is not possible.
Regarding resources, Nested Loop is more about indexes, buffer cache (SGA), flash cache and single block reads. Sort Merge and Hash Joins are more about full table scans, multiblock reads, direct reads/smartscan, workareas (PGA) and tempfiles.
The choice among the possible methods depends mainly on the size of the tables. The estimation can be quite good for a two-table join with accurate object statistics and/or dynamic sampling. But after several joins and predicate filters, the error made by the optimizer may be large, leading to an inefficient execution plan.
Some cardinality estimations can be more accurate at execution time. This is why 10g introduced bind variable peeking, and 11g came with Adaptive Cursor Sharing. And now 12c goes a step further with Adaptive Plans, where a plan chosen for a small rowsource can be adapted when the actual number of rows is higher than expected. ■
Contact
dbi services
Franck Pachot
E-Mail:
franck.pachot@dbi-services.com