Hash join

Background
It is very common to hear that hash join is good for joining huge tables while nested loop is good for small
tables. I will not comment on that statement, but to me the fundamental difference between hash join and
nested loop is that in a nested loop we can look up the inner table using any value from the outer table (and
benefit from index access, if any), while in a hash join we cannot. The only way to look up the inner table
(the probe table, in hash join terms) is with a constant value, not with a value coming from the other table.
In the example below, we can actually divide the query into 2 separate queries.
SELECT /*+ leading(a) use_hash(b) */ a.*, b.*
FROM tbuild a, tprobe b
WHERE a.id = b.id;

Query 1:
SELECT a.* FROM tbuild a;

Query 2:
SELECT b.* FROM tprobe b;

So, even though we have an index on the ID column in both tables, Oracle won’t be able to use index access for
the join; each table is read with a full table scan instead.
In general, there are only 2 major steps in performing a hash join:
1. Build a hash table from 1 table, based on a pre-defined hash function
2. Probe the other table to get the result
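The two steps can be sketched in a few lines of Python. This is only a minimal illustration of the general build/probe idea, not Oracle's implementation: the function name hash_join, the use of a Python dict as the hash table, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    """Equi-join two lists of dict rows on `key` (illustrative only)."""
    # Step 1: build an in-memory hash table from the build table.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[row[key]].append(row)

    # Step 2: scan the probe table once and look up each row's key.
    result = []
    for probe_row in probe_rows:
        for build_row in hash_table.get(probe_row[key], []):
            result.append((build_row, probe_row))
    return result

# Hypothetical sample data, not the article's test tables.
tbuild = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
tprobe = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
joined = hash_join(tbuild, tprobe, "id")
print(joined)
```

In this sketch both inputs are read in full, exactly once, and the only lookups happen against the in-memory hash table.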
Based on how Oracle does those 2 steps, there are 3 types of hash join:
1. Optimal
2. Onepass
3. Multipass
The objective of this article is to see the differences between those 3 types, along with a few other scenarios,
to see how Oracle handles them.
During the build phase, Oracle creates in-memory hash buckets (we may call this the hash table). The number
of hash buckets should be more than enough to avoid hash collisions. Technically, the hash buckets are split into
several partitions. Every partition has several slots, and every slot has several blocks. As an analogy, it
is very close to a partitioned table. Apart from that, there is a bitmap structure that maintains the slot
usage.

In-Memory Hash Table                          Table Segment
Hash partition                                Table partition
Every partition consists of several slots     Every segment consists of several extents
Every slot consists of several blocks         Every extent consists of several blocks
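The nesting described above (hash table, partitions, slots, blocks) can be pictured as a small data structure. This is a hypothetical sketch: the class names and the sizes used at the bottom are assumptions chosen for illustration and do not come from the trace file.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Slot:
    blocks: int          # number of blocks in the slot
    block_size_kb: int   # size of one block in kB

    @property
    def size_kb(self) -> int:
        return self.blocks * self.block_size_kb

@dataclass
class HashPartition:
    slots: List[Slot] = field(default_factory=list)

    @property
    def size_kb(self) -> int:
        return sum(slot.size_kb for slot in self.slots)

@dataclass
class HashTable:
    partitions: List[HashPartition] = field(default_factory=list)

    @property
    def size_kb(self) -> int:
        return sum(p.size_kb for p in self.partitions)

# Hypothetical sizes: 4 partitions, 2 slots each, 10 blocks of 8 kB per slot.
table = HashTable([HashPartition([Slot(10, 8), Slot(10, 8)]) for _ in range(4)])
print(table.size_kb)
```

The total size is simply the sum over partitions of the sum over slots, which is the same way the slot table size is derived from the trace file later in the article.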

Start the Exercise
In the first 3 exercises I will use the attached script (create_tables.txt) to build the required tables.
Each table has 10,000 unique rows. To get detailed information on the hash join operation, we need to turn on
event 10104. Besides that, we need to change “workarea_size_policy” to MANUAL and configure
“hash_area_size” to create the required scenarios (ibegin.sql).

create_tables.txt

ibegin.sql

The query for this exercise is:
SELECT /*+ leading(a) */ a.*, b.*
FROM tbuild a, tprobe b
WHERE a.id = b.id;

From the execution plan above, the join produces 10,000 rows and requires around 2 MB of memory for the hash
join. The size can be calculated as follows. The query selects all columns in both tables, and the total size
of the columns is 201 + 4 = 205 bytes. We have 10,000 rows in the table, so in total we require
205 * 10,000 / 1,024, which is approximately 2,000 kB.
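The estimate can be reproduced with a few lines of Python. The figures are the ones quoted in the text; the variable names are only for this illustration.

```python
# Figures taken from the text: 201 + 4 bytes of column data per row, 10,000 rows.
ROW_BYTES = 201 + 4
NUM_ROWS = 10_000

total_bytes = ROW_BYTES * NUM_ROWS
total_kb = total_bytes / 1024

print(f"{total_bytes:,} bytes = {total_kb:,.0f} kB")
```

The exact result is a little over 2,000 kB, which is consistent with the roughly 2 MB mentioned above.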

Hash Join - Optimal
workarea_size_policy = MANUAL
hash_area_size = 10485760 (10 MB)

Optimal hash join is the best type: Oracle doesn’t require any temporary space to store the hash
buckets. Everything is done in memory, since there is enough memory to do that.

Let’s analyze the statistics that we can see in the section of the trace file above.
In the first 3 lines after “Join Type” we can see the memory sizes:
o Hash Area is 10,304,443
o Slot Table is 9,478,144
  This value is calculated later as 13 * 712 * 1,024, where:
  - 13 is the number of slots
  - 712 kB is the size of each slot
  - 1,024 is the number of bytes in 1 kB
o Overhead is 826,299
  This is calculated as Hash Area - Slot Table
Number of slots/clusters is 13
Number of partitions is 8
Number of blocks in every slot is 89
Block size is 8 kB
Slot size is 89 * 8 kB = 712 kB
Bitmap size for each partition is 64 kB
Bitmap size for all partitions is 8 * 64 kB = 512 kB
Size of a row is 220 bytes
The per-row overhead is approximately 15 bytes in this table (220 - 205).
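The relationships between these numbers can be checked with a short Python calculation. All inputs are the values quoted from the trace file above; the variable names are only for this illustration.

```python
KB = 1024

# Values reported in the trace file, as quoted in the text.
hash_area = 10_304_443          # bytes
slots = 13
blocks_per_slot = 89
block_size_kb = 8
partitions = 8
bitmap_per_partition_kb = 64
row_size = 220                  # bytes, as reported by the trace
column_bytes = 205              # 201 + 4, from the size estimate

slot_size_kb = blocks_per_slot * block_size_kb            # slot size in kB
slot_table = slots * slot_size_kb * KB                    # slot table in bytes
overhead = hash_area - slot_table                         # remaining hash area in bytes
bitmap_total_kb = partitions * bitmap_per_partition_kb    # all bitmaps in kB
row_overhead = row_size - column_bytes                    # per-row overhead in bytes

print(slot_size_kb, slot_table, overhead, bitmap_total_kb, row_overhead)
```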
Next, in the section below, the Build phase starts. Since we have an optimal hash join, we do not see any
operation against the temporary tablespace.

Next is the Probe phase. Let’s check a few statistics from the trace file:
All 8 partitions fit and are available in memory
Only 8 out of 13 slots are used
In the “Partition Distribution” we can see (taking 1 line as an example):
o The number of rows in each partition
o The number of clusters/slots in each partition
o The number of slots available in memory for each partition
o The status, an indication of whether the partition is still in memory or not
  - If all partitions have kept=1 -> optimal hash join
  - If all partitions have kept=0 -> multipass hash join
  - If at least 1 partition (but not all) has kept=1 -> onepass hash join
The number of buckets is 16,384
This value is 2^14, the smallest power of 2 that is not less than 10,000 -> with this many buckets, hash
collisions should be unlikely
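Two rules from the list above can be written down as small Python functions. Both are sketches under stated assumptions: join_type encodes the kept=0/1 rule exactly as described in this article, and bucket_count assumes that the number of buckets is the smallest power of 2 that is not less than the number of rows, which is an inference from the figures in the trace files and not documented behaviour. The function names are hypothetical.

```python
def join_type(kept_flags):
    """Classify a hash join from the kept flags of its partitions (article's rule)."""
    if all(kept_flags):
        return "optimal"
    if not any(kept_flags):
        return "multipass"
    return "onepass"

def bucket_count(rows):
    """Smallest power of 2 that is >= rows (assumed rule, valid for rows >= 1)."""
    return 1 << (rows - 1).bit_length()

print(join_type([1] * 8))      # all 8 partitions kept in memory
print(bucket_count(10_000))    # 10,000 build rows
```

With these assumptions, 8 kept partitions classify as optimal and 10,000 rows give the 16,384 buckets seen in the trace file.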


The part below shows the histogram of the number of rows in each hash bucket.

The last part is the overall statistics, which are quite self-explanatory. The most interesting part is
the first line after the title. Of the 16,384 available buckets, Oracle only uses 7,538 (the other 8,846
buckets are empty). That means some buckets hold more than 1 row (this can be seen in the
histogram above as well). So the hash function is efficient enough to place more than 1 row in
a bucket without any collision.
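As a rough sanity check, the number of used buckets can be compared with what a uniformly distributing hash function would be expected to produce. This is a textbook occupancy estimate, not something taken from the trace file, and it assumes the 10,000 rows are spread uniformly and independently over the 16,384 buckets.

```python
BUCKETS = 16_384
ROWS = 10_000
USED_IN_TRACE = 7_538   # figure quoted from the trace file

# With a uniform hash, a given bucket stays empty with
# probability (1 - 1/BUCKETS) ** ROWS.
expected_used = BUCKETS * (1 - (1 - 1 / BUCKETS) ** ROWS)
empty_in_trace = BUCKETS - USED_IN_TRACE

print(round(expected_used), empty_in_trace)
```

Under that assumption the expected number of used buckets comes out within about 1% of the 7,538 reported, so buckets holding more than 1 row are what one would expect with this number of rows and buckets.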

Lastly, below is the autotrace output from the SQL*Plus session along with a few session statistics. We can see
the statistic “workarea executions - optimal” increases by 3, of which only 1 is relevant to the query
above (the other scenarios also show an increment of 2 in this statistic, so we should count only 1).
There is no temporary tablespace activity in this test case.

Hash Join - Onepass
hash_area_size = 1572864 (1.5 MB)

In the onepass type, Oracle needs to dump data to the temporary tablespace due to insufficient hash area
memory, but when probing the second table Oracle iterates only ONCE for each available partition. This is
why it is called onepass.

In the capture above, the important point is the number of blocks in each slot, which is only 13 (compared to 89 in
the optimal type). This makes the slot size 13 * 8 kB = 104 kB (there are again 13 slots in this exercise). The
reason Oracle reduces the number of blocks is to make sure at least 1 slot of every partition is available in memory.
The next section (Build phase) is quite interesting: Oracle starts spilling data to the temporary tablespace.

Once the Build phase is complete, Oracle starts the Probe phase as shown below. Let’s highlight a few points:
In the first operation, only 6 slots are in memory
2 partitions are in memory
2,481 rows are processed
The 3 items above are also shown in the capture below

The number of buckets is 4,096
Again, this value is a power of 2 (2^12), so it looks like Oracle picks the smallest power of 2 that is not
less than the number of rows being processed (2,481 here)
All 13 slots are used

In the trace file output we see a lot of writing and reading, and we see a new section in the Probe phase, like the
one below (HASH JOIN GET FLUSHED PARTITIONS). This is the process of reading back the build table and then
continuing with probing the second table to get the result. Oracle does this for the rest of the partitions and,
since the memory is not sufficient to do the operation in one shot, it iterates over them.
We can see clearly in the partition below that 1,224 rows are processed from the build and probe tables (this is
Partition: 0 if we trace back to the initial step of the Probe phase), and at the end of the iteration the number of
rows left to be iterated over is 0.
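The behaviour described above (keep some partitions in memory, flush the rest, then read each flushed partition back once) can be imitated with a small Python simulation. This is a simplified sketch and not Oracle's algorithm: it joins lists of unique integer keys, counts rows instead of blocks, and the choice of 8 partitions with 2 kept in memory is an assumption made for the example.

```python
from collections import defaultdict

def partitioned_hash_join(build, probe, partitions, kept):
    """Join two lists of unique integer keys, spilling partitions not in `kept`.

    Returns (matches, rows_written_to_temp, rows_read_from_temp).
    """
    part_of = lambda key: hash(key) % partitions
    written = read = 0

    # Build phase: kept partitions stay in memory, the rest go to "temp".
    in_memory = defaultdict(set)
    build_temp = defaultdict(list)
    for key in build:
        p = part_of(key)
        if p in kept:
            in_memory[p].add(key)
        else:
            build_temp[p].append(key)
            written += 1

    # Probe phase, first part: join against the in-memory partitions and
    # write the probe rows of flushed partitions to "temp".
    matches = 0
    probe_temp = defaultdict(list)
    for key in probe:
        p = part_of(key)
        if p in kept:
            matches += key in in_memory[p]
        else:
            probe_temp[p].append(key)
            written += 1

    # Probe phase, second part: every flushed partition pair is read back once.
    for p in set(build_temp) | set(probe_temp):
        table = set(build_temp[p])
        read += len(build_temp[p])
        for key in probe_temp[p]:
            read += 1
            matches += key in table
    return matches, written, read

keys = list(range(10_000))
matches, written, read = partitioned_hash_join(keys, keys, partitions=8, kept={0, 1})
print(matches, written, read)
```

In this model every row written to "temp" is read back exactly once, so the write and read counts are equal.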

This is the list of all iterations in this test case. I am not sure why Oracle didn’t process the partitions in
order (from Partition 0 to 5, or from Partition 5 down to 0).

These are the overall statistics for the onepass type (not all rows are shown).

Below is the autotrace output from the SQL*Plus session along with a few session statistics. We can see the
statistic “workarea executions - onepass” increases by 1 (we can ignore the increment of “workarea
executions - optimal”, as I mentioned before). Read and write activity against the temporary tablespace is
balanced, which means Oracle writes to the temporary tablespace once and reads from it once.

Hash Join - Multipass
hash_area_size = 131072 (128 kB)

In the multipass type, Oracle also needs to dump data to the temporary tablespace due to insufficient hash
area memory, but when probing the second table Oracle iterates SEVERAL times for each available partition.
This is why it is called multipass. This is the least efficient type of hash join.

In the output above, every slot has a single block only, so the slot size is 8 kB. The total memory for the slot
table is 14 * 8 * 1,024 = 114,688. The Build phase takes longer since the slots are smaller (to keep this article
short, please refer to the attached trace file if you want to see how long the “writing” activity of the Build
phase is).

These are a few captures of the Probe phase for the multipass type.

Again, the number of buckets is a power of 2.

Below is the detailed iteration for Partition 0. There are 4 iterations to process the 1,224 rows in this partition.
Theoretically, if we want to change the type to a onepass operation, we need to multiply “hash_area_size” by 4,
so in this case we need to configure at least 131,072 * 4 = 524,288 (512 kB).
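The reasoning above can be written as a short calculation. It is a sketch that assumes the number of build rows that fit in memory scales linearly with “hash_area_size”; the helper name passes_needed and the derived rows-per-iteration figure are hypothetical and do not come from the trace file.

```python
import math

def passes_needed(partition_rows, rows_that_fit):
    """Iterations needed when only `rows_that_fit` build rows fit in memory at once."""
    return math.ceil(partition_rows / rows_that_fit)

PARTITION_ROWS = 1_224     # rows in Partition 0 (from the text)
ITERATIONS_SEEN = 4        # iterations reported for Partition 0
HASH_AREA_SIZE = 131_072   # 128 kB, as configured for the multipass test

# Hypothetical: build rows handled per iteration if 4 iterations were needed.
rows_per_iteration = math.ceil(PARTITION_ROWS / ITERATIONS_SEEN)

# Multiplying the memory by the iteration count should bring this down to 1 pass.
suggested_hash_area = HASH_AREA_SIZE * ITERATIONS_SEEN

print(passes_needed(PARTITION_ROWS, rows_per_iteration))
print(passes_needed(PARTITION_ROWS, rows_per_iteration * ITERATIONS_SEEN))
print(suggested_hash_area)
```

Other partitions may need a different number of iterations, so the suggested value is a starting point to test and not a guarantee.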

Below is the autotrace output from the SQL*Plus session along with a few session statistics. We can see the
statistic “workarea executions - multipass” increases by 1 (we can ignore the increment of “workarea
executions - optimal”, as I mentioned before). The read and write activity against the temporary tablespace is
imbalanced: Oracle reads more than it writes.
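A simplified model may help to see why reads and writes are balanced in the onepass case and imbalanced here. It assumes that each flushed partition is written once, that the build side is read back once in total (in several chunks), and that the probe side is read again for every chunk. The function name temp_io and the block counts are hypothetical and were not measured.

```python
def temp_io(build_blocks, probe_blocks, passes):
    """Blocks written to and read from temp for one flushed partition pair."""
    written = build_blocks + probe_blocks
    # The build partition is read once in total (in `passes` chunks);
    # the probe partition is read again for every chunk.
    read = build_blocks + probe_blocks * passes
    return written, read

# Hypothetical partition sizes of 30 blocks on each side.
onepass = temp_io(build_blocks=30, probe_blocks=30, passes=1)
multipass = temp_io(build_blocks=30, probe_blocks=30, passes=4)
print(onepass, multipass)
```

In this model the writes stay the same while the reads grow with the number of passes, which matches the direction of the imbalance seen in the session statistics.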

Which One Should Be The Build Table
Now let’s create another test case which shows the impact of the build table’s size on performance and
memory size. The configuration is as follows:
Create 2 tables, one with 10,000 rows (TBIG) and the other with 2,500 rows (TSMALL). Both
tables have 100 distinct values
Set “workarea_size_policy” to MANUAL
Set “hash_area_size” to 3 MB
Create 2 scenarios: the first uses TBIG as the build table and the second uses TSMALL as the build
table
The complete table creation script is attached

create_tables2.txt

The trace file outputs are attached

These are the execution plans for both queries. Consistent gets are higher when we use TSMALL as the build
table. At first I didn’t turn on event 10200, so I didn’t know where those consistent gets were
coming from. The reason behind this symptom can be explained if we turn on event 10200 (for dumping consistent
gets). Please find the attached Excel file for the details of the consistent gets, along with its trace files.

DBA series - Hash Join.xlsx

Again, I attach below the statistics for the TBIG and TSMALL tables. The next capture is a summary of consistent
gets for both scenarios (TSMALL as “build” table and TBIG as “build” table).

TSMALL as “build” table

TBIG as “build” table

During the Build phase, Oracle reads all rows from the “build” table. So in the case of TBIG, Oracle requires 1,000
consistent gets (the number of blocks in TBIG), and in the case of TSMALL, Oracle requires 292 consistent gets (I
cannot yet explain the difference of 78 in this case).
The Probe phase is more interesting: instead of loading all available blocks of the “probe” table, it looks like
Oracle reads the second table row by row, with 1 consistent get per row. The result is 10,000 consistent gets
when we use TSMALL as the “build” table (there are 10,000 rows in TBIG). With TBIG as the “build” table, Oracle
requires 2,542 consistent gets (there are 2,500 rows in TSMALL; again I cannot explain the difference of 42, but
during the test I filled up the buffer by doing full table scans against TBIG and TSMALL, and I am not sure if this
was the root cause).
Apart from that, everything is similar.
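The comparison can be summarized by adding up the consistent gets per phase for each scenario. The per-phase figures are the ones quoted above; the totals are simple sums computed here for illustration and are not read from the autotrace output.

```python
# Consistent gets quoted in the text, split by phase.
scenarios = {
    "TSMALL as build": {"build": 292, "probe": 10_000},
    "TBIG as build": {"build": 1_000, "probe": 2_542},
}

totals = {name: gets["build"] + gets["probe"] for name, gets in scenarios.items()}
for name, total in totals.items():
    print(f"{name}: {total:,} consistent gets")
```

In both scenarios the probe phase contributes most of the consistent gets, which is why the choice of probe table matters for this statistic.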

Now let’s analyze the trace files to see the differences in hash memory configuration and components.

From the comparison above, we see that Oracle works more efficiently when the build table is small. The
number of blocks per slot is bigger and the number of buckets is smaller. This makes the overall memory
consumption smaller when we have a smaller build table.

Before we conclude that a smaller build table is better than a bigger one, let’s retry the test case
with “workarea_size_policy” = AUTO, which is the default and is recommended by Oracle.

Again we see that memory consumption is better for the smaller build table. But this time, Oracle decided to
configure more blocks per slot when the bigger build table is used (this is the reverse of the
previous test case, where we set “workarea_size_policy” to MANUAL).

Conclusion
1. Multipass hash join is the most inefficient type, and we can change it to, at least, onepass by multiplying
the “hash_area_size” by the number of iterations in one of the hash partitions.
2. A smaller table is always a good starting point for the build table, unless you see a significant degradation
in performance. A small table leads to a smaller in-memory hash table.
3. A smaller table is not always good as the “build” table, it depends ;-)
o It is good as the “build” table as it requires a smaller in-memory hash table to start the join
o On the other hand, it generates more consistent gets (again, this depends on the size of the probe
table, i.e. its number of rows)
Saying that this table is good as the “build” table, or that table is not (without confirming it with numbers), is
not wise.

-heri-
