1.6 PI Mechanics.pdf

Primary Index Mechanics
After completing this module, you will be able to:
• Explain the role of the hashing algorithm and the hash map in
locating a row.
• Explain the makeup of the Row ID and its role in row storage.
• Describe the sequence of events for locating a row given its PI
value.

Hashing Primary Index Values
[Figure: the PARSER passes SQL with primary index values and data to the Hashing Algorithm, which produces the Row Hash (RH). The Message Passing Layer (Hash Maps) uses the DSW to route the row hash and data to one of AMP 0 ... AMP n.]

For example, assume the PI value is 38:

PI value = 38 -> Hashing Algorithm -> Row Hash x'1177 7C3C'
DSW = 1177 -> Hash Maps -> AMP #

Summary
The MPL uses the DSW (1177) to locate bucket #1177 in the Hash Map.
Bucket #1177 contains the number of the AMP that owns this hash value, which is effectively the AMP that holds this row.

Data Table on AMP x
Row ID                        Row Data
Row Hash        Uniq Value
x'00000000'
x'1177 7C3C'    0000 0001     38
x'FFFFFFFF'

Hashing Down to the AMPs
Index value(s) -> hashing algorithm -> Hash Map -> AMP #

The hashing algorithm is designed to ensure even distribution of unique values across all AMPs.
Different hashing algorithms are used for different international character sets.
A Row Hash is the 32-bit result of applying a hashing algorithm to an index value.
The DSW or Hash Bucket is represented by the high-order 16 bits of the Row Hash.
A Hash Map is uniquely configured for each system size.
It is an array of 65,536 entries (buckets) which associates bucket numbers with specific AMPs.
Two systems with the same number of AMPs will have the same Hash Map.
Changing the number of AMPs in a system requires a change to the Hash Map.

[Figure: the Row Hash, with its high-order 16 bits bracketed as the DSW or Hash Bucket #.]
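
The relationship between the Row Hash, the DSW, and the Hash Map can be sketched in Python. This is only an illustration under stated assumptions: `build_hash_map` is a hypothetical helper that assigns buckets to AMPs round-robin, which is not claimed to be how a real system lays out its map, and no real hashing algorithm is implemented here.

```python
BUCKETS = 65_536  # one entry per possible 16-bit DSW value


def dsw(row_hash: int) -> int:
    """Return the high-order 16 bits of a 32-bit row hash."""
    return (row_hash >> 16) & 0xFFFF


def build_hash_map(num_amps: int) -> list[int]:
    """Hypothetical map: bucket i is owned by AMP (i mod num_amps).

    Round-robin is an assumption made only to keep this sketch evenly
    spread; the real bucket-to-AMP assignment is chosen by the system.
    """
    return [bucket % num_amps for bucket in range(BUCKETS)]


def amp_for(row_hash: int, hash_map: list[int]) -> int:
    """Look up the AMP that owns the bucket selected by the row hash."""
    return hash_map[dsw(row_hash)]


map_16 = build_hash_map(16)
map_20 = build_hash_map(20)

print(hex(dsw(0x11777C3C)))          # 0x1177
print(amp_for(0x11777C3C, map_16))   # owning AMP under the 16-AMP toy map
print(map_16 == build_hash_map(16))  # same AMP count gives the same map
print(map_16 == map_20)              # a different AMP count changes the map
```

The last two lines mirror the points above: in this toy model the map depends only on the number of AMPs, so changing that number forces a new map.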

A Hashing Example
Order table

Order Number   Customer Number   Order Date   Order Status
(PK, UPI)
7325           2                 4/13         O
7324           3                 4/13         O
7415           3                 4/13         O
7415           1                 4/13         C
7103           1                 4/10         O
7225           2                 4/15         C
7384           1                 4/12         C
7402           3                 4/12         C
7188           1                 4/13         C
7202           2                 4/09         C

SELECT * FROM order
WHERE order_number = 7202;

7202 -> Hashing Algorithm -> 691B 14AE (32-bit Row Hash)

Destination Selection Word (first 16 bits): 0110 1001 0001 1011 = 691B
Remaining 16 bits:                          0001 0100 1010 1110 = 14AE
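
The split shown above can be checked with a few lines of Python. The row hash value is copied from the slide rather than computed, since the hashing algorithm itself is not described here.

```python
row_hash = 0x691B14AE  # row hash for order_number 7202, taken from the slide

dsw = row_hash >> 16           # high-order 16 bits: Destination Selection Word
remaining = row_hash & 0xFFFF  # low-order 16 bits

# Render the 32 bits in groups of four, as on the slide.
bits = f"{row_hash:032b}"
grouped = " ".join(bits[i:i + 4] for i in range(0, 32, 4))

print(f"{dsw:04X}")        # 691B
print(f"{remaining:04X}")  # 14AE
print(grouped)             # 0110 1001 0001 1011 0001 0100 1010 1110
```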

The Hash Map
7202 -> Hashing Algorithm -> 691B 14AE (hexadecimal, 32-bit Row Hash)

Destination Selection Word (first 16 bits): 0110 1001 0001 1011 = 691B
Remaining 16 bits:                          0001 0100 1010 1110 = 14AE

HASH MAP (row = first three hex digits of the DSW, column = last hex digit)

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
690   07 06 07 06 07 04 05 06 05 05 14 09 14 13 03 04
691   15 08 02 04 01 00 14 14 03 02 03 09 01 00 02 15
692   01 00 15 11 14 14 13 13 14 14 08 09 15 10 09 09
693   07 06 15 13 11 06 15 08 15 15 08 08 11 07 05 10
694   04 12 11 13 05 10 07 07 03 02 11 04 01 00 11 13
695   11 11 12 10 03 02 06 13 01 00 06 05 07 06 05 12

DSW 691B -> row 691, column B -> AMP 9
AMP 9 holds the row: 7202 2 4/09 C

Note: This partial Hash Map is based on a 16-AMP system, and AMP numbers are shown in decimal format.
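
The lookup on this slide can be reproduced with the partial Hash Map entered as data. The dictionary layout and the `lookup_amp` name are choices made for this sketch only; they say nothing about how the map is actually stored.

```python
# Partial hash map from the slide. Key = first three hex digits of the DSW,
# list index = last hex digit. AMP numbers are decimal (16-AMP system).
PARTIAL_HASH_MAP = {
    0x690: [7, 6, 7, 6, 7, 4, 5, 6, 5, 5, 14, 9, 14, 13, 3, 4],
    0x691: [15, 8, 2, 4, 1, 0, 14, 14, 3, 2, 3, 9, 1, 0, 2, 15],
    0x692: [1, 0, 15, 11, 14, 14, 13, 13, 14, 14, 8, 9, 15, 10, 9, 9],
    0x693: [7, 6, 15, 13, 11, 6, 15, 8, 15, 15, 8, 8, 11, 7, 5, 10],
    0x694: [4, 12, 11, 13, 5, 10, 7, 7, 3, 2, 11, 4, 1, 0, 11, 13],
    0x695: [11, 11, 12, 10, 3, 2, 6, 13, 1, 0, 6, 5, 7, 6, 5, 12],
}


def lookup_amp(row_hash: int) -> int:
    """Return the AMP for a row hash whose DSW falls in the partial map."""
    dsw = row_hash >> 16             # e.g. 0x691B
    row, col = dsw >> 4, dsw & 0xF   # e.g. row 0x691, column 0xB
    return PARTIAL_HASH_MAP[row][col]


print(lookup_amp(0x691B14AE))  # 9
```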

Identifying Rows
Consideration #1
A Row Hash = 32 bits = 2^32 (roughly 4.29 billion) possible values.
Because there is an infinite number of possible data values, some data values will have to share the same row hash.

Data values input -> Hash Algorithm -> 1254 7769
                                       10A2 2936 }
                                       10A2 2936 } Hash Synonyms

Consideration #2
A Primary Index may be non-unique (NUPI).
Different rows will have the same PI value and thus the same row hash.

(John) 'Smith' -> Hash Algorithm -> 0016 5557 }
(Dave) 'Smith' -> Hash Algorithm -> 0016 5557 } NUPI Duplicates: rows have the same hash

Conclusion
A row hash is not adequate to uniquely identify a row.

The Row ID
To uniquely identify a row, we add a 32-bit uniqueness value.
The combined row hash and uniqueness value is called a Row ID.

Row ID = Row Hash (32 bits) + Uniqueness Value (32 bits)

Each stored row has a Row ID as a prefix.
Rows are logically maintained in Row ID sequence.

Row ID                          Row Data
Row Hash     Uniqueness Value   Emp_No   Last_Name   First_Name
3B11 5032    0000 0001          1018     Reynolds    Jane
3B11 5032    0000 0002          1020     Davidson    Evan
3B11 5032    0000 0003          1031     Green       Jason
3B11 5033    0000 0001          1014     Jacobs      Paul
3B11 5034    0000 0001          1012     Chevas      Jose
3B11 5034    0000 0002          1021     Carnet      Jean
:            :                  :        :           :
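
As a minimal sketch of the Row ID described above, the example below models it as a pair of 32-bit integers and sorts the sample rows into Row ID sequence. The `RowID` class and its `packed` method are hypothetical names introduced for illustration; the slide does not specify an in-memory or on-disk representation.

```python
from typing import NamedTuple


class RowID(NamedTuple):
    row_hash: int    # 32-bit hash of the PI value
    uniqueness: int  # 32-bit uniqueness value

    def packed(self) -> int:
        """Combine both halves into one 64-bit integer (illustrative only)."""
        return (self.row_hash << 32) | self.uniqueness


# Sample rows from the slide, deliberately listed out of order.
rows = [
    (RowID(0x3B115034, 2), (1021, "Carnet", "Jean")),
    (RowID(0x3B115032, 3), (1031, "Green", "Jason")),
    (RowID(0x3B115033, 1), (1014, "Jacobs", "Paul")),
    (RowID(0x3B115032, 1), (1018, "Reynolds", "Jane")),
    (RowID(0x3B115034, 1), (1012, "Chevas", "Jose")),
    (RowID(0x3B115032, 2), (1020, "Davidson", "Evan")),
]

# Tuples compare field by field, so this orders by row hash, then uniqueness.
rows.sort(key=lambda row: row[0])

for rid, data in rows:
    print(f"{rid.row_hash:08X} {rid.uniqueness:08X}", *data)
```

Sorting by the pair gives the same order as sorting by the packed 64-bit value, which is why the rows on the slide appear grouped by row hash and then by uniqueness value.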

Storing Rows (1 of 2)
Assumptions:
• Last_Name is defined as a NUPI.
• All rows in this example hash to the same AMP.

Add a row for 'John Smith'
'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                   Row Data
Row Hash     Unique ID   Last_Name   First_Name   Etc.
0016 5557    0000 0001   Smith       John

Add a row for 'Sam Adams'
'Adams' -> Hash Algorithm -> 1058 9829 -> Hash Map -> AMP #3

Row ID                   Row Data
Row Hash     Unique ID   Last_Name   First_Name   Etc.
0016 5557    0000 0001   Smith       John
1058 9829    0000 0001   Adams       Sam

Storing Rows (2 of 2)
Add a row for 'Fred Smith' (NUPI Duplicate)
'Smith' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                   Row Data
Row Hash     Unique ID   Last_Name   First_Name   Etc.
0016 5557    0000 0001   Smith       John
0016 5557    0000 0002   Smith       Fred
1058 9829    0000 0001   Adams       Sam

Add a row for 'Dan Jones' (Hash Synonym)
'Jones' -> Hash Algorithm -> 0016 5557 -> Hash Map -> AMP #3

Row ID                   Row Data
Row Hash     Unique ID   Last_Name   First_Name   Etc.
0016 5557    0000 0001   Smith       John
0016 5557    0000 0002   Smith       Fred
0016 5557    0000 0003   Jones       Dan
1058 9829    0000 0001   Adams       Sam

Given the row hash, what other information would be needed to find the 'Dan Jones' row?
... The 'Fred Smith' row?
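
The insert sequence across both "Storing Rows" slides can be simulated in a few lines. This is a sketch under assumptions: the row hashes are copied from the slides instead of being computed, and `insert_row` and `find_rows` are hypothetical helpers that assign the next uniqueness value within a row hash and keep the table in Row ID sequence.

```python
# Row hashes copied from the slides; a real system would compute them.
SLIDE_HASHES = {"Smith": 0x00165557, "Adams": 0x10589829, "Jones": 0x00165557}

# Each entry: (row_hash, uniqueness, last_name, first_name)
table = []


def insert_row(last_name, first_name):
    """Store a row under the next free uniqueness value for its row hash."""
    row_hash = SLIDE_HASHES[last_name]
    used = [u for (h, u, _, _) in table if h == row_hash]
    uniqueness = max(used, default=0) + 1
    table.append((row_hash, uniqueness, last_name, first_name))
    table.sort()  # keep rows in Row ID sequence
    return row_hash, uniqueness


def find_rows(last_name):
    """Match on row hash first, then on the PI value itself."""
    row_hash = SLIDE_HASHES[last_name]
    return [r for r in table if r[0] == row_hash and r[2] == last_name]


insert_row("Smith", "John")
insert_row("Adams", "Sam")
insert_row("Smith", "Fred")  # NUPI duplicate: same PI value, same hash
insert_row("Jones", "Dan")   # hash synonym: different PI value, same hash

for row_hash, uniq, last, first in table:
    print(f"{row_hash:08X} {uniq:08X} {last} {first}")

print(find_rows("Jones"))
print(find_rows("Smith"))
```

In this model, `find_rows("Jones")` returns only the Dan Jones row, because the PI value separates it from the two Smith rows that share its row hash. Both Smith rows still match on row hash and PI value, so singling out Fred Smith would need something more, such as another column value.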

Locating a Row On An AMP Using a PI
Locating a row on an AMP requires three input elements:
1. The Table ID
2. The Row Hash of the PI
3. The PI value itself

[Figure: AMP #3. The Table ID and Row Hash are applied to the Master Index to find a cylinder number (Cyl 1 through Cyl 7, each with its own Cylinder Index). That Cylinder Index yields a data block address, and the PI value identifies the Data Row within the DATA BLOCK.]

START WITH:                       APPLY TO:         FIND:
Table ID, Row Hash                Master Index      Cylinder #
Table ID, Row Hash, Cylinder #    Cylinder Index    Data Block Address
Row Hash, PI Value                Data Block        Data Row
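
The three-step search above can be sketched as nested lookups. Everything concrete in this example is an assumption made for illustration: the table ID, the block names, the index contents, and the idea of keying each index by the lowest (Table ID, Row Hash) it covers are all hypothetical, and the real index formats are not described on the slide.

```python
from bisect import bisect_right

TABLE_ID = 100  # hypothetical table id

# Master Index: lowest (table_id, row_hash) covered -> cylinder number.
MASTER_INDEX = [((100, 0x00000000), 1), ((100, 0x10000000), 2)]

# Cylinder Index per cylinder: lowest (table_id, row_hash) -> block address.
CYLINDER_INDEX = {
    1: [((100, 0x00000000), "blk-A"), ((100, 0x00100000), "blk-B")],
    2: [((100, 0x10000000), "blk-C")],
}

# Data blocks: rows as (row_hash, uniqueness, last_name, first_name).
DATA_BLOCKS = {
    "blk-A": [],
    "blk-B": [
        (0x00165557, 1, "Smith", "John"),
        (0x00165557, 2, "Smith", "Fred"),
        (0x00165557, 3, "Jones", "Dan"),
    ],
    "blk-C": [(0x10589829, 1, "Adams", "Sam")],
}


def floor_lookup(index, key):
    """Return the value of the last entry whose key is <= the search key."""
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        raise KeyError(key)
    return index[pos][1]


def locate(table_id, row_hash, pi_value):
    """Master Index -> Cylinder Index -> Data Block -> matching rows."""
    cylinder = floor_lookup(MASTER_INDEX, (table_id, row_hash))
    block = floor_lookup(CYLINDER_INDEX[cylinder], (table_id, row_hash))
    return [
        row for row in DATA_BLOCKS[block]
        if row[0] == row_hash and row[2] == pi_value
    ]


print(locate(TABLE_ID, 0x00165557, "Jones"))
print(locate(TABLE_ID, 0x10589829, "Adams"))
```

The point of the sketch is the order of the steps: the Table ID and Row Hash narrow the search to one cylinder and then one data block, and only inside the block is the PI value compared to filter out hash synonyms.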

Review Questions
Fill in the Blanks
1. The output of the hashing algorithm is called the _____ _____.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the
Hash Map based on the ________ number.
3. Two different PI values which hash to the same value are called Hash ___________.
4. A Row ID consists of a row hash plus a ____________ value.
5. A uniqueness value is required to produce a unique Row ID because of _______
_________ and ______ ___________.
6. Once the target AMP has been determined for a PI search, the _______ ________ for that
AMP must be consulted.
7. The Cylinder Index points us to the address and length of the data _______.

Review Question Answers
Fill in the Blanks
1. The output of the hashing algorithm is called the Row Hash.
2. To determine the target AMP, the Message Passing Layer must look up an entry in the
Hash Map based on the DSW or bucket number.
3. Two different PI values which hash to the same value are called Hash Synonyms.
4. A Row ID consists of a row hash plus a uniqueness value.
5. A uniqueness value is required to produce a unique Row ID because of hash synonyms
and NUPI duplicates.
6. Once the target AMP has been determined for a PI search, the Master Index for that AMP
must be consulted.
7. The Cylinder Index points us to the address and length of the data block.
