1. Storing and Accessing Data Rows
After completing this module, you will be able to:
• Explain the purpose of the Primary Index
• Distinguish between Primary Index and Primary Key
• State the reasons for selecting a UPI vs. a NUPI
2. Storing Rows
[Figure: the rows of Table A and Table B spread across four AMPs]
• The rows of every table are distributed among all AMPs.
• Each AMP is responsible for a subset of the rows of each table.
• Ideally, each table will be evenly distributed among all AMPs.
• Evenly distributed tables result in evenly distributed workloads.
• The uniformity of distribution of the rows of a table depends on the choice of
the Primary Index.
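The sketch below is a rough illustration of why even distribution matters. It rests on a simplifying assumption that is not a Teradata cost model: every AMP processes rows at the same rate, so an all-AMP step finishes only when the AMP holding the most rows finishes. The function names are hypothetical.

```python
# Simplified model (an assumption, not Teradata's optimizer cost model):
# every AMP processes one row per time unit, and an all-AMP step is done
# only when the slowest (busiest) AMP is done.


def parallel_step_time(rows_per_amp: list, cost_per_row: float = 1.0) -> float:
    """Elapsed time of an all-AMP step: set by the AMP holding the most rows."""
    return max(rows_per_amp) * cost_per_row


def ideal_step_time(rows_per_amp: list, cost_per_row: float = 1.0) -> float:
    """Elapsed time if the same rows were spread as evenly as possible."""
    total, amps = sum(rows_per_amp), len(rows_per_amp)
    return -(-total // amps) * cost_per_row  # ceiling division


even = [250, 250, 250, 250]    # 1,000 rows, evenly distributed
skewed = [700, 200, 100, 0]    # the same 1,000 rows, skewed

print(parallel_step_time(even), ideal_step_time(even))      # 250.0 250.0
print(parallel_step_time(skewed), ideal_step_time(skewed))  # 700.0 250.0
```

Both layouts hold the same 1,000 rows. The skewed one takes 700 units instead of 250 because the other AMPs finish early and then wait.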
Note:
The acronym AMP refers to both V1 AMPs and V2 AMP vprocs; this course assumes AMPs are V2 vprocs.
3. Creating a Primary Index
• A Primary Index is defined at table creation.
• It may consist of a single column or a combination of columns:
– Limit of 16 columns with V2R4.1 and prior releases
– Limit of 64 columns with V2R5
CREATE TABLE sample_1
(col_a INTEGER
,col_b INTEGER
,col_c INTEGER)
UNIQUE PRIMARY INDEX (col_b);
UPI: If the chosen index column(s) are unique, we call this a UPI (Unique Primary Index).
A UPI choice will result in even distribution of the rows of the table across all AMPs.
CREATE TABLE sample_2
(col_x INTEGER
,col_y INTEGER
,col_z INTEGER)
PRIMARY INDEX (col_x);
NUPI: If the chosen index column(s) are not unique, we call this a NUPI (Non-Unique Primary Index).
A NUPI choice distributes the rows of the table only as evenly as its values allow: the more unique the index values, the more even the distribution.
Note: Changing the choice of Primary Index
requires dropping and recreating the table.
4. Primary Index Values
• The value of the Primary Index for a specific row determines the AMP
assignment for that row.
• This is done using a hashing algorithm.
[Figure: the PE passes the PI value through the hashing algorithm, which selects one AMP; the same path is used for both row assignment and row access]
• Accessing the row by its Primary Index value is:
– always a one-AMP operation
– the most efficient way to access a row
Other table access techniques:
• Secondary index access
• Full table scans
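The hashing step can be sketched as a function from PI value to AMP number. Teradata's actual hashing algorithm is internal and is not described in this module. The MD5 digest, the 32-bit truncation and the modulo mapping below are stand-ins chosen for illustration, and `row_hash`, `amp_for` and `NUM_AMPS` are hypothetical names. The property that matters is determinism: the same PI value always yields the same AMP.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(pi_value) -> int:
    """Stand-in row hash: first 32 bits of the MD5 digest of the PI value.
    Not Teradata's hashing algorithm; it only needs to be deterministic."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(pi_value, num_amps: int = NUM_AMPS) -> int:
    """Map a PI value to exactly one AMP number in range(num_amps)."""
    return row_hash(pi_value) % num_amps


# Row assignment (INSERT) and row access (SELECT ... WHERE pi = value)
# compute the same AMP, so only that one AMP is involved in either case.
assigned_amp = amp_for(345)
accessed_amp = amp_for(345)
assert assigned_amp == accessed_amp
print("PI value 345 lives on AMP", assigned_amp)

# A multi-column PI is hashed as one combined value.
print("PI value (7325, 2) lives on AMP", amp_for((7325, 2)))
```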
5. Accessing Via a Unique Primary Index
A UPI access is a one-AMP operation which may access at most a single row.
CREATE TABLE sample_1
(col_a INTEGER
,col_b INTEGER
,col_c INTEGER)
UNIQUE PRIMARY INDEX (col_b);
SELECT col_a
,col_b
,col_c
FROM sample_1
WHERE col_b = 345;
[Figure: the PE hashes UPI = 345 and sends the request to the single AMP that holds the row with col_b = 345]
Rows per AMP (col_b values shown):
AMP 1: 123, 234
AMP 2: 345, 456
AMP 3: 567, 678
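Here is a minimal in-memory sketch of UPI access. It assumes three AMPs, as in the figure, and uses a CRC-32 stand-in for the hashing algorithm, so the AMP each row lands on will not match the figure. The col_a and col_c values are invented, and `insert_upi` and `select_by_upi` are hypothetical helpers.

```python
import zlib

NUM_AMPS = 3  # three AMPs, as in the figure


def amp_for(pi_value) -> int:
    # Stand-in hash (CRC-32), not Teradata's hashing algorithm.
    return zlib.crc32(str(pi_value).encode()) % NUM_AMPS


# Each AMP owns a dict mapping UPI value -> row. A dict key can occur only
# once, mirroring the uniqueness of a UPI.
amps = [dict() for _ in range(NUM_AMPS)]


def insert_upi(row: dict, pi_col: str) -> None:
    """Row assignment: hash the UPI value, store the row on that AMP only."""
    store = amps[amp_for(row[pi_col])]
    if row[pi_col] in store:
        raise ValueError(f"duplicate UPI value {row[pi_col]!r}")
    store[row[pi_col]] = row


def select_by_upi(value):
    """One-AMP access: only the AMP chosen by the hash is examined,
    and at most one row can match (None if there is no such row)."""
    return amps[amp_for(value)].get(value)


for b in (123, 234, 345, 456, 567, 678):
    # col_a and col_c values are made up for the example.
    insert_upi({"col_a": b * 10, "col_b": b, "col_c": b * 100}, "col_b")

print(select_by_upi(345))  # {'col_a': 3450, 'col_b': 345, 'col_c': 34500}
print(select_by_upi(999))  # None
```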
6. Accessing Via a Non-Unique Primary Index
A NUPI access is a one-AMP operation which may access multiple rows.
CREATE TABLE sample_2
(col_x INTEGER
,col_y INTEGER
,col_z INTEGER)
PRIMARY INDEX (col_x);
SELECT col_x
,col_y
,col_z
FROM sample_2
WHERE col_x = 25;
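[Figure: the PE hashes NUPI = 25 and sends the request to the single AMP that holds both rows with col_x = 25]
Rows per AMP (col_x | col_y | col_z):
AMP 1
10 | 30 | A
10 | 30 | B
35 | 40 | B
AMP 2
20 | 50 | A
25 | 55 | A
25 | 60 | B
AMP 3
5 | 70 | B
30 | 80 | B
30 | 80 | A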
Both UPI and NUPI accesses are one-AMP operations.
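The same idea works for a NUPI, using the sample_2 rows from the figure. Each AMP keeps a list of rows per NUPI value, so a lookup still touches one AMP but can return several rows. The CRC-32 hash and the helper names are assumptions for illustration only.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 3  # three AMPs, as in the figure


def amp_for(pi_value) -> int:
    # Stand-in hash (CRC-32), not Teradata's hashing algorithm.
    return zlib.crc32(str(pi_value).encode()) % NUM_AMPS


# Each AMP maps a NUPI value to the list of all rows sharing that value.
amps = [defaultdict(list) for _ in range(NUM_AMPS)]

rows = [  # (col_x, col_y, col_z) from the sample_2 figure
    (10, 30, "A"), (10, 30, "B"), (35, 40, "B"),
    (20, 50, "A"), (25, 55, "A"), (25, 60, "B"),
    (5, 70, "B"), (30, 80, "B"), (30, 80, "A"),
]
for row in rows:
    col_x = row[0]
    amps[amp_for(col_x)][col_x].append(row)  # same value -> same AMP


def select_by_nupi(value):
    """One-AMP access that may return several rows, because every row
    with this NUPI value was hashed to the same AMP."""
    return list(amps[amp_for(value)].get(value, []))


print(select_by_nupi(25))  # [(25, 55, 'A'), (25, 60, 'B')]
```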
7. Primary Keys and Primary Indexes
• Indexes are conceptually different from keys.
• A PK is a relational modeling convention which allows each row to be uniquely identified.
• A PI is a Teradata convention which determines how the row will be stored and accessed.
• A significant percentage of tables may use the same columns for both the PK and the PI.
• A well-designed database will use a PI that is different from the PK for some tables.
Primary Key | Primary Index
Logical concept of data modeling | Physical mechanism for access and storage
Teradata does not need to recognize it | Each table must have exactly one primary index
No limit on number of columns | 16-column limit (V2R4.1); 64-column limit (V2R5)
Documented in data model (optional in CREATE TABLE) | Defined in CREATE TABLE statement
Must be unique | May be unique or non-unique
Identifies each row | Identifies one row (UPI) or multiple rows (NUPI)
Values should not change | Values may be changed (Delete + Insert)
May not be NULL (requires a value) | May be NULL
Does not imply an access path | Defines most efficient access path
Chosen for logical correctness | Chosen for physical performance
8. Duplicate Rows
A duplicate row is a row of a table whose
column values are all identical to
another row in the same table.
col_a | col_b | col_c
20 | 50 | A
25 | 50 | A
25 | 50 | A
(The last two rows are duplicate rows.)
• Because a PK uniquely identifies each row, ideally a relational table should
not have duplicate rows!
• The ANSI standard, however, permits duplicate rows for specialized
situations, thus Teradata permits them as well.
• You may select whether your table will or will not allow them.
CREATE SET TABLE table_A
:
:
Checks for* and disallows duplicate rows. This is the Teradata default.

CREATE MULTISET TABLE table_B
:
:
Does not check for and allows duplicate rows. This is the ANSI default.

* Note: If a UPI is selected on a SET table, the duplicate row check is replaced by a check for duplicate index values.
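The SET/MULTISET rules, including the UPI substitution described in the note, can be modelled with a toy class. `ToyTable` is a hypothetical illustration of the semantics only. It is not how Teradata implements these checks.

```python
from typing import Optional, Sequence


class ToyTable:
    """Hypothetical model of SET vs MULTISET duplicate handling (not a Teradata API).

    kind: "SET" or "MULTISET"; upi: optional column positions forming a UPI.
    """

    def __init__(self, kind: str, upi: Optional[Sequence[int]] = None) -> None:
        if kind not in ("SET", "MULTISET"):
            raise ValueError("kind must be 'SET' or 'MULTISET'")
        self.kind = kind
        self.upi = tuple(upi) if upi is not None else None
        self.rows = []            # stored rows, in insertion order
        self._row_set = set()     # used for the SET duplicate-row check
        self._upi_values = set()  # used for the UPI uniqueness check

    def insert(self, row: tuple) -> bool:
        """Store the row and return True, or reject it and return False."""
        if self.upi is not None:
            # UPI uniqueness check. Identical rows always share UPI values, so
            # on a SET table this check replaces the full duplicate-row check.
            key = tuple(row[i] for i in self.upi)
            if key in self._upi_values:
                return False
            self._upi_values.add(key)
        elif self.kind == "SET":
            # SET table without a UPI: compare the whole row.
            if row in self._row_set:
                return False
            self._row_set.add(row)
        # MULTISET table without a UPI: no check at all.
        self.rows.append(row)
        return True


rows = [(20, 50, "A"), (25, 50, "A"), (25, 50, "A")]  # duplicate-row example
set_table, multiset_table = ToyTable("SET"), ToyTable("MULTISET")
for r in rows:
    set_table.insert(r)
    multiset_table.insert(r)
print(len(set_table.rows), len(multiset_table.rows))  # 2 3
```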
9. Row Distribution Using a UPI – Case 1
Notes:
• Often, but not always, the PK column(s) will
be used as a UPI.
• PI values for Order_Number are known to be
unique (it’s a PK).
• Teradata will distribute different index
values evenly across all AMPs.
• Resulting row distribution among AMPs is
very uniform.
• Assures maximum efficiency for parallel
operations.
Order table (Order_Number is both the PK and the UPI):
Order Number (PK, UPI) | Customer Number | Order Date | Order Status
7325 | 2 | 4/13 | O
7324 | 3 | 4/13 | O
7415 | 1 | 4/13 | C
7103 | 1 | 4/10 | O
7225 | 2 | 4/15 | C
7384 | 1 | 4/12 | C
7402 | 3 | 4/16 | C
7188 | 1 | 4/13 | C
7202 | 2 | 4/09 | C

Resulting rows per AMP (o_# | c_# | o_dt | o_st):
AMP 1
7202 | 2 | 4/09 | C
7415 | 1 | 4/13 | C
AMP 2
7325 | 2 | 4/13 | O
7103 | 1 | 4/10 | O
7402 | 3 | 4/16 | C
AMP 3
7188 | 1 | 4/13 | C
7225 | 2 | 4/15 | C
AMP 4
7324 | 3 | 4/13 | O
7384 | 1 | 4/12 | C
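Case 1 can be reproduced in miniature by hashing each unique Order_Number to one of four AMPs and counting rows per AMP. The MD5-based hash is a stand-in, so the per-AMP counts will differ from the figure. With only nine rows some unevenness is expected, while a large set of unique values shows the counts converging on an even split. The helper names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # as in the figure


def amp_for(pi_value) -> int:
    # Stand-in hash (first 4 bytes of MD5), not Teradata's hashing algorithm.
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def rows_per_amp(pi_values) -> list:
    """Count how many rows each AMP receives for the given PI values."""
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


# The nine Order_Number values from the table: all distinct, as a UPI requires.
order_numbers = [7325, 7324, 7415, 7103, 7225, 7384, 7402, 7188, 7202]
assert len(set(order_numbers)) == len(order_numbers)
print(rows_per_amp(order_numbers))  # small sample: counts vary around 9/4

# With many unique values the counts approach 100,000 / 4 per AMP.
many = rows_per_amp(range(1, 100_001))
print(many)
```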
10. Row Distribution Using a NUPI – Case 2
Notes:
• Customer_Number may be the preferred
access column for ORDER table, thus a good
index candidate.
• Values for Customer_Number are somewhat
non-unique.
• Choice of Customer_Number is therefore a
NUPI.
• Rows with the same PI value distribute to the
same AMP.
• Row distribution is less uniform or skewed.
Order table: the same nine rows as in Case 1, now with Customer_Number as the NUPI (Order_Number remains the PK).

Resulting rows per AMP (o_# | c_# | o_dt | o_st):
AMP 1
7325 | 2 | 4/13 | O
7202 | 2 | 4/09 | C
7225 | 2 | 4/15 | C
AMP 2
7384 | 1 | 4/12 | C
7103 | 1 | 4/10 | O
7415 | 1 | 4/13 | C
7188 | 1 | 4/13 | C
AMP 3
7402 | 3 | 4/16 | C
7324 | 3 | 4/13 | O
AMP 4
(no rows)
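For a NUPI, the distribution can be bounded without knowing the hash function at all. Every row with a given Customer_Number lands on the same AMP, so each distinct value forms an indivisible group of rows. The sketch below uses the nine orders from the table, and the variable names are illustrative.

```python
from collections import Counter

NUM_AMPS = 4

# Customer_Number of each order (Order_Number -> c_#), from the Order table.
customer_of_order = {
    7325: 2, 7324: 3, 7415: 1, 7103: 1, 7225: 2,
    7384: 1, 7402: 3, 7188: 1, 7202: 2,
}

# All rows sharing a NUPI value hash to the same AMP, so each distinct
# value is a group of rows that can never be split across AMPs.
group_sizes = Counter(customer_of_order.values())
print(dict(group_sizes))  # {2: 3, 3: 2, 1: 4}

# No more AMPs can hold rows than there are distinct values ...
max_amps_used = min(len(group_sizes), NUM_AMPS)
# ... and the busiest AMP holds at least the largest group.
min_rows_on_busiest_amp = max(group_sizes.values())
print(max_amps_used, min_rows_on_busiest_amp)  # 3 4
```

With only three distinct customers and four AMPs, at least one AMP must stay empty and one must hold at least four rows. This matches the per-AMP listing for this case.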
11. Row Distribution Using a Highly Non-Unique Primary Index (NUPI) – Case 3
Order table: the same nine rows as in Case 1, now with Order_Status as the NUPI (Order_Number remains the PK).
Notes:
• Values for Order_Status are "highly" non-unique.
• Choosing the Order_Status column makes it a NUPI.
• Only two values exist, so only two AMPs will ever be used for this table.
• The table will not perform well in parallel operations.
• Highly non-unique columns are generally poor PI choices.
• The degree of uniqueness is critical to efficiency.
Resulting rows per AMP (o_# | c_# | o_dt | o_st):
AMP 1
7402 | 3 | 4/16 | C
7202 | 2 | 4/09 | C
7225 | 2 | 4/15 | C
7415 | 1 | 4/13 | C
7188 | 1 | 4/13 | C
7384 | 1 | 4/12 | C
AMP 2
7103 | 1 | 4/10 | O
7324 | 3 | 4/13 | O
7325 | 2 | 4/13 | O
AMPs 3 and 4
(no rows)
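The course does not define a numeric skew measure. One commonly used formula, assumed here, is (1 - average rows per AMP / maximum rows per AMP) * 100. It is 0 for a perfectly even spread and rises toward 100 as rows pile onto one AMP. Applying it to the per-AMP row counts from the three case listings quantifies the trend. The function name `skew_factor` is hypothetical.

```python
def skew_factor(rows_per_amp: list) -> float:
    """(1 - average / maximum) * 100: 0 for a perfectly even spread,
    rising toward 100 as rows concentrate on one AMP."""
    busiest = max(rows_per_amp)
    if busiest == 0:
        return 0.0  # empty table: nothing is skewed
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / busiest) * 100


# Rows per AMP read off the three cases (four AMPs, nine rows each time).
cases = {
    "Case 1: UPI Order_Number": [2, 3, 2, 2],
    "Case 2: NUPI Customer_Number": [3, 4, 2, 0],
    "Case 3: NUPI Order_Status": [6, 3, 0, 0],
}
for name, counts in cases.items():
    print(f"{name}: skew {skew_factor(counts):.2f}%")
# Case 1: 25.00%, Case 2: 43.75%, Case 3: 62.50%
```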
12. Review Questions
For each statement, indicate whether it applies to:
UPIs, NUPIs, or Either
_______ 1. Specified in CREATE TABLE statement
_______ 2. Provides uniform distribution via the hashing algorithm
_______ 3. May be up to 64 columns in V2R5
_______ 4. Always a one-AMP operation
_______ 5. Access will return (at most) a single row
_______ 6. Used to assign a row to a specific AMP
_______ 7. Allows a null or nulls
_______ 8. Required on every table
_______ 9. Permits duplicate rows
_______ 10. Used as a Primary Key implementation
13. Review Question Answers
For each statement, indicate whether it applies to:
UPIs, NUPIs, or Either
Either 1. Specified in CREATE TABLE statement
UPI 2. Provides uniform distribution via the hashing algorithm
Either 3. May be up to 64 columns in V2R5
Either 4. Always a one-AMP operation
UPI 5. Access will return (at most) a single row
Either 6. Used to assign a row to a specific AMP
Either 7. Allows a null or nulls
Either 8. Required on every table
NUPI 9. Permits duplicate rows
UPI 10. Used as a Primary Key implementation