1. Deploying, Managing, and Administering the Oracle Internet Platform
INDEX-ORGANIZED TABLES IN ORACLE8I:
APPLICATIONS AND PERFORMANCE ANALYSIS
Jagannathan Srinivasan, Oracle Corporation
ABSTRACT
This paper covers the basics of index-organized tables in Oracle8i, emphasizing their latest features, which include
secondary index support, partitioning, key compression, and LOBs. The paper highlights index-organized tables as a
scalable, high-performance, and high-availability solution for OLTP, Internet, Electronic Commerce, and Data
Warehousing Applications. With brief examples of current usage, this paper presents a comparative performance
analysis of index and heap-organized tables. The paper concludes with tips and techniques for adopting index-organized
tables.
Paper #256 / Page 1
INTRODUCTION
Oracle8 introduced the ORGANIZATION INDEX storage option for tables. Tables created using this option, referred
to as index-organized tables, are stored in a B-tree index that holds not only the indexed columns but also all the
remaining columns of the table. Each row consists of key and non-key columns, so the table as a whole is organized
as an index. Therefore, unlike an index-based scan of a conventional table, access to an index-organized table incurs
no additional I/O to fetch non-key columns. Typically, the entire table data is held in its primary key index.
The benefits of this organization are:
· it provides fast random access using the primary key because an index-only scan is sufficient. Once a leaf block is
reached, both the key as well as the non-key columns can be retrieved.
· it provides fast range access using the primary key because the rows are clustered in primary key order and they
contain both key and non-key columns.
· it avoids duplication of primary key columns by combining primary index with table data.
Even though the index-organized table has an index structure, it can be accessed using SQL just like conventional
tables. All the new SQL extensions are in data definition language (DDL). Once these tables are created, they behave
and operate just like conventional tables using SQL.
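As a minimal sketch of this point, the statements below create a small index-organized table and then use it with ordinary SQL. The table url_keywords and its columns are hypothetical and are not taken from any application described in this paper; only the ORGANIZATION INDEX clause is specific to index-organized tables.

```sql
-- Hypothetical keyword table; all names are illustrative only.
-- An index-organized table must have a primary key.
CREATE TABLE url_keywords
( keyword   VARCHAR2(64),
  url_id    NUMBER,
  hit_count NUMBER,
  CONSTRAINT pk_url_keywords PRIMARY KEY (keyword, url_id)
)
ORGANIZATION INDEX;

-- DML and queries are written exactly as for a heap-organized table.
INSERT INTO url_keywords (keyword, url_id, hit_count)
VALUES ('oracle', 1, 10);

UPDATE url_keywords
   SET hit_count = hit_count + 1
 WHERE keyword = 'oracle'
   AND url_id  = 1;

-- Range access on a primary key prefix reads rows stored in key order.
SELECT url_id, hit_count
  FROM url_keywords
 WHERE keyword = 'oracle';
```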
In addition to the basic support, index-organized tables in Oracle8i include:
· support for an overflow segment, which provides supplementary storage for columns and allows controlling the
placement of columns in the index vs. overflow segment. It provides capability for tuning the number of rows that
fit in an index leaf block. Users can push out infrequently accessed non-key columns to the overflow segment, by
(1) specifying the percentage of space reserved for an index-organized table row in the index block, and/or (2)
specifying a column at which an index-organized table row should be divided into index and overflow portions.
This increases the leaf index row density, that is, the number of index rows that can fit in a leaf block of the B+-tree
structure.
· support for compressing common prefixes of the primary key. Since the rows are clustered in the primary key
order, there is more likelihood of finding common prefixes. Similar compression support is also available for
secondary indexes.
· support for logical secondary indexes, which include primary key as well as a physical leaf block address that is
treated as a guess to where the row may be found. If the guess is correct, access incurs a single block I/O.
However, if the guess is incorrect, the primary key is used to find the row. Having logical secondary indexes
enables faster reorganization of the table as the secondary indexes need not be updated during reorganization.
· support for online reorganization of index-organized tables, that is, the index-organized table itself can be
reorganized while DML operations are in progress thereby providing 24X7 availability.
· support for horizontal partitioning of index-organized tables while retaining partition independence for the table
operations. A novel aspect here is the equi-partitioning of the overflow with the base table, that is, a separate
overflow area is allocated for each base table partition to maintain the partition independence property.
· support for LOB and object type columns, constraints, and triggers.
The rest of the paper is organized as follows. Section 2 covers various classes of applications that benefit from using
index-organized tables. Section 3 presents the results of experiments conducted to compare performance of index
and heap-organized tables. Section 4 discusses tips for adopting index-organized tables. Finally, we conclude with a
summary of index-organized table features and list out the enhancements planned for future releases.
APPLICATIONS
This section highlights index-organized tables as a scalable, high-performance, and high-availability solution for
OLTP, Internet and E-Commerce, and Data Warehousing Applications and includes a few examples of current usage.
OLTP APPLICATIONS
For OLTP applications, index-organized tables will be suitable for primary key-based access. As an example, consider
the order line table which holds new orders in the TPC-C benchmark, where four of the columns (district, warehouse,
order number, orderline) form the primary key. Unlike a heap-organized table with a primary key index, the index-organized
table will have a single primary key B-tree index storage structure that holds both key as well as non-key
columns. The index-organized table provides fast primary key-based access and avoids duplication of the primary
key. For large-scale data, users can make use of partitioning and parallel query support on index-organized tables.
INTERNET AND E-COMMERCE APPLICATIONS
With the explosion of the Internet, numerous portals and search engines have become popular. These applications
typically need to maintain lists of keywords, users, URLs, etc. These structures can be modeled by an index-organized
table, where each row holds a primary key with some additional information. For example, Lycos is using index-organized
tables to implement keyword tables used by their advertising system, as well as to implement a URL table
used to hold URLs and their links. They have reported considerable speed up in access times.
Similarly, E-Commerce applications may need to hold catalog information such as a list of shopping items in an online
store. They may also need to hold customer order information. Typically, a customer order is assigned a unique value
that is subsequently used for tracking it. Such information is well suited to an index-organized table in Oracle,
as the rows will often be accessed by the primary key. Since index-organized tables give insert performance
comparable to heap-organized tables, they can handle a large number of inserts. Also, once a customer order has been
processed, it needs to be deleted from the orders table. Here again, index-organized tables have superior delete
performance, making them suitable for such operations. Furthermore, if the index-organized table becomes fragmented due
to a large number of inserts and deletes, it can be rebuilt online by using the MOVE ONLINE option.
DATA WAREHOUSING APPLICATIONS
Data warehousing applications deal with large amounts of data. Hence parallel loading, parallel scans, and support for
partitioned tables are highly desirable. Index-organized tables can be loaded using the PARALLEL CREATE TABLE AS
SELECT option. Partitioned index-organized tables are also supported, and each partition can be loaded
concurrently. Parallel index creation is supported as well, and users can perform parallel scans on index-organized
tables. All these features make index-organized tables suitable for handling large-scale data.
Data warehouse applications often use “star” schemas. A star schema typically consists of one or more very large “fact” tables
and a number of much smaller “dimension” or reference tables. A star query is one that joins several of the dimension
tables, usually by predicates in the query, to one of the fact tables. Star optimization works by joining all the
dimension tables and then joining the result with the fact table. Typically, the columns involved in the join predicates form
the “fact” table’s primary key; therefore, implementing the fact table as an index-organized table leads to efficient
execution of star queries. Not only can the join predicate be evaluated using the concatenated primary key
index, but once the row is identified, the rest of the columns are also available as part of the same index structure.
Note that the star optimization works well for schemas with a small number of dimensions and dense fact tables. Eli
Lilly is using a partitioned index-organized table to implement their largest “fact” table for their drug discovery
application. This table is typically joined with smaller “dimension” tables, which are constructed based on input
provided by scientists. The resulting join is used to identify compounds with similar properties and attributes.
If the number of dimensions is large, or the fact table is sparse, or if there are queries where not all dimension tables
have constraining predicates, then star transformation will be a better option. Star transformation works by combining
bitmap indexes on individual fact table columns. Currently index-organized tables do not support bitmap indexes, but
there are plans to add bitmap index support to index-organized tables in the near future.
PERFORMANCE STUDY
This section gives a comparative performance analysis of index-organized and heap-organized tables as reported in
[IOTPERF]. All the experiments are conducted using Oracle8i, Release 8.1.6 on SunOS 5.6, Ultra-60 Sparc
Workstation with 256 MB of main memory and 4K block size. For all the experiments, the order_line table of
TPC-C benchmark [TPCC98] is used. The benchmark models a product sales and distribution business.
CREATE TABLE order_line
( ol_o_id NUMBER, ol_w_id NUMBER, ol_d_id NUMBER, ol_number NUMBER,
ol_i_id NUMBER, ol_supply_w_id NUMBER, ol_quantity NUMBER,
ol_amount NUMBER(6), ol_delivery_date DATE, ol_dist_info CHAR(24),
CONSTRAINT pk_orderline PRIMARY KEY (ol_w_id, ol_d_id, ol_o_id, ol_number)
);
The performance is compared by implementing the order_line table as both a heap-organized and an index-organized table. For
query and DML experiments, the total time taken is measured, and the experiments are conducted for data sizes
varying from 100 MB (~1.75 million rows) to 500 MB (~8.74 million rows). All the measurements are repeated 10
times with 16 MB of database buffer cache, and the average values along with the maximum standard deviation (s.d) are
reported. Although many experiments were conducted, only key results are presented here due to space limitations.
BULK-LOAD PERFORMANCE AND STORAGE COSTS
Objective: To compare the bulk-load performance as well as to identify storage requirements.
Experimental Set-up: The data is generated in random order while maintaining the uniqueness of the primary key. It is
loaded into the Oracle8i database using SQL*Loader and physical storage statistics are obtained by using the
ANALYZE TABLE command. The same steps are performed for both the heap-organized and the index-organized table for 200 MB
of data (~3.5 million rows).
Results and Observations: The bulk-load time for the index-organized table (28 minutes) is greater than that of the heap-organized
table (17 minutes) for loading 200MB of data, mainly due to increased sort overhead. Sort overhead for
index-organized table is larger because sort entry for each row must include both key columns as well as non-key
columns, whereas a sort entry for index on a heap-organized table needs to include only the key columns. Since the
difference is primarily due to the larger sort size, bulk-load performance for index-organized tables can be improved by
allocating a larger in-memory sort area, which can be specified using the SORT_AREA_SIZE parameter.
Overall, the index-organized table requires less storage (268 MB for the 200 MB data set) than the heap-organized
table (344 MB) as it avoids duplication of key columns. For the index-organized table, the primary key index storage
required 268 MB (it also holds all non-key columns). For the heap-organized table, the primary key index storage
required 93 MB and the actual table storage required 251 MB. The storage structures were created with the PCTFREE
option set to 10, which reserves 10% of each block for future updates and inserts.
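The storage comparison follows directly from the figures reported above. The symbols $S_{\text{heap}}$ and $S_{\text{IOT}}$ are introduced here only for the arithmetic and do not appear in the original study.

```latex
\begin{align*}
S_{\text{heap}} &= 251\ \text{MB (table)} + 93\ \text{MB (primary key index)} = 344\ \text{MB} \\
S_{\text{IOT}}  &= 268\ \text{MB} \\
\frac{S_{\text{heap}} - S_{\text{IOT}}}{S_{\text{heap}}} &= \frac{76}{344} \approx 22\%
\end{align*}
```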
QUERY PERFORMANCE
Objective: To compare the query performance for random and range access for heap and index-organized tables.
Experimental Set-up: For the random access performance experiment, 1000 random order_line rows are selected
and the total time is measured. For range scan performance, the total time for selecting 100000 consecutive
order_line rows is measured.
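The statements used in the experiments are not listed in this paper. The two queries below are only a plausible shape for the random and range accesses described, written against the order_line table defined earlier; the bind variable names (:w, :d, :o, :n, :o_lo, :o_hi) are chosen for illustration.

```sql
-- Random access: one row identified by the full primary key.
SELECT *
  FROM order_line
 WHERE ol_w_id   = :w
   AND ol_d_id   = :d
   AND ol_o_id   = :o
   AND ol_number = :n;

-- Range access: consecutive rows in primary key order, selected by a
-- leading-key prefix plus a range on the next key column.
SELECT *
  FROM order_line
 WHERE ol_w_id = :w
   AND ol_d_id = :d
   AND ol_o_id BETWEEN :o_lo AND :o_hi;
```

The same text can be run unchanged against either organization, since the table organization is fixed at creation time and does not appear in queries.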
Results and Observations: The index-organized table shows superior query performance for both random and range
scans. The random access on the 100 MB table took 0.92 seconds for the index-organized table as compared to 1.16
seconds for the heap-organized table (see Figure 1). Similar performance is observed for larger data sets. The faster
random access for the index-organized table results from finding both key and non-key columns in the primary key
index leaf block.
The range scan query performance is consistently faster for index-organized tables (see Figure 1) when compared to
heap-organized tables. For the 100 MB table, a range scan on the index-organized table took 1.20 seconds as compared to
4.35 seconds for the heap-organized table. Similar performance is observed for larger data sets. The speed-up can be
attributed to the fact that the rows are clustered in primary key order for index-organized tables, and the additional
random I/O per row needed to reach the heap table is avoided.
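Expressed as ratios of heap-organized time to index-organized time for the 100 MB data set, the reported timings give roughly the following speed-ups (the ratios are derived here from the numbers above and are not separately reported in the study):

```latex
\begin{align*}
\text{random access: } & \frac{1.16\ \text{s}}{0.92\ \text{s}} \approx 1.26 \\
\text{range scan: }    & \frac{4.35\ \text{s}}{1.20\ \text{s}} \approx 3.6
\end{align*}
```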
[Panels: Random Scan (s.d <= 0.1) and Range Scan (s.d <= 0.1); x-axis: Dataset Size (in MB), 0 to 500; y-axis: Time (in sec); series: Heap-Org. Table and Index-Org. Table.]
Figure 1: Random and Range Scan Time
DML PERFORMANCE
Objective: To compare the incremental DML performance for heap and index-organized tables.
Experimental Set-up: Insert, update, and delete performance are measured by inserting, updating, and deleting 2000
order_line rows. The update statement modifies the ol_delivery_date column.
Results and Observations: While insert and update performance for index-organized tables is comparable to that of heap-organized
tables, delete performs better for index-organized tables (Figure 2). It is to be noted, however, that unlike
the other cases, in the 200 MB case the B+-Tree heights corresponding to the index-organized table and the heap-organized
table are 4 and 3, respectively. Due to this difference in B+-Tree heights, insert and update operations for
the index-organized table took more time to complete in the 200 MB case.
[Panels: Insert Time (s.d <= 0.7), Update Time (s.d <= 0.1), and Delete Time (s.d <= 0.5); x-axis: Dataset Size (in MB), 0 to 500; y-axis: Time (in sec); series: Heap-Org. Table and Index-Org. Table.]
Figure 2: Insert, Update, and Delete Time: without overflow.
Since single row updates in both heap-organized and index-organized tables involve finding the target row via
primary-key index and then making modifications to a single structure (table or index, respectively), it is easy to argue
that performance of such updates will be comparable. So, we focus instead on performance numbers for updating
multiple rows with consecutive primary-key values. The number of index blocks accessed for selecting the rows that
satisfy the where-clause is higher for index-organized tables because inclusion of non-key columns makes size of index
rows larger than that of primary-key index rows for heap tables. However, due to guaranteed clustered placement of
consecutive rows in index-organized tables and usual lack of such clustering in a heap organization, accessing and
modifying target rows usually requires a significantly lower number of disk block accesses in the case of index-organized
tables. The latter reduction in the number of blocks is usually sufficient to offset the earlier increase. This is
reflected in our results, which show comparable update performance for index-organized and heap-organized tables.
Index-organized tables (without overflow) will usually have better delete performance because only a single structure
(index) needs to be modified as opposed to modifying two structures (index and table) for heap-organized tables. This
is reflected in our results (See Figure 2).
EFFECT OF OVERFLOW
[Panels: (a) Insert, (b) Update (column not in overflow), (c) Delete, (d) Exact Match Query (column in overflow), (e) Range Scan Query (column not in overflow); y-axis: Time (in sec); series: Heap-Org. Table, Index-Org. Table, and Index-Org. Table with Overflow.]
Figure 3: Time for 500MB case for operations on (1) heap-organized table, (2) index-organized tables without overflow and (3) with overflow.
Additional experiments were conducted to measure query and DML performance for the 500MB case with column
ol_dist_info pushed out to an overflow segment. This placement option results in higher leaf index row density
as fewer column values per row need to be stored in the index. This leads to improved performance for any
query or update that does not need access to the column(s) in the overflow, as fewer index blocks need to
be accessed. Figures 3(b) and 3(e) show this improvement for an update and a range scan query. Insert performance
improves (Figure 3(a)), mainly because the reduction in index-row size lowers the frequency of splits during insertion into
the B+-Tree. Delete performance, however, deteriorates (to become comparable to delete performance for the heap-organized
table) as two structures, index and overflow, need to be modified to carry out deletion of target rows
(Figure 3(c)). Similar deterioration is seen in the performance of an exact match query retrieving an overflow-resident
column because both index and overflow need to be accessed (Figure 3(d)).
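The DDL used for this experiment is not given in the paper. One way such a placement could be expressed is sketched below: the INCLUDING clause names the last non-key column to keep in the index, so the one column that follows it, ol_dist_info, is stored in the overflow segment. The table and constraint names carry an _ovf suffix to mark them as illustrative.

```sql
-- Sketch only: not the statement used in the reported experiments.
CREATE TABLE order_line_ovf
( ol_o_id NUMBER, ol_w_id NUMBER, ol_d_id NUMBER, ol_number NUMBER,
  ol_i_id NUMBER, ol_supply_w_id NUMBER, ol_quantity NUMBER,
  ol_amount NUMBER(6), ol_delivery_date DATE, ol_dist_info CHAR(24),
  CONSTRAINT pk_orderline_ovf
    PRIMARY KEY (ol_w_id, ol_d_id, ol_o_id, ol_number)
)
ORGANIZATION INDEX
  INCLUDING ol_delivery_date  -- last non-key column kept in the index
  OVERFLOW;                   -- ol_dist_info is stored in the overflow segment

-- Touches only index-resident columns, so the overflow is not visited.
SELECT ol_i_id, ol_quantity, ol_amount
  FROM order_line_ovf
 WHERE ol_w_id = :w
   AND ol_d_id = :d
   AND ol_o_id BETWEEN :o_lo AND :o_hi;
```

A PCTTHRESHOLD value can be given in addition to, or instead of, INCLUDING when the split should be driven by the share of the index block a row may occupy rather than by a fixed column.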
USAGE TIPS
This section gives tips and techniques for adopting index-organized tables.
TIPS ON USING THE BASIC CONFIGURATION
The basic index-organized table configuration constitutes just the primary B-tree index. That is, no overflow data
segment is associated with the table. The tips for using such a configuration are:
· For tables with small row size (<=100 bytes), overflow should be avoided. For OLTP applications, users
should try using the basic configuration. This organization has the advantage of a single structure that
provides fast primary key-based access; once the leaf block is reached, the non-key columns are available
without incurring additional I/O. Similarly, every DML operation updates only a single structure.
· Bulk-loading data into this configuration takes longer than loading into a heap-organized
table with a primary key index. The extra overhead is due to the large sort area needed to hold not only the key
columns but also all of the non-key columns. Users should allocate a large in-memory sort buffer or, if possible, sort
the data before the load operation.
· For parallel bulk-loading of data, users can first bulk-load the data in parallel using SQL*Loader into a heap-organized
table and then create the index-organized table by using the CREATE TABLE AS SELECT … PARALLEL
command. Note that parallel bulk-load via SQL*Loader into an index-organized table is currently not supported.
· Since the primary key index segment is where the row data is stored, DBAs need to query USER_INDEXES
and look at the primary key index entry to find the physical and storage attributes of the table.
It is recommended that you name the primary key constraint because the corresponding index
inherits the same name. There is a logical row in USER_TABLES for the index-organized table, but it does
not have any physical data segment associated with it.
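The tips above can be sketched as the following sequence. The table names order_line_heap and order_line_iot, the constraint name, and the sort area value are assumptions made for illustration; order_line_heap is presumed to be a heap-organized table that has already been loaded with the ten order_line columns in their original order.

```sql
-- 1. Enlarge the in-memory sort area for the loading session.
--    The value is in bytes; 50 MB is an arbitrary illustration.
ALTER SESSION SET SORT_AREA_SIZE = 52428800;

-- 2. Build the index-organized table in parallel from the heap-organized
--    staging table. Column datatypes are taken from the SELECT list.
CREATE TABLE order_line_iot
( ol_o_id, ol_w_id, ol_d_id, ol_number,
  ol_i_id, ol_supply_w_id, ol_quantity,
  ol_amount, ol_delivery_date, ol_dist_info,
  CONSTRAINT pk_orderline_iot
    PRIMARY KEY (ol_w_id, ol_d_id, ol_o_id, ol_number)
)
ORGANIZATION INDEX
PARALLEL
AS SELECT * FROM order_line_heap;

-- 3. Physical attributes are reported for the primary key index, which
--    carries the constraint name. BLEVEL and LEAF_BLOCKS are filled in
--    only after statistics have been gathered (for example with ANALYZE).
SELECT index_name, tablespace_name, blevel, leaf_blocks
  FROM user_indexes
 WHERE table_name = 'ORDER_LINE_IOT';

-- The table itself appears in USER_TABLES as a logical entry.
SELECT table_name, iot_type
  FROM user_tables
 WHERE table_name = 'ORDER_LINE_IOT';
```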
TIPS ON USING OVERFLOW
The overflow option allows storing a portion of the row in the overflow area. However, the overflow should be used
judiciously. Specifically,
· For queries requiring full table scans and similar operations, better performance can be achieved for selected columns by forcing them
to be in the index segment and pushing the rest of the columns into the overflow data segment. The INCLUDING
column option can be used to force the tail portion of the row into an overflow row-piece. With this technique you can
control the columns stored in the index segment and hence control the index row density. Thus, the performance of a
full scan that needs to fetch only the columns stored in the index segment can be improved.
· ALTER TABLE MOVE along with overflow is suggested after a bulk load if you frequently fetch the columns in the
overflow by range scans or fast full scans. Bulk load does not cluster the overflow row-pieces based on
the primary key, so a range scan or fast full scan on such a table would involve random I/Os to fetch the overflow
row-pieces. Issuing ALTER TABLE <table_name> MOVE OVERFLOW clusters the overflow row-pieces
based on the primary key values, leading to better performance. This applies to individual partitions of a
partitioned index-organized table as well.
· For an index-organized table with overflow, a separate row in the USER_TABLES view provides information
about physical and storage attributes for the overflow data segment. However, the overflow cannot be directly
accessed or manipulated.
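A hedged sketch of the last two tips follows. Here order_line_ovf stands for any index-organized table created with an OVERFLOW clause, and the tablespace names iot_ts and iot_ovf_ts are hypothetical; the statement uses the form of the MOVE clause that names a tablespace for each segment.

```sql
-- Rebuild the table together with its overflow segment so that overflow
-- row-pieces are re-clustered in primary key order after a bulk load.
ALTER TABLE order_line_ovf
  MOVE TABLESPACE iot_ts
  OVERFLOW TABLESPACE iot_ovf_ts;

-- Locate the overflow segment: it is listed as a separate row whose
-- IOT_NAME column points back to the parent index-organized table.
SELECT table_name, iot_name, iot_type, tablespace_name
  FROM user_tables
 WHERE iot_name = 'ORDER_LINE_OVF';
```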
TIPS ON USING SECONDARY B-TREE INDEXES
The secondary B-tree indexes provide alternate access paths for index-organized tables. The tips for using them are:
· Create secondary indexes on the index-organized table after the bulk load of data. For secondary indexes created as
part of the bulk load, the guess-DBA of the index-organized table row is not stored along with the index row. So, for
queries with secondary index-based scans, primary key traversal is required to locate the index-organized table
row. Creating secondary indexes after the table is loaded, however, fills in the guess-DBA portion
of each row. Subsequent index-based scans will then typically incur only one additional I/O to get the index-organized
table row by using the guess-DBAs. Note that secondary B-tree indexes can be created in parallel.
· Do not include any primary key columns in the secondary B-tree index; this improves performance, because
secondary indexes on index-organized tables already include the primary key columns implicitly. Thus, an index-only scan
is sufficient for fetching the secondary key columns plus any of the primary key columns. Similar benefits
can be seen for queries with joins on primary key columns.
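As an illustration of both tips, the sketch below assumes that order_line has been created with ORGANIZATION INDEX and is already loaded. The index name ol_item_idx and the bind variable :item_id are hypothetical.

```sql
-- Created after the load, so each index row can carry a guess-DBA.
-- Only the non-key column is listed; the primary key columns
-- (ol_w_id, ol_d_id, ol_o_id, ol_number) are carried implicitly.
CREATE INDEX ol_item_idx ON order_line (ol_i_id);

-- Needs only ol_i_id plus primary key columns, so the secondary index
-- alone holds every column the query references.
SELECT ol_w_id, ol_d_id, ol_o_id, ol_number
  FROM order_line
 WHERE ol_i_id = :item_id;
```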
TIPS ON USING PARTITIONED INDEX-ORGANIZED TABLES
Range-partitioned index-organized tables are supported with the condition that the partitioning key must be a subset
of the primary key. The tips for using them are:
· Data can be loaded into multiple partitions concurrently using multiple SQL*Loader sessions. However, the parallel
CREATE TABLE AS SELECT command is not yet supported for partitioned index-organized tables; support is planned for
the near future.
· Just like partitioned heap-organized tables, partitioned index-organized tables can provide better performance by
utilizing partition pruning and partition-wise joins. In addition, partition independence property is retained just as
in heap-organized tables; namely, DML and query operations on partitions can be performed while maintenance
operations on other partitions are in progress.
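A range-partitioned variant of the order_line table might be declared as in the sketch below. The partitioning key ol_w_id is a subset of the primary key, as required; the table name, partition names, and boundary value are assumptions made for illustration.

```sql
CREATE TABLE order_line_part
( ol_o_id NUMBER, ol_w_id NUMBER, ol_d_id NUMBER, ol_number NUMBER,
  ol_i_id NUMBER, ol_supply_w_id NUMBER, ol_quantity NUMBER,
  ol_amount NUMBER(6), ol_delivery_date DATE, ol_dist_info CHAR(24),
  CONSTRAINT pk_orderline_part
    PRIMARY KEY (ol_w_id, ol_d_id, ol_o_id, ol_number)
)
ORGANIZATION INDEX
PARTITION BY RANGE (ol_w_id)
( PARTITION p_w_low  VALUES LESS THAN (101),
  PARTITION p_w_high VALUES LESS THAN (MAXVALUE)
);

-- The predicate on ol_w_id restricts the query to partition p_w_low,
-- which is what allows partition pruning.
SELECT *
  FROM order_line_part
 WHERE ol_w_id = 42
   AND ol_d_id = 3;
```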
TIPS ON USING DATA REORGANIZATION OPERATIONS
Frequent inserts and updates can result in fragmentation of the primary key B-tree index. This can be fixed by
rebuilding the index-organized table using the ALTER TABLE MOVE option. A few additional tips:
· The primary key B-tree index can be rebuilt online using the ALTER TABLE MOVE ONLINE command. The
primary key B-tree index for a partition can be rebuilt using the ALTER TABLE <table> MOVE PARTITION
<partition> command.
· Rebuilding an index-organized table does not make its secondary B-tree indexes UNUSABLE, whereas in the case of
heap-organized tables the indexes become UNUSABLE.
· Moving a partition does not make any (non-partitioned, local-partitioned, or global-partitioned) B-tree indexes
UNUSABLE. Partition SPLIT operations do not make the global B-tree indexes UNUSABLE. In the case of
heap-organized tables these operations render the indexes UNUSABLE.
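The reorganization commands can be sketched as follows. The names order_line_iot, order_line_part, and p_w_low are hypothetical and stand for an existing index-organized table, an existing partitioned index-organized table, and one of its partitions. The partition rebuild is written in the MOVE PARTITION form of ALTER TABLE.

```sql
-- Rebuild the primary key B-tree while DML continues against the table.
ALTER TABLE order_line_iot MOVE ONLINE;

-- Rebuild a single partition of a partitioned index-organized table.
ALTER TABLE order_line_part MOVE PARTITION p_w_low;

-- Check the state of the indexes on the rebuilt table afterwards.
SELECT index_name, status
  FROM user_indexes
 WHERE table_name = 'ORDER_LINE_IOT';
```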
TIPS ON USING KEY COMPRESSION
An index-organized table with a multi-column (concatenated) primary key can be compressed using the COMPRESS
option.
· For data that will yield less than 20% compression, there may not be significant improvement in query
performance.
· For data that will yield higher compression (> 20%), apart from the storage savings, both query and load
(CREATE TABLE AS SELECT as well as INSERT AS SELECT) performance will show improvements. Query
and DML performance is improved due to reduced disk I/Os. Load performance is improved due to fewer B-tree
splits as well as reduced I/Os.
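As a sketch of the COMPRESS option, the statement below compresses the two leading primary key columns, which repeat across all order lines of the same warehouse and district. The prefix length of 2, the table name, and the index name are assumptions chosen for illustration; the achievable compression depends on the data.

```sql
CREATE TABLE order_line_cmp
( ol_o_id NUMBER, ol_w_id NUMBER, ol_d_id NUMBER, ol_number NUMBER,
  ol_i_id NUMBER, ol_supply_w_id NUMBER, ol_quantity NUMBER,
  ol_amount NUMBER(6), ol_delivery_date DATE, ol_dist_info CHAR(24),
  CONSTRAINT pk_orderline_cmp
    PRIMARY KEY (ol_w_id, ol_d_id, ol_o_id, ol_number)
)
ORGANIZATION INDEX
COMPRESS 2;  -- store each distinct (ol_w_id, ol_d_id) prefix once per leaf block

-- Key compression is also available for secondary indexes.
CREATE INDEX ol_cmp_supply_idx
  ON order_line_cmp (ol_supply_w_id, ol_i_id)
  COMPRESS 1;
```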
TIPS ON USING OBJECT TYPES AND LOBS
Index-organized tables support object columns and LOBs. They can also be used to create object tables. A few
usage tips:
· The primary key columns can be scalar attributes within objects. However, the INCLUDING column clause can
specify only top level object columns but not any of its nested attributes.
· For an index-organized table without overflow, the LOBs implicitly have the DISABLE STORAGE IN ROW option
set. In-line storage of LOBs is allowed only on index-organized tables that have an overflow data segment.
CONCLUSIONS AND FUTURE WORK
The ORGANIZATION INDEX option provides an alternate storage organization for tables, where table data is
mostly held in the primary B-tree index. Tables created using this option, namely index-organized tables, provide a
scalable, high-performance, and high-availability storage organization. The comparative performance analysis of index
and heap-organized tables demonstrates the superior performance of index-organized tables for primary key-based
access. In fact, index-organized tables are already being deployed in production environments by customers such as
Lycos and Eli Lilly.
In the near future we plan to add support for online B-tree index operations and guess-DBA fix-up. We also intend to
support bitmap indexes and the indexing of UROWID columns. In addition, there are plans to support LOBs
in partitioned index-organized tables, hash-partitioned index-organized tables, and temporary index-organized tables.
ACKNOWLEDGMENTS
I thank the index-organized table team at New England Development Center, which developed most of these
features. Namely: Jay Banerjee, Eugene Chong, Souri Das, Chuck Freiwald, Mahesh Jagannath, Ramkumar Krishnan,
Anh-Tuan Tran, and Aravind Yalamanchi. Next, I thank Jonathan Klein and Vishwanath Karra for implementing
online move and drop column support. Special thanks to Franco Putzolu for his invaluable feedback.
REFERENCES
[IOTPERF] Srinivasan J., Das S., Freiwald, C., Chong, E. I., Jagannath, M., Yalamanchi, A., Krishnan, R., Tran, A.,
DeFazio, S., Banerjee, J., “Supporting Applications with Primary Key Access Intensive Workloads in Oracle8,”
submitted for publication.
[TPCC98] TPC Benchmark™ C Standard Specification, Revision 3.4, Transaction Processing Performance Council,
Aug. 25, 1998.