Stack It And Unpack It

Partitioning and Compression for Datawarehouses.

Published in: Technology

  • Good morning everyone and welcome to the second session of the morning. My name is Jeff Moss and I’m going to talk to you about a couple of features which can come in handy in datawarehouse/VLDB environments.

    1. Stack It & Pack It: Partitioning And Compression For Warehouses / VLDB
       • Jeff Moss
    2. Who Dunnit?
    3. Agenda
       • My background
       • Squeeze your data with data segment compression
       • Partition for success
       • Questions
    4. My Background
       • Independent consultant
       • 13 years' Oracle experience
       • Blog: http://oramossoracle.blogspot.com/
       • Focused on warehousing / VLDB since 1998
       • First project
         ◦ UK Music Sales Data Mart
         ◦ Produces the BBC Radio 1 Top 40 chart and many more
         ◦ 2 billion row sales fact table
         ◦ 1 TB total database size
       • Currently working with Eon UK (Powergen)
         ◦ 4 TB production warehouse, 8 TB total storage
         ◦ Oracle product stack
    5. What Is Data Segment Compression?
       • Compresses data by eliminating column values repeated within a block
       • Reduces the space required for a segment
         ◦ …but only if there are appropriate repeats!
       • Self-contained: each block holds the symbol table needed to rebuild its own rows
       • Lossless algorithm
    6. Where Can Data Segment Compression Be Used?
       • Can be used with a number of segment types
         ◦ Heap and nested tables
         ◦ Range or list partitions
         ◦ Materialized views
       • Can't be used with
         ◦ Subpartitions
         ◦ Hash partitions
         ◦ Indexes (but these have their own key compression)
         ◦ IOTs
         ◦ External tables
         ◦ Tables that are part of a cluster
         ◦ LOBs
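The following minimal sketch declares compression on three of the segment types listed above. It assumes Oracle 9iR2 or later. The object names SALES_HIST, SALES_PART and SALES_MONTH_MV are hypothetical. The last two queries read the COMPRESSION column of the dictionary views to confirm each setting.

```sql
-- Heap table: COMPRESS is a segment attribute; only direct-path loads store rows compressed
CREATE TABLE sales_hist (
  sale_id      NUMBER,
  sale_date    DATE,
  contact_type VARCHAR2(10),
  amount       NUMBER
) PCTFREE 0 COMPRESS;

-- Range-partitioned table: compression is chosen partition by partition
CREATE TABLE sales_part (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p_2004 VALUES LESS THAN (DATE '2005-01-01') COMPRESS,
  PARTITION p_2005 VALUES LESS THAN (DATE '2006-01-01') NOCOMPRESS
);

-- Materialized view whose container table is built compressed
CREATE MATERIALIZED VIEW sales_month_mv
  PCTFREE 0 COMPRESS
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_amount
  FROM sales_part
 GROUP BY TRUNC(sale_date, 'MM');

-- Confirm the attribute in the data dictionary
SELECT table_name, compression
  FROM user_tables
 WHERE table_name IN ('SALES_HIST', 'SALES_MONTH_MV');

SELECT partition_name, compression
  FROM user_tab_partitions
 WHERE table_name = 'SALES_PART';
```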
    7. How Does Segment Compression Work?
       [Diagram: anatomy of a compressed database block]
       • Block common header (20 bytes)
       • Transaction header (24 bytes fixed + 24 bytes per ITL)
       • Data header (14 bytes)
       • Compressed data header (16 bytes, variable)
       • Table directory (8 bytes)
       • Row directory (2 bytes per row)
       • Symbol table
       • Row data area
       • Tail (4 bytes)
       Example rows:
         ID   DESCRIPTION                  CONTACT TYPE  OUTCOME  FOLLOWUP
         100  Call to discuss bill amount  TEL           NO       YES
         101  Call to discuss new product  MAIL          NO       N/A
         102  Call to discuss new product  TEL           YES      N/A
       Symbol table entries:
         1 = 100   2 = Call to discuss bill amount   3 = TEL    4 = NO    5 = YES
         6 = 101   7 = Call to discuss new product   8 = MAIL   9 = N/A  10 = 102
       Row data area (each row stored as references to symbol table entries):
         Row 100: 1 2 3 4 5
         Row 101: 6 7 8 4 9
         Row 102: 10 7 3 5 9
    8. What Affects Compression?
       • Undisclosed algorithm
         ◦ I asked, but support wouldn't play ball!
       • Many factors
         ◦ Block size
         ◦ Anything which affects block overhead
           ▪ Interested Transaction Lists (INITRANS)
           ▪ Number of columns
           ▪ Number of rows
           ▪ PCTFREE
         ◦ Number of repeats (in the block)
         ◦ Length of column value(s)
    9. Compression v Block Size
       • 200K rows, non-ASSM, uniform locally managed extents
       • Larger blocks: more chance of repeats in any given block
    10. Compression v ITL
       • 10K rows, non-ASSM, uniform locally managed extents
       • More ITLs = more overhead = fewer repeats
    11. Compression v Number Of Columns
       • 500K rows, non-ASSM, uniform locally managed extents
       • Same amount of data to store
       • More columns = more overhead = fewer repeats
    12. Compression v PCTFREE
       • 200K rows, non-ASSM, uniform locally managed extents
       • Higher PCTFREE = less usable space per block = fewer repeats
    13. Compression v NDV
       • 200K rows, non-ASSM, uniform locally managed extents
       • Higher NDV (number of distinct values) = fewer repeats
    14. Compression v Column Length
       • 80K rows, non-ASSM, uniform locally managed extents
       • Minimum 6 characters for compression
       • Longer length = more compression savings
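Each factor above can be measured by building a compressed and an uncompressed copy of the same data with identical physical settings and comparing the blocks each one uses. The sketch below takes that approach. It is hypothetical and is not one of the test scripts listed in the references. It assumes a populated source table named SRC_DATA. To see how a factor affects compression, change PCTFREE, INITRANS, the column set or the tablespace between runs.

```sql
-- Hypothetical source table SRC_DATA (must contain rows); vary the settings between runs
CREATE TABLE t_nocomp PCTFREE 0 INITRANS 1 NOCOMPRESS AS SELECT * FROM src_data;
CREATE TABLE t_comp   PCTFREE 0 INITRANS 1 COMPRESS   AS SELECT * FROM src_data;

BEGIN
  -- USER_TABLES.BLOCKS is only populated once statistics have been gathered
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'T_NOCOMP');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'T_COMP');
END;
/

-- Space saved by compression for this combination of settings
SELECT n.blocks                                  AS nocomp_blocks,
       c.blocks                                  AS comp_blocks,
       ROUND(100 * (1 - c.blocks / n.blocks), 1) AS pct_saved
  FROM user_tables n, user_tables c
 WHERE n.table_name = 'T_NOCOMP'
   AND c.table_name = 'T_COMP';
```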
    15. Compression v Ordering
       • Colocate data to maximise compression benefits
       • For maximum compression
         ◦ Minimise the total space required by the segment
         ◦ Identify the most "compressible" column(s)
       • For optimal access
         ◦ We know how the data is to be queried
         ◦ Order the data by
           ▪ Access path columns
           ▪ Then the next most "compressible" column(s)
       [Diagram: the same values stored in two orders]
         Uniformly distributed: 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
         Colocated:             1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5
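The ordering advice above comes down to an ORDER BY on a direct-path load. The following minimal sketch uses a hypothetical table CONTACT_HIST and a hypothetical delta table CONTACT_HIST_NEW. It assumes that CUSTOMER_ID is the main access path and that CONTACT_TYPE is the most repetitive of the remaining columns.

```sql
-- Initial build: CTAS is direct path, so rows are compressed in the order given
CREATE TABLE contact_hist_ord
  PCTFREE 0
  COMPRESS
AS
SELECT *
  FROM contact_hist
 ORDER BY customer_id,   -- access path column(s) first
          contact_type;  -- then the next most "compressible" column

-- Later loads: only a direct-path insert compresses, and the ORDER BY controls colocation
INSERT /*+ APPEND */ INTO contact_hist_ord
SELECT *
  FROM contact_hist_new
 ORDER BY customer_id, contact_type;

COMMIT;
```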
    16. Get Max Compression Order Package
       Procedure signature:
         PROCEDURE mgmt_p_get_max_compress_order
          Argument Name                  Type                    In/Out Default?
          ------------------------------ ----------------------- ------ --------
          P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
          P_TABLE_NAME                   VARCHAR2                IN
          P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
          P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
          P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
          P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
          P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT
       Example call:
         BEGIN
           mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT'
                                        ,p_table_name  => 'BIG_TABLE'
                                        ,p_sample_size => 10000);
         END;
         /
       Output:
         Running mgmt_p_get_max_compress_order...
         ------------------------------------------------------------
         Table        : BIG_TABLE
         Sample Size  : 10000
         Unique Run ID: 25012006232119
         ORDER BY Prefix:
         ------------------------------------------------------------
         Creating MASTER Table  : TEMP_MASTER_25012006232119
         Creating COLUMN Table 1: COL1
         Creating COLUMN Table 2: COL2
         Creating COLUMN Table 3: COL3
         ------------------------------------------------------------
         The output below lists each column in the table and the number of blocks/rows and space used when the table data is ordered by only that column or, where a prefix has been specified, by the prefix and then that column.
         From this one can determine whether there is a specific ORDER BY which can be applied to the data to maximise compression within the table whilst, where a prefix is present, ordering data as efficiently as possible for the most common access path(s).
         ------------------------------------------------------------
         NAME                           COLUMN   BLOCKS   ROWS   SPACE_GB
         ============================== ======== ======== ====== ========
         TEMP_COL_001_25012006232119    COL1     290      10000  .0022
         TEMP_COL_002_25012006232119    COL2     345      10000  .0026
         TEMP_COL_003_25012006232119    COL3     555      10000  .0042
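The same idea can be sketched in a single anonymous block. This is a hypothetical reconstruction of the technique only and is not the author's mgmt_p_get_max_compress_order. The block takes one fixed sample of BIG_TABLE. For each non-LOB column it builds a compressed copy of the sample ordered by that column, then counts the blocks that actually hold rows. The scratch table names TMP_SAMPLE_MASTER and TMP_SAMPLE_ORDERED are assumptions. The sample is simply the first 10000 rows returned, not a random sample.

```sql
-- Hypothetical sketch only: NOT the author's mgmt_p_get_max_compress_order.
-- Assumes a table BIG_TABLE in the current schema with no LONG columns.
SET SERVEROUTPUT ON
DECLARE
  c_sample_rows CONSTANT PLS_INTEGER  := 10000;               -- assumed sample size
  c_master      CONSTANT VARCHAR2(30) := 'TMP_SAMPLE_MASTER';  -- hypothetical scratch table
  c_tmp         CONSTANT VARCHAR2(30) := 'TMP_SAMPLE_ORDERED'; -- hypothetical scratch table
  v_blocks      NUMBER;
BEGIN
  -- One fixed sample, so every column is tested against the same rows
  EXECUTE IMMEDIATE 'CREATE TABLE ' || c_master
                    || ' AS SELECT * FROM big_table WHERE ROWNUM <= ' || c_sample_rows;

  FOR col IN (SELECT column_name
                FROM user_tab_columns
               WHERE table_name = 'BIG_TABLE'
                 AND data_type NOT IN ('CLOB', 'BLOB', 'NCLOB')
               ORDER BY column_id) LOOP
    -- CTAS is a direct-path operation, so the copy is stored compressed
    EXECUTE IMMEDIATE 'CREATE TABLE ' || c_tmp || ' PCTFREE 0 COMPRESS AS SELECT * FROM '
                      || c_master || ' ORDER BY "' || col.column_name || '"';

    -- Count the distinct (file, block) pairs that actually contain rows
    EXECUTE IMMEDIATE 'SELECT COUNT(DISTINCT DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) || ''.'' || '
                      || 'DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)) FROM ' || c_tmp
      INTO v_blocks;

    DBMS_OUTPUT.PUT_LINE(RPAD(col.column_name, 31) || v_blocks || ' blocks');
    EXECUTE IMMEDIATE 'DROP TABLE ' || c_tmp;
  END LOOP;

  EXECUTE IMMEDIATE 'DROP TABLE ' || c_master;
END;
/
```

The column with the lowest block count is the best single ORDER BY candidate. Ordering by an access-path prefix first, as the procedure's prefix parameters allow, would mean prepending those columns to the ORDER BY.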
    17. Pros & Cons
       • Pros
         ◦ Saves space
           ▪ Reduces LIO / PIO
           ▪ Speeds up backup/recovery
           ▪ Improves query response time
         ◦ Transparent
           ▪ To readers
           ▪ …and writers
         ◦ Decreases time to perform some DML
           ▪ Deletes should be quicker
           ▪ Bulk inserts may be quicker
    18. Pros & Cons
       • Cons
         ◦ Increases CPU load
         ◦ Data is only stored compressed by direct-path operations
           ▪ CTAS
           ▪ Serial inserts using INSERT /*+ APPEND */
           ▪ Parallel inserts (PDML)
           ▪ ALTER TABLE…MOVE…
           ▪ Direct-path SQL*Loader
         ◦ Increases time to perform some DML
           ▪ Bulk inserts may be slower
           ▪ Updates are slower
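The following minimal sketch shows the direct-path operations listed above. It assumes a hypothetical table SALES_HIST created with COMPRESS and a hypothetical source table SALES_STAGE with the same columns. The conventional INSERT is included only for contrast, because its rows are stored uncompressed even though the table is marked COMPRESS.

```sql
-- Conventional insert: rows land uncompressed despite the table attribute
INSERT INTO sales_hist SELECT * FROM sales_stage;
COMMIT;

-- Serial direct-path insert: new blocks above the high water mark are compressed
INSERT /*+ APPEND */ INTO sales_hist SELECT * FROM sales_stage;
COMMIT;  -- required before this session can read SALES_HIST again (ORA-12838 otherwise)

-- Parallel DML is also direct path
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(sales_hist, 4) */ INTO sales_hist SELECT * FROM sales_stage;
COMMIT;

-- Recompress rows that arrived through conventional DML or were later updated.
-- MOVE changes ROWIDs, so indexes on the table become UNUSABLE and need rebuilding,
-- e.g. ALTER INDEX sales_hist_ix REBUILD;  (hypothetical index name)
ALTER TABLE sales_hist MOVE COMPRESS;
```

If a direct-path insert is not possible, Oracle quietly performs a conventional insert instead. This happens, for example, when the target has enabled triggers or enabled foreign keys. In that case the rows end up uncompressed and no error is raised.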
    19. Data Warehousing Specifics
       • Star schema compresses better than normalized
         ◦ More redundant data
       • Focus on…
         ◦ Fact tables and summaries in a star schema
         ◦ Transaction tables in a normalized schema
       • Performance impact¹
         ◦ Space savings
           ▪ Star schema: 67%
           ▪ Normalized: 24%
         ◦ Query elapsed times
           ▪ Star schema: 16.5%
           ▪ Normalized: 10%
       ¹ Table Compression in Oracle 9iR2: A Performance Analysis
    20. Things To Watch Out For
       • DROP COLUMN is awkward
         ◦ ORA-39726: Unsupported add/drop column operation on compressed tables
         ◦ Uncompress the table and try again: still gives ORA-39726!
       • After UPDATEs, data is uncompressed
         ◦ Performance impact
         ◦ Row migration
       • Use appropriate physical design settings
         ◦ PCTFREE 0: pack each block
         ◦ Large block size: reduce overhead / increase repeats per block
         ◦ Minimise INITRANS: reduce overhead
       • Order data for best compression / access path
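The physical design settings above can be combined in one piece of DDL, sketched below. The tablespace name, datafile path and cache size are hypothetical. The sketch assumes the database's default block size is not already 16K, because setting DB_16K_CACHE_SIZE would then fail. It also assumes the SGA has room for the extra cache. The tablespace uses uniform local extents without ASSM, matching the conditions of the compression tests on the earlier slides.

```sql
-- A 16K buffer cache must exist before a 16K-block tablespace can be created
ALTER SYSTEM SET db_16k_cache_size = 64M;

-- Hypothetical tablespace and datafile path
CREATE TABLESPACE ts_fact_16k
  DATAFILE '/u01/oradata/DWH/ts_fact_16k_01.dbf' SIZE 1G
  BLOCKSIZE 16K
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M
  SEGMENT SPACE MANAGEMENT MANUAL;

-- Hypothetical fact table laid out for compression
CREATE TABLE sales_fact_c (
  sales_date   DATE,
  customer_id  NUMBER,
  contact_type VARCHAR2(10),
  sales        NUMBER
)
TABLESPACE ts_fact_16k
PCTFREE 0    -- pack each block: rows are loaded once and not expected to be updated
INITRANS 1   -- keep ITL overhead to a minimum
COMPRESS;
```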
    21. A Funny Thing…
       • Block dump trace files still show 9iR2 even in 10g releases…
       • ALTER SYSTEM DUMP DATAFILE x BLOCK y;
       Thanks to Julian Dyke for the block dumping information: http://www.juliandyke.com
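Finding the x and y values for the dump command can be sketched as follows. The sketch assumes a hypothetical compressed table SALES_HIST in your own schema, plus the ALTER SYSTEM privilege and access to V$PARAMETER. The file and block numbers in the DUMP statement are placeholders; substitute the values returned by the first query.

```sql
-- 1. Absolute file number and block number holding one row of the table
SELECT DBMS_ROWID.ROWID_TO_ABSOLUTE_FNO(ROWID, USER, 'SALES_HIST') AS file_no,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)                       AS block_no
  FROM sales_hist
 WHERE ROWNUM = 1;

-- 2. Dump that block (4 and 1234 are placeholders for the values returned above)
ALTER SYSTEM DUMP DATAFILE 4 BLOCK 1234;

-- 3. The dump is written to a trace file in this directory
SELECT value FROM v$parameter WHERE name = 'user_dump_dest';
```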
    22. What Is Partitioning?
       • "Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions." (Oracle Database Concepts Manual, 10gR2)
       • Introduced in Oracle 8.0
       • Numerous improvements since
       • Subpartitioning adds another level of decomposition
       • Partitions and subpartitions are logical containers
    23. Partition To Tablespace Mapping
       • Partitions map to tablespaces
         ◦ A partition can only be in one tablespace
         ◦ A tablespace can hold many partitions
         ◦ Highest granularity is one tablespace per partition
         ◦ Lowest granularity is one tablespace for all the partitions
       • Tablespace volatility
         ◦ Read / write
         ◦ Read only
       [Diagram: monthly partitions P_JAN_2005 to P_MAR_2006 mapped three per tablespace onto quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace shown as either Read / Write or Read Only]
    24. Read Only Tablespaces
       • Quicker checkpointing
       • Quicker backup
       • Quicker recovery
       • Reduced space use via compression
       • But…
       • …depends on granularity…
       [Diagram: partition to tablespace granularity]
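The following minimal sketch combines the quarterly mapping and the read-only approach above. It assumes the hypothetical tablespaces T_Q1_2005 and T_Q2_2005 already exist and have enough free space for each partition to be rebuilt in place. The sketch maps monthly partitions onto those tablespaces. Once a quarter is closed, it compresses that quarter's partitions and then makes the tablespace read only.

```sql
CREATE TABLE sales_fact (
  sales_date  DATE NOT NULL,
  customer_id NUMBER,
  sales       NUMBER
)
PARTITION BY RANGE (sales_date) (
  PARTITION p_jan_2005 VALUES LESS THAN (DATE '2005-02-01') TABLESPACE t_q1_2005,
  PARTITION p_feb_2005 VALUES LESS THAN (DATE '2005-03-01') TABLESPACE t_q1_2005,
  PARTITION p_mar_2005 VALUES LESS THAN (DATE '2005-04-01') TABLESPACE t_q1_2005,
  PARTITION p_apr_2005 VALUES LESS THAN (DATE '2005-05-01') TABLESPACE t_q2_2005,
  PARTITION p_may_2005 VALUES LESS THAN (DATE '2005-06-01') TABLESPACE t_q2_2005,
  PARTITION p_jun_2005 VALUES LESS THAN (DATE '2005-07-01') TABLESPACE t_q2_2005
);

-- Q1 2005 is closed: rebuild its partitions compressed (MOVE is a direct-path operation)
ALTER TABLE sales_fact MOVE PARTITION p_jan_2005 COMPRESS;
ALTER TABLE sales_fact MOVE PARTITION p_feb_2005 COMPRESS;
ALTER TABLE sales_fact MOVE PARTITION p_mar_2005 COMPRESS;
-- Rebuild any index partitions the moves left UNUSABLE before freezing the tablespace,
-- e.g. ALTER INDEX sales_fact_cust_ix REBUILD PARTITION p_jan_2005;  (hypothetical index)

-- Freeze the quarter: later backups can skip it
ALTER TABLESPACE t_q1_2005 READ ONLY;
```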
    25. Why Partition? Performance
       • Improved query performance
         ◦ Pruning or elimination
         ◦ Partition-wise joins
           ▪ Full
           ▪ Partial
       • Selective compression
         ◦ By partition
       • Selective reorganisation
         ◦ Index partition REBUILD
         ◦ Table partition MOVE
       [Diagram*: sales fact table partitioned by month, JAN to DEC, queried with]
         SELECT SUM(sales) FROM part_tab WHERE sales_date BETWEEN DATE '2005-01-01' AND DATE '2005-06-30'
       * Oracle 10gR2 Data Warehousing Manual
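You can check whether the query above is pruned by reading its execution plan. This sketch assumes that PART_TAB is range partitioned by month on SALES_DATE and that a PLAN_TABLE is available. The plan should contain a PARTITION RANGE step whose Pstart/Pstop columns cover only the January to June partitions.

```sql
EXPLAIN PLAN FOR
SELECT SUM(sales)
  FROM part_tab
 WHERE sales_date BETWEEN DATE '2005-01-01' AND DATE '2005-06-30';

-- Pstart / Pstop show which partitions will actually be read
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If SALES_DATE carries a time component, the BETWEEN condition stops at midnight on 30 June. The predicate `sales_date >= DATE '2005-01-01' AND sales_date < DATE '2005-07-01'` covers the whole of the last day and prunes to the same partitions.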
    26. Why Partition? Manageability
       • Archiving
         ◦ Use a rolling window approach
         ◦ ALTER TABLE … ADD/SPLIT/DROP PARTITION…
       • Easier ETL processing
         ◦ Build a new dataset in a staging table
         ◦ Add indexes and constraints
         ◦ Collect statistics
         ◦ Then swap the staging table for a partition on the target
           ▪ ALTER TABLE…EXCHANGE PARTITION…
       • Easier maintenance
         ◦ Table partition move, e.g. to compress data
         ◦ Local index partition rebuild
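The rolling-window approach above can be sketched self-contained as follows, using a hypothetical table SALES_FACT_RW that has a catch-all MAXVALUE partition. A new month is carved out with SPLIT. Without a MAXVALUE partition, ALTER TABLE … ADD PARTITION does the same job. The oldest month is then dropped.

```sql
-- Hypothetical monthly table with a catch-all MAXVALUE partition
CREATE TABLE sales_fact_rw (
  sales_date DATE NOT NULL,
  sales      NUMBER
)
PARTITION BY RANGE (sales_date) (
  PARTITION p_jan_2005 VALUES LESS THAN (DATE '2005-02-01'),
  PARTITION p_feb_2005 VALUES LESS THAN (DATE '2005-03-01'),
  PARTITION p_max      VALUES LESS THAN (MAXVALUE)
);

-- Roll forward: carve next month out of the MAXVALUE partition
ALTER TABLE sales_fact_rw
  SPLIT PARTITION p_max AT (DATE '2005-04-01')
  INTO (PARTITION p_mar_2005, PARTITION p_max);

-- Roll off: drop the oldest month, keeping any global indexes usable
ALTER TABLE sales_fact_rw DROP PARTITION p_jan_2005 UPDATE GLOBAL INDEXES;
```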
    27. Why Partition? Scalability
       • Partitioning is generally consistent and predictable
         ◦ Assuming an appropriate partitioning key is used
         ◦ …and data has an even distribution across the key
       • Read only approach
         ◦ Scalable backups: read only tablespaces are ignored
         ◦ …so partitions in those tablespaces are ignored
       • Pruning allows consistent query performance
    28. Why Partition? Availability
       • Offline data impact minimised
         ◦ …depending on granularity
         ◦ Quicker recovery
         ◦ Pruned data not missed
         ◦ EXCHANGE PARTITION
           ▪ Allows offline build
           ▪ Quick swap over
       [Diagram: monthly partitions P_JAN_2005 to P_MAR_2006 mapped onto quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace shown as either Read / Write or Read Only]
    29. Fact Table Partitioning: Transaction Date vs Load Date
       • Partition by load date
         ◦ Easier ETL processing
           ▪ Each load deals with only 1 partition
         ◦ No use to end user queries!
           ▪ Can't prune: full scans!
       • Partition by transaction date
         ◦ Harder ETL processing
           ▪ But still uses EXCHANGE PARTITION
         ◦ Useful to end user queries
           ▪ Allows full pruning capability
       Example rows and the partition each one lands in:
         Tran Date    Customer    Load Date    By Load Date  By Tran Date
         07-JAN-2005  Customer 1  09-JAN-2005  January       January
         15-JAN-2005  Customer 2  17-JAN-2005  January       January
         21-JAN-2005  Customer 7  04-APR-2005  April         January
         22-JAN-2005  Customer 3  01-FEB-2005  February      January
         02-FEB-2005  Customer 4  05-FEB-2005  February      February
         26-FEB-2005  Customer 5  28-FEB-2005  February      February
         06-MAR-2005  Customer 2  07-MAR-2005  March         March
         12-MAR-2005  Customer 3  15-MAR-2005  March         March
         09-APR-2005  Customer 9  10-APR-2005  April         April
       Late-arriving rows such as Customer 7 (transacted 21-JAN-2005, loaded 04-APR-2005) stay in the current load's partition under load-date partitioning, but must be written back into the January partition under transaction-date partitioning.
    30. Watch Out For…
       • Partition exchange and table statistics¹
         ◦ Partition stats updated
         ◦ …but global stats are NOT!
         ◦ Affects queries accessing multiple partitions
         ◦ Solution
           ▪ Gather stats on staging table prior to EXCHANGE
           ▪ Partition exchange
           ▪ Gather stats on partitioned table using GLOBAL
       ¹ Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
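The solution steps above, together with the staging-table flow from the manageability slide, might look like the following self-contained sketch. The table and index names and the generated rows are hypothetical. The sketch assumes Oracle 10g for the CONNECT BY row generator. WITHOUT VALIDATION is safe here only because every staged row is known to fall in March 2006.

```sql
-- Hypothetical target with one local index
CREATE TABLE sales_fact_x (
  sales_date  DATE NOT NULL,
  customer_id NUMBER,
  sales       NUMBER
)
PARTITION BY RANGE (sales_date) (
  PARTITION p_feb_2006 VALUES LESS THAN (DATE '2006-03-01'),
  PARTITION p_mar_2006 VALUES LESS THAN (DATE '2006-04-01')
);
CREATE INDEX sales_fact_x_cust_ix ON sales_fact_x (customer_id) LOCAL;

-- 1. Build the new month offline in a staging table of the same shape, then index it
CREATE TABLE sales_stage AS
SELECT sales_date, customer_id, sales FROM sales_fact_x WHERE 1 = 0;

INSERT /*+ APPEND */ INTO sales_stage
SELECT DATE '2006-03-01' + MOD(LEVEL, 31), MOD(LEVEL, 100), LEVEL
  FROM dual
CONNECT BY LEVEL <= 1000;
COMMIT;

CREATE INDEX sales_stage_cust_ix ON sales_stage (customer_id);

-- 2. Gather stats on the staging table before the exchange
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SALES_STAGE', cascade => TRUE);
END;
/

-- 3. Swap the staging table into the partition (a data dictionary operation)
ALTER TABLE sales_fact_x
  EXCHANGE PARTITION p_mar_2006 WITH TABLE sales_stage
  INCLUDING INDEXES WITHOUT VALIDATION;

-- 4. The partition stats came across with the segment; the global stats did not
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SALES_FACT_X', granularity => 'GLOBAL');
END;
/
```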
    31. Partitioning Feature: Characteristic Reason Matrix
       [Matrix: which characteristics each partitioning feature supports]
       • Features: partition truncation, exchange partition, archiving, pruning (partition elimination), partition-wise joins, parallel DML, local indexes, read only partitions
       • Characteristics: availability, scalability, manageability, performance
    32. Questions?
    33. References: Papers
       • Table Compression in Oracle 9iR2: A Performance Analysis
       • Table Compression in Oracle 9iR2: An Oracle White Paper
       • "Scaling To Infinity: Partitioning In Oracle Data Warehouses", Tim Gorman
       • Decision Speed: Table Compression In Action
    34. References: Online Presentation / Code
       • http://www.oramoss.demon.co.uk/presentations/stackitandpackit.ppt
       • http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc
       • http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql
       • http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql
       • http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql
       • http://www.oramoss.demon.co.uk/Code/test_block_size_compression.sql
       • http://www.oramoss.demon.co.uk/Code/test_column_length_compression.sql
       • http://www.oramoss.demon.co.uk/Code/test_itl_compression.sql
       • http://www.oramoss.demon.co.uk/Code/test_ndv_compression.sql
       • http://www.oramoss.demon.co.uk/Code/test_num_cols_compression.sql
       • http://www.oramoss.demon.co.uk/Code/test_pctfree_compression.sql
