Stack It And Unpack It

Partitioning and Compression for Datawarehouses.

Published in: Technology

  1. 1. Stack It & Pack It: Partitioning And Compression For Warehouses / VLDB, by Jeff Moss
  2. 2. Who Dunnit ?
  3. 3. Agenda <ul><li>My background </li></ul><ul><li>Squeeze your data with data segment compression </li></ul><ul><li>Partition for success </li></ul><ul><li>Questions </li></ul>
  4. 4. My Background <ul><li>Independent Consultant </li></ul><ul><li>13 years Oracle experience </li></ul><ul><li>Blog: http://oramossoracle.blogspot.com/ </li></ul><ul><li>Focused on warehousing / VLDB since 1998 </li></ul><ul><li>First project </li></ul><ul><ul><li>UK Music Sales Data Mart </li></ul></ul><ul><ul><li>Produces BBC Radio 1 Top 40 chart and many more </li></ul></ul><ul><ul><li>2 billion row sales fact table </li></ul></ul><ul><ul><li>1 TB total database size </li></ul></ul><ul><li>Currently working with Eon UK (Powergen) </li></ul><ul><ul><li>4 TB Production Warehouse, 8 TB total storage </li></ul></ul><ul><ul><li>Oracle Product Stack </li></ul></ul>
  5. 5. What Is Data Segment Compression ? <ul><li>Compresses data by eliminating intra block repeated column values </li></ul><ul><li>Reduces the space required for a segment </li></ul><ul><ul><li>…but only if there are appropriate repeats! </li></ul></ul><ul><li>Self contained </li></ul><ul><li>Lossless algorithm </li></ul>
  6. 6. Where Can Data Segment Compression Be Used ? <ul><li>Can be used with a number of segment types </li></ul><ul><ul><li>Heap & Nested Tables </li></ul></ul><ul><ul><li>Range or List Partitions </li></ul></ul><ul><ul><li>Materialized Views </li></ul></ul><ul><li>Can’t be used with </li></ul><ul><ul><li>Subpartitions </li></ul></ul><ul><ul><li>Hash Partitions </li></ul></ul><ul><ul><li>Indexes – but they have row level compression </li></ul></ul><ul><ul><li>IOT </li></ul></ul><ul><ul><li>External Tables </li></ul></ul><ul><ul><li>Tables that are part of a Cluster </li></ul></ul><ul><ul><li>LOBs </li></ul></ul>
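  7. 7. How Does Segment Compression Work ? <ul><li>Database block layout </li></ul><ul><ul><li>Block Common Header (20 bytes) </li></ul></ul><ul><ul><li>Transaction Header (24 bytes fixed + 24 bytes per ITL) </li></ul></ul><ul><ul><li>Data Header (14 bytes) </li></ul></ul><ul><ul><li>Compressed Data Header (16 bytes, variable) </li></ul></ul><ul><ul><li>Table Directory (8 bytes) </li></ul></ul><ul><ul><li>Row Directory (2 bytes per row) </li></ul></ul><ul><ul><li>Symbol Table </li></ul></ul><ul><ul><li>Row Data Area </li></ul></ul><ul><ul><li>Tail (4 bytes) </li></ul></ul><ul><li>Example rows (ID, DESCRIPTION, CONTACT TYPE, OUTCOME, FOLLOWUP) </li></ul><ul><ul><li>100, Call to discuss bill amount, TEL, NO, YES </li></ul></ul><ul><ul><li>101, Call to discuss new product, MAIL, NO, N/A </li></ul></ul><ul><ul><li>102, Call to discuss new product, TEL, YES, N/A </li></ul></ul><ul><li>Symbol Table contents </li></ul><ul><ul><li>1 = 100, 2 = Call to discuss bill amount, 3 = TEL, 4 = NO, 5 = YES </li></ul></ul><ul><ul><li>6 = 101, 7 = Call to discuss new product, 8 = MAIL, 9 = N/A </li></ul></ul><ul><ul><li>10 = 102 </li></ul></ul><ul><li>Row Data Area (rows stored as symbol references) </li></ul><ul><ul><li>Row 100: 1 2 3 4 5 </li></ul></ul><ul><ul><li>Row 101: 6 7 8 4 9 </li></ul></ul><ul><ul><li>Row 102: 10 7 3 5 9 </li></ul></ul>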
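The symbol-table idea on this slide can be sketched in a few lines of Python. This is only a toy model of the slide's simplified picture, in which every value in the block gets a symbol; the real algorithm is undisclosed, and the function name `compress_block` is made up for illustration.

```python
def compress_block(rows):
    """Replace each column value in one block with a reference to a symbol table.

    Toy model only: symbols are numbered from 1 in order of first appearance,
    which is how the slide's example is laid out.
    """
    symbols = {}       # value -> symbol number
    compressed = []    # one list of symbol references per row
    for row in rows:
        refs = []
        for value in row:
            if value not in symbols:
                symbols[value] = len(symbols) + 1
            refs.append(symbols[value])
        compressed.append(refs)
    return symbols, compressed


# The three example rows from the slide.
rows = [
    (100, "Call to discuss bill amount", "TEL", "NO", "YES"),
    (101, "Call to discuss new product", "MAIL", "NO", "N/A"),
    (102, "Call to discuss new product", "TEL", "YES", "N/A"),
]
symbols, compressed = compress_block(rows)
for refs in compressed:
    print(refs)
```

Fifteen column values are stored as ten symbols plus fifteen short references; the saving comes from the five repeated values, and it grows with the length of the repeated strings.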
  8. 8. What Affects Compression ? <ul><li>Undisclosed Algorithm </li></ul><ul><ul><li>I asked but support wouldn’t play ball! </li></ul></ul><ul><li>Many Factors </li></ul><ul><ul><li>Block size </li></ul></ul><ul><ul><li>Anything which affects block overhead </li></ul></ul><ul><ul><ul><li>Interested Transaction Lists ( INITRANS ) </li></ul></ul></ul><ul><ul><ul><li>Number of columns </li></ul></ul></ul><ul><ul><ul><li>Number of rows </li></ul></ul></ul><ul><ul><ul><li>PCTFREE </li></ul></ul></ul><ul><ul><li>Number of repeats ( in the block ) </li></ul></ul><ul><ul><li>Length of column value(s) </li></ul></ul>
  9. 9. Compression v Block Size <ul><li>200K rows, Non ASSM Uniform Local extents </li></ul><ul><li>Larger block size = more chance of repeats in any given block </li></ul>
  10. 10. Compression v ITL <ul><li>10K rows, Non ASSM Uniform Local extents </li></ul><ul><li>More ITL = more overhead = fewer repeats </li></ul>
  11. 11. Compression v Number Of Columns <ul><li>500K rows, Non ASSM Uniform Local extents </li></ul><ul><li>Same amount of data to store </li></ul><ul><li>More columns = more overhead = fewer repeats </li></ul>
  12. 12. Compression v PCTFREE <ul><li>200K rows, Non ASSM Uniform Local extents </li></ul><ul><li>Higher PCTFREE = less space = fewer repeats </li></ul>
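To put rough numbers on the overhead argument, the header sizes quoted on slide 7 can be added up for different INITRANS and PCTFREE settings. This is a back-of-envelope sketch: treating PCTFREE as a percentage of the whole block, the 8192-byte block and the 100 rows per block are all assumptions made for illustration, and the function names are hypothetical.

```python
# Header sizes quoted on slide 7. The compressed data header is variable,
# so 16 bytes is only the figure shown there.
COMMON_HEADER = 20
TRANSACTION_FIXED = 24
TRANSACTION_PER_ITL = 24
DATA_HEADER = 14
COMPRESSED_DATA_HEADER = 16
TABLE_DIRECTORY = 8
ROW_DIRECTORY_PER_ROW = 2
TAIL = 4


def block_overhead(initrans, rows):
    """Bytes of block overhead for a given number of ITL slots and rows."""
    return (COMMON_HEADER + TRANSACTION_FIXED + TRANSACTION_PER_ITL * initrans
            + DATA_HEADER + COMPRESSED_DATA_HEADER + TABLE_DIRECTORY
            + ROW_DIRECTORY_PER_ROW * rows + TAIL)


def usable_bytes(block_size, initrans, rows, pctfree):
    """Bytes left for symbol table and row data (assumption: PCTFREE is
    reserved as a simple percentage of the whole block)."""
    reserved = block_size * pctfree // 100
    return block_size - block_overhead(initrans, rows) - reserved


for initrans in (1, 10):
    for pctfree in (0, 10):
        print(initrans, pctfree, usable_bytes(8192, initrans, 100, pctfree))
```

Under these assumptions, going from INITRANS 1 / PCTFREE 0 to INITRANS 10 / PCTFREE 10 costs a little over 1,000 bytes of an 8K block, which is space that could otherwise hold repeats.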
  13. 13. Compression v NDV <ul><li>200K rows, Non ASSM Uniform Local extents </li></ul><ul><li>Higher NDV (number of distinct values) = fewer repeats </li></ul>
  14. 14. Compression v Column Length <ul><li>80K rows, Non ASSM Uniform Local extents </li></ul><ul><li>Minimum 6 characters for compression </li></ul><ul><li>Longer Length = more compression savings </li></ul>
  15. 15. Compression v Ordering <ul><li>Colocate data to maximise compression benefits </li></ul><ul><li>For maximum compression </li></ul><ul><ul><li>Minimise the total space required by the segment </li></ul></ul><ul><ul><li>Identify most “compressible” column(s) </li></ul></ul><ul><li>For optimal access </li></ul><ul><ul><li>We know how the data is to be queried </li></ul></ul><ul><ul><li>Order the data by </li></ul></ul><ul><ul><ul><li>Access path columns </li></ul></ul></ul><ul><ul><ul><li>Then the next most “compressible” column(s) </li></ul></ul></ul><ul><li>Uniformly distributed: 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 </li></ul><ul><li>Colocated: 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 </li></ul>
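The effect of ordering can be checked with the slide's own twenty values. The sketch below counts, block by block, how many values repeat something already in the same block; four rows per block is an assumption chosen to keep the example small, and `repeats_per_block` is a hypothetical helper.

```python
def repeats_per_block(values, rows_per_block):
    """Total number of values that duplicate another value in the same block."""
    total = 0
    for start in range(0, len(values), rows_per_block):
        block = values[start:start + rows_per_block]
        total += len(block) - len(set(block))
    return total


uniform = [1, 2, 3, 4, 5] * 4     # 1 2 3 4 5 1 2 3 4 5 ...
colocated = sorted(uniform)       # 1 1 1 1 2 2 2 2 ...

print("uniform:", repeats_per_block(uniform, 4))
print("colocated:", repeats_per_block(colocated, 4))
```

With the uniform layout no block contains a repeat, so there is nothing to compress; with the colocated layout every block holds a single distinct value.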
  16. 16. Get Max Compression Order Package <ul><ul><li>PROCEDURE mgmt_p_get_max_compress_order </li></ul></ul><ul><ul><li>Argument Name Type In/Out Default? </li></ul></ul><ul><ul><li>------------------------------ ----------------------- ------ -------- </li></ul></ul><ul><ul><li>P_TABLE_OWNER VARCHAR2 IN DEFAULT </li></ul></ul><ul><ul><li>P_TABLE_NAME VARCHAR2 IN </li></ul></ul><ul><ul><li>P_PARTITION_NAME VARCHAR2 IN DEFAULT </li></ul></ul><ul><ul><li>P_SAMPLE_SIZE NUMBER IN DEFAULT </li></ul></ul><ul><ul><li>P_PREFIX_COLUMN1 VARCHAR2 IN DEFAULT </li></ul></ul><ul><ul><li>P_PREFIX_COLUMN2 VARCHAR2 IN DEFAULT </li></ul></ul><ul><ul><li>P_PREFIX_COLUMN3 VARCHAR2 IN DEFAULT </li></ul></ul><ul><ul><li>BEGIN </li></ul></ul><ul><ul><li>mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT' </li></ul></ul><ul><ul><li>,p_table_name => 'BIG_TABLE' </li></ul></ul><ul><ul><li>,p_sample_size => 10000); </li></ul></ul><ul><ul><li>END; </li></ul></ul><ul><ul><li>/ </li></ul></ul><ul><li>Output </li></ul><ul><ul><li>Running mgmt_p_get_max_compress_order... </li></ul></ul><ul><ul><li>Table : BIG_TABLE </li></ul></ul><ul><ul><li>Sample Size : 10000 </li></ul></ul><ul><ul><li>Unique Run ID: 25012006232119 </li></ul></ul><ul><ul><li>ORDER BY Prefix: </li></ul></ul><ul><ul><li>Creating MASTER Table : TEMP_MASTER_25012006232119 </li></ul></ul><ul><ul><li>Creating COLUMN Table 1: COL1 </li></ul></ul><ul><ul><li>Creating COLUMN Table 2: COL2 </li></ul></ul><ul><ul><li>Creating COLUMN Table 3: COL3 </li></ul></ul><ul><ul><li>The output below lists each column in the table and the number of blocks/rows and space used when the table data is ordered by only that column, or in the case where a prefix has been specified, where the table data is ordered by the prefix and then that column. </li></ul></ul><ul><ul><li>From this one can determine if there is a specific ORDER BY which can be applied to the data in order to maximise compression within the table whilst, in the case of a prefix being present, ordering data as efficiently as possible for the most common access path(s). </li></ul></ul><ul><li>Results (NAME, COLUMN, BLOCKS, ROWS, SPACE_GB) </li></ul><ul><ul><li>TEMP_COL_001_25012006232119, COL1, 290, 10000, .0022 </li></ul></ul><ul><ul><li>TEMP_COL_002_25012006232119, COL2, 345, 10000, .0026 </li></ul></ul><ul><ul><li>TEMP_COL_003_25012006232119, COL3, 555, 10000, .0042 </li></ul></ul>
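The procedure's source is linked in the references rather than shown on the slide, so what follows is only a toy model of the approach its output describes: order a sample by each column in turn, measure how well the result would compress, and rank the columns. Here the "size" is a stand-in metric (distinct column values per block, summed over blocks), and the sample data, column names and function names are all made up for illustration.

```python
def symbol_count(rows, rows_per_block):
    """Distinct (column, value) pairs per block, summed over all blocks.

    A stand-in for compressed size: fewer distinct values per block means
    more repeats for the symbol table to absorb.
    """
    total = 0
    for start in range(0, len(rows), rows_per_block):
        block = rows[start:start + rows_per_block]
        total += len({(i, v) for row in block for i, v in enumerate(row)})
    return total


def rank_order_by(rows, column_names, rows_per_block):
    """Try ordering by each column alone and rank columns, best first."""
    results = []
    for idx, name in enumerate(column_names):
        ordered = sorted(rows, key=lambda r: r[idx])
        results.append((symbol_count(ordered, rows_per_block), name))
    return sorted(results)


# Hypothetical 16-row sample: unique ID, two contact types, four descriptions.
sample = [(i, "TEL" if i % 2 else "MAIL", "long description %d" % (i % 4))
          for i in range(16)]
ranking = rank_order_by(sample, ["ID", "CONTACT", "DESCRIPTION"], rows_per_block=4)
for cost, name in ranking:
    print(name, cost)
```

In this sample, ordering by the description column wins because it also happens to colocate the contact type, which is the kind of interaction that is hard to guess and easy to measure.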
  17. 17. Pros & Cons <ul><li>Pros </li></ul><ul><ul><li>Saves space </li></ul></ul><ul><ul><ul><li>Reduces LIO / PIO </li></ul></ul></ul><ul><ul><ul><li>Speeds up backup/recovery </li></ul></ul></ul><ul><ul><ul><li>Improves query response time </li></ul></ul></ul><ul><ul><li>Transparent </li></ul></ul><ul><ul><ul><li>To readers </li></ul></ul></ul><ul><ul><ul><li>…and writers </li></ul></ul></ul><ul><ul><li>Decreases time to perform some DML </li></ul></ul><ul><ul><ul><li>Deletes should be quicker </li></ul></ul></ul><ul><ul><ul><li>Bulk inserts may be quicker </li></ul></ul></ul>
  18. 18. Pros & Cons <ul><li>Cons </li></ul><ul><ul><li>Increases CPU load </li></ul></ul><ul><ul><li>Can only be used on Direct Path operations </li></ul></ul><ul><ul><ul><li>CTAS </li></ul></ul></ul><ul><ul><ul><li>Serial Inserts using INSERT /*+ APPEND */ </li></ul></ul></ul><ul><ul><ul><li>Parallel Inserts (PDML) </li></ul></ul></ul><ul><ul><ul><li>ALTER TABLE…MOVE… </li></ul></ul></ul><ul><ul><ul><li>Direct Path SQL*Loader </li></ul></ul></ul><ul><ul><li>Increases time to perform some DML </li></ul></ul><ul><ul><ul><li>Bulk inserts may be slower </li></ul></ul></ul><ul><ul><ul><li>Updates are slower </li></ul></ul></ul>
  19. 19. Data Warehousing Specifics <ul><li>Star Schema compresses better than Normalized </li></ul><ul><ul><li>More redundant data </li></ul></ul><ul><li>Focus on… </li></ul><ul><ul><li>Fact Tables and Summaries in Star Schema </li></ul></ul><ul><ul><li>Transaction tables in Normalized Schema </li></ul></ul><ul><li>Performance Impact [1] </li></ul><ul><ul><li>Space Savings </li></ul></ul><ul><ul><ul><li>Star schema: 67% </li></ul></ul></ul><ul><ul><ul><li>Normalized: 24% </li></ul></ul></ul><ul><ul><li>Query Elapsed Times </li></ul></ul><ul><ul><ul><li>Star schema: 16.5% </li></ul></ul></ul><ul><ul><ul><li>Normalized: 10% </li></ul></ul></ul>[1] Table Compression in Oracle 9iR2: A Performance Analysis
  20. 20. Things To Watch Out For <ul><li>DROP COLUMN is awkward </li></ul><ul><ul><li>ORA-39726: Unsupported add/drop column operation on compressed tables </li></ul></ul><ul><ul><li>Uncompress the table and try again - still gives ORA-39726! </li></ul></ul><ul><li>After UPDATEs data is uncompressed </li></ul><ul><ul><li>Performance impact </li></ul></ul><ul><ul><li>Row migration </li></ul></ul><ul><li>Use appropriate physical design settings </li></ul><ul><ul><li>PCTFREE 0 - pack each block </li></ul></ul><ul><ul><li>Large blocksize - reduce overhead / increase repeats per block </li></ul></ul><ul><ul><li>Minimise INITRANS - reduce overhead </li></ul></ul><ul><li>Order data for best compression / access path </li></ul>
  21. 21. A Funny Thing… <ul><li>Block dump trace files still show 9iR2 even in 10g releases… </li></ul><ul><li>ALTER SYSTEM DUMP DATAFILE x BLOCK y; </li></ul>Thanks to Julian Dyke for the block dumping information – http://www.juliandyke.com
  22. 22. What Is Partitioning ? <ul><li>“Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions.” Oracle Database Concepts Manual, 10gR2 </li></ul><ul><li>Introduced in Oracle 8.0 </li></ul><ul><li>Numerous improvements since </li></ul><ul><li>Subpartitioning adds another level of decomposition </li></ul><ul><li>Partitions and Subpartitions are logical containers </li></ul>
  23. 23. Partition To Tablespace Mapping <ul><li>Partitions map to tablespaces </li></ul><ul><ul><li>Partition can only be in one tablespace </li></ul></ul><ul><ul><li>Tablespace can hold many partitions </li></ul></ul><ul><ul><li>Highest granularity is one tablespace per partition </li></ul></ul><ul><ul><li>Lowest granularity is one tablespace for all the partitions </li></ul></ul><ul><li>Tablespace volatility </li></ul><ul><ul><li>Read / Write </li></ul></ul><ul><ul><li>Read Only </li></ul></ul>[Diagram: monthly partitions P_JAN_2005 to P_MAR_2006 mapped to quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
  24. 24. Read Only Tablespaces <ul><li>Quicker checkpointing </li></ul><ul><li>Quicker backup </li></ul><ul><li>Quicker recovery </li></ul><ul><li>Reduced space use via compression </li></ul><ul><li>But… </li></ul><ul><li>… depends on granularity… </li></ul>
  25. 25. Why Partition ? - Performance <ul><li>Improved query performance </li></ul><ul><ul><li>Pruning or elimination </li></ul></ul><ul><ul><li>Partition wise joins </li></ul></ul><ul><ul><ul><li>Full </li></ul></ul></ul><ul><ul><ul><li>Partial </li></ul></ul></ul><ul><li>Selective Compression </li></ul><ul><ul><li>By Partition </li></ul></ul><ul><li>Selective Reorganisation </li></ul><ul><ul><li>Index Partition REBUILD </li></ul></ul><ul><ul><li>Table Partition MOVE </li></ul></ul>Example query: SELECT SUM(sales) FROM part_tab WHERE sales_date BETWEEN DATE '2005-01-01' AND DATE '2005-06-30' [Diagram: Sales Fact Table with monthly partitions JAN to DEC] * Oracle 10gR2 Data Warehousing Manual
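Pruning for the example query on this slide can be modelled as a simple range-overlap test. This is a sketch of the idea rather than of how the optimizer implements it; it assumes one partition per calendar month of 2005, and `partitions_for_range` is a hypothetical name.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def partitions_for_range(start, end, year=2005):
    """Return the monthly partitions whose date range overlaps [start, end]."""
    selected = []
    for month in range(1, 13):
        first = date(year, month, 1)
        # First day of the following month is the exclusive upper bound.
        next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        if first <= end and start < next_first:
            selected.append(MONTHS[month - 1])
    return selected


# The slide's query: sales_date between 1 January and 30 June 2005.
touched = partitions_for_range(date(2005, 1, 1), date(2005, 6, 30))
print(touched)
```

Only six of the twelve partitions need to be read; the other six are eliminated from the predicate alone, before any data is touched.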
  26. 26. Why Partition ? - Manageability <ul><li>Archiving </li></ul><ul><ul><li>Use a rolling window approach </li></ul></ul><ul><ul><li>ALTER TABLE … ADD/SPLIT/DROP PARTITION… </li></ul></ul><ul><li>Easier ETL Processing </li></ul><ul><ul><li>Build a new dataset in a staging table </li></ul></ul><ul><ul><li>Add indexes and constraints </li></ul></ul><ul><ul><li>Collect statistics </li></ul></ul><ul><ul><li>Then swap the staging table for a partition on the target </li></ul></ul><ul><ul><ul><li>ALTER TABLE…EXCHANGE PARTITION… </li></ul></ul></ul><ul><li>Easier Maintenance </li></ul><ul><ul><li>Table partition move, e.g. to compress data </li></ul></ul><ul><ul><li>Local Index partition rebuild </li></ul></ul>
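The rolling window approach amounts to adding the newest partition and dropping the oldest once a retention limit is exceeded. A minimal sketch, assuming partitions are kept oldest first and that retention is counted in partitions; the function name and the three-partition window are hypothetical.

```python
from collections import deque


def roll_window(partitions, new_partition, keep):
    """Add the newest partition, then drop the oldest ones beyond `keep`."""
    window = deque(partitions)
    window.append(new_partition)            # models ALTER TABLE ... ADD PARTITION
    dropped = []
    while len(window) > keep:
        dropped.append(window.popleft())    # models ALTER TABLE ... DROP PARTITION
    return list(window), dropped


window, dropped = roll_window(
    ["P_JAN_2005", "P_FEB_2005", "P_MAR_2005"], "P_APR_2005", keep=3)
print(window, dropped)
```

Both steps are metadata operations on whole partitions, which is why archiving this way avoids the large DELETE a non-partitioned table would need.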
  27. 27. Why Partition ? - Scalability <ul><li>Partition is generally consistent and predictable </li></ul><ul><ul><li>Assuming an appropriate partitioning key is used </li></ul></ul><ul><ul><li>…and data has an even distribution across the key </li></ul></ul><ul><li>Read only approach </li></ul><ul><ul><li>Scalable backups - read only tablespaces are ignored </li></ul></ul><ul><ul><li>…so partitions in those tablespaces are ignored </li></ul></ul><ul><li>Pruning allows consistent query performance </li></ul>
  28. 28. Why Partition ? - Availability <ul><li>Offline data impact minimised </li></ul><ul><ul><li>… depending on granularity </li></ul></ul><ul><ul><li>Quicker recovery </li></ul></ul><ul><ul><li>Pruned data not missed </li></ul></ul><ul><ul><li>EXCHANGE PARTITION </li></ul></ul><ul><ul><ul><li>Allows offline build </li></ul></ul></ul><ul><ul><ul><li>Quick swap over </li></ul></ul></ul>[Diagram: monthly partitions P_JAN_2005 to P_MAR_2006 in quarterly tablespaces T_Q1_2005 to T_Q1_2006, each tablespace marked Read / Write or Read Only]
  29. 29. Fact Table Partitioning <ul><li>Partition by Load Date </li></ul><ul><ul><li>Easier ETL Processing </li></ul></ul><ul><ul><ul><li>Each load deals with only 1 partition </li></ul></ul></ul><ul><ul><li>No use to end user queries! </li></ul></ul><ul><ul><ul><li>Can’t prune – Full scans! </li></ul></ul></ul><ul><li>Partition by Transaction Date </li></ul><ul><ul><li>Harder ETL Processing </li></ul></ul><ul><ul><ul><li>But still uses EXCHANGE PARTITION </li></ul></ul></ul><ul><ul><li>Useful to end user queries </li></ul></ul><ul><ul><ul><li>Allows full pruning capability </li></ul></ul></ul><ul><li>Example rows (Tran Date, Customer, Load Date) partitioned by Load Date </li></ul><ul><ul><li>January Partition: (07-JAN-2005, Customer 1, 09-JAN-2005); (15-JAN-2005, Customer 2, 17-JAN-2005) </li></ul></ul><ul><ul><li>February Partition: (22-JAN-2005, Customer 3, 01-FEB-2005); (02-FEB-2005, Customer 4, 05-FEB-2005); (26-FEB-2005, Customer 5, 28-FEB-2005) </li></ul></ul><ul><ul><li>March Partition: (06-MAR-2005, Customer 2, 07-MAR-2005); (12-MAR-2005, Customer 3, 15-MAR-2005) </li></ul></ul><ul><ul><li>April Partition: (21-JAN-2005, Customer 7, 04-APR-2005); (09-APR-2005, Customer 9, 10-APR-2005) </li></ul></ul><ul><li>The same rows partitioned by Transaction Date </li></ul><ul><ul><li>January Partition: (07-JAN-2005, Customer 1, 09-JAN-2005); (15-JAN-2005, Customer 2, 17-JAN-2005); (21-JAN-2005, Customer 7, 04-APR-2005); (22-JAN-2005, Customer 3, 01-FEB-2005) </li></ul></ul><ul><ul><li>February Partition: (02-FEB-2005, Customer 4, 05-FEB-2005); (26-FEB-2005, Customer 5, 28-FEB-2005) </li></ul></ul><ul><ul><li>March Partition: (06-MAR-2005, Customer 2, 07-MAR-2005); (12-MAR-2005, Customer 3, 15-MAR-2005) </li></ul></ul><ul><ul><li>April Partition: (09-APR-2005, Customer 9, 10-APR-2005) </li></ul></ul>
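The trade-off can be checked against the slide's nine example rows: for a query on January transactions, which monthly partitions actually hold matching rows under each scheme? This is a small sketch over that data only; `partitions_holding` is a hypothetical helper, and partitions are identified by month number.

```python
from datetime import date

# (transaction date, customer, load date), as shown on the slide.
ROWS = [
    (date(2005, 1, 7), "Customer 1", date(2005, 1, 9)),
    (date(2005, 1, 15), "Customer 2", date(2005, 1, 17)),
    (date(2005, 1, 22), "Customer 3", date(2005, 2, 1)),
    (date(2005, 2, 2), "Customer 4", date(2005, 2, 5)),
    (date(2005, 2, 26), "Customer 5", date(2005, 2, 28)),
    (date(2005, 3, 6), "Customer 2", date(2005, 3, 7)),
    (date(2005, 3, 12), "Customer 3", date(2005, 3, 15)),
    (date(2005, 1, 21), "Customer 7", date(2005, 4, 4)),
    (date(2005, 4, 9), "Customer 9", date(2005, 4, 10)),
]
TRAN_DATE, LOAD_DATE = 0, 2


def partitions_holding(rows, key_index, wanted_month):
    """Monthly partitions (by month number of the partitioning key) that hold
    rows whose transaction date falls in wanted_month."""
    return sorted({row[key_index].month
                   for row in rows if row[TRAN_DATE].month == wanted_month})


by_tran = partitions_holding(ROWS, TRAN_DATE, 1)
by_load = partitions_holding(ROWS, LOAD_DATE, 1)
print("by transaction date:", by_tran)
print("by load date:", by_load)
```

Partitioned by transaction date, January's rows sit in one partition and the rest can be pruned. Partitioned by load date, they are spread over the January, February and April partitions, and since a predicate on transaction date says nothing about load date, every partition has to be scanned to find them.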
  30. 30. Watch out for… <ul><li>Partition exchange and table statistics [1] </li></ul><ul><ul><li>Partition stats updated </li></ul></ul><ul><ul><li>… but Global stats are NOT! </li></ul></ul><ul><ul><li>Affects queries accessing multiple partitions </li></ul></ul><ul><ul><li>Solution </li></ul></ul><ul><ul><ul><li>Gather stats on staging table prior to EXCHANGE </li></ul></ul></ul><ul><ul><ul><li>Partition exchange </li></ul></ul></ul><ul><ul><ul><li>Gather stats on partitioned table using GLOBAL </li></ul></ul></ul>[1] Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2
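One reason global statistics need their own gather is that some of them cannot be derived from partition-level figures. The number of distinct values is the usual example: a value appearing in two partitions is counted in both. The sketch below is a toy illustration with made-up customer values, not a model of how the optimizer or DBMS_STATS works.

```python
def ndv(values):
    """Number of distinct values."""
    return len(set(values))


# Hypothetical customer column in two monthly partitions.
partitions = {
    "P_JAN_2005": ["Customer 1", "Customer 2", "Customer 3"],
    "P_FEB_2005": ["Customer 2", "Customer 3", "Customer 4"],
}

sum_of_partition_ndv = sum(ndv(v) for v in partitions.values())
global_ndv = ndv([c for v in partitions.values() for c in v])
print(sum_of_partition_ndv, global_ndv)
```

Each partition's figure is correct on its own, yet adding them overstates the table-level figure, so a query spanning several partitions gets a better estimate from statistics gathered at the global level after the exchange.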
  31. 31. Partitioning Feature: Characteristic Reason Matrix <ul><li>Features: Partition Truncation, Exchange Partition, Archiving, Pruning (Partition Elimination), Partition wise joins, Parallel DML, Local Indexes, Read Only Partitions </li></ul><ul><li>Characteristics: Performance, Manageability, Scalability, Availability </li></ul>[Matrix: tick marks per feature and characteristic not recoverable]
  32. 32. Questions ?
  33. 33. References: Papers <ul><li>Table Compression in Oracle 9iR2: A Performance Analysis </li></ul><ul><li>Table Compression in Oracle 9iR2: An Oracle White Paper </li></ul><ul><li>“ Scaling To Infinity, Partitioning In Oracle Data Warehouses”, Tim Gorman </li></ul><ul><li>Decision Speed: Table Compression In Action </li></ul>
  34. 34. References: Online Presentation / Code <ul><li>http://www.oramoss.demon.co.uk/presentations/stackitandpackit.ppt </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_block_size_compression.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_column_length_compression.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_itl_compression.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_ndv_compression.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_num_cols_compression.sql </li></ul><ul><li>http://www.oramoss.demon.co.uk/Code/test_pctfree_compression.sql </li></ul>
