
Teradata Aggregate Join Indices And Dimensional Models

Teradata Partners 2005 presentation on Dimensional Modeling and Aggregate Join Indices

Transcript of "Teradata Aggregate Join Indices And Dimensional Models"

  1. 1. Aggregate Join Indices & Dimensional models delivering extraordinary performance Jose M. Borja – Jborja@Menard-inc.com
  2. 2. Theory vs. Practice “In theory, there is no difference between theory and practice. In practice there is…” (Yogi Berra) The reason we are here today is to help bridge the gap between theory and practice, and to share real-life experience of using Aggregate Join Indices and Dimensional Models to deliver extraordinary performance.
  3. 3. Background (or who is this guy)
     • 20 years working with Relational Databases
     • 14 years developing Data Architectures and Physical Database Design work
     • 5 years practicing Data Administration
     • 10 years of ICASE tool work and Data Model driven development
     • 6 years of Teradata DW practice
       • Teradata DW Administrator and Data Architect
       • Teradata DBA
       • SQL Script Writer (ETL and Dimensional Models)
       • General Teradata Handyman: Performance Tuning, DBS Controls, TDQM, PS, TDWM, Troubleshooting, Workload Management, Capacity Planning, etc.
  4. 4. What’s the Challenge?
     • Design a Data Warehouse to meet these goals:
       • Faithfully implement the Enterprise DW Data Model
       • Ad Hoc Reporting
       • Data Mining
       • Business Intelligence (BI tools, on the fly reports)
       • Provide operational application support
       • Provide tactical query and operational support
     • And do all of that with fast response times and tight SLAs
  5. 5. What is the proposed solution?
     • Maintain two separate Data Models:
       • 3NF Data Model
         • Keep the data in line with the Enterprise DW Data Model
         • Accessible and easy to query
         • Available to applications
         • Contain all the legacy data at the lowest granular level
       • Dimensional Model
         • Star Schemas
         • Support BI efforts and limited applications
         • Building block for targeted mini data marts (one AMP)
         • Easy to use
         • Place data closer to the point of use (fast access)
  6. 6. Common misconceptions about this approach?
     • Waste of space and processing handling two models
     • Handling data twice
     • More money for a bigger machine to host two models
     • Is that really true?
     • What would the 3NF need to get the job done?
     • An assortment of Secondary Indexes
       • Requires storage and CPU to maintain
     • Lots of CPU cycles to join tables and create aggregates
       • Limits number of concurrent queries (long run times)
       • May dictate the need to get more machine?
       • More complex SQL to navigate 3NF model
  7. 7. How can I do it in Teradata?
     • 3NF Model
       • Keep it faithful to the Enterprise DW Data Model (EDWDM) with good PI choices
       • Keep number of Secondary Indexes small (or near none!)
         • Ad Hoc queries can afford slower response times
         • Most of the big tables will be available in the DM!
     • Dimensional Model
       • Build Fact Tables to supply the measures across grains
       • Build single table aggregate Join Indexes (AJIs) on a Fact Table
         • Handle different levels of dimensional granularity
         • Calculate the data once, use it many times
         • “Automagic” maintenance by Teradata (Yes!)
         • Reusability of AJIs by optimizer (bonus!)
         • AJIs made available as views for direct query access
  8. 8. Example #1
     • Task: Compare This Year vs. Last Year sales by Product corporate wide
     • 3NF Model
       • Volume of data will be very large (detail level, approximately 2B rows)
       • Number of tables may equal Number of Joins
       • May be cumbersome to write quickly as an Ad Hoc script
       • Aggregate is at corporate granularity (lots of rows qualify!)
  9. 9. Example #1 - SQL for 3NF Model
     Last Year:
       SELECT product_id, SUM((sold_qty * price_amt) - discount_amt - coupon_amt) AS LY_Sales_Amt
       FROM Sale s, Sale_Line sl
       WHERE saledate BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
         AND s.Store_Nbr = sl.Store_Nbr
         AND s.Transaction_Nbr = sl.Transaction_Nbr
       GROUP BY product_id
     This Year:
       SELECT product_id, SUM((sold_qty * price_amt) - discount_amt - coupon_amt) AS TY_Sales_Amt
       FROM Sale s, Sale_Line sl
       WHERE saledate BETWEEN DATE '2006-01-01' AND CURRENT_DATE
         AND s.Store_Nbr = sl.Store_Nbr
         AND s.Transaction_Nbr = sl.Transaction_Nbr
       GROUP BY product_id
     The two result sets are combined with a FULL OUTER JOIN.
  10. 10. Example #1 – Dimensional Model
      • Task: Compare This Year vs. Last Year sales by Product corporate wide
      • Dimensional Model
        • Volume of data is smaller
        • Measures taken at the intersection of grains (aggregates)
        • Fact table eliminates most joins to 3NF tables
  11. 11. Example #1 - SQL for Dimensional Model
      Last Year:
        SELECT product_id, SUM(net_sale_amt) AS LY_Sales
        FROM store_product_daily_sale
        WHERE the_date BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
        GROUP BY product_id
      This Year:
        SELECT product_id, SUM(net_sale_amt) AS TY_Sales
        FROM store_product_daily_sale
        WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
        GROUP BY product_id
      The two result sets are combined with a FULL OUTER JOIN. The fact table is 1/3 the size of the 3NF Sale_Line table and eliminates the join between Sale and Sale_Line.
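The slide shows the two halves of the comparison side by side and only names the FULL OUTER JOIN. A minimal sketch of how they might be combined into one statement follows, assuming the join key is product_id (the only grouping column) and swapping in ADD_MONTHS for the interval arithmetic, since subtracting INTERVAL '1' YEAR from a February 29 date raises an invalid-date error in Teradata. The derived-table aliases ly and ty are illustrative, not from the original.

```sql
-- Sketch: LY vs. TY net sales by product, read from the fact table.
-- Aliases ly/ty and the COALESCE on product_id are assumptions for illustration.
SELECT COALESCE(ty.product_id, ly.product_id) AS product_id,
       ly.LY_Sales,
       ty.TY_Sales
FROM (
    -- Last year: from the start of last year up to today's date one year ago
    SELECT product_id, SUM(net_sale_amt) AS LY_Sales
    FROM store_product_daily_sale
    WHERE the_date BETWEEN DATE '2005-01-01' AND ADD_MONTHS(CURRENT_DATE, -12)
    GROUP BY product_id
) AS ly
FULL OUTER JOIN (
    -- This year: year to date
    SELECT product_id, SUM(net_sale_amt) AS TY_Sales
    FROM store_product_daily_sale
    WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
    GROUP BY product_id
) AS ty
ON ly.product_id = ty.product_id;
```

The full outer join keeps products that sold in only one of the two periods; the missing side comes back as NULL rather than dropping the row.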
  12. 12. Add Aggregate Join Indices to boost performance A view is added in the Dimensional model to represent a single table aggregate Join Index (AJI) at the Corporate level. The AJI removes the Store grain and yields a higher-level aggregate with fewer rows.
  13. 13. Example #1 - SQL for Dimensional Model using the Join Index View
      Last Year:
        SELECT product_id, SUM(net_sale_amt) AS LY_Sales
        FROM ji_product_daily_salev
        WHERE the_date BETWEEN DATE '2005-01-01' AND CURRENT_DATE - INTERVAL '1' YEAR
        GROUP BY product_id
      This Year:
        SELECT product_id, SUM(net_sale_amt) AS TY_Sales
        FROM ji_product_daily_salev
        WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
        GROUP BY product_id
      The two result sets are combined with a FULL OUTER JOIN. The Join Index is 1/30 the size of the 3NF Sale_Line table.
  14. 14. A more robust Fact table has more possibilities Bring additional dimensions to yield different levels of aggregation granularity to the mix of Join Indexes
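As one illustration of "different levels of aggregation granularity", a second single-table aggregate Join Index could roll the same fact table up to product subclass by month. This is only a sketch: the index name ji_subclass_monthly_sale and the aliases sale_year and sale_month are hypothetical, and it assumes the fact table carries product_subclass_id, the_date and net_sale_amt as in the deck's other examples. It relies on Teradata allowing EXTRACT of YEAR and MONTH in a join index definition.

```sql
-- Sketch: aggregate join index at Subclass x Month grain (names are hypothetical).
CREATE JOIN INDEX ji_subclass_monthly_sale AS
SELECT product_subclass_id,
       EXTRACT(YEAR FROM the_date)  AS sale_year,
       EXTRACT(MONTH FROM the_date) AS sale_month,
       SUM(net_sale_amt)            AS net_sale_amt
FROM store_product_daily_sale
GROUP BY product_subclass_id, sale_year, sale_month
PRIMARY INDEX (product_subclass_id);
```

Each additional grain is one more index for Teradata to maintain during loads, so the set of grains is worth choosing from the reports that are actually run.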
  15. 16. Store & Subclass at 3 levels of Time granularity
  16. 17. Product at Daily Level and Store at Daily level
  17. 18. Subclass at 5 levels of Time granularity
  18. 19. Use the view to gain access to the Join Index
      CREATE JOIN INDEX JI_PRODUCT_DAILY_SALEv AS
        SELECT product_id, the_date, product_subclass_id, Supplier_id, SUM(net_sale_amt) AS net_sale_amt . . . . . . .
        FROM STORE_PRODUCT_DAILY_SALE
        GROUP BY product_id, the_date, product_subclass_id, Supplier_id
        PRIMARY INDEX (product_id, the_date);
      REPLACE VIEW JI_PRODUCT_DAILY_SALEv AS
        SELECT product_id, the_date, product_subclass_id, Supplier_id, SUM(net_sale_amt) AS net_sale_amt . . . . . . . .
        FROM STORE_PRODUCT_DAILY_SALE
        GROUP BY product_id, the_date, product_subclass_id, Supplier_id;
      SELECT product_id, the_date, net_sale_amt
        FROM JI_PRODUCT_DAILY_SALEv
        WHERE product_id = 198273648;
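One way to check whether a query against the view is actually served by the Join Index, rather than by the base fact table, is to prefix it with Teradata's EXPLAIN request modifier and read the plan. The sketch below assumes the view and Join Index from this slide already exist. The plan text depends on release and statistics, so no output is shown.

```sql
-- Sketch: ask the optimizer for its plan instead of running the query.
EXPLAIN
SELECT product_id, the_date, net_sale_amt
FROM JI_PRODUCT_DAILY_SALEv
WHERE product_id = 198273648;
```

If the retrieve step in the plan names the Join Index instead of STORE_PRODUCT_DAILY_SALE, the aggregate is being read precomputed.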
  19. 20. CPU consumption for the LY vs. TY Sales Example (chart; values shown: 12% and 2%)
  20. 21. Disk I/O Usage for the LY vs. TY Sales Example (chart; values shown: 21% and 7%)
  21. 22. Elapsed Time for the LY vs. TY Sales Example (chart; values shown: 10% and 3%)
  22. 23. LY vs. TY for 1 Product Corporate Wide
  23. 24. LY vs. TY for all Product Categories Corporate Wide
  24. 25. Conclusions Teradata technology makes it possible to sustain a 3NF and a Dimensional Model in a single system and enjoy the benefits of having both worlds.
  25. 26. Conclusions Teradata technology makes it easy to get the Dimensional model available for use at different levels of granularity using Join Indexes. Sweet performance with low resource usage and auto-magic maintenance!
  26. 27. Conclusions The expense of maintaining a dozen Join Indexes on a single Fact table is paid back by just one substantial report that would otherwise run against the 3NF model. The Join Indexes are maintained at night, when the DW has less usage, and the benefits are harvested during the day by the users.
  27. 28. Conclusions The number of Secondary Indexes can be kept very low in the 3NF model since the Dimensional Model provides most of the necessary access to large volumes of data. Most access to the 3NF can be limited to PI queries for application support, tactical queries, or reports that can afford table scans.
  28. 29. Tips on Join Indexes
      • Keep join indexes limited to only one table. Maintenance is too high on Join Indexes with two or more tables: if any one of the tables is maintained, the Join Index may need to be maintained as well.
      • Do not drop and recreate Join Indexes for maintenance. It is not necessary and can be (very, very, very) costly to recreate.
      • Store the Join Index definitions in macros for reuse and storage in the data dictionary.
      • Create a view to provide “direct” access to the Join Index.
      • Create a dummy Join Index on any table to prevent accidental DROPs. Seeing the “cannot drop table” message can be a life saver!
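Two of these tips can be sketched in Teradata SQL. The names m_create_ji_product_daily_sale, ji_product_daily_sale and ji_guard_sale_line are hypothetical, the column list is trimmed to the columns named in the deck, and the guard index assumes that no store is numbered -1, so the sparse index stays empty. Treat this as one possible shape under those assumptions, not the author's actual scripts.

```sql
-- Sketch 1: keep the Join Index DDL in a macro so the definition lives in the
-- data dictionary. A Teradata macro may hold a DDL statement only when it is
-- the sole statement in the macro.
CREATE MACRO m_create_ji_product_daily_sale AS (
    CREATE JOIN INDEX ji_product_daily_sale AS
    SELECT product_id, the_date, SUM(net_sale_amt) AS net_sale_amt
    FROM store_product_daily_sale
    GROUP BY product_id, the_date
    PRIMARY INDEX (product_id, the_date);
);

-- Recreate the index from the stored definition, or just inspect the DDL.
EXEC m_create_ji_product_daily_sale;
SHOW MACRO m_create_ji_product_daily_sale;

-- Sketch 2: a "dummy" sparse Join Index as a guard against accidental drops.
-- The WHERE clause is assumed to match no rows, keeping the index tiny.
CREATE JOIN INDEX ji_guard_sale_line AS
SELECT Store_Nbr, Transaction_Nbr
FROM Sale_Line
WHERE Store_Nbr = -1
PRIMARY INDEX (Store_Nbr);
```

While ji_guard_sale_line exists, a DROP TABLE on Sale_Line is rejected. An intentional drop has to start with DROP JOIN INDEX ji_guard_sale_line, which is the extra deliberate step the tip is after.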
