Teradata Aggregate Join Indices And Dimensional Models
1. Aggregate Join Indices & Dimensional models delivering extraordinary performance Jose M. Borja – Jborja@Menard-inc.com
2. Theory vs. Practice "In theory, there is no difference between theory and practice. In practice there is." (Yogi Berra) We are here today to help bridge the gap between theory and practice, and to share real-life experiences of using Aggregate Join Indices and Dimensional Models to deliver extraordinary performance.
9. Example #1 - SQL for 3NF Model

    Select product_id,
           sum((sold_qty * price_amt) - discount_amt - coupon_amt) as LY_Sales_Amt
    From Sale s, Sale_Line sl
    Where saledate between DATE '2005-01-01' and CURRENT_DATE - interval '1' year
      and s.Store_Nbr = sl.Store_Nbr
      and s.Transaction_Nbr = sl.Transaction_Nbr
    Group By product_id

    Select product_id,
           sum((sold_qty * price_amt) - discount_amt - coupon_amt) as TY_Sales_Amt
    From Sale s, Sale_Line sl
    Where saledate between DATE '2006-01-01' and CURRENT_DATE
      and s.Store_Nbr = sl.Store_Nbr
      and s.Transaction_Nbr = sl.Transaction_Nbr
    Group By product_id

The two result sets (last year and this year) are combined with a FULL OUTER JOIN.
11. Example #1 - SQL for Dimensional Model

    Select product_id, sum(net_sale_amt) as LY_Sales
    From store_product_daily_sale
    Where the_date between DATE '2005-01-01' and CURRENT_DATE - interval '1' year
    Group by product_id

    Select product_id, sum(net_sale_amt) as TY_Sales
    From store_product_daily_sale
    Where the_date between DATE '2006-01-01' and CURRENT_DATE
    Group by product_id

The two result sets are combined with a FULL OUTER JOIN. The fact table is 1/3 the size of the 3NF Sale_Line table, and it eliminates the join between Sale and Sale_Line.
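The slide shows only the two halves and the label FULL OUTER JOIN. A minimal sketch of how the halves might be combined into one statement is given below. The join on product_id, the derived-table aliases ly and ty, the COALESCE on the key, and the use of CURRENT_DATE are assumptions, not taken from the slides.

```sql
-- Sketch only: LY vs. TY sales per product from the dimensional fact table.
-- Aliases (ly, ty) and the join condition are illustrative assumptions.
SELECT COALESCE(ly.product_id, ty.product_id) AS product_id,
       ly.LY_Sales,
       ty.TY_Sales
FROM (
    -- Last year: start of 2005 through the same day one year ago
    SELECT product_id, SUM(net_sale_amt) AS LY_Sales
    FROM store_product_daily_sale
    WHERE the_date BETWEEN DATE '2005-01-01'
                       AND CURRENT_DATE - INTERVAL '1' YEAR
    GROUP BY product_id
) AS ly
FULL OUTER JOIN (
    -- This year: start of 2006 through today
    SELECT product_id, SUM(net_sale_amt) AS TY_Sales
    FROM store_product_daily_sale
    WHERE the_date BETWEEN DATE '2006-01-01' AND CURRENT_DATE
    GROUP BY product_id
) AS ty
  ON ly.product_id = ty.product_id;
```

The full outer join keeps products that sold in only one of the two periods; the missing side comes back as NULL.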
12. Add Aggregate Join Indices to boost performance. A view is added to the Dimensional model to represent a single-table Aggregate Join Index (AJI) at the Corporate level. The AJI removes the Store grain and yields a higher-level aggregate with fewer rows.
13. Example #1 - SQL for Dimensional Model using the Join Index View

    Select product_id, sum(net_sale_amt) as LY_Sales
    From ji_product_daily_salev
    Where the_date between DATE '2005-01-01' and CURRENT_DATE - interval '1' year
    Group by product_id

    Select product_id, sum(net_sale_amt) as TY_Sales
    From ji_product_daily_salev
    Where the_date between DATE '2006-01-01' and CURRENT_DATE
    Group by product_id

The two result sets are combined with a FULL OUTER JOIN. The Join Index is 1/30 the size of the 3NF Sale_Line table.
14. A more robust Fact table has more possibilities. Bringing additional dimensions into the Fact table allows a mix of Join Indexes at different levels of aggregation granularity.
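As a rough illustration of a second level of granularity, the sketch below defines an aggregate join index at the product-subclass-by-day grain on the same fact table. The index name ji_subclass_daily_sale is hypothetical, and the choice of grain is an assumption; the column names are the ones the deck itself uses for STORE_PRODUCT_DAILY_SALE.

```sql
-- Sketch only: a coarser aggregate join index on the same fact table.
-- The name ji_subclass_daily_sale is hypothetical.
CREATE JOIN INDEX ji_subclass_daily_sale AS
SELECT product_subclass_id,
       the_date,
       SUM(net_sale_amt) AS net_sale_amt
FROM store_product_daily_sale
GROUP BY product_subclass_id, the_date
PRIMARY INDEX (product_subclass_id, the_date);
```

Each additional index trades maintenance cost on the fact table for smaller scans on queries that match its grain.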
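19. Use the view to gain access to the Join Index

    CREATE JOIN INDEX JI_PRODUCT_DAILY_SALE AS
    SELECT product_id, the_date, product_subclass_id, Supplier_id,
           sum(net_sale_amt) as net_sale_amt . . . . . . .
    FROM STORE_PRODUCT_DAILY_SALE
    GROUP BY product_id, the_date, product_subclass_id, Supplier_id
    PRIMARY INDEX (product_id, the_date);

    REPLACE VIEW JI_PRODUCT_DAILY_SALEv AS
    SELECT product_id, the_date, product_subclass_id, Supplier_id,
           sum(net_sale_amt) as net_sale_amt . . . . . . . .
    FROM STORE_PRODUCT_DAILY_SALE
    GROUP BY product_id, the_date, product_subclass_id, Supplier_id;

    SELECT product_id, the_date, net_sale_amt
    FROM JI_PRODUCT_DAILY_SALEv
    WHERE product_id = 198273648;

The view repeats the Join Index definition against the base table, so the optimizer can answer queries on the view from the Join Index. The Join Index is named without the trailing "v" here because a view and a Join Index cannot share the same name in one database.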
24. LY vs. TY for all Product Categories Corporate Wide
25. Conclusions Teradata technology makes it possible to sustain a 3NF and a Dimensional Model in a single system and enjoy the benefits of having both worlds.
26. Conclusions Teradata technology makes it easy to get the Dimensional model available for use at different levels of granularity using Join Indexes. Sweet performance with low resource usage and auto-magic maintenance!
27. Conclusions The expense of maintaining a dozen Join Indexes on a single Fact table is paid back by avoiding just one substantial report run against the 3NF model. The Join Indexes are maintained at night, when the DW has less usage, and the benefits are harvested by the users during the day.
28. Conclusions The number of Secondary Indexes can be kept very low in the 3NF model since the Dimensional Model provides most of the necessary access to large volumes of data. Most access to the 3NF can be limited to PI queries for application support, tactical queries, or reports that can afford table scans.
29. Tips on Join Indexes. Keep each Join Index limited to one table: maintenance is too high on Join Indexes with two or more tables, because whenever any one of the tables is maintained the Join Index may need to be maintained as well. Do not drop and recreate Join Indexes for maintenance; it is not necessary and can be (very, very, very) costly. Store the Join Index definitions in macros, so they can be reused and are kept in the data dictionary. Create a view to provide "direct" access to the Join Index. Create a dummy Join Index on any table to prevent accidental DROPs; seeing the "cannot drop table" message can be a lifesaver!
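Two of these tips can be sketched as follows. All object names (m_create_ji_product_daily_sale, ji_product_daily_sale, ji_guard_sale_line) are hypothetical, and the column choices are assumptions based on the tables shown earlier in the deck.

```sql
-- Sketch only: keep a join index definition in a macro so it is stored
-- in the data dictionary and can be re-run. A macro that contains DDL
-- must contain that single statement only.
CREATE MACRO m_create_ji_product_daily_sale AS (
    CREATE JOIN INDEX ji_product_daily_sale AS
    SELECT product_id, the_date, SUM(net_sale_amt) AS net_sale_amt
    FROM store_product_daily_sale
    GROUP BY product_id, the_date
    PRIMARY INDEX (product_id, the_date);
);

-- Re-create the index from the stored definition when needed.
EXEC m_create_ji_product_daily_sale;

-- Sketch only: a "dummy" single-table join index used as a guard.
-- While it exists, DROP TABLE Sale_Line should be rejected.
CREATE JOIN INDEX ji_guard_sale_line AS
SELECT Store_Nbr, Transaction_Nbr
FROM Sale_Line
PRIMARY INDEX (Store_Nbr, Transaction_Nbr);
```

A guard index like this still occupies space and is maintained with the base table, so it is worth keeping its column list as small as possible.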