Teradata Aggregate Join Indices And Dimensional Models

Teradata Partners 2005 presentation on Dimensional Modeling and Aggregate Join Indices

    Presentation Transcript

    • Aggregate Join Indices & Dimensional Models: Delivering Extraordinary Performance. Jose M. Borja, Jborja@Menard-inc.com
    • Theory vs. Practice
      • “In theory, there is no difference between theory and practice. In practice there is.” (Yogi Berra)
      • The reason we are here today is to help bridge the gap between theory and practice, and to share real-life experiences of using Aggregate Join Indices and Dimensional Models to deliver extraordinary performance.
    • Background (or who is this guy)
      • 20 years working with Relational Databases
      • 14 years developing Data Architectures and Physical Database Design work
      • 5 years practicing Data Administration
      • 10 years of ICASE tool work and Data Model driven development
      • 6 years of Teradata DW practice
        • Teradata DW Administrator and Data Architect
        • Teradata DBA
        • SQL Script Writer (ETL and Dimensional Models)
        • General Teradata Handyman: Performance Tuning, DBS Controls, TDQM, PS, TDWM, Troubleshooting, Workload Management, Capacity Planning, etc.
    • What’s the Challenge?
      • Design a Data Warehouse to meet these goals:
        • Faithfully implements the Enterprise DW Data Model
        • Ad Hoc Reporting
        • Data Mining
        • Business Intelligence (BI tools, on the fly reports)
        • Provide operational application support
        • Provide tactical query and operational support
      • And do all of that with fast response times and tight SLAs
    • What is the proposed solution?
      • Maintain two separate Data Models:
        • 3NF Data Model
          • Keep the data in line with the Enterprise DW Data Model
          • Accessible and easy to query
          • Available to applications
          • Contain all the legacy data at the lowest granular level
        • Dimensional Model
          • Star Schemas
          • Support BI efforts and limited applications
          • Building block for targeted mini data marts (one AMP)
          • Easy to use
          • Place data closer to the point of use (fast access)
    • Common misconceptions about this approach?
      • Waste of space and processing handling two models
      • Handling data twice
      • More money for a bigger machine to host two models
      • Is that really true?
      • What would the 3NF need to get the job done?
      • An assortment of Secondary Indexes
        • Requires storage and CPU to maintain
      • Lots of CPU cycles to join tables and create aggregates
        • Limits number of concurrent queries (long run times)
        • May dictate the need to get more machine?
        • More complex SQL to navigate 3NF model
    • How can I do it in Teradata?
      • 3NF Model
        • Keep it faithful to the EDWDM with good PI choices
        • Keep number of Secondary Indexes small (or near none!)
          • Ad Hoc queries can afford slower response times
          • Most of the big tables will be available in the DM!
      • Dimensional Model
        • Build Fact Tables to supply the measures across grains
        • Build single table aggregate Join Indexes on a Fact Table
          • Handle different levels of dimensional granularity
          • Calculate the data once, use it many times
          • “Automagic” maintenance by Teradata (Yes!)
          • Reusability of AJIs by optimizer (bonus!)
          • AJIs made available as views for direct query access
    • Example #1
      • Task: Compare This Year vs. Last Year sales by Product corporate wide
      • 3NF Model
        • Volume of data will be very large (detail level, approximately 2B rows)
        • Number of tables may equal Number of Joins
        • May be cumbersome to write quickly as an Ad Hoc script
        • Aggregate is at corporate granularity (lots of rows qualify!)
    • Example #1 - SQL for 3NF Model
      • Last Year (LY) query:
          SELECT product_id,
                 SUM((sold_qty * price_amt) - discount_amt - coupon_amt) AS LY_Sales_Amt
          FROM Sale s, Sale_Line sl
          WHERE saledate BETWEEN DATE '2005-01-01' AND DATE - INTERVAL '1' YEAR
            AND s.Store_Nbr = sl.Store_Nbr
            AND s.Transaction_Nbr = sl.Transaction_Nbr
          GROUP BY product_id
      • This Year (TY) query:
          SELECT product_id,
                 SUM((sold_qty * price_amt) - discount_amt - coupon_amt) AS TY_Sales_Amt
          FROM Sale s, Sale_Line sl
          WHERE saledate BETWEEN DATE '2006-01-01' AND DATE
            AND s.Store_Nbr = sl.Store_Nbr
            AND s.Transaction_Nbr = sl.Transaction_Nbr
          GROUP BY product_id
      • The two result sets are combined with a FULL OUTER JOIN.
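The shape of the 3NF comparison above can be tried out on a toy data set. The following is only a minimal sketch, not the presenter's code: it uses Python's built-in sqlite3 as a stand-in for Teradata, the rows are made up, fixed calendar-year ranges replace the current-date arithmetic, and the FULL OUTER JOIN is emulated by merging the two result sets on product_id in Python.

```python
import sqlite3

# Hypothetical miniature of the 3NF tables. Column names follow the slide;
# the rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sale (Store_Nbr INT, Transaction_Nbr INT, saledate TEXT);
CREATE TABLE Sale_Line (Store_Nbr INT, Transaction_Nbr INT, product_id INT,
                        sold_qty INT, price_amt REAL,
                        discount_amt REAL, coupon_amt REAL);
INSERT INTO Sale VALUES (1, 100, '2005-02-01'), (1, 101, '2006-02-01'),
                        (2, 100, '2006-02-03');
INSERT INTO Sale_Line VALUES
  (1, 100, 10, 2, 5.0, 1.0, 0.0),   -- last year, product 10
  (1, 101, 10, 1, 5.0, 0.0, 0.5),   -- this year, product 10
  (2, 100, 20, 3, 2.0, 0.0, 0.0);   -- this year, product 20
""")

# One parameterized query serves both years; only the date range changes.
SALES_SQL = """
SELECT sl.product_id,
       SUM(sl.sold_qty * sl.price_amt - sl.discount_amt - sl.coupon_amt)
FROM Sale s
JOIN Sale_Line sl
  ON s.Store_Nbr = sl.Store_Nbr AND s.Transaction_Nbr = sl.Transaction_Nbr
WHERE s.saledate BETWEEN ? AND ?
GROUP BY sl.product_id
"""

def sales_by_product(start, end):
    """Return {product_id: net sales} for sale dates in [start, end]."""
    return dict(conn.execute(SALES_SQL, (start, end)).fetchall())

ly = sales_by_product("2005-01-01", "2005-12-31")
ty = sales_by_product("2006-01-01", "2006-12-31")

# Emulated full outer join on product_id: keep products sold in either year,
# with None where a product has no sales in that year.
comparison = {p: (ly.get(p), ty.get(p)) for p in sorted(set(ly) | set(ty))}
print(comparison)
```

Product 20 has no sales last year in this toy data, which is exactly the case the FULL OUTER JOIN is there to preserve.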
    • Example #1 – Dimensional Model
      • Task: Compare This Year vs. Last Year sales by Product corporate wide
      • Dimensional Model
        • Volume of data is smaller
        • Measures taken at the intersection of grains (aggregates)
        • Fact table eliminates most joins to 3NF tables
    • Example #1 - SQL for Dimensional Model
      • Last Year (LY) query:
          SELECT product_id, SUM(net_sale_amt) AS LY_Sales
          FROM store_product_daily_sale
          WHERE the_date BETWEEN DATE '2005-01-01' AND DATE - INTERVAL '1' YEAR
          GROUP BY product_id
      • This Year (TY) query:
          SELECT product_id, SUM(net_sale_amt) AS TY_Sales
          FROM store_product_daily_sale
          WHERE the_date BETWEEN DATE '2006-01-01' AND DATE
          GROUP BY product_id
      • The two result sets are combined with a FULL OUTER JOIN.
      • The fact table is 1/3 the size of the 3NF Sale_Line table and eliminates the join between Sale and Sale_Line.
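To see why the fact table can answer the same question without the join, here is a minimal sketch under stated assumptions: Python's sqlite3 stands in for Teradata, the rows are invented, and the store_nbr column name and the load statement for store_product_daily_sale are illustrative guesses rather than the presenter's actual ETL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sale (Store_Nbr INT, Transaction_Nbr INT, saledate TEXT);
CREATE TABLE Sale_Line (Store_Nbr INT, Transaction_Nbr INT, product_id INT,
                        sold_qty INT, price_amt REAL,
                        discount_amt REAL, coupon_amt REAL);
INSERT INTO Sale VALUES (1, 100, '2006-02-01'), (1, 101, '2006-02-01'),
                        (2, 100, '2006-02-01');
INSERT INTO Sale_Line VALUES
  (1, 100, 10, 2, 5.0, 1.0, 0.0),
  (1, 101, 10, 1, 5.0, 0.0, 0.5),
  (1, 101, 20, 1, 4.0, 0.0, 0.0),
  (2, 100, 10, 3, 5.0, 0.0, 0.0);

-- Assumed load step: fact table at the store / product / day grain,
-- calculated once from the 3NF tables.
CREATE TABLE store_product_daily_sale AS
SELECT s.Store_Nbr AS store_nbr, sl.product_id, s.saledate AS the_date,
       SUM(sl.sold_qty * sl.price_amt - sl.discount_amt - sl.coupon_amt)
           AS net_sale_amt
FROM Sale s
JOIN Sale_Line sl
  ON s.Store_Nbr = sl.Store_Nbr AND s.Transaction_Nbr = sl.Transaction_Nbr
GROUP BY s.Store_Nbr, sl.product_id, s.saledate;
""")

# Sales by product computed the 3NF way (join plus arithmetic at query time).
from_3nf = dict(conn.execute("""
SELECT sl.product_id,
       SUM(sl.sold_qty * sl.price_amt - sl.discount_amt - sl.coupon_amt)
FROM Sale s
JOIN Sale_Line sl
  ON s.Store_Nbr = sl.Store_Nbr AND s.Transaction_Nbr = sl.Transaction_Nbr
GROUP BY sl.product_id
"""))

# The same answer from the fact table: no join, one stored measure.
from_fact = dict(conn.execute("""
SELECT product_id, SUM(net_sale_amt)
FROM store_product_daily_sale
GROUP BY product_id
"""))

detail_rows = conn.execute("SELECT COUNT(*) FROM Sale_Line").fetchone()[0]
fact_rows = conn.execute(
    "SELECT COUNT(*) FROM store_product_daily_sale").fetchone()[0]
print(from_3nf, from_fact, detail_rows, fact_rows)
```

The row reduction in this toy data is small; the 1/3 ratio quoted on the slide depends on how many sale lines share a store, product and day in the real data.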
    • Add Aggregate Join Indices to boost performance
      • A view is added in the Dimensional Model to represent a single-table aggregate Join Index at the Corporate level.
      • The AJI removes the Store grain and yields a higher-level aggregate with fewer rows.
    • Example #1 - SQL for Dimensional Model using the Join Index View
      • Last Year (LY) query:
          SELECT product_id, SUM(net_sale_amt) AS LY_Sales
          FROM ji_product_daily_salev
          WHERE the_date BETWEEN DATE '2005-01-01' AND DATE - INTERVAL '1' YEAR
          GROUP BY product_id
      • This Year (TY) query:
          SELECT product_id, SUM(net_sale_amt) AS TY_Sales
          FROM ji_product_daily_salev
          WHERE the_date BETWEEN DATE '2006-01-01' AND DATE
          GROUP BY product_id
      • The two result sets are combined with a FULL OUTER JOIN.
      • The Join Index is 1/30 the size of the 3NF Sale_Line table.
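The effect of removing the Store grain can be illustrated with a small roll-up. This is a sketch only: sqlite3 has no join indexes, so an ordinary summary table named ji_product_daily_sale (a hypothetical name) stands in for the AJI, and it would have to be refreshed by hand, whereas the deck relies on Teradata maintaining the real AJI automatically. The store_nbr column and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_product_daily_sale (
    store_nbr INT, product_id INT, the_date TEXT, net_sale_amt REAL);
INSERT INTO store_product_daily_sale VALUES
  (1, 10, '2006-02-01', 13.5), (2, 10, '2006-02-01', 15.0),
  (3, 10, '2006-02-01', 1.5),  (1, 20, '2006-02-01', 4.0),
  (2, 20, '2006-02-02', 6.0);

-- Stand-in for the aggregate join index: same measure, Store grain removed.
CREATE TABLE ji_product_daily_sale AS
SELECT product_id, the_date, SUM(net_sale_amt) AS net_sale_amt
FROM store_product_daily_sale
GROUP BY product_id, the_date;
""")

def totals(table):
    """Corporate-wide sales by product from the given table."""
    # Table names come from the fixed tuple below, never from user input.
    sql = f"SELECT product_id, SUM(net_sale_amt) FROM {table} GROUP BY product_id"
    return dict(conn.execute(sql))

def row_count(table):
    """Number of rows a full scan of the given table has to read."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

for table in ("store_product_daily_sale", "ji_product_daily_sale"):
    print(table, row_count(table), totals(table))
```

Both tables give the same corporate totals, but the roll-up reads fewer rows, which is the saving the slide attributes to the Join Index.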
    • A more robust Fact table has more possibilities
      • Bring additional dimensions to yield different levels of aggregation granularity to the mix of Join Indexes
    • Store & Subclass at 3 levels of Time granularity
    • Product at Daily Level and Store at Daily level
    • Subclass at 5 levels of Time granularity
    • Use the view to gain access to the Join Index
      • Create the aggregate Join Index on the Fact table:
          CREATE JOIN INDEX JI_PRODUCT_DAILY_SALE AS
          SELECT product_id, the_date, product_subclass_id, Supplier_id,
                 SUM(net_sale_amt) AS net_sale_amt . . . . . . .
          FROM STORE_PRODUCT_DAILY_SALE
          GROUP BY product_id, the_date, product_subclass_id, Supplier_id
          PRIMARY INDEX (product_id, the_date);
      • Create a view with the same SELECT against the Fact table:
          REPLACE VIEW JI_PRODUCT_DAILY_SALEv AS
          SELECT product_id, the_date, product_subclass_id, Supplier_id,
                 SUM(net_sale_amt) AS net_sale_amt . . . . . . . .
          FROM STORE_PRODUCT_DAILY_SALE
          GROUP BY product_id, the_date, product_subclass_id, Supplier_id;
      • Query the view:
          SELECT product_id, the_date, net_sale_amt
          FROM JI_PRODUCT_DAILY_SALEv
          WHERE product_id = 198273648;
    • CPU consumption for the LY vs. TY Sales Example (chart; values shown: 12%, 2%)
    • Disk I/O Usage for the LY vs. TY Sales Example (chart; values shown: 21%, 7%)
    • Elapsed Time for the LY vs. TY Sales Example (chart; values shown: 10%, 3%)
    • LY vs. TY for 1 Product Corporate Wide
    • LY vs. TY for all Product Categories Corporate Wide
    • Conclusions Teradata technology makes it possible to sustain a 3NF and a Dimensional Model in a single system and enjoy the benefits of having both worlds.
    • Conclusions Teradata technology makes it easy to get the Dimensional model available for use at different levels of granularity using Join Indexes. Sweet performance with low resource usage and auto-magic maintenance!
    • Conclusions The expense of maintaining a dozen Join Indexes on a single Fact table is paid back by avoiding just one substantial report run against the 3NF model. The Join Indexes are maintained at night, when the DW has less usage, and the benefits are harvested during the day by the users.
    • Conclusions The number of Secondary Indexes can be kept very low in the 3NF model since the Dimensional Model provides most of the necessary access to large volumes of data. Most access to the 3NF can be limited to PI queries for application support, tactical queries, or reports that can afford table scans.
    • Tips on Join Indexes
      • Keep Join Indexes limited to only one table. Maintenance is too high on Join Indexes with two or more tables: if any one of the tables is maintained, the Join Index may need to be maintained also.
      • Do not drop and recreate Join Indexes for maintenance. It is not necessary and can be (very, very, very) costly to recreate.
      • Store the Join Index definitions in macros for reuse and storage in the data dictionary.
      • Create a view to provide “direct” access to the Join Index.
      • Create a dummy Join Index on any table to prevent accidental DROPs. It is a life saver to see the “cannot drop table” message!