Introduction to KDD for Tony's MI Course

  • The dimension line contains a series of folders identifying all the available dimensions and measures for a selected cube.

Presentation Transcript

  • COMP 3503 Deductive Modeling with OLAP with Daniel L. Silver Copyright (c), 2007 All Rights Reserved
  • Agenda
    • What is OLAP?
    • OLAP Functionality
    • Overview of Cognos PowerPlay
    • OLAP Pros and Cons
  • What is OLAP?
  • On-Line Analytical Processing
    • OLAP
    • Term coined by E.F. Codd in a document published in 1993 sponsored by Arbor Software Corp (ESSBASE)
    • Redefined requirements for tools to implement decision support and business intelligence systems.
    • Has had a significant impact on the database and business software market.
  • OLAP Definition
    • Online Analytical Processing (OLAP) refers to technology that allows users of multidimensional databases to generate on-line descriptive or comparative summaries ("views") of data and other analytic queries.
    • OLAP facilities can (and should) be integrated into enterprise-wide database systems. They allow analysts and managers to monitor the performance of the business (e.g., various aspects of the manufacturing process, or the numbers and types of completed transactions at different locations) or the market.
    Courtesy Anders Stjarne
  • Multidimensional Requirements
    • Example: Sales volume as a function of product , time , and geography.
    • Dimensions: Product, Geography, Time
    • Measure: ‘Sales Volume’
    • A data cube with more than three dimensions is referred to as a hypercube.
    Courtesy Anders Stjarne
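The sales-volume example above can be sketched as a small in-memory cube. This is only an illustrative sketch: the product, region, and quarter names and all figures are made up, and a dictionary keyed by one coordinate per dimension is an assumed representation, not how any particular OLAP product stores its data.

```python
from collections import defaultdict

# Hypothetical fact records: (product, geography, time, sales_volume).
facts = [
    ("Widgets", "East", "Q1", 120),
    ("Widgets", "West", "Q1", 80),
    ("Gadgets", "East", "Q1", 50),
    ("Widgets", "East", "Q2", 140),
    ("Widgets", "East", "Q1", 30),  # a second sale that lands in an existing cell
]

# The cube maps one coordinate per dimension to the aggregated measure.
cube = defaultdict(int)
for product, geography, time, volume in facts:
    cube[(product, geography, time)] += volume

# Reading a cell means fixing a value on every dimension.
print(cube[("Widgets", "East", "Q1")])  # prints 150
```

Adding a fourth dimension (for example, sales channel) would only lengthen the key tuple, which is the sense in which the structure generalizes to a hypercube.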
  • Deductive Modelling and Analysis
    • Comprehensive Sales Analysis
      • When? Time (1997)
      • Who? Customers (Channels)
      • What? Product (Type)
      • Where? Location (Region)
      • Result? Indicator (Revenue)
    • [Figure: Combination 1 and Combination 2, each a selection over the items Quarter, Month, Type, Customer, Line, Brand, Number, Country, Branch, Sales Rep, Quantity, Cost, Margin]
    Courtesy Anders Stjarne
  • On-Line Analytical Processing
    • Strong connection to multi-dimensional database model - MOLAP
    • Data-cubes are typically constructed off-line due to time required to build indices
    • Dimensions, values, and aggregations are limited to those within the data-cube
    • On-line cube development has allowed RDBMS vendors to survive as major players in OLAP market - ROLAP
  • On-Line Analytical Processing 12 Rules of an OLAP Environment by E.F. Codd
    • Multi-dimensional - data-cubes or hypercubes
    • Transparent access
    • Navigation aids
    • Consistent reporting
    • Client-server based
    • Generic dimensionality
    • Efficient data storage
    • Multi-user support
    • Unrestricted cross-dimensional operations
    • Intuitive data manipulation
    • Flexible reporting
    • Unlimited levels of aggregation
  • OLAP Functionality
  • On-Line Analytical Processing
    • Deductive Modeling with OLAP
    • Model is developed within the user's mind as data is explored
    • Verification or rejection is facilitated by multi-dimensional functions which display data numerically and graphically
    • Best practices:
      • Determine suspected variable interaction
      • Verify/reject model through exploration
      • Drill-down to refine model
      • Maintain record of exploratory findings
  • On-Line Analytical Processing
    • Basic OLAP Functionality
    • Dimension selection - slice & dice
    • Rotation - allows change in perspective
    • Filtration - value range selection
    • Hierarchies of aggregation levels
      • drill-downs to lower levels
      • roll-ups to higher levels
    • Tremendous tool for decision support and executive information delivery and analysis
  • OLAP - Sample Operations
    • Roll up: summarize data
      • total sales volume last year by product category by region
    • Roll down, drill down, drill through: go from higher level summary to lower level summary or detailed data
      • For a particular product category, find the detailed sales data for each salesperson by date
    • Slice and dice: select and project
      • Sales of beverages in the West over the last 6 months
    • Pivot or rotate: change visual dimensions
    Courtesy Anders Stjarne
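A minimal sketch of two of the sample operations above, roll up and slice and dice, on a toy cube. The dimension names, category values, and figures are hypothetical, and the helper functions `roll_up` and `select` are illustrative names rather than part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cells: (category, region, month) -> sales volume.
DIMENSIONS = ("category", "region", "month")
cube = {
    ("Beverages", "West", "2007-01"): 100,
    ("Beverages", "West", "2007-02"): 120,
    ("Beverages", "East", "2007-01"): 90,
    ("Snacks", "West", "2007-01"): 60,
    ("Snacks", "East", "2007-02"): 40,
}


def roll_up(cells, keep):
    """Summarize the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[p] for p in positions)] += value
    return dict(totals)


def select(cells, **wanted):
    """Keep only the cells whose coordinates match the given dimension values."""
    return {
        coords: value
        for coords, value in cells.items()
        if all(coords[DIMENSIONS.index(name)] == v for name, v in wanted.items())
    }


# Roll up: total sales volume by category and region (months summed away).
by_category_region = roll_up(cube, keep=("category", "region"))

# Slice and dice: sales of beverages in the West, month by month.
beverages_west = select(cube, category="Beverages", region="West")
```

Drilling down is the reverse of `roll_up`: the same call with more dimensions in `keep` returns the finer-grained cells behind a summary value.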
  • OLAP and Data Mining
    • The final results from OLAP exploration can lead to inductive data mining
    • Data Mining techniques can be applied to the data views and summaries generated by OLAP to provide more in-depth and often more multidimensional knowledge
    • Data Mining techniques can be considered an analytic extension of OLAP
  • OLAP Distributed Framework
    • OLAP functions are independent of:
      • Front-end user interface
      • Back-end data storage
    Courtesy Anders Stjarne
    • Multidimensional
      • difficulty handling sparsity efficiently
      • direct representation of the data “cube”
      • rapid drill down on summary data
      • proprietary solutions
      • better performance response
      • does not scale well to handle large amounts of detail
      • thin client, analytical processing done on server
    REF: White, “MOLAP vs ROLAP,” (B&A-15)
    • Relational
      • multidimensional view built on a Relational DBMS
      • hampered by the limitations of SQL
      • handles sparsity automatically
      • stores summary and detail data equally easily
      • easy to share common dimensions across DWs
      • scales well using well-developed relational technology
      • depends on efficient processing of STAR joins and indexes
      • analytical processing done on the client (or middle server)
    Courtesy Anders Stjarne
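The sparsity point in the comparison above can be illustrated with a toy example. This is a rough sketch under stated assumptions: the dimension values and facts are invented, the "dense" layout simply reserves one slot per combination, and real multidimensional engines use their own storage and compression schemes that this does not model.

```python
from itertools import product

products = ["P1", "P2", "P3", "P4"]
regions = ["North", "South", "East", "West"]
months = ["Jan", "Feb", "Mar"]

# Hypothetical facts: only three combinations actually have sales.
facts = {
    ("P1", "North", "Jan"): 10,
    ("P2", "South", "Feb"): 7,
    ("P4", "West", "Mar"): 3,
}

# Direct cube layout: one slot for every combination, mostly empty.
dense = {cell: facts.get(cell) for cell in product(products, regions, months)}

# Relational-style layout: one row per fact that actually occurred.
fact_rows = [(*cell, value) for cell, value in facts.items()]

occupied = sum(1 for value in dense.values() if value is not None)
print(len(dense), occupied, len(fact_rows))  # prints 48 3 3
```

The number of slots in the direct layout grows with the product of the dimension sizes, while the row-per-fact layout grows only with the number of recorded facts.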
  • Overview of Cognos PowerPlay OLAP
    • PowerPlay includes the following components:
      • Transformer
        • Used to define the contents of a cube and create the cube
      • PowerPlay
        • Accesses cubes for data exploration and reporting.
    PowerPlay for Windows Components  Courtesy Anders Stjarne
  • PowerPlay Cubes
    • A cube is a structure that stores data multi-dimensionally and provides:
      • secure data access
      • fast retrieval of data.
    • Cubes can be distributed across a network or to individual computers.
     Courtesy Anders Stjarne
  • Measures
    • The numeric (continuous) data that is collected and stored by your organization.
    • The performance measures used to evaluate your business.
    • Examples:
      • Revenue
      • Cost
      • Quantity sold
      • Units on-hand
      • Hours per Job
      • Number of calls
      • Defective units.
     [Figure: basic and derived measures, shown as # and %, e.g., Revenue - Cost = Profit Margin]
     Courtesy Anders Stjarne
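A small sketch of how derived measures can be computed from basic ones. The product names and figures are hypothetical, and expressing the margin as a percentage of revenue is an assumed definition chosen for illustration; the slide itself only gives Revenue - Cost.

```python
# Hypothetical basic measures for two products.
rows = [
    {"product": "Tent", "revenue": 2000.0, "cost": 1500.0},
    {"product": "Stove", "revenue": 500.0, "cost": 400.0},
]

for row in rows:
    # Derived measure as an amount: revenue minus cost.
    row["profit"] = row["revenue"] - row["cost"]
    # Derived measure as a percentage of revenue (an assumed definition).
    row["margin_pct"] = 100.0 * row["profit"] / row["revenue"]

for row in rows:
    print(row["product"], row["profit"], row["margin_pct"])
```

Only the basic measures need to be stored; the derived ones can be recalculated at whatever level of aggregation is being viewed.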
  • Dimensions and Levels
    • Dimensions are a broad group of descriptive data about the major aspects of your business.
    • Levels represent the established hierarchy within a dimension.
    • Dimensions and their levels:
      • When? Date: Years, Months, Days
      • What? Products: Line, Type, Product
      • Where? Locations: Region, Country, Branch
     Courtesy Anders Stjarne
  • Levels and Categories
      • A category is a data item that populates a level in a dimension.
      • Dimension: Locations
        • Level: Region; category: Europe
        • Level: Country; category: United Kingdom
        • Level: Branch; categories: London, U.K.; Manchester, U.K.
     Courtesy Anders Stjarne
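The Locations hierarchy above can be sketched as a lookup from each branch category to its parent categories. The category names come from the slide; the revenue figures and the helper names `roll_up_to` and `drill_down` are hypothetical.

```python
# Each branch category points to its parents in the Locations dimension.
# Category names come from the slide; the revenue figures are made up.
hierarchy = {
    "London, U.K.": {"country": "United Kingdom", "region": "Europe"},
    "Manchester, U.K.": {"country": "United Kingdom", "region": "Europe"},
}
branch_revenue = {"London, U.K.": 700, "Manchester, U.K.": 300}


def roll_up_to(level):
    """Sum branch-level revenue into the categories of a higher level."""
    totals = {}
    for branch, revenue in branch_revenue.items():
        parent = hierarchy[branch][level]
        totals[parent] = totals.get(parent, 0) + revenue
    return totals


def drill_down(level, category):
    """List the branch-level values beneath one higher-level category."""
    return {
        branch: revenue
        for branch, revenue in branch_revenue.items()
        if hierarchy[branch][level] == category
    }


print(roll_up_to("country"))  # prints {'United Kingdom': 1000}
print(drill_down("region", "Europe"))
```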
  • Application Development Process
    • Plan measures and dimensions
    • Create the cube
    • Obtain the required data
    • Develop the PowerPlay model
    • Explore the cube data using PowerPlay
    Courtesy Anders Stjarne
  • Explorer and Reporter
    • PowerPlay offers two report modes:
      • Explorer: investigate; replace categories
      • Reporter: build custom reports; add categories
     Courtesy Anders Stjarne
  • Explorer Crosstab Report
    • The default Explorer crosstab report contains:
      • the first two dimensions in the rows and columns
      • values for the first measure
      • a summary row and column.
     [Figure: Explorer crosstab with labels Rows, Columns, Summary column, Summary row, Measures]
     Courtesy Anders Stjarne
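The layout of such a crosstab (two dimensions on rows and columns, one measure in the cells, plus a summary row and column) can be sketched as follows. This is not PowerPlay code; the years, product lines, and revenue values are hypothetical.

```python
# Hypothetical cells for two dimensions: (year, product line) -> revenue.
cells = {
    (2006, "Tents"): 10, (2006, "Stoves"): 4,
    (2007, "Tents"): 12, (2007, "Stoves"): 6,
}

row_labels = sorted({year for year, _ in cells})
col_labels = sorted({line for _, line in cells})

# Build the body of the crosstab, adding a summary column to each row.
table = []
for year in row_labels:
    values = [cells.get((year, line), 0) for line in col_labels]
    table.append([year] + values + [sum(values)])

# The summary row totals every value column, including the summary column.
column_totals = [sum(r[i] for r in table) for i in range(1, len(col_labels) + 2)]
table.append(["Total"] + column_totals)

for out_row in [["", *col_labels, "Total"], *table]:
    print("\t".join(str(v) for v in out_row))
```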
  • PowerPlay Toolbar and Menus
    • You can access commonly used features on the PowerPlay toolbar.
    • PowerPlay menus offer extended features.
    • Right-click a report to view and use the available options from a shortcut menu.
     Courtesy Anders Stjarne
  • The Dimension Line
    • Use the dimension line to:
      • filter data
      • navigate dimensions and change measures
      • view the current level.
     Courtesy Anders Stjarne
  • Dimension Viewer 
    • The dimension viewer is used to view the content and navigation paths of a selected cube, and the cube path.
    • The toolbox buttons provide access to commonly used features.
    [Figure: dimension viewer showing the toolbox, cube path, and measures; Dimension = Locations, Level 1 = States (Category = CA), Level 2 = Cities (Category = San Diego)]
    Courtesy Anders Stjarne
  • PowerPlay File Extensions
    • .ppr, .ppx, .pdf for reports
    • .mdc for cubes
    Courtesy Anders Stjarne
  • Basic OLAP Operations
      • Selection (Filter) – within the range of a dimension
      • Scope – the range on a dimension
      • Slice – a two dimensional ‘page’ from the cube
      • Dice – chopping up along the dimensions
      • Drill down analysis - to the detail beneath summary data
      • Rollup/ Consolidate
      • Rotate (Pivot) – change dimension orientation
        • Swap rows and columns
        • Swap on or off
        • Change nesting order
      • Reach Through – to the source data detail
      • Calculations / Derivation formulas on the measured facts
        • Ratios, Rankings, etc.
        • E.g., NetSales = GrossSales – Cost; NetSales = GrossSales*(1 - Margin)
    REFS: INMON, Building, Ch. 7, p. 243; White, “MOLAP vs ROLAP,” (B&A-15)
    Courtesy Anders Stjarne
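Two of the operations listed above, rotate (swapping rows and columns) and a ranking calculation, can be sketched on a small crosstab. The region and quarter labels and all values are hypothetical.

```python
# Hypothetical crosstab: rows are regions, columns are quarters.
row_labels = ["East", "West"]
col_labels = ["Q1", "Q2", "Q3"]
values = [
    [10, 20, 30],
    [5, 15, 45],
]

# Rotate (pivot): swap rows and columns by transposing the grid.
rotated_rows, rotated_cols = col_labels, row_labels
rotated = [list(column) for column in zip(*values)]

# Calculation on the measure: rank regions by their total, largest first.
totals = {label: sum(row) for label, row in zip(row_labels, values)}
ranking = sorted(totals, key=totals.get, reverse=True)

print(rotated)  # prints [[10, 5], [20, 15], [30, 45]]
print(ranking)  # prints ['West', 'East']
```

Rotation changes only the orientation of the view; the underlying cell values and totals are the same before and after.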
  • Advanced OLAP Operations
    • Trend analysis - over broad vistas of time
      • handling time series data, time calculations
    • Key ratio indicator measurement and tracking
    • Comparisons - present to: past, plan, and others
      • competitive market analysis
    • Problem monitoring - of variables within control limits
    • Alerts and Event-Driven Agent Processing
    Courtesy Anders Stjarne
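Problem monitoring against control limits can be sketched as a simple range check that yields the periods needing an alert. The defect counts, the limits, and the function name `out_of_control` are all hypothetical; how the limits are chosen is outside the scope of this sketch.

```python
# Hypothetical monthly defect counts and assumed control limits.
defects = {"Jan": 12, "Feb": 14, "Mar": 31, "Apr": 13, "May": 4}
LOWER_LIMIT, UPPER_LIMIT = 5, 25


def out_of_control(series, lower, upper):
    """Return the periods whose value falls outside the control limits."""
    return [period for period, value in series.items() if not lower <= value <= upper]


alerts = out_of_control(defects, LOWER_LIMIT, UPPER_LIMIT)
print(alerts)  # prints ['Mar', 'May']
```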
  • OLAP Pros and Cons
  • On-Line Analytical Processing
    • Strengths of OLAP
    • Powerful visualization ability via GUI
    • Fast, interactive response times
    • Analysis of time series
    • Deductive discovery of clusters/exceptions
    • Many OLAP products available and integrated to DB products
  • On-Line Analytical Processing
    • Weaknesses of OLAP
    • Does not handle continuous variables
    • Does not automatically discover patterns and models
    • Generation of a hypercube requires some training and experience
    • Hypercube generation and update - MOLAP vs. ROLAP
  • On-Line Analytical Processing
    • Products and Suppliers
    • PC OLAP
      • PowerPlay (Cognos)
    • High-end ROLAP
      • DSS Agent (MicroStrategy)
      • InfoBeacon (Platinum Technology)
    • High-end MOLAP
      • Accumate (Kenan)
      • Oracle Express (Oracle)
      • Wired/ESSBASE (AppSource/Arbor Software)
  • Tutorial
    • Cognos Transformer and PowerPlay
    • Star Schema – http://www.ciobriefings.com/whitepapers/StarSchema.asp
  • THE END
  • Codd’s 18 Rules for OLAP
      • Multidimensional Conceptual View (#1)
      • Intuitive data manipulation (#10)
      • Accessibility (#3) – OLAP server engine as middleware
      • Batch Extraction & Interpretive (on the fly) – implies hybrid
      • OLAP Analysis Models – categorical, exegetical, contemplative, goal seeking
      • Client-Server Architecture (#5)
      • Transparency (#2)
      • Multi-User Support (#8) – concurrent access, and update, with security
      • Treatment of Non-Normalized Data
      • Storing OLAP Results separate from Source Data
      • Extraction of Missing Values – missing(NULL) distinct from zero
      • Treatment of Missing Values – excluded from statistical calculations
      • Flexible Reporting (#11) – laying out dimensions in any way
      • Uniform Reporting Performance (#4) – not vary by #dimensions, or size
      • Automatic Adjustment of Physical Level (#7) – adjust for sparsity, size
      • Generic Dimensionality (#6) – all dimensions treated uniformly
      • Unlimited Dimensions & Aggregation Levels (#12)
      • Unrestricted Cross-Dimensional Operations (#9)
    Courtesy Anders Stjarne