Business Intelligence SSI Data Mining - Markov Chain


    1. I-DigitalTEK: Agile Development IT Solutions for Growing Business. Business Intelligence ETL & SSIS package: “Data Mining: Markov State Transition”
    2. SQL Backroom Architecture
      • Data Flow and Data Transformation Services: Microsoft Data Source Views are objects:
        • FTP, MSMQ, Web Services, and SQL tasks.
        • XML Schema Definition (XSD) validation, XSLT transformation, and XPath queries.
        • Communicate with connection managers.
        • Provide an abstraction between different data access mechanisms.
        • Move data into or out of data flow buffers.
        • Connections can be created to various sources, such as Microsoft Analysis Services, databases, network connections, the file system (text or Excel files), or an SMTP mail server.
        • Containers: enumerators iterate over files, generic collections, schema rowsets, or statically defined collections of strings.
    3. SQL Backroom Architecture
      • Wizard and Slowly Changing Dimensions:
        • A six-step process that defines the additional data flow and runtime behavior for the slowly changing dimension.
      • Data Cleaning Component:
        • Cleans up data before pushing it into a ROLAP or MOLAP store.
        • The Fuzzy Grouping transform detects similarities between incoming records to find what appear to be duplicate rows.
        • The Term Extraction and Term Lookup transforms make it possible to extract keywords from documents and categorize them by frequency of occurrence.
      • SSIS Programming and Customization:
        • The Script Component provides template functions that make it easy to add adapter or transformation functionality to a data flow.
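
The duplicate detection described for Fuzzy Grouping can be illustrated with a small sketch. It is not the algorithm SSIS uses internally; it assumes a simple character-based similarity from Python's standard `difflib` module, a hypothetical threshold of 0.8, and made-up customer names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_group(records, threshold=0.8):
    """Greedily put each record into the first group whose canonical (first)
    member is at least `threshold` similar; otherwise start a new group."""
    groups = []
    for record in records:
        for group in groups:
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            groups.append([record])
    return groups


# Hypothetical incoming rows; the threshold is an assumption for illustration.
customers = ["Jon Smith", "John Smith", "Acme Corp", "ACME Corp.", "Zed Ltd"]
groups = fuzzy_group(customers)
```

Each resulting group holds rows that look like variants of one entity; the first member acts as the canonical value that a cleaning step might keep.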
    4. SQL Front Room Architecture
      • Reports Layout:
        • SSRS can generate reports in various formats, such as HTML, XML, and Excel.
        • Multiple data sources and multiple data sets.
        • Multidimensional Expressions (MDX) query parameters, an MDX query designer, Data Mining Extensions (DMX), and multi-valued parameters.
      • Querying XML:
        • The connection string points to either a web service URL or an XML document.
      • Client-side reporting with SSRS:
        • SSRS delivers reports to the end user through subscriptions.
      • Security Model
    5. Data Mining and OLAP Cube
      • Data Definition Language:
        • DDL is used to define and alter the data models.
        • The root object of the model is the Database, sometimes called the Catalog.
      • Dimension Hierarchies, Linked Group, Key Performance Indicators:
        • Key Performance Indicators (KPIs).
        • The Unified Dimensional Model (UDM) combines relational and analytical models by presenting a combined view of various data sources.
        • Graphical query builders for Online Analytical Processing.
        • The Data Source View object makes it possible to create named columns and views on top of the relational tables.
    6. Data Mining and OLAP Cube
      • ROLAP, MOLAP, or HOLAP:
        • After creating a DSV (Data Source View), the Architect has to define how dimensions, dimension attributes, measures, and partitions map to DSV tables and columns. Star and snowflake schemas are the most commonly used relational schemas in data warehouse reporting. OLTP systems are not well suited to analyzing data, while OLAP systems are designed specifically for analysis and only read data. The Cube object stores no data and has no physical model; it contains only metadata and calculations.
        • Relational systems (ROLAP) with no data stored directly in the multidimensional database.
        • Multidimensional systems (MOLAP) where the data is loaded into the multidimensional database.
        • Hybrid systems (HOLAP) where the aggregated data is cached in the multidimensional database.
    7. Data Mining and OLAP Cube
      • Optimization and Performance:
        • The partition metadata file holds information about the slice, partition indexes, and aggregations.
      • Data provider and SOAP:
        • Applications apply updates to the data warehouse and to Analysis Services as part of a batch operation run either daily or weekly.
        • More expensive to run.
        • The client generates an XML stream as an XML for Analysis (XMLA) request wrapped in a SOAP envelope and sent via IIS.
        • An ADO.NET provider for Analysis Services (ADOMD.NET) offers an object model designed to simplify access to multidimensional data.
      • Data Security and Administration:
        • Connection and code access security enables an administrator to prevent external code from performing certain operations.
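
As an illustration of the XMLA request wrapped in a SOAP envelope, the sketch below builds an XMLA `Execute` request with Python's standard `xml.etree.ElementTree`. The MDX statement, the cube name `SalesCube`, and the catalog name `SalesDB` are hypothetical; sending the request over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_xmla_execute(statement: str, catalog: str) -> str:
    """Wrap an MDX statement in an XMLA Execute request inside a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    # Command carries the statement to run.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement
    # Properties name the database (catalog) the statement runs against.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    prop_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(prop_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical cube and catalog names, for illustration only.
request = build_xmla_execute("SELECT FROM [SalesCube]", "SalesDB")
```

The resulting string is the payload a client would POST to the Analysis Services HTTP endpoint hosted in IIS.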
    8. Data Mining
      • “Data mining is the art form specific to analyzing data and finding hidden patterns by the use of automatic or semiautomatic means.”
      • Built-in algorithms: Naïve Bayes (uses conditional probabilities to create predictive rules), Decision Trees (classification, regression, or association), Time Series, Clustering and Sequence Clustering for market basket analysis, Association Rules, and a Neural Network.
    9. Markov Chain Overview
      • Markov Property:
        • “Given the present state, future states are independent of the past states; future states are reached through a probabilistic process rather than a deterministic one.”
        • Recurrence (persistence) versus transience.
        • Aperiodic versus periodic.
        • Equilibrium distribution & transition matrix.
        • Bernoulli scheme.
        • Markov chain Monte Carlo (MCMC).
      • States (patterns) are identifiable discrete groupings of data (3 to 10 states).
      • Chi-square analysis and validation of transition matrices.
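
The notions listed on this slide can be written compactly. The notation is an assumption added here, not taken from the slides: $X_t$ is the state at step $t$, $P = (p_{ij})$ the transition matrix, $\pi$ the equilibrium distribution, and $O_{ij}$, $E_{ij}$ the observed and expected transition counts.

```latex
% Markov property: the next state depends only on the present state
\Pr(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0)
  = \Pr(X_{t+1} = j \mid X_t = i) = p_{ij}

% Transition matrix: every row is a probability distribution
p_{ij} \ge 0, \qquad \sum_{j} p_{ij} = 1 \quad \text{for every } i

% Equilibrium distribution: unchanged by one more step
\pi P = \pi, \qquad \sum_{i} \pi_i = 1

% Pearson chi-square statistic for validating a transition matrix
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```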
    10. Initial Transition Matrix
      • SQL Task: aggregate count over a self-join (self Cartesian product) that pairs each observation at time t with the one at t-1, grouped by previous state and current state.
      • Container: generic Array[,] type.
      • Observed probability: transition count divided by the total number of observations, grouped by state.
      • Expected probability: number of states over number of observations, or derived by Bayesian inference.
      • Validate the initial matrix with a chi-square test of observed versus expected.
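
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the SSIS package itself: the state sequence is made up, and the expected counts assume that the next state is independent of the current one (row total times column total over n), which is only one possible reading of "expected" on the slide.

```python
from collections import Counter


def transition_counts(states, labels):
    """Count transitions by pairing each observation at t-1 with the one at t."""
    pairs = Counter(zip(states[:-1], states[1:]))
    return [[pairs[(a, b)] for b in labels] for a in labels]


def observed_probabilities(counts):
    """Normalize each row of counts so it sums to 1 (empty rows stay at 0)."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


def chi_square(counts):
    """Pearson chi-square statistic against the assumed null hypothesis that
    the next state is independent of the current one."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical sequence of discretized states, ordered by time.
sequence = ["up", "up", "down", "up", "flat", "down", "down", "up", "up", "flat"]
labels = ["down", "flat", "up"]

counts = transition_counts(sequence, labels)
probabilities = observed_probabilities(counts)
statistic = chi_square(counts)
```

With k states, the statistic would be compared against a chi-square critical value with (k-1)² degrees of freedom; a sample this small is far too short for the approximation to be reliable and is used only to show the mechanics.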
    11. Equilibrium Transition Matrix
      • Bluebit .NET Matrix Library (.dll), 64-bit benchmark.
      • Foreach iterator into Matrix.
      • Matrix.multiply() until isIdentical is true.
      • At equilibrium every row is equal, so the current state is independent of the starting state.
      • Second-order dependency.
      • Law of Large Numbers and convergence.
      • Periodicity and Fibonacci numbers.
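
The "multiply until identical" loop can be sketched as follows. The slides use a .NET matrix library; this version uses plain Python lists instead, and the tolerance, step limit, and 2×2 example matrix are assumptions chosen for illustration.

```python
def multiply(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rows_identical(m, tol=1e-10):
    """True when every row matches the first row within `tol`."""
    return all(abs(x - y) <= tol for row in m[1:] for x, y in zip(row, m[0]))


def equilibrium(p, max_steps=10_000):
    """Raise `p` to successive powers until all rows agree, then return
    that common row as the equilibrium distribution."""
    power = p
    for _ in range(max_steps):
        if rows_identical(power):
            return power[0]
        power = multiply(power, p)
    raise ValueError("no convergence: the chain may be periodic or reducible")


# Hypothetical two-state transition matrix; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = equilibrium(P)
```

For this matrix the common row approaches (5/6, 1/6), which satisfies πP = π. A periodic chain such as [[0, 1], [1, 0]] never reaches identical rows, which is why the loop needs a step limit.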
    12. Markov Application
      • Random Walk & Financial Market.
      • Decision Support System.
      • Testing.
      • Computer Games.
      • Text Generators.
      • Marketing & Sales.
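
As one concrete instance of the applications listed above, a text generator treats each word as a state and samples the next word from those observed after it. The corpus, seed, and function names below are hypothetical and serve only as a sketch.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, sampling each next word in proportion
    to how often it followed the current word; stop early at a dead end."""
    output = [start]
    while len(output) < length and chain.get(output[-1]):
        output.append(rng.choice(chain[output[-1]]))
    return output


# Hypothetical corpus; a fixed seed keeps the walk reproducible.
corpus = "the cube reads the data and the cube serves the report".split()
chain = build_chain(corpus)
sentence = generate(chain, "the", 8, random.Random(42))
```

Because successors are stored with repetition, frequent transitions are sampled more often, which is the same idea as the observed transition probabilities in the earlier slides.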
    13. Reference Material
      • I-DigitalTek - Contact
      • Microsoft SQL Server 2005 Analysis Services, by Edward Melomed, Irina Gorbach, Alexander Berger, and Py Bateman
      • Microsoft SQL Server 2005 Integration Services, by Kirk Haselden
      • Microsoft SQL Server 2005 Reporting Services, by Michael Lisin and Jim Joseph
      • Data Mining with SQL Server 2005, by ZhaoHui Tang and Jamie MacLennan
      • The Microsoft Data Warehouse Toolkit, by Joy Mundy, Warren Thornthwaite, and Ralph Kimball

    + IDIGITALTEK, 10 months ago
