2008 Data Mining Analysis


Data Integration, Profiling, and Mining with the Microsoft Platform, presented 11/2008

Published in: Technology
  • 2008 Data Mining Analysis

    1. 1. Tyler Chessman ( [email_address] ) Technology Specialist Microsoft Corporation
    2. 2. <ul><li>Data Integration and Profiling </li></ul><ul><li>(Quick) Introduction to Data Mining </li></ul><ul><li>Data Mining with Office 2007 </li></ul><ul><ul><li>Excel Table Analysis Tool </li></ul></ul><ul><ul><li>Excel Data Mining Client </li></ul></ul>
    3. 3. END USER TOOLS AND PERFORMANCE MANAGEMENT APPS: Excel, PerformancePoint Server. BI PLATFORM: SQL Server Reporting Services, SQL Server Analysis Services, SQL Server DBMS, SQL Server Integration Services. SharePoint Server. DELIVERY: Reports, Dashboards, Excel Workbooks, Analytic Views, Scorecards, Plans.
    4. 4. <ul><li>Enterprise ETL platform </li></ul><ul><ul><li>High performance </li></ul></ul><ul><ul><li>High scale </li></ul></ul><ul><ul><li>More trustworthy and reliable </li></ul></ul><ul><li>Best in class usability </li></ul><ul><ul><li>Rich development environment </li></ul></ul><ul><ul><li>Source control </li></ul></ul><ul><ul><li>Visual debugging of control flow and data </li></ul></ul><ul><ul><li>Great range of transforms out-of-the-box </li></ul></ul><ul><li>Highly extensible </li></ul><ul><ul><li>Custom tasks </li></ul></ul><ul><ul><li>Custom enumerations </li></ul></ul><ul><ul><li>Custom transformations </li></ul></ul><ul><ul><li>Custom data sources </li></ul></ul>
    5. 5. <ul><li>Integrated profiling </li></ul><ul><ul><li>Data Flow Task </li></ul></ul><ul><ul><li>Profile any table </li></ul></ul><ul><ul><li>Output to file </li></ul></ul><ul><li>External Viewer </li></ul><ul><ul><li>Profile per column </li></ul></ul><ul><ul><ul><li>Outliers </li></ul></ul></ul><ul><ul><ul><li>Candidate keys </li></ul></ul></ul><ul><ul><ul><li>Value distribution </li></ul></ul></ul><ul><ul><ul><li>Patterns </li></ul></ul></ul>
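The per-column profile items on this slide (candidate keys, value distribution, patterns) can be illustrated outside SSIS. The following is a minimal Python sketch, not the SSIS profiling implementation; the `profile_column` helper, the sample table, and its column names are all hypothetical.

```python
import re
from collections import Counter

def profile_column(values):
    """Return a small profile for one column of string values."""
    counts = Counter(values)
    # Map every digit to 9 and every ASCII letter to A to expose each value's shape.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in values
    )
    return {
        "distinct": len(counts),
        # A column is a candidate key when no value repeats.
        "is_candidate_key": len(counts) == len(values),
        "distribution": dict(counts),
        "patterns": dict(patterns),
    }

# Hypothetical sample table: column name -> list of values.
table = {
    "customer_id": ["C001", "C002", "C003", "C004"],
    "state": ["TX", "TX", "CA", "TX"],
}
profiles = {name: profile_column(col) for name, col in table.items()}
```

Here `customer_id` would be reported as a candidate key with a single pattern, while `state` would show a skewed value distribution.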
    6. 6. <ul><li>SSIS (Simple Import) </li></ul><ul><li>Profiling data </li></ul>
    7. 8. <ul><li>Mining </li></ul><ul><ul><li>Act of excavation in the earth from which ore or minerals can be extracted </li></ul></ul><ul><li>Data Mining </li></ul><ul><ul><li>Act of excavation in the data from which patterns can be extracted </li></ul></ul><ul><ul><li>Alternative name: Knowledge discovery in databases (KDD) </li></ul></ul><ul><ul><li>Multiple disciplines: database, statistics, artificial intelligence </li></ul></ul><ul><ul><li>Rapidly maturing technology </li></ul></ul><ul><ul><li>Unlimited applicability </li></ul></ul>
    8. 9. <ul><li>Is this student going to go to college? </li></ul><ul><ul><li>Based on Gender, ParentIncome, ParentEncouragement, IQ, etc. </li></ul></ul><ul><ul><li>E.g., if ParentEncouragement=Yes and IQ>100, College=Yes </li></ul></ul><ul><ul><li>Classification (prediction) </li></ul></ul><ul><li>Similar questions: </li></ul><ul><ul><li>Is this a spam email? (spam filtering) </li></ul></ul><ul><ul><li>How good/bad is your credit? (credit scoring) </li></ul></ul><ul><ul><li>Recognition of hand-written letters (pen recognition) </li></ul></ul><ul><ul><li>What is this gene like? (bioinformatics) </li></ul></ul><ul><ul><li>Does this person behave like a terrorist? (TIA) </li></ul></ul>
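The example rule on this slide can be written as a tiny classifier. This is only a sketch of that single rule, not a trained model; the function name and the sample student records are hypothetical, and the slide does not say what is predicted when the rule does not fire, so the fallback of "No" is an assumption.

```python
def predict_college(student):
    """Apply the slide's example rule: ParentEncouragement=Yes and IQ>100 -> College=Yes."""
    if student["ParentEncouragement"] == "Yes" and student["IQ"] > 100:
        return "Yes"
    # Assumption: the slide gives no rule for other cases, so default to "No".
    return "No"

# Hypothetical student records.
students = [
    {"ParentEncouragement": "Yes", "IQ": 115},
    {"ParentEncouragement": "Yes", "IQ": 95},
    {"ParentEncouragement": "No", "IQ": 130},
]
predictions = [predict_college(s) for s in students]
```

A classification algorithm such as a decision tree would learn rules of this shape from historical data rather than having them written by hand.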
    9. 10. <ul><li>What is the age of a person? </li></ul><ul><ul><li>Based on Hobby, MaritalStatus, NumberOfChildren, Income, HouseOwnership, NumberOfCars, … </li></ul></ul><ul><ul><li>E.g., If MaritalStatus=Yes, Age = 20+4*NumberOfChildren+0.0001*Income+… </li></ul></ul><ul><ul><li> Regression (prediction) </li></ul></ul><ul><li>Similar questions: </li></ul><ul><ul><li>What’s the sales amount of ice cream next month? (sales prediction) </li></ul></ul><ul><ul><li>What’s the stock price of MSFT next week? (stock prediction) </li></ul></ul><ul><ul><li>What’s the income of a customer? (marketing) </li></ul></ul><ul><ul><li>What’s the life-time of a software bug? (bug tracking) </li></ul></ul>
    10. 11. <ul><li>Who are my Web visitors? </li></ul><ul><ul><li>Identify similar groups based on demographics, visiting patterns </li></ul></ul><ul><ul><li>E.g., daily news readers, email users, shoppers, short-stayers, etc. </li></ul></ul><ul><ul><li>Segmentation (clustering) </li></ul></ul><ul><li>Similar questions: </li></ul><ul><ul><li>Identify groups of genes (bioinformatics) </li></ul></ul><ul><ul><li>Identify groups of locations of cholera incidents in London (spatial data mining) </li></ul></ul><ul><ul><li>Identify groups of customers for merchants (Amazon, eBay, MSN, Walmart, Blockbuster, etc.) (target marketing) </li></ul></ul><ul><ul><li>Identify groups of documents (text categorization) </li></ul></ul>
    11. 12. <ul><li>What other products are purchased together with a digital camera? </li></ul><ul><ul><li>Based on previous purchases (shopping cart) </li></ul></ul><ul><ul><li>E.g., if a digital camera is purchased, flash memory, a battery, and a printer are also purchased. </li></ul></ul><ul><ul><li>A well-known urban legend: at Walmart, men in their 20s buy beer and diapers together on Fridays. </li></ul></ul><ul><ul><li>Association Analysis (recommendation, market basket analysis, collaborative filtering) </li></ul></ul><ul><li>Similar questions: </li></ul><ul><ul><li>What products should be recommended in online stores such as Amazon.com, Barnes & Noble, movie rental, wireless themes, etc.? </li></ul></ul><ul><ul><li>Which items should be displayed together in a store? </li></ul></ul><ul><ul><li>Which genes appear together in toxic mushrooms? </li></ul></ul>
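The market-basket idea above can be made concrete with the two standard measures, support and confidence. The following Python sketch counts item pairs in a handful of hypothetical shopping carts; the basket contents and function names are illustrative assumptions, and this is not the Association Rules algorithm shipped with SQL Server.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping carts.
baskets = [
    {"digital camera", "flash memory", "battery"},
    {"digital camera", "flash memory"},
    {"digital camera", "printer"},
    {"battery"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Count every unordered pair of items that shares a basket.
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[frozenset((a, b))] / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return pair_counts[frozenset((antecedent, consequent))] / item_counts[antecedent]
```

For example, `confidence("digital camera", "flash memory")` asks how often flash memory appears in carts that already contain a digital camera. The `confidence` function assumes the antecedent occurs in at least one basket.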
    12. 13. <ul><li>Could this network packet be from a virus attack? </li></ul><ul><ul><li>Predict likelihood of the network packet pattern </li></ul></ul><ul><ul><li>Anomaly detection (outlier detection) </li></ul></ul><ul><li>Similar questions: </li></ul><ul><ul><li>Are the hospital lab results normal? (adverse drug effect detection) </li></ul></ul><ul><ul><li>Is this credit transaction fraudulent? (fraud detection) </li></ul></ul><ul><ul><li>Does this person behave unusually, perhaps warranting a high level of security screening? (TIA) </li></ul></ul>
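One simple way to illustrate outlier detection is a z-score test: flag values that lie far from the mean in units of standard deviation. This is a swapped-in textbook technique chosen for brevity, not the method the slide or SQL Server uses; the packet sizes and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical packet sizes in bytes; the last one is unusually large.
packet_sizes = [500, 520, 510, 495, 505, 515, 490, 5000]
suspicious = find_outliers(packet_sizes)
```

With very small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.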
    13. 14. <ul><li>Classification </li></ul><ul><li>Regression </li></ul><ul><li>Segmentation </li></ul><ul><li>Association Analysis </li></ul><ul><li>Anomaly detection </li></ul><ul><li>Sequence Analysis </li></ul><ul><li>Time-series Analysis </li></ul><ul><li>Text categorization </li></ul><ul><li>Advanced insights discovery </li></ul><ul><li>Others </li></ul>
    14. 15. <ul><li>Decision Trees </li></ul><ul><li>Naïve Bayes </li></ul><ul><li>Clustering </li></ul><ul><li>Sequence Clustering </li></ul><ul><li>Association Rules </li></ul><ul><li>Neural Network </li></ul><ul><li>Time Series </li></ul><ul><li>Support Vector Machines </li></ul><ul><li>… </li></ul>
    15. 16. Algorithms: Decision Trees, Naïve Bayes, Clustering, Seq. Clustering, Time Series, Association Rules, Neural Network. Tasks: Classification, Regression, Segmentation, Assoc. Analysis, Anomaly Detect., Seq. Analysis, Time Series. Legend: √ - first choice, √ - second choice. [Algorithm-by-task matrix of √ marks; cell positions not recoverable from the transcript]
    16. 17. <ul><li>Data Mining Add-In for Excel – Table Analysis </li></ul>
    17. 18. <ul><li>Define the Business Problem </li></ul><ul><li>Prepare the Historical Data </li></ul><ul><li>Explore/Validate the Historical Data </li></ul><ul><li>Build the Data Mining Model(s) </li></ul><ul><li>Explore and Validate the Model(s) </li></ul><ul><li>Deploy and Update the Model(s) </li></ul>
    18. 19. Data Mining Management System (DMMS): Mining Model. Define a model; train the model (Training Data); test the model (Test Data); prediction using the model (Prediction Input Data).
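The define, train, test, predict cycle on this slide can be sketched with a deliberately trivial model. The class below is a hypothetical toy that predicts the most common label seen for one attribute value; it stands in for a real mining model only to show the four steps, and the attribute names and rows are made-up examples.

```python
from collections import Counter, defaultdict

class OneFeatureModel:
    """Toy model: predicts the most common target label seen for one input attribute value."""

    def __init__(self, feature, target):
        # Define the model: which column is the input and which is predicted.
        self.feature = feature
        self.target = target
        self.rules = {}
        self.default = None

    def train(self, rows):
        """Learn the majority label per feature value from the training data."""
        by_value = defaultdict(Counter)
        overall = Counter()
        for row in rows:
            by_value[row[self.feature]][row[self.target]] += 1
            overall[row[self.target]] += 1
        self.rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        self.default = overall.most_common(1)[0][0]

    def predict(self, row):
        """Predict a label; fall back to the overall majority for unseen values."""
        return self.rules.get(row[self.feature], self.default)

    def test(self, rows):
        """Return accuracy on held-out test data."""
        hits = sum(self.predict(r) == r[self.target] for r in rows)
        return hits / len(rows)

# Hypothetical training and test data.
training_data = [
    {"ParentEncouragement": "Yes", "College": "Yes"},
    {"ParentEncouragement": "Yes", "College": "Yes"},
    {"ParentEncouragement": "Yes", "College": "No"},
    {"ParentEncouragement": "No", "College": "No"},
    {"ParentEncouragement": "No", "College": "No"},
]
test_data = [
    {"ParentEncouragement": "Yes", "College": "Yes"},
    {"ParentEncouragement": "No", "College": "Yes"},
]

model = OneFeatureModel("ParentEncouragement", "College")  # define
model.train(training_data)                                 # train
accuracy = model.test(test_data)                           # test
prediction = model.predict({"ParentEncouragement": "Yes"}) # predict on new input
```

Keeping the test rows separate from the training rows mirrors the slide's distinction between Training Data and Test Data.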
    19. 20. <ul><li>Data Mining Add-In for Excel – Data Mining Client </li></ul>
    20. 21. <ul><li>SQL Server 2005 Data Mining Add-ins for Office 2007, May edition of SQL Server Magazine </li></ul><ul><li> www.sqlmag.com/Article/ArticleID/95451/sql_server_95451.html </li></ul><ul><li>Learn more about SQL Server 2008 http://www.microsoft.com </li></ul><ul><li>Join the SQL PASS community / Houston SQL Server User Group </li></ul><ul><li>http://www.sqlpass.org http://houston.sqlgroups.com </li></ul>