
Data Mining for Developers


  1. BIN06-IS Understanding the Data Mining Add-Ins for Excel 2007
     Lynn Langit, MSDN Developer Evangelist – Southern California
  2. Session Prerequisites
     • Working knowledge of SQL Server 2005 development
     • Understanding of OLAP concepts
     • Working knowledge of SQL Server 2005 Analysis Services development
     • Interest in, or basic knowledge of, data mining concepts
  3. Session Objectives and Agenda
     • Understand how to set up a development environment for working with the Excel 2007 Data Mining Add-ins
     • Understand the core functionality of the Data Mining Add-ins
     • Understand the advanced functionality of the Data Mining Add-ins
  4. What and Why Data Mining?
     [Figure: business insight (Presentation → Exploration → Discovery) plotted against the role of software (Passive → Interactive → Proactive), rising from canned reporting through ad-hoc reporting and OLAP to data mining and predictive analytics.]
  5. Data Mining Problems
  6. From Scenarios to Tasks
  7. From Tasks to Techniques
  8. Microsoft’s Predictive Analytics
     [Figure: audiences (Information Worker, BI Analyst, Data Mining Specialist, Application Developer) shown alongside Microsoft offerings: Data Mining Add-ins for the 2007 Microsoft Office system, Microsoft SQL Server 2005 Data Mining, SQL Server 2005 Business Intelligence Development Studio, Microsoft SQL Server 2005 Analysis Services, Data Mining SQL extensions (DMX), custom algorithms, and Microsoft Dynamics CRM Analytics Foundation.]
  9. Data Mining Add-ins for Office 2007
     • Table Analysis Tools for Excel 2007
     • Data Mining Client for Excel 2007
     • Data Mining Template for Visio 2007
  10. Microsoft Data Mining Lifecycle
     [Figure: the CRISP-DM phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment) arranged around the data and annotated with the Microsoft tools used along the way: Excel, SSAS (Data Mining), SSAS data source views (DSV), queries, SSIS, SSRS, and your own applications.]
  11. Lifecycle – Understand & Prepare
     [Diagram: the Explore Data and Clean Data tools of the Add-ins, running in Excel]
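Exploring a continuous column, such as Income, usually starts with a histogram. The slide does not say how the Explore Data tool chooses its buckets, so the following is only a minimal Python sketch that assumes equal-width buckets; the helper name `equal_width_bins` and the sample values are hypothetical.

```python
def equal_width_bins(values, bucket_count):
    """Count values into bucket_count equal-width ranges spanning min..max.

    Hypothetical helper; returns (bucket edges, counts per bucket).
    """
    if not values or bucket_count < 1:
        raise ValueError("need at least one value and one bucket")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bucket_count or 1  # all values equal: use width 1 to avoid dividing by zero
    counts = [0] * bucket_count
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bucket.
        index = min(int((v - lo) / width), bucket_count - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bucket_count + 1)]
    return edges, counts


incomes = [10, 20, 30, 40, 50]  # hypothetical sample, in thousands
print(equal_width_bins(incomes, 2))  # ([10.0, 30.0, 50.0], [2, 3])
```

A value that sits exactly on a boundary, such as 30 here, falls into the upper bucket.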
  12. Lifecycle – Prepare
     [Diagram: the Partition Data tool of the Add-ins, in Excel, connects through ADOMD.NET to SQL Server Analysis Services, which reads from the data source]
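Partitioning holds back part of the data so that a model can later be evaluated on rows it never saw during training. As a minimal Python sketch of that idea (not the add-in's own implementation), here is a seeded random split; the helper name `partition`, the 30% test share, and the seed are arbitrary assumptions.

```python
import random


def partition(rows, test_fraction=0.3, seed=42):
    """Shuffle rows reproducibly and split them into (training, testing) lists."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private RNG: same seed, same split
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


training, testing = partition(range(100))
print(len(training), len(testing))  # 70 30
```

Fixing the seed makes the split repeatable, so a model retrained later can be compared on exactly the same test rows.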
  13. Understand & Prepare Specifics
  14. Demo 1 – Explore / Clean / Partition Data
  15. Lifecycle – Model & Evaluate
     [Diagram: the modeling and evaluation tools of the Add-ins, in Excel, work against data mining models hosted in SQL Server Analysis Services, which reads from the data source]
  16. Modeling Specifics
  17. Demo 3 – Modeling
  18. Evaluation Specifics
  19. Demo 4 – Evaluation
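A common way to evaluate a classification model is to score the held-out test rows and compare predicted with actual values in a classification (confusion) matrix. The sketch below is illustrative Python only; the function names and the sample Risk labels are hypothetical.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count (actual, predicted) pairs; pairs with equal values are correct predictions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows whose predicted value matches the actual value."""
    if not actual:
        return 0.0
    matrix = classification_matrix(actual, predicted)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual)


actual = ["Low", "Low", "High", "High"]      # hypothetical test-partition labels
predicted = ["Low", "High", "High", "High"]  # hypothetical model output
print(classification_matrix(actual, predicted))
print(accuracy(actual, predicted))  # 0.75
```

The off-diagonal cells show which mistakes the model makes (here, one low-risk customer predicted as high risk), which a single accuracy figure hides.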
  20. Data Mining – Logical Model
     [Diagram: training data (DB data, client data, application data) flows into the data mining engine, which produces a mining model. The mining model and the data to predict (DB data, client data, application data, or “just one row”) then flow into the data mining engine, which returns predicted data.]
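The logical model separates training (many rows in, one model out) from prediction (the model plus new rows, possibly just one, in; predicted values out). A minimal Python sketch of that two-phase flow follows. The `FrequencyModel` class is a hypothetical stand-in for a real mining algorithm: it only counts target values per value of a single discrete input column, and the training rows are made up.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Hypothetical toy 'algorithm': remembers target-value counts per input value."""

    def __init__(self, input_col, target_col):
        self.input_col = input_col
        self.target_col = target_col
        self.counts = defaultdict(Counter)

    def train(self, rows):
        """Training phase: consume many rows and update the model's counts."""
        for row in rows:
            self.counts[row[self.input_col]][row[self.target_col]] += 1
        return self

    def predict(self, row):
        """Prediction phase: return the most frequent target value for this row's input."""
        seen = self.counts.get(row[self.input_col])
        if not seen:
            return None  # input value never seen in training
        return seen.most_common(1)[0][0]


training_rows = [  # hypothetical training data
    {"Profession": "Engineer", "Risk": "Low"},
    {"Profession": "Engineer", "Risk": "Low"},
    {"Profession": "Engineer", "Risk": "High"},
    {"Profession": "Student", "Risk": "High"},
]
model = FrequencyModel("Profession", "Risk").train(training_rows)
print(model.predict({"Profession": "Engineer"}))  # "just one row" -> Low
```

Once trained, the model no longer needs the training rows, which is why a single new row is enough to get a prediction.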
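  21. Data Mining – Physical Model
     [Diagram: an Analysis Services server hosts the mining model and the data mining algorithm and reads from the data source. BI Dev Studio (Visual Studio) deploys models to the server; your application, with its app data, queries them over OLE DB, ADOMD, or XMLA.]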
  22. Architecture
     [Diagram: Excel, with the modeling and query tools of the Add-ins, connects to SQL Server Analysis Services, which hosts the data mining models and reads from the data source]
  25. Configuration
     • Model creation/management
        • Database administrators
        • Session mining models
     • Model application
        • Permissions on models
        • Permissions on data sources
  26. Deployment
     • Browse
        • Copy to Excel
        • Drillthrough
     • Query
        • Default
        • Advanced
     • Excel Services
     • Manage models and structures
        • Export/Import
        • Rename
     • Connection
        • Database
        • Trace
  27. Advanced Techniques – DMX
  28. Excel Functions*
     • DMPREDICTTABLEROW(Connection, ModelName, PredictionResult, TableRowRange [, string CommaSeparatedColumnNames])
     • DMPREDICT(Connection, Model, PredictionResult, Value1, Name1 [, ..., Value32, Name32])
     • DMCONTENTQUERY(Connection, Model, PredictionResult [, WhereClause])
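DMPREDICT takes the input case as up to 32 value/name pairs. How the add-in turns those pairs into a server query is not shown here; one equivalent in DMX is a singleton query with NATURAL PREDICTION JOIN, which matches input columns to model columns by name. The Python helper `singleton_query` below is a hypothetical sketch that builds such a query string; the model and column names are borrowed from the CreditRisk example, and the quote-doubling in `dmx_literal` assumes SQL-style string escaping.

```python
def dmx_literal(value):
    """Render a Python value as a DMX literal (assumes SQL-style quote doubling)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def singleton_query(model, prediction_result, pairs):
    """Build a DMX singleton prediction query from (value, name) pairs, in DMPREDICT's order."""
    if not 1 <= len(pairs) <= 32:
        raise ValueError("expected between 1 and 32 value/name pairs")
    columns = ", ".join(f"{dmx_literal(value)} AS [{name}]" for value, name in pairs)
    return (f"SELECT {prediction_result} FROM [{model}] "
            f"NATURAL PREDICTION JOIN (SELECT {columns}) AS t")


print(singleton_query("CreditRisk", "Predict([Risk])",
                      [("M", "Gender"), (50000, "Income"), ("Engineer", "Profession")]))
```

Because NATURAL PREDICTION JOIN matches on names, the Name arguments must spell the model's column names exactly.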
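  29. Data Mining Extensions (DMX)
        -- Run each statement separately; [Customer DB] is a placeholder data source name.
        -- Define the model
        CREATE MINING MODEL CreditRisk (
            CustID LONG KEY,
            Gender TEXT DISCRETE,
            Income LONG CONTINUOUS,
            Profession TEXT DISCRETE,
            Risk TEXT DISCRETE PREDICT
        ) USING Microsoft_Decision_Trees
        -- Train it from the Customers table
        INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk)
        OPENQUERY([Customer DB],
            'SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers')
        -- Predict risk, and its probability, for new customers
        SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
        FROM CreditRisk
        PREDICTION JOIN OPENQUERY([Customer DB],
            'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS NewCustomers
        ON CreditRisk.Gender = NewCustomers.Gender
        AND CreditRisk.Income = NewCustomers.Income
        AND CreditRisk.Profession = NewCustomers.Profession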
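The ON clause of a PREDICTION JOIN maps each model input column to a column of the source query; NATURAL PREDICTION JOIN does the same by matching names. As a rough Python sketch of those semantics (not how Analysis Services executes the join), the helper below applies the mapping to each source row and calls a prediction function. The `prediction_join` name, the `YearlyIncome` column, and the income rule in `toy_predict` are hypothetical; the rule merely stands in for the trained Microsoft_Decision_Trees model.

```python
def prediction_join(source_rows, on, predict, key):
    """Emulate SELECT key, Predict, PredictProbability FROM model PREDICTION JOIN source ON ...

    on maps model column -> source column, e.g. {"Income": "YearlyIncome"}.
    predict takes one case (a dict keyed by model columns) and returns (value, probability).
    """
    results = []
    for row in source_rows:
        case = {model_col: row[source_col] for model_col, source_col in on.items()}
        value, probability = predict(case)
        results.append((row[key], value, probability))
    return results


def toy_predict(case):
    """Hypothetical stand-in for a trained model: low income means high risk."""
    return ("High", 0.8) if case["Income"] < 30000 else ("Low", 0.9)


new_customers = [  # hypothetical source rows; note the differently named income column
    {"CustomerID": 1, "Gender": "F", "YearlyIncome": 25000, "Profession": "Student"},
    {"CustomerID": 2, "Gender": "M", "YearlyIncome": 80000, "Profession": "Engineer"},
]
mapping = {"Gender": "Gender", "Income": "YearlyIncome", "Profession": "Profession"}
for result in prediction_join(new_customers, mapping, toy_predict, key="CustomerID"):
    print(result)
```

An explicit ON mapping is what lets source columns with different names, such as YearlyIncome here, feed the model's Income column.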
  31. DMX Column Expressions
     • Predictable columns
     • Source data columns
     • Functions
        • Predict
           • The “workhorse”
           • Discrete scalar values
           • Continuous scalar values
           • Associative nested tables
           • Sequence nested tables
           • Time series
           • Overloaded to
              • PredictAssociation
              • PredictSequence
              • PredictTimeSeries
        • PredictProbability
        • PredictSupport
        • PredictHistogram
        • Cluster
        • ClusterProbability
        • GetNodeId
        • IsInNode
     • Arithmetic operators
     • Stored procedures
     • Subselect
        • Select from nested tables
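For a discrete predictable column, Predict, PredictProbability, PredictSupport, and PredictHistogram can all be read off the distribution of target values that the model associates with a case (for a decision tree, roughly the training cases in the matching leaf). The Python below is only a sketch of that intuition with hypothetical counts. The server's own functions may adjust probabilities and return additional columns, so its numbers need not match exactly.

```python
from collections import Counter


def predict_histogram(counts):
    """List (value, support, probability) tuples, most likely value first."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [(value, support, support / total) for value, support in counts.most_common()]


def predict(counts):
    """Most likely value, or None if the distribution is empty."""
    histogram = predict_histogram(counts)
    return histogram[0][0] if histogram else None


def predict_support(counts):
    """Number of training cases behind the most likely value."""
    histogram = predict_histogram(counts)
    return histogram[0][1] if histogram else 0


def predict_probability(counts):
    """Share of training cases behind the most likely value."""
    histogram = predict_histogram(counts)
    return histogram[0][2] if histogram else 0.0


leaf = Counter({"Low": 30, "High": 10})  # hypothetical target counts for one tree leaf
print(predict(leaf), predict_support(leaf), predict_probability(leaf))  # Low 30 0.75
```

Support shows how much evidence stands behind a probability: 0.75 from 40 cases is more trustworthy than 0.75 from 4.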
  32. Data Mining Interfaces – XMLA ++
     [Diagram: the Analysis Server (msmdsrv.exe) hosts the OLAP and data mining server, .NET stored procedures, and Microsoft and third-party algorithms. Clients (C++, VB, and .NET apps, or any app on any platform or device) reach it through OLE DB for OLAP/DM, ADO/DSO, ADOMD.NET, or AMO, all carried by XMLA over TCP/IP or over HTTP, including across a WAN.]
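The diagram shows XMLA, over TCP/IP or HTTP, as the protocol beneath the client libraries: a DMX statement travels inside a SOAP Execute request. The minimal Python sketch below uses only the standard library to build such a request body. The catalog name `CreditDB` is hypothetical, and the endpoint and transport are deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def xmla_execute(statement, catalog):
    """Wrap a DMX (or MDX) statement in a SOAP envelope carrying an XMLA Execute request."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("", XMLA_NS)  # serialize XMLA elements in the default namespace
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement  # ElementTree escapes < and &
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    property_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(property_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    return ET.tostring(envelope, encoding="unicode")


request = xmla_execute("SELECT Predict([Risk]) FROM [CreditRisk] NATURAL PREDICTION JOIN "
                       "(SELECT 'M' AS [Gender]) AS t", catalog="CreditDB")
print(request)
```

Building the envelope with an XML library rather than string concatenation means characters such as `<` and `&` in a DMX statement are escaped automatically.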
  33. Summary
     • Familiar client for SQL Server Data Mining
        • Model creators
        • Model users
        • Excel data or server data
     • Implement the full DM cycle
        • Data understanding
        • Data preparation
        • Modeling
        • Validation
        • Deployment
  34. Resources
     • Technical Communities, Webcasts, Blogs, Chats & User Groups
     • Microsoft Developer Network (MSDN) & TechNet
     • Trial Software and Virtual Labs
     • Microsoft Learning and Certification
     • SQL Server Data Mining
  35. BI Resources from Lynn Langit
     • Foundations of SQL Server 2005 Business Intelligence, published by Apress in April 2007
     • Blog:
  36. Q&A
  37. Related Content
     • BIN302 Microsoft Office Excel and Analysis Services: An In-Depth Look at Integration
     • OFF312 Using Data in Excel Solutions Built with Visual Studio Tools for the Office System
  38. © 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.