Data Mining for Developers

Data Mining for Developers Presentation Transcript

  • 1. BIN06-IS Understanding the Data Mining Add-Ins for Excel 2007
    • Lynn Langit, MSDN Developer Evangelist, Southern California
    • http://blogs.msdn.com/SoCalDevGal
  • 2. Session Prerequisites
    • Working experience as a SQL Server 2005 developer
    • Understanding of OLAP concepts
    • Working experience as a SQL Server 2005 Analysis Services developer
    • Interest in or basic knowledge of Data Mining concepts
  • 3. Session Objectives and Agenda
      • Understand how to set up a development environment for working with the Excel 2007 Data Mining Add-ins
      • Understand the core functionality of the Data Mining Add-ins
      • Understand the advanced functionality of the Data Mining Add-ins
  • 4. What and Why Data Mining?
    • Chart axes: Role of Software (Passive, Interactive, Proactive); Business Insight (Presentation, Exploration, Discovery)
    • Plotted: Canned reporting, Ad-hoc reporting, OLAP, Data mining
    • Label: Predictive Analytics
  • 5. Data Mining Problems
  • 6. From Scenarios to Tasks
  • 7. From Tasks to Techniques
  • 8. Microsoft’s Predictive Analytics
    • Audiences: Application Developer, Data Mining Specialist, Information Worker, BI Analyst
    • Products and tools:
      • Data Mining SQL extensions (DMX)
      • Microsoft Dynamics CRM Analytics Foundation
      • SQL Server 2005 Business Intelligence Development Studio
      • Microsoft SQL Server 2005 Analysis Services
      • Data Mining Add-ins for the 2007 Microsoft Office system
      • Microsoft SQL Server 2005 Data Mining
      • Custom Algorithms
  • 9. Data Mining Add-ins for Office 2007
    • Table Analysis Tools for Excel 2007
    • Data Mining Template for Visio 2007
    • Data Mining Client for Excel 2007
  • 10. Microsoft Data Mining Lifecycle (CRISP-DM, www.crisp-dm.org)
    • Phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
    • Tool labels, in the order they appear on the diagram: SSAS (Data Mining), Excel, SSAS (DSV), Query, Excel, SSIS, SSAS, SSRS, Excel, Your Apps, SSIS, SSAS, Excel
    • Center label: Data
  • 11. Lifecycle – Understand & Prepare
    • Diagram labels: Excel, Add-ins, Explore Data, Clean Data
  • 12. Lifecycle - Prepare
    • Diagram labels: Excel, Add-ins, Partition Data, ADOMD.Net, SQL Server Analysis Services, Data Source, Data
  • 13. Understand & Prepare specifics
  • 14. Demo 1 – Explore / Clean / Partition Data
  • 15. Lifecycle – Model & Evaluate
    • Diagram labels: Excel, Add-ins, Modeling & Evaluation, SQL Server Analysis Services, Data Source, Data, Mining Models
  • 16. Modeling Specifics
  • 17. Demo 3 – Modeling
  • 18. Evaluation Specifics
  • 19. Demo 4 – Evaluation
  • 20. Data Mining – Logical Model
    • Training: the Data Mining Engine builds a Mining Model from Training Data (DB data, Client data, Application data)
    • Prediction: the Data Mining Engine applies the Mining Model to Data To Predict (DB data, Client data, Application data, or “Just one row”) and returns Predicted Data
  • 21. Data Mining - Physical Model
    • Diagram labels: Analysis Services Server, Mining Model, Data Mining Algorithm, Data Source, Your Application, OLE DB/ADOMD/XMLA, Deploy, BI Dev Studio (Visual Studio), App Data
  • 22–24. Architecture (the same diagram repeated on three slides)
    • Diagram labels: Excel, Add-ins, Modeling, Query, SQL Server Analysis Services, Data Source, Data, Mining Models
  • 25. Configuration
    • Model Creation/Management
      • Database Administrators
      • Session Mining Models
    • Model Application
      • Permissions on models
      • Permissions on data sources
  • 26. Deployment
    • Browse
      • Copy to Excel
      • Drillthrough
    • Query
      • Default
      • Advanced
    • Excel Services
    • Manage models and structures
      • Export/Import
      • Rename
    • Connection
      • Database
      • Trace
  • 27. Advanced Techniques - DMX
  • 28. Excel Functions*
      • DMPREDICTTABLEROW(Connection, ModelName, PredictionResult, TableRowRange [, string CommaSeparatedColumnNames])
      • DMPREDICT(Connection, Model, PredictionResult, Value1, Name1 [, ..., Value32, Name32])
      • DMCONTENTQUERY(Connection, Model, PredictionResult [, WhereClause])
  • 29. Data Mining Extensions (DMX)
    • Define the model:
        CREATE MINING MODEL CreditRisk (
          CustID LONG KEY,
          Gender TEXT DISCRETE,
          Income LONG CONTINUOUS,
          Profession TEXT DISCRETE,
          Risk TEXT DISCRETE PREDICT
        ) USING Microsoft_Decision_Trees
    • Train the model:
        INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk)
        SELECT CustomerID, Gender, Income, Profession, Risk
        FROM Customers
    • Predict for new customers:
        SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
        FROM CreditRisk PREDICTION JOIN NewCustomers
          ON CreditRisk.Gender = NewCustomers.Gender
          AND CreditRisk.Income = NewCustomers.Income
          AND CreditRisk.Profession = NewCustomers.Profession
  • 30. CREATE MINING MODEL
        CREATE MINING MODEL MyModel (
          [CustID] LONG KEY,
          [Gender] TEXT DISCRETE,
          [Marital Status] TEXT DISCRETE,
          [Education] TEXT DISCRETE,
          [Home Ownership] TEXT DISCRETE PREDICT,
          [Age] LONG CONTINUOUS,
          [Income] DOUBLE CONTINUOUS
        ) USING Microsoft_Decision_Trees
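A model created this way is empty until it is trained. The following is a minimal sketch of the training step for MyModel, not the presenter's own code. It assumes a hypothetical Analysis Services data source named [Customer DS] and a hypothetical relational table Customers; the source column names are invented for illustration.

```dmx
// Train MyModel from a relational source.
// ASSUMPTIONS: [Customer DS], the Customers table, and its column
// names are hypothetical; substitute the names used on your server.
INSERT INTO MyModel
  ([CustID], [Gender], [Marital Status], [Education],
   [Home Ownership], [Age], [Income])
OPENQUERY([Customer DS],
  'SELECT CustomerID, Gender, MaritalStatus, Education,
          HomeOwnership, Age, Income
   FROM Customers')
```

The columns returned by the source query are bound to the listed model columns by position, so the two lists should be kept in the same order.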
  • 31. DMX Column Expressions
    • Predictable Columns
    • Source Data Columns
    • Functions
      • Predict
        • “Workhorse”
        • Discrete scalar values
        • Continuous scalar values
        • Associative nested tables
        • Sequence nested tables
        • Time Series
        • Overloaded to
          • PredictAssociation
          • PredictSequence
          • PredictTimeSeries
      • PredictProbability
      • PredictSupport
      • PredictHistogram
      • Cluster
      • ClusterProbability
      • GetNodeId
      • IsInNode
    • Arithmetic operators
    • Stored Procedure
    • Subselect
      • Select from nested tables
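Several of the prediction functions listed on slide 31 can be combined in one query. The sketch below runs a single-row ("singleton") prediction against the CreditRisk model from slide 29; the input values are made up for illustration and the output column aliases are arbitrary.

```dmx
// Singleton prediction against the CreditRisk model from slide 29.
// ASSUMPTION: the input values below are invented sample data.
SELECT
  Predict([Risk])            AS PredictedRisk,  // most likely Risk value
  PredictProbability([Risk]) AS Probability,    // probability of that value
  PredictSupport([Risk])     AS Support,        // training cases supporting it
  PredictHistogram([Risk])   AS Histogram       // nested table of all states
FROM CreditRisk
NATURAL PREDICTION JOIN
  (SELECT 'Female'   AS [Gender],
          52000      AS [Income],
          'Engineer' AS [Profession]) AS t
```

NATURAL PREDICTION JOIN matches input columns to model columns by name, which avoids writing an ON clause for a one-row input.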
  • 32. Data Mining Interfaces – XMLA ++
    • Server: Analysis Server (msmdsrv.exe) with OLAP, Data Mining, Server ADOMD.NET, .Net Stored Procedures, Microsoft Algorithms, Third Party Algorithms
    • Transport: XMLA Over TCP/IP, XMLA Over HTTP, WAN
    • DM Interfaces: OLEDB for OLAP/DM, ADO/DSO, AMO, ADOMD.NET
    • Clients: C++ App, VB App, .Net App, Any App (Any Platform, Any Device)
  • 33. Summary
    • Familiar client for SQL Server Data Mining
      • Model Creators
      • Model Users
      • Excel Data or Server Data
    • Implement the full DM Cycle
      • Data Understanding
      • Data Preparation
      • Modeling
      • Validation
      • Deployment
  • 34. Resources
    • Technical Communities, Webcasts, Blogs, Chats & User Groups: http://www.microsoft.com/communities/default.mspx
    • Microsoft Developer Network (MSDN) & TechNet: http://microsoft.com/msdn and http://microsoft.com/technet
    • Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx
    • Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
    • SQL Server Data Mining: http://www.sqlserverdatamining.com and http://www.microsoft.com/bi/bicapabilities/data-mining.aspx
  • 35. BI Resources from Lynn Langit
    • Book: Foundations of SQL Server 2005 Business Intelligence (Apress, April 2007)
    • Blog: http://blogs.msdn.com/SoCalDevGal
  • 36. Q&A
  • 37. Related Content
    • BIN302 Microsoft Office Excel and Analysis Services: An In-Depth Look at Integration
    • OFF312 Using Data in Excel Solutions Built with Visual Studio Tools for the Office System
  • 38. © 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.