Data Mining for Developers - Presentation Transcript
BIN06-IS Understanding the Data Mining Add-Ins for Excel 2007
Lynn Langit
MSDN Developer Evangelist – Southern California
http://blogs.msdn.com/SoCalDevGal
Session Prerequisites
Working experience as a SQL Server 2005 developer
Understanding of OLAP concepts
Working experience as a SQL Server 2005 Analysis Services developer
Interest in or basic knowledge of Data Mining concepts
Session Objectives and Agenda
Understand how to set up a development environment for working with Excel 2007 Data Mining Extensions
Understand the core functionality of the Data Mining extensions
Understand the advanced functionality of the Data Mining extensions
What and Why Data Mining?
Predictive Analytics
Role of Software: Passive, Interactive, Proactive
Business Insight: Presentation, Exploration, Discovery
Techniques plotted: Canned reporting, Ad-hoc reporting, OLAP, Data mining
Data Mining Problems
From Scenarios to Tasks
From Tasks to Techniques
Microsoft’s Predictive Analytics
Audiences: Application Developer, Data Mining Specialist, BI Analyst, Information Worker
Data Mining SQL extensions (DMX)
Microsoft Dynamics CRM Analytics Foundation
SQL Server 2005 Business Intelligence Development Studio
Microsoft SQL Server 2005 Analysis Services
Data Mining Add-ins for the 2007 Microsoft Office system
Microsoft SQL Server 2005 Data Mining
Custom Algorithms
Data Mining Add-ins for Office 2007
Table Analysis Tools for Excel 2007
Data Mining Template for Visio 2007
Data Mining Client for Excel 2007
Microsoft Data Mining Lifecycle
CRISP-DM (www.crisp-dm.org) phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
Tools shown around the cycle: SSAS (Data Mining), SSAS (DSV), SSIS, SSRS, Excel, Query, Your Apps
Center of the cycle: Data
Lifecycle – Understand & Prepare
Excel add-ins: Explore Data, Clean Data
Lifecycle - Prepare
Excel add-ins: Partition Data
Components: Excel (Add-ins), ADOMD.Net, SQL Server Analysis Services, Data Source (Data)
Understand & Prepare specifics
Demo 1 – Explore / Clean / Partition Data
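The partition step in the demo is done through the Excel add-in's user interface. As a rough illustration of what partitioning means, the sketch below randomly splits a set of rows into a training set and a testing set. The function name, the 70/30 split, the fixed seed, and the sample rows are all assumptions made for this example; they are not taken from the add-ins.

```python
import random


def partition_rows(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into (training, testing) lists.

    Hypothetical helper: the split ratio and seed are illustrative defaults.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for repeatable splits
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Made-up sample rows standing in for a customer table in a worksheet.
customers = [{"CustomerID": i, "Income": 20000 + 1000 * i} for i in range(10)]
training, testing = partition_rows(customers)
print(len(training), len(testing))
```

The training rows are used to build the mining model; the testing rows are held back so the model can later be evaluated on data it has not seen.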
Lifecycle – Model & Evaluate
Excel add-ins: Modeling & Evaluation
Components: Excel (Add-ins), SQL Server Analysis Services (Mining Models), Data Source (Data)
Modeling Specifics
Demo 3 – Modeling
Evaluation Specifics
Demo 4 – Evaluation
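The transcript does not list the evaluation specifics. One common way to evaluate a classification model is to compare its predictions on held-out test rows against the actual values in a classification (confusion) matrix. The sketch below shows that idea only; the function names and the sample labels are assumptions for illustration and do not reproduce the add-in's own output.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows where the predicted label equals the actual label."""
    if not actual:
        raise ValueError("need at least one row to compute accuracy")
    matrix = classification_matrix(actual, predicted)
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / len(actual)


# Made-up labels for five test rows.
actual_risk = ["High", "Low", "Low", "High", "Low"]
predicted_risk = ["High", "Low", "High", "High", "Low"]

matrix = classification_matrix(actual_risk, predicted_risk)
score = accuracy(actual_risk, predicted_risk)
print(matrix[("Low", "High")], score)
```

Off-diagonal cells such as (actual Low, predicted High) show which kinds of mistakes the model makes, which a single accuracy number hides.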
Data Mining – Logical Model
Training: Training Data (DB data, Client data, Application data) goes into the Data Mining Engine, which produces the Mining Model
Prediction: Data To Predict (DB data, Client data, Application data, or “Just one row”) plus the Mining Model go into the Data Mining Engine, which produces Predicted Data
Data Mining - Physical Model
Analysis Services Server: Mining Model, Data Mining Algorithm, Data Source
Your Application: connects through OLE DB / ADOMD / XMLA, supplies App Data
BI Dev Studio (Visual Studio): Deploy
Architecture
Excel add-ins: Modeling, Query
Components: Excel (Add-ins), SQL Server Analysis Services (Mining Models), Data Source (Data)
Data Mining Extensions (DMX)
CREATE MINING MODEL CreditRisk
  (CustID LONG KEY,
   Gender TEXT DISCRETE,
   Income LONG CONTINUOUS,
   Profession TEXT DISCRETE,
   Risk TEXT DISCRETE PREDICT)
USING Microsoft_Decision_Trees
INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk)
  SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers
SELECT NewCustomers.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN NewCustomers
  ON CreditRisk.Gender = NewCustomers.Gender
  AND CreditRisk.Income = NewCustomers.Income
  AND CreditRisk.Profession = NewCustomers.Profession
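The logical model slide mentions predicting for “Just one row”. In DMX this is usually written as a singleton query, where the input case is given inline instead of coming from a table. The sketch below builds such a query string for the CreditRisk model. The helper names and the sample customer values are assumptions for illustration, and escaping quotes by doubling them is also an assumption; the code only produces text and does not connect to Analysis Services.

```python
def dmx_literal(value):
    """Render a Python value as a DMX literal (hypothetical helper).

    Numbers are written as-is; everything else is single-quoted, with
    embedded quotes doubled (an assumption, following SQL convention).
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def singleton_prediction_query(model, target, case):
    """Build a singleton NATURAL PREDICTION JOIN for one input row.

    Column names in `case` must match the model's column names, since a
    NATURAL join maps inputs to model columns by name.
    """
    columns = ", ".join(
        f"{dmx_literal(value)} AS [{name}]" for name, value in case.items()
    )
    return (
        f"SELECT Predict([{target}]), PredictProbability([{target}]) "
        f"FROM [{model}] "
        f"NATURAL PREDICTION JOIN (SELECT {columns}) AS t"
    )


# Made-up input row for one new customer.
query = singleton_prediction_query(
    "CreditRisk",
    "Risk",
    {"Gender": "Male", "Income": 50000, "Profession": "Engineer"},
)
print(query)
```

The resulting text could be sent to the server through any of the client interfaces the deck lists (for example ADOMD.NET); that step is outside this sketch.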
CREATE MINING MODEL
CREATE MINING MODEL MyModel (
  [CustID] LONG KEY,
  [Gender] TEXT DISCRETE,
  [Marital Status] TEXT DISCRETE,
  [Education] TEXT DISCRETE,
  [Home Ownership] TEXT DISCRETE PREDICT,
  [Age] LONG CONTINUOUS,
  [Income] DOUBLE CONTINUOUS
) USING Microsoft_Decision_Trees
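Each column in the statement above has the same anatomy: a name, a data type (LONG, TEXT, DOUBLE), a content type (KEY, DISCRETE, CONTINUOUS), and an optional PREDICT flag marking the column the model should learn to predict. The sketch below makes that structure explicit by generating the statement from a list of column descriptions. The function name and the tuple layout are assumptions for this example; only a subset of the slide's columns is used to keep it short.

```python
def create_mining_model(model, columns, algorithm):
    """Build a CREATE MINING MODEL statement (hypothetical helper).

    columns: list of (name, data_type, content_type, predict) tuples,
    where predict is True for the column the model should predict.
    """
    lines = []
    for name, data_type, content_type, predict in columns:
        parts = [f"[{name}]", data_type, content_type]
        if predict:
            parts.append("PREDICT")
        lines.append("  " + " ".join(parts))
    body = ",\n".join(lines)
    return f"CREATE MINING MODEL [{model}] (\n{body}\n) USING {algorithm}"


statement = create_mining_model(
    "MyModel",
    [
        ("CustID", "LONG", "KEY", False),
        ("Home Ownership", "TEXT", "DISCRETE", True),
        ("Income", "DOUBLE", "CONTINUOUS", False),
    ],
    "Microsoft_Decision_Trees",
)
print(statement)
```

Bracketing every column name is a deliberate choice here: names with spaces, such as Home Ownership, require brackets in DMX, and bracketing all of them keeps the generator simple.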
DMX Column Expressions
Predictable Columns
Source Data Columns
Functions
Predict
“Workhorse”
Discrete scalar values
Continuous scalar values
Associative nested tables
Sequence nested tables
Time Series
Overloaded to PredictAssociation, PredictSequence, PredictTimeSeries
PredictProbability
PredictSupport
PredictHistogram
Cluster
ClusterProbability
GetNodeId
IsInNode
Arithmetic operators
Stored Procedure
Subselect
Select from nested tables
Data Mining Interfaces – XMLA ++
Client applications: C++ App, VB App, .Net App, Any App (Any Platform, Any Device)
Client interfaces: OLEDB for OLAP/DM, ADO/DSO, ADOMD.NET, AMO
Transport: XMLA Over TCP/IP, XMLA Over HTTP (WAN)
Analysis Server (msmdsrv.exe): OLAP, Data Mining
Server side: ADOMD.NET, .Net Stored Procedures, Microsoft Algorithms, Third Party Algorithms
Summary
Familiar client for SQL Server Data Mining
Model Creators
Model Users
Excel Data or Server Data
Implement the full DM Cycle
Data Understanding
Data Preparation
Modeling
Validation
Deployment
Resources
Technical Communities, Webcasts, Blogs, Chats & User Groups: http://www.microsoft.com/communities/default.mspx
Microsoft Developer Network (MSDN) & TechNet: http://microsoft.com/msdn http://microsoft.com/technet
Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx
Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
SQL Server Data Mining: http://www.sqlserverdatamining.com http://www.microsoft.com/bi/bicapabilities/data-mining.aspx
BI Resources from Lynn Langit
Foundations of SQL Server 2005 Business Intelligence, published by Apress in April 2007
Blog: http://blogs.msdn.com/SoCalDevGal
Q&A
BIN302 Microsoft Office Excel and Analysis Services: An In-Depth Look at Integration
OFF312 Using Data in Excel Solutions Built with Visual Studio Tools for the Office System