SSAS 2008 Data Mining – Lynn Langit, MSDN Developer Evangelist, Microsoft – http://blogs.msdn.com/SoCalDevGal
Session Prerequisites: working knowledge as a SQL Server 2008 developer; understanding of OLAP concepts; working knowledge as a SQL Server Analysis Services 2005 developer; interest in or basic knowledge of data mining concepts
Objectives and Agenda: understand the what, why, when & how of SQL Server 2008 Data Mining; examine the core functionality of the Data Mining Extensions (DMX); hear about new and advanced Data Mining functionality
What and Why Data Mining? [Diagram: as the role of software moves from passive to interactive to proactive, business insight moves from presentation to exploration to discovery, spanning canned reporting, ad-hoc reporting, OLAP, data mining and predictive analytics.]
Cubes vs. Data Mining
DM - Scenarios to Tasks
Tasks to Techniques
BI for Everyone: Individual – Excel; Project – SharePoint
Microsoft’s Predictive Analytics: [Diagram of audiences (Information Worker, BI Analyst, Application Developer, Data Mining Specialist) and offerings: Data Mining Add-ins for the 2007 Microsoft Office system; Microsoft Dynamics CRM Analytics; Data Mining SQL extensions (DMX); Custom Algorithms; SQL Services (Azure). Foundation: SQL Server 2008 Business Intelligence Development Studio, Microsoft SQL Server 2008 Analysis Services, Microsoft SQL Server 2008 Data Mining.]
Data Mining Add-ins for Office 2007: Table Analysis Tools for Excel 2007; Data Mining Template for Visio 2007; Data Mining Client for Excel 2007. Audiences: Information Worker, BI Analyst, Data Mining Specialist
Microsoft Data Mining Lifecycle (CRISP-DM, www.crisp-dm.org): Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment. [Diagram maps Microsoft tools onto these phases: SSAS (Data Mining), SSAS (DSV), SSIS, SSRS, Excel, queries and your own apps, all working over the data.]
Understand & Prepare specifics
Demo: 1 – Explore / Clean / Partition Data; 2 – Prepare Data
Modeling Specifics
Demo: 3 – Select algorithm; 4 – Create model
Evaluation Specifics
Demo: 5 – Evaluate model; 6 – Deploy model; 7 – Update model; 8 – Query model
Data Mining – Logical Model
  Training: training data (DB data, client data, application data) and an algorithm go into the Data Mining Engine, which produces the mining model.
  Prediction: data to predict (DB data, client data, application data, or “just one row”) goes into the Data Mining Engine together with the mining model, which returns the predicted data.
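The prediction half of this picture can be tried with a singleton query, which is DMX's way of scoring "just one row". The sketch below assumes the CreditRisk model from the DMX slide has been created and trained. The input values are made up for illustration.

```dmx
-- Score one hypothetical applicant against the CreditRisk model.
-- NATURAL PREDICTION JOIN matches input columns to model columns by name.
SELECT
    Predict([Risk])            AS PredictedRisk,
    PredictProbability([Risk]) AS Probability
FROM [CreditRisk]
NATURAL PREDICTION JOIN
(
    SELECT 'F'        AS [Gender],
           60000      AS [Income],
           'Engineer' AS [Profession]
) AS t
```

The same statement serves equally well from an application, an Excel sheet or SSMS. Only the single input row changes between calls.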
Data Mining – Physical Model: the mining model (a data source plus a data mining algorithm) is deployed from BI Dev Studio (Visual Studio) to the Analysis Services server; your application sends its app data and queries to the model over OLE DB / ADOMD / XMLA.
Data Mining Interfaces – APIs: the Analysis Server process (msmdsrv.exe) hosts the OLAP and Data Mining server, Microsoft and third-party algorithms, and .NET stored procedures. Clients connect with XMLA over TCP/IP or XMLA over HTTP (including across a WAN), using ADOMD.NET, AMO, OLE DB for OLAP/DM or ADO/DSO from C++, VB, .NET or any other app, on any platform or device.
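Whichever client library an application uses, the server ultimately receives a text command wrapped in XMLA. As a small sketch, the following schema rowset query lists the mining models in the current database. It assumes you are connected to a database that contains mining models, and any of the clients above can send it.

```dmx
-- List each model, the algorithm (service) it uses,
-- and whether it has been processed yet
SELECT MODEL_NAME, SERVICE_NAME, IS_POPULATED
FROM $SYSTEM.DMSCHEMA_MINING_MODELS
```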
Configuration & Deployment: model creation/management by database administrators, and session mining models; model application; permissions on models and on data sources (browse, copy to Excel, drillthrough, query; default and advanced settings); Excel Services; managing models and structures (export/import, rename); connection, database and trace settings
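Several of the management tasks listed here also have DMX statements, including session models, export/import and rename. The sketch below assumes the CreditRisk model from the DMX slide. The file path is hypothetical, and each statement should be run on its own.

```dmx
-- Session model: exists only for the current connection.
-- Requires the server property AllowSessionMiningModels to be enabled.
CREATE SESSION MINING MODEL [CreditRisk_Scratch]
(
    [CustID]     LONG KEY,
    [Gender]     TEXT DISCRETE,
    [Income]     LONG CONTINUOUS,
    [Profession] TEXT DISCRETE,
    [Risk]       TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

-- Export the model to a backup file; WITH DEPENDENCIES also includes
-- its data source and data source view. Hypothetical path, which must be
-- writable by the Analysis Services service account.
EXPORT MINING MODEL [CreditRisk] TO 'C:\Backup\CreditRisk.abf' WITH DEPENDENCIES

-- Import the file, for example on a separate test server
IMPORT FROM 'C:\Backup\CreditRisk.abf'

-- Rename the model in place
RENAME MINING MODEL [CreditRisk] TO [CreditRisk_v1]
```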
Data Mining Extensions (DMX)
  -- Define the model
  CREATE MINING MODEL CreditRisk
  (
      CustID     LONG KEY,
      Gender     TEXT DISCRETE,
      Income     LONG CONTINUOUS,
      Profession TEXT DISCRETE,
      Risk       TEXT DISCRETE PREDICT
  )
  USING Microsoft_Decision_Trees

  -- Train the model; CreditDS is a hypothetical data source defined in the Analysis Services database
  INSERT INTO CreditRisk (CustID, Gender, Income, Profession, Risk)
  OPENQUERY(CreditDS,
      'SELECT CustomerID, Gender, Income, Profession, Risk FROM Customers')

  -- Predict risk for new customers
  SELECT t.CustomerID, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)
  FROM CreditRisk
  PREDICTION JOIN
      OPENQUERY(CreditDS,
          'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS t
  ON  CreditRisk.Gender     = t.Gender
  AND CreditRisk.Income     = t.Income
  AND CreditRisk.Profession = t.Profession
DMX Column Expressions
  Column types: predictable columns; source data columns; functions
  Predict, the “workhorse”: handles discrete scalar values, continuous scalar values, associative nested tables, sequence nested tables and time series; specialized overloads: PredictAssociation, PredictSequence, PredictTimeSeries
  Other functions: PredictProbability, PredictSupport, PredictHistogram, Cluster, ClusterProbability, PredictNodeId, IsInNode
  Also supported: arithmetic operators, stored procedures, subselects, selects from nested tables
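A batch prediction query that exercises several of these functions might look like the sketch below. It makes three assumptions: the CreditRisk model from the DMX slide, a hypothetical Analysis Services data source named CreditDS, and (for the second query only) a hypothetical clustering model named CustomerClusters built over the same input columns. Run the two statements separately.

```dmx
-- Several prediction functions over a batch of hypothetical new customers
SELECT
    t.[CustomerID],
    Predict([Risk])            AS PredictedRisk,
    PredictProbability([Risk]) AS Probability,
    PredictSupport([Risk])     AS Support,
    PredictNodeId([Risk])      AS LeafNodeId,
    PredictHistogram([Risk])   AS Histogram   -- nested table, one row per Risk state
FROM [CreditRisk]
PREDICTION JOIN
    OPENQUERY([CreditDS],
        'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS t
ON  [CreditRisk].[Gender]     = t.[Gender]
AND [CreditRisk].[Income]     = t.[Income]
AND [CreditRisk].[Profession] = t.[Profession]

-- Cluster membership from the hypothetical CustomerClusters model
SELECT
    t.[CustomerID],
    Cluster()            AS ClusterName,
    ClusterProbability() AS Probability
FROM [CustomerClusters]
PREDICTION JOIN
    OPENQUERY([CreditDS],
        'SELECT CustomerID, Gender, Income, Profession FROM NewCustomers') AS t
ON  [CustomerClusters].[Gender]     = t.[Gender]
AND [CustomerClusters].[Income]     = t.[Income]
AND [CustomerClusters].[Profession] = t.[Profession]
```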
Demo – Data Mining & Excel 2007 integration
Excel Functions*
  DMPREDICTTABLEROW(Connection, ModelName, PredictionResult, TableRowRange [, string CommaSeparatedColumnNames])
  DMPREDICT(Connection, Model, PredictionResult, Value1, Name1, [..., Value32, Name32])
  DMCONTENTQUERY(Connection, Model, PredictionResult [, WhereClause])
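For comparison, here is a plain DMX content query against a model, with a WHERE clause over the content rowset. Whether DMCONTENTQUERY issues exactly this shape of query is an assumption. The sketch only shows a server-side equivalent, using the CreditRisk model from the DMX slide and an arbitrary support threshold.

```dmx
-- Browse the learned content (nodes) of the CreditRisk model,
-- keeping only nodes that cover more than 100 training cases
SELECT NODE_CAPTION, NODE_SUPPORT, NODE_PROBABILITY
FROM [CreditRisk].CONTENT
WHERE NODE_SUPPORT > 100
```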
DM in the Cloud: test data types (relational, CSV); SQL Services (Azure Services)
Try it in the cloud…
Analysis Results in the Cloud…
Calling the Cloud…(from Excel 2007)
New to SQL Server 2008 DM
  - Microsoft Time Series algorithm improved: ARIMA is added alongside the ARTxp method, with a blending algorithm that combines them for better results; a new prediction mode allows adding new data to time series models.
  - Holdout support added: easily partition data into training and test sets, which are stored in the mining structure and available to query after processing.
  - Filtered models added: mining models can be built on filtered subsets of a structure's data, so fewer structures are needed (you can simply filter an existing one).
  - Drillthrough extended: all mining structure columns are available, not just the columns included in the model, which lets you build more compact models.
  - Cross-validation added: users can quickly validate their modeling approach by automatically building temporary models and evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in BIDS, and programmatically through a stored procedure call.
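Most of these features are also exposed in DMX, and the sketch below walks through them in order. The structure, model and column names (CreditRiskStructure, CreditRisk_HighIncome, SalesForecast, MonthKey, Sales) are hypothetical. Run each statement on its own, and train the structure first (for example with INSERT INTO, as on the DMX slide), or the queries return no rows. Cross-validation has restrictions on which models it accepts, and some algorithm parameters may depend on the SQL Server edition, so check the documentation for your installation.

```dmx
-- 1. Holdout: reserve 30 percent of the cases as a test set inside the structure
CREATE MINING STRUCTURE [CreditRiskStructure]
(
    [CustID]     LONG KEY,
    [Gender]     TEXT DISCRETE,
    [Income]     LONG CONTINUOUS,
    [Profession] TEXT DISCRETE,
    [Risk]       TEXT DISCRETE
)
WITH HOLDOUT (30 PERCENT)

-- 2. Filtered model with drillthrough: trained only on cases with Income > 50000.
--    Profession is left out of the model but remains in the structure.
ALTER MINING STRUCTURE [CreditRiskStructure]
ADD MINING MODEL [CreditRisk_HighIncome]
(
    [CustID],
    [Gender],
    [Income],
    [Risk] PREDICT
)
USING Microsoft_Decision_Trees
WITH DRILLTHROUGH, FILTER([Income] > 50000)

-- 3. Query the held-out test cases stored in the structure
SELECT * FROM MINING STRUCTURE [CreditRiskStructure].CASES WHERE IsTestCase()

-- 4. Drill through to a structure column that is not part of the model
SELECT [CustID], [Risk], StructureColumn('Profession') AS [Profession]
FROM [CreditRisk_HighIncome].CASES

-- 5. Cross-validation of every model in the structure:
--    5 folds, at most 5000 cases, target attribute Risk
CALL SystemGetCrossValidationResults([CreditRiskStructure], 5, 5000, 'Risk')

-- 6. Time series: blend ARTxp and ARIMA, then forecast three periods ahead
CREATE MINING MODEL [SalesForecast]
(
    [MonthKey] LONG KEY TIME,
    [Sales]    DOUBLE CONTINUOUS PREDICT
)
USING Microsoft_Time_Series (FORECAST_METHOD = 'MIXED')

SELECT PredictTimeSeries([Sales], 3) FROM [SalesForecast]
```

Putting the holdout in the structure, rather than in a separate test table, means every model added to that structure is trained and tested on the same split. That makes accuracy comparisons between models fair.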
Summary
  Data Mining in SQL Server 2008 is mature, powerful and accessible
  You can use Excel 2007: a familiar BI client for OLAP cubes AND data mining models, for model creators and users, over Excel data or server data
  SSAS and Excel both support the full DM cycle: data understanding & data preparation; modeling, validation & deployment
  SQL Services incubations available now: data mining from the cloud
  More
DM Webcasts
  Fri, 02 Nov 2007 – MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)
  Thu, 08 Nov 2007 – MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)
  Mon, 19 Nov 2007 – MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)
  Fri, 30 Nov 2007 – MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)
  Thu, 01 Nov 2007 – TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)
  Thu, 15 Nov 2007 – TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)
  Thu, 29 Nov 2007 – TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)
DM Resources
  Technical Communities, Webcasts, Blogs, Chats & User Groups: http://www.microsoft.com/communities/default.mspx
  Microsoft Developer Network (MSDN) & TechNet: http://microsoft.com/msdn, http://microsoft.com/technet
  Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx
  Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
  SQL Server Data Mining: http://www.sqlserverdatamining.com, http://www.microsoft.com/bi/bicapabilities/data-mining.aspx
BI Resources from Lynn Langit
  http://blogs.msdn.com/SoCalDevGal
  “How Do I…BI?” screencast series on MSDN
  “Smart Business Intelligence Solutions with Microsoft SQL Server 2008”, Microsoft Press, Feb 2009
  “Foundations of SQL Server 2005 Business Intelligence”, Apress, April 2007
 
