Data Mining by Example: Building a
Predictive Model Using Microsoft
Decision Trees
by Shaoli Lu
Microsoft Decision Trees
• Developed by the Microsoft Research team, the
Microsoft Decision Trees algorithm is a hybrid
decision tree algorithm that supports both
classification and regression
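
To make the idea of a classification tree concrete, here is a minimal sketch of how a tree can pick the attribute to split on, using textbook information gain. This is a generic illustration, not the Microsoft implementation. The attribute names (`cars`, `region`, `buyer`) and the four sample rows are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


def best_split(rows, attributes, target):
    """Return the attribute whose split gives the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical prospects: 'cars' separates buyers perfectly, 'region' does not.
prospects = [
    {"cars": 0, "region": "Pacific", "buyer": 1},
    {"cars": 0, "region": "Europe", "buyer": 1},
    {"cars": 2, "region": "Pacific", "buyer": 0},
    {"cars": 2, "region": "Europe", "buyer": 0},
]

print(best_split(prospects, ["cars", "region"], "buyer"))
```

A full tree repeats this choice recursively on each resulting group until the groups are pure or too small to split further.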
Goal
• To predict a prospect’s likelihood of
purchasing a bike
Prerequisites
• A SQL Server instance (2005 or above)
• SQL Server Analysis Services (SSAS)
installed in Multidimensional mode
(used to host and browse the mining structures; a cube is not required for data mining)
• AdventureWorksDW database attached
(download from CodePlex; pick the release that matches your SQL Server version)
• Visual Studio 2010 or above with SQL Server
Data Tools (SSDT) installed
My Demo Setup
• Visual Studio 2010
• SQL Server 2012
Create Data Mining Project
• Name the project DM Decision Trees
(DM = Data Mining)
Create Data Source and Impersonation
Create Data Source View
Create Mining Structure
• Choose the Microsoft Decision Trees algorithm
• Select the Data Source View
• Choose the training data
• Select the Input and Predictable columns
• Set the content types
• Set the holdout percentage
• Name the mining structure and the model
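
The holdout percentage reserves part of the cases for testing so the model is never scored on data it was trained on. A minimal sketch of that split follows. The function name `split_holdout`, the 30 percent value, and the 1000 dummy cases are assumptions for illustration; the wizard performs the real split inside SSAS.

```python
import random


def split_holdout(rows, holdout_percent=30, seed=0):
    """Randomly reserve holdout_percent of the rows for testing.

    Returns (training_rows, testing_rows). A fixed seed keeps the
    split repeatable between runs.
    """
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_holdout = len(shuffled) * holdout_percent // 100
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical case IDs standing in for rows of the training table.
cases = list(range(1000))
training, testing = split_holdout(cases, holdout_percent=30)
print(len(training), len(testing))
```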
Deploy the mining structure and model
Process the mining model
• This is also called “training the model”
Mining Model Viewer
• Identify the dominant attributes
• Splits further to the left (closer to the root of
the tree) involve the more important attributes
• The rich visualization is also useful for data
exploration
Mining Model Accuracy Chart
• This step is called “testing the model”; it scores
the trained model against the holdout data
• Lift chart
• Profit chart
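
The numbers behind a lift chart and a profit chart can be sketched as follows: rank the holdout cases by predicted probability, then measure how many actual buyers fall in the top slice of the ranking. This is a simplified illustration, not the SSAS calculation. The scored cases, revenue, and cost figures are invented for the example.

```python
def cumulative_gains(scored, steps=10):
    """Percent of all buyers captured in the top-ranked share of cases.

    scored: list of (predicted_probability, actual) pairs, actual is 0 or 1.
    Returns a list of (percent_of_population, percent_of_buyers_captured).
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_buyers = sum(actual for _, actual in ranked)
    points = []
    for step in range(1, steps + 1):
        cutoff = len(ranked) * step // steps
        captured = sum(actual for _, actual in ranked[:cutoff])
        points.append((100 * step // steps, 100.0 * captured / total_buyers))
    return points


def profit_at(scored, contacted, revenue_per_buyer, cost_per_contact, fixed_cost=0.0):
    """Profit from contacting only the top `contacted` ranked cases."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    buyers = sum(actual for _, actual in ranked[:contacted])
    return buyers * revenue_per_buyer - contacted * cost_per_contact - fixed_cost


# Hypothetical holdout results: (predicted probability, actually bought?).
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]

print(cumulative_gains(scored, steps=5))
print(profit_at(scored, contacted=4, revenue_per_buyer=15.0, cost_per_contact=3.0))
```

A random ranking would capture buyers in proportion to the population contacted, so the further the curve sits above that diagonal, the more the model helps.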
Mining Model Prediction
• Singleton query
• Mass prediction
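
The difference between the two prediction styles can be sketched with a toy model: a singleton query scores one prospect supplied on the spot, while a mass prediction scores a whole table of prospects. In SSAS both are prediction queries against the deployed model; the hand-written tree, attribute names, and probabilities below are hypothetical stand-ins.

```python
def predict_one(prospect):
    """Singleton prediction: probability that one prospect buys a bike.

    The splits and probabilities are made up; a trained model would
    supply them.
    """
    if prospect["cars"] == 0:
        return 0.8 if prospect["age"] < 45 else 0.6
    return 0.3


def predict_many(prospects, threshold=0.5):
    """Mass prediction: names of all prospects at or above the threshold."""
    return [p["name"] for p in prospects if predict_one(p) >= threshold]


# Singleton: a single prospect, e.g. entered in an application form.
print(predict_one({"name": "Ann", "cars": 0, "age": 30}))

# Mass: a hypothetical mailing list scored in one pass.
mailing_list = [
    {"name": "Ann", "cars": 0, "age": 30},
    {"name": "Bob", "cars": 2, "age": 50},
    {"name": "Cid", "cars": 0, "age": 60},
]
print(predict_many(mailing_list))
```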
Browse mining model on SQL Server
• Decision trees
• Dependency network
Summary
• Microsoft Decision Trees is a powerful data
mining model, yet it is easy to build, train and use
• It supports both singleton predictions (e.g. embedded
in an app) and mass predictions (e.g. targeted
marketing)
• Holdout data can be used to test the trained model
• Rich visualizations such as the Lift/Profit Charts and
the Dependency Network facilitate analysis and
data exploration
• A relational database can be used for data mining;
a cube is not required
The End
