presented to fwPASS on 1/26/2010

DATA MINING – A BETTER WAY
TO DESIGN A STIMULUS
PROGRAM LIKE “CASH FOR
CLUNKERS”
About Me
 Work for Systemental as a Consultant and
  Software Developer
 Software development to support Corporate
  business process improvement since 2000
  (Lean or Continuous Improvement Initiatives)
   .NET since 2004
 President, fwPASS.org
 Mfg. Eng. Technology degrees from Ball State
  University
 Six Sigma Black Belt, Certified
What We Will Cover

 Data mining – what is it?
 “Cash for Clunkers”
 Other examples
   Amazon.com
   Coke Freestyle
 Basic Data Mining Concepts
 Demo time
Wikipedia

Data mining is the process of extracting
 patterns from data. Data mining is becoming
 an increasingly important tool to transform
 these data into information. It is commonly
 used in a wide range of profiling practices,
 such as marketing, surveillance, fraud
 detection and scientific discovery.
Cash for Clunkers

    Columbia City: SR 30 & SR 9
Objectives of “Cash for
Clunkers”
 Jump start automotive sector sales
   Specifically higher mileage vehicles
 Get gas guzzlers off the street
Cash for Clunkers

  How did they decide who to target and
   how?
  How would you do it?
  Where did the data come from?
  Where should the data come from?
Who to target?

 Anyone, everyone, or targeted
 Self qualified
 Organic growth or just “pull up” existing sales
 Convert foreign sales to GM
   Conflict of interest? – “Government Motors”
 Discriminatory?
Estimating the effectiveness

 Effect of “pull up” vs. organic growth
 Peripheral commercial effect
 Estimation of payback
   Sales, plates and excise tax
   Income tax from lay-off recalls
   Reduction of unemployment
   Auto Insurance
 Reduction in tax revenue at gas pumps
Data content and source

 Public records
 CAFE
 GM Data
 Industry sponsored studies
Amazon.com
SQL Server 2005 Data Mining

 Nine algorithms (3rd party pluggable)
 Both Modeling and exploration in VS
 Integrated tools: SS*S (SSIS, SSAS, SSRS, SSMS)
 API
 Data Mining Extensions to SQL (DMX)
Type of analysis

 Optimization vs. Predictive
 Descriptive – provides deeper understanding
  of existing data
 Predictive – estimates the probability of future
  conditions
Data Mining Objective

 Classification – assign data to known classes
  (discrete)
 Segmentation – clustering into similar groups
 Estimation – predicting continuous values
 Association – what events occur together
 Forecasting – time-series estimation of future values
Algorithms

1.   Decision Trees (attributes from the tree)
2.   Naive Bayes (uses all attributes)
3.   Clustering
4.   Linear Regression
5.   Logistic Regression
6.   Neural Nets
7.   Sequence Clustering
8.   Time Series
9.   Association Rules (discrete only)
DMX

 Column syntax: Name, data type, content
  type, [usage]
 Case being analyzed – key
 Content type: key, key sequence, key time,
  discrete, continuous, discretized (# of
  buckets)
 Usage: Input, predict, predict-only (not to
  build any other part of model)
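As a rough illustration of the column syntax above (name, data type, content type, optional usage), the sketch below assembles a DMX CREATE MINING MODEL statement as text. The model name, column names, and the `Column` helper are hypothetical and were not part of the talk; the statement is only printed, not sent to a server, so check it against the DMX reference before running it.

```python
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    data_type: str      # e.g. LONG, TEXT, DOUBLE
    content_type: str   # e.g. KEY, DISCRETE, CONTINUOUS, DISCRETIZED(...)
    usage: str = ""     # empty means input; otherwise PREDICT or PREDICT_ONLY

    def to_dmx(self) -> str:
        parts = [f"[{self.name}]", self.data_type, self.content_type, self.usage]
        return " ".join(part for part in parts if part)

def create_mining_model(model_name, columns, algorithm):
    """Render the column definitions as one CREATE MINING MODEL statement."""
    body = ",\n".join("    " + column.to_dmx() for column in columns)
    return f"CREATE MINING MODEL [{model_name}] (\n{body}\n) USING {algorithm}"

# Hypothetical columns for a "Cash for Clunkers" style model.
columns = [
    Column("OwnerId", "LONG", "KEY"),
    Column("VehicleAge", "LONG", "DISCRETIZED(EQUAL_AREAS, 5)"),
    Column("MilesPerGallon", "DOUBLE", "CONTINUOUS"),
    Column("TradedIn", "TEXT", "DISCRETE", "PREDICT"),
]
statement = create_mining_model("ClunkerTradeIn", columns, "Microsoft_Decision_Trees")
print(statement)
```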
Structure

 Datamart, DW, cube
   Data source
     Mining Structure (which fields)
       Mining Models (algorithms, attributes)
         Viewers (tree, clusters, discrimination, classification)
Training the model

 SSIS Percentage Sampling Data Flow
  Component
 Training, Testing
 Estimating error
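The training and testing idea above can be sketched outside SSIS. This is an assumption-laden toy, not the Percentage Sampling component itself: the cases are synthetic, the 70 percent split and the one-threshold "model" are chosen only for illustration, and all function names are hypothetical.

```python
import random

def percentage_split(rows, train_percent=70, seed=0):
    """Shuffle a copy of the rows and cut it into (training, testing) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(rows):
    """Pick the threshold t with the fewest training errors for 'mpg < t'."""
    values = sorted({mpg for mpg, _ in rows})
    candidates = values + [values[-1] + 1]

    def errors(t):
        return sum((mpg < t) != label for mpg, label in rows)

    return min(candidates, key=errors)

def error_rate(predict, rows):
    """Fraction of rows where the prediction disagrees with the known label."""
    wrong = sum(predict(mpg) != label for mpg, label in rows)
    return wrong / len(rows)

# Synthetic cases: (miles per gallon, traded in?), invented for illustration.
cases = [(mpg, mpg < 18) for mpg in range(10, 40)]
train, test = percentage_split(cases, train_percent=70)
threshold = fit_threshold(train)

def predict(mpg):
    return mpg < threshold

train_error = error_rate(predict, train)
test_error = error_rate(predict, test)
print(f"threshold={threshold}, train error={train_error:.2f}, test error={test_error:.2f}")
```

The error measured on the held-out testing rows is the honest estimate; the training error only shows how well the model fits data it has already seen.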
Demos

 Visual Studio
 SSMS
 Win Client
 Web Client
Miscellaneous

 Sequence or timing
 Prediction + measure of confidence
 Caution: Over-fitting the model
 Nested tables, e.g. transactional detail data
   Nested key is never the foreign key to the case table
   Nested key identifies what the nested table is about
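The nested-table point above can be pictured with plain data structures. In this sketch the order and product values are hypothetical and `nest_by_case` is an invented helper: the order id only links detail rows to their case, while the nested key is the product, which is what the nested table is about.

```python
from collections import defaultdict

# Hypothetical flat transactional detail rows: (order_id, product, quantity).
detail_rows = [
    (1, "book", 1),
    (1, "lamp", 2),
    (2, "book", 3),
]

def nest_by_case(rows):
    """Group detail rows into one case per order, keyed inside by product."""
    cases = defaultdict(dict)
    for order_id, product, quantity in rows:
        # order_id ties the row to its case; product is the nested key.
        cases[order_id][product] = quantity
    return dict(cases)

cases = nest_by_case(detail_rows)
print(cases)
```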
References
   http://dean-o.blogspot.com/
   http://abbottanalytics.blogspot.com/
   http://www.thearling.com/umass/index_frame.htm
   http://www.thearling.com/text/dmtechniques/dmtechniques.htm
   MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
   http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library
   http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035
   Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and
    Stephen Forte – Chapter 20
Thank you!

 Website
   http://www.systemental.com
 Blogs
   http://dean-o.blogspot.com/
   http://practicalhoshin.blogspot.com
 Twitter
   http://www.twitter.com/deanwillson
 Email
   dean@systemental.com
 LinkedIn
   http://www.linkedin.com/in/deanwillson

Data Mining with SQL Server 2005
