presented to fwPASS on 1/26/2010

DATA MINING – A BETTER WAY
TO DESIGN A STIMULUS
PROGRAM LIKE “CASH FOR
CLUNKERS”
About Me
• Work for Systemental as a Consultant and Software Developer
• Software development to support corporate business process improvement since 2000 (Lean or Continuous Improvement initiatives)
   - .NET since 2004
• President, fwPASS.org
• Mfg. Eng. Technology degrees from Ball State University
• Certified Six Sigma Black Belt
What We Will Cover

• Data mining – what is it?
• “Cash for Clunkers”
• Other examples
   - Amazon.com
   - Coke Freestyle
• Basic data mining concepts
• Demo time
Wikipedia

Data mining is the process of extracting
 patterns from data. Data mining is becoming
 an increasingly important tool to transform
 these data into information. It is commonly
 used in a wide range of profiling practices,
 such as marketing, surveillance, fraud
 detection and scientific discovery.
Cash for Clunkers

    Columbia City: SR 30 & SR 9
Objectives of “Cash for Clunkers”
• Jump-start automotive sector sales
   - Specifically higher-mileage (better mpg) vehicles
• Get gas guzzlers off the street
Cash for Clunkers

• How did they decide whom to target, and how?
• How would you do it?
• Where did the data come from?
• Where should the data come from?
Who to target?

• Anyone, everyone, or a targeted group
• Self-qualified
• Organic growth or just “pull up” existing sales
• Convert foreign sales to GM
   - Conflict of interest? – “Government Motors”
• Discriminatory?
Estimating the effectiveness

• Effect of “pull up” vs. organic growth
• Peripheral commercial effect
• Estimation of payback
   - Sales, plates and excise tax
   - Income tax from lay-off recalls
   - Reduction of unemployment
   - Auto insurance
• Reduction in tax revenue at gas pumps
Data content and source

• Public records
• CAFE (Corporate Average Fuel Economy) data
• GM data
• Industry-sponsored studies
Amazon.com
SQL Server 2005 Data Mining

• Nine algorithms (third-party algorithms can be plugged in)
• Both modeling and exploration in Visual Studio
• Integrated tools: SS*S
• API
• Data Mining Extensions to SQL (DMX)
Type of analysis

• Optimization vs. Predictive
• Descriptive – provides deeper understanding of existing data
• Predictive – provides insight into the probability of future conditions
Data Mining Objective

• Classification – assign data to known classes (discrete)
• Segmentation – clustering into similar groups
• Estimation – predicting continuous values
• Association – what events occur together
• Forecasting – time-series estimation of future values
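
As a rough illustration of the association objective (what events occur together), the sketch below counts how often pairs of items appear in the same transaction. The basket contents and the `pair_support` name are hypothetical, and this is plain pair counting, not the Association Rules algorithm that ships with SQL Server.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return {(item_a, item_b): fraction of transactions containing both}."""
    if not transactions:
        return {}
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) yields each unordered pair once per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical shopping baskets, invented for illustration
baskets = [
    ["book", "bookmark"],
    ["book", "lamp"],
    ["book", "bookmark", "lamp"],
    ["lamp"],
]
support = pair_support(baskets)
for pair, share in sorted(support.items()):
    print(pair, share)
```

Pairs with high support are candidates for "customers who bought this also bought" style suggestions.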
Algorithms

1.   Decision Trees (attributes from the tree)
2.   Naive Bayes (uses all attributes)
3.   Clustering
4.   Linear Regression
5.   Logistic Regression
6.   Neural Nets
7.   Sequence Clustering
8.   Time Series
9.   Association Rules (discrete only)
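
To make "Naive Bayes (uses all attributes)" concrete, here is a minimal sketch of a Naive Bayes classifier over discrete attributes with add-one smoothing. It is a textbook version under assumed data, not Microsoft's implementation; the column names (`mpg`, `age`, `traded_in`) and the rows are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    distinct_values = defaultdict(set)   # attribute -> values seen in training
    for row in rows:
        for attribute, value in row.items():
            if attribute != target:
                value_counts[(attribute, row[target])][value] += 1
                distinct_values[attribute].add(value)
    return class_counts, value_counts, distinct_values


def predict(model, case):
    """Return {class: probability} for one case, using every attribute in it."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for attribute, value in case.items():
            if attribute not in distinct_values:
                continue  # attribute never seen in training; ignore it
            k = len(distinct_values[attribute])
            count = value_counts[(attribute, label)][value]
            score += math.log((count + 1) / (n + k))  # add-one smoothing
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Hypothetical training cases, invented for illustration
rows = [
    {"mpg": "low", "age": "old", "traded_in": "yes"},
    {"mpg": "low", "age": "old", "traded_in": "yes"},
    {"mpg": "low", "age": "new", "traded_in": "yes"},
    {"mpg": "high", "age": "new", "traded_in": "no"},
    {"mpg": "high", "age": "old", "traded_in": "no"},
    {"mpg": "high", "age": "new", "traded_in": "no"},
]
model = train(rows, "traded_in")
probabilities = predict(model, {"mpg": "low", "age": "old"})
print(probabilities)
```

Every attribute in the case contributes a factor to each class score, and the normalized result doubles as a simple measure of confidence in the prediction.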
DMX

• Column syntax: name, data type, content type, [usage]
• Case being analyzed – key
• Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)
• Usage: input, predict, predict-only (column is predicted but not used to build any other part of the model)
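
The column syntax above (name, data type, content type, optional usage) can be sketched by assembling a DMX CREATE MINING MODEL statement as text. The model name, column names and helper functions below are hypothetical, and the statement is only printed here, not run against an Analysis Services instance.

```python
def column_definition(name, data_type, content_type, usage=None):
    """Render one DMX column: [Name] DATA_TYPE CONTENT_TYPE [USAGE]."""
    parts = [f"[{name}]", data_type, content_type]
    if usage:  # plain input columns carry no usage keyword
        parts.append(usage)
    return " ".join(parts)


def create_mining_model(model_name, columns, algorithm):
    """Join the column definitions into a CREATE MINING MODEL statement."""
    body = ",\n".join("    " + column_definition(*column) for column in columns)
    return f"CREATE MINING MODEL [{model_name}] (\n{body}\n) USING {algorithm}"


# Hypothetical columns: (name, data type, content type[, usage])
columns = [
    ("CustomerKey", "LONG", "KEY"),                              # the case key
    ("HouseholdIncome", "DOUBLE", "DISCRETIZED(EQUAL_AREAS, 5)"),  # 5 buckets
    ("CurrentVehicleMpg", "DOUBLE", "CONTINUOUS"),
    ("Region", "TEXT", "DISCRETE"),
    ("TradedIn", "TEXT", "DISCRETE", "PREDICT"),                 # column to predict
]
statement = create_mining_model("ClunkerBuyers", columns, "Microsoft_Decision_Trees")
print(statement)
```

Swapping `PREDICT` for `PREDICT_ONLY` would mark a column as predicted without letting it serve as an input to other predictions.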
Structure

• Datamart, data warehouse (DW), cube
   - Data source
      - Mining structure (which fields)
         - Mining models (algorithms, attributes)
            - Viewers (tree, clusters, discrimination, classification)
Training the model

• SSIS Percentage Sampling data flow component
• Training, testing
• Estimating error
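
The three steps above (sample a percentage for training, hold out the rest for testing, estimate error) can be sketched as follows. This is not a reimplementation of the SSIS component; the 70 percent split, the mpg data and the threshold "model" are assumptions chosen to keep the example small.

```python
import random


def percentage_split(cases, training_percent, seed=0):
    """Shuffle the cases and split them into (training, testing) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_percent / 100)
    return shuffled[:cut], shuffled[cut:]


def error_rate(predict, test_cases):
    """Fraction of held-out cases the model gets wrong."""
    wrong = sum(1 for features, actual in test_cases if predict(features) != actual)
    return wrong / len(test_cases)


# Hypothetical cases: vehicles under 18 mpg were traded in
cases = [({"mpg": mpg}, "yes" if mpg < 18 else "no") for mpg in range(10, 30)]
training, testing = percentage_split(cases, 70)

# A deliberately simple "model": the highest mpg seen among traded-in training cases
threshold = max(features["mpg"] for features, label in training if label == "yes")


def predict(features):
    return "yes" if features["mpg"] <= threshold else "no"


estimated_error = error_rate(predict, testing)
print(len(training), len(testing), estimated_error)
```

Measuring error only on cases the model never saw during training is what exposes an over-fitted model.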
Demos

• Visual Studio
• SSMS
• Win Client
• Web Client
Miscellaneous

• Sequence or timing
• Prediction + measure of confidence
• Caution: over-fitting the model
• Nested tables, e.g. transactional detail data
   - Key is never the foreign key to the case table
   - Key is what the table is about
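
The nested-table points can be pictured with a small sketch that folds transactional detail rows into their parent cases. The table and column names are hypothetical. Note that the customer ID only links a detail row to its case, while the key of the nested table is the product, which is what the nested table is about.

```python
from collections import defaultdict


def build_cases(customers, purchase_rows):
    """Attach each customer's purchase rows as a nested 'purchases' table."""
    purchases_by_customer = defaultdict(list)
    for row in purchase_rows:
        # customer_id is only the link to the case; the nested key is 'product'
        purchases_by_customer[row["customer_id"]].append(
            {"product": row["product"], "quantity": row["quantity"]}
        )
    return [
        {**customer, "purchases": purchases_by_customer.get(customer["customer_id"], [])}
        for customer in customers
    ]


# Hypothetical case table and transactional detail rows
customers = [
    {"customer_id": 1, "region": "Midwest"},
    {"customer_id": 2, "region": "South"},
]
purchase_rows = [
    {"customer_id": 1, "product": "sedan", "quantity": 1},
    {"customer_id": 1, "product": "floor mats", "quantity": 2},
    {"customer_id": 2, "product": "pickup", "quantity": 1},
]
cases = build_cases(customers, purchase_rows)
for case in cases:
    print(case)
```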
References
• http://dean-o.blogspot.com/
• http://abbottanalytics.blogspot.com/
• http://www.thearling.com/umass/index_frame.htm
• http://www.thearling.com/text/dmtechniques/dmtechniques.htm
• MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
• http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library
• http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035
• Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20
Thank you!

• Website
   - http://www.systemental.com
• Blogs
   - http://dean-o.blogspot.com/
   - http://practicalhoshin.blogspot.com
• Twitter
   - http://www.twitter.com/deanwillson
• Email
   - dean@systemental.com
• LinkedIn
   - http://www.linkedin.com/in/deanwillson

