Data Mining for Stimulus Program Design

presented to fwPASS on 1/26/2010

DATA MINING – A BETTER WAY
TO DESIGN A STIMULUS
PROGRAM LIKE “CASH FOR
CLUNKERS”

About Me
 Work for Systemental as a Consultant and
Software Developer
 Software development to support Corporate
business process improvement since 2000
(Lean or Continuous Improvement Initiatives)
 .Net since 2004
 President, fwPASS.org
 Mfg. Eng. Technology degrees from Ball State
University
 Certified Six Sigma Black Belt

What We Will Cover

 Data mining – what is it?
 “Cash for Clunkers”
 Other examples
 Amazon.com
 Coke Freestyle
 Basic Data Mining Concepts
 Demo time

Wikipedia

Data mining is the process of extracting
patterns from data. Data mining is becoming
an increasingly important tool to transform
these data into information. It is commonly
used in a wide range of profiling practices,
such as marketing, surveillance, fraud
detection and scientific discovery.

Cash for Clunkers

Columbia City: SR 30 & SR 9

Objectives of “Cash for
Clunkers”
 Jump start automotive sector sales
 Specifically higher-mpg (more fuel-efficient) vehicles
 Get gas guzzlers off the street

Cash for Clunkers

 How did they decide who to target and
how?
 How would you do it?
 Where did the data come from?
 Where should the data come from?

Who to target?

 Anyone, everyone, or targeted
 Self qualified
 Organic growth or just “pull up” existing sales
 Convert foreign sales to GM
 Conflict of interest? – "Government Motors"
 Discriminatory?

Estimating the effectiveness

 Effect of “pull up” vs. organic growth
 Peripheral commercial effect
 Estimation of payback
 Sales, plates and excise tax
 Income tax from lay-off recalls
 Reduction of unemployment
 Auto Insurance
 Reduction in tax revenue at gas pumps
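
The payback estimate above can be sketched as simple arithmetic. This is only an illustration: the function name, its parameters, and every number in the example call are hypothetical placeholders, not program data. It assumes that only organic (incremental) sales bring in new revenue, since "pulled up" sales would have happened anyway, and that the gas tax loss applies to every rebated vehicle.

```python
def estimate_net_payback(
    total_sales,
    organic_share,
    rebate_per_vehicle,
    revenue_per_organic_sale,
    lost_gas_tax_per_vehicle,
):
    """Rough net payback of a rebate program (all inputs are assumptions)."""
    # Only the organic fraction of sales counts as new economic activity.
    organic_sales = total_sales * organic_share
    # Every rebated sale costs the rebate, organic or pulled up.
    rebate_cost = total_sales * rebate_per_vehicle
    # Revenue per organic sale bundles sales/excise tax, recalled-worker
    # income tax, reduced unemployment payments, and similar effects.
    revenue_gained = organic_sales * revenue_per_organic_sale
    # More efficient vehicles burn less fuel, so pump tax revenue drops.
    gas_tax_lost = total_sales * lost_gas_tax_per_vehicle
    return revenue_gained - rebate_cost - gas_tax_lost


# Placeholder inputs, chosen only to exercise the formula.
net = estimate_net_payback(
    total_sales=1000,
    organic_share=0.4,
    rebate_per_vehicle=4000,
    revenue_per_organic_sale=6000,
    lost_gas_tax_per_vehicle=200,
)
```

A negative result means the program costs more than it returns under the chosen assumptions, which is why the organic share is the number worth mining data for.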

Data content and source

 Public records
 CAFE
 GM Data
 Industry sponsored studies

SQL Server 2005 Data Mining

 Nine algorithms (3rd party pluggable)
 Both Modeling and exploration in VS
 Integrated tools: SS*S (SSIS, SSAS, SSRS, SSMS)
 API
 Data Mining Extensions to SQL (DMX)

Type of analysis

 Descriptive vs. Predictive
 Descriptive – provides deeper understanding
of existing data
 Predictive – provides insight into the
probability of future conditions

Data Mining Objective

 Classification – assign data to known classes
(discrete)
 Segmentation – clustering into similar groups
 Estimation – predicting continuous values
 Association – what events occur together
 Forecasting – time-series estimation of future values
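
As an illustration of the association objective, the sketch below computes the two usual measures for a rule such as "buyers of A also buy B": support (how often the items occur together) and confidence (how often B appears given A). The function names and the toy baskets are hypothetical, and this is plain Python, not the SQL Server algorithm.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Toy baskets (made-up data).
baskets = [
    {"book", "dvd"},
    {"book", "dvd", "game"},
    {"book"},
    {"game"},
]
rule_support = support(baskets, {"book", "dvd"})
rule_confidence = confidence(baskets, {"book"}, {"dvd"})
```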

Algorithms

1. Decision Trees (attributes from the tree)
2. Naive Bayes (uses all attributes)
3. Clustering
4. Linear Regression
5. Logistic Regression
6. Neural Nets
7. Sequence Clustering
8. Time Series
9. Association Rules (discrete only)
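
To make "uses all attributes" concrete, here is a minimal Naive Bayes sketch in plain Python. It multiplies a per-class probability for every attribute in the case and normalizes the result, so each prediction comes with a probability. The column names and the toy rows are invented for illustration, and this is not the Microsoft implementation.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict_probabilities(model, case):
    """Return a normalized probability for each class, given the case."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # class prior
        for attr, value in case.items():
            # Laplace (add-one) smoothing avoids zero probabilities.
            k = len(values_seen[attr])
            score *= (value_counts[(attr, cls)][value] + 1) / (n + k)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Made-up training cases: did the owner trade in the vehicle?
training_rows = [
    {"vehicle_age": "old", "mpg": "low", "traded_in": "yes"},
    {"vehicle_age": "old", "mpg": "low", "traded_in": "yes"},
    {"vehicle_age": "old", "mpg": "high", "traded_in": "no"},
    {"vehicle_age": "new", "mpg": "low", "traded_in": "no"},
    {"vehicle_age": "new", "mpg": "high", "traded_in": "no"},
    {"vehicle_age": "old", "mpg": "low", "traded_in": "yes"},
]
model = train_naive_bayes(training_rows, target="traded_in")
probs = predict_probabilities(model, {"vehicle_age": "old", "mpg": "low"})
```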

DMX

 Column syntax: Name, data type, content
type, [usage]
 Case being analyzed – key
 Content type: key, key sequence, key time,
discrete, continuous, discretized (# of
buckets)
 Usage: Input, predict, predict-only (not used
as input to build any other part of the model)
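
One way to see the column syntax is to generate a DMX statement from (name, data type, content type, optional usage) tuples. The sketch below does that in Python. The model name, column names, and helper functions are hypothetical, and the generated statement has not been run against a server here. A column with no usage keyword is treated as an input.

```python
def column_definition(name, data_type, content_type, usage=None):
    """Format one column as: [Name] DATATYPE CONTENTTYPE [USAGE]."""
    parts = [f"[{name}]", data_type, content_type]
    if usage:
        parts.append(usage)  # e.g. PREDICT or PREDICT_ONLY
    return " ".join(parts)


def create_mining_model(model_name, columns, algorithm):
    """Assemble a CREATE MINING MODEL statement from column tuples."""
    body = ",\n".join("    " + column_definition(*col) for col in columns)
    return f"CREATE MINING MODEL [{model_name}] (\n{body}\n) USING {algorithm}"


# Hypothetical columns for a trade-in model.
columns = [
    ("OwnerKey", "LONG", "KEY"),                            # the case key
    ("VehicleAge", "LONG", "DISCRETIZED(EQUAL_AREAS, 5)"),  # 5 buckets
    ("Mpg", "DOUBLE", "CONTINUOUS"),
    ("Region", "TEXT", "DISCRETE"),
    ("TradedIn", "TEXT", "DISCRETE", "PREDICT"),            # input and output
]
statement = create_mining_model(
    "ClunkerTradeIn", columns, "Microsoft_Decision_Trees"
)
print(statement)
```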

Structure

 Datamart, DW, cube
 Data source
 Mining Structure (which fields)
 Mining Models (algorithms, attributes)
 Viewers (tree, clusters, discrimination, classification)

Training the model

 SSIS Percentage Sampling Data Flow
Component
 Training, Testing
 Estimating error
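
The train/test idea can be sketched outside SSIS as well. The Python below splits cases by a random percentage, similar in spirit to a percentage sampling step, and then measures the error rate on the held-out cases. The function names are hypothetical, and the split is approximate by design: each case is assigned independently, so the training share is close to, but not exactly, the requested fraction.

```python
import random


def percentage_split(cases, training_fraction, seed=0):
    """Randomly send roughly `training_fraction` of cases to training."""
    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    training, testing = [], []
    for case in cases:
        if rng.random() < training_fraction:
            training.append(case)
        else:
            testing.append(case)
    return training, testing


def error_rate(predict, testing, target):
    """Fraction of held-out cases where the prediction misses the target."""
    if not testing:
        raise ValueError("testing set is empty")
    wrong = sum(1 for case in testing if predict(case) != case[target])
    return wrong / len(testing)
```

Measuring error only on cases the model never saw during training is what exposes over-fitting: a model that memorizes the training set looks perfect there and poor here.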

Demos

 Visual Studio
 SSMS
 Win Client
 Web Client

Miscellaneous

 Sequence or timing
 Prediction + measure of confidence
 Caution: Over-fitting the model
 Nested tables, e.g., transactional detail data
 Nested key is never the foreign key to the case table
 Nested key is what the nested table is about
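
The nested-table rule can be illustrated with plain data structures. In this sketch, each case (an owner) carries a nested table of purchases keyed by the vehicle model, which is what a purchase row is about. The owner id only links a purchase back to its case and is not used as the nested key. All names and rows are hypothetical.

```python
from collections import defaultdict


def build_cases(owners, purchases):
    """Attach each owner's purchases as a nested table keyed by model."""
    nested = defaultdict(dict)
    for row in purchases:
        # owner_id is the foreign key that finds the case; the nested key is
        # the model, i.e. the thing the nested row describes.
        nested[row["owner_id"]][row["model"]] = {"year": row["year"]}
    return [
        {**owner, "purchases": nested.get(owner["owner_id"], {})}
        for owner in owners
    ]


# Made-up case table and transactional detail table.
owners = [
    {"owner_id": 1, "region": "Midwest"},
    {"owner_id": 2, "region": "South"},
]
purchases = [
    {"owner_id": 1, "model": "Sedan A", "year": 2009},
    {"owner_id": 1, "model": "Truck B", "year": 2001},
]
cases = build_cases(owners, purchases)
```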

References
 http://dean-o.blogspot.com/
 http://abbottanalytics.blogspot.com/
 http://www.thearling.com/umass/index_frame.htm
 http://www.thearling.com/text/dmtechniques/dmtechniques.htm
 MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
 http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library
 http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035
 Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and
Stephen Forte – Chapter 20

Thank you!

 Website
 http://www.systemental.com
 Blogs
 http://dean-o.blogspot.com/
 http://practicalhoshin.blogspot.com
 Twitter
 http://www.twitter.com/deanwillson
 Email
 dean@systemental.com
 LinkedIn
 http://www.linkedin.com/in/deanwillson
