The document discusses using data mining to better design stimulus programs like "Cash for Clunkers". It explains what data mining is, gives examples such as Amazon.com and Coke Freestyle, and considers how data could help decide whom a program should target. The presentation then covers data mining concepts and algorithms in SQL Server 2005 for analyzing customer data and identifying patterns that could improve such programs.
1. DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
Presented to fwPASS on 1/26/2010
2. About Me
Work for Systemental as a Consultant and Software Developer
Software development to support corporate business process improvement since 2000 (Lean or Continuous Improvement initiatives)
.NET since 2004
President, fwPASS.org
Mfg. Eng. Technology degrees from Ball State University
Certified Six Sigma Black Belt
3. What We Will Cover
Data mining – what is it?
“Cash for Clunkers”
Other examples
Amazon.com
Coke Freestyle
Basic Data Mining Concepts
Demo time
4. Wikipedia
Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
6. Objectives of “Cash for Clunkers”
Jump-start automotive sector sales
Specifically higher mileage vehicles
Get gas guzzlers off the street
7. Cash for Clunkers
How did they decide who to target and how?
How would you do it?
Where did the data come from?
Where should the data come from?
8. Who to target?
Anyone, everyone, or targeted
Self qualified
Organic growth or just “pull up” existing sales
Convert foreign sales to GM
Conflict of interest? – Government motors
Discriminatory?
9. Estimating the effectiveness
Effect of “pull up” vs. organic growth
Peripheral commercial effect
Estimation of payback
Sales, plates and excise tax
Income tax from lay-off recalls
Reduction of unemployment
Auto Insurance
Reduction in tax revenue at gas pumps
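The payback items on this slide can be combined into a rough net estimate. The sketch below is only an illustration of that arithmetic: the field names, the way the terms are combined, and every number in the usage example are hypothetical placeholders, not figures from the program.

```python
from dataclasses import dataclass


@dataclass
class PaybackInputs:
    """Hypothetical inputs; none of these values come from program data."""
    vouchers_issued: int        # number of rebates paid out
    rebate_per_vehicle: float   # dollars paid per voucher
    pull_up_share: float        # fraction of sales that would have happened anyway
    revenue_per_sale: float     # sales, plate and excise tax per vehicle
    other_revenue: float        # income tax from recalls, lower unemployment cost, etc.
    lost_gas_tax: float         # fuel-tax revenue lost to higher-mileage vehicles


def estimate_net_payback(p: PaybackInputs) -> float:
    """Return estimated revenue minus cost; negative means a net cost."""
    # Only organic (non pull-up) sales are counted as new revenue.
    organic_sales = p.vouchers_issued * (1.0 - p.pull_up_share)
    program_cost = p.vouchers_issued * p.rebate_per_vehicle
    revenue = organic_sales * p.revenue_per_sale + p.other_revenue
    return revenue - p.lost_gas_tax - program_cost


# Usage with made-up numbers, purely to show the call.
example = PaybackInputs(
    vouchers_issued=1000,
    rebate_per_vehicle=4000.0,
    pull_up_share=0.5,
    revenue_per_sale=2000.0,
    other_revenue=500_000.0,
    lost_gas_tax=100_000.0,
)
net = estimate_net_payback(example)
```

The pull-up share dominates the result, which is why separating pulled-up sales from organic growth matters before any payback claim is made.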
10. Data content and source
Public records
CAFE
GM Data
Industry sponsored studies
12. SQL Server 2005 Data Mining
Nine algorithms (third-party algorithms are pluggable)
Both modeling and exploration in Visual Studio
Integrated tools: SS*S (SSIS, SSAS, SSRS)
API
Data Mining Extensions to SQL (DMX)
13. Type of analysis
Optimization vs. Predictive
Descriptive – provides deeper understanding of existing data
Predictive – provides insight into the probability of future conditions
14. Data Mining Objective
Classification – assign data to known classes (discrete)
Segmentation – clustering in similar groups
Estimation – predicting continuous values
Association – what events occur together
Forecasting – time-series estimation of future values
15. Algorithms
1. Decision Trees (uses only the attributes that appear in the tree)
2. Naive Bayes (uses all attributes)
3. Clustering
4. Linear Regression
5. Logistic Regression
6. Neural Nets
7. Sequence Clustering
8. Time Series
9. Association Rules (discrete only)
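To make the "uses all attributes" note on Naive Bayes concrete, here is a minimal pure-Python sketch of the general technique on discrete attributes, returning a prediction together with a probability. It is not SQL Server's implementation; the attribute names and training rows are invented for illustration, and add-one smoothing is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values seen
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(row[target], attr)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def predict(model, case):
    """Return (most likely class, normalized probability) for one case."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # class prior
        for attr, value in case.items():
            # Every attribute contributes; add-one smoothing avoids zeros.
            count = value_counts[(cls, attr)][value]
            score *= (count + 1) / (n + len(values_seen[attr]))
        scores[cls] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


# Hypothetical training cases: did the owner trade in the vehicle?
rows = [
    {"mpg": "low", "age": "old", "traded": "yes"},
    {"mpg": "low", "age": "old", "traded": "yes"},
    {"mpg": "low", "age": "new", "traded": "yes"},
    {"mpg": "high", "age": "new", "traded": "no"},
    {"mpg": "high", "age": "old", "traded": "no"},
    {"mpg": "high", "age": "new", "traded": "no"},
]
model = train_naive_bayes(rows, target="traded")
label, probability = predict(model, {"mpg": "low", "age": "old"})
```

The returned probability is the kind of confidence measure that should accompany any prediction used to target a program.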
16. DMX
Column syntax: name, data type, content type, [usage]
Case being analyzed – key
Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)
Usage: input, predict, predict-only (predicted, but not used to build any other part of the model)
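The column syntax above can be seen by assembling a DMX `CREATE MINING MODEL` statement from (name, data type, content type, usage) tuples. The Python helper below is a hypothetical convenience for illustration only; the model and column names are invented. The DMX keywords it emits (`KEY`, `CONTINUOUS`, `DISCRETIZED`, `DISCRETE`, `PREDICT`, `Microsoft_Decision_Trees`) are standard, and a column with no usage keyword is treated as an input.

```python
def column_definition(name, data_type, content_type, usage=""):
    """Render one column as: [Name] DATA_TYPE CONTENT_TYPE [USAGE]."""
    parts = [f"[{name}]", data_type, content_type]
    if usage:  # input columns carry no usage keyword in DMX
        parts.append(usage)
    return " ".join(parts)


def create_mining_model(model_name, columns, algorithm):
    """Build the text of a DMX CREATE MINING MODEL statement."""
    body = ",\n".join("    " + column_definition(*col) for col in columns)
    return (
        f"CREATE MINING MODEL [{model_name}]\n"
        f"(\n{body}\n)\n"
        f"USING {algorithm}"
    )


# Hypothetical model: predict whether a customer trades in a vehicle.
statement = create_mining_model(
    "ClunkerTradeIn",
    [
        ("CustomerKey", "LONG", "KEY"),
        ("VehicleAge", "LONG", "CONTINUOUS"),
        ("MilesPerGallon", "DOUBLE", "DISCRETIZED(EQUAL_AREAS, 5)"),
        ("TradedIn", "TEXT", "DISCRETE", "PREDICT"),
    ],
    "Microsoft_Decision_Trees",
)
print(statement)
```

Swapping `PREDICT` for `PREDICT_ONLY` on `TradedIn` would keep that column out of the inputs used to predict any other column.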
20. Miscellaneous
Sequence or timing
Prediction + measure of confidence
Caution: Over-fitting the model
Nested tables, e.g. transactional detail data
Nested key is never the foreign key to the case table
Nested key is what the nested table is about
21. References
http://dean-o.blogspot.com/
http://abbottanalytics.blogspot.com/
http://www.thearling.com/umass/index_frame.htm
http://www.thearling.com/text/dmtechniques/dmtechniques.htm
MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library
http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035
Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20