Data Mining with SQL Server 2005

Inspired by recent political and economic events, this presentation provides a conceptual overview of data mining and a technical primer on it, using the "Cash for Clunkers" program as a hypothetical example for the discussion.
Related blog post: "Cash for Clunkers - a Typical Sales Campaign?" http://practicalhoshin.blogspot.com/2009/08/cash-for-clunkers-typical-sales.html

Published in: Technology

  1. DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
       • Presented to fwPASS on 1/26/2010
  2. About Me
       • Work for Systemental as a Consultant and Software Developer
       • Software development to support corporate business process improvement since 2000 (Lean or Continuous Improvement initiatives)
       • .NET since 2004
       • President, fwPASS.org
       • Mfg. Eng. Technology degrees from Ball State University
       • Certified Six Sigma Black Belt
  3. What We Will Cover
       • Data mining – what is it?
       • “Cash for Clunkers”
       • Other examples: Amazon.com, Coke Freestyle
       • Basic data mining concepts
       • Demo time
  4. Wikipedia: "Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery."
  5. Cash for Clunkers
       • Columbia City: SR 30 & SR 9
  6. Objectives of “Cash for Clunkers”
       • Jump-start automotive sector sales
       • Specifically higher-mileage vehicles
       • Get gas guzzlers off the street
  7. Cash for Clunkers
       • How did they decide who to target and how?
       • How would you do it?
       • Where did the data come from?
       • Where should the data come from?
  8. Who to target?
       • Anyone, everyone, or targeted
       • Self-qualified
       • Organic growth or just “pull up” existing sales
       • Convert foreign sales to GM
       • Conflict of interest? – Government Motors
       • Discriminatory?
  9. Estimating the effectiveness
       • Effect of “pull up” vs. organic growth
       • Peripheral commercial effect
       • Estimation of payback
       • Sales, plates and excise tax
       • Income tax from layoff recalls
       • Reduction of unemployment
       • Auto insurance
       • Reduction in tax revenue at gas pumps
  10. Data content and source
       • Public records
       • CAFE
       • GM data
       • Industry-sponsored studies
  11. Amazon.com
  12. SQL Server 2005 Data Mining
       • Nine algorithms (3rd-party pluggable)
       • Both modeling and exploration in Visual Studio
       • Integrated tools: SS*S (SSIS, SSAS, SSRS)
       • API
       • Data Mining Extensions to SQL (DMX)
  13. Type of analysis
       • Optimization vs. predictive
       • Descriptive – provides deeper understanding of existing data
       • Predictive – provides insight into the probability of future conditions
  14. Data Mining Objective
       • Classification – assign data to known classes (discrete)
       • Segmentation – clustering into similar groups
       • Estimation – predicting continuous values
       • Association – what events occur together
       • Forecasting – time-series estimation of future values
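
To make the "Association" objective on slide 14 concrete, here is a minimal sketch of the two measures usually quoted for an association rule, support and confidence. The transactions and item names are hypothetical, and this is plain Python counting, not the Microsoft Association Rules algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: options bought together with a new car.
transactions = [
    {"extended warranty", "floor mats"},
    {"extended warranty", "floor mats", "roof rack"},
    {"floor mats"},
    {"extended warranty", "roof rack"},
    {"extended warranty", "floor mats"},
]


def pair_rules(transactions):
    """Return {(a, b): (support, confidence)} for every pairwise rule a -> b."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), both in pair_counts.items():
        # support: share of all transactions that contain both items
        # confidence: share of transactions with the left-hand item
        #             that also contain the right-hand item
        rules[(a, b)] = (both / n, both / item_counts[a])
        rules[(b, a)] = (both / n, both / item_counts[b])
    return rules


rules = pair_rules(transactions)
support, confidence = rules[("floor mats", "extended warranty")]
print(f"floor mats -> extended warranty: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence but very low support describes only a handful of cases, so both numbers are worth reading together.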
  15. Algorithms
       1. Decision Trees (attributes from the tree)
       2. Naive Bayes (uses all attributes)
       3. Clustering
       4. Linear Regression
       5. Logistic Regression
       6. Neural Nets
       7. Sequence Clustering
       8. Time Series
       9. Association Rules (discrete only)
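
Slide 15 notes that Naive Bayes uses all attributes. The sketch below shows that idea in plain Python: every input attribute contributes one factor to each class score. The training cases are hypothetical, add-one (Laplace) smoothing is an assumption of this sketch, and nothing here claims to reproduce the Microsoft Naive Bayes implementation.

```python
from collections import Counter, defaultdict

# Hypothetical training cases: (discrete attributes, did the owner trade in?).
cases = [
    ({"vehicle_age": "old", "mpg": "low"}, "yes"),
    ({"vehicle_age": "old", "mpg": "low"}, "yes"),
    ({"vehicle_age": "old", "mpg": "high"}, "no"),
    ({"vehicle_age": "new", "mpg": "low"}, "no"),
    ({"vehicle_age": "new", "mpg": "high"}, "no"),
    ({"vehicle_age": "old", "mpg": "low"}, "no"),
]


def train(cases):
    """Count classes and attribute values per class."""
    class_counts = Counter(label for _, label in cases)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    values = defaultdict(set)            # attribute -> distinct values seen
    for attrs, label in cases:
        for name, value in attrs.items():
            value_counts[(name, label)][value] += 1
            values[name].add(value)
    return class_counts, value_counts, values


def predict(model, attrs):
    """Return {label: probability}, multiplying in every attribute."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for name, value in attrs.items():
            # add-one smoothing keeps unseen values from zeroing the score
            seen = value_counts[(name, label)][value]
            score *= (seen + 1) / (count + len(values[name]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


model = train(cases)
probs = predict(model, {"vehicle_age": "old", "mpg": "low"})
print(probs)
```

The normalized probabilities double as a rough measure of confidence in the predicted class.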
  16. DMX
       • Column syntax: name, data type, content type, [usage]
       • Case being analyzed – key
       • Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)
       • Usage: input, predict, predict-only (not used to build any other part of the model)
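
The column syntax on slide 16 (name, data type, content type, optional usage) can be sketched as a small Python helper that assembles a DMX `CREATE MINING MODEL` statement as text. The model name, the column names and the `Column` helper are hypothetical, and the generated statement has not been run against a server, so treat it as an illustration of the column layout rather than a verified script.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str     # e.g. LONG, DOUBLE, TEXT
    content_type: str  # e.g. KEY, DISCRETE, CONTINUOUS, DISCRETIZED(...)
    usage: str = ""    # PREDICT or PREDICT_ONLY; empty means input


def create_mining_model(model_name, columns, algorithm):
    """Build the statement text: one 'name type content [usage]' line per column."""
    lines = []
    for col in columns:
        parts = [f"[{col.name}]", col.data_type, col.content_type]
        if col.usage:
            parts.append(col.usage)
        lines.append("    " + " ".join(parts))
    body = ",\n".join(lines)
    return f"CREATE MINING MODEL [{model_name}]\n(\n{body}\n)\nUSING {algorithm}"


# Hypothetical case table: one row per customer.
columns = [
    Column("CustomerKey", "LONG", "KEY"),
    Column("Age", "LONG", "CONTINUOUS"),
    Column("VehicleMpg", "DOUBLE", "DISCRETIZED(EQUAL_AREAS, 5)"),
    Column("TradedIn", "TEXT", "DISCRETE", "PREDICT"),
]

statement = create_mining_model("ClunkerTradeIn", columns, "Microsoft_Decision_Trees")
print(statement)
```

The key column identifies the case being analyzed, and the column flagged `PREDICT` is the one the model is asked to predict.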
  17. Structure
       • Datamart, DW, cube
       • Data source
       • Mining Structure (which fields)
       • Mining Models (algorithms, attributes)
       • Viewers (tree, clusters, discrimination, classification)
  18. Training the model
       • SSIS Percentage Sampling Data Flow Component
       • Training, Testing
       • Estimating error
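
Slide 18 lists percentage sampling, a training/testing split and error estimation. The sketch below shows the same idea in plain Python as a stand-in for the SSIS Percentage Sampling component: hold back part of the cases, fit on the rest, and measure error only on the held-back cases. The data, the 70% split and the one-threshold "model" are all hypothetical.

```python
import random


def percentage_split(cases, training_percent, seed=0):
    """Shuffle the cases and split them into (training, testing) lists."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_percent / 100)
    return shuffled[:cut], shuffled[cut:]


def error_rate(predict, testing_cases):
    """Share of testing cases where the prediction differs from the actual value."""
    wrong = sum(1 for value, actual in testing_cases if predict(value) != actual)
    return wrong / len(testing_cases)


# Hypothetical cases: (mpg, traded_in), where low-mpg cars were traded in.
cases = [(mpg, mpg < 18) for mpg in range(10, 30)]
training, testing = percentage_split(cases, training_percent=70)

# Stand-in model: a threshold halfway between the highest traded-in mpg
# and the lowest kept mpg, learned from the training cases only.
highest_traded = max(mpg for mpg, traded in training if traded)
lowest_kept = min(mpg for mpg, traded in training if not traded)
threshold = (highest_traded + lowest_kept) / 2


def predict(mpg):
    return mpg < threshold


print(f"training={len(training)} testing={len(testing)} "
      f"error={error_rate(predict, testing):.2f}")
```

Measuring error on cases the model never saw is also the simplest guard against over-fitting.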
  19. Demos
       • Visual Studio
       • SSMS
       • Win Client
       • Web Client
  20. Miscellaneous
       • Sequence or timing
       • Prediction + measure of confidence
       • Caution: over-fitting the model
       • Nested tables, e.g. transactional detail data
       • Key is never the foreign key to the case table
       • Key is what the table is about
  21. References
       • http://dean-o.blogspot.com/
       • http://abbottanalytics.blogspot.com/
       • http://www.thearling.com/umass/index_frame.htm
       • http://www.thearling.com/text/dmtechniques/dmtechniques.htm
       • MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
       • http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20Mining%20Web%20Controls%20Library
       • http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=34035
       • Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20
  22. Thank you!
       • Website: http://www.systemental.com
       • Blogs: http://dean-o.blogspot.com/ and http://practicalhoshin.blogspot.com
       • Twitter: http://www.twitter.com/deanwillson
       • Email: dean@systemental.com
       • LinkedIn: http://www.linkedin.com/in/deanwillson