Your SlideShare is downloading. ×
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Data Mining with SQL Server 2005
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Data Mining with SQL Server 2005

964

Published on

Inspired by recent political and economic events, this presentation will provide a conceptual overview and a technical primer to data mining using the "Cash for Clunkers" program as a hypothetical …

Inspired by recent political and economic events, this presentation will provide a conceptual overview and a technical primer to data mining using the "Cash for Clunkers" program as a hypothetical example for the discussion.
Related blog post "Cash for Clunkers - a Typical Sales Campaign?" http://practicalhoshin.blogspot.com/2009/08/cash-for-clunkers-typical-sales.html

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
964
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
21
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. presented to fwPASS on 1/26/2010 DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
  • 2. About Me  Work for Systemental as a Consultant and Software Developer  Software development to support Corporate business process improvement since 2000 (Lean or Continuous Improvement Initiatives)  .Net since 2004  President, fwPASS.org  Mfg. Eng. Technology degrees from Ball State University  Six Sigma Black Belt, Certified
  • 3. What We Will cover  Data mining – what is it?  “Cash for Clunkers”  Other examples  Amazon.com  Coke Freestyle  Basic Data Mining Concepts  Demo time
  • 4. Wikipedia Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
  • 5. Cash for Clunkers Columbia City: SR 30 & SR 9
  • 6. Objectives of “Cash for Clunkers”  Jump start automotive sector sales  Specifically higher mileage vehicles  Get gas guzzlers off the street
  • 7. Cash for Clunkers  How did they decide who to target and how?  How would you do it?  Where did the data come from?  Where should the data come from?
  • 8. Who to target?  Anyone, everyone, or targeted  Self qualified  Organic growth or just “pull up” existing sales  Convert foreign sales to GM  Conflict of interest? – Government motors  Discriminatory?
  • 9. Estimating the effectiveness  Affect of “pull up” vs. organic growth  Peripheral commercial effect  Estimation of payback  Sales, plates and excise tax  Income tax from lay-off recalls  Reduction of unemployment  Auto Insurance  Reduction in tax revenue at gas pumps
  • 10. Data content and source  Public records  CAFE  GM Data  Industry sponsored studies
  • 11. Amazon.com
  • 12. SQL Server 2005 Data Mining  Nine algorithms (3rd party pluggable)  Both Modeling and exploration in VS  Integrated tools: SS*S  API  Data Mining Extensions to SQL (DMX)
  • 13. Type of analysis  Optimization vs. Predictive  Descriptive – provides deeper understanding of existing data  Predictive – provides insight to understand probability of future conditions
  • 14. Data Mining Objective  Classification – assign data to known classes (discrete)  Segmentation – clustering in similar groups  Estimation – predicting continuous values  Association – what events occur together  Forecasting – time series estimating of future
  • 15. Algorithms 1. Decision Trees (attributes from the tree) 2. Naive Bayes (uses all attributes) 3. Clustering 4. Linear Regression 5. Logistic Regression 6. Neural Nets 7. Sequence Clustering 8. Time Series 9. Association Rules (discrete only)
  • 16. DMX  Column syntax: Name, data type, content type, [usage]  Case being analyzed – key  Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)  Usage: Input, predict, predict-only (not to build any other part of model)
  • 17. Structure  Datamart, DW, cube  Data source  Mining Structure (which fields)  Mining Models (algorithms, attributes)  Viewers (tree, clusters, discrimination, classification)
  • 18. Training the model  SSIS Percentage Sampling Data Flow Component  Training, Testing  Estimating error
  • 19. Demos  Visual Studio  SSMS  Win Client  Web Client
  • 20. Miscellaneous  Sequence or timing  Prediction + measure of confidence  Caution: Over-fitting the model  Nested tables ex: transactional detail data  Key is never foreign key to case table  Key is what table is about
  • 21. References  http://dean-o.blogspot.com/  http://abbottanalytics.blogspot.com/  http://www.thearling.com/umass/index_frame.htm  http://www.thearling.com/text/dmtechniques/dmtechniques.htm  MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise  http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20M ining%20Web%20Controls%20Library  http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?Rele aseId=34035  Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20
  • 22. Thank you!  Website  http://www.systemental.com  Blogs  http://dean-o.blogspot.com/  http://practicalhoshin.blogspot.com  Twitter  http://www.twitter.com/deanwillson  Email  dean@systemental.com  LinkedIn  http://www.linkedin.com/in/deanwillson

×