Data Mining Beyond Adventure Works (Redmond WA 10/3/2009)

(Delivered at Redmond WA -- Oct 3, 2009) Microsoft provides excellent tutorials and information about data mining through the fictional Adventure Works demos. However, what happens when you stray off that neat-and-tidy path? Data miners should be concerned about data preparation, proper algorithm selection, and correct interpretation. This interactive experience will consist of succinct audience participation demos to introduce some practical issues in real-world data mining.

Published in: Business, Technology, Education

  1. Data Mining beyond Adventure Works
     Mark Tabladillo Ph.D.
     http://marktab.net
     October 3, 2009
  2. Approach of this Presentation
       • Emphasize
           – Conceptual value of data mining
           – Relationship of data mining to the real world
       • Reserve
           – Specific procedures and mechanics
           – Specific mathematics
           – Production implementation
     © 2009 Mark Tabladillo Ph.D.
  3. Outline
       • Data Mining Fundamentals
       • Interactive Demos
       • Conclusion
  4. Interactive Demos
       • Sports
       • Government Forecasting
  5. Data Mining Definitions
       • Data mining is the automatic or semi-automatic process of exploring data for meaningful or useful patterns.
       • Data mining algorithms typically use estimation or optimization to achieve results (as opposed to only calculations).
  6. Microsoft Data Mining
       • Microsoft Data Mining refers to Microsoft’s specific implementation of certain common data mining algorithms for the DMX (Data Mining Extensions) language.
       • Also called SQL Server Data Mining, the technology is integrated into SQL Server rather than presented as an independent application.
  7. Data Mining Tasks
       • Supervised
           – Answer known, what is correlated?
       • Unsupervised
           – Answer unknown (unspecified), what are the groups?
       • Forecasting
           – Given a trend, what is next?
     Value Slide
  8. List the Data Mining Algorithms
       • Ten Answers
       • Each one is a field of academic focus
  9. The Data Mining Algorithms
       • Microsoft Naive Bayes
       • Microsoft Linear Regression
       • Microsoft Decision Trees
       • Microsoft Time Series
       • Microsoft Clustering
       • Microsoft Sequence Clustering
       • Microsoft Association Rules
       • Microsoft Neural Networks
       • Microsoft Logistic Regression
       • Text Mining
  10. The Analyze Tab
        Menu Option                     Data Mining Algorithm
        Analyze Key Influencers         Naïve Bayes
        Detect Categories               Clustering
        Fill from Example               Logistic Regression
        Forecast                        Time Series
        Highlight Exceptions            Clustering
        Scenario Analysis (Goal Seek)   Logistic Regression
        Scenario Analysis (What If)     Logistic Regression
        Prediction Calculator           Logistic Regression
        Shopping Basket Analysis        Association Rules
  11. Demo One: National League Baseball
        • Directions: You are on the management team for the Atlanta Braves. To better serve the team, you have been instructed by the owner to group the players by considering both their position and their salary.
  12. Demo One: National League Baseball
        • The following rules apply:
            – You must make more than one group
            – Each group must have at least two players
            – Players of different position may be in the same group
  13. Demo One: National League Baseball
        • Individual attributes can be used to make groups
        • Historical statistics can be used to group new players
        • Both supervised and unsupervised algorithms can be applied to the same data
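
The grouping exercise in Demo One can be tried in code. The sketch below is one possible unsupervised approach, a small k-means written in plain Python; it is not the Microsoft Clustering algorithm. The roster, the salaries (in millions of dollars), the one-hot encoding of position, and the choice to seed the groups from the lowest and highest salaries are all assumptions made for illustration.

```python
from math import dist

# Hypothetical roster: (name, position, salary in millions of dollars).
PLAYERS = [
    ("A", "Pitcher", 0.5), ("B", "Pitcher", 0.6), ("C", "Pitcher", 12.0),
    ("D", "Catcher", 0.7), ("E", "Outfield", 11.0), ("F", "Outfield", 13.0),
    ("G", "Infield", 0.4), ("H", "Infield", 10.0),
]

def to_vector(position, salary, positions):
    """Encode a player as [salary, one-hot position...]."""
    return [salary] + [1.0 if position == p else 0.0 for p in positions]

def k_means(vectors, centroids, max_iter=20):
    """Plain k-means: assign to nearest centroid, then recompute means."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def group_players(players, k=2):
    """Group players by position and salary; k must be at least 2."""
    positions = sorted({pos for _, pos, _ in players})
    vectors = [to_vector(pos, sal, positions) for _, pos, sal in players]
    # Assumption: seed centroids from players spread across the salary range.
    by_salary = sorted(range(len(players)), key=lambda i: players[i][2])
    seeds = [
        list(vectors[by_salary[round(j * (len(players) - 1) / (k - 1))]])
        for j in range(k)
    ]
    labels = k_means(vectors, seeds)
    groups = {}
    for (name, _, _), lab in zip(players, labels):
        groups.setdefault(lab, []).append(name)
    return list(groups.values())

def satisfies_rules(groups):
    """Demo rules: more than one group, each with at least two players."""
    return len(groups) > 1 and all(len(g) >= 2 for g in groups)

if __name__ == "__main__":
    result = group_players(PLAYERS)
    print(result, satisfies_rules(result))
```

Because salary is left unscaled, it dominates the distance and the groups split mainly by pay, with players of different positions sharing a group. Rescaling salary or weighting the position columns would change the grouping, which is the kind of data preparation choice the demo is meant to surface.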
  14. Demo Two: Government Forecasting
        • Directions: The President is asking your opinion on how the following numbers will increase over the next few months. Because this project is sensitive, you do not know what these numbers measure. However, based on the available history, make your best projection for the next six periods.
  15. Demo Two: Government Forecasting
        [Chart: monthly values, Jan 2007 through Dec 2008; vertical axis from 0 to 8]
  16. Demo Two: Government Forecasting
        [Chart: monthly values, Sep 2007 through Aug 2009; vertical axis from 0 to 12]
  17. Demo Two: Government Forecasting
        • Rapid response is as useful as prediction
        • Seek intelligent correlations among related metrics
        • Projections depend on time frame – modeling is continual
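
The point that projections depend on time frame can be shown with a very simple model. The sketch below fits an ordinary least-squares line and extends it six periods ahead; this is a stand-in for illustration, not the Microsoft Time Series algorithm. The series is hypothetical and is not read from the charts in the slides.

```python
def fit_line(values):
    """Least-squares slope and intercept for values at x = 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def project(values, periods=6):
    """Extend the fitted line for the next `periods` time steps."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(periods)]

# Hypothetical monthly history: flat for six periods, then rising.
HISTORY = [4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10]

if __name__ == "__main__":
    print("all 12 periods:", [round(v, 2) for v in project(HISTORY)])
    print("last 6 periods:", [round(v, 2) for v in project(HISTORY[-6:])])
```

Fitting only the most recent six periods gives a steeper projection than fitting the full history, so the same data supports different answers depending on the window chosen. That is one reason to treat modeling as continual and to refit as new periods arrive.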
  18. Forecasting Algorithms
        • Microsoft Time Series
      Value Slide
  19. Supervised Algorithms
        • Microsoft Naive Bayes
        • Microsoft Linear Regression
        • Microsoft Decision Trees
        • Microsoft Neural Networks
        • Microsoft Logistic Regression
      Value Slide
  20. Unsupervised Algorithms
        • Microsoft Clustering
        • Microsoft Sequence Clustering
        • Microsoft Association Rules
        • Text Mining
      Value Slide
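
The task groupings on slides 18 to 20 and the Analyze tab table on slide 10 can be combined into a small lookup. The sketch below only restates those two slides as Python dictionaries; the names `ALGORITHMS_BY_TASK`, `ANALYZE_TAB`, and `task_for` are made up for this example, and the "Microsoft " prefix is dropped so the two lists share spellings.

```python
# Task groupings from slides 18-20 ("Microsoft " prefix dropped).
ALGORITHMS_BY_TASK = {
    "Forecasting": ["Time Series"],
    "Supervised": [
        "Naive Bayes", "Linear Regression", "Decision Trees",
        "Neural Networks", "Logistic Regression",
    ],
    "Unsupervised": [
        "Clustering", "Sequence Clustering", "Association Rules", "Text Mining",
    ],
}

# Analyze tab menu options from slide 10 (spelled "Naive" to match above).
ANALYZE_TAB = {
    "Analyze Key Influencers": "Naive Bayes",
    "Detect Categories": "Clustering",
    "Fill from Example": "Logistic Regression",
    "Forecast": "Time Series",
    "Highlight Exceptions": "Clustering",
    "Scenario Analysis (Goal Seek)": "Logistic Regression",
    "Scenario Analysis (What If)": "Logistic Regression",
    "Prediction Calculator": "Logistic Regression",
    "Shopping Basket Analysis": "Association Rules",
}

def task_for(menu_option):
    """Return the task category behind an Analyze tab menu option."""
    algorithm = ANALYZE_TAB[menu_option]
    for task, algorithms in ALGORITHMS_BY_TASK.items():
        if algorithm in algorithms:
            return task
    raise KeyError(algorithm)

if __name__ == "__main__":
    for option in ANALYZE_TAB:
        print(f"{option}: {ANALYZE_TAB[option]} ({task_for(option)})")
```

Read this way, most Analyze tab options turn out to be supervised, with clustering and association rules covering the unsupervised options and a single option for forecasting.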
  21. Resources
        • MarkTab.NET: links, video resources and information for data mining
        • Data Mining with Microsoft SQL Server 2008, by Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat
        • Smart Business Intelligence Solutions with Microsoft® SQL Server® 2008 (PRO-Developer), by Lynn Langit and Matthew Roche
  22. Regroup and Conclusion
        • Main Points from this Presentation
  23. Contact Information
        • Mark Tabladillo, Twitter @marktabnet
        • Also on: LinkedIn, Facebook
  24. Bonus: Sequence Clustering Ideas
        • Trading players in professional sports
        • Assigning players to certain positions
        • Moving from city to city
        • Store path at the mall
        • Cancer treatment path
        • Taking up a musical instrument
        • Taking up sports
        • Blogging
        • Viral news
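
Each idea in the sequence clustering list involves an ordered path of states. As a rough illustration using the "store path at the mall" idea, the sketch below estimates first-order transition probabilities from a few paths. It covers only the Markov-chain view of a sequence, not the clustering step, and it is not the Microsoft Sequence Clustering implementation. The store names and paths are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(paths):
    """Estimate P(next state | current state) from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        # Pair each state with the state that follows it.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for state, following in counts.items()
    }

# Hypothetical shopper paths through a mall.
PATHS = [
    ["Entrance", "Shoes", "Food Court"],
    ["Entrance", "Shoes", "Exit"],
    ["Entrance", "Books", "Food Court"],
    ["Entrance", "Shoes", "Food Court"],
]

if __name__ == "__main__":
    for state, following in transition_probabilities(PATHS).items():
        print(state, following)
```

Shoppers whose paths yield similar transition tables could then be grouped together, which is the intuition behind treating ordered events as input to clustering.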
