Data Mining I: KnowledgeSEEKER. Jennifer Davis, Kelly Davis, Saurabh Gupta, Chris Mathews, Shantea Stanford

Presentation_DMining_Final.ppt


  1. Data Mining I: KnowledgeSEEKER. Jennifer Davis, Kelly Davis, Saurabh Gupta, Chris Mathews, Shantea Stanford
  2. Overview of Presentation <ul><li>Introduction to Data Mining Methods and Products </li></ul><ul><li>Tutorial: How to Use KnowledgeSEEKER </li></ul><ul><li>Exercises: How Much Did You Learn? </li></ul>
  3. What is Data Mining? <ul><li>Filtering large amounts of data </li></ul><ul><li>Searching for hidden patterns and/or trends </li></ul><ul><li>Predicting future results </li></ul><ul><li>Creating a competitive advantage and improving decision making </li></ul><ul><li>Data mining is a form of artificial intelligence, but is very different from other BI tools. </li></ul><ul><ul><li>Discovery versus Verification </li></ul></ul>
  4. What Sparked Data Mining? <ul><li>“Motivated by business need, large amounts of available data, and humans’ limited cognitive processing abilities </li></ul><ul><li>Enabled by data warehousing, parallel processing, and data mining algorithms” </li></ul><ul><li>Source: Dr. Hugh Watson </li></ul>
  5. Popular Data Mining Methods <ul><li>Neural networks – learn from patterns in data and predict new data </li></ul><ul><li>Genetic algorithms – optimization techniques </li></ul><ul><li>Decision trees – rules for classifying data </li></ul><ul><li>Regression analysis – statistical modeling of relationships between variables </li></ul><ul><li>K-nearest neighbor – classifies a record by its most similar records, based on weighting of selected variables </li></ul><ul><li>Data visualization – shows patterns visually </li></ul>
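As an illustration of the k-nearest-neighbor bullet above, here is a minimal sketch in plain Python. The function name `knn_classify`, the `weights` parameter, and the sample records are all assumptions made up for this example; they are not taken from the presentation or from any product.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3, weights=None):
    """Label `query` by majority vote among its k nearest training records.

    train   -- list of (features, label) pairs; features are equal-length numeric tuples
    weights -- optional per-variable weights (hypothetical knob, defaults to 1.0 each)
    """
    if weights is None:
        weights = [1.0] * len(query)

    def distance(a, b):
        # Weighted Euclidean distance over the selected variables.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(train, key=lambda rec: distance(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up records: (age, weekly drinks) -> hypertension level
train = [((25, 1), "low"), ((30, 2), "low"), ((35, 0), "low"),
         ((58, 9), "high"), ((61, 7), "high"), ((66, 10), "high")]
prediction = knn_classify(train, (60, 8), k=3)
```

Raising a variable's weight makes differences in that variable count more toward the distance, which is one simple way to read "weighting of selected variables".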
  6. Types of Data Mining <ul><li>Association – identifies relationships </li></ul><ul><li>Sequential pattern – identifies sequencing </li></ul><ul><li>Classification – assigns records to predetermined categories </li></ul><ul><li>Clustering – identifies categories that are not defined in advance </li></ul><ul><li>Prediction – estimates future values or forecasts </li></ul>
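Association mining is commonly summarized with two measures, support and confidence. The sketch below computes both for a single rule; the basket data and function names are invented for illustration and do not come from the slides.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})          # 2 of the 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

Note that `confidence` divides by the antecedent's support, so it assumes the antecedent appears in at least one transaction.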
  7. Data Mining Process <ul><li>“Requires personnel with domain, data warehousing, and data mining expertise </li></ul><ul><li>Requires data selection, data extraction, data cleansing, and data transformation </li></ul><ul><li>Most data mining tools work with highly granular flat files </li></ul><ul><li>Is an iterative and interactive process” </li></ul><ul><li>Source: Dr. Hugh Watson </li></ul>
  8. How Is Data Mining Used? <ul><li>CRM: Research, churn and promotional management. </li></ul><ul><li>Process Mgmt: Reduce operational delays. </li></ul><ul><li>Analysis: Develop forecasting models and support fraud prevention. </li></ul><ul><li>Predictive Capabilities: Develop rules for queries or expert systems; guide oil exploration. </li></ul><ul><li>Health Care: Medical research and trends. </li></ul><ul><li>Banking: Identify bank locations. </li></ul><ul><li>Sports: Guide movement of players. </li></ul>
  9. Data Mining Products <ul><li>See product list, http://www.xore.com/prodtable.html </li></ul><ul><li>According to Jackie Sweeney, International Data Corporation, “Data mining has matured, producing fortunes for the Big Three vendors - SPSS, IBM and SAS Institute - and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets.” </li></ul>
  10. Data Mining Products <ul><li>Off-the-shelf applications and bundling are becoming more common. </li></ul><ul><li>Wide range of pricing </li></ul><ul><ul><li>SAS Institute’s Enterprise Miner ~ $80k </li></ul></ul><ul><ul><li>IBM Intelligent Miner ~ $60k </li></ul></ul><ul><ul><li>Angoss KnowledgeSEEKER – $4,750 per license, including upgrades and unlimited tech support for 1 year. Annual license renewal fees are 20% of the list price. </li></ul></ul><ul><ul><li>Desktop products start at a few hundred dollars </li></ul></ul>
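The KnowledgeSEEKER pricing above can be turned into a multi-year estimate. This is a rough sketch under assumptions the slide does not state: that the purchase price covers the first year, that one renewal fee (20% of $4,750, i.e. $950) is billed for each later year, and that the list price stays constant. The function name `license_cost` is hypothetical.

```python
LIST_PRICE = 4750.00   # per-license price quoted on the slide
RENEWAL_RATE = 0.20    # annual renewal fee as a fraction of list price

def license_cost(years, licenses=1):
    """Total cost of holding `licenses` seats for `years` years.

    Assumes the first year is covered by the purchase price and each
    later year is billed one renewal fee at a constant list price.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    renewals = years - 1
    per_license = LIST_PRICE * (1 + RENEWAL_RATE * renewals)
    return licenses * per_license

three_year_total = license_cost(3)  # purchase price plus two renewal fees
```

Under these assumptions, three years of one license comes to the purchase price plus two renewals, which is still far below the approximate Enterprise Miner and Intelligent Miner figures listed on the slide.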
  11. Selection Process – Questions to Ask <ul><li>Are the data and variables currently available? </li></ul><ul><li>Will mining involve numerical and nominal data? </li></ul><ul><li>Can the tool build models, predict outcomes and verify results? </li></ul><ul><li>Can it process the amount of data required? </li></ul><ul><li>Can the tool handle incomplete data? </li></ul><ul><li>Can the tool process noisy data? </li></ul><ul><li>Can it provide the degree of granularity desired? </li></ul><ul><li>How much technical knowledge is required? </li></ul>
  12. KnowledgeSEEKER by Angoss <ul><li>Angoss Software Corp. – Canadian public company specializing in data mining solutions </li></ul><ul><li>Decision tree modeling </li></ul><ul><li>Fully scalable and easy to use </li></ul><ul><li>Specifications </li></ul><ul><ul><li>Operating Systems: Unix, Windows 3.1, 95, 98 and NT. </li></ul></ul><ul><ul><li>Databases: Access, dBase II, III and IV, ODBC, SAS, SPSS. </li></ul></ul>
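To give a feel for what decision tree modeling does when it picks a split, the sketch below ranks candidate variables by information gain. The slides do not say which splitting criterion KnowledgeSEEKER uses, so information gain is a stand-in chosen for illustration, and the records, field names, and function names are all made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(records, variable, target):
    """Entropy reduction from partitioning `records` on `variable`."""
    base = entropy([r[target] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(r[variable], []).append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return base - remainder

def best_split(records, variables, target):
    """Return the variable whose split yields the highest information gain."""
    return max(variables, key=lambda v: information_gain(records, v, target))

# Made-up records, not the tutorial's sample data.
records = [
    {"age": "32-50", "smoker": "never",   "hypertension": "low"},
    {"age": "32-50", "smoker": "regular", "hypertension": "low"},
    {"age": "32-50", "smoker": "never",   "hypertension": "low"},
    {"age": "51-62", "smoker": "regular", "hypertension": "high"},
    {"age": "51-62", "smoker": "never",   "hypertension": "high"},
    {"age": "51-62", "smoker": "regular", "hypertension": "low"},
]
top_variable = best_split(records, ["age", "smoker"], "hypertension")
```

In this toy data the age split separates the outcome better than the smoking split, so `best_split` picks it first. A tree builder would then repeat the same ranking inside each branch.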
  13. Users of KnowledgeSEEKER <ul><li>IRS – fraud detection </li></ul><ul><li>University of Rochester – cancer research </li></ul><ul><li>Hewlett Packard – process and quality control </li></ul><ul><li>Reader’s Digest – market segmentation </li></ul><ul><li>MGM Grand – survey analysis </li></ul>
  14. Sources <ul><li>Angoss Whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html </li></ul><ul><li>“Data Mining for Golden Opportunities”, Smart Computing, January 2000 </li></ul><ul><li>“Your Business Intelligence Arsenal”, Telephony, Chicago, Apr 24, 2000, Douglas Hackney </li></ul><ul><li>Examples and testimonials: http://www.data-mining-software.com/data_mining_examples.htm </li></ul><ul><li>Data Management, Richard T. Watson, 2002 </li></ul><ul><li>http://www.xore.com/prodtable.html (Data Mining Products) </li></ul><ul><li>Dr. Hugh Watson’s slide </li></ul><ul><li>“Data Mining Gets Real”, Enterprise Systems Journal, April 1999, Jon William Toigo </li></ul><ul><li>http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm (examples of Data Mining uses) </li></ul>
  15. KnowledgeSEEKER Tutorial
  16. KnowledgeSEEKER Exercises <ul><li>According to KnowledgeSEEKER, which is the most important variable influencing hypertension for those between the ages of 51-62 who are “regular” or “occasional” smokers? </li></ul> Answer – Cheese last week
  17. KnowledgeSEEKER Exercises <ul><li>What is the total number of 51-62 year olds who have identified themselves as “former/never smokers” and have an eating pattern that includes “a lot/moderate salt”? </li></ul> Answer – 32
  18. KnowledgeSEEKER Exercises <ul><li>What percent of women between the ages of 32-50 who occasionally drink have high hypertension? </li></ul> Answer – 28.6%
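A percentage question like the one above amounts to filtering records and counting outcomes within the filtered group. The sketch below shows that calculation on invented records; the field names, values, and the helper `percent_with_outcome` are hypothetical, and the result is not meant to reproduce the tutorial's answer.

```python
def percent_with_outcome(records, filters, outcome_field, outcome_value):
    """Percent of records matching `filters` whose outcome equals `outcome_value`.

    Returns None when no record matches the filters.
    """
    subset = [r for r in records
              if all(r[field] == value for field, value in filters.items())]
    if not subset:
        return None
    hits = sum(r[outcome_field] == outcome_value for r in subset)
    return 100.0 * hits / len(subset)

# Made-up records; field names and values are hypothetical.
people = [
    {"sex": "F", "age": "32-50", "drinks": "occasional", "hypertension": "high"},
    {"sex": "F", "age": "32-50", "drinks": "occasional", "hypertension": "low"},
    {"sex": "F", "age": "32-50", "drinks": "occasional", "hypertension": "low"},
    {"sex": "F", "age": "32-50", "drinks": "occasional", "hypertension": "low"},
    {"sex": "M", "age": "32-50", "drinks": "occasional", "hypertension": "high"},
    {"sex": "F", "age": "51-62", "drinks": "regular",    "hypertension": "high"},
]
pct = percent_with_outcome(
    people,
    {"sex": "F", "age": "32-50", "drinks": "occasional"},
    "hypertension", "high",
)
```

Each filter corresponds to one branch followed down the decision tree, and the percentage is what the tool would display in the resulting node.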
  19. KnowledgeSEEKER Exercises <ul><li>What percent of people in income groups 4, 5, 7, and 8, age bracket 32-50, have high hypertension? </li></ul> Answer – 11.8%
  20. KnowledgeSEEKER Exercises <ul><li>In the sample data, how many people have never smoked? </li></ul> Answer – 94
  21. KnowledgeSEEKER Exercises <ul><li>What is the most important factor contributing to hypertension according to KnowledgeSEEKER for those in the 51-62 age bracket? </li></ul> Answer – Smoking. Next, right-click and select “Go to Split” to find the 4th most important factor in the table. Answer – Deep fried last week
  22. KnowledgeSEEKER Exercises <ul><li>What is the percentage of males who are “regular” smokers among all male participants? </li></ul> Answer – 30.8%
  23. KnowledgeSEEKER Exercises <ul><li>Create a graph of the distribution of smoking males. </li></ul>
  24. KnowledgeSEEKER Exercises <ul><li>Complete the following steps: </li></ul><ul><li>Dependent variable – Hypertension </li></ul><ul><li>Click on Grow / Automatic </li></ul><ul><li>What is the total number of males between the ages of 63-72 who had fish last week? </li></ul> Answer – 24
  25. KnowledgeSEEKER Exercises <ul><li>After age, which split has the greatest effect on hypertension according to KnowledgeSEEKER? </li></ul> Answer – Height
  26. KnowledgeSEEKER Exercises <ul><li>Among 32-50 year olds who report a drink pattern of former/never, how many have high hypertension? </li></ul> Answer – 0
  27. KnowledgeSEEKER Exercises <ul><li>According to KnowledgeSEEKER, what is the most important variable influencing hypertension for women between the ages of 51-62? </li></ul><ul><ul><li>How is this different from males age 51-62? </li></ul></ul> Answer: Women – weight; Men – drinking pattern
