Data Mining Issues and Applications

Transcript

  • 1. Data Mining
      • Sue Walsh, Higher Education Consulting, SAS
      • Copyright © 2006, SAS Institute Inc. All rights reserved.
  • 2. Overview
      • Brief Historical Perspective
      • Defining Data Mining
      • Issues
          • Data Collection and Data Organization
          • Modeling Issues and Data Difficulties
          • Skepticism and Communication
      • Applications
      • SAS Enterprise Miner Demonstration
      • SAS Enterprise Miner versus SAS/STAT
      • Another Kind of Data Mining - Text Mining
  • 3. History
  • 4. Data Mining, circa 1963
      • IBM 7090
      • 600 cases
      • “Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”
  • 5. Since 1963
      • Moore’s Law: The information density on silicon-integrated circuits doubles every 18 to 24 months.
      • Parkinson’s Law: Work expands to fill the time available for its completion.
  • 6. Data Deluge
      • hospital patient registries
      • stock trades
      • electronic point-of-sale data
      • catalog orders
      • OLTP
      • remote sensing images
      • airline reservations
      • bank transactions
      • telephone calls
      • tax returns
      • credit card charges
  • 7. The Data
                    Experimental          Opportunistic
      Purpose       Research              Operational
      Value         Scientific            Commercial
      Generation    Actively controlled   Passively observed
      Size          Small                 Massive
      Hygiene       Clean                 Dirty
      State         Static                Dynamic
  • 8. The Origins of Data Mining
      • Statistics
      • Pattern Recognition
      • Neurocomputing
      • Machine Learning
      • AI
      • Databases
      • KDD
  • 9. Solving the Data Puzzle - a Step-by-Step Approach
      • Data collection
          • Transactional systems
          • Customer information systems
      • Data organization
      • Data analysis
      • Reporting
      • Diagram labels: Business Decisions; The Result
  • 10. Definition
  • 11. What Is Data Mining?
      • IT − Complicated database queries
      • ML − Inductive learning from examples
      • Stat − What we were taught not to do
  • 12. Data Mining – The SAS Definition
      • Advanced methods for exploring and modeling relationships in large amounts of data.
  • 13. Solving the Data Puzzle - a Step-by-Step Approach
      • Data collection
          • Transactional systems
          • Customer information systems
      • Data organization - data warehousing
      • Data analysis - data mining
      • Reporting
      • Action
  • 14. The SAS Approach to Data Mining: SEMMA
      • Sample
      • Explore
      • Modify
      • Model
      • Assess
  • 15. Issues
  • 16. Data Collection and Data Organization
      • What data has been collected and where is it?
      • How do I combine legacy systems with current data systems?
          • Customer Story
      • What is the meaning of some of these data values?
  • 17. Modeling Issues and Data Difficulties
      • Data Preparation
      • Rare or Unknown Targets
          • Over Sampling
      • Undercoverage
      • Dirty Data
          • Errors
          • Missing Values
      • Dimension Reduction (Variable Selection)
      • Under and Over Fitting
      • Temporal Infidelity
      • Model Evaluation
  • 18. Skepticism and Communication
      • Skepticism
          • Breaking the Rules (statisticians)
          • Magic (non-analytical individuals)
      • Communication
  • 19. Applications
  • 20. Health Care
      • Drug development – to help uncover less expensive but equally effective drug treatments.
      • Medical diagnostics – imaging, real-time monitoring (e.g., predicting women at high risk for emergency C-section).
      • Insurance claims analysis – identify customers likely to buy new policies; define behavior patterns of risky customers.
  • 21. Business and Finance
      • Banks – to detect which customers are using which products, so they can offer the right mix of products and services to better meet customer needs (cross-sell and up-sell).
      • Credit card companies – to assist in mailing promotional materials to people who are most likely to respond.
      • Lenders – to determine which applicants are most likely to default on a loan.
  • 22. The Absa Group (a South African Bank)
      • Challenge: Reduce operating expenses and cut losses by leveraging data to improve security and enhance customer relationships.
      • Solution: SAS helped Absa reduce armed robberies by 41 percent over two years, netting a 38 percent reduction in cash loss and an 11 percent increase in customer satisfaction ratings.
  • 23. Sports and Gambling
      • Sports teams – to analyze data to determine favorable player matchups and call the best plays.
      • Gaming industry – to analyze customer gambling trends at casinos.
      • Sports fanatics – to predict which teams will be chosen for tournament berths as well as to predict game winners.
  • 24. Education
      • Enrollment Management – which students are likely to attend
      • Retention/Graduation Analysis – which students will remain enrolled after the first year and/or through graduation
      • Donation Prediction – who is likely to donate and how much they might donate
      • Faculty Churn – which faculty members are most likely to leave the institution
  • 25. Other Application Areas
      • Insurance – pricing, fraud detection, risk analysis
      • Stock Market – market timing, stock selection, risk analysis
      • Transportation – performance & network optimization to predict life-cycle costs of road pavement
      • Telecommunications – churn reduction
      • Retail – market basket analysis to help determine marketing strategies
  • 26. Demonstration
  • 27. Data Mining with SAS Enterprise Miner versus with SAS/STAT
      • Features in SAS Enterprise Miner not in SAS/STAT
          • Decision trees
          • Neural networks
          • Automatic data splitting
          • Automatic score code
          • Model comparison tool
      • Features in SAS/STAT not in SAS Enterprise Miner
          • Diagnostic statistics
      • The products offer different model evaluation statistics because of the difference in purpose.
  • 28. Another Kind of Data Mining
  • 29. Text Mining – What is it?
      • Text mining is a process that employs a set of algorithms for converting unstructured text into structured data objects and the quantitative methods used to analyze these data objects.
      • “SAS defines text mining as the process of investigating a large collection of free-form documents in order to discover and use the knowledge that exists in the collection as a whole.” (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage)
  • 30. Another View of Text Mining
      • Text → A Miracle Occurs → Numbers
  • 31. Text Mining Applications
      • Automotive Early Warning System
          • Wallace and Cermack (2004) describe the use of text mining for warranty analysis related to the TREAD Act.
      • Medical Information Management
          • TextWise Labs uses sophisticated text mining methodology to extract medical information from disparate data sources on the Internet.
          • Computer Science Innovations Inc. is developing an application for the National Cancer Institute that automatically converts medical records into XML data.
  • 32. Text Mining Applications
      • Insurance Claim Fraud
          • Insurance companies employ Special Investigative Units (SIU) to investigate claims for fraud. Data mining methods can be employed to automate the process of referral. Text mining methods are applied to claims examiner notes, physician reports, and other textual data to enhance predictive accuracy.
      • Technical Support
          • Sanders and DeVault (2004) describe a process that employs text mining to improve efficiency in a technical support environment.
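
Slides 29 and 30 describe text mining as converting unstructured text into structured data objects, that is, text into numbers. One common way to take that step is a bag-of-words term-document matrix; the sketch below is a minimal illustration under that assumption and is not how SAS Text Miner is implemented. The function names and the two example notes are hypothetical, invented here for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct term seen in any document, sorted
    # so that column order is stable.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical claims-examiner notes, invented for illustration only.
notes = [
    "Claimant reported neck pain after the collision",
    "Neck pain claim; physician report is missing",
]
vocabulary, matrix = term_document_matrix(notes)

for note, row in zip(notes, matrix):
    print(note)
    print(dict(zip(vocabulary, row)))
```

Each row of the matrix is a numeric record that can be passed to the same quantitative methods used on ordinary structured data, such as the fraud-referral models mentioned on slide 32. A production system would typically add steps this sketch omits, for example stemming, stop-word removal, and term weighting.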