Data Mining Issues and Applications

Transcript of "Data Mining Issues and Applications"

  1. Data Mining. Sue Walsh, Higher Education Consulting, SAS. Copyright © 2006, SAS Institute Inc. All rights reserved.
  2. Overview
     Brief Historical Perspective
     Defining Data Mining
     Issues
       • Data Collection and Data Organization
       • Modeling Issues and Data Difficulties
       • Skepticism and Communication
     Applications
     SAS Enterprise Miner Demonstration
     SAS Enterprise Miner versus SAS/STAT
     Another Kind of Data Mining - Text Mining
  3. History
  4. Data Mining, circa 1963: IBM 7090, 600 cases. “Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”
  5. Since 1963
     Moore’s Law: The information density on silicon-integrated circuits doubles every 18 to 24 months.
     Parkinson’s Law: Work expands to fill the time available for its completion.
  6. Data Deluge: hospital patient registries, electronic point-of-sale data, stock trades, catalog orders, OLTP, remote sensing images, airline reservations, bank transactions, telephone calls, credit card charges, tax returns.
  7. The Data
                   Experimental          Opportunistic
     Purpose       Research              Operational
     Value         Scientific            Commercial
     Generation    Actively controlled   Passively observed
     Size          Small                 Massive
     Hygiene       Clean                 Dirty
     State         Static                Dynamic
  8. The Origins of Data Mining: Statistics, Pattern Recognition, Neurocomputing, Machine Learning, AI, Databases, KDD.
  9. Solving the Data Puzzle - a Step-by-Step Approach
     Data collection
       • Transactional systems
       • Customer information systems
     Data organization
     Data analysis
     Reporting
     (Diagram labels: Business Decisions; The Result)
  10. Definition
  11. What Is Data Mining?
      • IT − Complicated database queries
      • ML − Inductive learning from examples
      • Stat − What we were taught not to do
  12. Data Mining – The SAS Definition: Advanced methods for exploring and modeling relationships in large amounts of data.
  13. Solving the Data Puzzle - a Step-by-Step Approach
      Data collection
        • Transactional systems
        • Customer information systems
      Data organization - data warehousing
      Data analysis - data mining
      Reporting
      Action
  14. The SAS Approach to Data Mining: SEMMA (Sample, Explore, Modify, Model, Assess)
  15. Issues
  16. Data Collection and Data Organization
      What data has been collected and where is it?
      How do I combine legacy systems with current data systems?
        • Customer Story
      What is the meaning of some of these data values?
  17. Modeling Issues and Data Difficulties
      Data Preparation
      Rare or Unknown Targets
        • Over Sampling
      Undercoverage
      Dirty Data
        • Errors
        • Missing Values
      Dimension Reduction (Variable Selection)
      Under and Over Fitting
      Temporal Infidelity
      Model Evaluation
  18. Skepticism and Communication
      Skepticism
        • Breaking the Rules (statisticians)
        • Magic (non-analytical individuals)
      Communication
  19. Applications
  20. Health Care
      Drug development – to help uncover less expensive but equally effective drug treatments.
      Medical diagnostics – imaging, real-time monitoring (e.g., predicting women at high risk for emergency C-section).
      Insurance claims analysis – identify customers likely to buy new policies; define behavior patterns of risky customers.
  21. Business and Finance
      Banks – to detect which customers are using which products so they can offer the right mix of products and services to better meet customer needs (cross-sell and up-sell).
      Credit card companies – to assist in mailing promotional materials to people who are most likely to respond.
      Lenders – to determine which applicants are most likely to default on a loan.
  22. The Absa Group (a South African Bank)
      Challenge: Reduce operating expenses and cut losses by leveraging data to improve security and enhance customer relationships.
      Solution: SAS helped Absa reduce armed robberies by 41 percent over two years, netting a 38 percent reduction in cash loss and an 11 percent increase in customer satisfaction ratings.
  23. Sports and Gambling
      Sports teams – to analyze data to determine favorable player matchups and call the best plays.
      Gaming industry – to analyze customer gambling trends at casinos.
      Sports fanatics – to predict which teams will be chosen for tournament berths as well as to predict game winners.
  24. Education
      Enrollment Management – which students are likely to attend
      Retention/Graduation Analysis – which students will remain enrolled after the first year and/or through graduation
      Donation Prediction – who is likely to donate and how much they might donate
      Faculty Churn – which faculty members are most likely to leave the institution
  25. Other Application Areas
      Insurance – pricing, fraud detection, risk analysis
      Stock Market – market timing, stock selection, risk analysis
      Transportation – performance & network optimization to predict life-cycle costs of road pavement
      Telecommunications – churn reduction
      Retail – market basket analysis to help determine marketing strategies
  26. Demonstration
  27. Data Mining with SAS Enterprise Miner versus with SAS/STAT
      Features in SAS Enterprise Miner not in SAS/STAT
        • Decision trees
        • Neural networks
        • Automatic data splitting
        • Automatic score code
        • Model comparison tool
      Features in SAS/STAT not in SAS Enterprise Miner
        • Diagnostic statistics
      The products offer different model evaluation statistics because of the difference in purpose.
  28. Another Kind of Data Mining
  29. Text Mining – What is it?
      Text mining is a process that employs a set of algorithms for converting unstructured text into structured data objects and the quantitative methods used to analyze these data objects.
      “SAS defines text mining as the process of investigating a large collection of free-form documents in order to discover and use the knowledge that exists in the collection as a whole.” (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage)
  30. Another View of Text Mining: Text → A Miracle Occurs → Numbers
  31. Text Mining Applications
      Automotive Early Warning System
        • Wallace and Cermack (2004) describe the use of text mining for warranty analysis related to the TREAD Act.
      Medical Information Management
        • TextWise Labs uses sophisticated text mining methodology to extract medical information from disparate data sources on the Internet.
        • Computer Science Innovations Inc. is developing an application for the National Cancer Institute that automatically converts medical records into XML data.
  32. Text Mining Applications
      Insurance Claim Fraud
        • Insurance companies employ Special Investigative Units (SIU) to investigate claims for fraud. Data mining methods can be employed to automate the process of referral. Text mining methods are applied to claims examiner notes, physician reports, and other textual data to enhance predictive accuracy.
      Technical Support
        • Sanders and DeVault (2004) describe a process that employs text mining to improve efficiency in a technical support environment.
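
Slide 29 describes text mining as converting unstructured text into structured data objects that quantitative methods can then analyze. The sketch below illustrates one simple form of that conversion: a bag-of-words term-document matrix, where each document becomes a row of term counts. This representation is an assumption chosen for illustration; the slides do not say which algorithms SAS Text Miner uses. The function names and the two sample notes are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Turn free-form documents into (vocabulary, rows).

    vocabulary is a sorted list of every term seen; rows holds one list
    of term counts per document, aligned with vocabulary.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical examiner notes, invented for illustration only.
notes = [
    "Brake failure reported after warranty repair",
    "Customer reports brake noise; repair scheduled",
]
vocabulary, rows = term_document_matrix(notes)

for term, per_doc in zip(vocabulary, zip(*rows)):
    print(f"{term:<10} {per_doc}")
```

Once the text is in this numeric form, the rows could be passed to the same modeling tools used for any other tabular data, which is the sense in which text becomes "numbers" on slide 30. A production system would usually add steps this sketch omits, such as stemming, stop-word removal, and term weighting.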