7. Data Mining and Its Applications


  • SPSS analytical solutions can also increase profits by enabling the organization to sell more products and services to existing customers – and to do so more efficiently.
  • SPSS analytical solutions enable increased profits by helping the organization understand which customers are likely to defect, so it can keep customers longer and gain more time to sell more products and services to each customer.
  • Here are a few examples of the impact of waste, fraud, abuse, and mismanagement in government:
      • A GAO report found that 17 federal programs lost $19.1 billion in improper payments in 1998. But the widely cited figure understates the size of the problem, because it accounts only for the improper payments that could be quantified. The GAO has said that “Improper payments are much greater than have been disclosed thus far.” (GAO, Financial Management: Increased Attention Needed to Prevent Billions in Improper Payments, October 1999)
      • Nearly 75% of this $19.1 billion is estimated to come from improper Medicare payments: nearly $12.6 billion in inaccurate or inappropriate payments. One report estimates that as much as 10% of all Medicare/Medicaid payments are fraudulent. Medicare has attracted its own class of organized criminals, persons who specialize in defrauding health care and health insurance systems. When the HHS Inspector General reported the $12.6 billion in improper Medicare payments for 1998, the otherwise disturbing figure suggested at least a trace of good news: it was $7.7 billion lower than the previous year.
      • The Office of the Inspector General identified approximately $31 million in Social Security benefits paid to deceased beneficiaries. GAO reports that program administrators have failed to recover outstanding overpayments at SSA, which now total about $4 billion (in 199). In fact, current and former recipients (in 1998) owed SSA more than $3.3 billion in newly detected overpayments for the year. Based on prior experience, SSA is likely to collect less than 15% of the outstanding debt in a given year. (GAO Performance and Accountability Series, January 1999)
      • According to the Congressional Budget Office (CBO), the rate of food stamp overpayments rose from 6.9 percent in 1996, to 7.3 percent in 1997, to 7.6 percent in 1998. The General Accounting Office (GAO) has often noted these longstanding problems. Millions of dollars in overpayments in the Food Stamp Program occur because eligible persons are paid too much or because ineligible individuals improperly participate in the program. For example, thousands of prisoners and deceased individuals have been included as members of households receiving food stamps. (GAO, Major Management Challenges and Program Risks: Department of Agriculture, January 1999)
      • HUD overpayments were estimated at $538 million in 1995. The GAO reported that the figure had grown to $847 million in 1998 (Reviving the Reform Agenda, House Budget Committee, February 2000). An IG report in May 2001 estimated that nearly $2 billion in subsidies was overpaid. HUD estimates that nearly $1 of every $18 it spends in its Section 8 assisted housing program is wasted.

    1. Data Mining and Its Applications
        • Data Mining Techniques: For Marketing, Sales, and Customer Support, by Michael J.A. Berry and Gordon Linoff, John Wiley & Sons, Inc., 1997.
        • Discovering Data Mining: From Concept to Implementation, by Cabena, Hadjinian, Stadler, Verhees and Zanasi, Prentice Hall, 1997.
        • Building Data Mining Applications for CRM, by Alex Berson, Stephen Smith and Kurt Thearling, McGraw-Hill, 1999.
        • Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, by Olivia Parr Rud, John Wiley & Sons, Inc., 2001.
        • Mastering Data Mining: The Art and Science of Customer Relationship Management, by Michael J.A. Berry and Gordon S. Linoff, John Wiley & Sons, Inc., 2000.
        • Machine Learning, by Tom M. Mitchell, McGraw-Hill, 1997.
        • Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2001.
        • Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005.
    2. Why Mine Data?
        • Lots of data is being collected and warehoused
            • Web data, e-commerce
            • Purchases at department/grocery stores
            • Bank/credit card transactions
        • Computers have become cheaper and more powerful
        • Competitive pressure is strong
            • Provide better, customized services for an edge (e.g. in Customer Relationship Management)
    3. Mining Large Data Sets - Motivation
        • There is often information “hidden” in the data that is not readily evident
        • Human analysts may take weeks to discover useful information
        • Much of the data is never analyzed at all
        • Figure: The Data Gap (total new disk (TB) since 1995 vs. number of analysts). From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”
    4. What is Data Mining?
        • Many definitions
            • Non-trivial extraction of implicit, previously unknown and potentially useful information from data
            • Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
    5. What is (not) Data Mining?
        • What is Data Mining?
            • Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
            • Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
        • What is not Data Mining?
            • Look up a phone number in a phone directory
            • Query a Web search engine for information about “Amazon”
    6. Origins of Data Mining
        • Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
        • Traditional techniques may be unsuitable due to
            • Enormity of data
            • High dimensionality of data
            • Heterogeneous, distributed nature of data
        • Figure: Data Mining shown at the overlap of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems
    7. Data Mining Tasks
        • Prediction methods
            • Use some variables to predict unknown or future values of other variables.
        • Description methods
            • Find human-interpretable patterns that describe the data.
        • From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996
    8. Data Mining Tasks...
        • Classification
        • Clustering
        • Association Rule Discovery
        • Sequential Pattern Discovery
    9. The Virtuous Cycle of Data Mining
        • Identify business problems and areas where analyzing data can provide value
        • Transform data into actionable information using data mining techniques
        • Act on the information
        • Measure the results of your efforts to provide insight on how to exploit your data
        • Taken from a talk given by Michael J.A. Berry on Data Mining for CRM.
    10. Some Typical Business Problems
        • Customer profiling
        • Customer segmentation
        • Customer retention
        • Basket analysis (retail)
        • Direct marketing
        • Cross selling
        • Fraud detection
    11. Customer Profiling
        • Question
            • What kinds of customers were profitable last year?
        • Data
            • Customer details such as age, gender, occupation, salary level, account, etc.
            • Earnings from each customer last year.
        • Data Mining
            • Divide customers into profitability categories according to earnings, such as highly profitable, profitable, non-profitable, loss.
            • Find rules using data mining techniques.
            • Analyze the rules and take actions.
    12. Customer Profiling: Rules
        • IF age > 30 and age <= 45 and
        • occupation is professional and
        • salary level is between 50,000 and 70,000
        • THEN this customer is profitable
        • Each rule comes with statistical measures such as support and confidence.
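To make the rule and its two statistics concrete, here is a minimal sketch in Python. The `Customer` record, the sample data, and the inclusive salary bounds are assumptions made for illustration; the slides do not give the underlying data or say how the measures were computed. Support is taken here as the fraction of all customers that match the rule and are profitable, and confidence as the fraction of matching customers that are profitable.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    age: int
    occupation: str
    salary: int
    profitable: bool  # hypothetical label derived from last year's earnings


def rule_matches(c: Customer) -> bool:
    """IF part of the slide's rule (salary bounds assumed inclusive)."""
    return (
        30 < c.age <= 45
        and c.occupation == "professional"
        and 50_000 <= c.salary <= 70_000
    )


def support_and_confidence(customers):
    """Return (support, confidence) of the rule 'IF ... THEN profitable'."""
    matched = [c for c in customers if rule_matches(c)]
    correct = [c for c in matched if c.profitable]
    support = len(correct) / len(customers)
    confidence = len(correct) / len(matched) if matched else 0.0
    return support, confidence


# Made-up sample data, for illustration only.
customers = [
    Customer(35, "professional", 60_000, True),
    Customer(40, "professional", 55_000, True),
    Customer(44, "professional", 70_000, False),
    Customer(28, "professional", 65_000, True),
    Customer(50, "clerical", 30_000, False),
]

support, confidence = support_and_confidence(customers)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule with high confidence but very low support describes only a handful of customers, so both numbers are usually checked before acting on a rule.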
    13. Customer Segmentation
        • Customer segmentation is a process to divide customers into different groups or segments. Customers in the same segment have similar needs or behaviors so that similar marketing strategies or service policies can be applied to them.
        • Customer segments are required in several business areas including
            • Marketing
            • Customer services
            • Products and service development
            • Sales promotion
            • Customer retention
    14. Life Cycle of a Loan Product
    15. Business Objectives
        • Mellon Bank Corporation is a major financial services company headquartered in Pittsburgh.
            • Build an extendible loan secured by the value of a client’s own property.
            • Achieve the highest possible return on investment.
            • Based on customers with DDAs (demand deposit accounts), build a model for HELOC (home equity line of credit).
    16. Data Preparation
        • The primary data source was the approximately 40,000 Mellon customers who had (or once had) HELOCs and DDAs.
        • Data
            • Demographic data sourced both internally and externally (age, income, length of residence, and other indicators of economic condition)
            • DDA data (history of loan balance over 3, 6, 9, 12, 18 months, history of returned checks, history of interest rates)
            • Property data sourced externally (home purchase price, loan-to-value ratio)
            • Other data related to creditworthiness
        • Use 120 variables
    17. 17. Data Mining and Its Applications
    18. Responders
    19. Classification
    20. Customer Retention
        • Question:
            • Find out what kinds of customers tend to churn and build a model which can predict the likely-to-churn customers.
        • Data mining solution:
            • Collect data about the customers who have churned.
            • Select a set of customers who have been loyal.
            • Merge the two data sets to form training, testing and evaluation data sets.
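The merge-and-split step can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_datasets`, the 60/20/20 split, the fixed random seed, and the placeholder customer records are all hypothetical, since the slides do not say how the three data sets were formed.

```python
import random


def build_datasets(churned, loyal, seed=0, fractions=(0.6, 0.2)):
    """Label, merge, shuffle, and split customers into three data sets.

    `fractions` gives the training and testing shares (assumed values);
    whatever remains becomes the evaluation set.
    """
    # Label churned customers 1 and loyal customers 0, then merge.
    labelled = [(c, 1) for c in churned] + [(c, 0) for c in loyal]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(labelled)

    n = len(labelled)
    n_train = int(n * fractions[0])
    n_test = int(n * fractions[1])
    train = labelled[:n_train]
    test = labelled[n_train:n_train + n_test]
    evaluation = labelled[n_train + n_test:]
    return train, test, evaluation


# Placeholder customer records, for illustration only.
churned = [f"churned_{i}" for i in range(50)]
loyal = [f"loyal_{i}" for i in range(50)]

train, test, evaluation = build_datasets(churned, loyal)
print(len(train), len(test), len(evaluation))
```

Keeping the evaluation set out of both model fitting and model selection gives a less optimistic estimate of how well the churn model will do on new customers.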
    21. Understanding Customers (figure: revenue over time, with regions marked Loss, Less Loss, and Profit). Labels: More Efficient Acquisition, Longer Lasting Relationship, More Frequent Up/Cross Sell, More Profit. Taken from SPSS talk.
    22. Understanding Customers (figure: revenue over time, with regions marked Loss, Less Loss, and Profit). Labels: More Efficient Acquisition, Longer Lasting Relationship, More Frequent Up/Cross Sell, Even More Profit. Taken from SPSS talk.
    23. Basket Analysis
    24. Basket Analysis
        Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

        Rule         Support   Confidence
        A → D        2/5       2/3
        C → A        2/5       2/4
        A → C        2/5       2/3
        B & C → D    1/5       1/3
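The support and confidence figures on this slide can be recomputed from the five transactions. The sketch below assumes the usual definitions: support is the fraction of transactions containing every item in the rule, and confidence is that count divided by the number of transactions containing the left-hand side. The function names are chosen for illustration.

```python
from fractions import Fraction

# The five transactions listed on the slide.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def count_containing(items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)


def support(lhs, rhs):
    """Fraction of all transactions containing both sides of the rule."""
    return Fraction(count_containing(set(lhs) | set(rhs)), len(transactions))


def confidence(lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    return Fraction(count_containing(set(lhs) | set(rhs)), count_containing(lhs))


rules = [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]
for lhs, rhs in rules:
    print(f"{' & '.join(lhs)} -> {rhs}: "
          f"support={support(lhs, rhs)}, confidence={confidence(lhs, rhs)}")
```

`Fraction` reduces its result, so the confidence of C → A is reported as 1/2 rather than the unreduced 2/4 shown on the slide.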
    25. The Impact of Fraud
        • GAO (the United States General Accounting Office) cited $19.1 billion in improper government payments in 17 major programs for fiscal year 1998.
            • Medicare: $12.6 billion
            • Supplemental Security Income: $1.6 billion
            • The Food Stamp Program: $1.4 billion
            • Old-Age and Survivors Insurance: $1.2 billion
            • Disability Insurance: $941 million
            • Housing Subsidies: $847 million
            • Veterans’ Benefits, Unemployment Insurance and others: $514 million
    26. Background
        • HIC (the Health Insurance Commission) in Australia is a federal government agency.
        • HIC pays insurance claims of more than 20 million Australian dollars and pays out about A$8 billion in funds every year.
        • More than 300 million transactions are processed and stored every year: 1.3 TB in five years.
    27. Preventing Fraud and Abuse
        • Business Objectives
            • The focus of the HIC project was on the recent and steady 10% annual rise in the cost of pathology claims for clinical tests.
        • Approaches
            • To identify potentially fraudulent claims or claims arising from inappropriate practice, and
            • To develop general profiles of the GP practices in order to compare practice behaviors of individual GPs.
    28. Data Preprocessing
        • Two databases
            • Episode database
                • One episode record represents one patient visit.
                • In total, 6.8 million records.
                • There were 227 different pathology tests.
            • GP (doctor) database
                • There are 17,000 records related to active GPs.
        • The behavior of 10,409 GPs was to be studied.
            • A matrix of 10,409 by 227 elements.
            • The elements were then scaled from 0 to 1 with respect to the total number of tests of each kind.
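One possible reading of the scaling step, sketched in Python: each row is a GP, each column is a pathology test, and every count is divided by the total number of tests of that kind across all GPs, so each entry lies between 0 and 1. This interpretation, the function name `scale_by_test_total`, and the tiny 3-by-3 matrix are assumptions; the slides do not spell out the exact formula, and the real matrix is 10,409 by 227.

```python
def scale_by_test_total(counts):
    """Divide each count by its column total (tests of that kind overall).

    `counts[i][j]` is the number of times GP i ordered test j.
    Columns with a zero total are left as 0.0 to avoid dividing by zero.
    """
    n_tests = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_tests)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_tests)]
        for row in counts
    ]


# Made-up counts for 3 GPs and 3 test types, for illustration only.
counts = [
    [10, 0, 5],
    [30, 2, 5],
    [60, 2, 0],
]

scaled = scale_by_test_total(counts)
for row in scaled:
    print(row)
```

Scaling per test type keeps frequently ordered tests from dominating the rarely ordered ones when GPs are later compared or segmented.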
    29. Input to Segmentation
    30. Overview
    31. Data Mining
        • The team conducted association rule mining with support = 0.25% and decided that the presence of some tests in the input database, the Pathology Episode Initiation (PEI) tests, was causing spurious rules to be revealed.
        • PEI tests depend on who ordered them and where they were ordered.
        • When the PEI tests were removed, the number of rules dropped significantly.
    32. Result Analysis
        • A request for a microscopic examination of feces for parasites (OCP) was associated with a culture examination of feces (FCS) in 0.85% of cases.
            • There was a 92.6% chance that if OCP tests were requested, they would be done with FCS.
            • There was a 0.61% chance that OCP was associated with a different, more expensive test called MCS32, which costs A$13.55 per test.
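If the 0.85% figure is read as the support of the rule OCP ⇒ FCS and the 92.6% figure as its confidence (an assumption; the slide does not name the measures), the two together imply how often OCP was requested at all:

```latex
\[
\mathrm{supp}(\mathrm{OCP} \Rightarrow \mathrm{FCS}) = P(\mathrm{OCP} \wedge \mathrm{FCS}) \approx 0.0085,
\qquad
\mathrm{conf}(\mathrm{OCP} \Rightarrow \mathrm{FCS}) = \frac{P(\mathrm{OCP} \wedge \mathrm{FCS})}{P(\mathrm{OCP})} \approx 0.926
\]
\[
P(\mathrm{OCP}) \approx \frac{0.0085}{0.926} \approx 0.0092
\]
```

Under that reading, OCP appears in roughly 0.92% of episodes, almost always together with FCS.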
    33. GP Profiles
    34. Discussions
        • Segment 13:
            • Represents the majority of traditional GPs who practice conventionally: 5,450 GPs, 52% of all GPs.
            • Accounts for only 6.2% of the medical pathology tests.
        • Segment 4:
            • 54 GPs, only 0.51% of all GPs.
            • Accounts for 2.7% of the medical pathology tests.
    35. Ming Pao (明报), 2004.4.21
    36. 36. Data Mining and Its Applications