DBM630: Data Mining and
                       Data Warehousing

                              MS.IT. Rangsit University
                                                 Semester 2/2011



                                               Lecture 10
                 Data Mining: Case Studies and
                                  Applications

    by Kritsada Sriphaew (sriphaew.k AT gmail.com)

Topics
 Data Mining Goals
 Data Floods
 Machine Learning / Data Mining Application
  areas
 Case Studies




2                      Data Warehousing and Data Mining by Kritsada Sriphaew
THE FOUR GOALS OF DATA MINING
•       Prediction: Using current data to make predictions about
        future activities.

•       Identification: "Data patterns can be used
        to identify the existence of an item, an
        event, or an activity."

•       Classification: Breaking the data down into categories
        based on certain attributes.

•       Optimization: Using the mined data to optimize
        the use of resources such as time and
        money.

Trends leading to Data Flood
   More data is generated:
       Bank, telecom, other
        business transactions ...
       Scientific data: astronomy,
        biology, etc
       Web, text, and e-commerce




Big Data Examples
   Europe's Very Long Baseline Interferometry (VLBI)
    has 16 telescopes, each of which produces 1
    Gigabit/second of astronomical data over a 25-day
    observation session
        storage and analysis are a big problem
   AT&T handles billions of calls per day
        so much data that it cannot all be stored; analysis has to be
        done “on the fly”, on streaming data
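
To get a feel for the scale of the VLBI example, the figures on the slide can simply be multiplied out. This is a rough back-of-the-envelope sketch. It assumes that every telescope records continuously for the whole session and that 1 Gigabit means 10^9 bits; neither assumption is stated on the slide.

```python
# Back-of-the-envelope data volume for the VLBI example.
# Assumptions (not from the slide): continuous recording, decimal units.
TELESCOPES = 16
BITS_PER_SECOND = 1e9            # 1 Gigabit/second per telescope
SESSION_DAYS = 25
SECONDS_PER_DAY = 24 * 60 * 60

total_bits = TELESCOPES * BITS_PER_SECOND * SESSION_DAYS * SECONDS_PER_DAY
total_bytes = total_bits / 8
total_petabytes = total_bytes / 1e15

print(f"{total_petabytes:.2f} PB per observation session")
```

Under these assumptions a single session comes to a few petabytes, far more than the largest databases of the period.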



Largest databases in 2003
   Commercial databases:
       Winter Corp. 2003 Survey: France Telecom has largest
        decision-support DB, ~30TB; AT&T ~ 26 TB
   Web
       Alexa internet archive: 7 years of data, 500 TB
        Google searches 4+ billion pages, several hundred TB
       IBM WebFountain, 160 TB (2003)
       Internet Archive (www.archive.org),~ 300 TB



5 million terabytes created in 2002
   UC Berkeley 2003 estimate: 5 exabytes (5 million
    terabytes) of new data was created in 2002.
    www.sims.berkeley.edu/research/projects/how-much-info-2003/
   US produces ~40% of new stored data worldwide




    (1 exabyte = 10^18 bytes)
Data Growth Rate
 Twice as much information was created in 2002 as
  in 1999 (~30% annual growth rate)
 Other growth rate estimates even higher
 Very little data will ever be looked at by a human
 Knowledge Discovery is NEEDED to make sense and
  use of data.
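
The growth figure can be checked with a compound-growth calculation. The sketch below assumes steady year-over-year growth between 1999 and 2002, which the slide does not state; the function name is illustrative.

```python
def annual_growth_rate(ratio: float, years: float) -> float:
    """Compound annual growth rate implied by an overall growth ratio."""
    return ratio ** (1.0 / years) - 1.0

# Doubling (ratio 2.0) over the 3 years from 1999 to 2002.
rate = annual_growth_rate(2.0, 3)
print(f"{rate:.1%} per year")
```

Doubling in three years works out to roughly 26% per year, in line with the rounded ~30% quoted above.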




Machine Learning / Data Mining
Application areas
    Science
        astronomy, bioinformatics, drug discovery, …
    Business
         advertising, CRM (Customer Relationship Management),
         investments, manufacturing, sports/entertainment, telecom,
         e-Commerce, targeted marketing, health care, …
    Web:
        search engines, advertising, web and text mining, …
    Government
         surveillance & anti-terror (?), crime detection, profiling tax
         cheaters, …
DM applications in 2004 (in %)
    13%: Banking
      9%: Direct Marketing, Fraud Detection, Scientific data analysis
      8%: Bioinformatics
      7%: Insurance, Medical/Pharmaceutical Applications
      6%: eCommerce/Web, Telecommunications
      4%: Investments/Stocks, Manufacturing, Retail, Security
      Below: Travel, Entertainment/News, …




Data Mining for Customer Modeling
   Customer Tasks:
        attrition prediction (customer churn)
       targeted marketing:
         cross-sell, customer acquisition
       credit-risk
       fraud detection
   Industries
        banking, telecom, retail sales, …



Customer Attrition: Case Study

  Situation: Attrition rate for mobile phone customers is
   around 25-30% a year!
 Task:
  Given customer information for the past N months,
   predict who is likely to attrite next month.
  Also estimate the customer's value and the most
   cost-effective offer to make to this customer.
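
One way to combine the three quantities in the task (attrition likelihood, customer value, offer cost) is a simple expected-value rule. The sketch below is a hypothetical decision model, not the method used in the case study. The function names, the probabilities and the assumption that the offer cost is paid up front are all illustrative.

```python
def expected_gain(p_attrite, p_accept, customer_value, offer_cost):
    """Expected net gain of making a retention offer to one customer.

    Hypothetical model: the offer costs `offer_cost` up front, and if a
    would-be attriter accepts it the company keeps `customer_value`.
    """
    return p_attrite * p_accept * customer_value - offer_cost


def should_offer(p_attrite, p_accept, customer_value, offer_cost):
    """Make the offer only when its expected net gain is positive."""
    return expected_gain(p_attrite, p_accept, customer_value, offer_cost) > 0


# Illustrative customers: (name, P(attrite), P(accept), value, offer cost)
customers = [
    ("likely attriter", 0.60, 0.50, 400.0, 50.0),
    ("loyal customer", 0.05, 0.50, 400.0, 50.0),
]
for name, p_att, p_acc, value, cost in customers:
    print(name, should_offer(p_att, p_acc, value, cost))
```

In this sketch the same offer is worthwhile for the likely attriter but not for the loyal customer, which is why attrition prediction and value estimation are done together.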




Customer Attrition Results
 Verizon Wireless built a customer data warehouse
 Identified potential attriters
 Developed multiple, regional models
 Targeted customers with high propensity to accept
  the offer
 Reduced attrition rate from over 2%/month to under
  1.5%/month (huge impact, with >30 M subscribers)
(Reported in 2003)
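
The size of the impact can be estimated from the numbers on the slide. The sketch below treats 30 million subscribers, 2%/month and 1.5%/month as exact values, although the slide gives them only as bounds ("over", "under", ">"), so the result is a rough lower bound.

```python
SUBSCRIBERS = 30_000_000      # slide says ">30 M", so this is a lower bound
RATE_BEFORE = 0.02            # "over 2%/month"
RATE_AFTER = 0.015            # "under 1.5%/month"

# Customers who no longer leave each month after the improvement.
retained_per_month = SUBSCRIBERS * (RATE_BEFORE - RATE_AFTER)


def annualized(monthly_rate):
    """Share of customers lost over 12 months at a constant monthly rate."""
    return 1 - (1 - monthly_rate) ** 12


print(f"about {retained_per_month:,.0f} customers retained per month")
print(f"annual attrition: {annualized(RATE_BEFORE):.1%} -> {annualized(RATE_AFTER):.1%}")
```

Half a percentage point per month corresponds to roughly 150,000 retained customers per month at this subscriber base.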




Assessing Credit Risk: Case Study
 Situation: Person applies for a loan
 Task: Should a bank approve the loan?
 Note: People who have the best credit don’t need
  the loans, and people with the worst credit are not likely
  to repay. The bank’s best customers are in the middle.




Credit Risk - Results
 Banks develop credit models using a variety of
  machine learning methods.
 The proliferation of mortgages and credit cards is a result
  of being able to successfully predict whether a person is
  likely to default on a loan
 Widely deployed in many countries




Successful e-commerce – Case Study
 A person buys a book (product) at Amazon.com.
 Task: Recommend other books (products) this person
  is likely to buy
 Amazon does clustering based on books bought:
       customers who bought “Advances in Knowledge Discovery
        and Data Mining”, also bought “Data Mining: Practical
        Machine Learning Tools and Techniques with Java
        Implementations”
   The recommendation program is quite successful
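
The "customers who bought X also bought Y" idea can be sketched with plain co-purchase counting. This is not Amazon's actual method, which the slide describes only as clustering on books bought; it is a simpler stand-in technique. The purchase histories and function names below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """For each item, count how often each other item was bought by the same customer."""
    co_counts = defaultdict(Counter)
    for items in baskets:
        for a, b in permutations(set(items), 2):
            co_counts[a][b] += 1
    return co_counts


def also_bought(co_counts, item, top_n=3):
    """Items most frequently bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(top_n)]


KDD = "Advances in Knowledge Discovery and Data Mining"
WEKA = "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations"

# Hypothetical purchase histories, one list per customer.
baskets = [
    [KDD, WEKA],
    [KDD, WEKA, "Some Cookbook"],
    ["Some Cookbook", "Some Novel"],
]
co_counts = build_co_purchase_counts(baskets)
print(also_bought(co_counts, KDD, top_n=1))
```

With these made-up histories, the top recommendation for the first book is the second, mirroring the example above.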

Unsuccessful e-commerce case study (KDD-Cup
2000)
    Data: clickstream and purchase data from Gazelle.com, legwear and
     legcare e-tailer
    Q: Characterize visitors who spend more than $12 on an average order
     at the site
    Dataset of 3,465 purchases, 1,831 customers
    Very interesting analysis by Cup participants
        thousands of hours - $X,000,000 (Millions) of consulting
    Total sales -- $Y,000
    Obituary: Gazelle.com out of business, Aug 2000




Genomic Microarrays – Case Study
Given microarray data for a number of samples
  (patients), can we
 Accurately diagnose the disease?
 Predict outcome for given treatment?
 Recommend best treatment?




Example: ALL/AML data
 38 training cases, 34 test, ~ 7,000 genes
 2 Classes: Acute Lymphoblastic Leukemia (ALL) vs
  Acute Myeloid Leukemia (AML)
 Use training data to build a diagnostic model

[Figure: ALL vs. AML samples]




         Results on test data:
           33/34 correct; the 1 error may be a mislabeled sample
Security and Fraud Detection - Case Study

    Credit Card Fraud Detection
    Detection of money laundering
        FAIS (US Treasury)
    Securities Fraud
        NASDAQ KDD system
    Phone fraud
        AT&T, Bell Atlantic, British Telecom/MCI
    Bio-terrorism detection at Salt Lake
     Olympics 2002

Data Mining and Privacy
   In 2006, the NSA (National Security Agency) was reported to
    be mining years of call records to identify terrorism
    networks
   Social network analysis has the potential to find such networks
   Invasion of privacy – do you mind if your call information
    is in a government database?
   What if the NSA program finds one real suspect for every 1,000
    false leads? 1,000,000 false leads?
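
The closing question is about precision: what fraction of flagged leads are real. A minimal sketch of the arithmetic, using only the two ratios posed in the question (the function name is illustrative):

```python
def precision(true_leads, false_leads):
    """Fraction of all flagged leads that are real."""
    return true_leads / (true_leads + false_leads)


for false_leads in (1_000, 1_000_000):
    p = precision(1, false_leads)
    print(f"{false_leads:>9,} false leads per real suspect -> precision {p:.6%}")
```

At 1,000 false leads per real suspect, only about 0.1% of the people investigated are actual suspects; at 1,000,000 the share is about a thousand times smaller.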




Case Study: Intrusion Detection Systems
    Detection Approach
        Misuse Detection
         ▪ Based on known malicious patterns
           (signatures)
        Anomaly Detection
         ▪ Based on deviations from established
           normal patterns (profiles)

    Data Source
        Network-based (NIDS)
         ▪ Network traffic
        Host-based (HIDS)
         ▪ Audit trails
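
The two detection approaches can be contrasted in a few lines. The sketch below is illustrative only. The signature strings, the single numeric feature and the 3-standard-deviation threshold are assumptions, and real intrusion detection systems use much richer rules and profiles.

```python
from statistics import mean, stdev

# Misuse detection: hypothetical signatures of known malicious patterns.
SIGNATURES = ["/etc/passwd", "cmd.exe", "' OR 1=1"]


def misuse_alert(payload: str) -> bool:
    """Alert when the payload contains any known signature."""
    return any(sig in payload for sig in SIGNATURES)


# Anomaly detection: a profile learned from normal behavior only.
def build_profile(normal_values):
    """Profile = mean and standard deviation of a feature under normal use."""
    return mean(normal_values), stdev(normal_values)


def anomaly_alert(value, profile, threshold=3.0):
    """Alert when a value deviates from the profile by more than `threshold` sigmas."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical feature: requests per minute observed on a host.
profile = build_profile([100, 110, 90, 105, 95])
print(misuse_alert("GET /../../etc/passwd"), anomaly_alert(500, profile))
```

Misuse detection only catches what already has a signature, while anomaly detection can flag new attacks but depends on how well the normal profile was learned.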


Case Study: Intrusion Detection Systems
Data Mining Usage
 Signature extraction
 Rule matching
 Alarm data analysis
       Reduce false alarms
       Eliminate redundant alarms

 Feature selection
 Training data cleaning
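
As one example of alarm data analysis, redundant alarms can be collapsed when the same source raises the same alarm repeatedly within a short time. This is a minimal sketch under assumed conventions. The alarm tuple layout, the 60-second window and the sample data are hypothetical.

```python
def drop_redundant_alarms(alarms, window_seconds=60):
    """Keep an alarm only if no alarm with the same (source, signature)
    was kept within the preceding `window_seconds`.

    Each alarm is a (timestamp_seconds, source, signature) tuple.
    """
    last_kept = {}
    kept = []
    for timestamp, source, signature in sorted(alarms):
        key = (source, signature)
        if key not in last_kept or timestamp - last_kept[key] > window_seconds:
            kept.append((timestamp, source, signature))
            last_kept[key] = timestamp
    return kept


# Hypothetical alarm log.
alarms = [
    (0, "10.0.0.5", "SYN Flood"),
    (10, "10.0.0.5", "SYN Flood"),      # repeat within the window: dropped
    (20, "10.0.0.7", "Ping of Death"),
    (120, "10.0.0.5", "SYN Flood"),     # outside the window: kept
]
for alarm in drop_redundant_alarms(alarms):
    print(alarm)
```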


Case Study: Intrusion Detection Systems
    Behavioral Features for Network Anomaly Detection
        Training set = normal network traffic
        Features provide the semantics of the data values
        Feature selection is important
        Proposed method:
         ▪ Feature extraction based on protocol behavior
         ▪ Many attacks use protocols improperly
           ▪   Ping of Death
           ▪   SYN Flood
           ▪   Teardrop
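
A protocol-behavior feature can be as simple as the share of TCP packets in a flow that carry a given flag. The sketch below is an assumption about what such features might look like. The names follow the tcpPerSYN / tcpPerFIN / tcpPerPSH labels used in the decision tree in this lecture, but their exact definition is not given in the slides, and the sample flows are made up.

```python
def flag_fractions(packets, flags=("SYN", "FIN", "PSH")):
    """Fraction of TCP packets in a flow that carry each flag.

    `packets` is a list of sets, each holding the flags of one packet.
    """
    total = len(packets)
    if total == 0:
        return {f"tcpPer{flag}": 0.0 for flag in flags}
    return {
        f"tcpPer{flag}": sum(flag in pkt for pkt in packets) / total
        for flag in flags
    }


# Hypothetical flows.
normal_flow = [
    {"SYN"}, {"SYN", "ACK"}, {"ACK"}, {"PSH", "ACK"},
    {"PSH", "ACK"}, {"ACK"}, {"FIN", "ACK"}, {"FIN", "ACK"},
]
syn_flood = [{"SYN"}] * 50   # connection requests that are never completed

print(flag_fractions(normal_flow))
print(flag_fractions(syn_flood))
```

A SYN flood stands out because almost every packet carries SYN and none carries FIN or PSH, which differs sharply from a profile learned on normal traffic.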

Case Study: Intrusion Detection Systems

  Decision Tree (only a small part)

tcpPerFIN
    >0.01:  tcpPerPSH
        <=0.4:  WWW
        >0.4:   tcpPerPSH
            <=0.79:  SMTP
            >0.79:   FTP
    <=0.01: meanIAT
        >546773:  tcpPerSYN
            <=0.03:  telnet
            >0.03:   meanipTLen
                >73:   tcpPerPSH
                    >0.79:  SMTP
                    …
                <=73:  …
        >546773:  …
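
A learned decision tree is applied as a chain of threshold tests. The sketch below encodes only the tcpPerFIN > 0.01 branch of the partial tree, since the other branch is elided on the slide and repeats a threshold label. The function name and the dictionary input format are assumptions.

```python
def classify_flow(features):
    """Follow the tcpPerFIN > 0.01 branch of the partial tree.

    Returns None for branches the slide does not show in full.
    """
    if features["tcpPerFIN"] > 0.01:
        if features["tcpPerPSH"] <= 0.4:
            return "WWW"
        if features["tcpPerPSH"] <= 0.79:
            return "SMTP"
        return "FTP"
    return None  # meanIAT subtree is only partly shown


# Hypothetical feature vectors.
print(classify_flow({"tcpPerFIN": 0.05, "tcpPerPSH": 0.2}))
print(classify_flow({"tcpPerFIN": 0.05, "tcpPerPSH": 0.9}))
```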


CASE STUDY: METLIFE
Company Profile

MetLife, Inc. is a leading provider of insurance and other financial
services to millions of individual and institutional customers
throughout the United States.

Established in 1863, MetLife now has offices all over
the US and the world, and offers ten different types
of insurance and financial services.




CASE STUDY: METLIFE
Industry: Insurance and Financial Services

How they use Data Mining: Fraud Detection




CASE STUDY: METLIFE
•    The project started in 2001

•    MetLife set out to build a $50 million relational database

•    This project would consolidate data from 30 businesses
     worldwide.




CASE STUDY: METLIFE
•    Around the same time, it was reported that $30 million of
     insurance money went to fraudulent claims.

•    MetLife teamed up with Computer Sciences Corporation
     (CSC) to

         o   License their data mining tool (called Fraud
             Investigator),

         o   Develop @First, "an early fraud
             detection system"



CASE STUDY: METLIFE
•    By 2003, MetLife's data mining operation was in full swing.

•    They were able to detect fraud in a fraction of the time
     that manual review would take

•    One example is detecting rate evasion




CASE STUDY: METLIFE
•    Rate evasion is lying about where you live to pay lower
     premiums.

•    MetLife used data mining to detect rate evasion by
     matching ZIP codes with phone numbers to see if the
     cities matched.

•    In 2.5 hours, MetLife found 107 fraudulent claims,
     all linked to a rate-evasion ring in NY and
     Massachusetts.
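
The ZIP-versus-phone check described above can be sketched as two lookups and a comparison. This is an illustration of the idea, not MetLife's system. The lookup tables, function name and sample records are hypothetical, and a real check would need complete postal and telephone reference data.

```python
# Hypothetical reference tables (tiny on purpose).
ZIP_TO_CITY = {"10001": "New York", "02108": "Boston"}
AREA_CODE_TO_CITY = {"212": "New York", "617": "Boston"}


def possible_rate_evasion(zip_code, phone):
    """Flag a policy when the ZIP code city and the phone area-code city differ."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    area_code = digits[-10:-7]          # first 3 of the last 10 digits
    zip_city = ZIP_TO_CITY.get(zip_code)
    phone_city = AREA_CODE_TO_CITY.get(area_code)
    if zip_city is None or phone_city is None:
        return False                    # unknown reference data: do not flag
    return zip_city != phone_city


# Hypothetical policy records: (policy id, ZIP code, phone number)
policies = [
    ("P-1", "10001", "212-555-0100"),
    ("P-2", "02108", "+1 (212) 555-0101"),
]
flagged = [pid for pid, z, ph in policies if possible_rate_evasion(z, ph)]
print(flagged)
```

A mismatch is only a lead for investigators, since people legitimately keep phone numbers after moving.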





Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 

Recently uploaded (20)

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Dbm630 lecture10

  • 1. DBM630: Data Mining and Data Warehousing
      MS.IT. Rangsit University, Semester 2/2011
      Lecture 10: Data Mining: Case Studies and Applications
      by Kritsada Sriphaew (sriphaew.k AT gmail.com)
  • 2. Topics
      • Data Mining Goals
      • Data Floods
      • Machine Learning / Data Mining Application areas
      • Case Studies
      Data Warehousing and Data Mining by Kritsada Sriphaew
  • 3. THE FOUR GOALS OF DATA MINING
      • Prediction: Using current data to make predictions about future activities.
      • Identification: "Data patterns can be used to identify the existence of an item, an event, or an activity."
      • Classification: Breaking the data down into categories based on certain attributes.
      • Optimization: Using the mined data to optimize the use of resources such as time and money.
  • 4. Trends leading to Data Flood
      • More data is generated:
          • Bank, telecom, other business transactions ...
          • Scientific data: astronomy, biology, etc.
          • Web, text, and e-commerce
  • 5. Big Data Examples
      • Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
          • storage and analysis are a big problem
      • AT&T handles billions of calls per day
          • so much data that it cannot all be stored -- analysis has to be done "on the fly", on streaming data
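To get a feel for why storage is a problem in the VLBI example, the figures on the slide can be multiplied out. This is only a back-of-the-envelope sketch: it assumes every telescope records continuously for the whole session and that "Gigabit" means 10^9 bits, neither of which the slide states.

```python
# Rough data volume for one VLBI observation session (assumptions in comments).
TELESCOPES = 16
BITS_PER_SECOND = 1e9            # 1 Gigabit/second per telescope (decimal giga assumed)
SESSION_DAYS = 25
SECONDS_PER_DAY = 24 * 60 * 60


def session_volume_bytes(telescopes=TELESCOPES,
                         bits_per_second=BITS_PER_SECOND,
                         days=SESSION_DAYS):
    """Total bytes produced, assuming continuous recording (an assumption)."""
    total_bits = telescopes * bits_per_second * days * SECONDS_PER_DAY
    return total_bits / 8


volume = session_volume_bytes()
print(f"{volume / 1e15:.2f} PB per session")
```

Under those assumptions a single session lands in the petabyte range, which is far larger than the biggest commercial databases of that period.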
  • 6. Largest databases in 2003
      • Commercial databases:
          • Winter Corp. 2003 Survey: France Telecom has the largest decision-support DB, ~30 TB; AT&T ~26 TB
      • Web
          • Alexa internet archive: 7 years of data, 500 TB
          • Google searches 4+ billion pages, several hundred TB
          • IBM WebFountain, 160 TB (2003)
          • Internet Archive (www.archive.org), ~300 TB
  • 7. 5 million terabytes created in 2002
      • UC Berkeley 2003 estimate: 5 exabytes (5 million terabytes) of new data was created in 2002 (1 exabyte = 10^18 bytes). www.sims.berkeley.edu/research/projects/how-much-info-2003/
      • US produces ~40% of new stored data worldwide
  • 8. Data Growth Rate
      • Twice as much information was created in 2002 as in 1999 (~30% annual growth rate)
      • Other growth rate estimates are even higher
      • Very little data will ever be looked at by a human
      • Knowledge Discovery is NEEDED to make sense and use of data.
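The link between "twice as much in three years" and an annual growth rate is a compound-growth calculation. The sketch below assumes steady year-over-year growth (the slide does not say how the rate was derived); under that assumption, doubling over three years works out to roughly 26% per year, which the slide rounds to ~30%.

```python
import math


def annual_growth_rate(ratio, years):
    """Constant yearly growth rate that multiplies volume by `ratio` in `years`."""
    return ratio ** (1.0 / years) - 1.0


def doubling_time(rate):
    """Years needed to double at a constant yearly growth `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


rate = annual_growth_rate(2.0, 3)        # doubling between 1999 and 2002
print(f"implied annual growth: {rate:.1%}")
print(f"doubling time at 30%/year: {doubling_time(0.30):.2f} years")
```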
  • 9. Machine Learning / Data Mining Application areas
      • Science
          • astronomy, bioinformatics, drug discovery, …
      • Business
          • advertising, CRM (Customer Relationship Management), investments, manufacturing, sports/entertainment, telecom, e-Commerce, targeted marketing, health care, …
      • Web
          • search engines, advertising, web and text mining, …
      • Government
          • surveillance & anti-terror (?), crime detection, profiling tax cheaters, …
  • 10. DM applications in 2004 (in %)
      • 13%: Banking
      • 9%: Direct Marketing, Fraud Detection, Scientific data analysis
      • 8%: Bioinformatics
      • 7%: Insurance, Medical/Pharmaceutical Applications
      • 6%: eCommerce/Web, Telecommunications
      • 4%: Investments/Stocks, Manufacturing, Retail, Security
      • Below 4%: Travel, Entertainment/News, …
  • 11. Data Mining for Customer Modeling
      • Customer Tasks:
          • attrition prediction (customer churn)
          • targeted marketing: cross-sell, customer acquisition
          • credit-risk
          • fraud detection
      • Industries
          • banking, telecom, retail sales, …
  • 12. Customer Attrition: Case Study
      • Situation: Attrition rate for mobile phone customers is around 25-30% a year!
      • Task:
          • Given customer information for the past N months, predict who is likely to attrite next month.
          • Also, estimate customer value and the most cost-effective offer to make to this customer.
  • 13. Customer Attrition Results
      • Verizon Wireless built a customer data warehouse
      • Identified potential attriters
      • Developed multiple, regional models
      • Targeted customers with high propensity to accept the offer
      • Reduced attrition rate from over 2%/month to under 1.5%/month (huge impact, with >30 M subscribers)
      (Reported in 2003)
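The "huge impact" claim can be made concrete with a little arithmetic. The sketch below plugs in the boundary figures from the slide (exactly 2% and 1.5% per month, exactly 30 million subscribers), so the result is a rough lower bound rather than a reported number; the annualized conversion also assumes a constant monthly rate.

```python
SUBSCRIBERS = 30_000_000       # slide says ">30 M", so this is a lower bound
RATE_BEFORE = 0.02             # "over 2%/month"
RATE_AFTER = 0.015             # "under 1.5%/month"


def monthly_attriters(subscribers, monthly_rate):
    """Expected number of customers lost in one month."""
    return subscribers * monthly_rate


def annual_attrition(monthly_rate):
    """Yearly attrition implied by a constant monthly rate (compounded)."""
    return 1.0 - (1.0 - monthly_rate) ** 12


saved = (monthly_attriters(SUBSCRIBERS, RATE_BEFORE)
         - monthly_attriters(SUBSCRIBERS, RATE_AFTER))
print(f"customers retained per month: {round(saved):,}")
print(f"annual attrition before: {annual_attrition(RATE_BEFORE):.1%}")
print(f"annual attrition after:  {annual_attrition(RATE_AFTER):.1%}")
```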
  • 14. Assessing Credit Risk: Case Study
      • Situation: Person applies for a loan
      • Task: Should a bank approve the loan?
      • Note: People who have the best credit don't need the loans, and people with the worst credit are not likely to repay. The bank's best customers are in the middle.
  • 15. Credit Risk - Results
      • Banks develop credit models using a variety of machine learning methods.
      • The proliferation of mortgages and credit cards is the result of being able to successfully predict whether a person is likely to default on a loan
      • Widely deployed in many countries
  • 16. Successful e-commerce - Case Study
      • A person buys a book (product) at Amazon.com.
      • Task: Recommend other books (products) this person is likely to buy
      • Amazon does clustering based on books bought:
          • customers who bought "Advances in Knowledge Discovery and Data Mining" also bought "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations"
      • Recommendation program is quite successful
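The "customers who bought X also bought Y" idea can be illustrated with simple co-purchase counting. This is a minimal sketch of that one idea, not Amazon's actual method (the slide only says it clusters on books bought); the baskets and shortened titles are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchases(baskets):
    """For each item, count how often every other item shares a basket with it."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["Advances in KDD", "Data Mining: Practical ML Tools"],
    ["Advances in KDD", "Data Mining: Practical ML Tools", "Cookbook"],
    ["Cookbook", "Travel Guide"],
]
co = build_co_purchases(baskets)
top = recommend(co, "Advances in KDD", k=1)
print(top)
```

Raw counts favor bestsellers, so a practical system would normalize by item popularity; that refinement is left out to keep the sketch short.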
  • 17. Unsuccessful e-commerce case study (KDD-Cup 2000)
      • Data: clickstream and purchase data from Gazelle.com, legwear and legcare e-tailer
      • Q: Characterize visitors who spend more than $12 on an average order at the site
      • Dataset of 3,465 purchases, 1,831 customers
      • Very interesting analysis by Cup participants
          • thousands of hours - $X,000,000 (Millions) of consulting
      • Total sales -- $Y,000
      • Obituary: Gazelle.com out of business, Aug 2000
  • 18. Genomic Microarrays - Case Study
      Given microarray data for a number of samples (patients), can we
      • Accurately diagnose the disease?
      • Predict outcome for given treatment?
      • Recommend best treatment?
  • 19. Example: ALL/AML data
      • 38 training cases, 34 test cases, ~7,000 genes
      • 2 classes: Acute Lymphoblastic Leukemia (ALL) vs Acute Myeloid Leukemia (AML)
      • Use the training data to build a diagnostic model
      • Results on test data: 33/34 correct; the 1 error may be a mislabeled sample
  • 20. Security and Fraud Detection - Case Study
      • Credit Card Fraud Detection
      • Detection of Money laundering
          • FAIS (US Treasury)
      • Securities Fraud
          • NASDAQ KDD system
      • Phone fraud
          • AT&T, Bell Atlantic, British Telecom/MCI
      • Bio-terrorism detection at Salt Lake Olympics 2002
  • 21. Data Mining and Privacy
      • In 2006, the NSA (National Security Agency) was reported to be mining years of call information to identify terrorism networks
      • Social network analysis has the potential to find networks
      • Invasion of privacy: do you mind if your call information is in a government database?
      • What if the NSA program finds one real suspect for every 1,000 false leads? 1,000,000 false leads?
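The closing question is about precision: what fraction of flagged leads are real. A short sketch makes the two scenarios from the slide comparable; the function name is chosen here for illustration only.

```python
def lead_precision(real_suspects, false_leads):
    """Fraction of all flagged leads that are real suspects."""
    return real_suspects / (real_suspects + false_leads)


for false_leads in (1_000, 1_000_000):
    precision = lead_precision(1, false_leads)
    print(f"{false_leads:>9,} false leads -> precision {precision:.6%}, "
          f"{false_leads + 1:,} investigations per real suspect")
```

Even a very accurate screening method produces mostly false leads when real suspects are extremely rare, which is why the ratio matters more than the raw number of hits.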
  • 22. Case Study: Intrusion Detection Systems
      • Detection Approach
          • Misuse Detection
              ▪ Based on known malicious patterns (signatures)
          • Anomaly Detection
              ▪ Based on deviations from established normal patterns (profiles)
      • Data Source
          • Network-based (NIDS)
              ▪ Network traffic
          • Host-based (HIDS)
              ▪ Audit trails
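The two detection approaches can be contrasted in a few lines. This is a toy sketch, not a real IDS: the signature strings, the requests-per-minute profile, and the 3-standard-deviation threshold are all assumptions chosen for illustration.

```python
import statistics

# Hypothetical signatures of known-bad payloads (misuse detection).
SIGNATURES = ("/etc/passwd", "' OR 1=1")


def misuse_alert(payload, signatures=SIGNATURES):
    """Alert when the payload contains any known malicious pattern."""
    return any(sig in payload for sig in signatures)


def build_profile(normal_values):
    """Summarize normal behavior as (mean, standard deviation)."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def anomaly_alert(value, profile, threshold=3.0):
    """Alert when a value deviates from the profile by more than `threshold` sigmas."""
    mean, stdev = profile
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical requests-per-minute observed during normal operation.
profile = build_profile([98, 102, 100, 101, 99])

print(misuse_alert("GET /../../etc/passwd"))   # known pattern
print(misuse_alert("GET /index.html"))         # no known pattern
print(anomaly_alert(500, profile))             # far from the normal profile
print(anomaly_alert(101, profile))             # within normal variation
```

Misuse detection only catches what is already in the signature list, while anomaly detection can flag novel behavior at the cost of more false alarms.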
  • 23. Case Study: Intrusion Detection Systems
      Data Mining Usage
      • Signature extraction
      • Rule matching
      • Alarm data analysis
          • Reduce false alarms
          • Eliminate redundant alarms
      • Feature selection
      • Training data cleaning
  • 24. Case Study: Intrusion Detection Systems
      • Behavioral Features for Network Anomaly Detection
          • Training set = normal network traffic
          • A feature provides the semantics of the data values
          • Feature selection is important
      • Proposed method:
          ▪ Feature extraction based on protocol behavior
          ▪ Many attacks use protocols improperly
              ▪ Ping of Death
              ▪ SYN Flood
              ▪ Teardrop
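A rough sketch of what protocol-behavior features might look like: the fraction of TCP packets in a time window that carry a given flag. The names tcpPerSYN, tcpPerFIN, and tcpPerPSH are borrowed from the decision-tree slide, but the definitions used here are assumptions, and the packet windows are invented.

```python
def flag_fractions(packets):
    """Per-window features: share of TCP packets carrying each flag.

    `packets` is a list of sets of flag names, one set per packet.
    The feature definitions are assumed, not taken from the slides.
    """
    total = len(packets)
    if total == 0:
        return {"tcpPerSYN": 0.0, "tcpPerFIN": 0.0, "tcpPerPSH": 0.0}
    return {
        "tcpPerSYN": sum("SYN" in p for p in packets) / total,
        "tcpPerFIN": sum("FIN" in p for p in packets) / total,
        "tcpPerPSH": sum("PSH" in p for p in packets) / total,
    }


# Hypothetical window of ordinary traffic: handshake, data, teardown.
normal = [{"SYN"}, {"SYN", "ACK"}, {"ACK"}, {"PSH", "ACK"},
          {"PSH", "ACK"}, {"ACK"}, {"FIN", "ACK"}, {"ACK"}]

# Hypothetical SYN flood: almost nothing but connection requests.
flood = [{"SYN"}] * 95 + [{"ACK"}] * 5

print(flag_fractions(normal))
print(flag_fractions(flood))
```

A SYN flood abuses the TCP handshake, so the SYN share of a window jumps well above what normal traffic shows, which is the kind of deviation an anomaly detector trained on normal traffic could pick up.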
  • 25. Case Study: Intrusion Detection Systems
      Decision Tree (only small part)
      [Figure: partial decision tree. Internal nodes test the features tcpPerPSH, tcpPerFIN, tcpPerSYN, meanIAT, and meanipTLen against thresholds (0.4, 0.79, 0.01, 0.03, 73, 546773); leaves are labeled WWW, SMTP, FTP, and telnet.]
  • 26. CASE STUDY: METLIFE
      Company Profile
      MetLife, Inc. is a leading provider of insurance and other financial services to millions of individual and institutional customers throughout the United States. Established in 1863, MetLife now has offices all over the US and the world, and offers ten different types of insurance and financial services.
  • 27. CASE STUDY: METLIFE
      Industry: Insurance and Financial Services
      How they use Data Mining: Fraud Detection
  • 28. CASE STUDY: METLIFE
      • Project first started in 2001
      • MetLife set out to build a $50 million relational database
      • This project would consolidate data from 30 businesses worldwide.
  • 29. CASE STUDY: METLIFE
      • Around the same time, it was reported that $30 million of insurance money went to fraudulent claims.
      • MetLife teamed up with Computer Sciences Corporation (CSC) to
          o License their data mining tool (called Fraud Investigator),
          o Develop @First, "an early fraud detection system"
  • 30. CASE STUDY: METLIFE
      • By 2003, MetLife's data mining operation was in full swing.
      • They were able to detect fraud in a fraction of the time that manual review would take
      • One example is detecting rate evasion
  • 31. CASE STUDY: METLIFE
      • Rate evasion is lying about where you live to pay lower premiums.
      • MetLife used data mining to detect rate evasion by matching ZIP codes with phone numbers to see if the cities matched.
      • In 2.5 hours, MetLife found 107 fraudulent claims, all linked to a rate-evasion ring in NY and Massachusetts.
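The ZIP-versus-phone check described above can be sketched as a simple lookup comparison. This is not MetLife's or CSC's implementation; the lookup tables, field names, and sample claims are hypothetical, and a real system would need complete ZIP and area-code reference data plus handling for mobile numbers that follow people when they move.

```python
# Tiny hypothetical lookup tables; real reference data would be far larger.
ZIP_TO_CITY = {"10001": "New York", "02108": "Boston"}
AREA_CODE_TO_CITY = {"212": "New York", "617": "Boston"}


def area_code(phone):
    """Extract the first three digits of a US-style phone number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[:3]


def flag_rate_evasion(claims):
    """Return ids of claims whose ZIP city and phone city are both known and differ."""
    flagged = []
    for claim in claims:
        zip_city = ZIP_TO_CITY.get(claim["zip"])
        phone_city = AREA_CODE_TO_CITY.get(area_code(claim["phone"]))
        if zip_city and phone_city and zip_city != phone_city:
            flagged.append(claim["id"])
    return flagged


# Hypothetical claims: id 2 lists a Boston ZIP with a New York phone number.
claims = [
    {"id": 1, "zip": "10001", "phone": "212-555-0100"},
    {"id": 2, "zip": "02108", "phone": "212-555-0101"},
    {"id": 3, "zip": "99999", "phone": "617-555-0102"},  # unknown ZIP: not flagged
]
suspects = flag_rate_evasion(claims)
print(suspects)
```

A mismatch is only a lead for an investigator, not proof of fraud, so unknown values are skipped rather than flagged.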