What is Data Mining?
Agenda
• What Data Mining IS and IS NOT
• Steps in the Data Mining Process
  – CRISP-DM
  – Explanation of Models
  – Examples of Data Mining
    Applications
• Questions
The Evolution of Data Analysis
Evolutionary Step: Data Collection (1960s)
  Business Question:      "What was my total revenue in the last five years?"
  Enabling Technologies:  Computers, tapes, disks
  Product Providers:      IBM, CDC
  Characteristics:        Retrospective, static data delivery

Evolutionary Step: Data Access (1980s)
  Business Question:      "What were unit sales in New England last March?"
  Enabling Technologies:  Relational databases (RDBMS), Structured Query Language (SQL), ODBC
  Product Providers:      Oracle, Sybase, Informix, IBM, Microsoft
  Characteristics:        Retrospective, dynamic data delivery at record level

Evolutionary Step: Data Warehousing & Decision Support (1990s)
  Business Question:      "What were unit sales in New England last March? Drill down to Boston."
  Enabling Technologies:  On-line analytic processing (OLAP), multidimensional databases, data warehouses
  Product Providers:      SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR
  Characteristics:        Retrospective, dynamic data delivery at multiple levels

Evolutionary Step: Data Mining (Emerging Today)
  Business Question:      "What’s likely to happen to Boston unit sales next month? Why?"
  Enabling Technologies:  Advanced algorithms, multiprocessor computers, massive databases
  Product Providers:      SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
  Characteristics:        Prospective, proactive information delivery
Results of Data Mining
       Include:
  • Forecasting what may happen in
    the future
  • Classifying people or things into
    groups by recognizing patterns
  • Clustering people or things into
    groups based on their attributes
  • Associating what events are likely
    to occur together
  • Sequencing what events are likely
    to lead to later events
Data Mining Is Not
•Brute-force crunching of bulk data
•“Blind” application of algorithms
•Going to find relationships where none exist
•Presenting data in different ways
•A database intensive task
•A difficult to understand technology requiring an advanced degree in computer science
Data Mining Is
        •A hot buzzword for a class of
        techniques that find patterns in data
        •A user-centric, interactive process
        which leverages analysis
        technologies and computing power
        •A group of techniques that find
        relationships that have not
        previously been discovered
        •Not reliant on an existing database
        •A relatively easy task that requires
        knowledge of the business problem/
        subject matter expertise
Data Mining versus OLAP
•OLAP - On-line Analytical Processing
   – Provides you with a very good view of what is happening, but cannot predict
     what will happen in the future or why it is happening
Data Mining Versus Statistical Analysis
•Data Mining
    – Originally developed to act as expert systems to solve problems
    – Less interested in the mechanics of the technique
    – If it makes sense then let’s use it
    – Does not require assumptions to be made about data
    – Can find patterns in very large amounts of data
    – Requires understanding of data and business problem
•Data Analysis
    – Tests for statistical correctness of models
       • Are statistical assumptions of models correct?
          – E.g., is the R-Square good?
    – Hypothesis testing
       • Is the relationship significant?
          – Use a t-test to validate significance
    – Tends to rely on sampling
    – Techniques are not optimised for large amounts of data
    – Requires strong statistical skills
Examples of What People are Doing with Data Mining:
•Fraud/Non-Compliance Anomaly detection
   – Isolate the factors that lead to fraud, waste and abuse
   – Target auditing and investigative efforts more effectively
•Credit/Risk Scoring
•Intrusion detection
•Parts failure prediction
•Recruiting/Attracting customers
•Maximizing profitability (cross selling, identifying profitable customers)
•Service Delivery and Customer Retention
   – Build profiles of customers likely to use which services
•Web Mining
How Can We Do Data Mining?
By Utilizing the CRISP-DM Methodology
  – a standard process
  – existing data
  – software
    technologies
  – situational expertise
Why Should There be a Standard Process?
The data mining process must be reliable and repeatable by
people with little data mining background.
•Framework for recording experience
    – Allows projects to be replicated
•Aid to project planning and management
•“Comfort factor” for new adopters
    – Demonstrates maturity of Data Mining
    – Reduces dependency on “stars”
Process Standardization
CRISP-DM:
•   CRoss Industry Standard Process for Data Mining
•   Initiative launched Sept. 1996
•   SPSS/ISL, NCR, Daimler-Benz, OHRA
•   Funding from European Commission
•   Over 200 members of the CRISP-DM SIG worldwide
    – DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries,
      Syllogic, Magnify, ..
    – System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte
      & Touche, …
    – End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, ...
CRISP-DM
•Non-proprietary
•Application/Industry
neutral
•Tool neutral
•Focus on business issues
    – As well as technical
      analysis
•Framework for guidance
•Experience base
    – Templates for
      Analysis
The CRISP-DM Process Model
Why CRISP-DM?
•The data mining process must be reliable and repeatable by
people with little data mining skills

•CRISP-DM provides a uniform framework for
   –guidelines
   –experience documentation

•CRISP-DM is flexible to account for differences
   –Different business/agency problems
   –Different data
Phases and Tasks
(Each task is followed by its outputs.)

Business Understanding
   – Determine Business Objectives: Background; Business Objectives; Business Success Criteria
   – Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
   – Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
   – Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
   – Collect Initial Data: Initial Data Collection Report
   – Describe Data: Data Description Report
   – Explore Data: Data Exploration Report
   – Verify Data Quality: Data Quality Report

Data Preparation
   – Data Set: Data Set Description
   – Select Data: Rationale for Inclusion / Exclusion
   – Clean Data: Data Cleaning Report
   – Construct Data: Derived Attributes; Generated Records
   – Integrate Data: Merged Data
   – Format Data: Reformatted Data

Modeling
   – Select Modeling Technique: Modeling Technique; Modeling Assumptions
   – Generate Test Design: Test Design
   – Build Model: Parameter Settings; Models; Model Description
   – Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
   – Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
   – Review Process: Review of Process
   – Determine Next Steps: List of Possible Actions; Decision

Deployment
   – Plan Deployment: Deployment Plan
   – Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
   – Produce Final Report: Final Report; Final Presentation
   – Review Project: Experience Documentation
Phases in the DM Process:
CRISP-DM
Phases in the DM Process (1 & 2)
•Business Understanding:
   – Statement of Business Objective
   – Statement of Data Mining objective
   – Statement of Success Criteria
•Data Understanding
   – Explore the data and verify the quality
   – Find outliers
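
One common way to make the "find outliers" step concrete is the interquartile-range rule. The sketch below is only an illustration: the function name `iqr_outliers`, the 1.5 multiplier, and the sample sales figures are assumptions, not part of CRISP-DM or of this deck.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical monthly unit sales with one suspicious entry.
sales = [102, 98, 110, 105, 97, 101, 940, 99]
print(iqr_outliers(sales))
```

A flagged value is a prompt for a question to the business, not an automatic deletion: it may be a data-entry error or a genuinely unusual month.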
Phases in the DM
  Process (3)
• Data preparation:
   – Usually takes over 90% of our time
      • Collection
      • Assessment
      • Consolidation and Cleaning
          – table links, aggregation level,
            missing values, etc
      • Data selection
          – active role in ignoring non-
            contributory data?
          – outliers?
          – Use of samples
          – visualization tools
      • Transformations - create new
        variables
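
Two of the preparation steps above, handling missing values and creating new variables, can be sketched in a few lines. The record layout, the field names (`income`, `debt`, `debt_ratio`), and the choice of median imputation are assumptions made for illustration; the deck does not prescribe a particular method.

```python
import statistics

def prepare(records):
    """Fill missing incomes with the median and add a derived ratio."""
    known = [r["income"] for r in records if r["income"] is not None]
    median_income = statistics.median(known)
    prepared = []
    for r in records:
        income = r["income"] if r["income"] is not None else median_income
        prepared.append({
            **r,
            "income": income,
            # Transformation: a new variable built from two existing ones.
            "debt_ratio": r["debt"] / income,
        })
    return prepared

# Hypothetical customer records; None marks a missing value.
customers = [
    {"id": 1, "income": 40000, "debt": 10000},
    {"id": 2, "income": None, "debt": 5000},
    {"id": 3, "income": 60000, "debt": 30000},
]
for row in prepare(customers):
    print(row)
```

The function returns new records and leaves the input untouched, so the raw data stays available for later checks.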
Phases in the DM Process
            (4)
 • Model building
   – Selection of the modeling
     techniques is based upon
     the data mining objective
   – Modeling is an iterative
     process - different for
     supervised and
     unsupervised learning
      • May model for either
        description or prediction
Types of Models
•Prediction Models for Predicting and Classifying
   – Regression algorithms (predict numeric outcome): neural networks, rule
     induction, CART (OLS regression, GLM)
   – Classification algorithms (predict symbolic outcome): CHAID, C5.0
     (discriminant analysis, logistic regression)
•Descriptive Models for Grouping and Finding Associations
   – Clustering/Grouping algorithms: K-means, Kohonen
   – Association algorithms: apriori, GRI
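
To give a feel for a descriptive model, here is a minimal sketch of K-means on one-dimensional data. It assumes the starting centers are supplied by the caller and runs a fixed number of iterations; the function name `kmeans_1d` and the sample ages are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """K-means (Lloyd's algorithm) on 1-D data, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical customer ages and two starting guesses for the centers.
ages = [22, 24, 23, 41, 45, 43]
centers, groups = kmeans_1d(ages, centers=[20, 50])
print(centers)
print(groups)
```

No target variable is involved: the groups come only from the attribute values, which is what separates clustering from classification.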
Neural Network
[Figure: network diagram with an input layer, a hidden layer, and an output]
Neural Networks
• Description
  – Difficult interpretation
  – Tends to ‘overfit’ the data
  – Extensive amount of training time
  – A lot of data preparation
  – Works with all data types
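
The input, hidden, and output layers can be illustrated with a single forward pass. This is a sketch under simplifying assumptions: sigmoid activations, no bias terms, one output unit, and hand-picked weights rather than trained ones. The names `sigmoid` and `forward` are introduced here for illustration.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    """One pass through input -> hidden -> output layers."""
    # Each hidden unit takes a weighted sum of all inputs.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)))
        for weights in hidden_weights
    ]
    # The single output unit takes a weighted sum of the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Two inputs, two hidden units, one output; the weights are arbitrary.
print(forward([1.0, 2.0], [[0.5, -0.25], [0.1, 0.3]], [1.0, -1.0]))
```

Training consists of adjusting these weights repeatedly until the outputs match known cases, which is where the long training time comes from, and the weights themselves say little to a human reader, which is why interpretation is difficult.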
Rule Induction
•Description
   – Produces decision trees:
      • income < $40K
          – job > 5 yrs then good risk
          – job < 5 yrs then bad risk
      • income > $40K
          – high debt then bad risk
          – low debt then good risk
   – Or Rule Sets:
       • Rule #1 for good risk:
           – if income > $40K
           – if low debt
       • Rule #2 for good risk:
           – if income < $40K
           – if job > 5 years

Example decision tree: Credit ranking (1=default)
(Each node lists Cat., % and n; "Total" gives the node's percentage of all cases and its n.)

All cases: Bad 52.01% (168), Good 47.99% (155), Total (100.00) 323
Split on Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
   Weekly pay: Bad 86.67% (143), Good 13.33% (22), Total (51.08) 165
   Split on Age Categorical (P-value=0.0000, Chi-square=30.1113, df=1)
      Young (< 25);Middle (25-35): Bad 90.51% (143), Good 9.49% (15), Total (48.92) 158
      Old ( > 35): Bad 0.00% (0), Good 100.00% (7), Total (2.17) 7
   Monthly salary: Bad 15.82% (25), Good 84.18% (133), Total (48.92) 158
   Split on Age Categorical (P-value=0.0000, Chi-square=58.7255, df=1)
      Young (< 25): Bad 48.98% (24), Good 51.02% (25), Total (15.17) 49
      Split on Social Class (P-value=0.0016, Chi-square=12.0388, df=1)
         Management;Clerical: Bad 0.00% (0), Good 100.00% (8), Total (2.48) 8
         Professional: Bad 58.54% (24), Good 41.46% (17), Total (12.69) 41
      Middle (25-35);Old ( > 35): Bad 0.92% (1), Good 99.08% (108), Total (33.75) 109
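
The chi-square value reported for the first split of the credit tree can be checked by hand. The sketch below computes the likelihood-ratio chi-square from the node counts (weekly pay: 143 bad, 22 good; monthly salary: 25 bad, 133 good). That the tree software reports the likelihood-ratio form rather than Pearson's is an inference from the numbers, not something the slide states, and the function name is hypothetical.

```python
import math

def likelihood_ratio_chi_square(table):
    """Likelihood-ratio (G) statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if pay type and risk were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing
                g += observed * math.log(observed / expected)
    return 2.0 * g

# Rows: weekly pay, monthly salary. Columns: bad, good.
pay_split = [[143, 22], [25, 133]]
print(round(likelihood_ratio_chi_square(pay_split), 2))
```

The larger the statistic, the more strongly the split separates good risks from bad, which is why the tree tests pay frequency first.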
Rule Induction
Description
• Intuitive output
• Handles all forms of numeric data, as well
  as non-numeric (symbolic) data

C5 algorithm: a special case of rule
  induction
• Target variable must be symbolic
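
Part of what makes rule-induction output intuitive is that a rule set translates almost word for word into code. The sketch below encodes the two "good risk" rules given earlier in the deck. The function and parameter names are hypothetical, and treating every case the rules do not cover (including an income of exactly $40K) as bad risk is an assumption.

```python
def is_good_risk(income, debt_level, years_in_job):
    """Apply the two 'good risk' rules; anything else is treated as bad risk."""
    # Rule #1: income above $40K and low debt.
    if income > 40_000 and debt_level == "low":
        return True
    # Rule #2: income below $40K and more than 5 years in the job.
    if income < 40_000 and years_in_job > 5:
        return True
    # Assumption: cases not covered by either rule default to bad risk.
    return False

print(is_good_risk(55_000, "low", 1))
print(is_good_risk(30_000, "high", 2))
```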
Apriori
Description
• Seeks association rules in
  dataset
• ‘Market basket’ analysis
• Sequence discovery
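
A small market-basket sketch shows the two quantities association rules are built on, support and confidence, along with the pruning idea behind Apriori: only items that are frequent on their own are combined into pairs. This is a simplified illustration limited to pairs, not a full Apriori implementation, and the baskets and function names are made up.

```python
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs meeting min_support."""
    n = len(baskets)
    # Pass 1: keep only single items that are frequent on their own.
    items = {item for basket in baskets for item in basket}
    frequent_items = {
        item for item in items
        if sum(item in b for b in baskets) / n >= min_support
    }
    # Pass 2: count only pairs built from frequent items (the pruning step).
    result = {}
    for pair in combinations(sorted(frequent_items), 2):
        support = sum(set(pair) <= b for b in baskets) / n
        if support >= min_support:
            result[pair] = support
    return result

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing the antecedent that also hold the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(frequent_pairs(baskets, min_support=0.5))
print(confidence(baskets, "butter", "bread"))
```

Support says how often the items appear together; confidence says how reliably one item signals the other.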
Kohonen Network
Description
• unsupervised
• seeks to
  describe
  dataset in
  terms of
  natural
  clusters of
  cases
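
The idea behind a Kohonen network can be sketched with a tiny one-dimensional map: each input pulls its best-matching unit toward itself, and that unit's neighbours follow to a lesser degree. Everything here is a simplification chosen for illustration (three units, a linear decay schedule, a neighbour influence of one half); real tools use two-dimensional grids and tuned schedules.

```python
def train_som(values, units, epochs=20, rate=0.5):
    """Train a tiny 1-D Kohonen map; `units` holds the initial unit weights."""
    units = list(units)
    for epoch in range(epochs):
        # Both the learning rate and the neighbour influence decay over time.
        decay = 1 - epoch / epochs
        for v in values:
            # Best-matching unit: the one whose weight is closest to the input.
            best = min(range(len(units)), key=lambda i: abs(v - units[i]))
            for i in range(len(units)):
                if i == best:
                    influence = 1.0
                elif abs(i - best) == 1:
                    influence = 0.5 * decay
                else:
                    influence = 0.0
                # Move the unit part of the way toward the input.
                units[i] += rate * decay * influence * (v - units[i])
    return units

# Hypothetical customer ages; initial weights are spread over the data range.
ages = [22, 24, 23, 41, 45, 43, 70, 68, 72]
print(train_som(ages, units=[22.0, 45.0, 72.0]))
```

No target variable is supplied at any point, which is what "unsupervised" means here.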
Phases in the DM
      Process (5)
• Model Evaluation
  – Evaluation of model: how well it
    performed on test data
  – Methods and criteria depend on
    model type:
     • e.g., coincidence matrix with
       classification models, mean
       error rate with regression
       models
  – Interpretation of model: whether it
    matters, and how easy it is,
    depends on the algorithm
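
The two evaluation criteria named above can be sketched as follows: a coincidence matrix (often called a confusion matrix) for a classification model, and a mean error for a regression model. The labels and numbers are invented, and taking "mean error" to be the mean absolute error is an assumption.

```python
from collections import Counter

def coincidence_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a classification model."""
    return Counter(zip(actual, predicted))

def mean_absolute_error(actual, predicted):
    """Average size of the prediction error for a regression model."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical test-set results for a credit-risk classifier.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "good", "good"]
matrix = coincidence_matrix(actual, predicted)
accuracy = (matrix["good", "good"] + matrix["bad", "bad"]) / len(actual)
print(dict(matrix))
print(accuracy)

# Hypothetical unit-sales forecasts against what actually happened.
print(mean_absolute_error([10, 20], [12, 16]))
```

The off-diagonal cells matter as much as the accuracy: a model that calls bad risks good is usually more costly than one that errs the other way.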
Phases in the DM
    Process (6)
•Deployment
   – Determine how the results need to be
     utilized
   – Who needs to use them?
   – How often do they need to be used?
•Deploy Data Mining results by:
   – Scoring a database
   – Utilizing results as business rules
   – interactive scoring on-line
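
"Scoring a database" can be as simple as writing a model's output back to each record. The sketch below applies the two good-risk rules from the rule-induction slides to an in-memory SQLite table. The table name, column names, and sample rows are hypothetical, and labelling every uncovered case as bad is an assumption.

```python
import sqlite3

def score_customers(conn):
    """Write a risk label for every row using the two good-risk rules."""
    conn.execute(
        """
        UPDATE customers
        SET risk = CASE
            WHEN income > 40000 AND debt = 'low' THEN 'good'
            WHEN income < 40000 AND years_in_job > 5 THEN 'good'
            ELSE 'bad'
        END
        """
    )
    conn.commit()

# Build a throwaway database with three hypothetical customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER, income REAL, debt TEXT, years_in_job REAL, risk TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, income, debt, years_in_job) VALUES (?, ?, ?, ?)",
    [(1, 55000, "low", 2), (2, 30000, "high", 8), (3, 30000, "low", 1)],
)
score_customers(conn)
print(conn.execute("SELECT id, risk FROM customers ORDER BY id").fetchall())
```

Expressing the rules in SQL is also one way of "utilizing results as business rules": the same CASE expression can sit in a view or report without any data mining software present.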
Specific Data Mining
Applications:
What data mining has
done for...
         The US Internal Revenue Service
         needed to improve customer
         service and...

    Scheduled its workforce
to provide faster, more accurate
     answers to questions.
What data mining has done
for...
          The US Drug Enforcement
          Administration needed to be more
          effective in their drug “busts”
          and

   analyzed suspects’ cell phone
   usage to focus investigations.
What data mining has done
for...
    HSBC needed to cross-sell more
    effectively by identifying profiles
    that would be interested in higher
    yielding investments and...

 Reduced direct mail costs by 30%
    while garnering 95% of the
      campaign’s revenue.
Final Comments
 • Data Mining can be utilized in any
   organization that needs to find
   patterns or relationships in their
   data.
 • By using the CRISP-DM
   methodology, analysts can have a
   reasonable level of assurance that
   their Data Mining efforts will
   render useful, repeatable, and
   valid results.
Questions?

More Related Content

What's hot

Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
SSaudia
 
How to Effectively Audit your IT Infrastructure
How to Effectively Audit your IT InfrastructureHow to Effectively Audit your IT Infrastructure
How to Effectively Audit your IT Infrastructure
Netwrix Corporation
 
IT Risk Management
IT Risk ManagementIT Risk Management
IT Risk Management
Tudor Damian
 
CONTROL AND AUDIT
CONTROL AND AUDITCONTROL AND AUDIT
CONTROL AND AUDIT
Ros Dina
 
Strategic information system planning
Strategic information system planningStrategic information system planning
Strategic information system planning
Dhani Ahmad
 
Information Systems Audit - Ron Weber chapter 1
Information Systems Audit - Ron Weber chapter 1Information Systems Audit - Ron Weber chapter 1
Information Systems Audit - Ron Weber chapter 1
Sreekanth Narendran
 
Information Systems Control and Audit - Chapter 4 - Systems Development Manag...
Information Systems Control and Audit - Chapter 4 - Systems Development Manag...Information Systems Control and Audit - Chapter 4 - Systems Development Manag...
Information Systems Control and Audit - Chapter 4 - Systems Development Manag...
Sreekanth Narendran
 
SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)
SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)
SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)Biswajit Bhattacharjee
 
Business Performance Management
Business Performance ManagementBusiness Performance Management
Business Performance Management
Seta Wicaksana
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & Applications
Fazle Rabbi Ador
 
Auditing application controls
Auditing application controlsAuditing application controls
Auditing application controls
CenapSerdarolu
 
Security Audit Best-Practices
Security Audit Best-PracticesSecurity Audit Best-Practices
Security Audit Best-Practices
Marco Raposo
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
Dr. Dipti Patil
 
Generalized audit-software
Generalized audit-softwareGeneralized audit-software
Generalized audit-software
kzoe1996
 
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
PECB
 
Application of data mining
Application of data miningApplication of data mining
Application of data mining
SHIVANI SONI
 
Information System Architecture and Audit Control Lecture 1
Information System Architecture and Audit Control Lecture 1Information System Architecture and Audit Control Lecture 1
Information System Architecture and Audit Control Lecture 1
Yasir Khan
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
SpringPeople
 
Conducting an Information Systems Audit
Conducting an Information Systems Audit Conducting an Information Systems Audit
Conducting an Information Systems Audit
Sreekanth Narendran
 
Management information system database management
Management information system database managementManagement information system database management
Management information system database management
Online
 

What's hot (20)

Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
How to Effectively Audit your IT Infrastructure
How to Effectively Audit your IT InfrastructureHow to Effectively Audit your IT Infrastructure
How to Effectively Audit your IT Infrastructure
 
IT Risk Management
IT Risk ManagementIT Risk Management
IT Risk Management
 
CONTROL AND AUDIT
CONTROL AND AUDITCONTROL AND AUDIT
CONTROL AND AUDIT
 
Strategic information system planning
Strategic information system planningStrategic information system planning
Strategic information system planning
 
Information Systems Audit - Ron Weber chapter 1
Information Systems Audit - Ron Weber chapter 1Information Systems Audit - Ron Weber chapter 1
Information Systems Audit - Ron Weber chapter 1
 
Information Systems Control and Audit - Chapter 4 - Systems Development Manag...
Information Systems Control and Audit - Chapter 4 - Systems Development Manag...Information Systems Control and Audit - Chapter 4 - Systems Development Manag...
Information Systems Control and Audit - Chapter 4 - Systems Development Manag...
 
SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)
SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)
SECURITY & CONTROL OF INFORMATION SYSTEM (Management Information System)
 
Business Performance Management
Business Performance ManagementBusiness Performance Management
Business Performance Management
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & Applications
 
Auditing application controls
Auditing application controlsAuditing application controls
Auditing application controls
 
Security Audit Best-Practices
Security Audit Best-PracticesSecurity Audit Best-Practices
Security Audit Best-Practices
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Generalized audit-software
Generalized audit-softwareGeneralized audit-software
Generalized audit-software
 
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks EffectivelyISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
ISO/IEC 27001 and ISO/IEC 27005: Managing AI Risks Effectively
 
Application of data mining
Application of data miningApplication of data mining
Application of data mining
 
Information System Architecture and Audit Control Lecture 1
Information System Architecture and Audit Control Lecture 1Information System Architecture and Audit Control Lecture 1
Information System Architecture and Audit Control Lecture 1
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Conducting an Information Systems Audit
Conducting an Information Systems Audit Conducting an Information Systems Audit
Conducting an Information Systems Audit
 
Management information system database management
Management information system database managementManagement information system database management
Management information system database management
 

Viewers also liked

Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
Datamining Tools
 
Ymag56 hr
Ymag56 hrYmag56 hr
An Introduction to Data Mining
An Introduction to Data MiningAn Introduction to Data Mining
An Introduction to Data Miningbutest
 
Concept description characterization and comparison
Concept description characterization and comparisonConcept description characterization and comparison
Concept description characterization and comparisonric_biet
 
Data mining and its applications!
Data mining and its applications!Data mining and its applications!
Data mining and its applications!
COSTARCH Analytical Consulting (P) Ltd.
 
Tax DSS
Tax DSSTax DSS
Business IT Alignment Heuristic
Business IT Alignment HeuristicBusiness IT Alignment Heuristic
Business IT Alignment Heuristic
Kodok Ngorex
 
Data mining
Data mining Data mining
Data mining
sayalipatil528
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : Concepts
Pragya Pandey
 
Clinical decision support systems
Clinical decision support systemsClinical decision support systems
Clinical decision support systems
Padmaja Muttamshetty
 
Seyedjamal Zolhavarieh - A model of knowledge quality assessment in clinical ...
Seyedjamal Zolhavarieh - A model of knowledge quality assessment in clinical ...Seyedjamal Zolhavarieh - A model of knowledge quality assessment in clinical ...
Seyedjamal Zolhavarieh - A model of knowledge quality assessment in clinical ...
Health Informatics New Zealand
 
Basic research
Basic researchBasic research
Basic research
Manu Alias
 
Stack using Linked List
Stack using Linked ListStack using Linked List
Stack using Linked List
Sayantan Sur
 
Organising skills
Organising skillsOrganising skills
Organising skills
Nijaz N
 
1.3 applications, issues
1.3 applications, issues1.3 applications, issues
1.3 applications, issues
Krish_ver2
 
Human Resource Management : The Importance of Effective Strategy and Planning
Human Resource Management : The Importance of Effective Strategy and PlanningHuman Resource Management : The Importance of Effective Strategy and Planning
Human Resource Management : The Importance of Effective Strategy and Planning
Asia Master Training آسيا ماسترز للتدريب والتطوير
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
Valerii Klymchuk
 

Data mining applications

  • 1. What is Data Mining?
  • 2. Agenda
      • What Data Mining IS and IS NOT
      • Steps in the Data Mining Process
          – CRISP-DM
          – Explanation of Models
          – Examples of Data Mining Applications
      • Questions
  • 3. The Evolution of Data Analysis
      • Data Collection (1960s)
          – Business question: "What was my total revenue in the last five years?"
          – Enabling technologies: computers, tapes, disks
          – Product providers: IBM, CDC
          – Characteristics: retrospective, static data delivery
      • Data Access (1980s)
          – Business question: "What were unit sales in New England last March?"
          – Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
          – Product providers: Oracle, Sybase, Informix, IBM, Microsoft
          – Characteristics: retrospective, dynamic data delivery at record level
      • Data Warehousing & Decision Support (1990s)
          – Business question: "What were unit sales in New England last March? Drill down to Boston."
          – Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
          – Product providers: SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR
          – Characteristics: retrospective, dynamic data delivery at multiple levels
      • Data Mining (Emerging Today)
          – Business question: "What’s likely to happen to Boston unit sales next month? Why?"
          – Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
          – Product providers: SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
          – Characteristics: prospective, proactive information delivery
  • 4. Results of Data Mining Include:
      • Forecasting what may happen in the future
      • Classifying people or things into groups by recognizing patterns
      • Clustering people or things into groups based on their attributes
      • Associating what events are likely to occur together
      • Sequencing what events are likely to lead to later events
  • 5. Data Mining Is Not
      • Brute-force crunching of bulk data
      • “Blind” application of algorithms
      • Going to find relationships where none exist
      • Presenting data in different ways
      • A database-intensive task
      • A difficult-to-understand technology requiring an advanced degree in computer science
  • 6. Data Mining Is
      • A hot buzzword for a class of techniques that find patterns in data
      • A user-centric, interactive process which leverages analysis technologies and computing power
      • A group of techniques that find relationships that have not previously been discovered
      • Not reliant on an existing database
      • A relatively easy task that requires knowledge of the business problem/subject matter expertise
  • 7. Data Mining versus OLAP
      • OLAP - On-line Analytical Processing
          – Provides you with a very good view of what is happening, but cannot predict what will happen in the future or why it is happening
  • 8. Data Mining Versus Statistical Analysis
      • Data Mining
          – Originally developed to act as expert systems to solve problems
          – Less interested in the mechanics of the technique
          – If it makes sense then let’s use it
          – Does not require assumptions to be made about data
          – Can find patterns in very large amounts of data
          – Requires understanding of data and business problem
      • Data Analysis
          – Tests for statistical correctness of models
              • Are statistical assumptions of models correct?
                  – E.g. is the R-Square good?
          – Hypothesis testing
              • Is the relationship significant?
                  – Use a t-test to validate significance
          – Tends to rely on sampling
          – Techniques are not optimised for large amounts of data
          – Requires strong statistical skills
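The "Is the R-Square good?" check on the statistical side can be illustrated with a small worked example. The following is a minimal sketch, assuming a one-variable ordinary least squares fit; the data values and the names `fit_line` and `r_squared` are hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: month number versus unit sales.
months = [1.0, 2.0, 3.0, 4.0]
unit_sales = [2.0, 4.0, 5.0, 8.0]

intercept, slope = fit_line(months, unit_sales)
r2 = r_squared(months, unit_sales, intercept, slope)
print(f"slope={slope:.3f} intercept={intercept:.3f} R-square={r2:.3f}")
```

An R-square close to 1 says the line explains most of the variation in the sample. It does not by itself show that the model's statistical assumptions hold, which is the separate question the slide raises.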
  • 9. Examples of What People are Doing with Data Mining:
      • Fraud/Non-Compliance Anomaly detection
          – Isolate the factors that lead to fraud, waste and abuse
          – Target auditing and investigative efforts more effectively
      • Credit/Risk Scoring
      • Intrusion detection
      • Parts failure prediction
      • Recruiting/Attracting customers
      • Maximizing profitability (cross selling, identifying profitable customers)
      • Service Delivery and Customer Retention
          – Build profiles of customers likely to use which services
      • Web Mining
  • 10. How Can We Do Data Mining? By Utilizing the CRISP-DM Methodology
      – a standard process
      – existing data
      – software technologies
      – situational expertise
  • 11. Why Should There be a Standard Process?
      The data mining process must be reliable and repeatable by people with little data mining background.
      • Framework for recording experience
          – Allows projects to be replicated
      • Aid to project planning and management
      • “Comfort factor” for new adopters
          – Demonstrates maturity of Data Mining
          – Reduces dependency on “stars”
  • 12. Process Standardization
      CRISP-DM:
      • CRoss Industry Standard Process for Data Mining
      • Initiative launched Sept. 1996
      • SPSS/ISL, NCR, Daimler-Benz, OHRA
      • Funding from European Commission
      • Over 200 members of the CRISP-DM SIG worldwide
          – DM Vendors - SPSS, NCR, IBM, SAS, SGI, Data Distilleries, Syllogic, Magnify, ..
          – System Suppliers / consultants - Cap Gemini, ICL Retail, Deloitte & Touche, …
          – End Users - BT, ABB, Lloyds Bank, AirTouch, Experian, ...
  • 13. CRISP-DM
      • Non-proprietary
      • Application/Industry neutral
      • Tool neutral
      • Focus on business issues
          – As well as technical analysis
      • Framework for guidance
      • Experience base
          – Templates for Analysis
  • 15. Why CRISP-DM?
      • The data mining process must be reliable and repeatable by people with few data mining skills
      • CRISP-DM provides a uniform framework for
          – guidelines
          – experience documentation
      • CRISP-DM is flexible to account for differences
          – Different business/agency problems
          – Different data
  • 16. Phases and Tasks (each task is followed by its outputs)
      • Business Understanding
          – Determine Business Objectives: Background; Business Objectives; Business Success Criteria
          – Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
          – Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
          – Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
      • Data Understanding
          – Collect Initial Data: Initial Data Collection Report
          – Describe Data: Data Description Report
          – Explore Data: Data Exploration Report
          – Verify Data Quality: Data Quality Report
      • Data Preparation (phase outputs: Data Set; Data Set Description)
          – Select Data: Rationale for Inclusion / Exclusion
          – Clean Data: Data Cleaning Report
          – Construct Data: Derived Attributes; Generated Records
          – Integrate Data: Merged Data
          – Format Data: Reformatted Data
      • Modeling
          – Select Modeling Technique: Modeling Technique; Modeling Assumptions
          – Generate Test Design: Test Design
          – Build Model: Parameter Settings; Models; Model Description
          – Assess Model: Model Assessment; Revised Parameter Settings
      • Evaluation
          – Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
          – Review Process: Review of Process
          – Determine Next Steps: List of Possible Actions; Decision
      • Deployment
          – Plan Deployment: Deployment Plan
          – Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
          – Produce Final Report: Final Report; Final Presentation
          – Review Project: Experience Documentation
  • 17. Phases in the DM Process: CRISP-DM
  • 18. Phases in the DM Process (1 & 2)
      • Business Understanding:
          – Statement of Business Objective
          – Statement of Data Mining objective
          – Statement of Success Criteria
      • Data Understanding
          – Explore the data and verify the quality
          – Find outliers
  • 19. Phases in the DM Process (3)
      • Data preparation:
          – Usually takes over 90% of our time
              • Collection
              • Assessment
              • Consolidation and Cleaning
                  – table links, aggregation level, missing values, etc.
              • Data selection
                  – active role in ignoring non-contributory data?
                  – outliers?
                  – Use of samples
                  – visualization tools
              • Transformations - create new variables
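Two of the preparation steps listed above, handling missing values and creating new variables, can be sketched in a few lines. This is only an illustration under stated assumptions: the field names `income` and `debt`, the choice of median imputation, and the derived `debt_to_income` ratio are all hypothetical and do not come from the slides.

```python
from statistics import median


def prepare(records):
    """Fill missing incomes with the median and derive a new variable."""
    known_incomes = [r["income"] for r in records if r["income"] is not None]
    fill_value = median(known_incomes)  # assumed imputation strategy

    prepared = []
    for record in records:
        income = record["income"] if record["income"] is not None else fill_value
        prepared.append({
            **record,
            "income": income,
            # Transformation: a derived attribute built from two raw fields.
            "debt_to_income": record["debt"] / income,
        })
    return prepared


# Hypothetical raw records; the second one has a missing income.
raw_records = [
    {"income": 30000.0, "debt": 6000.0},
    {"income": None, "debt": 5000.0},
    {"income": 50000.0, "debt": 20000.0},
]

prepared_records = prepare(raw_records)
for row in prepared_records:
    print(row)
```

Median imputation is only one option. Whether to fill, drop, or flag missing values depends on the data and on the business problem.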
  • 20. Phases in the DM Process (4)
      • Model building
          – Selection of the modeling techniques is based upon the data mining objective
          – Modeling is an iterative process - different for supervised and unsupervised learning
              • May model for either description or prediction
  • 21. Types of Models
      • Prediction Models for Predicting and Classifying
          – Regression algorithms (predict numeric outcome): neural networks, rule induction, CART (OLS regression, GLM)
          – Classification algorithms (predict symbolic outcome): CHAID, C5.0 (discriminant analysis, logistic regression)
      • Descriptive Models for Grouping and Finding Associations
          – Clustering/Grouping algorithms: K-means, Kohonen
          – Association algorithms: apriori, GRI
  • 22. Neural Network (diagram: input layer, hidden layer, output)
  • 23. Neural Networks
      • Description
          – Difficult interpretation
          – Tends to ‘overfit’ the data
          – Extensive amount of training time
          – A lot of data preparation
          – Works with all data types
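The input layer, hidden layer, and output structure named on the Neural Network slide can be made concrete with a forward pass. The sketch below assumes a sigmoid activation and uses made-up weights. It shows only how a prediction is computed, not how the network is trained, and all names and values in it are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate inputs through one hidden layer to a single output unit."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical network: 2 inputs, 2 hidden units, 1 output.
inputs = [0.5, 0.2]
hidden_weights = [[0.4, -0.6], [0.3, 0.8]]
hidden_biases = [0.1, -0.2]
output_weights = [1.2, -0.7]
output_bias = 0.05

prediction = forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"prediction={prediction:.4f}")
```

The output here depends on every weight at once through nested non-linear functions, which is one way to see why the slide lists interpretation as difficult.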
  • 24. Rule Induction •Description – Produces decision trees: • income < $40K – job > 5 yrs then good risk – job < 5 yrs then bad Credit ranking (1=default) risk Cat. % Bad 52.01 168 n Good 47.99 155 • income > $40K Total (100.00) 323 Paid Weekly/Monthly P-value=0.0000, Chi-square=179.6665, df=1 – high debt then bad risk Weekly pay Monthly salary – low debt then good risk Cat. % Bad 86.67 143 Good 13.33 22 n Cat. % Bad 15.82 25 Good 84.18 133 n Total (51.08) 165 Total (48.92) 158 – Or Rule Sets: Age Categorical P-value=0.0000, Chi-square=30.1113, df=1 Age Categorical P-value=0.0000, Chi-square=58.7255, df=1 • Rule #1 for good risk: Young (< 25);Middle (25-35) Cat. % n Old ( > 35) Cat. % n Cat. % Young (< 25) n Middle (25-35);Old ( > 35) Cat. % n – if income > $40K Bad 90.51 143 Good 9.49 15 Total (48.92) 158 Bad 0.00 Good 100.00 Total (2.17) 0 7 7 Bad 48.98 24 Good 51.02 25 Total (15.17) 49 Bad 0.92 1 Good 99.08 108 Total (33.75) 109 – if low debt Social Class P-value=0.0016, Chi-square=12.0388, df=1 • Rule #2 for good risk: Management;Clerical Cat. % n Professional Cat. % n – if income < $40K Bad 0.00 0 Bad 58.54 24 Good 100.00 8 Good 41.46 17 Total (2.48) 8 Total (12.69) 41 – if job > 5 years
  • 25. Rule Induction Description • Intuitive output • Handles all forms of numeric data, as well as non-numeric (symbolic) data • C5 Algorithm, a special case of rule induction – Target variable must be symbolic
  • 26. Apriori Description • Seeks association rules in dataset • ‘Market basket’ analysis • Sequence discovery
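To make "seeks association rules" concrete, here is a small Apriori-style sketch in Python. The basket data, the function names, and the 0.5 support threshold are invented for illustration; this is not the implementation used in the tools the slides mention.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {itemset: support} for itemsets meeting min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    # Level 1 candidates: every single item seen in any basket.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Keep the size-k candidates that appear in enough baskets.
        level = {}
        for c in candidates:
            support = sum(1 for b in baskets if c <= b) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Invented market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(baskets, min_support=0.5)
print(confidence(freq, {"butter"}, {"bread"}))  # 1.0
```

In this toy data every basket containing butter also contains bread, so the rule "butter implies bread" has confidence 1.0, while {milk, butter} falls below the support threshold and is dropped.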
  • 27. Kohonen Network Description • unsupervised • seeks to describe dataset in terms of natural clusters of cases
  • 28. Phases in the DM Process (5) • Model Evaluation – Evaluation of model: how well it performed on test data – Methods and criteria depend on model type: • e.g., coincidence matrix with classification models, mean error rate with regression models – Interpretation of model: important or not, easy or hard depends on algorithm
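The two evaluation criteria named on this slide can be sketched in a few lines of Python. The labels and numbers below are invented test data, and mean absolute error is used here as one possible reading of "mean error rate"; the slide does not say which error measure is meant.

```python
from collections import Counter


def coincidence_matrix(actual, predicted):
    """Count (actual, predicted) label pairs for a classification model."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of test cases where the predicted label matches the actual one."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute difference for a numeric (regression) model."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented classification results on a held-out test set.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "bad", "good"]
print(coincidence_matrix(actual, predicted))
print(accuracy(actual, predicted))  # 0.8

# Invented regression results on a held-out test set.
print(mean_absolute_error([10.0, 12.0, 8.0], [11.0, 12.0, 6.0]))  # 1.0
```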
  • 29. Phases in the DM Process (6) • Deployment – Determine how the results need to be utilized – Who needs to use them? – How often do they need to be used? • Deploy Data Mining results by: – Scoring a database – Utilizing results as business rules – Interactive scoring on-line
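As an illustration of "scoring a database", the sketch below applies a scoring function to every row of a table and writes the result back. The `customers` table, its columns, and the `score_row` rule are all hypothetical stand-ins for a real schema and a trained model; an in-memory SQLite database is used only so the example runs on its own.

```python
import sqlite3


def score_row(income, years_on_job):
    # Hypothetical stand-in for a trained model's scoring function.
    return "good" if income >= 40_000 or years_on_job >= 5 else "bad"


def score_database(conn):
    """Write a risk score for every row of the hypothetical customers table."""
    rows = conn.execute(
        "SELECT id, income, years_on_job FROM customers"
    ).fetchall()
    conn.executemany(
        "UPDATE customers SET risk = ? WHERE id = ?",
        [(score_row(income, years), cid) for cid, income, years in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, income REAL, years_on_job REAL, risk TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, income, years_on_job) VALUES (?, ?, ?)",
    [(1, 55000, 1), (2, 30000, 2), (3, 30000, 8)],
)
score_database(conn)
print(conn.execute("SELECT id, risk FROM customers ORDER BY id").fetchall())
```

Batch scoring like this suits results that are consumed periodically; interactive on-line scoring would call the same scoring function per request instead.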
  • 31. What data mining has done for... The US Internal Revenue Service needed to improve customer service and... Scheduled its workforce to provide faster, more accurate answers to questions.
  • 32. What data mining has done for... The US Drug Enforcement Administration needed to be more effective in its drug “busts” and analyzed suspects’ cell phone usage to focus investigations.
  • 33. What data mining has done for... HSBC needed to cross-sell more effectively by identifying profiles that would be interested in higher yielding investments and... Reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
  • 34. Final Comments • Data Mining can be utilized in any organization that needs to find patterns or relationships in their data. • By using the CRISP-DM methodology, analysts can have a reasonable level of assurance that their Data Mining efforts will render useful, repeatable, and valid results.

Editor's Notes

  1. The US Internal Revenue Service is using data mining to improve customer service. [Click] By analyzing incoming requests for help and information, the IRS hopes to schedule its workforce to provide faster, more accurate answers to questions.
  2. The US DFAS needs to search through 2.5 million financial transactions that may indicate inaccurate charges. Instead of relying on tips to point out fraud, the DFAS is mining the data to identify suspicious transactions. [Click] Using Clementine, the agency examined credit card transactions and was able to identify purchases that did not match past patterns. Using this information, DFAS could focus investigations, finding fraud more cost-effectively.
  3. Retail banking is a highly competitive business. In addition to competition from other banks, banks also see intense competition from financial services companies of all kinds, from stockbrokers to mortgage companies. With so many organizations working the same customer base, the value of customer retention is greater than ever before. As a result, HSBC Bank USA focuses on enticing existing customers to "roll over" maturing products and on cross-selling new ones. [Click] Using SPSS products, HSBC found that it could reduce direct mail costs by 30% while still bringing in 95% of the campaign’s revenue. Because HSBC sends out fewer mail pieces, customers receive less junk mail from the bank and are likely to be more loyal.