DATA MINING IN HEALTHCARE
Dr. V. Subha, B.E., M.E., Ph.D.,
Assistant Professor,
Department of Computer Science & Engineering,
Manonmaniam Sundaranar University,
Tirunelveli.
CONTENTS
1. Introduction to Data Mining
2. Data Mining techniques
3. Data Mining in Healthcare
4. Data Mining Resources
1. INTRODUCTION TO DATA MINING
Data Vs Knowledge
Definition of Data Mining
• Process of identifying valid, novel,
potentially useful, and ultimately
understandable patterns in data.
- Predicts outcomes of future
observations.
Confluence of different disciplines
Steps in Data Mining Evolution
Data Mining - Motivation
• Growth in data.
• High dimensionality of data.
• Heterogeneous and complex data.
• Development of commercial data mining software.
• Growth of computing power and storage capacity.
• Limits of human ability to analyze large volumes of data.
Why Data Mining?
• Credit ratings:
  - Given a database of 100,000 names, which persons are the least likely
    to default on their credit cards?
• Fraud detection:
  - Which types of transactions are likely to be fraudulent, given the
    demographics and transactional history of a particular customer?
• Customer relationship management:
  - Which of my customers are likely to be the most loyal, and which are
    most likely to leave for a competitor?
Data mining helps extract such information.
Data Mining for decision making
Types of data analysis
Predictive models help in diagnosing diseases and then
identifying a suitable path for treatment.
Knowledge Discovery in Databases (KDD)
Steps of KDD Process
1. Data relevant to the analysis is decided and retrieved from the database.
2. Data cleansing: noisy data and missing data are handled appropriately.
3. Processed data is transformed into appropriate forms for mining.
4. Clever techniques are applied to extract patterns that are potentially useful.
5. Mined data patterns are evaluated.
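The five steps above can be sketched on a toy example. This is only an illustrative sketch under stated assumptions: six of the records reuse the heart-disease training table shown later in this deck, the seventh record (with a missing age) is hypothetical, and the "mining" step is plain co-occurrence counting standing in for whatever technique is actually used. The 0.9 confidence threshold is likewise an arbitrary, assumed value.

```python
from collections import defaultdict

# Six rows from the deck's training table, plus one HYPOTHETICAL row with a missing age.
records = [
    {"age": 63, "gender": "Male", "chest_pain": "atyp_angina", "disease": "yes"},
    {"age": 67, "gender": "Male", "chest_pain": "asympt", "disease": "yes"},
    {"age": 67, "gender": "Male", "chest_pain": "asympt", "disease": "yes"},
    {"age": 37, "gender": "Male", "chest_pain": "non_anginal", "disease": "no"},
    {"age": 41, "gender": "Female", "chest_pain": "atyp_angina", "disease": "no"},
    {"age": 62, "gender": "Female", "chest_pain": "asympt", "disease": "yes"},
    {"age": None, "gender": "Female", "chest_pain": "asympt", "disease": "yes"},
]

# 1. Selection: keep only the attributes chosen for this analysis.
selected = [{k: r[k] for k in ("age", "chest_pain", "disease")} for r in records]

# 2. Cleansing: here, records with a missing age are simply dropped.
cleaned = [r for r in selected if r["age"] is not None]

# 3. Transformation: turn the numeric age into a categorical band.
def age_band(age):
    return "over_53" if age > 53 else "53_or_under"

transformed = [
    {"age_band": age_band(r["age"]), "chest_pain": r["chest_pain"], "disease": r["disease"]}
    for r in cleaned
]

# 4. Mining: count how often each (attribute, value) pair occurs with disease = yes.
counts = defaultdict(lambda: [0, 0])  # (attribute, value) -> [total, with_disease]
for r in transformed:
    for attr in ("age_band", "chest_pain"):
        key = (attr, r[attr])
        counts[key][0] += 1
        if r["disease"] == "yes":
            counts[key][1] += 1

# 5. Evaluation: keep only patterns whose confidence meets the assumed threshold.
confidence = {key: hits / total for key, (total, hits) in counts.items()}
useful = {key: c for key, c in confidence.items() if c >= 0.9}
print(useful)
```

On this toy data the surviving patterns are the "over 53" age band and the asymptomatic chest-pain type.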
Applications of data mining
Popular applications of data mining
2. DATA MINING TECHNIQUES
Data Mining Techniques
Data Mining Techniques…
• Unsupervised learning
- No knowledge of output
- Self-guided learning algorithm.
• Supervised learning
- Knowledge of output
- Learning with an expert/teacher
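The contrast between the two settings can be illustrated with a small sketch. The ages and labels below are hypothetical, and the two techniques (1-nearest-neighbour for the supervised case, 2-means clustering for the unsupervised case) are chosen here only for illustration; the slides do not name them.

```python
# HYPOTHETICAL ages and disease labels, for illustration only.
ages = [37, 41, 62, 63, 67]
labels = ["no", "no", "yes", "yes", "yes"]


def predict_supervised(age):
    """Supervised: 1-nearest-neighbour relies on the known outputs (labels)."""
    nearest = min(range(len(ages)), key=lambda i: abs(ages[i] - age))
    return labels[nearest]


def cluster_unsupervised(values, iterations=10):
    """Unsupervised: 2-means clustering groups the values without using labels."""
    centres = [min(values), max(values)]  # simple deterministic starting centres
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearer = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearer].append(v)
        centres = [sum(g) / len(g) for g in groups]
    return groups


print(predict_supervised(60))
print(cluster_unsupervised(ages))
```

The supervised function cannot work without `labels`, while the clustering function never sees them and only discovers that the ages fall into two groups.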
Data Mining Techniques…
Classification – Prediction Model
[Figure: Data Mining Algorithms: Classification, Prediction]
• Process of finding a model that describes the data classes or concepts.
• Purpose is to use this model to predict the class of instances whose class
label is unknown.
• This derived model is based on the analysis of sets of training data.
Example Dataset
Dataset Splitting
Model Construction
Training Data
AGE   GENDER   CHEST PAIN    DISEASE
63    Male     atyp_angina   yes
67    Male     asympt        yes
67    Male     asympt        yes
37    Male     non_anginal   no
41    Female   atyp_angina   no
62    Female   asympt        yes
Classification Algorithms
Classifier (Model)
IF chest pain = ‘asympt’
OR age > 53
THEN disease = ‘yes’
Using the Model in Prediction
Classifier (Model)
Testing Data: (50, male, asympt)
Disease?
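The rule from the Model Construction slide can be written as a small function and applied to the test instance. This is a minimal sketch: the rule and the data come from the slides, while the function name `classify` and the fallback answer `"no"` when the rule does not fire are assumptions, since the slide states only the "yes" rule.

```python
# Training rows from the Model Construction slide: (age, gender, chest_pain, disease).
training_data = [
    (63, "Male", "atyp_angina", "yes"),
    (67, "Male", "asympt", "yes"),
    (67, "Male", "asympt", "yes"),
    (37, "Male", "non_anginal", "no"),
    (41, "Female", "atyp_angina", "no"),
    (62, "Female", "asympt", "yes"),
]


def classify(age, gender, chest_pain):
    """Slide rule: IF chest pain = 'asympt' OR age > 53 THEN disease = 'yes'.

    Returning 'no' otherwise is an ASSUMPTION; gender is not used by the rule.
    """
    if chest_pain == "asympt" or age > 53:
        return "yes"
    return "no"


# Check that the rule reproduces every label in the training data.
fits_training = all(classify(a, g, c) == d for a, g, c, d in training_data)

# Apply the model to the test instance from the slide.
prediction = classify(50, "male", "asympt")
print(fits_training, prediction)
```

For the test instance (50, male, asympt) the age condition fails, but the chest-pain condition holds, so the rule predicts disease = yes.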
Classification Techniques
• Neural Networks
• Bayesian Networks
• Decision Trees
• Support Vector Machines
Confusion Matrix
• Contains information about actual and predicted classifications
done by a classification system.
• Performance of such systems is commonly evaluated using the
data in the confusion matrix.
Confusion Matrix…
• TP – Patient with heart disease correctly identified as having heart disease.
• FP – Patient without heart disease wrongly identified as having heart disease.
• FN – Patient with heart disease wrongly identified as healthy, and so left out of treatment for heart disease.
• TN – Patient without heart disease correctly identified as not having heart disease.

                     Actual Positive        Actual Negative
Predicted Positive   True Positive (TP)     False Positive (FP)
Predicted Negative   False Negative (FN)    True Negative (TN)
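The four cells can be counted directly from paired lists of actual and predicted classes. The sketch below assumes "yes" marks the positive class (heart disease); the function name and the example labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, FP, FN and TN from paired actual/predicted class labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# HYPOTHETICAL labels for eight patients.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]

counts = confusion_counts(actual, predicted)
print(counts)
```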
Performance Metrics
• Accuracy = $\frac{TP + TN}{TP + TN + FP + FN}$
• Sensitivity = $\frac{TP}{TP + FN}$
• Specificity = $\frac{TN}{TN + FP}$
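All three metrics are computed from the four confusion-matrix counts. The sketch below uses hypothetical counts and function names, and it assumes the denominators are non-zero (no guard against division by zero is included).

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of actual positives that were identified as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were identified as negative."""
    return tn / (tn + fp)


# HYPOTHETICAL counts for 100 patients.
tp, fp, fn, tn = 40, 10, 5, 45

acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(round(acc, 3), round(sens, 3), round(spec, 3))
```

In a diagnostic setting, sensitivity reflects how few patients with the disease are missed, while specificity reflects how few healthy patients are wrongly flagged.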
3. DATA MINING IN HEALTHCARE
Data mining in healthcare
• Hospitals deal with a lot of data on a day-to-day basis.
• This volume is difficult for a single person to handle.
• Data mining techniques help in analyzing
patients’ health data.
• Patients’ healthcare data can be sorted and analyzed
to help predict illnesses, which helps in treating the
patients.
Disease diagnosis and effective treatment
Healthcare applications with Data mining
Disease diagnosis using Data Mining
4. DATA MINING - RESOURCES
Data mining Tools - Open Source
DataMelt
Data mining Tools - Proprietary License
• Oracle Data Mining
• IBM Cognos
• IBM SPSS Modeler
• SAS Data Mining
• Sisense
• SSDT
• Teradata
• Board toolkit
• Dundas BI
Data Mining Tutorials
www.tutorialpoint.com
www.guru99.com
www.tutorialride.com
www.zentut.com
www.cs.cmu.edu
www.javatpoint.com
 www.analyticsvidhya.com
Dataset Repository - UCI Repository
https://archive.ics.uci.edu
Books
• Han, Jiawei, Jian Pei, and Micheline Kamber. Data
Mining: Concepts and Techniques. Elsevier, 2011.
• Zaki, Mohammed J., and Wagner Meira Jr. Data Mining
and Analysis: Fundamental Concepts and Algorithms.
Cambridge University Press, 2014.
• Gorunescu, Florin. Data Mining: Concepts, Models and
Techniques. Vol. 12. Springer Science & Business
Media, 2011.
• Zacharski, Ron. A Programmer's Guide to Data Mining:
The Ancient Art of the Numerati. 2013.
