2. DATA MINING IN HEALTHCARE
Dr. V. Subha, B.E., M.E., Ph.D.,
Assistant Professor,
Department of Computer Science & Engineering,
Manonmaniam Sundaranar University,
Tirunelveli.
3. CONTENTS
1. Introduction to Data Mining
2. Data Mining techniques
3. Data Mining in Healthcare
4. Data Mining Resources
10. Definition of Data Mining
• Process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Predicts outcomes of future observations.
13. Data Mining - Motivation
• Growth in data.
• High dimensionality of data.
• Heterogeneous and complex data.
• Development of commercial data mining software.
• Growth of computing power and storage capacity.
• Limits of human ability to analyze large volumes of data manually.
14. Why Data Mining?
• Credit ratings:
- Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
• Fraud detection:
- Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
• Customer relationship management:
- Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining helps extract such information.
18. Steps of KDD Process
• Data selection - Data relevant to the analysis is identified and retrieved from the database.
• Data cleansing - Noisy data and missing data are handled appropriately.
• Data transformation - Processed data is transformed into forms appropriate for mining.
• Data mining - Suitable techniques are applied to extract potentially useful patterns.
• Pattern evaluation - Mined patterns are evaluated.
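The steps above can be sketched as a small pipeline. This is a minimal illustration only: the records, field names, the age cut-off of 50, and the `min_support` parameter are all hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"id": 1, "age": 63, "chest_pain": "atyp_angina", "disease": "yes"},
    {"id": 2, "age": 67, "chest_pain": "asympt", "disease": "yes"},
    {"id": 3, "age": None, "chest_pain": "asympt", "disease": "yes"},
    {"id": 4, "age": 37, "chest_pain": "non_anginal", "disease": "no"},
    {"id": 5, "age": 41, "chest_pain": "atyp_angina", "disease": "no"},
]

def select(records, fields):
    """Selection: keep only the attributes relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def cleanse(records):
    """Cleansing: here, simply drop records that have a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: discretize age into a coarse group (assumed cut-off: 50)."""
    return [
        {**r, "age_group": "over_50" if r["age"] > 50 else "50_or_under"}
        for r in records
    ]

def mine(records):
    """Mining: count how often each (age_group, disease) pair occurs."""
    return Counter((r["age_group"], r["disease"]) for r in records)

def evaluate(patterns, min_support):
    """Evaluation: keep only patterns seen at least min_support times."""
    return {p: n for p, n in patterns.items() if n >= min_support}

selected = select(raw_records, ["age", "chest_pain", "disease"])
cleaned = cleanse(selected)
transformed = transform(cleaned)
patterns = evaluate(mine(transformed), min_support=2)
print(patterns)
```

Real projects usually replace each function with something more careful (for example, imputing missing values instead of dropping rows), but the order of the stages stays the same.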
22. Data Mining Techniques…
• Unsupervised learning
- No knowledge of output
- Self-guided learning algorithm.
• Supervised learning
- Knowledge of output
- Learning with an expert/teacher
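The difference between the two settings can be illustrated with a toy example. This is a sketch under stated assumptions: the ages and labels are made-up values, and the two helper functions (a nearest-neighbour predictor and a basic two-cluster k-means) are chosen only because they are short, not because the slides prescribe them.

```python
# Toy 1-D data: patient ages (hypothetical values for illustration only).
ages = [25, 30, 35, 60, 65, 70]

# Supervised learning: every training example comes with a known output label.
labels = ["no", "no", "no", "yes", "yes", "yes"]

def nearest_neighbor_predict(x, train_x, train_y):
    """Predict the label of x as the label of the closest training example."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

# Unsupervised learning: no labels are given; the algorithm groups the data itself.
def two_means(values, iterations=10):
    """Split values into two clusters with a basic 1-D k-means (k = 2).

    Assumes the values are not all identical, so both clusters stay non-empty.
    """
    c0, c1 = min(values), max(values)  # initial cluster centers
    g0, g1 = [], []
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return g0, g1

print(nearest_neighbor_predict(62, ages, labels))  # uses the known labels
print(two_means(ages))                             # never sees the labels
```

The supervised function needs `labels` to produce an answer, while `two_means` discovers the two age groups from the values alone.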
24. Classification – Prediction Model
[Diagram: Data Mining Algorithms → Classification, Prediction]
• Classification is the process of finding a model that describes the data classes or concepts.
• The purpose is to use this model to predict the class of instances whose class label is unknown.
• The derived model is based on the analysis of a set of training data.
27. Model Construction
Training Data:
AGE  GENDER  CHEST PAIN   DISEASE
63   Male    atyp_angina  yes
67   Male    asympt       yes
67   Male    asympt       yes
37   Male    non_anginal  no
41   Female  atyp_angina  no
62   Female  asympt       yes
Training Data → Classification Algorithms → Classifier (Model):
IF chest pain = ‘asympt’ OR age > 53 THEN disease = ‘yes’
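One way to see how a rule of this shape could be derived from the training data is a brute-force search. The sketch below is an assumption made for illustration, not the classification algorithm used on the slide: it tries every rule of the form "chest pain = V OR age > T" and keeps the first one with the fewest training errors. The function names `count_errors` and `fit_rule` are hypothetical.

```python
# Training data from the slide: (age, gender, chest_pain, disease).
training_data = [
    (63, "Male", "atyp_angina", "yes"),
    (67, "Male", "asympt", "yes"),
    (67, "Male", "asympt", "yes"),
    (37, "Male", "non_anginal", "no"),
    (41, "Female", "atyp_angina", "no"),
    (62, "Female", "asympt", "yes"),
]

def count_errors(pain_value, age_threshold, rows):
    """Count rows misclassified by: chest_pain == pain_value OR age > age_threshold."""
    errors = 0
    for age, _gender, chest_pain, disease in rows:
        predicted = "yes" if chest_pain == pain_value or age > age_threshold else "no"
        if predicted != disease:
            errors += 1
    return errors

def fit_rule(rows):
    """Try every (chest pain value, age threshold) pair; keep the first best one."""
    pain_values = sorted({row[2] for row in rows})
    thresholds = sorted({row[0] for row in rows})
    best = None
    for pain_value in pain_values:
        for threshold in thresholds:
            errors = count_errors(pain_value, threshold, rows)
            if best is None or errors < best[2]:
                best = (pain_value, threshold, errors)
    return best

pain_value, age_threshold, training_errors = fit_rule(training_data)
print(f"IF chest pain = '{pain_value}' OR age > {age_threshold} THEN disease = 'yes'")
print("training errors:", training_errors)
```

Several age thresholds fit these six rows equally well, including the slide's value of 53, so the threshold found by the search need not match the slide. With so little data, zero training errors says nothing about how the rule behaves on new patients.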
28. Using the Model in Prediction
Testing Data: (50, male, asympt)
Testing Data → Classifier (Model) → Disease?
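Applying the rule from the model construction slide (IF chest pain = 'asympt' OR age > 53 THEN disease = 'yes') to this test instance can be sketched as follows. The function name `predict_disease` is hypothetical, and returning 'no' when the rule does not fire is an assumption, since the slide only states the 'yes' case.

```python
def predict_disease(age, gender, chest_pain):
    """Apply the slide's rule; gender is accepted but not used by this rule."""
    if chest_pain == "asympt" or age > 53:
        return "yes"
    return "no"  # assumed default when the rule does not fire

# Test instance from the slide: (50, male, asympt).
print(predict_disease(50, "male", "asympt"))  # prints "yes"
```

The patient is younger than 53, but the chest pain type is 'asympt', so the first condition of the rule is enough to predict disease = 'yes'.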
30. Confusion Matrix
• Contains information about actual and predicted classifications done by a classification system.
• Performance of such systems is commonly evaluated using the data in the confusion matrix.
31. Confusion Matrix…
• TP – Patient with heart disease, correctly identified as having it.
• FP – Patient without heart disease, wrongly identified as having it.
• FN – Patient with heart disease, wrongly identified as healthy and so left out of treatment.
• TN – Patient without heart disease, correctly identified as not having it.

                   Actual Class
Predicted Class    Positive              Negative
Positive           True Positive (TP)    False Positive (FP)
Negative           False Negative (FN)   True Negative (TN)
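The four cells can be counted directly from paired actual and predicted labels, and common performance measures follow from them. The sketch below uses made-up label lists for illustration; the function name `confusion_counts` and the choice of "yes" as the positive class are assumptions.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (TP, FP, FN, TN) for paired actual and predicted class labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # has the disease, predicted to have it
        elif p == positive and a != positive:
            fp += 1  # healthy, wrongly predicted to have the disease
        elif p != positive and a == positive:
            fn += 1  # has the disease, wrongly predicted healthy
        else:
            tn += 1  # healthy, predicted healthy
    return tp, fp, fn, tn

# Hypothetical labels for eight patients.
actual    = ["yes", "yes", "yes", "no", "no",  "yes", "no", "no"]
predicted = ["yes", "no",  "yes", "no", "yes", "yes", "no", "no"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
sensitivity = tp / (tp + fn)                # share of diseased patients detected
specificity = tn / (tn + fp)                # share of healthy patients cleared
precision = tp / (tp + fp)                  # share of positive predictions that are right

print(tp, fp, fn, tn)
print(accuracy, sensitivity, specificity, precision)
```

In a medical setting a false negative (a missed patient) is often costlier than a false positive, so sensitivity is usually reported alongside accuracy.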
34. Data mining in healthcare
• Hospitals deal with a large amount of data on a day-to-day basis.
• This volume is difficult for a single person to handle.
• Data mining techniques help in analyzing patients' health data.
• Patients' healthcare data can be organized and analyzed to help predict illnesses, which supports treatment.
40. Data mining Tools - Proprietary License
• Oracle Data Mining
• IBM Cognos
• IBM SPSS Modeler
• SAS Data Mining
• Sisense
• SSDT
• Teradata
• Board toolkit
• Dundas BI
43. Books
• Han, Jiawei, Jian Pei, and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier, 2011.
• Zaki, Mohammed J., and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
• Gorunescu, Florin. Data Mining: Concepts, Models and Techniques. Vol. 12. Springer Science & Business Media, 2011.
• Zacharski, Ron. A Programmer's Guide to Data Mining: The Ancient Art of the Numerati. 2013.