Data mining techniques are used in a variety of applications. In the healthcare industry, data mining plays an important
role in predicting diseases. Detecting a disease usually requires the patient to undergo a number of tests, but data
mining techniques can reduce the number of tests needed. Fewer tests save time and improve performance.
This report analyses data mining techniques that can be used to predict different types of diseases, and reviews
research papers that focus on predicting various diseases.
HEALTH PREDICTION ANALYSIS USING DATA MINING
1. Project presentation of
“HEALTH PREDICTION ANALYSIS USING DATA MINING”
Presented by
Name                          Roll no.
Kritika Ashok Rane            28
Ashish Ravindra Salve         30
Ashwini Dhananjay Sawant      31
Under Guidance Of
Prof. J. P. Patil
2. Why is data mining needed in healthcare?
1. Applying data mining in healthcare has many positive, and even life-saving, outcomes.
2. Data mining works on the vast quantities of information created by the digitization of everything, which are consolidated and
analyzed by specialized technologies to uncover useful patterns.
3. The costs of treatments are much higher than they should be, and they have been rising for the past 20 years. Clearly, we need
some smart, data-driven thinking in this area.
How it affects patients’ health and wealth
1. Applied to healthcare, it will use specific health data of a population (or of a particular individual) and potentially help to
prevent epidemics, cure disease, cut down costs, etc.
2. Data mining has become an increasingly pervasive activity in all areas of medical science research. It has led to
the discovery of useful hidden patterns in massive databases. Ultimately, data mining techniques help physicians
quickly identify and diagnose potential cases.
3. Rural areas have relatively few healthcare facilities and very poor
doctor availability, but there may be people there with relatively
sufficient knowledge of pharmaceuticals who can treat a patient on an
urgent basis, using their knowledge of what may happen to the patient.
So we need a system that predicts what will happen to the patient in
less time and helps save the patient’s life. The main motive of this
project is to help people in need.
4. Hospital management
The ability to detect anomalous behaviour based on purchase, usage and other transactional information has made
data mining a key tool in a variety of organizations for detecting fraudulent claims, inappropriate prescriptions and other
abnormal behavioural patterns.
Healthcare management
The healthcare industry today generates large amounts of complex data about patients, hospital resources, disease diagnosis,
electronic patient records, medical devices, etc. This data is a key resource to be processed and analyzed for
knowledge extraction that supports cost savings and decision making.
Pharmaceutical industry
When there is no dispositive evidence favouring a particular treatment option, new treatment plans can be effectively suggested
based on the patient’s profile, history, physical examination and diagnosis, and on previous treatment patterns.
Personalized treatment planning
Healthcare organizations make customer relationship management decisions, physicians identify effective treatments and
best practices, and patients receive better and more affordable healthcare services.
Prediction of diseases
Monitoring patients’ vital signs, and more.
5. Personalized Medicine
One of the top goals is to create a personalised treatment plan based on an individual’s biology.
Predictive Analytics And Preventive Measures
Prevention is always better than cure. For the healthcare industry, it also happens to save a lot of
money.
The Ultimate EHR (electronic health record), also referred to as an EMR (electronic medical record)
This file would contain every piece of information about a patient’s health, would always be
up to date, and could be shared across any network.
Disease Modelling and Mapping
One of the flashiest uses of data science in the past few years has been tracking diseases and finding ways to
halt or prevent them.
Reduce Fraud And Enhance Security
The healthcare industry is 200% more likely to experience data breaches than any other industry,
because personal data is extremely valuable and profitable on black markets.
6. Applying data mining in healthcare has many positive, and even
life-saving, outcomes. So we are going to develop a model that
predicts a patient’s health from the medical history stored as an EHR
in the database of a hospital or healthcare organization, so that
specialists can predict the disease in less time and give the
patient proper treatment.
8. “Application of big data in medical science Brings revolution in managing health Care of humans”
by Dr. Gagandeep Jagdev (2015)
Domain: Big data
Tasks Performed:
Personalized treatment planning
Assisted diagnosis
Fraud detection
Monitor patient vital signs
Digitization of data
Technology Used (MapReduce stages):
First stage: mapping
Intermediate stages: shuffling
Final stage: reducing
Provides Information About:
Various diseases
Patient treatment
Hospital management info
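The mapping, shuffling and reducing stages listed above are the stages of the MapReduce model. The sketch below imitates them in plain Python on a few hypothetical patient records, counting how many records carry each diagnosis. The record format and the function names are assumptions for illustration, not taken from the cited paper; a real deployment would run these stages distributed across a cluster (for example with Hadoop) rather than in one process.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical record format: (patient_id, diagnosis).
Record = tuple[str, str]


def map_phase(records: Iterable[Record]) -> list[tuple[str, int]]:
    """Mapping: emit a (key, value) pair of (diagnosis, 1) for every record."""
    return [(diagnosis, 1) for _patient_id, diagnosis in records]


def shuffle_phase(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffling: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reducing: combine the grouped values for each key (here, a sum)."""
    return {key: sum(values) for key, values in grouped.items()}


records = [
    ("p1", "diabetes"),
    ("p2", "heart disease"),
    ("p3", "diabetes"),
    ("p4", "liver disorder"),
    ("p5", "heart disease"),
    ("p6", "diabetes"),
]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)  # {'diabetes': 3, 'heart disease': 2, 'liver disorder': 1}
```

Because each stage only depends on its own input, the map and reduce work can be split across machines; the shuffle is the step that moves all values for one key to the same place.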
9. “Data Mining Applications in Healthcare Sector” by M. Sumanth
Domain: Data mining
Tasks Performed:
Treatment management
Healthcare management
Customer relationship management
Fraud and abuse
Medical device industry
Pharmaceutical industry
System biology
Hospital management
Technology Used: N/A
Provides Information About:
Patient treatment
10. “Hybrid Approach for Heart Disease Detection Using Clustering and ANN”
by Neha Chikshe, Tejasweeta Dixit, Rashmi Gore, Prerana Akade (2016)
Domain: Clustering and ANN
Task Performed: Prediction of heart disease
Technology Used:
Clustering
Neural networks
Naive Bayes
Decision tree
K-nearest neighbour
Provides Information About: Heart disease
11. “Analysis of Data mining techniques for healthcare decision support system using liver disorder dataset”
by Tapas Ranjan Baitharu, Subhendu Kumar Pani (2016)
Domain: Data mining, ANN
Task Performed: Liver disease prediction analysis
Technology Used:
• Naïve Bayes
• Multilayer Perceptron
• ZeroR
Provides Information About:
Comparison of various algorithms
12. Hardware Requirements:
1. 1.5 GHz dual-core CPU
2. 4 GB RAM
3. 1024x768 minimum screen resolution
4. 10 GB of hard disk space
Software Requirements:
1. Microsoft Windows 7 or later
2. XAMPP web server
3. Text editor (Notepad)
4. Anaconda
13. We are using the following algorithms to implement
this project:
• Decision tree
• Logistic regression
• K-nearest neighbours
• Naive Bayes classifier
14. Decision trees are commonly used in operations research, specifically in
decision analysis, to help identify a strategy most likely to reach a goal, but
are also a popular tool in data mining.
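A decision tree is grown by repeatedly choosing the attribute whose split best separates the classes. One common criterion, used by ID3-style trees, is information gain: the drop in entropy after splitting on an attribute. The sketch below computes it in plain Python for two hypothetical binary attributes; the attribute names and rows are made up for illustration. In this project's stack, scikit-learn's `DecisionTreeClassifier` would do the actual tree building (it uses Gini impurity by default, with entropy available as an option).

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(rows: list[dict[str, int]], labels: list[int], attribute: str) -> float:
    """Parent entropy minus the weighted entropy of the children
    obtained by splitting on `attribute`."""
    children: dict[int, list[int]] = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[attribute], []).append(label)
    weighted = sum(
        len(child) / len(labels) * entropy(child) for child in children.values()
    )
    return entropy(labels) - weighted


# Hypothetical patients: 1 = yes / disease present, 0 = no / absent.
rows = [
    {"exercise_angina": 1, "smoker": 1},
    {"exercise_angina": 1, "smoker": 0},
    {"exercise_angina": 0, "smoker": 1},
    {"exercise_angina": 0, "smoker": 0},
]
labels = [1, 1, 0, 0]

for attr in ("exercise_angina", "smoker"):
    print(attr, information_gain(rows, labels, attr))
# exercise_angina 1.0
# smoker 0.0
```

Here the tree would split first on `exercise_angina`, since it separates the two classes perfectly, and would then recurse on each child until the leaves are pure or a stopping rule applies.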
15. Logistic regression is a classification algorithm used when the response
variable is categorical. The idea of logistic regression is to find a
relationship between the features and the probability of a particular outcome.
Why we are using logistic regression:
With binary classification, let ‘x’ be some feature and ‘y’ be the
output, which can be either 0 or 1. Based on the training data, we can then
predict whether a patient’s records indicate that the person has the
disease or not.
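Logistic regression models the probability of the positive class as σ(w·x + b), where σ(z) = 1 / (1 + e^(−z)) is the sigmoid function, and predicts 1 when that probability is at least 0.5. The plain-Python sketch below fits a single weight and bias by batch gradient descent on the mean log-loss. The one-feature toy data (a hypothetical standardized risk score), the learning rate and the number of iterations are assumptions for illustration; scikit-learn's `LogisticRegression` performs this fitting, with regularization, over many features at once.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs: list[float], ys: list[int], lr: float = 0.1,
                 epochs: int = 1000) -> tuple[float, float]:
    """Fit weight w and bias b by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: mean of (p - y) * x for w, (p - y) for b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(x: float, w: float, b: float) -> int:
    """Return 1 (disease) if P(y = 1 | x) >= 0.5, else 0."""
    return int(sigmoid(w * x + b) >= 0.5)


# Hypothetical standardized risk scores and labels (1 = disease present).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print([predict(x, w, b) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

The 0.5 threshold is the usual default; a screening tool might lower it to catch more true cases at the cost of more false alarms.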
16. A naive Bayes classifier is an algorithm that uses Bayes' theorem to classify objects.
Naive Bayes classifiers assume strong, or naive, independence between attributes of
data points. Naive Bayes is also known as simple Bayes or independence Bayes.
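Formula:-
P(c|x) = P(x|c) × P(c) / P(x)
P(c|X) ∝ P(x1|c) × P(x2|c) × ... × P(xn|c) × P(c)
where c is the class and X = (x1, ..., xn) are the attributes of a data point.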
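Bayes' theorem, combined with the naive independence assumption, can be applied directly by counting. The sketch below is a categorical naive Bayes classifier in plain Python: it estimates P(c) and each P(x_i | c) from frequency counts with Laplace (add-one) smoothing, then picks the class with the largest sum of log-probabilities, which is the same as the largest product of probabilities. The class name, the two binary attributes and the six training rows are hypothetical; for continuous attributes such as cholesterol, scikit-learn's `GaussianNB` instead models each P(x_i | c) as a normal distribution.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for discrete attributes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, rows: list[tuple[int, ...]], labels: list[int]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n_rows = len(labels)
        n_features = len(rows[0])
        # Distinct values seen for each attribute (needed for smoothing).
        self.values = [{row[j] for row in rows} for j in range(n_features)]
        # counts[c][j][v] = number of class-c rows whose attribute j equals v.
        self.counts = defaultdict(lambda: [Counter() for _ in range(n_features)])
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
        return self

    def log_score(self, row: tuple[int, ...], c: int) -> float:
        """log P(c) + sum over j of log P(x_j | c)."""
        score = math.log(self.class_counts[c] / self.n_rows)
        for j, v in enumerate(row):
            numerator = self.counts[c][j][v] + self.alpha
            denominator = self.class_counts[c] + self.alpha * len(self.values[j])
            score += math.log(numerator / denominator)
        return score

    def predict(self, row: tuple[int, ...]) -> int:
        """Return the class with the highest (log) posterior score."""
        return max(self.class_counts, key=lambda c: self.log_score(row, c))


# Hypothetical rows: (exercise_angina, high_blood_pressure); 1 = disease present.
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = [1, 1, 1, 0, 0, 0]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict((1, 1)), model.predict((0, 0)))  # 1 0
```

For (1, 1) the unnormalized scores are 0.5 × 0.8 × 0.6 = 0.24 for class 1 and 0.5 × 0.2 × 0.4 = 0.04 for class 0, so class 1 wins. Working in log space avoids underflow when many attributes are multiplied.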
17. K-nearest neighbors (KNN) is a simple algorithm that stores all available cases
and classifies new cases based on a similarity measure (e.g., distance
functions). KNN has been used in statistical estimation and pattern
recognition.
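The classification step described above can be sketched in plain Python: compute the Euclidean distance from the new case to every stored case, take the k closest, and return the majority label. The 2-D points (standing in for two already-scaled attributes) and k = 3 are assumptions for illustration; scikit-learn's `KNeighborsClassifier` implements the same idea with faster neighbour search.

```python
import math
from collections import Counter


def knn_predict(train: list[tuple[tuple[float, ...], int]],
                query: tuple[float, ...], k: int = 3) -> int:
    """Classify `query` by majority vote among its k nearest stored cases."""
    # Sort stored cases by Euclidean distance to the query; keep the k closest.
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: (scaled attributes, label), 1 = disease present.
train = [
    ((1.0, 1.0), 0), ((1.0, 2.0), 0), ((2.0, 1.0), 0),
    ((6.0, 6.0), 1), ((7.0, 6.0), 1), ((6.0, 7.0), 1),
]
print(knn_predict(train, (2.0, 2.0)))  # 0
print(knn_predict(train, (6.0, 5.0)))  # 1
```

Because KNN relies on distances, attributes measured on very different scales (for example cholesterol in mg/dl versus a 0/1 sex flag) are normally standardized first; otherwise the attribute with the largest values dominates the distance.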
18. Programming Languages :-
1. HTML & CSS:- for designing the interface
2. PHP:- connecting user interface with database
3. Python:- for mining data and generating results
4. SQL:- for Database Management
Python Libraries Required
1. Pandas:- for data manipulation and analysis
2. Seaborn:- for data visualization
3. Scikit-learn (sklearn):- for data mining
25. We used the Statlog (Heart) Data Set from the UCI Machine Learning Repository.
This dataset contains 13 attributes:
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0, 1, 2)
8. maximum heart rate achieved
9. exercise-induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
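A minimal sketch for reading the data into Python, assuming the whitespace-separated `heart.dat` layout of the UCI distribution: the 13 numeric attributes in the order listed above, followed by the class, where 1 means absence and 2 means presence of heart disease. The short column names and the function name are hypothetical labels chosen here, and the file path in the usage comment is an assumed local copy.

```python
from typing import Iterable

# Hypothetical short names for the 13 attributes, in the order listed above.
COLUMNS = [
    "age", "sex", "chest_pain_type", "resting_bp", "cholesterol",
    "fasting_blood_sugar", "resting_ecg", "max_heart_rate",
    "exercise_angina", "oldpeak", "st_slope", "major_vessels", "thal",
]


def parse_statlog_heart(lines: Iterable[str]) -> tuple[list[dict[str, float]], list[int]]:
    """Split each whitespace-separated line into 13 attributes plus the class.

    Returns the attribute rows and binary labels (1 = disease present,
    0 = absent), converted from the file's 1 = absent / 2 = present coding.
    """
    rows: list[dict[str, float]] = []
    labels: list[int] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS) + 1:
            raise ValueError(f"expected {len(COLUMNS) + 1} fields, got {len(fields)}")
        rows.append(dict(zip(COLUMNS, map(float, fields[:-1]))))
        labels.append(1 if int(float(fields[-1])) == 2 else 0)
    return rows, labels


# Usage (assumed local copy of the file):
# with open("heart.dat") as f:
#     rows, labels = parse_statlog_heart(f)
```

Recoding the class to 0/1 up front lets the same labels feed every classifier and the accuracy calculation without further conversion.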
29. Doctor dashboard: shows the doctor each patient’s predicted result,
along with their reports.
Patient dashboard: shows the patient their predicted result
and report.
31. We observed that each algorithm has its own characteristics, which
determine its accuracy.
So we used an election (voting) approach to produce the result: the
system bases its prediction on the mean of the results from the
various algorithms, which gives a more accurate and appropriate
result and also makes the application more robust.
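The election idea can be sketched as soft voting: each model outputs its probability that the patient has the disease, the system averages these probabilities, and it predicts disease when the mean is at least 0.5. Hard (majority) voting on the 0/1 predictions is the other common variant. The model names and probabilities below are placeholders, not the project's results, and breaking hard-vote ties toward 1 is an assumption made here. scikit-learn offers the same scheme as `VotingClassifier` with `voting="soft"` or `voting="hard"`.

```python
from collections import Counter
from statistics import mean


def soft_vote(probabilities: dict[str, float], threshold: float = 0.5) -> tuple[int, float]:
    """Average the per-model probabilities of disease and apply a threshold."""
    avg = mean(probabilities.values())
    return int(avg >= threshold), avg


def hard_vote(predictions: dict[str, int]) -> int:
    """Majority vote over per-model 0/1 predictions (ties resolve to 1 here)."""
    counts = Counter(predictions.values())
    return int(counts[1] >= counts[0])


# Placeholder outputs from the four models for one patient.
probs = {
    "decision_tree": 0.9,
    "logistic_regression": 0.6,
    "knn": 0.4,
    "naive_bayes": 0.7,
}
label, avg = soft_vote(probs)
print(label, round(avg, 2))  # 1 0.65
print(hard_vote({name: int(p >= 0.5) for name, p in probs.items()}))  # 1
```

Soft voting keeps each model's confidence, so one very certain model can outweigh two hesitant ones; hard voting treats every model's decision equally.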
32. In this system, we used different data mining algorithms and
calculated their accuracy, which is given above. We
observed that each algorithm has its own characteristics, which
determine its accuracy. So the system bases its
prediction on the mean of the results from the various algorithms,
which gives a more accurate and appropriate result.
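Accuracy, the measure compared across the algorithms, is the fraction of test cases predicted correctly. The sketch below derives it from the four cells of a binary confusion matrix (the table that scikit-learn's `confusion_matrix` produces); the function names are hypothetical and the label lists are made-up examples, not the project's test set.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """(TP + TN) / total number of cases."""
    c = confusion_counts(y_true, y_pred)
    return (c["tp"] + c["tn"]) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # {'tp': 2, 'tn': 2, 'fp': 1, 'fn': 1}
print(accuracy(y_true, y_pred))  # 0.6666666666666666
```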
33. M. Sumanth. “Data Mining Applications in Healthcare Sector”.
Neha Chikshe, Tejasweeta Dixit, Rashmi Gore, Prerana Akade (2016). “Hybrid
Approach for Heart Disease Detection Using Clustering and ANN”. IJRITCC, Jan.
2016, Volume 4, Issue 1.
Dr. Gagandeep Jagdev (2015). “Application of Big Data in Medical Science Brings
Revolution in Managing Health Care of Humans”. IJEEE, Jan. 2015, Volume 2, Spl.
Issue 1.
Tapas Ranjan Baitharu, Subhendu Kumar Pani (2016). “Analysis of Data Mining
Techniques for Healthcare Decision Support System Using Liver Disorder Dataset”.
Books:
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. “An
Introduction to Statistical Learning”. Springer.
34. Websites:
1. www.ijritcc.com/index.php/ijritcc/article/view/1718, accessed on 17/08/18
2. www.slideshare.net/madallapallisumanth/data mininginhealthcaresector, accessed
on 17/08/18
3. www.issuu.com/ijeeeapm/docs/id77, accessed on 17/08/18
4. www.immagic.com/eLibrary/ARCHIVES/GENERAL/WIKIPEDI/W1120615B.pdf,
accessed on 31/08/18
5. www.Wikipedia.com, accessed on 05/10/18
6. www.saedsayad.com, accessed on 10/10/18
7. www.scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html,
accessed on 09/01/19
8. www.python.org, accessed on 15/01/19
9. www.archive.ics.uci.edu/ml/datasets/Heart+Disease, accessed on 28/01/19