Project presentation of
“HEALTH PREDICTION ANALYSIS USING DATA MINING”
Presented by
Name                        Roll no.
Kritika Ashok Rane          28
Ashish Ravindra Salve       30
Ashwini Dhananjay Sawant    31
Under Guidance Of
Prof. J. P. Patil
 Why is data mining needed in healthcare?
1. Applying data mining to healthcare has many positive, and even life-saving, outcomes.
2. Digitization creates vast quantities of health information; data mining consolidates and analyzes that information with
specific technologies.
3. The costs of treatments are much higher than they should be, and they have been rising for the past 20 years. Clearly, we need
some smart, data-driven thinking in this area.
 How will it affect patients' health and wealth?
1. Applied to healthcare, it uses the health data of a population (or of a particular individual) and can potentially help to
prevent epidemics, cure disease, cut down costs, etc.
2. Data mining has become an increasingly pervasive activity in all areas of medical science research. It has resulted in
the discovery of useful hidden patterns from massive databases. With data mining techniques, physicians can identify and
diagnose potential cases more quickly.
Rural areas have relatively few healthcare facilities, and doctor
availability is very poor. There may, however, be some people with
sufficient knowledge of pharmaceuticals who can treat a patient in an
emergency if they know what may happen to the patient. So we need
something that predicts, in little time, what will happen to the
patient and helps save the patient's life. Our main motive for doing
this project is to help people in need.
 Hospital management
The ability to detect anomalous behaviour from purchase, usage and other transactional information has made
data mining a key tool in a variety of organizations for detecting fraudulent claims, inappropriate prescriptions and other
abnormal behavioural patterns.
 Healthcare management
The healthcare industry today generates large amounts of complex data about patients, hospital resources, disease diagnosis,
electronic patient records, medical devices, etc. This data is a key resource to be processed and analyzed for
knowledge extraction that supports cost savings and decision making.
 Pharmaceutical industry
When there is no dispositive evidence favouring a particular treatment option, new treatment plans can be effectively
suggested based on the patient's profile, history, physical examination, diagnosis and previous treatment patterns.
 Personalized treatment planning
Healthcare organizations make customer relationship management decisions, physicians identify effective treatments and
best practices, and patients receive better and more affordable healthcare services.
 Prediction of diseases
 Monitoring patients' vital signs, and many more.
 Personalized Medicine
One of the top goals is to create a personalised treatment plan based on individual biology.
 Predictive Analytics And Preventive Measures
Prevention is always better than cure. For the healthcare industry, it also happens to save a lot of
money.
 The Ultimate EHR (electronic health record), also referred to as EMR (electronic medical record)
This precious file would contain every piece of information about a patient's health, would always be
up to date, and could be shared across any network.
 Disease Modelling and Mapping
One of the flashiest uses of data science in the past few years has been in tracking diseases and finding ways to
halt or prevent them.
 Reduce Fraud And Enhance Security
The healthcare industry is 200% more likely to experience data breaches than any other industry,
because personal data is extremely valuable and profitable on black markets.
Applying data mining to healthcare has many positive, and even
life-saving, outcomes. So we are going to develop a model that
predicts the health of a patient from the patient's medical history,
stored as an EHR in the database of the hospital or healthcare
organization, so that specialists can predict the disease in less time
and give the patient the proper treatment.
“Application of Big Data in Medical Science Brings Revolution in Managing Health Care of Humans”
by Dr. Gagandeep Jagdev (2015)
Domain: Big data
Task Performed:
 Personalized treatment planning
 Assisted diagnosis
 Fraud detection
 Monitor patient vital signs
 Digitization of data
Technology Used:
 First stage: mapping
 Intermediate stages: shuffling
 Final stage: reducing
Provide Information About:
 Various diseases
 Patient treatment
 Hospital management info
“Data Mining Applications in Healthcare Sector” by M. Sumanth
Domain: Data mining
Task Performed:
 Treatment management
 Healthcare management
 Customer relationship management
 Fraud and abuse
 Medical device industry
 Pharmaceutical industry
 System biology
 Hospital management
Technology Used: not applicable
Provide Information About:
 Patient treatment
“Hybrid Approach for Heart Disease Detection Using Clustering and ANN”
by Neha Chikshe, Tejasweeta Dixit, Rashmi Gore, Prerana Akade (2016)
Domain: Clustering and ANN
Task Performed: Prediction of heart disease
Technology Used:
 Clustering
 Neural networks
 Naive Bayes
 Decision tree
 K-nearest neighbour
Provide Information About: Heart disease
“Analysis of Data Mining Techniques for Healthcare Decision Support System Using Liver Disorder Dataset”
by Tapas Ranjan Baitharu, Subhendu Kumar Pani (2016)
Domain: Data mining, ANN
Task Performed: Liver disease prediction analysis
Technology Used:
• Naïve Bayes
• Multilayer Perceptron
• ZeroR
Provide Information About:
Comparison of various algorithms
 Hardware Requirements:
1. 1.5 gigahertz (GHz) dual-core CPU
2. 4 GB RAM
3. 1024x768 minimum screen resolution
4. 10 GB of hard disk space
 Software Requirements:
1. Microsoft Windows 7+
2. XAMPP web server
3. Text editor (Notepad)
4. Anaconda
We are using the following algorithms to implement
this project:
• Decision tree
• Logistic regression
• K-nearest neighbours
• Naive Bayes classifier
Decision trees are commonly used in operations research, specifically in
decision analysis, to help identify a strategy most likely to reach a goal, but
are also a popular tool in data mining.
Example:-
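As a minimal sketch of how a decision tree picks a split, the code below scores every candidate threshold on one feature by weighted Gini impurity and keeps the best one. It uses only the Python standard library; the function names, the choice of Gini impurity, and the toy heart-rate values are assumptions made for illustration, not the project's actual code or data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Try the midpoint between each pair of neighbouring distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up toy data: maximum heart rate achieved and a 0/1 disease label.
max_heart_rate = [105, 112, 120, 125, 150, 160, 165, 172]
has_disease = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_split(max_heart_rate, has_disease)
print(f"split at max heart rate <= {threshold}, weighted Gini = {impurity}")
```

A full decision tree repeats this search over every attribute and then recurses into each side of the chosen split.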
 Logistic regression is a classification algorithm used where the response
variable is categorical. The idea of logistic regression is to find a
relationship between the features and the probability of a particular outcome.
 Why we are using logistic regression:
With binary classification, let ‘x’ be some feature and ‘y’ be the
output, which can be either 0 or 1. Based on training data, we can then
predict whether a patient's records indicate that the person has the
disease or not.
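The idea can be sketched with a single feature: the model computes sigmoid(w*x + b) as the probability that y = 1, and w and b are fitted by gradient descent on the training data. This is a from-scratch illustration using only the standard library; the learning rate, the number of epochs and the toy values are assumptions, and the project itself lists sklearn for this step.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Made-up, already standardized feature values with 0/1 labels.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)


def predict_proba(x):
    """Estimated probability that y = 1 for feature value x."""
    return sigmoid(w * x + b)


print(predict_proba(1.8), predict_proba(-1.8))
```

A probability of 0.5 or more is then read as "has the disease" (y = 1), anything below as "does not" (y = 0).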
A naive Bayes classifier is an algorithm that uses Bayes' theorem to classify objects.
Naive Bayes classifiers assume strong, or naive, independence between attributes of
data points. Naive Bayes is also known as simple Bayes or independence Bayes.
Formula:-
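Bayes' theorem states P(c | x) = P(x | c) P(c) / P(x). With the naive independence assumption, the score for a class c given attributes x1, ..., xn is proportional to P(c) * P(x1 | c) * ... * P(xn | c). The sketch below estimates these terms by counting and then normalizes the scores. The add-one (Laplace) smoothing, the function names and the toy rows are assumptions made for illustration, not the project's code or data.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    feature_counts = Counter()          # (attribute index, value, label) -> count
    values_per_feature = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
            values_per_feature[i].add(value)
    return class_counts, feature_counts, values_per_feature


def posterior(model, row):
    """Return {label: probability} using P(c) * product of P(x_i | c)."""
    class_counts, feature_counts, values_per_feature = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(c)
        for i, value in enumerate(row):
            # Add-one smoothing so an unseen value never gives probability zero.
            numerator = feature_counts[(i, value, label)] + 1
            denominator = count + len(values_per_feature[i])
            score *= numerator / denominator
        scores[label] = score
    norm = sum(scores.values())  # plays the role of P(x)
    return {label: score / norm for label, score in scores.items()}


# Made-up rows of two binary attributes: (sex, exercise-induced angina).
rows = [(1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (0, 0), (1, 0), (0, 1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_naive_bayes(rows, labels)
probabilities = posterior(model, (1, 1))
print(probabilities)
```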
K-nearest neighbours (KNN) is a simple algorithm that stores all available
cases and classifies new cases based on a similarity measure (e.g., distance
functions). KNN has been used in statistical estimation and pattern
recognition.
Example:-
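A minimal sketch of the idea: keep all training cases, measure the Euclidean distance from a new case to each of them, and take a majority vote among the k closest. The value k=3, the function name and the toy points are assumptions made for illustration; real attributes would normally be scaled first so that no single attribute dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up, already scaled two-attribute cases with 0/1 labels.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (3.0, 3.0)))
```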
 Programming Languages:
1. HTML & CSS: for designing the interface
2. PHP: for connecting the user interface with the database
3. Python: for mining data and generating results
4. SQL: for database management
 Python Libraries Required:
1. Pandas: for data manipulation and analysis
2. Seaborn: for data visualization
3. Sklearn (scikit-learn): for data mining
 We used the Statlog (Heart) Data Set from the UCI Machine Learning Repository.
This data source contains 13 attributes:
1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0,1,2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
Attribute types:
Real: 1, 4, 5, 8, 10, 12
Ordered: 11
Binary: 2, 6, 9
Nominal: 7, 3, 13
Variable to be predicted:
Absence (1) or presence (2) of heart disease
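A rough sketch of how such records could be read before mining, assuming each row holds the 13 attributes followed by the class value as whitespace-separated numbers. The column names are hypothetical labels chosen here, the two sample rows are illustrative only, and recoding the class from 1/2 to 0/1 is an assumed convenience rather than something the slides state.

```python
# Hypothetical column names for the 13 attributes plus the class value.
COLUMNS = [
    "age", "sex", "chest_pain_type", "resting_bp", "serum_cholesterol",
    "fasting_blood_sugar", "resting_ecg", "max_heart_rate",
    "exercise_angina", "oldpeak", "st_slope", "major_vessels", "thal",
    "target",
]


def parse_statlog(text):
    """Parse whitespace-separated rows into dicts; recode target 1/2 to 0/1."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
        record = dict(zip(COLUMNS, values))
        # In the data set, 1 means absence and 2 means presence of heart disease.
        record["target"] = 1 if record["target"] == 2 else 0
        records.append(record)
    return records


# Illustrative rows in the assumed layout, not claimed to be actual records.
sample = """
70 1 4 130 322 0 2 109 0 2.4 2 3 3 2
67 0 3 115 564 0 2 160 0 1.6 2 0 7 1
"""

records = parse_statlog(sample)
print(records[0]["age"], records[0]["target"], records[1]["target"])
```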
EHR creation interface
Doctor dashboard showing all records created by the doctor.
Doctor dashboard showing the patient's predictive result and reports to the doctor.
Patient dashboard showing the predictive result and report to the patient.
 We observed that each algorithm has its own properties, which
determine its accuracy.
 So we used a voting ("election") approach to state the results: the
system bases its prediction on the mean of the results from the
various algorithms, which helps to get a more accurate and appropriate
result and also makes the application more robust.
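The averaging step can be sketched as follows, assuming each algorithm reports a probability between 0 and 1 that the disease is present. The model names, the probability values and the 0.5 cut-off are hypothetical; the slides do not say whether the system averages probabilities or 0/1 predictions.

```python
def ensemble_predict(probabilities, threshold=0.5):
    """Average the per-model probabilities and turn the mean into a 0/1 label."""
    mean_probability = sum(probabilities.values()) / len(probabilities)
    label = 1 if mean_probability >= threshold else 0
    return mean_probability, label


# Hypothetical outputs of the four algorithms for one patient.
model_outputs = {
    "decision_tree": 1.0,
    "logistic_regression": 0.72,
    "knn": 0.67,
    "naive_bayes": 0.41,
}

mean_probability, label = ensemble_predict(model_outputs)
print(f"mean probability = {mean_probability:.2f}, predicted label = {label}")
```

Averaging means a single algorithm that disagrees (here the hypothetical Naive Bayes output) does not flip the final result on its own.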
 In this system, we used different data mining algorithms and
calculated their accuracy, which is given above. We observed that
each algorithm has its own properties, which determine its
accuracy. So the system bases its prediction on the mean of the
results from the various algorithms, which helps to get a more
accurate and appropriate result.
 M. Sumanth. “Data Mining Applications in Healthcare Sector”.
 Neha Chikshe, Tejasweeta Dixit, Rashmi Gore, Prerana Akade (2016). “Hybrid
Approach for Heart Disease Detection Using Clustering and ANN”. IJRITCC, Jan 2016,
Volume 4, Issue 1.
 Dr. Gagandeep Jagdev (2015). “Application of Big Data in Medical Science Brings
Revolution in Managing Health Care of Humans”. IJEEE, Jan 2015, Volume 2, Spl.
Issue 1.
 Tapas Ranjan Baitharu, Subhendu Kumar Pani (2016). “Analysis of Data Mining
Techniques for Healthcare Decision Support System Using Liver Disorder Dataset”.
Books:
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. “An
Introduction to Statistical Learning”. Springer.
Websites:
1. www.ijritcc.com/index.php/ijritcc/article/view/1718, accessed on 17/08/18
2. www.slideshare.net/madallapallisumanth/data mininginhealthcaresector, accessed on 17/08/18
3. www.issuu.com/ijeeeapm/docs/id77, accessed on 17/08/18
4. www.immagic.com/eLibrary/ARCHIVES/GENERAL/WIKIPEDI/W1120615B.pdf, accessed on 31/08/18
5. www.Wikipedia.com, accessed on 05/10/18
6. www.saedsayad.com, accessed on 10/10/18
7. www.scikit-learn.org/stable/modules/generated/sklearn.metrics.confusionmatrix.html, accessed on 9/01/19
8. www.python.org, accessed on 15/01/19
9. www.archive.ics.uci.edu/ml/datasets/Heart + Disease, accessed on 28/01/19
