Researc-paper_Project Work Phase-1 PPT (21CS09).pptx
1. PROJECT TOPIC: EARLY STAGE DIABETES PREDICTION USING MACHINE LEARNING
PROJECT GROUP ID: 21CS09
Presentation By:
Thejaswini V A 1VE18CS163
Veenashree N 1VE18CS169
Vivek B S 1VE18CS175
Yathindra Prasad N 1VE18CS178
Guided By:
Dr. Nirmala S Guptha
Prof. and HOD
Dept. of CSE-AI
SVCE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SRI VENKATESHWARA COLLEGE OF ENGINEERING
VIDYANAGAR, BANGALORE-562157
2. AGENDA
1. INTRODUCTION
2. WORK FLOW DIAGRAM
3. DESCRIPTION OF MODELS
4. OBJECTIVES
5. INPUT TO MODEL
6. OUTPUT OF THE MODEL
7. HARDWARE AND SOFTWARE REQUIREMENTS
8. STATUS
9. CONCLUSION
10. REFERENCES
3. INTRODUCTION
• Diabetes is among the most harmful diseases in the world; contributing factors include obesity
and high blood glucose levels.
• It involves the hormone insulin: when the body does not make enough insulin, metabolism
becomes abnormal.
• According to the World Health Organization (WHO), about 422 million people suffer from
diabetes, particularly in low- and middle-income countries, and this could rise to 490
million by 2030.
• Diabetes is prevalent in many countries, including Canada, China, and India.
• India's population is now more than 1,000 million (1 billion), and the number of diabetics in
India is about 40 million. Diabetes is also a major cause of death worldwide.
5. DESCRIPTION OF MODELS
A. K-NEAREST NEIGHBOR
• K-nearest neighbor (KNN) is a supervised machine learning algorithm used for
classification and regression problems. It is based on measuring the similarity between data points.
• To predict a value for a new point, KNN considers that point's nearest neighbors.
• KNN uses a distance measure (for example, Euclidean distance) to find the k nearest
neighbors.
• In a regression problem, KNN returns the mean of the k neighbors' labels; in a
classification problem, it returns the mode of the k labels.
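The KNN description above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it assumes Euclidean distance, and the toy rows ([glucose, BMI]) and labels are made up for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_labels(train_X, train_y, query, k):
    """Return the labels of the k training points closest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return [train_y[i] for i in order[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: return the mode (most common value) of the k labels."""
    labels = k_nearest_labels(train_X, train_y, query, k)
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: return the mean of the k labels."""
    labels = k_nearest_labels(train_X, train_y, query, k)
    return sum(labels) / len(labels)


# Made-up toy data: [glucose, BMI] -> 1 = diabetic, 0 = not diabetic
train_X = [[85, 22.0], [90, 24.5], [150, 33.0], [160, 35.5], [145, 31.0]]
train_y = [0, 0, 1, 1, 1]

print(knn_classify(train_X, train_y, [155, 34.0], k=3))
print(knn_regress(train_X, train_y, [88, 23.0], k=3))
```

In practice the features would usually be scaled first, since a feature with a large numeric range (such as glucose) otherwise dominates the distance.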
6. B. LOGISTIC REGRESSION
• Logistic regression is a statistical technique used to predict a dependent variable
from two or more independent variables.
• It starts from a linear relationship between the variables, fitted as a linear equation:
y_i = β_0 + β_1 x_{i1} + ... + β_n x_{in} + ϵ (1)
• The linear term is then passed through the logistic (sigmoid) function, which maps it to a
value between 0 and 1:
Y = 1 / (1 + e^{-(mx + b)})
where, for each observation i,
y_i = dependent variable,
x_{i1}, ..., x_{in} = independent variables,
β_0 = y-intercept,
β_1, ..., β_n = slope coefficients for each independent variable,
ϵ = the model's error term or residuals,
and mx + b is the linear term in the single-variable case (slope m, intercept b).
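To make the two formulas above concrete, here is a minimal sketch of how a linear term is turned into a probability and then a class label. The coefficient values, the feature order ([glucose, BMI]), and the 0.5 decision threshold are assumptions chosen for illustration; they are not fitted to the project's data.

```python
import math


def linear_predictor(x, betas, intercept):
    """Linear term: intercept + sum of coefficient * feature."""
    return intercept + sum(b * xi for b, xi in zip(betas, x))


def sigmoid(z):
    """Logistic function Y = 1 / (1 + e^-z), mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, betas, intercept):
    """Estimated probability of the positive class for one sample."""
    return sigmoid(linear_predictor(x, betas, intercept))


def predict_label(x, betas, intercept, threshold=0.5):
    """Turn the probability into a 0/1 label using a decision threshold."""
    return 1 if predict_proba(x, betas, intercept) >= threshold else 0


# Hypothetical coefficients for [glucose, BMI]; not learned from real data
betas = [0.04, 0.05]
intercept = -6.0

print(predict_proba([150, 30.0], betas, intercept))
print(predict_label([90, 22.0], betas, intercept))
```

Fitting the coefficients (for example, by maximum likelihood) is the training step and is not shown here.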
7. C. DECISION TREE
• Decision tree is a supervised learning technique that can be used for both classification
and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
• A decision tree has two types of nodes: decision nodes, which test a feature and branch
further, and leaf nodes, which hold the outcome and have no further branches.
• The decisions or tests are performed on the basis of the features of the given dataset.
• It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
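The structure described above (decision nodes that test a feature, leaf nodes that hold an outcome) can be sketched as a small nested dictionary. The tree below is hand-written for illustration only: the feature names (`glucose`, `bmi`) and the thresholds are assumptions, not splits learned from the project's dataset.

```python
# Decision nodes carry a feature, a threshold, and two branches;
# leaf nodes carry only a label (1 = diabetic, 0 = not diabetic).
tree = {
    "feature": "glucose",
    "threshold": 125,
    "left": {"label": 0},  # glucose <= 125
    "right": {             # glucose > 125: test a second feature
        "feature": "bmi",
        "threshold": 30,
        "left": {"label": 0},   # bmi <= 30
        "right": {"label": 1},  # bmi > 30
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, following one branch per decision node."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(predict(tree, {"glucose": 150, "bmi": 33}))
```

A learning algorithm would choose the features and thresholds automatically, typically by picking the split that best separates the classes at each node.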
8. D. RANDOM FOREST
• Random Forest is a popular supervised machine learning algorithm. It can be used for both
classification and regression problems.
• As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset." Instead of relying on one decision tree, the random
forest takes the prediction from each tree and predicts the final output based on the
majority vote of those predictions.
• A greater number of trees generally leads to more stable accuracy and reduces the risk of
overfitting, although the gains level off as more trees are added.
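The idea of training many trees on different subsets and taking a majority vote can be sketched as follows. This is a deliberate simplification under stated assumptions: each "tree" is only a one-split decision stump, each stump sees a bootstrap sample and one randomly chosen feature, and the toy [glucose, BMI] data is made up. A real random forest grows deeper trees.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree on a single feature by minimising training errors."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        for left_label, right_label in ((0, 1), (1, 0)):
            errors = sum(
                (left_label if row[feature] <= threshold else right_label) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def stump_predict(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    """Final output is the majority vote over all trees."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data: [glucose, BMI] -> 1 = diabetic, 0 = not diabetic
X = [[85, 22.0], [90, 24.5], [95, 23.0], [150, 33.0], [160, 35.5], [145, 31.0]]
y = [0, 0, 0, 1, 1, 1]

forest = fit_forest(X, y)
print(forest_predict(forest, [170, 40.0]))
```

Because every tree sees a slightly different sample, individual trees make different mistakes, and the vote tends to cancel them out.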
9. OBJECTIVES
• To design an efficient prediction pipeline that can support different types of
models.
• To increase the accuracy of predicting whether a patient is suffering from diabetes or not.
• To design and implement a model that runs efficiently.
• To develop a model that is precise in its operation.
12. SOFTWARE AND HARDWARE REQUIREMENTS
Hardware requirements:
• Processor : Intel i3 or higher, 2.00GHz or faster
• RAM : 4 GB and above
• Hard disk : 4 GB and above
• Input device : Standard Keyboard and Mouse
• Output device : VGA and High Resolution monitor
Software requirements:
• Operating System: Windows 8 or higher
• Programming : Python 3 and related libraries
• IDE : Jupyter Notebook
13. STATUS
• We are currently working on the backend process, which includes several stages needed to
produce the required output.
• We have extracted the dataset and performed the initial stage of data cleansing and data
analysis.
• We have visualized the data and split the dataset.
• We are currently working on the different classification algorithms.
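The slides do not say how the cleansing or splitting was done, so the following is only a sketch under assumptions: missing values are represented as `None`, incomplete rows are dropped, and the data is split into training and test sets with a hypothetical 80/20 ratio. The helper names `drop_incomplete` and `split_rows` are made up for this example.

```python
import random


def drop_incomplete(rows):
    """Data cleansing step: keep only rows with no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then split into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows: [glucose, BMI, label]
raw = [
    [85, 22.0, 0], [None, 30.1, 1], [150, 33.0, 1], [90, 24.5, 0],
    [160, 35.5, 1], [95, None, 0], [145, 31.0, 1], [88, 23.4, 0],
    [100, 26.0, 0], [170, 36.2, 1], [92, 21.9, 0], [155, 34.8, 1],
]

clean = drop_incomplete(raw)
train, test = split_rows(clean)
print(len(clean), len(train), len(test))
```

Fixing the seed keeps the split reproducible, so different classification models can be compared on the same test rows.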
14. CONCLUSION
• Machine learning has great potential to revolutionise diabetes risk prediction
with the help of advanced computational methods.
• Detecting diabetes in its early stages is key to treatment.
• This work describes a machine learning approach for predicting diabetes at an early stage.
• These techniques may also help researchers develop an accurate and effective
tool that reaches clinicians and helps them make better decisions
about disease status.
15. REFERENCES
• Choubey, D.K., Paul, S., Kumar, S., Kumar, S., 2017. Classification of Pima Indian
diabetes dataset using naive bayes with genetic algorithm as an attribute selection, in:
Communication and Computing Systems: Proceedings of the International Conference on
Communication and Computing System (ICCCS 2016), pp. 451–455.
• Dhomse Kanchan B., M.K.M., 2016. Study of Machine Learning Algorithms for Special
Disease Prediction using Principal of Component Analysis, in: 2016 International
Conference on Global Trends in Signal Processing, Information Computing and
Communication, IEEE. pp. 5–10.