IMAGE CLASSIFICATION USING KNN, RANDOM FOREST AND SVM ALGORITHMS ON GLAUCOMA DATASETS, WITH THE ACCURACY, SENSITIVITY, AND SPECIFICITY OF EACH ALGORITHM
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
1. PROGRESS REPORT ON
IMAGE CLASSIFICATION USING
DIFFERENT CLASSICAL APPROACHES
UNIVERSITY INSTITUTE OF TECHNOLOGY
THE UNIVERSITY OF BURDWAN
(Dept. of Information Technology, 2016-2020)
SUPERVISOR: MR. ARINDAM CHOWDHURY
SUBMITTED BY:
(GROUP-03) - 7th Semester
PRASHANT CHOUDHARY (2016-3003)
VIKASH KUMAR (2016-3028)
RAKESH RANJAN (2016-3027)
SUMIT ABHISHEK (2016-3031)
2. Contents
1. Abstract
2. Introduction
3. Problem Statement and Data sets
4. Some terminologies
5. Software & Hardware Requirement
6. Different models used (Algorithms)
a. K-Nearest Neighbors
b. Random Forest Classification
c. Adaptive Boosting
d. Support Vector Machine
7. Implementation of our models on problem set
8. Comparison between various Algorithms
9. Future improvements and scopes
10. Conclusion
11. References
3. ABSTRACT
Image classification is a complex process that may be affected by many
factors. This paper examines current practices, problems, and prospects
of image classification. The emphasis is placed on the summarization of
major advanced classification approaches and the techniques used for
improving classification accuracy. In addition, some important issues
affecting classification performance are discussed. This literature review
suggests that designing a suitable image‐processing procedure is a
prerequisite for a successful classification of remotely sensed data into a
thematic map. Effective use of multiple features of remotely sensed data
and the selection of a suitable classification method are especially
significant for improving classification accuracy. Non‐parametric
classifiers such as neural network, decision tree classifier, and
knowledge‐based classification have increasingly become important
approaches for multisource data classification. Integration of remote
sensing, geographical information systems (GIS), and expert system
emerges as a new research frontier.
More research, however, is needed to identify and reduce uncertainties
in the image‐processing chain to improve classification accuracy.
4. INTRODUCTION
Image classification follows the steps of pre-processing,
segmentation, feature extraction and classification. In a classification
system the database is very important: it contains predefined sample
patterns of the objects under consideration, which are compared with the
test object to assign it to the appropriate class. Image classification is
an important task in various fields such as biometry, remote sensing, and
biomedical imaging. In a typical classification system an image is captured
by a camera and subsequently processed. In supervised classification,
training first takes place on a known group of pixels; the trained
classifier is then used to classify other images. Unsupervised
classification uses the properties of the pixels to group them; these
groups are known as clusters and the process is called clustering. The
number of clusters is decided by the user. When trained pixels are not
available, unsupervised classification is used. Examples of classification
methods are Decision Trees, Artificial Neural Networks (ANN) and Support
Vector Machines.
5. PROBLEM STATEMENTS AND DATA SETS
Problem statement: To study a retina image dataset and to model a
classifier for predicting whether a person is suffering from glaucoma or not.
The problem statement for a document classifier has two aspects: the
document space and the set of document classes. The former defines the range
of input documents and the latter defines the output that the classifier can
produce.
Here in our project, the document space is a database consisting of several
numerical data sets derived from retinal images.
Data Sets: We have taken 255 retinal images and performed our
classification operations on them. We have used 70% of the image data set
for training our model and reserved the remaining 30% for testing it.
The features are extracted from the fundus images using image-processing
techniques - kurtosis, k-stat, mean, median, standard deviation - and the
obtained numerical features are stored in a dataset.
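The statistical features listed above can be sketched as follows. This is a minimal illustration in NumPy, assuming each fundus image has already been loaded as a grayscale pixel array; the `extract_features` function name and the tiny synthetic "image" are made up for the example, and the k-stat feature (for which `scipy.stats.kstat` could be used) is omitted to keep the sketch dependency-free.

```python
import numpy as np

def extract_features(image):
    """Compute the statistical features named above from a
    grayscale image given as a NumPy array of pixel values."""
    pixels = image.astype(float).ravel()
    mean = pixels.mean()
    std = pixels.std()
    # Excess kurtosis: fourth standardized moment minus 3.
    kurtosis = ((pixels - mean) ** 4).mean() / std ** 4 - 3
    return {
        "mean": mean,
        "median": float(np.median(pixels)),
        "std": std,
        "kurtosis": kurtosis,
    }

# Toy 2x2 "image" standing in for a real fundus image.
img = np.array([[10, 20], [30, 40]])
features = extract_features(img)
```

One such feature dictionary per image, stacked into a table, gives the numerical dataset that the classifiers below are trained on.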
6. Some Terminologies
Confusion Matrix:
A confusion matrix is a summary of prediction results on a classification problem.
The numbers of correct and incorrect predictions are summarized with count values
and broken down by each class. This is the key to the confusion matrix.
The confusion matrix shows the ways in which your classification model is
confused when it makes predictions.
It gives us insight not only into the errors being made by a classifier but, more
importantly, into the types of errors that are being made.
Definition of the Terms:
• Positive (P) : Observation is positive (for example: is an apple).
• Negative (N) : Observation is not positive (for example: is not an apple).
• True Positive (TP) : Observation is positive, and is predicted to be positive.
• False Negative (FN) : Observation is positive, but is predicted negative.
• True Negative (TN) : Observation is negative, and is predicted to be negative.
• False Positive (FP) : Observation is negative, but is predicted positive.
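From these four counts one can derive the accuracy, sensitivity and specificity reported later for each algorithm. A small self-contained example with toy labels (1 = glaucoma, 0 = healthy; these are illustrative values, not our experimental results):

```python
# Toy ground-truth and predicted labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate (recall)
specificity = tn / (tn + fp)   # true-negative rate

print(tp, fn, tn, fp)                      # 3 1 5 1
print(accuracy, sensitivity, specificity)  # 0.8 0.75 0.8333...
```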
7. SOFTWARE AND HARDWARE REQUIREMENTS
• SOFTWARE
1. Jupyter Notebook (Anaconda): Anaconda is a free and open-source
distribution of the Python and R programming languages for
scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.) that aims
to simplify package management and deployment. Package versions are
managed by the package management system conda. The Anaconda
distribution includes data-science packages suitable for Windows,
Linux and macOS.
The following packages were installed for the implementation:
a) NumPy Library
b) Pandas Library
c) Matplotlib
2. Browser
• HARDWARE
1. Windows 7/8/10
2. RAM 2GB
3. Minimum Storage 20GB
8. DIFFERENT MODELS USED (Algorithms)
We have used four algorithms:
➢ K-Nearest Neighbors
➢ Random Forest Classification
➢ Adaptive Boosting
➢ Support Vector Machine
K-NEAREST NEIGHBORS
The K-NN classifier also belongs to the category of supervised learning
algorithms. In supervised learning the targets are known to us, but the
pathway to the target is not. The nearest-neighbour method is a perfect
example for understanding machine learning. Consider several clusters of
labelled samples, where the items within each identified cluster or
group are homogeneous in nature, and suppose an unlabelled item needs to
be assigned to one of the labelled groups. K-nearest neighbours is a
simple and effective algorithm for this: keeping a record of all
available classes, it places the new item in the class that receives the
largest number of votes among the item's k nearest neighbours. In this
way KNN is one way to classify an unlabelled item into a known class.
Selecting the number of nearest neighbours, in other words choosing the
value of k, plays an important role in determining the efficiency of the
designed model; the accuracy and efficiency of the k-NN algorithm are
largely determined by the chosen k. A larger value of k has the
advantage of reducing the variance caused by noisy data.
9. Advantage: KNN is an unbiased algorithm that makes no assumptions
about the data under consideration. It is very popular because of its
simplicity, ease of implementation and effectiveness.
Disadvantage: k-NN does not build a model, so there is no abstraction
step. Predicting the class of a new item is slow, and preparing the data
needed to design a robust system also takes considerable time.
ALGORITHM FOR KNN:
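A minimal from-scratch sketch of the KNN voting procedure described above; the toy 2-D points stand in for the extracted retinal-image features, and this is not the project's actual implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point to every training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k closest neighbours.
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote among those k labels.
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 0 cluster
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])  # class 1 cluster
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # 1
```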
12. RANDOM FOREST ALGORITHM
Random Forest is a method that operates by constructing multiple decision
trees during the training phase. The decision of the majority of the
trees is chosen by the random forest as the final decision.
Random Forest grows many classification trees. To classify a new object from an
input vector, put the input vector down each of the trees in the forest. Each tree
gives a classification, and we say the tree "votes" for that class. The forest chooses
the classification having the most votes (over all the trees in the forest).
Each tree is grown as follows:
1. If the number of cases in the training set is N, sample N cases at random -
but with replacement, from the original data. This sample will be the training
set for growing the tree.
2. If there are M input variables, a number m<<M is specified such that at each
node, m variables are selected at random out of the M and the best split on
these m is used to split the node. The value of m is held constant during the
forest growing.
3. Each tree is grown to the largest extent possible. There is no pruning.
13. Algorithm for Construction of Random Forest is
Step 1: Let the number of training cases be “n” and let the number of
variables included in the classifier be “m”.
Step 2: Let the number of input variables used to make decision at the
node of a tree be “p”. We assume that p is always less than “m”.
Step 3: Choose a training set for the decision tree by sampling k times
with replacement from all “n” available training cases, i.e. by taking a
bootstrap sample. Bootstrapping estimates, for a given set of data, the
accuracy in terms of deviation from the mean, and is usually used for
hypothesis tests. The simple block bootstrap can be used when the data
can be divided into non-overlapping blocks, whereas the moving block
bootstrap is used when the data are divided into overlapping blocks,
with the portion “k” of overlap between the first and second blocks
always equal to the “k” overlap between the second and third blocks, and
so on. We use the remaining (out-of-bag) cases to estimate the error of
the tree. Bootstrapping is also used to estimate properties of the given
training data.
Step 4: For each node of the tree, randomly choose variables on which to
search for the best split. New data can be predicted by considering the
majority votes in the tree. Predict data which is not in the bootstrap
sample. And compute the aggregate.
Step 5: Calculate the best split based on these chosen variables in the
training set. Base the decision at that node using the best split.
Step 6: Each tree is fully grown and not pruned. Pruning normally cuts
off leaf nodes to keep a tree from becoming too complex; here the tree
is completely retained.
Step 7: The best split is one with the least error i.e. the least deviation
from the observed data set.
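The construction steps above can be sketched as follows. To keep the example short it uses one-level decision stumps in place of fully grown trees (a simplification: Step 6 says real forest trees are grown to full depth), on toy 2-D data:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_stump(X, y, feat_idx):
    """Best threshold split on a random subset of features (Steps 4-5)."""
    best = (feat_idx[0], 0.0, 0, 1, np.inf)
    for f in feat_idx:
        for thr in np.unique(X[:, f]):
            for lo, hi in ((0, 1), (1, 0)):
                pred = np.where(X[:, f] <= thr, lo, hi)
                err = np.mean(pred != y)
                if err < best[4]:
                    best = (f, thr, lo, hi, err)
    return best[:4]

def fit_forest(X, y, n_trees=25, m=1):
    n, M = X.shape
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)               # bootstrap sample (Step 3)
        feats = rng.choice(M, size=m, replace=False)   # m << M features (Step 2)
        forest.append(fit_stump(X[idx], y[idx], feats))
    return forest

def forest_predict(forest, x):
    votes = [lo if x[f] <= thr else hi for f, thr, lo, hi in forest]
    return int(np.round(np.mean(votes)))               # majority vote

# Two separable toy classes in 2-D.
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
forest = fit_forest(X, y)
print(forest_predict(forest, np.array([0.1, 0.0])),
      forest_predict(forest, np.array([3.1, 2.9])))
```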
14. Advantages:
1. It provides accurate predictions for many types of applications
2. It can measure the importance of each feature with respect to the
training data set.
3. Pairwise proximity between samples can be measured from the
training data set.
Disadvantages:
1. For data including categorical variables with different numbers of
levels, random forests are biased in favor of those attributes
with more levels.
2. If the data contain groups of correlated features of similar
relevance for the output, then smaller groups are favored over
larger groups
Applications:
1. Is used for image classification for pixel analysis.
2. Is used in the field of Bioinformatics for complex data Analysis.
3. It is used for video segmentation (high dimensional data).
16. ADABOOST ALGORITHM
First of all, AdaBoost is short for Adaptive Boosting. AdaBoost was the
first really successful boosting algorithm developed for binary
classification, and it is the best starting point for understanding
boosting. Moreover, modern boosting methods build on AdaBoost, most
notably stochastic gradient boosting machines.
AdaBoost is generally used with short decision trees. After the first
tree is created, its performance on each training instance is used to
weight how much attention the next tree to be created should pay to each
training instance. Training data that are hard to predict are given more
weight, whereas instances that are easy to predict are given less
weight.
Learn AdaBoost Model from Data
AdaBoost is best used to boost the performance of decision trees on
binary classification problems.
Each instance in the training dataset is weighted. The initial weight is set to:
weight(xi) = 1/n
Where xi is the i’th training instance and n is the number of training instances
How To Train One Model?
A weak classifier is prepared on the training data using the weighted
samples. Only binary classification problems are supported, so each
decision stump makes one decision on one input variable and outputs a
+1.0 or -1.0 value for the first or second class.
The misclassification rate is calculated for the trained model.
Traditionally, this is calculated as:
error = (N – correct) / N
Where error is the misclassification rate, correct is the number of
training instances predicted correctly by the model, and N is the total
number of training instances.
17. AdaBoost Ensemble
• Basically, weak models are added sequentially, trained using the weighted
training data.
• Generally, the process continues until a pre-set number of weak learners
have been created.
• Once completed, you are left with a pool of weak learners each with a stage
value.
Making Predictions with AdaBoost
Predictions are made by calculating the weighted average of the weak classifiers.
For a new input instance, each weak learner calculates a predicted value as either
+1.0 or -1.0. The predicted values are weighted by each weak learner stage value.
The prediction for the ensemble model is taken as the sum of the
weighted predictions. If the sum is positive, the first class is
predicted; if negative, the second class is predicted.
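The weight initialization, stump training, stage values and weighted vote described above can be sketched end-to-end as follows. This is a hypothetical from-scratch illustration on toy 1-D data with labels in {-1, +1}, not the project's actual code:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted best threshold stump on a single input variable."""
    best = None
    for thr in np.unique(X):
        for sign in (1, -1):
            pred = np.where(X <= thr, -sign, sign)
            err = np.sum(w[pred != y])          # weighted misclassification
            if best is None or err < best[2]:
                best = (thr, sign, err)
    return best

def adaboost(X, y, rounds=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                     # weight(xi) = 1/n
    ensemble = []
    for _ in range(rounds):
        thr, sign, err = fit_stump(X, y, w)
        err = max(err, 1e-10)                   # avoid division by zero
        stage = 0.5 * np.log((1 - err) / err)   # stage value (alpha)
        pred = np.where(X <= thr, -sign, sign)
        w *= np.exp(-stage * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((thr, sign, stage))
    return ensemble

def predict(ensemble, x):
    s = sum(stage * (-sign if x <= thr else sign)
            for thr, sign, stage in ensemble)
    return 1 if s > 0 else -1                   # sign of the weighted vote

X = np.array([0.5, 1.0, 1.5, 4.0, 4.5, 5.0])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(X, y)
print([predict(model, v) for v in (1.2, 4.2)])  # [-1, 1]
```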
Data Preparation for AdaBoost
This section lists some heuristics for best preparing your data for AdaBoost.
Quality data: because the ensemble method attempts to correct
misclassifications in the training data, you need to be careful that the
training data is of high quality. Outliers: outliers will force the
ensemble down the rabbit hole of working hard to correct cases that are
unrealistic; these could be removed from the training dataset. Noisy
data: noisy data, specifically noise in the output variable, can be
problematic; if possible, attempt to isolate and clean it from your
training dataset.
18. AdaBoost algorithm advantages:
It makes very good use of weak classifiers by cascading them;
Different classification algorithms can be used as weak classifiers;
AdaBoost achieves a high degree of precision;
Compared with the bagging algorithm and the Random Forest algorithm,
AdaBoost fully considers the weight of each classifier;
Adaboost algorithm disadvantages:
The number of AdaBoost iterations, i.e. the number of weak classifiers,
is hard to set well; it can be determined using cross-validation;
Data imbalance leads to a decrease in classification accuracy;
Training is time consuming, since the best split point must be
re-determined each time a weak classifier is selected;
20. SUPPORT VECTOR MACHINE
The support vector machine falls into the category of supervised
learning. SVM is used for both regression and classification, but it is
popularly known for classification, where it is a very efficient
classifier. Every object or item is represented by a point in
n-dimensional space, with the value of each feature represented by a
particular coordinate. The items are then divided into classes by
finding a hyper-plane, as shown in the figure.
The diagram shows the support vectors, which represent the coordinates
of each item. The SVM algorithm is a good choice for segregating the two
classes.
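A minimal sketch of a linear SVM trained by subgradient descent on the hinge loss, one common way to fit the separating hyper-plane (toy 2-D data, labels in {-1, +1}; a real project would more likely use a library implementation):

```python
import numpy as np

# Two well-separated toy classes.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

w = np.zeros(2)
b = 0.0
lr, lam = 0.1, 0.01                   # learning rate, regularization strength
for epoch in range(100):
    for xi, yi in zip(X, y):
        margin = yi * (w @ xi + b)
        if margin < 1:                # inside the margin: hinge subgradient step
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                         # outside the margin: only shrink w
            w -= lr * lam * w

pred = np.sign(X @ w + b)
print(np.mean(pred == y))             # training accuracy on the toy data
```

The kernel trick mentioned below replaces the inner products in this linear formulation with a kernel function, which is what lets SVMs handle non-linearly separable data.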
SVM Advantages
SVMs are very good when we have no prior idea about the data.
They work well even with unstructured and semi-structured data such as
text, images and trees.
The kernel trick is the real strength of SVM: with an appropriate kernel
function, we can solve many complex problems.
Unlike neural networks, SVM does not get stuck in local optima.
It scales relatively well to high-dimensional data.
SVM models generalize well in practice, so the risk of over-fitting is
lower in SVM.
SVM is often compared with ANN, and SVMs frequently give better results
than ANN models.
SVM Disadvantages
Choosing a “good” kernel function is not easy.
Training time is long for large datasets.
The final model, its variable weights and their individual impact are
difficult to understand and interpret.
Since the final model is not easy to inspect, we cannot make small
calibrations to it, so it is hard to incorporate our business logic.
The SVM hyper-parameters are the cost C and gamma. It is not easy to
fine-tune these hyper-parameters, and it is hard to visualize their
impact.
SVM Application
• Protein Structure Prediction
• Intrusion Detection
• Handwriting Recognition
• Detecting Steganography in digital images
• Breast Cancer Diagnosis
• Almost all the applications where ANN is used
24. FURTHER IMPROVEMENTS AND FUTURE SCOPES
On our glaucoma dataset we achieved an accuracy of 82% in detecting the
disease, and in future we will work to increase this accuracy further.
We will use algorithms such as Convolutional Neural Networks to increase
the accuracy rate.
Currently we use a numerical data set as the input for classification;
in future we will take an image data set directly as input.
Advances in image processing and its classification will be helpful in
diagnosing medical conditions correctly.
They will also be helpful in recognizing people, assisting surgery, and
detecting defects in human DNA, etc.
25. CONCLUSION
This paper provides beginners in this field with a brief idea of
classifiers and helps researchers select the appropriate classifier for
their problem. It explains the KNN, SVM, Random Forest and AdaBoost
algorithms, which are very popular classifiers in the field of image
processing. Classifiers are mainly divided into supervised and
unsupervised classifiers, so in short this paper provides theoretical
knowledge of the above-mentioned classifiers.
We applied the four algorithms to our glaucoma dataset and found that
the random forest algorithm has the highest accuracy, 82%, in detecting
glaucoma disease. We also found that the KNN algorithm has the highest
specificity value.
All of these algorithms can be used for better medical diagnosis of
diseases such as cancer, eye disease, etc.
They can also be used for biometric purposes such as identity, face and
fingerprint verification.