Advanced Boost Classifier Using Random Tree and KNN for Segmentation and Classification of Brain Tumors
Chapter 1: Introduction
The accurate diagnosis of diseases with a high prevalence rate, such as brain tumors, is one of the most important biomedical problems, and their effective management is imperative. In this project, we present a new method for the automated diagnosis of diseases based on an improvement of the random forests classification algorithm. More specifically, the dynamic
determination of the optimum number of base classifiers composing the random forests is
addressed. The proposed method is different from most of the methods reported in the literature,
which follow an overproduce-and-choose strategy, where the members of the ensemble are
selected from a pool of classifiers, which is known a priori. In our case, the number of classifiers
is determined during the growing procedure of the forest. Additionally, the proposed method produces an ensemble that is not only accurate but also diverse, the two important properties that should characterize an ensemble classifier. The method is based on an online fitting
procedure and it is evaluated using eight biomedical datasets and five versions of the random
forests algorithm (40 cases in total). The method correctly determined the number of trees in 90% of the test cases.
Chapter 2: Literature Survey
INTRODUCTION
Paper 1: FCM and KNN Based Automatic Brain Tumor Detection
A brain tumor is formed when abnormal cells get accumulated within the brain. These cells
multiply in an uncontrolled manner and damage the brain tissues. Magnetic Resonance Image
scans are commonly used to diagnose brain tumors. However, segmenting and detecting the
brain tumor manually is a tedious task for the radiologists. Hence, there is a need for automatic
systems which yield accurate results. A fully automatic method is introduced to detect brain tumors. It consists of five stages: image acquisition, preprocessing, segmentation using the Fuzzy C-means technique, feature extraction based on Harris corner detection, and classification using K-NN. Performance metrics such as accuracy, precision, sensitivity and specificity are used to evaluate the performance.
Methodology
A schematic overview of the proposed approach is illustrated in the accompanying figure. A random forest classifier was applied to the feature data from each modality independently, not only to obtain single-modality classification results for comparison, but also to derive the similarities required for manifold learning. The resulting similarity matrices were combined, and classical MDS was applied to
generate a joint embedding for multi-modality classification. Full details of the data collection
and feature extraction are presented in the Neuroimaging and biological feature data section.
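A minimal sketch of this multi-modality pipeline is given below, assuming scikit-learn and NumPy are available. The random forest proximity (the fraction of trees in which two samples land in the same leaf) stands in for the similarity required for manifold learning, metric MDS stands in for classical MDS, and the synthetic arrays X_mri and X_bio are placeholders for the real per-modality feature matrices:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

def rf_proximity(X, y, n_trees=200, seed=0):
    # Proximity of two samples = fraction of trees in which they share a leaf.
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X, y)
    leaves = forest.apply(X)                       # shape: (n_samples, n_trees)
    prox = np.zeros((len(X), len(X)))
    for t in range(leaves.shape[1]):
        prox += (leaves[:, t][:, None] == leaves[:, t][None, :])
    return prox / leaves.shape[1]

rng = np.random.default_rng(0)
X_mri, X_bio = rng.random((60, 40)), rng.random((60, 25))   # placeholder modality features
y = rng.integers(0, 2, 60)                                   # placeholder class labels

similarity = np.mean([rf_proximity(X_mri, y), rf_proximity(X_bio, y)], axis=0)
embedding = MDS(n_components=5, dissimilarity="precomputed",
                random_state=0).fit_transform(1.0 - similarity)
# A final classifier can now be trained on the joint embedding for multi-modality classification.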
Paper 2: Automated Diagnosis of Diseases Based on Classification: Dynamic Determination
of the Number of Trees in Random Forests Algorithm
An important task of any diagnostic system is the process of attempting to determine and/or
identify a possible disease or disorder and the decision reached by this process. For this purpose,
machine learning algorithms are widely employed [1], [2]. For these machine learning
techniques to be useful in medical diagnostic problems, they must be characterized by high
performance, the ability to deal with missing data and with noisy data, the transparency of
diagnostic knowledge, and the ability to explain decisions. In this paper, the improvement of the
random forests classification algorithm, which meets the aforementioned characteristics, is
addressed. This is achieved by determining automatically the only tuning parameter of the
algorithm, which is the number of base classifiers that compose the ensemble and affects its
performance. Random forests are a substantial modification of bagging [3]–[6]. The algorithm constructs a large number of unpruned, decorrelated trees. The generation of the trees is based on the combination of two sources of randomness. First, each tree is constructed on a bootstrap replicate of the original dataset, as in bagging, and second, a random feature subset of fixed, predefined size is considered for splitting each node of the tree. The Gini index is used as the feature evaluation
measure that determines the best split. The decision tree is built to the maximum size without
pruning. The random forests classify each new instance by the majority vote of the full set of
trees.
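As an illustration of these two sources of randomness, a hedged scikit-learn configuration is sketched below on a synthetic dataset; bootstrap=True draws a bootstrap replicate for every tree, max_features fixes the size of the random feature subset tried at each split, and the unlimited depth mirrors the unpruned trees described above:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=30, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,      # the number of base classifiers, i.e. the tuning parameter discussed here
    bootstrap=True,        # first source of randomness: a bootstrap replicate per tree
    max_features="sqrt",   # second source: fixed-size random feature subset at each split
    criterion="gini",      # Gini index as the split evaluation measure
    max_depth=None,        # grow each tree to maximum size without pruning
    random_state=0,
).fit(X, y)
print(forest.predict(X[:5]))   # each new instance is classified by the majority vote of the trees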
One of the most important issues in the creation of an ensemble classifier, such as random forests, is the size of the ensemble, that is, the number of classifiers composing it, and how unnecessary classifiers are removed from it. The factors that may affect the size of the ensemble are:
1) The desired accuracy,
2) The computational cost,
3) The nature of the classification problem, and
4) The number of available processors.
The methods reported in the literature, dealing with this problem, can be grouped into three
categories:
1) Methods that preselect the ensemble size,
2) Methods that postselect the ensemble size (pruning of the ensemble), and
3) Methods that select the ensemble size during training.
Preselection methods are the simplest way to determine the ensemble size. More specifically, the
number of the base classifiers is a tuning parameter of the algorithm, which can be set by the
user. Pruning methods contain precombining and postcombining methods [8]. In the first case,
pruning is performed before combining the classifiers. The classifiers that seem to perform well
are included in the ensemble. The predictive strength of a classifier is determined using different
evaluation measures. In postcombining pruning methods, the classifiers are removed from the
ensemble based on their contribution to the collective. More specifically, most of the
postcombining pruning methods are based on the overproduce-and-choose strategy, which
consists of two phases. The overproduction phase aims to produce a large initial pool of
candidate classifiers, while the selection phase aims to choose adequate classifiers from the pool
of classifiers so that the selected group of classifiers can achieve optimum positive predictive
rate. In the second phase (selection phase), different approaches are used. More specifically,
ensemble selection methods can be grouped into the following categories:
1) Weighted voting methods,
2) Search-based methods,
3) Clustering-based methods,
4) Ranking methods, and
5) Methods based on the optimization of a measure or function.
Architecture for previous method
The proposed method is based on the iterative procedure shown in Fig. 1. The method consists of
three basic steps: 1) the construction of the initial forest, 2) the application of the fitting
procedure, and 3) the examination of the termination criterion.
1) Construction of the Forest:
In the first step, the method constructs a forest with ten trees. For the construction of the
forest, the classical random forests and some modifications of it are used. More
specifically, random forests with ReliefF (RF with ReliefF) [9], random forests with
multiple estimators (RF with me) [9], RK Random Forests (RK-RF) [24], and RK
Random Forests with multiple estimators (RK-RF with me) [16] are employed. The
classical random forests constructs a collection of trees. For the construction of each tree,
a bootstrap sample of the dataset is selected. The tree is built to the maximum size
without pruning. The tree is just grown until each terminal node contains only members
of a single class. The Gini index [9] is used to determine the best split of each node. Only
a subset m of the total set of M features is employed as the candidate splitters of the node
of the tree. The number of the selected features (m) remains constant throughout the
construction of the forest. If a majority of the trees agrees on a given classification, then that is the predicted class of the sample being classified. In RF with ReliefF, the Gini
index is replaced by ReliefF. ReliefF evaluates the partitioning capability of attributes
according to how well their values distinguish between similar instances. The
replacement of Gini index is the core idea of RF with me too. However, five evaluation
measures are used instead of one. Those evaluation measures are: Gini index [9], gain ratio [9], ReliefF [9], minimum description length [9], and myopic ReliefF [9]. The
differentiation between classical random forests and RK-RF lies in the value of the
parameter m. More specifically, it is not the same throughout the construction of the
forest, but it is randomly chosen for each node of the tree. Finally, in RK-RF with me, the
random selection of the parameter m is combined with multiple estimators to accomplish
the construction of the forest. The detailed description of the previous algorithms is
provided in [16]. During the construction of the forest, the accuracy and the average correlation of the forest are computed each time a new tree is added. The forest initially consists of ten trees; this starting value is chosen because the fitting procedure applied in the next step needs an adequate number of points to start.
2) Fitting Procedure:
After the construction of the initial forest, an iterative procedure is used. The procedure
consists of three basic stages: 1) add a new tree, 2) apply polynomial fits, and 3) select
the best fit. The polynomial fits that are employed are given by

f(x) = pn x^n + pn−1 x^(n−1) + … + p1 x + p0,   n = 2, ..., 9,   (1)

where x is the data to be fitted and the pi are the coefficients of the polynomial. The best fit is the one with the minimum RMS error (the root of the average of the squares of the differences between the predicted and actual values).
3) Examination of the Termination Criterion:
In this step, the method examines whether the stopping criterion is fulfilled. For this purpose, three different criteria were tested in order to determine the best one. The first criterion
(criterion 1) searches for consecutive points in the fitted curve, where the difference
between the fitted curve and the curve of the accuracy is greater than a predefined
threshold. If there is such a region, the method terminates and returns the point of this
region for which the maximum accuracy is observed. The number of consecutive points
should be at least 10 and the difference, point by point, between the curves should be
greater than 0.004 [25]. If the criterion is fulfilled and there is more than one point at which the maximum accuracy is observed, the point with the lowest Brier score [26] is selected. The Brier score measures the average squared deviation between the predicted probabilities for a set of events and their outcomes; thus, the lowest score corresponds to the highest accuracy.
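A compact sketch of steps 2) and 3) is given below, assuming the forest's accuracy curve (one value per added tree) and the corresponding Brier scores are already available as NumPy arrays. numpy.polyfit stands in for the polynomial fits of equation (1), the 0.004 threshold and the 10-point window follow the text, and the handling of the divergent region is a simplification of the published criterion:

import numpy as np

def best_polynomial_fit(accuracy):
    # Fit polynomials of degree 2..9 to the accuracy curve and keep the one with minimum RMS error.
    x = np.arange(1, len(accuracy) + 1)
    best_rmse, best_fit = None, None
    for degree in range(2, 10):
        fitted = np.polyval(np.polyfit(x, accuracy, degree), x)
        rmse = np.sqrt(np.mean((fitted - accuracy) ** 2))
        if best_rmse is None or rmse < best_rmse:
            best_rmse, best_fit = rmse, fitted
    return best_fit

def termination_point(accuracy, brier, threshold=0.004, window=10):
    # Criterion 1: look for at least `window` consecutive points where the fitted curve and the
    # accuracy curve differ by more than `threshold`; return the point of that region with the
    # maximum accuracy, breaking ties by the lowest Brier score.
    fitted = best_polynomial_fit(accuracy)
    diverged = np.abs(fitted - accuracy) > threshold
    run = 0
    for i, flag in enumerate(diverged):
        run = run + 1 if flag else 0
        if run >= window:
            start = i - window + 1
            region_acc = accuracy[start:i + 1]
            candidates = np.flatnonzero(region_acc == region_acc.max())
            best = candidates[np.argmin(brier[start:i + 1][candidates])]
            return start + best + 1        # number of trees at which the forest stops growing
    return None                            # criterion not met: keep adding trees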
Chapter 3: Proposed Work
Statement of Problem:
The success of the baseline approach has been attributed to the two main components of the system: discrimination and randomization. Discrimination refers to the use of SVM to learn the splits at each node, whereas
randomization refers to a random selection of image patches, which are used as a form of
features to learn the splits at each node. There are several problems that may arise from this
randomization procedure. Firstly, if we consider image patches of size 50x50 in a 500x500 image, the sampling space may contain thousands of patches, which makes it less likely that a randomly selected patch will contain an object of interest for image categorization. In addition, randomly selected samples are more likely to overlap with each other, which would cause redundancy. Therefore, in this project, I investigated new ways of selecting image
patches. In theory, more informative patch selection should result in higher quality splits at each
tree node, which in turn should increase overall accuracy of the classifier.
Features and Scope:
To fix the problems related to random patch selection, I integrated a selective search segmentation algorithm into the original random forest framework. Image patches selected using selective search segmentation are more likely to contain the objects of interest. In addition, segmentation should eliminate redundant overlapping between the image patches, which will make our feature space more diverse. Fixing these two problems should result in an increased discriminative power of the random forest.
Goals:
Before beginning the random forest procedure, I standardize each image by rescaling it to the same size and then apply Selective Search Segmentation to extract important regions from each image. Each region is represented by 4 coordinates in the image (the points at the bottom-left and top-right corners of the region). Then, SVM is applied to all the regions returned by Selective Search Segmentation, and the resulting centroids are chosen as the candidate regions. In this particular case, I used 1024 centroids.
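A possible sketch of this region-extraction step is shown below. It assumes the OpenCV contrib module (cv2.ximgproc) and scikit-learn are installed, uses an illustrative image path and the 500x500 resize mentioned earlier, and substitutes a k-means clustering step to reduce the proposals to 1024 candidate centroids, since the exact reduction procedure is not fully specified above:

import cv2
import numpy as np
from sklearn.cluster import KMeans

image = cv2.imread("brain_scan.png")            # illustrative path
image = cv2.resize(image, (500, 500))           # rescale every image to the same size

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()                            # region proposals as (x, y, w, h)

# represent each region by two opposite corner points (x1, y1, x2, y2), as described above
boxes = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects], dtype=float)

# reduce the proposals to a fixed set of candidate regions via their cluster centroids
n_centroids = min(1024, len(boxes))
candidates = KMeans(n_clusters=n_centroids, n_init=10, random_state=0).fit(boxes).cluster_centers_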
INTRODUCTION
Decision tree is one of the machine learning methods widely used to analyze proteomics data.
Generated from a given dataset, a single decision tree reports a classification result by each of its
terminal leaves (classifiers). Even though there exist many algorithms such as C4.5 that can be
used to generate a well-modeled single decision tree, it is still possible that its prediction is
biased, thus adversely affecting its accuracy.
To overcome this problem, more than one decision tree is used to analyze the data. The approach is based on the concept of forming a panel of experts who then vote to decide the final outcome. The panel of experts is analogous to the ensemble of decision trees, which provides a pool of classifiers. As in voting, the classification that receives a majority becomes the final classification result for the data. The decision tree ensemble is more accurate than a single decision tree.
The diagram shown summarizes the process of generating a decision tree ensemble and classifying data using the KNN algorithm. To explain briefly, the algorithm begins by randomly sampling (with replacement) the data from the original dataset to form a training set. Multiple training sets are usually generated. Note that, since replacement is allowed, the data in a training set can be duplicated. Each training set then generates a decision tree. For a given test datum, each decision tree predicts an outcome, represented by a classifier. The ensemble of decision trees forms a panel of experts whose votes determine the final classification result from this group of classifiers.
ARCHITECTURE DIAGRAM
The performance in identifying biomarkers for premalignant pancreatic cancer could be enhanced by using decision tree ensemble techniques instead of their single-algorithm counterpart. These techniques proved more likely to accurately distinguish the disease class from the normal class, as indicated by a larger area under the Receiver Operating Characteristic curve. Moreover, they achieved comparatively lower root mean squared errors.
According to their method, the peptide mass-spectrometry data were processed first to improve
data integrity and reduce variations among data due to the differences in sample loading
conditions. The preprocessing steps involved baseline adjustment using group median,
smoothing to remove noise using a Gaussian kernel, and normalization to make all the data
comparable. After that, the data were randomly sampled such that 90% formed a training set and
the remaining 10% formed a test set.
The training set was used in feature selection. In the study, the authors considered three different
feature selection methods. The first method was a two-sample homoscedastic t test, which was
used under the assumption that all the features from either normal or disease class had normal
distribution. Unlike the first method, the second method, based on the ANDI rank test, made no assumption about the distribution of the features. The last feature selection method was a genetic algorithm.
The selected features were then used to generate a single decision tree as well as the decision tree ensembles.
The ensemble methods studied were Random Forest, Random Tree, KNN, Boost, Stacking, AdaBoost, and MultiBoost. Their performances were measured in terms of accuracy and error in the classification of the features selected by each selection method. Then, they were compared against the performance of a single decision tree generated by the C4.5 algorithm. The process was repeated ten times to validate the consistency of the resulting performance.
According to the results reported, the decision tree ensembles achieved higher accuracies of up to 70% regardless of the feature selection method used. In terms of biomarker identification, both the t test and the ANDI rank test had similarly impressive performance, consistently selecting the same biomarker-suspect features. Unlike the first two methods, the performance of the genetic algorithm was considerably poor. The authors also noted that 70% accuracy was still lower than expected. This could result from a naturally low concentration of the biomarkers at the premalignant stage of the disease. In addition, it was also possible that one dataset might not be suitable for all algorithms, thus underestimating the accuracy.
Raw spectrum data:
We use GAUSSIAN EDGE with 4 levels.
Gaussian kernel smoothing:
A process of averaging the data points by applying a Gaussian function. Basically, the Gaussian function is used to generate a set of normalized weighting coefficients for the data points, whose weighted sum generates a new value. This new value replaces the old one at the center of the Gaussian curve.
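A small sketch of this smoothing step is given below, assuming the raw spectrum is a one-dimensional NumPy array; the kernel width and radius are illustrative values only:

import numpy as np

def gaussian_smooth(signal, sigma=2.0, radius=6):
    # Replace each point by the Gaussian-weighted average of its neighbourhood.
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()                           # normalized weighting coefficients
    return np.convolve(signal, kernel, mode="same")  # the new value replaces the centre point

spectrum = np.random.rand(500)                       # placeholder for a raw spectrum trace
smoothed = gaussian_smooth(spectrum)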
Goal and overview of this research:
The goal of this research work was to extract the meaningful knowledge hidden in the database and transform it into meaningful rules.
Block diagram of the research process
Then the rules are used to predict the class labels of unknown data. Finally we introduced KNN
and Boosting to improve the accuracy of this whole process.
Keeping this goal in mind, we constructed the whole research process as shown in the block diagram in Fig. 1. Here, decision tree induction algorithms are used to turn the knowledge hidden in a large dataset into decision rules. Enhancements are then made to these algorithms to extract and apply the rules more precisely and improve the accuracy. In this research, we have used the heart disease dataset collected from the UCI machine learning repository. At first, the ID3 algorithm is used to extract rules from the dataset and to use the rules to
classify new data which is implemented in C#. C4.5, the successor of ID3 is then used to classify
data more accurately. Finally, two new approaches named KNN and Boosting are introduced to
improve the predictive accuracy of C4.5.
Background study
Classification and prediction:
Data classification is a two-step process. In the first step, a model is built describing a
predetermined set of data classes or concepts. The model is constructed by analyzing database
tuples described by attributes. Each tuple is assumed to belong to a predefined class as
determined by one of the attributes, called the class label attribute. In the context of
classification, data tuples are also referred to as samples or objects. The data tuples analyzed to
build the model collectively form the training dataset. The individual tuples making up the
training set are referred to as training samples and are randomly selected from the sample
population. Since the class label of each training sample is provided, this step is also known as supervised learning. It contrasts with unsupervised learning, in which the class label of each training sample is not known and the number or set of classes to be learned may not be known in advance.
Prediction can be viewed as the construction and use of a model to assess the class of an
unlabeled sample or to assess the value or value ranges of an attribute that a given sample is
likely to have. In this view, classification and regression are the two major types of prediction problems, where classification is used to predict discrete or nominal values, while regression is used to predict continuous or ordered values. In this work, however, the use of prediction to predict class labels is referred to as classification, and the use of prediction to predict continuous values is referred to simply as prediction.
Decision tree induction:
Decision tree induction is a greedy algorithm that constructs a decision tree in a top-down, recursive, divide-and-conquer manner. A decision tree is a tree in which each branch node represents a choice between a number of alternatives and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision-making. Construction starts with a root node and, from this node, each node is split recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible decision scenario and its outcome.
For extracting rules, information gain measure is used to select the test attribute at each node in
the tree. The attribute with the highest information gain is chosen as the test attribute for the
current node and the path from the root node to each leaf node in the tree is tracked to construct
rules from the dataset.
Decision trees use induction in order to provide an appropriate classification of objects in terms of their attributes, inferring decision tree rules. In the learning phase, explicit rules or interactions
among relevant features are induced. Such a learning method differs from non-linear classifiers
such as support vector machines or neural networks where the learning phase is to determine the
parameters of the non-linear kernel functions.
ID3 algorithm:
The ID3 (Iterative Dichotomiser 3) technique for building a decision tree is based on information theory and attempts to minimize the expected number of comparisons. The basic idea of the
induction algorithm is to ask questions whose answers provide the most information. The first
question divides the search space into two large search domains while the second performs little
division of the space. The basic strategy used by ID3 is to choose splitting attributes with the
highest information gain first. The amount of information associated with an attribute value is
related to the probability of occurrence.
Let node N represents or hold the tuples of partition D. The attribute with the highest information
gain is chosen as the splitting attribute for node N. This attribute minimizes the information
needed to classify the tuples in the resulting partitions and reflects the least randomness or
impurity in these partitions. To calculate the gain of an attribute, at first we calculate the entropy of the dataset S by the following formula:

Entropy(S) = − Σj Pj log2 (Pj)   (1)

where Pj is the probability that an arbitrary tuple in S belongs to class Cj, estimated by |Cj,S| / |S|. A log function to the base 2 is used because the information is encoded in bits. Entropy(S) is just the average amount of information needed to identify the class label of a tuple in S.
Now, the expected information required to classify the tuples of S after partitioning on attribute A is calculated by the formula

EntropyA(S) = Σi (|Si| / |S|) × Entropy(Si)   (2)

where {S1, S2, ..., Sn} are the partitions of S according to the values of attribute A, n is the number of distinct values of A, |Si| is the number of cases in partition Si, and |S| is the total number of cases in S.
Information gain is defined as the difference between the original information requirement and the new requirement:

Gain(A) = Entropy(S) − EntropyA(S)   (3)
In other words, Gain(A) tells us how much would be gained by branching on A. It is the expected reduction in the information requirement caused by knowing the value of A. The attribute A with the highest information gain is chosen as the splitting attribute at node N.
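The sketch below computes equations (1)-(3) for a small categorical toy example; the attribute and class values are illustrative only and do not come from the heart disease dataset:

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum_j Pj * log2(Pj), equation (1)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    # Gain(A) = Entropy(S) - sum_i |Si|/|S| * Entropy(Si), equations (2)-(3)
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

rows = [["yes"], ["yes"], ["no"], ["no"], ["yes"], ["no"]]          # one illustrative attribute
labels = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]  # class labels
print(information_gain(rows, labels, 0))   # prints 1.0: this attribute separates the classes perfectly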
MATERIALS AND METHODS
New decision tree learning algorithms:
The C4.5 algorithm is Quinlan's extension of his own ID3 algorithm for generating decision trees. KNN and Boosting are general strategies for improving classifier and predictor accuracy. Suppose that we are a patient and would like to have a diagnosis made based on our symptoms. Instead of asking one doctor, we may choose to ask several. If a certain diagnosis occurs more often than any other, we may choose it as the final or best diagnosis. That is, the final diagnosis is made based on a majority vote, where each doctor gets an equal vote. If we now replace each doctor by a classifier, we have the basic idea behind KNN.
In boosting, we assign weights to the value of each doctor’s diagnosis, based on the accuracies of
previous diagnoses they have made. The final diagnosis is then a combination of the weighted
diagnoses.
C4.5 Algorithm:
Just as with CART, the C4.5 algorithm recursively visits each decision node, selecting the
optimal split, until no further splits are possible. The steps of the C4.5 algorithm for growing a decision tree are given below:
• Choose attribute for root node
• Create branch for each value of that attribute
• Split cases according to branches
• Repeat process for each branch until all cases in the branch have the same class
How is an attribute chosen as the root node? At first, we calculate the gain ratio of each attribute. The root node will be the attribute whose gain ratio is maximum. The gain ratio is calculated by the formula
GainRatio(A) = Gain(A) / SplitInfoA(S)   (4)
where A is the attribute whose gain ratio is being calculated. The attribute A with the maximum gain ratio is selected as the splitting attribute. This attribute minimizes the information needed to classify the tuples in the resulting partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple tree is found. To calculate the gain of an attribute, at first we calculate the entropy of the dataset S by the following formula
Entropy(S) = − Σi Pi log2 (Pi)   (5)

where Pi is the probability that an arbitrary tuple in S belongs to class Ci, estimated by |Ci,S| / |S|. A log function to the base 2 is used because the information is encoded in bits. Entropy(S) is just the average amount of information needed to identify the class label of a tuple in S.
Now the gain of an attribute is calculated by the formula

Gain(A) = Entropy(S) − Σi (|Si| / |S|) × Entropy(Si)   (6)

where {S1, S2, ..., Sn} are the partitions of S according to the values of attribute A, n is the number of distinct values of A, |Si| is the number of cases in partition Si, and |S| is the total number of cases in S.
The gain ratio divides the gain by the evaluated split information. This penalizes splits with
many outcomes.
SplitInfoA(S) = − Σi (|Si| / |S|) × log2 (|Si| / |S|)   (7)
The split information is the weighted average calculation of the information using the proportion
of cases which are passed to each child. When there are cases with unknown outcomes on the
split attribute, the split information treats this as an additional split direction. This is done to
penalize splits which are made using cases with missing values. After finding the best split, the
tree continues to be grown recursively using the same process.
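Building on the same definitions, a brief sketch of the gain ratio of equations (4)-(7) is shown below; the small guard against a zero split information is an assumed implementation detail for attributes with a single value:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, labels, attribute_index):
    # GainRatio(A) = Gain(A) / SplitInfo_A(S), equations (4)-(7)
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    weights = [len(part) / total for part in partitions.values()]
    gain = entropy(labels) - sum(w * entropy(part) for w, part in zip(weights, partitions.values()))
    split_info = -sum(w * math.log2(w) for w in weights)   # penalizes splits with many outcomes
    return gain / split_info if split_info > 0 else 0.0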
KNN:
We first take an intuitive look at how this approach works as a method of increasing accuracy. Suppose that we are a patient and would like to have a diagnosis made based on our symptoms. Instead of asking one doctor, we may choose to ask several. If a certain diagnosis occurs more often than any other, we may choose it as the final or best diagnosis. That is, the final diagnosis is made based on a majority vote, where each doctor gets an equal vote. If we now replace each doctor by a classifier, we have the basic idea behind KNN. Intuitively, a majority vote made by a large group of doctors may be more reliable than a majority vote made by a small group.
Given a set, D, of d tuples, KNN works as follows. For iteration i (i = 1, 2, ..., k), a training set Di of d tuples is sampled with replacement from the original set of tuples, D. This sampling scheme is known as bootstrap aggregation, and each training set is a bootstrap sample. Because sampling with replacement is used, some of the original tuples of D may not be included in Di, whereas others may occur more than once. A classifier model Mi is learned for each training set Di. To classify an unknown tuple, X, each classifier Mi returns its class prediction, which counts as one vote. The bagged classifier, M*, counts the votes and assigns the class with the most votes to X. The same scheme can be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple.
Algorithm: KNN: The KNN algorithm creates an ensemble of models (classifiers or predictors)
for a learning scheme where each model gives an equally-weighted prediction.
Input:
• D, a set of training tuples
• k, the number of models in the ensemble
• A learning scheme (e.g., decision tree algorithm, backpropagation, etc.)
Output: A composite model, M*
Method:
• For i = 1 to k do // create k models
• Create bootstrap sample, Di by sampling D with replacement
• Use Di to derive a model, Mi
• Endfor
To use the composite model on a tuple, X:
• If classification then
• Let each of the k models classify X and return the majority vote
• If prediction then
• Let each of the k models predict a value for X and return the average predicted value
The bagged classifier often has significantly greater accuracy than a single classifier derived
from D, the original training data. It will not be considerably worse and is more robust to the
effects of noisy data. The increased accuracy occurs because the composite model reduces the
variance of the individual classifiers. For prediction, it was theoretically proven that a bagged
predictor will always have improved accuracy over a single predictor derived from D.
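A hedged sketch of this voting scheme is given below, using scikit-learn's BaggingClassifier with decision trees as the base learning scheme and a synthetic dataset standing in for the heart disease data (the estimator keyword assumes a recent scikit-learn release):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=13, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

composite = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # the learning scheme used to derive each model Mi
    n_estimators=25,                      # k, the number of models in the ensemble
    bootstrap=True,                       # sample D with replacement to create each Di
    random_state=1,
).fit(X_train, y_train)

single = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print("single tree accuracy :", single.score(X_test, y_test))
print("composite M* accuracy:", composite.score(X_test, y_test))   # majority vote of the k models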
Boosting:
Boosting is a general method for improving accuracy of any given learning algorithm. It is an
effective method of producing a very accurate prediction rule by combining rough and
moderately inaccurate rules of thumb. In this research we have focused especially on AdaBoost.
Adaboost algorithm:
In AdaBoost, the input includes a dataset D of d class-labeled tuples, an integer k specifying the
number of classifiers in the ensemble and a classification-learning scheme.
Each tuple in the dataset is assigned a weight. The higher the weight, the more the tuple influences the learned model. Initially, all weights are assigned the same value of 1/d. The algorithm repeats k
times. At each time, a model Mi is built on current dataset Di which is obtained by sampling with
replacement on original training dataset D. The framework of this algorithm is as follows:
Algorithm: AdaBoost
Input:
• D, a set of d class-labeled training tuples
• K, the number of rounds
• A classification learning scheme
Output: A composite model
Method:
• Initialize the weight of each tuple in D to 1/d
• For i = 1 to k do
• Sample D with replacement according to the tuple weights to obtain Di
• Use training set Di to derive a model, Mi
• Compute the error rate error(Mi) of Mi
• If error(Mi) > 0.5 then
• Reinitialize the weights to 1/d
• Go back to step 3 and try again
• Endif
• Update and normalize the weight of each tuple;
• Endfor
The error rate of Mi is the sum of the weights of all tuples in Di that Mi misclassified:

error(Mi) = Σj wj × err(Xj)   (8)
where err(Xj) = 1 if Xj is misclassified and err(Xj) = 0 otherwise. Then the weight of each tuple is updated so that the weights of misclassified tuples are increased and the weights of correctly classified tuples are decreased. This can be done by multiplying the weight of each correctly classified tuple by error(Mi)/(1 − error(Mi)). The weights of all tuples are then normalized so that their sum is equal to 1. In order to maintain this constraint, the weight of each tuple is divided by the sum of the new weights.
After K rounds, a composite model will be generated, or an ensemble of classifiers which is then
used to classify new data. When a new tuple X comes, it is classified through these steps:
• Initialize the weight of each class to 0
• For i = 1 to k do
• Get the weight wi of classifier Mi
• Get the class prediction for X from Mi: c = Mi(X)
• Add wi to the weight for class c
• Endfor
• Return the class with the largest weight
The weight wi of each classifier Mi is calculated by this Eq. 9:
wi = log((1 − error(Mi)) / error(Mi))   (9)
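A compact sketch of this weight bookkeeping is given below, using decision stumps as the weak learners on a synthetic dataset. It follows the resampling formulation and equations (8) and (9) above rather than scikit-learn's built-in AdaBoostClassifier, and the small epsilon guard is an assumed implementation detail:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    d = len(y)
    weights = np.full(d, 1.0 / d)                     # initialize every tuple weight to 1/d
    models, alphas = [], []
    while len(models) < k:
        idx = rng.choice(d, size=d, replace=True, p=weights)          # sample Di by weight
        model = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = (model.predict(X) != y).astype(float)
        error = float(np.sum(weights * miss))         # equation (8)
        if error > 0.5:
            weights = np.full(d, 1.0 / d)             # reinitialize the weights and try again
            continue
        error = max(error, 1e-10)
        weights[miss == 0] *= error / (1.0 - error)   # shrink correctly classified tuples
        weights /= weights.sum()                      # normalize so the weights sum to 1
        models.append(model)
        alphas.append(np.log((1.0 - error) / error))  # equation (9): the classifier's vote weight
    return models, alphas

def adaboost_predict(models, alphas, X):
    classes = models[0].classes_
    scores = np.zeros((len(X), len(classes)))
    for model, alpha in zip(models, alphas):
        preds = model.predict(X)
        for ci, c in enumerate(classes):
            scores[:, ci] += alpha * (preds == c)     # add wi to the weight for class c
    return classes[np.argmax(scores, axis=1)]

X = np.random.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
models, alphas = adaboost_fit(X, y)
print("training accuracy:", (adaboost_predict(models, alphas, X) == y).mean())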
Requirements for KNN and boosting: These two methods for utilizing multiple classifiers
make different assumptions about the learning system. As above, KNN requires that the learning system should not be stable, so that small changes to the training set lead to different classifiers. Breiman also notes that poor predictors can be transformed into worse ones by KNN. Boosting, on the other hand, does not preclude the use of learning systems that produce poor predictors, provided that their error on the given distribution can be kept below 50%. However, boosting implicitly requires the same instability as KNN; if Ct is the same as Ct-1, the weight adjustment scheme has the property that error(Mt) = 0.5.
Chapter 4: Project Design
Hardware Requirements
• SYSTEM : Pentium IV 2.4 GHz
• HARD DISK : 40 GB
• FLOPPY DRIVE : 1.44 MB
• MONITOR : 15 VGA colour
• MOUSE : Logitech.
• RAM : 256 MB
Software Requirements
• Operating system :- Windows XP Professional
• Front End : - Asp .Net 2.0.
• Coding Language :- Visual C# .Net
• Back-End : - Sql Server 2000.
Module I/O
Preprocessing
Given Input- Image.
Expected Output- Normalized image.
DFT
Given Input- Image and Dataset.
Expected Output- Classified Image.
KNN
Given Input- Classified image.
Expected Output-Image Bins.
Boosting
Given Input- Image Bins.
Expected Output- Rank Classified Image.
Verification
Given Input- Checks with user’s stored details like security answers or hidden details.
Expected Output-If the verification is success, user can perform transaction, else blocks the card.
Module diagram
UML Diagrams
(Module diagram with the elements Start, Image, RDT, KNN Bin, Bagging, Dataset and Stop.)
Use case diagram
(Use case diagram with the elements Dataset, RDT, Bagging, KNN Bin, System and Image.)
Class diagram
(Class diagram with the classes Classification (segment, image_name; RDT(), ID5()), Boosting (_ans, details; verify()), Dataset (sequence; Sequence()) and KNN Bin (qst, ans, info).)
Object diagram
Sequence diagram
(Object and sequence diagrams with the elements Preprocessing, Boosting, RDT, Classification and KNN Bin.)
Component Diagram
(Component diagram with the components Image, RDT, KNN Bin, Boosting, DB Ratio, Transaction, Feature, Block, Block Details, DB Feature, Ranked Image, Image Library, ID4.5, Tree Component, KNN, Storage and Detail.)
Dataflow diagram
(Dataflow diagram with the elements Directory, Preprocessing, Image, Feature, Classification, Matrix and Dataset.)
Chapter 5: Proposed Simulation/Experiments/Results/Analysis
This study explores the utility of three different feature selection schemes to reduce the high dimensionality of a pancreatic cancer proteomic dataset. Using the top features selected by each method, we compared the prediction performance of the single decision tree algorithm C4.5 with six different decision-tree based classifier ensembles (Random Forest, Stacked Generalization, KNN, AdaBoost, LogitBoost and MultiBoost). We show that ensemble classifiers always outperform the single decision tree classifier, having greater accuracies and smaller prediction errors when applied to this proteomics dataset.
Classification results using features selected by the Student t test.

Algorithm        Accuracy   TP rate   FP rate   TN rate   FN rate   Sensitivity   Specificity   Precision   F-measure   RMSE
Random Forest    0.6500     0.79      0.53      0.48      0.21      0.79          0.48          0.65        0.71        0.4569
KNN              0.6833     0.78      0.44      0.56      0.22      0.78          0.56          0.69        0.73        0.4285
Logitboost       0.6889     0.83      0.49      0.51      0.17      0.83          0.51          0.69        0.75        0.4402
Stacking         0.6444     0.99      0.79      0.21      0.01      0.99          0.21          0.61        0.76        0.4761
Multiboost       0.6889     0.81      0.46      0.54      0.19      0.81          0.54          0.70        0.74        0.5175
Logistic         0.7500     0.79      0.30      0.70      0.21      0.79          0.70          0.78        0.78        0.4224
Naive Bayes      0.6833     0.64      0.26      0.74      0.36      0.64          0.74          0.76        0.68        0.5289
BayesNet         0.6722     0.63      0.28      0.73      0.37      0.63          0.73          0.74        0.67        0.5308
Neural Network   0.7000     0.70      0.30      0.70      0.30      0.70          0.70          0.75        0.72        0.4517
RBFnet           0.6722     0.76      0.44      0.56      0.24      0.76          0.56          0.69        0.71        0.4632
CRDTNN           0.9644     0.71      0.33      0.68      0.29      0.71          0.68          0.74        0.71        0.5489
TP rate: True positive rate, FP rate: False positive rate, TN rate: True negative rate, FN rate: False negative rate, RMSE: Root Mean Squared Error, RBFnet: Radial Basis Function network, SVM: Support Vector Machine.
Chapter 6: Testing
TESTING
Testing is an activity in which a system or component is executed under specified
conditions, whose results are observed or recorded, and an evaluation is made about some aspect
of the system or component. Successful testing uncovers errors in the software. So, in general, testing demonstrates that the system is working according to the specifications and that it meets the performance requirements. This is the final stage of any project. Testing is a process of executing the program with the intent of finding errors; it is a set of activities that can be planned in advance and conducted systematically. The purpose of system testing is to uncover errors in the system. Nothing is complete without testing, as it is vital to the success of the system.
6.1 Testing Phases:
Software testing phases include the following:
Test activities are determined and Test data is selected. The test is conducted and test
results are compared with the expected results.
There are various types of Testing:
Unit Testing:
Unit testing is a procedure used to validate that individual units of source code are
working properly. A unit is the smallest testable part of an application. In procedural
programming a unit may be an individual program, function, procedure, etc., while in object-oriented programming, the smallest unit is a class, which may be a base/super class, an abstract class or a derived/child class. Units are distinguished from modules in that modules are typically made up of units.
Integration Testing:
Integration Testing is the phase of software testing in which individual software modules
are combined and tested as a group. It follows unit testing and precedes system testing. The goal
is to see if the modules are properly integrated and the emphasis being on the testing interfaces
among modules.
System Testing:
System testing is testing conducted on a complete, integrated system to evaluate the
system's compliance with its specified requirements. System testing is actually done to the entire
system against the Functional Requirement Specification(s) (FRS) and/or the System
Requirement Specification (SRS). It is also intended to test up to and beyond the bounds defined
in the software/hardware requirements specification(s).
Acceptance Testing:
Acceptance testing generally involves running a suite of tests on the completed system.
The acceptance test suite is run against the supplied input data or using an acceptance test script
to direct the testers. Then the results obtained are compared with the expected results. If there is
a correct match for every case, the test suite is said to pass. If not, the system may either be
rejected or accepted on conditions previously agreed between the sponsor and the manufacturer.
6.2 Testing Methods:
Testing is a process of executing a program to find out errors. Any testing can be done in
two ways:
White Box Testing:
White Box testing uses an internal perspective of the system to design test cases based on
internal structure. It requires programming skills to identify all paths through the software. The
tester chooses test case inputs to exercise paths through the code and determines the appropriate outputs. Using white box testing, a software engineer can derive test cases that:
exercise all the logical decisions on both their true and false sides, execute all loops at their boundaries and within their operational boundaries, and exercise the internal data structures to assure their validity.
Black Box Testing:
Black box testing takes an external perspective of the test object to derive test cases.
These tests can be functional or non-functional, though usually functional. The test designer
selects valid and invalid input and determines the correct output. There is no knowledge of the
test object's internal structure.
Black Box testing attempts to find errors in the following categories:
 Incorrect or missing functions
 Interface errors
 Errors in data structures
 Performance errors
 Initialization and termination errors
6.3 Test Approach:
Testing can be done in two ways:
o Bottom-up approach
o Top-down approach
Bottom-up approach:
In a bottom-up approach the individual base elements of the system are first specified in
great detail. These elements are then linked together to form larger subsystems, which then in
turn are linked, sometimes in many levels, until a complete top-level system is formed. This
strategy often resembles a "seed" model, whereby the beginnings are small, but eventually grow
in complexity and completeness. However, such "organic" strategies may result in a tangle of elements and subsystems, developed in isolation and subject to local optimization, as opposed to meeting a global purpose.
Top-down approach:
In a top-down approach an overview of the system is first formulated, specifying but not
detailing any first-level subsystems. Each subsystem is then detailed enough to realistically
validate the model.
7.1 Black box Testing:
Test Case 1: Color space conversion
Objective: To check whether the RGB space converted into YUV
Description: After putting RGB pixelite matrix get YUV factor
Expected Behavior Observed Behavior
Y is Luminance (brightness).
U & V are Chrominance factor
Conversion is done for obtaining
brightness & color factors.
Y is Luminance (brightness).
U & V are Chrominance factor.
Y=0.299*R +0.587*G+0.11*B
U= (B-Y)*0.565
V= (R-Y)* 0.713 for y= 0 step 0
toImageHeight
Test Case 2: Calculate histogram
Objective: To check whether the Histogram computed or not
Description: To check whether the Histogram computed or not while camera start
capturing
Expected Behavior Observed Behavior
Compute Histogram for Color Image Y(Lumi histogram) in Array index
U(Chromi Histogram) in Array index
V(Chromi Histogram) in Array index
YLumi = (int)(Blue * 0.1133 + Green *
0.5859 + Red * 0.3008);
UChro = (int)(0.493 * (Blue - YLumi) +
128);
VChro = (int)(0.877 * (Red - YLumi) +
128);
HIndex = YLumi / 4;
HIndex = UChro / 4;
HIndex = VChro / 4;
Test Case 3: Analysis
Objective: To Compare histogram using similarity HMM function
Description: Compare two histograms using the DCOS function till the out layer equals 1
Expected Behavior Observed Behavior
Compare two histograms using the DCOS
function and the out layer equals 1
Dcos(A,B) = 1 for the two histograms
Test Case 4: Record
Objective: To check whether the record module is working properly or not
Description: After selecting this option, the recording should start
Expected Behavior Observed Behavior
Pixelgrabber grab pixel for image array Recording starts. And Pixelgrabber grab
pixel for image array
Test Case 5: File Indexing
Objective: To check the working of file indexing module.
Description: After selecting this option, the file indexing should start and frames should
be captured.
Expected Behavior Observed Behavior
File Indexing Starts and keyframes are
captured.
File Indexing Starts and Keyframes are
captured.
Test Case 6: Image Feature Extraction.
Objective: To check if all the indexed Image list and their keyframes are displayed.
Description: After selecting this option, a list of all indexed Image is displayed and when
a video is selected, its keyframes are displayed.
Expected Behavior Observed Behavior
A list of indexed Image and their keyframes
are displayed.
A list of indexed Image and their keyframes
are displayed.
Test Case 7: Query.
Objective: To check if the query works properly and searches the image in indexed
Image.
Description: If the image queried is present in any video then the search is positive and a
path is displayed.
Expected Behavior Observed Behavior
If image is present in an indexed video then
path is displayed else not found.
If image is present in an indexed video then
path is displayed else not found.
Test Case 8: Exit function.
Objective: To check whether exit function is working correctly or not.
Description: When we click on exit button project should be closed.
Expected Behavior Observed Behavior
When we click on exit button project
should be closed.
Project is closed successfully.
7.2 GUI testing:-
Graphical User Interface (GUI) presents interesting challenges for software
engineers. Because of reusable components provided as part of GUI development
environments, the creation of the user interface has become less time consuming
and more precise. But, at the same time, the complexity of GUIs has grown, leading to more difficulty in the design and execution of the test cases. Because many modern GUIs have the same look and feel, a series of test cases can be derived.
Test Case 1:
Objective: To check whether the menu selection process is working properly.
Description: When we select any option from the menu, then that chosen option is
selected and appropriate action is taken.
Expected Behavior Observed Behavior
Chosen option is selected and appropriate
action is taken.
Chosen option is selected and appropriate
action is taken.
Test Case 2:
Objective: To check working of right-click menu which are on the main form.
Description: To check whether right-click menu shortcut are working properly.
Expected Behavior Observed Behavior
The shortcuts work properly. The shortcuts work properly.
7.3 System Testing:-
System testing is actually a series of different tests whose primary purpose is to fully exercise a computer-based system. Although each test has a different purpose, all work to verify that system elements have been properly integrated and perform their allocated functions.
Test Case 1:
Objective: To check whether the system is working properly.
Description: The analysis, Classification and Detection are working properly.
Expected Behavior Observed Behavior
The analysis, indexing and Detection are
working properly.
The analysis, indexing and Detection are
working properly.
Chapter 7:Schedule Work And Estimate
Estimation and Efforts:-
The cost feasibility of the project can be estimated using standard estimation models such as Lines of Code (LOC), which allow us to estimate cost as a function of size.
Thus, this also allows us to estimate and analyze the feasibility of completion of the
system in the given timeframe. This allows us to have a realistic estimate as well as a
continuous evaluative perspective of the progress of the project.
Number of people working on this project = 3
Duration of project = August 2010 to April 2011
The project is divided over a period from August 2010 to April 2011. This time span is divided
into two major parts as follow.
DURATION      FROM DATE   TO DATE    WEEKS   HOURS/WEEK
Duration I    August      November   14      6
Duration II   Jan         April      16      10
Table 4.2.1 Duration Table
Due to the academic compulsions we will be available for the project for following man-hours.
For Duration I : 14 * 6 = 84 MAN HOURS
For Duration II : 16 * 10 = 160 MAN HOURS
TOTAL availability = 224 MAN HOURS
Name of module LOC count
Capture 667
Analysis 445
Recording 430
File Indexing 460
Query 221
Total 2223
Table 4.2.2 KLOC Table
The Constructive Cost Model (COCOMO) computes software development effort (and cost) as a function of program size. Program size is expressed in estimated thousands of lines of code (KLOC).
COCOMO applies to three classes of software projects:
• Organic projects: Small teams with good experience working with less than rigid
requirements
• Semi-detached projects: Medium teams with mixed experience working with a mix of
rigid and less than rigid requirements
• Embedded projects: Developed within a set of tight constraints (hardware, software,
operational)
KLOC is the estimated number of delivered lines (expressed in thousands) of code for
project, The coefficients a, b, c and d are given in the following table.
Software project a b c d
Organic 2.4 1.05 2.5 0.38
Semi- detached 3.0 1.12 2.5 0.35
Embedded 3.6 1.20 2.5 0.32
Table 4.2.3 COCOMO coefficients Table
In COCOMO model the effort can be calculated as:
Effort Applied (E) = a * (KLOC) ^ b (man-months)
And duration of the project can be estimated as:
Development Time (D) = c* E ^ d (months)
Our project comes under the organic (image processing) category.
So, a = 2.4, b = 1.05, c = 2.5, d = 0.38
E = 2.4 * (2.223) ^ 1.05
= 5.5526 man-months
D = 2.5 * (5.5526) ^ 0.38
= 4.7957 months
According COCOMO model, the average cost per person month is Rs 10,000 so overall
software cost can be estimated as,
Software cost = E * 10,000
= 5.5526 * 10,000
= Rs 55,526.00 (approx)
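The same calculation as a small script, using the organic-mode coefficients from Table 4.2.3, the KLOC total from Table 4.2.2, and the assumed rate of Rs 10,000 per person-month; note that the development time is computed from the effort E, as the formula above specifies:

def cocomo_organic(kloc, cost_per_person_month=10_000, a=2.4, b=1.05, c=2.5, d=0.38):
    effort = a * kloc ** b              # Effort Applied (E), in man-months
    duration = c * effort ** d          # Development Time (D), in months
    return effort, duration, effort * cost_per_person_month

effort, duration, cost = cocomo_organic(2.223)
print(f"E = {effort:.4f} man-months, D = {duration:.4f} months, cost = Rs {cost:,.2f}")
# prints approximately: E = 5.55 man-months, D = 4.80 months, cost = Rs 55,500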
4.3 Time Line Schedule:-
From To Task
01-08-2010 06-08-2010 Group Formation and finalization
07-08-2010 13-08-2010 Topic Search and Finalization
14-08-2010 20-08-2010 Preliminary Information Gathering
21-08-2010 27-08-2010 Synopsis Preparation and Submission
28-08-2010 03-09-2010 Project Discussion with Coordinator and Topic
Finalization
04-09-2010 10-09-2010 Detailed Literature Survey
11-09-2010 24-09-2010 Algorithm Finalization and Detailed Study
25-09-2010 01-10-2010 Drawing UML diagrams
02-10-2010 08-10-2010 Preparing PPT
09-10-2010 15-10-2010 Preparing Mid Term Report
16-10-2010 26-11-2010 Language Study (Visual C# .NET)
01-01-2011 02-03-2011 Coding and Implementation
03-03-2011 27-04-2011 Documentation
Table 4.3 Time Line Schedule Table
4.4 Time Line Chart:
Figure 4.4 Time line chart
Chapter 8: Conclusion and Future Direction
Our proposed system implements a novel classification mechanism for efficiently analyzing brain tumor images using the RDTNN classifier. We utilized the ROI (Region of Interest) segmentation method for the CT images. Using DWT, the key features are extracted; the extracted features are taken as input to RDT to reduce the dimensionality of the feature space. The images were then trained with the KNN classifier. Finally, the proposed algorithm efficiently classifies human brain images as benign or malignant with high sensitivity, specificity and accuracy rates. This study shows several advantages of the technique: it is accurate, robust, easy to operate, non-invasive and inexpensive. In future work, we plan to explore different types of medical images as well as some other application domains and to study some formal properties of image features.
References
[1] I. Kononenko, “Machine learning for medical diagnosis: History, state of the art and
perspective,” Artif. Intell. Med., vol. 23, no. 1, pp. 89–109,
2001.
[2] G. D. Magoulas and A. Prentza, “Machine learning in medical applications,” Mach. Learning
Appl. (Lecture Notes Comput. Sci.),
Berlin/Heidelberg, Germany: Springer, vol. 2049, pp. 300–307, 2001.
[3] L. Breiman, “Bagging predictors,” Mach. Learning, vol. 24, no. 2, pp. 123–140, 1996.
[4] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of online learning and an
application to boosting,” J. Comput. Syst. Sci., vol. 55,
no. 1, pp. 119–139, 1997.
[5] T.K.Ho, “The random subspace method for constructing decision forests,” IEEE Trans.
Pattern Anal.Mach. Intell., vol. 20, no. 8, pp. 832–844, 1998.
[6] L. Breiman, “Random forests,” Mach. Learning, vol. 45, pp. 5–32, 2001.
[7] L. Rokach and O. Maimon, Data Mining with Decision Trees Theory and Applications
(Machine Perception and Artificial Intelligence Series 69). H. Bunke and P. S. P. Wang, Eds.
Singapore: World Scientific, 2008.
[8] A. L. Prodromidis, S. J. Stolfo, and P. K. Chan, “Effective and efficient pruning of
metaclassifiers in a distributed data mining system,”, Columbia
Univ., New York, Tech. Rep. CUCS-017-99, 1999.
[9] M. Robnik-Sikonja, “Improving random forests,” in Proc. Eur. Conf. Mach. Learning, 2004,
pp. 359–369.
[10] A. Tsymbal, M. Pechenizkiy, and P. Cunningham, “Dynamic integration with random
forests,” in Proc. Eur. Conf. Mach. Learning, vol. 4212,
Berlin/Heidelberg, Germany: Springer, 2006.
[11] P. Cunningham, “A taxonomy of similarity mechanisms for case-based reasoning,”,
University College Dublin, Dublin, Ireland, Tech. Rep. UCDCSI-
2008-01, 2008.
[12] H. Hu, J. Li, H. Wang, G. Daggard, and M. Shi, “A maximally diversified multiple decision
tree algorithm for microarray data classification,”, presented at the Workshop Intell. Syst.
Bioinformat., Hobart, Australia
2006.
[13] S. Gunter and H. Bunke, “Optimization of weights in a multiple classifier handwritten word
recognition system using a genetic algorithm,”
Electron. Letters Comput. Vision Image Anal., pp. 25–41, 2004.
[14] E. E. Tripoliti, D. I. Fotiadis, M. Argyropoulou, and G. Manis, “A six stage approach for the
diagnosis of the Alzheimer’s disease based on
fMRI data,” J. Biomed. Informat., vol. 43, pp. 307–310, 2010.
[15] S. Bernard, L. Heutte, and S. Adam, “On the selection of decision trees
in random forests,” in Proc. IEEE-ENNS Int. Joint Conf. Neural Netw.,
2009, pp. 302–307.
[16] E. E. Tripoliti, D. I. Fotiadis, and G. Manis, “Modifications of random
forests algorithm,” Data Knowl. Eng., to be published.
[17] E. Gatnar, “A diversity measure for tree-based classifier ensembles,” in
Data Analysis and Decision Support, D. Baier, et al.., Eds. Heidelberg,
Germany: Springer, pp. 30–38, 2005.
[18] G. Giacinto, F. Roli, and G. Fumera, “Design of effective multiple classifiers
systems by clustering of classifiers,” in Proc. 15th Int. Conf. Pattern
Recog., 2000, pp. 160–163.
[19] G. Martinez-Munoz and A. Suarez, “Pruning in ordered bagging ensembles,”
in Proc. 23rd Int. Conf. Mach. Learning, 2006, pp. 609–616.
[20] C. Orrite, M. Rodriquez, F. Martinez, and M. Fairhurst, “Classifier ensemble
generation for the majority vote rule,” in Lecture Notes on Computer
Science, J. Ruiz-Shulcloper et al., Eds. BerlinHeidelberg, Germany:
Springer-Verlag, pp. 340–347, 2008.
[21] P. Letinne, O. Bebeir, and C. Decaestecker, “Limiting the number
of trees in random forests,” in Lecture Notes on Computer Science.
BerlinHeidelberg, Germany: Springer-Verlag, 2001, pp. 178–187.
[22] J. Xiao and Ch. He, “Dynamic classifier ensemble selection based on
GMDH,” in Proc. Int. Joint Conf. Comput. Sci. Optimization, 2009,
pp. 731–734.
[23] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, “A
comparison of decision tree ensemble creation techniques,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 173–180, Jan. 2007.
[24] S. Bernard, L. Heutte, and S. Adam, “Forest-RK: A new random forest
induction method,” in Proc. Int. Conf. Intell. Comput. 2008. Lecture
Notes in Artificial Intelligence 5227, D.-S. Huang, et al., Eds. Heidelberg,
Germany: Springer, 2008a, pp. 430–437.
[25] E. E. Tripoliti, D. I. Fotiadis, and G. Manis, “Dynamic construction of
random forests: Evaluation using biomedical engineering problems,”, presented
at the 10th Int. Conf. Inf. Technol. Appl. Biomed. Corfu, Greece,
2010.
[26] G. W. Brier, “Verification of forecasts expressed in terms of probability,”
Monthly Weather Review, vol. 78, pp. 1–3, 1950.
The improvement is achieved by determining automatically the only tuning parameter of the algorithm: the number of base classifiers that compose the ensemble, which directly affects its performance.

Random forests are a substantial modification of KNN [3]–[6]. The algorithm constructs a large number of unpruned, decorrelated trees. The generation of the trees is based on the combination of two sources of randomness: first, each tree is constructed on a bootstrap replicate of the original dataset, as in KNN, and second, a random feature subset of fixed, predefined size is considered for splitting each node of the tree. The Gini index is used as the feature evaluation measure that determines the best split. Each decision tree is built to the maximum size without pruning. The random forest classifies each new instance by the majority vote of the full set of trees.
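To make the construction procedure described above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes integer class labels and uses scikit-learn's DecisionTreeClassifier, whose max_features option draws a random feature subset at every node, matching the description.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=10, m_features="sqrt", rng=None):
    """Grow unpruned trees on bootstrap replicates; m features are tried per node."""
    rng = np.random.default_rng(rng)
    forest, n = [], len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)          # bootstrap replicate of the data
        tree = DecisionTreeClassifier(
            criterion="gini",                     # Gini index selects the best split
            max_features=m_features,              # random feature subset at each node
            random_state=int(rng.integers(1 << 31)),
        )                                         # no depth limit: grown without pruning
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def forest_predict(forest, X):
    """Majority vote over the full set of trees (integer class labels assumed)."""
    votes = np.stack([t.predict(X) for t in forest])       # shape (n_trees, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```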
One of the most important issues in the creation of an ensemble classifier, such as random forests, is the size of the ensemble: how many classifiers compose it and how unnecessary classifiers are removed from it. The factors that may affect the size of the ensemble are: 1) the desired accuracy, 2) the computational cost, 3) the nature of the classification problem, and 4) the number of available processors. The methods reported in the literature that deal with this problem can be grouped into three categories: 1) methods that preselect the ensemble size, 2) methods that postselect the ensemble size (pruning of the ensemble), and 3) methods that select the ensemble size during training.

Preselection methods are the simplest way to determine the ensemble size: the number of base classifiers is simply a tuning parameter of the algorithm that can be set by the user. Pruning methods comprise precombining and postcombining methods [8]. In the first case, pruning is performed before combining the classifiers, and only the classifiers that seem to perform well are included in the ensemble; the predictive strength of a classifier is determined using different evaluation measures. In postcombining pruning methods, classifiers are removed from the ensemble based on their contribution to the collective. More specifically, most postcombining pruning methods are based on the overproduce-and-choose strategy, which consists of two phases. The overproduction phase aims to produce a large initial pool of candidate classifiers, while the selection phase aims to choose adequate classifiers from the pool so that the selected group can achieve the optimum positive predictive rate. In the selection phase, different approaches are used; ensemble selection methods can be grouped into the following categories: 1) weighted voting methods, 2) search-based methods, 3) clustering-based methods, 4) ranking methods, and 5) methods that optimize a measure or function.

Architecture for previous method

The proposed method is based on the iterative procedure shown in Fig. 1. The method consists of three basic steps: 1) the construction of the initial forest, 2) the application of the fitting procedure, and 3) the examination of the termination criterion.

1) Construction of the Forest: In the first step, the method constructs a forest with ten trees. For the construction of the forest, the classical random forests algorithm and several modifications of it are used: random forests with ReliefF (RF with ReliefF) [9], random forests with multiple estimators (RF with me) [9], RK Random Forests (RK-RF) [24], and RK Random Forests with multiple estimators (RK-RF with me) [16]. The classical random forests algorithm constructs a collection of trees.
For the construction of each tree, a bootstrap sample of the dataset is selected. The tree is built to the maximum size without pruning; it is grown until each terminal node contains only members of a single class. The Gini index [9] is used to determine the best split of each node. Only a subset m of the total set of M features is employed as the candidate splitters of each node of the tree, and the number of selected features (m) remains constant throughout the construction of the forest. If a plurality of the trees agrees on a given classification, then that is the predicted class of the sample being classified.

In RF with ReliefF, the Gini index is replaced by ReliefF. ReliefF evaluates the partitioning capability of attributes according to how well their values distinguish between similar instances. The replacement of the Gini index is also the core idea of RF with me; however, five evaluation measures are used instead of one: Gini index [9], gain ratio [9], ReliefF [9], minimum description length [9], and myopic ReliefF [9]. The differentiation between classical random forests and RK-RF lies in the value of the parameter m: it is not the same throughout the construction of the forest, but is randomly chosen for each node of the tree. Finally, in RK-RF with me, the random selection of the parameter m is combined with multiple estimators to accomplish the construction of the forest. A detailed description of these algorithms is provided in [16]. During the construction of the forest, the accuracy and the average correlation of the forest are computed each time a new tree is added. The forest initially consists of ten trees; this initial value is chosen because the fitting procedure applied in the next step needs an adequate number of points to start.

2) Fitting Procedure: After the construction of the initial forest, an iterative procedure is used. The procedure consists of three basic stages: 1) add a new tree, 2) apply polynomial fits, and 3) select the best fit. The polynomial fits that are employed are given by:

f_{n-1}(x) = p_n x^n + p_{n-1} x^(n-1) + ... + p_0,  n = 2, ..., 9    (1)

where x is the data to be fitted and p_n are the coefficients of the polynomial. The best fit is the one with the minimum RMS error (the root of the average of the squares of the differences between the predicted and actual values).

3) Examination of the Termination Criterion: In this step, the method examines whether the stopping criterion is fulfilled. For this purpose, three different criteria were tested in order to identify the best one. The first criterion (criterion 1) searches for consecutive points in the fitted curve where the difference between the fitted curve and the accuracy curve is greater than a predefined threshold. If there is such a region, the method terminates and returns the point of this region at which the maximum accuracy is observed. The number of consecutive points should be at least 10 and the point-by-point difference between the curves should be greater than 0.004 [25]. If the criterion is fulfilled and there is more than one point at which the maximum accuracy is observed, the one with the lowest Brier score [26] is selected. The Brier score measures the average squared deviation between the predicted probabilities for a set of events and their outcomes; thus, the lowest score corresponds to the highest accuracy.
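The fitting and termination steps can be sketched as follows. This is a minimal illustration of the procedure described above, assuming that an array of accuracy values (one value recorded after each added tree) is available; the Brier-score tie-break is omitted for brevity, and the run-detection logic is simplified.

```python
import numpy as np

def best_polynomial_fit(acc, degrees=range(2, 10)):
    """Fit polynomials of several degrees to the accuracy curve and keep the
    one with the minimum RMS error between fitted and observed values."""
    x = np.arange(1, len(acc) + 1)
    best = None
    for d in degrees:
        coeffs = np.polyfit(x, acc, d)
        fitted = np.polyval(coeffs, x)
        rms = np.sqrt(np.mean((fitted - acc) ** 2))
        if best is None or rms < best[0]:
            best = (rms, fitted)
    return best[1]

def criterion_1(acc, fitted, min_run=10, threshold=0.004):
    """Stop once at least `min_run` consecutive points differ from the fitted
    curve by more than `threshold`; return the index of maximum accuracy in
    that region, or None if the criterion is not yet fulfilled."""
    acc = np.asarray(acc)
    gap = np.abs(acc - fitted) > threshold
    run_start = None
    for i, flag in enumerate(gap):
        if flag and run_start is None:
            run_start = i
        elif not flag:
            run_start = None
        if run_start is not None and i - run_start + 1 >= min_run:
            region = slice(run_start, i + 1)
            return run_start + int(np.argmax(acc[region]))
    return None
```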
Chapter 3: Proposed Work

Statement of Problem: Prior work attributed the success of such systems to two main components: discrimination and randomization. Discrimination refers to the use of SVM to learn the splits at each node, whereas randomization refers to a random selection of image patches, which are used as a form of features to learn the splits at each node. Several problems may arise from this randomization procedure. Firstly, if we consider image patches of size 50x50 in a 500x500 image, the sampling space may contain thousands of patches, which makes it less likely that a randomly selected patch will contain an object of interest for the image categorization. In addition, randomly selected samples are more likely to overlap with each other, which causes redundancy. Therefore, in this project, I investigated new ways of selecting image patches. In theory, more informative patch selection should result in higher quality splits at each tree node, which in turn should increase the overall accuracy of the classifier.

Features and Scope: To fix the problems related to random patch selection, I integrated a selective search segmentation algorithm into the original random forest framework. Image patches selected using selective search segmentation are more likely to contain the objects of interest. In addition, segmentation should eliminate redundant overlap between the image patches, which makes the feature space more diverse. Fixing these two problems should result in increased discriminative power of the random forest.

Goals: Before beginning the Random Forest procedure, I standardize each image by rescaling it to a common size and then apply Selective Search Segmentation to extract important regions from it. Each region is represented by 4 coordinates in the image (the bottom-left and top-right corners of the region). Then, SVM is applied to all the regions returned by Selective Search Segmentation, and the resulting centroids are chosen as the candidate regions; in this particular case, I used 1024 centroids.
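A minimal sketch of the region-proposal step described above, assuming the opencv-contrib-python package (which provides cv2.ximgproc) is installed; the rescale size, file path, and the simple truncation to a fixed number of regions are illustrative assumptions, and the clustering of regions into 1024 centroids is not shown.

```python
import cv2
import numpy as np

def candidate_regions(image_path, size=(300, 300), max_regions=1024):
    """Rescale an image and return selective-search region proposals as
    (x1, y1, x2, y2) corner coordinates, as described in the Goals section."""
    img = cv2.imread(image_path)
    img = cv2.resize(img, size)                      # standardize the image size
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(img)
    ss.switchToSelectiveSearchFast()
    rects = ss.process()                             # each region as (x, y, w, h)
    boxes = [(x, y, x + w, y + h) for x, y, w, h in rects[:max_regions]]
    return np.array(boxes)
```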
INTRODUCTION

The decision tree is one of the machine learning methods widely used to analyze proteomics data. Generated from a given dataset, a single decision tree reports a classification result through each of its terminal leaves (classifiers). Even though many algorithms, such as C4.5, can generate a well-modeled single decision tree, its prediction may still be biased, which adversely affects its accuracy. To overcome this problem, more than one decision tree is used to analyze the data. The approach is based on the concept of forming a panel of experts who then vote to decide the final outcome. The panel of experts is analogous to an ensemble of decision trees, which provides a pool of classifiers. As in voting, the class that receives the majority of votes becomes the final classification result for the data; in general, the decision tree ensemble is more accurate than a single decision tree.

The diagram shown summarizes the process of generating the decision tree ensemble and classifying data with the KNN algorithm. Briefly, the KNN algorithm begins by randomly sampling (with replacement) the data from the original dataset to form a training set. Multiple training sets are usually generated. Note that, since replacement is allowed, records in a training set can be duplicated. Each training set then generates a decision tree. For given test data, each decision tree predicts an outcome, represented by a classifier. The ensemble of decision trees forms a panel of experts whose votes determine the final classification result from this group of classifiers.
ARCHITECTURE DIAGRAM

The performance in identifying biomarkers for premalignant pancreatic Alzheimer could be enhanced by using decision tree ensemble techniques instead of a single-algorithm counterpart. These techniques proved more likely to accurately distinguish the disease class from the normal class, as indicated by a larger area under the Receiver Operating Characteristic curve; moreover, they achieved comparatively lower root mean squared errors. In the method, the peptide mass-spectrometry data were processed first to improve data integrity and to reduce variation among the data due to differences in sample loading conditions. The preprocessing steps involved baseline adjustment using the group median, smoothing with a Gaussian kernel to remove noise, and normalization to make all the data comparable. After that, the data were randomly sampled such that 90% formed a training set and the remaining 10% formed a test set. The training set was used for feature selection. In the study, the authors considered three different feature selection methods. The first method was a two-sample homoscedastic t test, used under the assumption that the features from both the normal and the disease class were normally distributed. Unlike the first method, the second method, based on the ANDI rank test, made no distributional assumption about the features. The last feature selection method was a genetic algorithm.
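As an illustration of the first feature selection step, the sketch below ranks features by a two-sample homoscedastic t test. It is a minimal example, not the study's code, and assumes the spectra are rows of a NumPy array with binary labels (1 = disease, 0 = normal); the number of retained features is an illustrative choice.

```python
import numpy as np
from scipy.stats import ttest_ind

def t_test_feature_selection(X, y, k=20):
    """Rank features by a two-sample t test between the disease (y == 1)
    and normal (y == 0) groups and keep the k most significant."""
    disease, normal = X[y == 1], X[y == 0]
    # equal_var=True corresponds to the homoscedastic (pooled-variance) test
    _, p_values = ttest_ind(disease, normal, axis=0, equal_var=True)
    top = np.argsort(p_values)[:k]          # smallest p-values first
    return top, p_values[top]
```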
The test set was then used to evaluate a single decision tree as well as the decision tree ensembles. The ensemble methods studied were Random Forest, Random Tree, KNN, boosting, Stacking, AdaBoost, and MultiBoost. Their performance was measured in terms of accuracy and error in classifying the features selected by each selection method, and was compared against the performance of a single decision tree generated by the C4.5 algorithm. The process was repeated ten times to validate the consistency of the resulting performance.

According to the reported results, the decision tree ensembles achieved higher accuracy, up to 70%, regardless of the feature selection method used. In terms of biomarker identification, both the t test and the ANDI rank test performed similarly well, consistently selecting the same biomarker-suspect features, whereas the performance of the genetic algorithm was considerably poorer. It was also noted that 70% accuracy was still lower than expected. This could result from the naturally low concentration of the biomarkers at the premalignant stage of the disease. In addition, it is also possible that a single dataset is not suitable for all algorithms, thus underestimating the accuracy.
Raw spectrum data: We use GAUSSIAN EDGE with 4 levels.

Gaussian kernel smoothing: A process of averaging the data points by applying a Gaussian function. The Gaussian function is used to generate a set of normalized weighting coefficients for the data points, whose weighted sum produces a new value; this new value replaces the old one at the center of the Gaussian window.
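A minimal sketch of the smoothing step described above, assuming the spectrum is a 1-D NumPy array; the kernel width and sigma are illustrative choices, not values taken from the study.

```python
import numpy as np

def gaussian_smooth(signal, sigma=2.0, radius=6):
    """Replace each point by a Gaussian-weighted average of its neighbours."""
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    weights /= weights.sum()                 # normalized weighting coefficients
    # mode="same" keeps the output aligned with the centre of the Gaussian window
    return np.convolve(signal, weights, mode="same")
```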
Goal and overview of this research: The goal of this research work was to extract the meaningful knowledge that lies in the database and transform it into meaningful rules.

Block diagram of the research work process

The rules are then used to predict the class labels of unknown data, and finally KNN and Boosting are introduced to improve the accuracy of the whole process. Keeping this goal in mind, the whole research process was constructed as shown in the block diagram in Fig. 1. Here, decision tree induction algorithms are used to turn the knowledge hidden in a large dataset into decision rules, and enhancements are made to these algorithms to extract and apply the rules more precisely, improving accuracy. In this research we used the heart disease dataset collected from the UCI machine learning repository. At first, the ID3 algorithm is used to extract rules from the dataset and to apply the rules to classify new data; this is implemented in C#. C4.5, the successor of ID3, is then used to classify data more accurately. Finally, two new approaches, KNN and Boosting, are introduced to improve the predictive accuracy of C4.5.
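The overall pipeline (single tree, then a bagged ensemble, then boosting) can be illustrated with the following comparison. This is a hedged Python sketch only; the report's own implementation is in C#, the file name heart.csv and the 'target' column are hypothetical placeholders, and the scikit-learn estimators stand in for ID3/C4.5-style trees.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier

# Hypothetical layout: feature columns plus a 'target' column holding the class label.
data = pd.read_csv("heart.csv")
X, y = data.drop(columns="target"), data["target"]

models = {
    "single tree (entropy split)": DecisionTreeClassifier(criterion="entropy"),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
    "AdaBoost": AdaBoostClassifier(n_estimators=50),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```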
Background study

Classification and prediction: Data classification is a two-step process. In the first step, a model is built that describes a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label attribute. In the context of classification, data tuples are also referred to as samples or objects. The data tuples analyzed to build the model collectively form the training dataset. The individual tuples making up the training set are referred to as training samples and are randomly selected from the sample population. Since the class label of each training sample is provided, this step is also known as supervised learning. It contrasts with unsupervised learning, in which the class label of each training sample is not known and the number or set of classes to be learned may not be known in advance. Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled sample, or to assess the value or value ranges of an attribute that a given sample is likely to have. In this view, classification and regression are the two major types of prediction problems, where classification is used to predict discrete or nominal values while regression is used to predict continuous or ordered values. In this work, however, we refer to the prediction of class labels as classification and to the prediction of continuous values as prediction.

Decision tree induction: Decision tree induction is a greedy algorithm that constructs a decision tree in a top-down, recursive, divide-and-conquer manner. A decision tree is a tree in which each branch node represents a choice between a number of alternatives and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision making. The algorithm starts with a root node and splits each node recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible decision scenario and its outcome. For extracting rules, the information gain measure is used to select the test attribute at each node in the tree: the attribute with the highest information gain is chosen as the test attribute for the current node, and the path from the root node to each leaf node is traced to construct rules from the dataset. Decision trees use induction to provide an appropriate classification of objects in terms of their attributes, inferring decision tree rules. In the learning phase, explicit rules or interactions among relevant features are induced. Such a learning method differs from non-linear classifiers such as support vector machines or neural networks, where the learning phase determines the parameters of non-linear kernel functions.
ID3 algorithm:

The ID3 (Iterative Dichotomiser 3) technique for building a decision tree is based on information theory and attempts to minimize the expected number of comparisons. The basic idea of the induction algorithm is to ask questions whose answers provide the most information: an effective first question divides the search space into two large subdomains, whereas an ineffective one performs little division of the space. The basic strategy used by ID3 is to choose splitting attributes with the highest information gain first. The amount of information associated with an attribute value is related to its probability of occurrence. Let node N represent or hold the tuples of partition D. The attribute with the highest information gain is chosen as the splitting attribute for node N; this attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or impurity in these partitions. To calculate the gain of an attribute, we first calculate the entropy of the set S by the following formula:

Entropy(S) = - sum_{j=1..m} p_j log2(p_j)        (1)

where p_j is the probability that an arbitrary tuple in S belongs to class C_j, estimated by |C_j,D| / |D|. A log function to base 2 is used because the information is encoded in bits. Entropy(S) is simply the average amount of information needed to identify the class label of a tuple in S. Now, the gain of an attribute is calculated by the formula

Gain(A) = Entropy(S) - sum_{i=1..n} (|S_i| / |S|) Entropy(S_i)        (2)

where S_i = {S_1, S_2, ..., S_n} are the partitions of S according to the values of attribute A, n is the number of distinct values of attribute A, |S_i| is the number of cases in partition S_i, and |S| is the total number of cases in S. Information gain is thus defined as the difference between the original information requirement and the new requirement:

Gain(A) = Entropy(S) - Entropy_A(S),  where Entropy_A(S) = sum_{i=1..n} (|S_i| / |S|) Entropy(S_i)        (3)

In other words, Gain(A) tells us how much would be gained by branching on A: it is the expected reduction in the information requirement caused by knowing the value of A. The attribute A with the highest information gain is chosen as the splitting attribute at node N.
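A small Python sketch of formulas (1)–(3), assuming a categorical dataset held in a pandas DataFrame; the column names are illustrative, and this is only a minimal helper, not the C# implementation described earlier.

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """Entropy(S) = -sum_j p_j * log2(p_j) over the class distribution."""
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(df, attribute, target="class"):
    """Gain(A) = Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i)."""
    total = entropy(df[target])
    weighted = sum(
        len(part) / len(df) * entropy(part[target])
        for _, part in df.groupby(attribute)
    )
    return total - weighted

# ID3 picks the attribute with the highest information gain as the split, e.g.:
# best = max(candidate_attributes, key=lambda a: information_gain(df, a))
```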
MATERIALS AND METHODS

New decision tree learning algorithms: The C4.5 algorithm is Quinlan's extension of his own ID3 algorithm for generating decision trees. KNN and Boosting are general strategies for improving classifier and predictor accuracy. Suppose that we are a patient and would like to have a diagnosis made based on our symptoms. Instead of asking one doctor, we may choose to ask several. If a certain diagnosis occurs more often than any other, we may choose it as the final or best diagnosis; that is, the final diagnosis is made by a majority vote in which each doctor gets an equal vote. Replacing each doctor by a classifier gives the basic idea behind KNN. In boosting, we assign weights to each doctor's diagnosis based on the accuracy of the previous diagnoses they have made, and the final diagnosis is a combination of the weighted diagnoses.

C4.5 Algorithm: Just as with CART, the C4.5 algorithm recursively visits each decision node, selecting the optimal split, until no further splits are possible. The steps of the C4.5 algorithm for growing a decision tree are given below:
• Choose an attribute for the root node
• Create a branch for each value of that attribute
• Split the cases according to the branches
• Repeat the process for each branch until all cases in the branch have the same class

How is an attribute chosen as the root node? First, we calculate the gain ratio of each attribute; the root node is the attribute whose gain ratio is maximum. The gain ratio is calculated by the formula

GainRatio(A) = Gain(A) / SplitInfo(A)        (4)
where A is the attribute whose gain ratio is being calculated. The attribute A with the maximum gain ratio is selected as the splitting attribute. This attribute minimizes the information needed to classify the tuples in the resulting partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and helps to guarantee that a simple tree is found. To calculate the gain of an attribute, we first calculate the entropy of the set S by the following formula:

Entropy(S) = - sum_{i=1..m} p_i log2(p_i)        (5)

where p_i is the probability that an arbitrary tuple in S belongs to class C_i, estimated by |C_i,D| / |D|. A log function to base 2 is used because the information is encoded in bits. Entropy(S) is simply the average amount of information needed to identify the class label of a tuple in S. The gain of an attribute is then calculated by the formula

Gain(A) = Entropy(S) - sum_{i=1..n} (|S_i| / |S|) Entropy(S_i)        (6)

where S_i = {S_1, S_2, ..., S_n} are the partitions of S according to the values of attribute A, n is the number of distinct values of attribute A, |S_i| is the number of cases in partition S_i, and |S| is the total number of cases in S. The gain ratio divides the gain by the evaluated split information; this penalizes splits with many outcomes:

SplitInfo(A) = - sum_{i=1..n} (|S_i| / |S|) log2(|S_i| / |S|)        (7)

The split information is the weighted average of the information computed from the proportion of cases passed to each child. When there are cases with unknown outcomes on the split attribute, the split information treats this as an additional split direction; this is done to penalize splits that are made using cases with missing values. After finding the best split, the tree continues to be grown recursively using the same process.
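Building on the information_gain helper shown in the earlier ID3 sketch, the gain ratio of formulas (4)–(7) can be written as follows; again this is a minimal illustration over a pandas DataFrame, not the report's C# implementation.

```python
import numpy as np

def split_info(df, attribute):
    """SplitInfo(A) = -sum_i |S_i|/|S| * log2(|S_i|/|S|)."""
    proportions = df[attribute].value_counts(normalize=True)
    return float(-(proportions * np.log2(proportions)).sum())

def gain_ratio(df, attribute, target="class"):
    """GainRatio(A) = Gain(A) / SplitInfo(A); C4.5 splits on the maximum."""
    si = split_info(df, attribute)
    if si == 0.0:                     # attribute has a single value: no useful split
        return 0.0
    # information_gain() is the helper defined in the earlier ID3 sketch
    return information_gain(df, attribute, target) / si
```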
KNN:

We first take an intuitive look at how KNN works as a method of increasing accuracy. As with the panel-of-doctors analogy described earlier, the final diagnosis is made by a majority vote in which each doctor gets an equal vote; replacing each doctor by a classifier gives the basic idea behind KNN. Intuitively, a majority vote made by a large group of doctors may be more reliable than a majority vote made by a small group.

Given a set D of d tuples, KNN works as follows. For iteration i (i = 1, 2, 3, ..., k), a training set Di of d tuples is sampled with replacement from the original set of tuples D. Note that the term KNN is used here in the sense of bootstrap aggregation: each training set is a bootstrap sample. Because sampling with replacement is used, some of the original tuples of D may not be included in Di, whereas others may occur more than once. A classifier model Mi is learned for each training set Di. To classify an unknown tuple X, each classifier Mi returns its class prediction, which counts as one vote. The bagged classifier M* counts the votes and assigns the class with the most votes to X. KNN can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple.

Algorithm: KNN. The KNN algorithm creates an ensemble of models (classifiers or predictors) for a learning scheme where each model gives an equally weighted prediction.

Input:
• D, a set of training tuples
• k, the number of models in the ensemble
• a learning scheme (e.g., decision tree algorithm, backpropagation, etc.)

Output: A composite model, M*

Method:
• For i = 1 to k do // create k models
• Create a bootstrap sample Di by sampling D with replacement
• Use Di to derive a model Mi
• Endfor

To use the composite model on a tuple X:
• If classification, let each of the k models classify X and return the majority vote
• If prediction, let each of the k models predict a value for X and return the average predicted value
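The pseudocode above maps almost line for line onto the following Python sketch, which parallels the earlier forest example but leaves the learning scheme as a parameter and covers both the classification and prediction cases. It assumes NumPy arrays and integer class labels, and is an illustration rather than the report's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def build_composite(D_X, D_y, k=25, learner=None, rng=None):
    """Create k models, each trained on a bootstrap sample Di of (D_X, D_y)."""
    rng = np.random.default_rng(rng)
    learner = learner or DecisionTreeClassifier()
    models, n = [], len(D_X)
    for _ in range(k):
        idx = rng.integers(0, n, size=n)          # bootstrap sample with replacement
        models.append(clone(learner).fit(D_X[idx], D_y[idx]))
    return models

def composite_classify(models, X):
    """Majority vote of the k models (classification case)."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

def composite_predict(models, X):
    """Average prediction of the k models (prediction case)."""
    return np.mean([m.predict(X) for m in models], axis=0)
```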
• 16. To use the composite model on a tuple X:
• If classification, then let each of the k models classify X and return the majority vote.
• If prediction, then let each of the k models predict a value for X and return the average predicted value.
The bagged classifier often has significantly greater accuracy than a single classifier derived from D, the original training data. It will not be considerably worse, and it is more robust to the effects of noisy data. The increased accuracy occurs because the composite model reduces the variance of the individual classifiers. For prediction, it has been theoretically shown that a bagged predictor will always have improved accuracy over a single predictor derived from D.
Boosting: Boosting is a general method for improving the accuracy of any given learning algorithm. It is an effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. In this work we focus especially on AdaBoost.
AdaBoost algorithm: In AdaBoost, the input includes a dataset D of d class-labeled tuples, an integer k specifying the number of classifiers in the ensemble, and a classification learning scheme. Each tuple in the dataset is assigned a weight; the higher the weight, the more the tuple influences the learned model. Initially, all weights are set to the same value of 1/d. The algorithm repeats for k rounds. In each round, a model M_i is built on the current dataset D_i, which is obtained by sampling with replacement from the original training dataset D according to the tuple weights. The framework of this algorithm is as follows:
Algorithm: AdaBoost
Input:
• D, a set of d class-labeled training tuples
• k, the number of rounds
• A classification learning scheme
• 17. Output: A composite model
Method:
• Initialize the weight of each tuple in D to 1/d
• For i = 1 to k do
• Sample D with replacement according to the tuple weights to obtain D_i
• Use training set D_i to derive a model, M_i
• Compute the error rate error(M_i) of M_i
• If error(M_i) > 0.5 then
• Reinitialize the weights to 1/d
• Go back to step 3 and try again
• Endif
• Update and normalize the weight of each tuple
• Endfor
The error rate of M_i is the sum of the weights of all tuples in D_i that M_i misclassified:

error(M_i) = \sum_{j=1}^{d} w_j \cdot err(X_j)    (8)

where err(X_j) = 1 if X_j is misclassified and err(X_j) = 0 otherwise. Then the weight of each tuple is updated so that the weights of misclassified tuples are increased and the weights of correctly classified tuples are decreased. This is done by multiplying the weight of each correctly classified tuple by error(M_i)/(1 - error(M_i)). The weights of all tuples are then normalized so that their sum is equal to 1; to enforce this constraint, the weight of each tuple is divided by the sum of the new weights.
• 18. After k rounds, a composite model (an ensemble of classifiers) is generated, which is then used to classify new data. When a new tuple X arrives, it is classified through these steps:
• Initialize the weight of each class to 0
• For i = 1 to k do
• Get the weight w_i of classifier M_i
• Get the class prediction for X from M_i: c = M_i(X)
• Add w_i to the weight for class c
• Endfor
• Return the class with the largest weight
The weight w_i of each classifier M_i is calculated by Eq. (9):

w_i = \log \frac{1 - error(M_i)}{error(M_i)}    (9)

Requirements for KNN and boosting: These two methods for utilizing multiple classifiers make different assumptions about the learning system. As noted above, KNN requires that the learning system not be stable, so that small changes to the training set lead to different classifiers. Breiman also notes that poor predictors can be transformed into worse ones by KNN. Boosting, on the other hand, does not preclude the use of learning systems that produce poor predictors, provided that their error on the given distribution can be kept below 50%. However, boosting implicitly requires the same instability as KNN; if C_t is the same as C_{t-1}, the weight adjustment scheme drives error(M_t) towards 0.5. A minimal sketch of the AdaBoost procedure described above is given below.
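This sketch reuses the IClassifier interface assumed in the earlier sketch; weighted resampling stands in for "sample D with replacement according to the tuple weights". All names are illustrative assumptions, not the project's code.

// Illustrative AdaBoost loop: tuple-weight updates (Eq. 8), classifier weights (Eq. 9),
// and weighted voting for prediction.
using System;
using System.Linq;
using System.Collections.Generic;

class AdaBoostEnsemble
{
    private readonly List<(IClassifier Model, double Weight)> ensemble =
        new List<(IClassifier, double)>();
    private static readonly Random Rng = new Random();

    public void Train(List<double[]> data, List<string> labels, int k,
                      Func<IClassifier> makeLearner)
    {
        int d = data.Count;
        var w = Enumerable.Repeat(1.0 / d, d).ToArray();          // initial tuple weights = 1/d

        for (int round = 0; round < k; round++)
        {
            // Weighted bootstrap sample D_i.
            var idxs = Enumerable.Range(0, d).Select(_ => SampleIndex(w)).ToList();
            var model = makeLearner();
            model.Train(idxs.Select(i => data[i]).ToList(),
                        idxs.Select(i => labels[i]).ToList());

            // Eq. (8): error(M_i) = sum of weights of misclassified tuples.
            double error = Enumerable.Range(0, d)
                .Where(i => model.Predict(data[i]) != labels[i])
                .Sum(i => w[i]);
            if (error > 0.5)
            {
                // As in the pseudocode: reset weights and retry this round
                // (a real implementation would cap the number of retries).
                w = Enumerable.Repeat(1.0 / d, d).ToArray();
                round--;
                continue;
            }
            if (error == 0) error = 1e-10;                         // avoid division by zero

            // Down-weight correctly classified tuples, then normalize so weights sum to 1.
            for (int i = 0; i < d; i++)
                if (model.Predict(data[i]) == labels[i]) w[i] *= error / (1 - error);
            double sum = w.Sum();
            for (int i = 0; i < d; i++) w[i] /= sum;

            // Eq. (9): classifier weight w_i = log((1 - error) / error).
            ensemble.Add((model, Math.Log((1 - error) / error)));
        }
    }

    // Classify by summing classifier weights per predicted class and taking the largest.
    public string Predict(double[] x) =>
        ensemble.GroupBy(e => e.Model.Predict(x))
                .OrderByDescending(g => g.Sum(e => e.Weight))
                .First().Key;

    private static int SampleIndex(double[] weights)
    {
        double r = Rng.NextDouble(), cum = 0;
        for (int i = 0; i < weights.Length; i++) { cum += weights[i]; if (r <= cum) return i; }
        return weights.Length - 1;
    }
}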
• 19. Chapter 4: Project Design
Hardware Requirements
• SYSTEM: Pentium IV, 2.4 GHz
• HARD DISK: 40 GB
• FLOPPY DRIVE: 1.44 MB
• MONITOR: 15" VGA colour
• MOUSE: Logitech
• RAM: 256 MB
Software Requirements
• 20.
• Operating system: Windows XP Professional
• Front End: ASP.NET 2.0
• Coding Language: Visual C# .NET
• Back End: SQL Server 2000
Module I/O
Preprocessing
Given Input: Image. Expected Output: Normalized image.
DFT
Given Input: Image and dataset. Expected Output: Classified image.
KNN
• 21. Given Input: Classified image. Expected Output: Image bins.
Boosting
Given Input: Image bins. Expected Output: Rank-classified image.
Verification
Given Input: Check against the user's stored details, such as security answers or hidden details. Expected Output: If the verification succeeds, the user can perform the transaction; otherwise the card is blocked.
A minimal sketch of the interfaces for this module pipeline is given below.
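As a rough illustration of the module I/O listed above, the following C# interface sketches the pipeline stages; every type and method name here is an assumption made for illustration, not the project's actual API.

// Illustrative module pipeline: preprocessing -> feature extraction -> KNN -> boosting.
using System.Collections.Generic;

interface ITumorPipeline
{
    double[,] Preprocess(byte[,] rawImage);                  // normalized image
    double[] ExtractFeatures(double[,] image);               // feature vector from the image
    string ClassifyKnn(double[] features);                   // class label / image bin
    string RankWithBoosting(IEnumerable<string> binLabels);  // final rank-classified result
}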
Module diagram
• 26. Component Diagram (components shown: image, RDT, KNN, bins, Boosting, DB, ratio, transaction, feature, block details, DB feature, ranked image)
• 29. Chapter 5: Proposed Simulation/Experiments/Results/Analysis
This study explores the utility of three different feature selection schemes to reduce the high dimensionality of an Alzheimer proteomic dataset. Using the top features selected by each method, we compared the prediction performance of a single decision tree algorithm, C4.5, with six different decision-tree-based classifier ensembles (Random Forest, Stacked Generalization, KNN, AdaBoost, LogitBoost and MultiBoost). We show that the ensemble classifiers consistently outperform the single decision tree classifier, yielding greater accuracy and smaller prediction errors on this dataset.
Classification results using features selected by Student's t test:
Algorithm      | Accuracy | TP rate | FP rate | TN rate | FN rate | Sensitivity | Specificity | Precision | F-measure | RMSE
Random Forest  | 0.6500   | 0.79    | 0.53    | 0.48    | 0.21    | 0.79        | 0.48        | 0.65      | 0.71      | 0.4569
KNN            | 0.6833   | 0.78    | 0.44    | 0.56    | 0.22    | 0.78        | 0.56        | 0.69      | 0.73      | 0.4285
Logitboost     | 0.6889   | 0.83    | 0.49    | 0.51    | 0.17    | 0.83        | 0.51        | 0.69      | 0.75      | 0.4402
Stacking       | 0.6444   | 0.99    | 0.79    | 0.21    | 0.01    | 0.99        | 0.21        | 0.61      | 0.76      | 0.4761
Multiboost     | 0.6889   | 0.81    | 0.46    | 0.54    | 0.19    | 0.81        | 0.54        | 0.70      | 0.74      | 0.5175
Logistic       | 0.7500   | 0.79    | 0.30    | 0.70    | 0.21    | 0.79        | 0.70        | 0.78      | 0.78      | 0.4224
Naive Bayes    | 0.6833   | 0.64    | 0.26    | 0.74    | 0.36    | 0.64        | 0.74        | 0.76      | 0.68      | 0.5289
Bayes Net      | 0.6722   | 0.63    | 0.28    | 0.73    | 0.37    | 0.63        | 0.73        | 0.74      | 0.67      | 0.5308
Neural Network | 0.7000   | 0.70    | 0.30    | 0.70    | 0.30    | 0.70        | 0.70        | 0.75      | 0.72      | 0.4517
RBFnet         | 0.6722   | 0.76    | 0.44    | 0.56    | 0.24    | 0.76        | 0.56        | 0.69      | 0.71      | 0.4632
CRDTNN         | 0.9644   | 0.71    | 0.33    | 0.68    | 0.29    | 0.71        | 0.68        | 0.74      | 0.71      | 0.5489
TP rate: true positive rate, FP rate: false positive rate, TN rate: true negative rate, FN rate: false negative rate, RMSE: root mean squared error. RBFnet: Radial Basis Function network, SVM: Support Vector Machine.
A small sketch of how these metrics can be computed from raw confusion-matrix counts follows.
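The sketch below shows the standard definitions of the metrics reported in the table, computed from raw true/false positive and negative counts; the class and method names are illustrative assumptions.

// Illustrative computation of accuracy, sensitivity, specificity, precision and F-measure.
using System;

static class Metrics
{
    public static void Report(int tp, int fp, int tn, int fn)
    {
        double accuracy    = (double)(tp + tn) / (tp + tn + fp + fn);
        double sensitivity = (double)tp / (tp + fn);   // true positive rate (recall)
        double specificity = (double)tn / (tn + fp);   // true negative rate
        double precision   = (double)tp / (tp + fp);
        double fMeasure    = 2 * precision * sensitivity / (precision + sensitivity);
        Console.WriteLine($"Acc={accuracy:F4} Sens={sensitivity:F2} " +
                          $"Spec={specificity:F2} Prec={precision:F2} F={fMeasure:F2}");
    }
}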
• 30. Chapter 6: Testing
TESTING
Testing is an activity in which a system or component is executed under specified conditions, the results are observed or recorded, and an evaluation is made about some aspect of the system or component. Successful testing uncovers errors in the software, so in general testing demonstrates that the system works according to the specifications and meets the performance requirements. This is the final stage of any project. Testing is the process of executing the program with the intent of finding errors; it is a set of activities that can be planned in advance and conducted systematically. The purpose of system testing is to uncover errors in the system so that they can be corrected. Nothing is complete without testing, as it is vital to the success of the system.
6.1 Testing Phases: Software testing phases include the following: test activities are determined and test data is selected; the test is conducted and the test results are compared with the expected results. There are various types of testing:
Unit Testing: Unit testing is a procedure used to validate that individual units of source code are working properly. A unit is the smallest testable part of an application. In procedural programming a unit may be an individual program, function or procedure, while in object-oriented programming the smallest unit is a class, which may be a base/super class, an abstract class or a derived/child class. Units are distinguished from modules in that modules are typically made up of units.
Integration Testing:
• 31. Integration testing is the phase of software testing in which individual software modules are combined and tested as a group. It follows unit testing and precedes system testing. The goal is to see whether the modules are properly integrated, with the emphasis on testing the interfaces among modules.
System Testing: System testing is testing conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements. System testing is performed on the entire system against the Functional Requirement Specification(s) (FRS) and/or the System Requirement Specification (SRS). It is also intended to test up to and beyond the bounds defined in the software/hardware requirements specification(s).
Acceptance Testing: Acceptance testing generally involves running a suite of tests on the completed system. The acceptance test suite is run against the supplied input data or using an acceptance test script to direct the testers. The results obtained are then compared with the expected results. If there is a correct match for every case, the test suite is said to pass; if not, the system may either be rejected or accepted on conditions previously agreed between the sponsor and the manufacturer.
6.2 Testing Methods: Testing is a process of executing a program to find errors. Any testing can be done in two ways:
White Box Testing: White box testing uses an internal perspective of the system to design test cases based on internal structure. It requires programming skills to identify all paths through the software. The tester chooses test case inputs to exercise paths through the code and determines the appropriate outputs. Using white box testing, a software engineer can derive test cases that: exercise all logical decisions on both their true and false sides; execute all loops at their boundaries and within their operational bounds; and exercise internal data structures to ensure their validity.
Black Box Testing:
• 32. Black box testing takes an external perspective of the test object to derive test cases. These tests can be functional or non-functional, though they are usually functional. The test designer selects valid and invalid inputs and determines the correct output; there is no knowledge of the test object's internal structure. Black box testing attempts to find errors in the following categories:
• Incorrect or missing functions
• Interface errors
• Errors in data structures
• Performance errors
• Initialization and termination errors
6.3 Test Approach: Testing can be done in two ways:
• Bottom-up approach
• Top-down approach
Bottom-up approach: In a bottom-up approach the individual base elements of the system are first specified in great detail. These elements are then linked together to form larger subsystems, which in turn are linked, sometimes over many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, whereby the beginnings are small but eventually grow in complexity and completeness. However, such "organic" strategies may result in a tangle of elements and subsystems developed in isolation and subject to local optimization, as opposed to meeting a global purpose.
Top-down approach:
• 33. In a top-down approach an overview of the system is first formulated, specifying but not detailing any first-level subsystems. Each subsystem is then detailed enough to realistically validate the model.
7.1 Black Box Testing:
Test Case 1: Color space conversion
Objective: To check whether the RGB colour space is converted into YUV.
Description: Given the RGB pixel matrix, obtain the YUV components.
Expected Behavior: Y is luminance (brightness); U and V are chrominance components. The conversion is done to obtain separate brightness and colour factors.
Observed Behavior: Y is luminance (brightness); U and V are chrominance components, computed for each pixel (y = 0 to ImageHeight) as Y = 0.299*R + 0.587*G + 0.114*B, U = (B - Y) * 0.565, V = (R - Y) * 0.713.
Test Case 2: Calculate histogram
Objective: To check whether the histogram is computed.
Description: To check whether the histogram is computed when the camera starts capturing.
Expected Behavior: Compute the histogram for the colour image: Y (luma histogram) in an array index, U (chroma histogram) in an array index, and
• 34. V (chroma histogram) in an array index.
Observed Behavior: YLumi = (int)(Blue * 0.1133 + Green * 0.5859 + Red * 0.3008); UChro = (int)(0.493 * (Blue - YLumi) + 128); VChro = (int)(0.877 * (Red - YLumi) + 128); the bin index is then computed for each channel: HIndex = YLumi / 4, HIndex = UChro / 4, HIndex = VChro / 4.
Test Case 3: Analysis
Objective: To compare histograms using the similarity (HMM) function.
Description: Compare two histograms using the DCOS function until the output equals 1.
Expected Behavior: The two histograms are compared using the DCOS function and the output equals 1.
Observed Behavior: Dcos(A, B) = 1 for two identical histograms.
Test Case 4: Record
Objective: To check whether the record module is working properly.
Description: After selecting this option, the recording should start.
• 35. Expected Behavior: PixelGrabber grabs pixels into the image array.
Observed Behavior: Recording starts, and PixelGrabber grabs pixels into the image array.
Test Case 5: File Indexing
Objective: To check the working of the file indexing module.
Description: After selecting this option, file indexing should start and frames should be captured.
Expected Behavior: File indexing starts and keyframes are captured.
Observed Behavior: File indexing starts and keyframes are captured.
Test Case 6: Image Feature Extraction
Objective: To check whether the indexed image list and the corresponding keyframes are displayed.
Description: After selecting this option, a list of all indexed images is displayed, and when a video is selected its keyframes are displayed.
Expected Behavior: A list of indexed images and their keyframes is displayed.
Observed Behavior: A list of indexed images and their keyframes is displayed.
Test Case 7: Query
• 36. Objective: To check whether the query works properly and searches for the image among the indexed images.
Description: If the queried image is present in any video, the search is positive and a path is displayed.
Expected Behavior: If the image is present in an indexed video, the path is displayed; otherwise it is reported as not found.
Observed Behavior: If the image is present in an indexed video, the path is displayed; otherwise it is reported as not found.
Test Case 8: Exit function
Objective: To check whether the exit function is working correctly.
Description: When we click on the exit button, the project should be closed.
Expected Behavior: When we click on the exit button, the project should be closed.
Observed Behavior: The project is closed successfully.
7.2 GUI Testing:
Graphical user interfaces (GUIs) present interesting challenges for software engineers. Because of the reusable components provided as part of GUI development environments, the creation of the user interface has become less time consuming and more precise. But, at the same time, the complexity of GUIs has grown, leading to more difficulty in the design and execution of test cases. Because many modern GUIs have the same look and feel, a series of standard test cases can be derived.
Test Case 1:
• 37. Objective: To check whether the menu selection process is working properly.
Description: When we select any option from the menu, that option is selected and the appropriate action is taken.
Expected Behavior: The chosen option is selected and the appropriate action is taken.
Observed Behavior: The chosen option is selected and the appropriate action is taken.
Test Case 2:
Objective: To check the working of the right-click menus on the main form.
Description: To check whether the right-click menu shortcuts are working properly.
Expected Behavior: The shortcuts work properly.
Observed Behavior: The shortcuts work properly.
7.3 System Testing:
System testing is actually a series of different tests whose primary purpose is to fully exercise the computer-based system. Although each test has a different purpose, all work to verify that the system elements have been properly integrated and perform their allocated functions.
Test Case 1:
• 38. Objective: To check whether the system is working properly.
Description: The analysis, classification and detection modules are working properly.
Expected Behavior: The analysis, classification and detection modules work properly.
Observed Behavior: The analysis, classification and detection modules work properly.
• 39. Chapter 7: Schedule of Work and Estimate
• 40. Estimation and Efforts:
The cost feasibility of the project can be estimated using established estimation models such as Lines of Code, which allow us to estimate cost as a function of size. This also allows us to estimate and analyze the feasibility of completing the system in the given timeframe, giving us a realistic estimate as well as a continuous evaluative perspective on the progress of the project.
Number of people working on this project = 3
Duration of project = August 2010 to April 2011
The project is divided over the period from August 2010 to April 2011. This time span is divided into two major parts as follows.
DURATION    | FROM DATE | TO DATE  | WEEKS | HOURS/WEEK
Duration I  | August    | November | 14    | 6
Duration II | January   | April    | 16    | 10
Table 4.2.1 Duration Table
Due to academic commitments, we will be available for the project for the following man-hours:
For Duration I: 14 * 6 = 84 man-hours
For Duration II: 16 * 10 = 160 man-hours
Total availability = 244 man-hours
Name of module | LOC count
• 41. Capture       | 667
Analysis      | 445
Recording     | 430
File Indexing | 460
Query         | 221
Total         | 2223
Table 4.2.2 LOC Table
The Constructive Cost Model (COCOMO) computes software development effort (and cost) as a function of program size. Program size is expressed in estimated thousands of lines of code (KLOC). COCOMO applies to three classes of software projects:
• Organic projects: small teams with good experience working with less-than-rigid requirements
• Semi-detached projects: medium teams with mixed experience working with a mix of rigid and less-than-rigid requirements
• Embedded projects: developed within a set of tight constraints (hardware, software, operational)
KLOC is the estimated number of delivered lines of code for the project (expressed in thousands). The coefficients a, b, c and d are given in the following table.
Software project | a   | b    | c   | d
Organic          | 2.4 | 1.05 | 2.5 | 0.38
Semi-detached    | 3.0 | 1.12 | 2.5 | 0.35
Embedded         | 3.6 | 1.20 | 2.5 | 0.32
• 42. Table 4.2.3 COCOMO Coefficients Table
In the COCOMO model the effort can be calculated as:
Effort Applied (E) = a * (KLOC) ^ b (man-months)
and the duration of the project can be estimated as:
Development Time (D) = c * E ^ d (months)
Our project falls under the image processing category and is treated as an organic project, so a = 2.4, b = 1.05, c = 2.5, d = 0.38.
E = 2.4 * (2.223) ^ 1.05 = 5.5526 man-months
D = 2.5 * (5.5526) ^ 0.38 = 4.80 months (approx)
According to the COCOMO model, the average cost per person-month is Rs 10,000, so the overall software cost can be estimated as:
Software cost = E * 10,000 = 5.5526 * 10,000 = Rs 55,526.00 (approx)
A small sketch of this calculation is given below.
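The following short C# sketch reproduces the basic COCOMO calculation above using the organic-mode coefficients; it is an illustration of the formula, not project code.

// Basic COCOMO: effort, development time and cost for an organic project of ~2.223 KLOC.
using System;

class CocomoEstimate
{
    static void Main()
    {
        double kloc = 2.223;                          // estimated size in KLOC
        double a = 2.4, b = 1.05, c = 2.5, d = 0.38;  // organic-mode coefficients
        double effort = a * Math.Pow(kloc, b);        // ~5.55 man-months
        double time   = c * Math.Pow(effort, d);      // ~4.80 months
        double cost   = effort * 10000;               // Rs, at Rs 10,000 per person-month
        Console.WriteLine($"E = {effort:F2} man-months, D = {time:F2} months, Cost = Rs {cost:F0}");
    }
}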
• 43. 4.3 Time Line Schedule:
From       | To         | Task
01-08-2010 | 06-08-2010 | Group Formation and Finalization
07-08-2010 | 13-08-2010 | Topic Search and Finalization
14-08-2010 | 20-08-2010 | Preliminary Information Gathering
21-08-2010 | 27-08-2010 | Synopsis Preparation and Submission
28-08-2010 | 03-09-2010 | Project Discussion with Coordinator and Topic Finalization
04-09-2010 | 10-09-2010 | Detailed Literature Survey
11-09-2010 | 24-09-2010 | Algorithm Finalization and Detailed Study
25-09-2010 | 01-10-2010 | Drawing UML Diagrams
02-10-2010 | 08-10-2010 | Preparing PPT
09-10-2010 | 15-10-2010 | Preparing Mid Term Report
16-10-2010 | 26-11-2010 | Language Study (Visual C# .NET)
01-01-2011 | 02-03-2011 | Coding and Implementation
03-03-2011 | 27-04-2011 | Documentation
Table 4.3 Time Line Schedule Table
  • 44. 4.4 Time Line Chart: Figure 4.4 Time line chart
• 45. Chapter 8: Conclusion and Future Direction
Our proposed system implements a novel classification mechanism for efficiently analyzing brain tumor images using the RDTNN classifier. We utilized the ROI (Region of Interest) segmentation method on the CT image. Using DWT, the key features are extracted; the extracted features are then given to RDT to reduce the dimensionality of the feature set, and the images are trained with the KNN classifier. Finally, the proposed algorithm is significantly efficient at classifying a human brain image as benign or malignant, with high sensitivity, specificity and accuracy rates. The performance of this study shows several advantages of the technique: it is accurate, robust, easy to operate, non-invasive and inexpensive. In future work, we plan to explore different types of medical images as well as other application domains, and to study some formal properties of image features.
References
[1] I. Kononenko, "Machine learning for medical diagnosis: History, state of the art and perspective," Artif. Intell. Med., vol. 23, no. 1, pp. 89–109, 2001.
[2] G. D. Magoulas and A. Prentza, "Machine learning in medical applications," Mach. Learning Appl. (Lecture Notes Comput. Sci.), Berlin/Heidelberg, Germany: Springer, vol. 2049, pp. 300–307, 2001.
[3] L. Breiman, "Bagging predictors," Mach. Learning, vol. 24, no. 2, pp. 123–140, 1996.
[4] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[5] T. K. Ho, "The random subspace method for constructing decision forests," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, 1998.
[6] L. Breiman, "Random forests," Mach. Learning, vol. 45, pp. 5–32, 2001.
[7] L. Rokach and O. Maimon, Data Mining with Decision Trees: Theory and Applications (Machine Perception and Artificial Intelligence Series 69), H. Bunke and P. S. P. Wang, Eds. Singapore: World Scientific, 2008.
[8] A. L. Prodromidis, S. J. Stolfo, and P. K. Chan, "Effective and efficient pruning of metaclassifiers in a distributed data mining system," Columbia Univ., New York, Tech. Rep. CUCS-017-99, 1999.
[9] M. Robnik-Sikonja, "Improving random forests," in Proc. Eur. Conf. Mach. Learning, 2004, pp. 359–369.
[10] A. Tsymbal, M. Pechenizkiy, and P. Cunningham, "Dynamic integration with random forests," in Proc. Eur. Conf. Mach. Learning, vol. 4212, Berlin/Heidelberg, Germany: Springer, 2006.
[11] P. Cunningham, "A taxonomy of similarity mechanisms for case-based reasoning," University College Dublin, Dublin, Ireland, Tech. Rep. UCD-CSI-2008-01, 2008.
• 46. [12] H. Hu, J. Li, H. Wang, G. Daggard, and M. Shi, "A maximally diversified multiple decision tree algorithm for microarray data classification," presented at the Workshop Intell. Syst. Bioinformat., Hobart, Australia, 2006.
[13] S. Gunter and H. Bunke, "Optimization of weights in a multiple classifier handwritten word recognition system using a genetic algorithm," Electron. Lett. Comput. Vision Image Anal., pp. 25–41, 2004.
[14] E. E. Tripoliti, D. I. Fotiadis, M. Argyropoulou, and G. Manis, "A six stage approach for the diagnosis of the Alzheimer's disease based on fMRI data," J. Biomed. Informat., vol. 43, pp. 307–310, 2010.
[15] S. Bernard, L. Heutte, and S. Adam, "On the selection of decision trees in random forests," in Proc. IEEE-ENNS Int. Joint Conf. Neural Netw., 2009, pp. 302–307.
[16] E. E. Tripoliti, D. I. Fotiadis, and G. Manis, "Modifications of the random forests algorithm," Data Knowl. Eng., to be published.
[17] E. Gatnar, "A diversity measure for tree-based classifier ensembles," in Data Analysis and Decision Support, D. Baier et al., Eds. Heidelberg, Germany: Springer, 2005, pp. 30–38.
[18] G. Giacinto, F. Roli, and G. Fumera, "Design of effective multiple classifier systems by clustering of classifiers," in Proc. 15th Int. Conf. Pattern Recog., 2000, pp. 160–163.
[19] G. Martinez-Munoz and A. Suarez, "Pruning in ordered bagging ensembles," in Proc. 23rd Int. Conf. Mach. Learning, 2006, pp. 609–616.
[20] C. Orrite, M. Rodriguez, F. Martinez, and M. Fairhurst, "Classifier ensemble generation for the majority vote rule," in Lecture Notes in Computer Science, J. Ruiz-Shulcloper et al., Eds. Berlin/Heidelberg, Germany: Springer-Verlag, 2008, pp. 340–347.
[21] P. Latinne, O. Debeir, and C. Decaestecker, "Limiting the number of trees in random forests," in Lecture Notes in Computer Science, Berlin/Heidelberg, Germany: Springer-Verlag, 2001, pp. 178–187.
[22] J. Xiao and C. He, "Dynamic classifier ensemble selection based on GMDH," in Proc. Int. Joint Conf. Comput. Sci. Optimization, 2009, pp. 731–734.
[23] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A comparison of decision tree ensemble creation techniques," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 173–180, Jan. 2007.
[24] S. Bernard, L. Heutte, and S. Adam, "Forest-RK: A new random forest induction method," in Proc. Int. Conf. Intell. Comput. 2008, Lecture Notes in Artificial Intelligence 5227, D.-S. Huang et al., Eds. Heidelberg, Germany: Springer, 2008, pp. 430–437.
[25] E. E. Tripoliti, D. I. Fotiadis, and G. Manis, "Dynamic construction of random forests: Evaluation using biomedical engineering problems," presented at the 10th Int. Conf. Inf. Technol. Appl. Biomed., Corfu, Greece, 2010.
[26] G. W. Brier, "Verification of forecasts expressed in terms of probability," Monthly Weather Review, vol. 78, pp. 1–3, 1950.