2. Perhaps the best-known database in the
pattern recognition literature.
Data set: Iris flower data set (donated
1988-07-01), also known as Fisher's Iris data
set and as Anderson's Iris data set because
Edgar Anderson collected the data.
It is a multivariate data set (several
attributes are measured on every sample) from
a study of three related Iris species. The
data set contains 50 samples of each species
(Iris-Setosa, Iris-Virginica, Iris-Versicolor).
3. Sepal length in cm
Sepal width in cm
Petal length in cm
Petal width in cm
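As a minimal sketch, the data set described above can be loaded with scikit-learn's bundled copy. The short column names and the helper name `load_iris_frame` are assumptions made for this illustration, not names taken from the slides.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Return the Iris data as a DataFrame: four features plus a species column."""
    iris = load_iris(as_frame=True)
    df = iris.data.copy()
    # Hypothetical short column names (all measurements are in cm).
    df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    # Map the integer class codes 0/1/2 to the species names.
    names = {code: str(name) for code, name in enumerate(iris.target_names)}
    df["species"] = iris.target.map(names)
    return df


df = load_iris_frame()
print(df.shape)
print(df.head())
```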
4. One class is linearly separable from the other
2; the latter are NOT linearly separable from
each other
Missing attribute values: none
7. Classify a new flower as belonging to one of
the 3 classes given the 4 features
8.
9. What is the data saying?
(Exploratory data analysis)
We will try to answer the following
questions using all the tools available.
10. 1. Descriptive statistics: standard deviation, min, max, etc.
2. Class distribution (are the species counts
balanced or imbalanced?): balanced
3. Univariate plots: understand each attribute
better.
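The first two checks can be sketched with pandas as follows. This assumes the scikit-learn copy of the data; the variable names `stats` and `counts` are illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data.copy()
names = {code: str(name) for code, name in enumerate(iris.target_names)}
df["species"] = iris.target.map(names)

# 1. Descriptive statistics for the four numeric attributes.
stats = df.describe().loc[["mean", "std", "min", "max"]]
print(stats)

# 2. Class distribution: equal counts mean the classes are balanced.
counts = df["species"].value_counts()
print(counts)
```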
11. 3.1 Box-and-whisker plots (give an idea of the
distribution of each input attribute)
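A possible way to draw these box plots with matplotlib is sketched below. The 2x2 layout and the output file name `iris_boxplots.png` are assumptions, not details from the slides.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

features = load_iris(as_frame=True).data  # four numeric columns

# One box plot per attribute, each on its own axis and scale.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, column in zip(axes.ravel(), features.columns):
    ax.boxplot(features[column].to_numpy())
    ax.set_title(column)
    ax.set_xticks([])
fig.tight_layout()
fig.savefig("iris_boxplots.png")  # hypothetical output file name
```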
12.
13. 3.2 Histograms: binning each attribute
shows whether its distribution looks
Gaussian or follows some other shape
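The binning step might look like the sketch below. The choice of 10 bins and the file name are assumptions. A visual check like this only suggests whether an attribute is roughly Gaussian; it is not a formal test.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

features = load_iris(as_frame=True).data

hist_counts = {}
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, column in zip(axes.ravel(), features.columns):
    # ax.hist returns the per-bin counts, the bin edges, and the drawn patches.
    counts, edges, _ = ax.hist(features[column].to_numpy(), bins=10)
    hist_counts[column] = counts
    ax.set_title(column)
fig.tight_layout()
fig.savefig("iris_histograms.png")  # hypothetical output file name
```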
14. Understand the relationships between
attributes and species better (which attributes
contribute most to classifying the species).
15.
16. 1. Using the Sepal_Length & Sepal_Width features,
we can only distinguish Setosa flowers from
the others
2. Separating Versicolor & Virginica is much
harder because they overlap considerably
3. Hence, the Sepal_Length & Sepal_Width features
only work well for Setosa
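The sepal observations above come from a scatter plot of the two sepal features coloured by species. A minimal sketch of such a plot, assuming scikit-learn's column names and a hypothetical output file name:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)

fig, ax = plt.subplots(figsize=(6, 4))
for code, name in enumerate(iris.target_names):
    subset = iris.data[iris.target == code]  # rows of one species
    ax.scatter(subset["sepal length (cm)"], subset["sepal width (cm)"], label=str(name))
ax.set_xlabel("sepal length (cm)")
ax.set_ylabel("sepal width (cm)")
ax.legend()
fig.savefig("iris_sepal_scatter.png")  # hypothetical output file name
```

Swapping in the two petal columns gives the corresponding petal plot.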
17.
18. 1. Using the Petal_Length & Petal_Width features,
we can distinguish Setosa, Versicolor &
Virginica fairly well
2. Versicolor & Virginica overlap only
slightly.
3. The graph shows that the Petal (Length and Width)
features contribute more to separating the Iris
species than the Sepal (Length and Width) features
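The petal observations can also be checked numerically by comparing per-species value ranges. This is a rough sketch: the helper `gap` is hypothetical, and a one-dimensional range comparison is a coarser check than the scatter plot.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data.copy()
names = {code: str(name) for code, name in enumerate(iris.target_names)}
df["species"] = iris.target.map(names)

petal_columns = ["petal length (cm)", "petal width (cm)"]
ranges = df.groupby("species")[petal_columns].agg(["min", "max"])
print(ranges)


def gap(lower: str, upper: str, column: str) -> float:
    """Min of `upper` minus max of `lower`; positive means the ranges do not overlap."""
    return ranges.loc[upper, (column, "min")] - ranges.loc[lower, (column, "max")]


for column in petal_columns:
    print(column, "setosa vs versicolor:", gap("setosa", "versicolor", column))
    print(column, "versicolor vs virginica:", gap("versicolor", "virginica", column))
```

A positive gap for Setosa versus Versicolor and a negative one for Versicolor versus Virginica matches the pattern described above.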
23. 1 Import libraries
2 Create the correlation matrix
3 Split the data set
But keep in mind:
24. 3.1 Take all the features
3.2 Take only the Sepal features (Length & Width)
3.3 Take only the Petal features (Length & Width)
3.4 Take all relevant features from the correlation
matrix
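Steps 2 and 3 might be sketched as below. The 70/30 split, the random seed, and the stratification are assumptions, since the slides do not state them. The last two feature sets follow the cases in the results table, which keep only one of the two petal features.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X = iris.data.copy()
X.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = iris.target

# Step 2: pairwise Pearson correlations between the four features.
corr = X.corr()
print(corr.round(2))

# Step 3: the feature subsets to compare (3.1 to 3.4).
feature_sets = {
    "all": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    "sepal_only": ["sepal_length", "sepal_width"],
    "petal_only": ["petal_length", "petal_width"],
    "petal_width_plus_sepal": ["petal_width", "sepal_length", "sepal_width"],
    "petal_length_plus_sepal": ["petal_length", "sepal_length", "sepal_width"],
}

# Assumed split: 70% train, 30% test, stratified by species.
splits = {
    name: train_test_split(X[cols], y, test_size=0.3, random_state=7, stratify=y)
    for name, cols in feature_sets.items()
}
X_train, X_test, y_train, y_test = splits["all"]
print(X_train.shape, X_test.shape)
```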
25. 4 Evaluate using 6 different
algorithms (cross-validation)
Here:
4.1 Logistic Regression (LR)
4.2 Linear Discriminant Analysis (LDA)
4.3 K-Nearest Neighbour (KNN)
4.4 Classification and Regression Tree (CART)
4.5 Gaussian Naive Bayes (NB)
4.6 Support Vector Machine (SVM)
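A minimal sketch of this comparison with scikit-learn follows. The 10-fold stratified cross-validation, the seed, the 70/30 split, and the default hyperparameters are all assumptions, so the scores will not necessarily match the ones reported in the slides.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(),
}

# Cross-validate on the training portion only; the test set stays untouched.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.4f} (+/- {scores.std():.4f})")
```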
26. 5 Final evaluation (compare all models
by feature selection and accuracy)
6 Deep Learning
28.
Case  Features used           Best model  Train accuracy  Test accuracy  Misclassified
1     All features            SVM         .9899           .9555          2 classes
2     Sepal only              SVM         .8472           .7111          12
3     Petal only              SVM         .9899           .9333          3
4     PetalWidth,             SVM/LDA     .9809           .9111          4
      Sepal (Len, Wid)
5     PetalLen,               SVM         .9700           .9111          4
      Sepal (Len, Wid)
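A table of this kind could be produced along the following lines. This is only a sketch under assumed settings (default `SVC`, a stratified 70/30 split, a fixed seed), so it illustrates the procedure and is not expected to reproduce the figures above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris(as_frame=True)
X = iris.data.copy()
X.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = iris.target

feature_sets = {
    "all": list(X.columns),
    "sepal only": ["sepal_length", "sepal_width"],
    "petal only": ["petal_length", "petal_width"],
    "petal width + sepal": ["petal_width", "sepal_length", "sepal_width"],
    "petal length + sepal": ["petal_length", "sepal_length", "sepal_width"],
}

rows = {}
for name, cols in feature_sets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X[cols], y, test_size=0.3, random_state=7, stratify=y
    )
    model = SVC().fit(X_train, y_train)
    pred = model.predict(X_test)
    rows[name] = {
        "train_acc": model.score(X_train, y_train),
        "test_acc": accuracy_score(y_test, pred),
        "misclassified": int((pred != y_test.to_numpy()).sum()),
    }
    print(name, rows[name])
```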