Iris - Most loved dataset

Dr Asmita Titre

Iris dataset

Data & Analytics

• The best-known data set in the pattern recognition literature.
• Data set: the Iris flower data set (donated 1988-07-01), also known as Fisher's Iris data set and as Anderson's Iris data set because Edgar Anderson collected the data.
• It is a multivariate data set (several variables are measured on each sample) from a study of three related Iris species. It contains 50 samples of each species (Iris-Setosa, Iris-Virginica, Iris-Versicolor).

• Sepal length in cm
• Sepal width in cm
• Petal length in cm
• Petal width in cm
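A minimal sketch of loading the data and checking the four attributes listed above. It assumes the copy of the data set bundled with scikit-learn (`load_iris`); the slides do not say which loader was used.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the Iris data set.
iris = load_iris()
X, y = iris.data, iris.target    # X: measurements in cm, y: species codes 0..2

print(X.shape)                   # rows = flowers, columns = attributes
print(iris.feature_names)        # sepal/petal length and width, in cm
print(list(iris.target_names))   # the three species
```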

• One class is linearly separable from the other two; the latter are NOT linearly separable from each other.
• Missing attribute values: none.

• Class distribution: 33.3% for each of the 3 classes.
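The three properties above can be checked directly. This is a sketch under assumptions: it uses scikit-learn's `load_iris`, and the 2.5 cm petal-length threshold is an illustrative choice that does not come from the slides.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Class distribution: each species should hold one third of the rows.
counts = np.bincount(y)
print(dict(zip(iris.target_names, counts.tolist())))

# Missing values: count NaNs across all measurements.
n_missing = int(np.isnan(X).sum())
print("missing values:", n_missing)

# Linear separability of Setosa (class 0): a single threshold on
# petal length (column 2) is enough. 2.5 cm is an illustrative cut-off.
petal_length = X[:, 2]
is_setosa = y == 0
predicted_setosa = petal_length < 2.5
print("setosa separated:", bool((predicted_setosa == is_setosa).all()))
```

A single threshold on one attribute is the simplest possible linear boundary, so it is sufficient to show that Setosa is linearly separable from the other two species.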

Classify a new flower as belonging to one of
the 3 classes given the 4 features

• What is the data saying? (Exploratory data analysis.)
• We will try to answer the following questions using all the available resources.

1. Descriptive statistics: SD, min, max, etc.
2. Class distribution (are the species counts balanced or imbalanced?): balanced.
3. Univariate plots: understand each attribute better.

3.1 Box-and-whisker plots (give an idea of the distribution of the input attributes)

3.2 Histograms: binning each attribute shows whether it follows a Gaussian or some other distribution
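The numbers behind these summaries and plots can be sketched without any plotting library. The example assumes scikit-learn's `load_iris` and NumPy; the choice of 10 bins is an assumption, not something the slides specify.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

for j, name in enumerate(iris.feature_names):
    col = X[:, j]
    # Descriptive statistics (sample standard deviation, ddof=1).
    print(f"{name}: mean={col.mean():.2f} sd={col.std(ddof=1):.2f} "
          f"min={col.min():.1f} max={col.max():.1f}")
    # Quartiles: the box edges and centre line of a box-and-whisker plot.
    q1, median, q3 = np.percentile(col, [25, 50, 75])
    print(f"  quartiles: {q1:.2f}, {median:.2f}, {q3:.2f}")
    # Bin counts: the bar heights of a histogram (10 bins is an assumption).
    hist_counts, bin_edges = np.histogram(col, bins=10)
    print(f"  histogram counts: {hist_counts.tolist()}")
```

A roughly bell-shaped run of bin counts suggests a Gaussian-like attribute; two separate peaks suggest that the attribute splits the species into groups.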

Understand the relationships between attributes and species better (which attributes contribute most to classifying the species).

1. Using the Sepal_Length and Sepal_Width features, we can only distinguish Setosa from the other species.
2. Separating Versicolor and Virginica is much harder because they overlap considerably.
3. Hence, the Sepal_Length and Sepal_Width features only work well for Setosa.

1. Using the Petal_Length and Petal_Width features, we can distinguish Setosa, Versicolor and Virginica fairly well.
2. Versicolor and Virginica overlap slightly.
3. The graph shows that the petal features (length and width) contribute more to separating the Iris species than the sepal features (length and width).
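As a rough numeric stand-in for the scatter plots, the per-species value ranges of each attribute can be compared. This is only a sketch: it assumes scikit-learn's `load_iris`, the helper names `species_range` and `ranges_overlap` are hypothetical, and overlapping ranges are a much cruder signal than the plots themselves.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

def species_range(feature_index, species_index):
    """Return (min, max) of one attribute for one species."""
    values = X[y == species_index, feature_index]
    return float(values.min()), float(values.max())

def ranges_overlap(a, b):
    """True when two (min, max) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Species codes: 0 = setosa, 1 = versicolor, 2 = virginica.
for j, name in enumerate(iris.feature_names):
    setosa, versicolor, virginica = (species_range(j, k) for k in range(3))
    print(f"{name}: "
          f"setosa/versicolor overlap={ranges_overlap(setosa, versicolor)}, "
          f"versicolor/virginica overlap={ranges_overlap(versicolor, virginica)}")
```

On the petal attributes the Setosa range does not touch the Versicolor range, while Versicolor and Virginica still share part of their range, which matches the slight overlap noted above.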

(Here comes the beauty of the machine)

1 Import libraries
2 Create the correlation matrix
3 Split the data set
But keep in mind:

3.1 Take all the features
3.2 Take only the sepal features (length and width)
3.3 Take only the petal features (length and width)
3.4 Take all relevant features from the correlation matrix
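Steps 2 and 3 can be sketched as follows. The example assumes scikit-learn's `load_iris` and `train_test_split`; the 30% test size, the stratified split and `random_state=7` are assumptions, since the slides do not give the split settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Step 2: Pearson correlation between the four attributes (columns).
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Step 3: feature subsets by column index.
# Columns: 0 sepal length, 1 sepal width, 2 petal length, 3 petal width.
feature_sets = {
    "all": [0, 1, 2, 3],
    "sepal only": [0, 1],
    "petal only": [2, 3],
}

splits = {}
for label, cols in feature_sets.items():
    # Assumed settings: 30% held out, stratified by species, fixed seed.
    splits[label] = train_test_split(
        X[:, cols], y, test_size=0.3, random_state=7, stratify=y)
    X_train, X_test, y_train, y_test = splits[label]
    print(label, X_train.shape, X_test.shape)
```

Petal length and petal width are strongly correlated with each other, which is one reason to try subsets that keep only one of them alongside the sepal features.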

4 Evaluate using 6 different algorithms (cross-validation):
4.1 Logistic Regression (LR)
4.2 Linear Discriminant Analysis (LDA)
4.3 K-Nearest Neighbours (KNN)
4.4 Classification and Regression Tree (CART)
4.5 Gaussian Naive Bayes (NB)
4.6 Support Vector Machine (SVM)
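A sketch of step 4 using scikit-learn estimators for the six algorithms. Several details are assumptions rather than the slides' settings: 10 stratified folds, default hyperparameters (apart from `max_iter=1000` for logistic regression), fixed seeds, and cross-validating on the full data set for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(),
}

# Assumed protocol: 10-fold stratified cross-validation with shuffling.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Using the same `cv` object for every model means all six are scored on identical folds, so their mean accuracies are directly comparable.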

5 Final evaluation (compare all models by feature selection and accuracy)
6 Deep learning

Note: the testing data set (validation size) is small.

Case  Features used                  Best model  Train accuracy  Test accuracy  Misclassified
1     All features                   SVM         .9899           .9555          2
2     Sepal only                     SVM         .8472           .7111          12
3     Petal only                     SVM         .9899           .9333          3
4     PetalWidth, Sepal (Len, Wid)   SVM/LDA     .9809           .9111          4
5     PetalLen, Sepal (Len, Wid)     SVM         .9700           .9111          4
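A comparison of this kind could be produced along the following lines. This is a sketch, not the author's code: it assumes scikit-learn's `SVC` with default settings, a stratified 30% hold-out and `random_state=7`. The 45-flower test set is inferred from the test accuracies above, most of which are multiples of 1/45. The numbers it prints will not match the table exactly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Columns: 0 sepal length, 1 sepal width, 2 petal length, 3 petal width.
feature_sets = {
    "all features": [0, 1, 2, 3],
    "sepal only": [0, 1],
    "petal only": [2, 3],
    "petal width + sepal": [0, 1, 3],
    "petal length + sepal": [0, 1, 2],
}

results = {}
for label, cols in feature_sets.items():
    # Same seed for every case, so each case sees the same flowers.
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, cols], y, test_size=0.3, random_state=7, stratify=y)
    model = SVC().fit(X_train, y_train)
    predictions = model.predict(X_test)
    misclassified = int((predictions != y_test).sum())
    results[label] = (model.score(X_train, y_train),
                      model.score(X_test, y_test),
                      misclassified)
    print(f"{label}: train={results[label][0]:.4f} "
          f"test={results[label][1]:.4f} misclassified={misclassified}")
```

With only 45 test flowers, one extra mistake moves the test accuracy by about 2.2 percentage points, which is why the note about the small test set matters when ranking the cases.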

