SlideShare a Scribd company logo
1 of 19
Classification Technique KNN in
           Data Mining
       ---on dataset “Iris”


      Comp722 data mining
        Kaiwen Qi, UNC
          Spring 2012
Outline
   Dataset introduction
   Data processing
   Data analysis
   KNN & Implementation
   Testing
Dataset
   Raw dataset
    Iris(http://archive.ics.uci.edu/ml/datasets/Iris)




                                                                 5 Attributes
                                                (a) Raw
    150 total records                                                     Sepal length in cm
                                                data                     (continious number)
                                                                            Sepal width in cm
                                                                          (continious number)
                  50 records Iris Setosa                                   Petal length in cm
                                                                          (continious number)

                                                                            Petal width in cm
                  50 records Iris Versicolour                             (continious number)
                                                                                   Class
                                                                        (nominal data:
                  50 records Iris Virginica                                  Iris Setosa
                                                                             Iris Versicolour
                                                                             Iris Virginica)

       (b) Data
                                                             (C) Data
       organization
Classification Goal
   Task
Data Processing
   Original data
Data Processing
• Balanced distribution
Data Analysis
   Statistics
Data Analysis
   Histogram
Data Analysis
   Histogram
KNN
   KNN algorithm




    The unknown data, the green circle, is classified to be square when
    K is 5. The distance between two points is calculated with Euclidean
    distance d(p, q)=         . .In this example, square is the majority
    in 5 nearest neighbors.
KNN
   Advantage
       the skimpiness of implementation. It is good
        at dealing with numeric attributes.
       Does not set up the model and just imports
        the dataset with very low computer overhead.
       Does not need to calculate the useful attribute
        subset. Compared with naïve Bayesian, we
        do not need to worry about lack of available
        probability data
Implementation of KNN
   Algorithm
        Algorithm: KNN. Asses a classification label from training data for an
         unlabeled data
         Input: K, the number of neighbors.
         Dataset that include training data
        Output: A string that indicates unknown tuple’s classification

    Method:
     Create a distance array whose size is K
     Initialize the array with the distances between the unlabeled tuple with
      first K records in dataset
     Let i=k+1
     calculate the distance between the unlabeled tuple with the (k+1)th
      record in dataset, if the distance is greater than the biggest distance in
      the array, replace the old max distance with the new distance; i=i+1
     repeat step (4) until i is greater than dataset size(150)
     Count the class number in the array, the class of biggest number is
      mining result
Implementation of KNN
   UML
Testing
   Testing (K=7, total 150 tuples)
Testing
   Testing (K=7, 60% data as training data)
Testing
   Input random distribution dataset



               Random dataset




      Accuracy test:
Performance
   Comparison
     Decision tree
    Advantage                                    Naïve Bayesian
    • comprehensibility
    • construct a decision tree without any     Advantage
      domain knowledge                          • relatively simply.
    • handle high dimensional                   • By simply calculating
    • By eliminating unrelated attributes         attributes frequency from
      and tree pruning, it simplifies             training datanand without
      classification calculation                  any other operations (e.g.
    Disadvantage                                  sort, search),
    • requires good quality of training data.   Disadvantage
    • usually runs in memory                    • The assumption of
    • Not good at handling continuous             independence is not right
      number features.                          • No available probability data
                                                  to calculate probability
Conclusion
   KNN is a simple algorithm with high
    classification accuracy for dataset with
    continuous attributes.
   It shows high performance with balanced
    distribution training data as input.
Thanks
Question?

More Related Content

What's hot

Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Big data visualization
Big data visualizationBig data visualization
Big data visualizationAnurag Gupta
 
Automatic Blood Vessels Segmentation of Retinal Images
Automatic Blood Vessels Segmentation of Retinal ImagesAutomatic Blood Vessels Segmentation of Retinal Images
Automatic Blood Vessels Segmentation of Retinal ImagesHarish Rajula
 
Data mining in retail industry
Data mining in retail industryData mining in retail industry
Data mining in retail industryMonicaRaveshanker
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandasAkshitaKanther
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Feature Extraction
Feature ExtractionFeature Extraction
Feature Extractionskylian
 
Heart disease prediction system
Heart disease prediction systemHeart disease prediction system
Heart disease prediction systemSWAMI06
 
In-Memory Big Data Analytics
In-Memory Big Data AnalyticsIn-Memory Big Data Analytics
In-Memory Big Data AnalyticsSupreeth M P
 
Statistics and Data Mining
Statistics and  Data MiningStatistics and  Data Mining
Statistics and Data MiningR A Akerkar
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learningSANTHOSH RAJA M G
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsUmasree Raghunath
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic conceptsKrish_ver2
 

What's hot (20)

Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Big data visualization
Big data visualizationBig data visualization
Big data visualization
 
Automatic Blood Vessels Segmentation of Retinal Images
Automatic Blood Vessels Segmentation of Retinal ImagesAutomatic Blood Vessels Segmentation of Retinal Images
Automatic Blood Vessels Segmentation of Retinal Images
 
Data mining in retail industry
Data mining in retail industryData mining in retail industry
Data mining in retail industry
 
Kdd process
Kdd processKdd process
Kdd process
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandas
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Feature Extraction
Feature ExtractionFeature Extraction
Feature Extraction
 
Heart disease prediction system
Heart disease prediction systemHeart disease prediction system
Heart disease prediction system
 
In-Memory Big Data Analytics
In-Memory Big Data AnalyticsIn-Memory Big Data Analytics
In-Memory Big Data Analytics
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Clustering
ClusteringClustering
Clustering
 
Statistics and Data Mining
Statistics and  Data MiningStatistics and  Data Mining
Statistics and Data Mining
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learning
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts2.1 Data Mining-classification Basic concepts
2.1 Data Mining-classification Basic concepts
 

Viewers also liked

Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027ankitgadgil
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecturehasanshan
 
Multidimensional data models
Multidimensional data  modelsMultidimensional data  models
Multidimensional data models774474
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
Marekting research applications ppt
Marekting research applications pptMarekting research applications ppt
Marekting research applications pptANSHU TIWARI
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
Data Modeling Basics
Data Modeling BasicsData Modeling Basics
Data Modeling Basicsrenuindia
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data modeljagdish_93
 
Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)JamesDempsey1
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 

Viewers also liked (20)

Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027Dwdm naive bayes_ankit_gadgil_027
Dwdm naive bayes_ankit_gadgil_027
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
ML KNN-ALGORITHM
ML KNN-ALGORITHMML KNN-ALGORITHM
ML KNN-ALGORITHM
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
 
Data modelling 101
Data modelling 101Data modelling 101
Data modelling 101
 
Multidimensional data models
Multidimensional data  modelsMultidimensional data  models
Multidimensional data models
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Marekting research applications ppt
Marekting research applications pptMarekting research applications ppt
Marekting research applications ppt
 
Data mining
Data miningData mining
Data mining
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
Data Modeling Basics
Data Modeling BasicsData Modeling Basics
Data Modeling Basics
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
Multidimentional data model
Multidimentional data modelMultidimentional data model
Multidimentional data model
 
Copy Testing
Copy TestingCopy Testing
Copy Testing
 
Multi dimensional model vs (1)
Multi dimensional model vs (1)Multi dimensional model vs (1)
Multi dimensional model vs (1)
 
Promotion
PromotionPromotion
Promotion
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 

Similar to Classification of Iris Flower Species Using KNN Algorithm

Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniquesGiorgos Bamparopoulos
 
Instance based learning
Instance based learningInstance based learning
Instance based learningSlideshare
 
ICCV2009: Max-Margin Ađitive Classifiers for Detection
ICCV2009: Max-Margin Ađitive Classifiers for DetectionICCV2009: Max-Margin Ađitive Classifiers for Detection
ICCV2009: Max-Margin Ađitive Classifiers for Detectionzukun
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionJordan McBain
 
lecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdflecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdfssusercc3ff71
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
London useR Meeting 21-Jul-09
London useR Meeting 21-Jul-09London useR Meeting 21-Jul-09
London useR Meeting 21-Jul-09bwhitcher
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningVishwas Lele
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana
 
FUNCTION APPROXIMATION
FUNCTION APPROXIMATIONFUNCTION APPROXIMATION
FUNCTION APPROXIMATIONankita pandey
 
Convolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approachConvolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approachUniversitat de Barcelona
 
Learning Classifier Systems for Class Imbalance Problems
Learning Classifier Systems  for Class Imbalance  ProblemsLearning Classifier Systems  for Class Imbalance  Problems
Learning Classifier Systems for Class Imbalance ProblemsXavier Llorà
 

Similar to Classification of Iris Flower Species Using KNN Algorithm (14)

Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniques
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
ICCV2009: Max-Margin Ađitive Classifiers for Detection
ICCV2009: Max-Margin Ađitive Classifiers for DetectionICCV2009: Max-Margin Ađitive Classifiers for Detection
ICCV2009: Max-Margin Ađitive Classifiers for Detection
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty Detection
 
lecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdflecture_RNN Autoencoder.pdf
lecture_RNN Autoencoder.pdf
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
London useR Meeting 21-Jul-09
London useR Meeting 21-Jul-09London useR Meeting 21-Jul-09
London useR Meeting 21-Jul-09
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17
 
Bayesian Counters
Bayesian CountersBayesian Counters
Bayesian Counters
 
FUNCTION APPROXIMATION
FUNCTION APPROXIMATIONFUNCTION APPROXIMATION
FUNCTION APPROXIMATION
 
Convolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approachConvolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approach
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
Learning Classifier Systems for Class Imbalance Problems
Learning Classifier Systems  for Class Imbalance  ProblemsLearning Classifier Systems  for Class Imbalance  Problems
Learning Classifier Systems for Class Imbalance Problems
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Classification of Iris Flower Species Using KNN Algorithm

  • 1. Classification Technique KNN in Data Mining ---on dataset “Iris” Comp722 data mining Kaiwen Qi, UNC Spring 2012
  • 2. Outline  Dataset introduction  Data processing  Data analysis  KNN & Implementation  Testing
  • 3. Dataset  Raw dataset Iris(http://archive.ics.uci.edu/ml/datasets/Iris) 5 Attributes (a) Raw 150 total records Sepal length in cm data (continious number) Sepal width in cm (continious number) 50 records Iris Setosa Petal length in cm (continious number) Petal width in cm 50 records Iris Versicolour (continious number) Class (nominal data: 50 records Iris Virginica Iris Setosa Iris Versicolour Iris Virginica) (b) Data (C) Data organization
  • 5. Data Processing  Original data
  • 7. Data Analysis  Statistics
  • 8. Data Analysis  Histogram
  • 9. Data Analysis  Histogram
  • 10. KNN  KNN algorithm The unknown data, the green circle, is classified to be square when K is 5. The distance between two points is calculated with Euclidean distance d(p, q)= . .In this example, square is the majority in 5 nearest neighbors.
  • 11. KNN  Advantage  the skimpiness of implementation. It is good at dealing with numeric attributes.  Does not set up the model and just imports the dataset with very low computer overhead.  Does not need to calculate the useful attribute subset. Compared with naïve Bayesian, we do not need to worry about lack of available probability data
  • 12. Implementation of KNN  Algorithm  Algorithm: KNN. Asses a classification label from training data for an unlabeled data Input: K, the number of neighbors. Dataset that include training data Output: A string that indicates unknown tuple’s classification Method:  Create a distance array whose size is K  Initialize the array with the distances between the unlabeled tuple with first K records in dataset  Let i=k+1  calculate the distance between the unlabeled tuple with the (k+1)th record in dataset, if the distance is greater than the biggest distance in the array, replace the old max distance with the new distance; i=i+1  repeat step (4) until i is greater than dataset size(150)  Count the class number in the array, the class of biggest number is mining result
  • 14. Testing  Testing (K=7, total 150 tuples)
  • 15. Testing  Testing (K=7, 60% data as training data)
  • 16. Testing  Input random distribution dataset Random dataset Accuracy test:
  • 17. Performance  Comparison Decision tree Advantage Naïve Bayesian • comprehensibility • construct a decision tree without any Advantage domain knowledge • relatively simply. • handle high dimensional • By simply calculating • By eliminating unrelated attributes attributes frequency from and tree pruning, it simplifies training datanand without classification calculation any other operations (e.g. Disadvantage sort, search), • requires good quality of training data. Disadvantage • usually runs in memory • The assumption of • Not good at handling continuous independence is not right number features. • No available probability data to calculate probability
  • 18. Conclusion  KNN is a simple algorithm with high classification accuracy for dataset with continuous attributes.  It shows high performance with balanced distribution training data as input.