SlideShare a Scribd company logo
Project II
Data Mining a
Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set                      Number of
                 Multivariate            8124 Area:           Life
Characteristics:              Instances:
Attribute                    Number of           Date
                 Categorical             22               1987
Characteristics:             Attributes:         Donated:

This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.

Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
 22 Independent attributes
 1 Class Attribute (Can you eat it?)
Edible(4,208)51.8%
Poisonous(3,916)48.2%
Mushroom Dataset
22 Attributes Total
18 Intrinsically
on Mushroom

4 Others
1 Habitat
1 Population
1 Bruises
1 Odor
Odor attribute, 1R Learner
The Simplest Rule 98.52% Acc.
A = almond             N = none
C = creosote           P = pungent
F = foul               S = spicy
L = anise              Y = fishy
M = musty




           a   c   f   l    m n      p   s   y
J48 Tree 100%                                                     E = Edible
Classification                                                    P = Poisonous



   E       P           P         E          P                 P        P           P
almond creosote    foul      anise        musty   none pungent spicy              fishy


   E      E        E         E             P          E       E                   E

 black   brown    buff chocolate green orange purple white                    yellow


                                                                              E
                            P                             E
                                                              narrow       broad
                           close         crowded distant

          E            P             E            E           E        E
       abundant clustered numerous scattered several               solitary
Simplest rule-set (Benchmark)
These are Poisonous
1. Odor = not almond or anise or none
(120 poisonous cases missed, 98.52% accuracy)

2. Spore-print-color =green
(48 cases missed, 99.41% accuracy)

3. Odor=none and stalk-surface-below-ring = scaly
 and stalk-color-above-ring= not brown
(8 cases missed, 99.90% accuracy)

4. Habitat= leaves and cap-color=white
4. May also be population=clustered and cap-color=white
(100% accuracy)
Habitat Insights
Waste is safe but stay away from paths




Woods   Grasses   Leaves Meadows Paths   Urban   Waste
Population Insights
  Mushrooms travel safer in groups




Abundant Clustered Numerous Scattered   Several   Solitary
Information  Knowledge

         Population Data                                        %Rates vs. Mushrooms
                                                           120.00%

                                                           100.00%

                                                            80.00%

                                                            60.00%

                                                            40.00%

                                                            20.00%

Abundant Clustered Numerous Scattered Several   Solitary     0.00%




                                                                     % Poisonous   % Edible
Poisonous/Edible Ratio
vs. Mushroom Population Density
                         300.00%


                         250.00%
                                                          several
Poisonous/Edible Ratio




                         200.00%


                         150.00%


                         100.00%


                          50.00%           solitary
                                                                        scattered
                                                                                           clustered
                           0.00%                                                    numerous         abundant
                                   0   1              2             3          4          5        6       7

                         -50.00%
                                                             Mushroom Density
Conclusions
 If   it stinks don’t eat it, 98.52% accuracy

 Ifit doesn’t stink and it’s spore color is not
  green then you have a 99.41% chance of
  survival

 Odor  and spore color may be the best
  attributes statistically but not in the field
Future Work
   Use more easily identified attributes to classify
    mushrooms to produce a method of easier
    visual classification

   Eliminate nonvisual attributes

Focus on visual-queue attributes, e.g.
habitat, population, cap and stalk

   Compare the two methods

More Related Content

Viewers also liked

Group7_Datamining_Project_Report_Final
Group7_Datamining_Project_Report_FinalGroup7_Datamining_Project_Report_Final
Group7_Datamining_Project_Report_Final
Manikandan Sundarapandian
 
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetSupport Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Pawandeep Kaur
 
Scopus Overview
Scopus OverviewScopus Overview
Scopus Overview
FSC632
 
EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014
EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014
EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014
SHPINE TECHNOLOGIES
 
Plagiarism for Faculty Workshop
Plagiarism for Faculty WorkshopPlagiarism for Faculty Workshop
Plagiarism for Faculty Workshop
Cathy Burwell
 
ANDROID IEEE PROJECT TITLES 2014
ANDROID IEEE PROJECT TITLES 2014ANDROID IEEE PROJECT TITLES 2014
ANDROID IEEE PROJECT TITLES 2014
SHPINE TECHNOLOGIES
 
Why publish in an international journal?
Why publish in an international journal?Why publish in an international journal?
Why publish in an international journal?
Anindito Subagyo
 
Embedded project titles1:2015-2016
Embedded project titles1:2015-2016Embedded project titles1:2015-2016
Embedded project titles1:2015-2016
SHPINE TECHNOLOGIES
 
PROJECTS FROM SHPINE TECHNOLOGIES
PROJECTS FROM SHPINE TECHNOLOGIESPROJECTS FROM SHPINE TECHNOLOGIES
PROJECTS FROM SHPINE TECHNOLOGIES
SHPINE TECHNOLOGIES
 
Java course
Java course Java course
Java course
SHPINE TECHNOLOGIES
 
Matlab titles 2015 2016
Matlab titles 2015 2016Matlab titles 2015 2016
Matlab titles 2015 2016
SHPINE TECHNOLOGIES
 
Marshmallow
MarshmallowMarshmallow
Marshmallow
shubham kanojia
 
Android os by jje
Android os by jjeAndroid os by jje
Android os by jje
Jefin Joseph
 
Android ieee project titles 2015 2016
Android ieee project titles 2015 2016Android ieee project titles 2015 2016
Android ieee project titles 2015 2016
SHPINE TECHNOLOGIES
 
Java titles 2015 2016
Java titles 2015 2016Java titles 2015 2016
Java titles 2015 2016
SHPINE TECHNOLOGIES
 
Dot Net Course Syllabus
Dot Net Course SyllabusDot Net Course Syllabus
Dot Net Course Syllabus
SHPINE TECHNOLOGIES
 
Introduction to iOS and Objective-C
Introduction to iOS and Objective-CIntroduction to iOS and Objective-C
Introduction to iOS and Objective-C
Daniela Da Cruz
 

Viewers also liked (17)

Group7_Datamining_Project_Report_Final
Group7_Datamining_Project_Report_FinalGroup7_Datamining_Project_Report_Final
Group7_Datamining_Project_Report_Final
 
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetSupport Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom Dataset
 
Scopus Overview
Scopus OverviewScopus Overview
Scopus Overview
 
EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014
EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014
EMBEDDED-MICRO CONTROLLER BASED WIRELESS PROJECTS TITLES2014
 
Plagiarism for Faculty Workshop
Plagiarism for Faculty WorkshopPlagiarism for Faculty Workshop
Plagiarism for Faculty Workshop
 
ANDROID IEEE PROJECT TITLES 2014
ANDROID IEEE PROJECT TITLES 2014ANDROID IEEE PROJECT TITLES 2014
ANDROID IEEE PROJECT TITLES 2014
 
Why publish in an international journal?
Why publish in an international journal?Why publish in an international journal?
Why publish in an international journal?
 
Embedded project titles1:2015-2016
Embedded project titles1:2015-2016Embedded project titles1:2015-2016
Embedded project titles1:2015-2016
 
PROJECTS FROM SHPINE TECHNOLOGIES
PROJECTS FROM SHPINE TECHNOLOGIESPROJECTS FROM SHPINE TECHNOLOGIES
PROJECTS FROM SHPINE TECHNOLOGIES
 
Java course
Java course Java course
Java course
 
Matlab titles 2015 2016
Matlab titles 2015 2016Matlab titles 2015 2016
Matlab titles 2015 2016
 
Marshmallow
MarshmallowMarshmallow
Marshmallow
 
Android os by jje
Android os by jjeAndroid os by jje
Android os by jje
 
Android ieee project titles 2015 2016
Android ieee project titles 2015 2016Android ieee project titles 2015 2016
Android ieee project titles 2015 2016
 
Java titles 2015 2016
Java titles 2015 2016Java titles 2015 2016
Java titles 2015 2016
 
Dot Net Course Syllabus
Dot Net Course SyllabusDot Net Course Syllabus
Dot Net Course Syllabus
 
Introduction to iOS and Objective-C
Introduction to iOS and Objective-CIntroduction to iOS and Objective-C
Introduction to iOS and Objective-C
 

Recently uploaded

OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
jpupo2018
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 

Recently uploaded (20)

OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Project 2 Data Mining Part 1

  • 1. Project II Data Mining a Mushroom Dataset Group 1 Raymond Borges Jarilyn Hernandez
  • 2. The Mushroom Dataset Data Set Number of Multivariate 8124 Area: Life Characteristics: Instances: Attribute Number of Date Categorical 22 1987 Characteristics: Attributes: Donated: This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.
  • 3. Mushroom Dataset  22 Independent attributes  1 Class Attribute (Can you eat it?) Edible(4,208)51.8% Poisonous(3,916)48.2%
  • 4. Mushroom Dataset 22 Attributes Total 18 Intrinsically on Mushroom 4 Others 1 Habitat 1 Population 1 Bruises 1 Odor
  • 5. Odor attribute, 1R Learner The Simplest Rule 98.52% Acc. A = almond N = none C = creosote P = pungent F = foul S = spicy L = anise Y = fishy M = musty a c f l m n p s y
  • 6. J48 Tree 100% E = Edible Classification P = Poisonous E P P E P P P P almond creosote foul anise musty none pungent spicy fishy E E E E P E E E black brown buff chocolate green orange purple white yellow E P E narrow broad close crowded distant E P E E E E abundant clustered numerous scattered several solitary
  • 7. Simplest rule-set (Benchmark) These are Poisonous 1. Odor = not almond or anise or none (120 poisonous cases missed, 98.52% accuracy) 2. Spore-print-color =green (48 cases missed, 99.41% accuracy) 3. Odor=none and stalk-surface-below-ring = scaly and stalk-color-above-ring= not brown (8 cases missed, 99.90% accuracy) 4. Habitat= leaves and cap-color=white 4. May also be population=clustered and cap-color=white (100% accuracy)
  • 8. Habitat Insights Waste is safe but stay away from paths Woods Grasses Leaves Meadows Paths Urban Waste
  • 9. Population Insights Mushrooms travel safer in groups Abundant Clustered Numerous Scattered Several Solitary
  • 10. Information  Knowledge Population Data %Rates vs. Mushrooms 120.00% 100.00% 80.00% 60.00% 40.00% 20.00% Abundant Clustered Numerous Scattered Several Solitary 0.00% % Poisonous % Edible
  • 11. Poisonous/Edible Ratio vs. Mushroom Population Density 300.00% 250.00% several Poisonous/Edible Ratio 200.00% 150.00% 100.00% 50.00% solitary scattered clustered 0.00% numerous abundant 0 1 2 3 4 5 6 7 -50.00% Mushroom Density
  • 12. Conclusions  If it stinks don’t eat it, 98.52% accuracy  Ifit doesn’t stink and it’s spore color is not green then you have a 99.41% chance of survival  Odor and spore color may be the best attributes statistically but not in the field
  • 13. Future Work  Use more easily identified attributes to classify mushrooms to produce a method of easier visual classification  Eliminate nonvisual attributes Focus on visual-queue attributes, e.g. habitat, population, cap and stalk  Compare the two methods

Editor's Notes

  1. Pistasvisuales