  • Project 2 Data Mining Part 1

    1. Project II: Data Mining a Mushroom Dataset. Group 1: Raymond Borges, Jarilyn Hernandez.
    2. The Mushroom Dataset. Data Set Characteristics: Multivariate; Number of Instances: 8124; Area: Life; Attribute Characteristics: Categorical; Number of Attributes: 22; Date Donated: 1987. This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.
    3. 3. Mushroom Dataset 22 Independent attributes 1 Class Attribute (Can you eat it?)Edible(4,208)51.8%Poisonous(3,916)48.2%
    4. Mushroom Dataset: 22 attributes total; 18 intrinsically on the mushroom; 4 others: 1 Habitat, 1 Population, 1 Bruises, 1 Odor.
    5. Odor attribute, 1R Learner. The Simplest Rule: 98.52% accuracy. Odor codes: a = almond, c = creosote, f = foul, l = anise, m = musty, n = none, p = pungent, s = spicy, y = fishy.
    6. J48 Tree: 100% classification. E = Edible, P = Poisonous. [Figure: decision tree. Branch values shown: almond, creosote, foul, anise, musty, none, pungent, spicy, fishy (leaf labels E P P E P P P P); black, brown, buff, chocolate, green, orange, purple, white, yellow (leaf labels E E E E P E E E); narrow, broad, close, crowded, distant (leaf labels E P E); abundant, clustered, numerous, scattered, several, solitary (leaf labels E P E E E E).]
    7. Simplest rule-set (Benchmark). These are poisonous: (1) odor = not almond or anise or none (120 poisonous cases missed, 98.52% accuracy); (2) spore-print-color = green (48 cases missed, 99.41% accuracy); (3) odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy); (4) habitat = leaves and cap-color = white (100% accuracy). Rule 4 may also be population = clustered and cap-color = white.
    8. Habitat Insights: Waste is safe, but stay away from paths. [Chart categories: Woods, Grasses, Leaves, Meadows, Paths, Urban, Waste.]
    9. Population Insights: Mushrooms travel safer in groups. [Chart categories: Abundant, Clustered, Numerous, Scattered, Several, Solitary.]
    10. Information → Knowledge. Population Data: % Rates vs. Mushrooms. [Bar chart: % Poisonous and % Edible for Abundant, Clustered, Numerous, Scattered, Several, Solitary; y-axis from 0% to 120%.]
    11. Poisonous/Edible Ratio vs. Mushroom Population Density. [Scatter plot: x-axis Mushroom Density (0 to 7); y-axis Poisonous/Edible Ratio (-50% to 300%); points labeled several, solitary, scattered, clustered, numerous, abundant.]
    12. Conclusions: If it stinks, don't eat it (98.52% accuracy). If it doesn't stink and its spore color is not green, then you have a 99.41% chance of survival. Odor and spore color may be the best attributes statistically, but not in the field.
    13. Future Work: Use more easily identified attributes to classify mushrooms, to produce a method of easier visual classification. Eliminate nonvisual attributes. Focus on visual-cue attributes, e.g. habitat, population, cap and stalk. Compare the two methods.
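
The benchmark rule set on slide 7 can be sketched as a small Python function. This is an illustrative sketch only, not the presenters' code: the names `is_poisonous` and `accuracy_after`, and the record layout (a dict of attribute name to spelled-out value), are assumptions. The helper recomputes the slide's accuracy figures from the missed-case counts and the 8,124 instances.

```python
TOTAL_INSTANCES = 8124  # number of instances stated on slide 2


def is_poisonous(mushroom):
    """Apply the four benchmark rules in order (record layout is assumed)."""
    odor = mushroom["odor"]
    # Rule 1: any odor other than almond, anise, or none.
    if odor not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print.
    if mushroom["spore-print-color"] == "green":
        return True
    # Rule 3: no odor, scaly stalk below the ring, stalk above the ring not brown.
    if (odor == "none"
            and mushroom["stalk-surface-below-ring"] == "scaly"
            and mushroom["stalk-color-above-ring"] != "brown"):
        return True
    # Rule 4: found in leaves with a white cap.
    if mushroom["habitat"] == "leaves" and mushroom["cap-color"] == "white":
        return True
    return False


def accuracy_after(missed, total=TOTAL_INSTANCES):
    """Percent accuracy when `missed` cases are still misclassified."""
    return round(100 * (1 - missed / total), 2)


if __name__ == "__main__":
    for missed in (120, 48, 8):
        print(missed, "missed ->", accuracy_after(missed), "%")
```

Calling `accuracy_after` with 120, 48, and 8 missed cases gives the 98.52%, 99.41%, and 99.90% figures quoted for rules 1 to 3.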

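Slide 5 relies on a 1R learner, which keeps the single attribute whose value-to-majority-class rule makes the fewest training errors. A minimal sketch of that idea follows, assuming records are plain dicts; the function name `one_r` and the toy records are hypothetical and are not drawn from the dataset.

```python
from collections import Counter, defaultdict


def one_r(records, attributes, target):
    """Return (attribute, rule, errors) for the best single-attribute rule."""
    best = None
    for attr in attributes:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for record in records:
            counts[record[attr]][record[target]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the records that fall outside the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


if __name__ == "__main__":
    # Hypothetical toy records, for illustration only.
    toy = [
        {"odor": "foul", "habitat": "woods", "class": "p"},
        {"odor": "none", "habitat": "woods", "class": "e"},
        {"odor": "almond", "habitat": "paths", "class": "e"},
        {"odor": "foul", "habitat": "paths", "class": "p"},
    ]
    print(one_r(toy, ["odor", "habitat"], "class"))
```

On the toy records, odor separates the classes with no errors while habitat misclassifies two records, so the sketch selects odor.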