Project 2 Data Mining Part 1


Published in: Technology, Sports

  1. Project II: Data Mining a Mushroom Dataset. Group 1: Raymond Borges, Jarilyn Hernandez
  2. The Mushroom Dataset
     Data Set Characteristics: Multivariate
     Number of Instances: 8124
     Area: Life
     Attribute Characteristics: Categorical
     Number of Attributes: 22
     Date Donated: 1987
     This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.
  3. 3. Mushroom Dataset 22 Independent attributes 1 Class Attribute (Can you eat it?)Edible(4,208)51.8%Poisonous(3,916)48.2%
  4. Mushroom Dataset: 22 attributes total. 18 intrinsically on the mushroom; 4 others: 1 habitat, 1 population, 1 bruises, 1 odor.
  5. Odor attribute, 1R Learner. The simplest rule: 98.52% accuracy. Odor codes: A = almond, C = creosote, F = foul, L = anise, M = musty, N = none, P = pungent, S = spicy, Y = fishy. [Chart over odor values a, c, f, l, m, n, p, s, y]
  6. J48 Tree Classification: 100%. E = Edible, P = Poisonous.
     [Tree diagram. Branch values and leaf labels as extracted:
     odor: almond, creosote, foul, anise, musty, none, pungent, spicy, fishy (leaves: E P P E P P P P);
     black, brown, buff, chocolate, green, orange, purple, white, yellow (leaves: E E E E P E E E);
     narrow, broad; close, crowded, distant (leaves: E P E);
     abundant, clustered, numerous, scattered, several, solitary (leaves: E P E E E E)]
  7. Simplest rule-set (Benchmark). These are poisonous:
     1. Odor = not almond or anise or none (120 poisonous cases missed, 98.52% accuracy)
     2. Spore-print-color = green (48 cases missed, 99.41% accuracy)
     3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy)
     4. Habitat = leaves and cap-color = white (100% accuracy). Rule 4 may also be population = clustered and cap-color = white.
  8. Habitat Insights: Waste is safe but stay away from paths. [Chart categories: Woods, Grasses, Leaves, Meadows, Paths, Urban, Waste]
  9. Population Insights: Mushrooms travel safer in groups. [Chart categories: Abundant, Clustered, Numerous, Scattered, Several, Solitary]
  10. Information, Knowledge. Population Data: % Rates vs. Mushrooms. [Chart: % Poisonous and % Edible for each population value (Abundant, Clustered, Numerous, Scattered, Several, Solitary); y-axis 0% to 120%]
  11. Poisonous/Edible Ratio vs. Mushroom Population Density. [Scatter plot: y-axis Poisonous/Edible Ratio (-50% to 300%); x-axis Mushroom Density (0 to 7); points labeled abundant, numerous, clustered, scattered, solitary, several]
  12. 12. Conclusions If it stinks don’t eat it, 98.52% accuracy Ifit doesn’t stink and it’s spore color is not green then you have a 99.41% chance of survival Odor and spore color may be the best attributes statistically but not in the field
  13. 13. Future Work Use more easily identified attributes to classify mushrooms to produce a method of easier visual classification Eliminate nonvisual attributesFocus on visual-queue attributes, e.g.habitat, population, cap and stalk Compare the two methods