Project 3 mushrooms

  1. Data Mining a Mushroom Dataset: Raymond Borges, Jarilyn Hernandez
  2. Outline: Background, Introduction, Hypotheses, Methodology, Results, Conclusions, Future Work
  3. Background: Previous Work
  4. 4. The Mushroom Dataset Hypothetical examples of 23 species from Agaricus and Lepiota families Class attribute: EdibilityEdible(4,208)51.8%Poisonous(3,916)48.2%Data Set Number of Multivariate 8124 Area: LifeCharacteristics: Instances:Attribute Number of Date Categorical 22 1987Characteristics: Attributes: Donated:
  5. Benchmark ruleset (each rule flags a mushroom as poisonous): 1. odor = not almond, anise, or none (120 poisonous cases missed, 98.52% accuracy); 2. spore-print-color = green (48 cases missed, 99.41% accuracy); 3. odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy); 4. habitat = leaves and cap-color = white, or alternatively population = clustered and cap-color = white (100% accuracy).
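The benchmark rules work as a decision list. A later rule only catches cases the earlier ones missed, which is why the quoted accuracies grow from 98.52% to 100%. The minimal sketch below encodes the rules in Python. It assumes that a mushroom is a dict keyed by the UCI attribute names with full-word values. That representation and the function names are illustrative assumptions, because the raw UCI file uses single-letter codes. The sketch also checks that each quoted accuracy equals 1 - missed / 8,124.

```python
# Benchmark ruleset (slide 5) as a decision list: any rule that fires
# flags the mushroom as poisonous. Records are dicts keyed by UCI attribute
# names with full-word values; this representation is an assumption
# (the raw UCI data file uses single-letter codes instead).

TOTAL_INSTANCES = 8124  # instances in the UCI mushroom dataset


def is_poisonous(m: dict) -> bool:
    """Return True if any of the four benchmark rules fires."""
    # Rule 1: odor other than almond, anise or none
    if m["odor"] not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print
    if m["spore-print-color"] == "green":
        return True
    # Rule 3: odorless, scaly stalk below the ring, stalk above the ring not brown
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True
    # Rule 4: white cap with leaves habitat, or (alternative form) clustered population
    if m["cap-color"] == "white" and (
            m["habitat"] == "leaves" or m["population"] == "clustered"):
        return True
    return False


def accuracy_pct(missed: int, total: int = TOTAL_INSTANCES) -> float:
    """Accuracy in percent when `missed` cases are misclassified."""
    return round(100 * (1 - missed / total), 2)


if __name__ == "__main__":
    record = {  # hypothetical mushroom, not a row from the dataset
        "odor": "none", "spore-print-color": "brown",
        "stalk-surface-below-ring": "smooth", "stalk-color-above-ring": "white",
        "cap-color": "white", "habitat": "grasses", "population": "several",
    }
    print(is_poisonous(record))                     # False
    print([accuracy_pct(n) for n in (120, 48, 8)])  # [98.52, 99.41, 99.9]
```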
  6. The Mushroom Dataset: 22 attributes, of which 18 are visual (observed on the mushroom itself) and 4 are others: habitat, population, bruises, odor.
  7. Visual Attribute ruleset, using only 4 attributes (100% accuracy): 1. stalk surface above ring = not silky and ring number = not one (79% accuracy, JRip); 2. population = not clustered (80% accuracy, J48). Once the mushroom is retrieved, test these two rules: 3. odor = not bad (98% accuracy, J48); 4. spore print color = not green (100% accuracy, J48).
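The slide splits the ruleset into two groups. Rules 1 and 2 can be checked in the field, and rules 3 and 4 can only be checked after the mushroom is retrieved. The sketch below shows one way to apply that two-stage split. Several points are assumptions that the slide does not state. The sketch assumes that every rule describes a mushroom the ruleset accepts, and that the rules within a stage are combined with AND. It also assumes that a "bad" odor is anything other than almond, anise, or none, which is the split used by benchmark rule 1. The function names are hypothetical.

```python
# Two-stage reading of the visual-attribute ruleset (slide 7).
# ASSUMPTIONS, not stated on the slide: each rule describes a mushroom the
# ruleset accepts, rules in a stage are combined with AND, and "bad" odor
# means anything other than almond, anise or none.

ACCEPTABLE_ODORS = {"almond", "anise", "none"}  # assumed reading of "not bad"


def passes_field_screen(m: dict) -> bool:
    """Stage 1: visual rules 1 and 2, checkable before picking the mushroom."""
    rule1 = (m["stalk-surface-above-ring"] != "silky"
             and m["ring-number"] != "one")
    rule2 = m["population"] != "clustered"
    return rule1 and rule2


def passes_retrieval_check(m: dict) -> bool:
    """Stage 2: rules 3 and 4, checkable once the mushroom is retrieved."""
    rule3 = m["odor"] in ACCEPTABLE_ODORS
    rule4 = m["spore-print-color"] != "green"
    return rule3 and rule4


def passes_ruleset(m: dict) -> bool:
    """Accept only if both stages pass; stage 2 runs only after stage 1."""
    return passes_field_screen(m) and passes_retrieval_check(m)


if __name__ == "__main__":
    record = {  # hypothetical mushroom, not a row from the dataset
        "stalk-surface-above-ring": "smooth", "ring-number": "two",
        "population": "several", "odor": "none", "spore-print-color": "brown",
    }
    print(passes_ruleset(record))  # True
```

The split keeps odor and spore print color out of the field stage. These are the two attributes the slides single out as the best statistically but the least practical before the mushroom is in hand.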
  8. 8. Results Odor and spore color may be the best attributes statistically but in the field Focused on visual-queue attributes, e.g. habitat, population, cap and stalk Obtained a more practical classification
  9. Introduction: Project III
  10. Introduction: Taking into account humans, based on: lighting conditions, mushroom stage in life cycle, humidity, seasons, human senses?, other unknown factors…
  11. Introduction: Some attributes are difficult to discern: textures, shapes, or colors like brown, chocolate, buff, cinnamon.
  12. Hypotheses: 1. Complex attributes = higher error probability. 2. Human senses + external factors = big impact. So… the ruleset will change to approach reality, and some attributes will fare much better than others.
  13. Methodology
  14. Methodology: Collect survey responses to 1. evaluate species in different conditions, 2. measure overall accuracy, and 3. weight attributes based on survey performance.
  15. Methodology, part 1: Take 3 mushroom species (Agaricus abruptibulbus, Agaricus augustus, Lepiota rubrotincta) and place them under 2 distinct sets of conditions.
  16. Methodology, part 2: 5 questions per species in each condition: Augustus, Rubrotincta, and Abruptibulbus under conditions X; Augustus, Rubrotincta, and Abruptibulbus under conditions Y (3 species × 2 conditions × 5 questions = 30 questions).
  17. 17. Methodology part 3 Design Tutorial (SurveyMonkey.com) Design Website (Weebly.com)Get people to take survey (hardest part) Designed Flyers Poster boards Business cards
  18. Survey at Mountainlair
  19. Survey at Mountainlair
  20. 20. Methodology 4 Calculate survey test scores Calculate species’ accuracy variation Calculate attributes’ accuracy variation Calculate attribute weights Use data mining tools to find best ruleset
  21. Weighting Methodology
  22. Results
  23. Overall Survey Results: 30 questions per survey; 15 attributes measured; 37 completed surveys; 1,110 answered questions. Overall survey grades: A 0, B 1, C 7, D 8, F 14. The highest score was 24 out of 30 correct answers.
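The survey totals follow from the design on slide 16: 3 species × 2 conditions × 5 questions gives 30 questions per survey, and 37 surveys give 1,110 answered questions. The slides do not give the grading scale. The sketch below assumes a standard 90/80/70/60 percent letter scale. Under that scale, the top score of 24/30 (80%) is the single B.

```python
QUESTIONS_PER_SURVEY = 3 * 2 * 5   # species x conditions x questions = 30
COMPLETED_SURVEYS = 37

# ASSUMPTION: a standard 90/80/70/60 percent letter-grade scale.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(correct: int, total: int = QUESTIONS_PER_SURVEY) -> str:
    """Map a survey score to a letter grade under the assumed cutoffs."""
    pct = 100 * correct / total
    for cutoff, grade in CUTOFFS:
        if pct >= cutoff:
            return grade
    return "F"


print(QUESTIONS_PER_SURVEY * COMPLETED_SURVEYS)  # 1110
print(letter_grade(24))                          # B
```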
  24. Results: Survey Accuracy per Attribute [bar chart; y-axis: accuracy, 0 to 100%]
  25. Attribute Accuracy and Variation (percent):
      Attribute                 Accuracy  Variation
      veil color                    37.8       10.8
      ring number                   59.5        5.4
      stalk shape                  33.75       18.9
      cap shape                    32.45       48.7
      cap surface                  32.45        5.4
      cap color                     81.1        5.4
      gill spacing                  78.4       16.2
      stalk root                   45.95       21.7
      stalk color above ring       59.45       64.9
      stalk color below ring        67.6       10.8
      gill size                    36.45       13.5
      stalk surface below ring      78.4        5.4
      ring type                       73        5.4
      stalk surface above ring     63.55        2.7
      gill color                   63.55       13.5
  26. Weighted Attributes [bar chart; y-axis: weight, 0 to 100]. Weights, highest to lowest: 76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55, 36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7
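The slides do not state the weighting formula, and the weight chart lost its attribute labels. Working backward from the slide 25 table, one formula reproduces all fifteen weights on slide 26, in the same order and to within 0.1: weight = accuracy × (1 - variation / 100). The sketch below applies this formula and compares the results with the slide. Treat the formula as a reconstruction, not as the authors' documented method.

```python
# Attribute accuracy and variation (percent), copied from slide 25.
TABLE = {
    "veil color": (37.8, 10.8),
    "ring number": (59.5, 5.4),
    "stalk shape": (33.75, 18.9),
    "cap shape": (32.45, 48.7),
    "cap surface": (32.45, 5.4),
    "cap color": (81.1, 5.4),
    "gill spacing": (78.4, 16.2),
    "stalk root": (45.95, 21.7),
    "stalk color above ring": (59.45, 64.9),
    "stalk color below ring": (67.6, 10.8),
    "gill size": (36.45, 13.5),
    "stalk surface below ring": (78.4, 5.4),
    "ring type": (73.0, 5.4),
    "stalk surface above ring": (63.55, 2.7),
    "gill color": (63.55, 13.5),
}

# Weights shown on slide 26, highest first (labels were lost in extraction).
SLIDE_WEIGHTS = [76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55,
                 36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7]


def weight(accuracy: float, variation: float) -> float:
    """Reconstructed (assumed) weighting: discount accuracy by its variation."""
    return accuracy * (1 - variation / 100)


weights = {name: weight(a, v) for name, (a, v) in TABLE.items()}
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
for (name, w), shown in zip(ranked, SLIDE_WEIGHTS):
    print(f"{name:26s} {w:5.1f}  (slide: {shown})")
```

If the reconstruction holds, cap color ranks highest and cap shape lowest. Cap shape has low accuracy and the second-largest variation between conditions.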
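  27. J48 Tree, 99.6% classification (E = Edible, P = Poisonous):
      odor = almond: E
      odor = creosote: P
      odor = foul: P
      odor = anise: E
      odor = musty: P
      odor = none:
          spore print color = black, brown, buff, chocolate, orange, purple, or yellow: E
          spore print color = green: P
          spore print color = white: split on stalk surface (silky, scaly, fibrous, smooth; leaves E, P, E, E)
      odor = pungent: P
      odor = spicy: P
      odor = fishy: P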
  28. J48 Tree, 99.9% classification (E = Edible, P = Poisonous): the same odor and spore print color splits as the 99.6% tree, except under spore print color = white:
          stalk surface (scaly, fibrous, silky, smooth): three branches end in E; the remaining branch splits on ring type (evanescent, flaring, zone, sheathing, none, large, cobwebby, pendant; leaves P, P, P, P, P, P, P, E)
  29. Attribute Accuracy vs. Complexity [chart; x-axis: complexity, 0 to 14; y-axis: accuracy, 0 to 100]. Attributes (complexity): Cap Color (10), Stalk Surface Below (4), Ring Type (8), Stalk Color Below (9), Stalk Surface Above (4), Ring Number (3), Stalk Color Above (9), Stalk Root (7), Veil Color (4), Stalk Shape (2), Cap Surface (4), Cap Shape (6)
  30. Conclusions
  31. Conclusion: Complex attributes = higher error probability. Hypothesis 1: False. Attributes are actually identified more accurately the more complex they are. (Chart: fat spheres = complex attributes; height = survey accuracy.)
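A quick check of the direction of this relationship pairs the complexity values on slide 29 with the accuracies in the slide 25 table. Complexity appears to be the number of possible values of each attribute. The sketch then computes Pearson's r. Gill spacing, gill size, and gill color are left out because slide 29 does not show their complexity. The pairing and the choice of Pearson's r are assumptions for illustration. Twelve points give only a rough indication. With these numbers, r comes out positive at about 0.5, which agrees in direction with the conclusion.

```python
import math

# (complexity, accuracy %) per attribute: complexity from slide 29,
# accuracy from the slide 25 table (pairing assumed by attribute name).
POINTS = {
    "cap color": (10, 81.1),
    "stalk surface below ring": (4, 78.4),
    "ring type": (8, 73.0),
    "stalk color below ring": (9, 67.6),
    "stalk surface above ring": (4, 63.55),
    "ring number": (3, 59.5),
    "stalk color above ring": (9, 59.45),
    "stalk root": (7, 45.95),
    "veil color": (4, 37.8),
    "stalk shape": (2, 33.75),
    "cap surface": (4, 32.45),
    "cap shape": (6, 32.45),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs, ys = zip(*POINTS.values())
r = pearson(xs, ys)
print(f"r = {r:.2f}")
```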
  32. Conclusion: Human senses + external factors = big impact. Hypothesis 2: True. Ambient environmental conditions caused a 24% change in correctly identifying attributes, i.e. 1.2 out of 5 questions were answered incorrectly because of the mushrooms' ambient environment.
  33. 33. Future Work Evaluatemushroom expertise for increase in mushroom attribute identification accuracy Measure Spore print color and Odor in surveys?
  34. 34. Questions?
