Project 3 mushrooms

Published in: Technology, Self Improvement

Transcript

  • 1. Data Mining a Mushroom Dataset. Raymond Borges, Jarilyn Hernandez
  • 2. Outline Background Introduction Hypotheses Methodology Results Conclusions Future Work
  • 3. Background: Previous Work
  • 4. The Mushroom Dataset
     Hypothetical examples of 23 species from the Agaricus and Lepiota families
     Class attribute: Edibility. Edible: 4,208 (51.8%); Poisonous: 3,916 (48.2%)
     Data Set Characteristics: Multivariate; Number of Instances: 8124; Area: Life
     Attribute Characteristics: Categorical; Number of Attributes: 22; Date Donated: 1987
  • 5. Benchmark ruleset
     1. Odor = not (almond or anise or none) (120 poisonous cases missed, 98.52% accuracy)
     2. Spore-print-color = green (48 cases missed, 99.41% accuracy)
     3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring = not brown (8 cases missed, 99.90% accuracy)
     4. Habitat = leaves and cap-color = white, or population = clustered and cap-color = white (100% accuracy)
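
The benchmark ruleset on slide 5 can be read as a chain of checks that flag a mushroom as poisonous. The following is a minimal Python sketch under that reading, not the authors' code. The dictionary keys and the spelled-out values (`"almond"`, `"scaly"`, and so on) are assumptions chosen for readability. The helper `accuracy_after_missing` is hypothetical and only reproduces the slide's percentages from the 8124 instances.

```python
TOTAL_INSTANCES = 8124  # number of instances reported on slide 4


def is_poisonous(m):
    """Apply the four benchmark rules in order; any match means poisonous."""
    # Rule 1: odor is anything other than almond, anise, or none
    if m["odor"] not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print
    if m["spore-print-color"] == "green":
        return True
    # Rule 3: no odor, scaly stalk below the ring, stalk above the ring not brown
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True
    # Rule 4: white cap found on leaves or growing in clusters
    if m["cap-color"] == "white" and (
            m["habitat"] == "leaves" or m["population"] == "clustered"):
        return True
    return False


def accuracy_after_missing(missed, total=TOTAL_INSTANCES):
    """Accuracy in percent when `missed` poisonous cases remain unflagged."""
    return round(100 * (1 - missed / total), 2)


if __name__ == "__main__":
    for missed in (120, 48, 8):
        print(missed, "missed ->", accuracy_after_missing(missed), "%")
```

Each rule only has to catch the poisonous cases the earlier rules missed, which is why the missed count shrinks from 120 to 48 to 8 as rules are added.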
  • 6. The Mushroom Dataset: 22 attributes
     18 visually on mushroom
     4 others: 1 habitat, 1 population, 1 bruises, 1 odor
  • 7. Visual attribute ruleset: only 4 attributes (100% accuracy)
     1. Stalk surface above ring = not silky and ring number = not one (79% accuracy, JRip)
     2. Population = not clustered (80% accuracy, J48)
     Once retrieved, test these two rules:
     3. Odor = not bad (98% accuracy, J48)
     4. Spore print color = not green (100%, J48)
  • 8. Results
     Odor and spore color may be the best attributes statistically but in the field
     Focused on visual-cue attributes, e.g. habitat, population, cap and stalk
     Obtained a more practical classification
  • 9. Introduction: Project III
  • 10. Introduction
     Taking into account human
     Based on: lighting conditions; mushroom stage in lifecycle; humidity; seasons; human senses?; other unknown factors…
  • 11. Introduction
     Some attributes are difficult to discern
     Textures, shapes or colors like: Brown, Chocolate, Buff, Cinnamon
  • 12. Hypotheses
     1. Complex attributes = Higher error probability
     2. Human senses + external factors = Big impact
     So… the ruleset will change to approach reality
     Some attributes will fare much better than others
  • 13. Methodology
  • 14. Methodology
     Collect survey responses:
     1. Evaluate species in different conditions
     2. Measure overall accuracy
     3. Weight attributes based on survey performance
  • 15. Methodology part 1
     Take 3 mushroom species: Agaricus abruptibulbus, Agaricus augustus, Lepiota rubrotincta
     Place under 2 distinct sets of conditions
  • 16. Methodology part 2
     5 questions per species in each condition
     Augustus, Rubrotincta and Abruptibulbus under conditions X; Augustus, Rubrotincta and Abruptibulbus under conditions Y
  • 17. Methodology part 3
     Design tutorial (SurveyMonkey.com)
     Design website (Weebly.com)
     Get people to take survey (hardest part): designed flyers, poster boards, business cards
  • 18. Survey at Mountainlair
  • 19. Survey at Mountainlair
  • 20. Methodology part 4
     Calculate survey test scores
     Calculate species’ accuracy variation
     Calculate attributes’ accuracy variation
     Calculate attribute weights
     Use data mining tools to find best ruleset
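
The slides do not define how an attribute's accuracy and its variation were computed. The sketch below shows one plausible reading, stated as an assumption: accuracy is the mean of the percent-correct under conditions X and Y, and variation is the absolute difference between them. The function name and the example counts are hypothetical.

```python
def attribute_stats(correct_x, correct_y, respondents):
    """Return (accuracy, variation) in percent for one attribute.

    Assumption: the attribute is asked once under conditions X and once
    under conditions Y, and every respondent answers both questions.
    """
    acc_x = 100 * correct_x / respondents
    acc_y = 100 * correct_y / respondents
    accuracy = (acc_x + acc_y) / 2   # mean over the two conditions
    variation = abs(acc_x - acc_y)   # sensitivity to the conditions
    return accuracy, variation


if __name__ == "__main__":
    # Hypothetical counts: 9 correct under X, 16 correct under Y, 37 surveys
    acc, var = attribute_stats(9, 16, 37)
    print(f"accuracy {acc:.1f}%, variation {var:.1f}%")
```

Under this reading a large variation means the attribute is easy to judge in one setting and hard in the other, which is the effect the second hypothesis is about.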
  • 21. Weighting Methodology
  • 22. Results
  • 23. Overall Survey Results
     30 questions per survey; 15 attributes measured; 37 completed surveys; 1,110 answered questions
     Overall survey grades: A 0, B 1, C 7, D 8, F 14
     Highest was 24 out of 30 correct answers
  • 24. Results: Survey Accuracy per Attribute [bar chart; vertical axis 0% to 100%]
  • 25. Attribute Accuracy [bar chart; axis 0 to 100]
     Attribute                   Accuracy   Variation
     veil color                  37.8       10.8
     ring number                 59.5       5.4
     stalk shape                 33.75      18.9
     cap shape                   32.45      48.7
     cap surface                 32.45      5.4
     cap color                   81.1       5.4
     gill spacing                78.4       16.2
     stalk root                  45.95      21.7
     stalk color above ring      59.45      64.9
     stalk color below ring      67.6       10.8
     gill size                   36.45      13.5
     stalk surface below ring    78.4       5.4
     ring type                   73         5.4
     stalk surface above ring    63.55      2.7
     gill color                  63.55      13.5
  • 26. Weighted Attributes [bar chart; vertical axis: Weight, 0 to 100]
     Bar values: 76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55, 36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7
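
The transcript does not state the weighting formula. One formula that reproduces the slide 26 bar values from the slide 25 numbers to within 0.1 is weight = accuracy × (1 − variation / 100). The sketch below checks that, reading the first number of each slide 25 row as accuracy and the second as variation. Both the formula and the attribute-to-bar matching (by sorted order, since the bars are unlabeled in the transcript) are assumptions, not the authors' stated method.

```python
# (accuracy, variation) per attribute, as listed on slide 25
SLIDE_25 = {
    "veil color": (37.8, 10.8),
    "ring number": (59.5, 5.4),
    "stalk shape": (33.75, 18.9),
    "cap shape": (32.45, 48.7),
    "cap surface": (32.45, 5.4),
    "cap color": (81.1, 5.4),
    "gill spacing": (78.4, 16.2),
    "stalk root": (45.95, 21.7),
    "stalk color above ring": (59.45, 64.9),
    "stalk color below ring": (67.6, 10.8),
    "gill size": (36.45, 13.5),
    "stalk surface below ring": (78.4, 5.4),
    "ring type": (73.0, 5.4),
    "stalk surface above ring": (63.55, 2.7),
    "gill color": (63.55, 13.5),
}

# Bar values from slide 26, in the order they appear
SLIDE_26 = [76.7, 74.2, 69, 65.7, 61.8, 60.3, 56.3, 55,
            36, 33.7, 31.5, 30.7, 27.4, 20.9, 16.7]


def weight(accuracy, variation):
    """Assumed formula: discount accuracy by its variation across conditions."""
    return accuracy * (1 - variation / 100)


# Attributes ranked from highest to lowest assumed weight
ranked = sorted(((name, weight(a, v)) for name, (a, v) in SLIDE_25.items()),
                key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for (name, w), slide_value in zip(ranked, SLIDE_26):
        print(f"{name:26s} computed {w:5.1f}   slide {slide_value}")
```

The small residual differences are consistent with the slide values having been computed from unrounded percentages.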
  • 27. J48 Tree: 99.6% classification (E = Edible, P = Poisonous)
     Branches: almond, creosote, foul, anise, musty, none, pungent, spicy, fishy. Leaf labels: E P P E P P P P
     Branches: black, brown, buff, chocolate, green, orange, purple, white, yellow. Leaf labels: E E E E P E E E
     Branches: silky, scaly, fibrous, smooth. Leaf labels: E P E E
  • 28. J48 Tree: 99.9% classification (E = Edible, P = Poisonous)
     Branches: almond, creosote, foul, anise, musty, none, pungent, spicy, fishy. Leaf labels: E P P E P P P P
     Branches: black, brown, buff, chocolate, green, orange, purple, white, yellow. Leaf labels: E E E E P E E E
     Branches: scaly, fibrous, silky, smooth. Leaf labels: E E E
     Branches: evanescent, flaring, zone, sheathing, none, large, cobwebby, pendant. Leaf labels: P P P P P P P E
  • 29. Attribute Accuracy [scatter plot; x-axis: Complexity, 0 to 14; y-axis: Accuracy, 0 to 100]
     Point labels (attribute, complexity): Cap Color, 10; Stalk Surface Below, 4; Ring Type, 8; Stalk Color Below, 9; Stalk Surface Above, 4; Ring Number, 3; Stalk Color Above, 9; Stalk Root, 7; Veil Color, 4; Stalk Shape, 2; Cap Surface, 4; Cap Shape, 6
  • 30. Conclusions
  • 31. Conclusion
     Complex attributes = Higher error probability
     Hypothesis 1: False
     They are actually more accurate the more complex the attribute
     Fat spheres = Complex attributes; Height = Survey accuracy
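
A rough way to check the direction of the slide 31 claim is a Pearson correlation between complexity and accuracy. The sketch below pairs each complexity value from the slide 29 point labels with the accuracy listed for the same attribute on slide 25. That pairing is an assumption, as is the reading of complexity as the number following each label. A single coefficient on 12 points is only an indication, not the authors' analysis.

```python
from math import sqrt

# attribute: (complexity from slide 29, accuracy from slide 25)
POINTS = {
    "cap color": (10, 81.1),
    "stalk surface below ring": (4, 78.4),
    "ring type": (8, 73.0),
    "stalk color below ring": (9, 67.6),
    "stalk surface above ring": (4, 63.55),
    "ring number": (3, 59.5),
    "stalk color above ring": (9, 59.45),
    "stalk root": (7, 45.95),
    "veil color": (4, 37.8),
    "stalk shape": (2, 33.75),
    "cap surface": (4, 32.45),
    "cap shape": (6, 32.45),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


complexity = [c for c, _ in POINTS.values()]
accuracy = [a for _, a in POINTS.values()]
r = pearson(complexity, accuracy)

if __name__ == "__main__":
    print(f"Pearson r between complexity and accuracy: {r:.2f}")
```

A positive coefficient agrees with the slide's conclusion that survey accuracy did not fall as attribute complexity rose.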
  • 32. Conclusion
     Human senses + external factors = Big impact
     Hypothesis 2: True
     24% change in correctly identifying attributes due to ambient environment conditions
     1.2 questions answered incorrectly out of 5 due to ambient environments of mushrooms
  • 33. Future Work
     Evaluate mushroom expertise for increase in mushroom attribute identification accuracy
     Measure spore print color and odor in surveys?
  • 34. Questions?