Data Mining 101

A presentation I gave earlier today at the 4th FOSS Conference in Greece, introducing data mining, its principles and its applications to a wider audience, and showcasing the use of the Weka software for all core data mining purposes.

Published in: Technology

  1. Data Mining 101
     George Tziralis, FOSS Conf, June 19 09, Athens, GR
  2. the facts
  3. the facts
     The data gap
  4. the promise
     understand and take advantage of the world’s information
  5. the name
     data mining: statistics at speed, scale and simplicity
  6. what is
     Databases, Statistics, Artificial Intelligence
  7. the difference
     • statistics: define a hypothesis, then test
     • data mining: test all possible hypotheses
     • is it possible? YES!
  8. the tasks
     • classification
     • association
     • clustering
     • prediction
  9. the process
     • data input & exploration
     • preprocessing
     • data mining algorithms
     • evaluation & interpretation
  10. an example
      #   color   size  value  buy
      1   blue    5.32  b      no
      2   yellow  8.57  a      yes
      3   green   1.23  c      no
      4   yellow  9.35  c      yes
      5   red     5.99  b      yes
      6   red     4.43  b      yes
      7   green   6.21  b      no
      8   white   4.89  a      yes
      9   black   5.15  b      no
      10  green   5.67  b      no
  11. an example
      The same table, annotated: each column (color, size, value) is an attribute, buy is the target, and each row is an instance.
  12. so far
      [plot: size axis, 0 to 10.0]
  13. now
      • if size = [4.0 - 7.0] & value = {b,c} then buy = no
  14. now
      • If color = yellow then buy = yes
      • If color = red then buy = yes
      • If color = white then buy = yes
      • If color = green then buy = no
      • If color = blue then buy = no
      • If color = black then buy = no
  15. ok, cool! but how?
  16. the tool
      Weka: Waikato Environment for Knowledge Analysis
      OSS, written in Java, providing an API
  17. start
      start -> explorer
  18. explore
      open file -> data -> contact-lenses.arff
  19. .arff how-to
      % ARFF file of the example’s data
      @relation testset
      @attribute color {blue, yellow, green, red, white, black}
      @attribute size numeric
      @attribute value {a, b, c}
      @attribute buy {yes, no}
      @data
      blue, 5.32, b, no
      yellow, 8.57, a, yes
      green, 1.23, c, no
      ...
  20. preprocess
      filter -> ... {tons of filters}
  21. visualize
      tab “visualize” (per target/class)
  22. visualize
      tab “preprocess” -> visualize all (per class)
  23. select attributes
      tab “select attributes” (default settings)
  24. classify
      tab “classify” -> rules -> PART -> start!
  25. associate
      tab “associate” -> start! (default settings)
  26. pls tell me more!
  27. the book
      your data mining & data guide!
  28. thank you
      gtziralis@gmail.com
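
The rules on slides 13 and 14 can be checked against the ten instances of the example table. The sketch below is only an illustration in plain Python, not part of the talk and not how Weka's PART learner works: the function names (`size_value_rule`, `rule_coverage`, `majority_rules`) are made up for this example, and the second part simply takes a majority vote of the target for each value of one attribute.

```python
from collections import Counter, defaultdict

# The ten instances from the example table: (color, size, value, buy).
DATA = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def size_value_rule(color, size, value):
    """Slide 13 rule: return 'no' when the rule fires, None otherwise."""
    if 4.0 <= size <= 7.0 and value in {"b", "c"}:
        return "no"
    return None


def rule_coverage(data):
    """Count the instances the rule fires on and how many it labels correctly."""
    covered = correct = 0
    for color, size, value, buy in data:
        predicted = size_value_rule(color, size, value)
        if predicted is not None:
            covered += 1
            if predicted == buy:
                correct += 1
    return covered, correct


def majority_rules(data, column=0):
    """For each value of one attribute, predict the most common target value."""
    counts = defaultdict(Counter)
    for row in data:
        counts[row[column]][row[-1]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


if __name__ == "__main__":
    print(rule_coverage(DATA))   # (instances covered, instances labelled correctly)
    print(majority_rules(DATA))  # one "if color = ... then buy = ..." rule per color
```

On these ten rows the size/value rule fires on six instances and labels four of them correctly (the two red items in that size range were bought), while the per-color majority vote reproduces the six color rules of slide 14 and matches every row.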
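
The .arff layout from slide 19 can also be produced programmatically. What follows is a minimal sketch under stated assumptions: `to_arff` and `ATTRIBUTES` are hypothetical names for this example, not Weka functions, and the nominal value lists are written out by hand so that every color appearing in the example table (including white and black) is declared in the header.

```python
# The ten instances from the example table: (color, size, value, buy).
DATA = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]

# Attribute name plus either the string "numeric" or a list of nominal values.
ATTRIBUTES = [
    ("color", ["blue", "yellow", "green", "red", "white", "black"]),
    ("size", "numeric"),
    ("value", ["a", "b", "c"]),
    ("buy", ["yes", "no"]),
]


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text, rejecting undeclared nominal values."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            lines.append(f"@attribute {name} {{{', '.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        for (name, kind), cell in zip(attributes, row):
            if not isinstance(kind, str) and cell not in kind:
                raise ValueError(f"{cell!r} is not a declared value of {name}")
        lines.append(", ".join(str(cell) for cell in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("testset", ATTRIBUTES, DATA))
```

Saving the printed text to a file with an .arff extension should give something that can be opened from the Explorer in the same way as contact-lenses.arff on slide 18.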
