# Data Mining 101

A presentation I gave earlier today at the 4th FOSS Conference in Greece, introducing data mining, its principles and its applications to a wider audience, and showcasing the use of the Weka software for all core data mining purposes.

Published in: Technology

### Data Mining 101

1. Data Mining 101. George Tziralis, FOSS Conf, June 19 09, Athens, GR
2. the facts
3. the facts: the data gap
4. the promise: understand and take advantage of the world’s information
5. the name: data mining, i.e. statistics at speed, scale and simplicity
6. what is: Databases, Statistics, Artificial Intelligence
7. the difference: • statistics: define a hypothesis, then test • data mining: test all possible hypotheses • is it possible? YES!
8. the tasks: • classification • association • clustering • prediction
9. the process: • data input & exploration • preprocessing • data mining algorithms • evaluation & interpretation
10. an example

    | # | color | size | value | buy |
    |---|---|---|---|---|
    | 1 | blue | 5.32 | b | no |
    | 2 | yellow | 8.57 | a | yes |
    | 3 | green | 1.23 | c | no |
    | 4 | yellow | 9.35 | c | yes |
    | 5 | red | 5.99 | b | yes |
    | 6 | red | 4.43 | b | yes |
    | 7 | green | 6.21 | b | no |
    | 8 | white | 4.89 | a | yes |
    | 9 | black | 5.15 | b | no |
    | 10 | green | 5.67 | b | no |

11. an example, annotated: the same ten-row table with labels added. A column such as color is an attribute, the buy column is the target, and a row is an instance.
12. so far [chart of size, axis from 0 to 10.0]
13. now: • if size = [4.0 - 7.0] & value = {b,c} then buy = no
14. now: • If color = yellow then buy = yes • If color = red then buy = yes • If color = white then buy = yes • If color = green then buy = no • If color = blue then buy = no • If color = black then buy = no
15. ok, cool! but how?
16. the tool: Weka (Waikato Environment for Knowledge Analysis). OSS, written in Java, providing an API
17. start: start -> explorer
18. explore: open file -> data -> contact-lenses.arff
19. .arff how-to

        % ARFF file of the example’s data
        @relation testset
        @attribute color {blue, yellow, green, red}
        @attribute size numeric
        @attribute value {a, b, c}
        @attribute buy {yes, no}
        @data
        blue, 5.32, b, no
        yellow, 8.57, a, yes
        green, 1.23, c, no
        ...

20. preprocess: filter -> ... {tons of filters}
21. visualize: tab “visualize” (per target/class)
22. visualize: tab “preprocess” -> visualize all (per class)
23. select attributes: tab “select attributes” (default settings)
24. classify: tab “classify” -> rules -> PART -> start!
25. associate: tab “associate” -> start! (default settings)
26. pls tell me more!
27. the book: your data mining & data guide!
28. thank you: gtziralis@gmail.com
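
The rules on slides 13 and 14 can be checked by hand against the ten instances of slide 10. The sketch below does that in plain Python, independent of Weka. The helper names (`size_value_rule`, `color_rule`, `score`) are made up for this illustration, and the interval `[4.0 - 7.0]` is assumed to be closed at both ends.

```python
# The ten instances from slide 10: (color, size, value, buy).
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def size_value_rule(color, size, value):
    """Slide 13: if size in [4.0, 7.0] and value in {b, c} then buy = no."""
    if 4.0 <= size <= 7.0 and value in {"b", "c"}:
        return "no"
    return None  # the rule makes no claim about this instance


# Slide 14: one rule per color.
COLOR_TO_BUY = {
    "yellow": "yes", "red": "yes", "white": "yes",
    "green": "no", "blue": "no", "black": "no",
}


def color_rule(color, size, value):
    """Slide 14: look the prediction up by color alone."""
    return COLOR_TO_BUY.get(color)


def score(rule, rows):
    """Return (covered, correct): instances the rule fires on, and how many it gets right."""
    covered = correct = 0
    for color, size, value, buy in rows:
        prediction = rule(color, size, value)
        if prediction is None:
            continue
        covered += 1
        correct += prediction == buy
    return covered, correct


size_value_score = score(size_value_rule, ROWS)
color_score = score(color_rule, ROWS)
print("size/value rule (covered, correct):", size_value_score)
print("color rules     (covered, correct):", color_score)
```

On these ten rows the size/value rule fires on six instances and is wrong on the two red ones (rows 5 and 6), while the color rules cover all ten rows without error. Ten training rows say nothing about how either rule set would do on new data.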
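
Slide 7's "test all possible hypotheses" can be shown on a very small scale: for every nominal attribute, build one rule per attribute value (predict the majority target), count the errors, and keep the attribute with the fewest. This is a 1R-style sketch, not the PART learner used on slide 24 and not Weka's code. The function name `one_rule` is an assumption of this example, and the numeric `size` attribute is skipped because it would need discretizing first.

```python
from collections import Counter, defaultdict

COLUMNS = ("color", "size", "value", "buy")

# The ten instances from slide 10.
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def one_rule(rows, attribute_index, target_index=3):
    """Map each value of one attribute to its majority target and count the errors."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute_index]][row[target_index]] += 1

    rules = {}
    errors = 0
    for attribute_value, target_counts in counts.items():
        majority, hits = target_counts.most_common(1)[0]
        rules[attribute_value] = majority
        errors += sum(target_counts.values()) - hits
    return rules, errors


# Try every nominal attribute: index 0 is color, index 2 is value.
candidates = {COLUMNS[i]: one_rule(ROWS, i) for i in (0, 2)}
best = min(candidates, key=lambda name: candidates[name][1])

for name, (rules, errors) in candidates.items():
    print(f"{name}: {rules} errors={errors}/{len(ROWS)}")
print("best attribute:", best)
```

On this data the winner is `color`, which reproduces the six rules of slide 14. Real tools search a far larger hypothesis space, which is where the speed and scale of slide 5 come in.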
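
The ARFF layout of slide 19 can also be generated from the table instead of typed by hand. The sketch below builds the file contents as a string. The names `attribute_type` and `to_arff` are invented for this example. Nominal values are collected from the data, so `color` also lists white and black, which occur in the slide 10 table but not in the slide 19 header. As far as I know, Weka rejects nominal values that the header does not declare.

```python
COLUMNS = ("color", "size", "value", "buy")

# The ten instances from slide 10.
ROWS = [
    ("blue", 5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green", 1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red", 5.99, "b", "yes"),
    ("red", 4.43, "b", "yes"),
    ("green", 6.21, "b", "no"),
    ("white", 4.89, "a", "yes"),
    ("black", 5.15, "b", "no"),
    ("green", 5.67, "b", "no"),
]


def attribute_type(values):
    """Return 'numeric' for an all-number column, else a nominal {a, b, ...} declaration."""
    if all(isinstance(v, (int, float)) for v in values):
        return "numeric"
    distinct = list(dict.fromkeys(values))  # unique values, in order of first appearance
    return "{" + ", ".join(distinct) + "}"


def to_arff(relation, columns, rows):
    """Build ARFF text: @relation, one @attribute per column, then @data rows."""
    lines = [f"@relation {relation}"]
    for index, name in enumerate(columns):
        column_values = [row[index] for row in rows]
        lines.append(f"@attribute {name} {attribute_type(column_values)}")
    lines.append("@data")
    for row in rows:
        lines.append(", ".join(str(cell) for cell in row))
    return "\n".join(lines) + "\n"


arff_text = to_arff("testset", COLUMNS, ROWS)
print(arff_text)
```

Writing `arff_text` to a file such as `testset.arff` (a hypothetical name) should give something the Explorer's "open file" step from slide 18 can load.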