Data Mining 101

    Data Mining 101 - Presentation Transcript

    1. Data Mining 101 George Tziralis, FOSS Conf, June 19, 2009, Athens, GR
    2. the facts
    3. the facts The data gap
    4. the promise understand and take advantage of the world’s information
    5. the name data mining: statistics at speed, scale and simplicity
    6. what is Databases Statistics Artificial Intelligence
    7. the difference •statistics: define a hypothesis, then test •data mining: test all possible hypotheses •is it possible? YES!
    8. the tasks •classification •association •clustering •prediction
    9. the process •data input & exploration •preprocessing •data mining algorithms •evaluation & interpretation
    10. an example
         #   color    size   value   buy
         1   blue     5.32   b       no
         2   yellow   8.57   a       yes
         3   green    1.23   c       no
         4   yellow   9.35   c       yes
         5   red      5.99   b       yes
         6   red      4.43   b       yes
         7   green    6.21   b       no
         8   white    4.89   a       yes
         9   black    5.15   b       no
        10   green    5.67   b       no
    11. an example (each column is an attribute, buy is the target, each row is an instance)
         #   color    size   value   buy
         1   blue     5.32   b       no
         2   yellow   8.57   a       yes
         3   green    1.23   c       no
         4   yellow   9.35   c       yes
         5   red      5.99   b       yes
         6   red      4.43   b       yes
         7   green    6.21   b       no
         8   white    4.89   a       yes
         9   black    5.15   b       no
        10   green    5.67   b       no
    12. so far [figure: plot of size, axis from 0 to 10.0]
    13. now • if size = [4.0 - 7.0] & value = {b,c} then buy = no
    14. now • If color = yellow then buy = yes • If color = red then buy = yes • If color = white then buy = yes • If color = green then buy = no • If color = blue then buy = no • If color = black then buy = no
    15. ok, cool! but how?
    16. the tool Weka: Waikato Environment for Knowledge Analysis. OSS, written in Java, provides an API
    17. start start -> explorer
    18. explore open file -> data -> contact-lenses.arff
    19. .arff how-to
        % ARFF file of the example’s data
        @relation testset
        @attribute color {blue, yellow, green, red, white, black}
        @attribute size numeric
        @attribute value {a, b, c}
        @attribute buy {yes, no}
        @data
        blue, 5.32, b, no
        yellow, 8.57, a, yes
        green, 1.23, c, no
        ...
    20. preprocess filter -> ... {tons of filters}
    21. visualize tab “visualize” (per target/class)
    22. visualize tab “preprocess” -> visualize all (per class)
    23. select attributes tab “select attributes” (default settings)
    24. classify tab “classify” -> rules -> PART -> start!
    25. associate tab “associate” -> start! (default settings)
    26. pls tell me more!
    27. the book your data mining & data guide!
    28. thank you gtziralis@gmail.com
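
The if-then rules on slides 13 and 14 can be checked against the ten example instances by hand or with a few lines of code. The sketch below is only an illustration in plain Python, not Weka and not necessarily how the slides' rules were produced: it encodes the example table, measures how often slide 13's rule fires and is right, and derives one rule per color by majority vote. The helper names (`size_value_rule`, `rule_coverage`, `one_attribute_rules`) are hypothetical.

```python
from collections import Counter, defaultdict

# The ten instances from the example slides: (color, size, value, buy).
ROWS = [
    ("blue",   5.32, "b", "no"),
    ("yellow", 8.57, "a", "yes"),
    ("green",  1.23, "c", "no"),
    ("yellow", 9.35, "c", "yes"),
    ("red",    5.99, "b", "yes"),
    ("red",    4.43, "b", "yes"),
    ("green",  6.21, "b", "no"),
    ("white",  4.89, "a", "yes"),
    ("black",  5.15, "b", "no"),
    ("green",  5.67, "b", "no"),
]


def size_value_rule(color, size, value):
    """Slide 13's rule: returns 'no' when it fires, None when it does not apply."""
    if 4.0 <= size <= 7.0 and value in {"b", "c"}:
        return "no"
    return None


def rule_coverage(rows, rule):
    """Count the instances a rule fires on and how many of those it predicts correctly."""
    covered = correct = 0
    for color, size, value, buy in rows:
        predicted = rule(color, size, value)
        if predicted is not None:
            covered += 1
            correct += predicted == buy
    return covered, correct


def one_attribute_rules(rows, attr_index, target_index=3):
    """Hypothetical helper: majority target value for each value of a single attribute."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr_index]][row[target_index]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


covered, correct = rule_coverage(ROWS, size_value_rule)
color_rules = one_attribute_rules(ROWS, attr_index=0)
color_hits = sum(color_rules[row[0]] == row[3] for row in ROWS)

print(f"size/value rule: fires on {covered} instances, correct on {correct}")
for color, buy in color_rules.items():
    print(f"if color = {color} then buy = {buy}")
print(f"color rules: correct on {color_hits} of {len(ROWS)} instances")
```

On this table the majority vote reproduces the six color rules of slide 14 and classifies all ten instances correctly, while the size/value rule of slide 13 fires on six instances and is right on four of them (the two red rows are bought).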

    George Tziralis, 5 months ago

    A presentation I gave earlier today at 4th FOSS Con
