R & Data mining in action

Presentation from the workshop "R & Data mining in action", given at JDD 2013.
Code samples with description (in Polish): https://gist.github.com/kmrowca/public

Published in: Technology, Education

  • Slide note: example with a postal code and a phone number
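
The slide note above mentions an example with a postal code and a phone number. Presumably it illustrates that values written with digits can still be qualitative; the slides do not say so explicitly, so the base R sketch below is an assumption rather than the workshop's own code. The data frame `contacts` and all of its values are made up for illustration.

```r
# Hypothetical sample data; every value here is invented.
contacts <- data.frame(
  postal_code = c("00-001", "30-059", "00-001"),
  phone       = c("601234567", "501987654", "601234567"),
  age         = c(34, 27, 45),
  stringsAsFactors = FALSE
)

# Postal codes and phone numbers look numeric but act as labels
# (qualitative), so keep them as character or factor, not as numbers.
contacts$postal_code <- factor(contacts$postal_code)

# Age is quantitative: an average is meaningful.
mean_age <- mean(contacts$age)

# For a qualitative variable, count occurrences instead of averaging.
code_counts <- table(contacts$postal_code)

print(mean_age)
print(code_counts)
```

Storing such columns as text or factors keeps R from treating them as quantities, for example when `summary()` or a model formula is applied to the data frame.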

    1. R & data mining in action, Katarzyna Mrowca
    2. The art of reading between the lines, or the R language and data mining in action
    3. <me> Katarzyna Mrowca </me>
    4. The deal
    5. Agenda • Quick glance at theory: data mining • Exercises on… paper • Quick glance at the tool: R console • Exercises: become friends with R •…
    6. Agenda • Quick glance at theory: data mining • Exercises on… paper • Quick glance at the tool: R console • Exercises: become friends with R •… (legend: Theory, Exercise)
    7. Agenda • Quick glance at theory: data preparation • Exercises • Regression • Time series • Decision trees • Cluster analysis • Text mining •… (legend: Theory, Exercise)
    8. Quick glance at theory!
    9. What is data mining?
    10. What does "Google" say?
    11-16. What does "Google" say? Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, and statistics.
    17-19. What does "Google" say? The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
    20. What does "Google" say? Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Source: Wikipedia
    21-23. Data mining: what is "inside" • Predictive: regression, classification, collaborative filtering • Descriptive: clustering / similarity matching, association rules and variants, deviation detection
    24. What is data mining not?
    25. Why is data mining so popular?
    26. What is the difference between statistics and data mining?
    27. Data preparation
    28. Variables
    29. Qualitative & quantitative
    30. Tame the R console!
    31. NetBeans + R. Source: https://blogs.oracle.com/geertjan/entry/r_plugin_for_netbeans_ide
    32. RHIPE <- R + Hadoop. Find out more: http://www.datadr.org/
    33. Revolution Analytics <- R + Hadoop + Enterprise. Find out more: http://www.revolutionanalytics.com
    34. Take a break
    35. Regression
    36. Time series
    37. Decision trees
    38. Regression trees
    39. Classification trees
    40. K-means
    41. Text mining
    42. Thank you!
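
The split between predictive and descriptive tasks listed on slides 21 to 23 can be illustrated with a short base R sketch. This is not the workshop's own code (that lives in the gist linked in the description). The built-in data sets `cars` and `iris`, the query speed of 15, and the choice of three clusters are all assumptions made for the example.

```r
# Predictive task (regression): model stopping distance from speed
# using the `cars` data set that ships with base R.
fit <- lm(dist ~ speed, data = cars)
predicted <- predict(fit, newdata = data.frame(speed = 15))

# Descriptive task (clustering): group iris flowers by their four
# numeric measurements, without using the Species label.
set.seed(42)  # k-means starts from random centers; fix the seed
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

print(coef(fit))   # intercept and slope of the fitted line
print(predicted)   # predicted stopping distance at speed 15

# Cross-tabulate cluster ids against the held-back species labels.
print(table(clusters$cluster, iris$Species))
```

The regression model answers a question about unseen input (what distance to expect at a given speed), while k-means only describes structure already present in the data. The final cross-tabulation is a sanity check, not part of the clustering itself.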
