# R & Data mining in action

Presentation from the workshop "R & Data mining in action", given at JDD 2013.
Code samples with description (in Polish): https://gist.github.com/kmrowca/public

• Example with a postal code and a phone number
1. R & data mining in action Katarzyna Mrowca
2. The art of reading between the lines, or the R language and data mining in action
3. <me> Katarzyna Mrowca </me>
4. The deal
5. Agenda • Quick glance at theory - Data mining • Exercises on… paper • Quick glance at the tool – R console • Exercises – become friends with R •…
7. Agenda • Quick glance at theory - Data preparation • Exercises • Regression • Time series • Decision trees • Cluster analysis • Text mining •…
8. Quick glance at theory!
9. What is data mining?
10. What does "Google" say?
12. What does "Google" say? Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, and statistics.
17. What does "Google" say? The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
20. What does "Google" say? Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Source: Wikipedia
21. Data mining – what is "inside" • Predictive: regression, classification, collaborative filtering • Descriptive: clustering / similarity matching, association rules and variants, deviation detection
24. What is data mining not?
25. Why is data mining so popular?
26. What is the difference between statistics and data mining?
27. Data preparation
28. Variables
29. Qualitative & Quantitative
30. Tame the R console!
31. NetBeans + R Source: https://blogs.oracle.com/geertjan/entry/r_plugin_for_netbeans_ide
32. RHIPE <- R + Hadoop Find out more: http://www.datadr.org/
33. Revolution Analytics <- R + Hadoop + Enterprise Find out more: http://www.revolutionanalytics.com
34. Take a break
35. Regression
36. Time series
37. Decision trees
38. Regression trees
39. Classification trees
40. K-means
41. Text mining
42. Thank you!
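
The slides name the predictive and descriptive families of methods without showing code, so here is a minimal sketch of one technique from each family that can be pasted into the R console. It is not the workshop's own code (that lives in the gist linked above). The choice of the built-in `cars` and `iris` data sets, the three clusters, and the seed are all assumptions made for illustration.

```r
# Illustrative sketch only: data sets, cluster count and seed are assumptions,
# not taken from the workshop material.

# Predictive (regression): model stopping distance as a linear function of speed
# using the built-in `cars` data set.
fit <- lm(dist ~ speed, data = cars)
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(fit, newdata = new_speeds)

# Descriptive (clustering): k-means on the four numeric columns of `iris`.
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# Cross-tabulate the found clusters against the known species labels.
cluster_vs_species <- table(clusters$cluster, iris$Species)

print(coef(fit))
print(predicted)
print(cluster_vs_species)
```

Regression uses a known target (`dist`) to make predictions, while k-means never sees the species labels; the final table only checks afterwards how well the discovered groups line up with them.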