R & data mining in action
Katarzyna Mrowca
The art of reading between the lines,
or the R language and data mining in action
<me>

Katarzyna Mrowca

</me>
The deal 
Agenda
• Quick glance at theory - Data mining
• Exercises on… paper
• Quick glance at the tool – R console
• Exercises – become friends with R
•…

Theory

Exercise
Agenda
• Quick glance at theory - Data preparation
• Exercises
• Regression
• Time series
• Decision trees
• Cluster analysis
• Text mining
•…

Theory

Exercise
Quick glance at theory!
What is data mining?
What does „google” say?
Data mining (the analysis step of the "Knowledge Discovery in
Databases" process, or KDD), an interdisciplinary subfield of computer
science, is the computational process of discovering patterns in large
data sets involving methods at the intersection of artificial intelligence,
machine learning, and statistics.
What does „google” say?
The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for
further use.
What does „google” say?
Aside from the raw analysis step, it involves database and data
management aspects, data pre-processing, model and inference
considerations, interestingness metrics, complexity considerations,
post-processing of discovered structures, visualization, and online
updating.

Source: Wikipedia
Data mining – what is „inside”
• Predictive:
  • Regression
  • Classification
  • Collaborative filtering

• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection
What is data mining not?
Why is data mining so popular?
What is the difference between statistics and data mining?
Data preparation
Variables
Qualitative & Quantitative
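The editor's notes mention an example with a postal code and a phone number, presumably for this slide: both are written with digits, yet they are labels rather than quantities. A minimal sketch of that distinction follows, written in Python for illustration (the talk itself uses R); the sample records and field names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample records, invented for illustration only.
people = [
    {"postal_code": "00-950", "phone": "600100200", "age": 34},
    {"postal_code": "30-001", "phone": "500300400", "age": 41},
    {"postal_code": "00-950", "phone": "700500600", "age": 30},
]

# Quantitative variable: arithmetic such as an average is meaningful.
mean_age = mean(p["age"] for p in people)

# Qualitative variable: keep it as text and count categories instead.
postal_counts = Counter(p["postal_code"] for p in people)

# Storing a code as a number silently loses information (leading zeros).
code_as_number = int("00950")

print(mean_age)
print(postal_counts.most_common(1))
print(code_as_number)
```

Averaging phone numbers or postal codes would run without error but mean nothing, which is why the variable type has to be decided during data preparation rather than inferred from how the values look.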
Tame the R console!
NetBeans + R

Source: https://blogs.oracle.com/geertjan/entry/r_plugin_for_netbeans_ide
RHIPE <- R + Hadoop
Find out more: http://www.datadr.org/
Revolution Analytics <- R + Hadoop + Enterprise
Find out more: http://www.revolutionanalytics.com
Take a break 
Regression
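As a reminder of what a simple regression computes, here is a minimal sketch of ordinary least squares for one predictor, written in Python for illustration. The function name `fit_line` and the data are hypothetical; in R the same model is what `lm(y ~ x)` fits.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```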
Time series
Decision trees
Regression trees
Classification trees
K-means
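The idea behind this clustering method can be sketched in a few lines: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The sketch below is in Python for illustration; the function name, the starting centroids, and the data are hypothetical, and a real analysis would more likely use R's built-in `kmeans()`.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    (sum(m[0] for m in members) / len(members),
                     sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(c)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, so in practice the algorithm is usually run from several starting points and the best partition is kept.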
Text mining
Thank you!


Editor's Notes

  • #30 Example with a postal code and a phone number