2. What is R?
R is a system for statistical analysis and graphics
R is free software, distributed under the terms of the GNU General Public License
To install R, visit:
https://cran.r-project.org/
R has more than 15,000 packages, plus documentation
2 Ali Ghods | Aix-Marseille Université 2018-12-07
3. R Popularity
The number of scholarly articles found each year by Google Scholar:
(source: http://r4stats.com/articles/popularity/)
4. Advantages
R may seem complex to beginners, but it is not: its main feature is flexibility
New methods are available sooner
Vast selection of analytic and graphical methods
It runs on Windows, macOS, and Linux
Powerful object-oriented language
Wide range of input/output formats; exchanges data with Excel, SPSS, SAS, etc.
6. Find and Install a package
Find packages:
Comprehensive R Archive Network (CRAN)
Statistical Data Analysis
https://awesome-r.com/
Install packages:
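installed.packages()                          # list the packages already installed
install.packages("package name")              # install a package from CRAN
update.packages(oldPkgs = "package name")     # update one package; update.packages() checks all of them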
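A script often needs to check whether a package is already installed before calling install.packages(). The sketch below shows one possible way to do that. The helper name missing_packages is hypothetical (it is not part of base R), and the install call is left commented out so that running the snippet changes nothing on your machine.

```r
# Hypothetical helper: returns the names in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(c("dplyr", "ggplot2"))

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

Keeping the check separate from the install makes it easy to see what would be downloaded before anything is changed.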
7. What are the most popular packages in R?
To manipulate data: dplyr, tidyr
To visualize data: ggplot2, plotly
To report results: shiny, xtable, rmarkdown
To analyze data: psych, pls
Network Analysis: igraph
Text-mining: tm, tidytext
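As a small illustration of how the data-manipulation and visualization packages fit together, the sketch below summarizes the built-in mtcars data set with dplyr and plots the result with ggplot2. The data set and the names by_cyl and p are chosen only for illustration, and both packages are assumed to be installed.

```r
library(dplyr)
library(ggplot2)

# Manipulate: mean fuel economy (mpg) and number of cars per cylinder count
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())

# Visualize: one bar per cylinder count
p <- ggplot(by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col(fill = "steelblue") +
  labs(x = "Cylinders", y = "Mean miles per gallon")
print(p)
```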
8. Data sets
Your data set from your research
Public data sets
Google: https://toolbox.google.com/datasetsearch
https://github.com/awesomedata/awesome-public-datasets
Mock data: e.g. https://www.mockaroo.com/
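Whatever the source, a downloaded data set usually arrives as a delimited text file. The sketch below reads a small CSV with base R. The inline text stands in for a downloaded file, and the column names and values are invented for illustration.

```r
# Inline text standing in for a downloaded CSV file (made-up values)
csv_text <- "id,group,score
1,a,4.5
2,b,3.0
3,a,5.0"

# For a real file, pass its path instead, e.g. read.csv("my_data.csv")
scores <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(scores)                                           # column types and first values
aggregate(score ~ group, data = scores, FUN = mean)   # mean score per group
```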
9. Where to learn R?
R help and package documentation
R for Beginners
Introduction to Probability and Statistics Using R
Introduction à la programmation en R
https://www.r-bloggers.com/
https://rdrr.io/
https://www.rdocumentation.org/
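The built-in help mentioned first in the list above can be reached directly from the R console. The sketch below uses only functions from base R and the utils package; the topic "mean" and the search pattern are arbitrary examples.

```r
# Look up the help page for a function (same as typing ?mean);
# printing h in an interactive session opens the page
h <- help("mean")

# List the functions on the search path whose names start with "read."
read_functions <- apropos("^read\\.")
print(read_functions)

# Run the examples that ship with a help page
example("mean")
```

In an interactive session, ??keyword searches all installed documentation, and vignette() lists the longer guides shipped with packages.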
10. Example: Text analysis, sentiment analysis
Data: 3150 Amazon customer reviews of Alexa devices (Echo, Firestick, Echo Dot, etc.)
Source: https://www.kaggle.com/sid321axn/amazon-alexa-reviews
Objective:
1 Find out the most frequent words
2 Find out the most positive and negative words
11. Example
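# required packages (library() attaches one package per call)
library(tidytext)
library(dplyr)
library(readr)
library(tokenizers)
library(ggplot2)
# retrieve data
data <- read_tsv("amazon_alexa.tsv")
# tokenization: tokenize_words() returns a list, so flatten it to one word per row
comments_token <- tibble(word = unlist(tokenize_words(data$verified_reviews,
  lowercase = TRUE, strip_numeric = TRUE, strip_punct = TRUE)))
# the most frequent words, after removing the stop words in tidytext::stop_words
comments_token <- comments_token %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
# bar chart of the ten most frequent words, with the count above each bar
g <- ggplot(comments_token[1:10, ], aes(x = reorder(word, -n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = n), position = position_dodge(0.9), vjust = 0)
print(g)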
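To see what the tokenization and stop-word steps do without downloading the data set, the same pipeline can be run on a toy input. This is a minimal sketch: the two reviews are invented, the names reviews, tokens and word_counts are chosen for illustration, and dplyr, tidytext and tokenizers are assumed to be installed.

```r
library(dplyr)
library(tidytext)
library(tokenizers)

# Two invented reviews standing in for the real data
reviews <- c("I love my Echo", "Love the sound, love the price")

# tokenize_words() lowercases the text and returns one character vector
# per review; unlist() flattens them into one word per row
tokens <- tibble(word = unlist(tokenize_words(reviews)))

# Drop stop words (tidytext::stop_words) and count what is left
word_counts <- tokens %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

print(word_counts)
```

Common words such as "the" and "my" disappear after the anti_join(), which is why the frequency chart shows content words rather than grammar words.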
13. Example
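# Sentiment analysis
# bing_word_counts is not defined on the slides; it is assumed here to be the
# word counts from the previous example joined to the Bing sentiment lexicon
bing_word_counts <- comments_token %>%
  inner_join(get_sentiments("bing"), by = "word")
# diverging bar chart: negative words point down, positive words point up
bing_word_counts %>%
  filter(n > 50) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("Contribution to sentiment") +
  ggtitle("Most common positive and negative words")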
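The two mutate() steps are what turn plain counts into a diverging chart. The sketch below isolates them on a hand-made table. The words and counts are invented, and toy_counts and signed_counts are hypothetical names used only here.

```r
library(dplyr)

# Invented counts standing in for bing_word_counts
toy_counts <- tibble(
  word      = c("love", "great", "problem", "disappointed"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(120, 90, 60, 55)
)

signed_counts <- toy_counts %>%
  # negative words get negative bar heights
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  # turn word into a factor whose levels are sorted by the signed count
  mutate(word = reorder(word, n))

# The factor levels give the left-to-right order of the bars
levels(signed_counts$word)
```

Because ggplot2 draws a factor in level order, the bars run from the most negative word on the left to the most positive on the right.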