Introduction to R for Learning Analytics Researchers
Vitomir Kovanović
University of South Australia
#vkovanovic
vitomir.kovanovic.info
Vitomir.Kovanovic@unisa.edu.au
Download tutorial materials
http://bit.ly/lasi18r
About me
• Learning analytics researcher
• Research Fellow and Data Scientist
University of South Australia, Adelaide
• 5+ years of experience with R
• 15+ years programming experience (Java/Python/Bash/R)
• Member of the SoLAR executive board (organizers of LASI/LAK)
Outline of this talk
• Overview of R (30min)
• Installing R and R Studio
• Overview of R Studio
• R language overview
• Datatypes
• Data frames & Factors
• Loading/saving data frames into CSV files
• Statistical Examples
• T-test
• One-way ANOVA
• Multiple regression
What is R?
• Vector-based programming language and interactive environment
for data analysis
• Popular with statistical researchers
• Large number of software packages/libraries for statistics, data
mining, machine learning, visualisation
• Successor to S programming language from Bell Labs (1976)
• Free software version of S (“GNU S”)
• Supports multiple programming paradigms (functional, procedural, object-oriented)
• Increasingly used in Learning Analytics field
Learning R
• Using R is a bit akin to smoking.
• The beginning is difficult, one may get headaches and even gag the
first few times.
• In the long run, it becomes pleasurable and even addictive.
• Yet, deep down, for those willing to be honest, there is something
not fully healthy in it.
– Francois Pinard
How to learn R
• Not like any “regular” programming language (Java/Python)
• R is a domain-specific language for statistical analysis
• Very quirky, many hacks
• In the right hands, provides endless power
Good and bad sides to R
• The best thing about R is that it was written by statisticians. - Bo Cowgill, Google
• The worst thing about R is that it was written by statisticians. - Bo Cowgill, Google
• Developers and statisticians have different priorities
SPSS users are like muggles
• They are limited in their ability to change their environment.
• They have to rely on algorithms that have been developed for them.
• The way they approach a problem is constrained by how programmers employed by SAS/SPSS thought to approach it.
• And they have to pay money to use these constraining algorithms.
R users are like wizards
• They can rely on functions (spells) that have been developed for them by statistical researchers, but they can also create their own.
• They don’t have to pay for the use of them, and once experienced enough (like Dumbledore), they are almost unlimited in their ability to change their environment.
Best way to learn R
• Learn R in parallel with learning statistics
• Try to use it for your next research project
• Enrol in a stats/R MOOC
• lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
• Read one of the R tutorials:
• Quick-R: www.statmethods.net
• R-Bloggers: www.r-bloggers.com
• Read a good book about R & statistics
• www-bcf.usc.edu/~gareth/ISL/
• Read THE BOOK about R and statistics (and life in general)
• au.sagepub.com/en-gb/oce/discovering-statistics-using-r/book236067
Immensely popular
Why is R so popular
• Out of the box statistical methods (e.g., regression)
• Vector-based language
• Simpler to use than Python + NumPy/SciPy/Pandas
• Handling of missing data
• Superb plotting
• Cutting edge research packages
• Most problems can be solved with an obscure single line of code
• R Studio is a great tool for programming in R
R Studio IDE
• Fantastic development environment
• Made R much more accessible, especially for beginners
• Alternatives: Eclipse, Emacs (ESS), Vim
Linear regression example in R
speed (mph) dist (ft)
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
11 11 28
12 12 14
13 12 20
14 12 24
15 12 28
16 13 26
17 13 34
18 13 34
19 13 46
20 14 26
summary(lm(dist ~ speed, data=cars))
Call:
lm(formula = dist ~ speed, data = cars)
Residuals:
Min 1Q Median 3Q Max
-29.069 -9.525 -2.272 9.215 43.201
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791 6.7584 -2.601 0.0123 *
speed 3.9324 0.4155 9.464 1.49e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
Ezekiel, M. (1930) Methods of
Correlation Analysis. Wiley.
Installing R
Comprehensive R Archive Network (CRAN) – https://cran.r-project.org
Take a look at the glorious R interface
Installing R studio
Overview of R Studio
Run Hello World as a muggle
Run Hello World as a wizard
Two types of R source code
• “Regular” R code (extension .R)
• “RMarkdown” source code (extension .Rmd)
• A mix of Markdown text and R code blocks
• Rmarkdown has a special compiler Knitr
• How Knitr works:
• From .Rmd file, each R code block is sent to R process to get its output
• Markdown text + R codeblocks and/or results are joined to produce temporary .Md file
• Temporary .Md file is processed by markdown translator to generate HTML, PDF, DOCX
• Good for producing tutorials and step by step instructions
Example Rmarkdown
Getting help
Installing a package library
Use library in your code: library
• Simply include the library (typically done at the top of the file)
library(psych)
R variable declarations: <-
• variable.name instead of
• variable_name (C, Python)
• variableName (Java/C#)
• VariableName (VisualBasic)
• Don’t use equals sign to assign a value!
# this is how variables are declared in R
test.variable <- "some text"
number.variable <- 123.45
boolean.variable <- TRUE
another.boolean.variable <- FALSE
missing.value <- NA
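To see what kind of value each variable holds, one option is class(). The sketch below is illustrative only; the variable names are chosen for this example and follow the naming style of the slide.

```r
# Illustrative variables (names are assumptions made for this sketch)
text.variable <- "some text"
number.variable <- 123.45
boolean.variable <- TRUE
missing.value <- NA

# class() reports the type of the value stored in a variable
print(class(text.variable))     # character
print(class(number.variable))   # numeric
print(class(boolean.variable))  # logical
print(class(missing.value))     # a bare NA is logical

# Comparing anything with NA gives NA, so test for it with is.na()
print(missing.value == 1)
print(is.na(missing.value))
```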
Calling functions
• rnorm: generate random samples from a normal distribution
three.random.numbers <- rnorm(3)
print(three.random.numbers)
## [1] 0.9826734 -0.8985821 0.5707538
million.random.numbers <- rnorm(1000000)
print(mean(million.random.numbers))
## [1] -0.001602627
print(sd(million.random.numbers))
## [1] 0.9998751
Required and optional arguments
• The definition of rnorm function
• Two types of arguments:
• Required: must be specified
• Optional: may be specified
they have default values
rnorm(n, mean = 0, sd = 1)
Invoking functions with optional arguments
new.million.random.numbers <- rnorm(1000000, 5, 10)
print(mean(new.million.random.numbers))
## [1] 4.992608
print(sd(new.million.random.numbers))
## [1] 9.986131
Use names for optional arguments to make code clearer
• Much easier to read and understand
new.million.random.numbers <- rnorm(1000000, mean=5, sd=10)
new.million.random.numbers <- rnorm(1000000, 5, 10)
Names enable “skipping” of other optional arguments
# positional: the second argument is mean, so mean is 10 (sd keeps its default of 1)
new.million.random.numbers <- rnorm(1000000, 10)
# positional: mean is 0 and sd is 10
new.million.random.numbers <- rnorm(1000000, 0, 10)
# named: naming sd lets us skip mean, which keeps its default of 0
new.million.random.numbers <- rnorm(1000000, sd=10)
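The same required/optional pattern applies to functions you write yourself. A minimal sketch, assuming a made-up helper called rescale.values (it is not part of R or of the tutorial materials):

```r
# Hypothetical helper: x is required, new.mean and new.sd are optional
# because they have default values.
rescale.values <- function(x, new.mean = 0, new.sd = 1) {
  z <- (x - mean(x)) / sd(x)  # standardise to mean 0, sd 1
  z * new.sd + new.mean       # then stretch and shift
}

scores <- c(2, 4, 6, 8)

# Only the required argument: defaults are used
default.scaled <- rescale.values(scores)

# Naming new.sd lets us skip new.mean
custom.scaled <- rescale.values(scores, new.sd = 10)

print(mean(default.scaled))
print(sd(custom.scaled))
```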
R vectors: c( , , ,)
# this is how vectors are declared in R
number.vector <- c(1, 2, 3)
print(number.vector[1])
## [1] 1
print(number.vector[3])
## [1] 3
Can only be of a single type!
text.vector <- c("New", "York", "City")
print(text.vector[1])
## [1] "New"
print(text.vector[3])
## [1] "City"
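What "a single type" means in practice: when types are mixed inside c(), R silently converts everything to the most general type. A small sketch with made-up values:

```r
# Mixing a number, a string and a logical: everything becomes character
mixed.vector <- c(1, "two", TRUE)
print(mixed.vector)
print(class(mixed.vector))

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
number.and.logical <- c(1, TRUE, FALSE)
print(number.and.logical)
print(class(number.and.logical))
```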
R vectors shorthands: simple ranges with X:Y notation
• Generate all numbers in a range: N1:N2
ten.numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
print(ten.numbers)
## [1] 1 2 3 4 5 6 7 8 9 10
ten.numbers <- 1:10
print(ten.numbers)
## [1] 1 2 3 4 5 6 7 8 9 10
R vectors shorthands: ranges with a step size using seq()
• More complex sequence with an increment of n: seq(from, to, by)
odd.numbers <- c(1, 3, 5, 7, 9)
print(odd.numbers)
## [1] 1 3 5 7 9
odd.numbers <- seq(1, 9, by=2)
print(odd.numbers)
## [1] 1 3 5 7 9
R vectors shorthands: repeat a value using rep()
• Repeat a value n times: rep(value, n)
five.ones <- c(1, 1, 1, 1, 1)
print(five.ones)
## [1] 1 1 1 1 1
five.ones <- rep(1, times=5)
print(five.ones)
## [1] 1 1 1 1 1
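A few more variations of the same shorthands, as a sketch (the variable names are made up for illustration):

```r
# seq() can also take the number of points instead of a step size
five.points <- seq(0, 1, length.out = 5)
print(five.points)

# rep() can repeat a whole vector, or each element in turn
alternating <- rep(c("pre", "post"), times = 3)
blocks <- rep(c("pre", "post"), each = 3)
print(alternating)
print(blocks)

# The colon notation also counts downwards
countdown <- 5:1
print(countdown)
```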
R is vector-based language
• In R, even a single value is a vector of length one! This is why [1] is printed in front of every value
• If you print a longer vector, you will see more numbers in brackets; each one indicates the position of the first item on that line ("R" is the 18th element)
print(5)
## [1] 5
print(LETTERS)
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
Vectors can be manipulated using vector operations
print(c(1, 2, 3) + c(10, 20, 30))
## [1] 11 22 33
Vectors get expanded
• Shorter vectors get expanded (recycled) to match the longer one
print(1 + c(10, 20, 30))
## [1] 11 21 31
• This is equivalent to
print(c(1, 1, 1) + c(10, 20, 30))
## [1] 11 21 31
Warning if lengths don’t “match”
• This produces a warning
print(c(1, 2) + c(10, 20, 30))
## Warning in c(1, 2) + c(10, 20, 30): longer object length is not a multiple of shorter object length
## [1] 11 22 31
• This does not produce a warning
print(c(1, 2) + c(10, 20, 30, 40))
## [1] 11 22 31 42
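The expansion can be written out by hand with rep_len(), which makes it easier to see what R does behind the scenes. A minimal sketch:

```r
short.vector <- c(1, 2)
long.vector <- c(10, 20, 30, 40)

# Repeat the short vector until it is as long as the long one: 1 2 1 2
recycled <- rep_len(short.vector, length(long.vector))
print(recycled)

# Adding the hand-expanded vector gives the same result as letting R expand it
manual.sum <- recycled + long.vector
automatic.sum <- short.vector + long.vector
print(identical(manual.sum, automatic.sum))
```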
If statement
jedi <- "Anakin"
if(jedi == "Anakin") {
print("He is the Chosen One")
} else {
print("He is not the Chosen One")
}
## [1] "He is the Chosen One"
For loop
for (jedi in c("Yoda", "Obi-Wan", "Anakin", "Luke")) {
print(jedi)
}
## [1] "Yoda"
## [1] "Obi-Wan"
## [1] "Anakin"
## [1] "Luke"
For loops should be avoided
• Always ask yourself whether you really need a loop!
• print() works on a whole vector at once!
print(c("Yoda", "Obi-Wan", "Anakin", "Luke"))
## [1] "Yoda" "Obi-Wan" "Anakin" "Luke"
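Many other built-in functions work on a whole vector in the same way, and sapply() covers functions that do not. A short sketch (first.letter is a made-up helper for illustration):

```r
jedi.names <- c("Yoda", "Obi-Wan", "Anakin", "Luke")

# Built-in functions applied to every element at once, no loop needed
name.lengths <- nchar(jedi.names)
shouted <- toupper(jedi.names)
greetings <- paste("Hello,", jedi.names)
print(name.lengths)
print(shouted)
print(greetings)

# sapply() applies any function to each element and collects the results
first.letter <- function(name) substr(name, 1, 1)  # hypothetical helper
first.letters <- sapply(jedi.names, first.letter, USE.NAMES = FALSE)
print(first.letters)
```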
Data frames and Factors
Data frames
• Tabular data structures
jedis <- data.frame(name=c("Yoda", "Obi-Wan", "Anakin", "Luke"),
age=c(900, 58, 45, NA))
print(jedis)
## name age
## 1 Yoda 900
## 2 Obi-Wan 58
## 3 Anakin 45
## 4 Luke NA
Accessing Data frames
Access notation: variable[row, column]
• By rows
print(jedis[1, ])
## name age
## 1 Yoda 900
print(jedis[1:2, ])
## name age
## 1 Yoda 900
## 2 Obi-Wan 58
print(jedis[-c(1, 2), ])
## name age
## 3 Anakin 45
## 4 Luke NA
Accessing Data frames
• By columns
print(jedis[ , 1])
## [1] Yoda Obi-Wan Anakin Luke
## Levels: Anakin Luke Obi-Wan Yoda
print(jedis[ ,"age"])
## [1] 900 58 45 NA
print(jedis[ , c("name", "age")])
## name age
## 1 Yoda 900
## 2 Obi-Wan 58
## 3 Anakin 45
## 4 Luke NA
Accessing Data frames
• Row and column selectors can be combined
print(jedis[1:2, "age"])
## [1] 900 58
Accessing Data frames
• Special notation variable$column
print(jedis$name)
## [1] Yoda Obi-Wan Anakin Luke
## Levels: Anakin Luke Obi-Wan Yoda
print(jedis$age)
## [1] 900 58 45 NA
Adding new column
• Easiest with column notation:
jedis$side <- "Light"
print(jedis)
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 3 Anakin 45 Light
## 4 Luke NA Light
jedis$side[3] <- "Dark"
print(jedis)
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 3 Anakin 45 Dark
## 4 Luke NA Light
Data frames have row names and column names
print(jedis)
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 3 Anakin 45 Dark
## 4 Luke NA Light
colnames(jedis) <- c("jedi.name", "death.age", "force.side")
rownames(jedis) <- c("jedi.1", "jedi.2", "jedi.3", "jedi.4")
print(jedis)
## jedi.name death.age force.side
## jedi.1 Yoda 900 Light
## jedi.2 Obi-Wan 58 Light
## jedi.3 Anakin 45 Dark
## jedi.4 Luke NA Light
Data frames have row names and column names
• Access by name works also for rows, not just columns
print(jedis["jedi.1",])
## jedi.name death.age force.side
## jedi.1 Yoda 900 Light
Filtering data frame rows based on a condition
• We can use a Boolean vector to filter by rows
print(jedis[c(TRUE, TRUE, FALSE, TRUE),])
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 4 Luke NA Light
Filtering data frame rows based on a condition
• Data frame columns are vectors!
• == operation on vectors returns a Boolean vector
• Which means we can write filtering very naturally as:
print(jedis$side)
## [1] "Light" "Light" "Dark" "Light"
print(jedis$side == "Light")
## [1] TRUE TRUE FALSE TRUE
print(jedis[jedis$side == "Light",])
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 4 Luke NA Light
• The condition is just a shorthand for passing the Boolean vector directly:
print(jedis[c(TRUE, TRUE, FALSE, TRUE),])
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 4 Luke NA Light
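Conditions can be combined with & (and) and | (or), but missing values need care: a condition that evaluates to NA produces a row full of NAs. A sketch that rebuilds the jedis data frame from the earlier slides so it runs on its own:

```r
jedis <- data.frame(name = c("Yoda", "Obi-Wan", "Anakin", "Luke"),
                    age = c(900, 58, 45, NA),
                    side = c("Light", "Light", "Dark", "Light"),
                    stringsAsFactors = FALSE)

# Luke's age is NA, so the condition is NA for him and an all-NA row appears
old.light <- jedis[jedis$side == "Light" & jedis$age > 50, ]
print(old.light)

# which() keeps only the positions where the condition is TRUE
old.light.clean <- jedis[which(jedis$side == "Light" & jedis$age > 50), ]
print(old.light.clean)

# subset() also drops rows where the condition is NA
same.result <- subset(jedis, side == "Light" & age > 50)
print(same.result)
```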
Factors: categorical variables
• Categorical variables: can have levels (input) and labels (output)
• We need to specify the list of potential scores in the input:
# five-level Likert-scale responses to q1
q1 <- factor(c(1, 3, 2, 4, 4, 4))
print(q1)
## [1] 1 3 2 4 4 4
## Levels: 1 2 3 4
q1 <- factor(c(1, 3, 2, 4, 4, 4),
levels=c(1,2,3,4,5))
print(q1)
## [1] 1 3 2 4 4 4
## Levels: 1 2 3 4 5
Factors: categorical variables
• Factors can be also textual
• We need to specify all possible input values
q1 <- factor(c("SD", "N", "D", "A", "A", "A"))
print(q1)
## [1] SD N D A A A
## Levels: A D N SD
q1 <- factor(c("SD", "N", "D", "A", "A", "A"),
levels=c("SD", "D", "N", "A", "SA"))
print(q1)
## [1] SD N D A A A
## Levels: SD D N A SA
Factors: categorical variables
• The labels shown for a factor can be altered
q1 <- factor(c(1, 3, 2, 4, 4, 4),
levels=c(1, 2, 3, 4, 5),
labels=c("SD", "D", "N", "A", "SA"))
print(q1)
## [1] SD N D A A A
## Levels: SD D N A SA
q1 <- factor(c("SD", "N", "D", "A", "A", "A"),
levels=c("SD", "D", "N", "A", "SA"),
labels=c(1, 2, 3, 4, 5))
print(q1)
## [1] 1 3 2 4 4 4
## Levels: 1 2 3 4 5
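One reason to declare all possible levels is that counts then include the categories nobody chose. The sketch below also shows the underlying integer codes and an ordered factor; it reuses the q1 responses from the slide.

```r
q1 <- factor(c(1, 3, 2, 4, 4, 4),
             levels = 1:5,
             labels = c("SD", "D", "N", "A", "SA"))

# table() reports every declared level, including the unused "SA"
counts <- table(q1)
print(counts)

# Internally a factor stores integer codes pointing at its levels
codes <- as.integer(q1)
print(codes)

# An ordered factor additionally supports comparisons between levels
q1.ordered <- factor(q1, levels = levels(q1), ordered = TRUE)
at.least.neutral <- q1.ordered >= "N"
print(at.least.neutral)
```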
Data frames revisited
• The data.frame function creates factors from textual columns automatically (the default in R versions before 4.0.0)
jedis <- data.frame(name=c("Yoda", "Obi-Wan", "Anakin", "Luke"),
age=c(900, 58, 45, NA))
print(jedis$name)
## [1] Yoda Obi-Wan Anakin Luke
## Levels: Anakin Luke Obi-Wan Yoda
Data frames revisited
• Pass stringsAsFactors = FALSE to keep textual columns as character vectors (the default since R 4.0.0)
jedis <- data.frame(name=c("Yoda", "Obi-Wan", "Anakin", "Luke"),
age=c(900, 58, 45, NA),
stringsAsFactors = FALSE)
print(jedis$name)
## [1] "Yoda" "Obi-Wan" "Anakin" "Luke"
Writing and reading data frames to/from files
• Writing data using write.csv() function
• Read data from csv file using read.csv() function
write.csv(jedis, file="jedi_masters.csv", row.names=FALSE)
jedis <- read.csv("jedi_masters.csv", stringsAsFactors=FALSE)
print(jedis)
## name age side
## 1 Yoda 900 Light
## 2 Obi-Wan 58 Light
## 3 Anakin 45 Dark
## 4 Luke NA Light
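A self-contained round trip can be tried without touching your project folder by writing to a temporary file. A sketch, assuming a two-column version of the jedis data frame:

```r
jedis <- data.frame(name = c("Yoda", "Obi-Wan", "Anakin", "Luke"),
                    age = c(900, 58, 45, NA),
                    stringsAsFactors = FALSE)

# tempfile() returns a path inside R's per-session temporary directory
csv.path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(jedis, file = csv.path, row.names = FALSE)

jedis.restored <- read.csv(csv.path, stringsAsFactors = FALSE)
print(jedis.restored)

# Missing values are written as NA and read back as NA
print(is.na(jedis.restored$age))

unlink(csv.path)  # remove the temporary file
```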
Work directory
• Work directory: the directory from which R is started
• getwd() – returns the current working directory
• setwd(directory.path) sets the working directory to directory.path
• When reading/writing files seems not to work, always check work
directory with getwd()!
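A quick way to debug a failing read is to build the full path and ask R whether the file exists. A sketch; the data/regression_data.csv path is the one used later in this tutorial and may not exist on your machine:

```r
# Where is R currently looking for files?
current.dir <- getwd()
print(current.dir)

# file.path() joins path components with the right separator
data.path <- file.path(current.dir, "data", "regression_data.csv")
print(data.path)

# FALSE here means read.csv() on the relative path would fail too
print(file.exists(data.path))
```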
getwd() and setwd() in practice
Setting work directory in R Studio
Some useful functions to know
Create cross tabulation with table()
print(table(jedis$side))
##
## Dark Light
## 1 3
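With two vectors, table() produces a true cross tabulation. The sketch below uses R's built-in mtcars data set rather than the jedis data frame, purely so that there are enough rows to count:

```r
# Number of cars for each combination of cylinders and gears
cyl.by.gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cyl.by.gear)

# prop.table() converts counts to proportions; margin = 1 means per row
row.shares <- prop.table(cyl.by.gear, margin = 1)
print(round(row.shares, 2))
```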
Concatenate vectors as text using paste()/paste0()
print(paste("Participant", 1:10))
## [1] "Participant 1" "Participant 2" "Participant 3" "Participant 4"
## [5] "Participant 5" "Participant 6" "Participant 7" "Participant 8"
## [9] "Participant 9" "Participant 10"
rownames(jedis) <- c("jedi.1", "jedi.2", "jedi.3", "jedi.4")
rownames(jedis) <- paste0("jedi.", 1:4)
Show excerpt from a data frame using head()
midwest <- read.csv("http://goo.gl/G1K41K") # yes, read.csv also works with URLs
print(head(midwest[, c("PID", "area", "poptotal", "popdensity", "state")]))
## PID area poptotal popdensity state
## 1 561 0.052 66090 1270.9615 IL
## 2 562 0.014 10626 759.0000 IL
## 3 563 0.022 14991 681.4091 IL
## 4 564 0.017 30806 1812.1176 IL
## 5 565 0.018 5836 324.2222 IL
## 6 566 0.050 35688 713.7600 IL
Plotting with ggplot2
ggplot2 – a superb plotting library
• Built upon the Grammar of Graphics ideas
• Each element of a plot is a separate function
• Chaining the functions produces the final plot
• Extremely powerful!
• Even ported to Python (ggplot python library)
ggplot2 example
library(ggplot2)  # load the plotting library first
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point(aes(col=state, size=popdensity)) +  # colour by state, point size by density
  geom_smooth(method="loess", se=FALSE) +        # smoothed trend line, no confidence band
  xlim(c(0, 0.1)) +
  ylim(c(0, 500000)) +
  labs(subtitle="Area Vs Population",
       y="Population",
       x="Area",
       title="Scatterplot",
       caption = "Source: midwest")
plot(gg)
Formulas
Mini-language for statistical model specification
Formula notation
~ splits dependent and independent variables
Formula | Mathematical equation | Comment
b ~ a | b = c0 + c1*a | Simple regression
Formula notation
+ used to add a variable to the model
Formula | Mathematical equation | Comment
c ~ a + b | c = c0 + c1*a + c2*b | Multiple regression
Formula notation
: used to specify an interaction term
Formula | Mathematical equation | Comment
c ~ a + b + a:b | c = c0 + c1*a + c2*b + c3*a*b | Multiple regression with AB interaction term
d ~ a + b + c + a:b + b:c + a:c + a:b:c | d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c + c7*a*b*c | Multiple regression with AB, BC, AC, and ABC interaction terms
Formula notation
* adds variables and their interactions
Formula | Mathematical equation | Comment
c ~ a * b | c = c0 + c1*a + c2*b + c3*a*b | Multiple regression with AB interaction term
d ~ a * b * c | d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c + c7*a*b*c | Multiple regression with AB, BC, AC, and ABC interaction terms
Formula notation
. adds all available variables not already mentioned in the formula
Formula | Mathematical equation | Comment
z ~ a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y | z = c0 + c1*a + c2*b + c3*c + … + c25*y | Multiple regression
z ~ . | z = c0 + c1*a + c2*b + c3*c + … + c25*y | Multiple regression
Formula notation
^n used to limit interactions up to n-th level
Formula | Mathematical equation | Comment
d ~ (a + b + c)^2 | d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c | Multiple regression with AB, AC, and BC interaction terms (but not ABC)
Formula notation
- is used to remove an interaction term
Formula | Mathematical equation | Comment
d ~ a * b * c - a:c | d = c0 + c1*a + c2*b + c3*c + c4*a*b + c6*b*c + c7*a*b*c | Multiple regression with all interaction terms except a:c
d ~ a * b * c - a:b:c | d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c | Multiple regression with AB, AC, and BC interaction terms (but not ABC)
Formula notation
I() is used to interpret content “as-is” (90% of the time used for polynomials)
Formula | Mathematical equation | Comment
b ~ a + I(a^2) | b = c0 + c1*a + c2*a^2 | Simple polynomial regression
Formula notation
+0 is used to remove intercept term
Formula | Mathematical equation | Comment
b ~ 0 + a | b = c1*a | Simple regression without intercept
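The expansions in the formula tables can be checked in R itself: terms() parses a formula without needing any data, and its "term.labels" attribute lists the terms R will fit. A sketch, where expand.formula is a made-up helper name:

```r
# Hypothetical helper: list the terms a formula expands to
expand.formula <- function(f) attr(terms(f), "term.labels")

print(expand.formula(d ~ a * b * c))          # main effects and all interactions
print(expand.formula(d ~ (a + b + c)^2))      # interactions up to 2nd level only
print(expand.formula(d ~ a * b * c - a:c))    # everything except a:c
print(expand.formula(b ~ a + I(a^2)))         # I() keeps a^2 as arithmetic

# The "intercept" attribute is 1 with an intercept and 0 without
print(attr(terms(b ~ a), "intercept"))
print(attr(terms(b ~ 0 + a), "intercept"))
```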
Statistical test examples
(it was about time)
Independent t-test example
ind.t.test.data <- read.csv("./data/independent_t_test_data.csv")
print(head(ind.t.test.data))
## weight group
## 1 4.81 group 1
## 2 4.17 group 1
## 3 4.41 group 1
## 4 3.59 group 1
## 5 5.87 group 1
## 6 3.83 group 1
Independent t-test example
ind.t.test.result <- t.test(weight~group, data = ind.t.test.data)
print(ind.t.test.result)
##
## Welch Two Sample t-test
##
## data: weight by group
## t = -3.0101, df = 14.104, p-value = 0.009298
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.4809144 -0.2490856
## sample estimates:
## mean in group 1 mean in group 2
## 4.661 5.526
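The object returned by t.test() is a list, so individual numbers can be pulled out for reporting. The sketch below uses R's built-in PlantGrowth data set as a stand-in for the tutorial's CSV file; that substitution is an assumption made so the example runs on its own.

```r
# Keep two of the three PlantGrowth groups (stand-in for the tutorial CSV)
two.groups <- subset(PlantGrowth, group %in% c("trt1", "trt2"))
two.groups$group <- droplevels(two.groups$group)  # drop the unused "ctrl" level

result <- t.test(weight ~ group, data = two.groups)

# Individual pieces of the result
t.value <- unname(result$statistic)
p.value <- result$p.value
conf.interval <- result$conf.int
group.means <- result$estimate

print(t.value)
print(p.value)
print(conf.interval)
print(group.means)
```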
Paired t-test example
paired.t.test.data <- read.csv("./data/paired_t_test_data.csv")
print(head(paired.t.test.data))
## weight time
## 1 4.81 pre
## 2 4.17 pre
## 3 4.41 pre
## 4 3.59 pre
## 5 5.87 pre
## 6 3.83 pre
Paired t-test example
paired.t.test.result <- t.test(weight~time, data = paired.t.test.data, paired=TRUE)
print(paired.t.test.result)
##
## Paired t-test
##
## data: weight by time
## t = 2.8464, df = 9, p-value = 0.0192
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1775354 1.5524646
## sample estimates:
## mean of the differences
## 0.865
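A paired test can also be run by passing the two measurement vectors directly, which does not depend on the formula interface (newer R releases, 4.4.0 and later as far as we are aware, reject paired=TRUE together with a formula). A sketch using R's built-in sleep data set rather than the tutorial's CSV file:

```r
# sleep holds two measurements for each of the same 10 subjects
drug.1 <- sleep$extra[sleep$group == 1]
drug.2 <- sleep$extra[sleep$group == 2]

paired.result <- t.test(drug.1, drug.2, paired = TRUE)
print(paired.result)

# A paired t-test is a one-sample t-test on the differences
differences <- drug.1 - drug.2
one.sample.result <- t.test(differences, mu = 0)
print(one.sample.result$statistic)
```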
Multiple regression example
reg.data <- read.csv("./data/regression_data.csv")
print(head(reg.data))
## model mpg disp hp wt
## 1 Mazda RX4 21.0 160 110 2.620
## 2 Mazda RX4 Wag 21.0 160 110 2.875
## 3 Datsun 710 22.8 108 93 2.320
## 4 Hornet 4 Drive 21.4 258 110 3.215
## 5 Hornet Sportabout 18.7 360 175 3.440
## 6 Valiant 18.1 225 105 3.460
Multiple regression example
library(psych)  # describe() comes from the psych package
print(describe(reg.data))
## vars n mean sd median trimmed mad min max range
## model* 1 32 16.50 9.38 16.50 16.50 11.86 1.00 32.00 31.00
## mpg 2 32 20.09 6.03 19.20 19.70 5.41 10.40 33.90 23.50
## disp 3 32 230.72 123.94 196.30 222.52 140.48 71.10 472.00 400.90
## hp 4 32 146.69 68.56 123.00 141.19 77.10 52.00 335.00 283.00
## wt 5 32 3.22 0.98 3.33 3.15 0.77 1.51 5.42 3.91
## skew kurtosis se
## model* 0.00 -1.31 1.66
## mpg 0.61 -0.37 1.07
## disp 0.38 -1.21 21.91
## hp 0.73 -0.14 12.12
## wt 0.42 -0.02 0.17
Multiple regression example
regression.model <- lm(mpg~disp+hp+wt, data = reg.data)
print(summary(regression.model))
##
## Call:
## lm(formula = mpg ~ disp + hp + wt, data = reg.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.891 -1.640 -0.172 1.061 5.861
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.105505 2.110815 17.579 < 2e-16 ***
## disp -0.000937 0.010350 -0.091 0.92851
## hp -0.031157 0.011436 -2.724 0.01097 *
## wt -3.800891 1.066191 -3.565 0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083
## F-statistic: 44.57 on 3 and 28 DF, p-value: 8.65e-11
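Beyond summary(), a fitted model can be queried for coefficients, confidence intervals and predictions. The sketch below fits the same formula to R's built-in mtcars data set, which is assumed here to be the source of the tutorial's regression_data.csv; new.car is a made-up car for illustration.

```r
model <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Estimated coefficients as a named vector
model.coefs <- coef(model)
print(model.coefs)

# 95% confidence intervals for each coefficient
print(confint(model))

# Proportion of variance explained
r.squared <- summary(model)$r.squared
print(r.squared)

# Predict fuel consumption for a hypothetical new car
new.car <- data.frame(disp = 200, hp = 120, wt = 3.0)
predicted.mpg <- predict(model, newdata = new.car)
print(predicted.mpg)
```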
Multiple regression example: standardized coefficients
reg.data.scaled <- lapply(reg.data[,2:5], scale)
regression.model.standardized <- lm(mpg~disp+hp+wt, data = reg.data.scaled)
print(summary(regression.model.standardized), digits=5)
##
## Call:
## lm(formula = mpg ~ disp + hp + wt, data = reg.data.scaled)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.645619 -0.272123 -0.028543 0.176069 0.972453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5677e-17 7.7403e-02 0.0000 1.000000
## disp -1.9269e-02 2.1283e-01 -0.0905 0.928507
## hp -3.5444e-01 1.3009e-01 -2.7245 0.010971 *
## wt -6.1706e-01 1.7309e-01 -3.5649 0.001331 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.43786 on 28 degrees of freedom
## Multiple R-squared: 0.82684, Adjusted R-squared: 0.80828
## F-statistic: 44.566 on 3 and 28 DF, p-value: 8.6496e-11
Multiple regression example: standardized coefficients
# we can also do it in formula directly
regression.model.standardized <- lm(scale(mpg)~scale(disp)+scale(hp)+scale(wt), data = reg.data)
print(summary(regression.model.standardized), digits=5)
##
## Call:
## lm(formula = scale(mpg) ~ scale(disp) + scale(hp) + scale(wt),
## data = reg.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.645619 -0.272123 -0.028543 0.176069 0.972453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5677e-17 7.7403e-02 0.0000 1.000000
## scale(disp) -1.9269e-02 2.1283e-01 -0.0905 0.928507
## scale(hp) -3.5444e-01 1.3009e-01 -2.7245 0.010971 *
## scale(wt) -6.1706e-01 1.7309e-01 -3.5649 0.001331 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.43786 on 28 degrees of freedom
## Multiple R-squared: 0.82684, Adjusted R-squared: 0.80828
## F-statistic: 44.566 on 3 and 28 DF, p-value: 8.6496e-11
One-way ANOVA example
anova.test.data <- read.csv("./data/anova_test_data.csv")
print(head(anova.test.data))
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
One-way ANOVA example
library(psych)
print(describeBy(anova.test.data, anova.test.data$group))
##
## Descriptive statistics by group
## group: ctrl
## vars n mean sd median trimmed mad min max range skew kurtosis
## weight 1 10 5.03 0.58 5.15 5 0.72 4.17 6.11 1.94 0.23 -1.12
## group* 2 10 1.00 0.00 1.00 1 0.00 1.00 1.00 0.00 NaN NaN
## se
## weight 0.18
## group* 0.00
## --------------------------------------------------------
## group: trt1
## vars n mean sd median trimmed mad min max range skew kurtosis
## weight 1 10 4.66 0.79 4.55 4.62 0.53 3.59 6.03 2.44 0.47 -1.1
## group* 2 10 2.00 0.00 2.00 2.00 0.00 2.00 2.00 0.00 NaN NaN
## se
## weight 0.25
## group* 0.00
## --------------------------------------------------------
## group: trt2
## vars n mean sd median trimmed mad min max range skew kurtosis
## weight 1 10 5.53 0.44 5.44 5.5 0.36 4.92 6.31 1.39 0.48 -1.16
## group* 2 10 3.00 0.00 3.00 3.0 0.00 3.00 3.00 0.00 NaN NaN
## se
## weight 0.14
## group* 0.00
One-way ANOVA example
res.aov <- aov(weight ~ group, data = anova.test.data)
print(summary(res.aov))
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 3.766 1.8832 4.846 0.0159 *
## Residuals 27 10.492 0.3886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
One-way ANOVA example
tukey.res <- TukeyHSD(res.aov)
print(tukey.res)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ group, data = anova.test.data)
##
## $group
## diff lwr upr p adj
## trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
## trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
## trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
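For reporting, the F value, p-value and the significant pairwise comparisons can be extracted from the result objects. The sketch uses R's built-in PlantGrowth data set, which is assumed here to match the tutorial's anova_test_data.csv:

```r
res.aov <- aov(weight ~ group, data = PlantGrowth)

# summary() of an aov fit is a list; its first element is the ANOVA table
aov.table <- summary(res.aov)[[1]]
f.value <- aov.table[["F value"]][1]
p.value <- aov.table[["Pr(>F)"]][1]
print(f.value)
print(p.value)

# TukeyHSD() returns one matrix per factor; keep pairs with adjusted p < 0.05
tukey.table <- TukeyHSD(res.aov)$group
significant.pairs <- rownames(tukey.table)[tukey.table[, "p adj"] < 0.05]
print(significant.pairs)
```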
Summary
• R is a fantastic language for data analysis
• Very powerful, due to vector-based programming
• Cutting edge libraries
• Built-in statistical functionalities
• Powerful data visualisation libraries
Thank you!

More Related Content

What's hot

Learn a language : LISP
Learn a language : LISPLearn a language : LISP
Learn a language : LISPDevnology
 
Data science : R Basics Harvard University
Data science : R Basics Harvard UniversityData science : R Basics Harvard University
Data science : R Basics Harvard UniversityMrMoliya
 
Scalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenScalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenRevolution Analytics
 
Towards advanced data retrieval from learning objects repositories
Towards advanced data retrieval from learning objects repositoriesTowards advanced data retrieval from learning objects repositories
Towards advanced data retrieval from learning objects repositoriesValentina Paunovic
 
R tech introcomputer
R tech introcomputerR tech introcomputer
R tech introcomputerRose Rajput
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With RJahnab Kumar Deka
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Databricks
 
Future features for openCypher: Schema, Constraints, Subqueries, Configurable...
Future features for openCypher: Schema, Constraints, Subqueries, Configurable...Future features for openCypher: Schema, Constraints, Subqueries, Configurable...
Future features for openCypher: Schema, Constraints, Subqueries, Configurable...openCypher
 
Combinators, DSLs, HTML and F#
Combinators, DSLs, HTML and F#Combinators, DSLs, HTML and F#
Combinators, DSLs, HTML and F#Robert Pickering
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force statusLDBC council
 
Python Workshop. LUG Maniapl
Python Workshop. LUG ManiaplPython Workshop. LUG Maniapl
Python Workshop. LUG ManiaplAnkur Shrivastava
 
Lecture matlab speech_2013
Lecture matlab speech_2013Lecture matlab speech_2013
Lecture matlab speech_2013charu pathak
 

What's hot (16)

Learn a language : LISP
Learn a language : LISPLearn a language : LISP
Learn a language : LISP
 
Data science : R Basics Harvard University
Data science : R Basics Harvard UniversityData science : R Basics Harvard University
Data science : R Basics Harvard University
 
Scalable Data Analysis in R -- Lee Edlefsen
Introduction to R for Learning Analytics Researchers

  • 1. Introduction to R • Vitomir Kovanović, University of South Australia • #vkovanovic • vitomir.kovanovic.info • Vitomir.Kovanovic@unisa.edu.au
  • 3. About me • Learning analytics researcher • Research Fellow and Data Scientist University of South Australia, Adelaide • 5+ years of experience with R • 15+ years programming experience (Java/Python/Bash/R) • Member of the SoLAR executive board (organizers of LASI/LAK)
  • 4. Outline of this talk • Overview of R (30min) • Installing R and R Studio • Overview of R Studio • R language overview • Datatypes • Data frames & Factors • Loading/saving data frames into CSV files • Statistical Examples • T-test • One-way ANOVA • Multiple regression
  • 6. What is R? • Vector-based programming language and interactive environment for data analysis • Popular with statistical researchers • Large number of software packages/libraries for statistics, data mining, machine learning, visualisation • Successor to the S programming language from Bell Labs (1976) • Free software version of S (“GNU S”) • Supports multiple programming paradigms (functional, procedural, object-oriented) • Increasingly used in the Learning Analytics field
  • 7. Learning R • Using R is a bit akin to smoking. • The beginning is difficult, one may get headaches and even gag the first few times. • In the long run, it becomes pleasurable and even addictive. • Yet, deep down, for those willing to be honest, there is something not fully healthy in it. – Francois Pinard
  • 8. How to learn R • Not like any “regular” programming language (Java/Python) • R is a domain-specific language for statistical analysis • Very quirky, many hacks • In the right hands, provides endless power
  • 9. Good and bad sides to R • The best thing about R is that it was written by statisticians. - Bo Cowgill, Google • The worst thing about R is that it was written by statisticians. - Bo Cowgill, Google • Developers and statisticians have different priorities
  • 10. SPSS users are like muggles • They are limited in their ability to change their environment. • They have to rely on algorithms that have been developed for them. • The way they approach a problem is constrained by how programmers employed by SAS/SPSS thought to approach it. • And they have to pay money to use these constraining algorithms.
  • 11. R users are like wizards • They can rely on functions (spells) that have been developed for them by statistical researchers, but they can also create their own. • They don’t have to pay for the use of them, and once experienced enough (like Dumbledore), they are almost unlimited in their ability to change their environment.
  • 12. Best way to learn R • Learn R in parallel with learning statistics • Try to use it for your next research project • Enrol in a stats/R MOOC • lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about • Read one of the R tutorials: • Quick-R: www.statmethods.net • R-Bloggers: www.r-bloggers.com • Read a good book about R & statistics • www-bcf.usc.edu/~gareth/ISL/ • Read THE BOOK about R and statistics (and life in general) • au.sagepub.com/en-gb/oce/discovering-statistics-using-r/book236067
  • 14. Why is R so popular? • Out-of-the-box statistical methods (e.g., regression) • Vector-based language • Simpler to use than Python + NumPy/SciPy/Pandas • Handling of missing data • Superb plotting • Cutting-edge research packages • Most problems can be solved with an obscure single line of code • R Studio is a great tool for programming in R
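As a small illustration of the "vector-based language" and "handling of missing data" points, the sketch below uses only base R. The variable names and scores are made up for this example.

```r
# Hypothetical quiz scores; NA marks a student who missed the quiz.
quiz.scores <- c(7, 9, NA, 6)

# Arithmetic is applied to every element at once, no loop needed.
quiz.percent <- quiz.scores / 10 * 100

# Missing values propagate unless we ask for them to be removed.
mean.with.na <- mean(quiz.scores)
mean.without.na <- mean(quiz.scores, na.rm = TRUE)

print(quiz.percent)
print(mean.with.na)
print(mean.without.na)
```

Here `mean.with.na` is itself `NA`, while `na.rm = TRUE` averages only the three observed scores.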
  • 15. R Studio IDE • Fantastic development environment • Made R much more accessible especially for beginners • Alternatives: Eclipse, Emacs (ESS), Vim
  • 16. Linear regression example in R
          speed (mph)  dist (ft)
       1            4          2
       2            4         10
       3            7          4
       4            7         22
       5            8         16
       6            9         10
       7           10         18
       8           10         26
       9           10         34
      10           11         17
      11           11         28
      12           12         14
      13           12         20
      14           12         24
      15           12         28
      16           13         26
      17           13         34
      18           13         34
      19           13         46
      20           14         26

      summary(lm(dist ~ speed, data=cars))

      Call:
      lm(formula = dist ~ speed, data = cars)

      Residuals:
          Min      1Q  Median      3Q     Max
      -29.069  -9.525  -2.272   9.215  43.201

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept) -17.5791     6.7584  -2.601   0.0123 *
      speed         3.9324     0.4155   9.464 1.49e-12 ***
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Residual standard error: 15.38 on 48 degrees of freedom
      Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
      F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12

      Ezekiel, M. (1930) Methods of Correlation Analysis. Wiley.
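The printed summary is meant for reading. To work with the fitted model in code, the pieces can be pulled out individually. The sketch below refits the same model on the built-in `cars` data; the speed of 15 mph used for the prediction is an arbitrary value chosen for illustration.

```r
# 'cars' ships with R (datasets package), so no download is needed.
model <- lm(dist ~ speed, data = cars)

# coef() returns the fitted intercept and slope as a named vector.
model.coefficients <- coef(model)
print(round(model.coefficients, 4))

# Predict stopping distance (ft) for a hypothetical speed of 15 mph.
predicted.dist <- predict(model, newdata = data.frame(speed = 15))
print(predicted.dist)

# R-squared is stored in the summary object.
r.squared <- summary(model)$r.squared
print(round(r.squared, 4))
```

The coefficients and R-squared should match the values in the summary output on the slide.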
  • 17. Outline of this talk • Overview of R (30min) • Installing R and R Studio • Overview of R Studio • R language overview • Datatypes • Data frames & Factors • Loading/saving data frames into CSV files • Statistical Examples • T-test • One-way ANOVA • Multiple regression
  • 18. Installing R Comprehensive R Archive Network (CRAN) – https://cran.r-project.org
  • 19. Take a look at the glorious R interface
  • 21. Outline of this talk • Overview of R (30min) • Installing R and R Studio • Overview of R studio • R language overview • Datatypes • Data frames & Factors • Loading/saving data frames into CSV files • Statistical Examples • T-test • One-way ANOVA • Multiple regression
  • 22. Overview of R Studio
  • 23. Run Hello World as a muggle
  • 24. Run Hello World as a wizard
  • 25. Two types of R source code • “Regular” R code (extension .R) • “RMarkdown” source code (extension .Rmd) • A mix of Markdown text and R code blocks • RMarkdown files are compiled with knitr • How knitr works: • Each R code block in the .Rmd file is sent to an R process to get its output • The Markdown text, the R code blocks, and/or their results are joined to produce a temporary .md file • The temporary .md file is processed by a Markdown translator to generate HTML, PDF, or DOCX • Good for producing tutorials and step-by-step instructions
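The first two knitr steps described above can be tried from plain R. This is a minimal sketch, assuming the knitr package is installed; the file name `tiny-report.Rmd` and its contents are invented for the example, and the code fence markers are built with `strrep()` only to keep this listing readable.

```r
# Build a tiny .Rmd document in a temporary directory.
fence <- strrep("`", 3)  # three backticks
rmd.lines <- c(
  "# A tiny report",
  "",
  "The mean speed in the built-in cars data:",
  "",
  paste0(fence, "{r}"),
  "mean(cars$speed)",
  fence
)
rmd.file <- file.path(tempdir(), "tiny-report.Rmd")
md.file <- file.path(tempdir(), "tiny-report.md")
writeLines(rmd.lines, rmd.file)

# knitr is not part of base R, so only knit when it is installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Runs the R code block and writes Markdown text plus results to md.file.
  knitr::knit(rmd.file, output = md.file, quiet = TRUE)
  cat(readLines(md.file), sep = "\n")
} else {
  message("knitr is not installed; run install.packages('knitr') first")
}
```

Turning the resulting .md file into HTML, PDF, or DOCX is the separate third step; in R Studio the Knit button runs the whole pipeline.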
  • 29. Use a library in your code: library() • Simply load the library (typically done at the top of the file): library(psych)
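`library()` stops with an error when the package has not been installed. A defensive sketch, assuming you would rather print a hint than fail, is shown below; `describe()` is a function from psych, and `cars` is used only as convenient built-in data.

```r
# psych is a contributed package, so check that it is installed before loading it.
psych.available <- requireNamespace("psych", quietly = TRUE)

if (psych.available) {
  library(psych)
  # describe() comes from psych and prints descriptive statistics per column.
  print(describe(cars))
} else {
  message("psych is not installed; run install.packages('psych') first")
}
```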
  • 30. Outline of this talk • Overview of R (30min) • Installing R and R Studio • Overview of R studio • R language overview • Statistical Examples • T-test • One-way ANOVA • Multiple regression
  • 31. R variable assignment: <- • Naming convention used here: variable.name, instead of variable_name (C, Python), variableName (Java/C#), or VariableName (VisualBasic) • Prefer <- over the equals sign for assignment (= also works at the top level, but <- is the R convention) # this is how variables are assigned in R test.variable <- "some text" number.variable <- 123.45 boolean.variable <- TRUE another.boolean.variable <- FALSE missing.value <- NA
  • 32. Calling functions • rnorm: generate random samples from a normal distribution three.random.numbers <- rnorm(3) print(three.random.numbers) ## [1] 0.9826734 -0.8985821 0.5707538 million.random.numbers <- rnorm(1000000) print(mean(million.random.numbers)) ## [1] -0.001602627 print(sd(million.random.numbers)) ## [1] 0.9998751
  • 33. Required and optional arguments • The definition of the rnorm function • Two types of arguments: • Required: must be specified • Optional: may be specified; they have default values rnorm(n, mean = 0, sd = 1)
  • 34. Invoking functions with optional arguments new.million.random.numbers <- rnorm(1000000, 5, 10) print(mean(new.million.random.numbers)) ## [1] 4.992608 print(sd(new.million.random.numbers)) ## [1] 9.986131
  • 35. Use names for optional arguments to make code clearer • Much easier to read and understand new.million.random.numbers <- rnorm(1000000, mean=5, sd=10) new.million.random.numbers <- rnorm(1000000, 5, 10)
  • 36. Names enable “skipping” of other optional arguments • mean argument is 10: new.million.random.numbers <- rnorm(1000000, 10) • sd argument is 10: new.million.random.numbers <- rnorm(1000000, 0, 10) • We can skip the mean argument by naming the sd argument: new.million.random.numbers <- rnorm(1000000, sd=10)
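The same matching rules apply to functions you write yourself. The sketch below uses a hypothetical function, rescale.values, with made-up argument names (center, spread) that are not part of the tutorial; it only illustrates positional versus named matching.

```r
# Hypothetical function: one required argument (x) and two optional ones
rescale.values <- function(x, center = 0, spread = 1) {
  (x - center) / spread
}

values <- c(10, 20, 30)

# Positional: the second argument is matched to center
by.position <- rescale.values(values, 10)

# Named: center keeps its default (0) and only spread is set
by.name <- rescale.values(values, spread = 10)

print(by.position)
print(by.name)
```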
  • 37. R vectors: c( , , ,) # this is how vectors are created in R number.vector <- c(1, 2, 3) print(number.vector[1]) ## [1] 1 print(number.vector[3]) ## [1] 3 A vector can only hold values of a single type!
  • 38. R vectors: c( , , ,) text.vector <- c("New", "York", "City") print(text.vector[1]) ## [1] "New" print(text.vector[3]) ## [1] "City"
  • 39. R vectors shorthands: simple ranges with X:Y notation • Generate all numbers in a range: N1:N2 ten.numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) print(ten.numbers) ## [1] 1 2 3 4 5 6 7 8 9 10 ten.numbers <- 1:10 print(ten.numbers) ## [1] 1 2 3 4 5 6 7 8 9 10
  • 40. R vectors shorthands: ranges with a step size using seq() • More complex sequence with an increment of n: seq(from, to, by) odd.numbers <- c(1, 3, 5, 7, 9) print(odd.numbers) ## [1] 1 3 5 7 9 odd.numbers <- seq(1, 9, by=2) print(odd.numbers) ## [1] 1 3 5 7 9
  • 41. R vectors shorthands: repeat a value using rep() • Repeat a value n times: rep(value, n) five.ones <- c(1, 1, 1, 1, 1) print(five.ones) ## [1] 1 1 1 1 1 five.ones <- rep(1, times=5) print(five.ones) ## [1] 1 1 1 1 1
  • 42. R is a vector-based language • In R, even a single value is a vector of length one! This is why [1] is printed in front of every value • If you print a longer vector, you will see more numbers in brackets; each one indicates the position of the next item (R is the 18th element) print(5) ## [1] 5 print(LETTERS) ## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" ## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
  • 43. Vectors can be manipulated using vector operations print(c(1, 2, 3) + c(10, 20, 30)) ## [1] 11 22 33
  • 44. Vectors get expanded • Shorter vectors get expanded to match the longer one print(1 + c(10, 20, 30)) ## [1] 11 21 31 • This is equivalent to print(c(1, 1, 1) + c(10, 20, 30)) ## [1] 11 21 31
  • 45. Warning if lengths don’t “match” • This produces a warning (3 is not a multiple of 2) print(c(1, 2) + c(10, 20, 30)) ## Warning in c(1, 2) + c(10, 20, 30): longer object length is not a multiple of shorter object length ## [1] 11 22 31 • This does not produce a warning (4 is a multiple of 2) print(c(1, 2) + c(10, 20, 30, 40)) ## [1] 11 22 31 42
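The expansion of the shorter vector can be written out explicitly. This is a small sketch using base R's rep_len() to show what the implicit expansion amounts to; the variable names are chosen for illustration.

```r
short <- c(1, 2)
long <- c(10, 20, 30, 40)

# What R does implicitly: repeat the shorter vector up to the longer length
recycled <- rep_len(short, length(long))
print(recycled)

# Adding the explicitly expanded vector gives the same result
explicit.sum <- recycled + long
implicit.sum <- short + long
print(identical(explicit.sum, implicit.sum))
```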
  • 46. If statement jedi <- "Anakin" if(jedi == "Anakin") { print("He is the Chosen One") } else { print("He is not the Chosen One") } ## [1] "He is the Chosen One"
  • 47. For loop for (jedi in c("Yoda", "Obi-Wan", "Anakin", "Luke")) { print(jedi) } ## [1] "Yoda" ## [1] "Obi-Wan" ## [1] "Anakin" ## [1] "Luke" Always ask yourself whether you really need a loop!
  • 48. For loops should be avoided Print is a vectorized function! print(c("Yoda", "Obi-Wan", "Anakin", "Luke")) ## [1] "Yoda" "Obi-Wan" "Anakin" "Luke"
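The same idea applies to computations, not only printing. The sketch below compares a loop with its vectorized equivalent; the ages vector and the doubling operation are arbitrary illustrations.

```r
ages <- c(900, 58, 45)

# Loop version: fill a preallocated result one element at a time
doubled.loop <- numeric(length(ages))
for (i in seq_along(ages)) {
  doubled.loop[i] <- ages[i] * 2
}

# Vectorized version: one expression over the whole vector
doubled.vectorized <- ages * 2
print(identical(doubled.loop, doubled.vectorized))

# Many built-in functions are vectorized as well, for example nchar()
name.lengths <- nchar(c("Yoda", "Obi-Wan", "Anakin", "Luke"))
print(name.lengths)
```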
  • 49. Data frames and Factors
  • 50. Data frames • Tabular data structures jedis <- data.frame(name=c("Yoda", "Obi-Wan", "Anakin", "Luke"), age=c(900, 58, 45, NA)) print(jedis) ## name age ## 1 Yoda 900 ## 2 Obi-Wan 58 ## 3 Anakin 45 ## 4 Luke NA
  • 51. Accessing Data frames Access notation: variable[row, column] • By rows print(jedis[1, ]) ## name age ## 1 Yoda 900 print(jedis[1:2, ]) ## name age ## 1 Yoda 900 ## 2 Obi-Wan 58 print(jedis[-c(1, 2), ]) ## name age ## 3 Anakin 45 ## 4 Luke NA
  • 52. Accessing Data frames • By columns print(jedis[ , 1]) ## [1] Yoda Obi-Wan Anakin Luke ## Levels: Anakin Luke Obi-Wan Yoda print(jedis[ ,"age"]) ## [1] 900 58 45 NA print(jedis[ , c("name", "age")]) ## name age ## 1 Yoda 900 ## 2 Obi-Wan 58 ## 3 Anakin 45 ## 4 Luke NA
  • 53. Accessing Data frames • Row and column selectors can be combined print(jedis[1:2, "age"]) ## [1] 900 58
  • 54. Accessing Data frames • Special notation variable$column print(jedis$name) ## [1] Yoda Obi-Wan Anakin Luke ## Levels: Anakin Luke Obi-Wan Yoda print(jedis$age) ## [1] 900 58 45 NA
  • 55. Adding new column • Easiest with column notation: jedis$side <- "Light" print(jedis) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 3 Anakin 45 Light ## 4 Luke NA Light jedis$side[3] <- "Dark" print(jedis) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 3 Anakin 45 Dark ## 4 Luke NA Light
  • 56. Data frames have row names and column names print(jedis) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 3 Anakin 45 Dark ## 4 Luke NA Light colnames(jedis) <- c("jedi.name", "death.age", "force.side") rownames(jedis) <- c("jedi.1", "jedi.2", "jedi.3", "jedi.4") print(jedis) ## jedi.name death.age force.side ## jedi.1 Yoda 900 Light ## jedi.2 Obi-Wan 58 Light ## jedi.3 Anakin 45 Dark ## jedi.4 Luke NA Light
  • 57. Data frames have row names and column names • Access by name also works for rows, not just columns print(jedis["jedi.1",]) ## jedi.name death.age force.side ## jedi.1 Yoda 900 Light
  • 58. Filtering data frame rows based on a condition • We can use a Boolean vector to filter by rows print(jedis[c(TRUE, TRUE, FALSE, TRUE),]) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 4 Luke NA Light
  • 59. Filtering data frame rows based on a condition • Data frame columns are vectors! • == operation on vectors returns a Boolean vector • Which means we can write filtering very naturally as: print(jedis$side) ## [1] "Light" "Light" "Dark" "Light" print(jedis$side == "Light") ## [1] TRUE TRUE FALSE TRUE print(jedis[jedis$side == "Light",]) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 4 Luke NA Light
  • 60. Filtering data frame rows based on a condition • Data frame columns are vectors! • == operation on vectors returns a Boolean vector • Which means we can write filtering very naturally as: print(jedis$side) ## [1] "Light" "Light" "Dark" "Light" print(jedis$side == "Light") ## [1] TRUE TRUE FALSE TRUE print(jedis[c(TRUE, TRUE, FALSE, TRUE),]) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 4 Luke NA Light
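Conditions can be combined, but missing values need care. The following is a sketch that rebuilds the jedis data frame from the slides and filters on two columns; the threshold of 50 is an arbitrary illustration.

```r
jedis <- data.frame(name = c("Yoda", "Obi-Wan", "Anakin", "Luke"),
                    age = c(900, 58, 45, NA),
                    side = c("Light", "Light", "Dark", "Light"),
                    stringsAsFactors = FALSE)

# A comparison with NA yields NA, which would produce an all-NA row in a filter
print(jedis$age > 50)

# Combine conditions with & and guard against NA explicitly
old.light <- jedis[!is.na(jedis$age) & jedis$age > 50 & jedis$side == "Light", ]
print(old.light)

# subset() treats an NA condition as "do not keep the row"
old.light.2 <- subset(jedis, age > 50 & side == "Light")
print(old.light.2)
```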
  • 61. Factors: categorical variables • Categorical variables: can have levels (input) and labels (output) • We need to specify the list of potential scores in the input: # five-level Likert-scale responses to q1 q1 <- factor(c(1, 3, 2, 4, 4, 4)) print(q1) ## [1] 1 3 2 4 4 4 ## Levels: 1 2 3 4 q1 <- factor(c(1, 3, 2, 4, 4, 4), levels=c(1,2,3,4,5)) print(q1) ## [1] 1 3 2 4 4 4 ## Levels: 1 2 3 4 5
  • 62. Factors: categorical variables • Factors can also be textual • We need to specify all possible input values q1 <- factor(c("SD", "N", "D", "A", "A", "A")) print(q1) ## [1] SD N D A A A ## Levels: A D N SD q1 <- factor(c("SD", "N", "D", "A", "A", "A"), levels=c("SD", "D", "N", "A", "SA")) print(q1) ## [1] SD N D A A A ## Levels: SD D N A SA
  • 63. Factors: categorical variables • The visuals of the factor can be altered q1 <- factor(c(1, 3, 2, 4, 4, 4), levels=c(1, 2, 3, 4, 5), labels=c("SD", "D", "N", "A", "SA")) print(q1) ## [1] SD N D A A A ## Levels: SD D N A SA q1 <- factor(c("SD", "N", "D", "A", "A", "A"), levels=c("SD", "D", "N", "A", "SA"), labels=c(1, 2, 3, 4, 5)) print(q1) ## [1] 1 3 2 4 4 4 ## Levels: 1 2 3 4 5
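Declaring all levels up front matters when the data are summarized. This sketch reuses the q1 factor from the slides and inspects its levels, counts, and underlying integer codes.

```r
q1 <- factor(c(1, 3, 2, 4, 4, 4),
             levels = c(1, 2, 3, 4, 5),
             labels = c("SD", "D", "N", "A", "SA"))

# Declared levels, including the unused "SA"
print(levels(q1))

# The frequency table keeps the empty level with a count of 0
counts <- table(q1)
print(counts)

# Underlying integer codes (positions in the level list)
codes <- as.integer(q1)
print(codes)
```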
  • 64. Data frames revisited • In R versions before 4.0.0, the data.frame function creates factors from textual columns automatically (since R 4.0.0 the default is stringsAsFactors = FALSE) jedis <- data.frame(name=c("Yoda", "Obi-Wan", "Anakin", "Luke"), age=c(900, 58, 45, NA)) print(jedis$name) ## [1] Yoda Obi-Wan Anakin Luke ## Levels: Anakin Luke Obi-Wan Yoda
  • 65. Data frames revisited • Automatic conversion of textual columns to factors can be turned off with stringsAsFactors = FALSE jedis <- data.frame(name=c("Yoda", "Obi-Wan", "Anakin", "Luke"), age=c(900, 58, 45, NA), stringsAsFactors = FALSE) print(jedis$name) ## [1] "Yoda" "Obi-Wan" "Anakin" "Luke"
  • 66. Writing and reading data frames to/from files • Writing data using write.csv() function • Read data from csv file using read.csv() function write.csv(jedis, file="jedi_masters.csv", row.names=FALSE) jedis <- read.csv("jedi_masters.csv", stringsAsFactors=FALSE) print(jedis) ## name age side ## 1 Yoda 900 Light ## 2 Obi-Wan 58 Light ## 3 Anakin 45 Dark ## 4 Luke NA Light
  • 67. Working directory • Working directory: the directory from which R is started • getwd() returns the current working directory • setwd(directory.path) sets the working directory to directory.path • When reading or writing files does not seem to work, always check the working directory with getwd()!
  • 68. getwd() and setwd() in practice
  • 69. Setting work directory in R Studio
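One way to avoid surprises with relative paths is to build the path explicitly and check that the file exists. This is a sketch that writes to the session's temporary directory (an assumption made so the example runs anywhere); the two-row data frame is illustrative.

```r
# Build a path inside the temporary directory instead of relying on getwd()
csv.path <- file.path(tempdir(), "jedi_masters.csv")

jedis <- data.frame(name = c("Yoda", "Luke"), age = c(900, NA),
                    stringsAsFactors = FALSE)
write.csv(jedis, file = csv.path, row.names = FALSE)

# Check that the file exists before reading it back
print(file.exists(csv.path))
jedis.read <- read.csv(csv.path, stringsAsFactors = FALSE)
print(jedis.read)
```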
  • 71. Create cross tabulation with table() print(table(jedis$side)) ## ## Dark Light ## 1 3
  • 72. Concatenate vectors as text using paste()/paste0() print(paste("Participant", 1:10)) ## [1] "Participant 1" "Participant 2" "Participant 3" "Participant 4" ## [5] "Participant 5" "Participant 6" "Participant 7" "Participant 8" ## [9] "Participant 9" "Participant 10" rownames(jedis) <- c("jedi.1", "jedi.2", "jedi.3", "jedi.4") rownames(jedis) <- paste0("jedi.", 1:4)
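Two further arguments of paste() are often useful: sep controls the separator between the pasted pieces, and collapse joins the resulting vector into a single string. A brief sketch with illustrative values:

```r
ids <- 1:3

# Default separator is a single space
with.space <- paste("Participant", ids)

# paste0() uses no separator
no.space <- paste0("jedi.", ids)

# A custom separator between the pieces
custom.sep <- paste("q", ids, sep = "_")

# collapse joins all elements into one string
one.string <- paste(c("Yoda", "Luke"), collapse = ", ")

print(custom.sep)
print(one.string)
```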
  • 73. Show excerpt from a data frame using head() midwest <- read.csv("http://goo.gl/G1K41K") # yes, read.csv also works with URLs print(head(midwest[, c("PID", "area", "poptotal", "popdensity", "state")])) ## PID area poptotal popdensity state ## 1 561 0.052 66090 1270.9615 IL ## 2 562 0.014 10626 759.0000 IL ## 3 563 0.022 14991 681.4091 IL ## 4 564 0.017 30806 1812.1176 IL ## 5 565 0.018 5836 324.2222 IL ## 6 566 0.050 35688 713.7600 IL
  • 75. ggplot2 – a superb plotting library • Built upon the Grammar of Graphics ideas • Each element of a plot is a separate function • Chaining the functions produces the final plot • Extremely powerful! • Even ported to Python (ggplot python library)
  • 76. ggplot2 example midwest <- read.csv("http://goo.gl/G1K41K") # yes, read.csv also works with URLs print(head(midwest[, c("PID", "area", "poptotal", "popdensity", "state")])) ## PID area poptotal popdensity state ## 1 561 0.052 66090 1270.9615 IL ## 2 562 0.014 10626 759.0000 IL ## 3 563 0.022 14991 681.4091 IL ## 4 564 0.017 30806 1812.1176 IL ## 5 565 0.018 5836 324.2222 IL ## 6 566 0.050 35688 713.7600 IL
  • 77. ggplot2 example library(ggplot2) gg <- ggplot(midwest, aes(x=area, y=poptotal)) + geom_point(aes(col=state, size=popdensity)) + geom_smooth(method="loess", se=FALSE) + xlim(c(0, 0.1)) + ylim(c(0, 500000)) + labs(subtitle="Area Vs Population", y="Population", x="Area", title="Scatterplot", caption = "Source: midwest") plot(gg)
  • 79. Formula notation: ~ splits dependent and independent variables • Formula: b ~ a | Equation: b = c0 + c1*a | Comment: Simple regression
  • 80. Formula notation: + is used to add a variable to the model • Formula: c ~ a + b | Equation: c = c0 + c1*a + c2*b | Comment: Multiple regression
  • 81. Formula notation: : is used to specify an interaction term • Formula: c ~ a + b + a:b | Equation: c = c0 + c1*a + c2*b + c3*a*b | Comment: Multiple regression with AB interaction term • Formula: d ~ a + b + c + a:b + b:c + a:c + a:b:c | Equation: d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c + c7*a*b*c | Comment: Multiple regression with AB, BC, AC, and ABC interaction terms
  • 82. Formula notation: * adds variables and their interactions • Formula: c ~ a * b | Equation: c = c0 + c1*a + c2*b + c3*a*b | Comment: Multiple regression with AB interaction term • Formula: d ~ a * b * c | Equation: d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c + c7*a*b*c | Comment: Multiple regression with AB, BC, AC, and ABC interaction terms
  • 83. Formula notation: . adds all available variables not already mentioned in the formula • Formula: z ~ a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y | Equation: z = c0 + c1*a + c2*b + c3*c … + c25*y | Comment: Multiple regression • Formula: z ~ . | Equation: z = c0 + c1*a + c2*b + c3*c … + c25*y | Comment: Multiple regression
  • 84. Formula notation: ^n is used to limit interactions up to the n-th level • Formula: d ~ (a + b + c)^2 | Equation: d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c | Comment: Multiple regression with AB, AC, and BC interaction terms (but not ABC)
  • 85. Formula notation: - is used to remove an interaction term • Formula: d ~ a * b * c - a:c | Equation: d = c0 + c1*a + c2*b + c3*c + c4*a*b + c6*b*c + c7*a*b*c | Comment: Multiple regression with all interaction terms except a:c • Formula: d ~ a * b * c - a:b:c | Equation: d = c0 + c1*a + c2*b + c3*c + c4*a*b + c5*a*c + c6*b*c | Comment: Multiple regression with AB, AC, and BC interaction terms (but not ABC)
  • 86. Formula notation: I() is used to interpret content “as-is” (90% of the time used for polynomials) • Formula: b ~ a + I(a^2) | Equation: b = c0 + c1*a + c2*a^2 | Comment: Simple polynomial regression
  • 87. Formula notation: +0 is used to remove the intercept term • Formula: b ~ 0 + a | Equation: b = c1*a | Comment: Simple regression without intercept
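The expansions in the formula slides can be checked directly in R without fitting a model. This sketch uses base R's terms() function; the helper name expand.terms is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: return the term labels a formula expands to
expand.terms <- function(f) attr(terms(f), "term.labels")

# * expands to main effects plus all interactions
print(expand.terms(d ~ a * b * c))

# ^2 limits the expansion to two-way interactions
print(expand.terms(d ~ (a + b + c)^2))

# - removes a single term from the expansion
print(expand.terms(d ~ a * b * c - a:c))

# The "intercept" attribute is 1 when the intercept is kept, 0 when removed
print(attr(terms(b ~ 0 + a), "intercept"))
```

Inspecting the term labels this way is a quick sanity check before fitting a model with a complicated formula.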
  • 88. Statistical test examples (it was about time)
  • 89. Independent t-test example ind.t.test.data <- read.csv("./data/independent_t_test_data.csv") print(head(ind.t.test.data)) ## weight group ## 1 4.81 group 1 ## 2 4.17 group 1 ## 3 4.41 group 1 ## 4 3.59 group 1 ## 5 5.87 group 1 ## 6 3.83 group 1
  • 90. Independent t-test example ind.t.test.result <- t.test(weight~group, data = ind.t.test.data) print(ind.t.test.result) ## ## Welch Two Sample t-test ## ## data: weight by group ## t = -3.0101, df = 14.104, p-value = 0.009298 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## -1.4809144 -0.2490856 ## sample estimates: ## mean in group 1 mean in group 2 ## 4.661 5.526
  • 91. Paired t-test example paired.t.test.data <- read.csv("./data/paired_t_test_data.csv") print(head(paired.t.test.data)) ## weight time ## 1 4.81 pre ## 2 4.17 pre ## 3 4.41 pre ## 4 3.59 pre ## 5 5.87 pre ## 6 3.83 pre
  • 92. Paired t-test example paired.t.test.result <- t.test(weight~time, data = paired.t.test.data, paired=TRUE) print(paired.t.test.result) ## ## Paired t-test ## ## data: weight by time ## t = 2.8464, df = 9, p-value = 0.0192 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## 0.1775354 1.5524646 ## sample estimates: ## mean of the differences ## 0.865
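A paired t-test is equivalent to a one-sample t-test on the per-participant differences. The sketch below shows this with the two-vector interface of t.test(), which is also a useful alternative because recent R versions may reject paired=TRUE in the formula interface. The measurements are made up for illustration and are not the tutorial data.

```r
# Made-up illustrative measurements for five participants (not the tutorial data)
pre <- c(4.8, 4.2, 4.4, 3.6, 5.9)
post <- c(5.3, 4.9, 4.6, 4.4, 6.1)

# Paired test via the two-vector interface
paired.result <- t.test(post, pre, paired = TRUE)

# Equivalent one-sample test on the differences
diff.result <- t.test(post - pre, mu = 0)

print(paired.result$statistic)
print(diff.result$statistic)
```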
  • 93. Multiple regression example reg.data <- read.csv("./data/regression_data.csv") print(head(reg.data)) ## model mpg disp hp wt ## 1 Mazda RX4 21.0 160 110 2.620 ## 2 Mazda RX4 Wag 21.0 160 110 2.875 ## 3 Datsun 710 22.8 108 93 2.320 ## 4 Hornet 4 Drive 21.4 258 110 3.215 ## 5 Hornet Sportabout 18.7 360 175 3.440 ## 6 Valiant 18.1 225 105 3.460
  • 94. Multiple regression example print(describe(reg.data)) ## vars n mean sd median trimmed mad min max range ## model* 1 32 16.50 9.38 16.50 16.50 11.86 1.00 32.00 31.00 ## mpg 2 32 20.09 6.03 19.20 19.70 5.41 10.40 33.90 23.50 ## disp 3 32 230.72 123.94 196.30 222.52 140.48 71.10 472.00 400.90 ## hp 4 32 146.69 68.56 123.00 141.19 77.10 52.00 335.00 283.00 ## wt 5 32 3.22 0.98 3.33 3.15 0.77 1.51 5.42 3.91 ## skew kurtosis se ## model* 0.00 -1.31 1.66 ## mpg 0.61 -0.37 1.07 ## disp 0.38 -1.21 21.91 ## hp 0.73 -0.14 12.12 ## wt 0.42 -0.02 0.17
  • 95. Multiple regression example regression.model <- lm(mpg~disp+hp+wt, data = reg.data) print(summary(regression.model)) ## ## Call: ## lm(formula = mpg ~ disp + hp + wt, data = reg.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.891 -1.640 -0.172 1.061 5.861 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 37.105505 2.110815 17.579 < 2e-16 *** ## disp -0.000937 0.010350 -0.091 0.92851 ## hp -0.031157 0.011436 -2.724 0.01097 * ## wt -3.800891 1.066191 -3.565 0.00133 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.639 on 28 degrees of freedom ## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083 ## F-statistic: 44.57 on 3 and 28 DF, p-value: 8.65e-11
  • 96. Multiple regression example: standardized coefficients reg.data.scaled <- lapply(reg.data[,2:5], scale) regression.model.standardized <- lm(mpg~disp+hp+wt, data = reg.data.scaled) print(summary(regression.model.standardized), digits=5) ## ## Call: ## lm(formula = mpg ~ disp + hp + wt, data = reg.data.scaled) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.645619 -0.272123 -0.028543 0.176069 0.972453 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.5677e-17 7.7403e-02 0.0000 1.000000 ## disp -1.9269e-02 2.1283e-01 -0.0905 0.928507 ## hp -3.5444e-01 1.3009e-01 -2.7245 0.010971 * ## wt -6.1706e-01 1.7309e-01 -3.5649 0.001331 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.43786 on 28 degrees of freedom ## Multiple R-squared: 0.82684, Adjusted R-squared: 0.80828 ## F-statistic: 44.566 on 3 and 28 DF, p-value: 8.6496e-11
  • 97. Multiple regression example: standardized coefficients # we can also do it in formula directly regression.model.standardized <- lm(scale(mpg)~scale(disp)+scale(hp)+scale(wt), data = reg.data) print(summary(regression.model.standardized), digits=5) ## ## Call: ## lm(formula = scale(mpg) ~ scale(disp) + scale(hp) + scale(wt), ## data = reg.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.645619 -0.272123 -0.028543 0.176069 0.972453 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.5677e-17 7.7403e-02 0.0000 1.000000 ## scale(disp) -1.9269e-02 2.1283e-01 -0.0905 0.928507 ## scale(hp) -3.5444e-01 1.3009e-01 -2.7245 0.010971 * ## scale(wt) -6.1706e-01 1.7309e-01 -3.5649 0.001331 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.43786 on 28 degrees of freedom ## Multiple R-squared: 0.82684, Adjusted R-squared: 0.80828 ## F-statistic: 44.566 on 3 and 28 DF, p-value: 8.6496e-11
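Standardized slopes can also be derived from the unstandardized model by rescaling each slope with the ratio of standard deviations. The sketch below assumes that the built-in mtcars dataset can stand in for the tutorial's regression_data.csv (the columns look the same, but this is an assumption).

```r
# Assumption: mtcars (built into R) stands in for the tutorial's regression data
raw.model <- lm(mpg ~ disp + hp + wt, data = mtcars)
std.model <- lm(scale(mpg) ~ scale(disp) + scale(hp) + scale(wt), data = mtcars)

# Standardized slope = raw slope * sd(predictor) / sd(outcome)
predictors <- c("disp", "hp", "wt")
converted <- coef(raw.model)[predictors] *
  sapply(mtcars[predictors], sd) / sd(mtcars$mpg)

print(converted)
print(coef(std.model)[-1])
```

The two printed vectors should agree up to floating-point error, which is why the t values and p-values for the slopes are identical in the raw and standardized models.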
  • 98. One-way ANOVA example anova.test.data <- read.csv("./data/anova_test_data.csv") print(head(anova.test.data)) ## weight group ## 1 4.17 ctrl ## 2 5.58 ctrl ## 3 5.18 ctrl ## 4 6.11 ctrl ## 5 4.50 ctrl ## 6 4.61 ctrl
  • 99. One-way ANOVA example library(psych) print(describeBy(anova.test.data, anova.test.data$group)) ## ## Descriptive statistics by group ## group: ctrl ## vars n mean sd median trimmed mad min max range skew kurtosis ## weight 1 10 5.03 0.58 5.15 5 0.72 4.17 6.11 1.94 0.23 -1.12 ## group* 2 10 1.00 0.00 1.00 1 0.00 1.00 1.00 0.00 NaN NaN ## se ## weight 0.18 ## group* 0.00 ## -------------------------------------------------------- ## group: trt1 ## vars n mean sd median trimmed mad min max range skew kurtosis ## weight 1 10 4.66 0.79 4.55 4.62 0.53 3.59 6.03 2.44 0.47 -1.1 ## group* 2 10 2.00 0.00 2.00 2.00 0.00 2.00 2.00 0.00 NaN NaN ## se ## weight 0.25 ## group* 0.00 ## -------------------------------------------------------- ## group: trt2 ## vars n mean sd median trimmed mad min max range skew kurtosis ## weight 1 10 5.53 0.44 5.44 5.5 0.36 4.92 6.31 1.39 0.48 -1.16 ## group* 2 10 3.00 0.00 3.00 3.0 0.00 3.00 3.00 0.00 NaN NaN ## se ## weight 0.14 ## group* 0.00
  • 100. One-way ANOVA example res.aov <- aov(weight ~ group, data = anova.test.data) print(summary(res.aov)) ## Df Sum Sq Mean Sq F value Pr(>F) ## group 2 3.766 1.8832 4.846 0.0159 * ## Residuals 27 10.492 0.3886 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • 101. One-way ANOVA example tukey.res <- TukeyHSD(res.aov) print(tukey.res) ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## ## Fit: aov(formula = weight ~ group, data = anova.test.data) ## ## $group ## diff lwr upr p adj ## trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711 ## trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960 ## trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
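One-way ANOVA is a special case of a linear model, so the same F statistic can be obtained from lm(). The sketch below assumes that the built-in PlantGrowth dataset resembles the tutorial's anova_test_data.csv (same column names and group labels, but this is an assumption).

```r
# Assumption: PlantGrowth (built into R) stands in for the tutorial's ANOVA data
aov.fit <- aov(weight ~ group, data = PlantGrowth)
lm.fit <- lm(weight ~ group, data = PlantGrowth)

# F statistic from the aov summary table
f.aov <- summary(aov.fit)[[1]][["F value"]][1]

# The same F statistic from the linear model's ANOVA table
f.lm <- anova(lm.fit)[["F value"]][1]

print(c(aov = f.aov, lm = f.lm))
```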
  • 102. Summary • R is a fantastic language for data analysis • Very powerful, due to vector-based programming • Cutting edge libraries • Built-in statistical functionalities • Powerful data visualisation libraries
  • 103. Thank you! Vitomir Kovanović University of South Australia #vkovanovic vitomir.kovanovic.info Vitomir.Kovanovic@unisa.edu.au