3. An advantage of RStudio is that all the information you
need to write code is available in a
single window.
Additionally, with many shortcuts,
autocompletion, and syntax highlighting for the
major file types you use while developing
in R, RStudio makes typing easier
and less error-prone.
R offers a wide variety of statistics-
related libraries and provides a
favorable environment for statistical
computing and design.
13. Data frame
A DataFrame is a data structure that organizes data into a 2-
dimensional table of rows and columns, much like a spreadsheet.
DataFrames are one of the most common data structures used in
modern data analytics because they are a flexible and intuitive way of
storing and working with data.
Numerical=c(1,2,3,4,5)
Character=c("one","two","three","four","five")
logical=c(TRUE,FALSE,FALSE,TRUE,TRUE)
data.frame(Character,Numerical,logical)
Result
  Character Numerical logical
1       one         1    TRUE
2       two         2   FALSE
3     three         3   FALSE
4      four         4    TRUE
5      five         5    TRUE
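To work with a data frame after creating it, it helps to store it in a variable. The sketch below rebuilds the same table and shows a few common ways to inspect it; the variable name `df` is an assumption made for this example and does not appear in the slides.

```r
# Rebuild the three vectors from the slide and keep the data frame in a variable.
# The name `df` is hypothetical, chosen only for this sketch.
Numerical <- c(1, 2, 3, 4, 5)
Character <- c("one", "two", "three", "four", "five")
logical <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
df <- data.frame(Character, Numerical, logical)

dim(df)            # number of rows and number of columns
names(df)          # column names
df$Numerical       # a single column, returned as a vector
df[df$logical, ]   # only the rows where the `logical` column is TRUE
str(df)            # the type of every column
```

Selecting rows with a logical vector, as in the second-to-last line, is one common way to filter a data frame in base R.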
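15. A histogram is a graph that shows the frequency
distribution of one numeric variable. The values are
grouped into class intervals (bins); each bar's width
equals the class interval, and its height is the number
of observations that fall in that interval.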
hist(iris$Sepal.Length)
hist(iris$Petal.Width)
hist(faithful$eruptions)
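The class intervals behind a histogram can be inspected without drawing anything. The following is a minimal sketch using `plot = FALSE`; the custom break points in the second call are an arbitrary choice made for illustration.

```r
# With plot = FALSE, hist() returns the binning instead of drawing the graph.
h <- hist(iris$Sepal.Length, plot = FALSE)
h$breaks   # boundaries of the class intervals
h$counts   # number of observations in each interval

# Same data with wider class intervals (break points chosen arbitrarily here).
h2 <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 1), plot = FALSE)
h2$counts
```

Every observation falls into exactly one interval, so the counts always add up to the number of rows in `iris`.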
18. A data frame is basically a table where each column is a variable and each row has one
set of values for each of those variables (much like a single sheet in a program
like LibreOffice Calc or Microsoft Excel).
Basic
data("iris")
names(iris)
Result "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
dim(iris)
Result = 150 5
str(iris3)
num [1:50, 1:4, 1:3] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
- attr(*, "dimnames")=List of 3
..$ : NULL
..$ : chr [1:4] "Sepal L." "Sepal W." "Petal L." "Petal W."
..$ : chr [1:3] "Setosa" "Versicolor" "Virginica"
19.
sum(iris$Sepal.Length)
Result = 876.5
sum(iris$Sepal.Width)
result = 458.6
sum(iris$Petal.Length)
result = 563.7
sum(iris$Petal.Width)
result = 179.9
IQR(iris$Sepal.Length)
Result= 1.3
sort(iris3)
sort(iris$Sepal.Length)
round(iris$Sepal.Length)
20.
summary(iris)
• Result
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
summary(iris$Sepal.Length)
• Result
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900
22.
mean(x, na.rm = TRUE)
Result 30.33333
median(x, na.rm = TRUE)
Result 28.5
summary(x)
Result
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.00   22.00   28.50   30.33   37.25   55.00
sd(x, na.rm = TRUE)
Result 15.68014
var(x, na.rm = TRUE)
Result 245.8667
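The slides do not show how `x` was defined. The vector below is an assumption: it is one six-value vector that reproduces the printed mean, median, quartiles, standard deviation, and variance, so it can be used to try the calls. The vector `y` is also hypothetical and only illustrates what `na.rm` does.

```r
# Assumed data: not given in the slides, chosen because it matches the printed results.
x <- c(10, 20, 28, 29, 40, 55)

mean(x, na.rm = TRUE)
median(x, na.rm = TRUE)
sd(x, na.rm = TRUE)
var(x, na.rm = TRUE)

# na.rm = TRUE only changes the answer when missing values are present.
y <- c(x, NA)           # hypothetical vector with one missing value
mean(y)                 # NA, because one value is missing
mean(y, na.rm = TRUE)   # the NA is dropped first, so this equals mean(x)
```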
23.
A quantile is a cut point that splits a data set: the p-quantile is the value below which
a fraction p of the observations fall (for example, 20% of the values lie at or below the 20% quantile).
quantile(x, probs = seq(0,1,.2), na.rm=TRUE)
  0%  20%  40%  60%  80% 100%
  10   20   28   29   40   55
quantile(x, probs = seq(0,1,.3), na.rm=TRUE)
  0%  30%  60%  90%
10.0 24.0 29.0 47.5
quantile(x, probs = seq(0,1,.4), na.rm=TRUE)
 0% 40% 80%
 10  28  40
quantile(x, probs = seq(0,1,.6), na.rm=TRUE)
 0% 60%
 10  29
quantile(x, probs = seq(0,1,.9), na.rm=TRUE)
  0%  90%
10.0 47.5
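Values such as 24.0 and 47.5 are not in the data; they come from linear interpolation between neighbouring sorted values, which is what `quantile()` does by default (its `type = 7` method). The sketch below recomputes the 30% quantile by hand. The vector `x` is an assumption, since the slides do not define it; it was chosen because it reproduces the printed results.

```r
# Assumed data: not given in the slides, chosen to match the printed quantiles.
x <- c(10, 20, 28, 29, 40, 55)
p <- 0.3

xs <- sort(x)
h <- 1 + (length(xs) - 1) * p   # fractional position of the p-quantile in the sorted data
lo <- floor(h)                  # index of the value just below that position
hi <- ceiling(h)                # index of the value just above that position

# Interpolate linearly between the two neighbouring values.
manual <- xs[lo] + (h - lo) * (xs[hi] - xs[lo])
manual
quantile(x, probs = p, names = FALSE)   # built-in result for comparison
```

Here the position is 2.5, halfway between the second value (20) and the third (28), which gives 24.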
24.
An integer (pronounced IN-tuh-jer) is a whole number (not a fractional number) that can be
positive, negative, or zero. Examples of integers are -5, 1, 5, 8, and 97.
firstThirtyIntegers = 1:30
sum(firstThirtyIntegers)
Result = 465
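The sum can be checked against the closed form n(n + 1)/2, and R can report whether a value is stored with the integer type. This is a small sketch; the variable name `n` is an assumption made for the example.

```r
n <- 30             # hypothetical name for the upper limit
sum(1:n)            # adds 1 + 2 + ... + 30
n * (n + 1) / 2     # closed-form value of the same sum

is.integer(1:n)     # TRUE: the colon operator produces integers here
is.integer(30)      # FALSE: a plain number literal is stored as a double
is.integer(30L)     # TRUE: the L suffix requests an integer
```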
36. dbinom
dbinom(0, 5, .5) #probability of 0 heads in 5 flips
Result 0.03125
dbinom(0:5, 5, .5) #full probability dist. for 5 flips
Result 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
sum(dbinom(0:2, 5, .5)) #probability of 2 or fewer heads in 5 flips
Result 0.5
sum(dbinom(0:8, 9, .10)) #probability of 8 or fewer successes in 9 trials with success probability 0.10 (exactly 1 - 0.1^9, printed as 1)
Result 1
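The values returned by `dbinom()` follow the binomial formula choose(n, k) * p^k * (1 - p)^(n - k). The sketch below recomputes the distribution for 5 flips by hand; the names `n`, `p`, `k`, and `manual` are assumptions made for this example.

```r
n <- 5       # number of flips
p <- 0.5     # probability of heads on one flip
k <- 0:n     # every possible number of heads

# Binomial probability mass function written out term by term.
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
manual

all.equal(manual, dbinom(k, n, p))   # agrees with the built-in function
sum(manual)                          # the probabilities of all outcomes add up to 1
```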
37. rbinom, binom.test, prop.test
pbinom(2, 5, .5) #same as sum(dbinom(0:2, 5, .5))
Result 0.5
table(rbinom(10000, 5, .5)) / 10000
Result (simulated, so the values change from run to run)
     0      1      2      3      4      5
0.0335 0.1544 0.3131 0.3182 0.1532 0.0276
binom.test(29,200, .21)
Result
Exact binomial test
data: 29 and 200
number of successes = 29, number of trials = 200, p-value = 0.02374
alternative hypothesis: true probability of success is not equal to 0.21
95 percent confidence interval:
0.09930862 0.20156150
sample estimates:
probability of success
0.145
prop.test(29, 200, .21)
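Both tests return an object whose parts can be read individually, which is handy when only the p-value or the confidence interval is needed. A minimal sketch follows; the names `bt` and `pt` are assumptions made for this example.

```r
# Exact binomial test: 29 successes in 200 trials against a hypothesised rate of 0.21.
bt <- binom.test(29, 200, p = 0.21)
bt$estimate    # observed proportion of successes
bt$p.value     # p-value of the exact test
bt$conf.int    # confidence interval for the true proportion

# prop.test() uses a chi-squared approximation, so its numbers differ slightly.
pt <- prop.test(29, 200, p = 0.21)
pt$p.value
pt$conf.int
```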
39. dpois(), ppois()
dpois(2:7, 4.2) #probabilities of 2, 3, 4, 5, 6, or 7 successes in pois(4.2)
Result 0.13226099 0.18516538 0.19442365 0.16331587 0.11432111 0.06859266
ppois(1, 9.2) #probability of 1 or fewer successes in pois(9.2); same as sum(dpois(0:1, 9.2))
Result 0.001030602
1-ppois(7,4.2) #probability of 8 or more successes in pois(4.2)
Result approximately 0.0639
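The relationship between `dpois()` and `ppois()` can be checked directly: `dpois()` is the Poisson formula exp(-lambda) * lambda^k / k!, and `ppois()` is the running sum of those probabilities. The names `lambda`, `k`, `manual`, and `upper` below are assumptions made for this sketch.

```r
lambda <- 4.2   # mean of the Poisson distribution
k <- 2:7        # counts of interest

# Poisson probability mass function written out by hand.
manual <- exp(-lambda) * lambda^k / factorial(k)
all.equal(manual, dpois(k, lambda))

# ppois() is cumulative: P(X <= 1) is the sum of P(X = 0) and P(X = 1).
all.equal(ppois(1, 9.2), sum(dpois(0:1, 9.2)))

# Upper tail P(X >= 8), computed directly instead of as 1 - ppois(7, lambda).
upper <- ppois(7, lambda, lower.tail = FALSE)
all.equal(upper, 1 - ppois(7, lambda))
```

Using `lower.tail = FALSE` gives the same value as the subtraction and avoids loss of precision when the tail probability is very small.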
41. data(sleep)
t.test(extra ~ group, data=sleep) # 2-sample t with group id column
Result
Welch Two Sample t-test
data: extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-3.3654832 0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33
42. data(sleep)
t.test(sleepGrp1, sleepGrp2, conf.level=.99)
Result
Welch Two Sample t-test
data: sleepGrp1 and sleepGrp2
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
-4.0276329 0.8676329
sample estimates:
mean of x mean of y
     0.75      2.33
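The slides do not show how `sleepGrp1` and `sleepGrp2` were created. One plausible way, assumed here, is to split the `extra` column of `sleep` by its `group` column; the name `res` is also an assumption made for this sketch.

```r
data(sleep)

# Assumed construction of the two vectors: one per value of the group column.
sleepGrp1 <- sleep$extra[sleep$group == 1]
sleepGrp2 <- sleep$extra[sleep$group == 2]

# Same Welch test as the formula version, with a 99% confidence interval.
res <- t.test(sleepGrp1, sleepGrp2, conf.level = 0.99)
res$estimate   # the two group means
res$p.value    # unchanged by conf.level
res$conf.int   # wider than the 95% interval
```

Changing `conf.level` affects only the width of the interval; the t statistic, degrees of freedom, and p-value stay the same.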
43. Two sample test
Two-sample t test power calculation
n = 40
delta = 0.5
sd = 0.4
sig.level = 0.01
power = 0.998096
alternative = two.sided
NOTE: n is number in *each* group
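The call that produced this output is not shown on the slide. Output in this layout is what `power.t.test()` prints, so the sketch below assumes that function was used with the listed values; the name `pw` and the target power of 0.9 in the last line are assumptions made for illustration.

```r
# Assumed call: the parameter values are taken from the printed output.
pw <- power.t.test(n = 40, delta = 0.5, sd = 0.4, sig.level = 0.01)
pw          # prints the power calculation summary
pw$power    # the computed power on its own

# Leaving n out and supplying power makes the function solve for the sample size.
power.t.test(delta = 0.5, sd = 0.4, sig.level = 0.01, power = 0.9)$n
```

Exactly one of `n`, `delta`, `sd`, `sig.level`, and `power` is left unspecified, and that is the quantity the function computes.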