Data is getting bigger and more complex than ever before. Why not learn how to automate your analyses using the R programming language? This session covers the basics of using R, such as operators, functions, data frames, and factors.
• R Basics: Sept. 28, 2017, 10:00-11:30 a.m., Morgan Library, Computer Classroom 175
• Data Cleaning Using R: Nov. 2, 2017, 10:00-11:30 a.m., Morgan Library, Computer Classroom 175
• Data Wrangling Using R: Nov. 30, 2017, 10:00-11:30 a.m., Morgan Library, Computer Classroom 175
• Data Visualization Using R: Feb. 15, 2018, 10:00-11:30 a.m., Morgan Library, Computer Classroom 175
• Version Control Using Git: March 15, 2018, 10:00-11:30 a.m., Morgan Library, Computer Classroom 175
• Creating Reproducible Reports With R Markdown: April 19, 2018, 10:00-11:30 a.m., Morgan Library, Computer Classroom 175
REGISTER ONLINE:
Basic Data Analysis using R
C. Tobin Magle, PhD
09-28-2017
10:00-11:30 a.m.
Morgan Library
Computer Classroom 175
Based on http://www.datacarpentry.org/R-ecology-lesson/
2. Outline
• Intro to R and RStudio
• Operators and functions
• Data Frames
• Factors
3. What is R? RStudio?
• R: a programming language plus the software that interprets it
• RStudio: popular software for writing R scripts and interacting with the R software
• http://www.datacarpentry.org/R-ecology-lesson/#setup_instructions
4. Why learn R?
• Research reproducibility
• Widely used, 10,000+ “packages”
• Works on many data types
• Produces high-quality graphics
• Free, open source, cross platform
5. Set up a working directory
• Start RStudio
• File > New project > New directory > Empty project
• Enter a name for the new folder and choose a convenient location for it; this folder becomes your working directory
• Click on “Create project”
• Create a data folder in your working directory
• Create a new R script (File > New File > R script) and save it in your working directory
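The data folder can also be created from the console instead of the file browser. A minimal sketch, assuming the project is open in RStudio so that the project folder is already the working directory:

```r
# Print the current working directory; inside an RStudio project
# this is the project folder
getwd()

# Create the data folder only if it is not already there
if (!dir.exists("data")) {
  dir.create("data")
}

# Confirm that the folder now exists
dir.exists("data")
```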
7. Script vs console
• Both accept commands
• Console: runs commands immediately, but does not save them in a file
• Script: commands you want to save for later
• These commands need to be sent to the console to be run
• Ctrl+Enter sends the current line or selection from the script to the console
8. Operators
• Symbols that tell R to perform mathematical or logical operations
https://www.tutorialspoint.com/r/r_operators.htm
Type        Symbol
Arithmetic  + - * / ^
Assignment  <-
Extraction  [ ]
Relational  > < == != >= <=
Logical     & | !
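A short sketch that exercises one operator of each type in the table. The vector x and its values are arbitrary and chosen only for illustration:

```r
x <- c(3, 8, 12)            # assignment: store a vector in x

second <- x[2]              # extraction: second element, 8
x > 5                       # relational: FALSE TRUE TRUE
x == 8                      # relational: FALSE TRUE FALSE
in_range <- x > 5 & x < 10  # logical AND: FALSE TRUE FALSE
x < 5 | x > 10              # logical OR: TRUE FALSE TRUE
!(x > 5)                    # logical NOT: TRUE FALSE FALSE
2^3                         # arithmetic: 8
```

Relational and logical operators work element by element, so each comparison returns one TRUE or FALSE per element of x.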
9. Assignment operator
• Saves values into variables
• variable <- value
• weight_kg <- 55
• Keyboard shortcut in RStudio: Alt+- (Option+- on macOS)
10. Arithmetic operators
• Do math
• 2+2
• 4*4
• 5/2
• 3-1
• Can be combined with the assignment operator
• weight_lb <- 2.2*weight_kg
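A small sketch combining assignment and arithmetic with the variables from the slides. It also shows that a stored result is not recalculated when the variable it was computed from changes later; the second value of 100 is arbitrary:

```r
weight_kg <- 55                # store a value in a variable
weight_lb <- 2.2 * weight_kg   # use the variable in a calculation
weight_lb                      # prints 121

weight_kg <- 100               # reassigning weight_kg ...
weight_lb                      # ... leaves weight_lb unchanged; still prints 121
```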
12. Functions and arguments
• A sequence of instructions that performs a task
• Can be predefined (built in), come from packages, or be “home-made” (written by you)
• Have names
• Accept arguments (input)
• Return a value (output)
• Examples: sqrt, round
• args(round) lists the arguments that round accepts
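The points above can be sketched with the two example functions. The "home-made" function kg_to_lb is a hypothetical name introduced here only for illustration:

```r
sqrt(16)                        # one argument; returns 4
round(3.14159)                  # digits defaults to 0; returns 3
round(3.14159, digits = 2)      # argument matched by name; returns 3.14
round(3.14159, 2)               # same call, argument matched by position
args(round)                     # lists the arguments of round, including digits

# A "home-made" function (hypothetical name): convert kilograms to pounds
kg_to_lb <- function(kg) {
  2.2 * kg
}
kg_to_lb(10)                    # prints 22
```

Naming arguments, as in digits = 2, makes a call easier to read and lets you pass arguments in any order.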
13. (Down)loading data
• Download a file with download.file
• download.file("https://ndownloader.figshare.com/files/2292169", "data/portal_data_joined.csv")
• Read the data with the read.csv function
• surveys <- read.csv('data/portal_data_joined.csv')
14. Storing data in a data frame
1. Rows = observations
2. Columns = variables
3. All values in a column must be the same data type
• (for example, number or text)
4. Data must be “rectangular”
• Every column has the same number of rows, and every row has the same number of columns
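To make these rules concrete, here is a minimal sketch that builds a tiny data frame by hand. The name animals and all values are made up for illustration and are not part of the surveys dataset:

```r
# Toy data with made-up values
animals <- data.frame(
  species_id = c("NL", "DM", "DM"),  # text column
  weight     = c(218, 40, 44),       # numeric column
  stringsAsFactors = FALSE           # keep text as character in any R version
)

animals                  # 3 rows (observations) by 2 columns (variables)
dim(animals)             # 3 2
sapply(animals, class)   # one data type per column: "character", "numeric"

# Columns of unequal length break the rectangle, so this call would
# stop with an error:
# data.frame(species_id = c("NL", "DM", "DM"), weight = c(218, 40))
```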
15. Inspecting data frames
• head(surveys) = first 6 rows (all columns)
• str(surveys) = structure: number of rows and columns, data type of each column
• nrow(surveys) = number of rows
• ncol(surveys) = number of columns
• names(surveys) = column names
• summary(surveys) = summary statistics for each column
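The same functions can be tried without downloading anything. This sketch uses a small stand-in data frame, surveys_toy, whose name and values are invented for illustration; only the column names mimic the surveys data:

```r
# Stand-in for surveys with made-up values
surveys_toy <- data.frame(
  record_id  = 1:8,
  species_id = c("NL", "NL", "DM", "DM", "DM", "PF", "PE", "DM"),
  weight     = c(NA, 218, 40, 44, 41, 7, 22, 43),
  stringsAsFactors = FALSE
)

head(surveys_toy)      # first 6 of the 8 rows
nrow(surveys_toy)      # 8
ncol(surveys_toy)      # 3
names(surveys_toy)     # "record_id" "species_id" "weight"
str(surveys_toy)       # dimensions plus the type of each column
summary(surveys_toy)   # per-column summaries; weight also reports 1 NA
```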
17. Subsetting
• Use the extraction operator ([ ])
• Row, column format: surveys[row, column]
• surveys[1, 2] # first row, second column
• Select an entire row or column by leaving the other index empty: surveys[row, ] or surveys[, column]
• surveys[1, ] # first row, all columns
• surveys[, 1] # first column, all rows
• Ranges: surveys[a:b, column]
• surveys[1:3, 7] # rows 1 to 3, 7th column
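A sketch of the same indexing patterns on a small made-up data frame (the name toy and its values are illustrative only), so the result of each call is easy to check by eye:

```r
toy <- data.frame(
  record_id  = 1:4,
  species_id = c("NL", "DM", "DM", "PF"),
  weight     = c(218, 40, 44, 7),
  stringsAsFactors = FALSE
)

toy[1, 2]      # first row, second column: "NL"
toy[1, ]       # first row, all columns (a one-row data frame)
toy[, 3]       # third column, all rows (a vector): 218 40 44 7
toy[1:3, 3]    # rows 1 to 3 of the third column: 218 40 44
```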
18. By column name
• surveys["species_id"] # Result is a data.frame
• surveys[, "species_id"] # Result is a vector
• surveys[["species_id"]] # Result is a vector
• surveys$species_id # Result is a vector
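The difference between the four forms can be checked with class(). This sketch uses a made-up data frame named toy; the vector results are of class "character" here because the toy column holds text, and would be "factor" for a factor column:

```r
toy <- data.frame(
  species_id = c("NL", "DM", "DM"),
  weight     = c(218, 40, 44),
  stringsAsFactors = FALSE
)

class(toy["species_id"])     # "data.frame"
class(toy[, "species_id"])   # "character" (a vector)
class(toy[["species_id"]])   # "character" (a vector)
class(toy$species_id)        # "character" (a vector)

# The last two forms return exactly the same object
identical(toy[["species_id"]], toy$species_id)   # TRUE
```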
19. Exercise 3:
1. Create a data frame (surveys_200) containing only the observations from rows 1 to 200 of the surveys dataset.
2. Use nrow() to extract the last row of surveys_200.
3. Use nrow() to extract the row that is in the middle of surveys_200. Store it in a variable called surveys_mid.
20. Factors
• Represent categorical data
• Critical for stats and plotting
• Stored as integers with text labels (levels)
• Can be ordered or unordered
• By default, levels are sorted alphabetically by their text labels
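A minimal sketch of how a factor is stored, and of an ordered factor. The variable names and the values "low", "medium", "high" are made up for illustration:

```r
size <- factor(c("medium", "low", "high", "low"))
levels(size)        # "high" "low" "medium" (alphabetical by default)
as.integer(size)    # 3 2 1 2: the integer codes stored behind the labels

# An ordered factor with an explicit level order
size_ord <- factor(size, levels = c("low", "medium", "high"), ordered = TRUE)
levels(size_ord)    # "low" "medium" "high"
size_ord < "high"   # TRUE TRUE FALSE TRUE
```

Comparisons such as < are only meaningful for ordered factors, which is one reason to set the level order explicitly.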
21. Functions for factors
• Create: sex <- factor(c("male", "female", "female", "male"))
• Unique text labels: levels(sex)
• Number of levels: nlevels(sex)
• Specify level order: sex <- factor(sex, levels = c("male", "female"))
22. Converting factors
• To character: as.character(sex)
• To number:
• f <- factor(c(1990, 1983, 1977, 1998, 1990))
• as.numeric(f) # wrong! and there is no warning...
• as.numeric(as.character(f)) # works...
• as.numeric(levels(f))[f] # The recommended way.
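Running the three conversions side by side shows why the first one is wrong: it returns the integer codes of the levels rather than the original numbers.

```r
f <- factor(c(1990, 1983, 1977, 1998, 1990))
levels(f)                      # "1977" "1983" "1990" "1998"

as.numeric(f)                  # 3 2 1 4 3: level codes, not the years
as.numeric(as.character(f))    # 1990 1983 1977 1998 1990
as.numeric(levels(f))[f]       # 1990 1983 1977 1998 1990
```

The last form converts each distinct level once and then uses the factor's codes as an index to pick the converted value for every element.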
24. Renaming levels
• Label missing values
• sex <- surveys$sex # subset the column
• head(sex) # look at first 6 records
• levels(sex) # look at the factor levels
• levels(sex)[1] <- "missing" # change the first label to “missing”
• levels(sex) # look at factor levels again
• head(sex) # see where missing values were
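The same steps can be tried on a small hand-made factor. This is a sketch with made-up values, assuming that missing entries appear as empty strings; the vector is built with factor() directly so the example behaves the same in any R version:

```r
sex <- factor(c("M", "", "F", "F", "", "M"))
levels(sex)                    # "" "F" "M": the empty string sorts first

levels(sex)[1] <- "missing"    # rename only the first level
levels(sex)                    # "missing" "F" "M"
as.character(sex)              # "M" "missing" "F" "F" "missing" "M"
table(sex)                     # 2 observations in each of the three levels
```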
25. Exercise 4: Renaming factors
1. Rename “F” and “M” to “female” and “male” respectively.
2. Now that we have renamed the factor level to “missing”, can you recreate the barplot such that “missing” is last (after “male”)?
26. What if you don’t want to use factors?
• Argument: stringsAsFactors = FALSE (the default since R 4.0.0; earlier versions default to TRUE)
## Compare the difference between when the data are being read as
## `factor`, and when they are being read as `character`.
surveys <- read.csv("data/portal_data_joined.csv", stringsAsFactors = TRUE)
str(surveys)
surveys <- read.csv("data/portal_data_joined.csv", stringsAsFactors = FALSE)
str(surveys)
## Convert the column "plot_type" into a factor
surveys$plot_type <- factor(surveys$plot_type)
27. Saving Data as .csv
• Save a subset of your data
• Name: write.csv (comma-separated); write.table for other separators
• Input: data frame, destination file, separator (write.table only)
• Output: a file at the specified location
• write.table(surveys, "data/surveys4.tsv", sep = "\t") # tab-separated
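A round-trip sketch of both writers. It uses a made-up data frame and temporary files (toy, csv_file, tsv_file, and toy_back are illustrative names), so nothing is written into the project's data folder; row.names = FALSE is an optional choice made here to keep R's row numbers out of the files:

```r
toy <- data.frame(
  species_id = c("NL", "DM"),
  weight     = c(218, 40),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")

# Comma-separated output
write.csv(toy, csv_file, row.names = FALSE)

# Tab-separated output; "\t" is the tab character
write.table(toy, tsv_file, sep = "\t", row.names = FALSE)

readLines(tsv_file)    # a header line followed by one line per row

# Read the CSV back to confirm the data survived the round trip
toy_back <- read.csv(csv_file, stringsAsFactors = FALSE)
toy_back
```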
28. Need help?
• Email: tobin.magle@colostate.edu
• Data Management Services website: http://lib.colostate.edu/services/data-management
• Data Carpentry: http://www.datacarpentry.org/
• R Ecology Lesson: http://www.datacarpentry.org/R-ecology-lesson/
• Base R Cheat sheet: https://www.rstudio.com/wp-content/uploads/2016/10/r-cheat-sheet-3.pdf