Getting started with R
Jacob van Etten, Alberto Labarga

What is R?
“R is a free software environment for statistical computing and graphics.”
www.r-project.org

Why R?
R takes time to learn and use. So why should I bother?
There are more user-friendly programmes, right?

12 reasons to learn R
1. Rigour and strategy in data analysis, not “thinking after clicking”.
2. Automating repeated calculations can save time in the end.
3. A lot of stuff is simply not feasible in menu-driven software.
4. R is not only about software; it is also an online community of user support and collaboration.
5. Scripts make it easy to communicate about your problem. Important for collaborative research!
6. Research becomes replicable when scripts are published.
7. R packages represent the state of the art in many academic fields.
8. Graphics in R are very good.
9. R stimulates learning: graduation from user to developer.
10. R is free, which saves you money. Or R may be the only option when budgets are restricted.
11. R encourages you to freely explore new methods and learn about them.
12. Knowing how to work with R is a valuable and transferable skill.

Choose a mirror nearby and then...
Binaries
“When downloading, a completely functional program without any installer is also often called program binary, or binaries (as opposed to the source code).” (Wikipedia)

And finally, you can download R...
When downloading has finished, run the installer.

Four parts
Create a new R script: File – New – R Script
Scripts, documentation
Workspace/history
Files, plots, packages and help
Console

Our first code...
Type
    1+1
into the script area.
Then click “Run”.
What happens?

Exercises: running code
Type a second line with another calculation (use “-”, “/”, or “*”) and click “Run” again.
Select only one line with the mouse or Shift + arrows. Then click “Run”.
Save your first code to a separate folder “Rexercises”.

Following exercises
In the next exercises, we will develop a script.
Just copy every new line and add it to your script, without erasing the previous part.
If you want to make a comment in your script, put a # before that line. Like this:
    #important to remember: use # to comment

If the exercises are a bit silly...
...that’s because you are learning.

Vector
Type a new line with the expression
    1:10
in the script and run this line.
A concatenation of values is called a vector.

Making a new variable
If we send 1:10 into the console it will only print the outcome. To “store” this vector, we need to do the following.
    a <- 1:10
“a” is the new variable, “<-” means assign, and “1:10” is the vector of values 1 to 10.
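
As a small sketch of what storing a vector changes: once the values have a name, they can be reused in later lines. The name `total` is only illustrative and does not come from the slides.

```r
# Without assignment, the result is printed and then forgotten
1:10

# With assignment, the vector is stored under the name "a"
a <- 1:10

# Typing the name prints the stored values
a

# The stored vector can be reused in later calculations
total <- sum(a)   # "total" is an illustrative name
print(total)
print(length(a))
```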

Operations with vectors
Try the following and see what happens.
    a
    a*2
    a*a
    b <- a * a
    b
    print(b)
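
The lines above rely on arithmetic being applied to every element of a vector. A minimal sketch of that behaviour, with `doubled` and `squared` as illustrative names that are not part of the slides:

```r
a <- 1:10

# Arithmetic on a vector is applied to each element in turn
doubled <- a * 2      # 2, 4, 6, ..., 20
squared <- a * a      # 1, 4, 9, ..., 100

# Assignment itself prints nothing;
# print() (or typing the name) shows the stored values
b <- a * a
print(b)
```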

Other ways of making vectors
    d <- c(1, 6, 9)
    d
    class(d)
    f <- LETTERS
    f
    class(f)
What is the difference between d and f?

Functions
Actually, we have already seen functions!
Functions consist of a name followed by one or more arguments. Commas and brackets complete the expression.
    class(f)
    c(d,f)
In class(f), “class” is the name and “f” is the argument.
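
A few more function calls may make the pattern of name, brackets and comma-separated arguments clearer. This is only a sketch; `g` and `rounded` are illustrative names, and `round()` is a base R function that the slides do not use.

```r
d <- c(1, 6, 9)
f <- LETTERS

# One argument: the function name, then the argument in brackets
length(f)          # LETTERS holds the 26 capital letters

# Two arguments, separated by a comma
g <- c(d, f)       # "g" is an illustrative name
length(g)

# Arguments can also be given by name
rounded <- round(3.14159, digits = 2)
print(rounded)
```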

Cheat sheet
When you use R, you will become familiar with the most common functions.
If you need a less common function, there are ways to discover the right one.
For now, use the cheat sheet to look up the functions you need.

Getting help on functions
The following opens the help pages for these functions in your browser.
    ?c
    ?class
The examples are often especially helpful.
Just copy and paste the code into the console and see with your own eyes what happens!
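
Besides `?`, base R offers a few other ways to look things up. The sketch below uses `help()`, `example()` and `apropos()`, which exist in a standard R installation but are not mentioned in the slides. The interactive calls are guarded so the script also runs unattended.

```r
# help() opens the same page as ?, and example() runs the
# examples from a help page; both are meant for interactive use
if (interactive()) {
  help("class")      # same as ?class
  example("c")       # runs the examples from the help page of c()
}

# apropos() lists objects whose names match a pattern,
# which helps when you only remember part of a function name
found <- apropos("^read\\.csv")
print(found)
```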

Matrices
We have already met the vector.
If we put two or more vectors together as columns, we get a matrix.
    X <- c(1,2,3)
    Y <- c(8,9,7)
    Z <- c(4,2,8)
    M <- cbind(X, Y, Z)
How many columns and rows does M have?
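
A short sketch of some base R functions for inspecting a matrix like M. The contrast with `rbind()` and the name `Mrows` are additions for illustration, not part of the slides.

```r
X <- c(1, 2, 3)
Y <- c(8, 9, 7)
Z <- c(4, 2, 8)

# cbind() binds vectors as columns; rbind() binds them as rows
M <- cbind(X, Y, Z)
Mrows <- rbind(X, Y, Z)   # "Mrows" is an illustrative name

dim(M)        # number of rows, then number of columns
colnames(M)   # column names are taken from the vector names
M[2, "Y"]     # element in row 2 of column Y
```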

Data frames
Matrices must consist of values of the same class. But often datasets consist of a mix of different types of variables (real numbers and groups). This is the job of data frames.
    L <- c("a", "b", "c")
    Df <- data.frame(X,Y,Z,L)
Inspect the structure of Df like this:
    str(Df)
What would happen if you tried to make a matrix out of these same vectors instead? Try and see.
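
To see that a data frame keeps a separate class for each column, something like the following could be tried. `sapply()` and the `$` operator are base R features that the slides do not introduce, so treat this as an optional sketch.

```r
X <- c(1, 2, 3)
Y <- c(8, 9, 7)
Z <- c(4, 2, 8)
L <- c("a", "b", "c")

Df <- data.frame(X, Y, Z, L)

# Each column keeps its own class
# (L is character in R 4.0 and later, a factor in older versions)
sapply(Df, class)

# A single column is extracted with $ and behaves like a vector
Df$X
mean(Df$X)
```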

Getting data into R
    ?read.csv
CSV files are a relatively trouble-free way of getting data into R.
It is a fairly common format.
You can make a CSV file in any spreadsheet software.

Create a CSV file
    First name   Family name   Sex      Age
    John         Travolta      Male     57
    Elijah       Wood          Male     30
    Nicole       Kidman        Female   44
    Keira        Knightley     Female   26
Add your own favorite actor, too.
Open the file with Notepad.
Make sure the values are separated by commas.

Now use R to read it
Now read it into R. The file name must be in quotes.
    actors <- read.csv("yourfile.csv")
    str(actors)
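
For readers without a spreadsheet at hand, the whole round trip can be sketched in R itself. The temporary file path below is a stand-in for your own file, and `writeLines()` and `tempfile()` are base R functions not used in the slides.

```r
# Write the example table to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")   # stand-in for your own file
writeLines(c(
  "First name,Family name,Sex,Age",
  "John,Travolta,Male,57",
  "Elijah,Wood,Male,30",
  "Nicole,Kidman,Female,44",
  "Keira,Knightley,Female,26"
), csv_path)

# The file name is passed as a quoted string (here stored in a variable)
actors <- read.csv(csv_path)

str(actors)

# Spaces in the header become dots in the column names
names(actors)
```

Note that the header “First name” ends up as the column name `First.name`, which matters when selecting columns by name.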

Subsetting
There are many ways of selecting only part of a data frame. Observe carefully what happens.
    actors[1:2,]
    actors[,1:2]
    actors["Age"]
    actors[c("First.name", "Age")]
    subset(actors, Age > 40)
(read.csv turns the header “First name” into the column name First.name.)
Now create a new data frame with the actors younger than 45.
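
The `subset()` call above can also be written with a logical vector as the row index. The sketch below rebuilds the example data inline so that it runs on its own; `older`, `by_index` and `by_subset` are illustrative names.

```r
actors <- data.frame(
  First.name  = c("John", "Elijah", "Nicole", "Keira"),
  Family.name = c("Travolta", "Wood", "Kidman", "Knightley"),
  Sex         = c("Male", "Male", "Female", "Female"),
  Age         = c(57, 30, 44, 26)
)

# A comparison on a column gives one TRUE/FALSE per row
older <- actors$Age > 40
print(older)

# Using that logical vector as the row index keeps the TRUE rows
by_index  <- actors[older, ]
by_subset <- subset(actors, Age > 40)

# Both approaches select the same rows
print(by_index)
print(by_subset)
```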

Graphics
The plot function makes graphs. Column names are case-sensitive, so use “Sex”, not “sex”.
    plot(actors[c("Sex", "Age")])
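
One caveat when running this plot today: since R 4.0.0, `read.csv()` and `data.frame()` no longer convert text columns to factors by default, and `plot()` cannot place plain text on an axis. A minimal sketch of a workaround, with the data rebuilt inline so the example runs on its own:

```r
actors <- data.frame(
  Sex = c("Male", "Male", "Female", "Female"),
  Age = c(57, 30, 44, 26)
)

# Convert the text column to a factor (a grouping variable);
# this step is an addition needed in R 4.0 and later
actors$Sex <- factor(actors$Sex)

# A factor plotted against a numeric column gives one box per group
plot(actors[c("Sex", "Age")])

# Two numeric vectors give a scatter plot instead (row number vs. age)
plot(seq_along(actors$Age), actors$Age, xlab = "Row", ylab = "Age")
```

Alternatively, the file could be read with `read.csv("yourfile.csv", stringsAsFactors = TRUE)`, which converts all text columns to factors on import.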

Summary
You now know about:
Variables
Functions
Vectors
Matrices
Data frames
Getting tabular data into R
Subsetting
Simple plotting