R Introduction


  • header = TRUE (or header = T) means the first row contains the variable names
  • Some numbers are actually factors: think of 0/1 for dead/alive, or zip codes (what would an average zip code mean?)

    1. R Intro, Week 1
       Scott Chamberlain [modified from Haldre Rogers]
       September 9, 2011
    2. Don't just listen to me! Other intros to R:
       http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
       http://www.cyclismo.org/tutorial/R/
       http://www.r-tutor.com/r-introduction
       Quick R: http://www.statmethods.net/
       http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
    3. R user frameworks
       R from the command line (OS X and PC): just type "R" into the command line and have fun!
       R itself: http://www.r-project.org/
       RStudio (a good choice): http://www.rstudio.org/
       RevolutionR [free academic version]: sort of the SAS-ised version of R
         http://www.revolutionanalytics.com/downloads/free-academic.php
         Uses the proprietary .xdf file format, which speeds up computation
       Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors:
         https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
       If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red-R
         These interfaces let you learn which code each button press runs
    4. R user frameworks, cont.
       R from Python
         RPy: http://rpy.sourceforge.net/
       C/C++ from R
         Rcpp package:
         http://cran.r-project.org/web/packages/Rcpp/index.html
         http://dirk.eddelbuettel.com/code/rcpp.html
         Writing slow functions in C++ can hugely speed up computation: your R function calls the compiled C++ code instead of doing the work in R
         E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html
         and http://dirk.eddelbuettel.com/code/rcpp.examples.html
       Excel from R
         XLConnect package: http://cran.r-project.org/web/packages/XLConnect/index.html
       And more... see for yourself
    5. R tips
       R can crash! Don't use R's built-in text editor, and don't write code only in the R console. Instead, use a text editor that integrates with R. See here for links:
         https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
       When asking for help on listservs or help websites, give BRIEF and REPRODUCIBLE examples
         Not doing this makes people not want to help you!
       R silently overwrites an existing file with the same name!!!!
         Make sure you want to overwrite a file before doing so
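Because write.csv() replaces an existing file without asking, one option is to check first. This is a minimal sketch: `safe_write_csv()` is a hypothetical helper (not part of base R), and the demo writes the built-in `iris` data to R's temporary directory so it cannot clobber your own files.

```r
# Hypothetical helper (not part of base R): write a CSV only if the target
# file does not exist yet, unless overwrite = TRUE is given explicitly.
safe_write_csv <- function(x, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    stop("File already exists: ", path, " (use overwrite = TRUE to replace it)")
  }
  write.csv(x, path, row.names = FALSE)
  invisible(path)
}

path <- file.path(tempdir(), "iris_copy.csv")
unlink(path)                     # start this demo from a clean slate
safe_write_csv(iris, path)       # first write: the file is created
second_try <- tryCatch(safe_write_csv(iris, path),
                       error = function(e) "refused")
second_try                       # "refused": the existing file was left alone
```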
    6. Style
    7. Not this kind of style…
    8. This kind of style!!!
    9. Style
       Style is important so that YOU and OTHERS can read your code and actually use it
       Google style guide:
         http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
       Henrik Bengtsson's style guide:
         http://www1.maths.lth.se/help/R/RCC/
       Hadley Wickham's style guide:
         https://github.com/hadley/devtools/wiki/Style
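To make the point concrete, here is the same small computation written twice. The naming choice (lowercase words joined by underscores) is an assumption for illustration; the guides listed above disagree on naming details, so pick one convention and stick to it.

```r
# Hard to read: cryptic names, no spaces, two statements on one line
x<-c(5.1,4.9,4.7);y<-sum(x)/length(x)

# Easier to read: descriptive lowercase names, spaces around <- and
# operators, a space after each comma, one statement per line
sepal_lengths <- c(5.1, 4.9, 4.7)
mean_sepal_length <- sum(sepal_lengths) / length(sepal_lengths)
```

Both versions compute the same value; only the second is pleasant to come back to a month later.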
    10. Preparing your data for R
        What makes clean data?
        Correct spelling
        Identical capitalization (e.g., Premna vs. premna)
          R is case-sensitive: if myvector <- c(3, 4, 5), calling Myvector does not work!
        No spaces between words (read.csv() turns spaces in column names into ".")
          Generally try to avoid them; use underscores instead
        NA or a blank cell (if using CSV) for missing values; blank cells are read as NA in numeric columns
        Use find-and-replace to get rid of trailing spaces after words
        I generally keep both an .xls and a .csv file: you can always recreate your work in R from the .csv file and still modify the .xls file
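A quick way to see these rules in action. This sketch assumes a tiny made-up CSV (passed to read.csv() through its `text` argument rather than a file) containing the problems listed on the slide.

```r
# R is case-sensitive: Myvector and myvector are different names
myvector <- c(3, 4, 5)
exists("myvector")   # TRUE
exists("Myvector")   # FALSE (unless you created it yourself)

# A tiny made-up CSV with the problems above: spaces in column names,
# inconsistent capitalization, and two kinds of missing value
csv_text <- "plant species,leaf count
Premna,12
premna,
NA,4"
plants <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(plants)                   # "plant.species" "leaf.count"
plants$leaf.count               # 12 NA 4: the blank cell became NA
unique(plants$plant.species)    # "Premna" "premna" NA: a typo splits one species in two
```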
    11. Bringing data into R
        Create a CSV file
          One worksheet only
          No special formatting, filters, comments, etc.
          Copy only the columns and rows that contain your data into the CSV, as R sometimes reads in empty columns too
          Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
        In R, set the working directory
          setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
          What is the working directory? getwd()
          What is in the working directory? dir()
        Read in data
          CSV files: iris.df <- read.csv("iris_df.csv", header = TRUE)
          Clipboard: read.csv("clipboard", sep = "\t") reads in cells you have copied from a spreadsheet, like cutting and pasting them (Windows; on OS X use pipe("pbpaste") instead of "clipboard")
          From the web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
          From Excel files (using the XLConnect package):
            iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet = "Sheet1")
        Write data
          write.csv(dataframe, "dataframename.csv"), OR
          save(iris, file = "iris.RData") [and load("iris.RData") to open it in R]
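The read/write steps above can be tried end to end without touching your own folders. This is a minimal sketch that assumes R's temporary directory as the working directory and uses the built-in `iris` data; `iris_copy` is just an illustrative name.

```r
# Work in a temporary directory so this example doesn't touch your own files
old_wd <- setwd(tempdir())   # setwd() invisibly returns the previous directory
getwd()                      # the current working directory
dir()                        # files in it

# Write a data frame to CSV, then read it back
write.csv(iris, "iris_df.csv", row.names = FALSE)
iris.df <- read.csv("iris_df.csv", header = TRUE)
str(iris.df)                 # column types are re-guessed from the text

# save()/load() store R objects exactly, factors included.
# Note the file = argument: save(iris_copy, "iris.RData") would fail
iris_copy <- iris
save(iris_copy, file = "iris.RData")
rm(iris_copy)
load("iris.RData")           # recreates iris_copy in the workspace

setwd(old_wd)                # go back to where we started
```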
    12. R data structures
        Scalar: an object with a single value, either numeric or character (in R, really a vector of length 1)
        Vector: a sequence of values of one type (numeric, character, logical, ...), which may include NA
        List: an arbitrary collection of objects of any type; a very useful R object
        Character: text, e.g., "this is some text"
        Factor: like a character vector, but only with values from predefined "levels"
        Matrix: all values must be of one type (usually numeric)
        Dataframe: each column can be of a different class
        Immutable dataframe: a special dataframe (idata.frame() in the plyr package) for faster dataframe manipulation; it references the original dataframe instead of copying it
        Function
        Environment
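One small example of each structure, in base R only. The object names (`one_number`, `mixed`, `square`, and so on) are made up for illustration.

```r
one_number <- 42                  # a "scalar" is really a length-1 vector
length(one_number)                # 1

mixed <- c(1, "a", NA)            # one type per vector: 1 is coerced to "1"
class(mixed)                      # "character"

my_list <- list(n = 1:3, label = "site A", ok = TRUE)   # anything goes
species <- factor(c("Premna", "Ficus", "Premna"))      # levels are sorted
levels(species)                   # "Ficus" "Premna"

m <- matrix(1:6, nrow = 2)        # 2 rows x 3 columns, all integers
df <- data.frame(id = 1:3, species = species)          # mixed column classes
sapply(df, class)                 # id "integer", species "factor"

square <- function(x) x^2         # functions are objects too
e <- new.env()                    # environments hold named objects
assign("answer", 42, envir = e)
get("answer", envir = e)          # 42
```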
    13. Exploring dataframes
        str(dataframe) gives column formats and dimensions
        head(dataframe) and tail(dataframe) give the first and last 6 rows
        names(dataframe) gives column names
        row.names(dataframe) gives row names
        attributes(dataframe) gives column names, row names, and object class
        summary(dataframe) gives a lot of good information
        Make sure variables are in the appropriate form
          character/string, numeric, factor, integer, logical
        Make sure mins, maxes, means, etc. seem right
        Make sure you don't have typing errors that turn Premna and premna into two separate factor levels
          Use unique(iris$Species) to see all unique values of a column
          Or use levels(iris$Species) to see the different levels of a factor
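The checks above, run on the built-in `iris` data, plus the "numbers that are really factors" case. The `survey` data frame is hypothetical, made up to show 0/1 codes and zip codes being converted to factors.

```r
str(iris)              # 150 obs. of 5 variables, plus each column's type
head(iris, 3)          # first 3 rows (the default is 6)
summary(iris)          # min, quartiles, mean, max; counts for factors
unique(iris$Species)   # all distinct values in a column
levels(iris$Species)   # "setosa" "versicolor" "virginica"

# Hypothetical data: 0/1 codes and zip codes look numeric but are labels
survey <- data.frame(alive = c(0, 1, 1, 0),
                     zip = c(77005, 77030, 77005, 77005))
survey$alive <- factor(survey$alive, levels = c(0, 1),
                       labels = c("dead", "alive"))
survey$zip <- factor(survey$zip)
str(survey)            # both columns are now factors
table(survey$alive)    # counts per level make sense; an "average zip code" does not
```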
    14. To attach or not to attach... that is the question
        Some like to use attach() to make dataframe variables accessible by name within the R session
        Generally, attach() is frowned upon by R junkies: attached variables are copies, so they can mask other objects and don't update when you change the dataframe
          Use dataframe$y, or data = dataframe, or dataframe[, "y"], or dataframe[, 2]
        To detach the object, use detach(dataframe)
        I recommend: do not use attach, but do what you want
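The alternatives to attach() listed above, side by side on the built-in `iris` data. The variable names `a`, `b`, `c1`, `d`, and `fit` are arbitrary; `with()` is included as a base-R way to get attach-like convenience for a single expression.

```r
a  <- iris$Sepal.Length              # by name with $
b  <- iris[, "Sepal.Length"]         # by name with [ , ]
c1 <- iris[, 1]                      # by position (breaks if columns move)
d  <- with(iris, Sepal.Length)       # temporary, local "attach"

fit <- lm(Petal.Length ~ Sepal.Length, data = iris)   # data = argument
coef(fit)
```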
    15. R packages
        3,262 packages!!!!
        Packages are extensions written by anyone for any purpose, usually installed and loaded with:
          install.packages("packagename"), then
          library(packagename) or require(packagename)
        Use ?functionname for help on any function in base R or in R packages
        In RStudio, just press Tab inside the parentheses after a function name to see the function's options!!!
        Explore packages at the CRAN site:
          http://cran.r-project.org/web/packages/
        Inside-R package reference:
          http://www.inside-r.org/packages
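One practical difference between library() and require(), sketched in base R. `notarealpackage123` is a deliberately fake package name used only to show what happens when a package is missing.

```r
# library() stops with an error if the package is not installed;
# require() returns TRUE or FALSE instead (with a warning when it fails)
has_stats <- require(stats)    # stats ships with R, so this is TRUE
has_fake  <- suppressWarnings(
  require(notarealpackage123, quietly = TRUE)   # hypothetical package name
)
has_fake                        # FALSE

# So require() fits inside if(); library() would just stop the script
if (!has_fake) message("notarealpackage123 is not installed")
```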
    16. Data manipulation
        Packages: plyr, data.table, doBy, sqldf, reshape2, and more
        Comparison of packages
          Modified from code from the Recipes, scripts and Genomics blog: https://gist.github.com/878919
          data.table is by far the fastest!!!
          BUT plyr may win on ease of use and flexibility. See for yourself...
        Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
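For reference, the kind of grouped summary these packages speed up can also be written in base R alone. This sketch uses tapply(), aggregate(), and split()/sapply() on the built-in `iris` data; it is not the package comparison from the slide, just a baseline to compare against.

```r
# Mean sepal length per species with tapply(): returns a named array
by_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)
by_tapply

# The same with aggregate(): returns a data frame, one row per group
by_aggregate <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
by_aggregate

# The split-apply-combine pattern that plyr is built around, by hand
pieces <- split(iris$Sepal.Length, iris$Species)   # split
means  <- sapply(pieces, mean)                      # apply + combine
```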
    17. Visualizations
        A few different approaches:
          Base graphics
          Lattice graphics
          Grid graphics
          ggplot2 graphics
        Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
        An example: [figure not preserved]
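A small base-graphics example, assuming you want the plot saved to a file rather than shown on screen. It writes a PDF of the built-in `iris` data to R's temporary directory; the file name `iris_scatter.pdf` is arbitrary.

```r
out_file <- file.path(tempdir(), "iris_scatter.pdf")
pdf(out_file, width = 6, height = 4)   # open a file-based graphics device
plot(Petal.Length ~ Sepal.Length, data = iris,
     col = as.integer(iris$Species),  # colour 1, 2, 3 by species
     pch = 19,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
invisible(dev.off())                   # close the device so the file is written
```

For comparison, roughly the same plot in ggplot2 (with the package installed) would be ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) + geom_point(), which builds the legend for you.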
    18. More on ggplot2 graphics
        There are classes taught by Hadley Wickham here at Rice if you want to learn more!
          Data visualization (Stat645): http://had.co.nz/stat645/
          Statistical computing (Stat405): http://had.co.nz/stat405/
        Hadley's website is really helpful: http://had.co.nz/ggplot2/
        The ggplot2 Google Groups site: https://groups.google.com/forum/#!forum/ggplot2
    19. Quick RStudio run-through
        Keyboard shortcuts!!
          http://www.rstudio.org/docs/using/keyboard_shortcuts
    20. Use case here
        [see intro_usecase.R file]