Getting Started with R


Quick introduction to the open source R programming language.

Published in: Data & Analytics
  R Part 1: Getting Started, Language Basics and Data Visualization
  Jen Underwood, Founder & Principal Consultant, Impact Analytix, LLC
  Impact Analytix, LLC
  o Impact Analytix, LLC is a boutique business intelligence and predictive analytics firm based in Tampa, Florida.
  o Jen Underwood, Founder & Principal Consultant
    • ~20 years of business intelligence industry experience
    • Former Global Microsoft BI and Analytics Technical Product Manager and seasoned Big-Four Consulting BI Practice Lead
    • Passionate technology blogger, evangelist and volunteer: TDWI, BeyeNETWORK, PASS, SharePoint Conference, and Microsoft TechEd
    • Bachelor of Business Administration degree, University of Wisconsin Milwaukee
    • Post Graduate Certificate, Computer Science - Data Mining, University of California, San Diego
  Getting Started
  Introduction
  o R is a popular open source statistical platform
  o Base install with ~8 packages
  o Extended via packages
    • The R package foreign allows reading data from SAS, SPSS and others
    • RODBC, xlsx and many others for connectivity
  o Huge and growing global developer community
  o Commonly used R tools
    • The R Project for Statistical Computing: download the latest R from CRAN
    • RStudio IDE: popular for development; free, open source, and works great on Windows, Mac, and Linux
  R
  o RGui.exe is the default user interface to the command line language
  o The R Console prompt is >
  o To quit, type q(); to clear the console, press Ctrl + L (this clears the screen, not the workspace)
  o Case-sensitive: "jen", "Jen" and "jEn" are three different names
  o For help, use ?topic, help() or help(function name)
  o Use print(variable name) to see variable content, or type the variable at the command prompt and hit Enter
  o Use # for comments in R
  o Use <- to assign a value to a variable
      > x <- "hello world"
      > x
      [1] "hello world"
      > 1+2
      [1] 3
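The console basics above can be tried in a short script. This is a minimal sketch; the variable names (jen, Jen, total, jen_differs) are illustrative and not from the slides.

```r
# Assignment uses <- ; print() or the bare name shows the value
x <- "hello world"
print(x)

# Names are case-sensitive: jen and Jen are two separate variables
jen <- 1
Jen <- 2
jen_differs <- !identical(jen, Jen)
print(jen_differs)

# Arithmetic typed at the prompt is evaluated immediately
total <- 1 + 2
print(total)

# In an interactive session, help(print) or ?print opens the documentation
```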
  RStudio
  o Source R files
  o Console for interactive work
  o Variables and command history
  o Installed packages, help, other goodies
  Development Environment
  o Check the working directory with getwd() and set it with setwd("directory path")
  o Save the workspace with save() or save.image(), or use the menu
  o Restore it with load("file name")
  o install.packages("package name") or use the Tools > Install Packages menu
  o Popular repositories for R
  o Use library("package name") to load a package only when needed to save memory
  o detach("package:package name", unload = TRUE) unloads the package
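A rough sketch of that workflow follows. It assumes a throwaway directory created with tempfile() and an illustrative object named scores; the tools package is used only because it ships with R, so no download is needed.

```r
# Hypothetical scratch directory so the sketch leaves no files in your project
scratch <- tempfile("r_demo_")
dir.create(scratch)

old_wd <- getwd()   # getwd() reports the current working directory
setwd(scratch)      # setwd() changes it

scores <- c(90, 85, 77)
save(scores, file = "scores.RData")  # save selected objects to a file
rm(scores)                           # remove the object from the workspace
load("scores.RData")                 # restore it from the file

setwd(old_wd)       # return to the original working directory

# install.packages("ggplot2")  # needs a network connection; run once per machine
library("tools")               # attach a package that ships with R
detach("package:tools")        # remove it from the search path again
```

save.image() works the same way but writes every object in the workspace instead of a chosen few.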
  Development Environment
  o Objects
    • Variables, arrays of numbers, strings, functions, structures
    • Use memory
    • objects() to see them
    • rm(object name) to remove them
    • If saved, they are stored in the working directory in a file called .RData
  o R function calls take the form function(arguments, options)
  o Vectors, Lists, Arrays or Matrices and Data Frames
  o A data frame in R is like a database table
  o Many ways to get data into and out of R
  o The R Data Import/Export help has a plethora of options for working with data
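To make the object-management points concrete, here is a small sketch. The data frame cities and its columns are made-up illustration data, not from the presentation.

```r
# A data frame behaves like a database table: named columns, one row per record
cities <- data.frame(
  name   = c("Tampa", "Milwaukee", "San Diego"),
  visits = c(12, 3, 5)
)
str(cities)   # compact summary of the structure

# Objects live in memory until removed
scratch_values <- 1:10
print(objects())      # lists the objects in the current environment
rm(scratch_values)    # frees the object
```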
  Reading from Files
  o Import Dataset menu in RStudio
  o Copy from the clipboard using read.delim("clipboard") or scan()
  o read.csv and read.delim read a file in table format and create a data frame from it, with cases corresponding to lines and variables to fields in the file.
      read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
      read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
  o read.xlsx from the xlsx package reads Excel worksheets
      read.xlsx(file, sheetIndex, sheetName=NULL, rowIndex=NULL, startRow=NULL, endRow=NULL, colIndex=NULL, header=TRUE, colClasses=NA, keepFormulas=FALSE, encoding="unknown", ...)
  o Other packages like gdata or XLConnect
      wb <- loadWorkbook("C:/Users/Jen/Documents/BikeBuyers.xlsx")
      readWorksheet(wb, sheet = "BikeBuyers", startRow = 0, endRow = 10, startCol = 0, endCol = 0)
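The base readers can be exercised without any external file. The sketch below writes two tiny sample files to temporary paths first; the file contents and the names buyers and buyers_tab are invented for illustration.

```r
# Create a small comma-separated file to read back (sample data is made up)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,income", "1,Ann,52000.5", "2,Bob,61000"), csv_path)

# read.csv returns a data frame; header = TRUE takes column names from line 1
buyers <- read.csv(csv_path, header = TRUE)
str(buyers)

# read.delim is the same reader with tab ("\t") as the default separator
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tAnn", "2\tBob"), tsv_path)
buyers_tab <- read.delim(tsv_path)
str(buyers_tab)
```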
  Connecting to a Database
  o RODBC package for database connectivity; many vendor-specific R packages are also available
  o Other packages like sqlutils for database query and procedure calling functionality
      odbcConnect(dsn, uid = "", pwd = "", ...)
      odbcDriverConnect(connection = "", case, believeNRows = TRUE, colQuote, tabQuote = colQuote, interpretDot = TRUE, DBMSencoding = "", rows_at_time = 100, readOnlyOptimize = FALSE)
      odbcConnectAccess(access.file, uid = "", pwd = "", ...)
      odbcConnectExcel(xls.file, readOnly = TRUE, ...)
  Querying a Database
  Function                                                      Description
  odbcConnect(dsn, uid="", pwd="")                              Open a connection to an ODBC database
  sqlFetch(channel, sqtable)                                    Read a table from an ODBC database into a data frame
  sqlQuery(channel, query)                                      Submit a query to an ODBC database and return the results
  sqlSave(channel, mydf, tablename = sqtable, append = FALSE)   Write a data frame to a table in the ODBC database, or add rows to an existing table (append = TRUE)
  sqlDrop(channel, sqtable)                                     Remove a table from the ODBC database
  close(channel)                                                Close the connection
  Querying a Database
      # RODBC Example
      # import data from a DBMS
      library(RODBC)
      myconn <- odbcConnect("mydsn", uid="Jen", pwd="demo")
      demoDf <- sqlQuery(myconn, "select top 10 * from dbo.FactInternetSales")
      close(myconn)
  Writing to Files
  o writeClipboard exports vector data
  o write.table converts an object to a data frame and prints it to a file or connection
      write.table(x, file = "foo.csv", sep = ",", col.names = NA, qmethod = "double")
      write.csv(x, file = "foo.csv")
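A quick round trip shows what these calls do. This is a sketch with made-up data written to temporary files; the names x, back, and back_with_rownames are illustrative.

```r
# Sample data; one value contains embedded double quotes on purpose
x <- data.frame(id = 1:3, label = c("a", "say \"hi\"", "c"))

# write.csv: comma-separated, embedded quotes doubled; drop row names here
csv_path <- tempfile(fileext = ".csv")
write.csv(x, file = csv_path, row.names = FALSE)
back <- read.csv(csv_path)

# write.table with col.names = NA keeps row names and writes a blank
# header cell above them, so the file lines up in a spreadsheet
tbl_path <- tempfile(fileext = ".csv")
write.table(x, file = tbl_path, sep = ",", col.names = NA, qmethod = "double")
back_with_rownames <- read.csv(tbl_path, row.names = 1)

print(back)
print(back_with_rownames)
```

Both files read back with the same two columns; the second simply carries the row names along as its first field.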
  R Language Basics
  Basics
  o Expressions: 1+1, 10*10, "Hello World"
  o Logical values: TRUE or FALSE
  o Variables: store a value in a variable using <-
      x <- 42
  o Functions: a name followed by one or more arguments in parentheses
      sum(1,3,4)
      sqrt(16)
      help(plot)
  Basics
  o Vectors: numbers, strings, logical values, or any other type, as long as they're all the same type; c (combine) creates a new vector by combining a list of values
      c(4, 7, 9)
  o Matrices: 2-dimensional arrays
      matrix(0, 3, 4)
  o Data Frames: similar to a database table or an Excel spreadsheet
      demoDF <- data.frame(word = c("king", "joy", "pen"))  # c() alone would give a vector; the column name word is illustrative
      demoDF2 <- read.csv("C:/Users/Jen/Documents/BikeBuyers.csv")
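The three structures can be compared side by side. The sketch below uses illustrative names (v, mixed, m, words_df) and shows one consequence of the "same type" rule: mixing types in c() silently converts everything to a common type.

```r
# Vector: all elements share one type
v <- c(4, 7, 9)
print(sum(v))

# Mixing types coerces to the most general one, here character
mixed <- c(1, "two", TRUE)
print(class(mixed))

# Matrix: 3 rows by 4 columns, filled with zeros
m <- matrix(0, 3, 4)
m[2, 3] <- 5          # index as [row, column]
print(dim(m))

# Data frame: columns may differ in type, rows line up like a table
words_df <- data.frame(word = c("king", "joy", "pen"),
                       n_letters = c(4, 3, 3))
print(words_df[words_df$n_letters == 3, ])
```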
  R Data Visualization
  Graphic Packages
  o R graphics functions can be grouped into three types:
    • High-level plotting functions that create a graph, often with axis labels and titles
    • Low-level plotting functions that allow additional information to be added to an existing graph, or that allow graphs to be drawn from scratch
    • Interactive graphics functions that allow extraction of information
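A small sketch of the first two types, using the built-in iris data set. The graph is written to a temporary PDF so the script also runs without a display; the titles, labels, and the choice of a linear fit are illustrative assumptions.

```r
# Draw into a temporary PDF file instead of an on-screen window
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)

# High-level function: creates the graph with axes, labels and a title
plot(iris$Petal.Length, iris$Petal.Width,
     main = "Iris petals", xlab = "Petal length", ylab = "Petal width")

# Low-level functions: add to the graph that already exists
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
abline(fit, lty = 2)                                  # fitted line
points(mean(iris$Petal.Length), mean(iris$Petal.Width),
       pch = 19)                                      # mark the mean point
legend("topleft", legend = c("Observations", "Linear fit"),
       pch = c(1, NA), lty = c(NA, 2))

dev.off()  # close the device so the file is written

# Interactive functions such as locator() and identify() need an
# on-screen device and a mouse, so they are not called here
```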
  Graphic Examples
      > example(plot)
      > example(barplot)
      > example(boxplot)
      > example(dotchart)
      > example(coplot)
      > example(hist)
      > example(fourfoldplot)
      > example(stars)
      > example(image)
      > example(contour)
      > example(filled.contour)
      > example(persp)
  Graphing with Sample Data Sets
  o R comes with a package of base datasets
      library(help = "datasets")
  o Use the print function to explore content
      print(iris)
  o Start to play/explore using R visualizations
      plot(iris$Petal.Length, iris$Petal.Width)
      install.packages("ggplot2")
      library("ggplot2")
      qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
  Additional R Resources
  Resources
  o Free O'Reilly R School
  o CRAN Intro
  o R Tutor
  o One Page Survival Guide survival-guide-to-data-science-with-r
  o Popular Books
    • R Cookbook, R in a Nutshell, R for Business Analytics
