
Getting Started with R



Quick introduction to the open source R programming language.

Published in: Data & Analytics


  1. © 2014 Impact Analytix, LLC: quickly make a positive impact
     R Part 1: Getting Started, Language Basics and Data Visualization
     Jen Underwood, Founder & Principal Consultant, Impact Analytix, LLC, 813.435.5344
  2. Impact Analytix, LLC
     o Impact Analytix, LLC is a boutique business intelligence and predictive analytics firm based in Tampa, Florida.
     o Jen Underwood, Founder & Principal Consultant
       • ~20 years of business intelligence industry experience
       • Former Global Microsoft BI and Analytics Technical Product Manager and seasoned Big-Four consulting BI practice lead
       • Passionate technology blogger, evangelist and volunteer: TDWI, BeyeNETWORK, PASS, SharePoint Conference, and Microsoft TechEd
       • Bachelor of Business Administration degree, University of Wisconsin-Milwaukee
       • Post Graduate Certificate in Computer Science (Data Mining), University of California, San Diego
  3. Getting Started
  4. Introduction
     o R is a popular open-source statistical computing platform
     o The base install comes with ~8 packages
     o R is extended via add-on packages
       • The foreign package reads data from SAS, SPSS and others
       • RODBC, xlsx and many others provide connectivity
     o Huge and growing global developer community
     o Commonly used R tools
       • The R Project for Statistical Computing: download the latest R from CRAN
       • RStudio IDE, popular for development: free, open source, and runs on Windows, Mac, and Linux
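To see which packages a particular installation actually ships with (the exact list varies by R version, so it may not match the ~8 above), a minimal sketch can query the package library and the search path. It assumes only base R; the foreign package is a "recommended" package that is normally installed alongside R, so the code checks for it before attaching it.

```r
# Packages with priority "base" ship with every R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages currently attached to the search path
print(search())

# 'foreign' is a "recommended" package, normally installed alongside R;
# it provides readers such as read.spss(), read.xport() (SAS XPORT) and read.dta() (Stata)
if (requireNamespace("foreign", quietly = TRUE)) {
  library(foreign)
}

# Add-on packages are installed from CRAN, for example:
# install.packages("RODBC")
```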
  5. R
     o RGui.exe is the default Windows user interface to the R command-line language
     o The R Console shows a > prompt
     o To quit, type q(); to clear the console, press Ctrl + L (to clear the workspace, use rm(list = ls()))
     o R is case-sensitive: "jen", "Jen" and "jEn" are three different names
     o For help, use ?function name, help() or help(function name)
     o Use print(variable name) to see a variable's content, or type the variable name at the prompt and press Enter
     o Use # for comments
     o Use <- to assign a value to a variable:
       > x <- "hello world"
       > x
       [1] "hello world"
       > 1+2
       [1] 3
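The console basics above can be tried as a short script. This is a minimal sketch using only base R; the variable names are illustrative. It shows assignment with <-, comments, explicit versus automatic printing, and case sensitivity.

```r
# Assign with <-; everything after # on a line is a comment
x <- "hello world"
print(x)              # explicit print
x                     # typing the name at the prompt auto-prints the value

# R is case-sensitive: these are three distinct variables
jen <- 1
Jen <- 2
jEn <- 3
print(c(jen, Jen, jEn))   # [1] 1 2 3

1 + 2                     # [1] 3

# Help: ?print and help(print) open the same page in an interactive session.
# rm(list = ls()) removes every object in the workspace;
# Ctrl + L only clears the console display.
```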
  6. RStudio
     o Source R files
     o Console for interactive work
     o Variables and command history
     o Installed packages, help, and other goodies
  7. Development Environment
     o Check the working directory with getwd() and set it with setwd("path")
     o Save objects with save() or the whole workspace with save.image(), or use the menu
     o Restore saved objects with load("file name")
     o Install packages with install.packages("package name") or the Tools > Install Packages menu
     o Popular repositories for R
     o Use library("package name") to load a package only when needed, to save memory
     o detach("package:package name") removes a package from the search path (add unload = TRUE to also unload it)
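The working-directory, save/load and library/detach steps can be exercised together in a minimal sketch. To avoid touching real files it switches to a temporary folder, and it uses the splines package only because it ships with R (any package would do); the object name scores and the file name scores.RData are illustrative.

```r
old_wd <- getwd()              # where R currently reads and writes relative paths
setwd(tempdir())               # switch to a scratch folder for this demo

# save() writes chosen objects; load() restores them
scores <- c(90, 85, 77)
save(scores, file = "scores.RData")
rm(scores)
load("scores.RData")           # 'scores' is back in the workspace

# save.image() writes every workspace object (default file name ".RData")
save.image()

# Attach a package only while it is needed, then detach it again.
# 'splines' ships with R, so no install.packages() call is needed here.
library("splines")
detach("package:splines", unload = TRUE)

setwd(old_wd)                  # restore the original working directory
```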
  8. Development Environment
     o Objects
       • Variables, arrays of numbers, strings, functions, structures
       • Objects use memory
       • objects() lists them; rm(object name) removes them
       • If saved, they are stored in the working directory in a file called .RData
     o R function calls (with arguments and options)
     o Data structures: vectors, lists, arrays or matrices, and data frames
       • A data frame in R is like a database table
     o There are many ways to get data into and out of R
       • The R Data Import/Export manual describes a plethora of options for working with data
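A minimal sketch of listing and removing objects, and of the main container types named above, using only base R. The object names and the city values are illustrative.

```r
n <- 42
greeting <- "hi"
square <- function(v) v^2          # functions are objects too

print(objects())                   # objects() is the same as ls()
rm(greeting)                       # remove one object

vec <- c(1, 2, 3)                              # vector: a single type
lst <- list(id = 1, name = "Jen")              # list: mixed types allowed
arr <- array(1:24, dim = c(2, 3, 4))           # array: any number of dimensions
mat <- matrix(1:6, nrow = 2)                   # matrix: a 2-D array
cities_df <- data.frame(id = 1:3,
                        city = c("Tampa", "Miami", "Orlando"))  # like a database table
str(cities_df)
```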
  9. Reading from Files
     o Import Dataset menu in RStudio
     o Copy from the clipboard with read.delim("clipboard") or with scan()
     o read.table() and its variants read a file in table format and create a data frame from it, with cases corresponding to lines and variables to fields in the file:
       read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
       read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
     o read.xlsx() from the xlsx package reads Excel workbooks:
       read.xlsx(file, sheetIndex, sheetName=NULL, rowIndex=NULL, startRow=NULL, endRow=NULL, colIndex=NULL, header=TRUE, colClasses=NA, keepFormulas=FALSE, encoding="unknown", ...)
     o Other packages, such as gdata or XLConnect, also read Excel files:
       wb <- loadWorkbook("C:/Users/Jen/Documents/BikeBuyers.xlsx")
       readWorksheet(wb, sheet = "BikeBuyers", startRow = 0, endRow = 10, startCol = 0, endCol = 0)
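To see read.csv() and read.delim() side by side without needing a real file, a minimal sketch can write the same small table (hypothetical columns Name, Age and Bought) as comma- and tab-separated text to temporary files and read both back. Note the escaped tab "\t"; a bare "t" would not work as a separator.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Bought",
             "Ann,34,TRUE",
             "Bob,52,FALSE"), csv_path)

# read.csv(): a header row and a comma separator are the defaults
buyers <- read.csv(csv_path)
str(buyers)

# The same data as tab-delimited text
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("Name\tAge\tBought",
             "Ann\t34\tTRUE",
             "Bob\t52\tFALSE"), tsv_path)
buyers_tab <- read.delim(tsv_path)   # sep = "\t" by default

identical(buyers, buyers_tab)        # TRUE: both produce the same data frame
```

Both readers guess column types, so Age comes back as integer and Bought as logical.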
  10. Connecting to a Database
      o The RODBC package provides database connectivity; many vendor-specific R packages are also available
      o Other packages, such as sqlutils, add database query and procedure-calling functionality
      o RODBC connection functions:
        odbcConnect(dsn, uid = "", pwd = "", ...)
        odbcDriverConnect(connection = "", case, believeNRows = TRUE, colQuote, tabQuote = colQuote, interpretDot = TRUE, DBMSencoding = "", rows_at_time = 100, readOnlyOptimize = FALSE)
        odbcConnectAccess(access.file, uid = "", pwd = "", ...)
        odbcConnectExcel(xls.file, readOnly = TRUE, ...)
  11. Querying a Database
      o odbcConnect(dsn, uid="", pwd=""): open a connection to an ODBC database
      o sqlFetch(channel, sqtable): read a table from an ODBC database into a data frame
      o sqlQuery(channel, query): submit a query to an ODBC database and return the results
      o sqlSave(channel, mydf, tablename = sqtable, append = FALSE): write or update (append = TRUE) a data frame to a table in the ODBC database
      o sqlDrop(channel, sqtable): remove a table from the ODBC database
      o close(channel): close the connection
  12. Querying a Database
      # RODBC example: import data from a DBMS
      library(RODBC)
      # "mydsn" must be an ODBC data source name configured on this machine
      myconn <- odbcConnect("mydsn", uid = "Jen", pwd = "demo")
      # "select top 10" is SQL Server (T-SQL) syntax
      demoDf <- sqlQuery(myconn, "select top 10 * from dbo.FactInternetSales")
      close(myconn)
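One refinement worth considering is making sure the connection is closed even when a query fails. The helper below is hypothetical (query_dsn is not part of RODBC): a minimal sketch that wraps the same odbcConnect(), sqlQuery() and odbcClose() calls and uses on.exit() for cleanup. Actually running it requires the RODBC package and a configured DSN.

```r
# Hypothetical helper (not part of RODBC): query one DSN and always close the channel
query_dsn <- function(dsn, sql, uid = "", pwd = "") {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect() returns -1 with a warning, not an error, when it cannot connect
  if (!inherits(channel, "RODBC")) stop("could not connect to DSN: ", dsn)
  on.exit(RODBC::odbcClose(channel))   # runs even if sqlQuery() fails
  RODBC::sqlQuery(channel, sql)
}

# Usage with the DSN and query from the slide (requires RODBC and a configured DSN):
# demoDf <- query_dsn("mydsn", "select top 10 * from dbo.FactInternetSales",
#                     uid = "Jen", pwd = "demo")
```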
  13. Writing to Files
      o writeClipboard() exports vector data to the clipboard (Windows only)
      o write.table() converts an object to a data frame and prints it to a file or connection:
        write.table(x, file = "foo.csv", sep = ",", col.names = NA, qmethod = "double")
      o write.csv() is a shortcut that produces the same CSV output:
        write.csv(x, file = "foo.csv")
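The claim that write.csv() matches the longer write.table() call can be checked directly. This minimal sketch writes a small illustrative data frame both ways to temporary files, compares the results, and reads one back with row.names = 1 so the leading row-name column does not become a data column.

```r
df_out <- data.frame(id = 1:3, score = c(9.5, 7.25, 8))

# Long form: col.names = NA leaves an empty header cell above the row names
path_table <- tempfile(fileext = ".csv")
write.table(df_out, file = path_table, sep = ",", col.names = NA, qmethod = "double")

# Shortcut: write.csv() fixes sep = ",", dec = "." and qmethod = "double"
path_csv <- tempfile(fileext = ".csv")
write.csv(df_out, file = path_csv)

identical(readLines(path_table), readLines(path_csv))   # TRUE: same file contents
readLines(path_csv)[1]    # header line: "","id","score"

# Read it back; row.names = 1 turns the first column into row names again
back <- read.csv(path_csv, row.names = 1)
```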
  14. R Language Basics
  15. Basics
      o Expressions: 1+1, 10*10, "Hello World"
      o Logical values: TRUE or FALSE
      o Variables: store a value in a variable using <-, e.g. x <- 42
      o Functions: a name followed by zero or more arguments in parentheses, e.g. sum(1,3,4), sqrt(16), help(plot)
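Each kind of building block on this slide can be typed at the prompt. A minimal sketch in base R; the variable name big is illustrative, and the comments show what the console prints.

```r
1 + 1                      # [1] 2
10 * 10                    # [1] 100
"Hello World"              # [1] "Hello World"

big <- 10 * 10 > 50        # comparisons produce the logical values TRUE or FALSE
x <- 42                    # store a value in a variable

sum(1, 3, 4)               # [1] 8
sqrt(16)                   # [1] 4
```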
  16. Basics
      o Vectors: hold numbers, strings, logical values, or any other type, as long as they are all the same type; c() (combine) creates a new vector by combining a list of values, e.g. c(4, 7, 9)
      o Matrices: 2-dimensional arrays, e.g. matrix(0, 3, 4) creates a 3 x 4 matrix of zeros
      o Data frames: similar to a database table or an Excel spreadsheet
        demoDF <- data.frame(word = c("king", "joy", "pen"))
        demoDF2 <- read.csv("C:/Users/Jen/Documents/BikeBuyers.csv")
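A minimal sketch of the three structures, in base R. It shows what happens when a vector is given mixed types (everything is coerced to one type), checks the matrix shape, and filters a small data frame; the columns word and len are illustrative names.

```r
v <- c(4, 7, 9)                 # numeric vector
mixed <- c(1, "a", TRUE)        # mixed inputs are coerced to a single type
class(mixed)                    # [1] "character"

m <- matrix(0, 3, 4)            # 3 rows x 4 columns of zeros
dim(m)                          # [1] 3 4

# A data frame: each column is a vector, and columns may differ in type
demoDF <- data.frame(word = c("king", "joy", "pen"), stringsAsFactors = FALSE)
demoDF$len <- nchar(demoDF$word)    # add a computed column
demoDF[demoDF$len == 3, ]           # keep rows where len is 3, much like a SQL WHERE clause
```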
  17. R Data Visualization
  18. Graphic Packages
      o R graphics functions can be grouped into three types:
        • High-level plotting functions create a new graph, often with axis labels and titles
        • Low-level plotting functions add information to an existing graph, or draw graphs from scratch
        • Interactive graphics functions let you extract information from a graph, for example with the mouse
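The three types can be told apart in a short base-graphics sketch using the built-in cars data set. It draws to a temporary PDF so it also runs without a screen; the interactive call is left commented out because identify() needs an on-screen device and mouse clicks.

```r
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)                      # draw to a file instead of the screen

# High-level: plot() creates a new graph, including axes and labels
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)", main = "cars data set")

# Low-level: these calls add to the graph that already exists
abline(lm(dist ~ speed, data = cars), col = "red")     # fitted regression line
points(mean(cars$speed), mean(cars$dist), pch = 19)     # mark the mean point
legend("topleft", legend = c("data", "linear fit"),
       pch = c(1, NA), lty = c(NA, 1), col = c("black", "red"))

dev.off()

# Interactive: identify() and locator() read mouse clicks on an on-screen device
# identify(cars$speed, cars$dist)
```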
  19. Graphic Examples
      > example(plot)
      > example(barplot)
      > example(boxplot)
      > example(dotchart)
      > example(coplot)
      > example(hist)
      > example(fourfoldplot)
      > example(stars)
      > example(image)
      > example(contour)
      > example(filled.contour)
      > example(persp)
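In an interactive session example() pauses between plots. A minimal sketch for running several of these demos in one go, assuming a non-interactive run where the output should go to a file: it loops over a few of the topics listed above, passing each name through a variable.

```r
# Draw into a PDF file so the demos also run in a non-interactive session
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)

for (topic in c("plot", "barplot", "hist", "boxplot")) {
  # character.only = TRUE lets example() take the topic name from a variable;
  # ask = FALSE stops it from pausing between plots
  example(topic, character.only = TRUE, ask = FALSE, echo = FALSE)
}

dev.off()
```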
  20. Graphing with Sample Data Sets
      o R comes with a package of base datasets: library(help = "datasets")
      o Use the print function to explore content: print(iris)
      o Start to play and explore using R visualizations:
        plot(iris$Petal.Length, iris$Petal.Width)
        install.packages("ggplot2")
        library("ggplot2")
        qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
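If ggplot2 is not installed, a similar species-colored scatter plot can be drawn with base graphics. This minimal sketch also uses head() and str() as lighter alternatives to printing all 150 rows of iris. The ggplot() version is shown only as a comment, since it needs ggplot2; recent ggplot2 releases deprecate qplot() in favor of ggplot().

```r
head(iris)                 # first 6 rows instead of all 150
str(iris)                  # column names and types

pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)
# Base-graphics counterpart of the qplot() call: color points by Species
plot(iris$Sepal.Length, iris$Petal.Length,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Sepal.Length", ylab = "Petal.Length")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
dev.off()

# With ggplot2 installed:
# library(ggplot2)
# ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) + geom_point()
```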
  21. Additional R Resources
  22. Resources
      o Free O'Reilly R School
      o CRAN Intro
      o R Tutor
      o One Page Survival Guide (survival-guide-to-data-science-with-r)
      o Popular books: R Cookbook, R in a Nutshell, R for Business Analytics