Successfully reported this slideshow.
Your SlideShare is downloading. ×

R language introduction

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Upcoming SlideShare
Programming in R
Programming in R
Loading in …3
×

Check these out next

1 of 27 Ad

More Related Content

Slideshows for you (20)

Viewers also liked (17)

Advertisement

More from Shashwat Shriparv (20)

Advertisement

Recently uploaded (20)

R language introduction

  1. 1. R Language dwivedishashwat@gmail.com
  2. 2. Background • 1991: Created by Ross Ihaka and Robert Gentleman • 2000: R version 1.0.0 is released • Latest version is 2.15.2 released in Oct „12 • R version 2.15.3 is scheduled to release in Mar „12 and 3.0.0 is scheduled to be released in Apr ‟13 • http://www.r-project.org (basic information about R) • http://www.cran.r-project.org (base system and additional packages) • help() or ?help, help.search() or ??help
  3. 3. Background • R is a free software environment for statistical computing and graphics • Very active and vibrant user community • Graphical capabilities • Physical memory • Base R and around 4000 packages 1/21/2014
  4. 4. Introduction  memory.limit(): To find out maximum amount of available physical memory  memory.size(): To find out how much memory is in use  getwd(): Shows the path of your current working directory  setwd(path): Allows you to set a new path for your current working directory  dir(): List down all the files in your working directory  Program Editor (open, load, run, save) • ls(): List all objects in your workspace • rm(): Removes object from your workspace
  5. 5. Introduction  Commands to R are expressions (4/3) or assignments (x <- 4/3)  R is case sensitive  Everything in R is a object  Normally R objects are accessed by their names which is made up from letters, and digits (0 to 9) or a period (“.”) in non-initial positions.  Every object has a class  R has 5 basic classes of objects  character  numeric (real numbers)  integers  complex  logical (True / False) • The most basic object is a vector  A vector can only contains objects of the same class
  6. 6. Background • Ex. •  x <- 1 # assignment  Print(x) # explicit printing  X # auto printing Ex.         • Q.    • x <- c(0.5, 0.6) # numeric X <- c(TRUE, FALSE) # logical X <- c(T, F) # logical X <- c(“a”, “b”, “c”, “d”) # character X <- 1:20 # integer X <- c(1+0i, 2+4i) #complex seq(from=1, to=10, by=1) rep(c(1,2,3,4,5), times=2, each=2) X <- c(1.7, “a”) X <- c(TRUE, 2) X <- c(“a”, TRUE) When different objects are mixed in a vector, coercion occurs so that every element in the vector is of same class
  7. 7. Session 1 Remaining • rep(x, times, length.out, each) • rm() • rm(list=ls(pattern=“^test”)) • rm(list=ls(pattern=“test”)) • rm(list=setdiff(ls(), “test”)) • rm(list=ls())
  8. 8. Introduction • Ex.  X <- 0:6  X <- c(“a”, “b”, “c”)  X <- c(1, 2, 3)  Numbers in R generally treated as numeric (i.e. double precision real numbers)  If you explicitly wants an integer then you need to specify L suffix  Special number Inf (1/0), it actually a real number, 1/Inf will give you 0  Undefined value NaN (0/0) Not a Number, it can be though of as missing value • # indicates comments
  9. 9. Data Types  R objects can have attributes (attributes())  Class (class())  Length (length())  names (colnames for a matrix), dimnames (rownames, colnames for a matrix)  dimensions (dim())  other user defined attributes  Various data types in R  Vectors  Vector(mode, length) • Lists: Special type of vectors which can contain objects of different classes.  x <- list(1,2,3,“a”,”b”,”c”)  x <- list(a=c(1,2,3), b=1:4, c=c(“a”,”b”,”c”))
  10. 10. Data Types  Matrix: vectors with dimension attribute. Dimension itself is an integer vector of length 2 (nrow, ncol). Matrices are constructed column wise.       m <- matrix(nrow=2, ncol=3) m <- matrix(1:6, nrow=2, ncol=3) x <- 1:3 y <- 10:12 cbind(x, y) rbind(x,y)  Data frames (data.frame())  https://stat.ethz.ch/pipermail/rhelp/attachments/20101027/05a229bb/attachment.pl  Factors: Used for categorical data i.e. Male & Female or analyst, senior analyst, manager etc.      x <- factor(c(“a”, “b”, “b”, “c”, “c”, “c”, “d”)) levels() unclass(x) levels([4:6]) Levels([4:6, drop=TRUE])
  11. 11. Date & Time  Converting a character variable to a date variable  as.Date(variable_name, input_format)  strptime(variable_name, input_format)  Output will be %Y-%m-%d %H:%M:%S  %Y: Year with century  %m: Month as decimal number (01-12)  %d: Day of the month as decimal number(01-31)  %H: Hrs as decimal numbers (00-23)  %M: Minutes as decimal numbers (00-59)  %S: Seconda as decimal numbers (00-59)  Converting a date variable to a character variable / formatting a date variable  strftime(date_variable_name, output_format)  format(data_variable_name, output_format)  as.character(date_variable_name, output_format)
  12. 12. Sub-setting  [ always returns an object of the same class as the original; can be used to select more than one element  [[ is used to extract elements of list or data frames; it can only be used to extract single element and the class of the returned object will not necessarily be a list or data frame  $ is used to extract elements of a list or data frames by names; semantics are similar to [[
  13. 13. Operators  <: Less than  <=: Less than equals to  >: Greater than  >=: Greater than equals to  ==: Exactly equals to  !=: Not equal to  | or II: OR  & or &&: AND  !: NOT
  14. 14. Some Examples  x <- c(“a”, “b”, “c”, “c”, “d”, “a”)  x[1], x[1:4], x[x > “a”], u <- x >”a”  x <- matrix(1:6,2,3)  x[1,2], x[1,], x[,1], x[1,2, drop=FALSE]  x <- list(var_1=c(1:10), var_2=c(“a”, “b”, “c”), var_3=0.6)  x[1], x[[1]], x$var_1  name <- “var_1”, x[name], x[[name]], x$name  x[c(1,3)], x[[c(1,3)]], x[[1]][[3]]  Produce a character vector containing var_1, var_2, var_3… var_999  Remove missing values from x <- c(1, 2, 3, NA, 4, 5, NA, 6)  y <- c(“a”, “b”, NA, NA, “c”, “d”, “e”, “f”), prepare a matrix containing two columns x & y and does not have any missing value  What is the sum & mean of Wind for the observations which has temperature greater then 60 & month equals to 5  How to create a new directory with a given name
  15. 15. Reading / Writing Data Set  Principle functions for reading data into R  read.table(), read.csv(): Used for reading tabular data  readLines(): For reading lines of a text file  source(): For reading in R code file  dget(): For reading in R code file  load(): For reading in saved workspaces  unserialize(): For reading single R objects in binary form • Principle functions for writing data to files  write.table()  writeLines()  dump()  dput()  save()  serialize()
  16. 16. Importing / Exporting Data  Read.table() is one of the most commonly used function for reading data. Few important arguments;  file, name of the file to be read,  header, logical indicating if the file has a header line  sep, a string indicating how the columns are separated  colClasses, a character vector indicating class of each column in the dataset  nrows, the maximum number of rows to be read in the dataset  na.strings, a character vector of strings which are to be interpreted as NA values  comment.char, a character string indicating the comment character  skip, number of lines to skip from beginning  stringAsFactors, logical indicating should character variables be codes as factors  Write.table()  X, the object to be written, preferable a matrix or a data frame  File, path and name of the file to be created  Sep, a string indicating how the columns are separated  Row.names, col.names, logical indicating whether the row names or col names to be written along with x
  17. 17. Data Summary / Manipulation  attach(x): For attaching a file  detach(x): For detaching a file • summary(x): For displaying summary statistics of a data set • str(x): For displaying summary statistics of a data set in a different manner then summary() • sort(): For sorting a vector or factor • order(): For ordering along more than one variable • merge(): Merge two data frames by common columns or row names, or do other versions of database join operations • cut(x, breaks, labels): Divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.  cut(x, 10, 1:10)
  18. 18. Data Summary / Manipulation • pretty(x, n): Compute a sequence of about n+1 equally spaced „round‟ values which cover the range of the values in x.  pretty(x, 100) • substr(x, start, stop) <- value: Extract or replace substrings in a character vector. • strsplit(): Split the elements of a character vector x into substrings according to the matches to substring split within them. • rank(): Returns the sample ranks of the values in a vector. Ties (i.e., equal values) and missing values can be handled in several ways • aggregate(): Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.  ddply(): For each subset of a data frame, apply function then combine results into a data frame.
  19. 19. Control Structures  Allows you to control the flow of execution of the program  if, else (testing a condition)  if (condition) {do something} else if {do something different} else {do something different}  for (executing a loop fixed number of times)  for (i in 1:10) { do something}  while (executing a loop while a condition is true)  while (condition) { do something}  repeat (execute a infinite loop)  break (break the execution of a loop)  next (skip a iteration of a loop)  return (exit a function)  Create a vector with all integers from 1 to 1000 and replace all even number by their inverse
  20. 20. Loop Functions  lapply: Returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X  lapply(airquality, mean)  Calculate sum of all the variables of the airquality dataset excluding NAs  sapply: Sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate  sapply(airquality, mean)  Repeat the problem present in lapply using sapply and see the difference  apply: Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix  apply(airquality, 1, sum)  Calculate deciles including min and max of all the variables of the dataset airquality excluding NAs  Calculate square of each element of a matrix with dimensions 10 & 2 and entries 1 to 20
  21. 21. Loop Functions  tapply: Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors  tapply(airquality$Ozone, aiqruality$Month, sum)  Calculate sum of Ozone variable for observations having month equals to 5  mapply: mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each argument, the second elements, the third elements, and so on  mapply(rep, 1:4, 4:1)  Calculate sum of two lists with dimensions 10 & 2 and having entries 1 to 20, 101 to 120, 201 to 220 & 301 to 320
  22. 22. Plotting Functions  plot(x,y)  hist(x)  par()  pch: plotting symbol  lty: line type  lwd: line width  col: plotting color  las: axis label orientation  bg: background color  mar: margin size  oma: outer margin size  mfrow: number of plots per row, column (plots are filled row-wise)  mfcol: number of plots per row, column (plots are filled column-wise)
  23. 23. Plotting Functions  lines: add lines to the plot  points: add points to the plot  text: add text labels to the plot  title: add annotations to x, y axis labels, title, subtitle, outer margin  mtext: add text to the margins of the plot  axis: adding axis ticks/labels
  24. 24. Functions  function ()  Exact match –> Partial match –> Positional match  Return value of a function is the last expression in the function body to be evaluated  Functions can be nested, so that a function can be defined inside another function  Functions can be passed as arguments to other functions
  25. 25. Debugging • Primary tools for debugging functions in R  traceback: prints out the function call stack after an error occurs; does nothing if there is no error  debug: flags a function for debug mode which allows you to step through execution of a function one line at a time  browser: suspends the execution of a function whenever it is called and puts the function in debug mode  trace: allows you to insert debugging code into a function at specific places  recover: allows you to modify the error behavior so that you can browse the function call stack
  26. 26. Debugging  Indications that something‟s is not right  message: a generic notification/diagnostic message produced by the message function; execution of the function continues  warning: an indication that something is wrong but not necessarily fatal produced by warning function‟ execution of the function continues  error: an indication that a fatal problem has occurred produced by stop function; execution stops  condition: a generic concept for indicating that something unexpected can occur; programmers can create their own conditions
  27. 27. Thanks a lot For Question Read more  dwivedishashwat@gmail.com

×