
Introduction to R



Published in: Education, Technology

  1. Statistics Lab. Rodolfo Metulini, IMT Institute for Advanced Studies, Lucca, Italy. Introduction to R - 09.01.2014
  2. Getting help with functions. To get more information on any specific named function, for example solve, the command is > help(solve). An alternative is > ?solve. Running > help.start() launches a Web browser showing the HTML help home page. The ?? command searches for help in a different way: > ??solve searches the help pages of all installed packages for the term, so it is useful for finding help in packages that are installed but not loaded, or when you do not know the exact name of a function.
  3. Objects and saving data. The entities that R creates and manipulates are known as objects. During an R session, objects are created and stored by name. > objects() (or, equivalently, > ls()) displays the names of the objects currently stored within R, and > rm() removes objects. At the end of each R session, you are given the opportunity to save all the currently available objects. The objects (the workspace) are saved in .RData format in the current working directory, and the command lines of the session can be saved in .Rhistory format.
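A minimal sketch of the commands above, using throwaway objects. To avoid overwriting an existing workspace, it saves to a temporary file with save() and load() instead of writing the session's .RData (save.image() would save the whole workspace to .RData in the working directory). The file name example.RData is hypothetical.

```r
# Create two objects in the workspace
x <- 1:5
y <- "hello"
objects()                        # names of the stored objects (ls() is equivalent)
rm(y)                            # remove y
exists("y", inherits = FALSE)    # FALSE: y is gone

# Save x to a temporary .RData file (hypothetical name), remove it, then restore it
f <- file.path(tempdir(), "example.RData")
save(x, file = f)
rm(x)
load(f)                          # x is back, unchanged
x
```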
  4. Scalars and vectors: manipulation. To set up a vector named x, consisting of the numbers 1, 2, 3, 4 and 5, use the R command > x = c(1,2,3,4,5) or, equivalently, the assign function: > assign("x", c(1,2,3,4,5)). x is a vector of length 5, which we can check with > length(x). > 1/x gives the reciprocals of the five elements of x. > y = c(x,0,x) creates a vector with 11 entries consisting of two copies of x with a 0 in the middle.
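The same steps as a short script (a sketch; printed values are shown in the comments):

```r
x <- c(1, 2, 3, 4, 5)
assign("x", c(1, 2, 3, 4, 5))   # same effect as the line above
length(x)                       # 5
1/x                             # 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
y <- c(x, 0, x)                 # two copies of x with a 0 in the middle
length(y)                       # 11
```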
  5. Scalars and vectors: manipulation. Vectors can be used in arithmetic expressions. The vectors in an expression need not all have the same length: the result has the length of the longest vector, and the shorter vectors are recycled. For example, > v = 2*x + y + 1 generates a new vector of length 11 constructed by adding together, element by element, 2*x repeated 2.2 times, y repeated just once, and 1 repeated 11 times. WARNING: R computes this kind of expression even when it is probably a mistake; it only issues a warning when the longer length is not a multiple of the shorter one.
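A small sketch of recycling, reusing x and y from the previous slide. The first expression triggers R's warning because 11 is not a multiple of 5; the second one recycles silently.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(x, 0, x)
# 2*x (length 5) is recycled to length 11; R returns a result but warns:
# "longer object length is not a multiple of shorter object length"
v <- 2*x + y + 1
v                    # 4 7 10 13 16 3 6 9 12 15 8

# When one length is a multiple of the other, recycling is silent
1:6 + c(10, 20)      # 11 22 13 24 15 26
```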
  6. Scalars and vectors: manipulation - 2. In addition to the classical arithmetic operators, the functions log, exp, sin, cos, tan and sqrt are available. min(x) and max(x) select the smallest and the largest element of the vector. sum(x) and prod(x) give, respectively, the sum and the product of the numbers within the vector. mean(x) calculates the sample (arithmetic) mean, which is the same as sum(x)/length(x), and var(x) gives the sample variance, sum((x - mean(x))^2)/(length(x) - 1). sort(x) returns a vector of the same size as x, with the elements in increasing order.
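To check the mean and variance formulas against the built-in functions, here is a sketch on a small made-up vector:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)           # made-up data
min(x); max(x)                           # 2 and 9
sum(x); prod(x)                          # 40 and 201600
mean(x)                                  # 5, same as sum(x)/length(x)
var(x)                                   # 4.571429
sum((x - mean(x))^2) / (length(x) - 1)   # same value, from the formula
sqrt(x)                                  # element-wise square roots
sort(c(3, 1, 2))                         # 1 2 3
```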
  7. seq and rep. There are facilities to generate commonly used sequences of numbers. > 1:30 is the vector c(1, 2, ..., 29, 30). > 2*1:15 is the vector c(2, 4, ..., 28, 30) of length 15, because : has higher precedence than *. The function seq() is more general: seq(2, 10) is the same as the vector 2:10, and the arguments from=, to= and by= are useful: > seq(from = 30, to = 1) > seq(-10, 10, by = 0.5). rep() can be used for replicating an object: > rep(x, times = 5) puts five copies of x end to end, while > rep(x, each = 5) repeats each element of x five times.
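A sketch of these generators. It also shows a common pitfall: passing a single vector to seq(), as in seq(2:10), returns 1, 2, ..., length of that vector. It does not return 2:10.

```r
1:30                       # 1 2 ... 30
2*1:15                     # 2 4 ... 30: ':' is evaluated before '*'
seq(2, 10)                 # same as 2:10
seq(2:10)                  # careful: a single vector argument gives 1:9
seq(from = 30, to = 1)     # 30 29 ... 1
s <- seq(-10, 10, by = 0.5)
length(s)                  # 41
x <- 1:3
rep(x, times = 2)          # 1 2 3 1 2 3
rep(x, each = 2)           # 1 1 2 2 3 3
```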
  8. Logical vectors. As well as numerical vectors, R allows manipulation of logical quantities. The elements of a logical vector can have the values TRUE, FALSE and NA ("not available"). Logical vectors are generated by conditions, for example > temp = x > 3. The comparison operators are <, <=, >, >=, == for exact equality and != for inequality. In addition, if c1 and c2 are logical expressions, then c1 & c2 is their intersection ("and"), c1 | c2 is their union ("or"), and !c1 is the negation of c1.
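A short sketch of conditions and logical operators on the vector 1:5. The last line shows that a logical vector can also be used to pick elements.

```r
x <- c(1, 2, 3, 4, 5)
temp <- x > 3              # FALSE FALSE FALSE TRUE TRUE
c1 <- x > 1
c2 <- x < 5
c1 & c2                    # FALSE TRUE TRUE TRUE FALSE  ("and")
c1 | c2                    # TRUE TRUE TRUE TRUE TRUE    ("or")
!c1                        # TRUE FALSE FALSE FALSE FALSE
x[c1 & c2]                 # 2 3 4
```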
  9. Missing values. In some cases the components of a vector may not be completely known: in this case we assign them the value NA. The function is.na(x) gives a logical vector of the same size as x, with value TRUE where the corresponding element of x is NA: > z = c(1:3, NA); ind = is.na(z). There is a second kind of "missing" value produced by numerical computation, the so-called Not a Number (NaN) values. Examples: > 0/0 > Inf/Inf. Note that is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.
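A sketch contrasting NA and NaN. It also shows why z == NA cannot be used to find missing values: any comparison with NA returns NA.

```r
z <- c(1:3, NA)
ind <- is.na(z)            # FALSE FALSE FALSE TRUE
z == NA                    # NA NA NA NA: comparisons with NA give NA, so use is.na()
0/0                        # NaN
Inf/Inf                    # NaN
is.na(c(NA, NaN))          # TRUE TRUE: is.na() is TRUE for both
is.nan(c(NA, NaN))         # FALSE TRUE: is.nan() is TRUE only for NaN
```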
  10. Index vectors: subsets of a vector. Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. The index vector can be: 1. A logical vector: values corresponding to TRUE in the index vector are selected, e.g. > y = x[!is.na(x)] keeps the non-missing elements of x. 2. A vector of positive (or negative) integers: the values in the index vector must lie in the set {1, 2, ..., length(x)}; positive integers select the corresponding values, negative integers exclude them. > x[2:3]; x[-(2:3)] 3. A vector of character strings: this is possible only if the vector has a names attribute. > cars = c(1,2,3) > names(cars) = c("ferrari","lamborghini","bugatti") > pref = cars[c("ferrari","bugatti")]
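The three kinds of index vector, sketched on made-up values. Be aware that assigning to cars in the workspace masks the built-in cars data set until the new object is removed.

```r
# 1. Logical index: keep the non-missing values
x <- c(5, NA, 7, NA, 9)
y <- x[!is.na(x)]                  # 5 7 9

# 2. Positive integers select, negative integers exclude
x2 <- c(10, 20, 30, 40)
x2[2:3]                            # 20 30
x2[-(2:3)]                         # 10 40

# 3. Character index on a named vector (masks the built-in 'cars' data set)
cars <- c(1, 2, 3)
names(cars) <- c("ferrari", "lamborghini", "bugatti")
pref <- cars[c("ferrari", "bugatti")]   # ferrari = 1, bugatti = 3
```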
  11. Objects and attributes. All the elements of a vector have the same mode, which is why such vectors are called "atomic". The possible modes are numeric, logical, complex, character and raw. Useful commands: mode(), as.numeric(), is.numeric(). For example, create a numeric vector: > z = 0:9, change it into character: > digits = as.character(z), and coerce it back into a numeric (integer) vector: > d = as.integer(digits). d and z are the same!
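The round trip above as a script. identical() confirms that d and z are the same object. Both are integer vectors, because 0:9 already produces integers.

```r
z <- 0:9
mode(z)                      # "numeric"
digits <- as.character(z)    # "0" "1" ... "9"
mode(digits)                 # "character"
is.numeric(digits)           # FALSE
d <- as.integer(digits)
identical(d, z)              # TRUE
```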
  12. Arrays, matrices and data.frame. Vectors are the most important type of objects in R, but there are several others, among them: arrays, which are multidimensional generalizations of vectors (a matrix is a two-dimensional array); and data.frame, a matrix-like structure whose columns can be of different types, used when we work with both numerical and categorical data. How do we transform a vector into a matrix? > v = 1:50 > dim(v) = c(10,5)
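A sketch of the dim() trick, followed by a small data frame that mixes a numeric column with a categorical (factor) column. The height and sex values are made up.

```r
v <- 1:50
dim(v) <- c(10, 5)         # v is now a 10 x 5 matrix, filled column by column
is.matrix(v)               # TRUE
v[1, 2]                    # 11: column 2 holds 11..20
a <- array(1:24, dim = c(2, 3, 4))   # a three-dimensional array

# Made-up data: one numeric and one categorical column
df <- data.frame(height = c(1.70, 1.82, 1.65),
                 sex = factor(c("F", "M", "F")))
str(df)
```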
  13. Arrays, matrices and data.frame (2). How do we create a matrix from scratch? > m = array(1:20, dim = c(4,5)). How do we subset a matrix, or replace a subset of a matrix with zeros? Let's look at the examples in the code.
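The lecture's own code is not included here, so the following is an independent sketch of both operations. It selects by row and column, replaces elements with zeros through a logical condition, and then replaces specific cells through an index matrix that holds one (row, column) pair per row.

```r
m <- array(1:20, dim = c(4, 5))
m[2, 3]                # element in row 2, column 3: 10
m[, 2]                 # second column: 5 6 7 8
m[1:2, ]               # first two rows

m[m > 15] <- 0         # replace every element greater than 15 with 0

i <- cbind(c(1, 2), c(3, 4))   # index matrix: cells (1, 3) and (2, 4)
m[i] <- 0              # replace just those two cells with 0
m
```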
  14. Matrix manipulation. The operator %*% is used for matrix multiplication. Matrices of size n x 1 or 1 x n are also valid matrices. If, for example, A and B are square matrices of the same size, then > A*B is the matrix of element-by-element products (it does not work for matrices with different dimensions), and > A %*% t(B) is the matrix product of A and the transpose of B. diag(A) returns the elements in the main diagonal of A. ginv(A) and t(A) return the (generalized) inverse and the transpose of A; ginv() requires the MASS package, while base R's solve(A) returns the ordinary inverse of a non-singular square matrix.
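A sketch with two made-up 2 x 2 matrices. The ginv() call is wrapped in requireNamespace() so the script still runs when MASS is not installed. For an invertible matrix, ginv() should agree with solve() up to rounding.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
B <- matrix(c(1, 0, 2, 1), nrow = 2)
A * B              # element-by-element products
A %*% B            # matrix product
A %*% t(B)         # A times the transpose of B
diag(A)            # 2 3
t(A)               # transpose
Ainv <- solve(A)   # ordinary inverse (base R)
A %*% Ainv         # 2 x 2 identity, up to rounding

# ginv() lives in MASS; for a non-singular A it matches solve(A)
if (requireNamespace("MASS", quietly = TRUE)) {
  G <- MASS::ginv(A)
  print(all.equal(G, Ainv))
}
```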
  15. Lists and data frames. An R list is an object consisting of an ordered collection of objects known as its components. Here is a simple example of how to make a list: > Lst = list(name="Rodolfo", surname="Metulini", age="30"). Two or more lists can be concatenated with c(): > list.ABC = c(list.A, list.B, list.C). A data.frame is a list with class "data.frame". We can convert a matrix object into a data.frame object with the command as.data.frame(). The easiest way to create a data.frame object is the read.table() function, which reads it from a file.
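A sketch of the list operations above. list.A, list.B and list.C are hypothetical one-element lists made up for the example. Converting an unnamed matrix gives default column names V1, V2, ...

```r
Lst <- list(name = "Rodolfo", surname = "Metulini", age = "30")
Lst$name                      # "Rodolfo"
Lst[["surname"]]              # "Metulini"

# Hypothetical small lists, concatenated with c()
list.A <- list(a = 1)
list.B <- list(b = "x")
list.C <- list(c = TRUE)
list.ABC <- c(list.A, list.B, list.C)
names(list.ABC)               # "a" "b" "c"

# From matrix to data frame
m <- matrix(1:6, nrow = 3)
df <- as.data.frame(m)
class(df)                     # "data.frame"
names(df)                     # "V1" "V2"
```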
  16. Reading data. Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. There are basically two similar commands to load data: 1. read.table(): the general function (its variant read.csv() has defaults for comma-separated .csv files). 2. read.delim(): a variant with defaults for tab-delimited .txt files. Useful arguments: sep =: to specify the field separator, e.g. sep = ";", sep = "," or sep = "\t" for tab-delimited data; header = TRUE: to specify that the first row of the dataset contains the variable names. Moreover, read.dta() (in the foreign package) is used to load data from STATA :)
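To keep the example self-contained, this sketch first writes two tiny files to temporary paths and then reads them back. The names and numbers in the files are made up.

```r
# A semicolon-separated file with a header row (made-up data)
f <- tempfile(fileext = ".csv")
writeLines(c("name;age", "Anna;31", "Marco;28"), f)
dat <- read.table(f, sep = ";", header = TRUE)
str(dat)

# A tab-delimited file: read.delim() assumes header = TRUE and sep = "\t"
g <- tempfile(fileext = ".txt")
writeLines(c("x\ty", "1\t2", "3\t4"), g)
dat2 <- read.delim(g)
dat2
```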
  17. Distributions and co. One convenient use of R is to provide a comprehensive set of statistical tables. Functions are provided to evaluate the cumulative distribution function P(X ≤ x), the probability density function and the quantile function (given q, the smallest x such that P(X ≤ x) > q), and to simulate from the distribution. The function names combine a prefix with the distribution name: "d" for the density, "p" for the CDF (pnorm, punif, pexp, etc.), "q" for the quantile function and "r" for simulation. Let's examine the distribution of a variable empirically (see the code).
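A sketch of the d/p/q/r naming scheme for the normal distribution, followed by an empirical look at a variable. It assumes the variable is eruptions from R's built-in faithful data set (eruption durations of the Old Faithful geyser). The lecture's own code may use different data.

```r
dnorm(0)                  # density of N(0, 1) at 0, about 0.399
pnorm(1.96)               # P(X <= 1.96), about 0.975
qnorm(0.975)              # quantile, about 1.96
set.seed(1)
r <- rnorm(1000)          # 1000 simulated N(0, 1) values

# Empirical distribution of a variable (assumed: faithful$eruptions)
eruptions <- faithful$eruptions
summary(eruptions)
hist(eruptions, breaks = 20, probability = TRUE)
lines(density(eruptions))        # kernel density estimate
plot(ecdf(eruptions))            # empirical CDF
qqnorm(eruptions); qqline(eruptions)   # compare with the normal
```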
  18. Covariance and concentration indices. The covariance and the correlation measure the degree to which two variables change together. The correlation is a unitless index in [-1, 1], while the covariance is not bounded and depends on the units of measurement of the variables. > Cov = cov(A,B) > Cor = cor(A,B). We can also calculate the correlation between A and B as follows: > CorAB = Cov / sqrt(var(A)*var(B)). Gini index: it is the most popular concentration index; to compute it we need to install the ineq package. Mode: the most frequent value within the distribution; we need to install the modeest package and use the mfv command.
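A sketch that checks the correlation formula on made-up vectors. It then uses two hypothetical base-R helpers, gini_sketch() and mode_sketch(). These are not the ineq or modeest functions. gini_sketch() uses the sorted-values formula G = (2 Σ i·x_(i) / Σ x − (n + 1)) / n, which is meant to match ineq's uncorrected Gini(). mode_sketch() returns every value that ties for the highest frequency.

```r
A <- c(2, 4, 6, 8, 10)        # made-up data
B <- c(1, 3, 2, 5, 4)
Cov <- cov(A, B)              # 4
Cor <- cor(A, B)              # 0.8
CorAB <- Cov / sqrt(var(A) * var(B))
all.equal(Cor, CorAB)         # TRUE

# Hypothetical helper: Gini index from the sorted values
gini_sketch <- function(x) {
  x <- sort(x)
  n <- length(x)
  (2 * sum(x * seq_len(n)) / sum(x) - (n + 1)) / n
}
gini_sketch(c(1, 2, 3, 4))    # 0.25

# Hypothetical helper: most frequent value(s) via table()
mode_sketch <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[tab == max(tab)])
}
mode_sketch(c(1, 2, 2, 3, 3, 3))   # 3
```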
  19. Homework. Those of you familiar with STATA can try to load a .dta file with the read.dta() function. Study how well the eruption data agree with other distributions (exponential? uniform? it is up to you).
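One possible starting point for the second exercise, assuming the "eruption data" are faithful$eruptions. It fits an exponential by maximum likelihood (rate = 1/mean) and a uniform over the observed range, then compares both with a Q-Q plot and ks.test(). Because the parameters are estimated from the same data, and the durations contain ties (R may warn about them), treat the p-values as rough indications only.

```r
e <- faithful$eruptions
rate_hat <- 1 / mean(e)             # ML estimate of the exponential rate

# Q-Q plot against exponential quantiles
qqplot(qexp(ppoints(length(e)), rate = rate_hat), e,
       xlab = "Exponential quantiles", ylab = "Eruption duration (min)")
abline(0, 1)

# Kolmogorov-Smirnov comparisons (approximate: parameters estimated from e)
ks_exp  <- ks.test(e, "pexp", rate = rate_hat)
ks_unif <- ks.test(e, "punif", min = min(e), max = max(e))
ks_exp
ks_unif
```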