R programming groundup-basic-section-i

R Programming Basics Section I

  1. R-Programming–Basics: R Programming Ground Up!
      Syed Awase Khirni
      Syed Awase earned his PhD in GIS from the University of Zurich, supported by an EU V Framework scholarship from the SPIRIT Project (www.geo-spirit.org). He currently provides consulting services through his startups www.territorialprescience.com and www.sycliq.com.
      Copyright 2008-2016 Syed Awase Khirni TPRI
  2. R Project
      • R: a free software environment for statistical computing and graphics.
      • https://www.r-project.org
      • https://cran.r-project.org/mirrors.html
  3. S
      • The S language was developed by John Chambers et al. at Bell Labs.
      • 1976: began as an internal statistical analysis environment, originally implemented as Fortran libraries.
      • 1988: rewritten in C (see also Statistical Models in S by Chambers and Hastie).
      • 1991: R created in New Zealand by Ross Ihaka and Robert Gentleman.
      • 1993: public release of R.
      • 1995: Martin Mächler convinced Ross and Robert to use the GNU General Public License (GPL).
      • 1996, 1997: the R Core Group was formed, including some people associated with S-PLUS.
      • 1998: S version 4 released.
      • 2000: R version 1.0 released.
      • 2015: R version 3.1.3 released on March 9, 2015.
  4. Design of the R System
      • R is a statistical programming language based on the S language developed at Bell Labs.
      • It is divided into two conceptual parts:
        – the base R system
        – add-on packages
      • The base R system contains:
        – the base package, which is required to run R and contains the most fundamental functions;
        – other packages: utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler, splines, tcltk and stats4.
      • Add-on packages are published either by the R Core Group or by third parties.
      • The syntax is similar to S, making it easy for S-PLUS users to switch over.
      • The semantics are superficially similar to S, but in reality they are quite different.
      • R runs on almost any standard computing platform and operating system.
  5. R?
      • R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
      • R has:
        – an effective data handling and storage facility;
        – a suite of operators for calculations on arrays and matrices;
        – a large, coherent, integrated collection of tools for data analysis;
        – graphical facilities for data analysis and display;
        – a well-developed, simple and effective programming language.
  6. R Drawbacks
      • Little built-in support for dynamic or 3-D graphics.
      • Functionality is based on consumer demand and user contributions.
      • Web support is provided through third-party software.
  7. DATA TYPES AND BASIC OPERATIONS IN R
  8. Data Types
      • Objects
      • Numbers
      • Attributes
      • Entering Input and Printing
      • Vectors, Lists
      • Factors
      • Missing Values
      • Data Frames
      • Names
  9. Objects in R
      • R has five basic or "atomic" classes of objects:
        – character
        – numeric (real numbers)
        – integer
        – complex
        – logical (TRUE/FALSE)
      • The most basic object is a vector.
        – A vector can only contain objects of the same class.
        – The one exception is a list, which is represented as a vector but can contain objects of different classes.
        – Empty vectors can be created with the vector() function.
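The sketch below shows one value of each atomic class and the class that class() reports for it. The variable names are only illustrative.

```r
# One value of each of the five atomic classes
values <- list(
  "a",      # character
  3.14,     # numeric (double-precision real number)
  5L,       # integer (note the L suffix)
  1 + 2i,   # complex
  TRUE      # logical
)

# class() reports the class of each value
classes <- sapply(values, class)
print(classes)   # "character" "numeric" "integer" "complex" "logical"

# A list, unlike an atomic vector, can hold all of them at once
is.list(values)  # TRUE
```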
  10. RStudio
  11. install.packages()
      • To install additional third-party packages into R, use install.packages(). For example, install.packages("XLConnect") installs the XLConnect package.
      • To attach an already installed package, use library("packagename").
      • To check whether a package is already installed: any(grepl("<name of your package>", installed.packages()))
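The grepl() check above searches every column of the installed.packages() matrix, so it also matches substrings of other package names and fields. The sketch below is a stricter check that compares against package names only. is_installed() is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: TRUE only if a package with exactly this name is installed
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

is_installed("stats")   # TRUE: stats ships with base R
is_installed("stat")    # FALSE, although grepl("stat", ...) would match "stats"

# Typical use (commented out here because it needs network access):
# if (!is_installed("XLConnect")) install.packages("XLConnect")
# library("XLConnect")
```

Another common option is requireNamespace(pkg, quietly = TRUE), which also checks that the package can actually be loaded.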
  12. Numbers in R
      • By default, numbers are treated as numeric objects (i.e. double-precision real numbers).
      • The suffix L gives an integer: 1 is a numeric object, while 1L is explicitly an integer.
      • 1/0 gives Inf (infinity).
      • NaN means "not a number". It is the result of an undefined operation such as 0/0 and is also treated as a missing value.
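A quick check of the rules above at the console:

```r
x <- 1        # numeric (double) by default
y <- 1L       # integer, because of the L suffix
class(x)      # "numeric"
class(y)      # "integer"

1 / 0         # Inf
-1 / 0        # -Inf
0 / 0         # NaN: an undefined value
is.nan(0 / 0) # TRUE
is.na(0 / 0)  # TRUE: NaN also counts as missing
```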
  13. Attributes
      • R objects can have attributes:
        – names, dimnames
        – dimensions (e.g. matrices, arrays)
        – class
        – length
        – other user-defined attributes/metadata
      • The attributes of an object can be accessed using the attributes() function.
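The sketch below shows attributes() on a matrix and on a named vector. It also attaches a user-defined attribute; the name "units" is purely illustrative.

```r
m <- matrix(1:6, nrow = 2)
attributes(m)            # list(dim = c(2L, 3L))

x <- c(a = 1, b = 2)
attributes(x)            # list(names = c("a", "b"))

# Arbitrary user-defined metadata can be attached with attr()
attr(x, "units") <- "cm"
names(attributes(x))     # "names" "units"
attr(x, "units")         # "cm"
```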
  14. Assignment Operator (<-)
      • Values are assigned to names using the <- assignment operator.
      • The grammar of the language determines whether an expression is complete or not.
      • The # character indicates a comment: anything to the right of the # (including the # itself) is ignored.
      • In the printed output, [1] indicates that x is a vector and that 123781213412 is its first element.
      • Typing an object's name at the console auto-prints it. Ctrl+L clears the console.
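A short console session illustrating assignment, auto-printing and comments:

```r
x <- 5          # assignment: nothing is printed
x               # auto-printing shows: [1] 5
print(x)        # explicit printing, same output

y <- 11:40      # a longer vector prints over several lines;
y               # the [n] at the start of each line is the index of its first element

z <- x + 1      # the text after # is a comment and is ignored
```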
  15. Vectors in R
      • The c() function can be used to create vectors of objects.
  16. Vectors in R
      • Vectors can also be created with the vector() function.
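The sketch below creates vectors both ways: with c() for explicit values, and with vector(mode, length) for a pre-filled vector. The variable names are illustrative.

```r
a <- c(0.5, 0.6)        # numeric
b <- c(TRUE, FALSE)     # logical
d <- c("a", "b", "c")   # character
e <- 9:12               # integer sequence
f <- c(1 + 0i, 2 + 4i)  # complex

# vector(mode, length) fills the vector with the default value for that mode
v_num <- vector("numeric", length = 3)     # 0 0 0
v_chr <- vector("character", length = 2)   # "" ""
```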
  17. Mixing Objects
      • When objects of different classes are mixed in a vector, coercion occurs so that every element in the vector is of the same class.
  18. Explicit Coercion
      • Objects can be explicitly coerced from one class to another using the as.* functions.
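The sketch below covers both kinds of coercion. It first mixes classes in c() (implicit coercion), then converts with as.* (explicit coercion). A nonsensical conversion yields NA; its warning is suppressed here only to keep the output clean.

```r
# Implicit coercion: mixing classes forces a common class
y1 <- c(1.7, "a")    # character: "1.7" "a"
y2 <- c(TRUE, 2)     # numeric: 1 2
y3 <- c("a", TRUE)   # character: "a" "TRUE"

# Explicit coercion with the as.* functions
x <- 0:6
as.numeric(x)        # 0 1 2 3 4 5 6 (now doubles)
as.logical(x)        # FALSE TRUE TRUE TRUE TRUE TRUE TRUE
as.character(x)      # "0" "1" "2" "3" "4" "5" "6"

# Coercion that makes no sense gives NA (with a warning, suppressed here)
z <- suppressWarnings(as.numeric(c("a", "b")))   # NA NA
```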
  19. Matrices
      • Vectors with a dimension attribute are called matrices. The dimension attribute is itself an integer vector of length 2: (nrow, ncol).
      • Matrices are constructed column-wise, so entries can be thought of as starting in the upper-left corner and running down the columns.
      • Matrices can also be created directly from vectors by adding a dimension attribute.
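The sketch below shows column-wise filling, and then turns a plain vector into a matrix by setting dim().

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)            # 2 3
m                 # filled column-wise:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Adding a dim attribute turns a plain vector into a matrix
v <- 1:10
dim(v) <- c(2, 5)
is.matrix(v)      # TRUE
```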
  20. cbind-ing
      • Matrices can be created by column-binding with the cbind() function.
  21. rbind-ing
      • Matrices can be created by row-binding with the rbind() function.
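Binding the same two vectors both ways shows how cbind() and rbind() differ. The variable names become the column or row names.

```r
x <- 1:3
y <- 10:12

cb <- cbind(x, y)   # 3 x 2 matrix; columns named "x" and "y"
rb <- rbind(x, y)   # 2 x 3 matrix; rows named "x" and "y"
cb
rb
```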
  22. Lists in R
      • Lists are a special type of vector that can contain elements of different classes.
      • Lists are a very important data type in R.
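A list holding one element of several different classes:

```r
x <- list(1, "a", TRUE, 1 + 4i)
length(x)            # 4
sapply(x, class)     # "numeric" "character" "logical" "complex"
x[[2]]               # "a"
```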
  23. Factors
      • Factors are used to represent categorical data and can be unordered or ordered.
      • Factors are treated specially by modelling functions like lm() and glm().
      • Using factors with labels is better than using integers because factors are self-describing: a variable with the values "Male" and "Female" is more informative than one with the values 1 and 2.
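The sketch below shows how a factor stores levels and integer codes, how to set the level order, and how an ordered factor compares values. The data values are illustrative.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
levels(x)      # "no" "yes" (sorted by default)
table(x)       # no: 2, yes: 3
unclass(x)     # underlying integer codes 2 2 1 2 1, plus a "levels" attribute

# Set the level order explicitly (in lm()/glm() the first level is the baseline)
x2 <- factor(c("yes", "yes", "no"), levels = c("yes", "no"))
levels(x2)     # "yes" "no"

# Ordered factor: comparisons follow the level order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
sizes < "large"  # TRUE FALSE TRUE
```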
  24. Missing Values
      • Many existing industrial and research datasets contain missing values.
      • These can occur for various reasons, such as manual data entry procedures, equipment errors and incorrect measurements.
      • Missing values can also appear in the form of outliers or even wrong data (i.e. out-of-bounds values).
      • Missing values are denoted by NA, or by NaN for undefined mathematical operations:
        – is.na() is used to test objects for NA;
        – is.nan() is used to test for NaN;
        – NA values have a class too, so there are integer NA, character NA, etc.;
        – a NaN value is also NA, but the converse is not true.
  25. Missing Values
      • Three types of problems are usually associated with missing values:
        – loss of efficiency;
        – complications in handling and analyzing the data;
        – bias resulting from differences between missing and complete data.
      • Identifying NA values using is.na() and is.nan()
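The sketch below demonstrates the asymmetry between NA and NaN, and the typed NA constants.

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)      # FALSE FALSE TRUE FALSE FALSE
is.nan(x)     # all FALSE: NA is not NaN

y <- c(1, 2, NaN, NA, 4)
is.na(y)      # FALSE FALSE TRUE TRUE FALSE (NaN is also NA)
is.nan(y)     # FALSE FALSE TRUE FALSE FALSE (but NA is not NaN)

# NA has a type too
class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_character_)  # "character"
```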
  26. Data Frames
      • Used to store tabular data (a table of values):
        – they are represented as a special type of list in which every element has to have the same length;
        – each element of the list can be thought of as a column, and the length of each element is the number of rows.
      • Data frames can store a different class of object in each column, while matrices must have every element of the same class.
      • Data frames also have a special attribute called row.names.
      • Data frames are usually created by calling read.table() or read.csv().
      • A data frame can be converted to a matrix by calling data.matrix().
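The sketch below builds a small data frame. Each column keeps its own class, and data.matrix() converts the whole frame to a single numeric type. Column names are illustrative.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)             # 4
ncol(df)             # 2
row.names(df)        # "1" "2" "3" "4"
sapply(df, class)    # each column keeps its own class: integer, logical

m <- data.matrix(df) # converts to a matrix; the logical column becomes 1/0
is.matrix(m)         # TRUE
```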
  27. Data Frames
  28. Data Frame in R
  29. Names in R
      • R objects can also have names, which is very useful for writing readable code and self-describing objects.
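The sketch below gives names to a vector, a list and a matrix, then uses those names to index. The city names are illustrative.

```r
x <- 1:3
names(x)                      # NULL
names(x) <- c("New York", "Seattle", "Los Angeles")
x[["Seattle"]]                # 2

# Lists and matrices can have names too
lst <- list(a = 1, b = 2)
names(lst)                    # "a" "b"
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("c", "d")))
m["b", "c"]                   # 2
```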
  30. Subsetting
      • Extracting subsets from an existing dataset is called subsetting.
        – [ ] always returns an object of the same class as the original and can select more than one element.
        – [[ ]] is used to extract a single element of a list or a data frame. The returned object is not necessarily a list or data frame.
        – $ is used to extract an element of a list or data frame by name; its semantics are similar to those of [[ ]].
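The sketch below compares the three operators on a character vector and a small list. The element names are illustrative.

```r
x <- c("a", "b", "c", "c", "d", "a")
x[1]           # "a"
x[1:4]         # "a" "b" "c" "c"
x[x > "a"]     # logical index: "b" "c" "c" "d"

lst <- list(foo = 1:4, bar = 0.6)
lst[1]         # a list containing foo ([ ] keeps the class)
lst[[1]]       # the integer vector 1:4 itself
lst$bar        # 0.6
lst[["bar"]]   # 0.6, same as $

name <- "foo"
lst[[name]]    # [[ ]] accepts a computed name
lst$name       # NULL: $ looks for an element literally called "name"
```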
  31. Subsetting Matrix
  32. Subsetting List
  33. Subsetting Nested Elements
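The sketch below shows matrix subsetting by row and column, including drop = FALSE. It also shows recursive [[ ]] indexing into a nested list. The values are illustrative.

```r
m <- matrix(1:6, nrow = 2)
m[1, 2]                 # 3
m[2, ]                  # whole second row: 2 4 6
m[, 3]                  # whole third column: 5 6
m[1, 2, drop = FALSE]   # keeps the result as a 1 x 1 matrix

# Nested lists: [[ ]] with a vector index recurses into the structure
x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
x[[c(1, 3)]]            # 14
x[[1]][[3]]             # 14, the same element
x[[c(2, 1)]]            # 3.14
x[c("a", "b")]          # [ ] with several names returns a smaller list
```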
  34. Partial Matching
      • Partial matching of names is allowed with $, and with [[ ]] when exact = FALSE.
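The sketch below contrasts $ and [[ ]] on a partially typed name.

```r
x <- list(aardvark = 1:5)
x$a                         # 1:5, partial match on "a"
x[["a"]]                    # NULL: [[ ]] matches exactly by default
x[["a", exact = FALSE]]     # 1:5
```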
  35. Remove NA Values
      • A common task is to remove missing values (NAs) before performing any analysis.
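The sketch below removes NAs three ways: with is.na() on one vector, with complete.cases() across several vectors, and row-wise on a data frame. The data are illustrative.

```r
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
x[!bad]                      # 1 2 4 5

# complete.cases() keeps positions where every vector is non-missing
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
x[good]                      # 1 2 4 5
y[good]                      # "a" "b" "d" "f"

# For a data frame, keep only the rows without any NA
df <- data.frame(x = x, y = y)
clean <- df[complete.cases(df), ]
nrow(clean)                  # 4
```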
  36. Vectorized Operations
      • Many operations in R are vectorized, making code more efficient, concise and easier to read.
  37. Vectorized Matrix Operations
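The sketch below shows element-wise arithmetic on vectors and matrices without any explicit loop. It contrasts element-wise * with true matrix multiplication %*%.

```r
x <- 1:4
y <- 6:9
x + y      # 7 9 11 13
x > 2      # FALSE FALSE TRUE TRUE
x * y      # 6 14 24 36

m <- matrix(1:4, 2, 2)
n <- matrix(rep(10, 4), 2, 2)
m * n      # element-wise: 10 30 / 20 40
m %*% n    # matrix product: 40 40 / 60 60
```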
  38. Reading Data
      • R provides some useful functions to read data:
        – read.table(), read.csv(): for reading tabular data;
        – readLines(): for reading lines of a text file;
        – source(): for reading in R code files (inverse of dump);
        – dget(): for reading in R code files (inverse of dput);
        – load(): for reading in saved workspaces;
        – unserialize(): for reading single R objects in binary form.
  39. Writing Data
      • R provides a set of functions to write data into files:
        – write.table(): to write data in table format;
        – writeLines(): to write lines of text;
        – dump()
        – dput()
        – save()
        – serialize()
  40. Reading Data Files with read.table
      • For small to moderately sized datasets, we can usually call read.table() without specifying any other arguments:
        data <- read.table("sampledata.txt")
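The sketch below uses inline text in place of "sampledata.txt" so that it runs anywhere; the columns and values are made up. Because this text has a header row, header = TRUE is passed. The second call shows how declaring column classes and the row count up front can help with larger files.

```r
txt <- "name age score
Alice 30 88.5
Bob 25 92.0
Cara 35 79.5"

data <- read.table(text = txt, header = TRUE)
str(data)

# For larger files, stating colClasses and nrows up front can speed reading
# and reduce memory use; comment.char = "" turns off comment scanning
data2 <- read.table(text = txt, header = TRUE,
                    colClasses = c("character", "integer", "numeric"),
                    nrows = 3, comment.char = "")
```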
  41. R DataSets
      • https://vincentarelbundock.github.io/Rdatasets/datasets.html
      • http://openflights.org/data.html
      • http://www.public.iastate.edu/~hofmann/data_in_r_sortable.html
      • https://r-dir.com/reference/datasets.html
      • http://fimi.ua.ac.be/data/
      • https://datamarket.com/data/list/?q=provider:tsdl
      • https://www.data.gov/
  42. Setting and Getting the Working Directory
      • Use setwd() to set and getwd() to get the current working directory:
        > setwd("<path to your folder>")
        > getwd()
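The sketch below changes the working directory temporarily and then restores it. tempdir() stands in for "<path to your folder>".

```r
old_wd <- getwd()      # remember the current working directory
setwd(tempdir())       # change to R's per-session temporary directory
new_wd <- getwd()      # now points at tempdir()
setwd(old_wd)          # restore the original directory
```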
  43. Reading CSV files
  44. Airmile data
  45. Mocking sample data with Mockaroo (https://www.mockaroo.com/)
  46. Reading large datasets with read.table
  47. write.csv()
      • One of the easiest ways to save an R data frame is to write it to a CSV, TSV or plain-text file.
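The sketch below writes a data frame to a temporary CSV file and reads it back, then writes a tab-separated variant. The data and file paths are illustrative. row.names = FALSE stops write.csv() from adding an extra index column.

```r
df <- data.frame(id = 1:3, city = c("Pune", "Chennai", "Hyderabad"))

csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
back <- read.csv(csv_path)

# Tab-separated variant via write.table()
tsv_path <- tempfile(fileext = ".tsv")
write.table(df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
```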
  48. dput()
      • Writes an ASCII text representation of an R object to a file or connection, or uses one to recreate the object.
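The sketch below round-trips a small data frame through a text file with dput() and dget(). The exact text that dput() prints varies slightly between R versions.

```r
y <- data.frame(a = 1, b = "a")
dput(y)                  # prints an R expression such as structure(list(a = 1, b = "a"), ...)

path <- tempfile(fileext = ".R")
dput(y, file = path)     # write the text representation to a file
new_y <- dget(path)      # read it back and recreate the object
identical(new_y, y)      # TRUE
```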
  49. Head and Tail of a DataSet
      • head() and tail() return the first or last part of an object, i.e. a vector, matrix, table, data frame or function.
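head() and tail() on a small illustrative data frame and a vector:

```r
df <- data.frame(x = 1:10, y = letters[1:10])
head(df)          # first 6 rows (default n = 6)
tail(df, n = 3)   # last 3 rows
head(1:100, 5)    # also works on vectors: 1 2 3 4 5
```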
  50. Loading “foreign” Data
      • Sometimes we would like to import data from other statistical packages such as SAS, SPSS and Stata.
      • Stata (.dta) files can be read with the foreign library.
      • Writing data files from R into Stata is also very straightforward.
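The sketch below writes a Stata file with foreign's write.dta() and reads it back with read.dta(). It assumes the recommended foreign package is installed, as it is in standard R distributions. The data are illustrative.

```r
library(foreign)

df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))
path <- tempfile(fileext = ".dta")

write.dta(df, path)      # write a Stata .dta file
back <- read.dta(path)   # read it back as a data frame
str(back)
```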
  51. Loading “foreign” Data
      • SPSS data
        – Data files in SPSS format can be opened with the function read.spss() from the “foreign” package.
        – Set the to.data.frame option to TRUE to return a data frame.
  52. Loading “foreign” Data
      • Excel data
        – Sometimes we have data in .xls format that needs to be imported into R before we can use it.
        – library(gdata)
  53. Loading “foreign” Data
      • Using the XLConnect package:
        install.packages("XLConnect")
  54. Loading “foreign” Data
      • Minitab
        – To import Minitab portable worksheets into R, we can use the foreign library (read.mtp()).
  55. Computing Memory Requirements
      • Each numeric (double-precision) value takes 8 bytes of memory.
      • Imagine you have a data frame with 100,000 rows and 100 numeric columns.
      • 100,000 × 100 × 8 bytes/numeric = 80,000,000 bytes
        – 80,000,000 bytes / 2^20 bytes/MB ≈ 76.29 MB
        – So roughly 76 MB of memory is required just to hold the data.
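The arithmetic above can be checked in R. The object.size() call allocates one numeric vector of the same total length for comparison; R adds a few bytes of header on top of the raw 8 bytes per value.

```r
rows <- 100000
cols <- 100
bytes_per_numeric <- 8                             # one double-precision value
total_bytes <- rows * cols * bytes_per_numeric     # 8e7 bytes
total_mb <- total_bytes / 2^20                     # 2^20 bytes per MB
round(total_mb, 2)                                 # 76.29

# What R reports for a numeric vector holding that many values
format(object.size(numeric(rows * cols)), units = "Mb")   # roughly 76 Mb
```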
  56. Text Formats
      • dump and dput are useful because the resulting textual format is editable and, in the case of corruption, potentially recoverable.
      • Unlike writing out a table or CSV file, dump and dput preserve the metadata (sacrificing some readability), so that another user doesn't have to specify it all over again.
      • Textual formats work much better with version control programs such as Git and SVN, which can track changes in text files meaningfully.
      • Text formats are long-lived and adhere to the "Unix philosophy".
      • However, the format is not very space-efficient.
  57. dump() function
      • Creates a file in a format that can be read with the source() function or pasted in with the copy/paste editing functions of the windowing system.
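The sketch below writes two objects to a temporary file with dump(), deletes them, and restores them with source(). The object names are illustrative. Run at top level, source() evaluates the file in the global environment.

```r
x <- "foo"
y <- data.frame(a = 1L, b = "a")

path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = path)   # writes R code that recreates x and y
rm(x, y)

source(path)                     # re-creates x and y
x                                # "foo"
```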
  58. dput() function
      • dput() saves data as an R expression, which means the resulting file can actually be copied and pasted into the R console.
      • It writes an ASCII text representation of the object to a file (or to the console), and dget() reads that file back to recreate the object.
  59. Functions in R
      • Functions are a fundamental building block of R:
        – functions can be assigned to variables;
        – functions can be stored in lists;
        – functions can be passed as arguments to other functions;
        – functions can be nested inside other functions.
      • Anonymous functions are functions that have no name.
      • We use functions to encapsulate sets of instructions that we want to use repeatedly, or that, because of their complexity, are better self-contained in a sub-program and called when needed.
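The sketch below walks through each property listed above. All function names (square, apply_twice, make_adder) are hypothetical.

```r
# Functions can be assigned to variables
square <- function(x) x^2

# ... stored in lists
ops <- list(sq = square, cube = function(x) x^3)
ops$cube(2)                # 8

# ... passed as arguments to other functions
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)     # 81

# ... and nested inside other functions
make_adder <- function(n) {
  adder <- function(x) x + n   # nested function that remembers n
  adder
}
add5 <- make_adder(5)
add5(10)                   # 15

# Anonymous function passed directly to sapply()
sapply(1:3, function(i) i * 10)   # 10 20 30
```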
  60. User Defined Functions in R
      • User-defined functions (UDFs) are written to accomplish a particular task, typically when no dedicated function or library for it exists, or when we are not aware of one.
  61. User Defined Functions in R
  62. User Defined Functions in R
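A minimal sketch of a user-defined function with a default argument, input validation and a list return value. describe_vec() is a hypothetical name.

```r
# Hypothetical UDF: basic summary statistics for a numeric vector
describe_vec <- function(x, na.rm = TRUE) {
  if (!is.numeric(x)) stop("x must be numeric")   # validate the input
  list(
    mean = mean(x, na.rm = na.rm),
    sd   = sd(x, na.rm = na.rm),
    n    = sum(!is.na(x))          # number of non-missing values
  )
}

res <- describe_vec(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
res$mean   # 5
res$n      # 8
```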
  63. Infix Operators in R
      • Infix operators are special functions and methods that facilitate basic data expressions and transformations.
      • "Infix" refers to the placement of the operator between its operands (e.g. the + in x + y).
      • The infix operators used in R include operators for data extraction, arithmetic, sequences, comparison, logical tests, variable assignment and custom data functions.
  64. Infix Operators in R
      • Infix operators are used between operands; behind the scenes, each one performs a function call.
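Each infix expression below is paired with the equivalent ordinary function call, written by quoting the operator's name in backticks.

```r
1 + 2               # 3
`+`(1, 2)           # 3: the same call written as an ordinary function call
5 %% 3              # 2 (remainder)
5 %/% 3             # 1 (integer division)
`%in%`(2, c(1, 2))  # TRUE, equivalent to 2 %in% c(1, 2)

x <- list(a = 1)
`$`(x, a)           # 1, equivalent to x$a
```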
  65. Predefined Infix Operators in R (Rank: relative precedence, 1 binds tightest)
      Operator   Rank   Description
      %%         6      Remainder (modulo) operator
      %/%        6      Integer division
      %*%        6      Matrix multiplication
      %o%        6      Outer product
      %x%        6      Kronecker product
      %in%       6      Value matching
      ::         1      Extract an exported function from a package namespace
      :::        1      Extract a hidden (non-exported) function from a namespace
      $          2      Extract a list element by name
      @          2      Extract a slot from an S4 object
      [[ ]]      3      Extract data by index
  66. Predefined Infix Operators in R (continued)
      Operator   Rank   Description
      ^          4      Exponentiation
      :          5      Generate a sequence of numbers
      !          8      Logical NOT (negation); a unary prefix operator
      xor()      -      Logical exclusive OR; a function, not an infix operator
      &          10     Element-wise logical AND
      &&         10     Scalar logical AND, used in control flow
      ~          11     Formula operator, used in model building (e.g. y ~ x)
      ->         12     Rightward assignment
      <-         13     Leftward assignment
      <<-        13     Superassignment: assigns in an enclosing environment
  67. User Defined Infix in R
  68. User Defined Infix Function in R
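A minimal sketch of user-defined infix operators. Both operator names are hypothetical. Any name wrapped in % signs can be defined this way, and the definition is written with backticks.

```r
# Hypothetical operator: paste two strings with a space between them
`%+%` <- function(a, b) paste(a, b)
"Hello" %+% "world"            # "Hello world"

# Hypothetical range test
`%between%` <- function(x, range) x >= range[1] & x <= range[2]
c(1, 5, 10) %between% c(2, 8)  # FALSE TRUE FALSE
```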
  69. CONTROL FLOW IN R
  70. if and if...else
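A short if / else if / else chain. Because if is an expression in R, its value can also be assigned directly.

```r
x <- 5
if (x > 0) {
  sign_text <- "positive"
} else if (x < 0) {
  sign_text <- "negative"
} else {
  sign_text <- "zero"
}

# if returns a value, so it can be used on the right-hand side of <-
y <- if (x > 3) 10 else 0
```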
  71. ifelse()
      • Vectors form the basic building block of R programming.
      • Most functions in R take a vector as input and return a vector as output.
      • Vectorized code is much faster than applying the same function to each element of the vector individually.
      • ifelse() is the vectorized equivalent of the if...else statement: ifelse(test_expression, yes, no).
      • test_expression must be a logical vector (or an object that can be coerced to logical).
      • The return value is a vector of the same length as test_expression.
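ifelse() evaluates the test element by element. An if statement, by contrast, expects a single TRUE or FALSE. Since R 4.2.0, giving if() a condition longer than one is an error.

```r
a <- c(5, 7, 2, 9)
parity <- ifelse(a %% 2 == 0, "even", "odd")
parity   # "odd" "odd" "even" "odd"
```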
  72. for loop
  73. while loop
  74. break and next
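The sketch below shows for, while, next and break together. The data are illustrative. seq_along() is used instead of 1:length(x) because it behaves correctly when x is empty.

```r
# for loop over the elements of a vector
total <- 0
for (v in c(2, 5, 3)) {
  total <- total + v
}
total        # 10

x <- c("a", "b", "c")
for (i in seq_along(x)) {
  cat(i, x[i], "\n")
}

# while loop: runs as long as the condition is TRUE
count <- 0
while (count < 3) {
  count <- count + 1
}
count        # 3

# next skips to the next iteration; break exits the loop
evens <- c()
for (i in 1:10) {
  if (i %% 2 != 0) next   # skip odd numbers
  if (i > 6) break        # stop once past 6
  evens <- c(evens, i)
}
evens        # 2 4 6
```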
  75. repeat Loop
      • A repeat loop is used to iterate over a block of code multiple times.
      • There is no condition check in a repeat loop to exit the loop.
      • We must put a condition explicitly inside the body of the loop and use the break statement to exit it.
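A repeat loop whose only exit is an explicit break:

```r
x <- 1
repeat {
  x <- x * 2
  if (x > 100) break   # the only way out of a repeat loop
}
x            # 128
```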
  76. OBJECTS AND CLASSES IN R
  77. OOP in R
      • An object is a data structure that has some attributes and methods which act on those attributes.
      • A class is a blueprint for the object.
      • R has three class systems:
        – the S3 class system
        – the S4 class system
        – the Reference Class system
  78. S3 Class System
      • Primitive in nature.
      • Lacks a formal definition; an object of this class can be created simply by adding a class attribute.
      • Objects are created by setting the class attribute.
      • Attributes are accessed using $.
      • Methods belong to generic functions.
      • Follows copy-on-modify semantics.
      S4 Class System
      • A formally defined structure, which helps make objects of the same class look more or less similar.
      • Class components are properly defined using the setClass() function, and objects are created using the new() function.
      • Attributes (slots) are accessed using @.
      • Methods belong to generic functions.
      • Follows copy-on-modify semantics.
  79. Reference Class System
      • Similar to the object-oriented programming we are used to in C# and Java.
      • Basically an extension of the S4 class system with an environment added to it.
      • In the Reference Class system:
        – the class is defined using setRefClass();
        – objects are created using generator functions;
        – attributes (fields) are accessed using $;
        – methods belong to the class;
        – it does not follow copy-on-modify semantics.
  80. S3 Class System
  81. S3 Class
  82. S3 Class Method
  83. S3 Class with Methods
  84. Inheritance – S3 Class System
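The sketch below covers the S3 topics above. It uses a hypothetical "student" class built from a plain list plus a class attribute, a print method for the existing print() generic, a new generic with UseMethod(), and inheritance by prepending a class name. All class, function and field names are illustrative.

```r
# Constructor: a plain list with a class attribute
student <- function(name, age, gpa) {
  obj <- list(name = name, age = age, gpa = gpa)
  class(obj) <- "student"
  obj
}

# Method for the existing print() generic
print.student <- function(x, ...) {
  cat(x$name, "\n")
  cat("Age:", x$age, "\n")
  cat("GPA:", x$gpa, "\n")
  invisible(x)
}

# A new generic function and a method for it
grade <- function(obj, ...) UseMethod("grade")
grade.student <- function(obj, ...) if (obj$gpa >= 3.5) "honours" else "standard"

s <- student("Paul", 21, 3.6)
s$name          # fields are accessed with $
print(s)        # dispatches to print.student
grade(s)        # "honours"

# Inheritance: the class attribute is a vector searched left to right
intl <- student("Ana", 23, 3.2)
class(intl) <- c("InternationalStudent", class(intl))
print.InternationalStudent <- function(x, ...) {
  cat("International student\n")
  NextMethod()  # continue with print.student
}
print(intl)
grade(intl)     # "standard": falls back to grade.student
```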
  85. S4 Class System in R
      • An S4 class is defined using the setClass() function.
      • Member variables are called slots.
      • When defining a class, we need to set the name and the slots (along with the class of each slot).
  86. S4 Class System in R
      Accessing Slots
      • Slots of an object are accessed using @.
      Modifying Slots
      • A slot can be modified through reassignment.
  87. Inheritance in S4
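The sketch below covers the S4 topics above: setClass() with typed slots, new(), slot access and reassignment with @, a generic with setGeneric()/setMethod(), and inheritance via contains. The class names, slot names and the generic introduce() are hypothetical.

```r
library(methods)   # attached by default in most sessions; explicit for scripts

setClass("Person", slots = c(name = "character", age = "numeric"))

p <- new("Person", name = "Ravi", age = 31)
p@name             # "Ravi": slots are accessed with @
p@age <- 32        # a slot is modified by reassignment

# A generic function and a method for Person
setGeneric("introduce", function(obj, ...) standardGeneric("introduce"))
setMethod("introduce", "Person", function(obj, ...) {
  paste(obj@name, "is", obj@age)
})
introduce(p)       # "Ravi is 32"

# Inheritance: Employee contains (extends) Person and adds a slot
setClass("Employee", contains = "Person", slots = c(company = "character"))
e <- new("Employee", name = "Ana", age = 40, company = "Acme")
is(e, "Person")    # TRUE
introduce(e)       # "Ana is 40": the Person method is inherited
```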
  88. R Reference Class System
      • Reference classes in R are similar to the object-oriented programming we are used to seeing in C++, Java and Python.
      • Unlike S3 and S4 classes, methods belong to the class rather than to generic functions.
      • Reference classes are internally implemented as S4 classes with an environment added to them.
      • setRefClass() returns a generator function, which is used to create objects of that class.
  89. Reference Class in R
      Accessing Fields
      • Fields of an object can be accessed using the $ operator.
      Modifying Fields
      • Fields can be modified by reassignment.
  91. Reference Methods: copy()
  92. Reference Methods
  93. Inheritance in Reference Class
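The sketch below covers the Reference Class topics above. It defines a hypothetical "Account" class with setRefClass(), creates objects with the generator's $new(), and reads and modifies fields with $. It then shows reference semantics, copy(), and inheritance with callSuper(). All class, field and method names are illustrative.

```r
Account <- setRefClass("Account",
  fields = list(owner = "character", balance = "numeric"),
  methods = list(
    deposit = function(x) {
      balance <<- balance + x   # <<- updates the field inside the object
      invisible(.self)
    },
    show = function() {
      cat("Owner:", owner, "Balance:", balance, "\n")
    }
  )
)

a <- Account$new(owner = "Ravi", balance = 100)
a$balance            # 100: fields are accessed with $
a$balance <- 150     # fields are modified by reassignment
a$deposit(50)        # methods belong to the class; balance is now 200

# No copy-on-modify: b refers to the same object as a
b <- a
b$deposit(10)
a$balance            # 210

# copy() creates an independent object
a2 <- a$copy()
a2$deposit(1000)
a$balance            # still 210

# Inheritance: Savings contains Account; callSuper() runs the parent method
Savings <- setRefClass("Savings", contains = "Account",
  fields = list(rate = "numeric"),
  methods = list(
    addInterest = function() {
      deposit(balance * rate)
    },
    show = function() {
      callSuper()
      cat("Rate:", rate, "\n")
    }
  )
)
s <- Savings$new(owner = "Meera", balance = 1000, rate = 0.05)
s$addInterest()
s$balance            # 1050
s                    # printing calls show(): Account's line, then the rate
```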
  94. Thank You
      Contact us: sak@sycliq.com, sak@territorialprescience.com
      We also provide code-driven open house trainings. For code-driven trainings, reach out to us at +91-9035433124.
      Current offerings:
      • AngularJS 1.5.x
      • TypeScript
      • AngularJS 2 (with NodeJS)
      • KnockoutJS (with NodeJS)
      • BackboneJS (with NodeJS)
      • Ember JS / Ext JS (with NodeJS)
      • Raspberry Pi
      • Responsive Web Design with Bootstrap, Google Material Design and KendoUI
      • C# ASP.NET MVC
      • C# ASP.NET Web API
      • C# ASP.NET WCF, WPF
      • Java, Spring, Hibernate
      • Python, Django
      • R Statistical Programming
      • Android Programming
      • Ruby on Rails
      India: Hyderabad | Bangalore | Chennai | Pune
      Overseas: Singapore | Malaysia | Dubai
