Language R
By Franck Benault
Created 05/10/2015
Last update 22/12/2015
R introduction
● R is a programming language for statistics and graphics
– Lingua franca of data science
● Easy to use and powerful
● R is free and runs on every major platform (Windows, Unix)
– Large community
● A shortage of data scientists is expected
● Some elements come from DataCamp tutorials
R in public repositories on GitHub
Year   Rank   Number of public repositories
2014   14th   48,574
2013   24th   7,867
2012   25th   5,626
● TIOBE index
– http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
– R is ranked 19th (September 2015)
R links
● DataCamp (R training)
– https://www.datacamp.com/
● According to DataCamp, the number of R users grows by 40% each year.
● My examples on GitHub
– https://github.com/franck-benault/test-R
R plan
● Environment
● Data types
– From basic types to data frames
● R and statistics
● Diagrams
R environment
● RStudio
R's fundamental data types
● Logical value (TRUE, FALSE, T, F, NA)
● Numeric (2, 4.5)
● Integer (2L)
● Character
● Complex
● Raw (store raw bytes)
● is.* functions (type tests: is.numeric(), is.integer(), ...)
● as.* functions (conversions: as.numeric(), as.integer(), ...)
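The types listed above can be checked directly at the R console. The following is a minimal sketch; the literal values are arbitrary and chosen only for illustration.

```r
# Each literal below has a different fundamental type.
class(TRUE)          # "logical"
class(4.5)           # "numeric"
class(2L)            # "integer"
class("text")        # "character"
class(1+2i)          # "complex"
class(as.raw(255))   # "raw"

# is.* functions test a type, as.* functions convert to a type.
is.numeric(2L)       # TRUE: integers also count as numeric
is.integer(2)        # FALSE: without the L suffix, 2 is stored as a double
n <- as.integer("42")   # character to integer
one <- as.numeric(TRUE) # logical to numeric: TRUE becomes 1
```

Note that is.integer() tests the storage type, not whether the value happens to be a whole number.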
R datatype vector
● Vector
– Sequence of data elements (one dimension)
– Same datatype
– a <- c(1,2,5.3,6,-2,4)
– b <- c("one","two","three")
– d <- c(1,2.1,"three") # mixed types are coerced to a character vector
● Functions
– is.vector(v)
– length(v)
– names(v) <- v2 # associate names with the values
● A single value is a vector of length 1
– a <- 2
– is.vector(a) # returns TRUE
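A short sketch of the points above: vectors hold one data type, mixed input is coerced, and a single value is itself a vector. The variable x is introduced here only for illustration.

```r
a <- c(1, 2, 5.3, 6, -2, 4)
d <- c(1, 2.1, "three")   # mixed input: every element becomes a character string

length(a)    # 6
class(a)     # "numeric"
class(d)     # "character"
d[1]         # "1" (a string, no longer a number)

# A single value is a vector of length 1.
x <- 2
is.vector(x) # TRUE
length(x)    # 1
```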
R datatype vector, methods
● Many functions work on numeric vectors
– mean(v) # average
– median(v)
– sum(v)
● Names on vectors
– a <- c(1,6,5)
– n <- c("Ford","Renault", "Fiat")
– names(a) <- n
– b <- c(Ford=1, Renault=6, Fiat=5) # same result in one step
● If you need a collection of elements with different data types, use a list
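The two ways of naming a vector shown above give the same result, and names can then be used to select elements. A minimal sketch using the values from the slide:

```r
a <- c(1, 6, 5)
n <- c("Ford", "Renault", "Fiat")
names(a) <- n                       # naming after creation

b <- c(Ford = 1, Renault = 6, Fiat = 5)   # naming at creation

identical(a, b)   # TRUE: both approaches build the same named vector

mean(b)           # 4
median(b)         # 5
sum(b)            # 12

renault <- b["Renault"]   # select an element by name
```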
R datatype matrix, names
● Naming with rownames() and colnames()
– m <- matrix(1:6, byrow=TRUE, nrow=2)
– rownames(m) <- c("row1", "row2")
– colnames(m) <- c("col1", "col2", "col3")
● Or directly at creation, with the dimnames argument
– matrix(1:6, byrow=TRUE, nrow=2, dimnames=list(c("row1", "row2"),c("col1","col2","col3")))
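Once rows and columns are named, the names can be used for selection. The sketch below reuses the matrix from the slide; the variable names cell, first_row and second_col are illustrative only.

```r
m <- matrix(1:6, byrow = TRUE, nrow = 2,
            dimnames = list(c("row1", "row2"), c("col1", "col2", "col3")))

dim(m)                        # 2 3

cell <- m["row2", "col3"]     # single cell: 6
first_row <- m["row1", ]      # whole row as a named vector: 1 2 3
second_col <- m[, "col2"]     # whole column as a named vector: 2 5
```

Leaving the row or column position empty selects everything along that dimension.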
R datatype matrix
● Matrix
– Two dimensions
– All elements have the same type
● Creation: matrix() function with a vector as argument
– y <- matrix(1:20, nrow=5, ncol=4)
● Creation from two or more vectors: cbind() (as columns) or rbind() (as rows)
– cbind(1:4, 1:4, 1:4)
– rbind(1:4, 1:4, 1:4)
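The following sketch shows the shape of the matrices created above. It also illustrates that matrix() fills column by column unless byrow=TRUE is given; the names cb and rb are introduced only for this example.

```r
y <- matrix(1:20, nrow = 5, ncol = 4)
dim(y)     # 5 4
y[1, ]     # 1 6 11 16: values were filled column by column

cb <- cbind(1:4, 1:4, 1:4)   # each vector becomes a column
rb <- rbind(1:4, 1:4, 1:4)   # each vector becomes a row

dim(cb)    # 4 3
dim(rb)    # 3 4
```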
R datatype Factor
● Categorical variable
– Limited number of different values
– Each value belongs to a category
● In R: the factor data structure
● Example: blood types
– blood <- c("A","B", "O", "AB","O", "A")
– blood_factor <- factor(blood)
– blood_factor
– # levels are sorted alphabetically
– str(blood_factor)
– table(blood_factor)
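A minimal sketch of what the factor above contains: the levels, the integer codes stored behind them, and the counts per level. The second factor with an explicit level order is an illustrative addition, not part of the original example.

```r
blood <- c("A", "B", "O", "AB", "O", "A")
blood_factor <- factor(blood)

levels(blood_factor)       # "A" "AB" "B" "O" (alphabetical order)
as.integer(blood_factor)   # 1 3 4 2 4 1: each value is stored as a level index
counts <- table(blood_factor)   # A: 2, AB: 1, B: 1, O: 2

# The order of the levels can be set explicitly with the levels argument.
blood_factor2 <- factor(blood, levels = c("O", "A", "B", "AB"))
levels(blood_factor2)      # "O" "A" "B" "AB"
```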
R datatype List
● List
– One dimension
– Can hold different R objects (even lists, matrices, vectors)
– Less functionality than a vector (for example, no arithmetic on the whole list)
● Creation of a list
– song <- list("Rsome types", 190, 5)
● Naming a list
– names(song) <- c("title","duration","track")
– song <- list(title="Rsome types", duration=190, track=5) # or name at creation
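Elements of a named list can be read with $ or with double brackets, and a list can contain another list. In the sketch below, the nested list similar and its values are hypothetical and serve only to show nesting.

```r
song <- list(title = "Rsome types", duration = 190, track = 5)

song$title            # "Rsome types"
song[["duration"]]    # 190
class(song["track"])  # "list": single brackets return a sub-list, not the element

# Hypothetical nested list, added only to illustrate nesting.
similar <- list(title = "Another song", duration = 230)
song$similar <- similar

length(song)          # 4
song$similar$title    # "Another song"
```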
R datatype dataframe
● Datasets
– Observations
– Variables
● Example: people
– Each person = one observation
– Each property of a person = one variable
● How to store this in R?
– List
– Data frame
R datatype dataframe
● data.frame
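– Designed specifically for datasets
– Rows = observations
– Columns = variables
– Columns can have different types (one type within each column)
● Read a csv file to create a data frame
– people <- read.csv("./people.csv", sep="", header=TRUE)
– # sep="" splits fields on whitespace; for a comma-separated file use sep="," (the read.csv default)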
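A data frame can also be built directly with data.frame(). The sketch below uses hypothetical people (the names, ages and the child column are invented for illustration) and writes them to a temporary comma-separated file before reading them back, since the contents of people.csv are not shown in the slides.

```r
# Hypothetical dataset: one row per person, one column per variable.
people <- data.frame(
  name  = c("Anne", "Pete", "Frank"),
  age   = c(28, 30, 21),
  child = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

str(people)      # structure: 3 observations of 3 variables
dim(people)      # 3 3

older <- people[people$age > 25, "name"]   # "Anne" "Pete"

# Round trip through a temporary csv file (the path is an assumption of this sketch).
path <- tempfile(fileext = ".csv")
write.csv(people, path, row.names = FALSE)
people2 <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
```

Each column keeps its own type: name is character, age is numeric and child is logical.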
R and statistics
● Four types of variables (S. S. Stevens, 1946)
– Nominal (categories)
– Ordinal (ranks: 1st, 2nd, etc.)
– Interval (equal intervals between consecutive values)
– Ratio (interval + a "true" zero)
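One possible way to represent these four types in R is sketched below. The mapping is an illustration rather than a rule, and all variable names and values are hypothetical.

```r
# Nominal: categories without order, stored as a factor.
colour <- factor(c("red", "blue", "red"))
is.ordered(colour)        # FALSE

# Ordinal: categories with an order, stored as an ordered factor.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size[1] < size[2]         # TRUE: "small" ranks below "large"
largest <- max(size)      # comparisons and max() work on ordered factors

# Interval: differences are meaningful, ratios are not (0 degrees Celsius is not "no temperature").
temp_celsius <- c(10, 20)
temp_diff <- diff(temp_celsius)          # 10

# Ratio: a true zero exists, so ratios are meaningful.
weight_kg <- c(10, 20)
weight_ratio <- weight_kg[2] / weight_kg[1]   # 2: twice as heavy
```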
R and statistics : Data description
● Data description
– Centrality
    ● Mean (average), function mean()
    ● Median (50%), function median()
    ● Mode (peak; base R has no built-in function for it)
– Spread
    ● Variance and standard deviation, functions var() and sd()
    ● Interquartile range, function IQR()
– scale(): transformation to z-scores (mean = 0, standard deviation = 1)
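The measures above can be computed as follows. The sample v is hypothetical, and the mode is obtained here through table(), which is one common workaround rather than a dedicated function.

```r
v <- c(2, 4, 4, 5, 7, 9, 4)   # hypothetical sample

# Centrality
mean(v)      # 5
median(v)    # 4
# Mode: the most frequent value, found by counting occurrences.
mode_value <- as.numeric(names(which.max(table(v))))   # 4

# Spread
var(v)       # about 5.33
sd(v)        # about 2.31
IQR(v)       # 2

# z-scores: subtract the mean, divide by the standard deviation.
z <- as.vector(scale(v))
mean(z)      # 0 (up to rounding error)
sd(z)        # 1
```

Note that the base function mode() is unrelated to the statistical mode: it returns the storage mode of an object.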
R and statistics : main functions
● rnorm()
– Generates a sample from the normal distribution
● summary()
– A lot of information at once
    ● Min, max, mean, median, quartiles
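A minimal sketch combining the two functions: draw a normal sample, then summarise it. The seed, sample size, mean and standard deviation are arbitrary choices made for this example.

```r
set.seed(42)                          # arbitrary seed, only for reproducibility
x <- rnorm(1000, mean = 10, sd = 2)   # 1000 draws from N(10, 2^2)

length(x)        # 1000

s <- summary(x)  # six-number summary of the sample
names(s)         # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
s
```

The sample mean and median should be close to 10, but they will not match it exactly.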
Diagrams for qualitative data
● Qualitative data: show the count per category
– Bar plot
– Pie chart
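Both diagrams are drawn from the counts per category. The sketch below reuses the blood type values from the factor slide and, as an assumption made so that it also runs outside an interactive session, writes the plots to a temporary PDF file.

```r
blood <- c("A", "B", "O", "AB", "O", "A")
counts <- table(blood)        # counts per category

out <- tempfile(fileext = ".pdf")   # temporary output file (assumption of this sketch)
pdf(out)
barplot(counts, main = "Blood types", ylab = "Count")   # one bar per category
pie(counts, main = "Blood types")                        # one slice per category
invisible(dev.off())          # close the file
```

In an interactive session such as RStudio, the pdf() and dev.off() calls can be dropped and the plots appear in the plot pane.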
R Diagrams
● Qualitative data
– Bar plot
– Pie chart
● Quantitative data
– Few numerical values
    ● Dot plot
– A lot of data
    ● Histogram
    ● Box plot
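For quantitative data, base R offers stripchart() for dot plots, hist() for histograms and boxplot() for box plots. The data below are hypothetical, and the plots are written to a temporary PDF file so that the sketch also runs outside an interactive session.

```r
set.seed(1)                            # arbitrary seed, only for reproducibility
few  <- c(3.1, 4.7, 5.0, 5.2, 6.8)     # hypothetical small sample
many <- rnorm(500)                     # hypothetical large sample

out <- tempfile(fileext = ".pdf")      # temporary output file (assumption of this sketch)
pdf(out)
stripchart(few, method = "stack", pch = 19, main = "Dot plot")   # one dot per value
h <- hist(many, main = "Histogram")    # counts per interval
b <- boxplot(many, main = "Box plot")  # median, quartiles, whiskers, outliers
invisible(dev.off())

sum(h$counts)   # 500: every value falls into one bin
b$stats         # lower whisker, Q1, median, Q3, upper whisker
```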
R Libraries
● Maps
– install.packages("maps")
– library("maps")
– map("world")
– map("france")
– title("la France")
Conclusion
● When will you start using R?
● It may also be a good idea to take a basic statistics course
