PhillyR
2018-2019 Kickoff
Sept 6, 2018
2016 - 2018 PhillyR Summary
- 11 Meetups, 9 in the last 12 months
- 8 presenters
- Sean Davis, Greg Baldini, Leon Kim, Scott Jackson, George Kikuchi, Tan Chen, Russ Lavery, Dror Walter
- 5 Sponsors
- Microsoft, RStudio, PromptWorks, WeWork, Optymyze, Jornaya
2018 - 2019 PhillyR
- For the 2018-2019 season, PhillyR is part of the R User Support Program funded by the R Consortium.
- PhillyR and its members are expected to follow the R Consortium’s R Community Code of Conduct.
- If you have questions or concerns regarding PhillyR and the R Consortium’s Code of Conduct, please contact conduct@r-consortium.org for more details.
2018 - 2019 PhillyR
- PhillyR is run by volunteer organizers. If you would like to become an organizer, please contact me through the PhillyR meetup page.
- PhillyR is entirely funded by sponsors and donations! Help us find sponsors and/or contribute financially to PhillyR through the meetup page.
- PhillyR events are decided by YOU. If there is a topic that you would like to learn about and/or present, host a meetup. We will find the place and food for your event!
Announcements
Data Structures in R
[Diagram: on your computer, the R software handles steps 1 to 3 and the RStudio software handles step 4]
1. R code
2. R execution
3. R output
4. Display R output
In the RStudio script editor: write your R code here if you want to save the code you ran. This is just a text file with a .R extension.
In the RStudio console: writing R code here does exactly the same thing, except you can’t save your code easily.
R is a programming language
- This means it represents data in the computer and applies functions to that data.
- There are many ways to represent data in a computer.
- The same data can be represented in different ways, each supporting different functions:
- String (e.g. “Leon”) -> character or factor
- Number (e.g. 1) -> integer or double
- Data is often stored in a variable for later use so that you don’t have to keep writing “Leon” or 1 every time, e.g. myName <- "Leon", myNumber <- 1
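As a small sketch of the point above (the variable names are illustrative, not taken from the slides), the same string or number can be stored as different types, and typeof() and class() report which representation you have:

```r
# Store values in variables so they can be reused later
my_name <- "Leon"
my_number <- 1

# A string is stored as a character vector by default
typeof(my_name)            # "character"

# The same string can also be represented as a factor (a categorical value)
name_factor <- factor(my_name)
class(name_factor)         # "factor"

# A plain number is a double; the L suffix asks for an integer instead
typeof(my_number)          # "double"
typeof(1L)                 # "integer"
```

Which representation you pick matters because functions behave differently on each, for example levels() is meaningful for a factor but not for a plain character vector.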
R doesn’t have a scalar type
- In other programming languages, there is a big difference between storing “Leon” in a variable and storing both “Leon” and “Kim” in a variable
- The latter is called a “list” or “array” of strings in other languages
- A value that is not a list or array is called a scalar type
- A list or array is a 1-Dimensional data structure (Q: Why is it 1 and not 2?)
- Because R is statistics-oriented, every variable can hold 1 or many values
- If you want to apply a function to every value, you do not have to loop through every element. Many operations are vectorized
- x <- c(1,2,3)
y <- x + 1
y now has values (2,3,4)
- Therefore, in R, the most basic data structure is an atomic vector
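The following sketch illustrates both claims under simple assumptions (the variable names are made up for illustration): a single value is already a vector of length 1, and a vectorized operation gives the same result as an explicit loop.

```r
# A single value is already a vector of length 1
one_name <- "Leon"
is.vector(one_name)    # TRUE
length(one_name)       # 1

# Storing two values uses the same kind of object, just longer
two_names <- c("Leon", "Kim")
length(two_names)      # 2

# Vectorized: the addition is applied to every element at once
x <- c(1, 2, 3)
y <- x + 1             # 2 3 4

# The equivalent explicit loop, shown only for comparison
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 1
}
identical(y, y_loop)   # TRUE
```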
- c() combines multiple vectors into one vector
- c(1,2,3)
- c(c(1), c(2,3))
- c(x, y)
- Because vectors can have different lengths, the result of an operation can depend on length
- c(1,2,3) + 1
- c(1,2,3) + c(1) exactly the same as above
- c(1,2,3) + c(1,2) gives a warning (not an error) because the lengths don’t line up, which doesn’t make sense, even in real life! R still recycles the shorter vector and returns 2 4 4
Warning message:
In c(1, 2, 3) + c(1, 2) :
longer object length is not a multiple of shorter object length
- c(1,2,3,4) + c(1,2)
- 2 4 4 6
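A runnable sketch of the combining and recycling behavior listed above (variable names are illustrative). Note that in R the mismatched-length case produces a warning and still returns a recycled result; suppressWarnings() is used here only so the result can be inspected quietly.

```r
# c() flattens its arguments into a single vector
flat <- c(c(1), c(2, 3))
identical(flat, c(1, 2, 3))                # TRUE

# A length-1 vector is recycled across the longer one
plus_one   <- c(1, 2, 3) + 1               # 2 3 4
plus_one_v <- c(1, 2, 3) + c(1)            # 2 3 4

# Length 4 is a multiple of length 2, so recycling is silent
even_recycle <- c(1, 2, 3, 4) + c(1, 2)    # 2 4 4 6

# Length 3 is not a multiple of length 2: R warns but still recycles
odd_recycle <- suppressWarnings(c(1, 2, 3) + c(1, 2))   # 2 4 4
```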
R list
- R does have a “list”, but it is slightly different from lists in other programming languages
- Since every value can have a length of 1 or more, the difference between a vector and a list is that a list can hold a different type in each element
- i.e. vectors are homogeneous
- c(1,2,3) c("leon", "kim", "loves", "R")
- i.e. lists are heterogeneous
- list("PhillyR", 2018)
- list(c(1,2,3),
c("leon", "kim", "loves", "R")
)
- list(list("4", 6),
c("leon", "kim", "loves", "R")
)
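To make the homogeneous versus heterogeneous distinction concrete, here is a short sketch (variable names are illustrative): mixing types inside c() silently converts everything to one type, while list() keeps each element as it was.

```r
# A vector must hold one type, so the number is converted to a string
mixed_vector <- c("PhillyR", 2018)
typeof(mixed_vector)        # "character"
mixed_vector[2]             # "2018"

# A list keeps the original type of each element
mixed_list <- list("PhillyR", 2018)
typeof(mixed_list[[1]])     # "character"
typeof(mixed_list[[2]])     # "double"

# List elements can themselves be vectors or lists
nested <- list(list("4", 6),
               c("leon", "kim", "loves", "R"))
length(nested)              # 2
typeof(nested[[1]])         # "list"
length(nested[[2]])         # 4
```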
named vectors and lists
- Each element in a vector or list can have a “name”.
- If no name is specified:
- vector: no names displayed, only values
- list: the default label is the index #, shown between [[ ]]
- If a name is specified:
- vector: the name is displayed on top of its value
- list: the name is displayed after $ in place of [[ ]]
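A brief sketch of naming in practice (the names and values are made up for illustration): names are optional, and once present they can be used to look up elements in both vectors and lists.

```r
# Unnamed vector: names() is NULL and only values are printed
plain <- c(1, 2)
is.null(names(plain))       # TRUE

# Named vector: elements can be looked up by name
scores <- c(leon = 1, kim = 2)
names(scores)               # "leon" "kim"
scores[["kim"]]             # 2

# Unnamed list: elements are reached by index with [[ ]]
unnamed_list <- list("PhillyR", 2018)
unnamed_list[[1]]           # "PhillyR"

# Named list: elements can be reached with $ or [[ ]]
event <- list(group = "PhillyR", year = 2018)
event$year                  # 2018
event[["group"]]            # "PhillyR"
```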
Let’s get 2-Dimensional
- Vectors and lists are 1-Dimensional. They grow in length only.
- Most of the data that we are familiar with is 2-Dimensional (i.e. rectangular)
- In R, this is represented as a vector (or a list) with row and column lengths (i.e. 2D)
- i.e. a matrix is homogeneous
- matrix(1:9, nrow = 3, ncol = 3)
- i.e. a data.frame is heterogeneous
- data.frame(x, y, z)
- where x, y, z are vectors that must ...
- have the same length (or at least lengths that are multiples of each other)
- but x, y, z can be of different types
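The two rectangular structures can be sketched as follows. The column names and values are illustrative assumptions, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version's default.

```r
# A matrix is a vector with dimensions; it is filled column by column
m <- matrix(1:9, nrow = 3, ncol = 3)
dim(m)          # 3 3
typeof(m)       # "integer" (every cell has the same type)
m[2, 3]         # 8

# A data.frame holds columns of different types with the same length
df <- data.frame(x = c(1, 2, 3),
                 y = c("a", "b", "c"),
                 z = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
dim(df)         # 3 3
column_classes <- sapply(df, class)   # numeric, character, logical

# A shorter column is recycled when its length divides the longer one
recycled <- data.frame(id = 1:4,
                       group = c("a", "b"),
                       stringsAsFactors = FALSE)
nrow(recycled)  # 4
```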
Let’s get 2-Dimensional
- A data.frame can have special columns
- This behaves suspiciously like a list
- An element in a list can be a list
- This is because a data.frame is a list with constraints applied to it
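A minimal sketch of the "data.frame is a list" idea. The list column at the end is an assumption about what a "special column" might look like; the slides do not say which kind was shown, and the names here are illustrative.

```r
df <- data.frame(x = c(1, 2),
                 y = c("a", "b"),
                 stringsAsFactors = FALSE)

# Underneath, a data.frame is a list whose elements are the columns
is.list(df)        # TRUE
typeof(df)         # "list"
length(df)         # 2 (the number of columns, not rows)

# Columns are reached the same way as list elements
df[["y"]]          # "a" "b"
df$x               # 1 2

# Assumed example of a special column: a list column, one element per row
df$z <- list(c(1, 2, 3), "text")
class(df$z)        # "list"
length(df$z[[1]])  # 3
```

The constraint that makes this list a data.frame is that every column must describe the same number of rows.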

PhillyR 18-19 Kickoff - Data Structure Intro
