ADVANCED DATA STRUCTURES
• data.frame - similar to an Excel spreadsheet - two dimensions - rows and columns
• matrix - two dimensions - rows and columns, all elements of the same type
• list - a container whose elements can be of different data types
• array - n-dimensional
R BASICS
DATA FRAMES
• Data frames are one of the most powerful features in R
DATA FRAMES
• Data Frames are like Excel Spreadsheets
• Each column is a variable, stored as a vector. All
columns have the same length.
• Each column holds a single data type, but
different columns can hold different data
types.
• Each row is an observation
Variable A   Variable B   Variable C
88.5         Red          True
90.3         Green        False
55.7         Yellow       True
125.0        Blue         True
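The table above can be rebuilt as a data frame. This is a minimal sketch: the object name df and the column names VariableA, VariableB and VariableC are assumptions made for illustration (R column names cannot easily contain spaces).

```r
# Each column is a vector of one type; all columns have length 4.
df <- data.frame(
  VariableA = c(88.5, 90.3, 55.7, 125.0),           # numeric column
  VariableB = c("Red", "Green", "Yellow", "Blue"),  # text column
  VariableC = c(TRUE, FALSE, TRUE, TRUE)            # logical column
)

# Show the structure: one line per column, with its type.
str(df)
```

Each row of df is one observation, and each column keeps its own type.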
DATA FRAMES
• Data Frames are created using the
"data.frame()" function
• Use "nrow()" to check the number of rows in a
data frame
• Use "ncol()" to check the number of columns in
a data frame
• Use "dim()" to check the number of rows and
columns in a data frame
• Use "names()" to look at the column names of a
data frame
• Use "rownames()" to check the names of each
row.
• Use rownames(theDataFrame) <- c("One", "Two", "Three")
(one name per row) to assign names to the individual
rows in a data frame
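The inspection functions listed above can be tried on a small data frame. This is a sketch only: theDataFrame and its column names follow the names used later in the slides, and the values are made up for illustration.

```r
# Hypothetical data: three rows, two columns (values are illustrative only).
theDataFrame <- data.frame(
  Currency = c("USD", "EUR", "JPY"),
  ExchRate = c(1.00, 0.92, 150.0)
)

nrow(theDataFrame)      # 3
ncol(theDataFrame)      # 2
dim(theDataFrame)       # 3 2 (rows, then columns)
names(theDataFrame)     # "Currency" "ExchRate"
rownames(theDataFrame)  # "1" "2" "3" (the default row names)

# Assign one name per row.
rownames(theDataFrame) <- c("One", "Two", "Three")
rownames(theDataFrame)  # "One" "Two" "Three"
```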
DATA FRAMES
• To look at the first few rows of a data frame use
the "head()" function.
• To look at the last few rows of a data frame use
the function "tail()"
• You can use the "class()" function to check the
class of a data frame.
• The $ operator or [ ] is used to access individual columns
or cells of a data frame.
• Ex. theDataFrame$Currency # the column named Currency
• Ex. theDataFrame[3, 2] # the row is the first
argument and the column is the second
argument
• To access multiple columns by name make the
column argument a character vector of names
• theDataFrame[ , c("Currency", "ExchRate")]
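The access patterns above can be sketched as follows. The Country column and all values are assumptions added for illustration; only the names theDataFrame, Currency and ExchRate come from the slides.

```r
# Hypothetical data with four rows and three columns.
theDataFrame <- data.frame(
  Country  = c("United States", "Germany", "Japan", "United Kingdom"),
  Currency = c("USD", "EUR", "JPY", "GBP"),
  ExchRate = c(1.00, 0.92, 150.0, 0.79)
)

head(theDataFrame, n = 2)  # first two rows
tail(theDataFrame, n = 2)  # last two rows
class(theDataFrame)        # "data.frame"

theDataFrame$Currency      # one column, by name
theDataFrame[3, 2]         # row 3, column 2: the Currency value for Japan

# Several columns by name: leave the row argument empty.
twoColumns <- theDataFrame[, c("Currency", "ExchRate")]
twoColumns
```

Leaving the row position empty, as in the last call, keeps every row.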
LISTS
• A List is a container that can hold various data types.
• A List can contain all the same data types, a mix of data types, data frames, or other lists.
• Lists are useful for combining several objects, even of different kinds, into one object
• To create a list we use the "list()" function
• Ex. list("a", 2, theDataFrame)
• You can use the "names()" function to see the names of each of the elements in a list.
LISTS
• You can also assign names to the elements of a list while creating the list
• Ex. list5 <- list(TheDataFrame = theDataFrame, TheVector = 1:10, TheList = list3)
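Creating and naming lists can be sketched as below. The names list3, list5 and theDataFrame follow the slides; the contents of theDataFrame and the element names given to list3 are assumptions for illustration.

```r
# Hypothetical data frame to store inside the lists.
theDataFrame <- data.frame(Currency = c("USD", "EUR"), ExchRate = c(1.00, 0.92))

# An unnamed list mixing a string, a number and a data frame.
list3 <- list("a", 2, theDataFrame)
namesBefore <- names(list3)  # NULL, because no names were given

# Names assigned while creating the list.
list5 <- list(TheDataFrame = theDataFrame, TheVector = 1:10, TheList = list3)
names(list5)  # "TheDataFrame" "TheVector" "TheList"

# Names assigned after creation, with a character vector.
names(list3) <- c("Letter", "Number", "Frame")
names(list3)  # "Letter" "Number" "Frame"
```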
LISTS
• Access an element of a list using double square brackets [[ ]] and specify the element number or name. This allows for
accessing only one element at a time.
• Ex. list5[[1]]
• Ex. list5[["TheDataFrame"]]
• You can use the "length()" function to determine the length of a list
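A short sketch of list access, assuming a list5 built like the one on the previous slide (the data inside it is made up for illustration):

```r
# Hypothetical list with three named elements.
theDataFrame <- data.frame(Currency = c("USD", "EUR"), ExchRate = c(1.00, 0.92))
list5 <- list(TheDataFrame = theDataFrame, TheVector = 1:10, TheList = list("a", 2))

firstElement <- list5[[1]]              # by position: the data frame
theVector    <- list5[["TheVector"]]    # by name: the integer vector 1:10
sameVector   <- list5$TheVector         # $ also works for named elements

# Single brackets return a list containing the element, not the element itself.
asSubList <- list5["TheVector"]
class(asSubList)  # "list"

length(list5)     # 3
```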
MATRICES
• A matrix is similar to a data.frame in that it is rectangular, with rows and columns, except that every single element,
regardless of column, must be the same type, most commonly numeric.
• Matrices also act like vectors, with element-by-element addition, multiplication, subtraction, division and
equality.
• The nrow, ncol and dim functions work just like they do for data.frames
• Matrices are created using the "matrix()" function
• Ex. A <- matrix(1:10, nrow = 5) # create a 5x2 matrix
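Element-by-element behaviour can be sketched with two matrices of the same shape. The second matrix B and its values are assumptions added for illustration.

```r
A <- matrix(1:10, nrow = 5)   # 5x2, filled column by column
B <- matrix(21:30, nrow = 5)  # hypothetical second 5x2 matrix

dim(A)   # 5 2
nrow(A)  # 5
ncol(A)  # 2

sumAB     <- A + B   # adds matching elements
productAB <- A * B   # multiplies matching elements (not matrix multiplication)
equalAB   <- A == B  # compares matching elements, giving a logical matrix

sumAB[1, 1]      # 1 + 21 = 22
productAB[5, 2]  # 10 * 30 = 300
```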
MATRICES
• Matrix multiplication (the %*% operator in R) is a commonly used operation in mathematics, requiring the number of columns
of the left-hand matrix to be the same as the number of rows of the right-hand matrix.
• Matrix multiplication keeps the row names from the left matrix and the column names from the right matrix.
• Matrices can also have row and column names by using the rownames() and colnames() functions
• There are two special vectors, "letters" and "LETTERS", that contain the lowercase and uppercase
letters, respectively.
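The points above can be combined in one small sketch. The matrices and their values are assumptions chosen so that the shapes are compatible (3x2 times 2x4).

```r
A <- matrix(1:6, nrow = 3)  # 3 rows, 2 columns
B <- matrix(1:8, nrow = 2)  # 2 rows, 4 columns

# Name the rows of A and the columns of B using the built-in letter vectors.
rownames(A) <- letters[1:3]  # "a" "b" "c"
colnames(B) <- LETTERS[1:4]  # "A" "B" "C" "D"

# Columns of A (2) match rows of B (2), so the product is defined.
P <- A %*% B

dim(P)       # 3 4
rownames(P)  # taken from A: "a" "b" "c"
colnames(P)  # taken from B: "A" "B" "C" "D"
P["a", "A"]  # 1*1 + 4*2 = 9
```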
ARRAYS
• Arrays are multidimensional vectors.
• All elements of an array must be of the same type
• Elements of an array are accessed using the square brackets [ ]
• The first index is the row index
• The second is the column index
• Any remaining indices refer to the outer dimensions
• The difference between arrays and matrices is that matrices are restricted to 2 dimensions, while an array can have
as many dimensions as you want.
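A three-dimensional array can be sketched like this. The object name theArray and its dimensions are assumptions for illustration.

```r
# 12 values arranged as 2 rows x 3 columns x 2 outer slices.
theArray <- array(1:12, dim = c(2, 3, 2))

dim(theArray)         # 2 3 2

# Index order: row, column, then the outer dimension.
theArray[1, 2, 2]     # row 1, column 2, slice 2: the value 9

firstSlice <- theArray[, , 1]  # the first 2x3 slice, which is a matrix
dim(firstSlice)       # 2 3
```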
READING DATA INTO R
READING CSVS
• To read data from a CSV file, use the "read.table()" function
• The result of using read.table is a data.frame
• You can use the head() function to view the first few rows of data
READING CSVS
• Function arguments can be specified without the name of the argument (positionally), but
naming the arguments is good practice
• Ex. read.table(file = theUrl, header = TRUE, sep = ",")
• The second argument, header, indicates that the first row of data holds the column names.
• The third argument, sep, gives the delimiter separating data cells. Changing this to other values such
as "\t" (tab delimited) or ";" (semicolon delimited) enables it to read other types of files.
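To keep the example self-contained, this sketch writes a small CSV to a temporary file and reads it back, instead of reading from theUrl. The file contents and the names csvPath and rates are assumptions for illustration.

```r
# Write a tiny, made-up CSV file to a temporary location.
csvPath <- tempfile(fileext = ".csv")
writeLines(c("Currency,ExchRate", "USD,1.00", "EUR,0.92", "JPY,150.0"), csvPath)

# Named arguments: the first row is the header, cells are separated by commas.
rates <- read.table(file = csvPath, header = TRUE, sep = ",")

class(rates)  # "data.frame"
head(rates)   # view the first few rows
```

A URL string can be passed as the file argument in the same way as the local path used here.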
READING CSVS
• The readr package has functions for reading text files
• The most common function is read_delim, which reads delimited files such as CSVs
• read_delim returns a tibble, which is an extension of data.frame.
• read_delim is faster than read.table
• The functions read_csv, read_csv2 and read_tsv are special cases for when the delimiters are commas
(,), semicolons (;) and tabs (\t), respectively.
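A hedged sketch of read_delim, assuming the readr package is installed; the code skips the read if it is not. The temporary file and its contents are made up for illustration.

```r
# Write a tiny, made-up CSV file to a temporary location.
csvPath <- tempfile(fileext = ".csv")
writeLines(c("Currency,ExchRate", "USD,1.00", "EUR,0.92"), csvPath)

rates <- NULL
if (requireNamespace("readr", quietly = TRUE)) {
  # delim names the separator; suppressMessages hides the column-type report.
  rates <- suppressMessages(readr::read_delim(csvPath, delim = ","))
  print(class(rates))  # the class vector of a tibble includes "data.frame"
}
```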
EXCEL DATA
• The readxl package, by Hadley Wickham, makes reading Excel files, both .xls and .xlsx, easy.
• The main function is read_excel, which reads the data from a single Excel sheet.
• Unlike read.table, read_delim and fread (from the data.table package), read_excel cannot read data directly from the Internet, and
thus the files must be downloaded first
OTHER DATA FORMATS
• Connecting to a database is one of the most common ways to get data into R.
• Typically, the connection to the database is made using ODBC
• There are many packages that will create the connections needed
• Data can also come from other statistical tools such as SAS or SPSS
• The "foreign" package has tools that read data files produced by commonly used statistical tools
FUNCTIONS
• A function is a set of statements organized together to perform a specific task
• Functions are used to make code reusable and maintainable
• Functions can be built-in or user-defined
FUNCTION COMPONENTS
• Function Name – The actual name of the function
• Arguments – a placeholder. When a function is invoked, you pass a value to an argument.
• Arguments are optional. A function may contain no arguments.
• Arguments can have default values
• Function Body - contains all the statements that define what the function does
• Return Value – the return value of a function is the last expression in the function to be evaluated
WRITING FUNCTIONS
• Functions are assigned to objects like any other variable.
• newFunction <- function()
• The parentheses following "function" can be empty or contain one or many arguments
• Each argument in a function is separated by a comma
• newFunction <- function(x = 1, y = "book2", z = FALSE)
WRITING FUNCTIONS
• The body of the function is enclosed in opening and closing braces (they may be omitted only when the body is a single expression).
• newFunction <- function()
{
}
• The commands of the function are placed between the opening and closing braces.
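Putting the pieces together, a complete function might look like the sketch below. The function name describeBook and its arguments are hypothetical, chosen only to show a name, arguments with default values, a body and a return value.

```r
# Name: describeBook. Arguments: title, copies, inStock (all with defaults).
describeBook <- function(title = "book2", copies = 1, inStock = FALSE) {
  # Body: the statements between the braces.
  status <- if (inStock) "in stock" else "out of stock"
  # The last expression evaluated is the return value.
  paste0(title, " x", copies, " (", status, ")")
}

describeBook()                              # "book2 x1 (out of stock)"
describeBook("R Basics", 3, inStock = TRUE) # "R Basics x3 (in stock)"
```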
CALLING FUNCTIONS
• Functions are called using the function name followed by opening and closing parentheses.
• Example function call: newFunction()
• The value of an argument can be supplied during a function call either by position or by name
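Calling by position and by name can be compared with a small sketch. The function power and its arguments base and exponent are hypothetical.

```r
# Hypothetical function with one required argument and one default.
power <- function(base, exponent = 2) {
  base ^ exponent
}

power(3)                       # by position, default exponent: 9
power(2, 5)                    # both by position: 32
power(exponent = 3, base = 2)  # by name, in any order: 8
```

Naming the arguments makes the call independent of their order in the definition.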
RETURN VALUES
• A function is typically written to return a value based on some calculation.
• 1 + 2 = 3
• 3 is the returned value of the addition of 1 + 2
• A common best practice is that if a value is to be returned, the function body should say so
explicitly by using the return() command.
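The two styles of returning a value can be sketched side by side. The function names addExplicit and addImplicit are hypothetical.

```r
# Explicit: return() states which value leaves the function.
addExplicit <- function(a, b) {
  total <- a + b
  return(total)
}

# Implicit: the last evaluated expression is returned.
addImplicit <- function(a, b) {
  a + b
}

addExplicit(1, 2)  # 3
addImplicit(1, 2)  # 3
```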
PIPES
• Pipes are a newer way to call functions in R
• The pipe operator %>% comes from the "magrittr" package
• A pipe takes the value or object on its left side and inserts it into the first argument of the
function on its right side.
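The idea can be sketched without installing anything by using base R's native pipe |>, which is a substitution for the magrittr pipe and assumes R 4.1 or later. For a simple chain like this one, the magrittr version is written the same way with %>% in place of |>.

```r
values <- c(1, 4, 9)

# Nested call: read from the inside out.
nested <- sum(sqrt(values))

# Piped call: values becomes the first argument of sqrt(),
# and that result becomes the first argument of sum().
piped <- values |> sqrt() |> sum()

# With magrittr loaded, the equivalent line would be:
#   values %>% sqrt() %>% sum()

nested  # 6
piped   # 6
```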

DataStructures.pptx


Editor's Notes

  • #6 Look at the R file and talk about using the brackets with the $ to access columns and rows
  • #8 Look at the R file and talk about using the brackets with the $ to access columns and rows