Advanced Data Analytics: Moving Data Around
Jeffrey Stanton
School of Information Studies
Syracuse University

R and the File System
• R maintains a current working directory to simplify the process of reading and saving files
getwd()             # Shows the pathname of the current folder
setwd("pathname")   # Sets a new path
history()           # Shows the most recent commands
# Creates a CSV file using data from a data frame
write.table(dataFr, sep=",", file="filename.csv")
# Reads a CSV file into a data frame
targetFrame = read.table("filename.csv", sep=",")

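As a rough illustration of the round trip described on this slide, the sketch below writes a small data frame to a CSV file and reads it back. The contents of dataFr and the use of tempdir() are assumptions made for the example, not part of the original slide. The row.names=FALSE and header=TRUE arguments are one possible way to keep the column headings aligned when the file is also opened in Excel.

```r
# Hypothetical example data (not from the slides)
dataFr <- data.frame(Subject = 1:6, Code = c(1, 0, 1, 0, 1, 0))

# Work in a temporary folder so the sketch does not touch real files;
# setwd() returns the previous working directory, which we keep to restore later
oldDir <- setwd(tempdir())
getwd()   # now shows the temporary folder

# row.names = FALSE keeps R's row numbers out of the file
write.table(dataFr, file = "filename.csv", sep = ",", row.names = FALSE)

# header = TRUE tells R that the first line holds the column names
targetFrame <- read.table("filename.csv", sep = ",", header = TRUE)
str(targetFrame)

# Restore the original working directory
setwd(oldDir)
```

Without row.names=FALSE, write.table() also writes a column of row numbers; R can usually read that layout back, but other programs may show the headings shifted by one column.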
R and the Windows Clipboard
• For small chunks of data, it may be convenient to “cut and paste”
• Create a small rectangle of data in Excel and copy it to the clipboard
• Then, in R:
> read.DIF("clipboard", transpose=TRUE)
  V1 V2
1  1  1
2  2  0
3  3  1
4  4  0
5  5  1
6  6  0

Include Variable Names
• You can pull in the variable names (the column headings) as well
• Then, in R:
> read.DIF("clipboard", transpose=TRUE, header=TRUE)
  Subject Code
1       1    1
2       2    0
3       3    1
4       4    0
5       5    1
6       6    0

An Explanation of Data Frames
• Every single piece of data in R is a “vector”: a list of “scalar” values all of the same mode
  – Scalar just means a single element or value, like the number 5
  – R vectors can have any number of elements, including just one; so a scalar is stored as a vector of length one
  – The mode of a vector can be numeric, character, or logical
• Just like Excel spreadsheets and other data programs like SPSS, vectors in R can be two-dimensional, with a certain number of columns and a certain number of rows; a two-dimensional vector is called a matrix
• But, being a vector, a matrix has to contain elements all of the same mode, so a matrix cannot always hold a typical spreadsheet or data set, because these often have a different type in each column
• This is where the data frame comes in: a data frame is a list of vectors, all of the same length, each of which can be a different type

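The difference between a matrix and a data frame can be seen in a short sketch. The variable names and values here are made up for illustration; the point is only that a matrix forces a single mode while a data frame keeps one type per column.

```r
# A "scalar" in R is simply a vector of length one
length(5)        # 1

# Hypothetical columns: one numeric, one character
subject <- 1:3
code    <- c("a", "b", "c")

# A matrix holds a single mode, so the numbers are coerced to character
m <- cbind(subject, code)
mode(m)          # "character"

# A data frame is a list of equal-length vectors, each with its own type
df <- data.frame(subject = subject, code = code, stringsAsFactors = FALSE)
sapply(df, class)   # subject is integer, code is character
```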
read.DIF also works with files
> setwd("C:/DataMining/DataFiles")
> newDF = read.DIF("excelExport.dif", transpose=TRUE, header=TRUE)
> class(newDF)
[1] "data.frame"
> attach(newDF)
# Note that Excel, DIF, and R
# don't always agree on data
# formats. For example, currency
# in Excel will not export to
# integer values in R, so remove
# as much formatting as possible.

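If formatted currency does make it into R, it typically arrives as text rather than numbers. The sketch below shows one possible clean-up, assuming a hypothetical column of dollar amounts with "$" signs and thousands separators; the values are invented for the example, and removing the formatting in Excel before exporting remains the simpler route.

```r
# Hypothetical values as they might look after importing formatted currency cells
price <- c("$1,200.50", "$35.00", "$7")
class(price)       # "character", so arithmetic on it would fail

# Strip dollar signs and commas, then convert the text to numbers
priceNum <- as.numeric(gsub("[$,]", "", price))
priceNum
```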
Demonstrating Mastery
• Create or find data in an Excel spreadsheet and export as a CSV file
• Import data into R from a CSV or TXT file
• Export a data frame into a CSV file
• Read the CSV file into Excel
• Advanced: Use data interchange format (“DIF”) to exchange files between R and Excel
• Advanced: Use a data frame in R to store data obtained from a spreadsheet