2. As mentioned earlier, R takes the lead in data analysis by
supporting many data formats and sources, including SAS, SPSS, SQL, Excel,
CSV, the Twitter API and so on.
For these formats, R has various methods to read
the data into R.
Some of the commonly used methods of reading data in R:
1. scan
2. read.table
3. read.delim
4. read.csv
5. read.csv.sql (from the sqldf package)
6. fread (from the data.table package)
R Import Functions
Rupak Roy
3. 1. scan(): reads data into a vector or list from the console or a file.
> s <- scan("testing.csv", what = "", sep = ",", skip = 1)
where,
what = "" (reads the values as character strings; without it scan() throws an error on non-numeric data, since the default type is numeric)
sep = "," (the delimiter that separates the values on each line)
skip = 1 (skips the first line of the file)
For more information we can always use > ?scan
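As a small illustration of the scan() call above, the sketch below writes a hypothetical three-line CSV to a temporary file (the file contents and the variable name path are assumptions made for this example, not part of the original slides) and reads it back as a character vector.

```r
# Hypothetical sample file, written to a temporary location for this example
path <- tempfile(fileext = ".csv")
writeLines(c("name,city", "Asha,Pune", "Ravi,Delhi"), path)

# what = "" reads character data; skip = 1 drops the header line;
# quiet = TRUE suppresses the "Read N items" message
s <- scan(path, what = "", sep = ",", skip = 1, quiet = TRUE)
print(s)
```

scan() returns one flat vector rather than a table, so the row structure of the file is lost; the functions that follow keep it.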
2. read.table(): reads a file in table format and creates a data frame from it.
> t <- read.table(file.choose(), header = TRUE, sep = ",", stringsAsFactors = FALSE)
stringsAsFactors = FALSE loads the string variables as character vectors without converting
them into factors.
> str(t) # observe the structure with and without stringsAsFactors = FALSE
# to read a tab-delimited file
> t <- read.table(file.choose(), header = TRUE, sep = "\t")
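To see the effect of the factor conversion without picking a file interactively, the following sketch reads the same hypothetical file twice (the sample data and the names t_chr and t_fac are assumptions for illustration). Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so the argument is spelled out in both calls.

```r
# Hypothetical sample file, written to a temporary location for this example
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Asha,91.5", "Ravi,78"), path)

# Same file, read with and without conversion of strings to factors
t_chr <- read.table(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
t_fac <- read.table(path, header = TRUE, sep = ",", stringsAsFactors = TRUE)

str(t_chr)  # name is a character column
str(t_fac)  # name is a factor column
```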
4. 3. read.delim(): useful for reading delimited files such as tab (\t) separated files and
comma (,) separated files. It is just a wrapper function of read.table() whose default separator is the tab.
> r <- read.delim(file.choose(), sep = ",", header = TRUE)
4. read.csv(): the most commonly used function to read a CSV file. Again it
is a wrapper function of read.table(); its default separator is the comma, so sep = ";" below reads a semicolon-separated file.
> c <- read.csv(file.choose(), sep = ";", header = TRUE, dec = ".")
where,
dec = the character used in the file for decimal points.
Note: we can use "?" as in ?read.delim or ?read.csv to learn more about the
parameters and the features of each function.
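The two wrappers differ mainly in their defaults. The sketch below is a minimal example on two hypothetical temporary files (file contents and the names tab_path, semi_path and prices are assumptions): a tab-separated file read with the read.delim() defaults, and a semicolon-separated file that uses a comma as the decimal mark, read with read.csv() by setting sep and dec.

```r
# Hypothetical tab-separated file; read.delim() defaults to sep = "\t", header = TRUE
tab_path <- tempfile(fileext = ".txt")
writeLines(c("id\tprice", "1\t2.5", "2\t3.75"), tab_path)
r <- read.delim(tab_path)

# Hypothetical semicolon-separated file with "," as the decimal mark
semi_path <- tempfile(fileext = ".csv")
writeLines(c("id;price", "1;2,5", "2;3,75"), semi_path)
prices <- read.csv(semi_path, sep = ";", dec = ",")

str(r)
str(prices)
```

Both calls should produce the same numeric price column; only the file conventions differ.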
5. R Import Functions
5. readLines(): reads some or all text lines from a connection, returning one character string per line.
> readLines(con = stdin(), n = -1L, ok = TRUE, warn = TRUE, encoding = "unknown",
skipNul = FALSE)
where,
con = a connection object or a character string, for example a file path.
n = the (maximal) number of lines to read. Negative values indicate that one should
read up to the end of input on the connection.
ok = logical: is it OK to reach the end of the connection before n > 0 lines are
read? If not, an error will be generated.
warn = logical: warn if a text file is missing its final end-of-line marker or if there are
embedded nuls in the file.
encoding = encoding to be assumed for input strings. The default is "unknown"; the value is used to mark
the strings as Latin-1 or UTF-8 and does not re-encode the input.
skipNul = logical: should embedded nuls be skipped?
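A short sketch of the con and n arguments in practice, using a hypothetical three-line text file in a temporary location (the file contents and variable names are assumptions for illustration):

```r
# Hypothetical text file, written to a temporary location for this example
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

# con given as a file path; the default n = -1L reads to the end of the file
all_lines <- readLines(path)

# n = 2 reads at most the first two lines
first_two <- readLines(path, n = 2)

print(all_lines)
print(first_two)
```

Each element of the result is one line of text with no further parsing, which makes readLines() a reasonable first step for inspecting a file whose delimiter is not yet known.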
6. R Import Functions
• R treats/identifies missing values in different ways:
For numeric variables – empty fields are read as NA.
For character variables – missing values should be hard-coded as NA; an empty field is read as an empty string.
R also recognizes the special numeric values NaN, Inf and -Inf. Of these, only NaN is reported as missing by is.na().
If the data marks missing values in some other way,
R also provides the na.strings argument, e.g. na.strings = "", to name the strings that should be read as NA.
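The sketch below shows one way to declare extra missing-value markers with the na.strings argument of read.csv(). The sample file and the markers "missing" and "-999" are hypothetical choices for this example.

```r
# Hypothetical file that marks missing values in several different ways
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,city",
             "Asha,34,Pune",
             "Ravi,,missing",
             "NA,-999,Delhi"), path)

# Treat "NA", empty fields, "missing" and "-999" as missing values
d <- read.csv(path, na.strings = c("NA", "", "missing", "-999"),
              stringsAsFactors = FALSE)
print(d)

# How is.na() treats the special numeric values
special <- is.na(c(NaN, Inf, -Inf))
print(special)
```

Listing the markers explicitly keeps the age column numeric; without "-999" in na.strings the placeholder would be read as a real value.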
7. R Import Functions
Special parameters:
strip.white = TRUE
Strips extra leading and trailing white space from unquoted character fields.
blank.lines.skip = TRUE
Ignores the blank lines in the input.
fill = TRUE
Pads rows of unequal length with blank fields.
For comments use comment.char; comment.char = "" turns off the interpretation of comments.
We can always use ?read.csv or ?read.table to get the full list of parameters available for each function.
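These parameters can be combined in a single call. The sketch below uses a hypothetical messy file (padded strings, a blank line and a short row, all invented for this example) to show what each one does.

```r
# Hypothetical messy file: padded strings, a blank line and a short last row
path <- tempfile(fileext = ".csv")
writeLines(c("name,city,zip",
             "  Asha  ,Pune,411001",
             "",
             "Ravi,Delhi"), path)

d <- read.table(path, header = TRUE, sep = ",",
                strip.white = TRUE,       # trim spaces around unquoted fields
                blank.lines.skip = TRUE,  # ignore the empty line
                fill = TRUE,              # pad the short row with blank fields
                comment.char = "",        # do not treat any character as a comment
                stringsAsFactors = FALSE)
print(d)
```

The padded blank field in the numeric zip column is read as NA, so the short row survives as an incomplete record instead of causing an error.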
8. Next:
How to export our analysis or any data from R
objects into a CSV or other delimited file
Import & Export Data
in R
Rupak Roy