Reading Data into R
1. Reading Structured Data into R
Srikanth Potukuchi
Analytics Consultant
srikanth.potukuchi@gmail.com
April 21, 2018
Srikanth Potukuchi (Consultant Analytics) Reading Data April 21, 2018 1 / 26
2. Overview
1 Introduction
2 Delimited Files
3 Excel Files
4 Reading from Database
5 Data from Other Statistical Tools
6 Getting Website Data
7 Summary
3. Introduction
Introduction-Focus on Structured Data
Data comes in many forms: structured and unstructured. In this
deck, we focus on structured data, which is organized in rows and
columns (also known as tabular data).
We will look at reading delimited files (CSV, tab, and special
delimiters), Excel files, databases (MS Access and Oracle), and data from other
statistical tools such as SAS, SPSS, Stata, Octave, Minitab, and Systat.
Finally, we will look at extracting data from websites.
4. Introduction
Introduction-Installing Packages
Install a package in R with install.packages("package_name"). Apart
from the base packages, which load automatically, every package
has to be loaded with library(package_name) or require(package_name)
before use. Here's an example:
> install.packages("RODBC")
> library(RODBC)
5. Introduction
Data Included with R.
Suppose we install a package, say ggplot2.
Which datasets are included in this package?
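One way to answer this question is `data(package = ...)`, which lists the datasets bundled with an installed package. The sketch below uses the `datasets` package that ships with R so that it runs anywhere; substituting "ggplot2" assumes that package is already installed.

```r
# List the datasets bundled with a package.
listing <- data(package = "datasets")

# $results is a character matrix with columns Package, LibPath, Item, Title
dataset_names <- listing$results[, "Item"]
head(dataset_names)

# Load one of the listed datasets into the workspace by name
data("mtcars", package = "datasets")
dim(mtcars)
```

The same call with `package = "ggplot2"` would list that package's datasets once it is installed.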
6. Delimited Files
Delimited Files-Comma Separated File (CSV)
CSV files are usually read using read.csv() or read.csv2(). The two
differ only in their default field separator and decimal mark. The
basic syntax is:
> read.csv(file, header = TRUE, sep = ",", quote = "\"",
    dec = ".", fill = TRUE, comment.char = "", ...)
> read.csv2(file, header = TRUE, sep = ";", quote = "\"",
    dec = ",", fill = TRUE, comment.char = "", ...)
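As a minimal sketch of the difference between the two defaults, the same small table can be written in both conventions and read back. The sample values are made up for illustration, and the `text` argument is used only to avoid creating files.

```r
# The same table in two conventions (sample values are made up)
comma_text <- "name,price\nwidget,1.5\ngadget,2.25"   # "," separator, "." decimal
semi_text  <- "name;price\nwidget;1,5\ngadget;2,25"   # ";" separator, "," decimal

us <- read.csv(text = comma_text)    # defaults: sep = ",", dec = "."
eu <- read.csv2(text = semi_text)    # defaults: sep = ";", dec = ","

# Both calls should yield a numeric price column with the same values
str(us)
str(eu)
```

Reading the semicolon-delimited text with read.csv() instead would produce a single column, which is a quick way to spot a mismatched reader.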
7. Delimited Files
Delimited Files-Comma Separated File (CSV)
Large CSV files are usually read using read.table(). The basic
syntax is:
> read.table(file, header = FALSE, sep = "", quote = "\"'",
    dec = ".", stringsAsFactors = default.stringsAsFactors(), ...)
Reading a CSV file from a website
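The slide's example is not reproduced here, so the following is only a sketch of the idea: read.table() accepts a URL in place of a file name. The helper name `read_delimited` and the example address are hypothetical. `stringsAsFactors` is passed explicitly because its default has changed between R versions.

```r
# Read delimited data with read.table(); `src` may be a local path or a URL
# such as "https://example.com/data.csv" (hypothetical address).
read_delimited <- function(src, sep = ",") {
  read.table(src, header = TRUE, sep = sep, stringsAsFactors = FALSE)
}

# Offline demonstration: write a small made-up file, then read it back
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20"), tmp)
small <- read_delimited(tmp)
small
```

Calling `read_delimited()` with a real URL works the same way, provided the machine has network access.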
8. Delimited Files
Delimited Files-Comma Separated File (CSV)
The result of read.table() is a data.frame. On Windows, the file path
must use forward slashes (/) or doubled backslashes (\\); a single
backslash gives an error because R treats it as an escape character.
Reading a CSV file locally
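A small sketch of reading a local file follows. The file name and values are made up, and `file.path()` is used so the path separator never has to be typed by hand.

```r
# Build a path without typing separators; file.path() joins parts with "/"
csv_path <- file.path(tempdir(), "sales.csv")

# Create a small made-up file so the example is self-contained
write.csv(data.frame(region = c("north", "south"), units = c(12, 7)),
          csv_path, row.names = FALSE)

sales <- read.csv(csv_path)
class(sales)   # read.csv() and read.table() both return a data.frame

# A literal Windows path would be written as "C:/data/sales.csv" or
# "C:\\data\\sales.csv" (hypothetical location).
```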
9. Excel Files
Excel Files
Excel data is harder to read into R than delimited text.
The simplest method is to convert the Excel file to a CSV file.
Packages that address this problem:
gdata
XLConnect
xlsReadWrite
Reading an Excel file locally: set the path
10. Excel Files
Excel Files
Use the function readWorksheetFromFile() from the XLConnect package.
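The call can be sketched as below, assuming XLConnect (which needs a Java runtime) is installed. The wrapper name `read_first_sheet` and the workbook path in the usage comment are hypothetical.

```r
# Thin wrapper around XLConnect::readWorksheetFromFile().
# Assumes the XLConnect package and a Java runtime are available.
read_first_sheet <- function(path, sheet = 1) {
  if (!requireNamespace("XLConnect", quietly = TRUE)) {
    stop("Install XLConnect first: install.packages(\"XLConnect\")")
  }
  # `sheet` may be a sheet index or a sheet name; the result is a data.frame
  XLConnect::readWorksheetFromFile(path, sheet = sheet)
}

# Usage (hypothetical workbook path):
# sales <- read_first_sheet("C:/data/sales.xlsx")
```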
11. Reading from Database
Reading from Database
To read from databases such as Microsoft SQL Server, DB2, MySQL,
Oracle, or Microsoft Access, we must provide an ODBC
connection.
We need the RODBC package to use ODBC in R.
The first step is to create a DSN (data source name); how this is done differs by OS.
ODBC Connection
12. Reading from Database
Reading from Database
Download necessary drivers if not present here:
13. Reading from Database
Reading from MS Access Database
Add the data source:
14. Reading from Database
Reading from MS Access Database
Use odbcConnect() function:
Use sqlQuery() function:
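The two calls fit together roughly as follows. This is a sketch that assumes RODBC is installed and that a DSN has already been configured. The wrapper name `query_dsn`, the DSN "SalesDB", and the table name in the usage comment are hypothetical.

```r
# Open a connection through a DSN, run one query, and always close the channel.
query_dsn <- function(dsn, sql) {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("Install RODBC first: install.packages(\"RODBC\")")
  }
  channel <- RODBC::odbcConnect(dsn)
  # odbcConnect() returns -1 (with a warning) when the connection fails
  if (!inherits(channel, "RODBC")) {
    stop("Could not open ODBC data source: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, sql, stringsAsFactors = FALSE)
}

# Usage (hypothetical DSN and table):
# orders <- query_dsn("SalesDB", "SELECT * FROM Orders")
```

Closing the channel in `on.exit()` keeps the connection from leaking if the query fails.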
15. Reading from Database
Reading from Oracle Database
Use this link for details on reading from an Oracle database:
16. Data from Other Statistical Tools
Data from Other Statistical Tools
List of packages and functions:
Reading from Systat:
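Since the slide's Systat example is not reproduced here, the sketch below shows the same pattern with the foreign package and a Stata file. Stata is used because foreign can also write that format, which makes a self-contained round trip possible. The data values are made up. read.systat(), read.spss(), read.octave(), and read.mtp() are called the same way, with a file path.

```r
library(foreign)   # a recommended package, normally installed with R

# Made-up data for the round trip
scores <- data.frame(id = 1:3, score = c(90.5, 85, 77.25))

# Write a Stata .dta file, then read it back as a data.frame
dta_path <- tempfile(fileext = ".dta")
write.dta(scores, dta_path)
back <- read.dta(dta_path)
back

# A Systat file would be read in the same style (hypothetical path):
# systat_data <- read.systat("C:/data/survey.syd")
```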
17. Getting Website Data
Simple HTML Tables
Use XML package:
18. Getting Website Data
Simple HTML Tables
Use readHTMLTable function:
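A minimal sketch of readHTMLTable(), assuming the XML package is installed. The HTML is an inline made-up table so the example needs no network access, and the helper name `read_tables` is hypothetical. For a live page, the parsed document would come from a URL or a downloaded file instead.

```r
# Made-up HTML containing one simple table
html <- "<html><body><table>
<tr><th>item</th><th>qty</th></tr>
<tr><td>apple</td><td>3</td></tr>
<tr><td>pear</td><td>5</td></tr>
</table></body></html>"

# Parse HTML text and return every <table> it contains as a list of data.frames
read_tables <- function(html_text) {
  if (!requireNamespace("XML", quietly = TRUE)) {
    stop("Install XML first: install.packages(\"XML\")")
  }
  doc <- XML::htmlParse(html_text, asText = TRUE)
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Usage, once the XML package is available:
# tables <- read_tables(html)
# tables[[1]]
```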
19. Getting Website Data
Use Regular Expressions to scrape the Web.
Presidents Data:
20. Getting Website Data
Use Regular Expressions to scrape the Web.
Use the stringr package: here we used str_split() to split a column based on a
pattern.
21. Getting Website Data
Use Regular Expressions to scrape the Web.
Use Reduce() to combine the rows into a Matrix.
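The split-then-combine steps can be sketched as follows, assuming stringr is installed. The three year ranges are a small illustrative sample, not the deck's full presidents data.

```r
library(stringr)

# Illustrative sample of a "years in office" column
years <- c("1789-1797", "1797-1801", "1801-1809")

# str_split() returns a list with one character vector per input element
parts <- str_split(years, pattern = "-")

# Reduce() applies rbind() pairwise, stacking the vectors into a matrix
year_matrix <- Reduce(rbind, parts)
rownames(year_matrix) <- NULL            # drop the row names rbind() adds
colnames(year_matrix) <- c("start", "stop")
year_matrix
```

`do.call(rbind, parts)` gives the same matrix in one call and can be faster on long lists.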
22. Getting Website Data
Use Regular Expressions to scrape the Web.
Final Results.
23. Summary
Packages and Functions
Data         Package     Functions
CSV          Base R      read.csv(), read.csv2(), read.table()
Excel        XLConnect   readWorksheetFromFile()
Database     RODBC       odbcConnect()
SAS          sas7bdat    read.sas7bdat()
All other    foreign     read.spss(), read.dta(), read.octave(),
                         read.mtp(), read.systat()
Table: R packages for reading structured data
24. Summary
RData files can be used across platforms: Windows, Mac, and Linux.
Use RData files to pass around data or any R objects, such as variables and
functions.
Create them with save() and restore them with load().
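A short sketch of the round trip follows. The object names, values, and file name are made up for illustration.

```r
# Two objects to pass around: a data.frame and a function
scores <- data.frame(id = 1:3, score = c(90, 85, 77))
double_it <- function(x) 2 * x

# Save both objects into a single RData file
rdata_path <- file.path(tempdir(), "session_objects.RData")
save(scores, double_it, file = rdata_path)

# Simulate a fresh session, then restore the objects under their original names
rm(scores, double_it)
restored <- load(rdata_path)   # load() returns the names of the restored objects
restored
```

Because load() restores objects under their saved names, it can silently overwrite objects of the same name in the current workspace.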
25. Summary
References
Jared P. Lander (2013)
R for Everyone: Advanced Analytics and Graphics
Addison-Wesley Data & Analytics Series.