Reading Structured Data into R
Srikanth Potukuchi
Analytics Consultant
srikanth.potukuchi@gmail.com
April 21, 2018
Srikanth Potukuchi (Consultant Analytics) Reading Data April 21, 2018 1 / 26
Overview
1 Introduction
2 Delimited Files
3 Excel Files
4 Reading from Database
5 Data from Other Statistical Tools
6 Getting Website Data
7 Summary
Introduction
Introduction-Focus on Structured Data
Data comes in many forms, structured and unstructured. This
deck focuses on structured data, that is, data arranged in
rows and columns (also known as tabular data).
We will look at reading delimited files (CSV, tab, and special
delimiters), Excel files, databases (MS Access and Oracle), and data from other
statistical tools such as SAS, SPSS, Stata, Octave, Minitab, and Systat.
Finally, we will look at extracting data from websites.
Introduction-Installing Packages
A package is installed with install.packages("Your Package Name").
Except for the packages attached at startup (base, utils, stats, and so on),
a package must then be loaded in each session with library(Your Package Name)
or require(Your Package Name). Here's an example:
> install.packages("RODBC")
> library(RODBC)
Data Included with R.
Suppose we install a package, say ggplot2.
Which datasets are included in this package?
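One way to answer this question is data(package = ...). The sketch below queries the datasets package because it ships with every R installation; the same call with "ggplot2" should work once that package is installed. The variable names are illustrative only.

```r
# List the datasets bundled with a package. "datasets" ships with R;
# replace it with "ggplot2" once that package is installed.
info <- data(package = "datasets")

# info$results is a character matrix with the columns
# "Package", "LibPath", "Item" and "Title".
bundled <- as.data.frame(info$results[, c("Item", "Title")],
                         stringsAsFactors = FALSE)

# Show the first few dataset names and descriptions.
head(bundled)
```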
Delimited Files
Delimited Files-Comma Separated File (CSV)
CSV files are usually read with read.csv() or read.csv2(). The two
differ only in their defaults: read.csv() expects a comma separator and a
period as the decimal mark, while read.csv2() expects a semicolon separator
and a comma as the decimal mark. The basic syntax is:
> read.csv(file, header = TRUE, sep = ",", quote = "\"",
    dec = ".", fill = TRUE, comment.char = "", ...)
> read.csv2(file, header = TRUE, sep = ";", quote = "\"",
    dec = ",", fill = TRUE, comment.char = "", ...)
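As a small illustration of the difference in defaults, the sketch below reads the same made-up table in both conventions. It uses the text argument of read.table() (passed through by read.csv() and read.csv2()) so that no file is needed; the data values are invented for the example.

```r
# The same two-row table written in both conventions (made-up values).
comma_text     <- "name,score\nA,1.5\nB,2.25"
semicolon_text <- "name;score\nA;1,5\nB;2,25"

# read.csv(): comma separator, period as decimal mark.
df_csv <- read.csv(text = comma_text)

# read.csv2(): semicolon separator, comma as decimal mark.
df_csv2 <- read.csv2(text = semicolon_text)

# Both calls should yield the same numeric column.
identical(df_csv$score, df_csv2$score)
```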
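read.csv() is a wrapper around the more general read.table(), which can
be called directly when more control is needed, for example with large
CSV files. The basic syntax is:
> read.table(file, header = FALSE, sep = "", quote = "\"'",
    dec = ".", stringsAsFactors = default.stringsAsFactors(), ...)
From R 4.0.0 onward the default is stringsAsFactors = FALSE.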
Reading CSV file from a website
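The original slide showed this step as a screenshot. A minimal sketch of the idea follows: read.csv() accepts a URL in place of a file name. To keep the example runnable offline, a temporary file addressed through a file:// URL stands in for the website; read_csv_url and the example.com address are hypothetical names, not part of the deck.

```r
# Helper (hypothetical name): read a CSV from any URL read.csv() understands.
read_csv_url <- function(url) {
  read.csv(url, header = TRUE, stringsAsFactors = FALSE)
}

# Offline stand-in for a website: write a small CSV to a temporary file
# and address it through a file:// URL.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), tmp, row.names = FALSE)
local_url <- paste0("file:///",
                    sub("^/", "", normalizePath(tmp, winslash = "/")))

remote_df <- read_csv_url(local_url)

# With a real site the call looks the same (hypothetical URL):
# remote_df <- read_csv_url("https://example.com/data.csv")
```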
The result of read.table() is a data.frame. On Windows, a file path gives an
error unless it uses forward slashes (/) or doubled backslashes (\\), because
a single backslash starts an escape sequence in an R string.
Reading CSV file locally
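The slide's screenshot is not preserved, so here is a rough sketch of a local read under stated assumptions: the file is created in a temporary directory first so the example runs anywhere, and the city data is invented. file.path() is used to build the path with forward slashes.

```r
# Build a path portably; file.path() joins components with "/".
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("city,population",
             "Springfield,30000",
             "Shelbyville,25000"), csv_path)

# read.table() needs header and sep spelled out for a CSV file.
cities <- read.table(csv_path, header = TRUE, sep = ",",
                     stringsAsFactors = FALSE)

# The result is a data.frame.
class(cities)

# On Windows, either string form is accepted (hypothetical path):
# "C:/data/example.csv"  or  "C:\\data\\example.csv"
```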
Excel Files
Excel data is harder to read into R than delimited text.
Simplest method: convert the Excel file to a CSV file.
Packages that address this problem:
gdata
XLConnect
xlsReadWrite
Reading Excel file locally-set the path
Use the XLConnect function readWorksheetFromFile()
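A minimal sketch of such a call, assuming XLConnect (and the Java runtime it depends on) is installed. The wrapper name read_excel_sheet and the file name in the usage comment are hypothetical; only readWorksheetFromFile() itself comes from XLConnect.

```r
# Read one worksheet of an Excel workbook into a data.frame.
# read_excel_sheet is a hypothetical wrapper, not part of XLConnect.
read_excel_sheet <- function(path, sheet = 1) {
  if (!file.exists(path)) {
    stop("Workbook not found: ", path)
  }
  if (!requireNamespace("XLConnect", quietly = TRUE)) {
    stop("Package 'XLConnect' is required for this helper")
  }
  # sheet may be a worksheet index or a worksheet name.
  XLConnect::readWorksheetFromFile(path, sheet = sheet, header = TRUE)
}

# Usage (hypothetical file name):
# sales <- read_excel_sheet("C:/data/sales.xlsx", sheet = 1)
```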
Reading from Database
To read from databases such as Microsoft SQL Server, DB2, MySQL,
Oracle, or Microsoft Access, we must set up an ODBC
connection.
The RODBC package provides ODBC access from R.
The first step is to create a DSN (data source name); the procedure differs by OS.
ODBC Connection
Download necessary drivers if not present here:
Reading from MS Access Database
Add the data source:
Use the odbcConnect() function to open the connection:
Use the sqlQuery() function to run a query:
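The two calls might be combined as in the sketch below. It assumes the RODBC package is installed and that a DSN has already been created; query_dsn, the DSN name, and the table name in the usage comment are hypothetical.

```r
# Open a DSN, run one query, and always close the connection.
# query_dsn is a hypothetical helper, not part of RODBC.
query_dsn <- function(dsn, sql) {
  if (!is.character(dsn) || length(dsn) != 1 || !nzchar(dsn)) {
    stop("dsn must be a non-empty string")
  }
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("Package 'RODBC' is required for this helper")
  }
  channel <- RODBC::odbcConnect(dsn)
  # odbcConnect() returns -1 (with warnings) when the connection fails.
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to DSN: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlQuery(channel, sql)
}

# Usage (hypothetical DSN and table names):
# orders <- query_dsn("MyAccessDB", "SELECT * FROM Orders")
```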
Reading from Oracle Database
Use this link for details on reading from an Oracle database:
Data from Other Statistical Tools
List of Packages and Functions:
Reading from Systat:
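The readers for these tools live in the foreign package. Since no Systat file is at hand, the sketch below shows the read.systat() call as a comment with a hypothetical file name, and then demonstrates the same read pattern with a Stata file that is written and read back in a temporary location. The data values are invented.

```r
library(foreign)  # recommended package, included with standard R distributions

# A Systat file would be read like this (hypothetical file name):
# survey <- read.systat("survey.syd", to.data.frame = TRUE)

# Runnable stand-in: write a small Stata file, then read it back.
original <- data.frame(id = 1:3, height = c(1.62, 1.75, 1.80))
dta_path <- tempfile(fileext = ".dta")
write.dta(original, dta_path)

restored <- read.dta(dta_path)
str(restored)
```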
Getting Website Data
Simple HTML Tables
Use the XML package:
Use the readHTMLTable() function:
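A small sketch of readHTMLTable(), assuming the XML package is installed. To stay offline, it parses an HTML string built in the script rather than a live page; the table contents are invented. With a real page, the parsed document would come from the downloaded HTML instead.

```r
# A tiny HTML page with one table (made-up contents), built without
# whitespace between tags so the parser sees only element nodes.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Name</th><th>Score</th></tr>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>20</td></tr>",
  "</table></body></html>"
)

if (requireNamespace("XML", quietly = TRUE)) {
  # Parse the string as HTML, then extract the first table on the page.
  doc <- XML::htmlParse(html_text, asText = TRUE)
  scores <- XML::readHTMLTable(doc, which = 1, header = TRUE,
                               stringsAsFactors = FALSE)
  print(scores)
}
```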
Use Regular Expressions to scrape the Web.
Presidents Data:
Use the stringr package. Here we use str_split() to split a column based on a
pattern.
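The splitting step could look roughly like the sketch below. The input strings are illustrative samples, not the scraped page from the slides, and split_on is a hypothetical helper that uses stringr::str_split() when stringr is installed and otherwise falls back to base strsplit(), which behaves the same for these simple patterns.

```r
# Illustrative strings only: a year range, a separator, then a name.
raw <- c("1789-1797: George Washington", "1797-1801: John Adams")

# Hypothetical helper: prefer stringr::str_split(), fall back to strsplit().
split_on <- function(x, pattern) {
  if (requireNamespace("stringr", quietly = TRUE)) {
    stringr::str_split(x, pattern)
  } else {
    strsplit(x, pattern)
  }
}

# First split the year range from the name ...
parts <- split_on(raw, ": ")

# ... then split each year range on the hyphen.
year_ranges <- vapply(parts, `[`, character(1), 1)
years <- split_on(year_ranges, "-")
```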
Use Reduce() to combine the rows into a matrix.
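As a sketch of this step, Reduce() applies rbind() pairwise across a list of equal-length vectors, stacking them into a matrix one row at a time. The list below is a small illustrative sample, and the column names start and stop are assumptions made for the example.

```r
# A list of equal-length character vectors, one per row (sample values).
year_list <- list(c("1789", "1797"),
                  c("1797", "1801"),
                  c("1801", "1809"))

# Reduce() computes rbind(rbind(row1, row2), row3).
year_matrix <- Reduce(rbind, year_list)

# Drop the row names that rbind() generates and label the columns.
rownames(year_matrix) <- NULL
colnames(year_matrix) <- c("start", "stop")

# Convert to a data.frame for further work.
year_df <- data.frame(year_matrix, stringsAsFactors = FALSE)
```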
Final Results.
Summary
Packages and Functions
Data        Package            Functions
CSV         utils (base R)     read.csv(), read.csv2(), read.table()
Excel       XLConnect          readWorksheetFromFile()
Database    RODBC              odbcConnect()
SAS         sas7bdat           read.sas7bdat()
All other   foreign            read.spss(), read.dta(), read.octave(),
                               read.mtp(), read.systat()
Table: R packages for reading structured data
RData files can be used across platforms: Windows, Mac, and Linux.
Use RData files to pass around data or any R objects, such as variables and
functions.
Use save() to write them and load() to read them back.
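A minimal round trip with an RData file might look like this. The object names and values are invented for the example, and the file is written to a temporary location; loading into a separate environment is a design choice here so that the restored objects do not overwrite anything in the workspace.

```r
# Two objects to pass around: a data.frame and a function (made-up content).
scores <- data.frame(id = 1:3, score = c(90, 85, 77))
double_it <- function(x) x * 2

# save() writes any number of named objects into one RData file.
rdata_path <- tempfile(fileext = ".RData")
save(scores, double_it, file = rdata_path)

# load() restores them under their original names; it returns those names.
restored_env <- new.env()
loaded_names <- load(rdata_path, envir = restored_env)
loaded_names
```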
References
Jared P. Lander (2013)
R for Everyone: Advanced Analytics and Graphics
Addison-Wesley Data & Analytics Series.