Merge Multiple Files into a Single Data Frame Using R
Yogesh Khandelwal
Problem Description
• The zip file contains 332 comma-separated-value (CSV) files
containing pollution monitoring data for fine particulate
matter (PM) air pollution at 332 locations in the United States.
Each file contains data from a single monitor and the ID
number for each monitor is contained in the file name. For
example, data for monitor 200 is contained in the file
"200.csv".
• Data Source: http://spark-public.s3.amazonaws.com/compdata/data/specdata.zip
Variables in file
• Date: the date of observation in YYYY-MM-DD (year-month-day) format; datatype: factor
• sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter); datatype: num
• nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter); datatype: num
• Id: location ID; datatype: int
Before we start we should know
• Functions in R
• How to merge data files
Functions in R
Functions are created using the function() directive and are
stored as R objects just like anything else. In particular, they are R
objects of class “function”.
f <- function(<arguments>) {
## Do something interesting
}
• Functions in R are “first class objects”, which means that they can
be treated much like any other R object. Importantly:
• Functions can be passed as arguments to other functions.
• Functions can be nested, so that you can define a function
inside of another function.
• The return value of a function is the last expression in the function
body to be evaluated.
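The three properties listed above can be illustrated with a minimal sketch. The names apply_twice, make_scaler, scale_by and double are hypothetical and chosen only for this illustration.

```r
# apply_twice() receives another function `f` as an argument,
# which works because functions are first class objects.
apply_twice <- function(f, x) {
  f(f(x))
}

# make_scaler() defines a function inside another function and returns it.
make_scaler <- function(k) {
  scale_by <- function(x) {
    x * k  # last expression evaluated, so this is the return value
  }
  scale_by
}

double <- make_scaler(2)
class(double)           # "function"
apply_twice(double, 5)  # 5 * 2 * 2 = 20
```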
Functions (contd.)
• For example: [figure: sample R code labeled with its function name, function definition, and function call]
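As a rough illustration of the three labels (function name, function definition, function call), here is a small sketch. The function above_threshold and its arguments are hypothetical and are not taken from the slide.

```r
# Function name: above_threshold (hypothetical)
# Function definition: keep the elements of x that are greater than n
above_threshold <- function(x, n = 10) {
  x[x > n]
}

# Function call
above_threshold(c(4, 12, 25, 7), n = 10)  # returns 12 25
above_threshold(c(4, 12, 25, 7))          # same result, using the default n = 10
```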
Our objective
• How can we merge a number of files into a single data frame?
• How can we apply the same function to different files in an
efficient way?
How to merge two different files?
• A number of options are available, such as:
1. Use the merge() function
2. Use rbind(), cbind(), etc.
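The difference between these options is easiest to see on small data frames. The sketch below uses made-up readings; the data frame names and values are assumptions for illustration only.

```r
# Two small, made-up data frames that share an ID column
sulfate_readings <- data.frame(ID = c(1, 2, 3), sulfate = c(3.1, 2.4, 5.0))
nitrate_readings <- data.frame(ID = c(2, 3, 4), nitrate = c(0.9, 1.2, 0.4))

# merge(): join on a shared key column; by default only IDs present
# in both data frames are kept (here 2 and 3)
joined <- merge(sulfate_readings, nitrate_readings, by = "ID")

# rbind(): stack rows; both data frames need the same column names
more_sulfate <- data.frame(ID = 4, sulfate = 1.7)
stacked <- rbind(sulfate_readings, more_sulfate)

# cbind(): place columns side by side; rows are matched by position,
# not by key, so the row counts must agree
side_by_side <- cbind(sulfate_readings, flag = c("a", "b", "c"))
```

For files that all have the same columns, as in this dataset, rbind() is the natural fit, since each file only contributes more rows.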
How to merge a number of files into a single
data frame
• Approach 1
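files <- list.files("specdata", full.names = TRUE)  # paths to all 332 CSV files
dat <- NULL
for (i in seq_along(files)) {
  # append each monitor's rows to the growing data frame
  dat <- rbind(dat, read.csv(files[i]))
}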
• We can then run various commands on the merged data frame as needed, for example:
1. str(dat)
2. head(dat)
3. tail(dat), etc.
Note: full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE,
the file names (rather than paths) are returned.
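To try Approach 1 without downloading the zip file, the loop can be run against a small stand-in directory. This is only a sketch: the directory name specdata_demo, the three files and their values are made up, and the column names follow the variable list given earlier.

```r
# Build a small stand-in for the "specdata" directory (hypothetical data)
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (id in 1:3) {
  monitor <- data.frame(
    Date = c("2003-01-01", "2003-01-02"),
    sulfate = c(id, NA),
    nitrate = c(NA, id / 10),
    ID = id
  )
  write.csv(monitor, file.path(demo_dir, sprintf("%03d.csv", id)),
            row.names = FALSE)
}

# Same loop as Approach 1, pointed at the stand-in directory
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
dat <- NULL
for (f in files) {
  dat <- rbind(dat, read.csv(f))
}

str(dat)   # structure of the merged data frame
nrow(dat)  # 3 files x 2 rows each = 6
```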
How to handle missing values in R?
• In R, NA is used to represent any value that is 'not available' or 'missing' (in
the statistical sense).
• Missing values play an important role in statistics and data analysis. Often,
missing values must not be ignored, but rather they should be carefully
studied to see if there's an underlying pattern or cause for their
missingness.
• For example:
> X <- c(1, 2, NA, 4)
> Y <- c(NA, 2, 3, 1)
> X + Y
[1] NA  4 NA  5
• Multiple options are available in R to handle NA values, such as:
• is.na()
• Setting na.rm = TRUE as a function argument
> mean(X)
[1] NA
> mean(X, na.rm = TRUE)
[1] 2.333333
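The is.na() option can be sketched as follows. The vector matches the example above; the readings data frame is made up for illustration, and complete.cases() is a base R function not mentioned on the slide.

```r
x <- c(1, 2, NA, 4)

is.na(x)            # FALSE FALSE TRUE FALSE
sum(is.na(x))       # number of missing values: 1
x[!is.na(x)]        # keep only the non-missing values: 1 2 4
mean(x[!is.na(x)])  # same result as mean(x, na.rm = TRUE)

# For a data frame, complete.cases() flags rows that contain no NA
readings <- data.frame(sulfate = c(3.1, NA, 5.0), nitrate = c(0.9, 1.2, NA))
complete <- readings[complete.cases(readings), ]  # only the first row remains
```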
Apply what we learned to our dataset
• Function definition: [figure: definition of pollutantmean()]
• Function call:
pollutantmean('specdata','nitrate',1:10)
[1] 0.7976266
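The function definition itself is not reproduced in this text, so the following is only a sketch of how such a function might look, combining the merge step with na.rm = TRUE. The parameter names (directory, pollutant, id) are assumptions inferred from the call above, and the sketch assumes that the sorted file order matches the monitor IDs, as Approach 1 does. The demo directory and its values are made up so the sketch can run without the real data.

```r
# Hypothetical sketch, not the slide's own definition
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # Assumption: the i-th file in sorted order belongs to monitor i
  dat <- do.call(rbind, lapply(files[id], read.csv))
  mean(dat[[pollutant]], na.rm = TRUE)  # ignore missing readings
}

# Tiny made-up dataset standing in for "specdata"
demo_dir <- file.path(tempdir(), "pollutant_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = "2003-01-01", sulfate = c(1, NA),
                     nitrate = c(2, 4), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2003-01-01", sulfate = c(3, 5),
                     nitrate = c(NA, 6), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

pollutantmean(demo_dir, "sulfate", 1:2)  # mean of 1, 3, 5 = 3
pollutantmean(demo_dir, "nitrate", 1)    # mean of 2, 4 = 3
```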
Thank You!!


Editor's Notes

  1. lapply() applies a given function to each element in a list, so there will be several function calls. do.call() applies a given function to the list as a whole, so there is only one function call.
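The contrast described in this note can be sketched as an alternative to the rbind() loop. The helper read_monitor is a hypothetical stand-in for read.csv() that builds a small data frame in memory; with real files the first step would be lapply(files, read.csv).

```r
# Hypothetical stand-in for reading one monitor's file
read_monitor <- function(id) {
  data.frame(ID = id, sulfate = c(id * 1.5, NA))
}

# lapply(): read_monitor() is called once per element (three calls),
# returning a list of three data frames
pieces <- lapply(1:3, read_monitor)

# do.call(): rbind() is called once, with the list elements as its arguments,
# i.e. rbind(pieces[[1]], pieces[[2]], pieces[[3]])
merged <- do.call(rbind, pieces)

nrow(merged)  # 3 data frames x 2 rows each = 6
```

Binding everything in a single rbind() call avoids growing the data frame one file at a time inside a loop.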