R is a programming language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment, which was developed earlier at Bell Laboratories (now Lucent Technologies) by John Chambers and colleagues. R can be considered a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
1. info@digitalnest.in Digital Nest 8088998664
http://www.digitalnest.in/r-programming-for-data-science-course-hyderabad-india/
1. Introduction
Reading data into a statistical system for analysis and exporting the results to another system
for report writing can be frustrating tasks that take far longer than the statistical analysis
itself, even though most readers will find the latter much more attractive.
This manual describes the import and export facilities available either in R itself or via
packages available from CRAN or elsewhere.
Unless otherwise indicated, everything described in this manual is (at least in principle) available
on all platforms running R.
In general, statistical systems like R are not particularly well suited to manipulations of
large-scale data. Some other systems are better than R at this, and part of the thrust of
this manual is to suggest that rather than duplicating functionality in R we can make another
system do the work! (For example, Therneau and Grambsch (2000) indicated that they preferred
to do data manipulation in SAS and then use the package survival
(https://CRAN.R-project.org/package=survival) in S for the analysis.) Database manipulation
systems are often very suitable for manipulating and retrieving data: several packages to
interact with DBMSs are discussed here.
There are packages that allow functionality developed in languages such as Java, perl and
python to be directly integrated with R code, making the use of facilities in these languages even
more appropriate. (See the rJava package (https://CRAN.R-project.org/package=rJava)
from CRAN and the SJava, RSPerl and RSPython packages from the Omegahat project,
http://www.omegahat.net.)
It should also be remembered that R, like S, comes from the Unix tradition of small re-usable
tools, and it can be rewarding to use tools such as awk and perl to manipulate data before
import or after export. The case study in Becker, Chambers & Wilks (1988, Chapter 9) is an
example, where Unix tools were used to check and manipulate the data before input to
S. The traditional Unix tools are now much more widely available, including for Windows.
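As a rough illustration of pre-processing with a Unix tool before import, the sketch below filters a small text file with awk and reads the result through a pipe connection. The file contents, the column names and the filter condition are invented for illustration, and the example assumes that awk is on the search path and that a Unix-style shell is used (it is skipped otherwise).

```r
# Hypothetical data file: the column names and values are made up.
path <- tempfile(fileext = ".txt")
writeLines(c("id height weight",
             "1 1.62 58.5",
             "2 1.80 77.0",
             "3 1.75 69.2"), path)

# Only run the example if an awk executable can be found.
if (nzchar(Sys.which("awk"))) {
  # Keep the header line (NR == 1) and rows whose third field exceeds 60.
  cmd <- sprintf("awk 'NR == 1 || $3 > 60' %s", shQuote(path))
  # read.table() opens the pipe, reads awk's output and closes the pipe.
  heavy <- read.table(pipe(cmd), header = TRUE)
  print(heavy)
}
unlink(path)
```

The same filtering could be done inside R after reading the whole file; pushing it into awk mainly pays off when the file is much larger than the subset that is needed.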
This manual was first written in 2000, and the number of R packages has increased
a hundredfold since. For specialized data formats, it is worth checking whether a suitable
package already exists.
1.1 Imports
The easiest form of data to import into R is a simple text file, which is often acceptable for
small or medium-scale problems. The main function to import from a text file is scan, and
this underlies most of the more convenient functions discussed in Chapter 2 [Spreadsheet-like
data], page 8.
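A minimal sketch of the two levels mentioned above: scan reads the fields of a text file into vectors, while read.table, one of the more convenient functions built on it, returns a data frame. The file contents and column names here are invented for illustration.

```r
# Hypothetical whitespace-separated file with a header line.
path <- tempfile(fileext = ".txt")
writeLines(c("id height weight",
             "1 1.62 58.5",
             "2 1.80 77.0",
             "3 1.75 69.2"), path)

# Low level: scan() needs to be told the type of each field and to skip the header.
values <- scan(path,
               what = list(id = integer(), height = numeric(), weight = numeric()),
               skip = 1, quiet = TRUE)

# Higher level: read.table() works out the column types and uses the header as names.
df <- read.table(path, header = TRUE)

str(values)
str(df)
unlink(path)
```

scan gives full control over field types, at the cost of having to describe the layout by hand; read.table is usually the more practical choice for rectangular data.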
However, all statistical consultants are familiar with being presented by a client with
a USB stick (formerly a floppy disk or a CD-R) of data in some proprietary binary format,
for example "an Excel spreadsheet" or "an SPSS file". Often the easiest thing to do is to use
the original application to export the data as a text file (and statistical consultants will
have copies of the most common applications on their computers for this purpose). However,
this is not always possible, and Chapter 3 [Import from other statistical systems], page 14,
discusses the facilities available to access these files directly from R. For Excel spreadsheets,
the available methods are summarized in Chapter 9 [Reading Excel Spreadsheets], page 29.
In some cases, the data have been stored in a binary form for compactness and speed of access.
One application of this that we have seen many times is imaging data, which is normally stored
as a stream of bytes as represented in memory, possibly preceded by a header. Such data formats
are discussed in Chapter 5 [Binary Files], page 22, and Section 7.5 [Binary Connections], page 26.
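To make the "header followed by a stream of bytes" layout concrete, the sketch below writes and then reads a tiny binary file with writeBin and readBin. The layout itself (two 4-byte little-endian integers giving width and height, then one unsigned byte per pixel in row order) is an assumption made up for this example, not a real image format.

```r
path <- tempfile(fileext = ".bin")

# Write a made-up file: header (width = 3, height = 2), then 6 pixel bytes.
con <- file(path, "wb")
writeBin(c(3L, 2L), con, size = 4, endian = "little")
writeBin(as.raw(1:6), con)
close(con)

# Read it back: first the header, then as many bytes as the header announces.
con <- file(path, "rb")
dims <- readBin(con, what = "integer", n = 2, size = 4, endian = "little")
pixels <- readBin(con, what = "integer", n = prod(dims), size = 1, signed = FALSE)
close(con)

# Rebuild the image as a height x width matrix, filling row by row.
img <- matrix(pixels, nrow = dims[2], ncol = dims[1], byrow = TRUE)
print(img)
unlink(path)
```

For a real format, the header size, byte order and pixel type have to be taken from that format's specification.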
For much larger databases, it is common to manage the data using a database management
system (DBMS). It is again possible to use the DBMS to extract a simple file, but
for many of these DBMSs the extraction operation can be carried out directly from an R package: see
Chapter 4 [Relational Databases], page 16. Importing data over network connections is discussed
in Chapter 8 [Network Interfaces], page 28.
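As a hedged sketch of extracting data directly from a DBMS, the example below uses the DBI and RSQLite packages from CRAN with an in-memory SQLite database. The choice of these two packages, the table name and its contents are assumptions for illustration only, and the example is skipped if the packages are not installed.

```r
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # An in-memory SQLite database stands in for a real DBMS server here.
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Hypothetical table of measurements.
  DBI::dbWriteTable(con, "measurements",
                    data.frame(id = 1:4,
                               site = c("A", "A", "B", "B"),
                               value = c(2.5, 3.5, 10, 20)))

  # Let the database do the aggregation and return only the summary to R.
  site_means <- DBI::dbGetQuery(
    con,
    "SELECT site, AVG(value) AS mean_value
       FROM measurements GROUP BY site ORDER BY site")

  DBI::dbDisconnect(con)
  print(site_means)
}
```

Only the small aggregated result crosses into R, which is the point of letting the other system do the work on large data.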
1.1.1 Encodings
Unless the file to be imported is entirely in ASCII, it is usually necessary to know how
it was encoded. For text files, a good way to find out something about its structure is the file