2. Introduction – R and RStudio
R:
• R is a computer language for statistical computing and is becoming the
leading language in data science and statistics.
• Today, R is a tool of choice for data scientists across many industries
and fields.
• It is open-source, freely available software and is very widely used by
professional statisticians.
• R has a large collection of intermediate tools and excellent graphical
tools for data analysis.
3. RStudio:
• Many R users work in RStudio rather than in the plain R console.
• RStudio is an Integrated Development Environment (IDE) for R.
• It includes a console and a syntax-highlighting editor that supports
direct code execution, as well as tools for plotting, history,
debugging & workspace management.
10. Extension Package
• Sometimes we need additional functionality beyond that offered by
the core R library.
• To install an extension package, invoke the install.packages function
at the prompt and follow the instructions.
Or
• Select Install Packages from the Tools drop-down menu.
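As a minimal sketch of the command-line route, the helper below installs a package only when it is missing and then loads it. The name ensure_package is a hypothetical helper introduced for illustration; the call uses the base package stats, which ships with R, so nothing is downloaded here.

```r
# Hypothetical helper (not part of R): install a package if needed, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the configured CRAN mirror
  }
  # character.only = TRUE lets library() accept a name stored in a variable
  library(pkg, character.only = TRUE)
  pkg %in% loadedNamespaces()
}

ok <- ensure_package("stats")   # "stats" is already installed with R
```

For a package that is not yet installed, the same call would trigger install.packages and may prompt for a CRAN mirror.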
13. Getting Help
• R provides a built-in facility for getting more information on any
specific feature or named function.
• For example, entering ?c or help(c) at the prompt gives documentation
of the function c in R.
• If no help is available for a function or feature, R displays an
error message.
• By default, the function help searches in the packages which are
loaded in memory.
• The argument try.all.packages = TRUE of the function help allows
searching all installed packages.
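The help calls described above can be sketched as follows. The results are stored in variables here (h and h_all are illustrative names) so that the script does not open a help page; at an interactive prompt you would normally just type ?c or help(c).

```r
# Interactive forms (shown as comments so nothing is opened here):
#   ?c
#   help(c)

# help() also accepts the topic as a string and returns an object
# describing the help files it found; printing that object opens the page.
h <- help("c")

# Widen the search from the loaded packages to all installed packages.
h_all <- help("c", try.all.packages = TRUE)

class(h)
```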
14. Data Types
• In the analysis of statistical data we use different types of data. For
manipulating such data, R supports the following data types:
- logical : A Boolean type whose value can be TRUE or FALSE.
- numeric : It can be real or integer.
- complex : A number with a real and an imaginary part.
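A short sketch of the three types listed above; the variable names and values are illustrative only.

```r
is_tall <- TRUE      # logical
height  <- 172.5     # numeric
z       <- 3 + 2i    # complex

class(is_tall)   # "logical"
class(height)    # "numeric"
class(z)         # "complex"
```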
16. Numeric
• Decimal values are called numeric in R.
• It is the default computational data type.
• If we assign a decimal value to a variable x as follows, x will be of
numeric type.
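A minimal illustration of the assignment described above; the value 10.5 is an arbitrary choice, and k is an extra variable added to show that whole numbers are also stored as numeric by default.

```r
x <- 10.5       # assign a decimal value to x
class(x)        # "numeric"

k <- 1          # a whole number is still numeric unless converted
is.numeric(k)   # TRUE
is.integer(k)   # FALSE
```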
17. Integer
• In order to create an integer variable in R, we need to invoke
the as.integer function. We can be assured that y is indeed an integer
by applying the is.integer function.
• We can coerce a numeric value into an integer with the
same as.integer function.
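The two uses of as.integer described above might look like this; the values 3 and 3.14 are illustrative.

```r
y <- as.integer(3)
is.integer(y)           # TRUE
class(y)                # "integer"

w <- as.integer(3.14)   # coercion drops the fractional part
w                       # 3
```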
19. Character
• A character object is used to represent string values in R.
• We convert objects into character values with
the as.character() function.
• Two character values can be concatenated with the paste function.
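A brief sketch of conversion and concatenation; the strings and names used here are made up for illustration.

```r
s <- as.character(3.14)
class(s)                         # "character"

first <- "Mary"
last  <- "Smith"
full  <- paste(first, last)      # default separator is a single space
full                             # "Mary Smith"
```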
20. Vectors
• R operates on named data structures. The simplest such structure is
the numeric vector.
• A numeric vector is a single entity consisting of an ordered collection of
numbers.
• To create a vector named x containing the 5 numbers 2, 5, 8, 1, 35, use
the command x <- c(2, 5, 8, 1, 35).
This is assignment using the function c(). Assignment can also be made
by using the assign function.
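The two ways of assigning can be compared side by side. The second vector is named y here (an illustrative choice) so that both results can be checked against each other.

```r
x <- c(2, 5, 8, 1, 35)            # assignment with the <- operator
assign("y", c(2, 5, 8, 1, 35))    # the same assignment via assign()

x                 # 2  5  8  1 35
identical(x, y)   # TRUE
```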
21. Vector
• A vector can contain character strings.
• Incidentally, the number of members in a vector is given by
the length function.
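For instance, with a hypothetical character vector s:

```r
s <- c("aa", "bb", "cc")   # a vector of character strings
length(s)                  # 3
```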
22. Combining Vectors
• Vectors can be combined via the function c.
• For example, the following two vectors n and s are combined into a
new vector containing elements from both vectors.
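A sketch of combining n and s; the member values are assumed for illustration. Because a vector holds a single type, the numbers are coerced to character strings in the result.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")

combined <- c(n, s)
combined          # "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
class(combined)   # "character"
```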
23. Vector Arithmetic
• Arithmetic operations on vectors are performed member-by-member,
i.e., memberwise.
• For example, suppose we have two vectors a and b.
• Then, if we multiply a by 5, we would get a vector with each of its
members multiplied by 5.
• And if we add a and b together, the sum would be a vector whose
members are the sum of the corresponding members from a and b.
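The operations above can be sketched with two assumed vectors a and b of equal length:

```r
a <- c(1, 3, 5, 7)
b <- c(1, 2, 4, 8)

5 * a    # 5 15 25 35
a + b    # 2  5  9 15
```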
24. Vector Arithmetic
Recycling Rule
If two vectors are of unequal length, the shorter one will be recycled in
order to match the longer vector. For example, the following
vectors u and v have different lengths, and their sum is computed by
recycling values of the shorter vector u.
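A minimal sketch of recycling, with assumed values for u and v. The length of v is a multiple of the length of u, so u is reused three times without a warning.

```r
u <- c(10, 20, 30)
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

total <- u + v
total    # 11 22 33 14 25 36 17 28 39
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.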
25. Vector Index
• We retrieve values in a vector by declaring an index inside a single
square bracket "[]" operator.
• For example, the following shows how to retrieve a vector member.
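For example, with an assumed character vector s, the third member is retrieved like this (indexing in R starts at 1):

```r
s <- c("aa", "bb", "cc", "dd", "ee")
third <- s[3]
third    # "cc"
```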
26. Vector Index
Negative Index
If the index is negative, it strips the member whose position has the
same absolute value as the negative index. For example, the following
creates a vector slice with the third member removed.
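A sketch of the negative index, using an assumed vector s:

```r
s <- c("aa", "bb", "cc", "dd", "ee")
without_third <- s[-3]
without_third    # "aa" "bb" "dd" "ee"
```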
Out-of-Range Index
If an index is out of range, a missing value will be reported via the
symbol NA.
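For instance, asking for the tenth member of an assumed five-member vector returns NA rather than raising an error:

```r
s <- c("aa", "bb", "cc", "dd", "ee")
missing_member <- s[10]
missing_member          # NA
is.na(missing_member)   # TRUE
```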
27. Numeric Index Vector
• A new vector can be sliced from a given vector with a numeric index
vector, which consists of the member positions of the original vector to be
retrieved.
• Here it shows how to retrieve a vector slice containing the second and
third members of a given vector s.
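A sketch of slicing with a numeric index vector; the contents of s are assumed for illustration.

```r
s <- c("aa", "bb", "cc", "dd", "ee")
middle <- s[c(2, 3)]   # positions 2 and 3
middle                 # "bb" "cc"
```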
29. Named Vector Members
• We can assign names to vector members.
• For example, the following variable v is a character string vector
with two members.
• We now name the first member as First, and the second as Last.
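Naming the members might look like this. The two strings in v are placeholder values; the names First and Last come from the text.

```r
v <- c("Mary", "Sue")
names(v) <- c("First", "Last")

names(v)    # "First" "Last"
```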
30. Named Vector Members
• Then we can retrieve the first member by its name.
• Furthermore, we can reverse the order with a character string index
vector.
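Both lookups can be sketched as follows, again with placeholder strings in v:

```r
v <- c("Mary", "Sue")
names(v) <- c("First", "Last")

first_member <- v["First"]          # retrieve by name
reversed <- v[c("Last", "First")]   # reorder with a character index vector

unname(first_member)   # "Mary"
unname(reversed)       # "Sue"  "Mary"
```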
31. Matrix
• A matrix is a collection of data elements arranged in a two-dimensional
rectangular layout. The following is an example of a matrix
with 2 rows and 3 columns.
• We reproduce a memory representation of the matrix in R with
the matrix function.
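A minimal sketch of building a 2 x 3 matrix. The six data values are assumed for illustration; byrow = TRUE fills the matrix row by row so that the code reads like the printed layout.

```r
A <- matrix(
  c(2, 4, 3,
    1, 5, 7),
  nrow = 2, ncol = 3, byrow = TRUE
)

A
#      [,1] [,2] [,3]
# [1,]    2    4    3
# [2,]    1    5    7
dim(A)    # 2 3
```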
33. • An element at the mth row, nth column of A can be accessed by the
expression A[m, n].
• The entire mth row of A can be extracted as A[m, ].
• Similarly, the entire nth column of A can be extracted as A[, n].
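The three access patterns, sketched on an assumed 2 x 3 matrix A:

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

A[2, 3]    # element in row 2, column 3: 7
A[2, ]     # entire second row: 1 5 7
A[, 3]     # entire third column: 3 7
```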
35. • If we assign names to the rows and columns of the matrix, then we can
access the elements by name.
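A sketch of access by name. The matrix values and the labels row1, row2, col1, col2, col3 are assumptions chosen for illustration.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

# dimnames takes a list: row names first, then column names
dimnames(A) <- list(
  c("row1", "row2"),
  c("col1", "col2", "col3")
)

A["row2", "col3"]    # 7
```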
36. Matrix Construction
• There are various ways to construct a matrix. When we construct a
matrix directly with data elements, the matrix content is filled along
the column orientation by default.
• For example, in the following code snippet, the content of B is filled
along the columns consecutively.
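Column-wise filling can be seen in this sketch, where the data values for B are assumed. The first three values go down column 1 and the next three down column 2.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)

B
#      [,1] [,2]
# [1,]    2    1
# [2,]    4    5
# [3,]    3    7
```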
38. Combining Matrices
• The columns of two matrices having the same number of rows can be
combined into a larger matrix. For example, suppose we have another
matrix C also with 3 rows.
39. • Then we can combine the columns of B and C with cbind.
40. • Similarly, we can combine the rows of two matrices if they have the
same number of columns with the rbind function.
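Both combinations can be sketched as follows. The values in B and C are assumed, and D is an extra one-row matrix introduced here only to demonstrate rbind.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
C <- matrix(c(7, 4, 2), nrow = 3, ncol = 1)   # same number of rows as B
D <- matrix(c(6, 2), nrow = 1, ncol = 2)      # same number of columns as B

BC <- cbind(B, C)   # columns side by side
BD <- rbind(B, D)   # rows stacked

dim(BC)    # 3 3
dim(BD)    # 4 2
```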
42. The Workspace
• The workspace is your current R working environment and includes
any user-defined objects (vectors, matrices, data frames, lists,
functions).
• At the end of an R session, the user can save an image of the current
workspace that is automatically reloaded the next time R is started.
• Commands are entered interactively at the R user prompt.
• Up and down arrow keys scroll through your command history.
43. The Workspace
• You will probably want to keep different projects in different physical
directories. Here are some standard commands for managing your
workspace.
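A hedged sketch of some common workspace commands follows; it is not necessarily the list the original slide showed. The file name demo_object.RData and the use of tempdir() are choices made here so that the example does not touch your own project folders.

```r
getwd()                  # show the current working directory
x <- c(1, 2, 3)
ls()                     # list the objects in the workspace

# Save an object to a file, remove it, then restore it.
# save.image() works the same way but saves the whole workspace.
object_file <- file.path(tempdir(), "demo_object.RData")
save(x, file = object_file)
rm(x)
load(object_file)
exists("x")              # TRUE

# setwd() changes the working directory and returns the previous one.
old_dir <- setwd(tempdir())
setwd(old_dir)           # switch back
```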
IMPORTANT NOTE FOR WINDOWS USERS:
• R gets confused if you use a path in your code like
c:\mydocuments\myfile.txt
• This is because R sees "\" as an escape character. Instead, use
c:/mydocuments/myfile.txt
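The point about escape characters can be sketched with plain strings, so no file needs to exist. The variable names are illustrative; doubling each backslash is a second valid way to write the same path.

```r
good_path  <- "c:/mydocuments/myfile.txt"      # forward slashes
also_valid <- "c:\\mydocuments\\myfile.txt"    # each backslash doubled

# Replacing the backslashes shows both strings name the same file.
converted <- gsub("\\", "/", also_valid, fixed = TRUE)
converted == good_path    # TRUE
basename(good_path)       # "myfile.txt"
```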
46. Data Frame
• A data frame is used for storing data tables. It is a list of vectors of
equal length.
Example :
Following are the heights (in cm) and weights (in kg) of 10 boys.
Prepare a data frame of height and weight.
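The exercise can be sketched as below. The ten heights and weights are invented placeholder values, not the data from the original slide; substitute the real measurements.

```r
# Placeholder measurements (hypothetical, for illustration only)
height <- c(132, 141, 128, 150, 137, 145, 139, 133, 148, 142)   # cm
weight <- c(30, 35, 27, 42, 33, 38, 34, 31, 40, 36)             # kg

# Each vector becomes one column; both must have the same length.
boys <- data.frame(height, weight)

str(boys)
nrow(boys)    # 10
```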
48. From Excel (.xlsx Extension)
• One of the best ways to read an Excel file is to export it to a comma
delimited file and import it using the method above.
• Alternatively, you can use the xlsx package to access Excel files. The
first row should contain variable/column names.
• read in the first worksheet from the workbook myexcel.xlsx
• first row contains variable names
• read in the worksheet named mysheet
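A sketch of the xlsx route described above, assuming the xlsx package is installed and a workbook named myexcel.xlsx exists in the working directory. The guard skips the reads when either assumption does not hold, and the variable names are illustrative.

```r
excel_file  <- "myexcel.xlsx"   # workbook name from the text; adjust the path
first_sheet <- NULL
named_sheet <- NULL

if (requireNamespace("xlsx", quietly = TRUE) && file.exists(excel_file)) {
  # read the first worksheet; header = TRUE treats row 1 as column names
  first_sheet <- xlsx::read.xlsx(excel_file, sheetIndex = 1, header = TRUE)
  # read the worksheet named "mysheet"
  named_sheet <- xlsx::read.xlsx(excel_file, sheetName = "mysheet", header = TRUE)
}
```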
49. From SAS
• Save the SAS dataset in transport format
libname out xport 'c:/mydata.xpt';
data out.mydata;
set sasuser.mydata;
run;
• In R
# character variables are converted to R factors
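One way to read the transport file on the R side is sketched below. It assumes the Hmisc package and its sasxport.get function, which may not be the exact call the original slide used; the guard skips the read when the package or the file is missing.

```r
xpt_file <- "c:/mydata.xpt"   # path written by the SAS step above
mydata   <- NULL

if (requireNamespace("Hmisc", quietly = TRUE) && file.exists(xpt_file)) {
  # read the SAS transport (XPORT) file into a data frame
  mydata <- Hmisc::sasxport.get(xpt_file)
}
```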
50. Sorting Data
• To sort a data frame in R, use the order( ) function.
• By default, sorting is ASCENDING. Prepend the sorting variable by a
minus sign to indicate DESCENDING order.
Here are some examples.
Sorting examples using the mtcars dataset
# sort by mpg
51. Sorting Data
• # sort by mpg and cyl
• # sort by mpg (ascending) and cyl (descending)
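The three sorts listed above can be sketched with the built-in mtcars dataset; the result names are illustrative. order() returns row positions, which are then used to index the data frame.

```r
# sort by mpg (ascending)
by_mpg <- mtcars[order(mtcars$mpg), ]

# sort by mpg, breaking ties by cyl (both ascending)
by_mpg_cyl <- mtcars[order(mtcars$mpg, mtcars$cyl), ]

# sort by mpg (ascending) and cyl (descending)
by_mpg_cyl_desc <- mtcars[order(mtcars$mpg, -mtcars$cyl), ]

head(by_mpg[, c("mpg", "cyl")])
```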