Presented By
Shantilal Bhayal
Assistant Professor
MEDICAPS UNIVERSITY
R Programming Environment
Freedom 0: The freedom to run the program as you wish (how, when and for what purpose).
Freedom 1: The freedom to study how the program works and adapt it to your
needs. Access to the source code is a precondition for this.
Freedom 2: The freedom to redistribute copies so you can help
your neighbor.
Freedom 3: The freedom to improve the program and release
your improvements to the public, so that the whole community
benefits.
What is data science?
Hacking (Programming) +
Maths/Statistics + Domain Knowledge =
Data Science
So, next: what is a
Data Scientist?
A data scientist is simply a person who
can
write code = in R, Python, Java, SQL,
Hadoop (Pig, HQL, MR), etc.
= for data storage, querying,
summarization, visualization
= how: efficiently and in time (fast
results)
= where: on databases, on the cloud, on
servers
and understands enough statistics
to derive insights from data
so the business can make decisions
Data Science with R: a popular language in data science
https://www.tiobe.com/tiobe-index/
These are some milestone dates in the development of R:
► Early 1990s: The development of R began.
► August 1993: The software was announced on the S-news mailing list.
► www.r-project.org/mail.html
► June 1995: After some persuasive arguments by Martin Mächler, the code was made available as “free software” under
the FSF’s GNU GPL, Version 2.
► Mid-1997: The initial R Development Core Team (the core group) was formed.
► February 2000: The first version of R, version 1.0.0, was released.
► June 2022: R version 4.2.1 (Funny-Looking Kid) was released on 2022-06-23.
► R: Past and Future History (r-project.org); https://cran.r-project.org/doc/html/interface98-paper/paper.html
What's great about R?
CRAN Packages By Date (r-project.org) https://cran.r-project.org/web/packages/
R can perform various data analysis and data science tasks for free:
Interactive visualization with the Shiny package (equivalent SAS product: Visual Analytics)
Ensemble learning / machine learning (SAS product: SAS Enterprise Miner)
Text / social media mining (SAS product: SAS Text Miner)
Optimization and forecasting (SAS product: SAS ETS, PROC OPTMODEL)
RStudio IDE (SAS product: SAS Enterprise Guide)
 Integration: Tableau, SQL Server, VS, Power BI
R can save data sets between sessions, so you don't need to reload them each time. It
saves your command history too.
What is R?
A free alternative to MATLAB, Excel, SAS and SPSS.
R is a:
1. Statistical Software
2. Language
3. Environment
4. Ecosystem
Used by Google, Facebook, Bank of America, etc.
Millions of users worldwide
Where is R used?
Companies with big-data demands use it to
analyse user behaviour in
online advertising and e-commerce.
Weather services use it for weather forecasts.
It is a fundamental tool for analytics-driven
organizations.
What is R?
 R is a dialect of S.
 S is a language that was developed by John
Chambers and others at Bell Labs.
 S was initiated in 1976 as an internal statistical analysis environment,
originally implemented as Fortran libraries.
 Early versions of the language did not contain functions for statistical
modelling.
 In 1988, the system was rewritten in the C language to make it more
portable across systems, and it began to resemble the system that we have
today.
Historical Notes
 In 1993 Bell Labs gave a corporation called StatSci, which became Insightful
Corporation, an exclusive license to develop and sell the S language.
 In 2004, Insightful purchased the S language completely from Lucent for $2 million
and became its owner.
 In 2006, Alcatel purchased Lucent Technologies, and it is now called Alcatel-Lucent.
 Insightful sold its implementation of the S language under the product name S-PLUS
and built a number of fancy features (mostly GUIs) on top of it, hence the “PLUS”.
Version 4 of the S language was released in 1998, and it is the
version we more or less use today. The book Programming with Data, which is
a reference for this course, was written by John Chambers; it is sometimes called the
green book, and it documents version 4 of the S language.
 In 2008 the Insightful Corporation was acquired by a company called TIBCO for $25
million.
 The basic fundamentals of the S language have not really changed since 1998.
 In 1998 S won the Association for Computing Machinery’s Software System Award.
 1991: It was created in New Zealand by two gentleman named Ross Ihaka
and Robert Gentleman.
 1993: First announcement to public.
 1995: Martin Michler convinced Ross and Robert to use, to license R under the
GNU General Public License to make R free software.
 1996: A Public mailing list is created(R-help and R-devel).
 1997: The core group is formed. The core group control the source code of R
 2000: R 1.0.0 Version is released.
R version 4.2.1 (Funny-Looking Kid) has been released on 2022-06-23.
What is R?
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. Among other things it has:
An effective data handling and storage facility,
A suite of operators for calculations on arrays, in particular matrices,
A large, coherent, integrated collection of intermediate tools for data analysis,
Graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
A well-developed, simple and effective programming language (called ‘S’) which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
 R is a system for statistical computation and graphics.
 It consists of a language plus a run-time environment with graphics, a
debugger, access to certain system functions, and the ability to run programs
stored in script files.
 It is free software distributed under a GNU-style copyleft, and an official part
of the GNU project (“GNU S”).
Install R: https://cran.r-project.org/bin/windows/base/
Install RStudio
https://www.rstudio.com/products/rstudio/
Alternatives to the standard R editors
Eclipse StatET (www.walware.de/goto/statet): an Eclipse-based IDE (Java)
Emacs Speaks Statistics (http://ess.r-project.org): an add-on for Emacs, a powerful text and code
editor
Tinn-R (www.sciviews.org/Tinn-R): this editor, developed specifically for working with
R, is available only for Windows
Design of the R System
The R system is divided into two conceptual parts:
1. The base R system that you download from CRAN
2. Everything else
R functionality is divided into a number of packages.
 The “base” system contains, among other things, the base package, which is
required to run R and contains the most fundamental functions.
 Other packages contained in the base system include, for
example, utils, stats, datasets, graphics, grDevices, grid, methods, tools,
parallel, compiler and stats4.
 There are also recommended packages, fundamental
packages that more or less everyone might use: boot for bootstrapping, class for classification, cluster,
codetools, foreign, and a variety of other packages.
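As a rough illustration of this split, the sketch below asks a local installation which packages it marks as base and which as recommended. It uses installed.packages() with its priority argument; the exact lists depend on the R version and on how R was installed, so no particular output is assumed.

```r
# List packages by priority in the local library (results vary by installation)
inst <- installed.packages()
base_pkgs <- rownames(inst)[inst[, "Priority"] %in% "base"]
rec_pkgs  <- rownames(inst)[inst[, "Priority"] %in% "recommended"]

print(base_pkgs)   # typically includes base, utils, stats, ...
print(rec_pkgs)    # typically includes boot, class, cluster, ... (may be empty)
```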
How R works
1. R is an interpreted language, not a compiled one.
2. Syntax: lm(y ~ x) means “fit a linear model with y as response and x
as predictor”.
3. ls() lists the objects in the workspace; typing ls without parentheses prints the content of the function.
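A minimal sketch of points 2 and 3, assuming simulated data (the vectors x and y below are made up for illustration and are not from the slides):

```r
# Simulated data (an assumption for illustration): y is roughly 1 + 2 * x
set.seed(42)
x <- 1:20
y <- 1 + 2 * x + rnorm(20, sd = 0.5)

fit <- lm(y ~ x)       # y as response, x as predictor
print(coef(fit))       # intercept and slope, close to 1 and 2

print(ls())            # names of the objects in the current workspace
print(class(ls))       # ls itself is an ordinary function object
```

Because R is interpreted, each of these lines can also be typed at the prompt one at a time and inspected immediately.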
Features
►It comes as free, open-source code that is stable and reliable. License: www.r-project.org/COPYING.
►It runs anywhere: Mac, Windows and Unix systems.
►It supports extensions for data manipulation, statistical modeling, and graphics.
Extensibility: write your own software and distribute it in the form of add-on packages.
►It provides an engaged community:
► www.r-project.org/mail.html
www.stackoverflow.com/questions/tagged/r
http://stats.stackexchange.com/questions/tagged/r
www.twitter.com/search/rstats (also R regional conferences)
►It connects with other languages and systems: the R package foreign
(http://cran.r-project.org/web/packages/foreign/index.html) reads data from SPSS, SAS and Stata;
RODBC and ROracle connect to databases.
Unique Features
Performing multiple
calculations with vectors: R is a
vector-based language
Ex:
> x <- 1:5
> x
[1] 1 2 3 4 5
> x + 2
[1] 3 4 5 6 7
> x + 6:10   # adding two vectors
[1]  7  9 11 13 15
Processing more than just statistics:
data processing, graphic visualization,
and analysis of all sorts
Running code without a compiler:
the development cycle is easy; the downside of
an interpreted language is that it is slow
Object-oriented and functional
programming
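To make the vector-based and functional points concrete, here is a small sketch that computes the same squares three ways. The variable names are illustrative only; the vectorized and functional forms are the idiomatic ones in R.

```r
n <- 1:5

# 1. Explicit loop (works, but verbose)
squares_loop <- numeric(length(n))
for (i in seq_along(n)) {
  squares_loop[i] <- n[i]^2
}

# 2. Vectorized arithmetic: the operator applies to every element at once
squares_vec <- n^2

# 3. Functional style: pass an anonymous function to sapply()
squares_fun <- sapply(n, function(v) v^2)

print(squares_vec)                          # 1 4 9 16 25
print(identical(squares_loop, squares_vec)) # TRUE
print(identical(squares_vec, squares_fun))  # TRUE
```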
Distributed Computing
 In distributed computing, tasks are split between multiple processing nodes to reduce
processing time and increase efficiency. Packages such as ddR and multidplyr support this for large data sets.
Compatibility with Other Data Processing Technologies
R can be easily paired with other data processing and distributed computing
technologies like Hadoop and Spark.
It is possible to remotely use a Spark cluster to process large datasets using R.
 Generates reports in any desired format: R Markdown (the rmarkdown package)
Limitations of R
Steep learning curve: R is not an easy language to get started with. Beginners find it
hard to get their feet wet because of the command-line interface (RStudio helps here).
Hungry for physical memory: R stores all its data in physical memory, which makes it hard to
handle large data sets. Hadoop integration for R is one way around this.
Slower execution: R needs a lot of optimization before your code can run as
fast as it does in MATLAB or Python.
Drawbacks of R
 Essentially based on 40-year-old technology.
 Little built-in support for dynamic and 3-D graphics.
 Functionality is based on consumer demand and user contributions.
 Objects must be stored in the physical memory of the computer, but there have been
advancements to deal with this too.
 Not ideal for all possible situations.
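The memory point can be checked directly. The sketch below, with an illustrative vector name, measures how much physical memory one million numbers occupy; object.size() reports an estimate, and the small header overhead may vary slightly between R builds.

```r
# One million doubles at 8 bytes each, plus a small object header
big <- numeric(1e6)
size_bytes <- as.numeric(object.size(big))
print(size_bytes)                        # about 8 million bytes
print(object.size(big), units = "MB")    # the same size in megabytes

# Removing the object lets the garbage collector reclaim the memory
rm(big)
invisible(gc())
```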
Some important commands
1. help(command), ?command: show the help page for a command
2. help.start(): opens the help system in the system default browser
3. apropos("partword"): shows all the commands that contain “partword”
4. install.packages("pkg"): installs a library of commands from the CRAN website
5. installed.packages(): lists the installed packages
6. library(pkg): loads a package of commands and makes them available for use (the
package must be installed)
7. search(): shows a list of all packages (and other objects) that are loaded and
available for use
8. detach(package:name): unloads a package; replace name with the package name
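A short sketch of commands 3, 6, 7 and 8 in sequence. It uses the tools package only because it ships with every R installation, so nothing has to be downloaded; any installed package could be used instead.

```r
print("mean" %in% apropos("mean"))      # apropos() finds names containing "mean"

library(tools)                          # attach a package that ships with R
attached <- "package:tools" %in% search()
print(attached)                         # TRUE while the package is attached
print(file_ext("report.csv"))           # a function from tools: "csv"

detach("package:tools")                 # remove it from the search path
print("package:tools" %in% search())    # FALSE after detaching
```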
pacman: a package for managing (loading and unloading)
packages
install.packages("pacman")
require(pacman)   # gives a confirmation message
library(pacman)   # no message
p_unload(dplyr, ggplot2, tidyr)   # clear specific packages
p_unload(all)   # clear all add-on packages
detach("package:datasets", unload = TRUE)   # for base packages
# clear console
cat("\014")   # same as Ctrl+L
Cancelling commands: Ctrl+C (Esc in RStudio)
Reserved Words in R
The reserved words in R's parser are
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_
NA_complex_ NA_character_
... and ..1, ..2, etc.
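As a small sketch of what "reserved" means in practice: the parser rejects a reserved word used as an ordinary variable name, although the word can still be referred to as a function when quoted with backticks. The string being parsed is an illustrative example only.

```r
# Trying to parse an assignment to a reserved word fails with a syntax error
res <- tryCatch(
  parse(text = "if <- 1"),
  error = function(e) "parse error"
)
print(res)                               # "parse error"

# Quoted with backticks, `if` can be called like any other function
answer <- `if`(TRUE, "yes", "no")
print(answer)                            # "yes"
```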
Operators in R
Operator Syntax and Precedence
The following unary and binary operators are defined. They are
listed in precedence groups, from highest to lowest.
:: :::              access variables in a namespace
$ @                 component / slot extraction
[ [[                indexing
^                   exponentiation (right to left)
- +                 unary minus and plus
:                   sequence operator
%any% |>            special operators (including %% and %/%)
* /                 multiply, divide
+ -                 (binary) add, subtract
< > <= >= == !=     ordering and comparison
!                   negation
& &&                and
| ||                or
~                   as in formulae
-> ->>              rightwards assignment
<- <<-              assignment (right to left)
=                   assignment (right to left)
?                   help (unary and binary)
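A few one-line checks of the precedence groups above. These are illustrative expressions, not taken from the slides; parentheses show the grouping that R applies.

```r
print(-2^2)          # -4: ^ binds tighter than unary minus, so -(2^2)
print((-2)^2)        # 4
print(2^3^2)         # 512: ^ groups right to left, so 2^(3^2)
print(-1:3)          # -1 0 1 2 3: unary minus binds tighter than :
print(1:3 * 2)       # 2 4 6: : binds tighter than *, so (1:3) * 2
print(5 %/% 2 * 2)   # 4: %/% binds tighter than *, so (5 %/% 2) * 2
print(!1 == 2)       # TRUE: == binds tighter than !, so !(1 == 2)
```

When in doubt, adding explicit parentheses costs nothing and makes the intended grouping obvious to the reader.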
Relational Operators
Comparators are operators that check whether two objects are equal or not (==, !=) or how they are ordered (<, >, <=, >=).
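A brief sketch of the comparators on two illustrative vectors. Comparison is element-wise and returns a logical vector; the last lines show a common pitfall with floating-point numbers, where all.equal() is the safer test.

```r
a <- c(1, 5, 3)
b <- c(1, 2, 4)

print(a == b)    # TRUE FALSE FALSE
print(a != b)    # FALSE TRUE TRUE
print(a < b)     # FALSE FALSE TRUE
print(a >= b)    # TRUE TRUE FALSE

# Floating-point values: exact equality can fail, all.equal() uses a tolerance
print(0.1 + 0.2 == 0.3)                  # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3))) # TRUE
```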
# R program to illustrate
# the use of arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
# Performing element-wise operations on the operands
cat("Addition of vectors :", vec1 + vec2, "\n")
cat("Subtraction of vectors :", vec1 - vec2, "\n")
cat("Multiplication of vectors :", vec1 * vec2, "\n")
cat("Division of vectors :", vec1 / vec2, "\n")
cat("Modulo of vectors :", vec1 %% vec2, "\n")
cat("Power operator :", vec1 ^ vec2, "\n")
%in% Operator:
The %in% operator in R can be used to identify whether an element (e.g., a
number) belongs to a vector or a data frame column. For example, it can be used
to see whether the number 1 is in the sequence of numbers 1 to 10.
%*% Operator:
 This operator performs matrix multiplication, for example multiplying a matrix by its transpose.
 The number of columns of the first matrix must be equal to the
number of rows of the second matrix.
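A minimal sketch of %*% on an illustrative 2 x 3 matrix multiplied by its transpose. The second part shows what happens when the dimensions do not conform; the error is caught so the script keeps running.

```r
m <- matrix(1:6, nrow = 2)    # 2 x 3 matrix, filled column by column
p <- m %*% t(m)               # (2 x 3) times (3 x 2) gives a 2 x 2 matrix
print(p)
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56

# m %*% m is (2 x 3) times (2 x 3): columns of the first != rows of the second
ok <- tryCatch({ m %*% m; TRUE }, error = function(e) FALSE)
print(ok)                     # FALSE
```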
What is the Difference Between the == and
%in% Operators in R
 The %in% operator is used for matching values. It returns a logical vector saying whether each element of its
first argument has a match in its second; it is built on match(), which “returns a vector of the positions
of (first) matches of its first argument in its second”.
 On the other hand, the == operator is a relational operator that compares
two vectors element by element for exact equality. Using the %in% operator you can compare
vectors of different lengths to see if elements of one vector match at least one
element in another.
1: Using %in% to Compare Two Sequences of
Numbers (Vectors)
# sequence of numbers 1:
a <- seq(1, 5)
# sequence of numbers 2:
b <- seq(3, 12)
# using the %in% operator to check matching
# values in the vectors
a %in% b
# [1] FALSE FALSE  TRUE  TRUE  TRUE
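To contrast the three tools discussed above, the sketch below reuses the same two sequences: match() gives positions, %in% gives membership, and == compares element by element.

```r
a <- seq(1, 5)
b <- seq(3, 12)

pos <- match(a, b)       # positions of a's elements in b (NA when absent)
print(pos)               # NA NA 1 2 3
print(a %in% b)          # FALSE FALSE TRUE TRUE TRUE
print(a[a %in% b])       # 3 4 5: use the result to filter a
print(1 %in% 1:10)       # TRUE

# == works position by position on vectors of the same length
print(a == rev(a))       # FALSE FALSE TRUE FALSE FALSE
```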
R Resources
1. FAQ: https://CRAN.R-project.org/doc/FAQ/R-FAQ.html
2. Mailing lists: https://www.R-project.org/mail.html
3. Archives: https://CRAN.R-project.org/mirrors.html
4. Bug-tracking system: https://bugs.R-project.org/
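Arithmetic Operators
These unary and binary operators perform
arithmetic on numeric or complex vectors (or
objects that can be coerced to them)
+x        unary plus
-x        unary minus
x + y     addition
x - y     subtraction
x * y     multiplication
x / y     division
x ^ y     exponentiation
x %% y    modulus (remainder)
x %/% y   integer division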
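The two less familiar operators, %% and %/%, are sketched below with illustrative numbers. For negative operands the remainder takes the sign of the divisor, so that x == (x %/% y) * y + x %% y always holds.

```r
print(7 %% 3)       # 1: remainder
print(7 %/% 3)      # 2: integer quotient
print((-7) %% 3)    # 2: remainder has the sign of the divisor
print((-7) %/% 3)   # -3: quotient is rounded down
print(5.5 %% 2)     # 1.5: also works for non-integers

x <- c(7, -7, 5.5)
y <- c(3, 3, 2)
print(all(x == (x %/% y) * y + x %% y))   # TRUE
```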
Language objects
There are three types of language objects: calls, expressions, and names.
These objects have modes "call", "expression", and "name", respectively.
They can be created directly from expressions using the quote
mechanism and converted to and from lists by the as.list and as.call
functions.
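A small sketch of the quote mechanism and the list conversions, using an illustrative call. The call is captured without being evaluated, taken apart with as.list(), modified, and rebuilt with as.call().

```r
cl <- quote(sum(1, 2, 4))        # capture the call without evaluating it
print(mode(cl))                  # "call"
print(eval(cl))                  # 7

parts <- as.list(cl)             # function name followed by the arguments
parts[[1]] <- as.name("prod")    # swap the function being called
cl2 <- as.call(parts)            # rebuild a call: prod(1, 2, 4)
print(eval(cl2))                 # 8

print(mode(quote(x)))            # "name"
print(mode(expression(a + b)))   # "expression"
```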
Entering Input
• At the R prompt we type expressions.
> x <- 1
> print(x)
[1] 1
> msg <- "Welcome"
> s <- rep(x, times = 10)
> seq(length.out = 100, from = 4, by = 1)
The grammar of the language determines whether an expression is
complete or not.
> x <-    # incomplete expression: R shows the continuation prompt + and waits for the rest
R commands
R is case sensitive, so A and a are different names; which symbols are allowed in names depends on the country locale.
Executing commands from or diverting output to a file
source("commands.R")   # execute the commands saved in the file commands.R
sink("record.lis")     # divert all subsequent output from the console to an external
                       # file, record.lis
sink()                 # restore output to the console once again
.RData      # all objects in the workspace
.Rhistory   # command lines used in the session
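The source() and sink() pair can be tried without touching the working directory. This sketch writes a one-line script to a temporary file (the file names come from tempfile() and are not the commands.R and record.lis of the slide), runs it, and diverts one line of output to a second temporary file.

```r
# Create a tiny script file and execute it with source()
script <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", script)
source(script, local = TRUE)     # local = TRUE: evaluate in the current environment

# Divert console output to a file, then restore it
out <- tempfile(fileext = ".lis")
sink(out)
print(answer)                    # goes to the file, not the console
sink()

recorded <- readLines(out)
print(recorded)                  # "[1] 42"
```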
Data permanency and removing objects
 The entities that R creates and manipulates are known as objects.
 These may be variables, arrays of numbers, character strings, functions, or
structures built from such components.
 During an R session, objects are created and stored by name.
 > objects()
lists the names of the objects currently stored.
 The collection of objects currently stored is called the workspace.
To remove objects:
> rm(x, y, ….)
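A short sketch of creating, listing and removing objects. The names demo_x and demo_msg are made up for illustration so that they do not clash with anything already in the workspace.

```r
demo_x <- 1:3
demo_msg <- "Welcome"

created <- c("demo_x", "demo_msg") %in% objects()
print(created)                               # TRUE TRUE

rm(demo_x, demo_msg)                         # remove both objects by name
print(exists("demo_x", inherits = FALSE))    # FALSE
```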
Objects, their modes and attributes
The entities R operates on are technically known as objects.
Atomic vectors: all components have the same type, or mode
Recursive objects: lists, functions and expressions
mode: the basic type of an object's fundamental constituents. This is a special case of a
“property” of an object
mode(object) and length(object)
compl <- c(2+3i, 4+5i)   # length 2, mode "complex"
Other properties of an object are usually provided by attributes(object)
Changing the mode: as.character(object), as.complex(object)
Changing the length of an object
Empty object
emp_obj <- character()
emp_obj[6] <- 57   # emp_obj now has length 6; the first five components are NA
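The properties above can be inspected directly. This sketch follows the slide's own examples (compl and emp_obj) and adds an illustrative digits vector to show a change of mode and back.

```r
compl <- c(2+3i, 4+5i)
print(mode(compl))               # "complex"
print(length(compl))             # 2

digits <- as.character(0:9)      # change of mode: numbers to character strings
print(mode(digits))              # "character"
back <- as.integer(digits)       # and back again
print(identical(back, 0:9))      # TRUE

emp_obj <- character()           # empty object of mode character
emp_obj[6] <- 57                 # grows to length 6; 57 is stored as "57"
print(length(emp_obj))           # 6
print(emp_obj[6])                # "57"

length(emp_obj) <- 2             # objects can also be truncated
print(length(emp_obj))           # 2
```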
The class of an object
All objects in R have a class, reported by the function class().
A special attribute known as the class of the object is used to allow
for an object-oriented style of programming in R.
To remove temporarily the effects of class, use the function
unclass(). For example, if winter has the class "data.frame" then
> winter
will print it in data frame form, which is rather like a matrix, whereas
> unclass(winter)
will print it as an ordinary list.
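A minimal sketch of class() and unclass(). The slides do not show the contents of winter, so the data frame below is a made-up stand-in with hypothetical columns.

```r
# Hypothetical data: the real 'winter' data frame is not given in the slides
winter <- data.frame(
  month = c("Dec", "Jan", "Feb"),
  temp  = c(2.5, -1.0, 0.5)
)

print(class(winter))      # "data.frame"
print(winter)             # printed as a table, rather like a matrix

plain <- unclass(winter)  # same data with the class attribute removed
print(class(plain))       # "list"
print(plain)              # printed as an ordinary list
```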
Learn R Programming (Tutorial & Examples) | Free Introduction Course (statisticsglobe.com)
R Guides – Statology
https://www.youtube.com/watch?v=fpl_ny-jX5Y&list=RDCMUC87aeHqMrlR6ED0w2SVi5nw&start_radio=1&rv=fpl_ny-jX5Y&t=0
https://www.youtube.com/watch?v=UYclmg1_KLk&list=PLqzoL9-eJTNDw71zWePXyHx3_cm_fMP8S&index=3
 Identifying potential problems.
 Optimizing price dynamically.
Improving the allocation of “available to promise” inventory.
What is supply chain management? | IBM
www.fsf.org
https://www.youtube.com/watch?v=ckdHNu4kfL0
Why is Supply Chain Management is important?

R_L1-Aug-2022.pptx

  • 1. Presented By Shantilal Bhayal Assistant Professor MEDICAPS UNIVERSITY Programming Environment R-
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Freedom 0: Freedom to run the Program .How, When and What Freedom 1: Freedom to study how the program works, adapt it to your needs. Access to source code recondition for this. Freedom 2: The freedom to redistribute copies so you can help your neighbor. Freedom 3: The freedom to improve the program ,and release your improvement to the public so that whole community can benefits.
  • 7. What is data science ? Hacking ( Programming) + Maths/Statistics + Domain Knowledge = Data Science
  • 8. SO NEXT WHAT IS Data Scientist ? A data scientist is simply a person who can write code = in R, Python,Java, SQL, Hadoop (Pig,HQL,MR) etc = for data storage, querying, summarization, visualization = how efficiently, and in time (fast results?) = where on databases, on cloud, servers and understand enough statistics to derive insights from data so business can make decisions
  • 9. Data Science with R :A popular language in Data Science https://www.tiobe.com/tiobe-index/
  • 10. The are some milestone dates in the development of R: R version 4.2.1 (Funny-Looking Kid) has been released on 2022-06-23. ► Early 1990s: The development of R began. ► August 1993: The software was announced on the S-news mailing list. ► www.r-project.org/mail,html ► June 1995: After some persuasive arguments by Martin Mächler - code available as “free software,” under the FSF’s GNU GPL, Version 2. ► Mid-1997: The initial R Development Core Team was formed(Core group) ► February 2000: The first version of R, version 1.0.0, was released. ► R : Past and Future History (r-project.org); https://cran.r-project/doc/html/interface98-paper/paper.html
  • 11. What's great about R? CRAN Packages By Date (r-project.org) https://cran.r-project.org/web/packages/ R can perform various data analysis and data science tasks for free Interactive Visualization with Shiny package (Equivalent SAS Product : Visual Analytics) Ensemble Learning / Machine Learning (SAS Product : SAS Enterprise Miner) Text / Social Media Mining (SAS Product : SAS Text Miner) Optimization and Forecasting (SAS Product : SAS ETS, PROC OPTMODEL) RStudio IDE (SAS Product : SAS Enterprise Guide)  Integartion: Tableau, SQL Server, VS , PowewrBI The system saves data sets between sessions, so you don't need to reload them each time. It saves your command history too.
  • 12. What is R? Free alternative to MATLAB,Excdel ,SAS and SPSS. R is a: 1. Statistical Software 2. Language 3. Environment 4. Ecosystem Used by Google ,Facebook ,Bank of America etc. Millions of user word wide
  • 13. Where is R used? Big data demands of companies analyse user behaviour. online advertising and e-commerce Weather services use it for weather forecasts. It is a fundamental tool for analytics-driven organizations
  • 16. What is R? ► R is a dialect of S. ► S is a language that was developed by John Chambers and others at Bell Labs. ► S was initiated in 1976 as an internal statistical analysis environment, originally implemented as Fortran libraries. ► Early versions of the language did not contain functions for statistical modelling.
  • 17. ► In 1988, the system was rewritten in C to make it more portable across systems, and it began to resemble the system that we have today. Historical Notes ► In 1993, Bell Labs gave StatSci (which later became Insightful Corporation) an exclusive license to develop and sell the S language. ► In 2004, Insightful purchased the S language outright from Lucent for $2 million and became its owner. ► In 2006, Alcatel purchased Lucent Technologies, which is now called Alcatel-Lucent. ► Insightful sold its implementation of the S language under the product name S-PLUS, building a number of extra features (mostly GUI) on top of it, hence the “PLUS”.
  • 18. ► Version 4 of the S language was released in 1998, and it is the version we more or less use today. The book Programming with Data by John Chambers, sometimes called the green book and a reference for this course, documents version 4 of the S language. ► In 2008, Insightful Corporation was acquired by TIBCO for $25 million. ► The fundamentals of the S language have not really changed since 1998. ► In 1998, S won the Association for Computing Machinery’s Software System Award.
  • 19. ► 1991: R was created in New Zealand by Ross Ihaka and Robert Gentleman. ► 1993: First announcement to the public. ► 1995: Martin Mächler convinced Ross and Robert to license R under the GNU General Public License, making R free software. ► 1996: Public mailing lists are created (R-help and R-devel). ► 1997: The core group is formed; it controls the source code of R. ► 2000: R version 1.0.0 is released. ► 2022: R version 4.2.1 (Funny-Looking Kid) was released on 2022-06-23.
  • 20. What is R? R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes: an effective data handling and storage facility; a suite of operators for calculations on arrays, in particular matrices; a large, coherent, integrated collection of intermediate tools for data analysis; graphical facilities for data analysis and display, either directly at the computer or on hardcopy; and a well-developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user-defined recursive functions and input and output facilities.
  • 21. ► R is a system for statistical computation and graphics. ► It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. ► It is free software distributed under a GNU-style copyleft, and an official part of the GNU project (“GNU S”).
  • 24. Alternatives to the standard R editors: Eclipse StatET (www.walware.de/goto/statet), a Java-based IDE; Emacs Speaks Statistics (http://ess.r-project.org), an add-on for Emacs, a powerful text and code editor; Tinn-R (www.sciviews.org/Tinn-R), an editor developed specifically for working with R and available only for Windows.
  • 25. Design of the R System: The R system is divided into two conceptual parts: 1. The base R system that you download from CRAN 2. Everything else. R functionality is divided into a number of packages. ► The “base” system contains, among other things, the base package, which is required to run R and contains the most fundamental functions.
  • 26. ► Other packages contained in the base system include, for example, utils, stats, datasets, graphics, grDevices, grid, methods, tools, parallel, compiler and stats4. ► There are also recommended packages, fundamental packages that more or less everyone might use: boot (bootstrap), class (classification), cluster, codetools, foreign, and a variety of others.
  • 27. How R works 1. R is an interpreted language, not a compiled one. 2. Syntax: lm(y ~ x) means “fit a linear model with y as response and x as predictor”. 3. ls() lists the objects in the workspace, whereas typing ls without parentheses prints the content of the function itself.
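The two points on this slide can be tried directly at the prompt. The following is a minimal sketch that assumes the built-in `cars` data set (the slide itself names no data), so `dist` and `speed` stand in for `y` and `x`.

```r
# Fit a linear model with stopping distance as response and speed as predictor.
# `cars` ships with R in the datasets package; its use here is an assumption.
fit <- lm(dist ~ speed, data = cars)
coef(fit)   # intercept and slope of the fitted line

# With parentheses, ls() is a function call that lists workspace objects.
ls()

# Without parentheses, R prints the function object itself (its source code).
ls
```

Because R is interpreted, each of these lines runs as soon as it is entered; there is no separate compile step.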
  • 29. Features ► It comes as free, open-source code that is stable and reliable. License: www.r-project.org/COPYING. ► It runs anywhere: Mac, Windows, Unix systems. ► It supports extensions for data manipulation, statistical modeling, and graphics. Extensibility: write your own software and distribute it in the form of add-on packages. ► It provides an engaged community: www.r-project.org/mail.html www.stackoverflow.com/questions/tagged/r http://stats.stackexchange.com/questions/tagged/r www.twitter.com/search/rstats (R regional conferences)
  • 30. It connects with other languages and systems: the R package foreign (http://cran.r-project.org/web/packages/foreign/index.html) reads data from SPSS, SAS and Stata; RODBC and ROracle connect to databases. Unique features: Performing multiple calculations with vectors: R is a vector-based language. Ex: > x <- 1:5 > x [1] 1 2 3 4 5 > x + 2 > x + 6:10 (two vectors). Processing more than just statistics: data processing, graphic visualization, and analysis of all sorts. Running code without a compiler: the development cycle is easy; the downside of an interpreted language is that it is slow. Object-oriented and functional programming.
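The vector example on this slide omits its results. A short sketch of the same three expressions, with the values they produce, might look like this:

```r
x <- 1:5
x          # 1 2 3 4 5
x + 2      # 3 4 5 6 7      (the single value 2 is recycled for every element)
x + 6:10   # 7 9 11 13 15   (element-wise sum of two vectors of equal length)
```

No loop is needed in either case: arithmetic operators work on whole vectors at once.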
  • 31. Distributed Computing ► In distributed computing, tasks are split between multiple processing nodes to reduce processing time and increase efficiency. Packages such as ddR and multidplyr target large data sets. Compatibility with Other Data Processing Technologies: R can be easily paired with other data processing and distributed computing technologies like Hadoop and Spark. It is possible to remotely use a Spark cluster to process large datasets using R. ► Generates reports in any desired format: the R Markdown (rmarkdown) package.
  • 32. Limitations of R. Steep learning curve: R is not an easy language to get started with. Beginners find it hard to get their feet wet due to the command-line interface (RStudio helps here). Hungry for physical memory: R stores all its data in physical memory, which makes large data sets hard to handle (Hadoop integration for R is one workaround). Slower execution: R would need a lot of optimizations before your code can run as fast as it does on MATLAB or Python.
  • 33. Drawbacks of R ► Essentially based on 40-year-old technology. ► Little built-in support for dynamic and 3-D graphics. ► Functionality is based on consumer demand and user contributions. ► Objects must be stored in the physical memory of the computer, although there have been advancements to deal with this too. ► Not ideal for all possible situations.
  • 34. Some important commands 1. help(command), ?command: show the help page for a command 2. help.start(): opens the help system in the system default browser 3. apropos("partword"): shows all the commands whose names contain “partword” 4. install.packages("pkg"): installs a library of commands from the CRAN website 5. installed.packages(): lists the installed packages 6. library(pkg): loads a package of commands and makes them available for use (the package must be installed) 7. search(): shows a list of all packages (and other objects) that are loaded and available for use 8. detach(package:name): unloads a package; replace name with the package name
  • 35. pacman: a package for loading and unloading packages. install.packages("pacman"); require(pacman) # prints a loading message; library(pacman) # no message; p_unload(dplyr, ggplot2, tidyr) # clear specific packages; p_unload(all) # clear all add-on packages; detach("package:datasets", unload = TRUE) # for base packages
  • 37. Clear the console: cat("\014") # same as Ctrl+L. Cancelling commands: Ctrl+C
  • 38. Reserved Words in R The reserved words in R's parser are if else repeat while function for in next break TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_ NA_character_ ... and ..1, ..2 etc,
  • 40. Operator Syntax and Precedence: The following unary and binary operators are defined. They are listed in precedence groups, from highest to lowest. :: ::: access variables in a namespace; $ @ component / slot extraction; [ [[ indexing; ^ exponentiation (right to left); - + unary minus and plus; : sequence operator; %any% |> special operators (including %% and %/%); * / multiply, divide
  • 41. Operator Syntax and Precedence (continued): + - (binary) add, subtract; < > <= >= == != ordering and comparison; ! negation; & && and; | || or; ~ as in formulae; -> ->> rightwards assignment; <- <<- assignment (right to left); = assignment (right to left); ? help (unary and binary)
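A few one-liners make the precedence groups above concrete. This is an illustrative sketch; the expressions are chosen here and do not come from the slides.

```r
-2^2            # -4: ^ binds tighter than unary minus, so this is -(2^2)
(-2)^2          # 4: parentheses override precedence
2^3^2           # 512: ^ groups right to left, i.e. 2^(3^2)
-1:3            # -1 0 1 2 3: unary minus binds tighter than the sequence operator
1:3 * 2         # 2 4 6: the sequence operator binds tighter than *
!TRUE & FALSE   # FALSE: ! is applied before &
```

When in doubt, adding parentheses costs nothing and documents the intent.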
  • 42. Relational Operators: comparators, i.e. operators that check how two objects compare (for example, whether they are equal or not).
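The slide's examples did not survive extraction, so here is a small sketch of the relational operators on two made-up vectors `a` and `b` (both names and values are assumptions for illustration).

```r
a <- c(1, 5, 10)
b <- c(1, 7, 3)

a == b   # TRUE FALSE FALSE
a != b   # FALSE TRUE TRUE
a < b    # FALSE TRUE FALSE
a >= b   # TRUE FALSE TRUE

# Comparisons are element-wise; identical() gives one answer for the whole object.
identical(a, b)   # FALSE
```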
  • 51. # R program to illustrate the use of arithmetic operators
    vec1 <- c(0, 2)
    vec2 <- c(2, 3)
    # Performing operations on the operands; "\n" ends each output line
    cat("Addition of vectors :", vec1 + vec2, "\n")
    cat("Subtraction of vectors :", vec1 - vec2, "\n")
    cat("Multiplication of vectors :", vec1 * vec2, "\n")
    cat("Division of vectors :", vec1 / vec2, "\n")
    cat("Modulo of vectors :", vec1 %% vec2, "\n")
    cat("Power operator :", vec1 ^ vec2, "\n")
  • 52. %in% The %in% operator in R can be used to identify whether an element (e.g., a number) belongs to a vector (such as a data frame column). For example, it can be used to see whether the number 1 is in the sequence of numbers 1 to 10: 1 %in% 1:10.
  • 53. %*% Operator: ► This operator performs matrix multiplication, for example multiplying a matrix by its transpose. ► The number of columns of the first matrix must be equal to the number of rows of the second matrix.
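A minimal sketch of `%*%`, using a small made-up matrix `A` and its transpose (the names and values are illustrative assumptions):

```r
A <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
B <- t(A)                    # its 3 x 2 transpose

P <- A %*% B                 # allowed: ncol(A) == nrow(B) == 3
dim(P)                       # 2 2
P
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56

# A %*% A would raise a "non-conformable arguments" error,
# because ncol(A) is 3 but nrow(A) is 2.
```

Note that `A * B` is a different operation: `*` multiplies element by element and requires matching shapes.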
  • 54. What is the difference between the == and %in% operators in R? ► The %in% operator is used for matching values: it returns a logical vector indicating whether each element of its first argument has a match in its second. (The related function match() “returns a vector of the positions of (first) matches of its first argument in its second”.) ► The == operator, on the other hand, is a relational operator that compares two objects element by element to check whether they are exactly equal. Using the %in% operator you can compare vectors of different lengths to see if elements of one vector match at least one element in another.
  • 55. 1: Using %in% to compare two sequences of numbers (vectors)
    # sequence of numbers 1:
    a <- seq(1, 5)
    # sequence of numbers 2:
    b <- seq(3, 12)
    # using the %in% operator to check matching values in the vectors
    a %in% b
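To see how `%in%`, `match()` and `==` differ on the same two sequences, the following sketch reuses `a` and `b` from the slide and adds the results as comments.

```r
a <- seq(1, 5)
b <- seq(3, 12)

a %in% b      # FALSE FALSE TRUE TRUE TRUE: is each element of a found in b?
match(a, b)   # NA NA 1 2 3: position in b of the first match, NA if none
a == 3        # FALSE FALSE TRUE FALSE FALSE: element-wise equality test
1 %in% 1:10   # TRUE
```

`%in%` always returns a result as long as its left-hand side, regardless of the length of the right-hand side.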
  • 56. R Resources 1. FAQ: https://CRAN.R-project.org/doc/FAQ/R-FAQ.html 2. Mailing lists: https://www.R-project.org/mail.html 3. CRAN mirrors: https://CRAN.R-project.org/mirrors.html 4. Bug-tracking system: https://bugs.R-project.org/
  • 57. Arithmetic Operators: These unary and binary operators perform arithmetic on numeric or complex vectors (or objects): + x, - x, x + y, x - y, x * y, x / y, x ^ y, x %% y, x %/% y
  • 61. Language objects: calls, expressions, and names. These objects have modes "call", "expression", and "name" respectively. They can be created directly from expressions using the quote mechanism and converted to and from lists by the as.list and as.call functions.
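The round trip between calls and lists can be sketched as follows. The call `sum(1, 2, 3)` and the swap to `max` are illustrative choices, not taken from the slides.

```r
cl <- quote(sum(1, 2, 3))   # an unevaluated call
mode(cl)                    # "call"
eval(cl)                    # 6

parts <- as.list(cl)        # list: the function name, then the arguments
mode(parts[[1]])            # "name"

parts[[1]] <- as.name("max")   # swap the function being called
cl2 <- as.call(parts)          # back to a call: max(1, 2, 3)
eval(cl2)                      # 3

ex <- expression(a + b)
mode(ex)                    # "expression"
```

Treating code as data in this way is what lets functions such as `lm` inspect the formula and call they were given.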
  • 63. Entering Input • At the R prompt we type expressions.
    > x <- 1
    > print(x)
    [1] 1
    > S <- rep(obj, times = 10)
    > seq(length = 100, from = 4, by = 1)
    > msg <- "Welcome"
    The grammar of the language determines whether an expression is complete or not.
    > x <-    # incomplete expression
  • 64. R commands: R commands, case sensitivity, etc. (depending on the country locale). Executing commands from, or diverting output to, a file: source("commands.R") # execute the commands saved in the file named commands.R; sink("record.lis") # divert all subsequent output from the console to an external file, record.lis; sink() # restore output to the console once again. .RData # all objects in the workspace; .Rhistory # the command lines used in the session
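A self-contained way to try `source()` and `sink()` without touching real files is to use temporary files. In this sketch `script` and `logfile` are hypothetical stand-ins for commands.R and record.lis.

```r
script  <- tempfile(fileext = ".R")     # stands in for commands.R
logfile <- tempfile(fileext = ".lis")   # stands in for record.lis

writeLines("y <- 2 + 3", script)   # put one command into the script file
source(script)                     # run it; y now exists in the workspace

sink(logfile)    # from here on, console output goes to the file
print(y)
sink()           # restore output to the console

readLines(logfile)   # "[1] 5"
```

Forgetting the final `sink()` is a common reason for a console that appears to print nothing.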
  • 65. Data permanency and removing objects ► The entities that R creates and manipulates are known as objects. ► These may be variables, arrays of numbers, character strings, functions, or structures built from such components. ► During an R session, objects are created and stored by name. ► > objects() lists them. ► The collection of objects currently stored is called the workspace. To remove objects: > rm(x, y, ...)
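As a quick sketch of the workspace commands above (the object names `x`, `y` and `f` are made up for the example):

```r
x <- 1
y <- "a"
f <- function(n) n + 1

objects()    # names of the objects in the workspace (same as ls())
rm(x, y)     # remove x and y by name

exists("x", inherits = FALSE)   # FALSE: x is gone from the workspace
exists("f", inherits = FALSE)   # TRUE: f is still there
```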
  • 66. Objects, their modes and attributes: The entities R operates on are technically known as objects. Examples: “atomic” vectors, whose components all have the same mode; recursive structures: lists, functions and expressions. Mode: the basic type of an object's fundamental constituents. This is a special case of a “property” of an object. mode(object) and length(object) report these properties, e.g. compl <- c(2+3i, 4+5i) has length 2 and mode "complex".
  • 67. Further properties of an object are usually provided by attributes(object). Coercion: as.character(object), as.complex(object). Empty object: emp_obj <- character(). Changing the length of an object: emp_obj[6] <- 57 extends it to length 6.
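The properties named on these two slides can be inspected as below. This is a sketch; `z`, `digits`, `e` and `m` are illustrative names introduced here.

```r
compl <- c(2+3i, 4+5i)
mode(compl)      # "complex"
length(compl)    # 2

z <- 0:9
digits <- as.character(z)   # coerce integers to the strings "0" ... "9"
as.integer(digits)          # and back to 0 1 ... 9

e <- numeric()     # empty numeric vector of length 0
e[3] <- 17         # assigning past the end grows it to NA NA 17
length(e)          # 3
length(e) <- 2     # assigning to length() truncates (or extends) the object

m <- matrix(1:4, nrow = 2)
attributes(m)      # a list with one component: $dim, equal to 2 2
```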
  • 68. The class of an object: All objects in R have a class, reported by the function class. A special attribute known as the class of the object is used to allow for an object-oriented style of programming in R. To remove temporarily the effects of class, use the function unclass(). For example, if winter has the class "data.frame" then > winter will print it in data frame form, which is rather like a matrix, whereas > unclass(winter) will print it as an ordinary list.
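The `winter` example can be reproduced with a tiny data frame. The column names and values below are made up purely for illustration.

```r
# Hypothetical contents for winter; only its class matters for the example.
winter <- data.frame(month = c("Dec", "Jan"), temp = c(2.5, -1))

class(winter)            # "data.frame"
winter                   # printed as a table, using the data frame print method
unclass(winter)          # printed as a plain list of columns plus attributes
class(unclass(winter))   # "list"
```

The data are the same in both cases; only the method chosen for printing changes with the class.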
  • 101. Learn R Programming (Tutorial & Examples) | Free Introduction Course (statisticsglobe.com); R Guides – Statology; https://www.youtube.com/watch?v=fpl_ny-jX5Y&list=RDCMUC87aeHqMrlR6ED0w2SVi5nw&start_radio=1&rv=fpl_ny-jX5Y&t=0 https://www.youtube.com/watch?v=UYclmg1_KLk&list=PLqzoL9-eJTNDw71zWePXyHx3_cm_fMP8S&index=3
  • 109. ► Identifying potential problems. ► Optimizing price dynamically. ► Improving the allocation of “available to promise” inventory. What is supply chain management? | IBM www.fsf.org https://www.youtube.com/watch?v=ckdHNu4kfL0 Why is supply chain management important?

Editor's Notes

  1. When R is running, variables, data, functions, results, etc., are stored in the active memory of the computer in the form of objects which have a name. The user can do actions on these objects with operators (arithmetic, logical, comparison, . . .) and functions (which are themselves objects).
  2. All the actions of R are done on objects stored in the active memory of the computer: no temporary files are used (Fig. 1). The readings and writings of files are used for input and output of data and results (graphics, . . .). The user executes the functions via some commands. The results are displayed directly on the screen, stored in an object, or written on the disk (particularly for graphics). Since the results are themselves objects, they can be considered as data and analyzed as such. Data files can be read from the local disk or from a remote server through internet.
  3. In every computer language variables provide a means of accessing the data stored in memory. R does not provide direct access to the computer’s memory but rather provides a number of specialized data structures we will refer to as objects. These objects are referred to through symbols or variables. In R, however, the symbols are themselves objects and can be manipulated in the same way as any other object. This is different from many other languages and has wide ranging effects. In this chapter we provide preliminary descriptions of the various data structures provided in R. More detailed discussions of many of them will be found in the subsequent chapters. The R specific function typeof returns the type of an R object. Note that in the C code underlying R, all objects are pointers to a structure with typedef SEXPREC; the different R data types are represented in C by SEXPTYPE, which determines how the information in the various parts of the structure is used.
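As a brief illustration of typeof, the calls below show the type reported for a handful of common objects (the objects are chosen here for illustration):

```r
typeof(1L)         # "integer"
typeof(1)          # "double"
typeof("a")        # "character"
typeof(TRUE)       # "logical"
typeof(list())     # "list"
typeof(sum)        # "builtin": implemented directly in C
typeof(mean)       # "closure": an ordinary R function
typeof(quote(x))   # "symbol": the symbol x is itself an object
```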
  4. A factor is a special type of vector used to represent categorical data. There are two types of factor: unordered and ordered. Unordered factors store data whose labels are categorical but have no ordering, for example male and female. Ordered factors represent things that are ranked: they have an order but are not numerical. For example, many universities have assistant professors, associate professors, and full professors; those are categorical but ordered. You can think of a factor as an integer vector where each integer has a label. For example, a variable with the values high, medium and low might be represented underneath in R by the numbers 1, 2 and 3, where 1 represents high, 2 medium and 3 low. Factors are important because they are treated specially by modeling functions like lm and glm, which fit linear models and which we'll talk about later. Factors with labels are generally better than simple integer vectors because factors are what are called self-describing: a variable that has the values male and female is more descriptive than a variable that just has ones and twos. In many data sets you'll find a variable coded as 1 and 2, and it's not easy to know whether it is really a numeric variable that only takes the values 1 and 2 or a coded category, because that is not recorded in the data set. If you use a factor, the coding for the labels is built into the variable and it's much easier to understand.
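The professor-rank example from this note can be written as an ordered factor. This is a sketch; the variable names `prof` and `sex` and the sample values are assumptions.

```r
prof <- factor(c("assistant", "full", "associate", "assistant"),
               levels = c("assistant", "associate", "full"),
               ordered = TRUE)

levels(prof)       # "assistant" "associate" "full"
as.integer(prof)   # 1 3 2 1: the integer codes behind the labels
prof < "full"      # TRUE FALSE TRUE TRUE: comparisons work on ordered factors
table(prof)        # counts per level

sex <- factor(c("male", "female", "female"))   # unordered factor
levels(sex)        # "female" "male": alphabetical unless levels are given
```

Passing `levels` explicitly is what fixes the ranking; otherwise the levels are sorted alphabetically.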
  5. There's a special type of object that we haven't talked much about yet: missing values. Missing values in R are denoted by either NA or NaN. NaN is used for undefined mathematical operations, and NA is used for pretty much everything else. The function is.na tests the elements of an object to see whether they are missing, and is.nan tests for NaN. NA values can have a class, too: you can have missing integer values, missing character values, missing numeric values, etc. So even though they all look like NA, the NAs can potentially have different classes. A NaN value is also considered to be NA, that is, missing, but the reverse is not true: an NA value is not necessarily a NaN value. Here I created a vector x which is 1, 2, NA, 10, and 3. This is a numeric vector, so the NA in it is a numeric missing value. When I call is.na on x, it returns a logical vector indicating whether each element of x is missing. There's only one missing element, the third, so the first two values are FALSE, the third is TRUE, and the fourth and fifth are FALSE. If I call is.nan on this vector, the vector that's returned is all FALSE, because there aren't any NaN values in it. If I create a vector that has both a NaN value and an NA value in it, is.na returns TRUE for both of them, but is.nan only returns TRUE for the value that's actually NaN.
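The vectors described in this note can be reconstructed as follows. The first vector matches the note; the second, `y`, is an assumed example containing one NaN and one NA.

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)     # FALSE FALSE TRUE FALSE FALSE
is.nan(x)    # FALSE FALSE FALSE FALSE FALSE

y <- c(1, 2, NaN, NA, 4)
is.na(y)     # FALSE FALSE TRUE TRUE FALSE: NaN also counts as missing
is.nan(y)    # FALSE FALSE TRUE FALSE FALSE: NA is not NaN

class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```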
  6. The last data type I'm going to talk about here is the data frame. The data frame is a key data type in R, used to store tabular data. Not all data are tabular, but so much data comes in tabular form that data frames are very important in R. A data frame is represented as a special type of list where every element of the list has the same length. You can think of each column of the data frame as an element of the list, and to be a table every column has to have the same length. However, the columns don't have to be the same type: the first column could be numbers, the second a factor, the third integers, the fourth logicals. So unlike matrices, which have to store the same type of object in every element, data frames can store objects of different classes. Data frames also have a special attribute called row.names: every row of a data frame has a name, which can be useful for annotating the data. For example, each row might represent a subject enrolled in a study, and the row names would be the subject IDs. Often the row names are not interesting and you'll just use 1, 2, 3, etc. Data frames are most often created by calling read.table or read.csv, which we'll get into when I talk about reading data into R. You can also create a matrix from a data frame by calling the data.matrix function. If the data frame holds many different types of objects, coercing it into a matrix forces every object to be coerced to the same type, so you may get something that's not exactly what you expected. Besides read.table and read.csv, you can use the data.frame function. Here I've created a very simple data frame where the first column is the foo variable and the second column is the bar variable. foo is an integer sequence from one to four, and bar is a logical vector with two TRUEs and two FALSEs. When I autoprint the data frame, it prints the two columns, and since I didn't specify any row names they default to 1, 2, 3, 4, because there are four rows. When I call the nrow function on x, I see that there are four rows, and the ncol function shows me that there are two columns.
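The data frame described in this note can be rebuilt in a few lines (a sketch following the description; the final `data.matrix` call illustrates the coercion mentioned earlier in the note):

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
x                # autoprints two columns with default row names 1 to 4

nrow(x)          # 4
ncol(x)          # 2
row.names(x)     # "1" "2" "3" "4"

data.matrix(x)   # every column coerced to numbers: bar becomes 1 1 0 0
```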
  8. R objects can also have names. This is true not just for data frames but for all R objects, and it can be very useful for writing readable code and self-describing objects. For example, I'm creating a vector that's the integer sequence 1, 2, 3, and by default there are no names, so when I call the names function on x it gives me NULL. However, I can give a name to each element of the vector x: the first element is called foo, the second bar, and the third norf. Now when I print out my x vector I get the vector 1, 2, 3 with a name over each element, and when I call the names function I get the names associated with the elements: foo, bar, and norf. Lists can also have names. Here I'm creating a list with the list function where the first element is called a, the second b, and the third c. When I print out the list, it prints the name of each element and the value associated with it. Finally, matrices can have names, called dimnames. Here I created a two-by-two matrix from the sequence 1 to 4. With the dimnames function I assign it a list where the first element is the vector of row names and the second element is the vector of column names. I want to name the rows a and b and the columns c and d, so that's what I assign to dimnames. Now when I print out my matrix, the rows and columns are labeled as I wanted.
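The three cases in this note (vector, list, matrix) can be sketched as follows; the list values 1, 2, 3 are an assumption, since the note only gives the element names.

```r
x <- 1:3
names(x)                              # NULL: no names by default
names(x) <- c("foo", "bar", "norf")
x                                     # prints each value under its name

l <- list(a = 1, b = 2, c = 3)        # values are illustrative
names(l)                              # "a" "b" "c"

m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))   # row names, then column names
m
#   c d
# a 1 3
# b 2 4
```

Named elements can then be extracted by name, for example `x["bar"]` or `m["a", "d"]`.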