R_L1-Aug-2022.pptx

Presented By
Shantilal Bhayal
Assistant Professor
MEDICAPS UNIVERSITY
R Programming Environment

Freedom 0: The freedom to run the program however, whenever, and for whatever purpose you wish.
Freedom 1: The freedom to study how the program works and adapt it to your
needs. Access to the source code is a precondition for this.
Freedom 2: The freedom to redistribute copies so you can help
your neighbor.
Freedom 3: The freedom to improve the program and release
your improvements to the public, so that the whole community
benefits.

What is data science ?
Hacking ( Programming) +
Maths/Statistics + Domain Knowledge =
Data Science

So next: what is a data scientist?
A data scientist is simply a person who can
write code = in R, Python, Java, SQL, Hadoop (Pig, HQL, MR), etc.
= for data storage, querying, summarization, visualization
= efficiently and in time (fast results)
= wherever the data lives: on databases, on the cloud, on servers
and understand enough statistics to derive insights from data,
so that the business can make decisions.

Data Science with R: a popular language in data science
https://www.tiobe.com/tiobe-index/

These are some milestone dates in the development of R:
R version 4.2.1 (Funny-Looking Kid) was released on 2022-06-23.
► Early 1990s: The development of R began.
► August 1993: The software was announced on the S-news mailing list.
► www.r-project.org/mail.html
► June 1995: After some persuasive arguments by Martin Mächler, the code was made available as “free software” under
the FSF’s GNU GPL, Version 2.
► Mid-1997: The initial R Development Core Team (the core group) was formed.
► February 2000: The first version of R, version 1.0.0, was released.
► R: Past and Future History (r-project.org); https://cran.r-project.org/doc/html/interface98-paper/paper.html

What's great about R?
CRAN Packages By Date (r-project.org) https://cran.r-project.org/web/packages/
R can perform various data analysis and data science tasks for free
Interactive Visualization with Shiny package (Equivalent SAS Product : Visual Analytics)
Ensemble Learning / Machine Learning (SAS Product : SAS Enterprise Miner)
Text / Social Media Mining (SAS Product : SAS Text Miner)
Optimization and Forecasting (SAS Product : SAS ETS, PROC OPTMODEL)
RStudio IDE (SAS Product : SAS Enterprise Guide)
Integration: Tableau, SQL Server, VS, Power BI
The system saves data sets between sessions, so you don't need to reload them each time. It
saves your command history too.

What is R?
Free alternative to MATLAB, Excel, SAS and SPSS.
R is a:
1. Statistical Software
2. Language
3. Environment
4. Ecosystem
Used by Google, Facebook, Bank of America, etc.
Millions of users worldwide

Where is R used?
Big data demands of companies
analyse user behaviour.
online advertising and e-commerce
Weather services use it for weather forecasts.
It is a fundamental tool for analytics-driven
organizations

What is R?
► R is a dialect of S.
► S is a language that was developed by John Chambers and others at Bell Labs.
► S was initiated in 1976 as an internal statistical analysis environment,
originally implemented as a set of Fortran libraries.
► Early versions of the language did not contain functions for statistical
modelling.

Historical Notes
► In 1988 the system was rewritten in the C language to make it more
portable across systems, and it began to resemble the system that we have
today.
► In 1993 Bell Labs gave a corporation called StatSci, which later became Insightful
Corporation, an exclusive license to develop and sell the S language.
► In 2004 Insightful purchased the S language outright from Lucent for $2 million
and became its owner.
► In 2006 Alcatel purchased Lucent Technologies, and the company is now called Alcatel-Lucent.
► Insightful sold its implementation of the S language under the product name S-PLUS
and built a number of fancy features (mostly GUIs) on top of it, hence the “PLUS”.

► Version 4 of the S language was released in 1998, and it is the
version we more or less use today. The book Programming with Data, which is
a reference for this course, was written by John Chambers; it is sometimes called the
green book, and it documents version 4 of the S language.
► In 2008 Insightful Corporation was acquired by a company called TIBCO for $25
million.
► The fundamentals of the S language have not really changed since 1998.
► In 1998 S won the Association for Computing Machinery’s Software System Award.

 1991: It was created in New Zealand by two gentleman named Ross Ihaka
and Robert Gentleman.
 1993: First announcement to public.
 1995: Martin Michler convinced Ross and Robert to use, to license R under the
GNU General Public License to make R free software.
 1996: A Public mailing list is created(R-help and R-devel).
 1997: The core group is formed. The core group control the source code of R
 2000: R 1.0.0 Version is released.
R version 4.2.1 (Funny-Looking Kid) has been released on 2022-06-23.

What is R
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. Among other things it has:
An effective data handling and storage facility,
A suite of operators for calculations on arrays, in particular matrices,
A large, coherent, integrated collection of intermediate tools for data analysis,
Graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
A well developed, simple and effective programming language (called ‘S’) which includes
conditionals, loops, user defined recursive functions and input and output facilities.

► R is a system for statistical computation and graphics.
► It consists of a language plus a run-time environment with graphics, a
debugger, access to certain system functions, and the ability to run programs
stored in script files.
► It is free software distributed under a GNU-style copyleft, and an official part
of the GNU project (“GNU S”).

Install R: https://cran.r-project.org/bin/windows/base/

Install RStudio
https://www.rstudio.com/products/rstudio/

Alternatives to the standard R editors
Eclipse StatET (www.walware.de/goto/statet): a Java-based IDE built on Eclipse
Emacs Speaks Statistics (http://ess.r-project.org): an add-on for Emacs, a powerful text and code
editor
Tinn-R (www.sciviews.org/Tinn-R): an editor developed specifically for working with
R, available only for Windows

Design of the R System
The R system is divided into two conceptual parts:
1. The base R system that you download from CRAN
2. Everything else
R functionality is divided into a number of packages.
► The “base” system contains, among other things, the base package, which is
required to run R and contains the most fundamental functions.

► Other packages contained in the base system include, for
example, utils, stats, datasets, graphics, grDevices, grid, methods, tools,
parallel, compiler, and stats4.
► There are also recommended packages, a kind of fundamental
set that more or less everyone might use: boot for bootstrapping, class for classification, cluster,
codetools, foreign, and a variety of other packages.
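
The split between base and recommended packages can be inspected from R itself. The following is a minimal sketch, assuming a standard R installation; the variable names are illustrative only, and the recommended list may be empty on a minimal install.

```r
# List packages by priority using the standard installed.packages() function.
base_pkgs <- rownames(installed.packages(priority = "base"))
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))

print(base_pkgs)         # packages that ship as part of the base system
print(recommended_pkgs)  # boot, class, cluster, ... if they are installed

# Packages currently attached to the search path
print(search())
```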

How R works
1. R is an interpreted language, not a compiled one.
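2. Syntax: lm(y ~ x) means “fit a linear model with y as the response and x
as the predictor”.
3. Typing ls() calls the function and lists the objects in the workspace; typing ls without parentheses prints the content of the function itself.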
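
To make the points above concrete, here is a small sketch that fits a linear model interactively, with no compile step. The data are simulated purely for illustration (the true slope of 2 and the noise level are assumptions, not values from the slides).

```r
# Simulated data: y depends linearly on x, plus a little noise (illustrative only).
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 0.5)

# Fit a linear model with y as the response and x as the predictor.
fit <- lm(y ~ x)
print(coef(fit))   # intercept and slope estimates

# ls() *calls* the function and returns the names of the objects in the workspace.
workspace <- ls()
print(workspace)

# ls without parentheses is the function object itself.
print(is.function(ls))
```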

Features
► It comes as free, open-source code that is stable and reliable. License: www.r-project.org/COPYING
► It runs anywhere: Mac, Windows, and Unix systems.
► It supports extensions for data manipulation, statistical modeling, and graphics.
Extensibility: you can write your own software and distribute it in the form of add-on packages.
► It provides an engaged community:
► www.r-project.org/mail.html
www.stackoverflow.com/questions/tagged/r
http://stats.stackexchange.com/questions/tagged/r
www.twitter.com/search/rstats (R regional conferences)

It connects with other languages: the R package foreign
(http://cran.r-project.org/web/packages/foreign/index.html) reads data from SPSS, SAS, and Stata;
RODBC and ROracle connect to databases.
Unique features:
Performing multiple calculations with vectors: R is a vector-based language.
Ex: > x <- 1:5
> x
[1] 1 2 3 4 5
> x + 2
> x + 6:10 (two vectors)
Processing more than just statistics: data processing, graphic visualization,
and analysis of all sorts.
Running code without a compiler: the development cycle is easy; the downside of
an interpreted language is that it can be slow.
Object-oriented and functional programming.
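
The vector-based behaviour mentioned above can be tried directly. This sketch repeats the slide's example; the result variable names are illustrative.

```r
# A vector of the integers 1 to 5
x <- 1:5

# Adding a single number applies to every element
x_plus_2 <- x + 2
print(x_plus_2)   # 3 4 5 6 7

# Adding two vectors of the same length works element by element
y <- 6:10
x_plus_y <- x + y
print(x_plus_y)   # 7 9 11 13 15
```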

Distributed Computing
► In distributed computing, tasks are split between multiple processing nodes to reduce
processing time and increase efficiency. Packages such as ddR and multidplyr target large data sets.
Compatibility with other data processing technologies
R can be paired with other data processing and distributed computing
technologies such as Hadoop and Spark.
It is possible to use a Spark cluster remotely to process large datasets from R.
► Generates reports in many formats: R’s rmarkdown package

Limitations of R
Steep learning curve: R is not an easy language to get started with. Beginners find it
hard to get their feet wet because of the command-line interface (RStudio helps here).
Hungry for physical memory: R stores all its data in physical memory, which makes it hard to
handle large data sets. Hadoop integration for R is one workaround.
Slower execution: R code may need a lot of optimization before it runs as
fast as equivalent MATLAB or Python code.

Drawback of R
► Essentially based on 40-year-old technology.
► Little built-in support for dynamic and 3-D graphics.
► Functionality is based on consumer demand and user contributions.
► Objects must be stored in the physical memory of the computer, but there have been
advancements to deal with this too.
► Not ideal for all possible situations.

Some important commands
1. help(command), ?command: show the help page for a command
2. help.start(): opens the help system in the system default browser
3. apropos("partword"): shows all the commands whose names contain “partword”
4. install.packages("pkg"): installs a library of commands from the CRAN website
5. installed.packages(): lists the installed packages
6. library(pkg): loads a package of commands and makes them available for use (the
pkg must be installed)
7. search(): shows a list of all packages (and other objects) that are loaded and
available for use
8. detach(package:name): unloads a package; replace name with the package name
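
A short session using the commands listed above might look like the sketch below. It uses the splines package only because it ships with R and is not attached by default; the installation step is left commented out since it needs network access, and the variable names are illustrative.

```r
# Find every command whose name contains "mean"
mean_like <- apropos("mean")
print(head(mean_like))

# install.packages("pkg")   # would download and install "pkg" from CRAN

# Load a package that ships with R, then confirm it is on the search path
library(splines)
loaded_after_library <- "package:splines" %in% search()
print(loaded_after_library)

# Detach it again and confirm it is gone
detach("package:splines")
loaded_after_detach <- "package:splines" %in% search()
print(loaded_after_detach)

# Number of installed packages
n_installed <- nrow(installed.packages())
print(n_installed)
```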

pacman: manage loading and unloading of packages
install.packages("pacman")
require(pacman) # loads pacman; gives a confirmation message
library(pacman) # loads pacman; no message
p_unload(dplyr, ggplot2, tidyr) # clear specific packages
p_unload(all) # clear all add-on packages
detach("package:datasets", unload = TRUE) # for base packages

# clear console
cat("\014") # same as Ctrl+L
Cancelling commands: Ctrl+C

Reserved Words in R
The reserved words in R's parser are
if else repeat while function for in next break
TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_
NA_complex_ NA_character_
... and ..1, ..2 etc,

Operator Syntax and Precedence
The following unary and binary operators are defined. They are
listed in precedence groups, from highest to lowest.
:: ::: access variables in a namespace
$ @ component / slot extraction
[ [[ indexing
^ exponentiation (right to left)
- + unary minus and plus
: sequence operator
%any% |> special operators (including %% and %/%)
* / multiply, divide

Operator and Precedence
+ - (binary) add, subtract
< > <= >= == != ordering and comparison
! negation
& && and
| || or
~ as in formulae
-> ->> rightwards assignment
<- <<- assignment (right to left)
= assignment (right to left)
? help (unary and binary)
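
A few one-liners show how these precedence groups play out in practice. This is a small illustrative sketch; the variable names are arbitrary.

```r
a <- -2^2        # ^ binds tighter than unary minus: -(2^2)
b <- 2^3^2       # ^ groups right to left: 2^(3^2)
s1 <- -1:3       # unary minus binds tighter than ':' : (-1):3
s2 <- 1:3 * 2    # ':' binds tighter than '*': (1:3) * 2

n <- 4
s3 <- 1:n - 1    # (1:n) - 1, a common surprise
s4 <- 1:(n - 1)  # parentheses give the sequence 1 to n-1

l <- !1 == 2     # comparison binds tighter than '!': !(1 == 2)

print(a); print(b); print(s1); print(s2); print(s3); print(s4); print(l)
```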

Relational Operator
Comparators are operators that check how two objects compare, for example whether they are equal or not.
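
As a brief illustration, relational operators work element by element on vectors and return logical values. The vectors below are made up for the example.

```r
a <- c(1, 5, 3)
b <- c(1, 2, 3)

eq <- a == b   # element-wise equality
ne <- a != b   # element-wise inequality
gt <- a > b    # element-wise "greater than"
print(eq); print(ne); print(gt)

# identical() compares the two objects as a whole and returns a single value
same_object <- identical(a, b)
print(same_object)
```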

# R program to illustrate
# the use of arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
# Performing operations on operands; "\n" ends each output line
cat("Addition of vectors :", vec1 + vec2, "\n")
cat("Subtraction of vectors :", vec1 - vec2, "\n")
cat("Multiplication of vectors :", vec1 * vec2, "\n")
cat("Division of vectors :", vec1 / vec2, "\n")
cat("Modulo of vectors :", vec1 %% vec2, "\n")
cat("Power operator :", vec1 ^ vec2, "\n")

%in%
The %in% operator in R can be used to check whether an element (e.g., a
number) belongs to a vector or to a column of a data frame. For example, it can be used
to see whether the number 1 is in the sequence of numbers 1 to 10.

%*% Operator:
► This operator performs matrix multiplication, for example multiplying a matrix by its transpose.
► The number of columns of the first matrix must be equal to the
number of rows of the second matrix.
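
A minimal sketch of %*% with a small made-up matrix follows; it also shows what happens when the dimensions do not conform.

```r
# A 2 x 3 matrix, filled column by column
A <- matrix(1:6, nrow = 2)

# 2 x 3 times 3 x 2 gives a 2 x 2 matrix
AAt <- A %*% t(A)
print(AAt)

# 3 x 2 times 2 x 3 gives a 3 x 3 matrix
AtA <- t(A) %*% A
print(dim(AtA))

# 2 x 3 times 2 x 3 is not allowed: columns of the first != rows of the second
bad <- try(A %*% A, silent = TRUE)
print(inherits(bad, "try-error"))
```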

What is the Difference Between the == and
%in% Operators in R
► The %in% operator is used for matching values. It is built on match(), which “returns a vector of the positions
of (first) matches of its first argument in its second”; %in% itself returns TRUE or FALSE for each element of its first argument.
► On the other hand, the == operator is a comparison operator that checks, element by element, whether
two values are exactly equal. Using the %in% operator you can compare
vectors of different lengths to see if elements of one vector match at least one
element in another.

1: Using %in% to Compare two Sequences of
Numbers (vectors)
# sequence of numbers 1:
a <- seq(1, 5)
# sequence of numbers 2:
b <- seq(3, 12)
# using the %in% operator to check matching values in the vectors
a %in% b
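
Continuing with the same two sequences, the sketch below shows what %in% returns and how it differs from match() and ==. The extra variable names are illustrative.

```r
a <- seq(1, 5)
b <- seq(3, 12)

# One TRUE/FALSE per element of a: is it found anywhere in b?
in_b <- a %in% b
print(in_b)

# Use the logical result to keep only the shared values
common <- a[in_b]
print(common)

# match() gives positions in b instead (NA when there is no match)
pos <- match(a, b)
print(pos)

# == compares element by element against a value
eq3 <- a == 3
print(eq3)

# The single-value case from the text
one_in <- 1 %in% 1:10
print(one_in)
```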

R Resource
1. FAQ: https://CRAN.R-project.org/doc/FAQ/R-FAQ.html
2. Mailing lists: https://www.R-project.org/mail.html
3. Archives: https://CRAN.R-project.org/mirrors.html
4. Bug-tracking system: https://bugs.R-project.org/

Arithmetic Operators
These unary and binary operators perform
arithmetic on numeric or complex vectors (or
objects which can be coerced to them).
+x   -x
x + y   x - y
x * y
x / y
x ^ y
x %% y
x %/% y
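
The last two operators are the least familiar: %/% is integer division and %% is the remainder. A small sketch with made-up values, including a negative number:

```r
x <- c(7, -7)
y <- 2

q <- x %/% y   # integer division rounds down
r <- x %% y    # remainder takes the sign of the divisor
print(q)
print(r)

# The two are linked by x == q * y + r
consistent <- all(x == q * y + r)
print(consistent)
```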

Language objects
There are three types of language objects: calls, expressions, and names.
These objects have modes "call", "expression", and "name", respectively.
They can be created directly from expressions using the quote
mechanism and converted to and from lists by the as.list and as.call
functions.
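
The quote mechanism and the list conversions can be sketched as follows. The call being quoted is an arbitrary example.

```r
# quote() captures a call without evaluating it
cl <- quote(sum(2, 3, 4))
print(mode(cl))               # "call"
print(eval(cl))               # evaluating the call gives 9

# Names and expressions are language objects too
print(mode(quote(x)))            # "name"
print(mode(expression(a + b)))   # "expression"

# Convert the call to a list, swap the function name, and convert back
parts <- as.list(cl)
parts[[1]] <- as.name("prod")
cl2 <- as.call(parts)
print(cl2)                    # prod(2, 3, 4)
print(eval(cl2))              # 24
```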

Entering Input
• At the R prompt we type expressions.
> x <- 1
> print(x)
[1] 1
> msg <- "Welcome"
> S <- rep(obj, times = 10)
> seq(length = 100, from = 4, by = 1)
The grammar of the language determines whether an expression is
complete or not.
> x <- # incomplete expression

R commands
R commands, case sensitivity, etc. (country locale)
Executing commands from or diverting output to a file
source("commands.R") # execute the commands saved in the file commands.R
sink("record.lis") # divert all subsequent output from the console to an external file, record.lis
sink() # restores output to the console once again
.RData # all objects in the workspace
.Rhistory # command lines used in the session
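
The source() and sink() round trip can be tried safely with temporary files. This is a sketch only: the temporary file names stand in for commands.R and record.lis, and the one-line script is invented for the example.

```r
# Write a tiny script to a temporary file (stand-in for commands.R)
script <- tempfile(fileext = ".R")
writeLines("total <- sum(1:10)", script)

# Execute the commands stored in the file
source(script)

# Divert console output to a temporary file (stand-in for record.lis)
log_file <- tempfile(fileext = ".lis")
sink(log_file)
print(total)
sink()   # restore output to the console

# Read back what was recorded
recorded <- readLines(log_file)
print(recorded)
```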

Data permanency and removing objects
► The entities that R creates and manipulates are known as objects.
► These may be variables, arrays of numbers, character strings, functions, or
structures built from such components.
► During an R session, objects are created and stored by name.
► > objects() lists the objects currently stored.
► The collection of objects currently stored is called the workspace.
To remove objects:
> rm(x, y, ...)
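
A minimal sketch of creating, listing, and removing objects; the object names x and y are arbitrary.

```r
# Create two objects
x <- 1
y <- "two"

# List the objects currently stored
before <- objects()
print(before)

# Remove them and list again
rm(x, y)
after <- objects()
print(after)
```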

Objects, their modes and attributes
The entities R operates on are technically known as objects.
Atomic: vectors whose components all have the same type, or mode
Recursive: lists, functions, and expressions
Mode: the basic type of an object's fundamental constituents. This is a special case of a
“property” of an object.
mode(object) and length(object)
compl <- c(2+3i, 4+5i) # length is 2, mode is "complex"

The properties of an object are usually provided by attributes(object).
Coercion: as.character(object), as.complex(object)
Changing the length of an object
An empty object:
emp_obj <- character()
emp_obj[6] <- 57 # emp_obj now has length 6
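
The ideas on this slide (mode, length, attributes, coercion, and changing an object's length) can be checked with a few lines. The values are illustrative.

```r
# Mode and length of a complex vector
compl <- c(2+3i, 4+5i)
print(mode(compl))     # "complex"
print(length(compl))   # 2

# Attributes of a matrix: its dimensions
m <- matrix(1:4, nrow = 2)
print(attributes(m))

# Coercion between modes
digits <- as.character(0:9)
back <- as.integer(digits)
print(digits); print(back)

# An empty character vector grows when an element beyond its end is assigned;
# 57 is coerced to "57" and elements 1 to 5 are filled with NA
emp_obj <- character()
emp_obj[6] <- 57
grown_length <- length(emp_obj)
print(grown_length)

# The length can also be set directly, which truncates the vector
length(emp_obj) <- 2
print(length(emp_obj))
```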

The class of an object
All objects in R have a class, reported by the function class.
A special attribute known as the class of the object is used to allow
for an object-oriented style of programming in R.
To remove temporarily the effects of class, use the function
unclass(). For example, if winter has the class "data.frame" then
> winter
will print it in data frame form, which is rather like a matrix, whereas
> unclass(winter)
will print it as an ordinary list.
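
The effect of class and unclass() can be sketched with a small data frame. The contents of winter below are invented for the example; only the name comes from the slide.

```r
# A hypothetical data frame named winter (values are made up)
winter <- data.frame(month = c("Dec", "Jan"), temp = c(2.5, -1.0))
print(class(winter))   # "data.frame"
print(winter)          # printed in data frame form

# Temporarily strip the class: the same data now prints as a plain list
plain <- unclass(winter)
print(class(plain))    # "list"
print(plain)
```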

Learn R Programming (Tutorial & Examples) | Free Introduction Course (statisticsglobe.com)
R Guides – Statology
https://www.youtube.com/watch?v=fpl_ny-jX5Y&list=RDCMUC87aeHqMrlR6ED0w2SVi5nw&start_radio=1&rv=fpl_ny-jX5Y&t=0
https://www.youtube.com/watch?v=UYclmg1_KLk&list=PLqzoL9-eJTNDw71zWePXyHx3_cm_fMP8S&index=3


www.fsf.org
