SlideShare a Scribd company logo
1 of 23
A Seminar
on
R-Programming
Presented By:
Akshat Sharma
CSE-1, 4th Year
1200112029
mailidakshat@gmail.com
Introduction to R
• The R language is a project designed to create a free, open source language
which can be used as a replacement for the S language, R was created by Ross
Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and
is currently developed by the R Development Core Team, of which Chambers
is a member. R is named partly after the first names of the first two R authors
and partly as a play on the name of S.
• R is an open source implementation of S, and differs from S-plus largely in
its command-line only format.
• R first appeared in 1993; 22 years ago.
• S is a statistical programming language developed primarily by John
Chambers and Rick Becker and Allan Wilks of Bell Laboratories.
• R has a broad set of facilities for doing statistical analyses. In addition, R is
also a very powerful statistical programming language. The open-source
nature of R means that as new statistical techniques are developed, new
packages for R usually become freely available very soon after.
• R is designed to be very portable: it will run on Microsoft Windows, Linux,
Solaris, Mac OSX, and other operating systems, but different binary versions
are required for each.
• The R system is mainly command-driven, with the user typing in test and
asking R to execute it.
• R is polymorphic, which means that the same function can be applied to
different types of objects, with results tailored to the different object types.
Such a function is called generic function.
Installing R
• R is freely downloadable from http://cran.r-project.org/ , as is a wide range
of documentation.
• Most users should download and install a binary version. This is a version
that has been translated (by compilers) into machine language for execution
on a particular type of computer with a particular operating system.
Installation on Microsoft Windows is straightforward.
• A binary version is available for windows 98 or above from the web page
http://cran.r-project.org/bin/windows/base .
Why Learn R
The R Console
• The R console is the most important tool for using R. The R console is a tool
that allows you to type commands into R and see how the R system responds.
• The commands that you type into the console are called expressions. A part
of the R system called the interpreter will read the expressions and respond
with a result or an error message.
• Sometimes, you can also enter an expression into R through the menus.
• If you’ve used a command line before or a language with an interactive
interpreter such as LISP, this should look familiar. If not: don’t worry.
Command-line interfaces aren’t as scary as they look.
• R provides a few tools to save you extra typing, to help you find the tools
you’re looking for, and to spot common mistakes.
• By default, R will display a greater-than sign (“>”) in the console (at the
beginning of a line, when nothing else is shown) when R is waiting for you to
enter a command into the console. R is prompting you to type something, so
this is called a prompt.
Basic Arithmetic and Objects
• R has a command line interface, and will accept simple commands to it.
• This is marked by a > symbol, called the prompt. If you type a command and
press return, R will evaluate it and print the result for you.
> 6 + 9
[1] 15
> x <- 15
> x - 1
[1] 14
• The expression x <- 15 creates a variable called x and gives it the value 15. This is called
assignment; the variable on the left is assigned to the value on the right. The left hand
side must contain only contain a single variable.
Assignment can also be done with = (or ->).
> x + 4 <- 15 # doesn't work
> x = 5
> 5*x -> x
> x
[1] 25
The operators = and <- are identical, but many people prefer <- because it
is not used in any other context, but = is, so there is less room for confusion.
Program Example
Short R code calculating Mandelbrot set through the first 20 iterations
of equation z = z2 + c plotted for different complex constants c.
This example demonstrates:
• use of community-developed external libraries (called packages), in this case
caTools package
• handling of complex numbers
• multidimensional arrays of numbers used as basic data type, see
variables C, Z and X.
install.packages("caTools") # install external package
library(caTools) # external package providing write.gif function
jet.colors <- colorRampPalette(c("green”, "blue", "red", "cyan", "#7FFF7F",
"yellow", "#FF7F00", "red", "#7F0000"))
m <- 1000 # define size
C <- complex( real=rep(seq(-1.8,0.6, length.out=m), each=m ),
imag=rep(seq(-1.2,1.2, length.out=m), m ) )
C <- matrix(C,m,m) # reshape as square matrix of complex numbers
Z <- 0 # initialize Z to zero
X <- array(0, c(m,m,20)) # initialize output 3D array
for (k in 1:20) { # loop with 20 iterations
Z <- Z^2+C # the central difference equation
X[,,k] <- exp(-abs(Z)) # capture results
}
write.gif(X, "Mandelbrot.gif", col=jet.colors, delay=900)
Programming with Big Data in R
Programming with Big Data in R (pbdR) is a series of R packages and an
environment for statistical computing with Big Data by using high-performance
statistical computation. The pbdR uses the same programming language as R
with S3/S4 classes and methods which is used among statisticians and data
miners for developing statistical software.
The significant difference between pbdR and R code is that pbdR mainly
focuses on distributed memory systems, where data are distributed across
several processors and analyzed in a batch mode, while communications
between processors are based on MPI that is easily used in large high-
performance computing (HPC) systems. R system mainly focuses on
single multi-core machines for data analysis via an interactive mode such as GUI
interface.
Big Data Strategies in R
• If Big Data has to be tackled with R, five different strategies can be considered:
1. Sampling: If data is too big to be analyzed in complete, its’ size can be reduced by sampling. Naturally, the question arises
whether sampling decreases the performance of a model significantly. Much data is of course always better than little data.
But according to Hadley Wickham’s useR! talk, sample based model building is acceptable, at least if the size of data crosses
the one billion record threshold.
2. Bigger Hardware: R keeps all objects in memory. This can become a problem if the data gets large. One of the easiest
ways to deal with Big Data in R is simply to increase the machine’s memory. Today, R can address 8 TB of RAM if it runs on 64-
bit machines. That is in many situations a sufficient improvement compared to about 2 GB addressable RAM on 32-bit
machines.
3. Store objects on hard disk and analyze it chunkwise: As an alternative, there are packages available
that avoid storing data in memory. Instead, objects are stored on hard disk and analyzed chunkwise. As a side effect, the
chunking also leads naturally to parallelization, if the algorithms allow parallel analysis of the chunks in principle. A downside
of this strategy is that only those algorithms (and R functions in general) can be performed that are explicitly designed to deal
with hard disk specific datatypes.
4. Integration of higher performing programming languages like C++ or Java: Small parts
of the program are moved from R to another language to avoid bottlenecks and performance expensive procedures.
The aim is to balance R’s more elegant way to deal with data on the one hand and the higher performance of other
languages on the other hand.
5. Alternative interpreters: A relatively new direction to deal with Big Data in R is to use alternative
interpreters. The first one that became popular to a bigger audience was pqR (pretty quick R). Duncon Murdoc from the
R-Core team preannounced that pqR’s suggestions for improvements shall be integrated into the core of R in one of the
next versions.
Applications of R Programming
• R applications span the universe from theoretical computational
statistics and the hard sciences such as astronomy, chemistry and
genomics to practical applications in business, drug development,
finance, health care, marketing, medicine and much more.
• Because R has nearly 5,000 packages (libraries of functions) many
of which are dedicated to specific applications you don’t have to be
an R genius to begin developing your own applications. All it takes to
get started is some modest experience with R, some examples to
emulate and the right libraries of R functions.
• Following are R’s application area
for which there are packages
containing the tools and functions
you need:
1. Clinical Trials
2. Cluster Analysis
3. Computational Physics
4. Differential Equations
5. Econometrics
6. Environmental Studies
7. Experimental Design
8. Finance
9. Genetics
10. Graphical Models
11. Graphics and Visualizations
12. High Performance Computing
13. High Throughput Genomics
14. Machine Learning
15. Medical Imaging
16. Meta Analysis
17. Multivariate Statistics
18. Natural Language Processing
19. Official Statistics
20. Optimization
Companies Using R
Social Search Awareness: Jesse Bridgewater works on
"social search awesomeness" for the Bing search engine, and is setting up his
dev environment with the necessary tools including python, vim, and R.
Facebook uses R for:
• Analysis of Facebook Status Updates
• Facebook's Social Network Graph
• Predicting Colleague Interactions with R
R is widely used at the FDA on a daily basis. FDA scientists
have written R packages for use by other scientists (within and outside the
FDA) involved in the submissions process.
Google uses R to: calculate ROI on advertising
campaigns; to predict economic activity; to analyze effectiveness of TV
ads and make online advertising more effective.
Other notable companies include-
What R is not so good at
• Statistics for non-statisticians
• Numerical methods, such as solving partial differential equations. (Try
Matlab)
• Analytical methods, such as algebraically integrating a function. (Try
Mathematica)
• Precision graphics, such as might be useful in psychology experiments. (Try
Matlab)
• Optimization
• Low-level, high-speed or critical code
Conclusion
A couple of years ago, R had the reputation of not being able to
handle Big Data at all – and it probably still has for users sticking on
other statistical software. But today, there are a number of quite
different Big Data approaches available. Which one fits best depends
on the specifics of the given problem. There is not one solution for all
problems. But there is some solution for any problem.
R is the best at what I does- letting experts quickly and easily
interpret, interact with, and visualize data.
R continues to help shape the future of statistical analysis, and
data science.
R programming presentation

More Related Content

What's hot

R programming slides
R  programming slidesR  programming slides
R programming slidesPankaj Saini
 
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programmingizahn
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R StudioRupak Roy
 
Introduction to Rstudio
Introduction to RstudioIntroduction to Rstudio
Introduction to RstudioOlga Scrivner
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factorskrishna singh
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingVictor Ordu
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming FundamentalsRagia Ibrahim
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using RVictoria López
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with RKazuki Yoshida
 
Data Types and Structures in R
Data Types and Structures in RData Types and Structures in R
Data Types and Structures in RRupak Roy
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data framekrishna singh
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiUnmesh Baile
 
How to get started with R programming
How to get started with R programmingHow to get started with R programming
How to get started with R programmingRamon Salazar
 
R Programming: Introduction To R Packages
R Programming: Introduction To R PackagesR Programming: Introduction To R Packages
R Programming: Introduction To R PackagesRsquared Academy
 

What's hot (20)

R programming slides
R  programming slidesR  programming slides
R programming slides
 
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
 
Introduction to Rstudio
Introduction to RstudioIntroduction to Rstudio
Introduction to Rstudio
 
Step By Step Guide to Learn R
Step By Step Guide to Learn RStep By Step Guide to Learn R
Step By Step Guide to Learn R
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using R
 
Getting Started with R
Getting Started with RGetting Started with R
Getting Started with R
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with R
 
Data Types and Structures in R
Data Types and Structures in RData Types and Structures in R
Data Types and Structures in R
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
 
How to get started with R programming
How to get started with R programmingHow to get started with R programming
How to get started with R programming
 
R Programming: Introduction To R Packages
R Programming: Introduction To R PackagesR Programming: Introduction To R Packages
R Programming: Introduction To R Packages
 
Programming in R
Programming in RProgramming in R
Programming in R
 

Similar to R programming presentation

Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programminghemasri56
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptxranapoonam1
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document usefulssuser3c3f88
 
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxrajalakshmi5921
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studioDerek Kane
 
R programming language
R programming languageR programming language
R programming languageKeerti Verma
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
Unit1_Introduction to R.pdf
Unit1_Introduction to R.pdfUnit1_Introduction to R.pdf
Unit1_Introduction to R.pdfMDDidarulAlam15
 
2 it unit-1 start learning r
2 it   unit-1 start learning r2 it   unit-1 start learning r
2 it unit-1 start learning rNetaji Gandi
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationAlvaro Gil
 
FULL R PROGRAMMING METERIAL_2.pdf
FULL R PROGRAMMING METERIAL_2.pdfFULL R PROGRAMMING METERIAL_2.pdf
FULL R PROGRAMMING METERIAL_2.pdfattalurilalitha
 
Big Data - Analytics with R
Big Data - Analytics with RBig Data - Analytics with R
Big Data - Analytics with RTechsparks
 
R programming Language , Rahul Singh
R programming Language , Rahul SinghR programming Language , Rahul Singh
R programming Language , Rahul SinghRavi Basil
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTHaritikaChhatwal1
 
Financial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali TirmiziFinancial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali TirmiziDr. Muhammad Ali Tirmizi., Ph.D.
 

Similar to R programming presentation (20)

Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
 
1_Introduction.pptx
1_Introduction.pptx1_Introduction.pptx
1_Introduction.pptx
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document useful
 
R basics for MBA Students[1].pptx
R basics for MBA Students[1].pptxR basics for MBA Students[1].pptx
R basics for MBA Students[1].pptx
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
R programming language
R programming languageR programming language
R programming language
 
R_L1-Aug-2022.pptx
R_L1-Aug-2022.pptxR_L1-Aug-2022.pptx
R_L1-Aug-2022.pptx
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
 
Unit1_Introduction to R.pdf
Unit1_Introduction to R.pdfUnit1_Introduction to R.pdf
Unit1_Introduction to R.pdf
 
2 it unit-1 start learning r
2 it   unit-1 start learning r2 it   unit-1 start learning r
2 it unit-1 start learning r
 
UNIT-1 Start Learning R.pdf
UNIT-1 Start Learning R.pdfUNIT-1 Start Learning R.pdf
UNIT-1 Start Learning R.pdf
 
R as supporting tool for analytics and simulation
R as supporting tool for analytics and simulationR as supporting tool for analytics and simulation
R as supporting tool for analytics and simulation
 
FULL R PROGRAMMING METERIAL_2.pdf
FULL R PROGRAMMING METERIAL_2.pdfFULL R PROGRAMMING METERIAL_2.pdf
FULL R PROGRAMMING METERIAL_2.pdf
 
Big Data - Analytics with R
Big Data - Analytics with RBig Data - Analytics with R
Big Data - Analytics with R
 
R programming Language , Rahul Singh
R programming Language , Rahul SinghR programming Language , Rahul Singh
R programming Language , Rahul Singh
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
 
R language
R languageR language
R language
 
Financial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali TirmiziFinancial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 4 by Dr. Syed Muhammad Ali Tirmizi
 
R programming
R programmingR programming
R programming
 

Recently uploaded

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 

Recently uploaded (20)

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 

R programming presentation

  • 1.
  • 2. A Seminar on R-Programming Presented By: Akshat Sharma CSE-1, 4th Year 1200112029 mailidakshat@gmail.com
  • 3. Introduction to R • The R language is a project designed to create a free, open source language which can be used as a replacement for the S language, R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S. • R is an open source implementation of S, and differs from S-plus largely in its command-line only format. • R first appeared in 1993; 22 years ago. • S is a statistical programming language developed primarily by John Chambers and Rick Becker and Allan Wilks of Bell Laboratories.
  • 4. • R has a broad set of facilities for doing statistical analyses. In addition, R is also a very powerful statistical programming language. The open-source nature of R means that as new statistical techniques are developed, new packages for R usually become freely available very soon after. • R is designed to be very portable: it will run on Microsoft Windows, Linux, Solaris, Mac OSX, and other operating systems, but different binary versions are required for each. • The R system is mainly command-driven, with the user typing in test and asking R to execute it. • R is polymorphic, which means that the same function can be applied to different types of objects, with results tailored to the different object types. Such a function is called generic function.
  • 5. Installing R • R is freely downloadable from http://cran.r-project.org/ , as is a wide range of documentation. • Most users should download and install a binary version. This is a version that has been translated (by compilers) into machine language for execution on a particular type of computer with a particular operating system. Installation on Microsoft Windows is straightforward. • A binary version is available for windows 98 or above from the web page http://cran.r-project.org/bin/windows/base .
  • 7. The R Console • The R console is the most important tool for using R. The R console is a tool that allows you to type commands into R and see how the R system responds. • The commands that you type into the console are called expressions. A part of the R system called the interpreter will read the expressions and respond with a result or an error message. • Sometimes, you can also enter an expression into R through the menus. • If you’ve used a command line before or a language with an interactive interpreter such as LISP, this should look familiar. If not: don’t worry. Command-line interfaces aren’t as scary as they look.
  • 8. • R provides a few tools to save you extra typing, to help you find the tools you’re looking for, and to spot common mistakes. • By default, R will display a greater-than sign (“>”) in the console (at the beginning of a line, when nothing else is shown) when R is waiting for you to enter a command into the console. R is prompting you to type something, so this is called a prompt.
  • 9.
  • 10. Basic Arithmetic and Objects • R has a command line interface, and will accept simple commands to it. • This is marked by a > symbol, called the prompt. If you type a command and press return, R will evaluate it and print the result for you. > 6 + 9 [1] 15 > x <- 15 > x - 1 [1] 14 • The expression x <- 15 creates a variable called x and gives it the value 15. This is called assignment; the variable on the left is assigned to the value on the right. The left hand side must contain only contain a single variable.
  • 11. Assignment can also be done with = (or ->). > x + 4 <- 15 # doesn't work > x = 5 > 5*x -> x > x [1] 25 The operators = and <- are identical, but many people prefer <- because it is not used in any other context, but = is, so there is less room for confusion.
  • 12. Program Example Short R code calculating Mandelbrot set through the first 20 iterations of equation z = z2 + c plotted for different complex constants c. This example demonstrates: • use of community-developed external libraries (called packages), in this case caTools package • handling of complex numbers • multidimensional arrays of numbers used as basic data type, see variables C, Z and X.
  • 13. install.packages("caTools") # install external package library(caTools) # external package providing write.gif function jet.colors <- colorRampPalette(c("green”, "blue", "red", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000")) m <- 1000 # define size C <- complex( real=rep(seq(-1.8,0.6, length.out=m), each=m ), imag=rep(seq(-1.2,1.2, length.out=m), m ) ) C <- matrix(C,m,m) # reshape as square matrix of complex numbers Z <- 0 # initialize Z to zero X <- array(0, c(m,m,20)) # initialize output 3D array for (k in 1:20) { # loop with 20 iterations Z <- Z^2+C # the central difference equation X[,,k] <- exp(-abs(Z)) # capture results } write.gif(X, "Mandelbrot.gif", col=jet.colors, delay=900)
  • 14.
  • 15. Programming with Big Data in R Programming with Big Data in R (pbdR) is a series of R packages and an environment for statistical computing with Big Data by using high-performance statistical computation. The pbdR uses the same programming language as R with S3/S4 classes and methods which is used among statisticians and data miners for developing statistical software. The significant difference between pbdR and R code is that pbdR mainly focuses on distributed memory systems, where data are distributed across several processors and analyzed in a batch mode, while communications between processors are based on MPI that is easily used in large high- performance computing (HPC) systems. R system mainly focuses on single multi-core machines for data analysis via an interactive mode such as GUI interface.
  • 16. Big Data Strategies in R • If Big Data has to be tackled with R, five different strategies can be considered: 1. Sampling: If data is too big to be analyzed in complete, its’ size can be reduced by sampling. Naturally, the question arises whether sampling decreases the performance of a model significantly. Much data is of course always better than little data. But according to Hadley Wickham’s useR! talk, sample based model building is acceptable, at least if the size of data crosses the one billion record threshold. 2. Bigger Hardware: R keeps all objects in memory. This can become a problem if the data gets large. One of the easiest ways to deal with Big Data in R is simply to increase the machine’s memory. Today, R can address 8 TB of RAM if it runs on 64- bit machines. That is in many situations a sufficient improvement compared to about 2 GB addressable RAM on 32-bit machines. 3. Store objects on hard disk and analyze it chunkwise: As an alternative, there are packages available that avoid storing data in memory. Instead, objects are stored on hard disk and analyzed chunkwise. As a side effect, the chunking also leads naturally to parallelization, if the algorithms allow parallel analysis of the chunks in principle. A downside of this strategy is that only those algorithms (and R functions in general) can be performed that are explicitly designed to deal with hard disk specific datatypes. 4. Integration of higher performing programming languages like C++ or Java: Small parts of the program are moved from R to another language to avoid bottlenecks and performance expensive procedures. The aim is to balance R’s more elegant way to deal with data on the one hand and the higher performance of other languages on the other hand. 5. Alternative interpreters: A relatively new direction to deal with Big Data in R is to use alternative interpreters. The first one that became popular to a bigger audience was pqR (pretty quick R). Duncon Murdoc from the R-Core team preannounced that pqR’s suggestions for improvements shall be integrated into the core of R in one of the next versions.
  • 17. Applications of R Programming • R applications span the universe from theoretical computational statistics and the hard sciences such as astronomy, chemistry and genomics to practical applications in business, drug development, finance, health care, marketing, medicine and much more. • Because R has nearly 5,000 packages (libraries of functions) many of which are dedicated to specific applications you don’t have to be an R genius to begin developing your own applications. All it takes to get started is some modest experience with R, some examples to emulate and the right libraries of R functions.
  • 18. • Following are R’s application area for which there are packages containing the tools and functions you need: 1. Clinical Trials 2. Cluster Analysis 3. Computational Physics 4. Differential Equations 5. Econometrics 6. Environmental Studies 7. Experimental Design 8. Finance 9. Genetics 10. Graphical Models 11. Graphics and Visualizations 12. High Performance Computing 13. High Throughput Genomics 14. Machine Learning 15. Medical Imaging 16. Meta Analysis 17. Multivariate Statistics 18. Natural Language Processing 19. Official Statistics 20. Optimization
  • 19. Companies Using R Social Search Awareness: Jesse Bridgewater works on "social search awesomeness" for the Bing search engine, and is setting up his dev environment with the necessary tools including python, vim, and R. Facebook uses R for: • Analysis of Facebook Status Updates • Facebook's Social Network Graph • Predicting Colleague Interactions with R
  • 20. R is widely used at the FDA on a daily basis. FDA scientists have written R packages for use by other scientists (within and outside the FDA) involved in the submissions process. Google uses R to: calculate ROI on advertising campaigns; to predict economic activity; to analyze effectiveness of TV ads and make online advertising more effective. Other notable companies include-
  • 21. What R is not so good at • Statistics for non-statisticians • Numerical methods, such as solving partial differential equations. (Try Matlab) • Analytical methods, such as algebraically integrating a function. (Try Mathematica) • Precision graphics, such as might be useful in psychology experiments. (Try Matlab) • Optimization • Low-level, high-speed or critical code
  • 22. Conclusion A couple of years ago, R had the reputation of not being able to handle Big Data at all – and it probably still has for users sticking on other statistical software. But today, there are a number of quite different Big Data approaches available. Which one fits best depends on the specifics of the given problem. There is not one solution for all problems. But there is some solution for any problem. R is the best at what I does- letting experts quickly and easily interpret, interact with, and visualize data. R continues to help shape the future of statistical analysis, and data science.