R - the language
R - scripted data
History
Language
Packages
Tools
RPubs
Slidify
Shiny
A Brief History of R
– 1976 S - Bell Labs; Fortran
– John Chambers
– 1988 S Version 3; C language
● 1991 R Created
– Ross Ihaka and Robert Gentleman
● 1993 R Announced
– 1993 S licensed to StatSci (now Insightful)
● 2000 R Version 1.0.0 released
– 2004 S purchased from Lucent ($2 million)
– 2008 TIBCO acquires Insightful ($25 million)
Other “Stats” Tools
● R – additional, commercial support
Oracle: “Big Data Appliance” - R + Hadoop
+ Linux + NoSQL + Exadata(H/W)
IBM: R executing in Hadoop (massively
parallel in-database analytics)
● SAS (SAS Institute) dev. 1966, 1st rel 1972
● SPSS (IBM) 1st rel 1968
Model Development and
Execution Comparison
http://inside-bigdata.com/2014/06/25/revolution-r-enterprise-vs-sas-performance/
Oracle + INTEL Libraries
https://blogs.oracle.com/R/entry/oracle_r_distribution_performance_benchmark
Language
● Derivative of S (S-PLUS)
● Portable (includes PlayStation 3)
● Interpreted, calls into C libraries
● Functional!
● GPL
● 40 year old technology
● Open Source (you want it, you do it)
Data Types
● Symbols refer to objects
● Object attributes
– names
– dimnames
– dim (dimensions)
– class
– length
– user defined attributes/metadata
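A minimal sketch of how these attributes can be inspected and set. The vector `heights` and the attribute name "units" are made up for illustration.

```r
# A named numeric vector: 'names' is an attribute of the object.
heights <- c(alice = 1.62, bob = 1.80)
names(heights)
length(heights)

# Setting the 'dim' attribute turns a plain vector into a matrix.
m <- 1:6
dim(m) <- c(2, 3)
dimnames(m) <- list(c("r1", "r2"), c("a", "b", "c"))
class(m)

# User-defined metadata is stored with attr(); "units" is a hypothetical name.
attr(heights, "units") <- "metres"
attributes(heights)
```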
Data Types
● Object types – single class, except list
– List
(may have mixed classes)
– Vectors
(scalar is a vector of length 1)
– Matrices
(vector with 'dimension' attribute)
(column major order)
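The object types above can be sketched as follows. All variable names are illustrative.

```r
# A "scalar" is just a vector of length 1.
x <- 5
is.vector(x)
length(x)

# Vectors hold a single class: mixing types coerces to the most general one.
mixed <- c(1, "a", TRUE)
class(mixed)

# A list may hold elements of different classes.
item <- list(id = 1L, label = "a", ok = TRUE)
classes <- vapply(item, class, character(1))

# A matrix is a vector with a 'dim' attribute, filled column by column.
m <- matrix(1:6, nrow = 2)
first_column <- m[, 1]
```

Because of column-major order, `first_column` holds the first two values of `1:6`.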
Data Types
● Object types
– Factors
● Categorical data (like an enumeration)
– Data frames
● Special list, each element has same length
● Elements are columns; their length is the number of rows
● Each element (column) has its own type
● row.names() attribute to name the rows
● Convert to matrix with data.matrix()
● Load with read.table(), read.csv()
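A small sketch of factors and data frames, assuming made-up column names and values. The CSV round trip uses a temporary file so nothing is left behind.

```r
# A factor stores categorical data as integer codes plus a set of levels.
sizes <- factor(c("small", "large", "small"), levels = c("small", "large"))
codes <- as.integer(sizes)

# A data frame is a list of equal-length columns, each with its own type.
df <- data.frame(
  id = 1:3,
  size = sizes,
  weight = c(2.5, 9.1, 3.0),
  row.names = c("a", "b", "c")
)
column_classes <- sapply(df, class)

# data.matrix() converts every column to numeric (factors become their codes).
mat <- data.matrix(df)

# Write the data frame out and load it again with read.csv().
path <- tempfile(fileext = ".csv")
write.csv(df, path)
df2 <- read.csv(path, row.names = 1)
```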
Data Types
● Object “atomic” classes
– character
– numeric (double precision real)
– integer
– complex
– logical (booleans)
Numeric includes Inf and NaN (integer has neither)
1 / Inf == 0 !
any atomic class can be NA
NaN is NA, NA is not NaN (is.na(NaN) is TRUE, is.nan(NA) is FALSE)
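The special values can be checked directly at the console. This is a short sketch; the variable names are illustrative.

```r
# Dividing by infinity gives exactly zero.
zero_check <- 1 / Inf == 0

# 0 / 0 is NaN, which also counts as missing (NA).
not_a_number <- 0 / 0
nan_is_na <- is.na(not_a_number)

# NA is not NaN.
na_is_nan <- is.nan(NA)

# Each atomic class has its own NA.
class(NA_integer_)
class(NA_character_)

# Integers have no Inf: overflow yields NA (with a warning).
overflow <- suppressWarnings(.Machine$integer.max + 1L)
```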
Data Types
● Dates
– “Date” class
– Days since epoch (1970-01-01)
● Times
– “POSIXct” or “POSIXlt” class
– Seconds since epoch
● Coerce from string with as.Date(), to string with format()
● Generic functions include 'weekdays()',
'months()', 'quarters()'
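A brief sketch of the date and time classes. The date itself is arbitrary, and the time zone is fixed to UTC so the arithmetic is predictable.

```r
# Dates are stored as days since 1970-01-01.
d <- as.Date("2014-06-25")
days_since_epoch <- as.numeric(d)

# Format a Date back into a string.
label <- format(d, "%Y/%m/%d")

# POSIXct stores seconds since the epoch.
t_ct <- as.POSIXct("2014-06-25 12:00:00", tz = "UTC")
seconds_since_epoch <- as.numeric(t_ct)

# POSIXlt stores the same instant as a list of components.
t_lt <- as.POSIXlt(t_ct)
hour_of_day <- t_lt$hour

# Generic helpers work on both dates and times.
weekdays(d)
months(d)
quarter <- quarters(d)
```

The output of `weekdays()` and `months()` depends on the locale.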
Operators
● Grouping: ()
● Assignment: to<-from AND from->to
● Vectorized: + - ! * / ^ %% & |
● ~ ? : %/% %*% %o% %x% %in% < > == >=
<= && ||
● Element access: [[]] [] $
● Function argument types:
– symbol, symbol=default, ...
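A few of these operators in use, as a rough sketch. The function `describe` and all values are made up for illustration.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Arithmetic operators work element by element.
sums <- x + y
remainders <- x %% 2

# Integer division, membership, inner and outer products.
half <- 7 %/% 2
found <- x %in% c(2, 3)
inner <- drop(x %*% y)
outer_product <- x %o% y

# & and | are vectorized; && and || work on single values.
elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)
single <- TRUE && FALSE

# Element access: [] keeps the container, [[]] and $ extract the element.
lst <- list(a = 1:3, b = "text")
sub_list <- lst["a"]
element <- lst[["a"]]
same_element <- lst$a

# Arguments: plain symbol, symbol with default, and ... passed along.
describe <- function(x, digits = 2, ...) round(mean(x, ...), digits)
average <- describe(c(1, NA, 4), na.rm = TRUE)
```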
Control Structures
● if, else
● for
● while
● repeat
● break, next, return
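The control structures can be sketched with a few toy loops. The function `count_halvings` is a made-up example.

```r
# for with next and break: sum the odd numbers up to 7.
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop once past 7
  total <- total + i
}

# while: double until the value reaches 100 or more.
k <- 1
while (k < 100) {
  k <- k * 2
}

# repeat runs until break or return is reached.
count_halvings <- function(n) {
  steps <- 0
  repeat {
    if (n <= 1) return(steps)
    n <- n / 2
    steps <- steps + 1
  }
}
halvings <- count_halvings(8)

# if / else is an expression and returns a value.
parity <- if (total %% 2 == 0) "even" else "odd"
```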
Apply
● apply – apply functions over arrays
● lapply – apply functions over list / vector
● sapply – like lapply, but simplifies the result to a vector or matrix
● tapply – apply function over ragged array
● mapply – apply function to multiple vectors or lists in parallel
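One small call per member of the apply family, as a sketch with made-up inputs.

```r
m <- matrix(1:6, nrow = 2)
scores <- list(a = 1:3, b = 4:6)

# apply: run a function over the rows (1) or columns (2) of an array.
row_sums <- apply(m, 1, sum)

# lapply: always returns a list.
means_list <- lapply(scores, mean)

# sapply: same as lapply, but simplified to a vector where possible.
means_vec <- sapply(scores, mean)

# tapply: apply a function to groups of unequal size (a ragged array).
by_group <- tapply(c(10, 20, 30, 40), c("x", "y", "x", "y"), sum)

# mapply: walk several vectors in parallel.
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```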
Functions
● Functions are objects
● Functional closure consists of:
– Formal argument list
– Function body (definition)
– Environment
● Each of these can be assigned to
(formals(), body(), environment())
● Assigning to environment() can eliminate
unwanted environment capture
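A sketch of the three parts of a closure and how each can be reassigned. The functions `make_counter`, `add`, and `make_adder` are invented for illustration.

```r
# A closure keeps the environment it was created in.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
counter()
second_call <- counter()

# Formals and body can be inspected and replaced.
add <- function(x, y = 1) x + y
formals(add)$y <- 10
body(add) <- quote(x * y)
changed <- add(2)

# This closure captures 'big' even though it never uses it.
make_adder <- function(big) {
  function(x) x + 1
}
f <- make_adder(numeric(1e5))
captured <- exists("big", envir = environment(f), inherits = FALSE)

# Pointing the function at the global environment drops the capture.
environment(f) <- globalenv()
still_works <- f(1)
```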
Packages
● CRAN (Comprehensive R Archive Network)
– Main site, includes R download
● Bioconductor
– Analysis of genomic data
– Next generation high-throughput
sequencing
● R-Forge
● GitHub and Personal repositories
Packages
● Analysis
– Statistical analysis (stats, linprog)
● Linear (and generalized linear) modeling
● Tree models
● Analysis of variance
– Machine learning (caret, kernlab)
● Classification and clustering (random forests, k-means, knn, etc.)
● Training and predictions
● Cross validation and error analysis
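A minimal sketch using only the stats package and the built-in mtcars data set. The choice of variables is arbitrary, and caret and kernlab are not shown.

```r
# Linear model: fuel economy as a function of weight.
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]

# Analysis of variance table for the fitted model.
aov_table <- anova(fit)

# Generalized linear model: logistic regression on transmission type.
logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Prediction for a new observation.
predicted <- predict(fit, newdata = data.frame(wt = 3))

# k-means clustering on two scaled columns; the seed makes it repeatable.
set.seed(1)
km <- kmeans(scale(mtcars[, c("mpg", "wt")]), centers = 2)
cluster_sizes <- table(km$cluster)
```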
Packages
● Graphics
– Base graphics
● Plot: plot, hist, ...
● Annotate: text, lines, points, axis, ...
– lattice
● Single command: xyplot, bwplot, ...
– ggplot2
● Single command: qplot
● Defining objects: aesthetics, geoms
● Chain commands: ggplot, geom_*, ...
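A sketch of the base graphics pattern only (plot first, then annotate), drawn to a temporary PDF so it also runs without a display. The data set and labels are illustrative, and lattice and ggplot2 are not shown.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

# Plot: one call creates the figure.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")

# Annotate: later calls add to the existing figure.
lines(lowess(mtcars$wt, mtcars$mpg), col = "blue")
points(mean(mtcars$wt), mean(mtcars$mpg), pch = 19, col = "red")
text(4.5, 30, "red dot = mean")

# A second page with a histogram.
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

dev.off()
```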
Packages
● Data visualization
– rCharts (GitHub), converts visualizations to
JavaScript (e.g. d3.js)
http://www.google.com/trends/explore#q=R%20language%2C%20Data%20Visualization%2C%20D3.js%2C%20Processing.js&cmpt=q
Tools
● Command line
● RStudio (can run on remote Linux server)
● RKWard
● R Commander (tcl/tk)
● JGR – Java (GUI for R)
● Rattle - RGtk2
Tools
● Debugging
– Print statements!
– Interactive tools:
● traceback() – stack trace on error
● debug() – flags function for stepping
● browser() - stops function and enters debug
● trace() - insert trace statements
● recover() - modify error behavior, can
browse call stack
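A sketch of the non-interactive side of these tools. The function `safe_divide` is made up, and the interactive calls are left as comments because they wait for console input.

```r
safe_divide <- function(x, y) {
  if (y == 0) stop("division by zero")
  x / y
}

# debug() flags a function so the next call steps through it line by line.
debug(safe_divide)
flagged <- isdebugged(safe_divide)
undebug(safe_divide)
unflagged <- isdebugged(safe_divide)

# Capture the error message instead of stopping the script.
msg <- tryCatch(safe_divide(1, 0), error = function(e) conditionMessage(e))

# In an interactive session:
#   traceback()              # show the call stack of the last error
#   options(error = recover) # browse the call stack when an error occurs
#   browser()                # placed inside a function, pauses execution there
```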
Tools
● Profiling
– “We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil”
– Donald Knuth
– system.time() - CPU, wall times
– Rprof() - use summaryRprof() to see results
● Do not use Rprof() and system.time()
together
● Calls to C/Fortran libraries not profiled
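A sketch of timing with system.time() only; Rprof() is not run here. The function `slow_sum` is a deliberately naive example, and the actual timings depend on the machine.

```r
# An explicit loop, compared below with the vectorized sum().
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# system.time() reports user CPU, system CPU, and elapsed (wall) time.
timing_loop <- system.time(loop_result <- slow_sum(1e6))
timing_vec <- system.time(vec_result <- sum(as.numeric(seq_len(1e6))))

timing_loop[["elapsed"]]
timing_vec[["elapsed"]]
```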
Data Exploration
● Script it!
– If you can't repeat it, it didn't happen
● Get the data (ingest)
– Functions to download, uncompress,
unarchive, store, read, and organize
● Clean the data
– Handle missing and incomplete data,
impute values, identify outliers
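A sketch of the cleaning step on a tiny made-up data frame. Median imputation and the 1.5 x IQR outlier rule are assumptions chosen for illustration, not the only options.

```r
raw <- data.frame(id = 1:6, value = c(10, 12, NA, 11, 95, 13))

# Find and count the missing values.
missing_rows <- is.na(raw$value)
n_missing <- sum(missing_rows)

# Impute the missing values with the median of the observed ones.
clean <- raw
clean$value[missing_rows] <- median(raw$value, na.rm = TRUE)

# Flag outliers: anything beyond 1.5 x IQR from the quartiles.
q <- unname(quantile(clean$value, c(0.25, 0.75)))
iqr <- q[2] - q[1]
clean$outlier <- clean$value < q[1] - 1.5 * iqr |
  clean$value > q[2] + 1.5 * iqr
```

Flagging keeps the suspicious rows visible in the data instead of silently dropping them.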
Data Exploration
● Look at the data (models, visualization)
– Model – regressions (linear, logistic),
clustering, ANOVA
– Refine models and plot the result
● Look for systematic issues – unexpected
trends, bias, unexplained variance, error
estimates, residual analysis
● Explore complexity – number of explanatory
factors
– Plot the models
● What does it look like?
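A sketch of checking residuals and model complexity on the built-in mtcars data. The two model formulas are arbitrary choices, and the plot goes to a temporary PDF.

```r
# Two candidate models with one and two explanatory variables.
simple <- lm(mpg ~ wt, data = mtcars)
richer <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals should scatter around zero with no visible trend.
res <- residuals(simple)
mean_residual <- mean(res)

# Does the extra variable explain more variance?
r2_simple <- summary(simple)$r.squared
r2_richer <- summary(richer)$r.squared
comparison <- anova(simple, richer)

# Plot residuals against fitted values to look for systematic issues.
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(fitted(simple), res, xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0, lty = 2)
dev.off()
```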
Reproducible Research
● Allows others to validate the work
● Ensures that the results are accepted
● Reduces the chance of errors propagating
– http://youtu.be/7gYIs7uYbMo
– 2010 Anil Potti resigns from Duke after
research was found flawed (off by 1!)
● Clinical trials based on the flawed research
were finally cancelled
● Closed data, non-reproducible research
exacerbated the problem
Reproducible Research
● Don't do things by hand – especially editing
spreadsheets to “clean up” data (removing
outliers, validating, editing) or downloading
files
● Actions taken by hand need very detailed
documentation to reproduce – such as the
download sites and which files were
downloaded
● GUIs are convenient, but GUI actions can't be repeated
Reproducible Research
● Capture the steps in a script:
– download.file("http://...", "localfile.zip")
● Can be repeated as long as the link is
available. Can keep and manage the
downloaded file if that is an issue
– Use version control
● Capture small steps at a time (git is good
for this!)
● Can track changes and revert if needed
● Can use GitHub, BitBucket, SourceForge to
publish the results as well
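A sketch of a scripted download step that keeps the local copy once it exists. The helper `fetch_once` and the URL in the commented usage are hypothetical.

```r
# Download a file only if no local copy exists yet, so reruns reuse
# the managed copy even when the link has gone away.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Hypothetical usage (placeholder URL, not a real data source):
# fetch_once("https://example.com/data.zip", "localfile.zip")
# unzip("localfile.zip", exdir = "data")
```

Committing the script to version control records every change to the download and unpacking steps.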
Reproducible Research
● Capture environment – OS, tools, versions
● Don't save outputs – regenerate
– Ok to cache results while in use, but don't
store the results, just the code+data that
produced it
– If you keep intermediate files, document
how they were created
● Set random seed
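A sketch of recording the environment and fixing the random seed. The seed value is arbitrary.

```r
# Record the R version, OS, and loaded packages alongside the results.
info <- sessionInfo()
r_version <- R.version.string
os_name <- Sys.info()[["sysname"]]

# The same seed gives the same "random" numbers on every run.
set.seed(2014)
first <- rnorm(3)
set.seed(2014)
second <- rnorm(3)
same_draws <- identical(first, second)
```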
Sharing Research
● R Markdown – markdown with embedded R
– knitr package executes the R fragments
and embeds the code and results into
markdown, which can convert to HTML or
PDF
– Literate programming!
● Hosted documentation
– RPubs (rpubs.com)
– GitHub gh-pages (github.io)
Sharing Research
● Embedded presentations
– Author using slidify package
– R Markdown with embedded R code
– Creates HTML5 presentation slide deck
– Can include inline quizzes
Data Products
● Interactive visualizations
– shiny, shinyapps packages
– RStudio includes interactive display of
shiny applications during development
– Generates Bootstrap + HTML5 + JavaScript
+ d3 application
● Hosted!
– Hosted at shinyapps.io
– Private? Server images available (for
purchase)
