R In Production:
the products
Yasmin Lucero, PhD
Senior Statistician, Gravity-AOL
useR! 2014
Outline
• Internal products
• 1. one-off analysis
• 2. automated reports
• 3. internal R packages
• 4. internal dashboards
• External products
• 1. customer facing web-app
• 2. analytical backend service
• Ops: managing an R environment
Internal Product 1:
one-off analytical product
http://rpubs.com/nathanesau1/21383
Nathan Esau
Hilary Parker
Internal Product 2:
Automated reports
Thursday morning:
Automated Business Reporting with R (Zhengying (Doro) Lour)
R + bash + email
R + markdown + web server
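A minimal sketch of the "R + bash + email" pattern, assuming a hypothetical script that a scheduler runs non-interactively. The data, the file name, and the `write_report` helper are illustrative and not taken from the talk; mailing is left to the shell wrapper.

```r
# report.R: hypothetical script meant to be run non-interactively,
# e.g. from cron via `Rscript report.R`.

# Stand-in data; a real report would pull from a database or log store.
daily <- data.frame(
  day    = as.Date("2014-07-01") + 0:6,
  clicks = c(120, 135, 128, 150, 142, 90, 95)
)

# Build a plain-text report and write it to `path`; returns the path invisibly.
write_report <- function(df, path) {
  lines <- c(
    sprintf("Report generated: %s", format(Sys.Date())),
    sprintf("Days covered: %d", nrow(df)),
    sprintf("Total clicks: %d", sum(df$clicks)),
    sprintf("Mean clicks per day: %.1f", mean(df$clicks))
  )
  writeLines(lines, path)
  invisible(path)
}

report_path <- file.path(tempdir(), "daily_report.txt")
write_report(daily, report_path)
```

A cron job or shell wrapper could then run the script with `Rscript` and hand the resulting file to whatever mail tool the site uses.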
Internal Product 3:
The Internal R package
• Data APIs
• Business specific metrics
• Custom plotting functions
• Custom data manipulation utilities
Thursday Morning:
An R tools platform in Cosmetic Industry (Jean-Francois Collin)
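As an illustration of a "business specific metric" that an internal package might export, here is a small sketch. The function name `click_through_rate` and its definition are assumptions made for the example, not something described in the talk.

```r
# Hypothetical business metric of the kind an internal package might export.
# Click-through rate: clicks / impressions, with NA where impressions
# are zero or missing.
click_through_rate <- function(clicks, impressions) {
  stopifnot(is.numeric(clicks), is.numeric(impressions),
            length(clicks) == length(impressions))
  ifelse(is.na(impressions) | impressions == 0,
         NA_real_,
         clicks / impressions)
}

ctr <- click_through_rate(c(5, 0, 3), c(100, 50, 0))
```

Keeping such definitions in one package means every report and dashboard computes the metric the same way.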
Internal Product 4:
The internal dashboard
Gravity-AOL
External Product 1:
Customer facing web app
Wednesday afternoon
Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz)
http://www.showmeshiny.com/
External Product 2:
analytical back-end
Wed afternoon:
Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan)
Zillow’s Big Data and Real-time Services in R (Yeng Bun)
[Diagram: CARD.COM system. Labels: Artwork & Brands; Bank Partner Transactions; CARD.COM Site / App; CARD.COM AdTech Platform; APIs; RTB Ad Xchgs; CARD.COM Analytics Platform; Members; Visitors; steps 1, 2, 3; learn, predict, deploy]
Details: card.com/useR-2014
More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-driven-companies-use-r-to-compete.html
Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
• R library management
• CRAN, non-CRAN and internal packages
• Version management
• Dependency management (pulling all dependencies)
• Non-R dependencies (especially C++ and Java)
• Hardware specifications: How much RAM is enough?
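One way to approach the version-management pain point is a startup check that compares installed package versions against a list of pins. The sketch below uses only base R; the `required` pins and the `check_versions` helper are hypothetical, and the pinned packages are ones that ship with R so the check runs on any installation.

```r
# Minimum versions the environment is expected to satisfy (illustrative pins).
required <- c(stats = "3.0.0", utils = "3.0.0", parallel = "3.0.0")

# Compare installed versions against the pins.
# A package that is not installed counts as a failure.
check_versions <- function(required) {
  installed <- rownames(installed.packages())
  vapply(names(required), function(pkg) {
    pkg %in% installed && packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
}

status <- check_versions(required)
```

A deployment script could stop early when `all(status)` is `FALSE`, rather than failing later inside an analysis.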
Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end-to-end functionality from data APIs to web apps
• Solid IDE support
• Sturdy, stable, easy-to-support platform
• Rapid prototyping
yasmin.lucero@gmail.com
Thanks.
Tools: plotting
• Major frameworks
• Base graphics
• lattice
• ggplot2
• Useful utilities
• grid/gridExtra/gtable
• latticeExtra
• Color: RColorBrewer/munsell/colorspace/dichromat
• gplots (the ‘g’ school)
• plotrix
• Custom plots
• plot.ts
• maps
• igraph (network visualization)
• ggmap
• ggvis: interactive graphics
• rCharts: interactive graphics, wraps JS libraries, not on CRAN yet (look on GitHub)
• rgl (3d)/scatterplot3d
• vcd (categorical data)
Tools: data manipulation
• Base R features
• Data structures: the data.frame
• Vectorized data manipulation: apply, tapply, lapply…
• Data structures: ts
• Comprehensive, elegant missing data handling (NA)
• Packages
• Wickham school: reshape2/plyr/dplyr/tidyr
• data.table
• Time series: zoo, xts, lubridate
• Spatial data tools: sp/maptools
• The ‘G’ school: gdata
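The base R features listed above (the data.frame, the apply family, and NA handling) can be seen together in a few lines. The `visits` data is made up for the example.

```r
visits <- data.frame(
  site  = c("a", "a", "b", "b", "b"),
  views = c(10, NA, 4, 6, 8)
)

# Group-wise mean with tapply; na.rm = TRUE drops the missing value in site "a".
mean_views <- tapply(visits$views, visits$site, mean, na.rm = TRUE)

# Without na.rm the NA propagates, which makes missing data hard to overlook.
mean_views_strict <- tapply(visits$views, visits$site, mean)
```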
Tools: Data interfaces
• Connections: read.table(); url()
• DBI: RPostgreSQL; RMySQL; RSQLite; …
• RODBC; RJDBC (Vertica, Redshift)
• Native: rredis; rmongodb; prestodb; RCassandra; RHadoop; …
• yaml, XML, rjson, RJSONIO
• MS Excel: xlsx, XLConnect
• SAS, SYSTAT, SPSS, Stata…: foreign
• RCurl
• RProtoBuf: Efficient cross-language data serialization in R
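A small sketch of the connection interface behind `read.table()`. Inline text stands in for a real file or URL, and the column names are invented for the example.

```r
# Inline text stands in for a file or URL; read.table() accepts a connection,
# so the same call would work with file() or url().
raw <- "id score
1 0.5
2 0.75
3 NA"

con <- textConnection(raw)
scores <- read.table(con, header = TRUE)
close(con)
```

The literal `NA` in the last row is read as a missing value, so the column stays numeric.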
Tools: Package development
• Package development:
• package.skeleton(); tools (base package)
• pkgKitten (CRAN): improvements to package.skeleton
• devtools (CRAN) : miscellaneous and very useful tools
• gtools: various R programming tools
• roxygen2 (CRAN): literate documentation
• testthat/testR: unit testing
• IDEs: RStudio, Eclipse (StatET), TINN-R, Emacs ESS, …
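To show the starting point that `package.skeleton()` provides, the sketch below generates a package directory in a temporary location. The package name `demoPkg` and the function `double_it` are placeholders chosen for the example.

```r
# Collect the objects that should go into the package in their own environment.
pkg_env <- new.env()
pkg_env$double_it <- function(x) 2 * x   # hypothetical placeholder function

# Generate the package directory under a temporary location.
pkg_root <- tempdir()
suppressMessages(
  package.skeleton(name = "demoPkg", environment = pkg_env,
                   path = pkg_root, force = TRUE)
)

# Inspect what was generated.
skeleton_files <- list.files(file.path(pkg_root, "demoPkg"))
```

The generated files are only stubs; tools such as devtools and roxygen2 from the list above are commonly used to take it from there.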
Tools: Web development & reporting
• Shiny
• Interactive documents
• knitr
• Sweave
Tools: parallel computing
• parallel: many features formerly spread across separate packages have recently been collected into this base R package
• Revolution analytics
• Map-Reduce: rmr/rhadoop
• H2O (0xdata)
• SparkR (not on CRAN yet, look on github)
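A minimal sketch of the base `parallel` package, assuming two local worker processes can be started on the machine. The `resample_mean` task is a toy invented for the example.

```r
library(parallel)

# Toy task: mean of a seeded resample. It uses base functions only,
# so the worker processes need no extra setup.
resample_mean <- function(seed) {
  set.seed(seed)
  mean(sample(1:100, size = 50, replace = TRUE))
}

cl <- makeCluster(2L)                          # two local worker processes
par_res <- parLapply(cl, 1:4, resample_mean)   # run the task on the workers
stopCluster(cl)

# The same work done serially, for comparison.
ser_res <- lapply(1:4, resample_mean)
```

Because each task sets its own seed, the parallel and serial runs should give the same values, which is a convenient sanity check before scaling up.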
Tools: big or out of memory computing
• dplyr: supports database backed data structures
• ff: supports file based data
• biglm/bigmemory: shared memory matrices
• HadoopStreaming
Tools: memory profiling
• lineprof
• profr
• proftools
• object.size()
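As a quick illustration of `object.size()`, the sketch below compares the same values stored as integers and as doubles. The vector length is arbitrary.

```r
n <- 1e5
ints <- rep(1L, n)   # integer storage: 4 bytes per element
dbls <- rep(1,  n)   # double storage: 8 bytes per element

size_int <- object.size(ints)
size_dbl <- object.size(dbls)

# Human-readable sizes
format(size_int, units = "Kb")
format(size_dbl, units = "Kb")
```

Checks like this are a cheap first step when working out how much RAM a job needs, before reaching for a line-level profiler.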

Editor's Notes

  1. Introduce self. State the goal of the presentation: an overview of the ways that R is being used. Define ‘product’ for the non-business folks (a deliverable).
  2. Bread and butter for many; everyone does some of this; even non-primary R users often turn to R for this. Why R: R has always tried to be a platform for statistical analysis.
  3. R fits neatly into this kind of pipeline; there are useful command-line utilities.
  4. This product is basically an extension of the automated reporting idea.