SlideShare a Scribd company logo
R In Production:
the products
Yasmin Lucero, PhD
Senior Statistician, Gravity-AOL
UserR! 2014
Outline
• Internal products
• 1. one-off analysis
• 2. automated reports
• 3. internal R packages
• 4. internal dashboards
• External products
• 1. customer facing web-app
• 2. analytical backend service
• Ops and the managing of an R environment
Internal Product 1:
one-off analytical product
http://rpubs.com/nathanesau1/21383
Nathan Esau
Hilary Parker
Internal Product 2:
Automated reports
Thursday morning:
Automated Business Reporting with
R (Zhengying (Doro) Lour)
R + bash + email
R + markdown + web server
Internal Product 3:
The Internal R package
• Data APIs
• Business specific metrics
• Custom plotting functions
• Custom data manipulation utilities
Thursday Morning:
An R tools platform in Cosmetic Industry (Jean-Francois Collin)
Internal Product 4:
The internal dashboard
Gravity-AOL
External Product 1:
Customer facing web app
Wednesday afternoon
Rapid Prototyping with R/Shiny at
McKinsey (Aaron Horowitz)
http://www.showmeshiny.com/
External Product 2:
analytical back-end
Wed afternoon:
Deploying R into Business Intelligence and Real-time Applications
(Louis Bajuk-Yorgan)
Zillow’s Big Data and Real-time Services in R (Yeng Bun)
Artwork
& Brands
Bank
Partner
Transactions
CARD.COM
Site / App
CARD.COM
AdTech Platform
APIs
RTB Ad
Xchgs
CARD.COM
Analytics Platform
Members
Visitors
1
2
3
Details: card.com/useR-2014
predict
deploy
learn
CARD.com
More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-
driven-companies-use-r-to-compete.html
Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
• R library management
• CRAN, non-CRAN and internal packages
• Version management
• Dependency management (pulling all dependencies)
• Non-R dependencies (especially C++ and Java)
• Hardware specifications: How much RAM is enough?
Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end to end functionality from data APIs
to web apps
• Solid IDE support
• Sturdy, stable easy to support platform
• Rapid prototyping
yasmin.lucero@gmail.com
Thanks.
Tools: plotting
• Major frameworks
• Base graphics
• lattice
• ggplot2
• Useful utilties
• grid/gridExtra/gtable
• latticeExtra
• Color: RColorBrewer/munsell/colorspace/dichromat
• gplots (the ‘g’ school)
• plotrix
• Custom plots
• plot.ts
• maps
• igraph (network visualization)
• ggmap
• ggvis: interactive graphics
• rcharts: interactive graphics, wraps js libraries, not on CRAN yet (look on github)
• rgl (3d)/scatterplot3d
• vcd (categorical data)
Tools: data manipulation
• Base R features
• Data structures: the data.frame
• Vectorized data manipulation: apply, tapply, lapply…
• Data structures: ts
• Comprehensive, elegant missing data handling (NA)
• Packages
• Wickham school: reshape2/plyr/dplyr/tidyr
• data.table
• Time series: zoo, xts, lubridate
• Spatial data tools: sp/maptools
• The ‘G’ school: gdata
Tools: Data interfaces
• Connections: read.table(); url()
• DBI: RpostgresSQL; RMySQL; RSQLite;…
• RODBC; RJDBC: (vertica, redshift)
• Native: rredis; rmongodb; prestodb; RCassandra; Rhadoop; …
• yaml, XML, rjson, RJSONIO,
• MS Excel: xlsx, XLConnect
• SAS, SYSTAT, SPSS, Stata…: foreign
• Rcurl
• RProtoBuf: Efficient cross-language data serialization in R
Tools: Package development
• Package development:
• package.skeleton(); tools (base package)
• pkgKitten (CRAN): improvements to package.skeleton
• devtools (CRAN) : miscellaneous and very useful tools
• gtools: various R programming tools
• roxygen2 (CRAN): literate documentation
• testthat/testR: unit testing
• IDEs: RStudio, Eclipse (StatET), TINN-R, Emacs ESS, …
Tools: Web development & reporting
• Shiny
• Interactive documents
• Knitr
• Sweave
Tools: parallel computing
• parallel: lots of features formerly distributed among
packages have recently been collected into this base R
package
• Revolution analytics
• Map-Reduce: rmr/rhadoop
• H20 (hexadata)
• SparkR (not on CRAN yet, look on github)
Tools: big or out of memory computing
• dplyr: supports database backed data structures
• ff: supports file based data
• biglm/bigmemory: shared memory matrices
• HadoopStreaming
Tools: memory profiling
• lineprof
• profr
• proftools
• object.size()

More Related Content

Similar to 2014 july use_r

An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
Thomas Hütter
 
Cloud-Based Spatial Data Analytics with R/Shiny
Cloud-Based Spatial Data Analytics with R/ShinyCloud-Based Spatial Data Analytics with R/Shiny
Cloud-Based Spatial Data Analytics with R/Shiny
IDEAS - Int'l Data Engineering and Science Association
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Wes McKinney
 
R - the language
R - the languageR - the language
R - the language
Mike Martinez
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
Ajay Ohri
 
A Gentle Introduction to Tidy Statistics in R.pdf
A Gentle Introduction to Tidy Statistics in R.pdfA Gentle Introduction to Tidy Statistics in R.pdf
A Gentle Introduction to Tidy Statistics in R.pdf
VickyAlers
 
Sard HMSC Tech Talk
Sard HMSC Tech TalkSard HMSC Tech Talk
Sard HMSC Tech Talk
Nick Sard
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
The HDF-EOS Tools and Information Center
 
R training at Aimia
R training at AimiaR training at Aimia
R training at Aimia
Ali Arsalan Kazmi
 
Overview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsOverview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis Tools
Keiichiro Ono
 
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Alexey Zinoviev
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
Srinath Perera
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Spark Summit
 
Introduction to Decision Intelligence using Data
Introduction to Decision Intelligence using DataIntroduction to Decision Intelligence using Data
Introduction to Decision Intelligence using Data
Karen Lim
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
Portland R User Group
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
Portland R User Group
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
Alexis Seigneurin
 
Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014
Claudio Martella
 
From Developer to Data Scientist
From Developer to Data ScientistFrom Developer to Data Scientist
From Developer to Data Scientist
Gaines Kergosien
 
Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?
Rodrigo Urubatan
 

Similar to 2014 july use_r (20)

An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
 
Cloud-Based Spatial Data Analytics with R/Shiny
Cloud-Based Spatial Data Analytics with R/ShinyCloud-Based Spatial Data Analytics with R/Shiny
Cloud-Based Spatial Data Analytics with R/Shiny
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
 
R - the language
R - the languageR - the language
R - the language
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
A Gentle Introduction to Tidy Statistics in R.pdf
A Gentle Introduction to Tidy Statistics in R.pdfA Gentle Introduction to Tidy Statistics in R.pdf
A Gentle Introduction to Tidy Statistics in R.pdf
 
Sard HMSC Tech Talk
Sard HMSC Tech TalkSard HMSC Tech Talk
Sard HMSC Tech Talk
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
 
R training at Aimia
R training at AimiaR training at Aimia
R training at Aimia
 
Overview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis ToolsOverview of Modern Graph Analysis Tools
Overview of Modern Graph Analysis Tools
 
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 
Introduction to Decision Intelligence using Data
Introduction to Decision Intelligence using DataIntroduction to Decision Intelligence using Data
Introduction to Decision Intelligence using Data
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
 
Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014Giraph at Hadoop Summit 2014
Giraph at Hadoop Summit 2014
 
From Developer to Data Scientist
From Developer to Data ScientistFrom Developer to Data Scientist
From Developer to Data Scientist
 
Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?Data science in ruby, is it possible? is it fast? should we use it?
Data science in ruby, is it possible? is it fast? should we use it?
 

Recently uploaded

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

2014 july use_r

  • 1. R In Production: the products Yasmin Lucero, PhD Senior Statistician, Gravity-AOL UserR! 2014
  • 2. Outline • Internal products • 1. one-off analysis • 2. automated reports • 3. internal R packages • 4. internal dashboards • External products • 1. customer facing web-app • 2. analytical backend service • Ops and the managing of an R environment
  • 3. Internal Product 1: one-off analytical product http://rpubs.com/nathanesau1/21383 Nathan Esau Hilary Parker
  • 4. Internal Product 2: Automated reports Thursday morning: Automated Business Reporting with R (Zhengying (Doro) Lour) R + bash + email R + markdown + web server
  • 5. Internal Product 3: The Internal R package • Data APIs • Business specific metrics • Custom plotting functions • Custom data manipulation utilities Thursday Morning: An R tools platform in Cosmetic Industry (Jean-Francois Collin)
  • 6. Internal Product 4: The internal dashboard Gravity-AOL
  • 7. External Product 1: Customer facing web app Wednesday afternoon Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz) http://www.showmeshiny.com/
  • 8. External Product 2: analytical back-end Wed afternoon: Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan) Zillow’s Big Data and Real-time Services in R (Yeng Bun)
  • 9. Artwork & Brands Bank Partner Transactions CARD.COM Site / App CARD.COM AdTech Platform APIs RTB Ad Xchgs CARD.COM Analytics Platform Members Visitors 1 2 3 Details: card.com/useR-2014 predict deploy learn CARD.com
  • 10. More good example applications: • http://blog.revolutionanalytics.com/2014/06/how-data- driven-companies-use-r-to-compete.html
  • 11. Ops: Managing an R Environment • Overall: not complex, but there are pain points: • R library management • CRAN, non-CRAN and internal packages • Version management • Dependency management (pulling all dependencies) • Non-R dependencies (especially C++ and Java) • Hardware specifications: How much RAM is enough?
  • 12. Conclusion: Why R? • Plotting • Rich analytical library • More than a DSL: end to end functionality from data APIs to web apps • Solid IDE support • Sturdy, stable easy to support platform • Rapid prototyping
  • 14. Tools: plotting • Major frameworks • Base graphics • lattice • ggplot2 • Useful utilties • grid/gridExtra/gtable • latticeExtra • Color: RColorBrewer/munsell/colorspace/dichromat • gplots (the ‘g’ school) • plotrix • Custom plots • plot.ts • maps • igraph (network visualization) • ggmap • ggvis: interactive graphics • rcharts: interactive graphics, wraps js libraries, not on CRAN yet (look on github) • rgl (3d)/scatterplot3d • vcd (categorical data)
  • 15. Tools: data manipulation • Base R features • Data structures: the data.frame • Vectorized data manipulation: apply, tapply, lapply… • Data structures: ts • Comprehensive, elegant missing data handling (NA) • Packages • Wickham school: reshape2/plyr/dplyr/tidyr • data.table • Time series: zoo, xts, lubridate • Spatial data tools: sp/maptools • The ‘G’ school: gdata
  • 16. Tools: Data interfaces • Connections: read.table(); url() • DBI: RpostgresSQL; RMySQL; RSQLite;… • RODBC; RJDBC: (vertica, redshift) • Native: rredis; rmongodb; prestodb; RCassandra; Rhadoop; … • yaml, XML, rjson, RJSONIO, • MS Excel: xlsx, XLConnect • SAS, SYSTAT, SPSS, Stata…: foreign • Rcurl • RProtoBuf: Efficient cross-language data serialization in R
  • 17. Tools: Package development • Package development: • package.skeleton(); tools (base package) • pkgKitten (CRAN): improvements to package.skeleton • devtools (CRAN) : miscellaneous and very useful tools • gtools: various R programming tools • roxygen2 (CRAN): literate documentation • testthat/testR: unit testing • IDEs: RStudio, Eclipse (StatET), TINN-R, Emacs ESS, …
  • 18. Tools: Web development & reporting • Shiny • Interactive documents • Knitr • Sweave
  • 19. Tools: parallel computing • parallel: lots of features formerly distributed among packages have recently been collected into this base R package • Revolution analytics • Map-Reduce: rmr/rhadoop • H20 (hexadata) • SparkR (not on CRAN yet, look on github)
  • 20. Tools: big or out of memory computing • dplyr: supports database backed data structures • ff: supports file based data • biglm/bigmemory: shared memory matrices • HadoopStreaming
  • 21. Tools: memory profiling • lineprof • profr • proftools • object.size()

Editor's Notes

  1. Introduce self State goal of presentation: overview of the ways that R is being used Define ‘product’ for the non-business folks (deliverable)
  2. Bread and butter for many; everyone does some of this; even non-primary R users often turn to R for this Why R: R has always tried to be a platform for statistical analysis
  3. R fits neatly into this kind of pipeline, there are useful command line utilities
  4. This product is basically an extension of the automated reporting idea.