R in Production: The Products
Yasmin Lucero, PhD
Senior Statistician, Gravity-AOL
useR! 2014
Speaker notes:
• Introduce self. State the goal of the presentation: an overview of the ways that R is being used. Define ‘product’ for the non-business folks (a deliverable).
• Bread and butter for many: everyone does some of this, and even non-primary R users often turn to R for it. Why R: R has always tried to be a platform for statistical analysis.
• R fits neatly into this kind of pipeline; there are useful command-line utilities.
• This product is basically an extension of the automated reporting idea.
Outline
• Internal products
  1. One-off analysis
  2. Automated reports
  3. Internal R packages
  4. Internal dashboards
• External products
  1. Customer-facing web app
  2. Analytical back-end service
• Ops: managing an R environment
Internal Product 1: One-off analytical product
http://rpubs.com/nathanesau1/21383
Nathan Esau
Hilary Parker
Internal Product 2: Automated reports
Thursday morning: Automated Business Reporting with R (Zhengying (Doro) Lour)
• R + bash + email
• R + markdown + web server
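The "R + bash + email" and "R + markdown + web server" patterns can be sketched with base R alone. A script, run on a schedule (for example by cron calling Rscript), computes a summary and writes a dated Markdown file that a web server or mail step could then pick up. The data, the `write_report()` helper, and the output directory are hypothetical placeholders, not taken from the talk.

```r
# Minimal automated-report sketch using only base R.
# Hypothetical data standing in for a nightly extract.
daily <- data.frame(
  day    = as.Date("2014-06-30") + 0:6,
  visits = c(120, 135, 128, 150, 170, 90, 85)
)

# Write a small Markdown report; out_dir is a placeholder location.
write_report <- function(df, out_dir = tempdir()) {
  path <- file.path(out_dir, paste0("report-", format(max(df$day)), ".md"))
  lines <- c(
    "# Weekly visits report",
    "",
    paste0("Period: ", format(min(df$day)), " to ", format(max(df$day))),
    paste0("Total visits: ", sum(df$visits)),
    paste0("Mean daily visits: ", round(mean(df$visits), 1)),
    "",
    "| day | visits |",
    "|-----|--------|",
    paste0("| ", format(df$day), " | ", df$visits, " |")  # one row per day
  )
  writeLines(lines, path)
  path
}

report_path <- write_report(daily)
cat(readLines(report_path), sep = "\n")
```

A crontab line such as `0 6 * * 1 Rscript weekly_report.R` (script name hypothetical) would run it weekly. In practice knitr or R Markdown would usually do the rendering, and a bash step would mail or publish the result.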
Internal Product 3: The internal R package
• Data APIs
• Business-specific metrics
• Custom plotting functions
• Custom data manipulation utilities
Thursday morning: An R tools platform in Cosmetic Industry (Jean-Francois Collin)
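A "business-specific metric" in an internal package is often just a small, well-documented function that everyone in the company calls the same way. The sketch below uses roxygen2-style comments, so `roxygen2::roxygenise()` could generate its help page. The metric (click-through rate) and the function name `ctr` are hypothetical examples.

```r
#' Click-through rate per group
#'
#' @param clicks Numeric vector of click counts.
#' @param impressions Numeric vector of impression counts (same length).
#' @param group Grouping vector, e.g. campaign IDs.
#' @return Named numeric vector: total clicks / total impressions per group.
#' @export
ctr <- function(clicks, impressions, group) {
  stopifnot(length(clicks) == length(impressions),
            length(clicks) == length(group))
  clicks_by_group <- tapply(clicks, group, sum)
  impr_by_group   <- tapply(impressions, group, sum)
  rate <- clicks_by_group / impr_by_group
  # tapply() returns a 1-d array; convert it to a plain named vector.
  setNames(as.vector(rate), names(rate))
}

ctr(clicks = c(5, 3, 10), impressions = c(100, 50, 400), group = c("a", "a", "b"))
```

Aggregating before dividing (total clicks over total impressions) avoids the common mistake of averaging per-row rates. Encoding that choice once in a shared package is the point of this kind of product.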
Internal Product 4: The internal dashboard
Gravity-AOL
External Product 1: Customer-facing web app
Wednesday afternoon: Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz)
http://www.showmeshiny.com/
External Product 2: Analytical back-end
Wednesday afternoon:
• Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan)
• Zillow’s Big Data and Real-time Services in R (Yeng Bun)
[Diagram, CARD.com: Artwork & Brands; Bank Partner Transactions; CARD.COM Site / App; CARD.COM AdTech Platform APIs; RTB Ad Xchgs; CARD.COM Analytics Platform; Members; Visitors; cycle: predict, deploy, learn. Details: card.com/useR-2014]
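An R analytical back-end often separates the learn, deploy, and predict steps. A model is fitted offline and serialized, a long-running R process loads it once, and that process then scores incoming requests. This is a minimal base-R sketch of that split. The model, the `score()` helper, and the file name are hypothetical, and the transport that delivers requests (HTTP, Rserve, a queue) is left out.

```r
# Sketch of the learn / deploy / predict split for an R analytical back-end.
# The model is hypothetical; mtcars ships with base R.

# 1. Learn: fit a model offline and serialize it to disk.
model <- lm(mpg ~ wt + hp, data = mtcars)
model_file <- file.path(tempdir(), "mpg_model.rds")
saveRDS(model, model_file)

# 2. Deploy: a separate, long-running R process loads the model once at startup.
serving_model <- readRDS(model_file)

# 3. Predict: score incoming requests, here a data.frame of new rows.
score <- function(new_rows) {
  as.numeric(predict(serving_model, newdata = new_rows))
}

requests <- data.frame(wt = c(2.5, 3.5), hp = c(110, 180))
score(requests)
```

Loading the model once, instead of once per request, keeps per-request latency down to the cost of `predict()` itself.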
More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-driven-companies-use-r-to-compete.html
Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
  • R library management
  • CRAN, non-CRAN and internal packages
  • Version management
  • Dependency management (pulling all dependencies)
  • Non-R dependencies (especially C++ and Java)
  • Hardware specifications: how much RAM is enough?
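For version management, one low-tech approach is to snapshot the package versions on a known-good machine. You can then check any other machine, such as a production server, against that snapshot. This is a minimal base-R sketch of that idea, not a replacement for a dedicated dependency manager. The helper names and the CSV file are hypothetical.

```r
# Sketch: snapshot installed package versions, then check a library against it.
current_versions <- function() {
  ip <- installed.packages()[, c("Package", "Version"), drop = FALSE]
  # installed.packages() walks .libPaths() in order, so the first copy of a
  # package is the one library() would load; drop shadowed duplicates.
  ip <- ip[!duplicated(ip[, "Package"]), , drop = FALSE]
  data.frame(Package = unname(ip[, "Package"]),
             Version = unname(ip[, "Version"]),
             stringsAsFactors = FALSE)
}

snapshot_file <- file.path(tempdir(), "package-versions.csv")
write.csv(current_versions(), snapshot_file, row.names = FALSE)

# Later, e.g. on a production server: list packages that are missing or differ.
check_versions <- function(file) {
  # Read versions as character so "1.10" is not parsed as the number 1.1.
  expected <- read.csv(file, colClasses = "character")
  merged <- merge(expected, current_versions(), by = "Package",
                  all.x = TRUE, suffixes = c(".expected", ".current"))
  merged[is.na(merged$Version.current) |
           merged$Version.expected != merged$Version.current, ]
}

mismatches <- check_versions(snapshot_file)
nrow(mismatches)  # 0 when nothing has changed since the snapshot
```

This covers only R packages. Non-R dependencies such as C++ libraries or a Java runtime still need to be tracked separately.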
Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end-to-end functionality, from data APIs to web apps
• Solid IDE support
• Sturdy, stable, easy-to-support platform
• Rapid prototyping
yasmin.lucero@gmail.com
Thanks.
Tools: plotting
• Major frameworks
  • Base graphics
  • lattice
  • ggplot2
• Useful utilities
  • grid/gridExtra/gtable
  • latticeExtra
  • Color: RColorBrewer/munsell/colorspace/dichromat
  • gplots (the ‘g’ school)
  • plotrix
• Custom plots
  • plot.ts
  • maps
  • igraph (network visualization)
  • ggmap
  • ggvis: interactive graphics
  • rCharts: interactive graphics, wraps JS libraries; not on CRAN yet (look on GitHub)
  • rgl (3D)/scatterplot3d
  • vcd (categorical data)
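Base graphics need no extra packages, which makes them handy in unattended scripts such as the automated reports above. This sketch draws a time series with `plot()`, which dispatches to `plot.ts` for a `ts` object, and writes it to a PDF file. The PDF device does not need a display. The data are simulated and the file name is a placeholder.

```r
# Sketch: a report chart with base graphics, written to a PDF file.
set.seed(1)
visits <- ts(round(100 + cumsum(rnorm(52, sd = 5))),  # simulated weekly data
             frequency = 52, start = c(2014, 1))

chart_file <- file.path(tempdir(), "weekly-visits.pdf")
pdf(chart_file, width = 7, height = 4)
plot(visits, main = "Weekly visits (simulated)", ylab = "Visits", xlab = "Year")
abline(h = mean(visits), lty = 2, col = "grey40")  # dashed line at the mean
invisible(dev.off())                               # close the device, flush file

file.exists(chart_file)
```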
Tools: data manipulation
• Base R features
  • Data structures: the data.frame
  • Vectorized data manipulation: apply, tapply, lapply…
  • Data structures: ts
  • Comprehensive, elegant missing data handling (NA)
• Packages
  • Wickham school: reshape2/plyr/dplyr/tidyr
  • data.table
  • Time series: zoo, xts, lubridate
  • Spatial data tools: sp/maptools
  • The ‘g’ school: gdata
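A few lines of base R show the features listed above working together: a data.frame, a grouped summary with `tapply()`, and NA values that propagate until you explicitly drop them. The sales data are made up for illustration.

```r
# Base R data manipulation: a data.frame, grouped summaries with tapply(),
# and explicit handling of missing values (NA).
sales <- data.frame(
  region = c("east", "east", "west", "west", "west"),
  amount = c(10, NA, 5, 7, 9)
)

# NA propagates by default, so the east total is NA ...
totals_raw <- tapply(sales$amount, sales$region, sum)
totals_raw  # east = NA, west = 21

# ... unless missing values are explicitly dropped.
totals <- tapply(sales$amount, sales$region, sum, na.rm = TRUE)
totals      # east = 10, west = 21

# NA checks are vectorized too: count missing values per column.
colSums(is.na(sales))
```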
Tools: Data interfaces
• Connections: read.table(); url()
• DBI: RPostgreSQL; RMySQL; RSQLite; …
• RODBC; RJDBC (Vertica, Redshift)
• Native: rredis; rmongodb; prestodb; RCassandra; RHadoop; …
• yaml, XML, rjson, RJSONIO
• MS Excel: xlsx, XLConnect
• SAS, SYSTAT, SPSS, Stata…: foreign
• RCurl
• RProtoBuf: efficient cross-language data serialization in R
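R's connection abstraction lets `read.table()` and its wrappers, such as `read.csv()`, read from many sources through the same call. The source can be a file, a URL opened with `url()`, or, as here, an in-memory `textConnection()`. The CSV content is made up for illustration.

```r
# Connections: read.csv() reads from any connection. Here it is an in-memory
# textConnection; file("...") or url("http://...") would be used the same way.
raw <- "user_id,country,spend
1,US,12.5
2,DE,NA
3,US,7.25"

con <- textConnection(raw)
users <- read.csv(con, stringsAsFactors = FALSE)
close(con)  # read.csv() does not close a connection that was already open

str(users)
mean(users$spend, na.rm = TRUE)  # 9.875
```

DBI back-ends such as RSQLite follow a similar data.frame-in, data.frame-out pattern through `dbConnect()`, `dbGetQuery()`, and `dbDisconnect()`.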
Tools: Package development
• Package development:
  • package.skeleton(); tools (base package)
  • pkgKitten (CRAN): improvements to package.skeleton
  • devtools (CRAN): miscellaneous and very useful tools
  • gtools: various R programming tools
  • roxygen2 (CRAN): literate documentation
  • testthat/testR: unit testing
• IDEs: RStudio, Eclipse (StatET), Tinn-R, Emacs ESS, …
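`package.skeleton()` (in the base utils package) turns functions that already exist in an R session into a package directory. That directory contains a DESCRIPTION file, a NAMESPACE file, R/ code, and man/ pages. The package name `bizmetrics` and its function `weekly_growth()` are hypothetical.

```r
# Sketch: turn an in-memory function into a package scaffold.
weekly_growth <- function(x) {
  # Percentage change between consecutive values.
  100 * diff(x) / head(x, -1)
}

pkg_root <- tempdir()
utils::package.skeleton(name = "bizmetrics", list = "weekly_growth",
                        environment = environment(),  # where weekly_growth lives
                        path = pkg_root, force = TRUE)

# The scaffold contains DESCRIPTION, NAMESPACE, R/ and man/.
list.files(file.path(pkg_root, "bizmetrics"))
```

From a shell, `R CMD build` and `R CMD check` on the directory would then build and check the package. devtools and roxygen2 automate much of the remaining bookkeeping.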
Tools: Web development & reporting
• Shiny
• Interactive documents
• knitr
• Sweave
Tools: parallel computing
• parallel: features formerly distributed in the multicore and snow packages have been collected into this base R package (since R 2.14.0)
• Revolution Analytics
• Map-Reduce: rmr/RHadoop
• H2O (0xdata)
• SparkR (not on CRAN yet; look on GitHub)
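With the parallel package, an embarrassingly parallel `lapply()` becomes `parLapply()` over a cluster of worker R processes. This sketch uses a socket cluster, which works on all platforms; on Unix-alikes, `mclapply()` forks the current process instead. The two-worker count and the `slow_square()` task are assumptions for illustration.

```r
library(parallel)

# A deliberately slow, independent task standing in for real work.
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

n_workers <- 2                # assumption; detectCores() reports what is available
cl <- makeCluster(n_workers)  # socket (PSOCK) cluster of separate R processes
results <- tryCatch(
  parLapply(cl, 1:8, slow_square),
  finally = stopCluster(cl)   # always shut the workers down
)
unlist(results)  # 1 4 9 16 25 36 49 64
```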
Tools: big or out-of-memory computing
• dplyr: supports database-backed data structures
• ff: supports file-based data
• biglm: bounded-memory regression; bigmemory: shared-memory matrices
• HadoopStreaming
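The common idea behind out-of-memory tools is to stream data in pieces and keep only small summaries in memory. biglm, for example, updates a regression fit chunk by chunk. This base-R sketch applies the same pattern to a simple mean. It is a stand-in for those packages, not their API, and the data file is generated for illustration.

```r
# Sketch: stream a file through a connection in fixed-size chunks and keep
# only running totals in memory.
data_file <- tempfile(fileext = ".txt")
writeLines(as.character(1:10000), data_file)  # stand-in for a large file

chunked_mean <- function(path, chunk_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  count <- 0
  repeat {
    chunk <- readLines(con, n = chunk_size)  # continues where the last read ended
    if (length(chunk) == 0) break            # end of file
    values <- as.numeric(chunk)
    total <- total + sum(values)
    count <- count + length(values)
  }
  total / count
}

chunked_mean(data_file)  # 5000.5
```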
Tools: memory profiling
• lineprof
• profr
• proftools
• object.size()
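`object.size()` gives a quick estimate of how much memory an object uses. That estimate helps answer the "how much RAM is enough?" question from the Ops slide before reaching for a profiler. The vectors below are arbitrary examples.

```r
# object.size() estimates the memory used by an R object.
ints    <- integer(1e6)   # 4 bytes per element
doubles <- numeric(1e6)   # 8 bytes per element
chars   <- rep("a", 1e6)  # one pointer per element to a single cached string

sizes <- sapply(list(integer = ints, double = doubles, character = chars),
                object.size)
sizes                                     # bytes, named by vector type
format(object.size(doubles), units = "Mb")
```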