2014 july use_r
Upcoming SlideShare
Loading in...5
×
 

2014 july use_r

on

  • 891 views

 

Statistics

Views

Total Views
891
Views on SlideShare
852
Embed Views
39

Actions

Likes
4
Downloads
23
Comments
0

1 Embed 39

https://twitter.com 39

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Introduce self <br /> State goal of presentation: overview of the ways that R is being used <br /> Define ‘product’ for the non-business folks (deliverable)
  • Bread and butter for many; everyone does some of this; even non-primary R users often turn to R for this <br /> Why R: R has always tried to be a platform for statistical analysis
  • R fits neatly into this kind of pipeline, there are useful command line utilities
  • This product is basically an extension of the automated reporting idea.

2014 july use_r 2014 july use_r Presentation Transcript

  • R In Production: the products Yasmin Lucero, PhD Senior Statistician, Gravity-AOL UserR! 2014
  • Outline • Internal products • 1. one-off analysis • 2. automated reports • 3. internal R packages • 4. internal dashboards • External products • 1. customer facing web-app • 2. analytical backend service • Ops and the managing of an R environment
  • Internal Product 1: one-off analytical product http://rpubs.com/nathanesau1/21383 Nathan Esau Hilary Parker View slide
  • Internal Product 2: Automated reports Thursday morning: Automated Business Reporting with R (Zhengying (Doro) Lour) R + bash + email R + markdown + web server View slide
  • Internal Product 3: The Internal R package • Data APIs • Business specific metrics • Custom plotting functions • Custom data manipulation utilities Thursday Morning: An R tools platform in Cosmetic Industry (Jean-Francois Collin)
  • Internal Product 4: The internal dashboard Gravity-AOL
  • External Product 1: Customer facing web app Wednesday afternoon Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz) http://www.showmeshiny.com/
  • External Product 2: analytical back-end Wed afternoon: Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan) Zillow’s Big Data and Real-time Services in R (Yeng Bun)
  • Artwork & Brands Bank Partner Transactions CARD.COM Site / App CARD.COM AdTech Platform APIs RTB Ad Xchgs CARD.COM Analytics Platform Members Visitors 1 2 3 Details: card.com/useR-2014 predict deploy learn CARD.com
  • More good example applications: • http://blog.revolutionanalytics.com/2014/06/how-data- driven-companies-use-r-to-compete.html
  • Ops: Managing an R Environment • Overall: not complex, but there are pain points: • R library management • CRAN, non-CRAN and internal packages • Version management • Dependency management (pulling all dependencies) • Non-R dependencies (especially C++ and Java) • Hardware specifications: How much RAM is enough?
  • Conclusion: Why R? • Plotting • Rich analytical library • More than a DSL: end to end functionality from data APIs to web apps • Solid IDE support • Sturdy, stable easy to support platform • Rapid prototyping
  • yasmin.lucero@gmail.com Thanks.
  • Tools: plotting • Major frameworks • Base graphics • lattice • ggplot2 • Useful utilties • grid/gridExtra/gtable • latticeExtra • Color: RColorBrewer/munsell/colorspace/dichromat • gplots (the ‘g’ school) • plotrix • Custom plots • plot.ts • maps • igraph (network visualization) • ggmap • ggvis: interactive graphics • rcharts: interactive graphics, wraps js libraries, not on CRAN yet (look on github) • rgl (3d)/scatterplot3d • vcd (categorical data)
  • Tools: data manipulation • Base R features • Data structures: the data.frame • Vectorized data manipulation: apply, tapply, lapply… • Data structures: ts • Comprehensive, elegant missing data handling (NA) • Packages • Wickham school: reshape2/plyr/dplyr/tidyr • data.table • Time series: zoo, xts, lubridate • Spatial data tools: sp/maptools • The ‘G’ school: gdata
  • Tools: Data interfaces • Connections: read.table(); url() • DBI: RpostgresSQL; RMySQL; RSQLite;… • RODBC; RJDBC: (vertica, redshift) • Native: rredis; rmongodb; prestodb; RCassandra; Rhadoop; … • yaml, XML, rjson, RJSONIO, • MS Excel: xlsx, XLConnect • SAS, SYSTAT, SPSS, Stata…: foreign • Rcurl • RProtoBuf: Efficient cross-language data serialization in R
  • Tools: Package development • Package development: • package.skeleton(); tools (base package) • pkgKitten (CRAN): improvements to package.skeleton • devtools (CRAN) : miscellaneous and very useful tools • gtools: various R programming tools • roxygen2 (CRAN): literate documentation • testthat/testR: unit testing • IDEs: RStudio, Eclipse (StatET), TINN-R, Emacs ESS, …
  • Tools: Web development & reporting • Shiny • Interactive documents • Knitr • Sweave
  • Tools: parallel computing • parallel: lots of features formerly distributed among packages have recently been collected into this base R package • Revolution analytics • Map-Reduce: rmr/rhadoop • H20 (hexadata) • SparkR (not on CRAN yet, look on github)
  • Tools: big or out of memory computing • dplyr: supports database backed data structures • ff: supports file based data • biglm/bigmemory: shared memory matrices • HadoopStreaming
  • Tools: memory profiling • lineprof • profr • proftools • object.size()