- 1. Visualization and Analysis of Big Data with the R Programming Language. Michael E. Driscoll, Ph.D. Presented to Amyris, April 2009.
- 2. “The sexy job in the next ten years will be statisticians.” – Hal Varian, Chief Economist, Google
- 3. What is R? What can it do? • data manipulation • statistics • visualization Why is it different? • created by statisticians • free, open source • extensible via packages
- 4. What is R? Data Manipulation: • database connectivity • slicing & dicing data cubes Statistical Analysis: • hypothesis testing • model fitting • clustering • machine learning Data Visualization
- 5. I. Taming Microarray Data with Bioconductor. Visualization of hybridization artifacts. Statistical analysis: • fit models for the distributions of expression values • test hypotheses about outliers • cluster genes with similar patterns http://www.bioconductor.org
- 6. 1 million transactions during this presentation
- 7. II. Clustering Product Purchases. Which products are ordered together? Statistical analysis: • every customer has a history of product purchases • hierarchically cluster products and customers • other approaches (depending on goals): singular value decomposition
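The hierarchical clustering of products and customers described on slide 7 can be tried on a toy purchase matrix. This is a minimal sketch, not the analysis from the presentation: the customers, products, and purchases are made up, and the choice of a binary (Jaccard-style) distance with average linkage is an assumption.

```r
# Hypothetical purchase history: rows are customers, columns are products,
# 1 means the customer bought the product at least once.
purchases <- matrix(c(
  1, 1, 1, 0, 0,
  1, 1, 0, 0, 0,
  1, 1, 1, 0, 0,
  0, 1, 1, 0, 0,
  0, 0, 0, 1, 1,
  0, 0, 1, 1, 1,
  0, 0, 0, 1, 1,
  0, 0, 0, 1, 0
), ncol = 5, byrow = TRUE,
dimnames = list(paste0("cust", 1:8),
                c("bread", "butter", "jam", "nails", "hammer")))

# Cluster products: two products are close when the same customers buy both.
product_dist <- dist(t(purchases), method = "binary")
product_tree <- hclust(product_dist, method = "average")
product_groups <- cutree(product_tree, k = 2)

# Cluster customers: two customers are close when their baskets overlap.
customer_dist <- dist(purchases, method = "binary")
customer_tree <- hclust(customer_dist, method = "average")
customer_groups <- cutree(customer_tree, k = 2)

print(product_groups)
print(customer_groups)
```

Calling `plot(product_tree)` draws the dendrogram, which is usually the quickest way to judge how many groups are worth cutting.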
- 8. 2 billion clicks during this presentation
- 9. III. Optimizing Online Advertising. How confident are we that B beats A? Statistical analysis: • estimate posterior distributions for click rates from observed data • test hypothesis that the click-rate of a given ad A is greater than for ad B
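One way to put a number on "how confident are we that B beats A?" from slide 9 is to draw from a posterior distribution for each ad's click rate and compare the draws. The sketch below assumes a uniform Beta(1, 1) prior and made-up click and impression counts; it is not necessarily the model used in the presentation.

```r
set.seed(1)

# Hypothetical observed data for two ads.
clicks_a <- 120; impressions_a <- 10000
clicks_b <- 150; impressions_b <- 10000

# With a Beta(1, 1) prior and binomial clicks, the posterior for each
# click rate is Beta(1 + clicks, 1 + non-clicks).
n_draws <- 100000
rate_a <- rbeta(n_draws, 1 + clicks_a, 1 + impressions_a - clicks_a)
rate_b <- rbeta(n_draws, 1 + clicks_b, 1 + impressions_b - clicks_b)

# Monte Carlo estimate of the posterior probability that B beats A.
prob_b_beats_a <- mean(rate_b > rate_a)

# 95% credible interval for the difference in click rates (B minus A).
diff_interval <- quantile(rate_b - rate_a, probs = c(0.025, 0.975))

print(prob_b_beats_a)
print(diff_interval)
```

Swapping the comparison to `mean(rate_a > rate_b)` gives the probability for the opposite hypothesis, that ad A has the higher click rate.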
- 10. IV. A Tale of Two Pitchers Hamels Webb
- 11. R Nuts and Bolts “The best thing about R is that it was developed by statisticians. The worst thing about R is that… it was developed by statisticians.” – Bo Cowgill, Google
- 12. Data Manipulation. Getting Data In, data formats: • MySQL • Delimited (CSV, Excel) • ODBC (Oracle, MS-SQL) • Matlab Getting Data Out, graphic formats: • Vector (PDF, EPS, SVG) • Raster (PNG, TIFF) Diagram: SQL, Excel, Matlab. library(RMySQL); driver <- dbDriver("MySQL"); con <- dbConnect(driver, user = "tgardner", password = "julien05", host = "data.amyris.com", dbname = "biofx"); resultSet <- dbSendQuery(con, "SELECT * FROM assay"); data <- fetch(resultSet, n = -1)
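Without access to a database, the "data in, data out" round trip from slide 12 can be tried with a delimited file and a graphics device. This is a minimal sketch using base R only; the `assay` table, its column names, and its values are hypothetical, and temporary files stand in for real paths.

```r
# Hypothetical assay table (made-up values, not from the presentation).
assay <- data.frame(sample_id = 1:5,
                    reading = c(0.12, 0.35, 0.61, 0.88, 1.02))

# Getting data in: write a CSV file, then read it back as a data frame.
csv_path <- tempfile(fileext = ".csv")
write.csv(assay, csv_path, row.names = FALSE)
data_in <- read.csv(csv_path)

# Getting data out as a vector graphic (PDF).
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)
plot(data_in$sample_id, data_in$reading, type = "b",
     xlab = "sample_id", ylab = "reading")
invisible(dev.off())

# Getting data out as a raster graphic (PNG), if this R build supports it.
if (capabilities("png")) {
  png_path <- tempfile(fileext = ".png")
  png(png_path)
  plot(data_in$sample_id, data_in$reading, type = "b",
       xlab = "sample_id", ylab = "reading")
  invisible(dev.off())
}
```

The pattern is the same for every device: open it, draw, then close it with `dev.off()` so the file is flushed to disk.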
- 13. Statistical Methods
- 14. Extending R with Packages CRAN http://cran.r-project.org • ~ 2000 packages • organized by field • easy to install > install.packages("lattice")
- 15. R Packages: Beautiful Colors with Colorspace library("colorspace"); red <- LAB(50, 64, 64); blue <- LAB(50, -48, -48); mixcolor(10, red, blue)
- 16. R Packages: Creating Panel Plots with Lattice library("lattice"); xyplot(x ~ y | pitch_type, data = gameday)
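The panel plot call on slide 16 needs a `gameday` data frame, which the transcript does not include. The sketch below assumes a stand-in with random `x` and `y` values and three invented pitch types, just to show how the `| pitch_type` part of the formula produces one panel per group.

```r
library(lattice)

set.seed(7)

# Hypothetical stand-in for the gameday data (random values, invented types).
gameday <- data.frame(
  pitch_type = rep(c("fastball", "curveball", "changeup"), each = 50),
  x = rnorm(150),
  y = rnorm(150)
)

# Same formula as the slide: the left-hand side (x) goes on the vertical
# axis, the right-hand side (y) on the horizontal axis, and the variable
# after "|" splits the data into one panel per pitch type.
panel_plot <- xyplot(x ~ y | pitch_type, data = gameday)

# Lattice plots are objects; inside a script they must be printed to draw.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
print(panel_plot)
invisible(dev.off())
```

At the interactive prompt the explicit `print()` is unnecessary, because R prints the returned plot object automatically.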
- 17. Getting Started. Download at R-project.org: http://www.r-project.org Choose a UI: • Emacs – ESS • JGR – Java GUI for R • Rattle
- 18. Getting Help. Online: • use inline help > ?plot • search/post at R-help http://tolstoy.newcastle.edu.au/R Books: • Modern Applied Statistics with S, W.N. Venables & B.D. Ripley • Use R series, includes 20 volumes http://www.springer.com/series/6991
- 19. Data Desktop
- 20. Which is Easier? Coding or Clicking
- 21. R-Based Dashboards. A Simple Script: setContentType("text/html"); png("/var/www/hello.png"); plot(sample(100, 100), col = 1:8, pch = 19); dev.off(); cat("<html>"); cat("<body>"); cat("<h1>hello world</h1>"); cat('<img src="../hello.png">'); cat("</body>"); cat("</html>") Download Jeff Horner’s rApache at http://biostat.mc.vanderbilt.edu/rapache/
- 22. R-Based Dashboards http://labs.dataspora.com/gameday
- 23. Contacting Us 350 Townsend St, Suite 270 San Francisco, CA 415-860-4347 inquire@dataspora.com
