Getting Up to Speed with R: Certificate Program in R for Statistical Analysis, Visualization and Modeling



The Institute for Statistics Education offers a graduate-level certificate program in R for those who want to use the R statistical programming environment for statistical analysis, visualization and modeling. The Institute offers continuing education credits as well as a program completion certificate. Courses are offered year-round on a flexible schedule (there is no semester system). The program content is equivalent to 18 credits in the US academic system. Faculty include R core development team members, package developers and authors of books on R, among them Paul Murrell, Hadley Wickham, Thomas Lumley, Sudha Purohit, Luis Torgo and John Verzani.

Join this webinar to learn about the structure of the certificate and the courses available through the Institute, which fall into three categories:

Basic programming skills in R
Statistical methods implemented in R
R applied to specific domains

Published in: Technology, Education


  2. About
     • First course in 2002 (resampling methods)
     • 2003-2004: added courses in data mining, modeling and introductory statistics
     • Now 100+ courses
     • A hybrid model between:
        • Professional development (topic-centered; scheduling accommodates working professionals)
        • Academic (homework and assessment)
     • Taught by noted authorities
     • Topics: statistics, predictive modeling, data mining, R, optimization, risk modeling, clinical trials…
  3. Why learn R? Why take classes?
  4. The spread of R:
     Phase 1: R was started in 1993 by academics and gained popularity in universities around the world (open source and free!).
     Phase 2: PhD statisticians who used R at university took it to their complex quantitative modeling jobs in industry.
     Phase 3 (now): R is ubiquitous:
     • Industry: now that PhD statisticians have seeded R in their companies, other analysts there need to know it.
     • Academia: researchers in a variety of fields who do statistics but are not primarily statisticians use R.
  5. Why learn R? Let’s look at what employers are looking for.
     [Chart: relative proportion of mentions of R, SAS and SPSS in the “job requirements” section of job postings. A single job may mention more than one tool.]
     Source: survey of approx. 4,000 analytics/statistics job postings on various job sites, May 2012.
  6. ConAgra Foods’ Human Capital Analytics/Reporting (HCA/R) program is searching for a project manager/statistician … development of predictive modeling processes to answer different business issues. Excellent computer skills, specifically with advanced Excel (v-lookups, pivot tables, macros), R, and other open source software. Experience in configuration of data to support complex data mining & statistical analysis.
     SAS is seeking a Research Statistician to apply cutting-edge econometric models ... demonstrated experience or knowledge of computer programming; ... particularly with applied econometric modeling or time series analysis; the SAS system; statistical software products, such as WinBugs, R, Stata, EViews, OxMetrics, or S-Plus.
     AmazonLocal, Applied Machine Learning Scientist: run sampling, clustering, classification, etc. on large datasets using a variety of analytics software (e.g. SAS, Python, R, etc.).
     SRA International: ... use of statistical algorithms, techniques and models to define data for data integrity and process analytics; use of data mining techniques to define data ...; experience and demonstrated expertise with multiple data mining tools, including SAS, SPSS, R, Weka, etc.
     The need for R goes hand in hand with the need for higher-level statistical skills.
  7. How is R being used in the real world?
  8. Who are your fellow students? [Graphic: a sample of students by country, sector and role]
     Countries: China, Australia, Germany, India, Canada, UK, Netherlands, Brazil, Denmark
     Sectors: academia, industry, government
     Roles and employers: bioinformaticist; PhD candidate in epidemiology; database marketer, international bank; professor of medicine; survey researcher; health researcher; digital marketer; J&J; plant ecologist; project manager, large consulting firm; PhD student in animal embryology; circulation manager, Countryside Pubs.; statistical geneticist; anthropologist, human remains; web developer; casualty actuary; farmer, Calif. Central Valley; forecaster, Walt Disney; commodities analyst, hedge fund; researcher, K-12 school dist.; risk analyst, agriculture dept.; CDC epidemiologist; team coordinator, aerospace medicine; forest monitor
  9. Executive and assistant professor at an academic medical center: I have extensive experience with SPSS … I see R as the future for quantitative work and need to begin doing more of my work in R.
     Analyst with a state government natural resource agency: We have survey designs for regional monitoring that we continually need to evaluate and improve. Currently, I program in C and rely heavily on Monte Carlo modeling. I plot in Excel and have wanted to learn R to get greater flexibility.
     Analyst with a health and human service agency: My job is mostly data analysis and some statistical modeling, which is handled via SAS and PL/SQL. Other agencies have incorporated R. I am looking to be prepared should our agency adopt R, as well as to understand how R compares with SAS, with the hope of drawing on the strengths of both in the future.
     Marketing analyst, international banking: Since we are manipulating tons of data at customer level for more than 27 countries, R would be the perfect complementary tool (we have been using SAS) for customer analytics.
     Analyst with a non-profit organization: We do quite a bit of data analysis (mostly descriptive work and GIS mapping), and I started teaching myself R a few years ago in order to automate our routine data cleaning.
     Database marketer, banking: I have used SAS for 8 years and also have experience in FICO Model Builder, but I am new to R and want to learn the comprehensive packages that are not available in base SAS, to do more advanced analytics.
     Commodities analyst at a hedge fund: I’m looking to use R to build more robust, stable and dynamic econometric models.
  10. Why take classes? Why not learn on your own?
     • R is not like SAS or SPSS:
        • SAS has two very distinct user types:
           • Programmer
           • Statistical modeler & analyst
        • SPSS serves mainly the latter
        • R is powerful, but involves more programming and “messiness,” even when used purely in analysis/modeling mode.
     • It often helps to have an expert on hand while learning R.
     • 4-week courses allow an iterative process: a short, intensive learning period in which you ask lots of questions. Apply what you learn. Come back later for another 4-week class. Learn more. Apply. Repeat.
  11. Certificate Program Content
     PREREQUISITES: None for entry into the program, but introductory statistics is a prerequisite for some courses.
     6 REQUIRED (R-specific):
     • Intro to R – Data Handling
     • Intro to R – Statistical Analysis
     • Programming in R
     • Programming in R – Adv.
     • Modeling in R
     • Graphics in R
     6 ELECTIVES (include R):
     • Data mining
     • Probability Distributions
     • Spatial
     • Microarray
     • Resampling
     • SVM
     • Bootstrap
     • Clinical Trials Apps
     • Logistic Regression
     • ggplot2
     • Smoothing with P-splines
     • GLM
     • Count data
     • Survey Analysis
  12. 1. The principles of R programming:
     • Introduction to R – Data Handling (Paul Murrell) introduces basic expressions, symbols, assignment, functions, packages, use of code editors (Emacs), the workspace, data types and structures, subsetting, accessor functions, classes, type coercion, text files, binary files, large files, memory management, the apply function, tabulation, aggregation, merging and splitting data, reshaping, and text processing.
     • Programming in R (2 courses with Hadley Wickham) covers lexical scoping, dynamic scoping, frames, environments, namespaces, active bindings, quoting, evaluation, calling from other functions, string processing (stringr), dates and times (lubridate), regular expressions, XML and XPath, extracting data with SQL, executing SQL in R, writing compact and efficient code (helper functions, lapply), anonymous functions, first-class functions, object-oriented programming, S3, tips for producing reliable code, and functions and options to help with debugging, speed and testing.
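To make a few of the topics above concrete (subsetting, aggregation, splitting and merging data, applying anonymous functions, lexical scoping and S3 classes), here is a minimal base-R sketch. It is not course material: it assumes only the built-in mtcars data set, and make_counter and the "temperature" class are hypothetical names invented for illustration.

```r
# Data handling (base R only): subset rows and columns of the built-in mtcars data
small_cars <- mtcars[mtcars$cyl %in% c(4, 6), c("mpg", "cyl", "wt")]

# Aggregation: mean mpg for each number of cylinders, returned as a data frame
mpg_by_cyl <- aggregate(mpg ~ cyl, data = small_cars, FUN = mean)

# Splitting: a list of data frames, one per cylinder group, then an anonymous
# function applied to each piece (sapply simplifies the result to a vector)
n_by_cyl <- sapply(split(small_cars, small_cars$cyl), function(df) nrow(df))

# Merging: join a small (hypothetical) lookup table on the shared "cyl" column
cyl_labels <- data.frame(cyl = c(4, 6), label = c("four", "six"))
merged <- merge(mpg_by_cyl, cyl_labels, by = "cyl")
print(merged)

# Lexical scoping: the returned function keeps access to `count`
# in the environment where it was created
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # `<<-` updates `count` in the enclosing environment
    count
  }
}
counter <- make_counter()
first_call <- counter()
second_call <- counter()

# S3: print() dispatches to print.temperature because of the class attribute
temperature <- function(x) structure(list(value = x), class = "temperature")
print.temperature <- function(x, ...) {
  cat(x$value, "degrees\n")
  invisible(x)
}
print(temperature(21))
```

Each call to counter() sees the same `count`, because the function remembers the environment it was defined in rather than the one it is called from.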
  13. 2. Plotting and visualizing data in R:
     • Graphics in R (Paul Murrell) covers the core R capabilities for graphing and teaches you to produce key statistical plots such as scatterplots.
     • ggplot2 (Hadley Wickham) teaches how to use his package, which has its own graphics language (a “grammar of graphics”) built on top of R, to create graphs.
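A minimal sketch of the kind of scatterplot both courses cover, assuming the built-in mtcars data: the first plot uses base graphics, and the second draws the same plot with ggplot2 only if that package happens to be installed. Output goes to a temporary PDF file, an assumption made so the sketch runs without a screen.

```r
# Write plots to a temporary PDF so the sketch also runs without a display
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Base graphics: scatterplot of fuel economy against weight, plus a fitted line
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Base graphics scatterplot")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# ggplot2 version of the same plot, drawn only if the package is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(x = wt, y = mpg)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    ggplot2::labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
  print(p)  # ggplot objects are drawn when they are printed
}

invisible(dev.off())  # close the PDF device so the file is written
```

The difference in style shows even here: base graphics draws on the device step by step, while ggplot2 builds a plot object from layers and draws it when printed.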
  14. 3. Application/method/domain specific:
     Other classes are application-oriented: syntax and programming are discussed as necessary on the path to getting R to accomplish something specific. Examples: intro stats, statistical modeling, microarray analysis, data mining, survey analysis.
     In the most basic of these, Introduction to R – Statistical Analysis, some familiarity with statistical procedures is assumed, and you learn R by executing these procedures (t-tests, chi-square tests, correlation, regression, etc.) in R. In other cases, the emphasis is on learning the method, and R is simply the chosen tool.
     Let’s see an example from the Statistical Analysis course.
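The procedures named above map onto base R functions such as t.test, chisq.test, cor/cor.test and lm. The sketch below runs each once on the built-in mtcars data; the choice of variables is ours for illustration, not taken from the course.

```r
# Two-sample (Welch) t-test: does mean mpg differ between automatic (am = 0)
# and manual (am = 1) transmissions?
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Chi-square test of independence: transmission type vs. number of cylinders.
# Some expected counts are small, so R warns that the approximation may be
# inaccurate; the warning is suppressed here only to keep the output short.
tab <- table(mtcars$am, mtcars$cyl)
ct <- suppressWarnings(chisq.test(tab))
print(ct)

# Pearson correlation between weight and mpg, with a test of zero correlation
r <- cor(mtcars$wt, mtcars$mpg)
corr_test <- cor.test(mtcars$wt, mtcars$mpg)
print(corr_test)

# Simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit)$coefficients)
```

Each function returns an object whose parts (p.value, statistic, coefficients) can be pulled out by name, which is how results are usually reused in further code.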
  15. Snapshot: Regression. The instructions are given step by step in Lesson 3 of “Introduction to R – Statistical Analysis.”
     The lm function will estimate the regression parameters for the simple linear regression model. For the two models specified above we have:
     > lm(total ~ w.class, data = d)
     Call:
     lm(formula = total ~ w.class, data = d)
     Coefficients:
     (Intercept)      w.class
         159.815        2.732
     which gives estimates $\hat{b}_0 = 159.815$ and $\hat{b}_1 = 2.732$.
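The estimates that lm reports for a simple linear regression are the least-squares values $\hat{b}_1 = \mathrm{cov}(x, y)/\mathrm{var}(x)$ and $\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}$. The lesson's data frame d is not reproduced here, so this sketch simulates a hypothetical stand-in with the same column names; its numbers will not match the 159.815 and 2.732 above.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical stand-in for the lesson's data frame `d`: simulated values only
w.class <- rep(1:10, each = 5)
total <- 160 + 2.7 * w.class + rnorm(length(w.class), sd = 5)
d <- data.frame(total = total, w.class = w.class)

# Fit the simple linear regression and pull out the estimates by name
fit <- lm(total ~ w.class, data = d)
b <- coef(fit)  # named vector with elements "(Intercept)" and "w.class"
print(b)

# The same least-squares estimates from the closed-form formulas
b1 <- cov(d$w.class, d$total) / var(d$w.class)
b0 <- mean(d$total) - b1 * mean(d$w.class)
print(c(b0 = b0, b1 = b1))
```

Using coef() rather than copying numbers from the printed output keeps later calculations tied to the fitted model.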
  16. KIM ASKS
     Hi John, I was plotting the residuals from a linear regression (example on page 19 of lesson 3), and there was a delay before the plots would show. The message on the R console was "Waiting to confirm page change." By clicking on the graphics, I could switch from one plot to the next. Is there any way to make them tile so I can see all of them at once, or any way to go back and forth once they've printed on the graphics page?
     JOHN VERZANI REPLIES
     A couple of possibilities exist:
     You can partition your graphics device so that more than one graphic will appear. For example, par(mfrow=c(2,2)) will set up a 2 by 2 grid, perfect for the plot function called on the output of the lm function.
     On some implementations you can record plots and scroll back through them. For Windows users, the RGui application (your basic interface) allows you to turn on recording, I think by right-clicking on a plot (if I'm wrong let me know, and I'll check). For RStudio, the graphs are already recorded; there are arrows to scroll.
     Hope one of those works for you. --J
     SABINA CHIMES IN
     Where do you type it? In the plot command? I have tried:
     > plot(res.pipeline, par(mfrow=c(2,2)))
     and get
     Error in plot.lm(res.pipeline, par(mfrow = c(2, 2))) : which must be in 1:6
     ... How do you keep track of all these different ways of doing things? I find that your comments are amazing...
     JOHN REPLIES
     The par settings are done in their own command (well, some are). Try:
     par(mfrow=c(2,2))
     plot(res.pipeline)
     The ".lm" extra bit isn't necessary (though it doesn't hurt), as R will use the class of res.pipeline to find that function in most usual cases. Let me know if that doesn't help.
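John's suggestion can be sketched as follows. The built-in cars data and the model fit stand in for Sabina's res.pipeline (an assumption; her data are not shown). Her error arose because the second positional argument of plot.lm is which, the index of the diagnostic plots to draw (1 to 6), so the value of the par() call was being passed as which.

```r
# Draw into a temporary PDF file so the sketch runs without a screen
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)

# The built-in `cars` data stand in for the model behind res.pipeline
fit <- lm(dist ~ speed, data = cars)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the old settings
grid_setting <- par("mfrow")
plot(fit)       # dispatches to plot.lm, which draws four diagnostic plots by default
par(old_par)    # restore the previous one-plot-per-page layout

# plot(fit, which = 1) would draw only the residuals-vs-fitted plot
invisible(dev.off())
```

Saving the value returned by par() and passing it back afterwards keeps later plots from being squeezed into the grid.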
  17. ALTA ASKS
     John, what does "masked" in the following error message mean? And what is .GlobalEnv? Thanks in advance.
     > attach(kid.weights)
     The following object(s) are masked _by_ .GlobalEnv:
     age
     JOHN REPLIES
     R looks for objects by traversing a series of nested environments. In this case, when you attach(kid.weights), it includes a variable age. However, you already have a variable age in your global workspace (.GlobalEnv is the secret name for that). Which one do you want? Well, R is answering which one it will find: in this case, the one in the global workspace, not the one in kid.weights. For that one, you will need to work harder (using $ or with or ...). Does that help?
     Gotcha! Very helpful, thanks.
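John's explanation of the search path can be checked directly. In this sketch a small made-up data frame, kids, stands in for kid.weights; its values are hypothetical.

```r
# A small made-up data frame standing in for kid.weights (values are hypothetical)
kids <- data.frame(age = c(58, 103, 87), weight = c(38, 87, 50))

age <- 99       # an `age` variable already in the global workspace
attach(kids)    # R reports that `age` in kids is masked by .GlobalEnv

print(search()[1:2])    # ".GlobalEnv" is searched before the attached "kids"
print(age)              # finds the global value 99, not the column
print(kids$age)         # explicit: the column from the data frame
print(with(kids, age))  # `with` looks inside the data frame first

detach(kids)    # take kids off the search path again
```

Because attach can quietly change which object a name refers to, the explicit forms kids$age and with(kids, age) are often considered the safer habit.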
  18. Polling question #3: how many analytics professionals
  19. How courses work: readings, notes and videos; homework; discussion forum
  20. Weekly Course Schedule
     Most courses are 4 weeks. Example: March 2013
     • Mar 1-2: Lesson 1 opens
     • Mar 3-9: Lesson 2 opens
     • Mar 10-16: Homework 1 due; feedback on Homework 1; Lesson 3 opens
     • Mar 17-23: Homework 2 due; feedback on Homework 2; Lesson 4 opens
     • Mar 24-30: Homework 3 due; feedback on Homework 3
     • Mar 31-Apr 6: Homework 4 due; feedback on Homework 4
  21. Time Required
     • Estimate 15 hours per week
     • You don’t need to be online at particular times or on particular days
     • Time zone does not matter
     • Best not to leave all work until the end of the week
     • Materials remain open for a couple of weeks after the end of the course
     • Most students are working professionals and take courses one at a time
  22. Faculty: Paul Murrell, John Verzani, Hadley Wickham, Sudha Purohit, Luis Torgo, David Unwin, Thomas Lumley, Din Chen, Karl Peace, Garrett Grolemund, Brian Marx, Paul Eilers
  23. Typical Course Contents – R Programming
     • “Headquarters” page
     • Lesson page
     • Readings/notes/videos
     • Homework
     • Discussion forum
  28. Equivalent to 18 credits in the US system; approx. $5,900
  29. Next Step
     For the certificate program application, see … or call 1-855-GET-REVO (1-855-438-7386).
     • Application fee will be waived (until July 30th)
     • Up to 50% discount offered on Revolution Analytics software when purchased in combination with training