Why R? A Brief Introduction to the Open Source Statistics Platform

A brief introduction to the R open source statistical platform.

  1. Why R?
      Jeffrey Stanton, Syracuse University
  2. What is R?
      • R is a statistics, data management, and graphics platform.
      • R is open source, maintained and developed by a community of developers.
      • The R code repository, as well as compiled binaries (ready-to-install software), is available at http://cran.r-project.org
      • R comprises a core program plus thousands of freely available add-on packages.
  3. CRAN
  4. So Why or Why Not R?
      • The most popular statistics software (other than R), and some of their audiences:
        – SPSS: social scientists
        – Stata: social scientists
        – Mathematica/Matlab: engineers, mathematicians, computer scientists, and physicists
        – Python/NumPy: computer scientists, web developers
        – SAS: data-intensive industries (e.g., financial services)
        – Excel: all types of organizations
      • R is more popular, and is used by more analysts, than any one of these (see the next slide).
  5. http://r4stats.com/articles/popularity/
  6. But...
      • Statistics users like point and click.
      • R is command-line oriented, although GUIs can be loaded as add-on packages.
      • R-Studio is an integrated development environment (IDE) for R, but it is geared more toward code development than statistical analysis.
      • R is free, but this also means that there is no formal support mechanism; large organizations often prefer to contract with a commercial provider.
  7. R-Studio
  8. Command Line? Advantages?
      • In the social sciences there has been much recent discussion of replication: the need for results that are reproducible.
      • In the world of “big data,” analysts want to build systems that are transparent and reliable, and that maintain a chain of provenance for each transformation applied to the data.
      • Treating statistical analysis as a kind of “programming” task (as in the old days!) has substantial advantages: every step is recorded in code and can be rerun.
  9. Look Out! Real Code!
        # Read U.S. States shape data from census GIS data set.
        # readShapeSpatial() comes from the maptools package (now archived on CRAN;
        # newer code typically uses sf::st_read() instead).
        library(maptools)
        usShape <- readShapeSpatial("gz_2010_us_040_00_500k.shp")
        # Attach the delta CPI data to the states
        usShape@data$delta <- stateCPIdelta  # Consumer price indices in this table
        # This sets up break points for color designations.
        # We want 20 gradations of color across all choropleths.
        bfloor <- floor(min(usShape@data[, "delta"], na.rm = TRUE) * 10) / 10
        bceil  <- ceiling(max(usShape@data[, "delta"], na.rm = TRUE) * 10) / 10 + 20
        # seq() steps by 20; padding bceil by 20 keeps the top break above the maximum
        breaks <- seq(bfloor, bceil, 20)
        # Attach the color cut points to the shape data
        usShape@data$zCat <- cut(usShape@data[, "delta"], breaks, include.lowest = TRUE)
        cutpoints <- levels(usShape@data$zCat)  # For later use with the legend
  10. Colorful!
  11. Many Packages - CRAN Task View
      • ChemPhys: Chemometrics and Computational Physics
      • Econometrics: Computational Econometrics
      • Environmetrics: Analysis of Ecological and Environmental Data
      • ExperimentalDesign: Design of Experiments (DoE) & Analysis of Experimental Data
      • Finance: Empirical Finance
      • Genetics: Statistical Genetics
      • Graphics: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization
      • HighPerformanceComputing: High-Performance and Parallel Computing with R
      • MachineLearning: Machine Learning & Statistical Learning
      • MedicalImaging: Medical Image Analysis
      • MetaAnalysis: Meta-Analysis
      • Multivariate: Multivariate Statistics
      • NaturalLanguageProcessing: Natural Language Processing
      • Optimization: Optimization and Mathematical Programming
      • Pharmacokinetics: Analysis of Pharmacokinetic Data
      • Phylogenetics: Phylogenetics, Especially Comparative Methods
      • Psychometrics: Psychometric Models and Methods
      • ReproducibleResearch: Reproducible Research
      • SocialSciences: Statistics for the Social Sciences
      • Spatial: Analysis of Spatial Data
      • Survival: Survival Analysis
      • TimeSeries: Time Series Analysis
      • WebTechnologies: Web Technologies and Services
  12. Why R?
      • Free and open source
      • Huge community of users, enormous repository of working code examples, many sources of online expertise/support
      • Dizzying array of add-on packages for almost any imaginable data application
      • Encourages good data practice: coding a reproducible chain of data transformations
  13. Jsresearch.net
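
Slide 2 describes R as a core program plus add-on packages distributed through CRAN. A minimal sketch of that install-then-load workflow follows. The helper name `ensure_package()` and the https://cloud.r-project.org mirror are assumptions chosen for illustration, not taken from the slides; `requireNamespace()`, `install.packages()`, and `library()` are standard R functions.

```r
# Hypothetical helper (not from the slides): install a CRAN package only if
# it is not already installed, then attach it to the search path.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads and installs from the given CRAN mirror (needs network access)
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() accept the package name as a string
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "tools" ships with every R installation, so this call downloads nothing
ensure_package("tools")
```

Passing the name of a package that is not yet installed triggers a download from the mirror, so that case needs network access.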

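Slides 8 and 12 argue that scripting an analysis makes it reproducible and keeps a chain of provenance for each transformation. The sketch below illustrates that idea under simple assumptions: the data are simulated, the two steps (`drop_low`, `standardize`) and the `run_pipeline()` function are hypothetical, and provenance is kept as a plain character log rather than with any dedicated package.

```r
# Hypothetical example: a scripted analysis whose every step is named and logged.
run_pipeline <- function(seed) {
  set.seed(seed)  # fixing the seed makes the simulated data reproducible
  dat <- data.frame(id = 1:20, score = round(rnorm(20, mean = 50, sd = 10)))

  # Each transformation is a named function, so the chain is explicit
  steps <- list(
    drop_low    = function(d) d[d$score >= 40, ],
    standardize = function(d) transform(d, z = (score - mean(score)) / sd(score))
  )

  provenance <- character(0)
  for (name in names(steps)) {
    rows_before <- nrow(dat)
    dat <- steps[[name]](dat)
    # Record what each step did to the data
    provenance <- c(provenance,
                    sprintf("%s: %d rows in, %d rows out", name, rows_before, nrow(dat)))
  }
  list(data = dat, provenance = provenance)
}

result <- run_pipeline(seed = 2013)
print(result$provenance)
```

Because the seed is fixed, calling `run_pipeline(2013)` twice returns identical data and an identical log; the log is one simple way to keep a record of every transformation applied to the data.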
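
The break-point logic on slide 9 can be tried without the census shapefile. In this sketch the vector `delta` is made-up stand-in data (not the author's CPI values); the floor/ceiling rounding, the step of 20 passed to `seq()`, and the `cut()` call mirror the slide's code.

```r
# Made-up stand-in for the per-state CPI changes used on the slide
delta <- c(3.27, 18.9, -4.51, 42.08, NA, 27.6)

# Round the data range outward to one decimal place, as on the slide
bfloor <- floor(min(delta, na.rm = TRUE) * 10) / 10
bceil  <- ceiling(max(delta, na.rm = TRUE) * 10) / 10 + 20

# Break points every 20 units; padding bceil by 20 keeps the top break above the maximum
breaks <- seq(bfloor, bceil, 20)

# Assign each value to a color category; NA values stay NA
zCat <- cut(delta, breaks, include.lowest = TRUE)
cutpoints <- levels(zCat)  # labels for a map legend
print(table(zCat))
```

With these values the breaks run from -4.6 upward in steps of 20, giving three categories; every non-missing value lands in one of them.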