Class ppt intro to r

Intro to R

Published in: Education

  1. INTRODUCTION TO R
  2. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  3. HISTORY AND EVOLUTION OF R: Origin at Bell Labs in the 1970s
  4. R developed from the S language
     • S Version 1, S Version 2, S Version 3, S Version 4
     • S was developed more than 30 years ago for research applied to the high-tech industry
  5. The regular development of R
     • 1990s: R developed concurrently with S
     • 1993: R made public
     • Acceleration of R development: R-help and R-devel mailing lists, creation of the R Core Group
     • Source: R Journal Vol 1/2
  6. Growing number of packages
     • 2000: R version 1.0.1; today: R version 2.14
     • 2001: ~100 packages
     • 2009: over 2,000 packages
     • Source: R Journal Vol 1/2
  7. Explosion of R popularity in the last decade
     • Object-oriented, growing user base, scripting features
     • Free and open source
     • Irrational reasons: R seen as "cool"
  8. Comparison of mailing lists
     • Evolution of traffic on the main mailing list of each software package. Source: R.A. Muenchen, r4stats.com
  9. Popularity amongst programming languages: KDnuggets 2012 survey
  10. Number of blogs (data as of March 2012)
      Software   Number of blogs
      R          365
      SAS        40
      Stata      8
      Others     0-3
  11. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  12. PRINCIPLE AND SOFTWARE PARADIGM: R is not really a statistical software package
      • R is primarily a programming language
      • Limited user-friendly interfaces for data analysis
      • Object-oriented and largely non-declarative
      • Similar to programming languages such as Fortran, C, Java and Python
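
      To make the "programming language" point concrete, here is a minimal sketch in plain R (no extra packages). The function name `square` is a hypothetical example; `cars` is a small dataset that ships with R. It shows that functions are ordinary objects and that generic functions such as `summary()` pick a method from the object's class (R's S3 object system).

      ```r
      # Functions are ordinary objects: they can be stored, passed and returned
      square <- function(x) x^2      # hypothetical example function
      sapply(1:5, square)            # applies square() to each element: 1 4 9 16 25

      # Every value is an object with a class
      x <- c(2.5, 3.1, 4.8)
      class(x)                       # "numeric"

      # Generic functions dispatch on the class of their argument (S3 system)
      fit <- lm(dist ~ speed, data = cars)   # 'cars' ships with R (datasets package)
      class(fit)                     # "lm"
      summary(fit)                   # runs the summary method for "lm" objects
      ```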
  13. R has limited graphical user interface (GUI) options
      • Recent endeavours to enhance R's user-friendliness
      • Several GUIs in development: R Commander, RKWard, Rattle
  14. R Commander (Rcmdr)
  15. RKWard
  16. Rattle
  17. Inherent limitations of pervasive Excel-like spreadsheets
  18. Sophisticated but costly SAS
      • Screenshot of SAS Enterprise Miner 7.1. Source: sas.com
  19. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  20. DESCRIPTION OF R INTERFACE: R console
      • R desktop shortcut
      • RGui: the basic R interface
      • R command line (space to write instructions)
  21. Using the command line in the R console
      • First, an invalid sentence, followed by R's error message
      • Second, the correct sentence
      • Declaration and printing of the sentence as an R object
      • Simple math computations
      • Basic information about the R object containing the sentence
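
      The console steps on this slide can be reproduced as a short session. This is a sketch: the sentence "Hello world" and the object name `sentence` are hypothetical stand-ins for whatever the screenshot shows, and the invalid input is kept as a comment so the block also runs as a script.

      ```r
      # Typing text that is not valid R code, e.g.
      #   Hello world
      # makes R reply: Error: unexpected symbol in "Hello world"

      # Declaring the sentence as a character string, assigning it with <- and printing it
      sentence <- "Hello world"
      print(sentence)          # [1] "Hello world"

      # Simple math computations
      2 + 3 * 4                # [1] 14
      sqrt(16)                 # [1] 4

      # Basic information about the object holding the sentence
      class(sentence)          # [1] "character"
      length(sentence)         # [1] 1   (one string in the vector)
      nchar(sentence)          # [1] 11  (number of characters)
      ```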
  22. RGui menu: File tab
      • Usual basic and general operations
  23. RGui menu: Edit tab
      • Basic and general editing
      • Data editor: entering the object's name
      • Results of the data editor
  24. RGui menu: View tab
      • Show or hide the toolbar and/or status bar
  25. RGui menu: Misc tab
      • Miscellaneous operations
  26. RGui menu: Packages tab
      • Adding functions to base R
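
      The Packages menu has command-line equivalents. A minimal sketch: `splines` is used only because it ships with every R installation, so the block runs offline; installing from CRAN (here `ff`, one of the packages named later in this deck) needs internet access and is therefore left commented out.

      ```r
      # Install a package from CRAN once per R installation (needs internet access):
      # install.packages("ff")

      # Load (attach) a package in every session where its functions are needed
      library(splines)                 # ships with R, so no installation is required

      "package:splines" %in% search()  # TRUE: the package is now on the search path
      ls("package:splines")[1:5]       # a few of the functions it adds

      # Check whether a package is installed without attaching it
      requireNamespace("splines", quietly = TRUE)   # TRUE
      ```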
  27. RGui menu: Windows tab
      • Usual options to arrange (tile) the windows
  28. RGui menu: Help tab
      • Very important links to help resources
  29. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  30. ADVANTAGES OF R: R "philosophy"
      • Open source code: you can access the code of the software
      • In-depth understanding of what R does
      • Modify the code
      • Example: the "mgcv" package webpage on CRAN (screenshot; source: CRAN), showing the address of the package and the link to the package sources (.tar.gz file)
  31. R access to source code: example of the source code of the "mgcv" package
      • Screenshot of unzipping the "mgcv" package and browsing through the package's files:
        (1) unzipping the mgcv_1.7-13.tar.gz file (with 7-Zip);
        (2) list of directories in the "mgcv" package;
        (3) list of source files (i.e., the open code) in the "src" (code sources) directory of the "mgcv" package
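
      Besides unpacking the tarball by hand, R itself will show source code. A minimal sketch using `sd()` from the `stats` package as an arbitrary example; the download of the mgcv sources is commented out because it needs internet access, and the file it fetches will be the current CRAN version rather than the mgcv_1.7-13 shown on the slide.

      ```r
      # Typing a function's name without parentheses prints its R source code
      sd
      body(sd)                            # only the body of the function
      environmentName(environment(sd))    # "stats": the package namespace it belongs to

      # Compiled code (e.g. the C files in mgcv's src/ directory) is only in the
      # source package. Both calls below need internet access:
      # f <- download.packages("mgcv", destdir = tempdir(), type = "source")
      # untar(f[1, 2], list = TRUE)       # list the files inside the .tar.gz
      ```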
  32. R is free
      Software            Academics       Demo           Commercial (basic)   Commercial (full)
      R                   Free            Free           Free                 Free
      SAS                 Free to $100s   Not available  $1,000s              $10,000s
      Statistica          $100s           30-day limit   ~$1,000              $10,000
      Excel (Microsoft)   Free to $10s    Limited        ~$100                $100s
      SPSS (IBM)          $100s           14-day limit   ~$2,000              $1,000s
  33. Interface with other languages and scripting capabilities
      • Interfaces with virtually any other programming language: Fortran, C, C++, Python…
      • Tailor or rewrite your old code in R
      • R as a scripting language: R scripts can launch, or be launched by, other languages
      • Screenshot of the file "mgcv.c" of the "mgcv" package opened in WordPad: a file written in typical C
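
      Base R reaches C and Fortran through `.C()`, `.Fortran()` and `.Call()`, but those need compiled code, so the sketch below illustrates only the scripting side. It starts a second R process with the `Rscript` front end, which is also how a shell script or another language would run R; `analysis.R` in the comment is a hypothetical file name.

      ```r
      # From a shell or any language that can start a process, R runs as e.g.
      #   Rscript analysis.R          (analysis.R is a hypothetical script)
      #   Rscript -e "summary(cars)"

      # From R, system2() launches another program and captures its output.
      # Here that program is Rscript itself, evaluating one expression.
      rscript <- file.path(R.home("bin"), "Rscript")
      out <- system2(rscript,
                     args = c("-e", shQuote("writeLines(as.character(sum(1:10)))")),
                     stdout = TRUE)
      out   # "55"
      ```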
  34. R visualization capabilities
  35. R visualization capabilities
  36. R visualization capabilities
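
      The slides show plots only as images. As a minimal base-graphics sketch (the output file name is an arbitrary choice), this draws a scatter plot of the built-in `cars` data with a fitted regression line and writes it to a PDF in the session's temporary directory.

      ```r
      out_file <- file.path(tempdir(), "cars.pdf")   # arbitrary output path

      pdf(out_file, width = 7, height = 5)           # open a PDF graphics device
      plot(dist ~ speed, data = cars,
           main = "Stopping distance vs. speed",
           xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
      abline(lm(dist ~ speed, data = cars), col = "red", lwd = 2)  # least-squares line
      invisible(dev.off())                           # close the device, writing the file

      file.exists(out_file)                          # TRUE
      ```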
  37. R role in academia
      • R: a tool used by the finest researchers
      • Top-notch analytics capabilities
      • Screenshot of a Facebook map. Source: Paul Butler/Facebook, DG Rossiter, spatialanalysis.co.uk
  38. To summarize
      • Free open-source philosophy: R websites with many examples, free books, free online open courses, Twitter accounts
      • Online help and discussion: mailing lists, very active and diverse forums, communities of developers and helpers
  39. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  40. DRAWBACKS OF R: average memory performance
      • Poor management of large datasets: avoid nested loops; prefer R's advanced language features for data structures
      • Complicated structure of packages in R: dozens of packages, which must be loaded into memory in every session
      • R packages to better manage memory: RHadoop (inspired by Google's MapReduce), ff, bigmemory
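
      The advice to avoid nested ("imbricated") loops can be illustrated with a small comparison. In this sketch the matrix size `n` and the function name `nested_table` are arbitrary choices, and timings will vary by machine; `outer()` does the same work without an R-level loop, and `object.size()` shows that objects are held entirely in RAM.

      ```r
      n <- 500   # arbitrary size for the illustration

      # Nested loops: each of the n * n iterations runs as interpreted R code
      nested_table <- function(n) {
        m <- matrix(0, nrow = n, ncol = n)
        for (i in seq_len(n)) {
          for (j in seq_len(n)) {
            m[i, j] <- i * j
          }
        }
        m
      }

      system.time(m1 <- nested_table(n))
      system.time(m2 <- outer(seq_len(n), seq_len(n)))   # vectorised equivalent

      all.equal(m1, m2)   # TRUE: same values

      # Every object lives in memory; one million doubles take roughly 7.6 Mb
      print(object.size(numeric(1e6)), units = "Mb")
      ```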
  41. Average computing performance
      • No default parallel execution: R packages exist to use several cores, but top skills are needed for high-performance computing
      • A high-level programming language: abstract and modern (like Python), allowing more productive coding
      • But further from "machine language", which can make loop-heavy R code up to 100 times slower than C
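
      R runs on one core unless told otherwise. The `parallel` package, shipped with R since version 2.14, is one way to spread work over several cores; this minimal sketch uses a socket cluster of two workers (which also works on Windows), and `slow_square` is a hypothetical stand-in for an expensive computation.

      ```r
      library(parallel)

      detectCores()                      # number of cores R can see (may be NA)

      slow_square <- function(i) {       # hypothetical expensive task
        Sys.sleep(0.2)                   # simulate work
        i^2
      }

      cl <- makeCluster(2)                     # start two worker R processes
      res <- parLapply(cl, 1:8, slow_square)   # workers share the elements of 1:8
      stopCluster(cl)                          # always shut the workers down

      unlist(res)                        # 1 4 9 16 25 36 49 64
      ```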
  42. Difficult data visualization and management
      • Difficult to inspect data sets
      • Screenshot of the R data editor and the "ViewTable" window in SAS 9.3
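
      Even without a spreadsheet-like viewer, a few base functions summarise a data set at the console. A minimal sketch on the built-in `iris` data; `View()` is left commented out because it only works in an interactive session.

      ```r
      dim(iris)          # 150 rows, 5 columns
      str(iris)          # type and first values of every column
      head(iris, 3)      # first three rows
      summary(iris)      # min/quartiles/mean for numbers, counts for the factor
      # View(iris)       # spreadsheet-like viewer, interactive sessions only
      ```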
  43. Difficult architecture management
      • Problems for large organizations: R is made of several thousand independent packages, with no deployment plan for complex organizations and no installation support
      • Lack of code accountability: thousands of independent individual R developers, and nobody responsible for the quality of the code
      • Potentially high hidden costs with R: total cost may favour commercial solutions for complex computations in large corporations
  44. Relatively difficult to learn
      • Steep learning curve: R code is far from undergraduate computer science courses; very complex data structures (useful once mastered); is R's syntax illogical?
      • Still, not more difficult to learn than SAS: both SAS and R are more abstract than basic programming languages (Fortran, C…)
      • Difficult to learn = more rewarding professionally!
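
      The "very complex data structures" mostly come down to a handful of building blocks. This minimal sketch (all object names are hypothetical) shows an atomic vector, a factor, a data frame and a list, and the indexing that ties them together.

      ```r
      v  <- c(1.5, 2, 3)                          # atomic vector: one type only
      f  <- factor(c("low", "high", "low"),
                   levels = c("low", "high"))     # categorical variable
      df <- data.frame(x = v, level = f)          # table: columns of equal length
      l  <- list(values = v, table = df, note = "any mix of types")  # heterogeneous container

      str(l)                        # nested view of the whole structure
      levels(f)                     # "low" "high"
      as.integer(f)                 # 1 2 1: the integer codes stored internally
      df[df$level == "low", "x"]    # 1.5 3: rows selected by a logical condition
      l$table$x[2]                  # 2: drilling down through list and data frame
      ```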
  45. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  46. SO WHY LEARN R? No language is perfect!
      • Contradictory objectives to meet; strengths and weaknesses of each language
      • More positive than negative points
      • Different needs imply different tools: large corporations with defined procedures favour SAS-like tools; smaller budgets and quick proofs of concept favour R
      • Effect of legacy and the culture of the organization: use of existing solutions (system architecture, BA tools…), habits in business analytics
  47. Very appealing solution
      • Figure: popularity of business analytics software (R, SAS, IBM SPSS, STATISTICA, own code) by respondent group (overall, corporate, consultants, academics, NGO/government); green = very popular, red = unpopular. Source: Rexer Analytics
  48. AGENDA • History and evolution of R • Principle and software paradigm • Description of R interface • Advantages of R • Drawbacks of R • So why use R? • References for learning R
  49. REFERENCES FOR LEARNING R: Books
      • Many books available: choose the one that fits you! (style, pedagogy, theory vs. practice); browse several books at a local library or store
      • Springer's Use R! series (http://www.springer.com/series/6991): recent, concise, good quality, affordable, diverse
      • Pure rookies: "A Beginner's Guide to R", "R by Example"
      • One step forward: "Business Analytics for Managers"
      • Intensive Excel users: "R through Excel"
      • O'Reilly R series (for programmers): "R Cookbook", "R in a Nutshell"
  50. Websites
      • R official websites: The R Project for Statistical Computing (www.r-project.org); mailing lists ("R-help", Special Interest Groups) and The R Journal; official (austere) manuals ("An Introduction to R")
      • Other websites: UCLA online R resources (http://www.ats.ucla.edu/stat/r/); R blogs aggregator (www.r-bloggers.com); social networks: LinkedIn groups (The R Project for Statistical Computing), Twitter accounts (@RevolutionR, @inside_R), job boards (Analytical Bridge…)
  51. Conferences: a growing number of conferences about R
      • Official international useR! conference: held annually over a few days in a new venue each year (Google it!), with lots of material on many topics
      • Other conferences or venues: conferences about business analytics (data mining, specialized topics…) with sessions involving R; find (or even start!) an R user group near you (R Wiki geographical list, map of groups on meetup.com); events and news from the R-bloggers blog
