# Introduction to Data Analysis with R

Slides used in a workshop at Trovit. The data and R source code can be found at: https://bitbucket.org/danisola/intro-r-trovit

### Introduction to Data Analysis with R

1. Introduction to Data Analysis with R, by Dani Solà
2. What is R?
   - "R is a language and environment for statistical computing and graphics"
   - Paradigms: array, object-oriented, imperative, functional, procedural, reflective
   - Everything resides in memory (no big data)
   - Easy to get started!
3. Why R?
   - Free Software (GNU General Public License)
   - Mature: v1.0 was released in 2000
   - Widely used
   - Good documentation and manuals
   - Lots of freely available packages
   - Excellent graphics capabilities
4. Getting the data (CSV)
   - MySQL: `SELECT * INTO OUTFILE '/path/to/file.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' FROM table WHERE <condition>;`
   - Hive + sed: `INSERT OVERWRITE LOCAL DIRECTORY '/tmp_path/' SELECT * FROM table WHERE <condition>;` followed by `cat /tmp_path/* | sed 's/[Ctrl-V][Ctrl-A]/\t/g' > out.txt`, where `[Ctrl-V][Ctrl-A]` means typing a literal Ctrl-A character (Hive's default field delimiter), which sed replaces with a tab.
   - Consider sampling!
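5. Linear Regression

   $$y = \alpha + \beta x$$

   $$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\operatorname{Cov}[x, y]}{\operatorname{Var}[x]}$$

   $$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$

   Just use `lm()` in R! (But check the assumptions.)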
6. Want more?
   - Computing for Data Analysis – Roger D. Peng: www.coursera.org/course/compdata
   - Statistics One – Andrew Conway: www.coursera.org/course/stats1
   - An Introduction to R – The R Core Team: cran.r-project.org/doc/manuals/r-release/R-intro.pdf
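
As a rough illustration of slide 4, the sketch below loads an exported CSV into R and keeps a random sample of its rows. The file contents, the column names `id` and `price`, and the 10% sampling rate are all invented for this example; the temporary file only stands in for a real export. MySQL's `INTO OUTFILE` writes no header row, so the column names are supplied by hand.

```r
# Hypothetical stand-in for a MySQL export: comma-separated, no header row.
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(id = 1:1000, price = seq(100, by = 5, length.out = 1000))
write.table(example, csv_path, sep = ",", row.names = FALSE, col.names = FALSE)

# Read the file; column names are assumptions made for this sketch.
ads <- read.csv(csv_path, header = FALSE, col.names = c("id", "price"))

# A tab-separated Hive export (out.txt) could be read in a similar way:
# ads <- read.delim("out.txt", header = FALSE)

# Keep a random 10% of the rows; set.seed() makes the sample reproducible.
set.seed(42)
ads_sample <- ads[sample(nrow(ads), size = 0.1 * nrow(ads)), ]

str(ads_sample)
```

Sampling inside R only helps once the file has been read. For data that does not fit in memory, the sampling would have to happen earlier, for instance in the `WHERE` clause of the export query.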
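
The formulas on slide 5 can be checked against `lm()` with a minimal sketch like the one below. The data is synthetic and invented for illustration (a true intercept of 2, a true slope of 3, and Gaussian noise); it is not the workshop data set.

```r
# Synthetic data, invented for illustration: y = 2 + 3x plus Gaussian noise.
set.seed(1)
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Closed-form least-squares estimates from the slide.
beta_hat  <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same fit with lm(); coef() returns the intercept first, then the slope.
fit <- lm(y ~ x)
print(coef(fit))

# Both approaches should agree up to floating-point error.
print(all.equal(unname(coef(fit)), c(alpha_hat, beta_hat)))

# Standard diagnostic plots (residuals, Q-Q, scale-location, leverage)
# are one way to start checking the assumptions.
par(mfrow = c(2, 2))
plot(fit)
```

`cov()` and `var()` both divide by n - 1, so the factor cancels and their ratio equals the ratio of sums in the slide's formula.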