A slightly different introduction to #Rstats. Data frames, linear models and plots with ggplot2.

Published in:
Technology

- 1. The Wright Lab Computation Lunches: An introduction to R; Gene expression from DEVA to differential expression; Handling massively parallel sequencing data
- 2. R: data frames, plots and linear models Martin Johnsson
- 3. statistical environment
- 4. free open source
- 5. scripting language
- 6. great packages (limma for microarrays, R/qtl for QTL mapping etc)
- 7. You need to write code!
- 8. (Why scripting? Harder the first time, easier the next 20 times … a necessity for moderately large data)
- 9. R-project.org Rstudio.com
- 10. Interface is immaterial: ssh into a server, RStudio, alternatives. Very few platform-dependent elements that the end user needs to worry about.
- 11. Scripting: write your code down in a .R file; run it with source( ); ## comments
- 12. if it runs without intervention from start to finish, you’re actually programming
- 13. Help! Within R: ? and tab. Your favourite search engine. Ask (Stack Exchange, R-help mailing list, package mailing lists).
- 14. First task: make a new script that imports a data set and takes a subset of the data
- 15. Reading in data: Excel sheet to data.frame. One sheet at a time; clear formatting; short, succinct column names; export as text; read.table
- 16. Subsetting: logical operators ==, !=, >, <, ! subset(data, column1==1); subset(data, column1==1 & column2>2)
- 17. Indexing with [ ]: first three rows: data[c(1,2,3),]; two columns: data[,c(2,4)]; ranges: data[,1:3]
- 18. Variables: columns in expressions: data$column1 + data$column2; log10(data$column2). Assignment arrow: data$new.column <- log10(data$column2); new.data <- subset(data, column1==10)
- 19. Exercise: Start a new script and save it as unicorn_analysis.R. Import unicorn_data.csv. Take a subset that only includes green unicorns.
- 20. RStudio: File > New > R Script. data <- read.csv("unicorn_data.csv"); green.unicorns <- subset(data, colour=="green")
- 21. Anatomy of a function call: function.name(parameters). mean(data$column); mean(x=data$column); mean(x=data$column, na.rm=T); ?mean
- 22. mean(exp(log(x)))
- 23. programming in R == stringing together functions and writing new ones
- 24. Using a package (ggplot2) to make statistical graphics.
- 25. install.packages("ggplot2") library(ggplot2)
- 26. qplot(x=x.var, y=y.var, data=data). Only x: histogram. x and y numeric: scatterplot. Or set geometry (geom) yourself.
- 27. geoms: point; line; boxplot; jitter (scattered points); tile (heatmap); and many more
- 28. Exercise: Make a scatterplot of weight and horn length in green unicorns. Write all code in the unicorn_analysis.R script. Save the plots as variables so you can refer back to them.
- 29. [Figure: scatterplot of horn.length against weight] green.scatterplot <- qplot(x=weight, y=horn.length, data=green.unicorns)
- 30. Exercise: Make a boxplot of horn length versus diet.
- 31. [Figure: boxplots of horn.length by diet (candy, flowers)] qplot(x=diet, y=horn.length, data=data, geom="boxplot")
- 32. Small multiples: split the plot into multiple subplots; useful for looking at patterns. qplot(x=x.var, y=y.var, data=data, facets=~variable); facets=variable1~variable2
- 33. Exercise: Again, make a boxplot of diet and horn length, but separated into small multiples by colour.
- 34. [Figure: boxplots of horn.length by diet (candy, flowers), in separate panels for green and pink] qplot(x=diet, y=horn.length, data=data, geom="boxplot", facets=~colour)
- 35. Comparing means with linear models
- 36. Wilkinson–Rogers notation: one predictor: y ~ x; additive model: y ~ x1 + x2; interactions: y ~ x1 + x2 + x1:x2 or y ~ x1 * x2; factors: y ~ factor(x)
- 37. Student’s t-test
- 38. t.test(variable ~ grouping, data=data). Arguments: alternative ("two.sided", "less", "greater"); var.equal; paired. geoms: boxplot, jitter
- 39. Carl Friedrich Gauss least squares estimation
- 40. [Figure: horn.length plotted against weight]
- 41. Linear model: y = a + b x + e, e ~ N(0, sigma). lm(y ~ x, data=some.data): formula, data. summary( ) function
- 42. Exercise: Make a regression of horn length and body weight.
- 43. model <- lm(horn.length ~ weight, data=data); summary(model)
    Call: lm(formula = horn.length ~ weight, data = data)
    Residuals:
        Min      1Q  Median      3Q     Max
    -6.5280 -2.0230 -0.1902  2.5459  7.3620
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 20.41236    3.85774   5.291 1.76e-05 ***
    weight       0.03153    0.01093   2.886  0.00793 **
    --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Residual standard error: 3.447 on 25 degrees of freedom
    (3 observations deleted due to missingness)
    Multiple R-squared: 0.2499, Adjusted R-squared: 0.2199
    F-statistic: 8.327 on 1 and 25 DF, p-value: 0.007932
- 44. What have we actually fitted? model.matrix(horn.length ~ weight, data). What were the results? coef(model). How uncertain? confint(model)
- 45. Plotting the model: regression equation y = a + x b; a is the intercept, b is the slope of the line. Pull out coefficients with coef( ). A plot with two layers: scatterplot with added geom_abline( )
- 46. [Figure: scatterplot of horn.length against weight with the fitted regression line] scatterplot <- qplot(x=weight, y=horn.length, data=data); a <- coef(model)[1]; b <- coef(model)[2]; scatterplot + geom_abline(intercept=a, slope=b)
- 47. A regression diagnostic: the linear model needs several assumptions, particularly linearity and equal error variance. The residuals vs fitted plot can help spot gross deviations.
- 48. [Figure: residuals(model) plotted against fitted(model)] qplot(x=fitted(model), y=residuals(model))
- 49. Photo: Peter (anemoneprojectors), CC BY-SA 2.0, http://www.flickr.com/people/anemoneprojectors/
- 50. Analysis of variance: aov(formula, data=some.data); drop1(aov.object, test="F") for F-tests (Type II SS). Post-hoc tests: pairwise.t.test, TukeyHSD
- 51. Exercise: Perform analysis of variance on weight and the effects of diet while controlling for colour.
- 52. We want a two-way anova with an F-test for diet. model.int <- aov(weight ~ diet * colour, data=data); drop1(model.int, test="F"); model.add <- aov(weight ~ diet + colour, data=data); drop1(model.add, test="F")
    Single term deletions
    Model: weight ~ diet + colour
           Df Sum of Sq   RSS    AIC F value  Pr(>F)
    <none>              85781 223.72
    diet    1     471.1 86252 221.87  0.1318 0.71975
    colour  1   13479.7 99260 225.66  3.7714 0.06396 .
    --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- 53-56. Black magic: none of this is really R’s fault, but these are things that come up along the way. Missing values: na.rm=T, na.exclude( ). Type I, II and III sums of squares in anova. Floating-point arithmetic, e.g. sin(pi)
- 57. Reading: Dalgaard, Introductory statistics with R (electronic resource at the library); Faraway, Linear models with R; Gelman & Hill, Data analysis using regression and multilevel/hierarchical models; Wickham, ggplot2 book; tons of tutorials online, for instance http://martinsbioblogg.wordpress.com/a-slightly-different-introduction-to-r/
- 58. Exercise More (and some of the same) analysis of the unicorn data set. Use the R documentation and google. I will post solutions.
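
The steps the slides walk through (subsetting a data frame, indexing with [ ], adding a derived column, fitting a regression and pulling out its coefficients) can be sketched in base R. This is only an illustration: unicorn_data.csv is not reproduced here, so the data frame below is invented, and only the column names colour, diet, weight and horn.length come from the slides.

```r
# Hypothetical stand-in for unicorn_data.csv (all values are made up).
unicorns <- data.frame(
  colour      = c("green", "pink", "green", "green", "pink", "green"),
  diet        = c("candy", "flowers", "flowers", "candy", "candy", "flowers"),
  weight      = c(250, 300, 350, 400, 320, 450),
  horn.length = c(26, 30, 31, 34, 29, 35)
)

# Subsetting with a logical condition.
green.unicorns <- subset(unicorns, colour == "green")

# Indexing with [ ]: the first three rows, then two columns by name.
first.three <- unicorns[1:3, ]
two.columns <- unicorns[, c("weight", "horn.length")]

# A derived column, created with the assignment arrow.
unicorns$log.weight <- log10(unicorns$weight)

# Simple linear regression of horn length on weight.
model <- lm(horn.length ~ weight, data = unicorns)
a <- coef(model)[1]   # intercept
b <- coef(model)[2]   # slope
print(coef(model))
print(confint(model))
```

With the real data, the same calls would run on the data frame returned by read.csv("unicorn_data.csv") instead of the invented one.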

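Two of the "black magic" items are easy to see in a few lines of base R. This is a small sketch with made-up numbers: a missing value makes mean() return NA unless na.rm is set, and sin(pi) is not exactly zero in floating-point arithmetic, so comparisons are safer with a tolerance such as the one all.equal() applies.

```r
# Missing values: mean() returns NA unless na.rm = TRUE.
x <- c(1.5, NA, 2.5)                      # made-up values
mean.with.na    <- mean(x)
mean.without.na <- mean(x, na.rm = TRUE)

# Floating point: sin(pi) is tiny but not exactly zero.
s <- sin(pi)
exactly.zero <- (s == 0)                  # exact comparison
nearly.zero  <- isTRUE(all.equal(s, 0))   # comparison with a tolerance
print(c(mean.with.na, mean.without.na))
print(c(exactly.zero, nearly.zero))
```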