WiNGS 2014 Workshop 2 R, RStudio, and reproducible research with knitr

Workshops in next-generation science at UNC Charlotte 2014

Workshop 2 - R, RStudio, & reproducible research with knitr


No programming experience necessary

"we wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming..."

John Chambers, Stages in the Evolution of S

Why use R?

•  Free & open source
•  Has a lot of support
– Popular in many domains (finance, business analytics, statistics, biology)
•  Many libraries available for biological data analysis through the Bioconductor project
– Such as edgeR (today)
•  Now has an easy-to-use, free user interface called RStudio

RStudio

•  A very nice graphical user interface for R.
•  It's free!
•  Integrates well with knitr
– tool for writing statistical reports w/ R Markdown

R Markdown ".Rmd"

•  Lets you write a report that combines results and commands
•  Sounds weird, but once you get used to it, it's very powerful
•  Catch mistakes before publication
– Ask a friend to run & review your data analysis

knitr & R Markdown enable literate programming

•  A way to do "literate programming"
– Developed by Donald Knuth, Stanford Computer Science professor
•  Literate programming: Write programs that explain what they are doing while they are doing it.
•  Practical application: Data Analysis Reports

Plan for Today

•  Introduce R and RStudio
– Part 1: Functions & plots
– Part 2: Markdown
– Part 3: See how statistical testing works in R
•  Differential expression analysis walk-through (may extend into Workshop 3)
•  Goal: Get you started!
– Lots of Web resources for further study

Let's get started!

Start RStudio

•  RStudio has panes
– w/ min, max buttons (top right)
•  Panes have tabs
– Console: where you type commands
– Environment: shows variables you've defined

Make new project (Part 1)

•  Select File > Project > New Project...
•  Choose New Directory

Make new project (Part 2)

•  Choose Empty Project

Make new project (Part 3)

•  Choose Empty Project
•  Enter "wings2014"
•  Click Create Project

Project name in upper right corner

•  Open folder wings2014
•  See wings2014.Rproj file
•  Tip: After you quit, double-click this file to start RStudio with correct directory settings

Enter commands in Console

•  > symbol is the prompt
•  Type commands or expressions at the prompt, then press ENTER
•  R evaluates what you type, prints the result
•  Returns prompt

Practice: Try arithmetic expressions

•  Add +
•  Subtract -
•  Multiply *
•  Raise to a power ** (or ^)
•  Expressions return values as one-element vectors.
•  [1] indicates that the value next to it has this index.
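
As a quick illustration, the operators above can be tried at the console like this. The numbers are arbitrary examples, not values from the workshop.

```r
# Basic arithmetic at the R prompt; each line prints its result
2 + 3     # add
10 - 4    # subtract
6 * 7     # multiply
2 ** 5    # raise to a power; R treats ** the same as ^
2 ^ 5     # the more common way to write a power in R

# A result is a vector of length one, which is why R prints [1] before it
length(2 + 3)
```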

Practice: Save results to variables

•  Use '=' to assign result to a variable
– Nothing printed
•  Type variable name to see what's in it
•  Use variables in expressions
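
A small sketch of these steps, using made-up variable names (x, y, z) chosen only for illustration:

```r
# Assign a result to a variable; nothing is printed
x = 2 + 3

# Type the variable name to see what it holds
x

# Use variables in further expressions
y = x * 10
y

# Many R users write assignment with <- instead; it does the same thing here
z <- x + y
z
```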

Variables refer to objects

•  Environment tab shows objects created thus far
•  Most of what you do in R involves manipulating objects saved to variable names
– Use objects as inputs to functions

R functions

•  R has many functions
– math
– plotting
– statistical tests
•  Functions take inputs called arguments
•  Most functions have many possible arguments
– Usually have reasonable defaults

How to use a function in 4 steps

1.  Type function name
2.  Type "(" open paren
– RStudio types closing paren for you
3.  Type arguments
– if more than one argument, insert "," (comma)
4.  Type ENTER

Example: sqrt calculates square root
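
The four steps might look like this at the console. The second call, with round, is an extra example added here to show a comma between two arguments; it is not from the slide.

```r
# function name, open paren, argument, close paren, then ENTER
sqrt(16)

# Functions with more than one argument separate them with commas;
# round() takes a number and how many decimal places to keep
root_two <- round(sqrt(2), 3)
root_two
```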

Practice: rnorm function

•  rnorm creates a vector of numbers randomly sampled from a normal distribution with specified mean, standard deviation

rnorm(10,5,5)

•  function name: rnorm
•  arguments: sample size (10), mean (5), standard deviation (5)
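
A minimal sketch of the same call. The set.seed line is an addition for reproducibility, not something the slide uses, and the seed value is arbitrary.

```r
# Fix the random seed so the same numbers come out on every run
set.seed(2014)

# sample size 10, mean 5, standard deviation 5
values <- rnorm(10, 5, 5)
values

length(values)  # one number per requested sample
mean(values)    # near 5, but not exactly 5 for a small sample
```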

Practice: rnorm function

•  Mean and standard deviation are optional
•  If you don't specify them, they default to:
– 0 default mean
– 1 default sd

R tip!

•  Use UP arrow key to retrieve previous command
– Saves typing

Practice: R allows named arguments

rnorm(10,mean=5,sd=2)

•  Order can vary
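
One way to convince yourself that the order of named arguments does not matter is to run the same call twice with the same random seed. The seed and the variable names a and b are illustrative assumptions.

```r
set.seed(1)
a <- rnorm(10, mean = 5, sd = 2)

set.seed(1)
b <- rnorm(10, sd = 2, mean = 5)   # same named arguments, different order

# With the same seed, both calls return the same numbers
identical(a, b)
```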

help shows how to use a function

•  Type help(rnorm) to list arguments, defaults
•  help is a function
– takes other functions as arguments
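
A short sketch of looking up a function. The args call is an extra shown here as a quicker way to see argument names and defaults; it is not mentioned on the slide.

```r
# Open the help page for rnorm (in RStudio it appears in the Help tab)
help(rnorm)

# ?rnorm is a shorthand for the same thing.
# args() prints just the argument names and their defaults
args(rnorm)
```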

Now you know how to...

•  Calculate values & see the result
•  Save output to variables
•  Use Environment tab to view variables
•  Use R functions

Next --- plotting!!!

R plotting functions

•  Many options
– generic x-y plot, scatter plots
– barplots
– dendrograms
– histograms ... and much more
•  Highly configurable!
– log or linear scale axes
– different characters or colors for points ... and much more

Practice: Generic x-y plot (scatter plot)

•  named argument main determines plot title
•  Note: Enclose text in quotes

Practice: Try other options

•  col - color of points (in quotes)
•  pch - point character
– numeric code
– letter (in quotes)
•  and many more..
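
For instance, col and pch could be combined as below. The colors, the pch values, and the random data are illustrative choices, not the ones used in the workshop.

```r
set.seed(2014)
x <- rnorm(50)
y <- rnorm(50)

# col takes a color name in quotes; pch = 16 is a filled circle
plot(x, y, main = "Blue filled circles", col = "blue", pch = 16)

# pch can also be a single letter in quotes
plot(x, y, main = "Letters as points", col = "red", pch = "x")
```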

Practice: Histogram (hist)

•  main - plot title (in quotes)
•  col - color of bars (in quotes)
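
A sketch of a histogram with both options set. The sample size, mean, sd, and color are assumptions picked for the example.

```r
set.seed(2014)
values <- rnorm(1000, mean = 5, sd = 2)

# hist draws the plot and also returns the bins it used
h <- hist(values, main = "1000 draws from a normal distribution",
          col = "lightblue")

# The bin counts add up to the number of values
sum(h$counts)
```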

Practice: Adding to a plot (1)

•  abline - "a b line"
– add straight line
•  Arguments:
– v or h for location of vertical or horizontal line
– a and b for y intercept and slope
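
A minimal sketch of abline in use. Note that in abline, a is the y intercept and b is the slope. The data and line positions are made up for illustration.

```r
set.seed(2014)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

plot(x, y, main = "Adding lines with abline")

abline(v = 0, col = "gray")        # vertical line at x = 0
abline(h = 0, col = "gray")        # horizontal line at y = 0
abline(a = 0, b = 2, col = "red")  # intercept a = 0, slope b = 2
```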

Practice: Adding to a plot (2)

•  points
– add points to a plot
•  Arguments:
– x, y: x & y values for the points
– other options, same as for plot
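
Here is one way points might be used to put a second group on an existing plot. The two groups are random numbers invented for the example; xlim and ylim are set so both groups fit.

```r
set.seed(2014)
x1 <- rnorm(30)
y1 <- rnorm(30)
x2 <- rnorm(30, mean = 1)
y2 <- rnorm(30, mean = 1)

# Start the plot with the first group; leave room for both groups
plot(x1, y1, col = "blue", pch = 16,
     xlim = range(c(x1, x2)), ylim = range(c(y1, y2)),
     main = "Two groups on one plot")

# Add the second group to the existing plot
points(x2, y2, col = "red", pch = 17)
```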

Take-home: In R you can "script" a plot

•  Using plotting commands like points, abline, lines you can add more data to a plot, element by element
•  Most plotting commands accept the same options, like
– pch - point character
– col - color
•  Learning one plotting command helps you learn many.

Practice: Graphics demo

•  Enter demo(graphics)
•  Type ENTER to see next plot

Part 2 - R Markdown

How to install knitr

•  Go to Packages tab
•  Not checked?
– Check it
•  Not installed?
– Select Tools > Install Packages...
– Enter knitr
– Click Install
•  May need to restart RStudio
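
The same thing can be done from the console instead of the menus. This is a sketch, not the workshop's own method: it installs only when knitr is missing and R is running interactively, so it is safe to run from a script.

```r
# TRUE if knitr is already in your R library
knitr_installed <- requireNamespace("knitr", quietly = TRUE)

# Install only when it is missing and you are at an interactive prompt
if (!knitr_installed && interactive()) {
  install.packages("knitr")
}

# Loading the package is what checking the box in the Packages tab does
if (requireNamespace("knitr", quietly = TRUE)) {
  library(knitr)
}
```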

Setup - to enable better coding!

Go to Tools > Global Preferences > Panes

•  Top right: console
•  Lower right: Environment, History, Files, Plots, Help
•  Top left: Source
•  Lower left: everything else

Practice: Make R Markdown file

•  Click "new" file icon
•  Choose R Markdown
– Creates an example R Markdown
•  Take a moment to scan document

R Markdown has plain text with formatting instructions

•  Row of "===" makes "Title" a top level heading

R Markdown has code chunks

•  Code chunk - three back ticks, {r}, ends with three more back ticks
•  gray background
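
To make the layout concrete, the sketch below builds a tiny R Markdown file line by line and prints it. The file contents and the temporary file location are made up for illustration; the example file RStudio creates for you will differ.

```r
# Three back ticks, built with strrep so they are easy to see
ticks <- strrep("`", 3)

rmd_lines <- c(
  "Title",
  "========",                 # a row of = makes "Title" a top level heading
  "",
  "Some plain text that explains the analysis.",
  "",
  paste0(ticks, "{r}"),       # code chunk starts
  "summary(cars)",            # R code inside the chunk
  ticks                       # code chunk ends
)

# Write to a temporary file and show what an .Rmd file looks like
rmd_file <- file.path(tempdir(), "Example.Rmd")
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```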

knitr "knits" code & text

•  Makes an HTML document (web page) that combines
– code
– output from code
– your text explanations
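
A rough sketch of knitting from the console, assuming knitr is installed. It uses knitr::knit, which runs the chunks and writes a Markdown file; the Knit HTML button in RStudio goes one step further and converts the result to HTML. The file names and contents are hypothetical.

```r
ticks <- strrep("`", 3)

# A tiny R Markdown file with one heading, one sentence, one chunk
rmd_file <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(c("Demo", "====", "", "The mean of 1 to 10:", "",
             paste0(ticks, "{r}"), "mean(1:10)", ticks), rmd_file)

md_file <- file.path(tempdir(), "knit_demo.md")
if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() runs each chunk and writes text, code, and output to md_file
  knitr::knit(rmd_file, output = md_file, quiet = TRUE)
  cat(readLines(md_file), sep = "\n")
}
```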

Practice: Knit HTML

•  Save the file as "Example.Rmd"
•  Click Knit HTML
•  Preview appears
•  HTML file appears
•  Click Example.html in File tab
– choose View in Web browser

knitr makes an HTML document (a Web page)

•  Images embedded
•  You can email it, save in a Dropbox, etc.

Practice: Edit Example

•  Edit plain text
•  Edit code chunks

Practice: Run commands in Markdown

•  Put cursor inside code chunk
•  Type CTRL-ENTER
– or click run

Shortcut: Chunks menu (top right)

•  Put cursor in a chunk
•  Use Run Current Chunk to run entire chunk
•  Or Run All

Practice: Edit Markdown, make plot look nicer

•  Use col to add color
•  Use las to change orientation of y axis numbers
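
As a sketch, assuming the chunk being edited plots the built-in cars dataset (an assumption; your example file may plot something else), the two options could be added like this. The title and color are arbitrary.

```r
# cars ships with R: car speeds and stopping distances
# las = 1 draws the axis numbers horizontally
plot(cars, main = "Stopping distance vs. speed", col = "blue", las = 1)
```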

Practice: Run the new code

•  Put cursor inside code chunk
•  Type CTRL-ENTER
– or click run

Practice: knit your Markdown

Statistical tests in R

•  Tests implemented as functions
– Usually return list objects
•  List is
– object that contains other objects of many types
•  Previously, you saw vectors
– Output of rnorm command
– Vectors are like lists that only contain one type of object (e.g., numbers only)
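
The difference between a vector and a list can be seen with a few made-up objects. The names v, l, and mixed are illustrative only.

```r
# A vector holds values of a single type
v <- c(1.5, 2, 3.25)
class(v)

# A list can mix types: numbers, text, even other vectors
l <- list(count = 3, label = "control", values = v)
class(l)
names(l)

# Mixing types in a vector converts everything to one type
mixed <- c(1, "a")
class(mixed)
```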

Practice: Start a new section

•  Heading, smaller than title heading
•  Make new code chunk
•  Make new vectors
•  Run t.test
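
The code inside the new chunk might look something like this. The two vectors are random samples invented for the example (the names control and treatment, the sample sizes, and the means are assumptions), and the output is saved in a variable called result.

```r
set.seed(2014)

# Two made-up samples with different means
control   <- rnorm(10, mean = 5, sd = 1)
treatment <- rnorm(10, mean = 7, sd = 1)

# Save the output of t.test in a variable
result <- t.test(control, treatment)
```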

Tip: Markdown help

•  "Using R Markdown" opens a Web page w/ more info
•  "Markdown Quick Reference" shows Markdown codes in Help tab

Practice: Run the code

•  Cursor inside chunk
•  Type CTRL-ENTER
– or click run
•  t.test output is in result
•  result is a list

Practice: Type result (variable name) in console for a summary

Practice: Result is a list with named components

•  Use names function to find what it contains
•  Use $ to retrieve named components
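
A small sketch of inspecting a t.test result. The input data are random numbers made up so the example runs on its own.

```r
set.seed(2014)
result <- t.test(rnorm(10, mean = 5), rnorm(10, mean = 7))

result            # printed summary
names(result)     # what the list contains
result$p.value    # pull out one component by name
result$statistic  # the t statistic
```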

Differential expression analysis walk-through

Effects of mild chronic heat stress on gene expression in tomato pollen

Goals

•  Show you how to structure a data analysis
– Useful framework you can use in many settings
•  Give you an example differential gene expression analysis for RNA-Seq
– Use it as a starting point for other projects
– Tip: Review edgeR user guide for other example data analyses

Structure of the data analysis

•  Introduction
– explain the experimental design
– state questions (no more than 3, ideally 2)
•  Analysis
– describe steps of analysis, with results
– explain judgment calls, like P value cutoffs
•  Conclusion
– answer the original questions
•  State limitations of the analysis
•  Session info including software versions used

Adapted from Jeff Leek's Data Analysis, Coursera
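
For the session info item, one common approach (a suggestion, not necessarily what the workshop files do) is to end the report with a chunk that calls sessionInfo:

```r
# Records the R version, operating system, and loaded package versions
info <- sessionInfo()
info

# Just the R version as a single line of text
R.version.string
```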

Practice: Setup

•  Go to https://bitbucket.org/lorainelab/tomatopollen

Download repository

Move to Desktop

•  Subfolders correspond to analysis chunks
– See README.md for details
•  Open DifferentialExpression

Folder name suffix based on repo version

Double-click ".Rproj" file in Differential Expression folder

•  Opens a new RStudio window

Review of the experiment

•  Tomato plants subjected to chronic mild heat stress & control
– Greenhouse C
– Greenhouse B
•  Mature pollen grains harvested in batches over eight weeks, ~10 plants per batch
– One treatment sample, one control sample per collection
•  RNA extracted, sent to UCLA for sequencing
– 10 libraries, 5 treatments, 5 controls, 69 base paired-end sequencing

Next: Step-by-step walk-through of R Markdown
