20140710 6 c_mason_ercc2.0_workshop

Mul$-‐pla(orm
and
cross-‐methodological

reproducibility
of
transcriptome

proﬁling
by
RNA-‐seq
in
the
ABRF
Next-‐
Genera$on
Sequencing
Study
(ABRF-‐NGS)

and
FDA’s
SEQC
Study

Christopher
E.
Mason,
Ph.D.

Assistant
Professor

Department
of
Physiology
and
Biophysics
&

The
Ins$tute
for
Computa$onal
Biomedicine
at

Weill
Cornell
Medical
College
and
the

Tri-‐Ins$tu$onal
Program
on
Computa$onal
Biology
and
Medicine

Aﬃliate
Fellow
of
the
Informa$on
Society
Project,
Yale
Law
School

July
10th,
2014
@mason_lab

0.

Background

1.  RNA
Standards

2.  Removing
RNA-‐Seq
Varia$on

3.  Epitranscriptome

4.  #JustSequenceIt

Exome:

2%
of
the

genome?

Human Genome Organization
from
Mason,
State,
and
Moldin,
Kaplan
&
Sadock’s
Comprehensive
Textbook
of
Psychiatry,
2009

ENCODE
ac$ve
elements

“These
data
enabled
us
to
assign
biochemical
func$ons
for

80%
of
the
genome,

in
par$cular
outside
of
the
well-‐studied
protein-‐coding
regions.”

“The
ENCODE
results
were
predicted
by
one
of
its
authors
to

necessitate
the
rewri$ng
of
textbooks.”

“We
agree,
many
textbooks
dealing
with
marke;ng,
mass-‐media

hype,
and
public
rela;ons
may
well
have
to
be
rewriAen.”

GENCODE
annota;on
(v19)

July
10th,
2014

Coding
genes:
20,345

Noncoding
genes:
22,883

Psuedogenes:
14,206

Other
genes:
516

57,820
total

RNA-‐Seq
and
all
its
ﬂavors

create
excitement

Differential expression by
gene, exon, splice isoform,
allele, & transcript!
Algorithms: STAR, r-make, ASE,
limma-voom, RSEM"
3!
reads"
alignments"
Sorted
BAMs"
GENCODE "
annotation"
queries "
Reads/bp"
Alignment on HPC nodes"
References!
hg19"
RefSeq"
miRBase"
rRNA"
Adapters"
Find ncRNAs and new genes!
Algorithms: r-make, Aceview"
4!
Sequencing Data"
Gene fusion detection!
Algorithms: r-make, Snowshoes"
5!
Genetic variation (SNVs and
indels)!
Algorithms: STAR/GATK, r-make"
2!
Predict polyA sites & gain/loss
of miRNA binding sites!
Algorithms: r-make, BAGET
AlexaSeq, TargetScan"
6!
!
Viruses/Bacteria/Other
Algorithm: MG-RAST"
Remaining Reads!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT
CTGCTTTAGGATAGATCGATAGCTAGTTCATC!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT!
7!
1!
RNA-‐seq
=
Love

R-‐make
tools

But!

There
is
some
noise

What
is
the
source
of
the
wiggles?
Genes

ESTs

Liver
Kidney
Adult Brain
Fetal Hippocampus
Fetal Amygdala

The
Dirty
Dozen:
>=
12
Sources
of
Technical
Noise
in
RNA-‐Seq

(1)
RNA
integrity:
Sample
purity
or
degrada$on

(2)
Sample
RNA
complexity:
polyA
RNA,
total
RNA,
miRNA

(3)
cDNA
synthesis:
random
hexamer
vs.
polyA-‐primed

(4)
Library
isola;on:
Gel
excision
vs.
column

(5)
Technical
Errors:
Machine,
Site,
Lane,
Technician,
Library
Size

(6)
Amplifica;on
Cycles
or
Methods:
NuGen,
Tn5,
Phi29

(7)
Input
amount:
(1,
10,
100,
1000
cells)

(8)
Algorithms:
for
alignment
and
assembly

(9)
Fragment
size
distribu;on:
Paired-‐End,
Single-‐End
(adaptors)

(10)
Liga;on
Efficiency:
Mul$plexing/Barcoding
and
RNA
ligases

(11)
Depth
of
Sequencing:
cost/benefit
point

(12)
RNA
fragmenta;on:
ca$on,
enzyma$c

Which
type
of
RNA?

Type Abbreviation Function Organisms
7SK$RNA$ 7SK$ negatively$regulating$CDK9/cyclin$T$complex$ metazoans
Signal$recognition$particle$RNA$ 7SRNA Membrane$integration$ All$organisms$
Antisense$RNA$ aRNA$ Regulatory All$organisms$
CRISPR$RNA$ crRNA$ Resistance$to$parasites Bacteria$and$archaea$
Guide$RNA$ gRNA$ mRNA$nucleotide$modification$ Kinetoplastid$mitochondria$
Long$noncoding$RNA$ lncRNA XIST$(dosage$compensation),$HOTAIR$(cancer) Eukaryotes$
MicroRNA$ miRNA$ Gene$regulation$ Most$eukaryotes$
Messenger$RNA$ mRNA$ Codes$for$protein$ All$organisms
PiwiRinteracting$RNA$ piRNA$ Transposon$defense,$maybe$other$functions$ Most$animals$
Repeat$associated$siRNA$ rasiRNA$ Type$of$piRNA;$transposon$defense$ Drosophila$
Retrotransposon$ retroRNA selfRpropagation Eukaryotes$and$some$bacteria$
Ribonuclease$MRP$ RNase$MRP$ rRNA$maturation,$DNA$replication$ Eukaryotes$
Ribonuclease$P$ RNase$P$ tRNA$maturation$ All$organisms$
Ribosomal$RNA$ rRNA$ Translation$ All$organisms
Small$Cajal$bodyRspecific$RNA$ scaRNA$ Guide$RNA$to$telomere$in$active$cells Metazoans
Small$interfering$RNA$ siRNA$ Gene$regulation$ Most$eukaryotes$
SmY$RNA$ SmY$ mRNA$transRsplicing$ Nematodes$
Small$nucleolar$RNA$ snoRNA$ Nucleotide$modification$of$RNAs$ Eukaryotes$and$archaea$
Small$nuclear$RNA$ snRNA$ Splicing$and$other$functions$ Eukaryotes$and$archaea$
TransRacting$siRNA$ tasiRNA$ Gene$regulation$ Land$plants$
Telomerase$RNA$ telRNA Telomere$synthesis$ Most$eukaryotes$
TransferRmessenger$RNA$ tmRNA$ Rescuing$stalled$ribosomes$ Bacteria$
Transfer$RNA$ tRNA$ Translation$ All$organisms
Viral$Response$RNA$ viRNA AntiRviral$immunity C$elegans
Vault$RNA$ vRNA$ selfRpropagation Expulsion$of$xenobiotics
Y$RNA$ yRNA RNA$processing,$DNA$replication$ Animals$

And
-‐
which
one
do
we
use?

Technologies
Bifurcate
into
two
main
realms:

Platform Instrument Template0Preparation Chemistry Avearge0Length Longest0Read
Illumina HiSeq2500 BridgePCR/cluster Rev.<Term.,<SBS 100 150
Illumina HiSeq2000 BridgePCR/cluster Rev.<Term.,<SBS 100 150
Illumina MiSeq BridgePCR/cluster Rev.<Term.,<SBS 250 300
GnuBio GnuBio emPCR HybFAssist<Sequencing 1000* 64,000*
Life<Technologies SOLiD<5500 emPCR Seq.<by<Lig. 75 100
LaserGen LaserGen emPCR Rev.<Term.,<SBS 25* 100*
Pacific<Biosciences RS Polymerase<Binding RealFtime 1800 15,000
454 Titanium emPCR PyroSequencing 650 1100
454 Junior emPCR PyroSequencing 400 650
Helicos Heliscope adaptor<ligation Rev.<Term.,<SBS 35 57
Intelligent<BioSystems MAXFSeq Rolony<amplification TwoFStep<SBS<(label/unlabell) 2x100 300
Intelligent<BioSystems MINIF20 Rolony<amplification TwoFStep<SBS<(label/unlabell) 2x100 300
ZS<Genetics N/A Atomic<Lableing Electron<Microscope N/A N/A
Halcyon<Molecular N/A N/A Direct<Observation<of<DNA N/A N/A
Platform Instrument Template0Preparation Chemistry Avearge0Length Longest0Read
IBM<DNA<Transistor N/A none Microchip<Nanopore N/A N/A
NABsys N/A none Nanochannel N/A N/A
Bionanogenomics N/A anneal<7mers Nanochannel 10,000* 1,000,000*
Life<Technologies PGM emPCR SemiFconductor 150 300
Life<Technologies Proton emPCR SemiFconductor 120 240
Life<Technologies Proton<2 emPCR SemiFconductor 400* 800*
Genia N/A none Protein<nanopore<(aFhemalysin) N/A N/A
Oxford<Nanopore MinION none Protein<Nanopore 10,000 10,000*
Oxford<Nanopore GridION<2K none Protein<Nanopore 10,000 500,000*
Oxford<Nanopore GridION<8K none Protein<Nanopore 10,000 500,000*
*Values<are<estimates<from<companies<that<have<not<yet<released<actual<data
Optical<Sequencing
Electical<Sequencing
Table<1:<Types<of<HighFThroughput<Sequencing<Technologies
Mason
CE,
Porter
SG,
Smith
TM.
Adv
Exp
Med
Biol.
2014

0.

Genomes

1.  RNA
Standards

2.  Removing
RNA-‐Seq
Varia$on

3.  Epitranscriptome

4.  #JustSequenceIt

The complexity of the transcriptome is vast
exon1

exon2

exon3

exon1-‐exon2

exon1-‐exon3

exon2-‐exon3

exon1-‐exon2-‐exon3

6
63
15

7
127
21

8
255
28

3
7
3

4
15
6

5
31
10

1
1
0

2
3
1

Exons
Variants
Junc;ons

2n-1
Exon
1
Exon
2
Exon
3

Exon
1
Exon
2
Exon
3

n(n-1)
2
Exon4

Exon4

exon4

exon1-‐
exon4

exon2-‐exon4

exon3-‐exon4




exon1-‐exon2-‐exon3-‐exon4

8x1083 theoretical transcript combinations
8x1080 atoms in the universe
(159 atoms/star, 111 stars/galaxy, 110 galaxies)Li
and
Mason,
ARGHG,
2014

Es$mate
of
False
Junc$on
Rate

5’
3’

Exon-‐Edge
DB

(3’-‐5’
junc$ons)

2 3 41Exon # :
False
Exon-‐Edge
DB

(3’-‐3’
junc$ons)

False
Exon-‐Edge
DB

(5’-‐5’
junc$ons)

False
Exon-‐Edge
DB

(5’-‐3’
junc$ons)

5’
3’

5’
3’

5’
3’

FDA’s Sequencing Quality Control (SeQC) Consortium:
Seq Standards from 323 members of
Government (FDA), Industry, Academia
>RNA sequence from pilot study based on MAQC
samples and MAQC design:
~Pilot data from 2008-2012
MAQC Sample A/B, & ERCCs from:
•  454
•  Helicos
•  Illumina
•  Lifetech
~1000Gb of Illumina BodyMap data
~300Gb of experimental RNA-Seq

Associa;on
of
Biomolecular
Research
Facili;es
(ABRF)

Next
Genera;on
Sequencing
(NGS)
Study:

Phase
I:
RNA-‐Seq
and
degraded
RNA-‐Seq
(2011-‐2013)

Phase
II:
DNA-‐Seq
and
Hard-‐to-‐sequence
regions
(2014-‐2016)

Phase
III:
Asteroid
and
Mar$an
Surface
Sequencing
(2017-‐2050?)

1.  Cross
Pla(orm:
HiSeq2000,
454,
PGM,
Proton,
PacBio

2.  Cross-‐Protocol:
ribo-‐,
direc$onal,
non-‐direc$onal,
degraded
RNA

3.  Cross-‐Site:
3
sites
for
each
pla(orm,
replicates
at
each
site

Improve
the
detec$on
and
quan$ﬁca$on
of
molecular
signatures

for
use
in
basic,
applied,
and
clinical
research.

Speciﬁcally,
the
ABRF-‐NGS
group
will
test
and
characterize
the
myriad,

rapidly
evolving
NGS
techniques
for
RNA-‐Seq
and
DNA-‐Seq.

Samples
of
the
MAQC,
SEQC
and
ABRF-‐NGS
Study

ß
SEQC
samples
(FDA)
A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Paciﬁc Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
A:
10
cancer
cell
lines

B:
Brain
$ssues

A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-‐pla(orm
experimental
design

A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-‐Site
experimental
design

A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-‐Protocol
experimental
design

All
technologies
have
strengths
&
weaknesses

Vendor Instrument Version
Run Time
(hours)
Read
Length
(mean)
Reads
per Run
(million)
Yield per
Run
Cost per
Run
($)1
Cost
per Mb
($)
Paired-
end
Application Strengths
Illumina HiSeq 2000 High Output 132 50 6,000 300 Gb2
18,725.00 0.06 Yes
Gene expression; splice
junction detection;
variant calling; fusions
Deep read counts for
transcript
quantification
Illumina HiSeq 2500 High Output 132 50 6,000 300 Gb2
18,725.00 0.06 Yes as above as above
Illumina MiSeq v2 kit 39 250 30 7.5 Gb 982.75 0.13 Yes
Splice junction
detection; variant calling
Rapid transcript
quantification and
variant detection
Life
Technologies
PGM 318 chip 7.3 176 6 1.056 Gb 749.00 0.71 No
Splice junction
detection; variant calling
Rapid transcript
quantification and
variant detection
Life
Technologies
Proton
Proton I
chip
2 to 4 81 70 5.67 Gb 834.00 0.15 No
Gene expression;
variant calling
Good read depth and
length for transcript
quantification
Pacific
Biosciences
RS RS 0.5 to 2 1,289 0.03 38.67 Mb 136.38 3.53 No
Splice junction
detection; full length
gene coverage
Extended read
lengths
Roche 454
Genome
Sequencer
FLX+
20 686 1 686 Mb 5,985.00 8.72 No Splice junction detection Read length
1
sequencing reaction reagents, at academic list price U.S. dollars, and does not include library preparation reagents, labor, data storage or analysis, equipment
or maintenance; 2
one sequencing run using 2 flowcells. Pacific Biosciences is calculated based on CCS reads; Gb: billion bases, Mb: million bases
Table 1. Summary of RNA-seq platforms as deployed in the ABRF-NGS Study.

I:
Inter-‐pla(orm
performance

Error
models
highly
variable
among

pla(orms

0%
2%
4%
6%
454
ILMN
PAC
PGM
PRO
platform
Indelsmismatchspercentage
b
454:
Roche
454
GS
FLX+

ILMN:
Illumina
HiSeq
2000/2500

PAC:
Paciﬁc
Biosciences
RS
I

PGM:
Ion
Personal
Genome
Machine

PRO:
Ion
Torrent
Proton

Full
length
genebody
coverage

Coverage across genebody (%)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
5'
platform
454
sample
site
Y
type
intact
type
site
sample
platformm
454
ILMN
PAC
PGM
PRO

Note:
PAC
used
CCS
read
only;
PAC
on
average
only
cover
3k
GENCODE
genes.

Inter-‐site
normalized
gene
expression

varia$on:
CV
and
R2

●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●●
●●
●
●
●
●●
●●
●
●
●
●●
●
●
●●
●
●●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●●●●
●●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●●●
●
●●●●●
●
●
●
●●
●●●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●●
●●●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●●●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●●
●●
●●
●●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●●
●●
●●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●●●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●
●●
●●
●●●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●●●
●●
●
●
●●
●
●●
●
●●
●
●
●●●
●
●
●●●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●●●
●
●
●
●
●●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●●●●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●●●●
●
●
●●
●●
●●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●●●●
●
●
●●
●
●
●
●
●
●
●●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●●●
0
50
100
150
200
454−A
454−B
ILMN−A
ILMN−B
PAC−A1
PAC−A2
PAC−A3
PAC−B1
PAC−B2
PAC−B3
PGM−A
PGM−B
PRO−A
PRO−B
Coefficientofvariation(%)
454 ILMN PAC PGM PRO
intra−platform
A
intra−platform
B
inter−platform
A
inter−platform
B
0.92
0.97
0.91
0.96
0.85
0.98
0.95
0.95
0.82
0.88
0.85
0.84
0.93
0.89
0.75
0.82
0.78
0.86
0.92
0.92
0.00
0.25
0.50
0.75
1.00
454
ILMN
PGM
PRO
454
ILMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
platform
Correlationcoefficients
● 454 ILMN PAC PGM PRO
● ●A B
es
A B● 454 ILMN PAC PGM PRO
● ●A B
gd e f
a b
ILMN
with
lowest
inter-‐site
CV
ILMN
with
highest
inter-‐site
R2

ILMN-‐PRO
&
PRO-‐PGM
with
highest
R2

Genes
detec$on
is
log-‐linear;

Junc$on
detec$on
is
length-‐dependent
454−A
454−B
ILMN−A
ILMN−B
PAC−A1
PAC−A2
PAC−A3
PAC−B1
PAC−B2
PAC−B3
PGM−A
PGM−B
PRO−A
PRO−B
454
LMN
PGM
●
●
50000
100000
150000
200000
9.0
9.5
10.0
10.5
11.0
bases sequenced (log10)
detectedjunctions
● ●A B
●●
10000
20000
30000
9.0
9.5
10.0
10.5
11.0
bases sequenced (log10)
detectedgenes
● ●A B
d e

Junc$ons
detec$on
eﬃciency
highly

variable;
agreement
common

PRO
454
LMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
LMN−PGM
LMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
LMN−PGM
LMN−PRO
PGM−PRO
platform
A−AH
A−AS
A−AR
AH−AS
AH−AR
AR−AS
B−BR
sample
0
25
50
75
454
ILMN
PAC
PGM
PRO
platform
junctionspermillionbases
A B
known novel
0
20000
40000
60000
1
2
3
4
5
3
4
5
occurrence among platforms
junctions
A Bgf
Note:
Only
use
a
subset
of
reads
among
pla(orms

to
normalize
the
scale

Most
of
the
known
junc$ons
are

shared
by
at
least
3
pla(orms

Sequencing
depth
is
important
to

discover
low
abundance
transcripts

Inter-‐pla(orm
diﬀeren$al
gene

expression
show
88-‐97%
agreement

Shared
sets
of
greater

than
1000
genes
are

indicated
in
red,

100-‐999
yellow,

<100
blue.

Unique
DEGs:

454

-‐
3.0%

POLYA
-‐
9.2%

RIBO

-‐
8.8%

PRO

-‐
11.9%

PGM

-‐
3.9%

Propor$onal
venn
diagrams
don’t

always
add
clarity,
but
they
are
prewy
RIBO
POLYA
454
PGM
PRO
1766
482
2046
202
69
105
2199
1104
484158
678 76 1828
740
1010
401
133
559
62
20
36595
2792
1187
406
1791
212
57
99
2151

II:
Inter-‐protocol
quan$ﬁca$ons

•  Illumina

– Poly-‐A
selec$ons
(POLYA)

– Ribo-‐deple$on
(RIBO)

Gene
regions
distribu$on
varies
between
protocols

a" b"MappingDistribution(%)!
0%
25%
50%
75%
100%
Sample
Mappingdistribution(%) intergenic mito exon 5utr intron ribo
a

Very
similar
gene
measurements
of
genes

(RPKM)
between
polyA
&
Ribo-‐deple$on

Ribo0>PolyA+ PolyA>Ribo0+
RPKM%diﬀerence%(polyA%–%ribo0)%
Number%of%Genes%

ENSEMBL Gene ID Gene Symbol Description Length PolyA
Reads
RiboDep
Reads
PolyA
FPKM
RiboDep
FPKM
FPKM Diff
ENSG00000210082 J01415.4 Mt_rRNA 1559 17040155 133679955 18568.31 200404.74 7181836.43
ENSG00000211459 J01415.24 Mt_rRNA 954 2232892 14445488 3976.16 35389.28 731413.11
ENSG00000202198 RN7SK misc_RNA 331 3303 2818202 16.95 19899.03 719882.08
ENSG00000258486 RN7SL1 antisense 300 3061 520624 17.33 4055.93 74038.60
ENSG00000202364 SNORD3A snoRNA 216 718 186779 5.65 2020.98 72015.33
ENSG00000202538 RNU472 snRNA 141 168 102560 2.02 1699.99 71697.97
ENSG00000251562 MALAT1 lincRNA 8708 437102 4931496 85.27 1323.57 71238.30
ENSG00000199916 RMRP misc_RNA 264 148 83019 0.95 734.96 7734.00
ENSG00000201098 RNY1 misc_RNA 113 172 19210 2.59 397.32 7394.73
ENSG00000238741 SCARNA7 snoRNA 330 53 50182 0.27 355.40 7355.13
ENSG00000200795 RNU471 snRNA 141 35 18724 0.42 310.36 7309.94
ENSG00000200087 SNORA73B snoRNA 204 336 23778 2.80 272.42 7269.62
ENSG00000199568 RNU5A71 snRNA 116 38 10224 0.56 205.99 7205.44
ENSG00000212232 SNORD17 snoRNA 237 102 20571 0.73 202.86 7202.13
ENSG00000207008 SNORA54 snoRNA 123 8 9655 0.11 183.46 7183.35
ENSG00000200156 RNU5B71 snRNA 116 28 8301 0.41 167.25 7166.84
ENSG00000254911 SCARNA9 antisense 353 501 19842 2.41 131.37 7128.96
ENSG00000230043 TMSB4XP6 pseudogene 135 0 6272 0.00 108.58 7108.58
ENSG00000239039 SNORD13 snoRNA 104 0 4521 0.00 101.60 7101.60
ENSG00000223336 RNU276P snRNA 190 22 6365 0.20 78.29 778.10
Supplemental Table 3 - Top 25 Genes with Highest Enrichment from Ribo-Depletion Preparation

polyA
favors
high
expressors

•  Gencode14

•  Unique
genes:
55889

0

5,000

10,000

15,000

20,000

25,000

30,000

≥
2
≥
4
≥
8
≥
16
≥
32
≥
64
≥
128

PolyA

RiboZero

0

5,000

10,000

15,000

20,000

25,000

30,000

≥
2
≥
4
≥
8
≥
16
≥
32
≥
64
≥
128

Read
count

Genome-‐level
Gene
Counts
Gene
Overlap

Read
count

High-‐overlap
DEGs
and

comparable
accuracy

FDR: 0.001 FDR: 0.01 FDR: 0.05
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
FC:1.5FC:2
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
comparison
Proportion
type POLYA RIBO Shared
b"
B−C
itivity)
.81.0
FDR: 0.05; FC:1.5
B−D
itivity)
.81.0
FDR: 0.05; FC:1.5
C−D
itivity)
.81.0
FDR: 0.05; FC:1.5
FC: 1.5
FDR: 0.001
FC: 1.5
FDR: 0.01
FC: 1.5
FDR: 0.05
FC: 2
FDR: 0.001
FC: 2
FDR: 0.01
FC: 2
FDR: 0.05
0.0
0.2
0.4
0.6
0.8
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
comp
Externalvalidation(MCC)
RIBO POLYAc
Using
TaqMan
as
external
valida$on
812
reac$ons

0.0
0.5
1.0
1.5
2.0
0
10
20
30
40
50
60
70
80
90
100
Gene (%, 5'−3')
Coverage(%)
UHR−RIN0 UHR−RIN3 UHR−RIN6 UHR−RIN9

Genetic variation (SNVs, indels, CNVs)!
2!
Differential expression by gene, exon,
splice isoform, allele, & transcript!3!
Predict gene fusions, polyA sites and
alterations to miRNA binding sites!
5!
!
Viruses/Bacteria/Other"
De Novo Assembly of Remaining Reads!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT
CTGCTTTAGGATAGATCGATAGCTAGTTCATC!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT!
6!
reads"
alignments"
Sorted
BAMs"
annotations"
queries "
Reads/bp"
Alignment on HPC nodes"
References!
hg19"
RefSeq"
miRBase"
rRNA"
Adapters"
Find ncRNAs and new genes!
4!
1!
Sequencing Data"
Many
areas
could
be
impacted

Entropy
is
usually
a
source
of
fear

"FEAR is
the
main
source
of

superstition,

and
one
of
the
main
sources
of

cruelty.
To
conquer
fear
is
the
beginning
of

wisdom."

–
Bertrand
Russell

HBR−RIN9
HBR−RIN0
UHR−RIN0
UHR−RIN3
UHR−RIN9
UHR−RIN6
HBR−RIN9
HBR−RIN0
UHR−RIN0
UHR−RIN3
UHR−RIN9
UHR−RIN6
0.6 0.7 0.8 0.9 1
Value
Color Key

Can
we
remove
supers$$on?

HEAT

99oC
for

10
minutes

SONICATION

6
x
55
seconds

at
5%DC,

I

intensity
3

100
c/b

RNAse

RNAse
A

1
ng/uL

Un$l
RIN
<2

Degraded
RNA
looks
great!

Coverage across genebody (%)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
5'
platform
454
sample
site
Y
type
intact
type
site
sample
platformm
454
ILMN
PAC
PGM
PRO

Degraded
RNA
highly
correlates
with

intact
RNA
gene
expression

atform intra−platform
B
inter−platform
A
inter−platform
B
0.96
0.85
0.98
0.95
0.95
0.82
0.88
0.85
0.84
0.93
0.89
0.75
0.82
0.78
0.86
0.92
0.92
PRO
454
ILMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
platform
0.95
0.94
0.94
0.96
0.97
0.97
0.96
0.00
0.25
0.50
0.75
1.00
A−AH
A−AS
A−AR
AH−AS
AH−AR
AR−AS
B−BR
sample
Correlationcoefficients
c
B
A B
A
B
C
D
A
B
RS
Life Tech
Ion P
(AR)

(AS)

(AH)

(BR)

Illumina
Ribo-‐deple$on
protocol

How
Sequencing
Quality
aﬀect

Inter-‐site
gene
expression

quan$ﬁca$on?

SEQC
study

Ameliora$ng
Inter-‐site
Varia$on

SEQC
samples
Each site has
4 replicates
ILM1 ILM2
Replicate 5
ILM3
Replicate 5
ILM4 ILM5
Replicate 5
ILM6
Replicate 5 libraries
were vendor supplied
SEQC
study

HiSeq
2000

Ameliora$ng
Inter-‐site
Varia$on

SEQC
samples
Each site has
4 replicates
ILM1 ILM2
Replicate 5
ILM3
Replicate 5
ILM4 ILM5
Replicate 5
ILM6
SEQC
study

Determine
sequencing
varia$on
sources

SEQC
study

Nucleo$de
frequency
bias
from
library
prepara$on
●M5 ILM6
A G C T
23
24
25
26
27
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
Position in read
Nucleotidefrequency(%)
ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
1 5
site
1 2 3 4 5
● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
site
1 2 3 4 5
● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
f
SEQC
study


<-‐
5th
replicate

of
ILM2
and

ILM3

<-‐
1th
replicate

of
ILM2

<-‐
1th
replicate

of
ILM3

Sample
A
vs.
A
=
the
Answer
should
be
zero
DEGs;

SVA
can
remove
false
posi$ve
DEGs

te
0.957
0.958
0.958
0.971
0.973
0.970
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
1 2
1
BBBBBB
B
B
B
BBB
B
B
BBBB
BBBB
B
BBBB
●
●
●
●
●
●
ILM1
ILM2
ILM3
ILM4
ILM5
ILM6
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
ABCD
ILM1−ILM2
ILM1−ILM3
ILM1−ILM4
ILM1−ILM5
ILM1−ILM6
ILM2−ILM3
ILM2−ILM4
ILM2−ILM5
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
Comparison
FalsepositiveDEGs
Without SVA With SVA
c
Illumina

SEQC
study
Leek
JT,
Storey
JD.PLoS
Genet.2007

Compare
the
same

sample
across
sites

SVA
remove
false
posi$ve
DEGs

Validated
by
PGM
data
from
ABRF-‐NGS
study

te
0.957
0.958
0.958
0.971
0.973
0.970
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
1 2
1
BBBBBB
B
B
B
BBB
B
B
BBBB
BBBB
B
BBBB
●
●
●
●
●
●
ILM1
ILM2
ILM3
ILM4
ILM5
ILM6
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
ABCD
ILM1−ILM2
ILM1−ILM3
ILM1−ILM4
ILM1−ILM5
ILM1−ILM6
ILM2−ILM3
ILM2−ILM4
ILM2−ILM5
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
Comparison
FalsepositiveDEGs
c
Illumina
PGM

−2 −1 0 1
−0.50.00.51.01.5
Dimension 1
Dimension2
A.1
A.2 A.3
A.4
B.1
B.2
B.3
B.4
A.1
A.2
B.1
B.2
A.1
A.2
B.1B.2
●
●
●
PGM1
PGM2
PGM3
P
P
P
P
P
P
site
● ● ●PGM1 PGM2 PGM3
1 2 3 4
P
P
P
P
P
site
A
● ● ●PGM1 PGM2 PGM3
1 2 3 4
comp: A comp: B
0
50
100
150
200
PGM1−PGM2
PGM1−PGM3
PGM2−PGM3
PGM1−PGM2
PGM1−PGM3
PGM2−PGM3
Comparison
FalsepositiveDEGs
1
1
NumberofDEGs
d e f
SEQC
study
Li
et
al.,
in
press,
Nat.
Biotech.,
2014
ABRF-‐NGS
study

NIST,
SEQC,
ABRF,
ENCODE,
others

•  Provide
reference
data

resources

•  Best
prac$ces
for

–  gene
quan$ﬁca$on,

–  isoform
characteriza$on,

–  dynamic
range
comparisons,

–  managing
inter-‐site
and
intra-‐
site
varia$on,

–  analysis
pipelines,
and

–  cross-‐pla(orm
tes$ng
of

transcriptome
hypotheses

•  To
address
some
other
aspects

of
RNA-‐seq,
including

–  variant
detec$on,

–  allele-‐speciﬁc
expression,

–  RNA
edi$ng,
and
gene
fusions.

•  And
more
…

Which
transcriptome
is
important?

All
…
RNA
bases.

The
four-‐base
genome

is
just
the
beginning

There
are
many
RNA-‐mods
as
well:

Chuan
He

Abbreviation Chemical0name
m1
acp3
Ψ 1'methyl'3'(3'amino'3'carboxypropyl)5pseudouridine
m1
A 1'methyladenosine
m1
G 1'methylguanosine
m1
I 1'methylinosine
m1
Ψ 1'methylpseudouridine
m1
Am 1,2''O'dimethyladenosine
m1
Gm 1,2''O'dimethylguanosine
m1
Im 1,2''O'dimethylinosine
m2
A 2'methyladenosine
ms2
io6
A 2'methylthio'N6
'(cis'hydroxyisopentenyl)5adenosine
ms2
hn6
A 2'methylthio'N6
'hydroxynorvalyl5carbamoyladenosine
ms2
i6
A 2'methylthio'N6
'isopentenyladenosine
ms2
m6
A 2'methylthio'N6
'methyladenosine
ms2
t6
A 2'methylthio'N6
'threonyl5carbamoyladenosine
s2
Um 2'thio'2''O'methyluridine
s2
C 2'thiocytidine
s2
U 2'thiouridine
Am 2''O'methyladenosine
Cm 2''O'methylcytidine
Gm 2''O'methylguanosine
Im 2''O'methylinosine
Ψm 2''O'methylpseudouridine
Um 2''O'methyluridine
Ar(p) 2''O'ribosyladenosine5(phosphate)
Gr(p) 2''O'ribosylguanosine5(phosphate)
acp3
U 3'(3'amino'3'carboxypropyl)uridine
m3
C 3'methylcytidine
m3
Ψ 3'methylpseudouridine
m3
U 3'methyluridine
m3
Um 3,2''O'dimethyluridine
imG'14 4'demethylwyosine
s4
U 4'thiouridine
chm5
U 5'(carboxyhydroxymethyl)uridine
mchm5
U 5'(carboxyhydroxymethyl)uridine5methyl5ester
inm5
s2
U 5'(isopentenylaminomethyl)'52'thiouridine
inm5
Um 5'(isopentenylaminomethyl)'52''O'methyluridine
inm5
U 5'(isopentenylaminomethyl)uridine
nm5
s2
U 5'aminomethyl'2'thiouridine
ncm5
Um 5'carbamoylmethyl'2''O'methyluridine
ncm5
U 5'carbamoylmethyluridine
cmnm5
Um 5'carboxymethylaminomethyl'52''O'methyluridine
cmnm5
s2
U 5'carboxymethylaminomethyl'2'thiouridine
cmnm5
U 5'carboxymethylaminomethyluridine
cm5
U 5'carboxymethyluridine
f5
Cm 5'formyl'2''O'methylcytidine
f5
C 5'formylcytidine
hm5
C 5'hydroxymethylcytidine
ho5
U 5'hydroxyuridine
mcm5
s2
U 5'methoxycarbonylmethyl'2'thiouridine
mcm5
Um 5'methoxycarbonylmethyl'2''O'methyluridine
mcm5
U 5'methoxycarbonylmethyluridine
mo5
U 5'methoxyuridine
m5
s2
U 5'methyl'2'thiouridine
mnm5
se2
U 5'methylaminomethyl'2'selenouridine
mnm5
s2
U 5'methylaminomethyl'2'thiouridine
mnm5
U 5'methylaminomethyluridine
m5
C 5'methylcytidine
m5
D 5'methyldihydrouridine
m5
U 5'methyluridine
τm5
s2
U 5'taurinomethyl'2'thiouridine
τm5
U 5'taurinomethyluridine
m5
Cm 5,2''O'dimethylcytidine
m5
Um 5,2''O'dimethyluridine
preQ1 7'aminomethyl'7'deazaguanosine
preQ0 7'cyano'7'deazaguanosine
m7
G 7'methylguanosine
G+
archaeosine
D dihydrouridine
oQ epoxyqueuosine
galQ galactosyl'queuosine
OHyW hydroxywybutosine
I inosine
imG2 isowyosine
k2
C lysidine
manQ mannosyl'queuosine
mimG methylwyosine
m2
G N2
'methylguanosine
m2
Gm N2
,2''O'dimethylguanosine
m2,7
G N2
,7'dimethylguanosine
m2,7
Gm N2
,7,2''O'trimethylguanosine
m2
2G N2
,N2
'dimethylguanosine
m2
2Gm N2
,N2
,2''O'trimethylguanosine
m2,2,7
G N2
,N2
,7'trimethylguanosine
ac4
Cm N4
'acetyl'2''O'methylcytidine
ac4
C N4
'acetylcytidine
m4
C N4
'methylcytidine
m4
Cm N4
,2''O'dimethylcytidine
m4
2Cm N4
,N4
,2''O'trimethylcytidine
io6
A N6
'(cis'hydroxyisopentenyl)adenosine
ac6
A N6
'acetyladenosine
g6
A N6
'glycinylcarbamoyladenosine
hn6
A N6
'hydroxynorvalylcarbamoyladenosine
i6
A N6
'isopentenyladenosine
m6
t6
A N6
'methyl'N6
'threonylcarbamoyladenosine
m6
A N6
'methyladenosine
t6
A N6
'threonylcarbamoyladenosine
m6
Am N6
,2''O'dimethyladenosine
m6
2A N6
,N6
'dimethyladenosine
m6
2Am N6
,N6
,2''O'trimethyladenosine
o2yW peroxywybutosine
Ψ pseudouridine
Q queuosine
OHyW undermodified5hydroxywybutosine
cmo5
U uridine55'oxyacetic5acid
mcmo5
U uridine55'oxyacetic5acid5methyl5ester
yW wybutosine
imG wyosine
Table515'5List5of5Base5Modifications5Covered5by5Claims
m6A
is
just
1
of
the
110

known
RNA
modiﬁca$ons

from
the

RNA
Modiﬁca$on
Database

m6A
marks
are
widespread
in
cancer
cells’
RNA

Meyer
et
al.,
Cell,
2012

A
new
method:
MeRIP-‐Seq

Meyer
et
al.,
Cell,
2012

Conserva$ons
of
signal
and
sites
in

orthologous

genes

Meyer
et
al.,
Cell,
2012

Peaks
also
show
strong
conserva$on

Meyer,
2012

Many
puta$ve
roles
for
m6A
in
RNA

Paz-‐Yaakov
et
al,
2010

Saletore,
Chen-‐Kiang,
and
Mason,
RNA
Biology,
2013.

RNA
modiﬁca$ons
give
a
new
layer
of
cellular
regula$on

N
NN
N
NH2
O
OHO
O
RNA
RNA
N
NN
N
HN
O
OHO
O
RNA
RNA
H2
C
OH
N
NN
N
HN
O
OHO
O
RNA
RNA
C
H
O
N
NN
N
HN
O
OHO
O
RNA
RNA
C
OH
O
N
NN
N
HN
O
OHO
O
RNA
RNA
CH3
A m6A
ca6A f6A hm6A
O
H H
O
H OH
H2O
Mettl3
FTO/ALKBH5
O
2
, Fe
II
, -KG
FTO/
ALKBH5
O
2
, Fe
II
, -KG
Li
and
Mason,
ARGHG,
2014

Direct
Detec$on
of
Methyla$on
on
PacBio

70.5 71.0 71.5 72.0 72.5 73.0 73.5 74.0 74.5
0
100
200
300
400
Fluorescence
intensity(a.u.)
Time (s)
104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 108.5
0
100
200
300
400
Fluorescence
intensity(a.u.)
Time (s)
C

T
G
A
T
C
G
T
A
C

mA

A
G
T
C
T
A
A

G
C
C
A
A
A

A

Approach:
Kine$c
detec$on
of
methylated
bases
during
SMRT
DNA
sequencing

Example:
N6-‐methyladenosine
(mA)

SMRT RNA Sequencing – Assay
Design
I.  RNA template (purple) is hybridized to a DNA primer (orange) and immobilized on the bottom of a
zero-mode waveguide (ZMW) in the presence of fluorescently-labeled nucleotide analogues.
II.  After the addition of HIV RT to the solution, the polymerase binds to the RNA template at the 3’-end of
DNA primer.
III.  The first nucleotide analogue can bind to the active site of the polymerase resulting in a fluorescent
pulse with a color characteristic to the type of nucleotide (i.e. A, C, G, or T). With HIV RT, the rates of
nucleotide binding, dissociation and incorporation have not been optimized to a state where each
binding event would result in an incorporation event (image IV.). Consequently, nucleotide analogues
often dissociate before they can be incorporated resulting in a series of like pulses before the
polymerase can translocate to the next position and bind the nucleotide analogue complementary to
the next nucleotide in RNA template.
IV.  Incorporation of the nucleotide and translocation to the next nucleotide on RNA template.

First demonstration SMRT RNA Sequencing
– Expected Signal
A.  Example RNA template is shown in black. DNA primer is
shown in orange. The first cDNA strand synthesized by
HIV RT during SMRT RNA Sequencing is color-coded (A
– yellow, C – red, G – gree, T – blue). The first cDNA
strand is complementary to the unhybridized section of
the example RNA template and can be used to determine
the sequence of that section.
B.  A typical time trace obtained in a ZMW with RNA template
immobilized on the bottom surface. One can observe a
succession of blocks of like-colored pulses corresponding
to the sequence of cDNA strand (blocks are indicated by
lines above the trace).
C.  As mentioned on Slide 3 – Point III., due to the kinetics of
the current HIV RT, we observe multiple binding events at
each RNA template position before nucleotide
incorporation and translocation to the next position.
A
B
C

Saletore
et
al.,
Genome
Biology,
2012

m6A
is
dis$nct
from
A

Saletore
et
al.,
Genome
Biology,
2012

Epigenome

Epitranscriptome

Epiproteome
(PTM)

DNA
RNA
Protein

transcrip$on
transla$on

RT

ACGT

viRNA

I
N
H
E
R
I
T
A
N
C
E

or

T
R
A
N
S
M
I
S
S
I
O
N

prions

iDNA

mC,hmC,
8oxoG,
m6A

(sn/sno/g)RNA
(t/r/tm)RNA

(lnc/nc/mi/piwi/si/vi)RNA

scRNA

RbP

Ribozymes
Prions

iRNA
iProtein

mC,hmC,
8oxoG,
m6A,
…
m6A,
5’G,polyA,
…
Acet,
Phos,
Citr,
SUMO,
…

from
Saletore
et
al.,
Genome
Biology,
2012

Which
genome
is
important?

All
…
Cells.

−5 0 5
−6−4−2024
Comp.1
Comp.2
C01
C02
C03
C04
C05
C06
C07
C08
C09
C10
C11
C12
C13
C14
C15
C16
C17
C18
C19
C20
C21
C22
C23
C24
C25
C26
C27
C28
C29
C30
C31
C32
C33
C34
C35
C36
C37
C38
C39
C40
C41C42
C43
C44
C45
C46
C47
C48
C49
C50
C51
C52
C53
C54
C55
C56
C57
C58
C59
C60
C61
C62
C63
C64
C65
C66
C67
C68
C69
C70
C71
C72
C73
C74
C75
C76
C77
C78
C79
C80
C81
C82
C83
C84
C85
C86
C87
C88
C89
C90
C91
C92
C93
C94
C95
C96
Principal
Component
1

Principal
Component
2

C05
C12
C29
C33
C44
C78
C88
C35
C03
C19
C16
C20
C72
C38
C83
C01
C67
C60
C42
C52
C32
C86
C02
C91
C09
C46
C80
C11
C58
C64
C56
C77
C06
C23
C15
C90
C62
C65
C43
C48
C84
C34
C53
C69
C37
C82
C28
C76
C87
C61
C92
C85
C47
C96
C50
C59
C39
C66
C93
C26
C27
C54
C71
C07
C22
C14
C81
C08
C41
C55
C73
C70
C51
C75
C24
C94
C25
C10
C17
C04
C68
C13
C63
C18
C57
C40
C74
C36
C79
C30
C49
C21
C45
C95
C31
C89
MGST3
UROD
PSMA5
PSMB2
DHRS3
L.XKR8
HBXIP
S100A11
R.spk7mid
RPS27
PSMD4
MAST2
YBX1
SLC39A1
UBR4
B4GALT3
S.8
CNIH4
NCSTN
SRM
RNF220
WDR77
TRAPPC3
RPL11
VPS72
ILF2
NUDC
S.4
PARP1
PRDX6
BCRABL
HIST2H2BE
YARS
PI4KB
UBE2Q1
VAMP3
KIFAP3
SF3A3
TPR
DIRAS3
GNPAT
ZMPSTE24
HSD17B7
IPO13
KIF2C
CAPZA1
FAF1
MRPL9
SDHB
AHCYL1
NOC2L
HMGN2
L.AXIN2
GNL2
SFRS4
CREG1
ETV1
SOX5
MEAF6
S.2
DTL
MARCKSL1
MRPS15
JAK1
GNB1
S.10
PTCH1
ALDH9A1
RPA2
KHDRBS1
SFPQ
ARPC5
TXNIP
UFC1
RABGGTB
S.5
KIAA0907
CDC20
PRPF3
KPNA6
RPL22
ACTB
WASF2
ATP5F1
S.6
KIF14
LRRC47
0
5
10
15
20
25
30
Check

Cells

Sort
Cells,

Load
plate

Library

prep.
&

RNA-‐Seq

Lyse,

cDNA

AML-‐related
genes’
expression
(RPKM)

Puriﬁed
Cell
(1-‐96)

Single-‐cell
RNA-‐seq
reveals
extraordinary
heterogeneity

Leukemia

Sample

Which
genome
is
important?

All
…
Species.

nhprtr.org

Pipes,
Li,
Bozinoski
et
al.,
NAR,
2013

Prosimian
Human
Chimpanzee
Gorilla
Rhesus Macaque
Japanese Macaque
Cynomolgus Macaque
Pig-tailed Macaque
Vervet
Sooty Mangabey
Common Marmoset
Squirrel Monkey
Ring-tailed Lemur
Mouse Lemur
Olive Baboon
New World Monkeys
Old World
Hominoids
~70 mya
35-40
23-25
6
8
9
10
14 species selected for
phylogenetic &
pharmacogenomic
significance:
• Samples
from
Washington,
Oregon,
Yerkes,
and
Southwestern
Na$onal
Primate
Research
Centers;

the
Duke
Lemur
Center;
the
MD
Anderson
Cancer
Center,
and
Covance,
Inc.

21
Matching
“BodyMap”
Dissected
Tissues

Brain
Immune/Sex
Soma

Cerebellum
Bone
Marrow
Adipose

Frontal
Cortex
Lymph
Node
Colon

Hippocampus
Spleen
Heart

Hypothalamus
Thymus
Kidney

Pituitary
Thyroid
Liver

Temporal
Lobe
Ovary
Lung

Tes$s
Skeletal
Muscle

Whole
Blood

Phase
I
(2010
-‐2012)

RNA
extracted
from

each
$ssue,
pooled,

sequenced
on
two
FCs

(HS2K,
GAIIx)
for
overall

annota$on.

Phase
II
(2012
–
2014):

RNA
extracted
from

12
$ssues,
individually

sequenced
(1/2
lane
HS2K),

for
$ssue-‐speciﬁc

annota$on.

AceView
Genes
Expressed

Species

/isolate

Number
of

genes

expressed

Known
human

genes

Un-‐annotated

human
genes

BAB
52065
22376
29689

CMC
50073
21916
28157

CMM
50155
21974
28181

JMI
49253
21785
27468

PTM
50863
22084
28779

RMC
49845
21888
27957

RMI
49922
21913
28009

Jean
Thierry-‐Mieg

Species
Library
#
input

sequences

(billion)

#
con;gs
Total

length

(Mb)

N25

(bp)

N50

(bp)

N75

(bp)

Longest

con;g
(bp)

Baboon

TOT
0.149
658,581
281
1,029
434
276
35,170

UDG
1.543
1,131,951
844
3,534
1,368
479
131,395

Chimpanzee
UDG
1.465
987,615
1,433
6,356
3,806
1,666
47,873

Cynomologus

Macaque
Chinese

RNA
1.676
911,282
864
4,769
2,316
706
59,032

UDG
1.630
990,604
1,055
5,479
2,822
875
122,916

Cynomolgus
Macaque

Mauri;an

RNA
1.078
1,142,531
929
4,368
1,813
525
30,364

TOT
0.166
526,723
200
657
377
266
36,976

UDG
1.177
1,015,657
834
4,360
1,927
532
22,239

Gorilla
UDG
1.256
732,336
1,122
5,878
3,680
1,804
33,526

Japanese
Macaque

RNA
1.863
703,246
738
5,010
2,687
898
21,620

Marmoset

TOT
0.253
332,782
118
545
357
261
14,561

UDG
1.659
814,235
476
2,206
785
359
122,518

Pig-‐tailed
Macaque

RNA
1.725
969,993
1,044
5,189
2,807
916
25,056

UDG
1.573
1,301,087
1,222
5,015
2,265
668
32,637

Rhesus
Macaque

Chinese

RNA
1.210
923,017
836
4,694
2,230
639
32,796

UDG
1.310
969,421
850
4,638
2,114
594
34,020

Rhesus
Macaque

Indian

RNA
3.200
703,246
738
5,010
2,687
898
21,620

UDG
1.410
1,051,149
833
3,966
1,644
519
68,269

Ring-‐tailed
Lemur
UDG
1.403
611,678
822
6,169
3,630
1,468
33,401

Sooty
Mangabey
UDG
1.635
1,188,472
1,466
6,431
3,483
1,116
30,331

Chimpanzee
de
novo
assembled
transcriptome
fully
recovers

most
genes
in
current
annota$on
Chimpanzee De Novo Assembled Transcriptome
Proportion of gene covered by transcriptome
Numberofgenes
0.0 0.2 0.4 0.6 0.8 1.0
020004000600080001000012000

Which
genome
is
important?

All
…
Interac$ng
Elements.

We
need
more

Longitudinal,
Integra$ve
Systems
Biology

ScoA
Kelly
–
ISS
for
one
year

Mark
Kelly
–
Earth
control

Telomere
Length

DNA
Muta$ons
&
Structural
Varia$on

DNA
Hydroxy-‐methyla$on

Chroma$n

RNA
expression

&
RNA
Methyla$on
Proteomics

An$body
Titers

Cytokines

DNA
Methyla$on

B-‐cells
/
T-‐cells

Targeted
and
Global
Metabolomics
Microbiome

Cogni$on

Vasculature

These People are Awesome
@mason_lab

Gratitude to Many People and Places
Illumina
Gary Schroth
Marc Van Oene
Univ. Chicago
Yoav Gilad
Yale University
Nenad Sestan
Sherman Weissman
FDA/SEQC
Leming Shi
NIH/UDP/NCBI
Jean and Danielle Thierry-Mieg
William Gahl
Baylor
Jeff Rogers
MSKCC
Christina Leslie
Ross Levine
PKI/Geospiza
Todd Smith
Mason Lab
Noah Alexander
Marjan Bozinoski
Dhruva Chandramohan
Sagar Chhangawala
Jorge Gandara
Fran Garrett-Bakelman
Dyala Jaroudi
Sheng Li
Cem Meyden
Lenore Pipes
Darryl Reeves
Yogesh Saletore
Priyanka Vijay
Paul Zumbo
Cornell/WCMC
Jason Banfelder
Scott Blanchard
Selina Chen-Kiang
Olivier Elemento
Yariv Houvras
Samie Jaffrey
Ari Melnick
Margaret Ross
Adam Siepel
Epigenomics Core
Katze Lab/NHPRT
Michael Katze
Bob Palermo
Xinxia Pan
Icahn/MSSM
Eric Schadt, Andrew Kasarskis,
Joel Dudley, Ali Bashir,
Bobby Sebra
NASA
Kosuke Fujishima
Lynn Rothschild
ABRF
George Grills
Scott Tighe
Don Baldwin
UMMS
Maria E Figueroa
AMNH
George Amato
Mark Sidall
U-Minnesota
Ran Blekhman
@mason_lab

20140710 6 c_mason_ercc2.0_workshop

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to 20140710 6 c_mason_ercc2.0_workshop

Similar to 20140710 6 c_mason_ercc2.0_workshop (20)

More from External RNA Controls Consortium

More from External RNA Controls Consortium (12)

Recently uploaded

Recently uploaded (20)

20140710 6 c_mason_ercc2.0_workshop