User Guide: Orion™ Weather Station (Columbia Weather Systems)
20140710 6 c_mason_ercc2.0_workshop
1. Mul$-‐pla(orm
and
cross-‐methodological
reproducibility
of
transcriptome
profiling
by
RNA-‐seq
in
the
ABRF
Next-‐
Genera$on
Sequencing
Study
(ABRF-‐NGS)
and
FDA’s
SEQC
Study
Christopher
E.
Mason,
Ph.D.
Assistant
Professor
Department
of
Physiology
and
Biophysics
&
The
Ins$tute
for
Computa$onal
Biomedicine
at
Weill
Cornell
Medical
College
and
the
Tri-‐Ins$tu$onal
Program
on
Computa$onal
Biology
and
Medicine
Affiliate
Fellow
of
the
Informa$on
Society
Project,
Yale
Law
School
July
10th,
2014
@mason_lab
4.
Human Genome Organization
from
Mason,
State,
and
Moldin,
Kaplan
&
Sadock’s
Comprehensive
Textbook
of
Psychiatry,
2009
5. ENCODE
ac$ve
elements
“These
data
enabled
us
to
assign
biochemical
func$ons
for
80%
of
the
genome,
in
par$cular
outside
of
the
well-‐studied
protein-‐coding
regions.”
6. “The
ENCODE
results
were
predicted
by
one
of
its
authors
to
necessitate
the
rewri$ng
of
textbooks.”
“We
agree,
many
textbooks
dealing
with
marke;ng,
mass-‐media
hype,
and
public
rela;ons
may
well
have
to
be
rewriAen.”
7. GENCODE
annota;on
(v19)
July
10th,
2014
Coding
genes:
20,345
Noncoding
genes:
22,883
Psuedogenes:
14,206
Other
genes:
516
57,820
total
18. Es$mate
of
False
Junc$on
Rate
5’
3’
Exon-‐Edge
DB
(3’-‐5’
junc$ons)
2 3 41Exon # :
False
Exon-‐Edge
DB
(3’-‐3’
junc$ons)
False
Exon-‐Edge
DB
(5’-‐5’
junc$ons)
False
Exon-‐Edge
DB
(5’-‐3’
junc$ons)
5’
3’
5’
3’
5’
3’
19. FDA’s Sequencing Quality Control (SeQC) Consortium:
Seq Standards from 323 members of
Government (FDA), Industry, Academia
>RNA sequence from pilot study based on MAQC
samples and MAQC design:
~Pilot data from 2008-2012
MAQC Sample A/B, & ERCCs from:
• 454
• Helicos
• Illumina
• Lifetech
~1000Gb of Illumina BodyMap data
~300Gb of experimental RNA-Seq
20. Associa;on
of
Biomolecular
Research
Facili;es
(ABRF)
Next
Genera;on
Sequencing
(NGS)
Study:
Phase
I:
RNA-‐Seq
and
degraded
RNA-‐Seq
(2011-‐2013)
Phase
II:
DNA-‐Seq
and
Hard-‐to-‐sequence
regions
(2014-‐2016)
Phase
III:
Asteroid
and
Mar$an
Surface
Sequencing
(2017-‐2050?)
1. Cross
Pla(orm:
HiSeq2000,
454,
PGM,
Proton,
PacBio
2. Cross-‐Protocol:
ribo-‐,
direc$onal,
non-‐direc$onal,
degraded
RNA
3. Cross-‐Site:
3
sites
for
each
pla(orm,
replicates
at
each
site
Improve
the
detec$on
and
quan$fica$on
of
molecular
signatures
for
use
in
basic,
applied,
and
clinical
research.
Specifically,
the
ABRF-‐NGS
group
will
test
and
characterize
the
myriad,
rapidly
evolving
NGS
techniques
for
RNA-‐Seq
and
DNA-‐Seq.
22. ß
SEQC
samples
(FDA)
A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
A:
10
cancer
cell
lines
B:
Brain
$ssues
23. A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-‐pla(orm
experimental
design
24. A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-‐Site
experimental
design
25. A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-‐Protocol
experimental
design
26. All
technologies
have
strengths
&
weaknesses
Vendor Instrument Version
Run Time
(hours)
Read
Length
(mean)
Reads
per Run
(million)
Yield per
Run
Cost per
Run
($)1
Cost
per Mb
($)
Paired-
end
Application Strengths
Illumina HiSeq 2000 High Output 132 50 6,000 300 Gb2
18,725.00 0.06 Yes
Gene expression; splice
junction detection;
variant calling; fusions
Deep read counts for
transcript
quantification
Illumina HiSeq 2500 High Output 132 50 6,000 300 Gb2
18,725.00 0.06 Yes as above as above
Illumina MiSeq v2 kit 39 250 30 7.5 Gb 982.75 0.13 Yes
Splice junction
detection; variant calling
Rapid transcript
quantification and
variant detection
Life
Technologies
PGM 318 chip 7.3 176 6 1.056 Gb 749.00 0.71 No
Splice junction
detection; variant calling
Rapid transcript
quantification and
variant detection
Life
Technologies
Proton
Proton I
chip
2 to 4 81 70 5.67 Gb 834.00 0.15 No
Gene expression;
variant calling
Good read depth and
length for transcript
quantification
Pacific
Biosciences
RS RS 0.5 to 2 1,289 0.03 38.67 Mb 136.38 3.53 No
Splice junction
detection; full length
gene coverage
Extended read
lengths
Roche 454
Genome
Sequencer
FLX+
20 686 1 686 Mb 5,985.00 8.72 No Splice junction detection Read length
1
sequencing reaction reagents, at academic list price U.S. dollars, and does not include library preparation reagents, labor, data storage or analysis, equipment
or maintenance; 2
one sequencing run using 2 flowcells. Pacific Biosciences is calculated based on CCS reads; Gb: billion bases, Mb: million bases
Table 1. Summary of RNA-seq platforms as deployed in the ABRF-NGS Study.
28. Error
models
highly
variable
among
pla(orms
0%
2%
4%
6%
454
ILMN
PAC
PGM
PRO
platform
Indelsmismatchspercentage
b
454:
Roche
454
GS
FLX+
ILMN:
Illumina
HiSeq
2000/2500
PAC:
Pacific
Biosciences
RS
I
PGM:
Ion
Personal
Genome
Machine
PRO:
Ion
Torrent
Proton
29. Full
length
genebody
coverage
Coverage across genebody (%)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
5'
platform
454
sample
site
Y
type
intact
type
site
sample
platformm
454
ILMN
PAC
PGM
PRO
Note:
PAC
used
CCS
read
only;
PAC
on
average
only
cover
3k
GENCODE
genes.
31. Genes
detec$on
is
log-‐linear;
Junc$on
detec$on
is
length-‐dependent
454−A
454−B
ILMN−A
ILMN−B
PAC−A1
PAC−A2
PAC−A3
PAC−B1
PAC−B2
PAC−B3
PGM−A
PGM−B
PRO−A
PRO−B
454
LMN
PGM
●
●
50000
100000
150000
200000
9.0
9.5
10.0
10.5
11.0
bases sequenced (log10)
detectedjunctions
● 454 ILMN PAC PGM PRO
● ●A B
●●
10000
20000
30000
9.0
9.5
10.0
10.5
11.0
bases sequenced (log10)
detectedgenes
● 454 ILMN PAC PGM PRO
● ●A B
d e
32. Junc$ons
detec$on
efficiency
highly
variable;
agreement
common
PRO
454
LMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
LMN−PGM
LMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
LMN−PGM
LMN−PRO
PGM−PRO
platform
A−AH
A−AS
A−AR
AH−AS
AH−AR
AR−AS
B−BR
sample
0
25
50
75
454
ILMN
PAC
PGM
PRO
platform
junctionspermillionbases
A B
known novel
0
20000
40000
60000
1
2
3
4
5
3
4
5
occurrence among platforms
junctions
A Bgf
Note:
Only
use
a
subset
of
reads
among
pla(orms
to
normalize
the
scale
Most
of
the
known
junc$ons
are
shared
by
at
least
3
pla(orms
37. Gene
regions
distribu$on
varies
between
protocols
a" b"MappingDistribution(%)!
0%
25%
50%
75%
100%
Sample
Mappingdistribution(%) intergenic mito exon 5utr intron ribo
a
38. Very
similar
gene
measurements
of
genes
(RPKM)
between
polyA
&
Ribo-‐deple$on
Ribo0>PolyA+ PolyA>Ribo0+
RPKM%difference%(polyA%–%ribo0)%
Number%of%Genes%
46. "FEAR is
the
main
source
of
superstition,
and
one
of
the
main
sources
of
cruelty.
To
conquer
fear
is
the
beginning
of
wisdom."
–
Bertrand
Russell
HBR−RIN9
HBR−RIN0
UHR−RIN0
UHR−RIN3
UHR−RIN9
UHR−RIN6
HBR−RIN9
HBR−RIN0
UHR−RIN0
UHR−RIN3
UHR−RIN9
UHR−RIN6
0.6 0.7 0.8 0.9 1
Value
Color Key
47. Can
we
remove
supers$$on?
HEAT
99oC
for
10
minutes
SONICATION
6
x
55
seconds
at
5%DC,
I
intensity
3
100
c/b
RNAse
RNAse
A
1
ng/uL
Un$l
RIN
<2
48. Degraded
RNA
looks
great!
Coverage across genebody (%)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
5'
platform
454
sample
site
Y
type
intact
type
site
sample
platformm
454
ILMN
PAC
PGM
PRO
49. Degraded
RNA
highly
correlates
with
intact
RNA
gene
expression
atform intra−platform
B
inter−platform
A
inter−platform
B
0.96
0.85
0.98
0.95
0.95
0.82
0.88
0.85
0.84
0.93
0.89
0.75
0.82
0.78
0.86
0.92
0.92
PRO
454
ILMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
platform
0.95
0.94
0.94
0.96
0.97
0.97
0.96
0.00
0.25
0.50
0.75
1.00
A−AH
A−AS
A−AR
AH−AS
AH−AR
AR−AS
B−BR
sample
Correlationcoefficients
c
B
A B
A
B
C
D
A
B
Pacific Biosciences
RS
Life Tech
Ion P
(AR)
(AS)
(AH)
(BR)
Illumina
Ribo-‐deple$on
protocol
55. Nucleo$de
frequency
bias
from
library
prepara$on
●M5 ILM6
A G C T
23
24
25
26
27
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
Position in read
Nucleotidefrequency(%)
ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
1 5
site
1 2 3 4 5
● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
site
1 2 3 4 5
● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
f
SEQC
study
Replicate 5 libraries
were vendor supplied
<-‐
5th
replicate
of
ILM2
and
ILM3
<-‐
1th
replicate
of
ILM2
<-‐
1th
replicate
of
ILM3
56. Sample
A
vs.
A
=
the
Answer
should
be
zero
DEGs;
SVA
can
remove
false
posi$ve
DEGs
te
0.957
0.958
0.958
0.971
0.973
0.970
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
1 2
1
BBBBBB
B
B
B
BBB
B
B
BBBB
BBBB
B
BBBB
●
●
●
●
●
●
ILM1
ILM2
ILM3
ILM4
ILM5
ILM6
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
ABCD
ILM1−ILM2
ILM1−ILM3
ILM1−ILM4
ILM1−ILM5
ILM1−ILM6
ILM2−ILM3
ILM2−ILM4
ILM2−ILM5
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
Comparison
FalsepositiveDEGs
Without SVA With SVA
c
Illumina
SEQC
study
Leek
JT,
Storey
JD.PLoS
Genet.2007
Compare
the
same
sample
across
sites
57. SVA
remove
false
posi$ve
DEGs
Validated
by
PGM
data
from
ABRF-‐NGS
study
te
0.957
0.958
0.958
0.971
0.973
0.970
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
1 2
1
BBBBBB
B
B
B
BBB
B
B
BBBB
BBBB
B
BBBB
●
●
●
●
●
●
ILM1
ILM2
ILM3
ILM4
ILM5
ILM6
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
ABCD
ILM1−ILM2
ILM1−ILM3
ILM1−ILM4
ILM1−ILM5
ILM1−ILM6
ILM2−ILM3
ILM2−ILM4
ILM2−ILM5
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
Comparison
FalsepositiveDEGs
Without SVA With SVA
c
Illumina
PGM
−2 −1 0 1
−0.50.00.51.01.5
Dimension 1
Dimension2
A.1
A.2 A.3
A.4
B.1
B.2
B.3
B.4
A.1
A.2
B.1
B.2
A.1
A.2
B.1B.2
●
●
●
PGM1
PGM2
PGM3
P
P
P
P
P
P
site
● ● ●PGM1 PGM2 PGM3
1 2 3 4
P
P
P
P
P
site
A
● ● ●PGM1 PGM2 PGM3
1 2 3 4
comp: A comp: B
0
50
100
150
200
PGM1−PGM2
PGM1−PGM3
PGM2−PGM3
PGM1−PGM2
PGM1−PGM3
PGM2−PGM3
Comparison
FalsepositiveDEGs
Without SVA With SVA
1
1
NumberofDEGs
d e f
SEQC
study
Li
et
al.,
in
press,
Nat.
Biotech.,
2014
ABRF-‐NGS
study
58. NIST,
SEQC,
ABRF,
ENCODE,
others
• Provide
reference
data
resources
• Best
prac$ces
for
– gene
quan$fica$on,
– isoform
characteriza$on,
– dynamic
range
comparisons,
– managing
inter-‐site
and
intra-‐
site
varia$on,
– analysis
pipelines,
and
– cross-‐pla(orm
tes$ng
of
transcriptome
hypotheses
• To
address
some
other
aspects
of
RNA-‐seq,
including
– variant
detec$on,
– allele-‐specific
expression,
– RNA
edi$ng,
and
gene
fusions.
• And
more
…
65. Abbreviation Chemical0name
m1
acp3
Ψ 1'methyl'3'(3'amino'3'carboxypropyl)5pseudouridine
m1
A 1'methyladenosine
m1
G 1'methylguanosine
m1
I 1'methylinosine
m1
Ψ 1'methylpseudouridine
m1
Am 1,2''O'dimethyladenosine
m1
Gm 1,2''O'dimethylguanosine
m1
Im 1,2''O'dimethylinosine
m2
A 2'methyladenosine
ms2
io6
A 2'methylthio'N6
'(cis'hydroxyisopentenyl)5adenosine
ms2
hn6
A 2'methylthio'N6
'hydroxynorvalyl5carbamoyladenosine
ms2
i6
A 2'methylthio'N6
'isopentenyladenosine
ms2
m6
A 2'methylthio'N6
'methyladenosine
ms2
t6
A 2'methylthio'N6
'threonyl5carbamoyladenosine
s2
Um 2'thio'2''O'methyluridine
s2
C 2'thiocytidine
s2
U 2'thiouridine
Am 2''O'methyladenosine
Cm 2''O'methylcytidine
Gm 2''O'methylguanosine
Im 2''O'methylinosine
Ψm 2''O'methylpseudouridine
Um 2''O'methyluridine
Ar(p) 2''O'ribosyladenosine5(phosphate)
Gr(p) 2''O'ribosylguanosine5(phosphate)
acp3
U 3'(3'amino'3'carboxypropyl)uridine
m3
C 3'methylcytidine
m3
Ψ 3'methylpseudouridine
m3
U 3'methyluridine
m3
Um 3,2''O'dimethyluridine
imG'14 4'demethylwyosine
s4
U 4'thiouridine
chm5
U 5'(carboxyhydroxymethyl)uridine
mchm5
U 5'(carboxyhydroxymethyl)uridine5methyl5ester
inm5
s2
U 5'(isopentenylaminomethyl)'52'thiouridine
inm5
Um 5'(isopentenylaminomethyl)'52''O'methyluridine
inm5
U 5'(isopentenylaminomethyl)uridine
nm5
s2
U 5'aminomethyl'2'thiouridine
ncm5
Um 5'carbamoylmethyl'2''O'methyluridine
ncm5
U 5'carbamoylmethyluridine
cmnm5
Um 5'carboxymethylaminomethyl'52''O'methyluridine
cmnm5
s2
U 5'carboxymethylaminomethyl'2'thiouridine
cmnm5
U 5'carboxymethylaminomethyluridine
cm5
U 5'carboxymethyluridine
f5
Cm 5'formyl'2''O'methylcytidine
f5
C 5'formylcytidine
hm5
C 5'hydroxymethylcytidine
ho5
U 5'hydroxyuridine
mcm5
s2
U 5'methoxycarbonylmethyl'2'thiouridine
mcm5
Um 5'methoxycarbonylmethyl'2''O'methyluridine
mcm5
U 5'methoxycarbonylmethyluridine
mo5
U 5'methoxyuridine
m5
s2
U 5'methyl'2'thiouridine
mnm5
se2
U 5'methylaminomethyl'2'selenouridine
mnm5
s2
U 5'methylaminomethyl'2'thiouridine
mnm5
U 5'methylaminomethyluridine
m5
C 5'methylcytidine
m5
D 5'methyldihydrouridine
m5
U 5'methyluridine
τm5
s2
U 5'taurinomethyl'2'thiouridine
τm5
U 5'taurinomethyluridine
m5
Cm 5,2''O'dimethylcytidine
m5
Um 5,2''O'dimethyluridine
preQ1 7'aminomethyl'7'deazaguanosine
preQ0 7'cyano'7'deazaguanosine
m7
G 7'methylguanosine
G+
archaeosine
D dihydrouridine
oQ epoxyqueuosine
galQ galactosyl'queuosine
OHyW hydroxywybutosine
I inosine
imG2 isowyosine
k2
C lysidine
manQ mannosyl'queuosine
mimG methylwyosine
m2
G N2
'methylguanosine
m2
Gm N2
,2''O'dimethylguanosine
m2,7
G N2
,7'dimethylguanosine
m2,7
Gm N2
,7,2''O'trimethylguanosine
m2
2G N2
,N2
'dimethylguanosine
m2
2Gm N2
,N2
,2''O'trimethylguanosine
m2,2,7
G N2
,N2
,7'trimethylguanosine
ac4
Cm N4
'acetyl'2''O'methylcytidine
ac4
C N4
'acetylcytidine
m4
C N4
'methylcytidine
m4
Cm N4
,2''O'dimethylcytidine
m4
2Cm N4
,N4
,2''O'trimethylcytidine
io6
A N6
'(cis'hydroxyisopentenyl)adenosine
ac6
A N6
'acetyladenosine
g6
A N6
'glycinylcarbamoyladenosine
hn6
A N6
'hydroxynorvalylcarbamoyladenosine
i6
A N6
'isopentenyladenosine
m6
t6
A N6
'methyl'N6
'threonylcarbamoyladenosine
m6
A N6
'methyladenosine
t6
A N6
'threonylcarbamoyladenosine
m6
Am N6
,2''O'dimethyladenosine
m6
2A N6
,N6
'dimethyladenosine
m6
2Am N6
,N6
,2''O'trimethyladenosine
o2yW peroxywybutosine
Ψ pseudouridine
Q queuosine
OHyW undermodified5hydroxywybutosine
cmo5
U uridine55'oxyacetic5acid
mcmo5
U uridine55'oxyacetic5acid5methyl5ester
yW wybutosine
imG wyosine
Table515'5List5of5Base5Modifications5Covered5by5Claims
m6A
is
just
1
of
the
110
known
RNA
modifica$ons
from
the
RNA
Modifica$on
Database
66. m6A
marks
are
widespread
in
cancer
cells’
RNA
Meyer
et
al.,
Cell,
2012
70. Many
puta$ve
roles
for
m6A
in
RNA
Paz-‐Yaakov
et
al,
2010
Saletore,
Chen-‐Kiang,
and
Mason,
RNA
Biology,
2013.
71.
72. RNA
modifica$ons
give
a
new
layer
of
cellular
regula$on
N
NN
N
NH2
O
OHO
O
RNA
RNA
N
NN
N
HN
O
OHO
O
RNA
RNA
H2
C
OH
N
NN
N
HN
O
OHO
O
RNA
RNA
C
H
O
N
NN
N
HN
O
OHO
O
RNA
RNA
C
OH
O
N
NN
N
HN
O
OHO
O
RNA
RNA
CH3
A m6A
ca6A f6A hm6A
O
H H
O
H OH
H2O
Mettl3
FTO/ALKBH5
O
2
, Fe
II
, -KG
FTO/
ALKBH5
O
2
, Fe
II
, -KG
Li
and
Mason,
ARGHG,
2014
74. Direct
Detec$on
of
Methyla$on
on
PacBio
70.5 71.0 71.5 72.0 72.5 73.0 73.5 74.0 74.5
0
100
200
300
400
Fluorescence
intensity(a.u.)
Time (s)
104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 108.5
0
100
200
300
400
Fluorescence
intensity(a.u.)
Time (s)
C
T
G
A
T
C
G
T
A
C
mA
A
G
T
C
T
A
A
G
C
C
A
A
A
A
Approach:
Kine$c
detec$on
of
methylated
bases
during
SMRT
DNA
sequencing
Example:
N6-‐methyladenosine
(mA)
75. SMRT RNA Sequencing – Assay
Design
I. RNA template (purple) is hybridized to a DNA primer (orange) and immobilized on the bottom of a
zero-mode waveguide (ZMW) in the presence of fluorescently-labeled nucleotide analogues.
II. After the addition of HIV RT to the solution, the polymerase binds to the RNA template at the 3’-end of
DNA primer.
III. The first nucleotide analogue can bind to the active site of the polymerase resulting in a fluorescent
pulse with a color characteristic to the type of nucleotide (i.e. A, C, G, or T). With HIV RT, the rates of
nucleotide binding, dissociation and incorporation have not been optimized to a state where each
binding event would result in an incorporation event (image IV.). Consequently, nucleotide analogues
often dissociate before they can be incorporated resulting in a series of like pulses before the
polymerase can translocate to the next position and bind the nucleotide analogue complementary to
the next nucleotide in RNA template.
IV. Incorporation of the nucleotide and translocation to the next nucleotide on RNA template.
76. First demonstration SMRT RNA Sequencing
– Expected Signal
A. Example RNA template is shown in black. DNA primer is
shown in orange. The first cDNA strand synthesized by
HIV RT during SMRT RNA Sequencing is color-coded (A
– yellow, C – red, G – gree, T – blue). The first cDNA
strand is complementary to the unhybridized section of
the example RNA template and can be used to determine
the sequence of that section.
B. A typical time trace obtained in a ZMW with RNA template
immobilized on the bottom surface. One can observe a
succession of blocks of like-colored pulses corresponding
to the sequence of cDNA strand (blocks are indicated by
lines above the trace).
C. As mentioned on Slide 3 – Point III., due to the kinetics of
the current HIV RT, we observe multiple binding events at
each RNA template position before nucleotide
incorporation and translocation to the next position.
A
B
C
78. m6A
is
dis$nct
from
A
Saletore
et
al.,
Genome
Biology,
2012
79. Epigenome
Epitranscriptome
Epiproteome
(PTM)
DNA
RNA
Protein
transcrip$on
transla$on
RT
ACGT
viRNA
I
N
H
E
R
I
T
A
N
C
E
or
T
R
A
N
S
M
I
S
S
I
O
N
prions
iDNA
mC,hmC,
8oxoG,
m6A
(sn/sno/g)RNA
(t/r/tm)RNA
(lnc/nc/mi/piwi/si/vi)RNA
scRNA
RbP
Ribozymes
Prions
iRNA
iProtein
mC,hmC,
8oxoG,
m6A,
…
m6A,
5’G,polyA,
…
Acet,
Phos,
Citr,
SUMO,
…
from
Saletore
et
al.,
Genome
Biology,
2012
85. Prosimian
Human
Chimpanzee
Gorilla
Rhesus Macaque
Japanese Macaque
Cynomolgus Macaque
Pig-tailed Macaque
Vervet
Sooty Mangabey
Common Marmoset
Squirrel Monkey
Ring-tailed Lemur
Mouse Lemur
Olive Baboon
New World Monkeys
Old World
Hominoids
~70 mya
35-40
23-25
6
8
9
10
14 species selected for
phylogenetic &
pharmacogenomic
significance:
• Samples
from
Washington,
Oregon,
Yerkes,
and
Southwestern
Na$onal
Primate
Research
Centers;
the
Duke
Lemur
Center;
the
MD
Anderson
Cancer
Center,
and
Covance,
Inc.
86. 21
Matching
“BodyMap”
Dissected
Tissues
Brain
Immune/Sex
Soma
Cerebellum
Bone
Marrow
Adipose
Frontal
Cortex
Lymph
Node
Colon
Hippocampus
Spleen
Heart
Hypothalamus
Thymus
Kidney
Pituitary
Thyroid
Liver
Temporal
Lobe
Ovary
Lung
Tes$s
Skeletal
Muscle
Whole
Blood
Phase
I
(2010
-‐2012)
RNA
extracted
from
each
$ssue,
pooled,
sequenced
on
two
FCs
(HS2K,
GAIIx)
for
overall
annota$on.
Phase
II
(2012
–
2014):
RNA
extracted
from
12
$ssues,
individually
sequenced
(1/2
lane
HS2K),
for
$ssue-‐specific
annota$on.
87. AceView
Genes
Expressed
Species
/isolate
Number
of
genes
expressed
Known
human
genes
Un-‐annotated
human
genes
BAB
52065
22376
29689
CMC
50073
21916
28157
CMM
50155
21974
28181
JMI
49253
21785
27468
PTM
50863
22084
28779
RMC
49845
21888
27957
RMI
49922
21913
28009
Jean
Thierry-‐Mieg
89. Chimpanzee
de
novo
assembled
transcriptome
fully
recovers
most
genes
in
current
annota$on
Chimpanzee De Novo Assembled Transcriptome
Proportion of gene covered by transcriptome
Numberofgenes
0.0 0.2 0.4 0.6 0.8 1.0
020004000600080001000012000
91. We
need
more
Longitudinal,
Integra$ve
Systems
Biology
92.
93.
94. ScoA
Kelly
–
ISS
for
one
year
Mark
Kelly
–
Earth
control
Telomere
Length
DNA
Muta$ons
&
Structural
Varia$on
DNA
Hydroxy-‐methyla$on
Chroma$n
RNA
expression
&
RNA
Methyla$on
Proteomics
An$body
Titers
Cytokines
DNA
Methyla$on
B-‐cells
/
T-‐cells
Targeted
and
Global
Metabolomics
Microbiome
Cogni$on
Vasculature
96. Gratitude to Many People and Places
Illumina
Gary Schroth
Marc Van Oene
Univ. Chicago
Yoav Gilad
Yale University
Nenad Sestan
Sherman Weissman
FDA/SEQC
Leming Shi
NIH/UDP/NCBI
Jean and Danielle Thierry-Mieg
William Gahl
Baylor
Jeff Rogers
MSKCC
Christina Leslie
Ross Levine
PKI/Geospiza
Todd Smith
Mason Lab
Noah Alexander
Marjan Bozinoski
Dhruva Chandramohan
Sagar Chhangawala
Jorge Gandara
Fran Garrett-Bakelman
Dyala Jaroudi
Sheng Li
Cem Meyden
Lenore Pipes
Darryl Reeves
Yogesh Saletore
Priyanka Vijay
Paul Zumbo
Cornell/WCMC
Jason Banfelder
Scott Blanchard
Selina Chen-Kiang
Olivier Elemento
Yariv Houvras
Samie Jaffrey
Ari Melnick
Margaret Ross
Adam Siepel
Epigenomics Core
Katze Lab/NHPRT
Michael Katze
Bob Palermo
Xinxia Pan
Icahn/MSSM
Eric Schadt, Andrew Kasarskis,
Joel Dudley, Ali Bashir,
Bobby Sebra
NASA
Kosuke Fujishima
Lynn Rothschild
ABRF
George Grills
Scott Tighe
Don Baldwin
UMMS
Maria E Figueroa
AMNH
George Amato
Mark Sidall
U-Minnesota
Ran Blekhman
@mason_lab