SlideShare a Scribd company logo
1 of 96
Download to read offline
Mul$-­‐pla(orm	
  and	
  cross-­‐methodological	
  
reproducibility	
  of	
  transcriptome	
  	
  
profiling	
  by	
  RNA-­‐seq	
  in	
  the	
  ABRF	
  Next-­‐
Genera$on	
  Sequencing	
  Study	
  (ABRF-­‐NGS)	
  
and	
  FDA’s	
  SEQC	
  Study	
  
Christopher	
  E.	
  Mason,	
  Ph.D.	
  
Assistant	
  Professor	
  
Department	
  of	
  Physiology	
  and	
  Biophysics	
  &	
  
The	
  Ins$tute	
  for	
  Computa$onal	
  Biomedicine	
  at	
  
Weill	
  Cornell	
  Medical	
  College	
  and	
  the	
  
Tri-­‐Ins$tu$onal	
  Program	
  on	
  Computa$onal	
  Biology	
  and	
  Medicine	
  
Affiliate	
  Fellow	
  of	
  the	
  Informa$on	
  Society	
  Project,	
  Yale	
  Law	
  School	
  
July	
  10th,	
  2014	
   @mason_lab	
  
0.	
  	
  Background	
  
	
  
1.  RNA	
  Standards	
  
2.  Removing	
  RNA-­‐Seq	
  Varia$on	
  
3.  Epitranscriptome	
  
4.  #JustSequenceIt	
  
Exome:	
  
2%	
  of	
  the	
  
genome?	
  
 
	
  
	
  
Human Genome Organization
from	
  Mason,	
  State,	
  and	
  Moldin,	
  Kaplan	
  &	
  Sadock’s	
  Comprehensive	
  Textbook	
  of	
  Psychiatry,	
  2009	
  
ENCODE	
  ac$ve	
  elements	
  
“These	
  data	
  enabled	
  us	
  to	
  assign	
  biochemical	
  func$ons	
  for	
  	
  
80%	
  of	
  the	
  genome,	
  	
  
in	
  par$cular	
  outside	
  of	
  the	
  well-­‐studied	
  protein-­‐coding	
  regions.”	
  
“The	
  ENCODE	
  results	
  were	
  predicted	
  by	
  one	
  of	
  its	
  authors	
  to	
  
necessitate	
  the	
  rewri$ng	
  of	
  textbooks.”	
  
“We	
  agree,	
  many	
  textbooks	
  dealing	
  with	
  marke;ng,	
  mass-­‐media	
  
hype,	
  and	
  public	
  rela;ons	
  may	
  well	
  have	
  to	
  be	
  rewriAen.”	
  
GENCODE	
  annota;on	
  (v19)	
  
July	
  10th,	
  2014	
  
	
  
Coding	
  genes:	
  20,345	
  
	
  Noncoding	
  genes:	
  22,883	
  
	
  Psuedogenes:	
  14,206	
  
Other	
  genes:	
  516	
  
57,820	
  total	
  
RNA-­‐Seq	
  and	
  all	
  its	
  flavors	
  
create	
  excitement	
  
Differential expression by
gene, exon, splice isoform,
allele, & transcript!
Algorithms: STAR, r-make, ASE,
limma-voom, RSEM"
3!
reads"
alignments"
Sorted
BAMs"
GENCODE "
annotation"
queries "
Reads/bp"
Alignment on HPC nodes"
References!
hg19"
RefSeq"
miRBase"
rRNA"
Adapters"
Find ncRNAs and new genes!
Algorithms: r-make, Aceview"
4!
Sequencing Data"
Gene fusion detection!
Algorithms: r-make, Snowshoes"
5!
Genetic variation (SNVs and
indels)!
Algorithms: STAR/GATK, r-make"
2!
Predict polyA sites & gain/loss
of miRNA binding sites!
Algorithms: r-make, BAGET
AlexaSeq, TargetScan"
6!
!
Viruses/Bacteria/Other
Algorithm: MG-RAST"
Remaining Reads!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT
CTGCTTTAGGATAGATCGATAGCTAGTTCATC!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT!
7!
1!
RNA-­‐seq	
  =	
  Love	
  
R-­‐make	
  tools	
  
But!	
  
There	
  is	
  some	
  noise	
  
What	
  is	
  the	
  source	
  of	
  the	
  wiggles?
Genes	
  
ESTs	
  
Liver
Kidney
Adult Brain
Fetal Hippocampus
Fetal Amygdala
The	
  Dirty	
  Dozen:	
  >=	
  12	
  Sources	
  of	
  Technical	
  Noise	
  in	
  RNA-­‐Seq	
  
	
  
(1)	
  RNA	
  integrity:	
  Sample	
  purity	
  or	
  degrada$on	
  
(2)	
  Sample	
  RNA	
  complexity:	
  polyA	
  RNA,	
  total	
  RNA,	
  miRNA	
  
(3)	
  cDNA	
  synthesis:	
  random	
  hexamer	
  vs.	
  polyA-­‐primed	
  
(4)	
  Library	
  isola;on:	
  Gel	
  excision	
  vs.	
  column	
  
(5)	
  Technical	
  Errors:	
  Machine,	
  Site,	
  Lane,	
  Technician,	
  Library	
  Size	
  
(6)	
  Amplifica;on	
  Cycles	
  or	
  Methods:	
  NuGen,	
  Tn5,	
  Phi29	
  
(7)	
  Input	
  amount:	
  (1,	
  10,	
  100,	
  1000	
  cells)	
  
(8)	
  Algorithms:	
  for	
  alignment	
  and	
  assembly	
  
(9)	
  Fragment	
  size	
  distribu;on:	
  Paired-­‐End,	
  Single-­‐End	
  (adaptors)	
  
(10)	
  Liga;on	
  Efficiency:	
  Mul$plexing/Barcoding	
  and	
  RNA	
  ligases	
  
(11)	
  Depth	
  of	
  Sequencing:	
  cost/benefit	
  point	
  
(12)	
  RNA	
  fragmenta;on:	
  ca$on,	
  enzyma$c	
  
Which	
  type	
  of	
  RNA?	
  
Type Abbreviation Function Organisms
7SK$RNA$ 7SK$ negatively$regulating$CDK9/cyclin$T$complex$ metazoans
Signal$recognition$particle$RNA$ 7SRNA Membrane$integration$ All$organisms$
Antisense$RNA$ aRNA$ Regulatory All$organisms$
CRISPR$RNA$ crRNA$ Resistance$to$parasites Bacteria$and$archaea$
Guide$RNA$ gRNA$ mRNA$nucleotide$modification$ Kinetoplastid$mitochondria$
Long$noncoding$RNA$ lncRNA XIST$(dosage$compensation),$HOTAIR$(cancer) Eukaryotes$
MicroRNA$ miRNA$ Gene$regulation$ Most$eukaryotes$
Messenger$RNA$ mRNA$ Codes$for$protein$ All$organisms
PiwiRinteracting$RNA$ piRNA$ Transposon$defense,$maybe$other$functions$ Most$animals$
Repeat$associated$siRNA$ rasiRNA$ Type$of$piRNA;$transposon$defense$ Drosophila$
Retrotransposon$ retroRNA selfRpropagation Eukaryotes$and$some$bacteria$
Ribonuclease$MRP$ RNase$MRP$ rRNA$maturation,$DNA$replication$ Eukaryotes$
Ribonuclease$P$ RNase$P$ tRNA$maturation$ All$organisms$
Ribosomal$RNA$ rRNA$ Translation$ All$organisms
Small$Cajal$bodyRspecific$RNA$ scaRNA$ Guide$RNA$to$telomere$in$active$cells Metazoans
Small$interfering$RNA$ siRNA$ Gene$regulation$ Most$eukaryotes$
SmY$RNA$ SmY$ mRNA$transRsplicing$ Nematodes$
Small$nucleolar$RNA$ snoRNA$ Nucleotide$modification$of$RNAs$ Eukaryotes$and$archaea$
Small$nuclear$RNA$ snRNA$ Splicing$and$other$functions$ Eukaryotes$and$archaea$
TransRacting$siRNA$ tasiRNA$ Gene$regulation$ Land$plants$
Telomerase$RNA$ telRNA Telomere$synthesis$ Most$eukaryotes$
TransferRmessenger$RNA$ tmRNA$ Rescuing$stalled$ribosomes$ Bacteria$
Transfer$RNA$ tRNA$ Translation$ All$organisms
Viral$Response$RNA$ viRNA AntiRviral$immunity C$elegans
Vault$RNA$ vRNA$ selfRpropagation Expulsion$of$xenobiotics
Y$RNA$ yRNA RNA$processing,$DNA$replication$ Animals$
And	
  -­‐	
  which	
  one	
  do	
  we	
  use?	
  
Technologies	
  Bifurcate	
  into	
  two	
  main	
  realms:	
  
Platform Instrument Template0Preparation Chemistry Avearge0Length Longest0Read
Illumina HiSeq2500 BridgePCR/cluster Rev.<Term.,<SBS 100 150
Illumina HiSeq2000 BridgePCR/cluster Rev.<Term.,<SBS 100 150
Illumina MiSeq BridgePCR/cluster Rev.<Term.,<SBS 250 300
GnuBio GnuBio emPCR HybFAssist<Sequencing 1000* 64,000*
Life<Technologies SOLiD<5500 emPCR Seq.<by<Lig. 75 100
LaserGen LaserGen emPCR Rev.<Term.,<SBS 25* 100*
Pacific<Biosciences RS Polymerase<Binding RealFtime 1800 15,000
454 Titanium emPCR PyroSequencing 650 1100
454 Junior emPCR PyroSequencing 400 650
Helicos Heliscope adaptor<ligation Rev.<Term.,<SBS 35 57
Intelligent<BioSystems MAXFSeq Rolony<amplification TwoFStep<SBS<(label/unlabell) 2x100 300
Intelligent<BioSystems MINIF20 Rolony<amplification TwoFStep<SBS<(label/unlabell) 2x100 300
ZS<Genetics N/A Atomic<Lableing Electron<Microscope N/A N/A
Halcyon<Molecular N/A N/A Direct<Observation<of<DNA N/A N/A
Platform Instrument Template0Preparation Chemistry Avearge0Length Longest0Read
IBM<DNA<Transistor N/A none Microchip<Nanopore N/A N/A
NABsys N/A none Nanochannel N/A N/A
Bionanogenomics N/A anneal<7mers Nanochannel 10,000* 1,000,000*
Life<Technologies PGM emPCR SemiFconductor 150 300
Life<Technologies Proton emPCR SemiFconductor 120 240
Life<Technologies Proton<2 emPCR SemiFconductor 400* 800*
Genia N/A none Protein<nanopore<(aFhemalysin) N/A N/A
Oxford<Nanopore MinION none Protein<Nanopore 10,000 10,000*
Oxford<Nanopore GridION<2K none Protein<Nanopore 10,000 500,000*
Oxford<Nanopore GridION<8K none Protein<Nanopore 10,000 500,000*
*Values<are<estimates<from<companies<that<have<not<yet<released<actual<data
Optical<Sequencing
Electical<Sequencing
Table<1:<Types<of<HighFThroughput<Sequencing<Technologies
Mason	
  CE,	
  Porter	
  SG,	
  Smith	
  TM.	
  Adv	
  Exp	
  Med	
  Biol.	
  2014	
  
0.	
  	
  Genomes	
  
	
  
1.  RNA	
  Standards	
  
2.  Removing	
  RNA-­‐Seq	
  Varia$on	
  
3.  Epitranscriptome	
  
4.  #JustSequenceIt	
  
The complexity of the transcriptome is vast
exon1	
  	
  
exon2	
  	
  
exon3	
  
exon1-­‐exon2	
  
exon1-­‐exon3	
  
exon2-­‐exon3	
  
exon1-­‐exon2-­‐exon3	
  
6	
   63	
   15	
  
7	
   127	
   21	
  
8	
   255	
   28	
  
3	
   7	
   3	
  
4	
   15	
   6	
  
5	
   31	
   10	
  
1	
   1	
   0	
  
2	
   3	
   1	
  
Exons	
   Variants	
   Junc;ons	
  
2n-1
Exon	
  1	
   Exon	
  2	
   Exon	
  3	
  
Exon	
  1	
   Exon	
  2	
   Exon	
  3	
  
n(n-1)
2
Exon4	
  
Exon4	
  
exon4	
  	
  
exon1-­‐	
  exon4	
  
exon2-­‐exon4	
  
exon3-­‐exon4	
  
exon1-­‐exon2-­‐exon4	
  
exon1-­‐exon3-­‐exon4	
  
exon2-­‐exon3-­‐exon4	
  
exon1-­‐exon2-­‐exon3-­‐exon4	
  
8x1083 theoretical transcript combinations
8x1080 atoms in the universe
(159 atoms/star, 111 stars/galaxy, 110 galaxies)Li	
  and	
  Mason,	
  ARGHG,	
  2014	
  
Es$mate	
  of	
  False	
  Junc$on	
  Rate	
  
5’	
   3’	
  
Exon-­‐Edge	
  DB	
  
(3’-­‐5’	
  junc$ons)	
  
2 3 41Exon # :
False	
  Exon-­‐Edge	
  DB	
  
(3’-­‐3’	
  junc$ons)	
  
False	
  Exon-­‐Edge	
  DB	
  
(5’-­‐5’	
  junc$ons)	
  
False	
  Exon-­‐Edge	
  DB	
  
(5’-­‐3’	
  junc$ons)	
  
5’	
   3’	
  
5’	
   3’	
  
5’	
   3’	
  
FDA’s Sequencing Quality Control (SeQC) Consortium:
Seq Standards from 323 members of
Government (FDA), Industry, Academia
>RNA sequence from pilot study based on MAQC
samples and MAQC design:
~Pilot data from 2008-2012
MAQC Sample A/B, & ERCCs from:
•  454
•  Helicos
•  Illumina
•  Lifetech
~1000Gb of Illumina BodyMap data
~300Gb of experimental RNA-Seq
Associa;on	
  of	
  Biomolecular	
  Research	
  Facili;es	
  (ABRF)	
  
Next	
  Genera;on	
  Sequencing	
  (NGS)	
  Study:	
  
	
  
Phase	
  I:	
  RNA-­‐Seq	
  and	
  degraded	
  RNA-­‐Seq	
  (2011-­‐2013)	
  
Phase	
  II:	
  DNA-­‐Seq	
  and	
  Hard-­‐to-­‐sequence	
  regions	
  (2014-­‐2016)	
  
Phase	
  III:	
  Asteroid	
  and	
  Mar$an	
  Surface	
  Sequencing	
  (2017-­‐2050?)	
  
	
  
1.  Cross	
  Pla(orm:	
  HiSeq2000,	
  454,	
  PGM,	
  Proton,	
  PacBio	
  
2.  Cross-­‐Protocol:	
  ribo-­‐,	
  direc$onal,	
  non-­‐direc$onal,	
  degraded	
  RNA	
  
3.  Cross-­‐Site:	
  3	
  sites	
  for	
  each	
  pla(orm,	
  replicates	
  at	
  each	
  site	
  
Improve	
  the	
  detec$on	
  and	
  quan$fica$on	
  of	
  molecular	
  signatures	
  	
  
for	
  use	
  in	
  basic,	
  applied,	
  and	
  clinical	
  research.	
  
	
  
Specifically,	
  the	
  ABRF-­‐NGS	
  group	
  will	
  test	
  and	
  characterize	
  the	
  myriad,	
  
rapidly	
  evolving	
  NGS	
  techniques	
  for	
  RNA-­‐Seq	
  and	
  DNA-­‐Seq.	
  
Samples	
  of	
  the	
  MAQC,	
  SEQC	
  and	
  ABRF-­‐NGS	
  Study	
  
ß	
  SEQC	
  samples	
  (FDA)	
  A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
A:	
  10	
  cancer	
  cell	
  lines	
  
B:	
  Brain	
  $ssues	
  
A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-­‐pla(orm	
  experimental	
  design	
  
A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-­‐Site	
  experimental	
  design	
  
A
B
A B
A
B
C
D
A B
A B
A BA B C D E F
A
B
A
B
A
B
A
B
E
F
Pacific Biosciences
RS
Life Technologies
Ion PGM
Life Technologies
Ion Proton
Cross-­‐Protocol	
  experimental	
  design	
  
All	
  technologies	
  have	
  strengths	
  &	
  weaknesses	
  
Vendor Instrument Version
Run Time
(hours)
Read
Length
(mean)
Reads
per Run
(million)
Yield per
Run
Cost per
Run
($)1
Cost
per Mb
($)
Paired-
end
Application Strengths
Illumina HiSeq 2000 High Output 132 50 6,000 300 Gb2
18,725.00 0.06 Yes
Gene expression; splice
junction detection;
variant calling; fusions
Deep read counts for
transcript
quantification
Illumina HiSeq 2500 High Output 132 50 6,000 300 Gb2
18,725.00 0.06 Yes as above as above
Illumina MiSeq v2 kit 39 250 30 7.5 Gb 982.75 0.13 Yes
Splice junction
detection; variant calling
Rapid transcript
quantification and
variant detection
Life
Technologies
PGM 318 chip 7.3 176 6 1.056 Gb 749.00 0.71 No
Splice junction
detection; variant calling
Rapid transcript
quantification and
variant detection
Life
Technologies
Proton
Proton I
chip
2 to 4 81 70 5.67 Gb 834.00 0.15 No
Gene expression;
variant calling
Good read depth and
length for transcript
quantification
Pacific
Biosciences
RS RS 0.5 to 2 1,289 0.03 38.67 Mb 136.38 3.53 No
Splice junction
detection; full length
gene coverage
Extended read
lengths
Roche 454
Genome
Sequencer
FLX+
20 686 1 686 Mb 5,985.00 8.72 No Splice junction detection Read length
1
sequencing reaction reagents, at academic list price U.S. dollars, and does not include library preparation reagents, labor, data storage or analysis, equipment
or maintenance; 2
one sequencing run using 2 flowcells. Pacific Biosciences is calculated based on CCS reads; Gb: billion bases, Mb: million bases
Table 1. Summary of RNA-seq platforms as deployed in the ABRF-NGS Study.
I:	
  Inter-­‐pla(orm	
  performance	
  
Error	
  models	
  highly	
  variable	
  among	
  
pla(orms	
  
0%
2%
4%
6%
454
ILMN
PAC
PGM
PRO
platform
Indelsmismatchspercentage
b
454:	
  Roche	
  454	
  GS	
  FLX+	
  
ILMN:	
  Illumina	
  HiSeq	
  2000/2500	
  
PAC:	
  Pacific	
  Biosciences	
  RS	
  I	
  
PGM:	
  Ion	
  Personal	
  Genome	
  Machine	
  
PRO:	
  Ion	
  Torrent	
  Proton	
  
	
  	
  	
  	
  
Full	
  length	
  genebody	
  coverage	
  	
  
	
  
Coverage across genebody (%)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
5'
platform
454
sample
site
Y
type
intact
type
site
sample
platformm
454	
   ILMN	
   PAC	
   PGM	
   PRO	
  
Note:	
  PAC	
  used	
  CCS	
  read	
  only;	
  PAC	
  on	
  average	
  only	
  cover	
  3k	
  GENCODE	
  genes.	
  
Inter-­‐site	
  normalized	
  gene	
  expression	
  
varia$on:	
  CV	
  and	
  R2	
  
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●●
●●
●
●
●
●●
●●
●
●
●
●●
●
●
●●
●
●●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●●●●
●●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●●●
●
●●●●●
●
●
●
●●
●●●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●●
●●●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●●
●
●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●●●
●
●●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●●
●●
●●
●●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●●
●●
●●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●●●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●
●●
●●
●●●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●●●
●●
●
●
●●
●
●●
●
●●
●
●
●●●
●
●
●●●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●●●
●
●
●
●
●●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●●●●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●●●●
●
●
●●
●●
●●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●●●●
●
●
●●
●
●
●
●
●
●
●●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●●●
0
50
100
150
200
454−A
454−B
ILMN−A
ILMN−B
PAC−A1
PAC−A2
PAC−A3
PAC−B1
PAC−B2
PAC−B3
PGM−A
PGM−B
PRO−A
PRO−B
Coefficientofvariation(%)
454 ILMN PAC PGM PRO
intra−platform
A
intra−platform
B
inter−platform
A
inter−platform
B
0.92
0.97
0.91
0.96
0.85
0.98
0.95
0.95
0.82
0.88
0.85
0.84
0.93
0.89
0.75
0.82
0.78
0.86
0.92
0.92
0.00
0.25
0.50
0.75
1.00
454
ILMN
PGM
PRO
454
ILMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
platform
Correlationcoefficients
● 454 ILMN PAC PGM PRO
● ●A B
es
A B● 454 ILMN PAC PGM PRO
● ●A B
gd e f
a b
ILMN	
  with	
  lowest	
  inter-­‐site	
  CV	
   ILMN	
  with	
  highest	
  inter-­‐site	
  R2	
  
ILMN-­‐PRO	
  &	
  PRO-­‐PGM	
  with	
  highest	
  R2	
  
	
  	
  	
  	
  	
  	
  	
  	
  
Genes	
  detec$on	
  is	
  log-­‐linear;	
  
Junc$on	
  detec$on	
  is	
  length-­‐dependent	
  454−A
454−B
ILMN−A
ILMN−B
PAC−A1
PAC−A2
PAC−A3
PAC−B1
PAC−B2
PAC−B3
PGM−A
PGM−B
PRO−A
PRO−B
454
LMN
PGM
●
●
50000
100000
150000
200000
9.0
9.5
10.0
10.5
11.0
bases sequenced (log10)
detectedjunctions
● 454 ILMN PAC PGM PRO
● ●A B
●●
10000
20000
30000
9.0
9.5
10.0
10.5
11.0
bases sequenced (log10)
detectedgenes
● 454 ILMN PAC PGM PRO
● ●A B
d e	
  	
  	
  	
  	
  	
  	
  	
  
Junc$ons	
  detec$on	
  efficiency	
  highly	
  
variable;	
  agreement	
  common	
  
PRO
454
LMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
LMN−PGM
LMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
LMN−PGM
LMN−PRO
PGM−PRO
platform
A−AH
A−AS
A−AR
AH−AS
AH−AR
AR−AS
B−BR
sample
0
25
50
75
454
ILMN
PAC
PGM
PRO
platform
junctionspermillionbases
A B
known novel
0
20000
40000
60000
1
2
3
4
5
3
4
5
occurrence among platforms
junctions
A Bgf
Note:	
  Only	
  use	
  a	
  subset	
  of	
  reads	
  among	
  pla(orms	
  
to	
  normalize	
  the	
  scale	
  
	
  	
  	
  	
   	
  	
  	
  	
  
Most	
  of	
  the	
  known	
  junc$ons	
  are	
  
shared	
  by	
  at	
  least	
  3	
  pla(orms	
  
Sequencing	
  depth	
  is	
  important	
  to	
  
discover	
  low	
  abundance	
  transcripts	
  	
  
Inter-­‐pla(orm	
  differen$al	
  gene	
  
expression	
  show	
  88-­‐97%	
  agreement	
  
Shared	
  sets	
  of	
  greater	
  
than	
  1000	
  genes	
  are	
  
indicated	
  in	
  red,	
  	
  
100-­‐999	
  yellow,	
  	
  
<100	
  blue.	
  
	
  
Unique	
  DEGs:	
  
454	
  	
  	
  	
  	
  	
  -­‐	
  3.0%	
  
POLYA	
  -­‐	
  9.2%	
  
RIBO	
  	
  	
  	
  -­‐	
  8.8%	
  
PRO	
  	
  	
  	
  	
  -­‐	
  11.9%	
  
PGM	
  	
  	
  	
  -­‐	
  3.9%	
  	
  
	
  	
  
Propor$onal	
  venn	
  diagrams	
  don’t	
  
always	
  add	
  clarity,	
  but	
  they	
  are	
  prewy	
  RIBO
POLYA
454
PGM
PRO
1766
482
2046
202
69
105
2199
1104
484158
678 76 1828
740
1010
401
133
559
62
20
36595
2792
1187
406
1791
212
57
99
2151
II:	
  Inter-­‐protocol	
  quan$fica$ons	
  
•  Illumina	
  
– Poly-­‐A	
  selec$ons	
  (POLYA)	
  
– Ribo-­‐deple$on	
  (RIBO)	
  
Gene	
  regions	
  distribu$on	
  varies	
  between	
  protocols	
  
a" b"MappingDistribution(%)!
0%
25%
50%
75%
100%
Sample
Mappingdistribution(%) intergenic mito exon 5utr intron ribo
a	
  	
  	
  	
  
Very	
  similar	
  gene	
  measurements	
  of	
  genes	
  
(RPKM)	
  between	
  polyA	
  &	
  Ribo-­‐deple$on	
  
Ribo0>PolyA+ PolyA>Ribo0+
RPKM%difference%(polyA%–%ribo0)%
Number%of%Genes%
ENSEMBL Gene ID Gene Symbol Description Length PolyA
Reads
RiboDep
Reads
PolyA
FPKM
RiboDep
FPKM
FPKM Diff
ENSG00000210082 J01415.4 Mt_rRNA 1559 17040155 133679955 18568.31 200404.74 7181836.43
ENSG00000211459 J01415.24 Mt_rRNA 954 2232892 14445488 3976.16 35389.28 731413.11
ENSG00000202198 RN7SK misc_RNA 331 3303 2818202 16.95 19899.03 719882.08
ENSG00000258486 RN7SL1 antisense 300 3061 520624 17.33 4055.93 74038.60
ENSG00000202364 SNORD3A snoRNA 216 718 186779 5.65 2020.98 72015.33
ENSG00000202538 RNU472 snRNA 141 168 102560 2.02 1699.99 71697.97
ENSG00000251562 MALAT1 lincRNA 8708 437102 4931496 85.27 1323.57 71238.30
ENSG00000199916 RMRP misc_RNA 264 148 83019 0.95 734.96 7734.00
ENSG00000201098 RNY1 misc_RNA 113 172 19210 2.59 397.32 7394.73
ENSG00000238741 SCARNA7 snoRNA 330 53 50182 0.27 355.40 7355.13
ENSG00000200795 RNU471 snRNA 141 35 18724 0.42 310.36 7309.94
ENSG00000200087 SNORA73B snoRNA 204 336 23778 2.80 272.42 7269.62
ENSG00000252010 SCARNA5 snoRNA 276 29 25379 0.18 214.91 7214.73
ENSG00000199568 RNU5A71 snRNA 116 38 10224 0.56 205.99 7205.44
ENSG00000252481 SCARNA13 snoRNA 275 145 24034 0.90 204.26 7203.36
ENSG00000212232 SNORD17 snoRNA 237 102 20571 0.73 202.86 7202.13
ENSG00000207008 SNORA54 snoRNA 123 8 9655 0.11 183.46 7183.35
ENSG00000200156 RNU5B71 snRNA 116 28 8301 0.41 167.25 7166.84
ENSG00000209582 SNORA48 snoRNA 135 38 9272 0.48 160.52 7160.04
ENSG00000239002 SCARNA10 snoRNA 330 19 21801 0.10 154.40 7154.30
ENSG00000254911 SCARNA9 antisense 353 501 19842 2.41 131.37 7128.96
ENSG00000230043 TMSB4XP6 pseudogene 135 0 6272 0.00 108.58 7108.58
ENSG00000239039 SNORD13 snoRNA 104 0 4521 0.00 101.60 7101.60
ENSG00000208892 SNORA49 snoRNA 136 7 5352 0.09 91.97 791.89
ENSG00000223336 RNU276P snRNA 190 22 6365 0.20 78.29 778.10
Supplemental Table 3 - Top 25 Genes with Highest Enrichment from Ribo-Depletion Preparation
polyA	
  favors	
  high	
  expressors	
  
•  Gencode14	
  
•  Unique	
  genes:	
  55889	
  
0	
  
5,000	
  
10,000	
  
15,000	
  
20,000	
  
25,000	
  
30,000	
  
≥	
  2	
   ≥	
  4	
   ≥	
  8	
   ≥	
  16	
   ≥	
  32	
   ≥	
  64	
   ≥	
  128	
  
PolyA	
  
RiboZero	
  
0	
  
5,000	
  
10,000	
  
15,000	
  
20,000	
  
25,000	
  
30,000	
  
≥	
  2	
   ≥	
  4	
   ≥	
  8	
   ≥	
  16	
   ≥	
  32	
   ≥	
  64	
   ≥	
  128	
  
Read	
  count	
  
Genome-­‐level	
  Gene	
  Counts	
   Gene	
  Overlap	
  
Read	
  count	
  
High-­‐overlap	
  DEGs	
  and	
  	
  
comparable	
  accuracy	
  
FDR: 0.001 FDR: 0.01 FDR: 0.05
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
FC:1.5FC:2
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
comparison
Proportion
type POLYA RIBO Shared
b"
B−C
itivity)
.81.0
FDR: 0.05; FC:1.5
B−D
itivity)
.81.0
FDR: 0.05; FC:1.5
C−D
itivity)
.81.0
FDR: 0.05; FC:1.5
FC: 1.5
FDR: 0.001
FC: 1.5
FDR: 0.01
FC: 1.5
FDR: 0.05
FC: 2
FDR: 0.001
FC: 2
FDR: 0.01
FC: 2
FDR: 0.05
0.0
0.2
0.4
0.6
0.8
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
A−B
A−C
A−D
B−C
B−D
C−D
comp
Externalvalidation(MCC)
RIBO POLYAc
Using	
  TaqMan	
  as	
  external	
  valida$on	
  812	
  reac$ons	
  
	
  	
  	
  	
  	
  	
  	
  	
  
III:	
  Degraded	
  RNA?	
  
0.0
0.5
1.0
1.5
2.0
0
10
20
30
40
50
60
70
80
90
100
Gene (%, 5'−3')
Coverage(%)
UHR−RIN0 UHR−RIN3 UHR−RIN6 UHR−RIN9
Genetic variation (SNVs, indels, CNVs)!
2!
Differential expression by gene, exon,
splice isoform, allele, & transcript!3!
Predict gene fusions, polyA sites and
alterations to miRNA binding sites!
5!
!
Viruses/Bacteria/Other"
De Novo Assembly of Remaining Reads!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT
CTGCTTTAGGATAGATCGATAGCTAGTTCATC!
TCTGCTTTAGGATAGATCGATAGCTAGTTCAT!
6!
reads"
alignments"
Sorted
BAMs"
annotations"
queries "
Reads/bp"
Alignment on HPC nodes"
References!
hg19"
RefSeq"
miRBase"
rRNA"
Adapters"
Find ncRNAs and new genes!
4!
1!
Sequencing Data"
Many	
  areas	
  could	
  be	
  impacted	
  
Entropy	
  is	
  usually	
  a	
  source	
  of	
  fear
"FEAR is	
  the	
  main	
  source	
  of	
  	
  superstition,	
  
	
  
and	
  one	
  of	
  the	
  main	
  sources	
  of	
  	
  	
  cruelty.
To	
  conquer	
  fear	
  is	
  the	
  beginning	
  of	
  	
  wisdom."	
  	
  
	
  
–	
  Bertrand	
  Russell	
  
HBR−RIN9
HBR−RIN0
UHR−RIN0
UHR−RIN3
UHR−RIN9
UHR−RIN6
HBR−RIN9
HBR−RIN0
UHR−RIN0
UHR−RIN3
UHR−RIN9
UHR−RIN6
0.6 0.7 0.8 0.9 1
Value
Color Key
Can	
  we	
  remove	
  supers$$on?	

HEAT	
  
99oC	
  for	
  	
  
10	
  minutes	
  
SONICATION	
  
6	
  x	
  55	
  seconds	
  	
  
at	
  5%DC,	
  	
  I	
  
intensity	
  3	
  	
  
100	
  c/b	
  
RNAse	
  	
  
RNAse	
  A	
  	
  
1	
  ng/uL	
  
Un$l	
  RIN	
  <2	
  
Degraded	
  RNA	
  looks	
  great!	
  
Coverage across genebody (%)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
5'
platform
454
sample
site
Y
type
intact
type
site
sample
platformm
454	
   ILMN	
   PAC	
   PGM	
   PRO	
  
Degraded	
  RNA	
  highly	
  correlates	
  with	
  
intact	
  RNA	
  gene	
  expression	
  
atform intra−platform
B
inter−platform
A
inter−platform
B
0.96
0.85
0.98
0.95
0.95
0.82
0.88
0.85
0.84
0.93
0.89
0.75
0.82
0.78
0.86
0.92
0.92
PRO
454
ILMN
PGM
PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
454−ILMN
454−PGM
454−PRO
ILMN−PGM
ILMN−PRO
PGM−PRO
platform
0.95
0.94
0.94
0.96
0.97
0.97
0.96
0.00
0.25
0.50
0.75
1.00
A−AH
A−AS
A−AR
AH−AS
AH−AR
AR−AS
B−BR
sample
Correlationcoefficients
c
B
A B
A
B
C
D
A
B
Pacific Biosciences
RS
Life Tech
Ion P
(AR)	
  
(AS)	
  
(AH)	
  
(BR)	
  
	
  	
  	
  	
  	
  
Illumina	
  Ribo-­‐deple$on	
  protocol	
  
0.	
  	
  Genomes	
  
	
  
1.  RNA	
  Standards	
  
2.  Removing	
  RNA-­‐Seq	
  Varia$on	
  
3.  Epitranscriptome	
  
4.  #JustSequenceIt	
  
How	
  Sequencing	
  Quality	
  affect	
  
Inter-­‐site	
  gene	
  expression	
  
quan$fica$on?	
  
SEQC	
  study	
  
Ameliora$ng	
  Inter-­‐site	
  Varia$on	
  
SEQC
samples
Each site has
4 replicates
ILM1 ILM2
Replicate 5
ILM3
Replicate 5
ILM4 ILM5
Replicate 5
ILM6
Replicate 5 libraries
were vendor supplied
SEQC	
  study	
  
HiSeq	
  2000	
  
Ameliora$ng	
  Inter-­‐site	
  Varia$on	
  
SEQC
samples
Each site has
4 replicates
ILM1 ILM2
Replicate 5
ILM3
Replicate 5
ILM4 ILM5
Replicate 5
ILM6
Replicate 5 libraries
were vendor supplied
SEQC	
  study	
  
Determine	
  sequencing	
  varia$on	
  sources	
  
SEQC	
  study	
  
Nucleo$de	
  frequency	
  bias	
  from	
  library	
  prepara$on	
  ●M5 ILM6
A G C T
23
24
25
26
27
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
20
40
60
80
100
Position in read
Nucleotidefrequency(%)
ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
1 5
site
1 2 3 4 5
● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
site
1 2 3 4 5
● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6
f
SEQC	
  study	
  
Replicate 5 libraries
were vendor supplied
	
  	
  	
  	
  
<-­‐	
  5th	
  replicate	
  
of	
  ILM2	
  and	
  
ILM3	
  
<-­‐	
  1th	
  replicate	
  
of	
  ILM2	
  
<-­‐	
  1th	
  replicate	
  
of	
  ILM3	
  
Sample	
  A	
  vs.	
  A	
  =	
  the	
  Answer	
  should	
  be	
  zero	
  DEGs;	
  
SVA	
  can	
  remove	
  false	
  posi$ve	
  DEGs	
  	
  
te
0.957
0.958
0.958
0.971
0.973
0.970
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
1 2
1
BBBBBB
B
B
B
BBB
B
B
BBBB
BBBB
B
BBBB
●
●
●
●
●
●
ILM1
ILM2
ILM3
ILM4
ILM5
ILM6
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
ABCD
ILM1−ILM2
ILM1−ILM3
ILM1−ILM4
ILM1−ILM5
ILM1−ILM6
ILM2−ILM3
ILM2−ILM4
ILM2−ILM5
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
Comparison
FalsepositiveDEGs
Without SVA With SVA
c
Illumina	
  
SEQC	
  study	
   Leek	
  JT,	
  Storey	
  JD.PLoS	
  Genet.2007	
  
Compare	
  the	
  same	
  
sample	
  across	
  sites	
  
	
  	
  	
  	
  
SVA	
  remove	
  false	
  posi$ve	
  DEGs	
  
Validated	
  by	
  PGM	
  data	
  from	
  ABRF-­‐NGS	
  study	
  	
  
te
0.957
0.958
0.958
0.971
0.973
0.970
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
1 2
1
BBBBBB
B
B
B
BBB
B
B
BBBB
BBBB
B
BBBB
●
●
●
●
●
●
ILM1
ILM2
ILM3
ILM4
ILM5
ILM6
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
0
2500
5000
7500
10000
ABCD
ILM1−ILM2
ILM1−ILM3
ILM1−ILM4
ILM1−ILM5
ILM1−ILM6
ILM2−ILM3
ILM2−ILM4
ILM2−ILM5
ILM2−ILM6
ILM3−ILM4
ILM3−ILM5
ILM3−ILM6
ILM4−ILM5
ILM4−ILM6
ILM5−ILM6
Comparison
FalsepositiveDEGs
Without SVA With SVA
c
Illumina	
   PGM	
  
−2 −1 0 1
−0.50.00.51.01.5
Dimension 1
Dimension2
A.1
A.2 A.3
A.4
B.1
B.2
B.3
B.4
A.1
A.2
B.1
B.2
A.1
A.2
B.1B.2
●
●
●
PGM1
PGM2
PGM3
P
P
P
P
P
P
site
● ● ●PGM1 PGM2 PGM3
1 2 3 4
P
P
P
P
P
site
A
● ● ●PGM1 PGM2 PGM3
1 2 3 4
comp: A comp: B
0
50
100
150
200
PGM1−PGM2
PGM1−PGM3
PGM2−PGM3
PGM1−PGM2
PGM1−PGM3
PGM2−PGM3
Comparison
FalsepositiveDEGs
Without SVA With SVA
1
1
NumberofDEGs
d e f
SEQC	
  study	
   Li	
  et	
  al.,	
  in	
  press,	
  Nat.	
  Biotech.,	
  2014	
   ABRF-­‐NGS	
  study	
  
	
  	
  	
  	
  
NIST,	
  SEQC,	
  ABRF,	
  ENCODE,	
  others	
  
•  Provide	
  reference	
  data	
  
resources	
  	
  
•  Best	
  prac$ces	
  for	
  	
  
–  gene	
  quan$fica$on,	
  	
  
–  isoform	
  characteriza$on,	
  
–  dynamic	
  range	
  comparisons,	
  	
  
–  managing	
  inter-­‐site	
  and	
  intra-­‐
site	
  varia$on,	
  	
  
–  analysis	
  pipelines,	
  and	
  	
  
–  cross-­‐pla(orm	
  tes$ng	
  of	
  
transcriptome	
  hypotheses	
  	
  
•  To	
  address	
  some	
  other	
  aspects	
  
of	
  RNA-­‐seq,	
  including	
  	
  
–  variant	
  detec$on,	
  	
  
–  allele-­‐specific	
  expression,	
  	
  
–  RNA	
  edi$ng,	
  and	
  gene	
  fusions.	
  	
  
•  And	
  more	
  …	
  
0.	
  	
  Genomes	
  
	
  
1.  RNA	
  Standards	
  
2.  Removing	
  RNA-­‐Seq	
  Varia$on	
  
3.  Epitranscriptome	
  
4.  #JustSequenceIt	
  
Which	
  transcriptome	
  is	
  important?	
  	
  
	
  
All	
  …	
  RNA	
  bases.	
  
The	
  four-­‐base	
  genome	
  	
  
is	
  just	
  the	
  beginning	
  
The	
  four-­‐base	
  genome	
  	
  
is	
  just	
  the	
  beginning	
  
There	
  are	
  many	
  RNA-­‐mods	
  as	
  well:	
  
Chuan	
  He	
  
Abbreviation Chemical0name
m1
acp3
Ψ 1'methyl'3'(3'amino'3'carboxypropyl)5pseudouridine
m1
A 1'methyladenosine
m1
G 1'methylguanosine
m1
I 1'methylinosine
m1
Ψ 1'methylpseudouridine
m1
Am 1,2''O'dimethyladenosine
m1
Gm 1,2''O'dimethylguanosine
m1
Im 1,2''O'dimethylinosine
m2
A 2'methyladenosine
ms2
io6
A 2'methylthio'N6
'(cis'hydroxyisopentenyl)5adenosine
ms2
hn6
A 2'methylthio'N6
'hydroxynorvalyl5carbamoyladenosine
ms2
i6
A 2'methylthio'N6
'isopentenyladenosine
ms2
m6
A 2'methylthio'N6
'methyladenosine
ms2
t6
A 2'methylthio'N6
'threonyl5carbamoyladenosine
s2
Um 2'thio'2''O'methyluridine
s2
C 2'thiocytidine
s2
U 2'thiouridine
Am 2''O'methyladenosine
Cm 2''O'methylcytidine
Gm 2''O'methylguanosine
Im 2''O'methylinosine
Ψm 2''O'methylpseudouridine
Um 2''O'methyluridine
Ar(p) 2''O'ribosyladenosine5(phosphate)
Gr(p) 2''O'ribosylguanosine5(phosphate)
acp3
U 3'(3'amino'3'carboxypropyl)uridine
m3
C 3'methylcytidine
m3
Ψ 3'methylpseudouridine
m3
U 3'methyluridine
m3
Um 3,2''O'dimethyluridine
imG'14 4'demethylwyosine
s4
U 4'thiouridine
chm5
U 5'(carboxyhydroxymethyl)uridine
mchm5
U 5'(carboxyhydroxymethyl)uridine5methyl5ester
inm5
s2
U 5'(isopentenylaminomethyl)'52'thiouridine
inm5
Um 5'(isopentenylaminomethyl)'52''O'methyluridine
inm5
U 5'(isopentenylaminomethyl)uridine
nm5
s2
U 5'aminomethyl'2'thiouridine
ncm5
Um 5'carbamoylmethyl'2''O'methyluridine
ncm5
U 5'carbamoylmethyluridine
cmnm5
Um 5'carboxymethylaminomethyl'52''O'methyluridine
cmnm5
s2
U 5'carboxymethylaminomethyl'2'thiouridine
cmnm5
U 5'carboxymethylaminomethyluridine
cm5
U 5'carboxymethyluridine
f5
Cm 5'formyl'2''O'methylcytidine
f5
C 5'formylcytidine
hm5
C 5'hydroxymethylcytidine
ho5
U 5'hydroxyuridine
mcm5
s2
U 5'methoxycarbonylmethyl'2'thiouridine
mcm5
Um 5'methoxycarbonylmethyl'2''O'methyluridine
mcm5
U 5'methoxycarbonylmethyluridine
mo5
U 5'methoxyuridine
m5
s2
U 5'methyl'2'thiouridine
mnm5
se2
U 5'methylaminomethyl'2'selenouridine
mnm5
s2
U 5'methylaminomethyl'2'thiouridine
mnm5
U 5'methylaminomethyluridine
m5
C 5'methylcytidine
m5
D 5'methyldihydrouridine
m5
U 5'methyluridine
τm5
s2
U 5'taurinomethyl'2'thiouridine
τm5
U 5'taurinomethyluridine
m5
Cm 5,2''O'dimethylcytidine
m5
Um 5,2''O'dimethyluridine
preQ1 7'aminomethyl'7'deazaguanosine
preQ0 7'cyano'7'deazaguanosine
m7
G 7'methylguanosine
G+
archaeosine
D dihydrouridine
oQ epoxyqueuosine
galQ galactosyl'queuosine
OHyW hydroxywybutosine
I inosine
imG2 isowyosine
k2
C lysidine
manQ mannosyl'queuosine
mimG methylwyosine
m2
G N2
'methylguanosine
m2
Gm N2
,2''O'dimethylguanosine
m2,7
G N2
,7'dimethylguanosine
m2,7
Gm N2
,7,2''O'trimethylguanosine
m2
2G N2
,N2
'dimethylguanosine
m2
2Gm N2
,N2
,2''O'trimethylguanosine
m2,2,7
G N2
,N2
,7'trimethylguanosine
ac4
Cm N4
'acetyl'2''O'methylcytidine
ac4
C N4
'acetylcytidine
m4
C N4
'methylcytidine
m4
Cm N4
,2''O'dimethylcytidine
m4
2Cm N4
,N4
,2''O'trimethylcytidine
io6
A N6
'(cis'hydroxyisopentenyl)adenosine
ac6
A N6
'acetyladenosine
g6
A N6
'glycinylcarbamoyladenosine
hn6
A N6
'hydroxynorvalylcarbamoyladenosine
i6
A N6
'isopentenyladenosine
m6
t6
A N6
'methyl'N6
'threonylcarbamoyladenosine
m6
A N6
'methyladenosine
t6
A N6
'threonylcarbamoyladenosine
m6
Am N6
,2''O'dimethyladenosine
m6
2A N6
,N6
'dimethyladenosine
m6
2Am N6
,N6
,2''O'trimethyladenosine
o2yW peroxywybutosine
Ψ pseudouridine
Q queuosine
OHyW undermodified5hydroxywybutosine
cmo5
U uridine55'oxyacetic5acid
mcmo5
U uridine55'oxyacetic5acid5methyl5ester
yW wybutosine
imG wyosine
Table515'5List5of5Base5Modifications5Covered5by5Claims
m6A	
  is	
  just	
  1	
  of	
  the	
  110	
  	
  
known	
  RNA	
  modifica$ons	
  
from	
  the	
  	
  
RNA	
  Modifica$on	
  Database	
  
m6A	
  marks	
  are	
  widespread	
  in	
  cancer	
  cells’	
  RNA	
  
Meyer	
  et	
  al.,	
  Cell,	
  2012	
  
A	
  new	
  method:	
  MeRIP-­‐Seq	
  
Meyer	
  et	
  al.,	
  Cell,	
  2012	
  
Conserva$ons	
  of	
  signal	
  and	
  sites	
  in	
  
orthologous	
  	
  genes	
  	
  
Meyer	
  et	
  al.,	
  Cell,	
  2012	
  
Peaks	
  also	
  show	
  strong	
  conserva$on	
  
Meyer,	
  2012	
  
Many	
  puta$ve	
  roles	
  for	
  m6A	
  in	
  RNA	
  
Paz-­‐Yaakov	
  et	
  al,	
  2010	
  
Saletore,	
  Chen-­‐Kiang,	
  and	
  Mason,	
  RNA	
  Biology,	
  2013.	
  
RNA	
  modifica$ons	
  give	
  a	
  new	
  layer	
  of	
  cellular	
  regula$on	
  
N
NN
N
NH2
O
OHO
O
RNA
RNA
N
NN
N
HN
O
OHO
O
RNA
RNA
H2
C
OH
N
NN
N
HN
O
OHO
O
RNA
RNA
C
H
O
N
NN
N
HN
O
OHO
O
RNA
RNA
C
OH
O
N
NN
N
HN
O
OHO
O
RNA
RNA
CH3
A m6A
ca6A f6A hm6A
O
H H
O
H OH
H2O
Mettl3
FTO/ALKBH5
O
2
, Fe
II
, -KG
FTO/
ALKBH5
O
2
, Fe
II
, -KG
Li	
  and	
  Mason,	
  ARGHG,	
  2014	
  
New	
  layer	
  to	
  study!	
  
Direct	
  Detec$on	
  of	
  Methyla$on	
  on	
  PacBio	
  
70.5 71.0 71.5 72.0 72.5 73.0 73.5 74.0 74.5
0
100
200
300
400
Fluorescence
intensity(a.u.)
Time (s)
104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 108.5
0
100
200
300
400
Fluorescence
intensity(a.u.)
Time (s)
C	
  
T	
   G	
   A	
   T	
  C	
   G	
   T	
   A	
   C	
  
mA	
  
A	
  G	
   T	
  C	
  T	
   A	
   A	
  
G	
   C	
   C	
   A	
  A	
   A	
  
A	
  
Approach:	
  Kine$c	
  detec$on	
  of	
  methylated	
  bases	
  during	
  SMRT	
  DNA	
  sequencing	
  
Example:	
  N6-­‐methyladenosine	
  (mA)	
  
SMRT RNA Sequencing – Assay
Design
I.  RNA template (purple) is hybridized to a DNA primer (orange) and immobilized on the bottom of a
zero-mode waveguide (ZMW) in the presence of fluorescently-labeled nucleotide analogues.
II.  After the addition of HIV RT to the solution, the polymerase binds to the RNA template at the 3’-end of
DNA primer.
III.  The first nucleotide analogue can bind to the active site of the polymerase resulting in a fluorescent
pulse with a color characteristic to the type of nucleotide (i.e. A, C, G, or T). With HIV RT, the rates of
nucleotide binding, dissociation and incorporation have not been optimized to a state where each
binding event would result in an incorporation event (image IV.). Consequently, nucleotide analogues
often dissociate before they can be incorporated resulting in a series of like pulses before the
polymerase can translocate to the next position and bind the nucleotide analogue complementary to
the next nucleotide in RNA template.
IV.  Incorporation of the nucleotide and translocation to the next nucleotide on RNA template.
First demonstration SMRT RNA Sequencing
– Expected Signal
A.  Example RNA template is shown in black. DNA primer is
shown in orange. The first cDNA strand synthesized by
HIV RT during SMRT RNA Sequencing is color-coded (A
– yellow, C – red, G – gree, T – blue). The first cDNA
strand is complementary to the unhybridized section of
the example RNA template and can be used to determine
the sequence of that section.
B.  A typical time trace obtained in a ZMW with RNA template
immobilized on the bottom surface. One can observe a
succession of blocks of like-colored pulses corresponding
to the sequence of cDNA strand (blocks are indicated by
lines above the trace).
C.  As mentioned on Slide 3 – Point III., due to the kinetics of
the current HIV RT, we observe multiple binding events at
each RNA template position before nucleotide
incorporation and translocation to the next position.
A
B
C
Saletore	
  et	
  al.,	
  Genome	
  Biology,	
  2012	
  
m6A	
  is	
  dis$nct	
  from	
  A	
  
Saletore	
  et	
  al.,	
  Genome	
  Biology,	
  2012	
  
Epigenome	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
Epitranscriptome	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
Epiproteome	
  (PTM)	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
	
  
DNA	
   RNA	
   Protein	
  
transcrip$on	
   transla$on	
  
RT	
  
ACGT	
  
viRNA	
  
	
  
	
  
	
  
	
  
	
  
I	
  N	
  H	
  E	
  R	
  I	
  T	
  A	
  N	
  C	
  E	
  	
  	
  or	
  	
  	
  T	
  R	
  A	
  N	
  S	
  M	
  I	
  S	
  S	
  I	
  O	
  N	
  
prions	
  
iDNA	
  
mC,hmC,	
  8oxoG,	
  m6A	
  
(sn/sno/g)RNA	
   (t/r/tm)RNA	
  
(lnc/nc/mi/piwi/si/vi)RNA	
  
scRNA	
  
RbP	
  
Ribozymes	
   Prions	
  
iRNA	
   iProtein	
  
mC,hmC,	
  8oxoG,	
  m6A,	
  …	
   m6A,	
  5’G,polyA,	
  …	
   Acet,	
  Phos,	
  Citr,	
  SUMO,	
  …	
  	
  
from	
  Saletore	
  et	
  al.,	
  Genome	
  Biology,	
  2012	
  
0.	
  	
  Genomes	
  
	
  
1.  RNA	
  Standards	
  
2.  Removing	
  RNA-­‐Seq	
  Varia$on	
  
3.  Epitranscriptome	
  
4.  #JustSequenceIt	
  
Which	
  genome	
  is	
  important?	
  	
  
	
  
All	
  …	
  Cells.	
  
−5 0 5
−6−4−2024
Comp.1
Comp.2
C01
C02
C03
C04
C05
C06
C07
C08
C09
C10
C11
C12
C13
C14
C15
C16
C17
C18
C19
C20
C21
C22
C23
C24
C25
C26
C27
C28
C29
C30
C31
C32
C33
C34
C35
C36
C37
C38
C39
C40
C41C42
C43
C44
C45
C46
C47
C48
C49
C50
C51
C52
C53
C54
C55
C56
C57
C58
C59
C60
C61
C62
C63
C64
C65
C66
C67
C68
C69
C70
C71
C72
C73
C74
C75
C76
C77
C78
C79
C80
C81
C82
C83
C84
C85
C86
C87
C88
C89
C90
C91
C92
C93
C94
C95
C96
Principal	
  Component	
  1	
  
Principal	
  Component	
  2	
  
C05
C12
C29
C33
C44
C78
C88
C35
C03
C19
C16
C20
C72
C38
C83
C01
C67
C60
C42
C52
C32
C86
C02
C91
C09
C46
C80
C11
C58
C64
C56
C77
C06
C23
C15
C90
C62
C65
C43
C48
C84
C34
C53
C69
C37
C82
C28
C76
C87
C61
C92
C85
C47
C96
C50
C59
C39
C66
C93
C26
C27
C54
C71
C07
C22
C14
C81
C08
C41
C55
C73
C70
C51
C75
C24
C94
C25
C10
C17
C04
C68
C13
C63
C18
C57
C40
C74
C36
C79
C30
C49
C21
C45
C95
C31
C89
MGST3
UROD
PSMA5
PSMB2
DHRS3
L.XKR8
HBXIP
S100A11
R.spk7mid
RPS27
PSMD4
MAST2
YBX1
SLC39A1
UBR4
B4GALT3
S.8
CNIH4
NCSTN
SRM
RNF220
WDR77
TRAPPC3
RPL11
VPS72
ILF2
NUDC
S.4
PARP1
PRDX6
BCRABL
HIST2H2BE
YARS
PI4KB
UBE2Q1
VAMP3
KIFAP3
SF3A3
TPR
DIRAS3
GNPAT
ZMPSTE24
HSD17B7
IPO13
KIF2C
CAPZA1
FAF1
MRPL9
SDHB
AHCYL1
NOC2L
HMGN2
L.AXIN2
GNL2
SFRS4
CREG1
ETV1
SOX5
MEAF6
S.2
DTL
MARCKSL1
MRPS15
JAK1
GNB1
S.10
PTCH1
ALDH9A1
RPA2
KHDRBS1
SFPQ
ARPC5
TXNIP
UFC1
RABGGTB
S.5
KIAA0907
CDC20
PRPF3
KPNA6
RPL22
ACTB
WASF2
ATP5F1
S.6
KIF14
LRRC47
0
5
10
15
20
25
30
Check	
  
Cells	
  
Sort	
  Cells,	
  	
  
Load	
  plate	
  
Library	
  
prep.	
  &	
  
RNA-­‐Seq	
  
Lyse,	
  
cDNA	
  
AML-­‐related	
  genes’	
  expression	
  (RPKM)	
  	
  
Purified	
  Cell	
  (1-­‐96)	
  
Single-­‐cell	
  RNA-­‐seq	
  reveals	
  extraordinary	
  heterogeneity	
  
Leukemia	
  
Sample	
  
Which	
  genome	
  is	
  important?	
  	
  
	
  
All	
  …	
  Species.	
  
nhprtr.org	
  
Pipes,	
  Li,	
  Bozinoski	
  et	
  al.,	
  NAR,	
  2013	
  
Prosimian
Human
Chimpanzee
Gorilla
Rhesus Macaque
Japanese Macaque
Cynomolgus Macaque
Pig-tailed Macaque
Vervet
Sooty Mangabey
Common Marmoset
Squirrel Monkey
Ring-tailed Lemur
Mouse Lemur
Olive Baboon
New World Monkeys
Old World
Hominoids
~70 mya
35-40
23-25
6
8
9
10
14 species selected for
phylogenetic &
pharmacogenomic
significance:
• Samples	
  from	
  Washington,	
  Oregon,	
  Yerkes,	
  and	
  Southwestern	
  Na$onal	
  Primate	
  Research	
  Centers;	
  
the	
  Duke	
  Lemur	
  Center;	
  the	
  MD	
  Anderson	
  Cancer	
  Center,	
  and	
  Covance,	
  Inc.	
  	
  
21	
  Matching	
  “BodyMap”	
  Dissected	
  Tissues	
  
Brain	
   Immune/Sex	
   Soma	
  
Cerebellum	
   Bone	
  Marrow	
   Adipose	
  
Frontal	
  Cortex	
   Lymph	
  Node	
   Colon	
  
Hippocampus	
   Spleen	
   Heart	
  
Hypothalamus	
   Thymus	
   Kidney	
  
Pituitary	
   Thyroid	
   Liver	
  
Temporal	
  Lobe	
   Ovary	
   Lung	
  
Tes$s	
   Skeletal	
  Muscle	
  
Whole	
  Blood	
  
Phase	
  I	
  (2010	
  -­‐2012)	
  
RNA	
  extracted	
  from	
  	
  
each	
  $ssue,	
  pooled,	
  	
  
sequenced	
  on	
  two	
  FCs	
  	
  
(HS2K,	
  GAIIx)	
  for	
  overall	
  
annota$on.	
  
Phase	
  II	
  (2012	
  –	
  2014):	
  
RNA	
  extracted	
  from	
  	
  
12	
  $ssues,	
  individually	
  
sequenced	
  (1/2	
  lane	
  HS2K),	
  
for	
  $ssue-­‐specific	
  	
  
annota$on.	
  	
  	
  	
  
AceView	
  Genes	
  Expressed	
  
Species	
  
/isolate	
  
Number	
  of	
  
genes	
  
expressed	
  
Known	
  human	
  
genes	
  
Un-­‐annotated	
  
human	
  genes	
  
BAB	
   52065	
   22376	
   29689	
  
CMC	
   50073	
   21916	
   28157	
  
CMM	
   50155	
   21974	
   28181	
  
JMI	
   49253	
   21785	
   27468	
  
PTM	
   50863	
   22084	
   28779	
  
RMC	
   49845	
   21888	
   27957	
  
RMI	
   49922	
   21913	
   28009	
  
Jean	
  Thierry-­‐Mieg	
  
Species	
   Library	
   #	
  input	
  
sequences	
  
(billion)	
  
#	
  con;gs	
   Total	
  
length	
  
(Mb)	
  
N25	
  
(bp)	
  
N50	
  
(bp)	
  
N75	
  
(bp)	
  
Longest	
  
con;g	
  (bp)	
  
Baboon	
  
TOT	
   0.149	
   658,581	
   281	
   1,029	
   434	
   276	
   35,170	
  
UDG	
   1.543	
   1,131,951	
   844	
   3,534	
   1,368	
   479	
   131,395	
  
Chimpanzee	
   UDG	
   1.465	
   987,615	
   1,433	
   6,356	
   3,806	
   1,666	
   47,873	
  
Cynomologus	
  
Macaque	
  Chinese	
  
RNA	
   1.676	
   911,282	
   864	
   4,769	
   2,316	
   706	
   59,032	
  
UDG	
   1.630	
   990,604	
   1,055	
   5,479	
   2,822	
   875	
   122,916	
  
Cynomolgus	
  Macaque	
  
Mauri;an	
  
RNA	
   1.078	
   1,142,531	
   929	
   4,368	
   1,813	
   525	
   30,364	
  
TOT	
   0.166	
   526,723	
   200	
   657	
   377	
   266	
   36,976	
  
UDG	
   1.177	
   1,015,657	
   834	
   4,360	
   1,927	
   532	
   22,239	
  
Gorilla	
   UDG	
   1.256	
   732,336	
   1,122	
   5,878	
   3,680	
   1,804	
   33,526	
  
Japanese	
  Macaque	
  	
   RNA	
   1.863	
   703,246	
   738	
   5,010	
   2,687	
   898	
   21,620	
  
Marmoset	
  
TOT	
   0.253	
   332,782	
   118	
   545	
   357	
   261	
   14,561	
  
UDG	
   1.659	
   814,235	
   476	
   2,206	
   785	
   359	
   122,518	
  
Pig-­‐tailed	
  Macaque	
  
RNA	
   1.725	
   969,993	
   1,044	
   5,189	
   2,807	
   916	
   25,056	
  
UDG	
   1.573	
   1,301,087	
   1,222	
   5,015	
   2,265	
   668	
   32,637	
  
Rhesus	
  Macaque	
  
Chinese	
  
RNA	
   1.210	
   923,017	
   836	
   4,694	
   2,230	
   639	
   32,796	
  
UDG	
   1.310	
   969,421	
   850	
   4,638	
   2,114	
   594	
   34,020	
  
Rhesus	
  Macaque	
  
Indian	
  
RNA	
   3.200	
   703,246	
   738	
   5,010	
   2,687	
   898	
   21,620	
  
UDG	
   1.410	
   1,051,149	
   833	
   3,966	
   1,644	
   519	
   68,269	
  
Ring-­‐tailed	
  Lemur	
   UDG	
   1.403	
   611,678	
   822	
   6,169	
   3,630	
   1,468	
   33,401	
  
Sooty	
  Mangabey	
   UDG	
   1.635	
   1,188,472	
   1,466	
   6,431	
   3,483	
   1,116	
   30,331	
  
Chimpanzee	
  de	
  novo	
  assembled	
  transcriptome	
  fully	
  recovers	
  
most	
  genes	
  in	
  current	
  annota$on	
  Chimpanzee De Novo Assembled Transcriptome
Proportion of gene covered by transcriptome
Numberofgenes
0.0 0.2 0.4 0.6 0.8 1.0
020004000600080001000012000
Which	
  genome	
  is	
  important?	
  	
  
	
  
All	
  …	
  Interac$ng	
  Elements.	
  
We	
  need	
  more	
  
Longitudinal,	
  Integra$ve	
  Systems	
  Biology	
  
ScoA	
  Kelly	
  –	
  ISS	
  for	
  one	
  year	
  
Mark	
  Kelly	
  –	
  Earth	
  control	
  
Telomere	
  Length	
  
	
  
DNA	
  Muta$ons	
  &	
  Structural	
  Varia$on	
  
DNA	
  Hydroxy-­‐methyla$on	
  
Chroma$n	
  
RNA	
  expression	
  
&	
  RNA	
  Methyla$on	
   Proteomics	
  
An$body	
  Titers	
  
Cytokines	
  
	
  
DNA	
  Methyla$on	
  
B-­‐cells	
  /	
  T-­‐cells	
  
	
  
Targeted	
  and	
  Global	
  Metabolomics	
  Microbiome	
  
	
  
Cogni$on	
  
	
  
Vasculature	
  
	
  
These People are Awesome
@mason_lab	
  
Gratitude to Many People and Places
Illumina
Gary Schroth
Marc Van Oene
Univ. Chicago
Yoav Gilad
Yale University
Nenad Sestan
Sherman Weissman
FDA/SEQC
Leming Shi
NIH/UDP/NCBI
Jean and Danielle Thierry-Mieg
William Gahl
Baylor
Jeff Rogers
MSKCC
Christina Leslie
Ross Levine
PKI/Geospiza
Todd Smith
Mason Lab
Noah Alexander
Marjan Bozinoski
Dhruva Chandramohan
Sagar Chhangawala
Jorge Gandara
Fran Garrett-Bakelman
Dyala Jaroudi
Sheng Li
Cem Meyden
Lenore Pipes
Darryl Reeves
Yogesh Saletore
Priyanka Vijay
Paul Zumbo
Cornell/WCMC
Jason Banfelder
Scott Blanchard
Selina Chen-Kiang
Olivier Elemento
Yariv Houvras
Samie Jaffrey
Ari Melnick
Margaret Ross
Adam Siepel
Epigenomics Core
Katze Lab/NHPRT
Michael Katze
Bob Palermo
Xinxia Pan
Icahn/MSSM
Eric Schadt, Andrew Kasarskis,
Joel Dudley, Ali Bashir,
Bobby Sebra
NASA
Kosuke Fujishima
Lynn Rothschild
ABRF
George Grills
Scott Tighe
Don Baldwin
UMMS
Maria E Figueroa
AMNH
George Amato
Mark Sidall
U-Minnesota
Ran Blekhman
@mason_lab	
  

More Related Content

What's hot (19)

Micro array study for gene expression in vp
Micro array study for gene expression in vpMicro array study for gene expression in vp
Micro array study for gene expression in vp
 
Crispr cas9-Creative Biogene
Crispr cas9-Creative BiogeneCrispr cas9-Creative Biogene
Crispr cas9-Creative Biogene
 
Genetic Engineering and Biotechnology
Genetic Engineering and BiotechnologyGenetic Engineering and Biotechnology
Genetic Engineering and Biotechnology
 
Crispr/Cas9
Crispr/Cas9Crispr/Cas9
Crispr/Cas9
 
126 micro array study for gene expression
126 micro array study for gene expression126 micro array study for gene expression
126 micro array study for gene expression
 
Paragraph 7 rise
Paragraph 7 riseParagraph 7 rise
Paragraph 7 rise
 
Paragraph 7 rise
Paragraph 7 riseParagraph 7 rise
Paragraph 7 rise
 
Cell 671
Cell 671Cell 671
Cell 671
 
Mol gen-1
Mol gen-1Mol gen-1
Mol gen-1
 
Crispr
CrisprCrispr
Crispr
 
Poster
PosterPoster
Poster
 
Cell 672
Cell 672Cell 672
Cell 672
 
Wild Strawberry: An emerging model for ecological and evolutionary genomics
Wild Strawberry: An emerging model for ecological and evolutionary genomicsWild Strawberry: An emerging model for ecological and evolutionary genomics
Wild Strawberry: An emerging model for ecological and evolutionary genomics
 
DNA Plegable
DNA PlegableDNA Plegable
DNA Plegable
 
Transfer of Potential pseudomonas stutzeri genes
Transfer of Potential pseudomonas stutzeri genes Transfer of Potential pseudomonas stutzeri genes
Transfer of Potential pseudomonas stutzeri genes
 
Crispr/cas9 101
Crispr/cas9 101Crispr/cas9 101
Crispr/cas9 101
 
Decoding the Flu case study
Decoding the Flu case studyDecoding the Flu case study
Decoding the Flu case study
 
6 26-2012
6 26-20126 26-2012
6 26-2012
 
Zebrafish
ZebrafishZebrafish
Zebrafish
 

Similar to 20140710 6 c_mason_ercc2.0_workshop

Inference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldInference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldJoe Parker
 
132 gene expression in atherosclerotic plaques
132 gene expression in atherosclerotic plaques132 gene expression in atherosclerotic plaques
132 gene expression in atherosclerotic plaquesSHAPE Society
 
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsWhole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsThermo Fisher Scientific
 
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICS
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICSPROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICS
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICSLubna MRL
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical NotebookNaima Tahsin
 
Unilag workshop complex genome analysis
Unilag workshop   complex genome analysisUnilag workshop   complex genome analysis
Unilag workshop complex genome analysisDr. Olusoji Adewumi
 
FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochureAffymetrix
 
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013Functional Genomics Data Society
 
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...eventi-ITBbari
 
Kulakova sbb2014
Kulakova sbb2014Kulakova sbb2014
Kulakova sbb2014Ek_Kul
 
Bioinformatics Omics
Bioinformatics OmicsBioinformatics Omics
Bioinformatics OmicsHiplot
 
RNA Seq Data Analysis
RNA Seq Data AnalysisRNA Seq Data Analysis
RNA Seq Data AnalysisRavi Gandham
 
2014 whitney-research
2014 whitney-research2014 whitney-research
2014 whitney-researchc.titus.brown
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxkarlos64
 
Genomics Technologies
Genomics TechnologiesGenomics Technologies
Genomics TechnologiesSean Davis
 
Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing priyanka raviraj
 
Gene prediction and expression
Gene prediction and expressionGene prediction and expression
Gene prediction and expressionishi tandon
 
NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生ysuzuki-naist
 

Similar to 20140710 6 c_mason_ercc2.0_workshop (20)

Inference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' worldInference and informatics in a 'sequenced' world
Inference and informatics in a 'sequenced' world
 
132 gene expression in atherosclerotic plaques
132 gene expression in atherosclerotic plaques132 gene expression in atherosclerotic plaques
132 gene expression in atherosclerotic plaques
 
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell TumorsWhole Transcriptome Analysis of Testicular Germ Cell Tumors
Whole Transcriptome Analysis of Testicular Germ Cell Tumors
 
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICS
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICSPROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICS
PROKARYOTIC TRANSCRIPTOMICS AND METAGENOMICS
 
Bioinformatics.Practical Notebook
Bioinformatics.Practical NotebookBioinformatics.Practical Notebook
Bioinformatics.Practical Notebook
 
Unilag workshop complex genome analysis
Unilag workshop   complex genome analysisUnilag workshop   complex genome analysis
Unilag workshop complex genome analysis
 
RML NCBI Resources
RML NCBI ResourcesRML NCBI Resources
RML NCBI Resources
 
FFPE Applications Solutions brochure
FFPE Applications Solutions brochureFFPE Applications Solutions brochure
FFPE Applications Solutions brochure
 
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
Alvis Brazma, Array Express Gene Expression Atlas, fged_seattle_2013
 
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
Ernesto Picardi – Bioinformatica e genomica comparata: nuove strategie sperim...
 
Kulakova sbb2014
Kulakova sbb2014Kulakova sbb2014
Kulakova sbb2014
 
Bioinformatics Omics
Bioinformatics OmicsBioinformatics Omics
Bioinformatics Omics
 
RNA Seq Data Analysis
RNA Seq Data AnalysisRNA Seq Data Analysis
RNA Seq Data Analysis
 
Gene expression
Gene expressionGene expression
Gene expression
 
2014 whitney-research
2014 whitney-research2014 whitney-research
2014 whitney-research
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptx
 
Genomics Technologies
Genomics TechnologiesGenomics Technologies
Genomics Technologies
 
Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing
 
Gene prediction and expression
Gene prediction and expressionGene prediction and expression
Gene prediction and expression
 
NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生NAISTビッグデータシンポジウム - バイオ久保先生
NAISTビッグデータシンポジウム - バイオ久保先生
 

More from External RNA Controls Consortium (12)

140926 ercc2 workshopreport
140926 ercc2 workshopreport140926 ercc2 workshopreport
140926 ercc2 workshopreport
 
20140711 7 j_myerson_ercc2.0_workshop
20140711 7 j_myerson_ercc2.0_workshop20140711 7 j_myerson_ercc2.0_workshop
20140711 7 j_myerson_ercc2.0_workshop
 
20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop
 
20140711 6 s_munro_ercc2.0_workshop
20140711 6 s_munro_ercc2.0_workshop20140711 6 s_munro_ercc2.0_workshop
20140711 6 s_munro_ercc2.0_workshop
 
20140711 5 s_pond_ercc2.0_workshop
20140711 5 s_pond_ercc2.0_workshop20140711 5 s_pond_ercc2.0_workshop
20140711 5 s_pond_ercc2.0_workshop
 
20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop
 
20140711 2 j_willey_ercc2.0_workshop
20140711 2 j_willey_ercc2.0_workshop20140711 2 j_willey_ercc2.0_workshop
20140711 2 j_willey_ercc2.0_workshop
 
20140711 1 day2_nist_ercc2.0workshop
20140711 1 day2_nist_ercc2.0workshop20140711 1 day2_nist_ercc2.0workshop
20140711 1 day2_nist_ercc2.0workshop
 
20140710 5 k_thompson_ercc2.0_workshop
20140710 5 k_thompson_ercc2.0_workshop20140710 5 k_thompson_ercc2.0_workshop
20140710 5 k_thompson_ercc2.0_workshop
 
20140710 4 a_bergstrom_lucas_ercc2.0_workshop
20140710 4 a_bergstrom_lucas_ercc2.0_workshop20140710 4 a_bergstrom_lucas_ercc2.0_workshop
20140710 4 a_bergstrom_lucas_ercc2.0_workshop
 
20140710 3 l_paul_ercc2.0_workshop
20140710 3 l_paul_ercc2.0_workshop20140710 3 l_paul_ercc2.0_workshop
20140710 3 l_paul_ercc2.0_workshop
 
20140710 1 day1_nist_ercc2.0workshop
20140710 1 day1_nist_ercc2.0workshop20140710 1 day1_nist_ercc2.0workshop
20140710 1 day1_nist_ercc2.0workshop
 

Recently uploaded

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 

Recently uploaded (20)

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 

20140710 6 c_mason_ercc2.0_workshop

  • 1. Mul$-­‐pla(orm  and  cross-­‐methodological   reproducibility  of  transcriptome     profiling  by  RNA-­‐seq  in  the  ABRF  Next-­‐ Genera$on  Sequencing  Study  (ABRF-­‐NGS)   and  FDA’s  SEQC  Study   Christopher  E.  Mason,  Ph.D.   Assistant  Professor   Department  of  Physiology  and  Biophysics  &   The  Ins$tute  for  Computa$onal  Biomedicine  at   Weill  Cornell  Medical  College  and  the   Tri-­‐Ins$tu$onal  Program  on  Computa$onal  Biology  and  Medicine   Affiliate  Fellow  of  the  Informa$on  Society  Project,  Yale  Law  School   July  10th,  2014   @mason_lab  
  • 2. 0.    Background     1.  RNA  Standards   2.  Removing  RNA-­‐Seq  Varia$on   3.  Epitranscriptome   4.  #JustSequenceIt  
  • 3. Exome:   2%  of  the   genome?  
  • 4.       Human Genome Organization from  Mason,  State,  and  Moldin,  Kaplan  &  Sadock’s  Comprehensive  Textbook  of  Psychiatry,  2009  
  • 5. ENCODE  ac$ve  elements   “These  data  enabled  us  to  assign  biochemical  func$ons  for     80%  of  the  genome,     in  par$cular  outside  of  the  well-­‐studied  protein-­‐coding  regions.”  
  • 6. “The  ENCODE  results  were  predicted  by  one  of  its  authors  to   necessitate  the  rewri$ng  of  textbooks.”   “We  agree,  many  textbooks  dealing  with  marke;ng,  mass-­‐media   hype,  and  public  rela;ons  may  well  have  to  be  rewriAen.”  
  • 7. GENCODE  annota;on  (v19)   July  10th,  2014     Coding  genes:  20,345    Noncoding  genes:  22,883    Psuedogenes:  14,206   Other  genes:  516   57,820  total  
  • 8. RNA-­‐Seq  and  all  its  flavors   create  excitement  
  • 9. Differential expression by gene, exon, splice isoform, allele, & transcript! Algorithms: STAR, r-make, ASE, limma-voom, RSEM" 3! reads" alignments" Sorted BAMs" GENCODE " annotation" queries " Reads/bp" Alignment on HPC nodes" References! hg19" RefSeq" miRBase" rRNA" Adapters" Find ncRNAs and new genes! Algorithms: r-make, Aceview" 4! Sequencing Data" Gene fusion detection! Algorithms: r-make, Snowshoes" 5! Genetic variation (SNVs and indels)! Algorithms: STAR/GATK, r-make" 2! Predict polyA sites & gain/loss of miRNA binding sites! Algorithms: r-make, BAGET AlexaSeq, TargetScan" 6! ! Viruses/Bacteria/Other Algorithm: MG-RAST" Remaining Reads! TCTGCTTTAGGATAGATCGATAGCTAGTTCAT CTGCTTTAGGATAGATCGATAGCTAGTTCATC! TCTGCTTTAGGATAGATCGATAGCTAGTTCAT! 7! 1! RNA-­‐seq  =  Love   R-­‐make  tools  
  • 10. But!   There  is  some  noise  
  • 11. What  is  the  source  of  the  wiggles? Genes   ESTs   Liver Kidney Adult Brain Fetal Hippocampus Fetal Amygdala
  • 12. The  Dirty  Dozen:  >=  12  Sources  of  Technical  Noise  in  RNA-­‐Seq     (1)  RNA  integrity:  Sample  purity  or  degrada$on   (2)  Sample  RNA  complexity:  polyA  RNA,  total  RNA,  miRNA   (3)  cDNA  synthesis:  random  hexamer  vs.  polyA-­‐primed   (4)  Library  isola;on:  Gel  excision  vs.  column   (5)  Technical  Errors:  Machine,  Site,  Lane,  Technician,  Library  Size   (6)  Amplifica;on  Cycles  or  Methods:  NuGen,  Tn5,  Phi29   (7)  Input  amount:  (1,  10,  100,  1000  cells)   (8)  Algorithms:  for  alignment  and  assembly   (9)  Fragment  size  distribu;on:  Paired-­‐End,  Single-­‐End  (adaptors)   (10)  Liga;on  Efficiency:  Mul$plexing/Barcoding  and  RNA  ligases   (11)  Depth  of  Sequencing:  cost/benefit  point   (12)  RNA  fragmenta;on:  ca$on,  enzyma$c  
  • 13. Which  type  of  RNA?   Type Abbreviation Function Organisms 7SK$RNA$ 7SK$ negatively$regulating$CDK9/cyclin$T$complex$ metazoans Signal$recognition$particle$RNA$ 7SRNA Membrane$integration$ All$organisms$ Antisense$RNA$ aRNA$ Regulatory All$organisms$ CRISPR$RNA$ crRNA$ Resistance$to$parasites Bacteria$and$archaea$ Guide$RNA$ gRNA$ mRNA$nucleotide$modification$ Kinetoplastid$mitochondria$ Long$noncoding$RNA$ lncRNA XIST$(dosage$compensation),$HOTAIR$(cancer) Eukaryotes$ MicroRNA$ miRNA$ Gene$regulation$ Most$eukaryotes$ Messenger$RNA$ mRNA$ Codes$for$protein$ All$organisms PiwiRinteracting$RNA$ piRNA$ Transposon$defense,$maybe$other$functions$ Most$animals$ Repeat$associated$siRNA$ rasiRNA$ Type$of$piRNA;$transposon$defense$ Drosophila$ Retrotransposon$ retroRNA selfRpropagation Eukaryotes$and$some$bacteria$ Ribonuclease$MRP$ RNase$MRP$ rRNA$maturation,$DNA$replication$ Eukaryotes$ Ribonuclease$P$ RNase$P$ tRNA$maturation$ All$organisms$ Ribosomal$RNA$ rRNA$ Translation$ All$organisms Small$Cajal$bodyRspecific$RNA$ scaRNA$ Guide$RNA$to$telomere$in$active$cells Metazoans Small$interfering$RNA$ siRNA$ Gene$regulation$ Most$eukaryotes$ SmY$RNA$ SmY$ mRNA$transRsplicing$ Nematodes$ Small$nucleolar$RNA$ snoRNA$ Nucleotide$modification$of$RNAs$ Eukaryotes$and$archaea$ Small$nuclear$RNA$ snRNA$ Splicing$and$other$functions$ Eukaryotes$and$archaea$ TransRacting$siRNA$ tasiRNA$ Gene$regulation$ Land$plants$ Telomerase$RNA$ telRNA Telomere$synthesis$ Most$eukaryotes$ TransferRmessenger$RNA$ tmRNA$ Rescuing$stalled$ribosomes$ Bacteria$ Transfer$RNA$ tRNA$ Translation$ All$organisms Viral$Response$RNA$ viRNA AntiRviral$immunity C$elegans Vault$RNA$ vRNA$ selfRpropagation Expulsion$of$xenobiotics Y$RNA$ yRNA RNA$processing,$DNA$replication$ Animals$
  • 14. And  -­‐  which  one  do  we  use?   Technologies  Bifurcate  into  two  main  realms:   Platform Instrument Template0Preparation Chemistry Avearge0Length Longest0Read Illumina HiSeq2500 BridgePCR/cluster Rev.<Term.,<SBS 100 150 Illumina HiSeq2000 BridgePCR/cluster Rev.<Term.,<SBS 100 150 Illumina MiSeq BridgePCR/cluster Rev.<Term.,<SBS 250 300 GnuBio GnuBio emPCR HybFAssist<Sequencing 1000* 64,000* Life<Technologies SOLiD<5500 emPCR Seq.<by<Lig. 75 100 LaserGen LaserGen emPCR Rev.<Term.,<SBS 25* 100* Pacific<Biosciences RS Polymerase<Binding RealFtime 1800 15,000 454 Titanium emPCR PyroSequencing 650 1100 454 Junior emPCR PyroSequencing 400 650 Helicos Heliscope adaptor<ligation Rev.<Term.,<SBS 35 57 Intelligent<BioSystems MAXFSeq Rolony<amplification TwoFStep<SBS<(label/unlabell) 2x100 300 Intelligent<BioSystems MINIF20 Rolony<amplification TwoFStep<SBS<(label/unlabell) 2x100 300 ZS<Genetics N/A Atomic<Lableing Electron<Microscope N/A N/A Halcyon<Molecular N/A N/A Direct<Observation<of<DNA N/A N/A Platform Instrument Template0Preparation Chemistry Avearge0Length Longest0Read IBM<DNA<Transistor N/A none Microchip<Nanopore N/A N/A NABsys N/A none Nanochannel N/A N/A Bionanogenomics N/A anneal<7mers Nanochannel 10,000* 1,000,000* Life<Technologies PGM emPCR SemiFconductor 150 300 Life<Technologies Proton emPCR SemiFconductor 120 240 Life<Technologies Proton<2 emPCR SemiFconductor 400* 800* Genia N/A none Protein<nanopore<(aFhemalysin) N/A N/A Oxford<Nanopore MinION none Protein<Nanopore 10,000 10,000* Oxford<Nanopore GridION<2K none Protein<Nanopore 10,000 500,000* Oxford<Nanopore GridION<8K none Protein<Nanopore 10,000 500,000* *Values<are<estimates<from<companies<that<have<not<yet<released<actual<data Optical<Sequencing Electical<Sequencing Table<1:<Types<of<HighFThroughput<Sequencing<Technologies Mason  CE,  Porter  SG,  Smith  TM.  Adv  Exp  Med  Biol.  2014  
  • 15. 0.    Genomes     1.  RNA  Standards   2.  Removing  RNA-­‐Seq  Varia$on   3.  Epitranscriptome   4.  #JustSequenceIt  
  • 16.
  • 17. The complexity of the transcriptome is vast exon1     exon2     exon3   exon1-­‐exon2   exon1-­‐exon3   exon2-­‐exon3   exon1-­‐exon2-­‐exon3   6   63   15   7   127   21   8   255   28   3   7   3   4   15   6   5   31   10   1   1   0   2   3   1   Exons   Variants   Junc;ons   2n-1 Exon  1   Exon  2   Exon  3   Exon  1   Exon  2   Exon  3   n(n-1) 2 Exon4   Exon4   exon4     exon1-­‐  exon4   exon2-­‐exon4   exon3-­‐exon4   exon1-­‐exon2-­‐exon4   exon1-­‐exon3-­‐exon4   exon2-­‐exon3-­‐exon4   exon1-­‐exon2-­‐exon3-­‐exon4   8x1083 theoretical transcript combinations 8x1080 atoms in the universe (159 atoms/star, 111 stars/galaxy, 110 galaxies)Li  and  Mason,  ARGHG,  2014  
  • 18. Es$mate  of  False  Junc$on  Rate   5’   3’   Exon-­‐Edge  DB   (3’-­‐5’  junc$ons)   2 3 41Exon # : False  Exon-­‐Edge  DB   (3’-­‐3’  junc$ons)   False  Exon-­‐Edge  DB   (5’-­‐5’  junc$ons)   False  Exon-­‐Edge  DB   (5’-­‐3’  junc$ons)   5’   3’   5’   3’   5’   3’  
  • 19. FDA’s Sequencing Quality Control (SeQC) Consortium: Seq Standards from 323 members of Government (FDA), Industry, Academia >RNA sequence from pilot study based on MAQC samples and MAQC design: ~Pilot data from 2008-2012 MAQC Sample A/B, & ERCCs from: •  454 •  Helicos •  Illumina •  Lifetech ~1000Gb of Illumina BodyMap data ~300Gb of experimental RNA-Seq
  • 20. Associa;on  of  Biomolecular  Research  Facili;es  (ABRF)   Next  Genera;on  Sequencing  (NGS)  Study:     Phase  I:  RNA-­‐Seq  and  degraded  RNA-­‐Seq  (2011-­‐2013)   Phase  II:  DNA-­‐Seq  and  Hard-­‐to-­‐sequence  regions  (2014-­‐2016)   Phase  III:  Asteroid  and  Mar$an  Surface  Sequencing  (2017-­‐2050?)     1.  Cross  Pla(orm:  HiSeq2000,  454,  PGM,  Proton,  PacBio   2.  Cross-­‐Protocol:  ribo-­‐,  direc$onal,  non-­‐direc$onal,  degraded  RNA   3.  Cross-­‐Site:  3  sites  for  each  pla(orm,  replicates  at  each  site   Improve  the  detec$on  and  quan$fica$on  of  molecular  signatures     for  use  in  basic,  applied,  and  clinical  research.     Specifically,  the  ABRF-­‐NGS  group  will  test  and  characterize  the  myriad,   rapidly  evolving  NGS  techniques  for  RNA-­‐Seq  and  DNA-­‐Seq.  
  • 21. Samples  of  the  MAQC,  SEQC  and  ABRF-­‐NGS  Study  
  • 22. ß  SEQC  samples  (FDA)  A B A B A B C D A B A B A BA B C D E F A B A B A B A B E F Pacific Biosciences RS Life Technologies Ion PGM Life Technologies Ion Proton A:  10  cancer  cell  lines   B:  Brain  $ssues  
  • 23. A B A B A B C D A B A B A BA B C D E F A B A B A B A B E F Pacific Biosciences RS Life Technologies Ion PGM Life Technologies Ion Proton Cross-­‐pla(orm  experimental  design  
  • 24. A B A B A B C D A B A B A BA B C D E F A B A B A B A B E F Pacific Biosciences RS Life Technologies Ion PGM Life Technologies Ion Proton Cross-­‐Site  experimental  design  
  • 25. A B A B A B C D A B A B A BA B C D E F A B A B A B A B E F Pacific Biosciences RS Life Technologies Ion PGM Life Technologies Ion Proton Cross-­‐Protocol  experimental  design  
  • 26. All  technologies  have  strengths  &  weaknesses   Vendor Instrument Version Run Time (hours) Read Length (mean) Reads per Run (million) Yield per Run Cost per Run ($)1 Cost per Mb ($) Paired- end Application Strengths Illumina HiSeq 2000 High Output 132 50 6,000 300 Gb2 18,725.00 0.06 Yes Gene expression; splice junction detection; variant calling; fusions Deep read counts for transcript quantification Illumina HiSeq 2500 High Output 132 50 6,000 300 Gb2 18,725.00 0.06 Yes as above as above Illumina MiSeq v2 kit 39 250 30 7.5 Gb 982.75 0.13 Yes Splice junction detection; variant calling Rapid transcript quantification and variant detection Life Technologies PGM 318 chip 7.3 176 6 1.056 Gb 749.00 0.71 No Splice junction detection; variant calling Rapid transcript quantification and variant detection Life Technologies Proton Proton I chip 2 to 4 81 70 5.67 Gb 834.00 0.15 No Gene expression; variant calling Good read depth and length for transcript quantification Pacific Biosciences RS RS 0.5 to 2 1,289 0.03 38.67 Mb 136.38 3.53 No Splice junction detection; full length gene coverage Extended read lengths Roche 454 Genome Sequencer FLX+ 20 686 1 686 Mb 5,985.00 8.72 No Splice junction detection Read length 1 sequencing reaction reagents, at academic list price U.S. dollars, and does not include library preparation reagents, labor, data storage or analysis, equipment or maintenance; 2 one sequencing run using 2 flowcells. Pacific Biosciences is calculated based on CCS reads; Gb: billion bases, Mb: million bases Table 1. Summary of RNA-seq platforms as deployed in the ABRF-NGS Study.
  • 28. Error  models  highly  variable  among   pla(orms   0% 2% 4% 6% 454 ILMN PAC PGM PRO platform Indelsmismatchspercentage b 454:  Roche  454  GS  FLX+   ILMN:  Illumina  HiSeq  2000/2500   PAC:  Pacific  Biosciences  RS  I   PGM:  Ion  Personal  Genome  Machine   PRO:  Ion  Torrent  Proton          
  • 29. Full  length  genebody  coverage       Coverage across genebody (%) Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y 5' platform 454 sample site Y type intact type site sample platformm 454   ILMN   PAC   PGM   PRO   Note:  PAC  used  CCS  read  only;  PAC  on  average  only  cover  3k  GENCODE  genes.  
  • 30. Inter-­‐site  normalized  gene  expression   varia$on:  CV  and  R2   ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ●● ● ● ● ● ● ● ● ● ● ●●●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●●●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●●● ●● ● ● ●●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ●● ● ● ●● ● ●● ●● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ●●●●● ●● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●●● ● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ●●● ● ●●●●● ● ● ● ●● ●●● ● ● ● ●● ● ●● ● ● ● ● ●● ● ●● ●●● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ●● ● ●● ● ●● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ●● ● ●● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ●● ●● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●●● ● ●●●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ●● ●● ●● ●● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ●● ●● ●●●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ●●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ●● ●●● ● ● ● ● ● ●● ● ●● ●● ●●● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ●●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●●●● ●● ● ● ●● ● ●● ● ●● ● ● ●●● ● ● ●●● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●●●● ● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●●●● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●●●●● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ●●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ●●●● ● ● ● ● ● ● ● ● ●●●● ● ● ●● ●● ●●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ●●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ●● ●●● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●●●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ●●●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●●●● ● ● ●● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●●● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●●● 0 50 100 150 200 454−A 454−B ILMN−A ILMN−B PAC−A1 PAC−A2 PAC−A3 PAC−B1 PAC−B2 PAC−B3 PGM−A PGM−B PRO−A PRO−B Coefficientofvariation(%) 454 ILMN PAC PGM PRO intra−platform A intra−platform B inter−platform A inter−platform B 0.92 0.97 0.91 0.96 0.85 0.98 0.95 0.95 0.82 0.88 0.85 0.84 0.93 0.89 0.75 0.82 0.78 0.86 0.92 0.92 0.00 0.25 0.50 0.75 1.00 454 ILMN PGM PRO 454 ILMN PGM PRO 454−ILMN 454−PGM 454−PRO ILMN−PGM ILMN−PRO PGM−PRO 454−ILMN 454−PGM 454−PRO ILMN−PGM ILMN−PRO PGM−PRO platform Correlationcoefficients ● 454 ILMN PAC PGM PRO ● ●A B es A B● 454 ILMN PAC PGM PRO ● ●A B gd e f a b ILMN  with  lowest  inter-­‐site  CV   ILMN  with  highest  inter-­‐site  R2   ILMN-­‐PRO  &  PRO-­‐PGM  with  highest  R2                  
  • 31. Genes  detec$on  is  log-­‐linear;   Junc$on  detec$on  is  length-­‐dependent  454−A 454−B ILMN−A ILMN−B PAC−A1 PAC−A2 PAC−A3 PAC−B1 PAC−B2 PAC−B3 PGM−A PGM−B PRO−A PRO−B 454 LMN PGM ● ● 50000 100000 150000 200000 9.0 9.5 10.0 10.5 11.0 bases sequenced (log10) detectedjunctions ● 454 ILMN PAC PGM PRO ● ●A B ●● 10000 20000 30000 9.0 9.5 10.0 10.5 11.0 bases sequenced (log10) detectedgenes ● 454 ILMN PAC PGM PRO ● ●A B d e                
  • 32. Junc$ons  detec$on  efficiency  highly   variable;  agreement  common   PRO 454 LMN PGM PRO 454−ILMN 454−PGM 454−PRO LMN−PGM LMN−PRO PGM−PRO 454−ILMN 454−PGM 454−PRO LMN−PGM LMN−PRO PGM−PRO platform A−AH A−AS A−AR AH−AS AH−AR AR−AS B−BR sample 0 25 50 75 454 ILMN PAC PGM PRO platform junctionspermillionbases A B known novel 0 20000 40000 60000 1 2 3 4 5 3 4 5 occurrence among platforms junctions A Bgf Note:  Only  use  a  subset  of  reads  among  pla(orms   to  normalize  the  scale                   Most  of  the  known  junc$ons  are   shared  by  at  least  3  pla(orms  
  • 33. Sequencing  depth  is  important  to   discover  low  abundance  transcripts    
  • 34. Inter-­‐pla(orm  differen$al  gene   expression  show  88-­‐97%  agreement   Shared  sets  of  greater   than  1000  genes  are   indicated  in  red,     100-­‐999  yellow,     <100  blue.     Unique  DEGs:   454            -­‐  3.0%   POLYA  -­‐  9.2%   RIBO        -­‐  8.8%   PRO          -­‐  11.9%   PGM        -­‐  3.9%        
  • 35. Propor$onal  venn  diagrams  don’t   always  add  clarity,  but  they  are  prewy  RIBO POLYA 454 PGM PRO 1766 482 2046 202 69 105 2199 1104 484158 678 76 1828 740 1010 401 133 559 62 20 36595 2792 1187 406 1791 212 57 99 2151
  • 36. II:  Inter-­‐protocol  quan$fica$ons   •  Illumina   – Poly-­‐A  selec$ons  (POLYA)   – Ribo-­‐deple$on  (RIBO)  
  • 37. Gene  regions  distribu$on  varies  between  protocols   a" b"MappingDistribution(%)! 0% 25% 50% 75% 100% Sample Mappingdistribution(%) intergenic mito exon 5utr intron ribo a        
  • 38. Very  similar  gene  measurements  of  genes   (RPKM)  between  polyA  &  Ribo-­‐deple$on   Ribo0>PolyA+ PolyA>Ribo0+ RPKM%difference%(polyA%–%ribo0)% Number%of%Genes%
  • 39. ENSEMBL Gene ID Gene Symbol Description Length PolyA Reads RiboDep Reads PolyA FPKM RiboDep FPKM FPKM Diff ENSG00000210082 J01415.4 Mt_rRNA 1559 17040155 133679955 18568.31 200404.74 7181836.43 ENSG00000211459 J01415.24 Mt_rRNA 954 2232892 14445488 3976.16 35389.28 731413.11 ENSG00000202198 RN7SK misc_RNA 331 3303 2818202 16.95 19899.03 719882.08 ENSG00000258486 RN7SL1 antisense 300 3061 520624 17.33 4055.93 74038.60 ENSG00000202364 SNORD3A snoRNA 216 718 186779 5.65 2020.98 72015.33 ENSG00000202538 RNU472 snRNA 141 168 102560 2.02 1699.99 71697.97 ENSG00000251562 MALAT1 lincRNA 8708 437102 4931496 85.27 1323.57 71238.30 ENSG00000199916 RMRP misc_RNA 264 148 83019 0.95 734.96 7734.00 ENSG00000201098 RNY1 misc_RNA 113 172 19210 2.59 397.32 7394.73 ENSG00000238741 SCARNA7 snoRNA 330 53 50182 0.27 355.40 7355.13 ENSG00000200795 RNU471 snRNA 141 35 18724 0.42 310.36 7309.94 ENSG00000200087 SNORA73B snoRNA 204 336 23778 2.80 272.42 7269.62 ENSG00000252010 SCARNA5 snoRNA 276 29 25379 0.18 214.91 7214.73 ENSG00000199568 RNU5A71 snRNA 116 38 10224 0.56 205.99 7205.44 ENSG00000252481 SCARNA13 snoRNA 275 145 24034 0.90 204.26 7203.36 ENSG00000212232 SNORD17 snoRNA 237 102 20571 0.73 202.86 7202.13 ENSG00000207008 SNORA54 snoRNA 123 8 9655 0.11 183.46 7183.35 ENSG00000200156 RNU5B71 snRNA 116 28 8301 0.41 167.25 7166.84 ENSG00000209582 SNORA48 snoRNA 135 38 9272 0.48 160.52 7160.04 ENSG00000239002 SCARNA10 snoRNA 330 19 21801 0.10 154.40 7154.30 ENSG00000254911 SCARNA9 antisense 353 501 19842 2.41 131.37 7128.96 ENSG00000230043 TMSB4XP6 pseudogene 135 0 6272 0.00 108.58 7108.58 ENSG00000239039 SNORD13 snoRNA 104 0 4521 0.00 101.60 7101.60 ENSG00000208892 SNORA49 snoRNA 136 7 5352 0.09 91.97 791.89 ENSG00000223336 RNU276P snRNA 190 22 6365 0.20 78.29 778.10 Supplemental Table 3 - Top 25 Genes with Highest Enrichment from Ribo-Depletion Preparation
  • 40. polyA  favors  high  expressors   •  Gencode14   •  Unique  genes:  55889   0   5,000   10,000   15,000   20,000   25,000   30,000   ≥  2   ≥  4   ≥  8   ≥  16   ≥  32   ≥  64   ≥  128   PolyA   RiboZero   0   5,000   10,000   15,000   20,000   25,000   30,000   ≥  2   ≥  4   ≥  8   ≥  16   ≥  32   ≥  64   ≥  128   Read  count   Genome-­‐level  Gene  Counts   Gene  Overlap   Read  count  
  • 41. High-­‐overlap  DEGs  and     comparable  accuracy   FDR: 0.001 FDR: 0.01 FDR: 0.05 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 FC:1.5FC:2 A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D comparison Proportion type POLYA RIBO Shared b" B−C itivity) .81.0 FDR: 0.05; FC:1.5 B−D itivity) .81.0 FDR: 0.05; FC:1.5 C−D itivity) .81.0 FDR: 0.05; FC:1.5 FC: 1.5 FDR: 0.001 FC: 1.5 FDR: 0.01 FC: 1.5 FDR: 0.05 FC: 2 FDR: 0.001 FC: 2 FDR: 0.01 FC: 2 FDR: 0.05 0.0 0.2 0.4 0.6 0.8 A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D A−B A−C A−D B−C B−D C−D comp Externalvalidation(MCC) RIBO POLYAc Using  TaqMan  as  external  valida$on  812  reac$ons                  
  • 44. Genetic variation (SNVs, indels, CNVs)! 2! Differential expression by gene, exon, splice isoform, allele, & transcript!3! Predict gene fusions, polyA sites and alterations to miRNA binding sites! 5! ! Viruses/Bacteria/Other" De Novo Assembly of Remaining Reads! TCTGCTTTAGGATAGATCGATAGCTAGTTCAT CTGCTTTAGGATAGATCGATAGCTAGTTCATC! TCTGCTTTAGGATAGATCGATAGCTAGTTCAT! 6! reads" alignments" Sorted BAMs" annotations" queries " Reads/bp" Alignment on HPC nodes" References! hg19" RefSeq" miRBase" rRNA" Adapters" Find ncRNAs and new genes! 4! 1! Sequencing Data" Many  areas  could  be  impacted  
  • 45. Entropy  is  usually  a  source  of  fear
  • 46. "FEAR is  the  main  source  of    superstition,     and  one  of  the  main  sources  of      cruelty. To  conquer  fear  is  the  beginning  of    wisdom."       –  Bertrand  Russell   HBR−RIN9 HBR−RIN0 UHR−RIN0 UHR−RIN3 UHR−RIN9 UHR−RIN6 HBR−RIN9 HBR−RIN0 UHR−RIN0 UHR−RIN3 UHR−RIN9 UHR−RIN6 0.6 0.7 0.8 0.9 1 Value Color Key
  • 47. Can  we  remove  supers$$on? HEAT   99oC  for     10  minutes   SONICATION   6  x  55  seconds     at  5%DC,    I   intensity  3     100  c/b   RNAse     RNAse  A     1  ng/uL   Un$l  RIN  <2  
  • 48. Degraded  RNA  looks  great!   Coverage across genebody (%) Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y 5' platform 454 sample site Y type intact type site sample platformm 454   ILMN   PAC   PGM   PRO  
  • 49. Degraded  RNA  highly  correlates  with   intact  RNA  gene  expression   atform intra−platform B inter−platform A inter−platform B 0.96 0.85 0.98 0.95 0.95 0.82 0.88 0.85 0.84 0.93 0.89 0.75 0.82 0.78 0.86 0.92 0.92 PRO 454 ILMN PGM PRO 454−ILMN 454−PGM 454−PRO ILMN−PGM ILMN−PRO PGM−PRO 454−ILMN 454−PGM 454−PRO ILMN−PGM ILMN−PRO PGM−PRO platform 0.95 0.94 0.94 0.96 0.97 0.97 0.96 0.00 0.25 0.50 0.75 1.00 A−AH A−AS A−AR AH−AS AH−AR AR−AS B−BR sample Correlationcoefficients c B A B A B C D A B Pacific Biosciences RS Life Tech Ion P (AR)   (AS)   (AH)   (BR)             Illumina  Ribo-­‐deple$on  protocol  
  • 50. 0.    Genomes     1.  RNA  Standards   2.  Removing  RNA-­‐Seq  Varia$on   3.  Epitranscriptome   4.  #JustSequenceIt  
  • 51. How  Sequencing  Quality  affect   Inter-­‐site  gene  expression   quan$fica$on?   SEQC  study  
  • 52. Ameliora$ng  Inter-­‐site  Varia$on   SEQC samples Each site has 4 replicates ILM1 ILM2 Replicate 5 ILM3 Replicate 5 ILM4 ILM5 Replicate 5 ILM6 Replicate 5 libraries were vendor supplied SEQC  study   HiSeq  2000  
  • 53. Ameliora$ng  Inter-­‐site  Varia$on   SEQC samples Each site has 4 replicates ILM1 ILM2 Replicate 5 ILM3 Replicate 5 ILM4 ILM5 Replicate 5 ILM6 Replicate 5 libraries were vendor supplied SEQC  study  
  • 54. Determine  sequencing  varia$on  sources   SEQC  study  
  • 55. Nucleo$de  frequency  bias  from  library  prepara$on  ●M5 ILM6 A G C T 23 24 25 26 27 20 40 60 80 100 20 40 60 80 100 20 40 60 80 100 20 40 60 80 100 Position in read Nucleotidefrequency(%) ILM1 ILM2 ILM3 ILM4 ILM5 ILM6 1 5 site 1 2 3 4 5 ● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6 site 1 2 3 4 5 ● ● ● ● ● ●ILM1 ILM2 ILM3 ILM4 ILM5 ILM6 f SEQC  study   Replicate 5 libraries were vendor supplied         <-­‐  5th  replicate   of  ILM2  and   ILM3   <-­‐  1th  replicate   of  ILM2   <-­‐  1th  replicate   of  ILM3  
  • 56. Sample  A  vs.  A  =  the  Answer  should  be  zero  DEGs;   SVA  can  remove  false  posi$ve  DEGs     te 0.957 0.958 0.958 0.971 0.973 0.970 ILM2−ILM6 ILM3−ILM4 ILM3−ILM5 ILM3−ILM6 ILM4−ILM5 ILM4−ILM6 ILM5−ILM6 1 2 1 BBBBBB B B B BBB B B BBBB BBBB B BBBB ● ● ● ● ● ● ILM1 ILM2 ILM3 ILM4 ILM5 ILM6 0 2500 5000 7500 10000 0 2500 5000 7500 10000 0 2500 5000 7500 10000 0 2500 5000 7500 10000 ABCD ILM1−ILM2 ILM1−ILM3 ILM1−ILM4 ILM1−ILM5 ILM1−ILM6 ILM2−ILM3 ILM2−ILM4 ILM2−ILM5 ILM2−ILM6 ILM3−ILM4 ILM3−ILM5 ILM3−ILM6 ILM4−ILM5 ILM4−ILM6 ILM5−ILM6 Comparison FalsepositiveDEGs Without SVA With SVA c Illumina   SEQC  study   Leek  JT,  Storey  JD.PLoS  Genet.2007   Compare  the  same   sample  across  sites          
  • 57. SVA  remove  false  posi$ve  DEGs   Validated  by  PGM  data  from  ABRF-­‐NGS  study     te 0.957 0.958 0.958 0.971 0.973 0.970 ILM2−ILM6 ILM3−ILM4 ILM3−ILM5 ILM3−ILM6 ILM4−ILM5 ILM4−ILM6 ILM5−ILM6 1 2 1 BBBBBB B B B BBB B B BBBB BBBB B BBBB ● ● ● ● ● ● ILM1 ILM2 ILM3 ILM4 ILM5 ILM6 0 2500 5000 7500 10000 0 2500 5000 7500 10000 0 2500 5000 7500 10000 0 2500 5000 7500 10000 ABCD ILM1−ILM2 ILM1−ILM3 ILM1−ILM4 ILM1−ILM5 ILM1−ILM6 ILM2−ILM3 ILM2−ILM4 ILM2−ILM5 ILM2−ILM6 ILM3−ILM4 ILM3−ILM5 ILM3−ILM6 ILM4−ILM5 ILM4−ILM6 ILM5−ILM6 Comparison FalsepositiveDEGs Without SVA With SVA c Illumina   PGM   −2 −1 0 1 −0.50.00.51.01.5 Dimension 1 Dimension2 A.1 A.2 A.3 A.4 B.1 B.2 B.3 B.4 A.1 A.2 B.1 B.2 A.1 A.2 B.1B.2 ● ● ● PGM1 PGM2 PGM3 P P P P P P site ● ● ●PGM1 PGM2 PGM3 1 2 3 4 P P P P P site A ● ● ●PGM1 PGM2 PGM3 1 2 3 4 comp: A comp: B 0 50 100 150 200 PGM1−PGM2 PGM1−PGM3 PGM2−PGM3 PGM1−PGM2 PGM1−PGM3 PGM2−PGM3 Comparison FalsepositiveDEGs Without SVA With SVA 1 1 NumberofDEGs d e f SEQC  study   Li  et  al.,  in  press,  Nat.  Biotech.,  2014   ABRF-­‐NGS  study          
  • 58. NIST,  SEQC,  ABRF,  ENCODE,  others   •  Provide  reference  data   resources     •  Best  prac$ces  for     –  gene  quan$fica$on,     –  isoform  characteriza$on,   –  dynamic  range  comparisons,     –  managing  inter-­‐site  and  intra-­‐ site  varia$on,     –  analysis  pipelines,  and     –  cross-­‐pla(orm  tes$ng  of   transcriptome  hypotheses     •  To  address  some  other  aspects   of  RNA-­‐seq,  including     –  variant  detec$on,     –  allele-­‐specific  expression,     –  RNA  edi$ng,  and  gene  fusions.     •  And  more  …  
  • 59. 0.    Genomes     1.  RNA  Standards   2.  Removing  RNA-­‐Seq  Varia$on   3.  Epitranscriptome   4.  #JustSequenceIt  
  • 60. Which  transcriptome  is  important?       All  …  RNA  bases.  
  • 61.
  • 62. The  four-­‐base  genome     is  just  the  beginning  
  • 63. The  four-­‐base  genome     is  just  the  beginning  
  • 64. There  are  many  RNA-­‐mods  as  well:   Chuan  He  
  • 65. Abbreviation Chemical0name m1 acp3 Ψ 1'methyl'3'(3'amino'3'carboxypropyl)5pseudouridine m1 A 1'methyladenosine m1 G 1'methylguanosine m1 I 1'methylinosine m1 Ψ 1'methylpseudouridine m1 Am 1,2''O'dimethyladenosine m1 Gm 1,2''O'dimethylguanosine m1 Im 1,2''O'dimethylinosine m2 A 2'methyladenosine ms2 io6 A 2'methylthio'N6 '(cis'hydroxyisopentenyl)5adenosine ms2 hn6 A 2'methylthio'N6 'hydroxynorvalyl5carbamoyladenosine ms2 i6 A 2'methylthio'N6 'isopentenyladenosine ms2 m6 A 2'methylthio'N6 'methyladenosine ms2 t6 A 2'methylthio'N6 'threonyl5carbamoyladenosine s2 Um 2'thio'2''O'methyluridine s2 C 2'thiocytidine s2 U 2'thiouridine Am 2''O'methyladenosine Cm 2''O'methylcytidine Gm 2''O'methylguanosine Im 2''O'methylinosine Ψm 2''O'methylpseudouridine Um 2''O'methyluridine Ar(p) 2''O'ribosyladenosine5(phosphate) Gr(p) 2''O'ribosylguanosine5(phosphate) acp3 U 3'(3'amino'3'carboxypropyl)uridine m3 C 3'methylcytidine m3 Ψ 3'methylpseudouridine m3 U 3'methyluridine m3 Um 3,2''O'dimethyluridine imG'14 4'demethylwyosine s4 U 4'thiouridine chm5 U 5'(carboxyhydroxymethyl)uridine mchm5 U 5'(carboxyhydroxymethyl)uridine5methyl5ester inm5 s2 U 5'(isopentenylaminomethyl)'52'thiouridine inm5 Um 5'(isopentenylaminomethyl)'52''O'methyluridine inm5 U 5'(isopentenylaminomethyl)uridine nm5 s2 U 5'aminomethyl'2'thiouridine ncm5 Um 5'carbamoylmethyl'2''O'methyluridine ncm5 U 5'carbamoylmethyluridine cmnm5 Um 5'carboxymethylaminomethyl'52''O'methyluridine cmnm5 s2 U 5'carboxymethylaminomethyl'2'thiouridine cmnm5 U 5'carboxymethylaminomethyluridine cm5 U 5'carboxymethyluridine f5 Cm 5'formyl'2''O'methylcytidine f5 C 5'formylcytidine hm5 C 5'hydroxymethylcytidine ho5 U 5'hydroxyuridine mcm5 s2 U 5'methoxycarbonylmethyl'2'thiouridine mcm5 Um 5'methoxycarbonylmethyl'2''O'methyluridine mcm5 U 5'methoxycarbonylmethyluridine mo5 U 5'methoxyuridine m5 s2 U 5'methyl'2'thiouridine mnm5 se2 U 5'methylaminomethyl'2'selenouridine mnm5 s2 U 5'methylaminomethyl'2'thiouridine mnm5 U 5'methylaminomethyluridine m5 C 5'methylcytidine m5 D 5'methyldihydrouridine m5 U 5'methyluridine τm5 s2 U 5'taurinomethyl'2'thiouridine τm5 U 5'taurinomethyluridine m5 Cm 5,2''O'dimethylcytidine m5 Um 5,2''O'dimethyluridine preQ1 7'aminomethyl'7'deazaguanosine preQ0 7'cyano'7'deazaguanosine m7 G 7'methylguanosine G+ archaeosine D dihydrouridine oQ epoxyqueuosine galQ galactosyl'queuosine OHyW hydroxywybutosine I inosine imG2 isowyosine k2 C lysidine manQ mannosyl'queuosine mimG methylwyosine m2 G N2 'methylguanosine m2 Gm N2 ,2''O'dimethylguanosine m2,7 G N2 ,7'dimethylguanosine m2,7 Gm N2 ,7,2''O'trimethylguanosine m2 2G N2 ,N2 'dimethylguanosine m2 2Gm N2 ,N2 ,2''O'trimethylguanosine m2,2,7 G N2 ,N2 ,7'trimethylguanosine ac4 Cm N4 'acetyl'2''O'methylcytidine ac4 C N4 'acetylcytidine m4 C N4 'methylcytidine m4 Cm N4 ,2''O'dimethylcytidine m4 2Cm N4 ,N4 ,2''O'trimethylcytidine io6 A N6 '(cis'hydroxyisopentenyl)adenosine ac6 A N6 'acetyladenosine g6 A N6 'glycinylcarbamoyladenosine hn6 A N6 'hydroxynorvalylcarbamoyladenosine i6 A N6 'isopentenyladenosine m6 t6 A N6 'methyl'N6 'threonylcarbamoyladenosine m6 A N6 'methyladenosine t6 A N6 'threonylcarbamoyladenosine m6 Am N6 ,2''O'dimethyladenosine m6 2A N6 ,N6 'dimethyladenosine m6 2Am N6 ,N6 ,2''O'trimethyladenosine o2yW peroxywybutosine Ψ pseudouridine Q queuosine OHyW undermodified5hydroxywybutosine cmo5 U uridine55'oxyacetic5acid mcmo5 U uridine55'oxyacetic5acid5methyl5ester yW wybutosine imG wyosine Table515'5List5of5Base5Modifications5Covered5by5Claims m6A  is  just  1  of  the  110     known  RNA  modifica$ons   from  the     RNA  Modifica$on  Database  
  • 66. m6A  marks  are  widespread  in  cancer  cells’  RNA   Meyer  et  al.,  Cell,  2012  
  • 67. A  new  method:  MeRIP-­‐Seq   Meyer  et  al.,  Cell,  2012  
  • 68. Conserva$ons  of  signal  and  sites  in   orthologous    genes     Meyer  et  al.,  Cell,  2012  
  • 69. Peaks  also  show  strong  conserva$on   Meyer,  2012  
  • 70. Many  puta$ve  roles  for  m6A  in  RNA   Paz-­‐Yaakov  et  al,  2010   Saletore,  Chen-­‐Kiang,  and  Mason,  RNA  Biology,  2013.  
  • 71.
  • 72. RNA  modifica$ons  give  a  new  layer  of  cellular  regula$on   N NN N NH2 O OHO O RNA RNA N NN N HN O OHO O RNA RNA H2 C OH N NN N HN O OHO O RNA RNA C H O N NN N HN O OHO O RNA RNA C OH O N NN N HN O OHO O RNA RNA CH3 A m6A ca6A f6A hm6A O H H O H OH H2O Mettl3 FTO/ALKBH5 O 2 , Fe II , -KG FTO/ ALKBH5 O 2 , Fe II , -KG Li  and  Mason,  ARGHG,  2014  
  • 73. New  layer  to  study!  
  • 74. Direct  Detec$on  of  Methyla$on  on  PacBio   70.5 71.0 71.5 72.0 72.5 73.0 73.5 74.0 74.5 0 100 200 300 400 Fluorescence intensity(a.u.) Time (s) 104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 108.5 0 100 200 300 400 Fluorescence intensity(a.u.) Time (s) C   T   G   A   T  C   G   T   A   C   mA   A  G   T  C  T   A   A   G   C   C   A  A   A   A   Approach:  Kine$c  detec$on  of  methylated  bases  during  SMRT  DNA  sequencing   Example:  N6-­‐methyladenosine  (mA)  
  • 75. SMRT RNA Sequencing – Assay Design I.  RNA template (purple) is hybridized to a DNA primer (orange) and immobilized on the bottom of a zero-mode waveguide (ZMW) in the presence of fluorescently-labeled nucleotide analogues. II.  After the addition of HIV RT to the solution, the polymerase binds to the RNA template at the 3’-end of DNA primer. III.  The first nucleotide analogue can bind to the active site of the polymerase resulting in a fluorescent pulse with a color characteristic to the type of nucleotide (i.e. A, C, G, or T). With HIV RT, the rates of nucleotide binding, dissociation and incorporation have not been optimized to a state where each binding event would result in an incorporation event (image IV.). Consequently, nucleotide analogues often dissociate before they can be incorporated resulting in a series of like pulses before the polymerase can translocate to the next position and bind the nucleotide analogue complementary to the next nucleotide in RNA template. IV.  Incorporation of the nucleotide and translocation to the next nucleotide on RNA template.
  • 76. First demonstration SMRT RNA Sequencing – Expected Signal A.  Example RNA template is shown in black. DNA primer is shown in orange. The first cDNA strand synthesized by HIV RT during SMRT RNA Sequencing is color-coded (A – yellow, C – red, G – gree, T – blue). The first cDNA strand is complementary to the unhybridized section of the example RNA template and can be used to determine the sequence of that section. B.  A typical time trace obtained in a ZMW with RNA template immobilized on the bottom surface. One can observe a succession of blocks of like-colored pulses corresponding to the sequence of cDNA strand (blocks are indicated by lines above the trace). C.  As mentioned on Slide 3 – Point III., due to the kinetics of the current HIV RT, we observe multiple binding events at each RNA template position before nucleotide incorporation and translocation to the next position. A B C
  • 77. Saletore  et  al.,  Genome  Biology,  2012  
  • 78. m6A  is  dis$nct  from  A   Saletore  et  al.,  Genome  Biology,  2012  
  • 79. Epigenome                               Epitranscriptome                               Epiproteome  (PTM)                               DNA   RNA   Protein   transcrip$on   transla$on   RT   ACGT   viRNA             I  N  H  E  R  I  T  A  N  C  E      or      T  R  A  N  S  M  I  S  S  I  O  N   prions   iDNA   mC,hmC,  8oxoG,  m6A   (sn/sno/g)RNA   (t/r/tm)RNA   (lnc/nc/mi/piwi/si/vi)RNA   scRNA   RbP   Ribozymes   Prions   iRNA   iProtein   mC,hmC,  8oxoG,  m6A,  …   m6A,  5’G,polyA,  …   Acet,  Phos,  Citr,  SUMO,  …     from  Saletore  et  al.,  Genome  Biology,  2012  
  • 80. 0.    Genomes     1.  RNA  Standards   2.  Removing  RNA-­‐Seq  Varia$on   3.  Epitranscriptome   4.  #JustSequenceIt  
  • 81. Which  genome  is  important?       All  …  Cells.  
  • 82. −5 0 5 −6−4−2024 Comp.1 Comp.2 C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36 C37 C38 C39 C40 C41C42 C43 C44 C45 C46 C47 C48 C49 C50 C51 C52 C53 C54 C55 C56 C57 C58 C59 C60 C61 C62 C63 C64 C65 C66 C67 C68 C69 C70 C71 C72 C73 C74 C75 C76 C77 C78 C79 C80 C81 C82 C83 C84 C85 C86 C87 C88 C89 C90 C91 C92 C93 C94 C95 C96 Principal  Component  1   Principal  Component  2   C05 C12 C29 C33 C44 C78 C88 C35 C03 C19 C16 C20 C72 C38 C83 C01 C67 C60 C42 C52 C32 C86 C02 C91 C09 C46 C80 C11 C58 C64 C56 C77 C06 C23 C15 C90 C62 C65 C43 C48 C84 C34 C53 C69 C37 C82 C28 C76 C87 C61 C92 C85 C47 C96 C50 C59 C39 C66 C93 C26 C27 C54 C71 C07 C22 C14 C81 C08 C41 C55 C73 C70 C51 C75 C24 C94 C25 C10 C17 C04 C68 C13 C63 C18 C57 C40 C74 C36 C79 C30 C49 C21 C45 C95 C31 C89 MGST3 UROD PSMA5 PSMB2 DHRS3 L.XKR8 HBXIP S100A11 R.spk7mid RPS27 PSMD4 MAST2 YBX1 SLC39A1 UBR4 B4GALT3 S.8 CNIH4 NCSTN SRM RNF220 WDR77 TRAPPC3 RPL11 VPS72 ILF2 NUDC S.4 PARP1 PRDX6 BCRABL HIST2H2BE YARS PI4KB UBE2Q1 VAMP3 KIFAP3 SF3A3 TPR DIRAS3 GNPAT ZMPSTE24 HSD17B7 IPO13 KIF2C CAPZA1 FAF1 MRPL9 SDHB AHCYL1 NOC2L HMGN2 L.AXIN2 GNL2 SFRS4 CREG1 ETV1 SOX5 MEAF6 S.2 DTL MARCKSL1 MRPS15 JAK1 GNB1 S.10 PTCH1 ALDH9A1 RPA2 KHDRBS1 SFPQ ARPC5 TXNIP UFC1 RABGGTB S.5 KIAA0907 CDC20 PRPF3 KPNA6 RPL22 ACTB WASF2 ATP5F1 S.6 KIF14 LRRC47 0 5 10 15 20 25 30 Check   Cells   Sort  Cells,     Load  plate   Library   prep.  &   RNA-­‐Seq   Lyse,   cDNA   AML-­‐related  genes’  expression  (RPKM)     Purified  Cell  (1-­‐96)   Single-­‐cell  RNA-­‐seq  reveals  extraordinary  heterogeneity   Leukemia   Sample  
  • 83. Which  genome  is  important?       All  …  Species.  
  • 84. nhprtr.org   Pipes,  Li,  Bozinoski  et  al.,  NAR,  2013  
  • 85. Prosimian Human Chimpanzee Gorilla Rhesus Macaque Japanese Macaque Cynomolgus Macaque Pig-tailed Macaque Vervet Sooty Mangabey Common Marmoset Squirrel Monkey Ring-tailed Lemur Mouse Lemur Olive Baboon New World Monkeys Old World Hominoids ~70 mya 35-40 23-25 6 8 9 10 14 species selected for phylogenetic & pharmacogenomic significance: • Samples  from  Washington,  Oregon,  Yerkes,  and  Southwestern  Na$onal  Primate  Research  Centers;   the  Duke  Lemur  Center;  the  MD  Anderson  Cancer  Center,  and  Covance,  Inc.    
  • 86. 21  Matching  “BodyMap”  Dissected  Tissues   Brain   Immune/Sex   Soma   Cerebellum   Bone  Marrow   Adipose   Frontal  Cortex   Lymph  Node   Colon   Hippocampus   Spleen   Heart   Hypothalamus   Thymus   Kidney   Pituitary   Thyroid   Liver   Temporal  Lobe   Ovary   Lung   Tes$s   Skeletal  Muscle   Whole  Blood   Phase  I  (2010  -­‐2012)   RNA  extracted  from     each  $ssue,  pooled,     sequenced  on  two  FCs     (HS2K,  GAIIx)  for  overall   annota$on.   Phase  II  (2012  –  2014):   RNA  extracted  from     12  $ssues,  individually   sequenced  (1/2  lane  HS2K),   for  $ssue-­‐specific     annota$on.        
  • 87. AceView  Genes  Expressed   Species   /isolate   Number  of   genes   expressed   Known  human   genes   Un-­‐annotated   human  genes   BAB   52065   22376   29689   CMC   50073   21916   28157   CMM   50155   21974   28181   JMI   49253   21785   27468   PTM   50863   22084   28779   RMC   49845   21888   27957   RMI   49922   21913   28009   Jean  Thierry-­‐Mieg  
  • 88. Species   Library   #  input   sequences   (billion)   #  con;gs   Total   length   (Mb)   N25   (bp)   N50   (bp)   N75   (bp)   Longest   con;g  (bp)   Baboon   TOT   0.149   658,581   281   1,029   434   276   35,170   UDG   1.543   1,131,951   844   3,534   1,368   479   131,395   Chimpanzee   UDG   1.465   987,615   1,433   6,356   3,806   1,666   47,873   Cynomologus   Macaque  Chinese   RNA   1.676   911,282   864   4,769   2,316   706   59,032   UDG   1.630   990,604   1,055   5,479   2,822   875   122,916   Cynomolgus  Macaque   Mauri;an   RNA   1.078   1,142,531   929   4,368   1,813   525   30,364   TOT   0.166   526,723   200   657   377   266   36,976   UDG   1.177   1,015,657   834   4,360   1,927   532   22,239   Gorilla   UDG   1.256   732,336   1,122   5,878   3,680   1,804   33,526   Japanese  Macaque     RNA   1.863   703,246   738   5,010   2,687   898   21,620   Marmoset   TOT   0.253   332,782   118   545   357   261   14,561   UDG   1.659   814,235   476   2,206   785   359   122,518   Pig-­‐tailed  Macaque   RNA   1.725   969,993   1,044   5,189   2,807   916   25,056   UDG   1.573   1,301,087   1,222   5,015   2,265   668   32,637   Rhesus  Macaque   Chinese   RNA   1.210   923,017   836   4,694   2,230   639   32,796   UDG   1.310   969,421   850   4,638   2,114   594   34,020   Rhesus  Macaque   Indian   RNA   3.200   703,246   738   5,010   2,687   898   21,620   UDG   1.410   1,051,149   833   3,966   1,644   519   68,269   Ring-­‐tailed  Lemur   UDG   1.403   611,678   822   6,169   3,630   1,468   33,401   Sooty  Mangabey   UDG   1.635   1,188,472   1,466   6,431   3,483   1,116   30,331  
  • 89. Chimpanzee  de  novo  assembled  transcriptome  fully  recovers   most  genes  in  current  annota$on  Chimpanzee De Novo Assembled Transcriptome Proportion of gene covered by transcriptome Numberofgenes 0.0 0.2 0.4 0.6 0.8 1.0 020004000600080001000012000
  • 90. Which  genome  is  important?       All  …  Interac$ng  Elements.  
  • 91. We  need  more   Longitudinal,  Integra$ve  Systems  Biology  
  • 92.
  • 93.
  • 94. ScoA  Kelly  –  ISS  for  one  year   Mark  Kelly  –  Earth  control   Telomere  Length     DNA  Muta$ons  &  Structural  Varia$on   DNA  Hydroxy-­‐methyla$on   Chroma$n   RNA  expression   &  RNA  Methyla$on   Proteomics   An$body  Titers   Cytokines     DNA  Methyla$on   B-­‐cells  /  T-­‐cells     Targeted  and  Global  Metabolomics  Microbiome     Cogni$on     Vasculature    
  • 95. These People are Awesome @mason_lab  
  • 96. Gratitude to Many People and Places Illumina Gary Schroth Marc Van Oene Univ. Chicago Yoav Gilad Yale University Nenad Sestan Sherman Weissman FDA/SEQC Leming Shi NIH/UDP/NCBI Jean and Danielle Thierry-Mieg William Gahl Baylor Jeff Rogers MSKCC Christina Leslie Ross Levine PKI/Geospiza Todd Smith Mason Lab Noah Alexander Marjan Bozinoski Dhruva Chandramohan Sagar Chhangawala Jorge Gandara Fran Garrett-Bakelman Dyala Jaroudi Sheng Li Cem Meyden Lenore Pipes Darryl Reeves Yogesh Saletore Priyanka Vijay Paul Zumbo Cornell/WCMC Jason Banfelder Scott Blanchard Selina Chen-Kiang Olivier Elemento Yariv Houvras Samie Jaffrey Ari Melnick Margaret Ross Adam Siepel Epigenomics Core Katze Lab/NHPRT Michael Katze Bob Palermo Xinxia Pan Icahn/MSSM Eric Schadt, Andrew Kasarskis, Joel Dudley, Ali Bashir, Bobby Sebra NASA Kosuke Fujishima Lynn Rothschild ABRF George Grills Scott Tighe Don Baldwin UMMS Maria E Figueroa AMNH George Amato Mark Sidall U-Minnesota Ran Blekhman @mason_lab