ERCC 2.0 Workshop
Day 1
July 10, 2014
Sarah Munro and Marc Salit

Welcome
  
Agenda
•  Meeting Introduction
–  ERCC 1.0 Recap & Applications
–  Charge to Workshop
–  ERCC 2.0 Scope & Process Discussion
•  Participant Presentations
–  Bob Setterquist
–  Lukas Paul
•  Break: 3:30 – 4:00pm
•  Participant Presentations
–  Anne Bergstrom Lucas
–  Karol Thompson
–  Christopher Mason
•  Working Group Formation and Scoping Discussion
•  Dinner: 6:00pm
  
ERCC 1.0 RECAP

Gene Expression Measurements
  
Evaluation of gene expression measurements from commercial microarray platforms
Paul K. Tan, Thomas J. Downey1, Edward L. Spitznagel Jr2, Pin Xu, Dadin Fu, Dimiter S. Dimitrov3, Richard A. Lempicki4, Bruce M. Raaka5 and Margaret C. Cam*
Microarray Core Laboratory, National Institute of Diabetes and Digestive and Kidney Disorders (NIDDK), National Institutes of Health, 1Partek Incorporated, 2Department of Mathematics, Washington University, 3Laboratory of Experimental and Computational Biology (LECB), National Cancer Institute, NIH, 4National Institute of Allergy and Infectious Diseases (NIAID), NIH, SAIC-Frederick, Inc., 5Clinical Endocrinology Branch, NIDDK, NIH, USA
Received May 23, 2003; Revised July 11, 2003; Accepted August 11, 2003
ABSTRACT
Multiple commercial microarrays for measuring genome-wide gene expression levels are currently available, including oligonucleotide and cDNA, single- and two-channel formats. This study reports on the results of gene expression measurements generated from identical RNA preparations that were obtained using three commercially available microarray platforms. RNA was collected from PANC-1 cells grown in serum-rich medium and at 24 h following the removal of serum. Three biological replicates were prepared for each condition, and three experimental replicates were produced for the first biological replicate. RNA was labeled and hybridized to microarrays from three major suppliers according to manufacturers' protocols, and gene expression measurements were obtained using each platform's standard software. For each platform, gene targets from a subset of 2009 common genes were compared. Correlations in gene expression levels and comparisons for significant gene expression changes in this subset were calculated, and showed considerable divergence across the different platforms, suggesting the need for establishing industrial manufacturing standards, and further independent and thorough validation of the technology.
INTRODUCTION
A powerful application of microarray technology is in … (1). Once target genes are identified, additional laboratory resources may be invested to validate this list and to further characterize the relationship of their biological functions to the process under study (2). The efficiency of knowledge discovery using this high-throughput experimental process depends upon the reliability of the microarray technology used in the initial screening experiments. Researchers planning to utilize microarray experiments for discovery-based research must evaluate available commercial technologies when allocating laboratory resources for prospective experiments.
Several formats of microarrays for measuring genome-wide gene expression levels are currently available (3). Important factors for selecting an appropriate microarray platform would include sensitivity, specificity and both inter- and intra-assay reproducibility. Also important is knowledge of the degree of cross-platform agreement, as interchangeability amongst various microarray formats would allow for the utility of gene expression data without regard to platform. Having such a property would allow researchers from independent laboratories to make direct comparisons on data produced from different types of available platforms, and would reduce the need to replicate experiments (4). Such cross-platform comparisons ideally require that corresponding RNA expression measurements be concordant. Previous comparisons of microarray formats suggested that expression data on the NCI60 cell lines from spotted cDNA microarrays could not be directly combined with data from synthesized oligonucleotide arrays (5). This finding was determined using identical originating cell lines; however, cell culturing, mRNA preparation and hybridization of targets were all performed separately. In this study we analyzed identical RNA preparations using three commercially available high-density microarray platforms. This experimental design allowed us to compare the microarray formats while controlling for
5676–5684 Nucleic Acids Research, 2003, Vol. 31, No. 19
DOI: 10.1093/nar/gkg763
•  Transcript ratios across samples
•  Lack of confidence in gene expression experiments
–  Same pair of samples, different platforms, different ratio results!
•  Critical applications
–  Cancer Biology
–  Drug Discovery
–  Tissue engineering
–  Stem Cell Biology
  
External RNA Control Consortium (ERCC)
•  Industry-initiated, NIST-hosted, stakeholder coupled
–  grew out of NIST workshop in 2003
•  initiated by Janet Warrington, VP Clinical Genomics at Affymetrix
–  all major microarray technology developers
–  other gene expression assay developers
•  Open to all interested parties
•  Voluntary
•  More than 90 participants
–  Private, Public, Academic
  
Spike-ins

ERCC Collaborative Study
•  Developed sequence library from submission by ERCC members, as well as synthesis
–  evaluated performance of RNA controls on a variety of platforms
–  selected 96 well-performing sequences in collaborative study
•  Array manufacturers modified products to include ERCC control sequences
[Figure labels: 176, 144, 106, 96]
  
SRM 2374 – DNA Sequence Library for External RNA Controls
•  NIST Standard Reference Material (SRM 2374)
•  Contains 96 unique control sequences inserted in common plasmid DNA
–  engineered to be readily in vitro transcribed to make RNA controls
–  RNA controls intended to mimic mammalian mRNA
•  http://www.nist.gov/srm/
  
Library of 96 Controls in SRM 2374
[Figure: Sequence Lengths histogram (x-axis: Sequence Length, ticks 500 to 2000; y-axis: count)]
[Figure: GC Content histogram (x-axis: GC Content, ticks 0.35 to 0.50; y-axis: count)]
  
Creating Spike-in Mixtures from SRM 2374
SRM 2374 Plasmid DNA Library
→ in vitro transcription
→ RNA transcripts
→ Pooling
→ Mixtures with known abundance ratios
…
Feature      A_1   A_2   A_3   B_1   B_2   B_3
T1             1     5     4     0     2     3
T2           200   204   199   101    97   103
T3           142   153   147   149   130   155
ERCC-0001      5     8    10    20    23    19
…
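As a rough illustration of how a count table like the one above could be compared against a mixture's known abundance ratios, the following Python sketch computes a log2 ratio of mean counts between the A and B replicates for each feature. The counts are copied from the table; the function name, the pseudocount, and the absence of any library-size normalization are simplifying assumptions made for illustration, and this is not the erccdashboard method.

```python
import math

# Counts copied from the example table: three replicates each of samples A and B.
counts = {
    "T1": {"A": [1, 5, 4], "B": [0, 2, 3]},
    "T2": {"A": [200, 204, 199], "B": [101, 97, 103]},
    "T3": {"A": [142, 153, 147], "B": [149, 130, 155]},
    "ERCC-0001": {"A": [5, 8, 10], "B": [20, 23, 19]},
}


def log2_ratio(a_reps, b_reps, pseudocount=1.0):
    """Return log2(mean A / mean B), with a pseudocount (an assumption)
    added to both means so that zero counts do not break the logarithm."""
    mean_a = sum(a_reps) / len(a_reps)
    mean_b = sum(b_reps) / len(b_reps)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


# One observed log2 ratio per feature (endogenous transcripts and the control).
ratios = {name: log2_ratio(c["A"], c["B"]) for name, c in counts.items()}

for feature, value in ratios.items():
    print(f"{feature}: log2(A/B) = {value:+.2f}")
```

For a spike-in control, the observed value would then be compared with the log2 of the ratio at which that control was added to the two mixtures.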
  
Method Validation with erccdashboard R package
erccdashboard Package Vignette
Sarah A. Munro
May 4, 2014
This vignette describes the use of the erccdashboard R package to analyze External RNA Control Consortium (ERCC) spike-in control ratio mixtures in gene expression experiments. If you use this package for method validation of your gene expression experiments please cite our publication:
Please cite our paper when you use the erccdashboard package for analysis. This is a placeholder citation, because our manuscript is still under review.
Munro SA, Lund S, Pine PS, Binder H, Clevert D, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H, Jafari N, Kreil DP, Łabaj PP, Liao Y, Lin S, Meehan J, Mason CE, Santoyo J, Setterquist RA, Shi L, Shi W, Smyth GK, Stralis-Pavese N, Su Z, Tong W, Wang C, Wang J, Xu J, Ye Z, Yang Y, Yu Y, Salit M (Under Review, 2014). Assessing Technical Performance in Gene Expression Experiments with External Spike-in RNA Control Ratio Mixtures.
A BibTeX entry for LaTeX users is
@Article{,
title = {Assessing Technical Performance in Gene Expression Experiments with External Spike-in RNA Co
author = {Munro SA and Lund S and Pine PS and Binder H and Clevert D and Conesa A and Dopazo J and Fa
journal = {Under Review},
volume = {0},
pages = {0},
year = {2014},
}
Munro SA, Lund S, Pine PS, Binder H, Clevert D, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H, Jafari N, Kreil DP, Łabaj PP, Li S, Liao Y, Lin S, Meehan J, Mason CE, Santoyo J, Setterquist RA, Shi
•  Open-source R package
–  erccdashboard
•  Assess technical performance of a gene expression experiment
•  Compare results
–  Within a single laboratory
–  Between laboratories
  
APPLICATIONS OF ERCC 1.0

Product and Method Development
•  Certified Sequences
•  Known concentrations
–  Validation and method testing
–  Product development and evaluation

Measurement Analysis Quality Control
•  Limit of Detection
•  Dynamic Range
•  Noise models

Sample Normalization
•  Key to comparing transcriptomes:
–  Immunology
–  Agriculture
–  Virology
–  Cancer

Single-Cell Measurements
•  Normalization
•  Noise modeling
•  RT Efficiency
•  Limit of Detection

Others…
  
Synthetic Spike-in Standards Improve Run-Specific
Systematic Error Analysis for DNA and RNA Sequencing
Justin M. Zook1*, Daniel Samarov2, Jennifer McDaniel1, Shurjo K. Sen3, Marc Salit1
1 Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America, 2 Statistical Engineering Division,
National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America, 3 Genetic Disease Research Branch, National Human Genome Research
Institute, National Institutes of Health, Bethesda, Maryland, United States of America
Abstract
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic
sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants.
These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in
false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells,
bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set
used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either
from a part of the data set being ‘‘recalibrated’’ (Genome Analysis ToolKit, or GATK) or from a separate data set with special
characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards
to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to
conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base
quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the
spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific
recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the
spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG,
and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these
DNA and RNA spike-in standards with GATK improves base quality score recalibration.
Citation: Zook JM, Samarov D, McDaniel J, Sen SK, Salit M (2012) Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA
Sequencing. PLoS ONE 7(7): e41356. doi:10.1371/journal.pone.0041356
Editor: Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Germany
Received February 28, 2012; Accepted June 20, 2012; Published July 31, 2012
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for
any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Funding: This research was supported in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of
Health. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: zook@nist.gov
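The abstract above reports improvements in Phred-scaled quality score units. As a reminder of what that scale means, here is a minimal Python sketch of the standard Phred relationship Q = -10 log10(p), together with an empirical quality computed from mismatches against a known spike-in sequence. The function names and the add-one smoothing are illustrative assumptions; this is not GATK's code.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)


def empirical_quality(mismatches, bases_observed):
    """Empirical Phred quality from mismatches against a known sequence.

    Because a spike-in sequence is known, every mismatch can be counted
    as an error. Add-one smoothing (an assumption made here) avoids
    taking the logarithm of zero when no mismatches are observed.
    """
    p = (mismatches + 1) / (bases_observed + 2)
    return error_prob_to_phred(p)


# A shift of 5 Phred units corresponds to a fixed fold change in error rate.
fold = phred_to_error_prob(25) / phred_to_error_prob(30)
print(f"Q25 vs Q30 error-rate fold change: {fold:.2f}")
```

Comparing such an empirical quality with the quality scores the sequencer reported for the same bases indicates whether the reported scores are over- or under-estimated.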
Introduction
As sequencing costs drop, it is becoming cost-effective to
sequence even whole genomes to a sufficient depth that random
errors become insignificant. However, systematic sequencing
errors (SSEs) and biases remain problematic even at high
sequencing depths, so recent research has started to focus on
understanding these SSEs and biases [1,2]. In this work, we focus
on SSEs rather than coverage biases, where SSEs are systematic
errors in sample preparation and sequencing processes that cause
base call errors to accumulate preferentially at certain base
positions in the genome, and coverage biases are biases in the
number of reads covering certain genomic regions such as GC-
bias [3–5]. Examples of SSEs, as well as random errors, are
portrayed in Figure 1(a). Compensating for these SSEs is critical
for applications in which a variant might be expected to be in only
a small fraction of the reads, such as samples containing RNA-
editing [6,7], cancer tissues and circulating tumor cells [8–11],
fetal DNA in mother’s blood [12], mixtures of bacterial strains
[13], mitochondrial heteroplasmy [14], mosaic disorders [15], and
pooled samples [16,17]. Since the causes of many SSEs are not
well understood and may vary due to batch effects in a run-specific
manner, compensating for them requires training data sets. The
two previously proposed approaches either use a separate data set
with special characteristics (e.g., SysCall uses overlapping paired-
end reads [1]) or use the data set itself excluding regions known to
have variants (e.g., Genome Analysis Toolkit, or GATK, base
quality score recalibration [2]). Here, we combine the advantages
of these approaches by using DNA or RNA spike-in standards
without homology to almost all biological organisms.
The first approach, SysCall, used a methyl-Seq dataset that had
overlapping paired-end reads to detect SSEs depending on
sequencing direction for the Illumina sequencer [1]. The region
in which the reads overlap can be used to find systematic errors
that preferentially occur on one DNA strand compared to the
other strand. To improve variant calls, the SysCall method uses
a separate dataset with overlapping reads to train a logistic
regression model that accounts for SSEs correlated with several
covariates: (1) the 2 preceding bases + the base in question (each
base independently), (2) directionality bias of the errors, the
proportion of non-reference reads, and (3) a comparison of the
quality scores of the error base to the next base. Most sequencing
runs do not contain overlapping paired reads, so SysCall assumes
the SSEs in a training data set are the same as the SSEs in other
Experience from Expression Analysis
Thanks to Wendell Jones and Erik Aronesty
I like…
•  1000s of RNA-Seq samples
–  Ambion Mix 1
–  “did the library reactions work appropriately and consistently?”
–  “did our lab degrade samples or were the samples already degraded?”
–  “effects (or lack of) between lane or flowcell”
I wish…
•  “Construct Ext RNA Controls that emulate a variety of splice variation (some that may be challenging) and have them at different magnitudes”
–  “examine not only the chemistry but also the bioinformatic pipeline to ensure it has basic fitness.”
•  “Suggest a protocol for adding Ext RNA Controls for FFPE.”
–  “While we spike in ERCC controls at a fixed amount for FFPE samples, we get out a wild range of sequence coming out that aligns to the ERCC controls.”
–  Hypothesis: “much of the target RNA is so damaged that it doesn't ligate to adapters correctly, but ERCC controls do (ligate); as a result they are (much) preferentially amplified.”
  	
  
ERCC 1.0 Shortcomings
•  Poly A Selection is broken
•  Too short
•  No isoforms
•  No good mimics for variants
–  SNPs, cancer fusions
•  Bimodal GC distribution
[Figure: ScaledEffect (scale −10 to 5) by Feature (ERCC-00002 through ERCC-00171) and Lab (Lab1 to Lab9)]
Opportunities for ERCC 2.0
•  New technologies
–  RNA-Seq
–  Long reads
•  PacBio, Moleculo
–  Digital counting
•  Cellular Research, digital PCR
•  Method improvements
–  Library preparation
–  Bioinformatics
•  New discoveries
  
Counting individual DNA molecules by the
stochastic attachment of diverse labels
Glenn K. Fu, Jing Hu, Pei-Hua Wang, and Stephen P. A. Fodor1
Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA 95051
Edited* by Ronald W. Davis, Stanford Genome Technology Center, Palo Alto, CA, and approved March 22, 2011 (received for review November 27, 2010)
We implement a unique strategy for single molecule counting
termed stochastic labeling, where random attachment of a diverse
set of labels converts a population of identical DNA molecules
into a population of distinct DNA molecules suitable for threshold
detection. The conceptual framework for stochastic labeling is
developed and experimentally demonstrated by determining the
absolute and relative number of selected genes after stochastically
labeling approximately 360,000 different fragments of the human
genome. The approach does not require the physical separation of
molecules and takes advantage of highly parallel methods such as
microarray and sequencing technologies to simultaneously count
absolute numbers of multiple targets. Stochastic labeling should
be particularly useful for determining the absolute numbers of
RNA or DNA molecules in single cells.
absolute counting ∣ digital PCR ∣ next-generation sequencing ∣
single molecule detection
Determining small numbers of biological molecules and their changes is essential when unraveling mechanisms of cellular response, differentiation or signal transduction, and in perform-
[Figure: identical DNA target molecules {t1, t2, …, tn} are randomly labeled from a pool of labels {l1, l2, …, lm} (for example t1 with l20, t2 with l107, t3 with l477, t4 with l9), followed by amplification and detection of k distinctly labeled molecules]
Fig. 1. A schematic representation of the labeling process. An example showing four identical target molecules in solution. Each DNA molecule randomly captures and joins with a label by choosing from a large, nondepleting
Fu et al. PNAS 2011
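The counting idea in this excerpt can be sketched numerically. Assuming each of n identical molecules independently picks one of m equally likely labels from a nondepleting pool (a simplified model consistent with the caption; the function names and the pool size of 1000 are chosen here purely for illustration), the expected number of distinct labels observed is k = m(1 − (1 − 1/m)^n), which can be inverted to estimate n from an observed k.

```python
import math


def expected_distinct_labels(n_molecules, m_labels):
    """Expected number of distinct labels seen when n molecules each pick
    one of m equally likely labels, with replacement (nondepleting pool)."""
    return m_labels * (1.0 - (1.0 - 1.0 / m_labels) ** n_molecules)


def estimate_molecules(k_observed, m_labels):
    """Invert the expectation above to estimate the number of molecules
    from the number of distinct labels detected."""
    if not 0 <= k_observed < m_labels:
        raise ValueError("k_observed must be in [0, m_labels)")
    return math.log(1.0 - k_observed / m_labels) / math.log(1.0 - 1.0 / m_labels)


m = 1000  # hypothetical label pool size, for illustration only
for n in (4, 100, 500):
    k = expected_distinct_labels(n, m)
    print(f"n = {n}: expected distinct labels = {k:.1f}, "
          f"back-estimated n = {estimate_molecules(k, m):.1f}")
```

When n is small relative to m, almost every molecule receives its own label and k is close to n; as n grows, label collisions make the correction increasingly important.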
  	
  
CHARGE TO THE WORKSHOP

Charge to the Workshop
•  Develop consensus on…
–  Concept
•  Shared interests
–  Portfolio
•  Controls
•  Analysis
•  Documentary Standards
•  Develop consortium structure…
–  Working groups
–  Steering committee
  
Principles of Operation
•  Pre-competitive
•  Consensus decision-making
•  Data-driven
•  Technology independent
•  Leadership
–  Working Groups
–  Steering Committee
•  NIST-hosted
•  “You get out of it what you put into it.”
  
“A rising tide floats all boats”
ERCC operates by consensus
“A rising tide floats all boats…”
VISION OF SCOPE & LIFESPAN OF ERCC 2.0

Scope of ERCC 2.0
•  ERCC 2.0 is convened to develop standard controls for RNA measurements
•  Three working groups are proposed
1.  Design
–  Types of controls & sequence design
2.  Development
–  Building controls, developing & testing control mixtures
3.  Analysis
–  Standard performance metrics
–  Tools as needed to support design & development
  
The Arc of ERCC 2.0
•  Products
–  Sequences representing different types of RNA
•  Transcript isoforms
•  miRNA
•  New mRNA mimics
•  …
–  Documentary standards for using controls
–  Performance metrics
•  Logistics
–  Workshops
•  Number, frequency
–  Telecons, Mailing list, Wiki
•  Development Schedules
–  Sequence selection
–  Control synthesis
–  Control testing and analysis
–  Reference material development, characterization, release
–  Analytical methods & tools
–  Documentary standards
–  …
–  Finished.
•  Dissemination
–  Steering committee to address business models
	
  
ERCC 2.0 PROCESS DISCUSSION

What will we do together?
•  NIST is committed to...
–  Hosting the consortium
–  Supporting product development
•  Portfolio possibilities
–  Reference materials
–  Reference data
–  Analysis methods
–  Analysis tools
–  Documentary standards
–  …
•  Define consortium mission
–  Purpose of ERCC 2.0 products
•  Providing infrastructure to discern signal from artifact
•  Confidence in RNA measurement results
•  …
  
How can we work together?
•  How do we make decisions?
•  How do we operate?
–  Working groups
–  Semi-annual meetings
–  Conference calls
–  Email list, wiki?
•  Why a consortium?
–  We can make better standards together
•  Things the consortium can do as an entity:
–  Integrate controls from the membership
–  Conduct validation studies
–  Make recommendations, guidelines, develop standards
•  Documentary standards to support regulated applications
  
PARTICIPANT PRESENTATIONS

WORKING GROUP AND SCOPE DISCUSSION

Design Working Group
•  Types of Controls
–  Transcript isoforms
–  miRNA
–  Small RNAs – pre-miRNA, noncoding
–  Cancer-fusion transcripts
–  Microbial RNAs
–  Polysome-associated RNA spike-ins
–  Long noncoding RNAs
–  Epitranscriptome standards
–  Refined mRNA mimics
–  …
•  Design considerations
–  Sequence source
–  GC content
–  Length
–  Complexity
–  Poly-adenylation
–  Secondary structure
–  Non-cognate sequences (“alien”)
–  Modifications
–  …
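Several of the design considerations listed above (GC content, length, complexity) are simple to compute for a candidate sequence. The Python sketch below shows one way to do so; the function names, the use of the longest homopolymer run as a crude complexity proxy, and the example sequence are all hypothetical and are not drawn from any ERCC control.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base
    (used here as a crude, assumed proxy for low complexity)."""
    longest = run = 0
    previous = None
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def summarize(seq):
    """Collect the basic design properties of one candidate sequence."""
    return {
        "length": len(seq),
        "gc": gc_content(seq),
        "longest_homopolymer": longest_homopolymer(seq),
    }


# Hypothetical candidate sequence, for illustration only.
candidate = "ATGGCGCTAAAAGCTT"
print(summarize(candidate))
```

Running such a summary over a whole candidate library would make it easy to see whether the set spans the intended ranges of length and GC content.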
  
Development Working Group
•  Control synthesis
–  DNA templates, RNA molecules
–  Special modifications
•  QC of DNA, RNA controls
–  Purity
–  Homogeneity
–  Stability
•  Control Mixture Design
–  Dynamic range
–  Ratios
–  …
•  Plan and conduct interlaboratory studies to evaluate controls
–  Validation of controls
–  Validation of concepts and analysis
–  Use multiple measurement technologies
  
Analysis Working Group
•  Develop standard performance metrics
–  Develop reference implementation as example
•  Applications
–  Process control
–  Quantitative benchmarking
–  Normalization
–  Optimization
•  Tools as needed to support design & development
–  Validation study analysis
–  Mixture design tools
  	
  
Design Working Group
I like…    I wish…
  
Closing Comments Day 1
•  9:00 am start tomorrow (there will be coffee)
•  More presentations tomorrow morning
•  Open Pitch session is also available tomorrow
–  Let us know if you want to speak tomorrow, but you can also just get up and pitch
•  Working groups will reconvene tomorrow to develop summaries
•  Please join us now for dinner
  

Tilling, Eco- Tilling and MAS for crop improvementTilling, Eco- Tilling and MAS for crop improvement
Tilling, Eco- Tilling and MAS for crop improvement
 
Giab for jax long read 190917
Giab for jax long read 190917Giab for jax long read 190917
Giab for jax long read 190917
 

Viewers also liked (7)

20140710 4 a_bergstrom_lucas_ercc2.0_workshop
20140710 4 a_bergstrom_lucas_ercc2.0_workshop20140710 4 a_bergstrom_lucas_ercc2.0_workshop
20140710 4 a_bergstrom_lucas_ercc2.0_workshop
 
20140710 5 k_thompson_ercc2.0_workshop
20140710 5 k_thompson_ercc2.0_workshop20140710 5 k_thompson_ercc2.0_workshop
20140710 5 k_thompson_ercc2.0_workshop
 
20140711 1 day2_nist_ercc2.0workshop
20140711 1 day2_nist_ercc2.0workshop20140711 1 day2_nist_ercc2.0workshop
20140711 1 day2_nist_ercc2.0workshop
 
20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop20140711 4 e_tseng_ercc2.0_workshop
20140711 4 e_tseng_ercc2.0_workshop
 
20140711 2 j_willey_ercc2.0_workshop
20140711 2 j_willey_ercc2.0_workshop20140711 2 j_willey_ercc2.0_workshop
20140711 2 j_willey_ercc2.0_workshop
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
20140710 3 l_paul_ercc2.0_workshop
20140710 3 l_paul_ercc2.0_workshop20140710 3 l_paul_ercc2.0_workshop
20140710 3 l_paul_ercc2.0_workshop
 

Similar to 20140710 1 day1_nist_ercc2.0workshop

Corrected 2e-5
Corrected 2e-5Corrected 2e-5
Corrected 2e-5Dago Noel
 
Transcriptomics: A Tool for Plant Disease Management
Transcriptomics: A Tool for Plant Disease ManagementTranscriptomics: A Tool for Plant Disease Management
Transcriptomics: A Tool for Plant Disease ManagementSHIVANI PATHAK
 
5th RNA-Seq San Francisco Agenda
5th RNA-Seq San Francisco Agenda5th RNA-Seq San Francisco Agenda
5th RNA-Seq San Francisco AgendaDiane McKenna
 
Impact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGImpact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGLong Pei
 
4th RNA-Seq San Francisco April 26-28 Event Guide
4th RNA-Seq San Francisco April 26-28 Event Guide4th RNA-Seq San Francisco April 26-28 Event Guide
4th RNA-Seq San Francisco April 26-28 Event GuideDiane McKenna
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansGenomeInABottle
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposalGenomeInABottle
 
Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Affymetrix
 
5 Tips for Successful qRT-PCR Results Infographic
5 Tips for Successful qRT-PCR Results Infographic5 Tips for Successful qRT-PCR Results Infographic
5 Tips for Successful qRT-PCR Results InfographicQIAGEN
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowHorizonDiscovery
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsPawan Kumar
 
RNA Sequencing Research
RNA Sequencing ResearchRNA Sequencing Research
RNA Sequencing ResearchTanmay Ghai
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Sunghwan Kim
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGenomeInABottle
 
Assays for protein dna interactions
Assays for protein dna interactionsAssays for protein dna interactions
Assays for protein dna interactionsoikawa
 
pcr en temps réel et evolution bioteche
pcr en temps réel  et evolution biotechepcr en temps réel  et evolution bioteche
pcr en temps réel et evolution biotecheDjamilaHEZIL
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...Chris Evelo
 
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerBioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerTom Kelly
 

Similar to 20140710 1 day1_nist_ercc2.0workshop (20)

Corrected 2e-5
Corrected 2e-5Corrected 2e-5
Corrected 2e-5
 
Transcriptomics: A Tool for Plant Disease Management
Transcriptomics: A Tool for Plant Disease ManagementTranscriptomics: A Tool for Plant Disease Management
Transcriptomics: A Tool for Plant Disease Management
 
5th RNA-Seq San Francisco Agenda
5th RNA-Seq San Francisco Agenda5th RNA-Seq San Francisco Agenda
5th RNA-Seq San Francisco Agenda
 
project
projectproject
project
 
Impact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEGImpact_of_gene_length_on_DEG
Impact_of_gene_length_on_DEG
 
Qi liu 08.08.2014
Qi liu 08.08.2014Qi liu 08.08.2014
Qi liu 08.08.2014
 
4th RNA-Seq San Francisco April 26-28 Event Guide
4th RNA-Seq San Francisco April 26-28 Event Guide4th RNA-Seq San Francisco April 26-28 Event Guide
4th RNA-Seq San Francisco April 26-28 Event Guide
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plans
 
140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal140127 abrf interlaboratory study proposal
140127 abrf interlaboratory study proposal
 
Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq Integrating arrays and RNA-Seq
Integrating arrays and RNA-Seq
 
5 Tips for Successful qRT-PCR Results Infographic
5 Tips for Successful qRT-PCR Results Infographic5 Tips for Successful qRT-PCR Results Infographic
5 Tips for Successful qRT-PCR Results Infographic
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and How
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
RNA Sequencing Research
RNA Sequencing ResearchRNA Sequencing Research
RNA Sequencing Research
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Assays for protein dna interactions
Assays for protein dna interactionsAssays for protein dna interactions
Assays for protein dna interactions
 
pcr en temps réel et evolution bioteche
pcr en temps réel  et evolution biotechepcr en temps réel  et evolution bioteche
pcr en temps réel et evolution bioteche
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
 
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerBioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
 

More from External RNA Controls Consortium (6)

140926 ercc2 workshopreport
140926 ercc2 workshopreport140926 ercc2 workshopreport
140926 ercc2 workshopreport
 
20140711 7 j_myerson_ercc2.0_workshop
20140711 7 j_myerson_ercc2.0_workshop20140711 7 j_myerson_ercc2.0_workshop
20140711 7 j_myerson_ercc2.0_workshop
 
20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop
 
20140711 6 s_munro_ercc2.0_workshop
20140711 6 s_munro_ercc2.0_workshop20140711 6 s_munro_ercc2.0_workshop
20140711 6 s_munro_ercc2.0_workshop
 
20140711 5 s_pond_ercc2.0_workshop
20140711 5 s_pond_ercc2.0_workshop20140711 5 s_pond_ercc2.0_workshop
20140711 5 s_pond_ercc2.0_workshop
 
20140710 6 c_mason_ercc2.0_workshop
20140710 6 c_mason_ercc2.0_workshop20140710 6 c_mason_ercc2.0_workshop
20140710 6 c_mason_ercc2.0_workshop
 

Recently uploaded

LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 

Recently uploaded (20)

LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 

20140710 1 day1_nist_ercc2.0workshop

  • 1. ERCC 2.0 Workshop
      Day 1
      July 10, 2014
      Sarah Munro and Marc Salit
  • 3. Agenda
      • Meeting Introduction
        – ERCC 1.0 Recap & Applications
        – Charge to Workshop
        – ERCC 2.0 Scope & Process Discussion
      • Participant Presentations
        – Bob Setterquist
        – Lukas Paul
      • Break: 3:30 – 4:00pm
      • Participant Presentations
        – Anne Bergstrom Lucas
        – Karol Thompson
        – Christopher Mason
      • Working Group Formation and Scoping Discussion
      • Dinner: 6:00pm
  • 5. Gene Expression Measurements

      Evaluation of gene expression measurements from commercial microarray platforms

      Paul K. Tan, Thomas J. Downey1, Edward L. Spitznagel Jr2, Pin Xu, Dadin Fu, Dimiter S. Dimitrov3, Richard A. Lempicki4, Bruce M. Raaka5 and Margaret C. Cam*

      Microarray Core Laboratory, National Institute of Diabetes and Digestive and Kidney Disorders (NIDDK), National Institutes of Health, 1Partek Incorporated, 2Department of Mathematics, Washington University, 3Laboratory of Experimental and Computational Biology (LECB), National Cancer Institute, NIH, 4National Institute of Allergy and Infectious Diseases (NIAID), NIH, SAIC-Frederick, Inc., 5Clinical Endocrinology Branch, NIDDK, NIH, USA

      Received May 23, 2003; Revised July 11, 2003; Accepted August 11, 2003

      ABSTRACT
      Multiple commercial microarrays for measuring genome-wide gene expression levels are currently available, including oligonucleotide and cDNA, single- and two-channel formats. This study reports on the results of gene expression measurements generated from identical RNA preparations that were obtained using three commercially available microarray platforms. RNA was collected from PANC-1 cells grown in serum-rich medium and at 24 h following the removal of serum. Three biological replicates were prepared for each condition, and three experimental replicates were produced for the first biological replicate. RNA was labeled and hybridized to microarrays from three major suppliers according to manufacturers' protocols, and gene expression measurements were obtained using each platform's standard software. For each platform, gene targets from a subset of 2009 common genes were compared. Correlations in gene expression levels and comparisons for significant gene expression changes in this subset were calculated, and showed considerable divergence across the different platforms, suggesting the need for establishing industrial manufacturing standards, and further independent and thorough validation of the technology.

      INTRODUCTION
      A powerful application of microarray technology is in (1). Once target genes are identified, additional laboratory resources may be invested to validate this list and to further characterize the relationship of their biological functions to the process under study (2). The efficiency of knowledge discovery using this high-throughput experimental process depends upon the reliability of the microarray technology used in the initial screening experiments. Researchers planning to utilize microarray experiments for discovery-based research must evaluate available commercial technologies when allocating laboratory resources for prospective experiments. Several formats of microarrays for measuring genome-wide gene expression levels are currently available (3). Important factors for selecting an appropriate microarray platform would include sensitivity, specificity and both inter- and intra-assay reproducibility. Also important is knowledge of the degree of cross-platform agreement, as interchangeability amongst various microarray formats would allow for the utility of gene expression data without regard to platform. Having such a property would allow researchers from independent laboratories to make direct comparisons on data produced from different types of available platforms, and would reduce the need to replicate experiments (4). Such cross-platform comparisons ideally require that corresponding RNA expression measurements be concordant. Previous comparisons of microarray formats suggested that expression data on the NCI60 cell lines from spotted cDNA microarrays could not be directly combined with data from synthesized oligonucleotide arrays (5). This finding was determined using identical originating cell lines; however, cell culturing, mRNA preparation and hybridization of targets were all performed separately. In this study we analyzed identical RNA preparations using three commercially available high-density microarray platforms. This experimental design allowed us to compare the microarray formats while controlling for

      Nucleic Acids Research, 2003, Vol. 31, No. 19, 5676–5684. DOI: 10.1093/nar/gkg763

      • Transcript ratios across samples
      • Lack of confidence in gene expression experiments
        – Same pair of samples, different platforms, different ratio results!
      • Critical applications
        – Cancer Biology
        – Drug Discovery
        – Tissue engineering
        – Stem Cell Biology
  • 6. External RNA Control Consortium (ERCC)
      • Industry-initiated, NIST-hosted, stakeholder coupled
        – grew out of NIST workshop in 2003
          • initiated by Janet Warrington, VP Clinical Genomics at Affymetrix
        – all major microarray technology developers
        – other gene expression assay developers
      • Open to all interested parties
      • Voluntary
      • More than 90 participants
        – Private, Public, Academic
      [Figure label: Spike-ins]
  • 7. ERCC Collaborative Study
      • Developed sequence library from submission by ERCC members, as well as synthesis
        – evaluated performance of RNA controls on variety of platforms
        – selected 96 well-performing sequences in collaborative study
      • Array manufacturers modified products to include ERCC control sequences
      [Figure labels: 176, 144, 106, 96]
  • 8. SRM 2374 – DNA Sequence Library for External RNA Controls
      • NIST Standard Reference Material (SRM 2374)
      • Contains 96 unique control sequences inserted in common plasmid DNA
        – engineered to be readily in vitro transcribed to make RNA controls
        – RNA controls intended to mimic mammalian mRNA
      • http://www.nist.gov/srm/
  • 9. Library of 96 Controls in SRM 2374
      [Figure: two histograms. "Sequence Lengths": x-axis Sequence Length (ticks from 500 to 2000), y-axis count. "GC Content": x-axis GC Content (ticks from 0.35 to 0.50), y-axis count.]
  • 10. Creating Spike-in Mixtures from SRM 2374
      [Figure: workflow diagram with the following labels]
      SRM 2374 Plasmid DNA Library
      in vitro transcription
      RNA transcripts
      Pooling
      Mixtures with known abundance ratios
      …
  • 11. Method Validation with erccdashboard R package

      Feature      A_1   A_2   A_3   B_1   B_2   B_3
      T1             1     5     4     0     2     3
      T2           200   204   199   101    97   103
      T3           142   153   147   149   130   155
      ERCC-0001      5     8    10    20    23    19
      …

      erccdashboard Package Vignette
      Sarah A. Munro
      May 4, 2014

      This vignette describes the use of the erccdashboard R package to analyze External RNA Control Consortium (ERCC) spike-in control ratio mixtures in gene expression experiments. If you use this package for method validation of your gene expression experiments please cite our publication:

      Please cite our paper when you use the erccdashboard package for analysis. This is a placeholder citation, because our manuscript is still under review.

      Munro SA, Lund S, Pine PS, Binder H, Clevert D, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H, Jafari N, Kreil DP, Łabaj PP, Liao Y, Lin S, Meehan J, Mason CE, Santoyo J, Setterquist RA, Shi L, Shi W, Smyth GK, Stralis-Pavese N, Su Z, Tong W, Wang C, Wang J, Xu J, Ye Z, Yang Y, Yu Y, Salit M (Under Review, 2014). Assessing Technical Performance in Gene Expression Experiments with External Spike-in RNA Control Ratio Mixtures.

      A BibTeX entry for LaTeX users is

        @Article{,
          title = {Assessing Technical Performance in Gene Expression Experiments with External Spike-in RNA Co
          author = {Munro SA and Lund S and Pine PS and Binder H and Clevert D and Conesa A and Dopazo J and Fa
          journal = {Under Review},
          volume = {0},
          pages = {0},
          year = {2014},
        }

      Munro SA, Lund S, Pine PS, Binder H, Clevert D, Conesa A, Dopazo J, Fasold M, Hochreiter S, Hong H, Jafari N, Kreil DP, Łabaj PP, Li S, Liao Y, Lin S, Meehan J, Mason CE, Santoyo J, Setterquist RA, Shi

      • Open-source R package
        – erccdashboard
      • Assess technical performance of a gene expression experiment
      • Compare results
        – Within a single laboratory
        – Between laboratories
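The count table on this slide can be reduced to an observed abundance ratio for each feature. The following is a minimal sketch in Python, not the erccdashboard implementation (erccdashboard is an R package with its own workflow). It averages the three replicate counts shown for samples A and B and reports the log2 A/B ratio. The function name `log2_ratio` is hypothetical, and the sketch assumes no library-size normalization and nonzero mean counts.

```python
import math

# Counts copied from the slide's example table:
# three replicates each of sample A and sample B.
counts = {
    "T1": {"A": [1, 5, 4], "B": [0, 2, 3]},
    "T2": {"A": [200, 204, 199], "B": [101, 97, 103]},
    "T3": {"A": [142, 153, 147], "B": [149, 130, 155]},
    "ERCC-0001": {"A": [5, 8, 10], "B": [20, 23, 19]},
}


def log2_ratio(a_reps, b_reps):
    """Return log2(mean(A) / mean(B)) for one feature.

    Hypothetical helper for illustration only: it assumes both means
    are nonzero and applies no normalization between libraries.
    """
    mean_a = sum(a_reps) / len(a_reps)
    mean_b = sum(b_reps) / len(b_reps)
    return math.log2(mean_a / mean_b)


for feature, reps in counts.items():
    print(f"{feature}: observed log2(A/B) = {log2_ratio(reps['A'], reps['B']):+.2f}")
```

For the ERCC-0001 row this gives a log2 ratio of about -1.43 (mean count 7.67 in A versus 20.67 in B). In a real analysis, an observed ratio like this would be compared with the nominal ratio designed into the spike-in mixtures.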
  • 12. Method Validation with erccdashboard R package
      • Open-source R package
        – erccdashboard
      • Assess technical performance of a gene expression experiment
      • Compare results
        – Within a single laboratory
        – Between laboratories
  • 14. Product and Method Development
      • Certified Sequences
      • Known concentrations
        – Validation and method testing
        – Product development and evaluation
  • 15. Measurement Analysis Quality Control
      • Limit of Detection
      • Dynamic Range
      • Noise models
  • 16. Sample Normalization
      • Key to comparing transcriptomes:
        – Immunology
        – Agriculture
        – Virology
        – Cancer
  • 17. Single-Cell Measurements
      • Normalization
      • Noise modeling
      • RT Efficiency
      • Limit of Detection
  • 18. Others…
    Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing
    Justin M. Zook¹*, Daniel Samarov², Jennifer McDaniel¹, Shurjo K. Sen³, Marc Salit¹
    ¹ Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
    ² Statistical Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
    ³ Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America

    Abstract
    While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.

    Citation: Zook JM, Samarov D, McDaniel J, Sen SK, Salit M (2012) Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing. PLoS ONE 7(7): e41356. doi:10.1371/journal.pone.0041356
    Editor: Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Germany
    Received February 28, 2012; Accepted June 20, 2012; Published July 31, 2012
    This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
    Funding: This research was supported in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Competing Interests: The authors have declared that no competing interests exist.
    * E-mail: zook@nist.gov

    Introduction
    As sequencing costs drop, it is becoming cost-effective to sequence even whole genomes to a sufficient depth that random errors become insignificant. However, systematic sequencing errors (SSEs) and biases remain problematic even at high sequencing depths, so recent research has started to focus on understanding these SSEs and biases [1,2]. In this work, we focus on SSEs rather than coverage biases, where SSEs are systematic errors in sample preparation and sequencing processes that cause base call errors to accumulate preferentially at certain base positions in the genome, and coverage biases are biases in the number of reads covering certain genomic regions such as GC-bias [3–5]. Examples of SSEs, as well as random errors, are portrayed in Figure 1(a). Compensating for these SSEs is critical for applications in which a variant might be expected to be in only a small fraction of the reads, such as samples containing RNA-editing [6,7], cancer tissues and circulating tumor cells [8–11], fetal DNA in mother’s blood [12], mixtures of bacterial strains [13], mitochondrial heteroplasmy [14], mosaic disorders [15], and pooled samples [16,17].
    Since the causes of many SSEs are not well understood and may vary due to batch effects in a run-specific manner, compensating for them requires training data sets. The two previously proposed approaches either use a separate data set with special characteristics (e.g., SysCall uses overlapping paired-end reads [1]) or use the data set itself excluding regions known to have variants (e.g., Genome Analysis Toolkit, or GATK, base quality score recalibration [2]). Here, we combine the advantages of these approaches by using DNA or RNA spike-in standards without homology to almost all biological organisms.
    The first approach, SysCall, used a methyl-Seq dataset that had overlapping paired-end reads to detect SSEs depending on sequencing direction for the Illumina sequencer [1]. The region in which the reads overlap can be used to find systematic errors that preferentially occur on one DNA strand compared to the other strand. To improve variant calls, the SysCall method uses a separate dataset with overlapping reads to train a logistic regression model that accounts for SSEs correlated with several covariates: (1) the 2 preceding bases + the base in question (each base independently), (2) directionality bias of the errors, the proportion of non-reference reads, and (3) a comparison of the quality scores of the error base to the next base. Most sequencing runs do not contain overlapping paired reads, so SysCall assumes the SSEs in a training data set are the same as the SSEs in other
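The idea behind spike-in-based recalibration in the excerpt above can be illustrated with a small sketch. Because the spike-in sequences are known, any mismatch in a read aligned to a spike-in can be treated as a sequencing error, and an empirical Phred quality can be tabulated for each reported quality value. This is not GATK's implementation; the function names, the input format (pairs of reported quality and a mismatch flag), and the add-one smoothing are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def phred(p_error):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def empirical_quality_table(observations):
    """Tabulate empirical quality per reported quality value.

    observations: iterable of (reported_q, is_mismatch) pairs taken from
    bases aligned to spike-in controls (hypothetical input format).
    Returns {reported_q: empirical_q}.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [errors, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1

    table = {}
    for reported_q, (errors, total) in counts.items():
        # Add-one smoothing (an assumption) keeps the estimate finite
        # when no errors were observed for a quality bin.
        p_error = (errors + 1) / (total + 2)
        table[reported_q] = phred(p_error)
    return table


# Hypothetical example: bases reported as Q30 that actually err about 1% of the time.
example = [(30, True)] * 9 + [(30, False)] * 989
table = empirical_quality_table(example)
```

In this made-up example the bases reported as Q30 have an empirical quality near Q20, the kind of overestimate that recalibration is meant to correct. A real recalibration would also condition on covariates such as machine cycle and dinucleotide context, which the abstract mentions.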
  • 19. Experience from Expression Analysis
    Thanks to Wendell Jones and Erik Aronesty
    I like…
    • 1000s of RNA-Seq samples
      – Ambion Mix 1
      – “did the library reactions work appropriately and consistently?”
      – “did our lab degrade samples or were the samples already degraded?”
      – “effects (or lack of) between lane or flowcell”
    I wish…
    • “Construct Ext RNA Controls that emulate a variety of splice variation (some that may be challenging) and have them at different magnitudes”
      – “examine not only the chemistry but also the bioinformatic pipeline to ensure it has basic fitness.”
    • “Suggest a protocol for adding Ext RNA Controls for FFPE.”
      – “While we spike in ERCC controls at a fixed amount for FFPE samples, we get out a wild range of sequence coming out that aligns to the ERCC controls.”
      – Hypothesis: “much of the target RNA is so damaged that it doesn't ligate to adapters correctly, but ERCC controls do (ligate); as a result they are (much) preferentially amplified.”
  • 20. ERCC 1.0 Shortcomings
    • Poly A Selection is broken
    • Too short
    • No isoforms
    • No good mimics for variants
      – SNPs, cancer fusions
    • Bimodal GC distribution
    [Figure: scaled effect by feature (ERCC-00002 through ERCC-00171) for Lab1 through Lab9; axes: Lab, Feature; color scale: ScaledEffect, −10 to 5]
  • 21. Opportunities for ERCC 2.0
    • New technologies
      – RNA-Seq
      – Long reads
        • PacBio, Moleculo
      – Digital counting
        • Cellular Research, digital PCR
    • Method improvements
      – Library preparation
      – Bioinformatics
    • New discoveries

    Counting individual DNA molecules by the stochastic attachment of diverse labels
    Glenn K. Fu, Jing Hu, Pei-Hua Wang, and Stephen P. A. Fodor
    Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA 95051
    Edited by Ronald W. Davis, Stanford Genome Technology Center, Palo Alto, CA, and approved March 22, 2011 (received for review November 27, 2010)

    We implement a unique strategy for single molecule counting termed stochastic labeling, where random attachment of a diverse set of labels converts a population of identical DNA molecules into a population of distinct DNA molecules suitable for threshold detection. The conceptual framework for stochastic labeling is developed and experimentally demonstrated by determining the absolute and relative number of selected genes after stochastically labeling approximately 360,000 different fragments of the human genome. The approach does not require the physical separation of molecules and takes advantage of highly parallel methods such as microarray and sequencing technologies to simultaneously count absolute numbers of multiple targets. Stochastic labeling should be particularly useful for determining the absolute numbers of RNA or DNA molecules in single cells.

    absolute counting | digital PCR | next-generation sequencing | single molecule detection

    Determining small numbers of biological molecules and their changes is essential when unraveling mechanisms of cellular response, differentiation or signal transduction, and in perform-

    [Figure 1 schematic: identical DNA target molecules {t1, t2, …, tn}; pool of labels {l1, l2, …, lm}; random labeling (t1–l20, t2–l107, t3–l477, t4–l9); amplification and detection of k distinctly labeled molecules]
    Fig. 1. A schematic representation of the labeling process. An example showing four identical target molecules in solution. Each DNA molecule randomly captures and joins with a label by choosing from a large, nondepleting

    Fu et al. PNAS 2011
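The counting principle in the excerpt above can be sketched numerically. Assuming each of n identical molecules independently picks one of m labels uniformly at random from a non-depleting pool, the expected number of distinct labels observed is k = m·(1 − (1 − 1/m)^n), and that relation can be inverted to estimate n from an observed k. This is the standard occupancy calculation, offered as an illustration and not as the authors' code; the function names and the pool size in the example are hypothetical.

```python
import math


def expected_distinct_labels(n_molecules, m_labels):
    """Expected number of distinct labels seen when n molecules each pick
    one of m labels uniformly at random (non-depleting pool assumed)."""
    return m_labels * (1.0 - (1.0 - 1.0 / m_labels) ** n_molecules)


def estimate_molecules(k_observed, m_labels):
    """Invert the expectation above to estimate the number of molecules
    from the number of distinct labels observed."""
    if not 0 <= k_observed < m_labels:
        raise ValueError("k_observed must satisfy 0 <= k < m_labels")
    return math.log(1.0 - k_observed / m_labels) / math.log(1.0 - 1.0 / m_labels)


# Hypothetical example: 100 molecules labeled from a pool of 1000 labels.
k = expected_distinct_labels(100, 1000)
n_hat = estimate_molecules(k, 1000)
```

When n is much smaller than m, k is close to n and simple counting suffices; the correction matters as the number of molecules approaches the size of the label pool and label collisions become common.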
  • 22. CHARGE  TO  THE  WORKSHOP  
  • 23. Charge to the Workshop
    • Develop consensus on…
      – Concept
        • Shared interests
      – Portfolio
        • Controls
        • Analysis
        • Documentary Standards
    • Develop consortium structure…
      – Working groups
      – Steering committee
  • 24. Principles of Operation
    • Pre-competitive
    • Consensus decision-making
    • Data-driven
    • Technology independent
    • Leadership
      – Working Groups
      – Steering Committee
    • NIST-hosted
    • “You get out of it what you put into it.”
  • 25. “A rising tide floats all boats…”
    ERCC operates by consensus
  • 26. VISION  OF  SCOPE  &  LIFESPAN  OF   ERCC  2.0  
  • 27. Scope of ERCC 2.0
    • ERCC 2.0 is convened to develop standard controls for RNA measurements
    • Three working groups are proposed
      1. Design
        – Types of controls & sequence design
      2. Development
        – Building controls, developing & testing control mixtures
      3. Analysis
        – Standard performance metrics
        – Tools as needed to support design & development
  • 28. The Arc of ERCC 2.0
    • Products
      – Sequences representing different types of RNA
        • Transcript isoforms
        • miRNA
        • New mRNA mimics
        • …
      – Documentary standards for using controls
      – Performance metrics
    • Logistics
      – Workshops
        • Number, frequency
      – Telecons, Mailing list, Wiki
    • Development Schedules
      – Sequence selection
      – Control synthesis
      – Control testing and analysis
      – Reference material development, characterization, release
      – Analytical methods & tools
      – Documentary standards
      – …
      – Finished.
    • Dissemination
      – Steering committee to address business models
  • 29. ERCC  2.0  PROCESS  DISCUSSION  
  • 30. What will we do together?
    • NIST is committed to...
      – Hosting the consortium
      – Supporting product development
    • Portfolio possibilities
      – Reference materials
      – Reference data
      – Analysis methods
      – Analysis tools
      – Documentary standards
      – …
    • Define consortium mission
      – Purpose of ERCC 2.0 products
        • Providing infrastructure to discern signal from artifact
        • Confidence in RNA measurement results
        • …
  • 31. How can we work together?
    • How do we make decisions?
    • How do we operate?
      – Working groups
      – Semi-annual meetings
      – Conference calls
      – Email list, wiki?
    • Why a consortium?
      – We can make better standards together
    • Things the consortium can do as an entity:
      – Integrate controls from the membership
      – Conduct validation studies
      – Make recommendations, guidelines, develop standards
        • Documentary standards to support regulated applications
  • 33. WORKING  GROUP  AND  SCOPE   DISCUSSION    
  • 34. Design Working Group
    • Types of Controls
      – Transcript isoforms
      – miRNA
      – Small RNAs: pre-miRNA, noncoding
      – Cancer-fusion transcripts
      – Microbial RNAs
      – Polysome-associated RNA spike-ins
      – Long noncoding RNAs
      – Epitranscriptome standards
      – Refined mRNA mimics
      – …
    • Design considerations
      – Sequence source
      – GC content
      – Length
      – Complexity
      – Poly-adenylation
      – Secondary structure
      – Non-cognate sequences (“alien”)
      – Modifications
      – …
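Two of the design considerations listed above, GC content and length, are simple to compute for any candidate control sequence. The sketch below is one possible way to summarize a candidate set; the function names and the example sequences are made up for illustration and are not ERCC controls.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(seq.count(base) for base in "GC") / len(seq)


def summarize_candidates(candidates):
    """Return {name: {"length": ..., "gc": ...}} for a dict of sequences."""
    return {
        name: {"length": len(seq), "gc": gc_fraction(seq)}
        for name, seq in candidates.items()
    }


# Hypothetical candidate sequences (illustrative only).
candidates = {
    "candidate_A": "AUGGCCGCGGGC",
    "candidate_B": "AUAUUUAAAGCA",
}
summary = summarize_candidates(candidates)
```

A summary like this could be used to check whether a proposed set covers the GC and length range of interest evenly, which relates to the bimodal GC distribution listed among the ERCC 1.0 shortcomings.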
  • 35. Development Working Group
    • Control synthesis
      – DNA templates, RNA molecules
      – Special modifications
    • QC of DNA, RNA controls
      – Purity
      – Homogeneity
      – Stability
    • Control Mixture Design
      – Dynamic range
      – Ratios
      – …
    • Plan and conduct interlaboratory studies to evaluate controls
      – Validation of controls
      – Validation of concepts and analysis
      – Use multiple measurement technologies
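The two mixture-design parameters named above, dynamic range and ratios, can be made concrete with a small sketch. It assumes a two-mixture design in which controls are split into subpools, each subpool has a fixed Mix 1 to Mix 2 fold ratio, and concentrations within a subpool are log-spaced across the dynamic range. The function name, the round-robin subpool assignment, and all numbers are illustrative assumptions, not a consortium specification.

```python
def design_mixtures(control_ids, min_conc, max_conc, ratios):
    """Assign (mix1, mix2) concentrations to each control.

    control_ids: list of control names
    min_conc, max_conc: ends of the dynamic range in Mix 1 (arbitrary units)
    ratios: Mix 1 / Mix 2 fold ratio for each subpool
    """
    n_subpools = len(ratios)
    design = {}
    for sub_idx, ratio in enumerate(ratios):
        # Round-robin assignment of controls to subpools (an assumption).
        members = control_ids[sub_idx::n_subpools]
        for i, control in enumerate(members):
            # Log-spaced position within the dynamic range, from 0 to 1.
            frac = i / (len(members) - 1) if len(members) > 1 else 0.0
            mix1 = min_conc * (max_conc / min_conc) ** frac
            design[control] = (mix1, mix1 / ratio)
    return design


# Hypothetical example: 8 controls, 1000-fold range, two subpools (1:1 and 4:1).
ids = [f"ctrl{i}" for i in range(8)]
design = design_mixtures(ids, min_conc=1.0, max_conc=1000.0, ratios=[1.0, 4.0])
```

Spreading every subpool across the whole dynamic range lets ratio measurements be assessed at both low and high abundance, one of the trade-offs a mixture design tool would need to expose.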
  • 36. Analysis Working Group
    • Develop standard performance metrics
      – Develop reference implementation as example
    • Applications
      – Process control
      – Quantitative benchmarking
      – Normalization
      – Optimization
    • Tools as needed to support design & development
      – Validation study analysis
      – Mixture design tools
  • 37. Design Working Group
    • Types of Controls
      – Transcript isoforms
      – miRNA
      – Small RNAs: pre-miRNA, noncoding
      – Cancer-fusion transcripts
      – Microbial RNAs
      – Polysome-associated RNA spike-ins
      – Long noncoding RNAs
      – Epitranscriptome standards
      – Refined mRNA mimics
      – …
    • Design considerations
      – Sequence source
      – GC content
      – Length
      – Complexity
      – Poly-adenylation
      – Secondary structure
      – Non-cognate sequences (“alien”)
      – Modifications
      – …
  • 38. Design  Working  Group   I  like…   I  wish…  
  • 39. Closing Comments Day 1
    • 9:00 am start tomorrow (there will be coffee)
    • More presentations tomorrow morning
    • Open Pitch session is also available tomorrow
      – Let us know if you want to speak tomorrow, but you can also just get up and pitch
    • Working groups will reconvene tomorrow to develop summaries
    • Please join us now for dinner