Introduction to Apollo

Collaborative genome annotation editing





A webinar for the i5K Research Community
Monica Munoz-Torres, PhD | @monimunozto

Berkeley Bioinformatics Open-Source Projects (BBOP)

Lawrence Berkeley National Laboratory | 

University of California Berkeley | U.S. Department of Energy


i5K Pilot Project Species Call | 13 October, 2015
OUTLINE

Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
•  Today we will discover how to extract very valuable information about a genome through curation efforts.
APOLLO DEVELOPMENT
APOLLO DEVELOPERS
http://GenomeArchitect.org/
Nathan Dunn
Eric Yao
JBrowse, UC Berkeley
Christine Elsik’s Lab,
University of Missouri
Suzi Lewis
Principal Investigator
BBOP	
  
Moni Munoz-Torres
Stephen Ficklin
GenSAS,
Washington State University
Colin Diesh
Deepak Unni
AFTER THIS TALK WE WILL...
v Be@er	
  understand	
  genome	
  cura'on	
  in	
  the	
  context	
  of	
  annota'on:	
  	
  
assembled	
  genome	
  à	
  automated	
  annota=on	
  à	
  manual	
  annota=on	
  
v Become	
  familiar	
  with	
  the	
  environment	
  and	
  func'onality	
  of	
  the	
  Apollo	
  
genome	
  annota'on	
  edi'ng	
  tool.	
  
v Learn	
  to	
  iden'fy	
  homologs	
  of	
  known	
  genes	
  of	
  interest	
  in	
  a	
  newly	
  
sequenced	
  genome.	
  
v Learn	
  about	
  corrobora'ng	
  and	
  modifying	
  automa'cally	
  annotated	
  gene	
  
models	
  using	
  available	
  evidence	
  in	
  Apollo.	
  
What to expect
A typical genome sequencing project
Anatomy of a genome sequencing project
[Figure: project stages. Experimental design and sampling; Sequencing; Assembly; Automated Annotation; Manual Annotation; Consensus Gene Set; Comparative analyses; Synthesis & dissemination.]
CURATING GENOMES

steps involved
1.  Generation of gene models: calling ORFs, one or more rounds of gene prediction, etc.
2.  Annotation of gene models: describing function, expression patterns, metabolic network memberships.
3.  Manual annotation
GENOME ANNOTATION

objectives and uses
The gene set of an organism informs a variety of studies:
•  Gene number, GC%, TE composition, repetitive regions.
•  Functional assignments.
•  Molecular evolution, sequence conservation.
•  Gene families.
•  Metabolic pathways.
•  What makes an organism what it is? What makes a bee a “bee”?
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
Bio-refresher
REMEMBER...

for manual annotation
To remember: biological concepts to better understand manual annotation
•  GLOSSARY
   from contig to splice site
•  CENTRAL DOGMA
   in molecular biology
•  WHAT IS A GENE?
   defining your goal
•  TRANSCRIPTION
   mRNA in detail
•  TRANSLATION
   and other definitions
•  GENOME CURATION
   steps involved
WHAT IS A GENE?
v  A	
  con'nuously	
  evolving	
  concept	
  paints	
  a	
  very	
  complex	
  
picture	
  of	
  molecular	
  ac'vity:	
  	
  
“A	
  gene	
  is	
  a	
  locatable	
  region	
  of	
  genomic	
  sequence,	
  corresponding	
  to	
  
a	
  unit	
  of	
  inheritance,	
  which	
  is	
  associated	
  with	
  regulatory	
  regions,	
  
transcribed	
  regions	
  and/or	
  other	
  func'onal	
  sequence	
  regions”.	
  	
  
-­‐	
  The	
  Sequence	
  Ontology	
  
	
  
12BIO-REFRESHER
WHAT IS A GENE?
v  ...	
  also	
  long	
  transcripts,	
  dispersed	
  regula1on.	
  
	
  
	
  
“The	
  gene	
  is	
  a	
  DNA	
  segment	
  that	
  contributes	
  to	
  phenotype	
  and	
  func'on.	
  In	
  
the	
  absence	
  of	
  demonstrated	
  func'on,	
  a	
  gene	
  may	
  be	
  characterized	
  by	
  
sequence,	
  transcrip'on	
  or	
  homology.”	
  
	
  
-­‐	
  The	
  ENCODE	
  Project	
  
https://www.encodeproject.org/
“The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”
Gerstein et al., 2007. Genome Res
THE GENE: a moving target
TRANSLATION

reading frames
v  Reading	
  frame	
  is	
  a	
  manner	
  of	
  dividing	
  the	
  sequence	
  of	
  nucleo'des	
  in	
  mRNA	
  
(or	
  DNA)	
  into	
  a	
  set	
  of	
  consecu've,	
  non-­‐overlapping	
  triplets	
  (codons).	
  
v  Three	
  frames	
  can	
  be	
  read	
  in	
  the	
  5’	
  à	
  3’	
  direc'on.	
  Given	
  that	
  DNA	
  has	
  two	
  
an'-­‐parallel	
  strands,	
  an	
  addi'onal	
  three	
  frames	
  are	
  possible	
  to	
  be	
  read	
  on	
  
the	
  an'-­‐sense	
  strand.	
  Six	
  total	
  possible	
  reading	
  frames	
  exist.	
  
v  In	
  eukaryotes,	
  only	
  one	
  reading	
  frame	
  per	
  sec'on	
  of	
  DNA	
  is	
  biologically	
  
relevant	
  at	
  a	
  'me:	
  it	
  has	
  the	
  poten'al	
  to	
  be	
  transcribed	
  into	
  RNA	
  and	
  
translated	
  into	
  protein.	
  This	
  is	
  called	
  the	
  OPEN	
  READING	
  FRAME	
  (ORF)	
  
•  ORF	
  =	
  Start	
  signal	
  +	
  coding	
  sequence	
  (divisible	
  by	
  3)	
  +	
  Stop	
  signal	
  
v  The	
  sec'ons	
  of	
  the	
  mature	
  mRNA	
  transcribed	
  with	
  the	
  coding	
  sequence	
  but	
  
not	
  translated	
  are	
  called	
  UnTranslated	
  Regions	
  (UTR);	
  one	
  at	
  each	
  end.	
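The six reading frames described above can be made concrete with a short sketch. It is an illustration only, not part of Apollo: the function names (`reverse_complement`, `codons`, `find_orfs`) are hypothetical, the standard stop codons TAA/TAG/TGA and an ATG start are assumed, and only the first ATG of each open stretch is reported.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def reverse_complement(seq: str) -> str:
    """Return the anti-sense strand, written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq: str, frame: int) -> List[str]:
    """Split seq into consecutive, non-overlapping triplets starting at offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def find_orfs(seq: str) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, orf) for each ATG...stop stretch found in the six frames."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            current = None  # codons collected since the last in-frame ATG
            for codon in codons(s, frame):
                if current is None and codon == "ATG":
                    current = [codon]
                elif current is not None:
                    current.append(codon)
                    if codon in STOP_CODONS:
                        orfs.append((strand, frame, "".join(current)))
                        current = None
    return orfs


if __name__ == "__main__":
    for strand, frame, orf in find_orfs("CCATGAAATAGGG"):
        print(strand, frame, orf, len(orf) % 3 == 0)
```

Each reported ORF is Start + coding sequence + Stop, so its length is always divisible by 3.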
  
TRANSLATION

splice sites
v  The	
  spliceosome	
  catalyzes	
  the	
  removal	
  of	
  introns	
  and	
  the	
  liga'on	
  of	
  flanking	
  
exons.	
  
•  introns:	
  spaces	
  inside	
  the	
  gene,	
  not	
  part	
  of	
  the	
  coding	
  sequence	
  
•  exons:	
  expression	
  units	
  (of	
  the	
  coding	
  sequence)	
  
v  Splicing	
  signals	
  (from	
  the	
  point	
  of	
  view	
  of	
  an	
  intron):	
  	
  
•  One	
  splice	
  signal	
  (site)	
  on	
  the	
  5’	
  end:	
  usually	
  GT	
  (less	
  common:	
  GC)	
  
•  And	
  a	
  3’	
  end	
  splice	
  site:	
  usually	
  AG	
  
•  Canonical	
  splice	
  sites	
  look	
  like	
  this:	
  …]5’-­‐GT/AG-­‐3’[…	
  
	
  
v  It	
  is	
  possible	
  to	
  produce	
  more	
  than	
  one	
  protein	
  (polypep'de)	
  sequence	
  from	
  
the	
  same	
  genic	
  region,	
  by	
  alterna'vely	
  bringing	
  exons	
  together=	
  alterna=ve	
  
splicing.	
  For	
  example,	
  the	
  gene	
  Dscam	
  (Drosophila)	
  has	
  38,000	
  alterna'vely	
  
spliced	
  mRNAs	
  =	
  isoforms	
  
TRANSLATION

phase
v  Introns	
  can	
  interrupt	
  the	
  reading	
  frame	
  of	
  a	
  gene	
  by	
  inser'ng	
  a	
  sequence	
  
between	
  two	
  consecu've	
  codons	
  
	
  
	
  
v  Between	
  the	
  first	
  and	
  second	
  nucleo'de	
  of	
  a	
  codon	
  
	
  
v  Or	
  between	
  the	
  second	
  and	
  third	
  nucleo'de	
  of	
  a	
  codon	
  
"Exon and Intron classes”. Licensed under Fair use via Wikipedia
"Gene structure" by Daycd- Wikimedia Commons
BIO-REFRESHER
mRNA

now in your mind
•  Although of brief existence, understanding mRNAs is crucial, as they will become the center of your work.
Prediction & Annotation
PREDICTION & ANNOTATION
v  Iden'fica'on	
  and	
  annota'on	
  of	
  genome	
  features:	
  
	
  
•  primarily	
  focuses	
  on	
  protein-­‐coding	
  genes.	
  	
  
•  also	
  iden'fies	
  RNAs	
  (tRNA,	
  rRNA,	
  long	
  and	
  small	
  non-­‐coding	
  
RNAs	
  (ncRNA)),	
  regulatory	
  mo'fs,	
  repe''ve	
  elements,	
  etc.	
  
	
  
•  happens	
  in	
  2	
  phases:	
  
1.  Computa'on	
  phase	
  	
  
2.  Annota'on	
  phase	
  
20GENE PREDICTION & ANNOTATION
COMPUTATION PHASE
a.  Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (homologous and heterologous).
b.  Gene predictions are generated:
    - ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
    - evidence-driven: identifying also domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.
Result: the single most likely coding sequence, no UTRs, no isoforms.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
ANNOTATION PHASE
Experimental data (evidence) and predictions are synthesized into gene annotations.
Result: gene models that [generally] include UTRs, isoforms, evidence trails.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
[Figure labels: 5’ UTR, 3’ UTR]
In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
CONSENSUS GENE SETS
Gene models may be organized into sets using:
•  combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler
or
•  tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.
ANNOTATION

an imperfect art
No one is perfect, least of all automated annotation.
New technology brings new challenges:
•  Assembly errors can cause fragmented annotations
•  Limited coverage makes precise identification a difficult task
Image: www.BroadInstitute.org
MANUAL ANNOTATION

improving predictions
Precise elucidation of biological features encoded in the genome requires careful examination and review.
Schiex et al., 2003. Nucleic Acids Res 31(13): 3738-3741
Automated Predictions
Experimental Evidence: cDNAs, HMM domain searches, RNAseq, genes from other species.
Manual Annotation – to the rescue.
BIOCURATION

structural and functional adjustments
1.  Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses.
2.  Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data.
MANUAL ANNOTATION
http://GeneOntology.org
GENOME ANNOTATION

an inherently collaborative task
Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families).
So many sequences, not enough hands.
APOLLO

collaborative genome annotation editing tool
•  Web based, integrated with JBrowse.
•  Supports real time collaboration.
•  Automatic generation of ready-made computable data.
•  Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
•  Intuitive annotation, gestures, and pull-down menus to create and edit transcripts and exon structures, insert comments (CV, freeform text), associate GO terms, etc.
APOLLO
http://GenomeArchitect.org
Continuous training and support for hundreds of geographically dispersed scientists, from diverse research communities, in conducting manual annotation efforts to recover coding sequences in agreement with all available biological evidence using Apollo.
LESSONS LEARNED
APOLLO
•  Collaborative work distills invaluable knowledge
A LITTLE TRAINING GOES A LONG WAY!
Provided with adequate tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
APOLLO
Apollo
Apollo - current version at i5K Workspace@NAL
The Sequence Selection Window
4. Becoming Acquainted with Web Apollo.
APOLLO

annotation editing environment
BECOMING ACQUAINTED WITH APOLLO
•  Color by CDS frame, toggle strands, set color scheme and highlights.
•  Upload evidence files (GFF3, BAM, BigWig), combination track, sequence search track.
•  Query the genome using BLAT.
•  Navigation and zoom.
•  Search for a gene model or a scaffold.
•  Get coordinates and “rubber band” selection for zooming.
•  Login
•  User-created annotations.
•  New annotator panel.
•  Evidence Tracks: stage and cell-type specific transcription data.
http://genomearchitect.org/web_apollo_user_guide
Curating with Apollo
GENERAL PROCESS OF CURATION

main steps to remember
1.  Select or find a region of interest, e.g. scaffold.
2.  Select appropriate evidence tracks to review the gene model.
3.  Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
4.  If necessary, adjust the gene model.
5.  Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6.  Comment and finish.
USER NAVIGATION

removable side dock
HIGHLIGHTED IMPROVEMENTS
[Side dock tabs: Annotations, Organism, Users, Groups, Admin, Tracks, Reference Sequence]
EDITS & EXPORTS

annotation details, exon boundaries, data export
[Screenshots: ‘Annotations’ tab with gene and mRNA details (1, 2); ‘Reference Sequences’ tab with FASTA and GFF3 export (3)]
USER NAVIGATION
Annotator panel.
•  Choose appropriate evidence from list of “Tracks” on annotator panel.
•  Select & drag elements from evidence track into the ‘User-created Annotations’ area.
•  Hovering over annotation in progress brings up an information pop-up.
•  Creating a new annotation
USER NAVIGATION
•  Annotation right-click menu
USER NAVIGATION
•  ‘Zoom to base level’ option reveals the DNA Track.
USER NAVIGATION
•  Color exons by CDS from the ‘View’ menu.
•  Zoom in/out with keyboard: shift + arrow keys up/down
USER NAVIGATION
•  Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
Annotation: simple cases
“Simple case”:
- the predicted gene model is correct or nearly correct, and
- this model is supported by evidence that completely or mostly agrees with the prediction.
- evidence that extends beyond the predicted model is assumed to be non-coding sequence.
The following are simple modifications.
ANNOTATING SIMPLE CASES
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
•  A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
•  Check ‘Start’ and ‘Stop’ signals after each edit.
ADDING EXONS
If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs.
1.  Right click at the exon edge and ‘Zoom to base level’.
2.  Place the cursor over the edge of the exon until it becomes a black arrow then click and drag the edge of the exon to the new coordinate position that includes the UTR.
ADDING UTRs
To add a new spliced UTR to an existing annotation also follow the procedure for adding an exon.
To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
In some cases all the data may disagree with the annotation, in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
MATCHING EXON BOUNDARY TO EVIDENCE
•  Non-canonical splice site flags.
•  Double click: selection of feature and sub-features
•  Evidence Tracks Area
•  ‘User-created Annotations’ Track
•  Edge-matching
Apollo’s editing logic (brain):
•  selects longest ORF as CDS
•  flags non-canonical splice sites
ORFs AND SPLICE SITES
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
forward strand: 5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
SPLICE SITES
Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.
[Figure labels: Exon/intron splice site error warning; Curated model]
Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (proteins, RNAseq).
It may be present outside the predicted gene model, within a region supported by another evidence track.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
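The idea of picking the longest ORF across the exons of a transcript can be sketched as follows. This is not Apollo's actual implementation: `splice` and `longest_orf` are hypothetical helpers, exon coordinates are assumed 0-based and end-exclusive on the forward strand, and only canonical ATG starts and TAA/TAG/TGA stops are considered.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def splice(genome: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon sequences (0-based, end-exclusive, forward strand) into a transcript."""
    return "".join(genome[start:end] for start, end in sorted(exons)).upper()


def longest_orf(transcript: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop stretch, end-exclusive, or None."""
    best = None
    for start in range(len(transcript) - 2):
        if transcript[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from this ATG until the first in-frame stop.
        for pos in range(start + 3, len(transcript) - 2, 3):
            if transcript[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                break
    return best


if __name__ == "__main__":
    genome = "CCATGAAA" + "GTAAGTTTTCAG" + "CCCTAAGG"
    transcript = splice(genome, [(0, 8), (20, 28)])
    orf = longest_orf(transcript)
    print(transcript, orf, transcript[orf[0]:orf[1]] if orf else None)
```

Bases of the transcript outside the returned interval would correspond to the 5’ and 3’ UTRs. Choosing a different in-frame ‘Start’, as the slide suggests, amounts to overriding `start` by hand.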
‘Start’ AND ‘Stop’ SITES
1.  Zoom in to clearly resolve each exon as a distinct rectangle.
2.  Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.
3.  Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data.
    Use square [ ] brackets to scroll from exon to exon.
    Use curly { } brackets to scroll from annotation to annotation.
4.  Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
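Edge matching, as described in the steps above, boils down to comparing exon boundary coordinates between tracks. A minimal sketch, assuming exons are given as (start, end) integer pairs; the function name `matching_edges` is illustrative and does not correspond to any Apollo API:

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) coordinates on the same scaffold


def matching_edges(annotation: List[Exon], evidence: List[Exon]) -> List[Tuple[bool, bool]]:
    """For each annotated exon, report whether its start and its end match an evidence exon edge."""
    evidence_starts = {start for start, _ in evidence}
    evidence_ends = {end for _, end in evidence}
    return [(start in evidence_starts, end in evidence_ends) for start, end in annotation]


if __name__ == "__main__":
    annotation = [(100, 200), (300, 400)]
    evidence = [(100, 200), (310, 400)]
    for exon, (start_ok, end_ok) in zip(annotation, matching_edges(annotation, evidence)):
        print(exon, "start matches" if start_ok else "start differs",
              "end matches" if end_ok else "end differs")
```

An edge reported as differing is a candidate for the ‘Set 5’ end’ or ‘Set 3’ end’ operations, once the evidence has been reviewed at base level.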
CHECKING EXON INTEGRITY
complex cases
Evidence may support joining two or more different gene models.
Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
1.  In ‘User-created Annotations’ area shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.
2.  Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3.  Check the resulting translation by querying a protein database e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
COMPLEX CASES
merge two gene predictions on the same scaffold
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
One or more splits may be recommended when:
- different segments of the predicted protein align to two or more different gene families
- predicted protein doesn’t align to known proteins over its entire length
- Transcript data may support a split, but first verify whether they are alternative transcripts.
COMPLEX CASES
split a gene prediction
[Figure labels: DNA Track; ‘User-created Annotations’ Track]
COMPLEX CASES
correcting frameshifts and single-base errors
Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
COMPLEX CASES
correcting selenocysteine containing proteins
1.  Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
2.  If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3.  The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4.  Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.
5.  In special cases such as selenocysteine containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
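The steps above describe alterations that are layered on top of a frozen reference sequence. The sketch below illustrates that idea only; the `Alteration` class and `apply_alterations` function are hypothetical and are not Apollo's data model. Positions are assumed to be 0-based, insertions go to the right of the given position, and deletions start at it, following the wording of step 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alteration:
    kind: str           # "insertion", "deletion" or "substitution"
    position: int       # 0-based index on the reference sequence
    residues: str = ""  # bases to insert, or bases that replace the reference
    length: int = 0     # number of bases removed by a deletion


def apply_alterations(reference: str, alterations: List[Alteration]) -> str:
    """Return an altered copy of the sequence; the reference itself is never modified."""
    seq = reference
    # Apply from right to left so that earlier coordinates stay valid.
    for alt in sorted(alterations, key=lambda a: a.position, reverse=True):
        p = alt.position
        if alt.kind == "insertion":
            seq = seq[:p + 1] + alt.residues + seq[p + 1:]   # to the right of p
        elif alt.kind == "deletion":
            seq = seq[:p] + seq[p + alt.length:]             # starting at p
        elif alt.kind == "substitution":
            seq = seq[:p] + alt.residues + seq[p + len(alt.residues):]
        else:
            raise ValueError(f"unknown alteration kind: {alt.kind}")
    return seq


if __name__ == "__main__":
    reference = "ATGAAATAG"
    edits = [Alteration("substitution", 3, residues="G"), Alteration("insertion", 5, residues="T")]
    print(reference, "->", apply_alterations(reference, edits))
```

Keeping the edits as a separate list mirrors the point made on the slide: transcripts overlapping the edits are recalculated, while the assembly stays untouched.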
COMPLEX CASES
correcting frameshifts, single-base errors, and selenocysteines
USER NAVIGATION
•  Annotation right-click menu
Annotations, annotation edits, and History: stored in a centralized database.
USER NAVIGATION
Follow the checklist until you are happy with the annotation!
And remember to…
–  comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
–  or add a comment to inform the community of unresolved issues you think this model may have.
Always Remember: Apollo curation is a community effort so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.
COMPLETING THE ANNOTATION
USER NAVIGATION
•  Annotation right-click menu
The Annotation Information Editor
DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
The Annotation Information Editor
•  Add PubMed IDs
•  Include GO terms as appropriate from any of the three ontologies
•  Write comments stating how you have validated each model.
USER NAVIGATION
Checklist
•  Check ‘Start’ and ‘Stop’ sites.
•  Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
•  Check if you can annotate UTRs, for example using RNA-Seq data:
    –  align it against relevant genes/gene family
    –  blastp against NCBI’s RefSeq or nr
•  Check for gaps in the genome.
•  Additional functionality may be necessary:
    –  merging 2 gene predictions on the same scaffold
    –  merging 2 gene predictions from different scaffolds
    –  splitting a gene prediction
    –  correcting frameshifts and other errors in the genome assembly
    –  annotating selenocysteines, correcting single-base errors, etc.
•  Add:
    –  Important project information in the form of comments
    –  IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
    –  Comments about the kinds of changes you made to the gene model of interest, if any.
    –  Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.
CHECKLIST
for accuracy and integrity
MANUAL ANNOTATION CHECKLIST
Curating within i5K
i5K Workspace@NAL
THE COLLABORATIVE CURATION PROCESS AT i5K

1.  A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. LDEC_v0.5.3-Models
2.  i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):
Achtung!
•  If it’s not on either track, it won’t make the OGS!
•  If it’s there and it shouldn’t be, it will still make the OGS!
THE COLLABORATIVE CURATION PROCESS AT i5K

3.  In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment and choose a different model to annotate.
4.  Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area.
5.  If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor.
6.  Overlapping interests? Collaborate to reach agreement.
7.  Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY
Example
	
Curation example using the Hyalella azteca genome (amphipod crustacean).
What do we know about this genome?
•  Currently publicly available data at NCBI:
    –  >37,000 nucleotide seqs → scaffolds, mitochondrial genes
    –  344 amino acid seqs → mitochondrion
    –  47 ESTs
    –  0 conserved domains identified
    –  0 “gene” entries submitted
•  Data at i5K Workspace@NAL (annotation hosted at USDA)
    –  10,832 scaffolds: 23,288 transcripts: 12,906 proteins
PubMed Search: what’s new?
Example 74
PubMed Search: what’s new?
Example 75
“Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.”
“By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
How many sequences are there, publicly available, for our gene of interest?
Example 76
•  Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
•  NaCP60E (Sodium channel protein 60 E; D. melanogaster).
    –  MF: voltage-gated cation channel activity (IDA, GO:0022843).
    –  BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
    –  CC: voltage-gated sodium channel complex (IEA, GO:0001518).
And what do we know about them?
Retrieving sequences for a sequence similarity search.
Example 77
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
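Before pasting a query into BLAT or BLAST, it can help to confirm that the FASTA record is intact. The following is a minimal sketch of such a check; the `parse_fasta` helper is an illustrative name and is not part of Apollo or of the i5K tools.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


QUERY = """>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
"""

header, sequence = parse_fasta(QUERY)[0]
# Report the record name, its length, and whether every residue is one of
# the 20 standard amino acids.
print(header, len(sequence), set(sequence) <= AMINO_ACIDS)
```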
BLAT search input
Example 78
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT search results
Example 79
•  High-scoring segment pairs (HSPs) are listed in tabulated format.
•  Clicking on one line of results sends you to those coordinates.
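A tabulated HSP listing maps naturally onto records with subject coordinates. The sketch below assumes the 12-column tab-separated layout that NCBI BLAST+ writes with `-outfmt 6`; whether the BLAT or BLAST pages at i5K export exactly this layout is an assumption. The two sample rows, including the scaffold name and all coordinates, are made up for illustration.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def strand(self) -> str:
        # Subject start greater than subject end is read here as a
        # reverse-strand hit.
        return "+" if self.sstart <= self.send else "-"

    @property
    def location(self) -> str:
        low, high = sorted((self.sstart, self.send))
        return f"{self.subject}:{low}..{high}"


def parse_tabular(text):
    """Parse 12-column tabular hits and return them sorted by bit score."""
    hsps = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hsps.append(Hsp(f[0], f[1], float(f[2]), int(f[8]), int(f[9]),
                        float(f[10]), float(f[11])))
    return sorted(hsps, key=lambda h: h.bitscore, reverse=True)


# Hypothetical rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
SAMPLE = "\n".join([
    "\t".join(["vgsc-Segment3-DomainII", "scaffold_demo", "95.0", "60",
               "3", "0", "66", "125", "4800", "4621", "1e-30", "120.0"]),
    "\t".join(["vgsc-Segment3-DomainII", "scaffold_demo", "98.5", "65",
               "1", "0", "1", "65", "5195", "5001", "1e-40", "150.0"]),
])

for hsp in parse_tabular(SAMPLE):
    print(hsp.location, hsp.strand, hsp.bitscore)
```

The `location` string is the kind of coordinate range a genome browser can jump to, which mirrors what clicking a result line does.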
  
BLAST at i5K: https://i5k.nal.usda.gov/blast
Example 80
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAST at i5K: https://i5k.nal.usda.gov/blast
Example 81
BLAST at i5K: HSPs in the “BLAST+ Results” track
Example 82
Creating a new gene model: drag and drop
Example 83
•  Apollo automatically calculates the longest ORF.
•  In this case, the ORF includes the high-scoring segment pairs (HSPs), marked here in blue.
•  Note that the gene is transcribed from the reverse strand.
Available Tracks
Example 84
Get Sequence
Example 85
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Also, flanking sequences (other gene models) vs. NCBI nr
Example 86
In this case, two gene models upstream, at the 5’ end.
BLAST HSPs
Review alignments
Example 87
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
Hypothesis for vgsc gene model
Example 88
Editing: merge the three models
Example 89
Merge by dropping an exon or gene model onto another.
or…
Merge by selecting two exons (holding down “Shift”) and using the right-click menu.
Result of merging the gene models:
Example 90
Editing: correct offending splice site
Example 91
Modify exon / intron boundary:
-  Drag the end of the exon to the nearest canonical splice site.
or
-  Use the right-click menu.
Editing: set translation start
Example 92
Editing: delete exon not supported by evidence
Example 93
Delete first exon from HaztTmpM006233
Editing: add an exon supported by RNAseq
Example 94
•  RNAseq reads show evidence in support of a transcribed product that was not predicted.
•  Add an exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.
Editing: adjust offending splice site using evidence
Example 95
Editing: adjust other boundaries supported by evidence
Example 96
Finished model
Example 97
Corroborate integrity and accuracy of the model:
-  Start and Stop
-  Exon structure and splice sites …]5’-GT/AG-3’[…
-  Check the predicted protein product vs. NCBI nr, UniProt, etc.
Information Editor
•  DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
•  PubMed identifier: PMID: 24065824
•  Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
•  Comments
•  Name, Symbol
•  Approve / Delete radio button
Example 98
Comments (if applicable)
Go play!
PUBLIC DEMO 100
APOLLO ON THE WEB: instructions
At i5K
1.  Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration
2.  Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
PUBLIC DEMO 101
APOLLO ON THE WEB: instructions
Public Honey bee demo available at: http://GenomeArchitect.org/WebApolloDemo
APOLLO: demonstration
PUBLIC DEMO 102
Demonstration video is available at https://youtu.be/VgPtAP_fvxY
OUTLINE 103
Web Apollo Collaborative Curation and Interactive Analysis of Genomes
•  BIO-REFRESHER: biological concepts for curation
•  ANNOTATION: automatic predictions
•  MANUAL ANNOTATION: necessary, collaborative
•  APOLLO: advancing collaborative curation
•  EXAMPLE: demos
Thank you! 104
•  Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
•  § Christine G. Elsik (PI). University of Missouri.
•  * Ian Holmes (PI). University of California Berkeley.
•  Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), and the Honey Bee Genome Sequencing Consortium.
•  Stephen Ficklin, GenSAS, Washington State University
•  Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Both projects are also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
•  For your attention, thank you!

BBOP
Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
JBrowse: Eric Yao *
NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore, Mei-Ju Chen
HGSC at BCM: fringy Richards, Kim Worley

Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K
Introduction to Apollo: A webinar for the i5K Research Community

  • 1. Introduction to Apollo
 Collaborative genome annotation editing
 A webinar for the i5K Research Community
 Monica Munoz-Torres, PhD | @monimunozto
 Berkeley Bioinformatics Open-Source Projects (BBOP)
 Lawrence Berkeley National Laboratory | University of California Berkeley | U.S. Department of Energy
 i5K Pilot Project Species Call | 13 October, 2015
  • 2. OUTLINE
 Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
 Today we will discover how to extract valuable information about a genome through curation efforts.
  • 3. APOLLO DEVELOPMENT
 APOLLO DEVELOPERS | http://GenomeArchitect.org/
 - BBOP: Suzi Lewis (Principal Investigator), Nathan Dunn, Moni Munoz-Torres
 - JBrowse, UC Berkeley: Eric Yao
 - Christine Elsik's Lab, University of Missouri: Colin Diesh, Deepak Unni
 - GenSAS, Washington State University: Stephen Ficklin
  • 4. AFTER THIS TALK WE WILL...
 What to expect
 - Better understand genome curation in the context of annotation: assembled genome → automated annotation → manual annotation.
 - Become familiar with the environment and functionality of the Apollo genome annotation editing tool.
 - Learn to identify homologs of known genes of interest in a newly sequenced genome.
 - Learn about corroborating and modifying automatically annotated gene models using available evidence in Apollo.
  • 5. A typical genome sequencing project
  • 6. Genome Sequencing Project
 Anatomy of a genome sequencing project (diagram labels): Experimental design, sampling; Sequencing; Assembly; Automated Annotation; Manual Annotation; Consensus Gene Set; Comparative analyses; Synthesis & dissemination.
  • 7. CURATING GENOMES
 steps involved
 1. Generation of gene models: calling ORFs, one or more rounds of gene prediction, etc.
 2. Annotation of gene models: describing function, expression patterns, metabolic network memberships.
 3. Manual annotation.
  • 8. GENOME ANNOTATION
 objectives and uses
 The gene set of an organism informs a variety of studies:
 - Gene number, GC%, TE composition, repetitive regions.
 - Functional assignments.
 - Molecular evolution, sequence conservation.
 - Gene families.
 - Metabolic pathways.
 - What makes an organism what it is? What makes a bee a "bee"?
 Image credits: Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
  • 10. REMEMBER...
 for manual annotation
 BIO-REFRESHER: biological concepts to better understand manual annotation
 - GLOSSARY: from contig to splice site
 - CENTRAL DOGMA: in molecular biology
 - WHAT IS A GENE?: defining your goal
 - TRANSCRIPTION: mRNA in detail
 - TRANSLATION: and other definitions
 - GENOME CURATION: steps involved
  • 11. WHAT IS A GENE?
 A continuously evolving concept paints a very complex picture of molecular activity:
 "A gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions."
 - The Sequence Ontology
  • 12. WHAT IS A GENE?
 ... also long transcripts, dispersed regulation.
 "The gene is a DNA segment that contributes to phenotype and function. In the absence of demonstrated function, a gene may be characterized by sequence, transcription or homology."
 - The ENCODE Project, https://www.encodeproject.org/
  • 13. THE GENE: a moving target
 "The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products."
 Gerstein et al., 2007. Genome Res
  • 14. TRANSLATION
 reading frames
 - A reading frame is a manner of dividing the sequence of nucleotides in mRNA (or DNA) into a set of consecutive, non-overlapping triplets (codons).
 - Three frames can be read in the 5' → 3' direction. Given that DNA has two anti-parallel strands, an additional three frames can be read on the anti-sense strand. Six possible reading frames exist in total.
 - In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
   - ORF = Start signal + coding sequence (divisible by 3) + Stop signal
 - The sections of the mature mRNA transcribed with the coding sequence but not translated are called UnTranslated Regions (UTR); one at each end.
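The reading-frame and ORF definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy sequence are hypothetical, and it assumes ATG as the only start signal and TAA, TAG, TGA as stop signals. It is not meant to describe how any particular gene finder works.

```python
# Minimal sketch: enumerate the six reading frames of a DNA string and
# collect ORFs (start signal + coding sequence divisible by 3 + stop signal).
# All names here are hypothetical and chosen for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the anti-sense strand, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, frame):
    """Yield (offset, codon) for consecutive, non-overlapping triplets."""
    for i in range(frame, len(seq) - 2, 3):
        yield i, seq[i:i + 3]


def find_orfs(seq):
    """Return (strand, frame, start, end, orf) tuples.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being read (for "-", relative to the reverse complement).
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i, codon in codons(s, frame):
                if start is None and codon == "ATG":
                    start = i  # first start signal opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


example_orfs = find_orfs("GGATGAAATAAGG")
```

Each reported ORF has a length divisible by 3 because it is built from whole codons in a single frame; the untranslated flanks of the toy sequence play the role of the UTRs mentioned above.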
  • 15. TRANSLATION
 splice sites
 - The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
   - introns: spaces inside the gene, not part of the coding sequence
   - exons: expression units (of the coding sequence)
 - Splicing signals (from the point of view of an intron):
   - One splice signal (site) on the 5' end: usually GT (less common: GC)
   - And a 3' end splice site: usually AG
   - Canonical splice sites look like this: …]5'-GT/AG-3'[…
 - It is possible to produce more than one protein (polypeptide) sequence from the same genic region by alternatively bringing exons together: alternative splicing. For example, the gene Dscam (Drosophila) can potentially produce more than 38,000 alternatively spliced mRNAs (isoforms).
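The splice signals described above can be checked with a short Python sketch. It assumes a forward-strand gene, 0-based end-exclusive intron coordinates, and a made-up sequence; the function names are hypothetical.

```python
# Minimal sketch: classify an intron by its donor (5') and acceptor (3')
# dinucleotides, then remove introns to obtain the spliced sequence.
# Forward strand only; coordinates are 0-based and end-exclusive.


def classify_intron(genome, start, end):
    """Classify the intron genome[start:end] by its splice signals."""
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    if donor == "GT" and acceptor == "AG":
        return "canonical"
    if donor == "GC" and acceptor == "AG":
        return "GC-AG"  # the less common donor mentioned on the slide
    return "non-canonical"


def splice_out(genome, introns):
    """Join the sequence that remains after removing the given introns."""
    pieces, prev = [], 0
    for start, end in sorted(introns):
        pieces.append(genome[prev:start])
        prev = end
    pieces.append(genome[prev:])
    return "".join(pieces)


# Toy example: two exons separated by one GT...AG intron.
toy_genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
toy_class = classify_intron(toy_genome, 6, 18)
toy_mrna = splice_out(toy_genome, [(6, 18)])
```

Alternative splicing corresponds to calling `splice_out` with different intron sets over the same genomic region, which yields different isoforms.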
  • 16. TRANSLATION
 phase
 Introns can interrupt the reading frame of a gene by inserting a sequence:
 - between two consecutive codons (phase 0),
 - between the first and second nucleotide of a codon (phase 1),
 - or between the second and third nucleotide of a codon (phase 2).
 "Exon and Intron classes". Licensed under Fair use via Wikipedia
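The three intron positions above follow from simple arithmetic on the coding length that precedes each intron. The sketch below is illustrative only; the function name is hypothetical and the input is assumed to be the coding (CDS) length of each exon in transcript order.

```python
# Minimal sketch: derive intron phases from the coding length of each exon.
# Phase 0: intron falls between two codons.
# Phase 1: intron falls after the first nucleotide of a codon.
# Phase 2: intron falls after the second nucleotide of a codon.


def intron_phases(cds_exon_lengths):
    """Return one phase per intron (there is one intron fewer than exons)."""
    phases = []
    coding_so_far = 0
    for length in cds_exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


example_phases = intron_phases([6, 4, 5, 3])
```

In the example, the first exon ends on a codon boundary, the second ends one nucleotide into a codon, and the third completes that codon again.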
  • 17. mRNA
 now in your mind
 Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.
 "Gene structure" by Daycd, Wikimedia Commons
  • 19. GENE PREDICTION & ANNOTATION
 Identification and annotation of genome features:
 - primarily focuses on protein-coding genes.
 - also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc.
 - happens in 2 phases:
   1. Computation phase
   2. Annotation phase
  • 20. COMPUTATION PHASE
 a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (homologous and heterologous).
 b. Gene predictions are generated:
   - ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
   - evidence-driven: also identifying domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.
 Result: the single most likely coding sequence, no UTRs, no isoforms.
 Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
  • 21. ANNOTATION PHASE
 Experimental data (evidence) and predictions are synthesized into gene annotations.
 Result: gene models that [generally] include UTRs (5' UTR, 3' UTR), isoforms, evidence trails.
 Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
  • 22. CONSENSUS GENE SETS
 Gene models may be organized into sets using:
 - combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler
 or
 - tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.
 In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene's representation.
  • 23. ANNOTATION
 an imperfect art
 No one is perfect, least of all automated annotation.
 New technology brings new challenges:
 - Assembly errors can cause fragmented annotations.
 - Limited coverage makes precise identification a difficult task.
 Image: www.BroadInstitute.org
  • 24. MANUAL ANNOTATION
 improving predictions
 Manual annotation to the rescue: automated predictions are reviewed against experimental evidence (cDNAs, HMM domain searches, RNAseq, genes from other species).
 Precise elucidation of biological features encoded in the genome requires careful examination and review.
 Schiex et al. 2003. Nucleic Acids Res 31(13): 3738-3741
  • 25. BIOCURATION
 structural and functional adjustments
 1. Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses.
 2. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data.
 http://GeneOntology.org
  • 26. GENOME ANNOTATION
 an inherently collaborative task
 So many sequences, not enough hands.
 Researchers often turn to colleagues for second opinions and insight from those with expertise in particular areas (e.g., domains, families).
  • 27. APOLLO
 collaborative genome annotation editing tool
 - Web based, integrated with JBrowse.
 - Supports real-time collaboration.
 - Automatic generation of ready-made computable data.
 - Supports annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repeats.
 - Intuitive annotation, gestures, and pull-down menus to create and edit transcripts and exon structures, insert comments (CV, freeform text), associate GO terms, etc.
 http://GenomeArchitect.org
  • 28. LESSONS LEARNED
 Continuous training and support for hundreds of geographically dispersed scientists, from diverse research communities, in conducting manual annotation efforts to recover coding sequences in agreement with all available biological evidence using Apollo.
 - Collaborative work distills invaluable knowledge.
  • 29. A LITTLE TRAINING GOES A LONG WAY!
 Provided with adequate tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
  • 31. Becoming Acquainted with Web Apollo
 Apollo - current version at i5K Workspace@NAL
 The Sequence Selection Window (Sort)
  • 32. APOLLO
 annotation editing environment
 Screenshot callouts:
 - Login.
 - Navigation and zoom.
 - Search for a gene model or a scaffold.
 - Get coordinates and "rubber band" selection for zooming.
 - Color by CDS frame, toggle strands, set color scheme and highlights.
 - Upload evidence files (GFF3, BAM, BigWig), combination track, sequence search track.
 - Query the genome using BLAT.
 - User-created annotations.
 - Evidence tracks.
 - Stage and cell-type specific transcription data.
 - New annotator panel.
 http://genomearchitect.org/web_apollo_user_guide
  • 34. GENERAL PROCESS OF CURATION
 main steps to remember
 1. Select or find a region of interest, e.g. scaffold.
 2. Select appropriate evidence tracks to review the gene model.
 3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
 4. If necessary, adjust the gene model.
 5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
 6. Comment and finish.
  • 35. USER NAVIGATION
 removable side dock
 HIGHLIGHTED IMPROVEMENTS
 Side dock tabs: Annotations, Organism, Users, Groups, Admin, Tracks, Reference Sequence.
  • 36. EDITS & EXPORTS
 annotation details, exon boundaries, data export
 Annotations panel: (1) gene, (2) mRNA.
  • 37. EDITS & EXPORTS
 annotation details, exon boundaries, data export
 Reference Sequences panel: (3) FASTA, GFF3.
  • 38. USER NAVIGATION
 Annotator panel.
 - Choose appropriate evidence from list of "Tracks" on annotator panel.
 - Select & drag elements from evidence track into the 'User-created Annotations' area.
 - Hovering over annotation in progress brings up an information pop-up.
 - Creating a new annotation.
  • 39. USER NAVIGATION
 - Annotation right-click menu.
  • 40. USER NAVIGATION
 - 'Zoom to base level' option reveals the DNA Track.
  • 41. USER NAVIGATION
 - Color exons by CDS from the 'View' menu.
  • 42. USER NAVIGATION
 - Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
 - Zoom in/out with keyboard: shift + arrow keys up/down.
  • 45. ANNOTATING SIMPLE CASES
 "Simple case":
 - the predicted gene model is correct or nearly correct, and
 - this model is supported by evidence that completely or mostly agrees with the prediction.
 - evidence that extends beyond the predicted model is assumed to be non-coding sequence.
 The following are simple modifications.
  • 46. ADDING EXONS
 - A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
 - Check 'Start' and 'Stop' signals after each edit.
  • 47. ADDING UTRs
 If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs.
 1. Right click at the exon edge and 'Zoom to base level'.
 2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.
 To add a new spliced UTR to an existing annotation, also follow the procedure for adding an exon.
  • 48. MATCHING EXON BOUNDARY TO EVIDENCE
 To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select 'Set 3' end' or 'Set 5' end' as appropriate.
 In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
  • 49. ORFs AND SPLICE SITES
 Screenshot callouts:
 - Non-canonical splice site flags.
 - Double click: selection of feature and sub-features.
 - Evidence Tracks Area.
 - 'User-created Annotations' Track.
 - Edge-matching.
 Apollo's editing logic (brain):
 - selects longest ORF as CDS
 - flags non-canonical splice sites
  • 50. SPLICE SITES
 Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
 Canonical splice sites:
 - forward strand: 5'-…exon]GT / AG[exon…-3'
 - reverse strand, not reverse-complemented: 3'-…exon]GA / TG[exon…-5'
 Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.
 Screenshot labels: exon/intron splice site error warning; curated model.
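To make the strand conventions above concrete, here is a hedged Python sketch that derives introns from an exon list and flags those that are not GT/AG once strand is taken into account. It is not Apollo's code; the names, coordinates (0-based, end-exclusive, on the forward strand) and toy sequence are assumptions for illustration.

```python
# Minimal sketch: flag non-canonical splice sites for a transcript on
# either strand. For a reverse-strand transcript the intron is reverse
# complemented before checking, so a forward-strand CT...AC reads as GT...AG.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def noncanonical_introns(genome, exons, strand):
    """Return (start, end, donor, acceptor) for each intron that is not GT/AG."""
    exons = sorted(exons)
    flagged = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = genome[left_end:right_start].upper()
        if strand == "-":
            intron = revcomp(intron)
        donor, acceptor = intron[:2], intron[-2:]
        if not (donor == "GT" and acceptor == "AG"):
            flagged.append((left_end, right_start, donor, acceptor))
    return flagged


# Toy example: this intron is canonical only for a reverse-strand gene.
toy_genome = "AAAA" + "CTGGGGAC" + "TTTT"
toy_exons = [(0, 4), (12, 16)]
minus_flags = noncanonical_introns(toy_genome, toy_exons, "-")
plus_flags = noncanonical_introns(toy_genome, toy_exons, "+")
```

A GC donor would also be reported by this sketch, which matches the advice above: such sites may be real, but they deserve a comment.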
  • 51. 'Start' AND 'Stop' SITES
 Apollo calculates the longest possible open reading frame (ORF) that includes canonical 'Start' and 'Stop' signals within the predicted exons.
 If 'Start' appears to be incorrect, modify it by selecting an in-frame 'Start' codon further up or downstream, depending on evidence (proteins, RNAseq).
 It may be present outside the predicted gene model, within a region supported by another evidence track.
 In very rare cases, the actual 'Start' codon may be non-canonical (non-ATG).
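The idea of choosing the longest ORF within the predicted exons can be illustrated as follows. This is a sketch under stated assumptions (forward strand, ATG starts only, standard stop codons, made-up sequence and hypothetical function names), not a description of Apollo's actual implementation.

```python
# Minimal sketch: splice the exons of a forward-strand model and pick the
# longest ORF (ATG ... stop) across the three transcript frames.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(genome, exons):
    """Concatenate exon sequences (0-based, end-exclusive coordinates)."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def longest_orf(transcript):
    """Return (start, end) of the longest complete ORF, or (0, 0) if none."""
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(transcript) - 2, 3):
            codon = transcript[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


# Toy example: two exons; the transcript contains two candidate ORFs.
toy_genome = "CCATGGCC" + "GTAAGTTTTCAG" + "GGATGAAATAAGG"
toy_transcript = splice(toy_genome, [(0, 8), (20, 33)])
cds_start, cds_end = longest_orf(toy_transcript)
```

If evidence suggests a different in-frame 'Start', a curator would override this default choice, which is exactly the manual step described above.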
  • 52. CHECKING EXON INTEGRITY
 1. Zoom in to clearly resolve each exon as a distinct rectangle.
 2. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.
 3. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data.
    Use square [ ] brackets to scroll from exon to exon.
    Use curly { } brackets to scroll from annotation to annotation.
 4. Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
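Edge matching amounts to comparing exon start and end coordinates between an annotation and an evidence feature. The sketch below shows the comparison in Python; the function name and coordinates are hypothetical, and it says nothing about how Apollo implements the red-bar display.

```python
# Minimal sketch: report which exon edges of an annotation coincide with
# edges of an evidence feature (e.g. a cDNA or RNAseq alignment).


def matching_edges(annotation_exons, evidence_exons):
    """Return (exon, start_matches, end_matches) for each annotated exon."""
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    return [
        ((start, end), start in evidence_starts, end in evidence_ends)
        for start, end in annotation_exons
    ]


annotation = [(100, 200), (300, 400), (500, 650)]
evidence = [(100, 200), (310, 400), (500, 600)]
edge_report = matching_edges(annotation, evidence)
```

Exons with a `False` entry are the ones worth inspecting at base level before adjusting a boundary.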
  • 54. COMPLEX CASES
 merge two gene predictions on the same scaffold
 Evidence may support joining two or more different gene models.
 Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
 1. In the 'User-created Annotations' area, shift-click to select an intron from each gene model and right click to select the 'Merge' option from the menu.
 2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
 3. Check the resulting translation by querying a protein database, e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.
 Red lines around exons: 'edge-matching' allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
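Conceptually, a merge combines the exons of two models on the same scaffold and strand into one transcript. The Python sketch below illustrates that idea only. Fusing overlapping exons into a single exon is an assumption made for this example, not a statement about what Apollo does, and the names and coordinates are hypothetical.

```python
# Minimal sketch: combine the exon lists of two gene models into one.
# Overlapping exons are fused (an assumption made for illustration).


def merge_models(exons_a, exons_b):
    """Return a sorted exon list covering both input models."""
    merged = []
    for start, end in sorted(exons_a + exons_b):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous exon: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


model_a = [(100, 200), (300, 400)]
model_b = [(380, 450), (600, 700)]
merged_model = merge_models(model_a, model_b)
```

After any merge, the resulting translation still needs to be checked against a protein database, as step 3 above recommends.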
  • 55. COMPLEX CASES
 split a gene prediction
 One or more splits may be recommended when:
 - different segments of the predicted protein align to two or more different gene families
 - predicted protein doesn't align to known proteins over its entire length
 - Transcript data may support a split, but first verify whether they are alternative transcripts.
  • 56. COMPLEX CASES
 correcting frameshifts and single-base errors
 Screenshot labels: DNA Track; 'User-created Annotations' Track.
 Always remember: when annotating gene models using Apollo, you are looking at a 'frozen' version of the genome assembly and you will not be able to modify the assembly itself.
  • 57. COMPLEX CASES
 correcting selenocysteine containing proteins
  • 58. COMPLEX CASES
 correcting selenocysteine containing proteins
  • 59. COMPLEX CASES
 correcting frameshifts, single-base errors, and selenocysteines
 1. Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
 2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
 3. The 'Create Genomic Insertion' feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor's current location. The 'Create Genomic Deletion' option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The 'Create Genomic Substitution' feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
 4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu 'Get Sequence' option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism's database using the 'Comments' section to report the CDS edits.
 5. In special cases such as selenocysteine containing proteins (read-throughs), right-click over the offending/premature 'Stop' signal and choose the 'Set readthrough stop codon' option from the menu.
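The three sequence alterations in step 3 can be mimicked on a plain string to see their effect on a reading frame. The sketch below follows the slide's wording (insert to the right of the cursor, delete starting at the cursor, substitute starting at the cursor) with 0-based positions. The function name is hypothetical and this is not Apollo's API. As in Apollo, the original sequence is left untouched.

```python
# Minimal sketch: apply an insertion, deletion, or substitution to a copy
# of a sequence. Python strings are immutable, so the input is never changed.


def apply_edit(seq, kind, pos, value):
    """Return an edited copy of seq.

    insertion:    value is a string, inserted to the right of position pos
    deletion:     value is a length, deleted starting at position pos
    substitution: value is a string, replacing bases starting at position pos
    """
    if kind == "insertion":
        return seq[:pos + 1] + value + seq[pos + 1:]
    if kind == "deletion":
        return seq[:pos] + seq[pos + value:]
    if kind == "substitution":
        return seq[:pos] + value + seq[pos + len(value):]
    raise ValueError("unknown edit kind: " + kind)


reference = "ATGAAATAA"
with_insertion = apply_edit(reference, "insertion", 2, "C")
with_deletion = apply_edit(reference, "deletion", 3, 1)
with_substitution = apply_edit(reference, "substitution", 3, "G")
```

Insertions and deletions whose length is not a multiple of 3 shift the downstream reading frame, which is why they can repair (or introduce) frameshifts in the recalculated transcript.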
  • 60. USER NAVIGATION
 - Annotation right-click menu.
  • 61. USER NAVIGATION
 Annotations, annotation edits, and History: stored in a centralized database.
  • 62. COMPLETING THE ANNOTATION
 Follow the checklist until you are happy with the annotation!
 And remember to…
 - comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
 - or add a comment to inform the community of unresolved issues you think this model may have.
 Always remember: Apollo curation is a community effort, so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.
  • 63. USER NAVIGATION
 - Annotation right-click menu.
  • 64. USER NAVIGATION
 The Annotation Information Editor
 DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
  • 65. USER NAVIGATION
 The Annotation Information Editor
 - Add PubMed IDs.
 - Include GO terms as appropriate from any of the three ontologies.
 - Write comments stating how you have validated each model.
  • 67. MANUAL ANNOTATION CHECKLIST
 CHECKLIST for accuracy and integrity
 - Check 'Start' and 'Stop' sites.
 - Check splice sites: most splice sites display these residues …]5'-GT/AG-3'[…
 - Check if you can annotate UTRs, for example using RNA-Seq data:
   - align it against relevant genes/gene family
   - blastp against NCBI's RefSeq or nr
 - Check for gaps in the genome.
 - Additional functionality may be necessary:
   - merging 2 gene predictions on the same scaffold
   - merging 2 gene predictions from different scaffolds
   - splitting a gene prediction
   - correcting frameshifts and other errors in the genome assembly
   - annotating selenocysteines, correcting single-base errors, etc.
 - Add:
   - Important project information in the form of comments.
   - IDs from public databases, e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
   - Comments about the kinds of changes you made to the gene model of interest, if any.
   - Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.
  • 69. THE COLLABORATIVE CURATION PROCESS AT i5K
 i5K Workspace@NAL
 1. A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. LDEC_v0.5.3-Models
 2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):
 Achtung!
 - If it's not on either track, it won't make the OGS!
 - If it's there and it shouldn't be, it will still make the OGS!
  • 70. THE COLLABORATIVE CURATION PROCESS AT i5K
 3. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene's representation. Use your judgment and choose a different model to annotate.
 4. Isoforms: drag original and alternatively spliced form to 'User-created Annotations' area.
 5. If an annotation needs to be removed from the consensus set, drag it to the 'User-created Annotations' area and label as 'Delete' on the Information Editor.
 6. Overlapping interests? Collaborate to reach agreement.
 7. Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY
  • 72. Example
 Curation example using the Hyalella azteca genome (amphipod crustacean).
  • 73. What do we know about this genome?
 - Currently publicly available data at NCBI:
   - >37,000 nucleotide seqs → scaffolds, mitochondrial genes
   - 344 amino acid seqs → mitochondrion
   - 47 ESTs
   - 0 conserved domains identified
   - 0 "gene" entries submitted
 - Data at i5K Workspace@NAL (annotation hosted at USDA):
   - 10,832 scaffolds: 23,288 transcripts: 12,906 proteins
  • 74. PubMed Search: what's new?
  • 75. PubMed Search: what's new?
 "Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids."
 "By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity."
 "The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems."
  • 76. How many sequences are there, publicly available, for our gene of interest? And what do we know about them?
 - Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
 - NaCP60E (Sodium channel protein 60 E; D. melanogaster).
   - MF: voltage-gated cation channel activity (IDA, GO:0022843).
   - BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
   - CC: voltage-gated sodium channel complex (IEA, GO:0001518).
  • 77. Retrieving sequences for a sequence similarity search.
 >vgsc-Segment3-DomainII
 RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
 QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
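Before pasting a query like the one above into a BLAT or BLAST form, it can help to confirm that the FASTA text parses into a single clean record. The following Python sketch does that with the standard library only; the function name is hypothetical and the record is the query shown on the slide.

```python
# Minimal sketch: parse FASTA text into {name: sequence}, joining wrapped
# sequence lines and ignoring blank lines.


def read_fasta(text):
    """Return a dict mapping each record name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word after '>' is the name
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


QUERY = """>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
"""

query_records = read_fasta(QUERY)
query_seq = query_records["vgsc-Segment3-DomainII"]
```

The joined sequence contains no whitespace, so it can be submitted as one continuous protein query.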
  • 78. BLAT search
 input
 >vgsc-Segment3-DomainII
 RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
 QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 79. BLAT search
 results
 - High-scoring segment pairs (hsp) are listed in tabulated format.
 - Clicking on one line of results sends you to those coordinates.
  • 80. BLAST at i5K
 https://i5k.nal.usda.gov/blast
 >vgsc-Segment3-DomainII
 RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
 QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 81. BLAST at i5K
 https://i5k.nal.usda.gov/blast
  • 82. BLAST at i5K: hsps in "BLAST+ Results" track
  • 83. Creating a new gene model: drag and drop
 - Apollo automatically calculates longest ORF.
 - In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue.
 - Note that the gene is transcribed from the reverse strand.
  • 86. Also, flanking sequences (other gene models) vs. NCBI nr
 In this case, two gene models upstream, at 5' end.
 Screenshot label: BLAST hsps.
  • 87. Review alignments
 HaztTmpM006234, HaztTmpM006233, HaztTmpM006232
  • 88. Hypothesis for vgsc gene model
  • 89. Editing: merge the three models
 Merge by dropping an exon or gene model onto another,
 or…
 merge by selecting two exons (holding down "Shift") and using the right click menu.
  • 90. Result of merging the gene models:
  • 91. Editing: correct offending splice site
 Modify exon / intron boundary:
 - Drag the end of the exon to the nearest canonical splice site,
 or
 - Use right-click menu.
  • 92. Editing: set translation start Example 92
  • 93. Editing: delete exon not supported by evidence Example 93 Delete  first  exon  from   HaztTmpM006233  
  • 94. Editing: add an exon supported by RNAseq Example 94 •  RNAseq  reads  show  evidence  in  support  of  transcribed  product,  which  was  not  predicted.   •  Add  exon  at  coordinates  97946-­‐98012  by  dragging  up  one  of  the  RNAseq  reads.  
 • 95. Editing: adjust offending splice site using evidence
 • 96. Editing: adjust other boundaries supported by evidence
 • 97. Finished model Corroborate integrity and accuracy of the model: - Start and Stop - Exon structure and splice sites …]5’-GT/AG-3’[… - Check the predicted protein product vs. NCBI nr, UniProt, etc.
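
 The splice-site check in the list above can be sketched in a few lines. This is an illustrative helper, not part of Apollo: the function name and the input conventions (a sequence given in the orientation of the transcript, exons as sorted 1-based inclusive coordinates) are assumptions. For a gene on the reverse strand, the caller would pass the reverse-complemented sequence with coordinates converted to match.

```python
def noncanonical_introns(seq, exons):
    """Flag introns that do not follow the canonical 5'-GT...AG-3' pattern.

    seq   -- genomic sequence in transcript orientation
    exons -- list of (start, end) tuples, 1-based inclusive, sorted 5'->3'
    Returns a list of (intron_start, intron_end, donor, acceptor) tuples,
    one per intron whose first two bases are not GT or last two are not AG.
    """
    seq = seq.upper()
    flagged = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron_start = prev_end + 1   # first intron base, 1-based
        intron_end = next_start - 1   # last intron base, 1-based
        donor = seq[intron_start - 1:intron_start + 1]   # first two bases
        acceptor = seq[intron_end - 2:intron_end]        # last two bases
        if (donor, acceptor) != ("GT", "AG"):
            flagged.append((intron_start, intron_end, donor, acceptor))
    return flagged


# Toy example: the second intron starts with GC instead of GT.
toy = "ATGGCC" + "GTAAAG" + "GCCTTT" + "GCAAAG" + "TAA"
toy_exons = [(1, 6), (13, 18), (25, 27)]
flagged = noncanonical_introns(toy, toy_exons)
```

 Non-canonical splice sites do occur in real genes, so a flagged intron is a prompt to review the evidence, not proof of an error.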
 • 98. Information Editor • DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq • PubMed identifier: PMID: 24065824 • Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518. • Comments (if applicable) • Name, Symbol • Approve / Delete radio button
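
 Before pasting identifiers into the Information Editor, it can help to check that they are well formed. The sketch below is a hypothetical pre-check and does not describe any validation Apollo performs. The patterns are assumptions based on the common formats of the examples above: GO IDs as "GO:" plus seven digits, PubMed IDs as digits, and RefSeq protein accessions such as NP_001128389.1.

```python
import re

# Assumed patterns for the identifier types listed on the slide.
PATTERNS = {
    "GO": re.compile(r"GO:\d{7}"),
    "PMID": re.compile(r"PMID:\s*\d+"),
    "RefSeq protein": re.compile(r"[NXY]P_\d+\.\d+"),
}


def classify(identifier):
    """Return the name of the first pattern the identifier fully matches,
    or None if it matches none of them."""
    text = identifier.strip()
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return kind
    return None
```

 For example, `classify("GO:0022843")` returns `"GO"`, while a GO ID with a stray space after the colon returns `None` and can be corrected before submission.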
 • 100. APOLLO ON THE WEB instructions
 At i5K 1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration 2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
 • 101. APOLLO ON THE WEB instructions
 Public Honey bee demo available at: http://GenomeArchitect.org/WebApolloDemo
 • 102. APOLLO demonstration
 Demonstration video is available at https://youtu.be/VgPtAP_fvxY
  • 103. OUTLINE
 Web Apollo Collaborative Curation and Interactive Analysis of Genomes • BIO-REFRESHER: biological concepts for curation • ANNOTATION: automatic predictions • MANUAL ANNOTATION: necessary, collaborative • APOLLO: advancing collaborative curation • EXAMPLE: demos
 • 104. Thank you!
 • Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
 • § Christine G. Elsik (PI). University of Missouri.
 • * Ian Holmes (PI). University of California Berkeley.
 • Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), and the Honey Bee Genome Sequencing Consortium.
 • Stephen Ficklin, GenSAS, Washington State University.
 • Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Both projects are also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
 • For your attention, thank you!
 Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
 Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
 NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore, Mei-Ju Chen
 HGSC at BCM: fringy Richards, Kim Worley
 JBrowse: Eric Yao *
 BBOP • Apollo: http://GenomeArchitect.org • GO: http://GeneOntology.org • i5K: http://arthropodgenomes.org/wiki/i5K