Introduction to Apollo
Collaborative genome annotation editing
A webinar for the i5K Research Community - Hemiptera
Monica Munoz-Torres | @monimunozto
Berkeley Bioinformatics Open-Source Projects (BBOP)
Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory
i5k Pilot Project Species Calls | 9 February, 2016
http://GenomeArchitect.org
Outline
•  Today you will discover
effective ways to extract
valuable information
about a genome through
curation efforts.
Apollo: Collaborative Curation and Interactive Analysis of Genomes
After this talk you will...
•  Better understand ‘curation’ in the context of genome annotation:
assembled genome → automated annotation → manual annotation
•  Become familiar with Apollo’s environment and functionality.
•  Learn to identify homologs of known genes of interest in your newly
sequenced genome.
•  Learn how to corroborate and modify automatically annotated gene
models using all available evidence in Apollo.
Experimental design, sampling.
Comparative analyses
Official / Merged
Gene Set
Manual
Annotation
Automated
Annotation
Sequencing
Assembly
Synthesis &
dissemination.
This is our focus.
We must care about curation
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
The gene set of an organism informs a variety of studies:
•  Characterization: Gene number, GC%, TEs, repeats.
•  Functional assignments.
•  Molecular evolution, sequence conservation.
•  Gene families.
•  Metabolic pathways.
•  What makes an organism what it is?
What makes a bee a “bee”?
Genome Curation
Identifies elements that best
represent the underlying biology
and eliminates elements that
reflect systemic errors of
automated analyses.
Assigns function through
comparative analysis of similar
genome elements from closely
related species using literature,
databases, and experimental
data.
Apollo
Gene Ontology
Resources
A few things to remember when conducting manual annotation
To remember… Biological concepts to better understand manual annotation
BIO-REFRESHER
•  KEEP A GLOSSARY HANDY: from contig to splice site
•  WHAT IS A GENE? defining your goal
•  TRANSCRIPTION: mRNA in detail
•  TRANSLATION: reading frames, etc.
•  GENOME CURATION: steps involved
The gene: a “moving target”
“The gene is a union
of genomic
sequences encoding
a coherent set of
potentially
overlapping
functional products.”
Gerstein et al., 2007. Genome Res
"Gene structure" by Daycd, Wikimedia Commons
mRNA
•  Although mRNAs exist only briefly, understanding them is crucial,
as they will become the center of your work.
Reading frames
v  In eukaryotes, only one reading frame per section of DNA is biologically
relevant at a time: it has the potential to be transcribed into RNA and
translated into protein. This is called the OPEN READING FRAME (ORF)
•  ORF = Start signal + coding sequence (divisible by 3) + Stop signal
11BIO-REFRESHER
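The ORF definition above can be checked mechanically. The following is a minimal sketch, assuming a DNA string read 5' to 3', ATG as the only start signal, and the three standard stop codons; the function name `is_orf` is hypothetical and not part of Apollo.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop signals (assumption)


def is_orf(seq: str) -> bool:
    """Return True if seq is start signal + in-frame codons + one final stop."""
    seq = seq.upper()
    # The whole ORF, stop included, must be divisible by 3.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # No stop codon may interrupt the coding sequence before the end.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_orf("ATGAAATAA"))     # start + one codon + stop
print(is_orf("ATGTAAAAATAA"))  # premature in-frame stop
```

A sequence that fails this check is not necessarily wrong biologically (for example, a partial gene model at a scaffold edge), so treat the result as a prompt to look at the evidence.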
Splice sites
v  The spliceosome catalyzes the removal of introns and the ligation of
flanking exons.
v  Splicing signals (from the point of view of an intron):
•  One splice signal (site) on the 5’ end: usually GT (less common: GC)
•  And a 3’ end splice site: usually AG
•  Canonical splice sites look like this: …]5’-GT/AG-3’[…
12BIO-REFRESHER
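The splice signals above can be expressed as a small classifier. This is an illustrative sketch only: it assumes the intron is given 5' to 3' on its own strand, and the function name and the returned labels are hypothetical.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron by its first two and last two bases (5'->3')."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry both splice signals")
    donor, acceptor = intron[:2], intron[-2:]
    if donor == "GT" and acceptor == "AG":
        return "canonical"
    if donor == "GC" and acceptor == "AG":
        return "GC donor (less common)"
    return "non-canonical"


print(classify_intron("GTAAGTTTTCAG"))
print(classify_intron("GCAAGTTTTCAG"))
```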
Exons and Introns
v  Introns can interrupt the reading frame of a gene by inserting a sequence
between two consecutive codons
v  Between the first and second nucleotide of a codon
v  Or between the second and third nucleotide of a codon
"Exon and Intron classes”. Licensed under Fair use via Wikipedia
Prediction & Annotation
GENE PREDICTION & ANNOTATION
•  Identification and annotation of genome features:
–  primarily focuses on protein-coding genes.
–  also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc.
–  happens in 2 phases:
1.  Computation phase
2.  Annotation phase
COMPUTATION PHASE
a.  Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species).
b.  Gene predictions are generated:
–  ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
–  evidence-driven: also identifying domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.
Result: the single most likely coding sequence, no UTRs, no isoforms.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
ANNOTATION PHASE
Experimental data (evidence) and predictions are synthesized into gene annotations.
Result: gene models that generally include UTRs, isoforms, evidence trails.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
Figure labels: 5’ UTR, 3’ UTR
CONSENSUS GENE SETS
Gene models may be organized into sets using:
•  combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler
or
•  tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.
In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
ANNOTATION: needs some refinement
No one is perfect, least of all automated annotation.
New technologies bring new challenges:
•  Assembly errors can cause fragmented annotations
•  Limited coverage makes precise identification a difficult task
Image: www.BroadInstitute.org
MANUAL ANNOTATION: improving predictions
Manual Annotation – to the rescue.
Precise elucidation of biological features encoded in the genome requires careful examination and review.
Automated Predictions
Experimental Evidence: cDNAs, HMM domain searches, RNAseq, genes from other species.
Schiex et al. Nucleic Acids Res 2003 (31) 13: 3738-3741
GENOME CURATION: an inherently collaborative task
So many sequences, not enough hands.
Apis mellifera | Alexander Wild | www.alexanderwild.com
We have provided continuous training and support for hundreds of
geographically dispersed scientists conducting manual annotation
efforts to recover coding sequences in agreement with all
available biological evidence.
Lessons learned
•  Collaborative work distills invaluable knowledge.
•  A little training goes a long way!
Wet lab scientists can easily learn to maximize the
generation of accurate, biologically supported gene models.
Apollo
APOLLO: versatile genome annotation editing
•  Apollo is a web-based genome annotation editor, integrated with JBrowse
•  Supports real time collaboration & generates analysis-ready data
USER-CREATED ANNOTATIONS
EVIDENCE TRACKS
ANNOTATOR PANEL
BECOMING ACQUAINTED WITH APOLLO
General process of curation
1.  Select or find a region of interest, e.g. scaffold.
2.  Select appropriate evidence tracks to review the gene model.
3.  Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
4.  If necessary, adjust the gene model.
5.  Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6.  Comment and finish.
Apollo - version at i5K Workspace@NAL
4. Becoming Acquainted with Web Apollo.
The Sequence Selection Window
Sort
“Old Track Select Page”
APOLLO: annotation editing environment
•  Color by CDS frame, toggle strands, set color scheme and highlights.
•  Upload evidence files (GFF3, BAM, BigWig), combination track, sequence search track.
•  Query the genome using BLAT.
•  Navigation and zoom.
•  Search for a gene model or a scaffold.
•  Get coordinates and “rubber band” selection for zooming.
•  Login
•  User-created annotations.
•  New annotator panel.
•  Evidence Tracks
•  Stage and cell-type specific transcription data.
http://genomearchitect.org/web_apollo_user_guide
USER NAVIGATION
Annotator panel.
•  Choose appropriate evidence from list of “Tracks” on annotator panel.
•  Select & drag elements from evidence track into the ‘User-created Annotations’ area.
•  Hovering over annotation in progress brings up an information pop-up.
•  Creating a new annotation
Adding a gene model
Editing functionality
Example: Adding an exon supported by experimental data
•  RNAseq reads show evidence in support of a transcribed product that was not predicted.
•  Add exon by dragging up one of the RNAseq reads.
Editing functionality
Example: Adjusting exon boundaries supported by experimental data
Curating with Apollo
USER NAVIGATION
•  ‘Zoom to base level’ reveals the DNA Track.
•  Color exons by CDS from the ‘View’ menu.
•  Zoom in/out with keyboard: shift + arrow keys up/down
•  Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
ANNOTATING SIMPLE CASES
“Simple case”:
–  the predicted gene model is correct or nearly correct, and
–  this model is supported by evidence that completely or mostly agrees with the prediction.
–  evidence that extends beyond the predicted model is assumed to be non-coding sequence.
The following are simple modifications.
ADDING EXONS
•  A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
•  Check ‘Start’ and ‘Stop’ signals after each edit.
ADDING UTRs
If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs.
1.  Right click at the exon edge and ‘Zoom to base level’.
2.  Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.
To add a new spliced UTR to an existing annotation, also follow the procedure for adding an exon.
MATCHING EXON BOUNDARY TO EVIDENCE
To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
ORFs AND SPLICE SITES
Apollo’s editing logic (brain):
•  selects longest ORF as CDS
•  flags non-canonical splice sites
Screenshot labels: non-canonical splice sites flags; double click: selection of feature and sub-features; Evidence Tracks Area; ‘User-created Annotations’ Track; edge-matching.
SPLICE SITES
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
forward strand: 5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.
Screenshot labels: exon/intron splice site error warning; curated model.
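The reverse-strand notation on the splice sites slide is easy to misread, so here is a small sketch of the two ways a reverse-strand intron can appear left to right on screen. It assumes the intron is supplied 5' to 3' on its own (reverse) strand; the helper names are hypothetical and say nothing about how Apollo renders tracks internally.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_strand_views(intron_5to3: str):
    """Return (forward reference bases, reverse strand read 3'->5').

    Both strings are written left to right in forward-strand coordinates.
    """
    forward_reference = reverse_complement(intron_5to3)
    reverse_as_displayed = intron_5to3[::-1]  # reversed, not complemented
    return forward_reference, reverse_as_displayed


fwd, rev = reverse_strand_views("GTAAAAAG")
print(fwd)  # forward-strand bases under the intron
print(rev)  # reverse strand shown without reverse-complementing
```

With a canonical GT…AG intron as input, the second string starts with GA and ends with TG, which is the pattern shown for the reverse strand on the slide.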
‘Start’ AND ‘Stop’ SITES
Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (proteins, RNAseq).
It may be present outside the predicted gene model, within a region supported by another evidence track.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
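To make the idea of "longest possible ORF with canonical Start and Stop" concrete, here is a simplified sketch. It is not Apollo's implementation: it assumes a spliced transcript given 5' to 3', ATG as the only start, the three standard stops, and it ignores partial ORFs that run off either end. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str):
    """Return (start, end) of the longest ATG...stop ORF, or None.

    Coordinates are 0-based, end-exclusive, on the spliced transcript.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


print(longest_orf("CCATGAAACCCGGGTAGATGTAA"))
```

If the evidence points to a different in-frame start, the curator overrides this automatic choice, as described on the slide.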
CHECKING EXON INTEGRITY
1.  Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.
2.  Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data.
Use square [ ] brackets to scroll from exon to exon.
Use curly { } brackets to scroll from annotation to annotation.
3.  Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
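Edge-matching as described above amounts to comparing exon boundary coordinates between tracks. A minimal sketch, assuming exons are (start, end) tuples in the same coordinate system; the function and key names are hypothetical and the red-bar display itself is Apollo's, not reproduced here.

```python
def matching_edges(annotation_exons, evidence_exons):
    """Report, per annotated exon, which edges coincide with an evidence edge."""
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    report = []
    for start, end in annotation_exons:
        report.append({
            "exon": (start, end),
            "start_matches": start in evidence_starts,
            "end_matches": end in evidence_ends,
        })
    return report


annotation = [(100, 200), (300, 400)]
rnaseq = [(100, 200), (310, 400)]
for row in matching_edges(annotation, rnaseq):
    print(row)
```

An exon whose start or end does not match any evidence edge is a candidate for the boundary adjustments described earlier.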
Annotating complex cases
COMPLEX CASES: merge two gene predictions on the same scaffold
Evidence may support joining two or more different gene models.
Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
1.  In ‘User-created Annotations’ area shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.
2.  Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3.  Check the resulting translation by querying a protein database e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
COMPLEX CASES: split a gene prediction
One or more splits may be recommended when:
–  different segments of the predicted protein align to two or more different gene families
–  the predicted protein doesn’t align to known proteins over its entire length
Transcript data may also support a split, but first verify whether they are alternative transcripts.
COMPLEX CASES: annotate frameshifts and correct single-base errors
Screenshot labels: DNA Track; ‘User-created Annotations’ Track.
Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
COMPLEX CASES: correcting selenocysteine containing proteins
COMPLEX CASES: annotating frameshifts and correcting single-base errors & selenocysteines
1.  Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
2.  If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3.  The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4.  Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.
5.  In special cases such as selenocysteine containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
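The three alteration types described above can be pictured as edits applied to a copy of the sequence while the reference stays untouched. This is only a sketch under stated assumptions: positions are 0-based indices of the nucleotide under the cursor, "to the right of the cursor" is read as "after that nucleotide", and the function name and `kind` strings are hypothetical rather than Apollo's API.

```python
def apply_alteration(reference: str, kind: str, position: int, value):
    """Return an edited copy of reference; reference itself is never modified."""
    if kind == "insertion":
        # value: bases inserted to the right of the nucleotide at position
        return reference[:position + 1] + value + reference[position + 1:]
    if kind == "deletion":
        # value: number of bases removed, starting with the one at position
        return reference[:position] + reference[position + value:]
    if kind == "substitution":
        # value: bases that replace those on the DNA track from position on
        return reference[:position] + value + reference[position + len(value):]
    raise ValueError(f"unknown alteration kind: {kind}")


genome = "ATGAAATAA"
print(apply_alteration(genome, "insertion", 2, "C"))
print(apply_alteration(genome, "deletion", 3, 1))
print(apply_alteration(genome, "substitution", 3, "G"))
print(genome)  # the 'frozen' reference is unchanged
```

Keeping the reference immutable mirrors the point made on the slide: the edits affect the derived transcript and protein sequences, not the assembly.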
USER NAVIGATION
•  Information Editor
The Annotation Information Editor
•  Add PubMed IDs
•  Include GO terms as appropriate from any of the three ontologies
•  Write comments stating how you have validated each model.
•  Keeping track of each edit
Annotations, annotation edits, and History: stored in a centralized database.
COMPLETING THE ANNOTATION
Follow the checklist until you are happy with the annotation!
And remember to…
–  comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
–  or add a comment to inform the community of unresolved issues you think this model may have.
Always Remember: Apollo curation is a community effort so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.
MANUAL ANNOTATION CHECKLIST
CHECKLIST for accuracy and integrity
•  Check ‘Start’ and ‘Stop’ sites.
•  Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
•  Check if you can annotate UTRs, for example using RNA-Seq data:
–  align it against relevant genes/gene family
–  blastp against NCBI’s RefSeq or nr
•  Check for gaps in the genome.
•  Additional functionality may be necessary:
–  merging 2 gene predictions - same scaffold
–  merging 2 gene predictions - different scaffolds
–  splitting a gene prediction
–  annotating frameshifts
–  annotating selenocysteines, correcting single-base and other assembly errors, etc.
•  Add:
–  Important project information in the form of comments
–  IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
–  Comments about the kinds of changes you made to the gene model of interest, if any.
–  Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.
Genome curation with i5k
i5K Workspace@NAL
The collaborative curation process at i5k
1.  A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. HVIT_v0.5.3-Models
2.  i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):
Attention!
•  If it’s not on either track, it won’t make the OGS!
•  If it’s there and it shouldn’t be, it will still make the OGS!
The ‘Replace Models’ rules: http://tinyurl.com/apollo-i5k-replace
3.  In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment, try choosing a different model to begin the annotation.
4.  Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area.
5.  If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor.
6.  Overlapping interests? Collaborate to reach agreement.
7.  Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY
Example
What’s new? Finding inspiration in PubMed.
“Molecular analysis of bed bug populations from across the USA and Europe
found that >80% and >95% of the respective populations contained V419L and/
or L925I mutations in the voltage-gated sodium channel gene, indicating
widespread distribution of target-site-based pyrethroid resistance.”
Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com
Halyomorpha halys | Fondazione Edmund Mach - Italy
Now for our species of interest. . .
Curation example using the Hyalella azteca genome (amphipod crustacean).
What do we know about this genome?
•  Currently publicly available data at NCBI:
–  >37,000 nucleotide seqs → scaffolds, mitochondrial genes
–  344 amino acid seqs → mitochondrion
–  47 ESTs
–  0 conserved domains identified
–  0 “gene” entries submitted
•  Data at i5K Workspace@NAL (annotation hosted at USDA)
–  10,832 scaffolds: 23,288 transcripts: 12,906 proteins
PubMed Search: what’s new?
“Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.”
“By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
How many sequences are there, publicly available,
for our gene of interest?
•  Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
•  NaCP60E (Sodium channel protein 60 E; D. melanogaster).
–  MF: voltage-gated cation channel activity (IDA, GO:0022843).
–  BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
–  CC: voltage-gated sodium channel complex (IEA, GO:0001518).
And what do we know about them?
Retrieving sequences for a sequence similarity search.
BLAT search: input
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT search: results
•  High-scoring segment pairs (hsp) are listed in tabulated format.
•  Clicking on one line of results sends you to those coordinates.
BLAST at i5K: https://i5k.nal.usda.gov/blast
BLAST at i5K: hsps in “BLAST+ Results” track
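The query pasted into BLAT and BLAST above is a FASTA record whose sequence wraps over two lines. Before submitting a query it can help to confirm that the wrapped lines join into one sequence. A minimal sketch, assuming plain FASTA text; the function name is hypothetical and no BLAT or BLAST service is contacted.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


query = """>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
"""
for name, sequence in read_fasta(query).items():
    print(name, len(sequence))
```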
Creating a new gene model: drag and drop
•  Apollo automatically calculates longest ORF.
•  In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue.
•  Note that gene is transcribed from reverse strand.
Available Tracks
Get Sequence
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Also, flanking sequences (other gene models) vs. NCBI nr
In this case, two gene models upstream, at 5’ end.
BLAST hsps
Review alignments
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
Hypothesis for vgsc gene model
Editing: merge the three models
Merge by dropping an exon or gene model onto another.
or…
Merge by selecting two exons (holding down “Shift”) and using the right click menu.
Result of merging the gene models:
Editing: correct offending splice site
Modify exon / intron boundary:
–  Drag the end of the exon to the nearest canonical splice site.
or
–  Use right-click menu.
Editing: set translation start
Editing: delete exon not supported by evidence
Delete first exon from HaztTmpM006233
Editing: add an exon supported by RNAseq
•  RNAseq reads show evidence in support of transcribed product, which was not predicted.
•  Add exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.
Editing: adjust offending splice site using evidence
Editing: adjust other boundaries supported by evidence
Finished model
Corroborate integrity and accuracy of the model:
- Start and Stop
- Exon structure and splice sites …]5’-GT/AG-3’[…
- Check the predicted protein product vs. NCBI nr, UniProt, etc.
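The first two checks in this list can be approximated in a few lines. The sketch below is a simplified illustration, not Apollo's validation: it assumes a forward-strand model in which every exon is fully coding, uses 0-based half-open coordinates, and the function name and toy sequence are made up for the example. Comparing the protein product against NCBI nr or UniProt is not covered.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_model(seq: str, exons: List[Tuple[int, int]]) -> Dict[str, bool]:
    """Run basic integrity checks on a forward-strand, all-coding model.

    `exons` is a sorted list of (start, end) 0-based half-open intervals.
    """
    seq = seq.upper()
    cds = "".join(seq[start:end] for start, end in exons)
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # Introns are the gaps between consecutive exons.
    introns = [seq[end1:start2]
               for (_, end1), (start2, _) in zip(exons, exons[1:])]
    return {
        "starts_with_atg": cds.startswith("ATG"),
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        "length_multiple_of_3": len(cds) % 3 == 0,
        "no_internal_stop": all(c not in STOP_CODONS for c in codons[:-1]),
        "canonical_splice_sites": all(
            intron.startswith("GT") and intron.endswith("AG")
            for intron in introns
        ),
    }


# Toy locus: exon "ATGGCC", intron "GTAAGTTTCAG", exon "AAATGA".
toy_locus = "ATGGCC" + "GTAAGTTTCAG" + "AAATGA"
report = check_model(toy_locus, [(0, 6), (17, 23)])
```

Each entry in `report` is `True` for this toy model; shifting the second exon start by one base breaks both the reading frame and the `AG` acceptor.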
  
Information Editor
•  DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
•  PubMed identifier: PMID: 24065824
•  Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
•  Comments
•  Name, Symbol
•  Approve / Delete radio button
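Before typing identifiers like these into the Information Editor, it can help to check that they are well formed. The patterns below are deliberately simplified assumptions (for instance, only `NP_` and `XP_` RefSeq protein prefixes are accepted) and are not part of Apollo; the helper name `is_valid` is hypothetical.

```python
import re

# Simplified patterns, assumed for illustration only.
GO_ID = re.compile(r"GO:\d{7}")               # e.g. GO:0022843
PMID = re.compile(r"PMID:\s?\d+")             # e.g. PMID: 24065824
REFSEQ_PROTEIN = re.compile(r"[NX]P_\d+\.\d+")  # e.g. NP_001128389.1


def is_valid(pattern: "re.Pattern[str]", value: str) -> bool:
    """True when the whole (whitespace-trimmed) value matches the pattern."""
    return pattern.fullmatch(value.strip()) is not None


# Identifiers taken from the slide.
go_ids = ["GO:0022843", "GO:0042048", "GO:0035725", "GO:0001518"]
all_go_valid = all(is_valid(GO_ID, go_id) for go_id in go_ids)
pmid_valid = is_valid(PMID, "PMID: 24065824")
dbxref_valid = is_valid(REFSEQ_PROTEIN, "NP_001128389.1")
```

A check like this would catch a GO identifier that was split or truncated during copy and paste, such as `GO:42048`.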
  
Example 95
Comments (if applicable)
Go play!
PUBLIC DEMO
APOLLO ON THE WEB
instructions
At i5K
1.  Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration
2.  Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
Public Honey bee demo available at: http://GenomeArchitect.org/WebApolloDemo
Username: demo@demo.com
Password: demo
APOLLO
demonstration
Demonstration video is available at https://youtu.be/VgPtAP_fvxY
OUTLINE
Apollo Collaborative Curation and Interactive Analysis of Genomes
•  BIO-REFRESHER: biological concepts for curation
•  ANNOTATION: automatic predictions
•  MANUAL ANNOTATION: necessary, collaborative
•  APOLLO: advancing collaborative curation
•  EXAMPLE: demos
Apollo Development
Nathan Dunn
Eric Yao
Christine Elsik’s Lab,
University of Missouri
Suzi Lewis
Principal Investigator
BBOP
Moni Munoz-Torres, Colin Diesh, Deepak Unni
JBrowse. Ian Holmes’ Lab
University of California, Berkeley
•  Berkeley Bioinformatics Open-source Projects (BBOP),
Berkeley Lab: Apollo and Gene Ontology teams.
Suzanna E. Lewis (PI).
•  § Christine G. Elsik (PI). University of Missouri.
•  * Ian Holmes (PI). University of California Berkeley.
•  Arthropod genomics community & i5K Steering
Committee.
•  Stephen Ficklin, GenSAS, Washington State University
•  Apollo is supported by NIH grants 5R01GM080203
from NIGMS, and 5R01HG004483 from NHGRI. Also
supported by the Director, Office of Science, Office of
Basic Energy Sciences, of the U.S. Department of
Energy under Contract No. DE-AC02-05CH11231
•  For your attention, thank you!
Apollo
Nathan Dunn
Colin Diesh §
Deepak Unni §
Gene Ontology
Chris Mungall
Seth Carbon
Heiko Dietze
BBOP
Learn more about Apollo at http://GenomeArchitect.org
Thank you!
NAL at USDA
Monica Poelchau
Mei-Ju Chen
Christopher Childers
Gary Moore
HGSC at BCM
fringy Richards
Kim Worley
JBrowse Eric Yao *
Deploying DAPHNE Computational Intelligence on EuroHPC Vega for Benchmarking ...
 
A slightly oblate dark matter halo revealed by a retrograde precessing Galact...
A slightly oblate dark matter halo revealed by a retrograde precessing Galact...A slightly oblate dark matter halo revealed by a retrograde precessing Galact...
A slightly oblate dark matter halo revealed by a retrograde precessing Galact...
 
Bragg Brentano Alignment for D4 with LynxEye Rev3.pptx
Bragg Brentano Alignment for D4 with LynxEye Rev3.pptxBragg Brentano Alignment for D4 with LynxEye Rev3.pptx
Bragg Brentano Alignment for D4 with LynxEye Rev3.pptx
 
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
 
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptxsmallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
smallintestinedisorders-causessymptoms-240626051934-b669b27d.pptx
 
Phytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with PhytoremediationPhytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with Phytoremediation
 

Introduction to Apollo for i5k

  • 1. Introduction to Apollo Collaborative genome annotation editing A webinar for the i5K Research Community - Hemiptera Monica Munoz-Torres | @monimunozto Berkeley Bioinformatics Open-Source Projects (BBOP) Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory i5k Pilot Project Species Calls | 9 February, 2016 http://GenomeArchitect.org
  • 2. Outline • Today you will discover effective ways to extract valuable information about a genome through curation efforts. Apollo Collaborative Curation and Interactive Analysis of Genomes
  • 3. After this talk you will... • Better understand ‘curation’ in the context of genome annotation: assembled genome → automated annotation → manual annotation • Become familiar with Apollo’s environment and functionality. • Learn to identify homologs of known genes of interest in your newly sequenced genome. • Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.
  • 4. Experimental design, sampling. Comparative analyses Official / Merged Gene Set Manual Annotation Automated Annotation Sequencing Assembly Synthesis & dissemination. This is our focus.
  • 5. We must care about curation Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild The gene set of an organism informs a variety of studies: •  Characterization: Gene number, GC%, TEs, repeats. •  Functional assignments. •  Molecular evolution, sequence conservation. •  Gene families. •  Metabolic pathways. •  What makes an organism what it is? What makes a bee a “bee”?
  • 6. Genome Curation Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses. Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data. Apollo Gene Ontology Resources
  • 7. A few things to remember when conducting manual annotation. To remember… Biological concepts to better understand manual annotation. BIO-REFRESHER • KEEP A GLOSSARY HANDY: from contig to splice site • WHAT IS A GENE? defining your goal • TRANSCRIPTION: mRNA in detail • TRANSLATION: reading frames, etc. • GENOME CURATION: steps involved
  • 8. The gene: a “moving target” “The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.” Gerstein et al., 2007. Genome Res
  • 9. BIO-REFRESHER mRNA • Although mRNAs exist only briefly, understanding them is crucial, as they will become the center of your work. Image: "Gene structure" by Daycd, Wikimedia Commons.
  • 10. BIO-REFRESHER Reading frames • In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF). • ORF = Start signal + coding sequence (divisible by 3) + Stop signal
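
The ORF definition above (Start signal + coding sequence divisible by 3 + Stop signal) can be sketched as a small check. This is an illustration only, not part of Apollo: the function name `is_complete_orf` is hypothetical, and the sketch assumes the standard genetic code (ATG start; TAA, TAG, TGA stops) and a coding sequence with introns already removed.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds):
    """Return True if `cds` starts with ATG, ends with a stop codon,
    has a length divisible by 3, and has no in-frame stop before the end."""
    cds = cds.upper()
    # A complete ORF needs at least a start and a stop codon.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # A premature in-frame stop means the frame is not open.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_complete_orf("ATGAAATAA"))
print(is_complete_orf("ATGTAAAAATAA"))
```

The second call contains an in-frame stop right after the start codon, so it fails the check even though its length is divisible by 3.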
  • 11. BIO-REFRESHER Splice sites • The spliceosome catalyzes the removal of introns and the ligation of flanking exons. • Splicing signals (from the point of view of an intron): • One splice signal (site) on the 5’ end: usually GT (less common: GC) • And a 3’ end splice site: usually AG • Canonical splice sites look like this: …]5’-GT/AG-3’[…
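
The splice signals above can be checked directly on a sequence. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, coordinates are 0-based and end-exclusive, and the intron lies on the forward strand. It is not how Apollo implements its splice-site flags.

```python
def intron_splice_signals(genome, intron_start, intron_end):
    """Return the (donor, acceptor) dinucleotides of a forward-strand intron.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    intron = genome[intron_start:intron_end].upper()
    return intron[:2], intron[-2:]


def is_canonical_intron(genome, intron_start, intron_end, allow_gc_donor=False):
    """True for a GT...AG intron; optionally accept the less common GC donor."""
    donor, acceptor = intron_splice_signals(genome, intron_start, intron_end)
    donors = {"GT", "GC"} if allow_gc_donor else {"GT"}
    return donor in donors and acceptor == "AG"


# Toy example: exon "AAACCC", intron "GTAAGTTTTTCAG", exon "GGGTTT".
toy_genome = "AAACCC" + "GTAAGTTTTTCAG" + "GGGTTT"
print(is_canonical_intron(toy_genome, 6, 19))
```

The `allow_gc_donor` switch mirrors the slide's note that GC is a less common but real donor signal.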
  • 12. BIO-REFRESHER Exons and Introns • Introns can interrupt the reading frame of a gene by inserting a sequence: • between two consecutive codons • between the first and second nucleotide of a codon • or between the second and third nucleotide of a codon. Image: “Exon and Intron classes”. Licensed under Fair use via Wikipedia
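
The three intron positions above are commonly called phase 0 (between two codons), phase 1 (after the first nucleotide of a codon) and phase 2 (after the second). A minimal sketch, assuming we only know the lengths of the coding portions of consecutive exons; the function name `intron_phases` is hypothetical:

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron between coding exons.

    `cds_exon_lengths` lists the number of coding nucleotides in each exon,
    in transcript order. The phase is the number of nucleotides of the
    current codon already read when the intron begins.
    """
    phases = []
    coding_bases_so_far = 0
    # There is one intron after every exon except the last.
    for length in cds_exon_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


print(intron_phases([9, 4, 5]))
```

With coding lengths 9, 4 and 5, the first intron falls between codons and the second splits a codon after its first nucleotide.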
  • 14. GENE PREDICTION & ANNOTATION • Identification and annotation of genome features: • primarily focuses on protein-coding genes. • also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc. • happens in 2 phases: 1. Computation phase 2. Annotation phase
  • 15. GENE PREDICTION & ANNOTATION COMPUTATION PHASE a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species). b. Gene predictions are generated: - ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc. - evidence-driven: identifying also domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc. Result: the single most likely coding sequence, no UTRs, no isoforms. Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
  • 16. GENE PREDICTION & ANNOTATION ANNOTATION PHASE Experimental data (evidence) and predictions are synthesized into gene annotations. Result: gene models that generally include UTRs, isoforms, evidence trails. Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174 (figure labels: 5’ UTR, 3’ UTR)
  • 17. GENE PREDICTION & ANNOTATION CONSENSUS GENE SETS Gene models may be organized into sets using: • combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler, or • tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
  • 18. ANNOTATION needs some refinement. No one is perfect, least of all automated annotation. New technologies bring new challenges: • Assembly errors can cause fragmented annotations • Limited coverage makes precise identification a difficult task. Image: www.BroadInstitute.org
  • 19. MANUAL ANNOTATION improving predictions. Precise elucidation of biological features encoded in the genome requires careful examination and review. Schiex et al. Nucleic Acids 2003 (31) 13: 3738-3741. Automated Predictions. Experimental Evidence: cDNAs, HMM domain searches, RNAseq, genes from other species. Manual Annotation – to the rescue.
  • 20. GENOME CURATION an inherently collaborative task. GENE PREDICTION & ANNOTATION So many sequences, not enough hands. Image: Apis mellifera | Alexander Wild | www.alexanderwild.com
  • 21. APOLLO Lessons learned. We have provided continuous training and support for hundreds of geographically dispersed scientists to conduct manual annotation efforts in order to recover coding sequences in agreement with all available biological evidence. • Collaborative work distills invaluable knowledge. • A little training goes a long way! Wet lab scientists can easily learn to maximize the generation of accurate, biologically supported gene models.
  • 23. APOLLO: versatile genome annotation editing •  Apollo is a web-based genome annotation editor, integrated with JBrowse •  Supports real time collaboration & generates analysis-ready data USER-CREATED ANNOTATIONS EVIDENCE TRACKS ANNOTATOR PANEL
  • 24. BECOMING ACQUAINTED WITH APOLLO General process of curation 1. Select or find a region of interest, e.g. scaffold. 2. Select appropriate evidence tracks to review the gene model. 3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working. 4. If necessary, adjust the gene model. 5. Check your edited gene model for integrity and accuracy by comparing it with available homologs. 6. Comment and finish.
  • 25. Becoming Acquainted with Web Apollo. Apollo - version at i5K Workspace@NAL. The Sequence Selection Window
  • 26. Becoming Acquainted with Web Apollo. Apollo - version at i5K Workspace@NAL. Sort. “Old Track Select Page”
  • 27. APOLLO annotation editing environment. BECOMING ACQUAINTED WITH APOLLO • Color by CDS frame, toggle strands, set color scheme and highlights. • Upload evidence files (GFF3, BAM, BigWig), combination track, sequence search track. • Query the genome using BLAT. • Navigation and zoom. • Search for a gene model or a scaffold. • Get coordinates and “rubber band” selection for zooming. • Login. • User-created annotations. • New annotator panel. • Evidence Tracks. • Stage and cell-type specific transcription data. http://genomearchitect.org/web_apollo_user_guide
  • 28. BECOMING ACQUAINTED WITH APOLLO USER NAVIGATION Annotator panel. • Choose appropriate evidence from list of “Tracks” on annotator panel. • Select & drag elements from evidence track into the ‘User-created Annotations’ area. • Hovering over annotation in progress brings up an information pop-up. • Creating a new annotation
  • 29. Adding a gene model
  • 30. Adding a gene model
  • 31. Adding a gene model
  • 33. Editing functionality Example: Adding an exon supported by experimental data •  RNAseq reads show evidence in support of a transcribed product that was not predicted. •  Add exon by dragging up one of the RNAseq reads.
  • 34. Editing functionality Example: Adjusting exon boundaries supported by experimental data
  • 36. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • ‘Zoom to base level’ reveals the DNA Track.
  • 37. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Color exons by CDS from the ‘View’ menu.
  • 38. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Zoom in/out with keyboard: shift + arrow keys up/down. • Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
  • 40. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES ANNOTATING SIMPLE CASES “Simple case”: - the predicted gene model is correct or nearly correct, and - this model is supported by evidence that completely or mostly agrees with the prediction. - evidence that extends beyond the predicted model is assumed to be non-coding sequence. The following are simple modifications.
  • 41. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES ADDING EXONS • A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated. • Check ‘Start’ and ‘Stop’ signals after each edit.
  • 42. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES ADDING UTRs If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs. 1. Right click at the exon edge and ‘Zoom to base level’. 2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR. To add a new spliced UTR to an existing annotation, also follow the procedure for adding an exon.
  • 43. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES MATCHING EXON BOUNDARY TO EVIDENCE To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate. In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
  • 44. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES ORFs AND SPLICE SITES Apollo’s editing logic (brain): • selects longest ORF as CDS • flags non-canonical splice sites. Figure labels: Evidence Tracks Area; ‘User-created Annotations’ Track; Edge-matching; Non-canonical splice sites flags; Double click: selection of feature and sub-features.
  • 45. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES SPLICE SITES Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon. Canonical splice sites: forward strand: 5’-…exon]GT / AG[exon…-3’; reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’. Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment. Figure labels: Exon/intron splice site error warning; Curated model.
  • 46. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES ‘Start’ AND ‘Stop’ SITES Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons. If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further upstream or downstream, depending on evidence (proteins, RNAseq). The correct ‘Start’ may be present outside the predicted gene model, within a region supported by another evidence track. In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
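
To make the idea of a "longest possible ORF" concrete, here is a minimal sketch that scans the three forward frames of an already spliced transcript. It is not Apollo's implementation: the function name `longest_orf` is hypothetical, and the sketch assumes the standard genetic code with a canonical ATG start.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG...stop ORF, or None.

    Coordinates are 0-based and end-exclusive on the spliced transcript;
    only the three forward frames are scanned.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # First in-frame start codon since the previous stop.
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


print(longest_orf("ATGTAACATGAAAAAATGA"))
```

In the usage example the transcript holds a short ORF at the very start and a longer one in another frame; the function reports the longer one, which is the behavior a curator should then check against protein and RNAseq evidence.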
  • 47. BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES CHECKING EXON INTEGRITY 1. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges. 2. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. Use curly { } brackets to scroll from annotation to annotation. 3. Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
  • 49. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES merge two gene predictions on the same scaffold. Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions! 1. In the ‘User-created Annotations’ area shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu. 2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models. 3. Check the resulting translation by querying a protein database, e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge. Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
  • 50. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES split a gene prediction. One or more splits may be recommended when: - different segments of the predicted protein align to two or more different gene families - the predicted protein doesn’t align to known proteins over its entire length. Transcript data may support a split, but first verify whether they are alternative transcripts.
  • 51. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES annotate frameshifts and correct single-base errors. Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself. Figure labels: DNA Track; ‘User-created Annotations’ Track.
  • 52. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES correcting selenocysteine-containing proteins
  • 53. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES correcting selenocysteine-containing proteins
  • 54. BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES annotating frameshifts and correcting single-base errors & selenocysteines 1. Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence. 2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions. 3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track. 4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits. 5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
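
The three kinds of sequence alteration described above can be illustrated on a plain string. This is a sketch of the coordinate conventions as the slide words them (insertion to the right of the cursor, deletion starting at the cursor, substitution replacing bases from the cursor onward), not Apollo's code; the function name `apply_alteration` and the 0-based cursor position are assumptions made for the example.

```python
def apply_alteration(sequence, kind, cursor, value):
    """Return an altered copy of `sequence`; the original is left unchanged.

    `cursor` is the 0-based index of the nucleotide under the cursor.
    `value` is a string of residues for insertions and substitutions,
    or an integer length for deletions.
    """
    if kind == "insertion":
        # Inserted residues go to the right of the cursor position.
        return sequence[:cursor + 1] + value + sequence[cursor + 1:]
    if kind == "deletion":
        # The deletion starts with the nucleotide under the cursor.
        return sequence[:cursor] + sequence[cursor + value:]
    if kind == "substitution":
        # Replace as many residues as the replacement string is long.
        return sequence[:cursor] + value + sequence[cursor + len(value):]
    raise ValueError("unknown alteration kind: " + kind)


genome = "ACGT"
print(apply_alteration(genome, "insertion", 1, "TT"))
print(apply_alteration(genome, "deletion", 1, 2))
print(apply_alteration(genome, "substitution", 2, "A"))
print(genome)
```

Because each call returns a new string, the `genome` variable still holds the original sequence afterwards, which mirrors the point that these edits do not change the underlying assembly.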
  • 55. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Information Editor
  • 56. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO The Annotation Information Editor
  • 57. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO The Annotation Information Editor • Add PubMed IDs • Include GO terms as appropriate from any of the three ontologies • Write comments stating how you have validated each model.
  • 58. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO • Keeping track of each edit
  • 59. USER NAVIGATION BECOMING ACQUAINTED WITH APOLLO Annotations, annotation edits, and History: stored in a centralized database.
  • 60. BECOMING ACQUAINTED WITH APOLLO COMPLETING THE ANNOTATION Follow the checklist until you are happy with the annotation! And remember to… - comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence. - or add a comment to inform the community of unresolved issues you think this model may have. Always Remember: Apollo curation is a community effort so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.
  • 62. MANUAL ANNOTATION CHECKLIST for accuracy and integrity • Check ‘Start’ and ‘Stop’ sites. • Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[… • Check if you can annotate UTRs, for example using RNA-Seq data: - align it against relevant genes/gene family - blastp against NCBI’s RefSeq or nr • Check for gaps in the genome. • Additional functionality may be necessary: - merging 2 gene predictions - same scaffold - merging 2 gene predictions - different scaffolds - splitting a gene prediction - annotating frameshifts - annotating selenocysteines, correcting single-base and other assembly errors, etc. • Add: - Important project information in the form of comments - IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert. - Comments about the kinds of changes you made to the gene model of interest, if any. - Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.
  • 64. i5K Workspace@NAL The collaborative curation process at i5k 1. A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. HVIT_v0.5.3-Models 2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS). Achtung! • If it’s not on either track, it won’t make the OGS! • If it’s there and it shouldn’t be, it will still make the OGS!
  • 65. BECOMING ACQUAINTED WITH APOLLO The ‘Replace Models’ rules http://tinyurl.com/apollo-i5k-replace
  • 66. i5K Workspace@NAL The collaborative curation process at i5k 3. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment; try choosing a different model to begin the annotation. 4. Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area. 5. If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor. 6. Overlapping interests? Collaborate to reach agreement. 7. Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY
  • 68. Example. What’s new?... finding inspiration in PubMed. “Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.” Now for our species of interest... Images: Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com; Halyomorpha halys | Fondazione Edmund Mach - Italy
  • 69. Example. Curation example using the Hyalella azteca genome (amphipod crustacean).
  • 70. Example. What do we know about this genome? • Currently publicly available data at NCBI: • >37,000 nucleotide seqs → scaffolds, mitochondrial genes • 344 amino acid seqs → mitochondrion • 47 ESTs • 0 conserved domains identified • 0 “gene” entries submitted • Data at i5K Workspace@NAL (annotation hosted at USDA): 10,832 scaffolds; 23,288 transcripts; 12,906 proteins
  • 71. Example. PubMed Search: what’s new?
  • 72. Example. PubMed Search: what’s new? “Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.” “By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.” “The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
  • 73. Example. How many sequences are there, publicly available, for our gene of interest? And what do we know about them? • Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis). • NaCP60E (Sodium channel protein 60 E; D. melanogaster). - MF: voltage-gated cation channel activity (IDA, GO:0022843). - BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725). - CC: voltage-gated sodium channel complex (IEA, GO:0001518).
  • 74. Example. Retrieving sequences for a sequence similarity search. >vgsc-Segment3-DomainII RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 75. Example. BLAT search input. >vgsc-Segment3-DomainII RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 76. Example. BLAT search results. • High-scoring segment pairs (hsp) are listed in tabulated format. • Clicking on one line of results sends you to those coordinates.
  • 77. Example. BLAST at i5K https://i5k.nal.usda.gov/blast >vgsc-Segment3-DomainII RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 78. Example. BLAST at i5K https://i5k.nal.usda.gov/blast
  • 79. Example. BLAST at i5K: hsps in “BLAST+ Results” track
  • 80. Example. Creating a new gene model: drag and drop. • Apollo automatically calculates longest ORF. • In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue. • Note that gene is transcribed from reverse strand.
  • 83. Example. Also, flanking sequences (other gene models) vs. NCBI nr. In this case, two gene models upstream, at 5’ end. Figure label: BLAST hsps
  • 84. Example. Review alignments. Gene models: HaztTmpM006234, HaztTmpM006233, HaztTmpM006232
  • 85. Example. Hypothesis for vgsc gene model
  • 86. Example. Editing: merge the three models. Merge by dropping an exon or gene model onto another, or merge by selecting two exons (holding down “Shift”) and using the right click menu.
  • 87. Example. Result of merging the gene models
  • 88. Example. Editing: correct offending splice site. Modify exon / intron boundary: - Drag the end of the exon to the nearest canonical splice site, or - Use right-click menu.
  • 89. Example. Editing: set translation start
  • 90. Example. Editing: delete exon not supported by evidence. Delete first exon from HaztTmpM006233
  • 91. Example. Editing: add an exon supported by RNAseq. • RNAseq reads show evidence in support of transcribed product, which was not predicted. • Add exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.
  • 92. Example. Editing: adjust offending splice site using evidence
  • 93. Example. Editing: adjust other boundaries supported by evidence
  • 94. Example. Finished model. Corroborate integrity and accuracy of the model: - Start and Stop - Exon structure and splice sites …]5’-GT/AG-3’[… - Check the predicted protein product vs. NCBI nr, UniProt, etc.
  • 95. Example. Information Editor • DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq • PubMed identifier: PMID: 24065824 • Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518. • Comments (if applicable) • Name, Symbol • Approve / Delete radio button
  • 97. PUBLIC DEMO APOLLO ON THE WEB instructions. At i5K: 1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration 2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
  • 98. PUBLIC DEMO APOLLO ON THE WEB instructions. Public Honey bee demo available at: http://GenomeArchitect.org/WebApolloDemo Username: demo@demo.com Password: demo
  • 99. PUBLIC DEMO APOLLO demonstration. Demonstration video is available at https://youtu.be/VgPtAP_fvxY
  • 100. OUTLINE Apollo Collaborative Curation and Interactive Analysis of Genomes • BIO-REFRESHER: biological concepts for curation • ANNOTATION: automatic predictions • MANUAL ANNOTATION: necessary, collaborative • APOLLO: advancing collaborative curation • EXAMPLE: demos
  • 101. Apollo Development Nathan Dunn Eric Yao Christine Elsik’s Lab, University of Missouri Suzi Lewis Principal Investigator BBOP Moni Munoz-Torres Colin Diesh Deepak Unni JBrowse. Ian Holmes’ Lab University of California, Berkeley
  • 102. •  Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI). •  § Christine G. Elsik (PI). University of Missouri. •  * Ian Holmes (PI). University of California Berkeley. •  Arthropod genomics community & i5K Steering Committee. •  Stephen Ficklin, GenSAS, Washington State University •  Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 •  For your attention, thank you! Apollo Nathan Dunn Colin Diesh § Deepak Unni § Gene Ontology Chris Mungall Seth Carbon Heiko Dietze BBOP Learn more about Apollo at http://GenomeArchitect.org Thank you! NAL at USDA Monica Poelchau Mei-Ju Chen Christopher Childers Gary Moore HGSC at BCM fringy Richards Kim Worley JBrowse Eric Yao *