Comparative Genomics and Visualisation – Part 2
Leighton Pritchard
  
Part 2
• Part 1
  • Experimental Comparative Genomics
  • Bulk and Whole Genome Comparisons
• Genome Features
• Who let the –logues out?
• Finishing The Hat
  
Genome Features
• Genes:
  • translation start
  • introns
  • exons
  • translation stop
  • translation terminator
• ncRNA:
  • tRNA – transfer RNA
  • rRNA – ribosomal RNA
  • CRISPRs – bacterial and archaeal defence (genome editing)
  • many other classes (including enhancers)
  
Genome Features
• Regulatory sites
  • Transcription start site (TSS)
  • RNA polymerase binding sites
  • Transcription Factor Binding Sites (TFBS)
  • Core, proximal and distal promoter regions
• Repetitive Regions and Mobile Elements
  • Tandem repeats
  • (retro-)transposable elements
    • Alu has ≈50,000 active copies in human genome
  • Phage inclusion (bacteria/archaea)
Pennacchio & Rubin (2001) Nat. Rev. Genet. doi:10.1038/35052548
[Figure: human v mouse comparison]
  
Genome Feature Identification
• Gene Finding:
  1. Empirical (evidence-based) methods:
    • Inference from known protein/cDNA/mRNA/EST sequence
    • Inference from mapped RNA reads
  2. Ab initio methods:
    • Identification of sequences associated with gene features:
      • TSS, CpG islands, Shine-Dalgarno sequence, stop codons, etc.
  3. Inference from genome comparisons/conservation
Liang et al. (2009) Genome Res. doi:10.1101/gr.088997.108
Brent (2007) Nat. Biotech. doi:10.1038/nbt0807-883
Korf (2004) BMC Bioinf. doi:10.1186/1471-2105-5-59
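As a rough illustration of the ab initio idea (recognising gene-associated signals in the sequence alone), the sketch below scans the forward strand for open reading frames that begin with ATG and end at the first in-frame stop codon. It is a toy under stated assumptions: the function name, the `min_codons` parameter and the example sequence are invented for illustration, the reverse strand is ignored, and real gene callers use much richer statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end) half-open coordinates of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included). Hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


# Made-up example sequence with one short ORF in the third frame.
example_orfs = find_orfs("CCATGAAATTTTAGGG")
```

Evidence-based and comparative methods would then be used to confirm or reject candidates like these.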
  
Genome Feature Identification
• Finding Regulatory Elements (short, degenerate):
  1. Empirical (evidence-based) methods:
    • Inference from protein-DNA binding experiments
    • Inference from coexpression
  2. Ab initio methods:
    • Identification of regulatory motifs (profile/other methods):
      • TATA, sigma-factor binding sites, etc.
    • statistical overrepresentation
    • Identification from sequence properties
  3. Inference from sequence conservation/genome comparisons
Zhang et al. (2011) BMC Bioinf. doi:10.1186/1471-2105-12-238
Kilic et al. (2013) Nucl. Acids Res. doi:10.1093/nar/gkt1123
Vavouri & Elgar (2005) Curr. Op. Genet. Devel. doi:10.1016/j.gde.2005.05.002
  
Genome Feature Identification
• All prediction methods result in errors
• All experiments have error
• Genome comparisons can help correct errors
• [OPTIONAL ACTIVITY] – useful for exercise
  • predict_CDS.md Markdown
• Other options for prokaryotic genecalling:
  • Glimmer (http://ccb.jhu.edu/software/glimmer/index.shtml)
  • GeneMarkS (http://opal.biology.gatech.edu/)
  • RAST (http://rast.nmpdr.org/)
  • BASys (https://www.basys.ca/), etc.
• Options for eukaryotic genecalling:
  • GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/)
  • GeneMarkES (http://opal.biology.gatech.edu/gmseuk.html)
  • Augustus (http://augustus.gobics.de/), etc.
  
Who Let The -logues Out?
Evolutionary relationships of genome features can be complex.
We require precise terms to describe relationships between genome features.
  
Comparing Gene Features
• Given gene annotations for more than one genome, how can we organise and understand relationships?
  • Functional similarity (analogy)
  • Evolutionary common origin (homology, orthology, etc.)
  • Evolutionary/functional/family relationships (paralogy)
Terms first suggested by Fitch (1970) Syst. Zool. doi:10.2307/2412448
  
Attack of the –logues
• Technical terms describing evolutionary relationships
• Homologues: elements that are similar because they share a common ancestor (NOTE: There are NOT degrees of homology!)
• Analogues: elements that are (functionally?) similar, possibly through convergent evolution and not by sharing common ancestry
• Orthologues: homologues that diverged through speciation
• Paralogues: homologues that diverged through duplication within the same genome
• (also co-orthologues, xenologues, etc.)
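The orthologue/paralogue definitions can be made concrete with a minimal sketch, assuming a gene tree whose internal nodes are already labelled as speciation or duplication events (in practice that labelling is itself an inference). The tuple encoding, the example tree and the function names are illustrative assumptions: a pair of genes is called orthologous if its last common ancestor is a speciation node, and paralogous if it is a duplication node.

```python
# Hypothetical gene tree: internal nodes are (event, left, right) tuples,
# leaves are "species:gene" labels. Event labels are assumed to be known.
TREE = (
    "speciation",
    ("duplication", "species1:A", "species1:A'"),
    "species2:A",
)


def leaves(node):
    """Return the leaf labels found below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every leaf pair by the event at its last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "orthologues" if event == "speciation" else "paralogues"
    # Leaves on opposite sides of this node have this node as their LCA.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


relationships = classify_pairs(TREE)
```

In this toy tree both species 1 copies are orthologues of the species 2 gene, yet they are paralogues of each other, which is one way to see why orthology cannot be treated as transitive.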
  
Attack of the –logues
[Figure labels: time; ancestral genome feature; genome]
  
Attack of the –logues
[Figure: time axis; a speciation event splits ancestor:iA into species1:iA and species2:iA, labelled orthologues; label: genome]
• Orthologues: homologues that diverged through speciation
  
Attack of the –logues
[Figure: time axis; a duplication event splits ancestral copy:A into copy 1:A and copy 2:A’, labelled paralogues; label: genome]
• Paralogues: homologues that diverged through duplication within the same genome
  
Attack of the –logues
[Figure labels: time; speciation; duplication; ancestor:iA; species1:iA, species2:iA; species1:iA’, species1:iA, species2:iA; orthologues; out-paralogues; in-paralogues; genome]
  
Attack of the –logues
[Figure labels: time; speciation; duplication; ancestor:iA; species1:iA, species2:iA; species1:iA’, species1:iA, species2:iA, species2:iA’; in-paralogues (within each species); out-paralogues; orthologues; genome]
  
Attack of the –logues
• BUT: biology is not well-behaved: relationships can be difficult to infer
  • Gene loss occurs
  • Homologues can diverge – sometimes very widely: hard to recognise
  • Reconstructed evolutionary trees for speciation events may not be robust
Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
[Figure labels: genome; extensive divergence]
  
Attack of the –logues
[Figure labels: time; speciation; duplication; historical events: ancestor:iA; species1:iA, species2:iA; species1:iA’, species1:iA, species2:iA, species2:iA’; contemporary sequence: species1:iA?, species1:iA, species2:iA?; in-paralogues; (co-)orthologues?; out-paralogues/co-orthologues?]
Current classifications of orthology/paralogy are inferences
  
Attack of the –logues
• BUT: biology is not well-behaved: relationships can be difficult to infer
  • Gene loss occurs
  • Homologues can diverge – sometimes very widely: hard to recognise
  • Reconstructed evolutionary trees for speciation events may not be robust
• Some resources and tools ‘bend’ definitions, e.g. Ensembl Compara and OrthoMCL.
  http://www.ensembl.org/info/genome/compara/homology_method.html
Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
  
Note on “Orthology”
• Frequently abused/misused as a term
• “Orthology” is an evolutionary relationship, often bent into service as a functional descriptor
• Strictly defined only for two species or clades!
  • (cf. OrthoMCL, etc.)
• Orthology is not transitive (A is an orthologue of C and B is an orthologue of C does not imply A is an orthologue of B)
  • (cf. EnsemblCompara definitions)
Storm & Sonnhammer (2002) Bioinformatics. doi:10.1093/bioinformatics/18.1.92
  
Ensembl Compara definitions
• within_species_paralog: same-species paralogue (in-paralogue)
• ortholog_one2one: orthologue
• ortholog_one2many: orthologue/paralogue relationship
• ortholog_many2many: orthologue/paralogue relationship
Vilella et al. (2009) Genome Res. doi:10.1101/gr.073585.107
NOTE: the taxonomy may not always be correct…
  
“The Ortholog Conjecture”
Without duplication, a gene is unlikely to change its basic function, because this would lead to loss of the original function, and this would be harmful.
  
Problems with the Ortholog Conjecture
• Nehrt et al. (2011) say:
  • Paralogues better predictor of function than orthologues
    • ∴ conjecture is false!
  • Cellular context better for protein function inference
  • Function defined from Gene Ontology (GO)
Nehrt et al. (2011) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002073
Chen et al. (2012) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002784
  
Problems with the Ortholog Conjecture
• But do we understand function well enough to test the conjecture?
• Chen et al. (2012) say: “No”
  • “examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems […] make the current GO inappropriate for testing the ortholog conjecture”
  • Expression levels are more similar for orthologues than paralogues (but is this “function”…?)
Nehrt et al. (2011) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002073
Chen et al. (2012) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002784
  
Finding “Orthologues”
The process of finding evolutionary (and/or functional) equivalents of genes across two or more organisms’ genomes.
  
Why are “orthologues” so important?
• Orthology formalises the concept of corresponding genes across multiple organisms.
  • Evolutionary
  • Functional? (“The Ortholog Conjecture”)
• Applications in:
  • Comparative genomics
  • Functional genomics
  • Phylogenetics, …
• Many (>35) databases attempt to describe orthologous relationships
  • http://questfororthologs.org/orthology_databases
Dessimoz (2011) Brief. Bioinf. doi:10.1093/bib/bbr057
  
How to find orthologues?
• Many published methods and databases:
  • Pairwise between two genomes:
    • RBBH (aka BBH, RBH, etc.), RSD, InParanoid, RoundUp
  • Multi-genome
    • Graph-based: COG, eggNOG, OrthoDB, OrthoMCL, OMA, MultiParanoid
    • Tree-based: TreeFam, Ensembl Compara, PhylomeDB, LOFT
• Methods may apply different - or refined - definitions of orthology, paralogy, etc.
Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
Trachana et al. (2011) Bioessays doi:10.1002/bies.201100062
Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
  
Pairwise approaches
• S1, S2 are the gene sequence sets from two organisms
• Compare S1 to S2, and identify the most similar pairs of sequences: these are “orthologues” (or “putative orthologues”).
• Many similarity measures possible (which threshold: E-value, bit score, coverage…?):
  • Reciprocal best BLAST hit (RBBH) – used by e.g. InParanoid
  • Reciprocal smallest distance (RSD) – used by e.g. RoundUp
  • and so on…
• Can be extended to multi-organism clusters by graph-based approaches
Östlund et al. (2009) Nuc. Acids Res. doi:10.1093/nar/gkp931
DeLuca et al. (2012) Bioinf. doi:10.1093/bioinformatics/bts006
  
Reciprocal Best BLAST Hits
• S1, S2 are the gene sequence sets from two organisms
• BLASTP:
  • Query=S1, Subject=S2
  • Query=S2, Subject=S1
• Optionally filter BLAST hits (e.g. on %identity and %coverage)
• Find all pairs of sequences {GS1n, GS2n} in S1, S2 where GS1n is the best BLAST match to GS2n and GS2n is the best BLAST match to GS1n.
[Figure: a pair that are each other's best hit is accepted (✔); a pair where one direction is only the 2nd best hit is rejected (✘)]
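The RBBH procedure can be sketched in a few lines, assuming the two BLASTP searches have already been run and parsed into (query, subject, bit score) tuples. The function names, the tuple layout and the example hits are assumptions made for illustration; this is not the code of the find_rbbh.ipynb activity. Ties are resolved in favour of the first hit seen, and no %identity or %coverage filter is applied.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return (S1 gene, S2 gene) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)  # query=S1, subject=S2
    reverse = best_hits(reverse_hits)  # query=S2, subject=S1
    return sorted(
        (s1_gene, s2_gene)
        for s1_gene, s2_gene in forward.items()
        if reverse.get(s2_gene) == s1_gene
    )


# Made-up bit scores for two genes per organism.
s1_vs_s2 = [
    ("GS1_1", "GS2_1", 250.0), ("GS1_1", "GS2_2", 90.0),
    ("GS1_2", "GS2_1", 120.0), ("GS1_2", "GS2_2", 80.0),
]
s2_vs_s1 = [
    ("GS2_1", "GS1_1", 248.0), ("GS2_1", "GS1_2", 118.0),
    ("GS2_2", "GS1_1", 91.0), ("GS2_2", "GS1_2", 79.0),
]
pairs = reciprocal_best_hits(s1_vs_s2, s2_vs_s1)
```

Here GS1_2 has GS2_1 as its best hit, but GS2_1 prefers GS1_1, so only the GS1_1/GS2_1 pair is reported.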
  
Reciprocal Best BLAST Hits
• Advantages:
  • quick
  • easy
  • performs surprisingly well (see later…)
• Disadvantages:
  • misses paralogues
  • not good at identifying gene families or *-to-many relationships without more detailed analysis.
  • no strong theoretical/phylogenetic basis.
  
COG
• COG (Clusters of Orthologous Groups; now POG, KOG, eggNOG etc.)
• Graph extension of RBBH to clusters of mutual RBBH
  • “Any group of at least three proteins from different genomes, more similar to each other than any other proteins from those genomes, are an orthologous family.”
  • Conduct RBBH
  • Collapse paralogues
  • Detect “triangles”
  • Merge triangles having common side
  • Manual curation
• Databases have many outparalogues
Tatusov et al. (2000) Nucl. Acids Res. doi:10.1093/nar/28.1.33
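The "detect triangles, then merge triangles having a common side" steps can be sketched as follows, assuming the RBBH pairs are already available as edges between "genome:protein" labels. The function names and the example edges are hypothetical, paralogue collapsing and manual curation are left out, and this is a simplified reading of the procedure rather than the COG implementation.

```python
from itertools import combinations


def find_triangles(edges):
    """Return every set of three proteins that are pairwise RBBH.

    Because RBBH edges only join proteins from different genomes, the three
    members of a triangle necessarily come from three different genomes.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for a, b in edges:
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two proteins) into clusters."""
    triangles = sorted(triangles, key=sorted)
    parent = list(range(len(triangles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)
    clusters = {}
    for i, triangle in enumerate(triangles):
        clusters.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Made-up RBBH edges across four genomes.
rbbh_edges = [
    ("genomeA:p1", "genomeB:p1"), ("genomeA:p1", "genomeC:p1"),
    ("genomeB:p1", "genomeC:p1"), ("genomeB:p1", "genomeD:p1"),
    ("genomeC:p1", "genomeD:p1"), ("genomeA:p2", "genomeB:p2"),
]
cog_like_clusters = merge_triangles(find_triangles(rbbh_edges))
```

The isolated genomeA:p2/genomeB:p2 pair never forms a triangle, so it does not appear in any cluster.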
  
MCL
• MCL constructs a network from all-vs-all BLAST results
• Then applies matrix operations: expansion and inflation
• Iterative expansion and inflation until network convergence
Enright et al. (2002) Nucl. Acids Res. doi:10.1093/nar/30.7.1575
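The expansion/inflation loop can be illustrated with a small pure-Python sketch, assuming the BLAST network has been reduced to a symmetric matrix of non-negative edge weights. The function names, the default inflation value of 2.0, the convergence tolerance and the example graph are illustrative assumptions; the real `mcl` program is far more efficient and has additional options.

```python
def normalise_columns(matrix):
    """Scale each column so that it sums to 1 (a column-stochastic matrix)."""
    n = len(matrix)
    totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / totals[c] for c in range(n)] for r in range(n)]


def expand(matrix):
    """Expansion: square the matrix, i.e. take two-step random walks."""
    n = len(matrix)
    return [
        [sum(matrix[r][k] * matrix[k][c] for k in range(n)) for c in range(n)]
        for r in range(n)
    ]


def inflate(matrix, power):
    """Inflation: raise each entry to a power, then renormalise columns."""
    return normalise_columns([[value ** power for value in row] for row in matrix])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes by alternating expansion and inflation until stable."""
    n = len(adjacency)
    # Self-loops are added so that every column has a non-zero total.
    matrix = normalise_columns(
        [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    )
    for _ in range(max_iter):
        updated = inflate(expand(matrix), inflation)
        change = max(
            abs(updated[r][c] - matrix[r][c]) for r in range(n) for c in range(n)
        )
        matrix = updated
        if change < tol:
            break
    # Each row that retains weight lists the members of one cluster.
    clusters = set()
    for r in range(n):
        members = frozenset(c for c in range(n) if matrix[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


# Made-up network: two triangles joined by one weak edge (weight 0.1).
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(TWO_TRIANGLES)
```

Raising the inflation parameter tends to give smaller, tighter clusters, which is the main tuning choice when applying MCL to sequence similarity networks.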
  
MCL
[Figure: Input → alternating Expansion and Inflation steps … → Clustering]
  
OrthoMCL
• http://orthomcl.org/orthomcl/
  1. Defines potential inparalogue, orthologue and co-orthologue pairs (using RBBH! – see algorithm description in papers directory)
  2. Applies MCL to cluster inparalogue, orthologue, co-orthologue pairs
• Output clusters include both orthologues and paralogues
Li et al. (2003) Genome Res. doi:10.1101/gr.1224503
  
Notes of Caution
• BLAST-based orthology methods (e.g. RBBH, InParanoid, COG) are fast!
• But they have some drawbacks:
  • No guarantee that sequence matches are transitive (A may match B at a domain differently than B matches C)
  • No evolutionary distance model
  • Multiple domain matches are not accounted for
• These methods find similar sequences, then make assumptions based on similarity and number of matches. They do not detect orthologues directly!
• Tree-based methods incorporate:
  • Evolutionary distance
  • Direct orthologue detection
  
Finding “Orthologues”
• Pairwise analysis: RBBH
• [ACTIVITY]
  • find_rbbh.ipynb iPython notebook
• Multi-organism analysis: MCL
• [ACTIVITY]
  • mcl_orthologues/README.md Markdown
  • mcl_orthologues.ipynb iPython notebook
  
Other Methods
• Synteny-based:
  • Homologene (NCBI):
    • http://www.ncbi.nlm.nih.gov/homologene
• Manual curation:
  • Mouse Genome Database (MGD):
    • http://www.informatics.jax.org/homology.shtml
• Tree-based:
  • EnsemblCompara (EMBL-EBI):
    • http://www.ensembl.org/info/genome/compara/index.html
  • TreeFam (EMBL-EBI):
    • http://www.treefam.org/
  • OrthologID:
    • http://nypg.bio.nyu.edu/orthologid/
  
Evaluating Orthologue Predictions
Which method works best?
(and what do we mean by “best” anyway?)
  
Evaluating Predictions
• Works the same way for all prediction tools
  1. Define a “validation set” (gold standard), unseen by the prediction tool
  2. Make predictions with the tool
  3. Evaluate confusion matrix and performance statistics
    • Sensitivity
    • Specificity
    • Accuracy

Standard:        +ve    -ve
Predict +ve      TP     FP
Predict -ve      FN     TN

False positive rate           FP/(FP+TN)
False negative rate           FN/(TP+FN)
Sensitivity                   TP/(TP+FN)
Specificity                   TN/(FP+TN)
False discovery rate (FDR)    FP/(FP+TP)
Accuracy                      (TP+TN)/(TP+TN+FP+FN)
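The statistics in the table follow directly from the four confusion-matrix counts. The sketch below computes them for made-up counts; the function name, the dictionary keys and the numbers are illustrative assumptions, and no guard is included for zero denominators.

```python
def performance(tp, fp, fn, tn):
    """Compute the tabulated statistics from confusion-matrix counts."""
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "false_discovery_rate": fp / (fp + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Made-up counts: 120 true orthologue pairs and 880 non-orthologous pairs.
stats = performance(tp=90, fp=10, fn=30, tn=870)
```

With unbalanced classes like these, accuracy stays high even when a quarter of the true orthologues are missed, which is why sensitivity and FDR are usually reported alongside it.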
  
Evaluating Orthologue Predictions
• Take advantage of prokaryotic operon structure: conserved syntenic triplets likely to be orthologous
• Idea: If the outer pair in a syntenic triplet are orthologous, the middle gene is likely to be, too.
  • Middle genes are orthologue “gold standard”
• Do RBBH reliably identify middle genes from syntenic triplets?
Wolf et al. (2012) Genome Biol. Evol. doi:10.1093/gbe/evs100
  
Evaluating Orthologue Predictions
• Two well-characterised genomes compared against 573 prokaryotes
• Identified RBBH (with permissive BLAST settings)
• “Overwhelming majority” of middle genes (counterparts) are BBH
• 88-99% of BBH are in syntenic triplets
• Therefore, RBBH reliably finds orthologues
Wolf et al. (2012) Genome Biol. Evol. doi:10.1093/gbe/evs100
  
Evaluating Orthologue Predictions
• Four orthologue prediction algorithms:
  • RBBH (and cRBH)
  • RSD (and cRSD)
  • MultiParanoid
  • OrthoMCL
• Tested against 2,723 curated orthologues from six Saccharomycetes
• Rated by:
  • Sensitivity: TP/(TP+FN) – what proportion of orthologues are found
  • Specificity: TN/(TN+FP) – how well are non-orthologues excluded
  • Accuracy: (TP+TN)/(TP+TN+FP+FN) – general measure of performance
  • FDR: FP/(FP+TP) – what proportion of predictions are incorrect
Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
  
Evaluating Orthologue Predictions
• Four orthologue prediction algorithms:
  • RBBH (cRBH)
  • RSD (cRSD)
  • MultiParanoid
  • OrthoMCL
• cRBH most accurate, and specific, with lowest FDR
Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
  
Evaluating Orthologue Predictions
• Tests of several methods on a number of literature-based benchmarks for:
  • Correct branching of phylogeny
  • Grouping by function
    • GO similarity
    • EC number
    • Expression level
    • Gene Neighbourhood
Altenhoff & Dessimoz (2009) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1000262
  
Evaluating Orthologue Predictions
Altenhoff & Dessimoz (2009) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1000262
  
Evaluating Orthologue Predictions
• 70 gene family test, multiple evolutionary scenarios
• Tested databases with associated algorithms:
Trachana et al. (2011) Bioessays. doi:10.1002/bies.201100062
  
Evaluating Orthologue Predictions
• 70 gene family test set, multiple evolutionary scenarios
• All methods/dbs have strong scope for improvement.
• OrthoMCL poor performer, TreeFam & eggNOG do best
Trachana et al. (2011) Bioessays. doi:10.1002/bies.201100062
  
Orthologue Prediction Performance
• Performance varies by choice of method and interpretation of “orthology”
• Biggest influence is genome annotation quality
• Relative performance varies with benchmark choice
• (clustering) RBBH outperforms more complex algorithms under many circumstances
  
Selection Pressures
Signs of selection pressure identifiable by comparative genomics
  
Selection Pressures
• Defining core groups of genes by “orthology” allows analysis of those groups:
  • Synteny/collocation
  • Gene neighbourhood changes (e.g. genome expansion)
  • The pangenome: core and accessory genomes
• and sequences in those groups:
  • Multiple alignment
  • Domain detection
  • Identification of functional sites
  • Inference of evolutionary pressures
  
Synteny
• Selective pressures depend on gene (product) function
• Genes involving physically or functionally-interacting proteins tend to evolve under similar selective constraints
• Particularly in bacteria, this leads to co-expression as regulons and collocation in operons
• Collocation (and coregulation) may be identified by comparative genomics
• (This is also true when considering regulatory or metabolic networks, similarly to genome organisation)
Alvarez-Ponce et al. (2011) Genome Biol. Evol. doi:10.1093/gbe/evq084
  
Synteny
• Many tools/packages/services for synteny detection, e.g.
  • SyMAP
    • http://www.agcol.arizona.edu/software/symap/
  • i-ADHoRe
    • http://bioinformatics.psb.ugent.be/software/details/i--ADHoRe
  • MCScan, Cyntenator, etc
Soderlund et al. (2011) Nucl. Acids. Res. doi:10.1093/nar/gkr123
Proost et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr955
  
i-ADHoRe
• Algorithm:
  1. Combine tandem repeats of genes/gene sets
  2. Make gene homology matrix (GHM): identify collinear regions (diagonals) for first genome pair
  3. Convert these to profiles
  4. Use GG2 algorithm to align profiles
  5. Search next genome with profiles, splitting them where necessary
  6. Iterate until complete
• Gives genome-scale multiple alignments of blocks of genes
Proost et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr955
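Step 2 of the algorithm, finding collinear regions as diagonals in a gene homology matrix, can be illustrated with a deliberately simplified sketch. It assumes each genome is an ordered list of gene names and that homologous pairs are already known; the function name, the `min_length` parameter and the example data are hypothetical. Unlike i-ADHoRe itself, this toy only finds gap-free runs on forward diagonals and does no profile building or statistical filtering.

```python
def collinear_runs(order_a, order_b, homologues, min_length=3):
    """Find gap-free forward diagonals in a gene homology matrix.

    order_a, order_b: gene names in chromosomal order for two genomes.
    homologues: dict mapping a gene in genome A to homologous genes in B.
    """
    position_b = {gene: j for j, gene in enumerate(order_b)}
    # Filled cells (i, j) of the matrix: gene i of A matches gene j of B.
    cells = set()
    for i, gene_a in enumerate(order_a):
        for gene_b in homologues.get(gene_a, ()):
            if gene_b in position_b:
                cells.add((i, position_b[gene_b]))
    runs = []
    for i, j in sorted(cells):
        if (i - 1, j - 1) in cells:
            continue  # this cell continues a run that started earlier
        length = 1
        while (i + length, j + length) in cells:
            length += 1
        if length >= min_length:
            runs.append(
                [(order_a[i + k], order_b[j + k]) for k in range(length)]
            )
    return runs


# Made-up gene orders: a1-a3 are collinear with b1-b3.
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b9", "b1", "b2", "b3", "b8"]
homologue_map = {"a1": ["b1"], "a2": ["b2"], "a3": ["b3"], "a5": ["b9"]}
blocks = collinear_runs(genome_a, genome_b, homologue_map)
```

The isolated a5/b9 match is dropped because it does not extend into a run of at least `min_length` genes.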
  
i-ADHoRe
• [ACTIVITY]
  • i-ADHoRe/README.md Markdown
  • i-ADHoRe.ipynb iPython notebook
  
Genome Expansion
• Mobile/repeat elements reproduce and expand during evolution
• Generates sequence “laboratory” for variation and experiment
• e.g. Phytophthora infestans effector protein expansion and arms race
Haas et al. (2009) Nature. doi:10.1038/nature08358
  
Genome Expansion
• Mobile elements (MEs) are large, carry genes with them.
• Regions rich in MEs have larger gaps between consecutive genes
• Effector proteins are found preferentially in regions with large gaps, also show increased rates of evolutionary divergence.
• “Two-speed genome” associated with adaptability to new hosts/escape from evolutionary “bottleneck”
Haas et al. (2009) Nature. doi:10.1038/nature08358
  
The Pangenome
• The gene complement of a set of organisms (e.g. species group) is the pangenome, defined by the union of two gene sets:
  • Core genes: genes present in all examples (define common species characteristics)
  • Accessory genes: genes only present in a subset of examples (relevant to adaptation of individuals)
• Definition depends on composition of organism set
• Core genome hypothesis:
  • “The core genome is the primary cohesive unit defining a bacterial species.”
• Online tools available, e.g.
  • Panseq (http://lfz.corefacility.ca/panseq/)
Laing et al. (2010) BMC Bioinf. doi:10.1186/1471-2105-11-461
Lefébure et al. (2010) Genome Biol. Evol. doi:10.1093/gbe/evq048
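The core/accessory split maps directly onto set operations, assuming each genome has already been reduced to a set of orthologue-group identifiers. The function name, isolate names and group identifiers below are made up for illustration.

```python
def pangenome(gene_sets):
    """Split the pangenome of several genomes into core and accessory sets.

    gene_sets: dict mapping genome name to a set of orthologue-group IDs.
    """
    genomes = list(gene_sets.values())
    pan = set().union(*genomes)          # every group seen in any genome
    core = set.intersection(*genomes)    # groups present in all genomes
    accessory = pan - core               # groups missing from at least one
    return pan, core, accessory


# Made-up example with three isolates.
isolates = {
    "isolate1": {"og1", "og2", "og3"},
    "isolate2": {"og1", "og2", "og4"},
    "isolate3": {"og1", "og2"},
}
pan, core, accessory = pangenome(isolates)
```

Adding or removing an isolate changes all three sets, which is the sense in which the definition depends on the composition of the organism set.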
  
Defining a species’ core genome
• “Orthologue groups” with a representative in (nearly) every member of the set
• But we only have a sample of the species, not every member…
• …so use rarefaction curves to estimate core genome size.
  1. Randomly order organisms, and count number of ‘core’ and ‘new’ genes seen with each new genome addition.
  2. Repeat until you have a reasonable estimate of error/no new genes found
Lefébure et al. (2010) Genome Biol. Evol. doi:10.1093/gbe/evq048
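The rarefaction procedure can be sketched as a permutation loop, assuming each genome is a set of orthologue-group identifiers. The function name, the number of permutations, the fixed random seed and the example data are illustrative assumptions; a real analysis would also report the spread across permutations, not only the mean.

```python
import random


def rarefaction(gene_sets, n_permutations=100, seed=0):
    """Average 'core' and 'new' gene counts as genomes are added in random order.

    gene_sets: dict mapping genome name to a set of orthologue-group IDs.
    Returns (mean_core, mean_new), one value per number of genomes sampled.
    """
    rng = random.Random(seed)
    names = sorted(gene_sets)
    core_curves, new_curves = [], []
    for _ in range(n_permutations):
        order = names[:]
        rng.shuffle(order)
        core = set(gene_sets[order[0]])
        seen = set(gene_sets[order[0]])
        core_counts, new_counts = [len(core)], [len(seen)]
        for name in order[1:]:
            genes = gene_sets[name]
            new_counts.append(len(genes - seen))  # genes not seen before
            core &= genes                         # genes shared by all so far
            seen |= genes
            core_counts.append(len(core))
        core_curves.append(core_counts)
        new_curves.append(new_counts)
    n = len(names)
    mean_core = [sum(c[i] for c in core_curves) / n_permutations for i in range(n)]
    mean_new = [sum(c[i] for c in new_curves) / n_permutations for i in range(n)]
    return mean_core, mean_new


# Made-up example with three isolates.
isolates = {
    "isolate1": {"og1", "og2", "og3"},
    "isolate2": {"og1", "og2", "og4"},
    "isolate3": {"og1", "og2"},
}
mean_core, mean_new = rarefaction(isolates)
```

If the mean number of new genes per added genome flattens towards zero, the sample is probably close to capturing the pangenome; if the core curve is still falling, the core estimate is not yet stable.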
  
Directional Selection
• Several statistical tests for directional selection, e.g.
  • QTL sign
  • Ka/Ks (dN/dS) ratio test – most commonly applied
  • Relative rate test
• Ka/Ks ratio:
  • Ka (or dN): number of non-synonymous substitutions per non-synonymous site
  • Ks (or dS): number of synonymous substitutions per synonymous site
  • Ka/Ks > 1 ⇒ positive selection; Ka/Ks < 1 ⇒ stabilising selection
  • Several methods/tools for calculation
    • PAML (http://abacus.gene.ucl.ac.uk/software/paml.html)
    • SeqinR (http://cran.r-project.org/web/packages/seqinr/index.html)
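As a worked illustration of the ratio, the sketch below turns counts of differences and sites into Ka, Ks and Ka/Ks, assuming the counts have already been obtained from a codon alignment. It applies the Jukes-Cantor correction to each proportion, as in the Nei-Gojobori approach; this is one simple estimator chosen here for illustration, not necessarily what PAML or SeqinR do by default. The function names and the example counts are made up, and the ratio is undefined when Ks is zero.

```python
import math


def jukes_cantor_distance(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from difference and site counts."""
    ka = jukes_cantor_distance(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor_distance(syn_diffs / syn_sites)
    return ka, ks, ka / ks


# Made-up counts for a 100-codon alignment of two sequences.
ka, ks, ratio = ka_ks(nonsyn_diffs=6, nonsyn_sites=230.0,
                      syn_diffs=12, syn_sites=70.0)
```

For these invented counts the ratio is well below 1, the pattern read on the slide as stabilising selection.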
  
Genome-Wide Positive Selection
Lefébure & Stanhope (2009) Genome Res. doi:10.1101/gr.089250.108
  
An Analysis Output
• Class comparison: animal-pathogenic (APE) vs plant-associated bacteria (PAB)
• Presence of horizontally-acquired islands (HAI)
• Genes with greater similarity to PAB than APE
Toth et al. (2006) Annu. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
  
Things I Didn’t Get To
• Genome-Wide Association Studies (GWAS):
  • Try http://genenetwork.org/ to play with some data
• Prediction of regulatory elements, e.g.
  • Kellis et al. (2003) Nature doi:10.1038/nature01644
  • King et al. (2007) Genome Res. doi:10.1101/gr.5592107
  • Chaivorapol et al. (2008) BMC Bioinf. doi:10.1186/1471-2105-9-455
  • CompMOBY: http://genome.ucsf.edu/compmoby/
• Detection of Horizontal/Lateral Gene Transfer (HGT/LGT), e.g.
  • Tsirigos & Rigoutsos (2005) Nucl. Acids. Res. doi:10.1093/nar/gki187
• Phylogenomics, e.g.
  • Delsuc et al. (2005) Nat. Rev. Genet. doi:10.1038/nrg1603
  
Finishing The Hat
Some of the things I hope you have taken away from the lectures/activities
  
Take-Home Messages
• Comparative genomics is a powerful set of techniques for:
  • Understanding and identifying evolutionary processes and mechanisms
  • Reconstructing detailed evolutionary history of a set of organisms
  • Identifying and understanding common genomic features of organisms
  • Providing hypotheses about gene function for experimental investigation
• A huge amount of data is available to work with
  • And it’s only going to get much, much larger
• Results feed into many areas of study:
  • Medicine and health
  • Agriculture and food security
  • Basic biology in all fields
  • Systems and synthetic biology
  
Take-Home Messages
• Comparative genomics is essentially based around comparisons
  • What is similar between two genomes? What is different?
• Comparative genomics is evolutionary genomics
• Large datasets benefit from visualisation for effective interpretation
  • Much scope for improvement in visualisation
• Tools with the same purpose give different output
  • BLAST vs MUMmer
  • RBBH vs MCL
  • Choice of application matters for correctness and interpretation! – understand what the application does, and its limits.
  
Take-Home Messages
• Comparative genomics is
  • Fun
  • Indoor work, in the warm and dry
  • Not a job that involves heavy lifting
  
Credits
• This slideshow is shared under a Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/)
• Copyright is held by The James Hutton Institute http://www.hutton.ac.uk
• You may freely use this material in research, papers, and talks so long as acknowledgement is made.
  	
  


Comparative Genomics and Visualisation - Part 2

  • 1. Comparative Genomics and Visualisation – Part 2
       Leighton Pritchard
  • 2. Part 2
       - Part 1
         - Experimental Comparative Genomics
         - Bulk and Whole Genome Comparisons
       - Genome Features
       - Who let the -logues out?
       - Finishing The Hat
  • 3. Genome Features
       - Genes:
         - translation start
         - introns
         - exons
         - translation stop
         - translation terminator
       - ncRNA:
         - tRNA – transfer RNA
         - rRNA – ribosomal RNA
         - CRISPRs – bacterial and archaeal defence (genome editing)
         - many other classes (including enhancers)
  • 4. Genome Features
       - Regulatory sites
         - Transcription start site (TSS)
         - RNA polymerase binding sites
         - Transcription Factor Binding Sites (TFBS)
         - Core, proximal and distal promoter regions
       - Repetitive Regions and Mobile Elements
         - Tandem repeats
         - (retro-)transposable elements
           - Alu has ≈50,000 active copies in human genome
         - Phage inclusion (bacteria/archaea)
       Pennacchio & Rubin (2001) Nat. Rev. Genet. doi:10.1038/35052548
       human v mouse comparison
  • 5. Genome Feature Identification
       - Gene Finding:
         1. Empirical (evidence-based) methods:
            - Inference from known protein/cDNA/mRNA/EST sequence
            - Inference from mapped RNA reads
         2. Ab initio methods:
            - Identification of sequences associated with gene features:
              - TSS, CpG islands, Shine-Dalgarno sequence, stop codons, etc.
         3. Inference from genome comparisons/conservation
       Liang et al. (2009) Genome Res. doi:10.1101/gr.088997.108
       Brent (2007) Nat. Biotech. doi:10.1038/nbt0807-883
       Korf (2004) BMC Bioinf. doi:10.1186/1471-2105-5-59
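The ab initio idea on this slide, finding sequence signals associated with gene features such as stop codons, can be illustrated with a deliberately naive open reading frame (ORF) scan. This is only a sketch under stated assumptions: it searches the forward strand only, treats ATG as the sole start codon, uses the standard stop codons, and the function name `find_orfs` and its `min_codons` parameter are hypothetical. Production gene callers rely on statistical sequence models rather than a scan like this.

```python
# Naive ab initio ORF scan (illustrative only; all names are hypothetical).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    min_codons counts codons from the start codon up to, but not
    including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first ATG seen in this frame since the last stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence: ATG AAA TTT GGG TAA embedded in frame 2.
    print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Raising `min_codons` is the simplest way to trade sensitivity for fewer spurious calls, which is one reason different tools report different gene sets from the same genome.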
  • 6. Genome Feature Identification
      • Finding Regulatory Elements (short, degenerate):
          1. Empirical (evidence-based) methods:
              • Inference from protein-DNA binding experiments
              • Inference from coexpression
          2. Ab initio methods:
              • Identification of regulatory motifs (profile/other methods): TATA, sigma-factor binding sites, etc.
              • Statistical overrepresentation
              • Identification from sequence properties
          3. Inference from sequence conservation/genome comparisons
      • Zhang et al. (2011) BMC Bioinf. doi:10.1186/1471-2105-12-238
      • Kilic et al. (2013) Nucl. Acids Res. doi:10.1093/nar/gkt1123
      • Vavouri & Elgar (2005) Curr. Op. Genet. Devel. doi:10.1016/j.gde.2005.05.002
  • 7. Genome Feature Identification
      • All prediction methods result in errors
      • All experiments have error
      • Genome comparisons can help correct errors
      • [OPTIONAL ACTIVITY] (useful for exercise)
          • predict_CDS.md (Markdown)
      • Other options for prokaryotic gene calling:
          • Glimmer (http://ccb.jhu.edu/software/glimmer/index.shtml)
          • GeneMarkS (http://opal.biology.gatech.edu/)
          • RAST (http://rast.nmpdr.org/)
          • BASys (https://www.basys.ca/), etc.
      • Options for eukaryotic gene calling:
          • GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/)
          • GeneMarkES (http://opal.biology.gatech.edu/gmseuk.html)
          • Augustus (http://augustus.gobics.de/), etc.
  • 8. Who Let The -logues Out?
      • Evolutionary relationships of genome features can be complex.
      • We require precise terms to describe relationships between genome features.
  • 9. Comparing Gene Features
      • Given gene annotations for more than one genome, how can we organise and understand relationships?
          • Functional similarity (analogy)
          • Evolutionary common origin (homology, orthology, etc.)
          • Evolutionary/functional/family relationships (paralogy)
      • Terms first suggested by Fitch (1970) Syst. Zool. doi:10.2307/2412448
  • 10. Attack of the –logues
      • Technical terms describing evolutionary relationships
      • Homologues: elements that are similar because they share a common ancestor (NOTE: There are NOT degrees of homology!)
      • Analogues: elements that are (functionally?) similar, possibly through convergent evolution and not by sharing common ancestry
      • Orthologues: homologues that diverged through speciation
      • Paralogues: homologues that diverged through duplication within the same genome
      • (also co-orthologues, xenologues, etc.)
  • 11. Attack of the –logues
      • [Figure: timeline showing an ancestral genome feature within a genome]
  • 12. Attack of the –logues
      • [Figure: timeline in which a speciation event turns ancestor:A into species1:A and species2:A, labelled as orthologues]
      • Orthologues: homologues that diverged through speciation
  • 13. Attack of the –logues
      • [Figure: timeline in which a duplication turns ancestral copy A into copy 1 (A) and copy 2 (A’), labelled as paralogues]
      • Paralogues: homologues that diverged through duplication within the same genome
  • 14. Attack of the –logues
      • [Figure: timeline combining speciation (ancestor:A to species1:A and species2:A) with a duplication in species1 (species1:A and species1:A’); relationships are labelled orthologues, out-paralogues and in-paralogues]
  • 15. Attack of the –logues
      • [Figure: timeline combining speciation with duplications in both species (species1:A, species1:A’, species2:A, species2:A’); relationships are labelled in-paralogues (within each species), out-paralogues and orthologues]
  • 16. Attack of the –logues
      • BUT: biology is not well-behaved: relationships can be difficult to infer
          • Gene loss occurs
          • Homologues can diverge, sometimes very widely: hard to recognise
          • Reconstructed evolutionary trees for speciation events may not be robust
      • Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
  • 17. Attack of the –logues
      • [Figure: the speciation/duplication timeline redrawn with extensive divergence, contrasting historical events with contemporary sequence; some features become uncertain (species1:A?, species2:A?) and relationships are labelled in-paralogues, (co-)orthologues? and out-paralogues/co-orthologues?]
      • Current classifications of orthology/paralogy are inferences
  • 18. Attack of the –logues
      • BUT: biology is not well-behaved: relationships can be difficult to infer
          • Gene loss occurs
          • Homologues can diverge, sometimes very widely: hard to recognise
          • Reconstructed evolutionary trees for speciation events may not be robust
      • Some resources and tools ‘bend’ definitions, e.g. Ensembl Compara and OrthoMCL.
          • http://www.ensembl.org/info/genome/compara/homology_method.html
      • Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
  • 19. Note on “Orthology”
      • Frequently abused/misused as a term
      • “Orthology” is an evolutionary relationship, often bent into service as a functional descriptor
      • Strictly defined only for two species or clades!
          • (cf. OrthoMCL, etc.)
      • Orthology is not transitive (A is an orthologue of C and B is an orthologue of C does not imply A is an orthologue of B)
          • (cf. EnsemblCompara definitions)
      • Storm & Sonnhammer (2002) Bioinformatics. doi:10.1093/bioinformatics/18.1.92
  • 20. Ensembl Compara definitions
      • within_species_paralog: same-species paralogue (in-paralogue)
      • ortholog_one2one: orthologue
      • ortholog_one2many: orthologue/paralogue relationship
      • ortholog_many2many: orthologue/paralogue relationship
      • NOTE: the taxonomy may not always be correct…
      • Vilella et al. (2009) Genome Res. doi:10.1101/gr.073585.107
  • 21. “The Ortholog Conjecture”
      • Without duplication, a gene is unlikely to change its basic function, because this would lead to loss of the original function, and this would be harmful.
  • 22. Problems with the Ortholog Conjecture
      • Nehrt et al. (2011) say:
          • Paralogues are a better predictor of function than orthologues
              • ∴ conjecture is false!
          • Cellular context is better for protein function inference
          • Function defined from Gene Ontology (GO)
      • Nehrt et al. (2011) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002073
      • Chen et al. (2012) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002784
  • 23. Problems with the Ortholog Conjecture
      • But do we understand function well enough to test the conjecture?
      • Chen et al. (2012) say: “No”
          • “examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems […] make the current GO inappropriate for testing the ortholog conjecture”
          • Expression levels are more similar between orthologues than between paralogues (but is this “function”…?)
      • Nehrt et al. (2011) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002073
      • Chen et al. (2012) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1002784
  • 24. Finding “Orthologues”
      • The process of finding evolutionary (and/or functional) equivalents of genes across two or more organisms’ genomes.
  • 25. Why are “orthologues” so important?
      • Orthology formalises the concept of corresponding genes across multiple organisms.
          • Evolutionary
          • Functional? (“The Ortholog Conjecture”)
      • Applications in:
          • Comparative genomics
          • Functional genomics
          • Phylogenetics, …
      • Many (>35) databases attempt to describe orthologous relationships
          • http://questfororthologs.org/orthology_databases
      • Dessimoz (2011) Brief. Bioinf. doi:10.1093/bib/bbr057
  • 26. How to find orthologues?
      • Many published methods and databases:
          • Pairwise between two genomes: RBBH (aka BBH, RBH, etc.), RSD, InParanoid, RoundUp
          • Multi-genome:
              • Graph-based: COG, eggNOG, OrthoDB, OrthoMCL, OMA, MultiParanoid
              • Tree-based: TreeFam, Ensembl Compara, PhylomeDB, LOFT
      • Methods may apply different, or refined, definitions of orthology, paralogy, etc.
      • Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
      • Trachana et al. (2011) Bioessays doi:10.1002/bies.201100062
      • Kristensen et al. (2011) Brief. Bioinf. doi:10.1093/bib/bbr030
  • 27. Pairwise approaches
      • S1, S2 are the gene sequence sets from two organisms
      • Compare S1 to S2, and identify the most similar pairs of sequences: these are “orthologues” (or “putative orthologues”).
      • Many similarity measures possible (which threshold: E-value, bit score, coverage…?):
          • Reciprocal best BLAST hit (RBBH), used by e.g. InParanoid
          • Reciprocal smallest difference (RSD), used by e.g. RoundUp
          • and so on…
      • Can be extended to multi-organism clusters by graph-based approaches
      • Östlund et al. (2009) Nucl. Acids Res. doi:10.1093/nar/gkp931
      • DeLuca et al. (2012) Bioinf. doi:10.1093/bioinformatics/bts006
  • 28. Reciprocal Best BLAST Hits
      • S1, S2 are the gene sequence sets from two organisms
      • BLASTP:
          • Query=S1, Subject=S2
          • Query=S2, Subject=S1
      • Optionally filter BLAST hits (e.g. on %identity and %coverage)
      • Find all pairs of sequences {GS1n, GS2n} in S1, S2 where GS1n is the best BLAST match to GS2n and GS2n is the best BLAST match to GS1n.
      • [Figure: example “best hit” and “2nd best hit” arrows between genes, with a reciprocal best-hit pair marked ✔ and a non-reciprocal pair marked ✘]
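The reciprocal best hit step can be sketched in a few lines of Python. This is a minimal illustration only, assuming the BLASTP hits have already been parsed into (query, subject, bit score) tuples; the function names `best_hits` and `reciprocal_best_hits` and the toy gene identifiers are hypothetical, and neither ties nor the optional identity/coverage filters are handled.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples, e.g. parsed
    from BLASTP tabular output. On ties, the first hit encountered is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits_s1_vs_s2, hits_s2_vs_s1):
    """Return sorted (S1 gene, S2 gene) pairs that are each other's best hit."""
    forward = best_hits(hits_s1_vs_s2)  # S1 query -> best S2 subject
    reverse = best_hits(hits_s2_vs_s1)  # S2 query -> best S1 subject
    return sorted(
        (g1, g2) for g1, g2 in forward.items() if reverse.get(g2) == g1
    )


# Toy data: hypothetical gene identifiers and bit scores
s1_vs_s2 = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b1", 120.0)]
s2_vs_s1 = [("b1", "a1", 248.0), ("b1", "a2", 119.0), ("b2", "a1", 88.0)]
rbbh = reciprocal_best_hits(s1_vs_s2, s2_vs_s1)
print(rbbh)
```

In the toy data, a2's best hit is b1, but b1's best hit is a1, so only the a1/b1 pair is reported.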
  • 29. Reciprocal Best BLAST Hits
      • Advantages:
          • quick
          • easy
          • performs surprisingly well (see later…)
      • Disadvantages:
          • misses paralogues
          • not good at identifying gene families or *-to-many relationships without more detailed analysis
          • no strong theoretical/phylogenetic basis
  • 30. COG
      • COG (Clusters of Orthologous Groups; now POG, KOG, eggNOG etc.)
      • Graph extension of RBBH to clusters of mutual RBBH
          • “Any group of at least three proteins from different genomes, more similar to each other than any other proteins from those genomes, are an orthologous family.”
          • Conduct RBBH
          • Collapse paralogues
          • Detect “triangles”
          • Merge triangles having common side
          • Manual curation
      • Databases have many outparalogues
      • Tatusov et al. (2000) Nucl. Acids Res. doi:10.1093/nar/28.1.33
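The "detect triangles" and "merge triangles having common side" steps can be illustrated with a small sketch. This is not the COG implementation: it assumes RBBH pairs are already available, represents each gene as a hypothetical (genome, gene_id) tuple, skips the paralogue-collapsing and manual curation steps, and uses brute force that only suits toy inputs.

```python
from itertools import combinations


def find_triangles(rbbh_pairs):
    """Return triples of genes from three different genomes that are pairwise RBBH."""
    edges = {frozenset(pair) for pair in rbbh_pairs}
    genes = sorted({gene for edge in edges for gene in edge})
    triangles = []
    for trio in combinations(genes, 3):
        three_genomes = len({genome for genome, _ in trio}) == 3
        all_linked = all(frozenset(p) in edges for p in combinations(trio, 2))
        if three_genomes and all_linked:
            triangles.append(frozenset(trio))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two genes) into clusters."""
    clusters = []  # each cluster is a list of triangles
    for triangle in triangles:
        touching = [
            c for c in clusters if any(len(t & triangle) >= 2 for t in c)
        ]
        for c in touching:
            clusters.remove(c)
        clusters.append([triangle] + [t for c in touching for t in c])
    return [frozenset().union(*c) for c in clusters]


# Toy RBBH pairs between four hypothetical genomes A-D
a1, a2 = ("A", "a1"), ("A", "a2")
b1, b2 = ("B", "b1"), ("B", "b2")
c1, d1 = ("C", "c1"), ("D", "d1")
pairs = [(a1, b1), (b1, c1), (a1, c1), (b1, d1), (c1, d1), (a2, b2)]

triangles = find_triangles(pairs)
clusters = merge_triangles(triangles)
print(clusters)
```

Here the triangles a1/b1/c1 and b1/c1/d1 share the side b1/c1 and merge into one four-member cluster, while the isolated pair a2/b2 forms no triangle and is left out.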
  • 31. MCL
      • MCL constructs a network from all-vs-all BLAST results
      • Then applies matrix operations: expansion and inflation
      • Iterative expansion and inflation until network convergence
      • Enright et al. (2002) Nucl. Acids Res. doi:10.1093/nar/30.7.1575
  • 32. MCL
      • [Figure: an input network passed through repeated Expansion and Inflation steps until it resolves into a clustering]
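The expansion/inflation loop can be sketched in plain Python. This is a toy illustration of the idea, not the MCL program itself: the function names, the inflation value of 2.0, the added self-loops, the convergence tolerance and the way clusters are read from the final matrix are all assumptions made for this example, and the input is a small dense similarity matrix rather than real BLAST output.

```python
def normalise_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: square the matrix (flow spreads along two-step walks)."""
    n = len(m)
    return [
        [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalise columns."""
    return normalise_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a weighted graph given as a square similarity matrix."""
    n = len(adjacency)
    # Self-loops (an assumption here) keep every column sum positive.
    m = normalise_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that still carry flow define clusters: the columns they reach.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy network: two triangles of similar sequences joined by one weak link
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
bridged_clusters = mcl(bridged)
print(bridged_clusters)
```

Raising the inflation parameter generally gives finer-grained clusters, which is one reason output from MCL-based tools depends on the settings chosen.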
  • 33. OrthoMCL
      • http://orthomcl.org/orthomcl/
          1. Defines potential inparalogue, orthologue and co-orthologue pairs (using RBBH! See algorithm description in papers directory)
          2. Applies MCL to cluster inparalogue, orthologue, co-orthologue pairs
      • Output clusters include both orthologues and paralogues
      • Li et al. (2003) Genome Res. doi:10.1101/gr.1224503
  • 34. Notes of Caution
      • BLAST-based orthology methods (e.g. RBBH, InParanoid, COG) are fast!
      • But they have some drawbacks:
          • No guarantee that sequence matches are transitive (A may match B at a domain differently than B matches C)
          • No evolutionary distance model
          • Multiple domain matches are not accounted for
          • These methods find similar sequences, then make assumptions based on similarity and number of matches. They do not detect orthologues directly!
      • Tree-based methods incorporate:
          • Evolutionary distance
          • Direct orthologue detection
  • 35. Finding “Orthologues”
      • Pairwise analysis: RBBH
      • [ACTIVITY]
          • find_rbbh.ipynb (iPython notebook)
      • Multi-organism analysis: MCL
      • [ACTIVITY]
          • mcl_orthologues/README.md (Markdown)
          • mcl_orthologues.ipynb (iPython notebook)
  • 36. Other Methods
      • Synteny-based:
          • Homologene (NCBI): http://www.ncbi.nlm.nih.gov/homologene
      • Manual curation:
          • Mouse Genome Database (MGD): http://www.informatics.jax.org/homology.shtml
      • Tree-based:
          • EnsemblCompara (EMBL-EBI): http://www.ensembl.org/info/genome/compara/index.html
          • TreeFam (EMBL-EBI): http://www.treefam.org/
          • OrthologID: http://nypg.bio.nyu.edu/orthologid/
  • 37. Evaluating Orthologue Predictions
      • Which method works best? (and what do we mean by “best” anyway?)
  • 38. Evaluating Predictions
      • Works the same way for all prediction tools
          1. Define a “validation set” (gold standard), unseen by the prediction tool
          2. Make predictions with the tool
          3. Evaluate confusion matrix and performance statistics (sensitivity, specificity, accuracy)
      • Confusion matrix (rows: prediction; columns: gold standard):
                          Standard +ve   Standard -ve
          Predict +ve     TP             FP
          Predict -ve     FN             TN
      • Performance statistics:
          • False positive rate = FP/(FP+TN)
          • False negative rate = FN/(TP+FN)
          • Sensitivity = TP/(TP+FN)
          • Specificity = TN/(FP+TN)
          • False discovery rate (FDR) = FP/(FP+TP)
          • Accuracy = (TP+TN)/(TP+TN+FP+FN)
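These statistics follow directly from the four confusion matrix counts. The short sketch below computes them; the function name `performance` and the example counts are invented for illustration, and no guard against empty classes (zero denominators) is included.

```python
def performance(tp, fp, fn, tn):
    """Compute the slide's performance statistics from confusion matrix counts."""
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "fdr": fp / (fp + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts: 100 true orthologue pairs and 100 non-orthologue pairs
stats = performance(tp=80, fp=10, fn=20, tn=90)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate, so reporting both members of a pair adds no extra information.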
  • 39. Evaluating Orthologue Predictions
      • Take advantage of prokaryotic operon structure: conserved syntenic triplets likely to be orthologous
      • Idea: If the outer pair in a syntenic triplet are orthologous, the middle gene is likely to be, too.
          • Middle genes are orthologue “gold standard”
      • Do RBBH reliably identify middle genes from syntenic triplets?
      • Wolf et al. (2012) Genome Biol. Evol. doi:10.1093/gbe/evs100
  • 40. Evaluating Orthologue Predictions
      • Two well-characterised genomes compared against 573 prokaryotes
      • Identified RBBH (with permissive BLAST settings)
      • “Overwhelming majority” of middle genes (counterparts) are BBH
      • 88-99% of BBH are in syntenic triplets
      • Therefore, RBBH reliably finds orthologues
      • Wolf et al. (2012) Genome Biol. Evol. doi:10.1093/gbe/evs100
  • 41. Evaluating Orthologue Predictions
      • Four orthologue prediction algorithms:
          • RBBH (and cRBH)
          • RSD (and cRSD)
          • MultiParanoid
          • OrthoMCL
      • Tested against 2,723 curated orthologues from six Saccharomycetes
      • Rated by:
          • Sensitivity: TP/(TP+FN), what proportion of orthologues are found
          • Specificity: TN/(TN+FP), how well are non-orthologues excluded
          • Accuracy: (TP+TN)/(TP+TN+FP+FN), general measure of performance
          • FDR: FP/(FP+TP), what proportion of predictions are incorrect
      • Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
  • 42. Evaluating Orthologue Predictions
      • Four orthologue prediction algorithms:
          • RBBH (cRBH)
          • RSD (cRSD)
          • MultiParanoid
          • OrthoMCL
      • cRBH most accurate, and specific, with lowest FDR
      • Salichos et al. (2011) PLoS One. doi:10.1371/journal.pone.0018755
  • 43. Evaluating Orthologue Predictions
      • Tests of several methods on a number of literature-based benchmarks for:
          • Correct branching of phylogeny
          • Grouping by function
              • GO similarity
              • EC number
              • Expression level
              • Gene Neighbourhood
      • Altenhoff & Dessimoz (2009) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1000262
  • 44. Evaluating Orthologue Predictions
      • Altenhoff & Dessimoz (2009) PLoS Comp. Biol. doi:10.1371/journal.pcbi.1000262
  • 45. Evaluating Orthologue Predictions
      • 70 gene family test, multiple evolutionary scenarios
      • Tested databases with associated algorithms: [table not preserved in the transcript]
      • Trachana et al. (2011) Bioessays. doi:10.1002/bies.201100062
  • 46. Evaluating Orthologue Predictions
      • 70 gene family test set, multiple evolutionary scenarios
      • All methods/dbs have strong scope for improvement.
      • OrthoMCL poor performer, TreeFam & eggNOG do best
      • Trachana et al. (2011) Bioessays. doi:10.1002/bies.201100062
  • 47. Orthologue Prediction Performance
      • Performance varies by choice of method and interpretation of “orthology”
      • Biggest influence is genome annotation quality
      • Relative performance varies with benchmark choice
      • (clustering) RBBH outperforms more complex algorithms under many circumstances
  • 48. Selection Pressures
      • Signs of selection pressure identifiable by comparative genomics
  • 49. Selection Pressures
      • Defining core groups of genes by “orthology” allows analysis of those groups:
          • Synteny/collocation
          • Gene neighbourhood changes (e.g. genome expansion)
          • The pangenome: core and accessory genomes
      • and sequences in those groups:
          • Multiple alignment
          • Domain detection
          • Identification of functional sites
          • Inference of evolutionary pressures
  • 50. Synteny
      • Selective pressures depend on gene (product) function
      • Genes involving physically or functionally-interacting proteins tend to evolve under similar selective constraints
      • Particularly in bacteria, this leads to co-expression as regulons and collocation in operons
      • Collocation (and coregulation) may be identified by comparative genomics
      • (This is also true when considering regulatory or metabolic networks, similarly to genome organisation)
      • Alvarez-Ponce et al. (2011) Genome Biol. Evol. doi:10.1093/gbe/evq084
  • 51. Synteny
      • Many tools/packages/services for synteny detection, e.g.
          • SyMAP: http://www.agcol.arizona.edu/software/symap/
          • i-ADHoRe: http://bioinformatics.psb.ugent.be/software/details/i--ADHoRe
          • MCScan, Cyntenator, etc.
      • Soderlund et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr123
      • Proost et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr955
  • 52. i-ADHoRe
      • Algorithm:
          1. Combine tandem repeats of genes/gene sets
          2. Make gene homology matrix (GHM): identify collinear regions (diagonals) for first genome pair
          3. Convert these to profiles
          4. Use GG2 algorithm to align profiles
          5. Search next genome with profiles, splitting them where necessary
          6. Iterate until complete
      • Gives genome-scale multiple alignments of blocks of genes
      • Proost et al. (2011) Nucl. Acids Res. doi:10.1093/nar/gkr955
  • 53. i-ADHoRe
      • [ACTIVITY]
          • i-ADHoRe/README.md (Markdown)
          • i-ADHoRe.ipynb (iPython notebook)
  • 54. Genome Expansion
      • Mobile/repeat elements reproduce and expand during evolution
      • Generates sequence “laboratory” for variation and experiment
      • e.g. Phytophthora infestans effector protein expansion and arms race
      • Haas et al. (2009) Nature. doi:10.1038/nature08358
  • 55. Genome Expansion
      • Mobile elements (MEs) are large, carry genes with them.
      • Regions rich in MEs have larger gaps between consecutive genes
      • Effector proteins are found preferentially in regions with large gaps, also show increased rates of evolutionary divergence.
      • “Two-speed genome” associated with adaptability to new hosts/escape from evolutionary “bottleneck”
      • Haas et al. (2009) Nature. doi:10.1038/nature08358
  • 56. The Pangenome
      • The gene complement of a set of organisms (e.g. species group) is the pangenome, defined by the union of two gene sets:
          • Core genes: genes present in all examples (define common species characteristics)
          • Accessory genes: genes only present in a subset of examples (relevant to adaptation of individuals)
      • Definition depends on composition of organism set
      • Core genome hypothesis:
          • “The core genome is the primary cohesive unit defining a bacterial species.”
      • Online tools available, e.g.
          • Panseq (http://lfz.corefacility.ca/panseq/)
      • Laing et al. (2010) BMC Bioinf. doi:10.1186/1471-2105-11-461
      • Lefébure et al. (2010) Genome Biol. Evol. doi:10.1093/gbe/evq048
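Once genes have been assigned to orthologue groups, the core/accessory split reduces to set operations. The sketch below assumes each genome is represented as a set of orthologue group identifiers; the function name `pangenome`, the strain names and the group labels are all hypothetical.

```python
def pangenome(gene_sets):
    """Split the pangenome of several genomes into core and accessory genes.

    `gene_sets` maps a genome name to the set of orthologue groups it carries.
    """
    genomes = [set(genes) for genes in gene_sets.values()]
    pan = set().union(*genomes)                      # present in any genome
    core = genomes[0].intersection(*genomes[1:])     # present in every genome
    accessory = pan - core                           # present in only some
    return pan, core, accessory


# Hypothetical orthologue group content of three strains
gene_sets = {
    "strain1": {"og1", "og2", "og3"},
    "strain2": {"og1", "og2", "og4"},
    "strain3": {"og1", "og2", "og3", "og5"},
}
pan, core, accessory = pangenome(gene_sets)
print(sorted(core), sorted(accessory))
```

Adding or removing a strain changes both sets, which is the sense in which the definition depends on the composition of the organism set.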
  • 57. Defining a species’ core genome
      • “Orthologue groups” with a representative in (nearly) every member of the set
      • But we only have a sample of the species, not every member…
      • …so use rarefaction curves to estimate core genome size.
          1. Randomly order organisms, and count number of ‘core’ and ‘new’ genes seen with each new genome addition.
          2. Repeat until you have a reasonable estimate of error/no new genes found
      • Lefébure et al. (2010) Genome Biol. Evol. doi:10.1093/gbe/evq048
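The rarefaction procedure can be sketched as repeated random orderings of the genomes. This is a simplified illustration: the function name `rarefaction`, the fixed number of permutations and the toy data are assumptions, and only mean values are returned, whereas a real analysis would also report the spread across permutations as the error estimate.

```python
import random


def rarefaction(gene_sets, n_permutations=100, seed=0):
    """Mean core size and mean number of new genes as genomes are added.

    `gene_sets` maps a genome name to the set of orthologue groups it carries.
    """
    rng = random.Random(seed)
    names = list(gene_sets)
    n = len(names)
    core_totals = [0] * n
    new_totals = [0] * n
    for _ in range(n_permutations):
        order = names[:]
        rng.shuffle(order)  # step 1: random genome order
        core, seen = None, set()
        for i, name in enumerate(order):
            genes = gene_sets[name]
            core = set(genes) if core is None else core & genes
            new_totals[i] += len(genes - seen)  # genes not seen so far
            seen |= genes
            core_totals[i] += len(core)
    mean_core = [total / n_permutations for total in core_totals]
    mean_new = [total / n_permutations for total in new_totals]
    return mean_core, mean_new


# Hypothetical orthologue group content of three strains
gene_sets = {
    "strain1": {"og1", "og2", "og3"},
    "strain2": {"og1", "og2", "og4"},
    "strain3": {"og1", "og2", "og3", "og5"},
}
mean_core, mean_new = rarefaction(gene_sets)
print(mean_core, mean_new)
```

Plotting `mean_core` against the number of genomes added gives the rarefaction curve; where it flattens suggests the core genome size for the sampled set.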
  • 58. Directional Selection
      • Several statistical tests for directional selection, e.g.
          • QTL sign
          • Ka/Ks (dN/dS) ratio test (most commonly applied)
          • Relative rate test
      • Ka/Ks ratio:
          • Ka (or dN): number of non-synonymous substitutions per non-synonymous site
          • Ks (or dS): number of synonymous substitutions per synonymous site
          • Ka/Ks > 1 ⇒ positive selection; Ka/Ks < 1 ⇒ stabilising selection
          • Several methods/tools for calculation
              • PAML (http://abacus.gene.ucl.ac.uk/software/paml.html)
              • SeqinR (http://cran.r-project.org/web/packages/seqinr/index.html)
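The arithmetic behind the ratio can be shown with a deliberately naive sketch. It assumes the substitution and site counts have already been obtained from a codon alignment, and it applies no correction for multiple substitutions at the same site, which dedicated tools do; the function names and the example counts are hypothetical.

```python
def ka_ks(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from raw substitution and site counts.

    Uncorrected proportions only: a simplification for illustration.
    """
    ka = nonsyn_subs / nonsyn_sites  # non-synonymous substitutions per site
    ks = syn_subs / syn_sites        # synonymous substitutions per site
    return ka, ks, ka / ks


def interpret(ratio):
    """Map a Ka/Ks ratio onto the slide's interpretation."""
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "stabilising selection"
    return "neutral"


# Hypothetical counts for one pair of orthologous coding sequences
ka, ks, ratio = ka_ks(nonsyn_subs=6, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(f"Ka={ka:.3f} Ks={ks:.3f} Ka/Ks={ratio:.2f}: {interpret(ratio)}")
```

A gene-wide ratio below 1 does not rule out positive selection acting on a few individual sites, which is one reason site-level models are often used in practice.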
  • 59. Genome-Wide Positive Selection
      • Lefébure & Stanhope (2009) Genome Res. doi:10.1101/gr.089250.108
  • 60. An Analysis Output
      • Class comparison: animal-pathogenic (APE) vs plant-associated bacteria (PAB)
      • Presence of horizontally-acquired islands (HAI)
      • Genes with greater similarity to PAB than APE
      • Toth et al. (2006) Annu. Rev. Phytopath. doi:10.1146/annurev.phyto.44.070505.143444
  • 61. Things I Didn’t Get To
      • Genome-Wide Association Studies (GWAS):
          • Try http://genenetwork.org/ to play with some data
      • Prediction of regulatory elements, e.g.
          • Kellis et al. (2003) Nature doi:10.1038/nature01644
          • King et al. (2007) Genome Res. doi:10.1101/gr.5592107
          • Chaivorapol et al. (2008) BMC Bioinf. doi:10.1186/1471-2105-9-455
          • CompMOBY: http://genome.ucsf.edu/compmoby/
      • Detection of Horizontal/Lateral Gene Transfer (HGT/LGT), e.g.
          • Tsirigos & Rigoutsos (2005) Nucl. Acids Res. doi:10.1093/nar/gki187
      • Phylogenomics, e.g.
          • Delsuc et al. (2005) Nat. Rev. Genet. doi:10.1038/nrg1603
  • 62. Finishing The Hat
      • Some of the things I hope you have taken away from the lectures/activities
  • 63. Take-Home Messages
      • Comparative genomics is a powerful set of techniques for:
          • Understanding and identifying evolutionary processes and mechanisms
          • Reconstructing detailed evolutionary history of a set of organisms
          • Identifying and understanding common genomic features of organisms
          • Providing hypotheses about gene function for experimental investigation
      • A huge amount of data is available to work with
          • And it’s only going to get much, much larger
      • Results feed into many areas of study:
          • Medicine and health
          • Agriculture and food security
          • Basic biology in all fields
          • Systems and synthetic biology
  • 64. Take-Home Messages
      • Comparative genomics is essentially based around comparisons
          • What is similar between two genomes? What is different?
      • Comparative genomics is evolutionary genomics
      • Large datasets benefit from visualisation for effective interpretation
          • Much scope for improvement in visualisation
      • Tools with the same purpose give different output
          • BLAST vs MUMmer
          • RBBH vs MCL
          • Choice of application matters for correctness and interpretation: understand what the application does, and its limits.
  • 65. Take-Home Messages
      • Comparative genomics is
          • Fun
          • Indoor work, in the warm and dry
          • Not a job that involves heavy lifting
  • 66. Credits
      • This slideshow is shared under a Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/)
      • Copyright is held by The James Hutton Institute (http://www.hutton.ac.uk)
      • You may freely use this material in research, papers, and talks so long as acknowledgement is made.