GMOD 2014 MAKER Lecture
Lecture for the MAKER2 Tutorial for the GMOD 2014 Summer Training

Published in: Science, Technology
  1. MAKER: The Genome Annotation Pipeline. GMOD Summer Course, May 19, 2014. Barry Moore/Carson Holt, Yandell Lab, University of Utah
  2. MAKER
     • The Annotation Problem
     • How MAKER Works
     • Why Choose MAKER
     • Working with MAKER
  3. What are Annotations? Functional / Structural
     Function: cAMP-dependent and sulfonylurea-sensitive anion transporter. Key gatekeeper influencing intracellular cholesterol transport.
     Subcellular location: Membrane; multi-pass membrane protein (Ref.13, Ref.14).
     Domain: Multifunctional polypeptide with two homologous halves, each containing a hydrophobic membrane-anchoring domain and an ATP binding cassette (ABC) domain.
  4. Genomes Online Database (http://www.genomesonline.org/). [Chart: Genome Project Status; Genomes (0 to 9000) by Year (1998 to 2012); series: Incomplete, Complete]
  5. http://www.genome.gov/
  6. http://www.genome.gov/ [Chart: scale 0 to 8,000; axis labels not preserved]
  7. Next Gen Genome Annotation 2013-14
     • Coelacanth
     • Pine
     • Sacred Lotus
     • Conus bullatus
     • Pigeon
     • King Cobra
     • Hymenopterids
     • Fusarium circinatum
     • Cardiocondyla obscurior
     • Burmese Python
     • Sarcocystis neurona
     • Spotted Gar
     • Apple maggot fly
  8. The ‘NextGen’ Genome Project
     • Lab/Small Group Funding
     • Short-read Genome Sequencing
     • RNASeq Data
     • Genome/Transcriptome Assembly
     • Gene Annotation
     • Genome Database / Blast Server
     • Manual curation
     • New assembly
     • Reannotate/Merge annotations
  9. MAKER
     • The Annotation Problem
     • How MAKER Works
     • Why Choose MAKER
     • Working with MAKER
  10. The Source of Annotations: RNA and Protein Evidence; Accurate Gene Annotations; Ab Initio Computational Evidence
  11. Annotating the Genome – Apollo View. Tracks: current evidence; gene annotations; genome assembly. http://apollo.berkeleybop.org/
  12. Identify and mask repetitive elements. Tracks: current evidence; genome assembly. http://www.repeatmasker.org
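The masking step on slide 12 can be illustrated with a small sketch. It assumes repeat coordinates are already known (for example, reported by a repeat finder) and shows soft-masking, where repeat bases are written in lowercase. The function name and the interval convention (0-based, half-open) are illustrative assumptions, not taken from the slides or from RepeatMasker's output format.

```python
def soft_mask(sequence, repeat_intervals):
    """Lowercase the bases inside each 0-based, half-open (start, end) interval.

    Bases outside every interval are returned in uppercase.
    """
    bases = list(sequence.upper())
    for start, end in repeat_intervals:
        # Clamp to the sequence bounds so stray coordinates cannot raise.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower()
    return "".join(bases)


# Hypothetical toy sequence with two repeat intervals.
masked = soft_mask("ACGTACGTACGT", [(2, 5), (8, 10)])
print(masked)
```

Soft-masking keeps the sequence intact, so downstream tools can choose whether to ignore the lowercase regions.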
  13. Generate ab initio gene predictions (SNAP, GeneMark, Augustus). Tracks: ab initio predictions; current evidence; genome assembly. http://korflab.ucdavis.edu/
  14. Align RNA and protein evidence. Tracks: ab initio predictions; protein - BLASTX; EST - BLASTN; altEST - TBLASTX; current evidence; genome assembly. http://blast.ncbi.nlm.nih.gov
  15. Polish BLAST alignments with Exonerate. Tracks: ab initio predictions; polished protein; polished EST; current evidence; genome assembly. http://www.ebi.ac.uk/~guy/exonerate/
  16. Pass evidence-based ‘hints’ to the gene finders. Tracks: current evidence; ab initio predictions; hint-based SNAP; hint-based Augustus; genome assembly.
  17. Identify the gene model most consistent with the evidence. Tracks: current evidence; ab initio predictions*; hint-based SNAP; hint-based Augustus; genome assembly.
  18. Revise further if necessary; create new annotation. Tracks: current evidence; ab initio predictions; genome assembly.
  19. Compute support for each portion of the gene model. Track: genome assembly. Eilbeck et al., BMC Bioinformatics 2009
  20. Compute support for each portion of the gene model. Track: genome assembly. Cantarel BL et al., Genome Res 2008
  21. MAKER2 Workflow
  22. MAKER2 Distributed Workflow
  23. Parallelization Efficiency. 30 GB Pine genome annotated in 37 hrs on 6,000 CPUs at the TACC. Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.
  24. MAKER
     • The Annotation Problem
     • How MAKER Works
     • Why Choose MAKER
     • Working with MAKER
  25. MAKER: The Genome Annotation Pipeline. GMOD Summer Course, May 19, 2014. Barry Moore/Carson Holt, Yandell Lab, University of Utah
  26. MAKER2 Use Cases
     1. De novo annotation providing quality metrics
     2. Merging multiple annotation sets
     3. Re-annotation with new evidence
     4. Mapping annotations forward to a new assembly
     5. Generating GMOD-compliant output
        1. GBrowse/JBrowse
        2. Apollo
        3. Tripal
  27. Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality. Gold Standard Genes
  28. Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality. Gold Standard Genes
      SN   SP   AC
      1.0  1.0  100%  Perfect Accuracy
  29. Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality. Gold Standard Genes
      SN   SP   AC
      1.0  1.0  100%  Perfect Accuracy
      1.0  0.5  80%   Poor Specificity
  30. Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality. Gold Standard Genes
      SN   SP   AC
      1.0  1.0  100%  Perfect Accuracy
      1.0  0.5  80%   Poor Specificity
      0.5  1.0  80%   Poor Sensitivity
  31. Sensitivity, Specificity, Accuracy as a Measure of Annotation Quality. Gold Standard Genes
      SN   SP   AC
      1.0  1.0  100%  Perfect Accuracy
      1.0  0.5  80%   Poor Specificity
      0.5  1.0  80%   Poor Sensitivity
      0.5  0.5  50%   Poor Specificity and Sensitivity
      Guigó R et al. Genome Biol. 2006
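A minimal sketch of how SN and SP in the table above might be computed at the nucleotide level. It assumes the convention common in gene-prediction benchmarks, where sensitivity is the fraction of gold-standard bases that are predicted and specificity is the fraction of predicted bases that are in the gold standard. The function names and the toy coordinates are illustrative; the AC column is not computed because the slide does not give its formula.

```python
def covered(intervals):
    """Return the set of base positions covered by half-open (start, end) intervals."""
    return {pos for start, end in intervals for pos in range(start, end)}


def sensitivity_specificity(predicted_exons, reference_exons):
    """Nucleotide-level SN and SP of a prediction against a gold standard.

    SN = shared bases / reference bases
    SP = shared bases / predicted bases (the gene-prediction convention)
    """
    pred = covered(predicted_exons)
    ref = covered(reference_exons)
    shared = len(pred & ref)
    sn = shared / len(ref) if ref else 0.0
    sp = shared / len(pred) if pred else 0.0
    return sn, sp


# Hypothetical gold-standard gene with two exons of 100 bases each.
reference = [(100, 200), (300, 400)]

# Prediction that finds both exons but adds a spurious 200-base exon.
print(sensitivity_specificity([(100, 200), (300, 400), (500, 700)], reference))

# Prediction that finds only the first exon.
print(sensitivity_specificity([(100, 200)], reference))
```

The first call corresponds to the "Poor Specificity" situation (everything real is found, but half of what is predicted is wrong) and the second to "Poor Sensitivity".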
  32. MAKER vs. Predictors. Holt C, Yandell M. BMC Bioinformatics. 2011
  33. MAKER vs. Predictors (the wrong HMM...). Holt C, Yandell M. BMC Bioinformatics. 2011 12:491.
  34. Annotation Edit Distance. Gold Standard Genes; Gold Standard Evidence: Protein Alignments, EST Alignments, mRNASeq. Eilbeck et al., BMC Bioinformatics 2009
  35. Annotation Edit Distance. Gold Standard Evidence
      SN   SP   AED
      1.0  1.0  0.0  Perfect Accuracy
      1.0  0.5  0.2  Poor Specificity
      0.5  1.0  0.2  Poor Sensitivity
      0.5  0.5  0.5  Poor Specificity and Sensitivity
      Eilbeck et al., BMC Bioinformatics 2009
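As a rough sketch of the idea behind Annotation Edit Distance, the snippet below assumes the definition AED = 1 - (SN + SP) / 2, with SN and SP measured against aligned evidence rather than gold-standard genes. The function names and coordinates are illustrative. The examples reproduce only the first and last rows of the table above; the sketch makes no claim about how the slide's other values were obtained.

```python
def covered(intervals):
    """Return the set of base positions covered by half-open (start, end) intervals."""
    return {pos for start, end in intervals for pos in range(start, end)}


def annotation_edit_distance(annotation_exons, evidence_alignments):
    """AED under the assumed definition 1 - (SN + SP) / 2.

    0.0 means the annotation and the evidence agree base for base;
    1.0 means they share no bases (or one of them is empty).
    """
    ann = covered(annotation_exons)
    ev = covered(evidence_alignments)
    if not ann or not ev:
        return 1.0
    shared = len(ann & ev)
    sn = shared / len(ev)   # fraction of evidence bases that are annotated
    sp = shared / len(ann)  # fraction of annotated bases with evidence
    return 1.0 - (sn + sp) / 2.0


# Annotation identical to its evidence.
print(annotation_edit_distance([(0, 100)], [(0, 100)]))

# Annotation and evidence each half-overlapping the other.
print(annotation_edit_distance([(0, 100)], [(50, 150)]))
```

Because the score needs only the annotation and the evidence, it can be computed for genomes that have no curated gold-standard gene set.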
  36. AED as a Measure of Genome-Wide Annotation Quality. Eilbeck et al., BMC Bioinformatics 2009
  37. TAIR Star Rating System. http://www.arabidopsis.org/
  38. AED Agrees Well with the TAIR Star System. Evidence: mRNA-seq (17 experiments), ESTs, full length cDNAs, Swiss-Prot (minus Arabidopsis). [Chart: Cumulative Fraction of Annotations (0 to 1) vs. AED (0 to 1); series: ***** (7,880), **** (12,654), *** (2,087), ** (2,188), * (1,788), no stars (604)]
  39. AED as a Measure of Annotation Quality. Holt C, Yandell M. BMC Bioinformatics. 2011
  40. MAKER Annotations Match the Evidence Well. [Two charts: Cumulative Fraction of Annotations (0 to 1) vs. AED (0 to 1)]
      A. thaliana: TAIR10 rep transcripts (27,206); MAKER de novo (25,956); MAKER update of TAIR10 (26,885)
      Z. mays: chr10 rep transcripts (2,688); MAKER de novo (3,056); MAKER update of v3 (2,661)
      Campbell et al, 2013 submitted
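The cumulative AED curves on slides 38 and 40 can be approximated from a GFF3 file. This sketch assumes that each mRNA feature carries its score as an `_AED=` attribute in the ninth column; the sample lines, IDs, and function names are hypothetical and stand in for real output.

```python
import re

# Match "_AED=<number>" at the start of the attribute column or after a ";".
AED_PATTERN = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff3_lines):
    """Collect the _AED attribute from every mRNA feature line."""
    values = []
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_PATTERN.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def cumulative_fraction(values, thresholds=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Fraction of annotations with AED at or below each threshold."""
    if not values:
        return {t: 0.0 for t in thresholds}
    return {t: sum(v <= t for v in values) / len(values) for t in thresholds}


# Hypothetical GFF3 lines; real files would be read from disk.
sample = [
    "##gff-version 3",
    "\t".join(["chr1", "maker", "mRNA", "1", "900", ".", "+", ".",
               "ID=t1;Parent=g1;_AED=0.10;_eAED=0.12"]),
    "\t".join(["chr1", "maker", "mRNA", "1500", "2400", ".", "-", ".",
               "ID=t2;Parent=g2;_AED=0.60;_eAED=0.55"]),
    "\t".join(["chr1", "maker", "exon", "1", "300", ".", "+", ".",
               "ID=t1:exon1;Parent=t1"]),
]
print(cumulative_fraction(aed_values(sample)))
```

A curve that rises steeply near AED = 0 indicates that most annotations agree closely with the evidence.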
  41. Protein Domain Content as a Measure of Annotation Quality. Holt C, Yandell M. BMC Bioinformatics. 2011
  42. MAKER vs. Predictors. Holt C, Yandell M. BMC Bioinformatics. 2011
  43. MAKER
     • The Annotation Problem
     • How MAKER Works
     • Why Choose MAKER
     • Working with MAKER
  44. http://derringer.genetics.utah.edu/cgi-bin/mwas/maker.cgi
  45. MAKER Installation
     • Automated query/answer based installation script.
     • Installs Perl prerequisites.
     • Installs necessary executables
       – RepeatMasker (RepBase)
       – BLAST+
       – Exonerate
       – SNAP
     • Even installs MWAS and MPICH2
  46. MAKER Runtime Features
     • Fill out a config file with input data and parameters
     • Parallelize:
       – Running with MPI
       – Simply start multiple instances in the same directory.
     • Re-run MAKER in the same directory and it won't redo completed work.
     • Restart aborted jobs without losing any work.
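For scripted runs, filling out the config file can be automated. The sketch below assumes the control file uses `key=value #comment` lines and that `genome`, `est`, and `protein` are among its keys. The template text is an abbreviated stand-in rather than a real control file, and the helper name and file names are hypothetical.

```python
def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the value of each key in `updates` replaced.

    Lines are expected to look like `key=value #comment`. Trailing comments
    are kept, and lines whose key is not in `updates` pass through unchanged.
    """
    out = []
    for line in ctl_text.splitlines():
        body, hash_mark, comment = line.partition("#")
        key, equals, _old_value = body.partition("=")
        key = key.strip()
        if equals and key in updates:
            suffix = f" #{comment}" if hash_mark else ""
            line = f"{key}={updates[key]}{suffix}"
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated stand-in for a control file template (not the real file).
template = """#-----Genome
genome= #genome sequence (fasta file)
#-----Evidence
est= #ESTs or assembled mRNA-seq (fasta file)
protein= #protein sequences (fasta file)
"""

# Hypothetical input file names.
print(set_ctl_options(template, {
    "genome": "my_assembly.fasta",
    "est": "my_transcripts.fasta",
    "protein": "related_proteins.fasta",
}))
```

Keeping the comments in place means the generated file stays readable next to a hand-edited one.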
  47. Accessory Scripts. Over 30 accessory scripts:
      cegma2zff, chado2gff3, cufflinks2gff3, gff3_2_gtf, gff3_preds2models, gff3_to_eval_gtf, maker2chado, maker2jbrowse, maker2zff, tophat2gff3,
      compare, evaluator, gff3_merge, fasta_merge, fasta_tool, fix_fasta, genemark_gtf2gff3, ipr_update_gff, iprscan2gff3, iprscan_batch,
      iprscan_wrap, maker_functional, maker_functional_fasta, maker_functional_gff, maker_map_ids, map2assembly, map_data_ids, map_fasta_ids, map_gff_ids, split_fasta
  48. MAKER
     • The Annotation Problem
     • How MAKER Works
     • Why Choose MAKER
     • Working with MAKER
  49. Acknowledgements
     • Mark Yandell
       – Carson Holt
       – Mike Campbell
       – Daniel Ence
       – Steven Flygare
       – Zev Kronenberg
       – Qing Li
       – Marc Singleton
       – Bretty Kennedy
       – Brandi Cantarel
       – Hadi Islam
       – Shawn Reynearson
       – Nicole Ruiz
       – Keith Simmons
       – Bret Heale
     • Alejandro Alvarado
       – Eric Ross
     • Jason Stajich
     • Sophia Robb
     • Kevin Childs
     • Shin-Han Shui
     • Ning Jiang
     • Yanni Sun
