The first near-complete assembly of
the hexaploid bread wheat genome,
Triticum aestivum
Daniela Puiu
Aleksey Zimin, Richard Hall, Sarah Kingan, Bernardo Clavijo, Steven Salzberg
ICG-12
Oct 27 2017
ICG-12: The Wheat Genome
Sequencing and Assembly of the
Ancestral and Common Wheat
Aegilops tauschii ssp. strangulata, accession AL8/78
Chinese Spring variety (CS42, accession Dv418)
2013-2017
History of Wheat
~8,000 years ago: spontaneous hybridization
Emmer wheat + goat grass = bread wheat (the world's 3rd-largest cereal crop)
Triticum turgidum + Aegilops tauschii = Triticum aestivum
AABB + DD = AABBDD
Whole Genome => Assisted Breeding => Improved Yield
The Wheat Genome
One of the most complex genomes!
1) Genome size: over 15 billion bases
2) Allohexaploid: six sets of chromosomes (2n = 6x = 42), two copies each of the A, B and D subgenomes
3) >90% repeats
Multiple past assembly attempts => assemblies shorter than the estimated genome size.
New vs Previous Assemblies
[Figure: N50 of the new Triticum 3.1 assembly (232K) compared with previous wheat assemblies]
Data Reduction
Original reads   Number   Total bases   Coverage   Accuracy
Illumina         7.06G    1 Tb          65x        99.5%
PacBio           55.5M    545 Gb        36x        87.5%

Processed seq.   Number   Total bases   Coverage   Accuracy
super-reads      95.7M    31 Gb         2x         99.95%
mega-reads       57M      278 Gb        18x        99.65%

Processing step: MaSuRCA mega-reads hybrid correction
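As a quick consistency check on the table, coverage is total bases divided by genome size. The sketch below assumes a genome size equal to the 15,344,693,583 bp total length of Triticum 3.1 (from the assembly statistics later in the talk); the talk's own size estimate may differ slightly. Under that assumption the computed depths round to the 65x, 36x, 2x and 18x shown above.

```python
# Coverage (average sequencing depth) = total bases / genome size.
# ASSUMPTION: genome size = total length of the Triticum 3.1 assembly.
GENOME_SIZE_BP = 15_344_693_583

# Total bases per data set, taken from the Data Reduction table.
TOTAL_BASES = {
    "Illumina": 1e12,      # 1 Tb
    "PacBio": 545e9,       # 545 Gb
    "super-reads": 31e9,   # 31 Gb
    "mega-reads": 278e9,   # 278 Gb
}


def coverage(total_bases: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average number of times each genome base is sequenced."""
    return total_bases / genome_size


for name, bases in TOTAL_BASES.items():
    print(f"{name:<12} {coverage(bases):5.1f}x")

# Size of each processed data set relative to the raw data it came from.
print(f"PacBio -> mega-reads:    {TOTAL_BASES['mega-reads'] / TOTAL_BASES['PacBio']:.0%}")
print(f"Illumina -> super-reads: {TOTAL_BASES['super-reads'] / TOTAL_BASES['Illumina']:.1%}")
```

The last two lines make the "data reduction" concrete: the mega-reads carry roughly half the bases of the raw PacBio data, and the super-reads about 3% of the raw Illumina bases.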
MaSuRCA mega-reads Correction
Assembly Pipeline
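Hybrid branch: Illumina + PacBio => MaSuRCA correction => mega-reads => Celera WGS Assembler => Triticum 1.0 => remove duplicates => Triticum 2.0
PacBio branch: PacBio => FALCON correction => pReads => FALCON Assembler => FALCON Trit 0.5 => Arrow polishing => FALCON Trit 1.0
Final step: Triticum 2.0 + FALCON Trit 1.0 => k-mer analysis and merge => Triticum 3.1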
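The flowchart is essentially a dependency graph, and encoding it that way makes the order of steps explicit. The sketch below uses Python's standard-library `graphlib` (3.9+). The edges are our reading of the flattened slide (for example, that duplicate removal turns Triticum 1.0 into Triticum 2.0, and that Arrow polishes with the raw PacBio reads), not an official definition of the authors' pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical encoding of the slide's flowchart: each step maps to the set of
# steps it depends on. Step names follow the slide; edges are an assumption.
PIPELINE = {
    # Hybrid branch
    "MaSuRCA correction": {"Illumina reads", "PacBio reads"},
    "mega-reads": {"MaSuRCA correction"},
    "Celera WGS Assembler": {"mega-reads"},
    "Triticum 1.0": {"Celera WGS Assembler"},
    "remove duplicates": {"Triticum 1.0"},
    "Triticum 2.0": {"remove duplicates"},
    # PacBio-only branch
    "FALCON correction": {"PacBio reads"},
    "pReads": {"FALCON correction"},
    "FALCON Assembler": {"pReads"},
    "FALCON Trit 0.5": {"FALCON Assembler"},
    "Arrow polishing": {"FALCON Trit 0.5", "PacBio reads"},
    "FALCON Trit 1.0": {"Arrow polishing"},
    # Combining the two assemblies
    "k-mer analysis": {"Triticum 2.0", "FALCON Trit 1.0"},
    "merge": {"Triticum 2.0", "FALCON Trit 1.0", "k-mer analysis"},
    "Triticum 3.1": {"merge"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(PIPELINE).static_order())
for i, step in enumerate(order, 1):
    print(i, step)
```

Any valid order starts from the two read sets and ends with Triticum 3.1; the two branches are independent until the merge, which is why they could run in parallel.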
k-mer Analysis
[Figure: 31-mer frequency histogram (y-axis up to 50M k-mers) of k-mers missing from the PacBio assembly only]
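The comparison in this figure can be sketched as set operations on k-mers: count k-mers in the reads, keep those seen often enough to be unlikely sequencing errors, and histogram the ones absent from the PacBio assembly but present in the hybrid one. This is a minimal sketch under those assumptions; the function names, the `min_count` threshold and the toy sequences are hypothetical, and only the forward strand is counted for simplicity.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def read_kmer_counts(reads, k: int) -> Counter:
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def missing_only_from(target_asm, other_asm, read_counts: Counter, k: int,
                      min_count: int = 2) -> Counter:
    """Histogram {read frequency: number of k-mers} for read k-mers seen at
    least min_count times that are absent from target_asm but present in
    other_asm (both given as lists of contig sequences)."""
    in_target = set()
    for contig in target_asm:
        in_target.update(kmers(contig, k))
    in_other = set()
    for contig in other_asm:
        in_other.update(kmers(contig, k))
    hist = Counter()
    for kmer, n in read_counts.items():
        if n >= min_count and kmer not in in_target and kmer in in_other:
            hist[n] += 1
    return hist


# Toy data with k = 4 (the talk used 31-mers).
reads = ["ACGTTGCAAT", "ACGTTGCAAT"]
pacbio_asm = ["ACGTTG"]        # hypothetical PacBio assembly
hybrid_asm = ["ACGTTGCAAT"]    # hypothetical hybrid assembly
print(missing_only_from(pacbio_asm, hybrid_asm, read_kmer_counts(reads, 4), 4))
```

At wheat scale (billions of distinct 31-mers) Python sets would not fit in memory; a dedicated k-mer counter such as Jellyfish, which MaSuRCA itself uses, is the practical choice.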
Assembly Merge
[Figure: merging of the hybrid and PacBio assemblies; a Triticum 2.0 contig, FALCON contigs A and B, and the resulting Triticum 3.1 contig, with overlaps labeled >5Kb]
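One plausible reading of the schematic is that a hybrid (Triticum 2.0) contig overlapping the end of FALCON contig A and the start of FALCON contig B by more than 5 Kb can bridge the two into a single contig. The toy sketch below implements that rule with exact suffix/prefix matching. Real contigs contain errors, so an actual merge would rely on error-tolerant alignments rather than exact string comparison; the function names and the `min_overlap` default are illustrative assumptions, not the authors' code.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest exact overlap where a suffix of `left` equals a
    prefix of `right`, or 0 if no overlap of at least `min_len` exists."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_via_bridge(contig_a: str, bridge: str, contig_b: str,
                     min_overlap: int = 5_000):
    """Hypothetical merge rule: join A and B through a bridging contig when the
    bridge overlaps A's end and B's start by >= min_overlap bases each.
    Returns the merged sequence, or None if either overlap is too short."""
    ov_a = suffix_prefix_overlap(contig_a, bridge, min_overlap)
    ov_b = suffix_prefix_overlap(bridge, contig_b, min_overlap)
    if ov_a == 0 or ov_b == 0:
        return None
    # A, then the part of the bridge not already in A,
    # then the part of B not already in the bridge.
    return contig_a + bridge[ov_a:] + contig_b[ov_b:]


# Toy example with 4-base overlaps standing in for 5 Kb.
print(merge_via_bridge("AAAACCCC", "CCCCGGGGTTTT", "TTTTACGT", min_overlap=4))
```

Requiring a long overlap on both sides is the key design choice: in a genome that is over 90% repeats, short overlaps are likely to be coincidental matches between different repeat copies.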
Assembly Statistics
Assembly          Number    Total size (bp)    N50 size (bp)
Triticum 2.0      375,328   14,395,027,822     75,599
FALCON Trit 1.0   97,809    12,939,100,857     215,314
Triticum 3.1      279,439   15,344,693,583     232,659
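N50, the contiguity measure in the last column, is computed by sorting sequences from longest to shortest and reporting the length at which the running total first reaches half of the assembly's total size. A minimal sketch, assuming the input is a plain list of sequence lengths:

```python
def n50(lengths) -> int:
    """Largest length L such that sequences of length >= L together contain
    at least half of the total assembly length (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 is measured against each assembly's own total size (here 12.9 Gb for FALCON Trit 1.0 versus 15.3 Gb for Triticum 3.1), comparisons between assemblies of different lengths are sometimes made with NG50 instead, which substitutes the estimated genome size for the assembly total.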
Run Time: 100 CPU years
Main step     CPU hours   Wall time (months)
MaSuRCA       100K        1.5
Celera WGS    470K        5
FALCON        150K        0.75
ARROW         160K        0.75
Total         880K        9
100K CPU hours = 11.4 CPU years
880K CPU hours = 100 CPU years
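The conversions on this slide follow from dividing CPU hours by the 8,760 hours in a non-leap year. The sketch below reproduces them and, assuming an average month of about 730 hours, also estimates how many CPUs each step kept busy on average; that parallelism figure is derived here from the table, not reported in the talk.

```python
# Convert CPU hours to CPU years (one CPU running for a year = 24 * 365 hours).
HOURS_PER_YEAR = 24 * 365              # 8,760 h; ignores leap days
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # ~730 h; ASSUMPTION: average month

# (CPU hours, wall-clock months) per step, from the Run Time table.
STEPS = {
    "MaSuRCA": (100_000, 1.5),
    "Celera WGS": (470_000, 5),
    "FALCON": (150_000, 0.75),
    "ARROW": (160_000, 0.75),
}

total_cpu_hours = sum(cpu for cpu, _ in STEPS.values())
print(total_cpu_hours)                             # 880000
print(round(total_cpu_hours / HOURS_PER_YEAR, 1))  # 100.5
print(round(100_000 / HOURS_PER_YEAR, 1))          # 11.4

# Average number of CPUs busy during each step = CPU hours / wall-clock hours.
for name, (cpu, months) in STEPS.items():
    print(f"{name:<11} ~{cpu / (months * HOURS_PER_MONTH):.0f} CPUs on average")
```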
Genome Repetitiveness
[Figure: k-mer uniqueness ratios for wheat, fly, cow, rice, pine and Ae. tauschii]
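The slide does not define the k-mer uniqueness ratio; a common reading, assumed here, is the fraction of k-mer positions in a genome whose k-mer occurs exactly once. Under that definition a more repetitive genome scores lower at the same k, and the ratio tends to rise as k grows because longer k-mers are less likely to recur. The sketch counts forward-strand k-mers only, for simplicity; the function name is hypothetical.

```python
from collections import Counter


def kmer_uniqueness_ratio(genome: str, k: int) -> float:
    """Fraction of k-mer positions in `genome` whose k-mer occurs exactly once
    (forward strand only). Returns 0.0 if the genome is shorter than k."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    positions = sum(counts.values())
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / positions if positions else 0.0


# "ACGT" appears twice at k = 4, so 3 of the 5 positions are unique.
print(kmer_uniqueness_ratio("ACGTACGT", 4))
print(kmer_uniqueness_ratio("ACGTACGT", 5))
```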
Publication
Conclusions
The most challenging genome (we) assembled!
Learning experience!
Assembly quality vs computational resources?
Share your data!
Acknowledgements
Johns Hopkins University: Steven Salzberg, Aleksey Zimin
UC Davis Plant Sciences: Jan Dvorak, Mingcheng Luo
Earlham Institute: Bernardo Clavijo
