Assembling the Norway Spruce genome :
     20 Gb and many challenges

               Douglas G. Scofield

   Umeå Plant Science Centre, Umeå University
Picea abies: A Very Large Genome
• 20 Gb, nearly 7x human
   • typical for conifers
• Little known about gene and genome structure
• Only ~0.1 % consists of genes
• Largest genome sequenced so far
• Other ongoing conifer sequencing projects
   • Picea glauca (Canada)
   • Pinus taeda (US)




Genome sizes: Arabidopsis (120 Mbp), Poplar (450 Mbp), Humans (3 000 Mbp), Norway spruce (20 000 Mbp)
Sequencing overview
2n needle tissue
• Mate-pair libraries: 2kbp, 4kbp and 10kbp inserts
• Fosmid ends (40kbp)
• ~50x input coverage
1n megagametophytes
• Three megagametophytes from the reference tree
• ~30-35x input coverage for each
• plus, fosmid pools…
Illumina paired-end
• 150-bp insert
• 300-bp insert
• 650-bp insert
454 single-end (1.5x)
454 and Illumina transcriptomes
Fosmid pools: the plan
Fosmid library (~40kbp / fosmid) → Pool
~1000 fosmids / pool               = ~40 Mbp @ 1n, much simpler assembly
8 pools / HiSeq lane               = 60x coverage with 300-bp PE reads
500 pools                          = 1x genome coverage
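The arithmetic behind the plan can be checked with a short sketch. It uses only the figures on the slide (fosmid size, fosmids per pool, pools per lane, target coverage, and the ~20 Gbp genome size). The function names are illustrative, and the lane yield is back-calculated from those figures, not a number stated in the talk.

```python
# Back-of-the-envelope check of the fosmid-pool plan.
# All constants are taken from the slides; function names are illustrative.
FOSMID_BP = 40_000            # ~40 kbp per fosmid
FOSMIDS_PER_POOL = 1_000      # ~1000 fosmids per pool
GENOME_BP = 20_000_000_000    # ~20 Gbp Norway spruce genome
POOLS_PER_LANE = 8            # pools multiplexed on one HiSeq lane
TARGET_COVERAGE = 60          # planned coverage per pool


def pool_size_bp():
    """Unique sequence per pool, assuming no overlap between fosmids."""
    return FOSMIDS_PER_POOL * FOSMID_BP


def pools_for_genome_coverage(coverage=1.0):
    """Number of pools whose combined length equals `coverage` x genome."""
    return coverage * GENOME_BP / pool_size_bp()


def implied_lane_yield_bp():
    """Lane yield implied by 8 pools at 60x each (a derived value)."""
    return POOLS_PER_LANE * pool_size_bp() * TARGET_COVERAGE


print(pool_size_bp() / 1e6)            # 40.0 (Mbp per pool)
print(pools_for_genome_coverage())     # 500.0 (pools for 1x)
print(implied_lane_yield_bp() / 1e9)   # 19.2 (Gbp per lane, implied)
```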
Fosmid pools: the reality
40% reads lost to E. coli, and 20-50% mitochondrial
~0.5x realized genome coverage
BUT … low redundancy among pools, and assembly is easier
[Figure: WGS assembly (Mbp) plotted against cumulated assembly of fosmid pools (Mbp); inset panels show % recovered for >5kbp, >10kbp and >20kbp]
Putting it all together
• WGS (1n) assembly: CLC
• Fosmid pool assemblies: CLC + BESST
• Merged: GAM, the Genomic Assemblies Merger (Casagrande et al.)
• Scaffolding: BESST, a fast, lightweight scaffolder developed at SciLifeLab
• Final: multiple rounds of GAM, scaffolding with transcripts, …
Assembly quality
•   >80% of genome in scaffolds and contigs
•   14% total genome length in scaffolds >10Kb (vs. 1%)
•   N50 8800 bp (vs. 2900 bp)
•   Longest contig 1.15 Mb (vs. 206 Kb)
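For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of the usual definition: the largest length L such that sequences of length at least L together cover half or more of the total assembled length. The toy lengths are hypothetical and are not spruce data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: total 30, half is 15; 10 + 8 = 18 >= 15.
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))  # 8
```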




[Figure: Feature Response Curves, quantifying quality using configurations of mapped paired-end reads; axis label: accumulated length of contigs]
Now for a little spruce biology…
• Genetic diversity
• Low gene content
• Allele splitting
• Repeat content
Genetic diversity
• ε = 2.46 × 10⁻³ errors/read-base
    – sequencing error + misaligned reads
• Θ = 9.47 × 10⁻³ ± 0.007 × 10⁻³
    – scaled mutation rate, Θ = 4Neμ
• Ne ≈ 1.0 – 1.7 × 10⁶
    – using LTR-based estimates of μ
• Ho ≈ 0.27%
    – 1.42 Gbp examined
[Figure: zygosity correlation Δ (0.000 to 0.008) against distance between sites (50 to 700 bp)]
[Figure: heterozygous sites, count (Mb) by IUPAC code (genotype): W (AT), S (CG), K (GT), M (AC), R (AG), Y (CT)]
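The relation Θ = 4Neμ on this slide links the three quantities. The talk derives Ne from Θ and an LTR-based estimate of μ. The sketch below runs the same relation in the other direction, back-calculating the μ implied by the quoted Θ and the two ends of the Ne range. These μ values are derived here as a consistency check and are not figures stated in the talk. Units follow whatever Θ is expressed in, which is assumed here to be per site.

```python
# Consistency check on theta = 4 * Ne * mu using the slide's numbers.
THETA = 9.47e-3                  # scaled mutation rate from the slide
NE_LOW, NE_HIGH = 1.0e6, 1.7e6   # effective population size range from the slide


def implied_mu(theta, ne):
    """Solve theta = 4 * Ne * mu for mu."""
    return theta / (4.0 * ne)


for ne in (NE_LOW, NE_HIGH):
    print(f"Ne = {ne:.1e} -> implied mu = {implied_mu(THETA, ne):.2e}")
# Ne = 1.0e+06 -> implied mu = 2.37e-09
# Ne = 1.7e+06 -> implied mu = 1.39e-09
```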
Low gene content
Random promoter sequences…
• 12 bp : 730 locations
• 8 bp : ~220,000 locations
Frequent aberrant transcription?
Chromatin structure strongly controlled?
Genes clustered?
Are there inefficiencies in transcript processing?
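The slide does not say how the counts for random promoter-length sequences were obtained. As a rough illustration of why short motifs turn up by chance so often in a genome of this size, the sketch below computes the expected number of matches to one fixed motif under a deliberately simple model: independent, equally frequent bases, one strand, 20 Gbp. That model is an assumption made here. It gives values of the same order of magnitude as the slide's, not the same numbers.

```python
def expected_hits(genome_bp, motif_len, alphabet=4):
    """Expected number of start positions matching one fixed motif,
    assuming independent, equally frequent bases on a single strand."""
    positions = genome_bp - motif_len + 1
    return positions / alphabet ** motif_len


GENOME_BP = 20_000_000_000  # ~20 Gbp, from the talk

for k in (12, 8):
    print(k, round(expected_hits(GENOME_BP, k)))
# 12 1192
# 8 305176
```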
Allele splitting
[Figure panels: median coverage of 2n contigs; median coverage of 1n contigs]
Identifying allele splitting: self-self blasts
Repeat content in contig ends vs. middles
20-mers appearing in 100-bp segments from 100K contigs




         Gepard: http://www.helmholtz-muenchen.de/en/mips/services/analysis-tools/gepard/index.html
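One way to compare repeat content in contig ends and middles with 20-mers in 100-bp segments is sketched below. It is an assumed reading of the slide, not the analysis used in the talk. It takes the first 100 bp and the central 100 bp of each contig, counts k-mers across all segments of each kind, and reports the fraction of k-mer occurrences that belong to a k-mer seen more than once. All function names are illustrative, and the toy contigs use tiny segment and k-mer sizes so the result can be checked by hand.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def end_and_middle(contig, seg_len=100):
    """Return (left-end segment, central segment), or None if the contig
    is too short to give clearly separated segments."""
    if len(contig) < 3 * seg_len:
        return None
    mid_start = (len(contig) - seg_len) // 2
    return contig[:seg_len], contig[mid_start:mid_start + seg_len]


def repeated_fraction(segments, k=20):
    """Fraction of k-mer occurrences whose k-mer appears more than once
    across all the given segments."""
    counts = Counter(km for seg in segments for km in kmers(seg, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for c in counts.values() if c > 1) / total


def compare_ends_vs_middles(contigs, seg_len=100, k=20):
    """Return (repeated fraction in ends, repeated fraction in middles)."""
    pairs = [p for p in (end_and_middle(c, seg_len) for c in contigs) if p]
    ends = [end for end, _ in pairs]
    middles = [mid for _, mid in pairs]
    return repeated_fraction(ends, k), repeated_fraction(middles, k)


# Hypothetical toy contigs: identical 4-bp ends, distinct middles.
toy = ["ACGT" + "TTGA" + "CCCC", "ACGT" + "GGAC" + "AAAA"]
print(compare_ends_vs_middles(toy, seg_len=4, k=3))  # (1.0, 0.0)
```

For real data the defaults (100-bp segments, 20-mers) match the sizes named on the slide. Sampling the right-hand end as well would be a natural extension.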
The repeat landscape in Picea abies (so far)
Transposable elements!

Repeat class   %
LINE           0.44
LTR copia      13.69
LTR gypsy      29.61
LTR uncl.      15.22
DNA_TEs        0.57
Unclassified   9.24
Total          68.77

3 human genomes of LTRs!
[Figure label: Ty1-copia RT]
Age of LTR insertions
[Figure: time back to LTR insertion (Mya), axis spanning 4 Mya to 85 Mya]
Norway spruce genome: the state of things
• ~80% of genome assembled
   – still highly fragmented ... repeats!
• Fosmid pool / long fragment strategy will work
   – necessary for spanning repeats and filling gaps
   – requires improvements in both sequencing and software
     technologies
• Biology
   –   Allele splitting : improved assembly technology needed
   –   ~70% repeats, but ~none of the TEs are active
   –   Gene and genome structure still being unravelled
   –   Comparisons against 5 other conifers
The Spruce Genome Team
UPSC: Rishikesh Bhalerao, Simon Birve, Ulrika Egertsdotter, Ioana Gaboreanu, Rosario Garcia-Gil, Per Gardeström, Thomas Hiltonen, Torgeir Hvidsten, Pär Ingvarsson, Stefan Jansson, Olivier Keech, Susanne Larsson, Chanaka Mannapperuma, Ove Nilsson, Douglas Scofield, Nathaniel Street, Björn Sundberg, Stacey Lee Thompson, Zhi-Qiang Wu, Harry Wu
SciLifeLab: Andrey Alexeyenko, Björn Andersson, Siv Andersson, Lars Arvestad, Frida Berglund, Oscar Franzén, Manfred Grabherr, Kicki Holmberg, Lisa Klasson, Max Käller, Joakim Lundeberg, Fredrik Lysholm, Björn Nystedt, Kristoffer Sahlin, Ellen Sherwood, Anna Sköllermo, Anne-Charlotte Sonnhammer, Thomas Svensson, Carlos Talavera-Lopez, Anna Wetterbom
VIB Gent: Yves Van de Peer, Yao-Cheng Lin
IGA Udine: Michele Morgante, Francesco Vezzi, Riccardo Vicedomini, Andrea Zuccolo
CHORI Oakland: Pieter de Jong, Maxim Koriabine
Skogforsk: Bengt Andersson, Bo Karlsson
SAB: Kerstin Lindblad-Toh, John MacKay, Outi Savolainen, Detlef Weigel
SNIC Supercomputers: Uppmax/PDC/NSC/HPC2N
SNISS national infrastructure
CLCbio
Lucigen
The spruce
Thanks for listening!
Data processing outline
• Raw data: ~6 Tbp
• Quality filtered data: rNA, Fastx, FastQC
• Remove phiX (+ chloroplast): BWA, rNA, FastQC
• De novo assembly: CLC, (Velvet, Newbler)
• Scaffolding: BESST [custom tool]
• Merging of assemblies: GAM [custom tool]
• Assembly validation: FRC [custom toolkit]
• Repeat annotation: RepeatMasker, RepeatScout, BLAST, custom tools
• Gene annotation: EUGene
• Genome portal
• Transcriptome sequencing: 550 libs, 20-30 different lib types
Aims (Phase 1)
• Public genome resource
• Genes (and gene families)
• Repeats
• Evolutionary insight

80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Economic Importance Of Fungi In Food Additives
Economic Importance Of Fungi In Food AdditivesEconomic Importance Of Fungi In Food Additives
Economic Importance Of Fungi In Food Additives
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
Simple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdfSimple, Complex, and Compound Sentences Exercises.pdf
Simple, Complex, and Compound Sentences Exercises.pdf
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 

Assembling the Norway Spruce Genome: 20 Gb and many challenges, Umeå Plant Science Centre, Umeå University, Douglas G. Scofield, Copenhagenomics 2012

  • 1. Assembling the Norway Spruce genome : 20 Gb and many challenges Douglas G. Scofield Umeå Plant Science Centre, Umeå University
  • 2. Picea abies: A Very Large Genome • 20 Gb, nearly 7x human (typical for conifers) • Little known about gene and genome structure • Only ~0.1 % consists of genes • Largest genome sequenced so far • Other ongoing conifer sequencing projects: Picea glauca (Canada), Pinus taeda (US) • Genome sizes: Arabidopsis (120 Mbp), Poplar (450 Mbp), Humans (3 000 Mbp), Norway spruce (20 000 Mbp)
  • 3. Sequencing overview • 2n needle tissue: mate-pair libraries (2kbp insert, 4kbp insert, 10kbp insert, fosmid ends (40kbp)), ~50x input coverage • 1n megagametophytes: three megagametophytes from the reference tree, ~30-35x input coverage for each • Illumina paired-end (150-bp, 300-bp and 650-bp inserts) • 454 single-end (1.5x) • 454 and Illumina transcriptomes • plus, fosmid pools…
  • 4. Fosmid pools: the plan • Fosmid library, ~40kbp / fosmid • ~1000 fosmids / pool = ~40 Mbp @ 1n, much simpler assembly • 8 pools / HiSeq lane = 60x coverage with 300-bp PE reads • 500 pools = 1x genome coverage
  • 5. Fosmid pools: the reality • 40% reads lost to E. coli, and 20-50% mitochondrial • ~0.5x realized genome coverage • BUT … low redundancy among pools, and assembly is easier [Figure: WGS assembly (Mbp) and % recovered, for contigs >5kbp, >10kbp and >20kbp, versus cumulated assembly of fosmid pools (Mbp)]
  • 6. Putting it all together WGS CLC (1n) CLC + Fosmid pool BESST GAM : Genomic Assemblies Merger (Casagrande et al.) Merged BESST : Fast, lightweight scaffolder developed at SciLifeLab Final Multiple rounds of GAM, scaffolding with transcripts, …
  • 7. Assembly quality • >80% of genome in scaffolds and contigs • 14% total genome length in scaffolds >10Kb (vs. 1%) • N50 8800 bp (vs. 2900 bp) • Longest contig 1.15 Mb (vs. 206 Kb) • Feature Response Curves: quantifying quality using configurations of mapped paired-end reads [Figure axis: accumulated length of contigs]
  • 8. Now for a little spruce biology… Genetic diversity Low gene content Allele splitting Repeat content
  • 9. Genetic diversity • Zygosity correlation: ε = 2.46 × 10^-3 errors/read-base (sequencing error + misaligned reads); Θ = 9.47 × 10^-3 ± 0.007 × 10^-3 (scaled mutation rate, Θ = 4Neμ); Ne ≈ 1.0 – 1.7 × 10^6 (using LTR-based estimates of μ) [Figure: Δ, zygosity correlation, versus distance between sites (bp)] • Heterozygous sites: Ho ≈ 0.27%, 1.42 Gbp examined [Figure: count (Mb) by IUPAC code (genotype): W (AT), S (CG), K (GT), M (AC), R (AG), Y (CT)]
  • 11.
  • 12. Random promoter sequences… 12 bp : 730 locations 8 bp : ~220,000 locations
  • 16. Are there inefficiencies in transcript processing?
  • 17.
  • 18.
  • 19. Allele splitting Median coverage of 2n contigs Median coverage of 1n contigs
  • 20. Identifying allele splitting: self-self blasts
  • 21. Repeat content in contig ends vs. middles 20-mers appearing in 100-bp segments from 100K contigs Gepard: http://www.helmholtz-muenchen.de/en/mips/services/analysis-tools/gepard/index.html
  • 22. The repeat landscape in Picea abies (so far) • Transposable elements! • LINE 0.44%; LTR copia 13.69%; LTR gypsy 29.61%; LTR uncl. 15.22%; DNA_TEs 0.57%; Unclassified 9.24%; Total 68.77% • 3 human genomes of LTRs! [Figure: Ty1-copia RT]
  • 23. Age of LTR insertions [Figure: time back to LTR insertion (Mya), with 4 Mya and 85 Mya marked]
  • 24. Norway spruce genome: the state of things • ~80% of genome assembled – still highly fragmented ... repeats! • Fosmid pool / long fragment strategy will work – necessary for spanning repeats and filling gaps – requires improvements in both sequencing and software technologies • Biology – Allele splitting : improved assembly technology needed – ~70% repeats, but ~none of the TEs are active – Gene and genome structure still being unravelled – Comparisons against 5 other conifers
  • 25. The Spruce Genome Team UPSC SciLifeLab Rishikesh Bhalerao Andrey Alexeyenko Simon Birve Björn Andersson Ulrika Egertsdotter Siv Andersson Ioana Gaboreanu Lars Arvestad Rosario Garcia-Gil Frida Berglund Per Gardeström Oscar Franzén Thomas Hiltonen Manfred Grabherr Torgeir Hvidsten Kicki Holmberg Pär Ingvarsson Lisa Klasson Stefan Jansson Max Käller Olivier Keech Joakim Lundeberg Susanne Larsson Fredrik Lysholm Chanaka Mannapperuma Björn Nystedt Ove Nilsson Kristoffer Sahlin Douglas Scofield Ellen Sherwood Nathaniel Street Anna Sköllermo Björn Sundberg VIB Gent IGA Udine CHORI Oakland Anne-Charlotte Sonnhammer Stacey Lee Thompson Yves Van de Peer Michele Morgante Pieter de Jong Thomas Svensson Zhi-Qiang Wu Yao-Cheng Lin Francesco Vezzi Maxim Koriabine Carlos Talavera-Lopez Harry Wu Riccardo Anna Wetterbom SAB Vicedomini Kerstin Lindblad-Toh Andrea Zuccolo Skogforsk SNIC Supercomputers SNISS national CLCbio John MacKay infrastructure Bengt Andersson Uppmax/PDC/NSC/HPC2N Outi Savolainen Lucigen Bo Karlsson Detlef Weigel
  • 27. Data processing outline • Raw data (~6 Tbp) • Quality filtered data: rNA, Fastx, FastQC • Remove phiX (+ chloroplast): BWA, rNA, FastQC • De novo assembly: CLC, (Velvet, Newbler) • Repeat annotation: RepeatMasker, RepeatScout, BLAST, custom tools • Assembly validation: FRC [custom toolkit] • Merging of assemblies: GAM [custom tool] • Scaffolding: BESST [custom tool] • Gene annotation: EUGene • Genome portal • Transcriptome sequencing: 550 libs, 20-30 different lib types • Aims (Phase 1): public genome resource; genes (and gene families); repeats; evolutionary insight
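
The fosmid pool plan on slide 4 is simple arithmetic, and a short sketch may make it easier to check. The inputs below are the approximate figures quoted on the slide; the per-lane sequencing yield is not stated there and is only what those figures imply. The variable names are made up for illustration.

```python
# Back-of-the-envelope check of the fosmid pool plan (slide 4).
# Inputs are the approximate figures quoted on the slide.
FOSMID_INSERT_BP = 40_000       # ~40 kbp per fosmid
FOSMIDS_PER_POOL = 1_000        # ~1000 fosmids per pool
POOLS_PER_LANE = 8              # 8 pools per HiSeq lane
COVERAGE_PER_POOL = 60          # 60x target coverage per pool
N_POOLS = 500                   # planned number of pools
GENOME_BP = 20_000_000_000      # ~20 Gb genome

# Sequence content of one pool: ~40 Mbp of haploid (1n) sequence.
pool_bp = FOSMIDS_PER_POOL * FOSMID_INSERT_BP

# Nominal genome coverage if all pools are sequenced
# (ignores overlap between fosmids and any lost reads).
nominal_genome_coverage = N_POOLS * pool_bp / GENOME_BP

# Implied (not stated on the slide) usable yield per lane
# needed to reach 60x on each of 8 pools.
implied_lane_yield_bp = POOLS_PER_LANE * pool_bp * COVERAGE_PER_POOL

print(f"pool size: {pool_bp / 1e6:.0f} Mbp")
print(f"nominal genome coverage: {nominal_genome_coverage:.1f}x")
print(f"implied lane yield: {implied_lane_yield_bp / 1e9:.1f} Gbp")
```

The same calculation shows why the realized coverage on slide 5 falls short: any reads lost to E. coli or mitochondrial sequence reduce the usable yield per lane.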
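
Slide 7 reports N50 as a headline quality metric. For readers unfamiliar with it, here is a minimal sketch of the usual definition: the length L such that contigs of length L or longer hold at least half of the assembled bases. The function name and the toy contig lengths are illustrative only and are not taken from the spruce assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths, for illustration only.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because N50 depends on the total used in the denominator, values are only comparable between assemblies when the same set of sequences (for example, all contigs above a length cutoff) is counted.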
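
Slide 9 relates the scaled mutation rate to effective population size through Θ = 4Neμ. A small sketch of that relationship follows. The slide gives Θ and the resulting Ne range but not the LTR-based mutation rates themselves, so the code only back-calculates the range of μ implied by the slide's own numbers; those values are derived here, not reported by the author, and the function names are hypothetical.

```python
def effective_population_size(theta, mu):
    """Solve theta = 4 * Ne * mu for Ne (theta and mu per site)."""
    return theta / (4 * mu)


def implied_mutation_rate(theta, ne):
    """Solve theta = 4 * Ne * mu for mu."""
    return theta / (4 * ne)


THETA = 9.47e-3                  # scaled mutation rate from slide 9
NE_LOW, NE_HIGH = 1.0e6, 1.7e6   # Ne range from slide 9

# Mutation rates implied by the slide's theta and Ne range
# (assumption: back-calculated here, not given in the talk).
mu_high = implied_mutation_rate(THETA, NE_LOW)
mu_low = implied_mutation_rate(THETA, NE_HIGH)

print(f"implied mu range: {mu_low:.2e} to {mu_high:.2e}")
print(f"Ne at mu_high: {effective_population_size(THETA, mu_high):.2e}")
print(f"Ne at mu_low:  {effective_population_size(THETA, mu_low):.2e}")
```

The point of the sketch is the sensitivity: for a fixed Θ, the Ne estimate scales inversely with μ, so the width of the Ne range reflects the uncertainty in the mutation rate estimate.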

Editor's Notes

  1. Cumbersome pipeline -> rNA much easier! Try to stay with standard software. Saves time and resources for us. One tool that we couldn’t do without. Any guesses?