ScanPAV: Presence-Absence Variation in Genome Pairs
Francesca Giordano
IGM, November 7th, 2017
Presence-Absence Variation (PAV)
Estimation of Genetic Divergence
‣ CNVs are as important as SNPs/indels for estimating genetic variation
‣ Provides insights into population or cancer evolution
‣ Helps identify foreign DNA contamination
‣ Helps identify lateral transfer of sequences from viruses or bacteria
Assembly Strategy and Technology Assessment
‣ Determines the completeness of an assembly
‣ Helps assess an assembly strategy and pipeline
‣ Helps identify strengths and weaknesses of new sequencing technologies
ScanPAV: pipeline
‣ Shred the Presence Assembly into 1 kb chunks
‣ Map the chunks against the Absence Assembly (Scaffold 1, Scaffold 2, Scaffold 3, ...) with the SMALT aligner
‣ Filter noisy mappings
‣ Output the areas of the Presence Assembly that are not mapped
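The shred-and-collect steps above can be sketched in Python. This is a minimal illustration, not the scanPAV code. It assumes non-overlapping 1 kb chunks, and it assumes that the SMALT mapping and the noisy-mapping filter have already produced the set of chunk IDs that mapped confidently to the Absence Assembly. `shred`, `unmapped_intervals` and the chunk-ID format are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

CHUNK_SIZE = 1000  # 1 kb chunks, as on the slide


def shred(name: str, seq: str, size: int = CHUNK_SIZE) -> Dict[str, Tuple[int, int]]:
    """Split one scaffold into consecutive, non-overlapping chunks (assumption).

    Returns {chunk_id: (start, end)} with 0-based, half-open coordinates;
    the last chunk may be shorter than `size`.
    """
    chunks = {}
    for start in range(0, len(seq), size):
        end = min(start + size, len(seq))
        chunks[f"{name}:{start}-{end}"] = (start, end)  # hypothetical ID format
    return chunks


def unmapped_intervals(chunks: Dict[str, Tuple[int, int]],
                       mapped: Set[str]) -> List[Tuple[int, int]]:
    """Merge adjacent chunks that did not map into candidate PAV intervals.

    `mapped` is assumed to hold the chunk IDs that survived mapping and
    noisy-mapping filtering; the aligner itself is not modelled here.
    """
    intervals: List[Tuple[int, int]] = []
    for chunk_id, (start, end) in sorted(chunks.items(), key=lambda kv: kv[1][0]):
        if chunk_id in mapped:
            continue
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], end)  # extend the current unmapped run
        else:
            intervals.append((start, end))  # start a new unmapped run
    return intervals
```

For a 3.5 kb scaffold where only the first chunk maps, `unmapped_intervals` returns `[(1000, 3500)]`, because the three adjacent unmapped chunks are merged into one candidate PAV.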
Human Assemblies
‣ HuRef (2007): 7.5x whole-genome shotgun Sanger 800 bp PE reads, Celera [Levy S. et al., 2007, PLOS Biology 5(10), e254]
‣ Hs2-HiC (2017): 60x Illumina 250 bp PE reads, DISCOVAR de novo, Hi-C data for scaffolding [Dudchenko O. et al., 2017, Science, doi:10.1126/science.aal3327]
‣ Illumina (2016): 39x short-insert and 24x long-insert Illumina 101 bp PE reads, spaDES, 10X Genomics and Bionano data for scaffolding [Mostovoy Y. et al., 2016, Nature Methods, 13, 587]
‣ AK1 (2016): 101x PacBio long reads (mean 10 kb), Falcon, polished with Quiver and Pilon (Illumina), Bionano maps for scaffolding [Seo J.S. et al., 2016, Nature, 538, 243]
‣ ONT_30x (2017): 30x Oxford Nanopore long reads (mean 11 kb), Canu, polished with Pilon (Illumina); no scaffolds [Jain M. et al., 2017, bioRxiv, doi:10.1101/128835]
‣ ONT_35x (2017): same as ONT_30x plus 5x extra-long Oxford Nanopore reads (N50 65 kb), Canu, polished with Pilon (Illumina); no scaffolds [Jain M. et al., 2017, bioRxiv, doi:10.1101/128835]
Human Assemblies
Assembly   #Bases (Gb)   #Contigs/Scaffolds   Longest (Mb)   Contig N50 (Mb)   Scaffold N50 (Mb)
GRCh38     3.1           24                   249            59                156
HuRef      2.8           3,134                235            0.11              144
AK1        2.9           2,832                114            18                45
Hs2-HiC    2.8           44,065               225            0.10              141
Illumina   2.9           170                  100            0.01              33
ONT_30x    2.8           2,886                28             4                 n/a
ONT_35x    2.9           2,337                50             8                 n/a
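The Contig N50 and Scaffold N50 columns use the standard N50 statistic. A minimal sketch of how it is computed from a list of sequence lengths (the function name `n50` is illustrative):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # cumulative length reaches 50% of the total
            return length
    raise ValueError("no sequences given")
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths, the scaffold N50. For lengths 2 to 10 the total is 54, and the three longest sequences (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.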
ScanPAV analysis
PAVs (Mb): rows = assembly in which the sequence is present; columns = assembly in which it is absent
Present in \ Absent in   GRCh38   HuRef   AK1   Hs2-HiC   Illumina   ONT_30x   ONT_35x
GRCh38                   0        33      20    35        263        65        55
HuRef                    18       0       26    33        197        67        58
AK1                      14       33      0     32        225        62        54
Hs2-HiC                  23       34      27    0         200        59        49
Illumina                 91       109     98    95        0          124       117
ONT_30x                  21       40      27    29        183        0         35
ONT_35x                  47       64      47    53        268        61        0
ONT_35x: Homomer Counts
[Figure: frequency of the 6-mer homopolymers TTTTTT, AAAAAA, CCCCCC and GGGGGG in the Draft, in the Presence-PAVs and in the Absence-PAVs, plotted as homopolymer frequency / draft frequency (y-axis 0.0 to 3.0)]
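One way to compute the kind of ratio plotted above is sketched below. It is an illustration under stated assumptions: "frequency" is taken here as the number of occurrences of each 6-mer homopolymer divided by the number of 6-mer positions scanned, since the slide does not say how the counts were normalised, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# The four 6-mer homopolymers shown in the plot.
HOMOPOLYMERS: Tuple[str, ...] = ("TTTTTT", "AAAAAA", "CCCCCC", "GGGGGG")


def kmer_frequencies(seqs: Iterable[str], kmers: Tuple[str, ...] = HOMOPOLYMERS,
                     k: int = 6) -> Dict[str, float]:
    """Overlapping occurrences of each k-mer divided by the number of
    k-mer positions scanned (assumed normalisation)."""
    counts = {km: 0 for km in kmers}
    total = 0
    for seq in seqs:
        seq = seq.upper()  # treat soft-masked (lowercase) bases like uppercase
        for i in range(len(seq) - k + 1):
            total += 1
            window = seq[i:i + k]
            if window in counts:
                counts[window] += 1
    if total == 0:
        raise ValueError("input contains no k-mer positions")
    return {km: c / total for km, c in counts.items()}


def relative_to_draft(subset: Dict[str, float],
                      draft: Dict[str, float]) -> Dict[str, float]:
    """Frequency in a PAV set divided by frequency in the whole draft
    (NaN when the k-mer never occurs in the draft)."""
    return {km: (subset[km] / draft[km]) if draft[km] > 0 else float("nan")
            for km in subset}
```

Calling `kmer_frequencies` once on the draft and once on each PAV set, then passing both results to `relative_to_draft`, gives one bar per homopolymer. A value above 1 means the homopolymer is over-represented in the PAVs relative to the draft.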
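Mis-joints analysis
‣ Map chromosome-level scaffolds against the reference (if there are no chromosome-level scaffolds, assign contigs to chromosomes)
‣ If a scaffold maps to more than one chromosome: mis-joint
[Figure: AK1 scaffold aligned to GRCh38, example mis-joint spanning Chr16 and Chr2 (labels SY-16, SY-2)]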
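The mis-joint rule above can be sketched as follows. The sketch assumes that alignments of scaffolds to the reference have already been summarised as (scaffold, chromosome, aligned bases) tuples. It also assumes that each scaffold is assigned to the chromosome with the most aligned bases and that a secondary chromosome counts only above the 300 kb threshold used in the mis-joint table. `find_misjoints` is a hypothetical helper, not part of scanPAV.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_MISJOINT_BP = 300_000  # ">300 kb" threshold used in the mis-joint table


def find_misjoints(alignments: Iterable[Tuple[str, str, int]],
                   min_bp: int = MIN_MISJOINT_BP) -> Dict[str, Tuple[str, List[Tuple[str, int]]]]:
    """Flag scaffolds with large aligned blocks on more than one chromosome.

    Returns {scaffold: (assigned_chrom, [(other_chrom, aligned_bp), ...])}
    only for scaffolds that have at least one candidate mis-joint.
    """
    # Total aligned bases per scaffold and chromosome.
    per_scaffold: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for scaffold, chrom, aligned_bp in alignments:
        per_scaffold[scaffold][chrom] += aligned_bp

    misjoints = {}
    for scaffold, by_chrom in per_scaffold.items():
        # Assumption: the scaffold belongs to the chromosome with most aligned bases.
        assigned = max(by_chrom, key=lambda c: by_chrom[c])
        others = [(c, bp) for c, bp in by_chrom.items()
                  if c != assigned and bp > min_bp]
        if others:
            # Largest misplaced blocks first.
            misjoints[scaffold] = (assigned, sorted(others, key=lambda x: x[1], reverse=True))
    return misjoints
```

With the AK1 values from the breaking-area table (KV784727.1: 16.9 Mb on chromosome 16, 16.3 Mb on chromosome 2), the scaffold is assigned to chromosome 16 and flagged for the 16.3 Mb block on chromosome 2.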
Mis-joints analysis
Mis-joints (>300 kb): scaffold, assigned chromosome, and length, chromosome and average identity of the misplaced block
Assembly   Scaffold                  Assigned Chr   Length    Chr   Avg id (%)
HuRef      none
Hs2-HiC    Hs2-HiC_hic_scaffold_9    19             300 kb    22    99.0
Hs2-HiC    Hs2-HiC_hic_scaffold_10   21             1.3 Mb    22    99.3
AK1        synteny_group_16          16             16 Mb     2     99.7
ONT_35x    synteny_group_1           1              3.9 Mb    19    99.8
ONT_35x    synteny_group_1           1              565 kb    3     99.0
ONT_35x    synteny_group_2           2              1.8 Mb    1     99.1
ONT_35x    synteny_group_4           4              316 kb    20    99.1
ONT_35x    synteny_group_4           4              543 kb    5     99.2
ONT_35x    synteny_group_5           5              2.2 Mb    18    99.1
ONT_35x    synteny_group_6           6              3.3 Mb    7     99.1
ONT_35x    synteny_group_8           8              3.7 Mb    16    98.9
ONT_35x    synteny_group_10          10             5.8 Mb    1     99.0
ONT_35x    synteny_group_10          10             2.0 Mb    11    98.9
ONT_35x    synteny_group_12          12             13.9 Mb   10    99.1
ONT_35x    synteny_group_13          13             4.2 Mb    10    99.1
ONT_35x    synteny_group_15          15             2.0 Mb    3     98.8
ONT_35x    synteny_group_16          16             1.4 Mb    1     98.8
ONT_35x    synteny_group_22          22             342 kb    19    98.6
Fig. S2. Misplaced block visualisation for chromosome assigned scaffolds in Hs2-HiC (left), AK1 (center) and ONT_35x (right).
Table S3. Misjoint location in the original scaffolds for assemblies AK1 and ONT_35x.
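Mis-joints analysis
[Figure: misplaced block visualisation for chromosome-assigned scaffolds in Hs2-HiC (left), AK1 (center) and ONT_35x (right)]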
Mis-joints analysis
Assembly   Scaffold                  First Chr   Length    Breaking Area (bp)        Second Chr   Length
AK1        KV784727.1                16          16.9 Mb   16,390,000 – 16,390,001   2            16.3 Mb
ONT_35x    tig00001490_pilon_pilon   1           5.5 Mb    3,999,036 – 4,000,001     19           3.9 Mb
ONT_35x    tig01414718_pilon_pilon   1           23.6 Mb   23,710,000 – 23,713,101   3            561 kb
ONT_35x    tig00000928_pilon_pilon   1           2.0 Mb    23,710,000 – 23,713,101   2            3.0 Mb
ONT_35x    tig00000326_pilon_pilon   4           10.2 Mb   330,000 – 365,771         20           322 kb
ONT_35x    tig01414909_pilon_pilon   4           9.5 Mb    557,558 – 560,001         5            535 kb
ONT_35x    tig01415181_pilon_pilon   5           9.2 Mb    9,740,000 – 9,741,384     18           2.2 Mb
ONT_35x    tig00000726_pilon_pilon   6           6.9 Mb    3,399,916 – 3,400,054     7            3.4 Mb
ONT_35x    tig00001250_pilon_pilon   8           4.7 Mb    4,820,000 – 4,820,001     16           3.7 Mb
ONT_35x    tig01414799_pilon_pilon   10          14.4 Mb   14,700,000 – 14,770,556   1            5.8 Mb
ONT_35x    tig01415009_pilon_pilon   10          9.1 Mb    2,037,882 – 2,040,001     11           2.0 Mb
ONT_35x    tig01414760_pilon_pilon   12          18.0 Mb   18,455,260 – 18,460,001   10           13.9 Mb
ONT_35x    tig01414699_pilon_pilon   13          45.2 Mb   46,129,991 – 46,131,368   10           4.2 Mb
ONT_35x    tig00002429_pilon_pilon   15          7.3 Mb    7,426,592 – 7,430,001     3            2.0 Mb
ONT_35x    tig00001215_pilon_pilon   16          3.9 Mb    1,719,551 – 2,010,001     1            1.6 Mb
ONT_35x    tig00000735_pilon_pilon   22          11.0 Mb   358,160 – 360,002         19           352 kb
Tasmanian Devil’s Transmissible Cancer
[Map: first observed in NE Tasmania in 1996; tumour-free area]
Devil Facial Tumour (DFT):
‣ Large tumour growths on the face and neck
‣ Transmitted by biting
‣ Affects mostly adult individuals
‣ Death occurs within 4-6 months
‣ Two genetically distinct cancers with similar phenotypes: DFT1 and DFT2
Tasmanian Devil’s Transmissible Cancer
A comparative characterisation of DFT1 and DFT2 has recently been submitted to Science* [Stammnitz M.R. et al., The origins and vulnerabilities of two transmissible cancers in Tasmanian devils]
PAV estimation is one of the many analyses presented, used both to assess assembly completeness and to look for exposure to exogenous agents that might have contributed to the development of the cancers.
The scanPAV analysis involved:
‣ 2 existing references: Ref-v7.1 and PSU
‣ 6 new assemblies:
2 from healthy cells: 202H, 203H
4 from tumour cells: 202T, 203T, 86T, 88T
[Stammnitz M.R. et al., The origins and vulnerabilities of two
transmissible cancers in Tasmanian devils, in review @ Science]
Devil’s PAVs
The extracted PAV sequences were screened against the NCBI database:
- no foreign DNA was found in the references
- in 202T, a 590 kb sequence belonging to the Mycoplasma arginini genome was found
- the PAVs of all 6 new assemblies contained 1.9 Mb of the Streptococcus pneumoniae genome
Both are well-known laboratory cell-culture contaminants.
The remaining PAVs are believed to be due to assembly errors and ancestral variation.
No exogenous contribution to the emergence of DFTs was found.
[Stammnitz M.R. et al., The origins and vulnerabilities of two
transmissible cancers in Tasmanian devils, in review @ Science]
Summary
‣ Assessing the completeness and quality of a newly generated genome assembly is a complex task that requires evaluating multiple metrics.
When multiple assemblies or a reference assembly are available:
‣ extracting PAV sequences can help assess the strengths and weaknesses of a novel assembly strategy or technology, and identify structural variation and foreign DNA exposure
‣ possible mis-joints can be inferred by mapping the assembly to the reference and identifying scaffolds that map to multiple chromosomes
‣ the scanPAV pipeline for pairwise extraction of PAVs between assemblies is available at https://sourceforge.net/projects/phusion2/files/scanPAV
‣ In progress: adding BWA as an alternative aligner
Acknowledgments
Paul A. Kitts, National Center for Biotechnology Information, Bethesda, MD
Elizabeth P. Murchison, University of Cambridge, Cambridge, UK
Zemin Ning, Wellcome Trust Sanger Institute, Hinxton, UK
Maximilian R. Stammnitz, University of Cambridge, Cambridge, UK
Thank you!
Backup Slides
Devil Assemblies
Assembly   #Bases (Gb)   Scaffolds (×10³)   Longest (Mb)   Contig N50 (kb)   Scaffold N50 (Mb)
Ref-v7.1   3.2           36                 5.3            20                1.8
PSU        3.2           149                2.9            11                0.2
202T2      3.0           62                 50.7           55                9.5
202H1      3.0           68                 22.0           51                4.2
203T3      3.0           63                 50.7           53                9.6
203H       3.0           64                 31.1           48                4.2
86T        3.0           70                 24.5           66                4.4
88H        3.0           61                 19.7           60                4.0
[Stammnitz M.R. et al., The origins and vulnerabilities of two
transmissible cancers in Tasmanian devils, in review @ Science]
Devil’s PAVs
PAVs (Mb): rows = assembly in which the sequence is present; columns = assembly in which it is absent
Present in \ Absent in   Ref-v7.1   PSU   202T2   202H1   203T3   203H   86T   88T
Ref-v7.1                 0          156   62      67      61      64     55    58
PSU                      125        0     69      76      69      71     63    66
202T2                    110        149   0       36      22      28     26    25
202H1                    107        147   28      0       29      27     28    27
203T3                    107        146   20      34      0       27     24    24
203H                     109        148   25      32      27      0      26    25
86T                      106        146   28      38      29      31     0     22
88H                      106        146   25      35      26      27     19    0
Devil’s Filtered PAVs
PAVs (Mb) absent in Ref-v7.1, before (All) and after (Filtered) the PAV filter
Present in   All   Filtered
PSU          125   5
202T2        110   14
202H1        107   11
203T3        107   12
203H         109   12
86T          106   12
88H          106   13
PAV filter: filter out sequences that are missing from the Absent Assembly but present in its reads
‣ Map the Absent Assembly's reads against Absent Assembly + PAVs
‣ Filter out PAVs with 10x read depth
‣ The remaining sequences are the Filtered PAVs
Most of the missing sequences are present in the genome (reads) but absent from the assembly.
[Stammnitz M.R. et al., The origins and vulnerabilities of two
transmissible cancers in Tasmanian devils, in review @ Science]
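The read-depth filter above can be sketched as follows. The sketch assumes that per-base read depths for each PAV have already been obtained by mapping the Absent Assembly's reads onto Absent Assembly + PAVs. It also reads "10x depth" as a mean depth of at least 10x. Both that interpretation of the threshold and the `filter_pavs` name are assumptions.

```python
from statistics import mean
from typing import Dict, List

MIN_DEPTH = 10.0  # "10x depth" from the slide; assumed to mean mean depth >= 10x


def filter_pavs(depths: Dict[str, List[int]], min_depth: float = MIN_DEPTH) -> List[str]:
    """Return the IDs of PAVs that survive the read-depth filter.

    `depths` maps each PAV ID to its per-base read depth. A PAV covered at
    >= min_depth is present in the genome (reads) but missing from the
    assembly, so it is dropped; poorly covered PAVs are kept.
    """
    kept = []
    for pav_id, per_base in depths.items():
        avg = mean(per_base) if per_base else 0.0  # no coverage data -> depth 0
        if avg < min_depth:
            kept.append(pav_id)
    return kept
```

For example, a PAV with per-base depths of about 30x is removed as an assembly gap, while one with almost no read support is retained as a filtered PAV.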

STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 

F Giordano ScanPAV Analysis Pipeline

  • 1. ScanPAV: Presence-Absence Variation in Genome Pairs. Francesca Giordano, IGM, November 7th, 2017
  • 2. Presence-Absence Variation (PAV)
    Estimation of Genetic Divergence
    ‣ As important as SNPs/indels and CNVs for estimating genetic variation
    ‣ Provides insights into population or cancer evolution
    ‣ Helps identify foreign DNA contamination
    ‣ Helps identify lateral transfer of sequences from viruses or bacteria
    Assembly Strategy and Technology Assessment
    ‣ Determines the completeness of an assembly
    ‣ Helps assess an assembly strategy and pipeline
    ‣ Helps identify strengths and weaknesses of new sequencing technologies
  • 3–6. ScanPAV: pipeline
    1. Shred the Presence Assembly into 1 Kb chunks.
    2. Map the chunks against the Absence Assembly (Scaffold 1, Scaffold 2, Scaffold 3) with the SMALT aligner.
    3. Filter noisy mappings.
    4. Output the areas of the Presence Assembly that are not mapped.
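The four pipeline steps can be sketched in Python. This is a minimal illustration under stated assumptions, not ScanPAV's code: the SMALT mapping itself is not reproduced, chunks are assumed to be non-overlapping, and the helper names (`shred`, `confidently_mapped`, `unmapped_regions`) and the 0.9 identity cut-off standing in for the "filter noisy mapping" step are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

CHUNK = 1000  # 1 Kb chunks, as on the slide


def shred(name: str, seq: str, size: int = CHUNK) -> List[Tuple[str, int, str]]:
    """Cut one Presence-Assembly sequence into non-overlapping chunks.

    Returns (sequence name, 0-based start, chunk) tuples; the last chunk may be shorter.
    """
    return [(name, start, seq[start:start + size]) for start in range(0, len(seq), size)]


def confidently_mapped(hits: Iterable[Tuple[int, float]], min_identity: float = 0.9) -> Set[int]:
    """Hypothetical noise filter: keep chunk starts whose alignment identity passes a cut-off."""
    return {start for start, identity in hits if identity >= min_identity}


def unmapped_regions(seq_len: int, mapped_starts: Set[int], size: int = CHUNK) -> List[Tuple[int, int]]:
    """Merge consecutive unmapped chunks into half-open [start, end) PAV intervals."""
    regions: List[Tuple[int, int]] = []
    for start in range(0, seq_len, size):
        if start in mapped_starts:
            continue
        end = min(start + size, seq_len)
        if regions and regions[-1][1] == start:
            regions[-1] = (regions[-1][0], end)  # extend the current region
        else:
            regions.append((start, end))
    return regions
```

With a 4,500 bp sequence whose chunks at 0 and 2,000 map confidently, `unmapped_regions(4500, {0, 2000})` returns `[(1000, 2000), (3000, 4500)]`: adjacent unmapped chunks are merged, so each PAV is reported as one interval rather than as a run of 1 Kb pieces.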
  • 7. Human Assemblies
    ‣ HuRef (2007): 7.5x whole-genome shotgun Sanger 800 bp PE reads, Celera [Levy S. et al., 2007, PLOS Biology 5(10), e254]
    ‣ Hs2-HiC (2017): 60x Illumina 250 bp PE reads, DISCOVAR de novo, Hi-C data for scaffolding [Dudchenko O. et al., 2017, Science, doi: 10.1126/science.aal3327]
    ‣ Illumina (2016): 39x short-insert and 24x long-insert Illumina 101 bp PE reads, spaDES, 10X Genomics and Bionano data for scaffolding [Mostovoy Y. et al., 2016, Nature Methods, 13, 587]
    ‣ AK1 (2016): 101x PacBio long reads (mean 10 Kb), Falcon, polished with Quiver and Pilon (Illumina), Bionano maps for scaffolding [Seo J.S. et al., 2016, Nature, 538, 243]
    ‣ ONT_30x (2017): 30x Oxford Nanopore long reads (mean 11 Kb), Canu, polished with Pilon (Illumina); no scaffolds [Jain M. et al., 2017, bioRxiv, doi: 10.1101/128835]
    ‣ ONT_35x (2017): same as ONT_30x plus 5x extra-long Oxford Nanopore reads (N50 65 Kb), Canu, polished with Pilon (Illumina); no scaffolds [Jain M. et al., 2017, bioRxiv, doi: 10.1101/128835]
  • 8–9. Human Assemblies
    Assembly | #Bases (Gb) | #Contigs/Scaffolds | Longest (Mb) | Contig N50 (Mb) | Scaffold N50 (Mb)
    GRCh38 | 3.1 | 24 | 249 | 59 | 156
    HuRef | 2.8 | 3,134 | 235 | 0.11 | 144
    AK1 | 2.9 | 2,832 | 114 | 18 | 45
    Hs2-HiC | 2.8 | 44,065 | 225 | 0.10 | 141
    Illumina | 2.9 | 170 | 100 | 0.01 | 33
    ONT_30x | 2.8 | 2,886 | 28 | 4 | —
    ONT_35x | 2.9 | 2,337 | 50 | 8 | —
    Read lengths highlighted on slide 9: 800 bp (HuRef), 250 bp (Hs2-HiC), 101 bp (Illumina).
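The contig N50 and scaffold N50 columns apply the same calculation to different piece lengths (contigs versus scaffolds). A minimal Python sketch of the standard N50 definition; the function name `n50` is illustrative:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that pieces of length >= L cover at least half of the assembly."""
    ordered = sorted(lengths, reverse=True)  # longest pieces first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input
```

For piece lengths 100, 80, 50, 30, 20 and 10 (total 290), the running sum first reaches half of the total (145) at the piece of length 80, so the N50 is 80.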
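  • 10. ScanPAV analysis (PAVs in Mb; rows: assembly in which the sequence is present; columns: assembly from which it is absent)
    Present in (row) / Absent in (column) | GRCh38 | HuRef | AK1 | Hs2-HiC | Illumina | ONT_30x | ONT_35x
    GRCh38 | 0 | 33 | 20 | 35 | 263 | 65 | 55
    HuRef | 18 | 0 | 26 | 33 | 197 | 67 | 58
    AK1 | 14 | 33 | 0 | 32 | 225 | 62 | 54
    Hs2-HiC | 23 | 34 | 27 | 0 | 200 | 59 | 49
    Illumina | 91 | 109 | 98 | 95 | 0 | 124 | 117
    ONT_30x | 21 | 40 | 27 | 29 | 183 | 0 | 35
    ONT_35x | 47 | 64 | 47 | 53 | 268 | 61 | 0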
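The matrix is asymmetric because each cell counts sequence present in the row assembly and missing from the column assembly. A toy Python sketch makes this explicit; it models each assembly as a set of hypothetical 1 Kb chunk identifiers, which is a simplification for illustration and not how ScanPAV represents assemblies.

```python
from typing import Dict, List, Set


def pav_matrix(assemblies: Dict[str, Set[str]], chunk_bp: int = 1000) -> Dict[str, Dict[str, float]]:
    """Toy PAV matrix in Mb: rows = present in, columns = absent in.

    Each assembly is modelled as a set of 1 Kb chunk identifiers (a simplification);
    a chunk counts towards (present, absent) when it is in the first set only.
    """
    names: List[str] = list(assemblies)
    return {
        present: {
            absent: len(assemblies[present] - assemblies[absent]) * chunk_bp / 1e6
            for absent in names
        }
        for present in names
    }
```

If assembly A holds chunks c1, c2 and c3 while B holds only c1, the A-row/B-column cell is 0.002 Mb and the B-row/A-column cell is 0, the same pattern by which a fragmented assembly such as Illumina shows much larger values in its column than in its row.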
  • 11. ONT_35x: Homopolymer Counts
    Figure: frequency of the 6-mer homopolymers TTTTTT, AAAAAA, CCCCCC and GGGGGG in the draft assembly, in Presence-PAVs and in Absence-PAVs (x-axis: 6-mers; y-axis: homopolymer frequency / draft frequency, 0.0 to 3.0).
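The y-axis normalisation can be sketched as below, assuming frequencies are counted over overlapping 6-mer windows; the slide does not state the exact counting scheme, and the helper names `homopolymer_freq` and `relative_to_draft` are hypothetical.

```python
from typing import Dict, Iterable

HOMOPOLYMERS = ("TTTTTT", "AAAAAA", "CCCCCC", "GGGGGG")


def homopolymer_freq(seqs: Iterable[str]) -> Dict[str, float]:
    """Fraction of all overlapping 6-mer windows that equal each 6-base homopolymer."""
    counts = {h: 0 for h in HOMOPOLYMERS}
    total = 0
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - 5):  # every 6-mer window in this sequence
            kmer = seq[i:i + 6]
            total += 1
            if kmer in counts:
                counts[kmer] += 1
    return {h: (c / total if total else 0.0) for h, c in counts.items()}


def relative_to_draft(subset: Dict[str, float], draft: Dict[str, float]) -> Dict[str, float]:
    """Plotted ratio: homopolymer frequency in a PAV set divided by its draft frequency."""
    return {h: (subset[h] / draft[h] if draft.get(h) else float("nan")) for h in subset}
```

A ratio above 1 means the homopolymer is over-represented in that PAV set relative to the whole draft, which is the comparison the plot draws between Presence-PAVs and Absence-PAVs.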
  • 12–16. Mis-joints analysis
    1. Map chromosome-level scaffolds against the reference (if there are no chromosome-level scaffolds, first assign contigs to chromosomes).
    2. If a scaffold maps to more than one chromosome, report it as a misjoint.
    Figure: AK1 scaffold aligned to GRCh38 Chr16 and Chr2 (labels SY-16, SY-2).
  • 17. Mis-joints analysis (misjoints > 300 Kb)
    Assembly | Scaffold | Assigned Chr | Misjoint Length (bp) | Misjoint Chr | Avg id (%)
    HuRef | — | | | |
    Hs2-HiC | Hs2-HiC_hic_scaffold_9 | 19 | 300 K | 22 | 99.0
    Hs2-HiC | Hs2-HiC_hic_scaffold_10 | 21 | 1.3 M | 22 | 99.3
    AK1 | synteny_group_16 | 16 | 16 M | 2 | 99.7
    ONT_35x | synteny_group_1 | 1 | 3.9 M | 19 | 99.8
    ONT_35x | synteny_group_1 | 1 | 565 K | 3 | 99.0
    ONT_35x | synteny_group_2 | 2 | 1.8 M | 1 | 99.1
    ONT_35x | synteny_group_4 | 4 | 316 K | 20 | 99.1
    ONT_35x | synteny_group_4 | 4 | 543 K | 5 | 99.2
    ONT_35x | synteny_group_5 | 5 | 2.2 M | 18 | 99.1
    ONT_35x | synteny_group_6 | 6 | 3.3 M | 7 | 99.1
    ONT_35x | synteny_group_8 | 8 | 3.7 M | 16 | 98.9
    ONT_35x | synteny_group_10 | 10 | 5.8 M | 1 | 99.0
    ONT_35x | synteny_group_10 | 10 | 2.0 M | 11 | 98.9
    ONT_35x | synteny_group_12 | 12 | 13.9 M | 10 | 99.1
    ONT_35x | synteny_group_13 | 13 | 4.2 M | 10 | 99.1
    ONT_35x | synteny_group_15 | 15 | 2.0 M | 3 | 98.8
    ONT_35x | synteny_group_16 | 16 | 1.4 M | 1 | 98.8
    ONT_35x | synteny_group_22 | 22 | 342 K | 19 | 98.6
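The misjoint rule can be sketched as follows. Assumptions: alignments are already summarised as (scaffold, chromosome, aligned bp) records, a scaffold's assigned chromosome is the one with the most aligned bases, and a misjoint is reported when another chromosome receives more than 300 Kb, the threshold in the table header. The function name `find_misjoints` is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

MIN_MISJOINT_BP = 300_000  # threshold taken from the "Mis-Joints (>300 Kb)" header


def find_misjoints(alignments: Iterable[Tuple[str, str, int]],
                   min_bp: int = MIN_MISJOINT_BP) -> List[Tuple[str, str, str, int]]:
    """Return (scaffold, assigned chr, other chr, aligned bp) for suspected misjoints.

    alignments: (scaffold, reference chromosome, aligned length in bp) records.
    """
    per_chr: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for scaffold, chrom, length in alignments:
        per_chr[scaffold][chrom] += length  # sum split alignments per chromosome
    hits: List[Tuple[str, str, str, int]] = []
    for scaffold, by_chr in per_chr.items():
        # Assumption: the assigned chromosome is the one with the most aligned bases.
        assigned = max(by_chr, key=lambda c: by_chr[c])
        for chrom, total in by_chr.items():
            if chrom != assigned and total > min_bp:
                hits.append((scaffold, assigned, chrom, total))
    return sorted(hits)
```

Summing per chromosome before applying the threshold means a large foreign block split into several alignments is still caught, while short stray hits (for example 100 Kb on a third chromosome) are ignored.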
  • 18. Mis-joints analysis
    Figure: misplaced block visualisation for chromosome-assigned scaffolds in Hs2-HiC (left), AK1 (center) and ONT_35x (right) [Giordano et al., Fig. S2]. Table S3 gives the misjoint location in the original scaffolds for assemblies AK1 and ONT_35x.
  • 19. Mis-joints analysis
    Assembly | Scaffold | First Chr | Length (bp) | Breaking Area | Second Chr | Length (bp)
    AK1 | KV784727.1 | 16 | 16.9 M | 16,390,000 – 16,390,001 | 2 | 16.3 M
    ONT_35x | tig00001490_pilon_pilon | 1 | 5.5 M | 3,999,036 – 4,000,001 | 19 | 3.9 M
    ONT_35x | tig01414718_pilon_pilon | 1 | 23.6 M | 23,710,000 – 23,713,101 | 3 | 561 K
    ONT_35x | tig00000928_pilon_pilon | 1 | 2.0 M | 23,710,000 – 23,713,101 | 2 | 3.0 M
    ONT_35x | tig00000326_pilon_pilon | 4 | 10.2 M | 330,000 – 365,771 | 20 | 322 K
    ONT_35x | tig01414909_pilon_pilon | 4 | 9.5 M | 557,558 – 560,001 | 5 | 535 K
    ONT_35x | tig01415181_pilon_pilon | 5 | 9.2 M | 9,740,000 – 9,741,384 | 18 | 2.2 M
    ONT_35x | tig00000726_pilon_pilon | 6 | 6.9 M | 3,399,916 – 3,400,054 | 7 | 3.4 M
    ONT_35x | tig00001250_pilon_pilon | 8 | 4.7 M | 4,820,000 – 4,820,001 | 16 | 3.7 M
    ONT_35x | tig01414799_pilon_pilon | 10 | 14.4 M | 14,700,000 – 14,770,556 | 1 | 5.8 M
    ONT_35x | tig01415009_pilon_pilon | 10 | 9.1 M | 2,037,882 – 2,040,001 | 11 | 2.0 M
    ONT_35x | tig01414760_pilon_pilon | 12 | 18.0 M | 18,455,260 – 18,460,001 | 10 | 13.9 M
    ONT_35x | tig01414699_pilon_pilon | 13 | 45.2 M | 46,129,991 – 46,131,368 | 10 | 4.2 M
    ONT_35x | tig00002429_pilon_pilon | 15 | 7.3 M | 7,426,592 – 7,430,001 | 3 | 2.0 M
    ONT_35x | tig00001215_pilon_pilon | 16 | 3.9 M | 1,719,551 – 2,010,001 | 1 | 1.6 M
    ONT_35x | tig00000735_pilon_pilon | 22 | 11.0 M | 358,160 – 360,002 | 19 | 352 K
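One way to locate a breaking area, assuming the alignment blocks of a scaffold are given as (start, end, chromosome) coordinates on the scaffold: sort the blocks and report the gap wherever consecutive blocks switch chromosome. Short blocks are dropped first so that small repeats do not create spurious switches; the `min_len` filter and the `breaking_areas` name are assumptions, not ScanPAV parameters.

```python
from typing import List, Tuple

# (start on scaffold, end on scaffold, reference chromosome); coordinates in bp
Block = Tuple[int, int, str]


def breaking_areas(blocks: List[Block], min_len: int = 0) -> List[Tuple[str, int, int, str]]:
    """Return (first chr, area start, area end, second chr) at each chromosome switch."""
    # Drop short blocks (hypothetical noise filter), then walk them in scaffold order.
    kept = sorted(b for b in blocks if b[1] - b[0] >= min_len)
    areas: List[Tuple[str, int, int, str]] = []
    for (_, end1, chr1), (start2, _, chr2) in zip(kept, kept[1:]):
        if chr1 != chr2:
            # The breaking area spans from the end of one block to the start of the next.
            areas.append((chr1, end1, start2, chr2))
    return areas
```

With blocks ending at 3,999,036 on chr 1 and starting at 4,000,001 on chr 19, the function returns `[("1", 3999036, 4000001, "19")]`, the same form as the breaking areas listed in the table.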
  • 20. Tasmanian Devil’s Transmissible Cancer
    Figure: map of Tasmania (first observed in NE Tasmania in 1996; tumour-free area).
    Devil Facial Tumour (DFT):
    ‣ Large tumour growths on the face and neck
    ‣ Transmissible by biting
    ‣ Mostly affects adult individuals
    ‣ Death occurs within 4-6 months
    ‣ Two genetically distinct cancers with similar phenotypes: DFT1 and DFT2
  • 21. Tasmanian Devil’s Transmissible Cancer
    A comparative characterisation of DFT1 and DFT2 has recently been submitted to Science [Stammnitz M.R. et al., The origins and vulnerabilities of two transmissible cancers in Tasmanian devils, in review @ Science]. PAV estimation is one of the many analyses presented, used both to assess assembly completeness and to look for exposure to exogenous agents that might have contributed to the development of the cancers. The scanPAV analysis involved:
    ‣ 2 existing references: Ref-v7.1 and PSU
    ‣ 6 new assemblies: 2 from healthy cells (202H, 203H) and 4 from tumour cells (202T, 203T, 86T, 88T)
  • 22. Devil’s PAVs
    The extracted PAV sequences were screened against the NCBI database:
    - no foreign DNA was found in the references
    - in 202T, a 590 Kb sequence belonging to the Mycoplasma arginini genome was found
    - the PAVs of all 6 new assemblies contained 1.9 Mb of the Streptococcus pneumoniae genome
    Both are well-known laboratory cell-culture contaminants. The remaining PAVs are believed to be due to assembly errors and ancestral variation. No exogenous contribution to the emergence of DFTs was found. [Stammnitz M.R. et al., The origins and vulnerabilities of two transmissible cancers in Tasmanian devils, in review @ Science]
  • 23. Summary
    ‣ Assessing the completeness and quality of a newly generated genome assembly is a complex task that requires evaluating multiple metrics. When multiple assemblies or a reference assembly are available:
    ‣ extracting PAV sequences can help assess the strengths and weaknesses of a novel assembly strategy or technology, and identify structural variation and foreign DNA exposure
    ‣ possible mis-joints can be inferred by mapping the assembly to the reference and identifying scaffolds that map to multiple chromosomes
    ‣ the scanPAV pipeline for pairwise extraction of PAVs between assemblies is available at https://sourceforge.net/projects/phusion2/files/scanPAV
    ‣ In progress: adding BWA as an alternative aligner
  • 24. Acknowledgments
    Paul A. Kitts, National Center for Biotechnology Information, Bethesda, MD
    Elizabeth P. Murchison, University of Cambridge, Cambridge, UK
    Zemin Ning, Wellcome Trust Sanger Institute, Hinxton, UK
    Maximilian R. Stammnitz, University of Cambridge, Cambridge, UK
    Thank you!
  • 26. Devil Assemblies
    Assembly | #Bases (Gb) | Scaffolds (×10³) | Longest (Mb) | Contig N50 (Kb) | Scaffold N50 (Mb)
    Ref-v7.1 | 3.2 | 36 | 5.3 | 20 | 1.8
    PSU | 3.2 | 149 | 2.9 | 11 | 0.2
    202T2 | 3.0 | 62 | 50.7 | 55 | 9.5
    202H1 | 3.0 | 68 | 22.0 | 51 | 4.2
    203T3 | 3.0 | 63 | 50.7 | 53 | 9.6
    203H | 3.0 | 64 | 31.1 | 48 | 4.2
    86T | 3.0 | 70 | 24.5 | 66 | 4.4
    88H | 3.0 | 61 | 19.7 | 60 | 4.0
    [Stammnitz M.R. et al., The origins and vulnerabilities of two transmissible cancers in Tasmanian devils, in review @ Science]
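  • 27. Devil’s PAVs (Mb; rows: assembly in which the sequence is present; columns: assembly from which it is absent)
    Present in (row) / Absent in (column) | Ref-v7.1 | PSU | 202T2 | 202H1 | 203T3 | 203H | 86T | 88T
    Ref-v7.1 | 0 | 156 | 62 | 67 | 61 | 64 | 55 | 58
    PSU | 125 | 0 | 69 | 76 | 69 | 71 | 63 | 66
    202T2 | 110 | 149 | 0 | 36 | 22 | 28 | 26 | 25
    202H1 | 107 | 147 | 28 | 0 | 29 | 27 | 28 | 27
    203T3 | 107 | 146 | 20 | 34 | 0 | 27 | 24 | 24
    203H | 109 | 148 | 25 | 32 | 27 | 0 | 26 | 25
    86T | 106 | 146 | 28 | 38 | 29 | 31 | 0 | 22
    88H | 106 | 146 | 25 | 35 | 26 | 27 | 19 | 0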
  • 28. Devil’s Filtered PAVs (Mb; PAVs present in each assembly and absent in Ref-v7.1)
    Present in | All | Filtered
    PSU | 125 | 5
    202T2 | 110 | 14
    202H1 | 107 | 11
    203T3 | 107 | 12
    203H | 109 | 12
    86T | 106 | 12
    88H | 106 | 13
    PAV filter: filter out sequences missing from the Absent Assembly but present in its reads.
    1. Map the Absent Assembly’s reads to the Absent Assembly plus the PAVs.
    2. Filter out PAVs with 10x depth.
    Most of the missing sequences are present in the genome (reads) but absent from the assembly. [Stammnitz M.R. et al., The origins and vulnerabilities of two transmissible cancers in Tasmanian devils, in review @ Science]
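The depth filter can be sketched as follows. It assumes per-base depths for each PAV have already been obtained by mapping the absent sample's reads to Absent Assembly + PAVs (for example, parsed from `samtools depth` output), and it reads "10x depth" as a mean depth of at least 10x. Both the summary statistic and the helper names `mean_depth` and `filter_pavs` are assumptions.

```python
from typing import Dict, List, Sequence

DEPTH_CUTOFF = 10.0  # "Filter out PAVs with 10x depth"; treating it as mean >= 10x is an assumption


def mean_depth(per_base: Sequence[int]) -> float:
    """Average read depth over a PAV (0 for an empty depth track)."""
    return sum(per_base) / len(per_base) if per_base else 0.0


def filter_pavs(depths: Dict[str, Sequence[int]], cutoff: float = DEPTH_CUTOFF) -> List[str]:
    """Keep PAVs that the absent sample's own reads do not cover (mean depth < cutoff).

    depths: PAV name -> per-base read depth from the absent sample's reads.
    """
    return sorted(name for name, d in depths.items() if mean_depth(d) < cutoff)
```

A PAV that is well covered by the absent sample's reads is present in that genome and was only missed by the assembler, so it is dropped; the PAVs that survive are the ones genuinely missing from the sample.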