FIND MEANING IN COMPLEXITY
© Copyright 2014 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures.
Learning Genomic Structures From De Novo Assembly and Long-read Mapping
Jason Chin (@infoecho) / Sept. 20, 2014, GRC Workshop, Cambridge, UK
Cost per Genome Dilemma
Sequencing costs have certainly fallen, but producing a de novo human genome to the same scientific standard as the initial Human Genome Project work does NOT follow Moore's law.
Figure: contig N50 of selected human genome assemblies.
HGP: N50 ~100 kb; NCBI-34: contig N50 29 Mb
HuRef: 107 kb; BGI YH: 7.4 kb; KB1: 5.5 kb; NA12878: 24 kb; CHM1: 144 kb; RP11: 127 kb
PacBio® CHM1: 4,378 kb, from just a single random-fragment library
According to the NHGRI website, the definition of "sequencing a genome" changed in 2008. The 1000 Genomes Project also started in 2008.
Questions Asked!
•  Since the 1000 Genomes Project, we have learned a lot about point mutations. Can we go beyond that?
•  What if we had 50, 100 or more human assemblies, so that we could capture as much genetic variation as possible?
•  Will all human genome sequencing one day be done de novo?
–  If so, how can we get ready for that as bioinformaticians?
Evan Eichler, in Future Opportunities for Genome Sequencing and Beyond, July 28-29, 2014
Where We Are Now
•  One PacBio® human data set (CHM1) is publicly available; more are likely to come.
•  Multiple groups have independently assembled the public CHM1 data set from raw data with new algorithms.
•  With the new alignment/assembly tools from Gene Myers, one can assemble a genome in ~20,000 CPU-hours (20X faster than the 400,000+ CPU-hours of a previous effort).
New Assembly Statistics with Daligner (lengths in bp):
#Seqs    Mean       Max           N50          Total
5,058    562,695    27,292,514    5,265,098    2,846,115,586
Source: http://dazzlerblog.wordpress.com
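The columns in this table are standard contig-length summaries. A minimal sketch of how they can be computed from a list of contig lengths, assuming N50 is the length of the contig at which the cumulative length of contigs, sorted longest first, first reaches half of the total. The function name `assembly_stats` is hypothetical; this is not the code behind the dazzler blog numbers.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Return #seqs, mean, max, N50 and total for a list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if 2 * running >= total:  # cumulative length reached half the assembly
            n50 = length
            break
    return {
        "n_seqs": len(ordered),
        "mean": total / len(ordered),
        "max": ordered[0],
        "n50": n50,
        "total": total,
    }


if __name__ == "__main__":
    # Toy contig lengths (made up, not CHM1 data)
    print(assembly_stats([100, 200, 300, 400]))
```

Sorting once keeps the sketch simple; for millions of contigs the same single pass over the sorted list still applies.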
What Can We Learn from High-contiguity
De Novo Human Assemblies?
•  Low-hanging fruit
–  Calling SNPs (an assembly is not needed, but it helps)
–  Calling structural variants with whole-genome alignment approaches
–  Inferring repeats by coverage analysis
•  The assembly graph can provide information for understanding more complicated polymorphisms
Calling SNPs / Example: HLA-B
Calling Structural Variation by Whole-genome Alignment
•  Whole-genome alignment (~1 hr on a 32-core machine)
–  With multi-threaded MUMmer
–  Cluster the hits with mgaps, identify "gaps" in the alignments, and convert them to BED format for visualization
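The gap-calling step can be sketched as follows. This is a minimal illustration, assuming the mgaps clusters have already been parsed into collinear, same-strand blocks of (reference start, reference end, query start, query end) in 0-based half-open coordinates. The parsing, the `gaps_to_bed` name and the 100 bp threshold (borrowed from the > 100 bp SV cutoff used in this deck) are assumptions, not the pipeline used for these slides.

```python
from typing import List, Tuple

# (ref_start, ref_end, qry_start, qry_end), 0-based half-open, same strand
Block = Tuple[int, int, int, int]


def gaps_to_bed(chrom: str, blocks: List[Block], min_size: int = 100) -> List[str]:
    """Report gaps between consecutive collinear blocks as BED lines.

    If the query gap exceeds the reference gap by >= min_size bp, the extra
    query sequence is an insertion relative to the reference; the opposite
    case is a deletion.
    """
    bed = []
    ordered = sorted(blocks)
    for prev, cur in zip(ordered, ordered[1:]):
        ref_gap = cur[0] - prev[1]
        qry_gap = cur[2] - prev[3]
        diff = qry_gap - ref_gap
        if abs(diff) < min_size:
            continue  # gap-length difference below the SV threshold
        start = prev[1]
        end = max(cur[0], start + 1)  # keep BED intervals non-empty
        kind = "INS" if diff > 0 else "DEL"
        bed.append(f"{chrom}\t{start}\t{end}\t{kind}_{abs(diff)}")
    return bed


if __name__ == "__main__":
    # Toy blocks: 318 bp of extra query sequence between two matches
    blocks = [(0, 1000, 0, 1000), (1000, 2000, 1318, 2318)]
    for line in gaps_to_bed("chr6", blocks):
        print(line)
```

An insertion has no reference span, so it is written as a 1 bp BED interval at the breakpoint with its size in the name field. A real pipeline would also need to handle inverted blocks and overlapping matches.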
Structural Variants Called on Chromosome 1
Distribution of Structural Variant Sizes
•  Number of insertions/deletions: 13,796 SV calls (insertions or deletions > 100 bp relative to hg19)
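SV size distributions span several orders of magnitude, so they are often summarized on a log scale. A minimal sketch, assuming SV sizes are plain integers in bp and using decade bins (an assumption; the slide does not say how its histogram was binned), with the > 100 bp cutoff from the slide. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def size_histogram(sizes: Iterable[int], min_size: int = 100) -> Dict[int, int]:
    """Count SV sizes larger than min_size bp in decade bins.

    Bin keys are the lower bound of each decade: 100 means [100, 1000),
    1000 means [1000, 10000), and so on.
    """
    counts: Counter = Counter()
    for size in sizes:
        if size > min_size:
            decade = 10 ** (len(str(int(size))) - 1)  # exact, no float log10
            counts[decade] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    toy_sizes = [50, 150, 318, 999, 1000, 25000]  # made-up sizes
    print(size_histogram(toy_sizes))
```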
PacBio® vs. Short-read Alignment View of an SV in the MHC Region
318 bp insertion
Assembly Graph
Each edge is associated with a sequence.
Every path is a candidate model of part of the genome.
From Gene Myers' ISMB 2014 keynote talk
Dissect a Contig from a String Graph
The anatomy of a contig from a string graph layout
A contig: a linear, non-branching path
Each node: the begin (5’) or end (3’) of a read
Each edge: a contiguous subsequence from one read
Ek = (V1, V2, Read, Range) = (00099576_1:B, 00101043_0:B, 00101043_0, 1991-0)
Read 1: 00099576_1, Read 2: 00101043_0
In practice, we might just encode the paths in a contig rather than every single edge:
C = (Ek, Ek+1, Ek+2, Ek+3) = (Pj, Pj+1)
Figure: nodes V1 to V5 joined by edges Ek to Ek+3; the same contig drawn as nodes V1, V3, V5 joined by paths Pj and Pj+1.
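The edge tuple and path encoding above can be written down as a small data structure. A minimal sketch: the `Edge` fields mirror (V1, V2, Read, Range), reads are held in a dict of plain strings, and a range whose start is greater than its end (as in 1991-0) is assumed to mean the subsequence is taken from the reverse-complement strand. That orientation convention, and all names here, are assumptions rather than the assembler's actual code. In this simplified model, concatenating the edge sequences along the path gives the contig sequence.

```python
from dataclasses import dataclass
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Edge:
    v1: str    # e.g. "00099576_1:B"
    v2: str    # e.g. "00101043_0:B"
    read: str  # read that supplies the edge sequence
    start: int
    end: int   # assumption: start > end means reverse-complement orientation

    def sequence(self, reads: Dict[str, str]) -> str:
        seq = reads[self.read]
        if self.start <= self.end:
            return seq[self.start:self.end]
        return revcomp(seq[self.end:self.start])


def contig_sequence(path: List[Edge], reads: Dict[str, str]) -> str:
    """Concatenate edge sequences along a linear, non-branching path."""
    for a, b in zip(path, path[1:]):
        if a.v2 != b.v1:
            raise ValueError(f"edges {a} and {b} are not adjacent")
    return "".join(e.sequence(reads) for e in path)


if __name__ == "__main__":
    toy_reads = {"r1": "AACCGGTT", "r2": "GGGTTTAA"}  # made-up reads
    path = [Edge("a:E", "b:E", "r1", 0, 4), Edge("b:E", "c:B", "r2", 6, 2)]
    print(contig_sequence(path, toy_reads))
```

Storing a compressed path (Pj, Pj+1) instead of every edge would only change how `path` is built; the sequence extraction stays the same.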
Assembly String Graph of CHM1 Genome
•  Largest connected component: 31,998 nodes, 39,399 edges, ~36.5% (~1 Gbp) of the human genome (whole graph: 87,572 nodes, 94,530 edges)
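Component sizes like these come from a connectivity pass over the graph. A minimal sketch, assuming the string graph is available as a list of (source node, target node) pairs and treating edges as undirected, so the result is the largest weakly connected component. The function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def largest_component(edges: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (#nodes, #edges) of the largest weakly connected component."""
    edge_list = list(edges)
    if not edge_list:
        raise ValueError("graph has no edges")
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)  # ignore direction: weak connectivity
        adjacency[v].add(u)

    component_of: Dict[str, int] = {}
    node_counts: List[int] = []
    for seed in list(adjacency):
        if seed in component_of:
            continue
        cid = len(node_counts)
        component_of[seed] = cid
        queue = deque([seed])
        size = 0
        while queue:  # breadth-first search from the seed
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in component_of:
                    component_of[neighbor] = cid
                    queue.append(neighbor)
        node_counts.append(size)

    edge_counts = [0] * len(node_counts)
    for u, _ in edge_list:
        edge_counts[component_of[u]] += 1  # each edge counted once

    best = max(range(len(node_counts)), key=node_counts.__getitem__)
    return node_counts[best], edge_counts[best]
```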
Centromere?
Casey Bergman: "it almost looks like an electron micrograph of the nucleus" #convergence
Polymorphism Structure vs. Local Assembly Graph Structure
Figure: SNPs and SVs in a diploid genome, and a segmental duplication, can give similar string graph structures.
Identifying Contigs: A New Proposal
Figure: a primary contig with associated contig 1 and associated contig 2; SNPs and SVs mark where the alleles differ.
1 full-length (primary) contig + 2 associated contigs
Keep the long-range information while maintaining the relations of the alternative alleles.
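The primary/associated split starts from finding places where the graph splits into two alternative paths and rejoins (a "bubble"). A minimal sketch for the simplest case, assuming a directed graph given as (source, target) pairs: from every node with exactly two successors, each branch is followed through non-branching nodes until it reaches a node with more than one incoming edge, and the two branches form a bubble if they stop at the same node. One branch could then stay in the primary contig while the other is reported as an associated contig. This is a simplified illustration with hypothetical names, not the algorithm used to produce the contigs in these slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def _follow_branch(start: str, first: str, succ, indeg, max_len: int) -> Optional[List[str]]:
    """Walk from `first` through non-branching nodes; return the path if it
    stops at a node with in-degree > 1 (a candidate bubble end)."""
    path = [start]
    node = first
    for _ in range(max_len):
        path.append(node)
        if indeg[node] > 1:
            return path  # reached a merge point
        if len(succ[node]) != 1:
            return None  # dead end or another split: not a simple bubble
        node = succ[node][0]
    return None  # branch too long


def find_simple_bubbles(edges: Iterable[Tuple[str, str]], max_len: int = 1000):
    """Return (branch_a, branch_b) node paths for each simple two-way bubble."""
    succ: Dict[str, List[str]] = defaultdict(list)
    indeg: Dict[str, int] = defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    bubbles = []
    for source, outs in list(succ.items()):
        if len(outs) != 2:
            continue
        branch_a = _follow_branch(source, outs[0], succ, indeg, max_len)
        branch_b = _follow_branch(source, outs[1], succ, indeg, max_len)
        if branch_a and branch_b and branch_a[-1] == branch_b[-1]:
            bubbles.append((branch_a, branch_b))
    return bubbles
```

Nested or overlapping bubbles, as around segmental duplications, need a more general traversal than this sketch.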
Contig 4076 Alignment Around the DPY19L2 Locus
Figure annotation: same contig
Contig Graph and Segmental Duplication
Contig 4076: one primary contig and 3 associated contigs, aligned to Chr7 and Chr12
Contig 4076 Alignment to Chr7
Figure tracks: same contig; SV calls from the CHM1 assembly; SV calls from GRCh38
Local Neighborhood Subgraph of Contig 4076
Examining an Assembly Graph at the Contig Level Around 1q21
•  Contig graph, 1q21, contig 4108: another potential segmental duplication?
Another Intriguing Case
•  Contig 4006 maps to chr9
The aligned region changes a lot in GRCh38.
Contig Coverage Analysis
Figure: contig coverage distribution, with marks at 18.5X, 2 × 18.5X and 3 × 18.5X.
High-coverage long contigs: 40 contigs > 100 kbp with coverage > 2.5 × 18.5X
Poor assemblies, alignment artifacts, or sequencing errors?
High repeat content
Checking the Complexity of the High-coverage Contigs
Contig 4006, 687 kb, 53X coverage
Contig 4235, 453 kb, 59X coverage
Contig 3842, 235 kb, 54X coverage
Warning: these contigs may not be 100% correctly assembled because of some nasty repeats. However, the local graphs give hints about the true genome structures.
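The screen described above (long contigs at more than 2.5 times the single-copy coverage) can be sketched as a simple filter. A minimal sketch, assuming per-contig mean coverage has already been computed from the read alignments and that 18.5X is the single-copy coverage implied by the 1×, 2× and 3× marks. The copy estimate is a naive ratio, and, as the warning notes, repeats can make both the assembly and the coverage misleading. Names and thresholds as parameters are illustrative.

```python
from typing import List, Tuple

Contig = Tuple[str, int, float]  # (name, length in bp, mean read coverage)


def flag_high_coverage(
    contigs: List[Contig],
    base_cov: float = 18.5,
    min_len: int = 100_000,
    fold: float = 2.5,
) -> List[Tuple[str, int, float, int]]:
    """Flag long contigs whose coverage exceeds fold * base_cov.

    Returns (name, length, coverage, naive copy estimate), where the copy
    estimate is coverage / base_cov rounded to the nearest integer.
    """
    flagged = []
    for name, length, cov in contigs:
        if length > min_len and cov > fold * base_cov:
            flagged.append((name, length, cov, round(cov / base_cov)))
    return flagged


if __name__ == "__main__":
    contigs = [
        ("4006", 687_000, 53.0),        # values from the slide (lengths rounded to kb)
        ("4235", 453_000, 59.0),
        ("3842", 235_000, 54.0),
        ("toy_short", 50_000, 60.0),    # made up: too short to flag
        ("toy_single", 500_000, 19.0),  # made up: single-copy coverage
    ]
    for row in flag_high_coverage(contigs):
        print(row)
```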
What Does the High-coverage Contig Look Like?
>2000X in this region
High-coverage region: alpha satellites?
Extreme Repeats
Identifying Centromere Alpha-satellite Structure
•  Most of the nasty contig graphs are around the centromeres. It currently remains hard to get long contigs through those very long tandem repeats.
•  However, we can still learn many useful things from long-read data.
•  Tool in development: α-Centauri, for identifying different high-order repeat structures (https://github.com/volkansevim/alpha-CENTAURI; Volkan Sevim, Ali Bashir & Karen Miga)
Centromere Alpha Satellites Have Non-trivial High-order
Repeat Structure
Karen Miga
Example: A Read Reconstructs a 24-mer HOR
Align monomers to each other to identify near-identical monomers.
Identify the HOR from the monomer IDs and positions.
Figure: monomers labeled 1 to 24 along the read, forming one 24-mer HOR.
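Once near-identical monomers share an ID, the HOR length can be read off as the period of the ID sequence. A minimal sketch, assuming monomer IDs have already been assigned along the read (the alignment and clustering step is not shown): it scores each candidate period p by the fraction of positions whose ID matches the ID p monomers downstream, and keeps the shortest best-scoring period. This is an illustrative approach with hypothetical names, not the method used by α-Centauri.

```python
from typing import Hashable, Optional, Sequence, Tuple


def hor_period(ids: Sequence[Hashable], max_period: Optional[int] = None) -> Tuple[int, float]:
    """Return (period, score) for the shortest best-matching period of `ids`.

    score is the fraction of positions i with ids[i] == ids[i + period].
    """
    n = len(ids)
    if max_period is None:
        max_period = n // 2
    if n < 2 or max_period < 1:
        raise ValueError("need at least two monomers and a period >= 1")
    best_period, best_score = 1, -1.0
    for period in range(1, min(max_period, n - 1) + 1):
        pairs = n - period
        matches = sum(ids[i] == ids[i + period] for i in range(pairs))
        score = matches / pairs
        if score > best_score:  # strict '>' keeps the shortest period on ties
            best_period, best_score = period, score
    return best_period, best_score


if __name__ == "__main__":
    read_ids = list(range(1, 25)) * 3  # three copies of a toy 24-mer HOR
    print(hor_period(read_ids))
```

Using a match fraction rather than exact periodicity tolerates occasional monomer variants within otherwise regular HOR copies.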
Many Other Open Topics
•  Low-coverage assembly: cost vs. quality analysis
•  Phasing haplotypes
•  Crowd-sourcing infrastructure for examining / annotating / correcting genome assemblies
•  Evaluating SNP calling with short reads against better assemblies
•  Large-scale comparative genomics with de novo assemblies
•  Assembly-graph data formats
•  Visualization techniques
•  Combining other data types, e.g. optical mapping
It is a very exciting time. We still need more tools to harvest information and generate new knowledge.
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
 
Bioconversion of sago waste and oil cakes into biobutanol using Environmental...
Bioconversion of sago waste and oil cakes into biobutanol using Environmental...Bioconversion of sago waste and oil cakes into biobutanol using Environmental...
Bioconversion of sago waste and oil cakes into biobutanol using Environmental...
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
 
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
 
Biochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Biochar impregnation as slow release fertilizer - Violeta Alexandra IonBiochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Biochar impregnation as slow release fertilizer - Violeta Alexandra Ion
 
Pancreas_functional anatomy_enzymes.pptx
Pancreas_functional anatomy_enzymes.pptxPancreas_functional anatomy_enzymes.pptx
Pancreas_functional anatomy_enzymes.pptx
 
Concept of Balanced Diet & Nutrients.pdf
Concept of Balanced Diet & Nutrients.pdfConcept of Balanced Diet & Nutrients.pdf
Concept of Balanced Diet & Nutrients.pdf
 
Lunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - ArtemisLunar Mobility Drivers and Needs - Artemis
Lunar Mobility Drivers and Needs - Artemis
 
Plant Kingdom BioHack class 11 neet ....
Plant Kingdom BioHack class 11 neet ....Plant Kingdom BioHack class 11 neet ....
Plant Kingdom BioHack class 11 neet ....
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
 
Traditional, current and future use of fish and seaweed for fertilisation - ...
Traditional, current and future use of fish and seaweed for fertilisation -  ...Traditional, current and future use of fish and seaweed for fertilisation -  ...
Traditional, current and future use of fish and seaweed for fertilisation - ...
 
Review Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPY
Review Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPYReview Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPY
Review Article:- A REVIEW ON RADIOISOTOPES IN CANCER THERAPY
 
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
SOFIA/HAWC+ FAR-INFRARED POLARIMETRIC LARGE-AREA CMZ EXPLORATION (FIREPLACE) ...
 
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
 
Phytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with PhytoremediationPhytoremediation: Harnessing Nature's Power with Phytoremediation
Phytoremediation: Harnessing Nature's Power with Phytoremediation
 
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdfHow Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
 
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
 

Alignment Approaches II: Long Reads

  • 1. FIND MEANING IN COMPLEXITY © Copyright 2014 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures. Jason Chin (@infoecho) / Sept. 20 2014, GRC Workshop, Cambridge, UK. Learning Genomic Structures From De Novo Assembly and Long-read Mapping
  • 2. Cost per Genome Dilemma: Sequencing cost is down for sure, but getting a de novo human genome that meets the same scientific standard as the initial work does NOT follow Moore's law. Contig N50s: HGP: N50 ~100 kb; NCBI-34: contig N50 29 Mb; HuRef: 107 kb; BGI YH: 7.4 kb; KB1: 5.5 kb; NA12878: 24 kb; CHM1: 144 kb; RP11: 127 kb; PacBio® CHM1: 4,378 kb from just a single random-fragment library. According to the NHGRI website, the definition of "sequencing a genome" changed in 2008. The 1000 Genomes Project also started in 2008.
  • 3. Questions Asked • Since the 1000 Genomes Project, we have learned a lot about point mutations. Can we go beyond that? • What if we had 50, 100 or more human assemblies so that we could address as many genetic variants as possible? • Will all human genome sequencing one day be done in de novo fashion? – If so, how can we get ready for that as bioinformaticians? (Evan Eichler, in Future Opportunities for Genome Sequencing and Beyond, July 28-29, 2014)
  • 4. Where We Are Now • One PacBio® human data set is publicly available; more are likely to come. • Multiple groups have independently assembled the public CHM1 data set from raw data with new algorithms. • With new alignment/assembly tools from Gene Myers, one can assemble a genome in ~20,000 CPU-hours (20X faster than the 400,000+ CPU-hours of the previous effort). New assembly statistics with Daligner: #Seqs 5,058; Mean 562,695; Max 27,292,514; N50 5,265,098; Total 2,846,115,586 (http://dazzlerblog.wordpress.com)
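The N50 and the other summary fields on this slide can be computed from a list of contig lengths. A minimal Python sketch, assuming the lengths have already been parsed from the assembly FASTA; the function names are illustrative and not taken from the Dazzler tools. The integer mean reproduces the slide's figure, since 2,846,115,586 // 5,058 = 562,695.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids float rounding
            return length
    return 0  # empty input


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Summarise contig lengths with the same fields as the slide."""
    if not lengths:
        return {"#Seqs": 0, "Mean": 0, "Max": 0, "N50": 0, "Total": 0}
    total = sum(lengths)
    return {
        "#Seqs": len(lengths),
        "Mean": total // len(lengths),  # integer mean, as on the slide
        "Max": max(lengths),
        "N50": n50(lengths),
        "Total": total,
    }


if __name__ == "__main__":
    # Toy contig lengths, not real data.
    print(assembly_stats([100, 80, 50, 40, 30]))
```

For the toy lengths the N50 is 80: the two longest contigs (100 + 80 = 180) already cover at least half of the 300 bp total.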
  • 5. What Can We Learn from High-contiguity De Novo Human Assemblies?
  • 6. What Can We Learn from High-contiguity Human Assemblies? • Low-hanging fruit – Calling SNPs (assembly not needed, but it helps) – Calling structural variants with whole-genome alignment approaches – Inferring repeats by coverage analysis • The assembly graph can provide information for understanding more complicated polymorphisms.
  • 7. Calling SNPs / Example: HLA-B
  • 8. Calling Structural Variants by Whole-genome Alignment • Whole-genome alignment (~1 hr on a 32-core machine) – with multi-threaded MUMmer – cluster the hits with mgaps, identify "gaps" in the alignments, and convert them to BED format for visualization. [Figure: structural variants called on chromosome 1]
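The gap-to-BED step can be sketched roughly as follows. This assumes the MUMmer hits have already been chained into collinear, forward-strand blocks on a single reference sequence; the `Block` record and `gaps_to_bed` function are hypothetical simplifications, not the actual MUMmer or mgaps formats. Where the gap between two consecutive blocks is longer in the assembly than in the reference, the sketch reports an insertion; where it is shorter, a deletion.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One collinear alignment block, 0-based half-open coordinates.
    Hypothetical simplified record; not MUMmer's actual output format."""
    ref: str
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def gaps_to_bed(blocks: List[Block], min_size: int = 100) -> List[str]:
    """Compare reference and query gaps between consecutive blocks of one
    forward-strand chain on one reference, and emit BED lines for indels
    larger than min_size."""
    bed = []
    ordered = sorted(blocks, key=lambda b: b.ref_start)
    for a, b in zip(ordered, ordered[1:]):
        ref_gap = b.ref_start - a.ref_end
        qry_gap = b.qry_start - a.qry_end
        size = qry_gap - ref_gap  # > 0: extra query sequence, < 0: missing
        if size > min_size:
            kind = "INS"
        elif -size > min_size:
            kind = "DEL"
        else:
            continue
        start = a.ref_end
        end = max(b.ref_start, start + 1)  # keep a non-empty BED interval
        bed.append(f"{a.ref}\t{start}\t{end}\t{kind}:{abs(size)}")
    return bed


if __name__ == "__main__":
    chain = [
        Block("chr1", 0, 1000, 0, 1000),
        Block("chr1", 1000, 2000, 1318, 2318),  # 318 bp extra in the query
        Block("chr1", 2500, 3000, 2318, 2818),  # 500 bp missing from the query
    ]
    print("\n".join(gaps_to_bed(chain)))
```

Reverse-strand blocks, overlapping chains and translocations would need extra handling that this sketch leaves out.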
  • 9. Distribution of Structural Variant Sizes • Number of insertions/deletions: 13,796 SV calls (insertions or deletions > 100 bp relative to hg19)
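A size distribution like this one can be tabulated by binning the absolute SV lengths and keeping only calls longer than 100 bp, as on the slide. A small sketch; the decade-wide bins are an arbitrary choice for illustration, not the binning used for the plot.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable

# Bin lower bounds (bp) and labels; each bin is [lower, next lower).
BIN_EDGES = [100, 1_000, 10_000, 100_000]
BIN_LABELS = ["100bp-1kb", "1kb-10kb", "10kb-100kb", ">=100kb"]


def size_histogram(sizes: Iterable[int]) -> Dict[str, int]:
    """Count SV calls per bin, keeping only calls with |size| > 100 bp."""
    counts: Counter = Counter()
    for size in sizes:
        size = abs(size)  # deletions may be stored as negative lengths
        if size <= BIN_EDGES[0]:
            continue  # the slide only counts indels > 100 bp
        counts[BIN_LABELS[bisect_right(BIN_EDGES, size) - 1]] += 1
    return {label: counts[label] for label in BIN_LABELS}


if __name__ == "__main__":
    print(size_histogram([50, 318, -1000, 2500, 150_000]))
```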
  • 10. PacBio® vs. Short-read Alignment View of an SV in the MHC Region: a 318 bp insertion
  • 11. Assembly Graph: Each edge is associated with a sequence. Every path is a candidate model of part of the genome. (From Gene Myers' ISMB 2014 keynote talk)
  • 12. Dissecting a Contig from a String Graph: The anatomy of a contig in a string graph layout. A contig: a linear, non-branching path. Each node: the begin (5') or end (3') of a read. Each edge: a continuous subsequence from one read, Ek = (V1, V2, Read, Range) = (00099576_1:B, 00101043_0:B, 00101043_0, 1991-0), where Read 1 is 00099576_1 and Read 2 is 00101043_0. In practice, we might just encode the paths in a contig rather than each single edge: C = (Ek, Ek+1, Ek+2, Ek+3) = (Pj, Pj+1). [Figure: nodes V1-V5 joined by edges Ek-Ek+3, collapsed into path Pj (V1 to V3) and path Pj+1 (V3 to V5)]
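The edge tuple and the idea of a contig as a linear, non-branching path can be sketched in Python. Assumptions: nodes are named `read:B` / `read:E` as on the slide, and a range whose start exceeds its end (such as `1991-0`) denotes the reverse complement of that read interval; this convention is a guess, not a documented format. The hypothetical `non_branching_paths` returns each contig as its list of edges, which could then be compressed into longer paths as the slide suggests.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class Edge(NamedTuple):
    """String-graph edge E = (V1, V2, Read, Range), mirroring the slide."""
    v: str                 # source node, e.g. "00099576_1:B"
    w: str                 # target node, e.g. "00101043_0:B"
    read: str              # read that supplies the edge sequence
    span: Tuple[int, int]  # (start, end) on that read; start > end is ASSUMED
                           # to mean the reverse complement of read[end:start]


def edge_seq(e: Edge, reads: Dict[str, str]) -> str:
    """Return the subsequence that labels edge e."""
    start, end = e.span
    seq = reads[e.read]
    if start <= end:
        return seq[start:end]
    return seq[end:start].translate(_COMPLEMENT)[::-1]


def non_branching_paths(edges: List[Edge]) -> List[List[Edge]]:
    """Split the graph into maximal linear, non-branching paths (contigs).
    Isolated cycles made only of simple nodes are not reported."""
    out_edges = defaultdict(list)
    indeg = defaultdict(int)
    for e in edges:
        out_edges[e.v].append(e)
        indeg[e.w] += 1

    def is_simple(node: str) -> bool:
        # A simple node has exactly one incoming and one outgoing edge.
        return indeg[node] == 1 and len(out_edges[node]) == 1

    paths, used = [], set()
    for e in edges:
        if e in used or is_simple(e.v):
            continue  # paths start only at branching nodes or tips
        path = [e]
        used.add(e)
        while is_simple(path[-1].w):  # extend through simple nodes
            nxt = out_edges[path[-1].w][0]
            path.append(nxt)
            used.add(nxt)
        paths.append(path)
    return paths


def path_seq(path: List[Edge], reads: Dict[str, str]) -> str:
    """Concatenate edge labels along a path (the leading read is not included)."""
    return "".join(edge_seq(e, reads) for e in path)
```

With edges A→B, B→C, C→D and C→E, node C branches, so the sketch yields three contigs: (A→B→C), (C→D) and (C→E).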
  • 13. Assembly String Graph of the CHM1 Genome • Largest connected component: 31,998 nodes, 39,399 edges, ~36.5% (~1 Gbp) of the human genome (whole graph: 87,572 nodes, 94,530 edges). [Figure annotation: Centromere?] Casey Bergman: "it almost looks like an electron micrograph of the nucleus" #convergence
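Component sizes like these come from a connected-component search over the graph with edge direction ignored. A short breadth-first-search sketch on toy node names. As a plain arithmetic check on the slide's counts, 31,998 / 87,572 ≈ 0.365 of all nodes fall in the largest component.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Weakly connected components (edge direction ignored), largest first.
    Nodes that have no edges at all are not included."""
    adj = defaultdict(set)
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from start
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


if __name__ == "__main__":
    comps = connected_components([("a", "b"), ("b", "c"), ("x", "y")])
    print([len(c) for c in comps])
```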
  • 14. Polymorphism Structure vs. Local Assembly Graph Structure [Figure: a diploid genome (SNPs, SVs) and a segmental duplication give similar string graphs]
  • 15. Identifying Contigs: A New Proposal [Figure: a region carrying SNPs and SVs represented as a primary contig plus associated contigs 1 and 2] One full-length contig + 2 associated contigs: keep the long-range information while maintaining the relations of the alternative alleles.
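One way to picture the primary/associated split is a simple bubble: several branches leave one node and reconverge at another. The sketch below is an illustration under stated assumptions, not the proposal's actual algorithm. It uses a hypothetical toy graph format, only follows branches through non-branching nodes, and picks the longest branch as the primary contig, which is an arbitrary rule chosen here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# node -> list of (next_node, edge_length); hypothetical toy representation
Graph = Dict[str, List[Tuple[str, int]]]


def _walk(graph: Graph, indeg: Dict[str, int], start: str,
          first: Tuple[str, int]) -> Tuple[List[str], int]:
    """Follow one branch from start through simple nodes (1 in, 1 out)."""
    node, length = first
    path = [start, node]
    while indeg[node] == 1 and len(graph.get(node, [])) == 1:
        node, step = graph[node][0]
        path.append(node)
        length += step
    return path, length


def simple_bubbles(graph: Graph) -> List[Dict[str, object]]:
    """Find branches that leave one node and reconverge at another; keep the
    longest as the primary path and the rest as associated paths."""
    indeg: Dict[str, int] = defaultdict(int)
    for outs in graph.values():
        for w, _ in outs:
            indeg[w] += 1
    bubbles = []
    for source, outs in graph.items():
        if len(outs) < 2:
            continue  # no branching here
        by_sink = defaultdict(list)
        for first in outs:
            path, length = _walk(graph, indeg, source, first)
            by_sink[path[-1]].append((length, path))
        for branches in by_sink.values():
            if len(branches) < 2:
                continue  # branch does not reconverge with a sibling
            branches.sort(key=lambda b: b[0], reverse=True)
            bubbles.append({"primary": branches[0][1],
                            "associated": [p for _, p in branches[1:]]})
    return bubbles


if __name__ == "__main__":
    toy = {"S": [("A", 10), ("B", 12)], "A": [("T", 10)],
           "B": [("T", 10)], "T": [("U", 5)]}
    print(simple_bubbles(toy))
```

In the toy graph, S→B→T (22) becomes the primary path and S→A→T (20) the associated one. Nested bubbles or a segmental duplication would need a richer model than this.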
  • 16. Contig 4076 Alignment Around the DPY19L2 Locus [Figure annotation: same contig]
  • 17. Contig Graph and Segmental Duplication: contig 4076 (one primary contig, 3 associated contigs) aligned to chr7 and chr12
  • 18. Contig 4076 Alignment to chr7 [Figure tracks: same contig; SV calls from the CHM1 assembly; SV calls from GRCh38]
  • 19. Local Neighborhood Subgraph of Contig 4076
  • 20. Examining an Assembly Graph at the Contig Level Around 1q21 • Contig graph at 1q21, contig 4108: another potential segmental duplication?
  • 21. Another Intriguing Case • Contig 4006 maps to chr9. The aligned region changes a lot in GRCh38.
  • 22. Contig Coverage Analysis [Figure: coverage distribution with marks at 18.5X, 2 × 18.5X and 3 × 18.5X] High-coverage long contigs: 40 contigs > 100 kbp with coverage > 2.5 × 18.5X. Poor assemblies, alignment artifacts, or sequence errors? High repeat-element content.
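The coverage screen on this slide amounts to a threshold. With 18.5X coverage per copy, a long contig (> 100 kbp) whose coverage exceeds 2.5 × 18.5X = 46.25X is flagged. The sketch below uses the rounded lengths and coverages listed for contigs 4006, 4235 and 3842 on the "Checking the Complexity of the High-coverage Contigs" slide, plus one made-up contig. The copy-number estimate is a naive coverage ratio added for illustration, not the author's method.

```python
from typing import Dict, List, Tuple

BASE_COVERAGE = 18.5  # per-copy read coverage from the slide (X)


def high_coverage_contigs(contigs: Dict[str, Tuple[int, float]],
                          base: float = BASE_COVERAGE,
                          min_length: int = 100_000,
                          fold: float = 2.5) -> List[str]:
    """Names of contigs longer than min_length whose mean coverage exceeds
    fold * base; candidates for repeat-related assembly problems."""
    threshold = fold * base  # 2.5 * 18.5 = 46.25X with the defaults
    return sorted(name for name, (length, cov) in contigs.items()
                  if length > min_length and cov > threshold)


def estimated_copies(coverage: float, base: float = BASE_COVERAGE) -> float:
    """Naive copy-number estimate: observed coverage over per-copy coverage."""
    return round(coverage / base, 1)


if __name__ == "__main__":
    # Rounded values from the slide; "toy" is a made-up normal-coverage contig.
    contigs = {
        "4006": (687_000, 53.0),
        "4235": (453_000, 59.0),
        "3842": (235_000, 54.0),
        "toy": (500_000, 19.0),
    }
    print(high_coverage_contigs(contigs))  # ['3842', '4006', '4235']
    print({name: estimated_copies(cov) for name, (_, cov) in contigs.items()})
```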
  • 23. Checking the Complexity of the High-coverage Contigs: Contig 4006, 687 kb, 53X coverage; Contig 4235, 453 kb, 59X coverage; Contig 3842, 235 kb, 54X coverage. Warning: these contigs may not be 100% correctly assembled because of some nasty repeats. However, the local graphs give hints about the true genome structures.
  • 24. What Does the High-coverage Contig Look Like? [Figure annotation: >2000X in this region]
  • 25. What Does the High-coverage Contig Look Like? [Figure annotations: high-coverage region; alpha satellites?]
  • 26. Extreme Repeats
  • 27. Identifying Centromeric Alpha-satellite Structure • Most of the nasty contig graphs are around centromeres. Currently, it remains hard to get long contigs across those very long tandem repeats. • However, we can still learn many useful things from long-read data. • Tool in development: α-Centauri, for identifying different high-order repeat structures (https://github.com/volkansevim/alpha-CENTAURI; Volkan Sevim, Ali Bashir & Karen Miga)
  • 28. Centromeric Alpha Satellites Have Non-trivial High-order Repeat Structure (Karen Miga)
  • 29. Example: A Read Reconstructs a 24-mer HOR. Align monomers to each other to identify near-identical monomers, then identify the HOR from the monomer IDs and positions. [Figure: monomers labeled 1-24 along the read]
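The two steps on this slide are to label near-identical monomers with shared IDs and then read the HOR off the ID sequence. They can be sketched as below. This is not α-Centauri's implementation. Similarity here is `difflib.SequenceMatcher.ratio()` as a stand-in for a real alignment score, the 95% cut-off is an assumption, and the HOR size is taken as the smallest period of the monomer-ID sequence. Real alpha-satellite monomers are about 171 bp long; the toy monomers in the example are not.

```python
from difflib import SequenceMatcher
from typing import List


def assign_monomer_ids(monomers: List[str], min_identity: float = 0.95) -> List[int]:
    """Greedy clustering: each monomer gets the 1-based ID of the first
    representative it matches at >= min_identity, otherwise a new ID."""
    representatives: List[str] = []
    ids: List[int] = []
    for mono in monomers:
        for idx, rep in enumerate(representatives):
            # ratio() = 2 * matches / (len(a) + len(b)); stand-in for alignment identity
            if SequenceMatcher(None, mono, rep, autojunk=False).ratio() >= min_identity:
                ids.append(idx + 1)
                break
        else:  # no representative matched: start a new monomer class
            representatives.append(mono)
            ids.append(len(representatives))
    return ids


def hor_period(ids: List[int]) -> int:
    """Smallest p such that the monomer-ID sequence repeats with period p."""
    n = len(ids)
    for p in range(1, n + 1):
        if all(ids[i] == ids[i + p] for i in range(n - p)):
            return p
    return n


if __name__ == "__main__":
    # Three toy "monomers" tandemly repeated three times.
    read_monomers = ["A" * 20, "C" * 20, "G" * 20] * 3
    ids = assign_monomer_ids(read_monomers)
    print(ids, hor_period(ids))
```

A 24-mer HOR like the one on the slide would appear as IDs 1 to 24 repeating along the read, giving a period of 24.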
  • 30. Many Other Open Topics • Low-coverage assembly: cost vs. quality analysis • Phasing haplotypes • Crowd-sourcing infrastructure for examining / annotating / correcting genome assemblies • Evaluating SNP calling with short reads against a better assembly • Large-scale comparative genomics with de novo assemblies • Assembly-graph data format • Visualization techniques • Combining other data types, e.g. optical mapping. It is a very exciting time. We still need more tools to harvest information and generate new knowledge.
  • 31. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.