0
Christopher W. Beitel, Aaron Darling
twitter: @koadman
Genomics revolutionized our understanding of bacteria

Perna et al 2001 Nature, Welch et al 2002 PNAS
Three genomes, same ...
Phylogenetic diversity: the known unknowns
●

Can amplify and sequence the 16S rRNA gene present in all bacteria and
archa...
Is metagenomics the answer...??
Smash with bead beater
Sample
Purify DNA

Analyze

Shear DNA to 400nt frags

Metagenomics ...
What is Hi-C?

High throughput sequencing of proximity-ligation products

Landmark papers
Chromosome conformation capture:...
We propose to use Hi-C for metagenomics

Cell/Species A

Cell/Species B

Conjecture: DNA in the same cell will become cros...
A little experiment to test Hi-C
Selected 5 strains, 4 species:
+ E. coli BL21 (BL21)
+ E. coli K12 (K12)
+ Pediococcus pe...
Reconstructing genomes from within metagenomes
A classic problem in metagenomics
Can't de novo assemble Hi-C data due to p...
A Hi-C scaffold graph

●

Nodes are assembly contigs

●

node size ∝ contig size

●

●

edge weights ∝ normalized
Hi-C rea...
A Hi-C scaffold graph

●

Nodes are assembly contigs

●

node size ∝ contig size

●

edge weights ∝ normalized
Hi-C read c...
Contig clustering: why it works
Rate at which Hi-C reads associate
within and across species

Hi− read pair insert distrib...
Resolving strain differences: Single nucleotide variants
AA

TT

AA

AA

G
G

C
C

G
G

C
C

- SNPs between E. coli K12 an...
Resolving strain differences: Hi-C versus mate-pair
Distance on genome versus path length in variant graph

Mate pair grap...
Opportunities to collaborate
1. Species clustering of assemblies
2. Resolving strain haplotypes
3. Application to microbia...
Thanks to…

Lutz Froenicke
Richard Michelmore
Ian Korf
Matthew DeMaere

Christopher Beitel

Jonathan Eisen

UC Davis

UC D...
Upcoming SlideShare
Loading in...5
×

Enjoy Hi-C! - Aaron Darling

2,097

Published on

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,097
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
22
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Transcript of "Enjoy Hi-C! - Aaron Darling"

  1. 1. Christopher W. Beitel, Aaron Darling twitter: @koadman
  2. 2. Genomics revolutionized our understanding of bacteria Perna et al 2001 Nature, Welch et al 2002 PNAS Three genomes, same species only 40% genes in common
  3. 3. Phylogenetic diversity: the known unknowns ● Can amplify and sequence the 16S rRNA gene present in all bacteria and archaea ● Millions of 16S sequences obtained per run ● Align sequences, build evolutionary tree, analyze Rinke et al 2013, Nature Uncultured bacterial diversity greatly exceeds known isolate diversity
  4. 4. Is metagenomics the answer...?? Smash with bead beater Sample Purify DNA Analyze Shear DNA to 400nt frags Metagenomics loses long-range linkage At least two paths forward: 1. Physically “dissect” microbial communities Sequence (yay!) 2. Better inference methods Prep sequencer library 3. Nanopore long read sequencing?
  5. 5. What is Hi-C? High throughput sequencing of proximity-ligation products Landmark papers Chromosome conformation capture: sdfdf Hi-C: Lieberman-Aiden et al 2009 Single (human) cell Hi-C: Nagano et al 2013
  6. 6. We propose to use Hi-C for metagenomics Cell/Species A Cell/Species B Conjecture: DNA in the same cell will become crosslinked, ligated, and sequenced more frequently (green links) than DNA fragments from different cells (red links)
  7. 7. A little experiment to test Hi-C Selected 5 strains, 4 species: + E. coli BL21 (BL21) + E. coli K12 (K12) + Pediococcus pentosaceus (Ped) + Burkholderia thailandensis (2 chromosomes: Bur1, Bur2) + Lactobacillus brevis (chromosome + 2 plasmids: Lac0, Lac1, Lac2) Grew up cultures, mixed in equal parts based on OD600 Applied standard Hi-C protocol, sequenced on Illumina MiSeq ~22M read pairs, 160nt reads
  8. 8. Reconstructing genomes from within metagenomes A classic problem in metagenomics Can't de novo assemble Hi-C data due to presence of ligation junctions, other issues To test reconstruction we instead simulated reads with BioGrinder and assembled with soapdenovo. Assembly stats: Cov. Contigs (scaff.) Bp assem. N50 (scaff.) N90 (scaf) Max. len. Contig. error Scaff. error Pct. assem. 100 7687 (557) 15,577,164 87634 8552 379623 0 0 77.37
  9. 9. A Hi-C scaffold graph ● Nodes are assembly contigs ● node size ∝ contig size ● ● edge weights ∝ normalized Hi-C read counts linking contigs Fruchterman-Reingold layout
  10. 10. A Hi-C scaffold graph ● Nodes are assembly contigs ● node size ∝ contig size ● edge weights ∝ normalized Hi-C read counts linking contigs ● Fruchterman Reingold layout ● Nodes colored by SPECIES ● Can we actually compute clusters? ● Markov clustering, I=1.1: 4 clusters, >97% of genome in each bin
  11. 11. Contig clustering: why it works Rate at which Hi-C reads associate within and across species Hi− read pair insert distributions C L. brevis ATCC367 P. pentosaceus ATCC25745 E. coli K12 DH10B E. coli BL21(DE3) B. thailandensis E264 chrI B. thailandensis E264 chrII 0 500000 1000000 1500000 2000000 2500000 Insert distance in nucleotides ● 99% of Hi-C read pairs associate within-species ● Hi-C read pairs associate throughout the chromosome ● Spans distances that no existing or imaginary “long read” technology can cover
  12. 12. Resolving strain differences: Single nucleotide variants AA TT AA AA G G C C G G C C - SNPs between E. coli K12 and BL21 are nodes - Edges link SNPs observed in same read pair Are Hi-C graphs are better connected than mate-pair?? 5k MP Largest 6.2% connected component 10k MP 20k MP Hi-C 16.6% 32.4% 97.8% Mate pair Hi-C
  13. 13. Resolving strain differences: Hi-C versus mate-pair Distance on genome versus path length in variant graph Mate pair graph distances scale linearly. Hi-C graphs are scale-free. Estimation error in probability of variant linkage grows with path length!
  14. 14. Opportunities to collaborate 1. Species clustering of assemblies 2. Resolving strain haplotypes 3. Application to microbial communities! 4. Improvement of the Hi-C protocol 5. Population metagenomics – track gene flow of, e.g. antibiotic resistance plasmids 6. Your idea (and some grant funding) here
  15. 15. Thanks to… Lutz Froenicke Richard Michelmore Ian Korf Matthew DeMaere Christopher Beitel Jonathan Eisen UC Davis UC Davis Jenna Lang UC Davis Catherine Burke Manuscript preprint online at https://peerj.com/preprints/260/ twitter: @koadman
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×