
Enjoy Hi-C! - Aaron Darling

Published in: Technology

Transcript

  • 1. Christopher W. Beitel, Aaron Darling twitter: @koadman
  • 2. Genomics revolutionized our understanding of bacteria
    Three genomes, same species: only 40% of genes in common (Perna et al 2001 Nature, Welch et al 2002 PNAS)
  • 3. Phylogenetic diversity: the known unknowns
    ● Can amplify and sequence the 16S rRNA gene present in all bacteria and archaea
    ● Millions of 16S sequences obtained per run
    ● Align sequences, build evolutionary tree, analyze
    Uncultured bacterial diversity greatly exceeds known isolate diversity (Rinke et al 2013, Nature)
  • 4. Is metagenomics the answer...??
    Sample → Smash with bead beater → Purify DNA → Shear DNA to 400nt frags → Prep sequencer library → Sequence (yay!) → Analyze
    Metagenomics loses long-range linkage
    At least two paths forward:
    1. Physically “dissect” microbial communities
    2. Better inference methods
    3. Nanopore long read sequencing?
  • 5. What is Hi-C?
    High throughput sequencing of proximity-ligation products
    Landmark papers:
    Chromosome conformation capture: sdfdf
    Hi-C: Lieberman-Aiden et al 2009
    Single (human) cell Hi-C: Nagano et al 2013
  • 6. We propose to use Hi-C for metagenomics
    Conjecture: DNA in the same cell will become crosslinked, ligated, and sequenced more frequently (green links) than DNA fragments from different cells (red links)
    [Figure: Cell/Species A and Cell/Species B]
  • 7. A little experiment to test Hi-C
    Selected 5 strains, 4 species:
    + E. coli BL21 (BL21)
    + E. coli K12 (K12)
    + Pediococcus pentosaceus (Ped)
    + Burkholderia thailandensis (2 chromosomes: Bur1, Bur2)
    + Lactobacillus brevis (chromosome + 2 plasmids: Lac0, Lac1, Lac2)
    Grew up cultures, mixed in equal parts based on OD600
    Applied standard Hi-C protocol, sequenced on Illumina MiSeq
    ~22M read pairs, 160nt reads
  • 8. Reconstructing genomes from within metagenomes
    A classic problem in metagenomics
    Can't de novo assemble Hi-C data due to presence of ligation junctions, other issues
    To test reconstruction we instead simulated reads with BioGrinder and assembled with SOAPdenovo.
    Assembly stats:
    Cov.: 100
    Contigs (scaff.): 7687 (557)
    Bp assem.: 15,577,164
    N50 (scaff.): 87634
    N90 (scaff.): 8552
    Max. len.: 379623
    Contig. error: 0
    Scaff. error: 0
    Pct. assem.: 77.37
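The N50 and N90 figures in the assembly stats summarize how contiguous an assembly is. A minimal sketch of how such a statistic is commonly computed follows; the function name `n_stat` and the toy lengths are illustrative assumptions, not the code or data behind the slide.

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nx statistic for a list of contig or scaffold lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least `fraction` of all assembled bases
    (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("no lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy lengths, not the assembly from the talk.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n_stat(toy_lengths, 0.5)
toy_n90 = n_stat(toy_lengths, 0.9)
```

N90 is never larger than N50, since more of the assembly has to be covered before the threshold is reached.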
  • 9. A Hi-C scaffold graph
    ● Nodes are assembly contigs
    ● node size ∝ contig size
    ● edge weights ∝ normalized Hi-C read counts linking contigs
    ● Fruchterman-Reingold layout
  • 10. A Hi-C scaffold graph
    ● Nodes are assembly contigs
    ● node size ∝ contig size
    ● edge weights ∝ normalized Hi-C read counts linking contigs
    ● Fruchterman-Reingold layout
    ● Nodes colored by SPECIES
    ● Can we actually compute clusters?
    ● Markov clustering, I=1.1: 4 clusters, >97% of genome in each bin
  • 11. Contig clustering: why it works
    [Figure: Rate at which Hi-C reads associate within and across species]
    [Figure: Hi-C read pair insert distributions for L. brevis ATCC367, P. pentosaceus ATCC25745, E. coli K12 DH10B, E. coli BL21(DE3), B. thailandensis E264 chrI, B. thailandensis E264 chrII; x-axis: insert distance in nucleotides, 0 to 2500000]
    ● 99% of Hi-C read pairs associate within-species
    ● Hi-C read pairs associate throughout the chromosome
    ● Spans distances that no existing or imaginary “long read” technology can cover
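A within-species rate like the 99% figure can be tallied once each read in a pair has been mapped to a reference sequence of known origin. The following is a minimal sketch under that assumption; the function name, the input layout, and the toy labels are hypothetical and do not reproduce the analysis from the talk.

```python
def within_species_fraction(read_pairs, sequence_species):
    """Fraction of read pairs whose two ends map to the same species.

    read_pairs: iterable of (sequence_id_end1, sequence_id_end2)
    sequence_species: dict mapping sequence id to species label
    Pairs with an end on an unlabeled sequence are skipped.
    """
    within = 0
    total = 0
    for end1, end2 in read_pairs:
        species1 = sequence_species.get(end1)
        species2 = sequence_species.get(end2)
        if species1 is None or species2 is None:
            continue
        total += 1
        if species1 == species2:
            within += 1
    if total == 0:
        raise ValueError("no read pairs with both ends labeled")
    return within / total


# Hypothetical toy mapping results, not data from the experiment.
toy_species = {"seqA1": "A", "seqA2": "A", "seqB1": "B"}
toy_pairs = [("seqA1", "seqA2"), ("seqA1", "seqA1"),
             ("seqA2", "seqB1"), ("seqB1", "seqB1")]
toy_fraction = within_species_fraction(toy_pairs, toy_species)
```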
  • 12. Resolving strain differences: Single nucleotide variants
    - SNPs between E. coli K12 and BL21 are nodes
    - Edges link SNPs observed in same read pair
    Are Hi-C graphs better connected than mate-pair graphs?
    Largest connected component: 5k MP 6.2%, 10k MP 16.6%, 20k MP 32.4%, Hi-C 97.8%
    [Figure: example variant graphs for mate pair and Hi-C]
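The share of SNPs in the largest connected component can be computed with a union-find pass over the variant graph. This is a minimal sketch; the function name and the toy graph are illustrative assumptions, not the code or data behind the percentages on the slide.

```python
from collections import Counter


def largest_component_fraction(nodes, edges):
    """Fraction of nodes that fall in the largest connected component."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("graph has no nodes")
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(node) for node in nodes)
    return max(sizes.values()) / len(nodes)


# Hypothetical toy variant graph with 10 SNPs along a genome.
snps = range(10)
short_range_links = [(0, 1), (1, 2), (4, 5)]       # nearby SNPs only
long_range_links = [(2, 7), (5, 9), (7, 4)]        # distant SNPs
short_only = largest_component_fraction(snps, short_range_links)
with_long = largest_component_fraction(snps, short_range_links + long_range_links)
```

In the toy graph, adding a few long-range links raises the largest component from 30% to 70% of the SNPs, the same qualitative effect the slide reports for Hi-C relative to mate-pair libraries.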
  • 13. Resolving strain differences: Hi-C versus mate-pair
    [Figure: Distance on genome versus path length in variant graph]
    Mate pair graph distances scale linearly. Hi-C graphs are scale-free.
    Estimation error in probability of variant linkage grows with path length!
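One simple way to see why error grows with path length: assume, purely for illustration, that every edge on a path between two variants independently reports the wrong linkage with the same probability e. The inferred linkage between the endpoints is then wrong whenever an odd number of edges are wrong, which happens with probability (1 - (1 - 2e)^k) / 2 for a path of k edges. The independence assumption and the function name below are not from the talk.

```python
def linkage_error(per_edge_error, path_length):
    """Probability that linkage inferred along a path is wrong.

    Assumes each of the `path_length` edges is wrong independently with
    probability `per_edge_error`; the end-to-end call is wrong when an
    odd number of edges are wrong.
    """
    if not 0.0 <= per_edge_error <= 1.0:
        raise ValueError("per_edge_error must be a probability")
    if path_length < 0:
        raise ValueError("path_length must be non-negative")
    return 0.5 * (1.0 - (1.0 - 2.0 * per_edge_error) ** path_length)


# Error for a hypothetical 1% per-edge error over increasingly long paths.
errors_by_length = [linkage_error(0.01, k) for k in (1, 5, 10, 50)]
```

Under this model the error approaches 0.5 as paths get long, so a graph that connects distant variants through short paths keeps the linkage estimate more reliable than one that needs many hops.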
  • 14. Opportunities to collaborate 1. Species clustering of assemblies 2. Resolving strain haplotypes 3. Application to microbial communities! 4. Improvement of the Hi-C protocol 5. Population metagenomics – track gene flow of, e.g. antibiotic resistance plasmids 6. Your idea (and some grant funding) here
  • 15. Thanks to… Lutz Froenicke Richard Michelmore Ian Korf Matthew DeMaere Christopher Beitel Jonathan Eisen UC Davis UC Davis Jenna Lang UC Davis Catherine Burke Manuscript preprint online at https://peerj.com/preprints/260/ twitter: @koadman
