Genomics 2011 lecture 2

C. elegans genome project. The development of the original genome sequencing strategy.


Genomics 2011 lecture 2 Presentation Transcript

  • 1. C. elegans Genetics. C. elegans has two sexes: self-fertilizing hermaphrodites and males. Sex is determined chromosomally: XX is hermaphrodite, a single X (XO) is male. Diploid for 5 autosomes. Standard classical genetic techniques can be applied. Life cycle: zygote to adult in ~3 days. They grow on Petri dishes, where they eat bacteria. They can be stored frozen in liquid nitrogen indefinitely. Why might the hermaphrodite sex be useful for genetics?
  • 2. Chromosome I. Genetic mapping. m.u. = map unit. Genetic mapping relies on recombination: 1 m.u. is 1% recombination per meiosis. [Figure: genetic map of chromosome I, scale from -15 to 25 m.u., divided into left arm, central cluster and right arm. Markers, in the order they appear along the map: bli-3, egl-30, mab-20, fog-1, unc-73, unc-57, dpy-5, dpy-14, fer-1, lin-11, unc-29, unc-75, unc-101, glp-4, unc-54. Inset: parental versus recombinant chromosomes for fog-1 and glp-4.]
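The definition on this slide (1 m.u. = 1% recombination per meiosis) can be turned into a one-line calculation. The sketch below is only an illustration: the function name and the progeny counts are hypothetical, and it assumes recombinant and parental chromosomes can be counted directly and that the interval is short enough for double crossovers to be rare.

```python
def map_distance_mu(recombinants: int, total: int) -> float:
    """Map distance in map units: the percentage of scored chromosomes
    that are recombinant for the two markers."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must lie between 0 and total")
    return 100.0 * recombinants / total


# Hypothetical cross: 30 recombinant chromosomes out of 1000 scored.
print(map_distance_mu(30, 1000))
```

With these made-up counts the two markers would sit about 3 m.u. apart.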
  • 3. We want to understand how life works at the molecular level. We had mutant genes with informative phenotypes. The mutated genes were mapped onto linkage groups (chromosomes). What kinds of proteins do these genes encode, and how do these proteins function? In 1983, identifying the molecular sequence of a gene defined by mutation was a complicated and time-consuming business, even in the worm. If we only knew the sequence of the genome!
  • 4. As the term applies to recombinant DNA, what is a clone? Starting with DNA extracted from any organism, how can you get one single fragment into a vector and grow billions of copies of that single “cloned” molecule? [Figure labels: vector; cloned DNA insert.]
  • 5. C. elegans Genome Project. Identify DNA sequences corresponding to genes defined by mutation. [Figure: the levels linked by the project: mutants (function); genetic map (-15 to 25 m.u.; markers bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54); chromosomes; cloned DNA fragments; DNA sequence (AACGTTCCACG.......), genes and proteins.]
  • 6. If you wanted to clone sections of chromosomes for sequencing, how many copies of each chromosome would you start with? Of the order of millions: millions of copies of each chromosome. [Figure label: DNA.]
  • 7. Purified genomic DNA. Fragment the chromosomal DNA, either by restriction enzyme digestion or by mechanical shearing.
  • 8. Cloning methods used by the C. elegans genome project. Cosmid clones: ~40 kb insert size; genomic library. Cosmid cloning vector features: drug resistance marker, E. coli origin of replication, cos site, useful restriction sites. The linearised cosmid vector is combined with random fragments of genomic DNA (millions of them) and DNA ligase, giving long concatemers of cosmid vectors interspersed with random fragments of genomic DNA.
  • 9. [Figure labels: mixed population of “inserts”; in vitro lambda packaging extracts (lambda terminase, other phage proteins); cos sites in cosmid vector; E. coli.] Critical step: the phage “transfects” a single cosmid into an E. coli cell.
  • 10. CLONING. Cells are plated onto solid medium with antibiotic selection and grown up to form bacterial colonies. Each colony is derived from a single transfected cell, so each colony is a clonal population: this is a clone. An E. coli clonal population carries a single cosmid clone, i.e. a single genomic DNA fragment (insert X): billions of copies of one cloned insert. From solid medium on plates to liquid culture: freeze it for storage, purify cosmid DNA, sequence the insert, sub-clone fragments, etc.
  • 11. We started with many millions of different fragments of chromosomal DNA in one tube. We end up with potentially millions of CLONED fragments, each in a different E. coli colony or culture.
  • 12. We have got as far as random cloned fragments of genomic DNA. What next? Average cosmid insert size: 40 kb. C. elegans genome: ~100.3 Mb = 100,300 kb. 100,300/40 = 2,507.5, i.e. ~2,500 cosmid clones could contain the entire C. elegans genome, but WOULD they?
  • 13. In principle, 2500 cosmid clones could contain all the DNA of the C. elegans genome. Why not just start sequencing ~2500 clones picked at random? Imagine this: I give you a large and awkwardly shaped die with 2500 faces, with a single number on each face, the numbers 1-2500. Roll the die and write down the number on top. Repeat this again and again and again. How many times would you have to roll the die so that every face would have been on top at least once? ~4 x 2500 rolls gives ~95% probability of any one side, or DNA fragment, appearing. ~10 x 2500 raises the probability to ~99%.
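The die analogy can be checked numerically. The sketch below assumes an idealised library in which every one of 2500 equal-sized fragments is equally likely to be picked on each draw; the function names are illustrative and not taken from the lecture. Under that idealised model the per-fragment probabilities come out somewhat higher than the slide's round figures (about 98% at 4 x 2500). Real libraries are not perfectly random, so the slide's numbers can be read as conservative.

```python
import math


def prob_fragment_sampled(n_clones: int, library_size: int) -> float:
    """Probability that one given fragment (or die face) turns up at least
    once in n_clones independent, uniform random picks."""
    return 1.0 - (1.0 - 1.0 / library_size) ** n_clones


def clones_needed(p: float, library_size: int) -> int:
    """Smallest number of picks giving probability >= p for one given
    fragment, from N = ln(1 - p) / ln(1 - 1/library_size)."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - 1.0 / library_size))


def expected_rolls_to_see_all(library_size: int) -> float:
    """Expected number of rolls until every face has appeared at least once
    (the coupon collector result: G times the G-th harmonic number)."""
    return library_size * sum(1.0 / k for k in range(1, library_size + 1))


G = 2500  # number of equal-sized fragments, i.e. faces on the die
for fold in (1, 4, 10):
    print(f"{fold}x: P(given fragment seen) = {prob_fragment_sampled(fold * G, G):.4f}")
print("picks for 95% / 99%:", clones_needed(0.95, G), clones_needed(0.99, G))
print("expected rolls to see every face:", round(expected_rolls_to_see_all(G)))
```

The last line addresses the question as the slide literally poses it (every face at least once), which is a stricter requirement than any one face appearing.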
  • 14. The Golden Path. What if you could identify clones that overlapped slightly with one another? How can we get these clones? Cloned DNA fragments with moderate overlaps. With this approach you could sequence the entire genome by sequencing fewer than 5000 cosmid clones (2 x 2500).
  • 15. Cosmid fingerprinting. 1. Restriction digest of cosmid DNA. 2. Separate fragments according to size by gel electrophoresis. 3. Digitise the ladder of different-sized DNA fragments obtained. Multiple common fragments: the clones probably overlap. In the C. elegans genome project, ~17,000 random cosmid clones were fingerprinted and assembled into “contigs” (sets of overlapping clones), giving ~700 contigs. For comparison, the C. elegans genome is ~100 Mb, or ~2,500 cosmid clones at 1x. [Figure: clones A, B, C and D aligned as a contig.]
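The fingerprint comparison can be sketched as counting restriction fragments of matching size between two clones and grouping clones that share enough of them. This is a toy illustration only: the size tolerance, the threshold of four shared bands, the clone names and the fragment sizes are all hypothetical, and the project's real matching criteria are not given in these slides.

```python
def shared_bands(fp_a, fp_b, tolerance=0.01):
    """Count fragment sizes in fp_a that match a size in fp_b to within a
    relative tolerance. Each band is used at most once."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def probably_overlap(fp_a, fp_b, min_shared=4, tolerance=0.01):
    """Call two clones overlapping if they share at least min_shared bands."""
    return shared_bands(fp_a, fp_b, tolerance) >= min_shared


def build_contigs(fingerprints, min_shared=4):
    """Group clone names into contigs: connected sets of overlapping clones."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for idx, n1 in enumerate(names):
        for n2 in names[idx + 1:]:
            if probably_overlap(fingerprints[n1], fingerprints[n2], min_shared):
                parent[find(n1)] = find(n2)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical fingerprints: restriction fragment sizes in bp.
library = {
    "A": [310, 450, 620, 880, 1200, 1750, 2400],
    "B": [620, 880, 1200, 1750, 2400, 3100, 4100],
    "C": [150, 275, 505, 990, 1400, 2900],
}
print(build_contigs(library))
```

Here clones A and B share five bands and fall into one contig, while C shares none and stays on its own.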
  • 16. 700 contigs. What is the minimum number of contigs the C. elegans genome could be contained in? Or: how would we know when we had succeeded in joining all the contigs? A method of filling the gaps, joining the contigs, was needed.
  • 17. YACs: Yeast Artificial Chromosomes. DNA inserts of ~100 kb to 2 Mb. Grown in yeast. Clonal growth of yeast colonies, much like cosmids in E. coli. YAC DNA is separated by pulsed-field gel electrophoresis. The C. elegans genome is ~100 Mb. Cosmid clones: approximately 40 kb inserts. YAC clones: select for an average of 500 kb inserts. ~2500 cosmid clones would permit 1x coverage of the genome. ~200 YAC clones would permit 1x coverage of the genome.
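The 1x coverage figures on this slide follow from dividing genome size by average insert size. A minimal sketch, with a hypothetical function name and the sizes quoted in the slides:

```python
import math


def clones_for_1x(genome_kb: float, insert_kb: float) -> int:
    """Minimum number of non-overlapping clones whose inserts add up to the
    genome size (1x coverage), rounded up to a whole clone."""
    return math.ceil(genome_kb / insert_kb)


GENOME_KB = 100_000  # ~100 Mb, as quoted on this slide

print("cosmids (40 kb inserts):", clones_for_1x(GENOME_KB, 40))
print("YACs (500 kb inserts):", clones_for_1x(GENOME_KB, 500))
print("cosmids, using 100.3 Mb:", clones_for_1x(100_300, 40))
```

These are lower bounds for perfectly tiled, non-overlapping clones; randomly picked clones need many more, as the die example on slide 13 shows.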
  • 18. ~17,000 fingerprinted cosmid clones, ~700 unlinked contigs. [Figure: cosmid clone contigs with unknown (“?”) positions relative to the 6 chromosomes, the DNA sequence (AACGTTCCACG.......) and the genetic map (-15 to 25 m.u.; markers bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54).]
  • 19. Joining up the contigs. The ~700 contigs are represented as grids of representative cosmid clones. Steps: select large YAC clones (> 1 Mb); purify YAC DNA (PFGE); radio-label the YAC DNA; hybridise to the cosmid grid; expose to X-ray film. [Figure: a YAC clone spanning contig X and contig Y identifies linked cosmid clones.]
  • 20. A physical map of the genome, the “Golden Path”: chromosomes represented in ordered overlapping clones or “clone contigs”. [Figure: genetic map (-15 to 25 m.u.; markers bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54) aligned with YACs, cosmids and the sequence of the genome.]
  • 21. Sequencing the C. elegans Genome. Individual cosmid clone: randomly fragmented and shotgun cloned into sequencing vectors. Generally a smaller insert size is best for primary sequence determination: 2-10 kb. Sequence of cosmid, YAC, etc. determined and compiled in silico. Finishing: directed cloning to fill in any gaps. Check for overlap of sequence with overlapping cosmids.
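Compiling shotgun reads in silico means finding reads whose ends overlap and merging them. The sketch below is a toy greedy overlap merger, not the software the project used (which these slides do not name). The reads and target sequence are invented, the minimum overlap is arbitrary, and the approach assumes error-free reads from one strand with no repeats.

```python
def overlap_len(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of
    `right`, provided it is at least min_len long; otherwise 0."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest end overlap
    until no pair overlaps by at least min_len bases."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap_len(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads form separate contigs (gaps to finish)
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Invented example: three overlapping reads from a 25-base target.
target = "ATGGCGTACCTGAAGTCCGATTAGC"
shotgun_reads = ["GTCCGATTAGC", "ATGGCGTACCTG", "ACCTGAAGTCCG"]
print(greedy_assemble(shotgun_reads))
```

When some region is missing from the reads, the function returns more than one sequence, which corresponds to the gaps that the finishing step on this slide is meant to close.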
  • 22. Gaps between cosmid contigs: ~20% of genome. Most of these gaps were not random. They contained regions that could not be cloned in cosmids. YAC clones covered most of the gaps. YAC DNA was shotgun cloned into M13 or plasmid vectors. Most of the DNA contained in these awkward regions was successfully sub-cloned into small insert size vectors and sequenced. The sequence as published in December 1998 was generated from: 2527 cosmids, 257 YACs, 113 fosmids, 44 PCR products.
  • 23. C. elegans cosmid K06A5, 24323 bp.Flat sequence file –3955 bp shown.>CEK06A5acaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcttctctcctcgttctctgctcacaactcgtctatcactcatatcacatttatttcccaatatcattttaacaacatcttccgatgcatgttcgtcaatattgcgcaaccactttgcaatattgtcaaaacttttcgcatttgtgatatcgtaaaccagcataattcccattgctccgcggtaatatgatgttgtgattgtgtggaatcgttcttgtccagctgtgtcccagatttgtaatttaatcttttttccttttaattcgatagttttaattttgaagtcgattcctgaatgaaaaaagaaaattattttgaaatcactagattctgaataaaaactaaccaatagttgagatgaatgtggtgttaaaggcatcatccgaaaatctgtacagaatgcaagtttttccaactcctgagtcgcctattagcagcaatttgaagagcatgtcatacggtcggcgagccatttttcttctgaaatgagaaaaagttgagaactaaagttgcacaaaagtaagagaaaagcacttgagtcatggcaaatagaacgaacactttgagatttcgaagaagttatcaagagttgacaattggaagatatttggaagaactttctaatttttttctagttttccaaaattaggtttttgtcataaaatgttgtcaaagaaaaaacaggacaaaatagttaattgttgtttccattataacaaaaaaaaatttgaacggagctattaacgcgtgcatgcgcaaatcacatcgattagctgtttctgggaaattctcgggaaaaggtgaacagcagctgctggcttcctctgcgggtcacgaaaacacaaagagatcattataattgttatttggaaaggaagcgaatctaaaacgggtacaggtggacgtttattgatcgaaagtgctttttatttgaaattgaatggtgaactttgcaattttgtaatgcaaagtacgttatcagatggcatgagatgtgtgaagtgataaggaataaaatgtgaacgacatgttcaagaaactgtgatttttcaataatttgtgatgaaatattttaggaacagaaatgaacatattaattgatataaaaacaataggaacactaactcataattatgataggtgaatatcaaaatgtgctagattttttgaagttaaaaaatacatttctaatattttttcaaataataagtttcagctgaaatttcagggtgatttcagaaagctatgttttgataaattgttttgaaaattaaaagaagctacagcaaaaaaaaattaaagagaacatcgctccctcgtagtgtataatttttgattatcgaaaaaaatgagtcaatgatgaaaaggaagtcgcaatctcaaaacttcaaaaatcaaaagaagccgttgcctctgtcatcaaaaattcagaagacaaggttgttgacaagggtcaattctcagtggtggagggcattgggcgtggtgaaatttttgaaggctagtgtggttggacctctactagatagacaaaacccccgaaatagacgtttaatttgatgagatggtggagaaagaaaaggactcattctctagatgatagagagaccagagatacagacaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgc
gttgttggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcatgtgtttttatgtttccggtgggagaaggttcaacaaaaaatgaaaagaaaaagttcaagcggcatgaatcattctgagtttaaaacaaaattattgcgaaaattaatattaaaaccttttcacaaaacttcaagctaatctgttcatgaaaatttgaataatagttttttcccacctatttagaattaacttcatattaacgaaattaattaacgaatcgaaaattatgacttttcagaatcatctgaagttttttcacattccatgctgcatggaataatttgatcctggaatcgatatgtttttatggtatactttttaaccttcaatttagctggaaaagtatggaataaataattcccgaagctatgtacatatatgtagaattattgaatgattgtgagaacaacttgactttagcttgagtaggaatcggaatggctatcgaccgatcaacacttaggattgtaagaatggcagtaagaatatattgaagaaagaatgtttgttcataggaagagaaagagtattgcgaaatcatcatcgcccactttagaatggacgggcggtgagcggacatagagaattgtgaatgactaatgcttttgcagaatctagggcaaaatcgtaggaacaaacaattgtaatacggagaaaacaatcatatcgatcgatgatcatggagaaaaatgtgatttaagtgagtagacttggaaaaattaataaaagcatgaattgtcgatatttttcatttattttcattataaagctctttaaaaacaaattaaatattgagaatggcttcgaagaatattgtttcaaatatgttcaatggtgacaccttgcggataaaattaatgtaaaaatcatggaacacagattcactgatatctcattatctcaagcagtgtaattagagattttttggaacaattattttataaaactataaataaaccgtttatactactcaaagccaaatattcaagctattaccattttttttctaactaattcttgagcaattaaagtattccccagtttttattttgcaacgactccaggcaaacacgctccgttgcacttgccgccaaggcgttgcattcaaatcagagagacatctcattccgatttctgtttttcttccaataaacggtattttatgcctaatgggtgatacggaaattgttcctcttcgagtacaaaatgtacttgatagcgaaatcattcgtctcaacttgtggtccatgaaggtaactgtctagtttttttaagttttcatgatttcaatatttttacagtttaacgcgaccagtttcaaactcgaaggttttgtgagaaatgaagaaggcactatgatgcagaaagtttgttccgaatttatttgtgtaagtcgagaaacatattcgtcaacaattttcattaaatattcagagacgcttcacttctacgttgcttttcgatgtttccggacgtttcttcgacttggtcggacagattgatcgggaatatcaacaaaaaatgggaatgcctagtagaattattgatgaattttcaaatggaattcctgaaaattgggccgaccttatctattcctgcatgtcagccaaccaaagaagcgcacttcgccctatccaacaggctccaaaagaaccaattagaactagaacagaaccaattgttacgttggcagatgaaaccgagctaactggaggatgccagaaaaattccgaaaacgagaaagaaaggaacagacgtgagcgtgaagaacagcaaacaaaggaacgtgagagaagattagaagaagaaaaacaacgacgagatgctgaagctgaggctgaaagaaggcgaaaagaagaggaagagctggaagaagctaattacacccttcgtgct
ccgaaatctcagaacggcgagccaatcactccgataaga
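The flat file above has a FASTA-style layout: a header line starting with ">" followed by the sequence. As a sketch of how such a file might be read and summarised, the code below parses FASTA-formatted text and reports sequence length and A+T fraction. The function names are illustrative, and the example holds only the first 60 bases of the K06A5 excerpt, split over two lines.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.lower())
    if header is not None:
        yield header, "".join(chunks)


def at_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (a, c, g, t) that are a or t."""
    seq = seq.lower()
    counted = sum(seq.count(base) for base in "acgt")
    return (seq.count("a") + seq.count("t")) / counted if counted else 0.0


# First 60 bases of the excerpt above, wrapped at 30 bases per line.
example = io.StringIO(
    ">CEK06A5\n"
    "acaagagagggcgcctcggccgtatgttga\n"
    "atgggagatcgatggaaccgagacaacgag\n"
)
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq), f"A+T fraction = {at_fraction(seq):.2f}")
```

To read a real file, the same function can be passed an open file object instead of the `io.StringIO` stand-in.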
  • 24. Genome sequence of C. elegans (Science, December 1998). Sequence of entire genome. Sequence of cDNA clones. Approximately 19,500 predicted protein-coding gene sequences. Large number of various kinds of functional RNAs, not discussed further. For this lecture, focus on predicted proteins. Gene prediction? How?
  • 25. Computer-based predictions: GENEFINDER. Biases in coding sequence: in C. elegans, non-coding sequence is AT rich. Splice site signals, initiator methionines, termination codons. Likely exons and probable/possible splice patterns. Evidence that a prediction is correct? Homology with genes in other organisms (homologues). Known protein families. Experimental evidence.
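Two of the signals listed on this slide, initiator methionines and termination codons, can be illustrated with a toy open reading frame scan. This is not GENEFINDER and does not reproduce its method: it ignores introns, splice sites, codon bias and the reverse strand, and the function name, minimum length and test sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Scan the three forward reading frames for ATG ... stop stretches.

    Returns (start, end, frame) tuples, where start is the index of the A
    in ATG, end is exclusive and includes the stop codon, and min_codons
    counts the coding codons before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame initiator methionine
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


# Invented sequence: ATG GCT AAA GGT TAA sits in frame 2.
print(find_orfs("CCATGGCTAAAGGTTAACC"))
```

A real gene finder combines many such weak signals statistically, which is why the slide goes on to ask what independent evidence supports a prediction.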