Sequencing, Genome Assembly and the SGN Platform

This talk was presented at IASRI Pusa on June 13th, 2014.

Centre for Agricultural Bioinformatics
Indian Agricultural Statistics Research Institute
Library Avenue, Pusa, New Delhi - 110012 (INDIA)
http://cabgrid.res.in/cabin/

Published in: Science, Technology, Business

Transcript of "Sequencing, Genome Assembly and the SGN Platform"

  1. Surya Saha, Ph.D. Cornell University & Boyce Thompson Institute suryasaha@cornell.edu @SahaSurya Centre for Agricultural Bioinformatics, Pusa, New Delhi, June 13, 2014 Slides: http://bit.ly/CABin_Pusa_2014 http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die Genome Assembly Jason Chin http://www.bit.ly/SZPKIG
  2. 6/15/2014 Centre for Agricultural Bioinformatics, Pusa. You are free to: copy, share, adapt, or re-mix; photograph, film, or broadcast; blog, live-blog, or post video of this presentation. Provided that: you attribute the work to its author and respect the rights and licenses associated with its components. Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only: ccZero. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
  3. Sequencing
  4. Sequencing over the Ages (timeline). 1953: DNA structure discovery. 1977: Sanger DNA sequencing by chain-terminating inhibitors. 1984: Epstein-Barr virus (170 Kb). 1987: ABI 370 sequencer. 1995: Haemophilus influenzae (1.83 Mb). 2001: Homo sapiens (3.0 Gb). 2005 to 2014, "The Next Generation": 454, Solexa, SOLiD, Illumina, Ion Torrent, PacBio, Illumina HiSeq X, MinION. 2014: Pinus taeda (24 Gb). Slide credit: Aureliano Bombarely
  5. It's all about the $£€¥ http://www.genome.gov/sequencingcosts/
  6. First generation sequencing
  7. Sanger method. Frederick Sanger, 13 Aug 1918 – 19 Nov 2013. Won the Nobel Prize for Chemistry in 1958 and 1980. Published the dideoxy chain termination method, or "Sanger method", in 1977. http://dailym.ai/1f1XeTB
  8. Sanger method http://bit.ly/1g6Cudq http://bit.ly/1lcQO4J
  9. First generation sequencing • Very high quality sequences (99.999%) • Very low throughput
     Capillary Sequencing (ABI3730xl): run time 20m-3h | read length 400-900 bp | reads/run 96 or 384 | total nucleotides sequenced 1.9-84 Kb | cost/Mb $2400
     http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  10. Name the specific technology used to generate the data: – Illumina Hiseq/Miseq/NextSeq – Pacific Biosciences RS I/RS II – Ion Torrent Proton/PGM – SOLiD – 454 http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
  11. 454 Pyrosequencing: one purified DNA fragment, to one bead, to one read. http://bit.ly/1ehwxWN GS FLX Titanium http://bit.ly/1ehAcEh
  12. Illumina (one column per instrument; source: Illumina)
      Output | 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
      Number of reads | 25 million | 400 million | 4 billion | 6 billion
      Read length | 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
      Cost | $99K | $250K | $740K | $10M
      $1000 human genome??
  13. Illumina http://1.usa.gov/1fP9ybl
  14. Illumina: Moleculo http://bit.ly/1aEPOBn
  15. Pacific Biosciences SMRT sequencing: Single Molecule Real Time sequencing http://bit.ly/1naxgTe
  16. Pacific Biosciences SMRT sequencing: error correction methods. Hierarchical genome-assembly process (HGAP); PBJelly (English et al., PLoS ONE, 2012)
  17. Pacific Biosciences SMRT sequencing: read lengths
  18. Oxford Nanopore https://www.nanoporetech.com/ • No data yet?? • Error model http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
  19. Others • Ion Torrent Proton/PGM • Nabsys • SOLiD
  20. Comparison
  21. Next generation sequencing (columns: run time | read length | quality | total nucleotides sequenced | cost/Mb)
      454 Pyrosequencing | 24h | 700 bp | Q20-Q30 | 0.7 Gb | $10
      Illumina MiSeq | 27h | 2x250 bp | >Q30 | 15 Gb | $0.15
      Illumina HiSeq 2500 | 11 days | 2x125 bp | >Q30 | 1000 Gb | $0.05
      Ion Torrent | 2h | 400 bp | >Q20 | 50 Mb-1 Gb | $1
      Pacific Biosciences | 2h | 10-20 kb | >Q30 consensus, >Q10 single | 400-800 Mb/SMRT cell | $0.33-$1
      http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
  22. http://omicsmaps.com/ Next Generation Genomics: World Map of High-throughput Sequencers
  23. http://bit.ly/18pfUId
  24. Real cost of Sequencing!! Sboner et al., Genome Biology, 2011
  25. Library Types: Single end; Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2); Mate pair (MP, 2 Kb to 20 Kb). Diagram labels: read orientations F and R; platforms 454/Roche and Illumina. Slide credit: Aureliano Bombarely
  26. Implications of Choice of Library. Diagram: reads are assembled into a consensus sequence (contig); paired-read information joins contigs into a scaffold (or supercontig), with gaps shown as NNNNN; genetic information (markers) places scaffolds into a pseudomolecule (or ultracontig). Slide credit: Aureliano Bombarely
  27. Quality control: Encoding http://bit.ly/N28yUd The Phred score of a base is Q_phred = -10 * log10(e), where e is the estimated probability of the base being incorrect
  28. Genome Assembly
  29. Whole Genome Shotgun Sequencing. Slide credit: cbcb.umd.edu
  30. Genome Sequencing Strategies. Slide credit: Aureliano Bombarely
  31. Genome Sequencing Strategies: International Human Genome Sequencing Consortium 2001; Overlap Layout Consensus. http://contig.wordpress.com/ cbcb.umd.edu
  32. De Bruijn Graph
  33. Ingredients for a Good Assembly. Slide credit: Mike Schatz
  34. (image only)
  35. Bird Snake
  36. The BEST?? Genome Assembler for YOU is one that: • you have the expertise to install and run • you have suitable infrastructure (CPU & RAM) to run • you have sufficient time to run • is designed to work with the specific mix of NGS data that you have generated • best addresses what you want to get out of a genome assembly (bigger overall assembly, more genes, most accuracy, longer scaffolds, most resolution of haplotypes, most tolerant of repeats, etc.) http://haldanessieve.org/2013/01/28/our-paper-making-pizzas-and-genome-assemblies/
  37. (image only)
  38. Which technology to use?? • Microbial genomes • Eukaryotic genomes • Resequencing genomes • RNAseq and other XXXseq methods http://bit.ly/1ko9Kgh
  39. SOL Genomics Network
  40. (image only)
  41. The SGN Team!! Surya Saha, Tom Fisher-York, Hartmut Foerster, Suzy Strickler, Jeremy Edwards, Noe Fernandez, Naama Menda, Aure Bombarely, Aimin Yan, Isaak Tecle
  42. SGN Website http://solgenomics.net
  43. Main web page (front page): WEB ICONS, TOOL BAR
  44. Main web page (front page): TOOL BAR (MENUS)
  45. But the data can also be edited: Locus, Locus Editor, Data. Community Data Curation
  46. You need: • An SGN account • Submitter / Locus Editor privileges activated by an SGN curator
  47. Tools
  48. Genome Browser: GBrowse
  49. Genome Browser: JBrowse
  50. (image only)
  51. CassavaBase http://cassavabase.org/ Slide credit: Jeremy Edwards
  52. NextGen Cassava Project ● Project: Adapt SGN database for Cassava Breeding ● Goal: Apply Genomic Selection to cassava breeding ● Predict breeding values from genotype information ● Shorten the breeding cycle ● Massive amounts of genotypic data (GBS) ● Phenotypic data ● Data management challenge ● Improve flowering ● http://nextgencassava.org Slide credit: Jeremy Edwards
  53. SGN/Cassavabase behind the scenes ● Perl/Catalyst MVC Framework ● PostgreSQL Database ● Generic Model Organism Database (GMOD) – Chado relational database schema – GBrowse – JBrowse ● R – Experimental design – QTL mapping – Genomic selection Slide credit: Jeremy Edwards
  54. Objectives: Provide cassava breeders and researchers access to data and tools in a centralized, user-friendly and reliable database. – Improve partner breeding program information tracking – Streamline management of genotypic and phenotypic data – Pipeline genotypic and phenotypic data through Genomic Selection prediction analyses Slide credit: Jeremy Edwards
  55. Genomic Selection. The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBV) for these lines. (From Heffner et al. 2009, Crop Sci. 49:1–12.) Information from a majority of lines in the breeding population (the training set) is used to create the prediction model. The model is then used to predict the phenotypes of the remaining lines (the validation set), using genotypic information only. The results from the model are compared to the actual data to give the prediction accuracy. (Image courtesy of Martha Hamblin, Cornell University.) Flow diagram of a genomic selection breeding program: breeding cycle time is shortened by removing phenotypic evaluation of lines before selection as parents for the next cycle. (From Heffner et al. 2009, Crop Sci. 49:1–12.) Slide credit: Jeremy Edwards
  56. Data collection in the field ● Android tablets ● Field book app – Jesse Poland's group at USDA-ARS / Kansas State University Slide credit: Jeremy Edwards
  57. Cassava Trait Ontology (Kulakow et al. 2011) ● Standard terminology ● Facilitate the sharing of information ● Allow users to query keywords related to traits Slide credit: Jeremy Edwards
  58. Position available at Solgenomics, Cassavabase project: Plant Breeding + Bioinformatician ● Familiar with breeding ● Programming in Perl, R, SQL, Hadoop ● Linux ● Africa ● Genius http://www.cassavabase.org/forum/posts.pl?topic_id=9
  59. Thank you!! Questions??
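
As a small illustration of the Phred formula on slide 27, the sketch below converts between error probability and quality score and decodes a FASTQ-style quality string. The function names are hypothetical, and the default ASCII offset of 33 is an assumption (Sanger-style encoding; some older Illumina pipelines used an offset of 64), so check which encoding your own data uses.

```python
import math
from typing import List


def phred_from_error(e: float) -> float:
    """Return the Phred quality Q = -10 * log10(e) for error probability e."""
    if not 0.0 < e <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(e)


def error_from_phred(q: float) -> float:
    """Invert the Phred formula: e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def decode_quality_string(quals: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores.

    The offset is an assumption about the file's encoding (33 or 64).
    """
    return [ord(ch) - offset for ch in quals]


if __name__ == "__main__":
    # Q30 corresponds to an estimated 1 error in 1000 base calls.
    print(round(phred_from_error(0.001)))
    print(error_from_phred(20))
    print(decode_quality_string("I5!"))
```

Decoding first and then applying `error_from_phred` per base is one simple way to filter or trim reads by expected error.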
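
The training/validation workflow described on slide 55 can be sketched in a few lines. This is only a minimal illustration, not the pipeline used by Cassavabase: the data are simulated, ridge regression on marker genotypes (coded 0/1/2, no intercept) stands in for whatever GS model a breeding program actually uses, and all function names and parameter values are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_ridge(genotypes: Matrix, phenotypes: List[float], lam: float) -> List[float]:
    """Estimate marker effects by solving (X'X + lam * I) beta = X'y."""
    m = len(genotypes[0])
    xtx = [[sum(row[i] * row[j] for row in genotypes) + (lam if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    xty = [sum(row[i] * y for row, y in zip(genotypes, phenotypes)) for i in range(m)]
    return solve_linear_system(xtx, xty)


def predict(genotypes: Matrix, effects: List[float]) -> List[float]:
    """Breeding value estimates: sum of genotype code times marker effect."""
    return [sum(g * e for g, e in zip(row, effects)) for row in genotypes]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation, used here as the prediction accuracy."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def simulate_accuracy(n_lines: int = 50, n_markers: int = 5,
                      noise_sd: float = 0.5, lam: float = 1.0,
                      seed: int = 1) -> float:
    """Simulate lines, train on 80 percent, validate on the remaining 20 percent."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 1) for _ in range(n_markers)]
    genotypes = [[float(rng.choice((0, 1, 2))) for _ in range(n_markers)]
                 for _ in range(n_lines)]
    phenotypes = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0, noise_sd)
                  for row in genotypes]
    split = int(0.8 * n_lines)
    effects = fit_ridge(genotypes[:split], phenotypes[:split], lam)  # training set
    gebv = predict(genotypes[split:], effects)  # validation set, genotypes only
    return pearson(gebv, phenotypes[split:])


if __name__ == "__main__":
    print("prediction accuracy:", simulate_accuracy())
```

The validation lines contribute only genotypes to the prediction; their phenotypes are used solely to score it, which mirrors the comparison step the slide describes.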