Surya Saha
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
suryasaha@cornell.edu // Twitter:@SahaSurya
BTI Plant Bioinformatics Course 2017
http://www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die
Earth BioGenome Project (EBP)
3/28/2017 BTI Plant Bioinformatics Course 2017 2
• Complete genome of 1 representative from each eukaryotic family (9000)
• Low coverage sequencing of a species from each of the 150,000 to 200,000 genera
• Budget estimate $4.8 billion
Maybe better to sequence fewer species to higher quality and invest in interpretation?
http://omicsomics.blogspot.com/2017/02/earth-biogenome-project-ill-conceived.html
Sequencing over the Ages
[Timeline figure, 1953 to 2015: DNA structure discovery (1953); Sanger DNA sequencing by chain-terminating inhibitors (1977); Epstein-Barr virus genome, 170 kb (1984); ABI 370 sequencer (1987); Haemophilus influenzae genome, 1.83 Mb (1995); Homo sapiens genome, 3.0 Gb (2001); followed by the 454, Solexa, SOLiD, Ion Torrent, PacBio, Illumina HiSeq X, Oxford Nanopore MinION and 10X Genomics platforms and the Pinus taeda genome (24 Gb)]
Slide concept: Aureliano Bombarely
First generation sequencing
Sanger. Annu Rev Biochem. 1988;57:1-28.
Thanks to Nick Loman for the mention
Maxam-Gilbert method
Maxam-Gilbert method
http://en.wikipedia.org/wiki/File:Maxam-Gilbert_sequencing_en.svg
https://www.nationaldiagnostics.com/electrophoresis/article/maxam-gilbert-sequencing
Sanger method
Frederick Sanger
13 Aug 1918 – 19 Nov 2013
Won the Nobel Prize in Chemistry in 1958 and 1980. Published the dideoxy chain-termination method, or “Sanger method”, in 1977
http://dailym.ai/1f1XeTB
Sanger method
http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg
http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg
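The chain-termination idea can be sketched as a toy model: each ddNTP reaction produces fragments that stop wherever that base is incorporated, and ordering all fragments by length (as the gel or capillary does) spells out the newly synthesized strand. This assumes perfect termination and ignores the primer; the helper names and the template are hypothetical.

```python
def chain_termination_fragments(synthesized: str) -> dict[str, list[int]]:
    """For each ddNTP reaction (A, C, G, T), list the fragment lengths it yields.

    `synthesized` is the strand built by the polymerase; a fragment of
    length i ends with the dideoxy base synthesized[i - 1].
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(synthesized, start=1):
        lanes[base].append(i)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read the bands from shortest to longest fragment."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


COMPLEMENT = str.maketrans("ACGT", "TGCA")
template = "TACGGATC"                          # hypothetical template, 3'->5'
synthesized = template.translate(COMPLEMENT)   # new strand, 5'->3': ATGCCTAG
lanes = chain_termination_fragments(synthesized)
print(lanes)               # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_ladder(lanes))  # ATGCCTAG
```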
First generation sequencing
• Very high quality sequences (99.999% accuracy, or Q50)
• Very low throughput
Platform | Run time | Read length | Reads/run | Total nucleotides sequenced | Cost/Mb
Capillary sequencing (ABI3730xl) | 20 min-3 h | 400-900 bp | 96 or 384 | 1.9-84 kb | $2400
http://www.hindawi.com/journals/bmri/2012/251364/tab1/
Next generation sequencing
Name the specific technology used to generate the data:
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS I/RS II
– Ion Torrent Proton/PGM
– SOLiD
– Oxford Nanopore
http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2
454 Pyrosequencing
One purified DNA
fragment, to one bead, to
one read.
http://www.genengnews.com/
GS FLX Titanium
https://mariamuir.com/wp-content/uploads/2013/04/rip.gif
Illumina
Model | MiSeq | NextSeq 500/550 | HiSeq 2500/3000/4000 | HiSeq X
Output | 15 Gb | 120 Gb | 1500 Gb | 1800 Gb
Max number of reads/run | 25 million | 400 million | 5 billion | 6 billion
Max read length | 2x300 bp | 2x150 bp | 2x125 to 2x250 bp (RR mode) | 2x150 bp
Cost | $99K | $250K | $740K | $10M (10 units)
Source: Illumina
Illumina
Mardis 2008. Annu. Rev. Genomics Hum. Genet. 2008. 9:387–402
Pacific Biosciences SMRT sequencing
Single Molecule Real Time (SMRT) sequencing
http://smrt.med.cornell.edu/images/pacbio_library_prep-1.gif
RS II
Sequel
Pacific Biosciences SMRT sequencing
Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly (English et al., PLOS One. 2012)
Pacific Biosciences SMRT sequencing
Error correction methods
PBcR pipeline
Pacific Biosciences SMRT sequencing
Read Lengths
Oxford Nanopore
https://www.nanoporetech.com/
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
http://halegrafx.com/vector-art/free-vector-despicable-me-minions/
http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/
E. coli K-12 MG1655 on a standard
FLO-MIN106 (R9.4) flowcell
Next generation sequencing
Platform | Run time | Read length | Quality | Total nucleotides sequenced | Cost/Mb
454 Pyrosequencing | 24 h | 700 bp | Q20-Q30 | 1 Gb | $10
Illumina MiSeq | 27 h | 2x300 bp | >Q30 | 15 Gb | $0.15
Illumina HiSeq 2500 | 1-10 days | 2x250 bp | >Q30 | 3000 Gb | $0.05
Ion Torrent | 2 h | 400 bp | >Q20 | 50 Mb-1 Gb | $1
Pacific Biosciences | 30 min-4 h | 10 kb to >40 kb | >Q50 consensus, >Q10 single read | 500-1000 Mb per SMRT cell | $0.13-$0.60
http://www.hindawi.com/journals/bmri/2012/251364/
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431227
Note: Some figures might be out of date
Long range scaffolding
Hi-C Crosslinking
http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg
• Long read information from short reads using 14 bp barcodes
• Very low input DNA (as low as 0.625 ng)
• Short library preparation time
• 1 ng of DNA is split across 100,000 Gel bead-in-EMulsions (GEMs)
• Chromium instrument for single-cell RNA-seq
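The “long read information from short reads” bullet rests on one idea: reads that carry the same bead barcode came from the same partition, and therefore from a few long input molecules. A minimal sketch of grouping reads by barcode follows, assuming (hypothetically) that each read starts with its 14 bp barcode; real GemCode/Chromium data are processed with 10X Genomics' own software, and barcode placement depends on the chemistry.

```python
from collections import defaultdict

BARCODE_LEN = 14  # GemCode barcode length quoted on the slide


def group_by_barcode(reads, barcode_len=BARCODE_LEN):
    """Group reads by a barcode assumed to be the first `barcode_len` bases.

    Returns {barcode: [insert, ...]} with the barcode trimmed off; reads
    shorter than the barcode are skipped.
    """
    groups = defaultdict(list)
    for read in reads:
        if len(read) < barcode_len:
            continue
        groups[read[:barcode_len]].append(read[barcode_len:])
    return dict(groups)


reads = [
    "AAAACCCCGGGGTT" + "ACGTACGTAC",  # bead 1
    "TTTTGGGGCCCCAA" + "GGATCCGGAT",  # bead 2
    "AAAACCCCGGGGTT" + "TTGACCATGA",  # bead 1 again: same long molecule(s)
]
for barcode, inserts in group_by_barcode(reads).items():
    print(barcode, len(inserts))
```

Reads sharing a barcode can then be placed near each other in a scaffold, which is how short reads carry long-range linkage.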
GemCode
http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg
GemCode
http://www.nature.com/nbt/journal/v34/n3/full/nbt.3432.html
http://www.bionanogenomics.com/technology/why-genome-mapping/
Human MHC map
• Sample prep requires very high molecular weight DNA
• A nicking enzyme labels about 10 sites per 100 kb
• Individual molecules are assembled into optical maps
• Optical maps and sequences are merged in a hybrid assembly
http://www.bionanogenomics.com/technology/why-genome-mapping/
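Merging optical maps with sequence requires an in silico map of the sequence: the positions where the nicking enzyme would label it. The sketch below scans both strands for a recognition motif. It assumes the Nt.BspQI motif GCTCTTC; substitute the enzyme actually used, and treat the function names as hypothetical.

```python
import re

MOTIF = "GCTCTTC"  # assumed nicking motif (Nt.BspQI); replace with your enzyme's site


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def in_silico_map(seq: str, motif: str = MOTIF) -> list[int]:
    """Sorted 0-based start positions of the motif on either strand.

    These are motif starts, not exact nick positions, which sit at a fixed
    offset from the motif that depends on the enzyme.
    """
    seq = seq.upper()
    sites = set()
    for pattern in {motif, reverse_complement(motif)}:
        # zero-width lookahead so overlapping occurrences are also reported
        for match in re.finditer(f"(?={pattern})", seq):
            sites.add(match.start())
    return sorted(sites)


def label_intervals(sites: list[int], seq_len: int) -> list[int]:
    """Distances between consecutive labels (and to both ends): the pattern
    an optical map records and aligns."""
    edges = [0] + sites + [seq_len]
    return [b - a for a, b in zip(edges, edges[1:])]


seq = "AAGCTCTTCAAAAAGAAGAGCAA"  # toy sequence: one site on each strand
sites = in_silico_map(seq)
print(sites)                             # [2, 14]
print(label_intervals(sites, len(seq)))  # [2, 12, 9]
```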
Many others
• Ion Torrent Proton/PGM
• Dovetail
• Supporting technologies
– Nabsys
– OpGen
– Fluidigm
http://nextgenseek.com/2012/11/did-you-know-there-are-at-least-14-next-gen-sequence-technology-companies/
Real cost of Sequencing!!
Sboner, Genome Biology, 2011
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-8-125
So What Sequencer Do I Use??
Microbial genome
• Draft genome
– Illumina MiSeq (100-130X)
– Illumina HiSeq (<200X)
• Complete genome
– Pacific Biosciences (80-100X)
• Amplicons (16S, ITS)
– Illumina MiSeq
Eukaryotic genome
• De novo assembly
– Pacific Biosciences (70-80X)
– Illumina HiSeq (100X+)
– 10X Genomics
– Bionano
• Genotyping (GBS)
– Illumina HiSeq
• BACs
– Pacific Biosciences
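The coverage targets above translate into read counts through the usual relation coverage C = N × L / G (number of reads × read length / genome size). A minimal sketch, assuming a hypothetical 5 Mb microbial genome sequenced with MiSeq 2x300 bp pairs and ignoring reads lost to duplicates, adapters or quality trimming:

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth C = N * L / G, counting every sequenced base once."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest N such that N * L / G reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


genome = 5_000_000      # hypothetical 5 Mb microbial genome
pair_bases = 2 * 300    # bases in one MiSeq 2x300 bp read pair
pairs = reads_needed(100, pair_bases, genome)
print(pairs)                                          # 833334 read pairs for 100X
print(round(coverage(pairs, pair_bases, genome), 2))  # 100.0
```

In practice it is worth budgeting extra reads, since trimming and duplicate removal lower the usable coverage.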
$$$$ ????
The diploid reference genome
Cornell Sequencing Core
• Illumina HiSeq 2500 (Rapid run and High output)
• Illumina MiSeq
• Illumina NextSeq 500
• 10X Genomics GemCode
http://www.biotech.cornell.edu/brc/genomics/services/price-list#overlay-context=brc/genomics-facility/next-generation-sequencing
Library Types
Single end
Paired end (PE, 150-300 bp, Fwd: /1, Rev: /2)
Mate pair (MP, 2 kb to 20 kb)
[Figure: forward (F) and reverse (R) read orientation for each library type, with 454/Roche and Illumina variants]
Slide credit: Aureliano Bombarely
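The “Fwd: /1, Rev: /2” convention means mates share a read name and differ only in the suffix, so forward and reverse files must stay in the same order. A small sketch that checks this, assuming the older `/1` and `/2` naming; newer Illumina headers carry the read number in a comment after a space (e.g. `1:N:0:...`), which this sketch does not parse. Function names are hypothetical.

```python
def mate_name(read_id: str) -> tuple[str, str]:
    """Split an ID such as '@read42/1' into ('read42', '1')."""
    parts = read_id.lstrip("@").split()   # drop '@' and any comment after a space
    name = parts[0] if parts else ""
    if len(name) < 3 or name[-2] != "/" or name[-1] not in "12":
        raise ValueError(f"not a /1 or /2 read name: {read_id!r}")
    return name[:-2], name[-1]


def unpaired_positions(forward_ids: list[str], reverse_ids: list[str]) -> list[int]:
    """Indices where the forward and reverse files are out of sync."""
    if len(forward_ids) != len(reverse_ids):
        raise ValueError("forward and reverse files have different read counts")
    bad = []
    for i, (f, r) in enumerate(zip(forward_ids, reverse_ids)):
        (f_name, f_num), (r_name, r_num) = mate_name(f), mate_name(r)
        if f_name != r_name or (f_num, r_num) != ("1", "2"):
            bad.append(i)
    return bad


print(unpaired_positions(["@r1/1", "@r2/1"], ["@r1/2", "@r3/2"]))  # [1]
```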
Implications of Choice of Library
Slide credit: Aureliano Bombarely
[Figure: reads are assembled into consensus sequences (contigs); pair read information joins contigs into scaffolds (or supercontigs), with gaps filled by Ns; genetic information (markers) or optical maps join scaffolds into pseudomolecules (or ultracontigs)]
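The contig-to-scaffold step in the figure can be sketched in code: ordered, oriented contigs are joined by runs of N whose lengths stand for the gap sizes estimated from pair read information, and splitting a scaffold at runs of N recovers the contigs. The gap sizes and function names here are hypothetical.

```python
import re


def build_scaffold(contigs: list[str], gaps: list[int]) -> str:
    """Join ordered, oriented contigs with runs of N of the estimated gap sizes."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap estimate between consecutive contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append("N" * gap + contig)
    return "".join(parts)


def split_scaffold(scaffold: str, min_gap: int = 1) -> list[str]:
    """Break a scaffold back into contigs at runs of at least `min_gap` Ns."""
    return [c for c in re.split(f"N{{{min_gap},}}", scaffold.upper()) if c]


scaffold = build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 3])
print(scaffold)                  # ACGTNNNNNGGCCNNNTTAA
print(split_scaffold(scaffold))  # ['ACGT', 'GGCC', 'TTAA']
```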
Multiplexing Libraries
Different tags (4-6 nucleotides) identify different samples sequenced in the same lane/sector.
Slide credit: Aureliano Bombarely
[Figure: reads from two samples, tagged AGTCGT and TGAGCA, are pooled and sequenced together]
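Demultiplexing reverses the pooling: each read goes back to the sample whose tag matches its first bases. A sketch using the tags AGTCGT and TGAGCA from the slide, assuming the tag sits at the start of the read and tolerating one mismatch; on Illumina instruments the index is usually read separately and demultiplexed by the vendor's software. Sample names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, samples, max_mismatches=1):
    """Split pooled reads by the tag at the start of each read.

    `samples` maps tag -> sample name (all tags the same length). A read is
    assigned only if exactly one tag is within `max_mismatches`; otherwise it
    goes to 'undetermined'. The tag is trimmed from assigned reads.
    """
    tag_len = len(next(iter(samples)))
    out = {name: [] for name in samples.values()}
    out["undetermined"] = []
    for read in reads:
        prefix = read[:tag_len]
        hits = [tag for tag in samples
                if len(prefix) == tag_len and hamming(prefix, tag) <= max_mismatches]
        if len(hits) == 1:
            out[samples[hits[0]]].append(read[tag_len:])
        else:
            out["undetermined"].append(read)
    return out


samples = {"AGTCGT": "sample_1", "TGAGCA": "sample_2"}
pooled = ["AGTCGTACGTAC", "TGAGCATTTT", "AGTCGAGGGG", "CCCCCCAAAA"]
print(demultiplex(pooled, samples))
```

Tags that differ at several positions, like these two, keep a one-mismatch tolerance from assigning a read to the wrong sample.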
Data!!
Fasta files:
It is a text-based format for representing either nucleotide sequences or peptide
sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
File Formats
Slide credit: Aureliano Bombarely
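In a FASTA file each record starts with a header line beginning with “>”, followed by one or more sequence lines. A minimal reader sketch, a hypothetical helper rather than any library's API, that joins multi-line sequences:

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines.

    A record starts with a '>' line; its sequence may span several lines.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


text = ">seq1 example\nACGT\nGGCC\n>seq2\nMKV\n"
print(list(parse_fasta(text.splitlines())))
# [('seq1 example', 'ACGTGGCC'), ('seq2', 'MKV')]
```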
Fastq files:
FASTQ format is a text-based format for storing both a biological sequence (usually
nucleotide sequence) and its corresponding quality scores.
-Wikipedia
• A single ID line with the at symbol (“@”) in the first column.
• The sequence can span multiple lines after the ID line.
• A single line with the plus symbol (“+”) in the first column marks the start of the quality values.
• The “+” line may optionally repeat the ID.
• Quality values can span multiple lines after the “+” line, but their total length must equal the sequence length.
Slide credit: Aureliano Bombarely
File Formats
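The FASTQ rules above imply one subtlety: “@” is also a valid quality character, so a quality line can itself begin with “@”. A reader that follows the rules therefore counts quality characters until they match the sequence length instead of looking for the next “@”. A sketch of such a reader (a hypothetical helper, not a library API):

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) tuples from FASTQ text lines.

    Sequence lines are collected until the '+' line; quality lines are then
    read until their total length equals the sequence length.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected an '@' ID line, got {header!r}")
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                plus_id = line[1:]
                break
            seq_parts.append(line.strip())
        else:
            raise ValueError("record ended before the '+' line")
        seq = "".join(seq_parts)
        if plus_id and plus_id != header[1:]:
            raise ValueError("the '+' line repeats a different ID")
        qual = ""
        while len(qual) < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("record ended inside the quality lines")
            qual += line.strip()
        if len(qual) != len(seq):
            raise ValueError("quality length differs from sequence length")
        yield header[1:], seq, qual


records = ["@r1", "ACGT", "AC", "+", "IIII", "@I", "@r2", "GG", "+r2", "!!"]
for rec in parse_fastq(records):
    print(rec)
# ('r1', 'ACGTAC', 'IIII@I')
# ('r2', 'GG', '!!')
```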
Quality control: Encoding
Fastq files:
!"#$%&'()*+,-./0123456789 Offset by 33 (Phred+33)
KLMNOPQRSTUVWXYZ[\]^_`abcdefgh Offset by 64 (Phred+64)
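Decoding a quality string is a subtraction: Q = ASCII code − offset, with offset 33 for Phred+33 and 64 for Phred+64. When the encoding is unknown, the range of characters gives a hint. The sketch below uses a common heuristic; the thresholds are assumptions, and some tools emit Phred+33 scores above Q41, so treat the guess with care.

```python
from typing import Iterable, Optional


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Phred scores from a FASTQ quality string: Q = ASCII code - offset."""
    return [ord(c) - offset for c in qual]


def guess_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Heuristically guess the encoding offset (33 or 64) from quality strings.

    Codes below ';' (59) never occur in Phred+64 data, so they imply Phred+33.
    Codes above 'J' (74, Q41 in Phred+33) suggest Phred+64, although some
    tools emit higher Phred+33 values. Returns None when undecided.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


print(decode_quality("!+5?I"))      # [0, 10, 20, 30, 40]
print(decode_quality("@JT^h", 64))  # [0, 10, 20, 30, 40]
print(guess_offset(["IIII@I"]))     # None: could be either encoding
```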
Quality control: Encoding
http://en.wikipedia.org/wiki/Phred_quality_score
The Phred quality score of a base is:
Q_phred = -10 log10(e)
where e is the estimated probability that the base call is wrong
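Inverting the Phred formula gives e = 10^(−Q/10), so every 10 points of Q is a tenfold drop in error probability. A short sketch converting in both directions:

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability e = 10^(-Q/10), the inverse of Q = -10 log10(e)."""
    return 10 ** (-q / 10)


def error_to_phred(e: float) -> float:
    """Phred score Q = -10 log10(e) for an error probability 0 < e <= 1."""
    return -10 * math.log10(e)


for q in (10, 20, 30, 50):
    e = phred_to_error(q)
    print(f"Q{q}: error probability {e:g}, accuracy {100 * (1 - e):.3f}%")
```

The Q50 line reproduces the 99.999% accuracy quoted for first generation (Sanger) reads.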
Pre-processing: Tools
Trimming
• FastQC
• FASTX toolkit
• Trimmomatic
• Scythe
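Quality trimmers differ in detail, but a common idea is a sliding window: scan along the read and cut once the mean quality in a window falls below a threshold. The sketch is a simplified illustration of that idea (in the spirit of Trimmomatic's SLIDINGWINDOW step), not a reimplementation of any tool listed above; the window size and threshold are example values.

```python
def sliding_window_trim(seq, quals, window=4, min_avg=20):
    """Cut the read at the start of the first window whose mean quality < min_avg.

    `quals` are decoded Phred scores. Returns the trimmed (seq, quals);
    reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


seq = "ACGTACGTAC"
quals = [38, 38, 38, 38, 38, 30, 12, 10, 8, 2]  # typical 3' quality decay
print(sliding_window_trim(seq, quals))  # ('ACGTA', [38, 38, 38, 38, 38])
```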
Joining paired-end reads
• fastq-join
• FLASH
• PANDAseq
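Joining paired-end reads works when the fragment is shorter than the two reads combined: the end of read 1 overlaps the reverse complement of read 2. A simplified sketch of overlap-based merging follows; it ignores base qualities (keeping read 1's base on mismatches) and only handles fragments longer than each read, so it illustrates the idea rather than how fastq-join, FLASH or PANDAseq score overlaps. Parameters are example values.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10, max_mismatch_frac: float = 0.1):
    """Merge two mates into one fragment, or return None if no overlap fits.

    r2 is reverse-complemented onto r1's strand, then overlaps are tried from
    longest to shortest; the first with few enough mismatches is used.
    """
    r2rc = revcomp(r2)
    for overlap in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        a, b = r1[len(r1) - overlap:], r2rc[:overlap]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * overlap:
            return r1 + r2rc[overlap:]
    return None


fragment = "GATTACACGTTGCAGTCCATGGAACTTGCA"  # toy 30 bp insert
r1 = fragment[:20]                # read 1: first 20 bases
r2 = revcomp(fragment[10:])       # read 2: sequenced from the other strand
print(merge_pair(r1, r2) == fragment)  # True
```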
Thank you!!
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 

Recently uploaded (20)

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 

Sequencing 2017

  • 1. Surya Saha Sol Genomics Network (SGN) Boyce Thompson Institute, Ithaca, NY suryasaha@cornell.edu // Twitter:@SahaSurya BTI Plant Bioinformatics Course 2017 http://www.acgt.me/blog/2015/3/7/next-generation-sequencing-must-die
  • 2. Earth BioGenome Project (EBP) 3/28/2017 BTI Plant Bioinformatics Course 2017 • Complete genome of one representative from each eukaryotic family (9,000 families) • Low-coverage sequencing of one species from each of the 150,000 to 200,000 genera • Budget estimate: $4.8 billion. Would it be better to sequence fewer species to higher quality and invest in interpretation? http://omicsomics.blogspot.com/2017/02/earth-biogenome-project-ill-conceived.html
  • 3. Sequencing over the Ages (slide concept: Aureliano Bombarely)
    – 1953: DNA structure discovered
    – 1977: Sanger DNA sequencing by chain-terminating inhibitors
    – 1984: Epstein-Barr virus genome (170 Kb)
    – 1987: ABI 370 sequencer
    – 1995: Haemophilus influenzae genome (1.83 Mb)
    – 2001: Homo sapiens genome (3.0 Gb)
    – 2005-2007: 454, Solexa (Illumina), SOLiD
    – 2011: Ion Torrent, PacBio
    – 2012-2014: Illumina HiSeq X, Pinus taeda genome (24 Gb)
    – 2014-2015: Oxford Nanopore MinION, 10X Genomics
  • 4. First generation sequencing: Sanger. Annu Rev Biochem. 1988;57:1-28. Thanks to Nick Loman for the mention
  • 5. Maxam-Gilbert method
  • 6. Maxam-Gilbert method http://en.wikipedia.org/wiki/File:Maxam-Gilbert_sequencing_en.svg https://www.nationaldiagnostics.com/electrophoresis/article/maxam-gilbert-sequencing
  • 7. Sanger method: Frederick Sanger (13 Aug 1918 – 19 Nov 2013) won the Nobel Prize in Chemistry in 1958 and 1980, and published the dideoxy chain-termination method, or “Sanger method”, in 1977. http://dailym.ai/1f1XeTB
  • 8. Sanger method http://en.wikipedia.org/wiki/File:Sanger-sequencing.svg http://en.wikipedia.org/wiki/File:Radioactive_Fluorescent_Seq.jpg
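The chain-termination principle illustrated on slide 8 can be simulated in a few lines of Python. This is a toy sketch rather than a lab protocol: it assumes the template is written 3'->5', that every possible terminated fragment is produced, and that fragments separate purely by length. The function names are hypothetical.

```python
# Toy model of dideoxy (Sanger) chain-termination sequencing.
# Assumption: the template is written 3'->5', so the newly synthesized
# strand (5'->3') is simply its base-by-base complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_lanes(template: str) -> dict[str, list[int]]:
    """Return, for each ddNTP lane, the lengths of the terminated fragments.

    A fragment of length n ends with the ddNTP that pairs with template base n,
    because incorporating a dideoxynucleotide stops further extension.
    """
    lanes: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, template_base in enumerate(template.upper(), start=1):
        lanes[COMPLEMENT[template_base]].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest fragment across all four lanes."""
    bands = sorted((length, ddntp) for ddntp, lengths in lanes.items() for length in lengths)
    return "".join(ddntp for _, ddntp in bands)


lanes = chain_termination_lanes("TACGGATC")
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG
```

Reading the bands in order of fragment length recovers the synthesized strand, which is why a single gel (or a single capillary with four dye colours) yields the sequence directly.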
  • 9. First generation sequencing
    – Very high quality sequences (99.999% accuracy, or Q50)
    – Very, very low throughput
    – Capillary sequencing (ABI 3730xl): run time 20 min-3 h; read length 400-900 bp; 96 or 384 reads per run; 1.9-84 Kb sequenced per run; $2400 per Mb
    http://www.hindawi.com/journals/bmri/2012/251364/tab1/
  • 10. Next generation sequencing
  • 11. Instead of "next generation sequencing", name the specific technology used to generate the data: – Illumina HiSeq/MiSeq/NextSeq – Pacific Biosciences RS I/RS II – Ion Torrent Proton/PGM – SOLiD – Oxford Nanopore http://www.acgt.me/blog/2015/3/10/next-generation-sequencing-must-diepart-2
  • 12. 454 Pyrosequencing: one purified DNA fragment, to one bead, to one read. GS FLX Titanium http://www.genengnews.com/ https://mariamuir.com/wp-content/uploads/2013/04/rip.gif
  • 13. Illumina (source: Illumina)
    – MiSeq: output 15 Gb; max 25 million reads per run; max read length 2x300 bp; cost $99K
    – NextSeq 500/550: output 120 Gb; max 400 million reads per run; max read length 2x150 bp; cost $250K
    – HiSeq 2500/3000/4000: output 1500 Gb; max 5 billion reads per run; max read length 2x125 to 2x250 bp (rapid run mode); cost $740K
    – HiSeq X: output 1800 Gb; max 6 billion reads per run; max read length 2x150 bp; cost $10M (10 units)
  • 14. Illumina (same table as slide 13)
  • 15. Illumina (same table as slide 13)
  • 16. Illumina (Mardis, Annu. Rev. Genomics Hum. Genet. 2008; 9:387–402)
  • 17. Illumina (Mardis, Annu. Rev. Genomics Hum. Genet. 2008; 9:387–402)
  • 18. Pacific Biosciences SMRT (Single Molecule Real Time) sequencing; instruments: RS II, Sequel http://smrt.med.cornell.edu/images/pacbio_library_prep-1.gif
  • 19. Pacific Biosciences SMRT sequencing: error correction methods • Hierarchical genome-assembly process (HGAP) • PBJelly (English et al., PLOS One, 2012)
  • 20. Pacific Biosciences SMRT sequencing: error correction methods • PBcR pipeline
  • 21. Pacific Biosciences SMRT sequencing: read lengths
  • 22. Oxford Nanopore https://www.nanoporetech.com/ http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion http://halegrafx.com/vector-art/free-vector-despicable-me-minions/
  • 23.
  • 24. E. coli K-12 MG1655 on a standard FLO-MIN106 (R9.4) flowcell http://lab.loman.net/2017/03/09/ultrareads-for-nanopore/
  • 25. Next generation sequencing (note: some figures might be out of date)
    – 454 pyrosequencing: run time 24 h; read length 700 bp; quality Q20-Q30; output 1 Gb; cost $10/Mb
    – Illumina MiSeq: run time 27 h; read length 2x300 bp; quality >Q30; output 15 Gb; cost $0.15/Mb
    – Illumina HiSeq 2500: run time 1-10 days; read length 2x250 bp; quality >Q30; output 3000 Gb; cost $0.05/Mb
    – Ion Torrent: run time 2 h; read length 400 bp; quality >Q20; output 50 Mb-1 Gb; cost $1/Mb
    – Pacific Biosciences: run time 30 min-4 h; read length 10 kb to >40 kb; quality >Q50 consensus, >Q10 single pass; output 500-1000 Mb per SMRT cell; cost $0.13-$0.60/Mb
    http://www.hindawi.com/journals/bmri/2012/251364/ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431227
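The cost per Mb figures on slide 25 can be turned into a rough budget for raw data: genome size times coverage times cost per Mb. A minimal sketch, assuming the upper bound wherever slide 25 gives a range; the 900 Mb genome is a hypothetical example, and the dictionary keys are just labels.

```python
# Cost per Mb of raw reads, copied from slide 25
# (the upper bound is used where the slide gives a range).
COST_PER_MB_USD = {
    "454 pyrosequencing": 10.00,
    "Illumina MiSeq": 0.15,
    "Illumina HiSeq 2500": 0.05,
    "Ion Torrent": 1.00,
    "Pacific Biosciences": 0.60,
}


def raw_data_cost(genome_size_mb: float, coverage: float, platform: str) -> float:
    """Approximate cost of raw reads only: genome size x coverage x cost per Mb."""
    return genome_size_mb * coverage * COST_PER_MB_USD[platform]


# Hypothetical example: a 900 Mb genome at 100X coverage on each platform.
for platform in COST_PER_MB_USD:
    print(f"{platform}: ${raw_data_cost(900, 100, platform):,.0f}")
```

This covers sequencing output only; library preparation, storage and analysis are extra, which is the point of the "Real cost of sequencing" slides (Sboner, Genome Biology, 2011).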
  • 26. Long range scaffolding
  • 27. Hi-C crosslinking
  • 28. 10X Genomics GemCode • Long-read information from short reads using 14 bp barcodes • Very low input DNA (as low as 0.625 ng) • Short library preparation time • 1 ng of DNA is split across 100,000 gel beads in emulsion (GEMs) • Chromium instrument for single-cell RNA-seq http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg
  • 29. GemCode http://mms.businesswire.com/media/20150225005296/en/454639/5/GemCodePlatform.jpg http://www.nature.com/nbt/journal/v34/n3/full/nbt.3432.html
  • 30. http://www.bionanogenomics.com/technology/why-genome-mapping/
  • 31. Human MHC map • Sample prep requires very high molecular weight DNA • DNA is nicked at about 10 sites per 100 kb • Individual molecules are assembled into optical maps • Optical maps and sequences are merged in a hybrid assembly http://www.bionanogenomics.com/technology/why-genome-mapping/
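Merging optical maps with sequence requires an in silico map of the sequence: the positions where the nicking enzyme would label it. A minimal sketch, assuming the enzyme is Nt.BspQI with recognition site GCTCTTC (the slide does not name the enzyme, so the motif is a parameter) and using motif start positions as approximate label positions.

```python
import re

# Assumption: the nicking enzyme is Nt.BspQI, recognition site GCTCTTC.
# The slide does not name the enzyme, so pass a different motif if needed.
DEFAULT_MOTIF = "GCTCTTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def nick_sites(seq: str, motif: str = DEFAULT_MOTIF) -> list[int]:
    """0-based start positions of the motif on either strand, sorted.

    Motif starts stand in for label positions; the real nick lies at a
    fixed offset from the site, which this sketch ignores.
    """
    seq = seq.upper()
    hits: set[int] = set()
    for pattern in {motif.upper(), reverse_complement(motif.upper())}:
        # A zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.add(match.start())
    return sorted(hits)


def label_intervals(sites: list[int]) -> list[int]:
    """Distances between consecutive labels, i.e. the in silico map."""
    return [right - left for left, right in zip(sites, sites[1:])]


sites = nick_sites("AAGCTCTTCAAAAGAAGAGCAA")
print(sites)                   # [2, 13]
print(label_intervals(sites))  # [11]
```

The list of intervals is what gets compared against the label spacing measured on single molecules; at roughly 10 labels per 100 kb, a real contig yields a few labels per 100 kb rather than the dense toy example here.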
  • 32. Many others • Ion Torrent Proton/PGM • Dovetail • Supporting technologies – Nabsys – OpGen – Fluidigm http://nextgenseek.com/2012/11/did-you-know-there-are-at-least-14-next-gen-sequence-technology-companies/
  • 33. Real cost of sequencing!! Sboner, Genome Biology, 2011
  • 34. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-8-125
  • 35. So what sequencer do I use??
    Microbial genome
    – Draft genome: Illumina MiSeq (100-130X) or Illumina HiSeq (<200X)
    – Complete genome: Pacific Biosciences (80-100X)
    – Amplicons (16S, ITS): Illumina MiSeq
    Eukaryotic genome
    – De novo assembly: Pacific Biosciences (70-80X), Illumina HiSeq (100X+), 10X Genomics, Bionano
    – Genotyping (GBS): Illumina HiSeq
    – BACs: Pacific Biosciences
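Coverage targets such as "100-130X" translate into an amount of data through the standard relation C = N × L / G (N reads of length L over a genome of size G). A minimal sketch; the 5 Mb genome and the choice to count a 2x300 bp pair as 600 bp are hypothetical example values.

```python
import math


def coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth C = N * L / G."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads whose combined length reaches the target depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Hypothetical example: a 5 Mb bacterial genome at 100X with 2x300 bp MiSeq
# pairs, counting each pair as 600 bp of sequence.
genome_bp = 5_000_000
pairs = reads_needed(100, 600, genome_bp)
print(pairs)                                   # 833334
print(coverage(pairs, 600, genome_bp) >= 100)  # True
```

Average depth is only a starting point: coverage is uneven along the genome, which is one reason the recommendations above ask for well over the minimum.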
  • 36. The diploid reference genome
  • 37. Cornell Sequencing Core • Illumina HiSeq 2500 (rapid run and high output) • Illumina MiSeq • Illumina NextSeq 500 • 10X Genomics GemCode http://www.biotech.cornell.edu/brc/genomics/services/price-list#overlay-context=brc/genomics-facility/next-generation-sequencing
  • 38. Library types (slide credit: Aureliano Bombarely) • Single end • Paired end (PE, 150-300 bp; forward read /1, reverse read /2) • Mate pair (MP, 2 Kb to 20 Kb) (Figure: read orientations of 454/Roche and Illumina paired-end and mate-pair libraries)
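With the /1 and /2 naming convention on slide 38, forward and reverse reads can be matched by stripping the suffix. A minimal sketch for that convention only; newer Illumina software (CASAVA 1.8 and later) puts the mate number in a space-separated comment such as `1:N:0:...` instead, which this sketch does not handle. Function names and read IDs are hypothetical.

```python
def split_mate(read_id: str) -> tuple[str, int]:
    """Split an ID such as 'read42/1' into ('read42', 1)."""
    name, _, mate = read_id.rpartition("/")
    if not name or mate not in ("1", "2"):
        raise ValueError(f"read ID lacks a /1 or /2 suffix: {read_id!r}")
    return name, int(mate)


def pair_reads(read_ids: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Match forward (/1) and reverse (/2) reads; return (pairs, orphans)."""
    mates: dict[str, dict[int, str]] = {}
    for read_id in read_ids:
        name, mate = split_mate(read_id)
        mates.setdefault(name, {})[mate] = read_id
    pairs = [(m[1], m[2]) for m in mates.values() if 1 in m and 2 in m]
    orphans = [rid for m in mates.values() if len(m) == 1 for rid in m.values()]
    return pairs, orphans


pairs, orphans = pair_reads(["r1/1", "r2/1", "r1/2"])
print(pairs)    # [('r1/1', 'r1/2')]
print(orphans)  # ['r2/1']
```

Orphans typically arise after trimming or filtering discards one mate; most assemblers and aligners expect them in a separate single-end file.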
  • 39. Implications of choice of library (slide credit: Aureliano Bombarely): reads are assembled into a consensus sequence (contig); pair read information links contigs into a scaffold (or supercontig), with gaps filled by Ns (NNNNN); genetic information (markers) or optical maps order scaffolds into a pseudomolecule (or ultracontig).
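Once pair read information has ordered the contigs and estimated the distances between them, writing out the scaffold is a matter of joining the contigs with runs of N. A minimal sketch, assuming the contigs are already ordered and oriented; the gap sizes are hypothetical inputs, and a real scaffolder would treat negative gap estimates (overlaps) more carefully than the single N used here.

```python
def build_scaffold(contigs: list[str], gap_sizes: list[int]) -> str:
    """Join ordered, oriented contigs into one scaffold, filling each gap with Ns.

    gap_sizes[i] is the estimated distance between contigs[i] and contigs[i + 1],
    e.g. inferred from paired-end or mate-pair insert sizes. Negative or zero
    estimates (possible overlaps) are represented by a single N in this sketch.
    """
    if not contigs or len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need at least one contig and one gap size per adjacent pair")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * max(gap, 1))
        parts.append(contig)
    return "".join(parts)


scaffold = build_scaffold(["ACGTAC", "GGCC", "TTAA"], [5, 3])
print(scaffold)  # ACGTACNNNNNGGCCNNNTTAA
```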
  • 40. Multiplexing libraries: different tags (4-6 nucleotides) identify different samples sequenced in the same lane/sector. (Figure: reads from two samples carrying the tags AGTCGT and TGAGCA pooled in one sequencing run.) Slide credit: Aureliano Bombarely
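Demultiplexing reverses the pooling on slide 40: each read is assigned to a sample by its tag. A minimal sketch, assuming the tag sits at the start of the read as drawn on the slide (many Illumina runs instead read the index separately); the tags AGTCGT and TGAGCA come from the slide, while the sample names and the mismatch tolerance are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(
    reads: list[tuple[str, str]], barcodes: dict[str, str], max_mismatches: int = 0
) -> dict[str, list[tuple[str, str]]]:
    """Bin (read_id, sequence) pairs by the tag at the start of each read.

    Reads matching no tag, or more than one, go to 'undetermined'.
    The tag is trimmed off reads that are assigned to a sample.
    """
    bins: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
    for read_id, seq in reads:
        hits = [
            sample
            for sample, tag in barcodes.items()
            if len(seq) >= len(tag) and hamming(seq[: len(tag)], tag) <= max_mismatches
        ]
        if len(hits) == 1:
            tag_length = len(barcodes[hits[0]])
            bins[hits[0]].append((read_id, seq[tag_length:]))
        else:
            bins["undetermined"].append((read_id, seq))
    return dict(bins)


reads = [("r1", "AGTCGTACGTAC"), ("r2", "TGAGCATTTT"), ("r3", "CCCCCCGGGG")]
print(demultiplex(reads, {"sample1": "AGTCGT", "sample2": "TGAGCA"}))
```

Allowing one mismatch rescues reads with a sequencing error in the tag, but only works if the tags differ from each other at several positions, which is why tag sets are designed with a minimum pairwise distance.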
  • 41. Data!!
  • 42. File formats (slide credit: Aureliano Bombarely): Fasta files: "a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes." (Wikipedia)
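A FASTA reader only has to recognize header lines, which start with '>', and join the wrapped sequence lines that follow each header. A minimal sketch of that convention; it ignores stray sequence lines before the first header rather than reporting them.

```python
import io
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) records from FASTA-formatted lines.

    Header lines start with '>'; the sequence may be wrapped over several lines.
    Any sequence lines before the first header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


fasta = io.StringIO(">contig1 nucleotide\nACGTAC\nGGT\n>prot1 peptide\nMKVLA\n")
print(list(read_fasta(fasta)))  # [('contig1 nucleotide', 'ACGTACGGT'), ('prot1 peptide', 'MKVLA')]
```

The same reader handles nucleotide and peptide records, since FASTA itself does not say which alphabet a record uses.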
  • 43. File formats (slide credit: Aureliano Bombarely): Fastq files: "FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores." (Wikipedia) • A single ID line starts with the at symbol ("@") in the first column • The sequence can span multiple lines after the ID line • A single line with the plus symbol ("+") in the first column separates the sequence from the quality values • The "+" line may repeat the ID • Quality values can span multiple lines after the "+" line, but their total length is identical to the sequence length
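The rules on slide 43 hide one trap: '@' is also a valid quality character, so a wrapped quality line can start with '@' and look like a new record. A minimal sketch of a reader that avoids this by collecting quality lines until they match the sequence length; it does not check that an ID on the "+" line matches the "@" line.

```python
import io
from typing import Iterable, Iterator


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, qualities) from FASTQ lines, allowing wrapped records.

    Quality lines are collected until they cover the sequence length, so a
    quality line that happens to start with '@' is not mistaken for a new record.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected an '@' ID line, got {header!r}")
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line.strip())
        else:
            raise ValueError(f"record {header!r} has no '+' line")
        seq = "".join(seq_parts)
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError(f"record {header!r} has fewer qualities than bases")
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError(f"record {header!r}: sequence and quality lengths differ")
        yield header[1:], seq, qual


# The second record's quality line starts with '@' on purpose.
fastq = io.StringIO("@r1\nACGT\nAC\n+\nIIII\nII\n@r2\nGG\n+r2\n@@\n")
for record in read_fastq(fastq):
    print(record)
```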
  • 44. Quality control: Encoding of Fastq files • !"#$%&'()*+,-./0123456789: offset by 33 (Phred+33) • KLMNOPQRSTUVWXYZ[\]^_`abcdefgh: offset by 64 (Phred+64)
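Decoding a quality string is one subtraction per character, but the offset has to be known. A minimal sketch of decoding plus a heuristic offset guess. The thresholds rest on the facts that Phred+64 never uses characters below '@' (ASCII 64) and the older Solexa+64 scheme never goes below ';' (ASCII 59). The heuristic is an assumption of this sketch and can misclassify high-quality Phred+33 data.

```python
def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores: Q = ord(char) - offset."""
    return [ord(char) - offset for char in qual]


def guess_offset(qual_strings: list[str]) -> int:
    """Heuristically guess the Phred offset (33 or 64) from observed characters.

    Characters below ';' (ASCII 59) only occur with Phred+33. If every character
    is '@' (ASCII 64) or above, Phred+64 is likely, although high-quality
    Phred+33 data can look the same; anything in between is ambiguous.
    """
    lowest = min(ord(char) for qual in qual_strings for char in qual)
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    raise ValueError("cannot tell Phred+33 from Phred+64 with these characters")


print(decode_qualities("II#", offset=33))  # [40, 40, 2]
print(decode_qualities("hhB", offset=64))  # [40, 40, 2]
print(guess_offset(["II#"]), guess_offset(["hhB"]))  # 33 64
```

The same scores come out of two very different-looking strings, which is why mixing up the offset silently shifts every quality value by 31.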
  • 45. Quality control: Encoding (same content as slide 44)
  • 46. Quality control: Encoding • The Phred score of a base is Q = -10 log10(e), where e is the estimated error probability of the base http://en.wikipedia.org/wiki/Phred_quality_score
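The Phred formula and its inverse make quality thresholds concrete: each additional 10 points means a tenfold lower error probability. A minimal sketch of both directions.

```python
import math


def phred_from_error(error_probability: float) -> float:
    """Phred quality Q = -10 * log10(e) for an error probability e in (0, 1]."""
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(error_probability)


def error_from_phred(q: float) -> float:
    """Inverse relation: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 50):
    e = error_from_phred(q)
    print(f"Q{q}: error {e:g}, accuracy {100 * (1 - e):.3f}%")
```

Q30 corresponds to e = 0.001 (99.9% accuracy) and Q50 to e = 0.00001, the 99.999% quoted for Sanger reads on slide 9. When averaging qualities over a read, average the error probabilities rather than the Q values, because the scale is logarithmic.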
  • 47. Pre-processing: tools • Quality checking and trimming: FastQC, FASTX toolkit, Trimmomatic, Scythe • Joining paired-end reads: fastq-join, FLASH, PANDAseq
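Quality trimming tools commonly scan a read with a sliding window and cut once the average quality drops too low. A minimal sketch in the spirit of Trimmomatic's SLIDINGWINDOW step, not Trimmomatic's actual code; the window size of 4 and threshold of 20 are hypothetical defaults.

```python
def sliding_window_trim(
    seq: str, quals: list[int], window: int = 4, min_mean_q: float = 20
) -> tuple[str, list[int]]:
    """Cut a read at the first window (scanning 5'->3') whose mean quality is too low.

    Everything from the start of the failing window onward is removed, which is
    slightly more aggressive than necessary; reads shorter than one window are
    returned unchanged.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals


seq, quals = sliding_window_trim("ACGTACGTAC", [38, 38, 37, 36, 35, 30, 12, 10, 8, 5])
print(seq, quals)  # ACGTA [38, 38, 37, 36, 35]
```

Averaging over a window keeps a single low-quality base from truncating an otherwise good read, while still removing the degraded 3' tail typical of Illumina reads. In practice, trimming is usually followed by a minimum-length filter.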
  • 48. Thank you!!