ERRORS & DRAWBACKS
OF NGS
Nixon Mendez
Department of Bioinformatics
Introduction
• High-throughput sequencing technologies have made whole-genome
sequencing and resequencing available to many more researchers and
projects.
• Cost and time have been greatly reduced.
• The error profiles and limitations of the new platforms differ significantly
from those of previous sequencing technologies.
• Selecting an appropriate sequencing platform for a particular type of
experiment is an important consideration.
• This requires a detailed understanding of the available technologies,
including their sources of error and error rates, as well as the speed and
cost of sequencing.
Errors in NGS
Discussions of NGS sequencing errors focus mainly on the following
points:
1. Low quality bases
2. PCR errors
3. High error rate
1. Low quality bases
1. All the NGS companies have made big strides in improving the raw
accuracy of base calls.
2. Read lengths have increased as a result.
3. The number of reads has also increased to the point where coverage is
high enough to rule out most issues with low quality base calls.
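Base-call quality is conventionally reported as a Phred score, Q = -10 * log10(p), where p is the estimated probability that the call is wrong. The sketch below is only an illustration of that relationship: it assumes the Phred+33 ASCII encoding commonly used in FASTQ files and an arbitrary threshold of Q20, and the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def low_quality_fraction(quality_string, threshold=20, offset=33):
    """Return the fraction of bases whose Phred score is below the threshold."""
    scores = decode_qualities(quality_string, offset)
    if not scores:
        raise ValueError("quality string is empty")
    return sum(1 for q in scores if q < threshold) / len(scores)
```

A score of Q20 corresponds to a 1 in 100 chance of a wrong call and Q30 to 1 in 1000, which is why higher coverage can outvote isolated low quality calls at a site.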
2. PCR errors
All of the current NGS systems use PCR in some form to amplify the
initial nucleic acid and to add adapters for sequencing.
1. The amount of amplification can be very high, with multiple rounds
of PCR for exome and/or amplicon applications.
2. As a result, base differences are observed that are artefacts
generated by the PCR rather than true variants.
3. Several groups have published improved methods that reduce the
amount of PCR or use alternative enzymes to increase the fidelity of
the reaction, e.g. Quail et al.
3. High error rate
1. A high error rate prevents the accurate detection of rare mutations in
heterogeneous populations such as tumors and microbiomes.
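One way to see why is to ask how often sequencing errors alone would mimic a rare variant. The sketch below is a simplified model, not a variant caller: it assumes every read covering a site is an independent trial with the same per-base error rate, and it counts any erroneous call as support for the variant (which overstates the problem somewhat, since errors are spread over three alternative bases). The function name and the example numbers are illustrative assumptions.

```python
from math import comb


def prob_at_least_k_errors(depth, k, error_rate):
    """Binomial probability that at least k of `depth` reads carry an error."""
    if not 0 <= error_rate <= 1:
        raise ValueError("error_rate must be between 0 and 1")
    # Sum the probabilities of seeing fewer than k erroneous reads...
    p_fewer = sum(
        comb(depth, i) * error_rate ** i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    # ...and take the complement.
    return 1 - p_fewer
```

At a depth of 1000 reads and a 1% error rate, about ten erroneous reads are expected per site, so `prob_at_least_k_errors(1000, 5, 0.01)` is high and a true variant supported by only five reads cannot be told apart from noise under this model.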
Limitations of NGS
NGS has inherent limitations, which are as follows:
1. Sequence properties and algorithmic challenges
2. Contamination or new insertions
3. Repeat content
4. Segmental duplications
5. Missing and fragmented genes
6. Reference index
1. Sequence properties and algorithmic challenges
• NGS technologies typically generate shorter sequences with higher
error rates from relatively short insert libraries.
• Illumina’s sequencing by synthesis routinely produces read lengths of
75–100 base pairs (bp) from libraries with insert sizes of 200–500 bp.
• The short read lengths of NGS prevent the assembly of genomes with long
stretches of repetitive DNA.
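The effect of read length on repeats can be illustrated with a toy calculation. The sketch below assumes error-free reads sampled from every start position of a made-up sequence, and reports the fraction of reads whose sequence occurs at more than one position, so that they cannot be placed unambiguously. The function name is hypothetical and the model ignores paired-end information.

```python
from collections import defaultdict


def ambiguous_read_fraction(genome, read_length):
    """Fraction of all possible reads that match more than one genome position."""
    if read_length < 1 or read_length > len(genome):
        raise ValueError("read_length must be between 1 and len(genome)")
    positions = defaultdict(list)
    total = len(genome) - read_length + 1
    # Record every start position at which each read sequence occurs.
    for start in range(total):
        positions[genome[start:start + read_length]].append(start)
    # A read is ambiguous if its sequence occurs at two or more positions.
    ambiguous = sum(len(hits) for hits in positions.values() if len(hits) > 1)
    return ambiguous / total
```

For the toy sequence `"ACGTACGTTT"`, reads of length 4 include the repeated `ACGT`, whereas reads of length 5 span past the repeat and are all unique. Real repeats are far longer than this, which is why reads of 75–100 bp cannot resolve them.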
2. Contamination or new insertions
• An important consideration in any sequencing project is DNA
contamination from other organisms.
• Before analysis, genomes are searched for possible contaminants
by comparing them against the NCBI nucleotide (nt) database.
• De novo sequence assemblies may be an important source for the
discovery of insertion polymorphisms, but such sequences require
particular scrutiny and additional validation because of their tendency
to enrich for contamination artifacts.
• Discriminating such sequences before sequence assembly becomes
particularly problematic when the underlying sequence read data are
short.
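A screen of this kind is often run with BLAST and summarised from its tabular output. The sketch below assumes the 12-column tabular format (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity and alignment length) and a user-supplied set of subject IDs known to belong to contaminant organisms. The thresholds, the IDs and the function name are illustrative assumptions rather than a recommended protocol.

```python
def flag_contaminated_contigs(blast_lines, contaminant_ids,
                              min_identity=95.0, min_length=100):
    """Return the query IDs with a strong hit to a known contaminant sequence."""
    flagged = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip rows that are not in the 12-column format
        query_id, subject_id = fields[0], fields[1]
        identity = float(fields[2])
        alignment_length = int(fields[3])
        if (subject_id in contaminant_ids
                and identity >= min_identity
                and alignment_length >= min_length):
            flagged.add(query_id)
    return flagged
```

Flagged contigs are candidates for manual review rather than automatic removal, since a short or low-identity hit may reflect conserved sequence instead of contamination.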
3. Repeat content
• Any WGS-based sequence assembly algorithm will collapse identical
repeats, resulting in reduced or lost genomic complexity.
• Most Alu subfamilies were underrepresented because of the shorter
sequence length of the Alu repeat class.
• Most common repeat classes showed reduced representation in the YH
genome, a human genome assembled de novo from short reads.
4. Segmental duplications
• The whole-genome assembly comparison (WGAC) method is used to analyse
segmental duplications.
• If we limit our analysis to duplications that are present in the human
reference genome and were also detected through read-depth analysis of a
capillary sequencing–based WGS dataset (Celera) and YH, we conclude that
99.4% of true pairwise segmental duplications were absent.
• We predict that 95.6% of the duplications in the YH de novo assembly are
likely false because they did not correspond to duplications predicted by
read depth.
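Read-depth analysis predicts duplications from regions where more reads map than expected. The sketch below is a deliberately simplified version of that idea: it assumes fixed-size windows, takes the median window depth as the diploid single-copy baseline, and ignores GC bias and mappability corrections that real pipelines apply. The function names and depth values are hypothetical.

```python
from statistics import median


def copy_number_from_depth(window_depths, ploidy=2):
    """Estimate an integer copy number per window from its read depth."""
    baseline = median(window_depths)  # assumed to represent `ploidy` copies
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


def duplicated_windows(window_depths, ploidy=2):
    """Return the indices of windows whose estimated copy number exceeds ploidy."""
    copies = copy_number_from_depth(window_depths, ploidy)
    return [index for index, count in enumerate(copies) if count > ploidy]
```

With made-up depths `[30, 31, 29, 62, 58, 30, 15]`, the two windows at roughly twice the median depth are reported as duplicated. An assembly that contains duplications without such depth support, or lacks duplications that have it, is suspect.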
5. Missing and fragmented genes
• Genomic reduction affected both gene coverage and the
fragmentation of genes into multiple scaffolds.
• The presence of duplicated and repetitive sequences in introns
complicates complete gene assembly and annotation, leading to genes
being broken among multiple sequence scaffolds.
6. Reference index
• Another problem is analysing genomes without a reference (index)
genome.
• The portions that are missing or misassembled cannot be readily
inferred and are invisible to the biologist.
• Biases against duplications and repeats, as well as fragmentation,
raise questions about the accuracy and completeness of similarly
assembled genomes.
Overcoming the Limitations
• It is the responsibility of the scientific community to enforce
standards of quality that can be measured and assessed.
• It is critical to develop new hybrid sequencing approaches, such as
multiplatform strategies that include third-generation long-read
technologies, high-quality finished long-insert clones and new
assembly algorithms that can accommodate these heterogeneous
datasets.
• The genome assemblies themselves must be experimentally validated.
• Large-molecule, high-quality sequencing should not be abandoned
until the balance between quantity and quality of genomes has been
re-established.
THANK YOU!!