Quality control with FastQC of
sequencing data obtained on the
Illumina platform
Hafiz.M.Zeeshan.Raza
Research Associate
COMSATS University Islamabad, Pakistan
Sahiwal Campus
hafizraza26@gmail.com
Cell# 0092-36-6155501
Basic data (BASIC STATISTICS)
The basic statistics module reports:
• File name: Sec_Ilumina.fastq.txt
• File format: conventional
• System used: Illumina
• Total sequences analyzed: 25000. This is the number of
reads analyzed.
• Sequences flagged as poor quality: 0.
• Length of each read: 38 bases
• % GC: 45%
• Note: The program does not tell you which sequences
have bad quality; you have to identify and correct them
yourself.
Quality (Q) of the sequences per base
(PER BASE SEQUENCE QUALITY)
• The yellow boxes show the interquartile range, the red line is the median and
the blue line is the mean quality. The X axis shows the base position in the
reads (each read has 38 bases). The Y axis shows the quality scores, here 0-34,
divided into three zones:
• Green zone: 28-34, very good quality.
• Orange zone: 20-28, intermediate quality.
• Red zone: 0-20, poor quality.
• For each base position, the plot summarizes the quality of all 25000 reads.
Continued…
• From the plot, the qualities assigned to the first base are all very good, since
they lie in the green zone. At base 38 the qualities are widely dispersed (which is
why the interquartile box is so tall): some reads have good quality there and
others very bad.
• In conclusion, the qualities are good up to base 22 but get worse from base 23
onwards, where some of the 25000 reads have a poor-quality base. Therefore, I
will have to use a program that removes all the bases with, for example, Q < 25
(the threshold is chosen by us), or those that bring the average quality below
Q 25. After trimming, the 25,000 reads will no longer all have the same length:
some will still have 38 bases and others will be shorter.
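The trimming step described above can be sketched in Python. This is only a minimal illustration, not the program the slides refer to: it assumes the quality string uses the Phred+33 encoding (older Illumina files may use Phred+64), and it interprets "remove the bases with Q < 25" as cutting the read at the first base below the threshold. The function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_tail(sequence, quality_string, threshold=25, offset=33):
    """Cut the read at the first base whose quality is below the threshold."""
    scores = phred_scores(quality_string, offset)
    keep = len(scores)  # keep the whole read unless a low-quality base is found
    for position, score in enumerate(scores):
        if score < threshold:
            keep = position
            break
    return sequence[:keep], quality_string[:keep]


# Toy read: in Phred+33, 'I' = 40, '5' = 20 and '#' = 2.
read = "ACGTACGTAC"
quals = "IIIIIIII5#"
trimmed_seq, trimmed_qual = trim_low_quality_tail(read, quals)
print(trimmed_seq, len(trimmed_seq))  # ACGTACGT 8
```

After this step the reads no longer share one length, which is exactly the effect described in the conclusion above.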
Quality of the sequence by "tile"
(PER TILE SEQUENCE QUALITY).
• This plot (it only appears if an Illumina library is used) shows
the tiles of the flow cell in which the reads were sequenced. It
lets you look at the quality scores of each tile across all base
positions, to see whether a loss of quality is associated with
only one part of the flow cell.
• Marks on the plot indicate poor quality in those tiles. A
possible cause is that the reagents were not filtered or
degassed under vacuum, so bubbles stayed in the flow cell and
interfered with the signal reading, giving bad quality.
• In our image no fault is shown, since the background is
uniformly blue, which suggests that the reagents were degassed
and filtered.
Levels of quality per sequence
(PER SEQUENCE QUALITY SCORE)
• This module gives an early idea of how many reads will have to be
removed, since it shows whether a subset of the sequences has
low quality values.
• In the graph shown on the left, the X axis is the mean quality of
a read and the Y axis is the number of reads with that mean
quality.
• In our case, more than 3500 reads have a mean quality of around
29-31, and far fewer reads have low quality.
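As a rough illustration of what this plot counts, the sketch below computes the mean quality of each read and tallies how many reads fall on each integer mean. It assumes Phred+33 quality strings and non-empty reads; the function names are hypothetical and this is not FastQC's own code.

```python
from collections import Counter


def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read (assumes a non-empty quality string)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def mean_quality_histogram(quality_strings, offset=33):
    """Count reads per integer mean quality (X axis -> Y axis of the plot)."""
    histogram = Counter()
    for quality_string in quality_strings:
        # Truncate the mean to an integer bin.
        histogram[int(mean_quality(quality_string, offset))] += 1
    return dict(sorted(histogram.items()))


# Toy data: 'I' = 40, '5' = 20, '#' = 2 in Phred+33.
qualities = ["IIII", "II55", "II55", "5555", "####"]
print(mean_quality_histogram(qualities))
```

A tall peak at high values with few reads at low values corresponds to the situation described for our file.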
Content of the sequence per base
(PER BASE SEQUENCE CONTENT)
• This module shows the proportion of each of the four
bases at each position in the reads.
• In a random library, there should be little difference in
base composition between the positions of a run, so the
lines in this plot should run parallel to each other.
• In our case, there are differences between some bases
and others, whereas the amount of A should equal that
of T, and the amount of G that of C.
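The proportions plotted in this module can be sketched as follows. This is a simplified illustration with hypothetical names; it counts only A, C, G and T at each position and ignores any other symbol.

```python
from collections import Counter


def per_base_content(reads):
    """For each position, return the fraction of A, C, G and T among the reads."""
    length = max(len(read) for read in reads)
    table = []
    for position in range(length):
        counts = Counter(read[position] for read in reads if position < len(read))
        total = sum(counts[base] for base in "ACGT")
        table.append({base: (counts[base] / total if total else 0.0) for base in "ACGT"})
    return table


reads = ["ACGT", "ACGA", "TCGT", "AGGT"]
for position, fractions in enumerate(per_base_content(reads), start=1):
    print(position, fractions)
```

In an unbiased library the A and T fractions, and the G and C fractions, would stay close to each other along all positions.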
Content of guanine and cytosine (GC) per sequence
(PER SEQUENCE GC CONTENT)
• This module measures the GC content over the whole length of each read (red line) and
compares it with a theoretical normal distribution of GC content (blue line).
• The X axis shows the mean GC percentage and the Y axis shows the number of
reads.
• In our case (red line), there are several peaks where there should be a single
Gaussian curve. This can indicate:
  - Adapter dimers
  - Contamination with other DNA
• In sequences of bad quality, where the base is uncertain, a G or C may have been
called where it does not belong.
• This tells me that the sequencing has not been carried out correctly, but
after analyzing the qualities in the file I will be able to correct the reads and decide:
• If there was DNA contamination but the reads are of good quality, I will not remove them.
• If the reads were misread, with bases wrongly called as G or C, they should be
removed when correcting.
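The red curve in this module is essentially a histogram of GC percentage per read. A minimal sketch follows, with hypothetical function names and the assumption that N bases are left out of the percentage (FastQC may treat them differently).

```python
from collections import Counter


def gc_percent(sequence):
    """GC percentage of one read, ignoring anything that is not A, C, G or T."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(base in "GC" for base in called) / len(called)


def gc_histogram(reads):
    """Number of reads at each rounded GC percentage."""
    return dict(sorted(Counter(round(gc_percent(read)) for read in reads).items()))


reads = ["GGCC", "ATAT", "ACGT", "AGCT"]
print(gc_histogram(reads))
```

A single bell-shaped histogram is what a clean library is expected to give; several separate peaks match the situation described above.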
Content of N per base
(PER BASE N CONTENT)
• This module shows, for each position, the percentage of
bases called as N (unassigned base).
• In our case, the N content is practically nil: the
sequencer did not assign N to uncertain bases but
called an A, T, G or C instead. This indicates that the
quality of our sequences is not so bad as to require
an N.
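The percentage of N at each position can be computed with a few lines. This is an illustrative sketch with a hypothetical function name, assuming a non-empty list of reads.

```python
def per_base_n_percent(reads):
    """Percentage of N calls at each position, over the reads that reach it."""
    length = max(len(read) for read in reads)
    percentages = []
    for position in range(length):
        column = [read[position] for read in reads if position < len(read)]
        percentages.append(100.0 * column.count("N") / len(column))
    return percentages


print(per_base_n_percent(["ACGT", "ACNT", "NCGT", "ACGN"]))  # [25.0, 0.0, 25.0, 25.0]
```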
Distribution of the length of the sequences
(SEQUENCE LENGTH DISTRIBUTION)
• Some high-throughput sequencers generate reads of
uniform length, but others produce reads of
different lengths.
• This module plots the distribution of read lengths
in the file that was analyzed.
• In our case, the sequences are homogeneous:
all 25000 sequences have 38 bases.
Levels of duplicate sequences
(SEQUENCE DUPLICATION LEVELS)
• This module counts the degree of duplication of each sequence in a
library and plots the relative number of sequences at each degree of
duplication.
• When sequencing, fragments should be sampled at random, so that the
same sequence is rarely read twice.
• The graph shows the proportion of the library that falls into each
duplication-level bin. There are two lines:
• The blue line shows the full set of sequences
• The red line shows the sequences after deduplication
Continued…
• For the complete set of sequences we can observe 3 peaks:
• > 10: more than 10% of the sequences are copies, identical from beginning
to end, of a sequence that occurs more than 10 times
• > 100: of the 25,000 reads, 25% are copies, identical from beginning to end,
of a sequence that occurs more than 100 times
• > 1K: 20% of the sequences are copies, identical from beginning to end, of a
sequence that occurs more than 1000 times
• For genomic DNA, duplication should ideally not be observed (red line). However,
duplicates can arise. In general, a library can contain two types of duplicates:
technical duplicates derived from PCR artifacts, and biological duplicates, which are
natural collisions in which different copies of the same sequence are randomly
selected. There is no way to distinguish between these two types, and
both are reported as duplicates here.
• In RNA-Seq libraries, some sequences are expected to occur very frequently
and others very rarely (low-copy-number transcripts), so a high level of
duplication in part of the library is inevitable.
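The two curves of this module can be approximated with a simple count of identical reads. The sketch below is a simplification under stated assumptions: it uses only five bins (FastQC uses finer ones), it compares whole reads exactly, and the names `bin_label` and `duplication_levels` are hypothetical.

```python
from collections import Counter

LABELS = ["1", "2-9", ">=10", ">=100", ">=1000"]


def bin_label(copies):
    """Map a copy number to a simplified duplication-level bin."""
    if copies >= 1000:
        return ">=1000"
    if copies >= 100:
        return ">=100"
    if copies >= 10:
        return ">=10"
    if copies >= 2:
        return "2-9"
    return "1"


def duplication_levels(reads):
    """Return (% of all reads, % of distinct sequences) per duplication bin."""
    counts = Counter(reads)
    total_reads = len(reads)
    distinct = len(counts)
    total_pct = dict.fromkeys(LABELS, 0.0)  # analogous to the blue line
    dedup_pct = dict.fromkeys(LABELS, 0.0)  # analogous to the red line
    for copies in counts.values():
        label = bin_label(copies)
        total_pct[label] += 100.0 * copies / total_reads
        dedup_pct[label] += 100.0 / distinct
    return total_pct, dedup_pct


reads = ["AAAA"] * 10 + ["CCCC"] * 2 + ["GGGG", "TTTT"]
total_pct, dedup_pct = duplication_levels(reads)
print(total_pct)
print(dedup_pct)
```

In this toy example one sequence with 10 copies dominates the full set of reads but is only one of four distinct sequences, which is why the two curves can look very different.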
Overrepresented sequences
• This module lists the sequences that are overrepresented in the
library and that can cause problems later, either when mapping
(aligning the reads against a reference transcriptome) or when
attempting an assembly.
• It also reveals adapter dimers: since you know which adapter was
used, finding its sequence in this list tells you that adapter
dimers were formed and sequenced.
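A simple way to picture this module is to count identical reads and keep those above a chosen share of the library. The sketch below is illustrative only: the function name is hypothetical and the threshold is a parameter, with 0.1% of the total used as the default here.

```python
from collections import Counter


def overrepresented(reads, min_fraction=0.001):
    """Return (sequence, count, percent) for reads above min_fraction of the total."""
    counts = Counter(reads)
    total = len(reads)
    return [
        (sequence, count, 100.0 * count / total)
        for sequence, count in counts.most_common()
        if count / total > min_fraction
    ]


reads = ["ACGT"] * 5 + ["GGGG"] * 3 + ["TTTT", "CCCC"]
for sequence, count, percent in overrepresented(reads, min_fraction=0.25):
    print(sequence, count, percent)
```

Each reported sequence can then be compared with the known adapter sequence to check for adapter dimers.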
Content of the adapters
(ADAPTER CONTENT)
• Adapter sequences are an obvious class of sequences to
look for. It is useful to know whether the library contains
a significant amount of adapter sequence, in order to
decide whether adapter trimming is needed.
• This module therefore searches specifically for a
separately defined set of k-mers and shows the total
proportion of your library that contains them.
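The idea of searching for a defined adapter k-mer can be sketched as below. This is a minimal illustration, not FastQC's implementation: the function name is hypothetical, and the k-mer used in the example is only an assumed adapter fragment that should be replaced with the adapter actually used for the library.

```python
def adapter_content(reads, adapter_kmer):
    """Percentage of reads that contain the given adapter k-mer anywhere."""
    if not reads:
        return 0.0
    hits = sum(adapter_kmer in read for read in reads)
    return 100.0 * hits / len(reads)


# Assumed adapter fragment, for illustration only.
ADAPTER_KMER = "AGATCGGAAGAG"

reads = [
    "ACGTAGATCGGAAGAGCTT",  # contains the k-mer
    "ACGTACGTACGT",
    "AGATCGGAAGAGAAAA",     # contains the k-mer
    "TTTTTTTT",
]
print(adapter_content(reads, ADAPTER_KMER))  # 50.0
```

A high percentage would suggest that adapter trimming is needed before mapping or assembly.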
Quality control of sequencing with fast qc obtained with

  • 1. Quality control of sequencing with FastQC obtained with the Illumina platform Hafiz.M.Zeeshan.Raza Research Associate COMSATS University Islamabad, Pakistan Sahiwal Campus hafizraza26@gmail.com Cell# 0092-36-6155501
  • 2. Basic data (BASIC STATISTICS) In the basic statistical data it is represented: • File name: Sec_Ilumina.fastq.txt • File format: conventional • System used: Illumina • Total analyzed sequences: 25000. This is the number of readings analyzed. • Sequences marked with bad quality: 0. • Length of each reading: 38 bases • % GC: 45% • Note: The program will not tell you the sequences that have bad quality you are the one that will correct them and you will mark them.
  • 3.
  • 4. Quality (Q) of the sequences per base (PER BASE SEQUENCE QUALITY) • The quartiles are represented in yellow, the blue line is the median and in red, the mean of the quality. In the X axis, the bases of the readings are represented and each reading has 38 bases. While in the Y, the qualities 0- 34 are represented, distinguishing three zones: • Green zone : 28-34. They correspond to a very good quality. • Orange zone : intermediate quality zone (20-28). • Red zone : area of ​​poor quality (0-20). • The quality of the 25000 readings is represented from each base.
  • 5.
  • 6. Continue… • From the graphical representation, it can be said that when you see the qualities assigned to the first base, they are all very good, since they are in the green zone. At the base 38 there is a lot of dispersion in the qualities (that is why the quartile is so big), that is, there are good qualities and others very bad. • In conclusion, we can say that until the base 22 the qualities are good, but from this they get worse since of the 25000 readings from the base 23 I have some readings in which that base has bad quality. Therefore, I will have to use a program that will remove all the bases on which, for example, Q <25 (the quality is assigned by us) or that make me the average Q <25. In this way, I will have that of the 25,000 readings each will have a different size, there will be readings that have 38 bases and others that do not.
  • 7. Quality of the sequence per tile (PER TILE SEQUENCE QUALITY) • This plot (which only appears for Illumina libraries) shows the tiles of the flow cell in which the sequencing took place. It lets you examine the quality scores of each tile across all bases to see whether a loss of quality was associated with only one part of the flow cell. • Marks on the plot indicate poor quality, for example because the reagents were not filtered or degassed under vacuum, so that bubbles remained in the flow cell and interfered with the optical readout. • In our image no faults appear, since the background is uniformly blue, which indicates that the reagents were degassed and filtered.
  • 9. Quality scores per sequence (PER SEQUENCE QUALITY SCORES) • This module gives an early idea of how many reads will have to be removed, since it shows whether a subset of the sequences has low overall quality. • In the plot, the X axis shows the mean quality of each sequence and the Y axis shows the number of reads with that mean quality. • In our case, more than 3,500 reads have a mean quality of around 29-31, and far fewer reads have low quality.
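The counts behind this plot can be approximated by averaging the quality of each read and tallying the reads per mean value. The sketch assumes Phred+33 encoding and rounds each mean down to an integer; the function name and example strings are hypothetical.

```python
from collections import Counter

def mean_quality_histogram(quality_strings, offset=33):
    """Count reads by their mean Phred quality, rounded down to an integer."""
    counts = Counter()
    for q in quality_strings:
        if not q:
            continue  # skip empty reads to avoid dividing by zero
        scores = [ord(ch) - offset for ch in q]
        counts[sum(scores) // len(scores)] += 1
    return dict(sorted(counts.items()))

# Made-up quality strings: 'C' = Q34, '?' = Q30, '>' = Q29, '#' = Q2.
histogram = mean_quality_histogram(["CCCC", "??>>", "####"])
print(histogram)
```

Reads that fall in the low-quality bins of this histogram are the ones likely to be removed during filtering.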
  • 11. Sequence content per base (PER BASE SEQUENCE CONTENT) • This module reports the proportion of each of the four bases at each position of the reads. • In a random library there should be little difference between the positions of a run, so the lines in this plot should run parallel to each other. • In our case there are differences between some bases, whereas the amount of A should equal that of T, and that of G should equal that of C.
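A minimal sketch of how the four lines of this plot could be computed, assuming reads of equal length; `per_base_content` and the example reads are hypothetical.

```python
from collections import Counter

def per_base_content(sequences):
    """Return, for each position, the percentage of A, C, G and T."""
    result = []
    for column in zip(*sequences):  # one column per base position
        counts = Counter(column)
        total = len(column)
        result.append({base: 100 * counts[base] / total for base in "ACGT"})
    return result

# Four made-up reads.
content = per_base_content(["ACGT", "ACGA", "TCGT", "AGGT"])
for position, row in enumerate(content, start=1):
    print(position, row)
```

In an unbiased library the percentages for A and T, and for G and C, would be expected to stay close to each other at every position.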
  • 13. Guanine and cytosine (GC) content per sequence (PER SEQUENCE GC CONTENT) • This module measures the GC content across the whole length of each sequence (red line) and compares it with a theoretical normal distribution of GC content (blue line). • The X axis shows the mean percentage of G and C, and the Y axis shows the number of reads. • In our case (red line) there are several peaks where there should be a single Gaussian curve. This may indicate: ◦ Adapter dimers ◦ Contamination with other DNA • In poor-quality sequences where the base is uncertain, the sequencer may call a G or a C where it does not belong. • This suggests that the sequencing was not carried out correctly, but after analyzing the file together with its qualities I will be able to correct it: • If there was DNA contamination and the contaminating reads are of good quality, I will not remove them. • Sequences that were misread, with bases wrongly called as G or C, should be removed when correcting.
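The comparison between the observed GC distribution and a normal curve can be sketched as below. This example fits the normal curve from the mean and standard deviation of the observed values, which is an assumption made for illustration; FastQC's own fitting procedure may differ. The function names and example reads are hypothetical.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev

def gc_percent(seq):
    """Percentage of G and C in one read, rounded to an integer."""
    return round(100 * (seq.count("G") + seq.count("C")) / len(seq))

def gc_distribution(sequences):
    """Return observed read counts per %GC and a fitted normal curve."""
    observed = Counter(gc_percent(s) for s in sequences if s)
    values = list(observed.elements())
    theoretical = {}
    if len(values) > 1 and pstdev(values) > 0:
        model = NormalDist(mean(values), pstdev(values))
        # Expected number of reads at each integer %GC under the fitted curve.
        theoretical = {gc: len(values) * model.pdf(gc) for gc in range(101)}
    return observed, theoretical

# Four made-up reads with 100%, 50%, 0% and 50% GC.
observed, theoretical = gc_distribution(["GGCC", "GCAT", "ATAT", "GCAT"])
print(dict(observed))
```

Several separate peaks in `observed` that the single fitted curve cannot follow correspond to the situation described in the slide.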
  • 15. N content per base (PER BASE N CONTENT) • This module reports the percentage of base calls at each position that are N (an unassigned base). • In our case the N content is practically zero: the sequencer did not assign N to uncertain bases but called an A, T, G or C instead. This indicates that the quality of our sequences is not so poor as to require an N.
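The percentage of N calls per position could be computed along these lines, assuming reads of equal length; `per_base_n_percent` and the example reads are hypothetical.

```python
def per_base_n_percent(sequences):
    """Percentage of reads with an N at each base position."""
    return [100 * column.count("N") / len(column) for column in zip(*sequences)]

# Four made-up reads with an N at the first and last positions.
n_content = per_base_n_percent(["ACGN", "ACGT", "NCGT", "ACGT"])
print(n_content)
```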
  • 17. Distribution of sequence lengths (SEQUENCE LENGTH DISTRIBUTION) • Some high-throughput sequencers generate sequence fragments of uniform length, but others produce reads of varying lengths. • This module generates a plot of the size distribution of the fragments in the analyzed file. • In our case the sequences are homogeneous: all 25,000 sequences have 38 bases.
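A length distribution is simply a count of reads per length. The sketch below uses made-up reads; `length_distribution` is a hypothetical name.

```python
from collections import Counter

def length_distribution(sequences):
    """Count how many reads have each length."""
    return dict(sorted(Counter(len(s) for s in sequences).items()))

# Three made-up 38-base reads and one shorter read.
lengths = length_distribution(["A" * 38] * 3 + ["A" * 30])
print(lengths)
```

For an untrimmed file like the one in the slide, the result would be expected to contain a single entry for length 38; after quality trimming, several lengths would appear.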
  • 19. Sequence duplication levels (SEQUENCE DUPLICATION LEVELS) • This module counts the degree of duplication of every sequence in a library and plots the relative number of sequences at each duplication level. • Ideally, sequencing samples the library at random, so that most sequences occur only once. • The plot shows the proportion of the library made up of sequences in each duplication-level bin. There are two lines: • The blue line shows the distribution for the full set of sequences • The red line shows the distribution after deduplication
  • 20. Continued… • For the full set of sequences we can see 3 peaks: • >10: more than 10% of the sequences are fragments that occur, identical from beginning to end, more than 10 times • >100: of the 25,000 reads, 25% are fragments that occur, identical from beginning to end, more than 100 times • >1k: 20% of the sequences are fragments that occur, identical from beginning to end, more than 1,000 times • In genomic DNA, duplicates should not normally be observed. However, they can arise. In general there are two possible types of duplicate in a library: technical duplicates derived from PCR artifacts, and biological duplicates, which are natural collisions where different copies of the same sequence are randomly selected. From the sequences alone there is no way to distinguish between these two types, and both are reported as duplicates here. • In RNA-Seq libraries, some sequences are expected to occur very frequently and others very rarely (low-copy-number transcripts), so a high level of duplication in part of the library is inevitable.
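The two lines of the duplication plot can be approximated by counting how often each exact sequence occurs and grouping the counts into bins. The bin boundaries below (1 to 9, then >10, >50, >100, >500, >1k, >5k, >10k) are an assumption based on the labels mentioned in the slide, with ">10" taken to hold counts from 10 to 49; the function names and example reads are hypothetical.

```python
from collections import Counter

THRESHOLDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100, 500, 1000, 5000, 10000]
LABELS = ["1", "2", "3", "4", "5", "6", "7", "8", "9",
          ">10", ">50", ">100", ">500", ">1k", ">5k", ">10k"]

def level_label(count):
    """Return the label of the highest bin whose threshold the count reaches."""
    label = LABELS[0]
    for threshold, name in zip(THRESHOLDS, LABELS):
        if count >= threshold:
            label = name
    return label

def duplication_levels(sequences):
    """Percent of all reads, and of distinct sequences, in each duplication bin."""
    copies = Counter(sequences)
    total_reads = sum(copies.values())
    distinct = len(copies)
    total_pct = dict.fromkeys(LABELS, 0.0)   # full set of sequences
    dedup_pct = dict.fromkeys(LABELS, 0.0)   # after deduplication
    for count in copies.values():
        label = level_label(count)
        total_pct[label] += 100 * count / total_reads
        dedup_pct[label] += 100 / distinct
    return total_pct, dedup_pct

# 20 made-up reads: one sequence 12 times, one twice, six unique.
reads = ["AAAA"] * 12 + ["CCCC"] * 2 + ["ACGT", "AGCT", "TTTT", "GGGG", "CATG", "TGCA"]
total_pct, dedup_pct = duplication_levels(reads)
print(total_pct[">10"], dedup_pct["1"])
```

In this toy example a single highly duplicated sequence accounts for 60% of the reads but only 12.5% of the distinct sequences, which is why the two lines of the plot can look very different.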
  • 22. Overrepresented sequences • This module lists the sequences that occur far more often than expected. Such sequences can cause problems at the mapping stage (when I align the reads against a reference transcriptome by homology) or when I try to assemble them. • It can also reveal adapter dimers: since you know the sequence of the adapter that was ligated, if that sequence appears in the list you know that the adapter itself has been sequenced, forming dimers.
  • 24. Adapter content (ADAPTER CONTENT) • An obvious class of sequences that you may want to look for are adapter sequences. It is useful to know whether the library contains a significant amount of adapter in order to decide whether adapter trimming is needed. • This module therefore performs a specific search for a separately defined set of k-mers and shows the total proportion of your library that contains these k-mers.
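The idea of the adapter search can be sketched as a cumulative count: once an adapter k-mer has been seen in a read, that read counts as containing adapter at every later position. The adapter sequence used here (`AGATCGGAAGAGC`, commonly listed as the Illumina universal adapter) and the function name `adapter_content` are assumptions for illustration, and the example reads are made up.

```python
def adapter_content(sequences, adapter="AGATCGGAAGAGC"):
    """Cumulative percentage of reads in which the adapter has been seen, per position."""
    if not sequences:
        return []
    length = max(len(s) for s in sequences)
    hits = [0] * length
    for seq in sequences:
        start = seq.find(adapter)
        if start != -1:
            # Count the read at the adapter start and at every later position.
            for pos in range(start, length):
                hits[pos] += 1
    return [100 * h / len(sequences) for h in hits]

# Four made-up 17-base reads; two of them contain the adapter.
adapter = "AGATCGGAAGAGC"
reads = ["ACGT" + adapter, adapter + "ACGT", "ACGTACGTACGTACGTA", "T" * 17]
percent = adapter_content(reads, adapter)
print(percent[0], percent[4], percent[-1])
```

A curve that rises toward the end of the reads suggests that adapter trimming is worth applying before mapping or assembly.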