Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Data Quality Check and Interpretation
Workflow: Data (FASTQ), FastQC, prinseq-lite, trimming/filtering
perl prinseq-lite.pl -fastq control.fastq -out_format 5 -min_len 50 -min_qual_mean 25
• -fastq: input FASTQ file
• -out_format: output format: 1 (FASTA only), 2 (FASTA and QUAL), 3 (FASTQ), 4 (FASTQ and FASTA), or 5 (FASTQ, FASTA and QUAL)
• -min_len: filter sequences shorter than the minimum length (here it is 50 nucleotides)
• -min_qual_mean: filter sequences whose mean quality score is below the minimum quality mean (here it is 25)
PRINSEQ:
• generates summary statistics of sequence and quality data
• is used to filter, reformat and trim next-generation sequence data
• is available through a user-friendly web interface or as a
standalone version
Command for quality filtering:
perl prinseq-lite.pl -fastq control.fastq -out_format 5 -min_len 50 -min_qual_mean 25
For further help, type: perl prinseq-lite.pl -h
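The two filters in the command can be illustrated with a minimal Python sketch. This is not prinseq-lite's own code: the function names (parse_fastq, mean_quality, split_reads) are hypothetical, and the quality strings are assumed to use Phred+33 encoding. A read is kept only if it is at least min_len bases long and its mean quality is at least min_qual_mean; everything else goes to the bad set.

```python
import io

def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (assumption: Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

def split_reads(handle, min_len=50, min_qual_mean=25):
    """Split reads into (good, bad) lists using the two thresholds."""
    good, bad = [], []
    for record in parse_fastq(handle):
        _, seq, qual = record
        keep = len(seq) >= min_len and mean_quality(qual) >= min_qual_mean
        (good if keep else bad).append(record)
    return good, bad

# Tiny in-memory example (made-up reads, not the control.fastq data).
records = [
    ("@read1", "A" * 60, "I" * 60),  # long enough, mean Q40: kept
    ("@read2", "A" * 30, "I" * 30),  # shorter than 50 nt: filtered out
    ("@read3", "A" * 60, "#" * 60),  # mean Q2: filtered out
]
demo = io.StringIO("".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records))
good, bad = split_reads(demo)
print(len(good), "good,", len(bad), "bad")
```

The good and bad lists correspond in spirit to the good and bad FASTQ files that prinseq-lite writes.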
The following output was generated by analyzing three files with
FastQC: the original FASTQ file (control_R1.fastq) and the good and
bad FASTQ files generated by prinseq-lite.
Basic Statistics:
Interpretation: The Basic Statistics module generates
simple composition statistics for the file analyzed. It
reports the filename, file type, number of sequences,
sequence length and %GC.
Here all three files (control_R1.fastq, control_R1_good.fastq
and control_R1_bad.fastq) were analyzed. The raw data file
contained > 17 million reads, and the good file generated by
prinseq-lite contained > 16 million reads. The bad file
contained only 0.4 million reads.
Note: Basic Statistics never raises a warning or an error.
Figure panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
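As a rough illustration of what such composition statistics involve, the sketch below computes the number of reads, the read-length range and the overall %GC for a small list of sequences. The function name basic_statistics and the example reads are hypothetical, and the calculation is a simplified stand-in rather than FastQC's own implementation.

```python
def basic_statistics(sequences):
    """Return read count, min/max read length and overall %GC."""
    lengths = [len(s) for s in sequences]
    total_bases = sum(lengths)
    gc_bases = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return {
        "total_sequences": len(sequences),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "percent_gc": 100.0 * gc_bases / total_bases,
    }

# Made-up reads for illustration only.
reads = ["ACGT", "GGCC", "ATATAT"]
stats = basic_statistics(reads)
print(stats)
```

Running the same function on the raw, good and bad read sets would give the kind of side-by-side comparison described above.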
Per Base Sequence Quality
Interpretation: This view shows an overview of the range of quality
values across all bases at each position in the FASTQ file.
For each position a box-and-whisker plot is drawn. The central line
(red in colour) is the median value. The box (yellow in colour)
represents the inter-quartile range (25-75%). The upper and lower
whiskers represent the 10% and 90% points. The line (blue in colour)
that runs across the graph represents the mean quality.
The good file shows the best mean quality in comparison to the raw
data file. The bad file had a very low mean quality, so a failure was
issued, as indicated by a cross mark (red in colour) against per base
sequence quality in the main window of that data set.
Note: A warning is issued if the lower quartile for any base is less
than 10, or if the median for any base is less than 25. A failure is
raised if the lower quartile for any base is less than 5, or if the
median for any base is less than 20.
Figure panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
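The warning and failure rules in the note can be sketched as follows. This is an illustrative approximation, not FastQC's code: the function name per_base_status is hypothetical, reads are assumed to be of equal length with Phred+33 quality strings, and the quartile is computed with Python's statistics.quantiles, which may differ slightly from the way FastQC derives its percentiles.

```python
import statistics

def per_base_status(quality_strings, offset=33):
    """Return 'pass', 'warn' or 'fail' from per-position quality scores."""
    status = "pass"
    # zip(*...) walks the reads column by column (assumes equal-length reads).
    for column in zip(*quality_strings):
        scores = [ord(c) - offset for c in column]
        lower_quartile = statistics.quantiles(scores, n=4, method="inclusive")[0]
        median = statistics.median(scores)
        if lower_quartile < 5 or median < 20:
            return "fail"  # failure thresholds from the note
        if lower_quartile < 10 or median < 25:
            status = "warn"  # warning thresholds from the note
    return status

# Made-up quality strings: 'I' = Q40, 'H' = Q39, '7' = Q22, '#' = Q2.
high = ["IIIII", "IIIII", "HHHHH", "IIIII"]
medium = ["77777"] * 4
low = ["#####"] * 4
print(per_base_status(high), per_base_status(medium), per_base_status(low))
```

A single position that breaks a threshold is enough to change the status, which is why the rule is phrased as "for any base".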
Per Sequence Quality Scores
Interpretation: The per sequence quality score report shows whether a
subset of the sequences has universally low quality values. The area
under the bell-shaped curve was greater for the good file than for the
raw data file, indicating that most of the sequences in the good file
had very good quality in comparison to the raw data file. Average
quality per read is also better in the good file. In the bad file,
most of the sequences had very poor quality. A failure was also
issued, as indicated by a cross mark (red in colour) against per
sequence quality score in the main window of that data set.
Note: A warning is raised if the most frequently observed mean quality
is below 27 (this equates to a 0.2% error rate). A failure is raised
if the most frequently observed mean quality is below 20 (this equates
to a 1% error rate).
Figure panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
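The error rates quoted in the note follow from the Phred definition, p = 10^(-Q/10): Q27 gives about 0.2% and Q20 gives 1%. The sketch below shows that conversion together with the "most frequently observed mean quality" rule. The function names error_rate and per_sequence_status are hypothetical, Phred+33 encoding is assumed, and rounding each read's mean to the nearest integer is a simplification that may not match FastQC exactly.

```python
from collections import Counter

def error_rate(phred):
    """Probability of a wrong base call for a given Phred score."""
    return 10 ** (-phred / 10)

def per_sequence_status(quality_strings, offset=33):
    """Return (most frequent mean quality, 'pass' | 'warn' | 'fail')."""
    means = [
        round(sum(ord(c) - offset for c in q) / len(q))
        for q in quality_strings
    ]
    mode_quality, _ = Counter(means).most_common(1)[0]
    if mode_quality < 20:
        status = "fail"
    elif mode_quality < 27:
        status = "warn"
    else:
        status = "pass"
    return mode_quality, status

# Made-up quality strings: 'I' = Q40, '#' = Q2.
mostly_good = ["IIII", "IIII", "####"]
mostly_bad = ["####", "####", "IIII"]
print(per_sequence_status(mostly_good), per_sequence_status(mostly_bad))
print(f"Q27: {error_rate(27):.2%}  Q20: {error_rate(20):.2%}")
```

Because the rule looks only at the most common mean quality, a minority of poor reads does not trigger a warning by itself; it shows up instead as a secondary peak at the low end of the plot.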
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Control_R1_good.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Per Sequence Quality Scores
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretat
subset of y
under the b
file indicatin
file in com
better in the
very poor q
(red in colo
that data se
Note : A wa
below 27 -
most freque
error rate.
Control_R1_bad.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Per Sequence Quality Scores
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation: The per sequence quality score report allows us to see if a
subset of your sequences have universally low quality values. The area
under the bell shaped curve was greater for the good file than the raw data
file indicating that most of the sequences had very good quality in the good
file in comparison to the raw data file.Average quality per read is also
better in the good file. In the bad file most of the sequences in the bad had
very poor quality. A failure was also issued as indicated by a cross mark
(red in colour) against per sequence quality score in the main window of
that data set.
Note : A warning is raised if the most frequently observed mean quality is
below 27 - this equates to a 0.2% error rate. A failure error is raised if the
most frequently observed mean quality is below 20 - this equates to a 1%
error rate.
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation: Per Base Sequence Content plots, for each base position in
a file, the proportion of calls made for each of the four normal DNA bases.
In a random library you would expect little to no difference between the
bases across a sequence run, so the lines in this plot should
run parallel with each other. The relative amount of each base
should reflect the overall amount of these bases in the genome,
but in any case they should not be hugely imbalanced from each
other. In all three of our files, however, a failure is issued because the
A versus T and G versus C percentages vary from the start of the
reads up to the 14th base. This suggests that the reads can be
trimmed at the 5' end up to the 14th base.
Note: A warning is issued if the difference between A and T, or G
and C, is greater than 10% at any position, and a failure is issued if
the difference is greater than 20% at any position.
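The warning and failure rule in the note can be sketched as follows. This is a simplified illustration, not FastQC's own code: it assumes a non-empty list of reads held in memory, and the function names are hypothetical.

```python
from collections import Counter


def per_base_content(reads):
    """Return one dict per read position with the percentage of A, C, G and T."""
    longest = max(len(read) for read in reads)
    table = []
    for pos in range(longest):
        counts = Counter(read[pos] for read in reads if len(read) > pos)
        called = sum(counts[base] for base in "ACGT")  # ignore N and other symbols
        table.append(
            {base: 100.0 * counts[base] / called if called else 0.0 for base in "ACGT"}
        )
    return table


def sequence_content_status(table):
    """Warn above a 10% A/T or G/C difference at any position, fail above 20%."""
    worst = max(
        max(abs(pos["A"] - pos["T"]), abs(pos["G"] - pos["C"])) for pos in table
    )
    if worst > 20:
        return "fail"
    if worst > 10:
        return "warn"
    return "pass"
```

Applied to real data, the table would be built from the sequence lines of the FASTQ file; a single strongly biased position is enough to trigger the failure.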
Control_R1.fastq Control_R1_good.fastq
Control_R1_bad.fastq
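The 5' trimming suggested in the interpretation above amounts to dropping the first 14 characters of both the sequence line and the quality line of every record. The sketch below shows that operation on plain four-line FASTQ records; the function name and the default of 14 bases are illustrative assumptions.

```python
from itertools import islice


def trim_fastq_records(lines, n_bases=14):
    """Yield FASTQ records with the first n_bases removed from sequence and quality."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:  # end of input (or an incomplete trailing record)
            break
        header, seq, plus, qual = record
        # Sequence and quality strings must stay the same length after trimming.
        yield header, seq[n_bases:], plus, qual[n_bases:]
```

In practice the trimming is usually done by the filtering tool itself; prinseq-lite lists a -trim_left option for this purpose, which can be confirmed with perl prinseq-lite.pl -h.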
Interpretation: The Per Sequence GC Content module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kind
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure is issued for the bad file. The deviation from
the normal distribution is clearly greater in the bad file output than in the
good and raw file outputs.
Note: A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. A failure is issued if the
sum of the deviations represents more than 30%
of the reads.
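The comparison described above can be approximated in a few lines. This is a rough sketch under stated assumptions, not FastQC's implementation: the normal curve is fitted with the mean and standard deviation of the observed per-read GC percentages, reads are assumed to be non-empty strings, and all function names are hypothetical.

```python
import math
from collections import Counter


def gc_percent(read):
    """GC content of one read, rounded to a whole percent."""
    gc = sum(1 for base in read.upper() if base in "GC")
    return round(100.0 * gc / len(read))


def gc_deviation_percent(reads):
    """Sum of |observed - expected| read counts over all GC bins, as % of reads."""
    observed = Counter(gc_percent(read) for read in reads)
    total = sum(observed.values())
    mean = sum(gc * n for gc, n in observed.items()) / total
    variance = sum(n * (gc - mean) ** 2 for gc, n in observed.items()) / total
    sd = math.sqrt(variance) or 1.0  # avoid dividing by zero for identical reads
    deviation = 0.0
    for gc in range(101):
        # Expected read count in this 1%-wide bin under the fitted normal curve.
        density = math.exp(-((gc - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))
        deviation += abs(observed.get(gc, 0) - total * density)
    return 100.0 * deviation / total


def gc_status(deviation_percent):
    """Apply the thresholds from the note: above 15% warns, above 30% fails."""
    if deviation_percent > 30:
        return "fail"
    if deviation_percent > 15:
        return "warn"
    return "pass"
```

A library with a second GC peak, for instance from contamination, would push the summed deviation up and move the result from pass towards warn or fail.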
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Per Sequence GC Content
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpret
length of
Control_R1.fastq
Cont
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Per Sequence GC Content
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation :This module measures the GC content across the whole
Control_R1_good.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Per Sequence GC Content
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretat
length of e
distribution
a roughly
correspond
shaped dis
of biased s
files and a
the normal
good and ra
Note : A wa
distribution
sum of the
of the reads
Control_R1_bad.fastq
Computational Biology and Genomics Facility, Indian Veterinary Research Institute
Interpretation :This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kinds
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure was issued for the bad file. It is clear that the deviation from
the normal distribution is greater in the bad file output in comparison to the
good and raw file outputs.
Note : A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. And failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
Per Sequence GC Content
Interpretation: This module measures the GC content across the whole
length of each sequence in a file and compares it to a modeled normal
distribution of GC content. In a normal random library you would expect to see
a roughly normal distribution of GC content where the central peak
corresponds to the overall GC content of the underlying genome. An unusually
shaped distribution could indicate a contaminated library or some other kind
of biased subset. In our data sets a warning is raised for the raw and good
files and a failure is issued for the bad file. The deviation from the normal
distribution is clearly greater in the bad file than in the good and raw
files.
Note: A warning is raised if the sum of the deviations from the normal
distribution represents more than 15% of the reads. A failure is issued if the
sum of the deviations from the normal distribution represents more than 30%
of the reads.
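The comparison described above can be sketched in a few lines of Python. This is only an illustration, not FastQC's own code: the function names are hypothetical, and fitting the normal curve from the mean and standard deviation of the observed per-read GC values is an assumption made for simplicity.

```python
import math
from collections import Counter

def gc_percent(seq):
    """GC percentage of one read, rounded to the nearest integer."""
    seq = seq.upper()
    if not seq:
        return 0
    gc = sum(1 for base in seq if base in "GC")
    return round(100 * gc / len(seq))

def gc_deviation_percent(reads):
    """Sum of |observed - expected| over all GC bins, as a % of reads.

    Assumption: the reference normal curve uses the mean and standard
    deviation of the observed per-read GC values.
    """
    counts = Counter(gc_percent(r) for r in reads)
    total = sum(counts.values())
    mean = sum(g * c for g, c in counts.items()) / total
    var = sum(c * (g - mean) ** 2 for g, c in counts.items()) / total
    sd = math.sqrt(var) or 1.0  # avoid division by zero for identical reads
    deviation = 0.0
    for g in range(101):
        density = math.exp(-((g - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))
        deviation += abs(counts.get(g, 0) - total * density)
    return 100 * deviation / total

def gc_status(deviation_pct):
    """Apply the 15% / 30% thresholds quoted in the note above."""
    if deviation_pct > 30:
        return "fail"
    if deviation_pct > 15:
        return "warn"
    return "pass"
```

Passing the reads of each file through gc_deviation_percent and then gc_status would give a pass/warn/fail label comparable in spirit to the one FastQC reports.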
Per Base N Content
Interpretation: This module plots the percentage of base calls at each
position for which an N was called. It is not unusual to see a very low
proportion of Ns appearing in a sequence, especially near the end of a
sequence. Here no Ns were found in any of the three datasets.
Note: A warning is raised if any position shows an N content of >5%, and a
failure is issued if any position shows an N content of >20%.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
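As a rough illustration of what this module reports, the per-position N percentage can be computed as below. The function names are hypothetical and the reads are assumed to be plain sequence strings already parsed from the FASTQ file.

```python
def n_percent_by_position(reads):
    """Percentage of N calls at each read position (reads is a list of strings)."""
    max_len = max((len(r) for r in reads), default=0)
    n_counts = [0] * max_len
    totals = [0] * max_len
    for read in reads:
        for i, base in enumerate(read.upper()):
            totals[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [100 * n / t for n, t in zip(n_counts, totals)]

def n_status(percentages):
    """Apply the >5% (warning) and >20% (failure) thresholds from the note."""
    worst = max(percentages, default=0.0)
    if worst > 20:
        return "fail"
    if worst > 5:
        return "warn"
    return "pass"
```

For the three files discussed here every position would give 0%, so n_status would return "pass".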
Sequence length Distribution
Interpretation: This module generates a graph showing the distribution of fragment
sizes in the file which was analysed. In many cases this will produce a simple graph
showing a peak at only one size, but for variable-length FASTQ files this will show
the relative amounts of each different size of sequence fragment. Here all three
files contain reads with a length of 101 bp.
Note: A warning is raised if all sequences are not the same length. A failure is
issued if any of the sequences have zero length.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
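The length check is simple enough to sketch directly. The helper names below are hypothetical, and the reads are assumed to be sequence strings.

```python
from collections import Counter

def length_distribution(reads):
    """Map each read length to the number of reads with that length."""
    return Counter(len(read) for read in reads)

def length_status(distribution):
    """Fail on any zero-length read, warn if lengths differ, otherwise pass."""
    if 0 in distribution:
        return "fail"
    if len(distribution) > 1:
        return "warn"
    return "pass"
```

With reads that are all 101 bp long, as in the three files here, the distribution has a single entry and the status is "pass".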
Overrepresented Sequences
Control_R1.fastq: No Overrepresented Sequences
Control_R1_good.fastq: No Overrepresented Sequences
Control_R1_bad.fastq:
Sequence: GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAACTCTCCGTATGC
Count: 3950
Percentage: 0.8478159616915967
Possible Source: TruSeq Adapter, index 4 (110% over 50 bp)
Interpretation: Overrepresented sequences include the sequences that
are highly duplicated in your library, as well as any primer and/or adapter
dimers that were present in the original library. Adapter sequences are
always present in a sequencing experiment at some level, but aren't
problematic in small percentages. These adapters will not align to your
genome. They can be ignored, or you may use analysis software to
remove them. This module lists all sequences that make up more than
0.1% of the total. To conserve memory, only sequences that appear in
the first 200,000 sequences are tracked to the end of the file. It is
therefore possible that a sequence which is overrepresented but does
not appear at the start of the file could be missed by this module.
Here we find an overrepresented adapter sequence only in the bad file.
Note: A warning is raised if any sequence is found to represent more
than 0.1% of the total and a failure is issued if any sequence is found to
represent more than 1% of the total.
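The tracking rule described above can be sketched as follows. This is not FastQC's implementation: the function names are hypothetical, and truncating each read to its first 50 bases before counting is an assumption suggested by the "over 50 bp" match in the table.

```python
from collections import Counter

def overrepresented(reads, track_limit=200_000, max_len=50):
    """List (sequence, count, percentage) for sequences above 0.1% of all reads.

    Only sequences first seen within the first track_limit reads are counted.
    Assumption: reads are truncated to max_len bases before comparison.
    """
    counts = Counter()
    total = 0
    for read in reads:
        total += 1
        key = read[:max_len]
        if key in counts:
            counts[key] += 1
        elif total <= track_limit:
            counts[key] = 1
    return [(seq, n, 100 * n / total)
            for seq, n in counts.most_common()
            if 100 * n / total > 0.1]

def overrep_status(percentage):
    """Apply the >0.1% (warning) and >1% (failure) thresholds from the note."""
    if percentage > 1:
        return "fail"
    if percentage > 0.1:
        return "warn"
    return "pass"
```

For example, overrep_status(0.8478159616915967), the value reported for the adapter in the bad file, returns "warn".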
Duplicate sequences
Interpretation: This module counts the degree of duplication for every
sequence in the set and creates a plot showing the relative number of
sequences with different degrees of duplication. The analysis covers only
the first 200,000 different sequences seen. The number of occurrences of
these sequences is then tracked through the rest of the file, but any new
sequences after the first 200,000 are discarded. Also, any sequences with
more than 10 duplicates are placed into the 10+ duplicates category, so it
is not unusual to see a small rise in this final category. If the duplicate
plot falls slowly from the unique sequences, showing appreciable proportions
of the library with duplication levels of 3-5 and a small spike in the 10+
bin, there may be a biological rather than a technical cause. The most
common type of library to produce this type of plot is an RNA-Seq library.
In this type of library it is expected that some sequences will occur very
frequently and others will be very rare. If you want to see the very rare
sequences (e.g. low copy number transcripts), then you will have to greatly
over-sequence the most frequent sequences (e.g. housekeeping genes), so a
high level of duplication in part of the library is unavoidable. This is why
a warning is raised for the raw and good files.
Note: A warning is raised if non-unique sequences make up more than
20% of the total. A failure is issued if non-unique sequences make up
more than 50% of the total.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
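A minimal sketch of the counting scheme described above is given below. The function names are hypothetical, and treating "non-unique" as the share of tracked reads whose sequence occurs more than once is an assumption about how the percentage is defined.

```python
from collections import Counter

def duplication_levels(reads, track_limit=200_000):
    """Return (levels, non_unique_pct) for the tracked sequences.

    levels maps a duplication level (1..10, where 10 means "10 or more")
    to the number of distinct sequences at that level.
    """
    counts = Counter()
    for read in reads:
        if read in counts:
            counts[read] += 1
        elif len(counts) < track_limit:
            counts[read] = 1  # only the first track_limit distinct sequences
    levels = Counter(min(n, 10) for n in counts.values())
    tracked = sum(counts.values())
    non_unique = sum(n for n in counts.values() if n > 1)
    pct = 100 * non_unique / tracked if tracked else 0.0
    return levels, pct

def duplication_status(non_unique_pct):
    """Apply the >20% (warning) and >50% (failure) thresholds from the note."""
    if non_unique_pct > 50:
        return "fail"
    if non_unique_pct > 20:
        return "warn"
    return "pass"
```

Collapsing everything above ten copies into one bin is what produces the small rise in the final category mentioned in the interpretation.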
Overrepresented Kmers
Interpretation: This module counts the enrichment of every 5-mer within the
sequence library. Any k-mer showing more than a 3-fold overall enrichment or a
5-fold enrichment at any given base position will be reported by this module.
Note: A warning is raised if any k-mer is enriched more than 3-fold overall, or
more than 5-fold at any individual position. A failure is issued if any k-mer is
enriched more than 10-fold at any individual base position.
Control_R1.fastq
Control_R1_good.fastq
Control_R1_bad.fastq
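The overall enrichment of a k-mer can be illustrated as an observed/expected ratio. The sketch below is a simplification under stated assumptions: the expected count is derived from the base composition of the whole library, the per-position enrichment is not computed, and the function names are hypothetical.

```python
from collections import Counter

def kmer_enrichment(reads, k=5):
    """Observed/expected ratio for every k-mer seen in the reads.

    Assumption: the expected count of a k-mer is the product of the
    library-wide frequencies of its bases times the number of k-mers counted.
    """
    base_counts = Counter()
    kmer_counts = Counter()
    for read in reads:
        read = read.upper()
        base_counts.update(read)
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    total_bases = sum(base_counts.values())
    total_kmers = sum(kmer_counts.values())
    ratios = {}
    for kmer, observed in kmer_counts.items():
        prob = 1.0
        for base in kmer:
            prob *= base_counts[base] / total_bases
        ratios[kmer] = observed / (prob * total_kmers)
    return ratios

def flagged_kmers(ratios, threshold=3.0):
    """K-mers enriched more than threshold-fold overall (the warning level)."""
    return {kmer for kmer, ratio in ratios.items() if ratio > threshold}
```

A read set made only of A bases gives a ratio of 1.0 for AAAAA, since the k-mer is exactly as common as the base composition predicts.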
FASTQC Analysis of FASTQ Files Before and After Trimming

FASTQC Analysis of FASTQ Files Before and After Trimming

  • 1. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Data (Fastq) FASTQC Prinseqlite Trimming/filtering Data Quality Check and Interpretation
  • 2. Computational Biology and Genomics Facility, Indian Veterinary Research Institute perl prinseq-lite.pl -fastq control.fastq -out_format 5 -min_len 50 -min_qual_mean 25 Input fastq file 1 (FASTA only), 2 (FASTA and QUAL), 3 (FASTQ), 4 (FASTQ and FASTA), or 5 (FASTQ, FASTA and QUAL) Output format Filter sequences shorter than minimum length (here it is 50 nucleotides) Filter sequence with quality score mean below minimum quality mean (here it is 25) PRINSEQ - • generates summary statistics of sequence and quality data • used to filter, reformat and trim next-generation sequence data. • PRINSEQ is available through a user-friendly web interface or as standalone version. Command for quality filtering : perl prinseq-lite.pl -fastq control.fastq -out_format 5 -min_len 50 - min_qual_mean 25 For any further help type :- perl prinseq-lite.pl -h
  • 3. Computational Biology and Genomics Facility, Indian Veterinary Research Institute
  • 4. Computational Biology and Genomics Facility, Indian Veterinary Research Institute
  • 5. Computational Biology and Genomics Facility, Indian Veterinary Research Institute
  • 6. Computational Biology and Genomics Facility, Indian Veterinary Research Institute The output generated from analyzing three files viz. original fastq i.e control_R1.fastq, and the good and bad fastq files generated from prinseq-lite
  • 7. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Basic Statistics: Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq
  • 8. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Interpre simple Control_R1.fastq
  • 9. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Basic Statistics: Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq
  • 10. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence Control_R1_good.fastq
  • 11. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Basic Statistics: Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq Interpretation : The Basic Statistics module generates simple composition statistics for the file analyzed. It gives the filename, file type, Sequences, sequence length and GC % . Here all the three files controlR1.fastq, control_R1 good.fastq and control_R1bad.fastq were analyzed. The raw data file showed > 17 million reads and the good file generated after running prinseq-lite showed > 16 million reads. The bad file showed only 0.4 million reads. Note :- Basic Statistics never raises a warning and never raises an error. Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq
  • 12. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Interpret simple c gives th length an Here all good.fas raw data generate reads. Th Note :- B raises an Control_R1_bad.fastq
  • 16. Per Base Sequence Quality: Interpretation: This view shows an overview of the range of quality values across all bases at each position in the FastQ file.
 For each position a box-and-whisker plot is drawn. The central red line is the median value. The yellow box represents the inter-quartile range (25-75%). The upper and lower whiskers represent the 90% and 10% points, respectively. The blue line that runs across the graph represents the mean quality.
 The good file shows the best mean quality when compared with the raw data file. The bad file had a very low mean quality, so a failure was issued, indicated by a red cross mark against Per Base Sequence Quality in the main window for that data set.
 Note: A warning is issued if the lower quartile for any base is less than 10, or if the median for any base is less than 25. A failure is raised if the lower quartile for any base is less than 5, or if the median for any base is less than 20.
 Panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
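The warning and failure rules quoted in the note can be written out as a short check. This is a minimal Python sketch under stated assumptions, not FastQC's code: the names `per_base_flag` and `lower_quartile` are hypothetical, qualities are assumed to be Phred+33 encoded, records are assumed to be four lines each, and the quartile method may differ slightly from the one FastQC uses. It keeps every score in memory, so it suits small test files only; a file with millions of reads would need per-position histograms instead.

```python
import io
import statistics


def lower_quartile(scores):
    """25th percentile of a list of scores (single values are returned as-is)."""
    if len(scores) < 2:
        return scores[0]
    return statistics.quantiles(scores, n=4, method="inclusive")[0]


def per_base_flag(handle, offset=33):
    """Return 'pass', 'warn' or 'fail' using the per-base thresholds in the note."""
    columns = []  # columns[i] holds the Phred scores observed at read position i
    for line_no, line in enumerate(handle):
        if line_no % 4 != 3:  # the fourth line of each record is the quality string
            continue
        for i, ch in enumerate(line.rstrip("\n")):
            if i == len(columns):
                columns.append([])
            columns[i].append(ord(ch) - offset)

    status = "pass"
    for scores in columns:
        median = statistics.median(scores)
        lq = lower_quartile(scores)
        if lq < 5 or median < 20:      # failure thresholds
            return "fail"
        if lq < 10 or median < 25:     # warning thresholds
            status = "warn"
    return status


# 'I' encodes Q40, '7' encodes Q22 and '#' encodes Q2 in Phred+33.
good = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n"
middling = "@r1\nACGT\n+\n7777\n@r2\nACGT\n+\n7777\n"
bad = "@r1\nACGT\n+\n####\n"
for name, text in [("good", good), ("middling", middling), ("bad", bad)]:
    print(name, per_base_flag(io.StringIO(text)))
```

A single low-quality position is enough to change the result, which is why the bad file fails even though the rule is evaluated position by position.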
  • 17. Per Base Sequence Quality (figure): Control_R1.fastq
  • 19. Per Base Sequence Quality (figure): Control_R1_good.fastq
  • 21. Per Base Sequence Quality (figure): Control_R1_bad.fastq
  • 25. Per Sequence Quality Scores: Interpretation: The per sequence quality score report shows whether a subset of the sequences has universally low quality values. The area under the bell-shaped curve was greater for the good file than for the raw data file, indicating that most of the sequences in the good file had very good quality compared with the raw data file. The average quality per read is also better in the good file. In the bad file most of the sequences had very poor quality, and a failure was issued, indicated by a red cross mark against Per Sequence Quality Scores in the main window for that data set. Note: A warning is raised if the most frequently observed mean quality is below 27, which equates to a 0.2% error rate. A failure is raised if the most frequently observed mean quality is below 20, which equates to a 1% error rate. Panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
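The error rates in the note follow from the Phred definition, error probability = 10^(-Q/10), so Q27 gives roughly 0.2% and Q20 gives 1%. The sketch below illustrates both the conversion and the warning/failure rule in Python. It is an illustration under assumptions, not FastQC's implementation: the function names are hypothetical, qualities are assumed to be Phred+33 encoded in four-line records, per-read means are binned by truncating to an integer, and ties between equally frequent bins are resolved toward the lower quality.

```python
import io
from collections import Counter


def phred_to_error_rate(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def per_sequence_flag(handle, offset=33):
    """Return (most frequent mean quality, status) for a FASTQ stream."""
    mean_counts = Counter()
    for line_no, line in enumerate(handle):
        if line_no % 4 != 3:  # the fourth line of each record is the quality string
            continue
        scores = [ord(ch) - offset for ch in line.rstrip("\n")]
        if scores:
            mean_counts[int(sum(scores) / len(scores))] += 1
    if not mean_counts:
        raise ValueError("no reads found")

    # Highest count wins; on a tie the lower quality is chosen (an assumption).
    mode_q = max(mean_counts, key=lambda q: (mean_counts[q], -q))
    if mode_q < 20:
        status = "fail"
    elif mode_q < 27:
        status = "warn"
    else:
        status = "pass"
    return mode_q, status


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
mostly_good = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\n####\n"
mostly_bad = "@r1\nACGT\n+\n####\n@r2\nACGT\n+\n####\n@r3\nACGT\n+\nIIII\n"
print(per_sequence_flag(io.StringIO(mostly_good)))
print(per_sequence_flag(io.StringIO(mostly_bad)))
print(f"Q27: {phred_to_error_rate(27):.4%}  Q20: {phred_to_error_rate(20):.4%}")
```

Because the rule looks only at the most frequent mean quality, a file can pass while still containing a minority of poor reads, which is what quality filtering with prinseq-lite moves into the bad file.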
  • 26. Per Sequence Quality Scores (figure): Control_R1.fastq
  • 28. Per Sequence Quality Scores (figure): Control_R1_good.fastq
  • 30. Per Sequence Quality Scores (figure): Control_R1_bad.fastq
  • 31. Per Sequence Quality Scores (Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq). Interpretation: The per sequence quality score report shows whether a subset of the sequences has universally low quality values. The area under the bell-shaped curve was greater for the good file than for the raw data file, indicating that most sequences in the good file had very good quality compared with the raw data file. Average quality per read is also better in the good file. In the bad file most of the sequences had very poor quality, and a failure was issued, indicated by a red cross mark against Per Sequence Quality Scores in the main window for that data set. Note: A warning is raised if the most frequently observed mean quality is below 27, which equates to a 0.2% error rate. A failure is raised if the most frequently observed mean quality is below 20, which equates to a 1% error rate.
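The error rates quoted in the note follow from the standard Phred definition, where a quality score Q corresponds to an error probability of 10^(-Q/10). A minimal sketch of that conversion and of the warning/failure cut-offs stated above is shown here; the function names are hypothetical and this is not FastQC's own code.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def classify_per_sequence_quality(most_frequent_mean_quality: float) -> str:
    """Apply the thresholds from the note: below 20 fails, below 27 warns."""
    if most_frequent_mean_quality < 20:
        return "fail"
    if most_frequent_mean_quality < 27:
        return "warn"
    return "pass"


if __name__ == "__main__":
    for q in (27, 20):
        rate = phred_to_error_probability(q) * 100
        print(f"Q{q}: about {rate:.1f}% error rate")
```

Working with the most frequently observed mean quality (the peak of the curve) rather than the overall average matches the wording of the note.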
  • 34. Per Base Sequence Content (Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq). Interpretation: Per Base Sequence Content plots, for each base position in a file, the proportion of each of the four normal DNA bases called at that position. In a random library you would expect little to no difference between the bases across a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in the genome, but in any case they should not be hugely imbalanced from each other. Here, however, a failure is issued for all three files because of variation in the A to T and G to C percentages from the start of the reads up to the 14th base. This suggests that the reads can be trimmed at the 5' end up to the 14th base. Note: A warning is issued if the difference between A and T, or G and C, is greater than 10% at any position, and a failure is issued if it is greater than 20% at any position.
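The per-position check described above can be illustrated with a small sketch that works on read sequences held in memory. It computes the percentage of A, C, G and T at each position, applies the 10% and 20% thresholds from the note, and shows what removing the first 14 bases from the 5' end would look like. All function names are hypothetical, and the in-memory list of reads is an assumption made for brevity.

```python
from collections import Counter


def per_base_content(reads):
    """Return one dict per position mapping each base to its percentage."""
    length = max(len(r) for r in reads)
    content = []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        content.append({b: 100.0 * counts.get(b, 0) / total for b in "ACGT"})
    return content


def classify_base_content(content):
    """Warn above a 10% A/T or G/C difference, fail above 20%."""
    worst = max(
        max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in content
    )
    if worst > 20:
        return "fail"
    if worst > 10:
        return "warn"
    return "pass"


def trim_5prime(reads, n=14):
    """Drop the first n bases of every read (n=14 as suggested in the slide)."""
    return [r[n:] for r in reads]


if __name__ == "__main__":
    reads = ["ACGT", "TGCA"]
    print(classify_base_content(per_base_content(reads)))
```

In practice the trimming itself would be done with a read-processing tool on the FASTQ file so that the quality strings are trimmed together with the bases.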
  • 35. Per Base Sequence Content: Control_R1.fastq (plot)
  • 37. Per Base Sequence Content: Control_R1_good.fastq (plot)
  • 39. Per Base Sequence Content: Control_R1_bad.fastq (plot)
  • 43. Per Sequence GC Content (Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq). Interpretation: This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content. In a normal random library you would expect a roughly normal distribution of GC content, where the central peak corresponds to the overall GC content of the underlying genome. An unusually shaped distribution could indicate a contaminated library or some other kind of biased subset. In our data sets a warning is raised for the raw and good files and a failure is issued for the bad file. The deviation from the normal distribution is clearly greater in the bad file than in the good and raw files. Note: A warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads, and a failure is issued if it represents more than 30% of the reads.
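As a rough illustration of the comparison described above, the sketch below computes the GC percentage of each read, fits a normal curve using the sample mean and standard deviation, and sums the absolute differences between the observed and expected counts as a percentage of the reads. This is a simplified approximation under those assumptions, not FastQC's exact procedure, and the function names are hypothetical. Reads are assumed to be non-empty strings.

```python
import math
from collections import Counter


def gc_percent(read):
    """GC content of one read, rounded to a whole percentage."""
    return round(100.0 * sum(base in "GC" for base in read) / len(read))


def gc_deviation_percent(reads):
    """Sum of |observed - expected| over GC bins, as a percentage of reads."""
    values = [gc_percent(r) for r in reads]
    n = len(values)
    mean = sum(values) / n
    # Fall back to a standard deviation of 1.0 if all reads share one GC value.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n) or 1.0
    observed = Counter(values)
    deviation = 0.0
    for gc in range(101):
        density = math.exp(-((gc - mean) ** 2) / (2 * sd * sd))
        expected = n * density / (sd * math.sqrt(2 * math.pi))
        deviation += abs(observed.get(gc, 0) - expected)
    return 100.0 * deviation / n


def classify_gc_deviation(percent):
    """Warn above 15% of reads, fail above 30%, as stated in the note."""
    if percent > 30:
        return "fail"
    if percent > 15:
        return "warn"
    return "pass"
```

A strongly bimodal GC histogram, such as one produced by a contaminated library, would give a large deviation under this kind of comparison.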
  • 44. Per Sequence GC Content: Control_R1.fastq (plot)
  • 46. Per Sequence GC Content: Control_R1_good.fastq (plot)
  • 48. Per Sequence GC Content: Control_R1_bad.fastq (plot)
  • 52. Per Base N Content (Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq). Interpretation: This module plots the percentage of base calls at each position for which an N was called. It is not unusual to see a very low proportion of Ns in a sequence, especially nearer the end of a sequence. Here no Ns were found in any of the data sets. Note: A warning is raised if any position shows an N content of >5%, and a failure is issued if any position shows an N content of >20%.
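The N content check can be sketched in a few lines: count the N calls at each position, express them as a percentage of the reads covering that position, and compare the worst position against the 5% and 20% thresholds from the note. The function names are hypothetical and the reads are assumed to be plain strings already loaded from the FASTQ file.

```python
def per_base_n_percent(reads):
    """Percentage of N calls at each position across all reads."""
    length = max(len(r) for r in reads)
    percentages = []
    for pos in range(length):
        column = [r[pos] for r in reads if len(r) > pos]
        percentages.append(100.0 * column.count("N") / len(column))
    return percentages


def classify_n_content(percentages):
    """Warn if any position exceeds 5% N, fail if any exceeds 20%."""
    worst = max(percentages)
    if worst > 20:
        return "fail"
    if worst > 5:
        return "warn"
    return "pass"


if __name__ == "__main__":
    reads = ["ACGT", "ACGN"]
    print(per_base_n_percent(reads))
```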
  • 53. Per Base N Content: Control_R1.fastq (plot)
  • 55. Per Base N Content: Control_R1_good.fastq (plot)
  • 57. Per Base N Content: Control_R1_bad.fastq (plot)
  • 58. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Interpretation:This module plots out the percentage of base calls at each position for which an N was called. It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. Here no N’s were found in all the datasets. Note : A warning is raised if any position shows an N content of >5% and a failure is issued if any position shows an N content of >20% Control_R1.fastq Control_R1_good.fastq Control_R1_bad.fastq
  • 59. Computational Biology and Genomics Facility, Indian Veterinary Research Institute Interpretation:This module plots out the percentage of base calls at each position for which an N was called. It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. Here no N’s were found in all the datasets. Note : A warning is raised if any position shows an N content of >5% and a failure is issued if any position shows an N content of >20%
  • 61. Sequence Length Distribution. Interpretation: This module generates a graph showing the distribution of fragment sizes in the analysed file. In many cases this produces a simple graph with a peak at a single size, but for variable-length FASTQ files it shows the relative amounts of each different size of sequence fragment. Here all reads in all three files have a length of 101 bp. Note: A warning is raised if the sequences are not all the same length. A failure is issued if any sequence has zero length. Panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
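The length check described above can be expressed as a short Python sketch. It assumes the reads are already available as a list of strings, and the function names are hypothetical; it mirrors the rule stated on the slide rather than FastQC's source code.

```python
from collections import Counter


def length_distribution(reads):
    """Map each read length to the number of reads of that length."""
    return Counter(len(r) for r in reads)


def length_status(dist):
    """Fail on any zero-length read, warn on mixed lengths, else pass."""
    if dist.get(0, 0) > 0:
        return "fail"
    if len(dist) > 1:
        return "warn"
    return "pass"
```

A file in which every read is 101 bp long yields a single entry in the distribution and therefore "pass".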
  • 70. Overrepresented Sequences. Control_R1.fastq: no overrepresented sequences. Control_R1_good.fastq: no overrepresented sequences. Control_R1_bad.fastq: one overrepresented sequence.
 Sequence: GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAACTCTCCGTATGC; Count: 3950; Percentage: 0.8478159616915967; Possible Source: TruSeq Adapter, Index 4 (110% over 50 bp)
 Interpretation: Overrepresented sequences include sequences that are highly duplicated in your library, as well as any primer and/or adapter dimers that were present in the original library. Adapter sequences are always present in a sequencing experiment at some level, but are not problematic in small percentages. These adapters will not align to your genome. They can be ignored, or you may use analysis software to remove them. This module lists all sequences that make up more than 0.1% of the total. To conserve memory, only sequences that appear in the first 200,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but does not appear at the start of the file could be missed by this module. Here we find an overrepresented adapter sequence in the bad file.
 Note: A warning is raised if any sequence is found to represent more than 0.1% of the total, and a failure is issued if any sequence is found to represent more than 1% of the total.
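The percentage rule for overrepresented sequences can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: it counts every read in memory instead of tracking only the first 200,000 sequences, it takes reads as a list of strings, and the function names are hypothetical.

```python
from collections import Counter


def overrepresented(reads, warn_pct=0.1):
    """List (sequence, count, percentage) for sequences above warn_pct."""
    total = len(reads)
    hits = []
    for seq, n in Counter(reads).most_common():
        pct = 100.0 * n / total
        if pct > warn_pct:
            hits.append((seq, n, pct))
    return hits


def overrepresented_status(hits, fail_pct=1.0):
    """Fail if any hit exceeds fail_pct, warn if any hit exists, else pass."""
    if any(pct > fail_pct for _, _, pct in hits):
        return "fail"
    return "warn" if hits else "pass"
```

With the figures on the slide, 3950 reads at roughly 0.85% of the bad file, the adapter falls between the two thresholds and would be reported as a warning rather than a failure.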
  • 75. Duplicate Sequences. Interpretation: This module counts the degree of duplication for every sequence in the set and plots the relative number of sequences with different degrees of duplication. Only the first 200,000 different sequences seen are analysed: their occurrences are tracked through the rest of the file, and any new sequences appearing after the first 200,000 are discarded. Any sequence with more than 10 duplicates is placed in the 10+ category, so it is not unusual to see a small rise in this final category.
 If the duplicate plot falls slowly from the unique sequences, with appreciable proportions of the library at duplication levels of 3-5 and a small spike in the 10+ bin, there may be a biological rather than a technical cause. The most common type of library to produce this kind of plot is an RNA-Seq library, where some sequences are expected to occur very frequently and others very rarely. To see the very rare sequences (e.g. low-copy-number transcripts) you have to greatly over-sequence the most frequent ones (e.g. housekeeping genes), so a high level of duplication in part of the library is unavoidable. This is why a warning appears for the raw and good files.
 Note: A warning is raised if non-unique sequences make up more than 20% of the total, and a failure is issued if non-unique sequences make up more than 50% of the total.
 Panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
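The binning into duplication levels can be sketched as follows. This is an illustrative simplification, not FastQC's code: it counts all reads rather than only the first 200,000 different sequences, the function names are hypothetical, and "non-unique" is assumed here to mean reads whose sequence occurs more than once.

```python
from collections import Counter


def duplication_levels(reads, cap=10):
    """Count distinct sequences per duplication level, pooling cap and above."""
    levels = Counter()
    for n in Counter(reads).values():
        levels[min(n, cap)] += 1
    return dict(levels)


def non_unique_percent(reads):
    """Percentage of reads whose sequence occurs more than once (assumed definition)."""
    if not reads:
        return 0.0
    non_unique = sum(n for n in Counter(reads).values() if n > 1)
    return 100.0 * non_unique / len(reads)


def duplication_status(pct, warn=20.0, fail=50.0):
    """Apply the slide's thresholds: warn above 20 %, fail above 50 %."""
    if pct > fail:
        return "fail"
    if pct > warn:
        return "warn"
    return "pass"
```

The key `10` in the returned dictionary plays the role of the 10+ bin, which is why a small rise in the last category is expected.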
  • 84. Overrepresented K-mers. Interpretation: This module counts the enrichment of every 5-mer within the sequence library. Any k-mer showing more than a 3-fold overall enrichment or a 5-fold enrichment at any given base position is reported by this module.
 Note: A warning is raised if any k-mer is enriched more than 3-fold overall, or more than 5-fold at any individual position. A failure is issued if any k-mer is enriched more than 10-fold at any individual base position.
 Panels: Control_R1.fastq, Control_R1_good.fastq, Control_R1_bad.fastq
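An observed/expected ratio for 5-mers can be sketched in Python as shown here. The sketch assumes that the expected count of a k-mer is derived from the base composition of the whole library, covers only the overall (not the per-position) enrichment, and uses hypothetical function names; it is an illustration of the idea, not FastQC's implementation.

```python
from collections import Counter
from math import prod


def kmer_enrichment(reads, k=5):
    """Return observed/expected ratio for every k-mer seen in the reads."""
    base_counts = Counter(b for r in reads for b in r)
    total_bases = sum(base_counts.values())
    kmer_counts = Counter(
        r[i:i + k] for r in reads for i in range(len(r) - k + 1)
    )
    total_kmers = sum(kmer_counts.values())
    ratios = {}
    for kmer, observed in kmer_counts.items():
        # Expected count if bases were drawn independently at library frequencies.
        expected = total_kmers * prod(base_counts[b] / total_bases for b in kmer)
        ratios[kmer] = observed / expected
    return ratios


def flag_kmers(ratios, threshold=3.0):
    """K-mers whose overall enrichment exceeds the warning threshold."""
    return sorted(kmer for kmer, ratio in ratios.items() if ratio > threshold)
```

Any k-mer returned by `flag_kmers` would correspond to the 3-fold overall warning; the 5-fold and 10-fold positional rules would need the same ratio computed separately for each base position.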