The document discusses sources affecting next-generation sequencing (NGS) quality and how to identify problematic NGS samples. It analyzes base sequencing quality, quality trimming, biases from base composition, potential contaminations, and gene content of two samples (A and B). Sample B showed poorer base quality, more unmapped reads, and evidence of Proteobacteria contamination compared to Sample A. Further quality control is recommended to identify issues before assembly.
1. Learn from Practice - What Tells You about a Problematic NGS
Dongyan
Postdoctoral Research Associate
Buell Lab/Jiang Lab
2015.4.8
2. Sources affecting NGS
   1. Systematic variation in quality scores across the sequence read
   2. Quality trimming and cleaning of raw reads
   3. Biases in sequence generation driven by base composition
   4. Contamination from known and unknown species other than the sequencing target
   5. NGS libraries on assembly quality
   6. others
   7. ………………………………….
4. Per base quality score
[Figure: per-base quality score plots for forward and reverse reads, Sample A and Sample B]
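A per-base quality plot summarises the Phred scores at each read position. The sketch below shows one way such a summary could be computed from FASTQ quality strings. It assumes Phred+33 encoding, which the slides do not state, and `mean_quality_per_position` is a hypothetical helper rather than part of any tool named in the slides.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality_per_position(quality_strings, offset=33):
    """Average Phred score at each read position; reads may differ in length."""
    totals, counts = [], []
    for qual in quality_strings:
        for i, score in enumerate(phred_scores(qual, offset)):
            if i == len(totals):
                # First read that reaches this position: open a new column.
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Toy quality strings, not data from the slides: 'I' = Q40, '5' = Q20, '#' = Q2.
    toy_reads = ["III5#", "II55#", "III"]
    print(mean_quality_per_position(toy_reads))
```

A steady drop in the mean towards the 3' end of the read is the kind of systematic variation that the per-base plot is meant to expose.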
5. Library QC using Bioanalyzer
[Figure: Bioanalyzer traces for Sample A and Sample B; size labels at 200, 300, 400, 500, 700, and 800 bp]
Adapted from the report generated by Emily Crisovan (Buell lab)
6. Cause for the poor base quality for Sample B
Illumina flowcells may not handle longer fragments well
Bronner et al., 2009
10. k-mer content (residual adapter sequences) – paired-end reads
[Figure: k-mer content before cleaning and after cleaning]
• Only happened to paired-end libraries with small insert size (<400 bp).
• Did not happen to paired-end libraries with insert size greater than 400 bp.
11. k-mer content
• This is due to ‘reading through’ a short fragment into the adapter sequence on the other end.
• The default threshold of the clip is too high?
• ILLUMINACLIP:TruSeq3-PE.fa:2:30:10
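The read-through effect and the clip settings can be illustrated with a short sketch. The first function counts how many adapter bases end up in a read when the fragment is shorter than the read; the 100 bp read length in the usage note is an assumption, since the slides do not give one. The second function splits a Trimmomatic ILLUMINACLIP step into its documented fields (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold). Both function names are hypothetical.

```python
def adapter_bases_in_read(fragment_length, read_length):
    """Adapter bases at the 3' end of a read that runs past the fragment end."""
    return max(0, read_length - fragment_length)


def parse_illuminaclip(step):
    """Split an ILLUMINACLIP step string into its first four named settings."""
    fields = step.split(":")
    if len(fields) < 5 or fields[0] != "ILLUMINACLIP":
        raise ValueError("not an ILLUMINACLIP step: " + step)
    return {
        "adapter_fasta": fields[1],
        "seed_mismatches": int(fields[2]),
        "palindrome_clip_threshold": int(fields[3]),
        "simple_clip_threshold": int(fields[4]),
    }


if __name__ == "__main__":
    # Assumed 100 bp reads: an 80 bp fragment leaves 20 adapter bases in the read.
    print(adapter_bases_in_read(80, 100))
    print(parse_illuminaclip("ILLUMINACLIP:TruSeq3-PE.fa:2:30:10"))
```

With these settings, a short adapter remnant at the very end of a read may score below the simple clip threshold and survive trimming, which would be consistent with the question raised on the slide.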
12. k-mer content (residual adapter sequences) – mate pair reads
[Figure: k-mer content after cleaning and grouping reads into categories using NextClip]
• Those k-mers are from the junction adapter
13. k-mer content
• Did not want to lower the threshold, in case it clipped more than necessary
• Used cutadapt with its default settings to remove the residual adapter sequences after Trimmomatic and/or NextClip cleaning
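To show what removing a residual 3' adapter means in practice, here is a deliberately simplified sketch. It is not cutadapt's algorithm: cutadapt uses error-tolerant alignment, while this stand-in only handles exact matches, either a full adapter inside the read or an adapter prefix at the read's 3' end. The function name, the `min_overlap` default, and the toy sequences are all assumptions made for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove an exact adapter match (full, or partial at the 3' end) from a read."""
    full = read.find(adapter)
    if full != -1:
        # The whole adapter is present: drop it and everything after it.
        return read[:full]
    # Otherwise look for the longest read suffix that equals an adapter prefix.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # illustrative adapter prefix, assumed for this example
    print(trim_3prime_adapter("ACGTACGTAGATCGG", adapter))       # partial adapter at the end
    print(trim_3prime_adapter("ACGTAGATCGGAAGAGCTTT", adapter))  # full adapter inside the read
    print(trim_3prime_adapter("ACGTACGT", adapter))              # no adapter
```

Requiring a minimum overlap is the trade-off the slide alludes to: a lower value removes more adapter remnants but also clips more genuine sequence by chance.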
14. Residual adapter on assembly
[Figure: assembly w/ residual adapter vs. w/o residual adapter]
Never rush to assembly before you are sure you have high-quality and ‘clean’ read sets!
18. Per sequence GC content
[Figure: plots for Sample A and Sample B; panel label: SGA preQC]
Contaminations?
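A per-sequence GC distribution can be approximated with a small histogram. The sketch below is a minimal illustration with hypothetical helper names and toy reads; it is not how FastQC or SGA preQC compute their plots. A second peak away from the expected GC content of the target genome is one hint that reads from another organism may be present.

```python
from collections import Counter


def gc_percent(seq):
    """GC percentage of a sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def gc_histogram(reads, bin_width=10):
    """Count reads per GC bin; each key is the lower edge of a bin."""
    hist = Counter()
    for seq in reads:
        lower = int(gc_percent(seq) // bin_width) * bin_width
        # Put reads with 100% GC into the top bin so bins cover 0 to 100.
        lower = min(lower, 100 - bin_width)
        hist[lower] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    toy_reads = ["ATAT", "ATGC", "GGCC", "GCGC"]  # toy data, not from the slides
    print(gc_histogram(toy_reads))
```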
19. QC
• Map reads back to the assembly
• Taxon-Annotated Gene Content
• MAKER annotation of the assembly
• OrthoMCL analysis
20. Mapping reads to the assemblies
• Assembled reads using ABySS
• Map reads back to the assembly using Bowtie/1.0.0 in single end mode allowing 1 mismatch

Sample B assembly
reads       mapped    unmapped
Sample A    73.37%    26.63%
Sample B    60.94%    39.06%

Contaminations?
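The mapped and unmapped percentages follow directly from read counts. The sketch below shows the arithmetic with a hypothetical helper and then checks that the percentages reported on the slide are internally consistent. The counts in the usage line are made up for illustration; the slides give only percentages.

```python
def mapping_rates(mapped_reads, total_reads):
    """Return (mapped %, unmapped %) rounded to two decimals."""
    mapped_pct = 100.0 * mapped_reads / total_reads
    return round(mapped_pct, 2), round(100.0 - mapped_pct, 2)


# Percentages as reported on the slide: (mapped, unmapped).
REPORTED = {"Sample A": (73.37, 26.63), "Sample B": (60.94, 39.06)}


def unmapped_gap(reported):
    """Difference in unmapped percentage between Sample B and Sample A."""
    return round(reported["Sample B"][1] - reported["Sample A"][1], 2)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to reproduce the first reported row.
    print(mapping_rates(7337, 10000))
    print(unmapped_gap(REPORTED))
```

The gap of roughly 12 percentage points in unmapped reads is what prompts the contamination question on the slide.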
21. TAGC
[Figure: TAGC plots for Sample A and Sample B]
https://github.com/blaxterlab/blobology
https://github.com/mojones/blobsplorer
24. Maker annotation

Data used            #contig>1000bp
Sample A assembly    75,417
Sample B assembly    92,833

• EST evidence
  • caa_assembly.fasta (Elsa)
• Protein homology evidence:
  • uniprot_sprot_plants.fasta
  • TAIR10_pep_20110103_representative_gene_model
• Repeat masking: default

                           Sample A      Sample B
Num_of_transcripts         31,234        45,791
Max_len_trans              14,796        29,577
Min_len_trans              28            33
N50                        17,253,963    27,945,180
N50 transcript size        1,409         1,498
Average transcript size    1,105         1,221

With help from Kevin Childs
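For readers unfamiliar with the N50 statistic in the table, here is a minimal sketch of the standard definition: the length L such that sequences of length L or longer contain at least half of all bases. The slides do not say which script produced the N50 rows, so this is an illustration of the general statistic, not the author's exact calculation. The lengths in the usage line are toy values.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    # Toy transcript lengths, not data from the slides.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```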
25. OrthoMCL analysis
• OrthoMCL DB (web-based)
  – http://www.orthomcl.org/orthomcl/
  – search against predefined sets of orthologous groups from a set of organisms
26. OrthoMCL analysis
Steps:
1. All-vs-all BLASTP of the proteins
2. Compute percent match length
   - Select whichever is shorter, the query or subject sequence. Call that sequence S.
   - Count all amino acids in S that participate in any HSP.
   - Divide that count by the length of S and multiply by 100.
3. Apply thresholds to blast result. Keep matches with E-value < 1e-5, percent match length >= 50%.
4. Find potential inparalog, ortholog and co-ortholog pairs using the Orthomcl Pairs program. (These are the pairs that are counted to form the Average % Connectivity statistic per group.)
5. Use the MCL program to cluster the pairs into groups.
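Steps 2 and 3 above can be sketched in a few lines. The example assumes each HSP is given as a 1-based, inclusive (start, end) interval on the query and on the subject, and that the query is used when both sequences have the same length; neither detail is specified in the slides. The function names are hypothetical and this is not OrthoMCL's own code.

```python
def percent_match_length(query_len, subject_len, query_hsps, subject_hsps):
    """Percent of the shorter sequence S covered by at least one HSP."""
    # Step 2a: pick the shorter sequence (the query on a tie, by assumption).
    if query_len <= subject_len:
        length, hsps = query_len, query_hsps
    else:
        length, hsps = subject_len, subject_hsps
    # Step 2b: count residues of S that take part in any HSP, without double counting.
    covered = set()
    for start, end in hsps:
        covered.update(range(start, end + 1))
    # Step 2c: divide by the length of S and multiply by 100.
    return 100.0 * len(covered) / length


def passes_thresholds(evalue, match_pct, max_evalue=1e-5, min_match_pct=50.0):
    """Step 3: keep matches with E-value < 1e-5 and percent match length >= 50."""
    return evalue < max_evalue and match_pct >= min_match_pct


if __name__ == "__main__":
    # Toy hit: a 100 aa query against a 200 aa subject with two overlapping HSPs.
    pml = percent_match_length(100, 200, [(1, 30), (21, 60)], [(5, 34), (40, 79)])
    print(pml, passes_thresholds(1e-10, pml))
```

Using a set of covered positions means overlapping HSPs are counted once, which matches the wording "participate in any HSP".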
orthomclResults/
1. orthologGroups: a map between your proteins and OrthoMCL groups.
2. paralogPairs: reciprocal best hits among those proteins in your genome that were not mapped to OrthoMCL groups.
3. paralogGroups: the proteins in paralogPairs clustered into groups by the mcl program.
27. OrthoMCL analysis
orthologGroups
your_protein, orthomcl_group, seq_id_of_best_hit, evalue_mantissa, evalue_exponent, percent_identity, percent_match
• Downloaded the “category”, “species name”, and “abbreviation” info from the website
• Used perl scripts to add the corresponding species name and category to the orthologGroups file
• Calculated # of orthologous groups in each category
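The annotation and counting steps were done with perl scripts that are not shown in the slides. The sketch below is a Python stand-in, not a reconstruction of those scripts. It assumes that `seq_id_of_best_hit` has the form `taxon|protein_id` and that a lookup from taxon code to species name and category abbreviation has been built from the downloaded website info; the taxon codes, group IDs, and protein names in the usage example are hypothetical. The abbreviation-to-category table is the one listed in the slides.

```python
from collections import defaultdict

# Category for each abbreviation, as listed in the slides.
CATEGORY = {
    "ARCH": "Archaea", "FIRM": "Bacteria", "OBAC": "Bacteria", "PROT": "Bacteria",
    "FUNG": "Fungi", "META": "Metazoa", "OEUK": "other Eukaryota",
    "ALVE": "Protist", "AMOE": "Protist", "EUGL": "Protist", "VIRI": "Viridiplantae",
}


def annotate(rows, species_info):
    """Add species name and category to (protein, group, best_hit) rows."""
    for protein, group, best_hit in rows:
        taxon = best_hit.split("|", 1)[0]  # assumes 'taxon|protein_id'
        species, abbrev = species_info[taxon]
        yield protein, group, species, CATEGORY[abbrev]


def groups_per_category(annotated_rows):
    """Count distinct orthologous groups seen in each category."""
    groups = defaultdict(set)
    for _protein, group, _species, category in annotated_rows:
        groups[category].add(group)
    return {category: len(ids) for category, ids in groups.items()}


if __name__ == "__main__":
    # Hypothetical lookup and rows, for illustration only.
    species_info = {"atha": ("Arabidopsis thaliana", "VIRI"),
                    "ecol": ("Escherichia coli", "PROT")}
    rows = [("p1", "OG_1", "atha|X1"), ("p2", "OG_1", "atha|X2"),
            ("p3", "OG_2", "ecol|Y1"), ("p4", "OG_3", "atha|X3")]
    print(groups_per_category(annotate(rows, species_info)))
```

Counting distinct groups rather than rows keeps a group with many member proteins from inflating its category.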
28. OrthoMCL analysis

category           abbreviation
Archaea            ARCH
Bacteria           FIRM
Bacteria           OBAC
Bacteria           PROT
Fungi              FUNG
Metazoa            META
other Eukaryota    OEUK
Protist            ALVE
Protist            AMOE
Protist            EUGL
Viridiplantae      VIRI
29. Orthologous groups

category           abbreviation
Archaea            ARCH
Bacteria           FIRM
Bacteria           OBAC
Bacteria           PROT
Fungi              FUNG
Metazoa            META
other Eukaryota    OEUK
Protist            ALVE
Protist            AMOE
Protist            EUGL
Viridiplantae      VIRI

FIRM: Firmicutes
OBAC: Other Bacteria
PROT: Proteobacteria

[Figure: panels labeled Bacteria and Protist; Sample A, Sample B, Sample A+B]
33. SRA
• Reads from DRR004446.sra and DRR004447.sra are exactly the same

#     Run          # of Spots    # of Bases    Size
1.    DRR004446    14,841,025    2.7G          1.5Gb

#     Run          # of Spots    # of Bases    Size
1.    DRR004447    14,841,025    2.7G          1.5Gb
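Identical spot and base counts are only a hint; whether two runs really contain the same reads can be checked by comparing their content. The sketch below is one hedged way to do so, not the method used for the slides. It assumes both runs have already been converted to plain four-line FASTQ records and that the reads appear in the same order. Header lines are skipped because they normally carry the run accession and would differ even for identical reads. The function names and toy records are hypothetical.

```python
import hashlib


def fastq_digest(lines):
    """SHA-256 over sequence and quality lines, skipping headers and '+' lines."""
    digest = hashlib.sha256()
    for i, line in enumerate(lines):
        if i % 4 in (1, 3):  # line 2 (sequence) and line 4 (quality) of each record
            digest.update(line.strip().encode("ascii"))
            digest.update(b"\n")
    return digest.hexdigest()


def same_reads(lines_a, lines_b):
    """True if both inputs hold the same sequences and qualities in the same order."""
    return fastq_digest(lines_a) == fastq_digest(lines_b)


if __name__ == "__main__":
    # Toy records with different headers but identical content.
    run_a = ["@DRR004446.1", "ACGT", "+", "IIII"]
    run_b = ["@DRR004447.1", "ACGT", "+", "IIII"]
    print(same_reads(run_a, run_b))
```

Because the functions accept any iterable of lines, an open file handle can be passed directly, so large files do not need to be loaded into memory.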
34. Take home message
• You can’t be over cautious with NGS data!
• Always do QC before further analysis!
http://en.wikipedia.org/wiki/DNA_sequencing