This document provides an introduction to RNA sequencing (RNA-Seq) applications using next-generation sequencing technologies. It discusses how RNA-Seq can be used to identify which genes are expressed, detect differential gene expression between samples, identify splicing isoforms, and detect genetic variants and structural variations. The document reviews Illumina sequencing by synthesis, the most common platform, outlining the work flow from sample acquisition, RNA extraction and library preparation to sequencing. It also discusses considerations for different sample types and extraction methods.
4. Central Dogma of Molecular Biology
Cartegni, L., Chew, S. L., & Krainer, A. R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Reviews Genetics 3, 285–298 (2002)
Figure 1 from: http://www.nature.com/scitable/topicpage/gene-expression-14121669
Copyright 2010, Nature Education
5. The complexity of gene regulation
Image from: Nature Reviews Genetics 12, 283-293 (April 2011)
Gene expression is influenced by a variety of mechanisms:
- polymerase binding elements
- proximal promoter sequences
- upstream/downstream and distal enhancers/silencers
- microRNA/RNAi
- natural transcript stability and recycling
6. What questions do we want to answer?
SNP and Indel Detection
REF  ATCGGTACCATCCAGCTAAGGCT
S1   ATCGGAACCATCCAGCTAACGCT
S2   ATCGGTACCATC---CTAAGGCT
S3   ATCGGAACCATCCAGCTAAGGCT
S4   ATCGGTA--------CTAAGGCT
• Which genes are expressed?
• In experiments with multiple samples, which genes exhibit differential expression?
• Can we detect the expression of splicing isoforms?
• Can we detect novel genes or isoforms?
• Can we detect structural variants? SNPs, insertions, deletions, RNA editing?
• Can we detect ncRNA that controls gene regulation?
• Can we use differential expression to construct biomarkers for diseases?
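The SNP and indel detection example on this slide can be worked through by hand, and a short script makes the comparison explicit. The following is a minimal sketch, assuming the sample sequences are already aligned to the reference and padded with "-" for deleted bases, as they are on the slide. The function name `call_variants` is hypothetical, and real variant callers work from read alignments with quality information rather than from a single padded string.

```python
# Toy alignment copied from the slide; "-" marks a base deleted relative to REF.
REF = "ATCGGTACCATCCAGCTAAGGCT"
SAMPLES = {
    "S1": "ATCGGAACCATCCAGCTAACGCT",
    "S2": "ATCGGTACCATC---CTAAGGCT",
    "S3": "ATCGGAACCATCCAGCTAAGGCT",
    "S4": "ATCGGTA--------CTAAGGCT",
}


def call_variants(ref, aligned):
    """Compare one gap-padded sequence against the reference.

    Returns (snps, deletions) where
      snps      = [(1-based position, reference base, sample base), ...]
      deletions = [(1-based start position, deleted reference bases), ...]
    """
    if len(ref) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps, deletions = [], []
    i = 0
    while i < len(ref):
        if aligned[i] == "-":
            # Extend over the whole run of gap characters.
            start = i
            while i < len(ref) and aligned[i] == "-":
                i += 1
            deletions.append((start + 1, ref[start:i]))
            continue
        if aligned[i] != ref[i]:
            snps.append((i + 1, ref[i], aligned[i]))
        i += 1
    return snps, deletions


CALLS = {name: call_variants(REF, seq) for name, seq in SAMPLES.items()}

if __name__ == "__main__":
    for name, (snps, deletions) in CALLS.items():
        print(name, "SNPs:", snps, "deletions:", deletions)
```

On the slide's data this reports substitutions in S1 and S3 and deletions in S2 and S4. Insertions are not handled, because the slide's alignment contains no gaps in the reference.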
7. Personalized Cancer Genomics
Mutation
Translocation
Copy Number Variation
Epigenetic Alteration
Protein alteration
Transcriptomic alteration
8. What is RNASeq?
RNASeq means the sequencing of RNA using NGS technology, which means that:
• Any type of RNA from any sample source, such as cells, body fluid, stool, water, etc., can be sequenced
• Samples from different sources require different extraction methods
• Different RNA species with different sizes (e.g. miRNA, snoRNA, tRNA) require different preparation protocols
• In this course, RNASeq refers strictly to the sequencing of mRNA from cells
9. What is RNASeq Analysis?
• Also known as Whole Transcriptome Shotgun Sequencing
• Identification and quantification of an RNA snapshot from a genome at a specific time point
• A method to study how genes are being regulated for a given cell type (e.g. tumor cells vs. normal cells) at a given time using Next Generation Sequencing (NGS)
13. Illumina SBS RNASeq Work Flow
Sample Acquisition -> RNA Extraction -> Library Preparation -> Sequencing
14. Illumina SBS RNASeq Work Flow
Fresh Frozen Tissues
- Sample tissues are frozen to -80 °C or immersed in liquid nitrogen shortly after sample extraction
- All RNA is intact in its natural form, but degrades slowly
- Produces the highest quality data
- Expensive to keep and rare to acquire
Formalin Fixed Paraffin Embedded (FFPE) Samples
- Sample tissues are fixed in formalin and embedded in paraffin wax immediately after extraction
- All RNA is immediately sheared into fragments
- All mature mRNA loses its poly-A tail
- Most common sample type available from the clinic
- Used in pathology labs
- Very cheap to store
15. Illumina SBS RNASeq Work Flow
RNA Extraction Methods
Column based RNA Extraction
- The approach used by most vendor RNA extraction kits
- Fast and convenient
- Can lose small RNA (<100bp) if not careful
Phenol-Chloroform RNA Extraction
- Cheap but labor intensive
- Much higher RNA yield compared to column based extraction
- Preferred method for low quantity RNA samples
- Isolates both long (>100bp) and small RNA (<100bp) simultaneously
16. Illumina SBS RNASeq
RNA Composition within a eukaryotic cell
rRNA: 80%
tRNA: 15%
Other RNA: 5%
• Pre-mRNA and mature mRNA make up a very small portion of total RNA
• MicroRNA, ncRNA, and others make up an even smaller portion
17. Illumina SBS RNASeq
Library Preparation Work Flow for mature mRNA
- RNA Isolation
- Poly-A Purification
- Fragmentation
- Convert RNA to cDNA using random primers
- Adapter ligation
- Size selection
- PCR amplification
19. Sequencing Library Structure
Diagram labels: Adaptor 1, cDNA insert, Adaptor 2, Barcode
Adaptor: 58 bp nucleotide sequence used to fix the sequencing library onto the flow cell
Barcode: optional index sequence, typically 6 nucleotide bases long, for associating a sequence with a particular sample (can be present on both adaptors)
cDNA insert: fragmented cDNA sequence generated from the mRNA of interest. The insert typically ranges between 300-500bp for mRNA
20. Illumina SBS RNASeq
Determine Sequencing Library Quality
Qubit (RNA)
Measures the concentration of only double stranded DNA, more accurate than Nanodrop
Bioanalyzer
Measures the RNA/library size in base pairs
qPCR
Measures the concentration of library that has adaptors ligated and will hybridize and sequence
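A mass concentration (as from Qubit) and a mean fragment size (as from the Bioanalyzer) are commonly combined into a molar concentration before loading a library. The sketch below uses the standard approximation of about 660 g/mol per base pair of double stranded DNA; the function name `library_molarity_nm` and the example values are hypothetical and not taken from the slides.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, da_per_bp=660.0):
    """Convert a dsDNA library concentration to nanomolar.

    conc_ng_per_ul   : mass concentration in ng/uL (e.g. a Qubit reading)
    mean_fragment_bp : mean library fragment length in bp, adaptors included
                       (e.g. a Bioanalyzer estimate)
    da_per_bp        : assumed average mass of one base pair in g/mol
    """
    if conc_ng_per_ul < 0:
        raise ValueError("concentration cannot be negative")
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    # ng/uL equals 1e-3 g/L; dividing by g/mol gives mol/L; 1e9 converts to nM.
    return conc_ng_per_ul * 1e6 / (da_per_bp * mean_fragment_bp)


# Hypothetical example: 2 ng/uL library with a 400 bp mean fragment size.
EXAMPLE_NM = library_molarity_nm(2.0, 400)

if __name__ == "__main__":
    print(f"{EXAMPLE_NM:.2f} nM")
```

This estimate counts every double stranded fragment, whereas qPCR counts only fragments carrying both adaptors, so the two numbers can differ.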
23. Illumina Sequencing Technology
Robust Reversible Terminator Chemistry Foundation
[Figure: workflow from DNA (0.1-1.0 ug) through sample preparation, cluster growth on a single molecule array, sequencing, image acquisition, and base calling (T G C T A C G A T …)]
24. Sources of Error
Sequencing by Synthesis
• Reading error: ccatg -> ccnng
• Single base error: ccatg -> ccttg
• Insertion: ccatg -> ccatcg
• Deletion: ccatg -> cc-tg
• Homopolymer errors: aaaaatg -> aaaa-tg, aaaaaatg, aaaaatag
Cycle 1: few errors per cluster
Cycle n: several errors, ambiguous call
25. The run is finished. How are sequence files created?
.bcl files: raw files containing base calls and quality scores
Data Processing
• Demultiplexing
• Fastq file generation
• Sequencing filtering: Illumina defined quality filters
Split into Project and Sample Folders
- Jones_Lab: ChIP_A, ChIP-B
- Marcus_Lab: RNA-SeqA, RNA-SeqB, RNA-SeqC
- Williams_Lab: Exome1, Exome2
Fastq Files (one set per project folder)
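The demultiplexing step can be illustrated on FASTQ records that already carry their index sequence in the header. This is only a sketch of the idea: Illumina's own software demultiplexes from the .bcl files and typically tolerates index mismatches, whereas this version requires an exact match. The sample sheet, the index ATCACG, the second and third reads, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample sheet: index sequence -> (project folder, sample name).
SAMPLE_SHEET = {
    "CCGTCC": ("Marcus_Lab", "RNA-SeqA"),
    "ATCACG": ("Marcus_Lab", "RNA-SeqB"),
}

# Made-up reads for illustration; only the first header comes from the slides.
FASTQ_TEXT = """@HWI-ST389:225:D18R8ACXX:5:1101:1421:2191 1:N:0:CCGTCC
CTTCAGACGA
+
@@@DDDDFHH
@HWI-ST389:225:D18R8ACXX:5:1101:1500:2200 1:N:0:ATCACG
GGCTTTGCTG
+
FCFFHIJIHI
@HWI-ST389:225:D18R8ACXX:5:1101:1600:2300 1:N:0:GGGGGG
CAGGGTGGGG
+
IIJIHDFHJE
"""


def read_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the "+" separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def demultiplex(records, sample_sheet):
    """Group records by the index sequence at the end of the header."""
    bins = defaultdict(list)
    for header, seq, qual in records:
        index = header.rsplit(":", 1)[-1]
        key = sample_sheet.get(index, ("Undetermined", "Undetermined"))
        bins[key].append((header, seq, qual))
    return dict(bins)


BINS = demultiplex(read_fastq(FASTQ_TEXT.splitlines()), SAMPLE_SHEET)

if __name__ == "__main__":
    for (project, sample), reads in BINS.items():
        print(f"{project}/{sample}: {len(reads)} read(s)")
```

Reads whose index is not in the sample sheet are collected under an "Undetermined" key instead of being dropped, so unassigned reads can still be inspected.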
26. Illumina Fastq Format
Fasta format
>seqID
CTTCAGACGAGTCGAGGAAAGGCTTTGCTGCTTTCCTTTACAGGGTGGGG
Fastq format
@HWI-ST389:225:D18R8ACXX:5:1101:1421:2191 1:N:0:CCGTCC
CTTCAGACGAGTCGAGGAAAGGCTTTGCTGCTTTCCTTTACAGGGTGGGG
+
@@@DDDDFHHFCFFHIJIHIJGIFGIIHIIIJGIIJHIIJIIJIHDFHJE
Illumina Fastq header:
@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<index sequence>
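The header layout above maps directly onto a small parser. The sketch below splits the example header from this slide into named fields; the class and function names are hypothetical, and it assumes the two-part layout shown here (seven colon-separated location fields, a space, then four colon-separated read fields), with "Y" in the filter field meaning the read was filtered out.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool
    control_number: int
    index_sequence: str


def parse_header(line):
    """Parse one Illumina FASTQ header line into its fields."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    location, _, flags = line[1:].strip().partition(" ")
    loc = location.split(":")
    flag = flags.split(":")
    if len(loc) != 7 or len(flag) != 4:
        raise ValueError("unexpected number of header fields")
    return IlluminaHeader(
        instrument=loc[0],
        run_number=int(loc[1]),
        flowcell_id=loc[2],
        lane=int(loc[3]),
        tile=int(loc[4]),
        x_pos=int(loc[5]),
        y_pos=int(loc[6]),
        read=int(flag[0]),
        is_filtered=(flag[1] == "Y"),
        control_number=int(flag[2]),
        index_sequence=flag[3],
    )


# Example header taken from the slide.
HEADER = parse_header("@HWI-ST389:225:D18R8ACXX:5:1101:1421:2191 1:N:0:CCGTCC")

if __name__ == "__main__":
    print(HEADER)
```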
27. Illumina Fastq Format
Quality Scores
@@@DDDDFHHFCFFHIJIHIJGIFGIIHIIIJGIIJHIIJIIJIHDFHJE
• Each nucleotide in a read has an associated quality value (1-40).
• The numerical value is encoded as an ASCII character to save space.
• Each q-value represents a probability that the nucleotide is incorrect at that position: Q(X) = -10 log10(P(~X))
Quality score Q(A)    Error probability P(~A)
10                    0.1
20                    0.01
30                    0.001
40                    0.0001
The slide marks one of these scores as the typical cutoff for acceptable quality.
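The relation Q(X) = -10 log10(P(~X)) can be checked numerically on the quality string from this slide. The sketch below assumes the Phred+33 ASCII offset, which the slides do not state explicitly. Under that assumption the character "J" decodes to 41, slightly above the 1-40 range quoted above. The function names are hypothetical.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding; not stated on the slide


def decode_qualities(quality_string, offset=PHRED_OFFSET):
    """Convert an ASCII quality string into a list of integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Invert Q = -10 * log10(P) to get the per-base error probability."""
    return 10 ** (-q / 10)


# Quality string copied from the slide.
QUAL = "@@@DDDDFHHFCFFHIJIHIJGIFGIIHIIIJGIIJHIIJIIJIHDFHJE"
SCORES = decode_qualities(QUAL)

if __name__ == "__main__":
    print("first five scores:", SCORES[:5])
    print("min / max score:", min(SCORES), max(SCORES))
    for q in (10, 20, 30, 40):
        print(q, error_probability(q))
```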
28. Visualizing Quality with FASTQC
FASTQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FASTQC: A quality control tool for high throughput sequence data.
THE GOOD
29. Visualizing Quality with FASTQC
THE BAD
30. Data Quality Assessment
• Evaluate read library quality
  - Determine if the data is properly generated
• No information on whether the data is what you want
• Identify technical artifacts
• Identify poor quality samples
• Key features to evaluate
  - Uniformity of sequencing quality score (phred score)
  - GC content distribution
  - Level of sequencing adapter contamination
  - Level of sequence duplication (may be caused by PCR artifacts, rRNA contamination, bacterial contamination)
• Filter or trim data as needed using FASTX
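Some of the features listed above are simple enough to compute directly, which helps when interpreting a QC report. The sketch below shows GC content, an exact-duplicate fraction, and 3' quality trimming. It is not how FastQC or FASTX implement these steps; the function names, the Phred+33 offset, and the cutoff of 20 are assumptions made for illustration.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases in a read that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def duplication_level(seqs):
    """Fraction of reads that are exact copies of another read in the list."""
    if not seqs:
        return 0.0
    distinct = len(Counter(seqs))
    return 1 - distinct / len(seqs)


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end until one reaches the quality cutoff."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Made-up reads for illustration only.
    reads = ["ATGC", "ATGC", "GGCC", "TTAA"]
    print("GC content:", [gc_content(r) for r in reads])
    print("duplication level:", duplication_level(reads))
    print("trimmed:", trim_3prime("ACGTAC", "IIII##"))
```

Trimming only from the 3' end reflects the common pattern of quality dropping in later sequencing cycles; a read with poor quality elsewhere would need a different filter.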
31. Use FASTQC on GALAXY
FASTQC - provides a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
GALAXY - a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming experience. (https://galaxyproject.org/)