This document outlines the TUXEDO protocol for analyzing RNA-Seq data. It describes the basic steps as: 1) Alignment of RNA-Seq reads to a reference genome or transcriptome using splice-aware aligners like TopHat. 2) Quantification of gene and transcript expression levels using Cufflinks. 3) Quality control checks and 4) Detection of differential expression between conditions using Cuffdiff. Key points covered include RNA-Seq methodology, gene expression, alignment formats, FPKM normalization, and quality control metrics.
2. OUTLINE
RNA-Seq
Biology of Gene Expression
Concept of Genes and Transcripts
Alignment
RNA-Seq specific aligners
TopHat, STAR
SAM/BAM format
Quantification
Cufflinks (gene level, isoform level, splicing…)
Alternative methods
Post-Alignment Quality Control
Other analysis with RNA-Seq
3. RNA-Seq
The diagram summarizes the RNA-Seq technique: (1) RNA is isolated from a panel of tissues or treatments. (2) A pool of core tissues (e.g., leaf, root, flower, fruit) is used to create the reference cDNA library, which is then sequenced.
4. RNA-Seq
• RNA-Seq basically seeks to answer the following questions:
What genes are expressed in a sample?
What transcripts are expressed for each gene?
What are the expression levels of those genes and transcripts?
How do expression levels and splicing patterns differ between two conditions?
5. Biology of Gene Expression
The process of gene expression is very complex. “Genes take on a body of their own in a process called gene expression.” The process starts with transcription, during which RNA polymerase creates a copy of the gene, nucleotide by nucleotide, as a single-stranded molecule.
While still in the nucleus, the new transcript is highly unstable; it is stabilized by 5’ capping, 3’ cleavage, and addition of a long poly(A) tail.
6. Genes are encoded in the genome, each occupying a specific location.
A gene consists of informative blocks (“exons”) and non-informative blocks (“introns”). When the gene is expressed, the introns are spliced out of the transcript and the exons are joined together to form mRNA.
8. Mapping
How do we match millions of reads (~100-character fragments) against a reference sequence of billions of characters?
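To make the scale problem concrete, here is a minimal sketch of the seed-and-verify idea that index-based mappers rely on. It is a toy under stated assumptions: the function names, the tiny reference string, and the choice of an in-memory k-mer dictionary are all illustrative. Real aligners such as Bowtie use a compressed index (the FM-index, based on the Burrows-Wheeler transform) and tolerate mismatches, which this sketch does not.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return all reference positions where the read matches exactly.

    The first k bases of the read act as a seed: the index proposes
    candidate positions, and each candidate is checked against the full read.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Hypothetical toy reference; a real genome has billions of characters.
reference = "ACGTACGTTAGCCGATAGGCTTACGATCG"
index = build_kmer_index(reference, k=4)

print(map_read("TAGCCGAT", reference, index, k=4))  # single location
print(map_read("ACGT", reference, index, k=4))      # a multi-mapping read
```

The index is built once and reused for every read, so each read costs a dictionary lookup plus a few comparisons instead of a scan of the whole reference.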
9. Mapping for RNA-Seq
A transcript may result from splicing, and the mapping strategy must account for this.
Either we map to the transcriptome, or we map to the genome with a “splice-aware” aligner.
TopHat is one of the first and by far the most popular aligners for RNA-Seq data; it was built “on top” of Bowtie.
10. Mapping for RNA-Seq
TopHat breaks the reads into pieces, maps them first to the genome, and then extends the alignments across possible splice junctions.
We can also pass an annotation file (GTF format) as an argument. TopHat then:
Extracts the transcripts and builds an index
Maps to the “transcriptome” first and then to the genome.
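The idea of breaking a read into pieces and joining them across a junction can be sketched as follows. This is not TopHat's algorithm: it is a toy that assumes exact matches, a single junction per read, and a hypothetical minimum anchor length, and all sequences and names are made up for illustration.

```python
def find_spliced_alignment(read, genome, min_anchor=4):
    """Try each split point of the read and look for prefix + gap + suffix.

    Returns a dict with the read start and the inferred intron coordinates
    (0-based, end-exclusive) for the first split whose two pieces both match
    the genome exactly, with the suffix located downstream of the prefix.
    Returns None if no such split exists.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = genome.find(prefix)
        while p != -1:
            prefix_end = p + split
            # Require a gap of at least one base between the two pieces.
            s = genome.find(suffix, prefix_end + 1)
            if s != -1:
                return {"read_start": p,
                        "intron_start": prefix_end,
                        "intron_end": s}
            p = genome.find(prefix, p + 1)
    return None


# Hypothetical toy gene: two exons separated by one intron.
exon1 = "ATGGCCAAGT"
intron = "GTAAGTTTTTCTTTCAG"
exon2 = "CTGAAGGATC"
genome = "CC" + exon1 + intron + exon2 + "TT"

# A read from the mature mRNA that spans the exon1/exon2 junction.
read = exon1[-6:] + exon2[:6]

hit = find_spliced_alignment(read, genome)
print(hit)
print(genome[hit["intron_start"]:hit["intron_end"]])  # the skipped intron
```

A junction-spanning read like this one cannot be placed by a contiguous aligner, because its two halves lie far apart in the genome.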
11. TopHat
Using annotation:
Improves accuracy, mostly around splice junctions.
Introduces a “bias” in favor of known transcripts.
Gives less power to detect novel transcripts or novel isoforms.
Beware if you have an incomplete annotation.
At the mapping step, it’s better to keep multi-mappers (within a reasonable limit, e.g. 10 or 20 hits). TopHat provides an option to control multiple mappings.
12. SAM/BAM FORMAT
A SAM/BAM file is composed of two parts:
Header: describes the source of the data, the reference sequence, the method of alignment, and so on.
Alignments: describe the reads, their locations, and the nature of the alignments.
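The two-part layout can be seen by reading a small SAM text directly. The sketch below relies only on two facts from the SAM format: header lines start with "@", and each alignment line has 11 mandatory tab-separated fields followed by optional tags. The records themselves are invented for illustration, and in practice a library such as pysam or the samtools command line would be used instead of hand parsing.

```python
MANDATORY_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
                    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Hypothetical example records (SEQ and QUAL omitted as "*").
SAM_TEXT = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@PG\tID:tophat\tPN:tophat",
    "read1\t0\tchr1\t11874\t50\t60M200N40M\t*\t0\t0\t*\t*\tNH:i:1",
    "read2\t16\tchr1\t12613\t3\t100M\t*\t0\t0\t*\t*\tNH:i:2",
])


def parse_sam(text):
    """Split SAM text into header lines and parsed alignment records."""
    header, alignments = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        fields = line.split("\t")
        record = dict(zip(MANDATORY_FIELDS, fields[:11]))
        for key in ("FLAG", "POS", "MAPQ"):
            record[key] = int(record[key])
        record["tags"] = fields[11:]  # optional TAG:TYPE:VALUE fields
        alignments.append(record)
    return header, alignments


header, alignments = parse_sam(SAM_TEXT)
print(len(header), "header lines")
for rec in alignments:
    strand = "-" if rec["FLAG"] & 16 else "+"
    print(rec["QNAME"], rec["RNAME"], rec["POS"], strand, rec["CIGAR"])
```

In the first record, the `N` operation in the CIGAR string marks a skipped region of the reference, which is how a spliced alignment across an intron is written.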
15. Quantification
In general, we want more than just alignments.
RNA-Seq is, in principle, a quantitative assay, and we want to measure gene expression.
16. Cufflinks
Based on the alignments, there are two goals:
Transcript assembly
Transcript quantification
17. Cufflinks
Assembly: Try to find the minimal number of paths in a graph that fully represents the alignments.
Quantification: Estimate the most likely abundance of the different isoforms.
18. Unit of Abundance
We can only obtain relative measures: of X reads in the library, Y map to gene A and Z map to gene B. If we change X, then both Y and Z will change.
Now, if gene A and gene B have the same number of reads mapped, do they have similar expression levels?
If gene A and gene B have the same size, yes.
Otherwise, no, because a longer gene will receive more reads than a shorter gene expressed at the same level.
19. FPKM
Fragments per kilobase of transcript per million mapped reads.
Kilobase: not the annotated gene size, but the effective size.
Effective length: the number of possible start sites on the transcript (depends on the estimated fragment size).
Millions of reads: millions of mapped reads (not millions of sequenced reads).
FPKM lets you compare the expression of a gene between samples (because it accounts for differences in library depth).
It also lets you compare the expression of two genes within the same sample (because it accounts for differences in gene length).
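The definition above can be written out as a short calculation. This is a minimal sketch, not Cufflinks' estimator: the effective length is approximated here as transcript length minus mean fragment length plus one (the number of start positions for a fragment of that size), and the gene lengths, counts, and library sizes are invented for illustration.

```python
def effective_length(transcript_length, mean_fragment_length):
    """Approximate number of positions where a fragment can start.

    Simplifying assumption: every fragment has the mean length.
    """
    return max(transcript_length - mean_fragment_length + 1, 1)


def fpkm(fragments, eff_length, total_mapped_fragments):
    """Fragments per kilobase of effective length per million mapped fragments."""
    return fragments * 1e9 / (eff_length * total_mapped_fragments)


# Hypothetical numbers: two genes with the same fragment count in one sample.
mean_fragment = 200
total_mapped = 10_000_000

len_a = effective_length(2199, mean_fragment)  # 2000
len_b = effective_length(1199, mean_fragment)  # 1000

fpkm_a = fpkm(500, len_a, total_mapped)
fpkm_b = fpkm(500, len_b, total_mapped)
print(fpkm_a, fpkm_b)

# The same gene A in a sample sequenced twice as deeply, with twice the count.
fpkm_a_deep = fpkm(1000, len_a, 2 * total_mapped)
print(fpkm_a_deep)
```

Gene B is half as long as gene A but collects the same number of fragments, so its FPKM is twice as high. Doubling both the library depth and the count leaves gene A's FPKM unchanged.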
20. Cufflinks and Cuffdiff
In addition to estimating isoform expression, Cufflinks outputs gene expression (more or less the sum of the different isoforms).
The Cufflinks package includes a program called Cuffdiff for differential expression.
Cuffdiff estimates the isoform expression in two groups (each of which can be composed of multiple replicates) and performs statistical tests for:
Differential gene expression
Differential transcript levels
Alternative splicing
Differential usage of transcription start sites
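As a rough illustration of what is being compared between two groups, the sketch below computes a log2 fold change from replicate FPKM values. It covers only the effect size: Cuffdiff's statistical test also models the variability across replicates, which is not reproduced here. The pseudocount, the function name, and the FPKM values are assumptions made for this example.

```python
import math
from statistics import mean


def log2_fold_change(fpkm_group_a, fpkm_group_b, pseudocount=1.0):
    """log2 of (mean of group B / mean of group A).

    The pseudocount is an assumption of this sketch; it avoids division by
    zero for genes that are not expressed in one of the groups.
    """
    a = mean(fpkm_group_a) + pseudocount
    b = mean(fpkm_group_b) + pseudocount
    return math.log2(b / a)


# Hypothetical FPKM values for one gene, three replicates per condition.
control = [7.0, 7.0, 7.0]
treated = [31.0, 31.0, 31.0]

print(log2_fold_change(control, treated))  # positive: higher in treated
print(log2_fold_change(treated, control))  # negative: lower in control
```

A fold change alone says nothing about significance; a gene with noisy replicates can show a large ratio by chance, which is why a statistical test is needed.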
21. Quality Control
A number of metrics are important to look at:
Ribosomal contamination: map the entire library against a set of ribosomal RNA sequences and count the number of reads mapping.
Number of reads mapping, and number of reads mapping uniquely.
Distribution of expression: a few highly expressed, some mid-expressed, and many lowly expressed genes.
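The mapping counts above can be sketched directly from SAM records. This is a toy under stated assumptions: unmapped reads are present in the file with FLAG bit 0x4 set, and the aligner writes the optional `NH:i:` tag (number of reported alignments for the read), so `NH:i:1` is taken to mean a uniquely mapped read. The records are invented for illustration.

```python
def mapping_summary(sam_lines):
    """Count reads, mapped reads, and uniquely mapped reads in SAM lines."""
    total = mapped = unique = 0
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and the header
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: same read, already counted
        total += 1
        if flag & 0x4:
            continue  # read is unmapped
        mapped += 1
        if "NH:i:1" in fields[11:]:
            unique += 1
    return {
        "total": total,
        "mapped": mapped,
        "unique": unique,
        "mapped_rate": mapped / total if total else 0.0,
        "unique_rate": unique / total if total else 0.0,
    }


# Hypothetical records: one unique read, one multi-mapper reported twice,
# and one unmapped read.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t300\t3\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r2\t256\tchr1\t700\t3\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

summary = mapping_summary(records)
print(summary)
```

The same function applied to an alignment against ribosomal RNA sequences gives the ribosomal contamination estimate: the mapped rate is then the fraction of the library that is rRNA.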
22. Quality Control
Other metrics:
Duplication rate (based on the location of the alignments), which might bias the estimation of gene expression.
Percentage of reads mapping to CDS, UTR, intron, and intergenic regions; the more on CDS and UTR, the better.
Unsupervised clustering to verify that the samples cluster according to biological differences and not according to experimental batches:
Hierarchical clustering
PCA, MDS plot
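The clustering check starts from a sample-to-sample distance matrix. Here is a minimal sketch that uses 1 minus the Pearson correlation of log-transformed expression values as the distance. That metric is one common choice, assumed here for illustration, and the sample names and FPKM values are invented. Hierarchical clustering or an MDS plot would take a matrix like this as input.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance_matrix(samples):
    """Distance = 1 - Pearson correlation of log2(FPKM + 1) profiles."""
    logged = {name: [math.log2(v + 1) for v in values]
              for name, values in samples.items()}
    return {(a, b): 1 - pearson(logged[a], logged[b])
            for a in logged for b in logged}


def nearest_sample(name, dist, names):
    """Return the other sample with the smallest distance to `name`."""
    return min((n for n in names if n != name), key=lambda n: dist[(name, n)])


# Hypothetical FPKM values for five genes in four samples.
samples = {
    "ctrl_1": [10, 200, 5, 50, 0],
    "ctrl_2": [12, 180, 6, 55, 1],
    "treat_1": [100, 20, 5, 50, 30],
    "treat_2": [90, 25, 4, 60, 28],
}

dist = distance_matrix(samples)
for name in samples:
    print(name, "is closest to", nearest_sample(name, dist, list(samples)))
```

If a sample's nearest neighbour comes from the other condition, or samples group by processing date rather than by condition, that points to a batch effect or a sample mix-up.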