This document summarizes an ncRNA analysis pipeline called nc-aReNA. It describes how nc-aReNA can be used to classify and analyze small non-coding RNAs from deep RNA sequencing data. The pipeline performs tasks such as adapter removal, quality control checks, mapping reads to reference databases to identify known ncRNA classes, filtering of rRNA sequences, quantification of ncRNA expression, differential expression analysis, and identification of isomiRs. Two test cases demonstrating ncRNA classification and differential expression analysis using nc-aReNA on mouse datasets are also described.
BiPday 2014 -- Tulipano Angelica
1. CNR – Istituto di Tecnologie Biomediche di Bari
nc-aReNA: an integrated resource for small non-coding RNA functional annotation
Angelica Tulipano
2. The RNA World
Small non-coding RNAs (sncRNAs) serve as regulatory molecules in a number of different organisms
MicroRNA (miRNA): post-transcriptional regulators of gene expression
PIWI-interacting RNA (piRNA): germline transposon silencing
Small interfering RNA (siRNA): active molecules in RNA interference
Small nuclear RNA (snRNA): includes spliceosomal RNAs
Small nucleolar RNA (snoRNA): involved in rRNA modification
Long non-coding RNA (lncRNA): little is known about them; involved in mRNA regulation
BiP-Day, 19 December 2014
3. Why non-coding RNA?
• How many ncRNA genes are there?
• How important are they?
• Which functions does a cell delegate to RNA instead of protein, and why?
To address these questions, new systematic gene-discovery approaches need to be developed that specifically aim at ncRNA discovery.
High-throughput sequencing technologies enable such research.
The new spectrum of NGS applications, together with the massive amount of data, requires focused investment and the development of bioinformatics tools for managing and analysing such complex and large datasets to infer biological meaning.
4. The ITB nc-aReNA Platform
A bioinformatics pipeline to classify and analyze small non-coding RNAs (sncRNAs) in deep RNA sequencing
Identification and classification of reads in known functional ncRNA classes and dataset export
Identification and filtering of reads mapping to ribosomal RNAs and mtDNA transcripts
Quantification of ncRNA expression and differential expression analysis
Graphical visualization of sample expression profiles in different conditions and at different time points
Creation of a collection of unclassified reads, useful for the prediction of novel ncRNAs
5. Bioinformatics workflow for ncRNA analyses
Workflow diagram, with the following elements:
Raw sequencing data
Adapter & barcode identification and removal
Size filtering & general statistics
Quality control check
Cleaned sequence data (fastq, fasta)
Reads mapping
Differential expression
isomiR identification
data-warehouse
6. Sequence Processing: Adapter & Barcode identification and removal
detect barcode sequence and separate multiplexed experiments
trim barcode and 3’-adapter fragment
Read components shown in the figure:
5’ adapter
small RNA: 18-30 bases
barcode (if multiplexed): 4-6 bases
3’ adapter: ~20 bases
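The two processing steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the read layout (barcode first, then the small RNA, then the 3' adapter), the barcode table, the adapter sequence and the function name are all example values that do not come from the slides, and only exact matches are handled.

```python
# Illustrative only: barcode table and adapter sequence are example values.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical 4-base barcodes
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"                  # example 3' adapter


def demultiplex_and_trim(read, barcodes=BARCODES, adapter=ADAPTER_3P,
                         min_len=18, max_len=30):
    """Return (sample, insert), or None if the read cannot be processed.

    Assumed read layout (an assumption, not taken from the slides):
    barcode + small RNA insert + 3' adapter.
    """
    # 1. Detect the barcode at the start of the read (exact match only).
    sample = None
    for barcode, name in barcodes.items():
        if read.startswith(barcode):
            sample = name
            read = read[len(barcode):]
            break
    if sample is None:
        return None

    # 2. Trim the 3' adapter; accept a partial adapter at the read end.
    cut = read.find(adapter)
    if cut == -1:
        # Look for the longest adapter prefix (at least 6 bases) ending the read.
        for k in range(len(adapter) - 1, 5, -1):
            if read.endswith(adapter[:k]):
                cut = len(read) - k
                break
    if cut == -1:
        return None
    insert = read[:cut]

    # 3. Size filtering: keep inserts in the expected small RNA range.
    if not (min_len <= len(insert) <= max_len):
        return None
    return sample, insert
```

A production run would normally rely on a dedicated trimming tool that tolerates sequencing errors in the barcode and adapter; the sketch only shows the order of operations.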
7. Sequence Processing: QC check and general statistics
FastQC: A Quality Control tool for High Throughput Sequence Data, by S. Andrews
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
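FastQC computes its reports itself. As a rough illustration of two of the statistics involved (the read-length distribution and the mean quality score per base position), here is a minimal sketch; it assumes four-line FASTQ records with Phred+33 quality encoding, and the function name is hypothetical.

```python
from collections import Counter


def fastq_stats(lines):
    """Compute read-length counts and mean Phred quality per position.

    `lines` is an iterable of FASTQ lines; four-line records and
    Phred+33 quality encoding are assumed.
    """
    length_counts = Counter()
    qual_sums = []    # qual_sums[i]: summed quality at position i
    qual_counts = []  # qual_counts[i]: number of reads covering position i
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) < 4:
            continue
        _header, seq, _plus, qual = record
        record = []
        length_counts[len(seq)] += 1
        for i, ch in enumerate(qual):
            if i == len(qual_sums):
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[i] += ord(ch) - 33
            qual_counts[i] += 1
    mean_quality = [s / n for s, n in zip(qual_sums, qual_counts)]
    return length_counts, mean_quality
```

The function accepts any iterable of lines, so an open file handle on a cleaned FASTQ file can be passed directly.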
8. Reads Mapping: Bowtie
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
Mapping flow shown in the figure:
Cleaned sequences: mapping on ncRNADB reference
mapped on ncRNADB: classified ncRNA; miRNA precursors go to isomiR identification
unmapped on ncRNADB: mapping on reference genome
mapped on genome: annotated (classified ncRNA) or not annotated
unmapped on genome
Export from the data-warehouse
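The routing logic of this two-stage mapping can be sketched as follows. This is a hedged illustration, not the platform's code: the three dictionaries are hypothetical stand-ins for real alignment and annotation output, and the function name and result fields are invented for the example.

```python
def route_read(read_id, ncrna_hits, genome_hits, annotation):
    """Classify one read following the two-stage mapping scheme.

    ncrna_hits: read id -> ncRNA class from the ncRNA reference mapping
    genome_hits: read id -> genomic locus for reads mapped on the genome
    annotation: genomic locus -> ncRNA class for annotated loci
    All three are plain dicts standing in for real alignment output.
    """
    # Stage 1: reads mapped on the ncRNA reference are classified directly.
    if read_id in ncrna_hits:
        rna_class = ncrna_hits[read_id]
        # miRNA precursor hits are also passed to isomiR identification.
        to_isomir = rna_class == "miRNA_primary_transcript"
        return {"status": "classified", "class": rna_class, "isomir": to_isomir}

    # Stage 2: remaining reads are looked up in the genome mapping.
    locus = genome_hits.get(read_id)
    if locus is None:
        return {"status": "unmapped", "class": None, "isomir": False}
    if locus in annotation:
        return {"status": "classified", "class": annotation[locus], "isomir": False}
    # Mapped but not annotated: candidate novel ncRNA.
    return {"status": "not_annotated", "class": None, "isomir": False}
```

Keeping the two lookups separate mirrors the slide: the ncRNA reference is consulted first, and the genome only for reads it leaves unmapped.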
10. Multiple mapping: Multireads
Reference database contains several classes of ncRNAs
Some reads map to more than one reference sequence in different classes
Software dealing with multiple mapping: RSEM
Li and Dewey, RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011, 12:323
Example of multimapping:
processed_transcript | miRNA_primary_transcript
lincRNA | miRNA_primary_transcript
antisense | miRNA_primary_transcript
lncRNA | miRNA_primary_transcript
rRNA | piRNA
rRNA | lincRNA
lincRNA | snoRNA
snoRNA | piRNA
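To give a feel for how a probabilistic method can share multireads among classes, here is a deliberately simplified expectation-maximization sketch. It is not RSEM: it ignores sequence lengths, alignment quality and everything else a real model accounts for, and the function name and input format are assumptions made for the example.

```python
def allocate_multireads(compatible, n_iter=100):
    """Estimate per-class read counts with a simple EM procedure.

    compatible: list with one entry per read, each a set of the classes
    the read maps to. Returns a dict class -> expected read count.
    """
    classes = sorted(set().union(*compatible))
    n_reads = len(compatible)
    # Start from equal abundances.
    theta = {c: 1.0 / len(classes) for c in classes}
    for _ in range(n_iter):
        expected = {c: 0.0 for c in classes}
        # E-step: split each read among its classes in proportion to theta.
        for hits in compatible:
            total = sum(theta[c] for c in hits)
            for c in hits:
                expected[c] += theta[c] / total
        # M-step: abundances become the normalized expected counts.
        theta = {c: expected[c] / n_reads for c in classes}
    return {c: theta[c] * n_reads for c in classes}
```

With three reads unique to rRNA, one unique to piRNA and two rRNA | piRNA multireads, the multireads end up shared in proportion to the evidence from the unique reads rather than being discarded or split evenly.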
11. IsomiRs
IsomiRs are defined as variations (isoforms) of a mature microRNA
These variants were originally dismissed as experimental artifacts
IsomiRs have been shown to be actively associated with the RISC and the mRNA translation machinery
IsomiRs are real physiological miRNA variants
13. IsomiR identification: isomiRID
de Oliveira LF, Christoff AP, Margis R. isomiRID: a framework to identify microRNA isoforms, Bioinformatics. 2013 Oct 15;29(20):2521-3
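As a rough picture of what identifying an isomiR involves, the sketch below labels a read by how its ends differ from the canonical mature miRNA on the precursor. It is not the isomiRID algorithm: it only handles reads that match the precursor exactly, and the function name, coordinate convention and labels are assumptions made for the example.

```python
def classify_isomir(read, precursor, mature_start, mature_end):
    """Label a read relative to the canonical mature miRNA.

    mature_start/mature_end are 0-based, end-exclusive coordinates of the
    canonical mature sequence on the precursor. Only reads that match the
    precursor exactly (templated variants) are handled.
    """
    pos = precursor.find(read)
    if pos == -1:
        return "no exact match"
    offset_5p = pos - mature_start               # < 0: extended, > 0: trimmed
    offset_3p = (pos + len(read)) - mature_end   # > 0: extended, < 0: trimmed
    if offset_5p == 0 and offset_3p == 0:
        return "canonical"
    labels = []
    if offset_5p != 0:
        labels.append(f"5' variant ({offset_5p:+d})")
    if offset_3p != 0:
        labels.append(f"3' variant ({offset_3p:+d})")
    return ", ".join(labels)
```

Reads with internal substitutions or non-templated additions fall into the "no exact match" case here and would need an alignment-based comparison.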
14. Differential expression analysis
Comparison of ncRNA counts in different experimental conditions:
- sample vs. control
- time course samples
Statistical test:
- Fisher's exact test (no biological replicates)
- t-test or Wilcoxon test (with biological replicates)
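For the case without biological replicates, the comparison reduces to a 2x2 table per ncRNA. The sketch below is a minimal, standard-library implementation of the two-sided Fisher's exact test; the table layout (reads of one ncRNA versus all other reads, in sample and control) and the function name are assumptions about how the counts would be arranged, not a description of the platform's code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Here a and c would be the reads of one ncRNA in sample and control,
    b and d the remaining reads of each library.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)
```

This direct enumeration is meant for small tables; for libraries with millions of reads a statistics package is the more practical choice, and p-values across many ncRNAs would need multiple-testing correction.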
16. Test Case 1: ncRNA classification
ncRNA counts by category (pie chart):
rRNA 47.1%
miRNA_primary_transcript 46.2%
piRNA 2.9%
tRNA 2.3%
snoRNA 0.7%
lncRNA 0.2%
lincRNA 0.2%
17. Test Case 2: DE analysis
Characterization of miRNA expression profiles in Mus musculus time course dataset
18. Test Case 2
Characterization of miRNA expression profiles in Mus musculus time course dataset
Metadata
• Mus musculus
• Muscle & Skin
• T0: Mock
• T1: 3 hrs
• T2: 24 hrs
Figure elements: small RNA-Seq, bioinformatics analyses, ncRNA expression profile, data-warehouse, TarBase, Gene Ontology
19. ncRNA research activities
Immune response in mouse
Plant-viroid interactions in peach tree, grapevine and tobacco
Multidrug resistance in dog cell lines
miRNA-driven methylation profile in Arabidopsis
miRNA expression profile in Amyotrophic Lateral Sclerosis and interaction with biomarkers of clinical features
Collaborations
CNR – IVV, Bari
• Francesco Di Serio
• Beatriz Navarro
• Livia Stavolone
• Fabrizio Cillo
University of Bari, Dept. of Pharmacy
• Nicola Colabufo
• Antonio Carrieri
CSIC – UPV, Valencia, Spain
• Ricardo Flores
Work in progress:
Web portal
Statistical analysis of isomiR variants
Prediction of novel miRNAs
20. The Team
Consiglio Arianna
De Caro Giorgio
D’Elia Domenica
Gisel Andreas
Grillo Giorgio
Licciulli Flavio
Liuni Sabino
Losito Nicola
Tulipano Angelica
We developed nc-aReNA, an integrated platform for the analysis and functional annotation of small ncRNAs from NGS sequencing data.
It is now well established that biological complexity correlates with the fraction of the genome that is not protein-coding: in mammals only 2% of the genome codes for messenger RNA, while most of it is transcribed as long and short non-coding RNA, functional molecules that act as key regulators in many cellular processes.
ncRNAs are classified into many different categories according to their functional activity, for example transcriptional and post-transcriptional gene silencing, hybrid dysgenesis, and X-chromosome inactivation.
However, many questions about the ncRNA world are still open:
How many types of ncRNA are there?
How important are they? Which functions does a cell delegate to them rather than to a protein?
Answering these questions properly requires gene-discovery approaches aimed specifically at finding and studying ncRNA.
NGS technologies make such research possible.
Their broad spectrum of applications, together with the huge amount of data they produce, calls for effort and investment in bioinformatics tools for managing and analysing these datasets and extracting new biological knowledge from them.
(In the figure: X-chromosome inactivation (XCI) in mammals relies on XIST, a long noncoding transcript that coats and silences the X chromosome in cis)
nc-aReNA is a bioinformatics platform that includes a set of pipelines and tools for:
Identifying and classifying reads into functional ncRNA classes, with the option to export the data
Detecting reads that map to ribosomal RNA and to mitochondrial DNA transcripts
Quantifying ncRNA expression and running differential expression analysis
Plotting the expression profiles of the samples under different conditions or at different time points
Building the set of unclassified reads, which can be used to predict novel ncRNAs
This is the general scheme of the nc-aReNA workflow: raw sequencing data go through a first step of sequence processing, a second step of mapping and functional annotation of the sequences, and then further functions such as differential expression analysis and isomiR identification.
All datasets produced by these steps are collected in a local data warehouse, from which the user can export them.
The first phase, read processing, searches for and removes the 3' adapter fragment in the reads. When multiple experiments are sequenced together, they are distinguished by the barcode fragment in the read: in that case the different barcodes are searched for in order to separate the experiments, and the barcode fragment is trimmed.
The processed reads are then analysed with a quality control tool, FastQC. It gives the user plots of sequence quality (such as the distribution of the quality score per base) and useful statistics, such as the distribution of read lengths.
The second phase, mapping, uses the Bowtie algorithm. The reads are first mapped to a reference database of ncRNAs, and the sequences that map are classified into the different ncRNA categories. Among these, the miRNA precursors in particular are selected for isomiR identification. The reads that do not map to the ncRNA reference are then mapped to the reference genome of the organism.
The reads that map to the genome are divided into annotated reads, which are added to the ncRNAs identified in the first step, and unannotated reads. The latter may represent potential new ncRNAs, to be searched for with prediction algorithms such as miRDeep. The reads that do not map to the genome either may be artifacts or contamination, and assembling them could give information about their origin. All the mapping results are collected in the data warehouse and can be exported by the user for further analysis. At the moment the human, mouse and Arabidopsis genomes are included, while the non-coding reference database also covers other species, since the two mapping processes are independent.
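The decision flow of the two mapping steps can be summarized with a small Python sketch. It does not run Bowtie: the three lookup tables passed in (`ncrna_hits`, `genome_hits`, `annotations`) are hypothetical stand-ins for the results of the two mapping steps and for the genome annotation, introduced only to show how each read ends up in one of the bins described above.

```python
from collections import Counter


def classify_read(read_id, ncrna_hits, genome_hits, annotations):
    """Assign a read to one of the bins of the two-step mapping.

    ncrna_hits:  read_id -> ncRNA class (first step, ncRNA reference)
    genome_hits: read_id -> locus (second step, reference genome)
    annotations: locus -> ncRNA class for annotated loci
    """
    if read_id in ncrna_hits:
        return ("ncRNA", ncrna_hits[read_id])
    if read_id in genome_hits:
        locus = genome_hits[read_id]
        if locus in annotations:
            return ("ncRNA", annotations[locus])
        return ("unannotated", None)  # candidate for novel ncRNA prediction
    return ("unmapped", None)  # possible artifact or contamination


def summarize(read_ids, ncrna_hits, genome_hits, annotations):
    """Count how many reads fall into each (bin, class) pair."""
    return Counter(
        classify_read(r, ncrna_hits, genome_hits, annotations) for r in read_ids
    )
```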
The ncRNA reference database, built specifically for the platform, gathers the information from the major databases for ncRNA mapping and annotation, which ensures that the information is complete.
Redundancy among entries of the same class coming from different sources is removed using cross-links and sequence identity.
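As a rough illustration of this redundancy removal, the Python sketch below merges entries of the same class when they share a cross-linked identifier or an identical sequence. The entry layout (`id`, `rna_class`, `sequence`, `xrefs`) is hypothetical, and sequence identity is taken to mean an exact match after normalizing case and U/T, which is an assumption: the criteria actually used to build the database are not given here.

```python
def deduplicate(entries):
    """Merge entries of the same ncRNA class that are cross-linked or
    share an identical sequence. Single pass: an entry that bridges two
    already-kept records is attached to the first match only."""
    kept = []
    by_sequence = {}  # (class, normalized sequence) -> kept record
    by_id = {}        # any known identifier -> kept record
    for entry in entries:
        seq = entry["sequence"].upper().replace("U", "T")
        key = (entry["rna_class"], seq)
        ids = {entry["id"]} | set(entry["xrefs"])
        record = by_sequence.get(key)
        if record is None:
            record = next(
                (by_id[i] for i in ids
                 if i in by_id and by_id[i]["rna_class"] == entry["rna_class"]),
                None,
            )
        if record is None:
            record = {"rna_class": entry["rna_class"], "sequence": seq, "ids": set()}
            kept.append(record)
        record["ids"] |= ids
        by_sequence.setdefault(key, record)
        for i in ids:
            by_id.setdefault(i, record)
    return kept
```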
The non-coding reference database contains several classes of ncRNA, and it is often observed that the same read maps to more than one class and therefore cannot be classified unambiguously. In the literature the multi-mapping problem is often ignored, and such reads are simply discarded a priori. In the platform we have implemented RSEM, a software tool designed to handle multiple mapping: starting from the mapping output, it corrects the alignment counts with a probabilistic method based on Bayesian approaches.
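To give an idea of how a probabilistic correction of multi-mapped reads works, here is a deliberately simplified expectation-maximization sketch in Python. It is not RSEM and does not reproduce its model (which also accounts for factors such as reference length and alignment quality); the function name and input layout are hypothetical. Each read is shared among the references it maps to, in proportion to the current abundance estimates, and the estimates are then updated from those fractional counts.

```python
def em_multimap(alignments, n_iter=50):
    """Estimate per-reference read counts from ambiguous alignments.

    alignments: one non-empty list per read, holding the names of the
    references that read maps to. Returns fractional counts per reference.
    """
    refs = sorted({r for hits in alignments for r in hits})
    theta = {r: 1.0 / len(refs) for r in refs}  # start from uniform abundances
    counts = dict.fromkeys(refs, 0.0)
    for _ in range(n_iter):
        # E-step: split each read among its hits, weighted by abundance.
        counts = dict.fromkeys(refs, 0.0)
        for hits in alignments:
            total = sum(theta[r] for r in hits)
            for r in hits:
                counts[r] += theta[r] / total
        # M-step: new abundances are the normalized fractional counts.
        n = sum(counts.values())
        theta = {r: counts[r] / n for r in refs}
    return counts
```

For example, with 8 reads unique to one miRNA, 2 unique to another, and 10 reads shared by both, the shared reads are split roughly 8 to 2 rather than being discarded or counted twice.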
The ncaReNA platform also lets the user identify and annotate isomiRs, which are variants (isoforms) of a mature miRNA.
There are two main types of isomiR variants, arising either from imprecise or alternative cleavage by Dicer or from RNA editing. IsomiRs are classified into three main categories: 5′, 3′ and polymorphic isomiRs.
The software we have implemented in the platform to search for these isoforms is isomiRID. It takes as input the reads annotated as miRNA precursors and returns any isomiRs found, with their frequencies.
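The three isomiR categories can be illustrated with a minimal Python sketch. This is not how isomiRID works internally; it is a simplified example with hypothetical names that assumes the read has already been placed on the precursor at a known start position (0-based, end-exclusive coordinates) and compares it with the annotated mature miRNA.

```python
def classify_isomir(precursor, mature_start, mature_end, read, read_start):
    """Label a read aligned to a miRNA precursor.

    A shifted start gives a 5' isomiR, a shifted end gives a 3' isomiR,
    and any mismatch against the precursor gives a polymorphic isomiR.
    Simplification: non-templated 3' additions show up as mismatches.
    """
    read_end = read_start + len(read)
    labels = set()
    if read_start != mature_start:
        labels.add("5' isomiR")
    if read_end != mature_end:
        labels.add("3' isomiR")
    reference = precursor[read_start:read_end]
    if any(a != b for a, b in zip(read, reference)):
        labels.add("polymorphic isomiR")
    return labels or {"canonical"}
```

A read can carry more than one label, for instance a 5' shift combined with an internal substitution.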
If the user has several datasets, the platform can run a differential expression analysis that compares the ncRNA counts across the experimental conditions: sample vs control, or different time points.
The user can choose between two types of statistical test, depending on whether or not biological replicates are available.
This is the design of the pipeline processing. The process was designed and implemented with Pentaho Data Integration (PDI), which is modular and therefore easy to modify.
We tested the platform on datasets available in public databases. This, for example, is the result of classifying the ncRNAs into the different categories, with the corresponding plots.
We also tested the platform on Mus musculus data from an Illumina platform, to characterize miRNA expression profiles at 3 different time points. This is the graphical view of the differential expression results. Through this graphical interface the user can browse the results and the annotation of the small RNAs.
The functional annotation of the differential expression results was obtained by integrating GO, Sequence Ontology, pathway databases and microRNA-target interaction databases into the data warehouse.
We are working on the development of the web portal that will make the system publicly accessible.
http://ncarena.ba.itb.cnr.it
We are also implementing statistical analysis of isomiRs, and we plan to add miRDeep as the algorithm for predicting novel miRNAs.
We have already used the platform to analyze sequencing data in several collaborations, on human, mouse and also plant samples, which has allowed us to optimize its functions and features.
The platform is the result of the work of the whole ITB bioinformatics group.