The amount of available oncogenomic data has grown steadily in recent years, and more is to come. The main challenges facing the scientific community are integrating these data to extract new knowledge and visualizing the analysis results intuitively. Two complementary but independent tools for the analysis of oncogenomic data are presented here: IntOGen and GiTools.
IntOGen is a framework that collects public oncogenomic data and integrates them in different ways. Its main purpose is to identify genes that are consistently altered (up- or down-regulated) across many samples in a given experiment, and then to combine all experiments from the same cancer type to obtain a p-value for each gene and cancer type. The same principle can be applied to gene modules, or sets, which are groups of genes that share a biological property (module analysis). IntOGen has a web page where the user can explore the datasets included in the database, from individual genes across all cancer types to individual experiments, or gene modules (GO terms, KEGG pathways or user-defined groups of genes) across all experiments.
GiTools is a desktop-based framework, also developed by the lab, for the analysis and visualization of genomic data. It supports several input formats (all plain text), and data can also be imported from BioMart, so everything stored in that database can be used directly in GiTools. There is also an IntOGen data importer, so users can download matrices or oncomodules at different levels (experiments or combined results) and use them directly. It currently performs a limited number of analyses (enrichment analysis, correlations, combination of results...), but it is built in a modular fashion and can easily be extended with more matrix-based statistical tests. It allows flexible exploration of the data and the creation of figures for papers directly from it, which can be exported in many different formats.
Two case studies are presented to illustrate the combined usefulness of these tools, aiming to answer two main questions: “what biological processes are enriched in genes significantly up-regulated in cancer?” and “what is the correlation between different tumour types for the pattern of genes up-regulated?”. Several real applications of these tools, from both published and unpublished research, are also presented, stressing that they can be used not only in oncogenomics projects but also in studies of evolution and global gene regulation.
In the near future GiTools will incorporate new analyses, such as GSEA and clustering, and connections with the R statistical framework. IntOGen will soon have a BioMart-compatible interface, which will make the data even more easily available.
1. IntOGen & Gitools
integration, visualization and data-mining of
multidimensional oncogenomic data
Christian Pérez-Llamas
Master student
Biomedical Genomics
GRIB-UPF
April 2010
2. Outline
● Introduction
● Case study
● Real projects
● Conclusions
● Future work
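5. Identification of cancer related genes
[Figure: two-step workflow for one cancer type (Cancer type A, exp. 1, exp. 2, exp. 3, ... exp. n)]
● STEP 1: identification of driver alterations. Within each experiment (experiment 1 shown), a genes × samples matrix marks each gene as altered or not altered.
● STEP 2: combination of experiments. The per-experiment results are combined into a corrected p-value per gene, shown on a scale marked 0, 0.05 and 1.
● Cancer type labels: International Classification of Diseases, from the World Health Organization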
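The "combination of experiments" step can be illustrated with a small sketch. The slides do not say which combination or correction method IntOGen uses, so Fisher's method and the Benjamini-Hochberg procedure are swapped in here purely as common, well-known choices; the gene names and p-values are made up.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method (an assumption;
    the slides do not name the method actually used).

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the upper tail has
    a closed form, so only the standard library is needed.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Clamp to avoid log(0) when a p-value is reported as exactly zero.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-experiment p-values for two made-up genes in one cancer type.
per_experiment = {
    "GENE_A": [0.01, 0.03, 0.20],
    "GENE_B": [0.60, 0.40, 0.90],
}
genes = sorted(per_experiment)
combined = [fisher_combine(per_experiment[g]) for g in genes]
corrected = dict(zip(genes, benjamini_hochberg(combined)))
for gene in genes:
    print(gene, round(corrected[gene], 4))
```

A gene whose corrected p-value falls below 0.05 would land in the "significant" end of the 0 / 0.05 / 1 scale shown on the slide.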
15. Data Analysis Browse Export
Many File Formats Supported
TSV
CDM
BDM
GMX
GMT
TCM
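As an illustration of one of the listed formats, here is a minimal sketch of reading a GMT file. It assumes GMT means the tab-separated gene set layout popularised by GSEA (one module per line: name, description, then gene identifiers); the slide itself does not spell out the layout, and the function name and example content are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text into {module_name: set_of_gene_ids}.

    Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    modules = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, genes = fields[0], fields[2:]
        modules[name] = {g for g in genes if g}
    return modules


# Made-up module names and gene identifiers, for illustration only.
example = (
    "MODULE_A\tdescription A\tG1\tG2\tG3\n"
    "MODULE_B\tdescription B\tG2\tG4\n"
)
modules = parse_gmt(example)
for name in sorted(modules):
    print(name, sorted(modules[name]))
```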
16. Data Analysis Browse Export
Import data from: Marts
● International Cancer Genome Consortium
Data:
● Genes significantly altered
● Modules of genes significantly altered
Levels:
● Experiments
● Combinations
Alterations:
● Upregulation
● Downregulation
● Gain
● Loss
20. Outline
● Introduction
● Case study
● Real projects
● Conclusions
● Future work
21. Case study
● What biological processes are enriched in genes significantly up-regulated in cancer?
● What is the correlation between different tumour types for the pattern of genes up-regulated?
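For the second question, one possible reading is a pairwise correlation between tumour types over a binary "up-regulated or not" vector per gene. The sketch below uses Pearson correlation as an assumption (the slide does not name the coefficient), and the tumour type names and values are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 1 = gene significantly up-regulated, 0 = not.
# Each list covers the same six made-up genes in the same order.
upregulated = {
    "tumour_A": [1, 1, 0, 0, 1, 0],
    "tumour_B": [1, 1, 0, 0, 0, 0],
    "tumour_C": [0, 0, 1, 1, 0, 1],
}

correlations = {
    (a, b): pearson(upregulated[a], upregulated[b])
    for a, b in combinations(sorted(upregulated), 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Tumour types with similar sets of up-regulated genes get a coefficient near 1, while opposite patterns approach -1.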
39. Enrichment analysis
[Figure: two-step workflow applied to each tumor type i]
● STEP 1: transform to 1 the p-values < 0.05 in the genes × tumor types matrix (scale marked 0, 0.05 and 1)
● STEP 2: enrichment analysis of the genes against biological modules (GO Biological processes), giving a p-value per module and tumor type
● Test on the annotated genes in module M: Xi ~ Bin(pi), H0: pm = pi, H1: pm > pi
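The two steps of this slide can be sketched as follows. This is one interpretation, stated as an assumption: pi is taken as the overall fraction of genes flagged 1 in tumor type i, pm as the fraction among the genes annotated to module M, and the p-value as the upper tail of a binomial distribution. All gene names, p-values and the module are made up.

```python
import math


def binarize(pvalues, threshold=0.05):
    """STEP 1: turn gene p-values into 1 (p < threshold) or 0."""
    return {gene: 1 if p < threshold else 0 for gene, p in pvalues.items()}


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)
    )


def enrichment_pvalue(flags, module_genes):
    """STEP 2: one-sided binomial test of H0: pm = pi against H1: pm > pi.

    flags maps every gene to 0/1 for one tumor type; module_genes lists
    the genes annotated to the module. Genes absent from flags are ignored.
    """
    background = sum(flags.values()) / len(flags)  # assumed estimate of pi
    annotated = [g for g in module_genes if g in flags]
    hits = sum(flags[g] for g in annotated)
    return binomial_upper_tail(hits, len(annotated), background)


# Hypothetical p-values for ten made-up genes in one tumor type.
gene_pvalues = {
    "G1": 0.001, "G2": 0.01, "G3": 0.2, "G4": 0.5, "G5": 0.03,
    "G6": 0.8, "G7": 0.6, "G8": 0.9, "G9": 0.4, "G10": 0.7,
}
flags = binarize(gene_pvalues)
module_m = ["G1", "G2", "G3", "G5"]  # hypothetical module
p_module = enrichment_pvalue(flags, module_m)
print(round(p_module, 4))
```

Repeating the test for every module and tumor type yields the modules × tumor types matrix that the heatmap displays; in practice the resulting p-values would also need multiple-testing correction.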
53. Outline
● Introduction
● Case study
● Real projects
● Conclusions
● Future work
54. Real projects
● RBP2 function
● Functional protein divergence
● Study of altered regulatory programs in cancer
● Stress response genes and transition into increased malignant states
● Comparison of alteration patterns among tumor types
RBP2: functional enrichment of RBP2 targets at different time points of differentiation (Lopez-Bigas et al., Molecular Cell 2008)
55. Real projects
● RBP2 function
● Functional protein divergence
● Study of altered regulatory programs in cancer
● Stress response genes and transition into increased malignant states
● Comparison of alteration patterns among tumor types
Lopez-Bigas et al.,
Genome Biology 2008
56. Outline
● Introduction
● Case study
● Real projects
● Conclusions
● Future work
57. Conclusions
● IntOGen is a novel framework for Oncogenomics data integration
● IntOGen.org is a discovery tool for cancer researchers
● Gitools main features are:
● Interactive heatmap
● Import from Biomart
● Import from IntOGen
● Command line option
58. Future work
● Biomart compatible interface for IntOGen
● Implement more analyses:
● GSEA
● Clustering
● Modules hierarchy aware enrichment like GOstats
● Connection with R
● Implement more editors:
● Table and modules editor
59. Acknowledgements
Nuria López-Bigas
Gunes Gundem
Jordi Deu-Pons
Khademul Islam
Michael Schroeder
Alba Jené-Sanz
Xavier Rafael
Remember to visit
www.intogen.org
www.gitools.org