IntOGen was presented on September 11th at the CSHL Meeting on Personal Genomes. The talk was given by Christian Pérez-Llamas, who presented the main features of the current version and the advances in IntOGen 2.0 for storing, analyzing and visualizing next generation sequencing data from cancer samples.
CSHL Meeting on Personal Cancer Genomes web: http://meetings.cshl.edu/meetings/person10.shtml
IntOGen, Integrative Oncogenomics for Personal Cancer Genomes
1. IntOGen, Integrative OncoGenomics
for personal cancer genomes
Christian Pérez-Llamas
Biomedical Genomics Lab
Pompeu Fabra University
Biomedical Research Park at Barcelona
5. Overview
DATA
Oncogenomics data: transcriptomic alterations, copy number alterations, mutations, ...
Clinical annotations: International Classification of Diseases for Oncology (ICD-O)
Biological modules: functional, regulatory, cancer related, ...
STATISTICS
Integrative methodologies: cancer related genes identification, cancer related modules identification, combinations of experiments by ICD-O, generation of cancer specific modules
Data management
EXPLORATION
Web discovery tool: www.intogen.org
Biomart services: biomart.intogen.org
Gitools: www.gitools.org
6. Data
Transcriptomic alterations Copy number alterations Mutations
Copy Number Analysis
from Sanger Institute
Selection of experiments
Public data
Experiment design: cancer vs normal
At least 20 samples
Annotation of tumour type
International Classification of Diseases for Oncology (ICD-O)
Manual curation from publication or description
Progenetix already annotated with ICD-O
More than 800 experiments
More than 25000 samples
Almost 150 ICD-O tumor types
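The selection criteria above (public data, cancer vs normal design, at least 20 samples) can be sketched as a simple filter. This is only an illustration: the `Experiment` record, its field names, the accessions and the tumour type labels are hypothetical and do not come from IntOGen.

```python
from dataclasses import dataclass

MIN_SAMPLES = 20  # threshold stated on the slide


@dataclass
class Experiment:
    accession: str    # hypothetical identifier
    is_public: bool
    design: str       # hypothetical label, e.g. "cancer_vs_normal"
    n_samples: int
    tumour_type: str  # placeholder label standing in for an ICD-O annotation


def is_eligible(exp: Experiment) -> bool:
    """Apply the three selection criteria listed on the slide."""
    return (
        exp.is_public
        and exp.design == "cancer_vs_normal"
        and exp.n_samples >= MIN_SAMPLES
    )


candidates = [
    Experiment("EXP-A", True, "cancer_vs_normal", 48, "tumour type A"),
    Experiment("EXP-B", True, "cancer_vs_normal", 12, "tumour type A"),   # too few samples
    Experiment("EXP-C", False, "cancer_vs_normal", 60, "tumour type B"),  # not public
    Experiment("EXP-D", True, "time_course", 35, "tumour type B"),        # wrong design
]

selected = [exp for exp in candidates if is_eligible(exp)]
print([exp.accession for exp in selected])
```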
7. Statistics Cancer related genes identification
STEP 1: identification of driver alterations
[Figure: for experiment 1, a genes × samples matrix (altered / not altered) is summarized as one corrected p-value per gene (column "exp. 1"); colour scale 0, 0.05, 1]
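STEP 1 can be sketched as follows. The slide does not name the statistical test, so this example assumes a one-sided binomial test of each gene's alteration count against the overall alteration rate of the experiment, followed by Benjamini-Hochberg correction. The gene names and the toy matrix are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def corrected_pvalues(matrix):
    """matrix: dict mapping gene -> list of 0/1 flags (1 = altered), one per sample."""
    total_cells = sum(len(flags) for flags in matrix.values())
    total_altered = sum(sum(flags) for flags in matrix.values())
    background = total_altered / total_cells  # assumed null alteration rate
    genes = list(matrix)
    pvals = [binomial_sf(sum(matrix[g]), len(matrix[g]), background) for g in genes]
    return dict(zip(genes, benjamini_hochberg(pvals)))


# Toy experiment: 5 genes, 10 samples.
experiment_1 = {
    "GENE_A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "GENE_B": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "GENE_C": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    "GENE_D": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "GENE_E": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
qvalues = corrected_pvalues(experiment_1)
significant = [g for g, q in qvalues.items() if q < 0.05]
print(significant)
```

Only the gene altered in most samples falls below the 0.05 threshold used on the slide's colour scale; the rest stay close to 1.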
8. Statistics Cancer related genes identification
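Cancer type A
STEP 1: identification of driver alterations
STEP 2: combination of experiments
[Figure: the genes × samples matrix of experiment 1 (altered / not altered) yields one corrected p-value per gene; the per-experiment columns (exp. 1, exp. 2, exp. 3, ..., exp. n) are then combined into a single column for cancer type A; colour scale for the corrected p-value: 0, 0.05, 1]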
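STEP 2 combines the per-experiment p-values of a gene into one value per cancer type. The slide does not say how, so the sketch below assumes a weighted Z-method (Stouffer's method); the use of sample counts as weights and the example numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def combine_pvalues(pvalues, weights=None, eps=1e-12):
    """Combine one-sided p-values with the weighted Z-method (assumed here)."""
    if weights is None:
        weights = [1.0] * len(pvalues)
    # Clip to the open interval (0, 1) so the inverse CDF is defined.
    clipped = [min(max(p, eps), 1 - eps) for p in pvalues]
    z_scores = [_STD_NORMAL.inv_cdf(1 - p) for p in clipped]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return 1 - _STD_NORMAL.cdf(combined_z)


# One gene, three experiments of the same cancer type (toy values).
per_experiment_p = [0.04, 0.03, 0.06]
samples_per_experiment = [40, 25, 60]  # hypothetical weights

combined = combine_pvalues(per_experiment_p, weights=samples_per_experiment)
print(combined)
```

With this scheme, several moderately significant experiments reinforce each other, so the combined p-value is smaller than any single one. The combined values would still need multiple-testing correction across genes.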
9. Statistics Cancer related modules identification
20. More details...
IntOGen: Integration and data-mining of multidimensional oncogenomic data
Gundem G, Perez-Llamas C, Jene-Sanz A, Kedzierska A, Islam A,
Deu-Pons J, Furney S and Lopez-Bigas N.
Nature Methods, 7, 92-93 (2010)
www.intogen.org www.gitools.org
biomart.intogen.org
21. International Cancer Genome Consortium
50 cancer types
500 samples each cancer type
About 25000 genomes in total
Data Storage, Analysis & Management
23. Cancer genomes in the context of IntOGen
ICGC-CLL genome project
Samples: 7 CLL, 7 normal
Technology: RNA-seq
Alteration: differential expression (Roderic Guigo lab)
- Upregulated
- Downregulated
[Figure: genes × samples matrix, altered / not altered]
24. Cancer genomes in the context of IntOGen
ICGC-CLL genome project
Samples: 7 CLL, 7 normal
Technology: RNA-seq
Alteration: differential expression (Roderic Guigo lab)
- Upregulated
- Downregulated
[Figure: genes × samples matrix for the ICGC-CLL project (altered / not altered), shown beside the IntOGen genes × tumours / experiments matrix of corrected p-values; colour scale 0, 0.05, 1]
25. Cancer genomes in the context of IntOGen
ICGC-CLL genome project
Samples: 7 CLL, 7 normal
Technology: RNA-seq
Alteration: differential expression (Roderic Guigo lab)
- Upregulated
- Downregulated
[Figure: genes × samples matrix for the ICGC-CLL project (altered / not altered), shown beside the IntOGen genes × tumours matrix of corrected p-values; colour scale 0, 0.05, 1]
26. Cancer genomes in the context of IntOGen
ICGC-CLL genome project
Samples: 7 CLL, 7 normal
Technology: RNA-seq
Alteration: differential expression (Roderic Guigo lab)
- Upregulated
- Downregulated
[Figure: genes × samples matrix for the ICGC-CLL project (altered / not altered), shown beside the IntOGen genes × tumours matrix of corrected p-values; colour scale 0, 0.05, 1]
Enrichment analysis
[Figure: pathways × samples matrix and pathways × tumours matrix of corrected p-values; colour scale 0, 0.05, 1]
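The enrichment analysis step turns a list of altered genes into per-pathway p-values. The slide does not state which test is used; the sketch below assumes a one-sided hypergeometric test (over-representation). Gene and pathway names are hypothetical, and the multiple-testing correction implied by "corrected p-value" is left out for brevity.

```python
from math import comb


def hypergeom_sf(k, universe_size, pathway_size, n_altered):
    """P(X >= k) when drawing n_altered genes from the universe without replacement."""
    upper = min(pathway_size, n_altered)
    favourable = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_altered - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(universe_size, n_altered)


def pathway_enrichment(altered, pathways, universe):
    """Return an uncorrected over-representation p-value for each pathway."""
    universe = set(universe)
    altered = set(altered) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(altered & members)
        results[name] = hypergeom_sf(overlap, len(universe), len(members), len(altered))
    return results


# Toy data: 20 genes, two pathways of 5 genes each, 5 altered genes in one sample.
universe = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_X": ["g0", "g1", "g2", "g3", "g4"],
    "pathway_Y": ["g10", "g11", "g12", "g13", "g14"],
}
altered_in_sample = ["g0", "g1", "g2", "g3", "g15"]

results = pathway_enrichment(altered_in_sample, pathways, universe)
print(results)
```

Running this per sample gives one column of a pathways × samples matrix; running it on the genes significant in a tumour type gives the corresponding pathways × tumours column.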
29. Ethical considerations
Open access: data that cannot be used to identify individuals (age, normalized gene expression, ...)
Controlled access: germline genomic data and detailed clinical information associated with a unique individual
31. Technical considerations
User interfaces: Management, Gitools, Browser, Biomart, Web services
IntOGen core: Experiments management, Analysis management, Analysis workflows, Data management, Data models, Data importers
Infrastructure: Hadoop Map-Reduce, Hadoop DFS, Cascading, PIG, Amazon / Eucalyptus, Grid Engine, Plain files, MySQL, MongoDB, Bioinformatics software
32. Technical considerations
User interfaces: Management, Gitools, Browser, Biomart, Web services
IntOGen core: Experiments management, Analysis management, Analysis workflows, Data management, Data models, Data importers
Infrastructure: Hadoop Map-Reduce, Hadoop DFS, Cascading, PIG, Amazon / Eucalyptus, Grid Engine, Plain files, MySQL, MongoDB, Bioinformatics software
Additional components shown: Genome view, NGS workflows, Web management
33. Technical considerations
Flexibility
● Different ways to access the data
● Methods constantly evolving
● Methods implemented in different languages, with different infrastructure requirements
Scalability
● Quantity of data increases
● And also the number and complexity of calculations
34. Summary
IntOGen is a novel framework for oncogenomics data integration
and analysis
It integrates many tumor types and different types of alterations in
a common framework
It explores the data at different levels, from individual experiments
to combinations of experiments, and from individual genes to
biological modules
It incorporates an intuitive web system designed to be a
discovery tool for cancer researchers
I have presented some examples of how to use IntOGen and Gitools
to prioritize and compare personal genome data.
We are adapting IntOGen to store, analyze and visualize
next generation sequencing data, which will allow us to incorporate
data from the ICGC, starting with the Chronic Lymphocytic Leukemia
data.
Ethical and technological considerations have to be addressed.
35. Acknowledgements
Biomedical Genomics
Nuria López-Bigas
Gunes Gundem
Jordi Deu-Pons
Khademul Islam
Alba Jené-Sanz
Michael Schroeder
Xavier Rafael
Sophia Derdak
Abel Gonzalez-Pérez
Armand Gutierrez