Joaquín Dopazo 
Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Nod...
Looking for biomarkers 
Cases 
Controls 
A 
B 
C 
D 
E 
The simplest case: monogenic disease
A historic perspective: 
Candidate gene studies using GWAS 
Odds Ratio: 3.6 
95% CI = 1.3 to 10.4
NHGRI GWA Catalog 
www.genome.gov/GWAStudies 
Published Genome-Wide Associations 
By the time of the completion of the hum...
Lessons learned from GWAS 
•Many loci contribute to complex-trait variation 
•There is evidence for pleiotropy, i.e., that...
The missing heritability problem (individual genes cannot explain the heritability of traits) 
Where did the heritability ...
Genomic sequencing at rescue. NGS prices are in free fall. 
http://www.genome.gov/sequencingcosts/ 
Omics are all around
Again, looking for disease genes with new methodologies but same strategy … or why using the brain if the problem can be s...
Successive Filtering approach An example with 3-Methylglutaconic aciduria syndrome 
3-Methylglutaconic aciduria (3-MGA-uri...
Exome sequencing has been systematically used to identify Mendelian disease genes
Limitations of the single-gene approach Can disease-related variants be so easily detected? 
a)Interrogating 60Mb sites (e...
How to explain missing heritability? 
Rare Variants, rare CNVs, epigenetics or.. epistatic effects? 
Is the heritability r...
At the crossroad: how detection power 
of genomic technologies can be increased? 
There are 
two (non 
mutually 
exclusive...
Clear individual gene associations are difficult to find for most diseases 
Affected cases in complex diseases will be a h...
Modular nature of human genetic diseases 
•With the development of systems biology, studies have shown that phenotypically...
Initiatives to explore beyond single gene biomarkers 
LINCS aims to create a network-based understanding of biology by cat...
From single gene to function-based biomarkers 
SNPs WES/WGS 
Gene expression 
AND/OR 
Gene Ontology: define sets of genes ...
Analysis of SNP biomarkers in a GWAS 
GWAS in Breast Cancer. 
The CGEMS initiative. (Hunter et al. Nat Genet 2007) 
1145 c...
GSEA of the same GWAS data 
Breast Cancer 
CGEMS initiative. (Hunter et al. Nat 
Genet 2007) 
1145 cases 1142 controls. Af...
GO processes significantly associated to breast cancer 
Rho pathway 
Chromosomal instability 
Metastasis
From single gene to function-based biomarkers 
SNPs, NGS Gene expression Gene1 Gene2 Gene3 Gene4 : : : Gene22000 
Gene Ont...
From single gene to function-based biomarkers 
SNPs WES/WGS 
Gene expression 
Using protein interaction networks as an sca...
CACNA1F, CACNA2D4 
GNAT2 
RP 
CORD/COD 
CORD/COD 
CVD 
CVD 
MD 
LCA 
ERVR/EVR 
C2ORF71, C8ORF37, CA4,CERKL, CNGA1, CNGB1, ...
Connectivity among IRDs 
LCA-Leber Congenital Amaurosis CORD/COD- Cone and cone-rod dystro. CVD- Colour Vision Defects MD-...
CHRNA7 (rs2175886 p = 0.000607) IQGAP2 (rs950643 p = 0.0003585) DLC1 (rs1454947 p = 0.007526) RASGEF1A* (rs1254964 p = 3.8...
Mutation network analysis 
•This approach prioritizes well-known, infrequently mutated genes, which are shown to interact ...
From single gene to function-based biomarkers 
SNPs, gene expression, etc. 
GO 
Protein interaction networks 
Detection po...
From single gene to mechanism- based biomarkers 
Transforming gene expression values into another value that accounts for ...
A 
B 
A 
B 
B 
D 
C 
E 
F 
A 
G 
P(A→G activated) = P(A)P(B)P(D)P(F)P(G) + P(A)P(C)P(E)P(F)P(G) - P(A)P(F)P(G)P(B)P(C)P(D)...
From single gene to function-based biomarkers 
SNPs 
WES/WGS 
Gene expression 
We only need to estimate the probabilities ...
What would you predict about the consequences of gene activity changes in the apoptosis pathway in a case control experime...
Apoptosis inhibition is not obvious from gene expression 
Two of the three possible sub- pathways leading to apoptosis are...
Gene1 Gene2 Gene3 Gene4 : : : : : : : : : : : : : : : : : : : : : : Gene22000 
Raw measuret 
Transformed measure: 
Mechani...
Prediction of IC50 values from the activity of signaling circuits
Mechanism-based biomarkers have meaning by itself 
Unlike in single gene biomarkers, the selected mechanism-based biomarke...
From gene-based to function-based biomarkers 
SNPs, gene expression, etc. 
GO 
Protein interaction networks 
Models of cel...
Software development 
See interactive map of for the last 24h use http://bioinfo.cipf.es/toolsusage 
Babelomics is the thi...
The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… 
...the...
Upcoming SlideShare
Loading in …5
×

A New Generation Of Mechanism-Based Biomarkers For The Clinic

1,405 views

Published on

2nd Annual Next Generation Sequencing Data Congress
19-20 May, 2014
London, UK

Published in: Science
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,405
On SlideShare
0
From Embeds
0
Number of Embeds
7
Actions
Shares
0
Downloads
21
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

A New Generation Of Mechanism-Based Biomarkers For The Clinic

  1. 1. Joaquín Dopazo Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB), Bioinformatics Group (CIBERER) and Medical Genome Project, Spain. A New Generation Of Mechanism- Based Biomarkers For The Clinic http://bioinfo.cipf.es http://www.medicalgenomeproject.com http://www.babelomics.org http://www.hpc4g.org @xdopazo 2nd Next Generation Sequencing Data Congress, May 20th, 2014
  2. 2. Looking for biomarkers Cases Controls A B C D E The simplest case: monogenic disease
  3. 3. A historic perspective: Candidate gene studies using GWAS Odds Ratio: 3.6 95% CI = 1.3 to 10.4
  4. 4. NHGRI GWA Catalog www.genome.gov/GWAStudies Published Genome-Wide Associations By the time of the completion of the human genome sequence, in 2005, just a few genetic variants were known to be significantly associated to diseases. When the first exhaustive catalogue of GWAS was compiled, in 2008, only three years later, more than 500 single nucleotide polymorphisms (SNPs) were associated to traits Today such catalog had collected more than 1700 papers reporting 8800 SNPs significantly associated to more than 1700 traits.
  5. 5. Lessons learned from GWAS •Many loci contribute to complex-trait variation •There is evidence for pleiotropy, i.e., that the same variants are associated with multiple traits. •Much of the heritability of the trait cannot be explained by the loci/variants associated.
  6. 6. The missing heritability problem (individual genes cannot explain the heritability of traits) Where did the heritability go? How to explain all this? Rare Variants, rare CNVs, epigenetics or.. epistatic effects?
  7. 7. Genomic sequencing at rescue. NGS prices are in free fall. http://www.genome.gov/sequencingcosts/ Omics are all around
  8. 8. Again, looking for disease genes with new methodologies but same strategy … or why using the brain if the problem can be solved by brute force Patient 1 Patient 2 Patient 3 Control 1 Control 2 Control 3 mutation candidate gene (shares mutation for all patients but no controls)
  9. 9. Successive Filtering approach An example with 3-Methylglutaconic aciduria syndrome 3-Methylglutaconic aciduria (3-MGA-uria) is a heterogeneous group of syndromes characterized by an increased excretion of 3- methylglutaconic and 3-methylglutaric acids. WES with a consecutive filter approach is enough to detect the new mutation in this case.
  10. 10. Exome sequencing has been systematically used to identify Mendelian disease genes
  11. 11. Limitations of the single-gene approach Can disease-related variants be so easily detected? a)Interrogating 60Mb sites (exomes) produces too many variants. Many of these segregating with our experimental design b)There is a non-negligible amount of apparently deleterious variants with no pathologic effect c)In many cases we are not targeting rare but common variants (which occur in normal population) d)In many cases only one variant does not explain the disease but rather a combination of them e)Finally, variants found associated to the disease usually account for a small portion of the trait heritability
  12. 12. How to explain missing heritability? Rare Variants, rare CNVs, epigenetics or.. epistatic effects? Is the heritability really missing or are we looking at the wrong place? At the end, most of the heritability was there…
  13. 13. At the crossroad: how detection power of genomic technologies can be increased? There are two (non mutually exclusive) ways Scaling up: by increasing sample size. It is known that larger size allows detecting more individual gene (biomarker) associations. Limitations: Budget, patients availability and the own nature of the disease. Changing the perspective: systems approach to understand variation Interactions, multigenicity can be better detected and the role of variants understood in the context of disease mechanism. Limitations: Available information
  14. 14. Clear individual gene associations are difficult to find for most diseases Affected cases in complex diseases will be a heterogeneous population with different mutations (or combinations). Many cases and controls are needed to obtain significant associations. The only common element is the (know or unknown) functional module affected. Disease understood as the malfunction of a functional module Cases Controls
  15. 15. Modular nature of human genetic diseases •With the development of systems biology, studies have shown that phenotypically similar diseases are often caused by functionally related genes, being referred to as the modular nature of human genetic diseases (Oti and Brunner, 2007; Oti et al, 2008). •This modularity suggests that causative genes for the same or phenotypically similar diseases may generally reside in the same biological module, either a protein complex (Lage et al, 2007), a sub- network of protein interactions (Lim et al, 2006) , or a pathway (Wood et al, 2007) •This modular nature implies a positive correlation between gene–gene relatedness and phenotype–phenotype similarity. •This second-order association between gene and phenotype has been used for predicting disease genes
  16. 16. Initiatives to explore beyond single gene biomarkers LINCS aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents, and by using computational tools to integrate this diverse information into a comprehensive view of normal and disease states that can be applied for the development of new biomarkers and therapeutics. And the connectivity map, a pioneer initiative
  17. 17. From single gene to function-based biomarkers SNPs WES/WGS Gene expression AND/OR Gene Ontology: define sets of genes functionally related
  18. 18. Analysis of SNP biomarkers in a GWAS GWAS in Breast Cancer. The CGEMS initiative. (Hunter et al. Nat Genet 2007) 1145 cases 1142 controls. Affy 500K Conventional association test reports only significant 4 SNPs mapping only on one gene: FGFR2 Conclusions: conventional tests are not providing much resolution. What is the problem with them? Are there solutions?
  19. 19. GSEA of the same GWAS data Breast Cancer CGEMS initiative. (Hunter et al. Nat Genet 2007) 1145 cases 1142 controls. Affy 500K Only 4 SNPs were significantly associated, mapping only in one gene: FGFR2 Bonifaci et al., BMC Medical Genomics 2008; Medina et al., 2009 NAR PBA reveals 19 GO categories including regulation of signal transduction (FDR-adjusted p-value=4.45x10-03) in which FGFR2 is included.
  20. 20. GO processes significantly associated to breast cancer Rho pathway Chromosomal instability Metastasis
  21. 21. From single gene to function-based biomarkers SNPs, NGS Gene expression Gene1 Gene2 Gene3 Gene4 : : : Gene22000 Gene Ontology SNPs, gene exp. GO Detection power Low (only very prevalent genes) high Annotations available many many Use Biomarker Illustrative, give hints
  22. 22. From single gene to function-based biomarkers SNPs WES/WGS Gene expression Using protein interaction networks as an scaffold to interpret the genomic data in a functionally-derived context AND/OR What part of the interactome is active and/or is damaged
  23. 23. CACNA1F, CACNA2D4 GNAT2 RP CORD/COD CORD/COD CVD CVD MD LCA ERVR/EVR C2ORF71, C8ORF37, CA4,CERKL, CNGA1, CNGB1, DHDDS,EYS, FAM161A, IDH3B,KLHL7 IMPG2, MAK, NRL, PAP1, PDE6A, PDE6G, PRCD, PRF3, PRPF8, PRPF31 RBP3, RGR, ROM1, RP1, RP2, SNRNP200, TOPORS, TTC8 ZNF513 PDE6B, RHO, SAG GRK1, GRM6, NYX, TRPM1 CABP4, LCA5, RD3 CRB1, IMPDH1, LRAT, MERTK, RDH12, RPE65, SPATA7, TULP1 CRX AIPL1, GUCY2D, RPGRIP1 ADAM9, GUCA1A, HRG4/UNC119, KCNV2, PDE6H, PITPNM3, RAX2, RDH5, RIM1 CNGA3, PDE6C BCP, GCP, RCP ABCA4, PROM1, PRPH2, RPGR RLBP1, SEMA4A C1QTNF5, EFEMP1, ELOVL4, HMNC1, RS1, TIMP3 FSCN2, GUCA1B NR2E3 BEST1 FZD4, KCNJ13, LRP5, NDP, TSPAN12, VCAN NB ABHD12, CDH23, CIB2, DFNB31, GPR98, HARS, MYO7A, PCDH15, USH1C, USH1G CLRN1, USH2A USH CEP290 BBS1 BBS ARL6,, BBS2, BBS4, BBS5, BBS7, BBS9, BBS10, BBS12,, INPP5E, LZTFL1, MKKS, MKS1, SDCCAG8, TRIM32, TTC8 Example with Inherited Retinal Dystrophies Genetic overlapping among IRDs LCA-Leber Congenital Amaurosis CORD/COD- Cone and cone-rod dystro. CVD- Colour Vision Defects MD- Macular Degeneration ERVR/EVR- Erosive and Exudative Vitreoretinopathies USH- Usher Syndrome RP- Retinitis Pigmentosa NB- Night Blindness BBS- Bardet-Biedl Syndrome
  24. 24. Connectivity among IRDs LCA-Leber Congenital Amaurosis CORD/COD- Cone and cone-rod dystro. CVD- Colour Vision Defects MD- Macular Degeneration ERVR/EVR- Erosive and Exudative Vitreoretinopathies USH- Usher Syndrome RP- Retinitis Pigmentosa NB- Night Blindness BBS- Bardet-Biedl Syndrome BBS LCA CORD/COD CVD MD ERVR/EVR USH RP NB Significant Clustering coefficient, p-value=0.0103 SNOW Tool. Minguez et al., NAR 2009 Implemented in Babelomics (http://www.babelomics.org)
  25. 25. CHRNA7 (rs2175886 p = 0.000607) IQGAP2 (rs950643 p = 0.0003585) DLC1 (rs1454947 p = 0.007526) RASGEF1A* (rs1254964 p = 3.856x10-05) *no interactions known (yet) SNPs validated in independent cohorts Network analysis helps to find disease genes
  26. 26. Mutation network analysis •This approach prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single- gene-based approaches. •These observations suggest cancer genes could mutate in a broad range of frequency spectrums, making it difficult for the frequency-based filtering approach to be effective. Jia 2013 PLoS Comp. Biol. BRAF mutated in 76 samples MAP2K2 mutated in 4 samples BRAF and MAP2K2 interact in 80 samples Subgraphs in the melanoma consensus mutation network
  27. 27. From single gene to function-based biomarkers SNPs, gene expression, etc. GO Protein interaction networks Detection power Low (only very prevalent genes) High High Information coverage Almost all Almost all Less (~9000 genes in human) Use Biomarker Illustrative, give hints Biomarker* *Need of extra information (e.g. GO) to provide functional insights in the findings
  28. 28. From single gene to mechanism- based biomarkers Transforming gene expression values into another value that accounts for a function. Easiest example of modeling function: signaling pathways. Function: transmission of a signal from a receptor to an effector Activations and repressions occur Receptor Effector
  29. 29. A B A B B D C E F A G P(A→G activated) = P(A)P(B)P(D)P(F)P(G) + P(A)P(C)P(E)P(F)P(G) - P(A)P(F)P(G)P(B)P(C)P(D)P(E) Prob. = [1-P(A activated)]P(B activated) Prob. = P(A activated)P(B activated) Activation Inhibition Sub-pathway Modeling pathways
  30. 30. From single gene to function-based biomarkers SNPs WES/WGS Gene expression We only need to estimate the probabilities of gene activation and integrity and then calculate the probability for each circuit of being switched on or off. AND/OR
  31. 31. What would you predict about the consequences of gene activity changes in the apoptosis pathway in a case control experiment of colorectal cancer? The figure shows the gene up-regulations (red) and down- regulations (blue) The effects of changes in gene activity over pathway functionality are not obvious
  32. 32. Apoptosis inhibition is not obvious from gene expression Two of the three possible sub- pathways leading to apoptosis are inhibited in colorectal cancer. Upper panel shows the inhibited sub-pathways in blue. Lower panel shows the actual gene up- regulations (red) and down- regulations (blue) that justify this change in the activity of the sub- pathways
  33. 33. Gene1 Gene2 Gene3 Gene4 : : : : : : : : : : : : : : : : : : : : : : Gene22000 Raw measuret Transformed measure: Mechanism-based biomarker Circuit1 Circuit2 Circuit3 Circuit4 : : Circuit800 Sub-selection of highly discriminative mechanism-based biomarkers Mechanism-based biomarkers (signaling circuits here) can be used to predict phenotypes that can be discrete (e.g. disease subtype) or even continuous (e.g. the IC50 of drugs, model animals from cell lines, etc.). Mechanisms-based biomarkers can also be used to predict features : : :
  34. 34. Prediction of IC50 values from the activity of signaling circuits
  35. 35. Mechanism-based biomarkers have meaning by itself Unlike in single gene biomarkers, the selected mechanism-based biomarkers (probability of circuit activation) have meaning by itself. Survival activation in the apoptosis pathway is one of the predictive mechanism-based biomarkers of breast cancer.
  36. 36. From gene-based to function-based biomarkers SNPs, gene expression, etc. GO Protein interaction networks Models of cellular functions Detection power Low (only very prevalent genes) High High Very high Information coverage Almost all Almost all Low (~9000 genes in human) Low (~6700 genes in human)* Use Biomarker Illustrative, give hints Biomarker Biomarker that explain disease mechanism *Only ~800 genes in human signaling pathways
  37. 37. Software development See interactive map of for the last 24h use http://bioinfo.cipf.es/toolsusage Babelomics is the third most cited tool for functional analysis. Includes more than 30 tools for advanced, systems-biology based data analysis More than 150.000 experiments were analyzed in our tools during the last year HPC on CPU, SSE4, GPUs on NGS data processing Speedups up to 40X Genome maps is now part of the ICGC data portal Ultrafast genome viewer with google technology Mapping Visualization Functional analysis Variant annotation CellBase Knowledge database Variant prioritization NGS panels Signaling network Regulatory network Interaction network Diagnostic
  38. 38. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the CIBERER Network of Centers for Rare Diseases, and… ...the Medical Genome Project (Sevilla) @xdopazo @bioinfocipf

×