SlideShare a Scribd company logo
1 of 45
Download to read offline
Joaquín Dopazo
Computational Genomics Department,
Centro de Investigación Príncipe Felipe (CIPF),
Functional Genomics Node, (INB),
Bioinformatics in Rare Diseases (BiER-CIBERER),
Valencia, Spain.
Digging into thousands of variants to
find disease genes in Mendelian and
complex diseases
http://bioinfo.cipf.es
http://www.babelomics.org
@xdopazo
IV DCEX symposium, Barcelona, November 17th, 2015
Precision medicine (P4*) is based on a better knowledge of phenotype-
genotype relationships
Requires of a better way of defining diseases by introducing genomic
technologies in the diagnostic procedures and treatment decisions
*P4: Predictive, Preventive, Personalized, Participative
Setting the problem in context:
The transition to precision medicine
Intuitive
Based on trial
and error
Identification of
probabilistic
patterns
Decisions and
actions based
on knowledge
Intuitive Medicine Empirical Medicine Precision Medicine (P4)
Today Tomorrow
Degree of personalization
Empirical Medicine First
Phase I: generation of the knowledge database
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
-----
sequencing
Patient
Variants
Database. Query
Therapy outcome
System feedback
Genomic variants (biomarkers) can
be quickly associated to precise
diagnostics and therapy outcomes
Initially the system will need
much feedback: Knowledge
generation phase.
Empirical medicine
Knowledge
database
Genome projects enable
knowledge generation
Rare and familiar diseases
exome sequencing initiative
• Metabolic (86 samples)
• Optiz
• Atypical fracture
• coQ10 deficiency
• Congenital disorder of glycosylation types I and II
• Maple syrup urine disease
• Pelizaeus-like
• 4 unknown syndroms
• Genetic (24 samples)
• Charcot-Marie-Tooth
• Rett Syndrome
• Neurosensorial (35 samples)
• Usher
• AD non-syndromic hearing loss
• AR non-syndromic hearing loss
• RP
• Mitochondrial (28 samples)
• Progressive External Oftalmoplegy
• Multi-enzymatic deficiency in mitochondrial
respiratory complexes
• CoQ disease
• Other
• APL (10 samples)
Autism (37 samples)
Mental retardation (autosomal recessive) (24)
Immunodeficiency (18)
Leber's congenital amaurosis (9)
Cataract (2)
RP(AR) (60)
RP(AD) (46)
Deafness (24)
CLAPO (4)
Skeletal Dysplasia (3)
Cantú syndrome (1)
Dubowitz syndrome (2)
Gorham-Stout syndrome (1)
Malpuech syndrome (4)
Hirschprung’s disease (81)
Hereditary macrothrombocytopenia (3)
MTC (41)
Controls (301)
1044 samples = 183 samples + 200 controls + 360 samples + 301 controls
Data management, analysis
and storage
http://www.gbpa.es/
GCGTATAG
CACGGGTA
TCTGTATTA
TGGTGGAT
ATCAGCGG
ATTGCGATT
GGCAGAGC
GGCAAAGT
GCGTATAG
CACGGGTA
TCTGTATTA
TGGTGGAT
ATCAGCGG
ATTGCGATT
GGCAGAGC
GGCAAAGT
GCGTATAG
CACGGGTA
TCTGTATTA
TGGTGGAT
ATCAGCGG
ATTGCGATT
GGCAGAGC
GGCAAAGT
GCGTATAG
CACGGGTA
TCTGTATTA
TGGTGGAT
ATCAGCGG
ATTGCGATT
GGCAGAGC
GGCAAAGT
Raw files
(FastQ)
DB
Analysis
Pipeline
Storage
K-DB
Gene 1 ksdhkahcka
Gene 2 jckacsksda
Gene 3 lkkxkccj<jdc
Gene 4 ksfdjvjvlsdkvjd
Gene 5 kckcksñdksd
Gene 6 ldkdkcksdcldl
Gene x kcdlkclkldsklk
Gene Y jcdksdkcdks
Prioritization
report
Dialog with experts in the
disease + validations
Samples
GCGTATAG
CACGGGTA
TCTGTATTA
TGGTGGAT
ATCAGCGG
GCGTATAG
CACGGGTA
TCTGTATTA
TGGTGGAT
ATCAGCGG
VCF BAM
Processed files
Casablanca: Round up all suspicious
characters and search them for stolen
documents.
The prioritization process
Typically, an exome renders between 40 and 60K variants (and a
genome about 1 million). Only one or a few among all of them are
expected to be the causative factors of the disease.
The prioritization process is like a police investigation in which
suspected are discarded by their alibies
Thus, through sequential heuristic filtering steps, unlikely candidates
are discarded and a final, reduced list with one or a few candidate
genes is (hopefully) produced
Agatha Christie
40-60K suspects?
We need…
Pipeline of data analysis
Primary
processing
Initial QC
FASTQ file
Mapping
BAM file
Variant calling
VCF File
Knowledge-based
prioritization
Proximity to other
known disease genes
Functional proximity
Network proximity
Burden tests
Other prioritization
methods
Secondary analysis
(Heuristic filtering)
Variant annotation
Filtering by effect
Filtering by MAF
Filtering by family
segregation
Primary
analysis
Gene prioritization
Pipeline of data analysis
Primary
processing
Initial QC
FASTQ file
Mapping
BAM file
Variant calling
VCF File
Knowledge-based
prioritization
Proximity to other
known disease genes
Functional proximity
Network proximity
Burden tests
Other prioritization
methods
Secondary analysis
(Heuristic filtering)
Variant annotation
Filtering by effect
Filtering by MAF
Filtering by family
segregation
Primary
analysis
Gene prioritization
CellBase
Knowledge DB
(Now at EBI)
The knowledge database for
filtering by effect
CellBase (Bleda, 2012, NAR), a
comprehensive integrative database
and RESTful Web Services API:
● Core features: genes, transcripts,
exons, cytobands, proteins (UniProt),...
● Variation: dbSNP and Ensembl SNPs,
HapMap, 1000Genomes, Cosmic,
ClinVar, ...
● Functional: 40 OBO ontologies (Gene
Ontology, HPO, etc.), Interpro, etc.
● Regulatory: TFBS, miRNA targets,
conserved regions, tissue expression
etc.
● System biology: Interactome (IntAct),
Reactome database, co-expressed
genes.
NoSQL (Mongo) and scales to TB
Wiki: http://docs.bioinfo.cipf.es/projects/cellbase/wiki
Project: http://bioinfo.cipf.es/compbio/cellbase
Now available at the EBI: http://www.ebi.ac.uk/cellbase/webservices/rest/v3/ and the
University of Cambridge: http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest
Pipeline of data analysis
Primary
processing
Initial QC
FASTQ file
Mapping
BAM file
Variant calling
VCF File
Knowledge-based
prioritization
Proximity to other
known disease genes
Functional proximity
Network proximity
Burden tests
Other prioritization
methods
Secondary analysis
(Heuristic filtering)
Variant annotation
Filtering by effect
Filtering by MAF
Filtering by family
segregation
Primary
analysis
Gene prioritization
1000 genomes
EVS
Local variants
3-Methylglutaconic aciduria (3-
MGA-uria) is a heterogeneous
group of syndromes
characterized by an increased
excretion of 3-methylglutaconic
and 3-methylglutaric acids.
WES with a consecutive filter
approach is enough to detect
the new mutation in this case.
Heuristic Filtering approach
An example with 3-Methylglutaconic aciduria syndrome
Use known variants and their
population frequencies to filter out.
• Typically dbSNP, 1000 genomes and
the 6515 exomes from the ESP are
used as sources of population
frequencies.
• We sequenced 300 healthy controls
(rigorously phenotyped) to add and
extra filtering step to the analysis
pipeline
Novembre et al., 2008. Genes mirror
geography within Europe. Nature
Comparison of MGP controls to 1000g
How important do you
think local information is
to detect disease genes?
Filtering with or without local variants
Number of genes as a function of individuals in the study of a dominant disease
Retinitis Pigmentosa autosomal dominant
The use of local
variants makes
an enormous
difference
The CIBERER Spanish Variant Server (CSVS): the first
repository of variability of the Spanish population
Only another similar initiative
exists: the GoNL
http://www.nlgenome.nl/
http://ciberer.es/bier/exome-server/
And more recently
the Finnish
and the Icelandic
populations
Information provided
Genotype frequencies
in the different
reference populations
Genomic coordinates, variation, gene.
SNPid
if any
Information provided
Pathogenicity indexes
Phenotype,
if available
Accelerating the loop of knowledge generation
BiERapp: interactive web-based tool for easy
candidate prioritization by successive filtering
SEQUENCING CENTER
Data
preprocessing
VCF
FASTQ
Genome
Maps
BAM
BiERapp filters
No-SQL (Mongo)
VCF indexing
Population
frequencies Consequence types
Experimental
design
BAM viewer and
Genomic context?
Easy
scaleup
Quick contribution to empirical medicine: new variants
and disease genes found with WES and heuristic filtering
WES
IRDs
arRP
(EYS)
BBS
arRParRP
(USH2)
3-MGA-
uria
(SERAC1)
NBD
(BCKDK )
In less than two years, the CIBERER initiative rendered 36 new mutations in
known disease genes and 27 new mutations in13 new disease genes.
Limitations of the single-gene
approach
One might or might not be aware of it, but the single-gene
approach implies the following assumptions:
• Only one gene (or at least one main gene) is the
causative agent of the disease.
• The disease has complete (or very high) penetrance.
• Deleterious variation with no phenotype is not expected
(or expected to be marginal)
Even Mendelian genes can be elusive.
Intuitive belief: multiple family information should help
Families
1 2 3 4 5 6
Variants 3403 82 4 0 0 0
Genes 2560 331 35 8 1 0
Observation: this is not always true, not even in cases
of Mendelian diseases
Limitations of the single-gene
approach
One might or might not be aware of it, but the single-gene
approach implies the following assumptions:
• Only one gene (or at least one main gene) is the
causative agent of the disease.
• The disease has complete (or very high) penetrance.
• Deleterious variation with no phenotype is not expected
(or expected to be marginal)
However, a high level of deleterious
variability exists in the human genome
Variants predicted to severely affect the function
of human protein coding genes (known as loss-of-
function -LOF- variants) were thought:
To have a potential deleterious effect
To be associated to severe Mendelian
disease
An unexpectedly large number of LOF variants
have been found in the genomes of apparently
healthy individuals: 281-515 missense
substitutions per individual, 40-85 of them in
homozygous state and predicted to be highly
damaging.
A similar proportion was observed in miRNAs
and possibly affect to any functional element in
the genome
Such apparently deleterious mutation must be first detected and
then distinguished from real pathological mutations
So far so good… but is the single-gene
approach realistic? Can we detect such
disease-related variants so easily?
There are several problems:
a) There is a non-negligible amount of apparently deleterious variants
that seem to have no pathologic effect (effect filter fails)
b) Sometimes we are not targeting rare but common variants (which
occur in normal population) (MAF filter fails)
c) Interrogating 60Mb sites produces too many variants. Many of these
segregating with our experimental design (segregation filter fails)
d) In many cases only one variant does not explain the disease but rather
a combination of them (single-gene concept fails)
That results in a low ratio of discovery of variants associated to the
disease, that usually account for a small portion of the trait heritability
Clear individual gene associations are difficult
to find for most diseases
Affected cases in complex diseases actually conform a heterogeneous population with
different mutations (or combinations).
Many cases and controls are needed to obtain a few significant associations (more
prevalent, often population-specific variants, non-reproducible).
The only common element is the (know or unknown) functional module affected.
Disease understood as the failure of a functional module
CasesControls
Modular nature of human
genetic diseases
• With the development of systems biology, studies have shown that
phenotypically similar diseases are often caused by functionally related
genes, being referred to as the modular nature of human genetic
diseases (Oti and Brunner, 2007; Oti et al, 2008).
• This modularity suggests that causative genes for the same or
phenotypically similar diseases may generally reside in the same
biological module, either a protein complex (Lage et al, 2007), a sub-
network of protein interactions (Lim et al, 2006) , or a pathway (Wood et
al, 2007)
Disease genes are close in the interactome
Goh 2007 PNAS
Same disease
in different
populations is
caused by
different genes
affecting the
same functions
Fernandez 2013
Orphanet J Rare
Dis. 2013
Pipeline of data analysis
Primary
processing
Initial QC
FASTQ file
Mapping
BAM file
Variant calling
VCF File
Knowledge-based
prioritization
Proximity to other
known disease genes
Functional proximity
Network proximity
Burden tests
Other prioritization
methods
Secondary analysis
(Heuristic filtering)
Variant annotation
Filtering by effect
Filtering by MAF
Filtering by family
segregation
Primary
analysis
Gene prioritization
Defining functional modules with the
interactome
SNPs
WES/WGS
Gene
expression
Using protein interaction
networks as an scaffold to
interpret the genomic data in
a functionally-derived context
AND/OR
What part of the
interactome is active
and/or is damaged
CHRNA7 (rs2175886 p = 0.000607)
IQGAP2 (rs950643 p = 0.0003585)
DLC1 (rs1454947 p = 0.007526)
SNPs validated in independent cohorts
Network analysis helps to find disease genes
Network analysis and the 3D
interactome in cancer
Positively selected mutations in cancer tend to affect interaction
domains, and preferentially in proteins central in the interactome
Porta-Pardo et al., 2015 PLoS Comp Biol
Fig 5. Mutations in TP53 interface predict patient's outcomes.
Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A (2015) A Pan-Cancer Catalogue of Cancer Driver Protein Interaction
Interfaces. PLoS Comput Biol 11(10): e1004518. doi:10.1371/journal.pcbi.1004518
http://journals.plos.org/ploscompbiol/article?id=info:doi/10.1371/journal.pcbi.1004518
Fig 3. Network properties of interface-driver genes.
Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A (2015) A Pan-Cancer Catalogue of Cancer Driver Protein Interaction
Interfaces. PLoS Comput Biol 11(10): e1004518. doi:10.1371/journal.pcbi.1004518
http://journals.plos.org/ploscompbiol/article?id=info:doi/10.1371/journal.pcbi.1004518
A total of 103
genes with
potentially
driver
interfaces were
found (73 of
them yet
undescribed)
From gene-based to
mechanism-based perspective
Transforming gene expression values into another value
that accounts for a function. Easiest example of modeling
function: signaling pathways. Function: transmission of a
signal from a receptor to an effector
Receptors Effectors Important assumption:
collective changes in
gene expression within
the context of a
signaling circuit are
proxies of changes in
protein activation
What would you
predict about the
consequences of
gene activity changes
in the apoptosis
pathway in a case
control experiment of
colorectal cancer?
The figure shows the
gene up-regulations
(red) and down-
regulations (blue)
The effects of changes in gene
activity are not obvious
Apoptosis
inhibition is
not obvious
from gene
expression
Two of the three possible sub-
pathways leading to apoptosis
are inhibited in colorectal
cancer. Upper panel shows the
inhibited sub-pathways in blue.
Lower panel shows the actual
gene up-regulations (red) and
down-regulations (blue) that
justify this change in the activity
of the sub-pathways
Pathway analysis helps, in addition, to
understand disease mechanisms
Fanconi Anemia is a rare chromosome instability syndrome characterized by aplastic
anemia and cancer and leukemia susceptibility. It has been proposed that disruption of the
apoptotic control, a hallmark of FA, accounts for part of the phenotype of the disease.
No
proliferation
No
degradation
Survival
No
degradation
No
apoptosis
Activation
apoptosis
pathway
Cases and controls: Different signaling
As tumor grade / stage
progresses…
Up
• Cell survival
• Proliferation, differentiation
• Cell cycle progression
• Antiapoptosis
Down
• Glucose homeostasis
• Metabolism
• Degradation
Tumor grade / stage
Signaling changes and functionalities affected in
543 Kidney Renal Clear Cell Carcinoma samples
from ICGC/TCGA database
Using circuit activation probabilities
as features for prediction
Circuit activation
probabilities are
mechanism-based
biomarkers
1. Generation of features: signaling
circuit activities
2. Training set
3. Prediction
Prediction of IC50 values from the
activity of signaling circuits
Is gene expression a good proxy
for protein activity?
However, collective changes
in gene expression within the
context of a signalling circuit
can be considered good
proxies of changes in protein
activation
The scarce phosphoproteomic
data available are consistent
with circuit activations
predicted from gene
expression changes
The observation of the expression of a gene does not guarantee that
the corresponding protein is present and active.
Circuit TNF-NPY from the Adipocytokine signaling pathway (hsa04920).
Nodes in red contain proteins found to be hyperphosphorylated when
comparing treated versus untreated lung cancer cells. Nodes in blue
contain proteins found to be dephosphorylated in the same comparison.
BiologicalvariabilityTechnicalvariability
Technical variability is enormously
reduced at the pathway level
Human neuroblastoma cell lines.
RNA-seq vs microarrays
Errors across genes seem
to be cancelled within the
signaling circuits
Gene1
Gene2
Gene3
Gene4
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:
Gene22000
Raw measurement
Transformed
measurement:
Receptor to effector
signaling activities
Circuit1
Circuit2
Circuit3
Circuit4
:
:
:
:
:
:
:
:
Circuit800
:
:
:
Abstraction levels when converting
genomic data into functional information
Second
transformation:
All receptors to
final effector
signaling
activities
Third
transformation:
Final effector
activities to
functions triggered
Some
functions
can be
cancer
hallmarks
Gaining insight into disease
mechanisms and MoA
From gene-based to function-based analysis perspective
SNPs, gene
expression,
etc.
GO
Protein
interaction
networks
Models of
cellular
functions
Detection
power
Low (only
very
prevalent
genes)
High High Very high
Information
coverage
Almost all Almost all Low (~9000
genes in
human)
Low (~6700
genes in
human)*
Use Biomarker Illustrative,
give hints
Biomarker Biomarker
that explain
disease
mechanism
*Only ~800 genes in human signaling pathways
The use of new algorithms that enable the transformation of genomic
measurements into cell functionality measurements that account for
disease mechanisms and for drug mechanisms of action will ultimately
allow the real transition from today’s empirical medicine to precision
medicine and provide increasingly personalized medicine
The real transition to precision medicine
Intuitive
Based on trial
and error
Identification of
probabilistic
patterns
Decisions and
actions based
on knowledge
Intuitive Medicine Empirical Medicine Precision Medicine
Today Tomorrow
Degree of personalization
Software available
See interactive map of for the last 24h use http://bioinfo.cipf.es/toolsusage
Babelomics is the third most cited tool for
functional analysis. Includes more than 30
tools for advanced, systems-biology based
data analysis
More than 150.000 experiments were analyzed in our tools during the last year
HPC on CPU, SSE4,
GPUs on NGS data
processing
Speedups up to 40X
Genome maps is now part
of the ICGC data portal
Ultrafast
genome
viewer with
google
technology
Mapping
Visualization
Functional analysis
Variant annotation
CellBase
Knowledge
database
Variant
prioritization
NGS
panels
Signaling network Regulatory
network
Interaction
network
Diagnostic
The Computational Genomics Department at the
Centro de Investigación Príncipe Felipe (CIPF),
Valencia, Spain, and…
...the INB, National
Institute of
Bioinformatics
(Functional Genomics
Node)
and the BiER
(CIBERER Network of
Centers for Rare
Diseases)
@xdopazo
@bioinfocipf
Follow us on twitter

More Related Content

What's hot

The trivial case of the missing heritability
The trivial case of the missing heritabilityThe trivial case of the missing heritability
The trivial case of the missing heritability
Max Moldovan
 
Gardner and Song_2015_Genetics in Medicine
Gardner and Song_2015_Genetics in MedicineGardner and Song_2015_Genetics in Medicine
Gardner and Song_2015_Genetics in Medicine
Amanda Natalizio
 
A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...
A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...
A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...
Antoaneta Vladimirova
 
What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...
mhb120
 

What's hot (20)

Forum on Personalized Medicine: Challenges for the next decade
Forum on Personalized Medicine: Challenges for the next decadeForum on Personalized Medicine: Challenges for the next decade
Forum on Personalized Medicine: Challenges for the next decade
 
The trivial case of the missing heritability
The trivial case of the missing heritabilityThe trivial case of the missing heritability
The trivial case of the missing heritability
 
Big Data and Genomic Medicine by Corey Nislow
Big Data and Genomic Medicine by Corey NislowBig Data and Genomic Medicine by Corey Nislow
Big Data and Genomic Medicine by Corey Nislow
 
Big Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized MedicineBig Data Challenges for Real-Time Personalized Medicine
Big Data Challenges for Real-Time Personalized Medicine
 
JALANov2000
JALANov2000JALANov2000
JALANov2000
 
Overview of the ECDC whole genome sequencing strategy
Overview of the ECDC whole genome sequencing strategyOverview of the ECDC whole genome sequencing strategy
Overview of the ECDC whole genome sequencing strategy
 
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
Identification of antibiotic resistance genes in Klebsiella pneumoniae isolat...
 
Gardner and Song_2015_Genetics in Medicine
Gardner and Song_2015_Genetics in MedicineGardner and Song_2015_Genetics in Medicine
Gardner and Song_2015_Genetics in Medicine
 
A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...
A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...
A Retrospective Analysis of Exome Sequencing Cases Using the GenePool™ Genomi...
 
Monarch Initiative Poster - Rare Disease Symposium 2015
Monarch Initiative Poster - Rare Disease Symposium 2015Monarch Initiative Poster - Rare Disease Symposium 2015
Monarch Initiative Poster - Rare Disease Symposium 2015
 
Cell Authentication By STR Profiling
Cell Authentication By STR ProfilingCell Authentication By STR Profiling
Cell Authentication By STR Profiling
 
Raj Lab Meeting May/01/2019
Raj Lab Meeting May/01/2019Raj Lab Meeting May/01/2019
Raj Lab Meeting May/01/2019
 
The Clinical Genome Conference 2014
The Clinical Genome Conference 2014The Clinical Genome Conference 2014
The Clinical Genome Conference 2014
 
Genovesio et al j biomol screen 2011-genovesio-1087057111415521
Genovesio et al j biomol screen 2011-genovesio-1087057111415521Genovesio et al j biomol screen 2011-genovesio-1087057111415521
Genovesio et al j biomol screen 2011-genovesio-1087057111415521
 
FunGen JC Presentation - Mostafavi et al. (2019)
FunGen JC Presentation - Mostafavi et al. (2019)FunGen JC Presentation - Mostafavi et al. (2019)
FunGen JC Presentation - Mostafavi et al. (2019)
 
Bacterial Pathogen Genomics at NCBI
Bacterial Pathogen Genomics at NCBIBacterial Pathogen Genomics at NCBI
Bacterial Pathogen Genomics at NCBI
 
Mohammed I Khan
Mohammed I KhanMohammed I Khan
Mohammed I Khan
 
Invicta eshre-poster-pregnancy rate after frozen blastocyst
Invicta eshre-poster-pregnancy rate after frozen blastocystInvicta eshre-poster-pregnancy rate after frozen blastocyst
Invicta eshre-poster-pregnancy rate after frozen blastocyst
 
What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...
 
Translational Genomics and Prostate Cancer: Meet the NGS Experts Series Part 2
Translational Genomics and Prostate Cancer: Meet the NGS Experts Series Part 2Translational Genomics and Prostate Cancer: Meet the NGS Experts Series Part 2
Translational Genomics and Prostate Cancer: Meet the NGS Experts Series Part 2
 

Similar to Digging into thousands of variants to find disease genes in Mendelian and complex diseases

Open Source Pharma /Genomics and clinical practice / Prof Hosur
Open Source Pharma /Genomics and clinical practice / Prof Hosur Open Source Pharma /Genomics and clinical practice / Prof Hosur
Open Source Pharma /Genomics and clinical practice / Prof Hosur
opensourcepharmafound
 
Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6
stanislas547
 
508 search for genomic and proteomic risk factors and protective factors asso...
508 search for genomic and proteomic risk factors and protective factors asso...508 search for genomic and proteomic risk factors and protective factors asso...
508 search for genomic and proteomic risk factors and protective factors asso...
SHAPE Society
 

Similar to Digging into thousands of variants to find disease genes in Mendelian and complex diseases (20)

Gene hunting strategies
Gene hunting strategiesGene hunting strategies
Gene hunting strategies
 
Open Source Pharma /Genomics and clinical practice / Prof Hosur
Open Source Pharma /Genomics and clinical practice / Prof Hosur Open Source Pharma /Genomics and clinical practice / Prof Hosur
Open Source Pharma /Genomics and clinical practice / Prof Hosur
 
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med...
 
Molecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfMolecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdf
 
Manteia non confidential-presentation 2003-09
Manteia non confidential-presentation 2003-09Manteia non confidential-presentation 2003-09
Manteia non confidential-presentation 2003-09
 
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
 
MLGG_for_linkedIn
MLGG_for_linkedInMLGG_for_linkedIn
MLGG_for_linkedIn
 
A systematic approach to Genotype-Phenotype correlations
A systematic approach to Genotype-Phenotype correlationsA systematic approach to Genotype-Phenotype correlations
A systematic approach to Genotype-Phenotype correlations
 
The Foundation of P4 Medicine
The Foundation of P4 MedicineThe Foundation of P4 Medicine
The Foundation of P4 Medicine
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
2017 molecular profiling_wim_vancriekinge
2017 molecular profiling_wim_vancriekinge2017 molecular profiling_wim_vancriekinge
2017 molecular profiling_wim_vancriekinge
 
From empirical biomarkers to models of disease mechanisms in the transition t...
From empirical biomarkers to models of disease mechanisms in the transition t...From empirical biomarkers to models of disease mechanisms in the transition t...
From empirical biomarkers to models of disease mechanisms in the transition t...
 
gene mapping, clonning of disease gene(1).pptx
gene mapping, clonning of disease gene(1).pptxgene mapping, clonning of disease gene(1).pptx
gene mapping, clonning of disease gene(1).pptx
 
Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6Sk microfluidics and lab on-a-chip-ch6
Sk microfluidics and lab on-a-chip-ch6
 
Genomics experimental-methods
Genomics experimental-methodsGenomics experimental-methods
Genomics experimental-methods
 
The State of Play in Diagnosis
The State of Play in DiagnosisThe State of Play in Diagnosis
The State of Play in Diagnosis
 
2015 TriCon - Clinical Grade Annotations - Public Data Resources for Interpre...
2015 TriCon - Clinical Grade Annotations - Public Data Resources for Interpre...2015 TriCon - Clinical Grade Annotations - Public Data Resources for Interpre...
2015 TriCon - Clinical Grade Annotations - Public Data Resources for Interpre...
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomics
 
Supporting Genomics in the Practice of Medicine by Heidi Rehm
Supporting Genomics in the Practice of Medicine by Heidi RehmSupporting Genomics in the Practice of Medicine by Heidi Rehm
Supporting Genomics in the Practice of Medicine by Heidi Rehm
 
508 search for genomic and proteomic risk factors and protective factors asso...
508 search for genomic and proteomic risk factors and protective factors asso...508 search for genomic and proteomic risk factors and protective factors asso...
508 search for genomic and proteomic risk factors and protective factors asso...
 

More from Joaquin Dopazo

More from Joaquin Dopazo (7)

Accelerating the benefits of genomics worldwide
Accelerating the benefits of genomics worldwideAccelerating the benefits of genomics worldwide
Accelerating the benefits of genomics worldwide
 
Navigating through disease maps
Navigating through disease mapsNavigating through disease maps
Navigating through disease maps
 
Differential metabolic activity and discovery of therapeutic targets using su...
Differential metabolic activity and discovery of therapeutic targets using su...Differential metabolic activity and discovery of therapeutic targets using su...
Differential metabolic activity and discovery of therapeutic targets using su...
 
Taller Genómica y cáncer Introducción
Taller Genómica y cáncer IntroducciónTaller Genómica y cáncer Introducción
Taller Genómica y cáncer Introducción
 
A look at the human mutational load from the systems biology perspective
A look at the human mutational load from the systems biology perspectiveA look at the human mutational load from the systems biology perspective
A look at the human mutational load from the systems biology perspective
 
Functional profile of the pre- to post-mortem transition in blood
Functional profile of the pre- to post-mortem transition in bloodFunctional profile of the pre- to post-mortem transition in blood
Functional profile of the pre- to post-mortem transition in blood
 
Big data genomico
Big data genomicoBig data genomico
Big data genomico
 

Recently uploaded

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 

Recently uploaded (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 

Digging into thousands of variants to find disease genes in Mendelian and complex diseases

  • 1. Joaquín Dopazo Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB), Bioinformatics in Rare Diseases (BiER-CIBERER), Valencia, Spain. Digging into thousands of variants to find disease genes in Mendelian and complex diseases http://bioinfo.cipf.es http://www.babelomics.org @xdopazo IV DCEX symposium, Barcelona, November 17th, 2015
  • 2. Precision medicine (P4*) is based on a better knowledge of phenotype- genotype relationships Requires of a better way of defining diseases by introducing genomic technologies in the diagnostic procedures and treatment decisions *P4: Predictive, Preventive, Personalized, Participative Setting the problem in context: The transition to precision medicine Intuitive Based on trial and error Identification of probabilistic patterns Decisions and actions based on knowledge Intuitive Medicine Empirical Medicine Precision Medicine (P4) Today Tomorrow Degree of personalization
  • 3. Empirical Medicine First Phase I: generation of the knowledge database ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- sequencing Patient Variants Database. Query Therapy outcome System feedback Genomic variants (biomarkers) can be quickly associated to precise diagnostics and therapy outcomes Initially the system will need much feedback: Knowledge generation phase. Empirical medicine Knowledge database Genome projects enable knowledge generation
  • 4. Rare and familiar diseases exome sequencing initiative • Metabolic (86 samples) • Optiz • Atypical fracture • coQ10 deficiency • Congenital disorder of glycosylation types I and II • Maple syrup urine disease • Pelizaeus-like • 4 unknown syndroms • Genetic (24 samples) • Charcot-Marie-Tooth • Rett Syndrome • Neurosensorial (35 samples) • Usher • AD non-syndromic hearing loss • AR non-syndromic hearing loss • RP • Mitochondrial (28 samples) • Progressive External Oftalmoplegy • Multi-enzymatic deficiency in mitochondrial respiratory complexes • CoQ disease • Other • APL (10 samples) Autism (37 samples) Mental retardation (autosomal recessive) (24) Immunodeficiency (18) Leber's congenital amaurosis (9) Cataract (2) RP(AR) (60) RP(AD) (46) Deafness (24) CLAPO (4) Skeletal Dysplasia (3) Cantú syndrome (1) Dubowitz syndrome (2) Gorham-Stout syndrome (1) Malpuech syndrome (4) Hirschprung’s disease (81) Hereditary macrothrombocytopenia (3) MTC (41) Controls (301) 1044 samples = 183 samples + 200 controls + 360 samples + 301 controls
  • 5. Data management, analysis and storage http://www.gbpa.es/ GCGTATAG CACGGGTA TCTGTATTA TGGTGGAT ATCAGCGG ATTGCGATT GGCAGAGC GGCAAAGT GCGTATAG CACGGGTA TCTGTATTA TGGTGGAT ATCAGCGG ATTGCGATT GGCAGAGC GGCAAAGT GCGTATAG CACGGGTA TCTGTATTA TGGTGGAT ATCAGCGG ATTGCGATT GGCAGAGC GGCAAAGT GCGTATAG CACGGGTA TCTGTATTA TGGTGGAT ATCAGCGG ATTGCGATT GGCAGAGC GGCAAAGT Raw files (FastQ) DB Analysis Pipeline Storage K-DB Gene 1 ksdhkahcka Gene 2 jckacsksda Gene 3 lkkxkccj<jdc Gene 4 ksfdjvjvlsdkvjd Gene 5 kckcksñdksd Gene 6 ldkdkcksdcldl Gene x kcdlkclkldsklk Gene Y jcdksdkcdks Prioritization report Dialog with experts in the disease + validations Samples GCGTATAG CACGGGTA TCTGTATTA TGGTGGAT ATCAGCGG GCGTATAG CACGGGTA TCTGTATTA TGGTGGAT ATCAGCGG VCF BAM Processed files
  • 6. Casablanca: Round up all suspicious characters and search them for stolen documents. The prioritization process Typically, an exome renders between 40 and 60K variants (and a genome about 1 million). Only one or a few among all of them are expected to be the causative factors of the disease. The prioritization process is like a police investigation in which suspected are discarded by their alibies Thus, through sequential heuristic filtering steps, unlikely candidates are discarded and a final, reduced list with one or a few candidate genes is (hopefully) produced Agatha Christie 40-60K suspects? We need…
  • 7. Pipeline of data analysis Primary processing Initial QC FASTQ file Mapping BAM file Variant calling VCF File Knowledge-based prioritization Proximity to other known disease genes Functional proximity Network proximity Burden tests Other prioritization methods Secondary analysis (Heuristic filtering) Variant annotation Filtering by effect Filtering by MAF Filtering by family segregation Primary analysis Gene prioritization
  • 8. Pipeline of data analysis Primary processing Initial QC FASTQ file Mapping BAM file Variant calling VCF File Knowledge-based prioritization Proximity to other known disease genes Functional proximity Network proximity Burden tests Other prioritization methods Secondary analysis (Heuristic filtering) Variant annotation Filtering by effect Filtering by MAF Filtering by family segregation Primary analysis Gene prioritization CellBase Knowledge DB (Now at EBI)
  • 9. The knowledge database for filtering by effect CellBase (Bleda, 2012, NAR), a comprehensive integrative database and RESTful Web Services API: ● Core features: genes, transcripts, exons, cytobands, proteins (UniProt),... ● Variation: dbSNP and Ensembl SNPs, HapMap, 1000Genomes, Cosmic, ClinVar, ... ● Functional: 40 OBO ontologies (Gene Ontology, HPO, etc.), Interpro, etc. ● Regulatory: TFBS, miRNA targets, conserved regions, tissue expression etc. ● System biology: Interactome (IntAct), Reactome database, co-expressed genes. NoSQL (Mongo) and scales to TB Wiki: http://docs.bioinfo.cipf.es/projects/cellbase/wiki Project: http://bioinfo.cipf.es/compbio/cellbase Now available at the EBI: http://www.ebi.ac.uk/cellbase/webservices/rest/v3/ and the University of Cambridge: http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest
  • 10. Pipeline of data analysis Primary processing Initial QC FASTQ file Mapping BAM file Variant calling VCF File Knowledge-based prioritization Proximity to other known disease genes Functional proximity Network proximity Burden tests Other prioritization methods Secondary analysis (Heuristic filtering) Variant annotation Filtering by effect Filtering by MAF Filtering by family segregation Primary analysis Gene prioritization 1000 genomes EVS Local variants
  • 11. 3-Methylglutaconic aciduria (3- MGA-uria) is a heterogeneous group of syndromes characterized by an increased excretion of 3-methylglutaconic and 3-methylglutaric acids. WES with a consecutive filter approach is enough to detect the new mutation in this case. Heuristic Filtering approach An example with 3-Methylglutaconic aciduria syndrome
  • 12. Use known variants and their population frequencies to filter out. • Typically dbSNP, 1000 genomes and the 6515 exomes from the ESP are used as sources of population frequencies. • We sequenced 300 healthy controls (rigorously phenotyped) to add and extra filtering step to the analysis pipeline Novembre et al., 2008. Genes mirror geography within Europe. Nature Comparison of MGP controls to 1000g How important do you think local information is to detect disease genes?
  • 13. Filtering with or without local variants Number of genes as a function of individuals in the study of a dominant disease Retinitis Pigmentosa autosomal dominant The use of local variants makes an enormous difference
  • 14. The CIBERER Spanish Variant Server (CSVS): the first repository of variability of the Spanish population Only another similar initiative exists: the GoNL http://www.nlgenome.nl/ http://ciberer.es/bier/exome-server/ And more recently the Finnish and the Icelandic populations
  • 15. Information provided Genotype frequencies in the different reference populations Genomic coordinates, variation, gene. SNPid if any
  • 17. Accelerating the loop of knowledge generation BiERapp: interactive web-based tool for easy candidate prioritization by successive filtering SEQUENCING CENTER Data preprocessing VCF FASTQ Genome Maps BAM BiERapp filters No-SQL (Mongo) VCF indexing Population frequencies Consequence types Experimental design BAM viewer and Genomic context? Easy scaleup
  • 18. Quick contribution to empirical medicine: new variants and disease genes found with WES and heuristic filtering WES IRDs arRP (EYS) BBS arRParRP (USH2) 3-MGA- uria (SERAC1) NBD (BCKDK ) In less than two years, the CIBERER initiative rendered 36 new mutations in known disease genes and 27 new mutations in13 new disease genes.
  • 19. Limitations of the single-gene approach One might or might not be aware of it, but the single-gene approach implies the following assumptions: • Only one gene (or at least one main gene) is the causative agent of the disease. • The disease has complete (or very high) penetrance. • Deleterious variation with no phenotype is not expected (or expected to be marginal)
  • 20. Even Mendelian genes can be elusive. Intuitive belief: multiple family information should help Families 1 2 3 4 5 6 Variants 3403 82 4 0 0 0 Genes 2560 331 35 8 1 0 Observation: this is not always true, not even in cases of Mendelian diseases
  • 21. Limitations of the single-gene approach One might or might not be aware of it, but the single-gene approach implies the following assumptions: • Only one gene (or at least one main gene) is the causative agent of the disease. • The disease has complete (or very high) penetrance. • Deleterious variation with no phenotype is not expected (or expected to be marginal)
  • 22. However, a high level of deleterious variability exists in the human genome Variants predicted to severely affect the function of human protein coding genes (known as loss-of- function -LOF- variants) were thought: To have a potential deleterious effect To be associated to severe Mendelian disease An unexpectedly large number of LOF variants have been found in the genomes of apparently healthy individuals: 281-515 missense substitutions per individual, 40-85 of them in homozygous state and predicted to be highly damaging. A similar proportion was observed in miRNAs and possibly affect to any functional element in the genome Such apparently deleterious mutation must be first detected and then distinguished from real pathological mutations
  • 23. So far so good… but is the single-gene approach realistic? Can we detect such disease-related variants so easily? There are several problems: a) There is a non-negligible amount of apparently deleterious variants that seem to have no pathologic effect (effect filter fails) b) Sometimes we are not targeting rare but common variants (which occur in normal population) (MAF filter fails) c) Interrogating 60Mb sites produces too many variants. Many of these segregating with our experimental design (segregation filter fails) d) In many cases only one variant does not explain the disease but rather a combination of them (single-gene concept fails) That results in a low ratio of discovery of variants associated to the disease, that usually account for a small portion of the trait heritability
  • 24. Clear individual gene associations are difficult to find for most diseases Affected cases in complex diseases actually conform a heterogeneous population with different mutations (or combinations). Many cases and controls are needed to obtain a few significant associations (more prevalent, often population-specific variants, non-reproducible). The only common element is the (know or unknown) functional module affected. Disease understood as the failure of a functional module CasesControls
  • 25. Modular nature of human genetic diseases • With the development of systems biology, studies have shown that phenotypically similar diseases are often caused by functionally related genes, being referred to as the modular nature of human genetic diseases (Oti and Brunner, 2007; Oti et al, 2008). • This modularity suggests that causative genes for the same or phenotypically similar diseases may generally reside in the same biological module, either a protein complex (Lage et al, 2007), a sub- network of protein interactions (Lim et al, 2006) , or a pathway (Wood et al, 2007) Disease genes are close in the interactome Goh 2007 PNAS Same disease in different populations is caused by different genes affecting the same functions Fernandez 2013 Orphanet J Rare Dis. 2013
  • 26. Pipeline of data analysis Primary processing Initial QC FASTQ file Mapping BAM file Variant calling VCF File Knowledge-based prioritization Proximity to other known disease genes Functional proximity Network proximity Burden tests Other prioritization methods Secondary analysis (Heuristic filtering) Variant annotation Filtering by effect Filtering by MAF Filtering by family segregation Primary analysis Gene prioritization
  • 27. Defining functional modules with the interactome SNPs WES/WGS Gene expression Using protein interaction networks as an scaffold to interpret the genomic data in a functionally-derived context AND/OR What part of the interactome is active and/or is damaged
  • 28. CHRNA7 (rs2175886 p = 0.000607) IQGAP2 (rs950643 p = 0.0003585) DLC1 (rs1454947 p = 0.007526) SNPs validated in independent cohorts Network analysis helps to find disease genes
  • 29. Network analysis and the 3D interactome in cancer Positively selected mutations in cancer tend to affect interaction domains, and preferentially in proteins central in the interactome Porta-Pardo et al., 2015 PLoS Comp Biol
  • 30. Fig 5. Mutations in TP53 interface predict patient's outcomes. Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A (2015) A Pan-Cancer Catalogue of Cancer Driver Protein Interaction Interfaces. PLoS Comput Biol 11(10): e1004518. doi:10.1371/journal.pcbi.1004518 http://journals.plos.org/ploscompbiol/article?id=info:doi/10.1371/journal.pcbi.1004518
  • 31. Fig 3. Network properties of interface-driver genes. Porta-Pardo E, Garcia-Alonso L, Hrabe T, Dopazo J, Godzik A (2015) A Pan-Cancer Catalogue of Cancer Driver Protein Interaction Interfaces. PLoS Comput Biol 11(10): e1004518. doi:10.1371/journal.pcbi.1004518 http://journals.plos.org/ploscompbiol/article?id=info:doi/10.1371/journal.pcbi.1004518 A total of 103 genes with potentially driver interfaces were found (73 of them yet undescribed)
  • 32. From gene-based to mechanism-based perspective Transforming gene expression values into another value that accounts for a function. Easiest example of modeling function: signaling pathways. Function: transmission of a signal from a receptor to an effector Receptors Effectors Important assumption: collective changes in gene expression within the context of a signaling circuit are proxies of changes in protein activation
  • 33. What would you predict about the consequences of gene activity changes in the apoptosis pathway in a case control experiment of colorectal cancer? The figure shows the gene up-regulations (red) and down- regulations (blue) The effects of changes in gene activity are not obvious
  • 34. Apoptosis inhibition is not obvious from gene expression Two of the three possible sub- pathways leading to apoptosis are inhibited in colorectal cancer. Upper panel shows the inhibited sub-pathways in blue. Lower panel shows the actual gene up-regulations (red) and down-regulations (blue) that justify this change in the activity of the sub-pathways
  • 35. Pathway analysis helps, in addition, to understand disease mechanisms Fanconi Anemia is a rare chromosome instability syndrome characterized by aplastic anemia and cancer and leukemia susceptibility. It has been proposed that disruption of the apoptotic control, a hallmark of FA, accounts for part of the phenotype of the disease. No proliferation No degradation Survival No degradation No apoptosis Activation apoptosis pathway
  • 36. Cases and controls: Different signaling As tumor grade / stage progresses… Up • Cell survival • Proliferation, differentiation • Cell cycle progression • Antiapoptosis Down • Glucose homeostasis • Metabolism • Degradation Tumor grade / stage Signaling changes and functionalities affected in 543 Kidney Renal Clear Cell Carcinoma samples from ICGC/TCGA database
  • 37. Using circuit activation probabilities as features for prediction Circuit activation probabilities are mechanism-based biomarkers 1. Generation of features: signaling circuit activities 2. Training set 3. Prediction
  • 38. Prediction of IC50 values from the activity of signaling circuits
  • 39. Is gene expression a good proxy for protein activity? However, collective changes in gene expression within the context of a signalling circuit can be considered good proxies of changes in protein activation The scarce phosphoproteomic data available are consistent with circuit activations predicted from gene expression changes The observation of the expression of a gene does not guarantee that the corresponding protein is present and active. Circuit TNF-NPY from the Adipocytokine signaling pathway (hsa04920). Nodes in red contain proteins found to be hyperphosphorylated when comparing treated versus untreated lung cancer cells. Nodes in blue contain proteins found to be dephosphorylated in the same comparison.
  • 40. BiologicalvariabilityTechnicalvariability Technical variability is enormously reduced at the pathway level Human neuroblastoma cell lines. RNA-seq vs microarrays Errors across genes seem to be cancelled within the signaling circuits
  • 41. Gene1 Gene2 Gene3 Gene4 : : : : : : : : : : : : : : : : : : : : : : : : : Gene22000 Raw measurement Transformed measurement: Receptor to effector signaling activities Circuit1 Circuit2 Circuit3 Circuit4 : : : : : : : : Circuit800 : : : Abstraction levels when converting genomic data into functional information Second transformation: All receptors to final effector signaling activities Third transformation: Final effector activities to functions triggered Some functions can be cancer hallmarks
  • 42. Gaining insight into disease mechanisms and MoA From gene-based to function-based analysis perspective SNPs, gene expression, etc. GO Protein interaction networks Models of cellular functions Detection power Low (only very prevalent genes) High High Very high Information coverage Almost all Almost all Low (~9000 genes in human) Low (~6700 genes in human)* Use Biomarker Illustrative, give hints Biomarker Biomarker that explain disease mechanism *Only ~800 genes in human signaling pathways
  • 43. The use of new algorithms that enable the transformation of genomic measurements into cell functionality measurements that account for disease mechanisms and for drug mechanisms of action will ultimately allow the real transition from today’s empirical medicine to precision medicine and provide increasingly personalized medicine The real transition to precision medicine Intuitive Based on trial and error Identification of probabilistic patterns Decisions and actions based on knowledge Intuitive Medicine Empirical Medicine Precision Medicine Today Tomorrow Degree of personalization
  • 44. Software available See interactive map of for the last 24h use http://bioinfo.cipf.es/toolsusage Babelomics is the third most cited tool for functional analysis. Includes more than 30 tools for advanced, systems-biology based data analysis More than 150.000 experiments were analyzed in our tools during the last year HPC on CPU, SSE4, GPUs on NGS data processing Speedups up to 40X Genome maps is now part of the ICGC data portal Ultrafast genome viewer with google technology Mapping Visualization Functional analysis Variant annotation CellBase Knowledge database Variant prioritization NGS panels Signaling network Regulatory network Interaction network Diagnostic
  • 45. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the BiER (CIBERER Network of Centers for Rare Diseases) @xdopazo @bioinfocipf Follow us on twitter