Forum on Personalized Medicine: Challenges for the next decade

Bioinformatics and Big Data in the era of Personalized Medicine
10th Anniversary Instituto Roche Forum on Personalized Medicine: Challenges for the next decade.
Santiago de Compostela (Spain), September 25th 2014

  1. Bioinformatics and Big Data in the era of Personalized Medicine. Joaquín Dopazo, Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF); Functional Genomics Node (INB); Bioinformatics Group (CIBERER); and Medical Genome Project, Spain. Forum on Personalized Medicine, 25 September 2014. http://bioinfo.cipf.es http://www.medicalgenomeproject.com http://www.babelomics.org http://www.hpc4g.org @xdopazo
  2. Allison, 2008. Is personalized medicine finally arriving? Nature. Personalized medicine is, at its core, about better understanding the genotype-phenotype relationship. Personalized medicine through precision medicine: • Precision medicine requires better ways of defining diseases, obtained by introducing genomic technologies into diagnostic procedures. • A more precise diagnosis of diseases, based on the description of their molecular mechanisms, is critical for creating innovative diagnostic, prognostic and therapeutic strategies properly tailored to each patient's needs.
  3. The future of personalized medicine is strongly based on genomics. • Personalized medicine relies on the availability of diagnostic biomarkers. • Genome sequencing offers ALL this information (if properly analyzed). • Genome sequencing prices are in free fall (the price of an exome is expected to drop below 300€ in 2-3 years). • Over 30-40% of the budget (>500 B$) per year is spent on costs associated with “overuse, underuse, misuse, ...”.
  4. While costs fall, the amount of data to manage and its complexity rise exponentially. Costs are already almost competitive enough for clinical use. The problem is… are we ready to deal with this data? Exome sequencing has been used successfully, and NGS prices will soon be affordable. http://www.genome.gov/sequencingcosts/
  5. Clinical application of exomes: more than 10,000 exomes will be ordered for diagnostic purposes. http://www.nih.gov/news/health/jun2014/nhgri-18.htm http://www.nejm.org/doi/full/10.1056/NEJMra1312543
  6. Personalized genomic medicine. Phase I: generating the knowledge database. Patient → sequencing → list of variants → database query (variant/pathway) → therapy → outcome → system feedback. Genetic variants are linked to therapies through knowledge of their functional effects (systems biology). Initially the system will need a lot of feedback: this is the knowledge generation phase, in which the genomic medicine knowledge database keeps growing.
  7. Personalized genomic medicine. Phase II: applying the knowledge database. For each patient, the genomic core facility (phase II) performs 1) genomic sequencing, 2) a query of the database of markers and 3) therapy prediction. The clinician receives hints on possible prescriptions and therapeutic interventions, which are weighed together with other factors (risk, cost, etc.) before prescribing. Pre-symptomatic analysis: • Genetic predisposition to acquired diseases (>6000, some treatable) • Early diagnosis of genetic diseases. Symptomatic analysis: • Diagnosis of acquired diseases • Early cancer detection • Cancer treatment recommendation
  8. From genetics to genomic medicine. Genetic medicine: Test 1 → Test 2 → Therapy 1 → Therapy 2 → Therapy 3 → ? Genomic medicine: a single test → Therapy 1, Therapy 2, Therapy 3 or ?, with feedback. Genomic analysis allows patients to be matched to therapies from the very beginning, saving time and costs and increasing the success of treatments.
  9. Some examples: conventional sequencing vs. NGS (with capture)
     • Marfan syndrome: conventional 1300€ (2 genes, 75 exons); NGS 900€ (3 genes, 237 exons)
     • Hereditary deafness: conventional 12500€ (36 genes, 1500 exons); NGS 1100€ (38 genes, > 1500 exons)
     Advantages: • Low initial investment • Already existing infrastructure • Quick implementation • Easy implementation as a cloud service that guarantees sustainability
  10. Preparing the scenario for the introduction of the genome in the clinic. Patient → treatment → eHR, with feedback from the corporate systems (Orion clinic, Abucasis, Gaia, etc.). Components: • Decision support techniques: algorithms that relate biomarkers to treatments, outcomes, etc. (gene prioritization and predictors) • Integration of the data in the eHR • Visualization and data presentation, ready for clinical interpretation • Acceleration of algorithms for data preprocessing; data storage optimization
  11. Preparing the scenario for the introduction of the genome in the clinic (same diagram as slide 10).
  12. New Big Data storage strategies. Pipeline run in the cloud: automatic QC and sequence cleansing → mapping + QC → variant calling + QC, with stage times of 8-10 hours, 8-12 hours and 8-12 hours. Data sizes for exomes: FASTQ (10 GB) → BAM (7 GB) → VCF (200 MB); for whole genomes, sizes are >20x larger. Remote visualization of big data during the data production phase. Final human supervision of data QC before the results reach the e-health record.
  13. Tools developed to improve the pipeline: Genome Maps, an HTML5+SVG data visualization tool for VCF and BAM. o Genome-scale data visualization plays an important role in the data analysis process; it is a big data management problem. o Features of Genome Maps (Medina, 2013, NAR; ICGC data analysis portal): ● First 100% HTML5 web-based viewer: HTML5+SVG (inspired by Google Maps) ● Always up to date, with no browser plugins or installation ● Data taken from CellBase, remote NGS data, local files and DAS servers: genes, transcripts, exons, SNPs, TFBS, miRNA targets, etc. ● Other features: multi-species, API-oriented, easy integration, plugin framework, etc. Screenshots: BAM viewer, VCF viewer, ICGC genomic viewer. www.genomemaps.org
  14. Preparing the scenario for the introduction of the genome in the clinic (same diagram as slide 10).
  15. Finding new biomarkers. Test → Therapy 1, Therapy 2, Therapy 3 or ?, with feedback. Feedback: treatment failures are reanalyzed to search for 1) biomarkers (of failure) and 2) subgroups (to search for new personalized and rational therapeutic interventions). Diagram labels: treatables vs. non-treatables; failure-treatment biomarkers; group A biomarkers; irrelevant; signaling, protein interaction, regulation. Variants are used as biomarkers to distinguish responders from non-responders and to sub-classify non-responders. The rational design of therapies relies on systems biology concepts. Pathways are complex and must be understood with the proper bioinformatic tools.
  16. Preparing the scenario for the introduction of the genome in the clinic (same diagram as slide 10).
  17. BiERapp: an interactive web-based tool for easy candidate prioritization by successive filtering. Sequencing center → data preprocessing (FASTQ → BAM → VCF) → VCF indexing in a NoSQL database (MongoDB) → BiERapp filters (population frequencies, consequence types, experimental design), with Genome Maps providing a BAM viewer and the genomic context. Easy to scale up.
  18. BiERapp: the interactive filtering tool for easy candidate prioritization (screenshot with samples NA19660, NA19661, NA19600 and NA19685). http://bierapp.babelomics.org Aleman et al., 2014, NAR
  19. Successive filtering approach: an example with 3-methylglutaconic aciduria syndrome. 3-Methylglutaconic aciduria (3-MGA-uria) is a heterogeneous group of syndromes characterized by increased excretion of 3-methylglutaconic and 3-methylglutaric acids. WES with a successive filtering approach was enough to detect the new mutation in this case.
  20. Use known variants and their population frequencies to filter out irrelevant polymorphisms. • dbSNP, 1000 Genomes and the 6515 exomes from the ESP are typically used as sources of population frequencies. • We sequenced 300 healthy, rigorously phenotyped controls to add an extra filtering step to the analysis pipeline. Novembre et al., 2008. Genes mirror geography within Europe. Nature. Comparison of MGP controls to 1000 Genomes. How important do you think local information is for detecting disease genes?
  21. Filtering with or without local variants: number of candidate genes as a function of the number of individuals in the study of a dominant disease (autosomal dominant retinitis pigmentosa). The use of local variants makes an enormous difference.
  22. New variants and disease genes found with WES and successive filtering: IRDs; arRP (EYS); BBS; arRP; arRP (USH2); 3-MGA-uria (SERAC1); NBD (BCKDK).
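  23. The final schema: diagnosis and discovery. Sequencing platforms (MiSeq, IonTorrent, IonProton, Illumina and future technologies) feed data preprocessing and a sequence DB (sequences and frequencies); candidate prioritization then draws on the knowledge DB and population frequencies. Outcomes: diagnosis and therapeutic decision or, when there is no diagnosis (NO), the search for new variants and disease genes across all variants, which provides new knowledge for future diagnosis.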
  24. Diagnosis by targeted sequencing (gene panels). • Tool for defining panels • New filter based on local population variant frequencies • If no diagnostic variants appear, secondary findings are studied. Diagnostic mutations. http://team.babelomics.org
  25. Implementation of tools at the IT4I Supercomputing Center (Czech Republic). The primary and secondary analysis pipelines developed by the Computational Genomics Department of the CIPF, in close collaboration with the Bull Chair, have proven their efficiency in the analysis of more than 1000 exomes in a joint collaborative project of CIBERER and the MGP. A first pilot implementation has been carried out at the IT4I supercomputing center, which aims to centralize the analysis of genomic data in the country.
  26. Implementation in the AVS (1 PB DB). We have taken advantage of the already operational corporate medical imaging system, using a quite similar philosophy. eHR gateway: upload image, retrieve (by patient ID); genomic gateway. Pilot project with 20 leukemias.
  27. The final schema (same diagram as slide 23). Gene discovery and diagnosis implemented. But… what about personalized treatments?
  28. Are individualized treatments a realistic option? The patient's omic data (genomics and transcriptomics, epigenomics, proteomics, metabolomics) and biological knowledge (regulation, interaction, function) feed systems biology computational models. These yield diagnostic biomarkers (personalized medicine) and therapeutic targets; network drugs are tested in cell culture and xenograft models to find the best combination for drug treatment, leading to a personalized therapy. Dopazo, 2003, Drug Discovery Today
  29. Modeling pathways. The effect of gene expression on signaling can be estimated, and virtual KOs (or over-expressions) can be simulated. Example: colorectal cancer activates a signaling circuit of the VEGF pathway that produces PGI2. A virtual KO of the COX2 gene interrupts the circuit (COX2 is a known therapeutic target in CRC).
  30. Future prospects: exome vs. complete genome
  31. The ENCODE project suggests a functional role for a large fraction of the genome. Percentage of the genome occupied by: • Coding genes: 2.4% • TFBSs: 8.1% • Open chromatin regions: 15.2% • Different RNA types: 62.0% • Total annotated elements: 80.4%. Exomes cover only a small fraction (2.4%) of the potential functionality of the genome. Is the missing heritability hidden in the remaining 78%? If so, what type of variants should we expect to discover? SNVs? SVs?
  32. Future prospects. We need to efficiently query all the information contained in the genome, including all epigenomic signatures as well as structural variation. This involves data integration and “epistatic” queries. We need to prepare our health systems to deal with the flood of genomic data.
      Size of the information about variation (processed / raw):
      • Genome variant information (VCF): 150 MB / 250 GB
      • Epigenome: 150 MB / 250 GB
      • Each transcriptome: 20 MB / 80 GB
      • Individual complete variability: 400 MB / 525 GB
      • Hospital (100,000 patients): 40 TB / 50 PB
      We are only starting to realize the dimension of the daunting challenges posed by genomic big data. There are technical (data size) and conceptual (data analysis) problems in the way genomic information is managed that must be addressed.
  33. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node), and CIBERER, the Network of Centers for Rare Diseases. @xdopazo @bioinfocipf
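
Slides 6 and 7 describe a knowledge database that links variants to therapies and grows through outcome feedback (phase I) before it is used to suggest therapies (phase II). The following is a minimal, hypothetical Python sketch of that loop, not the system presented in the talk: the class name `KnowledgeDB`, the variant and therapy identifiers, and the ranking by pooled success rate are all assumptions made for illustration.

```python
from collections import defaultdict


class KnowledgeDB:
    """Hypothetical knowledge database: (variant, therapy) -> observed outcomes."""

    def __init__(self):
        # Each entry stores [number of successes, number of treated patients].
        self.counts = defaultdict(lambda: [0, 0])

    def feedback(self, variants, therapy, success):
        """Phase I: record a therapy outcome for every variant the patient carries."""
        for variant in variants:
            entry = self.counts[(variant, therapy)]
            entry[0] += int(success)
            entry[1] += 1

    def suggest(self, variants, min_cases=1):
        """Phase II: rank therapies by pooled success rate over the patient's variants."""
        pooled = defaultdict(lambda: [0, 0])
        for (variant, therapy), (successes, total) in self.counts.items():
            if variant in variants:
                pooled[therapy][0] += successes
                pooled[therapy][1] += total
        ranked = [(therapy, s / n) for therapy, (s, n) in pooled.items() if n >= min_cases]
        return sorted(ranked, key=lambda item: item[1], reverse=True)


# Usage with made-up identifiers.
db = KnowledgeDB()
db.feedback({"variant_1"}, "therapy_A", success=True)
db.feedback({"variant_1"}, "therapy_A", success=False)
db.feedback({"variant_1", "variant_2"}, "therapy_B", success=True)
print(db.suggest({"variant_1"}))  # [('therapy_B', 1.0), ('therapy_A', 0.5)]
```

The `min_cases` threshold reflects the point made on slide 6: early on, most (variant, therapy) pairs have too few outcomes to be trusted, so the database needs a lot of feedback before its suggestions become useful.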
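
Slides 17 and 19 describe candidate prioritization by successive filtering: population frequencies, consequence types and the experimental design are applied one after another until only a few candidates remain. The sketch below illustrates the idea in plain Python; it is not BiERapp's code or API. The 1% frequency cut-off, the set of "damaging" consequence types, the recessive trio design and all gene and sample names are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical set of consequence types treated as potentially damaging.
DAMAGING = {"missense_variant", "stop_gained", "frameshift_variant",
            "splice_acceptor_variant", "splice_donor_variant"}


@dataclass
class Variant:
    gene: str
    consequence: str     # e.g. a Sequence Ontology term such as "missense_variant"
    max_pop_freq: float  # highest allele frequency in reference populations (assumed precomputed)
    genotypes: dict      # sample name -> "0/0", "0/1" or "1/1"


def by_frequency(variants, max_freq=0.01):
    """Drop polymorphisms that are common in the reference populations."""
    return [v for v in variants if v.max_pop_freq < max_freq]


def by_consequence(variants, allowed=DAMAGING):
    """Keep only variants with a potentially damaging consequence type."""
    return [v for v in variants if v.consequence in allowed]


def by_recessive_design(variants, affected, unaffected):
    """Assumed design: homozygous alternative in all affected, never in unaffected."""
    return [v for v in variants
            if all(v.genotypes.get(s) == "1/1" for s in affected)
            and all(v.genotypes.get(s) != "1/1" for s in unaffected)]


def successive_filter(variants, affected, unaffected):
    """Apply the filters in turn, recording how many variants survive each step."""
    steps = [
        ("population frequency", by_frequency),
        ("consequence type", by_consequence),
        ("experimental design", lambda vs: by_recessive_design(vs, affected, unaffected)),
    ]
    trace = [("input", len(variants))]
    for name, step in steps:
        variants = step(variants)
        trace.append((name, len(variants)))
    return variants, trace


trio = ("child", "mother", "father")
variants = [
    Variant("GENE_A", "missense_variant", 0.0001, dict(zip(trio, ["1/1", "0/1", "0/1"]))),
    Variant("GENE_B", "missense_variant", 0.20, dict(zip(trio, ["1/1", "0/1", "0/1"]))),
    Variant("GENE_C", "synonymous_variant", 0.0, dict(zip(trio, ["1/1", "0/1", "0/1"]))),
    Variant("GENE_D", "stop_gained", 0.0, dict(zip(trio, ["0/1", "0/0", "0/1"]))),
]
candidates, trace = successive_filter(variants, affected=["child"], unaffected=["mother", "father"])
print([v.gene for v in candidates], trace)
```

Each filter removes one hypothetical variant here (4 → 3 → 2 → 1), leaving `GENE_A`; keeping the per-step counts mirrors the interactive workflow, where the user sees how much each filter narrows the list.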
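
Slides 20 and 21 argue that adding local control variants as an extra filter sharply reduces the number of candidate genes in a dominant disease. The toy Python sketch below shows why, under two stated assumptions: the disease gene must be hit in every affected individual (so candidates are the intersection across individuals), and a variant seen in local controls is discarded. All genes, variant IDs and the "local" polymorphism are made up.

```python
def genes_after_filter(variants, local_control_variants):
    """Genes hit by an individual's variants once variants seen in local controls are removed."""
    return {gene for gene, variant_id in variants if variant_id not in local_control_variants}


def shared_candidates(individuals, local_control_variants=frozenset()):
    """Assumed dominant model: the disease gene must be hit in every affected individual."""
    gene_sets = [genes_after_filter(v, local_control_variants) for v in individuals]
    return set.intersection(*gene_sets) if gene_sets else set()


def candidate_curve(individuals, local_control_variants=frozenset()):
    """Number of shared candidate genes after including 1, 2, ..., n individuals."""
    return [len(shared_candidates(individuals[:k], local_control_variants))
            for k in range(1, len(individuals) + 1)]


# Made-up (gene, variant id) pairs that already passed the global frequency filters.
individuals = [
    [("G1", "v1"), ("G2", "v2"), ("G3", "v3"), ("G4", "v4")],
    [("G1", "v5"), ("G2", "v2"), ("G3", "v6")],
    [("G1", "v7"), ("G2", "v2")],
]
local_controls = {"v2"}  # hypothetical variant frequent only in the local population
print(candidate_curve(individuals))                  # [4, 3, 2]
print(candidate_curve(individuals, local_controls))  # [3, 2, 1]
```

Variant `v2` is absent from global databases in this toy example but common locally, so without the local filter `G2` survives as a false candidate no matter how many individuals are added.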
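
Slide 24 outlines diagnosis with gene panels: a local-frequency filter, a check for diagnostic variants in the panel genes and, if none appear, the study of secondary findings. The Python sketch below is one hypothetical reading of that decision flow, not the TEAM tool's implementation. The function name, the 1% local-frequency cut-off, the dictionary layout of variants and the choice to pass all remaining rare variants on for secondary review are assumptions.

```python
def panel_diagnosis(variants, panel_genes, known_diagnostic, local_freq, max_local_freq=0.01):
    """Return ("diagnostic", hits) or ("secondary", rare variants left to study)."""
    # Local population filter: discard variants that are common in local controls.
    rare = [v for v in variants if local_freq.get(v["id"], 0.0) < max_local_freq]
    # Diagnostic step: known disease-causing variants in the panel genes.
    hits = [v for v in rare if v["gene"] in panel_genes and v["id"] in known_diagnostic]
    if hits:
        return "diagnostic", hits
    # No diagnostic variant: pass the remaining rare variants on for secondary findings.
    return "secondary", rare


variants = [{"gene": "P1", "id": "a"}, {"gene": "P2", "id": "b"}, {"gene": "X", "id": "c"}]
local_freq = {"a": 0.05}  # hypothetical: variant "a" is common in the local population
print(panel_diagnosis(variants, {"P1", "P2"}, {"b"}, local_freq))
print(panel_diagnosis(variants, {"P1", "P2"}, {"z"}, local_freq))
```

In the first call the known variant `b` in panel gene `P2` gives a diagnosis; in the second no known variant is found, so the two rare variants that survived the local filter are returned for further study.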
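
Slide 29 states that the effect of gene expression on signaling can be estimated and that virtual KOs or over-expressions can be simulated. The Python sketch below illustrates the idea with a deliberately simplified model of our own choosing, not the model used in the talk: a circuit is a linear chain of nodes, each with a normalized activity in [0, 1], and the transmitted signal is the product of those activities. Apart from COX2, the node names and all activity values are placeholders.

```python
from math import prod


def circuit_activity(circuit, activity, knockouts=(), overexpressed=()):
    """Signal along a linear circuit, assumed to be the product of node activities.

    activity maps each node to a normalized value in [0, 1] (e.g. derived from expression).
    Knocked-out nodes are set to 0 and over-expressed nodes to 1.
    """
    def value(node):
        if node in knockouts:
            return 0.0
        if node in overexpressed:
            return 1.0
        return activity[node]

    return prod(value(node) for node in circuit)


# Hypothetical linear circuit ending in PGI2 production; names other than COX2 are placeholders.
circuit = ["receptor", "kinase", "COX2", "PGI2_synthase"]
activity = {"receptor": 0.9, "kinase": 0.8, "COX2": 0.95, "PGI2_synthase": 0.7}
print(circuit_activity(circuit, activity))                          # baseline signal, about 0.479
print(circuit_activity(circuit, activity, knockouts={"COX2"}))      # virtual KO: 0.0
print(circuit_activity(circuit, activity, overexpressed={"COX2"}))  # virtual over-expression
```

Because every node multiplies the signal, knocking out any single node in the chain silences the output, which is the qualitative behavior the slide describes for the COX2 KO. Real pathway models have branches and inhibitory edges that this chain ignores.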
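
The hospital row of the table on slide 32 follows from multiplying the per-patient sizes by 100,000 patients. The short Python check below reproduces that arithmetic, assuming decimal (SI) units; only the per-patient sizes and the patient count come from the slide.

```python
# Decimal (SI) units, in bytes (an assumption; the slide does not specify).
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Per-patient "individual complete variability" sizes from slide 32.
PER_PATIENT = {"processed": 400 * MB, "raw": 525 * GB}


def hospital_storage(per_patient_bytes, patients=100_000):
    """Total storage if the data of every patient is kept."""
    return per_patient_bytes * patients


for kind, size in PER_PATIENT.items():
    total = hospital_storage(size)
    print(f"{kind}: {total / TB:,.1f} TB = {total / PB:,.2f} PB")
```

The processed data come to exactly 40 TB, while the raw data come to 52.5 PB, which the slide rounds to 50 PB.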
