Forum on Personalized Medicine: Challenges for the next decade

Bioinformatics and Big Data in the era of Personalized Medicine
10th Anniversary Instituto Roche Forum on Personalized Medicine: Challenges for the next decade.
Santiago de Compostela (Spain), September 25th 2014

Published in: Healthcare


  1. 1. Joaquín Dopazo Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB), Bioinformatics Group (CIBERER) and Medical Genome Project, Spain. http://bioinfo.cipf.es http://www.medicalgenomeproject.com http://www.babelomics.org http://www.hpc4g.org @xdopazo Forum on Personalized Medicine, 25 September 2014 Bioinformatics and Big Data in the era of Personalized Medicine
  2. 2. Allison, 2008. Is personalized medicine finally arriving? Nature. Personalized medicine: just about a better understanding of the genotype-phenotype relationship Personalized medicine through precision medicine •Precision medicine requires better ways of defining diseases by introducing genomic technologies into diagnostic procedures. •A more precise diagnosis of diseases, based on the description of their molecular mechanisms, is critical for creating innovative diagnostic, prognostic, and therapeutic strategies properly tailored to each patient’s needs
  3. 3. The future of personalized medicine is strongly based on genomics •Personalized medicine is based on the availability of diagnostic biomarkers •Genome sequencing offers ALL this information (if properly analyzed) •Genome sequencing prices are in free fall (exome price expected < 300€ in 2-3 years) •Over 30-40% of the budget (> $500 B) per year is spent on costs associated with “overuse, underuse, misuse, ...”
  4. 4. While costs fall, the amount of data to manage and its complexity rise exponentially. Costs are already almost competitive enough for clinical use. The problem is… are we ready to deal with this data? Exome sequencing successfully used. NGS prices will soon be affordable. http://www.genome.gov/sequencingcosts/
  5. 5. http://www.nih.gov/news/health/jun2014/nhgri-18.htm http://www.nejm.org/doi/full/10.1056/NEJMra1312543 More than 10,000 exomes will be ordered for diagnostic purposes Clinical application of exomes
  6. 6. Personalized Genomic Medicine. Phase I: generating the knowledge database Patient sequencing List of variants Database. Query: variant/pathway Therapy Outcome System feedback Genetic variants are linked to therapies through the knowledge of their functional effects (systems biology) Initially the system will need much feedback: Knowledge generation phase. Growing knowledge database Genomic medicine Knowledge database
  7. 7. Personalized genomic medicine. Phase II: applying the knowledge database Patient 1) Genomic sequencing 2) Database of markers 3) Therapy prediction Genomic core facility phase II Clinician receives hints on possible prescriptions and therapeutic interventions + Other factors (risk, cost, etc.) Prescription Pre-symptomatic: • Genetic predisposition to acquired diseases (>6000, some treatable) • Early diagnosis of genetic diseases Symptomatic analysis: • Diagnosis of acquired diseases • Early cancer detection • Cancer treatment recommendation
  8. 8. From genetics to genomic medicine Test 1 Test 2 Therapy 1 Therapy 2 Therapy 3 ? Genetic medicine Test Therapy 1 Therapy 2 Therapy 3 ? Genomic medicine + Genomic analysis allows associating patients to therapies from the very beginning, saving time and costs and increasing the success of treatments. feedback
  9. 9. Some examples (conventional sequencing vs. NGS with capture) Marfan syndrome: conventional 1300€ (2 genes, 75 exons); NGS 900€ (3 genes, 237 exons). Hereditary deafness: conventional 12500€ (36 genes, 1500 exons); NGS 1100€ (38 genes, > 1500 exons). • Low initial investment • Already existing infrastructure • Quick implementation • Easy implementation as a cloud service that guarantees sustainability
  10. 10. Preparing the scenario for the introduction of the genome in clinics Patient Treatment eHR Decision support techniques: algorithms that relate biomarkers to treatments, outcomes, etc. (gene prioritization and predictors) Integration of the data in the eHR Visualization and data presentation. Ready for the clinical interpretation Acceleration of algorithms for data pre-processing. Data storage optimization feedback Corporate systems Orion clinic, Abucasis, Gaia, etc.
  11. 11. Preparing the scenario for the introduction of the genome in clinics Patient Treatment eHR feedback Corporate systems Orion clinic, Abucasis, Gaia, etc. Decision support techniques: algorithms that relate biomarkers to treatments, outcomes, etc. (gene prioritization and predictors) Visualization and data presentation. Ready for the clinical interpretation Integration of the data in the eHR Acceleration of algorithms for data pre-processing. Data storage optimization
  12. 12. New Big Data storage strategies Automatic QC Sequence cleansing Variant calling + QC Mapping + QC 8-10 hours 8-12 hours 8-12 hours CLOUD FASTQ (10GB) BAM (7GB) VCF (200MB) Data sizes for exomes. For whole genomes, sizes are >20x larger. Remote visualization of big data. Data production phase e-health record Final human supervision of data QC
  13. 13. Tools developed to improve the pipeline Genome Maps, an HTML5+SVG data visualization of VCF and BAM • Genome-scale data visualization plays an important role in the data analysis process. It is a big data management problem. • Features of Genome Maps (Medina, 2013, NAR; ICGC data analysis portal): • First 100% HTML5 web-based viewer: HTML5+SVG (inspired by Google Maps) • Always updated, no browser plugins or installation • Data taken from CellBase, remote NGS data, local files and DAS servers: genes, transcripts, exons, SNPs, TFBS, miRNA targets, etc. • Other features: multi-species, API oriented, easy integration, plugin framework, etc. BAM viewer VCF viewer ICGC genomic viewer www.genomemaps.org
  14. 14. Patient Treatment eHR feedback Corporate systems Orion clinic, Abucasis, Gaia, etc. Acceleration of algorithms for data pre-processing. Data storage optimization Integration of the data in the eHR Visualization and data presentation. Ready for the clinical interpretation Decision support techniques: algorithms that relate biomarkers to treatments, outcomes, etc. (gene prioritization and predictors) Preparing the scenario for the introduction of the genome in clinics
  15. 15. Finding new biomarkers Test Therapy 1 Therapy 2 Therapy 3 ? feedback Feedback: treatment failures are reanalyzed to search for: 1) Biomarkers (of failure) 2) Subgroups (to search for new personalized and rational therapeutic interventions) Treatables Failure treatment biomarkers Group A biomarkers Group A biomarkers Irrelevant Non treatables Signaling Protein interaction Regulation Variants are used as biomarkers to distinguish between responders and non-responders and to sub-classify non-responders. Rational design of therapies relies on Systems Biology concepts. Pathways are complex and must be understood with the proper bioinformatic tools
  16. 16. Patient Treatment eHR feedback Corporate systems Orion clinic, Abucasis, Gaia, etc. Decision support techniques: algorithms that relate biomarkers to treatments, outcomes, etc. (gene prioritization and predictors) Acceleration of algorithms for data pre-processing. Data storage optimization Visualization and data presentation. Ready for the clinical interpretation Integration of the data in the eHR Preparing the scenario for the introduction of the genome in clinics
  17. 17. BiERapp: interactive web-based tool for easy candidate prioritization by successive filtering SEQUENCING CENTER Data preprocessing VCF FASTQ Genome Maps BAM BiERapp filters No-SQL (Mongo) VCF indexing Population frequencies Consequence types Experimental design BAM viewer and Genomic context ? Easy scale up
  18. 18. NA19660 NA19661 NA19600 NA19685 BiERapp: the interactive filtering tool for easy candidate prioritization http://bierapp.babelomics.org Aleman et al., 2014 NAR
  19. 19. Successive filtering approach: an example with 3-Methylglutaconic aciduria syndrome 3-Methylglutaconic aciduria (3-MGA-uria) is a heterogeneous group of syndromes characterized by an increased excretion of 3-methylglutaconic and 3-methylglutaric acids. WES with a consecutive filter approach is enough to detect the new mutation in this case.
  20. 20. Use known variants and their population frequencies to filter out irrelevant polymorphisms. •Typically dbSNP, 1000 Genomes and the 6515 exomes from the ESP are used as sources of population frequencies. •We sequenced 300 healthy controls (rigorously phenotyped) to add an extra filtering step to the analysis pipeline Novembre et al., 2008. Genes mirror geography within Europe. Nature Comparison of MGP controls to 1000g How important do you think local information is to detect disease genes?
  21. 21. Filtering with or without local variants Number of genes as a function of individuals in the study of a dominant disease (autosomal dominant retinitis pigmentosa) The use of local variants makes an enormous difference
  22. 22. New variants and disease genes found with WES and successive filtering WES IRDs arRP (EYS) BBS arRP arRP (USH2) 3-MGA-uria (SERAC1) NBD (BCKDK)
  23. 23. Knowledge DB Freq. popul. MiSeq IonTorrent IonProton Illumina NO Diagnostic Therapeutic decision New variants Disease All Candidate Prioritization Data preprocessing Sequence DB Sequences Freqs. Future technologies New knowledge for future diagnosis The final schema: diagnosis and discovery
  24. 24. Diagnosis by targeted sequencing (panels of genes) Tool for defining panels New filter based on local population variant frequencies If no diagnostic variants appear, then secondary findings are studied Diagnostic mutations http://team.babelomics.org
  25. 25. Implementation of tools in the IT4I Supercomputing Center (Czech Republic) The pipelines of primary and secondary analysis developed by the Computational Genomics Department of the CIPF in close collaboration with the Bull Chair have proven their efficiency in the analysis of more than 1000 exomes in a joint collaborative project of the CIBERER and the MGP. A first pilot implementation has been done in the IT4I supercomputing center, which aims to centralize the analysis of genomics data in the country.
  26. 26. Implementation in the AVS ….. 1PB DB We have taken advantage of the already operational corporate medical imaging system, using a quite similar philosophy. eHR gateway Upload image Retrieve (by patient ID) Genomic gateway Pilot project with 20 leukemias
  27. 27. Knowledge DB Freq. popul. MiSeq IonTorrent IonProton Illumina NO Diagnostic Therapeutic decision New variants Disease All Candidate Prioritization Data preprocessing Sequence DB Sequences Freqs. Future technologies New knowledge for future diagnosis Gene discovery and diagnosis implemented But… what about personalized treatments?
  28. 28. Patient’s omic data Biological knowledge Systems biology computational models Epigenomics Regulation Interaction Function Proteomics Genomics and transcriptomics Patient Metabolomics Diagnostic biomarkers Personalized medicine Therapeutic targets Cell culture Best combination Xenograft model Drug treatment Network drugs Personalized therapy Are individualized treatments a realistic option? Dopazo, 2003, Drug Discovery Today
  29. 29. Modeling pathways The effect of gene expression on signaling can be estimated. Virtual KOs (or over-expressions) can be simulated Colorectal cancer activates a signaling circuit of the VEGF pathway that produces PGI2. Virtual KO of COX2 interrupts the circuit (known therapeutic inhibitor in CGR) COX2 gene KO
  30. 30. Future prospects Exome vs complete genome
  31. 31. The ENCODE project suggests a functional role for a large fraction of the genome Which percentage of the genome is occupied by: Coding genes: 2.4%; TFBSs: 8.1%; Open chromatin regions: 15.2%; Different RNA types: 62.0%; Total annotated elements: 80.4%. Exomes cover only a small fraction of the potential functionality of the genome (2.4%). Is the missing heritability hidden in the remaining 78%? If so, what type of variant should we expect to discover? SNVs? SVs?
  32. 32. Future prospects We need to efficiently query all the information contained in the genome, including all the epigenomic signatures as well as the structural variation. This involves data integration and “epistatic” queries. We need to prepare our health systems to deal with the flood of genomic data. Information about variations (processed / raw): Genome variant information (VCF): 150 MB / 250 GB; Epigenome: 150 MB / 250 GB; Each transcriptome: 20 MB / 80 GB; Individual complete variability: 400 MB / 525 GB; Hospital (100,000 patients): 40 TB / 50 PB. We are only starting to realize the dimension of the daunting challenges posed by genomic big data. There are technical (data size) and conceptual (data analysis) problems in the way genomic information is managed that must be addressed.
  33. 33. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the CIBERER Network of Centers for Rare Diseases. @xdopazo @bioinfocipf
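
The successive filtering described on slides 19 to 21 can be illustrated with a minimal Python sketch. This is not BiERapp's implementation: the `Variant` fields, the consequence labels, the 1% frequency threshold, and the gene names are all hypothetical, chosen only to show how adding a local-control frequency filter shrinks the list of candidate genes shared by the affected individuals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str    # hypothetical labels, e.g. "missense", "synonymous"
    public_freq: float  # allele frequency in public sources (assumed field)
    local_freq: float   # allele frequency in local healthy controls (assumed field)


# Hypothetical set of consequence types considered potentially damaging.
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_site"}


def filter_variants(variants, max_freq=0.01, use_local=True):
    """Apply the filters one after another and return the surviving variants."""
    kept = []
    for v in variants:
        if v.consequence not in DAMAGING:
            continue  # filter 1: consequence type
        if v.public_freq > max_freq:
            continue  # filter 2: common in public populations
        if use_local and v.local_freq > max_freq:
            continue  # filter 3: common in local controls
        kept.append(v)
    return kept


def candidate_genes(patients, **kwargs):
    """Genes with at least one surviving variant in every affected individual."""
    gene_sets = [{v.gene for v in filter_variants(p, **kwargs)} for p in patients]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy data, invented for illustration only.
patient_1 = [
    Variant("GENE_A", "missense", 0.0, 0.0),
    Variant("GENE_B", "missense", 0.001, 0.05),  # rare publicly, common locally
    Variant("GENE_C", "synonymous", 0.0, 0.0),
]
patient_2 = [
    Variant("GENE_A", "frameshift", 0.0, 0.0),
    Variant("GENE_B", "missense", 0.001, 0.05),
    Variant("GENE_D", "missense", 0.2, 0.2),
]

print(sorted(candidate_genes([patient_1, patient_2], use_local=False)))  # ['GENE_A', 'GENE_B']
print(sorted(candidate_genes([patient_1, patient_2], use_local=True)))   # ['GENE_A']
```

In this toy case the local filter removes a variant that looks rare in public databases but is common in the local population, which is the effect slide 21 attributes to local variants.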
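
The hospital-scale figures on slide 32 follow from simple multiplication of the per-patient sizes. The short sketch below reproduces that arithmetic, assuming decimal units (1 TB = 10^12 bytes); the function and constant names are illustrative, and only the per-patient sizes and the patient count come from the slide.

```python
# Decimal storage units (an assumption; the slide does not say which convention it uses).
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Per-patient "individual complete variability" sizes quoted on slide 32.
PER_PATIENT_BYTES = {"processed": 400 * MB, "raw": 525 * GB}


def hospital_storage(n_patients, per_patient=PER_PATIENT_BYTES):
    """Total bytes needed to store every patient's data, per data kind."""
    return {kind: size * n_patients for kind, size in per_patient.items()}


totals = hospital_storage(100_000)
print(totals["processed"] / TB)  # 40.0
print(totals["raw"] / PB)        # 52.5
```

The raw total comes out at 52.5 PB, which the slide rounds to 50 PB.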
