Scientific Cloud Experiments: Medical Genome Project
Guillermo Antiñolo, Scientific Director, Medical Genome Project (GBPA - Plataforma de Genómica y Bioinformática de Andalucía)

  • Thank the forum hosts for soliciting views from diverse groups involved in research on the important topic of family consent. The informed consent process is central to the obligation to protect the rights and welfare of research participants. The integrity of the informed consent process, whether the issue is family consent or some other aspect of informed consent, is a shared interest and responsibility of researchers, ethics committees, regulators and study sponsors.
  • Retinal dystrophies are a group of genetically and phenotypically heterogeneous diseases characterized by degeneration of the retina, leading to partial or total blindness. Identifying new genes responsible for retinal diseases is therefore the basis for advances in understanding retinal physiology and pathology; it will make it possible to establish new cell lines and animal models for studying the functions of the relevant genes, and ultimately to improve the therapeutic options for retinal dystrophies.
  • RP is a highly heterogeneous disease, both clinically and genetically, which is one of the reasons it is so difficult to unravel its cause and progression, or even to obtain a treatment. Different mutations in the same gene can produce the same or different phenotypes (allelic heterogeneity), and likewise mutations in different genes can produce the same phenotype (locus heterogeneity). Inheritance can be autosomal dominant, autosomal recessive or X-linked, and more complex patterns such as digenic inheritance or uniparental disomy have also been described. The most common form is autosomal recessive, for which 35 loci have been identified so far; together these would account for 35-45% of arRP cases. However, the EYS gene, identified by our group in 2008, could be responsible for 15.9% of cases in Spain.

Scientific Cloud Experiments: Medical Genome Project Presentation Transcript

  • 1. GUILLERMO ANTIÑOLO. Director of the Clinical Management Unit of Genetics, Reproduction and Fetal Medicine, Hospital Universitario Virgen del Rocío. Scientific Director, MGP/GBPA. Associate Professor (Profesor Titular) of Obstetrics and Gynecology, Universidad de Sevilla.
  • 2. Healthcare in the 21st Century: Genomic Medicine and Personalized Medicine
  • 3. Most common applications of NGS: Resequencing; Mutation calling; Profiling; RNA-seq / Transcriptomics; Quantitative; Descriptive; Genome annotation; Alternative splicing; miRNA profiling; De novo sequencing; Exome sequencing; Targeted sequencing; ChIP-seq / Epigenomics; Protein-DNA interactions; Active transcription factor binding sites; Histone methylation; Copy number variation; Metagenomics; Metatranscriptomics
  • 4. Introduction: Big data in Biology, a new scenario. Next-Generation Sequencing (NGS) technology is changing how researchers perform experiments. Many new experiments are being conducted by sequencing: exome re-sequencing, RNA-seq, Meth-seq, ChIP-seq, ... NGS is allowing researchers to: ● Find exome and genomic variants responsible for diseases ● Study the whole transcriptome of a phenotype ● Establish the methylation state of a condition ● Locate DNA binding proteins. But experiments have increased data size by 1000x when compared with microarrays, i.e. from MB to hundreds of GB in transcriptomics. Data processing and analysis are becoming a bottleneck and a nightmare, from days or weeks with microarrays to months with NGS, and it will get worse as more data become available.
  • 5. Biesecker LG. Exome sequencing makes medical genomics a reality. Nat Genet. 2010 Jan;42(1):13-4.
  • 6. Relative throughput of the different HT technologies. NGS emerges with a potential for data production that will eventually wipe out conventional HT technologies in the coming years. Too many sequences to be handled and stored in standard computers.
  • 7. The Pursuit of Better and More Efficient Healthcare, as well as Clinical Innovation, through Genetic and Genomic Research
  • 8. Public-Private Partnership: MGP (GBPA); Clinical Service, Hospital & Health System (AHS); Translational Science Institute; Pharma & Biotech
  • 9. MGP Research Goals: To sequence the genomes of clinically well-characterized patients with potential mutations in novel genes. To generate and validate a database of genomes of phenotyped control individuals. To develop innovative bioinformatics tools for the detection and characterisation of mutations using genomic information.
  • 10. 11 Megasequencing Platforms. Two technologies to scan for variations: 454 Roche (longer reads, lower coverage) for structural variation: amplifications, deletions, CNV, inversions, translocations; SOLiD ABI (shorter reads, higher coverage) for variants: SNPs, mutations, indels.
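The trade-off between read length and coverage on slide 10 can be made concrete with the standard expected-coverage estimate, C = N × L / G (number of reads times read length, divided by genome size). The sketch below is illustrative only: the function name is hypothetical, and the read counts, read lengths and genome size are round numbers chosen for the example, not figures from the MGP platforms.

```python
def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical runs against a 3 Gb genome (illustrative numbers only).
GENOME_SIZE = 3_000_000_000
long_read_run = expected_coverage(num_reads=1_000_000, read_length=400,
                                  genome_size=GENOME_SIZE)
short_read_run = expected_coverage(num_reads=1_000_000_000, read_length=50,
                                   genome_size=GENOME_SIZE)
```

Under these assumed numbers the short-read run reaches a much higher mean depth, even though each read is shorter, because far more reads are produced per run.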
  • 11. Big data challenges and solutions. “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”. Big data is not a new scenario for other areas: astronomy, physics, internet search, finance, business, ... What are the main Big data challenges? Curation, search, sharing, storage, analysis and visualization. We need to study and use new computational technologies: ● High-Performance Computing (HPC): multi-core CPUs, SSE/AVX, GPUs ● Distributed computing: Apache Hadoop MapReduce, MPI ● Distributed and NoSQL databases: Apache Cassandra, HBase, … ● Web apps: HTML5 (SVG, WebGL, ...), JavaScript, RESTful WS, ... ● Clouds: Amazon AWS, Google Cloud, Microsoft Azure, … ● Biomed: machine learning, data mining, clustering, probabilistic graphical models, visualization, health information management and genomic data...
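To illustrate the MapReduce model named on slide 11, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the `map_phase`, `shuffle` and `reduce_phase` helpers and the sample variant records are hypothetical, chosen only to show how the work splits into per-record mapping and per-key reduction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (chromosome, 1) for each tab-separated record, skipping headers."""
    for line in records:
        if line and not line.startswith("#"):
            yield line.split("\t")[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values of each key to get a per-chromosome record count."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up records: a header line followed by three variant positions.
records = ["#CHROM\tPOS", "chr1\t100", "chr1\t250", "chr2\t42"]
counts = reduce_phase(shuffle(map_phase(records)))
```

In a real distributed setting the map and reduce steps would run on different nodes over partitions of the data; here they run in sequence only to make the data flow visible.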
  • 12. New Solutions for Big Data Analysis, Storage and Visualization: HPC and Cloud-based solutions
  • 13. Bioinformatics Unit at MGP/GBPA: 24 High Performance Computing nodes (72-192 GB RAM); 2 control nodes (24 GB RAM); 2 x quad-core CPU, 16 threads, 2 x 10 Gb network interface; execution of 400 jobs in parallel; storage: 540 TB total.
  • 14. NGS pipeline, an HPC implementation for bioinformatic analysis (HPG suite, High-Performance Genomics; HPC4Genomics consortium). Pipeline: NGS sequencer → FASTQ file (up to hundreds of GB per run) → QC and preprocessing (QC stats, filtering and preprocessing options) → HPG Aligner, short read aligner (double mapping strategy: Burrows-Wheeler Transform (GPU Nvidia CUDA) + Smith-Waterman (CPU OpenMP+SSE/AVX)) → SAM/BAM file → QC and preprocessing (QC stats, filtering and preprocessing options) → Variant calling analysis (GATK and SAM mPileup HPC implementation) → VCF file → QC and preprocessing (QC stats, filtering and preprocessing options) → HPG Variant, variant analysis (consequence type, GWAS, regulatory variants and system biology information; statistics genomic tests) → Variant VCF viewer (HTML5+SVG Web based viewer). Other analysis: RNA-seq (mRNA sequenced), DNA assembly (not a real analysis), Meth-seq, Copy Number, Transcript isoform, ... More info at: http://bioinfo.cipf.es/docs/compbio/projects/hpg/doku.php
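The double mapping strategy on slide 14 combines two classic techniques. As a rough illustration, the sketch below builds a naive Burrows-Wheeler Transform and computes a Smith-Waterman local alignment score. This is not the HPG Aligner implementation, which the slide describes as GPU (CUDA) and OpenMP+SSE/AVX based. The function names and the scoring parameters (match, mismatch, gap) are assumptions made for this example.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations.

    Quadratic in memory; real aligners build it from a suffix array instead.
    """
    if terminator in text:
        raise ValueError("text must not contain the terminator symbol")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix, so it returns
    the score but not the alignment itself.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

The design idea behind such a two-stage strategy is that an index-based search handles reads that match the reference closely, while the slower dynamic-programming alignment is reserved for reads that need mismatches or gaps.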
  • 15. “…I find the gulf between research results and their everyday application to patients incredible and unjustifiable…” Eduard Punset
  • 16. Genomic and Personalized Medicine. Patient → Genomic core facility: 1) Genomic sequencing 2) Database of markers/variants/mutations 3) Genetic/Genomic Diagnosis 4) Therapy/preventive intervention. Clinician receives hints on Dx, and possible preventive and/or therapeutic interventions. Pre-symptomatic: • Genetic predisposition of acquired diseases (>6000, some treatable) • Early and faster diagnosis of genetic diseases. Symptomatic analysis: • Diagnosis of acquired diseases • Early cancer detection • Cancer treatment recommendation
  • 17. Inherited Retinal Dystrophies (IRDs) ● Prevalence 1 in 3000 ● Clinically and genetically very heterogeneous ● 190 genes account for approx. 50% of IRDs. [Chart labels: Diagnosed families; Families with known mutations; Families with one mutant allele; Families with digenism; Families with unknown mutations]
  • 18. Genetic overlapping among IRDs. [Diagram of overlapping gene sets across the IRD categories; the grouping of genes by category is not recoverable from the transcript.] Legend: LCA - Leber Congenital Amaurosis; CORD/COD - Cone and cone-rod dystrophies; CVD - Colour Vision Defects; MD - Macular Degeneration; ERVR/EVR - Erosive and Exudative Vitreoretinopathies; USH - Usher Syndrome; RP - Retinitis Pigmentosa; NB - Night Blindness; BBS - Bardet-Biedl Syndrome. Genes shown: ARL6, BBS2, BBS4, BBS5, BBS7, BBS9, BBS10, BBS12, INPP5E, LZTFL1, MKKS, MKS1, LCA5, SDCCAG8, TRIM32, TTC8, RD3, CACNA1F, CEP290, CACNA2D4, GNAT2, CRB1, IMPDH1, BBS1, CABP4, GRK1, AIPL1, LRAT, MERTK, GRM6, GUCY2D, RDH12, RPE65, PDE6B, NYX, RPGRIP1, SPATA7, TULP1, RHO, TRPM1, ADAM9, GUCA1A, CRX, SAG, C2ORF71, C8ORF37, HRG4/UNC119, CA4, CERKL, CNGA1, CNGB1, KCNV2, PDE6H, PITPNM3, RAX2, RLBP1, DHDDS, EYS, FAM161A, IDH3B, KLHL7, RDH5, RIM1, SEMA4A, IMPG2, MAK, NRL, PAP1, PDE6A, PDE6G, PRCD, PRF3, PRPF8, PRPF31, ABCA4, CNGA3, PROM1, RBP3, RGR, ROM1, RP1, RP2, PDE6C, PRPH2, FSCN2, SNRNP200, TOPORS, ZNF513, BCP, RPGR, CLRN1, GUCA1B, USH2A, GCP, C1QTNF5, BEST1, ABHD12, CDH23, CIB2, RCP, EFEMP1, NR2E3, DFNB31, GPR98, ELOVL4, HARS, MYO7A, PCDH15, USH1C, HMNC1, FZD4, KCNJ13, RS1, LRP5, NDP, USH1G, TIMP3, TSPAN12, VCAN
  • 19. Molecular Genetics of RP. Variety of inheritance patterns. Autosomal recessive RP (arRP) → most common. Allelic and locus heterogeneity. 62 genes have been associated with RP → responsible for 2/3 of cases. EYS → one of the most prevalent, responsible for 15% of arRP cases. [Pie chart labels: RPLX, UN, ADRP, RPE, ARRP; values: 7%, 3%, 15%, 40%, 35%] arRP genes and loci: EYS; RP22, RP29, RP32; ABCA4, BEST1, C2ORF71, CERKL, CNGA1, CNGB1, CRB1, FAM161A, IDH3B, IMPG2, LRAT, MERTK, NR2E3, NRL, PDE6A, PDE6B, PDE6G, PRCD, PROM1, RBP3, RGR, RHO, RLBP1, RP1, RPE65, SAG, SEMA4A, SPATA7, TTC8, TULP1, USH2A, ZNF513…; Unknown
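  • 20. Clinical Diagnosis: ARRP. Two approaches: APEX (commercially available) and resequencing (custom design). Genes listed across the two panels (assignment to each panel is not recoverable from the transcript): CERKL, CNGA1, EYS, CNGB1, PROM1, MERTK, PRCD, PDE6A, NR2E3, PDE6B, LRAT, PNR, IDH3B, RDH12, CERKL, RGR, TULP1, RLBP1, RPE65, SAG, RLBP1, TULP1, RHO, CRB, RGR, RPE65, PDE6B, USH2A, CRB1, USH3A, CNGA1, LRAT, MERTK, PROML1, PBP3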
  • 21. Summary after WES: initial incorrect clinical diagnosis; initial incomplete clinical diagnosis
  • 22. [Diagnostic decision flowchart with yes/no branches. Nodes: Known mutation? Mutation in known gene? Mutation in related gene? Validation. Confirmed? Diagnosis.]
  • 23. Next steps: cloud-based and open solutions. Cloud-based environment integration ready, codename: GASC ● Storage: efficient storage and data retrieval of ~TB, transparent connection to other clouds such as Amazon AWS or Microsoft Azure ● Analysis: many tools ready to use (aligners, GATK, …), users can upload their tools to extend functionality, SGE queue, … ● Search and access: data is indexed and can be queried efficiently, RESTful WS allows users to access data and analysis programmatically ● Sharing: users can share their data and analysis, public and private data ● Visualization: HTML5-SVG based web applications to visualize data. Open development initiative ● HPG project, CellBase, Genome Maps, GASC, … released as an open source development initiative ● Source code controlled with Git, hosted freely on GitHub ● Scientists are encouraged to collaborate and extend functionality; an HPC4G consortium of universities has already been created
  • 24. High-throughput technologies such as NGS are pushing BioMedicine into Big Data. We must learn how to deal with this huge amount of data to translate it into clinically relevant information. This new scenario demands new solutions as well as new computational technologies. An open development model allows researchers to join forces and build better solutions.
  • 25. Joaquín Dopazo, Javier Santoyo
  • 26. Clinical Management Unit (UGC) of Genetics, Reproduction and Fetal Medicine, Hospital Universitario Virgen del Rocío, Sevilla: Salud Borrego, Alicia Vela, Nacho Medina, Cristina Méndez, Antonio Rueda, María González, F.J. López, Lorena Fernández, Nereida Bravo