Bioinformatics life sciences_2012

Introduction to Bioinformatics For the Life Sciences

Published in: Education

  1. 1. Introduction to bioinformatics and computational biology
  2. 2. Lab for Bioinformatics and computational genomics 10 “genome hackers” mostly engineers (statistics) 42 scientists technicians, geneticists, clinicians >100 people hardware engineers,mathematicians, molecular biologists
  3. 3. What is Bioinformatics? • Application of information technology to the storage, management and analysis of biological information (facilitated by the use of computers) – Sequence analysis? – Molecular modeling (HTX)? – Phylogeny/evolution? – Ecology and population studies? – Medical informatics? – Image analysis? – Statistics? AI? – Heavy-current or light-current engineering?
  4. 4. Promises of genomics and bioinformatics • Medicine (Pharma) – Genome analysis allows the targeting of genetic diseases – The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated – Knowledge of protein structure facilitates drug design – Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up • The same techniques can be applied to crop (Agro) and livestock improvement (Animal Health)
  5. 5. Bioinformatics, a life science discipline … Math (Molecular) Informatics Biology
  6. 6. Bioinformatics, a life science discipline … Math Computer Science Theoretical Biology (Molecular) Informatics Biology Computational Biology
  7. 7. Bioinformatics, a life science discipline … Math Computer Science Theoretical Biology Bioinformatics (Molecular) Informatics Biology Computational Biology
  8. 8. Bioinformatics, a life science discipline … management of expectations Math Computer Science Theoretical Biology NP AI, Image Analysis Datamining structure prediction (HTX) Bioinformatics Interface Design Expert Annotation Sequence Analysis (Molecular)Informatics Biology Computational Biology
  9. 9. Bioinformatics, a life science discipline … management of expectations Math Computer Science Theoretical Biology NP AI, Image Analysis Datamining structure prediction (HTX) Bioinformatics Discovery Informatics – Computational Genomics Interface Design Expert Annotation Sequence Analysis (Molecular)Informatics Biology Computational Biology
  10. 10. Time (years)
  11. 11. • Timeline: Margaret Dayhoff …
  12. 12. Happy Birthday …
  13. 13. PCR + dye termination Suddenly, a flash of insight caused him to pull the car off the road and stop. He awakened his friend dozing in the passenger seat and excitedly explained to her that he had hit upon a solution - not to his original problem, but to one of even greater significance. Kary Mullis had just conceived of a simple method for producing virtually unlimited copies of a specific DNA sequence in a test tube - the polymerase chain reaction (PCR)
  14. 14. Setting the stage … nature the Human genome
  15. 15. Biological Research Adapted from John McPherson, OICR
  16. 16. And this is just the beginning … Next Generation Sequencing is here
  17. 17. One additional insight ...
  18. 18. Read Length is Not As Important For Resequencing [Plot: % of paired k-mers with uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), for E. coli and human] Jay Shendure
  19. 19. ABI SOLID
  20. 20. Paired End Reads are Important! Known Distance Read 1 Read 2 Repetitive DNA Unique DNA Paired read maps uniquely Single read maps to multiple positions
  21. 21. Single Molecule Sequencing, Helicos Biosciences Corp. [Diagram labels: Microscope slide; Single DNA molecule; Super-cooled; primer; TIRF microscope; dNTP-Cy3] Adapted from: Barak Cohen, Washington University, Bio5488 http://tinyurl.com/6zttuq http://tinyurl.com/6k26nh
  22. 22. Complete genomics
  23. 23. Next next generation sequencing Third generation sequencing Now sequencing
  24. 24. Pacific Biosciences: A Third Generation Sequencing Technology Eid et al 2008
  25. 25. Nanopore Sequencing
  26. 26. Ultra-low-cost SINGLE molecule sequencing
  27. 27. Genome Size (bp): E. coli = 4.2 x 10^6; Yeast = 18 x 10^6; Arabidopsis = 80 x 10^6; C. elegans = 100 x 10^6; Drosophila = 180 x 10^6; Human/Rat/Mouse = 3000 x 10^6; Lily = 300 000 x 10^6. With ... : 99.9 % To primates: 99% DOGS: Database Of Genome Sizes
  28. 28. Anno 2012
  29. 29. Anno 2012
  30. 30. Definitions. Identity: The extent to which two (nucleotide or amino acid) sequences are invariant. Homology: Similarity attributed to descent from a common ancestor. Example alignment: RBP: 26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84; glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81
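As a rough illustration of the identity definition, the following Python sketch counts identical columns in the RBP/glycodelin alignment from the slide. The function names `identity_count` and `percent_identity` are made up for this example, and the choice of denominator (all alignment columns, gaps included) is an assumption; other conventions divide by the shorter sequence length or by ungapped columns only.

```python
def identity_count(aligned_a: str, aligned_b: str) -> int:
    """Number of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A gap ('-') never counts as an identity.
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of all alignment columns (assumed convention)."""
    return 100.0 * identity_count(aligned_a, aligned_b) / len(aligned_a)


# Aligned rows copied from the slide (gaps written as '-').
RBP = "RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD-"
GLYCODELIN = "QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN"

if __name__ == "__main__":
    print(f"{percent_identity(RBP, GLYCODELIN):.1f}% identity")
```

Note that identity is a measured quantity, whereas homology is an inference about common ancestry; a low percentage here does not by itself rule out homology.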
  31. 31. Definitions. Orthologous: Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. Paralogous: Homologous sequences within a single species that arose by gene duplication.
  32. 32. speciation, duplication
  33. 33. Overview • Simple identity, which scores only identical amino acids as a match. • Genetic code changes, which score the minimum number of nucleotide changes needed to change a codon for one amino acid into a codon for the other. • Chemical similarity of amino acid side chains, which scores as a match two amino acids that have a similar side chain, such as hydrophobic, charged and polar amino acid groups. • The Dayhoff percent accepted mutation (PAM) family of matrices, which score amino acid pairs on the basis of the expected frequency of substitution of one amino acid for the other during protein evolution. • The blocks substitution matrix (BLOSUM) amino acid substitution tables, which score amino acid pairs based on the frequency of amino acid substitutions in aligned sequence motifs called blocks, which are found in protein families.
  34. 34. BLOSUM (BLOck – SUM) scoring. Block = ungapped alignment. E.g. amino acids D, N, V, A; S = 3 sequences; W = 6 aa; N = (W*S*(S-1))/2 = 18 pairs. Block columns a b c d e f: 1 DDNAAV; 2 DNAVDD; 3 NNVAVV
  35. 35. A. Observed pairs. Block columns a b c d e f: 1 DDNAAV; 2 DNAVDD; 3 NNVAVV. Pair counts fij (lower triangle, rows and columns in the order D N A V): D: 1; N: 4 1; A: 1 1 1; V: 3 1 4 1. Relative frequency table gij = fij/18, the probability of obtaining a pair if randomly choosing pairs from the block: D: .056; N: .222 .056; A: .056 .056 .056; V: .167 .056 .222 .056
  36. 36. B. Expected pairs. Residue frequencies Pi in the block (DDNAAV, DNAVDD, NNVAVV): D 5/18; N 4/18; A 4/18; V 5/18. P{Draw DN pair} = P{Draw D, then N, or draw N, then D} = PDPN + PNPD = 2 * (5/18) * (4/18) = .123. Random relative frequency table eij, the probability of obtaining a pair of each amino acid drawn independently from the block (lower triangle, rows and columns in the order D N A V): D: .077; N: .123 .049; A: .123 .099 .049; V: .154 .123 .123 .077
  37. 37. C. Summary (A/B): sij = log2(gij/eij); (sij) is the basic BLOSUM score matrix. Notes: • Observed pairs in blocks contain information about relationships at all levels of evolutionary distance simultaneously (cf. Dayhoff’s close relationships) • The actual algorithm generates observed + expected pair distributions by accumulation over a set of approx. 2000 ungapped blocks of varying width (w) + depth (s)
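The three steps A, B and C can be traced in a few lines of Python. This is a minimal sketch for the single toy block from the slides, assuming the unrounded, unscaled score sij = log2(gij/eij) exactly as written there; the function name `blosum_from_block` and the returned dictionary layout are made up for this example, and the published BLOSUM matrices involve further steps (sequence clustering, accumulation over many blocks, rounding) that are not shown.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blosum_from_block(block):
    """Observed (g) and expected (e) pair frequencies and log-odds scores for one block."""
    pair_counts = Counter()      # f_ij: unordered residue pairs within each column
    residue_counts = Counter()   # residue occurrences over the whole block
    for column in zip(*block):   # iterate over alignment columns
        residue_counts.update(column)
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    p = {aa: n / total_residues for aa, n in residue_counts.items()}

    result = {}
    for (x, y), f in pair_counts.items():
        g = f / total_pairs
        # Unordered pair of distinct residues can be drawn in two orders.
        e = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        result[(x, y)] = {"f": f, "g": g, "e": e, "s": log2(g / e)}
    return result


if __name__ == "__main__":
    block = ["DDNAAV", "DNAVDD", "NNVAVV"]
    for pair, row in sorted(blosum_from_block(block).items()):
        print(pair, row["f"], f"{row['g']:.3f}", f"{row['e']:.3f}", f"{row['s']:+.2f}")
```

Pairs that never co-occur in a column (none in this toy block) are simply absent from the result, since log2(0) is undefined.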
  38. 38. The BLOSUM Series • blosum30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90 • transition frequencies observed directly by identifying blocks that are at least – 45% identical (BLOSUM-45) – 50% identical (BLOSUM-50) – 62% identical (BLOSUM-62) etc. • No extrapolation made • High blosum - closely related sequences • Low blosum - distant sequences • blosum45 ≈ pam250 • blosum62 ≈ pam160 • blosum62 is the most popular matrix
  39. 39. Overview
  40. 40. • Church of the Flying Spaghetti Monster • http://www.venganza.org/about/open-letter
  41. 41. Overview – Henikoff and Henikoff have compared the BLOSUM matrices to PAM by evaluating how effectively the matrices can detect known members of a protein family from a database when searching with the ungapped local alignment program BLAST. They conclude that overall the BLOSUM 62 matrix is the most effective. • However, all the substitution matrices investigated perform better than BLOSUM 62 for a proportion of the families. This suggests that no single matrix is the complete answer for all sequence comparisons. • It is probably best to complement the BLOSUM 62 matrix with comparisons using the PAM250 matrix and the Overington structurally derived matrices. – It seems likely that as more protein three-dimensional structures are determined, substitution tables derived from structure comparison will give the most reliable data.
  42. 42. Rat versus mouse RBP; Rat versus bacterial lipocalin
  43. 43. Alignments • Exhaustive … – All combinations: • Algorithm – Dynamic programming (much faster) • Heuristics – Needleman-Wunsch for global alignments (Journal of Molecular Biology, 1970) – Later adapted by Smith-Waterman for local alignment
  44. 44. A metric … GACGGATTAG, GATCGGAATAG GA-CGGATTAG GATCGGAATAG +1 (a match), -1 (a mismatch),-2 (gap) 9*1 + 1*(-1)+1*(-2) = 6
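The metric can be checked mechanically. The short Python sketch below scores an already-aligned pair of rows with the values from the slide (+1 match, -1 mismatch, -2 gap); the function name `score_alignment` is made up for this example.

```python
def score_alignment(aligned_a: str, aligned_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum per-column scores of two aligned rows of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap          # a gap in either row
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


if __name__ == "__main__":
    # 9 matches, 1 mismatch, 1 gap
    print(score_alignment("GA-CGGATTAG", "GATCGGAATAG"))  # 6
```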
  45. 45. Needleman-Wunsch-edu.pl The Score Matrix (columns: Seq1, index j = 1..7; rows: Seq2, index i = 1..9)
            *   C   K   H   V   F   C   R
        *   0  -1  -2  -3  -4  -5  -6  -7
      1 C  -1   1   0  -1  -2  -3  -4  -5
      2 K  -2   0   2   1   0  -1  -2  -3
      3 K  -3  -1   1   1   0  -1  -2  -3
      4 C  -4  -2   0   0   0  -1   0  -1
      5 F  -5  -3  -1  -1  -1   1   0  -1
      6 C  -6  -4  -2  -2  -2   0   2   1
      7 K  -7  -5  -3  -3  -3  -1   1   1
      8 C  -8  -6  -4  -4  -4  -2   0   0
      9 V  -9  -7  -5  -5  -3  -3  -1  -1
  46. 46. Needleman-Wunsch-edu.pl The Score Matrix (unchanged from slide 45)
  47. 47. Needleman-Wunsch-edu.pl The Score Matrix (same matrix as slide 45, with cells a, b and c marked near the top-left corner). Each cell is the best of three candidates: A: matrix(i,j) = matrix(i-1,j-1) + (MIS)MATCH, a match if substr(seq1,j-1,1) eq substr(seq2,i-1,1); B: up_score = matrix(i-1,j) + GAP; C: left_score = matrix(i,j-1) + GAP
  48. 48. Needleman-Wunsch-edu.pl The Score Matrix (unchanged from slide 45)
  49. 49. Needleman-Wunsch-edu.pl
  50. 50. Needleman-Wunsch-edu.pl Seq1:CKHVFCRVCI Seq2:CKKCFC-KCV ++--++--+- score = 0
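The matrix fill behind these slides can be sketched in Python as follows. This is not the Perl script from the course; it is a minimal re-implementation of the recurrence on slide 47, and the scoring values (+1 match, -1 mismatch, -1 gap) are inferred from the numbers in the displayed matrix rather than stated on the slides. The function name `needleman_wunsch_matrix` is made up for this example.

```python
def needleman_wunsch_matrix(seq1: str, seq2: str,
                            match: int = 1, mismatch: int = -1, gap: int = -1):
    """Fill the global-alignment score matrix; seq1 runs along columns (j), seq2 along rows (i)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]

    # Initialization: aligning a prefix against nothing costs one gap per residue.
    for j in range(1, cols):
        matrix[0][j] = j * gap
    for i in range(1, rows):
        matrix[i][0] = i * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = matrix[i - 1][j - 1] + (match if seq1[j - 1] == seq2[i - 1] else mismatch)
            up_score = matrix[i - 1][j] + gap
            left_score = matrix[i][j - 1] + gap
            matrix[i][j] = max(diagonal, up_score, left_score)
    return matrix


if __name__ == "__main__":
    m = needleman_wunsch_matrix("CKHVFCRVCI", "CKKCFCKCV")
    for row in m:
        print(" ".join(f"{v:3d}" for v in row))
    print("global alignment score:", m[-1][-1])
```

The bottom-right cell holds the score of the best global alignment; recovering the alignment itself requires an additional traceback from that cell, which is omitted here.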
  51. 51. • Practicum: use similarity function in initialization step -> scoring tables • Time complexity • Use random proteins to generate histogram of scores from aligned random sequences
  52. 52. Time complexity with needleman-wunsch.pl. Sequence length (aa): execution time (s). 10: 0; 25: 0; 50: 0; 100: 1; 500: 5; 1000: 19; 2500: 559; 5000: "Memory could not be written"
  53. 53. Average around -64! [Text histogram of alignment scores for random sequences: score bins from -80 to -38, peak at -64]
  54. 54. If the sequences are similar, the path of the best alignment should be very close to the main diagonal. Therefore, we may not need to fill the entire matrix; rather, we fill a narrow band of entries around the main diagonal. An algorithm that fills in a band of width 2k+1 around the main diagonal.
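A banded fill along these lines might look like the Python sketch below. It only visits cells with |i - j| <= k, so the work drops from roughly n*m cells to about n*(2k+1). The function name `banded_nw_score`, the dictionary-based storage and the default scores are assumptions made for this illustration; the result equals the full Needleman-Wunsch score only when the optimal path actually stays inside the band.

```python
def banded_nw_score(seq1: str, seq2: str, k: int,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score restricted to a band of width 2k+1 around the main diagonal."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow: the final cell lies outside it")

    score = {(0, 0): 0}  # only cells inside the band are ever stored
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                # The diagonal neighbour of an in-band cell is always in the band.
                step = match if seq1[i - 1] == seq2[j - 1] else mismatch
                candidates.append(score[(i - 1, j - 1)] + step)
            if (i - 1, j) in score:   # neighbour above, if inside the band
                candidates.append(score[(i - 1, j)] + gap)
            if (i, j - 1) in score:   # neighbour to the left, if inside the band
                candidates.append(score[(i, j - 1)] + gap)
            score[(i, j)] = max(candidates)
    return score[(n, m)]


if __name__ == "__main__":
    # Sequences from the "A metric" slide; a band with k = 1 already contains the best path.
    print(banded_nw_score("GACGGATTAG", "GATCGGAATAG", k=1))
```

A common refinement is to start with a small k and double it until the score stops improving, which is not shown here.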
  55. 55. Multiple Alignment Method
  56. 56. Multiple Alignment Method
  57. 57. Examples Phylogenetic methods may be used to solve crimes, test purity of products, and determine whether endangered species have been smuggled or mislabeled: – Vogel, G. 1998. HIV strain analysis debuts in murder trial. Science 282(5390): 851-853. – Lau, D. T.-W., et al. 2001. Authentication of medicinal Dendrobium species by the internal transcribed spacer of ribosomal DNA. Planta Med 67:456-460.
  58. 58. Examples – Epidemiologists use phylogenetic methods to understand the development of pandemics, patterns of disease transmission, and development of antimicrobial resistance or pathogenicity: • Basler, C.F., et al. 2001. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. PNAS, 98(5):2746-2751. • Ou, C.-Y., et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256(5060):1165-1171. • Bacillus anthracis:
  59. 59. Tree Of Life
  60. 60. Modeling
  61. 61. Ramachandran / Phi-Psi Plot
  62. 62. Protein Architecture
  63. 63. Modeling • Finding a structural homologue • BLAST versus the PDB database, or PSI-BLAST (E < 0.005) – Domain coverage at least 60% • Avoid gaps – Prefer few gaps and reasonable similarity scores over lots of gaps and high similarity scores
  64. 64. Bootstrapping - an example: Ciliate SSU rDNA, parsimony bootstrap, majority-rule consensus [Tree figure: taxa Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9); bootstrap support values on the branches: 100, 84, 96, 100, 100, 100]
  65. 65. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  66. 66. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  67. 67. Personalized Medicine • The use of diagnostic tests (aka biomarkers) to identify in advance which patients are likely to respond well to a therapy • The benefits of this approach are to – avoid adverse drug reactions – improve efficacy – adjust the dose to suit the patient – differentiate a product in a competitive market – meet future legal or regulatory requirements • Potential uses of biomarkers – Risk assessment – Initial/early detection – Prognosis – Prediction/therapy selection – Response assessment – Monitoring for recurrence
  68. 68. Biomarker • First used in 1971 … An objective and « predictive » measure … at the molecular level … of normal and pathogenic processes and responses to therapeutic interventions • Characteristic that is objectively measured and evaluated as an indicator of normal biologic or pathogenic processes or pharmacologic response to a drug • A biomarker is valid if: – It can be measured in a test system with well established performance characteristics – Evidence for its clinical significance has been established
  69. 69. Rationale 1: Why now? Regulatory path becoming more clear. There is more at stake than efficient drug development. FDA « critical path initiative », Pharmacogenomics guideline. Biomarkers are the foundation of « evidence based medicine »: who should be treated, how and with what. Without biomarkers, advances in targeted therapy will be limited and treatment will remain largely empirical. It is imperative that biomarker development be accelerated along with therapeutics
  70. 70. Why now? • First and maturing second generation molecular profiling methodologies allow pharmacogenomics-based stratification of clinical trial participants, to include those most likely to benefit from the drug candidate and exclude those who likely will not • Clinical trials should attain more specific results with smaller numbers of patients. Smaller numbers mean lower costs (factor 2-10) • An additional benefit for trial participants and institutional review boards (IRBs) is that stratification, given the correct biomarker, may reduce or eliminate adverse events.
  71. 71. Molecular Profiling: The study of specific patterns (fingerprints) of proteins, DNA, and/or mRNA and how these patterns correlate with an individual's physical characteristics or symptoms of disease.
  72. 72. Generic Health advice • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  73. 73. Generic Health advice (UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  74. 74. Generic Health advice (UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  75. 75. Generic Health advice (UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  76. 76. EGFR based therapy in mCRC
  77. 77. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  78. 78. Before molecular profiling …
  79. 79. Before molecular profiling …
  80. 80. Before molecular profiling …
  81. 81. First Generation Molecular Profiling • Flow cytometry correlates surface markers, cell size and other parameters • Circulating tumor cell (CTC) assays quantitate the number of tumor cells in the peripheral blood. • Exosomes are 30-90 nm vesicles secreted by a wide range of mammalian cell types. • Immunohistochemistry (IHC) measures protein expression, usually on the cell surface.
  82. 82. First Generation Molecular Profiling • Gene sequencing for mutation detection • Microarray for mRNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  83. 83. Basics of the “old” technology • Clone the DNA. • Generate a ladder of labeled (colored) molecules that differ by 1 nucleotide. • Separate mixture on some matrix. • Detect fluorochrome by laser. • Interpret peaks as string of DNA. • Strings are 500 to 1,000 letters long • 1 machine generates 57,000 nucleotides/run • Assemble all strings into a genome.
  84. 84. Genetic Variation Among People: Single nucleotide polymorphisms (SNPs). GATTTAGATCGCGATAGAG versus GATTTAGATCTCGATAGAG. 0.1% difference among people
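The single difference between the two example sequences can be located with a position-by-position comparison. The Python sketch below assumes two sequences of equal length with no insertions or deletions, which is a simplification; the function name `find_snps` is made up for this example.

```python
def find_snps(reference: str, sample: str):
    """Return (1-based position, reference base, sample base) for every mismatching position."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length (no indels handled)")
    return [(pos, ref_base, alt_base)
            for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
            if ref_base != alt_base]


if __name__ == "__main__":
    print(find_snps("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))
    # [(11, 'G', 'T')]
```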
  85. 85. The genome fits as an e-mail attachment
  86. 86. First Generation Molecular Profiling • Gene sequencing for mutation detection • Microarray for mRNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  87. 87. mRNA Expression Microarray
  88. 88. First Generation Molecular Profiling • Gene sequencing for mutation detection • Microarray for mRNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  89. 89. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  90. 90. Second Generation DNA profiling • Exome sequencing (also known as targeted exome capture) is an efficient strategy to selectively sequence the coding regions of the genome to identify novel genes associated with rare and common disorders. • 160K exons
  91. 91. Second Generation DNA profiling
  92. 92. Second Generation DNA profiling
  93. 93. Second Generation RNA profiling. Besides the 6000 protein-coding genes … 140 ribosomal RNA genes; 275 transfer RNA genes; 40 small nuclear RNA genes; >100 small nucleolar genes. Function of RNA genes: pRNA in the phi29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000); cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001); the human Prader-Willi critical region (Cavaille et al. PNAS 97:14035-7, 2000)
  94. 94. Second Generation RNA profiling. RNA genes can be hard to detect: UGAGGUAGUAGGUUGUAUAGU, C. elegans let-7; 21 nt (Pasquinelli et al. Nature 408:86-89, 2000) • Often small • Sometimes multicopy and redundant • Often not polyadenylated (not represented in ESTs) • Immune to frameshift and nonsense mutations • No open reading frame, no codon bias • Often evolving rapidly in primary sequence
  95. 95. ncRNAs in human genome (gene family: approximate count): tRNA: 600; SRP RNA: 1; 18S rRNA: 200; RNase P RNA: 1; 5.8S rRNA: 200; Telomerase RNA: 1; 28S rRNA: 200; RNase MRP: 1; 5S rRNA: 200; Y RNA: 5; snoRNA: 300; miRNA: 250; Vault: 4; U1: 40; 7SK RNA: 1; U2: 30; Xist: 1; U4: 30; H19: 1; U5: 30; BIC: 1; U6: 20; U4atac: 5; Antisense RNAs: 1000s?; U6atac: 5; Cis reg regions: 100s?; U11: 5; U12: 5; Others: ?
  96. 96. Mapping Structural Variation in Humans: CNVs (>1 kb segments) - Thought to be common: 12% of the genome (Redon et al. 2006) - Likely involved in phenotype variation and disease - Until recently most methods for detection were low resolution (>50 kb)
  97. 97. Size Distribution of CNV in a Human Genome
  98. 98. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  99. 99. Defining Epigenetics • Reversible changes in gene expression/function • Without changes in DNA sequence • Can be inherited from precursor cells • Allows integration of intrinsic with environmental signals (including diet) [Diagram labels: Genome (DNA); Epigenome (Chromatin); Gene Expression; Phenotype] CONFIDENTIAL Methylation | Epigenetics | Oncology | Biomarker | NEXT-GEN | PharmacoDX | CRC
  101. 101. Epigenetic Regulation: Post-Translational Modifications to Histones and Base Changes in DNA • Epigenetic modifications of histones and DNA include: – Histone acetylation and methylation, and DNA methylation [Diagram labels: Histone Methylation (Me); Histone Acetylation (Ac); DNA Methylation]
  102. 102. MGMT Biology: O6-Methylguanine Methyltransferase • Essential DNA repair enzyme • Removes alkyl groups from damaged guanine bases • Healthy individual: – MGMT is an essential DNA repair enzyme – Loss of MGMT activity makes individuals susceptible to DNA damage and prone to tumor development • Glioblastoma patient on alkylator chemotherapy: – Patients with MGMT promoter methylation have longer PFS and OS with the use of alkylating agents as chemotherapy
  103. 103. MGMT Promoter Methylation Predicts Benefit from DNA-Alkylating Chemotherapy • Post-hoc subgroup analysis of a temozolomide clinical trial with primary glioblastoma patients shows benefit for patients with MGMT promoter methylation [Bar chart: median overall survival, radiotherapy versus radiotherapy plus temozolomide, for non-methylated and methylated MGMT gene; labeled values 12.7 months and 21.7 months; study with 207 patients] Adapted from Hegi et al. NEJM 2005 352(10):1036-8.
  104. 104. Genome-wide methylation by methylation-sensitive restriction enzymes
  105. 105. Genome-wide methylation by probes
  106. 106. Genome-wide methylation … by next generation sequencing [Diagram: # markers versus # samples across Discovery, Verification, Validation]
  107. 107. MBD_Seq [Diagram labels: Condensed Chromatin; DNA Sheared; Immobilized Methyl Binding Domain]
  108. 108. MBD_Seq [Diagram labels: Immobilized Methyl Binding Domain; MgCl2; Next Gen Sequencing; GA Illumina: 100 million reads]
  109. 109. MBD_Seq: MGMT = dual core
  110. 110. Genome-wide methylation … by next generation sequencing [Diagram: # markers versus # samples; Discovery: MBD_Seq, 1-2 million methylation cores]
  111. 111. Data integration: Correlation tracks [Plots: expression versus methylation, Corr = -1 and Corr = 1]
  112. 112. Correlation track in GBM @ MGMT [Track scale: +1 to -1]
  113. 113. Genome-wide methylation … by next generation sequencing [Diagram: # markers versus # samples; Discovery: MBD_Seq; Verification: 454_BT_Seq; Validation: MSP]
  114. 114. Deep Sequencing [Diagram: unmethylated alleles versus methylated alleles, from less methylation to more methylation; example read GCATCGTGACTTACGACTGATCGATGGATGCTA]
  115. 115. Deep MGMT: Heterogenic complexity
  117. 117. Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  118. 118. Translational Medicine: An inconvenient truth • 1% of genome codes for proteins, however more than 90% is transcribed • Less than 10% of protein experimentally measured can be “explained” from the genome • 1 genome? Structural variation • > 200 Epigenomes?? • Space/time continuum …
  119. 119. Translational Medicine: An inconvenient truth • 1% of genome codes for proteins, however more than 90% is transcribed • Less than 10% of protein experimentally measured can be “explained” from the genome • 1 genome? Structural variation • > 200 Epigenomes … • “space/time” continuum
  120. 120. Cellular programming Epigenetic (meta)information = stem cells
  121. 121. Cellular reprogramming: Tumor Development and Growth. Epigenetically altered, self-renewing cancer stem cells
  122. 122. Cellular reprogramming Gene-specific Epigenetic reprogramming
  123. 123. biobix wvcrieki biobix.be bioinformatics.be
