Next Generation Molecular Profiling




                     26 October 2011
                     Auditorium J, Plateau, Gent
Lab for Bioinformatics and
          computational genomics
     10 “genome hackers”
   mostly engineers (statistics)




           42 scientists
 technicians, geneticists, clinicians




           >100 people
      hardware engineers,
mathematicians, molecular biologists
Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Personalized Medicine
• The use of diagnostic tests (aka biomarkers) to identify in advance
  which patients are likely to respond well to a therapy
• The benefits of this approach are to
   – avoid adverse drug reactions
   – improve efficacy
   – adjust the dose to suit the patient
   – differentiate a product in a competitive market
   – meet future legal or regulatory requirements
• Potential uses of biomarkers
   – Risk assessment
   – Initial/early detection
   – Prognosis
   – Prediction/therapy selection
   – Response assessment
   – Monitoring for recurrence
Biomarker

First used in 1971 … An objective and
  « predictive » measure … at the molecular
  level … of normal and pathogenic processes
  and responses to therapeutic interventions
Characteristic that is objectively measured and
  evaluated as an indicator of normal biologic
  or pathogenic processes or pharmacologic
  response to a drug
A biomarker is valid if:
   – It can be measured in a test system with well
     established performance characteristics
   – Evidence for its clinical significance has been
     established
Rationale 1:
Why now? Regulatory path becoming clearer

There is more at stake than efficient drug development: FDA
  « critical path initiative », Pharmacogenomics guideline

Biomarkers are the foundation of « evidence based medicine »:
  who should be treated, how and with what.

Without biomarkers, advances in targeted therapy will be limited
  and treatment will remain largely empirical. It is imperative
  that biomarker development be accelerated along with
  therapeutics.
Why now?

First- and maturing second-generation molecular profiling
  methodologies make it possible to stratify clinical trial
  participants on a pharmacogenomic basis: to include those most
  likely to benefit from the drug candidate and exclude those who
  likely will not.
Clinical trials should then attain more specific results with
  smaller numbers of patients, and smaller numbers mean lower
  costs (by a factor of 2-10).
An additional benefit for trial participants and institutional
  review boards (IRBs) is that stratification, given the correct
  biomarker, may reduce or eliminate adverse events.
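The claim that stratified trials need fewer patients can be made concrete with the usual normal-approximation sample-size formula for comparing two response proportions, n = (z(1-α/2) + z(1-β))² · [p1(1-p1) + p2(1-p2)] / (p1-p2)². The sketch below is illustrative only: the biomarker prevalence (40%) and the response rates (40% for biomarker-positive patients on drug, 15% otherwise) are hypothetical numbers, not data from this talk.

```python
import math
from statistics import NormalDist


def n_per_arm(p_treat: float, p_ctrl: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Patients per arm to detect p_treat vs p_ctrl (two-sided, normal approximation)."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    variance = p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)
    return math.ceil(z_sum ** 2 * variance / (p_treat - p_ctrl) ** 2)


# Hypothetical scenario: all numbers are assumptions for illustration
prevalence = 0.40   # fraction of patients who are biomarker-positive
resp_pos = 0.40     # response rate on drug, biomarker-positive patients
resp_neg = 0.15     # response rate on drug, biomarker-negative patients
resp_ctrl = 0.15    # response rate on control, all patients

# Unselected trial: the drug effect is diluted by biomarker-negative patients
p_all = prevalence * resp_pos + (1 - prevalence) * resp_neg
n_all = n_per_arm(p_all, resp_ctrl)

# Stratified trial: only biomarker-positive patients are enrolled
n_pos = n_per_arm(resp_pos, resp_ctrl)

print(f"unselected: {n_all} per arm, stratified: {n_pos} per arm")
```

With these assumed rates the unselected trial needs about 248 patients per arm and the stratified one about 47, roughly a fivefold reduction; the price is that about 2.5 times as many patients must be screened to find the biomarker-positive ones.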
Molecular Profiling

The study of specific patterns (fingerprints) of proteins,
DNA, and/or mRNA and how these patterns correlate
with an individual's physical characteristics or
symptoms of disease.
Generic Health advice




• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 – Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
  dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
Generic Health advice (UNLESS)




• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 – Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
  dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
EGFR-based therapy in mCRC (metastatic colorectal cancer)
Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Before molecular profiling …
First Generation Molecular Profiling


• Flow cytometry correlates surface markers,
  cell size and other parameters
• Circulating tumor cell (CTC) assays
  quantify the number of tumor cells in the
  peripheral blood.
• Exosomes are 30-90 nm vesicles secreted by
  a wide range of mammalian cell types.
• Immunohistochemistry (IHC) measures
  protein expression, usually on the cell
  surface.
First Generation Molecular Profiling




• Gene sequencing for mutation detection

• Microarray for mRNA detection
• RT-PCR for gene expression

• FISH analysis for gene copy number
• Comparative Genomic Hybridization (CGH) for
  gene copy number
Basics of the “old” technology


• Clone the DNA.
• Generate a ladder of labeled (colored)
  molecules that are different by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as a string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
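The “ladder” steps can be illustrated with an idealised simulation of chain-termination sequencing: every fragment ends in a dye-labelled base, the fragments are separated by length, and reading the dye colours in length order recovers the sequence. This is a toy model that assumes one fragment per length, perfect separation and no noise; the base-to-colour mapping is hypothetical.

```python
import random

# Hypothetical dye colours for the four terminating bases
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FROM_DYE = {colour: base for base, colour in DYE.items()}


def make_ladder(template: str) -> list[tuple[int, str]]:
    """Return (fragment length, dye colour) for every chain-terminated fragment.

    Simplification: complementary-strand synthesis is skipped, so each fragment
    is labelled with the base at which it ends in the sequence being read.
    """
    return [(i + 1, DYE[base]) for i, base in enumerate(template)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """'Separate on a matrix' (sort by length) and read the colour of each peak."""
    return "".join(BASE_FROM_DYE[colour] for _, colour in sorted(fragments))


template = "GATTTAGATCGCGATAGAG"
ladder = make_ladder(template)
random.Random(0).shuffle(ladder)    # fragments start out as an unordered mixture
print(read_ladder(ladder))          # GATTTAGATCGCGATAGAG
```

Sorting by fragment length plays the role of the separation matrix; each read is limited to the 500 to 1,000 bases mentioned above, which is why the final assembly step is needed.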
Genetic Variation
 Among People
Single nucleotide polymorphisms
            (SNPs)
  GATTTAGATCGCGATAGAG
  GATTTAGATCTCGATAGAG



 0.1% difference among
         people
The genome fits as an e-mail attachment
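Why a genome can fit in an e-mail attachment follows from a little arithmetic. Each base needs only 2 bits, so a sequence packs four bases per byte; and because two people differ at only about 0.1% of positions, it is enough to send the differences from a shared reference. The sketch below shows 2-bit packing plus back-of-the-envelope sizes; the 3.1-billion-base genome size and the assumed 5 bytes per difference (4-byte position plus 1-byte base) are illustrative assumptions, and real compressors do considerably better.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); `length` is needed because the last byte may be padded."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11]
                   for i in range(length))


seq = "GATTTAGATCGCGATAGAG"
packed = pack(seq)
assert unpack(packed, len(seq)) == seq
print(len(seq), "bases ->", len(packed), "bytes")   # 19 bases -> 5 bytes

# Back-of-the-envelope sizes (assumed figures)
genome_size = 3_100_000_000                # bases
full_mb = genome_size * 2 / 8 / 1e6        # 2 bits per base
n_diffs = genome_size // 1000              # ~0.1% of positions differ
diff_mb = n_diffs * 5 / 1e6                # assume 5 bytes per difference
print(f"full genome: {full_mb:.0f} MB, differences only: {diff_mb:.1f} MB")
```

Under these assumptions the full sequence needs about 775 MB, while the list of differences needs about 15.5 MB.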
mRNA Expression Microarray
Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Basics of the “new” technology


• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color
  scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of
  DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2
  GB/run (1,200,000,000 letters/day).
• Map or align strings to one or many genomes.
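Mapping short strings back to a reference genome is usually done with an index rather than by scanning the whole genome for every read. A minimal sketch, assuming exact matches only: it indexes every k-mer of the reference, looks up each read's first k-mer (the “seed”) and verifies the full read at each candidate position. Production aligners tolerate mismatches and use compressed indexes; the reference and reads here are made up.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return all positions where the read matches the reference exactly."""
    seed = read[:k]
    return [pos for pos in index.get(seed, [])
            if reference[pos:pos + len(read)] == read]


reference = "GATTTAGATCGCGATAGAGGATTTAGATCTCGATAGAG"
k = 5
index = build_index(reference, k)
for read in ["TAGATCGC", "GATTTAG", "CCCCCCCC"]:
    print(read, map_read(read, reference, index, k))
```

Here TAGATCGC maps to a single position, GATTTAG maps to two (it lies in a repeated stretch of this toy reference), and CCCCCCCC does not map at all.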
Next Generation Technologies

• Roche (454)
   – Emulsion PCR
   – Polymerase
   – Natural nucleotides
• 100-500 Mb for 5-15k
   – 1% error rate
   – Homopolymer errors
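Why homopolymers are the weak spot of 454 pyrosequencing: nucleotides are flowed one at a time in a fixed cycle, and the light signal of each flow is roughly proportional to how many identical bases were incorporated. A run of five Ts must therefore be called from a signal of about 5.0, and the relative gap between 4 and 5 is far smaller than between 0 and 1. A minimal sketch, assuming an idealised flow order TACG and a hand-made noisy signal (not real instrument data).

```python
from itertools import groupby

FLOW_ORDER = "TACG"   # assumed cyclic order in which nucleotides are flowed


def flowgram(seq: str, n_cycles: int) -> list[float]:
    """Ideal signal per flow: length of the homopolymer incorporated (0.0 if none).

    Bases beyond the last flow are silently left unread.
    """
    runs = [(base, len(list(group))) for base, group in groupby(seq)]
    signal, r = [], 0
    for flow in range(n_cycles * len(FLOW_ORDER)):
        nucleotide = FLOW_ORDER[flow % len(FLOW_ORDER)]
        if r < len(runs) and runs[r][0] == nucleotide:
            signal.append(float(runs[r][1]))   # the whole run extends in one flow
            r += 1
        else:
            signal.append(0.0)
    return signal


def call_bases(signal: list[float]) -> str:
    """Naive base calling: round each flow's signal to the nearest run length."""
    return "".join(FLOW_ORDER[i % len(FLOW_ORDER)] * round(s)
                   for i, s in enumerate(signal))


seq = "TTTTTAGC"
ideal = flowgram(seq, n_cycles=2)
print(ideal)                 # [5.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
print(call_bases(ideal))     # TTTTTAGC

# Hand-made noisy signal: the 5-mer reads low (4.4) and is called as a 4-mer
noisy = [4.4, 0.8, 0.1, 1.2, 0.0, 0.2, 0.9, 0.1]
print(call_bases(noisy))     # TTTTAGC (one T lost from the homopolymer)
```

The error is an insertion/deletion inside the run rather than a substitution, which matches the homopolymer weakness listed on the slide.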
One additional insight ...
Read Length is Not As Important For Resequencing

[Figure: percentage of paired k-mers with a uniquely assignable location (0-100%) versus length of k-mer reads (8-20 bp), for E. coli and human. Credit: Jay Shendure]
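The point of the figure, that even short reads usually land in a unique place, can be reproduced qualitatively in a few lines. This sketch measures, for a random 100 kb sequence, the fraction of positions whose k-mer occurs only once. It simplifies the figure in two ways: it uses single rather than paired k-mers, and a random sequence rather than the E. coli or human genome; real genomes contain repeats, which lower the fraction of unique k-mers.

```python
import random
from collections import Counter


def fraction_unique(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


rng = random.Random(42)   # fixed seed so the run is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(100_000))
for k in range(8, 21, 2):
    print(f"k={k:2d}  unique: {fraction_unique(genome, k):6.1%}")
```

For a random sequence of length n, a k-mer is expected to recur about n / 4^k times, so uniqueness rises steeply once 4^k greatly exceeds n (k of about 10-12 for 100 kb).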
Short Read Technologies

• Illumina GA (HiSeq, MiSeq)

• ABI SOLiD
Other second generation technology: (ABI) SOLiD
So what?
Second generation DNA/RNA profiling
Second Generation DNA profiling


• Enrichment Sequencing
    • ChIP-Seq (Chromatin Immunoprecipitation
      followed by sequencing)
       • A substitute for ChIP-chip
       • E.g. to find the binding sites of proteins
         such as transcription factors (TFBS)
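In ChIP-Seq, reads pile up where the immunoprecipitated protein was bound, so binding sites show up as windows with many more reads than expected by chance. A deliberately naive sketch, assuming a uniform background: count read start positions in fixed windows and flag windows whose count is improbably high under a Poisson model. Dedicated peak callers such as MACS model the background and fragment shift far more carefully; the window size, cutoff and simulated reads are illustrative assumptions.

```python
import math
import random


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), with terms computed in log space."""
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def call_peaks(read_starts: list[int], genome_len: int, window: int = 200,
               p_cutoff: float = 1e-5) -> list[tuple[int, int, int]]:
    """Return (start, end, read count) for windows enriched over a uniform background."""
    if not read_starts:
        return []
    n_windows = math.ceil(genome_len / window)
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows   # expected reads per window if reads were random
    return [(w * window, (w + 1) * window, c)
            for w, c in enumerate(counts) if poisson_sf(c, lam) < p_cutoff]


rng = random.Random(1)                                           # fixed seed
genome_len = 100_000
background = [rng.randrange(genome_len) for _ in range(5_000)]   # non-specific reads
site = [rng.randrange(40_000, 40_200) for _ in range(100)]       # simulated binding site
peaks = call_peaks(background + site, genome_len)
print(peaks)
```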
Paired End Reads are Important!

[Figure: Read 1 falls in repetitive DNA and on its own maps to multiple positions; Read 2 falls in unique DNA a known distance away, which anchors the pair]
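The logic of the diagram above can be written down directly: when Read 1 alone matches several repeat copies, keep only the placement whose distance to the uniquely mapped Read 2 agrees with the known insert size. A minimal sketch; the positions, insert size and tolerance are made-up values, and real aligners also score mismatches and read orientation.

```python
def place_pair(read1_candidates: list[int], read2_pos: int,
               insert_size: int, tolerance: int) -> list[int]:
    """Keep Read 1 positions consistent with the known distance to Read 2."""
    return [pos for pos in read1_candidates
            if abs((read2_pos - pos) - insert_size) <= tolerance]


# Read 1 falls in a repeat and matches three places; Read 2 maps uniquely
read1_candidates = [12_000, 58_400, 91_750]
read2_pos = 58_700
print(place_pair(read1_candidates, read2_pos, insert_size=300, tolerance=50))  # [58400]
```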
Second Generation DNA profiling


• Exome sequencing (also known as
  targeted exome capture) is an
  efficient strategy to selectively
  sequence the coding regions of the
  genome in order to identify novel genes
  associated with rare and common
  disorders.
• 160K exons
Second Generation DNA profiling
Bioinformatics tools
Second Generation RNA profiling

  Besides the 6000 protein-coding genes …

  140 ribosomal RNA genes
  275 transfer RNA genes
  40 small nuclear RNA genes
  >100 small nucleolar RNA genes

  Function of RNA genes

  pRNA in the φ29 rotary packaging motor (Simpson
  et al. Nature 408:745-750, 2000)
  Cartilage-hair hypoplasia mapped to an RNA
  (Ridanpää et al. Cell 104:195-203, 2001)
  The human Prader-Willi critical region (Cavaille
  et al. PNAS 97:14035-7, 2000)
Second Generation RNA profiling

  RNA genes can be hard to detect

  UGAGGUAGUAGGUUGUAUAGU

  C. elegans let-7; 21 nt
  (Pasquinelli et al. Nature 408:86-89, 2000)

  Often small
  Sometimes multicopy and redundant
  Often not polyadenylated
  (not represented in ESTs)
  Immune to frameshift and nonsense
  mutations
  No open reading frame, no codon bias
  Often evolving rapidly in primary sequence
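The “no open reading frame” point explains why gene finders built for protein-coding genes miss RNA genes. As an illustration, a simple scan for open reading frames (AUG up to an in-frame stop codon) finds nothing in the 21-nt sequence above (the C. elegans let-7 microRNA), because it contains no AUG start codon. A minimal sketch that scans the three forward frames only; the toy mRNA used for comparison is made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf(rna: str) -> int:
    """Length (nt) of the longest AUG...stop ORF in the three forward frames, 0 if none."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i                               # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)         # close it at the stop codon
                start = None
    return best


let7 = "UGAGGUAGUAGGUUGUAUAGU"
print(longest_orf(let7))               # 0: no start codon, hence no ORF
print(longest_orf("GGAUGAAAUAGCC"))    # 9: AUG AAA UAG
```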
Second Generation RNA profiling


  Although details of the methods vary, the concept
  behind RNA-seq is simple:
        • isolate all mRNA
        • convert to cDNA using reverse transcriptase
        • sequence the cDNA
        • map sequences to the genome
  The more times a given sequence is detected, the
  more abundantly transcribed it is. If enough
  sequences are generated, a comprehensive and
  quantitative view of the entire transcriptome of an
  organism or tissue can be obtained.
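“The more times a given sequence is detected” needs one refinement before counts can be compared: longer transcripts yield more reads, and deeper sequencing yields more reads overall. A common normalisation is RPKM (reads per kilobase of transcript per million mapped reads), RPKM = reads × 10^9 / (transcript length in bp × total mapped reads). A minimal sketch with made-up gene names, counts and library size.

```python
def rpkm(reads: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


total_mapped = 10_000_000              # hypothetical library size
genes = {"geneA": (1_000, 2_000),      # hypothetical (read count, length in bp)
         "geneB": (500, 500)}
for name, (reads, length) in genes.items():
    print(name, rpkm(reads, length, total_mapped))
```

geneB receives half as many reads as geneA but is four times shorter, so it comes out as the more abundant transcript (RPKM 100 versus 50).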
Second Generation RNA profiling


• Compared with microarrays
    – Microarray
        • Closed technology: prior knowledge required
        • Affected by pseudogenes (homologues of real genes)
          through cross-hybridization
        • Low sensitivity
    – RNA-Seq
        • Open technology: no prior knowledge required
        • Less affected by pseudogenes, because the actual
          sequence is measured
        • Yields additional information (SNPs, alternative
          splicing)
ncRNAs in human genome

 ncRNA          Count      ncRNA                    Count
 tRNA             600      SRP RNA                      1
 18S rRNA         200      RNase P RNA                  1
 5.8S rRNA        200      Telomerase RNA               1
 28S rRNA         200      RNase MRP                    1
 5S rRNA          200      Y RNA                        5
 snoRNA           300      Vault                        4
 miRNA            250      7SK RNA                      1
 U1                40      Xist                         1
 U2                30      H19                          1
 U4                30      BIC                          1
 U5                30      Antisense RNAs          1000s?
 U6                20      Cis-regulatory regions   100s?
 U4atac             5      Others                       ?
 U6atac             5
 U11                5
 U12                5
Mapping Structural Variation in Humans
CNVs (>1 kb segments):
   - Thought to be common: 12% of the genome
     (Redon et al. 2006)
   - Likely involved in phenotypic variation and disease
   - Until recently, most detection methods were low
     resolution (>50 kb)
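Sequencing data allow CNVs to be detected at much finer resolution. One common idea is read depth: in fixed-size bins, a single-copy deletion in a diploid genome yields about half the expected reads and a single-copy duplication about 1.5 times as many. A deliberately simple sketch, assuming uniform coverage and ignoring GC bias and mappability; the bin counts and log2-ratio thresholds are illustrative assumptions.

```python
import math


def call_cnv_bins(counts: list[int], bin_size: int,
                  gain: float = 0.4, loss: float = -0.6) -> list[tuple[int, int, str]]:
    """Label bins whose log2(observed / baseline) depth suggests a copy-number change."""
    baseline = sorted(counts)[len(counts) // 2]     # (upper) median depth as diploid baseline
    calls = []
    for i, c in enumerate(counts):
        ratio = math.log2(max(c, 1) / baseline)     # max() avoids log2(0)
        if ratio >= gain:
            calls.append((i * bin_size, (i + 1) * bin_size, "gain"))
        elif ratio <= loss:
            calls.append((i * bin_size, (i + 1) * bin_size, "loss"))
    return calls


# Hypothetical read counts in 1 kb bins: about 100 reads means two copies
counts = [98, 103, 101, 52, 49, 97, 150, 148, 102, 99]
print(call_cnv_bins(counts, bin_size=1_000))
```

With 1 kb bins, this toy example resolves a 2 kb loss and a 2 kb gain, well below the >50 kb resolution quoted for earlier methods.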
Size Distribution of CNV in a Human Genome
Next next generation sequencing
 Third generation sequencing
       Now sequencing
Ultra-low-cost SINGLE molecule sequencing
Pacific Biosciences: A Third Generation Sequencing Technology (Eid et al. 2008)
Complete Genomics
Nanopore Sequencing
Second Generation Protein profiling


• MS/MS-based proteomics:
  exclusively in discovery mode
• Automated diagnostic assay
  generation (next generation
  proteomics)
    • Aptamers as an alternative to antibodies
    • Immuno-PCR
MS/MS identification pipeline overview

[Figure: pipeline goals: filter the dataset prior to database search; define the PTM profile prior to database search (Bonanza); multi-tiered database search. Tools: Bonanza + IggyPep]
Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Defining Epigenetics

[Figure: Genome (DNA) → Epigenome (chromatin) → Gene expression → Phenotype]

• Reversible changes in gene expression/function without changes in DNA sequence
• Can be inherited from precursor cells
• Allows intrinsic signals to be integrated with environmental signals (including diet)

                                                                                        CONFIDENTIAL

             Methylation   I   Epigenetics         |        Oncology    |   Biomarker
                           I   NEXT-GEN            |       PharmacoDX   |     CRC
Epigenetic Regulation:
Post-Translational Modifications to Histones and Base Modifications in DNA

    Epigenetic modifications of histones and DNA include:
      – Histone acetylation and methylation, and DNA methylation

[Figure: histone acetylation (Ac), histone methylation (Me) and DNA methylation marks]

MGMT Biology
O6-Methylguanine-DNA
Methyltransferase (MGMT)
Essential DNA Repair Enzyme

Removes alkyl groups from damaged guanine
bases

Healthy individual:
     - MGMT is an essential DNA repair enzyme
     - Loss of MGMT activity makes individuals susceptible
       to DNA damage and prone to tumor development

Glioblastoma patient on alkylator chemotherapy:
     - Patients with MGMT promoter methylation have longer
       progression-free survival (PFS) and overall survival (OS)
       when treated with alkylating agents as chemotherapy
     - Promoter methylation silences MGMT, so the tumor is less
       able to repair the damage that alkylating agents cause



MGMT Promoter Methylation Predicts
Benefit from DNA-Alkylating Chemotherapy
  A post-hoc subgroup analysis of a temozolomide clinical trial in patients with
  primary glioblastoma showed a benefit for patients with MGMT promoter methylation

[Figure: median overall survival (months) with radiotherapy vs radiotherapy plus temozolomide, for patients with non-methylated vs methylated MGMT gene; labelled values 12.7 months (non-methylated MGMT) and 21.7 months (methylated MGMT, radiotherapy plus temozolomide). Adapted from Hegi et al. NEJM 2005 352(10):1036-8. Study with 207 patients]
Genome-wide methylation
by methylation-sensitive restriction enzymes
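The principle behind methylation-sensitive restriction enzymes can be sketched with the classic isoschizomer pair HpaII and MspI: both recognise CCGG, but HpaII is blocked when the CpG within the site is methylated, whereas MspI is not. Comparing the two digests therefore reveals which CCGG sites are methylated. The slide does not say which enzymes were used; this pair, the sequence and the methylation states are illustrative assumptions.

```python
SITE = "CCGG"


def ccgg_sites(seq: str) -> list[int]:
    """0-based start positions of every CCGG site."""
    return [i for i in range(len(seq) - len(SITE) + 1) if seq[i:i + len(SITE)] == SITE]


def digest(seq: str, methylated: set[int], enzyme: str) -> list[int]:
    """Sites cut by the enzyme; HpaII is blocked at sites whose CpG is methylated."""
    sites = ccgg_sites(seq)
    if enzyme == "MspI":
        return sites                                 # insensitive to CpG methylation
    if enzyme == "HpaII":
        return [s for s in sites if s not in methylated]
    raise ValueError(f"unknown enzyme: {enzyme}")


seq = "ATCCGGTTACCGGAATTCCGGA"
methylated = {9}                                     # hypothetical: second site is methylated
msp = digest(seq, methylated, "MspI")
hpa = digest(seq, methylated, "HpaII")
print("methylated sites:", sorted(set(msp) - set(hpa)))   # methylated sites: [9]
```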




Genome-wide methylation
by probes




MBD-Seq

[Figure: condensed chromatin; DNA sheared; sheared DNA applied to an immobilized methyl-binding domain]




MBD-Seq

[Figure: DNA on an immobilized methyl-binding domain; MgCl2; next-gen sequencing on the Illumina GA: 100 million reads]

Overview


           Personalized Medicine,
              Biomarkers …
              … Molecular Profiling

           First Generation Molecular Profiling
           Next Generation Molecular Profiling
           Next Generation Epigenetic Profiling

           Concluding Remarks
Bioinformatics, a life science discipline … management of expectations

[Figure: Venn diagram of Math, Computer Science/Informatics and Theoretical/(Molecular) Biology. Overlap labels: NP, Datamining, AI, Image Analysis, structure prediction (HTX), Interface Design, Expert Annotation, Sequence Analysis. Centre: Bioinformatics; Discovery Informatics – Computational Genomics; Computational Biology]
Translational Medicine: An inconvenient truth


  • 1% of the genome codes for proteins, yet
    more than 90% is transcribed
  • Less than 10% of experimentally measured
    protein can be “explained” from the
    genome
  • One genome? Structural variation
  • > 200 epigenomes?

  • Space/time continuum …
Cellular programming

               Epigenetic (meta)information = stem cells
Cellular reprogramming

[Figure: tumor; tumor development and growth; epigenetically altered, self-renewing cancer stem cells]
Cellular reprogramming

              Gene-specific
              Epigenetic
              reprogramming
biobix
wvcrieki




biobix.be
bioinformatics.be
bioinformatrix.net

2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter

  • 1.
    Next Generation MolecularProfiling 26 oktober 2011 Auditorium J, Plateau, Gent
  • 2.
    Lab for Bioinformaticsand computational genomics 10 “genome hackers” mostly engineers (statistics) 42 scientists technicians, geneticists, clinicians >100 people hardware engineers, mathematicians, molecular biologists
  • 3.
    Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 8.
    Personalized Medicine • Theuse of diagnostic tests (aka biomarkers) to identify in advance which patients are likely to respond well to a therapy • The benefits of this approach are to – avoid adverse drug reactions – improve efficacy – adjust the dose to suit the patient – differentiate a product in a competitive market – meet future legal or regulatory requirements • Potential uses of biomarkers – Risk assessment – Initial/early detection – Prognosis – Prediction/therapy selection – Response assessment – Monitoring for recurrence
  • 9.
    Biomarker First used in1971 … An objective and « predictive » measure … at the molecular level … of normal and pathogenic processes and responses to therapeutic interventions Characteristic that is objectively measured and evaluated as an indicator of normal biologic or pathogenic processes or pharmacologic response to a drug A biomarker is valid if: – It can be measured in a test system with well established performance characteristics – Evidence for its clinical significance has been established
  • 10.
    Rationale 1: Why now? Regulatory path becoming more clear There is more at stake than efficient drug development. FDA « critical path initiative » Pharmacogenomics guideline Biomarkers are the foundation of « evidence based medicine » - who should be treated, how and with what. Without Biomarkers advances in targeted therapy will be limited and treatment remain largely emperical. It is imperative that Biomarker development be accelarated along with therapeutics
  • 11.
    Why now ? Firstand maturing second generation molecular profiling methodologies allow to stratify clinical trial participants to include those most likely to benefit from the drug candidate—and exclude those who likely will not—pharmacogenomics- based Clinical trials should attain more specific results with smaller numbers of patients. Smaller numbers mean fewer costs (factor 2-10) An additional benefit for trial participants and internal review boards (IRBs) is that stratification, given the correct biomarker, may reduce or eliminate adverse events.
  • 12.
    Molecular Profiling The studyof specific patterns (fingerprints) of proteins, DNA, and/or mRNA and how these patterns correlate with an individual's physical characteristics or symptoms of disease.
  • 13.
    Generic Health advice •Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolarance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  • 14.
    Generic Health advice(UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolarance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  • 15.
    Generic Health advice(UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  • 16.
    Generic Health advice(UNLESS) • Exercise (Hypertrophic Cardiomyopathy) • Drink your milk (MCM6 Lactose intolerance) • Eat your green beans (glucose-6-phosphate dehydrogenase Deficiency) • & your grains (HLA-DQ2 – Celiac disease) • & your iron (HFE - Hemochromatosis) • Get more rest (HLA-DR2 - Narcolepsy)
  • 17.
  • 18.
    Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 19.
  • 23.
  • 24.
  • 26.
    First Generation MolecularProfiling • Flow cytometry correlates surface markers, cell size and other parameters • Circulating tumor cell assays (CTC’s) quantitate the number of tumor cells in the peripheral blood. • Exosomes are 30-90 nm vesicles secreted by a wide range of mammalian cell types. • Immunohistochemistry (IHC) measures protein expression, usually on the cell surface.
  • 30.
    First Generation MolecularProfiling • Gene sequencing for mutation detection • Microarray for m-RNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  • 31.
    Basics of the―old‖ technology • Clone the DNA. • Generate a ladder of labeled (colored) molecules that are different by 1 nucleotide. • Separate mixture on some matrix. • Detect fluorochrome by laser. • Interpret peaks as string of DNA. • Strings are 500 to 1,000 letters long • 1 machine generates 57,000 nucleotides/run • Assemble all strings into a genome.
  • 33.
    Genetic Variation AmongPeople Single nucleotide polymorphisms (SNPs) GATTTAGATCGCGATAGAG GATTTAGATCTCGATAGAG 0.1% difference among people
  • 34.
    The genome fitsas an e-mail attachment
  • 35.
    First Generation MolecularProfiling • Gene sequencing for mutation detection • Microarray for m-RNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  • 36.
  • 37.
    First Generation MolecularProfiling • Gene sequencing for mutation detection • Microarray for m-RNA message detection • RT-PCR for gene expression • FISH analysis for gene copy number • Comparative Genome Hybridization (CGH) for gene copy number
  • 39.
    Overview Personalized Medicine, Biomarkers … … Molecular Profiling First Generation Molecular Profiling Next Generation Molecular Profiling Next Generation Epigenetic Profiling Concluding Remarks
  • 40.
    Basics of the―new‖ technology • Get DNA. • Attach it to something. • Extend and amplify signal with some color scheme. • Detect fluorochrome by microscopy. • Interpret series of spots as short strings of DNA. • Strings are 30-300 letters long • Multiple images are interpreted as 0.4 to 1.2 GB/run (1,200,000,000 letters/day). • Map or align strings to one or many genome.
  • 41.
    Next Generation Technologies •Roche (454) –Emulsion PCR –Polymerase –Natural Nucleotides • 100-500 Mb for 5-15k –1% error rate –Homopolymers
  • 46.
  • 47.
    Read Length isNot As Important For Resequencing 100% % of Paired K-mers with Uniquely 90% 80% Assignable Location 70% 60% E.COLI 50% HUMAN 40% 30% 20% 10% 0% 8 10 12 14 16 18 20 Length of K-mer Reads (bp) Jay Shendure
  • 48.
    Short Read Techologies •Illumina GA (HiSeq, MySeq) • ABI SOLID
  • 52.
    Other second generationtechnology: (ABI) SOLID
  • 54.
  • 56.
  • 57.
    Second Generation DNAprofiling • Enrichment Sequencing • ChIP-Seq (Chromosome Immunoprecipitation) • A substitute for ChIP-chip • Eg. to find the binding sequence of proteins (TFBS)
  • 58.
    Paired End Readsare Important! Known Distance Repetitive DNA Read 1Unique DNA 2 Read Single read maps to multiple positions
  • 59.
    Paired End Readsare Important! Known Distance Repetitive DNA Read 1Unique DNA 2 Read Single read maps to multiple positions
  • 60.
    Second Generation DNAprofiling • Exome Sequencing (aka known as targeted exome capture) is an efficient strategy to selectively sequence the coding regions of the genome to identify novel genes associated with rare and common disorders. • 160K exons
    Second Generation RNA Profiling
    Besides the 6000 protein-coding genes …
    • 140 ribosomal RNA genes
    • 275 transfer RNA genes
    • 40 small nuclear RNA genes
    • >100 small nucleolar RNA genes
    Function of RNA genes
    • pRNA in the φ29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000)
    • Cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001)
    • The human Prader-Willi critical region (Cavaillé et al. PNAS 97:14035-7, 2000)
    Second Generation RNA Profiling
    RNA genes can be hard to detect
    UGAGGUAGUAGGUUGUAUAGU: C. elegans let-7, 21 nt (Pasquinelli et al. Nature 408:86-89, 2000)
    • Often small
    • Sometimes multicopy and redundant
    • Often not polyadenylated (not represented in ESTs)
    • Immune to frameshift and nonsense mutations
    • No open reading frame, no codon bias
    • Often evolving rapidly in primary sequence
    Second Generation RNA Profiling
    Although details of the methods vary, the concept behind RNA-seq is simple:
    • isolate all mRNA
    • convert it to cDNA using reverse transcriptase
    • sequence the cDNA
    • map the sequences to the genome
    The more times a given sequence is detected, the more abundantly it is transcribed. If enough sequences are generated, a comprehensive and quantitative view of the entire transcriptome of an organism or tissue can be obtained.
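"More detections means more transcription" needs one correction before genes can be compared: longer transcripts yield more reads, and deeper runs yield more reads overall. The sketch below counts reads already assigned to genes and applies RPKM (reads per kilobase of transcript per million mapped reads), one widely used normalization; it is an illustration, not necessarily what the speaker used. The gene names, lengths, and `count_reads`/`rpkm` helpers are hypothetical, and read-to-gene assignment is assumed to have been done upstream.

```python
from collections import Counter


def count_reads(assignments: list[str]) -> Counter:
    """Count how many mapped reads were assigned to each gene."""
    return Counter(assignments)


def rpkm(counts: Counter, gene_lengths: dict[str, int]) -> dict[str, float]:
    """RPKM = count * 1e9 / (gene_length_bp * total_mapped_reads)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in gene_lengths}
    return {
        gene: counts.get(gene, 0) * 1e9 / (length * total)
        for gene, length in gene_lengths.items()
    }


if __name__ == "__main__":
    reads = ["A"] * 300 + ["B"] * 100 + ["C"] * 600  # hypothetical assignments
    lengths = {"A": 1000, "B": 500, "C": 3000}       # hypothetical lengths in bp
    print(rpkm(count_reads(reads), lengths))
```

Gene C collects six times as many reads as gene B but is also six times longer, so both end up with the same normalized expression.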
    Second Generation RNA Profiling
    Compared with microarrays:
    • Microarray
       – Closed technology: prior knowledge required
       – Affected by pseudogenes (homologues of real genes)
       – Low sensitivity
    • RNA-Seq
       – Open technology: no prior knowledge required
       – Less affected by pseudogenes, because the exact sequence is measured
       – Yields additional information (SNPs, alternative splicing)
    ncRNAs in the Human Genome
    ncRNA class               Approx. number of genes
    tRNA                      600
    18S rRNA                  200
    5.8S rRNA                 200
    28S rRNA                  200
    5S rRNA                   200
    snoRNA                    300
    miRNA                     250
    U1                        40
    U2                        30
    U4                        30
    U5                        30
    U6                        20
    U4atac                    5
    U6atac                    5
    U11                       5
    U12                       5
    SRP RNA                   1
    RNase P RNA               1
    Telomerase RNA            1
    RNase MRP                 1
    Y RNA                     5
    Vault                     4
    7SK RNA                   1
    Xist                      1
    H19                       1
    BIC                       1
    Antisense RNAs            1000s?
    Cis-regulatory regions    100s?
    Others                    ?
    Mapping Structural Variation in Humans
    • CNVs: segments >1 kb
       – Thought to be common: 12% of the genome (Redon et al. 2006)
       – Likely involved in phenotypic variation and disease
    • Until recently, most detection methods were low resolution (>50 kb)
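Higher-resolution CNV detection from sequencing data can rely on read depth: a region present in extra copies collects proportionally more reads. The sketch below compares per-window depth in a sample against a reference, normalizes both by total depth, and labels windows from the log2 ratio. This is one approach among several (paired-end and split-read signals are others) and is shown only as an illustration; the thresholds (between the neutral value 0 and the single-copy changes log2(3/2) ≈ 0.58 and log2(1/2) = -1 in a diploid genome), the pseudocount, and the `call_cnv_windows` helper are all assumptions.

```python
import math


def call_cnv_windows(sample_depth: list[float], reference_depth: list[float],
                     gain_threshold: float = 0.3, loss_threshold: float = -0.5,
                     pseudocount: float = 0.5) -> list[str]:
    """Label each window 'gain', 'loss' or 'neutral' from a normalized log2 depth ratio.

    Depths are divided by their totals so that overall sequencing depth cancels;
    the pseudocount avoids taking the log of zero.
    """
    s_total = sum(sample_depth)
    r_total = sum(reference_depth)
    if s_total == 0 or r_total == 0:
        raise ValueError("total depth must be positive")
    calls = []
    for s, r in zip(sample_depth, reference_depth):
        ratio = ((s + pseudocount) / s_total) / ((r + pseudocount) / r_total)
        log2_ratio = math.log2(ratio)
        if log2_ratio >= gain_threshold:
            calls.append("gain")
        elif log2_ratio <= loss_threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


if __name__ == "__main__":
    reference = [100] * 10
    sample = [100, 100, 150, 150, 100, 100, 50, 50, 100, 100]
    print(call_cnv_windows(sample, reference))
```

Resolution is set by the window size, so smaller windows detect smaller CNVs at the cost of noisier depth estimates.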
    Size Distribution of CNVs in a Human Genome
    Next-Next-Generation Sequencing
    • Third-generation sequencing
    • Now sequencing
    Pacific Biosciences: A Third-Generation Sequencing Technology (Eid et al. 2008)
    Second Generation Protein Profiling
    • Proteomics: MS/MS-based, exclusively in discovery mode
    • Automated diagnostic assay generation (next-generation proteomics)
    • Aptamers as an alternative to antibodies
    • Immuno-PCR
    MS/MS identification pipeline overview
    [Figure: pipeline stages. Goals shown: filter the dataset prior to database search; define a PTM search profile prior to database search; multi-tiered database search. Tools shown: Bonanza, Bonanza + IggyPep]
    Defining Epigenetics
    • Reversible changes in gene expression/function
    • Without changes in DNA sequence
    • Can be inherited from precursor cells
    • Allows intrinsic and environmental signals (including diet) to be integrated
    [Figure labels: Genome (DNA), Chromatin, Epigenome, Gene Expression, Phenotype]
    CONFIDENTIAL Methylation I Epigenetics | Oncology | Biomarker I NEXT-GEN | PharmacoDX | CRC
    Epigenetic Regulation: Post-Translational Modifications to Histones and Base Changes in DNA
    • Epigenetic modifications of histones and DNA include:
       – Histone acetylation and methylation, and DNA methylation
    [Figure: histone methylation (Me), histone acetylation (Ac), DNA methylation]
    MGMT Biology
    • O6-methylguanine-DNA methyltransferase: an essential DNA repair enzyme
    • Removes alkyl groups from damaged guanine bases
    Healthy individual:
       – MGMT is an essential DNA repair enzyme
       – Loss of MGMT activity makes individuals susceptible to DNA damage and prone to tumor development
    Glioblastoma patient on alkylator chemotherapy:
       – Patients with MGMT promoter methylation have longer progression-free survival (PFS) and overall survival (OS) with alkylating agents as chemotherapy
    MGMT Promoter Methylation Predicts Benefit from DNA-Alkylating Chemotherapy
    • A post-hoc subgroup analysis of a temozolomide clinical trial in patients with primary glioblastoma shows benefit for patients with MGMT promoter methylation
    [Chart: median overall survival (months) by MGMT gene promoter status (non-methylated vs. methylated), radiotherapy vs. radiotherapy plus temozolomide; labelled values 21.7 months and 12.7 months. Study with 207 patients. Adapted from Hegi et al. NEJM 2005; 352(10):1036-8.]
    Genome-wide methylation by methylation-sensitive restriction enzymes
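The principle can be sketched with the HpaII/MspI isoschizomer pair, a common choice for such assays though the slide does not name the enzymes: both cut C^CGG, but HpaII is blocked when the internal CpG cytosine is methylated, while MspI still cuts. Comparing the two digests reveals which sites are methylated. This toy version works on a single strand, represents methylation as a hypothetical set of 0-based cytosine positions, and `digest` is a hypothetical helper.

```python
def digest(seq: str, methylated_c: set[int],
           methylation_sensitive: bool = True) -> list[str]:
    """Cut seq at every CCGG site (C^CGG) unless blocked by methylation.

    methylated_c: 0-based positions of methylated cytosines (assumption).
    methylation_sensitive=True mimics HpaII (blocked by a methylated internal C);
    False mimics MspI (cuts every CCGG site regardless of that methylation).
    """
    cuts = []
    start = seq.find("CCGG")
    while start != -1:
        internal_c = start + 1  # the C of the CpG inside C-CG-G
        if not (methylation_sensitive and internal_c in methylated_c):
            cuts.append(start + 1)  # C^CGG: cut between the two Cs
        start = seq.find("CCGG", start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments


if __name__ == "__main__":
    seq = "AACCGGTTTTCCGGAA"
    methylated = {11}  # second site is methylated
    print("MspI-like: ", digest(seq, methylated, methylation_sensitive=False))
    print("HpaII-like:", digest(seq, methylated, methylation_sensitive=True))
```

The site missing from the methylation-sensitive digest is the methylated one; genome-wide versions of this idea read out the difference by array or sequencing.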
    Genome-wide methylation by probes
    MBD-Seq
    [Figure: condensed chromatin; sheared DNA; immobilized methyl-binding domain]
    MBD-Seq
    [Figure: immobilized methyl-binding domain; MgCl2; next-gen sequencing]
    • Illumina GA: 100 million reads
    Bioinformatics, a life-science discipline … management of expectations
    [Figure: diagram relating Math, Computer Science, Informatics and (Molecular) Biology; labels include Theoretical Biology, NP, AI, Image Analysis, Datamining, Structure Prediction (HTX), Bioinformatics, Discovery Informatics – Computational Genomics, Interface Design, Expert Annotation, Sequence Analysis and Computational Biology]
    Translational Medicine: An Inconvenient Truth
    • 1% of the genome codes for proteins, yet more than 90% is transcribed
    • Less than 10% of the protein experimentally measured can be “explained” from the genome
    • 1 genome? Structural variation
    • > 200 epigenomes??
    • Space/time continuum …
    Cellular programming
    Epigenetic (meta)information = stem cells
    Cellular reprogramming
    [Figure: tumor development and growth from epigenetically altered, self-renewing cancer stem cells]
    Cellular reprogramming
    Gene-specific epigenetic reprogramming