Genomics:
Experimental methods
Slides available
www.bioinformatics.be
Lab for Bioinformatics and
computational genomics

10 “genome hackers”
mostly engineers (statistics)

42 scientists
technicians, geneticists, clinicians

>100 people
hardware engineers,
mathematicians, molecular biologists
Overview

Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
Personalized Medicine
• The use of diagnostic tests (aka biomarkers) to identify in advance
which patients are likely to respond well to a therapy
• The benefits of this approach are to
– avoid adverse drug reactions
– improve efficacy
– adjust the dose to suit the patient
– differentiate a product in a competitive market
– meet future legal or regulatory requirements
• Potential uses of biomarkers
– Risk assessment
– Initial/early detection
– Prognosis
– Prediction/therapy selection
– Response assessment
– Monitoring for recurrence
Biomarker

First used in 1971 … An objective and
« predictive » measure … at the molecular
level … of normal and pathogenic processes
and responses to therapeutic interventions
Characteristic that is objectively measured and
evaluated as an indicator of normal biologic
or pathogenic processes or pharmacologic
response to a drug
A biomarker is valid if:
– It can be measured in a test system with well
established performance characteristics
– Evidence for its clinical significance has been
established
Rationale 1:
Why now ? Regulatory path becoming more clear
There is more at stake than
efficient drug
development. FDA
« critical path initiative »
Pharmacogenomics
guideline
Biomarkers are the
foundation of « evidence
based medicine » - who
should be treated, how
and with what.
Without biomarkers,
advances in targeted
therapy will be limited and
treatment will remain largely
empirical. It is imperative
that biomarker
development be
accelerated along with
therapeutics.
Why now ?

First and maturing second generation molecular
profiling methodologies make it possible to stratify
clinical trial participants: include those most likely
to benefit from the drug candidate and exclude
those who likely will not.
Pharmacogenomics-based clinical trials should attain
more specific results with smaller numbers of patients.
Smaller numbers mean lower costs (factor 2-10).
An additional benefit for trial participants and
institutional review boards (IRBs) is that
stratification, given the correct biomarker, may
reduce or eliminate adverse events.
Molecular Profiling

The study of specific patterns (fingerprints) of proteins,
DNA, and/or mRNA and how these patterns correlate
with an individual's physical characteristics or
symptoms of disease.
Generic Health advice

• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
Generic Health advice (UNLESS)

• Exercise (Hypertrophic Cardiomyopathy)
• Drink your milk (MCM6 Lactose intolerance)
• Eat your green beans (glucose-6-phosphate
dehydrogenase Deficiency)
• & your grains (HLA-DQ2 – Celiac disease)
• & your iron (HFE - Hemochromatosis)
• Get more rest (HLA-DR2 - Narcolepsy)
EGFR based therapy in mCRC
Overview

Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
Before molecular profiling …
First Generation Molecular Profiling

• Flow cytometry correlates surface markers,
cell size and other parameters
• Circulating tumor cell (CTC) assays
quantify the number of tumor cells in the
peripheral blood.
• Exosomes are 30-90 nm vesicles secreted by
a wide range of mammalian cell types.
• Immunohistochemistry (IHC) measures
protein expression, usually on the cell
surface.
First Generation Molecular Profiling

• Gene sequencing for mutation detection
• Microarray for mRNA message detection
• RT-PCR for gene expression
• FISH analysis for gene copy number
• Comparative Genomic Hybridization (CGH) for
gene copy number
Basics of the “old” technology

• Clone the DNA.
• Generate a ladder of labeled (colored)
molecules that are different by 1 nucleotide.
• Separate mixture on some matrix.
• Detect fluorochrome by laser.
• Interpret peaks as string of DNA.
• Strings are 500 to 1,000 letters long
• 1 machine generates 57,000 nucleotides/run
• Assemble all strings into a genome.
Genetic Variation
Among People
Single nucleotide polymorphisms
(SNPs)

GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG

0.1% difference among
people
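
The two sequences above differ at a single position. A minimal sketch of how such a difference can be located, assuming both sequences are already aligned and of equal length (no insertions or deletions); the function name `snp_positions` is illustrative only. The genome size of roughly 3 billion bases used at the end is an assumption not stated on the slide.

```python
def snp_positions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for every mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length (indels are not handled)")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(ref, alt)) if a != b]


seq_a = "GATTTAGATCGCGATAGAG"
seq_b = "GATTTAGATCTCGATAGAG"
snps = snp_positions(seq_a, seq_b)
print(snps)

# Assumption: a human genome of about 3,000,000,000 bases.
# A 0.1% difference then corresponds to about 3 million positions.
GENOME_SIZE = 3_000_000_000
expected_differences = GENOME_SIZE // 1000
print(expected_differences)
```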
The genome fits as an e-mail attachment
mRNA Expression Microarray
Overview

Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
Basics of the “new” technology

• Get DNA.
• Attach it to something.
• Extend and amplify signal with some color
scheme.
• Detect fluorochrome by microscopy.
• Interpret series of spots as short strings of
DNA.
• Strings are 30-300 letters long
• Multiple images are interpreted as 0.4 to 1.2
Gb/run (1,200,000,000 letters/day).
• Map or align strings to one or more genomes.
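
To put the throughput figures in perspective, the following rough calculation compares the 57,000 nucleotides/run quoted for the “old” technology with the 1,200,000,000 letters/day quoted here. It is only an order-of-magnitude sketch: it assumes the two slide figures can be compared directly, although one is stated per run and the other per day.

```python
OLD_NT_PER_RUN = 57_000               # figure from the "old" technology slide
NEW_LETTERS_PER_DAY = 1_200_000_000   # figure from the "new" technology slide
READ_LENGTHS = (30, 300)              # shortest and longest read length quoted

# How many "old" runs produce as many letters as one day of the new technology?
runs_equivalent = NEW_LETTERS_PER_DAY / OLD_NT_PER_RUN

# How many reads make up one day of output at each read length?
reads_per_day = {length: NEW_LETTERS_PER_DAY // length for length in READ_LENGTHS}

print(f"about {runs_equivalent:,.0f} old runs per new day")
for length, n_reads in reads_per_day.items():
    print(f"{length} letters/read: {n_reads:,} reads/day")
```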
Next Generation Technologies

• Roche (454)
–Emulsion PCR
–Polymerase
–Natural Nucleotides

• 100-500 Mb for 5-15k
–1% error rate
–Homopolymers
One additional insight ...
Read Length is Not As Important For Resequencing
[Figure: % of paired k-mers with uniquely assignable location (0% to 100%) versus length of k-mer reads (8 to 20 bp), with curves for E. coli and human. Source: Jay Shendure]
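
The idea behind this figure can be sketched in a few lines: for a given k, count which fraction of the k-mers in a genome occurs exactly once. This is a simplified illustration, not the analysis behind the original plot; it uses single (unpaired) k-mers, ignores the reverse strand, and runs on a toy string rather than E. coli or human sequence. The function name `unique_kmer_fraction` is hypothetical.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs exactly once in `genome`."""
    if k < 1 or k > len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


toy_genome = "ACGTACGTTT"
fractions = {k: unique_kmer_fraction(toy_genome, k) for k in (2, 4, 6)}
for k, fraction in fractions.items():
    print(f"k={k}: {fraction:.0%} of k-mers are unique")
```

On this toy string the unique fraction rises as k grows, which is the trend the figure reports for real genomes.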
Short Read Technologies

• Illumina GA (HiSeq, MiSeq)

• ABI SOLiD
Other second generation technology: (ABI) SOLiD
So what ?
Second generation DNA/RNA profiling
Second Generation DNA profiling

• Enrichment Sequencing
• ChIP-Seq (Chromatin
Immunoprecipitation)
• A substitute for ChIP-chip
• E.g. to find the binding sequence of
proteins (TFBS)
Paired End Reads are Important!
[Figure: Read 1 and Read 2 separated by a known distance. Labels: Known Distance, Repetitive DNA, Unique DNA, Read 1, Read 2]
Single read maps to
multiple positions
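
A toy illustration of why the known distance helps: a read from repetitive DNA matches several places, but only one placement sits at the expected distance from its uniquely mapping mate. This is a sketch under simplifying assumptions (exact matching, both reads on the same strand, distance measured between read start positions); the helper names `find_all` and `place_pair` and the toy reference are made up for the example.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def place_pair(reference: str, read1: str, read2: str,
               distance: int, tolerance: int = 0) -> list[tuple[int, int]]:
    """Return (pos1, pos2) placements where read2 starts `distance` bases after read1."""
    return [
        (p1, p2)
        for p1 in find_all(reference, read1)
        for p2 in find_all(reference, read2)
        if abs((p2 - p1) - distance) <= tolerance
    ]


# Toy reference: the repeat "ACACAC" occurs twice, "TTGGAT" occurs once.
reference = "ACACAC" + "GGTTCA" + "ACACAC" + "TTGGAT"
single_hits = find_all(reference, "ACACAC")
paired_hits = place_pair(reference, "ACACAC", "TTGGAT", distance=6)
print(single_hits)   # the single read is ambiguous
print(paired_hits)   # the pair resolves to one placement
```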
Second Generation DNA profiling

• Exome Sequencing (also known as
targeted exome capture) is an
efficient strategy to selectively
sequence the coding regions of the
genome to identify novel genes
associated with rare and common
disorders.
• 160K exons
Second Generation DNA profiling
Bioinformatics tools
Second Generation RNA profiling

Besides the 6000 protein-coding genes …
140 ribosomal RNA genes
275 transfer RNA genes
40 small nuclear RNA genes
>100 small nucleolar RNA genes

Function of RNA genes
pRNA in the φ29 rotary packaging motor (Simpson
et al. Nature 408:745-750, 2000)
Cartilage-hair hypoplasia mapped to an RNA
(Ridanpää et al. Cell 104:195-203, 2001)
The human Prader-Willi critical region (Cavaillé
et al. PNAS 97:14035-7, 2000)
Second Generation RNA profiling

RNA genes can be hard to detect
UGAGGUAGUAGGUUGUAUAGU
C. elegans let-7; 21 nt
(Pasquinelli et al. Nature 408:86-89, 2000)

Often small
Sometimes multicopy and redundant
Often not polyadenylated
(not represented in ESTs)
Immune to frameshift and nonsense
mutations
No open reading frame, no codon bias
Often evolving rapidly in primary sequence
Second Generation RNA profiling

Although details of the methods vary, the concept
behind RNA-seq is simple:
• isolate all mRNA
• convert to cDNA using reverse transcriptase
• sequence the cDNA
• map sequences to the genome
The more times a given sequence is detected, the
more abundantly transcribed it is. If enough
sequences are generated, a comprehensive and
quantitative view of the entire transcriptome of an
organism or tissue can be obtained.
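
The counting principle can be sketched as follows. This is a toy illustration only: the transcripts and reads are invented, reads are matched by exact substring search against transcript sequences rather than aligned to a genome, and reads that match more than one gene are simply discarded. Real RNA-seq pipelines use dedicated aligners and normalisation, which are not shown.

```python
from collections import Counter


def count_reads(transcripts: dict[str, str], reads: list[str]) -> Counter:
    """Count reads per gene, keeping only reads that match exactly one gene."""
    counts: Counter = Counter()
    for read in reads:
        hits = [gene for gene, seq in transcripts.items() if read in seq]
        if len(hits) == 1:          # skip unmapped and ambiguous reads
            counts[hits[0]] += 1
    return counts


# Invented toy data: two cDNA sequences and six sequenced reads.
transcripts = {
    "geneA": "ATGGCGTACGTTAGC",
    "geneB": "ATGCCCTTTGGGAAA",
}
reads = ["GCGTAC", "TACGTT", "CGTTAG", "CCCTTT", "ATG", "NNNNNN"]

counts = count_reads(transcripts, reads)
for gene, n in counts.most_common():
    print(gene, n)
```

Here geneA collects more reads than geneB, so under the principle above it would be called the more abundantly transcribed of the two.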
Second Generation RNA profiling

• Comparison with microarrays
– Microarray
• Closed technology: prior knowledge required
• Affected by pseudogenes (homologs of real genes)
• Low sensitivity

– RNA-Seq
• Open technology: no prior knowledge required
• Not affected by pseudogenes because the exact
sequence is measured
• Yields other information as well (SNPs, alternative
splicing)
ncRNAs in human genome

tRNA              600
18S rRNA          200
5.8S rRNA         200
28S rRNA          200
5S rRNA           200
snoRNA            300
miRNA             250
U1                40
U2                30
U4                30
U5                30
U6                20
U4atac            5
U6atac            5
U11               5
U12               5
SRP RNA           1
RNase P RNA       1
Telomerase RNA    1
RNase MRP         1
Y RNA             5
Vault             4
7SK RNA           1
Xist              1
H19               1
BIC               1
Antisense RNAs    1000s?
Cis reg regions   100s?
Others            ?
Mapping Structural Variation in Humans
CNVs: >1 kb segments
- Thought to be common:
12% of the genome
(Redon et al. 2006)
- Likely involved in phenotype
variation and disease
- Until recently most methods for
detection were low resolution
(>50 kb)
Size Distribution of CNV in a Human Genome
Next next generation sequencing
Third generation sequencing
Now sequencing
Ultra-low-cost SINGLE molecule sequencing
Pacific Biosciences: A Third Generation Sequencing Technology

Eid et al 2008
Complete genomics
Nanopore Sequencing
Second Generation Protein profiling

• Proteomics MS-MS-based
exclusively in discovery mode
• Automate diagnostics assay
generation (next generation
proteomics)
• Aptamers as alternative to antibodies
• ImmunoPCR
MS/MS identification pipeline
pipeline overview
[Diagram: pipeline stages labeled "Bonanza" and "Bonanza + IggyPep". Goals listed on the slide:]
• filter dataset prior to database search
• define PTMs profile prior to database search
• multi-tiered database search
Overview

Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
Genome-wide methylation
…. by next generation sequencing
[Figure: # markers (3 000 000; 6 000; 50; 5) versus # samples (<50; >50) across the Discovery, Verification and Validation phases. Platforms shown: MethylCap_Seq, EpiHealth, Deep_Seq. Note on the figure: "only models and fresh frozen".]
Molecular Unification
[Figure: epigenetic (EPI) and genetic (GENETIC) sequencing approaches placed along a bp scale running from full genome (10^9, 10^8, …, 10^1) down to 1, with the labels RUO and Clinical.
EPI labels: Whole-genome Bisulphite seq; Enrichment seq (MBD, RRBS); Probes (450-27K); Enrichment Targeted Panels; Ultra Deep Seq.
GENETIC labels: Whole-genome sequencing; Enrichment seq (Exome); Enrichment Targeted Panels; Deep Seq.]
Overview

Personalized Medicine,
Biomarkers …
… Molecular Profiling
First Generation Molecular Profiling
Next Generation Molecular Profiling
Next Generation Epigenetic Profiling
Concluding Remarks
Bioinformatics, a life science discipline … management of expectations
[Diagram, built up over several slides. Field labels: Math; Theoretical Biology; Computer Science; Informatics; Computational Biology; (Molecular) Biology; Bioinformatics; Discovery Informatics – Computational Genomics. Topic labels: NP; Datamining; AI, Image Analysis; structure prediction (HTX); Interface Design; Expert Annotation; Sequence Analysis.]
Translational Medicine: An inconvenient truth

• 1% of genome codes for proteins, however
more than 90% is transcribed
• Less than 10% of protein experimentally
measured can be “explained” from the
genome
• 1 genome ? Structural variation
• > 200 Epigenomes ??
• “Space/time” continuum …
Cellular programming

Epigenetic (meta)information = stem cells
Cellular reprogramming

Tumor
Tumor Development and Growth

Epigenetically altered, self-renewing cancer stem cells
Cellular reprogramming

Gene-specific
Epigenetic
reprogramming
Wobblebase Mission
provide tools to both specialists (researchers,
bioinformaticians, health care providers) and
individual consumers that unlock the power of
genomic data to the USER
enable personalized genomics today by simplifying
the way we organize, visualize and manage
genomic data.
PGM: Personal Genomics Manifesto

Everybody who wants to get his genome sequenced has the human right to do so.
No third party can own your genetic data; your genetic data is exclusively yours.

Nobody can be forced to get his genome analyzed or to reveal his genome to a
third party.
Your genome should always be treated as confidential, private information.
People should be advised not to share their identity AND their entire genome on a
public forum.
People should be advised to use secure technologies that maximally
protect phenotypic and/or genotypic data.
People should be able to actively explore, manage and get updated interpretations
of their genomic data.
Wobblebase Mission
• change the diagnostic/healthcare industry
forever by setting a new standard and
empowering the user
Choosing the Red Pill
The Technical Feasibility Argument
The Quality Argument
The Price Argument
The Logistics Argument (around the sample and
how to manage the data)
The Ethical debate
The Privacy/Security concern
Notifications
Updates are the single most
important feature of Wobblebase
#Rs1805007
Bioinformatics
Analysis
pipelines

Social
network
twitter

Wobblebase

Updates
Notifications

eHealth
(fixed
vocabulary)
Comparison
biobix
wvcrieki

biobix.be
bioinformatics.be