Measuring methylation: from
arrays to sequencing
Jovana Maksimovic, PhD
jovana.maksimovic@mcri.edu.au
@JovMaksimovic
github.com/JovMaksimovic
Bioinformatics Winter School, 3 July 2017
Talk outline
• Epigenetics
• DNA methylation
• Measuring DNA methylation
  • Methylation arrays
    • How do they work?
    • What do they measure?
    • Example analysis
  • Methylation sequencing
    • What are the challenges?
    • How does it work?
    • Suggested analysis pipeline
• Summary
I work at MCRI
• I mostly work on human development & disease…
• …sometimes using mice or other models
• I analyse a lot of epigenetic data (ChIPseq, ATACseq, BSseq, microarrays)…
• …and a lot of gene expression data (RNAseq, microarrays)
• I write software for analysing methylation array data: missMethyl
What is epigenetics?
• Epigenetics refers to stable, heritable traits that are not explained by changes in DNA sequence
• The Greek prefix “epi” means “on top of”, i.e. on top of genetics
• Chromosome modifications that affect gene expression
  • Histones, DNA methylation
  • “Anything” that isn’t the DNA sequence itself!
• Essential for normal development
• Can be modified by the environment
• Can be disrupted in disease
Epigenetics brings DNA to life!
[Figure: from sperm and egg, embryogenesis (zygote, blastocyst, embryonic stem cells) gives rise to many specialised cell types, e.g. B cells, T cells, red blood cells, haematopoietic stem cells, fat, skin, muscle, gland, hormone-secreting, germ, lung, kidney and intestine cells, neurons, astrocytes and neuronal progenitor cells. Identical DNA in every cell; different epigenetic patterns.]
• Important in all species
Modified from https://biology.mit.edu/research/stemcell_epigenetics
Epigenetics is CRAZY complicated!
• New sequencing & microarray technologies are enabling us to learn A LOT more about epigenetics
• Different data types need different analyses
• Today I’m only focussing on DNA methylation
Roy et al. (2010), Science
What is DNA methylation?
DNA methylation primarily occurs at CpG dinucleotides (a C followed by a G, linked by a phosphate)
[Figure: a CpG dinucleotide in double-stranded DNA]
Patterson et al. 2011, J Vis Exp
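Because a CpG is simply a C immediately followed by a G, locating candidate methylation sites in a sequence is a short scan. The sketch below (function names are hypothetical) returns the 0-based position of the C in each CpG on the strand given. CpG is palindromic, so the same sites cover the other strand: its C sits one base to the right, opposite the G.

```python
from typing import List

# Illustrative sketch; function names are hypothetical.


def cpg_positions(seq: str) -> List[int]:
    """Return 0-based positions of the C in each CpG (C followed by G) on this strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_fraction(seq: str) -> float:
    """Fraction of the sequence's dinucleotide positions that are CpG."""
    n_dinucleotides = len(seq) - 1
    if n_dinucleotides <= 0:
        return 0.0
    return len(cpg_positions(seq)) / n_dinucleotides


fragment = "ACGTTCGACCGA"  # hypothetical example sequence
print(cpg_positions(fragment))  # [1, 5, 9]
print(round(cpg_fraction(fragment), 3))
```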
DNA methylation in the genome
• The human genome contains ~30,000,000 CpGs (~1% of dinucleotides)
• CpG density is VERY different between species
• CpGs are not evenly spaced across the genome
  • They tend to be present in clusters called CpG islands
• CpG methylation is spatially correlated
Eckhardt et al. 2007, Nature Genetics
[Figure: methylation correlation with distance between CpGs (~500bp scale)]
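One way to quantify "spatially correlated" is to correlate the methylation of each pair of CpGs across samples and average those correlations within distance bins. This is a sketch under stated assumptions: one row of beta values per CpG (one value per sample), positions on a single chromosome, and arbitrary bin width and maximum distance. The function names are hypothetical, and this is not necessarily how the published plot was produced.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence

# Illustrative sketch; function names, bin width and max distance are assumptions.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_distance(
    positions: Sequence[int],
    betas: Sequence[Sequence[float]],
    bin_width: int = 100,
    max_dist: int = 1000,
) -> Dict[int, float]:
    """Mean correlation between CpG pairs, grouped into distance bins.

    positions: CpG coordinates on one chromosome
    betas: one row per CpG (same order as positions), one value per sample
    Returns {bin start (bp): mean Pearson correlation of the pairs in that bin}.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = abs(positions[j] - positions[i])
            if dist > max_dist:
                continue
            r = pearson(betas[i], betas[j])
            if math.isnan(r):
                continue  # skip pairs where a CpG does not vary across samples
            bin_start = (dist // bin_width) * bin_width
            sums[bin_start] += r
            counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical data: 3 CpGs measured in 3 samples
print(correlation_by_distance([100, 150, 900], [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0], [0.9, 0.5, 0.1]]))
```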
Methylation can regulate gene expression
[Figure: methylation at a single CpG vs. gene expression; each point is one sample. Plot from Peter Hickey]
http://meeting.dxy.cn/oemethylation2012/article/i18782.html
Methylation changes coat colour of Agouti mice
Dolinoy 2008, Nutr Rev.
• This gene controls coat colour in Agouti mice
• The methylation state of these CpG sites in the promoter changes PS1A expression
• These mice are genetically identical
[Figure: mice ranging from hypomethylated to hypermethylated]
Coat colour differs because of different maternal diets, i.e. the environment!
Methylation makes worker bees!
Cridge et al. 2015, Nutrients
• These larvae are genetically identical
[Figure: hypomethylated vs. hypermethylated larvae]
Methylation is cool
What do we usually want to know about it?
Finding methylation differences can tell us a lot
• Methylation is critical in determining cell type
  • Regulatory T-cell vs. naïve T-cell
• Methylation can be disrupted in disease
  • Cancer vs. normal
• Methylation is affected by the environment
  • Smokers vs. non-smokers
Collect appropriate samples (e.g. normal vs. cancer) → Extract DNA and measure methylation → Statistical analysis
Epigenome-wide association
studies (EWAS)
• Similar to GWAS
• Compare lots of cases to lots
of controls
• Often looking for small effects
e.g. complex disease or
environmental effects
• Need lots of samples
• 100s or 1000s of cases &
controls
https://en.wikipedia.org/wiki/Epigenome-wide_association_study_(EWAS)
How do we measure methylation?
• Bisulphite conversion
  • Creates “SNPs”
  • Single nucleotide resolution
  • Read out by array or sequencing
• Enrichment of methylated DNA
  • By restriction enzymes or affinity
  • Regional resolution
  • Read out by array or sequencing
What is bisulphite conversion?
• Chemical process
• Unmethylated Cs are converted to uracil, which is read as T after PCR
• Methylated Cs are protected
• Creates a C/T “SNP”
• Used to call methylation
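The effect of conversion plus PCR on one strand can be simulated in a few lines. In this sketch (the function name is hypothetical; positions are 0-based; conversion of unmethylated Cs is assumed to be complete) every C not listed as methylated becomes T. The example uses an 18 bp fragment with both of its CpG Cs methylated, and then fully unmethylated.

```python
from typing import AbstractSet

# Illustrative sketch; the function name is hypothetical.


def bisulphite_convert(seq: str, methylated: AbstractSet[int]) -> str:
    """Simulate bisulphite conversion followed by PCR on one DNA strand.

    Every C whose 0-based position is not in `methylated` becomes T
    (C -> uracil during conversion, read as T after PCR); methylated Cs stay C.
    Assumes complete conversion of unmethylated Cs.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


fragment = "CCGGCATGTTTAAACGCT"  # CpG Cs at positions 1 and 14
print(bisulphite_convert(fragment, {1, 14}))  # TCGGTATGTTTAAACGTT
print(bisulphite_convert(fragment, set()))    # TTGGTATGTTTAAATGTT
```

Comparing each converted sequence with the unconverted one, a CpG that still reads C was methylated; that C/T difference is the "SNP" that arrays and sequencing read out.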
Methylation arrays
What are they and how do they work?
Illumina Infinium HumanMethylation BeadChips
• Human only
• Gene-biased: CpGs selected to be relevant to human development & disease
  • e.g. TSS, promoters, CpG islands, enhancers, ...
• 27k array (2009): >27,000 unique CpG sites measured in each sample; 1 chip = 12 samples
• 450k array (2011): >450,000 unique CpG sites measured in each sample; 1 chip = 12 samples
• 850k array (2015): >850,000 unique CpG sites measured in each sample; 1 chip = 8 samples
Modified slide from Belinda Phipson
Methylation arrays are based on SNP array technology
• Methylation array “SNPs” (C/T) are created by bisulphite conversion
• Comparing the fluorescence intensities of the C and T signals gives the proportion of methylation at a single CpG
[Figure: “What is this base?” Measure fluorescence intensity]
What methylation values can we get?
• On an array, we measure methylation in a population of cells
• An individual (diploid) cell can only be 0, 0.5 or 1 methylated at one CpG (no, one or both alleles methylated)
• Across a population of cells we get a continuous measurement between 0 and 1
[Figure: cells with methylation 0, 0.5 and 1 (CH3 groups); a single sample contains many cells]
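The bulk value is just the average over cells. A minimal sketch, assuming every cell contributes equally to the signal and each cell is summarised by how many of its two alleles are methylated (function names and the cell data are hypothetical):

```python
from typing import Sequence

# Illustrative sketch; function names and example data are hypothetical.


def cell_methylation(methylated_alleles: int) -> float:
    """Methylation level of one diploid cell at one CpG: 0, 0.5 or 1."""
    if methylated_alleles not in (0, 1, 2):
        raise ValueError("a diploid cell has 0, 1 or 2 methylated alleles")
    return methylated_alleles / 2


def bulk_methylation(alleles_per_cell: Sequence[int]) -> float:
    """Average methylation across all cells in a sample (each cell weighted equally)."""
    levels = [cell_methylation(a) for a in alleles_per_cell]
    return sum(levels) / len(levels)


# Ten hypothetical cells: number of methylated alleles at the same CpG
cells = [0, 0, 1, 2, 2, 2, 2, 1, 0, 2]
print(bulk_methylation(cells))  # 0.6
```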
Measures of methylation
• Arrays measure both methylated (C) and unmethylated (T) signal to get the proportion of methylation at a CpG
• Beta value: $\beta = \frac{Meth}{Meth + Unmeth}$
  • Intuitive, easy to interpret, great for visualisation
• M value: $M = \log_2\left(\frac{Meth}{Unmeth}\right)$
  • Better statistical properties, recommended for statistical testing
• Can convert between them via a logit transformation: $M = \log_2\left(\frac{\beta}{1-\beta}\right)$
Du et al. 2011, BMC Bioinformatics
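The formulas above translate directly into code. This sketch (function names hypothetical) also exposes optional offsets, since Du et al. describe adding small constants to avoid dividing by zero; they default to 0 here to match the slide, and with non-zero offsets the logit relationship between β and M only holds approximately.

```python
import math

# Illustrative sketch; function names are hypothetical, offsets default to 0.


def beta_value(meth: float, unmeth: float, offset: float = 0.0) -> float:
    """Beta = Meth / (Meth + Unmeth + offset)."""
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, alpha: float = 0.0) -> float:
    """M = log2((Meth + alpha) / (Unmeth + alpha))."""
    return math.log2((meth + alpha) / (unmeth + alpha))


def beta_to_m(beta: float) -> float:
    """Logit (base 2) transform: M = log2(beta / (1 - beta)), for 0 < beta < 1."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


meth, unmeth = 3000.0, 1000.0  # hypothetical probe intensities
b = beta_value(meth, unmeth)
m = m_value(meth, unmeth)
print(b, m)
print(math.isclose(beta_to_m(b), m), math.isclose(m_to_beta(m), b))
```

A β of 0.5 corresponds to an M of 0; M values are unbounded, which is one reason they behave better in linear models.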
What does the data look like?
Table of M-values (one row per CpG site, one column per sample):

Sample A1  Sample A2  Sample A3  Sample B1  Sample B2  Sample B3
    0.213      0.221      0.311      0.123      0.216      0.198
   -0.011      0.001     -0.016      2.011      2.002      2.702
    2.213      2.256      2.698      0.052      0.101      0.238
    4.567      5.231      4.982      4.152      6.216      4.698
   -4.723     -3.459     -5.36      -5.763     -5.122     -4.998
   -5.567     -4.666     -4.845     -4.522     -4.111     -3.245
    3.421      5.467      5.554      5.445      5.298      4.514
    2.981      3.345      3.512     -3.534     -4.311     -3.889
    3.792      2.987      3.324     -0.231     -0.066     -0.001
      …          …          …          …          …          …
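A first look at such a table is the difference in mean M-value between the two groups at each CpG. The sketch below uses the nine rows of the M-value table on this slide; the row labels cpg1 to cpg9 and the function names are hypothetical, since the slide gives no probe IDs. This is only an effect-size summary, not a statistical test.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch; CpG labels and function names are hypothetical.
M_VALUES: Dict[str, List[float]] = {
    "cpg1": [0.213, 0.221, 0.311, 0.123, 0.216, 0.198],
    "cpg2": [-0.011, 0.001, -0.016, 2.011, 2.002, 2.702],
    "cpg3": [2.213, 2.256, 2.698, 0.052, 0.101, 0.238],
    "cpg4": [4.567, 5.231, 4.982, 4.152, 6.216, 4.698],
    "cpg5": [-4.723, -3.459, -5.36, -5.763, -5.122, -4.998],
    "cpg6": [-5.567, -4.666, -4.845, -4.522, -4.111, -3.245],
    "cpg7": [3.421, 5.467, 5.554, 5.445, 5.298, 4.514],
    "cpg8": [2.981, 3.345, 3.512, -3.534, -4.311, -3.889],
    "cpg9": [3.792, 2.987, 3.324, -0.231, -0.066, -0.001],
}
GROUP_A = [0, 1, 2]  # columns for samples A1-A3
GROUP_B = [3, 4, 5]  # columns for samples B1-B3


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def delta_m(table: Dict[str, List[float]], group_a: List[int], group_b: List[int]) -> List[Tuple[str, float]]:
    """Mean M-value of group A minus group B for each CpG, largest |difference| first."""
    deltas = []
    for cpg, row in table.items():
        diff = mean([row[i] for i in group_a]) - mean([row[i] for i in group_b])
        deltas.append((cpg, diff))
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


for cpg, diff in delta_m(M_VALUES, GROUP_A, GROUP_B)[:3]:
    print(f"{cpg}\t{diff:+.2f}")
```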
Array analysis pipeline (with example software)
• QC: β density plots, control probes, MDS/clustering plots, …
  • Remove bad samples and poorly performing probes (CpGs)
  • minfi, methylumi, limma
• Normalization: within and between arrays
  • Transform data to remove unwanted variation
  • minfi, missMethyl, wateRmelon
• Statistical testing for differential methylation, CpGs & regions
  • Estimate means and variances and borrow information across probes
  • limma, bumphunter, DMRcate
• Annotation to genes, gene set testing, visualization, …
  • Think about biological interpretation
  • missMethyl, Gviz
• Combine with other data types, e.g. gene expression
  • GenomicRanges
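The "remove bad samples and poorly performing probes" step is often driven by detection p-values (for example those from minfi's detectionP()). Below is a plain-Python sketch of that filtering logic; the 0.01 cutoffs, the probe IDs and the function name are assumptions for illustration, not a prescribed recipe.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch; the function name, probe IDs and cutoffs are hypothetical.


def filter_by_detection(
    detection_p: Dict[str, Sequence[float]],
    samples: Sequence[str],
    sample_cutoff: float = 0.01,
    probe_cutoff: float = 0.01,
) -> Tuple[List[str], List[str]]:
    """Drop poor samples, then poorly performing probes, using detection p-values.

    detection_p maps probe ID -> one detection p-value per sample (same order as `samples`).
    A sample is kept if its mean detection p-value is below sample_cutoff; a probe is
    kept if its p-value is below probe_cutoff in every kept sample.
    """
    n_probes = len(detection_p)
    kept_cols = [
        j for j in range(len(samples))
        if sum(p[j] for p in detection_p.values()) / n_probes < sample_cutoff
    ]
    kept_probes = [
        probe for probe, p in detection_p.items()
        if all(p[j] < probe_cutoff for j in kept_cols)
    ]
    return [samples[j] for j in kept_cols], kept_probes


detp = {
    "probe1": [0.001, 0.002, 0.5],
    "probe2": [0.001, 0.02, 0.4],
    "probe3": [0.0, 0.001, 0.3],
}
print(filter_by_detection(detp, ["S1", "S2", "S3"]))  # (['S1', 'S2'], ['probe1', 'probe3'])
```

Filtering samples first means a single failed array does not cause otherwise good probes to be discarded.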
After QC, data exploration is your friend!
[Figure: MDS plots (dimension 1 vs. 2 and dimension 3 vs. 4) showing the largest sources of variation in the data; naive, activated and rTreg samples from individuals M28, M29 and M30 cluster by individual and cell type]
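An MDS plot places samples so that distances on the plot approximate distances between samples in the full data. This sketch implements classical (Torgerson) MDS with NumPy on a CpGs x samples matrix; the function name is hypothetical. limma's plotMDS differs in detail (by default it computes each distance from the most variable features only), so treat this as an illustration of the idea rather than a reimplementation.

```python
import numpy as np

# Illustrative sketch; the function name and example data are hypothetical.


def classical_mds(data: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS of samples.

    data: features x samples (e.g. CpGs x samples of M-values).
    Returns a samples x k array of coordinates whose Euclidean distances
    approximate the distances between samples in the full data.
    """
    x = np.asarray(data, dtype=float).T  # samples x features
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T  # squared distances
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ d2 @ centre  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard tiny negative values
    return eigvecs[:, top] * scale


# Hypothetical data: 2 features x 3 samples located at (0, 0), (3, 0) and (0, 4)
coords = classical_mds(np.array([[0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]))
print(coords.shape)
print(np.linalg.norm(coords[0] - coords[2]))
```

Plotting later dimensions (3 vs. 4) in the same way reveals the smaller sources of variation.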
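Statistical testing: look for differences at single CpGs
Differential methylation
• Linear model: $y = X\beta + \varepsilon$ (Smyth, 2004)
  • Can take into account any other covariates
• Moderated t-statistic: $\tilde{t} = \frac{|\bar{y}_{can} - \bar{y}_{norm}|}{\tilde{s}\sqrt{v}}$
  • $\tilde{s}^2$ is the empirical Bayes (moderated) variance; $v$ is the unscaled variance of the difference in means (for groups of sizes $n_1$ and $n_2$, $v = 1/n_1 + 1/n_2$)
• Adjust the p-values using Benjamini and Hochberg’s FDR
• One test per CpG!
Phipson & Oshlack 2015, Genome Biology
Modified slide from Belinda Phipson & Alicia Oshlack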
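The two ingredients can be sketched in plain Python. Here the prior degrees of freedom d0 and prior variance s0² are supplied by hand (an assumption: limma estimates them from all CpGs via empirical Bayes in eBayes()), the statistic keeps its sign rather than taking the magnitude, and p-values, which would use d0 + d degrees of freedom, are not computed. The Benjamini-Hochberg adjustment is applied to a hypothetical list of p-values. Function names are hypothetical.

```python
import math
from typing import List, Sequence

# Illustrative sketch; function names and the prior (d0, s0_sq) are hypothetical.


def moderated_t(group1: Sequence[float], group2: Sequence[float], d0: float, s0_sq: float) -> float:
    """Two-group moderated t-statistic with a given prior (d0, s0_sq).

    The residual variance s^2 (d = n1 + n2 - 2 degrees of freedom) is shrunk towards
    the prior: s_tilde^2 = (d0 * s0^2 + d * s^2) / (d0 + d). With d0 = 0 this is the
    ordinary pooled t-statistic. The sign is kept (group1 minus group2).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((y - m1) ** 2 for y in group1) + sum((y - m2) ** 2 for y in group2)
    d = n1 + n2 - 2
    s_sq = ss / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    v = 1.0 / n1 + 1.0 / n2  # unscaled variance of the difference in means
    return (m1 - m2) / math.sqrt(s_tilde_sq * v)


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# One CpG row from the M-value table (A1-A3 vs. B1-B3), with a hypothetical prior
t = moderated_t([2.981, 3.345, 3.512], [-3.534, -4.311, -3.889], d0=4.0, s0_sq=0.1)
print(round(t, 2))
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Shrinking each CpG's variance towards a common value stabilises the statistic when there are only a few samples per group, which is the "borrow information across probes" step in the pipeline.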
Lots of differences between
immune cell types!
Statistical testing: differences across CpG-dense regions
• Recall: CpG methylation is spatially correlated
• Can we find consistent group-level differences across CpGs that are close together (differentially methylated regions, DMRs)?
• Are these more functionally relevant than differences at individual CpGs?
Aryee et al. 2014, Bioinformatics
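The core idea of a region finder can be sketched as grouping neighbouring CpGs whose group difference exceeds a cutoff, has the same sign, and lies within a maximum gap. This is a deliberately simplified illustration with hypothetical names, cutoffs and data; it is not the algorithm of bumphunter or DMRcate, which add smoothing and a statistical assessment of each region.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch; function name, cutoffs and data are hypothetical.
Region = Tuple[int, int, int, float]  # (start, end, number of CpGs, mean difference)


def find_regions(
    positions: Sequence[int],
    diffs: Sequence[float],
    cutoff: float,
    max_gap: int,
    min_cpgs: int = 2,
) -> List[Region]:
    """Group consecutive CpGs (sorted positions) with |diff| > cutoff, same sign, gap <= max_gap."""
    regions: List[Region] = []
    current: List[int] = []  # indices of CpGs in the region being built

    def close_region(idx: List[int]) -> None:
        if len(idx) >= min_cpgs:
            vals = [diffs[i] for i in idx]
            regions.append((positions[idx[0]], positions[idx[-1]], len(idx), sum(vals) / len(vals)))

    for i, d in enumerate(diffs):
        if abs(d) <= cutoff:  # CpG not differential enough: end any open region
            close_region(current)
            current = []
            continue
        if current:
            prev = current[-1]
            same_sign = (d > 0) == (diffs[prev] > 0)
            if not same_sign or positions[i] - positions[prev] > max_gap:
                close_region(current)
                current = []
        current.append(i)
    close_region(current)
    return regions


pos = [100, 150, 180, 400, 1500, 1550, 1600, 1650]
diff = [0.3, 0.25, 0.4, 0.05, -0.3, -0.35, 0.3, -0.2]
print(find_regions(pos, diff, cutoff=0.1, max_gap=300))
```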
Lots of DMRs between immune cell types!
You can do other cool stuff!
• Regions unmethylated in rTreg compared to naïve cells are enriched for FOXP3 binding motifs!
[Figure: Forkhead-binding motif vs. consensus motif from DMR sequences; the DMR consensus motif matches the Forkhead-binding motif]
Suggests the differences between these cell types are controlled by FOXP3!
Modified slide from Alicia Oshlack
Methylation array analysis is very
mature: lots of methods!
https://www.bioconductor.org/
https://f1000research.com/articles/5-1281/v3
Methylation sequencing
AKA bisulphite sequencing: the good, the bad and the ugly
Two main types of bisulphite sequencing
• Whole-genome bisulphite sequencing (BS-seq)
  • Gold standard
  • Genome-wide (~30,000,000 CpGs in human)
  • Expensive but covers almost everything
  • Needs high (10-30x) coverage to reliably call methylation
• Targeted BS-seq
  • Only sequences regions of interest
    • Reduced representation BS-seq (restriction enzyme)
    • Capture BS-seq (similar principle to exome capture)
  • Cheaper but can miss a lot
  • Can usually achieve higher (20-60x) coverage
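Reduced representation BS-seq is commonly built with the restriction enzyme MspI, which cuts C^CGG, so fragments begin at a CpG; size selection then enriches for CpG-dense regions. This single-strand sketch (function names hypothetical; it ignores end repair, the second strand and real size-selection ranges, which are left as parameters) shows the in silico digest.

```python
import re
from typing import List

# Illustrative sketch; function names and size cutoffs are hypothetical.


def mspi_digest(seq: str) -> List[str]:
    """In silico MspI digest of one strand: cut C^CGG (after the first C of each CCGG)."""
    s = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", s)]  # lookahead also finds overlapping sites
    bounds = [0] + cuts + [len(s)]
    # Every fragment after the first starts with CGG, i.e. with a CpG
    return [s[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


frags = mspi_digest("AACCGGTTTTCCGGAA")
print(frags)                      # ['AAC', 'CGGTTTTC', 'CGGAA']
print(size_select(frags, 4, 10))  # ['CGGTTTTC', 'CGGAA']
```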
What was bisulphite conversion again?
[Figure: a DNA fragment after bisulphite conversion and PCR]
All four of these can be sequenced!
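The four strands are usually called original top (OT), original bottom (OB), and the strands complementary to each (CTOT, CTOB). This sketch generates them for an example fragment, assuming complete conversion, symmetric CpG methylation, and no methylation outside CpGs; function names are hypothetical.

```python
from typing import AbstractSet, Dict

# Illustrative sketch; function names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def convert(seq: str, methylated: AbstractSet[int]) -> str:
    """C -> T at every C not listed (0-based) as methylated; assumes complete conversion."""
    return "".join("T" if b == "C" and i not in methylated else b for i, b in enumerate(seq))


def four_strands(top: str, methylated_cpgs: AbstractSet[int]) -> Dict[str, str]:
    """Strands produced by bisulphite conversion + PCR of a double-stranded fragment.

    methylated_cpgs: 0-based top-strand positions of CpG Cs methylated on BOTH strands
    (symmetric CpG methylation is an assumption; all other Cs are treated as unmethylated).
    """
    n = len(top)
    bottom = revcomp(top)  # bottom strand written 5'->3'
    # The bottom-strand C of a CpG whose top-strand C is at i sits at bottom index n - 2 - i
    methylated_bottom = {n - 2 - i for i in methylated_cpgs}
    ot = convert(top, methylated_cpgs)
    ob = convert(bottom, methylated_bottom)
    return {"OT": ot, "CTOT": revcomp(ot), "OB": ob, "CTOB": revcomp(ob)}


for name, strand in four_strands("CCGGCATGTTTAAACGCT", {1, 14}).items():
    print(name, strand)
```

Four different sequences arise from one fragment, which is why a read can come from any of four strands and why mapping must consider each of them.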
What are the challenges?
• As with calling SNPs, methylation in BS-seq is inferred by comparison to the unconverted reference sequence
• Correct alignment is critical
• Alignment is more challenging than usual!
  • Aligned sequences do not exactly match the reference
  • Library complexity is reduced
    • Many Cs become Ts, so there is less information for mapping!
  • Conversion is not symmetrical: after bisulphite treatment the two strands are no longer complementary
    • The two strands of DNA in the reference genome must be considered separately
Mapping (Bismark)
[Figure: DNA fragment → BS conversion & PCR → sequenced read TCGGTATGTTTAAACGTT]
• In silico read conversion: the read is converted C-to-T (TTGGTATGTTTAAATGTT) and G-to-A (TCAATATATTTAAACATT)
• Align to an in silico bisulphite-converted genome: the forward strand converted C-to-T and G-to-A, each shown with its reverse complement written 3'→5' beneath it:
  …TTGGTATGTTTAAATGTT… (fwd strand, C-to-T converted genome)
  …AACCATACAAATTTACAA… (reverse complement)
  …CCAACATATTTAAACACT… (fwd strand, G-to-A converted genome)
  …GGTTGTATAAATTTGTGA… (reverse complement)
• Both converted reads are aligned against all converted genome strands; in this example only the C-to-T converted read matches without mismatches (to the C-to-T converted forward strand)
• Read all alignment outputs simultaneously to determine if the sequence can be mapped uniquely
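The "three-letter" idea can be illustrated with exact string matching. This toy sketch is not Bismark's code (Bismark delegates alignment to an aligner such as Bowtie 2 and tolerates mismatches); it converts the read both ways, converts the genome both ways plus reverse complements, and reports every exact hit. The flanking AAA bases around the reference fragment and all function names are hypothetical.

```python
from typing import List, Tuple

# Illustrative toy sketch; function names and the padded genome are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> List[int]:
    """All (possibly overlapping) start positions of needle in haystack."""
    hits, start = [], haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def three_letter_map(read: str, genome: str) -> List[Tuple[str, str, int]]:
    """Search both converted reads against both converted genomes and their reverse complements.

    Positions are 0-based in the coordinates of the strand that was searched.
    """
    reads = {"C-to-T read": read.replace("C", "T"), "G-to-A read": read.replace("G", "A")}
    ct_genome, ga_genome = genome.replace("C", "T"), genome.replace("G", "A")
    genomes = {
        "C-to-T genome (+)": ct_genome,
        "C-to-T genome (revcomp)": revcomp(ct_genome),
        "G-to-A genome (+)": ga_genome,
        "G-to-A genome (revcomp)": revcomp(ga_genome),
    }
    return [
        (read_name, genome_name, pos)
        for read_name, read_seq in reads.items()
        for genome_name, genome_seq in genomes.items()
        for pos in find_all(genome_seq, read_seq)
    ]


hits = three_letter_map("TCGGTATGTTTAAACGTT", "AAACCGGCATGTTTAAACGCTAAA")
print(hits)  # [('C-to-T read', 'C-to-T genome (+)', 3)]
print("unique" if len(hits) == 1 else "ambiguous or unmapped")
```

Because both the read and the genome are reduced to three letters before matching, methylation state does not affect where the read aligns; methylation is called afterwards by comparing the original read with the unconverted reference.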
Calling methylation
[Figure: bisulphite reads aligned to the genome reference …CCGGCATGTTTAAACGCT…; at each CpG, reads that still show a C are methylated and reads showing a T are unmethylated]
• At one CpG, 2 of 10 reads are methylated: 2/10 × 100 = 20%
• At another CpG, 8 of 10 reads are methylated: 8/10 × 100 = 80%
Good coverage is very important for reliable methylation calls!
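Counting Cs and Ts at a reference C is all that methylation calling needs once reads are aligned. The sketch assumes reads aligned to the forward strand without indels; the reads are hypothetical, chosen to reproduce the 20% and 80% examples. To show why coverage matters, it adds a Wilson score interval for the methylated proportion; that interval is a choice made for illustration, not something the slide prescribes. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Illustrative sketch; function names and reads are hypothetical.


def call_cpg(reads: Sequence[Tuple[int, str]], c_pos: int) -> Tuple[int, int, float]:
    """Count methylated (C) and unmethylated (T) calls at one reference C.

    reads: (0-based start, sequence) of bisulphite reads aligned to the forward strand
    without indels. Returns (n_methylated, n_unmethylated, percent methylation).
    """
    meth = unmeth = 0
    for start, seq in reads:
        offset = c_pos - start
        if 0 <= offset < len(seq):
            if seq[offset] == "C":
                meth += 1
            elif seq[offset] == "T":
                unmeth += 1
    coverage = meth + unmeth
    percent = 100.0 * meth / coverage if coverage else float("nan")
    return meth, unmeth, percent


def wilson_interval(meth: int, coverage: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the methylated proportion (coverage > 0)."""
    p = meth / coverage
    denom = 1 + z ** 2 / coverage
    centre = (p + z ** 2 / (2 * coverage)) / denom
    half = z * math.sqrt(p * (1 - p) / coverage + z ** 2 / (4 * coverage ** 2)) / denom
    return centre - half, centre + half


# Hypothetical reads over the reference CCGGCATGTTTAAACGCT (CpG Cs at positions 1 and 14)
reads = (
    [(0, "TCGGTATGTTTAAACGTT")] * 2
    + [(0, "TTGGTATGTTTAAACGTT")] * 6
    + [(0, "TTGGTATGTTTAAATGTT")] * 2
)
print(call_cpg(reads, 1))   # (2, 8, 20.0)
print(call_cpg(reads, 14))  # (8, 2, 80.0)
print(wilson_interval(8, 10), wilson_interval(80, 100))
```

The same 80% estimate has a much wider interval at 10x than at 100x coverage.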
Some real BS-seq mapping results
https://software.broadinstitute.org/software/igv/interpreting_bisulfite_mode
Methylation calling output
Start and end both give the position of the C in the genome; No. methylated + No. unmethylated reads = total coverage.

Chromosome  Start   End     % methylation  No. methylated reads  No. unmethylated reads
chr1        753479  753479  50             1                     1
chr1        753492  753492  66.67          2                     1
chr1        753540  753540  100            1                     0
chr1        753541  753541  50             1                     1
chr1        753667  753667  25             1                     3
chr1        753724  753724  66.67          2                     1
chr1        753763  753763  0              0                     2
chr1        753785  753785  0              0                     1
chr1        759932  759932  100            1                     0
chr1        760913  760913  0              0                     1
chr1        761299  761299  100            2                     0
chr1        761371  761371  80             8                     2
chr1        761377  761377  100            10                    0
chr1        761446  761446  92.86          13                    1
chr1        761460  761460  53.85          7                     6
chr1        762005  762005  100            1                     0
chr1        762114  762114  0              0                     5
chr1        762176  762176  0              0                     7
chr1        762180  762180  0              0                     8

e.g. 80% = 8 / (8 + 2) × 100
This is what we work with!
Krueger et al. 2012, Nature Methods
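A first processing step on this kind of output is usually to compute coverage and drop poorly covered Cs. The sketch parses whitespace-separated rows with the column layout of the table on this slide (chromosome, start, end, % methylation, No. methylated, No. unmethylated) and applies a minimum coverage of 10 reads; that threshold and the function names are assumptions.

```python
from typing import Iterable, List, NamedTuple

# Illustrative sketch; function names and the min_cov default are hypothetical.


class CpGCall(NamedTuple):
    chrom: str
    start: int
    end: int
    percent: float
    meth: int
    unmeth: int

    @property
    def coverage(self) -> int:
        return self.meth + self.unmeth


def parse_coverage(lines: Iterable[str]) -> List[CpGCall]:
    """Parse rows: chrom, start, end, % methylation, n methylated, n unmethylated."""
    calls = []
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, pct, meth, unmeth = line.split()
        calls.append(CpGCall(chrom, int(start), int(end), float(pct), int(meth), int(unmeth)))
    return calls


def filter_min_coverage(calls: Iterable[CpGCall], min_cov: int = 10) -> List[CpGCall]:
    """Keep Cs covered by at least min_cov reads."""
    return [c for c in calls if c.coverage >= min_cov]


rows = """chr1 753492 753492 66.67 2 1
chr1 761371 761371 80 8 2
chr1 761446 761446 92.86 13 1
chr1 762180 762180 0 0 8"""
for call in filter_min_coverage(parse_coverage(rows.splitlines())):
    print(call.chrom, call.start, call.coverage, round(100 * call.meth / call.coverage, 2))
```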
Analysis pipeline
• Thorough QC is VERY important for BS-seq
• Need to be brutal with trimming off poor-quality bases…
• …and adapters
• As with SNP calling, removing PCR duplicates is a good idea for better methylation calling (but not for reduced representation BS-seq, where many reads legitimately start at the same restriction site)
• Other stuff to find cool biology!
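Two of these steps are simple enough to sketch. The quality trimming below follows a BWA-style running-sum rule (walk in from the 3' end, accumulate cutoff minus quality, cut where the sum peaks), similar in spirit to what dedicated trimmers implement; adapter removal is not shown. Duplicate removal keeps the first single-end alignment at each chromosome, start and strand. Function names, the quality cutoff and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch; function names, cutoff and data are hypothetical.


def quality_trim_3prime(quals: Sequence[int], cutoff: int = 20) -> int:
    """Return how many bases to keep after 3' quality trimming (BWA-style rule).

    Walking in from the 3' end, accumulate (cutoff - quality); the read is cut where
    this running sum is largest, and the walk stops once the sum drops below zero.
    """
    running, best, keep = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


Alignment = Tuple[str, str, int, str]  # (read ID, chromosome, start, strand)


def remove_duplicates(alignments: Sequence[Alignment]) -> List[Alignment]:
    """Keep the first single-end alignment at each (chromosome, start, strand)."""
    seen = set()
    kept = []
    for aln in alignments:
        key = aln[1:]
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


quals = [35, 34, 30, 30, 12, 10]  # hypothetical Phred scores
print(quality_trim_3prime(quals))  # 4
alns = [("r1", "chr1", 100, "+"), ("r2", "chr1", 100, "+"), ("r3", "chr1", 100, "-"), ("r4", "chr1", 250, "+")]
print([a[0] for a in remove_duplicates(alns)])  # ['r1', 'r3', 'r4']
```

For paired-end data, duplicates are usually defined using the positions of both mates rather than a single start.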
Summary
• Methylation arrays are very popular
  • Only for human
  • Great for EWAS
  • Analysis is very mature
  • Bioconductor is the place to go!
• BS-seq is the best option for genome-wide, single-nucleotide resolution
  • Only option for species other than human
  • Pre-processing, mapping, etc. are pretty good
  • Statistical analysis is still developing
  • Bioconductor is a valuable resource
  • Downstream analysis depends on the biological question
• Methylation is interesting & we know how to measure it
• The best technology for the job depends on what you want to know!
Acknowledgments
Murdoch Childrens Research
Institute
• Alicia Oshlack
• Belinda Phipson
• MCRI Bioinformatics group!
Johns Hopkins University
• Peter Hickey
missMethyl
https://www.bioconductor.org/packages/release/bioc/html/missMethyl.html
https://f1000research.com/articles/5-1281/v3

More Related Content

Similar to DNA methylation: from array to sequencing

Dr. Christopher Yau (University of Birmingham) - Data-driven systems medicine
Dr. Christopher Yau (University of Birmingham) - Data-driven systems medicineDr. Christopher Yau (University of Birmingham) - Data-driven systems medicine
Dr. Christopher Yau (University of Birmingham) - Data-driven systems medicinemntbs1
 
CancerBioinformatics_DataTypesResources_012215.pptx
CancerBioinformatics_DataTypesResources_012215.pptxCancerBioinformatics_DataTypesResources_012215.pptx
CancerBioinformatics_DataTypesResources_012215.pptxQiZhi2
 
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerBioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerTom Kelly
 
Molecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfMolecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfsabyabby
 
How to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationHow to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationJoaquin Dopazo
 
Hertweck AB3ACBS presentation
Hertweck AB3ACBS presentationHertweck AB3ACBS presentation
Hertweck AB3ACBS presentationKate Hertweck
 
Applications of Flow Cytometry | Cell Analysis
Applications of Flow Cytometry | Cell AnalysisApplications of Flow Cytometry | Cell Analysis
Applications of Flow Cytometry | Cell AnalysisUniversity of The Punjab
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101Ino de Bruijn
 
Epigenetics
EpigeneticsEpigenetics
Epigeneticsgiumtell
 
Gene hunting strategies
Gene hunting strategiesGene hunting strategies
Gene hunting strategiesAshfaq Ahmad
 
Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010Leighton Pritchard
 
Poster13 laura Abell
Poster13 laura AbellPoster13 laura Abell
Poster13 laura AbellLaura Abell
 
Montgomery expression
Montgomery expressionMontgomery expression
Montgomery expressionmorenorossi
 
Identification, annotation and visualisation of extreme changes in splicing w...
Identification, annotation and visualisation of extreme changes in splicing w...Identification, annotation and visualisation of extreme changes in splicing w...
Identification, annotation and visualisation of extreme changes in splicing w...Mar Gonzàlez-Porta
 
Integrative analysis of medical imaging and omics
Integrative analysis of medical imaging and omicsIntegrative analysis of medical imaging and omics
Integrative analysis of medical imaging and omicsHongyoon Choi
 
genomics proteomics metbolomics.pptx
genomics proteomics metbolomics.pptxgenomics proteomics metbolomics.pptx
genomics proteomics metbolomics.pptxRajesh Yadav
 
Seftah DNA fingerprint 2007NEW.ppt
Seftah DNA fingerprint 2007NEW.pptSeftah DNA fingerprint 2007NEW.ppt
Seftah DNA fingerprint 2007NEW.pptSamerPaser
 
2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, Leiden2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, LeidenAlain van Gool
 

Similar to DNA methylation: from array to sequencing (20)

Dr. Christopher Yau (University of Birmingham) - Data-driven systems medicine
Dr. Christopher Yau (University of Birmingham) - Data-driven systems medicineDr. Christopher Yau (University of Birmingham) - Data-driven systems medicine
Dr. Christopher Yau (University of Birmingham) - Data-driven systems medicine
 
CancerBioinformatics_DataTypesResources_012215.pptx
CancerBioinformatics_DataTypesResources_012215.pptxCancerBioinformatics_DataTypesResources_012215.pptx
CancerBioinformatics_DataTypesResources_012215.pptx
 
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast CancerBioinformatic Analysis of Synthetic Lethality in Breast Cancer
Bioinformatic Analysis of Synthetic Lethality in Breast Cancer
 
Molecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfMolecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdf
 
How to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationHow to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical information
 
Hertweck AB3ACBS presentation
Hertweck AB3ACBS presentationHertweck AB3ACBS presentation
Hertweck AB3ACBS presentation
 
Applications of Flow Cytometry | Cell Analysis
Applications of Flow Cytometry | Cell AnalysisApplications of Flow Cytometry | Cell Analysis
Applications of Flow Cytometry | Cell Analysis
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
 
Use of data
Use of dataUse of data
Use of data
 
Epigenetics
EpigeneticsEpigenetics
Epigenetics
 
Gene hunting strategies
Gene hunting strategiesGene hunting strategies
Gene hunting strategies
 
Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010
 
TILLING & ECOTILLING
TILLING & ECOTILLINGTILLING & ECOTILLING
TILLING & ECOTILLING
 
Poster13 laura Abell
Poster13 laura AbellPoster13 laura Abell
Poster13 laura Abell
 
Montgomery expression
Montgomery expressionMontgomery expression
Montgomery expression
 
Identification, annotation and visualisation of extreme changes in splicing w...
Identification, annotation and visualisation of extreme changes in splicing w...Identification, annotation and visualisation of extreme changes in splicing w...
Identification, annotation and visualisation of extreme changes in splicing w...
 
Integrative analysis of medical imaging and omics
Integrative analysis of medical imaging and omicsIntegrative analysis of medical imaging and omics
Integrative analysis of medical imaging and omics
 
genomics proteomics metbolomics.pptx
genomics proteomics metbolomics.pptxgenomics proteomics metbolomics.pptx
genomics proteomics metbolomics.pptx
 
Seftah DNA fingerprint 2007NEW.ppt
Seftah DNA fingerprint 2007NEW.pptSeftah DNA fingerprint 2007NEW.ppt
Seftah DNA fingerprint 2007NEW.ppt
 
2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, Leiden2013-11-26 DTL FIH symposium, Leiden
2013-11-26 DTL FIH symposium, Leiden
 

Recently uploaded

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 

Recently uploaded (20)

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 

DNA methylation: from array to sequencing

  • 1. Measuring methylation: from arrays to sequencing Jovana Maksimovic, PhD jovana.maksimovic@mcri.edu.au @JovMaksimovic github.com/JovMaksimovic Bioinformatics Winter School, 3 July 2017
  • 2. Talk outline • Epigenetics • DNA methylation • Measuring DNA methylation • Methylation arrays • How do they work? • What do they measure? • Example analysis • Methylation sequencing • What are the challenges? • How does it work? • Suggested analysis pipeline • Summary
  • 3. I work at MCRI Me! I mostly work on human development & disease… I write software for analysing methylation array data missMethyl …and a lot of gene expression data RNAseq Microarrays I analyse a lot of epigenetic data… ChIPseq ATACseq BSseq Microarrays …sometimes using mice or other models
  • 4. What is epigenetics? • Epigenetics refers to stable heritable traits not explained by changes in DNA sequence • Greek prefix “epi” means “on top of” genetics • Chromosome modifications that affect gene expression • Histones, DNA methylation • “Anything” that isn’t DNA! • Essential for normal development • Can be modified by environment • Can be disrupted in disease
  • 5. Epigenetics brings DNA to life! embryogenesis blastocyst zygote sperm egg embryonic stem cells B cell T cell red blood cell haematopoietic stem cell fat cell sperm cell skin cell muscle cell gland cell hormone- secreting cell germ cell neuron astrocyte neuronal progenitor cell lung cell kidney cell identical DNA in every cell different epigenetic patterns • Important in all species Modified from https://biology.mit.edu/research/stemcell_epigenetics intestine cell
  • 6. Epigenetics is CRAZY complicated! • New sequencing & microarray technologies are enabling us to learn A LOT more about epigenetics • Different data types need different analysis • Today I’m only focussing on DNA methylation Roy et al. (2010), Science Me Me
  • 7. What is DNA methylation? DNA methylation primarily occurs at CpG dinucleotides C G A T C C
  • 8. Patterson et al. 2011, J Vis Exp DNA methylation in the genome • The human genome contains ~30,000,000 CpGs (~1%) • VERY different between different species • CpGs are not evenly spaced across the genome • Tend to be present in clusters called CpG islands • CpG methylation is spatially correlated Eckhardt et al. 2007, Nature Genetics ~500bp Methylation correlation with distance
  • 9. Methylation can regulate gene expression Plot from Peter Hickey http://meeting.dxy.cn/oemethylation2012/article/i18782.html Methylation at a single CpG vs. gene expression Each point is one sample
  • 10. Methylation changes coat colour of Agouti mice Dolinoy 2008, Nutr Rev. This gene controls coat colour in Agouti mice These CpG sites in the promoter change PS1A expression depending on methylation These mice are genetically identical Hypomethylated Hypermethylated Coat colour different due to different maternal diet i.e. environment!
  • 11. Cridge et al. 2015, Nutrients Methylation makes worker bees! These larvae are genetically identical Hypomethylated Hypermethylated
  • 12. Methylation is cool What do we usually want to know about it?
  • 13. Finding methylation differences can tell us a lot • Methylation is critical in determining cell type • Regulatory T-cell vs. Naïve T-cell • Methylation can be disrupted in disease • Cancer vs. Normal • Methylation is affected by the environment • Smokers vs. Non-smokers • Typical workflow: collect appropriate samples (e.g. normal and cancer), extract DNA and measure methylation, then statistical analysis
  • 14. Epigenome-wide association studies (EWAS) • Similar to GWAS • Compare lots of cases to lots of controls • Often looking for small effects e.g. complex disease or environmental effects • Need lots of samples • 100s or 1000s of cases & controls https://en.wikipedia.org/wiki/Epigenome-wide_association_study_(EWAS)
  • 15. How do we measure methylation? • Bisulphite conversion • Create “SNPs” • Single nucleotide resolution • Array • Sequencing • Enrichment of methylated DNA • Restriction enzymes • Affinity • Regional resolution • Array • Sequencing
  • 16. What is bisulphite conversion? • Chemical process • Unmethylated Cs get converted to Us, which become Ts after PCR • Methylated Cs are protected • Creates C/T “SNPs” • Used to call methylation
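The conversion rule on this slide can be simulated in a few lines. This is a minimal sketch, assuming a single strand, complete conversion and no sequencing errors. The toy reference `CCGGCATGTTTAAACGCT` and the function names are hypothetical; the reference was chosen so that its converted form matches the read used on the Bismark mapping slides later in this deck.

```python
from typing import Dict, Iterable


def bisulphite_convert(seq: str, methylated: Iterable[int]) -> str:
    """Simulate bisulphite conversion followed by PCR on one strand.

    Unmethylated Cs become U and then T after PCR; Cs at the 0-based
    positions listed in `methylated` are protected and stay C.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference: str, converted: str) -> Dict[int, bool]:
    """Call each CpG C in the reference: C in the read = methylated, T = unmethylated."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G" and converted[i] in ("C", "T"):
            calls[i] = converted[i] == "C"
    return calls


reference = "CCGGCATGTTTAAACGCT"                 # hypothetical toy sequence
read = bisulphite_convert(reference, {1, 14})    # both CpG Cs methylated
print(read)                                      # TCGGTATGTTTAAACGTT
print(call_cpg_methylation(reference, read))     # {1: True, 14: True}
```

Only the two CpG Cs survive as C; every other C becomes T, which is exactly the C/T "SNP" that arrays and sequencing read out.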
  • 17. Methylation arrays What are they and how do they work?
  • 18. Illumina Infinium HumanMethylation BeadChips • Human only • Gene biased; selected to be relevant to human development & disease • e.g. TSS, promoters, CpG islands, enhancers, ... • 27k array (2009): 1 chip = 12 samples; >27,000 unique CpG sites measured in each sample • 450k array (2011): 1 chip = 12 samples; >450,000 unique CpG sites measured in each sample • 850k array (2015): 1 chip = 8 samples; >850,000 unique CpG sites measured in each sample Modified slide from Belinda Phipson
  • 19. Methylation arrays are based on SNP array technology • Methylation array “SNPs” (C/T) are created by bisulphite conversion • Comparing the intensity of C/T gives the proportion of methylation at a single CpG [Figure: What is this base? Measure fluorescence intensity]
  • 20. What methylation values can we get? • On an array, we measure methylation in a population of cells • An individual cell can be either 0, 0.5 or 1 at one CpG • Across a population we get a continuous measurement between 0 and 1 [Figure: three cells with 0, 1 or 2 methylated (CH3) copies of a CpG, giving 0, 0.5 and 1; a single sample contains many cells]
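The averaging argument on this slide can be written out directly. This is a minimal sketch, assuming each cell carries two copies of the CpG (one per parental chromosome) and ignoring measurement noise; the function names and the 10-cell sample are hypothetical.

```python
def cell_methylation(maternal_methylated: bool, paternal_methylated: bool) -> float:
    """Methylation of one CpG in one diploid cell: 0, 0.5 or 1."""
    return (int(maternal_methylated) + int(paternal_methylated)) / 2


def population_methylation(cells) -> float:
    """Average over all cells in the sample: any value between 0 and 1."""
    values = [cell_methylation(m, p) for m, p in cells]
    return sum(values) / len(values)


# The three cells from the slide: unmethylated, methylated on one chromosome, on both
print([cell_methylation(m, p) for m, p in [(False, False), (True, False), (True, True)]])
# A hypothetical sample of 10 cells: 7 methylated on both copies, 2 on one, 1 on neither
sample = [(True, True)] * 7 + [(True, False)] * 2 + [(False, False)]
print(population_methylation(sample))  # 0.8
```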
  • 21. Measures of methylation • Arrays measure both methylated (C) and unmethylated (T) signal to get the proportion of methylation at a CpG • Beta value: $\beta = \frac{\text{Meth}}{\text{Meth} + \text{Unmeth}}$; intuitive, easy to interpret, great for visualisation • M value: $M = \log_2\left(\frac{\text{Meth}}{\text{Unmeth}}\right)$; better statistical properties, recommended for statistical testing • Can convert between them via a logit transformation: $M = \log_2\left(\frac{\beta}{1-\beta}\right)$ Du et al. 2011, BMC Bioinformatics
  • 22. What does the data look like? Table of M-values (rows are CpG sites, columns are samples):
    Sample A1  Sample A2  Sample A3  Sample B1  Sample B2  Sample B3
        0.213      0.221      0.311      0.123      0.216      0.198
       -0.011      0.001     -0.016      2.011      2.002      2.702
        2.213      2.256      2.698      0.052      0.101      0.238
        4.567      5.231      4.982      4.152      6.216      4.698
       -4.723     -3.459      -5.36     -5.763     -5.122     -4.998
       -5.567     -4.666     -4.845     -4.522     -4.111     -3.245
        3.421      5.467      5.554      5.445      5.298      4.514
        2.981      3.345      3.512     -3.534     -4.311     -3.889
        3.792      2.987      3.324     -0.231     -0.066     -0.001
            …          …          …          …          …          …
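The two formulas and the logit conversion between them can be sketched as below. The probe intensities are hypothetical, and in practice an offset is often added to the denominator to avoid dividing by zero; that offset is omitted here to match the formulas on the slide.

```python
import math


def beta_value(meth: float, unmeth: float) -> float:
    """Proportion of methylated signal: Meth / (Meth + Unmeth)."""
    return meth / (meth + unmeth)


def m_value(meth: float, unmeth: float) -> float:
    """log2 ratio of methylated to unmethylated signal."""
    return math.log2(meth / unmeth)


def beta_to_m(beta: float) -> float:
    """Logit (base 2) transformation from beta to M."""
    return math.log2(beta / (1 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transformation from M back to beta."""
    return 2 ** m / (2 ** m + 1)


meth, unmeth = 800.0, 200.0      # hypothetical intensities for one probe
b = beta_value(meth, unmeth)     # 0.8
m = m_value(meth, unmeth)        # 2.0
print(b, m, beta_to_m(b), m_to_beta(m))
```

Beta is bounded in [0, 1] while M is unbounded: a beta of 0.5 maps to M = 0, and betas near 0 or 1 are stretched out towards minus or plus infinity, which is why M-values behave better in linear models.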
  • 23. Array analysis pipeline (step: purpose; software) • QC (β density plots, control probes, MDS/clustering plots, …): remove bad samples and poor performing probes (CpGs); minfi, methylumi, limma • Normalization (within and between arrays): transform data to remove unwanted variation; minfi, missMethyl, wateRmelon • Statistical testing for differential methylation (CpGs & regions): estimate means and variances and borrow information across probes; limma, bumphunter, DMRcate • Annotation to genes, gene set testing, visualization, …: think about biological interpretation; missMethyl, Gviz • Combine with other data types (e.g. gene expression): GenomicRanges
  • 25. After QC, data exploration is your friend! [Figure: MDS plots of dimensions 1 vs. 2 and 3 vs. 4 showing the largest sources of variation in the data; samples cluster by individual and cell type]
  • 26. Statistical testing: Look for differences at single CpGs (differential methylation) • Linear model: $y = X\beta + \varepsilon$ (Smyth, 2004) • Moderated $t = \frac{|\bar{y}_{can} - \bar{y}_{norm}|}{\tilde{s}\sqrt{v}}$, where $\tilde{s}^2$ is the empirical Bayes (moderated) variance and $v$ is the unscaled variance of the difference (e.g. $1/n_{can} + 1/n_{norm}$ for two groups) • Can take into account any other covariates • One test per CpG! • Adjust the p-values using Benjamini and Hochberg’s FDR • Lots of differences between immune cell types! Phipson & Oshlack 2015, Genome Biology Modified slide from Belinda Phipson & Alicia Oshlack
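The moderated t and the FDR adjustment can be sketched for one CpG as follows. This is a simplified illustration, not limma's implementation: the prior variance `s0_sq` and prior degrees of freedom `d0`, which limma's empirical Bayes step estimates from all CpGs, are supplied by hand here as hypothetical values, and the statistic is kept signed rather than taking the absolute value. The group values are the second CpG of the slide 22 M-value table; the p-values are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence


def moderated_t(a: Sequence[float], b: Sequence[float],
                s0_sq: float, d0: float) -> float:
    """Two-group moderated t-statistic for one CpG (e.g. M-values).

    The residual variance is shrunk towards a prior variance s0_sq that
    carries d0 prior degrees of freedom (hand-picked here, not estimated).
    """
    n1, n2 = len(a), len(b)
    d = n1 + n2 - 2                                   # residual degrees of freedom
    s_sq = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)   # posterior (moderated) variance
    v = 1 / n1 + 1 / n2                               # unscaled variance of the mean difference
    return (mean(a) - mean(b)) / (s_tilde_sq * v) ** 0.5


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                                  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


group_a = [-0.011, 0.001, -0.016]   # samples A1-A3
group_b = [2.011, 2.002, 2.702]     # samples B1-B3
print(moderated_t(group_a, group_b, s0_sq=0.1, d0=4))
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Setting `d0=0` recovers the ordinary pooled t-statistic; larger `d0` pulls each CpG's variance towards the common value, which stabilises tests when there are only a few samples per group.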
  • 27. Statistical testing: Differences across CpG dense region • Recall: CpG methylation is spatially correlated • Can we find consistent group-average level differences between CpGs that are close together? • More functionally relevant than differences at individual CpGs? Aryee et al. 2014, Bioinformatics Lots of DMRs between immune cell types!
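The idea of looking for runs of nearby CpGs that differ consistently can be illustrated with a deliberately naive sketch. This is not the bumphunter or DMRcate algorithm (roughly, bumphunter smooths and assesses significance by permutation, and DMRcate kernel-smooths per-CpG statistics); the function name, cutoff, gap and minimum-CpG parameters, and the example data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, float]  # (start, end, number of CpGs, mean difference)


def find_candidate_regions(positions: Sequence[int], diffs: Sequence[float],
                           cutoff: float = 0.2, max_gap: int = 500,
                           min_cpgs: int = 3) -> List[Region]:
    """Group runs of nearby CpGs whose group difference passes the cutoff in the same direction.

    positions: sorted genomic positions of CpGs on one chromosome.
    diffs: per-CpG difference in group means (e.g. beta values, group 1 minus group 2).
    """
    regions: List[Region] = []

    def flush(run: List[int]) -> None:
        if len(run) >= min_cpgs:
            mean_diff = sum(diffs[i] for i in run) / len(run)
            regions.append((positions[run[0]], positions[run[-1]], len(run), mean_diff))

    run: List[int] = []
    for i, d in enumerate(diffs):
        sign = 1 if d >= cutoff else (-1 if d <= -cutoff else 0)
        if sign == 0:                 # CpG not different enough: close any open run
            flush(run)
            run = []
            continue
        extends_run = (bool(run)
                       and positions[i] - positions[run[-1]] <= max_gap
                       and (diffs[run[-1]] > 0) == (sign > 0))
        if extends_run:
            run.append(i)
        else:                         # too far away or direction flipped: start a new run
            flush(run)
            run = [i]
    flush(run)
    return regions


# Hypothetical CpG positions and beta-value differences
positions = [100, 150, 220, 300, 2000, 2050, 2100, 2150]
diffs = [0.30, 0.25, 0.40, 0.05, -0.30, -0.35, -0.25, 0.30]
for region in find_candidate_regions(positions, diffs):
    print(region)
```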
  • 28. You can do other cool stuff! • Unmethylated regions in rTreg compared to naïve cells enriched for FOXP3 binding motifs! [Figure: consensus motif from DMR sequences matches the Forkhead-binding motif] Differences in cell types controlled by FOXP3! Modified slide from Alicia Oshlack
  • 29. Methylation array analysis is very mature: lots of methods! https://www.bioconductor.org/ https://f1000research.com/articles/5-1281/v3
  • 30. Methylation sequencing AKA bisulphite sequencing: the good, the bad and the ugly
  • 31. Two main types of bisulphite sequencing • Whole-genome bisulphite sequencing (BS-seq) • Gold standard • Genome-wide (~30,000,000 CpGs in human) • Expensive but covers almost everything • Need high (10-30x) coverage to reliably call methylation • Targeted BS-seq • Only sequence regions of interest • Reduced representation BS-seq (restriction enzyme) • Capture BS-seq (similar principle to exome) • Cheaper but can miss a lot of stuff • Can usually do higher (20-60x) coverage
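One way to see why coverage matters is to treat each read covering a CpG as an independent draw that is methylated with probability p. Under that simplifying binomial assumption (ignoring sequencing errors, incomplete conversion and PCR duplicates), the standard error of the estimated methylation proportion is sqrt(p(1 - p)/n). The function name below is hypothetical.

```python
import math


def methylation_se(p: float, coverage: int) -> float:
    """Standard error of a CpG's methylation proportion estimated from `coverage` reads,
    assuming each read is an independent draw with methylation probability p."""
    return math.sqrt(p * (1 - p) / coverage)


for coverage in (5, 10, 30, 60):
    se = methylation_se(0.5, coverage)   # p = 0.5 is the worst case
    print(f"{coverage:>3}x: SE = {se:.3f}, approx. 95% interval = +/-{1.96 * se:.2f}")
```

At a true methylation level of 50%, the standard error is about 0.16 at 10x but about 0.09 at 30x, so a single CpG's estimate is only reasonably precise at the higher end of the quoted coverage range.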
  • 32. What was bisulphite conversion again? DNA fragment All four of these can be sequenced!
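The four sequences on this slide can be generated from one double-stranded fragment. This sketch uses Bismark's strand names (OT = original top, CTOT = complementary to original top, OB = original bottom, CTOB = complementary to original bottom) and assumes complete conversion. The fragment and its methylation pattern are hypothetical.

```python
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def convert(seq: str, protected: Iterable[int]) -> str:
    """C->T at every C whose 0-based index is not in `protected` (methylated)."""
    keep = set(protected)
    return "".join("T" if b == "C" and i not in keep else b for i, b in enumerate(seq))


def bisulphite_strands(top: str, meth_top: Iterable[int],
                       meth_bottom: Iterable[int]) -> Dict[str, str]:
    """Return the four sequences that can be read after conversion and PCR.

    meth_top: top-strand indices of methylated Cs.
    meth_bottom: top-strand indices of the Gs whose paired bottom-strand C is methylated.
    """
    n = len(top)
    ot = convert(top, meth_top)                              # original top
    bottom = revcomp(top)                                    # bottom strand, 5'->3'
    ob = convert(bottom, {n - 1 - j for j in meth_bottom})   # original bottom
    # PCR copies each converted strand, adding its reverse complement
    return {"OT": ot, "CTOT": revcomp(ot), "OB": ob, "CTOB": revcomp(ob)}


# Hypothetical fragment with two CpGs, methylated on both strands
strands = bisulphite_strands("CCGGCATGTTTAAACGCT", meth_top={1, 14}, meth_bottom={2, 15})
for name, seq in strands.items():
    print(name, seq)
```

After conversion, OT and OB are no longer reverse complements of each other, which is why the two strands have to be treated separately during mapping.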
  • 33. What are the challenges? • Like calling SNPs, methylation in BS-seq inferred by comparison to unconverted reference sequence • Correct alignment is critical • More challenging than usual! • Aligned sequences do not exactly match reference • Complexity of libraries is reduced • Many Cs become Ts, so less info for mapping! • Methylation is not symmetrical • Two strands of DNA in the reference genome must be considered separately
  • 36. Mapping (Bismark) [Figure: DNA fragment, BS conversion & PCR] • Read: TCGGTATGTTTAAACGTT • In silico read conversion: C-to-T gives TTGGTATGTTTAAATGTT; G-to-A gives TCAATATATTTAAACATT
  • 37. Mapping (Bismark) [Figure: DNA fragment, BS conversion & PCR] • Read: TCGGTATGTTTAAACGTT • In silico read conversion: C-to-T gives TTGGTATGTTTAAATGTT; G-to-A gives TCAATATATTTAAACATT • Align to in silico bisulphite converted genome: fwd strand C-to-T converted genome …TTGGTATGTTTAAATGTT… (reverse complement …AACCATACAAATTTACAA…); fwd strand G-to-A converted genome …CCAACATATTTAAACACT… (reverse complement …GGTTGTATAAATTTGTGA…)
  • 38. Mapping (Bismark) [Figure: DNA fragment, BS conversion & PCR] • Read: TCGGTATGTTTAAACGTT • In silico read conversion: C-to-T gives TTGGTATGTTTAAATGTT; G-to-A gives TCAATATATTTAAACATT • Align to in silico bisulphite converted genome: fwd strand C-to-T converted genome …TTGGTATGTTTAAATGTT… (reverse complement …AACCATACAAATTTACAA…); fwd strand G-to-A converted genome …CCAACATATTTAAACACT… (reverse complement …GGTTGTATAAATTTGTGA…) • Read all alignment outputs simultaneously to determine if sequence can be mapped uniquely [Figure: the G-to-A converted read TCAATATATTTAAACATT aligned against all four converted genome sequences; mismatches marked x]
  • 39. Mapping (Bismark) [Figure: DNA fragment, BS conversion & PCR] • Read: TCGGTATGTTTAAACGTT • In silico read conversion: C-to-T gives TTGGTATGTTTAAATGTT; G-to-A gives TCAATATATTTAAACATT • Align to in silico bisulphite converted genome: fwd strand C-to-T converted genome …TTGGTATGTTTAAATGTT… (reverse complement …AACCATACAAATTTACAA…); fwd strand G-to-A converted genome …CCAACATATTTAAACACT… (reverse complement …GGTTGTATAAATTTGTGA…) • Read all alignment outputs simultaneously to determine if sequence can be mapped uniquely [Figure: the G-to-A converted read TCAATATATTTAAACATT and the C-to-T converted read TTGGTATGTTTAAATGTT each aligned against all four converted genome sequences; mismatches marked x; only the C-to-T converted read matches a genome sequence (the fwd strand C-to-T converted genome) without mismatches]
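The mapping idea on these slides can be sketched as a toy exact-match aligner: convert the read C-to-T and G-to-A, convert the forward genome C-to-T and G-to-A, run all four read/genome combinations and keep the read only if exactly one hit is found. This is a simplified illustration, not Bismark itself, which hands the converted sequences to Bowtie/Bowtie 2 and tolerates mismatches. Exact substring search, the function names and the toy reference (the hypothetical fragment flanked by `ATAT`) are all assumptions.

```python
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> List[int]:
    """All (possibly overlapping) start positions of pattern in text."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def align_bisulphite_read(read: str, genome: str) -> Optional[Tuple[str, int]]:
    """Exact-match, three-letter alignment of one read.

    Returns (strand, forward-genome start) if exactly one hit is found
    across all four searches, otherwise None (unmapped or ambiguous).
    """
    ct_genome, ga_genome = genome.replace("C", "T"), genome.replace("G", "A")
    ct_read, ga_read = read.replace("C", "T"), read.replace("G", "A")
    searches = {
        "OT": (ct_read, ct_genome),
        "OB": (revcomp(ct_read), ga_genome),
        "CTOT": (revcomp(ga_read), ct_genome),
        "CTOB": (ga_read, ga_genome),
    }
    hits = [(strand, pos)
            for strand, (query, target) in searches.items()
            for pos in find_all(target, query)]
    return hits[0] if len(hits) == 1 else None


genome = "ATAT" + "CCGGCATGTTTAAACGCT" + "ATAT"           # hypothetical toy reference
print(align_bisulphite_read("TCGGTATGTTTAAACGTT", genome))  # ('OT', 4)
```

Because every C and G difference is "forgiven" by the conversion, the aligner works in a reduced alphabet, which is why repeated or low-complexity sequence more often produces ambiguous hits that have to be discarded.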
  • 42. Some real BS-seq mapping results https://software.broadinstitute.org/software/igv/interpreting_bisulfite_mode
  • 43. Methylation calling output (columns: chromosome, start, end (position of C in genome), % methylation, no. methylated reads, no. unmethylated reads; methylated + unmethylated reads = total coverage)
    chr1  753479  753479     50   1  1
    chr1  753492  753492  66.67   2  1
    chr1  753540  753540    100   1  0
    chr1  753541  753541     50   1  1
    chr1  753667  753667     25   1  3
    chr1  753724  753724  66.67   2  1
    chr1  753763  753763      0   0  2
    chr1  753785  753785      0   0  1
    chr1  759932  759932    100   1  0
    chr1  760913  760913      0   0  1
    chr1  761299  761299    100   2  0
    chr1  761371  761371     80   8  2
    chr1  761377  761377    100  10  0
    chr1  761446  761446  92.86  13  1
    chr1  761460  761460  53.85   7  6
    chr1  762005  762005    100   1  0
    chr1  762114  762114      0   0  5
    chr1  762176  762176      0   0  7
    chr1  762180  762180      0   0  8
    e.g. $\frac{8}{8+2} \times 100 = 80\%$ • This is what we work with!
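Working with this output usually starts by reading the six columns and discarding poorly covered CpGs. This is a minimal sketch that assumes whitespace-separated lines in exactly the column order shown above (which matches Bismark's coverage output); the class and function names and the 10-read threshold are hypothetical choices, and the three input lines are copied from the slide.

```python
from typing import Iterable, List, NamedTuple


class CpGCall(NamedTuple):
    chrom: str
    pos: int          # position of the C in the genome
    pct_meth: float   # % methylation as reported
    n_meth: int       # reads supporting methylation (C)
    n_unmeth: int     # reads supporting no methylation (T)

    @property
    def coverage(self) -> int:
        return self.n_meth + self.n_unmeth


def parse_coverage(lines: Iterable[str]) -> List[CpGCall]:
    calls = []
    for line in lines:
        if not line.strip():
            continue
        chrom, start, _end, pct, n_meth, n_unmeth = line.split()
        calls.append(CpGCall(chrom, int(start), float(pct), int(n_meth), int(n_unmeth)))
    return calls


def filter_by_coverage(calls: Iterable[CpGCall], min_cov: int = 10) -> List[CpGCall]:
    """Drop CpGs with too few reads to estimate methylation reliably."""
    return [c for c in calls if c.coverage >= min_cov]


example = """chr1 761371 761371 80 8 2
chr1 761446 761446 92.86 13 1
chr1 762180 762180 0 0 8"""
for call in filter_by_coverage(parse_coverage(example.splitlines())):
    # recompute % methylation = methylated / (methylated + unmethylated) x 100
    print(call.chrom, call.pos, call.coverage, round(100 * call.n_meth / call.coverage, 2))
```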
  • 44. Krueger et al. 2012, Nature Methods Analysis pipeline Thorough QC is VERY important for BS-seq Need to be brutal with trimming off poor quality bases… …and adapters As with SNP calling, removing PCR duplicates is a good idea for better methylation calling Other stuff to find cool biology!
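PCR duplicates matter because copies of the same original molecule would otherwise be counted as independent evidence for its methylation state, inflating coverage. A minimal sketch of the usual rule for single-end reads keeps one read per (chromosome, start, strand); the read records below are hypothetical dictionaries, and in practice this is done on BAM files by a dedicated tool such as Bismark's `deduplicate_bismark`.

```python
from typing import Dict, Iterable, List


def remove_pcr_duplicates(alignments: Iterable[Dict]) -> List[Dict]:
    """Keep only the first read seen at each (chromosome, start, strand)."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln["chrom"], aln["start"], aln["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


reads = [  # hypothetical single-end alignments
    {"name": "r1", "chrom": "chr1", "start": 761371, "strand": "+"},
    {"name": "r2", "chrom": "chr1", "start": 761371, "strand": "+"},  # duplicate of r1
    {"name": "r3", "chrom": "chr1", "start": 761371, "strand": "-"},
    {"name": "r4", "chrom": "chr1", "start": 761446, "strand": "+"},
]
print([r["name"] for r in remove_pcr_duplicates(reads)])  # ['r1', 'r3', 'r4']
```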
  • 45. Summary • Methylation arrays very popular • Only for human • Great for EWAS • Analysis very mature • Bioconductor is the place to go! • BS-seq best option for genome-wide single nucleotide resolution • Only option for species other than human • Pre-processing, mapping, etc. pretty good • Statistical analysis still developing • Bioconductor is a valuable resource • Downstream analysis dependent on biological question • Methylation is interesting & we know how to measure it • Best technology for the job depends on what you want to know!
  • 46. Acknowledgments Murdoch Childrens Research Institute • Alicia Oshlack • Belinda Phipson • MCRI Bioinformatics group! Johns Hopkins University • Peter Hickey missMethyl https://www.bioconductor.org/packages/ release/bioc/html/missMethyl.html jovana.maksimovic@mcri.edu.au @JovMaksimovic github.com/JovMaksimovic https://f1000research.com/articles/5-1281/v3

Editor's Notes

  1. Good afternoon & thanks to the Winter School for inviting me to tell you all about how we measure DNA methylation.
  2. My talk today will start off with a general introduction to what epigenetics is and then focus specifically on DNA methylation and how we measure it. I will cover the two most popular platforms for measuring methylation, arrays and sequencing, and how they work.
  3. This is me. I am a postdoc at the Murdoch Childrens Research Institute at the Royal Children’s Hospital in Melbourne. Somewhat unsurprisingly, I work mostly with human data, focusing on disease and development, but sometimes I work on data from mice and other model organisms. So, a lot of what I will talk about today will be about humans, but many of the principles relate to other species as well. In my day-to-day work, I mostly analyse epigenetic data from a variety of sequencing and array platforms. I also analyse a lot of gene expression data from RNAseq and microarrays. And, when I’m not analysing data, I occasionally develop methods and write software. Along with my colleague Belinda Phipson, I have developed a software package for analysing methylation array data called missMethyl. It is freely available from the Bioconductor website, which I’ll tell you a little more about later.
  4. So, what is epigenetics? Epigenetics typically refers to stable heritable traits that can’t be explained by changes to the DNA sequence. The Greek prefix “epi” literally means “on top of”, so epigenetics refers to another layer of control on top of the DNA sequence. Typically when we refer to epigenetics, we are talking about chromosomal modifications that affect the expression of genes; usually histones and DNA methylation. But epigenetics can refer to pretty much anything that isn’t DNA. Epigenetics is important because it’s essential for development, can be modified by the environment and can be disrupted in disease.
  5. I like to think that epigenetics is what brings DNA to life. DNA itself is just a static code, but it’s epigenetics that executes the code to produce many different cellular phenotypes from the same DNA. If we look at human development, for example, we all start off as a single cell, the zygote, with a bunch of instructions written into the DNA. But then this single cell goes on to become the hundreds of different cell types that make up an entire human being, like the one I’ve got wriggling around in here! And all of this is made possible through epigenetics! Although this is a human example, epigenetics is absolutely vital for all species.
  6. Because epigenetics lets DNA turn a single cell into an entire multicellular, living organism, it is actually CRAZY complicated! Many of the new sequencing and microarray technologies that have appeared in the last decade have enabled us to learn a lot more about all sorts of epigenetic phenomena. This has produced a lot of different types of data requiring different types of analysis but, lucky for me, today I’m only going to be focusing on one thing: DNA methylation.
  7. At this point some of you may be wondering, what is DNA methylation? Well, if we zoom into one of our many cells, unpackage the DNA and then look at the individual bases, the As, Cs, Ts and Gs, DNA methylation is the chemical modification of the cytosine (C) base through the addition of this CH3 methyl group. This primarily occurs on Cs in the context of CG dinucleotides, which we call “CpGs”.
  8. The human genome contains about 30 million CpGs, which is about 1% of the total number of bases in the genome. But this varies quite a lot between different species. A thing to note about CpGs is that they are not evenly spaced across the genome; they tend to be present in clusters that we call CpG islands. You can see that in this diagram here, where the lollipops represent CpGs: they are sort of randomly far apart and then appear in a tight cluster here. This represents a CpG island in a promoter, which is quite common as these islands are often colocalised with regulatory regions. Also of note is that CpGs that are closer together tend to be correlated in their methylation status. This plot at the bottom shows that methylation within about 500bp is between 60 and 90% correlated, but that the correlation is reduced with greater distance.
  9. Methylation is important because it can regulate the expression of genes. So, sometimes we see a relationship between methylation and gene expression that looks like what you see here in this plot on the right. Each point in this plot is from one sample; DNA methylation is shown on the x-axis and gene expression on the y-axis. You can see that as DNA methylation increases, gene expression decreases, and vice versa. The reason this can happen is shown here on the left. In a normal cell, the CpG lollipops in the body of the gene are methylated, while the ones in the promoter are unmethylated, so this gene is expressed. However, in cancer, for example, we can lose the methylation in the gene body and gain methylation in the promoter, which can lead to this gene being suppressed. This is when you see the sort of relationship shown in this plot.
  10. A neat example of how methylation can control gene expression and change phenotype is the coat colour of agouti mice. The agouti mouse has this PS1A gene that controls its coat colour. The promoter of this gene contains a cluster of CpGs that can modify the expression of the gene depending on how methylated they are. These mice in the photo are all genetically identical but have different coat colours because the ones on this side are less methylated at this gene than the ones on this side. What’s even more interesting is that this difference was produced solely by differences in the diet their mothers were fed!
  11. Another very cool example of how methylation can produce wildly different phenotypes from the same DNA is in bees. In bees, the female larvae are genetically identical, and whether they become a queen or a worker depends on their diet, which changes their methylation. The larvae fed on royal jelly are very hypomethylated and go on to become queens. However, the larvae that are fed worker jelly gain a lot of methylation and become workers.
  12. Ok, so I’ve given you a few examples of how methylation is pretty cool. So, what you may be wondering now is: what do we usually want to know about it?
  13. When we study methylation, we are usually interested in finding methylation differences. Because methylation is critical in determining cell types, we often want to know what the differences between cell types are. Methylation can also be disrupted in disease, for example cancer, so we may want to compare cancer and normal tissue. Methylation has also been shown to be affected by environmental factors such as smoking, so we may want to compare smokers and non-smokers. In a typical methylation study, we first collect the appropriate samples, for example normal and cancer cells. Then we extract their DNA and measure the methylation using an appropriate platform. Once we have the data, we can do some statistical analysis to answer the biological question of interest.
  14. A common type of study that people often do is called an epigenome-wide association study, or EWAS. If this sounds familiar, it’s because it’s similar to a genome-wide association study, or GWAS. These kinds of studies usually involve comparing lots of cases to lots of controls to find some difference in methylation. EWAS are often looking for small effects, such as those that might be present in a complex disease or due to the environment. For these studies to work, they require lots and lots of samples; usually you would hope to have 100s or 1000s of cases and controls.
  15. Ok, so now we know that methylation is cool and interesting and that we often want to know about differences between things. But how do we actually measure it? Well, there are two main types of approaches. The first uses bisulfite conversion, which lets us create “SNPs” that tell us whether a CpG was methylated or not. This technique gives us single nucleotide resolution and we can use it with both arrays and sequencing. The other approach is to do some sort of enrichment of methylated DNA. This typically uses a restriction enzyme or some sort of affinity pulldown to enrich for the methylated DNA. In contrast to bisulfite conversion, this only gives us regional resolution, so we can only tell whether chunks of DNA were methylated or not. We can also combine this with both arrays and sequencing. For the remainder of this talk, I’m only going to focus on bisulfite-based approaches.
  16. At this point you may be wondering what is this bisulfite you speak of; what is bisulfite conversion? Bisulfite conversion is a chemical process that was originally published by Sue Clark, who will be speaking after me. So, you are actually very lucky to have a DNA methylation pioneer speaking to you today. Ok, so bisulfite conversion works like this: you start with some DNA that has some methylated Cs and some unmethylated Cs and you mix it with sodium bisulfite. Bisulfite conversion then happens: any unmethylated Cs are converted to uracils, while the methylated Cs are protected and remain Cs. Then, after PCR, the Us become Ts. So effectively we’ve created C=>T SNPs that will let us call methylation.
  17. Now that we know about bisulfite conversion, I’m going to talk about how it’s used in conjunction with methylation arrays and how they work.
  18. These give us the ability to measure the methylation at a subset of the CpG sites in the human genome, for several samples on a single chip. An early version of the array, released in 2009, enabled us to measure about 27,000 of the 30 million sites in the genome for a single sample. A newer version, released in 2011, increased the number of sites measured to 450 thousand out of 30 million. The newest array, released in 2015, gives us the ability to measure >850 thousand out of 30 million CpGs. The Illumina arrays are for humans only, and the CpGs on them have been carefully selected to be in and around genes and regulatory regions, as these are likely to be most relevant to human development and disease.
  19. Because bisulfite conversion lets us create methylation SNPs, methylation arrays are actually based on SNP genotyping arrays. SNP arrays work via a whole bunch of probes that bind to the sequence near SNPs of interest. When you apply your DNA sample to the array, bits of DNA stick to the probes, and then we want to know which SNP base is in the sample. To find this out, fluorescently labelled nucleotides are incorporated, and by measuring the fluorescence we can tell which SNP base was in the sample. Methylation C/T SNPs are created by bisulfite conversion and we measure them in a very similar way. We can then compare the intensities of the C and T signals to get the proportion of methylation at a single CpG.
  20. Since we’re reading fluorescence intensities from the array, you may be wondering what values we can actually get for methylation. Let’s think about these 3 cells here that each have a pair of chromosomes, one from mum and one from dad, and say we’re interested in methylation at a single CpG. In this first cell, there is no methylation on either chromosome. This cell in the middle is methylated at this CpG on one chromosome but not the other. And finally, this third cell is methylated at the CpG on both chromosomes. Can anyone tell me what methylation values they think we will get for each of these cells? Yes, these cells will have methylation values of 0, 0.5 and 1. However, on an array we are not measuring single cells; we are measuring methylation in a population of cells, a whole smoosh of cells. So, even though an individual cell can only be 0, 0.5 or 1, across a population we can get any continuous measure between 0 and 1.
  21. Due to bisulfite conversion, we measure both methylated Cs and unmethylated Ts to calculate the proportion of methylation at a CpG. This is called the beta value and is the proportion of the methylated signal over the total signal. Looking at the density distribution of the CpGs measured on the array, we can see that most are either unmethylated or methylated, with a smattering of signal in between. This is a very intuitive measure that is easy to interpret and is great for visualisation, but it’s not great for statistics. For statistical analysis we tend to use M-values, which also have a bimodal distribution but aren’t constrained between 0 and 1; the distribution extends from –inf to +inf. M-values have been shown to have better statistical properties and are recommended for statistical testing. The two measures are related through a logit transformation, which is quite useful.
  22. Take out variability
  23. Another type of analysis that you can do on methylation data is looking for differentially methylated regions. If you recall, at the beginning of the talk I said that CpGs that are close together are correlated. What we’re doing in this type of analysis is looking for CpGs that are close together and consistently display group-average level differences, like those seen in this plot here. Some of the popular functions for doing this are bumphunter in minfi and the dmrcate function from the DMRcate package.
  24. Once you’ve done some statistical analysis, it may give you some clues about which follow-up analyses could be interesting. In our immune cell type analysis, we took all the DMRs that we found between two of the cell types, extracted all of the underlying DNA sequences and looked for enriched sequence motifs. We found that the unmethylated regions from regulatory T cells were enriched for a consensus sequence motif that matched the FOXP3 binding motif. FOXP3 is known to be a master regulator of regulatory T cells, and this showed that methylation is involved in the binding mechanism.
  25. To sum up the section of my talk on methylation arrays, I’d like to highlight that methylation array analysis is actually very mature and that there are lots of published methods freely available. Most of them are written in R and available from the Bioconductor website. One of these is our package missMethyl, which we published in Bioinformatics last year. We also recently published an example workflow for methylation array analysis in F1000Research. It takes you through an entire methylation array analysis using real data, including all of the required software packages and R code, so it’s an excellent resource if you are starting out in this field.
  26. Moving on now to methylation sequencing, by which I actually mean bisulfite sequencing and all of the associated trials and tribulations.
  27. Before I move on to the nitty gritty of how to analyse and map bisulfite sequencing data, I’d like to refresh your memory about how bisulfite conversion works. We start off with a DNA fragment that has some methylated Cs and some unmethylated Cs. Then we subject it to bisulfite conversion, which converts the unmethylated Cs to Ts while the methylated Cs stay Cs. We then end up with two sequences that are no longer complementary, which we can PCR. After PCR, we end up with four possible sequences that we can get sequence reads from. There are some protocols that let you sequence only the original top and bottom strands; otherwise, it is generally equally likely that you will get reads from all 4 of these sequences.
  28. So, given this complexity, what are the challenges that we face?
  29. Given that I’ve just outlined that mapping bisulfite sequencing reads is pretty challenging, I’m going to take you through how it actually works; specifically, the approach that the Bismark alignment tool takes. So, if we start off with this particular DNA fragment and then bisulfite convert and PCR it, we can get reads from these 4 sequences. Let’s just focus on how we would map this one from the original top strand.
  35. (B) The methylation state of positions involving cytosines is determined by comparing the read sequence with the corresponding genomic sequence. Depending on the strand a read mapped against, this can involve looking for C-to-T (as shown here) or G-to-A substitutions.