EuroBioc 2018 - methylKit overview

methylKit: DNA methylation analysis from high-throughput bisulfite sequencing data
Alexander Gosdschan
PhD Student, Akalin Group, BIMSB MDC
bioinformatics.mdc-berlin.de
Bioconductor Europe Meeting 2018

Bisulfite Sequencing Workflow
Conversion (Cm = methylated cytosine):
5’-ACmGTAATCGAG-3’
  ↓ sodium bisulfite
5’-ACmGTAATUGAG-3’
  ↓ PCR
5’-ACmGTAATTGAG-3’
Workflow steps:
Sequence
Align to genome
Call methylation
Statistics on samples
Comparative analysis: Sample Correlation, Sample Clustering, Differential Analysis
Annotation
Segmentation
References on slide: Krueger & Andrews (2011); Akalin et al. (2012)
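The conversion shown on this slide can be mimicked in a few lines of base R. This is only an illustrative sketch: the helper `bisulfite_convert` and its `methylated` argument are made up for this example and are not part of methylKit.

```r
# Hypothetical helper (not part of methylKit): in-silico bisulfite conversion.
# Unmethylated C is read as T after bisulfite treatment and PCR;
# methylated C is protected and stays C.
bisulfite_convert <- function(seq, methylated = integer(0)) {
  bases <- strsplit(seq, "")[[1]]
  is_c <- bases == "C"
  protected <- seq_along(bases) %in% methylated
  bases[is_c & !protected] <- "T"
  paste(bases, collapse = "")
}

# Sequence from the slide; the C at position 2 is methylated.
converted <- bisulfite_convert("ACGTAATCGAG", methylated = 2)
print(converted)  # "ACGTAATTGAG"
```

After alignment, a C in the read at a reference C position counts as methylated and a T counts as unmethylated, which is what the freqC and freqT columns in the input files summarize.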

Read in data
##         chrBase   chr    base strand coverage freqC  freqT
## 1 chr21.9764539 chr21 9764539      R       12 25.00  75.00
## 2 chr21.9764513 chr21 9764513      R       12  0.00 100.00
## 3 chr21.9820622 chr21 9820622      F       13  0.00 100.00
## 4 chr21.9837545 chr21 9837545      F       11  0.00 100.00
## 5 chr21.9849022 chr21 9849022      F      124 72.58  27.42
from pre-called text files (e.g. cytosineReport or coverage files from the Bismark aligner):
methRead()
from Bismark BAM reads (supported through Rhtslib):
processBismarkAln()
Storage options:
in-memory: methylRaw / methylRawList
flat-file database (dbtype = "tabix"): methyl*DB, supported through Rsamtools
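As a minimal sketch of the in-memory route, the call below reads the four example call files that ship with methylKit. The sample IDs, the assembly label and the treatment vector follow the package's example data; for your own files they are placeholders to adapt.

```r
library(methylKit)

# Example call files shipped with the methylKit package
file.list <- list(
  system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
  system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
  system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
  system.file("extdata", "control2.myCpG.txt", package = "methylKit")
)

# Read all four samples into an in-memory methylRawList;
# treatment: 1 = test, 0 = control
myobj <- methRead(
  file.list,
  sample.id = list("test1", "test2", "ctrl1", "ctrl2"),
  assembly  = "hg18",
  treatment = c(1, 1, 0, 0),
  context   = "CpG"
)

# Peek at the first sample
head(myobj[[1]])
```

Passing `dbtype = "tabix"` to the same call should store the data as tabix-backed methyl*DB objects instead of holding them in memory.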

Summarize statistics on samples
getMethylationStats(methylRaw)
methylation statistics per base
summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00   20.00   82.79   63.17   94.74  100.00
percentiles:
       0%       10%       20%       30%       40%       50%       60%       70%
  0.00000   0.00000   0.00000  48.38710  70.00000  82.78556  90.00000  93.33333
      80%       90%       95%       99%     99.5%     99.9%      100%
 96.42857 100.00000 100.00000 100.00000 100.00000 100.00000 100.00000
getCoverageStats(methylRaw)
read coverage statistics per base
summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.00   16.00   26.00   34.45   39.00  630.00
percentiles:
     0%     10%     20%     30%     40%     50%     60%     70%     80%     90%
 10.000  11.000  14.000  17.000  20.000  26.000  30.000  36.000  42.000  60.000
    95%     99%   99.5%   99.9%    100%
 78.750 195.800 328.300 441.945 630.000
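To make the layout of these reports concrete, here is a base R sketch that produces the same kind of summary and percentile table from the five freqC values shown in the "Read in data" slide. It is an illustration of the statistics only, not a claim about how methylKit computes them internally.

```r
# Percent methylation (freqC) of the five example rows from the
# "Read in data" slide
freqC <- c(25.00, 0.00, 0.00, 0.00, 72.58)

# Summary and percentiles in the same layout as the reports above
meth_summary <- summary(freqC)
meth_percentiles <- quantile(
  freqC,
  probs = c(seq(0, 0.9, by = 0.1), 0.95, 0.99, 0.995, 0.999, 1)
)

print(meth_summary)
print(meth_percentiles)
```

With real data, a pile-up of values near 0 and 100 in these percentiles is the expected bimodal pattern of CpG methylation, while a long upper tail in the coverage percentiles can hint at PCR duplication.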

Segmentation
Segmenting the methylome into sections of CpGs with similar methylation profiles using change-point analysis, followed by clustering using a mixture modeling approach.
methSeg(object) - segmentation of GRanges, methylDiff, or methylRaw objects (supported through fastseg)
Figure: Comparison of features identified using methylKit change-point based segmentation on the human IMR90 methylome with published partially methylated domains (PMDs) identified with MethPipe (Song et al., 2013b), (Lister et al., 2009; Gaidatzis et al., 2014)
Wreczycka & Gosdschan (2017)

Compare Samples
Merge all samples into one object that contains only the base-pair locations covered in all samples:
unite(methylRawList) → methylBase
getCorrelation(methylBase) - sample correlation
clusterSamples(methylBase) - sample clustering
PCASamples(methylBase) - principal component analysis of samples
assocComp - batch effect correction
tileMethylCounts - tiling window analysis
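The comparison steps above can be sketched with the example objects that ship with methylKit. The sketch assumes that `data(methylKit)` provides `methylRawList.obj`, as in the package's own examples; the argument values are choices made for this illustration.

```r
library(methylKit)

# Example objects shipped with methylKit (loads methylRawList.obj, among others)
data(methylKit)

# Keep only the bases covered in every sample
meth <- methylKit::unite(methylRawList.obj, destrand = FALSE)

# Pairwise correlation of percent methylation between samples
getCorrelation(meth, plot = FALSE)

# Hierarchical clustering and PCA of the samples
clusterSamples(meth, dist = "correlation", method = "ward", plot = TRUE)
PCASamples(meth)
```

Calling `unite` with the `methylKit::` prefix avoids confusion with functions of the same name in other packages that may be attached in the same session.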

Differential Analysis
Testing for differential methylation using either Fisher’s exact test or a logistic regression model with a Chi-squared test (depending on the sample size per set), with p-value adjustment using the SLIM method (Wang, Tuominen, and Tsai 2011):
calculateDiffMeth(methylBase) → methylDiff
getMethylDiff - filtering differential bases
calculateDiffMeth(..., mc.cores=2) - use multiple cores
Optional correction for overdispersion if the data show more variability than the binomial distribution assumes:
calculateDiffMeth(methylBase, overdispersion="MN")
Covariates can be included in the analysis to separate their influence from the treatment effect via the logistic regression model. The test checks whether the full model fits better than the model with the covariates only:
covariates=data.frame(age=c(30,80,30,80))
calculateDiffMeth(methylBase, covariates=covariates)
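The two kinds of test named above can be illustrated for a single CpG in base R. This is a sketch of the statistical idea only, not methylKit's internal code, and the read counts are made up for the example; only the age values are taken from the slide.

```r
# Made-up counts for ONE CpG in four samples (illustration only)
counts <- data.frame(
  numCs     = c(40, 45, 10, 15),  # reads supporting methylated C
  numTs     = c(10,  5, 40, 35),  # reads supporting unmethylated C (read as T)
  treatment = c(1, 1, 0, 0),
  age       = c(30, 80, 30, 80)   # covariate values from the slide
)

# One sample per group: Fisher's exact test on the 2x2 count table
# (rows: first test and first control sample; columns: numCs, numTs)
fisher_p <- fisher.test(as.matrix(counts[c(1, 3), c("numCs", "numTs")]))$p.value

# Replicates: logistic regression, comparing the full model against
# the covariates-only model with a Chi-squared (likelihood ratio) test
full    <- glm(cbind(numCs, numTs) ~ treatment + age, family = binomial, data = counts)
reduced <- glm(cbind(numCs, numTs) ~ age, family = binomial, data = counts)
lrt_p   <- anova(reduced, full, test = "Chisq")[2, "Pr(>Chi)"]

print(c(fisher = fisher_p, logistic_lrt = lrt_p))
```

A small p-value from the model comparison means that treatment explains methylation differences beyond what age alone accounts for. In a genome-wide analysis, such a test is repeated for every base or region and the resulting p-values are then adjusted for multiple testing.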

Annotation
Use the genomation package to annotate differentially methylated regions/bases based on gene annotation.
Presentation on Friday: Session VI - Katarzyna Wreczycka
First load genomation and read the gene BED file:
library(genomation)
gene.obj=readTranscriptFeatures(system.file("extdata", "refseq.hg18.bed.txt", package="methylKit"))
Then get all differentially methylated bases:
myDiff25p=getMethylDiff(methylDiff, difference=25, qvalue=0.01)
Now annotate differentially methylated CpGs with promoter/exon/intron using the annotation data:
diffAnn=annotateWithGeneParts(as(myDiff25p,"GRanges"), gene.obj)
Finally visualize the annotation:
plotTargetAnnotation(diffAnn, precedence=TRUE, main="differential methylation annotation")

Acknowledgements
BIMSB: Altuna Akalin, Katarzyna Wreczycka
Bioconductor Team
Code:
- https://github.com/al2na/methylKit
Blog:
- http://zvfak.blogspot.com/search/label/methylKit
Support:
- https://groups.google.com/forum/#!forum/methylkit_discussion
