DNA Methylation Data Analysis

- Academia Sinica LSL NGS Workshop -
DNA Methylation Data Analysis
Yi-Feng Chang Ph.D.
Molecular Medicine Research Center, Chang Gung University
ianyfchang@mail.cgu.edu.tw
03-2118800 #3166 or #3528
2015/11/18

Outline
• DNA Methylation: Functions and Diseases
• Methods of Measuring DNA Methylation Status
• DNA Methylation Data Analysis
• A Case Study of DNA Methylation Data Analysis
• DNA Methylation Data Visualization

http://commonfund.nih.gov/epigenomics/figure.aspx

DNA Methylation: Functions and Diseases
Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat Biotechnol 28, 1057-1068, doi:10.1038/nbt.1685 (2010).

DNA Epigenetic Modifications in
Human Diseases
Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat Biotechnol 28, 1057-1068, doi:10.1038/nbt.1685 (2010).

DNA Methylation Pathway
Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).

DNA Demethylation Pathway
Moore, L.D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23-38 (2013).
• 5mC: 5-Methylcytosine
• 5hmC: 5-hydroxymethylcytosine
• 5hmU: 5-hydroxymethyluracil
• 5fC: 5-formylcytosine
• 5caC: 5-carboxycytosine
• Tet: Ten-eleven translocation enzymes
• AID/APOBEC: activation-induced cytidine deaminase/apolipoprotein B mRNA-editing enzyme complex
• TDG: Thymine DNA glycosylase
• SMUG1: Single-strand-selective monofunctional uracil-DNA glycosylase 1

Methods of Measuring DNA Methylation Status

Timeline of Technologies for Studying DNA
Methylation
COBRA: Combined Bisulfite Restriction Analysis
AP-PCR: Methylation-Sensitive Arbitrarily Primed PCR
AIMS: DNA methylation by amplification of intermethylated sites
RRBS: Reduced representation bisulfite sequencing
MS-HRM: Methylation-sensitive high resolution melting
MeDIP-Seq: Methylated DNA immunoprecipitation sequencing
MethylC-Seq/BS-Seq: Bisulfite sequencing
TAB-Seq: Tet-assisted BS-Seq
MAB-Seq: M.SssI methylase-assisted BS-Seq
Harrison, A. & Parle-McDermott, A. DNA methylation: a timeline of methods and applications. Front Genet 2, 74 (2011).

The Steps to Determining the Methylation Status
of Cytosine in a Known DNA Sequence by The
Bisulfite Conversion Method
Singal, R. & Ginder, G.D. DNA Methylation. Blood Journal 93, 4059-4070 (1999).

Techniques for Genome-Wide Sequencing of Cytosine Methylation Sites
Figure labels: Genomic DNA; Deep Sequencing
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).

Techniques for Enrichment of Methylated or Target Regions Prior to BS-Seq
Figure labels: Genomic DNA; Deep Sequencing
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).

Approaches for Detecting Active DNA
Demethylation at Single Base Resolution
TAB-Seq: Tet-assisted BS-Seq
Yu, M. et al. Tet-assisted bisulfite sequencing of 5-
hydroxymethylcytosine. Nat Protoc 7, 2159-70 (2012).
Yu, M. et al. Base-resolution analysis of 5-
hydroxymethylcytosine in the mammalian genome. Cell 149,
1368-80 (2012).
MAB-Seq: M.SssI methylase-assisted BS-Seq
Wu, H., Wu, X., Shen, L. & Zhang, Y. Single-base resolution analysis of active DNA
demethylation using methylase-assisted bisulfite sequencing. Nat Biotechnol 32,
1231-40 (2014).

Key Metrics of the Technology Comparison
Beck, S. Taking the measure of the methylome. Nat Biotechnol 28, 1026-8 (2010).
The Infinium HumanMethylation450 (450K) array contains approximately 480k CpG sites, covering 99% of RefSeq genes (hg19) and 96% of CpG islands (CGIs).

Genomic Coverage of MeDIP-seq, MethylCap-seq,
RRBS and Infinium
Bock, C. et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28, 1106-14 (2010).
MeDIP-seq and MethylCap-seq provide broad coverage of the genome, whereas RRBS and Infinium are more restricted to CpG islands and promoter regions.

Common Base Resolution Methylation Sequencing Platforms
Sun, Z., Cunningham, J., Slager, S. & Kocher, J. P. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis. Epigenomics 7, 813-828, doi:10.2217/epi.15.21 (2015).

WGBS Coverage Depth vs Replicates
• Using several high-coverage reference data sets to experimentally
determine minimal sequencing requirements
Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite
sequencing. Nat Methods 12, 230-232, 231 p following 232, doi:10.1038/nmeth.3152 (2015).

WGBS Coverage Depth vs Replicates
• For DMR identification
• Per-sample coverage in the range of 5–15×, depending on the magnitude of methylation differences
between the groups and whether a smoothing or single CpG-based DMR identification strategy is
used
• To identify long DMRs with large methylation differences, we find that reducing coverage down to 1×
or 2× per sample is acceptable
• Biological replicates should be analyzed separately to increase power, as opposed to being pooled
together for analysis
• The authors strongly argue for the use of at least two separate biological replicates for DMR analysis
• Choosing an appropriate number of biological replicates is a complex issue influenced by the degree
of within-group heterogeneity, the magnitude of between-group differences and the presence of
confounding factors such as batch effects.
Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite
sequencing. Nat Methods 12, 230-232, 231 p following 232, doi:10.1038/nmeth.3152 (2015).

DNA Methylation Data Analysis

Effect and Problems of Bisulfite Treatment of DNA
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
Mapping bisulfite reads to 4 possible bisulfite strands (OT/CTOT/OB/CTOB) is
equivalent to mapping the bisulfite read and its reverse complementary
read to both Top/Bottom strands of the original reference sequence.
OT, original top strand; CTOT, strand complementary to the original top
strand; OB, original bottom strand; and CTOB, strand complementary to the
original bottom strand.
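The equivalence described above depends on taking the reverse complement of each read. A minimal shell sketch, assuming an uppercase A/C/G/T read and the standard `rev` and `tr` utilities; the helper name `revcomp` and the example read are illustrative and not part of any tool used in this workshop:

```shell
# Reverse-complement a DNA sequence (illustrative helper, not from MethPipe or Bismark).
revcomp() {
  # rev reverses the string; tr swaps each base for its complement.
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Hypothetical bisulfite read; the result is what would be aligned to the opposite strand.
revcomp "TTGTATG"
```

Aligning both the read and its reverse complement to the top and bottom reference strands then covers all four bisulfite strands.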

How to Align BS Reads Against Reference Genome?
Bock, C. Analysing and interpreting DNA methylation data. Nat Rev Genet 13, 705-19 (2012)
TCGA TCGT ACGT ATGA
TTGT ATGTTCGA ATGA
BS-Seq reads

Procedure to Perform Three-Letter Alignment
Krueger, F. & Andrews, S.R. Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics (2011).

Three-Letter Alignment
Multiple hits
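The core of three-letter alignment is an in silico conversion applied to both reads and reference, so that methylation state no longer causes mismatches. A minimal sketch of the two conversions, assuming uppercase sequences; the function names `c2t` and `g2a` and the example read are hypothetical, and real aligners such as Bismark perform this step internally:

```shell
# In silico bisulfite conversion for three-letter alignment (sketch only).
# c2t: fully converted forward-strand version, every C becomes T.
c2t() { printf '%s\n' "$1" | tr 'C' 'T'; }
# g2a: the same conversion seen from the opposite strand, every G becomes A.
g2a() { printf '%s\n' "$1" | tr 'G' 'A'; }

read_seq="TCGATCGTACGT"   # hypothetical read
c2t "$read_seq"
g2a "$read_seq"
```

Because the converted sequences use only three letters, more reads map to multiple locations, which is the "multiple hits" problem noted on this slide.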

Wild-Card Alignment
Convert C/T to Y
Multiple hits
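In wild-card alignment the reference C is treated as a wild card (Y) that accepts either C (methylated) or T (converted) in the read. A minimal sketch for forward-strand reads only, assuming uppercase sequences of equal length; `wildcard_match` is a hypothetical helper built on `sed` and `grep`, not the algorithm used by any particular aligner:

```shell
# Wild-card matching sketch: a reference C may pair with C or T in the read.
wildcard_match() {
  # Turn every reference C into the character class [CT] (the "Y" wild card).
  pattern=$(printf '%s' "$1" | sed 's/C/[CT]/g')
  # Succeed only if the whole read matches the wild-card pattern.
  printf '%s\n' "$2" | grep -Eq "^${pattern}\$"
}

wildcard_match "ACGTC" "ATGTC" && echo "match"
wildcard_match "ACGTC" "AAGTC" || echo "no match"
```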

Wild-Card Alignments have Better Accuracy
but Poor Running Time
http://smithlabresearch.org/manuals/rmap_manual.pdf

Workflow for Analyzing BS-Seq data
Krueger, F., Kreck, B., Franke, A. & Andrews, S.R. DNA methylome analysis using
short bisulfite sequencing data. Nat Methods 9, 145-51 (2012).
http://omictools.com/bisulfite-seq/

A Case Study of DNA Methylation Data Analysis

Turn off PowerPoint Smart Quote

Required Software in Your Laptop
• Mac OS X Terminal
• Applications → Utilities → Terminal
• Linux console
• PuTTY:
http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
• SCP/SFTP/FTP client
• WinSCP: http://winscp.net/download/winscp556.zip
• PDF viewer
• http://get.adobe.com/tw/reader/
• R
• https://cran.r-project.org/

Required R Packages
• Bioconductor
• http://www.bioconductor.org/install/#install-bioconductor-packages
• methylKit:
• https://github.com/al2na/methylKit
> R
# dependencies
> install.packages( c("data.table","devtools"))
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("GenomicRanges","IRanges"))
# install the development version from github
> library(devtools)
> install_github("al2na/methylKit",build_vignettes=FALSE)

Analysis Pipeline
Steps in processing order, each followed by its tools:
• Quality Trimming: fastq_masker
• Mapping: walt
• Sorting mr files
• Remove Duplicate Reads: duplicate-remover
• Sorting mr files
• Bisulfite Conversion Rate: bsrate
• Methylation Calling: methcounts
• Hypo/Hyper-Methylation Regions: hmr, hyperhmr, pmr
• Large Hypo/Hyper-Methylation Domains: pmd
• Differential Methylation Region: dmr
• Allele-specific Methylated Regions: amrfinder, allelicmeth
• Generate Methylation BED file: Bedtools, bedGraphToBigWig
• Calculating Methylation Ratio for Regions: bigWigAverageOverBed, roimethstat, bwtool
• Cross-species Comparison of Methylomes: liftOver
Tool sources:
• fastx toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
• MethPipe: http://smithlabresearch.org/software/methpipe/
• MethPipe manual: http://smithlabresearch.org/downloads/methpipe-manual.pdf
• Bedtools: https://github.com/arq5x/bedtools2
• Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64
• bwtool: https://github.com/CRG-Barcelona/bwtool/wiki

Public BS-Seq Datasets
http://smithlabresearch.org/software/methbase/
Other species in NCBI GEO Database
• Glycine max (Soy beans)
• Schistocerca gregaria (Locust)
• Rattus norvegicus (Rat)
• Danio rerio (Zebra fish)
• Drosophila melanogaster (Fruit fly)
• Oryza sativa (Rice)
• Macaca mulatta (Rhesus monkey)
• Mus musculus domesticus (Western European house mouse)
• Xenopus (Silurana) tropicalis (Frog)
• Cynoglossus semilaevis (Tongue sole, bony fish)
• Bombyx mori (Silkworm)
• Harpegnathos saltator (Jerdon's jumping ant)
• Camponotus floridanus (Florida carpenter ant)

Datasets used in This Case Study
H1 (male): human embryonic stem cells (107GB)
IMR90 (female): fetal lung fibroblasts (154GB)
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16256

Convert SRA to FASTQ (Example ONLY)
# sra-toolkit can be downloaded from https://github.com/ncbi/sratoolkit
> fastq-dump --split-3 SRR018975.sra
> ls
SRR018975.fastq

DEMO Files
> cd /work3/LSLNGSDNAMETH
> ls -alh
total 12G
drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 16 00:29 .
drwxrwxrwt 109 root root 4.0K Nov 15 14:10 ..
-rwxr-xr-x 1 u00gel00 u00ycm02 65K Nov 15 17:22 h1.chrX.hmr
-rwxr-xr-x 1 u00gel00 u00ycm02 4.6G Nov 15 14:51 h1.chrX.mr.dremove
-rwxr-xr-x 1 u00gel00 u00ycm02 9.8K Nov 15 17:22 h1.chrX.pmd
-rwxr-xr-x 1 u00gel00 u00ycm02 34M Nov 15 17:39 h1.chrX_CpG.meth
-rwxr-xr-x 1 u00gel00 u00ycm02 39M Nov 15 23:52 h1.chrX_CpG.meth.for.methylKit
-rwxr-xr-x 1 u00gel00 u00ycm02 161K Nov 15 17:22 h1_gt_imr90.chrX.dmr
-rwxr-xr-x 1 u00gel00 u00ycm02 45M Nov 15 17:22 h1_imr90.chrX.methdiff
-rwxr-xr-x 1 u00gel00 u00ycm02 55K Nov 15 17:22 h1_lt_imr90.chrX.dmr
-rwxr-xr-x 1 u00gel00 u00ycm02 194K Nov 15 17:22 imr90.chrX.hmr
-rwxr-xr-x 1 u00gel00 u00ycm02 7.3G Nov 15 14:52 imr90.chrX.mr.dremove
-rwxr-xr-x 1 u00gel00 u00ycm02 5.6K Nov 15 17:22 imr90.chrX.pmd
-rwxr-xr-x 1 u00gel00 u00ycm02 35M Nov 15 17:39 imr90.chrX_CpG.meth
-rwxr-xr-x 1 u00gel00 u00ycm02 40M Nov 15 23:52 imr90.chrX_CpG.meth.for.methylKit
drwxr-xr-x 6 u00gel00 u00ycm02 4.0K Nov 15 14:28 methpipe-3.3.1
drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 15 14:46 methpipe-data

Quality Trimming and Split FASTQ Files into Smaller
Files (Example ONLY)
#e.g. SRR018975.fastq.gz
# loop over all gzip files one by one
> for f in *.gz;
do
# strip the .gz suffix, e.g. SRR018975.fastq
b=`basename $f .gz`;
echo $f
# the submitted job uncompresses the gzip file to stdout,
# masks low-quality bases (below Q30) as N,
# then splits the FASTQ file into smaller files of 6,000,000 lines each
bsub -q 4G -o $f.stdout -e $f.stderr "
gzip -dc $f|
fastq_masker -q 30 -Q33|
split -dl 6000000 - $b- ";
done
> ls
SRR018975.fastq-00
SRR018975.fastq-01
SRR018975.fastq-02
…
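A FASTQ record spans four lines, so the `split -dl 6000000` chunk size above corresponds to 1.5 million reads per file, and any chunk size should be a multiple of four so that no record is cut in half. A small sketch of that arithmetic; `reads_per_chunk` is a hypothetical helper, not part of the fastx toolkit:

```shell
# A FASTQ record spans 4 lines, so a chunk of N lines holds N/4 reads.
reads_per_chunk() {
  lines="$1"
  # Refuse chunk sizes that would split a record across two files.
  if [ $(( lines % 4 )) -ne 0 ]; then
    echo "chunk size must be a multiple of 4" >&2
    return 1
  fi
  echo $(( lines / 4 ))
}

reads_per_chunk 6000000
```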

Mapping BS-Seq
FASTQ Files
(Example ONLY)
> export AdapterTrich=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
> export AdapterArich=CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
> bsub -q 4G -o rmapbs.stdout -e rmapbs.stderr "
/work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe \
-c /work3/LSLNGSDNAMETH/methpipe-data/data/genome \
-o /work3/USERNAME/Output/test.mr \
-m 3 -L 400 -C $AdapterTrich:$AdapterArich \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_1.fq \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_2.fq"
> /work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe
Usage: rmapbs-pe [OPTIONS] <fastq-reads-file>
Options:
-o, -output output file name
-c, -chrom chromosomes in FASTA file or dir
-T, -start index of first read to map
-N, -number number of reads to map
-s, -suffix suffix of chrom files (assumes dir provided)
-m, -mismatch maximum allowed mismatches
-M, -max-map maximum allowed mappings for a read
-C, -clip clip the specified adaptor
-L, -fraglen max fragment length
-suffix-len Suffix length of reads name
-v, -verbose print more run info
Help options:
-?, -help print this help message
-about print about message

Example Output of imr90 chrX
> head -n 30 /work3/LSLNGSDNAMETH/imr90.chrX.mr.dremove |column
MR Format
•RNAME (chromosome name)
•SPOS (start position, 0-based)
•EPOS (end position, 0-based)
•QNAME (read name)
•MISMATCH (number of mismatches)
•STRAND (forward or reverse strand)
•SEQ
•QUAL

Remove Duplicates (Example ONLY)
> export PATH=$PATH:/pkg/biology/methpipe/methpipe-3.3.1/bin/
> bsub -q 16G -o stdout -e stderr "
LC_ALL=C sort -S 14G -k 1,1 -k 2,2n -k 3,3n -k 6,6 \
-o /work3/USERNAME/h1.chrX.mr.sorted_start \
/work3/LSLNGSDNAMETH/h1.chrX.mr;
duplicate-remover -S /work3/USERNAME/h1.chrX_dremove_stat.txt \
-o /work3/USERNAME/h1.chrX.mr.dremove \
/work3/USERNAME/h1.chrX.mr.sorted_start "
> cat stdout
Successfully completed.
Resource usage summary:
CPU time : 343.80 sec.
Max Processes : 3
Max Threads : 4
> cat /work3/USERNAME/h1.chrX_dremove_stat.txt
TOTAL READS IN: 24350707
GOOD BASES IN: 1987943796
TOTAL READS OUT: 22884736
GOOD BASES OUT: 1867152730
DUPLICATES REMOVED: 1465971
READS WITH DUPLICATES: 1219174
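The statistics above are internally consistent: TOTAL READS IN minus TOTAL READS OUT equals DUPLICATES REMOVED (24350707 - 22884736 = 1465971). A short sketch that turns the two read totals into a duplication percentage; `dup_percent` is a hypothetical helper, not a MethPipe program:

```shell
# Percentage of reads removed as duplicates, from the reads-in and reads-out totals.
dup_percent() {
  awk -v in_reads="$1" -v out_reads="$2" \
    'BEGIN { printf "%.2f\n", 100 * (in_reads - out_reads) / in_reads }'
}

# Totals taken from h1.chrX_dremove_stat.txt above.
dup_percent 24350707 22884736
```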

Computing single-site methylation levels (Example Only)
# sorting again for methylated CpG analysis
> bsub -q 16G -o stdout -e stderr "
LC_ALL=C sort -S 14G -k 1,1 -k 3,3n -k 2,2n -k 6,6 \
-o /work3/USERNAME/h1.chrX.mr.sorted_end_first \
/work3/LSLNGSDNAMETH/h1.chrX.mr.dremove"
# methylation calling
> methcounts -c /work3/LSLNGSDNAMETH/hg18 \
-o /work3/USERNAME/h1.chrX.meth \
/work3/USERNAME/h1.chrX.mr.sorted_end_first
# extract CpG sites
> symmetric-cpgs \
-o /work3/USERNAME/h1.chrX_CpG.meth /work3/USERNAME/h1.chrX.meth
chrX 152 + CpG 0 0
chrX 232 + CpG 0 0
chrX 330 + CpG 0 0
chrX 334 + CpG 0 0
chrX 336 + CpG 0 0
chrX 364 + CpG 0 0
chrX 366 + CpG 0 0
chrX 374 + CpG 0 0
chrX 376 + CpG 0 0
Column 5: methylation ratio; column 6: read count
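Given the per-site methylation ratio (column 5) and read count (column 6), a coverage-weighted mean methylation level can be computed across sites. A minimal sketch, assuming whitespace-separated columns in the layout shown above; `weighted_meth` is a hypothetical helper and the three input rows are invented for illustration, not taken from the workshop data:

```shell
# Coverage-weighted mean methylation over sites with at least one read.
# Prints: number of covered sites, then the weighted mean level.
weighted_meth() {
  awk '$6 > 0 { meth += $5 * $6; reads += $6; sites++ }
       END { if (reads > 0) printf "%d\t%.4f\n", sites, meth / reads }' "$@"
}

# Hypothetical rows: two covered sites and one site without reads.
printf 'chrX\t100\t+\tCpG\t0.5\t10\nchrX\t200\t+\tCpG\t1\t10\nchrX\t300\t+\tCpG\t0\t0\n' | weighted_meth
```

Weighting by read count keeps poorly covered sites from dominating the average; sites with zero reads are skipped because their ratio carries no information.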

Computation of methylation level statistics
(Example ONLY)
> levels -o /work3/USERNAME/Output/h1.chrX.levels \
/work3/USERNAME/h1.chrX.meth

Estimating bisulfite conversion rate
> bsrate -c /work3/LSLNGSDNAMETH/hg18 \
-o /work3/USERNAME/Output/h1.chrX.bsrate \
/work3/LSLNGSDNAMETH/h1.chrX.mr.dremove
> head -n 16 /work3/USERNAME/Output/h1.chrX.bsrate
OVERALL CONVERSION RATE = 0.980192
POS CONVERSION RATE = 0.980204 96942555
NEG CONVERSION RATE = 0.980179 96821402
BASE PTOT PCONV PRATE NTOT NCONV NRATE BTHTOT BTHCONV BTHRATE ERR ALL ERRRATE
1 1798190 1762518 0.98016 1796291 1760655 0.98016 3594481 3523173 0.98016 36327 3630808 0.01001
2 1654252 1617801 0.97797 1649805 1613025 0.97771 3304057 3230826 0.97784 41299 3345356 0.01235
3 1646403 1615036 0.98095 1644710 1613525 0.98104 3291113 3228561 0.98099 48231 3339344 0.01444
4 1699787 1666286 0.98029 1695105 1662078 0.98052 3394892 3328364 0.98040 50697 3445589 0.01471
5 1663363 1631006 0.98055 1658397 1626045 0.98049 3321760 3257051 0.98052 52464 3374224 0.01555
6 1720978 1687130 0.98033 1716036 1682351 0.98037 3437014 3369481 0.98035 45366 3482380 0.01303
7 1677561 1644979 0.98058 1677119 1644343 0.98046 3354680 3289322 0.98052 53873 3408553 0.01581
8 1714426 1681206 0.98062 1714378 1681339 0.98073 3428804 3362545 0.98068 34491 3463295 0.00996
9 1702891 1668424 0.97976 1700092 1665742 0.97980 3402983 3334166 0.97978 34861 3437844 0.01014
10 1681522 1648092 0.98012 1680471 1647068 0.98012 3361993 3295160 0.98012 45776 3407769 0.01343
11 1664207 1631036 0.98007 1664386 1631083 0.97999 3328593 3262119 0.98003 46055 3374648 0.01365
12 1651326 1618334 0.98002 1649370 1616514 0.98008 3300696 3234848 0.98005 44139 3344835 0.01320
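The rate columns in this table can be reproduced from the count columns: PRATE is PCONV divided by PTOT, and ERRRATE is ERR divided by ALL. A small sketch that rechecks the first row; `conv_rate` is a hypothetical helper, not part of MethPipe:

```shell
# Ratio of two counts, printed with five decimals as in the bsrate table.
conv_rate() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.5f\n", part / total }'
}

# Row for base 1 above: PCONV / PTOT, then ERR / ALL.
conv_rate 1762518 1798190
conv_rate 36327 3630808
```

A per-position check like this helps spot read positions where the conversion rate drops or the error rate rises.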

Hypomethylated (hmr) and hypermethylated
(hypermr)
> hmr -o /work3/USERNAME/h1.chrX.hmr /work3/USERNAME/h1.chrX_CpG.meth
> pmd -o /work3/USERNAME/h1.chrX.pmd /work3/USERNAME/h1.chrX_CpG.meth
chrX 2727656 2728600 HYPO0 18 +
chrX 2731108 2731952 HYPO1 14 +
chrX 2732390 2733303 HYPO2 23 +
chrX 2740632 2740962 HYPO3 9 +
chrX 2756524 2758153 HYPO4 139 +
chrX 2817685 2817980 HYPO5 8 +
chrX 2855757 2857708 HYPO6 127 +
chrX 2890571 2890884 HYPO7 9 +
chrX 3004371 3004626 HYPO8 9 +
chrX 3238227 3238677 HYPO9 9 +
Column 5: number of CpGs in the region
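Because the HMR output is BED-like, region lengths follow directly from the start and end columns. A minimal sketch that lists each region's name, length in bp and CpG count, assuming whitespace-separated columns as shown above; `hmr_lengths` is a hypothetical helper:

```shell
# Print region name, length (end - start) and CpG count for each HMR.
hmr_lengths() {
  awk '{ printf "%s\t%d\t%d\n", $4, $3 - $2, $5 }' "$@"
}

# First HMR row from the output above.
printf 'chrX\t2727656\t2728600\tHYPO0\t18\t+\n' | hmr_lengths
```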

Differential Methylation Analysis
> methdiff -o /work3/USERNAME/h1_imr90.chrX.methdiff \
/work3/LSLNGSDNAMETH/h1.chrX_CpG.meth /work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth
chrX 2709681 + CpG 0.749276 7 2 12 7
chrX 2709727 + CpG 0.917633 4 1 9 12
chrX 2709774 + CpG 0.894737 3 1 6 10
chrX 2709871 + CpG 0.742424 0 16 0 48
chrX 2709890 + CpG 0.857575 3 20 3 47
chrX 2709982 + CpG 0.999354 10 2 7 19
chrX 2710014 + CpG 0.704043 3 6 3 10
chrX 2710023 + CpG 0.600782 4 3 4 4
chrX 2710146 + CpG 0.523077 1 2 8 14
chrX 2710155 + CpG 0.234026 3 3 17 9
Column 5: probability; columns 6-9: Sample A un-meth, Sample A meth, Sample B un-meth, Sample B meth

Differentially methylated regions (DMR)
> dmr /work3/LSLNGSDNAMETH/h1_imr90.chrX.methdiff \
/work3/LSLNGSDNAMETH/h1.chrX.hmr /work3/LSLNGSDNAMETH/imr90.chrX.hmr \
h1_lt_imr90.chrX.dmr h1_gt_imr90.chrX.dmr
==> h1_lt_imr90.chrX.dmr <==
chrX 2727656 2728600 X:18 10 +
chrX 2731108 2731952 X:15 4 +
chrX 2732390 2733303 X:37 8 +
chrX 2740632 2740962 X:9 0 +
chrX 2758131 2758153 X:3 0 +
chrX 2817685 2817980 X:9 0 +
chrX 2855757 2855890 X:1 1 +
chrX 2890571 2890884 X:9 4 +
chrX 3004371 3004626 X:9 0 +
chrX 3238227 3238677 X:24 0 +
==> h1_gt_imr90.chrX.dmr <==
chrX 2825454 2826947 X:37 17 +
chrX 2857708 2857760 X:2 0 +
chrX 3272822 3273033 X:13 3 +
chrX 3275527 3275594 X:1 0 +
chrX 3287038 3289160 X:36 9 +
chrX 3643168 3643374 X:7 0 +
chrX 4016033 4022054 X:47 29 +
chrX 4028369 4042000 X:79 54 +
chrX 4051286 4059878 X:52 39 +
chrX 4079778 4087714 X:45 26 +
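h1_lt_imr90.chrX.dmr: methylation level lower in H1 than in IMR90
h1_gt_imr90.chrX.dmr: methylation level lower in IMR90 than in H1
Column 4 (X:n): number of CpGs in the region
Column 5: number of significantly differentially methylated CpGs
# keep DMRs with at least 10 CpGs, of which at least 5 are significantly differentially methylated
> awk -F "[:\t]" '$5 >= 10 && $6 >= 5 {print $0}' h1_lt_imr90.chrX.dmr > \
h1_lt_imr90.chrX.dmr.filtered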
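The DMR filter splits each line on both tabs and colons, so the "X:18" field becomes two fields and the CpG count lands in field 5 while the number of significant CpGs lands in field 6. A small sketch applying the same thresholds to the first three rows of h1_lt_imr90.chrX.dmr shown above, assuming tab-separated input; `filter_dmr` is a hypothetical wrapper:

```shell
# Keep DMRs with >= 10 CpGs (field 5) and >= 5 significant CpGs (field 6).
# The field separator is a tab or a colon, which splits "X:18" into "X" and "18".
filter_dmr() {
  awk -F '[:\t]' '$5 >= 10 && $6 >= 5' "$@"
}

printf 'chrX\t2727656\t2728600\tX:18\t10\t+\nchrX\t2731108\t2731952\tX:15\t4\t+\nchrX\t2732390\t2733303\tX:37\t8\t+\n' | filter_dmr
```

Only the first and third rows pass; the second has 15 CpGs but just 4 significant ones.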

Other Utilities
• DM analysis of two groups of DNA methylomes
• Robinson, M. D. et al. Statistical methods for detecting differentially
methylated loci and regions. Frontiers in genetics 5, 324,
doi:10.3389/fgene.2014.00324 (2014).
• Allele-specific methylation
• allelicmeth
• amrfinder: http://smithlabresearch.org/software/amrfinder/
• Estimate hydroxymethylation(5hmC) and methylation (5mC)
levels from BS-seq, oxBS-seq and TAB-seq
• mlml: http://smithlabresearch.org/software/mlml/

DNA Methylation Data Visualization

R Packages: methylKit
The following examples were adapted from the methylKit tutorials
• Akalin, A. et al. methylKit: a comprehensive R package for the
analysis of genome-wide DNA methylation profiles. Genome Biol
13, R87, doi:10.1186/gb-2012-13-10-r87 (2012).
• Tutorial: http://methylkit.googlecode.com/files/methylKitTutorial_feb2012.pdf
• Tutorial Slide: http://methylkit.googlecode.com/files/methylKitTutorialSlides_2013.pdf

Convert MethPipe meth Format to methylKit Format
Id chr base strand coverage freqC freqT
Chr21.9764539 chr21 9764539 R 12 25.00 75.00
Chr21.9764513 chr21 9764513 R 12 0.00 100.00
Chr21.9820622 chr21 9820622 F 13 0.00 100.00
Chr21.9837545 chr21 9837545 F 11 0.00 100.00
Chr21.9849022 chr21 9849022 F 124 72.58 27.42
Chr21.9853326 chr21 9853326 F 17 70.59 29.41
# keep sites with at least one read; freqC and freqT are percentages
> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1,
$2, "F", $6, $5, (100-$5)}' /work3/LSLNGSDNAMETH/h1.chrX_CpG.meth > \
/work3/USERNAME/Output/h1.chrX_CpG.meth.for.methylKit
> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1,
$2, "F", $6, $5, (100-$5)}' /work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth > \
/work3/USERNAME/Output/imr90.chrX_CpG.meth.for.methylKit
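The conversion above truncates the methylation percentage to an integer with `int()`. If two decimals are preferred, as in the example methylKit table on this slide, a variant along the following lines could be used. This is a sketch under assumptions: `meth_to_methylkit` is a hypothetical helper, the input is whitespace-separated in the column layout shown earlier, the strand is always written as "F" as in the original command, and the input row is invented for illustration:

```shell
# Convert methcounts-style rows to methylKit-style rows, keeping two decimals.
# Output columns: id, chr, base, strand, coverage, freqC, freqT.
meth_to_methylkit() {
  awk '$6 > 0 {
         c = $5 * 100   # methylation ratio as a percentage
         printf "%s.%s\t%s\t%s\tF\t%d\t%.2f\t%.2f\n", $1, $2, $1, $2, $6, c, 100 - c
       }' "$@"
}

# Hypothetical input row: ratio 0.7258 with 124 reads.
printf 'chrX\t100\t+\tCpG\t0.7258\t124\n' | meth_to_methylkit
```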

Read Methylation Files into methylKit Objects
> library(methylKit)
# load methylation files (change to your datasets)
> file.list=list(
system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
system.file("extdata", "control2.myCpG.txt", package = "methylKit") )
# read the files to a methylRawList object: myobj
> myobj=read( file.list, sample.id=list("test1", "test2","ctrl1","ctrl2"),
assembly="hg18",treatment=c(1,1,0,0))
> head(myobj)

Get descriptive stats on methylation
> png("test1.png",width=600,height=600)
> getMethylationStats(myobj[[1]],plot=T,both.strands=F)
> dev.off()
null device 1
> png("control1.png",width=600,height=600)
> getMethylationStats(myobj[[3]],plot=T,both.strands=F)
> dev.off()
null device 1

Sample Correlation
# getCorrelation works on the merged table of bases covered in all samples
> meth=unite(myobj)
> png("correlation.png",width=1000,height=1000)
> getCorrelation(meth, plot = T)
test1 test2 ctrl1 ctrl2
test1 1.0000000 0.9252530 0.8767865 0.8737509
test2 0.9252530 1.0000000 0.8791864 0.8801669
ctrl1 0.8767865 0.8791864 1.0000000 0.9465369
ctrl2 0.8737509 0.8801669 0.9465369 1.0000000
> dev.off()

Get bases covered by all samples and cluster
samples
# merge all samples to one table by using base-pair locations that are covered in all samples
> meth=unite(myobj)
# cluster all samples using correlation distance and plot hierarchical clustering
> png("cluster.png", width=600, height=600)
> hc = clusterSamples(meth, dist="correlation", method="ward", plot=T)
> dev.off()
> png("pca.png", width=600,height=600)
> PCASamples(meth)
> dev.off()

Calculate differential methylation
# calculate differential methylation p-values and q-values
> myDiff=calculateDiffMeth(meth)
# get differentially methylated regions with 25% difference and qvalue < 0.01
> myDiff25p=get.methylDiff(myDiff,difference=25,qvalue=0.01)
# get differentially hypo methylated regions with 25% difference and qvalue<0.01
> myDiff25pHypo =get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
# get differentially hyper methylated regions with 25% difference and qvalue<0.01
> myDiff25pHyper=get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")

Differential methylation events per chromosome
> png("meth_event.png",width=600,height=600)
> diffMethPerChr(myDiff, plot = T, qvalue.cutoff = 0.01,meth.cutoff = 25)
> dev.off()

Annotate Differentially Methylated Bases/Regions
# read-in transcript locations to be used in annotation
> gene.obj=read.transcript.features(system.file("extdata", "refseq.hg18.bed.txt", package =
"methylKit"))
# annotate differentially methylated Cs with promoter/exon/intron using annotation data
> annotate.WithGenicParts(myDiff25p,gene.obj)

Annotating Differential Methylation Events around
CpG Islands
> cpg.obj = read.feature.flank(system.file("extdata", "cpgi.hg18.bed.txt", package =
"methylKit"),feature.flank.name = c("CpGi", "shores"))
> diffCpGann = annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi, cpg.obj$shores,
feature.name = "CpGi",flank.name = "shores")

https://www.gitbook.com/book/ycl6/methylation-sequencing-analysis/details
Dr. I-Hsuan Lin, NYMU
