Visualizing and identifying the
DNA methylation markers in
breast cancer tumor subtypes
By
Islam Ibrahim Amin
The 5th International Conference on Innovations in Bio-Inspired
Computing and Applications, June 23-25, 2014
Scientific Research Group in Egypt
www.egyptscience.net
Agenda
 Introduction
 DNA Methylation Analysis
 Non-specific Filtering
 Specific Filtering
 Formal Concept Analysis
 Applied FCA for Breast Cancer Subtypes
 Formal Context
 Concept Lattice
 Conclusion and future work
Introduction
DNA (Deoxyribonucleic acid)
Introduction
Central Dogma
Introduction
 DNA Methylation
 DNA methylation plays a very important
role in regulating gene expression, and
abnormal methylation is associated with
many diseases such as cancers.
Methylation in a specific region of a gene,
called the promoter, can prevent the gene
from being expressed. Methylation occurs
at specific sites in the genome called CpG
sites (CpGs); the "p" in CpG refers to the
phosphodiester bond between the cytosine
(C) and the guanine (G).
 Hypermethylated means that a region has
become more methylated, whereas
hypomethylated means that a region is
less methylated.
Introduction
Cancer
 Cancer refers to a group of
diseases characterized by
uncontrolled cell growth.
 There are two main types of
genes that play a role in cancer
development:
 Oncogenes tell cells when to
divide.
 Tumor suppressor genes tell cells
when not to divide.
Hypermethylated: tumor suppressor genes
Hypomethylated: oncogenes
Introduction
Measuring DNA Methylation
How can we measure DNA methylation?
Introduction
 Illumina High-Throughput Arrays
 On the chip, there are two bead types for each CpG site per locus.
 One bead type corresponds to the methylated cytosine locus and the other
corresponds to the unmethylated cytosine locus. This type of probe is known as an
allele-specific oligonucleotide.
Introduction
 Breast cancer can be classified according to receptor status.
Receptors are proteins found on the surface of a cell, in the
cytoplasm, and in the nucleus. They play an important role in
receiving chemical signals from outside the cell and relaying
them into the cell.
 Breast Cancer Subtypes:
 Basal-like
 ERBB2+
 Luminal A
 Luminal B
Introduction
 We analyze DNA methylation data from 28 paired samples of
breast cancer subtypes. The normal tissue is located at least 2
centimeters away from the site of the tumor. The methylation data
reported in this paper have been previously deposited in NCBI's
Gene Expression Omnibus (GEO) and are accessible through
GEO Series accession number GSE22135. The methylation level
is measured as a continuous value ranging from zero
(completely unmethylated) to one (completely methylated).
Introduction
 DNA Methylation Data
 The methylation level is measured as a continuous value ranging from zero
(completely unmethylated) to one (completely methylated).
 It is reasonable to suppose that samples with a methylation value greater than |0.2|
are candidate methylation markers.
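The zero-to-one scale is usually derived from the intensities of the two bead types. The sketch below assumes the commonly cited Illumina beta-value definition, with a constant offset of 100 in the denominator. The function name, the intensity values, and the offset are illustrative assumptions and are not taken from these slides.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation level in [0, 1) from two bead-type intensities.

    Assumed definition (not stated in the slides):
        beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset)
    Negative intensities (possible after background correction)
    are clipped to zero.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities for three CpG sites.
print(beta_value(900.0, 0.0))    # strongly methylated site
print(beta_value(0.0, 900.0))    # unmethylated site
print(beta_value(450.0, 450.0))  # partially methylated site
```

Because of the offset, the value approaches but never exactly reaches one, which stabilizes the ratio when both intensities are low.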
Introduction
 Illumina High Throughput Arrays
 Using an Illumina methylation microarray, our experiment analyzed the DNA
methylation level at 1,505 CpG loci from the regulatory regions of 806
cancer-related genes (one to five CpG sites per gene).
DNA Methylation Analysis
 Non-Specific Filtering: This phase identifies the rows (CpG sites)
that are candidates for showing a differential change in their
methylation level. The cutoff is set on |∆β|, the absolute value of
the difference between the mean methylation level of the cancer
samples and the mean methylation level of the corresponding
adjacent normal tissue.
 Specific Filtering: This phase aims to determine the most
differentially methylated DNA markers by applying the most
appropriate statistical test after testing the normality of the
methylation data. A one-sample Kolmogorov-Smirnov test is used to
decide whether a parametric or a non-parametric test is applied.
For paired samples, the paired t-test is used as the parametric
test; otherwise, the Wilcoxon signed-rank test is used as the
non-parametric test. The false discovery rate (FDR) is used to
adjust the raw p-values and reduce the false positives that arise
from multiple testing.
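The two filtering phases can be sketched as follows. This is a minimal illustration, not the authors' code. The site names, β values, and p-values are made up. The 0.2 cutoff is assumed from the |0.2| threshold mentioned earlier. The FDR adjustment is assumed to be the Benjamini-Hochberg procedure, which the slides do not name. The normality test and the paired tests are left out because they need a statistics library.

```python
from statistics import mean


def delta_beta(tumor, normal):
    """|delta beta|: absolute difference between the two group means."""
    return abs(mean(tumor) - mean(normal))


def nonspecific_filter(sites, cutoff=0.2):
    """Keep CpG sites whose |delta beta| exceeds the (assumed) cutoff."""
    return [name for name, (tumor, normal) in sites.items()
            if delta_beta(tumor, normal) > cutoff]


def benjamini_hochberg(pvalues):
    """FDR-adjusted p-values (Benjamini-Hochberg step-up, assumed here)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical beta values: (tumor samples, paired normal samples).
sites = {
    "cpg_a": ([0.8, 0.7, 0.9], [0.2, 0.3, 0.1]),
    "cpg_b": ([0.5, 0.55, 0.45], [0.5, 0.5, 0.5]),
}
candidates = nonspecific_filter(sites)
print(candidates)

# Hypothetical raw p-values from the paired tests of the specific phase.
raw_p = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw_p))
```

Running the non-specific filter first means fewer sites are tested, so the FDR adjustment is applied to fewer hypotheses.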
Formal Concept Analysis
 Formal concept analysis
(FCA) was introduced as a
mathematical theory by
Wille (1982). FCA is very
helpful for data analysis
and has been used in
many applications.
Visualizing data is one of
the most useful objectives
of FCA; the concept lattice
provides this visualization.
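To show what a formal context and its concepts look like, here is a small sketch. The context is toy data, not a result of the study: the four subtype names come from the slides, but the markers G1, G2, and G3 and their assignments are hypothetical, and the "(-)" suffix is only an illustrative label. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent.

```python
from itertools import combinations

# Toy formal context: objects (subtypes) -> attributes (markers).
# Hypothetical assignments, for illustration only.
context = {
    "Basal-like": {"G1(+)", "G2(-)"},
    "ERBB2+":     {"G2(-)", "G3(-)"},
    "Luminal A":  {"G1(+)", "G3(-)"},
    "Luminal B":  {"G1(+)", "G2(-)", "G3(-)"},
}


def intent(objs, ctx):
    """Attributes shared by all given objects (all attributes if none given)."""
    shared = set().union(*ctx.values())
    for o in objs:
        shared &= ctx[o]
    return frozenset(shared)


def extent(attrs, ctx):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in ctx.items() if attrs <= has)


def formal_concepts(ctx):
    """Enumerate all concepts by closing every subset of objects.

    Exponential in the number of objects; fine for a toy context only.
    """
    found = set()
    objects = list(ctx)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset, ctx)
            found.add((extent(b, ctx), b))
    return found


concepts = formal_concepts(context)
# List concepts from the top of the lattice (largest extent) downwards.
for a, b in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(a), sorted(b))
```

Ordering the concepts by extent inclusion gives the concept lattice. Drawing it requires a graph layout step that is not shown here.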
Formal Concept Analysis
 To distinguish between
hypomethylated and hypermethylated genes,
we refer to hypomethylated genes by
adding a plus sign (+) to their names (e.g.
Aim2(+)), and we refer to
hypermethylated genes by adding
Conclusions and Future Work
 Finally, DNA methylation has been associated with cancer in
several investigations. Hypermethylation or hypomethylation of
CpG islands can affect the expression of genes; therefore,
there is a need to mine and visualize DNA methylation
status among breast cancer molecular subtypes. In future
work, we will use FCA to mine DNA methylation status
using data obtained from the Illumina Infinium
HumanMethylation27 BeadChip. This BeadChip allows
researchers to interrogate 27,578 targeted CpG sites in
total, spread across the promoter regions of 14,495 genes,
so we may identify new relationships among cancer
tumor subtypes.
For further questions:
Islam Amin
Eng.IslamAmin@gmail.com
