Honours Project BIO303 2015/16
Final Report
Name: Jiangyuan Liu
Title: The analysis of FTO-targeted mRNA m6A methylation in mouse transcriptome
Word count: 4324
Supervisor: Dr. Jia Meng
Biological Sciences
Contents
Abstract
1. Introduction
1.1 FTO gene
1.2 MeRIP-Seq
1.3 Gene Ontology
2. Methods
3. Results and Discussion
3.1 Differential RNA methylation between wild type and FTO knockout cell lines in mouse midbrain
3.2 Differential expression between wild type and FTO knockout cell lines in mouse midbrain
3.3 Functional enrichment analysis of FTO-targeted genes
3.4 A web application developed by Shiny
4. Acknowledgement
5. References
6. Appendices
Abstract
Recent studies have found that methylation of the N6 position of adenosine (m6A) is a
common base modification on mRNAs. Meanwhile, it has been discovered that the
demethylase encoded by the fat mass and obesity-associated (FTO) gene is able to remove
the m6A modification from mRNA. Therefore, there might be a connection between m6A
modification and physiological functions, which may help researchers understand how the
FTO demethylase can be associated with obesity through regulating m6A modification.
This study aims to identify the differential methylation sites between wild type and FTO
knockout cell lines in mouse midbrain, which could indicate FTO-targeted m6A sites and
genes. FTO-targeted m6A sites were found mainly around the stop codon and in the coding
sequence (CDS). In addition, the analysis of differential expression between these two
cell lines found no statistically significant changes in expression levels, indicating
that FTO might not participate in the regulation of gene expression levels. Gene Ontology
(GO) enrichment analysis revealed many functions associated with FTO; in particular, FTO
could regulate a selective subset of mRNAs whose biological functions are specifically
related to neuronal signal transduction. Finally, this study developed a web application
using Shiny, an R package, to allow users to customize the analysis for their specific
needs and extract more insight from the data, which is helpful for exploring the
physiological roles of m6A modification on mRNAs.
Introduction
FTO gene
The fat mass and obesity-associated (FTO) gene plays an important role in regulating
energy utilization and metabolism (Fischer et al., 2009). In humans, increased FTO
expression is related to high body mass index and risk of obesity (Fawcett and Barroso,
2010). Some studies have found that the FTO demethylase is able to demethylate
N6-methyladenosine (m6A) (Figure 1), a naturally occurring adenosine modification (Jia et
al., 2011). These studies suggest that there might be a relationship between this
adenosine modification and physiological roles in human biological processes (Meyer et
al., 2012).
Figure 1 FTO catalyzes the conversion of N6-methyladenosine in mRNA to
adenosine.
MeRIP-Seq
m6A-specific methylated RNA immunoprecipitation with next-generation sequencing
(MeRIP-Seq) is a powerful method for localizing m6A transcriptome-wide. One study showed
that the messenger RNAs (mRNAs) of 7,676 mammalian genes contain m6A, which indicates
that m6A modification is widespread on mRNAs (Meyer et al., 2012). The MeRIP-Seq
procedure is as follows (Figure 2). First, rRNA species are removed from total RNA by
RiboMinus treatment. Secondly, the purified mRNA is cut into ~100-nt-long
oligonucleotides by chemical fragmentation. Thirdly, the fragments carrying m6A are
immunoprecipitated using m6A antibody-coupled Dynabeads. Fourthly, the fragments eluted
from these Dynabeads and the untreated input control fragments are ligated to sequencing
adaptors and converted to cDNA, which is then amplified by PCR, sequenced and aligned to
a reference transcript database (Dominissini et al., 2012). When the immunoprecipitated
sample is compared with the input sample, distinct peaks of high read enrichment in the
immunoprecipitated sample are identified as m6A signal peaks. Since an m6A site might sit
at any position along a 100-nt fragment, reads from overlapping fragments pile up around
the site, so that the site lies close to the center of the peak. As a result, m6A sites
are represented as peaks that are ~200 nt wide at the base and ~100 nt wide at the
midpoint, which means a resolution of about 200 nt for m6A site identification by
MeRIP-Seq.
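The peak shape described above can be checked with a small Python sketch. It assumes an idealized case that is not part of the original analysis: exactly one 100-nt fragment for every possible start position that still contains the m6A site, with the site placed at offset 0. The function name `coverage_profile` is hypothetical.

```python
def coverage_profile(fragment_len=100):
    """Count fragments covering each offset from an m6A site at offset 0.

    Idealized assumption: one fragment per possible start position that
    still contains the site, i.e. starts at -(fragment_len - 1) .. 0.
    """
    profile = {}
    for start in range(-(fragment_len - 1), 1):
        for pos in range(start, start + fragment_len):
            profile[pos] = profile.get(pos, 0) + 1
    return profile


profile = coverage_profile(100)
base_width = len(profile)                  # offsets with any coverage
half_max = max(profile.values()) / 2
half_width = sum(1 for depth in profile.values() if depth >= half_max)
print(base_width, half_width)
```

Under these assumptions the coverage is a triangle centered on the site, roughly 200 nt wide at the base and roughly 100 nt wide at half height, which is consistent with the resolution quoted above.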
Figure 2 Outline of the MeRIP-Seq protocol
Moreover, it is necessary to explain why the untreated input control sample is
indispensable for MeRIP-Seq. Transcriptome-wide RNA methylation is under the control of
both transcriptional and enzymatic regulation. Because of differential expression, a gene
might be transcribed into more copies of its RNA under certain conditions, which
increases the absolute amount of this RNA and therefore of its methylated form. However,
this does not indicate stronger enzymatic hypermethylation of the RNA (Meng et al.,
2014). Thus, MeRIP-Seq must include an input sample in order to estimate the number of
RNA molecules, which is influenced only by gene expression. In other words, the input
signal serves as a background for the m6A RNA immunoprecipitation signal. The difference
between them is attributed to the real enzymatic methylation.
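As a minimal illustration of this point, the Python sketch below compares IP reads with input reads for a single peak. The read counts and the function name `methylation_ratio` are hypothetical and serve only to show why the ratio, rather than the raw IP count, reflects enzymatic methylation.

```python
def methylation_ratio(ip_reads, input_reads):
    """IP signal relative to the input (expression) background."""
    return ip_reads / input_reads


# Hypothetical counts: expression doubles, methylated fraction unchanged.
low_expression = methylation_ratio(ip_reads=40, input_reads=100)
high_expression = methylation_ratio(ip_reads=80, input_reads=200)

# Hypothetical counts: same expression, but more reads in the IP sample.
hypermethylated = methylation_ratio(ip_reads=80, input_reads=100)

print(low_expression, high_expression, hypermethylated)
```

In the first two cases the IP count doubles only because the gene is expressed more strongly, so the ratio stays the same; only the third case points to a change in methylation itself.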
Gene Ontology
Gene Ontology (GO) is a major bioinformatics initiative that aims to unify the
representation of gene and gene product attributes across all species (The Gene Ontology
Consortium, 2008). An ontology describes the characteristics of observable things and how
these things relate to each other. The GO project offers an ontology of defined terms
representing gene product properties, which helps biologists from different specialties
to communicate and share information. The ontology covers three domains: cellular
component, molecular function and biological process, and every defined term belongs to
one of these three domains. Moreover, other organizations and databases have built their
own ontology systems in the same way as the GO project. These ontology systems represent
different categories, and their databases focus on different aspects such as diseases and
research organizations.
This study identified the differential RNA m6A sites between wild type and FTO knockout
cell lines in mouse midbrain by analyzing a MeRIP-Seq dataset downloaded from the Gene
Expression Omnibus (GEO) database, in order to determine whether FTO controls
demethylation of all m6A-modified mRNAs or of a distinct subset of them. The
FTO-targeted genes then underwent GO enrichment analysis. In addition, this study tested
whether any genes are differentially expressed between the wild type and FTO knockout
conditions, which indicates whether FTO regulates gene expression. Finally, an
interactive web application was developed and put online to share the results of this
study in a user-friendly way, which should help other researchers to further study the
physiological roles of m6A modification regulated by FTO.
Methods
This study analyzed the MeRIP-Seq dataset GEO GSE47217, which measures the
transcriptome-wide m6A profiles of wild type and FTO knockout cell lines in mouse
midbrain (Hess et al., 2013). The Bash UNIX shell and the R system were used to analyze
this dataset. In short, the analysis started with downloading the raw data from the GEO
database and then carried out read alignment, m6A site detection, differential
methylation analysis, analysis of the m6A topology on the 5’UTR (untranslated region),
CDS (coding sequence) and 3’UTR of mRNA, transcriptome-wide m6A site visualization,
functional annotation, differential expression detection and web application
development. The details of each step are described below.
First, the raw data (GEO GSE47217) contain three biological replicates for each of the
wild type and FTO knockout cell lines. Each replicate consists of an immunoprecipitated
(IP) sample and an input control sample. The raw data downloaded from the GEO database
are SRA files, which first need to be converted to FASTQ files using a tool called
“Fastq-dump” in the Bash UNIX shell. The reads for each condition in these FASTQ files
were then mapped to the reference genome with TopHat, a software tool that addresses a
limitation of Bowtie, another short-read aligner: Bowtie cannot align reads that span
introns, whereas TopHat can identify that a read spans a splice junction and infer the
junction’s possible splice sites, which increases the accuracy of read alignment
(Trapnell et al., 2012). The alignment results were saved in twelve BAM files. The code
used in this step is saved in “Script 1 (Bash script).txt” attached to Appendix I.
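The twelve BAM files follow from two cell lines, three replicates and two sample types. The Python sketch below only enumerates the samples and assembles plausible command lines; it does not run anything and is not the Bash script from Appendix I. The sample labels, file names and the Bowtie index prefix `mm10` are assumptions made for illustration.

```python
from itertools import product

CONDITIONS = ["WT", "KO"]        # hypothetical labels for the two cell lines
REPLICATES = [1, 2, 3]
SAMPLE_TYPES = ["IP", "INPUT"]


def build_commands(index_base="mm10"):   # hypothetical Bowtie index prefix
    """Return one fastq-dump and one tophat command per sample."""
    commands = []
    for cond, rep, kind in product(CONDITIONS, REPLICATES, SAMPLE_TYPES):
        name = f"{cond}_rep{rep}_{kind}"
        # Convert the SRA archive to FASTQ.
        commands.append(["fastq-dump", f"{name}.sra"])
        # Align the reads; TopHat writes accepted_hits.bam into the -o directory.
        commands.append(["tophat", "-o", f"tophat_{name}", index_base, f"{name}.fastq"])
    return commands


commands = build_commands()
bam_files = [f"{cmd[2]}/accepted_hits.bam" for cmd in commands if cmd[0] == "tophat"]
print(len(bam_files))
```

Building the commands as argument lists keeps the sketch side-effect free; each list could be passed to `subprocess.run` once the real file names are known.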
Secondly, exomePeak, an R/Bioconductor package, is the key software tool of this study.
It was used to detect RNA m6A sites and then to identify differential RNA m6A sites in
terms of percentage rather than absolute amount in this case-control study. The package
is designed for the analysis of affinity-based epitranscriptome shotgun sequencing data
from MeRIP-Seq (i.e. m6A-Seq). Moreover, exomePeak can statistically analyze multiple
biological replicates at the same time, and it internally removes PCR artifacts and
multi-mapping reads (Meng, 2015a). PCR artifacts do not reflect the original content of
the immunoprecipitated and input control samples, so they cannot be used to represent
signal peaks. Multi-mapping reads should also be removed because they could increase
experimental error. In the exomePeak analysis, the IP sample of each cell line was first
compared with the input sample of the same replicate to obtain the difference in the
percentage of reads, which is used to detect RNA m6A peaks. These differences were then
compared between the wild type and FTO knockout cell lines to identify differential RNA
m6A peaks. Since TopHat produced three biological replicates of BAM files for each cell
line, three datasets of differential RNA m6A peaks were obtained. Furthermore, exomePeak
screened for the consistently differentially methylated peaks, that is, the peaks that
are differentially methylated in all three datasets, which indicates high confidence.
Therefore, these consistently differentially methylated peaks are taken as the candidate
FTO-targeted m6A peaks. The consistently differentially methylated peaks and genes were
saved in both “con_sig_diff_peak.bed” and “con_sig_diff_peak.xls”; the latter can be
opened in Microsoft Excel and is saved in a folder called “exomePeak” attached to
Appendix II. Meanwhile, exomePeak also identified all m6A peaks. Specifically, the read
counts of the six IP samples and the six INPUT samples from the two cell lines were
pooled into two uniform datasets (i.e. Uniform IP and Uniform INPUT), which were then
compared with each other to find all highly enriched peaks, i.e. m6A peaks. All these
detected peaks and genes were saved in both “diff_peak.bed” and “diff_peak.xls”; the
latter can be opened in Microsoft Excel and is saved in the same “exomePeak” folder
attached to Appendix II. The code used in this step is saved in “Script 2 (R
script).txt” attached to Appendix I.
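The idea of comparing methylation as a percentage and then requiring agreement across replicates can be sketched in Python as follows. This is a simplified illustration, not exomePeak's algorithm: it omits the statistical test that exomePeak applies, and the pseudocount, function names and read counts are all assumptions.

```python
import math


def log2_ratio(ip_reads, input_reads, pseudo=1.0):
    """log2 of IP over input, with a pseudocount to avoid division by zero."""
    return math.log2((ip_reads + pseudo) / (input_reads + pseudo))


def diff_log2_fc(knockout, wild_type):
    """Difference of log2(IP/input) between conditions for one peak.

    Each argument is an (ip_reads, input_reads) pair.
    """
    return log2_ratio(*knockout) - log2_ratio(*wild_type)


def consistently_hypermethylated(replicates, min_fc=0.0):
    """True if every replicate shows higher methylation in the knockout."""
    return all(diff_log2_fc(ko, wt) > min_fc for ko, wt in replicates)


# Hypothetical (knockout, wild type) read counts for three replicates.
peak_a = [((90, 100), (40, 100)), ((85, 110), (45, 105)), ((100, 95), (50, 100))]
peak_b = [((90, 100), (40, 100)), ((40, 100), (45, 100)), ((100, 95), (50, 100))]

print(consistently_hypermethylated(peak_a), consistently_hypermethylated(peak_b))
```

Here `peak_b` fails because its second replicate points in the opposite direction, which mirrors the reason for keeping only peaks that agree in all three datasets.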
Thirdly, Guitar, an R/Bioconductor package, was used to determine the distribution of
FTO-targeted m6A peaks along transcript coordinates, which was then compared with the
distribution of transcriptome-wide m6A peaks. In this step, “con_sig_diff_peak.bed” and
“diff_peak.bed” obtained in step 2 contain the FTO-targeted m6A peaks and all the
detected m6A peaks respectively. In addition, a transcriptDb file called “mm10.txdb”,
which contains the gene annotation information, needs to be converted to Guitar
coordinates, which are required to link transcriptomic landmarks with genomic coordinates
(Meng, 2015b). Finally, using these three objects, a function called “GuitarPlot” was
used to generate a plot showing the relative frequency of m6A sites on the 5’UTR, CDS and
3’UTR for both factors (i.e. FTO-targeted m6A sites and all the detected m6A sites). The
code used in this step is saved in “Script 3 (R script).txt” attached to Appendix I.
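To make the notion of a transcript coordinate concrete, the Python sketch below rescales a position on a transcript so that the 5’UTR, CDS and 3’UTR each occupy one unit of a common axis. This illustrates the general idea only; it is an assumption about how such a normalization can be done and does not reproduce Guitar's internal implementation. The function name and the region lengths are hypothetical.

```python
def relative_transcript_position(pos, utr5_len, cds_len, utr3_len):
    """Map a 0-based transcript position to a value in [0, 3).

    [0, 1) is the 5'UTR, [1, 2) the CDS and [2, 3) the 3'UTR, so that
    transcripts of different lengths share one axis.
    """
    if not 0 <= pos < utr5_len + cds_len + utr3_len:
        raise ValueError("position lies outside the transcript")
    if pos < utr5_len:
        return pos / utr5_len
    pos -= utr5_len
    if pos < cds_len:
        return 1 + pos / cds_len
    pos -= cds_len
    return 2 + pos / utr3_len


# Hypothetical transcript: 200-nt 5'UTR, 1000-nt CDS, 800-nt 3'UTR.
near_stop_codon = relative_transcript_position(1190, 200, 1000, 800)
print(near_stop_codon)
```

A site 10 nt upstream of the stop codon maps to a value just below 2, so sites near the stop codon of many different transcripts accumulate at the same place on the axis.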
Fourthly, Integrative Genomics Viewer (IGV) is a high-performance visualization tool for
interactive exploration of large, integrated genomic datasets. In this step, it was used
to visualize the FTO-targeted m6A sites obtained in step 2 and the aligned BAM files
obtained in step 1. The BED file “con_sig_diff_peak.bed” can be visualized directly in
the IGV browser. Igvtools (part of IGV) was used to convert the BAM files to the viewable
TDF format. Finally, the generated TDF files and the BED file were visualized together in
the IGV browser. The code for the conversion from BAM files to TDF files is saved in
“Script 4 (Bash script).txt” attached to Appendix I.
Fifthly, “con_sig_diff_peak.xls” contains the information on the consistently
differentially methylated sites and genes. Since FTO is known to be an m6A demethylase,
knocking out the FTO gene should result in hypermethylation of its target sites. The
symbols of the genes whose mRNAs have hypermethylated sites (i.e. differential log2 fold
change (diff.log2.fc) > 0) were extracted from “con_sig_diff_peak.xls” and underwent Gene
Ontology (GO) enrichment analysis using the DAVID functional annotation tool (Huang et
al., 2008): https://david.ncifcrf.gov/summary.jsp. The statistical method behind this
analysis is the hypergeometric test, which involves two ratios. The first ratio is the
number of genes associated with one term in a specific database divided by the number of
all genes in this database. The second ratio is the number of genes that belong both to
the same term and to the uploaded FTO-targeted gene list, divided by the number of all
genes in the uploaded list. Fold enrichment is the second ratio divided by the first
ratio, and the p-value of the test is used to decide whether the fold enrichment is
statistically significant. DAVID outputs several core elements, including the enriched
terms, the subsets of FTO-targeted genes that belong to each term, the fold enrichment
and the p-values. The data were saved in “Functional Annotation chart.xls”, which was
saved in a folder called “DAVID” attached to Appendix II.
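The two ratios and the test can be written out in a few lines of Python. This is a sketch of the plain hypergeometric test as described in the text; the p-value DAVID itself reports is a modified, more conservative variant (the EASE score), so the numbers are not expected to match its output exactly. The argument names follow DAVID's column names, and the gene counts in the example are hypothetical.

```python
from math import comb


def fold_enrichment(count, list_total, pop_hits, pop_total):
    """Second ratio (term share in the gene list) over first ratio (background)."""
    return (count / list_total) / (pop_hits / pop_total)


def hypergeom_pvalue(count, list_total, pop_hits, pop_total):
    """P(X >= count) when list_total genes are drawn without replacement
    from pop_total genes, of which pop_hits carry the term."""
    upper = min(list_total, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(count, upper + 1)
    )
    return tail / comb(pop_total, list_total)


# Hypothetical numbers: 30 of 900 listed genes carry a term that 200 of
# 15,000 background genes carry.
fe = fold_enrichment(30, 900, 200, 15000)
p = hypergeom_pvalue(30, 900, 200, 15000)
print(fe, p)
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so no statistics library is needed for gene lists of this size.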
Sixthly, Cuffdiff calculates expression in two or more samples and tests whether the
changes in expression level between them are statistically significant (Trapnell et al.,
2012). In this step, Cuffdiff was run on the six aligned BAM files of the input samples
from both cell lines along with the mouse reference transcriptome saved in “genes.gtf”.
All data generated by the Cuffdiff analysis were saved in a folder called
“CuffdiffOut_S9”. The changes in gene expression level and their p-values, saved in
“gene_exp.diff”, can be viewed with spreadsheet and charting programs such as Microsoft
Excel. The file “gene_exp.diff” was saved in a folder called “Cuffdiff” attached to
Appendix II. However, it is difficult to browse the global changes and trends in gene
expression between the wild type and FTO knockout conditions in a spreadsheet.
CummeRbund, an R/Bioconductor package, helps to visualize all data generated by a
Cuffdiff analysis: it brings the Cuffdiff data from the Bash UNIX shell into the R
statistical computing environment, so that other advanced statistical analysis and
plotting packages can be applied to the RNA-Seq expression analysis from Cuffdiff
(Trapnell et al., 2012). A function called “csScatter” was used to generate scatterplots
from the Cuffdiff data, which can reveal biases in gene expression between the wild type
and FTO knockout conditions. The code for the Cuffdiff analysis and for CummeRbund is
saved in “Script 5 (Bash script).txt” and “Script 6 (R script).txt” respectively,
attached to Appendix I.
Seventhly, Shiny, an R package, is a framework for developing interactive web
applications. It allows researchers to build customized applications whose servers
efficiently process and analyze user input and return feedback to the user interface in
real time (Wojciechowski et al., 2015). Because exomePeak produced a large amount of data
on FTO-targeted m6A peaks and DAVID produced a large amount of GO enrichment data, it is
difficult to retrieve any particular FTO-targeted m6A peak or any term enriched with
FTO-targeted genes. An interactive application was therefore developed with Shiny to meet
these requirements. Moreover, Shiny Server is a server program that makes Shiny
applications available over the web. Shiny Server was used to put this web application
online so that it can be shared with the world. The code for the User Interface (UI) and
the Server of this web application is saved in “Script 7_UI (R script).txt” and “Script
7_Server (R script).txt” respectively, attached to Appendix I.
In summary, Figure 3 shows a flowchart that can be helpful for readers to review the
whole procedure of this study and understand the following content.
Figure 3 The whole procedure of this study shown as a flowchart. Nine tools are written
in bold (i.e. Fastq-dump, TopHat, exomePeak, Guitar, IGV, Cuffdiff, CummeRbund, DAVID and
Shiny). All results of this study are shown in dotted textboxes. Some significant results
are discussed in Results and Discussion; the others are attached in the Appendices.
Results and Discussion
Differential RNA methylation between wild type and FTO knockout cell lines in
mouse midbrain
The MeRIP-Seq data analyzed in this study include three biological replicates for each
cell line (i.e. wild type and FTO knockout). After alignment with TopHat and differential
methylation detection with exomePeak, 1,132 consistently differential m6A peaks were
found between the two cell lines; they are saved in “con_sig_diff_peak.xls”. Since FTO is
a demethylase, the diff.log2.fc of its target peaks should be larger than 0, which means
that these consistently differential m6A peaks should be hypermethylated peaks. Indeed,
only 3 differential m6A peaks have a diff.log2.fc below 0, probably because of biological
variability between replicates of the same experiment and technical variability during
library preparation and sequencing (Trapnell et al., 2012). This result is therefore
consistent with FTO being an m6A-specific demethylase. Moreover, the transcriptome-wide
m6A profile contains 15,731 genes with m6A sites, which are saved in “diff_peak.xls”. The
1,129 hypermethylated sites (i.e. FTO-targeted sites) are distributed on the transcripts
of 912 genes, approximately 5.8 percent of all genes whose mRNA transcripts contain m6A
sites (i.e. 15,731 genes). This result suggests that FTO regulates a distinct subset of
the m6A-modified transcripts expressed in mouse midbrain rather than all of them. Table 1
shows the top 20 genes that contain the m6A peaks with the highest levels of enrichment.
Notably, because the resolution of m6A site detection by MeRIP-Seq is about 200 nt, one
peak could cover multiple individual m6A residues. Nevertheless, peaks with high levels
of methylation indicate that their transcripts are very likely influenced by the
demethylation activity of FTO.
Table 1 The top 20 genes encoding transcripts with the highest degree of m6A
enrichment. Genes are given as gene symbols, and the positions of their m6A peaks are
defined by three parameters: chromosome, peak start site and peak end site. Zfp36l2, the
gene with the highest level of m6A enrichment, appears to be regulated by FTO to the
largest extent.

Chr     Peak Start   Peak End     Gene Symbol   Fold Enrichment
chr17   84185184     84186233     Zfp36l2       209
chr19   46501647     46513192     Trim8         111
chr11   102436980    102438887    Fam171a2      89.3
chr7    96905888     96906877     Tenm4         84.4
chr11   59157755     59159128     Iba57         83.2
chr8    110956936    110962195    St3gal2       79.4
chr4    141467919    141469030    Spen          77.5
chr15   79996588     79997243     Pdgfb         75.7
chr7    45794300     45798497     Lmtk3         73.4
chr7    142081790    142083345    Dusp8         73.4
chr6    108662606    108665788    Bhlhe40       71.2
chr2    157469804    157470852    Src           70.2
chr2    160365055    160366255    Mafb          66.4
chr8    119941223    119947089    Usp10         66.3
chr10   81020572     81025377     Diras1        66
chr7    28466770     28467548     Lrfn1         63.9
chr11   116538886    116539692    Ube2o         63.7
chr2    30256998     30262502     Lrrc8a        63.1
chr19   5068613      5069507      Cd248         62.3
chr16   18151069     18152349     Rtn4r         61.9

The IGV browser can visualize the detected differential methylation sites together with
the aligned BAM files (i.e. the peaks of reads) (Figure 4). The peaks represent the
degree of read enrichment along the genome, and the black block represents the
differential methylation peak derived from the exomePeak analysis. Although there is an
obvious increase in reads when comparing the IP samples with the INPUT samples, it is
difficult to identify by eye the differences among the IP samples or among the INPUT
samples in either condition. Therefore, Figure 4 merely illustrates the basic principle,
described in Methods, by which exomePeak identifies a differential methylation site;
exomePeak itself is a tool for providing accurate, quantified positions of all
differential methylation sites rather than for visualization.
Figure 4 A differential methylation site and signal peaks for the Gpr26 gene shown in
the IGV browser. Comparing the INPUT samples of the two conditions, Gpr26 is somewhat
down-regulated under the FTO knockout condition. Comparing the IP samples of the two
conditions, the percentage of methylated reads slightly increases under the FTO knockout
condition. Collectively, there is an RNA m6A hypermethylation site spanning the start
codon of Gpr26 after FTO knockout.
In addition, Guitar, an R/Bioconductor package, can visualize RNA m6A methylation sites
with regard to the landmarks of RNA transcripts, i.e. the transcription start site, start
codon, stop codon and transcription end site. These four landmarks divide an RNA
transcript into three regions: 5’UTR, CDS and 3’UTR. Figure 5 shows the relative
frequency of m6A sites in these three regions for two factors (i.e. FTO-targeted m6A
sites and transcriptome-wide m6A sites). Both FTO-targeted and transcriptome-wide m6A
sites appear most frequently near the stop codon, which suggests that FTO-targeted m6A
sites near the stop codon could have a biological influence on mRNA transcripts. In
addition, FTO-targeted m6A sites are also concentrated in the CDS. More precisely, within
the CDS the frequency of m6A sites rises steadily along the transcript. Thus, FTO also
substantially regulates m6A RNA modification in the CDS. Some studies suggest that FTO is
likely to affect the translation of the proteins encoded by m6A-modified mRNAs (Hess et
al., 2013). However, how the position and absolute number of m6A sites on mRNA
transcripts could influence their translation is still poorly understood. This mechanism
could be associated with mechanisms of post-transcriptional modification such as
polyadenylation and capping. Because human and mouse are both eukaryotes, transcription
occurs in the nucleus and translation in the cytoplasm, which means that mRNA generated
in the nucleus must be exported through nuclear pores into the cytoplasm for protein
synthesis. Polyadenylation is known to increase the stability of mRNA: the longer the
poly-A tail, the more stable the mRNA. In addition, capping helps the mRNA to be
recognized by transport proteins, which then transport the mRNA from the nucleus to the
cytoplasm (Shatkin & Manley, 2000). It is possible that m6A modification influences these
two processes and thereby regulates protein expression. However, this is only a
hypothesis, and considerable further work will be required to test it. Nevertheless, the
data on FTO-targeted m6A sites could be informative for specialized researchers studying
the effects of the m6A sites of particular mRNA transcripts on protein expression, and
might contribute to a systematic mechanism that applies to all genes whose mRNA
transcripts contain m6A sites.
Figure 5 Enrichment of m6A across the length of mRNA transcripts for two factors
(i.e. FTO-targeted m6A sites and transcriptome-wide m6A sites). The x-axis represents
the relative positions of three regions on the mRNA transcript: 5’UTR, CDS and 3’UTR.
The y-axis represents the relative frequency of m6A sites along mRNA transcripts.
Frequencies less than 1 indicate that m6A sites seldom appear at these positions.
Frequencies equal to 1 indicate that m6A sites are randomly distributed at these
positions. Frequencies greater than 1 indicate that m6A sites are relatively
concentrated at these positions. The red and blue areas show the enrichment of
FTO-targeted m6A sites and transcriptome-wide m6A sites respectively.
Differential expression between wild type and FTO knockout cell lines in mouse
midbrain
Cuffdiff is a powerful program that calculates expression levels in two or more samples
and then tests whether the changes in expression between them are statistically
significant. The wild type and FTO knockout cell lines were analyzed with Cuffdiff, and
the result was visualized using CummeRbund. The “gene_exp.diff” file lists 23,352 genes
whose expression levels differ between these two conditions; however, none of these
changes is statistically significant. Figure 6 likewise shows no apparent bias in gene
expression between the two cell lines. This indicates that the FTO gene might not be
involved in the regulation of gene expression levels. Some studies found that although
FTO is able to oxidatively demethylate m3T in single-stranded DNA (ssDNA) and m3U in
single-stranded RNA, it shows low activity toward these two base modifications (Jia et
al., 2011). Moreover, according to the central dogma of genetics, DNA participates
directly in transcription. Taken together, FTO appears to be less associated with gene
expression than with protein expression.
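A check of this kind can be scripted rather than done by eye in a spreadsheet. The Python sketch below counts the genes that Cuffdiff flags in the `significant` column of a `gene_exp.diff` table. The two data rows are made-up placeholders, not results from this study, and the function name `summarize_diff` is hypothetical.

```python
import csv
import io

# Column layout of a Cuffdiff gene_exp.diff file (tab-separated).
HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
          "value_1", "value_2", "log2(fold_change)", "test_stat", "p_value",
          "q_value", "significant"]


def summarize_diff(diff_text):
    """Return (number of tested genes, symbols flagged as significant)."""
    rows = list(csv.DictReader(io.StringIO(diff_text), delimiter="\t"))
    flagged = [row["gene"] for row in rows if row["significant"] == "yes"]
    return len(rows), flagged


# Placeholder rows for illustration only.
example_rows = [
    ["g1", "g1", "GeneA", "chr1:0-100", "WT", "KO", "OK",
     "10.0", "11.0", "0.1375", "0.40", "0.60", "0.95", "no"],
    ["g2", "g2", "GeneB", "chr2:0-100", "WT", "KO", "OK",
     "5.0", "4.5", "-0.1520", "-0.35", "0.70", "0.95", "no"],
]
example = "\n".join("\t".join(row) for row in [HEADER] + example_rows) + "\n"

total, flagged = summarize_diff(example)
print(total, flagged)
```

For the real file, the text would be read from “gene_exp.diff” instead of the in-memory example; an empty list of flagged genes corresponds to the outcome reported above.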
Figure 6 The difference between gene expression levels in wild type and FTO
knockout cell lines. Each dot represents one expressed gene. The x-axis represents the
FPKM value, a normalized measure of gene expression level, in the wild type cell line. A
greater FPKM value indicates a higher gene expression level. The y-axis represents the
FPKM value in the FTO knockout cell line. The linear regression does not show an obvious
bias toward either cell line.
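For readers unfamiliar with FPKM (fragments per kilobase of transcript per million mapped fragments), the Python sketch below shows the basic definition. Cuffdiff estimates FPKM with a more elaborate statistical model, so this is only the textbook formula, and the numbers in the example are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    per_kilobase = fragments / (transcript_length_bp / 1_000)
    return per_kilobase / (total_mapped_fragments / 1_000_000)


# Hypothetical gene: 500 fragments on a 2,000-bp transcript in a library
# of 20 million mapped fragments.
value = fpkm(500, 2_000, 20_000_000)
print(value)
```

Dividing by transcript length and library size is what makes the values on the two axes of the scatterplot comparable between genes and between samples.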
Functional enrichment analysis of FTO-targeted genes
The DAVID functional annotation tool was used to carry out the GO enrichment analysis of
FTO-targeted genes. The results were saved in “Functional Annotation chart.xls”. Figure 7
shows only five representative terms associated with FTO-targeted genes from this file:
phosphoprotein (p-value 6.62e-40), alternative splicing (p-value 6.51e-25), synapse
(p-value 2.85e-13), ion binding (p-value 5.27e-12) and neuron projection (p-value
1.17e-09). A previous study revealed that FTO can target a subset of mRNAs that
participate in neuronal function (Hess et al., 2013). It found that the proteins encoded
by many FTO-targeted genes, such as GRIN1 and GNAO1, are associated with neuronal
signaling pathways; in particular, several are specifically involved in dopaminergic (DA)
signaling, including PDE1b, GNAO1, DRD3, SYN1 (synapsin I) and GIRK2 (Hess et al., 2013).
DA signaling participates in the regulation of complex behaviors, particularly food
intake. If this signaling is dysregulated, it might increase the risk of obesity through
food addiction. Furthermore, the DA D2 receptor of the DA system is to a large extent
involved in food motivation and brain signaling in obesity (Baik, 2013). Taken together,
FTO could indirectly regulate DA signaling and thereby influence body mass. Therefore,
the data from the GO enrichment analysis with DAVID can offer specialized researchers
some directions for discovering other mechanisms by which FTO is involved in controlling
body mass.
Figure 7 Five representative terms that are enriched with FTO-targeted genes. The
x-axis shows the names of the five terms. The y-axis shows the p-value. The smaller the
p-value, the more statistically significant the association between FTO-targeted genes
and the corresponding term.
A web application developed by Shiny
Shiny, an R package, was used to develop an interactive web application called
“Bioinformatics”, which contains the data on differential methylation sites from
exomePeak and the GO enrichment analysis from DAVID. Although the data can be viewed in
“con_sig_diff_peak.xls” and “Functional Annotation chart.xls” using Microsoft Excel, it
is often necessary to select and visualize only part of the data, such as the m6A sites
on a specific gene. This interactive web application meets these requirements and can be
accessed at http://180.208.58.19:3838/sample-apps/m6A_v2/. The application might take a
little time to launch.
Figure 8 shows the first functional interface of this web application, titled “Gene’s
m6A position”, which retrieves the m6A peak positions on FTO-targeted genes from
“con_sig_diff_peak.xls”. After a gene name (ENTREZ id, alias or symbol) is entered, two
results are shown: a plot and a table. The plot visualizes the relative positions of the
FTO-targeted m6A peaks (blue) along the gene (orange) on the genome. Note that one gene
can have several isoforms. Since m6A peaks were initially identified on mRNA, a peak may
be split by introns into several blocks, which are distributed over exons once the peak
is transferred from mRNA to genome coordinates. The table shows the detailed information
behind the plot, including the start and end points of each m6A peak and the number,
sizes and start points of its blocks. The start point of the first block of each m6A
peak is 0 by default.
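The block columns described here match the layout of a 12-column BED record, in which block starts are given relative to the peak start. Assuming the peak file follows that layout, the Python sketch below converts one record into genomic block coordinates. The example line and the function name `bed12_blocks` are hypothetical.

```python
def bed12_blocks(line):
    """Return the genomic (chrom, start, end) of every block in a BED12 line."""
    fields = line.rstrip("\n").split("\t")
    chrom, peak_start = fields[0], int(fields[1])
    block_count = int(fields[9])
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("block count does not match block sizes/starts")
    # Block starts are offsets from the peak start, so the first one is 0.
    return [(chrom, peak_start + s, peak_start + s + size)
            for s, size in zip(starts, sizes)]


# Hypothetical peak split by one intron into two exonic blocks.
example_line = "\t".join(["chr1", "1000", "1500", "peak1", "0", "+",
                          "1000", "1500", "0", "2", "100,50", "0,450"])
blocks = bed12_blocks(example_line)
print(blocks)
```

Because the offsets are relative, the first block always starts at 0 and the last block ends at the peak end, which is why the table in the application reports 0 as the default first start point.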
Figure 8 The first functional interface for retrieval of m6A peak positions on FTO-
targeted genes. The notes in red give some details about how to use this function and
understand the results in the plot and table.
Figure 9 shows the second functional interface of this web application, titled
“DAVID”, which retrieves the terms enriched with FTO-targeted genes derived from
“Functional Annotation Chart.xls”. After a term that can be regarded as a gene product
property is entered, a table is shown. In this table, the category indicates the original
database from which the term comes. Fold enrichment is calculated from the values of
“Count”, “List Total”, “Pop Hits” and “Pop Total” in the table (fold enrichment =
(“Count” / “List Total”) / (“Pop Hits” / “Pop Total”)), and its significance is assessed
with a hypergeometric test. Fold enrichment indicates how strongly FTO-targeted genes are
over-represented in the terms users are interested in. In addition, a p-value of less than
0.05 means that the calculated fold enrichment is statistically significant.
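As a worked illustration of the formula above, the following minimal Python sketch computes fold enrichment and a one-sided hypergeometric p-value from the four table values. The function names and the example numbers are hypothetical and are not taken from “Functional Annotation Chart.xls”. DAVID itself reports a modified Fisher exact p-value (the EASE score), so the plain hypergeometric p-value computed here may differ from the value shown in the table.

```python
from math import comb


def fold_enrichment(count: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """(Count / List Total) / (Pop Hits / Pop Total)."""
    return (count / list_total) / (pop_hits / pop_total)


def hypergeom_pvalue(count: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """Probability of seeing at least `count` annotated genes in the list.

    Model: `list_total` genes are drawn without replacement from a background
    of `pop_total` genes, of which `pop_hits` carry the term.
    """
    upper = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(count, upper + 1)
    )
    return favourable / comb(pop_total, list_total)


# Hypothetical values: 10 of 100 listed genes carry the term,
# compared with 200 of 10000 genes in the background.
fe = fold_enrichment(10, 100, 200, 10000)
p = hypergeom_pvalue(10, 100, 200, 10000)
print(f"fold enrichment = {fe:.2f}, p-value = {p:.3g}")
```

Here the term is carried by 10% of the listed genes but only 2% of the background, giving a fold enrichment of about 5.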
Figure 9 The second functional interface for retrieval of terms enriched with FTO-
targeted genes. The notes in red explain some details about how to use this function and
understand the results in the table.
However, this web application can still be improved so that users can further customize
the analysis of this study for their specific needs and extract more insight from the
data. Two new functions could be added to the second functional interface. For the first
one, users would input a category, such as a database related to Type I diabetes. The
server would then output the most statistically significant terms in that category, that
is, the terms with the smallest p-values. The number of terms returned could be set by
each user. As a result, specialized researchers would only need to focus on the terms
most relevant to their own fields. For example, a researcher studying Type I diabetes
could use this function to find out whether any terms link Type I diabetes to FTO. If such
a relationship exists, the researcher could then refer to the most statistically
significant terms output by this function and carry out related experiments to discover
how FTO could be associated with Type I diabetes through these terms. For the second one,
users would input a gene list. The server would then output which of these genes are
FTO-targeted genes and the terms to which the matched genes belong. For example, a
researcher might identify some genes that regulate a certain signaling pathway that may be
associated with body mass. This researcher could upload the gene list and find out
whether any of its genes are FTO-targeted, which could help to clarify the regulatory
mechanisms of this signaling pathway with regard to FTO-dependent demethylation.
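The two proposed functions can be sketched as follows. This is a minimal Python illustration of the intended logic only, not an implementation of the Shiny server. The table layout (keys “Category”, “Term”, “PValue” and “Genes”), the function names `top_terms` and `match_gene_list`, and all rows and gene names are hypothetical placeholders. It also assumes that every gene listed in the chart is an FTO-targeted gene, because the chart was built from the FTO-targeted gene list.

```python
from typing import Dict, Iterable, List

# Hypothetical rows standing in for the functional annotation chart.
CHART: List[dict] = [
    {"Category": "CAT_A", "Term": "term 1", "PValue": 0.030, "Genes": {"GeneA", "GeneB"}},
    {"Category": "CAT_A", "Term": "term 2", "PValue": 0.001, "Genes": {"GeneB", "GeneC"}},
    {"Category": "CAT_A", "Term": "term 3", "PValue": 0.200, "Genes": {"GeneD"}},
    {"Category": "CAT_B", "Term": "term 4", "PValue": 0.004, "Genes": {"GeneA"}},
]


def top_terms(chart: List[dict], category: str, n: int = 5) -> List[dict]:
    """First function: the n terms of one category with the smallest p-values."""
    rows = [row for row in chart if row["Category"] == category]
    return sorted(rows, key=lambda row: row["PValue"])[:n]


def match_gene_list(chart: List[dict], user_genes: Iterable[str]) -> Dict[str, List[str]]:
    """Second function: map each user gene found in the chart to its terms.

    Matching ignores letter case; genes absent from the chart are omitted.
    """
    wanted = {gene.upper() for gene in user_genes}
    hits: Dict[str, List[str]] = {}
    for row in chart:
        for gene in sorted(row["Genes"]):
            if gene.upper() in wanted:
                hits.setdefault(gene, []).append(row["Term"])
    return hits


print([row["Term"] for row in top_terms(CHART, "CAT_A", n=2)])  # ['term 2', 'term 1']
print(match_gene_list(CHART, ["genea", "GeneX"]))  # {'GeneA': ['term 1', 'term 4']}
```

Letting the user choose `n` corresponds to the adjustable number of terms described above, and genes that are not FTO-targeted (such as the placeholder "GeneX") are simply left out of the result.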
Acknowledgement
The author appreciates the supervision and guidance of Dr. Jia Meng, Department of
Biological Sciences, XJTLU.
References
Baik, J. (2013) ‘Dopamine signaling in food addiction: role of dopamine D2 receptors’,
BMB Rep, 46(11), pp.519-526.
Dominissini, D. et al. (2012) ‘Topology of the human and mouse m6A RNA methylomes
revealed by m6A-seq’, Nature, 485, pp.201-206.
28. 28
Fawcett, K.A. & Barroso, I. (2010) ‘The genetics of obesity: FTO leads the way’, Trends
Genet, 26, pp.266-274.
Fischer, J., Koch, L., Emmerling, C., Vierkotten, J., Peters, T., Bru¨ ning, J.C. & Ru¨ ther,
U. (2009) ‘Inactivation of the Fto gene protects from obesity’, Nature, 458, pp.894-898.
Hess, M.E., Hess, S., Meyer, K.D., Verhagen, L.A., Koch, L., Bronneke, H.S., Dietrich,
M.O., Jordan, S.D., Saletore, Y., Elemento, O., Belgardt, B.F., Franz, T., Horvath, T.L.,
Ruther, U., Jaffrey, S.R., Kloppenburg, P., Bruning, J.C. & Neurosci, N. (2013) ‘The fat
mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain
circuitry’, Nature Neuroscience, 16, pp.1042-1048.
Huang, D.W., Sherman, B.T. & Lempicki, R.A. (2008) ‘Systematic and integrative
analysis of large gene lists using DAVID bioinformatics resources’, Nat. Protocols, 4,
pp.44-57.
Jia, G., Fu, Y., Zhao, X., Dai, Q., Zheng, G., Yang, Y., Yi, C., Lindahl, T., Pan, T., Yang,
Y.G. et al. (2011) ‘N6-methyladenosine in nuclear RNA is a major substrate of the
obesity-associated FTO’, Nat. Chem. Biol, 7, pp.885-887.
Meng, J., Lu, Z., Liu, H., Zhang, L., Zhang, S, Chen, Y., Rao, M.K. & Huang, Y. (2014)
‘A protocol for RNA methylation differential analysis with MeRIP-Seq data and
29. 29
exomePeak R/Bioconductor package’, Methods, 69 (3), pp.274-281.
Meng, J. (2015a) An Introduction to exomePeak [Online]. Available from:
http://www.bioconductor.org/packages/release/bioc/vignettes/exomePeak/inst/doc/exome
Peak-Overview.pdf (Accessed: 14 April 2016).
Meng, J. (2015b) An Introduction to Guitar Package [Online]. Available from:
http://www.bioconductor.org/packages/release/bioc/vignettes/Guitar/inst/doc/Guitar-
Overview.pdf (Accessed: 14 April 2016).
Meyer, K.D., Saletore, Yogesh., Zumbo, P., Elemento, Olivier., Mason, C.E. & Jaffrey,
S.R. (2012) ‘Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 30
UTRs and near Stop Codons’, Cell, 149, pp.1635-1646.
Shatkin, A.J. & Manley, J.L. (2000) ‘The ends of the affair: capping and
polyadenylation’, Nat Struct Biol, 7(10), pp.838-842.
The Gene Ontology Consortium (2008) ‘The Gene Ontology project in 2008’, Nucleic
Acids Res, 36 (Database issue), pp.D440-444.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H.,
Salzberg, S.L., Rinn, J.L. & Pachter, L. (2012) ‘Differential gene and transcript
expression analysis of RNA-seq experiments with TopHat and Cufflinks’, Nat. Protocols,
30. 30
7, pp.562-578.
Wojciechowski, J., Hopkins, A.M. & Upton, R.N. (2015) ‘Interactive Pharmacometric
Applications Using R and the Shiny Package’, CPT Pharmacometrics Syst. Pharmacol,
4, pp.146-159.
31. 31
Appendices (saved on a disc)
Appendix I – The scripts of all steps described in Methods
A folder called “Appendix I” contains all scripts, including:
“Script 1 (Bash script).txt”,
“Script 2 (R script).txt”,
“Script 3 (R script).txt”,
“Script 4 (Bash script).txt”,
“Script 5 (Bash script).txt”,
“Script 6 (R script).txt”,
“Script 7_UI (R script).txt”,
“Script 7_Server (R script).txt”.
Appendix II – The results from certain steps that are not shown in Results and
Discussion
A folder called “Appendix II” contains three subfolders for all steps, including:
“exomePeak”: “con_sig_diff_peak.xls” and “diff_peak.xls”;
“DAVID”: “Functional Annotation Chart.xls”;
“Cuffdiff”: “gene_exp.diff”.