Honours Project BIO303 2015/16
Final Report
Name: Jiangyuan Liu
Title: The analysis of FTO-targeted mRNA m6A methylation in mouse transcriptome
Word count: 4324
Supervisor: Dr. Jia Meng
Biological Sciences
Contents
Abstract
1. Introduction
1.1 FTO gene
1.2 MeRIP-Seq
1.3 Gene Ontology
2. Methods
3. Results and Discussion
3.1 Differential RNA methylation between wild type and FTO knockout cell lines in mouse midbrain
3.2 Differential expression between wild type and FTO knockout cell lines in mouse midbrain
3.3 Functional enrichment analysis of FTO-targeted genes
3.4 A web application developed by Shiny
4. Acknowledgement
5. References
6. Appendices
Abstract
Recent studies have found that methylation of the N6 position of adenosine (m6A) is a
common base modification on mRNAs. It has also been discovered that the demethylase
encoded by the fat mass and obesity-associated (FTO) gene is able to remove the m6A
modification from mRNA. Therefore, there might be a connection between m6A
modification and physiological functions, which may help researchers to understand how
the FTO demethylase can be associated with obesity through regulating m6A
modification. This study aims to identify the differential methylation sites between wild
type and FTO knockout cell lines in mouse midbrain, which could indicate FTO-targeted
m6A sites and genes. It found that FTO-targeted m6A sites mainly appear around the stop
codon and in the coding sequence (CDS). In addition, the analysis of differential
expression between these two cell lines found no statistically significant changes in
expression levels, indicating that FTO might not participate in the regulation of gene
expression levels. Gene Ontology (GO) enrichment analysis revealed that many functions
are associated with FTO; in particular, FTO could regulate a selective subset of mRNAs
whose biological functions are specifically related to neuronal signal transduction.
Finally, this study developed a web application using Shiny, an R package, to allow users
to customize the analysis of this study for their specific needs and extract more insight
from the data, which is helpful for exploring the physiological roles of m6A modification
on mRNAs.
Introduction
FTO gene
The fat mass and obesity-associated (FTO) gene plays an important role in regulating
energy utilization and metabolism (Fischer et al., 2009). In humans, increased FTO
expression is related to high body mass index and risk of obesity (Fawcett and Barroso,
2010). Some studies have found that the FTO demethylase is able to demethylate
N6-methyladenosine (m6A) (Figure 1), a naturally occurring adenosine modification
(Jia et al., 2011). These studies suggest that there might be a relationship between this
adenosine modification and physiological roles in human biological processes (Meyer et
al., 2012).
Figure 1 FTO catalyzes the conversion of N6-methyladenosine in mRNA to
adenosine.
MeRIP-Seq
m6A-specific methylated RNA immunoprecipitation with next generation sequencing
(MeRIP-Seq) is a powerful method for localizing m6A transcriptome-wide. One study
showed that the messenger RNAs (mRNAs) of 7,676 mammalian genes contain m6A,
which indicates that the m6A modification is widespread on mRNAs encoded by many
genes (Meyer et al., 2012). The procedure of MeRIP-Seq is as follows (Figure 2). First,
rRNA species are removed from total RNA by RiboMinus treatment. Secondly, the purified
mRNA is cut into ~100-nt-long oligonucleotides by chemical fragmentation. Thirdly, the
fragments carrying m6A are immunoprecipitated using m6A antibody-coupled Dynabeads.
Fourthly, the fragments eluted from these Dynabeads and the untreated input control
fragments are ligated to sequencing adaptors and converted to cDNA, which is then
amplified by PCR, sequenced, and aligned to a reference transcript database (Dominissini
et al., 2012). When compared with the signal from the input sample, distinct peaks in the
immunoprecipitated sample represent high enrichment of reads, indicating that these
peaks are m6A signal peaks. Since an m6A site might sit at any position along a 100-nt
fragment, the fragments that contain a given site extend up to ~100 nt on either side of
it, and their reads pile up around the site. As a result, m6A sites are represented as peaks
that are ~200 nt wide at the base and ~100 nt wide at the midpoint, which means a
resolution of about 200 nt for m6A site identification by MeRIP-Seq.
Figure 2 Outline of the MeRIP-Seq protocol
Moreover, it is necessary to explain why the untreated input control sample is
indispensable for MeRIP-Seq. Transcriptome-wide RNA methylation is under the control
of both transcriptional and enzymatic regulation. Because of differential expression, a
gene might be transcribed into more copies of its RNA under certain conditions, which
increases the absolute amount of this RNA and therefore of its methylated copies.
However, this does not indicate stronger enzymatic hypermethylation of the RNA (Meng
et al., 2014). Thus, MeRIP-Seq must include an input sample in order to estimate the
number of RNA molecules, which is influenced only by gene expression. In other words,
the input signal acts as a background for the m6A RNA immunoprecipitation signal. The
difference between them is attributed to the real enzymatic methylation.
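The role of the input sample can be illustrated with a minimal sketch. It is not the exomePeak model: the read counts are hypothetical, the function name and the pseudocount are assumptions made for illustration, and library-size normalization is ignored. It only shows that the IP/input ratio stays the same when expression changes and rises when methylation changes.

```python
import math

def ip_over_input(ip_reads, input_reads, pseudocount=1.0):
    """Log2 ratio of IP to input read counts for one region.

    The pseudocount (an assumption of this sketch) avoids taking log of zero.
    """
    return math.log2((ip_reads + pseudocount) / (input_reads + pseudocount))

# Hypothetical read counts for one region of one transcript.
baseline = ip_over_input(ip_reads=199, input_reads=99)

# Expression doubles but the methylated fraction is unchanged:
# IP and input counts grow together, so the ratio does not move.
more_expressed = ip_over_input(ip_reads=399, input_reads=199)

# Expression is unchanged but more copies are methylated:
# only the IP count grows, so the ratio rises.
more_methylated = ip_over_input(ip_reads=399, input_reads=99)

print(baseline, more_expressed, more_methylated)
```

Without the input sample, the second and third cases would both appear simply as a larger IP signal and could not be told apart.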
Gene Ontology
Gene Ontology (GO) is a major bioinformatics initiative that aims to unify the
representation of gene and gene product attributes across all species (The Gene
Ontology Consortium, 2008). An ontology represents the characteristics of detectable
things and how these things relate to each other. The ontology of defined terms
offered by the GO project describes gene product properties, which helps biologists from
different specialties to communicate and share their information. The defined terms are
divided into three domains: cellular component, molecular function and biological
process. Moreover, other organizations and databases have built their own ontology
systems similar to the GO project. These ontology systems represent different categories
whose databases focus on different aspects such as diseases and research organizations.
This study identified the differential RNA m6A sites between wild type and FTO
knockout cell lines in mouse midbrain by analyzing a MeRIP-Seq dataset downloaded
from the Gene Expression Omnibus (GEO) database, in order to determine whether FTO
controls demethylation of all m6A-modified mRNAs or of a distinct subset of these mRNAs.
The FTO-targeted genes then underwent GO enrichment analysis. In addition, this study
tested whether genes are differentially expressed between the wild type and FTO
knockout conditions, which indicates whether FTO regulates gene expression or not.
Finally, an interactive web application was developed and put online in order to
share the results of this study in a user-friendly way, which would help other
researchers to further study the physiological roles of m6A modification regulated by FTO.
Methods
This study analyzed the MeRIP-Seq dataset (GEO GSE47217), which measures the
transcriptome-wide m6A profiles for wild type and FTO knockout cell lines in mouse
midbrain (Hess et al., 2013). The Bash UNIX shell and the R system were used to analyze
this MeRIP-Seq dataset. In short, this study began by downloading the raw data from the
GEO database and then carried out read alignment, m6A site detection, the analysis of
differential methylation, the analysis of m6A mRNA topology on the 5’UTR (untranslated
region), CDS (coding sequence) and 3’UTR, transcriptome-wide m6A site
visualization, functional annotation, differential expression detection, and web
application development. The details of each step are described below.
First of all, there are three biological replicates for each of the wild type and FTO
knockout cell lines in the raw data (GEO GSE47217). Each replicate consists of an
immunoprecipitated (IP) sample and an input control sample. The raw data downloaded
from the GEO database are SRA files, which first need to be converted to FASTQ files
using a tool called “Fastq-dump” in the Bash UNIX shell. The reads for each condition in
these FASTQ files were then mapped to the reference genome with TopHat, a software
tool that addresses a limitation of Bowtie, another short-read aligner. Bowtie cannot
align reads that span introns, whereas TopHat can identify that a read spans a splice
junction and infer the possible splice sites of that junction, which increases the accuracy
of read alignment (Trapnell et al., 2012). The results of alignment were saved in twelve
bam files. The code used in this step is saved in
“Script 1 (Bash script).txt” attached to Appendix I.
Secondly, exomePeak, an R/Bioconductor package, is the central software tool of this
study. It was used to detect RNA m6A sites and to identify the differential RNA m6A
sites in terms of percentage rather than absolute amount in this case-control study. This
package is designed for the analysis of affinity-based epitranscriptome shotgun
sequencing data from MeRIP-Seq (i.e. m6A-Seq). Moreover, the exomePeak R package can
statistically analyze multiple biological replicates at the same time; it can also internally
remove PCR artifacts and multi-mapping reads (Meng, 2015a). PCR artifacts arise during
amplification rather than from the RNA fragments in the immunoprecipitated and input
control samples, so they cannot be used to represent signal peaks. Multi-mapping reads
should also be removed because they could increase the experimental error. During the
exomePeak analysis, the IP sample of each cell line was first compared with the input
sample of the same replicate to obtain the difference in the percentage of reads, so as to
detect RNA m6A peaks. These differences for the wild type and FTO knockout cell lines
were then compared with each other to identify the differential RNA m6A peaks. Since
there are three biological replicates of bam files for each of the wild type and FTO
knockout cell lines after TopHat, three datasets of differential RNA m6A peaks were
obtained. Furthermore, exomePeak screened for the consistently differentially
methylated peaks, that is, the peaks that are differentially methylated in all three of
these datasets, which indicates high confidence.
Therefore, these consistently differentially methylated peaks are likely to be the
FTO-targeted m6A peaks. The data of the consistently differentially methylated peaks and
genes were saved in both “con_sig_diff_peak.bed” and “con_sig_diff_peak.xls”. The file
“con_sig_diff_peak.xls”, which can be viewed in Microsoft Excel, is saved in a folder
called “exomePeak” attached to Appendix II.
Meanwhile, exomePeak also identified all m6A peaks. Specifically, the read counts of the
six IP samples and the six INPUT samples from the two cell lines were merged into two
uniform datasets (i.e. Uniform IP and Uniform INPUT), which were then compared with
each other to find all highly enriched peaks, indicating m6A peaks. The data of all these
detected peaks and genes were saved in both “diff_peak.bed” and “diff_peak.xls”. The file
“diff_peak.xls”, which can be viewed in Microsoft Excel, is saved in the same
“exomePeak” folder attached to Appendix II. The code used in this step is saved in
“Script 2 (R script).txt” attached to Appendix I.
Thirdly, Guitar, an R/Bioconductor package, was used to determine the distribution of
FTO-targeted m6A peaks along the coordinates of a transcript, which was then compared
with the distribution of transcriptome-wide m6A peaks. In this step,
“con_sig_diff_peak.bed” and “diff_peak.bed” obtained in step 2 contain, respectively, the
FTO-targeted m6A peaks and all the detected m6A peaks. In addition, a transcriptDb file
called “mm10.txdb”, which contains the gene annotation information, needs to be
converted to Guitar coordinates, which are required to link the transcriptomic landmarks
and genomic coordinates together (Meng, 2015b). Finally, using these three objects, a
function called “GuitarPlot” was used to generate a plot that shows the relative
frequency of m6A sites on the 5’UTR, CDS and 3’UTR for both factors (i.e. FTO-targeted
m6A sites and all the detected m6A sites). The code used in this step is saved in
“Script 3 (R script).txt” attached to Appendix I.
Fourthly, Integrative Genomics Viewer (IGV) is a high-performance visualization tool for
interactive exploration of large, integrated genomic datasets. This step made use of this
tool to visualize the FTO-targeted m6A sites and aligned bam files respectively acquired
from step 2 and step 1. The BED file called “con_sig_diff_peak.bed” can be directly
visualized in the IGV browser. Igvtools (part of IGV) was used to convert these bam files
to viewable TDF format. Finally, these generated TDF files and the BED file were
together visualized by using IGV browser. The code of the conversion from bam files to
TDF files is saved in “Script 4 (Bash script).txt” attached to Appendix I.
Fifthly, “con_sig_diff_peak.xls” contains the information on the consistently
differentially methylated sites and genes. Since FTO is known to be an m6A demethylase,
knocking out the FTO gene should result in hypermethylation of its target sites. The
symbols of the genes whose mRNAs have hypermethylated sites (i.e. differential log2 fold
change (diff.log2.fc) > 0) were extracted from “con_sig_diff_peak.xls” and underwent Gene
Ontology (GO) enrichment analysis using the DAVID functional annotation tool
(Huang et al., 2008): https://david.ncifcrf.gov/summary.jsp. The statistical method
behind this analysis is the hypergeometric test, which involves two ratios. The first ratio
is the number of genes associated with one term in a specific database divided by the
number of all genes in this database. The second ratio is the number of genes that belong
both to the same term and to the uploaded FTO-targeted gene list divided by the number
of all genes in the uploaded FTO-targeted gene list. Fold enrichment is the second ratio
divided by the first ratio, and the accompanying p-value is used to decide whether the
fold enrichment is statistically significant. DAVID output several core elements, including
the enriched terms, the subsets of FTO-targeted genes that belong to the corresponding
terms, the fold enrichment, and the p-values. The data were saved in “Functional
Annotation chart.xls”, which was saved in a folder called “DAVID” attached to Appendix II.
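The two ratios and the test described above can be written out as a short sketch. The argument names mirror the DAVID chart columns “Count”, “List Total”, “Pop Hits” and “Pop Total”, and the example numbers are hypothetical. The p-value here is a plain one-sided hypergeometric tail; DAVID itself reports a modified Fisher exact p-value (the EASE score), so its values will differ somewhat from this illustration.

```python
from math import comb

def fold_enrichment(count, list_total, pop_hits, pop_total):
    """Second ratio (term genes in the uploaded list) over first ratio (term genes in the database)."""
    return (count / list_total) / (pop_hits / pop_total)

def hypergeom_pvalue(count, list_total, pop_hits, pop_total):
    """P(X >= count) when list_total genes are drawn without replacement
    from pop_total genes, of which pop_hits are annotated with the term."""
    upper = min(list_total, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(count, upper + 1)
    )
    return tail / comb(pop_total, list_total)

# Hypothetical term: 30 of 900 uploaded genes carry it,
# against 200 of 18,000 genes in the background database.
fe = fold_enrichment(count=30, list_total=900, pop_hits=200, pop_total=18000)
p = hypergeom_pvalue(count=30, list_total=900, pop_hits=200, pop_total=18000)
print(fe, p)
```

Exact integer arithmetic from `math.comb` is used so that the sketch needs no external statistics library.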
Sixthly, Cuffdiff can calculate expression in two or more samples and test whether
changes of expression level between them are statistically significant or not (Trapnell et
al., 2012). In this step, running Cuffdiff requires six aligned bam files of input samples
from both cell lines along with the mouse reference transcriptome saved in “genes.gtf”.
All data generated by a Cuffdiff analysis was saved in a folder called “CuffdiffOut_S9”.
And the data of changes of gene expression level and p-values saved in “gene_exp.diff”
can be viewed with spreadsheet and charting programs such as Microsoft Excel. The file
“gene_exp.diff” was saved in a folder called “Cuffdiff” attached to Appendix II.
However, it is difficult to browse the global changes and trends in gene expression
between the wild type and FTO knockout conditions in a spreadsheet. CummeRbund, an
R/Bioconductor package, helps to visualize all data generated by a Cuffdiff analysis. It
brings the Cuffdiff output from the Bash UNIX shell into the R statistical computing
environment, so that other advanced statistical analysis and plotting packages can work
with the results of RNA-Seq expression analysis by Cuffdiff (Trapnell et al., 2012). A
function called “csScatter” was used to generate scatterplots from the Cuffdiff data,
which can reveal biases in gene expression between the wild type and FTO knockout
conditions. The code for the Cuffdiff analysis and for CummeRbund is saved in
“Script 5 (Bash script).txt” and “Script 6 (R script).txt” respectively, attached
to Appendix I.
Seventhly, Shiny, an R package, can be used to develop interactive web applications. It
allows researchers to build customized applications whose servers efficiently process and
analyze the user input and return the results to the user interface in real time
(Wojciechowski et al., 2015). Because exomePeak produced a large amount of data on
FTO-targeted m6A peaks and DAVID produced a large amount of GO enrichment data, it is
difficult to retrieve a particular FTO-targeted m6A peak or a particular term enriched
with FTO-targeted genes. It was therefore necessary to develop an interactive application
with Shiny to meet these requirements. Moreover, Shiny Server is a server program that
makes Shiny applications available over the web. Shiny Server was used to put this Shiny
web application online, so that it can be shared with the world. The code for the User
Interface (UI) and the Server of this web application is saved in “Script 7_UI (R
script).txt” and “Script 7_Server (R script).txt” respectively, attached to Appendix I.
In summary, Figure 3 shows a flowchart that can be helpful for readers to review the
whole procedure of this study and understand the following content.
Figure 3 The whole procedure of this study shown in the flowchart. There are nine
tools written in bold (i.e. Fastq-dump, Tophat, exomePeak, Guitar, IGV, Cuffdiff,
CummeRbund, DAVID and Shiny). All results of this study are represented in dotted
textboxes. Some significant results will be discussed in Results and Discussion; others
are attached to Appendix.
Results and Discussion
Differential RNA methylation between wild type and FTO knockout cell lines in
mouse midbrain
The MeRIP-Seq data analyzed in this study include three biological replicates for each
cell line (i.e. wild type and FTO knockout cell lines). After alignment by TopHat and
differential methylation detection by exomePeak, there are 1,132 consistently differential
m6A peaks between these two cell lines, which are saved in “con_sig_diff_peak.xls”.
Since FTO is a demethylase, diff.log2.fc is expected to be larger than 0, which would
mean that these consistently differential m6A peaks are hypermethylation peaks. Indeed,
only 3 differential m6A peaks have a diff.log2.fc of less than 0, probably because of
biological variability between replicates of the same experiment and technical variability
during library preparation and sequencing (Trapnell et al., 2012). This result therefore
accords with the fact that FTO is an m6A-specific demethylase. Moreover, there are 15,731
genes containing m6A sites in the transcriptome-wide m6A profile, which are saved
in “diff_peak.xls”. The 1,129 hypermethylation sites (i.e. FTO-targeted sites) are
distributed over the transcripts of 912 genes, which is approximately 5.8 percent of all
genes whose mRNA transcripts contain m6A sites (i.e. 15,731 genes). This result indicates
that FTO regulates a distinct subset of the genes expressed in mouse midbrain. Table 1
shows the top 20 genes that contain m6A peaks with the highest levels of enrichment.
Notably, because the resolution of m6A site detection by MeRIP-Seq is 200 nt, the area of
one peak could cover multiple individual m6A residues. Nevertheless, the peaks with high
levels of methylation indicate that their transcripts are very likely influenced by the
demethylation activity of FTO.
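The selection of FTO-targeted genes and the figures quoted above can be checked with a small sketch. The rows and the key name "gene" are hypothetical stand-ins for the contents of “con_sig_diff_peak.xls”; only the diff.log2.fc column name and the reported totals come from this study.

```python
def hypermethylated_genes(peaks):
    """Return the gene symbols that have at least one peak with diff.log2.fc > 0."""
    return {peak["gene"] for peak in peaks if peak["diff.log2.fc"] > 0}

# Hypothetical rows; the real table has 1,132 peaks.
peaks = [
    {"gene": "GeneA", "diff.log2.fc": 1.8},
    {"gene": "GeneA", "diff.log2.fc": 0.9},
    {"gene": "GeneB", "diff.log2.fc": -0.4},  # hypomethylated, so excluded
    {"gene": "GeneC", "diff.log2.fc": 2.3},
]
targets = hypermethylated_genes(peaks)

# Totals reported in the text.
hyper_peaks = 1132 - 3                             # consistent peaks minus hypomethylated ones
share_percent = round(912 / 15731 * 100, 1)        # FTO-targeted genes among m6A-containing genes
print(sorted(targets), hyper_peaks, share_percent)
```

Several peaks can fall on one gene, which is why 1,129 peaks map to only 912 genes.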
Table 1 The top 20 genes encoding transcripts with the highest degree of m6A
enrichment. Gene names are represented as gene symbols; and the positions of their m6A
peaks are defined by three parameters: chromosome, peak start site and peak end site.
Zfp36l2, the gene with the highest level of m6A enrichment, is under the regulation of
FTO to the largest extent.
Chr     Peak Start   Peak End     Gene Symbol   Fold Enrichment
chr17   84185184     84186233     Zfp36l2       209
chr19   46501647     46513192     Trim8         111
chr11   102436980    102438887    Fam171a2      89.3
chr7    96905888     96906877     Tenm4         84.4
chr11   59157755     59159128     Iba57         83.2
chr8    110956936    110962195    St3gal2       79.4
chr4    141467919    141469030    Spen          77.5
chr15   79996588     79997243     Pdgfb         75.7
chr7    45794300     45798497     Lmtk3         73.4
chr7    142081790    142083345    Dusp8         73.4
chr6    108662606    108665788    Bhlhe40       71.2
chr2    157469804    157470852    Src           70.2
chr2    160365055    160366255    Mafb          66.4
chr8    119941223    119947089    Usp10         66.3
chr10   81020572     81025377     Diras1        66
chr7    28466770     28467548     Lrfn1         63.9
chr11   116538886    116539692    Ube2o         63.7
chr2    30256998     30262502     Lrrc8a        63.1
chr19   5068613      5069507      Cd248         62.3
chr16   18151069     18152349     Rtn4r         61.9
The IGV browser can visualize the detected differential methylation sites and the aligned
bam files (i.e. the peaks of reads) (Figure 4). The peaks represent the degree of read
enrichment along the genome, and the black block represents the differential methylation
peak derived from the exomePeak analysis. Although there is an obvious increase of
methylated reads when comparing IP samples with INPUT samples, it is difficult to
identify by eye the difference among IP samples or INPUT samples in either condition.
Therefore, Figure 4 merely illustrates the basic principle, described in Methods, by which
exomePeak identifies one differential methylation site; exomePeak itself is a tool for
accurately quantifying the positions of all differential methylation sites rather than for
visualization.
Figure 4 A differential methylation site and signal peaks for the Gpr26 gene shown in
the IGV browser. Comparing the INPUT samples of the two conditions, Gpr26 is
somewhat down-regulated under the FTO knockout condition. Comparing the IP
samples of the two conditions, the percentage of methylated reads slightly increases under
the FTO knockout condition. Collectively, there is an RNA m6A hypermethylation site
spanning the start codon of Gpr26 after FTO knockout.
In addition, Guitar, an R/Bioconductor package, can visualize RNA m6A methylation
sites with regard to the landmarks of RNA transcripts, i.e. the transcription start site,
start codon, stop codon and transcription end site. These four landmarks divide an RNA
transcript into three regions: 5’UTR, CDS and 3’UTR. Figure 5 shows the relative
frequency of m6A sites in these three regions for two factors (i.e. FTO-targeted m6A
sites and transcriptome-wide m6A sites). Both FTO-targeted m6A sites and
transcriptome-wide m6A sites appear most frequently near the stop codon, which suggests
that FTO-targeted m6A sites near the stop codon could have a biological influence on
mRNA transcripts. In addition, FTO-targeted m6A sites are also concentrated in the CDS.
More precisely, within the CDS the frequency of m6A sites rises steadily along the length
of the transcript. Thus, FTO also regulates m6A RNA modification in the CDS. Some
studies suggest that FTO is likely to affect the translation of the proteins encoded by
m6A-modified mRNAs (Hess et al., 2013). However, the mechanism by which the position
and absolute number of m6A sites on mRNA transcripts could influence their translation
is still poorly understood. This mechanism could probably be associated with
mechanisms of post-transcriptional modification such as polyadenylation and capping.
Because humans and mice are both eukaryotes, transcription and translation occur
separately in the nucleus and the cytoplasm, which means that mRNA generated in the
nucleus must be exported through nuclear pores into the cytoplasm for protein synthesis.
It is known that polyadenylation increases the stability of mRNA: the longer the poly-A
tail, the more stable the mRNA. In addition, capping helps the mRNA to be recognized by
transport proteins, which then transport the mRNA from the nucleus to the cytoplasm
(Shatkin & Manley, 2000). It is possible that m6A modification could influence these two
processes in order to regulate protein expression. However, this is only a hypothesis, and
testing it will require much time and work. Nevertheless, the data on FTO-targeted m6A
sites could be very informative for specialized researchers who study the effects of the
m6A sites of particular mRNA transcripts on protein expression, and might provide
constructive information towards a systematic and sound mechanism that applies to all
genes whose mRNA transcripts contain m6A sites.
Figure 5 Enrichment of m6A across the length of mRNA transcripts for two factors
(i.e. FTO-targeted m6A sites and transcriptome-wide m6A sites). The x-axis represents
the relative positions of 3 regions: 5’UTR, CDS, and 3’UTR on the mRNA transcript.
The y-axis represents the relative frequency of m6A sites along mRNA transcripts.
Frequencies less than 1 indicate that m6A sites seldom appear at these positions.
Frequencies equal to 1 indicate that m6A sites are randomly distributed at these positions.
Frequencies greater than 1 indicate that m6A sites are relatively densely
distributed at these positions. The red and blue areas respectively show the enrichment of
FTO-targeted m6A sites and transcriptome-wide m6A sites on the mRNA transcript.
Differential expression between wild type and FTO knockout cell lines in mouse
midbrain
Cuffdiff is a powerful program that calculates expression levels in two or more samples
and tests whether changes in expression between them are statistically significant. The
wild type and FTO knockout cell lines were analyzed with Cuffdiff, and the Cuffdiff
results were visualized using CummeRbund. The “gene_exp.diff” file lists 23,352 genes
whose expression levels differ between these two conditions; however, none of these
changes is statistically significant. Figure 6 likewise shows no apparent bias in gene
expression between the two cell lines. This indicates that the FTO gene might not be
involved in the regulation of gene expression levels. Some studies found that although
FTO is able to oxidatively demethylate m3T in single-stranded DNA (ssDNA) and m3U in
single-stranded RNA, it shows low activity toward these two base modifications (Jia et
al., 2011). Moreover, DNA directly participates in transcription according to the central
dogma. Taken together, FTO might be less associated with gene expression at the
transcript level than with protein expression.
Figure 6 The difference between gene expression levels in wild type and FTO
knockout cell lines. Each dot on this figure represents one expressed gene. The x-axis
represents the value of FPKM, a normalization of gene expression level, in wild type cell
line. The greater value of FPKM indicates the higher gene expression level. The y-axis
represents the value of FPKM in FTO knockout cell line. The linear regression does not
show an obvious bias toward any cell line.
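The FPKM unit used on both axes of Figure 6 (fragments per kilobase of transcript per million mapped fragments) can be illustrated with the following sketch. It shows only the definition of the unit: Cuffdiff estimates abundances with a statistical model that assigns fragments to isoforms, and the function name and the numbers below are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # fragments / (length in kb) / (library size in millions),
    # written as a single expression: 1e3 * 1e6 = 1e9.
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical gene: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments.
value = fpkm(fragments=500, length_bp=2000, total_mapped_fragments=25_000_000)
print(value)
```

Dividing by transcript length and library size is what makes values comparable between genes and between the two cell lines.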
Functional enrichment analysis of FTO-targeted genes
The DAVID functional annotation tool carried out the GO enrichment analysis of
FTO-targeted genes. The data of the GO enrichment analysis were saved in “Functional
Annotation chart.xls”. Figure 7 shows only five representative terms associated
with FTO-targeted genes in “Functional Annotation chart.xls”: phosphoprotein
(p-value 6.62e-40), alternative splicing (p-value 6.51e-25), synapse (p-value 2.85e-13),
ion binding (p-value 5.27e-12), and neuron projection (p-value 1.17e-09). Previous work
revealed that FTO can target a subset of mRNAs that participate in neuronal function
(Hess et al., 2013). The authors found that the proteins encoded by many FTO-targeted
genes, such as GRIN1 and GNAO1, are associated with neuronal signaling pathways; in
particular, some are specifically involved in dopaminergic (DA) signaling, including
PDE1b, GNAO1, DRD3, SYN1 (synapsin I) and GIRK2 (Hess et al., 2013). DA
signaling participates in the regulation of complex behaviors, particularly food intake.
If this signaling is dysregulated, it might increase the risk of obesity because of food
addiction. Furthermore, the DA D2 receptor of the DA system is to a large
extent involved in food motivation and brain signaling in obesity (Baik, 2013). All in all,
FTO could indirectly regulate DA signaling so as to control body mass. Therefore, the
data from the GO enrichment analysis with DAVID can give some direction to
specialized researchers and help them to discover other mechanisms by which
FTO can be involved in controlling body mass.
Figure 7 Five representative terms that are enriched with FTO-targeted genes. The
x-axis represents the names of the five terms. The y-axis represents the p-value. The
smaller the p-value, the more statistically significant the association of FTO with the
corresponding term.
A web application developed by Shiny
Shiny, an R package, was used to develop an interactive web application called
“Bioinformatics”, which contains the data on differential methylation sites from
exomePeak and the GO enrichment analysis from DAVID. Although the data can be
viewed in “con_sig_diff_peak.xls” and “Functional Annotation chart.xls” using
Microsoft Excel, users often need to select and visualize only a part of the data, such as
the m6A sites on a specific gene. This interactive web application meets these
requirements and can be accessed at
http://180.208.58.19:3838/sample-apps/m6A_v2/. The application might take a little time
to launch.
Figure 8 shows the first functional interface of this web application, titled
“Gene’s m6A position”, which retrieves the m6A peak positions on FTO-targeted
genes derived from “con_sig_diff_peak.xls”. After a gene name (ENTREZ id, alias or
symbol) is entered, two results are shown: a plot and a table. The plot visualizes the
relative positions of the gene and the FTO-targeted m6A peaks along the genome, so that
users can easily see where the FTO-targeted m6A peaks (blue) lie along the gene
(orange). Moreover, one gene can have several isoforms. Since m6A peaks were initially
identified on mRNA, a peak that crosses an exon junction is divided by introns into
several blocks, which are distributed over exons once the m6A peak is transferred from
mRNA to genome coordinates. The table shows detailed information related to the plot,
including the start and end points of each m6A peak and the number, sizes and start
points of the blocks for each m6A peak. The start point of the first block of each m6A
peak defaults to 0.
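The block columns in this table can be turned back into genomic intervals with a short sketch. It assumes that the blocks follow the usual BED convention, in which block start points are offsets from the peak start (which is why the first block starts at 0). The function name and the coordinates are hypothetical.

```python
def block_intervals(peak_start, block_sizes, block_starts):
    """Convert block sizes and relative block starts into genomic (start, end) pairs."""
    return [
        (peak_start + offset, peak_start + offset + size)
        for size, offset in zip(block_sizes, block_starts)
    ]

# Hypothetical peak that spans one intron: two blocks of 120 nt and 80 nt,
# the second beginning 620 nt downstream of the peak start.
blocks = block_intervals(peak_start=1000, block_sizes=[120, 80], block_starts=[0, 620])
print(blocks)

# The peak covers 700 nt of genome but only 200 nt of mRNA.
genomic_span = blocks[-1][1] - blocks[0][0]
exonic_length = sum(end - start for start, end in blocks)
```

The difference between the genomic span and the summed block length corresponds to the intron that separates the two blocks.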
Figure 8 The first functional interface for retrieval of m6A peak positions on FTO-
targeted genes. The notes in red give some details about how to use this function and
understand the results in the plot and table.
Figure 9 shows the second functional interface of this web application, titled
“DAVID”, which retrieves the terms enriched with FTO-targeted genes
derived from “Functional Annotation Chart.xls”. After a term that can be
regarded as a gene product property is entered, a table is shown. In this table, the
category is the original database the term comes from. Following the hypergeometric test,
fold enrichment is calculated from the values of “Count”, “List Total”, “Pop Hits” and
“Pop Total” in the table (fold enrichment = (“Count” / “List Total”) / (“Pop Hits” / “Pop
Total”)). Fold enrichment represents how strongly FTO is implicated in the terms users
are interested in. In addition, a p-value of less than 0.05 means that the calculated
fold enrichment is statistically significant.
Figure 9 The second functional interface for retrieval of terms enriched with FTO-
targeted genes. The notes in red explain some details about how to use this function and
understand the results in the table.
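The fold enrichment formula above can be checked by hand for any row of the table. The following Python sketch is an illustration, not the code of the application: it computes fold enrichment from the four columns and, as an assumption about how the hypergeometric test could be applied, the upper-tail probability of seeing at least “Count” genes. The function names and example numbers are hypothetical, and the p-value reported by DAVID may be computed with a more conservative variant, so the two values need not agree exactly.

```python
from math import comb


def fold_enrichment(count, list_total, pop_hits, pop_total):
    """(Count / List Total) / (Pop Hits / Pop Total)."""
    return (count / list_total) / (pop_hits / pop_total)


def hypergeom_upper_tail(count, list_total, pop_hits, pop_total):
    """P(X >= count) when list_total genes are drawn without replacement
    from pop_total genes, pop_hits of which are annotated with the term."""
    upper = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(count, upper + 1)
    )
    return favourable / comb(pop_total, list_total)


# Hypothetical row: 8 of 64 listed genes carry a term that
# 128 of 4096 background genes carry.
print(fold_enrichment(8, 64, 128, 4096))
print(hypergeom_upper_tail(8, 64, 128, 4096))
```

A fold enrichment above 1 means the term is more frequent in the FTO-targeted gene list than in the background, and the tail probability indicates how likely such an excess would be by chance.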
However, this web application can still be improved so that users can further customize
the analysis of this study for their specific needs and extract more insight from the
data. Two new functions could be added to the second functional interface. With the
first, users would input a category, such as a database on Type I diabetes, and the
server would output the most statistically significant terms, that is, the terms with
the smallest p-values. The number of terms returned would be adjustable by the user. As
a result, specialized researchers would only need to focus on the terms most relevant
to their own fields. For example, a researcher studying Type I diabetes could use this
function to find out whether Type I diabetes is related to FTO through some terms. If
so, the researcher could refer to the most statistically significant terms returned by
this function and design related experiments to discover how FTO could be associated
with Type I diabetes by regulating these terms. With the second, users would input a
gene list, and the server would output which of these genes are FTO-targeted genes and
the terms to which the matched genes belong. For example, a researcher may have found
genes that regulate a certain signaling pathway that might be associated with body
mass. The researcher could upload this gene list to learn whether any of the genes are
FTO-targeted, which would help in understanding how this signaling pathway is regulated
with regard to FTO-dependent demethylation.
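The two proposed functions can be sketched as simple filters over the rows of the functional annotation table. The Python code below is a minimal illustration under stated assumptions, not part of the existing application, which is written in R with Shiny. It assumes each row is a record with “Category”, “Term”, “PValue” and “Genes” fields, with the genes stored as a comma-separated string. The function names, field names and placeholder rows are all hypothetical.

```python
def top_terms(rows, category, n=5):
    """Proposed function 1: the n terms of one category with the smallest p-values."""
    selected = [row for row in rows if row["Category"] == category]
    return sorted(selected, key=lambda row: row["PValue"])[:n]


def match_genes(rows, gene_list):
    """Proposed function 2: map each query gene that is FTO-targeted to its terms."""
    query = {gene.strip().upper() for gene in gene_list}
    hits = {}
    for row in rows:
        for gene in row["Genes"].split(","):
            gene = gene.strip().upper()
            if gene in query:
                hits.setdefault(gene, []).append(row["Term"])
    return hits


# Placeholder rows for illustration only; these are not results of this study.
demo_rows = [
    {"Category": "CAT_1", "Term": "term_a", "PValue": 0.03, "Genes": "GENE1, GENE2"},
    {"Category": "CAT_1", "Term": "term_b", "PValue": 0.001, "Genes": "GENE2, GENE3"},
    {"Category": "CAT_2", "Term": "term_c", "PValue": 0.2, "Genes": "GENE4"},
]

print([row["Term"] for row in top_terms(demo_rows, "CAT_1", n=1)])
print(match_genes(demo_rows, ["gene2", "GENE9"]))
```

In a Shiny implementation, the category, the number of terms and the gene list would come from user inputs, and the returned rows would be rendered as a table.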
Acknowledgement
The author appreciates the supervision and guidance of Dr. Jia Meng, Department of
Biological Sciences, XJTLU.
References
Baik, J. (2013) ‘Dopamine signaling in food addiction: role of dopamine D2 receptors’,
BMB Rep, 46(11), pp.519-526.
Dominissini, D. et al. (2012) ‘Topology of the human and mouse m6A RNA methylomes
revealed by m6A-seq’, Nature, 485, pp.201-206.
Fawcett, K.A. & Barroso, I. (2010) ‘The genetics of obesity: FTO leads the way’, Trends
Genet, 26, pp.266-274.
Fischer, J., Koch, L., Emmerling, C., Vierkotten, J., Peters, T., Brüning, J.C. & Rüther,
U. (2009) ‘Inactivation of the Fto gene protects from obesity’, Nature, 458, pp.894-898.
Hess, M.E., Hess, S., Meyer, K.D., Verhagen, L.A., Koch, L., Brönneke, H.S., Dietrich,
M.O., Jordan, S.D., Saletore, Y., Elemento, O., Belgardt, B.F., Franz, T., Horvath, T.L.,
Rüther, U., Jaffrey, S.R., Kloppenburg, P. & Brüning, J.C. (2013) ‘The fat
mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain
circuitry’, Nature Neuroscience, 16, pp.1042-1048.
Huang, D.W., Sherman, B.T. & Lempicki, R.A. (2008) ‘Systematic and integrative
analysis of large gene lists using DAVID bioinformatics resources’, Nat. Protocols, 4,
pp.44-57.
Jia, G., Fu, Y., Zhao, X., Dai, Q., Zheng, G., Yang, Y., Yi, C., Lindahl, T., Pan, T., Yang,
Y.G. et al. (2011) ‘N6-methyladenosine in nuclear RNA is a major substrate of the
obesity-associated FTO’, Nat. Chem. Biol, 7, pp.885-887.
Meng, J., Lu, Z., Liu, H., Zhang, L., Zhang, S., Chen, Y., Rao, M.K. & Huang, Y. (2014)
‘A protocol for RNA methylation differential analysis with MeRIP-Seq data and
exomePeak R/Bioconductor package’, Methods, 69 (3), pp.274-281.
Meng, J. (2015a) An Introduction to exomePeak [Online]. Available from:
http://www.bioconductor.org/packages/release/bioc/vignettes/exomePeak/inst/doc/exomePeak-Overview.pdf
(Accessed: 14 April 2016).
Meng, J. (2015b) An Introduction to Guitar Package [Online]. Available from:
http://www.bioconductor.org/packages/release/bioc/vignettes/Guitar/inst/doc/Guitar-Overview.pdf
(Accessed: 14 April 2016).
Meyer, K.D., Saletore, Y., Zumbo, P., Elemento, O., Mason, C.E. & Jaffrey,
S.R. (2012) ‘Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′
UTRs and near Stop Codons’, Cell, 149, pp.1635-1646.
Shatkin, A.J. & Manley, J.L. (2000) ‘The ends of the affair: capping and
polyadenylation’, Nat Struct Biol, 7(10), pp.838-842.
The Gene Ontology Consortium (2008) ‘The Gene Ontology project in 2008’, Nucleic
Acids Res, 36 (Database issue), pp.D440-444.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H.,
Salzberg, S.L., Rinn, J.L. & Pachter, L. (2012) ‘Differential gene and transcript
expression analysis of RNA-seq experiments with TopHat and Cufflinks’, Nat. Protocols,
7, pp.562-578.
Wojciechowski, J., Hopkins, A.M. & Upton, R.N. (2015) ‘Interactive Pharmacometric
Applications Using R and the Shiny Package’, CPT Pharmacometrics Syst. Pharmacol,
4, pp.146-159.
Appendices (saved in a disc)
Appendix I – The scripts of all steps described in Methods
A folder called “Appendix I” contains all scripts, including:
“Script 1 (Bash script).txt”,
“Script 2 (R script).txt”,
“Script 3 (R script).txt”,
“Script 4 (Bash script).txt”,
“Script 5 (Bash script).txt”,
“Script 6 (R script).txt”,
“Script 7_UI (R script).txt”,
“Script 7_Server (R script).txt”.
Appendix II – The results from certain steps that are not shown in Results and
Discussion
A folder called “Appendix II” contains three subfolders for all steps, including:
“exomePeak”: “con_sig_diff_peak.xls” and “diff_peak.xls”;
“DAVID”: “Functional Annotation chart.xls”;
“Cuffdiff”: “gene_exp.diff”.

Liu_Jiangyuan_1201662_FR

  • 1. 1 Honours Project BIO303 2015/16 Final Report Name: Jiangyuan Liu Title: The analysis of FTO-targeted mRNA m6 A methylation in mouse transcriptome Word count: 4324 Supervisor: Dr. Jia Meng Biological Sciences
  • 2. 2 Content Abstract 3 1. Introduction 4 1.1 FTO gene 4 1.2 MeRIP-Seq 5 1.3 Gene Ontology 7 2. Methods 8 3. Results and Discussion 15 3.1 Differential RNA methylation between wild type and FTO knockout cell lines in mouse midbrain 15 3.2 Differential expression between wild type and FTO knockout cell lines in mouse midbrain 20 3.3 Functional enrichment analysis of FTO-targeted genes 22 3.4 A web application developed by Shiny 24 4. Acknowledgement 27 5. References 27 6. Appendices 31
  • 3. 3 Abstract Recent studies have found that methylation of the N6 position of adenosine (m6A) is a common base modification on mRNAs. Meanwhile, it has discovered that the demethylase encoded from fat mass and obesity-associated (FTO) gene is able to carry out m6A modification on mRNA. Therefore, there might be a connection between m6A modification and physiological functions, which may give researchers some information to understand the mechanisms about how FTO demethylase can be associated with obesity through regulating m6A modification. Here, this study aims to identify the differential methylation sites between wild type and FTO knockout cell lines in mouse midbrain, which could indicate FTO-targeted m6A sites and genes. This study has found that FTO-targeted m6A sites mainly appear around stop codon and on coding sequence (CDS). In addition, the analysis of differential expression between these two cell lines has found that there are no statistically significantly changes of expression levels, indicating that FTO might not participate in regulation of gene expression levels. Gene Ontology (GO) enrichment analysis has revealed that there are many functions which have an association with FTO; particularly, FTO could regulate a selective subset of mRNAs whose biological functions are specifically related to neuronal signal transduction. Finally, this study developed a web application by using Shiny, a R package, in order to allow users to customize the analysis of this study for their specific needs and extract more insight from the data, which is helpful to explore the physiological roles of m6A modification on mRNAs.
  • 4. 4 Introduction FTO gene The fat mass and obesity-associated (FTO) gene plays an important role in regulating energy utilization and metabolism (Fischer et al., 2009). As for human beings, the increase of FTO expression is related to the high body mass index and risk for obesity (Fawcett and Barroso, 2010). Some studies have found that FTO demethylase has the ability to demethylate N6-methyladenosine (m6A) (Figure 1), which is the adenosine modification without artificial manipulation (Jia et al., 2011). These studies revealed that there might be a relationship between adenosine modification and its physiological roles, which participate in human biological process (Meyer et al., 2012). Figure 1 FTO catalyzes the conversion of N6-methyladenosine in mRNA to adenosine.
  • 5. 5 MeRIP-Seq m6A-specific methylated RNA immunoprecipitation with next generation sequencing (MeRIP-Seq) is powerful method, which can be used to localize transcriptome-wide m6A. Some studies showed that 7,676 mammalian genes’ messenger RNA (mRNA) contain m6A, which indicates that m6A modification is widespread on mRNAs encoded from many genes (Meyer et al., 2012). Here, it is necessary to introduce the procedure of MeRIP-Seq (Figure 2). First of all, rRNA species should be removed from total RNA by RiboMinus treatment. Secondly, purified mRNA is cut into ~100-nt-long oligonucleotides by chemical fragmentation. Thirdly, those reads with m6A could be immunoprecipitated by using m6A antibody-coupled Dynabeads. Fourthly, the reads eluted from these Dynabeads and untreated input control reads are ligated to sequencing adaptors and then converted to cDNA, which is then amplified by PCR in order to be sequenced and aligned to a reference transcript database (Dominissini et al., 2012). After compared with m6A signals from input sample, some distinct peaks from immunoprecipitated sample represent high enrichment of reads, indicating that these peaks are m6A signal peaks. Since an m6A site might sit on any position along a 100nt fragment, it is necessary to make the m6A site close to a center. As a result, m6A sites can be represented as peaks whose base and midpoint are ~200nt and ~100nt wide respectively, which means a resolution of 200 nt for m6A sites identification by MeRIP-Seq.
  • 6. 6 Figure 2 Outline of MeRIP-SeqProtocol Moreover, it is necessary to explain why the untreated input control sample is indispensable for MeRIP-Seq. In fact, the transcriptome-wide RNA methylation is under the control of both transcriptional and enzymatic regulations. It is possible that because of differential expression, one gene might transcript more copies of its RNA under certain conditions, which causes an increase of the absolute amount of this RNA. However, this cannot indicate a stronger RNA enzymatic hypermethylation (Meng et al., 2014). Thus, MeRIP-Seq must include an input sample in order to estimate the number of RNA molecules, which is only influenced by gene expression. In other words, the input signal seems like a background compared to m6A RNA immunoprecipitation signal. The
  • 7. 7 difference between them is attributed to the real enzymatic methylation. Gene Ontology Gene ontology (GO) is a major innovative project of bioinformatics in order to unify the characteristics of gene and gene product attributes existing in all species (The Gene Ontology Consortium, 2008). “Ontologies” represent the characteristics of detectable things, and how these things interact with each other. An ontology of defined terms offered by GO project represents gene product properties, which is helpful for different specialized biologists to communicate and share their information. There are three domains in this ontology, including cellular component, molecular function and biological process. And the defined terms are divided into these three domains. Moreover, other organizations or databases also have built their own ontology systems as same as GO project. These ontology systems represent different categories whose databases focus on different aspects such as diseases and research organizations. Here, this study identified the differential RNA m6A sites between wild type and FTO knockout cell lines in mouse midbrain by analyzing a MeRIP-Seq dataset downloaded from Gene Expression Omnibus (GEO) database in order to know whether FTO controls demethylation of all m6A-modified mRNAs or a distinct subset of these mRNAs. Then those FTO-targeted genes underwent GO enrichment analysis. In addition, this study also found out whether differentially expressed genes exist under wild type condition and FTO knockout condition, indicating whether FTO regulates gene expression or not. Eventually, an interactive web application was developed and then put online in order to
  • 8. 8 user-friendly share the results of this study everywhere, which would be helpful for other researchers to further study physiological roles of m6A modification regulated by FTO. Methods This study analyzed the MeRIP-Seq dataset (GEO GSE47217), which measures the transcriptome-wide m6A profiles for wild type and FTO knockout cell lines in mouse midbrain (Hess et al., 2013). The Bash UNIX Shell and R system were used to analyze this MeRIP-Seq dataset. In short, this study began from downloading raw data from GEO database, then carried out reads alignement, m6A site detection, the analysis of differential methylation, the analysis of m6A mRNA topology on 3’UTR (Untranslated Regions), CDS (Coding Sequence) and 5’UTR, transcriptome-wide m6A site visualization, functional annotation, differentially expression detection, web application development. The detailed of each step are described in the following content. First of all, there are three biological replicates for each of wild type and FTO knockout cell lines in the raw data (GEO GSE47217). Each replicate consists of an immunoprecipitated (IP) sample and an input control sample. The raw data downloaded from GEO database are SRA files, which first need to be converted to FASTQ files by using a tool called “Fastq-dump” on the Bash UNIX Shell system. Then the reads for each condition in these FASTQ files were mapped to the reference genome with TopHat, which is a powerful software tool that can address the limitation of Bowtie, another short- read aligner. It is incapable for Bowtie to align reads that span introns. However, TopHat has the ability to identify that a read spans a splice junction and possible junction’s splice
  • 9. 9 sites, which increase the accuracy of reads alignment (Trapnell et al., 2012). The results of alignment were saved in twelve bam files. The code used in this step is saved in “Script 1 (Bash script).txt” attached to Appendix I. Secondly, exomePeak, a R/Bioconductor package, is the vitally software tool of this study, which was used to detect RNA m6A sites and further identify the differential RNA m6A sites in term of percentage rather the absolute amount in this case control study. This package is designed for the analysis of affinity-based epitranscriptome shortgun sequencing data from MeRIP-seq (i.e. m6A-Seq). Moreover, exomePeak R-package can statistically analyze multiple biological replicates at the same time; it can also internally remove PCR artifacts and multi-mapping reads (Meng, 2015a). Since PCR artifacts are not derived from immunoprecipitated sample and input control sample, they cannot be used to represent signal peaks. Multi-mapping reads should also be removed because they could increase the experimental errors. In the course of the analysis of exomePeak, the IP samples of both cell lines were firstly compared with their input samples in the common replicates to acquire the difference of percentage of reads number so as to detect RNA m6A peaks. Then these two differences of percentage of reads number from wild type and FTO knockout cell lines were compared with each other to identify the differential RNA m6A peaks. Since there are three biological replicates of bam files for each of wild type and FTO knockout cell lines after TopHat, three datasets of the differential RNA m6A peaks were acquired. Furthermore, exomePeak screened all the consistently differentially methylated peaks. In other words, there are peaks that are consistently differentially methylated in all these three datasets acquired previously, indicating highly confidence.
  • 10. 10 Therefore, FTO-targeted m6A peaks could be these consistently differentially methylated peaks. The data of consistently differentially methylated peaks and genes were saved in both “con_sig_diff_peak.bed” and “con_sig_diff_peak.xls”. Microsoft Excel can view “con_sig_diff_peak.xls” saved in a folder called “exomePeak” attached to Appendix II. Meanwhile, exomePeak also identified all m6A peaks. Specifically, the data of the numbers of each read in six IP samples and six INPUT samples from two cell lines are respectively integrated to two uniform datasets (i.e. Uniform IP and Uniform INPUT). Then they compared with each other to find out all highly enriched peaks, indicating m6A peaks. The data of all these detected peaks and genes were saved in both “diff_peak.bed” and “diff_peak.xls”. Microsoft Excel can view “diff_peak.xls” saved in a folder called “exomePeak” attached to Appendix II. The code used in this step is saved in “Script 2 (R script).txt” attached to Appendix I. Thirdly, Guitar, a R/Bioconductor package, was used to detect the distribution of FTO- targeted m6A peaks on the coordinate of a transcript, which then was compared with the distribution of transcriptome-wide m6A peaks. In this step, “con_sig_diff_peak.bed” and “diff_peak.bed” acquired from step 2 respectively contain the FTO-targeted m6A peaks and all the detected m6A peaks. Meanwhile, it also needs to convert a transcriptDb file called “mm10.txdb” that contains the gene annotation information to Guitar coordiantes, which is required to link the transcriptomic landmarks and genomic coordinates together (Meng, 2015b). Finally, by using these three objects, a function called “GuitarPlot” was used to generate a plot, which shows the relative frequency of m6A sites on 5’UTR (Untranslated Regions), CDS (Coding Sequence) and 3’UTR for both factors (i.e. FTO-
  • 11. 11 targeted m6A sites and all the detected m6A sites). The code used in this step is saved in “Script 3 (R script).txt” attached to Appendix I. Fourthly, Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. This step made use of this tool to visualize the FTO-targeted m6A sites and aligned bam files respectively acquired from step 2 and step 1. The BED file called “con_sig_diff_peak.bed” can be directly visualized in the IGV browser. Igvtools (part of IGV) was used to convert these bam files to viewable TDF format. Finally, these generated TDF files and the BED file were together visualized by using IGV browser. The code of the conversion from bam files to TDF files is saved in “Script 4 (Bash script).txt” attached to Appendix I. Fifthly, “con_sig_diff_peak.xls” contains the information of the consistently differentially methylated sites and genes. Since it is known that FTO is a m6A demethylase, knocking out FTO gene should result in hypermethylation of its target sites. The symbols of those genes whose mRNAs have hypermethylation sites (i.e. differential log2 fold change (diff.log2.fc) > 0) were extracted from “con_sig_diff_peak.xls” and underwent Gene Ontology (GO) enrichment analysis by using the DAVID functional annotation tool (Huang et al., 2008): https://david.ncifcrf.gov/summary.jsp. Actually, hypergeometric test is the statistical method behind this analysis. Specifically, there are two ratios participating in this hypergeometric test. The first ratio is the number of genes associated with one term in a specific database divided by the number of all genes in this database. The second ratio is the number of genes that simultaneously belong to the same term and
  • 12. the uploaded FTO-targeted gene list divided by the number of all genes in the uploaded FTO-targeted gene list. Fold enrichment is the second ratio divided by the first ratio, and the accompanying p-value indicates whether that fold enrichment is statistically significant. DAVID output several core elements: the enriched terms, the subsets of FTO-targeted genes that belong to each term, the fold enrichment and the p-values. These data were saved in “Functional Annotation chart.xls”, stored in the folder “DAVID” in Appendix II.

Sixthly, Cuffdiff calculates expression in two or more samples and tests whether the changes in expression level between them are statistically significant (Trapnell et al., 2012). Running Cuffdiff here required the six aligned BAM files of the INPUT samples from both cell lines, along with the mouse reference transcriptome saved in “genes.gtf”. All data generated by the Cuffdiff analysis were saved in the folder “CuffdiffOut_S9”. The changes in gene expression level and their p-values, saved in “gene_exp.diff”, can be viewed with spreadsheet and charting programs such as Microsoft Excel; this file is stored in the folder “Cuffdiff” in Appendix II. However, global changes and trends in gene expression between the wild type and FTO knockout conditions are hard to see in such a table. CummeRbund, an R/Bioconductor package, helps by loading all data generated by a Cuffdiff analysis from the Unix shell environment into the R statistical computing environment, where other statistical analysis and plotting packages can work with the Cuffdiff results (Trapnell et al., 2012). A function called “csScatter” was used to generate scatterplots by
  • 13. using Cuffdiff data, which can reveal biases in gene expression between the wild type and FTO knockout conditions. The code for the Cuffdiff analysis and for CummeRbund is saved in “Script 5 (Bash script).txt” and “Script 6 (R script).txt”, respectively, in Appendix I.

Seventhly, Shiny, an R package, is used to build interactive web applications whose servers process user input and return results to the user interface in real time (Wojciechowski et al., 2015). The large volume of FTO-targeted m6A peaks from exomePeak and of GO enrichment results from DAVID makes it difficult to look up a particular FTO-targeted m6A peak or a particular term enriched with FTO-targeted genes. An interactive application was therefore developed with Shiny to support these lookups. Shiny Server, a server program that makes Shiny applications available over the web, was then used to put the application online. The code for the user interface (UI) and the server of this web application is saved in “Script 7_UI (R script).txt” and “Script 7_Server (R script).txt”, respectively, in Appendix I.

In summary, Figure 3 shows a flowchart of the whole procedure of this study as a guide to the following content.
  • 14. Figure 3 Flowchart of the whole procedure of this study. The nine tools are written in bold (i.e. Fastq-dump, TopHat, exomePeak, Guitar, IGV, Cuffdiff, CummeRbund, DAVID and Shiny). All results of this study are shown in dotted textboxes. Some significant results are discussed in Results and Discussion; the others are provided in the Appendices.
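The enrichment statistic described in step 5 of Methods can be sketched in a few lines of Python. This is a minimal illustration, not the code DAVID runs: the function names and the example counts are hypothetical, and DAVID itself reports a more conservative variant of this test (its EASE score), so its p-values will not match this plain hypergeometric tail exactly.

```python
from math import comb


def fold_enrichment(count, list_total, pop_hits, pop_total):
    """Second ratio (term genes in the uploaded list) over first ratio (term genes in the database)."""
    return (count / list_total) / (pop_hits / pop_total)


def hypergeom_pvalue(count, list_total, pop_hits, pop_total):
    """Upper-tail probability of drawing at least `count` term genes.

    Assumes list_total genes are drawn without replacement from pop_total
    genes, of which pop_hits carry the term.
    """
    upper = min(list_total, pop_hits)
    total = comb(pop_total, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(count, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to illustrate the arithmetic.
fe = fold_enrichment(count=30, list_total=900, pop_hits=200, pop_total=15000)
p = hypergeom_pvalue(count=30, list_total=900, pop_hits=200, pop_total=15000)
print(f"fold enrichment = {fe:.2f}, p = {p:.3g}")
```

Exact integer binomials are used so that the sketch needs only the standard library; for large gene lists a statistics package would be faster.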
  • 15. Results and Discussion

Differential RNA methylation between wild type and FTO knockout cell lines in mouse midbrain

The MeRIP-Seq data analyzed in this study include three biological replicates for each cell line (i.e. wild type and FTO knockout). After alignment with TopHat and differential methylation detection with exomePeak, 1,132 consistently differential m6A peaks were found between the two cell lines; they are saved in “con_sig_diff_peak.xls”. Since FTO is a demethylase, diff.log2.fc is expected to be larger than 0 for its targets, meaning that these consistently differential m6A peaks are hypermethylated after knockout. Indeed, only 3 differential m6A peaks have a diff.log2.fc below 0, probably because of biological variability between replicates of the same experiment and technical variability during library preparation and sequencing (Trapnell et al., 2012). This result is consistent with FTO being an m6A-specific demethylase. Moreover, the transcriptome-wide m6A profile saved in “diff_peak.xls” contains 15,731 genes with m6A sites. The 1,129 hypermethylated sites (i.e. FTO-targeted sites) are distributed over the transcripts of 912 genes, approximately 5.8 percent of all genes whose mRNA transcripts contain m6A sites (912 of 15,731). This result suggests that FTO plays an important role in regulating a subset of the genes expressed in mouse midbrain. Table 1 shows the top 20 genes that contain m6A peaks with the highest levels of enrichment. Notably, because the resolution of m6A site detection by MeRIP-Seq is about 200 nt, one peak may cover multiple individual m6A residues. Nevertheless, peaks with high levels of methylation indicate that their transcripts are very likely influenced by the demethylation activity of FTO.
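The selection of FTO-targeted genes and the proportion quoted above can be reproduced with a short Python sketch. The toy peak records and the “gene” key are hypothetical stand-ins for rows of “con_sig_diff_peak.xls”; only the “diff.log2.fc” name and the reported totals come from the text.

```python
def hypermethylated_genes(peaks):
    """Return the gene symbols that have at least one peak with diff.log2.fc > 0."""
    return {peak["gene"] for peak in peaks if peak["diff.log2.fc"] > 0}


# Hypothetical records standing in for rows of the exomePeak output table.
toy_peaks = [
    {"gene": "GeneA", "diff.log2.fc": 1.8},
    {"gene": "GeneA", "diff.log2.fc": 0.9},   # second peak on the same gene
    {"gene": "GeneB", "diff.log2.fc": -0.4},  # hypomethylated, so excluded
    {"gene": "GeneC", "diff.log2.fc": 2.3},
]
targets = hypermethylated_genes(toy_peaks)

# Totals reported in this section.
hyper_peaks = 1132 - 3      # consistent peaks minus those with diff.log2.fc < 0
fraction = 912 / 15731      # FTO-targeted genes over all m6A-containing genes

print(sorted(targets), hyper_peaks, f"{fraction:.1%}")
```

Collecting symbols in a set means a gene with several hypermethylated peaks is counted once, which is why 1,129 peaks can map to only 912 genes.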
  • 16. Table 1 The top 20 genes encoding transcripts with the highest degree of m6A enrichment. Gene names are given as gene symbols, and the position of each m6A peak is defined by three parameters: chromosome, peak start site and peak end site. Zfp36l2, the gene with the highest level of m6A enrichment, is under the regulation of FTO to the largest extent.

Chr     Peak Start   Peak End     Gene Symbol   Fold Enrichment
chr17   84185184     84186233     Zfp36l2       209
chr19   46501647     46513192     Trim8         111
chr11   102436980    102438887    Fam171a2      89.3
chr7    96905888     96906877     Tenm4         84.4
chr11   59157755     59159128     Iba57         83.2
chr8    110956936    110962195    St3gal2       79.4
chr4    141467919    141469030    Spen          77.5
chr15   79996588     79997243     Pdgfb         75.7
chr7    45794300     45798497     Lmtk3         73.4
chr7    142081790    142083345    Dusp8         73.4
chr6    108662606    108665788    Bhlhe40       71.2
chr2    157469804    157470852    Src           70.2
chr2    160365055    160366255    Mafb          66.4
chr8    119941223    119947089    Usp10         66.3
chr10   81020572     81025377     Diras1        66
chr7    28466770     28467548     Lrfn1         63.9
chr11   116538886    116539692    Ube2o         63.7
chr2    30256998     30262502     Lrrc8a        63.1
chr19   5068613      5069507      Cd248         62.3
chr16   18151069     18152349     Rtn4r         61.9

The IGV browser can visualize the detected differential methylation sites together with the aligned BAM files (i.e. the peaks of reads) (Figure 4). The peaks represent the degree of read enrichment along the genome, and the black block represents the differential methylation peak derived from the exomePeak analysis. Although methylated reads clearly increase when IP samples are compared with INPUT samples, it is difficult to identify by eye the differences among IP samples or among INPUT samples in either condition.
  • 17. Therefore, Figure 4 merely illustrates the basic principle, described in Methods, by which exomePeak identifies a differential methylation site; exomePeak itself is a tool for quantifying the positions of all differential methylation sites rather than for visualization.

Figure 4 A differential methylation site and signal peaks for the Gpr26 gene shown in the IGV browser. Comparing the INPUT samples of the two conditions, Gpr26 is somewhat down-regulated under the FTO knockout condition. Comparing the IP samples of the two conditions, the percentage of methylated reads increases slightly under the FTO knockout condition. Collectively, there is an RNA m6A hypermethylation site spanning the start codon of Gpr26 after FTO knockout.
  • 18. In addition, Guitar, an R/Bioconductor package, can visualize RNA m6A methylation sites relative to the landmarks of RNA transcripts, i.e. the transcription start site, start codon, stop codon and transcription end site. These four landmarks divide an RNA transcript into three regions: 5’UTR, CDS and 3’UTR. Figure 5 shows the relative frequency of m6A sites in these three regions for two factors (i.e. FTO-targeted m6A sites and transcriptome-wide m6A sites). Both appear most frequently near the stop codon, which suggests that FTO-targeted m6A sites near the stop codon could have a biological influence on mRNA transcripts. FTO-targeted m6A sites are also concentrated in the CDS, where the frequency of m6A sites rises steadily along the transcript. Thus, FTO also regulates m6A RNA modification in the CDS. Some studies indicate that FTO is likely to affect the translation of proteins encoded by m6A-modified mRNAs (Hess et al., 2013). However, how the position and absolute number of m6A sites on mRNA transcripts influence their translation is still poorly understood. This mechanism is probably associated with post-transcriptional modifications such as polyadenylation and capping. Because human and mouse are both eukaryotes, transcription occurs in the nucleus and translation in the cytoplasm, so mRNA generated in the nucleus must be exported through nuclear pores into the cytoplasm for protein synthesis. Polyadenylation is known to increase the stability of mRNA: the longer the poly-A tail, the more stable the mRNA. In addition, capping helps the mRNA to be recognized by transport proteins, which then move it from the nucleus to the cytoplasm (Shatkin & Manley, 2000). It is possible that m6A modification could
  • 19. influence these two processes for the purpose of regulating protein expression. However, this is only a hypothesis, and considerable further work is needed to test it. Nevertheless, the data on FTO-targeted m6A sites could help specialized researchers study how the m6A sites on particular mRNA transcripts affect protein expression, which might in turn contribute to a systematic mechanism that holds for all genes whose mRNA transcripts contain m6A sites.

Figure 5 Enrichment of m6A across the length of mRNA transcripts for two factors (i.e. FTO-targeted m6A sites and transcriptome-wide m6A sites). The x-axis represents the relative positions of the three regions (5’UTR, CDS and 3’UTR) on the mRNA transcript. The y-axis represents the relative frequency of m6A sites along mRNA transcripts. The
  • 20. frequencies less than 1 indicate that m6A sites seldom appear at these positions. Frequencies equal to 1 indicate that m6A sites are randomly distributed over these positions. Frequencies greater than 1 indicate that m6A sites are relatively densely distributed at these positions. The red and blue areas show the enrichment of FTO-targeted m6A sites and transcriptome-wide m6A sites on the mRNA transcript, respectively.

Differential expression between wild type and FTO knockout cell lines in mouse midbrain

Cuffdiff is a powerful program that calculates expression levels in two or more samples and tests whether the changes in expression between them are statistically significant. The wild type and FTO knockout cell lines were analyzed with Cuffdiff, and the result was visualized with CummeRbund. The “gene_exp.diff” file lists 23,352 genes whose expression levels differ between the two conditions; however, none of these changes is statistically significant. Figure 6 likewise shows no apparent bias in gene expression between the two cell lines. This indicates that the FTO gene might not be involved in regulating gene expression levels. Some studies found that although FTO can oxidatively demethylate m3T in single-stranded DNA (ssDNA) and m3U in single-stranded RNA, it shows low activity toward these two base modifications (Jia et al., 2011). Moreover, DNA is the direct template of transcription under the central dogma of genetics, so weak activity on DNA modifications gives FTO little means to alter transcription. Taken together, FTO appears to be less associated with gene expression at the transcript level than with protein expression.
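Checking how many genes Cuffdiff flags as significant amounts to reading a tab-delimited table and counting one column. The Python sketch below assumes that “gene_exp.diff” has a header row with a “significant” column holding “yes” or “no”, as Cuffdiff output usually does; the three rows are hypothetical, and the real file should be inspected before relying on these column names.

```python
import csv
import io

# Hypothetical excerpt; column names are assumed to follow Cuffdiff's usual header.
HEADER = ["gene", "sample_1", "sample_2", "value_1", "value_2",
          "log2(fold_change)", "p_value", "q_value", "significant"]
ROWS = [
    ["GeneA", "WT", "KO", "10.2", "11.0", "0.109", "0.62", "0.99", "no"],
    ["GeneB", "WT", "KO", "5.0", "4.1", "-0.286", "0.31", "0.99", "no"],
    ["GeneC", "WT", "KO", "0.8", "1.0", "0.322", "0.47", "0.99", "no"],
]
toy_file = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def count_significant(handle):
    """Return (number of genes tested, number flagged significant)."""
    total = 0
    significant = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        total += 1
        if row["significant"].strip().lower() == "yes":
            significant += 1
    return total, significant


tested, flagged = count_significant(io.StringIO(toy_file))
print(f"{flagged} of {tested} genes flagged as significant")
```

For the real data, the in-memory string would be replaced by an open file handle on “gene_exp.diff”.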
  • 21. Figure 6 Gene expression levels in the wild type and FTO knockout cell lines. Each dot represents one expressed gene. The x-axis represents FPKM, a normalized measure of gene expression level, in the wild type cell line; a greater FPKM value indicates a higher expression level. The y-axis represents FPKM in the FTO knockout cell line. The linear regression shows no obvious bias toward either cell line.
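FPKM stands for fragments per kilobase of transcript per million mapped fragments. The Python sketch below shows the textbook definition only; Cuffdiff estimates FPKM with a more elaborate statistical model, and the function name and example numbers here are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript in a library
# of 20 million mapped fragments.
example = fpkm(500, 2_000, 20_000_000)
print(example)
```

Dividing by transcript length and library size is what makes values comparable between genes and between the two cell lines on the axes of Figure 6.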
  • 22. Functional enrichment analysis of FTO-targeted genes

The DAVID functional annotation tool carried out the GO enrichment analysis of FTO-targeted genes, and the results were saved in “Functional Annotation chart.xls”. Figure 7 shows only five representative terms from this file that are associated with FTO-targeted genes: phosphoprotein (p-value 6.62e-40), alternative splicing (p-value 6.51e-25), synapse (p-value 2.85e-13), ion binding (p-value 5.27e-12) and neuron projection (p-value 1.17e-09). Some studies revealed that FTO can target a subset of mRNAs that participate in neuronal function (Hess et al., 2013). They found that the proteins encoded by many FTO-targeted genes, such as GRIN1 and GNAO1, are associated with neuronal signaling pathways; in particular, several are specifically involved in dopaminergic (DA) signaling, including PDE1b, GNAO1, DRD3, SYN1 (synapsin I) and GIRK2 (Hess et al., 2013). DA signaling participates in the regulation of complex behaviors, particularly food intake. If this signaling is dysregulated, the risk of obesity might increase because of food addiction. Furthermore, the DA D2 receptor of the DA system is to a large extent involved in food motivation and brain signaling in obesity (Baik, 2013). Taken together, FTO could indirectly regulate DA signaling and thereby influence body mass. The GO enrichment results from DAVID can therefore point specialized researchers toward other mechanisms by which FTO is involved in controlling body mass.
  • 23. Figure 7 Five representative terms that are enriched with FTO-targeted genes. The x-axis shows the names of the five terms. The y-axis shows the p-value. The smaller the p-value, the more statistically significant the association between FTO-targeted genes and the corresponding term.
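Because the five p-values span about thirty orders of magnitude, they are easier to compare after ranking and a −log10 transform. The Python sketch below uses the p-values reported for the five terms; the transform is a common plotting convention shown here as an assumption, not necessarily the scale used in Figure 7.

```python
import math

# p-values reported for the five representative terms.
terms = {
    "phosphoprotein": 6.62e-40,
    "alternative splicing": 6.51e-25,
    "synapse": 2.85e-13,
    "ion binding": 5.27e-12,
    "neuron projection": 1.17e-09,
}

# Smallest p-value first, i.e. most significant term first.
ranked = sorted(terms.items(), key=lambda item: item[1])

for name, p_value in ranked:
    print(f"{name:22s} p = {p_value:.2e}   -log10(p) = {-math.log10(p_value):.1f}")
```

On the transformed scale a taller bar means a more significant term, which reverses the reading of a raw p-value axis.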
  • 24. A web application developed by Shiny

Shiny, an R package, was used to develop an interactive web application called “Bioinformatics”, which holds the differential methylation sites from exomePeak and the GO enrichment results from DAVID. Although these data can be viewed in “con_sig_diff_peak.xls” and “Functional Annotation chart.xls” with Microsoft Excel, users often need to select and visualize only part of the data, such as the m6A sites on a specific gene. The interactive web application meets this need and can be accessed at http://180.208.58.19:3838/sample-apps/m6A_v2/. The application may take a little time to launch.

Figure 8 shows the first functional interface of this web application, titled “Gene’s m6A position”, which retrieves the positions of m6A peaks on FTO-targeted genes from “con_sig_diff_peak.xls”. After a gene name (ENTREZ id, alias or symbol) is entered, two results are shown: a plot and a table. The plot shows the relative positions of the FTO-targeted m6A peaks (blue) along the gene (orange) on the genome. One gene may have several isoforms. Because m6A peaks were initially identified on mRNA, mapping them from mRNA back to the genome splits each peak at introns into several blocks, which lie on exons. The table gives the details behind the plot: the start and end points of each m6A peak and the number, sizes and start points of its blocks. The start point of the first block of each m6A peak defaults to 0.
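The block columns of the table can be turned back into genomic intervals with a few lines of Python. This sketch assumes the table follows the BED convention, in which block start points are offsets from the peak start (consistent with the first block starting at 0); the function name and coordinates are hypothetical.

```python
def blocks_to_intervals(peak_start, block_sizes, block_starts):
    """Convert block sizes and offsets into (start, end) genomic intervals.

    Assumes block_starts are offsets relative to peak_start, as in BED files.
    """
    if len(block_sizes) != len(block_starts):
        raise ValueError("block_sizes and block_starts must have equal length")
    return [
        (peak_start + offset, peak_start + offset + size)
        for size, offset in zip(block_sizes, block_starts)
    ]


# Hypothetical peak split by one intron into two exonic blocks.
intervals = blocks_to_intervals(
    peak_start=1000,
    block_sizes=[150, 50],
    block_starts=[0, 400],
)
print(intervals)  # [(1000, 1150), (1400, 1450)]
```

The gap between the two intervals corresponds to the intron that splits the peak once it is mapped from mRNA back to the genome.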
  • 25. Figure 8 The first functional interface for retrieval of m6A peak positions on FTO-targeted genes. The notes in red explain how to use this function and how to read the results in the plot and table.

Figure 9 shows the second functional interface of this web application, titled “DAVID”, which retrieves terms enriched with FTO-targeted genes from “Functional Annotation chart.xls”. After a term, which can be regarded as a gene product property, is entered, a table is shown. In this table, the category is the original database the term comes from. Following the hypergeometric test, fold enrichment is calculated from the values of “Count”, “List Total”, “Pop Hits” and “Pop Total” in the table: fold enrichment = (“Count” / “List Total”) / (“Pop Hits” / “Pop Total”).
  • 26. Fold enrichment represents how strongly FTO is implicated in the terms users are interested in. A p-value below 0.05 means that the calculated fold enrichment is statistically significant.

Figure 9 The second functional interface for retrieval of terms enriched with FTO-targeted genes. The notes in red explain how to use this function and how to read the results in the table.

However, this web application could be improved so that users can further customize the analysis for their specific needs and extract more insight from the data. Two new functions could be added to the second functional interface. With the first, users would input a category, such as a database of Type I diabetes, and the server would output the most statistically significant terms, i.e. those with the smallest p-values. The number of terms returned could be set by each user. Specialized researchers would then only need to focus on the terms most relevant to their own fields. For example, a researcher who is
  • 27. dedicated to the research of Type I diabetes could use this function to find out whether Type I diabetes is related to FTO through some terms. If such a relationship exists, the researcher could take the most statistically significant terms returned by this function and design experiments to discover how FTO might be associated with Type I diabetes through these terms. With the second function, users would input a gene list, and the server would output which of the genes are FTO-targeted genes and the terms to which these matched genes belong. For example, a researcher may have found genes that regulate a certain signaling pathway that might be associated with body mass. By uploading this gene list, the researcher would learn whether any of the genes are FTO-targeted, which could improve understanding of how this signaling pathway is regulated by FTO-dependent demethylation.

Acknowledgement

The author appreciates the supervision and guidance of Dr. Jia Meng, Department of Biological Sciences, XJTLU.

References

Baik, J. (2013) ‘Dopamine signaling in food addiction: role of dopamine D2 receptors’, BMB Rep, 46(11), pp.519-526.
Dominissini, D. et al. (2012) ‘Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq’, Nature, 485, pp.201-206.
  • 28. Fawcett, K.A. & Barroso, I. (2010) ‘The genetics of obesity: FTO leads the way’, Trends Genet, 26, pp.266-274.
Fischer, J., Koch, L., Emmerling, C., Vierkotten, J., Peters, T., Brüning, J.C. & Rüther, U. (2009) ‘Inactivation of the Fto gene protects from obesity’, Nature, 458, pp.894-898.
Hess, M.E., Hess, S., Meyer, K.D., Verhagen, L.A., Koch, L., Brönneke, H.S., Dietrich, M.O., Jordan, S.D., Saletore, Y., Elemento, O., Belgardt, B.F., Franz, T., Horvath, T.L., Rüther, U., Jaffrey, S.R., Kloppenburg, P. & Brüning, J.C. (2013) ‘The fat mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain circuitry’, Nature Neuroscience, 16, pp.1042-1048.
Huang, D.W., Sherman, B.T. & Lempicki, R.A. (2008) ‘Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources’, Nat. Protocols, 4, pp.44-57.
Jia, G., Fu, Y., Zhao, X., Dai, Q., Zheng, G., Yang, Y., Yi, C., Lindahl, T., Pan, T., Yang, Y.G. et al. (2011) ‘N6-methyladenosine in nuclear RNA is a major substrate of the obesity-associated FTO’, Nat. Chem. Biol, 7, pp.885-887.
Meng, J., Lu, Z., Liu, H., Zhang, L., Zhang, S., Chen, Y., Rao, M.K. & Huang, Y. (2014) ‘A protocol for RNA methylation differential analysis with MeRIP-Seq data and exomePeak R/Bioconductor package’, Methods, 69(3), pp.274-281.
  • 29. Meng, J. (2015a) An Introduction to exomePeak [Online]. Available from: http://www.bioconductor.org/packages/release/bioc/vignettes/exomePeak/inst/doc/exomePeak-Overview.pdf (Accessed: 14 April 2016).
Meng, J. (2015b) An Introduction to Guitar Package [Online]. Available from: http://www.bioconductor.org/packages/release/bioc/vignettes/Guitar/inst/doc/Guitar-Overview.pdf (Accessed: 14 April 2016).
Meyer, K.D., Saletore, Y., Zumbo, P., Elemento, O., Mason, C.E. & Jaffrey, S.R. (2012) ‘Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′ UTRs and near Stop Codons’, Cell, 149, pp.1635-1646.
Shatkin, A.J. & Manley, J.L. (2000) ‘The ends of the affair: capping and polyadenylation’, Nat Struct Biol, 7(10), pp.838-842.
The Gene Ontology Consortium (2008) ‘The Gene Ontology project in 2008’, Nucleic Acids Res, 36 (Database issue), pp.D440-444.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L. & Pachter, L. (2012) ‘Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks’, Nat. Protocols, 7, pp.562-578.
  • 30. Wojciechowski, J., Hopkins, A.M. & Upton, R.N. (2015) ‘Interactive Pharmacometric Applications Using R and the Shiny Package’, CPT Pharmacometrics Syst. Pharmacol, 4, pp.146-159.
  • 31. Appendices (saved on a disc)

Appendix I – The scripts of all steps described in Methods
A folder called “Appendix I” contains all scripts: “Script 1 (Bash script).txt”, “Script 2 (R script).txt”, “Script 3 (R script).txt”, “Script 4 (Bash script).txt”, “Script 5 (Bash script).txt”, “Script 6 (R script).txt”, “Script 7_UI (R script).txt” and “Script 7_Server (R script).txt”.

Appendix II – The results from certain steps that are not shown in Results and Discussion
A folder called “Appendix II” contains three subfolders:
“exomePeak”: “con_sig_diff_peak.xls” and “diff_peak.xls”;
“DAVID”: “Functional Annotation chart.xls”;
“Cuffdiff”: “gene_exp.diff”.