Bioinformatic jc 08_14_2013_formal
1. Genome-wide variation of alternative polyadenylation in sense and antisense transcription in Arabidopsis accessions
Li Lei
Plant Pathology, KSU
lilei@ksu.edu
August 14, 2013
2. Outline
• Background
  ➢ Pre-mRNA processing & polyadenylation
  ➢ Alternative polyadenylation (APA)
  ➢ APA in plants & open questions
• Objective
• Method
  ➢ Approach
  ➢ PALMapper: mapping RNA-seq reads to the reference
  ➢ How I retrieved the poly(A) reads
• Result
  ➢ Evidence for APA
  ➢ Poly(A) site location & related gene annotation
• Conclusion
• Outlook
• Acknowledgements
3. Background
Eukaryotic pre-mRNA processing & polyadenylation
[Figure: eukaryotic pre-mRNA processing, with the poly(A) site (PAS) indicated]
• poly(A) site = PAS
• Some genes produce mRNAs with a PAS at only one position
• Other genes produce mRNAs with PASs at different positions
Freitag, et al. 2012
[Figure: Aspergillus nidulans pgkA (PGK) with two alternative poly(A) sites, pA1 and pA2; microscopy panels RFP–PTS1, PgkA, GFP–Sps19, DIC and Merge]
TCT GAG AAA AGT AAG TAA ... ... CAG GC CCT AGA CTG TAG..
S E K S K * S P R L *
ESTs | C terminus | PTS1 (score)
pA1 | -LPGVAALSEKSK* | −53.5
pA2 | -LPGVAALSEKSPRL* | +3.1
Alternative polyadenylation (APA): different mRNAs transcribed from the same gene have different PASs
4. Alternative polyadenylation (APA)
Background
Figure 1 [schematic of a hypothetical gene with three exons (Ex1, Ex2, Ex3) and two PASs, panels (a) and (b), with the resulting 5′ to 3′ transcripts] (Current Opinion in Cell Biology)
Major categories of APA. This model refers to a hypothetical gene with three exons and two PASs. (a) When both PASs are located in the 3′ UTR, then identical proteins are produced. Because the 3′ UTR often contains elements regulating transcript stability, degradation, or localization, the quantity of protein produced may be altered depending upon PAS choice. (b) When one PAS is located in the coding region, a truncated protein is produced when the proximal PAS is chosen. Ex = exon, PAS = polyadenylation site; thick lines = UTR regions, thin lines = intronic regions.
Mueller, et al. 2012
Tian, et al. 2013
…differentiated cells are reprogrammed to ES cell-like induced pluripotent stem (iPS) cells [41]. A notable exception, however, has been observed with spermatogonial germ cells, whose reprogramming to ES cells involves 3′ UTR lengthening [41]. Notably, this is in line with the fact that germ cells are more proliferative than ES cells. Similar trends of 3′ UTR length regulation have been reported for comparisons of ES cells versus neural stem/progenitor (NSP) cells or neurons [42]. Although these studies have all pointed to a connection between 3′ UTR length and cell proliferation, cardiac hypertrophy, in which myocytes grow in size rather than in number, has also been found to involve 3′ UTR shortening [43]. Thus, a general rule may be that APA regulation is correlated with cell growth.
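Figure 2 [schematic of two mRNA isoforms (CDS, cUTR, aUTR, poly(A) tail AAAn) with miRNA and RBP binding in the aUTR, and the effects on translation, degradation and localization] (TiBS)
Regulation of cis elements in 3′ untranslated regions (UTRs) by alternative cleavage and polyadenylation (APA). Two mRNA isoforms are shown. The 3′ UTR region upstream of the proximal cleavage and polyadenylation site (pA) is called the constitutive UTR (cUTR), and the downstream region is called the alternative UTR (aUTR). RNA-binding protein (RBP) and miRNA targeting to the aUTR are shown. Impacts on mRNA localization, translation, and degradation are indicated. CDS, coding sequence.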
www.sciencedirect.com Current Opinion in Cell Biology 2013, 25:222–232
[Figure: protein isoforms] Adapted from Tress et al. 2007
Figure 3 [schematic: global APA linked to biological processes (neuron activity, proliferation) and to disease (cancer, oculopharyngeal muscular dystrophy), arranged by whether they favour distal or proximal poly(A) site usage] (Nature Reviews | Genetics)
Biological processes that have been linked with broad APA modulation. A schematic showing the biological processes and diseases that alternative polyadenylation (APA) has been linked with. In addition, the tendency towards distal or proximal poly(A) site usage is shown.
Elkon, et al. 2013
6. Investigate genome-wide variation of alternative polyadenylation in sense and antisense transcription across a set of Arabidopsis thaliana accessions
Objective
• Is variation in APA as prevalent across genotypes as across tissue types?
• Is there a genetic basis for APA variation, through trans as well as cis regulation?
• Does a gene's proximity to neighboring genes constrain polyadenylation site choice and limit variation?
7. Approach
Method
• 19 accessions (genome sequenced); tissues: seedling, root, floral bud
• RNA extraction & library construction with barcodes
• 82 bp strand-specific RNA-seq
• Map reads to each corresponding genome with PALMapper
• Transform read positions from each transcriptome into a common coordinate system based on a multiple-genome alignment
• Retrieve poly(A)-containing reads, cluster them across all accessions and identify poly(A) sites (PASs)
• Generate read counts for each PAS in each accession
• Compare PASs genome-wide across accessions
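The coordinate-transformation step can be pictured as a lookup in the ungapped blocks of a multiple-genome alignment. The sketch below is a minimal illustration under assumed conventions, not the pipeline used in this study: the block format `(accession_start, common_start, length)`, the function name `to_common` and the toy coordinates are all hypothetical, and chromosome and strand handling are left out.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical alignment representation (not the study's actual format):
# ungapped blocks (accession_start, common_start, length), 0-based,
# sorted by accession_start and non-overlapping on the accession genome.
Block = Tuple[int, int, int]


def to_common(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position on one accession's genome into the common
    coordinate system; return None if the position is not aligned."""
    starts = [b[0] for b in blocks]
    k = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if k < 0:
        return None
    acc_start, common_start, length = blocks[k]
    if pos < acc_start + length:
        return common_start + (pos - acc_start)
    return None  # pos falls in a gap, e.g. an accession-specific insertion


# Toy example: the accession carries a 5 bp insertion after its first 100 bp.
toy_blocks = [(0, 0, 100), (105, 100, 200)]
print(to_common(42, toy_blocks))   # 42
print(to_common(102, toy_blocks))  # None
print(to_common(150, toy_blocks))  # 145
```

Rebuilding `starts` on every call keeps the sketch short; for millions of reads one would precompute it once per chromosome.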
8. PALMapper: map RNA-seq reads to reference
• PALMapper (Jean, et al. 2010)
• A combination of:
  ➢ the spliced alignment method QPALMA (De Bona, et al. 2008)
  ➢ the short read alignment tool GenomeMapper (Schneeberger, et al. 2009)
http://ftp.raetschlab.org/software/palmapper/palmapper-0.5.tar.gz
Version 0.5 released:
Method
Adapted from Kahles, et al. 2013 talk
Another Mapper?
Memorial Sloan-Kettering Cancer Center
Advantages:
• Alignments with variants, e.g. mismatches and indels
• Accurate spliced alignments using computational splice site predictions
• More accurate than TopHat (e.g. C. elegans: 47% & 81%, respectively)
• Fast alignments (about 10 million reads/hour)
• Soft-trimming of the poly(A) tail of each read
9. Soft-trimming
• The trimmed sequence remains in the BAM file
• It is marked with the CIGAR “S” (soft clip) operation
• It is ignored by many tools, such as IGV
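Because soft-trimmed bases stay in the SEQ field and are only flagged by `S` in the CIGAR string, a trimmed poly(A) tail can be read back from the SAM/BAM record itself. A minimal sketch, assuming plain SAM text fields; the helper names are hypothetical. For reads aligned to the reverse strand, SAM stores the reverse-complemented sequence, so a poly(A) tail would appear as a leading poly(T) clip; only the forward case is handled here.

```python
import re
from typing import List, Tuple

# One CIGAR operation = a length followed by an operation letter (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '70M12S' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def trailing_soft_clip(seq: str, cigar: str) -> str:
    """Return the bases soft-clipped ('S') at the right-hand end of the alignment.

    Soft-clipped bases are not aligned but stay in the SEQ field, so they can
    be recovered. Hard clips ('H') are not stored in SEQ and are skipped.
    """
    ops = parse_cigar(cigar)
    while ops and ops[-1][1] == "H":
        ops.pop()
    if ops and ops[-1][1] == "S":
        return seq[len(seq) - ops[-1][0]:]
    return ""


print(parse_cigar("70M12S"))                         # [(70, 'M'), (12, 'S')]
print(trailing_soft_clip("CCGTTGAAAAAAAA", "6M8S"))  # AAAAAAAA
print(repr(trailing_soft_clip("CCGTTG", "6M")))      # ''
```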
10. How did I retrieve the poly(A) reads?
Method
The mapped SAM file with soft-trimmed poly(A)
[Figure: RNA-seq reads aligned to the genome (5′ to 3′) whose 3′ AAAAAAAA tail is either soft-trimmed or placed on the genome by a spliced alignment with splicing length ≥1500 bp]
Perl scripts were used to pick up poly(A) reads:
• Consecutive As at the 3′ end of the read ≥8 bp
• Quality score of each A ≥40
• Huge splicing: for spliced reads, splicing length ≥1500 bp
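The selection rules above can be written as a single filter. This is a sketch under stated assumptions, not the author's Perl script: qualities are taken as Phred+33, only forward-strand reads are considered, and "huge splicing" is read as a final gap (`N`) of at least 1500 bp followed only by high-quality As. All function names are hypothetical.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Thresholds from the slide
MIN_AS = 8          # consecutive As at the 3' end
MIN_QUAL = 40       # Phred quality of each A
MIN_SPLICE = 1500   # "huge splicing" length in bp


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def trailing_a_run(seq: str, qual: str) -> int:
    """Number of consecutive high-quality As at the 3' end of the read.
    QUAL is assumed to be Phred+33 encoded, as in SAM."""
    run = 0
    for base, q in zip(reversed(seq), reversed(qual)):
        if base != "A" or ord(q) - 33 < MIN_QUAL:
            break
        run += 1
    return run


def is_polya_read(seq: str, qual: str, cigar: str) -> bool:
    ops = parse_cigar(cigar)
    run = trailing_a_run(seq, qual)
    if not ops or run < MIN_AS:
        return False
    last_len, last_op = ops[-1]
    if last_op == "S":
        # Soft-trimmed tail: every soft-clipped base must be a high-quality A
        return MIN_AS <= last_len <= run
    # Otherwise look for the last long gap ('N') used to place the A-tail
    for k in range(len(ops) - 1, -1, -1):
        if ops[k][1] == "N":
            # read bases aligned after that gap (ops that consume the query)
            tail = sum(n for n, op in ops[k + 1:] if op in "MIS=X")
            return ops[k][0] >= MIN_SPLICE and MIN_AS <= tail <= run
    return False


hq = "I" * 14  # 'I' encodes Phred 40 in Phred+33
print(is_polya_read("CCGTTG" + "A" * 8, hq, "6M8S"))       # True
print(is_polya_read("CCGTTG" + "A" * 8, hq, "6M2000N8M"))  # True
print(is_polya_read("CCGTTG" + "A" * 8, hq, "6M100N8M"))   # False
```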
11. Defining poly(A) clusters (PAS)
Result
Poly(A) reads identified across accessions: 2,203,313
Cluster poly(A) reads into 75,532 PASs:
• In the same orientation
• Within 10 bp of each other across all accessions
• Total cluster interval spanning ≤24 bp
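One plausible reading of these clustering rules is a greedy single-linkage pass over sorted positions, per chromosome and orientation, that also caps the total span. The sketch below assumes that reading (span measured as last minus first position) and uses hypothetical names; the study's own implementation may break long chains differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MAX_GAP = 10   # neighbouring sites at most 10 bp apart
MAX_SPAN = 24  # whole cluster spans at most 24 bp (last - first)

Site = Tuple[str, str, int]               # (chromosome, strand, position)
Cluster = Tuple[str, str, int, int, int]  # (chromosome, strand, start, end, n_reads)


def cluster_sites(sites: Iterable[Site]) -> List[Cluster]:
    """Greedily cluster poly(A) sites pooled over all accessions,
    separately for each chromosome and orientation."""
    by_key: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for chrom, strand, pos in sites:
        by_key[(chrom, strand)].append(pos)

    clusters: List[Cluster] = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= MAX_GAP and pos - start <= MAX_SPAN:
                count += 1
            else:  # close the current cluster and open a new one
                clusters.append((chrom, strand, start, prev, count))
                start, count = pos, 1
            prev = pos
        clusters.append((chrom, strand, start, prev, count))
    return clusters


reads = [("Chr1", "+", 100), ("Chr1", "+", 105), ("Chr1", "+", 112),
         ("Chr1", "+", 130), ("Chr1", "-", 104)]
print(cluster_sites(reads))
# [('Chr1', '+', 100, 112, 3), ('Chr1', '+', 130, 130, 1), ('Chr1', '-', 104, 104, 1)]
```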
Map PASs to genic regions (±120 bp around the annotated range):
• 93.4% of PASs map to genic regions
• 6.6% of PASs lie further away from genic regions
Consider the sense & antisense PASs:
• Orientation of the poly(A) reads relative to the gene orientation
• 6581 genes with ≥20 sense poly(A) reads across accessions
• 1473 genes with ≥10 antisense poly(A) reads across accessions
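The genic assignment and the sense/antisense call can be sketched as an interval test with a 120 bp flank on each side. This assumes a simple list of annotated genes (hypothetical `Gene` tuples and gene IDs) and returns the first matching gene, so overlapping or nested genes are not resolved; a real pipeline would use an interval index.

```python
from typing import List, Optional, Tuple

FLANK = 120  # bp added on each side of the annotated gene range

Gene = Tuple[str, str, int, int, str]  # (gene_id, chromosome, start, end, strand)


def assign_pas(chrom: str, pos: int, pas_strand: str,
               genes: List[Gene]) -> Optional[Tuple[str, str]]:
    """Return (gene_id, 'sense' | 'antisense') for the first gene whose
    annotated range +/- FLANK contains the PAS, or None if it is intergenic."""
    for gene_id, g_chrom, start, end, g_strand in genes:
        if g_chrom == chrom and start - FLANK <= pos <= end + FLANK:
            orientation = "sense" if pas_strand == g_strand else "antisense"
            return gene_id, orientation
    return None


genes = [("geneA", "Chr1", 1000, 3000, "+")]  # hypothetical annotation
print(assign_pas("Chr1", 3100, "+", genes))  # ('geneA', 'sense')
print(assign_pas("Chr1", 2000, "-", genes))  # ('geneA', 'antisense')
print(assign_pas("Chr1", 3200, "+", genes))  # None
```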
12. Reads mapping to the major and non-major poly(A) clusters within a gene
Result
• Major PAS: the PAS with the most reads across all accessions for each gene
• p = proportion of total reads in the gene mapping to the major PAS
• q = 1 − p = proportion of total reads in the gene mapping to non-major PASs
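A minimal sketch of the p and q definitions, assuming read counts per PAS are already available for each gene (the dictionary layout and PAS IDs are hypothetical). The major PAS is chosen once on counts pooled over all accessions and can then be held fixed when computing q for a single accession; the 0.4 cut-off is the one used on the next slide to flag genes with alternative poly(A) sites.

```python
from typing import Dict, Optional, Tuple

ALT_PAS_CUTOFF = 0.4  # q >= 0.4 flags a gene as having alternative PASs


def pas_proportions(counts: Dict[str, int],
                    major: Optional[str] = None) -> Tuple[str, float, float]:
    """Return (major PAS id, p, q) for one gene.

    counts maps PAS id -> number of poly(A) reads. If `major` is None, the PAS
    with the most reads is taken as the major PAS (ties: first in dict order).
    """
    total = sum(counts.values())
    if major is None:
        major = max(counts, key=counts.get)
    p = counts.get(major, 0) / total
    return major, p, 1.0 - p


# Hypothetical read counts for one gene, pooled over all accessions
pooled = {"pas1": 50, "pas2": 35, "pas3": 15}
major, p, q = pas_proportions(pooled)
print(major, p, q, q >= ALT_PAS_CUTOFF)  # pas1 0.5 0.5 True

# q for a single accession, keeping the pooled major PAS fixed
accession_counts = {"pas1": 9, "pas2": 1}
print(round(pas_proportions(accession_counts, major)[2], 2))  # 0.1
```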
13. The distribution of the proportion of reads mapping to non-major sense & antisense poly(A) clusters per gene
Genes with a proportion of non-major cluster reads ≥0.4 (indicated with gray dashed lines) were considered to contain alternative poly(A) sites and were chosen for further polymorphism analysis
Result
[Figure: distributions for 6581 genes with sense PASs and 1471 genes with antisense PASs]
14. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
Result
• For the ith and jth accessions, A_i and A_j, the absolute difference in the proportion of reads mapping to non-major poly(A) clusters is $D_{ij} = |q_{A_i} - q_{A_j}|$
• Average pairwise difference, where n = 19:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} D_{ij}$$
• Maximum pairwise difference:
$$D_{\max} = \max\{D_{ij}\}$$
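The pairwise statistics translate directly into code. This sketch follows the formula as written on the slide, dividing the double sum by n; dividing by the number of pairs, n(n−1)/2, would instead give the arithmetic mean over pairs. The input layout and the three-accession example are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_differences(q: Dict[str, float]) -> Tuple[float, float]:
    """q maps accession -> proportion of reads at non-major PASs for one gene.
    Returns (D_bar, D_max) following the formulas on this slide."""
    n = len(q)
    # D_ij = |q_Ai - q_Aj| for every pair i < j (the double sum over j > i)
    diffs = [abs(q[a] - q[b]) for a, b in combinations(sorted(q), 2)]
    d_bar = sum(diffs) / n  # normalised by n, as written on the slide
    d_max = max(diffs)
    return d_bar, d_max


# Hypothetical q values for three accessions (the study used n = 19)
q_values = {"Acc1": 0.1, "Acc2": 0.4, "Acc3": 0.5}
d_bar, d_max = pairwise_differences(q_values)
print(round(d_bar, 3), round(d_max, 3))  # 0.267 0.4
```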
15. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
3074 genes with sense PASs
Result
[Figure: distributions of the average pairwise difference (D̄) and the maximum pairwise difference (Dmax)]
16. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
544 genes with antisense PASs
Result
[Figure: distributions of the maximum pairwise difference (Dmax) and the average pairwise difference (D̄)]
17. Gene position and antisense PAS
Result
Nearby gene: a gene whose distance to its adjacent gene is ≤2 kb
Group | Fraction of genes in each group | Fraction of genes with ≥20 sense poly(A) reads | Fraction of genes with proportion of non-major sense PASs >0.4 | Fraction of genes with ≥10 antisense poly(A) reads | Fraction of genes with proportion of non-major antisense PASs >0.4
A | 57.87% | 62.92% | 62.94% | 96.91% | 97.79%
B | 20.48% | 21.30% | 20.59% | 1.65% | 0.74%
C | 21.64% | 15.77% | 16.46% | 1.43% | 1.47%
Each column gives the percentage of genes in that category that fall into groups A, B and C (each column sums to about 100%).
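The "nearby gene" criterion can be sketched as the gap to the closest adjacent gene on the same chromosome. This assumes 1-based inclusive gene coordinates and neighbours ordered by start position, so nested genes are only approximated; gene IDs are hypothetical, and the grouping into A, B and C is not reproduced because the slide does not define it.

```python
from typing import Dict, List, Tuple

NEARBY_BP = 2000  # "nearby gene": adjacent gene at most 2 kb away

Gene = Tuple[str, str, int, int]  # (gene_id, chromosome, start, end), 1-based inclusive


def neighbour_gaps(genes: List[Gene]) -> Dict[str, int]:
    """Distance in bp from each gene to the closest adjacent gene on the same
    chromosome (0 if they overlap). Genes alone on a chromosome are skipped."""
    by_chrom: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_chrom.setdefault(gene[1], []).append(gene)
    gaps: Dict[str, int] = {}
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g[2])  # order by start coordinate
        for k, (gene_id, _, start, end) in enumerate(chrom_genes):
            candidates = []
            if k > 0:  # gap to the previous gene
                candidates.append(max(0, start - chrom_genes[k - 1][3] - 1))
            if k + 1 < len(chrom_genes):  # gap to the next gene
                candidates.append(max(0, chrom_genes[k + 1][2] - end - 1))
            if candidates:
                gaps[gene_id] = min(candidates)
    return gaps


example_genes = [("g1", "Chr1", 1, 1000), ("g2", "Chr1", 2500, 3000),
                 ("g3", "Chr1", 8000, 9000)]
gaps = neighbour_gaps(example_genes)
print(gaps)  # {'g1': 1499, 'g2': 1499, 'g3': 4999}
print(sorted(g for g, d in gaps.items() if d <= NEARBY_BP))  # ['g1', 'g2']
```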
18. Conclusion
• For genes with more sense & antisense poly(A) reads, half use a non-major PAS at least 40% of the time
• Pairwise comparison across all accessions helped to identify the best candidate genes for polymorphism in the usage or position of major PASs
Conclusion
19. Outlook
• Combine all tissues & all accessions, calculate & its variance
• Associate with gene categories, poly(A) site location of genes, etc.
• Examine the trans/cis poly(A) QTL with the MAGIC lines' data
• Check the relationship between the antisense poly(A) site & the orientation of nearby genes, and the relationship this may have with expression level
• Check the data from related species, Capsella rubella & A. lyrata, to look at APA usage & its evolution between species
• Ask whether A. thaliana is an outlier for any of the observed trends, and whether APA is derived in A. thaliana
Outlook
20. Acknowledgements
Kansas State University
Dr. Chris Toomajian
University of Utah
Dr. Richard Clark
Dr. Joshua Steffen
Edward J. Osborne
Robert Greenhalgh
Wellcome Trust Centre for Human
Genetics, University of Oxford
Dr. Richard Mott
Memorial Sloan-Kettering Cancer Center
Dr. Gunnar Raetsch
Philipp Drewe
Andre Kahles
23. Outlook
• Combine all tissues and all accessions, take each tissue as a subset, calculate and its variance
• For each tissue, associate with gene categories according to GO analysis & gene families
• Compare the distribution of from different tissues, and PAS usage patterns among tissues or accessions
• Check Ka/Ks for genes with high/low in all tissues
• Check the poly(A) site location for genes with high , e.g. 3′ UTR, CDS, 5′ UTR or intron
• Compare the location across accessions
• Look at the relationship of location with gene expression level
• Examine the cis poly(A) QTL with the MAGIC lines' RNA-seq data
• Check the relationship between the antisense poly(A) site and the orientation of nearby genes for each tissue subset, and the relationship this may have with expression level
• Check the data from Capsella rubella and A. lyrata to look at APA usage and its evolution between species
• Ask whether A. thaliana is an outlier for any of the observed trends, and whether APA is derived in A. thaliana
Outlook
28. Kahles (SKI, New York), PALMapper, HiTSeq, July 20, 2013
29. How did I retrieve the poly(A) reads?
The mapped SAM file with soft-trimmed poly(A)
[Figure: RNA-seq reads aligned to the genome (5′ to 3′) with a 3′ AAAAAAAA tail; panels show a soft-trimmed tail, a tail placed by one long splice, and a read with two splices (Splicing1, Splicing2) in which Splicing2 is ≥1500 bp]
• Reads with a soft-trimmed end of consecutive As
• Reads with a long splicing length and consecutive As at the end
Criteria:
• Consecutive As ≥8
• Quality score of each soft-trimmed bp ≥40
• Splicing length ≥1500 bp
Perl scripts were used to apply these criteria
Method
30. Defining poly(A) clusters (PAS)
• 2,203,313 poly(A) reads were identified across accessions
• The poly(A) site of each poly(A) read was calculated with a Perl script
• 75,532 PASs were defined by clustering poly(A) reads that are in the same orientation and within 10 bp of each other across all accessions, with a total cluster interval spanning no more than 24 bp
• 93.4% of clusters map to genic regions; the remaining 6.6% lie further away from genic regions
• 6581 genes have at least 20 sense poly(A) reads across accessions
• 1473 genes have at least 10 antisense poly(A) reads across accessions
• The major sense PAS of each gene was defined, across all accessions, as the sense PAS with the most reads
• p = proportion of total reads in the gene mapping to the major PAS
• q = 1 − p = proportion of total reads in the gene mapping to non-major PASs
Result