Genome-wide variation of alternative
polyadenylation in sense and antisense
transcription in Arabidopsis accessions	
Li Lei
Plant Pathology, KSU
lilei@ksu.edu
August 14, 2013
Outline
• Background
  ➢ Pre-mRNA processing & polyadenylation
  ➢ Alternative polyadenylation (APA)
  ➢ APA in plants & unknown questions
• Objective
• Method
  ➢ Approach
  ➢ PALMapper: map RNA-seq reads to reference
  ➢ How I retrieved the poly(A) reads
• Result
  ➢ Evidence for APA
  ➢ Poly(A) site location & related gene annotation
• Conclusion
• Outlook
• Acknowledgements
Background
Eukaryotic pre-mRNA processing & polyadenylation
[Figure: eukaryotic pre-mRNA processing, with the poly(A) site (PAS) marked]
• poly(A) site = PAS
• Some genes: the PASs of their mRNAs are in only one place
• Other genes: the PASs of their mRNAs are in different places
Freitag, et al. 2012
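[Figure: Aspergillus nidulans pgkA (PGK). Two poly(A) sites, pA1 and pA2, give transcripts with different 3′ ends and different C termini.
TCT GAG AAA AGT AAG TAA ... ... CAG GC CCT AGA CTG TAG..
 S   E   K   S   K   *            S   P   R   L   *
ESTs | C terminus | PTS1 (score)
pA1 | -LPGVAALSEKSK* | −53.5
pA2 | -LPGVAALSEKSPRL* | +3.1
Imaging panels: RFP–PTS1, PgkA, GFP–Sps19, DIC, Merge]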
Alternative polyadenylation (APA): Different mRNAs transcribed from the same
gene have different PASs
Alternative polyadenylation (APA)
Background
[Figure 1 (Current Opinion in Cell Biology): panels (a) and (b) show a gene with exons Ex1, Ex2 and Ex3 and two PASs, and the transcripts produced from each PAS]
Major categories of APA. This model refers to a hypothetical gene with three exons and two PASs. (a) When both PASs are located in the 3′ UTR, identical proteins are produced. Because the 3′ UTR often contains elements regulating transcript stability, degradation, or localization, the quantity of protein produced may be altered depending upon PAS choice. (b) When one PAS is located in the coding region, a truncated protein is produced when the proximal PAS is chosen. Ex = exon, PAS = polyadenylation site; thick lines = UTR regions, thin lines = intronic regions.
www.sciencedirect.com Current Opinion in Cell Biology 2013, 25:222–232
Mueller, et al. 2012
Tian, et al. 2013
…differentiated cells are reprogrammed to ES cell-like induced pluripotent stem (iPS) cells [41]. A notable exception, however, has been observed with spermatogonial germ cells, whose reprogramming to ES cells involves 3′ UTR lengthening [41]. Notably, this is in line with the fact that germ cells are more proliferative than ES cells. Similar trends of 3′ UTR length regulation have been reported for comparisons of ES cells versus neural stem/progenitor (NSP) cells or neurons [42]. Although these studies have all pointed to a connection between 3′ UTR length and cell proliferation, cardiac hypertrophy, in which myocytes grow in size rather than in number, has also been found to involve 3′ UTR shortening [43]. Thus, a general rule may be that APA regulation is correlated with cell growth.
Cancer
[Excerpt on APA in cancer cells; the text is cut off in the slide image]
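[Figure 2 (TiBS): an mRNA with CDS, cUTR and aUTR, miRNA and RBP binding, and effects on translation, degradation and localization]
Figure 2. Regulation of cis elements in 3′ untranslated regions (UTRs) by alternative cleavage and polyadenylation (APA). Two mRNA isoforms are shown. The 3′ UTR region upstream of the proximal cleavage and polyadenylation site (pA) is called the constitutive UTR (cUTR), and the downstream region is called the alternative UTR (aUTR). RNA-binding protein (RBP) and miRNA targeting to the aUTR are shown. Impacts on mRNA localization, translation, and degradation are indicated. CDS, coding sequence.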
[Figure: protein isoforms, adapted from Tress et al. 2007]
[Figure 3 (Nature Reviews | Genetics): global APA linked with biological processes (neuron activity, proliferation) and with disease (cancer, oculopharyngeal muscular dystrophy), arranged by whether they favour distal or proximal poly(A) site usage]
Figure 3 | Biological processes that have been linked with broad APA modulation. A schematic showing the biological processes and diseases that alternative polyadenylation (APA) has been linked with. In addition, the tendency towards distal or proximal poly(A) site usage is shown.
Elkon, et al. 2013
APA in plants and unknown questions
Background
Although polyadenylation has been investigated genome-wide in a single Arabidopsis accession, we still do not know:
1. How much does poly(A) site usage vary across Arabidopsis accessions? What is the genetic basis of such variation: cis regulation, trans regulation, or both?
2. Is Arabidopsis an outlier for any of the trends in poly(A) site usage compared with related species? How has APA evolved across related species?
Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation
Xiaohui Wu(a,b), Man Liu(a), Bruce Downie(c), Chun Liang(a), Guoli Ji(b), Qingshun Q. Li(a,b,1), and Arthur G. Hunt(d,1)
(a) Department of Botany, Miami University, Oxford, OH 45056; (b) Department of Automation, Xiamen University, Xiamen, Fujian 361005, People's Republic of China; and (c) Department of Horticulture and Seed Biology Group, and (d) Department of Plant and Soil Sciences, University of Kentucky, Lexington, KY 40546-0312.
Edited by David C. Baulcombe, University of Cambridge, Cambridge, United Kingdom, and approved June 8, 2011 (received for review January 14, 2011)
Alternative polyadenylation (APA) has been shown to play an important role in gene expression regulation in animals and plants. However, the extent of sense and antisense APA at the genome level is not known. We developed a deep-sequencing protocol that queries the junctions of 3′UTR and poly(A) tails and confidently maps the poly(A) tags to the annotated genome. The results of this mapping show that 70% of Arabidopsis genes use more than one poly(A) site, excluding microheterogeneity. Analysis of the poly(A) tags reveal extensive APA in introns and coding sequences, results of which can significantly alter transcript sequences and their encoding proteins. Although the interplay of intron splicing and polyadenylation potentially defines poly(A) site uses in introns, the polyadenylation signals leading to the use of CDS protein-coding region poly(A) sites are distinct from the rest of the genome. Interestingly, a large number of poly(A) sites correspond to putative antisense transcripts that overlap with the promoter of the associated sense transcript, a mode previously demonstrated to regulate sense gene expression. Our results suggest that APA plays a far greater role in gene expression in plants than previously expected.
alternative processing | antisense transcription | nonstop mRNAs
The polyadenylation of mRNA in eukaryotes is an important step in gene expression in eukaryotes. With few exceptions, mature eukaryotic mRNAs possess a poly(A) tract, that in turn functions to facilitate transport of the mRNA to the cytoplasm and its subsequent stabilization and translation. The poly(A) tail contributes regulatory information to each of these processes through interactions with RNA processing factors and poly(A)-binding proteins. The process of polyadenylation also contributes to regulation by “determining” the composition of the mRNA apart from the poly(A) tail. Thus, the position along the gene where the pre-mRNA is processed and polyadenylated determines the sequence content in terms of exons and regulatory motifs. If a gene possesses more than one polyadenylation site, then the nature of the expressed mRNA can be altered via differential choice of these sites, a process that is called alternative polyadenylation, or APA. That APA may be important is suggested by the observations that more than 50% of human and plant genes have multiple poly(A) sites (1–5). APA may be an important factor in the regulation of genes associated with cancer and with early embryo development in animals (6–8). APA […] the FLC gene (15, 16); these antisense transcripts are involved in transcriptional regulation of sense FLC mRNAs through chromatin modifications in the vicinity of the sense FLC promoter. The regulation of these two genes thus provides examples of two modes of APA, involving intronic polyadenylation and 3′ end processing of antisense transcripts.
Plant poly(A) site datasets (3, 17) have been assembled from the analysis and curation of the results of EST and full-length cDNA sequencing projects. Unfortunately, these projects are not specially targeted to the identification of poly(A) sites, nor are they high-throughput. With this consideration in mind, a strategy designed to specifically query the mRNA-poly(A) junction on a transcriptome-wide basis was developed and used to study poly(A) site choice in Arabidopsis leaves and seeds. The results obtained using this strategy reveal an extensive network of potential APA in Arabidopsis, including unanticipated and novel modes of APA. In addition, the results corroborate other reports suggestive of wide-spread antisense transcription in Arabidopsis, and provide a dataset of poly(A) sites associated with antisense transcripts. Finally, they provide evidence for tissue-specific poly(A) site choice.
Results
Preparation and Characterization of cDNA Tags That Query Polyadenylation Sites. To study Arabidopsis poly(A) sites on a genome-wide basis, short DNA tags that include the mRNA-poly(A) site junction [called poly(A) tags, or PATs hereafter] were prepared and sequenced; the starting materials for these samples were RNA isolated from dry seeds and the leaves of young seedlings. The initial sequences were processed and mapped to the Arabidopsis reference genome. After removing potential internal priming candidates and eliminating tags that mapped to chloroplast and mitochondria genomes and to miscellaneous RNAs (primarily rRNAs), a collection of tags that defined more than 280,000 individual poly(A) sites were obtained (Table S1). Because poly(A) site microheterogeneity is ubiquitous in plants (3, 4), poly(A) sites in the same gene that are located within 24 nt of each other were clustered so as to define a poly(A) site cluster (PAC). The results of this process were more than 71,000 PACs with an average of 54 PATs per PAC (Table S1). Of these PACs, 57,473 were in the “sense” orientation with respect to an anno-
Author contributions: X.W., M.L., G.J., Q.Q.L., and A.G.H. designed research; X.W., M.L.,
NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 19 NUMBER 8 AUGUST 2012 845
RESOURCE
Arabidopsis thaliana is an important model system that has had a critical role in discoveries essential to our understanding of plant biology and of generically important processes such as RNA interference (RNAi). Although the A. thaliana genome was sequenced more than a decade ago, challenges remain in resolving the RNAs that it encodes and determining their functional significance. Establishing where transcripts end is essential in genome annotation and for understanding gene function. Alternative cleavage and polyadenylation (APA) defines different 3′ ends within pre-mRNA transcribed from the same gene, and this can affect function by determining coding potential or the inclusion of regulatory sequence elements [1,2]. This regulation of RNA 3′-end formation is considerably more widespread than previously thought [1,2], and RNA-binding proteins that enable A. thaliana flowering provide important examples of the biological impact of this control [3]. Defective 3′-end formation and transcription termination at tandem or convergent gene pairs can result in transcription interference or RNAi [4,5], revealing that these processes normally partition the genome and maintain expression of neighboring genes [6]. Accordingly, such consequences of uncontrolled 3′-end formation also emphasize the critical nature of gene arrangement along a eukaryotic chromosome.
As a prelude to the analysis of regulators of 3′-end formation, we set out to map A. thaliana RNA 3′ ends genome-wide. Previous high-throughput A. thaliana transcriptome studies have depended on the copying of RNA into complementary DNA (cDNA) with reverse transcriptase [7–10]. However, the intrinsic template switching [11] and DNA-dependent DNA-polymerase [12] activities of reverse transcriptases, together with oligo(dT)-dependent internal priming [13], cause well-established artifacts that can affect the identification of authentic antisense RNAs [14,15], splicing events [14] and RNA 3′ ends [13,16]. Different strategies have been developed to address these problems, making strand-specific RNA sequencing an increasingly powerful tool for the analysis of transcriptomes. However, a recent comparison of several such methods showed marked differences not only in strand specificity but also in a range of criteria that influence transcriptome interpretation [17]. Therefore, as an alternative, we used direct RNA sequencing (DRS) to identify polyadenylated A. thaliana RNAs [18]. This approach is direct in the sense that native RNA is used as the sequencing template, but the sequence is read by imaging complementary fluorescent nucleotides incorporated by a polymerase. In this true single-molecule sequencing (tSMS) procedure, the site of RNA cleavage and polyadenylation is defined with an accuracy of 2 nucleotides (nt) in the absence of errors induced by reverse transcriptase, ligation or amplification [18].
RESULTS
Mapping A. thaliana RNA 3′ ends
Total RNA purified from A. thaliana seedlings was subjected to DRS, and a computational procedure to align reads uniquely to the most recent A. thaliana genome release (currently TAIR10) was developed. The initial mapping analysis revealed that the vast majority of reads (89.60%) aligned to protein-coding genes, which is consistent with the idea that this approach can identify authentic sites of mRNA cleavage and polyadenylation (Fig. 1a). These data define extremely heterogeneous patterns of RNA 3′-end formation (Fig. 1b) that differ markedly from those of human mRNAs analyzed in the same way (Supplementary Fig. 1a) [18].
Although nontemplated base addition between cleavage sites and the poly(A) tail has been reported from analysis of A. thaliana expressed-sequence-tag (EST) data [19], we found no evidence for this phenomenon
(1) College of Life Sciences, University of Dundee, Dundee, UK. (2) Department of Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK. (3) Helicos BioSciences Corporation, Cambridge, Massachusetts, USA. Correspondence should be addressed to G.G.S. (g.g.simpson@dundee.ac.uk) or G.J.B. (g.j.barton@dundee.ac.uk).
Received 16 February; accepted 19 June; published online 22 July 2012; doi:10.1038/nsmb.2345
Direct sequencing of Arabidopsis thaliana RNA reveals
patterns of cleavage and polyadenylation
Alexander Sherstnev(1), Céline Duc(1), Christian Cole(1), Vasiliki Zacharaki(1), Csaba Hornyik(2), Fatih Ozsolak(3),
Patrice M Milos3, Geoffrey J Barton1 & Gordon G Simpson1,2
It has recently been shown that RNA 3′-end formation plays a more widespread role in controlling gene expression than previously thought. To examine the impact of regulated 3′-end formation genome-wide, we applied direct RNA sequencing to A. thaliana. Here we show the authentic transcriptome in unprecedented detail and describe the effects of 3′-end formation on genome organization. We reveal extreme heterogeneity in RNA 3′ ends, discover previously unrecognized noncoding RNAs and propose widespread reannotation of the genome. We explain the origin of most poly(A)+ antisense RNAs and identify cis elements that control 3′-end formation in different registers. These findings are essential to understanding what the genome actually encodes, how it is organized and how regulated 3′-end formation affects these processes.
npg © 2012 Nature America, Inc. All rights reserved.
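[Figure 2 (Xing et al. 2012): alternative polyadenylation examples FLC and OXT6 (AtCPSF30, AtCPSF30*-YT521B), with sites labeled D and P. Caption: FIGURE 2 | Schematic representation of alternative polyadenylation…]
Xing, et al. 2012
[Schematic: a gene with two poly(A) sites, PAS1 and PAS2, giving Transcript1 and Transcript2]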
Investigate genome-wide variation of alternative polyadenylation in sense and antisense transcription across a set of Arabidopsis thaliana accessions
Objective
• Is variation in APA as prevalent across genotypes as it is across tissue types?
• Is there a genetic basis for variation in APA, through trans as well as cis regulation?
• Does a gene's proximity to neighboring genes constrain poly(A) site choice and limit variation?
Approach
Method
1. 19 accessions (genome sequenced); tissues: seedling, root, floral bud
2. RNA extraction & library construction with barcodes
3. 82 bp strand-specific RNA-seq
4. Map reads to each corresponding genome with PALMapper
5. Transform read positions from each transcriptome into a common coordinate system based on a multiple-genome alignment
6. Retrieve poly(A)-containing reads, cluster them across all accessions, and identify poly(A) sites (PASs)
7. Generate read counts for each PAS in each accession
8. Compare PASs genome-wide across accessions
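Step 5 (moving read positions into common coordinates) can be illustrated with a short sketch. The slides do not say how the multiple-genome alignment is represented, so this sketch assumes that it has been reduced to sorted, ungapped blocks of (accession start, reference start, length). It also assumes 0-based coordinates on a single chromosome and strand. A binary search finds the block that contains a position. The function `lift_position` and the block format are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical representation: ungapped aligned blocks (acc_start, ref_start, length),
# 0-based, sorted by acc_start, all on the same chromosome and strand.
Block = Tuple[int, int, int]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based accession coordinate to the common (reference) coordinate.

    Returns None when the position falls in an alignment gap, i.e. in sequence
    that has no aligned counterpart in the common coordinate system.
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    acc_start, ref_start, length = blocks[i]
    if pos < acc_start + length:
        return ref_start + (pos - acc_start)
    return None
```

Take the blocks `[(0, 0, 100), (110, 100, 50)]`, which model a 10 bp insertion in the accession. With these blocks, position 50 maps to 50, position 120 maps to 110, and position 105 (inside the insertion) has no counterpart. For millions of reads you would build `starts` once per chromosome instead of once per call.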
PALMapper: map RNA-seq reads to reference
• PALMapper (Jean, et al. 2010)
• A combination of:
  ➢ the spliced alignment method QPALMA (De Bona, et al. 2008)
  ➢ the short read alignment tool GenomeMapper (Schneeberger, et al. 2009)
http://ftp.raetschlab.org/software/palmapper/palmapper-0.5.tar.gz
Version 0.5
Method
Adapted from Kahles, et al. 2013 talk
Another Mapper?
Memorial Sloan-Kettering Cancer Center
Advantages:
• Alignments with variants, e.g. mismatches and indels
• Accurate spliced alignments using computational splice-site predictions
• More accurate than TopHat (e.g. C. elegans: 47% & 81%, respectively)
• Fast alignments (about 10 million reads/hour)
• Soft-trimming of the poly(A) tail of each read
Soft-trimming
• The trimmed sequence remains in the BAM file
• It is marked with the CIGAR "S" (soft clip) operation
• It is ignored by many tools, such as IGV
How did I retrieve the poly(A) reads?
Method
[Figure: the mapped SAM file with soft-trimmed poly(A). An RNA-seq read (5′ to 3′, + strand) has an AAAAAAAA tail that is soft-trimmed against the genome; a second case shows a read aligned with a splicing length ≥ 1500 bp]
Perl script to pick up poly(A) reads:
• ≥ 8 consecutive As at the 3′ end of the read
• Quality score of each A ≥ 40
• Huge splicing (splicing length ≥ 1500 bp)
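The slides say that a Perl script selected the poly(A) reads. The sketch below restates the same three criteria in Python for a single SAM record; it is not the author's script. It makes these assumptions:

- QUAL uses Phred+33 encoding.
- Once the SAM reverse flag (0x10) is undone, reads are in the sense orientation of the transcript.
- "Huge splicing" means a CIGAR `N` operation of at least 1,500 bp.
- The A run must have been soft-clipped or aligned across such a splice.

All function names are hypothetical.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_cigar(cigar):
    """Split a CIGAR string such as '10M10S' into [(10, 'M'), (10, 'S')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def three_prime_a_run(seq, qual, flag, min_q=40, offset=33):
    """Count consecutive A's at the transcript 3' end whose Phred score is >= min_q.

    SAM stores reverse-strand reads (FLAG 0x10) reverse-complemented, so they
    are flipped back to transcript orientation first.
    """
    if flag & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    run = 0
    for base, q in zip(reversed(seq), reversed(qual)):
        if base != "A" or ord(q) - offset < min_q:
            break
        run += 1
    return run


def three_prime_soft_clip(cigar, flag):
    """Length of the soft clip ('S') at the transcript 3' end, or 0."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]  # hard clips hold no bases
    if not ops:
        return 0
    n, op = ops[0] if flag & 0x10 else ops[-1]
    return n if op == "S" else 0


def longest_splice(cigar):
    """Longest skipped region ('N') in the alignment, or 0."""
    return max((n for n, op in parse_cigar(cigar) if op == "N"), default=0)


def is_polya_read(seq, qual, cigar, flag, min_run=8, min_q=40, huge_splice=1500):
    """>= min_run high-quality A's at the 3' end that were soft-trimmed by the
    mapper or aligned across a splice of at least huge_splice bp."""
    if three_prime_a_run(seq, qual, flag, min_q) < min_run:
        return False
    return three_prime_soft_clip(cigar, flag) > 0 or longest_splice(cigar) >= huge_splice
```

Here is an example. A forward read ending in ten A's with quality `I` (Phred 40) and CIGAR `10M10S` passes. The same read with CIGAR `20M` fails, because its A run matches the genome and so looks like genomic sequence rather than a tail.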
Defining poly(A) clusters (PAS)
Result
Identify poly(A) reads across accessions: 2,203,313
Cluster poly(A) reads: 75,532 PASs
• In the same orientation
• Within 10 bp of each other across all accessions
• Total cluster interval spanning ≤ 24 bp
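The clustering rule can be sketched as follows. The input is assumed to hold one (chromosome, strand, position) entry per poly(A) read, pooled across accessions in common coordinates. The sketch makes a greedy single pass over sorted positions: a position joins the current cluster if it lies within 10 bp of the previous position and within 24 bp of the cluster's first position. The slides do not give the exact procedure, so these are assumptions:

- the greedy order;
- measuring the span as last position minus first position;
- using the most-read position as the cluster's PAS.

`PolyACluster` and `cluster_pas` are hypothetical names.

```python
from collections import Counter
from itertools import groupby
from typing import Iterable, List, NamedTuple, Tuple


class PolyACluster(NamedTuple):
    chrom: str
    strand: str
    start: int  # first cleavage position in the cluster
    end: int    # last cleavage position in the cluster
    peak: int   # position with the most reads (taken here as the PAS)
    reads: int  # total poly(A) reads in the cluster


def _summarize(chrom: str, strand: str, members: List[Tuple[int, int]]) -> PolyACluster:
    # Highest read count wins; ties go to the smallest coordinate.
    peak = max(members, key=lambda m: (m[1], -m[0]))[0]
    return PolyACluster(chrom, strand, members[0][0], members[-1][0], peak,
                        sum(n for _, n in members))


def cluster_pas(sites: Iterable[Tuple[str, str, int]], max_gap: int = 10,
                max_span: int = 24) -> List[PolyACluster]:
    """Greedily cluster pooled poly(A) read end positions, per chromosome and strand."""
    counts = Counter(sites)  # reads per exact position
    clusters: List[PolyACluster] = []
    for (chrom, strand), group in groupby(sorted(counts.items()), key=lambda kv: kv[0][:2]):
        members: List[Tuple[int, int]] = []
        for (_, _, pos), n in group:
            fits = bool(members) and pos - members[-1][0] <= max_gap \
                and pos - members[0][0] <= max_span
            if members and not fits:
                clusters.append(_summarize(chrom, strand, members))
                members = []
            members.append((pos, n))
        if members:
            clusters.append(_summarize(chrom, strand, members))
    return clusters
```

Because of the span limit, evenly spaced reads at 0, 10, 20 and 30 become two clusters (0 to 20, and 30), even though every step is only 10 bp. A different pass order or span definition could give slightly different PAS counts.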
Map PASs to genic regions (±120 bp of the annotated range):
• 93.4% of PASs map to genic regions
• 6.6% of PASs lie further away from genic regions
Consider sense & antisense PASs:
• Orientation of the poly(A) reads relative to the gene orientation
• 6581 genes with ≥ 20 sense poly(A) reads across accessions
• 1473 genes with ≥ 10 antisense poly(A) reads across accessions
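The genic assignment and the read-count cut-offs can be sketched as below. The sketch assumes:

- each PAS is represented by a single cleavage position;
- a PAS within 120 bp of several genes counts toward each of them;
- orientation is "sense" when the PAS and gene strands match, otherwise "antisense".

`Gene`, `PAS`, `assign_pas` and `genes_passing` are hypothetical names, not part of any published pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # annotated gene start (1-based, inclusive)
    end: int    # annotated gene end (1-based, inclusive)
    strand: str


class PAS(NamedTuple):
    chrom: str
    strand: str
    pos: int    # representative cleavage position of the cluster
    reads: int  # poly(A) reads pooled over accessions


def assign_pas(pas_list: Iterable[PAS], genes: List[Gene],
               flank: int = 120) -> Dict[Tuple[str, str], int]:
    """Sum reads per (gene_id, 'sense' | 'antisense') for PASs within gene +/- flank."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for p in pas_list:
        for g in genes:  # linear scan for clarity; an interval index scales better
            if p.chrom == g.chrom and g.start - flank <= p.pos <= g.end + flank:
                orientation = "sense" if p.strand == g.strand else "antisense"
                totals[(g.gene_id, orientation)] += p.reads
    return dict(totals)


def genes_passing(totals: Dict[Tuple[str, str], int], min_sense: int = 20,
                  min_antisense: int = 10) -> Tuple[Set[str], Set[str]]:
    """Genes meeting the read-count cut-offs for sense and antisense PASs."""
    sense = {g for (g, o), n in totals.items() if o == "sense" and n >= min_sense}
    antisense = {g for (g, o), n in totals.items() if o == "antisense" and n >= min_antisense}
    return sense, antisense
```

A PAS that falls outside every gene's ±120 bp window is simply not counted. This corresponds to the 6.6% of PASs that lie further away from genic regions.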
Reads mapping to the major and non-major poly(A) clusters within a gene
Result
• Major PAS: the PAS with the most reads across all accessions for each gene
• p = proportion of the gene's total reads that map to the major PAS
• q = 1 − p = proportion of the gene's total reads that map to non-major PASs
The distribution of the proportion of reads mapping to non-major sense & antisense poly(A) clusters per gene
Genes whose proportion of non-major cluster reads is ≥ 0.4 (indicated with gray dashed lines) were considered to contain alternative poly(A) sites and were chosen for further polymorphism analysis
Result
6581 genes with sense PASs | 1471 genes with antisense PASs
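The definitions of p and q and the 0.4 cut-off translate directly into code. In this sketch, the major PAS is chosen from reads pooled over all accessions. Each accession's q is then computed against that shared major PAS, which is the quantity the pairwise comparison across accessions needs. Ties between PASs are broken by taking the first one encountered; the slides do not say how ties were handled. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Reads per PAS for ONE gene and ONE orientation: {accession: {pas_id: reads}}.
Counts = Dict[str, Dict[str, int]]


def major_pas(counts_by_accession: Counts) -> Tuple[str, float]:
    """Major PAS (most reads pooled over accessions) and the pooled q = 1 - p."""
    totals: Counter = Counter()
    for counts in counts_by_accession.values():
        totals.update(counts)
    major = max(totals, key=totals.get)  # ties: first PAS encountered wins
    p = totals[major] / sum(totals.values())
    return major, 1 - p


def q_per_accession(counts_by_accession: Counts, major: str) -> Dict[str, Optional[float]]:
    """q_A: fraction of an accession's reads at non-major PASs (None if it has no reads)."""
    result: Dict[str, Optional[float]] = {}
    for accession, counts in counts_by_accession.items():
        total = sum(counts.values())
        result[accession] = None if total == 0 else 1 - counts.get(major, 0) / total
    return result


def has_alternative_pas(q: float, threshold: float = 0.4) -> bool:
    """Genes with q >= 0.4 are treated as using alternative poly(A) sites."""
    return q >= threshold
```

Here is an example. Suppose one accession has 6 reads at pas1 and 4 at pas2, and another has 2 and 8. Then pas2 is the major PAS (12 of 20 pooled reads), so the pooled q is 0.4, while the per-accession values are 0.6 and 0.2.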
Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
Result
• For the ith and jth accessions, $A_i$ and $A_j$, we can calculate the absolute difference in the proportion of reads mapping to non-major poly(A) clusters, here called $D_{ij}$:
$$D_{ij} = \lvert q_{A_i} - q_{A_j} \rvert$$
• Average pairwise difference:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} D_{ij}$$
where n = 19
• Maximum pairwise difference:
$$D_{\max} = \max\{D_{ij}\}$$
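The pairwise statistics for one gene can be sketched from each accession's q. The sketch returns $\bar{D}$ normalized by n, exactly as in the slide's formula. It also returns the plain mean over the n(n − 1)/2 distinct pairs (171 pairs for n = 19), because the two differ by a constant factor of (n − 1)/2. The slides do not state which one underlies the plotted values. Accessions with no reads for the gene are assumed to have been dropped beforehand. `pairwise_differences` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_differences(q: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (D_bar as written on the slide, mean over pairs, D_max).

    q maps each accession A_i to q_{A_i}; D_ij = |q_{A_i} - q_{A_j}| for i < j.
    """
    values = list(q.values())
    n = len(values)
    if n < 2:
        raise ValueError("need at least two accessions")
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    total = sum(diffs)
    d_bar_slide = total / n            # 1/n normalization, as in the slide formula
    d_bar_pairs = total / len(diffs)   # divide by the n(n-1)/2 distinct pairs
    return d_bar_slide, d_bar_pairs, max(diffs)
```

Both averages rank genes identically for a fixed n, so either works for picking the most polymorphic candidates. Only the pairwise mean, however, stays on the same 0 to 1 scale as $D_{\max}$.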
Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
3074 genes with sense PAS
Result
[Panels: average pairwise difference | maximum pairwise difference D_max]
Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
544 genes with antisense PAS
Result
[Panels: maximum pairwise difference D_max | average pairwise difference]
Gene position and antisense PAS
Result
Nearby gene: a gene whose distance from its adjacent gene is ≤ 2 kb
Group | Fraction of genes in each group | Fraction of genes with ≥ 20 sense poly(A) reads | Fraction of genes with proportion of non-major sense PASs > 0.4 | Fraction of genes with ≥ 10 antisense poly(A) reads | Fraction of genes with proportion of non-major antisense PASs > 0.4
A | 57.87% | 62.92% | 62.94% | 96.91% | 97.79%
B | 20.48% | 21.30% | 20.59% | 1.65% | 0.74%
C | 21.64% | 15.77% | 16.46% | 1.43% | 1.47%
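The "nearby gene" criterion (adjacent gene within 2 kb) can be sketched as follows. The sketch assumes:

- gene coordinates are 1-based and inclusive;
- distances ignore strand;
- "adjacent" means the previous or next gene in start order on the same chromosome;
- overlapping genes get distance 0.

The slide does not show how genes were then split into groups A, B and C, so the sketch stops at the nearby flag. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input record: (gene_id, chrom, start, end), 1-based inclusive.
GeneRecord = Tuple[str, str, int, int]


def neighbor_distances(genes: List[GeneRecord]) -> Dict[str, float]:
    """Distance in bp from each gene to its closest adjacent gene (inf if none)."""
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))
    dist: Dict[str, float] = {}
    for members in by_chrom.values():
        members.sort()
        for k, (start, end, gene_id) in enumerate(members):
            best = float("inf")
            if k > 0:  # gap to the previous gene in start order
                best = min(best, max(0, start - members[k - 1][1] - 1))
            if k + 1 < len(members):  # gap to the next gene in start order
                best = min(best, max(0, members[k + 1][0] - end - 1))
            dist[gene_id] = best
    return dist


def nearby_genes(genes: List[GeneRecord], max_distance: int = 2000) -> Set[str]:
    """Genes whose closest adjacent gene lies within max_distance bp."""
    return {g for g, d in neighbor_distances(genes).items() if d <= max_distance}
```

With genes at 1 to 1000, 2500 to 3000 and 6000 to 7000 on one chromosome, the first two are 1,499 bp apart and count as nearby. The third is 2,999 bp from its neighbor and does not.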
Conclusion
• Among genes with enough sense or antisense poly(A) reads, half use non-major PASs at least 40% of the time
• Pairwise comparison across all accessions helped to identify the best candidate genes for polymorphism in the usage or position of major PASs
Outlook
• Combine all tissues & all accessions, calculate & its variance
• Associate with gene categories, poly(A) site location of genes, etc.
• Examine the trans/cis poly(A) QTL with the MAGIC lines' data
• Check the relationship between the antisense poly(A) site & the orientation of nearby genes, and the relationship this may have with expression level
• Check the data from related species, Capsella rubella & A. lyrata, to look at APA usage & its evolution between species
• Ask whether A. thaliana is an outlier for any of the observed trends, and whether APA is derived in A. thaliana
Acknowledgements
Kansas State University
Dr. Chris Toomajian
University of Utah
Dr. Richard Clark
Dr. Joshua Steffen
Edward J. Osborne
Robert Greenhalgh
Wellcome Trust Centre for Human
Genetics, University of Oxford
Dr. Richard Mott
Memorial Sloan-Kettering Cancer Center
Dr. Gunnar Raetsch
Philipp Drewe
Andre Kahles
Outlook
• Combine all tissues and all accessions, take each tissue as a subset, calculate and its variance
• For each tissue, associate with gene categories according to GO analysis & gene families
• Compare the distribution of from different tissues, and PAS usage patterns among tissues or accessions
• Check Ka/Ks for genes with high/low in all tissues
• Check the poly(A) site location for genes with high , e.g. 3′ UTR, CDS, 5′ UTR or intron
• Compare the location across accessions
• Look at the relationship of location with gene expression level
• Examine the cis poly(A) QTL with the MAGIC lines' RNA-seq data
• Check the relationship between the antisense poly(A) site and the orientation of nearby genes for each tissue subset, and the relationship this may have with expression level
• Check the data from Capsella rubella and A. lyrata to look at APA usage and its evolution between species
• Ask whether A. thaliana is an outlier for any of the observed trends, and whether APA is derived in A. thaliana
Alternative polyadenylation (APA)
Background
[Excerpt, partly cut off in the slide image:] … Upf1 binds to the 3′ UTR in a length-dependent manner, thus eliciting degradation of longer transcripts more rapidly [48]. The 3′ UTR contains elements that affect not only transcript degradation but also stability. …
[Figure 1 (Current Opinion in Cell Biology): major categories of APA]
[Figure 2 (TiBS): regulation of cis elements in 3′ UTRs by APA]
[Figure: protein isoforms, adapted from Tress et al. 2007]
[Figure 3 (Nature Reviews | Genetics): biological processes linked with broad APA modulation]
Mueller, et al. 2012
Tian, et al. 2013
Elkon, et al. 2013
Alternative polyadenylation (APA)
Background
in abundance. One of the best-charac-
is that of microRNA (miR)-mediated
studies of myogenic [43,44
], hemato-
d cancer [45] cells, transcripts bearing
contained fewer miRNA-binding sites,
these transcripts to evade miRNA-
dation. Transcripts are also subject to
Upf1 binds to the 3 UTR in a length-dependent manner,
thus eliciting degradation of longer transcripts more
rapidly [48
].
The 30
UTR contains elements that affect not only
transcript degradation but also stability. In a genome-
wide computational analysis of sequence and stability
(a)
Ex1 Ex3
PASPAS
Ex2
Ex1 Ex3Ex2Ex1 Ex3Ex2
(b)
Ex1 Ex3Ex2Ex1 Ex2
Ex1 Ex3
PASPAS
Ex2
5ā€²
5ā€² 3ā€²
5ā€²
5ā€²
5ā€² 5ā€²3ā€² 3ā€²
3ā€²
3ā€²
3ā€²
Current Opinion in Cell Biology
PA. This model refers to a hypothetical gene with three exons and two PASs. (a) When both PASs are located in the 30
UTR, then
produced. Because the 30
UTR often contains elements regulating transcript stability, degradation, or localization, the quantity of
be altered depending upon PAS choice. (b) When one PAS is located in the coding region, a truncated protein is produced when
hosen. Ex = exon, PAS = polyadenylation site; thick lines = UTR regions, thin lines = intronic regions.
om Current Opinion in Cell Biology 2013, 25:222ā€“232
Mueller, et al. 2012
Tian, et al. 2013
lar trends of 30
UTR length regulation have been reported
for comparisons of ES cells versus neural stem/progenitor
(NSP) cells or neurons [42]. Although these studies have all
pointed to a connection between 30
UTR length and cell
proliferation, cardiac hypertrophy, in which myocytes grow
in size rather than in number, has also been found to
involve 30
UTR shortening [43]. Thus, a general rule
may be that APA regulation is correlated with cell growth.
recentl
lung ca
proļ¬le
subtyp
its rele
nostic m
in canc
major d
transfo
dicted
transfo
[44]. H
the sam
lial cel
formed
determ
of 30
U
that, co
and M
spectiv
to the
adhesi
UTRs i
delinea
differen
APA is
Regula
The co
include
subuni
miRNA
RBP
Translaʟon DegradaʟonLocalizaʟon
AAAnCDS
CDS
cUTR aUTR
!!
AAA
AAA
n
Ti BS
Figure 2. Regulation of cis elements in 30
untranslated regions (UTRs) by
alternative cleavage and polyadenylation (APA). Two mRNA isoforms are
shown. The 30
UTR region upstream of the proximal cleavage and
polyadenylation site (pA) is called the constitutive UTR (cUTR), and the
downstream region is called the alternative UTR (aUTR). RNA-binding protein
(RBP) and miRNA targeting to the aUTR are shown. Impacts on mRNA localization,
translation, and degradation are indicated. CDS, coding sequence.
Adapted from Tress et al. 2007
Protein	
 Ā isoforms	
 Ā 
depletion at the site and more pron
downstream from it, suggesting th
tioning might influence PAS use by
ing the rate of polymerase elongat
these observations are only corr
mental studies are required in ord
and to establish a causeā€“effect re
nucleosome occupancy and poly(A
Neuron activity
Proliferation
Cancer
Oculopharyngeal muscular dystrophy
Global APA
Biological processes
Connections to disease
R
Elkon, et al. 2013
depletion at t
downstream
tioning migh
ing the rate o
these observ
mental studie
and to estab
nucleosome o
Anotherw
to affect APA
genetic effect
tissues, in tw
Napl15), whi
Nature Reviews | Genetics
Neuron activity
Proliferation
Cancer
Oculopharyngeal muscular dystrophy
Global APA
Biological processes
Connections to disease
Favour distal poly(A) site usage Favour proximal poly(A) site usage
Figure 3 | Biological processes that have been linked with broad APA modulation.
A schematic showing the biological processes and diseases that alternative
polyadenylation(APA)hasbeenlinkedwith.Inaddition,thetendencytowardsdistal
orproximalpoly(A)siteusageisshown.
Elkon, et al. 2013
…hles (SKI, New York), PALMapper, HiTSeq, July 20, 2013
Advantages:
• Alignments that tolerate variants, e.g., mismatches and indels
• Accurate spliced alignments using computational splice-site predictions
• More accurate than TopHat (e.g., C. elegans: 47% vs. 81%, respectively)
• Fast alignments (about 10 million reads/hour)
• Soft-trimming of the poly(A) tail of each read
How did I retrieve the poly(A) reads?
From the mapped SAM file with soft-trimmed poly(A) tails:
• Reads with a soft-trimmed end and consecutive As at the end
• Reads with a long splicing length and consecutive As at the end
[Figure: criteria for poly(A) reads. Each panel shows an RNA-seq read (5′→3′) aligned to the genome and ending in a poly(A) stretch (AAAAAAAA): (1) soft-trimming; (2) splicing length = 1500 bp plus soft-trimming; (3) two splices (Splicing1, Splicing2), with splicing length = 1500 bp; (4) splicing length = 1500 bp. Labels shown in every panel: consecutive As = 8; quality score of each soft-trimmed bp = 40.]
Perl scripts were used to select the reads that meet these criteria.
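The read-selection rules above can be sketched in Python. This is not the author's Perl script; it is a minimal illustration under stated assumptions: input is SAM-style fields (1-based POS, CIGAR, SEQ, Phred+33 QUAL), the slide's values (8 As, quality 40, 1500 bp) are read as minimum thresholds, and every soft-clipped base must be an A. The function names `polya_site` and `_tail_start` and the threshold constants are hypothetical. The sketch also returns the read's poly(A) site, taken here (again an assumption) as the last genomic base aligned before the tail.

```python
import re

# Thresholds from the slide's figure labels; treating each as a minimum
# (>=) is an assumption, since the comparison symbols did not survive.
MIN_AS = 8          # consecutive As required in the poly(A) stretch
MIN_QUAL = 40       # Phred quality required for every soft-trimmed base
MIN_SPLICE = 1500   # a splice (N) this long is treated as a mis-spliced tail

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")    # CIGAR ops that consume reference bases
QUERY_OPS = set("MIS=X")  # CIGAR ops that consume read bases
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_cigar(cigar):
    """'20M10S' -> [(20, 'M'), (10, 'S')]"""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def _tail_start(ops, seq, quals):
    """Index of the first CIGAR op belonging to a 3' poly(A) tail, or None."""
    if not ops:
        return None
    # Case 1: the read ends in a soft clip made only of high-quality As.
    n, op = ops[-1]
    if (op == "S" and n >= MIN_AS and set(seq[-n:]) == {"A"}
            and min(quals[-n:]) >= MIN_QUAL):
        return len(ops) - 1
    # Case 2: the last long splice is followed only by As, i.e. the aligner
    # "spliced" the tail onto a distant A-rich stretch of the genome.
    for i in range(len(ops) - 1, -1, -1):
        n, op = ops[i]
        if op == "N" and n >= MIN_SPLICE:
            after = sum(m for m, o in ops[i + 1:] if o in QUERY_OPS)
            if after >= MIN_AS and set(seq[-after:]) == {"A"}:
                last_n, last_op = ops[-1]
                # any soft-trimmed bases must still pass the quality filter
                if last_op != "S" or min(quals[-last_n:]) >= MIN_QUAL:
                    return i
            return None
    return None


def polya_site(pos, cigar, seq, qual, minus_strand=False):
    """Return the 1-based genomic poly(A) site implied by one read, or None.

    SAM SEQ is always shown on the reference forward strand, so the tail of
    a minus-strand transcript appears as a leading run of Ts; such reads are
    flipped first so that the tail is always at the end.
    """
    ops = parse_cigar(cigar)
    quals = [ord(c) - 33 for c in qual]
    if minus_strand:
        ops = ops[::-1]
        seq = seq[::-1].translate(COMPLEMENT)
        quals = quals[::-1]
    k = _tail_start(ops, seq, quals)
    if k is None:
        return None
    if minus_strand:
        # In reference orientation the tail ops come first: skip their span.
        return pos + sum(n for n, op in ops[k:] if op in REF_OPS)
    # Plus strand: last reference base aligned before the tail.
    return pos + sum(n for n, op in ops[:k] if op in REF_OPS) - 1
```

For example, a plus-strand read at POS 100 with CIGAR `20M10S` whose ten clipped bases are high-quality As yields a site at 119. Keeping both cases in one function mirrors the slide's two read classes (soft-trimmed end vs. long splice).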
Method
Defining poly(A) clusters (PAS)
• 2,203,313 poly(A) reads were identified across accessions
• The poly(A) site of each poly(A) read was calculated with a Perl script
• 75,532 PASs were defined by clustering poly(A) reads that are in the same orientation and within 10 bp of each other across all accessions, with the total cluster interval spanning no more than 24 bp
• 93.4% of clusters map to genic regions; the remaining 6.6% lie farther away from genic regions
• 6581 genes have at least 20 sense poly(A) reads across accessions
• 1473 genes have at least 10 antisense poly(A) reads across accessions
• Major sense PAS: for each gene, the sense PAS with the most reads across all accessions
• p = proportion of the gene's total reads mapping to the major PAS
• q = 1 − p = proportion of the gene's total reads mapping to non-major PASs
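The clustering rule can be sketched as a greedy pass over sorted sites. This is an illustration under assumptions, not the author's script: sites from all accessions are pooled per chromosome and strand, a new cluster starts when the next site is more than 10 bp from the previous one or would stretch the cluster beyond 24 bp, and each cluster's peak is its most-supported position. `cluster_sites` and `_summarize` are hypothetical names.

```python
from collections import Counter
from itertools import groupby


def _summarize(chrom, strand, positions):
    counts = Counter(positions)
    # peak = position supported by the most reads (ties go to the leftmost)
    peak = max(counts, key=lambda p: (counts[p], -p))
    return {"chrom": chrom, "strand": strand, "start": positions[0],
            "end": positions[-1], "n_reads": len(positions), "peak": peak}


def cluster_sites(sites, max_gap=10, max_span=24):
    """Greedily cluster poly(A) sites.

    sites: iterable of (chrom, strand, position), one entry per poly(A) read,
    pooled across accessions. Reads on different strands never share a cluster.
    """
    clusters = []
    for (chrom, strand), group in groupby(sorted(sites), key=lambda s: s[:2]):
        current = []
        for _, _, pos in group:
            too_far = current and pos - current[-1] > max_gap
            too_wide = current and pos - current[0] + 1 > max_span
            if too_far or too_wide:
                clusters.append(_summarize(chrom, strand, current))
                current = []
            current.append(pos)
        clusters.append(_summarize(chrom, strand, current))
    return clusters
```

A greedy split is only one way to honour the 24 bp limit; discarding over-wide clusters instead would be an equally plausible reading of the slide.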
Result
The distribution of the proportion of reads mapping to non-major sense and antisense poly(A) clusters per gene
Genes whose proportion of non-major cluster reads was equal to or greater than 0.4 (indicated by gray dashed lines) were considered to contain alternative poly(A) sites and were chosen for further polymorphism analysis
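The p/q definitions and the 0.4 cutoff can be expressed in a few lines. This sketch assumes per-gene read counts per PAS, pooled across accessions, and uses the Method slide's minimum of 20 reads for sense PASs as the default (10 would apply to antisense PASs). `non_major_fraction` and `apa_genes` are hypothetical names. Computing q as (total − major) / total avoids a floating-point subtraction at the cutoff.

```python
def non_major_fraction(pas_counts):
    """q for one gene: share of its reads that fall outside the major PAS.

    pas_counts maps PAS id -> read count (pooled across accessions).
    The major PAS is the one with the most reads, so p = max / total and
    q = 1 - p = (total - max) / total.
    """
    total = sum(pas_counts.values())
    if total == 0:
        raise ValueError("gene has no poly(A) reads")
    return (total - max(pas_counts.values())) / total


def apa_genes(gene_pas_counts, min_reads=20, q_cutoff=0.4):
    """Genes with at least min_reads poly(A) reads and q >= q_cutoff."""
    return sorted(
        gene
        for gene, counts in gene_pas_counts.items()
        if sum(counts.values()) >= min_reads
        and non_major_fraction(counts) >= q_cutoff
    )
```

For a gene with 12 reads at its major PAS and 8 elsewhere, q = 8/20 = 0.4, so it just passes the cutoff.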
Result
Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions
3074 genes with sense PAS
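A sketch of the pairwise comparison for one gene follows. It assumes, consistent with the Method slide, that the major PAS is fixed from counts pooled over all accessions and that q is then recomputed within each accession; taking the absolute difference is an assumption. `q_per_accession` and `pairwise_q_differences` are hypothetical names, and the accession labels in the test are placeholders.

```python
from collections import Counter
from itertools import combinations


def q_per_accession(counts_by_accession):
    """counts_by_accession: accession -> {PAS id: read count} for one gene.

    The major PAS is chosen once, from counts pooled over all accessions;
    q is then the share of each accession's reads outside that PAS.
    """
    pooled = Counter()
    for counts in counts_by_accession.values():
        pooled.update(counts)
    major = max(pooled, key=pooled.get)
    q = {}
    for acc, counts in counts_by_accession.items():
        total = sum(counts.values())
        if total:  # skip accessions with no reads for this gene
            q[acc] = (total - counts.get(major, 0)) / total
    return q


def pairwise_q_differences(q):
    """Absolute difference in q for every pair of accessions."""
    return {(a, b): abs(q[a] - q[b]) for a, b in combinations(sorted(q), 2)}
```

Fixing the major PAS globally keeps the comparison meaningful: each accession's q measures departure from the same reference site.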
Result
Gene position and antisense PAS
Result
Nearby gene: distance from the adjacent gene = 2 kb
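The "nearby gene" definition can be sketched as a scan over genes sorted by start position. This reads the slide's 2 kb criterion as "within 2 kb" (an assumption), ignores strand, and treats consecutive genes on a chromosome as adjacent; `nearby_genes` is a hypothetical name and gene coordinates would come from a gene annotation.

```python
def nearby_genes(genes, max_dist=2000):
    """Return ids of genes lying within max_dist bp of an adjacent gene.

    genes: iterable of (gene_id, chrom, start, end) with 1-based, inclusive
    coordinates. Neighbours are consecutive genes after sorting by start on
    each chromosome; overlapping genes count as distance 0.
    """
    by_chrom = {}
    for gene_id, chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end, gene_id))
    flagged = set()
    for intervals in by_chrom.values():
        intervals.sort()
        for (_, e1, g1), (s2, _, g2) in zip(intervals, intervals[1:]):
            gap = max(0, s2 - e1 - 1)  # bases strictly between the two genes
            if gap <= max_dist:
                flagged.update((g1, g2))
    return flagged
```

Sorting by start is a simplification: a long gene that contains several others is only compared with its immediate successor.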

  • 5. APA in plants and unknown questions (Background)
    Although polyadenylation has been investigated genome-wide in a single Arabidopsis accession, we still do not know:
    1. How much variation is there in poly(A) site usage across Arabidopsis accessions? What is the genetic basis for such variation: cis regulation or trans?
    2. Is Arabidopsis an outlier for any of the trends in poly(A) site usage compared with related species? How has APA evolved across related species?
    [Paper excerpt: "Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation", Xiaohui Wu, Man Liu, Bruce Downie, Chun Liang, Guoli Ji, Qingshun Q. Li and Arthur G. Hunt (Miami University; Xiamen University; University of Kentucky). Edited by David C. Baulcombe; approved June 8, 2011.]
    Abstract: "Alternative polyadenylation (APA) has been shown to play an important role in gene expression regulation in animals and plants. However, the extent of sense and antisense APA at the genome level is not known. We developed a deep-sequencing protocol that queries the junctions of 3′UTR and poly(A) tails and confidently maps the poly(A) tags to the annotated genome. The results of this mapping show that 70% of Arabidopsis genes use more than one poly(A) site, excluding microheterogeneity. Analysis of the poly(A) tags reveal extensive APA in introns and coding sequences, results of which can significantly alter transcript sequences and their encoding proteins. Although the interplay of intron splicing and polyadenylation potentially defines poly(A) site uses in introns, the polyadenylation signals leading to the use of CDS protein-coding region poly(A) sites are distinct from the rest of the genome. Interestingly, a large number of poly(A) sites correspond to putative antisense transcripts that overlap with the promoter of the associated sense transcript, a mode previously demonstrated to regulate sense gene expression. Our results suggest that APA plays a far greater role in gene expression in plants than previously expected."
    [Paper excerpt: "Direct sequencing of Arabidopsis thaliana RNA reveals patterns of cleavage and polyadenylation", Alexander Sherstnev, Céline Duc, Christian Cole, Vasiliki Zacharaki, Csaba Hornyik, Fatih Ozsolak, Patrice M Milos, Geoffrey J Barton & Gordon G Simpson. Nature Structural & Molecular Biology 19(8), August 2012; doi:10.1038/nsmb.2345]
    Abstract: "It has recently been shown that RNA 3′-end formation plays a more widespread role in controlling gene expression than previously thought. To examine the impact of regulated 3′-end formation genome-wide, we applied direct RNA sequencing to A. thaliana. Here we show the authentic transcriptome in unprecedented detail and describe the effects of 3′-end formation on genome organization. We reveal extreme heterogeneity in RNA 3′ ends, discover previously unrecognized noncoding RNAs and propose widespread reannotation of the genome. We explain the origin of most poly(A)+ antisense RNAs and identify cis elements that control 3′-end formation in different registers. These findings are essential to understanding what the genome actually encodes, how it is organized and how regulated 3′-end formation affects these processes."
    [Figure 2 (Xing, et al. 2012): Schematic representation of alternative polyadenylation, with the examples FLC and OXT6 (AtCPSF30, AtCPSF30*-YT521B)]
    [Diagram: one gene with two poly(A) sites, PAS1 and PAS2, producing Transcript1 and Transcript2]
  • 6. Objective
    Investigate genome-wide variation of alternative polyadenylation in sense and antisense transcription across a set of Arabidopsis thaliana accessions.
    • Is variation in APA as prevalent across genotypes as across tissue types?
    • Is there a genetic basis for variation in APA, involving trans as well as cis regulation?
    • Does a gene’s proximity to neighboring genes constrain polyadenylation site choice and limit variation?
  • 7. Approach (Method)
    1. 19 accessions (genomes sequenced); tissues: seedling, root, floral bud
    2. RNA extraction & library construction with barcodes
    3. 82 bp strand-specific RNA-seq
    4. Map reads to each corresponding genome (PALMapper)
    5. Transform read positions from each transcriptome into a common coordinate system based on a multiple-genome alignment
    6. Retrieve poly(A)-containing reads, cluster them across all accessions and identify poly(A) sites (PASs)
    7. Generate read counts for each PAS for each accession
    8. Compare PASs genome-wide across accessions
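Step 5 can be pictured as a lookup through the gap-free blocks of the multiple-genome alignment. The Python below is a minimal sketch under stated assumptions, not the pipeline's actual code: the block layout (accession start, common start, length, 0-based) and the function name `to_common` are hypothetical, and a real alignment would also need chromosome and strand handling.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical block format: one gap-free aligned block per tuple,
# (accession_start, common_start, length), 0-based, sorted by accession_start.
Block = Tuple[int, int, int]


def to_common(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based accession coordinate onto the common coordinate system.

    Returns None when the position lies outside every aligned block,
    e.g. inside an insertion private to this accession.
    """
    starts = [start for start, _, _ in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    acc_start, common_start, length = blocks[i]
    if pos < acc_start + length:
        return common_start + (pos - acc_start)
    return None


# The accession lacks common positions 100-104 and carries 10 extra bases at 150-159.
blocks = [(0, 0, 100), (100, 105, 50), (160, 155, 40)]
print(to_common(120, blocks))  # 125
print(to_common(155, blocks))  # None
```

Binary search keeps each lookup logarithmic in the number of blocks; positions that fall in accession-specific insertions simply have no common coordinate.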
  • 8. PALMapper: map RNA-seq reads to reference (Method)
    • PALMapper (Jean, et al. 2010), a combination of:
      - the spliced alignment method QPALMA (De Bona, et al. 2008)
      - the short-read alignment tool GenomeMapper (Schneeberger, et al. 2009)
    • Version 0.5: http://ftp.raetschlab.org/software/palmapper/palmapper-0.5.tar.gz
    • Advantages:
      - Alignments with variants, e.g. mismatches, indels
      - Accurate spliced alignments using computational splice site predictions
      - More accurate than TopHat (e.g. C. elegans: 47% & 81%, respectively)
      - Fast alignments (about 10 million reads/hour)
      - Soft-trimming of the poly(A) tail of each read
    Adapted from Kahles, et al. 2013 talk ("Another Mapper?"), Memorial Sloan-Kettering Cancer Center
  • 9. Soft-trimming
    • The trimmed sequence remains in the BAM file
    • It is marked with the CIGAR “S” (soft-clip) operation
    • It is ignored by many tools, such as IGV
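Because soft-clipped bases stay in the SEQ field and are flagged only by an `S` operation in the CIGAR string, a trimmed poly(A) tail can be recovered directly from the alignment record. A minimal Python sketch (the function names are hypothetical; it assumes a forward-strand alignment, since SAM stores SEQ on the forward strand and a reverse-strand read's tail would instead appear as leading clipped Ts):

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a SAM CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def trailing_soft_clip(cigar: str, seq: str) -> str:
    """Return the bases soft-clipped at the right (3') end of SEQ, if any.

    A hard clip (H) may follow the S operation, but hard-clipped bases are
    absent from SEQ, so trailing H operations are skipped first.
    """
    ops = parse_cigar(cigar)
    while ops and ops[-1][1] == "H":
        ops.pop()
    if ops and ops[-1][1] == "S":
        return seq[-ops[-1][0]:]
    return ""


read_seq = "ACGTTGCAGG" + "A" * 10  # 10 templated bases + 10-nt tail
print(trailing_soft_clip("10M10S", read_seq))  # AAAAAAAAAA
```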
  • 10. How did I retrieve the poly(A) reads? (Method)
    Input: the mapped SAM file with soft-trimmed poly(A) tails. A Perl program picks up poly(A) reads that meet these criteria:
    • ≥8 consecutive As at the 3′ end of the read
    • Quality score of each A ≥40
    Two read configurations are accepted:
    • Soft-trimming: the A-run at the read’s 3′ end is soft-trimmed rather than aligned to the genome
    • Huge splicing: the A-run is aligned across a splice gap (splicing length ≥1500 bp) to a distant genomic position
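The selection criteria can be sketched as a single predicate over one alignment record. This is a hypothetical Python re-implementation for illustration, not the author's Perl program; it assumes forward-strand alignments, Phred+33 quality encoding, and that in the "huge splicing" case every read base after the last long gap (`N`) must belong to the 3′ A-run.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MIN_A_RUN = 8       # >= 8 consecutive As at the read's 3' end (slide criterion)
MIN_BASE_QUAL = 40  # each A must have Phred quality >= 40 (slide criterion)
MIN_SPLICE = 1500   # "huge splicing": gap of >= 1500 bp (slide criterion)


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def trailing_a_run(seq: str, qual: str) -> int:
    """Length of the 3' run of As that all pass the quality cutoff (Phred+33)."""
    run = 0
    for base, q in zip(reversed(seq), reversed(qual)):
        if base != "A" or ord(q) - 33 < MIN_BASE_QUAL:
            break
        run += 1
    return run


def is_polya_read(cigar: str, seq: str, qual: str) -> bool:
    """Hypothetical check of the two poly(A)-read configurations."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    a_run = trailing_a_run(seq, qual)
    if not ops or a_run < MIN_A_RUN:
        return False
    # Case 1: soft-trimmed tail; every clipped base must belong to the A-run.
    if ops[-1][1] == "S" and MIN_A_RUN <= ops[-1][0] <= a_run:
        return True
    # Case 2: the tail was aligned after a huge splice (N) to a distant
    # A-stretch; all read bases after the last N must lie inside the A-run.
    after_gap = 0
    for n, op in reversed(ops):
        if op == "N":
            return n >= MIN_SPLICE and 0 < after_gap <= a_run
        if op in ("M", "I", "S", "=", "X"):
            after_gap += n
    return False


body = "ACGTACGTACGT"  # 12 templated bases
tail = "A" * 10        # 10-nt poly(A) tail
good_qual = "I" * 22   # 'I' encodes Phred 40 in Phred+33
print(is_polya_read("12M10S", body + tail, good_qual))       # True
print(is_polya_read("12M2000N10M", body + tail, good_qual))  # True
print(is_polya_read("12M500N10M", body + tail, good_qual))   # False
```

Requiring the whole clip to lie inside the high-quality A-run keeps reads whose clipped end contains other bases from being counted as poly(A) reads.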
  • 11. Defining poly(A) clusters (PASs) (Result)
    • Poly(A) reads identified across accessions: 2,203,313
    • Clustering poly(A) reads gives 75,532 PASs; reads in a cluster are:
      - in the same orientation
      - within 10 bp of each other across all accessions
      - spanning a total cluster interval of ≤24 bp
    • Mapping PASs to genic regions (annotated gene range ±120 bp):
      - 93.4% of PASs map to genic regions
      - 6.6% of PASs lie further away from genic regions
    • Sense & antisense PASs (orientation of the poly(A) reads relative to the gene):
      - 6581 genes with ≥20 sense poly(A) reads across accessions
      - 1473 genes with ≥10 antisense poly(A) reads across accessions
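One plausible reading of the clustering rules is a greedy single pass over sorted sites. The slides do not state the algorithm, so this Python sketch is an assumption: sites are pooled across accessions in common coordinates, a site joins the current cluster if it is on the same chromosome and strand, within 10 bp of the previous site, and keeps the cluster span (last minus first position) at or below 24 bp. All names are hypothetical.

```python
from typing import List, Tuple

MAX_GAP = 10   # consecutive sites in a cluster lie within 10 bp (slide criterion)
MAX_SPAN = 24  # a cluster spans at most 24 bp in total (slide criterion)

Site = Tuple[str, str, int]               # (chromosome, strand, position)
Cluster = Tuple[str, str, int, int, int]  # (chromosome, strand, start, end, n_reads)


def cluster_sites(sites: List[Site]) -> List[Cluster]:
    """Greedy single pass over sorted poly(A) sites pooled from all accessions."""
    clusters: List[Cluster] = []
    for chrom, strand, pos in sorted(sites):
        if clusters:
            c_chrom, c_strand, start, end, n = clusters[-1]
            same_locus = (c_chrom, c_strand) == (chrom, strand)
            if same_locus and pos - end <= MAX_GAP and pos - start <= MAX_SPAN:
                clusters[-1] = (chrom, strand, start, pos, n + 1)
                continue
        clusters.append((chrom, strand, pos, pos, 1))
    return clusters


reads = [("Chr1", "+", 100), ("Chr1", "+", 105), ("Chr1", "+", 112),
         ("Chr1", "+", 126), ("Chr1", "+", 130), ("Chr1", "-", 105)]
for cluster in cluster_sites(reads):
    print(cluster)
```

A greedy pass is order-dependent at cluster boundaries; other schemes (for example, splitting a long chain of close sites at its largest gap) would satisfy the same two rules.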
  • 12. Reads mapping to the major and non-major poly(A) clusters within a gene (Result)
    • Major PAS: for each gene, the PAS with the most reads across all accessions
    • p = proportion of the gene’s total reads mapping to the major PAS
    • q = 1 − p = proportion of the gene’s total reads mapping to non-major PASs
  • 13. The distribution of the proportion of reads mapping to non-major sense & antisense poly(A) clusters per gene (Result)
    Genes with a proportion of non-major cluster reads of 0.4 or greater (gray dashed lines) were considered to contain alternative poly(A) sites and were chosen for further polymorphism analysis.
    [Figure panels: 6581 genes with sense PASs; 1471 genes with antisense PASs]
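The per-gene quantities p and q and the 0.4 cutoff reduce to a few lines. In this hypothetical Python sketch, each gene's PAS read counts are assumed to be pooled across all accessions (so the major PAS is chosen globally), and the minimum read depth (20 for sense, 10 for antisense) is passed in by the caller.

```python
from typing import Dict

Q_THRESHOLD = 0.4  # from the slide: q >= 0.4 flags alternative poly(A) sites


def nonmajor_fraction(pas_counts: Dict[str, int]) -> float:
    """q = 1 - p, with p the share of the gene's reads at its major PAS.

    pas_counts maps each PAS id of one gene to its read count pooled
    across accessions; the major PAS is the one with the most reads.
    """
    total = sum(pas_counts.values())
    if total == 0:
        raise ValueError("gene has no poly(A) reads")
    p = max(pas_counts.values()) / total
    return 1.0 - p


def has_alternative_pas(pas_counts: Dict[str, int], min_reads: int) -> bool:
    """Apply the read-depth filter (20 sense / 10 antisense), then q >= 0.4."""
    if sum(pas_counts.values()) < min_reads:
        return False
    return nonmajor_fraction(pas_counts) >= Q_THRESHOLD


gene = {"PAS1": 25, "PAS2": 15, "PAS3": 10}
print(nonmajor_fraction(gene))                  # 0.5
print(has_alternative_pas(gene, min_reads=20))  # True
```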
  • 14. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions (Result)
    • For the ith and jth accessions, A_i and A_j, the absolute difference in the proportion of reads mapping to non-major poly(A) clusters is
      $$D_{ij} = \lvert q_{A_i} - q_{A_j} \rvert$$
    • Average pairwise difference over all n(n − 1)/2 pairs, where n = 19 (171 pairs):
      $$\bar{D} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} D_{ij}$$
    • Maximum pairwise difference:
      $$D_{\max} = \max_{i<j} D_{ij}$$
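Both statistics follow from the per-accession q values of one gene. A minimal Python sketch, assuming D̄ is the mean over all n(n − 1)/2 unordered accession pairs and that every accession has a q value for the gene; the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_differences(q_by_accession: Dict[str, float]) -> Tuple[float, float]:
    """Return (D_bar, D_max) for one gene.

    D_ij = |q_Ai - q_Aj| for every unordered pair of accessions;
    D_bar is their mean and D_max their maximum.
    """
    qs = list(q_by_accession.values())
    if len(qs) < 2:
        raise ValueError("need at least two accessions")
    diffs = [abs(a - b) for a, b in combinations(qs, 2)]
    return sum(diffs) / len(diffs), max(diffs)


d_bar, d_max = pairwise_differences({"A1": 0.25, "A2": 0.5, "A3": 0.75})
print(d_bar, d_max)
```

With 19 accessions, `combinations` yields the 171 pairs counted in the normalizing factor.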
  • 15. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions (Result)
    3074 genes with sense PASs
    [Figure panels: average pairwise difference (D̄); maximum pairwise difference (Dmax)]
  • 16. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions (Result)
    544 genes with antisense PASs
    [Figure panels: average pairwise difference (D̄); maximum pairwise difference (Dmax)]
  • 17. Gene position and antisense PAS (Result)
    Nearby gene: distance to its adjacent gene ≤2 kb.
    Each column gives the percentage of genes in that category that fall into groups A, B and C (each column sums to about 100%).

    Group | All genes | Sense poly(A) reads ≥20 | Non-major sense PAS proportion >0.4 | Antisense poly(A) reads ≥10 | Non-major antisense PAS proportion >0.4
    A     | 57.87%    | 62.92%                  | 62.94%                              | 96.91%                      | 97.79%
    B     | 20.48%    | 21.30%                  | 20.59%                              | 1.65%                       | 0.74%
    C     | 21.64%    | 15.77%                  | 16.46%                              | 1.43%                       | 1.47%
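The "nearby gene" criterion (≤2 kb to an adjacent gene) can be sketched as a scan over genes sorted by start. This Python version is an assumption-laden illustration: genes are (id, start, end) on one chromosome, 1-based inclusive, the distance is the intergenic gap between consecutive genes, and overlapping genes count as nearby. Comparing only consecutive genes is a simplification that can miss neighbors of long genes enclosing shorter ones.

```python
from typing import Dict, List, Tuple

NEARBY_BP = 2000  # "nearby gene": within 2 kb of an adjacent gene (slide criterion)


def nearby_flags(genes: List[Tuple[str, int, int]]) -> Dict[str, bool]:
    """Flag genes whose gap to the previous or next gene is <= NEARBY_BP.

    genes: (gene_id, start, end) on one chromosome, 1-based inclusive.
    The gap is the number of intergenic bases; overlaps give a negative gap.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    flags = {gene_id: False for gene_id, _, _ in ordered}
    for (id_a, _, end_a), (id_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b - end_a - 1 <= NEARBY_BP:
            flags[id_a] = flags[id_b] = True
    return flags


genes = [("g1", 1, 1000), ("g2", 2500, 4000), ("g3", 10000, 12000)]
print(nearby_flags(genes))
```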
  • 18. Conclusion
    • Among genes with enough sense or antisense poly(A) reads, about half use non-major PASs at least 40% of the time
    • Pairwise comparison across all accessions helped identify the best candidate genes for polymorphism in the usage or position of major PASs
  • 19. Outlook
    • Combine all tissues & all accessions, calculate & its variance
    • Associate with gene categories, poly(A) site location of genes, etc.
    • Examine the trans/cis poly(A) QTL with the MAGIC lines’ data
    • Check the relationship between the antisense poly(A) site & the orientation of nearby genes, and the relationship this may have with expression level
    • Check the data from related species, Capsella rubella & A. lyrata, to look at APA usage & its evolution between species
    • Ask whether A. thaliana is an outlier for any of the observed trends, and whether APA is derived in A. thaliana
  • 20. Acknowledgements
    Kansas State University: Dr. Chris Toomajian
    University of Utah: Dr. Richard Clark, Dr. Joshua Steffen, Edward J. Osborne, Robert Greenhalgh
    Wellcome Trust Centre for Human Genetics, University of Oxford: Dr. Richard Mott
    Memorial Sloan-Kettering Cancer Center: Dr. Gunnar Raetsch, Philipp Drewe, Andre Kahles
  • 22. Alternative polyadenylation (APA) (Background)
    [Figure 1 (Current Opinion in Cell Biology 2013, 25:222–232): Major categories of APA. This model refers to a hypothetical gene with three exons and two PASs. (a) When both PASs are located in the 3′ UTR, identical proteins are produced; because the 3′ UTR often contains elements regulating transcript stability, degradation, or localization, the quantity of protein produced may be altered depending upon PAS choice. (b) When one PAS is located in the coding region, a truncated protein is produced when the proximal PAS is chosen. Ex = exon, PAS = polyadenylation site; thick lines = UTR regions, thin lines = intronic regions.]
    [Figure: Protein isoforms, adapted from Tress et al. 2007]
  • 23. Outlook
    • Combine all tissues and all accessions, take each tissue as a subset, calculate and its variance
    • For each tissue, associate with gene categories according to GO analysis & gene families
    • Compare the distribution of from different tissues, and PAS usage patterns among tissues or accessions
    • Check Ka/Ks for genes with high/low in all tissues
    • Check the poly(A) site location for genes with high , e.g. 3′UTR, CDS, 5′UTR or intron
    • Compare the location across accessions
    • Look at the relationship of location with gene expression level
    • Examine the cis poly(A) QTL with the MAGIC lines’ RNA-seq data
    • Check the relationship between the antisense poly(A) site and the orientation of nearby genes for each tissue subset, and the relationship this may have with expression level
    • Check the data from Capsella rubella and A. lyrata to look at APA usage and its evolution between species
    • Ask whether A. thaliana is an outlier for any of the observed trends, and whether APA is derived in A. thaliana
  • 24. Tian, et al. 2013
    "… differentiated cells are reprogrammed to ES cell-like induced pluripotent stem (iPS) cells [41]. A notable exception, however, has been observed with spermatogonial germ cells, whose reprogramming to ES cells involves 3′ UTR lengthening [41]. Notably, this is in line with the fact that germ cells are more proliferative than ES cells. Similar trends of 3′ UTR length regulation have been reported for comparisons of ES cells versus neural stem/progenitor (NSP) cells or neurons [42]. Although these studies have all pointed to a connection between 3′ UTR length and cell proliferation, cardiac hypertrophy, in which myocytes grow in size rather than in number, has also been found to involve 3′ UTR shortening [43]. Thus, a general rule may be that APA regulation is correlated with cell growth."
    [Figure 2 (TiBS): Regulation of cis elements in 3′ untranslated regions (UTRs) by alternative cleavage and polyadenylation (APA). Two mRNA isoforms are shown; miRNA and RBP targeting affects translation, degradation and localization.]
    [Figure 1 (Current Opinion in Cell Biology 2013, 25:222–232): Major categories of APA in a hypothetical gene with three exons and two PASs.]
  • 25. Alternative polyadenylation (APA) (Background)
    Excerpt: "Upf1 binds to the 3′ UTR in a length-dependent manner, thus eliciting degradation of longer transcripts more rapidly [48]. The 3′ UTR contains elements that affect not only transcript degradation but also stability."
    [Figure 1 (Current Opinion in Cell Biology 2013, 25:222–232): Major categories of APA in a hypothetical gene with three exons and two PASs.]
    [Figure 2 (TiBS): Regulation of cis elements in 3′ UTRs by APA. Two mRNA isoforms are shown. The 3′ UTR region upstream of the proximal cleavage and polyadenylation site (pA) is called the constitutive UTR (cUTR), and the downstream region is called the alternative UTR (aUTR). RNA-binding protein (RBP) and miRNA targeting to the aUTR are shown. Impacts on mRNA localization, translation, and degradation are indicated. CDS, coding sequence.]
    [Figure: Protein isoforms, adapted from Tress et al. 2007]
    [Figure 3: Biological processes and diseases linked with global APA (neuron activity, proliferation, cancer, oculopharyngeal muscular dystrophy).]
    Sources shown: Mueller, et al. 2012; Tian, et al. 2013; Tress et al. 2007; Elkon, et al. 2013
  • 27. [Figure 3 (Elkon, et al. 2013, Nature Reviews Genetics): Biological processes that have been linked with broad APA modulation. A schematic showing the biological processes and diseases that alternative polyadenylation (APA) has been linked with (neuron activity, proliferation, cancer, oculopharyngeal muscular dystrophy). In addition, the tendency towards distal or proximal poly(A) site usage is shown.]
  • 28. PALMapper (Kahles, SKI, New York; HiTSeq, July 20, 2013)
    Advantages:
    • Alignments with variants, e.g. mismatches, indels
    • Accurate spliced alignments using computational splice site predictions
    • More accurate than TopHat (e.g. C. elegans: 47% & 81%, respectively)
    • Fast alignments (about 10 million reads/hour)
    • Soft-trimming of the poly(A) tail of each read
  • 29. How did I retrieve the poly(A) reads? (Method)
    Input: the mapped SAM file with soft-trimmed poly(A) tails. Two kinds of reads are collected: reads with a soft-trimmed end and reads with a long splicing length, in both cases with consecutive As at the end.
    Read configurations (RNA-seq read aligned 5′→3′ to the genome, AAAAAAAA tail at the 3′ end):
    • Soft-trimming: ≥8 consecutive As; quality score of each soft-trimmed base ≥40
    • Splicing (length ≥1500 bp) + soft-trimming: ≥8 consecutive As; quality score of each soft-trimmed base ≥40
    • Two splicing events (Splicing1, Splicing2), with Splicing2 length ≥1500 bp: ≥8 consecutive As; quality score of each soft-trimmed base ≥40
    • Splicing (length ≥1500 bp): ≥8 consecutive As; quality score of each soft-trimmed base ≥40
    A Perl program checks that these criteria hold.
  • 30. Defining poly(A) clusters (PASs) (Result)
    • 2,203,313 poly(A) reads are identified across accessions
    • The poly(A) site of each poly(A) read is calculated with a Perl script
    • 75,532 PASs are defined by clustering poly(A) reads that are in the same orientation and within 10 bp of each other across all accessions, with a total cluster interval spanning no more than 24 bp
    • 93.4% of clusters map to genic regions; the remaining 6.6% lie further away from genic regions
    • 6581 genes have at least 20 sense poly(A) reads across accessions
    • 1473 genes have at least 10 antisense poly(A) reads across accessions
    • Major sense PAS: for each gene, the sense PAS with the most reads across all accessions
    • p = proportion of the gene’s total reads mapping to the major PAS
    • q = 1 − p = proportion of the gene’s total reads mapping to non-major PASs
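The per-read poly(A) site can be derived from the SAM POS field and the genomic span of the CIGAR string. The slide's Perl script is not shown, so this Python sketch rests on assumptions: only the soft-trimmed case is handled, reads are assumed to be in the transcript's sense orientation, and the site is taken as the last templated base before the tail. For the spliced case, the end of the block before the last long `N` would be needed instead. Function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = {"M", "D", "N", "=", "X"}  # CIGAR ops that advance along the genome


def reference_span(cigar: str) -> int:
    """Number of genomic bases covered by an alignment (introns included)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def polya_site(pos: int, cigar: str, is_reverse: bool) -> int:
    """1-based genomic position of the last templated base before the tail.

    Soft-trimmed case only. Forward alignments: the A-tail is clipped on the
    right, so the site is the alignment end. Reverse alignments: SAM stores
    SEQ reverse-complemented, the tail appears as clipped Ts on the left,
    and the site is POS itself.
    """
    if is_reverse:
        return pos
    return pos + reference_span(cigar) - 1


print(polya_site(1000, "12M10S", is_reverse=False))  # 1011
print(polya_site(1000, "10S12M", is_reverse=True))   # 1000
```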
  • 31. The distribution of the proportion of reads mapping to non-major sense and antisense poly(A) clusters per gene (Result)
    Genes with a proportion of non-major cluster reads of 0.4 or greater (gray dashed lines) were considered to contain alternative poly(A) sites and were chosen for further polymorphism analysis.
  • 32. Pairwise difference in the proportion of reads mapping to non-major poly(A) clusters across accessions (Result)
    3074 genes with sense PASs
  • 33. Gene position and antisense PAS (Result)
    Nearby gene: distance to its adjacent gene ≤2 kb