Comparing the early ciRNA papers

Lab meeting presentation on the early ciRNA papers, with details on what they found and how they did it.

Mostly discussing:
WHITE SLIDES
Memczak, S. et al. (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature.

ORANGE SLIDES
Jeck, W. R. et al. (2012) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA.

All figures taken from respective papers.

Published in: Health & Medicine, Technology

Transcript

  • 1. Computational pipeline Global analysis Characterization of 50 Circular RNAs are a large class of animal RNAs with regulatory potency S Memczak∗ , M Jens∗ , A Elefsinioti∗ , F Torti∗ , J Krueger, A Rybak, L Maier, S D Mackowiak, L H Gregersen, M Munschauer, A Loewer, U Ziebold, M Landthaler, C Kocks, F le Noble, and N Rajewsky April 30, 2013 ciRNA April 30, 2013 1 / 47
  • 2. Basic principles used: 2x76 bp run data generated after ribosomal RNA depletion and random priming. Screened RNA sequencing reads for splice junctions formed by an acceptor splice site at the 5' end of an exon and a donor site at a downstream 3' end (head-to-tail).
  • 3. More details: (1) Filtered out reads that aligned (with bowtie2) contiguously and full-length to the genome. (2) From the unmapped reads (including normally spliced reads), extracted 20-mers from both ends and aligned them independently to find unique anchor positions within spliced exons. Anchors that aligned in reversed orientation (head-to-tail) indicated circRNA splicing; normal splicing was detected as consecutive anchors. (3) Extended the anchor alignments so that the complete read aligned and the breakpoints were flanked by GU/AG splice sites; ambiguous breakpoints were discarded. (4) A custom script read the resulting alignments, jointly evaluated consecutive anchor alignments belonging to the same original read, performed the anchor extensions, and collected splice-site statistics. After the run, it output all detected splice junctions (linear and circular) in a UCSC BED-like format, with extra columns for quality statistics, read counts, etc.
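A toy sketch of the anchor logic in step (2), assuming exact matching on the forward strand only and a genome held in a Python string. The real pipeline used bowtie2 alignments plus custom scripts; `unique_position` and `classify_read` are hypothetical names for illustration only.

```python
from typing import Optional

ANCHOR_LEN = 20  # 20-mer anchors, as on the slide


def unique_position(genome: str, kmer: str) -> Optional[int]:
    """Return the 0-based position of kmer if it occurs exactly once in genome."""
    first = genome.find(kmer)
    if first == -1 or genome.find(kmer, first + 1) != -1:
        return None  # absent or ambiguous (multi-mapping) anchor
    return first


def classify_read(genome: str, read: str, anchor_len: int = ANCHOR_LEN) -> Optional[str]:
    """Classify an unmapped read by the relative order of its two end anchors.

    Assumption: toy model with forward strand only, exact matches, no mismatches.
    """
    if len(read) < 2 * anchor_len:
        return None
    pos5 = unique_position(genome, read[:anchor_len])   # anchor from the read's 5' end
    pos3 = unique_position(genome, read[-anchor_len:])  # anchor from the read's 3' end
    if pos5 is None or pos3 is None:
        return None
    # A head-to-tail (backsplice) read starts in a downstream exon and continues
    # into an upstream one, so its 3' anchor maps upstream of its 5' anchor.
    return "head-to-tail" if pos3 < pos5 else "colinear"
```

A read built from the end of a downstream exon followed by the start of an upstream exon comes back as `"head-to-tail"`; the reverse order comes back as `"colinear"` (normal splicing).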
  • 4. Filtering: demands GU/AG flanking the splice sites (built in); unambiguous breakpoint detection; a maximum of two mismatches in the extension procedure; the breakpoint cannot reside more than 2 nucleotides inside an anchor; at least two independent reads support the junction (each distinct sequence counted only once per sample); unique anchor alignments, with a safety margin to the next-best alignment above 35 points for at least one anchor (roughly equivalent to more than two extra mismatches in high-quality bases); a genomic distance between the two splice sites of no more than 100 kb (this affects only a small percentage of the data). Because the ribosomal DNA cluster is part of the C. elegans genome assembly, and ribosomal pre-RNAs could give rise to circular RNAs by mechanisms independent of the spliceosome, 130 candidates that mapped to the rDNA cluster on chrI:15,060,286-15,071,020 were discarded.
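The filter list above can be written as a single predicate. This is a sketch under stated assumptions: the `Junction` record and its field names are hypothetical, splice-site dinucleotides are assumed already strand-normalized to "GT/AG" in DNA, and "above 35 points" is read as a strict inequality.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

RDNA_CLUSTER = ("chrI", 15_060_286, 15_071_020)  # C. elegans rDNA region from the slide


def independent_read_count(reads: Iterable[Tuple[str, str]]) -> int:
    """Count (sample, sequence) pairs, so each distinct sequence counts once per sample."""
    return len(set(reads))


@dataclass
class Junction:  # hypothetical record for one candidate junction
    chrom: str
    start: int
    end: int
    flanks: str                    # assumption: strand-normalized, e.g. "GT/AG"
    ambiguous: bool                # breakpoint could not be placed uniquely
    mismatches: int                # mismatches introduced during anchor extension
    breakpoint_inside_anchor: int  # nt by which the breakpoint reaches into an anchor
    independent_reads: int
    anchor_margins: Tuple[float, float]  # score gap to next-best hit, per anchor


def passes_filters(j: Junction, max_distance: int = 100_000) -> bool:
    if j.flanks != "GT/AG" or j.ambiguous:
        return False
    if j.mismatches > 2 or j.breakpoint_inside_anchor > 2:
        return False
    if j.independent_reads < 2:
        return False
    if max(j.anchor_margins) <= 35:  # at least one anchor must exceed 35 points
        return False
    lo_site, hi_site = sorted((j.start, j.end))
    if hi_site - lo_site > max_distance:
        return False
    chrom, lo, hi = RDNA_CLUSTER
    overlaps_rdna = j.chrom == chrom and lo_site <= hi and hi_site >= lo
    return not overlaps_rdna
```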
  • 5. Permutation testing: reversed either anchor; reversed the complete read; randomly reassigned anchors between reads; reverse-complemented the read (as a positive control). The reverse complement recovered the same output, as expected, while the various permutations led to very few candidate predictions, well below 0.2% of the output with unpermuted reads and in excellent agreement with the results from simulated reads. Sensitivity (>75%) and FDR (0.2%) were estimated using simulated reads and permutations of real sequencing data. The efficiency of ribominus protocols in extracting and sequencing circRNAs is limited, which reduces overall sensitivity.
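The four read permutations above are simple string operations. A minimal sketch, assuming "reversed" means plain sequence reversal (no complement) and that anchors are reassigned by shuffling the 3' anchors; the FDR helper is a rough ratio, not necessarily how the paper computed it.

```python
import random
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_read(seq: str) -> str:
    return seq[::-1]


def reverse_anchor(seq: str, anchor_len: int = 20, end: str = "5p") -> str:
    """Reverse only the 5' or 3' anchor, leaving the rest of the read unchanged."""
    if end == "5p":
        return seq[:anchor_len][::-1] + seq[anchor_len:]
    return seq[:-anchor_len] + seq[-anchor_len:][::-1]


def reassign_anchors(reads: List[str], anchor_len: int = 20, seed: int = 0) -> List[str]:
    """Shuffle 3' anchors between reads (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    tails = [r[-anchor_len:] for r in reads]
    rng.shuffle(tails)
    return [r[:-anchor_len] + t for r, t in zip(reads, tails)]


def permutation_fdr(n_calls_permuted: int, n_calls_real: int) -> float:
    """Rough FDR estimate (assumption): permuted-read calls as a fraction of real calls."""
    return n_calls_permuted / n_calls_real if n_calls_real else float("nan")
```

Running the detector on the reverse-complemented reads should reproduce the original calls, while the other three permutations should yield almost none.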
  • 6. Data generated: generated ribominus data for HEK293 cells and combined it with human leukocyte data; detected 1,950 circRNAs with support from at least two independent junction-spanning reads.
  • 7. Jeck 2012: cANRIL, or how they got interested. Our group discovered a circular RNA species, circular ANRIL, whose expression is associated with that of products of the human INK4a/ARF locus and is correlated with the risk of human atherosclerosis. Production of cANRIL in humans is associated with common SNPs predicted to affect cANRIL splicing, suggesting the possibility that cANRIL production influences Polycomb group (PcG)-mediated repression of the INK4a/ARF locus and thereby atherosclerosis risk.
  • 8. CircleSeq: the RNase R approach (Jeck 2012)
  • 9. CircleSeq: the RNase R approach (Jeck 2012). The method was optimized to give >10-fold enrichment of cANRIL in cDNA prepared from RNase R-treated vs. untreated samples. Used approx. 300 million 100-bp reads per sample, aligned to the human genome with the de novo splice-mapping algorithm MapSplice. Compiled a list of all fusion splice junctions where the splice donor and acceptor occur within 2 Mb of each other but in non-colinear order; these junctions are termed backsplices. New metric, SRPBM (spliced reads per billion mapping): counts of reads mapping across an identified backsplice in untreated samples, normalized by read length and by the number of mapped reads, to permit quantitative comparisons between backsplices.
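The backsplice definition and the SRPBM normalization above can be sketched as follows. The slide does not give the exact SRPBM formula; the version here (reads x 10^9 / (mapped reads x read length)) is one plausible reading and is labeled as an assumption, so check the paper's methods before reusing it.

```python
def is_backsplice(strand: str, donor: int, acceptor: int, max_span: int = 2_000_000) -> bool:
    """True if a donor->acceptor junction is non-colinear and spans <= max_span (2 Mb).

    On the + strand a forward splice joins a donor to a downstream acceptor
    (donor < acceptor); a backsplice joins it to an upstream acceptor.
    The - strand is mirrored.
    """
    if abs(donor - acceptor) > max_span:
        return False
    return donor > acceptor if strand == "+" else donor < acceptor


def srpbm(backsplice_reads: int, mapped_reads: int, read_length: int) -> float:
    """Spliced reads per billion mapping.

    ASSUMED formula: backsplice reads per billion mapped reads, divided by read length.
    """
    return backsplice_reads * 1e9 / (mapped_reads * read_length)
```

For example, 30 backsplice reads in a library of 300 million mapped 100-bp reads give an SRPBM of 1.0 under this assumed formula.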
  • 10. Using RNase R works: CircleSeq enriches for backsplice junctions. Two-dimensional histograms showing normalized backsplice read counts (SRPBM) or normalized exon coverage (RPKM) between two samples or replicates. (A) Coverage of backsplice reads in RNase R-treated replicates over all distinct backsplice species (R² = 0.579). (B) Coverage of exons in mock-treated replicates (R² = 0.91). (C) Average backsplice coverage in RNase R-treated vs. mock-treated RNA-seq, showing enrichment of most backsplice species by RNase R. (D) Mean normalized exon coverage of annotated exon sequences in RNase R-treated vs. mock-treated RNA-seq, showing depletion of most species by RNase R.
  • 11. And it generates some interesting results... Identified >100,000 unique backsplice events throughout the genome. Of these, 25,166 were present in both RNase R-treated biological replicates and were enriched by RNase R treatment compared with mock treatment. 31% of backsplice species observed in untreated controls were not enriched by RNase R (Fig. 2C); these likely represent mapping artifacts or nonsequential exons harbored in linear products, resulting from either RNA trans-splicing or cleavage of ecircRNAs. Positive controls: cANRIL, cETS-1. Other known circular species (e.g., SRY and DCC) could not be detected, owing to low expression of these genes in Hs68 cells. The 25,166 replicated backsplice events detected by CircleSeq included the large majority (1025 of 1319) of putative circles identified by a previously described bioinformatic approach (Salzman 2012), even though the cell types differed (T cells vs. fibroblasts). Of the 25,166 unique backsplicing events that were reproducibly enriched by RNase R digestion in both biological replicates, most were found only in RNase R-treated samples and were not observed without exonuclease digestion. Many of these events represent rare ecircRNAs arising from pervasive background levels of RNA circularization, which may result from occasional errors in splicing (Hsu and Hertel 2009).
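The "present in both replicates and enriched" criterion can be sketched as a set operation over per-junction SRPBM values. Assumption: "enriched" is modeled here as mean RNase R SRPBM strictly above mock SRPBM, with junctions absent from the mock sample counted as 0; the paper's actual enrichment test may differ.

```python
from typing import Dict, Hashable, Set


def reproducible_and_enriched(
    rnase_rep1: Dict[Hashable, float],
    rnase_rep2: Dict[Hashable, float],
    mock: Dict[Hashable, float],
) -> Set[Hashable]:
    """Backsplices seen in both RNase R replicates whose mean SRPBM exceeds mock.

    Assumption: a junction missing from the mock sample has SRPBM 0 there,
    so RNase R-only events count as enriched.
    """
    shared = rnase_rep1.keys() & rnase_rep2.keys()
    return {
        j for j in shared
        if (rnase_rep1[j] + rnase_rep2[j]) / 2 > mock.get(j, 0.0)
    }
```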
  • 12. Filtering on confidence: a LOW-stringency set requiring a single backsplice read in the control data, and a HIGH-stringency set requiring coverage on a par with splices in a moderately expressed gene.
  • 13. Data generated (Memczak 2013): generated ribominus data for HEK293 cells and combined it with human leukocyte data; detected 1,950 circRNAs with support from at least two independent junction-spanning reads.
  • 14. Expression of genes predicted to give rise to circRNAs was slightly shifted towards higher expression values, "indicating that circRNAs are not just rare mistakes of the spliceosome". Histograms of gene expression levels obtained from polyA+ RNA sequencing in HEK293 cells; reads per kilobase of exon per million mapped reads (RPKM) reflect mRNA abundance. Genes predicted to give rise to circRNAs (red circles) are not specifically enriched for high expression (solid line: all genes). circRNAs from lowly expressed genes are detected less frequently, comparable to the loss of sensitivity observed for linear splicing (black dashed line: genes with >75% of annotated splice sites recovered; gray dashed line: >50%; light gray: >10%).
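RPKM, the expression measure used in these histograms, normalizes exon read counts by exon length in kilobases and library size in millions of mapped reads. A minimal sketch of that standard definition; the function name is illustrative.

```python
def rpkm(exon_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return exon_reads / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
```

For example, 1,000 reads on 2 kb of exonic sequence in a library of 10 million mapped reads gives an RPKM of 50.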
  • 15. Mouse and nematode: identified 1,903 circRNAs in mouse (brains, fetal head, differentiation-induced embryonic stem cells); 81 of these mapped to human circRNAs. Mapped mouse circRNAs were compared with independently identified human circRNAs using liftOver, yielding 229 circRNAs with precisely orthologous splice sites between human and mouse. Of these, 223 were composed exclusively of coding exons and were used for the conservation analysis (Fig. 1f). Intersecting the reported sets of circRNAs supported by two independent reads in each species gave 81 conserved circRNAs (supported by at least 4 reads in total). Using sequencing data from various C. elegans developmental stages (Stoeckius, M., manuscript in prep.), detected 724 circRNAs with at least two independent reads. Numerous circRNAs seem to be specifically expressed in a cell type or developmental stage (Fig. 1b,c, S1e): hsa-circRNA 2149 is supported by 13 unique head-to-tail spanning reads in CD19+ leukocytes but is not detected in CD34+ leukocytes, neutrophils or HEK293 cells, and a number of nematode circRNAs seem to be expressed in oocytes but absent in 1- or 2-cell embryos.
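Once mouse circRNA coordinates have been converted to human coordinates (with the UCSC liftOver tool, run outside this sketch), "precisely orthologous splice sites" reduces to an exact key match. A minimal sketch, assuming each circRNA is keyed by (chrom, start, end, strand) and mapped to its supporting read count; the 4-read total is the threshold quoted on the slide.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int, int, str]  # (chrom, start, end, strand) in human coordinates


def conserved_circrnas(
    human: Dict[Site, int],
    mouse_lifted: Dict[Site, int],
    min_total_reads: int = 4,
) -> Set[Site]:
    """circRNAs whose lifted mouse splice sites match human ones exactly.

    Assumption: mouse coordinates are already lifted over to the human assembly;
    dict values are supporting read counts per species.
    """
    shared = human.keys() & mouse_lifted.keys()
    return {s for s in shared if human[s] + mouse_lifted[s] >= min_total_reads}
```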
  • 16. Where do these circRNAs come from? Intersection of circRNAs with known transcripts. The computational screen identifies only the splice sites that lead to circularization, not the internal exon/intron structure of circular RNAs, so that structure was inferred as far as possible from annotated transcripts. The conservative assumption was that as little as possible should be spliced out. Coincidence of circRNA splice sites with exon boundaries inside a transcript was taken as an indicator of relevant agreement; internal introns appear to be spliced out.
  • 17. Where do these circRNAs come from? Annotated human circRNAs using the RefSeq database and a catalogue of non-coding RNAs: 85% of human circRNAs align sense to known genes, 10% of all circRNAs align antisense to known transcripts, and smaller fractions align to UTRs, introns and unannotated regions of the genome.
  • 18. Where do these circRNAs come from? Sorted all overlapping transcripts hierarchically by (1) splice-site coincidence (2, 1 or 0), (2) total amount of exonic sequence between the splice sites, and (3) total amount of coding sequence; the last criterion was used only to break ties and helped the annotation process. If one or both splice sites fell into an exon of the best-matching transcript, the corresponding exon boundary was trimmed; if a site fell into an intron or beyond the transcript bounds, the closest exon was extended to match the circRNA boundaries. circRNA start/end coordinates were never altered. If no annotated exons overlapped the circRNA, a single-exon circRNA was assumed. The resulting annotation is based on the best-matching transcript and may in some cases not represent the ideal choice; however, changing the annotation rules did not substantially change the numbers in Fig. 1d.
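The three-level ranking above maps directly onto a lexicographic sort key. A sketch assuming half-open [start, end) intervals, a circRNA span whose start/end are compared against exon starts/ends, and a hypothetical `Transcript` record; the trimming and extension rules that follow the ranking are not implemented here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


@dataclass
class Transcript:  # hypothetical record
    name: str
    exons: List[Interval]
    cds: Interval  # coding span; use (0, 0) for non-coding transcripts


def _overlap(a: Interval, b: Interval) -> int:
    # Returns 0 for disjoint or empty (inverted) intervals.
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def match_key(tx: Transcript, circ: Interval) -> Tuple[int, int, int]:
    """(splice-site coincidence, exonic bp, coding bp) inside the circRNA span."""
    starts = {s for s, _ in tx.exons}
    ends = {e for _, e in tx.exons}
    coincidence = int(circ[0] in starts) + int(circ[1] in ends)
    exonic = sum(_overlap(ex, circ) for ex in tx.exons)
    coding = sum(
        _overlap((max(s, tx.cds[0]), min(e, tx.cds[1])), circ) for s, e in tx.exons
    )
    return coincidence, exonic, coding


def best_match(transcripts: List[Transcript], circ: Interval) -> Transcript:
    # Tuples compare lexicographically, which gives the hierarchical ordering;
    # coding sequence only matters when the first two levels tie.
    return max(transcripts, key=lambda tx: match_key(tx, circ))
```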
  • 19. How many exons form part of a circRNA? Their splice sites typically span one to five exons and overlap coding exons (84%), but in only 65% of these cases are both splice sites that participate in circularization known splice sites. Jeck: introns are spliced out of most circular forms. Jeck: observed many single-exon ecircRNAs, i.e., where the donor site splices to the acceptor of the same exon (e.g., KIAA0182).
  • 20. Jeck: ratio of forward to back-spliced products. Compared the abundance of backsplice-spanning reads with reads spanning conventionally spliced junctions. The relative rate of backsplicing over forward splicing at these sites varied enormously, from <0.1% to >3200%, with no forward-splicing products observed in some cases. Using the LOW-stringency list, 14.4% of genes expressed in human fibroblasts produced circular RNA species, suggesting that at least one in eight human genes produces abundant circular as well as linear transcripts.
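The backsplicing-over-forward-splicing rate is a per-site read ratio expressed in percent. A minimal sketch, assuming both counts come from the same library at the same splice site; a site with no forward-splicing reads is reported as infinite.

```python
import math


def backsplice_percent(backsplice_reads: int, forward_reads: int) -> float:
    """Backsplicing relative to forward splicing at the same site, in percent."""
    if forward_reads == 0:
        return math.inf  # no forward-splicing product observed
    return 100.0 * backsplice_reads / forward_reads
```

On this scale, 32 backsplice reads against 1 forward read is 3200%, and 1 against 2,000 is 0.05%. Likewise, 14.4% of expressed genes exceeds 1/8 (12.5%), which is why the slide can say "at least one in eight".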
  • 21. Jeck validation: circRNA-generating transcripts are non-polyadenylated. Oligo-dT priming for reverse transcription significantly reduced the levels of backspliced products relative to poly(A)-containing transcripts. To exclude the possibility that these transcripts were trans-spliced products, used a virtual Northern approach, which determines transcript size by fractionation on a denaturing agarose gel followed by qPCR. Backsplice-containing transcripts appeared in faster-migrating fractions than the associated full-length linear transcripts, as would be predicted for circular RNA species composed of only a subset of exons. Trans-spliced products, in contrast, would be expected to be longer than full length, as they contain repeated exons.
  • 22. Examples: the AFF1 intron is spliced out (Supplementary Fig. 2e). Sequence conservation: placental mammals phyloP score; scale bar, 200 nucleotides.
  • 23. Jeck examples
  • 24. Conservation of intergenic and intronic circRNAs: intergenic and a few intronic circRNAs display a mild but significant enrichment of conserved nucleotides.
  • 25. Conservation of intergenic and intronic circRNAs, approach: downloaded genome-wide human (hg19) phyloP conservation score tracks (ref. 58) derived from genome alignments of placental mammals from UCSC. Read out the conservation scores along the complete circRNA and searched for blocks at least 6 nucleotides long that exceeded a conservation score of 0.3 for intergenic and 0.5 for intronic circRNAs. The different cutoffs empirically adjust for the different background levels of conservation and were also used on the respective controls. For each circRNA, computed the cumulative length of all such blocks and normalized it by the genomic length of the circRNA. Artefacts of constant positive conservation scores in the phyloP profile, apparently caused by missing alignment data, were removed with an entropy filter (this did not qualitatively affect the results). circRNAs annotated as intronic by the best-match procedure explained above that had any overlap with exons in alternative transcripts on either strand (5 cases) were removed from the analysis. The phastCons score takes neighboring bases into account, estimating the probability that each nucleotide belongs to a conserved element. The phyloP score is a separate measurement of conservation at each base that ignores neighboring bases. It can measure acceleration (faster evolution than expected under neutral drift) as well as conservation (slower than expected evolution), which makes phyloP useful for evaluating signatures of selection at particular nucleotides (e.g., third codon positions, or first positions of miRNA target sites). In phyloP plots, sites predicted to be conserved are assigned positive scores, while sites predicted to be fast-evolving are assigned negative scores. The absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution, i.e., |phyloP| = -log(p-value).
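The block-based conservation measure can be sketched in a few lines. Assumptions: a "block" is a run of consecutive positions each scoring strictly above the cutoff (0.3 intergenic, 0.5 intronic per the slide), and the entropy filter for constant-score artefacts is not implemented.

```python
from typing import Sequence


def conserved_fraction(scores: Sequence[float], cutoff: float, min_block: int = 6) -> float:
    """Cumulative length of runs >= min_block with score > cutoff, over total length.

    Assumption: every position in a block must individually exceed the cutoff.
    """
    total = 0
    run = 0
    for s in scores:
        if s > cutoff:
            run += 1
        else:
            if run >= min_block:
                total += run
            run = 0
    if run >= min_block:  # close a block that reaches the end of the circRNA
        total += run
    return total / len(scores) if scores else 0.0
```

For a 20-nt profile with one 6-nt block and one 5-nt run above the cutoff, only the 6-nt block counts, giving a fraction of 0.3.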
  • 26. Conservation of circRNAs from coding loci, approach: to analyse circRNAs composed of coding sequence, and thus of high overall conservation, selected 223 human circRNAs that have circular orthologues in mouse and are entirely composed of coding sequence. Control (linear) exons were randomly selected to match the level of conservation observed in first and second codon positions (Methods, Fig. 1f inset; see Supplementary Fig. 1k for conservation of the remaining coding sequence (CDS)).
  • 27. Conservation of circRNAs from coding loci, results: circRNAs with conserved circularization were significantly more conserved at the third codon position than controls, indicating evolutionary constraints at the nucleotide level in addition to selection at the protein level (Fig. 1f and Supplementary Fig. 1j, k). The coding-sequence phyloP score distributions of first and second codon positions match between circRNAs and controls; in contrast, the third codon position is significantly more conserved in circRNAs (P < 3e-10, n = 223, Mann-Whitney U test; see also main Fig. 1f). (k) The conservation score distributions in the remaining parts of the CDS (outside the circRNA or control) do not differ significantly for codon positions two and three. For the first codon position the controls are actually more conserved (P < 2e-3, n = 223, Mann-Whitney U test), so the comparison is conservative.
  • 28. How they did coding conservation, the gritty details: used the best-match strategy outlined above to construct an estimated exon chain for the circRNAs that overlapped exclusively coding sequence. Using this chain, the corresponding blocks of the conservation score profile were spliced out in silico; the frame was tracked and the conservation scores were sorted into separate bins for each codon position. Conservation scores in the remaining pieces of coding sequence (outside the circRNA) were also recorded as a control. However, the level of conservation was systematically different between internal parts of the coding sequence and the amino- or carboxy-terminal parts (not shown), so randomly generated chains of internal exons, mimicking the exon-number distribution of real circRNAs, were used as a control instead. When analysing the circRNAs conserved between human and mouse, it also became apparent that the higher overall level of conservation had to be adjusted for. Because high expression generally correlates with conservation, an expression cutoff was enforced on the transcripts used to generate random controls. This resulted in a good to conservative match with the actual circRNAs.
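Splicing the score profile along an exon chain and binning by codon position can be sketched as follows. Assumptions: `scores` is indexable by genomic position on the coding strand, exons are half-open intervals in transcript order, and `start_phase` (0, 1 or 2) gives the codon position of the first base minus one. The per-position bins could then be compared with a Mann-Whitney U test (e.g. `scipy.stats.mannwhitneyu`).

```python
from typing import Dict, List, Sequence, Tuple


def bin_by_codon_position(
    scores: Sequence[float],
    exon_chain: List[Tuple[int, int]],
    start_phase: int = 0,
) -> Dict[int, List[float]]:
    """Splice out exon blocks in silico and bin scores by codon position 1, 2, 3."""
    bins: Dict[int, List[float]] = {1: [], 2: [], 3: []}
    i = 0  # position within the spliced sequence, which keeps the frame across exons
    for start, end in exon_chain:
        for pos in range(start, end):
            bins[(start_phase + i) % 3 + 1].append(scores[pos])
            i += 1
    return bins
```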
  • 29. Conservation (Jeck): HIPK2 and HIPK3. The paralogous kinases HIPK2 and HIPK3 both produce abundant circRNA, and so do their mouse orthologues. The paralogues are sufficiently diverged to allow unique mapping but retain a similar genomic structure: a large second exon that contains the start codon, flanked by large introns on either side. Increased coverage of exon 2 was not seen in poly(A)+ libraries, consistent with the predominant exon 2 species being circular (ENCODE data, not shown). Based on RPKM and qRT-PCR, the circular exon 2 transcript of HIPK3 was apparently 5-fold more abundant than the linear form. Consistent with these transcripts being ecircRNAs originating from the murine Hipk2/3 genes, the amplified fragments were of the expected size and sequence and were enriched by RNase R digestion.
  • 30. Jeck, more global conservation: comparison with murine testis. As for human cells, a high number (646 of 1477) of circles found through a prior bioinformatic analysis (Salzman et al. 2012) of murine brain were identified by CircleSeq of murine testis. Of 2121 human circles from the MEDIUM-stringency list (human fibroblasts) that could be readily mapped to the murine genome, 457 mapped to genes that produced a murine circular RNA (in testis). Identified 69 murine circular species (including Hipk3) with exactly homologous start and stop points of RNA circularization.
  • 31. Experimentally tested circRNA predictions in HEK293 cells: head-to-tail splicing assayed by RT-qPCR with divergent primers and Sanger sequencing. 19 of 23 predicted head-to-tail junctions could be validated. 5 of 7 candidates predicted exclusively in leukocytes could not be detected in HEK293 cells, validating cell-type-specific expression.
  • 32. circRNAs are (mostly) insensitive to RNase R. Jeck 2012: backsplices for CDR1-AS were abundant in the control samples (mean SRPBM of 198), but both the backsplice reads and the nonsplicing reads within the gene were depleted by exonuclease digestion (mean SRPBM of 16). These observations are most consistent with linear trans-splicing products rather than circular RNAs, or with cleavage of this circular RNA, as has been reported. Head-to-tail splicing could also be produced by trans-splicing or genomic rearrangements. To rule out these possibilities and PCR artefacts, the insensitivity of human circRNA candidates to digestion with RNase R (which degrades linear RNA molecules) was validated by northern blotting with probes spanning the head-to-tail junctions. RNase R resistance was quantified by qPCR for 21 candidates with confirmed head-to-tail splicing; all were at least 10-fold more resistant than GAPDH.
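One common way to turn qPCR Ct values into a "fold more resistant than GAPDH" number is a delta-delta-Ct calculation. The slide does not say how resistance was computed, so this is a hypothetical sketch assuming equal RNA input in mock and RNase R samples and a uniform per-cycle amplification factor.

```python
def fraction_remaining(ct_mock: float, ct_rnase: float, efficiency: float = 2.0) -> float:
    """Fraction of a target left after RNase R, from qPCR Ct values.

    Assumptions: equal input RNA in both samples; `efficiency` is the per-cycle
    amplification factor (2.0 = perfect doubling).
    """
    return efficiency ** -(ct_rnase - ct_mock)


def resistance_vs_reference(target_cts, reference_cts, efficiency: float = 2.0) -> float:
    """How many times more of the target survives RNase R than the reference (e.g. GAPDH).

    Each argument is a (ct_mock, ct_rnase) pair.
    """
    return fraction_remaining(*target_cts, efficiency) / fraction_remaining(*reference_cts, efficiency)
```

A circRNA whose Ct rises by 1 cycle after RNase R, against a GAPDH Ct rise of 5 cycles, comes out 16-fold more resistant under these assumptions.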
  • 33. circRNAs turn over more slowly than linear RNA: 24 h after blocking transcription, circRNAs were highly stable, exceeding the stability of the housekeeping gene GAPDH.
  • 34. circRNAs turn over more slowly than linear RNA
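To put "highly stable after 24 h" on a common scale, a single time point can be converted to an apparent half-life. This is a hypothetical sketch assuming first-order (exponential) decay after transcription is blocked; the slides report only the qualitative comparison with GAPDH, not half-lives.

```python
import math


def half_life(hours: float, fraction_left: float) -> float:
    """Apparent half-life from one time point, assuming first-order decay.

    Assumption: N(t) = N0 * exp(-k t), so t_half = t * ln 2 / -ln(N(t)/N0).
    """
    if fraction_left <= 0.0:
        raise ValueError("fraction_left must be positive")
    if fraction_left >= 1.0:
        return math.inf  # no measurable decay over the interval
    return hours * math.log(2) / -math.log(fraction_left)
```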
  • 35. Validation in other organisms: mouse (3/3 tested mouse circRNAs with human orthologues, in mouse brain) and C. elegans.
  • 36. circRNAs are not translated. Engineered circular RNAs have previously been shown to have coding potential (Chen and Sarnow 1995). Linear products were significantly enriched in the ribosome-bound fraction for the genes assayed (HIPK3, KIAA0182 and MYO9B). Circular products, in contrast, were abundant in the unbound fractions but not detected in the bound fractions, indicating that these AUG-containing ecircRNAs are not translated.
  • 37. circRNAs can be targeted by RNAi: transfected Hs68 cells with siRNAs targeting HIPK3 and ZFY, which produce both linear and circularized transcripts, using 3 siRNAs per gene: one targeting sequence present only in the linear transcript, one targeting the backsplice sequence, and one targeting sequence in a circularized exon shared by both linear and circular species. It was even possible to design an siRNA to the backsplice junction of ZFY that specifically targeted the circular, but not the linear, transcript.
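A circle-specific siRNA has to cover the backsplice junction, i.e. the 3' end of the circularized exon(s) joined to their 5' start. A sketch that enumerates candidate target windows across that junction; the 19-nt window and 4-nt minimum overhang on each side are hypothetical parameters, not values from the paper, and real siRNA design needs further rules (GC content, off-target checks).

```python
from typing import List


def junction_spanning_targets(
    exon_3p_end: str, exon_5p_start: str, length: int = 19, min_overhang: int = 4
) -> List[str]:
    """Candidate target windows that cross a circRNA backsplice junction.

    Assumption: `length` and `min_overhang` are illustrative defaults.
    """
    junction = exon_3p_end + exon_5p_start  # 3' end of the circle joined to its 5' start
    split = len(exon_3p_end)
    targets = []
    for i in range(len(junction) - length + 1):
        left = split - i         # nt of the window on the 3'-end side of the junction
        right = length - left    # nt on the 5'-start side
        if left >= min_overhang and right >= min_overhang:
            targets.append(junction[i:i + length])
    return targets
```

Because every returned window contains sequence from both sides of the junction, none of them occurs contiguously in the colinear (linear) transcript, which is the property a circle-specific siRNA relies on.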
  • 38. circRNAs are probably miRNA sponges: screened for occurrences of conserved miRNA family seed matches (Methods). When repetitions of conserved matches to the same miRNA family were counted, circRNAs were significantly enriched compared with coding sequences (P < 2.96x10^-22, Mann-Whitney U test, n = 3873) or 3' UTR sequences (P < 2.76x10^-21, Mann-Whitney U test, n = 3182).
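Counting seed matches in a circRNA differs from a linear sequence in one detail: a site may span the backsplice junction. A minimal sketch that searches for the reverse complement of miRNA nucleotides 2-8 (a 7-nt seed site) with wrap-around. It ignores site conservation and miRNA family grouping, which the screen above used.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str) -> str:
    """DNA target site complementary to miRNA nucleotides 2-8 (7-nt seed)."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return seed.translate(_COMPLEMENT)[::-1]


def count_sites_circular(circ_seq: str, site: str) -> int:
    """Count (possibly overlapping) site matches in a circular sequence,
    including matches that span the backsplice junction."""
    seq = circ_seq.upper().replace("U", "T")
    wrapped = seq + seq[: len(site) - 1]  # let matches run across the junction
    return sum(1 for i in range(len(seq)) if wrapped.startswith(site, i))
```

Appending only `len(site) - 1` bases means each circular start position is tested once, so junction-spanning matches are counted without double counting.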
  • 39. circRNAs are cytoplasmic: ecircRNAs either undergo nuclear export or are released into the cytoplasm during mitosis, where they enjoy extraordinary stability, likely as a result of resistance to debranching enzymes and RNA exonucleases.
  • 40. CDR1as is localized in the cytoplasm: CDR1as RNA is cytoplasmic and dispersed (white spots; single-molecule RNA FISH; maximum-intensity merges of Z-stacks). siSCR, positive; siRNA1, negative control. Blue, nuclei (DAPI); scale bar, 5 µm.
  • 41. Bioinformatic analysis: DAVID revealed an enrichment of protein kinases and related proteins among the genes producing ecircRNAs, but no specific kinase subfamily was particularly associated with ecircRNA production. Sought cis-sequence elements proximal to backsplice events: sequences in the 200 bp preceding or following backsplice sites were analyzed for motifs enriched relative to similar windows flanking noncircularized, expressed exons. The highest information-bearing motif was shared by both the upstream and downstream introns and was identified as the canonical ALU repeat. The intronic flanks adjacent to circularized exons were approximately twofold more likely to contain an ALU repeat than those of noncircularized exons, and circularized exons were sixfold more likely to contain complementary ALUs than control, noncircularized exons. Pairs of ALU elements taken from introns flanking circularized exons were significantly more likely to be complementary (in inverted orientation) than noncomplementary. This was equally true for single-exon and multi-exon circRNAs.
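The ALU comparisons above boil down to counting inverted (opposite-orientation) repeat pairs across the two flanking introns and comparing feature frequencies between circularized and control exons. A sketch assuming ALU annotations (for example from a repeat-masking track) have already been extracted as strand symbols for each flank; the function names are hypothetical.

```python
from typing import Sequence


def inverted_alu_pairs(upstream_strands: Sequence[str], downstream_strands: Sequence[str]) -> int:
    """Count ALU pairs across the two flanking introns in opposite (inverted)
    orientation, i.e. pairs whose transcribed copies could base-pair."""
    return sum(1 for u in upstream_strands for d in downstream_strands if u != d)


def fold_enrichment(hits_test: int, n_test: int, hits_ctrl: int, n_ctrl: int) -> float:
    """Fraction of test exons with a feature divided by the fraction of control exons."""
    return (hits_test / n_test) / (hits_ctrl / n_ctrl)
```

If half of 100 circularized exons but only 25 of 200 control exons had a feature, the enrichment would be 4-fold by this measure.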
  • 42. Bioinformatic analysis 2: circRNA exons are large. The upstream and downstream introns flanking circularized exons tended to be large, on average about threefold longer or more than introns flanking control exons. Circularized exons were also larger than expected: when the analysis was restricted to single-exon ecircRNAs, circularized exons were approximately threefold longer than expressed exons overall, at an average length of 690 nt.
  • 43. How are circRNAs formed?
  • 44. Overlap of identified circRNAs with published circular RNAs: exons of DCC and ETS1, a non-coding RNA from the human INK4/ARF locus, and the CDR1as locus. circRNAs from exons of the genes CAMSAP1, FBXW4, MAN1A2, REXO4, RNF220 and ZKSCAN1 have recently been experimentally validated (ref. 10). For the four genes from that study for which ribominus data were available from the tissue in which these circRNAs were predicted (leukocytes), all the validated circRNAs were recovered (ZKSCAN1, CAMSAP1, FBXW4, MAN1A2).
  • 45. Known before: DCC: scrambled transcripts were estimated to comprise less than one one-thousandth of transcripts. Nigro, J. M., Cho, K. R., Fearon, E. R., Kern, S. E., Ruppert, J. M., Oliner, J. D., et al. (1991). Scrambled exons. Cell, 64(3), 607-613. MLL: Caldas, C., So, C. W., MacGregor, A., Ford, A. M., McDonald, B., Chan, L. C., Wiedemann, L. M. (1998). Exon scrambling of MLL transcripts occur commonly and mimic partial genomic duplication of the gene. Gene, 208(2), 167-176. ETS-1: Cocquerelle, C., Mascrez, B., Hetuin, D., Bailleul, B. (1993). Mis-splicing yields circular RNA molecules. FASEB Journal, 7(1), 155-160.
  • 46. Best studied: mouse SRY. SRY consists of a single exon. During development, the RNA exists as a linear transcript that is translated into protein. In the adult testis, the RNA exists primarily as a circular product that is predominantly localized to the cytoplasm and is apparently not translated. Inverted repeats in the genomic sequence flanking the SRY exon direct transcript circularization.