Chung.chau.hon.pasteur.cshl.2013.no.movie.pptx

  1. 1. my desk is my bed
  2. 2. my desk is my bed
  3. 3. my desk is my bed
  4. 4. Human innovation... in a crowded city, where space is limited.
  5. 5. Nature's innovation... in a compact genome, where space is also limited.
  6. 6. A Tale of Two Strands sharing of transcriptional motifs between the two strands of a compact genome Paris, France
  7. 7. Entamoeba histolytica a protozoan parasite
  8. 8. It has a compact genome genome size coding genes A+T content 20Mb 9000 75% coding region 60% retrotransposon 25%
  9. 9. It has a compact genome genome size coding genes A+T content 20Mb 9000 75% intron frequency 0.2 per gene intergenic distance 150bp 5’UTR and 3’UTR 15bp
  10. 10. codon restricted evolution on both strands in a compact genome real estate is limited
  11. 11. codon restricted evolution on both strands Natural Antisense Transcripts ? ? [ NAT ] end motifs start motifs
  12. 12. 1 2 4 3
  13. 13. 1 RNA-Seq 4 NAT Locations 2 3
  14. 14. 2 1 RNA-Seq 4 NAT Locations Motif Identification 3
  15. 15. 2 1 RNA-Seq 4 NAT Locations Motif Identification Genomic Determinants Simulations 3
  16. 16. 2 1 RNA-Seq NAT Locations Motif Identification Implications Genomic Determinants Simulations 4 3
  17. 17. 1 RNA-Seq strand-specific RNA-Seq transcription start site mapping poly(A) site mapping
  18. 18. 1 RNA-Seq strand-specific RNA-Seq transcription start site mapping poly(A) site mapping
  19. 19. Pervasive NAT transcription ~61% of coding genes are covered by NAT Partially covered by NAT 47% Fully covered by NAT 14% Not covered by NAT 39%
  20. 20. NATs at 3’end of genes Partially covered by NAT 47% 5’biased unbiased 3’biased
  21. 21. NATs at 3’end of genes +ve strand RNA-Seq log 2 coverage EHI_016120 -ve strand RNA-Seq log 2 coverage NAT EHI_016130 NAT NAT EHI_016140
  22. 22. 1 RNA-Seq Artifact-corrected strand-specific RNA-Seq transcription start site mapping poly(A) site mapping
  23. 23. n=4186 NAT TSS Hotspot around stop codon mRNA TSS Hotspot around -15nt of start codon n=1991
  24. 24. Stop codon NAT TSS Hotspot at 1st base of stop codon
  25. 25. mRNA TSS
  26. 26. mRNA TSS Stop codon Reverse Complement stop codon
  27. 27. Stop codon resembles a TSS initiator motif on the antisense strand mRNA TSS Sequence around mRNA TSS n=4186 NAT TSS ??? Reverse complement of sequence around stop codon n=7312 reverse complement stop codon
  28. 28. Stop codon resembles a TSS initiator motif on the antisense strand Sequence around mRNA TSS n=4186 Reverse complement of sequence around stop codon n=7312 initiator motif
  29. 29. 1 RNA-Seq Artifact-corrected strand-specific RNA-Seq transcription start site mapping poly(A) site mapping
  30. 30. n=7312 mRNA poly(A) site at around +20nt of stop codon
  31. 31. NAT poly(A) site most poly(A) before ~500bp and decreases gradually along CDS n=3128
  32. 32. NAT poly(A) site Hotspot at around -20nt of start codon n=7312
  33. 33. mRNA TSS ????? n=7312
  34. 34. 2 1 RNA-Seq 4 NAT Locations Motif Identification 3
  35. 35. 2 Motif Identification poly(A) motifs TSS motifs Correlations
  36. 36. 2 Motif Identification poly(A) motifs TSS motifs correlations
  37. 37. 1 2 3 Poly(A) Signal Seq (PAS), p<0.00001 Cleavage Sequence Element (CSE) Downstream Seq Element (T-rich DSE), p<0.00001 mRNA Poly(A) site n=4603 AAAAAA
  38. 38. 1 2 3 Poly(A) Signal Seq (PAS), p<0.00001 Cleavage Sequence Element (CSE) Downstream Seq Element (T-rich DSE), p<0.00001 mRNA Poly(A) site n=4603 1 2 3 NAT Poly(A) site n=1991
  39. 39. 3 Downstream Seq Element (T-rich DSE), p<0.00001 mRNA Poly(A) site n=4603 3 NAT Poly(A) site n=1991
  40. 40. 2 Motif Identification Poly(A) Motifs TSS Motifs Correlations
  41. 41. 1 A-rich region (A-Box), p<0.00001 2 Upstream promoter core (core), p<0.00001 Initiator Motif (Inr) 3 mRNA TSS n=3865
  42. 42. 1 A-rich region (A-Box), p<0.00001 2 Upstream promoter core (core), p<0.00001 Initiator Motif (Inr) 3 mRNA TSS n=3865 1 2 3 NAT TSS n=1991
  43. 43. Are the positional constraints of NATs encoded in the genome or not?
  44. 44. 2 Motif Identification Poly(A) motifs TSS motifs Correlations
  45. 45. NAT TSS Reverse Complement stop codon
  46. 46. mRNA Downstream seq element (T-rich DSE) , p<0.00001 Poly(A) site motif NAT Promoter motif A-rich Box (A-Box), p<0.00001
  47. 47. NAT poly(A) site Hotspot at around -20nt of start codon n=7312
  48. 48. A-rich Box (A-Box), p<0.00001 mRNA Promoter motif NAT Downstream seq element (T-rich DSE) , p<0.00001 Poly(A) site motif
  49. 49. mRNA TSS NAT Poly(A) mRNA Poly(A) NAT TSS
  50. 50. NAT poly(A) site Peaks at ~500bp and decreases gradually n=3128
  51. 51. n=3128 Observed poly(A) sites are ~5 times more frequent in the Antisense Strand n=3128
  52. 52. 2 1 RNA-Seq 4 NAT Locations Motif Identification Genomic Determinants Simulations 3
  53. 53. Rationale Results Simulations 3
  54. 54. Rationale Results Simulations 3
  55. 55. Defining the positional weighting matrix 1 A-rich Box (A-Box), p<0.00001 2 Upstream promoter core (core), p<0.00001 Initiator Motif (Inr) 3 mRNA TSS n=3865
  56. 56. Defining the positional weighting matrix Weight factor for positions of 3 motifs
  57. 57. Calculate the TSScore Individual motif score = positional weighting * log (MAST p-value)
  58. 58. Calculate the TSScore TSScore = A-Box PW * log ( A-Box MAST p-value ) + core PW * log ( core MAST p-value ) + Inr PW * log ( Inr MAST p-value )
  59. 59. Rationale Results Simulations 3
  60. 60. RNA-Seq Observed mRNA TSS distribution TSScore Simulated mRNA TSS distribution
  61. 61. RNA-Seq Observed NAT TSS distribution TSScore Simulated NAT TSS distribution
  62. 62. Constraints for NAT transcription initiation are encoded in the genome
  63. 63. How about Poly(A) site?
  64. 64. Simulation PAScore
  65. 65. RNA-Seq Observed poly(A) site PAScore Simulated poly(A) site
  66. 66. PAScore Simulated poly(A) site Codon constraints on sense strand More poly(A) motifs on antisense strand?
  67. 67. Codon usage bias
  68. 68. Codon bias towards 3rd position A codons
  69. 69. Codon bias towards 3rd position A codons
  70. 70. Codon bias towards 3rd position A codons
  71. 71. Enrichment of T-rich motif on antisense strand Codon bias towards NNA Enrichment of T-rich motifs
  72. 72. Enrichment of T-rich motif on antisense strand Antisense vs Sense strand Enriched in Antisense strand Enriched in Sense strand
  73. 73. Enrichment of T-rich motif on antisense strand Observed poly(A) sites are ~5 times more frequent in the antisense strand
  74. 74. T-rich motifs are depleted on long NAT T-rich motifs depleted? Long NAT Short NAT
  75. 75. T-rich motifs are depleted on long NAT Short vs Long NAT Enriched in Short NAT Enriched in Long NAT
  76. 76. T-rich motifs are depleted on long NAT T-rich motifs depleted T-rich motifs enriched Long NAT Short NAT
  77. 77. 2 1 RNA-Seq NAT Locations Motif Identification Implications Genomic Determinants Simulations 4 3
  78. 78. mRNA with short 3’UTR average 15nt AAAAAA stop codon T-rich DSE Initiator motif A-box NAT pervasive transcription at 3’end
  79. 79. Codon bias towards NNA Enrichment of T-rich DSE for poly(A) AAAAAA NAT transcription is limited to 3’end by enriched poly(A) site motifs
  80. 80. Natural selection for long NAT? Reduced codon bias towards NNA Depletion of T-rich DSE for poly(A) NAT transcription is lengthened and possibly functional
  81. 81. neighbor mRNA mRNA Promoter motifs A-box Partition ?? T-rich DSE AAAAAA NAT poly(A) site hotspot. Partition between genes?
  82. 82. A Tale of Two Strands sharing of transcriptional motifs between the two strands of a compact genome Paris, France
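
The TSScore on slides 57 and 58 sums, over the three promoter motifs (A-Box, core, Inr), a positional weight multiplied by the log of the MAST p-value. A minimal Python sketch of that sum follows. The function names, the example weights and p-values, and the use of the natural logarithm are assumptions for illustration only; the slides do not give the log base or the actual weight values.

```python
import math


def motif_score(positional_weight, mast_p_value):
    """Individual motif score = positional weighting * log(MAST p-value)."""
    if not 0.0 < mast_p_value <= 1.0:
        raise ValueError("MAST p-value must be in (0, 1]")
    # Assumption: natural log. The slides only say "log".
    return positional_weight * math.log(mast_p_value)


def tsscore(motifs):
    """Sum the individual scores of all motifs at one candidate position.

    `motifs` maps a motif name to a (positional weight, MAST p-value) pair.
    """
    return sum(motif_score(pw, p) for pw, p in motifs.values())


# Hypothetical weights and p-values for a single candidate TSS position.
example = {
    "A-Box": (0.5, 1e-5),
    "core": (0.3, 1e-4),
    "Inr": (0.2, 1e-3),
}

score = tsscore(example)
print(round(score, 3))
```

Because a p-value below 1 has a negative logarithm, a stronger motif match at a heavily weighted position gives a more negative score in this sketch. Whether the original analysis flips the sign or rescales the result is not stated on the slides.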
