The document summarizes a seminar on the physics of DNA, RNA, and RNA-like polymers. It discusses how DNA drives the initial infection process in bacteriophage due to its stiff and self-repelling properties. It also describes how RNA forms secondary structures through base pairing between nucleotides and how the 5' and 3' ends spontaneously associate in long RNA molecules, bringing them into close proximity. Finally, it compares models of RNA secondary structure, such as the randomly self-paired polymer model and successive folding model, and how they predict different scaling laws for end-to-end distance with polymer length.
Dissertation Defense: The Physics of DNA, RNA, and RNA-like polymers
1. Exit Seminar: Physics of DNA, RNA, and RNA-like Polymers
Li Tai Fang
Department of Chemistry & Biochemistry
UCLA
2. Bacteriophage: DNA as genome
● DNA is a stiff, self-repelling polymer
● Capsid is highly pressurized
● DNA is released from capsid upon binding LamB
[Figure labels: LamB, DNase; measure length: gel electrophoresis]
4. Generic properties of DNA - independent of sequence
● Stiff and self-repelling
  ● persistence length ≈ radius of capsid
  ● contour length ≫ diameter of capsid
● Confinement
  ● entropy outside > entropy inside
● Physical properties of DNA drive the initial infection process
5. RNA
a biopolymer consisting of 4 different species of monomers (bases): G, C, A, U
secondary structure: base pairs G–C, A–U, G–U
[Figure labels: 5', 3']
6. generic vs. sequence-specific properties
● Regardless of sequence or length, we can predict
  ● Pairing fraction: 60%
  ● Average loop size: 8
  ● Average duplex length: 4
7. generic vs. sequence-specific properties
● Regardless of sequence or length, we can predict
  ● Pairing fraction: 60%
  ● Average loop size: 8
  ● Average duplex length: 4
  ● 5' – 3' distance
8. Association of 5' – 3' required for:
● Efficient replication of viral RNA (complementary sequence)
  e.g., HIV-1, Influenza, Sindbis, etc.
● Efficient translation of mRNA (RNA binding protein)
9. Question:
How do the 5' and 3' ends of long RNAs find each other?
Answer:
The ends of RNA are always in close proximity, regardless of sequence or length!
Yoffe A. et al., 2009
14. general approach
1) $p_i$ = probability that the $i$th set of "base-pair(s)" will bring the ends to a distance less than or equal to $X$
2) $P(X)$ = probability that at least one of those sets will occur
$P(X) = 1 - (1 - p_i)(1 - p_j)(1 - p_k) \cdots (1 - p_z)$
$\rho(X) = P(X) - P(X-1)$ = probability that $R_{ee}$ is $X$
$\langle X \rangle = \sum_X \rho(X) \cdot X$
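The bookkeeping on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the product form treats the candidate sets as independent (as the slide's formula does), and the convention P(-1) = 0 is an assumption made so that the distribution starts at X = 0. The cumulative values at the bottom are made up purely to exercise the functions.

```python
def prob_at_least_one(probabilities):
    """P = 1 - prod(1 - p_i): chance that at least one set of pairs occurs,
    treating the sets as independent events."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur


def distance_distribution(cumulative):
    """Turn cumulative[X] = P(X) into rho(X) = P(X) - P(X-1).
    Assumes P(-1) = 0, so rho(0) = P(0)."""
    rho = [cumulative[0]]
    for x in range(1, len(cumulative)):
        rho.append(cumulative[x] - cumulative[x - 1])
    return rho


def mean_distance(rho):
    """<X> = sum over X of rho(X) * X."""
    return sum(x * r for x, r in enumerate(rho))


# Made-up cumulative probabilities, for illustration only
example_cumulative = [0.1, 0.4, 0.8, 1.0]
example_rho = distance_distribution(example_cumulative)
example_mean = mean_distance(example_rho)
```

The mean is only meaningful if the cumulative list runs far enough in X for P(X) to reach 1.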
17. Let's start the grunt work
Reminder:
RNA: $N_T = 1000$, $N_p = 600$
Model: $N_{T,\mathrm{eff}} = 520$, $N_{p,\mathrm{eff}} = 120$
Now, the 1st challenge:
18. probability of a particular set of pairs
[Diagram: bases i, j, k, l, m, n forming the pairs i–j, k–l, m–n]
$p(i) = 120/520$
$p(ij) = 1/519$
$p(k) = 118/518$
$p(kl) = 1/517$
$p(m) = 116/516$
$p(mn) = 1/515$
$p(\text{this partial set}) = p(i)\,p(ij)\,p(k)\,p(kl)\,p(m)\,p(mn)$
depends on $N_{T,\mathrm{eff}}$, $N_{p,\mathrm{eff}}$, and $B$
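The product above follows a simple pattern: each new pair multiplies in (paired bases left)/(bases left) and then 1/(bases left - 1), after which both counts drop by two. A minimal sketch of that pattern, using exact fractions, is given below. The function name and argument names are hypothetical, and the decrement-by-two rule is read off the numbers on this slide rather than stated explicitly in the talk.

```python
from fractions import Fraction


def pair_set_probability(n_t_eff, n_p_eff, b):
    """Probability of one particular set of b pairs.

    n_t_eff: effective number of bases (520 on the slide)
    n_p_eff: effective number of paired bases (120 on the slide)
    """
    prob = Fraction(1)
    sites = n_t_eff    # bases not yet used
    paired = n_p_eff   # paired bases not yet used
    for _ in range(b):
        prob *= Fraction(paired, sites)  # p(i): the chosen base is a paired base
        prob *= Fraction(1, sites - 1)   # p(ij): its partner is one specific other base
        sites -= 2
        paired -= 2
    return prob


p_one_pair = pair_set_probability(520, 120, 1)     # (120/520)(1/519)
p_three_pairs = pair_set_probability(520, 120, 3)  # the six-factor product on the slide
```

With the slide's numbers, one pair gives exactly 1/2249, and three pairs give roughly 8.55e-11.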
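19. Next challenge:
● We have $p_i = p(N_{T,\mathrm{eff}}, N_{p,\mathrm{eff}}, B)$
● We want $P(X) = 1 - (1 - p_i)(1 - p_j)(1 - p_k) \cdots (1 - p_z)$
Let $\Omega(B)$ = number of ways to make a set of $B$ pairs
Then, $P(X) = 1 - (1 - p_{B=1})^{\Omega(B=1)} (1 - p_{B=2})^{\Omega(B=2)} \cdots (1 - p_{B_{\max}})^{\Omega(B_{\max})}$
[Diagram, B = 3: pairs i–j, k–l, m–n separated by the unpaired stretches x1, x2, x3, x4]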
20. Task: find $\Omega(B)$
● 1st, find the number of sets $\{x_1, x_2, \ldots, x_{B+1}\}$ such that $X = x_1 + x_2 + \cdots + x_{B+1}$
● for B = 3, X = 10: # of ways to arrange these:
$\binom{X+B}{B} = \frac{(X+B)!}{X!\,B!}$
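The binomial count above is the standard "stars and bars" result for splitting X into B+1 non-negative parts. As a sanity check, the brute-force sketch below enumerates every candidate set for the slide's example (B = 3, X = 10) and compares the count with the closed form. The function name is hypothetical, and the enumeration is only practical for small X and B.

```python
from itertools import product
from math import comb


def count_gap_sets(x_total, b):
    """Count ordered tuples (x_1, ..., x_{b+1}) of non-negative integers
    that sum to x_total, by direct enumeration."""
    return sum(
        1
        for gaps in product(range(x_total + 1), repeat=b + 1)
        if sum(gaps) == x_total
    )


brute_force = count_gap_sets(10, 3)
closed_form = comb(10 + 3, 3)  # (X+B)! / (X! B!)
```

Both routes give 286 for B = 3, X = 10.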
21. For each $\{x_i\}$, how many ways to move the middle regions?
[Diagram: two different placements of the pairs i–j and k–l]
$\binom{N_{\mathrm{available}}}{B-1} = \binom{N_{T,\mathrm{eff}} - X - B - 1}{B-1}$
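22. Consider all X's
$\sum_{X_i=0}^{X} \binom{X_i+B}{B} \binom{N_{T,\mathrm{eff}} - X_i - B - 1}{B-1}$
Missing something... base-pairing "crossovers":
[Diagram: segments (a), (b), (c) with bases i, j, k, l, paired without a crossover vs. with a crossover]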
23. Crossovers are also known as pseudoknots
● $X = x_a + x_b + x_c$ as long as $x_b \le j - i$ and $x_b \le l - k$
● 2 ways to connect each middle region
● undercount by $2^{(B-1)}$
Now, let's put it all together
24. $\Omega(N_{T,\mathrm{eff}}, X, B)$
$\Omega(N_{T,\mathrm{eff}}, X, B) = 2^{(B-1)} \sum_{X_i=0}^{X} \binom{X_i+B}{B} \binom{N_{T,\mathrm{eff}} - X_i - B - 1}{B-1}$
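The counting formula on this slide translates almost directly into code. The sketch below assumes the sum runs over the total unpaired distance X_i from 0 to X, with the crossover factor 2^(B-1) applied once outside the sum. The function and argument names are hypothetical. With N_T,eff = 520 and X = 20 it gives the counts quoted later in the talk for B = 1 and B = 2 (231 and about 1.78 x 10^6).

```python
from math import comb


def omega(n_t_eff, x_max, b):
    """Number of ways to place b pairs so that the ends are at most
    x_max unpaired bases apart, including the crossover factor 2^(b-1)."""
    total = 0
    for x in range(x_max + 1):
        gap_sets = comb(x + b, b)                        # ways to split x into b+1 stretches
        placements = comb(n_t_eff - x - b - 1, b - 1)    # ways to move the middle regions
        total += gap_sets * placements
    return 2 ** (b - 1) * total


omega_b1 = omega(520, 20, 1)
omega_b2 = omega(520, 20, 2)
omega_b3 = omega(520, 20, 3)
```

Python integers are arbitrary precision, so the count stays exact even when it grows very large for higher B.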
25. Once again, the general approach
where end-to-end distance ≤ X
$P(X)$ = probability that at least one of these pairs will occur
$P(X) = 1 - (1 - p_i)(1 - p_j)(1 - p_k) \cdots (1 - p_z)$
$P(X) = 1 - (1 - p_{B=1})^{\Omega(B=1)} (1 - p_{B=2})^{\Omega(B=2)} \cdots (1 - p_{B_{\max}})^{\Omega(B_{\max})}$
● $\rho(X) = P(X) - P(X-1)$
29. Problems:
● Pseudoknots are rare in RNA
● Not held in check in the self-paired polymer model
● Successive RNA Folding Model:
  ● Pseudoknots completely prohibited
33. Acknowledgment
● Thesis advisors
Professors Bill Gelbart and Chuck Knobler
● Special thanks to
Professor Avi Ben-Shaul
● Thesis committee
Professors Joseph Loo, Giovanni Zocchi, Tom Chou
● Group members and former group members:
Aron Yoffe, Ajay Gopal, Odisse Azizgolshani, Peter Prinsen, Ruben Cadena,
Cathy Jin, Maurico Comas-Garcia, Rees Garmann, Peter Stavros, Vivian
Chiu Glover, Venus Vakhshori, Yufang Hu, Roya Zandi
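34. For an RNA of N = 1000, pairing fraction = 0.6
Probability that the ends will be no more than 20 unpaired bases apart?
B = 1:
$p = (120/520)(1/519) = 1/2249 = 4.45 \times 10^{-4}$
$\Omega = 231$
$P(1) = (1 - 4.45 \times 10^{-4})^{231} = 0.902$
B = 2:
$p = (120 \times 118)/(520 \times 519 \times 518 \times 517) = 1.96 \times 10^{-7}$
$\Omega = 1.78 \times 10^{6}$
$P(2) = (1 - 1.96 \times 10^{-7})^{1.78 \times 10^{6}} = 0.706$
B = 3:
$p = 8.55 \times 10^{-11}$
$\Omega = 5.301 \times 10^{9}$
$P(3) = (1 - 8.55 \times 10^{-11})^{5.301 \times 10^{9}} = 0.635$
B = 4:
$p = 3.70 \times 10^{-14}$
$\Omega = 8.72 \times 10^{12}$
$P(4) = (1 - 3.70 \times 10^{-14})^{8.72 \times 10^{12}} = 0.725$
$\mathrm{Prob}(X \le 20) = 1 - (0.902 \times 0.706 \times 0.635 \times 0.725 \times \cdots \times 1) = 0.81$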
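The worked numbers on this slide can be reproduced with a short script. The sketch below is not the author's code: the function names are hypothetical, the counting formula is the one from slide 24 as read here, and `b_max` (how many values of B to include) is left as a parameter because the talk does not state where the product is cut off. Each factor is evaluated as exp(Ω · log(1 - p)) to avoid raising a number very close to 1 to a huge power.

```python
from math import comb, exp, log1p


def pair_set_probability(n_t_eff, n_p_eff, b):
    """Probability of one particular set of b pairs (pattern from slide 18)."""
    prob = 1.0
    for step in range(b):
        sites = n_t_eff - 2 * step    # bases not yet used
        paired = n_p_eff - 2 * step   # paired bases not yet used
        prob *= (paired / sites) * (1.0 / (sites - 1))
    return prob


def omega(n_t_eff, x_max, b):
    """Number of sets of b pairs that bring the ends within x_max."""
    total = sum(
        comb(x + b, b) * comb(n_t_eff - x - b - 1, b - 1)
        for x in range(x_max + 1)
    )
    return 2 ** (b - 1) * total


def survival_factor(n_t_eff, n_p_eff, x_max, b):
    """(1 - p_B)^Omega_B: chance that none of the b-pair sets occurs."""
    p = pair_set_probability(n_t_eff, n_p_eff, b)
    return exp(omega(n_t_eff, x_max, b) * log1p(-p))


def prob_ends_within(n_t_eff, n_p_eff, x_max, b_max):
    """1 - product of survival factors for B = 1 .. b_max."""
    none_occur = 1.0
    for b in range(1, b_max + 1):
        none_occur *= survival_factor(n_t_eff, n_p_eff, x_max, b)
    return 1.0 - none_occur


# The four factors listed on the slide (B = 1 .. 4)
factors = [survival_factor(520, 120, 20, b) for b in range(1, 5)]
# Keeping only B <= 4 gives a partial result; the slide's 0.81 includes further terms
partial = prob_ends_within(520, 120, 20, b_max=4)
```

The four entries of `factors` should land close to the slide's 0.902, 0.706, 0.635, and 0.725. The value in `partial` is smaller than the slide's final 0.81 because factors for larger B are left out.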