THERMODYNAMICS OF COMPUTATION:
BEYOND NUMBER OF BIT ERASURES
David H. Wolpert
https://centre.santafe.edu/thermocomp
• ~5% of energy use in developed countries goes to heating digital circuits
• Cells compute ~10^5 times more efficiently than CMOS computers
• In fact, biosphere as a whole is more efficient than supercomputers
• Deep connections among information theory, physics and (neuro)biology
• Calculate heat produced by running each gate in a circuit
• Sum over all gates to get total heat produced by running the full circuit
• Must be able to calculate heat produced by running an arbitrary gate…
(See N. Gershenfeld, IBM Systems Journal 35, 577 (1996))
Analyze relation between circuit’s design and the
heat it produces
Consider a (perhaps time-varying) master equation that sends
p0(x0) to p1(x1) = ∑x0 P(x1 | x0) p0(x0).
• Example: Stochastic dynamics in a genetic network
• Example: (Noise-free) dynamics of a digital gate in a circuit
where:
• S(p) is Shannon entropy of p
• EF(p0) is total entropy flow (out of system) between t = 0 and t = 1
• EP(p0) is total entropy production in system between t = 0 and t = 1
- cannot be negative
S(p1) − S(p0) = −EF(p0) + EP(p0)
(Van den Broeck and Esposito, Physica A, 2015)
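A minimal numerical sketch of this decomposition (the two-state gate and its transition matrix are hypothetical; EP is whatever non-negative remainder the physical process adds on top of the entropy drop):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in nats, ignoring zero-probability states."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Column-stochastic matrix P(x1 | x0) for a hypothetical noisy 2-state gate.
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])

p0 = np.array([0.5, 0.5])   # initial distribution
p1 = P @ p0                 # p1(x1) = sum_x0 P(x1 | x0) p0(x0)

entropy_drop = shannon_entropy(p0) - shannon_entropy(p1)
print(f"S(p0) = {shannon_entropy(p0):.4f} nats")
print(f"S(p1) = {shannon_entropy(p1):.4f} nats")
print(f"S(p0) - S(p1) = {entropy_drop:.4f} nats")
# EF(p0) = [S(p0) - S(p1)] + EP(p0), with EP(p0) >= 0 set by the physical
# process; the entropy drop is the part set purely by information theory.
```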
EXAMPLE
System evolves while connected to multiple reservoirs, e.g., heat baths at
different temperatures.
Assume “local detailed balance” holds
Then: EF(p0) is (temperature-normalized) heat flow into reservoirs
EP is non-negative. So:
EF(p0) = S(p0) − S(p1) + EP(p0)
“Generalized Landauer’s bound”:
Heat flow from system ≥ S(p0) − S(p1)
EXAMPLE
• System evolves while connected to single heat bath at temperature T
• Two possible states
• p0 uniform
• Process implements bit erasure (so p1 a delta function)
So the generalized Landauer bound gives Landauer’s conclusion:
Total heat flow from system ≥ kT ln[2]
(Parrondo et al. 2015, Sagawa 2014,
Hasegawa et al. 2010, Wolpert 2015, etc.)
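A quick numerical check of this bound at room temperature (a sketch; kB is Boltzmann's constant):

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # room temperature, K

# Bit erasure: p0 uniform over 2 states, p1 a delta function.
entropy_drop_nats = math.log(2)  # S(p0) - S(p1) = ln 2

min_heat = k_B * T * entropy_drop_nats
print(f"Minimum heat per erased bit at {T} K: {min_heat:.3e} J")
# ~2.87e-21 J -- tiny per bit, but it bounds every physical computer.
```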
BACK TO CIRCUITS
• For fixed P(x1 | x0), changing p0 changes S(p0) − S(p1)
• N.b., the same P(x1 | x0), e.g., the same AND gate, has a different p0 depending on
where it is in a circuit.
• So identical gates at different locations in a circuit have different
values of Landauer cost, S(p0) − S(p1)
⇒ A new circuit design optimization problem
EF(p0) = S(p0) − S(p1) + EP(p0)
Different circuits all implementing same Boolean
function have different sum-total Landauer cost
NOTATION:
I(P(X1, X2, …)) = ∑i S(P(Xi)) − S(P(X1, X2, …))
- “Multi-information”
- A generalization of mutual information
- Quantifies how much information is shared among the Xi
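A sketch of this quantity for a discrete joint distribution stored as a NumPy array, one axis per variable Xi (function names are hypothetical):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def multi_information(joint):
    """I(P(X1, X2, ...)) = sum_i S(P(Xi)) - S(P(X1, X2, ...))."""
    marginal_entropies = sum(
        entropy(joint.sum(axis=tuple(j for j in range(joint.ndim) if j != i)))
        for i in range(joint.ndim)
    )
    return marginal_entropies - entropy(joint.ravel())

# Two perfectly correlated bits share ln 2 nats:
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(multi_information(joint))  # ~0.693 = ln 2
```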
WHAT CIRCUIT TO COMPUTE A GIVEN FUNCTION?
(Partial) answer: The extra total Landauer cost if we use circuit C′ rather than C is

∑g∈C′ I(ppa(g)) − ∑g∈C I(ppa(g))

where g indexes gates.
I.e., choose the circuit with the
smallest sum-total multi-information
of input distributions into its gates.
A global optimization problem.
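A toy illustration of this objective (a sketch; the three-bit input distribution and both circuits are made up): two circuits computing AND(b1, b2, b3), where b1 and b2 are perfectly correlated, differ in the summed multi-information of their gates' input distributions.

```python
import numpy as np

def entropy(p):
    p = np.asarray(list(p), dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def multi_info_2(joint):
    """Multi-information of a joint distribution over two binary variables,
    given as a dict {(a, b): prob}."""
    pa = [sum(v for (a, b), v in joint.items() if a == x) for x in (0, 1)]
    pb = [sum(v for (a, b), v in joint.items() if b == x) for x in (0, 1)]
    return entropy(pa) + entropy(pb) - entropy(joint.values())

# Inputs: b1 = b2 (perfectly correlated, uniform), b3 independent and uniform.
p_in = {(b, b, c): 0.25 for b in (0, 1) for c in (0, 1)}

def gate_input_dist(f1, f2):
    """Joint distribution of the two wires (f1(x), f2(x)) feeding a gate."""
    out = {}
    for x, pr in p_in.items():
        key = (f1(*x), f2(*x))
        out[key] = out.get(key, 0.0) + pr
    return out

# Circuit A computes the AND as g1 = AND(b1, b2), then g2 = AND(g1, b3).
cost_A = (multi_info_2(gate_input_dist(lambda a, b, c: a, lambda a, b, c: b))
          + multi_info_2(gate_input_dist(lambda a, b, c: a & b, lambda a, b, c: c)))

# Circuit B computes it as g1 = AND(b2, b3), then g2 = AND(g1, b1).
cost_B = (multi_info_2(gate_input_dist(lambda a, b, c: b, lambda a, b, c: c))
          + multi_info_2(gate_input_dist(lambda a, b, c: b & c, lambda a, b, c: a)))

print(f"circuit A: {cost_A:.3f} nats, circuit B: {cost_B:.3f} nats")
# A pays ln 2 ~ 0.693 for wiring the correlated pair into one gate; B pays ~0.216.
```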
• But total EF is the Landauer cost of the circuit plus its EP:
EF(p0) = S(p0) − S(p1) + EP(p0)
• For fixed P(x1 | x0), e.g., a fixed gate, changing p0 changes EP(p0)
So in order to know the total EF from a gate in a circuit
(and therefore the total EF in the circuit),
we need to know how EP(p0) depends on p0.
Theorem: For any p0, and any master equation, the entropy production is

EP(p0) = [KL(p0, q0) − KL(p1, q1)] + ∑c p0(c) EPc(q0)

where:
• KL(., .) is Kullback-Leibler divergence;
• q0(x) is a “prior” built into the gate (a physical parameter);
• The sum is over the “islands” c of the gate’s dynamics;
• EPc(.) is non-negative for each c (a physical parameter).
(Kolchinsky and Wolpert, 2017)
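A sketch of the first, information-theoretic term (the "mismatch cost" named below) for a hypothetical noisy two-state gate whose dynamics has a single island:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in nats."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

P = np.array([[0.9, 0.2],      # P(x1 | x0), column-stochastic
              [0.1, 0.8]])

q0 = np.array([0.3, 0.7])      # prior built into the gate
p0 = np.array([0.5, 0.5])      # actual input distribution
q1, p1 = P @ q0, P @ p0

mismatch_cost = kl(p0, q0) - kl(p1, q1)
print(f"mismatch cost = {mismatch_cost:.4f} nats")  # >= 0; = 0 iff p0 = q0
```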
• Total EF is the Landauer cost of the circuit plus its EP:
EF(p0) = S(p0) − S(p1) + EP(p0)
• The sum of entropy plus KL divergence is cross-entropy. So:
Total entropy flow out of a gate – the thermodynamic cost – is

EF(p0) = K(p0, q0) − K(p1, q1) + ∑c p0(c) EPc(q0)

where K(., .) is cross-entropy.
Example: An idealized two-ended “wire” maps
• (xin, xout) = (b, 0) ➙ (xin, xout) = (0, b)
• So regardless of p0, drop in cross-entropies equals 0.
• So in an idealized wire, entropy flow is linear in input distribution p0
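A quick check of this claim (a sketch): a permutation of states, such as the idealized wire, conserves cross-entropy for every p0, so the drop is always zero.

```python
import numpy as np

def cross_entropy(p, q):
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

# Idealized wire over (x_in, x_out): (b, 0) -> (0, b), i.e., a relabeling
# of the joint states (the permutation below is a hypothetical encoding).
perm = [1, 0, 3, 2]

rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(4))  # arbitrary input distribution
q0 = rng.dirichlet(np.ones(4))  # arbitrary prior
p1, q1 = p0[perm], q0[perm]     # a permutation just relabels states

print(cross_entropy(p0, q0) - cross_entropy(p1, q1))
# 0 (up to floating-point round-off) for any p0
```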
Evaluate this cost for each wire and gate in a circuit, and then sum.
For simplicity, take EP(.) = 0.
Then total entropy flow out of circuit – the thermodynamic cost – is
where:
• g indexes the circuit’s gates (including “wire gates”)
• pa(g) is parent gates of g in circuit (so ppa(g) is joint distribution into g)
⇒ Another new circuit design optimization problem
OTHER RESULTS
• Analysis where we do not set EP = 0.
• A family of circuits refers to any set of circuits that all “implement the same
function”, just for differing numbers of input bits
- Circuit complexity theory analyzes how costs (e.g., number of gates)
of each circuit in a family of circuits scale with number of input bits.
- Extension of circuit complexity theory to include thermodynamic costs.
• Analysis for “logically reversible circuits” - circuits built out of Fredkin
gates, with enough extra gates added to remove all “garbage bits”.
THERMODYNAMICS OF TURING MACHINES
Kolmogorov complexity of bit string v:
Minimum length of an input bit string that causes a given
(universal, prefix) Turing machine to compute v and halt.
Thermodynamic Kolmogorov complexity of bit string v:
Minimum (over all input bit strings) entropy flow for a given
Turing machine to compute the bit string v and then halt:

K(v) + log[G(Iv)] + log[Z]

where
• K(v) is the Kolmogorov complexity of v
• Z – the normalization constant – is Chaitin’s constant
• G(Iv) is the expected length of the input strings that compute v (i.e., the
probability of v under the “universal distribution”)

A “correction” to Kolmogorov complexity,
reflecting the cost of many-to-one maps.

[Figure: state of the TM versus time, for the set of inputs that compute v;
red lines mark the prior probabilities of the initial TM states.]

K(v) is unbounded – no constant exceeds the
length of {the shortest string to compute v} for all v.

Minimal EF is bounded – there is a constant that
exceeds {the minimal work to compute v} for all v:
log[sum of lengths of red lines]
CONCLUSIONS
• Exact equations for entire entropy flow of a system:
EF(p0) = Landauer cost (p0) + EP(p0)
• Different circuits, all implementing the same function, all using
thermodynamically reversible gates, have different thermodynamic costs.
• Lots of future research!
• New Wiki:
https://centre.santafe.edu/thermocomp
Please visit and start to add material!
• New book:
“The Energetics of Computing in Life and
Machines”, D. Wolpert, C. Kempes, P.
Stadler, J. Grochow (Ed.), SFI Press, 2019
• New invited review:
“The stochastic thermodynamics of
computation”, D. Wolpert, J. Physics A (2019)
So entropy flow out of a system – the thermodynamic cost – is

EF(p0) = K(p0, q0) − K(p1, q1) + ∑c p0(c) EPc(q0)

where K(., .) is cross-entropy.
Applies to:
• Each gate in a digital circuit
• Each wire in a digital circuit
• Each “gate” in a noisy circuit, e.g., in a genetic circuit
• Each reaction in a stochastic chemical reaction network
OTHER RESULTS
• Some sufficient conditions for C to have greater EP than AO(C)
• Some sufficient conditions for C to have less EP than AO(C)
• Analysis when outdegrees of some gates > 1
• Analysis when prior distributions qg at each gate g are arbitrary.
• Analysis accounting for thermodynamic costs of wires
Total entropy flow out of a circuit – the thermodynamic cost – is

EFC(p) = ∑g [Kg(ppa(g), qpa(g)) − Kg(pg, qg) + ∑c∈L(g) ppa(g)(c) EP(qpa(g); c)]

where:
• g indexes the circuit’s gates
• pa(g) is the set of parent gates of g in the circuit (so ppa(g) is the joint distribution into g)
• L(g) is the set of islands of the (function implemented by) gate g

To focus on information theory, assume every EP(qpa(g); c) = 0.

• For any circuit C, the “all-at-once gate” AO(C) is a single gate that
computes the same function as C.

Rest of talk: Compare
Landauer costs and total EF for C and AO(C)
EF for running circuit C on input distribution p when the optimal distribution is q:

EFC(p, q) = ∑g [Kg(ppa(g), qpa(g)) − Kg(pg, qg)]

EF for running the AO gate that implements the same input-output function as C:

EFAO(C)(p, q) = Kin(pin, qin) − Kout(pout, qout)

Thermodynamic penalty / gain from using C rather than AO(C):

ΔEFC(p, q) = EFC(p, q) − EFAO(C)(p, q)
           = IK(p, q) − ∑g IK(ppa(g), qpa(g))
where IK(p, q) is the cross multi-information.

When p = q, ΔEFC(p, q) cannot be negative
- Indeed, it can be +∞.
So there is never an advantage to using a circuit ... if p = q, i.e., if one “guessed
right” when designing every single gate.
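The slides leave IK implicit; a natural reading, paralleling the multi-information definition earlier, is IK(p, q) = ∑i K(pi, qi) − K(p, q) over the marginals i. A sketch under that assumption:

```python
import numpy as np

def K(p, q):
    """Cross-entropy K(p, q) = -sum_x p(x) log q(x), in nats."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def cross_multi_information(p, q):
    """Assumed form: I_K(p, q) = sum_i K(p_i, q_i) - K(p, q)."""
    total = 0.0
    for i in range(p.ndim):
        axes = tuple(j for j in range(p.ndim) if j != i)
        total += K(p.sum(axis=axes), q.sum(axis=axes))
    return total - K(p.ravel(), q.ravel())

p = np.array([[0.5, 0.0],
              [0.0, 0.5]])          # two perfectly correlated bits
print(cross_multi_information(p, p))  # when p = q this is just I(p) = ln 2
```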
However, in the real world it is too expensive to build an AO gate.
Even if p = q, there are circuits where the total (Landauer) cost is infinite!
So, even if p = q, if we can’t use an AO gate,
what circuit should we use to implement a given function?
Even worse: if p ≠ q, the extra total entropy flow if we use C rather than AO(C) is

IK(p, q) − ∑g IK(ppa(g), qpa(g))

This can be positive or negative.
In fact, the extra EF can be either −∞ or +∞.
So, if p ≠ q,
what circuit should we use to implement a given function?

When – as in the real world – we aren’t lucky enough
to have gates obey p = q, using a circuit C rather
than AO(C) may either increase or decrease the
total entropy flow out of the circuit.
However, if p ≠ q, the extra EP if we use C rather than AO(C) is

−ID(p, q) + ∑g ID(ppa(g), qpa(g))

This can be positive or negative.

When – as in the real world – we aren’t lucky enough
to have gates obey p = q, using a circuit C rather
than AO(C) may either increase or decrease EP.
Consider a (perhaps time-varying) master equation that sends
p0(x0) to p1(x1) = ∑x0 P(x1 | x0) p0(x0).
• Example: Stochastic dynamics in a genetic network
• Example: (Noise-free) dynamics of a digital gate in a circuit

EP(p0) is non-negative (regardless of the master equation)

EF(p0) = S(p0) − S(p1) + EP(p0)

(See Van den Broeck and Esposito, Physica A, 2015)
So:
EF(p0) ≥ S(p0) − S(p1)
“Generalized Landauer’s bound”
• S(p0) − S(p1) is called the “Landauer cost” – the minimal possible heat flow.
So far, all math, no physics …
STATISTICAL PHYSICS APPLICATION
Suppose system evolves while connected to multiple reservoirs, e.g., heat
baths at different temperatures.
Assume “local detailed balance” holds for those reservoirs (usually true)
Then: EF(p0) is (temperature-normalized) heat flow into reservoirs
S(p0) − S(p1) called “Landauer cost” – minimal possible heat flow
EF(p0) = S(p0) − S(p1) + EP(p0)
Generalized Landauer’s bound:
Heat flow from system ≥ S(p0) − S(p1)
For any p0, and any master equation:
• The drop in KL divergence, KL(p0, q0) − KL(p1, q1), is called the mismatch cost
• A purely information-theoretic contribution to EF
• Non-negative; equals 0 iff p0 = q0
• The average of the minimal entropy productions, ∑c p0(c) EPc(q0), is called the residual EP
• A non-information-theoretic contribution to EF
• Non-negative; generally equals 0 only if the process is quasistatic

So the total entropy flow out of a system – the thermodynamic cost – is

EF(p0) = S(p0) − S(p1) + EP(p0) = K(p0, q0) − K(p1, q1) + ∑c p0(c) EPc(q0)

(where K(., .) is cross-entropy)
EXAMPLE
System successively connected to two heat baths, at temp.’s T1 and T2 < T1
Recall: EF(p0) is sum over each reservoir of (temperature-normalized) total
heat flow from system into that reservoir
Suppose full cycle, so S(p0) − S(p1) = 0
So the generalized Landauer bound says

Qin(hot)/T1 − Qout(cold)/T2 ≤ 0

Clausius’ inequality
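A numerical sketch of this bound with made-up numbers: over a full cycle, the heat dumped into the cold bath must be at least Qin(hot) · T2/T1.

```python
T1, T2 = 400.0, 300.0   # hot and cold bath temperatures (K), T2 < T1
Q_in_hot = 1.0e-20      # heat absorbed from the hot bath over the cycle (J)

# Generalized Landauer bound with S(p0) = S(p1):
#   Q_in(hot)/T1 - Q_out(cold)/T2 <= 0
Q_out_cold_min = Q_in_hot * T2 / T1
print(f"minimum heat into cold bath: {Q_out_cold_min:.3e} J")
# Equivalently, extractable work <= Q_in_hot * (1 - T2/T1): the Carnot limit.
```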
Some simplifications arise if we restrict attention to particular
graphical model representations of the distributions.
Examples:
• A product-distribution form of q(xin), the optimal input distribution
• A Bayes net representation of each ppa(g) (one for each gate g)
• A factor graph representation of p(xin), the actual input distribution
Aside: state dynamics (computation) versus thermo-dynamics

[Figure: dynamics of states in bit erasure – trajectories of v ∈ {0, 1} over time t]
• Non-reversible map

[Figure: dynamics of distributions in bit erasure with uniform initial probabilities]
• Reversible map
Total work to run a gate taking p0 to p1:

K(p0, q0) − K(p1, q1)
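A sketch of this formula for bit erasure with a mismatched prior (all numbers hypothetical, work in units of kT, residual EP taken to be 0); it also shows the split into Landauer cost plus mismatch cost:

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def K(p, q):
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

P = np.array([[1.0, 1.0],      # bit erasure: both states map to state 0
              [0.0, 0.0]])
p0, q0 = np.array([0.5, 0.5]), np.array([0.8, 0.2])
p1, q1 = P @ p0, P @ q0

work = K(p0, q0) - K(p1, q1)
print(f"total work (units of kT): {work:.4f}")
print(f"  = Landauer cost {H(p0) - H(p1):.4f} + mismatch cost "
      f"{(K(p0, q0) - H(p0)) - (K(p1, q1) - H(p1)):.4f}")
```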
Example of equation for work applied to a digital gate:
• Forward in time
• If thermodynamically reversible, running in reverse sends the ending
distribution back to the starting one.
• In this case, mismatch cost = 0; total work is the drop in entropies
Example: An all-at-once gate to implement some function f over large spaces
• Four-bit input (16 states), two-bit output
• If thermodynamically reversible, running in reverse sends the ending
distribution back to the starting one.
• Always true for an (optimized) all-at-once gate

Example: A digital circuit that also implements f
• Four-bit input (16 states), two-bit output
• Suppose the distribution over inputs is not a product of marginals
• If thermodynamically reversible, running in reverse sends the ending
distribution back to the starting one. Not true here in general.
• Mismatch cost to implement f depends, in general, on what circuit is used to
implement f.
• It can be lower with a circuit than with an “all-at-once” (AO) gate.
⇒ New kind of circuit design problem
Total entropy flow out of a gate / wire – the thermodynamic cost – is

EF(p0) = K(p0, q0) − K(p1, q1) + ∑c p0(c) EPc(q0)

where K(., .) is cross-entropy.
Evaluate this cost for each wire and gate in a {…} circuit, and then sum.
- This gives the thermodynamic cost for any circuit, not just digital, noise-free circuits.
So the analysis holds for:
- “Noisy circuits” with noise in gates and/or paths, e.g., genetic circuits
- “Reversible circuits”, e.g., made with Fredkin gates
Some simplifications arise if we restrict attention to particular
graphical model representations of the distributions.
Shorthand: ∆WC(p, q) = extra EF using C rather than AO(C)
∆ℇC(p, q) = extra mismatch cost using C
For Bayes net representations of the distributions ppa(g):
- have a separate DAG at each gate g, Γ(g, p) (to define the Bayes nets)
The total work cost when p = q reduces to a sum of mutual informations:
where
- “in” refers to the joint set of all inputs to the circuit,
- For any DAG Γ, “R(Γ)” means its root nodes.
• A factor graph representation of a distribution P(x):

P(x) ∝ ∏i φi(ξi)

- each factor ξi is a subset of X
- each φi(ξi) is a positive real-valued function
• Note that a given x can occur in more than one factor
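A minimal sketch of such a representation, with hypothetical factors over three binary variables:

```python
import itertools

# Hypothetical factors: phi_1 on (x1, x2) favors agreement; phi_2 on (x2, x3)
# favors agreement more strongly. x2 occurs in both factors.
phi1 = {(a, b): 2.0 if a == b else 1.0 for a in (0, 1) for b in (0, 1)}
phi2 = {(b, c): 3.0 if b == c else 1.0 for b in (0, 1) for c in (0, 1)}

# P(x) proportional to the product of the factors:
unnorm = {x: phi1[(x[0], x[1])] * phi2[(x[1], x[2])]
          for x in itertools.product((0, 1), repeat=3)}
Z = sum(unnorm.values())           # normalization over all 8 joint states
P = {x: v / Z for x, v in unnorm.items()}

print(P[(0, 0, 0)], P[(0, 1, 0)])  # 6/24 = 0.25 and 1/24 ~ 0.042
```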
[Figure: different-colored curly brackets mark the different factors of the
factor-graph representation of q. Each yellow node has its parents
statistically independent.]

Theorem: Each yellow node contributes zero to the total EF.
• For fixed P(x1 | x0), changing p0 also changes EP(p0)
• So we need to know how EP(p0) depends on p0 in order to fully specify the
problem of designing a circuit to minimize total EF
• To present this result, must first define “islands” ...
EF(p0) = S(p0) − S(p1) + EP(p0)
Different circuits all implementing same Boolean
function have different sum-total Landauer cost
First new result: EP(p0) is a sum of an information-theoretic
term and a non-information-theoretic term.
An island of a function f : x0 → x1 is any set f⁻¹(x1), for some x1.
Examples:
1) A noise-free wire maps (xin = y, xout = 0) → (xin = 0, xout = y).
For binary y, two islands.
2) A noise-free AND gate maps (x1in = y1, x2in = y2, xout = 0) → (0, 0, y1 AND y2).
For binary y1, y2, two islands.
• Given a fixed master equation implementing f, for each island c, define
the initial distribution that results in the best possible thermodynamic
efficiency for the (implicit) master equation – the prior distribution
qc(.) := argminr EP(r)
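A sketch of how to enumerate a gate's islands (the helper and the encoding of the AND gate's state space are hypothetical):

```python
from collections import defaultdict
from itertools import product

def islands(f, domain):
    """Group domain points by output: each preimage f^{-1}(x1) is an island."""
    groups = defaultdict(list)
    for x0 in domain:
        groups[f(x0)].append(x0)
    return list(groups.values())

# Noise-free AND gate on (x1_in, x2_in, x_out = 0), as in the slide:
and_gate = lambda x: (0, 0, x[0] & x[1])
domain = [(a, b, 0) for a, b in product((0, 1), repeat=2)]

for isl in islands(and_gate, domain):
    print(isl)
# Two islands: {inputs mapping to output 0} and {(1, 1, 0)}.
```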
Total entropy flow out of a gate / wire – the thermodynamic cost – is
where K(., .) is cross–entropy.
Examples:
Noise-free wire: (xin = y, xout = 0) → (xin = 0, xout = y)
- Landauer cost = mismatch cost = 0; drop in cross-entropy = 0
- Residual EP captures all the dependence of the heat generated by
using a wire on the distribution of signals through it
Noise-free AND gate: (x1in = y1, x2in = y2, xout = 0) → (0, 0, y1 AND y2).
- Neither the Landauer cost nor the mismatch cost equals 0
- Drop in cross-entropy ≠ 0; depends on p0
Evaluate this cost for each wire and gate in a circuit, and then sum.
Lots of new information theory comes out, relating:
• The topology of the circuit
• The distribution over the circuit’s inputs
• The thermodynamic cost of that circuit when run with that distribution
Total entropy flow out of the circuit – the thermodynamic cost – is

EFC(p, q) = ∑g [Kg(ppa(g), qpa(g)) − Kg(pg, qg)]

where:
• g indexes the circuit’s gates
• pa(g) is parent gates of g in circuit (so ppa(g) is joint distribution into g)
• ∀g, the distributions pg and ppa(g) are given by propagating input
distribution pin through the circuit
• Nitty-gritty physical details of each gate g determine qpa(g) and qg
• To focus on information theory, assume ∀g, qpa(g) and qg are given by
propagating a joint input prior distribution qin through the circuit
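A sketch of this propagation for a hypothetical two-gate circuit (a NAND feeding a NOT): push the joint distribution over each gate's inputs through the gate's function.

```python
from collections import defaultdict

def push(dist, f):
    """Push a distribution {state: prob} through a deterministic map f."""
    out = defaultdict(float)
    for state, pr in dist.items():
        out[f(state)] += pr
    return dict(out)

# Correlated two-bit input distribution p_in (hypothetical):
p_in = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Gate 1 (NAND) reads both input bits: p_pa(g1) = p_in.
p_g1 = push(p_in, lambda x: 1 - (x[0] & x[1]))

# Gate 2 (NOT) reads gate 1's output: p_pa(g2) = p_g1.
p_g2 = push(p_g1, lambda y: 1 - y)

print(p_g1)  # {1: 0.6, 0: 0.4}
print(p_g2)  # {0: 0.6, 1: 0.4}
```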
EF for running circuit C on input distribution p when the optimal distribution is q:

EFC(p, q) = ∑g [Kg(ppa(g), qpa(g)) − Kg(pg, qg)]

For simplicity, assume the outdegree of each gate is 1 (a “Boolean formula”).

EF for running the AO gate that implements the same input-output function as C:

EFAO(C)(p, q) = Kin(pin, qin) − Kout(pout, qout)

More Related Content

What's hot

SPU Optimizations-part 1
SPU Optimizations-part 1SPU Optimizations-part 1
SPU Optimizations-part 1
Naughty Dog
 
Learning Erlang (from a Prolog dropout's perspective)
Learning Erlang (from a Prolog dropout's perspective)Learning Erlang (from a Prolog dropout's perspective)
Learning Erlang (from a Prolog dropout's perspective)
elliando dias
 
Presentation final
Presentation finalPresentation final
Presentation final
Kate Lu
 

What's hot (20)

First order active rc sections hw1
First order active rc sections hw1First order active rc sections hw1
First order active rc sections hw1
 
Dsp Lab Record
Dsp Lab RecordDsp Lab Record
Dsp Lab Record
 
SPU Optimizations-part 1
SPU Optimizations-part 1SPU Optimizations-part 1
SPU Optimizations-part 1
 
Network flow problems
Network flow problemsNetwork flow problems
Network flow problems
 
Minimum cost maximum flow
Minimum cost maximum flowMinimum cost maximum flow
Minimum cost maximum flow
 
Learning Erlang (from a Prolog dropout's perspective)
Learning Erlang (from a Prolog dropout's perspective)Learning Erlang (from a Prolog dropout's perspective)
Learning Erlang (from a Prolog dropout's perspective)
 
CArcMOOC 03.04 - Gate-level design
CArcMOOC 03.04 - Gate-level designCArcMOOC 03.04 - Gate-level design
CArcMOOC 03.04 - Gate-level design
 
190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub
 
PAM
PAMPAM
PAM
 
Presentation final
Presentation finalPresentation final
Presentation final
 
Search and optimization on quantum accelerators - 2019-05-23
Search and optimization on quantum accelerators - 2019-05-23Search and optimization on quantum accelerators - 2019-05-23
Search and optimization on quantum accelerators - 2019-05-23
 
Circuit complexity
Circuit complexityCircuit complexity
Circuit complexity
 
RF Circuit Design - [Ch2-1] Resonator and Impedance Matching
RF Circuit Design - [Ch2-1] Resonator and Impedance MatchingRF Circuit Design - [Ch2-1] Resonator and Impedance Matching
RF Circuit Design - [Ch2-1] Resonator and Impedance Matching
 
Ford Fulkerson Algorithm
Ford Fulkerson AlgorithmFord Fulkerson Algorithm
Ford Fulkerson Algorithm
 
Kinematic Equations for Uniformly Accelerated Motion
Kinematic Equations for Uniformly Accelerated MotionKinematic Equations for Uniformly Accelerated Motion
Kinematic Equations for Uniformly Accelerated Motion
 
Concurrent Triple Band Low Noise Amplifier Design
Concurrent Triple Band Low Noise Amplifier DesignConcurrent Triple Band Low Noise Amplifier Design
Concurrent Triple Band Low Noise Amplifier Design
 
STANDARD CELL LIBRARY DESIGN
STANDARD CELL LIBRARY DESIGNSTANDARD CELL LIBRARY DESIGN
STANDARD CELL LIBRARY DESIGN
 
Fawag snorre pilot_revisit
Fawag snorre pilot_revisitFawag snorre pilot_revisit
Fawag snorre pilot_revisit
 
Concurrency with Go
Concurrency with GoConcurrency with Go
Concurrency with Go
 
ScilabTEC 2015 - J. Richalet
ScilabTEC 2015 - J. RichaletScilabTEC 2015 - J. Richalet
ScilabTEC 2015 - J. Richalet
 

Similar to Thermodynamics of Computation: Far More Than Counting Bit Erasure: Far More Than Counting Bit Erasure

Electrical System Design transformer 4.pptx
Electrical System Design transformer 4.pptxElectrical System Design transformer 4.pptx
Electrical System Design transformer 4.pptx
GulAhmad16
 
Ecd302 unit 04 (analysis)
Ecd302 unit 04 (analysis)Ecd302 unit 04 (analysis)
Ecd302 unit 04 (analysis)
Xi Qiu
 
ENERGY_CONVERSION 5_.................ppt
ENERGY_CONVERSION 5_.................pptENERGY_CONVERSION 5_.................ppt
ENERGY_CONVERSION 5_.................ppt
MANOJ KHARADE
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
Praveen Narayanan
 
MOS Inverters Switching Characterstics and interconnect Effects-converted.pptx
MOS Inverters Switching Characterstics and interconnect Effects-converted.pptxMOS Inverters Switching Characterstics and interconnect Effects-converted.pptx
MOS Inverters Switching Characterstics and interconnect Effects-converted.pptx
Balraj Singh
 

Similar to Thermodynamics of Computation: Far More Than Counting Bit Erasure: Far More Than Counting Bit Erasure (20)

Advd lecture 7 logical effort
Advd   lecture 7 logical effortAdvd   lecture 7 logical effort
Advd lecture 7 logical effort
 
An Efficient Construction of Online Testable Circuits using Reversible Logic ...
An Efficient Construction of Online Testable Circuits using Reversible Logic ...An Efficient Construction of Online Testable Circuits using Reversible Logic ...
An Efficient Construction of Online Testable Circuits using Reversible Logic ...
 
Avinash_PPT
Avinash_PPTAvinash_PPT
Avinash_PPT
 
07_Digital timing_&_Pipelining.ppt
07_Digital timing_&_Pipelining.ppt07_Digital timing_&_Pipelining.ppt
07_Digital timing_&_Pipelining.ppt
 
Electrical System Design transformer 4.pptx
Electrical System Design transformer 4.pptxElectrical System Design transformer 4.pptx
Electrical System Design transformer 4.pptx
 
Ecd302 unit 04 (analysis)
Ecd302 unit 04 (analysis)Ecd302 unit 04 (analysis)
Ecd302 unit 04 (analysis)
 
ENERGY_CONVERSION 5_.................ppt
ENERGY_CONVERSION 5_.................pptENERGY_CONVERSION 5_.................ppt
ENERGY_CONVERSION 5_.................ppt
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
 
Electricmotor4
Electricmotor4Electricmotor4
Electricmotor4
 
Lecture_10_Parallel_Algorithms_Part_II.ppt
Lecture_10_Parallel_Algorithms_Part_II.pptLecture_10_Parallel_Algorithms_Part_II.ppt
Lecture_10_Parallel_Algorithms_Part_II.ppt
 
Reduction of Total Harmonic Distortion in Cascaded H-Bridge Inverter by Patte...
Reduction of Total Harmonic Distortion in Cascaded H-Bridge Inverter by Patte...Reduction of Total Harmonic Distortion in Cascaded H-Bridge Inverter by Patte...
Reduction of Total Harmonic Distortion in Cascaded H-Bridge Inverter by Patte...
 
Final Project
Final ProjectFinal Project
Final Project
 
MOS Inverters Switching Characterstics and interconnect Effects-converted.pptx
MOS Inverters Switching Characterstics and interconnect Effects-converted.pptxMOS Inverters Switching Characterstics and interconnect Effects-converted.pptx
MOS Inverters Switching Characterstics and interconnect Effects-converted.pptx
 
Design and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment HelpDesign and Analysis of Algorithms Assignment Help
Design and Analysis of Algorithms Assignment Help
 
Power estimation in low power vlsi design
Power estimation in low power vlsi designPower estimation in low power vlsi design
Power estimation in low power vlsi design
 
HDT TOOLS PRESENTATION (2000)
HDT TOOLS PRESENTATION (2000)HDT TOOLS PRESENTATION (2000)
HDT TOOLS PRESENTATION (2000)
 
Active RC Filters Design Techniques
Active RC Filters Design TechniquesActive RC Filters Design Techniques
Active RC Filters Design Techniques
 
Current-Transformer-ppt.ppt
Current-Transformer-ppt.pptCurrent-Transformer-ppt.ppt
Current-Transformer-ppt.ppt
 
180953548-Current-Transformer-ppt.ppt
180953548-Current-Transformer-ppt.ppt180953548-Current-Transformer-ppt.ppt
180953548-Current-Transformer-ppt.ppt
 
CMOS Combinational_Logic_Circuits.pdf
CMOS Combinational_Logic_Circuits.pdfCMOS Combinational_Logic_Circuits.pdf
CMOS Combinational_Logic_Circuits.pdf
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

Thermodynamics of Computation: Far More Than Counting Bit Erasure: Far More Than Counting Bit Erasure

  • 1. THERMODYNAMICS OF COMPUTATION: BEYOND NUMBER OF BIT ERASURES David H. Wolpert https://centre.santafe.edu/thermocomp
  • 2. • ~5% energy use in developed countries goes to heating digital circuits • Cells compute ~ 105 times more efficiently than CMOS computers • In fact, biosphere as a whole is more efficient than supercomputers • Deep connections among information theory, physics and (neuro)biology
  • 3. • Calculate heat produced by running each gate in a circuit • Sum over all gates to get total heat produced by running the full circuit • Must be able to calculate heat produced by running an arbitrary gate… (See N. Gershenfeld, IBM Systems Journal 35, 577 (1996)) Analyze relation between circuit’s design and the heat it produces
  • 4. Consider a (perhaps time-varying) master equation that sends p0(x) to p1(x) = ∑x0 P(x1 | x0) p0(x). • Example: Stochastic dynamics in a genetic network • Example: (Noise-free) dynamics of a digital gate in a circuit
  • 5. Consider a (perhaps time-varying) master equation that sends p0(x) to p1(x) = ∑x0 P(x1 | x0) p0(x). • Example: Stochastic dynamics in a genetic network • Example: (Noise-free) dynamics of a digital gate in a circuit where: • S(p) is Shannon entropy of p • EF(p0) is total entropy flow (out of system) between t = 0 and t = 1 • EP(p0) is total entropy production in system between t = 0 and t = 1 - cannot be negative S(p1) − S(p0) = −EF(p0) + EP(p0) (VanDenbroeck and Esposito, Physica A, 2015)
  • 6. EXAMPLE System evolves while connected to multiple reservoirs, e.g., heat baths at different temperatures. Assume “local detailed balance” holds Then: EF(p0) is (temperature-normalized) heat flow into reservoirs EP is non-negative. So: EF(p0) = S(p0) − S(p1) + EP(p0) “G𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿′ 𝑠𝑠 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏”: Heat flow from system ≥ S(p0) − S(p1)
  • 7. EXAMPLE • System evolves while connected to single heat bath at temperature T • Two possible states • p0 uniform • Process implements bit erasure (so p1 a delta function) So generalized Landauer’s bound says Landauer’s conclusion Total heat flow from system ≥ kT ln[2] (Parrondo et al. 2015, Sagawa 2014, Hasegawa et al. 2010, Wolpert 2015, etc.)
  • 8. BACK TO CIRCUITS • For fixed P(x1 | x0), changing p0 changes S(p0) − S(p1) • N.b., same P(x1 | x0). e.g., same AND gate, has different p0, depending on where it is in a circuit. • So identical gates at different locations in a circuit have different values of Landauer cost, S(p0) − S(p1)  A new circuit design optimization problem EF(p0) = S(p0) − S(p1) + EP(p0) Different circuits all implementing same Boolean function have different sum-total Landauer cost
  • 9. NOTATION: 𝐼𝐼 𝑃𝑃 𝑋𝑋1, 𝑋𝑋2, … = ∑𝑖𝑖 𝑆𝑆(𝑃𝑃 𝑋𝑋𝑖𝑖 ) − 𝑆𝑆(𝑃𝑃 𝑋𝑋1, 𝑋𝑋2, … ) - “Multi-information” - A generalization of mutual information - Quantifies how much information is shared among the Xi
  • 10. WHAT CIRCUIT TO COMPUTE A GIVEN FUNCTION? (Partial) answer: Total Landauer cost if we use circuit C′ rather than C: where g indexes gates. I.e., choose circuit with smallest sum total multi-information of input distributions into its gates. � 𝑔𝑔∈𝐶𝐶′ 𝐼𝐼(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 ) − � 𝑔𝑔∈𝐶𝐶 𝐼𝐼(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 )
  • 11. WHAT CIRCUIT TO COMPUTE A GIVEN FUNCTION? (Partial) answer: Total Landauer cost if we use circuit C′ rather than C: where g indexes gates. I.e., choose circuit with smallest sum total multi-information of input distributions into its gates. A global optimization problem. � 𝑔𝑔∈𝐶𝐶′ 𝐼𝐼(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 ) − � 𝑔𝑔∈𝐶𝐶 𝐼𝐼(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 )
  • 12. • But total EF is the Landauer cost of the circuit plus its EP: • For fixed P(x1 | x0), e.g., a fixed gate, changing p0 changes EP(p0) So in order to know total EF from a gate in a circuit (and therefore total EF in the circuit): Need to know how EP(p0) depends on p0 EF(p0) = S(p0) − S(p1) + EP(p0)
  • 13. Theorem: For any p0, and any master equation, entropy production is where: • KL(., .) is Kullback-Leibler divergence; • q0(x) is a “prior” built into the gate (a physical parameter); • The sum is over “islands” of the gate’s dynamics; • 𝐸𝐸𝐸𝐸𝑐𝑐(. ) is non-negative for each c (a physical parameter); (Kolchinsky and Wolpert, 2017)
  • 14. • Total EF is the Landauer cost of the circuit plus its EP: • Sum of entropy plus KL divergence is cross-entropy. So: Total entropy flow out of a gate – the thermodynamic cost – is where K(., .) is cross–entropy. EF(p0) = S(p0) − S(p1) + EP(p0)
  • 15. Total entropy flow out of a gate – the thermodynamic cost – is where K(., .) is cross–entropy. Example: An idealized two-ends “wire” maps • (xin, xout) = (b, 0) ➙ (xin, xout) = (0, b) • So regardless of p0, drop in cross-entropies equals 0. • So in an idealized wire, entropy flow is linear in input distribution p0
  • 16. Total entropy flow out of a gate – the thermodynamic cost – is where K(., .) is cross–entropy. Evaluate this cost for each wire and gate in a circuit, and then sum.
  • 17. For simplicity, take EP(.) = 0. Then total entropy flow out of circuit – the thermodynamic cost – is where: • g indexes the circuit’s gates (including “wire gates”) • pa(g) is parent gates of g in circuit (so ppa(g) is joint distribution into g)  Another new circuit design optimization problem
  • 18. OTHER RESULTS • Analysis where do not set EP = 0. • A family of circuits refers to any set of circuits that all “implement the same function”, just for differing numbers of input bits - Circuit complexity theory analyzes how costs (e.g., number of gates) of each circuit in a family of circuits scale with number of input bits. - Extension of circuit complexity theory to include thermodynamics costs. • Analysis for “logically reversible circuits” - circuits built out of Fredkin gates, with enough extra gates added to remove all “garbage bits”.
  • 19. THERMODNAMICS OF TURING MACHINES Kolmogorov complexity of bit string v: Minimum length of an input bit string that causes a given (universal, prefix) Turing machine to compute v and halt.
  • 20. Thermodynamic Kolmogorov complexity of bit string v: Minimum (over all input bit strings) entropy flow for a given Turing machine to compute the bit string v and then halt: where  K(v) is Kolmogorov complexity of v  Z – the normalization constant – is Chaitin’s constant  G(Iv) is expected length of all input strings that compute v (i.e., probability of v under “universal distribution”) K(v) + log[G(Iv)] + log[Z]
  • 21. Thermodynamic Kolmogorov complexity of bit string v: Minimum (over all input bit strings) entropy flow for a given Turing machine to compute the bit string v and then halt: where  K(v) is Kolmogorov complexity of v  Z – the normalization constant – is Chaitin’s constant  G(Iv) is expected length of all input strings that compute v (i.e., probability of v under “universal distribution”) A “correction” to Kolmogorov complexity, reflecting cost of many-to-one maps K(v) + log[G(Iv)] + log[Z]
  • 22. Minimum (over all initial bit strings) entropy flow for a given Turing machine to compute the bit string v and then halt: State of TM Time v A “correction” to Kolmogorov complexity, reflecting cost of many-to-one maps K(v) + log[G(Iv)] + log[Z]
  • 23. Minimum (over all initial bit strings) entropy flow for a given Turing machine to compute the bit string v and then halt: State of TM Time v K(v) is unbounded – no constant exceeds length of {the shortest string to compute v} for all v K(v) { K(v) + log[G(Iv)] + log[Z]
  • 24. Minimum (over all initial bit strings) entropy flow for a given Turing machine to compute the bit string v and then halt: State of TM Time v Minimal EF is bounded – there is a constant that exceeds {minimal work to compute v} for all v: log[sum of lengths of red lines] K(v) + log[G(Iv)] + log[Z] K(v) { = Prior prob.’s of initial TM states
  • 25. CONCLUSIONS • Exact equations for entire entropy flow of a system: EF(p0) = Landauer cost (p0) + EP(p0) • Different circuits, all implementing the same function, all using thermodynamically reversible gates, have different thermodynamic costs. • Lots of future research!
  • 26. • New Wiki: https://centre.santafe.edu/thermocomp Please visit and start to add material! • New book: “The energetics of computing in Life and Machines”, D. Wolpert, C. Kempes, P. Stadler, J. Grochow (Ed.), SFI Press, 2019 • New invited review: “The stochastic thermodynamics of computation”, D. Wolpert, J. Physics A (2019)
  • 27.
  • 28. So entropy flow out of a system – the thermodynamic cost – is where K(., .) is cross–entropy. Applies to: • Each gate in a digital circuit • Each wire in a digital circuit • Each “gate” in a noisy circuit, e.g., in a genetic circuit • Each reaction in a stochastic chemical reaction network
  • 29. OTHER RESULTS • Some sufficient conditions for C to have greater EP than AO(C) • Some sufficient conditions for C to have less EP than AO(C) • Analysis when outdegrees of some gates > 1 • Analysis when prior distributions qg at each gate g are arbitrary. • Analysis accounting for thermodynamic costs of wires
  • 30. Total entropy flow out of circuit – the thermodynamic cost – is where: • g indexes the circuit’s gates • pa(g) is parent gates of g in circuit (so ppa(g) is joint distribution into g) • L(g) is the set of islands of (function implemented by) gate g
  • 31. Total entropy flow out of circuit – the thermodynamic cost – is where: • g indexes the circuit’s gates • pa(g) is parent gates of g in circuit (so ppa(g) is joint distribution into g) • L(g) is the set of islands of (function implemented by) gate g To focus on information theory, assume every EP(qpa(g);c) = 0
  • 32. Total entropy flow out of circuit – the thermodynamic cost – is where: • g indexes the circuit’s gates • pa(g) is parent gates of g in circuit (so p0 pa(g) is joint distribution into g) • For any circuit C, the “All-at-once gate”, AO(C), is a single gate that computes same function as C. Rest of talk: Compare Landauer costs and total EF for C and AO(C)
  • 33. 𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 = ∑𝑔𝑔 𝐾𝐾𝑔𝑔 𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔) − 𝐾𝐾𝑔𝑔 𝑝𝑝 𝑔𝑔, 𝑞𝑞 𝑔𝑔 EF for running circuit C on input distribution p when optimal distribution is q: EF for running AO gate that implements same input-output function as C: 𝐸𝐸𝐸𝐸𝐴𝐴𝐴𝐴(𝐶𝐶) 𝑝𝑝, 𝑞𝑞 = 𝐾𝐾𝑖𝑖 𝑖𝑖 𝑝𝑝𝑖𝑖 𝑖𝑖 , 𝑞𝑞𝑖𝑖 𝑖𝑖 - 𝐾𝐾𝑜𝑜𝑜𝑜𝑜𝑜 𝑝𝑝𝑜𝑜𝑜𝑜𝑜𝑜 , 𝑞𝑞𝑜𝑜𝑜𝑜𝑜𝑜
  • 34. 𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 = ∑𝑔𝑔(𝐾𝐾𝑔𝑔 𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔) − 𝐾𝐾𝑔𝑔 𝑝𝑝 𝑔𝑔, 𝑞𝑞 𝑔𝑔 ) EF for running circuit C on input distribution p when optimal distribution is q: EF for running AO gate that implements same input-output function as C: Thermodynamic penalty / gain by using C rather than AO(C): 𝐸𝐸𝐸𝐸𝐴𝐴𝐴𝐴(𝐶𝐶) 𝑝𝑝, 𝑞𝑞 = 𝐾𝐾𝑖𝑖 𝑖𝑖 𝑝𝑝𝑖𝑖 𝑖𝑖 , 𝑞𝑞𝑖𝑖 𝑖𝑖 - 𝐾𝐾𝑜𝑜𝑜𝑜𝑜𝑜 𝑝𝑝𝑜𝑜𝑜𝑜𝑜𝑜 , 𝑞𝑞𝑜𝑜𝑜𝑜𝑜𝑜 Δ𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 = 𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 − 𝐸𝐸𝐸𝐸𝐴𝐴𝐴𝐴(𝐶𝐶) 𝑝𝑝, 𝑞𝑞 = 𝐼𝐼 𝐾𝐾 𝑝𝑝, 𝑞𝑞 − ∑𝑔𝑔 𝐼𝐼 𝐾𝐾(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔))
  • 35. Thermodynamic penalty / gain by using C rather than AO(C): where IK(p, q) is cross multi-information When p = q, ∆EFC(p, q) cannot be negative - Indeed, it can be +∞. So never an advantage to using a circuit ... if p = q, i.e., if one “guessed right” when designing every single gate. Δ𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 = 𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 − 𝐸𝐸𝐸𝐸𝐴𝐴𝐴𝐴(𝐶𝐶) 𝑝𝑝, 𝑞𝑞 = 𝐼𝐼 𝐾𝐾 𝑝𝑝, 𝑞𝑞 − ∑𝑔𝑔 𝐼𝐼 𝐾𝐾 (𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔) )
  • 36. Thermodynamic penalty / gain by using C rather than AO(C): So never an advantage to using a circuit, if p = q. However in real world, too expensive to build an AO gate. Even if p = q, there are circuits where total (Landauer) cost is infinite! So, even if p = q, if we can’t use an AO gate, what circuit to use to implement a given function? Δ𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 = 𝐸𝐸𝐸𝐸𝐶𝐶 𝑝𝑝, 𝑞𝑞 − 𝐸𝐸𝐸𝐸𝐴𝐴𝐴𝐴(𝐶𝐶) 𝑝𝑝, 𝑞𝑞 = 𝐼𝐼 𝐾𝐾 𝑝𝑝, 𝑞𝑞 − ∑𝑔𝑔 𝐼𝐼 𝐾𝐾(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔))
  • 37. Even worse: if p ≠ q, extra total entropy flow if use C rather than AO(C) is This can be positive or negative In fact, extra EF can be either −∞ or +∞ So, if p ≠ q, what circuit to use to implement a given function? 𝐼𝐼 𝐾𝐾 𝑝𝑝, 𝑞𝑞 − ∑𝑔𝑔 𝐼𝐼 𝐾𝐾(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔)) When – as in real world – we aren’t lucky enough to have gates obey p = q, using a circuit C rather than AO(C) may either increase or decrease total entropy flow out of circuit.
  • 38. However if p ≠ q, extra EP if use C rather than AO(C) is This can be positive or negative −𝐼𝐼 𝐷𝐷 𝑝𝑝, 𝑞𝑞 + ∑𝑔𝑔 𝐼𝐼 𝐷𝐷(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔)) When – as in real world – we aren’t lucky enough to have gates obey p = q, using a circuit C rather than AO(C) may either increase or decrease EP
  • 39. Even worse: if p ≠ q, extra total entropy flow if use C rather than AO(C) is This can be positive or negative In fact, extra EF can be either −∞ or +∞ 𝐼𝐼 𝐾𝐾 𝑝𝑝, 𝑞𝑞 − ∑𝑔𝑔 𝐼𝐼 𝐾𝐾(𝑝𝑝𝑝𝑝𝑝𝑝 𝑔𝑔 , 𝑞𝑞𝑝𝑝𝑝𝑝(𝑔𝑔)) When – as in real world – we aren’t lucky enough to have gates obey p = q, using a circuit C rather than AO(C) may either increase or decrease total entropy flow out of circuit.
  • 40. Consider a (perhaps time-varying) master equation that sends p0(x) to p1(x) = ∑x0 P(x1 | x0) p0(x). • Example: Stochastic dynamics in a genetic network network • Example: (Noise-free) dynamics of a digital gate in a circuit EP(p0) is non-negative (regardless of the master equation) EF(p0) = S(p0) − S(p1) + EP(p0) (See Van Den Broeck and Esposito, Physica A, 2015)
  • 41. Consider a (perhaps time-varying) master equation that sends p0(x) to p1(x) = ∑x0 P(x1 | x0) p0(x). EP(p0) is non-negative (regardless of the master equation) EF(p0) = S(p0) − S(p1) + EP(p0) EF(p0) ≥ S(p0) − S(p1) “𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺 𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝑟𝑟′ 𝑠𝑠 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏
  • 42. Consider a (perhaps time-varying) master equation that sends p0(x) to p1(x) = ∑x0 P(x1 | x0) p0(x). EP(p0) is non-negative (regardless of the master equation). • S(p0) − S(p1) called “Landauer cost” – minimal possible heat flow So far, all math, no physics … EF(p0) = S(p0) − S(p1) + EP(p0) EF(p0) ≥ S(p0) − S(p1) “𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺𝐺 𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝐿𝑟𝑟′ 𝑠𝑠 𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏𝑏
  • 43. EXAMPLE • System evolves while connected to single heat bath at temperature T • Two possible states • p0 uniform • Process implements bit erasure (so p1 a delta function) So generalized Landauer’s bound says Landauer’s conclusion Total heat flow from system ≥ kT ln[2] (Parrondo et al. 2015, Sagawa 2014, Hasegawa et al. 2010, Wolpert 2015, etc.)
  • 44. STATISTICAL PHYSICS APPLICATION Suppose system evolves while connected to multiple reservoirs, e.g., heat baths at different temperatures. Assume “local detailed balance” holds for those reservoirs (usually true) Then: EF(p0) is (temperature-normalized) heat flow into reservoirs S(p0) − S(p1) called “Landauer cost” – minimal possible heat flow EF(p0) = S(p0) − S(p1) + EP(p0) Generalized Landauer′ s bound: Heat flow from system ≥ S(p0) − S(p1)
  • 45. For any p0, and any master equation, • The drop in KL divergence is called mismatch cost • A purely information theoretic contribution to EF • Non-negative; equals 0 iff p0 = q0
  • 46. For any p0, and any master equation, • The average of minimal entropy productions is called residual EP • A non-information theoretic contribution to EF • Non-negative; generally equals 0 only if the process is quasistatic
  • 47. • For any p0, and any master equation, • So total entropy flow out of a system – the thermodynamic cost – is (where K(., .) is cross entropy) EF(p0) = S(p0) − S(p1) + EP(p0)
  • 48. EXAMPLE System successively connected to two heat baths, at temp.’s T1 and T2 < T1 Recall: EF(p0) is sum over each reservoir of (temperature-normalized) total heat flow from system into that reservoir Suppose full cycle, so S(p0) − S(p1) = 0 So generalized Landauer’s bound says 𝑄𝑄𝑜𝑜𝑜𝑜𝑜𝑜(𝑐𝑐𝑐𝑐𝑐𝑐 𝑐𝑐) 𝑇𝑇2 − 𝑄𝑄𝑖𝑖 𝑖𝑖(ℎ𝑜𝑜𝑜𝑜) 𝑇𝑇1 ≤ 0 Clausius’ inequality
• 49. Some simplifications arise if we restrict attention to particular graphical model representations of the distributions. Examples: • A product distribution form of q(x_in), the optimal input distribution • A Bayes net representation of each p_pa(g) (one for each gate g) • A factor graph representation of p(x_in), the actual input distribution
• 50. Aside: state dynamics (computation) versus thermo-dynamics [figure: dynamics of states in bit erasure – the bit v ∈ {0, 1} plotted against time t]
• 53. Aside: state dynamics (computation) versus thermo-dynamics • Non-reversible map [figure as above, annotated with a “?”]
• 54. Aside: state dynamics (computation) versus thermo-dynamics [figure: dynamics of distributions in bit erasure with uniform initial probabilities]
• 55. Aside: logical-dynamics (computation) versus thermo-dynamics • Reversible map [figure as above]
• 56. Total work to run a gate taking p0 to p1: K(p0, q0) − K(p1, q1)
• 57. Example of equation for work applied to a digital gate: • Forward in time
• 59. Example of equation for work applied to a digital gate: • If thermodynamically reversible, running in reverse sends ending distribution back to starting one.
• 61. Example of equation for work applied to a digital gate: • If thermodynamically reversible, running in reverse sends ending distribution back to starting one. • In this case, mismatch cost = 0; total work is the drop in entropy, S(p0) − S(p1)
  • 62. Example: An all-at-once gate to implement some function f over large spaces • Four bit input (16 states), 2 bit output
  • 63. Example: An all-at-once gate to implement some function f over large spaces • Four bit input (16 states), 2 bit output • If thermodynamically reversible, running in reverse sends ending distribution back to starting one. • Always true for (optimized) all-at-once gate
  • 64. Example: A digital circuit that also implements f • Four bit input (16 states), 2 bit output
  • 65. Example: A digital circuit that also implements f • Four bit input (16 states), 2 bit output • Suppose distribution over inputs is not a product of marginals • If thermodynamically reversible, running in reverse sends ending distribution back to starting one.
  • 66. Example: A digital circuit that also implements f • Four bit input (16 states), 2 bit output • Suppose distribution over inputs is not a product of marginals • If thermodynamically reversible, running in reverse sends ending distribution back to starting one. • Not true in general
  • 67. Example: A digital circuit that also implements f • Four bit input (16 states), 2 bit output • Mismatch cost to implement f depends on what circuit used to implement f, in general.
  • 68. Example: A digital circuit that also implements f • Four bit input (16 states), 2 bit output • Mismatch cost to implement f depends on what circuit used to implement f, in general. • Can be lower with a circuit than with an “all at once” (AO) gate.
  • 69. Example: A digital circuit that also implements f • Four bit input (16 states), 2 bit output • Mismatch cost to implement f depends on what circuit used to implement f, in general. • Can be lower with a circuit than with an “all at once” (AO) gate. New kind of circuit design problem
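As a concrete toy illustration of this design problem: the sketch below takes f(x1..x4) = (x1 AND x2, x3 AND x4) – an invented 4-bit-to-2-bit function that factors into two AND gates – and compares the mismatch cost of the two-gate circuit against the all-at-once gate, for a randomly chosen correlated input distribution p and a uniform product prior q. Because each gate acts only on its own input pair, the circuit's total mismatch cost comes out no larger than the AO gate's here, matching the bullet above.

import numpy as np

def kl(p, q):
    # KL divergence D(p || q) in nats
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def push(p, fn, out_size):
    # pushforward of a distribution over range(len(p)) through fn
    out = np.zeros(out_size)
    for x, px in enumerate(p):
        out[fn(x)] += px
    return out

def marginal(p, bits):
    # marginal of p (over bit-string states) onto the given bit positions
    out = np.zeros(2 ** len(bits))
    for x, px in enumerate(p):
        y = sum(((x >> b) & 1) << i for i, b in enumerate(bits))
        out[y] += px
    return out

AND = lambda s: (s & 1) & ((s >> 1) & 1)       # AND gate on a 2-bit state
f = lambda x: AND(x & 3) | (AND(x >> 2) << 1)  # hypothetical 4-bit -> 2-bit function

rng = np.random.default_rng(0)
p = rng.random(16); p /= p.sum()  # correlated actual input distribution
q = np.full(16, 1 / 16)           # product-form prior (uniform)

# Mismatch cost of the all-at-once gate: one drop in KL over the whole map
ao = kl(p, q) - kl(push(p, f, 4), push(q, f, 4))

# Mismatch cost of the circuit: sum the drop in KL at each AND gate
circ = sum(
    kl(marginal(p, bits), marginal(q, bits))
    - kl(push(marginal(p, bits), AND, 2), push(marginal(q, bits), AND, 2))
    for bits in [(0, 1), (2, 3)]
)
print(ao, circ)  # circ <= ao for this factorized f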
• 70. Total entropy flow out of a gate / wire – the thermodynamic cost – is EF = K(p0, q0) − K(p1, q1) + (residual EP), where K(., .) is cross-entropy. Evaluate this cost for each wire and gate in a circuit, and then sum. - Thermodynamic cost for any circuit, not just digital, noise-free circuits So the analysis holds for: - “Noisy circuits,” with noise in gates and/or paths, e.g., genetic circuits - “Reversible circuits,” e.g., made with Fredkin gates
• 71. Some simplifications arise if we restrict attention to particular graphical model representations of the distributions. Shorthand: ∆W_C(p, q) = extra EF using C rather than AO(C); ∆ℇ_C(p, q) = extra mismatch cost using C
• 72. For Bayes net representations of the distributions p_pa(g): - have a separate DAG at each gate g, Γ(g, p) (to define the Bayes nets). The total work cost when p = q reduces to a sum of mutual informations, where: - “in” refers to the joint set of all inputs to the circuit, - for any DAG Γ, “R(Γ)” means its root nodes.
• 73. A factor graph representation of a distribution P(x): P(x) ∝ ∏_i φ_i(ξ_i), where - each factor ξ_i is a subset of the variables in X - each φ_i(ξ_i) is a positive real-valued function • Note that a given variable can occur in more than one factor
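A minimal sketch of this representation in Python (the two factors and their potential tables are invented; note that variable x1 appears in both factors, as the slide allows):

import numpy as np
from itertools import product

# Three binary variables; each factor = (variable subset, potential table)
factors = [
    ((0, 1), np.array([[2.0, 1.0], [1.0, 2.0]])),  # phi_1(x0, x1)
    ((1, 2), np.array([[1.0, 3.0], [3.0, 1.0]])),  # phi_2(x1, x2)
]

def unnormalized(x):
    # P(x) is proportional to the product of the factor potentials
    return np.prod([phi[tuple(x[v] for v in vs)] for vs, phi in factors])

Z = sum(unnormalized(x) for x in product((0, 1), repeat=3))
P = {x: unnormalized(x) / Z for x in product((0, 1), repeat=3)}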
• 74. [figure: circuit diagram; different color curly brackets mark different factors of the factor graph representation of q]
• 75. [figure: same circuit; different color curly brackets mark different factors of the factor graph representation of q] - Each yellow node has its parents statistically independent - Theorem: each yellow node contributes zero to total EF
• 76. EF(p0) = S(p0) − S(p1) + EP(p0) • For fixed P(x1 | x0), changing p0 also changes EP(p0) • So we need to know how EP(p0) depends on p0 in order to fully specify the problem of designing a circuit to minimize total EF (different circuits all implementing the same Boolean function have different sum-total Landauer cost) First new result: EP(p0) is a sum of an information-theoretic term and a non-information-theoretic term. • To present this result, must first define “islands” ...
• 77. An island of a function f : x0 → x1 is any set f⁻¹(x1) for some x1. Examples: 1) A noise-free wire maps (x_in = y, x_out = 0) → (x_in = 0, x_out = y). For binary y, two islands. 2) A noise-free AND gate maps (x1_in = y1, x2_in = y2, x_out = 0) → (0, 0, y1 AND y2). For binary y1, y2, two islands. • Given a fixed master equation implementing f, for each island c, define the prior distribution q_c(.) := argmin_r EP(r) – the initial distribution that results in the best possible thermodynamic efficiency for the (implicit) master equation.
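The island construction is easy to state in code; a small sketch (the gate encoding is just the slide's AND-gate example):

from collections import defaultdict

def islands(f, domain):
    # partition the domain into the sets f^-1(x1), one per output value x1
    pre = defaultdict(set)
    for x0 in domain:
        pre[f(x0)].add(x0)
    return list(pre.values())

# Noise-free AND gate: (y1, y2, 0) -> (0, 0, y1 AND y2)
gate = lambda s: (0, 0, s[0] & s[1])
domain = [(y1, y2, 0) for y1 in (0, 1) for y2 in (0, 1)]
print(islands(gate, domain))  # two islands, as in example 2 above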
• 78. Total entropy flow out of a gate / wire – the thermodynamic cost – is EF = K(p0, q0) − K(p1, q1) + (residual EP), where K(., .) is cross-entropy. Examples: Noise-free wire: (x_in = y, x_out = 0) → (x_in = 0, x_out = y) - Landauer cost = mismatch cost = 0; drop in cross-entropy = 0 - Residual EP captures all the dependence of the heat generated by using a wire on the distribution of signals through it
• 79. Total entropy flow out of a gate / wire – the thermodynamic cost – is EF = K(p0, q0) − K(p1, q1) + (residual EP), where K(., .) is cross-entropy. Examples: Noise-free AND gate: (x1_in = y1, x2_in = y2, x_out = 0) → (0, 0, y1 AND y2) - Neither Landauer cost nor mismatch cost = 0 - Drop in cross-entropy ≠ 0; depends on p0
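To see the p0-dependence concretely, here is a sketch computing the AND gate's entropy drop S(p0) − S(p1) – its Landauer cost – for two input distributions (both invented for illustration):

import numpy as np

def entropy(p):
    # Shannon entropy in nats
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def and_landauer_cost(p_in):
    # p_in is over the inputs 00, 01, 10, 11; the output bit is 1 only for 11
    p_out = [p_in[0] + p_in[1] + p_in[2], p_in[3]]
    return entropy(p_in) - entropy(p_out)

print(and_landauer_cost([0.25, 0.25, 0.25, 0.25]))  # uniform inputs: ~0.82 nats
print(and_landauer_cost([0.05, 0.05, 0.05, 0.85]))  # skewed inputs: ~0.16 nats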
• 80. Total entropy flow out of a gate / wire – the thermodynamic cost – is EF = K(p0, q0) − K(p1, q1) + (residual EP), where K(., .) is cross-entropy. Evaluate this cost for each wire and gate in a circuit, and then sum. Lots of new information theory comes out, relating: • The topology of the circuit • The distribution over the circuit's inputs • The thermodynamic cost of that circuit when run with that distribution
• 81. Total entropy flow out of a circuit – the thermodynamic cost – is EF_C(p, q) = ∑_g [K_g(p_pa(g), q_pa(g)) − K_g(p_g, q_g)], where: • g indexes the circuit's gates • pa(g) is the set of parent gates of g in the circuit (so p_pa(g) is the joint distribution into g) • ∀g, the distributions p_g and p_pa(g) are given by propagating the input distribution p_in through the circuit • Nitty-gritty physical details of each gate g determine q_pa(g) and q_g • To focus on the information theory, assume ∀g that q_pa(g) and q_g are given by propagating a joint input prior distribution q_in through the circuit
• 82. For simplicity, assume the outdegree of each gate is 1 (a “Boolean formula”). EF for running circuit C on input distribution p when the optimal distribution is q: EF_C(p, q) = ∑_g [K_g(p_pa(g), q_pa(g)) − K_g(p_g, q_g)] EF for running an AO gate that implements the same input-output function as C: EF_AO(C)(p, q) = K_in(p_in, q_in) − K_out(p_out, q_out)