05-01-2013 3rd BCGL Conference 1/22
Resilience in Transaction-Oriented
Networks
Dmitry Zinoviev*, Hamid Benbrahim, Greta Meszoely+, Dan Stefanescu*
*Mathematics and Computer Science Department
+Sawyer School of Management
Suffolk University, Boston
Outline
Transaction-oriented networks
Network model and its interpretations
Simulation results:
Dense and sparse networks
Throughput amplification
Equivalence of excessive traffic and faulty nodes
Network as a four-phase matter
Conclusion and future work
Transaction-Oriented Networks
Used to execute distributed transactions (compound operations that succeed or fail
atomically)
Interpretations:
Distributed database transactions (original, HPC-related interpretation)
Financial transactions (e.g., loans)
Transportation (e.g., multi-leg flights)
How resilient are these networks to externally and internally induced failures?
Network Model Overview
Random Erdős–Rényi network, N=1,600 identical nodes representing network hosts, density d.
Each node can simultaneously execute up to C almost independent subtransactions. Each subtransaction takes constant time τ0 to complete. The network is simulated for the duration of Sτ0.
Each node can be used for injecting transactions into the network and for
terminating transactions. Transactions are injected uniformly across the network.
The delays between subsequent transactions are drawn from the exponential
distribution E(1/r).
Each transaction has L=N(10,4) subtransactions.
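The model above can be sketched in a few lines of Python. This is an illustration only, not the authors' C++ simulator: the function names `erdos_renyi` and `injection_schedule` are hypothetical, E(1/r) is read as an exponential distribution with mean 1/r, and the 4 in N(10,4) is assumed to be the variance (standard deviation 2).

```python
import random

def erdos_renyi(n, d, rng):
    """Adjacency sets of a random graph where each node pair is linked with probability d."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < d:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def injection_schedule(n, r, horizon, rng):
    """Yield (time, entry_node, length) tuples for transactions injected before `horizon`."""
    t = rng.expovariate(r)            # delays between injections have mean 1/r
    while t < horizon:
        entry = rng.randrange(n)      # injection is uniform across the network
        # Assumption: N(10, 4) means variance 4; lengths are rounded and floored at 1.
        length = max(1, round(rng.gauss(10, 2)))
        yield t, entry, length
        t += rng.expovariate(r)

rng = random.Random(42)
adj = erdos_renyi(200, 0.05, rng)     # smaller than N=1,600 to keep the demo fast
schedule = list(injection_schedule(200, r=2.0, horizon=100.0, rng=rng))
```

With r=2.0 and a horizon of 100 time units, the schedule holds roughly 200 transactions, each entering at a random node.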
Opportunistic Routing
The node for the next subtransaction is chosen uniformly at random from all
neighbors of the current node.
If the next node is disabled, then another neighbor is chosen.
If all neighbors are disabled, the subtransaction is aborted, and the master
transaction rolls back.
If a transaction is aborted, all other transactions that crossed paths with it in the past T time units (T=100τ0) are also aborted with probability p0=0.01.
We observed very little dependence of the simulated network measures on p0.
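The routing and collateral-abort rules can be sketched as follows. The helper names `next_node` and `collateral_aborts` and the `crossings` bookkeeping are hypothetical; retrying until an enabled neighbor is found is modeled here as one uniform draw over the enabled neighbors, which gives the same distribution.

```python
import random

def next_node(current, adj, disabled, rng):
    """Pick a uniformly random enabled neighbor of `current`, or None if none is left."""
    candidates = [v for v in sorted(adj[current]) if v not in disabled]
    if not candidates:
        return None                   # all neighbors disabled: the subtransaction aborts
    return rng.choice(candidates)

def collateral_aborts(crossings, now, T, p0, rng):
    """Transactions aborted alongside a failed one.

    `crossings` maps another transaction's id to the time it last crossed
    paths with the aborted transaction (an assumed bookkeeping structure).
    """
    return {txn for txn, t in crossings.items()
            if now - t <= T and rng.random() < p0}

rng = random.Random(1)
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
only_choice = next_node(0, adj, disabled={1, 3}, rng=rng)     # node 2 is the only option
dead_end = next_node(0, adj, disabled={1, 2, 3}, rng=rng)     # None: triggers a rollback
```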
Node Shutdown
When a node is overloaded (load > C), it shuts down.
A node may fail randomly after an initial delay drawn from the exponential distribution E(Tf).
Once disabled, a node is not restarted. All subtransactions currently executed at a
disabled node are aborted.
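A minimal sketch of the shutdown rules, assuming a hypothetical `Node` class that tracks running subtransactions in a set, and reading E(Tf) as an exponential distribution with mean Tf:

```python
import random

class Node:
    """A host that runs up to `capacity` subtransactions and never restarts once disabled."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.running = set()
        self.enabled = True

    def accept(self, sub_id):
        """Start a subtransaction; return the set of subtransactions aborted as a result."""
        if not self.enabled:
            return {sub_id}
        self.running.add(sub_id)
        if len(self.running) > self.capacity:   # overload: load > C
            return self.shut_down()
        return set()

    def finish(self, sub_id):
        self.running.discard(sub_id)

    def shut_down(self):
        """Disable the node for good and abort everything it was executing."""
        self.enabled = False
        aborted, self.running = self.running, set()
        return aborted

def failure_delay(Tf, rng):
    """Delay before a spontaneous fault, with mean Tf (assumed reading of E(Tf))."""
    return rng.expovariate(1.0 / Tf)

node = Node(capacity=2)
first = node.accept(1)      # accepted, nothing aborted
second = node.accept(2)     # accepted, node is now at capacity
overload = node.accept(3)   # load exceeds C: the node shuts down
```

The third `accept` call pushes the load above C, so all three subtransactions are aborted and any later subtransaction is rejected.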
Simulation Framework
Custom-built network simulator in C++
In each experiment, the network has been simulated for a variety of combinations
of node capacities and densities (C, d):
d ∈ {0.01, 0.011, 0.015, 0.025, 0.04, 0.055, 0.075, 0.1, 0.2, 0.3, 0.5, 0.6, 0.75, 0.85, 0.99}
C ∈ {2, 3, 4, ..., 22}
Red color indicates sparse networks (they behave differently from the dense networks)
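For a sense of the sweep size, the parameter grid can be enumerated as below. The variable names are illustrative; the mean-degree formula d(N−1) is the standard expectation for an Erdős–Rényi graph, not a figure taken from the slides.

```python
from itertools import product

N = 1600
DENSITIES = [0.01, 0.011, 0.015, 0.025, 0.04, 0.055, 0.075, 0.1,
             0.2, 0.3, 0.5, 0.6, 0.75, 0.85, 0.99]
CAPACITIES = range(2, 23)                         # C = 2, 3, ..., 22

grid = list(product(CAPACITIES, DENSITIES))       # every (C, d) combination
mean_degree = {d: d * (N - 1) for d in DENSITIES} # expected number of neighbors per node

print(len(grid))                                  # 21 capacities x 15 densities = 315
```

Even at the lowest density, d=0.01, a node has about 16 neighbors on average.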
Failing by Overloading
Start with a fully functional network.
Gradually increase the injection rate from 0 to r0, until at least 10⁻⁶ of all transactions abort (superconductive mode ⇒ resistive mode).
The fraction of aborted transactions monotonically increases, until at some rate r1 the network chokes (resistive mode ⇒ dielectric mode).
Define ρ0 = r0 / r1.
r0 and r1 depend slightly on the simulation running time. Our results have been obtained for S=84,600τ0 (“one day”).
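The search for the two threshold rates can be sketched as a scan over increasing injection rates. Everything here is hypothetical: `find_thresholds` is an illustrative helper, choking is taken to mean that every transaction aborts, and `toy_abort_fraction` is a made-up stand-in for a full simulation run.

```python
def find_thresholds(abort_fraction, rates, eps=1e-6):
    """Scan increasing rates; return (r0, r1), or None for a threshold not reached.

    r0: first rate at which at least `eps` of all transactions abort.
    r1: first rate at which all transactions abort (assumed meaning of "chokes").
    """
    r0 = r1 = None
    for r in rates:
        f = abort_fraction(r)
        if r0 is None and f >= eps:
            r0 = r
        if f >= 1.0:
            r1 = r
            break
    return r0, r1

def toy_abort_fraction(r):
    """Made-up response curve: no aborts up to r=1, all transactions abort from r=4."""
    return min(1.0, max(0.0, (r - 1.0) / 3.0))

rates = [0.5 * k for k in range(1, 11)]           # 0.5, 1.0, ..., 5.0
r0, r1 = find_thresholds(toy_abort_fraction, rates)
ratio = r0 / r1
```

On this toy curve the scan reports r0=1.5 and r1=4.0, so the ratio r0/r1 is 0.375. In a real experiment each call to `abort_fraction` would be one complete simulation.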
Phase Transition Injection Rates
[Figure: phase-transition injection rates r0 and r1 for dense networks and for smaller densities d]
Quadratic Amplification
Both r0(C) and r1(C) can be approximated by a power function:
$$r_{0,1}(C) \approx A_{0,1}\,(C-2)^{\alpha_{0,1}}$$
The exponents αi for the dense networks are ~1.7 and ~2.1, respectively. Both αi's tend to 1 as d tends to 0.
The prefactors Ai for the dense networks are ~0.7 and ~2.8, respectively. Both Ai increase and possibly diverge as d tends to 0.
Doubling node capacity almost quadruples the throughput.
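The "almost quadruples" claim can be checked numerically. The sketch assumes the power law has the form A(C−2)^α, with `alpha` standing for the exponent whose symbol is lost in the slide text; the function names are illustrative.

```python
def rate(C, A, alpha):
    """Threshold injection rate under the assumed power law A * (C - 2) ** alpha."""
    return A * (C - 2) ** alpha

def doubling_gain(C, alpha):
    """Factor by which the rate grows when capacity goes from C to 2C (A cancels out)."""
    return ((2 * C - 2) / (C - 2)) ** alpha

# For large C the offset of 2 stops mattering and the gain approaches 2 ** alpha.
gain_r0 = 2 ** 1.7     # dense-network exponent quoted for r0
gain_r1 = 2 ** 2.1     # dense-network exponent quoted for r1
```

With the dense-network exponents, the asymptotic gain is about 3.2 for r0 and about 4.3 for r1. At small capacities the offset of 2 makes the gain larger than 2^α.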
Failing by Internal Faults
Start with a fully functional network.
Gradually increase the injection rate from 0 to r0.
At the fixed injection rate, fail random nodes after random delays.
Let m0 be the smallest fraction of failed nodes that causes the network to choke.
Faulty Nodes Effect
Estimation of m0:
$$m_0(C) \approx \frac{(A-1)\,\operatorname{erf}\!\left(\frac{\log(C-2)}{\sigma} - \mu\right) + A + 1}{2}$$
For the dense networks, A tends to the range [0, 0.23].
That is, it takes no more than 23% of internally faulty nodes to choke a dense network with infinite buffer space in the presence of the highest superconductive injection rate.
Failing by Overloading and Internal
Faults
Start with a fully functional network.
Gradually increase the injection rate from 0 to r(ρ) and simultaneously fail random nodes after random delays, until the network chokes.
Equivalence of Excessive Traffic and
Node Failures
[Figure: equivalence of excessive traffic and node failures; the dense networks are labeled]
To a first approximation, the relationship between the network resilience parameters ρ0 and m0 is almost linear, with a slope of −1.
Tolerating additional superconductive traffic Δρ0 is equivalent to disabling extra network nodes Δm0 due to internal faults:
$$\Delta\rho_0 \approx -\Delta m_0$$
A Closer Look at the Resistive Phase
[Figure: the resistive phase between r0 and r1, with an unexplained feature marked “???”]
What Happens around the “knee”?
The “knee” is visible only in sparse networks
Network state at the end of the simulation run: red circles correspond to faulty nodes, cyan circles to healthy nodes
How Many Nodes Are in the GC?
Percentage of faulty (red) and healthy (blue) nodes in the respective giant component for various r's
The phase transition happens when all faulty nodes join the giant component
Two “resistive” phases: “resistive-A” (truly “resistive”) and “resistive-B” (“resistive-dielectric”)
[Figure: percentage of nodes in the giant component (0% to 100%) vs. injection rate r (0.0 to 4.5); annotation: “All faulty nodes join the giant component!”]
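The giant-component measurement can be sketched with a breadth-first search over the subgraph induced by one class of nodes. This assumes that "respective giant component" means the largest connected component formed by faulty nodes only (or by healthy nodes only); `components` and `giant_fraction` are illustrative names.

```python
from collections import deque

def components(nodes, adj):
    """Connected components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:   # stay inside the induced subgraph
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def giant_fraction(nodes, adj):
    """Fraction of `nodes` that belong to their largest connected component."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    return max(len(c) for c in components(nodes, adj)) / len(nodes)

# Toy network: a path 0-1-2-3-4 plus an isolated node 5.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: set()}
faulty = {0, 1, 3}
healthy = set(adj) - faulty
faulty_share = giant_fraction(faulty, adj)     # {0, 1} is the largest faulty component
healthy_share = giant_fraction(healthy, adj)   # healthy nodes 2, 4, 5 are all isolated
```

A value of 1.0 for the faulty nodes would correspond to the situation the slide describes, where all faulty nodes have joined one giant component.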
Conclusion
Random transactional networks can stay in four phases of interest: “superconductive” (no transactions fail), “resistive-A” and “resistive-B” (some transactions fail), and “dielectric” (all transactions fail)
Injection rates associated with the phase transitions scale almost quadratically with respect to the node capacity
At the resistive-to-dielectric phase transition, the effects of excessive network load and internal, spontaneous, and irreparable node faults are equivalent and almost perfectly anticorrelated
The phase transition between the two “resistive” phases can be attributed to the evolution of the giant component of faulty nodes