System Damage Mitigation via Absorption Provenance
Sumaya Almanee, Micah Sherr, and Wenchao Zhou
Georgetown University
Abstract
Modern computing systems are often treated as black boxes with limited visibility into system events and data items, especially those derived in response to updates on inputs or configurations. Such visibility, however, is crucial for many system administration tasks, including network diagnostics, identifying malicious nodes, damage assessment, and accountability and policy enforcement. This has led to a series of advances that provide better network support for accountability and efficient mechanisms for tracing packets and information flows in distributed systems. In this paper we explore two techniques for detecting and removing the effects of an identified fault in distributed systems: incremental view maintenance, which deletes incorrect inputs along with the system states transitively derived from them, and DMAP, our proposed alternative fault removal technique, which maintains metadata with each system state that decides accurately whether that state should be removed in response to the deletion of an input.
1 Introduction
Modern computing systems are by nature intangible and not directly visible. Visibility into such systems – especially distributed systems – is nevertheless crucial, particularly when it comes to enforcing accountability and security policies.
Network operators often find themselves needing to answer forensic and diagnostic questions. When an unexpected event occurs in a system – such as the discovery of a suspicious entry in a database or a malicious rule in a router’s routing table – the effects and causes of that event must be determined in order to repair the system and restore it to a correct state. This is where network provenance becomes important.
Data provenance is not a new concept; it has been successfully applied to many different areas, including collaborative databases [7], cloud computing [10], and distributed systems in general [18, 20, 19]. It is mainly used to answer questions about the causes or effects of data items.
Network provenance can be utilized to answer forensic questions, perform network diagnostics, identify malicious nodes and misbehaving users, and enforce accountability and trust management policies in distributed systems.
In this paper we explore two techniques for detecting and removing the effects of an identified fault in distributed systems. A main challenge for damage mitigation is deciding the minimum number of states that need to be rederived; mitigation may potentially involve most, if not all, nodes in a system.
We adopt a provenance model that builds upon our prior work on provenance [18, 20, 19] to perform damage assessment and provide mitigation strategies. There are two approaches to revert a system to a correct state: (1) incremental view maintenance, which deletes incorrect inputs along with the system states transitively derived from them (already implemented in [20]); and (2) DMAP (Damage Mitigation via Absorption Provenance), our proposed scheme, which maintains metadata [12] with each system state that decides accurately whether that state should be removed in response to the deletion of an input.
We argue that our proposed scheme leads to a significant reduction in network update cost as compared with incremental view maintenance, owing to DMAP’s ability to eliminate the need for recomputations.
Section 2 presents background on network provenance and declarative networking; we also discuss the previous damage mitigation technique – incremental view maintenance – and point out some of its drawbacks. In Section 3 we propose DMAP, an alternative fault removal technique. We conclude with implementation and evaluation results in Section 4.
2 Background and Related Work
As background, we briefly introduce network provenance. Data provenance [3] is not a new concept; it has been extensively adopted in many different contexts. In databases, for example, provenance (or lineage) is a well-studied concept mainly used to answer questions about the derivations of query results and to investigate which data sources they originated from.
Network provenance [17] is utilized to answer forensic questions (how and why) about message traversal paths, how data messages were derived, and which parties were involved in those derivations. It describes the history and derivations of network states resulting from the execution of distributed protocols. A network’s ability to support forensic analysis is essential to a number of network management tasks, including identifying malicious nodes, performing network diagnostics, and enforcing trust management policies.
Previous work [20] has captured network provenance as
a global dependency graph where vertices represent sys-
tem states at a particular node and edges represent mes-
sages across nodes. For example, Figure 1 depicts a net-
work topology and the corresponding provenance graph
for bestPathCost(@a,c,5) (computed by MINCOST). Ovals
represent the rule execution vertices while the link, path-
Cost and bestPathCost vertices represent the intermediate
computation results. The graph encodes how tuples are
generated during the execution of the MINCOST query. For instance, pathCost(@b,c,2) is generated from rule sp1 at node b, taking link(@b,c,2) as an input.
Figure 1: Example network topology (left) and the corresponding provenance graph for bestPathCost(@a,c,5)
Provenance information is usually represented in relational tables in a format similar to that used in existing work [5, 6, 11]. Specifically, provenance can be stored in two tables [20] – prov and ruleExec – that are distributed across all network nodes and partitioned on the location specifier Loc. Each entry in the prov table represents a direct derivation of a tuple and is of the form prov(@Loc, VID, RID, RLoc), indicating that the tuple vertex VID at location Loc is directly derivable from the rule execution vertex RID for a rule located at RLoc. The ruleExec table stores the actual metadata of the rule execution and is of the form ruleExec(@RLoc, RID, R, VIDList), indicating that the rule execution RID located at node RLoc is directly derived from the tuple vertices VIDList. Figure 2 presents the prov and ruleExec relations that correspond to the provenance graph in Figure 1.
Figure 2: Example prov and ruleExec relations corresponding to the provenance graph in Figure 1
Network provenance is maintained through the use of Declarative Networking, which models network protocols as continuous queries over distributed streams. Declarative query languages such as NDlog provide a compact way to implement a variety of routing protocols and overlay networks. For more information on Declarative Networking and its query languages, refer to [13, 14, 15].
2.1 Incremental View Maintenance
In incremental view maintenance, provenance maintains sufficient information to reproduce the system execution trace. This is beneficial when applying modifications: system inputs that are marked for deletion are removed along with the system states derived from them; updated tuples are then inserted and the system state is re-derived.
Incremental view maintenance with provenance is similar to the standard DRed [9] algorithm for incremental view update in databases. DRed works by first over-deleting tuples and then reproducing those tuples that might have alternative derivations. Previous work [12] has shown that DRed can be very expensive, especially when a large number of deletions or updates are performed. An example in [12] demonstrates how deleting a single link in the Reachable view resulted in the deletion and then the rederivation of all tuples, including those that were still derivable after the deletion took place; that is, the cost of deleting one link equaled that of computing the entire Reachable view from scratch. Moreover, it has been noted that a system, in the revocation phase, might reproduce a system state that should not be revoked [9].
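To make the over-delete/re-derive behavior concrete, here is a minimal Python sketch of DRed’s two phases for the transitive-closure view; the encoding (edge pairs, naive fixpoint) is ours and only approximates the algorithm in [9].

    # Reachable view: reach(x,y) :- link(x,y).
    #                 reach(x,y) :- link(x,z), reach(z,y).

    def derive(links):
        """Naive fixpoint evaluation of the recursive Reachable view."""
        reach = set(links)
        while True:
            new = {(x, y) for (x, z) in links for (w, y) in reach if w == z}
            if new <= reach:
                return reach
            reach |= new

    def dred_delete(links, deleted):
        """DRed: over-delete, then re-derive what still holds."""
        old = derive(links)
        a, b = deleted
        # Phase 1 (over-deletion): delete every fact with *some* derivation
        # through the deleted link -- every (x, y) where x reaches a and b
        # reaches y -- even if an alternative path survives.
        over = {(x, y) for (x, y) in old
                if (x == a or (x, a) in old) and (y == b or (b, y) in old)}
        survivors = old - over
        # Phase 2 (re-derivation): recompute what is still derivable; in
        # the worst case this is as expensive as rebuilding the whole view,
        # which is exactly the cost problem described above.
        return survivors | derive(links - {deleted})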
Incremental view maintenance has been applied to provenance in previous work [20]: whenever a base tuple is deleted, all derivations resulting from the NDlog [13] rules are incrementally removed, resulting in cascading deletions of the corresponding prov and ruleExec entries in the provenance graph.
In the following section we introduce our main contribu-
tion: an alternative damage mitigation strategy that elimi-
nates the need for re-computations.
3 Damage Mitigation via Absorption
Provenance
One major challenge associated with the previous damage mitigation technique is handling deletions of tuples. Incremental view maintenance works by first deleting the detected bad inputs and the system states affected by them, and then inserting updated inputs and re-deriving the system state.
In this section we propose DMAP (Damage Mitigation via Absorption Provenance), a damage mitigation technique that eliminates the recomputations imposed by incremental view maintenance. DMAP works by maintaining metadata [12] with each system state that decides accurately whether a system state should be removed in response to the deletion of an input.
Our proposed scheme implements a previously proposed provenance model known as absorption provenance [12]. The benefits of adopting this model in DMAP are twofold: (1) it is easy to determine whether an intermediate tuple should be removed in response to the removal of a base tuple; and (2) it reduces network and computation costs by propagating updates from one node to another only when necessary.
Figure 3 demonstrates how deletion is handled in DMAP: each tuple in a provenance graph is annotated with a boolean expression. When base tuples are inserted, each tuple is assigned a boolean variable with an initial value of True. These boolean variables are then propagated to the remaining derived tuples. In our scheme, deletions of derived tuples are directly caused by deletions of base tuples. Once a base tuple is deleted, its associated boolean variable is reset to False and the appropriate updates are propagated to the derived tuples. If applying the absorption rule to a tuple’s boolean expression (computed using the BDD "restrict" operation [4]) yields False, we remove the tuple; otherwise, it remains derivable. In other words, a tuple remains derivable iff its expression can still evaluate to True.
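The following Python sketch reproduces this deletion step using the pyeda BDD library [4]. The annotations follow the running MINCOST example; the dictionary encoding of the provenance graph is ours, a minimal illustration rather than DMAP’s actual implementation.

    from pyeda.inter import bddvar, bdd2expr

    p1, p2, p3 = bddvar("p1"), bddvar("p2"), bddvar("p3")

    # Absorption provenance (pv) of each tuple, represented as ROBDDs.
    pv = {
        "link(@b,c,2)":         p1,  # base tuples carry fresh variables
        "link(@b,a,3)":         p2,
        "link(@a,c,5)":         p3,
        "pathCost(@b,c,2)":     p1,  # derived via sp1 at node b
        "bestPathCost(@b,c,2)": p1,  # derived via sp3 at node b
        "bestPathCost(@a,c,5)": (p1 | p2) & p3,  # from the running example
    }

    # Deleting link(@b,c,2) resets p1 to False; the BDD "restrict"
    # operation then applies the absorption rule automatically.
    for tup, f in pv.items():
        g = f.restrict({p1: 0})
        if g.is_zero():
            print("remove:", tup)                   # no derivation survives
        else:
            print("keep:", tup, "->", bdd2expr(g))  # e.g. p2 & p3 survives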
(a) Each base tuple is annotated with a boolean variable (p1, p2
and p3 in red) with an initial value of True. Remaining provenance
expressions are computed for each derived tuple (shown in the
right table in pv column)
(b) When link(@b,c,2) is deleted, p1 is reset to false in all derived
tuples. The table on the right represents the provenance expres-
sions after p1 is set to False
Figure 3: The DMAP algorithm applied to base and derived tuples generated by MINCOST. The pv column contains the absorption provenance of each tuple
To implement the absorption rule, we utilize Reduced Ordered Binary Decision Diagrams (ROBDDs) [2]. An ROBDD is not only a natural way of representing boolean expressions in provenance; it also reduces unsatisfiable functions to the constant zero. In other words, absorption is applied automatically in an ROBDD. Our implementation of DMAP uses one of the many available optimized BDD libraries, pyeda [4], which provides symbolic boolean algebra with a selection of function representations including logic expressions, truth tables, and ROBDDs.
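A quick sanity check of this property (our example; it relies on pyeda interning ROBDD nodes, so equivalent functions are the identical Python object):

    from pyeda.inter import bddvar

    x, y = bddvar("x"), bddvar("y")
    assert (x | (x & y)) is x    # absorption is applied on construction
    assert (x & (x | y)) is x    # the dual absorption law
    assert ((x & ~x) | y) is y   # contradictions reduce to zero and vanish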
DMAP’s ability to eliminate the need for over-deletions and re-derivations leads to a significant reduction in a network’s update cost as compared with incremental view maintenance. The following section evaluates the performance of the two damage mitigation techniques.
4 Evaluation
In this section, we evaluate the scheme proposed in Section 3. The goals of our evaluation are (1) to show that our proposed scheme consistently outperforms cascading deletion by a significant margin; and (2) to confirm that the total update cost grows gradually with update ratio and topology size.
4.1 Implementation and Experimental Setup
Our proposed scheme is implemented on RapidNet [16], a development toolkit for the simulation and implementation of network protocols. RapidNet compiles NDlog programs into applications that are executed by the ns-3 [1] runtime. ns-3 is a discrete-event network simulator that provides a platform for tests that are difficult to conduct on real systems, and it facilitates the evaluation of system behavior in a highly controlled environment. For our simulation experiments, we generate transit-stub topologies using the GT-ITM topology generator [8]. Each graph has one transit domain consisting of x transit nodes (where x ranges from 1 to 15). Each transit node is connected to 3 stub domains, with no extra transit-stub or stub-stub edges, and every stub domain contains 4 nodes. We increase the number of nodes in the network by increasing the number of transit nodes. Figure 4 shows two graphs generated by the GT-ITM topology generator; in each, we adjusted the number of transit nodes to scale the total number of nodes in the topology.
(a)
(b)
Figure 4: The network in (a) consists of 2 transit nodes per domain, 3 stub domains per transit node, and 4 nodes per stub domain, for a total of 1 · 2 · (1 + 3 · 4) = 26 nodes. Increasing the number of transit nodes in (a) from 2 to 4 yields the topology in (b), which consists of 1 · 4 · (1 + 3 · 4) = 52 nodes in total
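The node count follows directly from the topology parameters; here is a short check of the arithmetic (our helper, not part of GT-ITM):

    def total_nodes(transit_nodes, stub_domains=3, stub_nodes=4, domains=1):
        # each transit node contributes itself plus all of its stub nodes
        return domains * transit_nodes * (1 + stub_domains * stub_nodes)

    assert total_nodes(2) == 26  # topology (a)
    assert total_nodes(4) == 52  # topology (b)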
The provenance graphs that correspond to the generated topologies are, of course, much greater in size. Figure 5 shows the size of each topology tested in our deployment experiments along with the size of its corresponding provenance graph.
Figure 5: The sizes of the topologies used in our experiments along with the sizes of their corresponding provenance graphs
DMAP is divided into two engines connected by a bi-directional pipe. The Provenance Engine acts as DMAP’s decision-making engine: it identifies the extent of damage across the network (i.e., the consequences of a malicious node) and determines which tuples need to be deleted in order to restore the system to its correct state. The Deletion Engine, on the other hand, is responsible for carrying out the actual deletion. Although DMAP can be easily integrated into existing distributed systems, for simplicity we apply our proposed scheme to MINCOST [20], which, upon execution, generates streams of link, pathCost and bestPathCost tuples that are joined at different nodes to compute the best (lowest-cost) paths between pairs of nodes. The three-rule MINCOST program is shown in Figure 6.
Figure 6: The MINCOST program in NDlog
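(The figure itself is not reproduced here. Following the shortest-path programs in the declarative networking literature [13, 20], and consistent with the derivations in the running example, the three rules take roughly the following form; this is our reconstruction, not the original figure:

    sp1 pathCost(@S,D,C)          :- link(@S,D,C).
    sp2 pathCost(@D,Z,C1+C2)      :- link(@S,D,C1), pathCost(@S,Z,C2).
    sp3 bestPathCost(@S,D,min<C>) :- pathCost(@S,D,C).

Rule sp1 seeds one-hop path costs from links, sp2 combines a link stored at node S with a path from S to Z and ships the extended path to the neighbor D, and sp3 aggregates the minimum cost per source-destination pair.)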
MINCOST Example: The following scenario further illustrates the exact functions of DMAP, depicted in Figure 7. (The graph and table examples are similar to those used in existing work [20].)
1. When executed, MINCOST generates streams of link, pathCost and bestPathCost tuples and stores them in a relational database. The Deletion Engine retrieves the prov and ruleExec relations and sends them to the Provenance Engine along with the identified malicious nodes.
2. The Provenance Engine converts the received string streams into a graph representation (refer to Section 4.1 of [20] for details on how the graph is constructed from the prov and ruleExec relations). Ovals represent the rule execution vertices, while the link, pathCost and bestPathCost vertices represent the intermediate computation results. The Provenance Engine then annotates every tuple in the graph with a boolean expression: base tuples (link(@b,c,2), link(@b,a,3) and link(@a,c,5)) are assigned boolean variables (p1, p2 and p3, respectively), each with an initial value of True. These boolean variables are then propagated to the remaining derived tuples. For instance, p1 and p2 propagate to the successors of link(@b,c,2) and link(@b,a,3) until they reach the ruleExec vertex sp2@b, which is assigned the boolean expression p1 OR p2.
3. The provenance variable (p1) annotating the tuple link(@b,c,2) from the identified malicious node is reset to False, indicating that the tuple must be deleted from the provenance graph. The value of p1 is then substituted into the boolean expressions of the remaining tuples. If a tuple’s boolean expression evaluates to False, it is removed; otherwise, it remains derivable. For example, zeroing out p1 resets the boolean variables of sp1@b, pathCost(@b,c,2), sp3@b and bestPathCost(@b,c,2) to False, eliminating them from the provenance graph. On the other hand, applying the absorption rule (described in Section 3) to the boolean expressions (p1 OR p2) and [(p1 OR p2) AND p3] yields p2 and (p2 AND p3), respectively. (A sketch of this annotation-and-deletion pass appears after this list.)
4. The tuples to be deleted are sent back to the Deletion Engine via the bi-directional pipe.
5. Once the Deletion Engine receives the tuples to be
deleted, it parses them into a readable format and then
removes them statically from the provenance table.
Note that the cascading deletion trigger (described in
Section 2.1) is deactivated in our evaluation experi-
ment.
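The Python sketch below distills steps 2 and 3 (our illustration, not the engine’s actual code). It assumes the standard absorption-provenance composition of [12]: a rule execution joins its body tuples with AND, and a tuple combines its alternative derivations with OR.

    from pyeda.inter import bddvar

    def annotate(base_vars, rule_inputs, tuple_rules):
        """base_vars: base tuple -> fresh BDD variable.
        rule_inputs: ruleExec vertex -> list of its body tuples.
        tuple_rules: derived tuple -> list of ruleExec vertices that
        derive it, supplied in a topological (bottom-up) order."""
        pv = dict(base_vars)
        for tup, rules in tuple_rules.items():
            for r in rules:                     # rule execution: AND of inputs
                expr = pv[rule_inputs[r][0]]
                for t in rule_inputs[r][1:]:
                    expr &= pv[t]
                pv[r] = expr
            expr = pv[rules[0]]                 # tuple: OR of its derivations
            for r in rules[1:]:
                expr |= pv[r]
            pv[tup] = expr
        return pv

    def tuples_to_delete(pv, dead_var):
        """Reset dead_var to False; report every vertex whose ROBDD
        collapses to zero (these are handed to the Deletion Engine)."""
        return [v for v, f in pv.items()
                if f.restrict({dead_var: 0}).is_zero()]

    # Smoke test: a lone base tuple dies with its own variable.
    p1 = bddvar("p1")
    pv = annotate({"link(@b,c,2)": p1}, {}, {})
    assert tuples_to_delete(pv, p1) == ["link(@b,c,2)"]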
Experimental Setup: Our deployment experiments are
carried out on a local cluster consisting of 8 Intel Xeon 2.67
GHz CPUs with 8 GB RAM running Linux 3.2.0. The experimental results described in the following sections reflect the average over 50 executions of each experiment.
Figure 7: An illustrative scenario showing the two main engines of DMAP
4.2 Experimental Results
Our deployment provides a mechanism to study the up-
date cost of various damage mitigation techniques. Figures
8 and 9 illustrate that the total update cost grows rapidly
with topology size and update ratio. The larger a topology
is, the longer it takes DMAP and incremental view mainte-
nance to update a specific provenance graph. Similarly, the
more base tuples are deleted (update ratio), the more time
it takes those damage mitigation techniques to execute.
Figure 8 plots the update cost (average execution time)
for various-sized simulated networks when deleting 10%
of their corresponding provenance graph. For example, a
topology of size 78 – which has a provenance graph of size
3883 tuples – takes an average of 0.8 seconds to delete 388
tuples using DMAP.
As shown in Figure 8, incremental view maintenance incurs a significant increase in execution time. For instance, in a 117-node network, it imposes a dramatic increase in execution time compared to DMAP (roughly a 370% increase). The significant update cost of the former is due to the incremental recomputations caused by cascading deletions: whenever a base tuple is deleted, all of its derivations are removed and new prov and ruleExec tuples are created. Our proposed technique, on the other hand, decreases the update cost by relying on a decision-making engine (the Provenance Engine) to determine which tuples must be deleted to restore the system to its correct state. Those tuples are then deleted statically from the provenance relation without incrementally creating or recomputing any new derivations.
The average execution time required to mitigate network
failures in a topology of size 78 is shown in Figure 9. We
Figure 8: Average execution time per topology size when 10% of the corresponding provenance graph is deleted
Figure 9: Average execution time per update ratio for a topology of size 78
Figure 10: Average execution time and total number of deleted tuples per topology size using DMAP when 10% of the corresponding provenance graph is deleted
Figure 11: Average execution time and number of deleted tuples per update ratio for a topology of size 78 using DMAP
run our experiments on a 78-node network and gradually
delete randomly selected base tuples. As the figure shows, our proposed scheme significantly reduces the update cost: deleting 90 base tuples using DMAP decreases the average execution time by a factor of 3 as compared with incremental view maintenance.
Note that DMAP as a whole incurs a much higher update cost than its Provenance Engine alone (also plotted in Figures 8 and 9). This is expected: beyond the tasks assigned to the Provenance Engine, DMAP is responsible for exchanging provenance information between its two engines, parsing the received data, and carrying out the actual static deletion. Nonetheless, our results clearly indicate that our proposed scheme consistently outperforms incremental view maintenance by a significant margin, and that our approach works particularly well when the graph size is large.
In addition to examining the update cost (left Y-axis) for
various sized networks using DMAP, Figure 10 plots the
total number of deleted tuples (right Y-axis). For instance,
deleting 45 base tuples in a 169-node topology – which has
a provenance graph of size 16066 tuples – takes an average
of 3.9 seconds and removes 10% of the provenance graph
(roughly 1607 tuples). Similarly, Figure 11 shows the aver-
age execution time and the total number of deleted tuples
per update ratio for a topology of size 78 using DMAP.
In both Figures 10 and 11, the two depicted lines track each other closely, which is expected.
In summary, the results of our deployment experiments indicate that DMAP achieves a substantial decrease in update cost as compared to incremental view maintenance, owing to DMAP’s ability to eliminate the recomputations that incremental view maintenance imposes. The graphs also demonstrate that the update cost is positively correlated with both topology size and update ratio: the larger a topology, the longer it takes DMAP and incremental view maintenance to update a given provenance graph; likewise, the more base tuples are deleted (the update ratio), the longer those damage mitigation techniques take to execute.
5 Conclusion
This paper presents DMAP, a fault removal technique that reduces network and computation costs by propagating updates from one node to another only when necessary. DMAP maintains absorption provenance with each system state, which decides accurately whether that state should be removed in response to the deletion of an input. Unlike incremental view maintenance – which deletes incorrect inputs along with the system states transitively derived from them – our technique relies on a decision-making engine (the Provenance Engine) to determine which tuples must be deleted to restore the system to its correct state. Those tuples are then deleted statically from the provenance relation without incrementally creating or recomputing any new derivations.
We argue that DMAP’s ability to eliminate over-deletions and re-derivations leads to a significant reduction in network update cost as compared with incremental view maintenance. The evaluation results in Section 4 show that our proposed scheme consistently outperforms incremental view maintenance by a significant margin, and that our approach works particularly well when the graph size is large.
6 References
[1] Network simulator 3. http://www.nsnam.org/.
[2] R. E. Bryant. Graph-based algorithms for boolean
function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.
[3] P. Buneman, S. Khanna, and W.-C. Tan. Why and
where: A characterization of data provenance. In
ICDT, 2001.
[4] C. Drake. Python electronic design automation library.
https://github.com/cjdrake/pyeda.
[5] B. Glavic and G. Alonso. Perm: Processing prove-
nance and data on the same data model through
query rewriting. In ICDE, 2009.
[6] T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tan-
nen. Update exchange with mappings and prove-
nance. In VLDB, 2007.
[7] T. J. Green, G. Karvounarakis, N. E. Taylor, O. Biton,
Z. G. Ives, and V. Tannen. Orchestra: Facilitating col-
laborative data sharing. In SIGMOD, 2007.
[8] GT-ITM. Modeling topology of large networks. http:
//www.cc.gatech.edu/projects/gtitm/.
[9] A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In SIGMOD, 1993.
[10] R. Ikeda, H. Park, and J. Widom. Provenance for gen-
eralized map and reduce workflows. In CIDR, 2011.
[11] Z. Ives, N. Khandelwal, A. Kapur, and M. Cakir. Or-
chestra: Rapid, collaborative sharing and dynamic
data. In CIDR, 2005.
[12] M. Liu, N. E. Taylor, W. Zhou, Z. G. Ives, and B. T. Loo.
Recursive computation of regions and connectivity in
networks. In ICDE, 2009.
[13] B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M.
Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe,
and I. Stoica. Declarative networking: Language, exe-
cution and optimization. In SIGMOD, 2006.
[14] B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis,
T. Roscoe, and I. Stoica. Implementing declarative
overlays. In SOSP, 2005.
[15] B. T. Loo, T. Condie, J. M. Hellerstein, R. Ramakr-
ishnan, and I. Stoica. Declarative routing: Extensible
routing with declarative queries. In SIGCOMM, 2005.
[16] S. C. Muthukumar, X. Li, C. Liu, J. B. Kopena,
M. Oprea, and B. T. Loo. Declarative toolkit for rapid
network protocol simulation and experimentation. In
SIGMOD (demo), 2009.
[17] W. Zhou, E. Cronin, and B. T. Loo. Provenance-aware
secure networks. In ICDE/NetDB, 2008.
[18] W. Zhou, Q. Fei, A. Narayan, A. Haeberlen, B. T. Loo,
and M. Sherr. Secure network provenance. In SOSP,
2011.
[19] W. Zhou, S. Mapara, Y. Ren, Y. Li, A. Haeberlen,
Z. Ives, B. T. Loo, and M. Sherr. Distributed time-
aware provenance. In VLDB, 2013.
[20] W. Zhou, M. Sherr, T. Tao, X. Liu, B. T. Loo, and
Y. Mao. Efficient querying and maintenance of net-
work provenance at internet-scale. Technical report,
University of Pennsylvania, 2010.
9

More Related Content

What's hot

Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management SystemHardik Patil
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceAIRCC Publishing Corporation
 
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...IDES Editor
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)Sudarshan Mondal
 
B036407011
B036407011B036407011
B036407011theijes
 
(Paper) Task scheduling algorithm for multicore processor system for minimiz...
 (Paper) Task scheduling algorithm for multicore processor system for minimiz... (Paper) Task scheduling algorithm for multicore processor system for minimiz...
(Paper) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
 
Power system and communication network co simulation for smart grid applications
Power system and communication network co simulation for smart grid applicationsPower system and communication network co simulation for smart grid applications
Power system and communication network co simulation for smart grid applicationsIndra S Wahyudi
 
Computer Network Performance Evaluation Based on Different Data Packet Size U...
Computer Network Performance Evaluation Based on Different Data Packet Size U...Computer Network Performance Evaluation Based on Different Data Packet Size U...
Computer Network Performance Evaluation Based on Different Data Packet Size U...Jaipal Dhobale
 
Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)Rushdi Shams
 
Distributed Computing: An Overview
Distributed Computing: An OverviewDistributed Computing: An Overview
Distributed Computing: An OverviewEswar Publications
 
Efficient & Lock-Free Modified Skip List in Concurrent Environment
Efficient & Lock-Free Modified Skip List in Concurrent EnvironmentEfficient & Lock-Free Modified Skip List in Concurrent Environment
Efficient & Lock-Free Modified Skip List in Concurrent EnvironmentEditor IJCATR
 
Analyzing consistency models for semi active data replication protocol in dis...
Analyzing consistency models for semi active data replication protocol in dis...Analyzing consistency models for semi active data replication protocol in dis...
Analyzing consistency models for semi active data replication protocol in dis...ijfcstjournal
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...IJSRD
 

What's hot (19)

Load balancing
Load balancingLoad balancing
Load balancing
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for Mapreduce
 
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...
 
Lec 8 (distributed database)
Lec 8 (distributed database)Lec 8 (distributed database)
Lec 8 (distributed database)
 
Load rebalancing
Load rebalancingLoad rebalancing
Load rebalancing
 
B036407011
B036407011B036407011
B036407011
 
DDBMS Paper with Solution
DDBMS Paper with SolutionDDBMS Paper with Solution
DDBMS Paper with Solution
 
Chapter25
Chapter25Chapter25
Chapter25
 
(Paper) Task scheduling algorithm for multicore processor system for minimiz...
 (Paper) Task scheduling algorithm for multicore processor system for minimiz... (Paper) Task scheduling algorithm for multicore processor system for minimiz...
(Paper) Task scheduling algorithm for multicore processor system for minimiz...
 
Power system and communication network co simulation for smart grid applications
Power system and communication network co simulation for smart grid applicationsPower system and communication network co simulation for smart grid applications
Power system and communication network co simulation for smart grid applications
 
Computer Network Performance Evaluation Based on Different Data Packet Size U...
Computer Network Performance Evaluation Based on Different Data Packet Size U...Computer Network Performance Evaluation Based on Different Data Packet Size U...
Computer Network Performance Evaluation Based on Different Data Packet Size U...
 
Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)Distributed Database Management Systems (Distributed DBMS)
Distributed Database Management Systems (Distributed DBMS)
 
Distributed Computing: An Overview
Distributed Computing: An OverviewDistributed Computing: An Overview
Distributed Computing: An Overview
 
Efficient & Lock-Free Modified Skip List in Concurrent Environment
Efficient & Lock-Free Modified Skip List in Concurrent EnvironmentEfficient & Lock-Free Modified Skip List in Concurrent Environment
Efficient & Lock-Free Modified Skip List in Concurrent Environment
 
master_seminar
master_seminarmaster_seminar
master_seminar
 
Lec 7 query processing
Lec 7 query processingLec 7 query processing
Lec 7 query processing
 
Analyzing consistency models for semi active data replication protocol in dis...
Analyzing consistency models for semi active data replication protocol in dis...Analyzing consistency models for semi active data replication protocol in dis...
Analyzing consistency models for semi active data replication protocol in dis...
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 

Viewers also liked

Apertura Sucursal Ferretería
Apertura Sucursal FerreteríaApertura Sucursal Ferretería
Apertura Sucursal Ferreteríapaas86
 
Administracion de inversiones
Administracion de inversionesAdministracion de inversiones
Administracion de inversionesViviana Otero
 
Ferreteria coa
Ferreteria coaFerreteria coa
Ferreteria coaJose Vidal
 
[ Knawer ] Perfil del agente de ventas
[ Knawer ] Perfil del agente de ventas[ Knawer ] Perfil del agente de ventas
[ Knawer ] Perfil del agente de ventasEugenio Guzman
 
Pronostico financiero
Pronostico financieroPronostico financiero
Pronostico financieroViviana Otero
 

Viewers also liked (9)

Ferreteria
FerreteriaFerreteria
Ferreteria
 
Apertura Sucursal Ferretería
Apertura Sucursal FerreteríaApertura Sucursal Ferretería
Apertura Sucursal Ferretería
 
Administracion de inversiones
Administracion de inversionesAdministracion de inversiones
Administracion de inversiones
 
Ferreteria coa
Ferreteria coaFerreteria coa
Ferreteria coa
 
Ley 51 de 1990
Ley 51 de 1990Ley 51 de 1990
Ley 51 de 1990
 
[ Knawer ] Perfil del agente de ventas
[ Knawer ] Perfil del agente de ventas[ Knawer ] Perfil del agente de ventas
[ Knawer ] Perfil del agente de ventas
 
Estados Financieros y Planeación Financiera Estratégica
Estados Financieros y Planeación Financiera EstratégicaEstados Financieros y Planeación Financiera Estratégica
Estados Financieros y Planeación Financiera Estratégica
 
Pronostico financiero
Pronostico financieroPronostico financiero
Pronostico financiero
 
Análisis del punto de equilibrio y estados
Análisis del punto de equilibrio y estadosAnálisis del punto de equilibrio y estados
Análisis del punto de equilibrio y estados
 

Similar to DTAP

Solve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same NetworkSolve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same NetworkAlkis Vazacopoulos
 
Cloud data management
Cloud data managementCloud data management
Cloud data managementambitlick
 
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...IJECEIAES
 
A Survey on Task Scheduling and Load Balanced Algorithms in Cloud Computing
A Survey on Task Scheduling and Load Balanced Algorithms in Cloud ComputingA Survey on Task Scheduling and Load Balanced Algorithms in Cloud Computing
A Survey on Task Scheduling and Load Balanced Algorithms in Cloud ComputingIRJET Journal
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...cscpconf
 
An optimized cost-based data allocation model for heterogeneous distributed ...
An optimized cost-based data allocation model for  heterogeneous distributed ...An optimized cost-based data allocation model for  heterogeneous distributed ...
An optimized cost-based data allocation model for heterogeneous distributed ...IJECEIAES
 
Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...IJECEIAES
 
Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Damir Delija
 
Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Damir Delija
 
Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Damir Delija
 
Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...
Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...
Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...neirew J
 
INTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENT
INTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENTINTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENT
INTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENTijccsa
 
Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed SystemsRicha Singh
 
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...IJCNCJournal
 
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...IJCNCJournal
 
Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System ManagementIbrahim Amer
 
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docxAbnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docxShakas Technologies
 
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docxAbnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docxShakas Technologies
 

Similar to DTAP (20)

Solve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same NetworkSolve Production Allocation and Reconciliation Problems using the same Network
Solve Production Allocation and Reconciliation Problems using the same Network
 
Cloud data management
Cloud data managementCloud data management
Cloud data management
 
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
Comparative Study of Neural Networks Algorithms for Cloud Computing CPU Sched...
 
A Survey on Task Scheduling and Load Balanced Algorithms in Cloud Computing
A Survey on Task Scheduling and Load Balanced Algorithms in Cloud ComputingA Survey on Task Scheduling and Load Balanced Algorithms in Cloud Computing
A Survey on Task Scheduling and Load Balanced Algorithms in Cloud Computing
 
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO...
 
An optimized cost-based data allocation model for heterogeneous distributed ...
An optimized cost-based data allocation model for  heterogeneous distributed ...An optimized cost-based data allocation model for  heterogeneous distributed ...
An optimized cost-based data allocation model for heterogeneous distributed ...
 
Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...Multimode system condition monitoring using sparsity reconstruction for quali...
Multimode system condition monitoring using sparsity reconstruction for quali...
 
Ie3514301434
Ie3514301434Ie3514301434
Ie3514301434
 
Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...
 
Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...
 
Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...Communication network simulation on the unix system trough use of the remote ...
Communication network simulation on the unix system trough use of the remote ...
 
Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...
Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...
Intrusion Detection and Marking Transactions in a Cloud of Databases Environm...
 
INTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENT
INTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENTINTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENT
INTRUSION DETECTION AND MARKING TRANSACTIONS IN A CLOUD OF DATABASES ENVIRONMENT
 
Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed Systems
 
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
 
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
 
Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System Management
 
I1102014953
I1102014953I1102014953
I1102014953
 
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docxAbnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docx
 
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docxAbnormal Traffic Detection Based on Attention and Big Step Convolution.docx
Abnormal Traffic Detection Based on Attention and Big Step Convolution.docx
 

DTAP

  • 1. System Damage Mitigation via Absorption Provenance Sumaya Almanee, Micah Sherr, and Wenchao Zhou Georgetwon University Abstract Today’s modern computing systems are treated as black boxes with limited visibility of system events and data items, especially those that are derived as a response to updates on inputs or configurations. Such visibility, how- ever, is crucial for many system administration tasks in- cluding: network diagnostics, identifying malicious nodes, damage assessment and accountability and policy enforce- ment. This has led to a series of advancements to pro- vide better network support for accountability, and effi- cient mechanisms to trace packets and information flows in distributed system. In this paper we explore two main techniques to detect and remove the effects of an identi- fied fault in distributed systems: Incremental view main- tenance works by deleting incorrect inputs along with the system states that were transitively derived from these in- puts, and DMAP our proposed alternative fault removal technique which works by maintaining metadata with each system state that can decide accurately whether a system state should be removed in response to the deletion of an input. 1 Introduction Modern computing systems are by nature intangible and not directly visible. However, the visibility of such sys- tems – specially distributed systems – are sometimes cru- cial specifically when it comes to enforcing accountability and security policies. Network operators often find themselves needing to an- swer forensic and diagnostic questions. When an unex- pected event occurs in a certain system – such as the dis- covery of a suspicious entry in a database or a malicious rule in a router’s routing table, the effects and causes of such an event must be determined in order to repair the system and restore it back to a correct state. Here’s where the importance of utilizing network provenance emerges. Data provenance is not a new concept, it has been suc- cessfully applied to many different areas including collab- orative databases [7], cloud computing [10] and in dis- tributed systems in general [18, 20, 19]. It’s mainly used to answer questions related to the causes or effects of data items. Network provenance can be utilized to answer forensic questions, perform network diagnostics, identify malicious nodes and misbehaving users and enforce accountability and trust management policies in distributed systems. In this paper we explore two main techniques to detect and remove the effects of an identified fault in distributed systems. A main challenge for damage mitigation is de- ciding the minimum number of states that need to be red- erived; mitigation may potentially involve most, if not all nodes in a system. We adopt the provenance model that builds upon our prior work on provenance [18, 20, 19] to perform dam- age assessment and provide mitigation strategies. There are two approaches to revert a system to a correct state: (1) Incremental view maintenance works by deleting in- correct inputs along with the system states that were tran- sitively derived from these inputs (This has already been implemented in [20]). (2) DMAP (Damage Mitigation via Absorption Provenance) , our proposed schema, works by maintaining metadata [12] with each system state that can decide accurately whether a system state should be re- moved in response to the deletion of an input. 
We argue that our proposed schema leads to a significant reduction in network update cost as compared with the incremental view maintenance due to DMAP’s ability to eliminate the need for recomputations. Section 2 presents a background introduction to network provenance and declarative networking. We also discuss the previous damage mitigation technique – incremental view maintenance – and we point out some of its draw- backs. In Section 3 we propose DMAP, an alternative fault removal technique. We then conclude with Implementa- tion and evaluation results in Section 4. 2 Background and Related Work As a background, we briefly introduce network prove- nance. Data provenance [3] is not a new concept, it has been extensively adopted in many different contexts. For example, in database, provenance (or lineage) is a well stud- ied concept mainly used to answer questions related to the derivations of query results and to investigate which data sources they originated from. Network provenance [17] is utilized to answer foren- sic questions (how and why) related to messages traversal path, how data messages were derived and which parties 1
  • 2. were involved in these derivations. It basically describes the history and derivations of network states resulting from the execution of distributed protocols. Networks’ ability to perform forensics analysis is essential to a number of net- work management tasks including: identifying malicious nodes, performing network diagnostics and enforcing trust management policies. Previous work [20] has captured network provenance as a global dependency graph where vertices represent sys- tem states at a particular node and edges represent mes- sages across nodes. For example, Figure 1 depicts a net- work topology and the corresponding provenance graph for bestPathCost(@a,c,5) (computed by MINCOST). Ovals represent the rule execution vertices while the link, path- Cost and bestPathCost vertices represent the intermediate computation results. The graph encodes how tuples are generated during the execution of MINCOST query. For instance, pathCost(@b,c,2) is generated from rule sp3 at node b taking link(@b,c,2) as an input. Figure 1 : Example network topology (left) and the corre- sponding provenance graph for bestPathCost(@a,c,5) Provenance information is usually represented in rela- tional tables in a format similar to that used in existing work [5, 6, 11]. Provenance information can be stored in two tables [20] –prov and ruleExec – that are distributed across all network nodes and partitioned based on the lo- cation specifier Loc. Each entry in the prov table represents a direct derivation of a tuple and it is of the form prov(@Loc, VID, RID, RLoc) indicating that the tuple vertex VID at loca- tion Loc is directly derivable from the rule execution vertex ruleExec for a rule located at RLoc. The ruleExec table stores the actual metadata of the rule execution and it is of the form ruleExec(@RLoc, RID, R, VIDList) indicating that the rule execution RID located at node RLoc is directly derived from tuple vertices VIDList. Figure 2 presents the prov and ruleExec tables that correspond to the provenance graph in Figure 1 . Figure 2 : An example prov and ruleExec relations that cor- respond to the provenance graph in Figure 1 Network provenance is maintained through the use of Declarative Networking which models network protocol as continuous queries over distributed streams. Declarative queries such as, NDlog, represent a compact way to im- plement a variety of routing protocols and overlays net- works. For more information on Declarative Networking and its queries, refer to [13, 14, 15] 2.1 Incremental View Maintenance In incremental view maintenance, provenance maintains sufficient information in order to reproduce the system ex- ecution trace. This is beneficial when applying modifica- tions: System inputs that are marked for deletion are re- moved along with the system states that are derived from these tuples. Updated tuples then get inserted and the sys- tem state is derived. Incremental view maintenance with provenance is sim- ilar to the standard DRed [9] algorithm for incremental view update in databases. DRed works by first over- deleting tuples and then reproducing tuples that might have alternative derivations. Previous work [12] has shown that DRed can be greatly expensive especially when large number of deletions or updates are preformed. An exam- ple in [12] demonstrates how deleting a single link in the 2
  • 3. Reachable view resulted in the deletion and then the red- erivation of all tuples – those that are still derivable even after the deletion took place. i.e. the cost of deleting one link was equal to computing the entire Reachable view from scratch. Moreover, it has been noted that a system, in the revocation phase, might reproduce a system state that should not be revoked [9]. Incremental view maintenance has been applied to provenance in previous work [20]: Whenever a base tu- ple is deleted, all derivations resulted from the NDlog [13] rules are incrementally removed, resulting in cascading deletions of the corresponding prov and ruleExec entries in the provenance graph. In the following section we introduce our main contribu- tion: an alternative damage mitigation strategy that elimi- nates the need for re-computations. 3 Damage Mitigation via Absorption Provenance One major challenge associated with the previous dam- age mitigation technique is handling deletions of tuples. Incremental view maintenance works by first deleting the detected bad inputs and the system states affected by these inputs, and then inserting updated inputs and re-driving the system state. In this section we propose DMAP (Damage Mitigation via Absorption Provenance), a damage mitigation technique aimed at eliminating the need for recomputations imposed by the Incremental view maintenance. DMAP works by maintaining metadata [12] with each system state that can decide accurately whether a system state should be re- moved in a response to the deletion of an input. Our proposed schema implements a previously pro- posed provenance model known as the absorption prove- nance [12]. The benefits of implementing the latter model in DMAP are twofold: (1) It is easy to determine whether an intermediate tuple should be removed in response to the removal of a base tuple; and (2) It aims to reduce net- work and computation costs by propagating appropriate updates from one node to another only when necessary. Figure 3 demonstrates how deletion is handled in DMAP: each tuple in a provenance graph is annotated with a boolean expression. When base tuples are inserted, each tuple is assigned a boolean variable with an initial value of True. These boolean variables are then propagated to the remaining derived tuples. In our scheme, deletion on the derived tuples are directly caused by deletions on base tu- ples. Once a base tuple gets deleted, the associated boolean variable gets reset to False and appropriate updates are propagated to the derived tuples. If applying the absorp- tion rule to a tuple’s boolean expression (usually computed by BDD operation "restrict" [4]) results in False, we remove it. Otherwise, it remains derivable. In other words, tuples remain derivable iff the expression evaluates to True. (a) Each base tuple is annotated with a boolean variable (p1, p2 and p3 in red) with an initial value of True. Remaining provenance expressions are computed for each derived tuple (shown in the right table in pv column) (b) When link(@b,c,2) is deleted, p1 is reset to false in all derived tuples. The table on the right represents the provenance expres- sions after p1 is set to False Figure 3 : DMAP algorithm applied to base and derived tuples generated by MINCOST . pv column contains the absorption provenance for each tuple To implement the absorption rule, we utilize the Reduced Ordered Binary Decision Diagram [2]. 
In the following section we introduce our main contribution: an alternative damage mitigation strategy that eliminates the need for re-computations.

3 Damage Mitigation via Absorption Provenance

One major challenge associated with the previous damage mitigation technique is handling deletions of tuples. Incremental view maintenance works by first deleting the detected bad inputs and the system states affected by them, and then inserting updated inputs and re-deriving the system state.

In this section we propose DMAP (Damage Mitigation via Absorption Provenance), a damage mitigation technique aimed at eliminating the recomputations imposed by incremental view maintenance. DMAP works by maintaining metadata [12] with each system state that can accurately decide whether that state should be removed in response to the deletion of an input.

Our proposed scheme implements a previously proposed provenance model known as absorption provenance [12]. The benefits of adopting this model in DMAP are twofold: (1) it is easy to determine whether an intermediate tuple should be removed in response to the removal of a base tuple; and (2) it reduces network and computation costs by propagating updates from one node to another only when necessary.

Figure 3 demonstrates how deletion is handled in DMAP: each tuple in a provenance graph is annotated with a boolean expression. When base tuples are inserted, each is assigned a boolean variable with an initial value of True. These boolean variables are then propagated to the derived tuples. In our scheme, deletions of derived tuples are caused directly by deletions of base tuples. Once a base tuple is deleted, its associated boolean variable is reset to False and the appropriate updates are propagated to the derived tuples. If applying the absorption rule to a tuple's boolean expression (computed by the BDD operation "restrict" [4]) yields False, we remove the tuple; otherwise, it remains derivable. In other words, a tuple remains derivable iff its expression still evaluates to True.

Figure 3: DMAP applied to base and derived tuples generated by MINCOST; the pv column contains the absorption provenance for each tuple. (a) Each base tuple is annotated with a boolean variable (p1, p2, and p3) with an initial value of True; provenance expressions are then computed for each derived tuple (shown in the pv column of the right table). (b) When link(@b,c,2) is deleted, p1 is reset to False in all derived tuples; the table on the right shows the provenance expressions after p1 is set to False.

To implement the absorption rule, we utilize Reduced Ordered Binary Decision Diagrams (ROBDDs) [2]. The ROBDD is not only a natural representation for boolean expressions in provenance; it also reduces unsatisfiable boolean functions to the constant zero. In other words, absorption is applied automatically in ROBDDs. In the implementation of DMAP, we exploit one of the available optimized BDD libraries [4], which provides symbolic boolean algebra with a selection of function representations, including logic expressions, truth tables, and ROBDDs.
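As a concrete illustration, the sketch below reproduces the Figure 3 deletion with the pyeda library [4]. The variable names mirror the figure; the particular API usage is our reading of the library, not the exact DMAP implementation.

    from pyeda.inter import bddvar

    # Boolean variables for the three base tuples in Figure 3.
    p1 = bddvar("p1")   # link(@b,c,2)
    p2 = bddvar("p2")   # link(@b,a,3)
    p3 = bddvar("p3")   # link(@a,c,5)

    # Absorption provenance of three vertices (expressions as in Figure 3):
    # joined inputs are ANDed, alternative derivations are ORed.
    annotations = {
        "pathCost(@b,c,2)":     p1,
        "sp2@b":                p1 | p2,
        "bestPathCost(@a,c,5)": (p1 | p2) & p3,
    }

    # Delete link(@b,c,2): restrict p1 to False. pyeda's BDDs are reduced
    # and ordered, so absorption happens automatically during restrict.
    for name, pv in annotations.items():
        restricted = pv.restrict({p1: 0})
        print(name, "->", "delete" if restricted.is_zero()
              else "keep, pv = {}".format(restricted))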
DMAP's ability to eliminate over-deletions and re-derivations leads to a significant reduction in a network's update cost as compared with incremental view maintenance. The following section evaluates the performance of the two damage mitigation techniques.

4 Evaluation

In this section, we evaluate the scheme proposed in Section 3. The goals of our evaluation are (1) to show that our proposed scheme consistently outperforms cascading deletion by a significant margin; and (2) to confirm that the total update cost grows gradually with update ratio and topology size.

4.1 Implementation and Experimental Setup

Our proposed scheme is applied to RapidNet [16], a development toolkit for the simulation and implementation of network protocols. RapidNet compiles NDlog programs into applications that are executed by the ns-3 [1] runtime. ns-3 is a discrete-event network simulator that supports tests that would be difficult to conduct on real systems, and it facilitates the evaluation of system behavior in a highly controlled environment.

For our simulation experiments, we generate transit-stub topologies using the GT-ITM topology generator [8]. Each graph contains one transit domain consisting of x transit nodes (where x ranges from 1 to 15). Each transit node is connected to 3 stub domains, with no extra transit-stub or stub-stub edges, and every stub domain contains 4 nodes. We increase the number of nodes in the network by increasing the number of transit nodes. Figure 4 shows two graphs generated by GT-ITM; in each, we adjusted the number of transit nodes to vary the total number of nodes in the topology.

Figure 4: The network in (a) consists of 2 transit nodes per domain, 3 stub domains per transit node, and 4 nodes per stub domain, for a total of 1 · 2 · (1 + 3 · 4) = 26 nodes. Doubling the number of transit nodes yields the topology in (b), with 1 · 4 · (1 + 3 · 4) = 52 nodes in total.
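A small helper (ours, added only to check the node-count arithmetic in Figure 4) computes the total size of such a transit-stub topology:

    def total_nodes(transit_domains=1, transit_nodes=2,
                    stub_domains_per_transit=3, nodes_per_stub=4):
        """Each transit node contributes itself plus its stub-domain nodes."""
        per_transit = 1 + stub_domains_per_transit * nodes_per_stub
        return transit_domains * transit_nodes * per_transit

    print(total_nodes(transit_nodes=2))  # 26, as in Figure 4(a)
    print(total_nodes(transit_nodes=4))  # 52, as in Figure 4(b)
    print(total_nodes(transit_nodes=6))  # 78, the topology used in Section 4.2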
The provenance graphs that correspond to the generated topologies are, of course, far greater in size. Figure 5 shows the topology size of each network tested in our deployment experiments along with the corresponding provenance graph size.

Figure 5: The sizes of the topologies used in our experiments along with the sizes of their corresponding provenance graphs

DMAP is divided into two engines connected via a bi-directional pipe. The Provenance Engine acts as the decision-making engine in DMAP: it identifies the extent of the damage across the network (i.e., the consequences of a malicious node) and determines which tuples must be deleted in order to restore the system to its correct state. The Deletion Engine, on the other hand, is responsible for carrying out the actual deletions. Although DMAP can be integrated into existing distributed systems, for the sake of simplicity we apply our proposed scheme to MINCOST [20], which, upon execution, generates streams of link, pathCost, and bestPathCost tuples that are joined at different nodes to compute the best (lowest-cost) paths between pairs of nodes. The three-rule MINCOST program is shown in Figure 6.

Figure 6: The MINCOST program in NDlog

MINCOST Example: The following scenario, depicted in Figure 7, illustrates the operation of DMAP step by step; a condensed sketch of the full pipeline follows the list. (The graph and table examples are similar to those used in existing work [20].)

1. When executed, MINCOST generates streams of link, pathCost, and bestPathCost tuples and stores them in a relational database. The Deletion Engine retrieves the prov and ruleExec relations and sends them to the Provenance Engine along with the identified malicious nodes.

2. The Provenance Engine converts the received string streams into a graph representation (refer to Section 4.1 of [20] for details on how the graph is constructed from the prov and ruleExec relations). Ovals represent rule execution vertices, while the link, pathCost, and bestPathCost vertices represent intermediate computation results. The Provenance Engine then annotates every tuple in the graph with a boolean expression: the base tuples link(@b,c,2), link(@b,a,3), and link(@a,c,5) are assigned the boolean variables p1, p2, and p3 respectively, each with an initial value of True. These variables are then propagated to the derived tuples. For instance, p1 and p2 propagate to the successors of link(@b,c,2) and link(@b,a,3) until they reach the ruleExec tuple sp2@b, which is assigned the boolean expression p1 OR p2.

3. The provenance variable p1, associated with the tuple link(@b,c,2) from the identified malicious node, is reset to False, indicating that the tuple must be deleted from the provenance graph. The value of p1 is then substituted into the boolean expressions of the remaining tuples. If a tuple's boolean expression evaluates to False, it is removed; otherwise, it remains derivable. For example, zeroing out p1 resets the boolean expressions of sp1@b, pathCost(@b,c,2), sp3@b, and bestPathCost(@b,c,2) to False, thereby eliminating them from the provenance graph. On the other hand, applying the absorption rule (described in Section 3) to the expressions (p1 OR p2) and (p1 OR p2) AND p3 yields p2 and (p2 AND p3), respectively.

4. The tuples to be deleted are sent back to the Deletion Engine via the bi-directional pipe.

5. Once the Deletion Engine receives the tuples to be deleted, it parses them into a readable format and removes them statically from the provenance tables. Note that the cascading deletion trigger (described in Section 2.1) is deactivated in our evaluation experiments.
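The following is a minimal, single-process sketch of the pipeline above, assuming the relation layout from Section 2 and plain boolean functions in place of BDDs. The class and method names, and the placeholder table rows, are ours, introduced only for illustration.

    class ProvenanceEngine:
        """Decides which tuples must go, via absorption provenance (steps 2-3)."""

        def __init__(self, base_vars, derived):
            # Each base tuple's boolean variable starts out True.
            self.assignment = {var: True for var in base_vars}
            # derived: tuple name -> provenance expression over the variables,
            # encoded as a function of the assignment (a stand-in for a BDD).
            self.derived = derived

        def tuples_to_delete(self, bad_base_vars):
            for var in bad_base_vars:            # step 3: reset to False
                self.assignment[var] = False
            return [name for name, pv in self.derived.items()
                    if not pv(self.assignment)]  # expression is False: drop

    class DeletionEngine:
        """Carries out the actual deletions (steps 1, 4, 5)."""

        def __init__(self, provenance_table):
            self.table = provenance_table

        def apply(self, doomed):
            # Static deletion: no cascading triggers, no re-derivation.
            self.table = {t: row for t, row in self.table.items()
                          if t not in doomed}
            return self.table

    # Scenario from Figure 7, with the expressions from step 3 above.
    base_vars = ["p1", "p2", "p3"]  # link(@b,c,2), link(@b,a,3), link(@a,c,5)
    derived = {
        "pathCost(@b,c,2)":     lambda a: a["p1"],
        "sp2@b":                lambda a: a["p1"] or a["p2"],
        "bestPathCost(@a,c,5)": lambda a: (a["p1"] or a["p2"]) and a["p3"],
    }
    prov_table = {name: "..." for name in derived}  # placeholder rows

    pe = ProvenanceEngine(base_vars, derived)
    doomed = pe.tuples_to_delete(["p1"])    # delete the malicious link
    de = DeletionEngine(prov_table)
    print("deleted:", doomed)               # ['pathCost(@b,c,2)']
    print("remaining:", sorted(de.apply(doomed)))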
Experimental Setup: Our deployment experiments are carried out on a local cluster consisting of 8 Intel Xeon 2.67 GHz CPUs with 8 GB of RAM, running Linux 3.2.0. The experimental results described in the following sections reflect the average over 50 executions of each experiment.
Figure 7: An illustrative scenario that exhibits the two main engines in DMAP

4.2 Experimental Results

Our deployment provides a mechanism to study the update cost of various damage mitigation techniques. Figures 8 and 9 illustrate that the total update cost grows with both topology size and update ratio: the larger the topology, the longer it takes both DMAP and incremental view maintenance to update a given provenance graph; similarly, the more base tuples are deleted (the update ratio), the longer these damage mitigation techniques take to execute.

Figure 8 plots the update cost (average execution time) for simulated networks of various sizes when 10% of the corresponding provenance graph is deleted. For example, a topology of size 78 – which has a provenance graph of 3883 tuples – takes an average of 0.8 seconds to delete 388 tuples using DMAP.

As shown in Figure 8, incremental view maintenance incurs a significant increase in execution time. For instance, in a 117-node network, incremental view maintenance imposes roughly a 370% increase in execution time as compared to DMAP. This significant increase in update cost is due to the incremental re-computations caused by cascading deletions: whenever a base tuple is deleted, all its derivations are removed and new prov and ruleExec tuples are created. Our proposed technique, on the other hand, decreases the update cost by relying on a decision-making engine (the Provenance Engine) to determine which tuples must be deleted to restore the system to its correct state; those tuples are then deleted statically from the provenance relations without incrementally creating or recomputing any derivations.

The average execution time required to mitigate network failures in a topology of size 78 is shown in Figure 9.
Figure 8: Average execution time per topology size when 10% of the corresponding provenance graph is deleted

Figure 9: Average execution time per update ratio for a topology of size 78

Figure 10: Average execution time and total number of deleted tuples per topology size using DMAP when 10% of the corresponding provenance graph is deleted

Figure 11: Average execution time and number of deleted tuples per update ratio for a topology of size 78 using DMAP

We run our experiments on a 78-node network and gradually delete randomly selected base tuples. As Figure 9 shows, our proposed scheme significantly reduces the update cost: deleting 90 base tuples using DMAP decreases the average execution time by a factor of 3 as compared with incremental view maintenance.

It is worth noting that DMAP as a whole incurs considerably more update cost than the Provenance Engine alone (also plotted in Figures 8 and 9). This is because DMAP is additionally responsible for exchanging provenance information between its two engines, parsing the received data, and carrying out the actual static deletions, on top of the tasks performed by the Provenance Engine. Nonetheless, our results clearly indicate that our proposed scheme consistently outperforms incremental view maintenance by a significant margin, and our approach appears to work particularly well when the graph size is large.
In addition to examining the update cost (left Y-axis) for networks of various sizes using DMAP, Figure 10 plots the total number of deleted tuples (right Y-axis). For instance, deleting 45 base tuples in a 169-node topology – which has a provenance graph of 16066 tuples – takes an average of 3.9 seconds and removes 10% of the provenance graph (roughly 1607 tuples). Similarly, Figure 11 shows the average execution time and the total number of deleted tuples per update ratio for a topology of size 78 using DMAP. The two lines in each of Figures 10 and 11 exhibit near-identical trends, indicating an essentially constant per-tuple cost, which is expected.

In summary, the results of our deployment experiments indicate that DMAP achieves a substantial decrease in update cost as compared to incremental view maintenance. This decrease is due to DMAP's elimination of the recomputations that incremental view maintenance imposes. The graphs also demonstrate that the update cost is positively correlated with both topology size and update ratio: the larger the topology, the longer it takes both DMAP and incremental view maintenance to update a given provenance graph; similarly, the more base tuples are deleted, the longer these damage mitigation techniques take to execute.

5 Conclusion

This paper presents DMAP, a fault removal technique that aims to reduce network and computation costs by propagating updates from one node to another only when necessary. DMAP works by maintaining absorption provenance with each system state, which accurately decides whether that state should be removed in response to the deletion of an input. Unlike incremental view maintenance – which deletes incorrect inputs along with the system states transitively derived from them – our technique relies on a decision-making engine (the Provenance Engine) to determine which tuples must be deleted to restore the system to its correct state; those tuples are then deleted statically from the provenance relations without incrementally creating or recomputing any derivations.

We argue that DMAP's ability to eliminate over-deletions and re-derivations leads to a significant reduction in a network's update cost as compared with incremental view maintenance. The evaluation results in Section 4 show that our proposed scheme consistently outperforms incremental view maintenance by a significant margin, and that our approach works particularly well when the graph size is large.

6 References

[1] Network simulator 3. http://www.nsnam.org/.

[2] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.

[3] P. Buneman, S. Khanna, and W.-C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001.

[4] C. Drake. Python electronic design automation library. https://github.com/cjdrake/pyeda.

[5] B. Glavic and G. Alonso. Perm: Processing provenance and data on the same data model through query rewriting. In ICDE, 2009.

[6] T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In VLDB, 2007.

[7] T. J. Green, G. Karvounarakis, N. E. Taylor, O. Biton, Z. G. Ives, and V. Tannen. Orchestra: Facilitating collaborative data sharing. In SIGMOD, 2007.

[8] GT-ITM. Modeling topology of large networks. http://www.cc.gatech.edu/projects/gtitm/.

[9] A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In SIGMOD, 1993.
[10] R. Ikeda, H. Park, and J. Widom. Provenance for generalized map and reduce workflows. In CIDR, 2011.

[11] Z. Ives, N. Khandelwal, A. Kapur, and M. Cakir. Orchestra: Rapid, collaborative sharing of dynamic data. In CIDR, 2005.

[12] M. Liu, N. E. Taylor, W. Zhou, Z. G. Ives, and B. T. Loo. Recursive computation of regions and connectivity in networks. In ICDE, 2009.

[13] B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ramakrishnan, T. Roscoe, and I. Stoica. Declarative networking: Language, execution and optimization. In SIGMOD, 2006.

[14] B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica. Implementing declarative overlays. In SOSP, 2005.

[15] B. T. Loo, T. Condie, J. M. Hellerstein, R. Ramakrishnan, and I. Stoica. Declarative routing: Extensible routing with declarative queries. In SIGCOMM, 2005.
[16] S. C. Muthukumar, X. Li, C. Liu, J. B. Kopena, M. Oprea, and B. T. Loo. Declarative toolkit for rapid network protocol simulation and experimentation. In SIGMOD (demo), 2009.

[17] W. Zhou, E. Cronin, and B. T. Loo. Provenance-aware secure networks. In ICDE/NetDB, 2008.

[18] W. Zhou, Q. Fei, A. Narayan, A. Haeberlen, B. T. Loo, and M. Sherr. Secure network provenance. In SOSP, 2011.

[19] W. Zhou, S. Mapara, Y. Ren, Y. Li, A. Haeberlen, Z. Ives, B. T. Loo, and M. Sherr. Distributed time-aware provenance. In VLDB, 2013.

[20] W. Zhou, M. Sherr, T. Tao, X. Liu, B. T. Loo, and Y. Mao. Efficient querying and maintenance of network provenance at internet-scale. Technical report, University of Pennsylvania, 2010.