© Springer-Verlag Berlin Heidelberg 2011
Terrorism Informatics and Privacy
Shrey Jairath
IIIT- Delhi
Abstract. Terrorism informatics is an emerging field in which information technology tools are applied to counter-terrorism efforts. The governments of terrorism-affected nations have shown the primary interest in this field. Numerous programs are believed to exist under the US government that engage specifically in pattern-based data mining over huge databases. These databases contain huge amounts of data from disparate sources, including detailed data on US citizens. Subjecting such data to data mining has raised various privacy issues. Articles on this topic by Bruce Schneier and some papers published in this field have been surveyed and summarized here to present a view of the privacy-invasive nature of work being done in counter-terrorism data mining, ways to introduce privacy-preserving technologies into this field, and arguments on whether data mining can be an effective tool for national security at all.
Keywords: Terrorism, Informatics, Data Mining, Privacy.
1 Introduction
Terrorism is a menace; it evokes anger. The thought that somebody's life is taken through no fault of that individual, that it was not an accident but was precisely planned, that the killer did not even care which individual died as long as somebody died, that the dead individual is just an increment in the count of such deaths, gives an unexplainable feeling, as if human life were cheap.
In my view, the effect of terrorism on society is more pervasive and damaging than is generally considered. Besides the loss of human lives, I find the effects of terrorism very similar to the effects of privacy violations. Terrorism is often compared to other causes of death, like accidents, diseases, and poverty, and it is argued that terrorism is too rare compared to these. But though terror strikes are rare, and the actual deaths they cause are therefore generally not high, I believe the effect of terrorism is etched into society. There is always a threat and a fear hanging in the air. There might not be an attack for a long time, but the threat is always there, and it is faced directly by society. Terrorist attacks might be rare events, but terrorism is a permanent trend. Even supposing that no counter-measures are taken that would violate society's sense of freedom, society is still threatened and frightened, and no sense of privacy can provide an escape. The purpose of terrorism is to instill this fear in society; ignoring terrorism on the basis that it is rare therefore does not kill the fear. The fear is killed only through assurances that there is no threat, and when those assurances are kept. Hence the point is that keeping all of one's privacy while handicapping counter-terrorism efforts gains nothing. Privacy in a threatened atmosphere seems absurd.
Therefore, terrorism cannot be ignored, and effective counter-terrorism measures are a must in order to achieve the goals of both security and privacy. A host of counter-measures surround us all the time: from in-uniform security personnel to under-cover cops to intelligence agencies, and from traditional technologies like CCTV networks and X-ray machines to complex face recognition and sensor networks. A lot of effort has been put in, and many initiatives taken, to counter terrorism.
While a number of measures have certainly been taken, the need is to maintain objectivity and evaluate each measure by its effectiveness. Most of the measures are different forms of "security theatre", a concept described by Bruce Schneier [1]. Security theatre means securing against a very specific attack: securing against a second 9/11, or securing the Super Bowl, historical monuments, subways, or metros against terrorist attacks. Bruce Schneier argues that such a strategy, trying to secure against each possible attack, is grossly ineffective. The main reasons for the ineffectiveness are:
1. The number of possible attacks is limitless. By securing against one set of attacks, we only force the terrorists to modify the plan slightly and carry out some other attack; by securing airports, we only get the subways blown up.
2. There is no dearth of terrifying ideas, but we do not see them realized very often, because terrorism is hard to carry out. Terrorism is very rare. When the number of attacks is small, each attack will be new, not a copy of previous ones: a new target and a new tactic. Hence security theatre cannot work, since it is based on old tactics used by terrorists.
It is essential that each measure taken be effective: otherwise, there is not only a loss of resources and privacy due to that measure, but also a loss of security, as a possibly better alternative measure goes neglected.
One particular counter-terrorism initiative, which we are especially interested in, is the use of information technology in the form of data mining. Numerous programs under the US government are believed to engage in pattern-based data mining over huge databases drawn from disparate sources, including detailed data on US citizens, and subjecting such data to mining has raised various privacy issues. Articles by Bruce Schneier and some papers published in this field have been surveyed and summarized here to present a view of the privacy-invasive nature of counter-terrorism data mining, ways to introduce privacy-preserving technologies into this field, and arguments on whether data mining can be an effective tool for national security at all.
The structure of this paper is as follows: Section 2 describes the privacy-invasive nature of these data mining measures; Section 3 discusses how to make the tradeoff between security and privacy in the context of counter-terrorism; Section 4 presents arguments on why data mining would never work for the purpose of national security; Section 5 describes a framework to ensure that data mining practices do not lead to privacy invasions; Section 6 provides ways of doing privacy-preserving data mining where the privacy-preserving nature is built into the tool; finally, we conclude in Section 7.
2 Privacy Invasive Terrorism Informatics
Past instances of terror strikes, such as 9/11 and the Madrid and London bombings, have shown that terrorists integrate into society to seek invisibility [3]. This has led governments to look for terrorists blended into their own societies, in addition to looking for them in foreign lands. Data mining is one of the strategies adopted in this regard. Vast databases have been created that record everyday information about individuals: educational, health, financial, communications. These records are then subjected to data mining algorithms to find patterns. The assumption is that terrorist activity leaves behind a trail in everyday activities, and that there are patterns which could identify it.
Two types of data mining are being used aggressively:
1. Subject-based: used to gather information about individuals already suspected of wrong-doing. This type of data mining has been used for a long time and forms a major source of investigations.
2. Pattern-based: a model considered to characterize terrorism-related activity is built and matched against the sea of everyday data. Any hit is treated as a possible terrorist plan or a potentially culpable individual. The aim of such programs is to find terrorists hidden in society. This type of data mining for national security purposes started after 9/11.
While in subject-based data mining there is an initial suspect around whom the mining revolves, pattern-based data mining has no such center of suspicion and is based on the predictive power of data linkages [3]. This has caused concern, as people who have done nothing to warrant suspicion are suddenly being watched day in and day out. Almost all of the privacy concerns regarding data mining for national security purposes have been about pattern-based data mining.
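The mechanics of pattern-based mining can be illustrated with a minimal sketch. Everything below, from the record fields to the "pattern" itself, is a hypothetical illustration invented for this paper, not a profile drawn from any actual program:

```python
# Hypothetical sketch: a "pattern" is matched against everyday records,
# flagging anyone who fits it, regardless of any prior suspicion.
records = [
    {"name": "A", "one_way_ticket": True,  "paid_cash": True},
    {"name": "B", "one_way_ticket": True,  "paid_cash": True},
    {"name": "C", "one_way_ticket": False, "paid_cash": True},
]

def matches(record, pattern):
    """True if the record satisfies every attribute in the pattern."""
    return all(record.get(k) == v for k, v in pattern.items())

pattern = {"one_way_ticket": True, "paid_cash": True}  # invented for illustration
hits = [r["name"] for r in records if matches(r, pattern)]
print(hits)  # ['A', 'B'] -- both flagged with no initial suspect
```

In subject-based mining, by contrast, the query would start from a named suspect and pull that individual's records; here, suspicion is generated by the match itself.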
Although the goal of such a program is the security of citizens, the means are privacy-invasive, since the sensitive data of citizens is scrutinized. The process of extracting information about individuals used to be expensive and time-consuming, which ensured that privacy violations were not practically feasible. This effect was termed "practical obscurity" by the U.S. Supreme Court [2]. In the twenty-first century, though, practical obscurity has been eroded by developments in technology.
3 Trading Security with Privacy
Whenever security and privacy come face to face, the security measure automatically wins over civil liberties, as the security threat is always more apparent and the concept of privacy is poorly understood. Usually, no reasoning is done about whether the measure is even effective. This is a bad tradeoff for civil liberties, as well as a loss for security, since better alternatives may not get the attention and the resources.
[4] specifically discusses the tradeoff that exists between security and privacy and puts forward a rational way to balance security with liberty. It argues that the tradeoff between security and privacy is not a linear equation, and that alternatives may exist with better security promises as well as fewer infringements of civil liberties. Also, protecting privacy does not necessarily require the proposed measure to be scrapped completely; certain measures ensuring accountability might be enough. But the courts are not ready to go even that far, as the gravity of the security threat automatically wins over the loss of privacy.
In order to rationally trade security against privacy, [4] puts forward the following methodology and applies it to the case of terrorism as the threat and data mining as the security measure:
• First, assess the gravity of the security threat.
About terrorism, the author says the threat is overhyped, as the number of people dying due to terrorism is minuscule; panic and fear cause the threat to be overstated. But I would contest this perspective, as I have done earlier in the paper. The consequences of rare terrorist strikes are long-lasting and very akin to the consequences of privacy violations. In my view, the threat of terrorism cannot be taken lightly and should be given enough weight.
• Secondly, assess the effectiveness of the proposed security measure against the given threat.
About data mining as a security measure against terrorism, the author says that it is effective in commercial settings, where the appetite for false positives is much higher, but that it raises serious concerns for governmental purposes due to the harms of false positives. The author also says there is no evidence that data mining has proved its efficiency and worthiness here.
• Based on the above two factors, decide whether the loss of civil liberties is justified.
In the case of data mining for counter-terrorism, as mentioned above, the author feels the threat of terrorism is overhyped, and says that the lack of any example proving the efficiency of data mining for such purposes, together with the highly covert nature of such technologies, makes it hard to gauge its possible worthiness. The verdict of the author is fully captured in these lines:
"Given the significant potential privacy issues and other constitutional concerns, combined with speculative and unproven security benefits as well as many other alternative means of promoting security, should data mining still be on the table as a viable policy option? Of course, one could argue that data mining at least should be investigated and studied. There is nothing wrong with doing so, but the cost must be considered in light of alternative security measures that might already be effective and lack as many potential problems."
In my view, the threat of terrorism will always qualify for considering possible security measures, and I would give it enough weight to consider even a privacy-violating measure. I feel this is the problem with the method: it is qualitative in nature. I cannot quantitatively assess a security threat like terrorism and see whether it qualifies for a certain amount of privacy violation (which also cannot be quantified). Under this method, too, it comes down to the whims of the judge whether a particular security threat is grave enough to justify a list of privacy violations. But the security advocates and privacy advocates will already have chosen sides.
Evaluating the security measure, though, certainly seems a logical requirement for performing the tradeoff between security and privacy. The effectiveness of a security measure is much more quantifiable and apparent, so it makes sense to match the effectiveness of a security measure against the privacy violations. Still, a certain degree of ambiguity remains. It may seem naive to argue, but suppose a particular security measure saves one life per year in return for particular privacy violations. How would you decide whether the trade-off is balanced? How would you balance a certain number of lives saved against any amount of privacy violation?
Thus, the only step I would really stand by while performing the trade-off is comparing the possible security measures against each other. It is vital to choose the most effective security measure, or the measure with the best ratio of effectiveness to privacy invasiveness, if that ratio is measurable.
4 Why Data Mining won’t work for National Security
Bruce Schneier has maintained in [1], from 2001 to the present, that data mining will never work for national security purposes. The main reasons pointed out are:
1. The attacks are very rare.
2. There is no well-defined profile to search for.
3. The cost of false positives is high.
The author says that data mining works when there is a reasonable number of attacks per year and a well-defined profile to search for. In the case of terrorism, though there is a pattern common to many terrorist attacks, the pattern is shared by many, many other events as well. And since actual attacks are far fewer than those other events, the number of false positives per true positive is massively large. Further, the author says that the cost of the false positives, financially and in terms of civil liberties, is very high.
Hence, in Bruce Schneier's view, the only way to fight terrorism is through on-the-ground intelligence work and investigation.
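The base-rate problem behind these three reasons can be made concrete with simple arithmetic. All figures below are invented for illustration and are deliberately generous to the detector; they are not taken from [1]:

```python
# Illustrative base-rate arithmetic (all numbers are assumptions).
population = 300_000_000       # people whose records are mined
plotters = 1_000               # actual terrorists hidden among them
true_positive_rate = 0.99      # an unrealistically good detector
false_positive_rate = 0.01     # flags 1% of innocent people

true_alarms = plotters * true_positive_rate
false_alarms = (population - plotters) * false_positive_rate
ratio = false_alarms / true_alarms

print(f"{true_alarms:,.0f} true alarms, {false_alarms:,.0f} false alarms")
print(f"false alarms per true alarm: {ratio:,.0f}")
```

Even under these generous assumptions, roughly 3,000 innocent people are flagged for every actual plotter, which is precisely the financial and civil-liberties cost Schneier points to.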
[1] performs a qualitative assessment of data mining and puts forward the current roadblocks preventing data mining from proving efficient for national security:
1. Data Quality
Duplicate records, lack of data standards, untimely updates and human error are some factors that make data mining inaccurate. Reports describing the various governmental data mining programs have frequently documented evidence of such data inaccuracies. Further, the high stakes of such errors for individuals make the problem even harder.
2. Data Matching
There is no single huge database; data mining requires integration across many different databases. Linking different databases together is a difficult and sometimes infeasible task, as the databases might have different formats, the data about the same individual might be in different forms, the data itself might be unstructured, and so on. The government often does not have control over the disparate sources of data, which makes rectifying this issue even harder.
3. Data Mining Tools
It is hard to comment directly on the efficiency of governmental data mining, as there are no examples of its success and it is otherwise carried out in a classified manner. But, inferring from the efficiency of data mining in the commercial sector, there is a problem of inaccuracy, mainly in the form of huge numbers of false positives. Compared to the private sector, there are many factors that should further diminish the performance of governmental data mining:
• The targets for government are far fewer in number than the targets for the private sector.
• The terrorists can blend in.
• It is hard to derive a pattern to search for, as there have not been many terrorist attacks, and those that have occurred are very different from each other. [1] puts it well: "With a relatively small number of attempts every year and only one or two major terrorist incidents every few years—each one distinct in terms of planning and execution—there are no meaningful patterns that show what behavior indicates planning or preparation for terrorism."
• Data mining efforts are reactive, i.e., they respond to previous examples of terrorist incidents, but national security requires proactive efforts, as the terrorists can always come up with a completely new plot.
• In the private sector, the targets do not care much about being profiled, but in counter-terrorism the terrorists will make every effort to avoid getting caught.
• As Paul Rosenzweig, Deputy Assistant Secretary for Policy at DHS, put it: "[t]he only certainty [in data mining] is that there will be false positives."
5 Framework to Prevent Privacy Invasion
The Fourth Amendment restricts the government from obtaining personal information about individuals through "general searches" [1]. Thus, while the Fourth Amendment allows the specific searches encountered in subject-based data mining, it blocks the general searches of pattern-based data mining. Over time, however, the boundary between specific and general searches has dissolved into a mere distinction between reasonable and unreasonable searches. The Fourth Amendment applies to searches performed by the US government for national security and intelligence purposes.
But the Fourth Amendment does not apply to data collected by third parties, i.e., private parties. And since most of the data used for data mining purposes is collected from these private third parties, the Fourth Amendment imposes almost no restraint on the use of this data.
Apart from the Fourth Amendment, the Privacy Act of 1974 tries to regulate the government's collection and usage of private data. This act requires agencies to [1]:
• Store no more information than required by executive order.
• Maintain data quality.
• Ensure the security of the stored data.
But there are various exceptions in this act which let the government circumvent these requirements.
To evaluate the efficacy of its data mining programs and the privacy violations due to them, the US government established TAPAC, the Technology and Privacy Advisory Committee. In its recommendations to the government, TAPAC proposed a framework for carrying out data mining activities. This framework has been generally accepted and is advocated in [1] and [6]:
• Legal Authorization: requires the agency head to write an authorization letter stating the purpose of the project and how the information will be used, establishing acceptable false positive rates and the ways to deal with them.
• Access Control: ensure that only authorized users get access to the data, and that they do not misuse it.
• Anonymization and Selective Revelation: reveal the minimum amount of private information. Further detailed data is shown only if needed, and is itself selectively revealed.
• Audit: keep a record of what information was viewed by which analyst. This allows investigation into data breaches and misappropriation of data.
• Address False Positives: instead of directly acting on the results of data mining, perform an intermediate step in which analysts investigate the result. If a false positive is found, use the result to improve the data mining program.
• Accountability Measures: internal and external reviews of the program should be held. The government should validate the models being used in these programs and their results.
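Two of these controls, selective revelation and auditing, can be sketched in a few lines. The record fields, analyst names and authorization flag below are all hypothetical, invented only to show the shape of the mechanism:

```python
from datetime import datetime, timezone

audit_log = []  # every access is recorded: (timestamp, analyst, detail level)

full_record = {"id": "rec-001", "zip": "110020", "age_band": "30-39",
               "name": "J. Doe", "address": "42 Example St"}
SENSITIVE = ("name", "address")

def view_record(analyst, authorized_for_detail=False):
    """Selective revelation: return an anonymized view unless the analyst
    has explicit authorization; audit: log the access either way."""
    audit_log.append((datetime.now(timezone.utc).isoformat(), analyst,
                      "detail" if authorized_for_detail else "anonymized"))
    if authorized_for_detail:
        return dict(full_record)
    return {k: ("REDACTED" if k in SENSITIVE else v)
            for k, v in full_record.items()}

v1 = view_record("analyst-7")                              # initial, anonymized
v2 = view_record("analyst-7", authorized_for_detail=True)  # after approval
print(v1["name"], v2["name"], len(audit_log))  # REDACTED J. Doe 2
```

The point of the design is that detail is a second, logged step rather than the default, so a later review can reconstruct exactly who saw what.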
6 Privacy Preserving Data Mining
[5] surveys the data mining techniques used in the closely related field of fraud detection. The survey finds that all kinds of learning algorithms are in extensive use in this field:
• Supervised approaches: using labeled examples of fraudulent and authentic transactions, a mathematical model is created to distinguish between the two. Supervised learning algorithms that have been used for such purposes include neural networks, SVMs, Bayesian networks, naive Bayes, association rule mining, and genetic programming. Popular supervised algorithms like neural networks, Bayesian networks and decision trees have been combined to create hybrid approaches that improve results.
• Supervised + unsupervised hybrids: some studies show that supervised algorithms outperform unsupervised algorithms on telecommunications data, while the best results are achieved when both are used in conjunction.
• Unsupervised approaches: these techniques use unlabeled examples to find patterns and structures inherent in the data. Link analysis and graph mining are considered hot research topics in security areas like counter-terrorism and law enforcement. Unsupervised approaches like cluster analysis, outlier detection, spike detection, and unsupervised neural networks have been applied to fraud detection.
Due to the privacy-invasive nature of these techniques, many efforts have been made to develop privacy-preserving mining techniques. Data mining is a combination of the tools and the data, not just one of them; thus, various techniques are possible which work on either the data or the tool [6].
[7] classifies privacy-preserving data mining techniques into three classes:
1. Heuristic-based: the data is modified in a way that leads to the least loss in utility. For example, data mining algorithms like association rule mining can be made privacy-preserving by ensuring that sensitive rules do not receive the required support or confidence, which can be done by hiding the itemsets from which these rules are derived.
2. Cryptographic: cryptography-based techniques are applied where data mining is done on distributed data. The privacy concern in this scenario is that each data holder does not want to expose its raw data to the others, while all are interested in the final computed result. Data mining algorithms are hence required to perform secure multiparty computation (SMC). Various techniques have been proposed which convert normal computations into SMCs, and various SMC methods have been proposed which can support certain data mining algorithms. One particular SMC algorithm, for decision tree learning through ID3, has been proposed by [8]. We look at this algorithm in detail later in this section.
3. Reconstruction-based: reconstruction-based techniques perturb the data while keeping it possible to infer the distribution of the data. Hence, though the data is perturbed at the granular level, the higher-level view is still maintained.
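The reconstruction-based idea can be sketched quickly: mask each value with independent zero-mean noise, so that any single record is distorted while distribution-level statistics survive. The data and noise range below are arbitrary illustrative choices:

```python
import random

random.seed(0)  # deterministic for the sketch

# Perturb each value with zero-mean noise; individual records become
# unreliable, but the noise cancels out in large-sample aggregates.
true_values = [random.gauss(50, 10) for _ in range(100_000)]   # e.g. ages
perturbed = [v + random.uniform(-20, 20) for v in true_values]

true_mean = sum(true_values) / len(true_values)
recovered_mean = sum(perturbed) / len(perturbed)
print(round(true_mean, 1), round(recovered_mean, 1))  # the means agree closely
```

Published reconstruction-based schemes go further and recover the full distribution of the original data from the perturbed values, not just the mean; the mean is used here only because it makes the cancellation effect visible in a few lines.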
[6] performs another classification of techniques which ensure that certain sensitive rules cannot be inferred while the non-sensitive rules can be:
1. Limit access: provide a sample view of the database, so that inferences drawn do not carry strict support.
2. Fuzz the data: alter the data, or put aggregate values in place of individual values.
3. Eliminate unnecessary groupings: keep the data as random as possible. Do not attach meanings to data meant for some other purpose, which could then be mined.
4. Augment the data: add dummy data.
5. Audit: not feasible when the data is publicly available, but for internal organizational purposes it can induce accountability.
6. Attack the algorithm:
- The logic by which the algorithm finds rules can be attacked, so as to ensure that dummy rules get created and sensitive rules are not found.
- The performance of the algorithm can be attacked, to make the algorithm infeasible to apply on the given dataset.
In the rest of this section, [8] is summarized to describe the proposed SMC technique.
[8] proposes privacy-preserving decision tree learning for a scenario where two parties hold parts of the database and do not wish to reveal the contents of their databases, while both are interested in the decision tree learnt on the union of their databases. No third party is assumed. This is a case of SMC where the number of participating parties is two. The proposed technique ensures that each participating party can learn no more than what can be learnt from its own database (its input) and the resulting decision tree (the output). A semi-honest adversary is considered, so the technique preserves privacy in the face of any passive attack: the adversary may try to break the privacy of a participating party, but only while adhering to the protocols of the proposed technique.
Decision trees are machine learning tools for classification tasks. A decision tree is a tree in which each internal node is a rule defined on one of the attributes of the data, and each leaf node is one of the possible classes. A decision tree is learnt for a given database using some decision tree learning algorithm. Once the tree is learnt, any test instance is traversed down the tree starting at the root, and the leaf node at which the traversal ends is the predicted class for the test instance. ID3 is a specific supervised learning algorithm to learn a decision tree on a given database. ID3 attempts to create the shortest tree possible by trying to finish the classification using the fewest nodes/attributes. This is done by choosing, at each step, the attribute with maximum information gain over the remaining training data: the attribute which best separates the as-yet-unclassified transactions. Hence, ID3 recursively calculates the information gain of each attribute over the unclassified transactions in the training set, picks the one with maximum gain, and puts it into the tree, until no unclassified transaction is left. The information gain for an attribute depends on the entropy of the attribute. The entropy of an attribute is given by:
H_C(T|A) = Σ_{j=1}^{m} (|T(a_j)| / |T|) · H_C(T(a_j))

where H_C(T|A) is the entropy of attribute A over the set of training transactions T when the set of possible classes is C, |T| is the number of transactions, |T(a_j)| is the number of transactions having value a_j for attribute A, m is the number of possible values of attribute A, and H_C(T(a_j)) is the entropy of the class over the transactions having attribute value A = a_j.
ID3 thus calculates the entropy for each attribute, selects the one with minimum entropy, and puts it into the tree. ID3-delta is an extension of ID3 in which the entropy of each attribute is approximated, and attributes whose entropies lie within delta of each other may appear in either order.
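ID3's selection step can be sketched as follows. The four-transaction dataset and its attribute names are invented purely to exercise the definition; `conditional_entropy` computes H_C(T|A) as the weighted sum of the class entropies of the subsets T(a_j):

```python
import math
from collections import Counter

T = [  # toy training transactions: two attributes and a class label
    {"cash": "yes", "oneway": "yes", "cls": "flag"},
    {"cash": "yes", "oneway": "no",  "cls": "flag"},
    {"cash": "no",  "oneway": "yes", "cls": "ok"},
    {"cash": "no",  "oneway": "no",  "cls": "ok"},
]

def class_entropy(transactions):
    """Shannon entropy of the class label over a set of transactions."""
    n = len(transactions)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(t["cls"] for t in transactions).values())

def conditional_entropy(transactions, attr):
    """H_C(T|A) = sum_j |T(a_j)|/|T| * H_C(T(a_j))."""
    n = len(transactions)
    h = 0.0
    for value in {t[attr] for t in transactions}:
        subset = [t for t in transactions if t[attr] == value]
        h += (len(subset) / n) * class_entropy(subset)
    return h

# ID3 puts the minimum-entropy attribute at the root of the tree.
best = min(["cash", "oneway"], key=lambda a: conditional_entropy(T, a))
print(best, conditional_entropy(T, "cash"), conditional_entropy(T, "oneway"))
```

In this toy data "cash" predicts the class perfectly (entropy 0), so ID3 selects it first; ID3-delta would accept any attribute whose entropy lies within delta of that minimum.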
The problem being solved is a two-party computation, often denoted by (x, y) ↦ (f1(x, y), f2(x, y)), where x is the input of the first party, y is the input of the second party, the first party wishes to receive f1(x, y), and the second party wishes to receive f2(x, y). The particular case of the problem at hand can thus be denoted as (D1, D2) ↦ (ID3(D1 ∪ D2), ID3(D1 ∪ D2)), where D1 is the database possessed by the first party, D2 is the database possessed by the second party, and both parties are interested in the common output ID3(D1 ∪ D2).
The aim of SMC is to provide a private protocol to carry out the above computation (in the two-party case). A protocol is private if the view of each party can be simulated using just its own input and the protocol's output, which means that the party learns nothing new from the protocol execution. The proposed technique is a private protocol for calculating ID3-delta; hence the view of the first party can be simulated given only D1 and ID3-delta(D1 ∪ D2), and similarly for the second party.
Since the problem being solved is a case of SMC, the existing generic solutions for SMC do solve it. Yao in [9] proposed a protocol for computing any probabilistic polynomial-time functionality f(x, y), where x and y are the inputs of the two parties respectively. The protocol works by the first party computing f(x, ·) and sending it to the second party in encrypted form. The encryption is such that it allows partial decryption by the second party to give f(x, y). The keys used by the second party are received from the first party, corresponding to y. This can be done without revealing y by carrying out |y| instances of a 1-out-of-2 oblivious transfer protocol [8]. A 1-out-of-2 oblivious transfer protocol is ((x0, x1), σ) ↦ (λ, x_σ), i.e., the first party inputs a pair (x0, x1) and the second party inputs a bit σ ∈ {0, 1}. The protocol outputs x0 or x1 to the second party, depending on the input bit, while the first party learns nothing. While this generic solution applies to the problem of privately computing ID3-delta as well, its complexity is proportional to the input size, i.e., the size of the database, and it has a huge communication overhead. Hence it scales badly for data mining purposes, where databases are very large.
Due to the inefficiency of generic protocols, research has focused on developing efficient solutions to specific problems. In this direction, [8] proposes a solution for two-party distributed private computation of ID3-delta. The proposed algorithm provides an efficient protocol by cutting the communication overhead: each party engages mostly in independent computations.
The assumptions throughout the proposed protocol are:
• The databases D1 and D2 possessed by the two parties have the same structure.
• Attribute names are public.
• The possible values of each attribute are public.
• The total size |D1 ∪ D2| is public.
As seen before, the main task of ID3 is finding the attribute with minimum entropy, which is performed recursively until all training transactions are classified. In order to cut the complexity of performing this task, the minimum-entropy expression is written in the following form:
|T| · H_C(T|A) = Σ_j |T(a_j)| ln |T(a_j)| − Σ_j Σ_i |T(a_j, c_i)| ln |T(a_j, c_i)|

Since |T|, the number of transactions, is constant across all attributes, it can be ignored. Now, to compute the entropy of any attribute A, two kinds of quantities are required: |T(a_j)| and |T(a_j, c_i)|, where |T(a_j)| is the number of transactions having value a_j for attribute A, and |T(a_j, c_i)| is the number of transactions having attribute value A = a_j and class value c_i. Now, |T(a_j)| = |T1(a_j)| + |T2(a_j)|, and similarly |T(a_j, c_i)| = |T1(a_j, c_i)| + |T2(a_j, c_i)|, where T1 denotes the transactions in D1 and T2 the transactions in D2. Therefore, a non-private method of finding the minimum-entropy attribute would be for the first party to compute |T1(a_j)| and |T1(a_j, c_i)| for each attribute and send them to the second party, which could then calculate |T(a_j)| and |T(a_j, c_i)| and hence the entropy of each attribute. The communication complexity in this case is reduced to logarithmic in the number of transactions.
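This non-private baseline can be sketched directly. The attribute values, class labels, and counts below are invented; the point is only that party 1 transmits a handful of small counts rather than its transactions:

```python
import math

VALUES, CLASSES = ("a1", "a2"), ("c1", "c2")

# Local counts each party computes over its own database (illustrative):
# counts[v] = |Tk(a_j)|, joint[(v, c)] = |Tk(a_j, c_i)| for party k.
p1_counts = {"a1": 30, "a2": 10}
p1_joint = {("a1", "c1"): 25, ("a1", "c2"): 5, ("a2", "c1"): 2, ("a2", "c2"): 8}
p2_counts = {"a1": 20, "a2": 40}
p2_joint = {("a1", "c1"): 15, ("a1", "c2"): 5, ("a2", "c1"): 10, ("a2", "c2"): 30}

def entropy_from_counts(c1, j1, c2, j2):
    """Party 2 sums the received counts with its own and evaluates H_C(T|A)."""
    total = sum(c1.values()) + sum(c2.values())          # |T|
    h = 0.0
    for v in VALUES:
        tv = c1[v] + c2[v]                               # |T(a_j)|
        for c in CLASSES:
            tvc = j1[(v, c)] + j2[(v, c)]                # |T(a_j, c_i)|
            if tvc:
                h -= (tvc / total) * math.log2(tvc / tv)
    return h

print(round(entropy_from_counts(p1_counts, p1_joint, p2_counts, p2_joint), 3))
```

This is cheap, but not private: the transmitted counts leak the marginal statistics of party 1's database, which is exactly what the private protocol of [8] avoids.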
In order to turn this into a private protocol, the basis is the observation that privately computing ID3-delta is equivalent to privately finding the attribute with minimum entropy, which in turn means privately computing the entropy Hc(T|A) of each attribute. This quantity was written above as a sum of expressions of the form (v1 + v2) ln (v1 + v2) (the log can be changed to ln, since we only have to compare this quantity across attributes), where v1 = |T1(aj)| or |T1(aj, ci)| and, correspondingly, v2 = |T2(aj)| or |T2(aj, ci)|.
The task of privately finding the minimum-entropy attribute is done by computing random shares of Hc(T|A) for each attribute A, distributed between the parties such that the sum of the shares equals Hc(T|A); and the task of privately computing Hc(T|A) reduces to privately computing expressions of the form (v1 + v2) ln (v1 + v2). The protocol for privately computing such an expression takes input v1 from one party and v2 from the other, and outputs to the two parties shares of an approximation of (v1 + v2) ln (v1 + v2), such that the sum of the shares equals the approximation.
Because Hc(T|A) is a sum of such expressions, and each party holds a share of an
approximation of every (v1 + v2) ln(v1 + v2) term, each party can independently sum
its own shares across all terms to obtain its share of Hc(T|A). Running the
sub-protocol for every term of every attribute thus leaves each party holding a
share of an approximation of Hc(T|A) for each attribute A. The only remaining task
is to find, given these shares, the attribute for which the sum of the two parties'
shares is minimum. This is done using Yao's protocol [9]: it takes as input each
party's shares for all attributes and outputs the attribute whose share sum is
minimum, revealing nothing else.
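The share-based structure of these two stages can be sketched as follows. This is a plain simulation: the final comparison is done in the clear as a stand-in for Yao's garbled-circuit protocol, and the attribute names and (integer-scaled) entropy values are invented for illustration:

```python
import random

P = 2 ** 61 - 1  # a public prime modulus; an illustrative choice

def share(value):
    """Split an integer into two additive shares summing to value mod P.
    Each share alone is uniformly distributed, so it reveals nothing."""
    s1 = random.randrange(P)
    return s1, (value - s1) % P

# Hypothetical per-attribute entropies, already scaled to integers.
entropies = {"age": 812, "zip": 4501, "visits": 230}

shares1, shares2 = {}, {}
for attr, h in entropies.items():
    shares1[attr], shares2[attr] = share(h)

def min_entropy_attribute(shares1, shares2):
    """Stand-in for Yao's protocol: the real protocol compares the sums
    (s1 + s2) mod P inside a garbled circuit, so neither party ever sees the
    reconstructed entropies. Here we reconstruct openly to show the result."""
    return min(shares1, key=lambda a: (shares1[a] + shares2[a]) % P)
```

The uniform distribution of each individual share is exactly the property that makes the composition of the two sub-protocols private, as noted below.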
Hence, the task of finding the minimum-entropy attribute is performed by invoking
two separate private sub-protocols:
1. Privately computing (v1 + v2) ln(v1 + v2) and distributing its shares.
2. Yao's protocol for finding the minimum-entropy attribute, given each party's
shares of Hc(T|A) for all A.
This composition of the two sub-protocols remains private because the first
protocol yields shares that are uniformly distributed in a finite field [8].
It remains to describe the protocol for privately computing
(v1 + v2) ln(v1 + v2). As stated above, it takes v1 and v2 as input from the two
parties and outputs shares of an approximation of (v1 + v2) ln(v1 + v2). It is
carried out in two steps:
1. Distribute shares of ln(v1 + v2).
Let x = v1 + v2, so the task of this step is to create shares of ln x. For a given
x, we first find the n for which 2^n is closest to x, so that x = 2^n (1 + ε) with
-1/2 <= ε <= 1/2. Taking ln on both sides gives:
ln(x) = ln(2^n) + ln(1 + ε) = n ln 2 + ε - ε^2/2 + ε^3/3 - ε^4/4 + ...
Yao's protocol is used to compute shares of 2^n ε and 2^n · n ln 2 (scaled into the
finite field), and shares of the Taylor series approximation above are then
computed using oblivious polynomial evaluation. The shares obtained in this step,
u1 and u2, sum to an approximation of ln x.
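The public arithmetic behind this approximation (not the secure computation itself) can be checked numerically with a short sketch:

```python
import math

def ln_approx(x, terms=8):
    """Approximate ln(x): write x = 2**n * (1 + eps) with -1/2 <= eps <= 1/2,
    then ln(x) = n*ln(2) + eps - eps**2/2 + eps**3/3 - eps**4/4 + ..."""
    assert x > 0
    n = round(math.log2(x))  # 2**n is the power of two closest to x
    eps = x / 2 ** n - 1     # remainder term, |eps| <= 1/2
    series = sum((-1) ** (i + 1) * eps ** i / i for i in range(1, terms + 1))
    return n * math.log(2) + series
```

Because |ε| <= 1/2, the alternating series converges quickly; with eight terms the approximation of ln(10) already agrees with math.log(10) to several decimal places.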
2. Given v1, v2 and shares of ln(v1 + v2), find shares of (v1 + v2) ln(v1 + v2)
using a private multiplication protocol, which is also based on oblivious
polynomial evaluation. The parties invoke the multiplication protocol twice to
receive shares of u1 · v2 and of u2 · v1. Party 1's share w1 is then the sum of its
two resulting shares and u1 · v1, and party 2's share w2 is the sum of its two
resulting shares and u2 · v2. Altogether:
w1 + w2 = u1 v1 + u1 v2 + u2 v1 + u2 v2 = (u1 + u2)(v1 + v2) ≈ x ln x
These shares are then used in the protocol for finding the minimum-entropy
attribute.
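This combination of shares can be verified with a small simulation, in which the oblivious multiplication protocol is modeled simply as a trusted random split of the product (an assumption for illustration only):

```python
import math
import random

# Party 1 holds v1 and a share u1 of ln(v1 + v2); party 2 holds v2 and u2.
# The values are illustrative; u1 + u2 = ln(v1 + v2) comes from step 1.
v1, v2 = 7, 5
x = v1 + v2
u1 = random.uniform(-100, 100)
u2 = math.log(x) - u1

def mult_shares(a, b):
    """Model of the private multiplication protocol: returns additive shares
    of a * b. In the real protocol each party learns only its own share."""
    r = random.uniform(-100, 100)
    return r, a * b - r

s1a, s2a = mult_shares(u1, v2)  # shares of u1 * v2
s1b, s2b = mult_shares(u2, v1)  # shares of u2 * v1
w1 = s1a + s1b + u1 * v1        # party 1's final share
w2 = s2a + s2b + u2 * v2        # party 2's final share
# w1 + w2 = u1*v1 + u1*v2 + u2*v1 + u2*v2 = (u1 + u2)(v1 + v2) = x * ln(x)
```

Neither w1 nor w2 alone carries any information about x ln x; only their sum does, which is what allows the shares to be fed into the minimum-finding stage.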
7 Conclusion
While there is no evidence of its success, data mining for national security has
serious privacy-invasive implications. Faith in data mining ranges from one
extreme, where Bruce Schneier has consistently condemned it, to the other, where
the U.S. government runs numerous data mining programs for national security
purposes.
In this survey the privacy implications of national-security data mining have been
put forward, some reasons for its lack of success have been explored, and ways to
balance privacy and security in this field, both through formal frameworks and
through data mining techniques that inherently preserve privacy, have been
examined.
8 Acknowledgement
I would like to deeply thank Dr. Shishir Nagaraja for letting me perform this
Independent Study under his guidance.
9 References
1. Bruce Schneier: Crypto-Gram, http://www.schneier.com/essays-terrorism.html
2. Fred H. Cate: Government Data Mining: The Need for a Legal Framework
3. Ira S. Rubinstein et al.: Data Mining and Internet Profiling: Emerging
Regulatory and Technological Approaches
4. Daniel J. Solove: Data Mining and the Security-Liberty Debate
5. Clifton Phua et al.: A Comprehensive Survey of Data Mining-based Fraud
Detection Research
6. Chris Clifton et al.: Security and Privacy Implications of Data Mining
7. Vassilios S. Verykios et al.: State-of-the-art in Privacy Preserving Data Mining
8. Yehuda Lindell et al.: Privacy Preserving Data Mining
9. A. C. Yao: How to Generate and Exchange Secrets. In: Proceedings of the 27th
Symposium on Foundations of Computer Science (FOCS), IEEE, 1986, pp. 162–167.

More Related Content

What's hot

Cyber Security Intelligence
Cyber Security IntelligenceCyber Security Intelligence
Cyber Security Intelligenceijtsrd
 
The Information Warfare: how it can affect us
The Information Warfare: how it can affect usThe Information Warfare: how it can affect us
The Information Warfare: how it can affect usLuis Borges Gouveia
 
Information Warfare
Information WarfareInformation Warfare
Information Warfaredibyendupaul
 
Francesca Bosco, Le nuove sfide della cyber security
Francesca Bosco, Le nuove sfide della cyber securityFrancesca Bosco, Le nuove sfide della cyber security
Francesca Bosco, Le nuove sfide della cyber securityAndrea Rossetti
 
Institutional Cybersecurity from Military Perspective
Institutional Cybersecurity from Military PerspectiveInstitutional Cybersecurity from Military Perspective
Institutional Cybersecurity from Military PerspectiveGovernment
 
Cyber terrorism fact or fiction - 2011
Cyber terrorism fact or fiction - 2011Cyber terrorism fact or fiction - 2011
Cyber terrorism fact or fiction - 2011hassanzadeh20
 
The Role Of Technology In Modern Terrorism
The Role Of Technology In Modern TerrorismThe Role Of Technology In Modern Terrorism
The Role Of Technology In Modern TerrorismPierluigi Paganini
 
Information warfare and information operations
Information warfare and information operationsInformation warfare and information operations
Information warfare and information operationsClifford Stone
 
2015_ICMSS_Institutional_Cybersecurity_s02
2015_ICMSS_Institutional_Cybersecurity_s022015_ICMSS_Institutional_Cybersecurity_s02
2015_ICMSS_Institutional_Cybersecurity_s02Government
 
Shubhrat.presentationfor cybercrime.ppt
Shubhrat.presentationfor cybercrime.pptShubhrat.presentationfor cybercrime.ppt
Shubhrat.presentationfor cybercrime.pptShubhrat Mishra
 
Terror And Technology
Terror And TechnologyTerror And Technology
Terror And Technologypradhansushil
 
2015 Cyber Security Strategy
2015 Cyber Security Strategy 2015 Cyber Security Strategy
2015 Cyber Security Strategy Mohit Kumar
 
CyberTerrorism - A case study for Emergency Management
CyberTerrorism - A case study for Emergency ManagementCyberTerrorism - A case study for Emergency Management
CyberTerrorism - A case study for Emergency ManagementRicardo Reis
 
Sj terp emerging tech radar
Sj terp emerging tech radarSj terp emerging tech radar
Sj terp emerging tech radarSaraJayneTerp
 
Cyber Security, Cyber Warfare
Cyber Security, Cyber WarfareCyber Security, Cyber Warfare
Cyber Security, Cyber WarfareAmit Anand
 
Mark Anderson on Cyber Security
Mark Anderson on Cyber SecurityMark Anderson on Cyber Security
Mark Anderson on Cyber SecurityMeg Weber
 

What's hot (20)

Cyber Security Intelligence
Cyber Security IntelligenceCyber Security Intelligence
Cyber Security Intelligence
 
The Information Warfare: how it can affect us
The Information Warfare: how it can affect usThe Information Warfare: how it can affect us
The Information Warfare: how it can affect us
 
Session 3.2 Zahri Hj Yunos
Session 3.2 Zahri Hj YunosSession 3.2 Zahri Hj Yunos
Session 3.2 Zahri Hj Yunos
 
Information Warfare
Information WarfareInformation Warfare
Information Warfare
 
Francesca Bosco, Le nuove sfide della cyber security
Francesca Bosco, Le nuove sfide della cyber securityFrancesca Bosco, Le nuove sfide della cyber security
Francesca Bosco, Le nuove sfide della cyber security
 
Cyber terrorism
Cyber terrorismCyber terrorism
Cyber terrorism
 
Information Warfare
Information WarfareInformation Warfare
Information Warfare
 
Institutional Cybersecurity from Military Perspective
Institutional Cybersecurity from Military PerspectiveInstitutional Cybersecurity from Military Perspective
Institutional Cybersecurity from Military Perspective
 
Cyber terrorism fact or fiction - 2011
Cyber terrorism fact or fiction - 2011Cyber terrorism fact or fiction - 2011
Cyber terrorism fact or fiction - 2011
 
The Role Of Technology In Modern Terrorism
The Role Of Technology In Modern TerrorismThe Role Of Technology In Modern Terrorism
The Role Of Technology In Modern Terrorism
 
Information warfare and information operations
Information warfare and information operationsInformation warfare and information operations
Information warfare and information operations
 
2015_ICMSS_Institutional_Cybersecurity_s02
2015_ICMSS_Institutional_Cybersecurity_s022015_ICMSS_Institutional_Cybersecurity_s02
2015_ICMSS_Institutional_Cybersecurity_s02
 
Shubhrat.presentationfor cybercrime.ppt
Shubhrat.presentationfor cybercrime.pptShubhrat.presentationfor cybercrime.ppt
Shubhrat.presentationfor cybercrime.ppt
 
Terror And Technology
Terror And TechnologyTerror And Technology
Terror And Technology
 
2015 Cyber Security Strategy
2015 Cyber Security Strategy 2015 Cyber Security Strategy
2015 Cyber Security Strategy
 
CyberTerrorism - A case study for Emergency Management
CyberTerrorism - A case study for Emergency ManagementCyberTerrorism - A case study for Emergency Management
CyberTerrorism - A case study for Emergency Management
 
Sj terp emerging tech radar
Sj terp emerging tech radarSj terp emerging tech radar
Sj terp emerging tech radar
 
Teaching intelligence
Teaching intelligenceTeaching intelligence
Teaching intelligence
 
Cyber Security, Cyber Warfare
Cyber Security, Cyber WarfareCyber Security, Cyber Warfare
Cyber Security, Cyber Warfare
 
Mark Anderson on Cyber Security
Mark Anderson on Cyber SecurityMark Anderson on Cyber Security
Mark Anderson on Cyber Security
 

Viewers also liked

Vipin solanki- terrorism in pakistan
Vipin solanki- terrorism in pakistanVipin solanki- terrorism in pakistan
Vipin solanki- terrorism in pakistanVipin Solanki
 
paper on forecasting terrorism
paper on forecasting terrorismpaper on forecasting terrorism
paper on forecasting terrorismAjay Ohri
 
[2012 12-04 3] - terrorism definition and type
[2012 12-04 3] - terrorism definition and type[2012 12-04 3] - terrorism definition and type
[2012 12-04 3] - terrorism definition and typeCarlos Oliveira
 
Terrorism in pakistan causes &amp; remedies
Terrorism in pakistan causes &amp; remediesTerrorism in pakistan causes &amp; remedies
Terrorism in pakistan causes &amp; remediesGulfam Hussain
 
network security
network securitynetwork security
network securityPREMKUMAR
 

Viewers also liked (7)

PDS 614 Assignment
PDS 614 AssignmentPDS 614 Assignment
PDS 614 Assignment
 
Vipin solanki- terrorism in pakistan
Vipin solanki- terrorism in pakistanVipin solanki- terrorism in pakistan
Vipin solanki- terrorism in pakistan
 
Rough Draft
Rough DraftRough Draft
Rough Draft
 
paper on forecasting terrorism
paper on forecasting terrorismpaper on forecasting terrorism
paper on forecasting terrorism
 
[2012 12-04 3] - terrorism definition and type
[2012 12-04 3] - terrorism definition and type[2012 12-04 3] - terrorism definition and type
[2012 12-04 3] - terrorism definition and type
 
Terrorism in pakistan causes &amp; remedies
Terrorism in pakistan causes &amp; remediesTerrorism in pakistan causes &amp; remedies
Terrorism in pakistan causes &amp; remedies
 
network security
network securitynetwork security
network security
 

Similar to Privacy and terrorism informatics

Outline D
Outline DOutline D
Outline Dbutest
 
The Hacked World Order By Adam Segal
The Hacked World Order By Adam SegalThe Hacked World Order By Adam Segal
The Hacked World Order By Adam SegalLeslie Lee
 
Cyberterrorism Research Paper
Cyberterrorism Research PaperCyberterrorism Research Paper
Cyberterrorism Research PaperRachel Phillips
 
Cyber Weapons Proliferation
Cyber Weapons Proliferation                                 Cyber Weapons Proliferation
Cyber Weapons Proliferation OllieShoresna
 
Running headEMERGING THREATS AND COUNTERMEASURES .docx
Running headEMERGING THREATS AND COUNTERMEASURES             .docxRunning headEMERGING THREATS AND COUNTERMEASURES             .docx
Running headEMERGING THREATS AND COUNTERMEASURES .docxrtodd599
 
Cybersecurity Issues and Challenges
Cybersecurity Issues and ChallengesCybersecurity Issues and Challenges
Cybersecurity Issues and ChallengesTam Nguyen
 
Running head ISOL 534 – Application Security 1Running head.docx
Running head ISOL 534 – Application Security 1Running head.docxRunning head ISOL 534 – Application Security 1Running head.docx
Running head ISOL 534 – Application Security 1Running head.docxwlynn1
 
Cyber Security and Terrorism Research Article2Cybe.docx
Cyber Security and Terrorism Research Article2Cybe.docxCyber Security and Terrorism Research Article2Cybe.docx
Cyber Security and Terrorism Research Article2Cybe.docxrandyburney60861
 
Causes of the Growing Conflict Between Privacy and Security
Causes of the Growing Conflict Between Privacy and SecurityCauses of the Growing Conflict Between Privacy and Security
Causes of the Growing Conflict Between Privacy and SecurityDon Edwards
 
ESSENTIALS OF Management Information Systems 12eKENNETH C..docx
ESSENTIALS OF Management Information Systems 12eKENNETH C..docxESSENTIALS OF Management Information Systems 12eKENNETH C..docx
ESSENTIALS OF Management Information Systems 12eKENNETH C..docxdebishakespeare
 
ESSENTIALS OF Management Information Systems 12eKENNETH C.
ESSENTIALS OF Management Information Systems 12eKENNETH C.ESSENTIALS OF Management Information Systems 12eKENNETH C.
ESSENTIALS OF Management Information Systems 12eKENNETH C.ronnasleightholm
 
1)Using general mass-media (such as news sites) identify a recent co.pdf
1)Using general mass-media (such as news sites) identify a recent co.pdf1)Using general mass-media (such as news sites) identify a recent co.pdf
1)Using general mass-media (such as news sites) identify a recent co.pdfezzi552
 
Securing Cyber Space- Eljay Robertson
Securing Cyber Space- Eljay RobertsonSecuring Cyber Space- Eljay Robertson
Securing Cyber Space- Eljay RobertsonEljay Robertson
 
Invasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian MediaInvasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian MediaKelly Ratkovic
 
The Patriot Act Title Vii Section 814 And 816
The Patriot Act Title Vii Section 814 And 816The Patriot Act Title Vii Section 814 And 816
The Patriot Act Title Vii Section 814 And 816Nicole Fields
 
REPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docx
REPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docxREPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docx
REPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docxchris293
 

Similar to Privacy and terrorism informatics (19)

Outline D
Outline DOutline D
Outline D
 
The Hacked World Order By Adam Segal
The Hacked World Order By Adam SegalThe Hacked World Order By Adam Segal
The Hacked World Order By Adam Segal
 
Cyberterrorism Research Paper
Cyberterrorism Research PaperCyberterrorism Research Paper
Cyberterrorism Research Paper
 
Cyber Weapons Proliferation
Cyber Weapons Proliferation                                 Cyber Weapons Proliferation
Cyber Weapons Proliferation
 
Running headEMERGING THREATS AND COUNTERMEASURES .docx
Running headEMERGING THREATS AND COUNTERMEASURES             .docxRunning headEMERGING THREATS AND COUNTERMEASURES             .docx
Running headEMERGING THREATS AND COUNTERMEASURES .docx
 
Cybersecurity Issues and Challenges
Cybersecurity Issues and ChallengesCybersecurity Issues and Challenges
Cybersecurity Issues and Challenges
 
Running head ISOL 534 – Application Security 1Running head.docx
Running head ISOL 534 – Application Security 1Running head.docxRunning head ISOL 534 – Application Security 1Running head.docx
Running head ISOL 534 – Application Security 1Running head.docx
 
Cyber Security and Terrorism Research Article2Cybe.docx
Cyber Security and Terrorism Research Article2Cybe.docxCyber Security and Terrorism Research Article2Cybe.docx
Cyber Security and Terrorism Research Article2Cybe.docx
 
Causes of the Growing Conflict Between Privacy and Security
Causes of the Growing Conflict Between Privacy and SecurityCauses of the Growing Conflict Between Privacy and Security
Causes of the Growing Conflict Between Privacy and Security
 
ESSENTIALS OF Management Information Systems 12eKENNETH C..docx
ESSENTIALS OF Management Information Systems 12eKENNETH C..docxESSENTIALS OF Management Information Systems 12eKENNETH C..docx
ESSENTIALS OF Management Information Systems 12eKENNETH C..docx
 
ESSENTIALS OF Management Information Systems 12eKENNETH C.
ESSENTIALS OF Management Information Systems 12eKENNETH C.ESSENTIALS OF Management Information Systems 12eKENNETH C.
ESSENTIALS OF Management Information Systems 12eKENNETH C.
 
1)Using general mass-media (such as news sites) identify a recent co.pdf
1)Using general mass-media (such as news sites) identify a recent co.pdf1)Using general mass-media (such as news sites) identify a recent co.pdf
1)Using general mass-media (such as news sites) identify a recent co.pdf
 
Securing Cyber Space- Eljay Robertson
Securing Cyber Space- Eljay RobertsonSecuring Cyber Space- Eljay Robertson
Securing Cyber Space- Eljay Robertson
 
Traditional Terrorists
Traditional TerroristsTraditional Terrorists
Traditional Terrorists
 
Invasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian MediaInvasion Of Privacy In Canadian Media
Invasion Of Privacy In Canadian Media
 
The Patriot Act Title Vii Section 814 And 816
The Patriot Act Title Vii Section 814 And 816The Patriot Act Title Vii Section 814 And 816
The Patriot Act Title Vii Section 814 And 816
 
REPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docx
REPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docxREPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docx
REPLY TO EACH POST 100 WORDS MIN EACH1. Throughout th.docx
 
Paris Attacks
Paris AttacksParis Attacks
Paris Attacks
 
28658043 cyber-terrorism
28658043 cyber-terrorism28658043 cyber-terrorism
28658043 cyber-terrorism
 

Privacy and terrorism informatics

  • 1. adfa, p. 1, 2011. © Springer-Verlag Berlin Heidelberg 2011 Terrorism Informatics and Privacy Shrey Jairath IIIT- Delhi Abstract. Terrorism Informatics is an emerging field where tools of Informa- tion Technology are applied to produce counter-terrorism efforts. The govern- ments of terrorism–affected nations have shown the primary interest in this field. Numerous programs are believed to exist under US government, which specifically indulge in pattern-based Data Mining over huge databases. These databases contain huge amount of data from disparate sources, including de- tailed data on the US citizens as well. Subjecting such data to data mining has caused various privacy issues. Articles on this topic by Bruce Schneier and some papers published in this field have been surveyed and summarized here to present - a view of privacy invasive nature of work being done in the field of counter-terrorism data mining, ways to introduce privacy preserving technolo- gies in this field and arguments on whether data mining can be an effective tool for National Security at all. Keywords: Terrorism, Informatics, Data Mining, Privacy. 1 Introduction Terrorism is a nuisance. It evokes anger. The thought that, somebody‘s life is taken without any fault of that individual, that it was not an accident but was precisely planned, that the killer did not even care which individual died as long as somebody died, that the dead individual is just an increment in the number of such deaths gives an unexplainable feeling –as if human life is so cheap. In my view, the effect of terrorism on society is more pervasive and damaging than it is generally considered. I find effects of terrorism, besides the loss of human lives, very similar to effects of privacy violations. Terrorism is so often compared to other causes of death like accidents, diseases, poverty, and it is argued that terrorism is too rare when compared to these events. 
But though terror strikes are rare and therefore the actual deaths due to the same is generally not high, but I believe the effect of terrorism is etched in society. There is always a threat and fear hanging in the air. There might not be an attack for a long time but the threat would always be there and this threat is directly faced by the society. Terrorist attacks might be a rare event, but terrorism is a permanent trend. Considering that no counter-measures are taken for terrorism which would violate the sense of society‘s freedom, still the socie- ty is threatened and frightened and no sense of privacy could provide an escape. The purpose of terrorism is to instill this fear in the society; therefore ignoring terrorism on the basis that it is rare does not kill the fear. The fear gets killed only through as-
  • 2. surances that there is no threat and when those assurances are kept. Hence the point is that, keeping all the privacy and handicapping counter-terrorism efforts does not gain any goal. Privacy in a threatened atmosphere seems absurd. Therefore, terrorism cannot be ignored and effective counter-terrorism meas- ures are a must in order to achieve goals of both security as well as privacy. There are a host of counter-measures that are around us all the time. From in-uniform security personnel to under-cover cops to intelligence agencies and from traditional technolo- gies like CCTV networks, X-ray machines to complex face recognition, sensor net- works etc, there has been a lot of efforts put and initiatives taken to counter terrorism. While, there are certainly a number of measures taken, but the need is to maintain objectivity and evaluate each measure by its effectiveness. Most of the measures are different forms of ―Security Theatre‖, the concept described by Bruce Schneier[1]. Security theatre means securing against a very specific attack like securing against a second 9/11, securing super bowl, historical monuments, subways, metro against terrorist attacks. Bruce Schneier says that such a strategy to secure against terrorism- by trying to secure against each possible attack - is grossly ineffective. The main rea- sons for the ineffectiveness are that - 1. The number of possible attacks is limitless, by securing against a set of at- tacks we are only forcing the terrorists to do slight modification in the plan and follow some other attack - By securing Airports we are only getting the subways blown. 2. There is no dearth of terrifying ideas, but we do not see them in reality very often. It is because terrorism is hard to carry out. Terrorism is very rare. When the number of attacks is few, each attack would be a new attack- not a copy of pre- vious ones. It would be a new target and a new tactic. 
Hence security theatre can't work since it is based on old tactics taken by terrorists. It is required that each measure taken is effective since otherwise there is not only loss of resources and privacy due to that measure but also loss of security as a possible better alternative measure goes neglected. One certain counter-terrorism initiative is use of Information Technology in the form of data mining, which we are particularly interested in. The governments of terrorism–affected nations have shown the primary interest in this field. Numerous programs are believed to exist under US government, which specifically indulge in pattern-based Data Mining over huge databases. These databases contain huge amount of data from disparate sources, including detailed data on the US citizens as well. Subjecting such data to data mining has caused various privacy issues. Articles on this topic by Bruce Schneier and some papers published in this field have been surveyed and summarized here to present - a view of privacy invasive nature of work being done in the field of counter-terrorism data mining, ways to introduce privacy preserving technologies in this field and arguments on whether data mining can be an effective tool for National Security at all. The structure of this paper is as follows: Section 2 describes the privacy in- vasive nature of these data mining measures; Section 3 is about how to make the tra- deoff between Security and Privacy in the context of counter-terrorism; Section 4 presents the arguments on why data mining would never work for the purpose of na- tional security; Section 5 describes a framework which would ensure that the data
  • 3. mining practices do not lead to privacy invasions; Section 6 provides ways of doing privacy preserving data mining where the privacy preserving nature is inbuilt in the tool; finally we conclude in Section 7. 2 Privacy Invasive Terrorism Informatics Past instances of terror strikes – 9/11, Madrid and London bombings- have shown that terrorists integrate into the society to seek invisibility [3]. This has led governments to look for terrorists blended in their own society in addition to looking for them in for- eign lands. Data Mining is one of the strategies adopted in this regard. Vast databases have been created which records every day information about an individual like – educational, health, financial, commincations. These records are then subjected to data mining algorithms to find patterns. The assumption is that terrorist activity leaves behind a trail in the every day activities and there are patterns which could identify it. Two types of data mining are being used aggressively: 1. Subject-Based – Used to gather information about individuals already sus- pected of wrong-doing. This type of data mining has been used since a long time and forms major source of the investigations. 2. Pattern-Based – A model is built which is considered to characterize the ac- tivities related to terrorism and is used to match against the sea of every day data. Any hit is considered as a possible terrorist plan or potentially culpable individuals. The aim of such program is to find terrorists hidden in the socie- ty. This type of data mining for national security purposes started after 9/11. While in subject-based data mining, there is an initial suspect around whom the data mining revolves, there is no such center of suspicion in pattern-based data mining and is based on the predictive powers of data linkages [3]. This has caused concerns as people who have done nothing to warrant a suspicion are suddenly being watched day in and day out. 
Almost all of the privacy concerns regarding data mining for national security purposes have been regarding pattern-based type of data mining. Although the goal of the program is the security of its citizens, the means are privacy invasive since the sensitive data of the citizens are scrutinized. Process of extracting information about individuals used to be expensive and time-consuming. This ensured that privacy violations are not practically feasible. This effect was termed as ―practical obscurity‖ by U.S Supreme Court [2]. In twenty first century, though, practical obscurity has been eroded by the developments in technology. 3 Trading Security with Privacy Whenever security and privacy are face to face, security measure automatically wins over civil liberties as the security threat is always more apparent and there is a loss of understanding of the concept of privacy. Usually, no reasoning is done about whether the measure is even effective enough. This is a wrong tradeoff for civil liberties as well as a loss for security as there might be better alternatives which do not get the attention and the resources.
  • 4. [4] Specifically talks about the tradeoff that exists between security and pri- vacy. It puts forward the rational way to balance the security with liberty. It says that the tradeoff between security and privacy is not set in linear equation and it is possible that alternatives occur with better security promises as well as lesser civil infringes. Also, protecting privacy does not necessarily require the proposed measure to be scrapped completely but certain measures ensuring accountability might be enough. But, the courts are not ready to go even that far as gravity of security threat automati- cally wins over the loss of privacy. In order to rationally trade security with privacy [4] puts forward the follow- ing methodology and applies it in case of terrorism as threat and data mining as secu- rity measure:  First assess the gravity of security threat -About terrorism the author says that threat of terrorism is over- hyped as number of people dying due to terrorism is miniscule; panic and fear cause the threat to be overstated. But, I would contest this perspective, as I have done earlier in the paper. The consequences of rare terrorist strikes are long-lasting and very akin to consequences of privacy violations. In my view, the threat of terrorism can not be taken lightly and should be given enough weight.  Secondly, Assess the effectiveness of proposed security measure against the given security threat -About Data Mining as a security measure against terrorism, the author says that it is effective in commercial settings where appetite for false positives is much higher and automatically has serious concerns in governmental purposes due to the harms of false positives. Also, the author says that there is no evidence where Data Mining has proved its efficiency and worthiness.  Based on above two factors decide whether the loss of civil liberties is justified. 
In the case of data mining for counter terrorism, as mentioned above the author feels the threat of terrorism is overhyped and says that the lack of any example proving the efficiency of data mining for such purposes and the highly covert nature of such tech- nologies make it hard to gauge the possible worthiness. The verdict of the author is fully captured in these lines- " Given the significant potential privacy issues and other constitutional concerns, combined with speculative and unproven security benefits as well as many other alternative means of promoting security, should data min-ing still be on the table as a viable policy option? Of course, one could argue that data mining at least should be investigated and studied. There is nothing wrong with doing so, but the cost must be considered in light of alternative security measures that might already be effective and lack as many potential problems. "
  • 5. In my view, the threat of terrorism would always qualify to consider the possible se- curity measures and I would like to give it enough weight to consider even privacy violating measure. I feel this is the problem with the method, as it is qualitative in nature. I can‘t quantitatively assess a security threat like terrorism and see whether it qualifies for certain amount of privacy violation (which too can‘t be quantified). Through this method as well it comes to the whims of the judge to say whether the particular security threat is grave enough for a list of privacy violations. But, the secu- rity advocates and privacy advocates would already have sides chosen. But evaluating the security measure though, certainly seems a logical re- quirement to perform the tradeoff between security and privacy. Effectiveness of a security measure is much more quantifiable and apparent. It makes sense to match the effectiveness of a security measure with the privacy violations. Though, certain de- gree of ambiguity remains. It may seem naïve to argue but suppose a particular securi- ty measure saves one life per year in return of particular privacy violations. How would you decide whether the trade-off is balanced? How would you balance the certain number of lives saved with any amount of privacy violation? Thus, the only step I would really stand by while performing the trade-off is comparing the possible security measures against each other. It is vital to choose the most effective security measure or the most effective to privacy invasive measure available, if that ratio is measurable. 4 Why Data Mining won’t work for National Security Bruce Schneier in [1] from 2001 till today has maintained that data mining would never work for national security purposes. The main reasons pointed out are: 1. The attacks are very rare. 2. No well defined profile to search for. 3. High cost of false positives. 
The author says that data mining works when there is a reasonable number of attacks per year and a well-defined profile to search for. In the case of terrorism, though there is a pattern common to many terrorist attacks, the pattern is shared by a great many other events as well. And since actual attacks are far rarer than those other events, the number of false positives per true positive is massively large. Further, the author says that the cost of these false positives, both financially and in terms of civil liberties, is very high. Hence, in Bruce Schneier's view, the only way to fight terrorism is through on-the-ground intelligence work and investigation.
[1] performs a qualitative assessment of data mining and puts forward the current roadblocks for data mining to prove efficient for national security:
1. Data Quality
Duplicate records, lack of data standards, untimely updates and human error are some factors that make data mining inaccurate. The reports describing the various governmental data mining programs have frequently cited evidence of such data inaccuracies. Further, the high stakes of such errors for individuals make the problem even harder.
2. Data Matching
There is no single huge database, and data mining requires integration across many different databases. Linking different databases together is a difficult and sometimes infeasible task, as the databases might have different formats, the data about the same individual might be in different forms, the data itself might be unstructured, and so on. The government often does not have control over the disparate sources of data, which makes rectifying this issue even harder.
3. Data Mining Tools
It is hard to comment directly on the efficiency of governmental data mining, as there are no examples of its success and it is otherwise carried out in a classified manner. But, inferring from the efficiency of data mining in the commercial sector, there is a problem of inaccuracy, mainly in the form of huge numbers of false positives. Compared to the private sector, there are many factors that should further diminish the performance of governmental data mining:
• The target for the government is far smaller in number than the target for the private sector.
• The terrorist can blend in.
• It is hard to derive a pattern to search for, as there have not been many terrorist attacks, and those that have occurred differ greatly from each other.
[1] puts it properly: "With a relatively small number of attempts every year and only one or two major terrorist incidents every few years—each one distinct in terms of planning and execution—there are no meaningful patterns that show what behavior indicates planning or preparation for terrorism."
• Data mining efforts are reactive, i.e. they respond to previous examples of terrorist incidents, but national security requires proactive efforts, as the terrorists can always come up with an entirely new plot.
• In the private sphere, the targets do not care much, but in counter-terrorism the terrorists will make every effort to avoid getting caught.
• Paul Rosenzweig, Deputy Assistant Secretary for Policy at DHS: "[t]he only certainty [in data mining] is that there will be false positives."

5 Framework to Prevent Privacy Invasion

The Fourth Amendment is the restriction imposed on the government against obtaining personal information about individuals through "general searches" [1]. Thus the Fourth
Amendment, while it allows for the specific searches encountered in subject-based data mining, blocks the general searches of pattern-based data mining. However, the boundary between specific and general searches has been dissolved into a distinction merely between reasonable and unreasonable searches. The Fourth Amendment applies to searches performed by the US government for national security and intelligence purposes. But it does not apply to data collected by third parties, i.e. private parties. And since most of the data used for data mining is collected from these private third parties, the Fourth Amendment places almost no restraint on the use of such data.
Apart from the Fourth Amendment, the Privacy Act of 1974 tries to regulate the government's collection and usage of private data. This act requires agencies to [1]:
• Store no more information than required by executive order.
• Maintain data quality.
• Ensure security of the stored data.
But there are various exceptions in this act which let the government get away with its motives.
To evaluate the efficacy of its data mining programs and the privacy violations due to them, the US government established TAPAC, the Technology and Privacy Advisory Committee. In its recommendations, TAPAC proposed a framework for carrying out data mining activities. This framework has been generally accepted and is advocated in [1] and [6]:
• Legal Authorization – Requires the agency head to write an authorization letter stating the purpose of the project, how the information will be used, acceptable false positive rates, and the ways to deal with them.
• Access Control – Ensure that only authorized users get access to the data and that they do not misuse it.
• Anonymization and Selective Revelation – Reveal the minimum amount of private information.
Further detailed data is shown only if needed, and even then it is selectively revealed.
• Audit – Keep a record of which information was viewed by which analyst. This allows investigation into data breaches and misappropriation of data.
• Address False Positives – Instead of directly acting on the results of data mining, perform an intermediate step in which analysts investigate the result. If a false positive is found, use it to improve the data mining program.
• Accountability Measures – Internal and external reviews of the program should be held. The government should validate the models being used in these programs and their results.
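Three of these controls (access control, selective revelation, and audit) can be sketched in a few lines of code. This is only a minimal illustration of the framework's intent; all class names, field names and values below are hypothetical, not taken from any surveyed system:

```python
import datetime
import hashlib

# Minimal sketch of TAPAC-style access control, selective revelation and
# auditing around a record store. Everything here is an illustrative
# assumption, not a description of any real governmental system.

class AuditedStore:
    def __init__(self, records, authorized_analysts):
        self._records = records              # list of dicts with a 'name' field
        self._authorized = authorized_analysts
        self.audit_log = []                  # (timestamp, analyst, record_id, level)

    def _pseudonym(self, name):
        # Selective revelation: expose a stable pseudonym, not the identity.
        return hashlib.sha256(name.encode()).hexdigest()[:8]

    def query(self, analyst, record_id, reveal_identity=False):
        # Access control: only authorized analysts may query at all.
        if analyst not in self._authorized:
            raise PermissionError(f"{analyst} is not authorized")
        # Audit: every access, including escalations, is recorded.
        level = "identified" if reveal_identity else "anonymized"
        self.audit_log.append(
            (datetime.datetime.now(datetime.timezone.utc).isoformat(),
             analyst, record_id, level))
        rec = dict(self._records[record_id])
        if not reveal_identity:
            rec["name"] = self._pseudonym(rec["name"])
        return rec

store = AuditedStore([{"name": "Alice", "flights": 3}], {"analyst7"})
print(store.query("analyst7", 0))                        # pseudonymized view
print(store.query("analyst7", 0, reveal_identity=True))  # escalated, but logged
print(len(store.audit_log))                              # every access recorded
```

The point of the sketch is that revelation of detail is an explicit, logged escalation rather than the default, which is exactly the posture the TAPAC framework recommends.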
6 Privacy Preserving Data Mining

[5] has performed a survey of the data mining techniques used in the closely related field of fraud detection. The survey shows that all kinds of learning algorithms are in extensive use in this field:
• Supervised Approaches – Using labeled examples of fraudulent and authentic transactions, a mathematical model is created to distinguish between the two. Supervised learning algorithms that have been used for such purposes include neural networks, SVMs, Bayesian networks, naive Bayes, association rule mining and genetic programming. Popular supervised algorithms like neural networks, Bayesian networks and decision trees have been combined to create hybrid approaches that improve results.
• Supervised + Unsupervised Hybrids – Some studies show that supervised algorithms outperform unsupervised algorithms on telecommunications data, while the best results are achieved when both are used in conjunction.
• Unsupervised Approaches – These techniques use unlabelled examples to find patterns and structures inherent in the data. Link analysis and graph mining are considered hot research topics in security areas like counter-terrorism and law enforcement. Unsupervised approaches like cluster analysis, outlier detection, spike detection and unsupervised neural networks have been applied to fraud detection.
Due to the privacy-invasive nature of these techniques, many efforts have been made to develop privacy-preserving mining techniques. Data mining is a combination of the tools and the data, not just one of them; thus various techniques are possible which work on either the data or the tool [6]. [7] classifies privacy-preserving data mining techniques into three classes:
1. Heuristic Based – In heuristic-based techniques the data is modified in a way that leads to the least loss in utility. For example,
data mining algorithms like association rule mining can be made privacy-preserving by ensuring that sensitive rules do not receive the required support or confidence, which can be done by hiding the item sets from which these rules are derived.
2. Cryptographic – Cryptography-based techniques are applied where data mining is done on distributed data. The privacy concern in this scenario is that each data holder does not want to expose its raw data to the others, while all are interested in the final computed result. Data mining algorithms are hence required to be performed as secure multiparty computation (SMC). Various techniques have been proposed which convert normal computation into SMCs, and various SMC methods have been proposed which can support certain
data mining algorithms. One particular SMC algorithm, for decision tree learning through ID3, has been proposed by [8]; we look at this algorithm in detail later in this section.
3. Reconstruction Based – Reconstruction-based techniques perturb the data while keeping it possible to infer the distribution of the data. Hence, though the data is perturbed at the granular level, the higher-level view is preserved.
[6] gives another classification, of techniques which ensure that certain sensitive rules cannot be inferred while the non-sensitive rules can be:
1. Limiting Access – Provide a sample view of the database so that inferences drawn do not carry strict support.
2. Fuzz the data – Alter the data, or put aggregate values in place of individual values.
3. Eliminate unnecessary groupings – Keep the data as random as possible. Do not attach meanings to data meant for other purposes, which could then be mined.
4. Augment the data – Add dummy data.
5. Audit – Not feasible when the data is publicly available, but for intra-organizational purposes it can induce accountability.
6. Attack the Algorithm – The logic by which the algorithm finds rules can be attacked, so that dummy rules get created and sensitive rules are not found. The performance of the algorithm can also be attacked, making the algorithm infeasible to apply to the given dataset.
In the rest of this section, [8] is summarized to describe the proposed SMC technique. [8] proposes privacy-preserving decision tree learning for a scenario in which two parties hold parts of the database and wish not to reveal the contents of their databases, while both are interested in the decision tree learnt on the union of their databases. No third party is assumed. This is a case of SMC where the number of participating parties is two.
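Before turning to the SMC construction in detail, the reconstruction-based idea above can be illustrated with simple zero-mean noise: individual values are hidden, yet aggregate statistics survive. The dataset and noise range below are illustrative assumptions:

```python
import random

# Sketch of the reconstruction-based idea from [7]: perturb each individual
# value with zero-mean noise so single records are hidden, while aggregate
# statistics can still be recovered. All numbers are illustrative assumptions.

random.seed(42)
true_incomes = [random.gauss(50_000, 10_000) for _ in range(10_000)]

# Each record is released only after adding zero-mean uniform noise.
perturbed = [x + random.uniform(-20_000, 20_000) for x in true_incomes]

true_mean = sum(true_incomes) / len(true_incomes)
est_mean = sum(perturbed) / len(perturbed)   # the noise cancels in the aggregate

print(f"true mean:      {true_mean:,.0f}")
print(f"estimated mean: {est_mean:,.0f}")    # close to the true mean
print(f"first record error: {abs(perturbed[0] - true_incomes[0]):,.0f}")
```

A single perturbed record tells an observer little about the original value, but a miner interested only in the distribution still gets a faithful higher-level view, which is exactly the trade-off reconstruction-based techniques aim for.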
The proposed technique ensures that each participating party can learn no more than what can be learnt from its own database (its input) and the resulting decision tree (the output). A semi-honest adversary is considered, so the technique preserves privacy in the face of any passive attack: the adversary may try to break the privacy of a participating party, but only while adhering to the protocols of the proposed technique.
Decision trees are machine learning tools for classification tasks. A decision tree is a tree in which each internal node is a rule defined on one of the attributes of the data, and each leaf node is one of the possible classes. A decision tree is learnt for a given database using some decision tree learning algorithm. Once the tree is learnt, any test instance is traversed through the tree starting at the root, and the leaf
node at which the traversal ends gives the predicted class for the test instance. ID3 is a specific supervised learning algorithm that learns a decision tree for a given database. ID3 attempts to create the shortest tree possible by trying to finish the classification using the least number of nodes/attributes. This is done by ordering the attributes in decreasing order of their information gain over the training data; the attribute with maximum information gain completely classifies the maximum number of the as-yet-unclassified transactions. Hence ID3 recursively calculates the information gain of each attribute over the unclassified transactions in the training set, picks the one with maximum gain and puts it into the tree, until no unclassified transaction is left. The information gain for an attribute depends on the entropy of the attribute. The entropy of an attribute is given by:

Hc(T|A) = Σ(j=1..m) ( |T(aj)| / |T| ) · Hc(T(aj))

where Hc(T|A) is the entropy of attribute A over the set of training transactions T when the set of possible classes is C, |T| is the number of transactions, |T(aj)| is the number of transactions having value aj for attribute A, m is the number of possible values of attribute A, and Hc(T(aj)) is the entropy of classifying the transactions having attribute value A = aj. ID3 thus calculates the entropy for each attribute, selects the one with minimum entropy and puts it into the tree. ID3-delta is an extension of ID3 in which the entropy of each attribute is approximated, so that attributes whose entropies lie within delta of each other may come in either order.
The problem being solved is a two-party computation, often denoted by:

(x, y) |→ (f1(x, y), f2(x, y))

where x is the input of the first party, y is the input of the second party, the first party wishes to receive f1(x, y) and the second party wishes to receive f2(x, y).
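The conditional entropy Hc(T|A) that ID3 minimizes can be computed directly from the counts |T(aj)| and |T(aj, ci)|, as the following sketch shows; the toy dataset is an illustrative assumption, not from the surveyed paper:

```python
import math
from collections import Counter

# Sketch of the conditional entropy Hc(T|A) minimized by ID3, computed from
# a list of (attribute_value, class_label) pairs for one attribute A.
# The toy data below is an illustrative assumption.

def conditional_entropy(rows):
    """Hc(T|A) = sum_j |T(aj)|/|T| * Hc(T(aj))."""
    total = len(rows)
    by_value = Counter(a for a, _ in rows)    # |T(aj)|
    by_value_class = Counter(rows)            # |T(aj, ci)|
    h = 0.0
    for aj, t_aj in by_value.items():
        # Entropy of the class distribution within the partition T(aj).
        h_aj = -sum((c / t_aj) * math.log2(c / t_aj)
                    for (a, _), c in by_value_class.items() if a == aj)
        h += (t_aj / total) * h_aj
    return h

# An attribute that splits the classes perfectly has zero conditional
# entropy; an attribute independent of the class is maximally uninformative.
perfect = [("sunny", "yes")] * 4 + [("rainy", "no")] * 4
useless = [("low", "yes"), ("low", "no"), ("high", "yes"), ("high", "no")]
print(conditional_entropy(perfect))  # 0.0
print(conditional_entropy(useless))  # 1.0
```

ID3 evaluates this quantity for every remaining attribute and recurses on the one with the smallest value.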
The particular case of the problem at hand can thus be denoted as

(D1, D2) |→ (ID3(D1 ∪ D2), ID3(D1 ∪ D2))

where D1 is the database possessed by the first party, D2 is the database possessed by the second party, and both parties are interested in the common output ID3(D1 ∪ D2).
The aim of SMC is to provide a private protocol for carrying out the above computation (in the two-party case). A protocol is private if the view of each party can be simulated using just its own input and the protocol's output, which means that the party learns nothing new from the protocol execution. The proposed technique is a private protocol for calculating ID3-delta; hence the view of the first party can be simulated given only D1 and ID3-delta(D1 ∪ D2), and similarly for the second party.
Since the problem being solved is a case of SMC, the existing generic solutions for SMC do solve it. Yao in [9] proposed a protocol for computing any probabilistic polynomial-time functionality f(x, y), where x and y are the inputs of the two parties respectively. The protocol works by the first party computing f(x, ·) and sending it to the second party in encrypted form. The encryption is such that it allows partial decryption by the second party, yielding f(x, y). The keys used by the second party, corresponding to y, are received from the first party; this can be done without revealing y by carrying out |y| instances of a 1-out-of-2 oblivious transfer protocol [8]. A 1-out-of-2 oblivious
transfer protocol is:

((x0, x1), σ) |→ (λ, xσ)

i.e. the first party inputs a pair (x0, x1) and the second party inputs a bit σ ∈ {0, 1}; the protocol outputs xσ to the second party, while the first party learns nothing. While this generic solution applies to the problem of privately computing ID3-delta as well, its complexity is proportional to the input size, i.e. the size of the database, and it carries a huge communication overhead. Hence it scales badly for data mining purposes, where the databases are very large.
Due to the inefficiency of generic protocols, research has focused on developing efficient solutions for specific problems. In this direction, [8] proposes a solution for two-party distributed private computation of ID3-delta. The proposed algorithm provides an efficient protocol by cutting the communication overhead: each party performs mostly independent computations. The assumptions made throughout the proposed protocol are:
• The databases D1 and D2 possessed by the two parties have the same structure.
• Attribute names are public.
• The possible attribute values are public for each attribute.
• The total size |D1 ∪ D2| is public.
As seen before, the main task of ID3 is finding the attribute with minimum entropy, which is performed recursively until all training transactions are classified. In order to cut the complexity of this task, the entropy is written in the following form:

|T| · Hc(T|A) = − Σj Σi |T(aj, ci)| ln |T(aj, ci)| + Σj |T(aj)| ln |T(aj)|

Since |T| is the number of transactions, it is constant across all attributes and hence can be ignored. Now, to compute the entropy of any attribute A, two quantities are required: |T(aj)| and |T(aj, ci)|, where |T(aj)| is the number of transactions having attribute value A = aj, and |T(aj, ci)| is the number of transactions having attribute value A = aj and class value ci.
Now, |T(aj)| = |T1(aj)| + |T2(aj)| and, similarly, |T(aj, ci)| = |T1(aj, ci)| + |T2(aj, ci)|, where T1 denotes the transactions in D1 and T2 the transactions in D2. Therefore, a non-private method of finding the minimum-entropy attribute would be for the first party to compute |T1(aj)| and |T1(aj, ci)| for each attribute and send them to the second party, which could then calculate |T(aj)| and |T(aj, ci)| and hence the entropy of each attribute. The communication complexity in this case is reduced to logarithmic in the number of transactions.
In order to turn this into a private protocol, the basis is the observation that privately computing ID3-delta is equivalent to privately finding the attribute with the minimum entropy, which in turn means privately computing the entropy Hc(T|A) of each attribute. This quantity has been written above as a sum of expressions of the form (v1 + v2) ln (v1 + v2) (the log can be changed to ln, since we only have to compare this quantity across attributes), where v1 = |T1(aj, ci)| or |T1(aj)| and v2 = |T2(aj, ci)| or |T2(aj)|.
The task of privately finding the minimum-entropy attribute is performed by computing random shares of Hc(T|A) for each attribute A, distributed between the parties such that the sum of the shares equals Hc(T|A); and the task of privately computing Hc(T|A) is reduced to privately computing the expression (v1 + v2) ln (v1 + v2). The protocol for privately computing (v1 + v2) ln (v1 + v2) takes inputs v1 and v2 from the two parties and outputs to them shares of an approximation of (v1 + v2) ln (v1 + v2), such that the sum of the shares equals that approximation. Since Hc(T|A) is a sum of expressions of the form (v1 + v2) ln (v1 + v2), and each party holds shares summing to an approximation of each such expression, each party can independently sum its own shares across all the (v1 + v2) ln (v1 + v2) expressions for Hc(T|A) to obtain its share of Hc(T|A). Hence, by following the protocol for all such expressions for all attributes, each party obtains its shares of an approximation of Hc(T|A) for every attribute A. The only part remaining is, given the shares, to find the minimum-entropy attribute, i.e. the attribute for which the sum of the corresponding shares held by the two parties is minimum.
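The share arithmetic described above can be mimicked with plain additive secret sharing over a finite field. This sketch only reproduces how shares of each (v1 + v2) ln (v1 + v2) term sum up to the total entropy; the fixed-point scale and field size are illustrative assumptions, and [8] of course produces such shares cryptographically (via oblivious polynomial evaluation and Yao circuits) without revealing v1 or v2:

```python
import math
import random

# Sketch of additive secret sharing mod a prime P: each x*ln(x) term is split
# into two random shares whose sum (mod P) encodes the value. The fixed-point
# scaling and the field size are illustrative assumptions; this mimics only
# the share arithmetic of [8], not its cryptographic sub-protocols.

P = 2**61 - 1        # a large prime field
SCALE = 10**6        # fixed-point scaling for the real-valued entropy terms

def share(value):
    """Split a scaled real value into two random additive shares mod P."""
    fixed = round(value * SCALE) % P
    s1 = random.randrange(P)
    return s1, (fixed - s1) % P

def reconstruct(s1, s2):
    return ((s1 + s2) % P) / SCALE

random.seed(0)
# Per-partition counts (v1 from party 1, v2 from party 2) for two terms.
terms = [(3, 5), (7, 2)]                       # illustrative counts
shares = [share((v1 + v2) * math.log(v1 + v2)) for v1, v2 in terms]

# Each party sums only its own shares; the two sums together reconstruct
# the full entropy contribution, while neither sum alone reveals anything.
p1_sum = sum(s1 for s1, _ in shares) % P
p2_sum = sum(s2 for _, s2 in shares) % P
expected = sum((v1 + v2) * math.log(v1 + v2) for v1, v2 in terms)
print(abs(reconstruct(p1_sum, p2_sum) - expected) < 1e-5)  # True
```

Because each individual share is uniformly distributed in the field, a party's own shares tell it nothing about the other party's counts, which is why the composition argument below goes through.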
This is done using Yao's protocol, which takes as input the shares for all attributes from each party and outputs the attribute for which the sum of the shares is minimum. Hence the task of finding the attribute with minimum entropy is performed by invoking two separate private sub-protocols:
1. Privately calculating (v1 + v2) ln (v1 + v2) and distributing its shares.
2. Yao's protocol for finding the minimum-entropy attribute, given the shares of Hc(T|A) for all A.
This composition of two private sub-protocols results in a private protocol, because the first protocol yields shares which are uniformly distributed in a finite field [8]. Hence the resulting protocol is private.
It remains to describe the protocol for privately computing (v1 + v2) ln (v1 + v2). As stated, this protocol takes as input v1 and v2 from the two parties and outputs shares of an approximation of (v1 + v2) ln (v1 + v2). It is carried out in two steps:
1. Distribute shares of ln(v1 + v2).
Let v1 + v2 = x; the task of this step is thus to create shares of ln x. For a given x, we start by finding the n for which 2^n is closest to x, so that x = 2^n (1 + E) where −1/2 ≤ E ≤ 1/2. Taking ln on both sides gives:

ln(x) = ln(2^n) + ln(1 + E) = n ln 2 + E − E^2/2 + E^3/3 − E^4/4 + …

Now, Yao's protocol is used to compute shares of 2^n E and 2^n n ln 2. Then shares of the above Taylor series approximation are calculated using oblivious polynomial evaluation. The shares (u1, u2) obtained in this step are the shares of ln x, i.e. u1 + u2 ≈ ln x.
2. Given v1, v2 and the shares of ln(v1 + v2), find shares of (v1 + v2) ln(v1 + v2) using a private multiplication protocol. The private multiplication protocol is also based on oblivious polynomial evaluation. Each party invokes the multiplication protocol twice, to receive shares of u1 · v2 and u2 · v1. Party 1's share w1 is then the sum of its two resulting shares and u1 · v1, and party 2's share w2 is the sum of its two resulting shares and u2 · v2. We get:

w1 + w2 = u1 v1 + u1 v2 + u2 v1 + u2 v2 = (u1 + u2)(v1 + v2) ≈ x ln x

These shares are then used in the protocol for finding the attribute with minimum entropy.

7 Conclusion

While there has been no evidence of its success, data mining for national security has serious privacy-invasive implications. Faith in data mining ranges from one extreme, where Bruce Schneier has consistently condemned it, to the US government's numerous programs using it for national security purposes. In this survey paper, the privacy implications of national-security-driven data mining have been put forward, some reasons for its lack of success have been explored, and ways to find a balance between privacy and security in this field, through formal frameworks and through data mining techniques which inherently preserve privacy, have been examined.

8 Acknowledgement

I would like to deeply thank Dr. Shishir Nagaraja for letting me perform this Independent Study under his guidance.

9 References

1.
Bruce Schneier: Cryptogram essays on terrorism, http://www.schneier.com/essays-terrorism.html
2. Fred H. Cate: Government Data Mining: The Need for a Legal Framework
3. Ira S. Rubinstein et al.: Data Mining and Internet Profiling: Emerging Regulatory and Technological Approaches
4. Daniel J. Solove: Data Mining and the Security-Liberty Debate
5. Clifton Phua et al.: A Comprehensive Survey of Data Mining-based Fraud Detection Research
6. Chris Clifton et al.: Security and Privacy Implications of Data Mining
7. Vassilios S. Verykios et al.: State-of-the-art in Privacy Preserving Data Mining
8. Yehuda Lindell et al.: Privacy Preserving Data Mining
9. A. C. Yao: How to Generate and Exchange Secrets. In: Proceedings of the 27th Symposium on Foundations of Computer Science (FOCS), IEEE, 1986, pp. 162–167.