1. Maman 1
HUNTING TERRORISM IN THE AGE OF BIG DATA:
CURRENT DATA MINING CHALLENGES IN INTELLIGENCE COLLECTION
BY
MICHAEL MAMAN
RESEARCH METHODS IN SECURITY AND INTELLIGENCE STUDIES - INTL500
DR. TATARKA
AMERICAN MILITARY UNIVERSITY
Introduction
The attacks on September 11, 2001 were a devastating blow to the United States and its
intelligence infrastructure, which was later chastised for lacking the foresight to detect
activity leading up to the event. Since then, agencies within the Intelligence Community, or IC,
have enjoyed greater cooperation and data sharing and have bolstered their methods of data
collection to detect terrorism. Other agencies, like the Defense Advanced Research Projects
Agency (DARPA), were invited to try their hand at data mining methods to aid in the effort. Data
mining, the detection of patterns by sifting through large volumes of data, has long been
employed in intelligence gathering. However, the amount of data in circulation today is
unprecedented, and it only continues to grow. A decade after 9/11, the IC is now grappling with
“Big Data”: collections of information ranging from phone records to online purchases, credit
history, and, most recently, social network activity, all of which are only fractions of the
larger data collection scope.
With the ever-increasing mobilization of media, terrorists are becoming more adaptable
and making greater efforts to remain hidden. As data collection continues to grow, the challenge
for the IC is finding suitable methods of accurately analyzing the data and sorting the ‘bad’
data from the ‘good’ amidst a glut of information. Drawing on research from the IC, business,
and the information technology sector, this essay will examine big data in terms of its current
problems and its future promise for detecting terrorist activity. If the IC is unable to
catch up with the ever-growing amount of information collected, then the risk of undetected
terrorist activity may increase in the future. Since the main challenge of
intelligence gathering with big data is effective analysis, this essay will also explore methods
of mitigating that challenge. In addition, this paper will advance some alternatives to
unrestrained big data collection, suggesting a “smarter” way of collecting information to detect
patterns.
Literature Review
Because big data is still an emergent topic, academic literature on big data and intelligence
gathering is still being published and is perhaps scarcer than a researcher could hope for.
However, there is substantial research on big data itself and on its application to projecting
future trends. Moreover, literature on the “pre-big data” programs that followed 9/11 should
offer some insight into the workings of their methodology and the challenges it poses.
With the amount of information big data pulls in, preliminary research suggests that it can,
in fact, “…bridge the gap between what people want to do and what they actually do as well as
how they interact with others in their environment” (Michael and Miller 2013, 23). Certainly,
big data expands how much information can be collected on individuals. With
terrorism, however, the issue lies in detecting patterns that are meant to be hidden, whereas
the consumer behavior and social network profiles that big data gathers are fairly conspicuous.
Moreover, when it comes to terrorism, Chen et al. (2012, 1172) identify major hurdles that need
to be tackled in the areas of information processing and analysis. They write that “…diverse data
sources, multiple data formats, and large data volumes” create an information overload, which
current research is trying to address (1172). Here, specific attention is paid to data mining for
terrorist activity, with mention of a 2012 DARPA program known as “XDATA”
(1172). The XDATA program’s objective is to “…help develop computational techniques and
software tools for processing and analyzing…” the massive amounts of information so that it can
be synthesized in an organized manner for the IC (1172). Whether the XDATA program is
yielding success is still to be determined, but Chen et al. do an adequate job of addressing the
current challenges for big data on the national security front. However, the DARPA project is the
only potential mitigation they mention for handling extremely large data sets. In this
case, the solution, or mitigation, seems to be the introduction of more effective algorithms.
Nevertheless, the XDATA program will be one of continual interest and promise in this regard.
Next, this paper brings in the findings and viewpoint of strategy analyst Stephane
Lefebvre to once again emphasize the scalability issue in data related to threat detection.
Writing in 2004, Lefebvre makes the point that intelligence analysts at a “military tactical
level” can “…receive over 17,000 reports per hour from sensors alone” (Lefebvre 2004, 249). Of
course, if the analyst cannot identify what is valuable, then the data is useless (249). One can
only imagine how much has changed for intelligence analysts in almost a decade. Lefebvre then
discusses data storage, and how intelligence agencies, having massive amounts of data stored in
databases, must have “…fast and accurate algorithms” to analyze the data (249). Later he
discusses “pre-big data” projects by DARPA, such as “Total Information Awareness” (TIA), meant
to collect large quantities of public and private data on all American citizens, as well as
Project Genoa (249-50). Given the 2004 publication date of Lefebvre’s work, it is important to
note that at the time those DARPA programs were still in their infancy and under discussion.
Lefebvre mentions how intelligence analysts are increasingly relying on technical support to
handle reporting, but, unlike Chen et al., he gives little attention to the risk of being
overwhelmed by the data. Lefebvre’s piece is still a useful exposition of the nature of data
collection and the importance of analytics.
Next, this essay brings in the work of Terrence Maxwell (2005) to illustrate further
challenges with pre-big data mining and storage. He begins by identifying ‘data
warehouses’ as “…massive databases” that allow analysts to access data from multiple
databases (Maxwell 2005, 3). However, according to data warehousing experts,¹ the “possibility of
error in data warehouse with multiple inputs and data collected over time is quite high” (5). This
is framed in the context of 2005, but some of the factors behind that conclusion came from the
‘static’ data mining models of the DARPA TIA project, which assume that patterns of
individuals, in both beliefs and relationships, do not change over time (5). He goes on to say
that it could be misguided for data mining models to attempt to catch terrorism by “detecting
relevant relationships and patterns of activity that correspond to potential terrorist events, threats,
or planned attacks” (6). The reason is that terrorists, who constantly want to remain
hidden, are also likely to adapt and evolve their tactics, a problem that, Maxwell says, data
mining developers acknowledge but “do not adequately respond to” (6). This is how false positives
arise and benign activities are mistaken for nefarious ones. Although Maxwell’s paper is
from 2005, there is no evidence to indicate that the IC uses big data any differently from the
data mining models of looking for patterns. Maxwell’s paper is of strong interest to this field
and contains many notable references that may be explored further. For now, this paper will
move on to the final source and make its preliminary conclusions.
The final piece of academic literature to be examined is by Nancy Roberts (2011), who
specifically identifies the challenges and opportunities of data mining for the IC. Roberts’s
paper is particularly useful as it identifies the role each agency plays in intelligence
gathering, beginning with the CIA. The next section deals with particular challenges to data
collection, several of which have already been mentioned, such as scalability, emphasizing that
the “information glut” will get worse as the amount of data collected and the storage needed to
hold it continue to grow (Roberts 2011, 9). The next challenge identified is
the ability to extract the pertinent information from massive data sets, again previously discussed
(10). The information of note here is that Roberts substantiates the suspicions about the
difficulties of collecting information on dark networks, in that “Data on terrorists is dynamic,
not static” (11). Lastly, Roberts closes by introducing the concept of “visual analytics”,
described as “an emerging field dedicated to improving data collection and analysis through the
use of computer-mediated visualization techniques and tools” (5). Roberts goes over the history
of the visual analytics field and two prominent firms pioneering the technology. The area of
visual analytics will be explored further to see whether it offers a potential solution to the
analysis challenge in big data.
¹ Data Warehousing Center (2000). “An Informal Taxonomy of Data Warehouse Data Errors”.
http://businessweek.itpapers.com/abstract.aspx?scid=1003&sortby=title&docid=6729
Conclusion
Literature on big data related specifically to the IC is indeed limited, but the
research already gathered on big data’s relation to data mining seems sufficient to
outline its current challenges in analytics, most particularly the ever-growing volume of data
collected and the need to develop more sophisticated algorithms and programs for detecting
relevant information. Detecting patterns of terrorist activity, as has been the standard for data
mining, may not be the ideal method given the dynamic nature of dark network activity.
Ultimately, more research is still needed on mitigation methods for tackling the challenges posed
by big data and, if possible, on how much promise they hold for the future.
References
Chen, Hsinchun, et al. “Business Intelligence and Analytics: From Big Data to Big Impact.” MIS
Quarterly 36, no. 4 (2012): 1165-1188.
Lefebvre, Stephane. “A Look at Intelligence Analysis.” International Journal of
Intelligence and CounterIntelligence 17 (2004): 231-264.
Maxwell, Terrence A. “Information Policy, Data Mining, and National Security: False
Positives and Unidentified Negatives.” System Sciences. HICSS-38. 38th Hawaii
International Conference on System Sciences (Jan. 2005): 1-8.
Michael, Katina, and Keith W. Miller. “Big Data: New Opportunities and New
Challenges.” IEEE Computer Society 46, no. 6 (2013): 22-24.
Roberts, Nancy C. “Tracking and Disrupting Dark Networks: Challenges of Data Collection
and Analysis.” Information Systems Frontiers 13, no. 1 (2011): 5-19.