Maman 1
HUNTING TERRORISM IN THE AGE OF BIG DATA:
CURRENT DATA MINING CHALLENGES IN INTELLIGENCE COLLECTION
BY
MICHAEL MAMAN
RESEARCH METHODS IN SECURITY AND INTELLIGENCE STUDIES - INTL500
DR. TATARKA
AMERICAN MILITARY UNIVERSITY
Introduction
The attacks of September 11, 2001 were a devastating blow to the United States and its
intelligence infrastructure, which was later criticized for failing to detect the activity
leading up to the event. Since then, agencies within the Intelligence Community (IC) have
enjoyed greater cooperation and data sharing and have bolstered their methods of data collection
to detect terrorism. Other agencies, like the Defense Advanced Research Projects Agency (DARPA),
were invited to try their hand at data mining methods to aid in the effort. Data mining, the
detection of patterns by sifting through large volumes of data, has long been employed in
intelligence gathering. However, the amount of data in circulation today is
unprecedented and continues to grow. A decade after 9/11, the IC is now grappling with
“Big Data”: collections of information ranging from phone records to online purchases, credit
history, and most recently social network activity, all of which are only fractions of the larger
data collection scope.
With media becoming ever more mobile, terrorists are becoming more adaptable
and making greater efforts to remain hidden. As data collection continues to grow, the challenge
for the IC is finding suitable methods of accurately analyzing the data and sorting the ‘bad’ data
from the ‘good’ amid a glut of information. Drawing on research from the IC, business,
and the information technology sector, this essay will examine the current challenges of big data
and its promise for detecting terrorist activity in the future. If the IC is unable to
keep up with the ever-growing amount of information collected, then the risk of undetected
terrorist activity may increase. Because the main challenge of
intelligence gathering with big data is effective analysis, this essay will also explore methods of
mitigating that challenge. In addition, this paper will advance some alternatives to big
data collection, suggesting a “smarter” way of collecting information to detect patterns as
opposed to unrestrained data collection.
Literature Review
With big data still an emergent topic, academic literature on big data and intelligence
gathering is still being published, and there is perhaps less of it than a researcher could hope
for. However, there is substantial research on big data itself and on its use in projecting
future trends. Moreover, literature on the “pre-big data” programs that followed 9/11 should offer
some insight into the workings of their methodology and the challenges they pose.
Given the amount of information big data pulls in, preliminary research suggests that it can
in fact “…bridge the gap between what people want to do and what they actually do as well as
how they interact with others in their environment” (Michael and Miller 2013, 23). Certainly,
big data expands how much information can be collected on individuals. With
terrorism, however, the issue lies in detecting patterns that are meant to be hidden, whereas the
consumer behavior and social network profiles that big data gathers are fairly conspicuous.
Moreover, when it comes to terrorism, Chen et al. (2012, 1172) identify major hurdles that need to
be tackled in the areas of information processing and analysis. They write that “…diverse data
sources, multiple data formats, and large data volumes” create an information overload which
current research is trying to address (1172). Here, specific attention is paid to data mining for
terrorist activity, with mention of a recent DARPA program from 2012 known as “XDATA”
(1172). The XDATA program’s objective is to “…help develop computational techniques and
software tools for processing and analyzing…” the massive amounts of information so that it can be
synthesized in an organized manner for the IC (1172). Whether the XDATA program is
succeeding is still to be determined, but Chen et al. do an adequate job of addressing the
current challenges for big data on the national security front. However, the DARPA project is the
only potential mitigation they mention for handling extremely large data sets. In this
case, the mitigation seems to be the introduction of more effective algorithms.
Nevertheless, the XDATA program will be one of continuing interest and promise in this regard.
Next, this paper brings in the findings and viewpoint of strategy analyst Stephane
Lefebvre to emphasize once again the scalability issue in data related to threat detection. Writing
in 2004, Lefebvre notes that intelligence analysts at a “military tactical level” can
“…receive over 17,000 reports per hour from sensors alone” (Lefebvre 2004, 249). Of course, if the
analyst cannot identify what is valuable, then the data is useless (249). One can only imagine
how much has changed for intelligence analysts in almost a decade. Lefebvre then discusses
data storage, and how intelligence agencies, having massive amounts of data stored in databases,
must have “…fast and accurate algorithms” to analyze the data (249). Later he discusses
“pre-big data” projects by DARPA, such as “Total Information Awareness”, which was meant to
collect large quantities of public and private data on all American citizens, as well as Project
Genoa (249-50). Given the 2004 publication date of Lefebvre’s work, it is important to note that
at the time those DARPA programs were still in their infancy and under discussion. Lefebvre
mentions how intelligence analysts are increasingly relying on technical support to handle
reporting, but, unlike Chen et al., he gives little attention to the risk of analysts being
overwhelmed by the data. Lefebvre’s piece is still a useful exposition of the nature of data
collection and the importance of analytics.
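To put Lefebvre’s figure in perspective, the short Python sketch below works through the arithmetic. Only the 17,000 reports per hour comes from the source; the 30-second review time per report and the function names are hypothetical assumptions chosen for illustration.

```python
import math

# Figure quoted from Lefebvre (2004, 249): reports per hour from sensors alone.
REPORTS_PER_HOUR = 17_000

# Hypothetical assumption (not from the source): seconds an analyst
# needs to review a single report.
ASSUMED_REVIEW_SECONDS = 30


def seconds_per_report(reports_per_hour: int) -> float:
    """Time available per report if one analyst had to read every report."""
    return 3600 / reports_per_hour


def analysts_needed(reports_per_hour: int, review_seconds: float) -> int:
    """Analysts required to keep pace, each working without pause."""
    return math.ceil(reports_per_hour * review_seconds / 3600)


print(round(seconds_per_report(REPORTS_PER_HOUR), 3))
print(analysts_needed(REPORTS_PER_HOUR, ASSUMED_REVIEW_SECONDS))
```

Under these assumed numbers, a single analyst would have roughly a fifth of a second per report, and about 142 analysts working without pause would be needed just to read the sensor feed.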
Next, this essay turns to the work of Terrence Maxwell (2005) to illustrate further
challenges with data mining and storage before big data. He begins by identifying ‘data
warehouses’ as “…massive databases” allowing analysts to access data from multiple
databases (Maxwell 2005, 3). However, according to data warehousing experts,[1] the “possibility of
error in data warehouse with multiple inputs and data collected over time is quite high” (5). This
is framed in the context of 2005, but some of the factors behind that conclusion came from
‘static’ data mining models taken from the DARPA TIA project, which assume that the patterns of
individuals, in both beliefs and relationships, do not change over time (5). He goes on to say
that it could be misguided for data mining models to attempt to catch terrorism by “detecting
relevant relationships and patterns of activity that correspond to potential terrorist events, threats,
or planned attacks” (6). The reason is that terrorists, who constantly want to remain
hidden, are also likely to adapt and evolve their tactics, a problem that, Maxwell says, data mining
developers acknowledge but “do not adequately respond to” (6). This is how false positives can
arise and benign activities can be mistaken for nefarious ones. Although Maxwell’s paper is
from 2005, there is no evidence to indicate that the IC uses big data differently from the data
mining models of looking for patterns. Maxwell’s paper is of strong interest to this field and
contains many notable references that may be explored further. For now, this paper will
move on to the final source and make its preliminary conclusions.
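Maxwell’s concern about false positives can be illustrated with simple base-rate arithmetic. The sketch below is a minimal illustration, not a model from Maxwell’s paper: the population size, number of true targets, detection rate, and false positive rate are all hypothetical values chosen only to show how the calculation works.

```python
def expected_alerts(population, true_targets, sensitivity, false_positive_rate):
    """Return (true_alerts, false_alerts, precision) for a screening model.

    sensitivity: share of real targets the model flags.
    false_positive_rate: share of benign individuals the model flags.
    """
    benign = population - true_targets
    true_alerts = true_targets * sensitivity
    false_alerts = benign * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# All four inputs are hypothetical assumptions for illustration only.
true_alerts, false_alerts, precision = expected_alerts(
    population=1_000_000,
    true_targets=10,
    sensitivity=0.99,
    false_positive_rate=0.01,
)

print(f"true alerts:  {true_alerts:.1f}")
print(f"false alerts: {false_alerts:.1f}")
print(f"share of alerts that are correct: {precision:.4%}")
```

With these assumed values, roughly 10,000 benign individuals would be flagged alongside about 10 genuine targets, so fewer than one alert in a thousand would be correct. A model that looks accurate on paper can still bury analysts in benign cases when the activity being sought is rare.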
The final piece of academic literature to be examined is by Nancy Roberts (2011),
which specifically identifies the challenges and opportunities of data mining for the IC. Roberts’s
paper is particularly useful as it identifies the role in intelligence gathering of each particular
agency, beginning with the CIA. The next section of her paper deals with particular challenges to
data collection, several of which have already been mentioned, such as scalability, emphasizing
that the “information glut” will get worse as the amount of data collected and the
amount of storage available to hold it continue to grow (Roberts 2011, 9). The next challenge
identified is the ability to extract the pertinent information from massive data sets, again
previously discussed (10). The point of note here is that Roberts substantiates the suspicions
about the difficulty of collecting information on dark networks, in that “Data on terrorists is
dynamic, not static” (11). Lastly, Roberts closes by introducing the concept of “visual analytics”,
which she calls “an emerging field dedicated to improving data collection and analysis through the
use of computer-mediated visualization techniques and tools” (5). Roberts goes over the history of
the visual analytics field and two prominent firms pioneering the technology. The area of visual
analytics will be explored further to see whether it can be a potential solution to the analysis
challenge in big data.
[1] Data Warehousing Center (2000). “An Informal Taxonomy of Data Warehouse Data Errors”.
http://businessweek.itpapers.com/abstract.aspx?scid=1003&sortby=title&docid=6729
Conclusion
Literature on big data related specifically to the IC is indeed limited, but the
research already gathered on big data’s relation to data mining seems sufficient for
outlining its current challenges in analytics, most particularly the ever-growing volume of data
collected and the need to develop more sophisticated algorithms and programs for detecting
relevant information. Detecting patterns of terrorist activity, as has been the standard for data
mining, may not be the ideal method given the dynamic nature of dark network activity.
Ultimately, more research is still needed on methods for mitigating the challenges posed
by big data and, if possible, on how much promise those methods hold for the future.
References
Chen, Hsinchun, et al. “Business Intelligence and Analytics: From Big Data to Big Impact.” MIS
Quarterly 36, no. 4 (2012): 1165-1188.
Lefebvre, Stephane. “A Look at Intelligence Analysis.” International Journal of
Intelligence and CounterIntelligence 17 (2004): 231-264.
Maxwell, Terrence A. “Information Policy, Data Mining, and National Security: False
Positives and Unidentified Negatives.” System Sciences. HICSS-38. 38th Hawaii
International Conference on System Sciences (Jan. 2005): 1-8.
Michael, Katina, and Keith W. Miller. “Big Data: New Opportunities and New
Challenges.” IEEE Computer Society 46, no. 6 (2013): 22-24.
Roberts, Nancy C. “Tracking and Disrupting Dark Networks: Challenges of Data Collection
and Analysis.” Information Systems Frontiers 13, no. 1 (2011): 5-19.

More Related Content

What's hot

Efficient Association Rule Mining in Heterogeneous Data Base
Efficient Association Rule Mining in Heterogeneous Data BaseEfficient Association Rule Mining in Heterogeneous Data Base
Efficient Association Rule Mining in Heterogeneous Data BaseIJTET Journal
 
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...IJCSIS Research Publications
 
A Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesA Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesDr. Amarjeet Singh
 
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and OpportunitiesSemantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and OpportunitiesCSCJournals
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsVijay Raghavan
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and VisualizationDr. Neil Brittliff
 
What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin eraser Juan José Calderón
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data ScienceJason Geng
 
Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...
Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...
Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...IOSR Journals
 
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...Anastasija Nikiforova
 
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...Anastasija Nikiforova
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...IJMTST Journal
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...ijcseit
 

What's hot (19)

Efficient Association Rule Mining in Heterogeneous Data Base
Efficient Association Rule Mining in Heterogeneous Data BaseEfficient Association Rule Mining in Heterogeneous Data Base
Efficient Association Rule Mining in Heterogeneous Data Base
 
Data Mining
Data MiningData Mining
Data Mining
 
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
 
A Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesA Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: Challenges
 
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and OpportunitiesSemantic Web Mining of Un-structured Data: Challenges and Opportunities
Semantic Web Mining of Un-structured Data: Challenges and Opportunities
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and Applications
 
10 problems 06
10 problems 0610 problems 06
10 problems 06
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and Visualization
 
What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin
 
Introduction of Data Science
Introduction of Data ScienceIntroduction of Data Science
Introduction of Data Science
 
Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...
Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...
Web Mining for an Academic Portal: The case of Al-Imam Muhammad Ibn Saud Isla...
 
U0 vqmtq3m tc=
U0 vqmtq3m tc=U0 vqmtq3m tc=
U0 vqmtq3m tc=
 
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
 
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
A Novel Approach of Data Driven Analytics for Personalized Healthcare through...
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
 

Similar to Terrorism in the Age of Big Data

Data Mining And Visualization of Large Databases
Data Mining And Visualization of Large DatabasesData Mining And Visualization of Large Databases
Data Mining And Visualization of Large DatabasesCSCJournals
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextMurad Daryousse
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Katie Whipkey
 
Bigdatacooltools
BigdatacooltoolsBigdatacooltools
Bigdatacooltoolssuresh sood
 
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...g8briel
 
A Review Of Data Mining Literature
A Review Of Data Mining LiteratureA Review Of Data Mining Literature
A Review Of Data Mining LiteratureAddison Coleman
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learningGiuseppe Manco
 
THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...
THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...
THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...IJNSA Journal
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESSIJDKP
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESSIJDKP
 
Introduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptIntroduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptSangrangBargayary3
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONIJDKP
 
Module 6 DiscussionIntroductionData ethics is a branch of ethics.pdf
Module 6 DiscussionIntroductionData ethics is a branch of ethics.pdfModule 6 DiscussionIntroductionData ethics is a branch of ethics.pdf
Module 6 DiscussionIntroductionData ethics is a branch of ethics.pdfsaxenaavnish1
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor networkparry prabhu
 
Understand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioUnderstand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioAI Publications
 

Similar to Terrorism in the Age of Big Data (20)

Data Mining And Visualization of Large Databases
Data Mining And Visualization of Large DatabasesData Mining And Visualization of Large Databases
Data Mining And Visualization of Large Databases
 
Big data survey
Big data surveyBig data survey
Big data survey
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data Context
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
 
mineria de datos
mineria de datosmineria de datos
mineria de datos
 
mineria datos
mineria datosmineria datos
mineria datos
 
Bigdatacooltools
BigdatacooltoolsBigdatacooltools
Bigdatacooltools
 
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
 
A Review Of Data Mining Literature
A Review Of Data Mining LiteratureA Review Of Data Mining Literature
A Review Of Data Mining Literature
 
Data science e machine learning
Data science e machine learningData science e machine learning
Data science e machine learning
 
THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...
THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...
THE INTEREST OF HYBRIDIZING EXPLAINABLE AI WITH RNN TO RESOLVE DDOS ATTACKS: ...
 
Big Data Research Trend and Forecast (2005-2015): An Informetrics Perspective
Big Data Research Trend and Forecast (2005-2015): An Informetrics PerspectiveBig Data Research Trend and Forecast (2005-2015): An Informetrics Perspective
Big Data Research Trend and Forecast (2005-2015): An Informetrics Perspective
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESS
 
LINK MINING PROCESS
LINK MINING PROCESSLINK MINING PROCESS
LINK MINING PROCESS
 
Introduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptIntroduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .ppt
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
 
Big Data: 8 facts and 8 fictions
Big Data: 8 facts and 8 fictionsBig Data: 8 facts and 8 fictions
Big Data: 8 facts and 8 fictions
 
Module 6 DiscussionIntroductionData ethics is a branch of ethics.pdf
Module 6 DiscussionIntroductionData ethics is a branch of ethics.pdfModule 6 DiscussionIntroductionData ethics is a branch of ethics.pdf
Module 6 DiscussionIntroductionData ethics is a branch of ethics.pdf
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
 
Understand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present ScenarioUnderstand the Idea of Big Data and in Present Scenario
Understand the Idea of Big Data and in Present Scenario
 

Terrorism in the Age of Big Data

  • 1. Maman 1 HUNTING TERRORISM IN THE AGE OF BIG DATA: CURRENT DATA MINING CHALLENGES IN INTELLIGENCE COLLECTION BY MICHAEL MAMAN RESEARCH METHODS IN SECURITY AND INTELLIGENCE STUDIES - INTL500 DR. TATARKA AMERICAN MILITARY UNIVERSITY
  • 2. Maman 2 Introduction The attacks on September 11, 2001 were a devastating blow to the United States and its Intelligence infrastructure, which was later chastised for not having greater foresight at detecting activity leading up to the event. Since then, agencies within the Intelligence Community, or IC, enjoy greater cooperation and data sharing, as well as bolstering methods of data collection to detect terrorism. Other agencies like the Defense Advanced Research Project Agency (DARPA) were invited to try their hand at data mining methods to aid in the effort. Data mining, the method of detecting patterns through sifting through large volumes of data, has been a method long employed in intelligence gathering. However, the amount of data in circulation today is unprecedented, and only continues to grow. A decade after 9/11, the IC is now grappling with “Big Data”; collections of information from phone records, to online purchases, credit history, and most recently social network activity, all of which are only fractions of the larger data collection scope. With the ever increasing mobilization of media, terrorists are becoming more adaptable and making greater efforts to remain hidden. As data collection continues to grow, the challenge for the IC is finding suitable methods of accurately analyzing the data, and sorting the ‘bad’ data from the ‘good’ amidst an over glut of information. Utilizing research from the IC, business, and the information technology sector, this essay will examine the issues of big data in terms of its current issues and promises for the future at detecting terrorist activity. If the IC is unable to catch up with the ever growing amount of information collected, then the risk of undetected terrorist activity may increase in the future. Understanding that the main challenge of
  • 3. Maman 3 intelligence gathering with big data is effective analysis, this essay will also explore methods of mitigating that challenge. In addition, this paper will advance some alternative methods to big data collection, suggesting a “smarter” way at collecting information to detect patterns as opposed to unrestrained data collection. Literature Review With big data still an emergent topic, academic literature research into big data and intelligence gathering is still being published and perhaps not enough a researcher could hope for. However, there is still substantial research on big data itself and its application to project future trends with use. Moreover, literature on previous “pre-big data” programs following 9/11 should offer some insight into the workings of its methodology and the challenges it poses. With the amount of information big data pulls, preliminary research suggests that it can in fact, “…bridge the gap between what people want to do and what they actually do as well as how they interact with others in their environment” (Michael and Miller 2013, 23). Certainly, big data opens the possibilities to how much information can be collected on individuals. With terrorism, however, the issue lies in detecting patterns that are meant to be hidden. Big data gathering on consumer behavior and social network profiles are fairly conspicuous. Moreover when it comes to terrorism, authors Chen et al. (2012, 1172) identify major hurdles that need to be tackled in the areas of information processing and analysis. They write that “…diverse data sources, multiple data formats, and large data volumes” creates an information overload which current research is trying to address (1172). Here, specific attention is paid to data mining for terrorist activity, giving mention to a recent DARPA program in 2012 known as “XDATA” (1172). 
The XDATA program’s objective is to “…help develop computational techniques and software tools for processing and analyzing…” the massive amounts of information so it can be
  • 4. Maman 4 synthesized in an organized manner for the IC (1172). Whether the XDATA program is yielding success is still to be determined, but Chen et al. do an adequate job of addressing the current challenges for big data in the national security front. However, the DARPA project is the only mention of a potential mitigation to handling extremely large quantities of data sets. In this case, the solution, or mitigation seems to be introducing more effective algorithms. Nevertheless, the XDATA program will be one of continual interest and promise in this regard. Next, this paper brings in the findings and viewpoint of strategy analyst Stephane Lefebvre to once again emphasize the scalability issue in data related to threat detection. Written in 2004, Lefebvre mentions the point that intelligence analysts at a “military tactical level” can “…receive over 17,000 reports per hour from sensors alone” (Lefebvre 249). Of course, if the analyst cannot identify what is valuable then the data is useless (249). One can only imagine how much has changed for intelligence analysts in almost a decade. Lefebvre then discusses data storage, and how intelligence agencies, having massive amounts of data stores in databases, must have “…fast and accurate algorithms” to analyze the data (249). Later he discusses “pre- big data” projects by DARPA such as “Total Information Awareness” to collect large quantities of public and private data on all American citizens, as well as project Genoa (249-50). To once again emphasize the 2004 publication date of Lefebvre’s work, it’s important to note that at the time those DARPA programs were still in their infancy and being discussed. Lefebvre mentions how intelligence analysts are increasingly relying on technical support to handle reporting, but little concern is given to being overwhelmed with handling the data unlike with Chen et al. 
Lefebvre’s piece is still useful exposition in unraveling the nature of data collection and the importance of analytics. Next, this essay brings the work of Terrence Maxwell (2005) to illustrate further
challenges with data mining before big data and mass storage. He begins by identifying ‘data warehouses’ as “…massive databases” allowing analysts to access data from multiple databases (Maxwell 3). However, according to data warehousing experts,¹ the “possibility of error in data warehouse with multiple inputs and data collected over time is quite high” (5). This is framed in the context of 2005, but some of the factors behind that conclusion come from the ‘static’ data mining models of the DARPA TIA project, which assume that the patterns of individuals, in both beliefs and relationships, do not change over time (5). He goes on to say that it could be misguided for data mining models to attempt to catch terrorism by “detecting relevant relationships and patterns of activity that correspond to potential terrorist events, threats, or planned attacks” (6). The reason is that terrorists, who constantly want to remain hidden, are also likely to adapt and evolve their tactics; a problem that, Maxwell says, data mining developers acknowledge but “do not adequately respond to” (6). This is how false positives can arise, with benign activities mistaken for nefarious ones. Although Maxwell’s paper is from 2005, there is no evidence to indicate that the IC uses big data any differently from the pattern-seeking data mining models he describes. Maxwell’s paper is of strong interest to this field and contains many notable references that may be explored further. For now, this paper will move on to the final source and make its preliminary conclusions.

¹ Data Warehousing Center (2000). “An Informal Taxonomy of Data Warehouse Data Errors”. http://businessweek.itpapers.com/abstract.aspx?scid=1003&sortby=title&docid=6729

The final piece of academic literature to be examined is by Nancy Roberts (2011), which specifically identifies the challenges and opportunities of data mining for the IC. Roberts’s paper is particularly useful because it identifies the role of each agency in intelligence gathering, beginning with the CIA. The next section deals with particular challenges to data collection, several of which have already been mentioned, such as scalability, emphasizing that the “information glut” will get worse as the amount of data collected, and the storage available to hold it, continue to grow (Roberts 9). The next challenge identified is the ability to extract the pertinent information from massive data sets, again previously discussed (10). The information of note here is that Roberts substantiates the suspicions about the difficulty of collecting information on dark networks, in that “Data on terrorists is dynamic, not static” (11). Lastly, Roberts closes by introducing the concept of “visual analytics”, which is called “an emerging field dedicated to improving data collection and analysis through the use of computer-mediated visualization techniques and tools” (5). Roberts goes over the history of the visual analytics field and two prominent firms pioneering the technology. The area of visual analytics will be explored further to see whether it can be a potential solution to the analysis challenge in big data.

Conclusion
Literature on Big Data related specifically to the IC is indeed limited, but the research already gathered on big data’s relation to data mining seems sufficient to outline its current challenges in analytics. The most prominent of these are ever-growing data collection and the need to develop more sophisticated algorithms and programs for detecting relevant information. Detecting patterns of terrorist activity, as has been the standard for data mining, may not be the ideal method given the dynamic nature of dark network activity. Ultimately, more research is still needed on methods for mitigating the challenges posed by big data and, if possible, on how much promise those methods hold for the future.
References
Chen, Hsinchun, et al. “Business Intelligence and Analytics: From Big Data to Big Impact.” MIS Quarterly 36, no. 4 (2012): 1165-1188.
Lefebvre, Stephane. “A Look at Intelligence Analysis.” International Journal of Intelligence and CounterIntelligence 17 (2004): 231-264.
Maxwell, Terrence A. “Information Policy, Data Mining, and National Security: False Positives and Unidentified Negatives.” System Sciences. HICSS-38. 38th Hawaii International Conference on System Sciences (Jan. 2005): 1-8.
Michael, Katina, and Keith W. Miller. “Big Data: New Opportunities and New Challenges.” IEEE Computer Society 46, no. 6 (2013): 22-24.
Roberts, Nancy C. “Tracking and disrupting dark networks: Challenges of data collection and analysis.” Information Systems Frontiers 13, no. 1 (2011): 5-19.