The Terrorism Knowledge Portal: Advanced Methodologies for Collecting and Analyzing Information from the ‘Dark Web’ and Terrorism Research Resources
Presentation Transcript

  • The Terrorism Knowledge Portal: Advanced Methodologies for Collecting and Analyzing Information from the “Dark Web” and Terrorism Research Resources. By: Hsinchun Chen, PhD, with Wingyan Chung; Cathy Larson; Edna Reid, DLS; Wei Xi; Alfonso Bonillas; Chun Yin; Yu Su and Greg Lai; The University of Arizona. August 14, 2003. We gratefully acknowledge support from the National Science Foundation.
  • “There is a potential here to reach millions [Black said of the Internet]. I think it’s a major breakthrough. I don’t know if it’s the ultimate solution to developing a white rights movement in this country, but it’s certainly a significant advance.” -- Don Black, white supremacist, KKK, launched Stormfront, the first extremist hate web site on WWW. “Up until now, unless someone met me personally, or read my material, the only way they could judge me is by what the liberal-biased media says. Now, that situation has changed. Millions of people are going online in America. Now, if they want to find out about me and my ideas and issues all they have to do is go into one of the search engines and search for “David Duke.” Hundreds [actually about 1,080,000] of sources will show up.” -- David Duke, Knights of the KKK, in “The Coming White Revolution – Born on the Internet”
  • “Terrorist groups are increasingly using new information technology and the Internet to formulate plans, recruit members, communicate between cells and members, raise funds, and spread propaganda. Terrorist groups including Hizballah, the Abu Nidal Organization, and bin Laden’s Al-Qaeda Organization are using computerized files, email, and encryption to support their operations.” – Louis J. Freeh, Director, FBI, Congressional Statement, “Threat of Terrorism to the United States,” 2001
  • Agenda
    • Background
    • Review of the field
    • Data Collection and Data Analysis
    • Methodology for Creating Testbed
    • Research Issues
    • Timeline
  • Background
  • Misuse of the Internet
    • The Internet has evolved to be a global platform for anyone to use in disseminating, sharing, and communicating ideas
    • Misuse of the Internet has become increasingly serious
      • Spreading of viruses, pornographic pictures and misleading information
      • Web hosting of terrorists, extremist groups, hate groups, racial supremacy groups
    • Due to the global threat of terrorism, it is important to obtain intelligence from the Web to enable better understanding and analysis of these terrorist groups
      • “Dark Web”: The alternate side of the Web which is used by terrorist and extremist groups to spread their ideas
  • Important Definitions
    • Terrorism: “Premeditated, politically motivated violence perpetrated against noncombatant targets by subnational groups or clandestine agents, usually intended to influence an audience”
    • International terrorism: “Terrorism involving citizens or the territory of more than one country”
    • Terrorist group: “Any group practicing, or that has significant subgroups that practice, international terrorism”
    Source: U.S. State Dept., Patterns of Global Terrorism 2002. “The US Government has employed this definition of terrorism for statistical and analytical purposes since 1983.”
  • Intelligence from the Dark Web
    • Search engines provide searching and browsing of Dark Web information
      • Properly collected and analyzed information can be transformed into intelligence and knowledge
      • Link search can be used with some search engines to collect information about the social community of terrorist groups
    • Bulletin boards allow terrorist and extremist groups to promulgate their ideas
      • The ideologies of those groups and the actions they take can be traced
      • Members of these groups provide links and self-descriptions
    • Internet archives store valuable collections of many Web sites, including the Dark Web
      • Evolution of terrorists’ or criminal organizations’ Web sites can be traced
  • Problems
    • Despite the availability of information on the Web, several problems prevent effective and efficient discovery of Dark Web intelligence
      • Information overload on the Web
      • Related information tends to be scattered in different sources, making it hard to obtain a big picture
      • Most information is “about” the Dark Web (such as news articles, government Web pages); a much smaller percentage is developed by terrorist and extremist groups themselves
      • Data posted on the Web are not persistent and may be misleading
      • Language and translation problems
    • There has been no general methodology for collecting and analyzing Dark Web information
  • The Terrorism Knowledge Portal
    • Goals:
    • Develop a common understanding of the “Dark Web” as it pertains to the use of the Internet by terrorist and extremist groups
    • Identify the best methodologies for discovering links between groups and individuals
    • Test the hypothesis that the methods proposed here will be effective in revealing the social networks of terrorists and can further the work of preventing future terrorist acts
  • Review of Terrorism Research Information and Terrorist Groups
  • Review
    • Three areas are reviewed:
      • Information about terrorists and terrorism
      • Domestic terrorist groups
      • International terrorist groups
    • Scope – where useful information can be found:
    • Government information resources such as the FBI web site, the Department of State, government news sites, etc.
    • Anti-hate organizations’ publications and web reports, such as those of the Anti-Defamation League
    • Research and other organizations’ web sites, databases and reports, such as those of the Memorial Institute for Prevention of Terrorism
    • Individuals’ web sites, such as The Hate Directory
    • Commercially licensed news databases, archives and newspapers such as Lexis-Nexis, the Washington Post, etc.
    • Commercially licensed academic databases, such as Sociological Abstracts, Criminal Justice Abstracts, etc.
    Information about Terrorism
  • A Few Examples
    • State Dept. “Patterns of Global Terrorism” (report of terrorist activity)
    • Terrorism Research Center, http://www.terrorism.com/ (research organization)
    • Anti-Defamation League (issues, news releases, and other resources on terrorism topics)
      • http://www.adl.org/adl.asp and http://www.adl.org/ict/default.asp
    • Terrorism Project (list of known terrorist organizations) http://www.cdi.org/terrorism/terrorist-groups.cfm
    • Memorial Institute for Prevention of Terrorism (MIPT) (located in Oklahoma) http://db.mipt.org/ (database of incidents, etc.)
    • Literally millions of individual resources exist – searching them all is painstaking and time-consuming
  • From the Dark Web: Domestic Terrorist Groups
    • KKK: http://www.americanknights.com/ - Dozens of affiliated groups (such as domestic and international KKK chapters and affiliated organizations) identified through Yahoo newsgroup member profiles, back-linking, and standard search engines
    • ALF / ELF: http://www.animalliberationfront.com/ and http://www.earthliberationfront.com/main.shtml – Found over 200 results through back-linking (news portals, personal pages, environmentalist web sites, etc.)
    Numerous groups have openly posted their web sites to the public web, but many more are not so easily accessible
  • From the Dark Web: International Groups
    • Palestinian Islamic Jihad
      • Searched Google using group’s name
      • Browsed results following those likely to be Web pages or sites maintained or developed by terrorist groups
      • Examined both top- and lower-ranked results
      • Access to the top-level page was blocked; pages below the top level remained accessible
      • WHOIS database showed domain name QUDSWAY.COM and registrant Khayat, Radwan (QUDSWAY3-DOM) & address
  • From the Dark Web: International Groups
    • Aum Supreme Truth (aka, Aleph)
      • Searched standard search engines, newsgroups, and news portals to identify official web site: http://www.aleph.to
      • Employed “Find web pages that are similar to …” & “Find web pages that link to www.aleph.to”
      • Results:
        • News groups that have posted relevant messages
        • Websites hosted by the group itself (mostly in Japanese)
        • Research institutes and researchers that conducted research on this group and their relevant publications
        • A keyword list
  • From the Dark Web: Findings
    • Information about each group that was gathered included:
      • The group’s mission, beliefs, slogans, and methods
      • Geographical location and branches, if any
      • Group members’ names
      • In-links and out-links
      • History of major incidents (dates, places, event-type, etc.)
    • This information was found through both information sources about terrorism and Dark Web sources
  • Data Collection and Data Analysis for Testbed Creation: An Overview and Analysis of Potential Methods
  • Purpose
    • Create testbeds supporting terrorism research:
      • The Terrorism Knowledge testbed (information about terrorists and groups)
      • The Dark Web testbeds (information provided on the web by groups themselves)
      • See more information in section “Proposed Methodology”
    • Apply artificial intelligence techniques (such as self-organizing maps, concept space, and mutual information) to social network analysis in order to decipher the terrorist social milieu and communication channels
  • Data Collection Methods (1)
    • Spidering of major terrorist Web sites
      • Pros: Good spidering tools are readily available for creating large collections
      • Cons: Substantial human work is needed to identify seed URLs
        • A test completed using Palestinian Islamic Jihad’s Web site resulted in over 35,000 Web pages (in Arabic) being collected: the Web is a major means of terrorists’ communication
    • Meta searching on major search engines
      • Pros: Can get highly relevant Web pages
      • Cons: Most results convey pro-Western ideas; substantial human work is needed to find query terms and filter out irrelevant results
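The spidering method above can be sketched as a breadth-first crawl that starts from human-selected seed URLs and stays on the seeds' hosts. This is a minimal illustration under stated assumptions, not the tool used in the project: the `crawl` function, the `fetch` callable, and the example.org pages are hypothetical, and an in-memory page table stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl limited to the hosts of the seed URLs.

    `fetch` is a hypothetical callable mapping a URL to an HTML string
    (or None when the page is unavailable).
    Returns a dict mapping each visited URL to its HTML.
    """
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop fragments.
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc in allowed_hosts and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical in-memory site used in place of network access.
SITE = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="http://other.example/">X</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.org/b.html": "<p>leaf page</p>",
}

collected = crawl(["http://example.org/"], SITE.get)
print(sorted(collected))
```

Restricting the crawl to the seed hosts is one way to keep a collection domain-specific; the off-site link is recorded nowhere here, although a real collector might log it as an outlink for later analysis.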
  • Data Collection Methods (2)
    • Back link search in Google and AltaVista
      • Pros: Provides information about which Web pages are pointing to a certain Web site
      • Cons: Only a limited number of search engines provide this service, thus coverage is not complete
      • Implication: The back links can be used to trace the relationships between a terrorist Web site and other entities
      • Further analysis is needed to explain why the terrorist Web site is pointed to by others, who may be allies or who may not share the same belief system
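Where a search engine's back link service is unavailable or incomplete, back links can also be derived from pages already collected by inverting their outlinks. The sketch below assumes the outlinks have already been extracted; the `invert_links` function and the site names are hypothetical.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Turn {page: [pages it links to]} into {page: sorted pages linking to it}."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            if target != source:  # ignore self-links
                inlinks[target].add(source)
    return {page: sorted(sources) for page, sources in inlinks.items()}


# Hypothetical outlink data for four sites.
OUTLINKS = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-c"],
    "news-1": ["site-c", "site-a"],
    "site-c": [],
}

back_links = invert_links(OUTLINKS)
print(back_links["site-c"])
```

As the slide notes, a back link only shows that a page points to a site; whether the linking page is an ally, a critic, or a news source still has to be judged separately.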
  • Data Collection Methods (3)
    • Group and Personal profile search on Yahoo Member Directory
      • Search the directory using keywords (e.g., Abu Sayyaf)
      • Pros: A large number of personal profiles, as well as their interests and links, can be obtained
      • Cons: Much irrelevant information will be found
      • Cons: Reliability of the information is uncertain
      • Automatic extraction of personal profiles and links related to Abu Sayyaf, Palestinian Islamic Jihad, and White Supremacy was tested; many irrelevant links were found
  • Data Collection Methods (4)
    • Messages on bulletin board systems
      • A large number of messages can be found - but it is difficult to identify those posted by terrorists/extremists
    • Web site searching on Internet Archive based on time and URLs
      • Can trace the evolution of Web sites over time
      • Archiving of terrorist Web sites is infrequent
  • Other methods
    • Use records from terrorism databases
      • Data are highly structured
      • Lack of personal profile and Web page outlink information
    • Reports from U.S. Government (e.g., State Dept.)
      • Detailed and usually up to date
      • Present only a pro-Western view
    • Academic databases
      • Well-researched, valid sources
      • Commercial licensing may inhibit use
      • Usually pro-Western view
  • Data Analysis (1)
    • Extraction of key words and phrases from terrorist Web sites
      • For building a Dark Web lexicon (e.g., organizations’ and persons’ names, special slogans or expressions)
      • For identifying key trends in terrorism
    • Extraction of outlinks from and searching for inlinks to terrorist Web pages
      • For showing the community network of terrorist organizations
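A simple starting point for keyword and phrase extraction is to count single terms and adjacent word pairs after removing stopwords. The sketch below is illustrative only: the stopword list, the function names, and the sample sentence are assumptions, and a real lexicon would need language-specific tokenization for Arabic or Japanese pages.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(text):
    """Count tokens that are not stopwords."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)


def phrase_counts(text):
    """Count adjacent word pairs in which neither word is a stopword."""
    words = tokenize(text)
    return Counter(
        (a, b)
        for a, b in zip(words, words[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    )


SAMPLE = "The front issued a statement. The front statement named the front leaders."

terms = term_counts(SAMPLE)
phrases = phrase_counts(SAMPLE)
print(terms.most_common(2))
```

Frequent terms and pairs are only candidates for the lexicon; a human reviewer would still decide which ones are names, slogans, or noise.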
  • Data Analysis (2)
    • Calculate social network measures to describe community behavioral patterns
    • Mapping of terrorist Web sites and pages
      • Provides visualization of terrorist groups on the Web
    • Temporal visualization
      • To allow exploration of terrorist group evolution
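The slides do not say which social network measures are meant, so the sketch below assumes two common ones, graph density and degree centrality, computed over an undirected link graph. The `network_measures` function and the edge list are hypothetical.

```python
def network_measures(edges):
    """Return (density, degree centrality per node) for an undirected graph.

    `edges` is an iterable of (node, node) pairs; self-loops are ignored
    and repeated edges are counted once.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return 0.0, {node: 0.0 for node in neighbors}
    edge_count = sum(len(adj) for adj in neighbors.values()) // 2
    # Density: observed edges divided by the n * (n - 1) / 2 possible edges.
    density = 2 * edge_count / (n * (n - 1))
    # Degree centrality: share of the other nodes a node is linked to.
    centrality = {node: len(adj) / (n - 1) for node, adj in neighbors.items()}
    return density, centrality


# Hypothetical link graph among four Web sites.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

density, centrality = network_measures(EDGES)
print(round(density, 3), max(centrality, key=centrality.get))
```

In this toy graph, site A links to every other site and so has the highest degree centrality, which is the kind of hub a community analysis would flag for closer inspection.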
  • Proposed Methodology
  • An Automatic Text Mining Approach
    • The objective of developing the methodology is to automate the process of collecting and analyzing Dark Web information
      • This will increase the efficiency and effectiveness of analyzing terrorist behavior
    • An automatic text mining approach will be used that integrates the various collection and analysis methods
      • Text is the most common medium for conveying ideas, and most information collected is in textual format
      • No single method is enough to collect comprehensive information; a combination of methods is needed
  • [Diagram] Information Sources, Collection Methods, Filtering, Data Storage
    • Information Sources: The Web; Dark Web (Hate Groups | Racial Supremacy | Suicidal Attackers | Activists / Extremists | Anti-Government | …); Terrorist Group Web sites; Search Engines; Terrorism databases; Government information
    • Collection Methods: Automatic Spidering; Back link search; Personal Profile Search; Meta Searching; Downloading from Gov’t Web sites
    • Filtering
    • Data Storage: Domestic Terrorism; International Terrorism; Terrorism research information
  • Testbeds
    • Using the proposed approach, the following pilot testbeds will be created in order to validate the methodology:
      • Testbed for terrorism research information
        • News articles, government press releases, terrorism research information, academic databases, etc.
      • Dark Web testbed for domestic terrorist groups
        • KKK groups, Animal Liberation Front, ELF
      • Dark Web testbed for international terrorist groups
        • Palestinian Islamic Jihad, Aum Supreme Truth (aka, Aleph)
    The basic unit in each testbed is a Web page, article or report
  • Plan for Developing the Testbed
    • Human identification
      • Seed URLs for automatic spidering and back link search
      • Keywords for personal profile search
      • Filtering rules and lexicon of stopwords
    • Web page collection
      • Domain-specific spidering on terrorist sites
      • Inlink and outlink pages of terrorist sites
      • Web pages of news articles and terrorism research information
    • Analysis techniques
      • Basic searching by keywords, entity extraction (mutual information), classification, categorization, visualization, network analysis
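The slides name mutual information as the basis for entity extraction but give no formula. One common variant, assumed here, is pointwise mutual information (PMI) over adjacent word pairs: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The `pmi_scores` function, the `min_count` threshold, and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    Pairs seen fewer than `min_count` times are skipped because PMI
    is unreliable for rare events.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = len(tokens) - 1
    scores = {}
    for (x, y), count in pair_counts.items():
        if count < min_count:
            continue
        p_xy = count / n_pairs
        p_x = word_counts[x] / n_words
        p_y = word_counts[y] / n_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


TOKENS = "new york is big and new york is old but the city is new".split()

scores = pmi_scores(TOKENS)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

A positive score means the two words occur together more often than their individual frequencies would predict, which makes the pair a candidate multi-word entity.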
  • Research Issues
    • Question: Will our methodology be effective in revealing the social networks of terrorists and hence further the work of preventing future terrorist acts?
    • Challenges:
      • Combining human knowledge and machine efficiency to collect information from the Dark Web
      • Filtering out irrelevant information from a vast amount of terrorism data
      • Analyzing and visualizing the data and information collected in order to create intelligence
    • Assessment:
      • How effective is the testbed in advancing the field of security informatics? What are appropriate measures and indicators?
  • Challenges
    • Validating the data and information that is collected
    • Translation needed for pages found in Arabic, Japanese and other languages
    • Gaining access to sites through standard search engines
    • Access blocked to sites’ top pages (e.g., to U.S. IPs)
    • Potential personal risk involved in viewing these sites
  • Current Progress
  • Current Progress
    • We are in the process of building the three pilot testbeds:
      • Testbed for terrorism research information
      • Dark Web testbed for domestic terrorist groups
      • Dark Web testbed for international terrorist groups
  • Terrorism Knowledge Portal (1)
    • A prototype of our Terrorism Knowledge Portal has been finished
      • It contains the testbed for terrorism research information
    • The portal provides searching and browsing capabilities
    • Specific functionalities include
      • Search by words and/or phrases, Boolean search
      • Automatic term suggestion (Scirus, Concept Space)
      • Browsing support: Summarization, Categorization and Visualization
      • Resource page on Terrorism Research Information
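Keyword and Boolean search of the kind listed above is typically built on an inverted index that maps each term to the documents containing it. The sketch below is an assumption about how such a search could work, not a description of the portal's implementation; the function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Hypothetical document collection.
DOCS = {
    1: "State Department report on global incidents",
    2: "Research institute database of incidents",
    3: "News report on research funding",
}

index = build_index(DOCS)
print(search_and(index, "report", "incidents"))
print(search_or(index, "database", "news"))
```

Phrase search would additionally need term positions in the index; that is left out here to keep the sketch short.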
  • Terrorism Knowledge Portal (2)
    • Our testbed consists of Web pages collected from high quality information sources related to the terrorism domain
      • 37 seed URLs were carefully selected as starting domains for Web page spidering
      • Over 360,000 Web pages were collected
    • In addition, nine high-quality information sources were selected for meta searching
      • Four categories: Terrorism Databases, Research Institutes, Government Web sites, and News
    • Link to Demo
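Meta searching across several sources requires merging their ranked result lists and removing duplicates. The slides do not describe how the portal does this, so the sketch below assumes a simple reciprocal-rank score and a light URL normalization; the `normalize` and `merge_results` functions and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to lowercase host plus path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def merge_results(result_lists):
    """Merge ranked URL lists; each hit adds 1/rank to the URL's score."""
    scores = {}
    first_seen = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    # Highest score first; ties broken alphabetically for a stable order.
    ordered = sorted(scores, key=lambda key: (-scores[key], key))
    return [first_seen[key] for key in ordered]


# Hypothetical ranked results from two sources.
SOURCE_A = ["http://example.org/report", "http://example.org/news/"]
SOURCE_B = ["http://EXAMPLE.org/news", "http://example.net/db"]

merged = merge_results([SOURCE_A, SOURCE_B])
print(merged)
```

A page returned by more than one source rises in the merged list, which loosely reflects the idea that agreement between high-quality sources signals relevance.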
  • Dark Web testbed: Domestic Terrorist Groups
    • The following domestic terrorist groups have been selected:
      • Animal Liberation Front
      • Earth Liberation Front
      • Ku Klux Klan
    • Various analyses have been done:
      • Web site search, Yahoo! personal profile search
      • Back link search
    • Web page collection will be done next
  • Dark Web testbed: International Terrorist Groups
    • The following groups have been studied:
      • Palestinian Islamic Jihad
      • Aum Supreme Truth
      • Abu Sayyaf Group
      • Hizbollah
    • Information about their aims, histories, activities, and chief leaders has been studied
    • Web page collection will be done next
      • Automatic analyses can be provided on the information collected
  • Contact: Hsinchun Chen [email_address] For more information, please visit http://ai.eller.arizona.edu/