This document summarizes research analyzing the topics and importance of services on the Tor network. It crawled over 1 million pages and analyzed 40,439 English-language pages from 3,347 Tor domains. Topic modeling found the most common topics were drugs, marketplaces, and gambling. Betweenness centrality identified directory domains as influential hubs. Eigenvector centrality established the Dream marketplace as the most structurally important "core" Tor service. The research concludes that Tor is used more for illicit marketplace activity than for open information discovery, and that directories are critical for connecting domains.
1. Topic and Importance Evaluation of Tor Services
Mahdieh Zabihimayvan (Zabihimayvan.2@wright.edu)
Advisor: Dr. Derek Doran
Feb 2019
2. Introduction
Dark Web and Tor
Research Question
Related Work
Dataset Collection & Processing
Topic Evaluation
Importance Evaluation
Conclusion & Future Work
3. World Wide Web
Deep Web: Content on the World Wide Web that cannot be, or has not yet been, indexed by search engines
Dark Web: A subset of the Deep Web that requires unique application-layer protocols and authorization schemes
Of great interest to parties wanting anonymity while using the web
4. Tor Network
The most popular Dark Web, compared with others such as I2P, Riffle, and Freenet
.onion
5. Work Done on the Tor Network
Prior work has focused on the information content of specific Tor pages and domains, such as:
drug trafficking
homemade explosives
terrorist activities
dark forums
general hackers
drugs and weapons
6. Research Question
How diverse is the information and the services provided on Tor?
What are the “core” services of Tor? Is there even a core?
7. Dataset Collection and Processing
1,236,433 distinct pages
150,473 domains
40,439 English pages belonging to 3,347 domains
10. Importance Evaluation
In graph theory and network analysis, indicators of centrality identify the most important vertices within a graph.
Betweenness centrality: measures the amount of influence a node has over the flow of information in a graph
Closeness centrality: based on the mean distance from a vertex to all other vertices
Eigenvector centrality: describes the structural importance of a node as a function of the importance of its neighboring nodes
12. Importance Evaluation (Eigenvector)
All domains with eigenvector centrality ≥ 0.2 are part of the Dream market.
This establishes the Dream market as the most structurally important, “core” service Tor provides.
The especially low eigenvector centralities of the other domains further underscore the Dream market’s structural importance in Tor.
13. Importance Evaluation (Closeness)
This outcome is likely the product of the directories HiddenWiki and TorWiki having high betweenness centrality, which places many pairs of domains only a few hops apart via these directories.
This underscores the central importance of directory domains in connecting Tor pages across domains, and the fact that Tor domains tend to remain undiscoverable without directories.
14. Conclusion and Future Work
Over half of all domains are site directories or marketplaces for buying and selling goods or services.
A small proportion of all Tor domains are used for information discovery or for cryptocurrency transactions using Bitcoin.
The surprisingly large percentage of gambling domains also suggests that people now use Tor to play online gambling games.
Our measurements identified the Dream market as perhaps the “core” service of Tor, as Dream market domains exhibit especially high eigenvector centralities.
Tor domains tend to remain undiscoverable without directories.
circumventing government censorship
releasing information to the public
sensitive communication between parties
private space to buy and sell goods and services.
Tor is free and open-source software for enabling anonymous communication. The name is derived from an acronym for the original software project name "The Onion Router".[8][9] Tor directs Internet traffic through a free, worldwide, volunteer overlay network consisting of more than seven thousand relays[10] to conceal a user's identity.
Onion routing is implemented by encryption in the application layer of a communication protocol stack, nested like the layers of an onion.
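The nesting idea can be illustrated with a toy sketch: the client wraps a message in one layer per relay, and each relay peels off only its own layer. This is an assumption-laden illustration, not Tor's cryptography. Real Tor negotiates a separate key with each relay when it builds a circuit, and its relay-cell encryption is defined in the Tor protocol specification. The SHA-256 keystream, `NONCE_LEN`, and the helper names `add_layer`, `peel_layer`, `wrap`, and `route` are hypothetical and provide no real security (for example, no authentication).

```python
import hashlib
import os

NONCE_LEN = 16  # bytes of fresh randomness per layer (toy parameter)


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks, truncated to `length` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def add_layer(key: bytes, data: bytes) -> bytes:
    """Encrypt `data` under one relay's key; the layer's nonce is prepended."""
    nonce = os.urandom(NONCE_LEN)
    stream = keystream(key + nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def peel_layer(key: bytes, onion: bytes) -> bytes:
    """Remove the outermost layer using one relay's key."""
    nonce, body = onion[:NONCE_LEN], onion[NONCE_LEN:]
    stream = keystream(key + nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def wrap(message: bytes, path_keys: list[bytes]) -> bytes:
    """Client side: apply the exit relay's layer first so the entry relay's is outermost."""
    onion = message
    for key in reversed(path_keys):
        onion = add_layer(key, onion)
    return onion


def route(onion: bytes, path_keys: list[bytes]) -> bytes:
    """Each relay along the path peels exactly one layer, in path order."""
    for key in path_keys:
        onion = peel_layer(key, onion)
    return onion


if __name__ == "__main__":
    keys = [os.urandom(32) for _ in range(3)]  # hypothetical entry, middle, exit keys
    onion = wrap(b"GET / HTTP/1.1", keys)
    print(len(onion))          # 62: a 14-byte message plus three 16-byte nonces
    print(route(onion, keys))  # b'GET / HTTP/1.1'
```

Because each layer carries its own nonce, the layers must be peeled in path order: peeling with the keys in the wrong order yields unreadable bytes rather than the message.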
Answers to such questions would shed light on the broad structure of the Tor dark web, yield a thorough understanding of the kinds of services and information available on Tor, and reveal the most popular and important (from a structural perspective) services it provides.
We present a quantitative characterization of the types of information available across a large swath of English-language Tor webpages.
We chose to only focus on English pages to facilitate our content analysis.
Our crawl encompasses over 1 million addresses, of which 150,473 are hosted on Tor and the remaining 840,527 point to the visible web. We focus on 40,439 Tor pages belonging to 3,347 English-language domains and augment LDA with a topic-labeling algorithm that uses a knowledge base (DBpedia) to assign semantically meaningful categories to the content on every crawled page.
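To make the LDA step concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling on tokenized pages. It is not the authors' implementation: the hyperparameters (`alpha`, `beta`, `iters`, `seed`), the function name `lda_gibbs`, and the toy tokens are assumptions, and the DBpedia-based topic-labeling step is not shown.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    docs: list of token lists (one list per page).
    Returns (doc_topic, topic_word): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_word[k][w] is the probability of word w in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    topics = range(num_topics)

    # Count tables maintained by the sampler.
    n_dk = [[0] * num_topics for _ in docs]               # doc d, topic k
    n_kw = [defaultdict(int) for _ in topics]             # topic k, word w
    n_k = [0] * num_topics                                # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional: p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in vocab}
        for k in topics
    ]
    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical toy "pages" after tokenization.
    pages = [
        ["bitcoin", "escrow", "vendor", "shipping"],
        ["casino", "poker", "bet", "bitcoin"],
        ["vendor", "escrow", "shipping", "stealth"],
        ["poker", "dice", "casino", "bet"],
    ]
    doc_topic, topic_word = lda_gibbs(pages, num_topics=2)
    for k, dist in enumerate(topic_word):
        print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Each page ends up with a topic mixture rather than a single label; a labeling step such as the DBpedia-based one described above would then map each topic's top words to a human-readable category.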
Tor’s main utility for users may be to browse information and shop on marketplaces that require secrecy.
Dream market: a marketplace that sells a variety of products, including drugs, stolen data, and counterfeit consumer goods, all paid for in cryptocurrency.
In contrast, domains related to the free exchange of ideas and information (a powerful and positive use case of Tor, particularly for users in countries facing Internet censorship [9]), represented by Forum, Email, and News sites, account for just 23.66% of all domains.
Gambling domains make up a surprisingly large share (10.19%) of all domains, suggesting that people may now be turning to Tor to play online gambling games that are otherwise illegal to host in many countries around the world.
Finally, and perhaps surprisingly, Bitcoin and multimedia domains constitute the smallest and second-smallest proportions of English Tor domains, respectively.
The betweenness centrality of a node $v$ is $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
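As a sketch of how this sum (over node pairs, of the fraction of shortest paths that pass through $v$) can be computed without enumerating paths, the code below implements Brandes' algorithm for unweighted graphs. It is an illustration under stated assumptions, not the study's code: the graph is a plain adjacency dict, scores are left unnormalized, and the toy "directory" graph and the function name `betweenness_centrality` are hypothetical.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted graphs.

    graph: dict mapping each node to a list of the nodes it links to. For an
    undirected graph, list every edge in both directions; each unordered pair
    is then counted twice (as s->t and t->s).
    Returns the unnormalized C_B(v) = sum over s != v != t of sigma_st(v) / sigma_st.
    """
    nodes = set(graph) | {w for nbrs in graph.values() for w in nbrs}
    cb = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []                      # nodes in non-decreasing distance from s
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


if __name__ == "__main__":
    # Hypothetical toy graph: one directory linked (both ways) to four domains.
    leaves = ["a", "b", "c", "d"]
    graph = {"directory": leaves, **{leaf: ["directory"] for leaf in leaves}}
    scores = betweenness_centrality(graph)
    print(scores["directory"])  # 12.0: all 4 * 3 ordered domain pairs route via it
    print(scores["a"])          # 0.0
```

In this toy graph every shortest path between two domains runs through the directory, which is the pattern the slides attribute to HiddenWiki and TorWiki.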
An attack, removal, or failure of such directories may thus directly impact the number of Tor domains reachable by a casual browser exploring this dark web.
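One way to make this concern measurable is to count how many domains a browser can reach by following links from an entry page, before and after some directories go offline. The sketch below does this with breadth-first search. It is an illustration, not an analysis from the study: the link structure, the entry node `start`, and the function name `reachable` are hypothetical, and only the directory names echo the ones mentioned earlier.

```python
from collections import deque


def reachable(graph, start, removed=frozenset()):
    """Return the set of nodes reachable from `start` by following links (BFS),
    never entering a node in `removed` (for example, a directory that went offline)."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in seen and w not in removed:
                seen.add(w)
                queue.append(w)
    return seen


if __name__ == "__main__":
    # Hypothetical toy link graph: an entry page that only links to two directories.
    links = {
        "start": ["hiddenwiki", "torwiki"],
        "hiddenwiki": ["market", "forum", "casino"],
        "torwiki": ["market", "email"],
        "market": ["forum"],
    }
    # Prints 6, then 4, then 0 reachable nodes (the start page is excluded).
    for down in [set(), {"hiddenwiki"}, {"hiddenwiki", "torwiki"}]:
        n = len(reachable(links, "start", removed=down)) - 1
        print(sorted(down), n)
```

Removing one directory hides the domains that only it links to; removing both leaves the entry page with nothing to reach.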
Eigenvector centrality [12] describes the structural importance of a node as a function of the importance of its neighboring nodes. The eigenvector centrality of vertex $v_i$ is given by the $i$-th component of the eigenvector of the adjacency matrix $A$ corresponding to its largest eigenvalue.
We find a heavily skewed distribution of eigenvector centralities where a majority are close to zero.
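The leading eigenvector of $A$ can be approximated by power iteration. The sketch below assumes an undirected, connected graph stored as an adjacency dict and normalizes scores to unit Euclidean length. That normalization is an assumption: a threshold such as 0.2 only makes sense relative to a particular normalization. The function name, `max_iter`, `tol`, and the toy graph are hypothetical.

```python
import math


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of an undirected, connected graph by power iteration.

    graph: dict mapping each node to its neighbors (every edge listed both ways).
    Iterates x <- (A + I) x and rescales to unit Euclidean norm. Adding I keeps
    the same eigenvectors as A but avoids the oscillation that plain power
    iteration shows on bipartite graphs such as stars.
    """
    nodes = list(set(graph) | {u for nbrs in graph.values() for u in nbrs})
    x = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(max_iter):
        new = dict(x)                    # the "+ I" term: keep each node's own score
        for v in nodes:
            for u in graph.get(v, ()):
                new[v] += x[u]           # the "A x" term: add neighbors' scores
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in nodes) < tol * len(nodes):
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


if __name__ == "__main__":
    # Hypothetical toy graph: a tightly linked cluster m1..m4 plus a sparse tail.
    graph = {
        "m1": ["m2", "m3", "m4"], "m2": ["m1", "m3", "m4"],
        "m3": ["m1", "m2", "m4"], "m4": ["m1", "m2", "m3", "x"],
        "x": ["m4", "y"], "y": ["x"],
    }
    scores = eigenvector_centrality(graph)
    core = sorted(v for v, c in scores.items() if c >= 0.2)
    print(core)
```

Because a node's score is fed by its neighbors' scores, a densely interlinked cluster such as a marketplace's many domains reinforces itself, while sparsely linked domains stay near zero, consistent with the skewed distribution described above.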
Closeness centrality measures the mean distance from a vertex to other vertices.
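A minimal sketch of this measure, assuming hop counts on an unweighted graph: compute breadth-first distances from the vertex, average them, and (following one common convention) take the reciprocal so that vertices closer to everything score higher. Skipping unreachable vertices is an assumption, one of several conventions for disconnected graphs; the function name `closeness` and the toy graph are hypothetical.

```python
import math
from collections import deque


def closeness(graph, v):
    """Return (mean_distance, closeness) of node v, using BFS hop counts.

    mean_distance averages the shortest-path length from v to every other node
    it can reach; closeness is its reciprocal. Unreachable nodes are skipped.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != v]
    if not others:
        return math.inf, 0.0   # v reaches nothing
    mean = sum(others) / len(others)
    return mean, 1.0 / mean


if __name__ == "__main__":
    # Hypothetical toy graph: four domains connected only through one directory.
    leaves = ["a", "b", "c", "d"]
    graph = {"directory": leaves, **{leaf: ["directory"] for leaf in leaves}}
    print(closeness(graph, "directory"))  # (1.0, 1.0)
    print(closeness(graph, "a"))          # mean distance 1.75: 1 hop to the directory, 2 to each other domain
```

In this toy graph, any two domains are at most two hops apart, but only because the directory sits between them; this mirrors the explanation given for the closeness results.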