Pac sec2011 ruoando-nict-2011-11-09-01-eng
1. Rapid and Massive Monitoring of
DHT: Crawling 10 Million Nodes in
24 Hours
PacSec Tokyo November 2011
In this presentation, we present our high-speed DHT crawler,
which monitors 10 million nodes in 24 hours!
Ruo Ando
NICT National Institute of Information and Communications Technology
Takayuki Sugiura
NetAgent Co. Ltd.
2. Overview:
detecting illegal use in a huge network
• BitTorrent has become an irreplaceable network application for
distributing software and content. But...
• No one knows its exact scale and dynamics!
How many nodes join and leave the BitTorrent network in 24
hours?
• The BitTorrent network is huge, and no one knows where
(potential) security incidents and illegal use have
occurred!
• We have tackled the challenge of monitoring this largest-scale
network using our rapid and massive DHT crawler.
• We have succeeded in obtaining 10,000,000 nodes in 24 hours!
• A visualization of the dynamics of the BitTorrent network is also
presented!
PacSec 2011
4. BT: the largest file-sharing network in the world
It is estimated that BitTorrent has 70 million active
users and 100 million total users, and the numbers are still
increasing!
5. BitTorrent is now expanding and is everywhere!
●BitTorrent in portable USB storage devices
http://www.iodata.com/
●Android: BitTorrent Client | aBTC
Available for about $5!
https://market.android.com/
6. The old new problem: illegal content downloads
BitTorrent is one of the most efficient ways to
share large files such as operating system ISO images.
Unfortunately, BT is at the same time a very efficient way to
illegally download copyright-protected content such as movies and
music.
The biggest BitTorrent case:
In 2010, the US Copyright Group (USCG) said that
23,322 IP addresses had allegedly infringed the copyright of the movie
The Expendables. Settlements are around $3,000 per
infringement.
7. The case of LimeWire, October 2010
●In October 2010, a New York judge ordered
LimeWire to shut down its file-sharing software.
The US federal court judge ruled that LimeWire’s
service was being used as a tool for
infringing copyrighted content.
●Soon afterwards, a new version of LimeWire called
LPE (LimeWire Pirate Edition) was released
as a resurrection by anonymous creators.
8. Right to be deleted or forgotten?
November 2010: The EC announced a plan setting
out a strategy to strengthen EU data protection
rules.
People in the EU basically recognize the current
pervasive use of BitTorrent and regard its potential
as promising. They would also like
BitTorrent to be used legally.
9. Dot-P2P
Domain seizures and BT-based DNS
●June 2010: WikiLeaks leverages torrents and magnet links for
distributing files.
●U.S. Immigration and Customs Enforcement (ICE) seizes the
domain of a BT meta-search engine.
●The U.S. proposed the Combating Online Infringement and Counterfeits Act
(COICA), which would allow the Department of Justice to order the
domain registrar to take a domain offline. COICA would
increase the government’s censorship powers.
●In direct response to the domain seizures by US authorities, the Dot-P2P
project proposes a DNS service independent of ICANN and ISPs.
●In the Dot-P2P system, a request for the .p2p TLD is redirected to a locally
hosted DNS database. The traffic is encrypted and sent according to
the BitTorrent protocol, so the .p2p TLD is decentralized
and independent of ICANN or any ISP’s DNS service.
10. BitTorrent History
Bram Cohen started implementing BitTorrent
in 2001.
He released the client software in 2003.
In 2003, a user in the EU released an ISO image of
Red Hat, and the image was downloaded 30,000 times in 3
days.
In 2004, he formed BitTorrent Inc., and by mid-2005
BitTorrent Inc. was funded by venture capital.
11. BitTorrent traffic estimates
① “55%” - CableLabs
About half of the upstream traffic of CATV.
② “35%” - CacheLogic
“LIVEWIRE - File-sharing network thrives beneath
the Radar”
③ “60%” - documents at www.sans.edu
“It is estimated that more than 60% of the traffic on
the internet is peer-to-peer.”
12. Basic architecture of a tracker network
① Ask
Node A (a newcomer) asks the
tracker to search for the file.
② Torrent download
The tracker provides the torrent file.
③ Join
Node A queries node B.
④ Download
Node A can download pieces
of the file from the swarm.
A seeder has the complete file.
A leecher has pieces of the file.
13. BitTorrent Network
tracker or DHT (trackerless)
Tracker – a dedicated machine that stores torrent files and
keeps track of which nodes are downloading and uploading.
DHT – a decentralized network architecture that shares the
functionality of the tracker. DHT is decentralized, but is
more scalable than pure P2P.
A DHT (Distributed Hash Table) is a method that uses <key,value>
pairs. The DHT lookup method enables us to discover the
location of the node that shares the tracker's responsibility
for a file share.
Recently the DHT network has received much attention due to the Dot-P2P
project and The Pirate Bay’s confirmation that it is stopping its tracker.
14. DHT Protocol
●DHT is not a new spec
Introduced in Azureus (2005) and BitComet (2005).
●Based on Kademlia, an XOR-based DHT
Petar Maymounkov and David Mazières. Kademlia: A peer-
to-peer information system based on the XOR metric. In
Proceedings of the 1st International Workshop on Peer-to-Peer
Systems (IPTPS '02)
●Supported by many client apps.
uTorrent 1.8.5, Vuze 4.3.0.2, BitTorrent 6.3,
BitComet 1.16, Transmission 1.76
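The XOR metric behind Kademlia can be illustrated with a minimal sketch. The helper names `xor_distance` and `k_closest` are hypothetical, and the node IDs are derived from SHA-1 only to obtain 160-bit sample values; this is not the crawler's code.

```python
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of two equal-length IDs, read as an unsigned integer."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def k_closest(target: bytes, node_ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return the k known node IDs closest to target under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]

# Hypothetical 160-bit IDs, generated only for illustration.
ids = [hashlib.sha1(f"node-{i}".encode()).digest() for i in range(100)]
target = hashlib.sha1(b"some torrent").digest()
closest = k_closest(target, ids)
```

With K = 8, `closest` holds the eight sample IDs that a node would return for a lookup of `target`.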
15. DHT Protocol
●Magnet links are URLs that enable each
node to download and/or distribute content
without querying a tracker site.
●Magnet links are provided by The Pirate Bay and
Mininova to speed up downloads (the info hash is base32
or hex encoded).
●2010: The Pirate Bay moves to magnet-link-oriented
DHT, shutting down its server.
●Do magnet links make the BitTorrent network
trackerless?
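As a sketch of how a client might read the info hash out of a magnet link, the function below handles both the hex and the base32 form mentioned above. The function name `infohash_from_magnet` is hypothetical, and only the `xt=urn:btih:` field is considered.

```python
import base64
from urllib.parse import urlparse, parse_qs

def infohash_from_magnet(uri: str) -> bytes:
    """Extract the 20-byte info hash from a magnet URI's xt=urn:btih: field."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    for xt in parse_qs(parsed.query).get("xt", []):
        if xt.lower().startswith("urn:btih:"):
            encoded = xt[len("urn:btih:"):]
            if len(encoded) == 40:   # hex form: 40 characters
                return bytes.fromhex(encoded)
            if len(encoded) == 32:   # base32 form: 32 characters
                return base64.b32decode(encoded.upper())
    raise ValueError("no BitTorrent info hash found")

# Sample value for illustration only, not a real torrent.
sample_hash = bytes(range(20))
hex_link = "magnet:?xt=urn:btih:" + sample_hash.hex() + "&dn=example"
b32_link = "magnet:?xt=urn:btih:" + base64.b32encode(sample_hash).decode()
```

The resulting 20 bytes are the key that a node looks up in the DHT instead of asking a tracker.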
16. DHT Protocol
The DHT network is a scalable architecture for file-sharing systems.
Pure P2P: hundreds of thousands of nodes
DHT: millions of nodes
The BitTorrent DHT network is implemented over KRPC. The KRPC
protocol is an RPC mechanism over UDP.
DHT queries come in four kinds of message: ping, find_node,
get_peers and announce_peer. Each is encoded
using bencoding.
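To make the KRPC message format concrete, here is a minimal bencoding sketch that builds a ping query. The function names `bencode` and `ping_query` are hypothetical; the message layout (keys "t", "y", "q", "a") follows the public BitTorrent DHT specification, and sending the datagram over UDP is left out.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, lists and dicts using bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def ping_query(node_id: bytes, transaction_id: bytes = b"aa") -> bytes:
    """Build a KRPC ping query carrying the sender's 20-byte node ID."""
    return bencode({"t": transaction_id, "y": "q", "q": "ping", "a": {"id": node_id}})

packet = ping_query(b"abcdefghij0123456789")
```

The `packet` bytes are what a crawler would place in a single UDP datagram addressed to the queried node.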
17. DHT Protocol
Kademlia defines four kinds of messages: PING, STORE,
FIND_NODE and FIND_VALUE. Their closest BitTorrent DHT counterparts are
ping, announce_peer, find_node and get_peers.
• PING : the basic query for checking that the queried node is
alive. Its single argument, "id", is a 20-byte string holding the
sender's node ID in network byte order.
• FIND_NODE : used to obtain the contact information of a node, given its ID.
The response should contain a key “nodes” holding the compact node
info for the target node or for the K (8) closest nodes in the queried node's routing table.
arguments: {"id" : "<querying nodes id>", "target" : "<id of
target node>"}
response: {"id" : "<queried nodes id>", "nodes" :
"<compact node info>"}
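The "compact node info" string in a find_node response packs each node into 26 bytes: a 20-byte node ID, a 4-byte IPv4 address and a 2-byte port, all in network byte order. A minimal parsing sketch follows; the function name `parse_compact_nodes` is hypothetical and IPv6 is not handled.

```python
import socket
import struct

def parse_compact_nodes(blob: bytes) -> list[tuple[bytes, str, int]]:
    """Split compact node info into (node_id, ip, port) tuples."""
    if len(blob) % 26 != 0:
        raise ValueError("compact node info must be a multiple of 26 bytes")
    nodes = []
    for offset in range(0, len(blob), 26):
        # "!" selects network byte order with no padding: 20 + 4 + 2 = 26 bytes.
        node_id, packed_ip, port = struct.unpack_from("!20s4sH", blob, offset)
        nodes.append((node_id, socket.inet_ntoa(packed_ip), port))
    return nodes

# Sample entry for illustration: ID of twenty "A" bytes, 192.0.2.1, port 6881.
sample = b"A" * 20 + bytes([192, 0, 2, 1]) + (6881).to_bytes(2, "big")
nodes = parse_compact_nodes(sample * 2)
```

Each tuple gives a crawler one more address to query, which is how a single response fans out into up to K new contacts.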
18. DHT Protocol
Kademlia defines four kinds of messages: PING, STORE,
FIND_NODE and FIND_VALUE. Their closest BitTorrent DHT counterparts are
ping, announce_peer, find_node and get_peers.
• GET_PEERS : used to get the peers associated with a torrent infohash.
If the queried node has peers for the infohash, the response contains a key
"values" holding a list of strings, each with compact peer info.
If not, the response contains a key "nodes" with the K nodes in the queried node's
routing table closest to the infohash.
• ANNOUNCE_PEER : used to announce that the peer controlling the
querying node is downloading a torrent on a port.
arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte
infohash of target torrent>", "port" : <port number>, "token" :
"<opaque token>"}
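The two possible outcomes of get_peers can be sketched as a small dispatch on the decoded response. The function names are hypothetical, and the sketch assumes the response has already been bdecoded into a dict with byte-string keys; each compact peer entry is 6 bytes (4-byte IPv4 address plus 2-byte port).

```python
import socket
import struct

def parse_compact_peers(values: list[bytes]) -> list[tuple[str, int]]:
    """Decode 6-byte compact peer entries into (ip, port) tuples."""
    peers = []
    for entry in values:
        if len(entry) != 6:
            continue  # skip malformed entries
        packed_ip, port = struct.unpack("!4sH", entry)
        peers.append((socket.inet_ntoa(packed_ip), port))
    return peers

def handle_get_peers_response(r: dict) -> tuple[str, object]:
    """Classify the decoded 'r' dictionary of a get_peers response."""
    if b"values" in r:
        # The queried node knows peers for the infohash.
        return "peers", parse_compact_peers(r[b"values"])
    # Otherwise it returns the closest nodes it knows, to be queried next.
    return "nodes", r.get(b"nodes", b"")

peer_entry = bytes([192, 0, 2, 1]) + (6881).to_bytes(2, "big")
kind, payload = handle_get_peers_response({b"values": [peer_entry]})
```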
19. Monitoring system architecture
Crawling side: DHT network -> DHT Crawler, DHT Crawler, DHT Crawler -> Key-value store
<key>=node ID
<value>=data (address, port, etc.)
Analysis side: Dump Data -> Map, Map, Map -> Shuffle -> Reduce
Scale out!
20. Scaling out crawlers!
The response should contain a key "nodes" holding
the compact node info for the target node
or for the K (8) closest nodes in the queried node's routing table.
The node info and the K (8) nodes returned should be
randomly distributed.
So scaling out crawlers is an effective way to
expand the monitoring range!
The DHT crawlers run on virtualized
Linux images.
The hypervisor is VMware ESX, which provides
a rich interface for managing crawlers.
[Diagram: DHT network; three DHT Crawlers running on a Hypervisor]
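A crawler of this kind can be sketched as a breadth-first loop that keeps asking newly discovered nodes for more nodes. This is a minimal sketch and not the authors' implementation: `crawl`, `make_node` and `fake_find_node` are hypothetical names, and a simulated 50-node network stands in for real find_node queries over UDP.

```python
import os
from collections import deque
from typing import Callable, Iterable

Node = tuple[bytes, str, int]  # (node ID, IP address, port)

def crawl(bootstrap: Iterable[Node],
          find_node: Callable[[Node, bytes], list[Node]],
          max_nodes: int = 1000) -> dict[bytes, tuple[str, int]]:
    """Breadth-first crawl: ask every discovered node for more nodes."""
    seen: dict[bytes, tuple[str, int]] = {}
    queue: deque = deque()
    for node in bootstrap:
        if node[0] not in seen:
            seen[node[0]] = (node[1], node[2])
            queue.append(node)
    while queue and len(seen) < max_nodes:
        node = queue.popleft()
        target = os.urandom(20)  # random 160-bit target for each query
        for found in find_node(node, target):
            node_id, ip, port = found
            if node_id not in seen and len(seen) < max_nodes:
                seen[node_id] = (ip, port)
                queue.append(found)
    return seen

# Simulated network (assumption): node i knows nodes 2i+1 and 2i+2 modulo 50.
def make_node(i: int) -> Node:
    return (i.to_bytes(20, "big"), f"192.0.2.{i}", 6881)

def fake_find_node(node: Node, target: bytes) -> list[Node]:
    i = int.from_bytes(node[0], "big")
    return [make_node((2 * i + 1) % 50), make_node((2 * i + 2) % 50)]

discovered = crawl([make_node(0)], fake_find_node)
```

Because the result is a dictionary keyed by node ID, several crawlers started from different bootstrap nodes could merge their results with a plain `dict.update`, which matches the key-value layout of node ID to address and port.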
21. Hadoop & MapReduce
Hadoop & MapReduce running on Linux (RH)
Pipeline: Dump Data -> Map, Map, Map -> Shuffle -> Reduce (scale out!)
Processing:
Retrieval: geolocation, domain name
Translation: KML (XML)
Ranking: wordcount, sorting
23. Visualization & ranking
*.*.39.201,6881,2011/9/25 23:57:43,1
*.*.210.128,62845,2011/9/25 23:56:32,1
*.*.33.212,6881,2011/9/25 23:33:58,1
*.*.9.21,49924,2011/9/25 23:37:02,1
Log fields: IP address, Time
Derived: Location Info (country, city, latlng), Domain name
Output: KML movie, ranking
[Figure: chart with series GB, RU, JP, CN, US; x-axis 1 to 12, y-axis 0 to 250]
24. Map Reduce
[Diagram: Input -> Map, Map, Map -> Reduce, Reduce, Reduce -> Output]
MapReduce is a programming model for coping with big data.
map(key1, value1) -> list<key2, value2>
reduce(key2, list<value2>) -> list<value3>
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.
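The map and reduce signatures can be tried out in a single process. The sketch below is a stand-in for Hadoop, not the deployed job: `map_reduce`, `port_mapper` and `sum_reducer` are hypothetical names, and the meaning of the log fields (address, port, time, count) is an assumption based on the sample crawl log lines.

```python
from collections import defaultdict
from typing import Callable, Iterable

def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> dict:
    """Run map, shuffle and reduce sequentially in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map: record -> (key2, value2) pairs
            groups[key].append(value)       # shuffle: group values by key2
    # reduce: (key2, list of value2) -> value3
    return {key: reducer(key, values) for key, values in groups.items()}

log = [
    "*.*.39.201,6881,2011/9/25 23:57:43,1",
    "*.*.210.128,62845,2011/9/25 23:56:32,1",
    "*.*.33.212,6881,2011/9/25 23:33:58,1",
    "*.*.9.21,49924,2011/9/25 23:37:02,1",
]

def port_mapper(line: str):
    # Assumed field order: address, port, timestamp, count.
    _address, port, _timestamp, _count = line.split(",")
    yield port, 1

def sum_reducer(key: str, values: list) -> int:
    return sum(values)

port_counts = map_reduce(log, port_mapper, sum_reducer)
```

In Hadoop the mappers and reducers would run on separate machines and the shuffle would move data over the network; the data flow is the same.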
25. Map
*.*.194.107,h116-0-194-107.catv02.itscom.jp
*.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net
*.*.239.181,c-68-40-239-181.hsd1.mi.comcast.net
*.*.44.184,pool-96-253-44-184.prvdri.fios.verizon.net
*.*.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com
*.*.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com
*.0.194.107 hsd1 comcast hsd1 comcast verizon virginmedia
1 1 1 1 1 1 1
Each log string is divided into words, and each word is assigned “1”.
key-value – {word, 1}
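The word-splitting map step can be sketched on the hostnames shown on this slide. Splitting on dots is an assumption about how the words were formed, `collections.Counter` stands in for the Hadoop reduce step, and the function names are hypothetical.

```python
from collections import Counter

LOG = """\
*.*.194.107,h116-0-194-107.catv02.itscom.jp
*.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net
*.*.239.181,c-68-40-239-181.hsd1.mi.comcast.net
*.*.44.184,pool-96-253-44-184.prvdri.fios.verizon.net
*.*.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com
*.*.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com
"""

def map_words(line: str):
    """Map step: emit (word, 1) for each dot-separated label of the hostname."""
    _address, hostname = line.split(",", 1)
    for word in hostname.strip().split("."):
        yield word, 1

def reduce_counts(pairs) -> Counter:
    """Reduce step: sum the 1s emitted for each word."""
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return counts

pairs = [pair for line in LOG.splitlines() for pair in map_words(line)]
counts = reduce_counts(pairs)
ranking = counts.most_common()  # words sorted by descending count
```

Sorting the counts gives the kind of domain-word ranking shown on the later slides, where ISP names such as comcast rise to the top.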
28. Number of nodes: ranking in one day
RANK  Country        # of nodes  Region           Domain
1     Russia         1,488,056   Russia           RU
2     United States  1,177,766   North America    US
3     China          815,934     East Asia        CN
4     UK             414,282     West Europe      GB
5     Canada         408,592     North America    CA
6     Ukraine        399,054     East Europe      UA
7     France         394,005     West Europe      FR
8     India          309,008     South Asia       IN
9     Taiwan         296,856     East Asia        TW
10    Brazil         271,417     South America    BR
11    Japan          262,678     East Asia        JP
12    Romania        233,536     East Europe      RO
13    Bulgaria       226,885     East Europe      BG
14    South Korea    217,409     East Asia        KR
15    Australia      216,250     Oceania          AU
16    Poland         184,087     East Europe      PL
17    Sweden         183,465     North Europe     SE
18    Thailand       183,008     South East Asia  TH
19    Italy          177,932     West Europe      IT
20    Spain          172,969     West Europe      ES
29. Visualization
KML (Keyhole Markup Language)
■ KML is an XML-based file format for displaying
geographic data in Google Earth.
■ The TimeSpan tag makes it possible to animate our crawling
log smoothly in Google Earth.
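A time-animated placemark can be generated with the standard library alone. The sketch below is illustrative rather than the converter used for the demo: the function names are hypothetical, the coordinates are the Male latitude and longitude from the geoiplookup slide, and the timestamps and their UTC time zone are assumptions.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def placemark(name: str, lat: float, lng: float, begin: str, end: str) -> ET.Element:
    """Build a Placemark that is visible only between begin and end."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    span = ET.SubElement(pm, "TimeSpan")
    ET.SubElement(span, "begin").text = begin
    ET.SubElement(span, "end").text = end
    point = ET.SubElement(pm, "Point")
    # KML coordinates are written as longitude,latitude,altitude.
    ET.SubElement(point, "coordinates").text = f"{lng},{lat},0"
    return pm

def build_kml(placemarks: list) -> str:
    """Wrap placemarks in a kml/Document element and serialize to text."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    doc.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")

kml_text = build_kml([
    placemark("Male", 4.1667, 73.5, "2011-09-25T23:00:00Z", "2011-09-25T23:59:59Z"),
])
```

Saving `kml_text` to a `.kml` file and opening it in Google Earth shows a time slider, and one placemark per log line makes nodes appear and disappear as the slider moves.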
30. EU: Rank 4 UK 414,282 West Europe GB
UK (code: GB)
N/A 77490
London 47559 (7550000: 0.6%)
Manchester 9808 (441000: 2%)
Birmingham 6617
Leeds 5111
Glasgow 4841
Brighton 4788
Liverpool 4445
Bristol 3814
Sheffield 3536
Upon 3363
Edinburgh 3140
Nottingham 2412
Newcastle 2297
Bradford 2093
Tyne 2091
Stoke-on-trent 2021
Coventry 1965
Preston 1902
Reading 1814
[Figure: chart with series GB, RU, JP, CN, US; x-axis 1 to 12, y-axis 0 to 250]
33. Demo: observed nodes in Moscow
10 million nodes
in 24 hours!
34. Island in the stream: Male
[root@localhost ranking]# geoiplookup -f
MV, 40, Male, N/A, 4.166700, 73.500000, 0, 0
35. Island in the stream: Arue
[root@localhost ~]# nslookup *.*.*.*
Non-authoritative answer:
.in-addr.arpa name = *.*.*.*
dsl.dyn.mana.pf.
Authoritative answers can be found from
PF, 00, Arue, N/A, -17.516800, -149.500000, 0, 0
36. Rank 2 United States 1,177,766
N/A 207179
San 29263
Dallas 18899
New 16213
Saint 11933
Houston 11401
Los 10931
Chicago 10876 25675
Fort 10845
Park 10465
Angeles 10400
Brooklyn 9769
York 9462
Lake 8885
Miami 7575
Diego 7161
Francisco 6743
Portland 6553
Washington 6266
Las 6205
Vegas 5956
[Figure: chart with series GB, RU, JP, CN, US; x-axis 1 to 12, y-axis 0 to 250]
37. Rank 2 United States 1,177,766
user 78494
com 76803
br 45945
veloxzone 42333
ono 27937
dyn 26909
84 8460
users 4754
81 4336
ru 4266
62 4189
net 3725
85 3134
mns 2681
82 2454
79 2152
212 2122
vivozap 1952
213 1889
217 1868
Veloxzone
veloxzone.com.br – Robtex
Brazilian mobile phone operator belonging to the
Portugal Telecom and Telefonica groups.
41. Rank 3 China 815,934 East Asia CN
cn 90196
com 65413
dynamic 65060
163data 64647
broad 59136
adsl-pool 17127
sh 10473
xw 10398
net 10352
sx 10196
gd 9641
222 9297
fj 8826
js 8531
jlccptt 7820
zj 6900
117 6687
125 6532
218 6371
60 6244
dynamic.163data.com.cn
Jilin Province Data Communication Bureau (吉林省数据通信局)
Beijing Xinnet Digital Information Technology Co., Ltd. (北京新网数码信息技术有限公司)
42. ALL cities
N/A 978457
Moscow 285097 (RU:1)
Beijing 240419 (CN:3)
Seoul 180186 (KR) (1000000:1%)
Taipei 161498 (TW:9)
Kiev 117392 (UA:6)
Saint 94560 (Petersburg ?)
Bucharest 79336 (1940000:4%)
Sofia 78445 (BG:13)
New 72424
Petersburg 71175 (RU:1)
Central 65635 (HK?)
District 65485 (HK?)
Bangkok 62882 (TH:18)
Delhi 62563 (IN:8)
Tokyo 54531 (JP:11)
London 53514 (GB:4)
Guangzhou 52981 (CN:3)
Athens 52656 (3680000: 1.4%)
Budapest 52031 (1,733,685: 3%)
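The percentages beside some cities appear to be observed nodes divided by city population. A minimal sketch of that arithmetic follows, using three figures from the city list; the function name is hypothetical, and treating one observed node as one inhabitant is a simplification, since a node is an IP address and port rather than a person.

```python
def share_of_population(nodes: int, population: int) -> float:
    """Observed nodes as a percentage of the city's population."""
    return 100.0 * nodes / population

# (observed nodes, population) pairs taken from the city list.
cities = {
    "Bucharest": (79336, 1940000),
    "Athens": (52656, 3680000),
    "Budapest": (52031, 1733685),
}

shares = {city: round(share_of_population(nodes, population), 1)
          for city, (nodes, population) in cities.items()}
for city, share in shares.items():
    print(f"{city}: {share}%")
```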
43. All the world
net 2676477
com 1369148
ru 869195
dynamic 685144
dsl 430313
comcast 303649
hsd1 303626
br 244534
jp 226366
adsl 222170
cable 217597
au 203850
dyn 200646
pppoe 187455
pool 183580
static 180225
ne 173788
broadband 173384
co 171029
rr 170298
res 169568
ca 165639
hinet 162089
pl 160772
it 151052
fr 146154
bb 143578
hu 139452
sbcglobal 135016
ua 133288
Comcast: High Speed Internet, Cable TV, and Phone Services Deals
HiNet homepage: Taiwan's largest ISP, providing broadband network services (HiNet首頁台灣最大ISP,提供寬頻網路)
sbcglobal.net - Network Solutions
44. Demo: flying over Eurasia
10 million nodes
in 24 hours!
45. Conclusion
In this presentation, we have shown that it is possible to obtain information on
10,000,000 nodes in 24 hours.
In current P2P and DHT networks, each node can be easily monitored, and
there are many challenges and interesting topics concerning the illegal use of
BitTorrent.
Our crawling system can provide rankings of countries, cities and domain
providers.
We have shown that the DHT network is indeed a large and scalable network!
BitTorrent has huge potential as an alternative and unseen network architecture!