Rapid and Massive Monitoring of
DHT: crawling 10 million nodes in
             24 hours



                   PacSec Tokyo November 2011

     In this presentation, we present our high-speed DHT crawler,
              which monitors 10 million nodes in 24 hours!

                                Ruo Ando
NICT National Institute of Information and Communications Technology
                             Takayuki Sugiura
                            NetAgent Co. Ltd.
Overview:
      detecting illegal use in a huge network
•   BitTorrent has become an irreplaceable network application for
    distributing software and content. But ...

•   No one knows its exact scale and dynamics!
    How many nodes join and leave the BitTorrent network in 24
    hours?

•   The BitTorrent network is huge, and no one knows where
    (potential) security incidents and illegal use have
    occurred!

•   We have tackled the challenge of monitoring this largest-scale
    network using our rapid and massive DHT crawler.

•   We succeeded in observing 10,000,000 nodes in 24 hours!

•   We also present a visualization of the dynamics of the
    BitTorrent network!
                                                              PacSec 2011
Demo: observed nodes




    10 million nodes
    in 24 hours!
BT: The largest file sharing network in the world.




It is estimated that BitTorrent has 70 million active
users and 100 million total users, and the number is still
increasing!
BitTorrent is now expanding and is everywhere!

●BitTorrent in portable USB storage devices
http://www.iodata.com/

●Android: BitTorrent Client | aBTC
Available for about $5!
https://market.android.com/
The old new problem: illegal content downloads

BitTorrent is one of the most efficient ways to
share large files such as operating system ISO images.

Unfortunately, BT is at the same time a very efficient way to
download copyright-protected content such as movies and
music illegally.

The biggest BitTorrent case:
In 2010, the US Copyright Group (USCG) said that
23,322 IP addresses had allegedly infringed the copyright of the movie
The Expendables. Settlements are around $3,000 per
infringement.
The case of LimeWire, October 2010

●In October 2010, a New York judge ordered
 LimeWire to shut down its file-sharing software.

A US federal court judge ruled that LimeWire's
service was being used as software for
infringing copyrighted content.

●Soon afterwards, a new version of LimeWire called
LPE (LimeWire Pirate Edition) was released
by anonymous creators as a resurrection.
Right to be deleted or forgotten?

November 2010: the EC announced a plan setting
out its strategy to strengthen EU data protection
rules.

People in the EU basically recognize the current
pervasive use of BitTorrent and see its potential
as promising. They would also like
BitTorrent to be used legally.
Dot-P2P
             Domain seizures and BT-based DNS

●June 2010: WikiLeaks leverages torrents and magnet links for
 distributing files.

●U.S. Immigration and Customs Enforcement (ICE) seizes the
 domain of a BT meta search engine.

●The U.S. proposes the Combating Online Infringement and Counterfeits Act
 (COICA), which would allow the Department of Justice to order a
 domain registrar to take a domain offline. COICA would
 increase the government's censorship powers.

●In direct response to the domain seizures by US authorities, the Dot-P2P
 project proposes a DNS service independent of ICANN and ISPs.

●In the Dot-P2P system, a request for the .p2p TLD is redirected to a locally
 hosted DNS database. The traffic is encrypted and sent using
 the BitTorrent protocol, so the .p2p TLD is decentralized
 and independent of ICANN and any ISP's DNS service.
BitTorrent History

Bram Cohen started implementing BitTorrent
in 2001.

He released the client software in 2003.

In 2003, a user in the EU released an ISO image of
Red Hat, and the image was downloaded 30,000 times in 3
days.

In 2004, he formed BitTorrent Inc., and by mid-2005
BitTorrent Inc. was funded by venture capital.
BitTorrent Traffic estimations
① “55%” - CableLabs
About half of the upstream traffic of CATV.

② “35%” - CacheLogic
“LIVEWIRE - File-sharing network thrives beneath
the Radar”

③ “60%” - documents in www.sans.edu
“It is estimated that more than 60% of the traffic on
the internet is peer-to-peer.”

Basic architecture of tracker network

                               ① Ask
                               Node A (a newcomer) asks the
                               tracker to search for the file.

                               ② Torrent download
                               The tracker provides the torrent file.

                               ③ Join
                               Node A queries node B.

                               ④ Download
                               Node A can download pieces
                               of the file from the swarm.

                               A seeder has the complete file.
                               A leecher has pieces of the file.
BitTorrent Network
            tracker or DHT (trackerless)

Tracker – a dedicated machine which stores torrent files and
keeps track of which nodes are downloading and uploading.

DHT – a decentralized network architecture that shares the
functionality of the tracker among nodes. DHT is decentralized, but is
more scalable than pure P2P.

A DHT (Distributed Hash Table) is a method based on <key,value>
pairs. The DHT lookup method lets us discover the
location of the node that shares the tracker's responsibility
for a file share.

Recently the DHT network has attracted much attention because of the Dot-P2P
project and The Pirate Bay's confirmation that it is shutting down its tracker.
DHT Protocol
●DHT is not a new spec
Introduced in Azureus (2005) and BitComet (2005).

●Based on Kademlia, an XOR-based DHT
Petar Maymounkov and David Mazières. Kademlia: A peer-to-peer
information system based on the XOR metric. In
Proceedings of the 1st International Workshop on Peer-to-Peer
Systems (IPTPS '02)

●Supported by many client apps.
uTorrent 1.8.5, Vuze 4.3.0.2, BitTorrent 6.3,
BitComet 1.16, Transmission 1.76
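
The XOR metric named in the Kademlia reference can be illustrated with a minimal sketch. The function names and the SHA-1-derived sample IDs are hypothetical, chosen only for illustration; they are not taken from the crawler described in these slides.

```python
import hashlib

def xor_distance(id_a: bytes, id_b: bytes) -> int:
    """Kademlia distance: the two IDs XORed, read as an unsigned integer."""
    return int.from_bytes(id_a, "big") ^ int.from_bytes(id_b, "big")

def closest_nodes(target: bytes, node_ids: list, k: int = 8) -> list:
    """Return the k node IDs closest to target under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]

# Hypothetical 20-byte IDs derived from SHA-1, for illustration only.
ids = [hashlib.sha1(str(i).encode()).digest() for i in range(100)]
target = hashlib.sha1(b"example").digest()
nearest = closest_nodes(target, ids)
```

The distance of an ID to itself is 0 and the metric is symmetric, which is what lets every node agree on which nodes are "closest" to a given key.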

DHT Protocol
●Magnet links are URLs which enable each
 node to download and/or distribute content
 without querying a tracker site.

●Magnet links are provided by The Pirate Bay and
 Mininova to speed up downloads (the hash is base32
 or hex encoded).

●2010: The Pirate Bay moves to magnet-link-oriented
 DHT, shutting down its server.

●Do magnet links make the BitTorrent network
 tracker-less?
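
As a rough sketch of how the hash carried by a magnet link can be recovered, the following assumes the common `xt=urn:btih:` parameter with either a 40-character hex or a 32-character base32 value. The function name is hypothetical.

```python
import base64
from urllib.parse import urlparse, parse_qs

def info_hash_from_magnet(uri: str) -> bytes:
    """Extract the 20-byte info-hash from a magnet URI (hex or base32)."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    for xt in parse_qs(parsed.query).get("xt", []):
        if xt.startswith("urn:btih:"):
            encoded = xt[len("urn:btih:"):]
            if len(encoded) == 40:   # hex encoding
                return bytes.fromhex(encoded)
            if len(encoded) == 32:   # base32 encoding
                return base64.b32decode(encoded.upper())
    raise ValueError("no BitTorrent info-hash found")
```

The resulting 20 bytes are the key that a client would then look up in the DHT instead of asking a tracker.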
DHT Protocol

A DHT network is a scalable architecture for file sharing systems.
  Pure P2P: hundreds of thousands of nodes
  DHT: millions of nodes

The BitTorrent DHT network is implemented over KRPC. The KRPC
protocol is a simple RPC mechanism over UDP.

DHT queries come in four kinds: ping, find_node,
get_peers and announce_peer. Each message is
encoded with bencoding.
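
To make the encoding concrete, here is a minimal bencoding sketch and a ping query built with it. The encoder is an illustrative implementation, not the crawler's own code; the transaction ID `aa` and the sample node ID are placeholder values.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("ascii")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode("ascii") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

# A KRPC ping query: t = transaction ID, y = "q" (query), q = method name,
# a = arguments. The ID below is a placeholder 20-byte node ID.
ping_query = bencode({
    "t": "aa",
    "y": "q",
    "q": "ping",
    "a": {"id": "abcdefghij0123456789"},
})
```

The resulting bytes would be sent as a single UDP datagram to the queried node.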



DHT Protocol
The BitTorrent DHT has four kinds of queries: PING, FIND_NODE,
GET_PEERS and ANNOUNCE_PEER. They correspond to Kademlia's
PING, FIND_NODE, FIND_VALUE and STORE.

• PING : the basic query for checking whether the queried node is
  alive. Its only argument is the sender's node ID, a 20-byte string in
  network byte order.

• FIND_NODE : used to obtain the contact information of a node given
  its ID. The response should contain a key "nodes" whose value is the
  compact node info for the target node or for the K (8) closest nodes
  in the queried node's routing table.

  arguments: {"id" : "<querying nodes id>", "target" : "<id of
  target node>"}
   response: {"id" : "<queried nodes id>", "nodes" :
  "<compact node info>"}
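
The compact node info in a find_node response packs each node into 26 bytes: a 20-byte node ID, a 4-byte IPv4 address and a 2-byte port in network byte order. A minimal parsing sketch follows; the function name is hypothetical.

```python
import socket
import struct

def parse_compact_nodes(blob: bytes):
    """Split a compact node info string into (node_id, ip, port) tuples."""
    if len(blob) % 26 != 0:
        raise ValueError("compact node info must be a multiple of 26 bytes")
    nodes = []
    for offset in range(0, len(blob), 26):
        node_id = blob[offset:offset + 20]
        ip = socket.inet_ntoa(blob[offset + 20:offset + 24])
        (port,) = struct.unpack("!H", blob[offset + 24:offset + 26])
        nodes.append((node_id, ip, port))
    return nodes
```

Each tuple is exactly the kind of record a crawler would store, keyed by node ID.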
DHT Protocol
The remaining two kinds of BitTorrent DHT queries are GET_PEERS and
ANNOUNCE_PEER.


•   GET_PEERS : used to get the peers associated with a torrent infohash.
     If the queried node has peers for the infohash, the response contains
    a key "values" holding them as a list of strings.
     If not, the response contains a key "nodes" holding the K nodes in
    the queried node's routing table closest to the infohash.

•   ANNOUNCE_PEER : used to announce that the peer controlling the
    querying node is downloading a torrent on a port.

     arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte
    infohash of target torrent>", "port" : <port number>, "token" :
    "<opaque token>"}
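
Each string in the "values" list of a get_peers response is a 6-byte compact peer entry: a 4-byte IPv4 address followed by a 2-byte port in network byte order. A minimal decoding sketch, with a hypothetical function name:

```python
import socket
import struct

def parse_compact_peers(values):
    """Decode a list of 6-byte compact peer strings into (ip, port) pairs."""
    peers = []
    for entry in values:
        if len(entry) != 6:
            raise ValueError("compact peer info must be 6 bytes")
        ip = socket.inet_ntoa(entry[:4])
        (port,) = struct.unpack("!H", entry[4:])
        peers.append((ip, port))
    return peers
```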


Monitoring system architecture

[Figure: several DHT crawlers, scaled out, crawl the DHT network and
write to a key-value store. The store is dumped ("Dump Data") and
processed by Map, Shuffle and Reduce stages.]

  <key>=node ID
  <value>=data (address, port, etc.)
Scaling out crawlers !
A find_node response contains a key "nodes" with
the compact node info for the target node
or for the K (8) closest nodes in the queried node's routing table.

The nodes returned this way should be
randomly distributed over the network.

So scaling out crawlers is an effective way to
expand the monitoring range!

The DHT crawlers run on virtualized
Linux images.

The hypervisor is VMware ESX, which provides
a rich interface for managing crawlers.

[Figure: DHT network crawled by three DHT crawlers running on one hypervisor]
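
The crawl loop implied here can be sketched as a breadth-first expansion over find_node responses. This is only an illustration under stated assumptions: `crawl`, `fake_find_node` and the simulated 20-node ring are hypothetical, and a real crawler would send KRPC queries over UDP instead of calling a local function.

```python
from collections import deque
from typing import Dict, Tuple

def crawl(bootstrap, find_node, max_nodes: int) -> Dict[bytes, Tuple[str, int]]:
    """Breadth-first crawl: query each known node, store every new node seen.

    find_node is any callable taking (node_id, ip, port) and returning an
    iterable of such tuples, standing in for a real find_node query.
    """
    store: Dict[bytes, Tuple[str, int]] = {}   # <key>=node ID, <value>=(ip, port)
    queue = deque()
    for node_id, ip, port in bootstrap:
        if node_id not in store:
            store[node_id] = (ip, port)
            queue.append((node_id, ip, port))
    while queue and len(store) < max_nodes:
        current = queue.popleft()
        for node_id, ip, port in find_node(current):
            if len(store) >= max_nodes:
                break
            if node_id not in store:
                store[node_id] = (ip, port)
                queue.append((node_id, ip, port))
    return store

# Simulated network for illustration: a ring of 20 nodes, each of which
# answers with its next two neighbours.
RING = [(bytes([i]) * 20, "10.0.0.%d" % i, 6881) for i in range(20)]

def fake_find_node(node):
    i = node[0][0]
    return [RING[(i + 1) % 20], RING[(i + 2) % 20]]

seen = crawl([RING[0]], fake_find_node, max_nodes=1000)
```

Running several such crawlers from different bootstrap nodes and merging their stores by node ID is one way to read the "scale out" idea on this slide.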


Hadoop & MapReduce

[Figure: Dump Data -> Map (x3) -> Shuffle -> Reduce, scaled out]

Retrieval
  geolocation
  domain name

Translation
  KML (XML)

Ranking
  word count
  sorting

Hadoop & MapReduce run on Red Hat Linux.
Rapid crawling: 24 hours to reach 10,000,000 nodes!

[Figure: two plots over elapsed time in hours (0 to 26). Top, "node":
linear vertical axis from 0 to 12,000,000. Bottom, "diff":
logarithmic vertical axis from 10,000 to 1,000,000.]
Visualization & ranking
 *.*.39.201,6881,2011/9/25 23:57:43,1
 *.*.210.128,62845,2011/9/25 23:56:32,1
 *.*.33.212,6881,2011/9/25 23:33:58,1
 *.*.9.21,49924,2011/9/25 23:37:02,1
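
A minimal sketch of reading such records follows. The field names are assumptions: the slide does not label the columns, so reading them as IP address, port, timestamp and a count is a guess.

```python
import csv
from datetime import datetime
from typing import NamedTuple

class Observation(NamedTuple):
    ip: str            # masked in the slides, e.g. "*.*.39.201"
    port: int          # assumed: the node's UDP port
    seen_at: datetime  # assumed: time the node was observed
    count: int         # assumed: a per-record counter

def parse_log(lines):
    """Yield one Observation per comma-separated log line."""
    for row in csv.reader(line.strip() for line in lines):
        ip, port, stamp, count = row
        yield Observation(
            ip, int(port),
            datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
            int(count),
        )

records = list(parse_log([" *.*.39.201,6881,2011/9/25 23:57:43,1"]))
```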




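[Figure: for each record, the IP address is resolved to a domain name
and location info (country, city, latlng). Together with the time, these
feed a ranking, a figure (bar chart for GB, RU, JP, CN and US;
horizontal axis 1 to 12, vertical axis 0 to 250) and a KML movie.]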
Map Reduce

[Figure: Input -> Map (x3) -> Reduce (x3) -> Output]

MapReduce is a programming model for processing big data.

map(key1, value1) -> list<key2, value2>
reduce(key2, list<value2>) -> list<value3>

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.
Map
  *.*.194.107,h116-0-194-107.catv02.itscom.jp
  *.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net
  *.*.239.181,c-68-40-239-181.hsd1.mi.comcast.net
  *.*.44.184,pool-96-253-44-184.prvdri.fios.verizon.net
  *.*.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com
  *.*.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com




*.0.194.107    hdsl1     comcast      hdsl1     comcast   verizon   virginmedia

    1            1          1           1             1      1          1



 Each log string is split into words, and each word is assigned the value 1.
 key-value – {word, 1}
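
The map step can be sketched as below. Splitting the hostname on dots is an assumption made for illustration; the slides do not state the exact tokenization rule, and `map_hostname` is a hypothetical name.

```python
def map_hostname(line: str):
    """Map step: emit (word, 1) for each dot-separated part of the hostname."""
    _ip, hostname = line.strip().split(",", 1)
    for word in hostname.split("."):
        if word:
            yield (word, 1)

pairs = list(map_hostname("*.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net"))
```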
Reduce

*.0.194.107      hdsl1    comcast      hdsl1     comcast      verizon         virginmedia

     1             1          1            1         1             1              1




         hdsl1                      comcast                 verizon

           1                           1                       1

           1                           1


Reduce: sum the 1s for each word.
Key-value – {hdsl1, 2} / Key-value – {comcast, 2} / Key-value – {verizon, 1}
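
The shuffle and reduce steps for this word count can be sketched in plain Python. This is a single-process illustration of the idea, not Hadoop code; the function names are hypothetical.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group the values emitted by the map step by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_count(word, ones):
    """Reduce step: sum the 1s collected for one word."""
    return word, sum(ones)

# The (word, 1) pairs from the example on this slide.
mapped = [("hdsl1", 1), ("comcast", 1), ("hdsl1", 1), ("comcast", 1), ("verizon", 1)]
counts = dict(reduce_count(w, ones) for w, ones in shuffle(mapped).items())
```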
Sorting and ranking

*.0.194.107      hdsl1      comcast       hdsl1      comcast       verizon      hdsl1

     1             1            1             1          1               1        1




         hdsl1                        comcast                    verizon

            1                             1                          1

            1                             1
                                                                             ③
                                                    ②
            1
                     ①
# Sort the "word count" lines by count (second field), largest first.
@list1 = reverse sort { (split(/\s/,$a))[1] <=> (split(/\s/,$b))[1] } @list1;
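
For comparison, the same ranking step can be sketched in Python, assuming each line has the form "word count" separated by whitespace. The function name is hypothetical. Unlike the Perl `reverse sort`, `sorted(..., reverse=True)` keeps lines with equal counts in their original order.

```python
def rank(lines):
    """Sort "word count" lines by their numeric count, largest first."""
    return sorted(lines, key=lambda line: int(line.split()[1]), reverse=True)

ranking = rank(["comcast 2", "verizon 1", "hdsl1 3"])
```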
Ranking by number of nodes in one day
RANK  Country        # of nodes  Region           Domain

   1  Russia          1,488,056  Russia           RU
   2  United States   1,177,766  North America    US
   3  China             815,934  East Asia        CN
   4  UK                414,282  West Europe      GB
   5  Canada            408,592  North America    CA
   6  Ukraine           399,054  East Europe      UA
   7  France            394,005  West Europe      FR
   8  India             309,008  South Asia       IN
   9  Taiwan            296,856  East Asia        TW
  10  Brazil            271,417  South America    BR
  11  Japan             262,678  East Asia        JP
  12  Romania           233,536  East Europe      RO
  13  Bulgaria          226,885  East Europe      BG
  14  South Korea       217,409  East Asia        KR
  15  Australia         216,250  Oceania          AU
  16  Poland            184,087  East Europe      PL
  17  Sweden            183,465  North Europe     SE
  18  Thailand          183,008  South East Asia  TH
  19  Italy             177,932  West Europe      IT
  20  Spain             172,969  West Europe      ES
visualization
      KML (Keyhole Markup Language)




■ KML is an XML-based file format for displaying
 geographic data in Google Earth.
■ The TimeSpan tag makes it possible to animate our crawling
 log smoothly in Google Earth.
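
A minimal sketch of producing a KML placemark with a TimeSpan is shown below. The helper names are hypothetical, and treating the log timestamps as UTC is an assumption; the coordinates are those of Male from the geoiplookup output later in the slides.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def placemark(name, lat, lon, begin, end):
    """Build a Placemark that is visible only between begin and end."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    span = ET.SubElement(pm, "TimeSpan")
    ET.SubElement(span, "begin").text = begin
    ET.SubElement(span, "end").text = end
    point = ET.SubElement(pm, "Point")
    # KML coordinates are written as longitude,latitude,altitude.
    ET.SubElement(point, "coordinates").text = "%s,%s,0" % (lon, lat)
    return pm

def build_kml(placemarks) -> str:
    """Wrap placemarks in a kml/Document element and serialize to a string."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    doc.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")

kml_text = build_kml([
    placemark("Male", 4.1667, 73.5,
              "2011-09-25T23:57:43Z", "2011-09-25T23:58:43Z"),
])
```

Opening a file with many such placemarks in Google Earth and dragging the time slider plays the observations back in order.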
EU: 4           UK              414,282 West Europe    GB

UK (code: GB)
N/A             77490
London          47559 (7550000: 0.6%)
Manchester       9808 (441000: 2%)
Birmingham       6617
Leeds            5111
Glasgow          4841
Brighton         4788
Liverpool        4445
Bristol          3814
Sheffield        3536
Upon             3363
Edinburgh        3140
Nottingham       2412
Newcastle        2297
Bradford         2093
Tyne             2091
Stoke-on-trent   2021
Coventry         1965
Preston          1902
Reading          1814

[Figure: bar chart for GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250]
Rank 1 Russia 1,488,056

Moscow          284959 (13670000: 2%)
Saint            69220
Petersburg       69220 (4580000 : 1.5 %)
N/A              51734
Novgorod         35505 (1330000 : 2.6 %)
Yekaterinburg    31117
Velikiy          29706
Perm             28858
Tomsk            19083
Novosibirsk      18379
Voronezh         15121
Irkutsk          14943
Krasnoyarsk      14489
Ufa              11823
Lenin            11640
Tyumen           11615
Penza            10665
Izhevsk          10259
Volgograd        10126 (1000000)
Saratov           9686

[Figure: bar chart for GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250]
Rank 1 Russia 1,488,056

ru                      869194
pppoe                   157254
broadband               120719
corbina                 114501
ertelecom               103364
dynamic                  78683
nationalcablenetworks    34208
netbynet                 28339
bb                       28260
ufanet                   26827
avangarddsl              26225
dyn                      22174
mts-nn                   21939
mtu-net                  19994
95                       19274
bashtel                  18588
94                       17260
nn                       15149
dsl                      14746
178                      14292

Corbina Telecom: corbina.ru/
ER-Telecom (home page): www.ertelecom.ru/
UfaNet.ru: www.ufanet.ru/
Demo: observed nodes in Moscow




         10 million nodes
         in 24 hours!
Island in the stream: Male




           [root@localhost ranking]# geoiplookup -f
           MV, 40, Male, N/A, 4.166700, 73.500000,
           0, 0




Island in the stream: Arue




            [root@localhost ~]# nslookup *.*.*.*
            Non-authoritative answer:
            .in-addr.arpa     name = *.*.*.*
            dsl.dyn.mana.pf.

            Authoritative answers can be found from

            PF, 00, Arue, N/A, -17.516800, -149.500000, 0, 0
Rank 2 United States 1,177,766

N/A          207179
San           29263
Dallas        18899
New           16213
Saint         11933
Houston       11401
Los           10931
Chicago       10876        25675
Fort          10845
Park          10465
Angeles       10400
Brooklyn       9769
York           9462
Lake           8885
Miami          7575
Diego          7161
Francisco      6743
Portland       6553
Washington     6266
Las            6205
Vegas          5956

[Figure: bar chart for GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250]
Rank 2 United States 1,177,766

user         78494
com          76803
br           45945
veloxzone    42333
ono          27937
dyn          26909
84            8460
users         4754
81            4336
ru            4266
62            4189
net           3725
85            3134
mns           2681
82            2454
79            2152
212           2122
vivozap       1952
213           1889
217           1868

Veloxzone
veloxzone.com.br – Robtex

Brazilian mobile phone operator belonging to the
Portugal Telecom and Telefonica groups.
Rank 11 Japan 262,678

N/A          69648
Tokyo        54531 (13100000: 0.045)
Osaka         7430 (8860000: ??)
Yokohama      6983
Nagoya        4114
Kawasaki      3503
Fukuoka       2989
Kyoto         2875
Chiba         2443
Kobe          2409
Sapporo       2015
Shizuoka      1667
Hamamatsu     1396
Hiroshima     1356
Setagaya      1339
Nara          1239
Sagamihara    1151
Toyonaka      1089
Kawaguchi     1077
Tokorozawa     980

[Figure: bar chart for GB, RU, JP, CN and US; horizontal axis 1 to 12, vertical axis 0 to 250]
Rank 11 Japan 262,678

jp        226354
ne        173513
ocn        52352 (8000000:0.6%)
ap         38034
or         22745
dion       20057
ppp        19918
ppp-bb     17674
plala      17520
ad         14851
mesh       11932
so-net     11482
eonet      11184
infoweb    10615
nt          9431
rev         9181
home        9116
yournet     8507
tokyo       7926
ftth        7814

OCN ("Welcome to the official OCN site")
ocn.ne.jp

auone-net (high-speed mobile and fiber-optic Internet service provider)
www.auone-net.jp
Rank 3 China 815,934 East Asia CN

                                                                Beijing      240419 (17500000: 1%)
                                                                Guangzhou    52981 (10330000: 0.5%?)
                                                                Shanghai     27399 (18580000: 0.1%?)
                                                                Jinan        26281
                                                                N/A          24695
                                                                Chengdu      18835
                                                                Shenyang     18566
                                                                Tianjin      18460
                                                                Hebei        17414
                                                                Wuhan        15239
                                                                Hangzhou     12997
                                                                Harbin       10848
                                                                Changchun    10411
                                                                Nanning      10318
                                                                Qingdao      10257
                                                                Taiyuan      9573
                                                                Hefei        9455
                                                                Changsha     6988
                                                                Chongqing    5641
                                                                Shenzhen     5600

[Chart: series GB, RU, JP, CN, US; horizontal axis 1 to 12, vertical axis 0 to 250]
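The parenthesised figures next to some cities, such as (17500000: 1%), appear to pair a reference figure with the share of it that was observed as DHT nodes. The slide does not say what the reference figure is; the sketch below assumes it is a city population, and the function name `observed_share` is made up for illustration. It only reproduces the arithmetic for the three cities annotated on the slide.

```python
def observed_share(observed_nodes, reference_population):
    """Return observed nodes as a percentage of the reference figure.

    Assumption: the reference figure is a population count; the slide
    does not state this explicitly.
    """
    if reference_population <= 0:
        raise ValueError("reference_population must be positive")
    return 100.0 * observed_nodes / reference_population


# (city, observed nodes, reference figure) copied from the slide
rows = [
    ("Beijing", 240419, 17500000),
    ("Guangzhou", 52981, 10330000),
    ("Shanghai", 27399, 18580000),
]

for city, nodes, reference in rows:
    print(f"{city}: {observed_share(nodes, reference):.2f}%")
```

The results come out at roughly 1.4%, 0.5% and 0.1%, which is in line with the rounded values shown on the slide.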
Rank 3 China 815,934 East Asia CN

                                        cn          90196
                                        com         65413
                                        dynamic     65060
                                        163data     64647
                                        broad       59136
                                        adsl-pool   17127
                                        sh          10473
                                        xw          10398
                                        net         10352
                                        sx          10196
                                        gd          9641
                                        222         9297
                                        fj          8826
                                        js          8531
                                        jlccptt     7820
                                        zj          6900
                                        117         6687
                                        125         6532
                                        218         6371
                                        60          6244

dynamic.163data.com.cn
Jilin Province Data Communication Bureau (吉林省数据通信局)
Beijing Xinnet Digital Information Technology Co., Ltd. (北京新网数码信息技术有限公司)
All cities
                       N/A          978457
                       Moscow       285097 (RU:1)
                       Beijing      240419 (CN:3)
                       Seoul        180186 (KR) (1000000: 1%)
                       Taipei       161498 (TW:9)
                       Kiev         117392 (UA:6)
                       Saint        94560 (Petersburg ?)
                       Bucharest    79336 (1940000: 4%)
                       Sofia        78445 (BG:13)
                       New          72424
                       Petersburg   71175 (RU:1)
                       Central      65635 (HK?)
                       District     65485 (HK?)
                       Bangkok      62882 (TH:18)
                       Delhi        62563 (IN:8)
                       Tokyo        54531 (JP:11)
                       London       53514 (GB:4)
                       Guangzhou    52981 (CN:3)
                       Athens       52656 (3680000: 1.4%)
                       Budapest     52031 (1733685: 3%)
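Entries such as "Saint", "Petersburg", "New", "Central" and "District" look like fragments of multi-word city names. One plausible cause, and this is an assumption rather than something the slides state, is that the city field was split on whitespace before counting. The sketch below contrasts the two ways of counting on a small hypothetical sample; the sample data and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical city fields as they might come out of a geolocation lookup
cities = ["Saint Petersburg", "Moscow", "Saint Petersburg", "New Delhi", "New York"]


def count_by_token(names):
    """Count every whitespace-separated word (splits multi-word names)."""
    counts = Counter()
    for name in names:
        counts.update(name.split())
    return counts


def count_by_name(names):
    """Count the whole city field as a single key."""
    return Counter(names)


print(count_by_token(cities).most_common())
print(count_by_name(cities).most_common())
```

With token counting, "Saint" and "New" become separate entries that merge unrelated cities; counting the whole field keeps "Saint Petersburg", "New Delhi" and "New York" apart.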
All the world
                                     net         2676477
                                     com         1369148
                                     ru          869195
                                     dynamic     685144
                                     dsl         430313
                                     comcast     303649
                                     hsd1        303626
                                     br          244534
                                     jp          226366
                                     adsl        222170
                                     cable       217597
                                     au          203850
                                     dyn         200646
                                     pppoe       187455
                                     pool        183580
                                     static      180225
                                     ne          173788
                                     broadband   173384
                                     co          171029
                                     rr          170298
                                     res         169568
                                     ca          165639
                                     hinet       162089
                                     pl          160772
                                     it          151052
                                     fr          146154
                                     bb          143578
                                     hu          139452
                                     sbcglobal   135016
                                     ua          133288

Comcast: High Speed Internet, Cable TV, and Phone Services Deals
HiNet home page: Taiwan's largest ISP, providing broadband networks
sbcglobal.net - Network Solutions
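The domain ranking above is the kind of result a word count over reverse-DNS hostnames produces: each hostname is split into its dot-separated labels, every label is counted, and the labels are sorted by count. The following is a minimal single-machine sketch of that idea, not the Hadoop job used for the talk; splitting on "." and the function name `count_labels` are assumptions. The sample hostnames are the ones shown on the Map slide.

```python
from collections import Counter

# Sample hostnames taken from the Map slide of this deck
hostnames = [
    "h116-0-194-107.catv02.itscom.jp",
    "c-76-28-27-107.hsd1.ct.comcast.net",
    "c-68-40-239-181.hsd1.mi.comcast.net",
    "pool-96-253-44-184.prvdri.fios.verizon.net",
]


def count_labels(hosts):
    """Map each hostname to its labels and reduce by summing per label."""
    counts = Counter()
    for host in hosts:
        # "Map" step: one (label, 1) pair per dot-separated label
        # "Reduce" step: Counter.update adds up the 1s for each label
        counts.update(host.lower().split("."))
    return counts


ranking = count_labels(hostnames).most_common()
for label, count in ranking[:5]:
    print(label, count)
```

On a cluster the same map and reduce steps would be spread over many workers; the per-label sums are independent, which is why this workload scales out easily.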
Demo: flying over Eurasia

      10 million nodes
      in 24 hours!
Conclusion
In this presentation, we have shown that it is possible to obtain information on
10,000,000 nodes in 24 hours.

In current P2P and DHT networks, each node can be easily monitored. There
remain many challenges and interesting topics concerning the illegal use of
BitTorrent.

Our crawling system can provide rankings of countries, cities and domain
providers.

We have shown that the DHT network is in fact a large and scalable network!
BitTorrent has huge potential as an alternative, as yet unseen, network architecture!
Thank you for listening!

Pac sec2011 ruoando-nict-2011-11-09-01-eng

  • 1. Rapid and Massive monitoring of DHT: crawling 10 millions of nodes in 24 hours PacSec Tokyo November 2011 In this presentation, we present our high-speed DHT Crawler to monitor 10 millions of nodes in 24 hours ! Ruo Ando NICT National Institute of Information and Communications Technology Takayuki Sugiura NetAgent Co. Ltd. 1
  • 2. Overview: detecting illegal adoption in huge network • BitTorrent becomes irreplaceable network application for distributing software and contents. But .. • No one can know its exact scale and dynamics ! How many nodes join and disappear in BitTorrent network in 24 hours ? • BitTorrent network is huge and no one can know about where (potential) security incidents and illegal adoption has been occurred ! • We have tackled this challenge of monitoring the largest scale network using our rapid and massive DHT crawler. • We have succeeded to obtain 10,000,000 nodes in 24 hours ! • Also, visualizing the dynamics of BitTorrent Network is presented ! 2 PacSec 2011
  • 3. Demo: observed nodes 10 millions of nodes in 24 hours ! 3 PacSec 2011
  • 4. BT: The largest file sharing network in the world. It is estimated that BitTorrent has 70 million active users and 100 million total users and it is still increasing ! 4 PacSec 2011
  • 5. BitTorrent is now expanding and everywhere ! ●BitTorrent in portable USB storage devices http://www.iodata.com/ ●Android: BitTorrent Client | aBTC Available in about $5 ! https://market.android.com/ 5 PacSec 2011
  • 6. The old new problem: illegal contents downloads BitTorrent is the one of the most efficient way to share large files such as Operating system IOS. Unfortunately, BT is at the same time a very efficient way to download protected (copyright) content sush as movies and music in illegal manner. The biggest case of BitTorrent: In 2010, United States Copyrights Group(USCG) said that 23,322 IP addresses have allegedly infringed the movie of Expendables. The settlements is around $3,000 per infringement. 6 PacSec 2011
  • 7. The case of Limewire 2010 Oct ●In 2010 Oct, A New York judge ordered LimeWire to shutdown its file-sharing software. US federal court judge issued that Limewire’s service is used as one of the software for infringement of copyright contents. ●Later soon, the new version of Limewire called LPE (Limewire Pirate Edition) has been released as resurrection by anonymous creators. 7 PacSec 2011
  • 8. Right to be deleted or forgotten? 2010 Nov: EC announced the plan for setting out strategy to strengthen EU data protection rules. EU people basically recognize the current Pervasive use of BitTorrent and its potential as promising. Also, EU people would like BitTorrent to be adopted in legal manner. 8 PacSec 2011
  • 9. Dot-P2P Domain seizures and BT based DNS ●2010 June: WikiLeaks leverages torrent and magnet links for distributing files. ●U.S. Immigration and Customs Enforcement (ICE) seizures the site domain of BT meta search engine. ●U.S proposed Combating Online Infringement and Counterfeits Act’ (COICA) which would allow the Department of Justice to order the domain register to take the domain offline. COICA will be aimed to increase the government’s censorship powers. ●In a direct response to the domain seizures by US authorities, Dot- P2P project proposes ICANN or IPS independent DNS service. ●In Dot-P2P system, a request for .p2p TLD is redirected to a locally hosted DNS database. The traffic is encrypted and sent according to the BitTorrent protocol which result in that .p2p TLD is decentralized and independent of ICANN or any IPS’s DNS service. 9 PacSec 2011
  • 10. BitTorrent History The implementation of BitTorrent has been started by Bram Cohen in 2001. He has released client software in 2003. In 2003, a user in EU has released ISO image of Red Hat and the 30,000 image has been downloaded in 3 days. In 2004, he had formed BitTorrent Inc and by mid 2005, BitTorrent Inc was funded by VC. 10 PacSec 2011
  • 11. BitTorrent Traffic estimations ① “55%” - CableLabs About an half of upstream traffic of CATV. ② “35%” - CacheLogic “LIVEWIRE - File-sharing network thrives beneath the Radar” ③ “60%” - documents in www.sans.edu “It is estimated that more than 60% of the traffic on the internet is peer-to-peer.” 11 PacSec 2011
  • 12. Basic architecture of tracker network ① Ask Node A (newcomer) ask the tracker for searching the file. ② torrent download Tracker provides torrent file. ③ join Node A queries node B. ④ download Node A can downloads pieces of file on swarm network Seeder has a complete file. Leecher has pieces of file. 12 PacSec 2011
  • 13. BitTorrent Network tracker or DHT (trackerless) Tracker – a dedicated machine which stores torrent files, tracks of which nodes are downloading and uploading. DHT – decentralized network architecture to share the functionality of the tracker. DHT is decentralized, but is more scalable than pure-P2P. DHT (Distributed Hash Table) is method using <key,value> pairs. DHT lookup method enables us to discover the location of the node who shares the responsibility of tracker of a file share. Recently DHT network has been paid much attention due to Dot-P2P project and Pirates Bay’s confirmation of stopping tracker. 13 PacSec 2011
  • 14. DHT Protocol ●DHT is not new spec Introduced to Azureus (2005) and BitCommet (2005). ●Based On Kademlia, XOR based DHT Petar Maymounkov and David Mazières. Kademlia: A peer- to-peer information system based on the XOR metric. In Proceedings of the 1st International Workshop on Peer-to Peer Systems (IPTPS '02) ●Supported by many clients apps. uTorent 1.8.5、Vuze 4.3.0.2、BitTorrent 6.3、 BitComet 1.16、Transmission 1.76 14 PacSec 2011
  • 15. DHT Protocol ●Magnet links are URLs which enables each node download and/or distribute contents without querying tracker site. ●Magnet link is provided by Pirates Bay and Mininova to fasten the download (base32 encoded and hex encoding). ●2010 Pirate Bay moves to magnet-link oriented DHT, shutting down their server. ●Magnet link enables BitTorent network tracker- less ? 15 PacSec 2011
  • 16. DHT Protocol DHT network is scalable architecture for file sharing system. Pure P2P: hundreds of thousands of nodes DHT: millions of nodes BitTorrent DHT network is implemented over KRPC. KRPC protocol is a RPC over UDP. DHT Queries has four kinds of message: ping, find_node, get_peers and announce_peer. Each is implemented according to B-Encode. 16 PacSec 2011
  • 17. DHT Protocol There are four kinds of messages of BitTorrent DHT Network: PING, STORE, FIND_NODE and FIND VALUE. • PING : the basic query for checking the queried node is alive. 20-byte string. Network byte order. • FIND_NODE : used to obtain the contact information of ID. Response should be a key “nodes” or the compact node info for the target node or the K (8) in its routing table. arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"} response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"} 17 PacSec 2011
  • 18. DHT Protocol There are four kinds of messages of BitTorrent DHT Network: PING, STORE, FIND_NODE and FIND VALUE. • GET_PEERS : used to cope with a torrent infohash. if the queried node has peers for the infohash, response is a key values as a list of strings. if not, K nodes in the queried nodes routing table closest to the infohash • ANNOUNCE_PEER : used to announce the peer which has the querying node is downloading a torrent on a port. arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"} 18 PacSec 2011
  • 19. Monitoring system architecture DHT network Reduce DHT Crawler DHT Crawler DHT Crawler Shuffle Scale out ! Map Map Map Key value store <key>=node ID <value>=data (address, port, etc) Dump Data 19 PacSec 2011
  • 20. Scaling out crawlers ! The response should be a key nodes of or the compact node info for the target node or the K (8) in its routing table. Info of key nodes and K(8) should be randomly distributed. DHT network So scaling out crawlers is effective way to expand monitoring range ! DHT crawlers is running on virtualized DHT Crawler DHT Crawler DHT Crawler Linux image. Hypervisor is VMWare ESX which provides Hypervisor rich interface to manage crawlers. 20 PacSec 2011
  • 21. Hadoop & MapReduce Retrieval geoLocation domain name Reduce Translation KML (XML) Shuffle Ranking Scale out ! wordcount sorting Map Map Map Hadoop & MapReduce running on Linux RH Dump Data 21 PacSec 2011
  • 22. Rapid crawling: 24 hours to reach 10000000 nodes ! node 12000000 10000000 8000000 6000000 4000000 2000000 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 diff hour 1000000 100000 10000 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 22 PacSec 2011
  • 23. Visualization & ranking *.*.39.201,6881,2011/9/25 23:57:43,1 *.*.210.128,62845,2011/9/25 23:56:32,1 *.*.33.212,6881,2011/9/25 23:33:58,1 *.*.9.21,49924,2011/9/25 23:37:02,1 IP address Time Location Info Domain name (country, city, latlng) KML movie 250 200 Figure 150 100 23 50 ranking 0 1 2 3 4 5 6 7 8 9 10 11 12 GB RU JP CN US PacSec 2011
  • 24. Map Reduce Map Reduce Input Map Reduce Output Map Reduce MapReduce is the algorithm for coping with Big data. map(key1,value) -> list<key2,value2> reduce(key2, list<value2>) -> list<value3> MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. 24 PacSec 2011
  • 25. Map *.*.194.107,h116-0-194-107.catv02.itscom.jp *.*.27.107,c-76-28-27-107.hsd1.ct.comcast.net *.*.239.181,c-68-40-239-181.hsd1.mi.comcast.net *.*.44.184,pool-96-253-44-184.prvdri.fios.verizon.net *.*.170.168,cpc11-stok15-2-0-cust167.1-4.cable.virginmedia.com *.*.23.81,cpc2-stkn10-0-0-cust848.11-2.cable.virginmedia.com *.0.194.107 hdsl1 comcast hdsl1 comcast verizon virginmedia 1 1 1 1 1 1 1 Log string is divided into words and assigned “1”. key-value – {word, 1} 25 PacSec 2011
  • 26. Reduce *.0.194.107 hdsl1 comcast hdsl1 comcast verizon virginmedia 1 1 1 1 1 1 1 hdsl1 comcast verizon 1 1 1 1 1 Reduce: count up 1 for each word. Key-value – {hdsl, 2} / Key-value – {comcast, 2} / Key-value – {verizon, 1} 26 PacSec 2011
  • 27. Sorting and ranking *.0.194.107 hdsl1 comcast hdsl1 comcast verizon hdsl1 1 1 1 1 1 1 1 hdsl1 comcast verizon 1 1 1 1 1 ③ ② 1 ① 27 @list1 = reverse sort { (split(/¥s/,$a))[1] <=> (split(/¥s/,$b))[1] } @list1; PacSec 2011
  • 28. Ranking by number of nodes in one day
      Rank  Country        # of nodes  Region           Domain
      1     Russia         1,488,056   Russia           RU
      2     United States  1,177,766   North America    US
      3     China          815,934     East Asia        CN
      4     UK             414,282     West Europe      GB
      5     Canada         408,592     North America    CA
      6     Ukraine        399,054     East Europe      UA
      7     France         394,005     West Europe      FR
      8     India          309,008     South Asia       IN
      9     Taiwan         296,856     East Asia        TW
      10    Brazil         271,417     South America    BR
      11    Japan          262,678     East Asia        JP
      12    Romania        233,536     East Europe      RO
      13    Bulgaria       226,885     East Europe      BG
      14    South Korea    217,409     East Asia        KR
      15    Australia      216,250     Oceania          AU
      16    Poland         184,087     East Europe      PL
      17    Sweden         183,465     North Europe     SE
      18    Thailand       183,008     South East Asia  TH
      19    Italy          177,932     West Europe      IT
      20    Spain          172,969     West Europe      ES
  • 29. Visualization: KML (Keyhole Markup Language) ■ KML is an XML-based file format for displaying geographic data in Google Earth. ■ The TimeSpan tag lets us animate our crawl log smoothly in Google Earth.
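To make the TimeSpan idea concrete, here is a minimal Python sketch that writes one KML Placemark with a `<TimeSpan>`, using only the standard library. It is an illustration, not the authors' generator. The coordinates are those of Male from slide 34 and the start time is one timestamp from the sample log; pairing them, the one-hour display window, and treating the log time as UTC are all assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

KML_NS = "http://www.opengis.net/kml/2.2"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumption: log timestamps are UTC


def placemark(name, lat, lon, begin, end):
    """Build a Placemark that is visible only between begin and end."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = name
    span = ET.SubElement(pm, "TimeSpan")
    ET.SubElement(span, "begin").text = begin.strftime(TIME_FORMAT)
    ET.SubElement(span, "end").text = end.strftime(TIME_FORMAT)
    point = ET.SubElement(pm, "Point")
    # KML coordinates are written longitude first, then latitude, then altitude.
    ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return pm


def build_kml(placemarks):
    """Wrap placemarks in a kml/Document element and serialize to text."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    document.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")


seen_at = datetime(2011, 9, 25, 23, 57, 43)
kml_text = build_kml(
    [placemark("Male", 4.1667, 73.5, seen_at, seen_at + timedelta(hours=1))]
)
print(kml_text)
```

Google Earth shows a time slider when placemarks carry time elements, so one placemark per observed node yields an animation of the crawl.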
  • 30. EU: Rank 4, UK (code: GB), 414,282 nodes, West Europe
      Nodes by city: N/A 77490; London 47559 (7550000: 0.6%); Manchester 9808 (441000: 2%); Birmingham 6617; Leeds 5111; Glasgow 4841; Brighton 4788; Liverpool 4445; Bristol 3814; Sheffield 3536; Upon 3363; Edinburgh 3140; Nottingham 2412; Newcastle 2297; Bradford 2093; Tyne 2091; Stoke-on-trent 2021; Coventry 1965; Preston 1902; Reading 1814
      [Chart: series GB, RU, JP, CN, US; x-axis 1 to 12; y-axis 0 to 250.]
  • 31. Rank 1: Russia, 1,488,056 nodes
      Nodes by city: Moscow 284959 (13670000: 2%); Saint 69220; Petersburg 69220 (4580000: 1.5%); N/A 51734; Novgorod 35505 (1330000: 2.6%); Yekaterinburg 31117; Velikiy 29706; Perm 28858; Tomsk 19083; Novosibirsk 18379; Voronezh 15121; Irkutsk 14943; Krasnoyarsk 14489; Ufa 11823; Lenin 11640; Tyumen 11615; Penza 10665; Izhevsk 10259; Volgograd 10126 (1000000); Saratov 9686
      [Chart: series GB, RU, JP, CN, US; x-axis 1 to 12; y-axis 0 to 250.]
  • 32. Rank 1: Russia, 1,488,056 nodes
      Nodes by host-name word: ru 869194; pppoe 157254; broadband 120719; corbina 114501; ertelecom 103364; dynamic 78683; nationalcablenetworks 34208; netbynet 28339; bb 28260; ufanet 26827; avangarddsl 26225; dyn 22174; mts-nn 21939; mtu-net 19994; 95 19274; bashtel 18588; 94 17260; nn 15149; dsl 14746; 178 14292
      Providers noted on the slide: Corbina Telecom (Корбина Телеком), corbina.ru/; ER-Telecom (ЭР-Телеком, page title "Home"), www.ertelecom.ru/; UfaNet.ru, www.ufanet.ru/
  • 33. Demo: observed nodes in Moscow. 10 million nodes in 24 hours!
  • 34. Island in the stream: Male
      [root@localhost ranking]# geoiplookup -f
      MV, 40, Male, N/A, 4.166700, 73.500000, 0, 0
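The comma-separated record printed by geoiplookup on this slide is easy to turn into typed fields for the city ranking and the map. The Python sketch below is an illustration only. The field names are assumptions inferred from the values (country code, region code, city, postal code, latitude, longitude); the slide does not label them, and the last two fields are ignored because their meaning is not stated.

```python
from typing import NamedTuple


class GeoRecord(NamedTuple):
    country: str  # assumed: two-letter country code
    region: str   # assumed: region code
    city: str
    postal: str   # assumed: postal code, "N/A" when unknown
    lat: float
    lon: float


def parse_geo(fields_text: str) -> GeoRecord:
    """Parse the comma-separated location fields into a typed record."""
    parts = [part.strip() for part in fields_text.split(",")]
    country, region, city, postal, lat, lon = parts[:6]
    return GeoRecord(country, region, city, postal, float(lat), float(lon))


male = parse_geo("MV, 40, Male, N/A, 4.166700, 73.500000, 0, 0")
print(male)
```

A city name that itself contains a comma would break this simple split, so a real parser would need a stricter format.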
  • 35. Island in the stream: Arue
      [root@localhost ~]# nslookup *.*.*.*
      Non-authoritative answer:
      .in-addr.arpa name = *.*.*.* dsl.dyn.mana.pf.
      Authoritative answers can be found from
      PF, 00, Arue, N/A, -17.516800, -149.500000, 0, 0
  • 36. Rank 2: United States, 1,177,766 nodes
      Nodes by city: N/A 207179; San 29263 (??); Dallas 18899; New 16213; Saint 11933; Houston 11401; Los 10931; Chicago 10876; Fort 10845; Park 10465; Angeles 10400; Brooklyn 9769; York 9462; Lake 8885; Miami 7575; Diego 7161; Francisco 6743; Portland 6553; Washington 6266; Las 6205; Vegas 5956
      An unlabeled value, 25675, also appears on the slide.
      [Chart: series GB, RU, JP, CN, US; x-axis 1 to 12; y-axis 0 to 250.]
  • 37. Rank 2: United States, 1,177,766 nodes
      Nodes by host-name word: user 78494; com 76803; br 45945; veloxzone 42333; ono 27937; dyn 26909; 84 8460; users 4754; 81 4336; ru 4266; 62 4189; net 3725; 85 3134; mns 2681; 82 2454; 79 2152; 212 2122; vivozap 1952; 213 1889; 217 1868
      Notes on the slide: Veloxzone (veloxzone.com.br, Robtex) ??; "Brazilian mobile phone operator belonging to the Portugal Telecom and Telefonica groups." ??
  • 38. Rank 11: Japan, 262,678 nodes
      Nodes by city: N/A 69648; Tokyo 54531 (13100000: 0.045); Osaka 7430 (8860000: ??); Yokohama 6983; Nagoya 4114; Kawasaki 3503; Fukuoka 2989; Kyoto 2875; Chiba 2443; Kobe 2409; Sapporo 2015; Shizuoka 1667; Hamamatsu 1396; Hiroshima 1356; Setagaya 1339; Nara 1239; Sagamihara 1151; Toyonaka 1089; Kawaguchi 1077; Tokorozawa 980
      [Chart: series GB, RU, JP, CN, US; x-axis 1 to 12; y-axis 0 to 250.]
  • 39. Rank 11: Japan, 262,678 nodes
      Nodes by host-name word: jp 226354; ne 173513; ocn 52352 (8000000: 0.6%); ap 38034; or 22745; dion 20057; ppp 19918; ppp-bb 17674; plala 17520; ad 14851; mesh 11932; so-net 11482; eonet 11184; infoweb 10615; nt 9431; rev 9181; home 9116; yournet 8507; tokyo 7926; ftth 7814
      Providers noted on the slide: OCN ("Welcome to the OCN official site"), ocn.ne.jp; auone-net ("high-speed mobile and fiber-optic Internet service provider"), www.auone-net.jp
  • 40. Rank 3: China (CN), 815,934 nodes, East Asia
      Nodes by city: Beijing 240419 (17500000: 1%); Guangzhou 52981 (10330000: 0.5%?); Shanghai 27399 (18580000: 0.1%?); Jinan 26281; N/A 24695; Chengdu 18835; Shenyang 18566; Tianjin 18460; Hebei 17414; Wuhan 15239; Hangzhou 12997; Harbin 10848; Changchun 10411; Nanning 10318; Qingdao 10257; Taiyuan 9573; Hefei 9455; Changsha 6988; Chongqing 5641; Shenzhen 5600
      [Chart: series GB, RU, JP, CN, US; x-axis 1 to 12; y-axis 0 to 250.]
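  • 41. Rank 3: China (CN), 815,934 nodes, East Asia
      Nodes by host-name word: cn 90196; com 65413; dynamic 65060; 163data 64647; broad 59136; adsl-pool 17127; sh 10473; xw 10398; net 10352; sx 10196; gd 9641; 222 9297; fj 8826; js 8531; jlccptt 7820; zj 6900; 117 6687; 125 6532; 218 6371; 60 6244
      Notes on the slide: dynamic.163data.com.cn ??; Jilin Province Data Communication Bureau (吉林省数据通信局); Beijing Xinnet Digital Information Technology Co., Ltd. (北京新网数码信息技术有限公司) ??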
  • 42. All cities
      Nodes by city (country: country rank): N/A 978457; Moscow 285097 (RU:1); Beijing 240419 (CN:3); Seoul 180186 (KR) (1000000: 1%); Taipei 161498 (TW:9); Kiev 117392 (UA:6); Saint 94560 (Petersburg?); Bucharest 79336 (1940000: 4%); Sofia 78445 (BG:13); New 72424; Petersburg 71175 (RU:1); Central 65635 (HK?); District 65485 (HK?); Bangkok 62882 (TH:18); Delhi 62563 (IN:8); Tokyo 54531 (JP:11); London 53514 (GB:4); Guangzhou 52981 (CN:3); Athens 52656 (3680000: 1.4%); Budapest 52031 (1,733,685: 3%)
  • 43. All the world
      Nodes by host-name word: net 2676477; com 1369148; ru 869195; dynamic 685144; dsl 430313; comcast 303649; hsd1 303626; br 244534; jp 226366; adsl 222170; cable 217597; au 203850; dyn 200646; pppoe 187455; pool 183580; static 180225; ne 173788; broadband 173384; co 171029; rr 170298; res 169568; ca 165639; hinet 162089; pl 160772; it 151052; fr 146154; bb 143578; hu 139452; sbcglobal 135016; ua 133288
      Providers noted on the slide: Comcast ("Comcast: High Speed Internet, Cable TV, and Phone Services Deals"); HiNet ("HiNet home page: Taiwan's largest ISP, providing broadband network service"); sbcglobal.net (Network Solutions) ??
  • 44. Demo: flying over Eurasia. 10 million nodes in 24 hours!
  • 45. Conclusion. In this presentation, we have shown that it is possible to obtain information on 10,000,000 nodes in 24 hours. In current P2P and DHT networks, each node can be easily monitored, and the illegal adoption of BitTorrent raises many challenges and interesting topics. Our crawling system can provide rankings of countries, cities, and domain providers. The results show that the DHT network is indeed a large and scalable network! BitTorrent has huge potential as an alternative, as yet unseen network architecture!
  • 46. Thank you for listening!