8th NCM: 2012 International Conference on Networked Computing and Advanced Information Management
http://www.aicit.org/ncm
April 24-26, 2012, Grand Hilton Hotel (Seoul), Korea
Ncm 2012 Ruo Ando
1. Blink II: A node ranking system of DHT network using Map Reduce Framework
Ruo Ando, Akihiko Shinohara and Takayuki Sugiura
NICT (National Institute of Information and Communications Technology)
NetAgent Co., Ltd.
2. Overview: detecting illegal use in a huge network
• BitTorrent has become an indispensable network application for distributing software and content. But...
• No one knows its exact scale and dynamics! How many nodes join and leave the BitTorrent network in 24 hours?
• We tackled the challenge of monitoring this very large network using our fast, massive DHT crawler.
• We succeeded in collecting 10,000,000 nodes in 24 hours!
• We also present rankings of countries and cities in the BitTorrent network.
3. BitTorrent traffic estimates
“① 55%” - CableLabs
About half of the upstream traffic on CATV networks.
“② 35%” - CacheLogic
“LIVEWIRE - File-sharing network thrives beneath the Radar”
“③ 60%” - documents in www.sans.edu
“It is estimated that more than 60% of the traffic on the internet is peer-to-peer.”
4. Proposed system architecture for monitoring large-scale networks
[Figure: DHT network -> multiple DHT crawlers -> key-value store (<key> = node ID, <value> = data such as address and port) -> dump data -> Map, Map, Map -> Shuffle -> Reduce; the design scales out.]
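A minimal Python sketch of the key-value stage, assuming each crawler reports observations as (node ID, IP address, port) tuples. The names merge_observations and dump and the tab-separated dump format are hypothetical, not taken from Blink II. Keying the store by node ID means a node reported by several crawlers is counted only once.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical crawler record: (20-byte node ID, IPv4 address, UDP port)
Observation = Tuple[bytes, str, int]
Store = Dict[bytes, Tuple[str, int]]


def merge_observations(batches: Iterable[Iterable[Observation]]) -> Store:
    """Merge several crawlers' output into one store keyed by node ID.

    <key> = node ID, <value> = (address, port). A node reported by more than
    one crawler is stored once; the last report wins.
    """
    store: Store = {}
    for batch in batches:
        for node_id, ip, port in batch:
            store[node_id] = (ip, port)
    return store


def dump(store: Store) -> List[str]:
    """Dump the store as tab-separated lines (hex node ID, IP, port), sorted by ID."""
    return [f"{nid.hex()}\t{ip}\t{port}" for nid, (ip, port) in sorted(store.items())]


crawler_a = [(b"\x01" * 20, "192.0.2.1", 6881)]
crawler_b = [(b"\x01" * 20, "192.0.2.1", 6881), (b"\x02" * 20, "192.0.2.2", 51413)]
store = merge_observations([crawler_a, crawler_b])
print(len(store))  # two unique nodes, although three observations were made
```

Because the dump is plain text, one line per node, it can be split across map tasks without further coordination.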
6. Basic architecture of a tracker network
① Ask
Node A (a newcomer) asks the tracker for the file it is looking for.
② Torrent download
The tracker provides the torrent file.
③ Join
Node A queries node B.
④ Download
Node A downloads pieces of the file from the swarm.
A seeder has the complete file.
A leecher has only some pieces of the file.
PacSec 2011
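To make the tracker's bookkeeping role concrete, here is a hypothetical toy tracker in Python that only records swarm membership. A real tracker speaks HTTP or UDP and does much more; the class name ToyTracker and its methods are illustrative assumptions, not a real tracker API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Peer = Tuple[str, int]  # (IP address, port)


class ToyTracker:
    """Hypothetical in-memory tracker: remembers which peers are in each swarm.

    A peer reporting the complete file counts as a seeder; any other peer is a
    leecher holding only some pieces.
    """

    def __init__(self) -> None:
        self.swarms: Dict[bytes, Dict[Peer, bool]] = defaultdict(dict)

    def announce(self, info_hash: bytes, peer: Peer, complete: bool) -> List[Peer]:
        """Register the peer and return the other swarm members it can contact."""
        swarm = self.swarms[info_hash]
        swarm[peer] = complete
        return [p for p in swarm if p != peer]

    def counts(self, info_hash: bytes) -> Tuple[int, int]:
        """Return (seeders, leechers) for one swarm."""
        swarm = self.swarms.get(info_hash, {})
        seeders = sum(1 for has_all in swarm.values() if has_all)
        return seeders, len(swarm) - seeders


tracker = ToyTracker()
torrent = b"\xab" * 20
tracker.announce(torrent, ("192.0.2.10", 6881), complete=True)         # node B, a seeder
peers = tracker.announce(torrent, ("192.0.2.20", 6881), complete=False)  # node A, a newcomer
```

Here the newcomer (node A) learns node B's address from the tracker, which corresponds to steps ① to ③; the piece exchange in step ④ happens directly between peers.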
7. BitTorrent network
Tracker or DHT (trackerless)
Tracker – a dedicated machine that stores torrent files and keeps track of which nodes are downloading and uploading.
DHT – a decentralized network architecture that shares the functionality of the tracker among the nodes. Like pure P2P, DHT is decentralized, but it is more scalable.
DHT (Distributed Hash Table) is a method based on <key, value> pairs. The DHT lookup lets us discover the location of the node that shares the tracker's responsibility for a given file.
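In the BitTorrent DHT (BEP 5), node IDs and infohashes share the same 160-bit space, and the distance between two IDs is their XOR interpreted as an unsigned integer. The lookup described above amounts to repeatedly asking for the nodes closest to a file's infohash. A small sketch of the metric and of picking the K = 8 closest known nodes; the function names are hypothetical.

```python
from typing import List


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance: XOR of two IDs, read as an unsigned big-endian integer."""
    if len(a) != len(b):
        raise ValueError("IDs must have equal length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(node_ids: List[bytes], target: bytes, k: int = 8) -> List[bytes]:
    """Return the k known node IDs closest to target (e.g. a torrent infohash)."""
    return sorted(node_ids, key=lambda nid: xor_distance(nid, target))[:k]
```

Nodes whose IDs are closest to an infohash are the ones expected to hold the peer list for that torrent, which is how the DHT spreads the tracker's job across the network.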
Recently, the DHT network has attracted much attention because of the Dot-P2P project and The Pirate Bay's announcement that it would stop running its tracker.
8. DHT Protocol
The BitTorrent DHT network uses four kinds of messages: PING, FIND_NODE, GET_PEERS and ANNOUNCE_PEER (the counterparts of Kademlia's PING, FIND_NODE, FIND_VALUE and STORE).
• PING: the basic query for checking whether the queried node is alive. Its argument is the querying node's ID, a 20-byte string in network byte order.
• FIND_NODE: used to obtain the contact information of a node, given its ID. The response has a key “nodes” containing the compact node info for the target node, or for the K (8) closest good nodes in the queried node's routing table.
arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"}
response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}
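These queries travel over UDP as bencoded KRPC dictionaries. The sketch below encodes a FIND_NODE query in Python; the bencode helper is written from scratch here, and the transaction ID and node IDs in the usage line are the sample values from the BEP 5 specification, not real nodes.

```python
from typing import Union

Bencodable = Union[int, bytes, str, list, dict]


def bencode(obj: Bencodable) -> bytes:
    """Encode ints, byte strings, str, lists and dicts in bencode (BEP 3)."""
    if isinstance(obj, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return str(len(obj)).encode() + b":" + obj
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys are byte strings and must appear in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def find_node_query(tid: bytes, my_id: bytes, target: bytes) -> bytes:
    """Build a KRPC FIND_NODE query: y = "q", q = "find_node", a = {id, target}."""
    if len(my_id) != 20 or len(target) != 20:
        raise ValueError("node IDs are 20-byte strings")
    return bencode({"t": tid, "y": "q", "q": "find_node",
                    "a": {"id": my_id, "target": target}})


query = find_node_query(b"aa", b"abcdefghij0123456789", b"mnopqrstuvwxyz123456")
# query == b"d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe"
```

To send it, a crawler would write the bytes to a UDP socket, e.g. socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(query, (host, port)), and bdecode the reply; decoding is not shown.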
9. DHT Protocol
• GET_PEERS: used to look up peers for a torrent infohash. If the queried node has peers for the infohash, the response contains a key “values” holding a list of strings (compact peer info). If not, the response contains a key “nodes” with the K nodes in the queried node's routing table closest to the infohash. Either way, the response carries a token that the querying node must present in a later ANNOUNCE_PEER.
• ANNOUNCE_PEER: used to announce that the querying node is downloading a torrent on a given port.
arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}
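A sketch of how a client might branch on a GET_PEERS reply, assuming the reply has already been bdecoded into a dictionary with byte-string keys (the decoder is not shown). Per BEP 5, each entry in “values” is 6 bytes of compact peer info (IPv4 address, then port, both in network byte order) and “nodes” is a concatenation of 26-byte compact node entries. The function names are hypothetical.

```python
import socket
import struct
from typing import Any, Dict, Tuple


def parse_compact_peer(info: bytes) -> Tuple[str, int]:
    """Decode 6-byte compact peer info: 4-byte IPv4 address + 2-byte big-endian port."""
    if len(info) != 6:
        raise ValueError("compact peer info is 6 bytes")
    ip = socket.inet_ntoa(info[:4])
    (port,) = struct.unpack("!H", info[4:])
    return ip, port


def handle_get_peers(r: Dict[bytes, Any]) -> Tuple[str, list]:
    """Classify the decoded 'r' dictionary of a GET_PEERS response.

    Returns ("peers", [(ip, port), ...]) when the node knows peers ("values"),
    otherwise ("nodes", [26-byte compact node entries]) to continue the lookup.
    The token in r[b"token"] would be kept for a later ANNOUNCE_PEER.
    """
    if b"values" in r:
        return "peers", [parse_compact_peer(v) for v in r[b"values"]]
    blob = r.get(b"nodes", b"")
    return "nodes", [blob[i:i + 26] for i in range(0, len(blob), 26)]
```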
10. DHT network crawling
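DHT crawling can be sketched as a breadth-first walk: send FIND_NODE to every address learned so far and parse the compact node info in each reply (26 bytes per node: 20-byte ID, 4-byte IPv4 address, 2-byte port, per BEP 5). This is a simplified single-threaded sketch, not the Blink II crawler; find_node is a hypothetical caller-supplied function that performs the UDP round trip and returns the raw “nodes” string.

```python
import socket
import struct
from collections import deque
from typing import Callable, Dict, List, Tuple

Node = Tuple[bytes, str, int]  # (20-byte node ID, IPv4 address, UDP port)


def parse_compact_nodes(blob: bytes) -> List[Node]:
    """Split a 'nodes' string into 26-byte entries: 20-byte ID + 4-byte IP + 2-byte port."""
    if len(blob) % 26:
        raise ValueError("compact node info length must be a multiple of 26")
    nodes = []
    for off in range(0, len(blob), 26):
        entry = blob[off:off + 26]
        ip = socket.inet_ntoa(entry[20:24])
        (port,) = struct.unpack("!H", entry[24:26])
        nodes.append((entry[:20], ip, port))
    return nodes


def crawl(bootstrap: List[Tuple[str, int]],
          find_node: Callable[[str, int], bytes],
          limit: int = 10_000) -> Dict[bytes, Tuple[str, int]]:
    """Query each address once and record every node ID seen.

    find_node(ip, port) is hypothetical: it should send FIND_NODE and return the
    reply's raw 'nodes' string, or b"" on timeout. Querying stops once about
    `limit` nodes are known.
    """
    seen: Dict[bytes, Tuple[str, int]] = {}
    queued = set(bootstrap)
    frontier = deque(bootstrap)
    while frontier and len(seen) < limit:
        ip, port = frontier.popleft()
        for node_id, n_ip, n_port in parse_compact_nodes(find_node(ip, port)):
            seen.setdefault(node_id, (n_ip, n_port))
            if (n_ip, n_port) not in queued:  # never query the same address twice
                queued.add((n_ip, n_port))
                frontier.append((n_ip, n_port))
    return seen
```

A crawler aiming at millions of nodes per day would issue queries concurrently and rate-limit them; the sequential loop only shows the bookkeeping.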
12. MapReduce
[Figure: Input -> Map, Map, Map -> Reduce, Reduce, Reduce -> Output]
MapReduce is a programming model for processing big data.
map(key1, value1) -> list<key2, value2>
reduce(key2, list<value2>) -> list<value3>
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
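The two signatures above can be illustrated with a single-process Python stand-in that counts nodes per country. It assumes a hypothetical dump format of one tab-separated line per node (hex node ID, IP address, country code), i.e. that a GeoIP lookup has already added the country; on a cluster, the map and reduce functions would run as distributed tasks instead of the local loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_fn(line_no: int, line: str) -> List[Tuple[str, int]]:
    """map(key1, value1) -> list<key2, value2>: emit (country code, 1) per node."""
    fields = line.rstrip("\n").split("\t")
    return [(fields[2], 1)] if len(fields) >= 3 else []  # skip malformed lines


def reduce_fn(country: str, counts: List[int]) -> List[int]:
    """reduce(key2, list<value2>) -> list<value3>: total number of nodes."""
    return [sum(counts)]


def run_mapreduce(records: Iterable[str], mapper, reducer) -> Dict[str, List[int]]:
    """Local stand-in for the map, shuffle and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key1, value1 in enumerate(records):          # map phase
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)              # shuffle: group values by key2
    return {key2: reducer(key2, values) for key2, values in sorted(groups.items())}


records = ["01\t192.0.2.1\tRU", "02\t192.0.2.2\tUS", "03\t192.0.2.3\tRU"]
result = run_mapreduce(records, map_fn, reduce_fn)  # {"RU": [2], "US": [1]}
```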
16. Sorting and ranking
[Figure: host names such as hdsl1*.0.194.107, comcast and verizon, each counted as 1, are grouped by name and sorted by total count in steps ① to ③.]
@list1 = sort { (split(/\s+/, $b))[1] <=> (split(/\s+/, $a))[1] } @list1;  # descending by 2nd field (count)
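The Perl line sorts "name count" records by their second whitespace-separated field in descending order. A Python equivalent that also attaches ranks is sketched below; the function name is hypothetical, and the sample input uses three country totals from the one-day ranking that follows.

```python
from typing import List, Tuple


def rank_by_count(lines: List[str]) -> List[Tuple[int, str, int]]:
    """Sort "name count" lines by the count field, descending, and number them from 1."""
    rows = []
    for line in lines:
        name, count = line.split()[:2]  # split on runs of whitespace
        rows.append((name, int(count)))
    rows.sort(key=lambda row: row[1], reverse=True)  # stable: ties keep input order
    return [(rank, name, count) for rank, (name, count) in enumerate(rows, start=1)]


print(rank_by_count(["CN 815934", "RU 1488056", "US 1177766"]))
# [(1, 'RU', 1488056), (2, 'US', 1177766), (3, 'CN', 815934)]
```

Converting the count with int() before sorting matters: comparing the fields as strings would put "815934" above "1488056".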
17. Ranking of countries by number of nodes in one day
Rank Country # of nodes Region Code
1 Russia 1,488,056 Russia RU
2 United States 1,177,766 North America US
3 China 815,934 East Asia CN
4 UK 414,282 West Europe GB
5 Canada 408,592 North America CA
6 Ukraine 399,054 East Europe UA
7 France 394,005 West Europe FR
8 India 309,008 South Asia IN
9 Taiwan 296,856 East Asia TW
10 Brazil 271,417 South America BR
11 Japan 262,678 East Asia JP
12 Romania 233,536 East Europe RO
13 Bulgaria 226,885 East Europe BG
14 South Korea 217,409 East Asia KR
15 Australia 216,250 Oceania AU
16 Poland 184,087 East Europe PL
17 Sweden 183,465 North Europe SE
18 Thailand 183,008 Southeast Asia TH
19 Italy 177,932 West Europe IT
20 Spain 172,969 West Europe ES
18. EU: UK (code: GB), rank 4 with 414,282 nodes
City # of nodes (city population: nodes as % of population)
N/A (no city information) 77490
London 47559 (7550000: 0.6%)
Manchester 9808 (441000: 2%)
Birmingham 6617
Leeds 5111
Glasgow 4841
Brighton 4788
Liverpool 4445
Bristol 3814
Sheffield 3536
Upon 3363
Edinburgh 3140
Nottingham 2412
Newcastle 2297
Bradford 2093
Tyne 2091
Stoke-on-Trent 2021
Coventry 1965
Preston 1902
Reading 1814
[Chart: series for GB, RU, JP, CN and US; x-axis 1 to 12, y-axis 0 to 250]
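The percentages in parentheses on the UK slide appear to be the node count divided by the city population; a quick Python check reproduces them (Manchester's 2% is 2.2% rounded). The helper name is hypothetical, and since a DHT node is an ID at an address rather than a person, the ratio is only a rough indicator of BitTorrent use.

```python
def nodes_per_population(nodes: int, population: int) -> float:
    """Observed DHT nodes as a percentage of a city's population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * nodes / population


print(round(nodes_per_population(47559, 7550000), 1))  # London: 0.6
print(round(nodes_per_population(9808, 441000), 1))    # Manchester: 2.2
```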
21. Ranking of cities
Rank | City | # of nodes | Country | # of nodes in country | City population (in units of 10,000)
1 | Moscow | 285,097 | Russia | 1,488,056 | 1,367
2 | Beijing | 240,419 | China | 815,934 | 1,755
3 | Seoul | 180,186 | South Korea | 217,409 | 970
4 | Saint Petersburg | 165,735 | Russia | 1,488,056 | -
5 | Taipei | 161,498 | Taiwan | 296,856 | 265
6 | Hong Kong | 130,920 | Hong Kong | - | -
7 | Kiev | 117,392 | Ukraine | 399,054 | 251
8 | Bucharest | 79,336 | Romania | 233,536 | 194
9 | Sofia | 78,445 | Bulgaria | 226,885 | 126
10 | Bangkok | 62,882 | Thailand | 183,008 | 687
11 | Delhi | 62,563 | India | 309,008 | 2,099
12 | Tokyo | 54,531 | Japan | 262,678 | 1,300
13 | London | 53,514 | UK | 414,282 | 755
14 | Guangzhou | 52,981 | China | 815,934 | 1,004
15 | Athens | 52,656 | Greece | - | 300
22. Conclusion: detecting illegal use in a huge network
• BitTorrent has become an indispensable network application for distributing software and content. But...
• No one knows its exact scale and dynamics! How many nodes join and leave the BitTorrent network in 24 hours?
• We tackled the challenge of monitoring this very large network using our fast, massive DHT crawler.
• We succeeded in collecting 10,000,000 nodes in 24 hours!
• We also present rankings of countries and cities in the BitTorrent network.