Hades poster (COMAD)
Hades: A Hadoop-based Framework for Detection of Peer-to-Peer Botnets
Pratik Narang, Abhishek Thakur, Chittaranjan Hota
Birla Institute of Technology & Science – Pilani, Hyderabad Campus
Hyderabad, Telangana 500078, India
Abstract
This paper presents Hades, a Hadoop-based framework for detecting P2P botnets in enterprise-level networks. Hades is distributed and scalable by design.
Our work uses the Hadoop ecosystem to adopt a ‘host-aggregation based’ approach: it aggregates behavioral metrics for each P2P host seen in network communications and uses them to distinguish benign P2P hosts from hosts infected by P2P botnets.
We propose a distributed data-collection architecture that can monitor inside-to-inside LAN traffic, instead of relying solely on the NetFlow information available at a backbone router, which cannot see communications that stay within the LAN.
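To illustrate which traffic a backbone-router NetFlow view misses, the sketch below flags a packet as inside-to-inside when both endpoints fall within the campus address space. The 10.0.0.0/8 prefix and the function name `is_inside_to_inside` are hypothetical; the poster does not state the network's addressing.

```python
import ipaddress

# Hypothetical campus LAN prefix; the poster does not give the real address space.
LAN = ipaddress.ip_network("10.0.0.0/8")


def is_inside_to_inside(src: str, dst: str, lan=LAN) -> bool:
    """Return True when both endpoints lie inside the LAN.

    Such packets are switched within the campus network and usually never
    cross the backbone router, so NetFlow exported there would not record them.
    """
    return ipaddress.ip_address(src) in lan and ipaddress.ip_address(dst) in lan


print(is_inside_to_inside("10.1.2.3", "10.4.5.6"))  # True: both peers inside the LAN
print(is_inside_to_inside("10.1.2.3", "8.8.8.8"))   # False: traffic crosses the border
```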
…
Figure: Hades architecture. Hadoop cluster: name node and data nodes. Data-collection sites: Distributed Systems Lab, Student Hostels.
1. Data collection
2. Parse packets with Tshark
3. Push data to HDFS
4. Host-based features extracted with Hive
5. Feature set evaluated against models built with Mahout
Detected P2P bots trigger firewall rules.
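The last stage of the pipeline, where detected P2P bots trigger firewall rules, could look like the sketch below. It assumes, hypothetically, that the classifier's output is a mapping from host IP to predicted label and that enforcement happens on a Linux gateway running iptables; the poster specifies neither. `firewall_rules` is an illustrative name, and the function only builds the commands without executing them.

```python
from typing import Dict, List


def firewall_rules(predictions: Dict[str, str], bot_label: str = "botnet") -> List[str]:
    """Build (but do not run) iptables commands isolating hosts predicted as bots.

    `predictions` maps a host IP to a predicted label (hypothetical format).
    """
    rules = []
    # Sorted for deterministic output (string order, not numeric IP order).
    for host in sorted(h for h, label in predictions.items() if label == bot_label):
        rules.append(f"iptables -A FORWARD -s {host} -j DROP")  # traffic from the bot
        rules.append(f"iptables -A FORWARD -d {host} -j DROP")  # traffic to the bot
    return rules


for rule in firewall_rules({"10.0.0.5": "botnet", "10.0.0.7": "benign"}):
    print(rule)
```

Generating rule strings instead of calling iptables directly keeps the detection side free of root privileges; an operator or a separate agent on the gateway would apply them.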
Figure: True positive and false positive percentages (axis 0 to 100) for the Botnet and Benign classes, on training and testing data.
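The chart reports true-positive and false-positive percentages per class without defining them. Assuming the standard one-vs-rest definitions, TP rate = TP / (TP + FN) and FP rate = FP / (FP + TN) with the class of interest treated as positive, they can be computed from true and predicted labels as sketched below. The label strings and sample data are made up and are not the poster's results.

```python
from typing import Dict, List, Tuple


def per_class_rates(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float]]:
    """For each class c, return (TP rate, FP rate) in percent, treating c as positive."""
    result = {}
    pairs = list(zip(y_true, y_pred))
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        tpr = 100.0 * tp / (tp + fn)  # c occurs in y_true, so tp + fn > 0
        fpr = 100.0 * fp / (fp + tn) if fp + tn else 0.0
        result[c] = (tpr, fpr)
    return result


y_true = ["botnet"] * 4 + ["benign"] * 4
y_pred = ["botnet"] * 4 + ["benign", "benign", "benign", "botnet"]
print(per_class_rates(y_true, y_pred))
# {'benign': (75.0, 0.0), 'botnet': (100.0, 25.0)}
```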
Features extracted per host
Number of distinct destination hosts contacted
Total volume of data sent from the source host
Average TTL value of the packets sent from the source host
Hive Queries
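-- Raw packet fields parsed by Tshark and stored as comma-separated files in HDFS.
-- `timestamp` is a reserved word in recent Hive versions, hence the backticks;
-- DOUBLE keeps fractional seconds (a bare DECIMAL defaults to DECIMAL(10,0)).
CREATE EXTERNAL TABLE packet_data (
  `timestamp` DOUBLE, ip_source STRING,
  ip_dest STRING, ttl INT,
  proto INT, payload_length INT )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hdfs/PacketDump';

-- One row of aggregated behavioral features per source host.
CREATE TABLE host_data (
  host STRING, destinations BIGINT,
  avg_ttl DOUBLE, volume BIGINT )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE;

-- Distinct destinations, average TTL and total payload bytes per source host.
INSERT INTO TABLE host_data
SELECT ip_source, COUNT(DISTINCT ip_dest),
  AVG(ttl), SUM(payload_length)
FROM packet_data
GROUP BY ip_source;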
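For sanity-checking the Hive output on a small sample, the same per-host aggregation can be written in plain Python. This sketch assumes CSV rows in the `packet_data` column order with no missing fields (Hive would instead skip NULLs in COUNT, AVG and SUM); `host_features` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def host_features(rows: Iterable[List[str]]) -> Dict[str, Tuple[int, float, int]]:
    """Return {source host: (distinct destinations, average TTL, total payload bytes)}.

    Each row follows the packet_data column order:
    timestamp, ip_source, ip_dest, ttl, proto, payload_length.
    """
    dests = defaultdict(set)    # source -> set of destination IPs
    ttl_sum = defaultdict(int)  # source -> sum of TTL values
    packets = defaultdict(int)  # source -> number of packets
    volume = defaultdict(int)   # source -> sum of payload lengths
    for _ts, src, dst, ttl, _proto, length in rows:
        dests[src].add(dst)
        ttl_sum[src] += int(ttl)
        packets[src] += 1
        volume[src] += int(length)
    return {h: (len(dests[h]), ttl_sum[h] / packets[h], volume[h]) for h in dests}


sample = io.StringIO(
    "1400000000.1,10.0.0.5,10.0.0.9,128,6,100\n"
    "1400000000.2,10.0.0.5,10.0.0.8,64,17,50\n"
    "1400000000.3,10.0.0.7,10.0.0.9,64,6,10\n"
)
print(host_features(csv.reader(sample)))
# {'10.0.0.5': (2, 96.0, 150), '10.0.0.7': (1, 64.0, 10)}
```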
Results with Random Forests over Mahout
This research work was supported by grants from the Department of Information Technology, Government of India.
Datasets
Data type      Size (.pcap)   Size (.csv)
P2P Botnet     14 GB          6.6 GB
P2P Benign     6 GB           3.9 GB