they are also called active honeypots or honeyclients. Both honeypot technologies have emerged as widely studied research areas in the field of cyber security. Fig. 1 depicts the classification of honeypots.
Fig 1. Classification of Honeypots
Several aspects differentiate client honeypots from traditional server honeypots [7][8]:
• Client-side: a client honeypot simulates or drives client-side software and does not expose server-based services to be attacked.
• Active: it cannot entice attacks to itself; rather, it must actively initiate interaction with remote servers in order to be attacked.
• Identifying: whereas all accesses to a traditional honeypot are malicious by default, a client-side honeypot must discern which servers are malicious and which are benign.
3. PROPOSED SYSTEM DESIGN
In this section, we first discuss the proposed integrated system, which includes both client honeypots and server honeypots. The complete system is automated, including the analysis of the collected malware. Client honeypots are very useful for collecting malware that targets client-side applications such as browsers (Mozilla, Internet Explorer, etc.) and PDF readers. In our proposed approach, both the client honeypots and the server honeypots are controlled by a single centralized server known as the active controller. All the honeypots run in a virtualized environment using the open source VirtualBox [9].
Server honeypots are a combination of low and high interaction honeypots. For low interaction, we have chosen the nepenthes honeypot because it is easy to install and monitor. For high interaction server honeypots, we use various operating systems such as Windows XP and Windows 2003. Configurations for the server honeypots and URL lists for the client honeypots are provided by a server called the active honeypot controller. The proposed system has five functional components: client honeypots, server honeypots, the honeypot controller, the management server and the analysis server. The client honeypots consist of three modules: the queuer, the visitor and the analysis engine. The queuer inserts the URL list into a database, and the visitor component of the honeyclient visits each queued URL. At present, the visitor module browses the URLs using Internet Explorer. The analysis engine performs state-based detection based on real-time file system monitoring: the honeypot image is cleaned after 2 minutes, and any activity that takes place on the honeypot is treated as malicious activity.
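The state-based detection performed by the analysis engine can be illustrated with a minimal Python sketch. It assumes a simplification: instead of real-time file system monitoring, it snapshots a directory tree before and after a visit and diffs the two snapshots. The function names and the prefix-based exclusion list are hypothetical, and only the analysis step is shown, not the queuer or the visitor.

```python
import os
from typing import Dict, Iterable, Set, Tuple

# path -> (size in bytes, modification time in ns)
Snapshot = Dict[str, Tuple[int, int]]


def snapshot(root: str) -> Snapshot:
    """Record the size and mtime of every file under root."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file disappeared between listing and stat
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def diff(before: Snapshot, after: Snapshot,
         exclusions: Iterable[str] = ()) -> Dict[str, Set[str]]:
    """Return created/deleted/modified paths, ignoring excluded path prefixes."""
    excluded = tuple(exclusions)

    def monitored(path: str) -> bool:
        # str.startswith(()) is False, so an empty exclusion list keeps everything
        return not path.startswith(excluded)

    return {
        "created": {p for p in after.keys() - before.keys() if monitored(p)},
        "deleted": {p for p in before.keys() - after.keys() if monitored(p)},
        "modified": {p for p in after.keys() & before.keys()
                     if after[p] != before[p] and monitored(p)},
    }


def is_malicious(changes: Dict[str, Set[str]]) -> bool:
    """Any monitored state change during a visit is treated as malicious."""
    return any(changes.values())
```

In use, `snapshot` would be called on the monitored drive just before the visitor opens a URL and again before the image is reverted; a non-empty diff marks the URL as malicious.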
Fig 2. Integrated Framework Architecture (honeywall gateway, management server, URL data store, active honeypot clients 1 to n, active honeypot controller, server honeypots 1 to n, central database server, configuration sender, controller, analysis server)
As depicted in the integrated framework for malware collection and analysis, there are five components:
1. URL data source
2. Honeypot controller
3. Central Database
4. Analysis Server
5. Management Server
URL Data Source:
We collect URLs using a traditional web crawler and submit them as input to a static analyzer that we developed. The static analyzer selects likely malicious URLs, which are then submitted to the honeyclient. Initially, we collect seed URLs from different sources, such as the Internet and a user agency, and submit them to the crawler, which extracts further URLs from them. The resulting list of URLs is fed to the honeyclient, which visits each extracted URL using real browser applications in various environments.
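The crawler's URL extraction step can be sketched in Python with the standard library's `html.parser`. This is a minimal sketch, assuming the page HTML has already been fetched; it is not the crawler used in the framework, and the function and class names are hypothetical. It resolves relative links against the page URL, keeps only HTTP(S) targets and removes duplicates while preserving order.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from href and src attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)  # resolve relative links
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_urls(base_url: str, html: str) -> List[str]:
    """Return unique outgoing URLs in order of first appearance."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    seen: Set[str] = set()
    unique: List[str] = []
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Including `src` attributes matters here because drive-by pages often load their exploit through `<script>` or `<iframe>` tags rather than ordinary links.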
Honeypot Controller:
The honeypot controller is a program that controls both kinds of honeypots, i.e. server honeypots and client honeypots. The basic functions of the honeypot controller are to:
• Send the configuration to the server honeypots.
• Collect the captured PCAP data and binary files from the server honeypots.
• Store the binary files and captured details in the database.
• Send the configuration and URLs to the client honeypots.
• Collect the captured data (binaries) and details regarding the executed URLs.
• Store the captured binaries and details regarding the executed URLs.
The generic steps of the algorithm running on the honeypot controller are as follows:
1. The server retrieves a URL from the database.
   If Current_URL != Previous_URL
   {
     a. Send the URL to an active honeypot and start its virtual machine.
     b. The active honeypot starts the AHP scripts.
     c. The AHP scripts receive the URL and start the analyser module.
     d. The analyser applies the exclusion list of activities that are not monitored.
     e. The analyser starts the monitoring module.
     f. Start the visitor module to visit Current_URL.
     g. The client sends the result back to the honeypot controller.
     h. Clean the machine.
   }
   Else
   {
     Retrieve the next URL from the database.
     Current_URL = Next_URL
   }
2. Repeat step 1.
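The controller loop above can be expressed as a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the hypothetical `visit` callback stands in for steps a to g (starting the VM, running the AHP scripts, applying the exclusion list, monitoring, browsing and reporting back), and the hypothetical `clean` callback stands in for step h.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class VisitResult:
    url: str
    malicious: bool
    dropped_files: List[str] = field(default_factory=list)


def run_controller(urls: Iterable[str],
                   visit: Callable[[str], VisitResult],
                   clean: Callable[[], None]) -> List[VisitResult]:
    """Visit each URL that differs from the previous one, cleaning after each visit."""
    results: List[VisitResult] = []
    previous: Optional[str] = None
    for current in urls:            # step 1: next URL from the database
        if current == previous:     # same as the previous URL: fetch the next one
            continue
        try:
            results.append(visit(current))  # steps a-g (hypothetical hook)
        finally:
            clean()                 # step h: always revert the VM, even on failure
        previous = current
    return results
```

Putting `clean` in a `finally` block is a design choice of this sketch: a crashed browser or a hung visit should still leave the honeypot reverted to its clean image before the next URL is processed.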
Central Database:
We have developed a central database schema for storing the extracted URLs and the execution results of all URLs, such as network dumps and the malware collected after execution. All collected malware and logs are stored on our central database server, including every executable that is potentially malware. Malware binaries are kept unique by their MD5 value and stored as binary fields in the central database. The database is implemented in MySQL.
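MD5-based deduplication can be sketched in Python. The sketch uses the standard `sqlite3` module as a self-contained stand-in for MySQL (with MySQL the equivalent statement would be `INSERT IGNORE`); the table name, columns and helper functions are hypothetical and not the framework's actual schema.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS malware (
    md5        TEXT PRIMARY KEY,  -- one row per unique MD5
    content    BLOB NOT NULL,     -- the binary itself
    first_seen TEXT NOT NULL,     -- UTC timestamp of first capture
    honeypot   TEXT NOT NULL      -- sensor that first captured it
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_sample(conn: sqlite3.Connection, data: bytes, honeypot: str) -> bool:
    """Insert a binary keyed by its MD5; return False if it was already stored."""
    md5 = hashlib.md5(data).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO malware (md5, content, first_seen, honeypot) "
        "VALUES (?, ?, ?, ?)",
        (md5, data, datetime.now(timezone.utc).isoformat(), honeypot),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the primary key already existed
```

Making the MD5 the primary key lets the database itself reject duplicates, which is what allows many raw captures to collapse into a much smaller set of unique samples.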
Analysis Server:
The analysis server runs our code for further analysis of the malware binaries. On this server, we execute the malware samples and monitor their behavior during execution. Malware analysis can be performed in two ways: static analysis and dynamic analysis. Here we perform dynamic, execution-based malware analysis. For signature-based analysis, we apply a deep packet inspection (DPI) algorithm to the captured PCAP dump files. We also perform dynamic analysis of the collected unknown malware. We have developed a Java-based payload parser application that automatically extracts the signatures corresponding to bot and botnet malware.
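Signature matching over a PCAP dump can be illustrated with a minimal Python sketch (the framework's own parser is written in Java). It assumes the classic libpcap file format rather than pcapng, and it simplifies DPI to a byte search over each captured frame without reassembling TCP streams. The IRC-style signatures are hypothetical examples, not the signatures extracted by the authors' parser.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical example signatures for IRC-based bot command-and-control traffic.
SIGNATURES: Dict[str, bytes] = {
    "irc_join": b"JOIN #",
    "irc_privmsg": b"PRIVMSG #",
}

# Classic libpcap magic numbers (microsecond and nanosecond variants)
_LITTLE = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
_BIG = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def read_pcap(data: bytes) -> Iterator[bytes]:
    """Yield the captured bytes of each packet in a classic .pcap file."""
    magic = data[:4]
    if magic in _LITTLE:
        endian = "<"
    elif magic in _BIG:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # per-record header: ts_sec, ts_usec/nsec, incl_len, orig_len
        _sec, _frac, incl_len, _orig = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def match_signatures(data: bytes,
                     signatures: Dict[str, bytes] = SIGNATURES) -> List[Tuple[int, str]]:
    """Return (packet index, signature name) for every packet containing a signature."""
    hits: List[Tuple[int, str]] = []
    for index, packet in enumerate(read_pcap(data)):
        for name, pattern in signatures.items():
            if pattern in packet:
                hits.append((index, name))
    return hits
```

A production DPI engine would also decode the link, IP and TCP layers and reassemble streams so that a command split across two segments is still matched.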
First, all the collected malware samples are scanned with popular antivirus products for labeling and classification. For the unknown class of malware, as mentioned above, we perform dynamic analysis by monitoring the execution traces.
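The triage between antivirus labeling and dynamic analysis can be sketched in Python. The sketch assumes per-engine scan results are available as a mapping from engine name to label (None when the engine does not detect the sample); choosing the most common label is a simple majority vote assumed here for illustration, not a rule stated by the framework.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def triage(av_results: Dict[str, Optional[str]]) -> Tuple[str, Optional[str]]:
    """Route a sample based on per-engine AV labels (None = not detected).

    Samples that no engine detects belong to the unknown class and are sent
    to dynamic analysis; otherwise the most common label is used.
    """
    labels = [label for label in av_results.values() if label]
    if not labels:
        return ("dynamic_analysis", None)
    label, _count = Counter(labels).most_common(1)[0]
    return ("classified", label)
```

For example, a sample with no detections at all returns `("dynamic_analysis", None)` and would be queued for execution on the analysis server.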
Management Server:
Lastly, the management server runs the graphical user interface (GUI) that we developed. Collected malware samples, PCAP data and other logs are displayed directly on the GUI.
The complete flow of the integrated framework is depicted in Fig. 3. For the client honeypots, the data source can be the web or other sources.
Fig 3. Flow Diagram of Integrated Framework (data source, crawler, static analyser, suspicious URLs, high and low interaction client honeypots, malicious page repository, server honeypots 1 to n, honeypot controller, DB, analysis server with execution monitoring, malware binaries, presentation GUI)
To collect web page URLs, we use a crawler and store the URLs in a database. Only the likely suspicious URLs are submitted to the client honeypots, where they are browsed. For this purpose, we have developed our own machine-learning-based static analyzer, which classifies URLs as likely malicious or benign. The potentially suspicious URLs are then submitted to the virtual honeypots.
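The paper does not specify the analyzer's features or model, so the following Python sketch is only an assumption of what a lexical URL classifier might look like: hypothetical features suggested by the malicious URLs in Table 2 (IP-address hosts, executable downloads), fed to a logistic scoring function whose weights would come from training. No trained weights are given here.

```python
import ipaddress
import math
from typing import Dict
from urllib.parse import urlparse

SUSPICIOUS_EXTENSIONS = (".exe", ".scr", ".pif", ".bat")  # hypothetical list


def url_features(url: str) -> Dict[str, float]:
    """Extract simple lexical features from a URL without fetching it."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1.0
    except ValueError:
        host_is_ip = 0.0
    path = parsed.path.lower()
    return {
        "length": float(len(url)),
        "host_is_ip": host_is_ip,
        "dots_in_host": float(host.count(".")),
        "slashes_in_path": float(path.count("/")),
        "executable_path": 1.0 if path.endswith(SUSPICIOUS_EXTENSIONS) else 0.0,
        "has_query": 1.0 if parsed.query else 0.0,
    }


def malicious_probability(features: Dict[str, float],
                          weights: Dict[str, float], bias: float) -> float:
    """Logistic model: sigmoid(bias + sum of weight * feature)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

URLs whose probability exceeds a chosen threshold would be forwarded to the honeyclients; because the features are purely lexical, this filtering costs far less than a full browser visit.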
We have also developed a malicious page repository that stores malicious pages and, if they were downloaded onto the system, the corresponding binaries. Furthermore, URLs declared benign by the high interaction honeyclient are submitted to a low interaction honeyclient to detect shellcode present in those pages.
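A very simple form of static shellcode detection on page content can be sketched in Python with regular expressions. This is an assumed heuristic for illustration only, not the detection method of the low interaction honeyclient used here: it flags long runs of `%uXXXX` escapes (commonly passed to JavaScript `unescape()` to build payloads) and NOP sleds written as `\x90` or `%u9090`. The thresholds are arbitrary.

```python
import re
from typing import List

# 8 or more consecutive %uXXXX escapes (e.g. a payload built via unescape())
UNICODE_ESCAPE_RUN = re.compile(r"(?:%u[0-9a-fA-F]{4}){8,}")
# Long runs of the x86 NOP byte 0x90, written as \x90 or %u9090
NOP_SLED = re.compile(r"(?:\\x90){16,}|(?:%u9090){8,}", re.IGNORECASE)


def shellcode_indicators(page: str) -> List[str]:
    """Return the names of the heuristics that fire on the page source."""
    hits: List[str] = []
    if NOP_SLED.search(page):
        hits.append("nop_sled")
    if UNICODE_ESCAPE_RUN.search(page):
        hits.append("unicode_escaped_payload")
    return hits
```

Pattern matching of this kind is easily evaded by obfuscation, which is why real low interaction honeyclients typically combine it with JavaScript emulation.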
EXPERIMENTAL RESULTS
Table 1 presents results on the detection of malicious URLs by the honeyclient. Column 1 gives the number of URLs executed, column 2 the source of the URLs, and column 3 the applications used to browse them; columns 4, 5 and 6 give the numbers of detected malicious URLs, benign URLs and other URLs, which are neither malicious nor benign. The other URLs either produced errors during execution or did not display a page.
URLs | Source of URLs | Application Used | Detected Malicious URLs | Benign URLs | Other
-----|----------------|------------------|-------------------------|-------------|------
1474 | User-Agency | IE 6.0, Adobe Reader 8 | 98 | 1021 | 355
14 | Botnet URLs [11] | IE 6.0, Adobe Reader 8 | 1 | 9 | 4
391 | SpyEye Blacklist Domain | IE 6.0, Adobe Reader 8 | - | 71 | 320
551 | Zeus Blacklist Domain | IE 6.0, Adobe Reader 8 | - | 107 | 444
1327 | Internet | IE 6.0, Adobe Reader 8 | 152 | 190 | 985
737 | User-Agency | IE | 12 | 578 | 147
955 | Internet | IE | 152 | 190 | 613
445 | User-Agency | IE | 11 | 332 | 102
3592 | Malware.com.br | IE | 27 | 1158 | 2407
Table 1: URL execution results
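The figures in Table 1 can be checked and summarized with a short Python sketch. The rows are transcribed from the table; reading "-" in the malicious column as 0 is an assumption of this sketch. Each row's malicious, benign and other counts should add up to the number of URLs executed.

```python
from typing import List, Tuple

# (URLs executed, malicious, benign, other), transcribed from Table 1;
# "-" in the malicious column is read as 0 (an assumption).
Row = Tuple[int, int, int, int]
TABLE_1: List[Row] = [
    (1474, 98, 1021, 355),
    (14, 1, 9, 4),
    (391, 0, 71, 320),
    (551, 0, 107, 444),
    (1327, 152, 190, 985),
    (737, 12, 578, 147),
    (955, 152, 190, 613),
    (445, 11, 332, 102),
    (3592, 27, 1158, 2407),
]


def inconsistent_rows(rows: List[Row]) -> List[Row]:
    """Rows whose malicious + benign + other differs from the URL count."""
    return [r for r in rows if r[1] + r[2] + r[3] != r[0]]


def malicious_rate(row: Row) -> float:
    """Percentage of executed URLs detected as malicious."""
    executed, malicious, _benign, _other = row
    return 100.0 * malicious / executed


total_urls = sum(r[0] for r in TABLE_1)
total_malicious = sum(r[1] for r in TABLE_1)
```

Every row is internally consistent, and the per-row rates differ widely by source: for instance, the first User-Agency row has 98 of 1474 URLs malicious (about 6.6%), while the first Internet row has 152 of 1327 (about 11.5%).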
We have further executed millions of URLs on our framework to detect malicious websites. Table 2 lists a few of the malicious URLs identified among the submitted URLs.
URL
http://x.x.x.x///im/tt/1.exe
http://204.3.x.x///image/view/videos_plug-visualizar=0000Jan2012_.exe
http://203.95.x.x///im/tt/cool.exe
http://xyz///1.exe
http://webdownghost.narod.ru///del031.exe
http://webdownghost.narod.ru///del100.exe
http://webdownghost.narod.ru///hld.exe
http://webdownghost.narod.ru///sfk.exe
http://webdownghost.narod.ru///up151.exe
http://gehgoaz.eu///w.php?f=182b5&e=2
http://francoabdo.sitesled.com///down_server.exe
http://francoabdo.sitesled.com///server.exe
Table 2. A Few Malicious URLs
We also submit the MD5 hashes of the collected malware samples to popular antivirus products for labeling and scanning. Table 3 shows malware samples that were not detected by popular antivirus products. The last column gives the number of samples not detected by the antivirus products; these are good examples of the unknown class of malware.
URLs | Source | Application Used | Malicious URLs | Malware Dropped | Not Detected by AV
-----|--------|------------------|----------------|-----------------|-------------------
3592 | Internet | IE 6.0 | 121 | 15 | 17
2656 | User-Agency | IE 6.0 | 33 | 41 | 12
807 | Internet | IE 6.0 | 4 | 4 |
Table 3. A few malware samples unclassified by popular antivirus products.
We have also deployed traditional honeypot sensors, both high interaction honeypots and nepenthes [12], [13] sensors, using virtualization technology. As high interaction honeypots, we used unpatched Windows 2000 and Windows XP systems with default configurations. During operation, we captured more than 16,000 samples (about 1,400 unique samples).
Fig. 4 shows a screenshot of our GUI, a portal for viewing the results of the collected malware. For each collected executable binary, the GUI displays a unique ID, its MD5 value, the collection time, the honeypot on which it was collected and its antivirus label [10]. The fields displayed on the GUI are:
<Binary-ID, MD5, Collection_time, Node, Honeypot, Antivirus-label>
Fig 4. Graphical User Interface Snapshot
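The GUI record can be modeled as a small Python data structure. This is a hypothetical sketch mirroring the field list above; the class name, field types, tab-separated rendering and the "unknown" label convention are assumptions, not details of the actual GUI.

```python
from dataclasses import astuple, dataclass
from datetime import datetime

HEADER = ("Binary-ID", "MD5", "Collection_time", "Node", "Honeypot", "Antivirus-label")


@dataclass(frozen=True)
class SampleRecord:
    """One GUI row: <Binary-ID, MD5, Collection_time, Node, Honeypot, Antivirus-label>."""
    binary_id: int
    md5: str
    collection_time: datetime
    node: str
    honeypot: str
    antivirus_label: str  # e.g. "unknown" when no engine detects the sample (assumption)


def render_row(rec: SampleRecord) -> str:
    """Format a record as one tab-separated row in HEADER order."""
    return "\t".join(
        v.isoformat(sep=" ") if isinstance(v, datetime) else str(v)
        for v in astuple(rec)
    )
```

Keeping the field order identical to the displayed column order means a row can be rendered directly under `HEADER` without any mapping step.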
CONCLUSION AND FUTURE WORK
In this paper we have presented a hybrid framework for malware collection and analysis using both server and client honeypot technologies. Collecting extensive data on attack vectors has always been the primary objective of honeypots. Conventional server honeypots suffer from limited exposure and risk of detection, and they demand high resources and administrative supervision. Client honeypots, on the other hand, risk generating false positive and false negative alerts and are slow. As we were already working on traditional honeypots, our main goal was to collect a large set of attack vectors. Our solution is completely automated, but the lack of automated correlation between attacker source IP addresses and Sebek keystroke logs remains a major problem. Our database schema currently supports only centralized botnets; there is no support for P2P botnets or encrypted botnets. We plan to add basic support for these kinds of botnets as well.
Acknowledgment
We would like to thank all our colleagues in the Cyber Security Technology team at CDAC, Mohali, for their help in collecting malware samples and making them available for further analysis. We are also very thankful to the Executive Director of CDAC, Mohali, for providing full support.
References
[1] Ren Liu. China Virus Status & Internet Security Report in 2006. 2007-02-01. http://www.donews.com/Content/200702/eda7daf7970448608b2881d97c9a1868.shtm
[2] Description of the Blaster worm. www.symantec.com/security_response/writeup.jsp?docid=2003-081113-229-99
[3] Thorsten Holz and Markus Koetter. The German Honeyclient Project. 2006. http://www.chicagohoneynet.org/germanhoneypot-holz.pdf
[4] Yaser Alosefer and Omer Rana, "Honeyware: a web-based low interaction client honeypot", Third IEEE International Conference on Software Testing, Verification, and Validation Workshops (ICSTW), pp. 410-417, 2010.
[5] Xiaoyan Sun, Yang Wang, Jie Ren, Yuefei Zhu and Shengli Liu, "Collecting Internet Malware Based on Client-side Honeypot", 9th IEEE International Conference for Young Computer Scientists (ICYCS 2008), pp. 1493-1498, 2008.
[6] L. Spitzner, Honeypots: Tracking Hackers. Addison Wesley, 2002.
[7] C. Seifert, R. Steenson, T. Holz, Y. Bing and M. A. Davis, "Know your enemy: Malicious web servers", The Honeynet Project, 2007. http://www.honeynet.org/papers/mws/
[8] L. Spitzner, Honeypots: Tracking Hackers. US: Addison Wesley, 2002, pp. 1-430.
[9] VirtualBox, 2004. Sun VirtualBox® User Manual. http://www.virtualbox.org/manual/UserManual.html, last accessed 20 July 2008.
[10] VirusTotal, free service for scanning binaries with multiple antivirus products. www.virustotal.com
[11] The Honeynet Project, "Know Your Enemy: Tracking Botnets", Internet, March 2005.
[12] P. Baecher, M. Koetter, T. Holz, M. Dornseif and F. Freiling, "The Nepenthes Platform: An Efficient Approach to Collect Malware", in D. Zamboni and C. Krügel (eds.), RAID 2006, LNCS vol. 4219, pp. 165-184. Springer, Heidelberg, 2006.
[13] Nepenthes homepage, nepenthes.mwcollect.org