Dearjohn nicholassparks1-140121071122-phpapp01
Published in: Technology
Design and Implementation of Web Service Honeypot

Abdallah Ghourabi, Tarek Abbes, Adel Bouhoula
Department of Computer Science and Networks
Higher School of Communication of Tunis SUP’COM, University of Carthage, Tunisia

Abstract: Web services are increasingly becoming an integral part of next-generation web applications. A Web service is defined as a software system designed to support interoperable machine-to-machine interaction over a network based on a set of XML standards. This new architecture and set of protocols brings new vulnerabilities that can be exploited by attackers. To prevent and detect such attacks, several security techniques are available, such as authentication and encryption mechanisms, firewalls and intrusion detection systems (IDS). Nevertheless, these security methods encounter some problems, especially when dealing with new attacks. Additional security principles are therefore needed to protect Web services well. In this paper, we propose using honeypots to detect and study attacks against Web services. Honeypots are used to learn new techniques, tools and motivations of hackers to better protect the production systems against attacks. Our solution (WS Honeypot) is to deploy a honeypot as a web service application. This honeypot captures all request messages and analyses them by using machine learning techniques in order to detect and study attacks.

1. INTRODUCTION

In the last few years, the field of Web Services has evolved rapidly by providing attractive features (such as ease of use, platform independence and interoperability) which can be used by business and IT organizations. According to the World Wide Web Consortium (W3C) [13], a Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

However, along with these enhanced information exchange capabilities come significant security considerations and challenges. The diversity of standards and protocols included with Web services brings several threats and vulnerabilities which can be exploited to attack the system. Attacks on web services are numerous and include DoS attacks, SQL and XML injection, parameter tampering and others [4]. To prevent and detect such attacks, several security techniques are available, such as authentication and encryption mechanisms, firewalls and intrusion detection systems (IDS). Nevertheless, these security methods encounter some problems, especially when dealing with new attacks.

To resolve this problem, a complementary approach is needed. The idea is to use honeypots. A honeypot [8] is a computer system deliberately vulnerable to one or more known threats, deployed on a network for the purpose of logging and studying attacks on the honeypot. These systems may be made purposely insecure in order to lure attackers and study their techniques, tools, and motivations.

We propose in this paper a solution which takes advantage of honeypots in order to detect and study attacks against Web services. Our solution (WS Honeypot) is to deploy a honeypot playing the role of a Web service application. The honeypot supervises received SOAP messages and logs all activities in order to analyze them by using machine learning techniques (Support Vector Machine “SVM”, regression analysis, association rules). The purpose of this automated analysis is to facilitate the analysis task and to extract the abnormal activities that will be studied by the human expert.

The remaining parts of the paper are organized as follows: Section 2 reviews related works.
Section 3 presents our solution, the WS Honeypot, and its architecture. Section 4 describes the data analysis in the WS Honeypot. Section 5 reports the results of our experiments. Finally, we conclude the paper in Section 6.

2. RELATED WORKS

Beyond traditional honeypots, which are designed for detecting network attacks, recent works have proposed the deployment of honeypots for web-based attacks. In the context of Web service honeypots, the proposed solutions are very limited.

Hugo González [5] presented “WSpot”, an approach for the design of a Web Service Honeypot. In this approach, the author proposes software that emulates a SOAP-based Web service. The objective of this software is to log and register all the activities on it. The architecture of this honeypot is very simple and does not allow collecting interesting information about the attacks. For example, when the attacker sends a request to the Web service honeypot, the response will be an error message like “You are not logged or not permitted to use this service”. This limits attackers' interest in the Web service and reduces the value of the collected information about attacks.

In [9], Thakar et al. proposed a semi-automated approach to analyze the attacks and generate signatures for web services. To perform data collection, the authors deployed some honeypots and traffic monitoring tools to log suspect activities. The chosen honeypot was Honeyd, low-interaction open source software that creates security logs to report all attempted and completed connections for several network protocols. But the problem here is that Honeyd is not designed to simulate a Web service. Furthermore, it is a low-interaction honeypot. Therefore, it cannot attract a large number of attackers, and consequently, it cannot gather interesting information about attacks.

Honeypots represent a good complement for Intrusion Detection Systems (IDS). They allow collecting valuable information about new attacks and their techniques. In this paper, we propose a Web Service Honeypot that provides real services to allow a high interaction with attackers. We integrate in the honeypot intelligent techniques based on machine learning to extract attacks and analyze collected data.

3. THE WEB SERVICE HONEYPOT

Our solution for a Web Service Honeypot is to deploy an extra Web Service application as a honeypot. This honeypot captures all incoming and outgoing messages and analyses them by using machine learning techniques in order to detect and facilitate the study of attacks.

3.1. Description

The WS Honeypot is a high-interaction honeypot. It provides real web services to ensure a real interaction with attackers. The services offered by the honeypot can be deployed by using two technologies, Axis or .NET. The administrator of the honeypot can customize his own web service, or he can simply use an automated tool integrated into the honeypot that creates, from a WSDL (Web Services Description Language) file, a real service that can be deployed in the honeypot. This tool offers flexibility in the behavior of the honeypot and in the choice of its features.

Honeypots store collected information in log files. This information is used to learn new techniques, tools and motivations of hackers to better protect the production systems against attacks. The main problem related to logging is the great amount of data that has to be analyzed by a human expert. Much of the collected data represents the normal behavior of the system and has no relation to attacks. So the human expert will be overwhelmed with a large amount of data in audit trails and may fail to notice severe threats. For this reason, we choose to use, in the WS Honeypot, machine learning techniques to analyze data and detect attacks in a semi-automatic way. These techniques help us to learn the normal behavior of activities in the honeypot and, consequently, to detect any significant deviation from this normal behavior and provide it to the expert, who decides whether it constitutes a true positive.

Before the WS Honeypot is deployed, it must first undergo a training phase during which it operates in a safe environment to learn the normal behavior of the Web service simulated in the honeypot. Thus, a dataset will be created containing classified information about the various legitimate activities and requests that can be interpreted by the honeypot. After this phase, the WS Honeypot can be deployed anywhere to begin attracting attackers.

Thenceforth, any activity performed by a user on our honeypot will be classified with the initial dataset. If misclassification occurs, e.g. due to abnormal activity, this activity will be transmitted to a human expert to check if it is an attack. After verification, if it appears to be a normal activity which was not previously seen, a new entry about this activity will be added to the initial dataset to avoid a similar error. Moreover, confirmed attacks detected by the honeypot will be classified in other datasets in order to avoid expert intervention in similar cases.

3.2. Architecture

The role of the WS Honeypot is to simulate the behavior of a Web service. It incorporates automated tools to capture and analyze client activities, especially those issued by attackers. The system architecture is shown in Figure 1. It consists of the following components:

Figure 1 - The WS Honeypot architecture

3.2.1. Web Service Simulation

The purpose of this module is to convince attackers that they are interacting with a real Web service. For this reason, we choose to integrate in the honeypot two Web service servers: Axis and .NET. The administrator can choose which implementation to use. To avoid giving a static appearance to the operations offered by the simulated service, we added an automatic tool that creates, from a WSDL (Web Services Description Language) file, a real service that can be deployed in the honeypot. The administrator then only needs to customize the responses returned by the service operations.

3.2.2. Traffic Capture

Traffic logging is an important task to collect and classify client activities. This component includes traffic capturing mechanisms and monitoring tools to intercept and parse requests and responses for Web services simulated on the WS Honeypot. The Web service requests are formatted in XML and encapsulated in SOAP messages which use HTTP as the transport protocol. The content inspection of these messages is necessary to detect attacks.

3.2.3. Feature Extraction

During this step, we extract features from each SOAP message captured by the WS Honeypot. These parameters will then be useful to classify activities related to these messages. For this purpose, we created three categories of parameters:
• SOAP message content: source IP, message length, request preamble, response preamble, invoked operations in each request, input parameters of every operation.
• Resource consumption: response time, CPU usage, memory usage.
• Operations list: the Web service offers several operations that can be called differently depending on the user. For each user, we extract a list of all invoked operations to build a profile and analyze it afterward.

3.2.4. Data analysis

In honeypots, the analysis is a very difficult and tedious task that requires much effort due to the great amount of data which has to be examined by the human expert. To facilitate this task, we chose to develop an analysis tool based on machine learning techniques to detect any abnormal activities in the honeypot. This tool is very useful for detecting new attacks and speeding up the analysis of normal activities or attacks already seen. Due to the importance of this module, we detail its mechanisms in the next section.

4. ARCHITECTURE OF DATA ANALYSIS MODULE

The parameters extracted as described in Section 3 are stored in datasets to be analyzed later. The analysis consists in classifying any received request according to its content. The classification relies on a learning dataset built progressively during the deployment of the honeypot. If misclassification occurs, e.g. due to abnormal activity, this activity will be transmitted to the human expert to check if it is an attack.

The classification process is divided into three parts. For each group of parameters extracted from the requests, we have associated a data mining classifier:

• For the category “SOAP message content”, we used an SVM (support vector machine) classifier. This solution is useful for detecting attacks like SQL injection, XPath injection and parameter tampering.
• For “Resource consumption”, we opt for an SVM implementation for regression. This classifier is useful for detecting several types of DoS attacks.
• For “Operations list”, we employ the association rule algorithm “Apriori”. This solution is useful to detect unauthorized accesses to operations.

The task of data analysis in a honeypot is very difficult; it requires much effort by the analyst due to the large amount of data collected. To facilitate this task, we use machine learning techniques. The basic idea of our solution is to capture SOAP messages received by the WS Honeypot. From each message, some parameters are extracted as described for the Feature Extraction module. Then the data is classified using three algorithms: SVM, SVM Regression and Apriori. At the end of the classification, the abnormal activities are extracted to be analyzed by a human expert. The architecture of this module is shown in Figure 2.

Figure 2. Architecture of Data Analysis module

4.1. Datasets in the WS Honeypot

To obtain good results with this honeypot, it must first undergo a training phase during which it is deployed in a safe environment to learn the normal activities of the system. Features extracted from these normal activities, captured in the honeypot, are stored in a dataset labeled “Learning Dataset”. These activities are classified by their type using the three classification algorithms.

After the training phase, the WS Honeypot can begin its work. For each SOAP message captured in the honeypot, the extracted features are stored in a dataset labeled “Test Dataset”. Then, these features are classified periodically based on the Learning Dataset. If a classification error is generated due to abnormal activity, this activity will be transmitted to the human expert to check if it is an attack. After verification, the expert adds this activity to the “Learning Dataset”, either with the normal activities data or with the attacks data.

4.2. SVM Classifier

The role of this classifier is to classify the received SOAP messages based on the first category of extracted features, “SOAP message content”. The analysis of the SOAP message content allows the detection of malicious code and tampered parameters. The classifier tries to assign each message to a class from the “Learning Dataset”. If the parameters in the message deviate from known parameters that are well classified, a classification error occurs and, consequently, the activity associated with this message is transmitted to the human expert to verify if it is an attack. This technique is very useful to detect and study several types of web service attacks like SQL injection, XPATH and XQUERY injection, parameter tampering and coercive parsing.

We chose SVM due to its good performance compared to other classification algorithms and its popularity in anomaly-based intrusion detection. This popularity is due to the good generalization nature of SVM and its ability to overcome the curse of dimensionality [3, 12]. Furthermore, SVM implements the structural risk minimization principle, which minimizes an upper bound for the generalization error rather than minimizing the training error [10].

To further justify the choice, we performed a test to compare this algorithm with three other classification algorithms (multilayer perceptron, naive Bayes and K-nearest neighbours). The test consists in classifying 365 SOAP messages captured by the honeypot, containing 52 different attacks. To carry out the test, we use the algorithms implemented in the Weka software library (open source software that offers a collection of machine learning algorithms for data mining tasks) [14]. The test results are presented in Table 1. For every tested algorithm, we extract the attack detection rate, the false alarm rate and the time consumed by the classifier. The SVM algorithm gives the best detection rate (95%), although with a higher false alarm rate than two of the other algorithms (18%) and the longest processing time (35 s). Since the purpose of a honeypot is to analyze the maximum number of attacks, we prioritize the detection rate when choosing our classification algorithm.

Table 1 - Comparison of SVM with other algorithms

Algorithm               Detection rate (%)   False alarm rate (%)   Time (seconds)
SVM                     95                   18                     35
Multilayer Perceptron   82                   20                     24
Naïve Bayes             88                   17                     1
K-nearest neighbours    73                   14                     1

4.3. SVM Regression Classifier

This classifier handles the second category of extracted features, “Resource consumption”, according to the same principle as the previous classification. For each captured SOAP message, a classification is performed to verify if there is an abnormal activity. The extracted features are related to the resource consumption during the processing of SOAP requests, such as response time, CPU usage, and memory usage. The analysis of these variables is very useful to detect several types of DoS (denial of service) attacks (XDoS, XML Bomb, and malicious XML eXternal Entity).

To analyze these features, we choose regression analysis, which gave good results with intrusion detection systems in previous works [11]. Regression analysis includes any technique for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.
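
As a rough illustration of how a regression model can flag abnormal resource consumption, the sketch below fits a line to response times observed during a training phase and reports any request whose response time deviates strongly from the prediction. It is a minimal sketch under stated assumptions, not the WS Honeypot implementation: it uses ordinary least squares in place of an SVM regression, takes message length as the only predictor, and the sample values, the threshold factor k and all function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def make_detector(xs, ys, k=3.0):
    """Build a check that flags points far from the fitted line.

    A point is abnormal when its residual exceeds k times the standard
    deviation of the training residuals (k is an assumed tuning knob).
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    limit = k * pstdev(residuals)

    def is_abnormal(x, y):
        return abs(y - (slope * x + intercept)) > limit

    return is_abnormal


# Hypothetical training data from normal traffic:
# message length in bytes and response time in milliseconds.
lengths = [200, 400, 600, 800, 1000]
times_ms = [12.0, 21.0, 33.0, 41.0, 52.0]

is_abnormal = make_detector(lengths, times_ms)

normal_request = is_abnormal(500, 27.0)    # close to the fitted line
bomb_request = is_abnormal(500, 900.0)     # short message, very slow response
```

With these sample values, the short request that takes 900 ms is flagged while the 27 ms request is not; in a real deployment CPU and memory usage would be added as further variables.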

Regression analysis helps us to understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Regression analysis is applied when we have more than one predictor variable to be analyzed with the response variable at the same time [11]. For example, in our case, we want to determine that certain characteristics of SOAP message processing, such as response time, CPU usage and memory usage, are associated with normal SOAP messages.

To do this analysis, we used an SVM implementation for regression named SMOreg, proposed by Shevade et al. [7]. We made this choice after comparing the SMOreg algorithm with a linear regression algorithm (both algorithms are implemented in the Weka software library). The purpose of the test is to analyze 365 SOAP messages containing 22 different DoS attacks. Results are reported in Table 2. All attacks are detected by both classifiers. Nevertheless, SMOreg gives a lower false positive rate (2.5% against 8.7% with the linear regression algorithm).

Table 2 - Comparison between SMOreg and linear regression

Algorithm           Detection rate (%)   False alarm rate (%)   Time (seconds)
SMOreg              100                  2.5                    0.21
Linear regression   100                  8.7                    0.16

4.4. Apriori Classifier

The Apriori classifier examines the third category of extracted features, “Operations list”. This classifier is based on association rules to detect if the same user has conducted an abnormal sequence of activities. For example, if a large number of identical repeated messages are transmitted from a specific source address over a very short time period, it is most likely a SOAP flooding attack. Furthermore, a bad scenario is detected if a user accesses some operations offered by the service without being successfully authenticated during the login step, although the latter is necessary. Another suspect situation is discerned when a client fails to log in several times using the same identifier.
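
The scenario of a user reaching operations without authenticating can be illustrated with association rules over per-session operation lists. The sketch below is a simplification under stated assumptions: it scores a few hand-picked candidate rules by support (the fraction of training sessions containing all operations of the rule) and confidence (the fraction of sessions containing the antecedent that also contain the consequent), whereas Apriori would enumerate the frequent itemsets itself. The session data, the thresholds and all function names are hypothetical.

```python
def support(sessions, items):
    """Fraction of sessions that contain every operation in `items`."""
    wanted = set(items)
    return sum(1 for s in sessions if wanted <= set(s)) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Share of sessions with the antecedent that also hold the consequent."""
    base = support(sessions, antecedent)
    if base == 0:
        return 0.0
    return support(sessions, set(antecedent) | set(consequent)) / base


def violated_rules(session, rules):
    """Rules whose antecedent occurs in the session but whose consequent does not."""
    ops = set(session)
    return [(a, c) for a, c in rules if set(a) <= ops and not set(c) <= ops]


# Hypothetical operation lists recorded during the training phase.
training = [
    ["Authenticate", "GetProducts", "AddItem", "CheckOut"],
    ["Authenticate", "GetProductByName", "AddItem", "DeleteItem"],
    ["Authenticate", "GetProducts"],
    ["GetProducts"],
]

# Hand-picked candidate rules (antecedent, consequent); assumed for illustration.
candidates = [
    (("AddItem",), ("Authenticate",)),
    (("CheckOut",), ("Authenticate",)),
    (("GetProducts",), ("Authenticate",)),
]

MIN_SUPPORT = 0.25     # assumed threshold
MIN_CONFIDENCE = 0.9   # assumed threshold

# Keep only rules that are both frequent and reliable in the training data.
profile = [
    (a, c)
    for a, c in candidates
    if support(training, a + c) >= MIN_SUPPORT
    and confidence(training, a, c) >= MIN_CONFIDENCE
]

hijacked = ["GetProducts", "AddItem", "CheckOut"]       # no Authenticate call
legitimate = ["Authenticate", "GetProducts", "AddItem"]

alerts = violated_rules(hijacked, profile)
no_alerts = violated_rules(legitimate, profile)
```

Here the rule linking “GetProducts” to “Authenticate” is discarded because browsing without logging in also occurs in the training sessions, while the session that calls “AddItem” and “CheckOut” without “Authenticate” violates both retained rules and would be passed to the expert.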

The basic idea of this classifier is to build, using the training data, a profile describing the sequence of activities performed by each user. The profile is represented as a set of association rules. Upon the reception of a new sequence of activities, the algorithm compares it with the association rules created in the profile and calculates two parameters, confidence and support. Based on these two parameters, the classifier can determine if this sequence of activities is a normal behavior of the user or if it represents a probable attack. Association rules are very useful to detect such attacks, and they are also widely used in intrusion detection systems [2, 6]. To implement the method, we choose “Apriori” [1], the best-known algorithm to mine association rules.

4.5. Abnormal activities extraction

During the classification process, features extracted from each activity on the WS Honeypot are classified according to their similarities. If a classification error occurs due to an abnormal activity, a detailed description of the activity is reported to the human analyst via a web page. The security expert then analyzes the suspected activity and the related context to verify if there is really an attack.

5. EXPERIMENTAL RESULTS

In order to validate the efficacy of the proposed approach, we need to deploy the WS Honeypot on the Internet for a period of at least three months. Pending these results, we tested the honeypot locally. For this purpose, we prepared a honeypot that simulates a shopping cart Web service. It offers eight Web service operations (APIs), as shown in Figure 3. This type of service attracts many attackers due to the valuable information that it may contain.

We begin our test by sending legitimate SOAP messages to the WS Honeypot for the training phase. Afterward, for the test phase, we send 365 SOAP messages, among which the malicious requests generated 73 alarms. The simulated attacks are varied and use several techniques.
Among them, we mention:
• SQL injection attack against the “Authenticate” function by inserting code like “or 1=1;” (A1)
• Remote command execution attack against “GetProducts” by employing the “xp_cmdshell” procedure to gain access to the host system (A2)
• Coercive parsing attacks against several operations to overload the XML parser, causing a DoS (A3)
• Session hijacking attack by stealing a session ID; this technique allows the attacker to access the operations “AddItem” and “CheckOut” while bypassing “Authenticate” (A4)
• Parameter tampering (A5)
• XPATH injection (A6)

Figure 3 - An example of legitimate transitions of Web service operations in the WS Honeypot (operations shown: GetProductByName, Authenticate, GetProducts, AddedItem, DeleteItem, CheckOut, Shipping, Billing)

These experimental results are very encouraging. Here is a description of what we obtained from the experiment:

• Total number of attacks: 73
• Number of attacks detected by referring to the human analyst: 26
• Number of attacks detected automatically: 45
• Number of false negatives: 2

Table 3. Detailed description of attacks detection

Attack   Number of instances   Human expert intervention   Automatic detection   False negatives   Detection method
A1       14                    5                           9                     0                 SVM
A2       4                     2                           1                     1                 SVM
A3       22                    4                           18                    0                 SVM Regression

A4, A5, A6, Other attacks: 9 11 5 7 3 2 2 7 3 0 1 0 8 3 5 0
Detection method (A4, A5, A6, Other attacks): Apriori, SVM, SVM, SVM/SVM Regression

To describe the experimental results in detail, we present in Table 3, for each type of performed attack (A1, A2, ...), the number of instances associated with this type of attack, the number of attacks detected by the system after the human expert intervention, the number of attacks detected automatically, the number of false negatives and the detection method (SVM, SVM Regression or Apriori).

Although these attacks were not generated by real attackers, the experimental results are very interesting. They encourage us to improve the system and publish it on the Internet to interact with real hackers and study new attacks.

6. CONCLUSION

In this paper, we presented a Web service honeypot deployed to supervise SOAP messages in order to detect and study attacks. To analyze the activities logged by this WS Honeypot, we proposed a data analysis tool based on machine learning techniques. This tool uses classification algorithms in order to classify captured activities and extract those that are suspicious. The usefulness of the approach has been validated experimentally, and the results have been detailed in the paper.

As future work, we envisage reducing false negatives by a better correlation with knowledge from the test environment. We also plan to add other functionalities, like automatic extraction of attack signatures to be used by an intrusion detection system (IDS).

REFERENCES

[1] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile, September 1994.
[2] D. Barbara, J. Couto, S. Jajodia, N. Wu, “ADAM: a testbed for exploring the use of data mining in intrusion detection,” ACM SIGMOD Record: SPECIAL ISSUE: Special section on data mining for intrusion detection and threat analysis 30 (2001) 15–24.
[3] W. H. Chen, S. H. Hsu and H. P. Shen, “Application of SVM and ANN for intrusion detection,” Computers & Operations Research, vol. 32, pp. 2617-2634, 2003.
[4] A. Ghourabi, T. Abbes and A. Bouhoula, “Experimental analysis of attacks against Web services and countermeasures,” In Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services (iiWAS2010), Paris, France, November 2010.
[5] H. González, “First approach to Design a Web Service Honeypot, WSpot,” International Journal of Web Services Practices, Vol. 3, No. 1-2 (2008), pp. 66-70.
[6] W. Lee, S.J. Stolfo, K.W. Mok, “A data mining framework for building intrusion detection models,” In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, 1999, pp. 120–132.
[7] S. K. Shevade, S. S. Keerthi, C. Bhattacharyya and K. R. K. Murthy, “Improvements to SMO Algorithm for SVM Regression,” Technical Report CD-99-16, Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore, Singapore, 1999.
[8] L. Spitzner, “Definitions and value of honeypots,” 2003.
[9] U. Thakar, N. Dagdee and S. Varma, “Pattern Analysis and Signature Extraction for Intrusion Attacks on Web Services,” International Journal of Network Security & Its Applications (IJNSA), Vol. 2, No. 3, July 2010.
[10] V. N. Vapnik, “The Nature of Statistical Learning Theory,” Springer-Verlag New York, Inc., New York, NY, USA, 1995.
[11] Y. Wang, “Statistical Techniques for Network Security: Modern Statistically based Intrusion Detection and Protection,” IGI Global, 2009.
[12] J.T. Yao, S.L. Zhao and L. Fan, “An Enhanced Support Vector Machine Model for Intrusion Detection,” Proceedings of the International Conference on Rough Sets and Knowledge Technology (RSKT), Chongqing, China, July 24-26, 2006, LNAI 4062, pp. 538-543.
[13] Web Services Glossary,
[14] Weka,