Security Event Analysis Through Correlation



“Security Event Analysis Through Correlation”

Anton Chuvakin, Ph.D., GCIA, GCIH
WRITTEN: 2002-2004

Contents

Abstract
Introduction to security data analysis
Types of correlation
  Rule-based correlation
  Statistical correlation
  Challenges with correlation
Maximizing benefits of correlation
Correlation Rule Examples
  Probes followed by an attack
  Login guessing
Conclusion

DISCLAIMER: Security is a rapidly changing field of human endeavor.
Threats we face literally change every day; moreover, many security professionals consider the rate of change to be accelerating. On top of that, to stay in touch with this ever-changing reality, one has to evolve with the space as well. Thus, even though I hope that this document will be useful to my readers, please keep in mind that it was possibly written years ago. Also, keep in mind that some of the URLs might have gone 404; please Google around.

Abstract

This paper covers several of the security event correlation methods utilized by Security Information Management (SIM) solutions for better attack and misuse detection. We describe these correlation methods, show their corresponding advantages and disadvantages, and explain how they work together for maximum security.

Introduction to security data analysis

The security spending survey by “Information Security Magazine” (http://www.infosecuritymag.com/2003/may/coverstory.pdf) and recent research by the analyst firm Forrester indicate that deployment rates of many security technologies will soar in the next three years. According to some estimates, security budgets (and thus technology purchases) will double by 2006. Almost every Internet-connected organization now has a firewall included as part of its network infrastructure; most Windows networks have an anti-virus solution. Intrusion Detection Systems (IDSs) are slowly but surely gaining wider acceptance, and intrusion prevention starts to show more promise, despite the obvious hurdles. New types of application security products, such as web application firewalls, are starting to be deployed by security-conscious organizations. This buying trend is further enhanced by the growing popularity of so-called "appliance" security systems, which are very easy to install and manage. Appliances combine software and hardware in
one package and usually have much lower installation and maintenance costs, thus facilitating their adoption.

All the above devices, whether aimed at prevention or detection of attacks, usually generate huge volumes of audit data. Firewalls, routers, switches and other devices recording network connection information are especially guilty of producing vast oceans of data. There are other problems induced by this log deluge, turning its analysis into a pursuit few dare to undertake. Many diverse data formats and representations, some binary [1], obscure and undocumented, are used for those log files and audit trails. Also, a percentage of events generated by network Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) are false alarms that do not map to real threats, or map to threats that have no chance of causing loss. To further confuse the issue, different devices might report on the same things happening on the network, but in a different way, with no apparent way of establishing their relationship.

For example, a UNIX log file might contain an FTP connection message. The same event will also be recorded by the firewall as 'connection allowed to TCP port 21'. A network IDS might also generate an alert, warning that FTP with no password has occurred. All three messages refer to the same event, and a human analyst will recognize them as such. However, programming a system to do that is much more challenging, especially for a broad spectrum of messages. Thus, there is a definite need for a consistent analysis framework to identify various network threats, prioritize them and learn their impact on the target organization. This needs to be done as fast as possible (preferably in real time) for attack identification, and also over the long term for threat trending and risk analysis.

To understand the meaning of the piling logs, the data in them may be categorized in several ways.
It should be noted that before the data can be intelligently categorized, it should be normalized to a common schema. The normalization process involves extracting the parts of the log records serving the common purpose and assigning them to specific fields in the common schema. For example, both firewall and network IDS log records will usually contain the source and destination IP addresses. If you see both firewall and IDS logs referring to the same source and destination at about the same time, they are likely to be related. Log categorization helps to make the similarity between different log records stand out. For example, the log data generated across many security devices, hosts and applications might be related to:

•Device performance data
•Network traffic
•Known attacks
•Known network/system problems
•Anomalous/suspicious network/host activity
•Access control decisions
•Software failures
•Hardware errors
•System changes
•Evidence of malicious agents
•Site-specific AUP [2] violations

[1] Binary = here, not containing human-readable text, but binary data
[2] AUP = Acceptable Use Policy
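The normalization step described above can be sketched in a few lines of code. This is a minimal illustration, not any product's implementation: the log formats, field names and schema below are invented for the example, and a real SIM would support hundreds of formats.

```python
import re

# Hypothetical normalizers: map device-specific log lines onto one
# common schema (event_type, src_ip, dst_ip, dst_port). The log line
# formats and field names are illustrative assumptions.

def normalize_firewall(line):
    # e.g. "ACCEPT TCP 10.0.0.5:3311 -> 192.168.1.9:21"
    m = re.match(r"(\w+) TCP (\S+):\d+ -> (\S+):(\d+)", line)
    if not m:
        return None
    action, src, dst, port = m.groups()
    return {"event_type": "connection_" + action.lower(),
            "src_ip": src, "dst_ip": dst, "dst_port": int(port)}

def normalize_ids(line):
    # e.g. "ALERT ftp-no-password src=10.0.0.5 dst=192.168.1.9 dport=21"
    m = re.match(r"ALERT (\S+) src=(\S+) dst=(\S+) dport=(\d+)", line)
    if not m:
        return None
    sig, src, dst, port = m.groups()
    return {"event_type": "ids_" + sig, "src_ip": src,
            "dst_ip": dst, "dst_port": int(port)}

fw = normalize_firewall("ACCEPT TCP 10.0.0.5:3311 -> 192.168.1.9:21")
ids = normalize_ids("ALERT ftp-no-password src=10.0.0.5 dst=192.168.1.9 dport=21")

# Once normalized, relating the two records is a simple field comparison:
related = fw["src_ip"] == ids["src_ip"] and fw["dst_ip"] == ids["dst_ip"]
```

Once every device's output lands in the same schema, "same source and destination at about the same time" becomes a mechanical check rather than a human judgment call.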
Each of the above types of events presents unique analysis challenges. For example, some are produced in much higher numbers (network access control, worm events) while some others are often not what they seem at first (such as network IDS “false positives”). Moreover, sometimes the threat can only be identified and rated by cross-device and cross-category analysis of the above events.

Many questions arise upon seeing the above data. How to turn that flood of data into useful and actionable information? How to find what is really relevant for the organization at the moment and for the near future? How to tell normal log records, produced in the course of business, from the anomalous and malicious, produced by attackers or misbehaving software? Correlation performed by SIM (Security Information Management) software is believed to be the solution to those challenges. Correlation is defined in the dictionary as establishing or finding relationships between entities. However, a good security-specific definition is lacking. In security, “event correlation” may be defined as improving the threat identification and assessment process by looking not only at individual events, but also at sets of events bound by some common parameter (“related” events).

Types of correlation

Security-specific correlation can be loosely categorized into rule-based and statistical (or algorithmic). Rule-based correlation needs some pre-existing knowledge of the attack (“the rule”) and is able to define what it actually detected in precise terms (“Successful Shopping Cart Web Application Attack”). Such attack knowledge is used to relate events and analyze them together in a broader context. On the other hand, statistical correlation does not employ any pre-existing knowledge of the “bad” activity (at least, not as a primary detection vehicle), but instead relies upon the knowledge of normal activities, accumulated over time.
Ongoing events are then rated by the built-in algorithm and are additionally compared to the accumulated activity patterns. This distinction is somewhat similar to signature vs. anomaly IDS, and makes a SIM solution a kind of meta-IDS, operating on higher-level data (not packets, but log records). Both of those correlation methods combined can help to sift through the large volume of diverse data and identify high-severity threats.

Rule-based correlation

Rule-based correlation uses some pre-existing knowledge of an attack (a rule), which is essentially a scenario that an attack must follow to be detected. Such a scenario might be encoded in the form of “if this, then that, therefore some action is needed”. Rule-based correlation deals with states, conditions, timeouts and actions. Let us define those important terms. A state is a stationary occurrence that the correlation rule might be in. A state might contain various conditions, such as matching incoming events by the source IP address, protocol, port, event type, producing security device type, username and other components of the event. It should be noted that although such data components vary by device, the SIM solution normalizes them using the cross-device event schema without incurring information loss. A timeout defines how long the rule will be in a certain state. If the correlation engine has to
maintain a lot of rules in a waiting state in memory, this resource might be exhausted. Thus, rule timeouts play an important role in correlation performance. A transition occurs when one rule state is switched to another. For a complicated rule, many transitions are possible. An action is what happens when all the rule conditions are met. Various actions may result from rules, such as user notification, alarm escalation, configuration changes or automatic incident case investigation.

The correlation is usually performed by the correlation engine, which is able to track various states and switch from state to state, depending on conditions and incoming events. It does all the above for multiple rules at the same time. The correlation engine gets a real-time event feed from the alarm-generating security devices and applies the relevant correlation rules as needed. The correlation engine also leverages other types of available data (such as vulnerability, open port or asset business value information) for a higher level of correlation.

Correlation rules may be applied to the incoming events as they arrive in real time or to the historical events stored in the database. In the latter case, the rules are used as a form of data mining or analytics, which allows uncovering hidden threats such as slow port scans or low-level Trojan or exploitation activity. Such rules may be run periodically for incident identification, or in the course of the investigation of suspicious activity, for seeking out prior occurrences of similar (and thus possibly related) activity. Unlike the real-time rules, which become useless if prone to false alarms (just as signature-based IDSs sometimes are), database rules can tolerate a certain level of false alarms for the purpose of drastically reducing false negatives.
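The historical "database rule" idea just mentioned — seeking out slow port scans in stored events — might be sketched as follows. The thresholds (ten distinct ports, a one-day window) and the event field names are illustrative assumptions, not values from any real product.

```python
from collections import defaultdict

def find_slow_scans(events, min_ports=10, window_sec=86400):
    """Flag sources that touched many distinct ports over a long window,
    too slowly to trip a real-time scan signature.

    events: iterable of dicts with 'time', 'src_ip', 'dst_port' fields
    (a stand-in for rows pulled from the event database).
    """
    seen = defaultdict(set)   # src_ip -> distinct destination ports
    first = {}                # src_ip -> earliest timestamp observed
    for ev in sorted(events, key=lambda e: e["time"]):
        src = ev["src_ip"]
        first.setdefault(src, ev["time"])
        # Only count activity inside the (long) analysis window.
        if ev["time"] - first[src] <= window_sec:
            seen[src].add(ev["dst_port"])
    return [src for src, ports in seen.items() if len(ports) >= min_ports]
```

Run periodically against the event store, a query like this tolerates the occasional false alarm precisely because an analyst, not a pager, consumes its output.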
This is due to the fact that real-time rules usually feed the alarm notification system, while database rule correlation will be launched by the analyst during the security incident investigation. As long as the rule-based analytics uncovers a hidden threat, which is impossible to discover otherwise, an analyst might be able to tolerate a certain level of false alarms, not acceptable for the real-time correlation.

Statistical correlation

Statistical correlation uses special numeric algorithms to calculate threat levels incurred by the security-relevant events on various IT assets. Such correlation looks for deviations from normal event levels and other routine activities. Risk levels may be computed from the incoming events and then tracked in real time or historically, so that deviations are apparent. The algorithmic correlation may leverage the event categorization in order to compute the threat levels specific to various attack types, such as the threat of denial of service, the threat of viruses, etc., and track them over time. Detecting threats using statistical correlation does not require any pre-existing knowledge of the attack to be detected. Statistical methods may, however, be used to detect threats based on pre-defined activity thresholds. Such thresholds may be configured based on the experience of monitoring the environment. For example, if the normal level of specific reconnaissance activity is exceeded for a prolonged period of time, an alarm might be generated by the system.

Correlation may also use various parameters for enterprise assets to skew the statistical algorithm for higher-accuracy detection. Some of them are defined by system users (such as the affected asset's value to the organization) or are automatically computed from other available event context data (such as vulnerability scanning results or a measure of normal user activity on the asset). That allows
one to define a broader context for transpiring security events and thus helps one understand how they contribute to the organization's risk posture.

While rule-based correlation is more helpful during threat identification, algorithmic correlation is conducive to impact assessment. When higher threat levels are detected by the algorithms, one can assume that there is a higher chance of catastrophic system compromise or failure. Various statistical algorithms may be used to trend such threat levels over long periods in order to gain awareness of the normal network and host activities. The accumulated threat data is then used to compare the current patterns of activity with the baseline. This allows the system to make accurate (and possibly automated) decisions about event flows and their possible impact.

Challenges with correlation

Both of the above types of correlation have inherent challenges, which can fortunately be mitigated by combining both methods to create coherent correlation coverage, leading to quality threat identification and ranking.

First, can we assume that the attacker will follow a scenario which can be caught by the rule-based correlation system? Unlike the network IDS, which needs a specific signature with detailed knowledge of the attack, a correlation system rule might cover a broad range of malicious activities, especially if intelligent security event categorization is utilized. It may be done without going into the specifics of a particular IDS signature. For example, rules may be written to look for certain activities that usually accompany a system compromise, such as backdoor communication or hacker tool downloads. Such activities are harder for the attacker to avoid if he intends to use the compromised machine for his purposes.
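Returning to the statistical correlation described above, the baseline-and-deviation idea can be illustrated with a minimal sketch. The hourly-count framing, the sample history values and the 3-sigma threshold are all assumptions made for the example; real systems use far richer models.

```python
import statistics

def build_baseline(hourly_counts):
    """Learn 'normal' from historical hourly event counts."""
    return (statistics.mean(hourly_counts),
            statistics.stdev(hourly_counts))

def is_anomalous(count, baseline, n_sigma=3.0):
    """Flag an hour whose count deviates too far from the baseline."""
    mean, stdev = baseline
    return abs(count - mean) > n_sigma * stdev

# Illustrative history: recon-category events per hour on a quiet network.
history = [40, 42, 38, 41, 39, 43, 40, 37]
baseline = build_baseline(history)

quiet_hour_flagged = is_anomalous(41, baseline)   # within normal range
burst_hour_flagged = is_anomalous(120, baseline)  # clear spike
```

Note the limitation the text raises later: an attack adding only a handful of events per hour stays inside the band and is never flagged, which is exactly why the statistical method needs the rule-based one beside it.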
Extensive research using deception networks, also called honeynets, allows us to learn more and more of the attackers' patterns of behavior and to encode them as correlation rules, available out of the box.

Second, can multiple rules cause the number of false positives to actually increase instead of decrease? Indeed, deploying many rules without any regard to the environment might generate false alarms. However, it is much easier to understand and tune SIM correlation rules than intricate binary matching patterns. The latter requires in-depth understanding of the attack's network packets, memory corruption issues and specifics of the exploitation techniques. On the other hand, tuning a correlation rule involves changing the timeouts and adding or removing conditions. Overall, in the case of correlation rules, one may also define response actions with higher confidence, since one can bind the rules to a specific asset or group of assets.

Third, rule-based correlation is relatively intensive computationally. However, using highly optimized correlation engines and intelligently applying filters to limit the flow of events allows gaining maximum advantage of the rule-based correlation. Additionally, many rules can be combined together so that the correlation engine does not have to keep many similar events in memory. It also makes sense to apply more specific correlation rules to the large number of ordinary assets, where a flood of false positives might endanger security, and to apply wider and more generic rules to critical assets, where an occasional false alarm is better than missing a single important alert. This way all the suspicious activities directed against a small group of critical assets will be detected, while the larger set of less critical assets does not flood the analyst with false alarms.
Fourth, statistical correlation may not pick up anomalous activity if it is performed at low enough levels, essentially merging with the normal. Hiding attack patterns under volumes and volumes of similar normal activity might deceive the statistical correlation system. Similarly, a single occurrence of an attack might not impact the statistical profile enough to be noticed. However, careful “baselining” of the environment and then using statistical methods to track the deviations from such a baseline might allow detecting some of the low-volume threats. Also, rule-based correlation compensates for those rare events and enables their detection, even if algorithmic correlation misses them.

Maximizing benefits of correlation

Correlation enables system users to take audit data analysis to the next level. Rule-based and statistical correlation allows the user to:

•Dramatically decrease the response times for routine attacks and incidents by using the centralized and correlated evidence storage
•Completely automate the response to certain threats that can be detected reliably by correlation rules
•Identify malicious and suspicious activities on the network even without having any pre-existing knowledge of what to look for
•Increase awareness of the network via baselining and trending and effectively “take back your network”
•Fuse data from various information sources to gain a cross-device business risk view of the organization
•Use the statistical correlation to learn the threats and then deploy new rules for site-specific and newly discovered violations

Overall, combining rules and algorithms provides the best value for managing an organization's IT security risks.

Correlation Rule Examples

Probes followed by an attack

The rule watches for the general attack pattern consisting of reconnaissance activity followed by an exploit attempt.
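The probe-then-attack rule can be sketched as a tiny two-state machine: a recon event from source S against destination D arms the rule, and an exploit event for the same (S, D) pair within the timeout fires the correlated alert. The event fields, the "recon"/"exploit" category labels and the one-hour timeout are illustrative assumptions.

```python
RECON_TIMEOUT = 3600  # seconds to stay armed after a probe (assumed)

def correlate(events):
    """Correlate recon events with later exploit events per (src, dst)."""
    armed = {}    # (src_ip, dst_ip) -> time the recon was seen
    alerts = []
    for ev in sorted(events, key=lambda e: e["time"]):
        key = (ev["src_ip"], ev["dst_ip"])
        if ev["category"] == "recon":
            armed[key] = ev["time"]        # transition: waiting -> armed
        elif ev["category"] == "exploit":
            t = armed.get(key)
            if t is not None and ev["time"] - t <= RECON_TIMEOUT:
                alerts.append({"pair": key, "time": ev["time"]})
                del armed[key]             # rule fires once, then resets
    return alerts
```

An exploit alert with no preceding probe from the same pair produces nothing here, which is the false-alarm suppression the rule provides on top of the raw IDS feed.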
Attackers often use activities such as port scanning and application querying to scope the environment, find targets for exploitation and get an initial picture of system vulnerabilities. After the initial information gathering is performed, the attacker returns with exploit code or automated attack tools to attempt actual system penetration. The correlation enriches the information reported by the IDS and serves to validate the attack and suppress false alarms. By watching for exploit attempts that follow reconnaissance activity from the same source IP address against the same destination machine, the SIM solution can increase the confidence and accuracy of reporting. After the reconnaissance event is detected by the system, the rule activates and waits for the actual exploit to be reported. If it arrives within a specified interval, the correlated event is generated. The
notification functionality can then be used to relay the event to security administrators by email, pager or cell phone, or to invoke appropriate actions.

Login guessing

The rule watches for multiple failed authentication attempts against network and host services followed by a successful login attempt. While some intrusion detection systems are able to alert on failed login attempts, the correlation system is able to analyze such activity across all authenticated services, both networked (such as telnet, ssh, ftp, Windows access, etc.) and local (such as UNIX and Windows console logins). This rule is designed to track the successful completion of such an attack. Triggering of this rule indicates that an attacker managed to log in to one of your servers.

It is well known that system users often use passwords that are easy to guess in just several tries. Intelligent automated guessing tools, available to hackers, allow them to cut the guessing time to a minimum. The tools use various tricks, such as trying to derive a password from a user's login name, last name, etc. If those simple guessing attempts fail, hackers might resort to "brute forcing" the password. This technique tries all possible combinations of characters (such as letters and numbers) as the password. After a non-root (non-administrator) user password is successfully obtained, the attacker will likely attempt to escalate privileges in order to achieve full administrative access to the machine.

The rule activates after the first failed attempt is detected. The event counter is then incremented until the threshold level is reached. At that point the rule engine will be expecting a successful login message. In case such a message is received, the correlated event is sent. It is highly recommended to tune the count and the interval for the environment.
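A counter-and-threshold sketch of this rule: failed logins are counted per (source, account) pair inside a sliding window, and a success arriving after the threshold is crossed raises the correlated event. The threshold of five failures, the 300-second window and the event field names are illustrative assumptions to be tuned per environment, as the text advises.

```python
FAIL_THRESHOLD = 5   # failures before a success becomes suspicious
WINDOW_SEC = 300     # sliding window for counting failures

def watch_logins(events):
    """Raise an alert when a success follows a burst of failures."""
    fails = {}    # (src_ip, user) -> timestamps of recent failures
    alerts = []
    for ev in sorted(events, key=lambda e: e["time"]):
        key = (ev["src_ip"], ev["user"])
        # Keep only failures still inside the window.
        recent = [t for t in fails.get(key, [])
                  if ev["time"] - t <= WINDOW_SEC]
        if ev["outcome"] == "failure":
            recent.append(ev["time"])
            fails[key] = recent
        elif ev["outcome"] == "success" and len(recent) >= FAIL_THRESHOLD:
            alerts.append({"key": key, "time": ev["time"]})
            fails.pop(key, None)   # reset after the rule fires
    return alerts
```

Because the counting happens on normalized events, the same rule covers ssh, ftp, Windows and console logins alike without per-service logic.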
Up to three failed attempts within several minutes are usually associated with users trying to remember a forgotten password, while higher counts within a shorter period of time are more suspicious and may indicate a malicious attempt or a script-based attack.

Conclusion

SIM products leveraging advanced correlation techniques and intelligent alert categorization are becoming indispensable as enterprises deploy more and more security point solutions, appliances and devices. Those solutions alone only address small parts of a company's security requirements and need to be integrated under the umbrella of a Security Information Management solution, which will enable users to combat modern-day technology threats such as hackers, hybrid worms and even internal abuse.

ABOUT THE AUTHOR: This is an updated author bio, added to the paper at the time of reposting in 2009.

Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the field of log management and PCI DSS compliance. He is an author of the books "Security Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy II", "Information Security Management Handbook" and others. Anton has published dozens of papers on log management, correlation,
data analysis, PCI DSS and security management (see the list at www.info-secure.org). His blog, http://www.securitywarrior.org, is one of the most popular in the industry. In addition, Anton teaches classes and presents at many security conferences across the world; he recently addressed audiences in the United States, UK, Singapore, Spain, Russia and other countries. He works on emerging security standards and serves on the advisory boards of several security start-ups. Currently, Anton is developing his security consulting practice, focusing on logging and PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the world about the importance of logging for security, compliance and operations. Before LogLogic, Anton was employed by a security vendor in a strategic product management role. Anton earned his Ph.D. degree from Stony Brook University.
