  • 1. Tel Aviv University, Raymond and Beverly Sackler Faculty of Exact Sciences. Classification of Traffic Sources on The Internet: A study of the adaptability level of network agents. Thesis submitted as partial fulfillment of the requirements towards the M.Sc. degree, School of Computer Science, Tel Aviv University, by Uri Gilad. The research work in this thesis has been conducted under the supervision of Prof. Hezy Yeshurun, Tel Aviv University (Coordinating Instructor), and Dr. Vern Paxson, International Computer Science Institute. May 2005.
  • 2. Abstract The subject of investigating the different types and purposes of network intruders has received relatively little attention in the academic community, unlike the extensively researched area of identifying intrusions. In this paper, we examine how adaptive intruders are to changing environments. We investigate the different groups that form when we rank intruders according to their adaptability. This research lends new understanding about the level of threat presented by highly adaptive intruders. The paper is organized as follows: first we examine the current work in intrusion detection and related subjects. We establish ground truth by performing an experiment in a controlled environment. We present and justify our methods of data collection and filtering, and move on to present results based on real data. Finally, we lay out our conclusions and discuss directions for further research. The resulting algorithm is a useful tool for ranking the intruders attacking a defended network according to their adaptability levels.
  • 3. Contents
    Classification of Traffic Sources on The Internet.....1
    Abstract.....2
    Contents.....3
    1. Introduction.....4
    2. Algorithm.....8
    3. Results.....31
    4. Discussion.....46
    5. Further work.....49
    References.....51
    תקציר (Hebrew abstract).....57
  • 4. 1. Introduction Present day computer security technologies are focused on detecting and protecting against intruders. While there are many solutions for network defense, there has been surprisingly little research on the subject of understanding the nature of attackers. Present day Internet traffic contains a variety of threats, all of which may direct traffic at any random host connected to the network. The traffic originates from sources such as worms, attacking hackers, automated scanning tools and legitimate traffic. The rough classification above raises the question: can one automatically categorize and determine the type of a traffic source? While it may seem secondary, the nature of an attacker is paramount to understanding the seriousness of a threat. Attacks that come from worms and scanning tools are “impersonal” and will not “insist” on lingering at a specific site, while a human may return again and again. A human can employ a wider array of techniques and keep hammering at a site until finally brute-forcing a way inside. In addition, humans can be subject to retaliation by contacting the responsible party, while the threat from autonomous agents has no clear identity behind it to contact. 1.1 Traffic Sources on the Internet A random host connected to the Internet will be scanned and attacked within a short while [21]. Most of the attacks come from automated agents, due to their overwhelming numbers. Automated agents can be worms: human-written pieces of software. Worms are designed to propagate between computer systems by “infecting” a vulnerable machine using a built-in exploit. After infection the worm places a copy of itself on the targeted system and moves on to infect the next victim. The worm thus spawned on the infected system enters the same cycle. Research suggests that probes from worms and autorooters heavily dominate the Internet [37].
As mentioned above, another source of traffic on the Internet is automated scanning tools such as the abovementioned autorooters. Autorooters are designed to quickly inspect as many hosts as possible for information – without the propagation feature. There are benign scanning tools such as web crawlers – designed to collect information for indexes such as Internet search engines. Other scanning tools, however, are designed to seek vulnerabilities to exploit. The people behind the last type of automated scanning tools use the data gathered to return and break into a site found to be vulnerable. Autorooters, an evolution of these automated vulnerability scanners, actually install a backdoor automatically on a vulnerable machine and report the machine's IP address back to the human operating them, who can now take control of the host.
  • 5. Rarely seen on the Internet are fully manual sources of hostile traffic: adaptive sources actually interacting with a specific network site in order to break into it. There is a distinction between these adaptive (possibly human) traffic sources and the above two rough classifications. While humans may employ worms and automated scanning tools, sometimes the human hacker will manually work against a site, with the intention of achieving better results by employing a wider range of techniques. This paper presents a practical method to classify sources of attack on the Internet. The classification is believed to be helpful when determining the nature of an attacker – is it an automated agent or a manual one (a human)? The development and experimentation is based on actual data from several distinct sites. 1.2 Related Work The subjects of examining the behavior of an attacker on the Internet, understanding an attacker's nature, or classifying the attacker by behavior have not received much attention from the academic community. However, the related subjects of differentiation between humans and computers and of detecting worm outbreaks (a subset of the larger subject of intrusion detection) have been researched extensively. Several papers touch upon this work's ideas. Telling computers and humans apart is a problem discussed by Von Ahn et al. [1]. The researchers describe a technique dubbed "CAPTCHA" (Completely Automated Public Turing Test to Tell Computers and Humans Apart). As the acronym implies, the work details an automatic algorithm to generate a test that human beings can pass, while computer programs cannot. The test is based upon "hard to solve AI problems" that human beings can solve easily, while computers cannot. The definition of "hard to solve AI problems" relies on consensus among researchers, and therefore the problem set may change in the future. The proof that human beings can solve these problems is empiric.
The work outlined in this paper and the CAPTCHA paper share an interest in telling humans and software agents apart. While the CAPTCHA solution is taken from the domain of AI and relies on voluntary test takers (if the test is not passed, resources will be denied), this paper will attempt to present an algorithm that operates under more severe restrictions: the differentiation accomplished here is performed without the knowledge or cooperation of the tested subject. In [7] Staniford et al. discuss theoretical advanced worm algorithms that use new propagation mechanisms, random and non-random. Discussed are worms equipped with a hit-list of targets to infect, worms sharing a common permutation to avoid repeating infection attempts, and worms that study the application topology (for example: harvesting email addresses) to decide which computers to target for infection. Staniford and co. envision a “cyber center for disease control” to identify and combat threats. In this paper, we touch upon the task of identifying the threat and categorizing it as an adaptive source attack or an automated agent that may be part of a worldwide infection. The subject of “non-random” agents (such as those worms equipped with a hit-list – for
  • 6. example) is important, as the issue of identifying the origin of an attack (a human who may change tactics, or a worm) is of importance. Zou et al. [2] present a generic worm monitoring system. The system described contains an ingress scan monitor, used for analyzing traffic to an unused IP address space. The researchers observe that worms follow a simple attack behavior while a hacker's attack is more complicated, and cannot be simulated by known models. They suggest using a recursive filtering algorithm to detect new traffic trends that may indicate a case of worm infection. Zou's observation that hacker and worm behavior differ is not expanded upon; the observation is implicitly used, as the paper describes a method for intruder classification based on their behavior. This work will present an algorithm that uses the mentioned difference to group intruders with similar behavior. In [3] Wu et al. expand upon the subject of worm detection mentioned in [2]. Wu discusses several worm scanning techniques and re-introduces the subject of monitoring unassigned IP address space. Jiang Wu and co. discuss several worm propagation algorithms similar to those presented in [7]. The authors discuss worm detection, and propose the hypothesis that random scanning worms scan the unassigned IP address space and that this fact may present a way to detect them. The authors search for common characteristics of worms (as opposed to other agents); one such common characteristic is that a worm will scan a large number of unassigned IP addresses in a short while. Wu and co. suggest an adaptive threshold as a way to detect worm outbreaks. Examining the algorithm's results on traffic traces validates the work, and the conclusion is that unknown worms can be detected after only 4% of the vulnerable machines in the network are infected.
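The adaptive-threshold idea can be sketched roughly as follows. This is a hedged illustration only, not Wu et al.'s actual estimator: the history length, the multiplier `k`, and the mean-plus-deviation rule are all assumptions made here for concreteness.

```python
# Illustrative sketch (not the estimator from [3]): flag a time window if the
# count of probes to unassigned ("dark") IP space exceeds a threshold derived
# from recent history. All parameters below are assumptions.

from statistics import mean, stdev

def detect_outbreak(window_counts, k=3.0, history=8):
    """window_counts: probes-to-dark-space per time window, oldest first.
    Returns indices of windows flagged as a possible outbreak."""
    flagged = []
    for i in range(history, len(window_counts)):
        past = window_counts[i - history:i]
        threshold = mean(past) + k * stdev(past)  # adapt to recent traffic level
        if window_counts[i] > threshold:
            flagged.append(i)
    return flagged
```

A sudden jump in dark-space probes, as produced by a fast random-scanning worm, then stands out against the adapted baseline without a hand-tuned fixed threshold.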
Unused IP address monitoring is employed in this research as well; this research draws upon the common characteristic of worms to detect “automaton-like” behavior, as opposed to more random or adaptive behavior. In [17] Jung et al. present an algorithm to separate port scanning from benign traffic by examining the number of connections that are made to active hosts vs. the number of connections made to inactive hosts. Their observation is that there is a disparity in the number of connections made to active hosts between benign and hostile sources. The disparity in the access attempts to active and inactive hosts is implemented in an algorithm named TRW (Threshold Random Walk), which is used to successfully and efficiently detect scanning hosts. The research by Jung et al. brings to light an important difference between hostile scanners and benign users, which touches on this research. In this research an attempt is made to establish a range of behaviors, ranging from fully automated agents (such as worms) to attackers who react to environmental changes and modify their algorithm of attack accordingly – a behavior indicative of humans. The work done in [17] assumes certain behavior on the part of users – they will access more active sources than inactive sources. This research relies on the diversity of resources accessed as another characteristic useful for classifying attackers. In [18] Weaver et al. show that even a simplified version of TRW achieves quite good results, which emphasizes the point that the difference in behavior between naïve and hostile sources is of importance.
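The TRW decision rule amounts to a sequential likelihood-ratio test over connection outcomes. The sketch below is illustrative only: the success probabilities and decision thresholds are made-up placeholder values, not the calibrated parameters from [17].

```python
# Hedged sketch of the Threshold Random Walk (TRW) idea from Jung et al. [17].
# Each connection attempt updates a likelihood ratio; crossing the upper
# threshold declares the source a scanner, the lower one declares it benign.
# All numeric parameters here are illustrative assumptions.

def trw_classify(outcomes, p_benign=0.8, p_hostile=0.2,
                 eta0=0.01, eta1=100.0):
    """outcomes: iterable of booleans, True if the contacted host was active.
    Returns 'benign', 'hostile', or 'pending'."""
    ratio = 1.0
    for active in outcomes:
        if active:
            ratio *= p_hostile / p_benign            # hit: evidence of benignity
        else:
            ratio *= (1 - p_hostile) / (1 - p_benign)  # miss: evidence of scanning
        if ratio >= eta1:
            return "hostile"
        if ratio <= eta0:
            return "benign"
    return "pending"
```

Because benign users mostly contact hosts that exist, a run of failed connections drives the ratio up quickly, which is why even a few probes suffice to flag a scanner.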
  • 7. Spitzner [4] describes the honeypot: a system that has no production value, so it can be assumed that any access to it is illegitimate. In [5] Spitzner makes the distinction between different hackers and bases this distinction on their behavior – as evident from their actions when attacking a network. This work is related to honeypot technology, as it uses the working assumption that most traffic directed at unused IP space is hostile. Further, Spitzner notes the classification of Internet hostiles based on their activity, which is one of the foundations for this research. Spitzner, however, does not expand on the idea of classification beyond describing a specific group of Internet “operatives” dubbed “script kiddies”. The intention of this research is to go beyond that and provide a method of telling apart different hostile sources. Lee [6] presents a data mining approach to intrusion detection. The approach presented by Lee is to apply current machine learning techniques to network sessions, extracting features and analyzing them – finally building a decision tree to separate intrusion attempts from legitimate traffic. Building decision trees to aid in classification of traffic is a recognized method, but in this research we chose to rely on expert knowledge, as this approach is assumed to produce better results. In Lee's research the resulting detection rate of below 70% is claimed to be unsatisfactory. The low success rate in Lee's work encouraged us to try the presented approach. 1.3 Rationale for this study In this paper, we present a practical algorithm to classify intruders according to their activity as captured at the targeted network. The classification defines a range between automated attacks and fully manual sources of attack such as human hackers. This kind of classification is touched upon in some works, but is not explored to the fullest extent. The algorithm is based on tested hypotheses on human and worm behavioral differences.
The method relies on past behavior of simple worms in order to find the common denominator of the behavior of automated attackers. After determining this common denominator, attackers can be ranked and differentiated.
  • 8. 2. Algorithm 2.1 Establishing ground truth The research presented below is based on the assumption that there are core differences between the traffic generated by a fully manual source of attack, such as a human, and a fully automated source of attack, such as a worm. This assumption was tested in an experiment before embarking on the development of a full-scale algorithm. 2.1.1 Experiment An initial algorithm was developed to test attacker behavior. The algorithm goes over a pre-recorded network traffic dump. For each traffic source, the algorithm calculates the series of ICMP echo requests and TCP/UDP ports the source accessed on each destination address. For each new target the algorithm processes for this source, the resulting access series is compared to those already performed by the source on different targets. If the access series is different, it is added to the list. It is proposed that a higher number of different access series for a specific source is indicative of the ability to react to changes, and possibly the guidance of a human. The basis for this assumption is that we believe a human will react to different hosts in a different way, producing a different access series, while automated sources will act upon a simple deterministic algorithm programmed beforehand, independent of the conditions in the network currently attacked. This algorithm is used to test the access series for humans and simple automatons/network worms. This is not the same as testing the behavior of an attacker to the fullest extent, including examining the data passed in the various sessions in the traffic collected. We believe that establishing that humans and worms differ in their “access series” to targets is a good argument towards proving that there is a difference in behavior between an automaton and a more manual source. This conclusion is valid because a different access series is an example of different behavior.
For the remainder of this paper, we term the number of different access series for a single source an “adaptability score”. The method of calculating an access series for an intruder is given under section 2.3 – Algorithm details.
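The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the full algorithm of section 2.3: the tuple-based input format and all names are assumptions made for the example.

```python
# Illustrative sketch of the adaptability score from section 2.1.1: for each
# source, record the ordered series of probes sent to each target, then count
# how many distinct series the source produced. Input format is an assumption.

from collections import defaultdict

def adaptability_scores(events):
    """events: iterable of (source, target, probe) tuples in arrival order,
    where probe is e.g. '80/TCP', '137/UDP' or 'ICMP'.
    Returns {source: number of distinct access series}."""
    series = defaultdict(lambda: defaultdict(list))  # source -> target -> series
    for src, dst, probe in events:
        series[src][dst].append(probe)
    return {src: len({tuple(s) for s in by_target.values()})
            for src, by_target in series.items()}

events = [
    ("worm", "10.0.0.1", "80/TCP"), ("worm", "10.0.0.2", "80/TCP"),
    ("human", "10.0.0.1", "22/TCP"), ("human", "10.0.0.1", "80/TCP"),
    ("human", "10.0.0.2", "21/TCP"),
]
scores = adaptability_scores(events)
# the worm probes every target identically; the human varies per target
```

In this toy trace the worm source ends up with a score of 1 and the human-like source with a score of 2, matching the intuition that identical per-target behavior indicates an automaton.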
  • 9. 2.1.2 Experiment layout The experiment was composed of two stages. In the first stage, the algorithm's response to the activity of human attackers was tested, humans being an example of a highly adaptive agent. In the second stage, the algorithm's response to a network worm in a lab environment was tested. Network worms, with their simple algorithms, are examples of completely automatic agents. Testing human subjects A group of volunteers was collected for testing the assumptions about human behavior. Besides being human, the test subjects had to be knowledgeable security experts with practical experience in penetration testing (i.e. breaking into company web sites for testing and improving their security). The human test subjects were given the task of breaking into a prepared site. Only the IP address range was provided, without mention of the purpose of the experiment or of the defense mechanisms employed at the site. The site is protected by an ActiveScout [16] machine, which creates virtual resources for the humans to interact with. The test subjects are able to connect to interactive and transaction based services such as FTP, NetBIOS, HTTP and Telnet. The virtual site presented appears to contain some “security holes” in the form of vulnerable services and open ports. The vulnerable services show a welcome banner that clearly proclaims an aged version of a common server application, one that is known to the security community to be open for exploitation and can be abused to break into the hosting computer. Most of the versions reported in the welcome banners have publicly available exploits. The protection of the site is configured so that after scanning for ports, many more “virtual” resources than real resources will answer, and accessing (as opposed to scanning) these virtual resources will cause the attacker to be locked out of the site for a short duration of time (4 hours).
As we are using a commercial product to implement the experiment, the lockout period and conditions are configurable. Specifically, in this experiment lockout occurs under the following conditions: 1. If the intruder accesses a NetBIOS based resource, lockout will occur after trying a user/password combination, but enumerating servers, users and other NetBIOS resources does not trigger lockout. 2. If the intruder accesses an HTTP server, the lockout will not be triggered. 3. After the intruder completes a handshake with a TCP based simulated resource, lockout will occur. These are the default settings of the product, which are claimed to provide maximum protection while ensuring a low number of false positives.
  • 10. A real web server was installed in the site to serve as the “trophy” to be found. This web server – a Linux machine running an old (unpatched) version of Apache – was vulnerable to several well-known weaknesses. A hacker finding (and breaking into) this web server will find a message notifying him of his success. The structure of the experiment would appear to introduce only minor bias. The experiment only approximates a “double blind” experiment, as the tester knows that the attackers are human. The test subjects are not aware which algorithms are being used, or what is being tested. The researcher is running an algorithm that is not aware of the type of test subject. Some bias may have been introduced when designing the test environment, however, as the site is obviously not a “real” one. There are no corporate assets in the site, there is no real content beyond that which is provided by the ActiveScout machine, and there are no legitimate uses for this site. These facts make the site markedly different from a commercial presence on the web. However, the site can pass for a bare bones website, and the attackers employ the same techniques they use when penetrating “real” web sites. Thus, the experiment will record human attacker activity. Testing automated agents - worms To test the assumptions about automated agent behavior, a representative group of current and past network worms was analyzed. Worms are an example of an automated agent, as a static pre-defined algorithm dictates a worm's behavior. The worm experimentation is logistically simpler than running the experiment with humans, since most worms' algorithms are well known. Analysis of each worm's state machine produced enough knowledge about its expected access series. The purpose of testing worms is to understand how a very simple automaton behaves.
Worms represent the simplest automatons available. 2.1.3 Experiment results, human subjects We had each participant keep a journal while conducting the test, containing the activities performed, the tools used and his conclusions. All human subjects achieved scores higher than or equal to 3 on the adaptability scale, meaning they had 3 or more different access series for the targets they accessed, while worms ordinarily had an adaptability score of 1, as they perform according to a deterministic state machine. Below is a summary of the human experts' results, some comments for each, and the resulting adaptability calculation.
  • 11. Test Subject #1 1. Test subject #1 started the experiment by researching the Internet Whois database and DNS to find out information about the site. 2. The test subject continued the experiment with port scans for various common ports. These include FTP, SSH, HTTP, NetBIOS, and some other ports – which include 6000/TCP (X-Windows) and the range 1-1024/TCP (looking for other open services). These scans were conducted on the entire range supplied, in deterministic, sequential order. 3. The test subject followed the above scans by manually connecting to SSH and HTTP open ports, and querying the services for versions and banner information. 4. The test subject attempted to browse the HTTP sites found, but was unable to, as the ActiveScout [16] had by this time determined him to be hostile and locked him out of the site for the duration of 4 hours. 5. The test subject did not realize at first that he was locked out – and continued various attempts, finally giving up and returning in 8 hours, to accomplish basically the same results. 6. The test concluded after the second scan. 7. The test subject was able to determine which of the simulated computers was a “real” web server, but did not manage to break into it. The test subject arrived at this conclusion after noting the different content of the real web site and the simulated web sites. Following up on this hunch, the test subject mapped out the TCP/IP fingerprints for the simulated web site and the real web site, and found a bug in the implementation of the simulated web site. The simulated web site, while claiming to be the same OS model as the real web site, had a different response to specific TCP packets. Discussion From the log this test subject kept, we learn that he attempted to use outside information (DNS, Whois) to learn about the site before accessing it, attempting to glean information about services and servers available before doing the actual scan.
This is the kind of behavior rarely seen in automated agents to date; although possible, most automated agents do not tend to seek outside information about the site being attacked – an exception being the topologically aware worms discussed below. The classification thesis proved robust against this intruder, as the test subject was attracted to open ports and attempted to penetrate them in various ways in order to break into the site. The test subject did not follow a specific plan once the attack passed the “reconnaissance” phase. This fact contributed to the adaptability and to a greater score in the algorithm. The access series for this test subject appear below in figure 1. These port groups are counted towards an adaptability score of 5.
  • 12. 1. 22/TCP,21/TCP,80/TCP,135/TCP,137/UDP, 22/TCP 2. 22/TCP,21/TCP,80/TCP,135/TCP,137/UDP, 22/TCP,80/TCP 3. 22/TCP,21/TCP,80/TCP,135/TCP,137/UDP, 80/TCP 4. 22/TCP,21/TCP,80/TCP,135/TCP,137/UDP, 22/TCP,6000/TCP 5. 1-1024/TCP,6000/TCP Score = 5 Figure 1: Test subject #1 port groups. Test subject #2 1. The test subject started the experiment by making a select few connections to FTP, HTTP, NetBIOS and ports 467/TCP and 484/TCP. It is interesting to note that ports 467/TCP and 484/TCP are not used by any known protocol. When later queried about this, the test subject replied that the ports were selected without a practical reason, as an attempt to confuse an operator of the site, if there was one. 2. These selective connections to specific ports suggest that the test subject made an earlier reconnaissance attempt from a different source address. The test subject confirmed this suspicion in a later interview. 3. After a break of four days, the test subject returned from the same source address, and performed a “ping sweep” of the available address space. 4. The test subject also performed scans to ports 98/TCP and 96/TCP. 5. Finally, the test subject concluded with a horizontal NetBIOS scan. Discussion In a later interview, the test subject confirmed that in the four days he did not actively attack the site, he contemplated the problem and deduced that a network device of some sort protects the site. The test subject did not divine the nature of the “network device” – but his later behavior is explained by this deduction. Apparently he realized that the device is reacting to scans. He attempted to access unusual ports in order to test the device's response. Failing to find any weakness, and receiving what seemed like fake results from the NetBIOS scan, he gave up. The access series output by the algorithm is summarized in figure 2 below. 1. 137/UDP 80/TCP ICMP 2. 137/UDP 21/TCP ICMP 3. 98/TCP 96/TCP 4.
484/TCP 467/TCP Score = 4 Figure 2: Test subject #2 port groups.
  • 13. Test subject #3 1. Test subject #3 started the experiment by automatically mapping out the entire network range. 2. The test subject then followed up on several ports in random order, concentrating on 22/TCP (SSH), 80/TCP (HTTP) and 21/TCP (FTP). 3. The scan included attempts to determine the operating system of the attacked virtual computers by testing TCP flags – using an automated tool called “nmap” [20]. 4. After every connection attempt to a responsive “virtual host” the test subject would be blocked out of the network for a period of time. 5. This behavior of the network frustrated the test subject, and the experiment was concluded. Discussion While the experiment with this test subject did not provide a large amount of data to analyze, as the test subject gave up quickly on the test itself, the establishment of ground truth for the research benefited. The test subject, although not very thorough, did work in an adaptive way, returning to those ports that were responsive on the scanned hosts. The test subject spent more time on responsive resources, and tried to determine versions of applications and the operating system. The test subject's operating mode was completely adaptive; summarized below are the different port series output by the algorithm for this test subject. The score for this test subject is low, 3, and can be explained by his lack of interest and frustration. Although low, this score is still higher than the score awarded to the automated agents below. 1. 22/TCP 21/TCP 22/TCP 80/TCP 2. 21/TCP 22/TCP 21/TCP 80/TCP 3. 21/TCP 22/TCP 80/TCP Score = 3 Figure 3: Test subject #3 port groups.
  • 14. 2.1.4 Testing human subjects, conclusions Several conclusions arise from interviewing the test subjects and analyzing their actions. It appears that all test subjects became bored at one stage or another with the work. The fact that this was volunteer work, along with their growing suspicion that “something fishy” was going on, contributed to their wrapping up the experiment quicker than they would have if their work were paid or driven by personal interest – two common motivations for human hackers. This could serve as another indicator of human activity, but this research will not focus on it. All test subjects reported a suspicion that “something fishy” was going on. This feeling was caused by the fact that the ActiveScout protection mechanism generated resources in response to scans, and blocked users for a period of time after they proved to be hostile by accessing the generated resources. All test subjects eventually got blocked. The blocking was released automatically after a period of time. This behavior contributed to the frustration of the test subjects, as the site did not seem to provide a consistent image of resources over time. Frustration is not a trait of automated agents. While the test subjects are familiar with mechanisms that block offensive users, they were not familiar with mechanisms that work by detecting access to “virtual” resources – but rather with signature based mechanisms. Another conclusion is that hackers may not immediately follow the reconnaissance phase with an attack phase. Some hackers will take the results and analyze them manually, and then employ exploits against each vulnerable spot at their leisure. Additionally, these separate stages may come at the target from different source IP addresses, due to the use of DHCP or even due to an active attempt to disguise the origin of an attack. During the reconnaissance phase, when hackers employ automated scanning tools they are still considered to be automated agents.
When hackers turn to selectively attacking scanned resources, they will appear to be manual sources of attack. The algorithm proved robust even when hackers perform the reconnaissance phase from a different source address than the one they attack from. The reconnaissance source address will be detected (correctly) as an automated source, and the attack will be detected as a more adaptive source. Running several automated tools from the same source IP can cause enough adaptability to be counted as a manual source, which is as expected – if the attacker employs a fully automated script it will produce different results than if the attacker runs several scanning tools in response to results from the site. While human test subjects in this experiment employed automated tools for most of the reconnaissance phase, they performed the attack phase manually. This result is reinforced by the fact that the availability of scanning tools is much greater than the availability of automated “autorooter tools” – tools that perform an attack automatically. Autorooters perform the attack cycle from beginning to end, finally providing the user with a list of compromised hosts. Most of the “attack tools” available are usually tools which break into a single host, and the user is left to decide on his own what to do after the break-in. A reasonable assumption would be that users who write even more advanced
  • 15. attack tools show even greater adaptability and randomness of actions when finally performing manual actions on their own. 2.1.5 Analysis of recent network worms as examples of automated agents To complete the process of establishing ground truth, we need to look at the other side of the spectrum, at fully automated sources of attack. The most common automated agent is the network worm. The following is a study of recent well-known worms discovered on the Internet. We make the distinction between email worms and network worms. The difference between these two kinds of malicious agents is that email based worms usually require some sort of human interaction – such as opening the mail message and executing an attachment. The claim we need to establish is that network worms have a simple state machine, resulting in a predictable order of actions. A simple, predictable behavior will result in a low adaptability score. The worms studied are summarized below – for each worm, the adaptability score is calculated according to the worm's algorithm. Included is the list of ICMP echo requests and TCP/UDP ports the worm accesses; this list is the basis for the adaptability score calculated. In some cases, a copy of network traffic from a captured host was not found, and we had to rely on public knowledge bases such as SANS [38] and CERT [39]. This knowledge was used to divine the worm's algorithm, for which an adaptability score was calculated. Sadmind First report by CERT: May 10, 2001 [22]. The Sadmind worm (also named PoisonBox.worm in some resources) is a worm that employs an exploit for a Solaris service (Sadmind) to spread. Once a machine is infected, in addition to propagating, the machine will scan for and deface IIS web servers. This is an example of a multimode worm, although the second mode is used for defacing and not propagation. Since the IIS infection is performed in conjunction with the propagation process, there are two possible port series for this worm.
The worm will attempt the same series of ports for all victims, be they exploitable Solaris workstations or defaceable IIS machines. The port series calculated for this worm are:

1. 111/TCP, 600/TCP – the propagation phase. The first port is the portmapper service used for exploitation; the exploit opens a backdoor on port 600/TCP.
2. 80/TCP – the port used to launch an exploit against IIS machines, which will then download the defaced web pages from the attacking machine.

The adaptability score for this worm is 2, due to the additional feature of web page defacement – a new series of actions attempted against hosts independently of the first series.
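The counting rule applied throughout this chapter can be illustrated with a short sketch. This is a minimal illustration, not the thesis's actual implementation: it assumes the adaptability score is simply the number of independent port series a source attempts against its victims, and the function name `adaptability_score` is ours.

```python
def adaptability_score(port_series):
    """Count the distinct port series a source attempts.

    Each series is an ordered list of (port, protocol) pairs; the
    score is the number of independent series observed.  This is a
    simplified model of the scoring described in the text, not the
    thesis's actual algorithm.
    """
    return len({tuple(series) for series in port_series})

# Sadmind, as described above: one series for Solaris propagation,
# one independent series for IIS defacement.
sadmind = [
    [(111, "TCP"), (600, "TCP")],  # portmapper exploit + backdoor
    [(80, "TCP")],                 # IIS defacement
]
print(adaptability_score(sadmind))  # → 2
```

A worm that attempts a single fixed series against every victim would score 1 under the same rule.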
Code Red I & II

The Code Red worm has several variants, but the security community makes a distinction between two major variants sharing the same vulnerability: the first Code Red variant was reported by CERT on Jul. 19, 2001 [23]; Code Red II was reported by CERT on Aug. 6, 2001.

Code Red connects to the HTTP port of a victim and sends a specially crafted request containing the worm itself; the worm code executes from the stack of the exploited process. This worm's state machine is simple and consists of two stages: the generation of a random IP, and the probe/attack/spread phase, which is contained within the malicious payload. The list of ports contains only port 80/TCP, resulting in an adaptability score of 1.

Nimda
First report by CERT: Sep. 18, 2001 [26].

Nimda is a multi-mode worm, which spreads by either:

1. Infecting files in open NetBIOS shares.
2. Infecting web pages present on the attacked computer, thereby spreading to unsuspecting web clients.
3. Replicating by sending itself as an executable attached to an email message from the infected computer.
4. Attacking IIS machines, using a weakness of the HTTP server.
5. Exploiting the backdoors left by Code Red II and Sadmind.

The Nimda worm gains a score of 2 with the algorithm, as the worm attacks the following port groups:

1. 80/TCP [IIS & Code Red backdoor]
2. 445/TCP, 139/TCP [open NetBIOS shares]

The web page infection and the infecting email cannot be seen by our chosen IDS. Although this worm has several modes of attack, its algorithm is based on a simple state machine; even taking into consideration the modes not seen by our IDS, the worm will still retain a low score of 3. This worm does, however, merit a discussion of multi-mode worms – see below.

Spida
First report by CERT: May 22, 2002 [24].

The Spida worm connects to Microsoft SQL Server and exploits a default "null" password for an administrative account in order to propagate and spread.
The worm operates entirely over 1433/TCP, giving it an adaptability score of 1.
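The two-stage state machine described for Code Red above (generate a random IP, then probe/attack/spread on a single port) can be sketched as follows. This is a hypothetical dry-run model, not worm code: `send_exploit` is a placeholder that only records what would be sent, and the function names are ours.

```python
import random

def random_ipv4(rng):
    """Stage 1: pick a random target address."""
    return ".".join(str(rng.randrange(1, 255)) for _ in range(4))

def codered_cycle(rng, send_exploit):
    """Stage 2: probe/attack/spread -- the single crafted HTTP
    request goes to port 80/TCP of whatever target stage 1 chose."""
    victim = random_ipv4(rng)
    send_exploit(victim, 80, "TCP")
    return victim

# Dry run: record what would be sent rather than sending anything.
attempts = []
rng = random.Random(0)  # fixed seed for reproducibility
for _ in range(3):
    codered_cycle(rng, lambda ip, port, proto: attempts.append((ip, port, proto)))

# Every victim is hit with the same single-port series (80/TCP),
# so the source would receive an adaptability score of 1.
print(sorted({(port, proto) for _, port, proto in attempts}))  # → [(80, 'TCP')]
```

The same loop with 1434/UDP in place of 80/TCP models Slammer, which shares this state machine.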
Slapper
First report by CERT: Sep. 14, 2002 [25].

The Slapper worm tests systems for mod_ssl (the vulnerable web server module) by grabbing the web server banner from 80/TCP. If mod_ssl is present, the worm connects to port 443/TCP and launches an exploit. Because this worm is Linux based, it downloads and recompiles its code on the infected system, as opposed to other worms that use static binaries. Since the worm operates in a pre-defined port order for all victims (80/TCP, 443/TCP), the worm has an adaptability score of 1.

Slammer
First report by CERT: Jan. 27, 2003 [27].

The worm sends a specially crafted UDP packet to SQL services, which causes the SQL server to start sending the same packet to random IP addresses in a never-ending loop. The worm's state machine is extremely simple and is similar to the one employed by Code Red II above: generate an IP address and probe/attack/spread. This malware has an adaptability score of 1, where the port series includes simply 1434/UDP.

Blaster
First report by CERT: Aug. 11, 2003 [28].

Blaster attacks by exploiting a weakness in Microsoft DCOM RPC [41] services. The worm connects to port 135/TCP, and the compromised host is instructed to "call back" to the attacker and retrieve the worm code. The worm operates on this outgoing port only, gaining it a score of 1.

Welchia/Nachi
First report by CERT: August 18, 2003.

Shortly after Blaster was released, a worm dubbed "Welchia" [40] was unleashed. This worm was especially interesting, as it seems to have been written with the intent of being a "benign worm". When Welchia successfully infects a host, it will first seek out and remove any copies of the Blaster worm. Additionally, the worm will attempt to install the relevant patch from Microsoft. Welchia exploits two vulnerabilities in Microsoft systems: the RPC DCOM vulnerability used by Blaster, and a vulnerability in NTDLL commonly known as the "WebDAV" vulnerability.
The worm will seek one of 76 hard-coded class B networks when attempting to infect via the WebDAV vulnerability. Presumably, the worm's author scanned these networks beforehand, as the networks are owned by Chinese organizations, and the WebDAV exploit bundled with Welchia only works on some double-byte character platforms, Chinese being one of these vulnerable platforms.
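Welchia's target selection for the WebDAV exploit can be modeled as a membership test against a fixed set of /16 (class B) networks. The sketch below is illustrative only: the two networks shown are hypothetical stand-ins, not entries from the worm's actual list of 76 networks, and the function name is ours.

```python
import ipaddress

# Hypothetical stand-ins for the hard-coded class B (/16) networks
# shipped with Welchia; the real list is not reproduced here.
HARDCODED_NETS = [ipaddress.ip_network(n)
                  for n in ("61.128.0.0/16", "218.8.0.0/16")]

def webdav_target_ok(ip):
    """Fire the WebDAV exploit only at addresses inside the worm's
    pre-scanned class B networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in HARDCODED_NETS)

print(webdav_target_ok("61.128.4.2"))  # → True
print(webdav_target_ok("192.0.2.1"))   # → False
```

Because this check only filters which hosts are attacked, it does not by itself change the per-victim order of actions that the adaptability score measures.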
As Welchia targets different hosts with its two exploits, the worm gains an adaptability score of 2.

Sasser
First report by CERT: May 1, 2004. Reported by Symantec on April 30, 2004.

The Sasser worm operates in a way not dissimilar to Blaster. The worm exploits an RPC service on the Microsoft Windows platform; the exploit causes a remote command backdoor to be opened, through which the worm instructs the victim to download and execute the malicious payload. The port series for this worm is 445/TCP, with 9996/TCP for the remote backdoor created. Some variants use ICMP echo requests to speed up the process, but all known variants follow a single predetermined port series for all victims, gaining the worm a score of 1.

Santy/PHP include worms
First report by CERT: December 21, 2004.

The Santy worm is an example of a topological worm. This worm gathers a "hit list" of targets to attack from a Google query (later variants use other search engines), looking for a vulnerable version of a PHP application (PHP is a server-side scripting language used to generate dynamic web pages), specifically phpBB – a bulletin board package. Since this worm does not perform any reconnaissance, it will bypass our chosen IDS system; Active Response technology depends on intruders using baits distributed during an early reconnaissance phase. However, going over captured traffic from this worm shows that the worm performs a simple series of actions, all over the HTTP port. The algorithm awards this worm an adaptability score of 1, as it does not divert from this order of actions, nor does it try any other services.

2.1.6 Multi-mode worms, discussion

As seen above, most worms have a very simple state machine, and they tend to follow a very specific order of actions for each host attacked. However, there are worms that will attempt several exploits against a single target; examples from the above list are Nimda and Sadmind.
Such worms will gain a higher adaptability score if they attempt a different set of attacks for each host. Multi-mode worms will still linger only a very short while with each victim, and they will still follow a pre-defined order of actions, although their algorithm will be more complex. Multi-mode worms tend to have a lower number of variants and are generally rare (although this trend may change in the future).
2.1.7 Topologically aware worms, discussion

Topological worms are worms that use additional sources of information when deciding which victim to attack; an example is Santy (above). Such worms are not more or less adaptive than other worms, as the process of choosing victims is independent from the process of attacking each victim.

These worms do present a problem for this research: our chosen IDS depends on an intruder (be it an adaptive source such as a human or an automated source) performing some reconnaissance before attacking. If a worm is topologically aware, it may be able to ignore the virtual resources the Active Response technology sets as baits, by using information gleaned elsewhere – such as a search engine. This problem is mitigated by the fact that, by looking at a traffic dump of a source infected by a topologically aware worm, an adaptability score can be determined manually.
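Determining the score manually from such a dump follows the same counting rule used for the worms above: group the dump's flow records by victim, take the ordered port series each victim saw, and count the distinct series. A minimal sketch, assuming a simplified flow-record format of (destination IP, destination port, protocol); the function name and record format are ours, not the thesis's tooling.

```python
def score_from_flows(flows):
    """Derive an adaptability score from a traffic dump.

    Groups flow records by victim, extracts the ordered port series
    attempted against each, and counts how many distinct series the
    source used.  A simplified model of the manual procedure
    described in the text.
    """
    per_victim = {}
    for dst_ip, dst_port, proto in flows:
        per_victim.setdefault(dst_ip, []).append((dst_port, proto))
    return len({tuple(series) for series in per_victim.values()})

# A Santy-like dump: every victim sees the same single HTTP series,
# regardless of how the victims were chosen.
dump = [("198.51.100.7", 80, "TCP"),
        ("203.0.113.9", 80, "TCP"),
        ("192.0.2.44", 80, "TCP")]
print(score_from_flows(dump))  # → 1
```

Because the score is computed from the per-victim order of actions alone, hit-list victim selection leaves it unchanged, which is why topological worms are "not more or less adaptive" than scanning worms.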