SMalL - Semantic Malware Log-based reporter

                               Stefan Ceriu, Stefan Prutianu
3.2 SMalL Ontology

         The SMalL Ontology is designed to aid the development of malware
prevention software by offer...
topological analysis), then copying their program instructions to those hosts. There
are five main categories of computer ...
zombie or, sometimes, a drone. Bots may be further subcategorized
         according to their delivery mechanism. For exam...
Figure 1. SMalL Ontology
3.3 SMalL Java Application

          The SMalL Java Application is a tool designed to compare available
software security...
Figure 2.1 Main application window
Figure 2.2 Add new antivirus window
Figure 2.3 Antivirus comparison window

   1.  Yu, Liang: Introduction to the Semantic Web and Semantic Web Services
   2.  Robert, Colomb: Ontology a...
SMalL - Semantic Malware Log-based reporter

Stefan Ceriu, Stefan Prutianu
Faculty of Computer Science, „Al. I. Cuza“ University, Iasi, Romania
{ stefan.ceriu, stefan.prutianu}

Abstract. In this paper we present the SMalL Ontology for malicious software classification, the SMalL Java Application for comparing antivirus systems, and the SMalL knowledge-based file format for malware-related attacks. We believe that our ontology can aid the development of malware prevention software by offering a common knowledge base and a clear classification of existing malicious software. The application is a prototype showing how this ontology might be used in conjunction with known antivirus capabilities to offer a comprehensive comparison.

Keywords: malware, semantic web, jena, owl, protégé, ontology, virus, worm, Trojan, spyware, crimeware

1 Introduction

Malware, also known as malicious code and malicious software, refers to a program that is inserted into a system, usually covertly, with the intent of compromising the confidentiality, integrity, or availability of the victim's data, applications, or operating system, or of otherwise annoying or disrupting the victim. Malware has become the most significant external threat to most systems, causing widespread damage and disruption and necessitating extensive recovery efforts within most organizations.

Spyware, malware intended to violate a user's privacy, has also become a major concern to organizations. Although privacy-violating malware has been in use for many years, it has become much more widespread recently, with spyware invading many systems to monitor personal activities and conduct financial fraud.

Organizations also face threats from a few forms of non-malware attacks that are often associated with malware. One form that has become commonplace is phishing, which uses deceptive computer-based means to trick individuals into disclosing sensitive information. Another common form is virus hoaxes, which are false warnings of new malware threats.

In the rest of this paper we look at ways to classify the different types of malware by means of a new ontology, and at an application designed to work with that ontology in order to compare the available antivirus systems.

2 Ontologies and OWL

2.1 Overview

The term ontology originates from philosophy. In that context, it is used as the name of a subfield of philosophy, namely the study of the nature of existence: the branch of metaphysics concerned with identifying, in the most general terms, the kinds of things that actually exist, and how to describe them. For example, the observation that the world is made up of specific objects that can be grouped into abstract classes based on shared properties is a typical ontological commitment.

In more recent years, however, ontology has become one of the many words hijacked by computer science and given a specific technical meaning that is rather different from the original one. Instead of "ontology" we now speak of "an ontology."

In general, an ontology formally describes a domain of discourse. Typically, an ontology consists of a finite list of terms and the relationships between these terms. The terms denote important concepts (classes of objects) of the domain. For example, in a university setting, staff members, students, courses, lecture theaters, and disciplines are some important concepts. The relationships typically include hierarchies of classes. A hierarchy specifies a class C to be a subclass of another class S if every object in C is also included in S. For example, all faculty members are staff members.

Apart from subclass relationships, ontologies may include information such as:
• properties (X teaches Y)
• value restrictions (only faculty members may teach courses)
• disjointness statements (faculty and general staff are disjoint)
• specifications of logical relationships between objects (every department must include at least ten faculty members).

In the context of the Web, ontologies provide a shared understanding of a domain. Such a shared understanding is necessary to overcome differences in terminology. One application's zip code may be the same as another application's area code. Another problem is that two applications may use the same term with different meanings. In university A, a course may refer to a degree (like computer science), while in university B it may mean a single subject (CS 101). Such differences can be overcome by mapping the particular terminology to a shared ontology or by defining direct mappings between the ontologies. In either case, it is easy to see that ontologies support semantic interoperability.

Ontologies are useful for the organization and navigation of Web sites. Many Web sites today expose, on the left-hand side of the page, the top levels of a concept hierarchy of terms. The user may click on one of them to expand the subcategories.

Ontologies are also useful for improving the accuracy of Web searches. Search engines can look for pages that refer to a precise concept in an ontology instead of collecting all pages in which certain, generally ambiguous, keywords occur. In this way, differences in terminology between Web pages and queries can be overcome. In addition, Web searches can exploit generalization/specialization information. If a query fails to find any relevant documents, the search engine may suggest a more general query to the user. It is even conceivable for the engine to run such queries proactively to reduce the reaction time in case the user adopts a suggestion. Conversely, if too many answers are retrieved, the search engine may suggest some specializations to the user.

The Web Ontology Working Group of W3C identified a number of characteristic use cases for the Semantic Web that would require much more expressiveness than RDF and RDF Schema offer. A number of research groups in both the United States and Europe had already identified the need for a more powerful ontology modeling language. This led to a joint initiative to define a richer language, called DAML+OIL (the name combines the names of the U.S. proposal DAML-ONT and the European language OIL). DAML+OIL in turn was taken as the starting point for the W3C Web Ontology Working Group in defining OWL, the language that is intended to be the standardized and broadly accepted ontology language of the Semantic Web.

Ontology languages allow users to write explicit, formal conceptualizations of domain models. The main requirements are a well-defined syntax, efficient reasoning support, a formal semantics, sufficient expressive power and convenience of expression.

The importance of a well-defined syntax is clear and known from the area of programming languages; it is a necessary condition for machine processing of information. All the languages we have presented so far have a well-defined syntax. DAML+OIL and OWL build upon RDF and RDFS and have the same kind of syntax. Of course, it is questionable whether the XML-based RDF syntax is very user-friendly; there are alternatives better suited to human users (for example, see the OIL syntax). However, this drawback is not very significant, because ultimately users will develop their ontologies with authoring tools, or more generally ontology development tools, instead of writing them directly in DAML+OIL or OWL.

A formal semantics describes the meaning of knowledge precisely. Precisely here means that the semantics does not refer to subjective intuitions, nor is it open to different interpretations by different people (or machines). The importance of a formal semantics is well established in the domain of mathematical logic, for instance. One use of a formal semantics is to allow people to reason about the knowledge. For ontological knowledge, we may reason about the following:
• Class membership. If x is an instance of a class C, and C is a subclass of D, then we can infer that x is an instance of D.
• Equivalence of classes. If class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C, too.
• Consistency. Suppose we have declared x to be an instance of the class A, and that A is a subclass of B ∩ C, A is a subclass of D, and B and D are disjoint. Then we have an inconsistency, because A should be empty but has the instance x. This is an indication of an error in the ontology.
• Classification. If we have declared that certain property-value pairs are a sufficient condition for membership in a class A, then if an individual x satisfies such conditions, we can conclude that x must be an instance of A.

Semantics is a prerequisite for reasoning support. Derivations such as the preceding ones can be made mechanically instead of being made by hand.

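The class membership and consistency derivations listed above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not how an OWL reasoner such as FaCT or RACER works internally: classes are bare names, the helper functions `superclasses`, `inferred_types` and `inconsistencies` are hypothetical, and the toy data mirrors the A, B, C, D example from the text.

```python
def superclasses(cls, subclass_of):
    """Return cls together with every class reachable through subclass links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def inferred_types(individual, asserted_types, subclass_of):
    """Class membership: an instance of C is also an instance of each superclass of C."""
    result = set()
    for cls in asserted_types.get(individual, ()):
        result |= superclasses(cls, subclass_of)
    return result


def inconsistencies(asserted_types, subclass_of, disjoint_pairs):
    """Consistency: report individuals that end up in two disjoint classes."""
    problems = []
    for individual in sorted(asserted_types):
        types = inferred_types(individual, asserted_types, subclass_of)
        for first, second in disjoint_pairs:
            if first in types and second in types:
                problems.append((individual, first, second))
    return problems


# Hypothetical toy data: A is a subclass of B, C and D; B and D are disjoint.
subclass_of = {"A": {"B", "C", "D"}, "Faculty": {"Staff"}}
asserted_types = {"x": {"A"}, "alice": {"Faculty"}}
disjoint_pairs = [("B", "D")]

print(sorted(inferred_types("alice", asserted_types, subclass_of)))
print(inconsistencies(asserted_types, subclass_of, disjoint_pairs))
```

Here `alice` is inferred to be a staff member because every faculty member is one, and `x` is reported as a problem because its class A sits below both B and D, which were declared disjoint.
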
Reasoning support is important because it allows one to:
• check the consistency of the ontology and the knowledge
• check for unintended relationships between classes
• automatically classify instances in classes

Automated reasoning support allows one to check many more cases than could be checked manually. Checks like the preceding ones are valuable for designing large ontologies, where multiple authors are involved, and for integrating and sharing ontologies from various sources.

A formal semantics and reasoning support are usually provided by mapping an ontology language to a known logical formalism, and by using automated reasoners that already exist for those formalisms. OWL is (partially) mapped onto a description logic, and makes use of existing reasoners such as FaCT and RACER. Description logics are a subset of predicate logic for which efficient reasoning support is possible.

RDF and RDFS allow the representation of some ontological knowledge. The main modeling primitives of RDF/RDFS concern the organization of vocabularies in typed hierarchies: subclass and sub-property relationships, domain and range restrictions, and instances of classes. However, a number of other features are missing. Here we list a few:
• Local scope of properties. rdfs:range defines the range of a property, say eats, for all classes. Thus in RDF Schema we cannot declare range restrictions that apply to some classes only. For example, we cannot say that cows eat only plants, while other animals may eat meat too.
• Disjointness of classes. Sometimes we wish to say that classes are disjoint. For example, male and female are disjoint. But in RDF Schema we can only state subclass relationships, e.g., female is a subclass of person.
• Boolean combinations of classes. Sometimes we wish to build new classes by combining other classes using union, intersection, and complement. For example, we may wish to define the class person to be the disjoint union of the classes male and female. RDF Schema does not allow such definitions.
• Cardinality restrictions. Sometimes we wish to place restrictions on how many distinct values a property may or must take. For example, we would like to say that a person has exactly two parents, or that a course is taught by at least one lecturer. Again, such restrictions are impossible to express in RDF Schema.
• Special characteristics of properties. Sometimes it is useful to say that a property is transitive (like "greater than"), unique (like "is mother of"), or the inverse of another property (like "eats" and "is eaten by").

Thus we need an ontology language that is richer than RDF Schema, a language that offers these features and more. In designing such a language one should be aware of the trade-off between expressive power and efficient reasoning support. Generally speaking, the richer the language, the more inefficient the reasoning support becomes, often crossing the border of non-computability. Thus we need a compromise: a language that can be supported by reasonably efficient reasoners while being sufficiently expressive to express large classes of ontologies and knowledge.

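To make the "special characteristics of properties" item above more concrete, the following minimal sketch shows what declaring a property inverse or transitive lets a reasoner derive. It is plain Python over sets of pairs, not OWL syntax and not the Jena API; the function names `inverse_of` and `transitive_closure` and the sample data are assumptions made for illustration.

```python
def inverse_of(pairs):
    """Inverse property: if (a, b) holds for P, then (b, a) holds for its inverse."""
    return {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """Transitive property: keep adding (a, c) whenever (a, b) and (b, c) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical assertions for the two example properties named in the text.
eats = {("cow", "grass")}
is_eaten_by = inverse_of(eats)

greater_than = {(3, 2), (2, 1)}
all_greater_than = transitive_closure(greater_than)

print(is_eaten_by)
print(sorted(all_greater_than))
```

Only the pairs in `eats` and `greater_than` are stated explicitly; the pair ("grass", "cow") and the pair (3, 1) follow from the declared characteristics, which is exactly the kind of statement RDF Schema cannot express.
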
2.2 Protégé

Knowledge about the application domain is one of the most important cornerstones of successful software projects. We must gather at least a basic understanding of the concepts relevant to our customers before we can begin coding. For example, we need to know how a customer's business processes work before we can develop a warehouse management system; we need to know that users who buy cat food might also be interested in cat litter before we can implement purchase recommendations for an online shop. We acquire such knowledge from domain experts and capture it in some kind of domain model.

In simple cases, we can scribble these models on paper. This approach works fine for small projects and when the experts help us decipher their handwriting. But it is better to have models that translate directly into a Java program. For instance, we can use the Unified Modeling Language (UML) to sketch the domain models with class diagrams and use cases. UML is quite good for getting to an implementation quickly, but it is basically a language for object-oriented programming that few domain experts fully understand. It also consists of a fixed set of modeling constructs (such as classes and attributes) that are not very useful when domain experts would rather talk about specific business processes and products.

The Protégé-OWL editor is an extension of Protégé that supports the Web Ontology Language (OWL). OWL is the most recent development in standard ontology languages, endorsed by the World Wide Web Consortium (W3C) to promote the Semantic Web vision. An OWL ontology may include descriptions of classes, properties and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology but entailed by the semantics. These entailments may be based on a single document or on multiple distributed documents that have been combined using defined OWL mechanisms.

The Protégé-OWL editor enables users to:
• Load and save OWL and RDF ontologies.
• Edit and visualize classes, properties, and SWRL rules.
• Define logical class characteristics as OWL expressions.
• Execute reasoners such as description logic classifiers.
• Edit OWL individuals for Semantic Web markup.

Protégé-OWL's flexible architecture makes it easy to configure and extend the tool. It is tightly integrated with Jena and has an open-source Java API for the development of custom-tailored user interface components or arbitrary Semantic Web services.

From a programmer's perspective, one of Protégé's most attractive features is that it provides an open-source API for plugging in our own Java components and accessing the domain models from an application. As a result, systems can be developed very rapidly: start with the underlying domain model, let Protégé generate the basic user interface, and then gradually write widgets and plug-ins to customize look-and-feel and behavior.

Individuals represent objects in the domain in which we are interested 2. An important difference between Protégé and OWL is that OWL does not use the Unique Name Assumption (UNA). This means that two different names could actually refer to the same individual. For example, "Queen Elizabeth", "The Queen" and "Elizabeth Windsor" might all refer to the same individual. In OWL, it must be explicitly stated that individuals are the same as each other, or different from each other; otherwise they might be the same as each other, or they might be different from each other.

Properties are binary relations on individuals, i.e. properties link two individuals together. For example, the property hasSibling might link the individual Matthew to the individual Gemma, or the property hasChild might link the individual Peter to the individual Matthew. Properties can have inverses. For example, the inverse of hasOwner is isOwnedBy. Properties can be limited to having a single value, i.e. to being functional. They can also be either transitive or symmetric.

OWL classes are interpreted as sets that contain individuals. They are described using formal (mathematical) descriptions that state precisely the requirements for membership of the class. For example, the class Cat would contain all the individuals that are cats in our domain of interest. Classes may be organised into a superclass-subclass hierarchy, which is also known as a taxonomy. Subclasses specialize ('are subsumed by') their superclasses. For example, consider the classes Animal and Cat: Cat might be a subclass of Animal (so Animal is the superclass of Cat). This says that 'All cats are animals', 'All members of the class Cat are members of the class Animal', 'Being a Cat implies that you're an Animal', and 'Cat is subsumed by Animal'. One of the key features of OWL-DL is that these superclass-subclass relationships (subsumption relationships) can be computed automatically by a reasoner.

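The remarks above on the Unique Name Assumption and on functional properties interact in a way that a short sketch may clarify. Because OWL does not assume that different names denote different individuals, two values asserted for a functional property of the same subject are taken to name the same individual, and a contradiction only arises if those names were explicitly declared different. The code below is a simplified illustration in plain Python; `same_individuals`, the hypothetical functional property hasMonarch and the sample assertions are assumptions, not part of OWL, Protégé or Jena.

```python
def find(parent, name):
    """Follow parent links to the representative of the group that name belongs to."""
    while parent.setdefault(name, name) != name:
        name = parent[name]
    return name


def same_individuals(assertions, different_from=()):
    """Merge the values a functional property takes for the same subject.

    assertions: (subject, value) pairs for ONE functional property.
    different_from: pairs of names explicitly declared to be different.
    Returns (groups of names inferred to be the same, clashing pairs).
    """
    parent, first_value = {}, {}
    for subject, value in assertions:
        find(parent, value)  # register the name
        if subject in first_value:
            root_a = find(parent, first_value[subject])
            root_b = find(parent, value)
            if root_a != root_b:
                parent[root_b] = root_a  # functional: both names denote one individual
        else:
            first_value[subject] = value
    groups = {}
    for name in list(parent):
        groups.setdefault(find(parent, name), set()).add(name)
    merged = [group for group in groups.values() if len(group) > 1]
    clashes = [
        (a, b)
        for a, b in different_from
        if a in parent and b in parent and find(parent, a) == find(parent, b)
    ]
    return merged, clashes


# Hypothetical functional property hasMonarch, asserted three times for one subject.
assertions = [
    ("UK", "Queen Elizabeth"),
    ("UK", "Elizabeth Windsor"),
    ("UK", "The Queen"),
]
merged, clashes = same_individuals(assertions)
print(merged, clashes)
```

Under a Unique Name Assumption the three assertions would be an error; without it they simply lead to the conclusion that the three names refer to one individual, unless a `different_from` pair says otherwise.
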
In OWL, classes are built up of descriptions that specify the conditions that must be satisfied by an individual for it to be a member of the class. OWL classes are assumed to 'overlap'. We therefore cannot assume that an individual is not a member of a particular class simply because it has not been asserted to be a member of that class. In order to 'separate' a group of classes we must make them disjoint from one another. This ensures that an individual that has been asserted to be a member of one of the classes in the group cannot be a member of any other class in that group.

One of the key features of ontologies that are described using OWL-DL is that they can be processed by a reasoner. One of the main services offered by a reasoner is to test whether or not one class is a subclass of another class. By performing such tests on the classes in an ontology, a reasoner can compute the inferred ontology class hierarchy. Another standard service offered by reasoners is consistency checking. Based on the description (conditions) of a class, the reasoner can check whether or not it is possible for the class to have any instances. A class is deemed to be inconsistent if it cannot possibly have any instances.

Protégé allows different OWL reasoners to be plugged in; the reasoner shipped with Protégé is called FaCT++. The ontology can be 'sent to the reasoner' to automatically compute the classification hierarchy and also to check the logical consistency of the ontology. In Protégé the 'manually constructed' class hierarchy is called the asserted hierarchy. The class hierarchy that is automatically computed by the reasoner is called the inferred hierarchy.

Being able to use a reasoner to automatically compute the class hierarchy is one of the major benefits of building an ontology using the OWL-DL sub-language. When constructing very large ontologies (with upwards of several thousand classes) the use of a reasoner to compute subclass-superclass relationships between classes becomes almost vital. Without a reasoner it is very difficult to keep large ontologies in a maintainable and logically correct state. In cases where ontologies can have classes with many superclasses (multiple inheritance), it is nearly always a good idea to construct the class hierarchy as a simple tree. Classes in the asserted (manually constructed) hierarchy therefore have no more than one superclass. Computing and maintaining multiple inheritance is the job of the reasoner. This technique helps to keep the ontology in a maintainable and modular state. Not only does this promote the reuse of the ontology by other ontologies and applications, it also minimizes the human errors that are inherent in maintaining a multiple inheritance hierarchy.

3 Malware

3.1 Overview

Malware, short for malicious software, is software designed to infiltrate a computer system without the owner's informed consent. The expression is a general term used by computer professionals to mean a variety of forms of hostile, intrusive, or annoying software or program code. The term "computer virus" is sometimes used as a catch-all phrase to include all types of malware, including true viruses.

Software is considered malware based on the perceived intent of the creator rather than any particular features. Malware includes computer viruses, worms, Trojan horses, most rootkits, spyware, dishonest adware, crimeware and other malicious and unwanted software. In law, malware is sometimes known as a computer contaminant, for instance in the legal codes of several U.S. states, including California and West Virginia.

Malware is not the same as defective software, that is, software that has a legitimate purpose but contains harmful bugs.

Preliminary results from Symantec published in 2008 suggested that "the release rate of malicious code and other unwanted programs may be exceeding that of legitimate software applications". According to F-Secure, "as much malware [was] produced in 2007 as in the previous 20 years altogether." Malware's most common pathway from criminals to users is through the Internet: primarily by e-mail and the World Wide Web.

The prevalence of malware as a vehicle for organized Internet crime, along with the general inability of traditional anti-malware protection platforms to protect against the continuous stream of unique and newly produced professional malware, has led businesses operating on the Internet to adopt a new mindset: the acknowledgment that some sizable percentage of Internet customers will always be infected for some reason or other, and that they need to continue doing business with infected customers. The result is a greater emphasis on back-office systems designed to spot fraudulent activities associated with advanced malware operating on customers' computers.

Many early infectious programs, including the first Internet Worm and a number of MS-DOS viruses, were written as experiments or pranks, generally intended to be harmless or merely annoying rather than to cause serious damage to computers. In some cases the perpetrators did not realize how much harm their creations could do. Young programmers learning about viruses and their techniques wrote them simply to prove that they could, or to see how far the programs would spread. As late as 1999, widespread viruses such as the Melissa virus appear to have been written chiefly as pranks.

Hostile intent related to vandalism can be found in programs designed to cause harm or data loss. Many DOS viruses, and the Windows ExploreZip worm, were designed to destroy files on a hard disk, or to corrupt the file system by writing invalid data. Network-borne worms such as the 2001 Code Red worm or the Ramen worm fall into the same category. Designed to vandalize web pages, such worms may seem like the online equivalent of graffiti tagging, with the author's alias or affinity group appearing everywhere the worm goes.

However, since the rise of widespread broadband Internet access, malicious software has come to be designed for a profit motive, either more or less legal (forced advertising) or criminal. For instance, since 2003, the majority of widespread viruses and worms have been designed to take control of users' computers for black-market exploitation. Infected "zombie computers" are used to send email spam, to host contraband data such as child pornography, or to engage in distributed denial-of-service attacks as a form of extortion.

Another strictly for-profit category of malware has emerged in spyware: programs designed to monitor users' web browsing, display unsolicited advertisements, or redirect affiliate marketing revenues to the spyware creator.
Spyware programs do not spread like viruses; they are, in general, installed by exploiting security holes or are packaged with user-installed software, such as peer-to-peer applications.

The best-known types of malware, viruses and worms, are known for the manner in which they spread rather than for any other particular behavior. The term computer virus is used for a program that has infected some executable software and that causes that software, when run, to spread the virus to other executable software. Viruses may also contain a payload that performs other actions, often malicious. A worm, on the other hand, is a program that actively transmits itself over a network to infect other computers. It too may carry a payload.

These definitions lead to the observation that a virus requires user intervention to spread, whereas a worm spreads automatically. Using this distinction, infections transmitted by email or Microsoft Word documents, which rely on the recipient opening a file or email to infect the system, would be classified as viruses rather than worms. Some writers in the trade and popular press appear to misunderstand this distinction, and use the terms interchangeably.

For a malicious program to accomplish its goals, it must be able to do so without being shut down or deleted by the user or administrator of the computer on which it is running. Concealment can also help get the malware installed in the first place. When a malicious program is disguised as something innocuous or desirable, users may be tempted to install it without knowing what it does. This is the technique of the Trojan horse or Trojan.

In broad terms, a Trojan horse is any program that invites the user to run it while concealing a harmful or malicious payload. The payload may take effect immediately and can lead to many undesirable effects, such as deleting the user's files or installing further malicious or undesirable software. Trojan horses known as droppers are used to start off a worm outbreak by injecting the worm into users' local networks.

One of the most common ways that spyware is distributed is as a Trojan horse, bundled with a piece of desirable software that the user downloads from the Internet. When the user installs the software, the spyware is installed alongside it. Spyware authors who attempt to act in a legal fashion may include an end-user license agreement that states the behavior of the spyware in loose terms, and which users are unlikely to read or understand.

Once a malicious program is installed on a system, it is essential that it stay concealed, to avoid detection and disinfection. The same is true when a human attacker breaks into a computer directly. Techniques known as rootkits allow this concealment by modifying the host operating system so that the malware is hidden from the user. Rootkits can prevent a malicious process from being visible in the system's list of processes, or keep its files from being read. Originally, a rootkit was a set of tools installed by a human attacker on a Unix system where the attacker had gained administrator (root) access. Today, the term is used more generally for concealment routines in a malicious program.

Some malicious programs contain routines to defend against removal: not merely to hide themselves, but to repel attempts to remove them. An early example of this behavior is recorded in the Jargon File tale of a pair of programs infesting a Xerox CP-V timesharing system. Each ghost-job would detect the fact that the other had been killed, and would start a new copy of the recently slain program within a few milliseconds. The only way to kill both ghosts was to kill them simultaneously (very difficult) or to deliberately crash the system. Similar techniques are used by some modern malware, which starts a number of processes that monitor and restore one another as needed.

A backdoor is a method of bypassing normal authentication procedures. Once a system has been compromised (by one of the above methods, or in some other way), one or more backdoors may be installed in order to allow easier access in the future. Backdoors may also be installed prior to malicious software, to allow attackers entry. The idea has often been suggested that computer manufacturers preinstall backdoors on their systems to provide technical support for customers, but this has never been reliably verified. Crackers typically use backdoors to secure remote access to a computer while attempting to remain hidden from casual inspection. To install backdoors, crackers may use Trojan horses, worms, or other methods.

During the 1980s and 1990s, it was usually taken for granted that malicious programs were created as a form of vandalism or prank. More recently, the greater share of malware programs have been written with a financial or profit motive in mind. This can be seen as the malware authors' choice to monetize their control over infected systems: to turn that control into a source of revenue.

Spyware programs are commercially produced for the purpose of gathering information about computer users, showing them pop-up ads, or altering web-browser behavior for the financial benefit of the spyware creator. For instance, some spyware programs redirect search engine results to paid advertisements. Others, often called "stealware" by the media, overwrite affiliate marketing codes so that revenue is redirected to the spyware creator rather than the intended recipient.

Spyware programs are sometimes installed as Trojan horses of one sort or another. They differ in that their creators present themselves openly as businesses, for instance by selling advertising space on the pop-ups created by the malware. Most such programs present the user with an end-user license agreement that purportedly protects the creator from prosecution under computer contaminant laws. However, spyware EULAs have not yet been upheld in court.

Another way that financially motivated malware creators can profit from their infections is to use the infected computers directly to do work for them. The infected computers are used as proxies to send out spam messages. A computer left in this state is often known as a zombie computer. The advantage to spammers of using infected computers is that they provide anonymity, protecting the spammer from prosecution. Spammers have also used infected PCs to target anti-spam organizations with distributed denial-of-service attacks.

In order to coordinate the activity of many infected computers, attackers have used coordinating systems known as botnets. In a botnet, the malware or malbot logs in to an Internet Relay Chat channel or other chat system. The attacker can then give instructions to all the infected systems simultaneously. Botnets can also be used to push upgraded malware to the infected systems, keeping them resistant to antivirus software or other security measures.

         It is possible for a malware creator to profit by stealing sensitive information from a victim. Some malware programs install a key logger, which intercepts the user's keystrokes when entering a password, credit card number, or other information that may be exploited. This information is then transmitted to the malware creator automatically, enabling credit card fraud and other theft. Similarly, malware may copy the CD key or password for online games, allowing the creator to steal accounts or virtual items. Another way of stealing money from the infected PC owner is to take control of a dial-up modem and dial an expensive toll call. Dialer (or porn dialer) software dials up a premium-rate telephone number such as a U.S. "900 number" and leaves the line open, charging the toll to the infected user.
         Data-stealing malware is a web threat that divests victims of personal and proprietary information with the intent of monetizing stolen data through direct use or underground distribution. Content security threats that fall under this umbrella include keyloggers, screen scrapers, spyware, adware, backdoors, and bots. The term does not refer to activities such as spam, phishing, DNS poisoning, SEO abuse, etc. However, when these threats result in file download or direct installation, as most hybrid attacks do, files that act as agents to proxy information will fall into the data-stealing malware category.
3.2 SMalL Ontology

         The SMalL Ontology is designed to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing malicious software. It covers all the different categories and subcategories of malware and is organized by behavior, propagation method, payload, motivation, etc. The ontology is divided into five main categories based on the major malicious software threats: Crimeware, Spyware, Trojans, Viruses and Worms.
         A virus replicates by attaching its program instructions to an ordinary "host" program or document, so that the virus instructions are executed when the host program is executed. There are five main virus categories:
     • File virus - uses the file system of a given OS (or more than one) to propagate. File viruses include viruses that infect executable files, companion viruses that create duplicates of files, viruses that copy themselves into various directories, and link viruses that exploit file system features.
     • Boot sector virus - infects the boot sector or the master boot record, or displaces the active boot sector, of a hard drive. Once the hard drive is booted up, boot sector viruses load themselves into the computer's memory. Many boot sector viruses, once executed, prevent the OS from booting. Boot sector viruses were widespread in the 1990s, but have almost disappeared since the introduction of 32-bit processors and the near-disappearance of floppy disks as a storage medium for executables.
     • Macro virus - written in the macro scripting languages of word processing, accounting, editing, or project applications, it propagates by exploiting the macro language's properties in order to transfer itself from the infected file containing the macro script to another file. The most widespread macro viruses are for Microsoft Office applications (Word, Excel, PowerPoint, Access).
       Because they are written in the code of application software, macro viruses are platform independent and can spread between Mac, Windows, Linux, and any other system running the targeted application.
     • Email virus - refers to the delivery mechanism rather than the infection target or behavior. Any of the above types of virus can be transmitted by email: the virus copies itself and emails the copy to every address in the victim's email address book, usually within an email attachment. Each time a recipient opens the infected attachment, the virus harvests that victim's email address book and repeats its propagation process.
     • Multi-variant virus - the same core virus implemented with slight variations, so that an anti-virus scanner that can detect one variant will not be able to detect the other variants.
         A worm is a self-propagating program that spreads over a network, usually the Internet. Unlike viruses, worms may not depend on other programs or victim actions (such as opening an infected email attachment or clicking on the Web link for a malware Web site) for replication, dissemination, or execution. Worms spread by locating other vulnerable potential hosts on the network (e.g., via scanning or
topological analysis), then copying their program instructions to those hosts. There are five main categories of computer worms:
     • Email worm - spreads via infected email attachments.
     • Instant messaging worm - spreads via infected attachments to IM messages or reader access to Uniform Resource Locators (URLs) in IM messages that point to malicious Web sites from which the worm is downloaded.
     • IRC worm - comparable to IM worms, but exploits IRC rather than IM channels.
     • P2P worm - copies itself into a shared folder, then uses P2P mechanisms to announce its existence in hopes that other P2P users will download and execute it.
     • Web worm - spreads via user access to a Web page, File Transfer Protocol (FTP) site, or other Internet resource.
         A Trojan horse is a destructive program that masquerades as a benign program. Stealthware such as spyware, rootkits, keyloggers, trapdoors, and certain adware represents a subset of Trojans that is intentionally designed to be hard to detect or undetectable. Trojan horse software installs itself on the victim's computer when the victim opens an email attachment or computer file containing the Trojan, or clicks on a Web link that directs the victim's browser to a Web site from which the Trojan is automatically downloaded. Once installed, the software can be controlled remotely by hackers for criminal or other malicious purposes, such as extracting money, passwords, or other sensitive information, or to create a zombie from which to disseminate spam, phishing emails, the same Trojan, or other malware to other computers on the network or Internet. Trojan horses are classified in six categories:
     • Backdoor Trojan (also known as Trapdoor Trojan or Remote-Access Trojan) - acts as a remote administration utility that enables control of the infected machine by a remote host.
     • Data-collecting Trojan - surreptitiously collects and sends back information from the victim's machine.
       The surreptitious nature of such software has led to it being referred to as "stealthware."
     • Downloader or Dropper - downloads, installs, and in the case of the Downloader, launches additional malware on the victim's machine.
     • Proxy Trojan - turns the victim's computer into a proxy server (i.e., a zombie) that operates on behalf of the remote attacker. If the attacker's activities are detected and tracked, the trail leads back to the victim rather than to the attacker.
     • Rootkit - a collection of programs used by a hacker to evade detection while trying to gain unauthorized access to the victim's computer. Rootkits are designed to hide processes, files or Windows Registry entries. Hackers use rootkits to hide their tracks or to insert threats surreptitiously on compromised computers. Various types of malware use rootkits to hide themselves on a computer.
     • Bot - any type of malware (e.g., Trojan, worm, spyware bots or spybots) that enables the attacker to surreptitiously gain complete control of the infected machine. A computer that has been infected by a bot is referred to as a
       zombie or, sometimes, a drone. Bots may be further subcategorized according to their delivery mechanism. For example, a Spam bot is similar to an email virus or mass-mailing worm in that it relies on the intended victim's action to activate it, either by opening an attachment affixed to a spam email, or by clicking on a Web link within a spam email which points to a Web site from which the bot is downloaded to the victim's computer.
         Spyware represents non-Trojan stealthware that has the same objectives and performs the same types of actions as spyware Trojans. A number of bots have spyware capabilities, and are referred to as spybots. Spyware falls into two main categories:
     • Adware - software that automatically displays advertising material to the user, resulting in an unpleasant user experience. If malicious, adware usually exhibits the behaviors and/or infection techniques used by viruses, worms, and/or spyware.
     • Tracking cookie - a cookie is a data structure that stores information about a user's browser session state. While cookies are a necessary component of how many Web sites operate, tracking cookies are specifically designed to track a user's behavior across multiple sites. Spyware sites routinely use tracking cookies to monitor a user's browsing behavior and associate it with the user's personal data such as name, credit card number, and other private information, which can then be harvested and sold to illicit marketers or cybercriminals.
         Crimeware is malware used in aid of criminal activities, and there are specific types of malware used predominantly or exclusively as crimeware. Five main types of crimeware are known:
     • Email redirector - used to intercept and relay outgoing emails to the attacker's system.
     • IM redirector - used to intercept and relay outgoing instant messages to the attacker's system.
     • Clicker - redirects the victim to a Web site or Internet resource by sending the necessary commands to the victim's browser or replacing the system file(s) in which standard Internet URLs are stored (e.g., the Microsoft Windows hosts file).
     • Transaction generator - targets not the end-user computer but the computer of a corporate or financial institution's computer center. The software generates fraudulent transactions on behalf of the attacker within the victim organization's payment processing or other financial systems. In some instances, transaction generators are used to intercept credit card data for abuse by the attacker.
     • Session hijacker - usually a malicious browser component that, after the victim logs in or begins a browser session, takes over that session to enable a hacker to exploit it, usually to perform criminal actions, such as transferring money from the victim's bank account.
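The five-category classification described in this section can be sketched as a small lookup structure. The following is only an illustration in plain Python, not the OWL representation used in the SMalL Ontology itself; the names TAXONOMY and category_of are hypothetical, and the subcategory labels are taken from the lists in this section.

```python
# Illustrative only: the real SMalL Ontology is an OWL class hierarchy.
# TAXONOMY and category_of are hypothetical names introduced for this sketch.
TAXONOMY = {
    "Virus": ["File virus", "Boot sector virus", "Macro virus",
              "Email virus", "Multi-variant virus"],
    "Worm": ["Email worm", "Instant messaging worm", "IRC worm",
             "P2P worm", "Web worm"],
    "Trojan": ["Backdoor Trojan", "Data-collecting Trojan",
               "Downloader or Dropper", "Proxy Trojan", "Rootkit", "Bot"],
    "Spyware": ["Adware", "Tracking cookie"],
    "Crimeware": ["Email redirector", "IM redirector", "Clicker",
                  "Transaction generator", "Session hijacker"],
}


def category_of(malware_type):
    """Return the main category that lists malware_type, or None if unknown."""
    wanted = malware_type.strip().lower()
    for category, subtypes in TAXONOMY.items():
        # Case-insensitive match against the subcategory labels.
        if wanted in (subtype.lower() for subtype in subtypes):
            return category
    return None


if __name__ == "__main__":
    print(category_of("Rootkit"))
    print(category_of("adware"))
```

A flat dictionary is enough here because the classification in the text is two levels deep; a real ontology would additionally carry properties such as propagation method or payload on each class.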
Figure 1. SMalL Ontology
3.3 SMalL Java Application

         The SMalL Java Application is a tool designed to compare available software security systems. It works in conjunction with the SMalL Ontology to give users better ways to examine the similarities and differences between antivirus solutions. The application allows the user to add a new antivirus to the ontology and link its properties to the available malware knowledge base. The user can afterwards compare the security systems and see exactly which ones protect against a given type of malware and which do not, which operating systems they run on, etc. The main application windows are presented in Figure 2.1, Figure 2.2 and Figure 2.3.

3.4 SMalL File Format

         We believe that the file format for malware-related attacks can be an OWL file created by extracting data relevant to the given attack directly from the SMalL Ontology. For example, in the case of an adware attack the file could contain the antivirus used, the operating system it runs on, and the fact that the system might also be infected with a Trojan. If this is the case and the antivirus did not manage to find the Trojan, then supplementary scans are required to find the problem. If a system is infected by multiple malware programs, a custom file can be created that relates the problems to one another, so that on later occasions the antivirus can check for all of them when one appears.

3.5 Conclusions

         We created an ontology for malicious software classification which is able to aid the development of malware prevention software by offering a common knowledge base and a clear classification of the existing security issues. We presented an application prototype which handles antivirus software comparison based on the information available in the ontology and user-entered data. We also proposed the SMalL file format, which is a comprehensive way to report software security issues and opens new possibilities for scanning for software security problems.
Figure 2.1 Main application window
Figure 2.2 Add new antivirus window
Figure 2.3 Antivirus comparison window
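The comparison that the application performs (Section 3.3) can be sketched as a set comparison over the malware types and operating systems linked to each antivirus. This is a minimal sketch in Python, not the authors' Java implementation: the Antivirus class, the compare function and the two sample products with their coverage are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antivirus:
    """Hypothetical record for one antivirus entry in the knowledge base."""
    name: str
    protects_against: frozenset   # malware types taken from the taxonomy
    operating_systems: frozenset  # platforms the product runs on


def compare(first, second):
    """Summarize where two antivirus entries overlap and where they differ."""
    return {
        "both": sorted(first.protects_against & second.protects_against),
        "only_first": sorted(first.protects_against - second.protects_against),
        "only_second": sorted(second.protects_against - first.protects_against),
        "shared_os": sorted(first.operating_systems & second.operating_systems),
    }


# Invented sample data; these are not real products or measured results.
AV_A = Antivirus("ExampleAV-A",
                 frozenset({"Adware", "Rootkit", "Email worm"}),
                 frozenset({"Windows", "Linux"}))
AV_B = Antivirus("ExampleAV-B",
                 frozenset({"Adware", "Macro virus"}),
                 frozenset({"Windows"}))

if __name__ == "__main__":
    for key, value in compare(AV_A, AV_B).items():
        print(key, value)
```

Sets keep the sketch short because "which product covers a given malware type and which does not" reduces to intersection and difference; in the application itself this information would come from the ontology rather than from hard-coded values.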
References

   1.  Yu, Liang: Introduction to the Semantic Web and Semantic Web Services
   2.  Robert, Colomb: Ontology and the Semantic Web
   3.  Matthew, Horridge: A Practical Guide To Building OWL Ontologies Using Protégé 4 and CO-ODE Tools
   4.  Nicholas, Weaver, Vern, Paxson, Stuart, Staniford, Robert, Cunningham: A Taxonomy of Computer Worms
   5.  Information Assurance Tools Report: Malware
   6.  AntiVirus Software Review:
   7.  Protégé documentation:
   8.  Joanna, Rutkowska: Introducing Stealth Malware Taxonomy
   9.  Peter, Mell, Karen, Kent, Joseph, Nusbaum: Introducing Stealth Malware Taxonomy
   10. Peter, Gutmann: The commercial malware industry
   11. Grigoris, Antoniou, Frank, van Harmelen: Web Ontology Language: OWL
   12. Jena documentation: