Deductive Databases



Is it possible to create applications that rely on smaller volumes of data? Can applications really be made more intelligent if they deal with less data? And if so, in what ways can they reason? Can this be done on the existing data storage solutions or should we adopt new ones? Furthermore, how can applications deal with multimedia in order to take full advantage of it? How can multimedia be treated differently from text content? And finally, how can we apply all of the above in today's applications?

Published in: Technology


  1. 1. Antonin University, Baabda - 2010. PJI Report. Maroun Baydoun, INF 1312. Deductive Databases. Under the supervision of Mr. Samir Saad. Presented to Mr. Chady Abou Jaoudé.
  2. 2. I would like to take the opportunity to thank Father Fady Fadel and Dr. Paul Ghobril for providing us with a great level of education and putting at our disposal all the tools we need to succeed. I would also like to thank Mr. Samir Saad for agreeing to supervise me in this project. Last but not least, I would like to thank the open source community for supplying the prerequisites for my project and supporting me on the online forums.
  3. 3. Table of contents:
     I. Technological context and motivation
        1. Introduction
        2. Theoretical study
        3. Summary
     II. Functional and Technical context
        1. Introduction
        2. Tools and technologies used
        3. Developed solution
        4. UML Diagrams
           a. DCU
           b. DES
           c. DCL
        5. Classes
        6. Summary
     III. Conclusion and future work
     List of figures
     List of references
     Appendices
  4. 4. I. Technological context and motivation
  5. 5. A. Introduction:
     Storing data is among the most important issues in every area of IT. Developed solutions usually rely on data stores to keep and retrieve information. As applications become more and more complex, so does the data they deal with. Whether they use SQL databases, XML files, LDAP directories or any other storage medium, large amounts of data require considerable space and imply performance penalties, not to mention maintenance and administration nightmares.
     Even though big volumes of data can cause that much trouble, they are necessary in many cases; some applications simply need that much data to function properly. Take for example a large-scale e-commerce web application. It needs access to a large array of information (users' details and credentials, products, transactions…); without this data, the application is pretty useless. On the other hand, no matter how useful this data is, we can't neglect the implications of storing and maintaining it. Large disk space, large numbers of backups and considerable work hours to keep it in good condition are just a few of the hassles caused by data in working environments.
     Another drawback of using large amounts of data is that it makes applications look and behave less "smartly". Feeding applications all this data ultimately means that they can't think for themselves. Their role is limited to accessing and updating the given data without really reasoning beyond the business rules established in them. In a fast-evolving world like ours, applications can't remain reduced to that function: applications need to start thinking!
     At the same time, as applications become more advanced, so does their content. It is inconceivable to claim that they are solely charged with delivering textual content: in the 21st century, applications are becoming a key player in conveying multimedia materials. This situation urges applications to start treating this kind of content differently; multimedia should be handled from a different perspective, one that accounts for its substance and not just its presentation.
  6. 6. All that was stated earlier raises many questions. Firstly, is it possible to create applications that rely on smaller volumes of data? Can applications really be made more intelligent if they deal with less data? And if so, in what ways can they reason? Can this be done on the existing data storage solutions or should we adopt new ones? Furthermore, how can applications deal with multimedia in order to take full advantage of it? How can multimedia be treated differently from text content? And finally, how can we apply all of the above in today's applications?
  7. 7. B. Theoretical study:
     Databases: The prime medium for data storage
     In the early days of computing, data was held on decks of cards and magnetic tapes. Programs read the files, changed their contents and wrote them back again as a new version. The first revolution came with the invention of disk technology. It made it possible to change file contents without copying their entire content. However, at this point, the structure of the files was still embedded directly in the programs responsible for the updates.
     The next advance was the introduction of database technology. It provides a way to describe data independently from the application program. Data finally became a separate resource in its own right. Another advantage of this technology is that the procedures used to access and update the data are hidden from the programmer. The structure and content of a database record was still described independently from the way it's stored, accessed and updated, and the connections between records were represented as pointers.
     The most important leap was the introduction of relational databases, where all records can be identified by their contents. The real innovation behind relational databases is that they follow the mathematical theory of relations. Another advance is query optimization, which was made possible by the ability to transform a query from one expression into another that has the same result as the first. [i]
     Databases are particularly useful under the following circumstances:
     • Concurrent changes to the data.
     • Regular changes to the data.
     • Large sets of data need to be shared among many people.
     • Queries need to be executed fast without analysis.
  8. 8. SQL: Making dialogs with the database
     In the early 70s, IBM engineers developed the SQL language to manipulate and retrieve data stored in their relational database management system (System R). Initially called SEQUEL (Structured English Query Language), the name was later changed to SQL for legal reasons.
     The SQL language is composed of the following elements:
     • Clauses, constituting components of statements and queries.
     • Expressions, producing scalar values or tables consisting of columns and rows of data.
     • Predicates, specifying conditions used to limit the effects of statements and queries, or to change program flow.
     • Queries, which retrieve data based on specific criteria.
     • Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
     • Insignificant whitespace, generally ignored in SQL statements and queries, making it easier to format SQL code.
     • The semicolon statement terminator (';'), which is optional on some platforms but is still part of the standard SQL grammar. [ii]
     The following figure is a simple SQL statement that illustrates the different components of the SQL language:
  9. 9. The SQL language is used as a front-end for many RDBMS such as Oracle, MS SQL and MySQL. While these systems adhere to the standard SQL grammar, each adds its own set of custom constructs to offer more advanced functionality. Many of these RDBMS also provide their own procedural languages that make up for SQL's lack of procedural capabilities, such as T-SQL for MS SQL and PL/SQL for Oracle.
     The SQL language can be divided into three subsystems according to functionality:
     • Data description: describes the structure of tables, views and other kinds of objects (commonly referred to as DDL).
     • Data access: enables reading, saving, modifying and deleting data (commonly referred to as DML).
     • Privileges: used to grant and revoke rights for users in the RDBMS.
     SQL: The things we cannot express
     SQL has made querying databases a very easy task. With a simple Select statement, large amounts of data can be extracted. However, SQL comes with its own limitations. Some queries are simply too difficult or impossible to express. This leads to a difficult situation: if SQL has its shortcomings, then what are our alternatives? [iii]
  10. 10. Let's take an example of a situation where SQL is an inadequate choice. The following table (named Assembly) illustrates the different parts of a vehicle and how they're related.
     Part     Subpart   Number
     trike    wheel     3
     trike    frame     1
     frame    seat      1
     frame    pedal     1
     wheel    spoke     2
     wheel    tire      1
     tire     rim       1
     tire     tube      1
     As we can see in this case, some parts are made from a number of subparts, which in turn are made from other subparts, and so on. This suggests that the data follows a hierarchical structure.
  11. 11. This data structure is very powerful because it expresses perfectly how the parts are related to each other. However, it represents a major obstacle for SQL queries because the relational algebra that SQL follows is not very expressive in such situations. With SQL, traversing a hierarchical structure can be done by joining the Assembly table with itself. For example, to write a query that returns all the components of a trike, we need to join the Assembly table with itself a first time to learn that a trike contains spoke, tire, seat and pedal, and a second time to learn that it contains rim and tube.
     This example illustrates the problem with SQL and hierarchical data: we need as many table joins as there are levels in the structure. This is obviously not a big issue in small datasets, but its severity immediately escalates in situations where the structure is composed of tens of levels.
     SQL does offer another solution: recursive queries. A recursive query is similar to a regular SQL query but is used with tree-like data structures. Here's an example of such a query:
     However, this solution is not optimal because:
     • Although recursive queries were standardized in SQL:1999, they are not supported by every RDBMS.
     • The query syntax is not the same on all the RDBMS that support it.
     • It only supports linear recursion.
     Linear recursion occurs where an action has a simple repetitive structure consisting of some basic step followed by the action again. It is a recursion whose order of growth is linear.
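The traversal described above, which needs one extra join per level in plain SQL, can be sketched in Java as a loop that keeps following part/subpart rows until no new subpart turns up. This is only an illustration under stated assumptions: the class name AssemblyClosure and the method components are hypothetical and do not come from the project, and the quantity column of the Assembly table is left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original project.
class AssemblyClosure {

    // Each row mirrors the Assembly table as {part, subpart}; the Number column is omitted.
    static final List<String[]> ASSEMBLY = Arrays.asList(
        new String[] {"trike", "wheel"},
        new String[] {"trike", "frame"},
        new String[] {"frame", "seat"},
        new String[] {"frame", "pedal"},
        new String[] {"wheel", "spoke"},
        new String[] {"wheel", "tire"},
        new String[] {"tire", "rim"},
        new String[] {"tire", "tube"});

    // Returns every direct or indirect subpart of the given part,
    // however many levels deep the hierarchy goes.
    static Set<String> components(String part) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(part);
        while (!toVisit.isEmpty()) {
            String current = toVisit.poll();
            for (String[] row : ASSEMBLY) {
                // found.add returns false for a subpart seen before, so nothing is visited twice.
                if (row[0].equals(current) && found.add(row[1])) {
                    toVisit.add(row[1]);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(components("trike"));
    }
}
```

The loop does not need to know the depth of the hierarchy in advance, which is exactly what a fixed number of self-joins cannot offer.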
  12. 12. Datalog: Asking questions and getting answers.
     Datalog is a nonprocedural query and rule language based on Prolog. Its creation dates back to the beginning of logic programming, but in 1977 it became a separate area of study. [iv]
     Datalog differentiates between:
     • Rules
     • Facts
     • Queries
     An atom is a predicate (relation name) with variables or constants as arguments.
     A rule determines the logic behind the data: how the data is related. It is composed of two parts joined by the ':-' symbol:
     • The head: composed of an atom.
     • The body: composed of one or more atoms joined with AND (subgoals).
     Example:
     p :- q, not r.
     • p is the head.
     • ':-' reads 'is true if'.
     • q is the first atom/subgoal.
     • ',' reads AND.
     • 'not' is the negation.
     • r is the second atom/subgoal.
  13. 13. In Datalog, a rule is considered to be true for a given set of variable values if all the subgoals evaluate to true for those values. Therefore the rule shown above can be expressed as follows:
     'p is true if q is true AND not r is true'
     or
     'p depends on q AND not r'
     A special case of rules are those with a head part only (no subgoals). They are known as unit rules and will be used mainly in facts.
     p.
     Another type of rule is the recursive rule. A rule is recursive if its head predicate also appears in its body as a subgoal.
     p :- r, p.
     Recursion is what makes Datalog a very interesting language in logic programming. Without it, Datalog can only express what SQL can with a Select-From-Where statement. Therefore recursion will be the most important aspect in our study, because it addresses the shortcomings of SQL discussed above.
  14. 14. Another aspect of rules is that we need to distinguish:
     • EDB (extensional database relations): relations that exist in the database and are used to create facts.
     • IDB (intensional database relations): relations defined by one or more rules, which can't be used to create facts.
     Finally, we need to point out the issue of safety in Datalog rules. Since rules can be written with a large margin of freedom, they can have undesirable consequences. A major problem in this area is rules that generate an infinite number of answers. Therefore some restrictions had to be applied in order to avoid such situations. Datalog enforces the following criterion on rules:
     If a variable x appears in either:
     • The head
     • A negated subgoal
     • An arithmetic comparison
     then x must also appear in a non-negated subgoal of the body.
  15. 15. Example:
     The following Datalog rules are unsafe because they all return an infinite number of results:
     S(x) :- R(y).
     S(x) :- NOT R(x).
     S(x) :- R(y) AND x < y.
     Possible solution:
     S(x) :- R(y), P(x).
     S(x) :- NOT R(x), P(x).
     S(x) :- R(y) AND x < y, P(x).
     A fact is a tuple inside a relation. It can be seen as an instance of the rule. Facts are created by unit rules only. They represent the dataset the program will operate on.
     A fact has the following structure:
     predicate name (list of constants).
     • The predicate name is the name of the rule/relation to which the fact belongs.
     • A fact can only operate on constants. No variables are allowed.
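The safety criterion and the unsafe rules listed above can be checked mechanically. The following Java sketch is a minimal illustration, assuming each rule has already been broken down into four sets of variable names; the class SafetyCheck and its methods are hypothetical names introduced here, and no real Datalog parser or engine is involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; it does not parse Datalog text.
class SafetyCheck {

    // Convenience method to build a set of variable names.
    static Set<String> vars(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // A rule is safe when every variable of the head, of the negated subgoals and of
    // the arithmetic comparisons also appears in a non-negated subgoal.
    static boolean isSafe(Set<String> head, Set<String> positive,
                          Set<String> negated, Set<String> comparisons) {
        Set<String> restricted = new HashSet<>(head);
        restricted.addAll(negated);
        restricted.addAll(comparisons);
        return positive.containsAll(restricted);
    }

    public static void main(String[] args) {
        // S(x) :- R(y).        x never appears in a non-negated subgoal.
        System.out.println(isSafe(vars("x"), vars("y"), vars(), vars()));
        // S(x) :- R(y), P(x).  P(x) binds x, so the rule becomes safe.
        System.out.println(isSafe(vars("x"), vars("y", "x"), vars(), vars()));
    }
}
```

Adding the subgoal P(x) works in all three examples for the same reason: it places x in a non-negated subgoal, so x can only take values that actually exist in P.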
  16. 16. Example:
     Consider the rule Student(X, Y), which states that a person X is a student in the university Y. Some facts coming from that rule would be:
     • Student('Maroun', 'UPA').
     • Student('Elie', 'USJ').
     • Student('Tony', 'LAU').
     A query is a question asked to the Datalog program, to which the program is supposed to return an answer.
     A query has the following structure:
     ?- predicate name (list of constants/variables).
     • The predicate name is the name of the rule/relation to which the query is related.
     • A query can have either constants or variables or both, depending on its type.
     Datalog queries can be of three types:
     • A query that determines whether a fact is true or not (usually returns 'yes' or 'no').
     • A query that returns some of the terms of the matching facts.
     • A query that returns all the terms of the matching facts.
     Here are some examples of queries (using the same Student rule introduced above):
     ?- Student('Maroun', 'UPA'). Would return 'yes' because Maroun is a student at UPA.
     ?- Student('Maroun', 'USJ'). Would return 'no' because Maroun is not a student at USJ.
  17. 17. ?- Student('Maroun', Y). Would return 'Y: UPA' because Maroun is a student at UPA.
     ?- Student(X, 'USJ'). Would return all the persons who are students at USJ (in this case Elie).
     ?- Student(X, Y). Would return all the facts derived from the Student rule.
     Deductive databases: When databases meet Datalog
     A deductive database is a database system that can make deductions; it can therefore come up with additional facts other than those expressed in its dataset. Such a database operates on logic rules and facts and answers queries. It typically uses Datalog to specify the rules, facts and queries.
     Since they are based on Datalog, deductive databases are considered more powerful than their relational counterparts because Datalog fills the gap between the data and the logic. Not only are they more powerful, they also offer good performance and scalability when it comes to dealing with large datasets. [v]
     A deductive database has two forms of data:
     • Stored data belonging to a relation (extensional database relations).
     • Deduced data that is not stored but inferred at runtime (intensional database relations).
     The true power of deductive databases is that they can give us far more data than what is actually stored in them. This capacity is nearly impossible to implement in relational databases because of SQL's shortcomings.
     Unfortunately, deductive databases have not been widely adopted in real-life applications; they have remained mainly used for academic and research purposes. This is most probably due to the fact that deductive databases are used to create large knowledge bases, something that is beyond the scope of most applications.
  18. 18. In recent years, some deductive database concepts have started to be used in other systems. For example, the RDBMS that provide recursive SQL have based their implementations on deductive database standards.
     A more viable approach is to use a regular relational database with an added Datalog layer for logic programming. This way we gain the power of Datalog without losing the ease of relational databases. That's the approach followed in this project.
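To make the distinction between stored and deduced data concrete, here is a minimal Java sketch of how facts that are not stored can be inferred from stored ones. It assumes a hypothetical stored relation parent(X, Y) and the two rules ancestor(X, Y) :- parent(X, Y) and ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y). The names, the sample data and the naive repeat-until-stable evaluation are all assumptions made for illustration; this is not how the Datalog engine used in the project works internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example; relation names and data are invented for illustration.
class AncestorDeduction {

    // Stored (extensional) facts: parent(X, Y) means X is a parent of Y.
    static final List<String[]> PARENT = Arrays.asList(
        new String[] {"Nadia", "Karim"},
        new String[] {"Karim", "Lina"},
        new String[] {"Lina", "Sami"});

    // Deduced (intensional) facts, computed by applying the rules until nothing new appears:
    //   ancestor(X, Y) :- parent(X, Y).
    //   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    static Set<List<String>> ancestors() {
        Set<List<String>> derived = new HashSet<>();
        for (String[] p : PARENT) {
            derived.add(Arrays.asList(p[0], p[1]));
        }
        boolean changed = true;
        while (changed) {
            Set<List<String>> fresh = new HashSet<>();
            for (String[] p : PARENT) {
                for (List<String> a : derived) {
                    if (p[1].equals(a.get(0))) {
                        fresh.add(Arrays.asList(p[0], a.get(1)));
                    }
                }
            }
            // addAll returns true only if at least one new fact was added.
            changed = derived.addAll(fresh);
        }
        return derived;
    }

    public static void main(String[] args) {
        // Plays the role of the query ?- ancestor('Nadia', 'Sami').
        System.out.println(ancestors().contains(Arrays.asList("Nadia", "Sami")));
    }
}
```

Only three facts are stored, yet six ancestor facts can be answered from them; the extra three exist only as the result of applying the rules.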
  19. 19. II. Functional and Technical context
  20. 20. A. Introduction:
     In this part, I am going to explain how I applied the information collected through the theoretical study in order to answer the problems mentioned in the beginning. This will be explained through the application that was created in the course of this project.
     The application developed is a website called 'Family Book'. It represents a concept for a social network linking family members rather than friends. In the following parts, I will talk about the tools and technologies used and how I put them together to build the application. I will also show the architecture of the application through UML diagrams.
  21. 21. B. Tools and technologies used
     The application was created using many existing tools and technologies. This approach has provided me with many benefits:
     • Reduced development time.
     • Higher productivity.
     • Usage of a vast number of APIs.
     • Easy access to plenty of help and support on the internet.
     • A more solid and maintainable finished product.
     All the tools used are either open source or distributed as freeware. This has reduced the cost of the application to zero without sacrificing capabilities, ease of development or performance. Such tools also often come with great support from the community through online forums and tutorials, which can be a tremendous help when developing.
     Another benefit of these technologies is that they all integrate with each other seamlessly and can work side by side without the need for complex configurations or workarounds.
     In what follows I am going to enumerate the tools and technologies used by category. Then I will give information about each as well as how and why I used it.
  22. 22. • Programming platform: Java EE 6 with Java Persistence API (JPA).
     • Framework: Java Server Faces with PrimeFaces components.
     • Application server: Glassfish Application Server v3.
     • IDE: NetBeans IDE 6.8.
     • Database server: Oracle 10g + Oracle Multimedia.
     • Datalog engine: IRIS reasoner.
     • RDF engine: Sesame.
  23. 23. Java EE
     The Java Platform Enterprise Edition (Java EE) is a server programming platform for the Java programming language. It is based largely on open standards and is used for developing, deploying and maintaining enterprise applications. [vi]
     Java EE enterprise applications are:
     • component-based
     • n-tiered
     • web-enabled
     • server-centric
     A typical Java EE enterprise application comprises the following:
     • Presentation logic
     • Business logic
     • Data access logic/Data model
     • System services
     Java EE applications rely heavily on the container in which they are deployed. This container offers the applications numerous services, improving productivity and relieving developers from implementing basic functionality. Among these services:
     • Security
     • Transaction
     • Persistence
  24. 24. Java Persistence API
     The Java Persistence API (JPA) is a specification for accessing, persisting, and managing data between Java objects and a relational database. It is now considered the standard method to accomplish Object to Relational Mapping (ORM) in the Java technology.
     JPA is just a specification, not a product. It doesn't perform any task by itself. Instead it relies on external providers to do the job. Therefore the only role of JPA is to unify all those implementations under one interface. Some of the most used JPA providers are Hibernate, EclipseLink and OpenJPA. [vii]
     JPA relies on the concept of POJOs (Plain Old Java Objects), which are ordinary Java classes that don't implement any given interface. By doing so, these classes can remain independent of the implementation and can be reused later on.
     Each POJO class is called an Entity, and each Entity is linked to a corresponding database table.
     Here's an example of a simple Entity class:
     import javax.persistence.Column;
     import javax.persistence.Entity;
     import javax.persistence.Id;

     @Entity
     public class Employee {
         @Id
         private int id;

         @Column(name="F_NAME")
         private String firstName;

         @Column(name="L_NAME")
         private String lastName;

         private long salary;
         // ...
     }
  25. 25. We should note that:
     • The class is annotated with @Entity, making it an entity class linked to a database table that has the same name.
     • id is annotated with @Id, making it the primary key of the database table.
     • firstName and lastName are annotated with @Column, which means that they'll be persisted into the named table columns (F_NAME and L_NAME).
     • salary is not annotated, but it will still be persisted in a column with the same name.
     Java Server Faces
     Java Server Faces (JSF) is a server-side user interface component framework for Java web applications. It has been the standard technology for developing web applications since Java EE 5.
     JSF provides developers with many features, mainly:
     • Components (abstraction of standard HTML elements + custom widgets)
     • Events (similar to AWT events for Java desktop applications)
     • Validators (validate input data)
     • Converters (convert input data types)
     • Navigation (navigate conditionally from one web page to another)
     • Back-end data integration (integrate with data sources such as databases...)
  26. 26. JSF and MVC:
     JSF ensures that applications are well designed and highly maintainable by integrating the well-established Model-View-Controller (MVC) design pattern into its architecture. This means that JSF applications clearly separate the Model, the View and the Controller. [viii]
     In JSF, the MVC architecture is as follows:
     View
     • Represents the part of the application the user can see and interact with.
     • Usually a JSP or XHTML page.
     Controller
     • Handles requests from the View to execute actions on the Model.
     • Usually the FacesServlet class.
     Model
     • Represents the behavior of the application.
     • Deals with the application's persistent data.
     • Usually a ManagedBean class.
  27. 27. This separation between presentation and behavior enables specialized users (web designers, component writers, application architects...) to each work on their own area of concern. Therefore, the development cycle is shortened and the application is more maintainable in the long term.
     To illustrate how the architecture functions, here's a simple example of a login use case in a JSF application:
     • The view:
     <h:form>
         <h:inputText value="#{user.name}" id="name"/>
         <h:inputSecret value="#{user.password}" id="password"/>
         <h:commandButton value="Login" action="#{user.login}"/>
     </h:form>
     In this simple view, we need to note the following elements:
     • The form element that encompasses the components.
     • The inputText element that represents a simple HTML text input.
     • The inputSecret element that represents a simple HTML password input.
     • The commandButton element that represents a simple HTML submit button.
     Another point of interest is how the view communicates with the controller. This is done using Java EL in the following manner:
     • The values of the inputText and inputSecret components are linked respectively to the name and password properties of the user model; any change to these properties will be reflected on the view and vice versa.
     • The action of the commandButton component is linked to the login method of the user model; every time this component is activated (clicked), the controller will invoke the login method on the user model.
  28. 28. • The model:
     import javax.faces.bean.ManagedBean;
     import javax.faces.bean.SessionScoped;

     @ManagedBean(name = "user")
     @SessionScoped
     public class UserBean {
         private String name = "";
         private String password;

         public String getName() { return name; }
         public void setName(String newValue) { name = newValue; }

         public String getPassword() { return password; }
         public void setPassword(String newValue) { password = newValue; }

         // The returned String is the navigation outcome used by the controller.
         public String login() {
             // Constants come first so that a null name or password cannot cause a NullPointerException.
             if ("user".equals(name) && "pass".equals(password)) {
                 return "success";
             } else {
                 return "failure";
             }
         }
     }
     In this simple model, we need to note the following:
     • It is an ordinary Java class annotated with @ManagedBean.
     • The name attribute is used to access this model from Java EL.
     • The @SessionScoped annotation implies that this model will be preserved as long as the session is active.
     • The model exposes its properties to Java EL as public getters and setters.
     • The model has one action method, the login method. This method takes no parameters and returns a String. The return value determines which view the controller navigates to after invoking this method. For example, if the user is correctly logged in, the controller will navigate to the success view (success.xhtml page), otherwise to the failure view (failure.xhtml page).
     Note: Expression Language (EL) is a scripting language which allows access to Java components (JavaBeans) through JSP or JSF.
  29. 29. • The controller:
     • Represented by the built-in class FacesServlet, therefore programmers don't need to define controllers themselves.
     • It runs constantly in the background once the application is invoked.
     Localization in JSF:
     Another important aspect of JSF is internationalization and localization. This is done by creating a message bundle per locale. Each bundle contains the textual content of the application written in the locale's language. The application then uses the suitable bundle according to the web browser locale or the user's choice.
     For example, a JSF application can have the following message bundles:
     • messages_en for English
     • messages_fr for French
     • messages_de for German
     • messages_ar for Arabic
     Then, if the web browser locale is set to English, the application will automatically get its text content from the messages_en bundle. If the locale is set to Arabic, the application will rely on the messages_ar bundle.
     Inside every bundle, the text is represented as key-value pairs. For example, in the French bundle:
     name: nom
     password: mot de passe
     gender: sexe
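The key-value lookup described above can be tried outside a JSF application with the standard java.util.ResourceBundle classes. The sketch below is an illustration under stated assumptions: it reads the three French entries from an in-memory string instead of a real messages_fr.properties file, and the class name BundleLookup is hypothetical. In an actual JSF application the bundle would normally be declared in the application configuration rather than loaded by hand.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper for illustration; a real application would load messages_fr.properties.
class BundleLookup {

    // Builds a bundle from the three French key-value pairs used as an example in the text.
    static ResourceBundle frenchBundle() {
        String content = "name=nom\npassword=mot de passe\ngender=sexe\n";
        try {
            return new PropertyResourceBundle(new StringReader(content));
        } catch (IOException e) {
            // Reading from an in-memory string is not expected to fail.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The view only knows the key; the bundle supplies the localized text.
        System.out.println(frenchBundle().getString("password"));
    }
}
```

Because the view refers to keys only, supporting another language comes down to supplying another bundle with the same keys.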
  30. 30. Note: Internationalization is the process of designing an application so that it can be adapted to various languages and regions without engineering changes. Textual elements, such as status messages and GUI component labels, are not hardcoded in the program. Instead they are stored outside the source code and retrieved dynamically. Therefore, support for new languages does not require recompilation.
     PrimeFaces
     JSF offers a limited set of visual components reflecting the classic HTML elements (input, button, checkbox…). As useful as these components are, they are not enough for a large-scale web application. Therefore more advanced components are needed.
     One solution to this problem would be for the developer or web designer to create the custom components themselves. However, this approach lengthens the development cycle and can seriously disrupt the work flow. A much better solution is to use a ready-made component suite. An example of such a suite is the PrimeFaces project.
     PrimeFaces is an open-source project that delivers more than a hundred Ajax-ready components. The beauty of PrimeFaces is its simplicity: no configuration is needed to integrate the components in a JSF application. Once the appropriate JAR file is added to the application classpath, the PrimeFaces components can be used right away. [ix]
     Another aspect of PrimeFaces is that the components can be easily skinned using premade themes, so the application's look and feel can be modified easily. Consequently, visually appealing web applications can be created in a few steps.
     Some of the PrimeFaces components used in the application:
     • Calendar.
     • Dialog.
     • File upload.
     • Menu.
  31. 31. Glassfish Application Server
     Glassfish is an open-source application server created by Sun Microsystems as a reference implementation for the Java EE platform: for every version of Java EE, Glassfish is the first server to fully implement the new features. [x]
     The beauty of Glassfish is that it's very simple to use and at the same time very powerful. It offers many features, mainly:
     • Thread pools.
     • Database connection pools.
     • Security.
     Glassfish can be installed stand-alone or, as in my case, bundled with the NetBeans IDE. It fully integrates with the IDE, making applications easy to deploy and maintain.
     The version of Glassfish used with the application is v3, which fully implements the Java EE 6 specification.
     NetBeans
     NetBeans is a free open source IDE. It is led by an active community that provides constant updates and features. It currently competes with the Eclipse IDE (among others) for the position of leading Java IDE.
  32. 32. Among the features it offers: [xi]
     • Full support for Java based applications (Web, desktop and mobile).
     • Support for dynamic languages such as Groovy and Ruby.
     • Support for PHP and C++.
     • Support for web technologies such as HTML, CSS and JavaScript.
     • Support for UML modeling.
     Here's a screenshot of the NetBeans user interface:
  33. 33. Oracle Database
     Oracle Database is an object-relational database management system created by Oracle Corporation.
     Oracle Database provides:
     • High scalability and effectiveness.
     • Manageability.
     • High availability.
     • Backup and recovery.
     • Business intelligence features.
     • Security features.
     • Data integrity.
     Oracle Database is one of the major contenders on the database market. It competes directly with such products as Microsoft SQL Server and IBM DB2. It's available for all the major operating systems and is widely used, especially in client-server applications. [xii]
     Figure 1
  34. 34. Oracle Multimedia
     Oracle Database includes an interesting feature named Oracle Multimedia (formerly Oracle interMedia). It enables the database to store, manage, and retrieve images, audio, video, or other heterogeneous media data in an integrated fashion with other enterprise information. Oracle Multimedia extends the traditional features of Oracle Database (reliability, availability…) and applies them to multimedia content. One aspect that Oracle Multimedia doesn't deal with is how the data is captured; this is left to the application software. [xiii]
     Figure 2
     Oracle Multimedia provides the following:
     • Storage and retrieval.
     • Media and application metadata management.
     • Support for popular media formats.
     • Access through traditional and Web interfaces.
     • Querying using associated relational data.
     • Querying using extracted metadata.
     • Querying using media content with optional specialized indexing.
  35. 35. Oracle Multimedia provides the following objects to represent media content:
     • ORDAudio
     • ORDDoc
     • ORDImage
     • ORDImageSignature
     • ORDVideo
     • SI_StillImage
     Using these objects and their methods, developers can do the following:
     • Extract metadata and attributes from multimedia data.
     • Get and manage multimedia data from Oracle Multimedia, web servers, file systems, and other sources.
     • Perform manipulation operations on image data.
     Oracle Multimedia offers two methods to manipulate these objects:
     • Using PL/SQL stored procedures.
     • Using the Java API.
     In this project, I am particularly interested in the ORDImage and ORDImageSignature objects.
     The ORDImage object type supports the storage, management, and manipulation of image data.
     Some of the methods offered by ORDImage:
     • init( ) creates a new empty ORDImage object.
     BEGIN
       INSERT INTO pm.online_media (product_id, product_photo)
       VALUES (3501, ORDSYS.ORDImage.init());
     END;
  36. 36. • copy(dest IN OUT ORDImage) copies an ORDImage into another.
     DECLARE
       image_1 ORDSYS.ORDImage;
       image_2 ORDSYS.ORDImage;
     BEGIN
       -- Initialize a new ORDImage object where the copy will be stored:
       INSERT INTO pm.online_media (product_id, product_photo)
       VALUES (3091, ORDSYS.ORDImage.init());
       -- Select the source object into image_1:
       SELECT product_photo INTO image_1 FROM pm.online_media
       WHERE product_id = 3515;
       -- Select the target object into image_2:
       SELECT product_photo INTO image_2 FROM pm.online_media
       WHERE product_id = 3091 FOR UPDATE;
       -- Copy the data from image_1 to image_2:
       image_1.copy(image_2);
       UPDATE pm.online_media SET product_photo = image_2
       WHERE product_id = 3091;
       COMMIT;
     END;
     • process(command IN VARCHAR2) performs image processing (such as resizing).
     DECLARE
       obj ORDSYS.ORDImage;
     BEGIN
       SELECT product_photo INTO obj FROM pm.online_media
       WHERE product_id = 3515 FOR UPDATE;
       -- The command is passed as a string literal:
       obj.process('maxScale=32 32');
       UPDATE pm.online_media p SET product_thumbnail = obj
       WHERE product_id = 3515;
       COMMIT;
     END;
  37. 37. The ORDImageSignature object type supports content-based retrieval. It serves primarily for image matching. This is done by comparing two images according to given criteria and computing a score value: the smaller the score, the more similar the images are.
The criteria used to compare images, each of which is given a weight, are:
• The color.
• The texture.
• The location.
• The shape.
Some of the methods offered by ORDImageSignature:
• init( ) creates a new empty ORDImageSignature object.
BEGIN
  INSERT INTO pm.online_media (product_id, product_photo, product_photo_signature)
  VALUES (1910,
          ORDSYS.ORDImage.init('FILE', 'FILE_DIR', 'speaker.jpg'),
          ORDSYS.ORDImageSignature.init());
  COMMIT;
END;
• generateSignature(image IN ORDImage) generates the signature for the given ORDImage object.
DECLARE
  t_image ORDSYS.ORDImage;
  image_sig ORDSYS.ORDImageSignature;
BEGIN
  SELECT p.product_photo, p.product_photo_signature INTO t_image, image_sig
  FROM pm.online_media p
  WHERE p.product_id = 2402 FOR UPDATE;
  -- Generate a signature:
  image_sig.generateSignature(t_image);
  UPDATE pm.online_media p SET p.product_photo_signature = image_sig
  WHERE product_id = 2402;
END;
  38. 38. • evaluateScore(sig1 IN ORDImageSignature, sig2 IN ORDImageSignature, weights IN VARCHAR2) evaluates the distance (the score) between the two given signatures according to the supplied weights. The bigger the distance, the less similar the images are.
DECLARE
  t_image ORDSYS.ORDImage;
  c_image ORDSYS.ORDImage;
  image_sig ORDSYS.ORDImageSignature;
  compare_sig ORDSYS.ORDImageSignature;
  score FLOAT;
BEGIN
  SELECT p.product_photo, p.product_photo_signature INTO t_image, image_sig
  FROM pm.online_media p
  WHERE p.product_id = 1910 FOR UPDATE;
  -- Generate a signature:
  image_sig.generateSignature(t_image);
  UPDATE pm.online_media p SET p.product_photo_signature = image_sig
  WHERE product_id = 1910;
  SELECT p.product_photo, p.product_photo_signature INTO c_image, compare_sig
  FROM pm.online_media p
  WHERE p.product_id = 1940 FOR UPDATE;
  -- Generate a signature:
  compare_sig.generateSignature(c_image);
  UPDATE pm.online_media p SET p.product_photo_signature = compare_sig
  WHERE product_id = 1940;
  SELECT p.product_photo, p.product_photo_signature INTO t_image, image_sig
  FROM pm.online_media p
  WHERE p.product_id = 1910;
  SELECT p.product_photo, p.product_photo_signature INTO c_image, compare_sig
  FROM pm.online_media p
  WHERE p.product_id = 1940;
  -- Compare two images for similarity based on image color:
  score := ORDSYS.ORDImageSignature.evaluateScore(image_sig, compare_sig,
             'color=1.0,texture=0,shape=0,location=0');
  DBMS_OUTPUT.PUT_LINE('Score is ' || score);
END;
An example of how Oracle Multimedia can be used with Java is found in the appendices.
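The weights argument passed to evaluateScore is an ordinary string. As a minimal sketch, the helper below assembles such a string from four numbers. SignatureWeights is a hypothetical class that belongs neither to Oracle Multimedia nor to this project, and the two validation rules (each weight between 0.0 and 1.0, at least one weight positive) are assumptions about what a sensible comparison needs.

```java
// Hypothetical helper: not part of Oracle Multimedia nor of the project's code.
final class SignatureWeights {

    // Builds a weights string such as "color=1.0,texture=0.0,shape=0.0,location=0.0".
    static String format(double color, double texture, double shape, double location) {
        double[] values = {color, texture, shape, location};
        boolean anyPositive = false;
        for (double v : values) {
            // Assumption: every weight must lie between 0.0 and 1.0.
            if (Double.isNaN(v) || v < 0.0 || v > 1.0) {
                throw new IllegalArgumentException("weight out of range: " + v);
            }
            if (v > 0.0) {
                anyPositive = true;
            }
        }
        // Assumption: a comparison needs at least one criterion with a positive weight.
        if (!anyPositive) {
            throw new IllegalArgumentException("at least one weight must be greater than 0");
        }
        return "color=" + color + ",texture=" + texture
                + ",shape=" + shape + ",location=" + location;
    }

    public static void main(String[] args) {
        // Compare on color only, as in the PL/SQL example.
        // prints color=1.0,texture=0.0,shape=0.0,location=0.0
        System.out.println(format(1.0, 0.0, 0.0, 0.0));
    }
}
```

The resulting string could then be bound as the third argument of evaluateScore from JDBC instead of being concatenated by hand.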
  39. 39. IRIS Reasoner
IRIS is an open-source Datalog reasoner that can evaluate safe or unsafe Datalog.
It is composed of three jar files:
• The first contains the reasoning engine.
• The second contains the parser.
• The third contains some utility programs, including two applications that provide a user interface to the IRIS engine.
The IRIS reasoner evaluates queries over a knowledge base. A knowledge base is composed of facts and rules. The combination of facts, rules and queries forms a logic program. A logic program is the input for IRIS.xiv
A knowledge base can be created in one of two ways:
• Create the Java objects representing the components of the knowledge base using the API.
• Parse an entire Datalog program written in human-readable form using the parser.
IRIS evaluates a query by returning the set of all tuples that can be found in or inferred from the knowledge base and that satisfy the query.
Unlike the standard Datalog syntax, IRIS's variables start with the '?' symbol.
IRIS supports the addition of external data sources to be used when evaluating queries. These data sources can be databases, XML files, Web services…
  40. 40. Sesame
Sesame is a Java framework used to store and query RDF data. It provides the following features:
• An extensible storage mechanism: RDF triples can be stored in a database, in a disk file, or kept in memory.
• Inferencers for RDFS.
• Support for many RDF file formats.
• An RDF query engine supporting the SeRQL and SPARQL languages.
It hides the complexities of RDF behind an API that resembles the JDBC API. Therefore, accessing Sesame feels like accessing any relational database.
Sesame introduces the idea of a repository, a storage facility that can hold RDF schemas and data. Repositories are robust and flexible facilities that can be accessed remotely via HTTP.xv
The following is Sesame's architecture, showing how the different components relate to one another:
Figure 3: Sesame architecture
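  41. 41. To illustrate how simple Sesame is to use, here's a small example:
import java.net.URL;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.http.HTTPRepository;
import org.openrdf.rio.RDFFormat;

String sesameServer = "";
String repositoryID = "example-db";
// Accessing a repository
Repository myRepository = new HTTPRepository(sesameServer, repositoryID);
myRepository.initialize();
// Creating a connection
RepositoryConnection con = myRepository.getConnection();
// Adding RDF data loaded from a URL
URL url = new URL("");
con.add(url, url.toString(), RDFFormat.RDFXML);
// Querying the repository with SeRQL
String queryString = "SELECT x, y FROM {x} p {y}";
TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SERQL, queryString);
TupleQueryResult result = tupleQuery.evaluate();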
  42. 42. C. Developed solution:
Overview
The solution developed is a Java EE web application built with the following:
• Java EE 6 with JSF and PrimeFaces.
• Glassfish application server.
• Oracle 10g and Oracle Multimedia.
The application is a prototype of a social network. It is only a prototype because the social networking features (chatting, messaging…) are not implemented; it is simply intended to showcase a solution to the problems mentioned above.
The proposed solutions for the problems:
The application's primary goal is to answer the problems raised in the introduction. It does so by using alternative ways of thinking and relying on third-party technologies.
The following is a list of the problems and their solutions:
• Create applications that rely on smaller volumes of data: Using Datalog and the IRIS Reasoner to deduce knowledge.
• Create more intelligent applications: Using the IRIS Reasoner, the application can derive knowledge that is not stored in the database. Therefore we can say that the application is 'thinking'.
• Using existing data sources: Since Datalog and the IRIS Reasoner can be plugged into a pre-existing infrastructure, the application relies on a traditional relational database but extends it with logic programming and deduction features.
  43. 43. • Dealing with multimedia content: Using Oracle Multimedia, the application can dig into the content of an image for search and comparison purposes. This capability, coupled with Datalog reasoning, sheds new light on how multimedia content should be handled in applications.
• Applying the capabilities in today's applications: The solution applies those features within a very popular application type (social networks), showing that the work done can be practical and not just theoretical.
The idea behind the application:
The application is based on a very simple concept: 'Knowing a person's parents, all his family members can be deduced'.
Although this rule sounds too simple, it enables the construction of a person's complete family tree. The implications are significant:
• Less data is required. Only the mother and father are needed.
• Less data means less maintenance and fewer programming problems.
• The application acquires a certain level of artificial intelligence.
Knowing the mother and father, the application can eventually deduce the identity of:
• Brothers and sisters.
• Grandparents.
• Uncles and aunts.
• Cousins.
  44. 44. …and the list goes on. Virtually any member of the family, no matter how distant, can be identified by this approach, as long as the relatives that link him to the user are themselves registered with their parents.
A user wishing to join the application will need to provide the following information:
• First and last name.
• Gender.
• Birthdate.
• Mother and father.
• Spouse.
The first and last name and the birthdate are used only for display; they don't contribute to the application's logic. The gender, however, plays an important role because it helps determine family relations.
The application's brain:
As mentioned above, the application relies on the IRIS Reasoner to make logical deductions. With a very small set of rules, the application can deduce many family relations. This behavior is extensible: we can add as many rules as we want to identify more family members.
The following is a listing of the rules that the application relies on:
  45. 45. male(?X).
female(?X).
father(?X,?Y).
mother(?X,?Y).
child(?X,?Y) :- father(?Y,?X).
child(?X,?Y) :- mother(?Y,?X).
brother(?X,?Y) :- male(?X), father(?Z,?X), father(?Z,?Y), mother(?T,?X), mother(?T,?Y), not ?X=?Y.
sister(?X,?Y) :- female(?X), father(?Z,?X), father(?Z,?Y), mother(?T,?X), mother(?T,?Y), not ?X=?Y.
paternalGrandFather(?X,?Y) :- father(?X,?Z), father(?Z,?Y).
maternalGrandFather(?X,?Y) :- father(?X,?Z), mother(?Z,?Y).
paternalGrandMother(?X,?Y) :- mother(?X,?Z), father(?Z,?Y).
maternalGrandMother(?X,?Y) :- mother(?X,?Z), mother(?Z,?Y).
uncle(?X,?Y) :- male(?X), father(?Z,?Y), brother(?X,?Z).
uncle(?X,?Y) :- male(?X), mother(?Z,?Y), brother(?X,?Z).
aunt(?X,?Y) :- female(?X), father(?Z,?Y), sister(?X,?Z).
aunt(?X,?Y) :- female(?X), mother(?Z,?Y), sister(?X,?Z).
This is all the application needs to deduce family members. If we want to identify a new family member, we need to add the corresponding rule to this set.
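To make the deduction concrete without the IRIS engine, here is a minimal plain-Java sketch that stores only each person's gender, father and mother, and derives brothers and uncles by hand, mirroring the brother and uncle rules listed above. The FamilyDeduction class, its methods and the sample names are all hypothetical; the application itself delegates this work to IRIS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: a hand-written version of the brother and uncle rules.
final class FamilyDeduction {
    private final Map<String, String> fatherOf = new HashMap<>(); // child -> father
    private final Map<String, String> motherOf = new HashMap<>(); // child -> mother
    private final Set<String> males = new TreeSet<>();

    void addPerson(String name, boolean male, String father, String mother) {
        if (male) males.add(name);
        if (father != null) fatherOf.put(name, father);
        if (mother != null) motherOf.put(name, mother);
    }

    // brother(?X,?Y) :- male(?X), father(?Z,?X), father(?Z,?Y),
    //                   mother(?T,?X), mother(?T,?Y), not ?X=?Y.
    Set<String> brothersOf(String y) {
        Set<String> result = new TreeSet<>();
        String father = fatherOf.get(y);
        String mother = motherOf.get(y);
        if (father == null || mother == null) return result;
        for (String x : males) {
            if (!x.equals(y) && father.equals(fatherOf.get(x)) && mother.equals(motherOf.get(x))) {
                result.add(x);
            }
        }
        return result;
    }

    // uncle(?X,?Y) :- male(?X), father(?Z,?Y), brother(?X,?Z).
    // uncle(?X,?Y) :- male(?X), mother(?Z,?Y), brother(?X,?Z).
    Set<String> unclesOf(String y) {
        Set<String> result = new TreeSet<>();
        for (String parent : new String[] {fatherOf.get(y), motherOf.get(y)}) {
            if (parent != null) result.addAll(brothersOf(parent));
        }
        return result;
    }

    // Sample family (made-up names): only parents and gender are stored.
    static FamilyDeduction sample() {
        FamilyDeduction f = new FamilyDeduction();
        f.addPerson("Elias", true, null, null);
        f.addPerson("Nada", false, null, null);
        f.addPerson("Karim", true, "Elias", "Nada");
        f.addPerson("Rami", true, "Elias", "Nada");
        f.addPerson("Lina", false, "Elias", "Nada");
        f.addPerson("Maya", false, null, null);
        f.addPerson("Sami", true, "Karim", "Maya");
        return f;
    }

    public static void main(String[] args) {
        FamilyDeduction f = sample();
        System.out.println("Brothers of Lina: " + f.brothersOf("Lina"));
        System.out.println("Uncles of Sami: " + f.unclesOf("Sami"));
    }
}
```

Nothing in the sample data says that Rami is Sami's uncle; that fact is derived from the stored parents alone, which is the point of the approach. A rule engine generalizes this by letting new relations be added as rules instead of new methods.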
  46. 46. The beauty of the IRIS Reasoner is that it integrates very easily with a relational database and enables us to query it using Datalog. This feature allowed the IRIS Reasoner to communicate with the Oracle database and fetch data from its relational tables.
This is done by adding a custom data source to the IRIS Reasoner's configuration, so that the engine consults this data source every time a Datalog query is issued.
The following code extract demonstrates this:
Configuration configuration = KnowledgeBaseFactory.getDefaultConfiguration();
configuration.externalDataSources.add(new SQLDataSource());
SQLDataSource is a custom class implementing the IDataSource interface of IRIS. It provides the following method:
public void get(IPredicate predicate, ITuple from, ITuple to, IRelation relation) {
    // Test which predicate is being queried from Datalog.
    // Construct the corresponding SQL query and execute it on the database.
    // Retrieve the results and put them in the relation.
}
This approach translates Datalog's simple, non-recursive queries into SQL Select-From-Where queries. And since those simple rules are used within other, more complex ones, we gain the expressive power of rule composition and recursion that plain Select-From-Where queries do not offer.
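The first step of such a data source, choosing an SQL query for the predicate being asked about, can be sketched as follows. This is not the project's SQLDataSource: the PredicateToSql class is hypothetical, and the column names (ID, FATHER_ID, MOTHER_ID, GENDER) are assumptions, since only the table names are given in this report.

```java
import java.util.Optional;

// Hypothetical sketch of the predicate-to-SQL step; column names are assumed.
final class PredicateToSql {

    // Returns the Select-From-Where query for a base predicate,
    // or an empty Optional for predicates that only the reasoner can derive.
    static Optional<String> toSql(String predicate) {
        switch (predicate) {
            case "father": // father(?X,?Y): X is the father of Y
                return Optional.of("SELECT FATHER_ID, ID FROM PERSONS WHERE FATHER_ID IS NOT NULL");
            case "mother": // mother(?X,?Y): X is the mother of Y
                return Optional.of("SELECT MOTHER_ID, ID FROM PERSONS WHERE MOTHER_ID IS NOT NULL");
            case "male":
                return Optional.of("SELECT ID FROM PERSONS WHERE GENDER = 'MALE'");
            case "female":
                return Optional.of("SELECT ID FROM PERSONS WHERE GENDER = 'FEMALE'");
            default:
                // brother, uncle, paternalGrandFather... are left to the Datalog rules.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toSql("father").orElse("(derived by rules)"));
        System.out.println(toSql("uncle").orElse("(derived by rules)"));
    }
}
```

Keeping the mapping limited to the four base predicates means the database is only ever asked simple questions, while every derived relation stays in the rule file.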
  47. 47. For example, the paternalGrandFather relation is composed of two simple father rules. Expressing this rule directly in SQL is not very obvious, but once it is broken down into two separate rules, each can be executed as a normal SQL query while the reasoner combines their results.
Therefore, when evaluating the paternalGrandFather rule, the IRIS Reasoner begins by evaluating the first father rule and then evaluates the second.
This rule can be easily read as: 'X is the paternal grandfather of Y if X is the father of Y's father'.
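Combining the two father lookups amounts to a join on the shared variable ?Z. The sketch below shows that join in plain Java over an in-memory list of (father, child) pairs; it illustrates the idea only and does not describe how IRIS evaluates rules internally. The GrandfatherJoin class and the names in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative join for: paternalGrandFather(?X,?Y) :- father(?X,?Z), father(?Z,?Y).
final class GrandfatherJoin {

    // Each String[] is one father tuple: {father, child}.
    static List<String> paternalGrandfathers(List<String[]> father) {
        List<String> result = new ArrayList<>();
        for (String[] xz : father) {            // father(?X,?Z)
            for (String[] zy : father) {        // father(?Z,?Y)
                if (xz[1].equals(zy[0])) {      // both tuples must agree on ?Z
                    result.add(xz[0] + " is the paternal grandfather of " + zy[1]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> father = Arrays.asList(
                new String[] {"Elias", "Karim"},
                new String[] {"Karim", "Sami"});
        // prints [Elias is the paternal grandfather of Sami]
        System.out.println(paternalGrandfathers(father));
    }
}
```

Each of the two loops corresponds to one father query that could be answered by the database, and the equality test on ?Z is the part the reasoner contributes.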
  48. 48. The solution's features:
The 'Family Book' application enables users to:
• Log in.
• Register.
• View/Edit their profile.
• View family members.
• Search for people.
• View/Add photos.
• Compare/Search photos.
UML modeling:
Use case diagram
  49. 49. Login sequence diagram
Register sequence diagram
  50. 50. View profile sequence diagram
Edit profile sequence diagram
  51. 51. View Family sequence diagram
View photos sequence diagram
  52. 52. Edit photos sequence diagram
Compare photos sequence diagram
  53. 53. Search photos sequence diagram
Add person sequence diagram
  54. 54. Class diagram
  55. 55. The solution's components:
The application can be divided into two big parts:
• The web part.
• The source code part.
The web pages:
The application contains the following web pages:
• index.xhtml
• home.xhtml
• person-profile.xhtml
• person-family.xhtml
• photo-list.xhtml
  56. 56. index.xhtml
The application's main page, where the user can log in
The application's main page showing the registration form
  57. 57. home.xhtml
This is the page where the user is directed after he logs in
[Screenshot of the home page]
  58. 58. person-profile.xhtml
This is the person's profile, where he can view and edit his information
  59. 59. person-family.xhtml
This is the page where the user can see how the application has deduced his family members
  60. 60. photo-list.xhtml
In this page the user can view the photos he has uploaded as well as upload new ones
The user here has chosen to compare these two pictures, and the application has returned the comparison score
  61. 61. The source code:
The application contains the following packages and classes:
• pji.ejb contains the business logic classes
  o ConceptService: Deals with RDF concepts stored in Sesame.
    a) findAll() returns all concepts.
  o FamilyService: Deals with family relations between persons.
    a) findRelatedPerson(Person p1, FamilyRelation r) finds the person p2 that is related to p1 by the relation r.
    b) findRelatedPersons(Person person, FamilyRelation relation) finds the list of persons that are related to person by relation.
    c) areRelated(Person p1, Person p2, FamilyRelation r) returns true if p1 and p2 are related by the relation r.
    d) findRelation(Person p1, Person p2) finds which relation, if any, joins p1 and p2.
  o PersonService: Deals with a person's information.
    a) create(Person p) persists the person p in the database.
    b) find(Object id) finds the person with the corresponding id.
    c) findByMatchingName(String name) finds all the persons whose names match the given name.
    d) findByGender(Gender g) finds all the persons who are of gender g.
  o PhotoService: Deals with a user's photos.
    a) add(InputStream i, String d, Person p) adds a new image read from the input stream i, with the description d, for the person p.
  62. 62.     b) findForPersonId(int id) finds all the photos that belong to the person with the given id.
    c) compare(int id1, int id2, double color, double texture, double shape, double location) compares the two photos corresponding to the given ids, according to the supplied weights.
  o UserService: Deals with user-related matters such as login and logout.
    a) create(User u) persists the user u in the database.
    b) findByName(String name) finds the user with the corresponding name.
    c) login(User user) checks the user's credentials and performs a login.
• pji.ejb.datalog contains the Datalog logic classes
  o Reasoner: Invokes the IRIS engine to execute Datalog queries.
    a) executeBooleanQuery(String query) returns true if the supplied Datalog query returned a 'yes'.
    b) executeQuery(String query) returns the list of person ids returned by the execution of the supplied query.
  o SQLDataSource: Acts as a bridge between IRIS and the Oracle database.
    a) get(IPredicate predicate, ITuple from, ITuple to, IRelation relation) converts the Datalog query into SQL and fills the relation with data coming from Oracle.
  o rules.txt: Contains the Datalog rules that the application acts upon.
• pji.entities contains the entity classes mapped to the database tables
  o FamilyRelation: Lists the possible family relations that can exist between two people.
  o Gender: Can be either male or female.
  o Person: Represents a person with a profile and relations to other persons.
  o Photo: Represents a profile picture stored in the database as a BLOB. This is independent of the photos that the person can upload.
  o User: Represents the credentials with which a person logs into the application.
  63. 63. The database structure:
The application communicates with an Oracle database named 'ORCL'. It contains the following tables:
• GENDERS
• PERSONS
• PHOTOS
• PROFILE_PHOTOS
• USERS
And here's their structure:
[Table diagrams: Persons, Users, Genders, Photos, Profile_Photos]
  64. 64. The database also contains the following stored procedure, which adds a photo to the Photos table:
CREATE OR REPLACE PROCEDURE ADD_PHOTO (description IN VARCHAR2, person_id IN NUMBER, id OUT NUMBER) AS
BEGIN
  -- Insert an empty image and signature, then return the generated id to the caller:
  INSERT INTO photos
  VALUES (photo_seq.nextval, description, ordsys.ordimage.init(), ordsys.ordimagesignature.init(), person_id)
  RETURNING id INTO id;
END ADD_PHOTO;
  65. 65. III. Conclusion and future work
  66. 66. At the beginning of this report, many questions were asked. Those questions revolved around data-centric issues: mainly, how to reduce the data volume and make applications smarter, and how to treat multimedia content differently than normal textual data.
In the course of this project, I researched many existing solutions and tools that tried to answer those questions, but none of them answered all of them. That's why I decided to build a custom solution based on these tools. That solution consists of continuing to use traditional relational databases while integrating the powerful Datalog language on top of them. This approach gave me the benefits of Datalog with the flexibility of relational databases. I also investigated alternative data sources such as RDF, which I used along with Sesame to classify my multimedia content according to concepts. In addition, I used Oracle Multimedia to compare multimedia content (images) according to their content (shape, color, location and texture).
As an improvement to my solution, I believe we need to integrate deduction into more aspects of the application (not just family relations). We also need to integrate RDF further to make more use of its powerful constructs, and possibly use OWL for ontologies. A last question arises: will all those features one day make it into real-life applications, or will they always remain for academic purposes? And if they do make that transition, what will happen to our current tools? What future is there for SQL databases if Datalog dominates the market? Can they peacefully coexist?
  67. 67. List of figures
• Figure 1: Oracle Database architecture
• Figure 2: Oracle Multimedia architecture
• Figure 3: Sesame architecture
  68. 68. List of references
i Deductive Databases and Their Applications, Robert Colomb, CRC Press, USA
ii
  69. 69. xii
  70. 70. Appendices
Oracle Multimedia with Java API example
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.io.IOException;
import oracle.jdbc.OracleResultSet;
import oracle.jdbc.OraclePreparedStatement;
import oracle.ord.im.OrdImage;

public class InterMediaQuickStart {
    /**
     * Entry point.
     * usage: java InterMediaQuickStart connectionString username password
     * e.g. java InterMediaQuickStart jdbc:oracle:oci:@inst1 scott tiger
     **/
    public static void main(String[] args) throws Exception {
        System.out.println("starting interMedia Java quickstart...");
        if (args.length != 3) {
            System.out.println("usage: java InterMediaQuickStart connectionString username password");
            System.out.println("e.g. java InterMediaQuickStart jdbc:oracle:oci:@inst1 scott tiger");
            System.exit(-1);
        }
        // Run the examples as they are printed in the quick start documentation
        // register the oracle jdbc driver with the JDBC driver manager
        DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
  71. 71.         Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
        System.out.println("Got database connection");
        System.out.println("Starting verbatim interMedia Java QuickStart code...");
        // Note: it is CRITICAL to set the autocommit to false so that
        // two-phase select-commit of BLOBs can occur.
        conn.setAutoCommit(false);
        // create a JDBC Statement object to execute SQL in the database
        Statement stmt = conn.createStatement();
        // -------------------
        // Creating a Table
        // -------------------
        {
            // Create the image_table table with two columns: id and image
            String tableCreateSQL = "create table image_table " +
                "(id number primary key, " +
                "image ordsys.ordimage)";
            stmt.execute(tableCreateSQL);
        }
        // -------------------
        // Uploading Images from Files into Tables
        // -------------------
        {
            // insert a row into image_table
            String rowInsertSQL = ("insert into image_table (id, image) values (1, ordsys.ordimage.init())");
            stmt.execute(rowInsertSQL);
            // select the new ORDImage into a java proxy OrdImage object (imageProxy)
            String rowSelectSQL = "select image from image_table where id = 1 for update";
            OracleResultSet rset = (OracleResultSet)stmt.executeQuery(rowSelectSQL);
            // move to the first (and only) row before reading it
            rset.next();
            OrdImage imageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
            rset.close();
            imageProxy.loadDataFromFile("goats.gif");
            imageProxy.setProperties();
  72. 72.             String updateSQL = "update image_table set image=? where id=1";
            OraclePreparedStatement opstmt =
                (OraclePreparedStatement)conn.prepareStatement(updateSQL);
            opstmt.setORAData(1, imageProxy);
            opstmt.execute();
            opstmt.close();
        }
        // -------------------
        // Retrieving Image Properties
        // -------------------
        // Java Accessor Methods
        {
            String rowSelectSQL = "select image from image_table where id = 1";
            OracleResultSet rset = (OracleResultSet)stmt.executeQuery(rowSelectSQL);
            rset.next();
            OrdImage imageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
            rset.close();
            int height = imageProxy.getHeight();
            int width = imageProxy.getWidth();
            System.out.println("proxy (height x width) = " + height + " x " + width);
        }
        // -------------------
        // Creating Thumbnails and Changing Formats
        // -------------------
        {
            // One could significantly reduce the number of round trip
            // database communications in the following example.
            String rowInsertSQL = ("insert into image_table (id, image) " +
                "values (2, ordsys.ordimage.init())");
            stmt.execute(rowInsertSQL);
            // get the source ORDImage object
            String srcSelectSQL = "select image from image_table where id=1";
            OracleResultSet rset = (OracleResultSet)stmt.executeQuery(srcSelectSQL);
            rset.next();
            OrdImage srcImageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
  73. 73.             rset.close();
            // get the newly inserted destination ORDImage object
            String dstSelectSQL = "select image from image_table where id=2 for update";
            rset = (OracleResultSet)stmt.executeQuery(dstSelectSQL);
            rset.next();
            OrdImage dstImageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
            rset.close();
            // call the processCopy method (processing occurs on the SERVER)
            srcImageProxy.processCopy("maxscale=100 100 fileformat=jfif", dstImageProxy);
            // update the destination image in the second row
            String dstUpdateSQL = "update image_table set image=? where id=2";
            OraclePreparedStatement opstmt =
                (OraclePreparedStatement)conn.prepareStatement(dstUpdateSQL);
            opstmt.setORAData(1, dstImageProxy);
            opstmt.execute();
            opstmt.close();
        }
        // -------------------
        // Downloading Image Data from Tables into Files
        // -------------------
        {
            // export the data in row 2
            String exportSelectSQL = "select image from image_table where id = 2";
            OracleResultSet rset = (OracleResultSet)stmt.executeQuery(exportSelectSQL);
            // get the proxy for the image in row 2
            rset.next();
            OrdImage imageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
            rset.close();
            // call the getDataInFile method to write the ORDImage in row 2 to disk
            imageProxy.getDataInFile("row2.jpg");
        }
        // -------------------
  74. 74.         // Cleaning up
        // -------------------
        {
            // drop the images table
            stmt.executeQuery("drop table image_table");
            // commit all our changes
            conn.commit();
        }
        stmt.close();
        System.out.println("Done with verbatim interMedia Java QuickStart code.");
        // close the database connection to release all the resources
        conn.close();
        return;
    }
}