Deductive Databases

Antonin University
Baabda- 2010

PJI Report

Maroun Baydoun INF 1312

Deductive Databases
Under the supervision of Mr. Samir Saad

Presented to Mr. Chady Abou Jaoudé

I would like to take the opportunity to thank Father Fady Fadel and Dr. Paul Ghobril for
providing us with a great level of education and putting at our disposal all the tools we
need to succeed. I would also like to thank Mr. Samir Saad for agreeing to supervise me in this
project.

Last but not least, I would like to thank the open source community for supplying the
prerequisites for my project and supporting me on the online forums.

Table of contents:

I. Technological context and motivation
1. Introduction
2. Theoretical study
3. Summary

II. Functional and Technical context
1. Introduction
2. Tools and technologies used
3. Developed solution
4. UML Diagrams
a. DCU
b. DES
c. DCL
5. Classes
6. Summary

III. Conclusion and future work

List of figures

List of references

Appendices

I. Technological context and motivation

A. Introduction:

Storing data is amongst the most important issues faced in every aspect of IT. Developed
solutions usually rely on data stores to store and retrieve information. As applications tend to
become more and more complex, so does the data they deal with. Whether using SQL
databases, XML files, LDAP directories or any other form of storage medium, large amounts of
data require considerable space and imply performance penalties, not to mention maintenance
and administration nightmares.

Even though big volumes of data can cause that much trouble, they are necessary in many
cases; some applications simply need that much data to function properly. Let’s take for
example a large-scale e-commerce web application. It needs to have access to a large array of
information (users’ details and credentials, products, transactions…); without this data, the
application is pretty useless. On the other hand, no matter how useful this data is, we can’t
neglect all the implications related to storing and maintaining it. Large disk space, large numbers
of backups and considerable work hours to keep it in good condition are just a few of the hassles
caused by data in working environments.

Another drawback of using large amounts of data is that it makes applications look and behave
less “smartly”. Feeding applications all this data ultimately means that they can’t think for
themselves. Hence their role is limited to accessing and updating the given data without really
reasoning beyond the business rules established in them. In a fast-evolving world like ours,
applications can’t remain reduced to that function…applications need to start thinking!

On the other hand, as applications become more advanced, so does their content. It’s
inconceivable to state that they are solely charged with delivering textual content: in the 21st
century applications are becoming a key player in conveying multimedia materials. This
situation urges applications to start treating this kind of content differently; multimedia
should be handled from a different perspective, one that accounts for its substance and not just
its presentation.

All that was stated earlier raises many questions. Firstly, is it possible to create applications that
rely on smaller volumes of data? Can applications really be made more intelligent if they deal
with less data? And if so, in what ways can they reason? Can this be done on the existing data
storage solutions or should we adopt new ones? Furthermore, how can applications deal with
multimedia in order to take full advantage of it? How can multimedia be treated differently
from text content? And finally, how can we apply all of the above in today’s
applications?

B. Theoretical study:

Databases: The prime medium for data storage

In the early days of computing, data was held on decks of cards and magnetic tapes. The
programs read the files, changed their contents and wrote them back again as a new version.

The first revolution came with the invention of disk technology. It made it possible to change
file contents without copying their entire content. However at this point, the structure of the
files was still embedded directly in the programs responsible for the updates.

The next advance was the introduction of database technology. It provides a way to describe
data independently from the application program. Data finally became a separate resource
in its own right. Another advantage of this technology is that the procedures used to access and
update the data are hidden from the programmer. At this stage, the structure and content of a database
record were described independently from the way it was stored, accessed and updated, and
the connections between records were represented as pointers.

The most important leap was the introduction of relational databases where all records can be
identified by their contents. The real innovation behind relational databases is that they follow
the mathematical theory of relations. Another advance is query optimization, which was made
possible thanks to the ability to transform a query from one expression to another having the
same result as the first. [i]

Databases are particularly useful under the following circumstances:

• Concurrent changes to the data.
• Regular changes to the data.
• Large sets of data need to be shared among many people.
• Queries need to be executed fast without analysis.

SQL: Making dialogs with the database

In the early 70s, IBM engineers developed the SQL language to manipulate and retrieve data
stored in their relational database management system (System R). Initially called SEQUEL
(Structured English Query Language), the name was later changed to SQL for legal reasons.

The SQL language is composed of the following elements:

• Clauses, which are constituent components of statements and queries.
• Expressions, which produce scalar values or tables consisting of columns and rows of data.
• Predicates, which specify conditions used to limit the effects of statements and queries, or to
change program flow.
• Queries, which retrieve data based on specific criteria.
• Statements, which may have a persistent effect on schemas and data, or which may control
transactions, program flow, connections, sessions, or diagnostics.
• Insignificant whitespace, which is generally ignored in SQL statements and queries, making it easier
to format SQL code.
• The semicolon statement terminator (‘;’), which is optional on some platforms but is still part
of the standard SQL grammar. [ii]

The following figure is a simple SQL statement that illustrates the different components of SQL
language:

http://www.stat.berkeley.edu/~spector/sql.pdf

The SQL language is used as a front-end for many RDBMS such as Oracle, MS SQL and MySQL.
While these systems adhere to the SQL standard grammar, each adds its own set of custom
constructs aiming at offering more advanced functionalities. In addition, many of
these RDBMS provide their own procedural languages that make up for SQL's
lack of procedural capabilities, such as T-SQL for MS SQL and PL/SQL for Oracle.

The SQL language can be divided into three subsystems according to functionality:

• Data description: describes the structure of tables, views and other kinds of objects
(commonly referred to as DDL, the Data Definition Language).
• Data access: enables reading, saving, modifying and deleting data (commonly referred
to as DML, the Data Manipulation Language).
• Privileges: used to grant and revoke rights for users in the RDBMS.

SQL: The things we cannot express

SQL has made querying databases a very easy task. With a simple Select statement, large
amounts of data can be extracted. However, SQL comes with its own limitations. Some queries
are simply too difficult or impossible to express. This raises a question: if SQL has its
shortcomings, then what are our alternatives? [iii]

Let’s take an example of a situation where SQL is an inadequate choice. The following table
(named Assembly) illustrates the different parts of a vehicle and how they’re related.

Part     Subpart   Number
trike    wheel     3
trike    frame     1
frame    seat      1
frame    pedal     1
wheel    spoke     2
wheel    tire      1
tire     rim       1
tire     tube      1

As we can see in this case, some parts are made from a number of subparts, which in turn are
made from other subparts and so on. This suggests that the data follows a hierarchical structure.

This data structure is very powerful because it expresses perfectly how the parts are related to
each other. However, it represents a major obstacle for SQL queries because the relational
algebra that SQL follows is not very expressive in such situations. With SQL, traversing a
hierarchical structure can be done by joining the Assembly table with itself.

For example, to write a query that returns all the components of a trike, we need to join the
Assembly table with itself a first time to know that a trike contains spoke, tire, seat and pedal,
and a second time to know that it contains rim and tube.

This example illustrates the problem with SQL and hierarchical data: we need as many self-joins
as there are levels in the structure. This is obviously not a big issue in small datasets, but
its severity immediately escalates in situations where the structure is composed of tens of
levels, or where the number of levels is not known in advance.

SQL does offer another solution: recursive queries. A recursive query is similar to a regular SQL
query but is used with tree-like data structures. Here’s an example of such a query:

However, this solution is not optimal because:

• Although recursive queries were added to the SQL standard (SQL:1999), they are not supported by every RDBMS.
• The query syntax is not the same on all the RDBMS that support them.
• Only linear recursion is supported.

Linear recursion occurs where an action has a simple repetitive structure consisting of some basic step
followed by the action again. It is a recursion whose order of growth is linear.

Datalog: Asking questions and getting answers.

Datalog is a nonprocedural query and rule language based on Prolog. Its creation dates back to
the beginning of logic programming, but in 1977 it became a separate area of study. [iv]

Datalog differentiates between:

• Rules
• Facts
• Queries

An atom is a predicate (relation name) with variables or constants as arguments.

A rule determines the logic behind the data: how the data is related. It is composed of two
parts joined by the ‘:-’ symbol:

• The head: composed of an atom.
• The body: composed of one or more atoms joined with AND (subgoals).

p :- q, not r.

• p is the head.
• ‘:-’ reads ‘is true if’.
• q is the first atom/subgoal.
• ‘,’ reads AND.
• ‘not’ is negation.
• r is the second atom/subgoal.

In Datalog, the head of a rule is considered to be true for a given set of variable values if all the subgoals
evaluate to true for those values. Therefore the rule shown above can be expressed as
follows:

‘p is true if q is true AND not r is true’

Or

‘p depends on q AND not r ’

A special case of rules are those with a head part only (no subgoals). They are known as unit
rules and are used mainly to state facts.

p.

Another type of rule is the recursive rule. A rule is recursive if its head predicate also appears
in its body as a subgoal.

p :- r, p.

Recursion is what makes Datalog a very interesting language in logic programming. Without it,
Datalog can only express what SQL can with a Select-From-Where statement. Therefore
recursion will be the most important aspect in our study because it addresses the
shortcomings of SQL described above.
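To make the role of recursion more concrete, here is a minimal Java sketch of how a recursive rule could be evaluated over the Assembly data, by re-applying the rules until nothing new is derived. It is only an illustration under stated assumptions: the class name RecursiveRuleSketch, the method componentsOf and the component relation are hypothetical names, quantities are omitted, and a real Datalog engine uses far more refined evaluation strategies.

```java
import java.util.Set;
import java.util.TreeSet;

public class RecursiveRuleSketch {

    // Stored facts assembly(part, subpart), copied from the Assembly table (quantities omitted).
    static final String[][] ASSEMBLY = {
        {"trike", "wheel"}, {"trike", "frame"},
        {"frame", "seat"}, {"frame", "pedal"},
        {"wheel", "spoke"}, {"wheel", "tire"},
        {"tire", "rim"}, {"tire", "tube"}
    };

    // Returns every S for which component(root, S) can be derived from the
    // hypothetical rules:
    //   component(P, S) :- assembly(P, S).
    //   component(P, S) :- component(P, X), assembly(X, S).
    static Set<String> componentsOf(String root) {
        Set<String> derived = new TreeSet<String>();
        boolean changed = true;
        // Re-apply both rules until a full pass derives nothing new (a fixpoint).
        while (changed) {
            changed = false;
            for (String[] fact : ASSEMBLY) {
                boolean parentReached = fact[0].equals(root) || derived.contains(fact[0]);
                if (parentReached && derived.add(fact[1])) {
                    changed = true;
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        // Lists all direct and indirect components of a trike, whatever the depth.
        System.out.println(componentsOf("trike"));
    }
}
```

Unlike the self-join approach, the loop does not need to know the number of levels in advance: it simply stops when a pass adds no new component.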

Another aspect of rules is that we need to distinguish between:

• EDB (extensional database relations): relations that are stored in the database and are
populated directly with facts.
• IDB (intensional database relations): relations that are defined by one or more rules; they
cannot be populated directly with facts.

Finally, we need to point out the issue of safety in Datalog rules. Since rules can be written with
a large margin of freedom, they can have undesirable consequences. A major problem in this
area is rules that generate an infinite number of answers. Therefore some restrictions have to be
applied in order to avoid such situations. Datalog enforces the following criterion on rules:

If a variable x appears in either:

• The head
• A negated subgoal
• An arithmetic comparison

Then x must also appear in a non-negated subgoal of the body.

Example:

The following Datalog rules are unsafe because each of them can return an infinite number of results:

S(x):- R(y).

S(x):- NOT R(x).

S(x):- R(y) AND x < y.

Possible solution

S(x):- R(y), P(x).

S(x):- NOT R(x), P(x).

S(x):- R(y) AND x < y, P(x).
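The safety criterion can be checked mechanically. The following Java sketch is one possible way to do so, assuming that a rule has already been reduced to four sets of variable names (head, non-negated subgoals, negated subgoals, arithmetic comparisons). The class RuleSafetySketch and its methods are hypothetical and do not belong to any Datalog engine.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RuleSafetySketch {

    // A rule is safe when every variable of the head, of the negated subgoals and of the
    // arithmetic comparisons also occurs in at least one non-negated subgoal.
    static boolean isSafe(Set<String> head, Set<String> positive,
                          Set<String> negated, Set<String> comparisons) {
        Set<String> mustBeBound = new HashSet<String>(head);
        mustBeBound.addAll(negated);
        mustBeBound.addAll(comparisons);
        return positive.containsAll(mustBeBound);
    }

    // Small helper to build a set of variable names.
    static Set<String> vars(String... names) {
        return new HashSet<String>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // S(x) :- R(y).              x is never bound by a non-negated subgoal.
        System.out.println(isSafe(vars("x"), vars("y"), vars(), vars()));
        // S(x) :- NOT R(x).          x only appears in a negated subgoal.
        System.out.println(isSafe(vars("x"), vars(), vars("x"), vars()));
        // S(x) :- R(y) AND x < y.    x only appears in a comparison.
        System.out.println(isSafe(vars("x"), vars("y"), vars(), vars("x", "y")));
        // S(x) :- R(y), P(x).        P(x) binds x, so the rule becomes safe.
        System.out.println(isSafe(vars("x"), vars("x", "y"), vars(), vars()));
    }
}
```

The first three calls correspond to the unsafe rules above and the last one to the first corrected rule.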

A fact is a tuple inside a relation. It can be seen as an instance of the relation. Facts are created by
unit rules only. They represent the dataset the program will operate on.

A fact has the following structure:

predicate name (list of constants).

• The predicate name is the name of the rule/relation to which the fact belongs.
• A fact can only operate on constants. No variables are allowed.

Example:

Consider the relation Student(X, Y), which states that a person X is a student in the
university Y. Some facts belonging to that relation would be:

• Student (‘Maroun’, ‘UPA’).
• Student (‘Elie’, ‘USJ’).
• Student (‘Tony’, ‘LAU’).

A query is a question asked to the Datalog program, to which the program is supposed to
return an answer.

A query has the following structure:

?- predicate name (list of constants/variables).

• The predicate name is the name of the rule/relation to which the query is related.
• A query can have either constants or variables or both, depending on its type.

Datalog queries can be of three types:

• A query that determines whether a fact is true or not (usually returns ‘yes’ or ‘no’).
• A query that returns the values of some terms of the matching facts.
• A query that returns all the facts of a relation.

Here are some examples of queries (using the same Student rule introduced above):

?- Student (‘Maroun’, ’UPA’). Would return ‘yes’ because Maroun is a student at UPA

?- Student (‘Maroun’, ’USJ’). Would return ‘no’ because Maroun is not a student at USJ

?- Student (‘Maroun’,Y). Would return ‘Y: UPA’ because Maroun is a student at UPA

?- Student (X, ‘USJ’). Would return all the persons who are students at USJ (in this case Elie)

?- Student (X, Y). Would return all the facts derived from the Student rule
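The three kinds of queries can be mimicked with a few lines of Java. The sketch below is only an illustration and not how a Datalog engine is implemented: the class QuerySketch is hypothetical, and a null argument is used (by assumption) to stand for a variable, while a non-null argument stands for a constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuerySketch {

    // The Student facts used in the examples above.
    static final List<String[]> STUDENT = Arrays.asList(
        new String[] {"Maroun", "UPA"},
        new String[] {"Elie", "USJ"},
        new String[] {"Tony", "LAU"});

    // Returns the facts matching the query Student(person, university).
    // A null argument plays the role of a variable and matches any value.
    static List<String[]> query(String person, String university) {
        List<String[]> answers = new ArrayList<String[]>();
        for (String[] fact : STUDENT) {
            boolean personMatches = person == null || person.equals(fact[0]);
            boolean universityMatches = university == null || university.equals(fact[1]);
            if (personMatches && universityMatches) {
                answers.add(fact);
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        // ?- Student('Maroun', 'UPA').  Constants only: is the fact true?
        System.out.println(query("Maroun", "UPA").isEmpty() ? "no" : "yes");
        // ?- Student('Maroun', Y).      One variable: which university?
        System.out.println("Y: " + query("Maroun", null).get(0)[1]);
        // ?- Student(X, Y).             Variables only: every stored fact.
        System.out.println(query(null, null).size() + " facts");
    }
}
```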

Deductive databases: When databases meet Datalog

A deductive database is a database system that can make deductions: it can derive
additional facts beyond those stored in its dataset. Such a database operates on logic
rules and facts and answers queries. It typically uses Datalog to specify the rules, facts and
queries.

Since they are based on Datalog, deductive databases are considered more expressive than their
relational counterparts because Datalog fills the gap between the data and the logic. Not only
are they more expressive, but they also offer good performance and scalability when it comes to
dealing with large datasets. [v]

A deductive database has two forms of data:

• Stored data belonging to a relation (extensional database relations).
• Deduced data that is not stored but inferred at runtime (intensional database relations).

The true power of deductive databases is that they can give us far more data than what’s
actually stored in them. This capacity is difficult to implement in relational
databases because of SQL’s shortcomings.
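As a small illustration of the difference between stored and deduced data, the following Java sketch derives grandparent facts at runtime from stored parent facts. Everything in it is hypothetical (the class DerivedFactsSketch, the parent and grandparent relations and the names of the persons); it only shows that the answers returned can go beyond what is stored.

```java
import java.util.ArrayList;
import java.util.List;

public class DerivedFactsSketch {

    // Stored (extensional) facts parent(parent, child). The names are made up.
    static final String[][] PARENT = {
        {"Nabil", "Karim"}, {"Karim", "Rami"}, {"Karim", "Lea"}
    };

    // Deduced (intensional) facts, computed on demand from the rule:
    //   grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    static List<String> grandparents() {
        List<String> derived = new ArrayList<String>();
        for (String[] first : PARENT) {
            for (String[] second : PARENT) {
                // The child of the first fact must be the parent of the second one.
                if (first[1].equals(second[0])) {
                    derived.add("grandparent(" + first[0] + ", " + second[1] + ")");
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        // No grandparent fact is stored anywhere, yet two can be returned.
        for (String fact : grandparents()) {
            System.out.println(fact);
        }
    }
}
```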

Unfortunately, deductive databases have not been widely adopted in real-life applications; they
have remained mainly used for academic and research purposes. This is most probably due to
the fact that deductive databases are used to create large knowledge bases, something that is
beyond the scope of most applications.

In recent years, some deductive database concepts started to be used in other systems. For
example, the RDBMS that provide recursive SQL have based their implementations on
deductive database concepts.

A more viable approach is to use a regular relational database with an added Datalog layer for
logic programming. This way we gain the power of Datalog without losing the ease of relational
databases. That’s the approach followed in this project.

II. Functional and Technical context

A. Introduction:

In this part, I am going to present how I applied the information collected through the
theoretical study in order to bring answers to the problems mentioned in the beginning. This
will be explained through the application that was created in the course of this project.

The application developed is a website called ‘Family Book’; it represents a concept for a
social network linking family members rather than friends. In the following parts, I will talk about
the tools and technologies used and how I put them together to build the application. I will also
show the architecture of the application through UML diagrams.

B. Tools and technologies used

The application was created using many existing tools and technologies. This approach has
provided me with many benefits:

• Reduced development time.
• Higher productivity.
• Usage of a vast number of APIs.
• Easy access to numerous help and support resources on the internet.
• A more solid and maintainable finished product.

All the tools used are either open source or distributed as freeware. This has reduced the cost
of the application to zero without sacrificing capabilities, ease of development and
performance. Also, such tools often come with great support from the community
through online forums and tutorials, which can offer tremendous help when developing.

Another benefit of these technologies is that they all integrate with each other seamlessly and
can work side by side without the need for complex configurations or workarounds.

In what follows I am going to enumerate the tools and technologies used by category. Then I
will give information about each as well as how and why I used it.

• Programming platform: Java EE 6 with Java Persistence API (JPA).
• Framework: Java Server Faces with PrimeFaces components.
• Application server: Glassfish Application Server v3.
• IDE: NetBeans IDE 6.8.
• Database server: Oracle 10g + Oracle Multimedia.
• Datalog engine: IRIS reasoner.
• RDF engine: Sesame.

Java EE

The Java Platform Enterprise Edition (Java EE) is a server programming platform for the Java
programming language. It is based largely on open standards and used for developing,
deploying and maintaining enterprise applications. [vi]

Java EE enterprise applications are:

• component-based
• n-tiered
• web-enabled
• server-centric

A typical Java EE enterprise application will consist of the following:

• Presentation logic
• Business logic
• Data access logic/Data model
• System services

Java EE applications rely heavily on the container in which they are deployed. This container
offers the applications numerous services, increasing productivity and relieving developers from
implementing basic functionalities. Among these functionalities:

• Security
• Transactions
• Persistence

Java Persistence API

The Java Persistence API (JPA) is a specification for accessing, persisting, and managing data
between Java objects and a relational database. It is now considered the standard method to
accomplish Object to Relational Mapping (ORM) in Java technology.

JPA is just a specification, not a product. It doesn’t perform any task by itself. Instead it relies on
external providers to do the job. Therefore the only role of JPA is to unify all those
implementations under one interface. Some of the most used JPA providers are Hibernate,
EclipseLink and OpenJPA. [vii]

JPA relies on the concept of POJOs (Plain Old Java Objects), which are ordinary Java classes that
don’t implement any given interface. By doing so, these classes can remain independent of the
implementation and can be reused later on.

Each POJO class is called an Entity, and each Entity is linked to a corresponding database table.

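Here’s an example of a simple Entity class:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Employee {

    // Primary key of the table.
    @Id
    private int id;

    @Column(name = "F_NAME")
    private String firstName;

    @Column(name = "L_NAME")
    private String lastName;

    // Not annotated: persisted in a column with the same name.
    private long salary;

    // ...
}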

We should note that:

• The class is annotated with @Entity, making it an entity class linked to a database table
that has the same name.
• id is annotated with @Id, making it the primary key of the database table.
• firstName and lastName are annotated with @Column, which means that they’ll be
persisted into the F_NAME and L_NAME table columns respectively.
• salary is not annotated, but it will still be persisted in a column with the same name.

Java Server Faces

Java Server Faces (JSF) is a server-side user interface component framework for Java web
applications. It is the standard technology for developing web applications since Java EE 5.

JSF provides developers with many features, mainly:

• Components (abstraction of standard HTML elements + custom widgets)
• Events (similar to AWT events for Java desktop applications)
• Validators (validate input data)
• Converters (convert input data types)
• Navigation (navigate conditionally from a web page to another)
• Back-end data integration (integrate with data sources such as databases...)

JSF and MVC:

JSF ensures that applications are well designed and highly maintainable by integrating the
well-established Model-View-Controller (MVC) design pattern into its architecture. This means that
JSF applications clearly separate the Model, the View and the Controller. [viii]

In JSF, the MVC architecture is as follows:

View
• Represents the part of the application the user can see and interact with.
• Usually a JSP or XHTML page.

Controller
• Handles requests from the View to execute actions on the Model.
• Usually the FacesServlet class.

Model
• Represents the behavior of the application.
• Deals with the application's persistent data.
• Usually a ManagedBean class.

This separation between presentation and behavior enables specialized users (web
designers, component writers, application architects...) each to work on their own area of concern.
Therefore, the development cycle is shortened and the application is more maintainable in the
long term.

To illustrate how the architecture functions, here’s a simple example of a login use case in a JSF
application:

• The view:

<h:form>
    <h:inputText value="#{user.name}" id="name"/>
    <h:inputSecret value="#{user.password}" id="password"/>
    <h:commandButton value="Login" action="#{user.login}"/>
</h:form>

In this simple view, we need to note the following elements:

• The form element that encompasses the components.
• The inputText element that represents a simple HTML text input.
• The inputSecret element that represents a simple HTML password input.
• The commandButton element that represents a simple HTML submit button.

Another point of interest is how the view communicates with the model through the controller. This is
done using Java EL in the following manner:

• The inputText and inputSecret components’ values are linked respectively to the
name and password properties of the user model; any change of these
properties will be reflected on the view and vice versa.

• The commandButton component’s action is linked to the login method of the
user model; every time this component is activated (clicked), the controller will
invoke the login method on the user model.

• The model:

import javax.faces.bean.ManagedBean;
import javax.faces.bean.SessionScoped;

@ManagedBean(name = "user")
@SessionScoped
public class UserBean {

    private String name = "";
    private String password;

    public String getName() { return name; }
    public void setName(String newValue) { name = newValue; }

    public String getPassword() { return password; }
    public void setPassword(String newValue) { password = newValue; }

    // Action method: the returned outcome selects the next view.
    public String login() {
        // Constant-first equals() avoids a NullPointerException when a field is null.
        if ("user".equals(name) && "pass".equals(password)) {
            return "success";
        } else {
            return "failure";
        }
    }
}

In this simple model, we need to note the following:

• It is an ordinary Java class annotated with @ManagedBean.
• The name attribute is used to access this model from Java EL.
• The @SessionScoped annotation implies that this model will be preserved as
long as the session is active.
• The model exposes its properties to Java EL as public getters and setters.
• The model has one action method, the login method. This method takes no
parameters and returns a String. The return value will determine to which view
the controller has to navigate after invoking this method. For example, if the
user is correctly logged in, the controller will navigate to the success view
(success.xhtml page), otherwise to the failure view (failure.xhtml page).

Expression Language (EL) is a scripting language which allows access to Java components (JavaBeans)
through JSP or JSF.

• The controller:
  • Represented by the built-in class FacesServlet, therefore programmers don’t
  need to define controllers themselves.
  • It runs constantly in the background once the application is invoked.

Localization in JSF:

Another important aspect of JSF is internationalization and localization. This is done by creating a
message bundle per locale. Each bundle will contain the textual content of the application
written in the locale’s language. The application will then use the suitable bundle according to
the web browser locale or the user’s choice.

For example, a JSF application can have the following message bundles:

• messages_en for English
• messages_fr for French
• messages_de for German
• messages_ar for Arabic

Then, if the web browser locale is set to English, the application will automatically get its text content from
the messages_en bundle. Otherwise, if the locale is set to Arabic, the application will rely on
the messages_ar bundle.

Inside every bundle, the text is represented as key-value pairs. For example, in the messages_fr bundle:

name: nom

password: mot de passe

gender: sexe
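The lookup of a text by its key can be sketched in plain Java with the standard ResourceBundle classes. This is a simplified illustration, not the way the application loads its bundles: the class MessageBundleSketch is hypothetical, and the bundle content is kept in a string (an assumption made to keep the example self-contained) instead of a properties file selected according to the locale.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class MessageBundleSketch {

    // Stands in for the content of a messages_fr bundle; a real application would
    // keep these pairs in a properties file rather than in a string.
    static final String MESSAGES_FR = "name: nom\npassword: mot de passe\ngender: sexe\n";

    // Parses key-value pairs written in the properties format into a bundle.
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new IllegalStateException("could not read the bundle", e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle messages = load(MESSAGES_FR);
        // The code only knows the key; the displayed text comes from the bundle.
        System.out.println(messages.getString("password"));
    }
}
```

Supporting another language then amounts to supplying another set of key-value pairs with the same keys.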

Internationalization is the process of designing an application so that it can be adapted to various languages
and regions without engineering changes. Textual elements, such as status messages and the GUI component
labels are not hardcoded in the program. Instead they are stored outside the source code and retrieved
dynamically. Therefore, support for new languages does not require recompilation.

PrimeFaces

JSF offers a limited set of visual components reflecting the classic HTML elements (input,
button, checkbox…). As useful as these components are, they are not enough for a large-scale
web application. Therefore more advanced components are needed. One solution for this
problem would be for the developer or web designer to create the custom components
themselves. However, this approach lengthens the development cycle and can seriously
disrupt the workflow.

A much better solution would be to use a ready-made component suite. An example of such
a suite is the PrimeFaces project.

PrimeFaces is an open-source project that delivers more than a hundred Ajax-ready
components. The beauty of PrimeFaces is its simplicity: no configuration is needed to integrate
the components in a JSF application. Once the appropriate JAR file is added to the application
classpath, the PrimeFaces components can be used right away. [ix]

Another aspect of PrimeFaces is that the components can be easily skinned using
premade themes, so the application’s look and feel can be modified easily. Consequently,
visually appealing web applications can be created in a few steps.

Some of PrimeFaces’ components used in the application:

• Calendar.
• Dialog.
• File upload.
• Menu.

Glassfish Application Server

Glassfish is an open-source application server created by Sun Microsystems as a reference
implementation for the Java EE platform: for every version of Java EE, Glassfish is the first server
to fully implement the new features. [x]

The beauty of Glassfish is that it’s very simple to use and at the same time very powerful. It
offers many features, mainly:

• Thread pools.
• Database connection pools.
• Security.

Glassfish can be installed stand-alone or, as in my case, bundled with the NetBeans IDE. It fully integrates
with the IDE, making applications easy to deploy and maintain.

The version of Glassfish used with the application is v3, which fully implements the Java EE 6
specification.

NetBeans

NetBeans is a free open source IDE. It is led by an active community that provides constant updates and
features. It currently rivals the Eclipse IDE (among others) as the leading IDE for Java technology.

Among the features it offers: [xi]

• Full support for Java based applications (Web, desktop and mobile).
• Support for dynamic languages such as Groovy and Ruby.
• Support for PHP and C++.
• Support for web technologies such as HTML, CSS and JavaScript.
• Support for UML modeling.

Here’s a screenshot of the NetBeans user interface

Oracle Database

Oracle Database is an object-relational database management system created by Oracle
Corporation.

Oracle Database provides:

• High scalability and effectiveness.
• Manageability.
• High availability.
• Backup and recovery.
• Business intelligence features.
• Security features.
• Data integrity.

Oracle Database is one of the major contenders on the database market. It competes directly with such
products as Microsoft SQL Server and IBM DB2. It’s available for all the major operating systems and is
widely used, especially in client-server applications. [xii]

Figure 1

Oracle Multimedia

Oracle Database includes a feature named Oracle Multimedia (formerly Oracle
interMedia). It enables the database to store, manage and retrieve images, audio, video and
other heterogeneous media data in an integrated fashion with other enterprise information.
Oracle Multimedia extends the traditional features of Oracle Database
(reliability, availability…) and applies them to multimedia content. One aspect that Oracle Multimedia
doesn’t deal with is how the data is captured; this is left to the application
software. [xiii]

Figure 2

Oracle Multimedia provides the following:

• Storage and retrieval.
• Media and application metadata management.
• Support for popular media formats.
• Access through traditional and Web interfaces.
• Querying using associated relational data.
• Querying using extracted metadata.
• Querying using media content with optional specialized indexing.

Oracle Multimedia provides the following objects to represent media content:

• ORDAudio
• ORDDoc
• ORDImage
• ORDImageSignature
• ORDVideo
• SI_StillImage

Using these objects and their methods, developers can do the following:

• Extract metadata and attributes from multimedia data.
• Get and manage multimedia data from Oracle Multimedia, web servers, file systems,
and other sources.
• Perform manipulation operations on image data.

Oracle Multimedia offers two methods to manipulate these objects:

• Using PL/SQL stored procedures.
• Using the Java API.

In this project, I am particularly interested in the ORDImage and ORDImageSignature objects.

ORDImage object type supports the storage, management, and manipulation of image data.

Some of the methods offered by ORDImage:

• init( ) creates a new empty ORDImage object.

BEGIN
  INSERT INTO pm.online_media (product_id, product_photo)
    VALUES (3501, ORDSYS.ORDImage.init());
END;

• copy(dest IN OUT ORDImage) copies an ORDImage into another.

DECLARE
  image_1 ORDSYS.ORDImage;
  image_2 ORDSYS.ORDImage;
BEGIN
  -- Initialize a new ORDImage object where the copy will be stored:
  INSERT INTO pm.online_media (product_id, product_photo)
    VALUES (3091, ORDSYS.ORDImage.init());
  -- Select the source object into image_1:
  SELECT product_photo INTO image_1 FROM pm.online_media
    WHERE product_id = 3515;
  -- Select the target object into image_2:
  SELECT product_photo INTO image_2 FROM pm.online_media
    WHERE product_id = 3091 FOR UPDATE;
  -- Copy the data from image_1 to image_2:
  image_1.copy(image_2);
  -- Store the copy in the target row only:
  UPDATE pm.online_media SET product_photo = image_2
    WHERE product_id = 3091;
  COMMIT;
END;

 process (command IN VARCHAR2) performs image processing (such as resizing..) .

DECLARE
obj ORDSYS.ORDImage;
BEGIN
SELECT product_photo INTO obj FROM pm.online_media
WHERE product_id = 3515 FOR UPDATE;
obj.process('maxScale=32 32');
-- Store the scaled image as the thumbnail of the same product:
UPDATE pm.online_media p SET product_thumbnail = obj
WHERE p.product_id = 3515;
COMMIT;
END;

ORDImageSignature object type supports content-based retrieval. It serves primarily for image
matching. This is done by comparing two images according to given criteria and deducing a
score value: the smaller the score, the more similar the images are.

The criteria, also called weights, used to compare images are:

 The color.
 The texture.
 The location.
 The shape.
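To give an intuition of how weights influence the score, the following is a minimal sketch that combines per-criterion distances with a weighted average. It is only an illustration: the class name, the method and the distance values are hypothetical, and Oracle Multimedia computes its score internally in its own way.

```java
/** Illustrative only: this is not Oracle's internal scoring algorithm. */
final class WeightedScoreSketch {

    /**
     * Combines per-criterion distances (color, texture, shape, location)
     * into a single score using a weighted average.
     */
    static double score(double[] distances, double[] weights) {
        if (distances.length != weights.length) {
            throw new IllegalArgumentException("one weight per distance is required");
        }
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < distances.length; i++) {
            weightedSum += weights[i] * distances[i];
            totalWeight += weights[i];
        }
        if (totalWeight <= 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical distances, in the order color, texture, shape, location.
        double[] distances = {10.0, 30.0, 50.0, 70.0};
        System.out.println(score(distances, new double[] {1.0, 0.0, 0.0, 0.0})); // prints 10.0
        System.out.println(score(distances, new double[] {0.5, 0.5, 0.0, 0.0})); // prints 20.0
    }
}
```

A criterion whose weight is 0 does not influence the result, which is why comparing on color alone returns the color distance unchanged.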

Some of the methods offered by ORDImageSignature:

 init( ) creates a new empty ORDImageSignature object.

BEGIN
INSERT INTO pm.online_media (product_id, product_photo,product_photo_signature)
VALUES (1910, ORDSYS.ORDImage.init('FILE', 'FILE_DIR','speaker.jpg'),
ORDSYS.ORDImageSignature.init());

COMMIT;

END;

 generateSignature(image IN ORDImage) generates the signature for the given ORDImage object.

DECLARE
t_image ORDSYS.ORDImage;
image_sig ORDSYS.ORDImageSignature;
BEGIN
SELECT p.product_photo, p.product_photo_signature INTO t_image, image_sig
FROM pm.online_media p
WHERE p.product_id = 2402 FOR UPDATE;

-- Generate a signature:
image_sig.generateSignature(t_image);
UPDATE pm.online_media p SET p.product_photo_signature = image_sig
WHERE p.product_id = 2402;
COMMIT;
END;

 evaluateScore(sig1 IN ORDImageSignature,sig2 IN ORDImageSignature, weights IN VARCHAR2)
evaluates the distance (the score) between the two given signatures according to the supplied
weights. The bigger the distance, the less similar the images are.

DECLARE
t_image ORDSYS.ORDImage;
c_image ORDSYS.ORDImage;
image_sig ORDSYS.ORDImageSignature;
compare_sig ORDSYS.ORDImageSignature;
score FLOAT;
BEGIN
-- Select the first image and generate its signature:
SELECT p.product_photo, p.product_photo_signature INTO t_image, image_sig
FROM pm.online_media p
WHERE p.product_id = 1910 FOR UPDATE;
image_sig.generateSignature(t_image);
UPDATE pm.online_media p SET p.product_photo_signature = image_sig
WHERE p.product_id = 1910;
-- Do the same for the image to compare against:
SELECT p.product_photo, p.product_photo_signature INTO c_image, compare_sig
FROM pm.online_media p
WHERE p.product_id = 1940 FOR UPDATE;
compare_sig.generateSignature(c_image);
UPDATE pm.online_media p SET p.product_photo_signature = compare_sig
WHERE p.product_id = 1940;
-- Compare two images for similarity based on image color:
score:=ORDSYS.ORDImageSignature.evaluateScore(image_sig,
compare_sig,'color=1.0,texture=0,shape=0,location=0');
DBMS_OUTPUT.PUT_LINE('Score is ' || score);
END;

An example of how Oracle Multimedia can be used with Java is found in the appendices.

IRIS Reasoner

IRIS is an open-source Datalog reasoner that can evaluate safe or unsafe Datalog.

It is composed of three jar files:

 The first contains the reasoning engine.
 The second contains the parser.
 The third contains some utility programs, including two applications that provide a user
interface to the IRIS engine.

The IRIS reasoner evaluates queries over a knowledge base. A knowledge base is composed of
facts and rules. The combination of facts, rules and queries forms a logic program. A logic
program is the input for IRIS [xiv].

A knowledge base can be created in one of two ways:
 Create the Java objects representing the components of the knowledge base using the
API.
 Parse an entire Datalog program written in human-readable form using the parser.

IRIS evaluates the queries by returning the set of all tuples that can be found or inferred from
the knowledge-base and that satisfy the query.
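The idea of answering a query from both stored and inferred tuples can be sketched in plain Java. The example below does not use the IRIS API; the class and method names are hypothetical, and the single rule is a grandfather rule of the kind used later in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: plain Java, not the IRIS API. */
final class TinyKnowledgeBaseSketch {

    /** Stored facts father(X, Y), read as "X is the father of Y". */
    private final Set<List<String>> father = new LinkedHashSet<>();

    void addFather(String parent, String child) {
        father.add(Arrays.asList(parent, child));
    }

    /** Rule: grandFather(X, Y) :- father(X, Z), father(Z, Y). */
    Set<List<String>> inferGrandFather() {
        Set<List<String>> inferred = new LinkedHashSet<>();
        for (List<String> xz : father) {
            for (List<String> zy : father) {
                if (xz.get(1).equals(zy.get(0))) { // both atoms agree on Z
                    inferred.add(Arrays.asList(xz.get(0), zy.get(1)));
                }
            }
        }
        return inferred;
    }

    /** Answers a query; a null argument plays the role of a variable. */
    static List<List<String>> query(Set<List<String>> relation, String first, String second) {
        List<List<String>> answers = new ArrayList<>();
        for (List<String> tuple : relation) {
            boolean firstOk = first == null || first.equals(tuple.get(0));
            boolean secondOk = second == null || second.equals(tuple.get(1));
            if (firstOk && secondOk) {
                answers.add(tuple);
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        TinyKnowledgeBaseSketch kb = new TinyKnowledgeBaseSketch();
        kb.addFather("adam", "bob");
        kb.addFather("bob", "carl");
        kb.addFather("bob", "dave");
        // Query: grandFather(?X, 'carl')
        System.out.println(query(kb.inferGrandFather(), null, "carl")); // prints [[adam, carl]]
    }
}
```

No grandfather tuple is ever stored: the answer is inferred from the two father facts that share the intermediate person.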

Unlike the standard Datalog syntax, IRIS’s variables start with the ‘?’ symbol.

IRIS supports the addition of external data sources to be used when evaluating queries. These
data sources can be databases, XML files, Web services and so on.

Sesame

Sesame is a Java framework used to store and query RDF data. It provides the following
features:

 An extensible storage mechanism: RDF tuples can be stored in a database, disk file or
kept in-memory.
 Inferencers for RDFS.
 Support for many RDF file formats.
 An RDF query engine supporting SeRQL and SPARQL languages.

It hides all the complexities of RDF by providing an API that resembles the JDBC API. Therefore,
accessing Sesame feels like accessing any relational database.

Sesame introduces the idea of a repository, a storage facility that can store RDF schemas and data.
Repositories are robust and flexible facilities that can be accessed remotely via HTTP [xv].

The following figure shows Sesame’s architecture and how its different components relate to
one another:

Figure 3

To illustrate how simple Sesame is to use, here’s a small example:

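import java.net.URL;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.http.HTTPRepository;
import org.openrdf.rio.RDFFormat;

String sesameServer = "http://example.org/sesame2";
String repositoryID = "example-db";

// accessing a remote repository over HTTP
Repository myRepository = new HTTPRepository(sesameServer, repositoryID);
myRepository.initialize();

// creating a connection
RepositoryConnection con = myRepository.getConnection();

// adding the RDF/XML document found at the given URL
URL url = new URL("http://example.org/example/remote");
con.add(url, url.toString(), RDFFormat.RDFXML);

// querying the repository with SeRQL
String queryString = "SELECT x, y FROM {x} p {y}";
TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SERQL, queryString);
TupleQueryResult result = tupleQuery.evaluate();

// releasing the resources once the result has been read
result.close();
con.close();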

C. Developed solution:

Overview

The solution developed is a Java EE web application using the following features:

 Java EE 6 with JSF and PrimeFaces.
 Glassfish application server.
 Oracle 10g and Oracle Multimedia.

The application is basically a prototype for a social network solution. It is a prototype because the
social networking features (chatting, messaging…) are not implemented. The application is simply
intended to showcase a solution for the problems mentioned above.

The proposed solutions for the problems:

The application’s primary goal is to find solutions to the problems proposed in the introduction.
It does so by using alternative ways of thinking and relying on third-party technologies.

The following is a list of the problems and their solutions:

 Create applications that rely on smaller volumes of data: Using Datalog and the IRIS
Reasoner to deduce knowledge.
 Create more intelligent applications: Using the IRIS Reasoner, the application can come
up with knowledge that is not stored in the database. Therefore we can say that the
application is ‘thinking’.
 Using existing data sources: Since Datalog and the IRIS Reasoner can be plugged into a
pre-existing infrastructure, the application relies on a traditional relational database but
extends it with logic programming and deduction features.

 Dealing with multimedia content: Using Oracle MultiMedia, the application can dig into
the content of an image for search and comparison purposes. This capability, coupled
with Datalog reasoning, sheds a new light on how multimedia content should be
handled in applications.
 Using the capabilities in today’s applications: The solution applies these features
within a very popular application type (social networks), proving that the work done can
become practical and not just theoretical.

The idea behind the application:

The application is based on a very simple concept:

‘Knowing a person’s parents, all his family members can be deduced’

Although this rule sounds too simple, it enables the construction of a person’s complete family
tree. The implications of this are tremendous:

 Less data is required. Only the mother and father are needed.
 Less data means less maintenance and fewer programming problems.
 The application acquires a certain level of artificial intelligence.

Knowing the mother and father the application can eventually deduce the identity of:

 Brothers and sisters.
 Grandparents.
 Uncles and aunts.
 Cousins

The list goes on: virtually any member of the family, no matter how distant, can
be identified by this approach.
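The deduction described above can be sketched in plain Java as follows. This is only an illustration of the principle, assuming that each person is identified by a unique name; the class and method names are hypothetical and this is not the application's actual code, which delegates the deduction to Datalog rules.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: deduces relatives from the stored mother and father of each person. */
final class FamilyDeductionSketch {

    // The only stored data: child -> father and child -> mother.
    private final Map<String, String> fatherOf = new HashMap<>();
    private final Map<String, String> motherOf = new HashMap<>();

    void addPerson(String name, String father, String mother) {
        if (father != null) fatherOf.put(name, father);
        if (mother != null) motherOf.put(name, mother);
    }

    /** People who share both parents with the given person. */
    Set<String> siblings(String person) {
        Set<String> result = new TreeSet<>();
        String f = fatherOf.get(person);
        String m = motherOf.get(person);
        if (f == null || m == null) return result;
        for (String other : fatherOf.keySet()) {
            if (!other.equals(person) && f.equals(fatherOf.get(other)) && m.equals(motherOf.get(other))) {
                result.add(other);
            }
        }
        return result;
    }

    /** The parents of the person's parents. */
    Set<String> grandparents(String person) {
        Set<String> result = new TreeSet<>();
        for (String parent : Arrays.asList(fatherOf.get(person), motherOf.get(person))) {
            if (parent == null) continue;
            if (fatherOf.get(parent) != null) result.add(fatherOf.get(parent));
            if (motherOf.get(parent) != null) result.add(motherOf.get(parent));
        }
        return result;
    }

    /** The siblings of the person's parents. */
    Set<String> unclesAndAunts(String person) {
        Set<String> result = new TreeSet<>();
        for (String parent : Arrays.asList(fatherOf.get(person), motherOf.get(person))) {
            if (parent != null) result.addAll(siblings(parent));
        }
        return result;
    }

    /** The children of the person's uncles and aunts. */
    Set<String> cousins(String person) {
        Set<String> result = new TreeSet<>();
        for (String relative : unclesAndAunts(person)) {
            for (Map.Entry<String, String> e : fatherOf.entrySet()) {
                if (relative.equals(e.getValue())) result.add(e.getKey());
            }
            for (Map.Entry<String, String> e : motherOf.entrySet()) {
                if (relative.equals(e.getValue())) result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FamilyDeductionSketch family = new FamilyDeductionSketch();
        family.addPerson("Dad", "Gus", "Hana");
        family.addPerson("Uncle", "Gus", "Hana");
        family.addPerson("Ann", "Dad", "Mom");
        family.addPerson("Bob", "Dad", "Mom");
        family.addPerson("Cara", "Uncle", "Wendy");
        System.out.println(family.siblings("Ann"));       // prints [Bob]
        System.out.println(family.grandparents("Ann"));   // prints [Gus, Hana]
        System.out.println(family.unclesAndAunts("Ann")); // prints [Uncle]
        System.out.println(family.cousins("Ann"));        // prints [Cara]
    }
}
```

Only five records with two parent references each are stored, yet siblings, grandparents, uncles and cousins can all be derived from them.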

A user wishing to join the application will need to provide the following information:

 First and last name.
 Gender.
 Birthdate.
 Mother and Father.
 Spouse.

The first and last name and the birthdate are used only for display. They don’t contribute to the
application’s logic. The gender, however, plays an important role because it helps determine
family relations.

The application’s brain:
As mentioned above, the application relies on the IRIS Reasoner to make logical deductions. With
a very small set of rules, the application can deduce many family relations. This behavior
is extensible, meaning that we can add as many rules as we want to identify family members.

The following is a listing of the rules that the application relies on:

male(?X).

female(?X).

father(?X,?Y).

mother(?X,?Y).

child(?X,?Y):-father(?Y,?X).

child(?X,?Y):-mother(?Y,?X).

brother(?X,?Y):- male(?X),father(?Z,?X),father(?Z,?Y),mother(?T,?X),mother(?T,?Y),not ?X=?Y.

sister(?X,?Y):- female(?X),father(?Z,?X),father(?Z,?Y),mother(?T,?X),mother(?T,?Y),not ?X=?Y.

paternalGrandFather(?X,?Y):- father(?X,?Z),father(?Z,?Y).

maternalGrandFather(?X,?Y):- father(?X,?Z),mother(?Z,?Y).

paternalGrandMother(?X,?Y):- mother(?X,?Z),father(?Z,?Y).

maternalGrandMother(?X,?Y):- mother(?X,?Z),mother(?Z,?Y).

uncle(?X,?Y):-male(?X),father(?Z,?Y),brother(?X,?Z).

uncle(?X,?Y):-male(?X),mother(?Z,?Y),brother(?X,?Z).

aunt(?X,?Y):-female(?X),father(?Z,?Y),sister(?X,?Z).

aunt(?X,?Y):-female(?X),mother(?Z,?Y),sister(?X,?Z).

This is all the application needs to deduce family members. If we want to identify a new family
member we need to add the corresponding rule to this set.

The beauty of IRIS Reasoner is that it integrates very easily with a relational database and
enables us to query it using Datalog. This feature enabled the IRIS Reasoner to communicate
with the Oracle database and fetch the data from its relational tables.

This can be done by adding a custom data source to the IRIS Reasoner’s configuration so the
engine can investigate this data source every time a Datalog query is issued.

The following is a code extract to demonstrate this:

Configuration configuration = KnowledgeBaseFactory.getDefaultConfiguration();

configuration.externalDataSources.add(new SQLDataSource());

SQLDataSource is a custom class implementing the IDataSource interface of IRIS. It provides
the following method:

public void get(IPredicate predicate, ITuple from, ITuple to, IRelation relation) {
// Test which predicate is being queried from Datalog.
// Construct the corresponding SQL query and execute it on the database.
// Retrieve the results and put them in the relation.
}

This approach translates queries on Datalog’s base predicates (such as father and mother) into SQL
Select-From-Where queries. Since those base predicates are used within other, derived rules, which
can themselves be combined or made recursive, we gain deductive power that simple
Select-From-Where queries do not offer.

For example, the paternalGrandFather relation is derived from two uses of the base predicate
father. Expressing this rule in SQL is not very obvious, but when it is broken down into two
separate father lookups, each can be executed as a normal SQL query while the reasoner combines
their results.
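The translation of a single father or mother lookup into SQL can be sketched as follows. This is a minimal illustration, not the project's SQLDataSource: the column names ID, FATHER_ID and MOTHER_ID are assumptions about the PERSONS table, and the method signature is hypothetical and does not use the IRIS types.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the column names are assumptions, not the project's real schema. */
final class PredicateToSqlSketch {

    /**
     * Builds a parameterised SQL query for father(parent, child) or mother(parent, child).
     * A null argument stands for an unbound Datalog variable; bound values are appended
     * to params in the order of their placeholders.
     */
    static String toSql(String predicate, Object parent, Object child, List<Object> params) {
        String parentColumn;
        switch (predicate) {
            case "father": parentColumn = "FATHER_ID"; break;
            case "mother": parentColumn = "MOTHER_ID"; break;
            default: throw new IllegalArgumentException("unsupported predicate: " + predicate);
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(parentColumn).append(", ID FROM PERSONS WHERE ")
                .append(parentColumn).append(" IS NOT NULL");
        if (parent != null) {
            sql.append(" AND ").append(parentColumn).append(" = ?");
            params.add(parent);
        }
        if (child != null) {
            sql.append(" AND ID = ?");
            params.add(child);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<Object> params = new ArrayList<>();
        // father(?X, 7): who is the father of person 7?
        System.out.println(toSql("father", null, 7L, params));
        // prints SELECT FATHER_ID, ID FROM PERSONS WHERE FATHER_ID IS NOT NULL AND ID = ?
        System.out.println(params); // prints [7]
    }
}
```

Each base predicate maps to one simple query; chaining the results of two such queries on the shared person is left to the reasoner.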

Therefore, when evaluating the paternalGrandFather rule, the IRIS Reasoner begins by
evaluating the first father predicate and then evaluates the second.

This rule can be easily read as:

‘X is the paternal grandfather of Y if X is the father of Y’s father’

The solution’s features:

The ‘Family Book’ application enables users to:

 Login.
 Register.
 View/Edit their profile.
 View Family members.
 Search For people.
 View/Add photos.
 Compare/Search photos.

UML modeling:

Use case diagram

Login sequence diagram

Register sequence diagram

View profile sequence diagram

Edit profile sequence diagram

View Family sequence diagram

View photos sequence diagram

Edit photos sequence diagram

Compare photos sequence diagram

Search photos sequence diagram

Add person sequence diagram

The solution’s components:
The application can be divided into two big parts:

 The web part.
 The source code part.

The web pages:

The application contains the following web pages:

 index.xhtml
 home.xhtml
 person-profile.xhtml
 person-family.xhtml
 photo-list.xhtml

index.xhtml

The application’s main page where the user can login

The application’s main page showing the registration form

home.xhtml

This is the page where the user is directed after he logs in

[Screenshot: the home page during a person search, showing each person's relation to the visitor]

person-profile.xhtml

This is the person’s profile where he can view and edit his information

person-family.xhtml

This is the page where the user can see how the application has deduced his family members

photo-list.xhtml

In this page the user can view the photos he has uploaded as well as upload new ones

The user here has chosen to compare those two pictures, and the application has returned the comparison score

The source code:

The application contains the following packages and classes:

 pji.ejb contains business logic classes

o ConceptService: Deals with RDF concepts stored in Sesame.
a) findAll() returns all concepts.

o FamilyService: Deals with family relations between persons.
a) findRelatedPerson(Person p1, FamilyRelation r) finds the person p2 that is related to p1
with the relation r.
b) findRelatedPersons(Person person, FamilyRelation relation) finds the list of persons that
are related to the given person with the given relation.
c) areRelated(Person p1, Person p2, FamilyRelation r) returns true if p1 and
p2 are related with the relation r.
d) findRelation(Person p1, Person p2) finds what relation, if any, joins p1
and p2.

o PersonService: Deals with person’s information.
a) create(Person p) persists the person p in the database.
b) find(Object id) finds the person with the corresponding id.
c) findByMatchingName(String name) finds all the persons whose names
match the given name.
d) findByGender(Gender g) finds all the persons who are of gender g.

o PhotoService: Deals with user’s photos.
a) add(InputStream i, String d, Person p) adds a new image read from the
input stream i, with the description d, for the person p.

b) findForPersonId(int id) finds all the photos that belong to the person with
the given id.
c) compare(int id1, int id2, double color, double texture, double shape,
double location) compares the two photos corresponding to the given
ids , according to the supplied weights.

o UserService: Deals with user related matters such as login and logout.
a) create(User u) persists the user u in the database.
b) findByName(String name) finds the user with the corresponding name.
c) login(User user) checks the user’s credentials and performs a log in.

 pji.ejb.datalog contains the Datalog logic classes

o Reasoner: Invokes the IRIS Engine to execute Datalog queries.
a) executeBooleanQuery(String query) returns true if the supplied Datalog
query returned a ‘yes’.
b) executeQuery(String query) returns the list of person ids returned after
the execution of the supplied query.

o SQLDataSource: Acts as a bridge between IRIS and Oracle database.
a) get(IPredicate predicate, ITuple from, ITuple to, IRelation relation)
converts the Datalog query into SQL and fills the relation with data
coming from Oracle.
o rules.txt: Contains the Datalog rules that the application acts upon.

 pji.entities contains the entity classes mapped to the database tables

o FamilyRelation: Lists the possible family relations that can exist between two
people.
o Gender: Can be either male or female.
o Person: Represents a person with a profile and relations to other persons.
o Photo Represents a profile picture stored in the database as a BLOB. This is
independent from the photos that the person can upload.
o User Represents the credentials with which a person logs into the application.
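As an illustration of how the four double parameters of PhotoService.compare could be turned into the weights string used by the evaluateScore example, here is a minimal sketch. The class and method names are hypothetical, and the range check assumes that each weight lies between 0 and 1 with at least one of them positive; this is not the application's actual code.

```java
/** Illustrative only: formats weights in the style of the evaluateScore example. */
final class WeightsStringSketch {

    static String weights(double color, double texture, double shape, double location) {
        double total = 0.0;
        for (double w : new double[] {color, texture, shape, location}) {
            // Assumption: every weight lies in the range [0, 1].
            if (w < 0.0 || w > 1.0) {
                throw new IllegalArgumentException("each weight is assumed to lie in [0, 1]");
            }
            total += w;
        }
        // Assumption: a comparison needs at least one criterion to take part.
        if (total == 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return "color=" + color + ",texture=" + texture
                + ",shape=" + shape + ",location=" + location;
    }

    public static void main(String[] args) {
        System.out.println(weights(1.0, 0.0, 0.0, 0.0));
        // prints color=1.0,texture=0.0,shape=0.0,location=0.0
    }
}
```

The resulting string can then be passed as the weights argument of the comparison call.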

The database structure:

The application communicates with an Oracle database named ‘ORCL’. It contains the following
tables:

 GENDERS
 PERSONS
 PHOTOS
 PROFILE_PHOTOS
 USERS

And here’s their structure:

Persons Users

Genders

Photos

Profile_Photos

The database also contains the following stored procedure that adds a photo in the Photos
table

CREATE OR REPLACE PROCEDURE ADD_PHOTO (description IN VARCHAR2, person_id IN NUMBER, id OUT NUMBER) AS
BEGIN
INSERT INTO photos VALUES (photo_seq.nextval, description,
ordsys.ordimage.init(), ordsys.ordimagesignature.init(), person_id)
RETURNING id INTO id;
END ADD_PHOTO;

III. Conclusion and future work

At the beginning of this report, many questions were asked. Those questions revolved around
data-centric issues: mainly, how to reduce the data volume and make applications smarter,
and how to treat multimedia content differently from ordinary textual data.

In the course of this project, I researched many existing solutions and tools that try to answer
those questions, but none of them answers all of them. That is why I decided to build a custom
solution on top of these tools. The solution keeps using a traditional relational database while
integrating the powerful Datalog language on top of it. This approach gave me the benefits of
Datalog with the flexibility of relational databases. I also investigated alternative data
sources such as RDF, which I used along with Sesame to classify my multimedia content according
to concepts. In addition, I used Oracle Multimedia to compare multimedia items (images)
according to their content (shape, color, location and texture).

As an improvement to my solution, I believe we need to integrate deduction into more aspects of
the application (not just the family relations). We also need to integrate RDF further to make
more use of its powerful constructs, and possibly use OWL for ontologies. A last question
that arises is whether all those features will one day make it into real-life applications, or
whether they will always remain for academic purposes. And if they do make that transition, what
will happen to our current tools? What future is there for SQL databases if Datalog comes to
dominate the market? Can they peacefully coexist?

List of figures

 Figure 1 : Oracle Database architecture

http://download.oracle.com/docs/html/B10163_01/img/ntqrf003.gif

 Figure 2 : Oracle Multimedia architecture

http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10777/img/imurg002.gif

 Figure 3 : Sesame architecture

http://www.openrdf.org/doc/sesame2/2.3.2/users/figures/sesame-components.png

List of references
i
Deductive Databases and Their Applications
Robert Colomb CRC Press, USA
ii
http://en.wikipedia.org/wiki/SQL

iii
http://pages.cs.wisc.edu/~dbbook/openAccess/thirdEdition/slides/slides3ed-english/Ch25_DedDB-95.pdf

iv
http://en.wikipedia.org/wiki/Datalog

v
http://en.wikipedia.org/wiki/Deductive_database
vi
http://www.javapassion.com/portal/images/pdf_files/javaee/javaee_overview.pdf

vii
http://en.wikibooks.org/wiki/Java_Persistence/What_is_JPA%3F

viii
http://en.wikipedia.org/wiki/JavaServer_Faces

http://www.oracle.com/technetwork/java/javaee/overview-140548.html

ix
http://primefaces.org/documentation.html

x
http://glassfish.java.net/

http://en.wikipedia.org/wiki/Glassfish

xi
http://netbeans.org/features/index.html

http://netbeans.org/about/

http://en.wikipedia.org/wiki/NetBeans

xii
http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/intro.htm#i57253

http://en.wikipedia.org/wiki/Oracle_Database
xiii
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10777/toc.htm

xiv
http://www.iris-reasoner.org/pages/user_guide.pdf

xv
http://www.openrdf.org/doc/sesame2/2.3.2/users/userguide.html#d0e14

Appendices

Oracle Multimedia with Java API example

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import java.io.IOException;

import oracle.jdbc.OracleResultSet;
import oracle.jdbc.OraclePreparedStatement;

import oracle.ord.im.OrdImage;

public class InterMediaQuickStart {
/**
* Entry point.
* usage: java InterMediaQuickStart connectionString username password
* e.g. java InterMediaQuickStart jdbc:oracle:oci:@inst1 scott tiger
**/
public static void main(String[] args) throws Exception {
System.out.println("starting interMedia Java quickstart...");
if (args.length != 3) {
System.out.println("usage: java InterMediaQuickStart connectionString username password");
System.out.println("e.g. java InterMediaQuickStart jdbc:oracle:oci:@inst1 scott tiger");
System.exit(-1);
}

// Run the examples as they are printed in the quick start documentation

// register the oracle jdbc driver with the JDBC driver manager
DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());

Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
System.out.println("Got database connection");

System.out.println("Starting verbatim interMedia Java Quick Start code...");

// Note: it is CRITICAL to set the autocommit to false so that
// two-phase select-commit of BLOBs can occur.
conn.setAutoCommit(false);

// create a JDBC Statement object to execute SQL in the database
Statement stmt = conn.createStatement();

// -------------------
// Creating a Table
// -------------------
{
// Create the image_table table with two columns: id and image
String tableCreateSQL = "create table image_table " +
"(id number primary key, " +
"image ordsys.ordimage)";
stmt.execute(tableCreateSQL);
}

// -------------------
// Uploading Images from Files into Tables
// -------------------
{
// insert a row into image_table
String rowInsertSQL = ("insert into image_table (id, image) values (1,ordsys.ordimage.init())");
stmt.execute(rowInsertSQL);

// select the new ORDImage into a java proxy OrdImage object (imageProxy)
String rowSelectSQL = "select image from image_table where id = 1 for update";
OracleResultSet rset = (OracleResultSet)stmt.executeQuery(rowSelectSQL);
rset.next();
OrdImage imageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
rset.close();

imageProxy.loadDataFromFile("goats.gif");
imageProxy.setProperties();

String updateSQL = "update image_table set image=? where id=1";
OraclePreparedStatement opstmt = (OraclePreparedStatement)conn.prepareStatement(updateSQL);
opstmt.setORAData(1, imageProxy);
opstmt.execute();
opstmt.close();
}

// -------------------
// Retrieving Image Properties
// -------------------

// Java Accessor Methods
{
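String rowSelectSQL = "select image from image_table where id = 1";
OracleResultSet rset = (OracleResultSet)stmt.executeQuery(rowSelectSQL);
rset.next();
// get the proxy for the image in row 1 before reading its properties
OrdImage imageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
rset.close();
int height = imageProxy.getHeight();
int width = imageProxy.getWidth();
System.out.println("proxy (height x width) = " + height + " x " + width);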
}

// -------------------
// Creating Thumbnails and Changing Formats
// -------------------
{
// One could significantly reduce the number of round trip
// database communications in the following example.
String rowInsertSQL = ("insert into image_table (id, image) " +
"values (2, ordsys.ordimage.init())");
stmt.execute(rowInsertSQL);

// get the source ORDImage object
String srcSelectSQL = "select image from image_table where id=1";
OracleResultSet rset = (OracleResultSet)stmt.executeQuery(srcSelectSQL);
rset.next();
OrdImage srcImageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
rset.close();

// get the newly inserted destination ORDImage object
String dstSelectSQL = "select image from image_table where id=2 for update";
rset = (OracleResultSet)stmt.executeQuery(dstSelectSQL);
rset.next();
OrdImage dstImageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
rset.close();

// call the processCopy method (processing occurs on the SERVER)
srcImageProxy.processCopy("maxscale=100 100 fileformat=jfif", dstImageProxy);

// update the destination image in the second row
String dstUpdateSQL = "update image_table set image=? where id=2";
OraclePreparedStatement opstmt = (OraclePreparedStatement)conn.prepareStatement(dstUpdateSQL);
opstmt.setORAData(1, dstImageProxy);
opstmt.execute();
opstmt.close();
}

// -------------------
// Downloading Image Data from Tables into Files
// -------------------
{
// export the data in row 2
String exportSelectSQL = "select image from image_table where id = 2";
OracleResultSet rset = (OracleResultSet)stmt.executeQuery(exportSelectSQL);

// get the proxy for the image in row 2
rset.next();
OrdImage imageProxy = (OrdImage)rset.getORAData("image", OrdImage.getORADataFactory());
rset.close();

// call the getDataInFile method to write the ORDImage in row 2 to disk
imageProxy.getDataInFile("row2.jpg");
}

// -------------------
// Cleaning up
// -------------------
{
// drop the images table (DDL, so execute is used rather than executeQuery)
stmt.execute("drop table image_table");

// commit all our changes
conn.commit();
}

stmt.close();
System.out.println("Done with verbatim interMedia Java Quick Start code.");

// close the database connection to release all the resources
conn.close();
return;
}
}
