Chapter 8
An Object Approach for Web Multimedia
Presentations
Jérôme Gensel (1), Philippe Mulhem (2), Hervé Martin (1)
(1) LSR-IMAG, Grenoble, France; (2) IPAL-CNRS, Singapore
Abstract: This paper deals with the coupling of V-STORM, which is both a
video manager and a multimedia presentation system, with AROM, an
object-based knowledge representation system. We first present an
AROM knowledge base, called the AVS model, which constitutes a
generic model for multimedia presentations. The resulting model
encompasses any multimedia presentation described using the SMIL
standard. By instantiating this knowledge base, the author describes
her/his multimedia presentation and the way media objects interact in
it. Then, the corresponding SMIL file is generated and sent to
V-STORM in order to be played. This coupling proves relevant for
two reasons: first, through its UML-like formalism, AROM eases the task
of a multimedia presentation author; second, AROM is put in charge
of checking the spatial and temporal consistency of the presentation
during its description. This way, a consistent presentation is sent to
V-STORM. Lastly, we present another AROM model which describes
the notion of template, a logical presentation unit which merges
database queries with spatio-temporal constraints.
Key words: Multimedia presentations, Videos, Knowledge Representation, SMIL,
Template
1. INTRODUCTION
In the last decade, multimedia and, more particularly, video systems have
benefited from a tremendous research interest. The main reason for this is
the increasing ability computers now have for supporting video data, notably
thanks to unceasing improvements in data compression formats (such as
MPEG-2 and MPEG-4), in network transfer rates and operating systems [1],
and in disk storage capacity. Unsurprisingly, new applications have arisen, such as
video on demand, video conferencing and home video editing which directly
benefit from this evolution. Following this trend, research efforts ([2], [3])
have been made to extend DataBase Management Systems (DBMS) so that
they support video data types not simply through Binary Large Objects
(BLOB). Indeed, DBMS seem to be well-suited systems for tackling
problems posed by the video, namely storage, modeling, querying and
presentation. Video data types must be physically managed apart from other
conventional data types in order to fulfill their performance requirements.
Video modeling must take into account the hierarchical structure of a video
(shots, scenes and sequences) and allow overlapping and disjoint segment
clustering [4]. The video query language must (in accordance with MPEG-7
descriptions [5]) allow one to query video content using textual annotations
or computed signatures (colour, shape, texture…) and deal with the dynamics
(movements) of objects in the scenes as well as with semi-structural aspects
of videos and, finally, must offer the possibility of creating new videos.
We have designed and implemented V-STORM [6], a video system which
captures video data in an object DBMS. The V-STORM model considers
video data from different perspectives (represented by class hierarchies):
physical (as a BLOB), structural (a video is made up of shots which are
themselves composed of scenes which can be split into sequences),
composition (for editing new videos using data already stored in the
database), semantics (through an annotation, a video segment is linked to a
database object or a keyword). V-STORM uses and extends the O2 object
DBMS and comes as a tool for formulating queries on videos, composing a
video using the results of queries, and generating video abstracts. V-STORM
can play videos (or segments thereof) from its database but also virtual
videos (or segments thereof) composed through an O2 interface. Moreover, it
is possible to use V-STORM as a multimedia player for presentations
described using the SMIL [7] standard. In this way, V-STORM belongs to the
family of multimedia presentation software like GRiNS [8] or RealNetworks G2 [9].
We show how AROM [10], an object-based knowledge representation
system, can be used to help a V-STORM user to build, in a more declarative
way using templates [11], a multimedia presentation by instantiating a
knowledge base rather than by writing a SMIL file. Then, we show how both
spatial and temporal consistencies of multimedia presentation expressed by
template declarations can be maintained by AROM.
The paper is organized as follows: sections 2 and 3 present the
V-STORM and AROM systems, respectively; section 4 describes the AVS model,
an AROM knowledge base which corresponds to a general multimedia
presentation structure; section 5 details the template model used for the
definition of multimedia presentations; section 6 reviews related work
before we conclude in section 7.
2. THE V-STORM SYSTEM
V-STORM differentiates between the raw video stored in the database
and the video which is watched and manipulated by end-users. From a user
point of view, a video is a continuous media which can be played, stopped,
paused, etc. From a DBMS point of view, a video is a complex object
composed of an ordered sequence of frames, each having a fixed display
time. This way, new virtual videos can be created by assembling frames
from different segments of videos.
Figure 1. The V-STORM architecture. Through the video composer interface, a
video is described by the user and translated into OQL so that its component
video segments can be sought in the O2 video database. Then, the video is
played by the V-STORM video player.
In V-STORM, the Object Query Language (OQL) [12] is used (see
Figure 1) to extract video segments to compose virtual videos. Video query
expressions are stored in the databases and the final video is generated at
presentation time. This approach avoids data replication. A video query
returns either a video interval which is a continuous sequence of frames
belonging to the same video, or a whole video, or an excerpt of a raw video
(by combination of the two previous cases), or a logical extract of a video
stemming from various raw videos.
Video composition in V-STORM is achieved using a set of algebraic
operators. A virtual video can be the result of the concatenation, or the
concatenation without duplication (union), or the intersection, or the
difference of two videos, but also the reduction (by elimination of duplicate
segments) or the finite repetition of a single video. Annotations in
V-STORM are used to describe salient objects or events appearing in the video.
Annotations can be declared at each level of the video hierarchy. They are
manually created by the users through an annotation tool. V-STORM also
integrates an algorithm to automatically generate video abstracts. Video
abstracts aim at minimizing the time spent watching a video in search of a
particular segment. The user provides some information concerning the
expected abstract: its source (one or more videos), its duration, its structure
(which reflects the structure of the video), and its granularity (since some
video segments might be more relevant than others). Finally, in order to open
V-STORM to multimedia presentation standards, we have developed
a SMIL parser (see Figure 2) so that V-STORM can read a SMIL document
and play the corresponding presentation. Interactivity is also possible, since
V-STORM handles anchors for hypermedia links during presentations.
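The algebraic composition operators described above (concatenation, union, intersection, difference, reduction, repetition) can be sketched on a simplified representation of videos as ordered lists of frame identifiers. This representation and the function names are our illustration, not V-STORM's actual implementation:

```python
# Sketch of V-STORM-style composition operators on videos represented
# as ordered lists of frame identifiers. Simplified and illustrative,
# not V-STORM's actual data model.

def concat(a, b):
    """Concatenation: b appended after a, duplicates kept."""
    return a + b

def union(a, b):
    """Concatenation without duplication of frames already in a."""
    return a + [f for f in b if f not in a]

def intersection(a, b):
    """Frames of a that also appear in b, in a's order."""
    return [f for f in a if f in b]

def difference(a, b):
    """Frames of a that do not appear in b."""
    return [f for f in a if f not in b]

def reduction(a):
    """Elimination of duplicate frames, keeping first occurrences."""
    seen, out = set(), []
    for f in a:
        if f not in seen:
            seen.add(f)
            out.append(f)
    return out

def repetition(a, n):
    """Finite repetition of a single video."""
    return a * n

v1, v2 = [1, 2, 3], [3, 4]
assert concat(v1, v2) == [1, 2, 3, 3, 4]
assert union(v1, v2) == [1, 2, 3, 4]
assert intersection(v1, v2) == [3]
assert difference(v1, v2) == [1, 2]
assert reduction([1, 1, 2]) == [1, 2]
assert repetition(v2, 2) == [3, 4, 3, 4]
```

Because query expressions (rather than frame data) are stored, these operators compose lazily at presentation time, which is what avoids data replication.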
Figure 2. V-STORM can also be used as a multimedia presentation player. The
presentation is described in SMIL, sent to a parser, and played by the
V-STORM video player.
The parser checks the validity of the SMIL document against the SMIL
DTD (extended to support the new temporal operations carried out by
V-STORM). Then the different SMIL elements are translated into V-STORM
commands and the video is displayed. Currently, this parser is limited and
does not exploit all the V-STORM functionalities concerning operations on
videos. The work presented here extends the description of SMIL-like
multimedia presentations in order to better exploit V-STORM capabilities.
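To fix ideas, a minimal SMIL 1.0 document of the kind such a parser accepts can be assembled and checked for well-formedness as follows. The region name and media paths are illustrative placeholders, not taken from the paper:

```python
# Build a minimal SMIL 1.0 document and check that it is well-formed
# XML. Region names and media paths are illustrative placeholders.
import xml.etree.ElementTree as ET

smil_doc = """\
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="video_area" top="0" left="0" width="320" height="240"/>
    </layout>
  </head>
  <body>
    <seq>
      <video src="intro.mpg" region="video_area" dur="10s"/>
      <par>
        <video src="lab.mpg" region="video_area"/>
        <audio src="comment.wav"/>
      </par>
    </seq>
  </body>
</smil>
"""

root = ET.fromstring(smil_doc)   # raises ParseError if malformed
assert root.tag == "smil"
assert root.find("body/seq/video").get("region") == "video_area"
```

A DTD validation step, as performed by the V-STORM parser, would go beyond this well-formedness check and verify element nesting and attributes against the (extended) SMIL DTD.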
3. THE AROM SYSTEM
Object-Based Knowledge Representation Systems (OBKRS) are known
to be declarative systems for describing, organizing and processing large
amounts of knowledge. In these systems [13], once built, a knowledge base
(KB) can be exploited through various and powerful inference mechanisms
such as classification, method calls, default values, filters, etc. AROM
(which stands for Associating Relations and Objects for Modeling) is a new
OBKRS which departs from others in two ways. First, in addition to classes
(and objects) which often constitute the unique and central representation
entity in OBKRS, AROM uses associations (and tuples), similar to those
found in UML [14], to describe and organize links between objects having
common structure and semantics. Second, in addition to the classical
OBKRS inference mechanisms, AROM integrates an algebraic modeling
language (AML) for expressing operational knowledge in a declarative way.
The AML is used to write constraints, queries, numerical and symbolic
equations involving the various elements of a KB.
A class in AROM describes a set of objects sharing common properties
and constraints. Each class is characterized by a set of properties called
variables and by a set of constraints. A variable denotes a property whose
basic type is not a class of the KB. Each variable is characterized by a set of
facets (domain restriction facets, inference facets, and documentation facets).
Expressed in the AML, constraints are necessary conditions for an object to
belong to the class. Constraints bind together variables of – or reachable
from – the class. The generalization/specialization relation is a partial order
that organizes classes in a hierarchy, supported by a simple inheritance
mechanism. An AROM object represents a distinguishable entity of the
modeled domain. Each object is attached to exactly one class at any moment.
In AROM, like in UML, an association represents a set of similar links
between n (n ≥ 2) classes, being distinct or not. A link contains objects of the
classes (one for each class) connected by the association. An association is
described by means of roles, variables and constraints. A role corresponds to
the connection between an association and one of the classes it connects.
Each role has a multiplicity, whose meaning is the same as in UML. A
variable of an association denotes a property associated with a link and has
the same set of available facets as a class variable. A tuple of an n-ary
association having m variables vi (1 ≤ i ≤ m) is the (n+m)-uple made up of
the n objects of the link and of the m values of the variables of the
association. A tuple is an "instance" of an association. Association
constraints involve variables or roles and are written in the AML, and must
be satisfied by every tuple of the association. Associations are organized in
specialization hierarchies. See Tables 1 and 2 and Figure 5 for textual and
graphical sketches of an AROM KB dedicated to multimedia presentations.
First introduced in Operations Research, algebraic modeling languages
(AMLs) make it possible to write systems of equations and/or constraints
in a formalism close to mathematical notation. They support the use of
indexed variables and expressions, quantifiers and iterated operators like ∑
(sum) and ∏ (product), in order to build expressions such as
∀i ∈ I, x_i = ∑_{j∈J} x_{ij}. AMLs have been used for linear and non-linear
programming, for discrete-time simulation, and recently for constraint
programming [15]. In
AROM, the AML is used for writing equations, constraints, and
queries. AML expressions can be built from the following elements:
constants, indices and indexed expressions, operators and functions, iterated
operators, quantified expressions, variables belonging to classes and
associations, and expressions that give access to the tuples of an
association. An AML interpreter solves systems of (non-simultaneous)
equations and processes queries. Written in Java 1.2, AROM is available as a
platform for knowledge representation and exploitation. It comprises an
interactive modeling environment, which allows one to create, consult, and
modify an AROM KB; a Java API, an interpreter for processing queries and
solving sets of (non-simultaneous) equations written in AML, and
WebAROM, a tool for consulting and editing a KB through a Web browser.
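As a rough illustration of what an indexed-sum constraint of the form ∀i ∈ I, x_i = ∑_{j∈J} x_{ij} means operationally (this is plain Python, not AROM's AML syntax or API), such a constraint can be checked over concrete values as follows:

```python
# Sketch: checking an indexed constraint of the form
#   forall i in I: x[i] == sum over j in J of x_ij[i][j]
# Names and data are illustrative; this is not AROM's AML interpreter.
def check_sum_constraint(x, x_ij, I, J, tol=1e-9):
    """Return the list of indices i for which the constraint fails."""
    return [i for i in I
            if abs(x[i] - sum(x_ij[i][j] for j in J)) > tol]

x = {1: 5.0, 2: 3.0}
x_ij = {1: {'a': 2.0, 'b': 3.0},   # row 1 sums to 5.0: consistent
        2: {'a': 1.0, 'b': 1.0}}   # row 2 sums to 2.0, not 3.0
violations = check_sum_constraint(x, x_ij, I=[1, 2], J=['a', 'b'])
assert violations == [2]
```

In AROM, the AML interpreter performs this kind of evaluation over the variables and tuples of a knowledge base rather than over plain dictionaries.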
4. COUPLING AROM AND V-STORM
As mentioned above, multimedia scenarios played by V-STORM can be
described using SMIL. The starting point of this study is twofold. We aim
first at providing a UML-like model in order to ease the description of a
multimedia presentation and, second, at reinforcing consistency regarding
spatial and especially temporal constraints between the components of a
multimedia presentation. It is our conviction that SMIL, like XML [16], is
not an intuitive knowledge representation language: one needs to be
familiar with its syntax before reading or writing a document and
understanding its structure. So, we propose the AVS (AROM/V-STORM) model (see
Figures 3 and 4), which consists of an AROM knowledge base whose
structure incorporates any SMIL element used in the description of a
multimedia presentation. This way, we provide a V-STORM user with an
operational UML-like model for describing her/his multimedia presentation.
Using an entity/relation (or class/association) approach for modeling is now
a widely accepted approach, and UML has become a standard. Through the
AROM Interactive Modeling Environment, the graphical representation of the
classes and associations which constitute the AVS model gives the user a
more intuitive idea of the structure of her/his presentation. Moreover, taking
advantage of AROM's AML and type checking, the user can be informed
about the spatial and temporal consistencies of her/his presentation.
4.1 An AROM Model for Multimedia Presentations
Since V-STORM can play any presentation described with SMIL, our
AROM model for multimedia presentation is SMIL compliant. This means
that it incorporates classes and associations corresponding to every element
that can be found in the structure of a SMIL document. However, the main
objective of the AVS model is to give the user the opportunity to invoke any
kind of operation V-STORM can perform on a video.
In the AVS model, the various features of a multimedia presentation are
modeled using classes and associations (see Table 1) which represent the
principal elements of SMIL. The class Presentation gives the most
general information about the multimedia presentation. The spatial
formatting, which describes the way displayable objects are placed in the
presentation window, is described by objects of the Layout class,
in accordance with the SMIL recommendation. When a presentation gathers
more than one layout, V-STORM chooses the first layout that matches the
user preferences. This way, V-STORM permits some adaptability
concerning the characteristics of the machine on which the presentation is
played. A layout can be associated with a root-layout and several regions
(described respectively by classes RootLayout, Region and associations
HasRootLayout and HasRegion) where the media objects appear.
Figure 3. A view of the Interactive Modeling Environment through which the AVS model can
be instantiated. On the left, a view of the class and association hierarchies. On the right, the
UML-like graphical description of the model. From this graphical description, a textual
description is automatically generated, ready to be parsed. For instance, one can find the
corresponding textual description of the class Element in Table 1.
class: RootLayout
variables:
variable: b_color
type: string
variable: title
type: string
variable: height
type: integer
variable: width
type: integer
class: Region
variables:
variable: b_color
type: string
variable: fit
type: string
variable: title
type: string
variable: top
type: integer
variable: height
type: integer
variable: width
type: integer
class: CommonAttributes
variables:
variable: abstract
type: string
variable: author
type: string
variable: begin
type: float
default: 0
variable: end
type: float
definition:
end=begin+dur
variable: dur
type: float
default: 0
definition:
dur=end-begin
variable: region
type: string
variable: repeat
type: integer
variable: s_bitrate
type: integer
variable: s_caption
type: boolean
variable: s_language
type: list-of string
cardinality:
min:0
max: *
class: Block
super-class:
CommonAttributes
variables:
variable: sync
type: string
default: "seq"
variable: endsync
type: string
class: Element
super-class:
CommonAttributes
variables:
variable: media
type: string
variable: src
type: string
variable: alt
documentation: "specifies an
alternate text, if the media
can not be displayed"
type: string
variable: fill
documentation: "if fill=true
so freeze else remove"
type: boolean
association: CBE
roles:
role: block
type: Block
multiplicity:
min: 0
max: *
role: element
type: Element
multiplicity:
min: 0
max: *
Table 1. An excerpt of the AROM textual description showing 7 classes and 1 association of
the AVS model for multimedia presentations. In the CommonAttributes abstract class, a
definition is given for the end and dur variables using the AML.
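The two AML definitions in CommonAttributes (end = begin + dur and dur = end - begin) let the system infer a missing temporal value from the other two. A minimal sketch of this inference, in our own code rather than AROM's engine:

```python
# Sketch: inferring the missing one of begin/end/dur from the
# AML-style definitions end = begin + dur and dur = end - begin.
# This mimics the behavior; it is not AROM's actual inference engine.
def complete_timing(begin=None, end=None, dur=None):
    """Fill in whichever of begin/end/dur is missing, given the other two."""
    if end is None and begin is not None and dur is not None:
        end = begin + dur
    if dur is None and begin is not None and end is not None:
        dur = end - begin
    if begin is None and end is not None and dur is not None:
        begin = end - dur
    return begin, end, dur

assert complete_timing(begin=2.0, dur=8.0) == (2.0, 10.0, 8.0)
assert complete_timing(begin=2.0, end=10.0) == (2.0, 10.0, 8.0)
assert complete_timing(end=10.0, dur=8.0) == (2.0, 10.0, 8.0)
```

This bidirectional completion is what allows an author to specify any two of the three attributes and still obtain a fully determined temporal placement.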
Concerning the time model, a V-STORM presentation is made up of
blocks. Each block can contain other blocks and/or media objects. Basic
media objects supported by V-STORM are continuous media with an
intrinsic duration (video, audio…) or discrete media without an intrinsic
duration (text, image…). The variable sync in the Block class determines
the temporal behavior (namely parallel or sequential presentation) of the
elements in the block, depending on its value (seq or par). Three pieces of
temporal information can be associated with a media object or a block: its duration
(variable dur), and its begin and end times (variables begin and end). When no
value is specified, the duration of a discrete object is null
and the duration of a continuous object is its natural duration. The semantics
concerning the effective beginning of objects linked to a parallel or
sequential block is the same as the one defined in the SMIL
recommendation. Also, every date associated with an object must be defined
as a float value. This is not a limitation, since the model allows a media
object to be associated with a set of reaction methods (start, end, load…)
executed in response to events (click, begin, end…) triggered by other objects.
Compared with an authoring language like GRiNS, this event-reaction
mechanism offers more synchronization possibilities between objects and,
through the AVS model, it is easier and more intuitive to express a temporal
scenario. The knowledge
base contains a Switch class in charge of adapting the presentation to the
system capabilities and settings. The variables found in this class
(s_bitrate, s_caption, s_language…) are equivalent to the
attributes of the switch element in SMIL. The player will play the first
element of the switch acceptable for presentation. Finally, the two kinds of
navigational links proposed by SMIL (a and anchor) and allowing
interactivity during a presentation, are represented in the knowledge base by
the A_Link and Anchor_Link classes. The power of the event-reaction
mechanism implemented in V-STORM allows an author to define more
powerful and intuitive user interactions than in SMIL. For
instance, a media object can start some time after a click on another object,
and it can end just after the load of a new object.
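The event-reaction mechanism can be pictured as a simple dispatcher in which a media object registers a reaction method for an event raised by another object. The class and method names below are our own illustration, not V-STORM's API:

```python
# Sketch of an event-reaction mechanism: an object registers reaction
# methods (start, end, ...) for events (click, begin, end, ...) raised
# by other objects. Names are illustrative, not V-STORM's API.
class MediaObject:
    def __init__(self, name):
        self.name = name
        self.reactions = {}   # (source name, event) -> reaction method
        self.log = []         # reactions actually triggered, in order

    def on(self, source, event, method):
        """Register a reaction method for an event raised by source."""
        self.reactions[(source.name, event)] = method

    def notify(self, source, event):
        """Deliver an event; trigger the registered reaction, if any."""
        method = self.reactions.get((source.name, event))
        if method:
            self.log.append(method)

video = MediaObject("video1")
caption = MediaObject("caption1")
# The caption starts when the video is clicked, and ends when it ends.
caption.on(video, "click", "start")
caption.on(video, "end", "end")

caption.notify(video, "click")
caption.notify(video, "end")
assert caption.log == ["start", "end"]
```

Because reactions are attached per (source, event) pair, synchronization patterns that SMIL cannot express directly (such as starting a medium on a click elsewhere) reduce to one registration each.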
4.2 Building a multimedia presentation
To build a multimedia presentation, a V-STORM user simply has to
instantiate the AROM KB. For a local KB, this can be done either by using
the AROM Interactive Modeling Environment (see Figure 3 and Figure 4),
or by completing the ASCII document describing the KB (like in Table 1),
or by using the Java API of AROM in a program. For a distant KB, this can
be done through a web browser using WebAROM. Since this instantiation is
made under the control of AROM (type checking, multiplicity constraint
satisfaction, …), both the spatial and temporal consistencies of the described
presentation are guaranteed.
Once this instantiation is performed, an AROM-SMIL parser we have
written is launched and the resulting SMIL file is sent to the SMIL parser of
V-STORM (see Figure 5).
Figure 4. The editor for instantiating the AVS model
Figure 5. The architecture of the AROM/V-STORM coupling.
4.3 Benefits of the AVS model
The coupling between V-STORM and AROM combines the video
management richness of the former with the expressive and modeling power of
the latter. Compared with the classical specification and presentation of
multimedia documents, this coupling offers several advantages.
- UML-like description: The AVS model is described in a graphical
notation close to UML. Object-oriented analysis and design methods
have shown the relevance of using graphical notations to improve
communication between all the actors of a design process (for instance,
in a collaborative publication task between several authors).
- Modularity and reuse: The author can edit parts of the presentation
independently and group them to compose her/his documents, just by
manipulating AROM objects. This object approach allows the reuse of
existing blocks to compose new presentations, saving a large amount of
work in the design phase.
- Object identity: In an AROM KB, each object has a unique identifier.
This property prevents inconsistencies due to the assignment of the same
identifier to two different media objects. For instance, the names given to
regions are checked for existence.
- Consistency maintenance: When a presentation contains inconsistencies,
for instance when it says that an object B starts at the end of an object A,
while an object C starts at the end of B and C starts at the same time as
A, classical multimedia systems either ignore such inconsistencies or merely
warn about them at display time. Here, temporal checking is performed
by AROM during the construction of the presentation, and the author is
warned about such inconsistencies. This guarantees that a consistent
document is sent to the presentation system. This static checking also
makes it possible to obtain a global trace of the presentation or a
timeline view, which aligns all events on a single time axis.
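The inconsistency in the example above (B starts at the end of A, C starts at the end of B, yet C also starts together with A) can be caught by propagating start times and checking for contradictions. A sketch under our own constraint encoding, not AROM's AML solver:

```python
# Sketch: detect a temporal inconsistency by propagating constraints
# of the form start(X) = end(Y) or start(X) = start(Y).
# Illustrative only; AROM expresses such checks as AML constraints.
def consistent(durations, constraints, anchor):
    """constraints: list of (kind, x, y) with kind in
    {'starts_at_end_of', 'starts_with'}. False on contradiction."""
    start = {anchor: 0}
    changed = True
    while changed:
        changed = False
        for kind, x, y in constraints:
            if y not in start:
                continue
            t = start[y] + (durations[y] if kind == 'starts_at_end_of' else 0)
            if x not in start:
                start[x] = t
                changed = True
            elif start[x] != t:
                return False          # two incompatible start times for x
    return True

durations = {'A': 10, 'B': 5, 'C': 5}
constraints = [('starts_at_end_of', 'B', 'A'),   # B starts when A ends
               ('starts_at_end_of', 'C', 'B'),   # C starts when B ends
               ('starts_with', 'C', 'A')]        # ...but C also starts with A
assert consistent(durations, constraints, anchor='A') is False
assert consistent(durations, constraints[:2], anchor='A') is True
```

Running such a check at description time, rather than at display time, is what lets the author repair the scenario before anything is sent to the player.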
- Virtual videos: In addition to raw videos, the author can include in the
presentation virtual videos. They correspond to video objects having no
value for their src variable. Associations (Extraction,
Reduction, Repetition, BinaryOperation) corresponding to the
V-STORM operations for creating virtual videos have been introduced
in the KB. Once these associations are instantiated, their tuples link a
virtual video to the video(s) (raw or virtual) it is derived from.
- Keywords and video abstracts: It is possible to use the keywords
variable to annotate and to formulate queries on the content of a video.
Moreover, the model includes an AbstractOf association in order to
link an abstract (an object of the VAbstract class), possibly having a
given duration, to a video. Thus, the video can be replaced by its abstract
during the presentation. A VAbstract object can be created manually
or automatically using the AROM API and the V-STORM video abstract
generator.
5. THE TEMPLATE MODEL
We present here the model that supports the logical presentation of
database video objects. First, we introduce the concept of Template. A
Template is a logical presentation unit. Any Template has a unique identifier
and can be composed of several components. A component is either a
template or a query expression specified using the OQL query language [17].
Template definitions specify spatio-temporal constraints between
components. We present in this section the structure of a Template, and then
we introduce its spatial and temporal features.
5.1 Template definition
A template is defined by its name, its structure and temporal and spatial
constraints. A template T is formally defined as a quadruplet [id, ip, c, p] ∈
ID × IP × C × P:
- ID is the set of template logical identifiers.
- IP is the set of template inputs. An input ensures communication
between templates. From an object perspective, it corresponds to
message parameters.
- C is the set of template components. A component identifies several
logical elements linked by spatio-temporal relationships. A component
can be atomic (it contains one template instance) or a collection (it
contains a set of template instances).
- P is the set of spatio-temporal constraints related to the template and its
components.
Here is an example of a template definition:
Template: t1(spl : set(laboratory));
Components: (t1.c1=select t2(pl) from pl in spl);
Sp_Synchro: link*(t1.c1);
Te_Synchro : seq*(t1.c1);
Te_Duration : 400
This template is identified by its name t1 and has one collection component,
namely t1.c1. The spatial (Sp_Synchro) and temporal (Te_Synchro,
Te_Duration) constraints are described in the following sections. The
template t1 illustrates the presentation of a collection of laboratories (spl).
Each laboratory pl is selected using a query expression and is presented
according to the definition of another template t2.
5.2 Spatial Description
The aim of the spatial description is to specify the presentation layout.
This task is performed in three steps. First, we split the space according to
the components that must be presented simultaneously. For example, to
present one employee in a laboratory, we assign one region to display the
video of his face, and another for the video of him/her at work.
Consequently, we split the presentation space into two regions. Second, we
define how the presentation space split is achieved. We postulate that the
spatial presentation space of a template is a rectangle, which is consistent
with the usual display of videos. It is allowed to divide this rectangle both
horizontally and/or vertically. Third, each region can be assigned to a
Template component, which can itself be assigned to several regions.
Suppose we want to display two components C1 and C2 using a 2 by 2 split.
Figure 6 shows valid distributions of regions, without considering
vertical and horizontal symmetries or rotations between C1 and C2.
Figure 6. Assignments for a 2×2-region space
A distribution is defined in the clause Sp_Synchro. The number of
horizontal and vertical partitions is defined using Sp_H^n for a horizontal
split and Sp_V^n for a vertical split. The superscript number n denotes the
number of slices of the presentation space. For instance, Sp_H^2 defines two
horizontal slices. The width of the slices (or their height, in the case of a
horizontal split) is expressed as a percentage. For instance,
Sp_H^2(40,60) assigns 40% to the first region and 60% to the second. It is
allowed to combine a vertical split with a horizontal split; in this case, each
horizontal slice is split into vertical sub-slices. After such splits, regions are
numbered from (1,1) to (nv,nh), where nh is the number of horizontal slices
and nv the number of vertical slices. If omitted, the number of horizontal
(resp. vertical) splits is assumed to be equal to 1.
The second step of the spatial definition assigns components to regions.
This assignment is done in the clause Sp_Synchro using two primitives:
Sp_link and Sp_link*. The Sp_link primitive associates a single
component with a region, and Sp_link* associates each component
belonging to a collection with a region. When using Sp_link, the region
associated with a component is identified by the abscissa and ordinate of the
related region. The choice to associate a collection component with
each presentation region helps the specification of hierarchical components,
and thus fosters component reuse.
Consider for instance the following template:
Template: T2(l : laboratory);
Components: (t2.v1, t2.v2);
Sp_Synchro: Sp_H^2(40,60);
Sp_link(t2.v1, 1, 1);
Sp_link(t2.v2, 2, 1);
The template T2 is composed of two video components, t2.v1 and
t2.v2. The space associated with T2 is split into two regions. The
coordinates of the first region are (1,1) and those of the second
region are (2,1). Thus, Sp_link(t2.v1, 1, 1) specifies that the
component t2.v1 is associated with the region (1,1).
Consider now a collection component CC and a split Sp_V^3(25,50,25).
Then Sp_link*(CC) indicates that the first element of CC is associated
with the region (1,1), the second with (1,2), the third with (1,3), the
fourth with (1,1), and so on.
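The round-robin assignment performed by Sp_link* can be sketched with a small helper. The function name is ours; it merely reproduces the numbering described above:

```python
# Sketch: round-robin assignment of collection elements to the regions
# produced by a split such as Sp_V^3, numbered (1,1), (1,2), (1,3).
# The helper name is ours; it mimics the Sp_link* semantics.
def sp_link_star(elements, regions):
    """Assign each element of a collection to a region, cycling."""
    return [(e, regions[i % len(regions)]) for i, e in enumerate(elements)]

regions = [(1, 1), (1, 2), (1, 3)]            # from Sp_V^3(25,50,25)
assignment = sp_link_star(['e1', 'e2', 'e3', 'e4'], regions)
assert assignment == [('e1', (1, 1)), ('e2', (1, 2)),
                      ('e3', (1, 3)), ('e4', (1, 1))]
```

Cycling back to the first region when the collection outgrows the split is what makes Sp_link* usable with collections of unknown size.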
5.3 Temporal Description
The goal of such a description is to temporally constrain component
presentations, and to specify a temporal duration constraint on the template
itself. When a template is composed of several components, it is possible to
specify whether they are presented in sequence or in parallel. It is specified
using seq and par constraints. We present here an excerpt of the
synchronization constraints, inspired by [18], where Ci and Cj are atomic
or collection components and C is a collection component:
- seq(Ci, Cj): Ci and Cj must be presented in sequence.
- par(Ci, Cj): Ci and Cj must be presented in parallel.
- seq-meet(Ci, Cj): Ci and Cj must be presented in sequence with no
delay between presentations.
- par-equal(Ci, Cj): Ci and Cj must be presented in parallel. Moreover,
they must begin and finish simultaneously.
- par-start(Ci, Cj): Ci and Cj must be presented in parallel. The two
presentations must begin simultaneously. Duration of components Ci
and Cj can differ.
- par-finish(Ci, Cj): Ci and Cj must be presented in parallel. Moreover,
the two presentations must finish simultaneously. Duration of Ci and Cj
can differ. The presentation must be stopped when either Ci or Cj
terminates its presentation.
- par-during(Ci, Cj): Ci and Cj are presented in parallel and can have
different durations. The presentation of Ci must begin after Cj starts and
must finish before Cj stops.
- seq*(C): all components belonging to C must be presented in sequence.
- par*(C): all components belonging to C must be presented in parallel
(equivalent to a conjunction between par-start*(C) and par-finish*(C)).
- par-start*(C): all components belonging to C must be presented in
parallel. Moreover, all the presentations must begin at the same time.
Each component has its own duration.
- par-finish*(C): specifies that all components belonging to C must be
presented in parallel. Moreover, all the presentations must finish at the
same time. The components belonging to C can have different durations.
The presentation must be stopped as soon as one component has ended
its presentation.
It is possible to set a template time duration. This constraint can be either
an integer greater than zero, to specify the maximum duration in seconds, or
equal to -1, to indicate that the duration constraint depends only on the
context of the template presentation (i.e. the temporal duration of a
composed template, or a user-driven duration). When conflicts between
several duration constraints occur, the priority is given to the duration of the
composed template. For sequential constraints, the duration is equally split
into the different components.
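Under the priority rule above, the duration of the composed template wins and, for a sequential constraint, is split equally among the components. A sketch of this equal split, not the actual resolution algorithm:

```python
# Sketch: resolving a duration conflict for a sequential template.
# The composed template's duration takes priority and is split equally
# among its sequential components. Illustrative, not the real algorithm.
def split_sequential_duration(template_duration, n_components):
    """Return per-component durations, or None if context-driven (-1)."""
    if template_duration == -1:      # duration decided by the context
        return None
    return [template_duration / n_components] * n_components

# A template of duration 400 with four sequential components:
assert split_sequential_duration(400, 4) == [100.0, 100.0, 100.0, 100.0]
assert split_sequential_duration(-1, 3) is None
```

For the template t1 of section 5.1 (Te_Duration: 400 over a seq* collection), this is the split each selected laboratory presentation would receive.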
Syntactically, the temporal description is then composed of two parts that
are the temporal synchronization and the duration:
<Te_declaration>::=
"Te_Synchro:" <Te_synchronization>
"Te_Duration:" <Te_duration>
This approach is also used when the objects to be presented cannot respect
their own duration constraints. A simple way to validate the constraints is
to use a resolution process that, as explained above, gives priority to the
composed elements over the composing ones.
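This resolution rule can be sketched as follows (a simplified illustration with a function name of our own; the actual AROM constraint solver is more general): the duration of the composed template has priority, sequential composition splits it equally, and -1 defers to the context.

```python
# Simplified sketch of the duration-resolution rule described above. The
# parent (composed) template's duration has priority over its components';
# for a sequential template it is split equally among them. The function
# name is ours, and the real AROM solver is more general than this.

def resolve_durations(parent_duration, op, child_durations):
    if parent_duration == -1:           # duration given by the context only
        return child_durations
    if op == "seq*":                    # sequential: equal split of parent
        n = len(child_durations)
        return [parent_duration / n] * n
    # parallel: each component is clipped to the parent's duration
    return [min(d, parent_duration) for d in child_durations]
```

For instance, `resolve_durations(400, "seq*", [250, 250])` forces each component to 200.0 seconds, whatever its own constraint was.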
5.4 Translation into AROM
Table 2 gives the principal classes and associations of the AROM
model describing the notion of Template. Again, the idea behind this
modeling is to parse a textual description of a template (similar to the
examples given above) and to instantiate the Template model.
The tuple of the association DB_to_Template links the instance Labs of
DB_Connection, which contains a query for a database (here, a database
about research laboratories), to the instance t1 of Template (which
corresponds to the example given in section 5.1). The result of the query
will be used as the spl parameter of the template. Template t1 is linked to
its component (instance Component1) through the association
Has_component. Also, the spatial constraint (instance Sp_synchro1) is
represented by a tuple of the Has_sp_synchro association.
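For readability, the part of the Template model instantiated above can be restated as plain Python dataclasses. This is a hypothetical transcription of the Table 2 classes, written only to illustrate the t1 instance; it is not AROM's API (AROM is a Java system).

```python
# Hypothetical Python transcription of some Table 2 classes, used here only
# to restate the t1 instantiation in code form; this is not AROM's API.
from dataclasses import dataclass, field

@dataclass
class DBConnection:                    # class DB_Connection in Table 2
    db_location: str
    db_query: str

@dataclass
class RefTemplate:                     # a component referring to a template
    name: str
    ref_template: str
    parameters: list
    clause_from: str

@dataclass
class Template:
    id: str
    duration: float
    db: DBConnection = None            # the DB_to_Template link
    components: list = field(default_factory=list)  # Has_component links

labs = DBConnection("c:/db/edu/database", "select l from l in Laboratory")
t1 = Template(id="t1", duration=400, db=labs)
t1.components.append(RefTemplate("c1", "t2", ["pl"], "pl in spl"))
```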
5.5 From Templates to SMIL Presentations
The translation from templates to SMIL presentations uses the AVS
model described in section 4. We describe here the different steps of this
generation.
- The textual description of the template definitions is first translated
into an AROM base. We keep textual initial descriptions of templates
because of the compactness of this notation and because it remains quite
readable even for complex descriptions containing more than 10 templates.
Textual descriptions also make it easier to reuse parts of pre-existing
templates.
- The second step consists in using the AROM template description and
the input from the database to build the AROM AVS knowledge base.
The query of the db_query attribute of the initial
Root_Template_Instance is sent to the database system, and the
result is a set of object identifiers.
- In a third step, the outer template is recursively analyzed to extract the
clause_from and to obtain the exact number of elements managed by each
template. The result is a structure that represents each element involved
in the presentation, along with its spatial and temporal constraints
according to its related templates.
class: DB_Connection
variables:
variable: db_location
type: string
variable: db_query
type: string
class: Template
variables:
variable: id
type: string
variable: duration
type: float
class: Sp_synchro
variables:
variable: split
type: integer
variable: percent
type: list-of integer
cardinality:
min:0
max: *
class: Component
variables:
variable: name
type: string
variable: definition
type: string
class: SelectFromWhere
super-class: Component
variables:
variable: clause_from
type: string
variable: clause_where
type: string
class: RefTemplate
super-class:
SelectFromWhere
variables:
variable: ref_template
type: string
variable: parameters
type: list-of string
cardinality:
min:0
max: *
class: Parameter
variables:
variable: name
type: string
variable: type
type: string
variable: constructor
type: string
association: Has_link
roles:
role: sp_synchro
type: Sp_synchro
multiplicity:
min: 0
max: *
role: component
type: Component
multiplicity:
min: 0
max: *
variables:
variable: abscissa
type: integer
variable: ordinate
type: integer
variable: abscissa_range
type: list-of integer
cardinality:
min:0
max: *
variable: ordinate_range
type: list-of integer
cardinality:
min:0
max: *
variable: star
type: boolean
association: DB_to_Template
roles:
role: db
type: DB_Connection
multiplicity:
min: 1
max: 1
role: template
type: Template
multiplicity:
min: 1
max: 1
instance: Labs
is-a: DB_Connection
db_location = "c:/db/edu/database"
db_query = "select l from l in Laboratory"
instance: Template1
is-a: Template
id = "t1"
duration = 400
tuple:
is-a: DB_to_Template
db: Labs
template: Template1
tuple:
is-a:Has_parameter
template = Template1
parameter = Parameter1
instance: Parameter1
is-a: Parameter
name = "spl"
type = "laboratory"
constructor = "set"
tuple:
is-a:Has_component
template = Template1
component = Component1
instance: Component1
is-a: RefTemplate
name = "c1"
ref_template = "t2"
parameters = ["pl"]
clause_from = "pl in spl"
tuple:
is-a:Has_sp_synchro
template = Template1
sp_synchro = Sp_synchro1
instance: Sp_synchro1
is-a: Sp_synchro
tuple:
is-a:Has_link
sp_synchro = Sp_synchro1
star = true
component = Component1
Table 2. An excerpt of the AROM Template Model. The two left columns show the principal
classes and associations used for describing a Template and the way it is linked to a database.
On the right, the instances and tuples describing the template t1 presented in section 5.1.
- A fourth step is dedicated to the generation of the AVS base according
to the description produced in step 2. According to the template
definitions, the spatial aspects are generated: the relative coordinates of
the templates are translated into absolute coordinates in the Region
attributes top, height and width (cf. Table 1). The synchronization
constraints that manage a given number of elements used in templates are
translated into absolute presentation begin times and durations (attributes
begin and dur of the instances of the class CommonAttributes, as
described in Table 1). The constraint solver of AROM is then able to
detect and fix spatial or temporal inconsistencies.
- Then, the generation of the SMIL file corresponding to the desired
presentation is achieved using the description of section 4.
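The spatial translation performed in the fourth step can be illustrated by a small helper that turns a relative placement (assumed here, for illustration, to be expressed as percentages of the parent region) into absolute coordinates; all names below are ours, not the exact AVS attribute set.

```python
# Illustration of the fourth step's spatial translation: a component's
# relative placement (assumed here to be percentages of its parent region)
# is turned into absolute top/left/height/width values for a Region.
# The function and dictionary keys are ours, not the AVS attribute set.

def to_region(parent, top_pct, left_pct, height_pct, width_pct):
    return {
        "top":    parent["top"]  + parent["height"] * top_pct  // 100,
        "left":   parent["left"] + parent["width"]  * left_pct // 100,
        "height": parent["height"] * height_pct // 100,
        "width":  parent["width"]  * width_pct  // 100,
    }

window = {"top": 0, "left": 0, "height": 600, "width": 800}
region = to_region(window, 10, 25, 50, 50)
# region = {'top': 60, 'left': 200, 'height': 300, 'width': 400}
```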
6. RELATED WORK
For a complete comparison of V-STORM with other multimedia projects,
one can refer to [19]. Among numerous research works on authoring and
presentation environments for interactive multimedia documents, Madeus
[20] is a very complete environment with a graphical authoring interface and
a spatial formatting editor. Madeus follows a constraint-based approach.
It offers flexibility for the frequent scenario modifications carried out by
the author before reaching the desired scenario, a coupling between the
editing and presentation processes, and an incremental editing process which
consists in readjusting the solution each time the author adds or deletes a
constraint. Constraint propagation maintains the consistency of the new
scenario: at each editing step, the author is sure of having a consistent
scenario. Our AVS model relies on a similar approach since AROM integrates
a constraint solver. The AML allows the expression of constraints involving
classes, associations, objects or tuples. For authoring, however, we put
the emphasis on an even more declarative approach through the use of a
UML-like model in which constraints are implicitly embedded into temporal
and spatial operators. Also, unlike V-STORM, other presentation tools pay
little attention to the management of the video data type. Finally, to our
knowledge, this study is the first attempt to benefit from the expressive
power of an object-based knowledge representation system to describe and
check the consistency of a multimedia presentation.
All major database products offer an interface with the Web: Versant,
ObjectStore and O2 for object technology; Oracle, Informix and Microsoft
SQL Server / Access (referred to later as MSQLS/A) for relational
technology. However, none of these systems supports temporal expressions
for the presentation of query results, and the spatial features of result
presentations are mostly imperative (through HTML-like templates or
programs). Almost all systems provide a way to define complex presentations
using programs, but no language specifically dedicated to the presentation
of query results is provided, and this is why Templates are useful. From
another point of view, some research works consider structured or
semi-structured documents as input of the database, but are also capable of
generating structured documents as query results. WebSQL [21] uses a
relational database, and queries are based on a "Select-From-Such
That-Where" pattern allowing complex From expressions. Path expressions are
supported. Results are defined in HTML tables. WebOQL [22] makes it
possible to define the format of query results and to reuse the results.
WebOQL is a complex language, which is why a kind of template is used as a
front-end. Query languages for XML documents have also been proposed.
XML-QL [23] is a simple language that queries and builds XML documents.
XML-QL allows the extraction of parts of XML documents and has the ability
to perform joins and aggregates, but no consideration is given to the
presentation of documents. The SQL+D [24] proposal deals with the
presentation of relational data. A SQL+D query uses the usual "SELECT FROM
WHERE" clause, and adds a "DISPLAY" clause which expresses the presentation
features of the results. SQL+D distinguishes between multimedia documents
(without a temporal presentation schedule) and multimedia presentations,
which put a temporal schedule on multimedia documents. Writing (and
reading) query presentations using layers is rather delicate, and our
proposal aims at being easier for users to generate.
7. CONCLUSION
This paper has presented a first attempt to couple an object-based
knowledge representation system (OBKRS) called AROM with a
multimedia presentation authoring tool named V-STORM. This coupling has
three main results. First, the multimedia presentation scenario can be
modeled using the UML class-diagram-like description of AROM, which has
proved to be more intuitive than a SMIL file. Second, the inference and
consistency engines of AROM check the validity of the presentation. Third,
the richness of the V-STORM video operators is better exploited. The AROM
KB proposed here, called AVS, is a generic model for multimedia
presentations. Classes and associations of the AVS model just have to be
instantiated to create an effective multimedia presentation. Notably, this
model incorporates every characteristic of the SMIL elements for describing
how to arrange media objects in a scenario. A parser has been written to
translate such an AROM KB into a SMIL document. In turn, this SMIL document
is parsed by V-STORM and the presentation is played.
This paper has also presented an AROM model for the notion of
Template. A template is a logical presentation unit for creating Web
presentations according to different query expressions and spatio-temporal
constraints. Again, through an instantiated model, AROM is responsible for
maintaining the consistency of a set of templates describing the behavior
and the layout of a set of media objects stored in a database and involved
in a multimedia presentation.
This work is only at its beginning, but three main directions are already
favored. The first one concerns the integration of the V-STORM video
query language into the AROM model. The idea here is to substitute the
algebraic modeling language of AROM for the OQL query language.
Eventually, a parser will directly connect AROM to V-STORM, without
recourse to the existing AROM/SMIL parser. Second, a graphical timeline
interface could help the user during the authoring process to interactively
control, with real-time support, the changes made to her/his multimedia
document. Third, parallel work of ours [17] on a better use of database
capabilities in the context of Web presentations could be integrated within
the AVS model. The template model described here does not consider user
interaction during the presentation of documents. We will work in the
future on such an integration by defining Event-Condition-Action rules on
templates. Lastly, we are currently adapting the AROM models presented
here to the new W3C proposed recommendation, SMIL 2.0.
8. REFERENCES
[1] A. Laursen, J. Olkin and M. Porter, Oracle Media Server: providing consumer interactive
access to Multimedia data, SIGMOD, 1994.
[2] K. Nwosu, B. Thuraisingham and B. Berra, Multimedia Database Systems: design and
implementation strategies, Kluwer Academic Publishers, 1996.
[3] B. Ozden, R. Rastogi and A. Silberschatz, Multimedia Database Systems: Issues and
Research Directions, Springer-Verlag, 1996.
[4] R. Weiss, A. Duda and D. Gifford, Composition and Search with a Video Algebra, IEEE
multimedia, pp 12-25, Springer Ed., 1995.
[5] Overview of the MPEG-7 Standard (version 5.0), ISO/IEC JTC1/SC29/WG11 N4031,
March 2001.
[6] R. Lozano, M. Adiba, F. Mocellin and H. Martin, An Object DBMS for Multimedia
Presentations including Video Data, Proc. of ECOOP'98 Workshop Reader, Springer Verlag,
Lecture Notes in Computer Science, 1543, 1998.
[7] W3C Recommendation: Synchronized Multimedia Integration Language (SMIL) 1.0
Specification http://www.w3.org/TR/REC-smil
[8] GriNS Authoring Software, http://www.oratrix.com/GRiNS/index.html
[9] RealNetworks G2, http://www.realnetworks.com
[10] M. Page, J. Gensel, C. Capponi, C. Bruley, P. Genoud, D. Ziébelin, D. Bardou and V.
Dupierris, A New Approach in Object-Based Knowledge Representation: the AROM
System, IEA/AIE-2001, June 4-7, Budapest, Hungary, 2001,
http://www.inrialalpes.fr/romans/arom
[11] P. Mulhem and H. Martin, From Database to Web Multimedia Documents, Journal of
Multimedia Tools and Applications (to appear), 2002.
[12] R.G.G. Cattell and D. Barry, The Object Database Standard: ODMG 2.0, Morgan
Kaufmann, 1997.
[13] R. J. Brachman and J. G. Schmolze, An Overview of the KL-ONE Knowledge
Representation System, Communications of the ACM, 31 (4), pp. 382-401, 1988.
[14] J. Rumbaugh, I. Jacobson and G. Booch, The Unified Modeling Language Reference
Manual., Addison-Wesley, 1999.
[15] P. Van Hentenryck, The OPL Optimization Programming Language, MIT Press, 1999.
[16] W3C Recommendation: Extensible Markup Language (XML) 1.0 (Second Edition)
http://www.w3.org/TR/REC-xml
[17] A. Alasqur, OQL: A Query Language for Manipulating Object Oriented Databases, 15th
VLDB Conference, Amsterdam, The Netherlands, September 1990.
[18] H. Martin, "Specification of Intentional Multimedia Presentations using an Object-
Oriented Database", Proc. of the International Symposium on Digital Media Information
Base, Nara, Japan, November 1997.
[19] R. Lozano, Intégration de données video dans un SGBD à objets, PhD Thesis (in
French), Joseph Fourier University, Grenoble, France, 2000.
[20] M. Jourdan, N. Layaïda, C. Roisin, L. Sabry-Ismaïl and L. Tardif, Madeus, an Authoring
Environment for Interactive Multimedia Documents, in ACM Multimedia, pp 267-272,
Bristol, UK, 1998.
[21] A. Mendelzon, G. Mihaila, T. Milo, Querying the World Wide Web, Journal on Digital
Libraries, Vol. 1, n. 1, pp. 54-67.
[22] G. Arocena and A. Mendelzon, WebOQL: Restructuring Documents, Databases and
Webs, Proc. of the ICDE Conference, Orlando, Florida, USA, February 1998, pp. 24-33.
[23] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu, XML-QL: A Query
Language for XML, W3C, NOTE-xml-ql-19980819, August 1998.
[24] C. Baral, G. Gonzalez and A. Nandigam, SQL+D: extended display capabilities for
multimedia database queries, Proc. of the ACM Multimedia’98 Conference, Bristol, UK,
pp.109-114.

An Object Approach For Web Presentations

  • 1.
    1 Chapter 8 An ObjectApproach for Web Multimedia Presentations Jérôme Gensel1 , Philippe Mulhem2 , Hervé Martin1 1 LSR-IMAG, Grenoble, France, 2 IPAL-CNRS Singapore Abstract: This paper deals with the coupling of V-STORM, which is both a video manager and a multimedia presentation system, with AROM, an object-based knowledge representation system. We first present here an AROM knowledge base, called the AVS model, which constitutes a generic model for multimedia presentations. The resulting model encompasses any multimedia presentation described using the SMIL standard. By instantiating this knowledge base, the author describes her/his multimedia presentation and the way media objects interact in it. Then, the corresponding SMIL file is exhibited and sent to V- STORM in order to be played. This coupling shows to be relevant for two reasons: first, by its UML-like formalism, AROM eases the task of a multimedia presentation author; second, AROM is put in charge of checking the spatial and temporal consistencies of the presentation during its description. This way, a consistent presentation is sent to V- STORM. Lastly, we present another AROM model which describes the notion of template, a logical presentation unit which merges database queries with spatio-temporal constraints. Key words: Multimedia presentations, Videos, Knowledge Representation, SMIL, Template 1. INTRODUCTION In the last decade, multimedia and, more particularly, video systems have benefited from a tremendous research interest. The main reason for this is the increasing ability computers now have for supporting video data, notably thanks to unceasing improvements in data compression formats (as MPEG-2 and MPEG-4), in network transfer rates and operating systems [1], and in disk storage capacity. Unsurprisingly, new applications have risen such as
  • 2.
    2 Chapter 8 videoon demand, video conferencing and home video editing which directly benefit from this evolution. Following this trend, research efforts ([2], [3]) have been made to extend DataBase Management Systems (DBMS) so that they support video data types not simply through Binary Large Objects (BLOB). Indeed, DBMS seem to be well-suited systems for tackling problems posed by the video, namely storage, modeling, querying and presentation. Video data types must be physically managed apart from other conventional data types in order to fulfill their performance requirements. Video modeling must take into account the hierarchical structure of a video (shots, scenes and sequences) and allow overlapping and disjoint segment clustering [4]. The video query language must (accordingly to MPEG-7 descriptions [5]) allow one to query video content using textual annotations or computed signatures (colour, shape, texture…) and deal with the dynamic (movements) of objects in the scenes as well as with semi-structural aspects of videos and, finally, must offer the possibility of creating new videos. We have designed and implemented V-STORM [6] a video system which captures video data in an object DBMS. The V-STORM model considers video data from different perspectives (represented by class hierarchies): physical (as a BLOB), structural (a video is made up of shots which are themselves composed of scenes which can be split into sequences), composition (for editing new videos using data already stored in the database), semantics (through an annotation, a video segment is linked to a database object or a keyword). V-STORM uses and extends the O2 object DBMS and comes as a tool for formulating queries on videos, composing a video using the results of queries, and generating video abstracts. V-STORM can play videos (or segments of) of its database but also virtual videos (or segments of) composed through an O2 interface. 
Moreover, it is possible to use V-STORM as a multimedia player for presentations described using the SMIL [7] standard . This way, V-STORM can be classified in the family of multimedia presentation software like GriNS [8] or RealNetworks G2 [9]. We show how AROM [10], an object-based knowledge representation system, can be used to help a V-STORM user to build, in a more declarative way using templates [11], a multimedia presentation by instantiating a knowledge base rather than by writing a SMIL file. Then, we show how both spatial and temporal consistencies of multimedia presentation expressed by template declarations can be maintained by AROM. The paper is organized as follows : sections 2 and 3 present respectively the V-STORM and AROM systems ; section 4 describes the AVS model, an AROM knowledge base which corresponds to a general multimedia presentation structure ; section 5 details the template model used for the definition of multimedia presentations ; section 6 gives the related works before we conclude in section 7.
  • 3.
    8. 3 2. THEV-STORM SYSTEM V-STORM differentiates between the raw video stored in the database and the video which is watched and manipulated by end-users. From a user point of view, a video is a continuous media which can be played, stopped, paused, etc. From a DBMS point of view, a video is a complex object composed of an ordered sequence of frames, each having a fixed display time. This way, new virtual videos can be created by assembling frames from different segments of videos. O2 OQL Video DataBase Video Composer Video Player Figure 1. The V-STORM architecture. Through the video composer interface, a video is described by the user, translated in OQL so that its component video segments can be sought in the O2 video database. Then, the video is played by the V-STORM video player. In V-STORM, the Object Query Language (OQL) [12] is used (see Figure 1) to extract video segments to compose virtual videos. Video query expressions are stored in the databases and the final video is generated at presentation time. This approach avoids data replication. A video query returns either a video interval which is a continuous sequence of frames belonging to the same video, or a whole video, or an excerpt of a raw video (by combination of the two previous cases), or a logical extract of a video stemming from various raw videos. Video composition in V-STORM is achieved using a set of algebraic operators. A virtual video can be the result of the concatenation, or the concatenation without duplication (union), or the intersection, or the difference of two videos, but also the reduction (by elimination of duplicate segments) or the finite repetition of a single video. Annotations in V- STORM are used to describe salient objects or events appearing in the video. Annotations can be declared at each level of the video hierarchy. They are manually created by the users through an annotation tool. V-STORM also integrates an algorithm to automatically generate video abstracts. 
Video abstracts aims at optimizing the time for watching a video in search of a
  • 4.
    4 Chapter 8 particularsegment. The user has to provide some information concerning the expected abstract: its source (one or more videos), its duration, its structure (which reflects the structure of the video), and its granularity (in the video segments might be more relevant than others). Finally, in order to open V- STORM to the multimedia presentation standardization, we have developed a SMIL parser (see Figure 2) so that V-STORM can read a SMIL document and play the corresponding presentation. Also, interactivity is possible since V-STORM handles the presence of anchors for hypermedia links during presentations. SMIL Parser SMIL File Video Player Figure 2. V-STORM can also be used as multimedia presentation. The presentation is described in SMIL, sent to a parser and played by the V-STORM video player. The parser checks the validity of the SMIL document against the SMIL DTD (extended to support new temporal operations carried out by V- STORM). Then the different SMIL elements are translated in V-STORM commands and the video is displayed. Currently, this parser is limited and does not exploit all the V-STORM functionalities concerning operations on videos. The work presented here extends the description of a la SMIL multimedia presentations in order to better exploit V-STORM capabilities. 3. THE AROM SYSTEM Object-Based Knowledge Representation Systems (OBKRS) are known to be declarative systems for describing, organizing and processing large amounts of knowledge. In these systems [13], once built, a knowledge base (KB) can be exploited through various and powerful inference mechanisms such as classification, method calls, default values, filters, etc. AROM (which stands for Associating Relations and Objects for Modeling) is a new OBKRS which departs from others in two ways. 
First, in addition to classes (and objects) which often constitute the unique and central representation entity in OBKRS, AROM uses associations (and tuples), similar to those found in UML [14], to describe and organize links between objects having common structure and semantics. Second, in addition to the classical
  • 5.
    8. 5 OBKRS inferencemechanisms, AROM integrates an algebraic modeling language (AML) for expressing operational knowledge in a declarative way. The AML is used to write constraints, queries, numerical and symbolic equations involving the various elements of a KB. A class in AROM describes a set of objects sharing common properties and constraints. Each class is characterized by a set of properties called variables and by a set of constraints. A variable denotes a property whose basic type is not a class of the KB. Each variable is characterized by a set of facets (domain restriction facets, inference facets, and documentation facets). Expressed in the AML, constraints are necessary conditions for an object to belong to the class. Constraints bind together variables of – or reachable from – the class. The generalization/specialization relation is a partial order organizes classes in a hierarchy supported by a simple inheritance mechanism. An AROM object represents a distinguishable entity of the modeled domain. Each object is attached to exactly one class at any moment. In AROM, like in UML, an association represents a set of similar links between n (n ≥ 2) classes, being distinct or not. A link contains objects of the classes (one for each class) connected by the association. An association is described by means of roles, variables and constraints. A role corresponds to the connection between an association and one of the classes it connects. Each role has a multiplicity, whose meaning is the same as in UML. A variable of an association denotes a property associated with a link and has the same set of available facets as a class variable. A tuple of an n-ary association having m variables vi (1 ≤ i ≤ m) is the (n+m)-uple made up of the n objects of the link and of the m values of the variables of the association. A tuple is an "instance" of an association. 
Association constraints involve variables or roles and are written in the AML, and must be satisfied by every tuple of the association. Associations are organized in specialization hierarchies. See Tables 1 et 2 and Figure 5 for a textual and a graphical sketches of an AROM KB dedicated to multimedia presentations. First introduced in Operations Research, algebraic modeling languages (AMLs) make it possible to write systems of equations and/or of constraints, in a formalism close to mathematical notations. They support the use of indexed variables and expressions, quantifiers and iterated operators like ∑ (sum) and ∏ (product), in order to build expressions such as j J j i x x I i ∈ ∑ = ∈ ∀ , . AMLs have been used for linear and non-linear, for discrete-time simulation, and recently for constraint programming [15]. In AROM, the AML is used for writing both equations, constraints, and queries. AML expressions can be built from the following elements: constants, indices and indexed expressions, operators and functions, iterated operators, quantified expressions, variables belonging to classes and associations, and expressions that allow to access to the tuples of an
  • 6.
    6 Chapter 8 association.An AML interpreter solves systems of (non-simultaneous) equations and processes queries. Written in Java 1.2, AROM is available as a platform for knowledge representation and exploitation. It comprises an interactive modeling environment, which allows one to create, consult, and modify an AROM KB; a Java API, an interpreter for processing queries and solving sets of (non-simultaneous) equations written in AML, and WebAROM, a tool for consulting and editing a KB through a Web browser. 4. COUPLING AROM AND V-STORM As mentioned above, multimedia scenarios played by V-STORM can be described using SMIL. The starting point of this study is twofold. We aim first at providing a UML-like model in order to ease the description of a multimedia presentation and, second, at reinforcing consistency regarding spatial and especially temporal constraints between the components of a multimedia presentation. It is our conviction that, SMIL like XML [16], are not intuitive knowledge representation languages, and one needs to be familiar with their syntax before to read or write and understand the structure of a document. So, we propose an AVS (AROM/V-STORM) model (see Figures 3 and 4), which consists of an AROM knowledge base whose structure incorporates any SMIL element used in the description of a multimedia presentation. This way, we provide a V-STORM user with an operational UML-like model for describing her/his multimedia presentation. Using an entity/relation (or class/association) approach for modeling is now a widely accepted approach, and UML has become a standard. Through the AROM Interface Modeling Environment, the graphical representation of classes and associations which constitute the AVS model, gives the user a more intuitive idea of the structure of her/his presentation. Moreover, taking advantage of the AROM's AML and type checking, the user can be informed about the spatial and temporal consistencies of her/his presentation. 
4.1 An AROM Model for Multimedia Presentations Since V-STORM can play any presentation described with SMIL, our AROM model for multimedia presentation is SMIL compliant. This means that it incorporates classes and associations corresponding to every element that can be found in the structure of a SMIL document. However, the main objective of the AVS model is to give the user the opportunity to invoke any kind of operations V-STORM can performed on a video. In the AVS model, the various features of a multimedia presentation are modeled using classes and associations (see Table 1) which represent the
  • 7.
    8. 7 principal elementsof SMIL. The class Presentation gives the more general information about the multimedia presentation. Concerning the spatial formatting which describes the way displayable objects are placed into the presentation window, it is described by objects of the Layout class, in accordance with the SMIL recommendation. When a presentation gathers more than one layout V-STORM chooses the first layout that matches the user preferences. This way, V-STORM permits some adaptability concerning the characteristics of the machine on which the presentation is played. A layout can be associated with a root-layout and several regions (described respectively by classes RootLayout, Region and associations HasRootLayout and HasRegion) where the media objects appear. Figure 3. A view of the Interactive Modeling Environment through which the AVS model can be instantiated. On the left, a view of the class and association hierarchies. On the right, the UML-like graphical description of the mode. From this graphical description, a textual description is automatically generated ready to be parsed. For instance, one can find the corresponding textual description of the class Element in Table 1.
  • 8.
    8 Chapter 8 class:RootLayout variables: variable: b_color type: string variable: title type: string variable: height type: integer variable: width type: integer class: Region variables: variable: b_color type: string variable: fit type: string variable: title type: string variable: top type: integer variable: height type: integer variable: width type: integer class: CommonAttributes variables: variable: abstract type: string variable: author type: string variable: begin type: float default: 0 variable: end type: float definition: end=begin+dur variable: dur type: float default: 0 definition: dur=end-begin variable: region type: string variable: repeat type: integer variable: s_bitrate type: integer variable: s_caption type: boolean variable: s_language type: list-of string cardinality: min:0 max: * class: Block super-class: CommonAttributes variables: variable: sync type: string default: "seq" variable: endsync type: string class: Element super-class: CommonAttributes variables: variable: media type: string variable: src type: string variable: alt documentation: "specifies an alternate text, if the media can not be displayed" type: string variable: fill documentation: "if fill=true so freeze else remove" type: Boolean association: CBE roles: role: block type: Block multiplicity: min: 0 max: * role: element type: Element multiplicity: min: 0 max: * Table 1. An excerpt of the AROM textual description showing 7 classes and 1 association of the AVS model for multimedia presentations. In the CommonAttributes abstract class, a definition is given for the end and dur variables using the AML. Concerning the time model, a V-STORM presentation is made up of blocks. Each block can contain other blocks and/or media objects. Basic media objects supported by V-STORM are continuous media with an intrinsic duration (video, audio…) or discrete media without an intrinsic duration (text, image…). 
The variable sync of the Block class determines the temporal behavior (namely parallel or sequential presentation) of the elements in the block, depending on whether its value is seq or par. Three pieces of temporal information can be associated with a media object or a block: its duration (variable dur) and its begin and end times (variables begin and end). When no value is specified for dur, the duration of a discrete object is null and the duration of a continuous object is its natural duration. The semantics
concerning the effective beginning of objects linked to a parallel or sequential block is the same as the one defined in the SMIL recommendation. Also, every date associated with an object must be defined as a float value. This is not a limitation, since the model allows a media object to be associated with a set of reaction methods (start, end, load…) triggered in response to events (click, begin, end…) raised by other objects. Compared with an authoring language like GRiNS, this event-reaction mechanism offers more synchronization possibilities between objects and, through the AVS model, makes it easier and more intuitive to express a temporal scenario. The knowledge base contains a Switch class in charge of adapting the presentation to the system capabilities and settings. The variables found in this class (s_bitrate, s_caption, s_language…) are equivalent to the attributes of the switch element in SMIL. The player will play the first element of the switch acceptable for presentation. Finally, the two kinds of navigational links proposed by SMIL (a and anchor), which allow interactivity during a presentation, are represented in the knowledge base by the A_Link and Anchor_Link classes. The power of the event-reaction mechanism implemented in V-STORM allows an author to define more powerful and intuitive user interactions than in SMIL. For instance, a media object can start some time after a click on another object, and it can end just after the load of a new object.

4.2 Building a multimedia presentation

To build a multimedia presentation, a V-STORM user simply has to instantiate the AROM KB. For a local KB, this can be done either by using the AROM Interactive Modeling Environment (see Figure 3 and Figure 4), by completing the ASCII document describing the KB (as in Table 1), or by using the Java API of AROM in a program. For a distant KB, this can be done through a web browser using WebAROM.
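As a rough illustration of programmatic instantiation, a presentation is described by creating objects of the model's classes. The sketch below is hypothetical: the actual AROM API is a Java API, and the names used here (KnowledgeBase, new_instance) are invented for the example.

```python
# Hypothetical, simplified sketch of instantiating the AVS model from a
# program. The real AROM API is in Java; KnowledgeBase and new_instance
# are invented names used only to convey the idea.

class KnowledgeBase:
    def __init__(self):
        self.instances = []

    def new_instance(self, cls, **slots):
        """Create an instance of class `cls` with the given slot values."""
        obj = {"class": cls, **slots}
        self.instances.append(obj)
        return obj

kb = KnowledgeBase()

# One region, one sequential block and one video element, echoing Table 1.
region = kb.new_instance("Region", title="r1", top=0, width=320, height=240)
block = kb.new_instance("Block", sync="seq")
video = kb.new_instance("Element", media="video", src="intro.mpg",
                        begin=0.0, dur=10.0, region="r1")

print(len(kb.instances))   # 3
```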
Since this instantiation is made under the control of AROM (type checking, multiplicity constraint satisfaction, …), the spatial and temporal consistency of the described presentation is guaranteed. Once the instantiation is performed, the AROM-SMIL parser we have written is launched, and the resulting SMIL file is sent to the SMIL parser of V-STORM (see Figure 5).
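To give a concrete feel for what the AROM-to-SMIL translation produces, the following sketch maps one region, one sequential block and one video element onto the corresponding SMIL 1.0 elements. The instance data and the helper function are invented for the example; the real parser covers the whole AVS model.

```python
# Simplified sketch of the AROM-to-SMIL translation: AVS instances
# (regions, blocks, elements) are mapped onto SMIL 1.0 elements.
# The input data and the function name are illustrative.

import xml.etree.ElementTree as ET

def to_smil(region, element):
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    ET.SubElement(layout, "region", id=region["id"],
                  top=str(region["top"]), width=str(region["width"]),
                  height=str(region["height"]))
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")          # a Block with sync = "seq"
    ET.SubElement(seq, element["media"], src=element["src"],
                  region=region["id"], dur=f'{element["dur"]}s')
    return ET.tostring(smil, encoding="unicode")

region = {"id": "r1", "top": 0, "width": 320, "height": 240}
element = {"media": "video", "src": "intro.mpg", "dur": 10.0}
print(to_smil(region, element))
```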
Figure 4. The editor for instantiating the AVS model

Figure 5. The architecture of the AROM/V-STORM coupling

4.3 Benefits of the AVS model

The coupling between V-STORM and AROM combines the video management richness of the former with the expressive and modeling power of the latter. Compared to the classical specification and presentation of multimedia documents, this coupling offers several advantages.
- UML-like description: the AVS model is described in a graphical notation close to UML. Object-oriented analysis and design methods have shown the relevance of using graphical notations to improve communication between all the actors of a design process (for instance, in a collaborative publication task between several authors).
- Modularity and reuse: the author can edit parts of the presentation independently and group them to compose her/his documents, just by manipulating AROM objects. This object approach allows the reuse of existing blocks to compose new presentations, saving a large amount of work in the design phase.
- Object identity: in an AROM KB, each object has a unique identifier. This property prevents inconsistencies due to the assignment of the same identifier to two different media objects. The existence of the names given to regions, for instance, is checked.
- Consistency maintenance: when a presentation contains inconsistencies, for instance when it states that an object B starts at the end of an object A, that an object C starts at the end of B, and that C starts at the same time as A, classical multimedia systems ignore these inconsistencies or just warn about them at display time. Here, a temporal check is performed by AROM during the construction of the presentation, and the author is warned about such inconsistencies.
This guarantees that a consistent document is sent to the presentation system. This static checking also
makes it possible to obtain a global trace of the presentation or a timeline view, which aligns all events on a single time axis.
- Virtual videos: in addition to raw videos, the author can include virtual videos in the presentation. They correspond to video objects having no value for their src variable. Associations (Extraction, Reduction, Repetition, BinaryOperation) corresponding to the V-STORM operations for creating virtual videos have been introduced in the KB. Once these associations are instantiated, their tuples link a virtual video to the video(s) (raw or virtual) it is derived from.
- Keywords and video abstracts: it is possible to use the keywords variable to annotate a video and to formulate queries on its content. Moreover, the model includes an AbstractOf association in order to link an abstract (an object of the VAbstract class), possibly having a given duration, to a video. Thus, the video can be replaced by its abstract during the presentation. A VAbstract object can be created manually or automatically using the AROM API and the V-STORM video abstract generator.

5. THE TEMPLATE MODEL

We present here the model that supports the logical presentation of database video objects. First, we introduce the concept of Template. A Template is a logical presentation unit. Any Template has a unique identifier and can be composed of several components. A component is either a template or a query expression specified using the OQL query language [17]. Template definitions specify spatio-temporal constraints between components. We present in this section the structure of a Template, and then we introduce its spatial and temporal features.

5.1 Template definition

A template is defined by its name, its structure, and temporal and spatial constraints. A template T is formally defined as a quadruplet [id, ip, c, p] ∈ ID × IP × C × P where:
- ID is the set of template logical identifiers.
- IP is the set of template inputs.
An input ensures the communication between templates. From an object perspective, it corresponds to message parameters.
- C is the set of template components. A component identifies several logical elements linked by spatio-temporal relationships. A component
can be atomic (it contains one template instance) or a collection (it contains a set of template instances).
- P is the set of spatio-temporal constraints related to the template and its components.

Here is an example of a template definition:

Template: t1(spl : set(laboratory));
Components: (t1.c1 = select t2(pl) from pl in spl);
Sp_Synchro: link*(t1.c1);
Te_Synchro: seq*(t1.c1);
Te_Duration: 400

This template is identified by its name t1 and has one collection component, namely t1.c1. The spatial (Sp_Synchro) and temporal (Te_Synchro, Te_Duration) constraints are described in the following sections. The template t1 illustrates the presentation of a collection of laboratories (spl). Each laboratory pl is selected using a query expression and is presented according to the definition of another template t2.

5.2 Spatial Description

The aim of the spatial description is to specify the presentation layout. This task is performed in three steps. First, we split the space according to the components that must be presented simultaneously. For example, to present one employee of a laboratory, we assign one region to display the video of her/his face, and another for the video of her/him at work. Consequently, we split the presentation space into two regions. Second, we define how the split of the presentation space is achieved. We postulate that the spatial presentation space of a template is a rectangle, which is consistent with the usual display of videos. This rectangle can be divided horizontally and/or vertically. Third, each region can be assigned to a Template component, which can itself be assigned to several regions. Suppose we want to display two components C1 and C2 using a 2 by 2 split. Figure 6 shows the valid distributions of regions, without considering vertical and horizontal symmetries or rotations between C1 and C2.

Figure 6.
Assignments for a 2x2-region space

A distribution is defined in the Sp_Synchro clause. The numbers of horizontal and vertical partitions are defined using Sp_H^n for a horizontal split and Sp_V^n for a vertical split. The superscript n denotes the number
of slices of the presentation space. For instance, Sp_H^2 defines two horizontal slices. The width of the slices (or their height, in the case of a horizontal split) is expressed as a percentage. For instance, Sp_H^2(40,60) assigns 40% to the first region and 60% to the second. A vertical split can be combined with a horizontal split; in this case, each horizontal slice is split into vertical sub-slices. After such splits, regions are numbered from (1,1) to (nv,nh), where nh is the number of horizontal slices and nv the number of vertical slices. If omitted, the number of horizontal (resp. vertical) splits is assumed to be equal to 1. The second step of the spatial definition assigns components to regions. This assignment is done in the Sp_Synchro clause using two primitives: Sp_link and Sp_link*. The Sp_link primitive associates a single component with a region, and Sp_link* associates each component belonging to a collection with a region. When using Sp_link, the region associated with a component is designated by the abscissa and ordinate of that region. The choice to associate each element of a collection component with a presentation region helps the specification of hierarchical components, and thus fosters component reuse. Consider for instance the following template:

Template: t2(l : laboratory);
Components: (t2.v1, t2.v2);
Sp_Synchro: Sp_H^2(40,60); Sp_link(t2.v1, 1,1); Sp_link(t2.v2, 2,1);

The template t2 is composed of two video components t2.v1 and t2.v2. The space associated with t2 is split into two regions. The coordinates of the first region are (1,1) and the coordinates of the second region are (2,1). Thus, Sp_link(t2.v1, 1,1) specifies that the component t2.v1 is associated with the region (1,1). Now consider a collection component CC and a split Sp_V^3(25,50,25).
Then, Sp_link*(CC) indicates that the first element of CC is associated with the region (1,1), the second one with (1,2), the third one with (1,3), the fourth one with (1,1), and so on.

5.3 Temporal Description

The goal of this description is to temporally constrain the presentations of components, and to specify a duration constraint on the template itself. When a template is composed of several components, it is possible to specify whether they are presented in sequence or in parallel, using the seq and par constraints. We present here an excerpt of the
synchronization constraints inspired from [18], where Ci and Cj are atomic or collection components, and C is a collection component:
- seq(Ci, Cj): Ci and Cj must be presented in sequence.
- par(Ci, Cj): Ci and Cj must be presented in parallel.
- seq-meet(Ci, Cj): Ci and Cj must be presented in sequence with no delay between the presentations.
- par-equal(Ci, Cj): Ci and Cj must be presented in parallel. Moreover, they must begin and finish simultaneously.
- par-start(Ci, Cj): Ci and Cj must be presented in parallel. The two presentations must begin simultaneously. The durations of Ci and Cj can differ.
- par-finish(Ci, Cj): Ci and Cj must be presented in parallel. Moreover, the two presentations must finish simultaneously. The durations of Ci and Cj can differ. The presentation must be stopped when either Ci or Cj terminates its presentation.
- par-during(Ci, Cj): Ci and Cj are presented in parallel and can have different durations. The presentation of Ci must begin after Cj starts and must finish before Cj stops.
- seq*(C): all components belonging to C must be presented in sequence.
- par*(C): all components belonging to C must be presented in parallel (equivalent to the conjunction of par-start*(C) and par-finish*(C)).
- par-start*(C): all components belonging to C must be presented in parallel. Moreover, all the presentations must begin at the same time. Each component has its own duration.
- par-finish*(C): all components belonging to C must be presented in parallel. Moreover, all the presentations must finish at the same time. The components belonging to C can have different durations. The presentation must be stopped as soon as one component has ended its presentation.

It is possible to set a template time duration.
This constraint can be either an integer greater than zero, to specify the maximum duration in seconds, or equal to -1, to indicate that the duration constraint depends only on the context of the template presentation (i.e. the temporal duration of a composed template, or a user-driven duration). When conflicts between several duration constraints occur, priority is given to the duration of the composed template. For sequential constraints, the duration is split equally among the different components. Syntactically, the temporal description is composed of two parts, the temporal synchronization and the duration:

<Te_declaration> ::= "Te_Synchro:" <Te_synchronization> "Te_Duration:" <Te_duration>
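The duration-resolution rule above (the composed template's duration has priority and, under a sequential constraint, is split equally among the components) can be sketched as follows. This is an illustration of the rule, not the AROM constraint solver; the function name and data shapes are invented.

```python
# Sketch of duration resolution for a seq* template: the composed
# template's duration wins and is split equally among its components.

def resolve_seq_durations(template_duration, components):
    """Return the effective duration of each component of a seq* template.

    `components` maps component names to their declared durations
    (None when unspecified). A set template duration overrides the
    declared component durations (composed-template priority).
    """
    if template_duration is not None:
        share = template_duration / len(components)   # equal split
        return {name: share for name in components}
    # No template duration: keep the declared ones.
    return dict(components)

# Template t1: Te_Synchro: seq*(t1.c1); Te_Duration: 400, with 4 elements.
print(resolve_seq_durations(400, {"e1": None, "e2": 120, "e3": None, "e4": None}))
# each element gets 400 / 4 = 100.0 seconds, overriding e2's declared 120
```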
This approach is also used when the objects to be presented cannot respect their own duration constraints. A simple way to validate the constraints is to use a resolution process that, as previously explained, gives priority to the composed elements over the composing ones.

5.4 Translation into AROM

Table 2 gives the principal classes and associations of the AROM model describing the notion of Template. Again, the idea behind this modeling is to parse a textual description of a template (similar to the ones given above as examples) and to instantiate the Template model. The tuple of the association DB_to_Template links the instance Labs of DB_Connection, which contains a query for a database (here, a database about research laboratories), to the instance t1 of Template (which corresponds to the example given in Section 5.1). The reply to the query will be used as the spl parameter of the Template. Template t1 is linked to its component (instance Component1) through the association Has_component. Also, the spatial constraint (instance Sp_synchro1) is represented by a tuple of the Has_sp_synchro association.

5.5 From Templates to SMIL Presentations

The translation from templates to SMIL presentations uses the AVS model described in Section 4. We describe here the different steps of this generation.
- The textual description of the template definitions is first translated into an AROM base. The reasons that led us to stick to textual initial descriptions of templates are the compactness of this notation and the fact that it remains quite understandable even for complex descriptions containing more than 10 templates. It is also easier to reuse parts of pre-existing templates with textual descriptions.
- The second step consists in using the AROM template description and the input from the database to build the AROM AVS knowledge base.
The query of the db_query attribute of the initial Root_Template_Instance is sent to the database system, and the result is a set of object identifiers.
- In a third step, the outer template is recursively analyzed to extract the clause_from and to obtain the exact number of elements that are managed by each template. The result is a structure that represents each element involved in the presentation, along with its spatial and temporal constraints according to its related templates.
class: DB_Connection
variables:
  variable: db_location
    type: string
  variable: db_query
    type: string

class: Template
variables:
  variable: id
    type: string
  variable: duration
    type: float

class: Sp_synchro
variables:
  variable: split
    type: integer
  variable: percent
    type: list-of integer
    cardinality: min: 0 max: *

class: Component
variables:
  variable: name
    type: string
  variable: definition
    type: string

class: SelectFromWhere
super-class: Component
variables:
  variable: clause_from
    type: string
  variable: clause_where
    type: string

class: RefTemplate
super-class: SelectFromWhere
variables:
  variable: ref_template
    type: string
  variable: parameters
    type: list-of string
    cardinality: min: 0 max: *

class: Parameter
variables:
  variable: name
    type: string
  variable: type
    type: string
  variable: constructor
    type: string

association: Has_link
roles:
  role: sp_synchro
    type: Sp_synchro
    multiplicity: min: 0 max: *
  role: component
    type: Component
    multiplicity: min: 0 max: *
variables:
  variable: abscissa
    type: integer
  variable: ordinate
    type: integer
  variable: abscissa_range
    type: list-of integer
    cardinality: min: 0 max: *
  variable: ordinate_range
    type: list-of integer
    cardinality: min: 0 max: *
  variable: star
    type: boolean

association: DB_to_Template
roles:
  role: db
    type: DB_Connection
    multiplicity: min: 1 max: 1
  role: template
    type: Template
    multiplicity: min: 1 max: 1

instance: Labs
is-a: DB_Connection
  db_location = "c:/db/edu/database"
  db_query = "select l from l in Laboratory"

instance: Template1
is-a: Template
  id = "t1"
  duration = 400

tuple:
is-a: DB_to_Template
  db: Labs
  template: Template1

tuple:
is-a: Has_parameter
  template = Template1
  parameter = Parameter1

instance: Parameter1
is-a: Parameter
  name = "spl"
  type = "laboratory"
  constructor = "set"

tuple:
is-a: Has_component
  template = Template1
  component = Component1

instance: Component1
is-a: RefTemplate
  name = "c1"
  ref_template = "t2"
  parameters = ["pl"]
  clause_from = "pl in spl"

tuple:
is-a: Has_sp_synchro
  template =
Template1
  sp_synchro = Sp_synchro1

instance: Sp_synchro1
is-a: Sp_synchro

tuple:
is-a: Has_link
  sp_synchro = Sp_synchro1
  star = true
  component = Component1

Table 2. An excerpt of the AROM Template Model. The two left columns show the principal classes and associations used for describing a Template and the way it is linked to a database. On the right, the instances and tuples describing the template t1 presented in Section 5.1.
- A fourth step is dedicated to the generation of the AVS base according to the description of step 2. According to the template definition, the spatial aspects are generated: the relative coordinates of the templates are translated into absolute coordinates in the Region attributes top, height and width (cf. Table 1). The synchronization constraints that manage a given number of elements used in the templates are translated into absolute beginning times of presentation and durations (attributes begin and dur of the instances of the class CommonAttributes, as described in Table 1). The constraint solver of AROM is then able to detect and fix spatial or temporal inconsistencies.
- Then, the generation of the SMIL file corresponding to the desired presentation is achieved through the process described in Section 4.

6. RELATED WORKS

For a complete comparison of V-STORM with other multimedia projects, one can refer to [19]. Among the numerous research works on authoring and presentation environments for interactive multimedia documents, Madeus [20] is a very complete environment with a graphical authoring interface and a spatial formatting editor. Madeus is based on a constraint-based approach. It offers flexibility for the frequent scenario modifications carried out by the author before reaching the desired scenario, a coupling between the editing and presentation processes, and an incremental editing process which consists in readjusting the solution each time the author adds or deletes a constraint. Constraint propagation maintains the consistency of the new scenario: at each editing step, the author is sure of having a consistent scenario. Our AVS model relies on a similar approach, since AROM integrates a constraint solver. The AML allows the expression of constraints involving classes, associations, objects or tuples.
For authoring, however, we put the emphasis on an even more declarative approach through the use of a UML-like model in which constraints are implicitly embedded into temporal and spatial operators. Also, unlike V-STORM, other presentation tools pay little attention to the management of the video data type. Finally, to our knowledge, this study is the first attempt to benefit from the expressive power of an object-based knowledge representation system to describe and check the consistency of a multimedia presentation. All major database products propose an interface with the Web: Versant, ObjectStore and O2 for object technology; Oracle, Informix and Microsoft SQL Server / Access (referred to later as MSQLS/A) for relational technology. However, none of these systems supports temporal expressions for query result presentations, and the spatial features of result presentations are mostly
imperative (by HTML-like templates or programs). Almost all systems provide a way to define complex presentations using programs, but no language specifically dedicated to the presentation of query results is provided, and this is why the use of Templates is useful. From another point of view, some research works consider structured or semi-structured documents as input to the database, but they are also capable of generating structured documents as query results. WebSQL [21] uses a relational database, and queries are based on a "Select-From-Such That-Where" pattern to allow complex From expressions. Path expressions are supported. Results are defined in HTML tables. WebOQL [22] allows the format of query results to be defined and the results to be reused. WebOQL is a complex language, which is why a kind of template is used as a front-end. Query languages for XML documents have also been proposed. XML-QL [23] is a simple language that queries and builds XML documents. XML-QL allows the extraction of parts of XML documents and has the ability to perform joins and aggregates, but no consideration is given to the presentation of documents. The SQL+D [24] proposal deals with the presentation of relational data. A SQL+D query uses the usual "SELECT FROM WHERE" clause, and adds a "DISPLAY" clause in which the presentation features of the results are expressed. SQL+D distinguishes between multimedia documents (without a temporal presentation schedule) and multimedia presentations, which put a temporal schedule on multimedia documents. The writing (and the reading) of the query presentations using layers is rather delicate, and our proposal aims at being easier for users to generate.

7. CONCLUSION

This paper has presented a first attempt to couple an object-based knowledge representation system (OBKRS) called AROM with a multimedia presentation authoring tool named V-STORM. This coupling has three main results.
First, the multimedia presentation scenario can be modeled using the UML-like class diagram description of AROM, which has shown to be more intuitive than a SMIL file. Second, the inference and consistency engines of AROM check the validity of the presentation. Third, the richness of the V-STORM video operators is better exploited. The AROM KB proposed here, called AVS, is a generic model for multimedia presentations. The classes and associations of the AVS model just have to be instantiated to create an effective multimedia presentation. Notably, this model incorporates every characteristic of the SMIL elements for describing how to arrange media objects in a scenario. A parser has been written to translate such an AROM
KB into a SMIL document. In turn, this SMIL document is parsed by V-STORM and the presentation is played. This paper has also presented an AROM model for the notion of Template. A template is a logical presentation unit for creating Web presentations according to different query expressions and spatio-temporal constraints. Again, through an instantiated model, AROM is responsible for maintaining the consistency of a set of templates describing the behavior and the layout of a set of media objects, stored in a database, and involved in a multimedia presentation. This work is only at its beginning, but three main directions are already privileged. The first one concerns the integration of the V-STORM video query language into the AROM model. The idea here is to replace the OQL query language by the algebraic modeling language of AROM. Eventually, a parser will directly connect AROM to V-STORM, without having recourse to the existing AROM/SMIL parser. Second, a graphical timeline interface could help the user during the authoring process to interactively control, with real-time support, the changes made on her/his multimedia document. Third, parallel works of ours [17] towards a better use of database capabilities in the context of Web presentations could be integrated within the AVS model. The template model described here does not consider user interaction during the presentation of documents. We will work in the future on such an integration by defining Event-Condition-Action rules on templates. Lastly, we are currently adapting the AROM models presented here to the new W3C proposed recommendation SMIL 2.0.

8. REFERENCES

[1] A. Laursen, J. Olkin and M. Porter, Oracle Media Server: providing consumer interactive access to Multimedia data, SIGMOD, 1994.
[2] K. Nwosu, B. Thuraisingham and B. Berra, Multimedia Database Systems: design and implementation strategies, Kluwer Academic Publishers, 1996.
[3] B. Ozden, R. Rastogi and A.
Silberschatz, Multimedia Database Systems: Issues and Research Directions, Springer-Verlag, 1996.
[4] R. Weiss, A. Duda and D. Gifford, Composition and Search with a Video Algebra, IEEE Multimedia, pp. 12-25, 1995.
[5] Overview of the MPEG-7 Standard (version 5.0), ISO/IEC JTC1/SC29/WG11 N4031, March 2001.
[6] R. Lozano, M. Adiba, F. Mocellin and H. Martin, An Object DBMS for Multimedia Presentations including Video Data, Proc. of ECOOP'98 Workshop Reader, Springer-Verlag, Lecture Notes in Computer Science 1543, 1998.
[7] W3C Recommendation: Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, http://www.w3.org/TR/REC-smil
[8] GRiNS Authoring Software, http://www.oratrix.com/GRiNS/index.html
[9] RealNetworks G2, http://www.realnetworks.com
[10] M. Page, J. Gensel, C. Capponi, C. Bruley, P. Genoud, D. Ziébelin, D. Bardou and V. Dupierris, A New Approach in Object-Based Knowledge Representation: the AROM System, IEA/AIE-2001, June 4-7, Budapest, Hungary, 2001, http://www.inrialalpes.fr/romans/arom
[11] P. Mulhem and H. Martin, From Database to Web Multimedia Documents, Journal of Multimedia Tools and Applications (to appear), 2002.
[12] R.G.G. Cattell and D. Barry, The Object Database Standard: ODMG 2.0, Morgan Kaufmann, 1997.
[13] R. J. Brachman and J. G. Schmolze, An Overview of the KL-ONE Knowledge Representation System, Communications of the ACM, 31(4), pp. 382-401, 1988.
[14] J. Rumbaugh, I. Jacobson and G. Booch, The Unified Modeling Language Reference Manual, Addison-Wesley, 1999.
[15] P. Van Hentenryck, The OPL Optimization Programming Language, MIT Press, 1999.
[16] W3C Recommendation: Extensible Markup Language (XML) 1.0 (Second Edition), http://www.w3.org/TR/REC-xml
[17] A. Alasqur, OQL: A Query Language for Manipulating Object-Oriented Databases, 15th VLDB Conference, Amsterdam, The Netherlands, September 1990.
[18] H. Martin, Specification of Intentional Multimedia Presentations using an Object-Oriented Database, Proc. of the International Symposium on Digital Media Information Base, Nara, Japan, November 1997.
[19] R. Lozano, Intégration de données vidéo dans un SGBD à objets, PhD Thesis (in French), Joseph Fourier University, Grenoble, France, 2000.
[20] M. Jourdan, N. Layaïda, C. Roisin, L. Sabry-Ismaïl and L. Tardif, Madeus, an Authoring Environment for Interactive Multimedia Documents, ACM Multimedia, pp. 267-272, Bristol, UK, 1998.
[21] A. Mendelzon, G. Mihaila and T. Milo, Querying the World Wide Web, Journal on Digital Libraries, Vol. 1, No. 1, pp. 54-67.
[22] G. Arocena and A. Mendelzon, WebOQL: Restructuring Documents, Databases and Webs, Proc. of the ICDE Conference, Orlando, Florida, USA, February 1998, pp. 24-33.
[23] A. Deutsch, M. Fernandez, D.
Florescu, A. Levy and D. Suciu, XML-QL: A Query Language for XML, W3C, NOTE-xml-ql-19980819, August 1998.
[24] C. Baral, G. Gonzalez and A. Nandigam, SQL+D: Extended Display Capabilities for Multimedia Database Queries, Proc. of the ACM Multimedia'98 Conference, Bristol, UK, pp. 109-114.