AN ANTHROPOCENTRIC DESCRIPTION SCHEME FOR MOVIES CONTENT
CLASSIFICATION AND INDEXING
N. Vretos, V. Solachidis and I. Pitas
Department of Informatics
University of Thessaloniki, 54124, Thessaloniki, Greece
phone: +30-2310996304, fax: +30-2310996304
email: {vretos, vasilis, pitas}@aiia.csd.auth.gr
ABSTRACT
MPEG-7 has emerged as the standard for multimedia content description. Being a young standard, it is still evolving towards a form in which semantic content description can be fully implemented. In this paper we propose a number of classes that extend the MPEG-7 standard so that it can handle video data in a more uniform way. Many descriptors (Ds) and description schemes (DSs) already provided by the MPEG-7 standard can help to implement the semantics of a medium. However, by grouping several MPEG-7 classes together and adding new Ds, better results can be obtained in movie production and analysis tasks. We propose several classes in this context and show that the corresponding scheme produces a new profile that is more flexible for all the application types described in [1].
1. INTRODUCTION
Digital video is one of the most essential media nowadays. It is used in many multimedia applications such as communication, entertainment and education. Video data grow rapidly with time, and researchers are therefore focusing on better classification and retrieval methods for video databases. The way videos are constructed has also changed in recent years: digital video gives producers better editing tools for film production, and several applications have been proposed that help producers create modern film types and manipulate film data faster and more accurately. To support such manipulation, MPEG-7 standardizes a set of Ds and DSs, the description definition language (DDL) and the description encoding [2]-[6]. Despite the standard's young age, many researchers have proposed Ds and DSs to improve MPEG-7 performance in terms of semantic content description [7], [8].
In this paper we develop several Ds and DSs that describe digital video content in a better and more sophisticated way. Our approach originates from the idea that a movie entity is composed of several objects both before and after the editing process. We incorporate information provided by the preproduction process to improve semantic media content description. This paper is organized as follows. In Section 2, we describe the proposed Ds and DSs. In Section 3, we give some examples of use on real data. Finally, Section 4 presents conclusions and future work directions.
2. MPEG-7 VIDEO DESCRIPTORS
In the MPEG-7 standard, several descriptors are defined that enable the implementation of video content descriptions. Moreover, MPEG-7 must provide mechanisms that can propagate semantic entities from the preproduction to the postproduction phase. Table 1 summarizes the classes that we propose in order to ensure this connectivity between the preproduction and postproduction phases.

Class Name                 | Characterization
---------------------------|-----------------
Movie Class                | Container Class
Version Class              | Container Class
Scene Class                | Container Class
Shot Class                 | Container Class
Take Class                 | Container Class
Frame Class                | Object Class
Sound Class                | Container Class
Actor Class                | Object Class
Object Appearance Class    | Event Class
High Order Semantic Class  | Container Class
Camera Class               | Object Class
Camera Use Class           | Event Class
Lens Class                 | Object Class

Table 1: Classes introduced in the new framework in order to implement semantics in multimedia support.
[Figure 1 diagram: preproduction information (a Movie containing Scenes and Takes, with Actor and Camera constant attributes, Objects and Sounds) feeding an implemented postproduction Version.]
Figure 1: The interoperability of all the classes enforces semantic entity propagation and exchange in the preproduction phase.
Before providing the classes' details, we explain the characterization of these classes and their relations (figure 1). Three types of classes are introduced: container, object and event classes. Container classes, as indicated by their name, contain other classes. For instance, a movie class contains scenes, which in turn contain shots or takes, which in turn (optionally) contain frames. This encapsulation can be very informative with respect to the relations between the semantic characteristics of these classes, because parent classes can propagate semantics to child classes and vice versa. For example, a scene that is identified as a night scene can propagate this semantic entity to its child classes (Take or Shot). This global approach to semantic entities not only facilitates semantic extraction, but also provides a research framework in which low-level features can be statistically compared very quickly during semantic information extraction. The object-oriented interface (OOI) applied in this framework provides flexibility in the use and implementation of algorithms for high-level semantic extraction. Object classes, on the other hand, are constant physical or abstract objects that interact with the media. Any relation of the form "this object interacts with that object" is implemented within this interface, for example "a specific actor is playing in a specific movie", "a specific camera is used in this take", or "a specific frame is contained by this take or shot". Finally, event classes are used to implement the interaction of the objects with the movie; an example is "the specific camera object is panning in this take". From these examples we can conclude that high-level semantic relations can easily be implemented with these classes. More specifically:
The Movie Class is the root class of the hierarchy. It contains all other classes and describes semantically all the information for a piece of multimedia content. The movie class holds what can be considered static movie information, such as crew, actors, cameras, director, title etc. It also holds the list of the movie's static scenes, where different instances of the Scene class (described later) are stored. Finally, the movie class combines different segments of the static information in order to create different movie versions within the Version class.
The Version Class encodes (contains) the playable versions of a film. A movie can be built using this class. It makes references to the static information (takes, pre-edited scenes, etc.) in order to join different parts (take fragments) and construct shot sequences (figure 2). It can also reference a part of an already defined version's scene. For instance, a movie summary can be made out of the original director's scenes.
[Figure 2 diagram: an implemented Version whose Scenes and Shots reference Objects, Cameras and Sounds from the static movie information.]
Figure 2: Implemented versions collect static movie information in order to make a sequential playable movie. This can be achieved either with references to the static movie parts (takes from the preproduction phase) or with newly instantiated shot classes.
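To make the take-fragment referencing concrete, the following minimal Python sketch (illustrative only; the class and field names are our assumptions, not part of MPEG-7 or of the proposed DDL) shows how a version could list timecoded fragments of stored takes instead of duplicating video data:

```python
# Hypothetical sketch: a Version as an ordered list of take fragments.
from dataclasses import dataclass
from typing import List

@dataclass
class TakeFragment:
    take_id: str      # reference to a Take in the movie's static information
    start_tc: float   # start timecode within the take, in seconds
    end_tc: float     # end timecode within the take, in seconds

@dataclass
class Version:
    name: str
    shots: List[TakeFragment]  # the edit: each shot is a fragment of a take

directors_cut = Version("director's cut", [
    TakeFragment("take_07", 0.0, 4.2),
    TakeFragment("take_03", 1.5, 8.0),
])
print(sum(s.end_tc - s.start_tc for s in directors_cut.shots), "s running time")
```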
The Scene Class contains low-level timing information, such as the start and end timecodes of a scene and its duration. A scene theme tag enables a semantic definition of each scene. The High Semantic tag is an instance of the High Order Semantic Class (described later), which makes it possible to describe semantic actions and events by means of semantic data from the audiovisual content; it correlates high-level information to provide a narrative description of the scene. Expert systems can be used to provide information for these tags. A sound array tag has also been introduced, since sound and scene are highly correlated in a film. The sound array stores all instances of sounds that are present in a scene. Along with low-level information, the sound array (described later in more detail) holds high-level information, such as the persons that are currently talking, the song that is being played, or the ambient sounds. Additionally, the sound array captures timecode information for every sound that is heard within the scene. Timecodes can of course overlap (for example, flirting while dancing to a fine tune), and this information can be used for the extraction of high-level scene features. The take and shot arrays are mutually exclusive fields of the scene class, which means that only one of them can be active in a particular instance. The class can thus be used at both the preproduction and postproduction levels: takes are used in the preproduction environment and shots in the postproduction environment (within the Version Class).
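As an illustration of the overlapping timecodes, the short Python sketch below (interval values and sound names invented for the example) shows how co-occurring sounds in the sound array could be detected and used as a cue for high-level scene features:

```python
# Sketch: two sounds co-occur when their [start, end] intervals intersect.
def overlaps(a, b):
    """True if intervals a = (start, end) and b = (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

scene_sounds = {"dialog": (10.0, 25.0), "song": (18.0, 60.0)}
if overlaps(scene_sounds["dialog"], scene_sounds["song"]):
    print("speech over music: a cue for high-level scene features")
```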
The Shot Class holds semantic and low-level information about a shot. Two versions are proposed: one with frames and one without. In the frameless version, we consider the shot as the atom of the movie, i.e. the movie's essential element that cannot be divided any further. Several attributes of this class are common to both versions, such as the serial number of the shot, the list of actor appearances and disappearances, and the camera use. The frameless version has a color information tag and a texture information tag, which describe the color and texture within the shot; the version with frames provides this feature within the frame class.
The Take Class implements the description of a take. Low-level and high-level information is gathered within this class to support semantic feature extraction and content description, and also to facilitate the implementation of a production tool. A take is a continuous shot from a specific camera. In the production room, the director and his/her colleagues edit takes in order to create a film version. This implementation tries to provide users with editing ability, as well as to assist directors. For instance, if a director stores all the takes in a multimedia database and then creates an MPEG-7 file with this description scheme, (s)he will be able to operate on his/her film easily. The "Synchronized With" tag holds information about simultaneous takes, for example in a dialog scenario where three takes are simultaneous but taken from different cameras.
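A hedged sketch of how the "Synchronized With" tag might look for such a three-camera dialog follows; the take and camera identifiers are hypothetical:

```python
# Sketch: each take lists the ids of the takes shot simultaneously from the
# other cameras, so an editor can cut between them on a common timeline.
takes = {
    "take_21": {"camera": "cam_A", "synchronized_with": ["take_22", "take_23"]},
    "take_22": {"camera": "cam_B", "synchronized_with": ["take_21", "take_23"]},
    "take_23": {"camera": "cam_C", "synchronized_with": ["take_21", "take_22"]},
}
# All takes reachable from take_21 share one timeline:
print(["take_21"] + takes["take_21"]["synchronized_with"])
```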
The Frame Class is the lowest level at which feature extraction can take place. Within the frame class there are several pieces of information that we would like to store. In contrast with all other classes, the frame is a purely spatial description of a video instance: no time information is stored in it. There is, of course, the absolute time (in the movie) and the local time (in the take or shot) of a frame, but information associated with actor position, actor emotions, dominant color etc. does not have a time dimension. The non-temporality of the frame can be used for low-level feature extraction and also in a production tool for a frame-by-frame editing process (figure 3).

[Figure 3 diagram: Frame instances attached to Takes in the preproduction information and to Shots in an implemented postproduction Version.]
Figure 3: Frames can be used with the Take and Shot classes for a deeper analysis, giving rise to an exponential growth of the MPEG-7 file size.
The Sound Class interfaces all kinds of sounds that can appear in multimedia. Speech, music and noise are the basic components of what can be heard in a movie context. This class also holds the times at which a particular sound starts and ends within a scene, together with attributes that characterize the sound. The speech tag holds an array of the speakers within the scene, as well as the general speech type (narration, monolog, dialog, etc.). The sound class provides useful information for high-level feature extraction.
The Actor Class & The Object Appearance Class The actor class implements all the information needed to describe an actor and also makes it possible to search for one in a database based on a visual description. Low-level information about the actor's interactions with shots or takes is stored in the Object Appearance Class. This information includes the times at which the actor enters and leaves the shot; if the actor re-enters the shot several times, this class holds a list of time-in and time-out instances to handle that. A semantic list of what the actor/object is actually doing in the shot (e.g. running, simply moving, or killing someone) is stored for high-level feature extraction in the High Order Semantic Class. The motion of the actor/object is held as low-level information. The Key Points List is used to describe any known Region Of Interest (ROI), such as the bounding box, the convex hull, the feature points etc. For instance, this implementation can provide the trajectory of a specific ROI, like an eye or a lip's lower edge. The pose of the face and the emotions of the actor (with an intensity value) within the shot can also be captured. The latter three tags can be used to create high-level information automatically. The sound class also holds higher-level information about the speakers, and by combining this information we can generate high-level semantics based also on the sound content of the multimedia, providing instantiations of the High Order Semantic Class.
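A small sketch of the time-in/time-out list follows (the interval values are illustrative): an actor who enters a shot twice is stored as two intervals, from which derived quantities such as total on-screen time follow directly.

```python
# Sketch: repeated actor entries into a shot as (time_in, time_out) pairs.
appearances = [(2.0, 5.5), (9.0, 12.0)]  # the actor enters the shot twice
on_screen = sum(t_out - t_in for t_in, t_out in appearances)
print(f"actor on screen for {on_screen:.1f} s")  # 6.5 s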
The High Order Semantic Class organizes and combines the semantic features derived from the audiovisual content. For example, if a sequence of actors shaking hands is accompanied by low-volume crowd noise and positive emotions, this could probably be a handshake between two countries' leaders. The Object's Narrative Identification hosts the semantics of actors and objects in a scene context, for example (Tom's Car), and not only the object identification, like (Tom, Car). The Action list refers to actions performed by actors or other objects, like "car crashes into a wall" or "actor is eating". The events list holds information about events that might occur in a scene, like "a plane crashes". Finally, the two semantic tags, Time and Location, define semantics for narrative time (night, day, exact narrative time etc.) and narrative location (indoor, outdoor, cityscape, landscape etc.).
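Following the handshake example above, a toy rule (entirely ours; the cue names and the rule itself are illustrative, not part of the scheme) shows how lower-level cues could be combined into a High Order Semantic instance:

```python
# Sketch: combining action, sound and emotion cues into one semantic event.
def infer_high_order_event(actions, sounds, emotions):
    if ("handshake" in actions and "low-volume crowd noise" in sounds
            and "positive" in emotions):
        return "leaders' handshake (formal meeting)"
    return None

event = infer_high_order_event({"handshake"},
                               {"low-volume crowd noise"}, {"positive"})
print(event)
```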
The Camera & The Camera Use Class The camera class holds all the information about a camera, such as manufacturer, type, lenses etc. The Camera Use class contains the camera's interaction with the film; the latter is very useful for low-level and high-level feature extraction from the film. The camera motion tag uses a string of characters that conforms to the traditional camera motion styles, so new styles do not require re-implementing the class. The current zoom factor and lens are used for feature extraction as well.
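One way to read the camera motion tag is as a string of style tokens; the sketch below assumes a token vocabulary and a "+" separator of our own invention to show why a new style needs only a new token, not a class change:

```python
# Sketch: camera motion encoded as a token string, e.g. "PAN+ZOOM_IN".
CAMERA_MOTION_STYLES = {"PAN", "TILT", "ZOOM_IN", "ZOOM_OUT", "DOLLY", "TRACK"}

def parse_camera_motion(tag: str):
    """Split a motion tag into known motion-style tokens."""
    return [t for t in tag.split("+") if t in CAMERA_MOTION_STYLES]

print(parse_camera_motion("PAN+ZOOM_IN"))  # ['PAN', 'ZOOM_IN']
```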
The Lens Class implements the characteristics of the lenses used in the production of a movie or documentary. Knowing and storing this information helps extract low-level features from the movie more accurately. Also, for educational purposes, one can search for movies that are recorded with a particular lens.
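To summarize the container hierarchy and the parent-to-child semantic propagation described at the start of this section, here is a minimal Python sketch (our reading of the scheme, not a normative implementation; all names are hypothetical) of a "night scene" entity flowing from a Scene down to its Takes:

```python
# Sketch: container classes propagating semantic entities to their children.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class ContainerClass:
    name: str
    semantics: Set[str] = field(default_factory=set)
    children: List["ContainerClass"] = field(default_factory=list)

    def propagate(self):
        """Push this container's semantic entities down to all descendants."""
        for child in self.children:
            child.semantics |= self.semantics
            child.propagate()

takes = [ContainerClass("take_1"), ContainerClass("take_2")]
scene = ContainerClass("scene_3", {"night scene"}, takes)
scene.propagate()
print(takes[0].semantics)  # {'night scene'}
```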
This OOI essentially constitutes a novel approach to digital video processing. We believe that in a video retrieval application one can pose queries of a form that simple low-level features alone cannot yet answer. Moreover, manual video annotation has proven [7] to be unproductive, because of the subjectivity of the annotators and the time-consuming annotation process. The proposed classes can standardize the annotation context, and they are defined in a way that lets low-level features be integrated, in order to enable automatic extraction of those high-level features. Figure 4 shows the relation between the semantic space and the technical (low-level) feature space. The proposed classes are an amalgam of low-level features, like histograms, FFTs etc., and high-level semantic entities, like scene theme, person recognition, emotions, sound qualification, etc. Researchers are nowadays very interested in extracting high-level semantic features [9]-[11]. With this in mind, we believe that the proposed template can give rise to a new genre of digital video processing approaches.
Figure 4: The low-level feature space is captured by human perception algorithms, and this mechanism produces the semantic content of the frame.
3. EXAMPLES OF USE
Three types of examples will be provided in this section: one for the preproduction environment, one for the postproduction environment, and one combining preproduction and postproduction.
• In the preproduction environment, the produced XML will include a movie class in which takes, scenes, sound information and all the static information of the movie are encapsulated. Figure 5 visualizes the encapsulation of those classes. Applications can process the MPEG-7 file in order to produce helpful tips for the director while he/she is in the editing room, or to assist him/her in editing with smart-agent applications.
Figure 5: Shots exist only in the postproduction phase as fragments of takes from the preproduction phase, and inherit all semantic and low-level features extracted for the takes in the preproduction phase. This constitutes the semantic propagation mechanism. Several versions can be implemented without different video data files.
• In the second type of application, all the pull and push application types defined in the MPEG-7 standard [1] can be realized or improved. Retrieval applications will be able to handle, in a more uniform way, all kinds of requests, whether these are associated with low-level features or with high-level semantic entities.
• Finally, the third type of application is the essential reason for implementing such a framework. In such an application, full interactivity with the user can be established, even giving the user the ability to re-edit a film and create different film versions from the same amount of data. For example, by saving the takes of a film and altering the editing process, the user can reproduce the film from scratch with no additional storage of heavy video files. Creating different permutation arrays of takes and scenes results in different versions of the same film. Modern media supports (DVD, VCD, etc.) can hold a very large amount of data, so such an implementation is realistic, especially considering that already produced DVDs provide users with more than just a movie (e.g. behind-the-scenes tracks, director's cuts, actors' interviews etc.). Since interactivity in media seems to be gaining a big share of the market, the proposed OOI can lead to a very useful product. It is obvious that such a tool can also benefit educational programs, interactive television and other new approaches in the multimedia world.
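A minimal sketch of this re-editing idea follows (purely illustrative; the take identifiers are hypothetical): different orderings of the same stored takes yield different film versions with no additional video data.

```python
# Sketch: permutations of stored takes as distinct film versions.
from itertools import permutations

takes = ["take_01", "take_02", "take_03"]
versions = [list(order) for order in permutations(takes)]
print(len(versions), "versions from the same 3 takes")  # 6
```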
4. CONCLUSION AND FURTHER WORK AREAS
In conclusion, we underline that this framework can be very useful for the new era of image processing, which focuses on semantic feature extraction, and we believe that it can provide specific research goals. With this in mind, our future research will concentrate on filling the classes' attributes automatically. Ongoing research is already developing tools for the extraction of high-level semantic features.
5. ACKNOWLEDGEMENT
The work presented here was developed within NM2 (New Media for a New Millennium), a European Integrated Project (http://www.ist-nm2.org) funded under the European Commission IST FP6 programme.
REFERENCES
[1] ISO/IEC, "Information technology – Multimedia content description interface – Part 6: Reference software," ISO/IEC JTC 1/SC 29 N 4163, 2001.
[2] ISO/IEC, "Information technology – Multimedia content description interface – Part 1: Systems," ISO/IEC JTC 1/SC 29 N 4153, 2001.
[3] ISO/IEC, "Information technology – Multimedia content description interface – Part 2: Description definition language," ISO/IEC JTC 1/SC 29 N 4155, 2001.
[4] ISO/IEC, "Information technology – Multimedia content description interface – Part 3: Visual," ISO/IEC JTC 1/SC 29 N 4157, 2001.
[5] ISO/IEC, "Information technology – Multimedia content description interface – Part 4: Audio," ISO/IEC JTC 1/SC 29 N 4159, 2001.
[6] ISO/IEC, "Information technology – Multimedia content description interface – Part 5: Multimedia description schemes," ISO/IEC JTC 1/SC 29 N 4161, 2001.
[7] J. P. Eakins, "Retrieval of still images by content," pp. 111–138, 2001.
[8] A. Vakali, M. S. Hacid, and A. Elmagarmid, "MPEG-7 based description schemes for multi-level video content classification," Image and Vision Computing, vol. 22, no. 5, pp. 367–378, May 2004.
[9] M. Krinidis, G. Stamou, H. Teutsch, S. Spors, N. Nikolaidis, R. Rabenstein, and I. Pitas, "An audio-visual database for evaluating person tracking algorithms," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA, 18-23 March 2005, pp. 452–455.
[10] J. K. Aggarwal and Q. Cai, "Human motion analysis: a review," Computer Vision and Image Understanding, vol. 73, no. 3, pp. 428–440, 1999.
[11] J. J. Wang and S. Singh, "Video analysis of human dynamics – a survey," Real-Time Imaging, vol. 9, no. 5, pp. 321–346, October 2003.

More Related Content

Similar to AN ANTHROPOCENTRIC DESCRIPTION SCHEME FOR MOVIES CONTENT CLASSIFICATION AND INDEXING

An Object-Based Knowledge Representation Approach for Multimedia Presentations
An Object-Based Knowledge Representation Approach for Multimedia PresentationsAn Object-Based Knowledge Representation Approach for Multimedia Presentations
An Object-Based Knowledge Representation Approach for Multimedia PresentationsMadjid KETFI
 
Breakdown film script using parsing algorithm
Breakdown film script using parsing algorithmBreakdown film script using parsing algorithm
Breakdown film script using parsing algorithmTELKOMNIKA JOURNAL
 
Optimal Repeated Frame Compensation Using Efficient Video Coding
Optimal Repeated Frame Compensation Using Efficient Video  CodingOptimal Repeated Frame Compensation Using Efficient Video  Coding
Optimal Repeated Frame Compensation Using Efficient Video CodingIOSR Journals
 
Video indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksVideo indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksIAEME Publication
 
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...Journal For Research
 
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATIONVISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATIONcscpconf
 
Shot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion ActivitiesShot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion ActivitiesCSCJournals
 
Motion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classificationMotion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classificationacijjournal
 
Automated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureAutomated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureIRJET Journal
 
Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Editor IJARCET
 
Video Forgery Detection: Literature review
Video Forgery Detection: Literature reviewVideo Forgery Detection: Literature review
Video Forgery Detection: Literature reviewTharindu Rusira
 
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...Best Jobs
 
An overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importanceAn overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importanceINFOGAIN PUBLICATION
 
Animation Movies Trailer Computation
Animation Movies Trailer ComputationAnimation Movies Trailer Computation
Animation Movies Trailer ComputationEmma Burke
 
A Study of Image Compression Methods
A Study of Image Compression MethodsA Study of Image Compression Methods
A Study of Image Compression MethodsIOSR Journals
 
Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)
Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)
Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)Soichiro Murakami
 

Similar to AN ANTHROPOCENTRIC DESCRIPTION SCHEME FOR MOVIES CONTENT CLASSIFICATION AND INDEXING (20)

C1 mala1 akila
C1 mala1 akilaC1 mala1 akila
C1 mala1 akila
 
An Object-Based Knowledge Representation Approach for Multimedia Presentations
An Object-Based Knowledge Representation Approach for Multimedia PresentationsAn Object-Based Knowledge Representation Approach for Multimedia Presentations
An Object-Based Knowledge Representation Approach for Multimedia Presentations
 
Breakdown film script using parsing algorithm
Breakdown film script using parsing algorithmBreakdown film script using parsing algorithm
Breakdown film script using parsing algorithm
 
Optimal Repeated Frame Compensation Using Efficient Video Coding
Optimal Repeated Frame Compensation Using Efficient Video  CodingOptimal Repeated Frame Compensation Using Efficient Video  Coding
Optimal Repeated Frame Compensation Using Efficient Video Coding
 
Video indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksVideo indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracks
 
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
VIDEO SUMMARIZATION: CORRELATION FOR SUMMARIZATION AND SUBTRACTION FOR RARE E...
 
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATIONVISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
 
AcademicProject
AcademicProjectAcademicProject
AcademicProject
 
Shot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion ActivitiesShot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion Activities
 
Motion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classificationMotion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classification
 
Automated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU ArchitectureAutomated Image Captioning – Model Based on CNN – GRU Architecture
Automated Image Captioning – Model Based on CNN – GRU Architecture
 
video comparison
video comparison video comparison
video comparison
 
Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351Ijarcet vol-2-issue-4-1347-1351
Ijarcet vol-2-issue-4-1347-1351
 
C04841417
C04841417C04841417
C04841417
 
Video Forgery Detection: Literature review
Video Forgery Detection: Literature reviewVideo Forgery Detection: Literature review
Video Forgery Detection: Literature review
 
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
 
An overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importanceAn overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importance
 
Animation Movies Trailer Computation
Animation Movies Trailer ComputationAnimation Movies Trailer Computation
Animation Movies Trailer Computation
 
A Study of Image Compression Methods
A Study of Image Compression MethodsA Study of Image Compression Methods
A Study of Image Compression Methods
 
Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)
Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)
Paper introduction: Sequence to Sequence - Video to Text (ICCV2015)
 

More from Nicole Heredia

Buy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - EssayBuy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - EssayNicole Heredia
 
PPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK UPPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK UNicole Heredia
 
Informative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To WriteInformative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To WriteNicole Heredia
 
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7TRANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7TNicole Heredia
 
Writing Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTubeWriting Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTubeNicole Heredia
 
How To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-ByHow To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-ByNicole Heredia
 
College Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay WritiCollege Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay WritiNicole Heredia
 
Typewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment StockTypewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment StockNicole Heredia
 
How To Write An Essay Introduction Paragraph
How To Write An Essay Introduction ParagraphHow To Write An Essay Introduction Paragraph
How To Write An Essay Introduction ParagraphNicole Heredia
 
Descriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A DesDescriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A DesNicole Heredia
 
Fall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, FreFall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, FreNicole Heredia
 
Apa Research Paper Order
Apa Research Paper OrderApa Research Paper Order
Apa Research Paper OrderNicole Heredia
 
Tips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation PaperTips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation PaperNicole Heredia
 
Dr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, UnderlinDr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, UnderlinNicole Heredia
 
Thesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - ThesiThesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - ThesiNicole Heredia
 
Writing Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With RubricWriting Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With RubricNicole Heredia
 
Sample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker ScSample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker ScNicole Heredia
 
Sample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management PraSample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management PraNicole Heredia
 
Lap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather LittletonLap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather LittletonNicole Heredia
 
They Say I Essay Templates For College
They Say I Essay Templates For CollegeThey Say I Essay Templates For College
They Say I Essay Templates For CollegeNicole Heredia
 

More from Nicole Heredia (20)

Buy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - EssayBuy Essays Online From Successful Essay - Essay
Buy Essays Online From Successful Essay - Essay
 
PPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK UPPT - Top Essay Writing Company In Australia, UK U
PPT - Top Essay Writing Company In Australia, UK U
 
Informative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To WriteInformative Essay Examples, Samples, Outline To Write
Informative Essay Examples, Samples, Outline To Write
 
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7TRANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
RANGER SHOWCASE INFORMATION - Mrs. RatliffS 7T
 
Writing Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTubeWriting Objectives Part 1 - YouTube
Writing Objectives Part 1 - YouTube
 
How To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-ByHow To Improve Your Essay Writing Quickly A Step-By
How To Improve Your Essay Writing Quickly A Step-By
 
College Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay WritiCollege Essay Help - Get Online College Essay Writi
College Essay Help - Get Online College Essay Writi
 
Typewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment StockTypewriter With Blank Sheet Of Paper Arts Entertainment Stock
Typewriter With Blank Sheet Of Paper Arts Entertainment Stock
 
How To Write An Essay Introduction Paragraph
How To Write An Essay Introduction ParagraphHow To Write An Essay Introduction Paragraph
How To Write An Essay Introduction Paragraph
 
Descriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A DesDescriptive Essays GCE Guide - How To Write A Des
Descriptive Essays GCE Guide - How To Write A Des
 
Fall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, FreFall Scarecrow Stationary For Kids Fall Scarecrows, Fre
Fall Scarecrow Stationary For Kids Fall Scarecrows, Fre
 
Apa Research Paper Order
Apa Research Paper OrderApa Research Paper Order
Apa Research Paper Order
 
Tips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation PaperTips How To Write A Paper Presentation By Presentation Paper
Tips How To Write A Paper Presentation By Presentation Paper
 
Dr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, UnderlinDr. MLKJ Reading Passage - Read, Answer Qs, Underlin
Dr. MLKJ Reading Passage - Read, Answer Qs, Underlin
 
Thesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - ThesiThesis Statement For Character Analysis Essay Example - Thesi
Thesis Statement For Character Analysis Essay Example - Thesi
 
Writing Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With RubricWriting Paper - Differentiated Writing Paper With Rubric
Writing Paper - Differentiated Writing Paper With Rubric
 
Sample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker ScSample Scholarship Announcement Maker Sc
Sample Scholarship Announcement Maker Sc
 
Sample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management PraSample Essay On Human Resource Management Pra
Sample Essay On Human Resource Management Pra
 
Lap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather LittletonLap Desk, 10 Writing Board, Best American Wood Leather Littleton
Lap Desk, 10 Writing Board, Best American Wood Leather Littleton
 
They Say I Essay Templates For College
They Say I Essay Templates For CollegeThey Say I Essay Templates For College
They Say I Essay Templates For College
 

Recently uploaded

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIShubhangi Sonawane
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 

Recently uploaded (20)

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

AN ANTHROPOCENTRIC DESCRIPTION SCHEME FOR MOVIES CONTENT CLASSIFICATION AND INDEXING

  • 1. AN ANTHROPOCENTRIC DESCRIPTION SCHEME FOR MOVIES CONTENT CLASSIFICATION AND INDEXING N. Vretos, V. Solachidis and I. Pitas Department of Informatics University of Thessaloniki, 54124, Thessaloniki, Greece phone: +30-2310996304, fax: +30-2310996304 email:vretos,vasilis,pitas@aiia.csd.auth.gr ABSTRACT MPEG-7 has emerged as the standard for multimedia data content description. As it is in its early age, it tries to evolve towards a direction in which semantic content description can be implemented. In this paper we provide a number of classes to extend the MPEG-7 standard so that it can handle the video media data, in a more uniform way. Many descrip- tors (Ds) and description schemes (DSs) already provided by the MPEG-7 standard, can help to implement semantics of a media. However, by grouping together several MPEG-7 classes and adding new Ds, better results in movie produc- tion and analysis tasks can be produced. Several classes are proposed in this context and we show that the corresponding scheme produces a new profile which is more flexible in all types of applications as they are described in [1]. 1. INTRODUCTION Digital video is the most essential media nowadays. It is used in many multimedia applications such as communication, en- tertainment, education etc. It is very easy to conclude that video data increase exponentially with time and researchers are focusing on finding better ways in classification and re- trieval applications for video databases. The way of con- structing videos has also changed in the last years. The po- tential of digital videos gives producers better editing tools for a film production. Several applications have been pro- posed that help producers in doing modern film types and manipulate all the film data faster and more accurately. For a better manipulation of all the above, MPEG-7 standard- izes the set of Ds, DSs, the description definition language (DDL) and the description encoding [2]-[6]. Despite its early age, many researchers have proposed several Ds and DSs to improve MPEG-7 performance in terms of semantic content description [7], [8]. In this paper we will try to develop several Ds and DSs that will describe digital video content in a better and more sophisticated way. Our efforts originate from the idea that a movie entity is a part of several objects prior to and past to the editing process. We incorporate information provided by the preproduction process to improve semantic media content description. This paper is organized as follows. In Section 2, we describe the Ds and DSs. In Section 3, we give some ex- amples of use of real data. Finally in Section 4, a conclusion and future work directions are described. 2. 
MPEG-7 VIDEO DESCRIPTORS In the MPEG-7 standard several descriptors are defined which enable the implementation of video content descrip- tions.Moreover MPEG-7 must provide mechanisms that can Class Name Characterization Movie Class Container Class Version Class Container Class Scene Class Container Class Shot Class Container Class Take Class Container Class Frame Class Object Class Sound Class Container Class Actor Class Object CLass Object Appearance Class Event Class High Order Semantic Class Container Class Camera Class Object CLass Camera Use Class Event Class Lens Class Object Class Table 1: Classes introduced in the new framework in order to implement semantics in multimedia support propagate semantic entities from the preproduction to the postproduction phase.Table 1 illustrates a summary of all the classes that we propose in order to assure this connectivity between the pre and the post production phases. High Level Semantics PREPRODUCTION INFORMATION Movie Scene Scene Scene Take Take Take Actor Constant Attributes Object 1 Object 2 Object 3 Object 1 Sound 1 Camera Constant Attributes Camera 1 Camera 3 Camera 1 Implemented Version Sound 2 POST PRODUCTION Figure 1: The interoperability of all the classes enforces se- mantic entities propagation and exchange in the preproduc- tion phase. Before going any further in providing the classes’ details, we explain the characterization of those classes and their relations (figure 1). Three types of classes are introduced: container, object and event classes. Container classes, as in- dicated by their name, contain other classes. For instance, a movie class contains scenes that in turn contain shots or takes, which in turn contain frames (optionally). This en- capsulation can therefore be very informative in the relation
  • 2. between semantics characteristics of those classes, because parent classes can propagate semantics to child classes and vice versa. For example, a scene which is identified as a night scene, can propagate this semantic entity to its child classes (Take or Shot). This global approach of semantic en- tities not only facilitates the semantic extraction, but it also gives a research framework in which low-level features can be statistically compared very fast in the semantic informa- tion extraction process. The object oriented interface (OOI) which is applied in this framework provides flexibility in the use and the implementation of algorithms for high seman- tic extraction. On the other hand, object classes are constant physical objects or abstract objects that interact with the me- dia. Any relations of the form “this object interacts with that object” are implemented within this interface. For example a “specific actor is playing in a specific movie”, or “a specific camera is used in this take”, or “a specific frame is contained by this take or shot”. Finally, the events classes are used to implement the interaction of the objects with the movie. An example of this is “the specific camera object is “panning” on this take”. From the examples we can clearly conclude that high semantics relation can be easily implemented with the use of those classes. More specifically: The Movie Class is the root class of the hierarchy. It con- tains all other classes and describes semantically all the information for a piece of multimedia content. The movie class holds what can be considered as static movie infor- mation, such as crew, actors, cameras, director, title etc. It also holds the list of the movie’s static scenes where different instances of the Scene class(described later) are stored. Finally, the movie class combines different seg- ments of the static information in order to create different movie versions within the Version class. The Version Class encodes (contains) the playable versions of a film. A movie can be built using this class. It makes references to the static information (takes, pre edited scenes, etc) in order to join different parts (take fragments) and construct shot sequences (figure 2). It can also reference a part of an already defined version’s scene. For instance, a movie resume (summary) can be made out of the original director’s scenes. High Level Semantics Implemented Version Scene Scene Scene Shot Shot Shot Object 1 Object 2 Object 3 Object 1 Camera 1 Camera 3 Camera 1 Sound 1 Sound 2 Figure 2: Implemented versions collect static movie infor- mation in order to make a sequential playable movie. This can be achieved either with references from the static movie parts (Takes from pre production phase) or with new instan- tiated shot classes. The Scene Class contains low-level information with re- spect to timing, like the start and end timecode of a scene and the duration. A scene theme tag is made in order to make possible a semantic definition of each scene. The High Semantic tag is an instance of the High Order Se- mantic Class (described later), which gives the possibil- ity to describe semantic actions and events by means of semantic data from the audiovisual content. It correlates high-level information to provide a narrative description of the scene. Expert systems can be used to provide in- formation for those tags. A sound array tag has also been introduced in a sense that sound and scene are highly cor- related in a film. 
The sound array stores all instances of sounds that are present in a scene. The sound array, which will be described later in more details, along with low-level information holds high level information, such as the persons that are currently talking or the song that is actually being played or the ambient sounds. Addition- ally the sound array also captures timecode information for every sound that is heard within the scene. Timecodes can of course overlap (for example flirting while dancing in a fine tune) and this information can be used for extrac- tion of high level scene features. The take and shot array are restricted fields of the scene class, which means that one of them can be active for a particular instance. That class can be used for both pre and postproduction levels. The takes are used in the preproduction environment and the shots in the postproduction environment (within the Version Class). The Shot Class holds semantic and low-level information about the shot. Two versions are proposed: one with frames and one without. In the second version, we con- sider the shot as the atom of the movie, which means that it is the movie’s essential element that cannot be divided any further. Several attributes of this class are common in both versions, like the serial number of the shot, the list of appearance and disappearance of an actor, the camera use, and others. The frameless version has a color infor- mation tag and a texture information tag, which give in- formation about the color and the texture within the shot. The version with frames provides this feature within the frame class. The Take Class implements the description of a take. Low- level and high level information is gathered within this class to provide semantic feature extraction, content de- scription and also to facilitate the implementation of a production tool. A take is actually a continuous shot from a specific camera. In the production room the director and his/her colleagues edit takes in order to create a film version. This implementation tries to provide the users with editing ability, as well as assists the directors. For instance, if a director stores all the takes in a multimedia database and then creates an MPEG-7 file with this de- scription scheme (s)he will have the ability to operate on his/her film easily. The ”Synchronized With” tag holds information about simultaneous takes. For example in a dialog scenario, where three takes are simultaneous, but taken from different cameras. The Frame Class is the lowest level where feature extrac- tion can take place. Within the frame class there are sev- eral pieces of information that we would like to store. In contrast with all other classes the frame is a purely spa- tial description of a video instance. No time information is stored in it. There is, of course, the absolute time (in movie) and the local (in take / in shot) time of the frame but whatever information is associated with actor posi- tion, actor emotions, dominant color etc. does not have a time dimension. The non-temporality of the frame can be used for low-level feature extraction and also in a pro-
Figure 3: Frames can be used with the Take and Shot classes for a deeper analysis, at the cost of an exponential growth of the MPEG-7 file size.

The Sound Class interfaces all kinds of sounds that can appear in multimedia content. Speech, music and noise are the basic components of what can be heard in a movie context. This class also holds the times at which a particular sound starts and ends within a scene, together with attributes that characterize the sound. The speech tag holds an array of the speakers within the scene, as well as the general speech type (narration, monolog, dialog, etc.). The sound class provides useful information for high-level feature extraction.

The Actor Class & The Object Appearance Class: The actor class implements all the information that is useful for describing an actor and also makes it possible to search for an actor in a database based on a visual description. Low-level information about the actor's interactions with shots or takes is stored in the Object Appearance Class. This information includes the times at which the actor enters and leaves the shot; if the actor re-enters the shot several times, this class holds a list of time-in and time-out instances to handle that. A semantic list of what the actor/object is actually doing in the shot (e.g., whether he/she is running, just moving, or killing someone) is stored for high-level feature extraction in the High Order Semantic Class. The motion of the actor/object is held as low-level information. The Key Points List is used to describe any known region of interest (ROI), such as the bounding box, the convex hull, the feature points, etc. For instance, this implementation can provide the trajectory of a specific ROI such as an eye or the lower edge of a lip. The pose of the face and the emotions of the actor (with an intensity value) within the shot can also be captured. The latter three tags can be used to create high-level information automatically. The sound class also holds higher-level information about the speakers; by combining this information, high-level semantics based also on the sound content of the multimedia can be generated, providing instantiations of the High Order Semantic Class.

The High Order Semantic Class organizes and combines the semantic features that are derived from the audiovisual content. For example, if a sequence of actors shaking hands is followed by low-volume crowd noise and positive emotions, this could probably be a handshake between country leaders. The Object's Narrative Identification hosts the semantics of actors and objects in a scene context, for example (Tom's Car), and not only the object identification, like (Tom, Car). The Action list refers to actions performed by actors or other objects, such as "car crashes into a wall" or "actor is eating". The Events list holds information on events that might occur in a scene, such as "a plane crashes". Finally, the two semantic tags, Time and Location, define semantics for the narrative time (night, day, exact narrative time, etc.) and the narrative location (indoor, outdoor, cityscape, landscape, etc.).
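As an illustration of how the Object Appearance Class and the High Order Semantic Class could be instantiated together within a scene, consider the following sketch (all element and attribute names are hypothetical):

    <Scene id="scene_7">
      <Shot serial="3" takeRef="take_42">
        <ObjectAppearance actorRef="actor_tom">
          <!-- the actor leaves and re-enters the shot, hence two intervals -->
          <Interval inTC="00:00:01:00" outTC="00:00:05:10"/>
          <Interval inTC="00:00:09:02" outTC="00:00:12:00"/>
        </ObjectAppearance>
      </Shot>
      <HighOrderSemantic>
        <NarrativeId objectRef="car_1">Tom's car</NarrativeId>
        <Action actorRef="actor_tom">running</Action>
        <Event>a plane crashes</Event>
        <Time>night</Time>
        <Location>outdoor</Location>
      </HighOrderSemantic>
    </Scene>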
The Camera Class & The Camera Use Class: The camera class holds all the information about a camera, such as manufacturer, type, lenses, etc., while the Camera Use Class describes the camera's interaction with the film. The latter is very useful for low-level and high-level feature extraction from the film. The camera motion tag uses a string of characters that conforms to the traditional camera motion styles, so new styles do not require a re-implementation of the class. The current zoom factor and lens are used for feature extraction as well.

The Lens Class implements the characteristics of the lenses used in the production of a movie or documentary. It is useful to store this information in order to better extract low-level features from the movie. Also, for educational purposes, one can search for movies that were recorded with a particular lens.

This OOI essentially constitutes a novel approach to digital video processing. We believe that in a video retrieval application one can pose queries in a form that simple low-level features alone cannot yet answer. Moreover, manual video annotation has proven to be unproductive [7], because of the subjectivity of the annotators and the time-consuming annotation process. The proposed classes standardize the annotation context, and they are defined in a way that allows low-level features to be integrated, in order to enable the automatic extraction of high-level features. Figure 4 shows the relation between the semantic space and the technical (low-level) feature space. The proposed classes are an amalgam of low-level features, such as histograms, FFTs, etc., and high-level semantic entities, such as scene theme, person recognition, emotions, sound qualification, etc. Researchers are currently very interested in the extraction of high-level semantic features [9]-[11]. With this in mind, we believe that the proposed template can give rise to a new digital video processing approach.

Figure 4: The low-level feature space is processed by human perception algorithms, and this mechanism produces the semantic content of the frame.
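To give a flavor of the semantic queries mentioned above, a retrieval application could, for instance, evaluate a standard XPath expression over description instances such as the hypothetical ones sketched earlier, e.g. retrieving all night scenes in which a given actor appears:

    //Scene[HighOrderSemantic/Time='night'][.//ObjectAppearance/@actorRef='actor_tom']

A query of this kind combines a high-level semantic entity (narrative time) with object-level information (actor appearance), something that plain low-level descriptors cannot express directly.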
3. EXAMPLES OF USE

Three types of examples are provided in this section: one for the preproduction environment, one for the postproduction environment, and one for the combination of preproduction and postproduction.

• In the preproduction environment, the produced XML will include a movie class in which the takes, scenes, sound information and all the static information of the movie are encapsulated. Figure 5 visualizes the encapsulation of those classes. Applications can process such an MPEG-7 file in order to produce helpful tips for the director while he/she is in the editing room, or to further assist the editing with smart-agent applications.

Figure 5: Shots exist only in the postproduction phase, as fragments of takes from the preproduction phase, and inherit all the semantic and low-level features extracted for the takes in the preproduction phase. This constitutes the semantic propagation mechanism. Several versions can be implemented without different video data files.

• In the second type of application, all the pull and push applications defined in the MPEG-7 standard [1] can be realized or improved. A retrieval application will be able to handle, in a more uniform way, all kinds of requests, whether they are associated with low-level features or with high-level semantic entities.

• Finally, the third type of application is the essential reason for implementing such a framework. In such an application, full interactivity with the user can be established, even giving the user the ability to re-edit a film and create a different film version from the same amount of data: by saving the takes of a film and altering the editing process, the user can reproduce the film from scratch without the additional storage of heavy video files, since different permutations of takes and scenes result in different versions of the same film (see the sketch below). Modern media supports (DVD, VCD, etc.) can hold a very large amount of data, so such an implementation is realistic, especially if we consider that commercial DVDs already provide users with more than just a movie (behind-the-scenes tracks, director's cuts, actors' interviews, etc.). Since interactivity in media seems to be gaining a large market share, the proposed OOI can support a very useful product. It is obvious that such a tool can give rise to educational programs, interactive television and all the new revolutionary approaches of the multimedia world.
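A minimal sketch of this permutation idea, under the same hypothetical naming as before: two playable versions are described over the same three stored takes, so no video data is duplicated.

    <StaticInformation>
      <!-- the video data of each take is stored exactly once -->
      <Take id="take_12"/> <Take id="take_27"/> <Take id="take_30"/>
    </StaticInformation>
    <Version id="theatrical">
      <Scene><Shot takeRef="take_12"/><Shot takeRef="take_27"/></Scene>
    </Version>
    <Version id="extended">
      <Scene><Shot takeRef="take_27"/><Shot takeRef="take_30"/><Shot takeRef="take_12"/></Scene>
    </Version>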
4. CONCLUSION AND FURTHER WORK AREAS

In conclusion, we underline the fact that this framework can be very useful for the new era of image processing, which focuses on semantic feature extraction, and we believe that it can provide specific research goals. With this in mind, our future research will concentrate on filling in the classes' attributes automatically. Ongoing research is already developing tools for the extraction of high-level semantic features.

5. ACKNOWLEDGEMENT

The work presented was developed within NM2 (New media for a New Millennium), a European Integrated Project (http://www.ist-nm2.org), funded under the European Commission IST FP6 programme.

REFERENCES

[1] I.S.O., "Information technology – multimedia content description interface – part 6: Reference software," no. ISO/IEC JTC 1/SC 29 N 4163, 2001.

[2] I.S.O., "Information technology – multimedia content description interface – part 1: Systems," no. ISO/IEC JTC 1/SC 29 N 4153, 2001.

[3] I.S.O., "Information technology – multimedia content description interface – part 2: Description definition language," no. ISO/IEC JTC 1/SC 29 N 4155, 2001.

[4] I.S.O., "Information technology – multimedia content description interface – part 3: Visual," no. ISO/IEC JTC 1/SC 29 N 4157, 2001.

[5] I.S.O., "Information technology – multimedia content description interface – part 4: Audio," no. ISO/IEC JTC 1/SC 29 N 4159, 2001.

[6] I.S.O., "Information technology – multimedia content description interface – part 5: Multimedia description schemes," no. ISO/IEC JTC 1/SC 29 N 4161, 2001.

[7] J. P. Eakins, "Retrieval of still images by content," pp. 111–138, 2001.

[8] A. Vakali, M. S. Hacid, and A. Elmagarmid, "MPEG-7 based description schemes for multi-level video content classification," Image and Vision Computing, vol. 22, no. 5, pp. 367–378, May 2004.

[9] M. Krinidis, G. Stamou, H. Teutsch, S. Spors, N. Nikolaidis, R. Rabenstein, and I. Pitas, "An audio-visual database for evaluating person tracking algorithms," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA, 18–23 March 2005, pp. 452–455.

[10] J. K. Aggarwal and Q. Cai, "Human motion analysis: A review," Computer Vision and Image Understanding, vol. 73, no. 3, pp. 428–440, 1999.

[11] J. J. Wang and S. Singh, "Video analysis of human dynamics – a survey," Real-Time Imaging, vol. 9, no. 5, pp. 321–346, October 2003.