2. The problem
The massive increase in digital audio-visual information
poses high demands on advanced storage and search
engines for consumers and professional archives.
Video is now a natural form of communication
for the Internet and mobile devices.
Video search engines are the product of progress in many
technologies: visual and audio analysis, machine learning
techniques, as well as visualization and interaction.
Thursday, 1 July 2010
3. Two solutions
www.vidivideo.info www.im3i.eu
4. VidiVideo: project overview
The VidiVideo project addressed the
challenge of creating a substantially
enhanced semantic access to video,
implemented in a search engine.
The outcome of the project is an audio-visual search
engine composed of two parts: an automatic annotation
part that runs off-line, in which detectors for more
than 1000 semantic concepts, collected in a thesaurus,
process and automatically annotate the video; and an
interactive part that provides a video search engine
for both technical and non-technical users.
5. VidiVideo: project results
The automatic annotation part of the system performs audio
and video segmentation, speech recognition,
speaker clustering and semantic concept detection.
The VidiVideo system has achieved the highest
performance in the most important object and concept
recognition international contests (PASCAL VOC and
TRECVID).
The interactive part provides a desktop-based and a
web-based search engine. The system permits different
query modalities (free text, natural language, graphical
composition of concepts using boolean and temporal relations
and query by visual example) and visualizations for video
retrieval and browsing.
6. VidiVideo: project partners
Call Identifier: FP7-SME-2010-1
Submitted: 03 December 2009
Name of the co-ordinating person: Dr.-Ing. Georgios Ioannidis
E-Mail: gi@in-two.com
Fax: +49-179-33-2286677
No.  Participant Name                                    Type  Short Name  Country
1    IN2 search interfaces development Ltd               SME   IN2         UK
2    spring techno GmbH                                  SME   SPRING      DE
3    VISup Srl                                           SME   VISUP       IT
4    Hogeschool voor de Kunsten Utrecht                  RTDP  HKU         NL
5    University Firenze                                  RTDP  UNIFI       IT
6    Instituto de Engenharia de Sistemas e Computadores  RTDP  INESC-ID    PT
7. IM3I: project overview
IM3I aims to provide the creative media sector with new
ways of searching, summarising and visualising large
multimedia archives.
IM3I will provide a service-oriented architecture
that allows multiple viewpoints upon multimedia data that
are available in a repository, and provide better ways to
interact and share rich media. This paves the way for a
multimedia information management
platform which is more flexible, adaptable and
customisable than current repository software.
This in turn enables new opportunities for content
owners to exploit their digital assets.
8. IM3I: project results
Developed a set of tools for automatic audio-visual
annotation and search
Developed a set of web services to manage, create and
orchestrate the indexing services
Developed a set of specialized search and
management interfaces
IM3I authoring platform: lets professional users import and
publish repositories of digital media, author web-based
environments for end-users, and create elaborate workflow
patterns and search & retrieval interfaces that support a
diversity of end-user interactions and scenarios
11. Visual annotation
• Split a video by detecting shots and large content changes
with a very fast algorithm
• Use different annotation strategies and types of
detectors:
• low level (color, B/W, motion)
• Haar-based boosted classifiers
• HOG + SVMs
• Bag-of-words
• k-NN + voting (for tag suggestion)
• simple MPEG-7 XML format (full and fragment)
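The "k-NN + voting" strategy for tag suggestion can be sketched as follows. This is a minimal illustration, not the project's implementation: the Euclidean distance, the toy 2-D descriptors, and the names `suggest_tags` and `labelled` are all assumptions made for the example.

```python
import math
from collections import Counter

def suggest_tags(query, labelled, k=3, top=2):
    """Return the `top` most voted tags among the k nearest labelled items."""
    # Rank labelled items by Euclidean distance to the query descriptor
    # (the distance measure is an assumption of this sketch).
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    # Each neighbour casts one vote for each of its tags.
    votes = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in votes.most_common(top)]

# Hypothetical descriptors and tags, for illustration only.
labelled = [
    ((0.10, 0.20), ["beach", "sea"]),
    ((0.20, 0.10), ["sea", "sky"]),
    ((0.90, 0.80), ["city", "night"]),
    ((0.15, 0.15), ["sea", "beach"]),
]
suggested = suggest_tags((0.12, 0.18), labelled)
print(suggested)
```

Tags that recur among visually similar items collect the most votes, so the far-away "city" item has no influence on the suggestion.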
12. Baseline: typical BoW
[Figure: bag-of-words pipeline: feature extraction, hierarchical clustering, visual-word histogram, learning]
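The histogram step of a bag-of-words pipeline can be sketched as below. The vocabulary is hard-coded here as an assumption; in the baseline it would come from the clustering stage. The function name and the toy 2-D descriptors are hypothetical.

```python
import math

def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and count the hits."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda i: math.dist(d, vocabulary[i]))
        counts[nearest] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    # Normalise so that images with different numbers of points are comparable.
    return [c / total for c in counts]

# Toy vocabulary of three visual words (assumed, not learned here).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]
histogram = bow_histogram(descriptors, vocabulary)
print(histogram)
```

The resulting fixed-length histogram is what the learning stage would take as input, regardless of how many points were sampled from the image.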
13. Fusion schemes
• Early fusion: integrates unimodal features before learning concepts.
• Late fusion: first reduces unimodal features to separately learned concept
scores, then integrates these scores to learn concepts.
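The difference between the two schemes is where the integration happens. A minimal sketch, with hypothetical function names and made-up values: early fusion merges the feature vectors before any classifier sees them, while late fusion merges the scores that separate classifiers have already produced. The `combine` argument stands in for the second-stage learner and is an assumption of this example.

```python
def early_fusion(features_a, features_b):
    """Concatenate unimodal feature vectors into one vector before learning."""
    return list(features_a) + list(features_b)

def late_fusion(score_a, score_b, combine):
    """Integrate separately learned concept scores with a second-stage function."""
    return combine([score_a, score_b])

visual = [0.2, 0.7]   # e.g. a visual-word histogram (toy values)
audio = [0.5]         # e.g. an audio feature (toy value)

# Early fusion: one classifier would be trained on the joint vector.
fused_features = early_fusion(visual, audio)

# Late fusion: each modality has its own classifier; only scores are merged.
fused_score = late_fusion(0.8, 0.4, combine=max)
print(fused_features, fused_score)
```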
15. Early fusion approach
[Figure: pipeline with hierarchical clustering]
• Hypothesis: MSERs isolate semantically relevant information.
• Idea: represent points by their spatial relation to the regions: inside, outside, or just on the border.
• Sampling: SIFT-SURF, dense.
16. Late fusion approach
[Figure: late fusion diagram with two pipelines, each with its own hierarchical clustering, whose outputs are combined]
• Use SURF/SIFT + MSER
• Use geometric descriptors for MSERs
17. Test: baseline
[Table columns: Method, Sampling, # points, Time, Time, Avg. accuracy, Max accuracy]
• Best: SURF 64 Grid 10 (accuracy, computational cost)
• SURF 64 Grid 5: +7-8% accuracy, +300% time
• the number of points influences accuracy
18. Test: early fusion
[Table columns: Method, Sampling, # points, Time, Time, Avg. accuracy, Max accuracy]
• Best: EF SURF 64 Grid 10 (accuracy, computational cost)
• EF SURF 64 Borders: many points, accuracy close to that of Grid 10 but at a higher
computational cost
• EF SURF 64 Grid 10 is worse than SURF 64 Grid 10, but much faster (50% of
execution time)
19. Test: late fusion
[Table columns: Method 1, Method 2, Accuracy]
• weights of 0.6 (better method) and 0.4 (worse method) give good results
• best performance: dense sampling + sparse sampling
• best combination: SURF 64 + EF SURF 64 Grid 10 (improved accuracy, modest
computational cost increase)
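A weighted combination with the 0.6 / 0.4 weights mentioned above might look like the sketch below. The slides do not state how the weights are applied, so the linear combination is an assumption, as are the function name and the per-concept scores.

```python
def weighted_late_fusion(better_scores, worse_scores, w_better=0.6, w_worse=0.4):
    """Linearly combine per-concept scores from two methods (assumed scheme)."""
    concepts = better_scores.keys() & worse_scores.keys()
    return {c: w_better * better_scores[c] + w_worse * worse_scores[c]
            for c in concepts}

# Made-up scores for two methods on two concepts.
method_1 = {"car": 0.9, "beach": 0.3}   # assumed to be the better method overall
method_2 = {"car": 0.5, "beach": 0.8}
fused = weighted_late_fusion(method_1, method_2)
for concept in sorted(fused):
    print(concept, round(fused[concept], 2))
```

In this toy case the second method's higher "beach" score raises the fused value above what the first method alone would give, which is the kind of mutual correction late fusion relies on.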
20. Conclusions
• Early fusion strategies:
• ~ baseline accuracy
• faster
• Late fusion strategies:
• better accuracy than baseline
• each method corrects some errors made by the other
• fuse keypoints/regions (SURF, fusion of SURF and
MSER)
• IM3I users will be able to choose what’s best for them
22. Video search engine
Our goal is to provide a search engine for videos
for both technical and non-technical users.
Provide different interfaces that permit different query
modalities: free-text, natural language,
graphical composition of concepts using boolean and
temporal relations and query by visual example.
In addition, exploit ontologies and their structure
to encode semantic relations between concepts. This
permits, for example, expanding queries to synonyms
and concept specializations.
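Query expansion over an ontology can be sketched with two small dictionaries standing in for the ontology relations. The data, the dictionary layout and the function name are hypothetical; a real system would query the ontology through a reasoner.

```python
def expand_query(concept, synonyms, specializations):
    """Return the concept plus its synonyms and all transitive specializations."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Follow both relation types from the current concept.
        stack.extend(synonyms.get(current, []))
        stack.extend(specializations.get(current, []))
    return sorted(seen)

# Toy ontology fragment (assumed for the example).
synonyms = {"car": ["automobile"]}
specializations = {"vehicle": ["car", "truck"], "car": ["taxi"]}
terms = expand_query("vehicle", synonyms, specializations)
print(terms)
```

A query for "vehicle" then also matches shots annotated only with the more specific concepts.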
23. Sirio and Orione
• Design goals/assumptions:
• semantic content-based retrieval
• efficient web-based interface
• System features:
• Sirio is a Rich Internet Application (in Adobe Flex) front end.
• Orione is a web service search engine
• Support for multiple ontologies and ontology reasoning
• Results are in Media RSS format (queries treated as RSS feeds)
• New search engine able to scale to a large number of instances of ontology concepts
• System interface query options:
• ontology exploration using a graph-based view
• compact keyframe-based results presentation / streaming videos
• concept drag&drop facility (to build complex queries)
• natural language query (with Boolean/temporal ops.)
• free text query (for Google-like search)
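Returning results in Media RSS format means each query can be consumed like a feed. The sketch below builds such a feed with Python's standard library; the result fields, the example.org URLs and the function name are assumptions, and only the Media RSS namespace and the `media:content` / `media:thumbnail` elements come from the Media RSS specification.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)

def results_to_media_rss(query, results):
    """Serialise search results as an RSS 2.0 feed with Media RSS elements."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Results for: {query}"
    ET.SubElement(channel, "link").text = "http://example.org/search"  # placeholder
    ET.SubElement(channel, "description").text = "Video search results"
    for r in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = r["title"]
        ET.SubElement(item, f"{{{MEDIA_NS}}}content",
                      {"url": r["video_url"], "medium": "video"})
        ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail", {"url": r["keyframe_url"]})
    return ET.tostring(rss, encoding="unicode")

feed = results_to_media_rss("beach AND sunset", [
    {"title": "Shot 12",
     "video_url": "http://example.org/v/12.mp4",
     "keyframe_url": "http://example.org/k/12.jpg"},
])
print(feed)
```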
32. Andromeda
• Design goals/assumptions:
• semantic content-based browsing
• efficient web-based interface using RIA
• System features:
• Query manager as a Rich Internet Application (in Adobe Flex). Connects to web service (search engine)
• Support for multiple ontologies and ontology reasoning
• Access to social content related to ontology concepts (Flickr, YouTube, and real time tweets from Twitter)
• System interface query options:
• Shows the concepts with the most instances in a concept cloud view
• Graph representation of semantic data structure
• Multiple automatic layout algorithms for spatial positioning and manual drag & drop
• Thumbnails view of the instances of each concept
• Access to video metadata and video streaming
39. Pan
• Design goals/assumptions:
• complete/correct automatic annotations
• help in training new automatic concept detectors
• System features:
• Rich Internet Application (in Adobe Flex).
• video streaming using the same system as Sirio and Andromeda
• new backend
• geotagging using Google Maps
• System interface options
• Integrated with web-based search engine and automatic video annotation
• Multiple user profiles: a simple user may change their own annotations, while a super user can import the annotations of other users, e.g. to supervise the annotation process within an organization.
46. Daphnis
• Design goals/assumptions:
• build on image tagging made popular by Flickr and tag clouds
• connect to social web sites
• allow CBIR
• System features:
• Rich Internet Application (in Adobe Flex).
• Connects to Flickr (and also Facebook, if needed)
• Approximate nearest neighbour search using MPEG-7 descriptors, to scale to a large number of images
• System interface options
• users can tag images and retrieve images based on tags, or use tags to filter the results of similarity based retrieval.
• Ongoing work:
• merging with automatic video annotation for automatic tagging
• adoption of mechanisms for tag suggestion, based on recent research work in this field (use content, tags and geolocalization)
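Using tags to filter similarity-based retrieval can be sketched as follows. Note the swap: this example uses an exact linear scan with Euclidean distance for brevity, not an approximate nearest neighbour index, and the 2-D descriptors stand in for MPEG-7 descriptors. All names and data are hypothetical.

```python
import math

def similar_images(query_desc, index, required_tags=frozenset(), k=2):
    """Rank images by descriptor distance, keeping only those with all required tags."""
    candidates = [img for img in index if required_tags <= img["tags"]]
    candidates.sort(key=lambda img: math.dist(query_desc, img["descriptor"]))
    return [img["id"] for img in candidates[:k]]

# Toy image index (assumed for the example).
index = [
    {"id": "a", "descriptor": (0.1, 0.1), "tags": {"sea"}},
    {"id": "b", "descriptor": (0.2, 0.1), "tags": {"city"}},
    {"id": "c", "descriptor": (0.8, 0.9), "tags": {"sea", "sunset"}},
]
unfiltered = similar_images((0.0, 0.0), index)
sea_only = similar_images((0.0, 0.0), index, required_tags={"sea"})
print(unfiltered, sea_only)
```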
51. IM3I: authoring platform
A CMS approach to repository
analysis, authoring and publication
52. IM3I: authoring platform
Authoring IM3I end-user functionality typically covers 5
distinct stages:
• Importing an existing repository from RSS and various
XML streams
• Extending the associated datamodel
• Editing layout and editing features
• Editing Search and Retrieval interfaces
• Embedding the IM3I end-user interfaces in a (corporate)
website
53. Editing workflow demo
•Step 1: Importing a video-repository
•Step 2: Enhancing the datamodel
•Step 3: Authoring layouts
•Step 4: Publishing the repository
54. I: Importing a repository
•Importing an existing repository to an internal and
flexible datamodel
•Aggregating and harmonizing multiple repositories
•Visualisation of markup and preview of contents
•Flexible mapping by drag-and-drop
55. I: Importing a repository
Mapping the
contents of video
RSS to an IM3I
Datamodel
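In code, mapping RSS contents to a datamodel amounts to a table from feed tags to datamodel element names, which is what the drag-and-drop interface would produce. The sketch below is illustrative only: the feed, the `MAPPING` table and the element names are assumptions, not the IM3I format.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical video RSS feed.
RSS = """<rss version="2.0"><channel>
  <item>
    <title>Interview</title>
    <link>http://example.org/v/1.mp4</link>
    <pubDate>Thu, 01 Jul 2010 10:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

# Hypothetical mapping: RSS tag -> datamodel element name.
MAPPING = {"title": "name", "link": "media_file", "pubDate": "date"}

def import_rss(xml_text, mapping):
    """Convert each RSS item into a record keyed by datamodel element names."""
    records = []
    for item in ET.fromstring(xml_text).iter("item"):
        # findtext returns None when the feed lacks a mapped tag.
        records.append({element: item.findtext(tag)
                        for tag, element in mapping.items()})
    return records

records = import_rss(RSS, MAPPING)
print(records)
```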
56. II: Enhancing the Datamodel
•Datamodels contain the descriptions of your
repository and in this way stipulate what can be
shown to or retrieved by an end-user.
•Datamodels can reference each other
•Datamodels can be extended over time by adding
elements
•Elements are based on types: media files, URIs, date,
string, etc.
•Elements can be shared across datamodels to allow
search & retrieval across multiple collections
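The datamodel ideas above (typed elements, extension over time, sharing across datamodels) can be sketched with two small classes. This is an assumed structure for illustration, not the IM3I schema; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Element:
    name: str
    type: str  # e.g. "media file", "URI", "date", "string"

@dataclass
class Datamodel:
    name: str
    elements: list = field(default_factory=list)

    def extend(self, element):
        """Add an element to the datamodel, ignoring duplicates."""
        if element not in self.elements:
            self.elements.append(element)

def shared_elements(a, b):
    """Elements present in both datamodels, usable for cross-collection search."""
    return [e for e in a.elements if e in b.elements]

title = Element("title", "string")
news = Datamodel("news", [title, Element("video", "media file")])
lectures = Datamodel("lectures", [title, Element("recorded", "date")])
lectures.extend(Element("translation", "string"))  # extended later on
common = shared_elements(news, lectures)
print([e.name for e in common])
```

Because `title` is shared, a single search on that element can cover both collections.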
57. II: Enhancing the Datamodel
Adding a ‘translation’ element to the datamodel
59. III: Layout and Functionality
Easy manipulation of a repository's layout by:
•Table metaphor (easy editing of table
characteristics)
•Drag and drop graphical elements
•Drag and drop contents of repository in cells
•Easy manipulation of look and feel
•Easy addition of editing functionality to a layout
•Easy preview and markup functionalities
60. III: Layout and Functionality
Defining a layout table
61. III: Layout and Functionality
Dragging repository contents to layout
62. III: Layout and Functionality
Previewing layout
63. IV: Embedding in website
Easy blending of layouts into corporate websites
•By means of plugins for CMSs (e.g. WebManager,
WordPress, Typo3)
•Using <embed> </embed>
•Allowing for elaborate workflow patterns in
combining multiple layouts
64. IV: Embedding in website
[Figure: original contents, with the added translation functionality highlighted]
66. Atlante - process manager
• Web application that is used for creation, technical administration and monitoring of the IM3I processing pipeline (e.g. automatic annotation process, media transcoding, etc.)
• This web application has multiple user profiles:
• managers
• administrators
• Main functions of this application are:
• creation of new types of (distributed) process
• parameter setting for new types of process
• creation of a “Multiprocess” composed of sets of single (distributed) processes
• starting/pausing/stopping a process
• monitoring running processes
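The process-control functions listed above suggest a small state machine. The sketch below is one possible reading, with assumed state names, allowed transitions and class names; it is not Atlante's actual design.

```python
from enum import Enum

class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"

# Assumed transition table: (action, current state) -> next state.
ALLOWED = {
    ("start", State.CREATED): State.RUNNING,
    ("start", State.PAUSED): State.RUNNING,
    ("pause", State.RUNNING): State.PAUSED,
    ("stop", State.RUNNING): State.STOPPED,
    ("stop", State.PAUSED): State.STOPPED,
}

class Process:
    def __init__(self, name, params=None):
        self.name = name
        self.params = dict(params or {})
        self.state = State.CREATED

    def apply(self, action):
        key = (action, self.state)
        if key not in ALLOWED:
            raise ValueError(f"cannot {action} a {self.state.value} process")
        self.state = ALLOWED[key]

class Multiprocess:
    """A set of single processes that are controlled together."""
    def __init__(self, processes):
        self.processes = list(processes)

    def apply(self, action):
        for p in self.processes:
            p.apply(action)

    def monitor(self):
        return {p.name: p.state.value for p in self.processes}

pipeline = Multiprocess([
    Process("annotation"),
    Process("transcoding", {"codec": "h264"}),  # hypothetical parameter
])
pipeline.apply("start")
pipeline.apply("pause")
status = pipeline.monitor()
print(status)
```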
70. Gaia - media manager
• Web application used for technical
administration and monitoring of the database
• Main functions of this application are:
• media management
• configuration of metadata, broadcasters,
annotation types, concept types and media types
• monitoring of media annotations by the technical backend
76. ACM MM 2010 Workshop
3rd International Workshop on Automated Information Extraction in Media Production
AIEMPro'10
Organizers:
Dr. Robbie De Sutter
Vlaamse Radio- en Televisieomroep - Medialab
Jean-Pierre Evain
European Broadcasting Union . Union Européenne de Radiotélévision
Dr. Gerald Friedland
ICSI (International Computer Science Institute)
Dr. Alberto Messina
RAI Radiotelevisione Italiana, Centre for Research and Technological Innovation
Dr. Masanori Sano
NHK (Japan Broadcasting Corporation) Science and Technology Research Laboratories