Technology Catalog
Quaero, the first research and innovation cluster on multimedia and multilingual content processing

The Quaero program stems from the need to federate and strengthen an emerging technological sector: the semantic processing of multimedia and multilingual content (text, speech, music, still and moving images, scanned documents). It also arises from the will to benchmark results systematically against the international state of the art, to organize a complete technology-transfer chain, and to mobilize the actors of the sector around applications corresponding to identified and potentially large markets, such as search engines, personalized TV, and the digitization of cultural heritage.
This program is carried by a consortium of 32 French and German partners from the public and private sectors. During the R&D phase, from 2008 to 2013, these partners produced corpora, performed research covering a broad scientific spectrum, developed and tested increasingly elaborate models, shared their experience, and integrated their software. The work plan was adapted as the context evolved. New applications emerged, such as the management of incoming business mail and computer-aided multilingual website creation, and additional effort was put into the corresponding technologies, such as handwriting recognition and machine translation. Collaboration became more extensive, especially across disciplines and between research and industry.
Thanks to these efforts, which led to more than 800 national and international publications and to numerous distinctions, about one hundred core technology modules and application demonstrators have been developed, some of which are already commercially exploited. Many of these technologies are of interest beyond the consortium members. Presenting them is the purpose of this catalog.
The Technology Catalog presents 72 modules and demonstrators, each described in a double-page spread that details its application domain and technical characteristics. It is composed of two parts:
• 59 Core Technology Modules, organized by thematic domain; the list of the 12 domains, provided on p. 4, is repeated on each left-hand page
• 13 Application Demonstrators; their list, provided on p. 5, is repeated on each left-hand page
The catalog can also be searched by institution using the index provided at the end of the document.
Core Technology Modules

• Semantic Acquisition & Annotation (5), p. 8 to 17
• Q&A (4), p. 20 to 27
• Speech Processing (7), p. 30 to 43
• Translation of Text and Speech (2), p. 46 to 49
• Audio Processing (3), p. 52 to 57
• Document Processing (10), p. 60 to 79
• Object Recognition & Image Clustering (3), p. 82 to 87
• Music Processing (7), p. 90 to 103
• Indexing, Ranking and Retrieval (1), p. 106 to 107
• Content Analysis (4), p. 110 to 117
• Gesture Recognition (1), p. 120 to 121
• Video Analysis & Structuring (12), p. 124 to 147
Application Demonstrators

• Chromatik, p. 148
• MECA: Multimedia Enterprise CApture, p. 150
• MediaCentric®, p. 152
• MediaSpeech® product line, p. 154
• MobileSpeech, p. 156
• MuMa: The Music Mashup, p. 158
• OMTP: Online Multimedia Translation Platform, p. 160
• Personalized and social TV, p. 162
• PlateusNet, p. 164
• SYSTRANLinks, p. 166
• Voxalead Débat Public, p. 168
• Voxalead multimedia search engine, p. 170
• VoxSigma SaaS, p. 172
Semantic Acquisition & Annotation

• AlvisAE: Alvis Annotation Editor, Inra, p. 8
• AlvisIR, Inra, p. 10
• Alvis NLP: Alvis Natural Language Processing, Inra, p. 12
• KIWI: Keyword extractor, Inria, p. 14
• TyDI: Terminology Design Interface, Inra, p. 16
Alvis Annotation Editor
Application sectors
• Any sector using text documents
• Information extraction
• Content analysis
Target users and customers
With AlvisAE, remote users can display annotated documents in their web browser, manually create new annotations over the text, and share them.
Partners:
Inra
AlvisAE: Alvis Annotation Editor
Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas France
http://bibliome.jouy.inra.fr

Description:
AlvisAE is a web annotation editor designed to display and edit fine-grained formal semantic annotations of textual documents. The annotations are used for fast reading or for training machine-learning algorithms in text mining; they can also be stored in a database and queried. Annotations are entities, n-ary relations and groups. Entities can be discontinuous and overlapping, and are typed by a small set of categories or by concepts from an external ontology.
The user can dynamically extend the ontology by dragging new annotations from the text to the ontology. AlvisAE supports collaborative and concurrent annotation and adjudication. Input documents can be in HTML or plain-text format. AlvisAE also takes as input semantic pre-annotations automatically produced by AlvisNLP.
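To make the annotation model concrete, here is a minimal Python sketch of the three annotation kinds named above (a hypothetical illustration, not AlvisAE's actual data model or API): entities that may be discontinuous and overlapping, n-ary relations, and groups.

    from dataclasses import dataclass

    @dataclass
    class Entity:
        fragments: list   # (start, end) offsets; several fragments = discontinuous
        concept: str      # category or concept from an external ontology

    @dataclass
    class Relation:
        type: str
        args: dict        # n-ary: role name -> Entity

    @dataclass
    class Group:
        members: list     # entities grouped together

    text = "Bacillus subtilis grows in soil."
    e1 = Entity([(0, 17)], "Bacterium")   # "Bacillus subtilis"
    e2 = Entity([(27, 31)], "Habitat")    # "soil"
    rel = Relation("LivesIn", {"bacterium": e1, "habitat": e2})
    print(rel)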
Technical requirements:
• Server side: Java 6 or 7, a Java application and an RDBMS.
• Client side: the client application runs in any recent JavaScript-enabled web browser (e.g. Firefox, Chromium, Safari). Internet Explorer is not supported.
Conditions for access and use:
The AlvisAE software is developed by the INRA Mathématique, Informatique et Génome lab and is the property of INRA.
AlvisAE can be supplied under licence on a case-by-case basis. An open-source distribution is planned in the short term.
Semantic document indexing and
search engine framework
Target users and customers
Domain-specific communities, especially technical and scientific ones, that want to build search engines and information systems managing documents with fine-grained semantic annotations.
Partners:
Inra
Application sectors
Search engine and information system development.
AlvisIR
Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas Cedex France
http://bibliome.jouy.inra.fr

Technical requirements:
• Linux platform
• Perl
• libxml2
• Zebra indexing engine
• PHP5

Conditions for access and use:
Sources available upon request. Free of charge for academic institutions.

Description:
AlvisIR is a complete suite for indexing documents with fine-grained semantic annotations. The search engine performs a semantic analysis of the user query and searches for synonyms and sub-concepts.
AlvisIR has two main components:
1. the indexing tool and search daemon, based on IndexData's Zebra, which supports standard CQL queries;
2. the web user interface, featuring result snippets, query-term highlighting, facet filtering and concept hierarchy browsing.
Setting up a search engine requires the semantic resources for query analysis (synonyms and concept hierarchy) and a set of annotated documents. AlvisIR is closely integrated with AlvisNLP and TyDI, for document annotation and semantic resource acquisition respectively.
Indicative indexing time: 24 min for a corpus containing 5 million annotations.
Indicative response time: 18 s for a response containing 20,000 annotations.
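As a toy illustration of the query-analysis step (hypothetical data, not the AlvisIR implementation), the sketch below expands a query term with its synonyms and all of its sub-concepts before the expanded set is sent to the index:

    synonyms = {"bacterium": ["microbe"]}
    hierarchy = {"bacterium": ["firmicute"], "firmicute": ["bacillus"]}

    def sub_concepts(term):
        # Collect the term and all of its descendants in the concept hierarchy.
        found, stack = set(), [term]
        while stack:
            t = stack.pop()
            found.add(t)
            stack.extend(hierarchy.get(t, []))
        return found

    def expand(term):
        # Union of sub-concepts and their synonyms, as sent to the index.
        terms = sub_concepts(term)
        for t in list(terms):
            terms.update(synonyms.get(t, []))
        return terms

    print(expand("bacterium"))   # {'bacterium', 'firmicute', 'bacillus', 'microbe'}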
A pipeline framework for Natural
Language Processing
Target users and customers
The targeted audience includes projects that require standard Natural Language Processing tools for production and research purposes.
Partners:
Inra
Application sectors
• Natural language processing
• Content analysis
• Information retrieval
Alvis NLP: Alvis Natural Language Processing
Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas Cedex France
http://bibliome.jouy.inra.fr

Technical requirements:
Java 7, Weka

Conditions for access and use:
Sources available upon request. Free of charge for academic institutions.

Description:
Alvis NLP is a pipeline framework to annotate text
documents using Natural Language Processing (NLP)
tools for sentence and word segmentation, named-entity
recognition, term analysis, semantic typing and relation
extraction (see the paper by Nedellec et al. in Handbook
on Ontologies 2009 for a comprehensive overview).
The various available functions are accessible as modules that can be composed into a sequence forming the pipeline. This sequence, as well as the parameters of the modules, is specified through an XML-based configuration file.
New components can easily be integrated into the pipeline.
To implement a new module, one has to build a Java class
manipulating text annotations following the data model
defined in Alvis NLP.
The class is loaded at run-time by Alvis NLP, which makes
the integration much easier.
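The following toy Python sketch illustrates the pipeline idea (hypothetical module names; the real modules are Java classes configured through the XML file mentioned above): each module transforms a shared document object, and the sequence is assembled from a configuration.

    class SentenceSplitter:
        def process(self, doc):
            # Toy segmentation into sentence-like units.
            doc["sentences"] = doc["text"].split(". ")

    class Tokenizer:
        def process(self, doc):
            # Toy word segmentation over each sentence.
            doc["tokens"] = [s.split() for s in doc["sentences"]]

    PIPELINE = [SentenceSplitter(), Tokenizer()]   # order would come from the config

    doc = {"text": "Alvis NLP is a pipeline framework. Modules run in sequence."}
    for module in PIPELINE:
        module.process(doc)
    print(doc["tokens"])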
Keyword extractor
Target users and customers
The targeted users and customers are multimedia industry actors, and all academic or industrial laboratories interested in textual document processing.
Partners:
Inria
Application sectors
• Textual and multimedia document processing
KIWI: Keyword extractor
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex France
http://www.irisa.fr/

Technical issues:
Sébastien Campion
scampion@irisa.fr

Technical requirements:
• Standard PC with Unix/Linux OS
• Kiwi requires the TreeTagger [1] software to be installed on the system
• Kiwi requires the Flemm [2] software to be installed on the system
[1] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
[2] http://www.univnancy2.fr/pers/namer/Telecharger_Flemm.htm

Conditions for access and use:
Kiwi is a software tool that was developed at Irisa/Inria-Rennes and is the property of Inria. Registration at the Agency for Program Protection (APP) in France is in progress.
Kiwi is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis.

Description:
Kiwi is a software tool dedicated to the extraction of keywords from a textual document. From an input text, or preferably a normalized text, Kiwi outputs a weighted word vector (see figure 1 below). This ranked keyword vector can then be used as a document description or for indexing purposes.
Kiwi was developed at Irisa/Inria Rennes by the Texmex team. Its author is Gwénolé Lecorvé.
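Kiwi's actual weighting scheme is not documented here, but the following self-contained TF-IDF sketch illustrates the general idea: from an input text, produce a ranked, weighted keyword vector usable as a document description.

    import math
    from collections import Counter

    corpus = [
        "the speech recognition system transcribes speech into text",
        "a machine translation system translates the text of a document",
        "a keyword extractor ranks the words of a text",
    ]

    def tfidf(doc_index):
        # Weight each word of one document by term frequency times inverse
        # document frequency, then rank the resulting keyword vector.
        docs = [d.split() for d in corpus]
        tf = Counter(docs[doc_index])
        n = len(docs)
        weights = {w: c / len(docs[doc_index]) * math.log(n / sum(w in d for d in docs))
                   for w, c in tf.items()}
        return sorted(weights.items(), key=lambda kv: -kv[1])

    print(tfidf(2)[:3])   # top-3 weighted keywords of the third document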
A platform for the validation, structuring and export of termino-ontologies
Target users and customers
The primary use of TyDI is the design of termino-
ontologies for the indexing of textual documents.
It can therefore be of great help for most projects
involved in natural language processing.
Partners:
Inra
Application sectors
• Terminology structuring
• Textual document indexing
• Natural language processing
TyDI: Terminology Design Interface
Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas Cedex France
http://bibliome.jouy.inra.fr

Technical requirements:
• Server side: Glassfish and PostgreSQL servers
• Client side: Java Virtual Machine version 1.5

Conditions for access and use:
TyDI is a software tool developed by INRA, Mathématique, Informatique et Génome, and is the property of INRA. TyDI can be supplied under licence on a case-by-case basis. For more information, please contact Robert Bossy (robert.bossy@jouy.inra.fr).

Figure 1: The client interface of TyDI. It is composed of several panels (hierarchical/tabular view of the terms, search panel, context of appearance of selected terms…).

Description:
TyDI is a collaborative tool for the manual validation and annotation of terms, either originating from terminologies or extracted from a training corpus of textual documents. It is used on the output of so-called term extractors (like YaTeA), which identify candidate terms (e.g. compound nouns).
With TyDI, a user can validate candidate terms and specify synonymy/hyperonymy relations. These annotations can then be exported in several formats and used in other Natural Language Processing tools.
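A minimal sketch of the validation-and-export workflow (toy structures; TyDI's real export formats are not shown here): candidate terms coming from a term extractor are accepted, linked by synonymy/hyperonymy, and exported.

    candidates = ["gene expression", "expression of genes", "biological process"]
    status = {t: "accepted" for t in candidates}            # manual validation
    synonyms = {"expression of genes": "gene expression"}   # variant -> preferred
    hyperonyms = {"gene expression": "biological process"}  # term -> broader term

    def export_entry(term):
        # Export one validated term with its relations (toy format).
        preferred = synonyms.get(term, term)
        return {"prefLabel": preferred,
                "status": status[term],
                "broader": hyperonyms.get(preferred)}

    print(export_entry("expression of genes"))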
Q&A

• FIDJI: Web Question-Answering System, LIMSI-CNRS, p. 20
• Question-Answering System, Synapse Développement, p. 22
• QAVAL: Question Answering by Validation, LIMSI-CNRS, p. 24
• RITEL: Spoken and Interactive Question-Answering System, LIMSI-CNRS, p. 26
A question-answering system aims at
answering questions written in natural
language with a precise answer.
Target users and customers
Web question answering is an end-user application. FIDJI is an open-domain QA system for French and English.
Partners:
LIMSI-CNRS
Application sectors
Information retrieval on the Web or in document
collections
FIDJI: Web Question-Answering System
Contact details:
Véronique Moriceau
moriceau@limsi.fr
Xavier Tannier
xtannier@limsi.fr

LIMSI-CNRS
Groupe ILES B.P. 133
91403 Orsay Cedex France
http://www.limsi.fr/

Technical requirements:
PC with Linux platform

Conditions for access and use:
Available for licensing on a case-by-case basis

Description:
Document retrieval systems such as search engines provide the user with a large set of URL/snippet pairs containing relevant information with respect to a query. To obtain a precise answer, the user then needs to locate the relevant information within the documents, and possibly to combine different pieces of information coming from one or several documents.
To avoid these problems, focused retrieval aims at
identifying relevant documents and locating the precise
answer to a user question within a document. Question-
answering (QA) is a type of focused retrieval: its goal
is to provide the user with a precise answer to a natural
language question. While information retrieval (IR)
methods are mostly numerical and use only little linguistic
knowledge, QA often implies deep linguistic processing,
large resources and expert rule-based modules.
Most question-answering systems can extract the
answer to a factoid question when it is explicitly present
in texts, but are not able to combine different pieces
of information to produce an answer. FIDJI (Finding
In Documents Justifications and Inferences), an open-
domain QA system for French and English, aims at going
beyond this insufficiency and focuses on introducing text
understanding mechanisms.
The objective is to produce answers which are fully
validated by a supporting text (or passage) with respect to
a given question. The main difficulty is that an answer (or
some pieces of information composing an answer) may be
validated by several documents. For example:
Q: Which French Prime Minister committed suicide?
A: Pierre Bérégovoy
P1: The French Prime Minister Pierre Bérégovoy warned Mr. Clinton
against…
P2: Two years later, Pierre Bérégovoy committed suicide after he was
indirectly implicated…
In this example, the pieces of information "French Prime Minister" and "committed suicide" are validated by two different, complementary passages. Indeed, this question may be decomposed into two sub-questions, e.g. "Who committed suicide?" and "Was this person a French Prime Minister?".
FIDJI uses syntactic information, especially dependency relations which
allow question decomposition. The goal is to match the dependency
relations derived from the question and those of a passage and to
validate the type of the potential answer in this passage or in another
document.
Another important aim of FIDJI is to answer new categories of questions,
called complex questions, typically “how” and “why” questions. Complex
questions do not exist in traditional evaluation campaigns but have been
introduced within the Quaero framework. Answers to these particular
questions are no longer short and precise answers, but rather parts of
documents or even full documents. In this case, the linguistic analysis of
the question provides a lot of information concerning the possible form of
the answer and keywords that should be sought in candidate passages.
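The toy sketch below illustrates this decomposition-and-validation idea on the example above (a naive co-occurrence check, not FIDJI's dependency-relation matching): an answer candidate is kept only when each sub-question is supported by at least one passage.

    passages = [
        "The French Prime Minister Pierre Bérégovoy warned Mr. Clinton",
        "Two years later, Pierre Bérégovoy committed suicide",
    ]

    def supported(fact, candidate):
        # Naive validation: the fact and the candidate co-occur in a passage.
        return any(fact in p and candidate in p for p in passages)

    candidate = "Pierre Bérégovoy"
    sub_facts = ["French Prime Minister", "committed suicide"]
    if all(supported(f, candidate) for f in sub_facts):
        print("validated:", candidate)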
A question answering system allows the user to ask questions in natural language and to obtain one or several answers. For Boolean and generic questions, our system is able to generate potential questions and to return the corresponding answers.
Target users and customers
An end-user application: question answering is the easiest way for everybody to find information. Ask the question the way you want and obtain answers, not snippets or pages.
Partners:
Synapse Développement
Application sectors
Search and find precise answers in any collection of texts, from the Web or any other source (voice recognition, optical character recognition, etc.), with optional correction of the source text, the ability to generate questions from generic requests (possibly a single word), the ability to find similar questions and their answers, etc.
Monolingual and multilingual question answering. Languages: English, French (+ Spanish, Portuguese, Polish, with partners using the same API).
Question-Answering System
Contact details:
Patrick Séguéla
patrick.seguela@synapse-fr.com
Synapse Développement
33, rue Maynard
31000 Toulouse France
http://www.synapse-developpement.fr/

Technical requirements:
• Standard PC with Windows or Linux
• RAM: 4 GB minimum
• HDD: 100 GB minimum

Conditions for access and use:
SDK available for integration in programs or web services. For specific conditions of use, contact us.

Description:
The technology is based on substantial linguistic resources and state-of-the-art NLP techniques, especially syntactic and semantic parsing, with sophisticated features such as anaphora resolution, word-sense disambiguation, and relations between named entities.
On news and Web corpora, our system has regularly won awards in national and international evaluation campaigns (EQueR 2004; CLEF 2005, 2006, 2007; Quaero 2008, 2009).
A question answering system suited to finding precise answers in textual passages extracted from Web documents or text collections.
Target users and customers
Question answering serves both the general public, to retrieve precise information from raw texts, and companies and organizations that have specific text-mining needs. Question-answering systems suggest short answers, together with their justification passages, in response to questions asked in natural language.
Partners:
LIMSI-CNRS
Application sectors
Search engine extension, technology monitoring
QAVAL: Question Answering by VALidation
Contact details:
Brigitte Grau
Brigitte.Grau@limsi.fr
LIMSI-CNRS
ILES Group B.P. 133
91403 Orsay Cedex France
www.limsi.fr/Scientifique/iles

Technical requirements:
Linux platform

Conditions for access and use:
Available for licensing on a case-by-case basis
Description:
The large number of documents currently on the Web, but also on intranet systems, makes it necessary to provide users with intelligent assistant tools that help them find the specific information they are searching for. Relevant information at the right time can help solve a particular task. The purpose is thus to give access to the content of texts, not only to documents. Question-answering systems address this need.
Question-answering systems aim at finding answers to a question asked in natural language, using a collection of documents. When the collection is extracted from the Web, the structure and style of the texts are quite different from those of newspaper articles. We developed QAVAL, a question-answering system based on an answer validation process able to handle both kinds of documents. A large number of candidate answers are extracted from short passages and then validated, according to question and excerpt characteristics. The validation module is based on a machine-learning approach. It takes into account criteria characterizing both excerpt and answer relevance at the surface, lexical, syntactic and semantic levels, in order to deal with different types of texts.
QAVAL is made of sequential modules corresponding to five main steps. The question analysis provides the main characteristics used to retrieve excerpts and guide the validation process. Short excerpts are obtained directly from the search engine, then parsed and enriched with the question characteristics, which allows QAVAL to compute the different features for validating or discarding
candidate answers.
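A toy sketch of feature-based validation (hypothetical features and weights, not QAVAL's learned model): each candidate answer is scored from criteria characterizing the excerpt and the answer, and low-scoring candidates are discarded.

    def features(question, excerpt, answer):
        # Criteria at surface and lexical levels (toy subset).
        q_terms = set(question.lower().split())
        e_terms = set(excerpt.lower().split())
        return [
            len(q_terms & e_terms) / len(q_terms),             # lexical overlap
            1.0 if answer.lower() in excerpt.lower() else 0.0, # answer present
            len(excerpt.split()) / 50.0,                       # excerpt length
        ]

    WEIGHTS = [2.0, 1.5, -0.5]   # would be learned from annotated data

    def score(question, excerpt, answer):
        return sum(w * f for w, f in zip(WEIGHTS, features(question, excerpt, answer)))

    print(score("Who founded CNRS ?",
                "CNRS was founded in 1939 by Jean Zay .",
                "Jean Zay"))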
A spoken and interactive QA system
that helps a user to find an answer to his
question, spoken or written, in a collection
of documents.
Target users and customers
Question-answering is an end-user application.
The purpose is to go beyond the traditional way of
retrieving information through search engines. Our
system is interactive, with both a speech (phone or
microphone) and text (web) interface.
Partners:
LIMSI-CNRS
Application sectors
QA systems can be viewed as a direct extension of search engines: they allow a user to ask questions in natural language.
RITEL: Spoken and Interactive Question-Answering System
Contact details:
Sophie Rosset
sophie.rosset@limsi.fr
LIMSI-CNRS
TLP Group B.P. 133
91403 Orsay Cedex France
http://www.limsi.fr/tlp/

Technical requirements:
PC with Linux platform.

Conditions for access and use:
Available for licensing on a case-by-case basis.

Description:
There are different ways to go beyond standard
retrieval systems such as search engines. One of them
is to offer the users different ways to express their
query: some prefer to use speech to express a query,
while others prefer written natural language. Another
way is to allow the user to interact with the system.
The Ritel system aims at integrating a dialog system and an open-domain information retrieval system, to allow a human to ask a general question (e.g. "Who is currently presiding over the Senate?" or "How did the price of gas change over the last ten years?") and refine the search interactively.
A human-computer dialog system analyses and acts
on the user requests depending on the task at hand,
the previous interactions and the user’s behaviour.
Its aim is to provide the user with the information
being sought while maintaining a smooth and natural
interaction flow.
The following example illustrates the kind of possible
interaction with the Ritel system:
[S] Hi, Ritel speaking! What is your first question?
[U] who built the Versailles Castle
[S] Your search is about "Versailles Castle" and "built". The answer is Louis XIII. Do you want to ask another question?
[U] in which year
[S] 1682, according to the documents I had access to. Another
question?
[U] Who designed the garden
[S] The following items are used for searching: Versailles, gardens
and designed. André Le Nôtre. Anything else?
The dialog system comprises a component for user-utterance analysis, a component for dialog management, and a component for interaction management. The information retrieval and question-answering system is tightly integrated within it. The user interface can be phone-based, or web-based for written interaction.
Speech Processing

• Acoustic Speaker Diarization, LIMSI-CNRS, p. 30
• MediaSpeech® alignment, Vecsys, p. 32
• Automatic Speech Recognition, RWTH Aachen University, p. 34
• Automatic Speech Transcription, Vocapia, p. 36
• Corinat®, Vecsys, p. 38
• Language Identification, Vocapia, p. 40
• Speech-to-Text, Karlsruhe Institute of Technology (KIT), p. 42
The module performs automatic segmentation and clustering of an input audio stream according to speaker identity, using acoustic cues.
Target users and customers
Multimedia document indexing and archiving
services.
Partners:
LIMSI-CNRS
Application sectors
• Multimedia document management
• Content-based search in audiovisual documents
Acoustic Speaker Diarization
Contact details:
Claude Barras
claude.barras@limsi.fr
LIMSI-CNRS
Spoken Language Processing Group B.P. 133
91403 Orsay Cedex France
http://www.limsi.fr/tlp/

Technical requirements:
A standard PC with Linux operating system.

Conditions for access and use:
The technology developed at LIMSI-CNRS is available for licensing on a case-by-case basis.

Description:
Speaker diarization is the process of partitioning an
input audio stream into homogeneous segments
according to their speaker identity. This partitioning
is a useful preprocessing step for an automatic
speech transcription system, but it can also improve
the readability of the transcription by structuring the
audio stream into speaker turns. One of the major
issues is that the number of speakers in the audio
stream is generally unknown a priori and needs to be
automatically determined.
Given samples of known speakers' voices, speaker verification techniques can further be applied to provide clusters of identified speakers.
The LIMSI multi-stage speaker diarization system combines an
agglomerative clustering based on Bayesian information criterion
(BIC) with a second clustering stage using speaker identification
(SID) techniques with more complex models.
This system has participated in several evaluations of acoustic speaker diarization, on US English broadcast news for the NIST Rich Transcription 2004 Fall evaluation (NIST RT'04F) and on French broadcast radio and TV news and conversations for the ESTER-1 and ESTER-2 evaluation campaigns, providing state-of-the-art performance. Within the Quaero program, LIMSI is developing improved speaker diarization and speaker tracking systems for broadcast news, but also for more interactive data like talk shows. It is a building block of the system presented by Quaero partners to the REPERE challenge on multimodal person identification.
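For illustration, here is a minimal sketch of the BIC merge criterion that drives the agglomerative clustering stage (illustrative only; the actual LIMSI system adds a second stage using speaker identification models):

    import numpy as np

    def delta_bic(x, y, lam=1.0):
        # Delta-BIC between two clusters of feature frames (rows = frames,
        # columns = cepstral dimensions), using full-covariance Gaussians.
        # Negative values favour merging (same speaker); the lambda-weighted
        # penalty grows with the number of free model parameters.
        n_x, n_y = len(x), len(y)
        n, d = n_x + n_y, x.shape[1]
        logdet = lambda m: np.linalg.slogdet(np.cov(m, rowvar=False))[1]
        penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)
        return (0.5 * n * logdet(np.vstack([x, y]))
                - 0.5 * n_x * logdet(x)
                - 0.5 * n_y * logdet(y)
                - lam * penalty)

    rng = np.random.default_rng(0)
    a = rng.normal(0, 1, (200, 12))   # frames from speaker A
    b = rng.normal(3, 1, (200, 12))   # frames from a different speaker
    print(delta_bic(a, a[::-1]) < delta_bic(a, b))   # True: same speaker merges first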
Audio and Text synchronization tool
Target users and customers
• E-editors
• Media content producers
• Media application developers
• Search interface integrators
Partners:
Vecsys
Bertin Technologies
Exalead
Application sectors
• Public/private debates and conferences, e.g. parliament, meetings
• E-learning/e-books, e.g. audiobooks
• Media asset management, e.g. search in annotated media streams (TV, radio, films…)
MediaSpeech® alignment
Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Vecsys
Parc d'Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
http://www.vecsys.fr

Technical requirements:
Standard Web access.

Conditions for access and use:
Available in SaaS mode, installed on a server, or installed as a virtual machine in the MediaSpeech® product line. Quotation on request.

Description:
This technology synchronizes an audio stream with its associated text transcript: it takes as input both the audio stream and the raw transcript, and produces as output a "time-coded" transcript, i.e. each word or group of words is associated with its precise occurrence in the audio stream. The technology is robust and gracefully handles slight variations between the audio speech and the text transcript.
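A toy illustration of the output format described above (not the MediaSpeech® alignment algorithm, which matches acoustic models against the signal): every word of the raw transcript receives a time code; here the words are simply spread uniformly over an assumed audio duration.

    transcript = "ladies and gentlemen please welcome the first speaker"
    audio_duration = 4.2   # seconds, assumed known from the audio stream

    words = transcript.split()
    step = audio_duration / len(words)
    time_coded = [{"word": w, "start": round(i * step, 2), "end": round((i + 1) * step, 2)}
                  for i, w in enumerate(words)]
    print(time_coded[:3])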
Automatic speech recognition, also
known as speech-to-text, is the
transcription of speech into (machine-
readable) text by a computer
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
The uses of automatic speech recognition are so manifold that they are hard to list here. The main usages today are customer interaction via the telephone, healthcare dictation, and use in car navigation systems and smartphones. With increasingly better technology, these applications are extending to audio mining, speech translation, and an increased use of human-computer interaction via speech.
Automatic Speech Recognition
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
Speech recognition is a computationally and memory-intensive process, so the typical set-up is to have one or several computers on the internet serving the speech recognition requirements of many users.

Conditions for access and use:
RWTH provides an open-source speech recognizer free of charge for academic usage. Other usage should be subject to a bilateral agreement.

Description:
Automatic speech recognition is a very hard problem in computer science, but it is more mature than machine translation.
After the media hype of the late 1990s, the technology has continuously improved and has been adopted by the market, e.g. in large deployments in the customer contact sector, in the automation of radiology dictation, or in voice-enabled navigation systems in the automotive sector.
Public awareness has increased through its use on smartphones, in particular Siri. The research community concentrates on problems such as the recognition of spontaneous speech or the easy acquisition of new languages.
Vocapia Research develops core multilingual large
vocabulary speech recognition technologies* for voice
interfaces and automatic audio indexing applications.
This speech-to-text technology is available for multiple
languages. (* Under license from LIMSI-CNRS)
Target users and customers
The targeted users and customers of speech-to-text transcription technologies are actors in the multimedia and call center sectors, including academic and industrial organizations interested in the automatic processing and mining of audio or audiovisual documents.
Partners:
Vocapia
Application sectors
This core technology can serve as the basis for a variety of
applications: multilingual audio indexing, teleconference
transcription, telephone speech analytics, transcription of
speeches, subtitling…
Large vocabulary continuous speech recognition is the key
technology for enabling content-based information access
in audio and audiovisual documents. Most of the linguistic
information is encoded in the audio channel of audiovisual data,
which once transcribed can be accessed using text-based tools.
Via speech recognition, spoken document retrieval can support random access, using specific criteria, to relevant portions of audio documents, reducing the time needed to identify recordings in large multimedia databases. Some applications are data mining, news-on-demand, and media monitoring.
Automatic Speech Transcription
Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14
Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay France
www.vocapia.com

Technical requirements:
PC with Linux platform (for use via licensing).

Conditions for access and use:
The VoxSigma software is available both via licensing and via our web service.

Description:
The Vocapia Research speech transcription system transcribes the speech segments located in an audio file. Systems for 17 language varieties are currently available for broadcast and web data; conversational speech transcription systems are available for 7 languages.
The transcription system has two main components: an audio partitioner and a word recognizer. The audio partitioner divides the acoustic signal into homogeneous segments and associates appropriate (document-internal) speaker labels with the segments. For each speech segment, the word recognizer determines the sequence of words, associating start and end times and a confidence measure with each word.
Language Resources production
infrastructure
Target users and customers
• Linguistic resource providers
• Audio content transcribers; media transcribers
• Speech processing users and developers
Partners:
Vecsys
LIMSI-CNRS
Application sectors
• Language resources production
• Speech technology industry
• Media subtitling; conference and meeting transcription services
Corinat®
Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Vecsys
Parc d'Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
http://www.vecsys.fr

Technical requirements:
Standard Web access.

Conditions for access and use:
Quotation on request.

Description:
Corinat® is a hardware/software infrastructure for language resources production that offers the following functionalities:
• Data collection (broadcast, conversational)
• Automatic pre-processing of audio data
• Distribution of annotation tasks
• Semi-automatic post-processing of annotations
Corinat® is a high-availability platform (24/7), with a web-based interface for managing language resources production from any location.
Vocapia Research provides a language
identification technology* that can identify
languages in audio data.
(* Under license from LIMSI-CNRS)
Target users and customers
The targeted users and customers of language recognition technologies are actors in the multimedia and call center sectors, including academic and industrial organizations, as well as actors in the defense domain, interested in the processing of audio documents, in particular when the collection of documents contains multiple languages.
Partners:
Vocapia
Application sectors
A language identification system can be run prior to a speech recognizer; its output is used to load the appropriate language-dependent speech recognition models for the audio document. Alternatively, language identification might be used to dispatch audio documents or telephone calls to human operators fluent in the identified language. Other potential applications involve the use of LID as a front end to a multilingual translation system. This technology can also be part of automatic systems for spoken data retrieval or automatically enriched transcriptions.
Language Identification
Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14
Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay
www.vocapia.com

Technical requirements:
PC with Linux platform (for use via licensing).

Conditions for access and use:
The VoxSigma software is available both via licensing and via our web service.

Description:
The VoxSigma software suite can recognize the language spoken in an audio document or in speech segments defined in an input XML file. The set of possible languages and their associated models can be specified by the user.
LID systems are available for broadcast and conversational data. Currently, 15 languages for broadcast news audio and 50 languages for conversational telephone speech are included in the respective Vocapia Research LID systems. New languages can easily be added.
The VoxSigma software suite uses multiple phone-based decoders in parallel to decide which language is spoken in the audio file. The system reports the language of the audio document along with a confidence score. In the current version, it is assumed that a given channel of an audio document is in a single language; future versions are planned to allow multiple languages in a single document.
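A minimal sketch of the decision step (illustrative; the VoxSigma internals are not public): each language-specific phone decoder returns a log-likelihood for the audio, and the best language is reported with a softmax-style confidence score.

    import math

    def identify(log_likelihoods):
        # log_likelihoods: language -> total log-likelihood from its decoder.
        m = max(log_likelihoods.values())
        expd = {lang: math.exp(v - m) for lang, v in log_likelihoods.items()}
        z = sum(expd.values())
        best = max(expd, key=expd.get)
        return best, expd[best] / z   # identified language and confidence

    scores = {"fr": -1041.2, "en": -1057.8, "de": -1063.4}   # made-up values
    print(identify(scores))   # ('fr', ...) with confidence close to 1.0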
Transcription of human speech into
written word sequences
Target users and customers
Companies that want to integrate the transcription of human speech into their products.
Partners:
Karlsruhe Institute of Technology (KIT)
Application sectors
Speech-to-text technology is key to indexing multimedia content as found in multimedia databases or in video and audio collections on the World Wide Web, and to making it searchable by human queries. In addition, it offers a natural interface for submitting and executing queries.
This technology is also part of speech translation services. In combination with machine translation technology, it is possible to design machines that take human speech as input and translate it into a new language. This can be used to enable human-to-human communication across the language barrier, or to access information in a cross-lingual way.
Speech-to-Text
Contact details:
Prof. Alex Waibel
waibel@ira.uka.de
Karlsruhe Institute of Technology (KIT)
Adenauerring 2
76131 Karlsruhe Germany
http://isl.anthropomatik.kit.edu

Technical requirements:
Linux-based server with 2 GB of RAM.

Conditions for access and use:
Available for licensing on a case-by-case basis.

Description:
The KIT speech transcription system is based on the
JANUS Recognition Toolkit (JRTk) which features
the IBIS single pass decoder. The JRTk is a flexible
toolkit which follows an object-oriented approach and
which is controlled via Tcl/Tk scripting.
Recognition can be performed in different modes. In offline mode, the audio to be recognized is first segmented into sentence-like units. These segments are then clustered in an unsupervised way according to speaker. Recognition can then be performed in several passes; between passes, the models are adapted in an unsupervised manner in order to improve recognition performance. System combination using confusion network combination can additionally be used to further improve recognition performance.
In run-on mode, the audio to be recognized is continuously
processed without prior segmentation. The output is a steady
stream of words.
The recognizer can be flexibly configured to meet given real-time
requirements, between the poles of recognition accuracy and
recognition speed.
Within the Quaero project, we are targeting the languages English, French, German, Russian, and Spanish. Given sufficient amounts of training material, the HMM-based acoustic models can easily be adapted to additional languages and domains.
Translation of Text and Speech

• Machine Translation, RWTH Aachen University, p. 46
• Speech Translation, RWTH Aachen University, p. 48
Automatic translation of text breaks the
language barrier: It allows instant access
to information in foreign languages.
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
As translation quality is far below the work of professional human translators, machine translation is targeted at situations where instant access and low cost are key and high quality is not demanded, for example:
• Internet search (cross-language document retrieval)
• Internet (on-the-fly translation of foreign-language websites or news feeds)
Machine Translation
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
Translation is a memory-intensive process, so the typical set-up is to have one or several computers on the internet serving the translation requirements of many users.

Conditions for access and use:
RWTH provides open-source translation tools free of charge for academic usage. Other usage should be subject to a bilateral agreement.

Description:
Machine translation is a very hard problem in computer science and has been worked on for decades. The corpus-based methods that emerged in the 1990s allow the computer to actually learn translation from existing bilingual texts, you could say from many translation examples.
A correct mapping is not easy to learn, as the translation of a word depends on its context, and word orders typically differ across languages. It is fascinating to see this technology improving over the years. The learning methods are mathematical in nature and can be applied to any language pair.
Automatic translation of speech practically
sub-titles – in your native language! – the
speech of foreign-language speakers.
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
• Subtitling of broadcasts via television or internet
• Internet search in audio and video material (cross-language retrieval)
Speech Translation
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
Speech translation is a computationally and memory-intensive process, so the typical set-up is to have one or several computers on the internet serving the speech translation requirements of many users.

Conditions for access and use:
RWTH provides an open-source speech recognizer and various open-source tools free of charge for academic usage. Other usage should be subject to a bilateral agreement.

Description:
In a nutshell, speech translation is the combination of
two hard computer science problems, namely speech
recognition (automatic transcription of speech into
text) and machine translation (automatic translation
of a text from a source to a target language).
While neither technology works perfectly, it is impressive to see them working in combination, in particular when we have not even rudimentary knowledge of the source language; for many of us, this is the case for Chinese or Arabic.
The mathematical methods behind both speech recognition
and machine translation are related, and the systems draw their
knowledge from large amounts of example data.
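A minimal pipeline sketch of this combination, with hypothetical stand-in functions (not RWTH's API):

```python
# Speech translation = speech recognition followed by machine translation.
# The two stage functions below are placeholders for real ASR/MT systems.
def recognize_speech(audio_path: str) -> str:
    """Transcribe audio into source-language text (stand-in for an ASR system)."""
    ...

def translate(text: str, src: str, tgt: str) -> str:
    """Translate text between languages (stand-in for an MT system)."""
    ...

def speech_translation(audio_path: str, src: str = "de", tgt: str = "en") -> str:
    transcript = recognize_speech(audio_path)  # step 1: speech -> source text
    return translate(transcript, src, tgt)     # step 2: source -> target text
```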
Audio Processing (3)
• Sync Audio Watermarking - Technicolor p52
• SAMuSA: Speech And Music Segmenter and Annotator - Inria p54
• Yaafe: Audio feature extractor - Télécom ParisTech p56
Technicolor Sync Audio Watermarking
technologies
Target users and customers
• Content Owners
• Studios
• Broadcasters
• Content distributors
Partners:
Technicolor
Application sectors
Technicolor Sync Audio Watermarking allows studios and content owners
• to create more valuable and attractive content by delivering premium quality information
• to generate additional earnings through targeted ads, e-commerce and product placement alongside main screen content
Technicolor Sync Audio Watermarking allows broadcasters and content distributors
• to provide distinctive content and retain audiences
• to control complementary content on the 2nd screen within their branded environment
• to leverage real-time, qualified behavior metadata to better understand customers and deliver personalized content and recommendations
ContentArmor™ Audio Watermarking allows content owners to deter content leakage by tracking the source of pirated copies.
Sync Audio Watermarking
Contact details:
Gwenaël Doërr
gwenael.doerr@technicolor.com
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs / CS 176 16
35576 Cesson-Sévigné, France
http://www.technicolor.com
Technical requirements:
• The Technicolor Sync Audio Watermarking detector works on Android and iOS.
• The watermark embedder of both technologies works on Linux and MacOS.
Conditions for access and use:
Both systems can be licensed as software executables or libraries.
Description:
With Technicolor Sync Audio Watermarking technologies, studios, content owners, aggregators and distributors can sync live, recorded or time-shifted content and collect qualified metadata. Thanks to Technicolor's expertise in both watermarking and entertainment services, these solutions are easily integrated into existing post-production, broadcast and new media delivery workflows. Technicolor sync technologies open access to the benefits of new companion-app markets with no additional infrastructure cost.
Content identification and a time stamp are inaudibly inserted into the audio signal in post-production or during broadcast. The 2nd-screen device picks up the audio signal, decodes the watermark and synchronizes the app on the 2nd screen using the embedded content identification data. Audio watermarking uses the original content audio signal as its transmission channel, ensuring compatibility with all existing TVs, PVRs or DVD/Blu-ray players as well as legacy devices without network interfaces. It works for real-time, time-shifted and recorded content.
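For intuition only, here is a toy round trip of hiding a small payload (a content ID and time stamp) in PCM samples and reading it back. An actual broadcast watermark such as Technicolor's relies on robust, inaudible embedding; this fragile least-significant-bit scheme just illustrates the embed/decode principle.

```python
# Toy watermark: write payload bits into the LSBs of 16-bit PCM samples.
import numpy as np

def embed(samples: np.ndarray, payload: bytes) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    out = samples.copy()
    out[:len(bits)] = (out[:len(bits)] & ~1) | bits  # overwrite LSB with payload bit
    return out

def extract(samples: np.ndarray, n_bytes: int) -> bytes:
    bits = (samples[:n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()

audio = (np.random.randn(44100) * 1000).astype(np.int16)  # one second of noise
marked = embed(audio, b"ID=4242;t=12.5s")
print(extract(marked, 15))  # -> b'ID=4242;t=12.5s'
```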
Speech And Music Segmenter and Annotator
Target users and customers
The targeted users and customers are the multimedia industry actors, and all academic or industrial laboratories interested in audio document processing.
Partners:
Inria
Application sectors
• Audio and multimedia document processing
SAMuSA: Speech And Music Segmenter and Annotator
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
Technical issues:
Sébastien Campion
scampion@irisa.fr
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex, France
http://www.irisa.fr/
Technical requirements:
• PC with Unix/Linux OS
Conditions for access and use:
SAMuSA is a software that has been developed at Irisa in Rennes and is the property of CNRS and Inria. SAMuSA is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis.
Description:
The SAMuSA module takes an audio file or stream as input and returns a text file containing the detected segments of speech, music and silence.
To perform the segmentation, SAMuSA uses audio class models as external resources. It also calls external tools for audio feature extraction (the Spro software [1]) and for audio segmentation and classification (the Audioseg software [2]). These tools are included in the SAMuSA package.
Trained on hours of various TV and radio programs, the module provides efficient results: 95% of speech and 90% of music are correctly detected. One hour of audio can be processed in approximately one minute on a standard computer.
[1] http://gforge.inria.fr/projects/spro/
[2] http://gforge.inria.fr/projects/audioseg/
SAMuSA was developed at Irisa/INRIA Rennes by the Metiss team.
The SAMuSA authors are Frédéric Bimbot, Guillaume Gravier and Olivier Le Blouch.
The Spro author is Guillaume Gravier.
The Audioseg authors are Mathieu Ben, Michaël Betser and Guillaume Gravier.
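As a rough illustration of the kind of segment list such a tool produces (a crude energy-based stand-in, not the SAMuSA implementation):

```python
# Label fixed-size frames as silence vs. non-silence by short-term energy,
# then merge runs of identical labels into (start, end, label) segments.
import numpy as np

def segment(signal: np.ndarray, rate: int, frame_s: float = 0.5, thresh: float = 1e-4):
    n = int(frame_s * rate)
    labels = []
    for i in range(0, len(signal) - n, n):
        energy = float(np.mean(signal[i:i + n] ** 2))
        labels.append("silence" if energy < thresh else "speech/music")
    segments, start = [], 0.0
    for k, lab in enumerate(labels):
        if k + 1 == len(labels) or labels[k + 1] != lab:
            segments.append((start, (k + 1) * frame_s, lab))
            start = (k + 1) * frame_s
    return segments

rate = 16000
sig = np.concatenate([np.zeros(rate), 0.1 * np.random.randn(2 * rate), np.zeros(rate)])
for start, end, label in segment(sig, rate):
    print(f"{start:6.2f} {end:6.2f} {label}")
```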
Yaafe is a low-level and mid-level audio feature extractor, designed to extract a large number of features over large audio files.
Target users and customers
Targeted integrators and users are industrial or academic laboratories in the field of audio signal processing, in particular for music information retrieval tasks.
Partners:
Télécom ParisTech
Application sectors
• Music information retrieval
• Audio segmentation
Yaafe: Audio feature extractor
Contact details:
S. Essid
slim.essid@telecom-paristech.fr
Télécom ParisTech
37 rue Dareau
75014 Paris, France
http://www.tsi.telecomparistech.fr/aao/en/2010/02/19/yaafe-audio-feature-extractor/
Technical requirements:
Yaafe is a C++/Python software available for Linux and Mac.
Conditions for access and use:
Yaafe has been released under the LGPL licence and is available for download on Sourceforge. Some mid-level features are available in a separate library, with a proprietary licence.
Description:
Yaafe is designed to extract a large number of features simultaneously, in an efficient way. It automatically optimizes the features' computation, so that each intermediate representation (spectrum, CQT, envelope, etc.) is computed only once. Yaafe works in streaming mode, so it has a low memory footprint and can process arbitrarily long audio files.
Available features include spectral features, perceptual features (loudness), MFCC, CQT, chroma, chords and onset detection. Users can select their own set of features and transformations (derivative, temporal integration) and easily adapt all parameters to their own task.
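A minimal usage sketch based on Yaafe's documented Python API (exact names may vary across versions): declare a feature plan, compile it into an engine, and run it over an audio file.

```python
# Shared intermediate representations are computed only once for all features.
from yaafelib import FeaturePlan, Engine, AudioFileProcessor

fp = FeaturePlan(sample_rate=44100)
fp.addFeature('mfcc: MFCC blockSize=512 stepSize=256')         # 13 MFCCs per frame
fp.addFeature('loudness: Loudness blockSize=512 stepSize=256')

engine = Engine()
engine.load(fp.getDataFlow())         # build the optimized computation graph

afp = AudioFileProcessor()
afp.processFile(engine, 'track.wav')  # streaming: works on arbitrarily long files
features = engine.readAllOutputs()    # dict of numpy arrays, one per feature
print(features['mfcc'].shape)
```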
Document Processing (10)
• Colorimetric Correction System - Jouve p60
• Document Classification System - Jouve p62
• Document Layout Analysis System - Jouve p64
• Document Reader - A2iA p66
• Document Structuring System - Jouve p68
• Grey Level Character Recognition System - Jouve p70
• Handwriting Recognition System - Jouve p72
• Image Descreening System - Jouve p74
• Image Resizing for Print on Demand Scanning - Jouve p76
• Recognition of Handwritten Text - RWTH Aachen University p78
A specific tool to create a suitable
colorimetric correction and check its
stability over time
Target users and customers
Everyone who has to deal with high colorimetric constraints.
Partners:
Jouve
Application sectors
• Heritage
• Industry
Colorimetric Correction System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
The system uses a file containing the reference values of a calibration target and the image obtained from scanning that target. A profile is created from this file. To improve the correction, a color transformation table is integrated into the system. To guarantee the required quality, the system repeatedly checks the values of a calibration target against the specifications.
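A minimal sketch of the underlying idea, under illustrative assumptions (a simple linear model, not Jouve's profile format): fit a correction matrix from the scanned target patches to their reference values, and re-check the deviation over time.

```python
# Least-squares colour correction from a scanned calibration target.
import numpy as np

reference = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [128, 128, 128]], float)
scanned   = np.array([[240, 12, 8], [10, 230, 15], [5, 14, 235], [120, 125, 118]], float)

# Solve scanned @ M ~= reference in the least-squares sense.
M, *_ = np.linalg.lstsq(scanned, reference, rcond=None)

def correct(pixels: np.ndarray) -> np.ndarray:
    return np.clip(pixels @ M, 0, 255)

print(np.round(correct(scanned)))                       # close to the reference
drift = np.abs(correct(scanned) - reference).max()
print("max deviation:", round(float(drift), 2))         # stability check over time
```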
A generic tool for classifying documents based on a hybrid learning technique
Target users and customers
Everyone who has to deal with document classification and has a large amount of already-classified documents.
Partners:
Jouve
Application sectors
• Industrial property
• Scientific publishing
Document Classification System
Contact details:
Gustavo Crispino
gcrispino@jouve.fr
Jouve R&D
30, rue du Gard
62300 Lens, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
The fully automatic system is based on linguistic resources that are extracted from already-classified documents. On a 100-class patent pre-classification task, the system achieves 85% precision (5% better than human operators on this task).
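As an illustration of learning a classifier from already-classified documents (a generic bag-of-words baseline, not Jouve's hybrid technique):

```python
# Train a text classifier from labeled example documents, then classify new ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "claim 1: a rotor blade comprising a composite spar",
    "claim 1: a pharmaceutical composition comprising an antibody",
    "claim 1: a gearbox assembly with planetary gears",
]
train_labels = ["mechanics", "chemistry", "mechanics"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)   # linguistic resources learned from examples

print(model.predict(["claim 1: an antibody formulation for injection"]))
# -> ['chemistry']
```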
A generic tool to identify and extract
regions of text by analyzing connected
components
Target users and customers
Everyone who has to deal with document image analysis.
Layout analysis is the first major step in a document image analysis workflow. The correctness of the output of page segmentation and region classification is crucial, as the resulting representation is the basis for all subsequent analysis and recognition processes.
Partners:
Jouve
Application sectors
• Industry
• Service
• Heritage
• Publishing
• Administration
Document Layout Analysis System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
The system identifies and extracts regions of text by analyzing connected components constrained by black and white (background) separators; the rest is filtered out as non-text. First, the image is binarized, any skew is corrected and black page borders are removed. Subsequently, connected components are extracted and filtered according to their size (very small components are filtered out).
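A minimal sketch of the connected-component step using OpenCV (illustrative thresholds, not Jouve's implementation):

```python
# Binarize a page, extract components, keep those plausibly sized for text.
import cv2

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
# Otsu threshold, inverted so ink becomes foreground (white).
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

text_boxes = []
for i in range(1, n):                    # label 0 is the background
    x, y, w, h, area = stats[i]
    if 10 < area < 5000 and h < 100:     # drop specks and huge non-text blobs
        text_boxes.append((x, y, w, h))

print(f"{len(text_boxes)} candidate text components")
```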
Classification of all types of paper documents, data extraction, and mail processing and workflow automation
Target users and customers
• Independent Software Vendors
• Business Process Outsourcers
Partners:
A2iA
Application sectors
Banks, insurance, administration, telecom and utility companies, historical document conversion
Document Reader
Contact details:
Venceslas Cartier
venceslas.cartier@a2ia.com
A2iA
39, rue de la Bienfaisance
75008 Paris, France
www.a2ia.com
Technical requirements:
Wintel platform
Conditions for access and use:
Upon request
Description:
Classification of all types of paper documents
A2iA DocumentReader classifies digitized documents into user-defined classes or "categories" (letters, contracts, claim forms, accounts receivable, etc.) based on both their geometry and their content. The software analyzes the layout of items on the document. Then, using a general dictionary and trade vocabulary, it carries out a literal transcription of the handwritten and/or typed areas. A2iA DocumentReader can then extract key words or phrases in order to determine the category of the document.
Data extraction
A2iA DocumentReader uses three methods to extract data from all types of paper documents. Extraction from predefined documents: some documents (such as checks, bank documents and envelopes) are preconfigured within A2iA DocumentReader; the software recognizes their structure, the format of the data to be extracted and its location on the document. Extraction from structured documents: A2iA DocumentReader recognizes and extracts data within a fixed location on the document. Extraction from semi-structured documents: the layout of the document varies but the data to be extracted remains unchanged; A2iA DocumentReader locates this data by its format and the proximity of key words, wherever they appear on the document.
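As an illustration of locating data by format and keyword proximity (hypothetical rules, not A2iA's engine):

```python
# Find an amount near the keyword "total", wherever the line appears.
import re

ocr_lines = [
    "INVOICE 2013-0412",
    "Customer: Dupont SARL",
    "Total due: 1 249,00 EUR",
]

AMOUNT = re.compile(r"\d[\d\s]*,\d{2}")    # format: digits with a decimal comma

def extract_total(lines):
    for line in lines:
        if "total" in line.lower():         # keyword proximity: same line
            match = AMOUNT.search(line)
            if match:
                return match.group().strip()
    return None

print(extract_total(ocr_lines))  # -> '1 249,00'
```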
Mail processing and workflow automation
A2iA DocumentReader analyzes the entire envelope or folder on a holistic level, just as a human would, to identify its purpose and subject matter (termination of subscription, request for assistance, change of address, etc.). All of the documents together can have a different meaning or purpose than a single document on its own. A2iA DocumentReader then transmits the digital data to the classification application in order to route the mail to the correct person or department. Mail is sent to the appropriate location as soon as it arrives: processing and response times are minimized, workflow is automated, and manual labor is decreased.
A generic tool to recognize the logical structure of documents from an OCR stream
Target users and customers
Everyone who has to deal with electronic document encoding from original source material and needs to preserve the hierarchical structure represented in the digitized document.
Partners:
Jouve
Application sectors
• Industry
• Service
• Heritage
• Administration
Document Structuring System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
The system recognizes the logical structure of documents from an OCR stream in accordance with the descriptions of a model (DTD, XML Schema). The result is a hierarchically structured flow. The model captures both the macro-structure of the documents and the micro-structure of their content.
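A toy sketch of turning a flat OCR stream into a hierarchical flow (simple regular expressions stand in for a DTD/Schema-driven model):

```python
# Map numbered headings to nested XML elements; everything else is a paragraph.
import re
import xml.etree.ElementTree as ET

ocr_stream = ["1. Introduction", "Some body text.", "1.1 Context", "More text."]

root = ET.Element("document")
section, subsection = None, None
for line in ocr_stream:
    if re.match(r"\d+\.\d+\s", line):      # micro-structure: subsection title
        subsection = ET.SubElement(section, "subsection", title=line)
    elif re.match(r"\d+\.\s", line):       # macro-structure: section title
        section = ET.SubElement(root, "section", title=line)
        subsection = None
    else:                                  # plain paragraph content
        parent = subsection if subsection is not None else section
        ET.SubElement(parent, "p").text = line

print(ET.tostring(root, encoding="unicode"))
```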
A recognition engine for degraded printed documents
Target users and customers
Everyone who has to deal with character recognition on grey-level images. Specifically targeted at low-quality documents, the system also outperforms off-the-shelf OCR engines on good-quality images.
Partners:
Jouve
Application sectors
• Heritage scanning
• Printing
Grey Level Character Recognition System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
Unlike other OCR engines, this system processes grey-level images directly (without using a temporary black-and-white image). Using all the information present in the image, it is able to recognize degraded characters.
Capture handwritten and machine-printed data from documents
Target users and customers
Everyone who has to deal with forms containing handwritten fields or with processing incoming mail
Partners:
Jouve
Application sectors
• Banking
• Healthcare
• Government
• Administration
Handwriting Recognition System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
The JOUVE ICR (Intelligent Character Recognition) engine is a combination of two complementary systems: HMMs and multidimensional recurrent neural networks. The engine has the advantage of dealing with input data of varying size and of taking context into account. JOUVE ICR further increases the recognition rate on handwritten fields in forms by using the links between fields.
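As a rough illustration of combining two complementary recognizers (a generic score-fusion sketch, not the JOUVE ICR internals):

```python
# Log-linear fusion: pick the hypothesis both systems jointly prefer.
import math

def combine(hmm_scores: dict, rnn_scores: dict, weight: float = 0.5) -> str:
    """hmm_scores / rnn_scores map candidate transcriptions to probabilities."""
    fused = {
        hyp: weight * math.log(hmm_scores.get(hyp, 1e-9))
             + (1 - weight) * math.log(rnn_scores.get(hyp, 1e-9))
        for hyp in set(hmm_scores) | set(rnn_scores)
    }
    return max(fused, key=fused.get)

hmm = {"MAYENNE": 0.55, "MAVENNE": 0.45}
rnn = {"MAYENNE": 0.70, "HAYENNE": 0.30}
print(combine(hmm, rnn))  # -> MAYENNE
```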
A system that removes annoying
halftones in scanned images
Target users and customers
Everyone who has to deal with high-quality reproduction of halftone images.
Partners:
Jouve
Application sectors
• Heritage scanning
• Printing
Image Descreening System
Contact details:
Christophe Lebouleux
clebouleux@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
Halftone is a process for reproducing photographs or other images in which the various tones of grey or color are produced by variously sized dots of ink. When a document printed with this process is scanned, a very uncomfortable screening effect may appear. The system removes it using a combination of peak removal in the Fourier domain and local Gaussian blur.
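A minimal sketch of the Fourier-domain part (illustrative, not Jouve's implementation; the local Gaussian-blur step is omitted): a halftone screen shows up as strong off-center peaks in the spectrum, which are notched out before transforming back.

```python
# Suppress periodic spectral peaks to remove the screening effect.
import numpy as np

def descreen(image: np.ndarray, strength: float = 8.0) -> np.ndarray:
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(float)))
    magnitude = np.abs(spectrum)

    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    off_center = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > (min(h, w) // 16) ** 2
    peaks = (magnitude > strength * np.median(magnitude)) & off_center

    spectrum[peaks] = 0.0                        # notch out the screen frequencies
    result = np.fft.ifft2(np.fft.ifftshift(spectrum)).real
    return np.clip(result, 0, 255).astype(np.uint8)

# Synthetic halftoned page: flat grey plus a strong periodic screen pattern.
y, x = np.mgrid[:256, :256]
page = 128 + 60 * np.sign(np.sin(x * 0.8) * np.sin(y * 0.8))
print(descreen(page.astype(np.uint8)).std() < page.std())  # screen energy reduced
```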
A specific tool for recreating matter that was lost during the scanning of bound books.
Target users and customers
Everyone who has to deal with high-quality reproduction of bound books.
Partners:
Jouve
Application sectors
• Heritage scanning
• Printing
Image Resizing for Print on Demand Scanning
Contact details:
Christophe Lebouleux
clebouleux@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
• Any POSIX-compliant system
• Grey-level or color images
Conditions for access and use:
Ask Jouve
Description:
In many cases, when documents have been unbound before scanning (which removes part of the original page), we are asked to provide an image at the original size, and sometimes to provide images larger than the original for reprint purposes. Using the seam carving technique, we are able to obtain very realistic results.
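A compact sketch of seam carving for enlargement (illustrative, not Jouve's production code): find the lowest-energy vertical seam and duplicate it, widening the image by one pixel with minimal visual distortion; repeat to enlarge further.

```python
# Dynamic-programming seam search on a gradient-based energy map.
import numpy as np

def find_seam(gray: np.ndarray) -> np.ndarray:
    energy = np.abs(np.gradient(gray, axis=0)) + np.abs(np.gradient(gray, axis=1))
    cost = energy.copy()
    for r in range(1, cost.shape[0]):           # cumulative minimal-energy map
        left = np.roll(cost[r - 1], 1);  left[0] = np.inf
        right = np.roll(cost[r - 1], -1); right[-1] = np.inf
        cost[r] += np.minimum(np.minimum(left, cost[r - 1]), right)
    seam = np.zeros(cost.shape[0], dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(cost.shape[0] - 2, -1, -1):  # backtrack from bottom to top
        c = seam[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, cost.shape[1])
        seam[r] = lo + int(np.argmin(cost[r, lo:hi]))
    return seam

def widen(gray: np.ndarray) -> np.ndarray:
    seam = find_seam(gray)
    rows = [np.insert(row, s, row[s]) for row, s in zip(gray, seam)]
    return np.stack(rows)

img = np.tile(np.linspace(0, 255, 64), (64, 1))
print(img.shape, "->", widen(img).shape)  # (64, 64) -> (64, 65)
```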
Recognition of handwritten text
transforms handwritten text into machine-
readable text on a computer.
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
Recognition of printed or handwritten text is heavily used in the mass processing of paper mail, filled-out forms and letters (e.g. to insurance companies), and has been covered by the media in connection with the mass digitization of books. New usage patterns will evolve from better coverage of handwriting and of difficult writing systems such as Arabic or Chinese, and from the recognition of text in any form of image data, which, due to digital cameras and the Internet, is being produced and distributed in ever-increasing volumes.
Recognition of Handwritten Text
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen, Germany
http://www-i6.informatik.rwth-aachen.de
Technical requirements:
The text needs to be available in digitized form, e.g. through a scanner, as part of a digital image or video. Processing takes place on a normal computer.
Conditions for access and use:
RWTH does not currently provide public access to software in this area. Any usage should be subject to a bilateral agreement.
Description:
Optical character recognition (OCR) works sufficiently well on printed text but is particularly difficult for handwritten material, which exhibits far higher variability than print. Methods that have proven successful in related areas such as speech recognition and machine translation are being exploited to tackle this set of OCR problems.
Object Recognition & Image Clustering (3)
• Image Clusterization System - Jouve p82
• Image Identification System - Jouve p84
• LTU Leading Image Recognition Technologies - LTU technologies p86
A generic tool to perform automatic
clustering of scanned images
Target users and customers
Everyone who has to group a large set of images in such a way that images in the same group are more similar to each other than to those in other groups, for instance in incoming mail processing.
Partners:
Jouve
Application sectors
• Banking
• Insurance
• Industry
Image Clusterization System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
Two kinds of methods have been implemented. The first applies optical character recognition to the pages; distances are then computed between the images to be classified and the images contained in a database of labeled images. The second randomly selects a pool of images inside a directory; for each image, invariant key points are extracted and characteristic features are computed (SIFT or SURF) to build the clusters.
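As an illustration of the second method (ORB key points stand in for SIFT/SURF here; not Jouve's code): describe each image by a summary of its local features, then cluster the images with k-means.

```python
# Bag-of-local-features image clustering sketch.
import glob
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=200)

def signature(path: str) -> np.ndarray:
    """Mean local descriptor: a crude fixed-length image signature."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    return desc.mean(axis=0) if desc is not None else np.zeros(32)

paths = sorted(glob.glob("scans/*.png"))
X = np.stack([signature(p) for p in paths])

clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
for path, c in zip(paths, clusters):
    print(c, path)
```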
A generic tool to automatically identify documents, photos and text zones in scanned images
Target users and customers
Everyone who has to deal with the recognition of documents such as identity cards, passports, invoices…
Partners:
Jouve
Application sectors
• Administration
• Banking
• Insurance
Image Identification System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne, France
www.jouve.com
Technical requirements:
Any POSIX-compliant system
Conditions for access and use:
Ask Jouve
Description:
The system searches for the best match between image signatures and model signatures, and determines whether the same kind of model is present in the image to be segmented. The segmentation defined on the model is then transferred to the image by applying an affine transformation (translation, rotation and scaling).
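A minimal sketch of signature matching plus affine estimation using OpenCV (illustrative file names, not Jouve's system): match a model image (e.g. a blank ID-card template) against a scan and recover the transform that maps the model's layout onto the scan.

```python
# Key-point matching + robust affine estimation (translation/rotation/scale).
import cv2
import numpy as np

model = cv2.imread("model_card.png", cv2.IMREAD_GRAYSCALE)
scan = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(1000)
kp_m, des_m = orb.detectAndCompute(model, None)
kp_s, des_s = orb.detectAndCompute(scan, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_m, des_s), key=lambda m: m.distance)[:100]

src = np.float32([kp_m[m.queryIdx].pt for m in matches])
dst = np.float32([kp_s[m.trainIdx].pt for m in matches])

A, inliers = cv2.estimateAffinePartial2D(src, dst)   # RANSAC inside
print("model is present" if inliers.sum() > 25 else "no match")
print(A)   # 2x3 affine matrix mapping model coordinates onto the scan
```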
Leading Image Recognition
Technologies
Target users and customers
• Brands
• Retailers
• Social Media Monitoring companies
• Research companies
• Government agencies
Partners:
LTU technologies
Application sectors
• Visual Brand Intelligence: e-reputation, brand protection
• Media Monitoring
• M-Commerce and E-Commerce: augmented reality, interactive catalogs, virtual shop, advanced search functionalities, etc.
• Visual Asset Management: image classification, image de-duplication, image filtering, moderation, etc.
LTU Leading Image Recognition Technologies
Contact details:
Frédéric Jahard
fjahard@ltutech.com
LTU technologies
Headquarters:
132 rue de Rivoli
75001 Paris, France
+33 1 53 43 01 68
US office:
232 Madison Ave
New York, NY 10016, USA
+1 646 434 0273
http://www.ltutech.com
Technical requirements:
Coming soon
Conditions for access and use:
Coming soon
Description:
Founded in 1999 by researchers at MIT, Oxford and Inria, LTU provides cutting-edge image recognition technologies and services to global companies and organizations such as Adidas, Kantar Media and Ipsos.
LTU's solutions are available on demand with LTU Cloud or on-premise with LTU Enterprise Software. These patented image recognition solutions enable LTU's clients to effectively manage their visual assets – internally and externally – and to innovate by bringing their end-users truly innovative visual experiences.
In an image-centric world, LTU's expertise runs the image recognition gamut from visual search, visual data management, investigations and media monitoring, to e-commerce, brand intelligence, and mobile applications.
Music Processing (7)
• AudioPrint - IRCAM p90
• Ircamaudiosim: Acoustical Similarity Estimation - IRCAM p92
• Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation - IRCAM p94
• Ircamchord: Automatic Chord Estimation - IRCAM p96
• Ircammusicgenre and Ircammusicmood: Genre and Mood Estimation - IRCAM p98
• Ircamsummary: Music Summary Generation and Music Structure Estimation - IRCAM p100
• Music Structure - Inria p102
AudioPrint captures the acoustical properties of a sound by computing a robust representation of it
Target users and customers
AudioPrint is dedicated to middleware integrators who wish to develop audio fingerprint applications (i.e. systems for live recognition of music on air), as well as synchronization frameworks for second-screen applications (a mobile device brings content directly related to the live TV program). The music recognition application can also be used by digital rights management companies.
Partners:
IRCAM
Application sectors
• Second-screen software providers
• Digital rights management
• Music query software developers
AudioPrint
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
IRCAM
Sound Analysis/Synthesis
1 Place Igor-Stravinsky
75004 Paris, France
http://www.ircam.fr
Technical requirements:
AudioPrint is available as a static library for Linux, Mac OS X and iOS platforms.
Conditions for access and use:
Ircam Licence
Description:
AudioPrint is an efficient technology for live or offline recognition of musical tracks within a database of learnt tracks. It captures the acoustical properties of the audio signal by computing a symbolic representation of the sound profile that is robust to common alterations. Moreover, it provides a very precise estimation of the temporal offset within the detected musical track. This offset estimation can be used as a means to synchronize devices.
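For intuition, a generic fingerprinting sketch in the classic spectral-peak style (not IRCAM's algorithm): hashed peak pairs let a short excerpt vote for a track and a temporal offset, which is exactly the information needed to synchronize a second screen.

```python
# Spectral-peak hashing: learn tracks, then identify an excerpt + its offset.
import numpy as np

FRAME = 1024

def constellation(signal: np.ndarray) -> np.ndarray:
    """One dominant frequency bin per frame: a crude peak map."""
    usable = len(signal) // FRAME * FRAME
    spec = np.abs(np.fft.rfft(signal[:usable].reshape(-1, FRAME), axis=1))
    return np.argmax(spec, axis=1)

def fingerprints(signal: np.ndarray):
    p = constellation(signal)
    for i in range(len(p) - 1):
        yield (int(p[i]), int(p[i + 1])), i    # hash = consecutive peak pair

db = {}
def learn(track_id: str, signal: np.ndarray):
    for h, t in fingerprints(signal):
        db.setdefault(h, []).append((track_id, t))

def identify(excerpt: np.ndarray):
    votes = {}
    for h, t in fingerprints(excerpt):
        for track, t_db in db.get(h, ()):
            key = (track, t_db - t)            # consistent track + time offset
            votes[key] = votes.get(key, 0) + 1
    return max(votes, key=votes.get) if votes else None

tone = lambda f, n=4096: np.sin(2 * np.pi * f * np.arange(n) / 8000.0)
track = np.concatenate([tone(440), tone(660), tone(880)])
learn("track-1", track)
print(identify(track[FRAME * 3: FRAME * 9]))   # -> ('track-1', 3): 3-frame offset
```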
Ircamaudiosim estimates the acoustical
similarity between two music tracks.
It can be used to perform music
recommendation based on music
content similarity.
Target users and customers
Ircamaudiosim allows the development of music recommendation based on music content similarity. It can therefore be used in any system (online or offline) requiring music recommendation, such as a recommendation engine for an online music service or an offline music collection browser.
Partners:
IRCAM
Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamaudiosim: Acoustical Similarity Estimation
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
IRCAM
Sound Analysis/Synthesis
1 Place Igor-Stravinsky
75004 Paris, France
http://www.ircam.fr
Technical requirements:
Ircamaudiosim is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.
Conditions for access and use:
Ircam Licence
Description:
Ircamaudiosim estimates the acoustical similarity between two audio tracks. For this, each music track of a database is first analyzed in terms of its acoustical content (timbre, rhythm, harmony). An efficient representation of this content is used, which allows fast comparison between two music tracks. Because of this, the system is scalable to large databases. Given a target music track, the items of the database that are most similar in terms of acoustical content can be found quickly and then used to provide recommendations to the listener.
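As an illustration of content-based similarity (a crude MFCC summary with cosine ranking, not Ircamaudiosim; assumes the librosa library and local audio files):

```python
# Summarize each track by the mean and spread of its MFCCs (rough timbre),
# then rank a database by cosine similarity to a target track.
import librosa
import numpy as np

def timbre_vector(path: str) -> np.ndarray:
    y, sr = librosa.load(path, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

database = {p: timbre_vector(p) for p in ["a.wav", "b.wav", "c.wav"]}
target = timbre_vector("query.wav")
for path, vec in sorted(database.items(), key=lambda kv: -cosine(target, kv[1])):
    print(f"{cosine(target, vec):.3f}  {path}")
```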
Ircambeat estimates the global and time-variable tempo and meter of a music file. It also estimates the positions of the beats and downbeats over time.
Target users and customers
Tempo and meter are among the major perceptual characteristics of a music file. Their automatic estimation makes these values available for large collections of music files. They can therefore be used for automatic classification of large music collections, search by similarity over large music collections, and automatic play-list generation. The technology can thus benefit music providers, online music portals and offline media-player developers.
Beats and downbeats define the time grid of a music file. They are used as a front end for the estimation of many other music parameters and for other processing steps (time-stretching, segmentation, DJ-ing). The technology for their automatic estimation can therefore benefit music software developers (music production and DJ-ing software).
Partners:
IRCAM
Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
IRCAM
Sound Analysis/Synthesis
1 Place Igor-Stravinsky
75004 Paris, France
http://www.ircam.fr
Technical requirements:
Ircambeat is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.
Conditions for access and use:
Ircam Licence
Description:
Ircambeat performs automatic estimation of the global and time-variable tempo and meter of a music file, as well as estimation of the positions of the beats and downbeats. For this, each digital music file is analyzed in terms of its time and frequency content in order to detect salient musical events. Periodicities of the musical events are then analyzed over time at various scales to obtain the tempo and meter. Beat and downbeat positions are estimated using music templates based on machine learning and music theory, to obtain a precise time positioning.
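A toy sketch of tempo estimation by periodicity analysis (a textbook baseline, not Ircambeat): build an onset-strength envelope from salient events, then pick the lag where its autocorrelation peaks inside a plausible tempo range.

```python
# Tempo from the periodicity of an energy-rise onset envelope.
import numpy as np

def tempo_bpm(signal: np.ndarray, rate: int, hop: int = 512) -> float:
    frames = signal[:len(signal) // hop * hop].reshape(-1, hop)
    energy = (frames ** 2).sum(axis=1)
    onset = np.maximum(np.diff(energy), 0.0)       # energy rises = salient events
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    fps = rate / hop                               # envelope frames per second
    lags = np.arange(len(ac))
    valid = (lags >= fps * 60 / 200) & (lags <= fps * 60 / 40)   # 40..200 BPM
    best = lags[valid][np.argmax(ac[valid])]
    return 60.0 * fps / best

# Synthetic click track at 120 BPM (clicks aligned with frame boundaries).
rate, dur = 16384, 10
sig = np.zeros(rate * dur)
sig[::rate // 2] = 1.0                             # a click every 0.5 s
print(round(tempo_bpm(sig, rate)))                 # -> 120
```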
Ircamchord automatically estimates the temporal succession of music chords (C major, C minor, …) that makes up a piece of music.
Target users and customers
One of the most important perceptual aspects of popular music is the succession of chords over time. Two tracks based on the same chord succession are perceived as very similar and sometimes indicate a cover version of the same composition. Automatic estimation of the chord succession can therefore be used for search by similarity and play-list generation, and can thus benefit music providers and online music portals.
Chord notation is also very popular with beginner musicians (a very large number of guitar tabs are accessible and used over the web). Automatically estimating the chord succession of a given track can therefore benefit personal users through the inclusion of the technology in local software.
Partners:
IRCAM
Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamchord: Automatic Chord Estimation
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
IRCAM
Sound Analysis/Synthesis
1 Place Igor-Stravinsky
75004 Paris, France
http://www.ircam.fr
Technical requirements:
Ircamchord is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.
Conditions for access and use:
Ircam Licence
Description:
Ircamchord performs automatic estimation of the chord succession of a music track using a 24-chord dictionary (C major, C minor, …). For this, the harmonic content of the music file is first extracted in a beat-synchronous way. A statistical model (a double-state hidden Markov model) representing music theory (chord transitions), expected downbeat positions and the estimated local key is then used for a precise estimation.
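For intuition, the simplest chord-estimation baseline (template matching on a chroma vector; Ircamchord adds an HMM, beat synchrony and key modelling on top of this kind of evidence):

```python
# Score each of the 24 major/minor triads against a 12-bin chroma frame.
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root: int, minor: bool) -> np.ndarray:
    t = np.zeros(12)
    t[[root, (root + (3 if minor else 4)) % 12, (root + 7) % 12]] = 1.0
    return t

TEMPLATES = {f"{NOTES[r]}{'m' if m else ''}": triad_template(r, m)
             for r in range(12) for m in (False, True)}   # 24 chords

def estimate_chord(chroma: np.ndarray) -> str:
    scores = {name: float(chroma @ t) for name, t in TEMPLATES.items()}
    return max(scores, key=scores.get)

# A frame dominated by C, E and G energy should come out as C major.
frame = np.zeros(12); frame[[0, 4, 7]] = [1.0, 0.8, 0.9]
print(estimate_chord(frame))  # -> C
```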
Ircammusicgenre and Ircammusicmood automatically estimate the music genres (electronica, jazz, pop/rock…) and music mood classes (positive, sad, powerful, calming…) a music track belongs to
Target users and customers
Classification of music items is generally based primarily on music genre: electronica, jazz, pop/rock… However, editorial metadata related to genre are generally only available at the artist level (the whole set of music tracks produced by one artist is assigned the same genre, whatever the tracks' content). Ircammusicgenre automatically estimates the music genres a given track belongs to. The list of music genres considered by the software can be pre-determined by Ircam (electronica, jazz, pop/rock…) or can be adapted to categories relevant to the partner, provided a sufficient number of sound examples per category. Ircammusicgenre can also perform multi-labeling of a music track, i.e. assign a set of genre labels instead of a single genre; in this case, a weight is assigned to each estimated label.
Ircammusicmood automatically estimates the music mood of a track, i.e. the "mood" that the track suggests: positive, sad, powerful, calming… As for music genre, the list of moods can be pre-determined by Ircam or discussed with the partner, and multi-labeling can also be applied to the mood classification.
Partners:
IRCAM
Application sectors
• Online music providers
• Online music portals
Ircammusicgenre and Ircammusicmood: Genre and Mood Estimation
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
IRCAM
Sound Analysis/Synthesis
1 Place Igor-Stravinsky
75004 Paris, France
http://www.ircam.fr
Technical requirements:
Ircammusicgenre and Ircammusicmood are available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.
Conditions for access and use:
Ircam Licence
Description:
Ircammusicgenre and Ircammusicmood are based on the Ircamclassifier technology. Ircamclassifier learns new concepts related to music content by training on example databases. For this, a large set of audio features is extracted from labeled music items and used to find relationships between the labels and the example audio content. Ircamclassifier uses over 500 different audio features and performs automatic feature selection and statistical-model parameter selection.
Ircamclassifier uses a full binarization of the labels and a set of SVM classifiers; mono-labeling and multi-labeling are obtained from the set of SVM decisions. The performance and computation time of the resulting trained system are then optimized for the specific task, giving a ready-to-use system for music genre or music mood.
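As an illustration of the classifier stage (label binarization plus one SVM per label; a generic sketch, not Ircamclassifier):

```python
# One-vs-rest SVMs over binarized labels: a track can get several genre
# labels, each with a decision score acting as its weight.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Pretend feature vectors (e.g. summaries of many audio descriptors).
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8], [0.5, 0.5]])
labels = [["electronica"], ["electronica"], ["jazz"], ["jazz"],
          ["electronica", "jazz"]]

mlb = MultiLabelBinarizer()                  # full binarization of the labels
Y = mlb.fit_transform(labels)
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)

track = np.array([[0.6, 0.55]])
scores = clf.decision_function(track)[0]     # one margin per label = its weight
for name, score in zip(mlb.classes_, scores):
    print(f"{name}: {score:+.2f}")
```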
Ircamsummary automatically creates a music summary of an audio file. It also estimates the temporal structure of a music file in terms of repetition of similar parts.
Target users and customers
Automatic music summary generation aims at providing an informative audio preview of the content of a music file (rather than the commonly used first 30 s). It can therefore benefit any service providing access to music items that requires a quick preview of the music files, such as music providers and online music portals. It can also be installed on a personal computer to preview the local user's music collection.
Automatic music structure estimation provides a description of the temporal organization of music files in terms of repetition of parts over time. It can be used for visualization and for interaction with the playing of a music file (intelligent forward/backward, direct access to the most repeated parts). It can benefit any developer of music players or software for music interaction.
Partners:
IRCAM
Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamsummary: Music Summary Generation and Music Structure Estimation
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
IRCAM
Sound Analysis/Synthesis
1 Place Igor-Stravinsky
75004 Paris, France
http://www.ircam.fr
Technical requirements:
Ircamsummary is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.
Conditions for access and use:
Ircam Licence
Description:
Ircamsummary performs automatic generation of music audio summaries. It uses various strategies: the most representative extract (in terms of content repetition and content position), or downbeat-synchronous concatenation of the most representative parts. The summary can also be parameterized by the user in terms of duration (from 10 s to 30 s).
Ircamsummary also estimates the structure of music files in terms of repetition of parts (such as verse, chorus, bridge… but without explicit labeling of the parts). For this, Ircamsummary extracts the timbral, harmonic and rhythmic content of a music file over time and analyzes content repetition using two strategies: sequence repetition and state repetition. The generation of the audio summary is parameterizable in type (continuous summary, or summary obtained by concatenating the most informative parts) and in duration. The structure estimation is parameterizable in terms of the number of parts and the parts' type (sequence or state).
Three music structure estimation
systems
Target users and customers
• Music industry actors
• Industrial laboratories interested in automatic music analysis
Partners:
Inria
Application sectors
• Music description
• Music indexing
• Music analysis and creation
Music Structure
Contact details:
Frédéric Bimbot
frederic.bimbot@irisa.fr
Gabriel Sargent
gabriel.sargent@irisa.fr
IRISA/PANAMA Research Team
Campus de Beaulieu
35042 Rennes Cedex, France
https://team.inria.fr/panama/projects/music-structure/
Technical requirements:
All systems: PC or Mac with Matlab (Signal Processing and Statistics toolboxes).
System 1 (2010) requires the mfcc extractor from the MA Toolbox by Slaney and Logan, and the chroma and beat extractors developed by Ellis (Coversongs project, LabROSA).
System 2 (2011) requires the chord estimation by Ueda (University of Tokyo), the beat and downbeat trackers by Davies (INESC Porto), and the Matlab edit-distance script by Miguel Castro (Matlab Central).
System 3 (2012) requires the Chroma Toolbox by Müller and Ewert (Max-Planck-Institut für Informatik) and the beat and downbeat trackers by Davies (INESC Porto).
Conditions for access and use:
The three systems have been developed at Irisa in Rennes and are the property of Université de Rennes 1, CNRS and Inria. They are currently prototypes provided by IRISA/PANAMA under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license (http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode).
Description:
The three systems produce an estimation of the semiotic structure of the music piece considered, i.e. a description of its macroscopic organization through a set of structural segments labeled according to the similarity of their musical content. They consist of three steps: a feature extraction step, a segmentation step based on feature analysis under a time-regularity constraint, and a labeling step based on hierarchical clustering.
System 1 (2010) uses timbre homogeneity, tonal content repetitions and short sound events for segmentation. Resulting segments are clustered according to their timbre.
System 2 (2011) performs segmentation through chord repetitions. Resulting segments are clustered according to the similarity of their chord sequences.
System 3 (2012) considers an internal model of the structural segments for segmentation. Resulting segments are clustered according to the similarity of their tonal content.
Authors: Gabriel Sargent, Frédéric Bimbot, Emmanuel Vincent
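As an illustration of the labeling step (a generic hierarchical-clustering sketch in Python; the IRISA systems themselves are Matlab prototypes):

```python
# Cluster structural segments by content similarity so repeats share a label.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# One mean feature vector per detected segment (toy A A B A B piece).
segments = np.array([
    [1.0, 0.1, 0.0],   # A
    [0.9, 0.2, 0.1],   # A (repeat, slightly varied)
    [0.1, 1.0, 0.5],   # B
    [1.0, 0.0, 0.1],   # A
    [0.2, 0.9, 0.6],   # B
])

Z = linkage(segments, method="average", metric="cosine")
labels = fcluster(Z, t=0.2, criterion="distance")
print(labels)   # e.g. [1 1 2 1 2]: segments with the same label are repeats
```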
Indexing, Ranking and Retrieval (1)
• SYRIX: Information retrieval system in context - IRIT p106
Information Retrieval System
in Context
Target users and customers
The targeted users and customers are search engine actors, and all industrial players interested in document retrieval.
Partners:
IRIT
Application sectors
• Document retrieval
• Information recommendation
• Advertising
SYRIX: Information retrieval system in context
Contact details:
General issues:
Lynda Tamine
lechani@irit.fr
Technical requirements:
Conditions for access and use:
IRIT/SIG team
118, Route de Narbonne
31062 Toulouse Cedex 09 France
http://www.irit.fr/
PC with Unix/Linux•	
This software requires a front-of search engine,•	
ODP ontology provided by DMOZ editor *
* http://www.dmoz.org/
SyRiX is a software that has been developed at
IRIT-SIG Toulouse and is the property of IRIT.
SyRiX can be supplied under license on a case-by-
case basis. For more information, please contact
Lynda Tamine Lechani at Lynda.Lechani@irit.fr
Mohand Boughanem
bougha@irit.fr
Description:
SyRiX can be used as (1) a contextual
search engine in its own right, or (2) a contextual
document re-ranking component intended to be
plugged into an existing search engine to personalize
the initial ranking using evidence drawn from the user
profile.
Figure 1 gives a general overview of SyRiX's main functionalities.
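As a rough illustration of contextual re-ranking (not SyRiX's actual model), the sketch below mixes the engine's original score with a cosine similarity between each document and a term-weighted user profile; the profile representation and the mixing weight alpha are assumptions for the example.

```python
# Rough sketch of profile-based re-ranking (not SyRiX's actual model):
# the final score mixes the engine's score with document/profile similarity.
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(results, profile, alpha=0.7):
    """results: list of (doc_id, engine_score, term_weights)."""
    rescored = [(doc, alpha * score + (1 - alpha) * cosine(terms, profile))
                for doc, score, terms in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

profile = {"music": 0.9, "retrieval": 0.4}        # hypothetical user profile
results = [("d1", 0.80, {"sports": 1.0}),
           ("d2", 0.75, {"music": 1.0, "retrieval": 0.5})]
print(rerank(results, profile))                   # d2 overtakes d1
```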
IRINTS: Irisa News Topic
Segmenter - Inria p110
SlopPy: Slope One with Privacy -
Inria p116
Sentiment Analysis and Opinion
Mining - Synapse Développement
p112
Persons, Places, Date,
Organizations & Events Recognition
- Synapse Développement p114
Topic segmentation of automatic
speech transcripts
Target users and customers
The targeted users and customers are
multimedia industry actors and any content or
service provider handling speech data.
Partners:
Inria
Application sectors
• Spoken document processing
IRINTS: Irisa News Topic Segmenter
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
Technical requirements:
Conditions for access and use:
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex France
http://www.irisa.fr/
• PC with Unix/Linux OS
• IRINTS requires a C compiler, Perl [1], the libxml2
[2] library, and the TreeTagger [3] software to be
installed on the system
[1] http://www.perl.org/
[2] http://xmlsoft.org/
[3] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
IRINTS is software developed at Irisa in
Rennes and is the property of CNRS (DI 03033-01) and
Inria. Registration with the Agency for Program Protection
(APP) in France is currently in process.
A license can be supplied on request on a case-by-case
basis.
Technical issues:
Sébastien Campion
scampion@irisa.fr
Description:
IRINTS (Irisa News Topic Segmenter) was designed
for topic segmentation of broadcast news transcripts.
The distribution includes a front-end script,
'irints', which is a wrapper around the main
'topic-segmenter' program included herein (topic-
segmenter, release 1.1 [1]).
The topic-segmenter program is software
dedicated to topic segmentation of texts and
(automatic) transcripts, mostly based on lexical
cohesion, implementing (and extending) a method
described in [2].
Several extensions, such as the use of alternate knowledge
sources, were added. For more details (in French), please refer
to [3].
As shown in figure 1 below, the input to IRINTS is an automatic
transcript (in Vecsys's VOX format or IRISA's SSD format). The
output is an XML file in SSD format specifying topic segments.
[1] http://gforge.inria.fr/projects/topic-segmenter/
[2] Masao Utiyama and Hitoshi Isahara, "A Statistical Model for
Domain-Independent Text Segmentation", ACL, 491–498, 2001.
[3] S. Huet, G. Gravier and P. Sébillot, "Un modèle multisources
pour la segmentation en sujets de journaux radiophoniques", in
Proc. Traitement Automatique des Langues Naturelles, 2008.
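For intuition, here is a much-simplified, TextTiling-style sketch of lexical-cohesion segmentation: a boundary is hypothesized where the word distributions of adjacent sentence windows diverge. The real topic-segmenter implements the statistical model of [2] with the extensions of [3]; the window size and threshold below are illustrative only.

```python
# Much-simplified lexical-cohesion segmentation sketch; the actual
# topic-segmenter implements the statistical model of [2].
from collections import Counter
import math

def cosine(c1, c2):
    num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    den = (math.sqrt(sum(v * v for v in c1.values()))
           * math.sqrt(sum(v * v for v in c2.values())))
    return num / den if den else 0.0

def boundaries(sentences, window=2, threshold=0.1):
    """Place a topic boundary where lexical cohesion between adjacent
    windows of sentences drops below a threshold."""
    bounds = []
    for i in range(window, len(sentences) - window):
        left = Counter(w for s in sentences[i - window:i] for w in s.split())
        right = Counter(w for s in sentences[i:i + window] for w in s.split())
        if cosine(left, right) < threshold:
            bounds.append(i)
    return bounds

transcript = ["the match ended two nil", "the striker scored twice",
              "rain is expected tomorrow", "temperatures will drop",
              "markets closed higher today", "stocks rallied in paris"]
print(boundaries(transcript))
```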
IRINTS was developed at Irisa in Rennes by the Texmex and
Metiss teams.
The IRINTS authors are: Guillaume Gravier, Camille
Guinaudeau
This technology synthesizes opinions
at the elementary level of the subsection (e.g. a strongly
negative opinion) and at higher levels
(comment, document, topic) in order to
provide a new entry point into the content.
Target users and customers
Any organization that wants to qualify, follow and
analyze the content it manages or content created on
the Internet.
Partners:
Synapse Développement
Exalead
Technicolor
Yacast
Application sectors
• Monitoring of influence operations
• Fight against disinformation
• Mapping of opinion networks
• E-reputation
• Summary classification of consumer reviews
• Detection of positions on social networks
• Graphical analysis of reviews to highlight
trends and key concepts
• Analysis of consumer insight for better
understanding
Sentiment analysis and Opinion mining
Contact details:
Patrick Séguéla
patrick.seguela@synapse-fr.com
(+33)(0)5.61.63.03.74
Technical requirements:
Conditions for access and use:
Synapse Développement
33, rue Maynard
31000 Toulouse France
http://www.synapse-developpement.fr/
No technical constraints: it can be accessed from
Linux or Windows OS.
An SDK is available for integration into programs or
Web services.
www.synapse-fr.com/sitepro/index.html
Other partner:
Priberam
Lisbon, Portugal
Description:
The rise of social media such as blogs and
social networks has fueled interest in sentiment
analysis. With the proliferation of reviews, ratings,
recommendations and other forms of online
expression, online opinion has turned into a kind of
virtual currency for businesses willing to market their
products, identify new opportunities and manage
their reputations. As businesses seek to automate the
process of filtering out the noise, understanding the
conversations and identifying the relevant content, many
are now looking to the field of sentiment analysis.
By investing in predictive analytics tools and other search
solutions, businesses can gain valuable insights from their data
and better serve the needs of their clients.
This technology synthesizes the opinions at the elementary
level of the subsection (e.g. a strongly negative opinion) and at
higher levels (comment, document, topic) in order to provide a
new entry point into the content.
The opinions are tagged with three pieces of information: 1/ the
polarity; 2/ the intensity; 3/ a semantic category indicating the
degree of involvement of the author.
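The following sketch illustrates, with hypothetical labels and data, how opinions carrying these three tags can be aggregated from elementary snippets up to the document level; the actual tag sets and aggregation rules of the module are not specified here.

```python
# Hypothetical illustration of the three-field opinion tags and their
# aggregation from elementary snippets up to the document level.
from dataclasses import dataclass

@dataclass
class Opinion:
    span: str
    polarity: int      # -1 negative, 0 neutral, +1 positive
    intensity: float   # 0..1
    involvement: str   # e.g. "reported", "personal" (hypothetical categories)

def document_score(opinions):
    """Intensity-weighted average of polarities over a document."""
    total = sum(o.intensity for o in opinions)
    return sum(o.polarity * o.intensity for o in opinions) / total if total else 0.0

doc = [Opinion("the battery is awful", -1, 0.9, "personal"),
       Opinion("the screen is said to be nice", +1, 0.4, "reported")]
print(document_score(doc))   # negative overall opinion
```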
Competitive intelligence always concerns
organizations, people, places, products,
etc. This technology aims at tagging such
information in a text flow.
Target users and customers
• Exploitation of masses of unstructured data
• Anonymization of sensitive data
• Strategic, business or competitive intelligence
Partners:
Synapse Développement
Exalead
Yacast
Application sectors
• Search and indexing in unstructured documents
• Document processing
• Machine reading and rich indexing
• Interconnection between metadata and
unstructured content
• Intelligence, press and social network
monitoring
• Automatic text understanding
• Automatic annotation of content
• Document classification
Persons, Places, Date, Organizations & Events Recognition
Contact details:
Patrick Séguéla
patrick.seguela@synapse-fr.com
(+33)(0)5.61.63.03.74
Technical requirements:
Conditions for access and use:
Synapse Développement
33, rue Maynard
31000 Toulouse France
http://www.synapse-developpement.fr/
No technical constraints: it can be accessed from
Linux or Windows OS.
An SDK is available for integration into programs or
Web services.
For specific conditions of use and to see our demo,
please contact us.
www.synapse-fr.com/sitepro/index.html
Access to our demo website can be granted on request.
Description:
Competitive intelligence always concerns
organizations, people, places, products, etc. This
technology aims at tagging such information in a text flow.
The information automatically annotated includes:
person names, functions, organizations, dates, events,
places, addresses, phone numbers, e-mail addresses
and amounts.
The technology is accurate for all types of texts,
whatever the field. Whether legal or military
documents, journalistic dispatches on terrorist acts or
economic news, it identifies the actors, their
functions and relationships, as well as details of the
events encountered. Users can integrate their own
dictionaries into the technology.
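A toy illustration of the kind of annotation such a module produces is given below; the tag names and output structure are hypothetical, not Synapse's actual format.

```python
# Hypothetical example of named-entity annotations over a text flow;
# the tag names and structure are illustrative only.
text = "On 12 May, Marie Durand, CFO of Acme SA, met investors in Toulouse."

annotations = [
    {"span": "12 May",       "type": "DATE"},
    {"span": "Marie Durand", "type": "PERSON"},
    {"span": "CFO",          "type": "FUNCTION"},
    {"span": "Acme SA",      "type": "ORGANIZATION"},
    {"span": "Toulouse",     "type": "PLACE"},
]

for a in annotations:
    start = text.index(a["span"])          # character offset of the entity
    print(f'{a["type"]:>13}: "{a["span"]}" at offset {start}')
```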
Slope One with Privacy
Target users and customers
The targeted users and customers are all Internet
actors providing personalized services to their users
and interested in integrating recommender systems
that are more respectful of user privacy.
Partners:
Inria
Application sectors
• Personalization
• Recommender systems
SlopPy: Slope One with Privacy
Contact details:
Sébastien Gambs
sgambs@irisa.fr
SlopPy was developed at Irisa/Inria Rennes by Sébastien Gambs
and Julien Lolive of the CIDRE team.
Technical requirements:
Conditions for access and use:
Inria Rennes
Campus Universitaire de Beaulieu
35042 Rennes Cedex France
www.inria.fr
• PC with Java installed
• Access to the Tor anonymous communication
network
• Installation of a library implementing
homomorphic encryption, such as BouncyCastle
• Deployment of a server responsible for creating
and updating the matrices needed for the
recommendation
SlopPy is currently available as a prototype only.
It can be released and supplied under license on a
case-by-case basis.
Description:
SlopPy (for Slope One with Privacy) [1] is both a
privacy-preserving version of the recommendation
algorithm Slope One and a recommendation
architecture built around this algorithm in which a
user never directly releases his personal information
(i.e., his ratings) to a trusted third party. The figure
below illustrates the architecture of the SlopPy
recommender system.
More precisely, in SlopPy each user first perturbs
his data locally (Step 1) by applying a Randomized
Response Technique (RRT) before sending this
information to the entity responsible for storing this information
through an anonymous communication channel (Step 2). This
entity is assumed to be semi-trusted, also sometimes called
honest-but-curious in the sense that it is assumed to follow the
directives of the protocol (i.e., it will not corrupt the perturbed
ratings sent by a user or try to influence the output of the
recommendation algorithm) but nonetheless tries to extract as
much information as it can from the data it receives. Out of the
perturbed ratings, the semi-trusted entity constructs two matrices
(i.e., the deviation matrix and the cardinality matrix) following the
Weighted Slope One algorithm (Step 3). When a user needs a
recommendation on a particular movie, he queries these matrices
through a variant of a private information retrieval scheme (Step
4) hiding the content of his query (i.e., the item he is interested in)
to the semi-trusted entity. By combining the data retrieved (Step
5) with his true ratings (which once again are only stored on his
machine), the user can then locally compute the output of the
recommendation algorithm for this particular item (Step 6).
[1] Sébastien Gambs and Julien Lolive. SlopPy: Slope One with
Privacy. In DPM, September 2012.
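For reference, the sketch below implements the plain (non-private) Weighted Slope One computation underlying Steps 3 and 6: the two matrices built by the semi-trusted entity and the local prediction on the user's side. The privacy layers (RRT perturbation, anonymous channel, private information retrieval) are deliberately omitted.

```python
# Plain Weighted Slope One sketch; SlopPy adds RRT perturbation,
# anonymous upload and private retrieval on top of this computation.
from collections import defaultdict

def build_matrices(ratings):
    """ratings: {user: {item: rating}} -> deviation and cardinality matrices."""
    dev, card = defaultdict(float), defaultdict(int)
    for items in ratings.values():
        for i in items:
            for j in items:
                if i != j:
                    dev[i, j] += items[i] - items[j]   # summed pairwise deviation
                    card[i, j] += 1                    # number of co-raters
    return dev, card

def predict(user_ratings, target, dev, card):
    """Locally predict the rating of `target` from the user's own ratings."""
    num = sum((dev[target, j] / card[target, j] + r) * card[target, j]
              for j, r in user_ratings.items() if card[target, j])
    den = sum(card[target, j] for j in user_ratings if card[target, j])
    return num / den if den else None

ratings = {"u1": {"A": 5, "B": 3, "C": 2},
           "u2": {"A": 3, "B": 4},
           "u3": {"B": 2, "C": 5}}
dev, card = build_matrices(ratings)
print(predict({"A": 4, "C": 3}, "B", dev, card))   # computed on the user side
```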
MoveaTV: Motion Processing
Engine for interactive TV - Movea
p120
Motion processing engine for
Interactive TV
Target users and customers
• Systems integrators
• OEMs
• Service providers
• Application developers wanting to take advantage
of Movea's state-of-the-art motion processing
technology
Partners:
Movea
Application sectors
• Man-machine interactions
• Remote control
• Digital TV
• Video Games
• Peripherals
• Smart Home
MoveaTV: Motion Processing Engine for interactive TV
Contact details:
Marc Attia
m.attia@movea.com
Movea
4 Avenue Doyen Louis Weil
38000 Grenoble France
www.movea.com
Technical requirements:
Programming (Java, C/C++, etc.)
Conditions for access and use:
To be discussed (usually an IP license, i.e. fees & royalties)
Description:
MoveaTV makes it easy to deliver advanced user
interfaces, immersive motion gaming, gesture-based
viewer authentication, and intuitive program guide
navigation.
AACI: Automatic acquisition and
tracking of mobile target in image
sequences - Inria p124
ContentArmor™ Video
Watermarking - Technicolor p130
Hybrid Broadcast Broadband
Synchronization - Technicolor p136
Soccer Event Detection -
Technicolor p142
Audience Characterization -
Technicolor p126
Crowd Sourced Metadata -
Technicolor p132
Movie Chaptering - Technicolor
p138
VidSeg: Video Segmentation - Inria
p144
C-Motion: Camera motion
characterization - Inria p128
Face Detection, Recognition and
Analysis - Karlsruhe Institute of
Technology (KIT) p134
Multimedia Person Identification -
Karlsruhe Institute of Technology
(KIT) p140
Violent Scenes Detection -
Technicolor p146
Automatic acquisition and tracking of
mobile target in image sequences
Target users and customers
The targeted users and customers are the
multimedia industry actors, and all academic or
industrial laboratories interested in object tracking in
videos.
Partners:
Inria
Application sectors
• Target tracking
• Video analysis
• Multimedia document processing
AACI: Automatic acquisition and tracking of mobile target in image sequences
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
Technical requirements:
Conditions for access and use:
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex France
http://www.irisa.fr/
• PC with Unix/Linux
• This software requires the Motion2D [1]
software developed by Inria, and OpenCV [2],
developed by Intel, as third-party libraries
[1] http://www.irisa.fr/vista/Motion2D/index.html
[2] http://opencv.willowgarage.com/wiki/
AACI is software developed at
Irisa/Inria-Rennes and is the property of Inria.
AACI can be supplied under license on a case-by-
case basis.
Technical issues:
Sébastien Campion
scampion@irisa.fr
Description:
AACI is structured in four steps. First of all, dominant
motion estimation is performed using the Motion2D
software. Then each pixel of the current frame is labeled
either as "conforming to the dominant motion" or "not
conforming to the dominant motion" by minimum-cut/
maximum-flow minimization of a cost function,
described in [1].
Each detection is then added to a trellis, and is
validated if it is persistent in size and position for a
short period of time (the trellis depth), or discarded
otherwise. Finally, each validated detection is tracked
using the mean-shift algorithm, as explained in [2].
[1] J.-M. Odobez and P. Bouthemy, Separation of moving
regions from background in an image sequence acquired with a
mobile camera
[2] D. Comaniciu, V. Ramesh, P. Meer, Kernel Based Object
Tracking
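As a toy illustration of the trellis validation step, the sketch below confirms a detection only if detections of similar position and size persist over the trellis depth; the tolerances are invented for the example.

```python
# Toy sketch of the trellis validation step: a detection is confirmed
# only if similar detections persist over `depth` consecutive frames.
def similar(a, b, pos_tol=20, size_tol=0.3):
    """Loose test that two boxes (x, y, w, h) have similar position and size."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = a, b
    return (abs(x1 - x2) < pos_tol and abs(y1 - y2) < pos_tol
            and abs(w1 - w2) < size_tol * w1 and abs(h1 - h2) < size_tol * h1)

def validate(frames, depth=3):
    """frames: per-frame lists of detections; keep a detection from the
    first frame only if a similar one appears in the next depth-1 frames."""
    return [det for det in frames[0]
            if all(any(similar(det, d) for d in f) for f in frames[1:depth])]

frames = [[(10, 10, 40, 40)], [(12, 11, 41, 40)], [(13, 12, 40, 39)], []]
print(validate(frames))   # persists over the trellis depth -> validated
```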
AACI was jointly developed by the Vista team at Irisa/INRIA
Rennes, BERTIN Technologies, and by DGA (Direction
Générale de l’Armement).
The AACI authors are: Florent Dutrech, Patrick Perez
Automatically characterize in-home
audience and level of attention
Target users and customers
All content providers may be interested in the
automatic characterization of the in-home audience.
When personalization of video – either Video on
Demand (VoD) or broadcast – or of ads is targeted,
these same providers will see an interest in having
this module to help automatically personalize the
provided content.
The audience characterization module may also
be used by end users to manage their own content
at home. Furthermore, content providers and
advertisers will be interested in the 'level of attention'
information provided by the module.
Partners:
Technicolor
Application sectors
Provided content personalization.
Knowing who the audience is:
• VoD portals may be personalized and appropriate
home pages may be displayed;
• Ads may be personalized;
• Matching videos and broadcast programs may
be proposed.
Audience Characterization
Contact details:
Louis Chevallier
Louis.chevallier@technicolor.com
Technical requirements:
Conditions for access and use:
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
The current version of the module runs on a QuadCore
PC connected to two webcams.
It is a multi-threaded Windows application
programmed in C++.
Corresponding deliverables are all stated QL – i.e.
this module is only available to a subset of PVAA
partners, on their request.
Related IPL is the property of Technicolor.
Description:
The Audience Characterization module is connected
to a webcam (or two webcams, to enlarge the
angle of the field of view) placed near a TV screen.
This module detects and tracks faces, evaluates the age
class and gender of individuals, detects groups, and
provides timed reports about detected people.
The eye tracking module tracks eyes and evaluates the level of
attention of each detected person. Timed reports are provided.
The module may match detected faces against known ones
from a small database (i.e. family members) to enhance the
personalization of the provided content.
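A minimal sketch of the module's first stage (face detection on a webcam frame) is shown below using an OpenCV Haar cascade; the age/gender/attention estimators of the actual module rely on additional trained models not shown here.

```python
# Minimal webcam face-detection sketch (OpenCV Haar cascade); only the
# first stage of such a module, with no age/gender/attention models.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)                 # the webcam placed near the TV

ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print(f"{len(faces)} face(s) detected:", [tuple(f) for f in faces])
cap.release()
```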
Camera motion characterization
Target users and customers
The targeted users and customers are the
multimedia industry actors, and all academic or
industrial laboratories interested in video analysis.
Partners:
Inria
Application sectors
• Video indexing
• Multimedia document processing
C-Motion: Camera motion characterization
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
Technical requirements:
Conditions for access and use:
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex France
http://www.irisa.fr/
• PC with Unix/Linux or Windows OS
• This software requires the Motion2D [1] software
developed by Inria as a third-party library
[1] http://www.irisa.fr/vista/Motion2D/index.html
C-Motion is software developed at
Irisa/Inria-Rennes and is the property of Inria.
C-Motion can be supplied under license on a case-
by-case basis.
Technical issues:
Sébastien Campion
scampion@irisa.fr
Description:
C-Motion is software dedicated to camera motion
characterization. It relies on the Motion2D library
developed by Inria for 2D parametric motion model
estimation (see the architecture in the figure below).
For each frame in an image sequence or a video,
C-Motion gives the corresponding estimated camera
motion class. These motion classes correspond to the
following situations:
• Static camera
• Pan (right, left, up, down, or a combination: right/up,
right/down, left/up, left/down)
• Zoom/traveling (in or out)
• Complex camera motion
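For intuition, the sketch below maps the translation and divergence terms of a 2D affine motion model (as estimated by, e.g., Motion2D) to these coarse classes; the decision rules and thresholds are simplifications, not C-Motion's actual ones.

```python
# Illustrative decision rules mapping a 2D affine motion model to
# coarse camera-motion classes; thresholds are invented.
def classify(tx, ty, div, eps=0.1):
    """tx, ty: translation terms; div: divergence of the affine model.
    Image coordinates: y grows downward, hence ty < 0 means 'up'."""
    if abs(tx) < eps and abs(ty) < eps and abs(div) < eps:
        return "static"
    if abs(div) >= eps:
        return "zoom in" if div > 0 else "zoom out"
    parts = []
    if abs(tx) >= eps:
        parts.append("pan right" if tx > 0 else "pan left")
    if abs(ty) >= eps:
        parts.append("pan up" if ty < 0 else "pan down")
    return " + ".join(parts) or "complex"

for params in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.5, -1.0, 0.0), (0.0, 0.0, 0.5)]:
    print(params, "->", classify(*params))
```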
C-Motion was jointly developed by the Vista team at Irisa/INRIA
Rennes, and by DGA (Direction Générale de l’Armement).
The C-Motion authors are: Marc Gelgon, Fabien Spindler, Patrick
Bouthemy.
ContentArmor™ Video Watermarking is a
technology intended to deter actors along the
content value chain from leaking content
Target users and customers
• Content owners
• Studios
• Post-production houses
• Content distributors
Partners:
Technicolor
Application sectors
SECURE E-screener provides a traitor tracing mechanism
for all high-risk screeners: internal quality assessment and
validation reviews, promotional tool preparation, and screener
and promotional viewing for distributors.
• Fully automated process within the digital workflow
• Flexible integration of the embedder in existing content
distribution frameworks
• Stronger reputation and controlled liability of the
stakeholders
PREMIUM Video-on-Demand provides a serialization
mechanism in home gateways, enabling fine-grained traitor
tracing for premium content distributed to multiple devices in
the home.
• Wider traitor tracing coverage, enabling early-window
content release
• Ease of technology integration on low-computational-power
devices
• Secure implementation on dependable platforms, e.g.
Conditional Access Systems
ContentArmor™ Video Watermarking
Contact details:
Gwenaël Doërr
gwenael.doerr@technicolor.com
Description:
Technical requirements:
Conditions for access and use:
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
ContentArmor™ Video Watermarking is dedicated to video
encoded using H.264 MPEG-4/AVC with CABAC entropy
coding (Main and High profiles).
The system currently supports the following video containers:
MPEG-2/TS, MOV, MP4, or none (raw bitstream).
The profiler is Linux-based; the embedder is OS independent.
The profiler and the embedder can be licensed as software
executables or libraries; investigative services are currently not
available for licensing.
http://www.technicolor.com/en/
solutions-services/technology/
technology-licensing/content-
armor-secure-digital-content
ContentArmor™ Video Watermarking is a
technology intended to deter actors along the
content value chain from leaking content. To do
so, an invisible forensic watermark is embedded
within the content to uniquely identify the device
or recipient to which it has been delivered.
Technicolor's two-step video watermarking
algorithm (profiler + embedder) operates
directly in the bitstream, resulting in blitz-
fast embedding. The shift of computationally
expensive operations to a preliminary profiling step
enables integration at any point of the distribution
chain, including low-computational-power CE
devices such as set-top boxes, tablets, etc.
2-Step Watermarking System
• Isolation of computationally intensive operations in an offline profiler
• One unique profiling per content item, regardless of the number of
recipients
Bit Stream Watermarking
• No need for re-encoding
• Blitz-fast embedding
• Seamless integration at any point of digital distribution workflows
High-performance technology
• High-fidelity profiles tailored to the type of video content (animation
movies, feature films, sport programs, documentaries, etc.)
• Proven robustness against crude signal processing attacks, including
severe recompression, HDMI stripping, screencasting and
camcording
• Flexible error correction to individually protect the ID bits
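The sketch below conveys the generic two-step idea only: an offline profiler lists positions with interchangeable alternative values, and a lightweight embedder serializes a payload by picking one alternative per bit. It is a conceptual illustration, not ContentArmor's actual profile format or embedding algorithm.

```python
# Conceptual sketch of generic two-step bitstream watermarking:
# the profile (computed offline) lists positions with two
# interchangeable values; per-copy embedding is then trivial.
def embed(bitstream: bytearray, profile, payload_bits):
    """profile: list of (offset, value_for_0, value_for_1)."""
    for (offset, v0, v1), bit in zip(profile, payload_bits):
        bitstream[offset] = v1 if bit else v0   # cheap per-recipient operation
    return bitstream

def extract(bitstream, profile):
    return [1 if bitstream[off] == v1 else 0 for off, v0, v1 in profile]

content = bytearray(b"....video..bitstream....")
profile = [(2, ord("a"), ord("b")), (9, ord("c"), ord("d")), (14, ord("e"), ord("f"))]
device_id = [1, 0, 1]                           # unique per recipient
marked = embed(content, profile, device_id)
print(extract(marked, profile))                 # -> [1, 0, 1]
```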
Automatically tags media content according to
what the crowd says about it on the Web
Target users and customers
• Professional customers:
Content owners
Content providers
Service providers
• Consumers
Partners:
Technicolor
Application sectors
• Content targeting
• Content recommending
• Content retrieving
• Content discovering
• Content browsing
• Content replaying
Web contributors' reviews and comments posted on
dedicated web sites have proven to contain insights
from which valuable descriptive metadata can be
extracted. These descriptive metadata, whether
synchronized with the content timeline or not, support
any of the above usages and services around content.
Furthermore, these associated metadata raise the value
of the related content over time, which is of interest to
content owners as well as to content providers.
Crowd Sourced Metadata
Contact details:
Philippe Schmouker
philippe.schmouker@technicolor.com
Technical requirements:
Conditions for access and use:
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
• The "Content tagging according to crowd
sourced metadata" module analyses big sets of
comments and reviews – i.e. free texts – posted
by contributors on Web sites dedicated to
cinema and TV
• It currently runs as Python modules.
Corresponding deliverables are all stated QL – i.e.
these modules are only available to a subset of
PVAA partners, on their argued request.
Related IPL is the property of Technicolor.
Description:
The “Content tagging according to crowd sourced
metadata” module automatically extracts metadata
from what the crowd says about media content on
the Web. It currently extracts named entities from
subtitles, comments and reviews. It also extracts from
posted comments: quotes of movie dialogs and
of other comments. Furthermore, it characterizes
contributors to forums according to their connections
to other contributors and to their behaviour over time
on these forums.
The characterization of contributors is expected to help in
determining which comments should be analysed first. Indeed,
comments constitute an ever-growing stream of words that
may not always be of great interest.
The dedicated Natural Language Processing and temporal
graph analysis modules have been developed for the specific
purpose of extracting descriptive metadata and, when possible,
synchronizing them with the media timeline. The aim is to enrich
the description of the media content and to increase its value
over time.
Localize and identify faces and
estimate age, gender and emotions
Target users and customers
The targeted users are companies interested in
integrating face analysis into their products.
Partners:
Karlsruhe Institute of Technology (KIT)
Application sectors
• Digital Signage
• User Interfaces / Human-Computer Interaction
• Entertainment
• Safety and Security
• Multimedia Analysis, Search & Retrieval
• Assistive Technologies
Face Detection, Recognition and Analysis
Contact details:
Prof. Dr. Rainer Stiefelhagen
rainer.stiefelhagen@kit.edu
Conditions for access and use:
Karlsruhe Institute of Technology
Institute for Anthropomatics
Vincenz-Priessnitz-Str. 3
76131 Karlsruhe Germany
https://cvhci.anthropomatik.kit.edu/
Available for licensing on a case-by-case basis.
Description:
This technology localizes and recognizes
faces in images and videos. It operates in real time
and is robust across very different source types.
In addition to identification, the age, gender and emotion of a
person can be estimated.
Personalized audio, Multi-view on multi-
screen, Hybrid stereoscopic 3D TV
Target users and customers
• Broadcasters (Sat, Cable, Terrestrial)
• ISPs
Partners:
Technicolor
Application sectors
Personalized audio: the technology offers the user the
possibility of enjoying a broadcast TV program in his/her
favorite language. Additional languages are streamed on
demand from a server and can be rendered either on the
main TV screen or on a personal device (e.g. a smartphone
with headphones).
Multi-view on multi-screen: the user can enrich the
broadcast TV program (e.g. a live music concert or sports
event) by selecting additional points of view rendered on
a second screen, e.g. a tablet.
Hybrid stereoscopic 3D TV: this consists of rendering
3D side-by-side content without monopolizing a
broadcast channel. One view is transmitted over a
broadcast network whilst the other view is delivered over
the Internet. Each view can be rendered independently as
2D content.
Hybrid Broadcast Broadband Synchronization
Contact details:
Anthony Laurent
anthony.laurent@technicolor.com
Technical requirements:
Conditions for access and use:
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
The technology ensures frame-accurate
synchronization of audiovisual components delivered
over different networks with different transport
protocols and transmission delays, each having its
own reference clock.
The principle is to insert a timeline component into the
broadcast service. This component is linked to the
current content and embeds timing information
indicating the time elapsed since the beginning. Its
format is based on the existing DVB specification ETSI TS
102 823, "Specification for the carriage of synchronized
auxiliary data in DVB transport streams".
The technology is currently available only as a prototype
based on GStreamer.
Related IPL is the property of Technicolor.
Description:
Broadcasters and IPTV service providers would
like to propose new value-added services but are
confronted with bandwidth limitations. Hybrid
broadcast broadband synchronization technology
helps them overcome this constraint.
It allows broadcast/IPTV service content to be enriched with
additional audiovisual components delivered over broadband
or stored locally, while ensuring very accurate rendering
synchronization.
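A toy illustration of the synchronization principle follows: the broadcast timeline component gives the elapsed content time, and the receiver buffers each broadband frame just long enough to render it at the same content time. All numbers are invented.

```python
# Toy illustration of the principle: align a broadband frame on the
# elapsed content time carried by the broadcast timeline component.
def broadband_delay(timeline_elapsed_ms, broadband_pts_ms, network_latency_ms):
    """How long to buffer a broadband frame so that it is rendered
    at the same content time as the broadcast component."""
    return (broadband_pts_ms - timeline_elapsed_ms) - network_latency_ms

# broadcast says we are 90 000 ms into the program; the next broadband
# frame is stamped 90 040 ms and arrived with 25 ms network latency
print(broadband_delay(90_000, 90_040, 25), "ms to wait before rendering")
```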
Automatic temporal
segmentation of video
Target users and customers
• Content providers
• ISPs (Internet Service Providers)
• Video editing software companies
Partners:
Technicolor
Application sectors
• Video structuring
• Video archiving
Movie Chaptering
Contact details:
Hassane Guermoud
Hassane.Guermoud@technicolor.com
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
Technical requirements:
The current version of the module runs on a QuadCore PC. It is
a Linux and Windows application programmed in C++.
Conditions for access and use:
This module is only available to a subset of PVAA
partners, on their request. Related IPL is the property of
Technicolor.
Description:
The movie chaptering is an unsupervised processing
that segments a movie in chapters. The rules to
obtain chapters have to respect the unity of time,
place and action.
The movie chaptering module can be embedded in a set up
box and executed offline to segment a movie recorded by the
user. Thus, browsing through chapters to reach the sequence
that the user is looking for becomes easier to achieve.
Identifying actors in movies
and TV series
Target users and customers
• Multimedia Content Providers
• Movie/TV Streaming Providers
• Movie/TV Industry Actors
Partners:
Karlsruhe Institute of Technology (KIT)
Application sectors
• Movie/TV Streaming & Playback
• Second Screen
Multimedia Person Identification
Contact details:
Prof. Dr. Rainer Stiefelhagen
rainer.stiefelhagen@kit.edu
Conditions for access and use:
Karlsruhe Institute of Technology
Institute for Anthropomatics
Vincenz-Priessnitz-Str. 3
76131 Karlsruhe Germany
https://cvhci.anthropomatik.kit.edu/
Available for licensing on a case-by-case basis.
Description:
This technology identifies actors/characters
in multimedia data such as movies and TV series. It
first tracks faces/persons and subsequently provides
identities for each track.
As such, it can be used to provide additional information about
actors/characters while viewing the video.
Automatically detects events/actions of
interest in a soccer match
Target users and customers
All content providers may be interested in the soccer
event detection technology to efficiently manage their
content databases. When repurposing applications are
targeted, these same providers will see an interest in
having this module to help automatically detect events of
interest.
The soccer event detection module may also be used by
end users to manage their own content at home.
Partners:
Technicolor
Application sectors
• Content structuring & browsing: Knowing where
actions of interest are in a soccer match allows
structuring of soccer matches. It also gives direct
access to events, thus enabling non-linear browsing
of the document.
• Content retrieval: By adding additional metadata to
content (i.e. timecodes of events), the module finds a
direct application in the context of content retrieval in
video databases.
• Content repurposing: Knowing where the events
are in a soccer match makes it possible to repurpose
this content for other broadcasting channels such as
the Internet or portable devices. In the latter case,
building a summary from the detected events is of
interest for sending this repurposed content to mobile
phones.
Soccer Event Detection
Contact details:
Claire-Helene Demarty
claire-helene.demarty@technicolor.com
Technical requirements:
Conditions for access and use:
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
• The Soccer Event Detection module takes a set of
precomputed features from a video stream as input.
• It currently runs under Matlab.
• The module is currently available as a Matlab
prototype only.
• Corresponding deliverables are all stated QI – i.e.
this module is only available to Quaero partners,
on their request. Related IPL is the property of
Technicolor.
Description:
The soccer event detection module is a system that
automatically detects events/actions of interest in soccer
matches. Provided with a set of pre-computed
features from a video stream, the system outputs a
file with the corresponding event timecodes.
The chosen pre-computed features have been shown
to be discriminative for the task of soccer event
detection. They are used as input to a
classification sub-system based on Bayesian networks.
This sub-system was trained and its parameters
learned offline on a database of soccer matches.
Contrary to what exists in the literature, the Bayesian model
structure was also learned automatically, without using any
expert knowledge. The structure learning property makes it
possible to automatically train another model on another type
of video to detect other types of events, without manually
building the structure of the model.
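As a drastically simplified stand-in for the Bayesian-network classifier, the sketch below trains a naive Bayes model on discretized features and classifies a new observation; the feature names and data are invented, and the actual module additionally learns the network structure.

```python
# Drastically simplified stand-in: naive Bayes over discretized features;
# the actual module learns a full Bayesian network structure.
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (feature_tuple, label)."""
    prior = Counter(lbl for _, lbl in samples)
    cond = defaultdict(Counter)          # (feature_index, label) -> value counts
    for feats, lbl in samples:
        for i, v in enumerate(feats):
            cond[i, lbl][v] += 1
    return prior, cond

def predict(feats, prior, cond):
    def score(lbl):
        p = prior[lbl] / sum(prior.values())
        for i, v in enumerate(feats):
            p *= (cond[i, lbl][v] + 1) / (sum(cond[i, lbl].values()) + 2)  # Laplace
        return p
    return max(prior, key=score)

# invented features: (crowd_cheer, whistle, replay), each discretized to 0/1
data = [((1, 1, 1), "goal"), ((0, 1, 0), "foul"),
        ((1, 0, 1), "goal"), ((0, 0, 0), "none")]
prior, cond = train(data)
print(predict((1, 1, 1), prior, cond))   # -> "goal"
```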
Video segmentation: detection of
cuts, dissolves, monochrome frames,
silences and aspect ratio changes
(4/3 & 16/9)
Target users and customers
The targeted users and customers are the
multimedia industry actors, and all academic or
industrial laboratories interested in video analysis.
Partners:
Inria
Application sectors
• Multimedia document processing
• Video indexing
VidSeg: Video Segmentation
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
Conditions for access and use:
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex France
http://www.irisa.fr/
VidSeg is software developed at Irisa/Inria-
Rennes and is the property of Inria. It is registered at the
Agency for Program Protection (APP) in France under the
reference:
n°IDDN.FR.001.250009.000.S.P.2009.000.40000
VidSeg can be supplied under license on a case-by-case basis.
Technical requirements:
• PC with Unix/Linux OS
• This software requires the FFMPEG [1] software to be
installed on the system, as a third-party library.
[1] http://ffmpeg.org/
Technical issues:
Sébastien Campion
scampion@irisa.fr
Description:
VidSeg is a software tool dedicated to video
segmentation. It detects cuts and dissolve transitions
in a video, along with additional information:
monochrome frames, silences and aspect ratio
changes (from 4/3 to 16/9 or vice versa).
The VidSeg software relies on the FFMPEG libraries
for video decoding. Results are output as an XML
file, as shown in figure 1 below.
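For intuition, a minimal cut-detection sketch follows: a cut candidate is declared when the grey-level histogram distance between consecutive decoded frames spikes. The metric and threshold are illustrative; VidSeg's actual detectors (including dissolves, monochrome frames, silences and aspect-ratio changes) are more elaborate.

```python
# Minimal shot-cut detection sketch based on per-frame histogram
# distance; threshold and metric are illustrative only.
import cv2

cap = cv2.VideoCapture("input.mp4")      # any FFMPEG-decodable file
prev_hist, frame_idx, cuts = None, 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    if prev_hist is not None:
        if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > 0.4:
            cuts.append(frame_idx)       # large histogram jump -> cut candidate
    prev_hist, frame_idx = hist, frame_idx + 1
cap.release()
print("cut candidates at frames:", cuts)
```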
VidSeg was developed at Irisa/INRIA Rennes by the Texmex
team.
The VidSeg authors are: Manolis Delakis, Sébastien Campion.
Violent Scenes Detection –
automatically detects violent scenes
in movies
Target users and customers
All content providers may be interested in the violent
scenes detection technology to efficiently manage
their content databases. In particular, the violent
scenes detection technology is an interesting feature
for VOD services, as it may help users select a movie
suitable for the entire family.
Partners:
Technicolor
Inria
Application sectors
• Content structuring & browsing: Knowing where
the most violent scenes are in a movie gives
direct access to these events, thus enabling non-
linear browsing of the document.
• Content retrieval: By adding additional metadata
to content (i.e. timecodes of violent scenes),
the module finds a direct application in the
context of content retrieval in video databases.
Violent Scenes Detection
Contact details:
Claire-Helene Demarty
claire-helene.demarty@technicolor.com
Technical requirements:
Conditions for access and use:
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
• The Violent Scenes Detection module takes a set of
pre-computed features from a video stream as input.
• It currently runs under Matlab.
• The module is currently available as a Matlab prototype only.
• Corresponding deliverables are all stated QI – i.e. this
module is only available to Quaero partners, on their
request. Related IPL is the property of Technicolor.
Description:
The Violent Scenes Detection module is a system
which automatically detects the most violent scenes
in a movie. Provided with a set of pre-computed
audio and video features from a video stream, the
system outputs a file with the corresponding
violent-scene timecodes. The chosen pre-computed
features have been shown to be discriminative for the
task of violent scenes detection. They are used as
input to a classification sub-system based on
Bayesian networks. This sub-system was trained and
its parameters learned offline on a movie database.
The Bayesian model structure was also learned automatically,
without using any expert knowledge. The structure learning
property makes it possible to automatically train another model
on another type of video to detect other types of events,
without manually building the structure of the model.
Application Demonstrators
Chromatik - p148
MECA: Multimedia Enterprise CApture - p150
MediaCentric® - p152
MediaSpeech® product line - p154
MobileSpeech - p156
MuMa: The Music Mashup - p158
OMTP: Online Multimedia Translation Platform - p160
Personalized and social TV - p162
PlateusNet - p164
SYSTRANLinks - p166
Voxalead Débat Public - p168
Voxalead multimedia search engine - p170
VoxSigma SaaS - p172
Image search based on color content
Application sectors
Catalogues, e-commerce, web design, stock
photography management, etc.
Target users and customers
Any organization possessing masses of images
may be interested in indexing them by their
color content. This makes it possible to search among
images which have not been properly described with
textual metadata.
Partners:
Exalead
Chromatik
Contact details:
Rémi Landais
Remi.landais@3ds.com
+33 (0)1 55 35 26 26
Description
Technical requirements:
Conditions for access and use:
Exalead SA
10 place de la Madeleine
75008 Paris France
http://www.exalead.com/search/image/
http://chromatik.labs.exalead.com/
Within Chromatik, images may be searched by
text or by color, by clicking on one or several color
squares, adjusting the proportion of each selected
color and/or selecting a color category (Bright vs
Dark, Colorful vs Greyscale).
The images must already be scanned and available
in electronic form, in any format. Their indexing relies
on the Exalead CloudView™ product and on the Exalead
Chromatik indexing service.
Commercially available through Exalead:
http://www.3ds.com/products/exalead/.
Contact: contact@exalead.com
As the indexed pictures are obtained from the FlickR API,
Chromatik takes advantage of the available metadata (location,
owner id, tags, license) to improve browsing (allowing images
to be searched by geographical proximity, etc.).
The user may also search for images by similarity, by selecting
an image from the index or by uploading an image or specifying
a URL.
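The sketch below illustrates the general idea of color-content indexing (a coarse RGB histogram per image, queried by nearest signature); it mimics the concept only and is unrelated to Exalead's actual CloudView index. The file names are hypothetical.

```python
# Illustrative color-content indexing sketch: each image is reduced to
# a coarse RGB histogram and queried by nearest signature.
import numpy as np
from PIL import Image

def color_signature(path, bins=4):
    """Coarse, normalized 3D RGB histogram of a downscaled image."""
    img = np.asarray(Image.open(path).convert("RGB").resize((64, 64)))
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return (hist / hist.sum()).flatten()

def search(index, query_sig, top=5):
    """index: {image_path: signature}; returns the closest images."""
    dists = {p: float(np.abs(s - query_sig).sum()) for p, s in index.items()}
    return sorted(dists, key=dists.get)[:top]

index = {p: color_signature(p) for p in ["a.jpg", "b.jpg"]}  # hypothetical files
print(search(index, color_signature("query.jpg")))
```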
Unified capture solution for incoming
multimedia information
Application sectors
• Banks and insurance companies
• Public companies
• Large companies with heterogeneous incoming
flows (postal mail, email, voice server, etc.)
Target users and customers
Large companies and organizations that face an
increasing amount of incoming data and
a diversification of information. The ongoing
evolution and convergence of Information
Technology, Telecommunications and Media
are generating additional flows of unstructured
multimedia information such as voice messages,
pictures and video sequences.
Partners:
Itesoft / A2iA / Vecsys / Vocapia / LTU technologies
MECA: Multimedia Enterprise CApture
Contact details:
Vincent Ehrström
veh@itesoft.com
Phone: +33 (0)4 66 35 77 00
Fax: +33 (0)4 66 35 77 01
Description
Technical requirements:
Conditions for access and use:
ITESOFT
Parc d’Andron / Le Séquoia
30470 Aimargues France
www.itesoft.com
The Multimedia Enterprise Capture platform
processes high volumes of incoming multimedia
information. It is able to capture the information
automatically, then analyze, classify and index it
independently of the media. It offers a unified capture
solution to large companies and organizations.
• Automatic capture system processing multimedia
content
• Textual document extraction (machine-printed,
handwritten)
• Multimedia content extraction (voice, picture)
• Automatic document classification
• Automatic document indexing
• Manual validation interface
• Export to CMS
Demo available: contact ITESOFT, Parc d'Andron, Le Séquoia,
30470 Aimargues.
+33 (0)4 66 35 77 00
Advanced solution for multimedia and
multilingual content processing
Application sectors
• Market issues: press clippings, monitoring
reports, media analysis on specific interests
(strategic intelligence, e-reputation, image studies,
etc.)
• Governmental issues: Open Source Intelligence
on Defence or Homeland Security topics
Target users and customers
• Media Intelligence Companies
• Governmental Intelligence Agencies
Partners:
Bertin Technologies / SYSTRAN / Vecsys
MediaCentric®
Contact details:
Nabil Bouzerna, Nicolas Masson
mediacentric@bertin.fr
Description
Technical requirements:
Conditions for access and use:
Bertin Technologies
Parc d’Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
www.bertin.fr
MediaCentric® is an advanced multimedia (video,
audio, image and text) and multilingual content
processing solution capable of processing, in a short
time, massive amounts of data from multiple sources,
i.e. satellite and terrestrial broadcast TV/radio, Web
TV, podcasts, UGC (User-Generated Content) and
social media (Twitter, Facebook, IRC, etc.).
The platform powers the whole process, from
acquisition, monitoring, exploration and analysis to
the dissemination of the critical pieces of information.
Acquisition devices and hardware depend on the targets
and amounts of data to be processed (from a PC to a
high-performance cluster).
All information is available from Bertin Technologies.
Through the combined use of video, speech and image analysis
technologies (face recognition, text extraction by OCR, etc.),
text mining and translation, MediaCentric® makes the most of
the richness conveyed by today's media. Moreover, it offers
a user-friendly interface, designed for promptness, to increase
operator efficiency.
Solutions for speech processing
Application sectors
Media monitoring; Media Asset Management;
Audio archives indexing
Speech analytics in call centers; Security/Intelligence;
etc.
Target users and customers
• Audio content managers
• Producers
• Editors
• Transcribers
• Researchers
• Monitors
• Analysts
Partners:
Vecsys / Bertin Technologies / Exalead / Itesoft /
Orange / Yacast
MediaSpeech® product line
Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Description
Technical requirements:
Standard Web access
Conditions for access and use:
On-premises or in-the-cloud (SaaS)
Quotation on request
Vecsys
Parc d’Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
http://www.vecsys.fr
Vecsys has developed a family of highly efficient
platforms for speech processing, typically offering
such services as:
• spoken data extraction and partitioning
• speaker identification / speaker tracking
• automatic speech transcription
• speech and text synchronisation
The available MediaSpeech® solutions are the following:
MediaSpeech® Factory: a high-availability distributed
speech processing system (24/7), with a redundant
cluster, designed and optimized to process huge
volumes of audio data, with a high-efficiency process scheduler
to handle the processing queue and load balancing.
MediaSpeech® Lite: a cost-effective solution, without
redundancy, for deployment on a PC or standard server.
MediaSpeech® VM: a virtual machine solution.
MediaSpeech® SaaS: a set of hosted Web services using the full
capabilities of MediaSpeech® Factory.
These MediaSpeech® solutions are all accessible through the
same Web service communication interfaces (SOAP, REST,
Web pages, Web content…) and share the same API.
MobileSpeech

A voice-activated command technology

Application sectors
• Connected TV, connected house
• E-Business, E-Health, Logistics, Avionics, Automotive
Target users and customers
• Smartphone and tablet owners; mobile people; hand-busy people
• Disabled or aged persons
• Enterprises with mobile applications
• Mobile application developers
Partners:
Vecsys
Technicolor
Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr

Vecsys
Parc d’Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
http://www.vecsys.fr

Description:
MediaSpeech® Mobile is a standalone Automatic Speech Recognition solution for smartphones and tablets. The speech recognition engine is optimized to reduce memory usage and processing requirements. It works in file mode or streaming mode, with a vocabulary of up to several thousand words. It processes constrained language (with standard W3C Java Speech format grammars) and natural language (with statistical models).

Vecsys offers tools, APIs and professional services to assist partners and customers in delivering successful applications. Such assistance includes:
• adaptation of pronunciation dictionaries; phonetic or acoustic adaptation
• grammar development; statistical language model development and adaptation
• meeting target performance and accuracy
• validation and advice during specification and integration phases

Technical requirements:
Mobile OS (Android, iOS) or embedded OS.

Conditions for access and use:
Android/iPhone/embedded-OS package; quotation on request.
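As an illustration of the constrained-language mode, the sketch below embeds a toy grammar in the W3C JSpeech Grammar Format (JSGF) in a small Python helper. The grammar rules, device names and packaging step are invented for this example; the actual Vecsys SDK workflow may differ.

# A toy connected-house command grammar in JSGF, the W3C Java Speech
# grammar format accepted by the constrained-language mode. All rule
# and device names are invented for illustration.
HOUSE_JSGF = """\
#JSGF V1.0;
grammar house;
public <command> = <action> <device>;
<action> = turn on | turn off | dim;
<device> = the lights | the tv | the heating;
"""

def save_grammar(path: str = "house.gram") -> None:
    """Write the grammar file an application would ship to the recognizer."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(HOUSE_JSGF)

if __name__ == "__main__":
    save_grammar()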
MuMa: The Music Mashup

An innovative search engine dedicated to music

Application sectors
Music editors, distributors or any organization needing a unified view of its business items (artists, in the cloudmine context) based on different sources.

Target users and customers
Cloudmine may concern anybody interested in music. The general concept of gathering and integrating information from multiple web sources may also be extended to any other domain, and would then concern anybody using a search engine!
Partners:
Exalead
IRCAM
Télécom ParisTech
Contact details:
Jean-Marc Finsterwald
jean-marc.finsterwald@3ds.com
+33 (0)1 55 35 26 26

Exalead SA
10 place de la Madeleine
75008 Paris France
http://music.labs.exalead.com/

Description:
MuMa, the Music Mashup, is an innovative search engine dedicated to music. It collects songs and information about them (artists, titles, albums, lyrics, concerts, tweets, pictures, biographies, prices, etc.) from reference sources on the Web and displays them in a unique mashup. Thanks to the collaboration with Ircam and Télécom ParisTech, MuMa also analyzes the content of the songs themselves, allowing the user to search through music by chords, moods, genres, or type of drums/guitar. The user may also browse MuMa content using search by similarity. Last but not least, the user may play the query on a real keyboard.

Technical requirements:
For each domain, reference sources must be clearly identified. For the indexing, cloudmine relies on the Exalead CloudView™ product.

Conditions for access and use:
Commercially available through Exalead: http://www.3ds.com/products/exalead/. Contact: contact@exalead.com
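To make "search by similarity" concrete, here is a minimal sketch that ranks songs by the cosine similarity of their feature vectors. It assumes feature extraction (chords, mood, timbre, etc.) has already been done upstream, as in the Ircam analysis; the catalog data and values below are invented.

# Toy content-based similarity search over audio feature vectors.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Invented feature vectors, e.g. (energy, acousticness, tempo).
catalog = {
    "song_a": [0.8, 0.1, 0.3],
    "song_b": [0.7, 0.2, 0.4],
    "song_c": [0.1, 0.9, 0.2],
}

def most_similar(query: list[float], k: int = 2) -> list[str]:
    """Return the k catalog songs closest to the query vector."""
    return sorted(catalog, key=lambda s: -cosine(query, catalog[s]))[:k]

if __name__ == "__main__":
    print(most_similar([0.75, 0.15, 0.35]))  # -> ['song_a', 'song_b']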
OMTP: Online Multimedia Translation Platform

A new generation of machine translation services for translating multimedia documents
Application sectors
OMTP can be used for media monitoring, business
intelligence or public security applications, or in
any other context where multimodal, multilingual
data have to be processed (e.g. language service
providers).
Target users and customers
Public and commercial actors in need of processing
multilingual and multimodal documents (web pages,
audio/video podcasts, OCRed documents)
Partners:
SYSTRAN / A2iA / Bertin Technologies / Exalead /
Inria / LIMSI-CNRS / RWTH Aachen University /
Vocapia
Contact details:
Jean Senellart
Director of Research and Development
senellart@systran.fr
Tel: +33 (0)1 44 82 49 49

SYSTRAN
5 rue Feydeau
75002 Paris France
www.systransoft.com

Description:
The online platform provides access to high-quality, real-time multimedia translation services, enabling the translation of web content, audio/video podcasts and graphical documents. It also enables users to create new, customized ('hyper-specialized') translation models by training on resources created by automated data acquisition procedures.

Technical requirements:
• Online service with no specific technical requirement
• API enabling third-party applications to integrate translation services

Conditions for access and use:
Demo available upon demand. Please contact Jean Senellart (details above) for any further inquiries.
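As a sketch of the third-party integration the API bullet above refers to, here is a minimal client; the endpoint, field names and response shape are assumptions made for illustration, not SYSTRAN's published interface.

# Hypothetical client for an OMTP-style translation route.
import requests

def translate(text: str, source: str = "fr", target: str = "en") -> str:
    """Send one text snippet for translation and return the result."""
    resp = requests.post(
        "https://omtp.example.com/translate",  # assumed endpoint
        json={"text": text, "source": source, "target": target},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["translation"]          # assumed response field

if __name__ == "__main__":
    print(translate("Bonjour le monde"))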
Personalized and social TV

Easy search and innovative discovery of content through a unique user experience combining TV, gyroscopic remote control, tablet and smartphone.
Application sectors
Media and Entertainment
Target users and customers
• Network Service Operators
• Internet Service Providers
Partners:
Technicolor / Bertin Technologies / Exalead / Institut
National de l’Audiovisuel / Inria / Karlsruhe Institute
of Technology (KIT) / LTU technologies / Movea /
Télécom ParisTech / Vecsys / Yacast
Contact details:
Nathalie Cabel
Nathalie.cabel@technicolor.com

Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs / CS 176 16
35576 Cesson-Sévigné France
http://www.technicolor.com

Description:
The solution offers end-users a personalized experience for accessing content on TV in combination with a tablet or smartphone. End-users can easily and enjoyably explore large volumes of content, and can display additional information about the content being watched on a second screen. The system integrates several technologies from different domains: content metadata enrichment, search engine, recommendation engine, semantic analysis, privacy protection, multi-device synchronization techniques, and gesture, voice, face and picture recognition.

The services leverage the latest innovations in user interaction, such as facial recognition, gesture, voice and second-screen devices. Thanks to efficient search and recommendation engines and a connection to social networks, the end-user benefits from personalized services and can share his media experience with his social community.

Technical requirements:
The TV platform is hosted on the network and manages the Video On Demand and TV guide catalogs, the end-user accounts and their consumption. Thanks to a web-based architecture, end-users access the portals on TV, tablet or smartphone.
PlateusNet

A remote testing platform for HMI usability

Application sectors
• Information system design
• Complex system design
• Intervention in user-centred processes for the assessment of new products and services, such as: professional and general-public software, products, interactive TV, cockpits, embedded and mobile products

Target users and customers
Web agencies, banks and insurance companies, telecom companies, complex systems (defense and civil), product research and development.
Partners:
Bertin Technologies
Contact details:
Marie Vian
vian@bertin.fr

Bertin Technologies
Parc d’Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
www.bertin.fr

Description:
PlateusNet allows users to capture product or service HMIs and to create questionnaires for remote usability tests. A simulated version of the HMI is presented to many end users (worldwide if required) so as to record all of their interactions with it. The results of all interactions are automatically collected on a database server. PlateusNet interprets them through statistical analysis under the supervision of a usability expert. The aim of these analyses is to propose design recommendations that enhance the usability and efficiency of the product or service HMI at each stage of the design process.

Technical requirements:
All products or services can be assessed with PlateusNet at every step of the design process (mock-up, prototype or completed product); no other technical requirement is needed. The end-user-side application needs Windows, the Java platform and an Internet connection.

Conditions for access and use:
All information available from Bertin Technologies.
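To give an idea of the data such remote tests yield, below is a toy sketch of interaction records and one aggregate a usability expert might start from. The field names, values and metrics are invented; PlateusNet's actual data model is not documented here.

# Toy model of remotely collected HMI interaction records and a
# first-pass statistical summary per task. Everything here is invented
# for illustration.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Interaction:
    user_id: str
    task: str
    clicks: int
    seconds: float
    completed: bool

log = [
    Interaction("u1", "checkout", clicks=12, seconds=48.0, completed=True),
    Interaction("u2", "checkout", clicks=25, seconds=95.5, completed=False),
    Interaction("u3", "checkout", clicks=10, seconds=41.2, completed=True),
]

def task_summary(task: str) -> dict:
    """Aggregate completion rate and effort measures for one task."""
    rows = [r for r in log if r.task == task]
    return {
        "completion_rate": sum(r.completed for r in rows) / len(rows),
        "mean_clicks": mean(r.clicks for r in rows),
        "mean_seconds": mean(r.seconds for r in rows),
    }

if __name__ == "__main__":
    print(task_summary("checkout"))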
SYSTRANLinks

Cloud-based collaborative service for website localization

Application sectors
• Corporate websites (whatever the sector)
• eCommerce websites
• Blogs and individual websites
Target users and customers
Users: Webmasters, professional translators,
Marketing/PR professionals, Small Business owners
Customers: international business leaders, SMEs with international ambitions, digital agencies, language service providers, eCommerce enterprises, start-ups, bloggers, individuals or associations, tourism industry actors (restaurants, hotels…) – in general, any business or organization that needs to increase its international web exposure.
Partners:
SYSTRAN
Contact details:
Jean Senellart
Director of Research and Development
senellart@systran.fr
Tel: +33 (0)1 44 82 49 49

SYSTRAN
5 rue Feydeau
75002 Paris France
www.systranlinks.com

Description:
An online service making website translation faster, easier and more cost-effective than classical solutions. It consists of an innovative, collaborative and reliable online CMS platform that enables users to launch and manage localization projects.

Technical requirements:
Accessing the online service requires only an Internet connection and a browser; the service can be used to translate any website, whatever the technology behind it. No technical skill is required.

Conditions for access and use:
• A SYSTRAN account has to be created at subscription
• A free version is offered, fully featured and suited for websites with low traffic and little content to review
• Three payment schemes for corporate or professional websites with either higher traffic or larger content to be reviewed and edited
Voxalead Débat Public

A unique application to explore public debates

Application sectors
• Public institutions
• Parliaments (in any country)
• Political journalists
• Sociologists

Target users and customers
This application can be used by political journalists, sociologists, students and, beyond that, any citizen.
Partners:
Exalead
Vecsys
Contact details:
Julien Law-To
Julien.lawto@3ds.com
+33 (0)1 55 35 26 26

Exalead SA
10 place de la Madeleine
75008 Paris France
http://politics.labs.exalead.com/

Description:
Voting is the privilege of democratic citizens, but exercising it may be difficult: making good choices requires analyzing candidates' positions across all important domains. While politicians have become more and more accessible by participating in media forums like talk shows, understanding their work still requires some effort.

We want to make all the open data of our public institutions available and easy to browse by any end user, with simple tools, through an innovative interface. What is said during public debates is available as open data. Using CloudView, we are able to analyze, enrich and index these debates. Users can search across all the debates by keyword or by specific, automatically extracted topics. It is also possible to focus on a particular political figure. When browsing a debate, if the video is available, it can be watched synchronized with the text, thanks to Vecsys.

Technical requirements:
A manual transcription must be available. When audio or video recordings are available, they can be aligned with the transcription.

Conditions for access and use:
Commercially available through Exalead: http://www.exalead.com/software/company/contact/. Exalead also works closely with leading information management specialists like Capgemini, EADS, Logica, TERMINALFOUR, Digirati, and Knowledge Concepts to provide enterprise search solutions tailored to your unique needs. See http://www.exalead.com/software/partners/channel/
Voxalead multimedia search engine

Audio content-based video retrieval

Application sectors
• Video search engines
• Information retrieval, including videos
• E-learning
• Education
• Defense and homeland security
Target users and customers
Any organization possessing masses of video and audio content can provide its users with access to this content through this technology. Making media content automatically searchable and browsable, with the performance of a web search engine (robustness and scalability), provides a new experience to customers.
Partners:
Exalead
Vocapia
Contact details:
Julien Law-To
Julien.lawto@3ds.com
+33 (0)1 55 35 26 26

Exalead SA
10 place de la Madeleine
75008 Paris France
http://politics.labs.exalead.com/

Description:
The audio part of the media is transcribed by a Vocapia component. The transcription is then analyzed, enriched and indexed by CloudView. The Voxalead demonstrator is composed of a result page and a play page that plays the content interactively. The search can be done in different languages (English, French, German, Dutch, Spanish, Italian, Arabic and Chinese).

Technical requirements:
The videos must be in electronic form. The better the quality of the audio, the better the transcription.

Conditions for access and use:
Commercially available through Exalead: http://www.exalead.com/software/company/contact/. Exalead also works closely with leading information management specialists like Capgemini, EADS, Logica, TERMINALFOUR, Digirati, and Knowledge Concepts to provide enterprise search solutions tailored to your unique needs. See http://www.exalead.com/software/partners/channel/
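A minimal sketch of the mechanism behind the play page: keyword hits in a time-coded transcript map directly to playback positions, so clicking a result jumps the video to the matching moment. The segment data and function names below are invented; the real Vocapia transcription format and CloudView index differ.

# Toy time-coded transcript search: each segment carries start/end
# times, so a keyword hit yields a playback position.
segments = [
    (0.0, 4.2, "good evening and welcome to the debate"),
    (4.2, 9.8, "tonight we discuss the energy transition"),
    (9.8, 15.1, "my opponent voted against the budget"),
]

def find_hits(query: str) -> list[tuple[float, str]]:
    """Return (start_time, text) for every segment containing the query."""
    q = query.lower()
    return [(start, text) for start, _end, text in segments if q in text.lower()]

if __name__ == "__main__":
    for start, text in find_hits("budget"):
        print(f"jump player to {start:.1f}s: {text!r}")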
VoxSigma SaaS

Multilingual audio indexing, teleconference transcription, telephone speech analytics, transcription of speeches, subtitling

Application sectors
• Multilingual audio indexing: the VoxSigma software suite offers advanced language technologies to transform raw audio data into structured and searchable XML documents. It includes adaptive features allowing the transcription of noisy speech.
• Teleconference transcription: Vocapia's speech-to-text technology significantly reduces the cost of transcribing business conference calls (such as quarterly reports).
• Telephone speech analytics: Vocapia's software processes telephone data, making recorded calls searchable and analyzable via text-based methods for call management companies.
• Transcription of speeches: VoxSigma is used by several governmental organizations to provide easy access to audio and/or video content via time-coded searchable XML documents.
• Subtitling: while fully automatic processing generally does not deliver subtitles of high enough quality, Vocapia's technologies reduce the effort entailed when closely integrated into the subtitle creation process.
Target users and customers
The targeted users and customers of the VoxSigma SaaS
are actors in the multimedia and call center sectors, including
academic and industrial organizations, interested in the
processing of audio documents.
Partners:
Vocapia
LIMSI-CNRS
Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14

Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay France
www.vocapia.com

Description:
Vocapia has developed a SaaS offer for the VoxSigma software suite, complementary to classical licensing, allowing customers to quickly reap the benefits of regular improvements to the technology. They can also take advantage of additional features offered by the online environment, such as high computing power for handling irregular processing needs.

The VoxSigma SaaS offers three main processing functions: identification of the language spoken in an audio document, conversion of recorded speech to text (speech-to-text transcription), and synchronization of a transcription with the speech signal (also called speech-text alignment).

It handles content in many European languages as well as Mandarin and Arabic. Language identification (LID) systems are available for broadcast data (15 languages currently available) and conversational data (50 languages). New languages can easily be added to the system. The system reports the language of the audio document along with a confidence score. In the current version, each channel of an audio document is assumed to be in a single language; future versions are planned to allow multiple languages in a single document.

The speech-to-text transcription systems are currently available for 17 languages for broadcast data and for 7 languages for conversational speech. Each word is associated with start and end times and a confidence measure.

Technical requirements:
Protocol: REST API over HTTPS. The POST, GET and PUT HTTP methods are accepted. Both URI-encoded requests and MIME multi-part requests are supported.

Conditions for access and use:
The service is available 24/7/365 with failover servers and geographic redundancy. It can be accessed via a pay-as-you-go service or a subscription offer.
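The interface facts above (REST over HTTPS, multi-part upload, per-word start/end times and confidence scores) suggest a client along the following lines. This is a hypothetical sketch: the URL, parameters, credentials and XML element names are assumptions, not Vocapia's published specification.

# Hypothetical VoxSigma-style client: POST an audio file and read back
# word-level timings and confidences from an assumed XML layout.
import requests
import xml.etree.ElementTree as ET

def speech_to_text(audio_path: str, model: str = "eng"):
    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://voxsigma.example.com/api",  # assumed endpoint
            params={"model": model},
            files={"audiofile": f},              # MIME multi-part request
            auth=("user", "password"),           # placeholder credentials
            timeout=900,
        )
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    # Assumed schema: <Word stime="12.34" conf="0.97">hello</Word>
    return [
        (float(w.get("stime")), float(w.get("conf")), (w.text or "").strip())
        for w in root.iter("Word")
    ]

if __name__ == "__main__":
    for stime, conf, word in speech_to_text("meeting.wav"):
        print(f"{stime:7.2f}s  conf={conf:.2f}  {word}")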
Table of Contents - Index
A2iA
Document Reader - Document Processing (p. 66)
Bertin Technologies
MediaCentric® (p. 152)
PlateusNet (p. 164)
Exalead
Chromatik (p. 148)
MuMa: The Music Mashup (p. 158)
Voxalead Débat Public (p. 168)
Voxalead multimedia search engine (p. 170)
INRA
AlvisAE: Alvis Annotation Editor - Semantic Acquisition & Annotation
(p. 8)
AlvisIR - Semantic Acquisition & Annotation (p. 10)
AlvisNLP: Alvis Natural Language Processing - Semantic Acquisition
& Annotation (p. 12)
TyDI: Terminology Design Interface - Semantic Acquisition &
Annotation (p. 16)
INRIA
KIWI: Keyword extractor - Semantic Acquisition & Annotation (p. 14)
SAMuSA: Speech And Music Segmenter and Annotator - Audio
Processing (p. 54)
Music Structure - Music Processing (p. 102)
AACI: Automatic acquisition and tracking of mobile target in image
sequences - Video Analysis & Structuring (p. 124)
C-Motion: Camera motion characterization - Video Analysis &
Structuring (p. 128)
VidSeg: Video Segmentation - Video Analysis & Structuring (p. 144)
IRINTS: Irisa News Topic Segmenter - Content Analysis (p. 110)
SloPy: Slope One with Privacy - Content Analysis (p. 116)
IRCAM
AudioPrint - Music Processing (p. 90)
Ircamaudiosim: Acoustical Similarity Estimation - Music
Processing (p. 92)
Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation -
Music Processing (p. 94)
Ircamchord: Automatic Chord Estimation - Music Processing (p. 96)
Ircammusicgenre and Ircammusicmood: Genre and Mood
Estimation - Music Processing (p. 98)
Ircamsummary: Music Summary Generation and Music Structure
Estimation - Music Processing (p. 100)
IRIT
SYRIX: Information retrieval system in context - Indexing, Ranking and
Retrieval (p. 106)
iTESOFT
MECA: Multimedia Enterprise Capture (p. 150)
Jouve
Colorimetric Correction System - Document Processing (p. 60)
Document Classification System - Document Processing (p. 62)
Document Layout Analysis System - Document Processing (p. 64)
Document Structuring System - Document Processing (p. 68)
Grey Level Character Recognition System - Document
Processing (p. 70)
Handwriting Recognition System - Document Processing (p. 72)
Image Descreening System - Document Processing (p. 74)
Image Resizing for Print on Demand Scanning - Document
Processing (p. 76)
Image Clusterization System - Object Recognition & Image
Clustering (p. 82)
Image Identification System - Object Recognition & Image Clustering (p. 84)
Karlsruhe Institute of Technology (KIT)
Speech-to-Text - Speech Processing (p. 42)
Face Detection, Recognition and Analysis - Video Analysis &
Structuring (p. 134)
Multimedia Person Identification - Video Analysis & Structuring (p. 140)
LIMSI-CNRS
FIDJI: Finding In Documents Justifications and Inferences - Q&A (p. 20)
QAVAL: Question Answering by VALidation - Q&A (p. 24)
RITEL: Spoken and Interactive Question-Answering System - Q&A (p. 26)
Acoustic Speaker Diarization - Speech Processing (p. 30)
LTU Technologies
LTU Leading Image Recognition Technologies - Object Recognition &
Image Clustering (p. 86)
Movea
MoveaTV: Motion Processing Engine for interactive TV - Gesture
Recognition (p. 120)
RWTH Aachen University
Machine Translation - Translation of Text and Speech (p. 46)
Speech Translation - Translation of Text and Speech (p. 48)
Automatic Speech Recognition - Speech Processing (p. 34)
Recognition of Handwritten Text - Document Processing (p. 78)
Synapse Développement
Question-Answering System - Q&A (p. 22)
Sentiment analysis and Opinion mining - Content Analysis (p. 112)
Persons, Places, Date, Organizations & Events Recognition - Content Analysis (p. 114)
Systran
OMTP: Online Multimedia Translation Platform (p. 160)
SYSTRANLinks (p. 166)
Technicolor
Sync Audio Watermarking - Audio Processing (p. 52)
Audience Characterization - Video Analysis & Structuring (p. 126)
ContentArmor™ Video Watermarking - Video Analysis & Structuring (p. 130)
Crowd Sourced Metadata - Video Analysis & Structuring (p. 132)
Hybrid Broadcast Broadband Synchronization - Video Analysis & Structuring (p. 136)
Movie Chaptering - Video Analysis & Structuring (p. 138)
Soccer Event Detection - Video Analysis & Structuring (p. 142)
Violent Scenes Detection - Video Analysis & Structuring (p. 146)
Personalized and social TV (p. 162)
Télécom ParisTech
Yaafe: Audio feature extractor - Audio Processing (p. 56)
Vecsys
MediaSpeech® Alignment - Speech Processing (p. 32)
Corinat: Language Resources production infrastructure - Speech Processing (p. 38)
MediaSpeech® product line (p. 154)
MobileSpeech (p. 156)
Vocapia Research
Automatic Speech Transcription - Speech Processing (p. 36)
Language Identification - Speech Processing (p. 40)
VoxSigma SaaS (p. 172)
Quaero in numbers

The collaborative spirit:
• Bi-annual workshops gathering more than 130 researchers and engineers
• 50 ongoing PhD theses, among which 30 will be defended before the end of the program
• 35 nationalities
• Mobility of young academic actors, recruited by industrial partners (SYSTRAN, Vocapia, LNE) or abroad (USA, Canada, China, Switzerland, Germany, etc.)

The industrial dynamics:
• 34 patents
• 35 application demonstrators
• 9 awards (Best Demo at ACM Multimedia Grand Challenge, Golden Mobile Award, TV Innovations Awards at CES 2012, etc.)
• A worldwide leading position for several industrial partners (LTU Technologies for image recognition, Jouve for book digitization and e-book conversion, Technicolor for social, interactive and personalized TV)

The scientific excellence:
• More than 800 publications, among which 70 in scientific journals and books
• 70 participations in national and international evaluation campaigns (regularly ranked in the top 3)
• More than 30 internal evaluation campaigns every year
• 16 awards (best publication, young researcher award, thesis prize, CNRS Crystal Medal, etc.)
• 75 core technology modules developed and integrated into application demonstrators (among which several open source, available on sourceforge.net)
Results which attest to the collaborative spirit, scientific excellence,
and industrial dynamics of the program
With the support of
Quaero © 2013 – www.quaero.org
Quaero Technology Catalog

  • 1.
  • 2.
  • 3.
  • 4.
    Quaero, premier pôlede recherche et d’innovation sur les technologies de traitement automatique des contenus multimédias et multilingues À l’origine de Quaero, il y a le besoin de fédérer et renforcer une filière technologique émergente, celle du traitement sémantique des contenus multimédias et multilingues (texte, parole, musique, images fixes, vidéo, documents numérisés). Il y a aussi la volonté de se comparer régulièrement à l’état de l’art international, d’organiser la chaîne complète de transfert technologique, et de mobiliser les acteurs de cette filière autour d’applications correspondant à des marchés identifiés comme importants, tels que les moteurs de recherche, la télévision personnalisée et la numérisation du patrimoine culturel. Ceprogrammeestportéparunconsortiumde32partenaires publics et privés, français et allemands. Pendant la phase de R&D, de 2008 à 2013, ces partenaires ont produit des corpus, effectué des recherches sur un large spectre scientifique, développé et testé des modèles de plus en plus élaborés, partagéleurexpérience,intégréleurslogiciels. Leprogramme de travail a été adapté aux évolutions du contexte. De nouvelles applications sont apparues, comme la gestion du courrier entrant en entreprise ou l’aide à la création de site web multilingues, et les efforts sur les technologies correspondantesontétérenforcés,commelareconnaissance d’écriture ou la traduction automatique. Les collaborations, notamment entre disciplines et entre laboratoires de recherche et entreprises, se sont approfondies. Fruit de ces travaux, qui ont donné lieu à plus de 800publicationsnationalesetinternationalesetde nombreusesdistinctions,unecentainedemodules technologiques et démonstrateurs applicatifs ont été développés, dont certains commencent déjà à être exploités commercialement. Un grand nombre d’entre eux présente un intérêt au-delà des membres du consortium. Ils font l’objet du présent catalogue. Ce catalogue technologique présente 72 modules ou démonstrateurs décrits chacun dans une double page qui en précise le domaine d’application et les caractéristiques techniques. Il est composé de deux parties : 59• modules technologiques, organisés par domaine thématique ; la liste des 12 domaines, qui apparaît p. 4, est rappelée sur chaque page de gauche 13• démonstrateurs applicatifs ; la liste, qui apparaît p. 5, est rappelée sur chaque page de gauche Le lecteur pourra également effectuer une recherche par partenaire à partir de l’index en fin de document quaero-catalogue-210x210-v1.6-page-per-page.indd 4 02/10/2013 09:53:32
  • 5.
    The Quaero programstems from the need to federate and strengthen an emerging technological sector, dealing with the semantic processing of multimedia and multilingual content (text, speech, music, still and video image, scanned documents). It also arises from the will to systematically benchmark results against international standards, to organize a complete technological transfer value chain, and to mobilize the actors of the sector around applications corresponding to identified and potentially large markets, suchassearchengines,personalizedTV,andthedigitization of cultural heritage. Quaero, the first research and innovation cluster on multimedia and multilingual content processing This program is borne by a consortium of 32 FrenchandGermanpartnersfromthepublicand privatesector. DuringtheR&Dphase,from2008 to 2013, these partners have produced corpora, performed research covering a large scientific spectrum, developed and tested increasingly elaborate models, shared experience, integrated software. The work plan has been adapted to the context evolutions. New applications have appeared, such as the management of professional incoming mail or computer-aided multilingual web site creation, and additional efforts have been put on the corresponding technologies, such as handwriting recognition or machine translation. Collaboration became more extensive, especially across disciplines and between research and industry. Thanks to these efforts, which led to more than 800 national and international publications and to numerous distinctions, about one hundred core technology modules and application demonstrators have been developed, some of them being already commercially exploited. Many of these technologies are of interest beyond the consortium members. Presenting them is the purpose of this catalog. The Technology Catalog presents 72 modules or demonstrators, each of them being described in a double page which provides details on the application domain and technical characteristics. It is composed of two parts: 59• Core Technology Modules, organized per thematic domains; the list of 12 domains, provided on p.4, is reminded on each left-hand page 13• Application Demonstrators; their list, provided on p.5, is reminded on each left-hand page The catalog can also be searched by institution using the index provided at the end of the document. quaero-catalogue-210x210-v1.6-page-per-page.indd 5 02/10/2013 09:53:32
  • 6.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 4 quaero-catalogue-210x210-v1.6-page-per-page.indd 4 02/10/2013 09:53:32
  • 7.
    Application Demonstrators Chromatik -p148 MediaCentric® - p152 MuMa: The Music Mashup - p158 SYSTRANLinks - p166 Voxalead multimedia search engine - p170 MECA: Multimedia Enterprise CApture - p150 MobileSpeech - p156 PlateusNet - p164 MediaSpeech® product line - p154 Personalized and social TV - p162 OMTP: Online Multimedia Translation Platform - p160 Voxalead Débat Public - p168 VoxSigma SaaS - p172 5 quaero-catalogue-210x210-v1.6-page-per-page.indd 5 02/10/2013 09:53:35
  • 8.
    AlvisAE: Alvis AnnotationEditor - Inra p8 KIWI: Keyword extractor - Inria p14 Semantic Acquisition & Annotation (5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 6 quaero-catalogue-210x210-v1.6-page-per-page.indd 6 02/10/2013 09:53:47
  • 9.
    AlvisIR - Inrap10 TyDI: Terminology Design Interface - Inra p16 Alvis NLP: Alvis Natural Language Processing - Inra p12 7 quaero-catalogue-210x210-v1.6-page-per-page.indd 7 02/10/2013 09:53:47
  • 10.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 8 Alvis Annotation Editor Application sectors Any sector using text documents• Information Extraction• Contents analysis• Target users and customers With AlvisAE, remote users display annotated documents in their web browser and manually create new annotations over the text and share them. Partners: Inra quaero-catalogue-210x210-v1.6-page-per-page.indd 8 02/10/2013 09:53:59
  • 11.
    9 AlvisAE: Alvis AnnotationEditor Contact details: Robert Bossy robert.bossy@jouy.inra.fr Description: Technical requirements: Conditions for access and use: INRA MIG Domaine de Vilvert 78352 Jouy-en-Josas France http://bibliome.jouy.inra.fr AlvisAE is a Web Annotation Editor designed to display and edit fine-grained semantic formal annotations of textual documents. The annotations are used for fast reading or for training Machine Learning algorithms in text mining. The annotations can also be stored in a database and queried. The annotations are entities, n-ary relations and groups. The entities can be discontinuous and overlapping. They are typed by a small set of categories or by concepts from an external ontology. The user can dynamically extend the ontology by dragging new annotations from the text to the ontology. AlvisAE supports collaborative and concurrent annotations and adjudication. Input documents can be in HTML or text format. AlvisAE takes also as input semantic pre-annotations automatically produced by AlvisNLP. Server side: Java 6 or 7, a Java Application and a RDMS. Client Side: The client application can be run by any recent JavaScript enabled web browser (e.g. Firefox, Chromium, Safari). Internet Explorer is not supported. AlvisAE software is developped by INRA, Mathématique, Informatique et Génome lab. It is property of INRA. AlvisAE can be supplied under licence on a case-by- case basis. An open-source distribution is planned in the short term. quaero-catalogue-210x210-v1.6-page-per-page.indd 9 02/10/2013 09:54:03
  • 12.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 10 Semantic document indexing and search engine framework Target users and customers Domain-specific communities, especially technical and scientific, willing to build search engines and information systems to manage documents with fine- grained semantic annotations. Partners: Inra Application sectors Search engines and information systems development. quaero-catalogue-210x210-v1.6-page-per-page.indd 10 02/10/2013 09:54:15
  • 13.
    11 AlvisIR Contact details: Robert Bossy robert.bossy@jouy.inra.fr Description: Technicalrequirements: Conditions for access and use: INRA MIG Domaine de Vilvert 78352 Jouy-en-Josas cedex France http://bibliome.jouy.inra.fr Linux platform• Perl• libxml2• Zebra indexing engine• PHP5• Sources available upon request. Free of use for academic institutions. AlvisIR is a complete suite for indexing documents with fine-grained semantic annotations. The search engine performs a semantic analysis of the user query and searches for synonyms and sub-concepts. AlvisIR has two main components: 1. the indexing tool and search daemon based on IndexData’s Zebra that supports standard CQL queries, 2. the web user interface featuring result snippets, query-term highlight, facet filtering and concept hierarchy browsing. Setting up a search engine requires the semantic resources for query analysis (synonyms and concept hierarchy) and a set of annotated documents. AlvisIR is closely integrated with AlvisNLP and TyDI for document annotation and semantic resources acquisition respectively. Indicative indexing time: 24mn for a corpus containing 5 million annotations. Indicative response time: 18s for a response containing 20,000 annotations. quaero-catalogue-210x210-v1.6-page-per-page.indd 11 02/10/2013 09:54:19
  • 14.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 12 A pipeline framework for Natural Language Processing Target users and customers The targeted audience includes projects that require usual Natural Language Processing tools for production and research purpose. Partners: Inra Application sectors Natural language processing• Contents analysis• Information retrieval• quaero-catalogue-210x210-v1.6-page-per-page.indd 12 02/10/2013 09:54:32
  • 15.
    13 Alvis NLP: AlvisNatural Language Processing Contact details: Robert Bossy robert.bossy@jouy.inra.fr Description: INRA MIG Domaine de Vilvert 78352 Jouy-en-Josas cedex France http://bibliome.jouy.inra.fr Technical requirements: Java 7 Weka Conditions for access and use: Sources available upon request. Free of use for academic institutions. Alvis NLP is a pipeline framework to annotate text documents using Natural Language Processing (NLP) tools for sentence and word segmentation, named-entity recognition, term analysis, semantic typing and relation extraction (see the paper by Nedellec et al. in Handbook on Ontologies 2009 for a comprehensive overview). The various available functions are accessible as modules, that can be composed in a sequence forming the pipeline. This sequence, as well as parameters for the modules, is specified through a XML-based configuration file. New components can easily be integrated into the pipeline. To implement a new module, one has to build a Java class manipulating text annotations following the data model defined in Alvis NLP. The class is loaded at run-time by Alvis NLP, which makes the integration much easier. quaero-catalogue-210x210-v1.6-page-per-page.indd 13 02/10/2013 09:54:34
  • 16.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 14 Keyword extractor Target users and customers The targeted users and customers are the multimedia industry actors, and all academic or industrial laboratories interested in textual document processing. Partners: Inra Application sectors Textual and multimedia document processing• quaero-catalogue-210x210-v1.6-page-per-page.indd 14 02/10/2013 09:54:47
  • 17.
    15 KIWI: Keyword extractor Contactdetails: General issues: Patrick Gros patrick.gros@irisa.fr Description: Technical requirements: Conditions for access and use: IRISA/Texmex team Campus de Beaulieu 35042 Rennes Cedex France http://www.irisa.fr/ SPC with Unix/Linux OS• Kiwi requires the TreeTagger [1] software to be• installed on the system Kiwi requires the Flemm [2] software to be installed on• the system [1] http://www.ims.uni-stuttgart.de/projekte/corplex/ TreeTagger/ [2] http://www.univnancy2.fr/pers/namer/Telecharger_ Flemm.htm Kiwi is a software that has been developed at Irisa/Inria- Rennes and is the property of Inria. Registration at the Agency for Program Protection (APP) in France, is under process. Kiwi is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis. Technical issues: Sébastien Campion scampion@irisa.fr Kiwi is a software dedicated to the extraction of keywords from a textual document. From an input text, or preferably a normalized text, Kiwi outputs a weighted word vector (see figure 1 below). This ranked keyword vector can then be used as a document description or for indexing purposes. Kiwi is a software dedicated to the extraction of keywords from a textual document. From an input text, or preferably a normalized text, Kiwi outputs a weighted word vector (see figure 1 below). This ranked keyword vector can then be used as a document description or for indexing purposes. Kiwi was developed at Irisa/INRIA Rennes by the Texmex team. The Kiwi author is: Gwénolé Lecorvé quaero-catalogue-210x210-v1.6-page-per-page.indd 15 02/10/2013 09:54:52
  • 18.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 16 A platform for the validation, structuration and export of termino- ontologies Target users and customers The primary use of TyDI is the design of termino- ontologies for the indexing of textual documents. It can therefore be of great help for most projects involved in natural language processing. Partners: Inra Application sectors Terminology structuration• Textual document indexing• Natural language processing• quaero-catalogue-210x210-v1.6-page-per-page.indd 16 02/10/2013 09:55:05
  • 19.
    17 TyDI: Terminology DesignInterface Contact details: Robert Bossy robert.bossy@jouy.inra.fr Description: Technical requirements: Conditions for access and use: INRA MIG Domaine de Vilvert 78352 Jouy-en-Josas Cedex France http://bibliome.jouy.inra.fr Server side: Glassfish and Postgresql servers• Client side: Java Virtual Machine version 1.5• TyDI is a software developped by INRA, Mathématique, Informatique et Génome and is the property of INRA. TyDI can be supplied under licence on a case-by-case basis. For more information, please contact Robert Bossy (robert.bossy@jouy.inra.fr) Figure 1: The client interface of TyDI. It is composed of several panels (hierarchichal/tabular view of the terms, search panel, context of appearance of selected terms …) TyDI is a collaborative tool for manual validation/ annotation of terms either originating from terminologies or extracted from training corpus of textual documents. It is used on the output of so-called term extractor programs (like Yatea), which are used to identify candidate terms (e.g. compound nouns). Thanks to TyDI, a user can validate candidate terms and specify synonymy/hyperonymy relations. These annotations can then be exported in several formats, and used in other Natural Language Processing tools. quaero-catalogue-210x210-v1.6-page-per-page.indd 17 02/10/2013 09:55:12
  • 20.
    FIDJI:Web Question-Answering System -LIMSI - CNRS p20 RITEL: Spoken and Interactive Question-Answering System - LIMSI - CNRS p26 Semantic Acquisition & Annotation (5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 18 quaero-catalogue-210x210-v1.6-page-per-page.indd 18 02/10/2013 09:55:24
  • 21.
    Question-Answering System - SynapseDéveloppement p22 QAVAL: Question Answering by Validation - LIMSI - CNRS p24 19 quaero-catalogue-210x210-v1.6-page-per-page.indd 19 02/10/2013 09:55:24
  • 22.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 20 A question-answering system aims at answering questions written in natural language with a precise answer. Target users and customers Web Question-answering is an end-user application. FIDJI is an open-domain QA system for French and English Partners: LIMSI-CNRS Application sectors Information retrieval on the Web or in document collections quaero-catalogue-210x210-v1.6-page-per-page.indd 20 02/10/2013 09:55:37
  • 23.
    21 FIDJI: Web Question-AnsweringSystem Contact details: Véronique Moriceau moriceau@limsi.fr Description: LIMSI-CNRS Groupe ILES B.P. 133 91403 Orsay Cedex France http://www.limsi.fr/ Technical requirements: PC with Linux platform Conditions for access and use: Available for licensing on case-by-case basis Xavier Tannier xtannier@limsi.fr Document retrieval systems such as search engines provide the user with a large set of pairs URL/snippets containing relevant information with respect to a query. To obtain a precise answer, the user then needs to locate relevant information within the documents and possibly to combine different pieces of information coming from one or several documents. To avoid these problems, focused retrieval aims at identifying relevant documents and locating the precise answer to a user question within a document. Question- answering (QA) is a type of focused retrieval: its goal is to provide the user with a precise answer to a natural language question. While information retrieval (IR) methods are mostly numerical and use only little linguistic knowledge, QA often implies deep linguistic processing, large resources and expert rule-based modules. Most question-answering systems can extract the answer to a factoid question when it is explicitly present in texts, but are not able to combine different pieces of information to produce an answer. FIDJI (Finding In Documents Justifications and Inferences), an open- domain QA system for French and English, aims at going beyond this insufficiency and focuses on introducing text understanding mechanisms. The objective is to produce answers which are fully validated by a supporting text (or passage) with respect to a given question. The main difficulty is that an answer (or some pieces of information composing an answer) may be validated by several documents. For example: Q: Which French Prime Minister committed suicide? A: Pierre Bérégovoy P1: The French Prime Minister Pierre Bérégovoy warned Mr. Clinton against… P2: Two years later, Pierre Bérégovoy committed suicide after he was indirectly implicated… In this example, the information “French Prime Minister” and “committed suicide” are validated by two different complementary passages. Indeed, this question may be decomposed into two sub-questions, e.g. “Who committed suicide?” and “Are they French Prime Minister?”. FIDJI uses syntactic information, especially dependency relations which allow question decomposition. The goal is to match the dependency relations derived from the question and those of a passage and to validate the type of the potential answer in this passage or in another document. Another important aim of FIDJI is to answer new categories of questions, called complex questions, typically “how” and “why” questions. Complex questions do not exist in traditional evaluation campaigns but have been introduced within the Quaero framework. Answers to these particular questions are no longer short and precise answers, but rather parts of documents or even full documents. In this case, the linguistic analysis of the question provides a lot of information concerning the possible form of the answer and keywords that should be sought in candidate passages. quaero-catalogue-210x210-v1.6-page-per-page.indd 21 02/10/2013 09:55:37
  • 24.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 22 A Question Answering system allows the user to ask questions in natural language and to obtain one or several answers. For boolean and generic questions, our system is able to generate potential questions and to return the corresponding answers. Target users and customers End-user application, Question-Answering is the easiest way to find information for everybody: ask the question as you want and obtain answers, not snippets or pages. Partners: Synapse Développement Application sectors Search and find precise answers in any collection of texts, from the Web or any other source (voice recognition, optical character recognition, etc.), with eventual correction of the source text, ability to generate questions from generic requests, eventually a single word, ability to find similar questions and their answers, etc. Monolingual and multilingual Question-Answering system. Languages: English, French (+ Spanish, Portuguese, Polish, with partners using the same API). quaero-catalogue-210x210-v1.6-page-per-page.indd 22 02/10/2013 09:55:50
  • 25.
    23 Question-Answering System Contact details: PatrickSéguéla patrick.seguela@synapse-fr.com Description: Technical requirements: Conditions for access and use: Synapse Développement 33, rue Maynard 31000 Toulouse France http://www.synapse-developpement.fr/ SPC with Windows or Linux• RAM minimum : 4 Gb• HDD minimum : 100 Gb• SDK available for integration in programs or Web services. For specific conditions of use, contact us. The technology is a system based on very consequent linguistic resources and on NLP state- of-the-art technologies, especially, syntactic and semantic parsing, with sophisticated features like resolution of anaphora, word sense disambiguation or relations between named entities. On news and Web corpora, our system is regularly awarded in the international and national evaluation campaigns (EQueR 2004, CLEF 2005, 2006, 2007, Quaero 2008, 2009). quaero-catalogue-210x210-v1.6-page-per-page.indd 23 02/10/2013 09:55:55
  • 26.
    Semantic Acquisition & Annotation(5) p8 to 17 Q&A (4) p20 to 27 Translation of Text and Speech (2) p46 to 49 Speech Processing (7) p30 to 43 Document Processing (10) p60 to 79 Audio Processing (3) p52 to 57 Object Recognition & Image Clustering (3) p82 to 87 Music Processing (7) p90 to 103 Indexing, Ranking and Retrieval (1) p106 to 107 Content Analysis (4) p110 to 117 Video Analysis & Structuring (12) p124 to 147 Gesture Recognition (1) p120 to 121 Core Technology Modules 24 A question answering system that is adapted for searching precise answers in textual passages extracted from Web documents or text collections. Target users and customers Question-answering is for both the general public to retrieve precise information in raw texts, and for companies and organizations, that have specific text mining needs. Question-answering systems suggest short answers and their justification passage to questions provided in natural language. Partners: LIMSI-CNRS Application sectors Extension of search engine, technology monitoring quaero-catalogue-210x210-v1.6-page-per-page.indd 24 02/10/2013 09:56:08
QAVAL: Question Answering by VALidation

Contact details:
Brigitte Grau
Brigitte.Grau@limsi.fr
LIMSI-CNRS
ILES Group
B.P. 133
91403 Orsay Cedex
France
www.limsi.fr/Scientifique/iles

Technical requirements:
Linux platform

Conditions for access and use:
Available for licensing on a case-by-case basis

Description:
The large number of documents currently on the Web, and also on intranet systems, makes it necessary to provide users with intelligent assistant tools that help them find the specific information they are searching for. Relevant information at the right time can help solve a particular task. The purpose is thus to give access to the content of texts, not only to the documents themselves. Question-answering systems address this need: they aim at finding answers to a question asked in natural language, using a collection of documents. When the collection is extracted from the Web, the structure and style of the texts are quite different from those of newspaper articles. We developed QAVAL, a question-answering system based on an answer-validation process able to handle both kinds of documents. A large number of candidate answers are extracted from short passages and then validated according to question and excerpt characteristics. The validation module is based on a machine learning approach. It takes into account criteria characterizing both excerpt and answer relevance at the surface, lexical, syntactic and semantic levels, in order to deal with different types of texts. QAVAL is made of sequential modules corresponding to five main steps. The question analysis provides the main characteristics used to retrieve excerpts and to guide the validation process. Short excerpts are obtained directly from the search engine, then parsed and enriched with the question characteristics, which allows QAVAL to compute the different features for validating or discarding candidate answers.
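As an illustration of the answer-validation idea, the sketch below scores candidate answers with a binary classifier trained on relevance features. It is a minimal sketch, not the QAVAL implementation: the feature set and the use of scikit-learn's LogisticRegression are assumptions made for the example.

```python
# Minimal sketch of answer validation as binary classification.
# The features are illustrative stand-ins for QAVAL-style criteria.
from dataclasses import dataclass
from sklearn.linear_model import LogisticRegression

@dataclass
class Candidate:
    answer: str
    passage: str
    features: list[float]  # e.g. lexical overlap, retrieval rank, answer-type match

def train_validator(candidates: list[Candidate], labels: list[int]) -> LogisticRegression:
    """Fit a validator on labeled (candidate, correct/incorrect) examples."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit([c.features for c in candidates], labels)
    return clf

def best_answer(clf: LogisticRegression, candidates: list[Candidate]) -> Candidate:
    """Keep the candidate whose validation score is highest."""
    scores = clf.predict_proba([c.features for c in candidates])[:, 1]
    return max(zip(scores, candidates), key=lambda pair: pair[0])[1]
```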
A spoken and interactive QA system that helps a user find an answer to a question, spoken or written, in a collection of documents.

Target users and customers
Question-answering is an end-user application. The purpose is to go beyond the traditional way of retrieving information through search engines. Our system is interactive, with both a speech (phone or microphone) and a text (web) interface.

Partners: LIMSI-CNRS

Application sectors
QA systems can be viewed as a direct extension of search engines. They allow a user to ask questions in natural language.
RITEL: Spoken and Interactive Question-Answering System

Contact details:
Sophie Rosset
sophie.rosset@limsi.fr
LIMSI-CNRS
TLP Group
B.P. 133
91403 Orsay Cedex
France
http://www.limsi.fr/tlp/

Technical requirements:
PC with Linux platform.

Conditions for access and use:
Available for licensing on a case-by-case basis.

Description:
There are different ways to go beyond standard retrieval systems such as search engines. One of them is to offer users different ways to express their query: some prefer to use speech, while others prefer written natural language. Another is to allow the user to interact with the system. The Ritel system integrates a dialog system and an open-domain information retrieval system so that a human can ask a general question (e.g. "Who is currently presiding over the Senate?" or "How did the price of gas change over the last ten years?") and refine the search interactively. A human-computer dialog system analyses and acts on user requests depending on the task at hand, the previous interactions and the user's behaviour. Its aim is to provide the user with the information being sought while maintaining a smooth and natural interaction flow. The following example illustrates the kind of interaction possible with the Ritel system:

[S] Hi, Ritel speaking! What is your first question?
[U] Who built the Versailles Castle?
[S] Your search is about Versailles Castle and built. The answer is Louis XIII. Do you want to ask another question?
[U] In which year?
[S] 1682, according to the documents I had access to. Another question?
[U] Who designed the garden?
[S] The following items are used for searching: Versailles, gardens and designed. André Le Nôtre. Anything else?

The dialog system comprises a component for user utterance analysis, a component for dialog management, and a component for interaction management. The information retrieval and question-answering system is tightly integrated within it. The user interface can be phone-based, or web-based for written interaction.
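A minimal sketch of the interaction pattern shown in the transcript above: the dialog manager keeps the entities mentioned so far, so that an elliptical follow-up question ("in which year") is completed from context. All names here are hypothetical; the real Ritel analysis and back-end are far richer.

```python
# Toy interactive QA loop carrying context across turns (hypothetical design).
def analyse(utterance: str, context: list[str]) -> list[str]:
    """Extract search terms; fall back on prior context for short follow-ups."""
    stop = {"who", "what", "in", "which", "the", "a", "did", "is"}
    terms = [w.strip("?") for w in utterance.lower().split() if w not in stop]
    return terms if len(terms) > 1 else context + terms

def dialog_loop(search) -> None:
    """search: callable mapping a term list to an answer string (stubbed IR+QA)."""
    context: list[str] = []
    print("[S] Hi, Ritel speaking! What is your first question?")
    while (question := input("[U] ").strip()):
        context = analyse(question, context)
        print(f"[S] Searching with: {', '.join(context)}. "
              f"{search(context)} Another question?")
```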
Speech Processing

• Acoustic Speaker Diarization - LIMSI-CNRS
• MediaSpeech® alignment - Vecsys
• Automatic Speech Recognition - RWTH Aachen University
• Automatic Speech Transcription - Vocapia
• Corinat® - Vecsys
• Language Identification - Vocapia
• Speech-to-Text - Karlsruhe Institute of Technology (KIT)
The module performs automatic segmentation and clustering of an input audio stream according to speaker identity, using acoustic cues.

Target users and customers
Multimedia document indexing and archiving services.

Partners: LIMSI-CNRS

Application sectors
• Multimedia document management
• Search by content in audio-visual documents
Acoustic Speaker Diarization

Contact details:
Claude Barras
claude.barras@limsi.fr
LIMSI-CNRS
Spoken Language Processing Group
B.P. 133
91403 Orsay Cedex
France
http://www.limsi.fr/tlp/

Technical requirements:
A standard PC with Linux operating system.

Conditions for access and use:
The technology developed at LIMSI-CNRS is available for licensing on a case-by-case basis.

Description:
Speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. This partitioning is a useful preprocessing step for an automatic speech transcription system, and it also improves the readability of the transcription by structuring the audio stream into speaker turns. One of the major issues is that the number of speakers in the audio stream is generally unknown a priori and must be determined automatically. Given samples of known speakers' voices, speaker verification techniques can further be applied to produce clusters of identified speakers. The LIMSI multi-stage speaker diarization system combines agglomerative clustering based on the Bayesian information criterion (BIC) with a second clustering stage that uses speaker identification (SID) techniques with more complex models. The system participated in several evaluations of acoustic speaker diarization, on US English broadcast news for NIST Rich Transcription 2004 Fall (NIST RT'04F) and on French broadcast radio and TV news and conversations for the ESTER-1 and ESTER-2 evaluation campaigns, providing state-of-the-art performance. Within the Quaero program, LIMSI is developing improved speaker diarization and speaker tracking systems for broadcast news, and also for more interactive data such as talk shows. It is a building block of the system presented by Quaero partners to the REPERE challenge on multimodal person identification.
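To make the first stage concrete, here is a minimal sketch of BIC-based agglomerative clustering over speech segments modelled as single full-covariance Gaussians. It illustrates the general technique, not LIMSI's code; the feature representation and the penalty weight `lam` are assumptions.

```python
# Toy BIC-based agglomerative clustering of speech segments.
# Each segment is an (n_frames, dim) array of acoustic features (e.g. MFCCs).
import numpy as np

def gaussian_loglik(x: np.ndarray) -> float:
    """Log-likelihood of frames under a Gaussian fit to those same frames."""
    n, d = x.shape
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
    return -0.5 * n * (np.linalg.slogdet(cov)[1] + d * (1 + np.log(2 * np.pi)))

def delta_bic(a: np.ndarray, b: np.ndarray, lam: float = 1.0) -> float:
    """BIC gain of merging clusters a and b; positive values favour the merge."""
    n, d = len(a) + len(b), a.shape[1]
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return (gaussian_loglik(np.vstack([a, b]))
            - gaussian_loglik(a) - gaussian_loglik(b) + penalty)

def cluster(segments: list[np.ndarray], lam: float = 1.0) -> list[np.ndarray]:
    """Greedily merge the best pair while its BIC gain stays positive."""
    while len(segments) > 1:
        pairs = [(i, j) for i in range(len(segments)) for j in range(i + 1, len(segments))]
        i, j = max(pairs, key=lambda p: delta_bic(segments[p[0]], segments[p[1]], lam))
        if delta_bic(segments[i], segments[j], lam) <= 0:
            break  # no merge improves BIC: the remaining clusters are the speakers
        segments[i] = np.vstack([segments[i], segments[j]])
        del segments[j]
    return segments
```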
Audio and text synchronization tool

Target users and customers
• E-editors
• Media content producers
• Media application developers
• Search interface integrators

Partners: Vecsys, Bertin Technologies, Exalead

Application sectors
• Public/private debates and conferences: e.g. Parliament, meetings
• E-learning/E-books: e.g. audiobooks
• Media Asset Management: e.g. search in annotated media streams (TV, radio, films…)
MediaSpeech® alignment

Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Vecsys
Parc d'Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux
France
http://www.vecsys.fr

Technical requirements:
Standard Web access

Conditions for access and use:
Available in SaaS mode, installed on a server, or installed as a virtual machine in the MediaSpeech® product line. Quotation on request.

Description:
This technology synchronizes an audio stream with its associated text transcript: it takes as input both the audio stream and the raw transcript, and produces as output a "time-coded" transcript, i.e. each word or group of words is associated with its precise occurrence in the audio stream. The technology is robust and gracefully handles slight variations between the audio speech and the text transcript.
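One common recipe for this kind of alignment is to run a recognizer over the audio to get time-stamped words, then align the raw transcript against the recognizer output and transfer the timings across matching words. The sketch below shows that transfer step only; it is a generic illustration under those assumptions, not the MediaSpeech® implementation.

```python
# Sketch: transfer word timings from a time-stamped ASR hypothesis onto a
# raw transcript, tolerating small mismatches between the two word streams.
import difflib

def time_code(transcript_words: list[str],
              asr_words: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """asr_words: (word, start_s, end_s). Returns (word, start_s, end_s) for
    every transcript word that matches the ASR hypothesis."""
    hyp = [w for w, _, _ in asr_words]
    matcher = difflib.SequenceMatcher(a=transcript_words, b=hyp, autojunk=False)
    aligned = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            word = transcript_words[block.a + k]
            _, start, end = asr_words[block.b + k]
            aligned.append((word, start, end))
    return aligned  # unmatched words can be interpolated between neighbours
```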
Automatic speech recognition, also known as speech-to-text, is the transcription of speech into (machine-readable) text by a computer.

Target users and customers
• Researchers
• Developers
• Integrators

Partners: RWTH Aachen University

Application sectors
The uses of automatic speech recognition are so manifold that they are hard to list exhaustively here. The main usages today are customer interaction via the telephone, healthcare dictation, and use in car navigation systems and on smartphones. With increasingly better technology, these applications are extending to audio mining, speech translation and wider use of human-computer interaction via speech.
Automatic Speech Recognition

Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen
Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
Speech recognition is a computationally and memory-intensive process, so the typical set-up is to have one or several computers on the Internet serving the speech recognition requirements of many users.

Conditions for access and use:
RWTH provides an open-source speech recognizer free of charge for academic usage. Other usage should be subject to a bilateral agreement.

Description:
Automatic speech recognition is a very hard problem in computer science, but one more mature than machine translation. After a media hype at the end of the 1990s, the technology has continuously improved and has been adopted by the market, e.g. in large deployments in the customer contact sector, in the automation of radiology dictation, and in voice-enabled navigation systems in the automotive sector. Public awareness has increased through use on smartphones, in particular Siri. The research community concentrates on problems such as the recognition of spontaneous speech and the easy acquisition of new languages.
Vocapia Research develops core multilingual large-vocabulary speech recognition technologies* for voice interfaces and automatic audio indexing applications. This speech-to-text technology is available for multiple languages. (* Under license from LIMSI-CNRS)

Target users and customers
The targeted users and customers of speech-to-text transcription technologies are actors in the multimedia and call center sectors, including academic and industrial organizations interested in the automatic mining of audio or audiovisual documents.

Partners: Vocapia

Application sectors
This core technology can serve as the basis for a variety of applications: multilingual audio indexing, teleconference transcription, telephone speech analytics, transcription of speeches, subtitling… Large-vocabulary continuous speech recognition is the key technology for enabling content-based information access in audio and audiovisual documents. Most of the linguistic information is encoded in the audio channel of audiovisual data, which, once transcribed, can be accessed using text-based tools. Via speech recognition, spoken document retrieval can support random access to relevant portions of audio documents using specific criteria, reducing the time needed to identify recordings in large multimedia databases. Example applications are data mining, news-on-demand, and media monitoring.
Automatic Speech Transcription

Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14
Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay
France
www.vocapia.com

Technical requirements:
PC with Linux platform (for use via licensing).

Conditions for access and use:
The VoxSigma software is available both via licensing and via our web service.

Description:
The Vocapia Research speech transcription system transcribes the speech segments located in an audio file. Systems for 17 language varieties are currently available for broadcast and web data; conversational speech transcription systems are available for 7 languages. The transcription system has two main components: an audio partitioner and a word recognizer. The audio partitioner divides the acoustic signal into homogeneous segments and associates appropriate (document-internal) speaker labels with the segments. For each speech segment, the word recognizer determines the sequence of words, associating start and end times and a confidence measure with each word.
Language Resources production infrastructure

Target users and customers
• Linguistic resources providers
• Audio content transcribers; media transcribers
• Speech processing users and developers

Partners: Vecsys, LIMSI-CNRS

Application sectors
• Language resources production
• Speech technology industry
• Media subtitling, conference and meeting transcription services
Corinat®

Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Vecsys
Parc d'Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux
France
http://www.vecsys.fr

Technical requirements:
Standard Web access

Conditions for access and use:
Quotation on request

Description:
Corinat® is a hardware/software infrastructure for language resources production that offers the following functionalities:
• Data collection (broadcast, conversational)
• Automatic pre-processing of audio data
• Distribution of annotation tasks
• Semi-automatic post-processing of annotations
Corinat® is a high-availability platform (24/7), with a web-based interface for managing language resources production from any location.
Vocapia Research provides a language identification technology* that can identify languages in audio data. (* Under license from LIMSI-CNRS)

Target users and customers
The targeted users and customers of language recognition technologies are actors in the multimedia and call center sectors, including academic and industrial organizations, as well as actors in the defense domain, interested in the processing of audio documents, in particular when a collection of documents contains multiple languages.

Partners: Vocapia

Application sectors
A language identification system can be run prior to a speech recognizer; its output is used to load the appropriate language-dependent speech recognition models for the audio document. Alternatively, language identification can be used to dispatch audio documents or telephone calls to a human operator fluent in the identified language. Other potential applications involve the use of LID as a front-end to a multilingual translation system. This technology can also be part of automatic systems for spoken data retrieval or automatically enriched transcriptions.
Language Identification

Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14
Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay
France
www.vocapia.com

Technical requirements:
PC with Linux platform (for use via licensing).

Conditions for access and use:
The VoxSigma software is available both via licensing and via our web service.

Description:
The VoxSigma software suite can recognize the language spoken in an audio document or in speech segments defined in an input XML file. The default set of possible languages and their associated models can be specified by the user. LID systems are available for broadcast and conversational data: currently, 15 languages are included in the broadcast news LID system and 50 languages in the conversational telephone speech LID system. New languages can easily be added. The VoxSigma software suite uses multiple phone-based decoders in parallel to decide which language is spoken in the audio file, and reports the language of the audio document along with a confidence score. In the current version, each channel of an audio document is assumed to be in a single language; future versions are planned to allow multiple languages in a single document.
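The parallel-decoder idea can be sketched in a few lines: score the audio under one model per language and pick the best, converting the scores into a softmax-style confidence. This is a generic illustration, not the VoxSigma implementation; each scorer stands in for a language-specific phone-based decoder.

```python
# Generic parallel language-ID sketch: one scorer per language, argmax + confidence.
import math
from typing import Callable

def identify_language(audio,
                      scorers: dict[str, Callable]) -> tuple[str, float]:
    """scorers maps a language code to a function returning a log-likelihood
    of the audio under that language's models (a stand-in for a phone decoder)."""
    logliks = {lang: score(audio) for lang, score in scorers.items()}
    best = max(logliks, key=logliks.get)
    # Softmax over log-likelihoods as a simple confidence measure.
    z = sum(math.exp(v - logliks[best]) for v in logliks.values())
    return best, 1.0 / z
```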
Transcription of human speech into written word sequences

Target users and customers
Companies that want to integrate the transcription of human speech into their products.

Partners: Karlsruhe Institute of Technology (KIT)

Application sectors
Speech-to-text technology is key to indexing multimedia content as found in multimedia databases or in video and audio collections on the World Wide Web, and to making it searchable by human queries. In addition, it offers a natural interface for submitting and executing queries. This technology is also part of speech-translation services: in combination with machine translation technology, it is possible to design machines that take human speech as input and translate it into a new language. This can be used to enable human-to-human communication across the language barrier or to access languages in a cross-lingual way.
Speech-to-Text

Contact details:
Prof. Alex Waibel
waibel@ira.uka.de
Karlsruhe Institute of Technology (KIT)
Adenauerring 2
76131 Karlsruhe
Germany
http://isl.anthropomatik.kit.edu

Technical requirements:
Linux-based server with 2 GB of RAM.

Conditions for access and use:
Available for licensing on a case-by-case basis.

Description:
The KIT speech transcription system is based on the JANUS Recognition Toolkit (JRTk), which features the IBIS single-pass decoder. The JRTk is a flexible toolkit that follows an object-oriented approach and is controlled via Tcl/Tk scripting. Recognition can be performed in different modes. In offline mode, the audio to be recognized is first segmented into sentence-like units. These segments are then clustered in an unsupervised way according to speaker. Recognition can then be performed in several passes; between passes, the models are adapted in an unsupervised manner to improve recognition performance, and system combination using confusion network combination can be used to improve it further. In run-on mode, the audio to be recognized is continuously processed without prior segmentation, and the output is a steady stream of words. The recognizer can be flexibly configured to meet given real-time requirements, trading off recognition accuracy against recognition speed. Within the Quaero project, we are targeting English, French, German, Russian, and Spanish. Given sufficient amounts of training material, the HMM-based acoustic models can easily be adapted to additional languages and domains.
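The offline mode described above amounts to a segment / cluster / decode / adapt loop. The skeleton below illustrates that control flow only; all callables are hypothetical stand-ins supplied by the caller, not JRTk API functions.

```python
# Control-flow sketch of multi-pass offline recognition (hypothetical stubs).
def transcribe_offline(audio, segment, cluster, decode, adapt, models,
                       n_passes: int = 3) -> dict:
    segments = segment(audio)        # step 1: sentence-like units
    speakers = cluster(segments)     # step 2: unsupervised speaker clustering
    hyps: dict = {}
    for _ in range(n_passes):        # step 3: several decoding passes
        hyps = {i: decode(seg, models) for i, seg in enumerate(segments)}
        # Step 4: adapt models per speaker cluster, using the current
        # hypotheses as (noisy) supervision for the next pass.
        models = adapt(models, speakers, hyps)
    return hyps
```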
Translation of Text and Speech

• Machine Translation - RWTH Aachen University
• Speech Translation - RWTH Aachen University
Automatic translation of text breaks the language barrier: it allows instant access to information in foreign languages.

Target users and customers
• Researchers
• Developers
• Integrators

Partners: RWTH Aachen University

Application sectors
As translation quality is far below the work of professional human translators, machine translation is targeted at situations where instant access and low cost are key and high quality is not demanded, for example:
• Internet search (cross-language document retrieval)
• Internet (on-the-fly translation of foreign-language websites or news feeds)
Machine Translation

Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen
Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
Translation is a memory-intensive process, so the typical set-up is to have one or several computers on the Internet serving the translation requirements of many users.

Conditions for access and use:
RWTH provides open-source translation tools free of charge for academic usage. Other usage should be subject to a bilateral agreement.

Description:
Machine translation is a very hard problem in computer science and has been worked on for decades. The corpus-based methods that emerged in the 1990s allow the computer to actually learn translation from existing bilingual texts, that is, from many translation examples. A correct mapping is not easy to learn, as the translation of a word depends on its context, and word order typically differs across languages. It is fascinating to see this technology improve over the years. The learning methods are largely mathematical in nature and can be applied to any language pair.
Automatic translation of speech practically subtitles the speech of foreign-language speakers, in your native language!

Target users and customers
• Researchers
• Developers
• Integrators

Partners: RWTH Aachen University

Application sectors
• Subtitling of broadcasts via television or the Internet
• Internet search in audio and video material (cross-language retrieval)
Speech Translation

Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen
Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
Speech translation is a computationally and memory-intensive process, so the typical set-up is to have one or several computers on the Internet serving the speech translation requirements of many users.

Conditions for access and use:
RWTH provides an open-source speech recognizer and various open-source tools free of charge for academic usage. Other usage should be subject to a bilateral agreement.

Description:
In a nutshell, speech translation is the combination of two hard computer science problems, namely speech recognition (automatic transcription of speech into text) and machine translation (automatic translation of a text from a source to a target language). While neither technology works perfectly, it is impressive to see them working in combination, in particular when we have not even rudimentary knowledge of the source language; for many of us, this is the case for Chinese or Arabic. The mathematical methods behind speech recognition and machine translation are related, and the systems draw their knowledge from large amounts of example data.
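Since the text describes speech translation as the composition of the two components, a minimal sketch of that cascade (with hypothetical `recognize` and `translate` back-ends supplied by the caller):

```python
# Speech translation as a cascade: ASR followed by MT (hypothetical back-ends).
def speech_translate(audio, recognize, translate, src: str, tgt: str) -> str:
    """recognize: audio -> source-language text; translate: text -> target text."""
    source_text = recognize(audio, language=src)
    return translate(source_text, source=src, target=tgt)
```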
Audio Processing

• Sync Audio Watermarking - Technicolor
• SAMuSA: Speech And Music Segmenter and Annotator - Inria
• Yaafe: Audio feature extractor - Télécom ParisTech
Technicolor Sync Audio Watermarking technologies

Target users and customers
• Content owners
• Studios
• Broadcasters
• Content distributors

Partners: Technicolor

Application sectors
Technicolor Sync Audio Watermarking allows studios and content owners:
• to create more valuable and attractive content by delivering premium-quality information
• to generate additional earnings through targeted ads, e-commerce and product placement alongside main-screen content
Technicolor Sync Audio Watermarking allows broadcasters and content distributors:
• to provide distinctive content and retain audiences
• to control complementary content on the 2nd screen within their branded environment
• to leverage real-time, qualified behavior metadata to better understand customers and deliver personalized content and recommendations
ContentArmor™ Audio Watermarking allows content owners to deter content leakage by tracking the source of pirated copies.
Sync Audio Watermarking

Contact details:
Gwenaël Doërr
gwenael.doerr@technicolor.com
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs / CS 17616
35576 Cesson-Sévigné
France
http://www.technicolor.com

Technical requirements:
• The Technicolor Sync Audio Watermarking detector works on Android and iOS.
• The watermark embedder of both technologies works on Linux and MacOS.

Conditions for access and use:
Both systems can be licensed as software executables or libraries.

Description:
With Technicolor Sync Audio Watermarking technologies, studios, content owners, aggregators and distributors can sync live, recorded or time-shifted content and collect qualified metadata. Thanks to Technicolor's expertise in both watermarking and entertainment services, these solutions are easily integrated into existing post-production, broadcast and new media delivery workflows. Technicolor sync technologies open access to all the benefits of attractive new companion-app markets with no additional infrastructure cost. Content identification and a time stamp are inaudibly inserted into the audio signal in post-production or during broadcast. The 2nd-screen device picks up the audio signal, decodes the watermark and synchronizes the app on the 2nd screen using the embedded content identification data. Audio watermarking uses the original content audio signal as its transmission channel, ensuring compatibility with all existing TVs, PVRs and DVD/Blu-ray players as well as legacy devices without network interfaces. It works for real-time, time-shifted and recorded content.
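To illustrate the principle of hiding an identifier in the audio itself, here is a toy spread-spectrum watermark: each payload bit is spread over a pseudo-random carrier added at low amplitude, and detection correlates against the same carrier. This is a textbook illustration only, far from a production-grade, psychoacoustically shaped scheme such as Technicolor's.

```python
# Toy spread-spectrum audio watermark: embed and detect payload bits.
import numpy as np

CHIP_LEN = 4096   # samples per payload bit
ALPHA = 0.005     # embedding strength, kept small to stay (mostly) inaudible

def carrier(bit_index: int, key: int) -> np.ndarray:
    """Pseudo-random +/-1 sequence; the key is shared by embedder and detector."""
    rng = np.random.default_rng(key + bit_index)
    return rng.choice([-1.0, 1.0], size=CHIP_LEN)

def embed(audio: np.ndarray, bits: list[int], key: int) -> np.ndarray:
    """Add one low-amplitude carrier per bit; bit value flips the carrier sign."""
    out = audio.copy()
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        out[i * CHIP_LEN:(i + 1) * CHIP_LEN] += ALPHA * sign * carrier(i, key)
    return out

def detect(audio: np.ndarray, n_bits: int, key: int) -> list[int]:
    """Correlate each chunk with its carrier; the correlation sign is the bit."""
    bits = []
    for i in range(n_bits):
        chunk = audio[i * CHIP_LEN:(i + 1) * CHIP_LEN]
        bits.append(int(np.dot(chunk, carrier(i, key)) > 0))
    return bits
```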
Speech And Music Segmenter and Annotator

Target users and customers
The targeted users and customers are multimedia industry actors and all academic or industrial laboratories interested in audio document processing.

Partners: Inria

Application sectors
• Audio and multimedia document processing
SAMuSA: Speech And Music Segmenter and Annotator

Contact details:
General issues: Patrick Gros, patrick.gros@irisa.fr
Technical issues: Sébastien Campion, scampion@irisa.fr
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex
France
http://www.irisa.fr/

Technical requirements:
• PC with Unix/Linux OS

Conditions for access and use:
SAMuSA is a software developed at Irisa in Rennes and is the property of CNRS and Inria. SAMuSA is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis.

Description:
The SAMuSA module takes an audio file or stream as input and returns a text file containing the detected segments of speech, music and silence. To perform segmentation, SAMuSA uses audio class models as external resources. It also calls external tools for audio feature extraction (the Spro software [1]) and for audio segmentation and classification (the Audioseg software [2]); these tools are included in the SAMuSA package. Trained on hours of various TV and radio programs, the module provides efficient results: 95% of speech and 90% of music are correctly detected. One hour of audio can be processed in approximately one minute on a standard computer.

[1] http://gforge.inria.fr/projects/spro/
[2] http://gforge.inria.fr/projects/audioseg/

SAMuSA was developed at Irisa/INRIA Rennes by the Metiss team. The SAMuSA authors are Frédéric Bimbot, Guillaume Gravier and Olivier Le Blouch. The Spro author is Guillaume Gravier. The Audioseg authors are Mathieu Ben, Michaël Betser and Guillaume Gravier.
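The class-model-plus-segmentation pipeline can be illustrated with a minimal sketch: classify each feature frame against per-class models, then merge runs of identical labels into segments, dropping very short runs. This is a generic illustration under those assumptions, not the Spro/Audioseg code; `models` maps each class name to any object with a `score` method (e.g. a Gaussian mixture).

```python
# Sketch: frame-wise audio classification smoothed into labeled segments.
import numpy as np

CLASSES = ("speech", "music", "silence")

def classify_frames(features: np.ndarray, models: dict) -> list[str]:
    """Pick, per frame, the class whose model gives the highest score."""
    return [max(CLASSES, key=lambda c: models[c].score(f.reshape(1, -1)))
            for f in features]

def to_segments(labels: list[str], hop_s: float = 0.01, min_len: int = 50):
    """Merge consecutive identical labels into (start_s, end_s, label) segments."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_len:  # drop very short, likely spurious runs
                segments.append((start * hop_s, i * hop_s, labels[start]))
            start = i
    return segments
```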
Yaafe is a low-level and mid-level audio feature extractor, designed to extract a large number of features over large audio files.

Target users and customers
Targeted integrators and users are industrial or academic laboratories in the field of audio signal processing, in particular for music information retrieval tasks.

Partners: Télécom ParisTech

Application sectors
• Music information retrieval
• Audio segmentation
Yaafe: Audio feature extractor

Contact details:
S. Essid
slim.essid@telecom-paristech.fr
Télécom ParisTech
37 rue Dareau
75014 Paris
France
http://www.tsi.telecomparistech.fr/aao/en/2010/02/19/yaafe-audio-feature-extractor/

Technical requirements:
Yaafe is a C++/Python software available for Linux and Mac.

Conditions for access and use:
Yaafe has been released under the LGPL licence and is available for download on Sourceforge. Some mid-level features are available in a separate library, under a proprietary licence.

Description:
Yaafe is designed to extract a large number of features simultaneously, in an efficient way. It automatically optimizes the features' computation, so that each intermediate representation (spectrum, CQT, envelope, etc.) is computed only once. Yaafe works in streaming mode, so it has a low memory footprint and can process arbitrarily long audio files. Available features include spectral features, perceptual features (loudness), MFCC, CQT, chroma, chords and onset detection. Users can select their own set of features and transformations (derivative, temporal integration) and easily adapt all parameters to their own task.
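A short usage sketch of Yaafe's Python interface, following the pattern documented by the project (a feature plan compiled into a dataflow engine); exact class and parameter names may vary with the Yaafe version installed, so treat this as an approximation rather than a guaranteed API.

```python
# Extract MFCC and loudness from one file with yaafelib (API per Yaafe docs).
from yaafelib import FeaturePlan, Engine, AudioFileProcessor

fp = FeaturePlan(sample_rate=16000)
fp.addFeature('mfcc: MFCC blockSize=512 stepSize=256')   # feature-spec strings
fp.addFeature('loudness: Loudness')

engine = Engine()
engine.load(fp.getDataFlow())    # compile the plan; shared steps computed once

afp = AudioFileProcessor()
afp.processFile(engine, 'track.wav')
feats = engine.readAllOutputs()  # dict: feature name -> numpy array
print(feats['mfcc'].shape)
```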
Document Processing

• Colorimetric Correction System - Jouve
• Document Classification System - Jouve
• Document Layout Analysis System - Jouve
• Document Reader - A2iA
• Document Structuring System - Jouve
• Grey Level Character Recognition System - Jouve
• Handwriting Recognition System - Jouve
• Image Descreening System - Jouve
• Image Resizing for Print on Demand Scanning - Jouve
• Recognition of Handwritten Text - RWTH Aachen University
A specific tool to create a suitable colorimetric correction and check its stability over time

Target users and customers
Everyone who has to deal with high colorimetric constraints.

Partners: Jouve

Application sectors
• Patrimony
• Industry
Colorimetric Correction System

Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
The system uses a file containing the reference values of a calibration target and the image obtained by scanning that target. A profile is created from this file. To improve the correction, a color transformation table is integrated into the system. To guarantee the required quality, the system repeatedly checks the values of a calibration target against the specifications.
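As an illustration of building a correction from a calibration target, the sketch below fits an affine color transform by least squares from measured patch values to their reference values, and includes the kind of stability check the text mentions. A real profile would use a fuller model (per-channel curves plus a 3D lookup table), so this is a minimal stand-in, not Jouve's method.

```python
# Fit and apply an affine color correction from calibration-target patches.
import numpy as np

def fit_correction(measured: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """measured, reference: (n_patches, 3) RGB arrays. Returns a 3x4 matrix M
    such that reference ~= [measured | 1] @ M.T (affine least squares)."""
    x = np.hstack([measured, np.ones((len(measured), 1))])
    sol, *_ = np.linalg.lstsq(x, reference, rcond=None)
    return sol.T

def apply_correction(image: np.ndarray, m: np.ndarray) -> np.ndarray:
    """image: (h, w, 3) floats in [0, 1]; applies the fitted correction."""
    flat = image.reshape(-1, 3)
    flat = np.hstack([flat, np.ones((len(flat), 1))]) @ m.T
    return flat.reshape(image.shape).clip(0.0, 1.0)

def max_patch_error(measured, reference, m) -> float:
    """Stability check over time: worst per-patch deviation after correction."""
    x = np.hstack([measured, np.ones((len(measured), 1))])
    return float(np.abs(x @ m.T - reference).max())
```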
A generic tool for classifying documents, based on a hybrid learning technique

Target users and customers
Everyone who has to deal with document classification and has a large amount of already-classified documents.

Partners: Jouve

Application sectors
• Industrial property
• Scientific publishing
Document Classification System

Contact details:
Gustavo Crispino
gcrispino@jouve.fr
Jouve R&D
30, rue du Gard
62300 Lens
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
The fully automatic system is based on linguistic resources extracted from already-classified documents. On a 100-class patent pre-classification task, the system achieves 85% precision (5% better than human operators on this task).
A generic tool to identify and extract regions of text by analyzing connected components

Target users and customers
Everyone who has to deal with document image analysis. Layout analysis is the first major step in a document image analysis workflow. The correctness of the output of page segmentation and region classification is crucial, as the resulting representation is the basis for all subsequent analysis and recognition processes.

Partners: Jouve

Application sectors
• Industry
• Service
• Patrimony
• Publishing
• Administration
Document Layout Analysis System

Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
The system identifies and extracts regions of text by analyzing connected components constrained by black and white (background) separators; the rest is filtered out as non-text. First, the image is binarized, any skew is corrected and black page borders are removed. Subsequently, connected components are extracted and filtered according to their size (very small components are discarded).
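A minimal sketch of the connected-component step described above, using scipy.ndimage on an already binarized page; the size threshold is an assumption chosen for the example, not Jouve's parameter.

```python
# Extract size-filtered connected components from a binarized page image.
import numpy as np
from scipy import ndimage

def text_components(binary: np.ndarray, min_pixels: int = 30):
    """binary: 2D bool array, True where ink is present.

    Returns bounding boxes (row slice, col slice) of components large
    enough to plausibly be text; tiny specks are filtered out."""
    labeled, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labeled, index=np.arange(1, n + 1))
    boxes = ndimage.find_objects(labeled)
    return [box for box, size in zip(boxes, sizes) if size >= min_pixels]
```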
Classification of all types of paper documents, data extraction, and mail processing and workflow automation

Target users and customers
• Independent Software Vendors
• Business Process Outsourcers

Partners: A2iA

Application sectors
Banks, insurance, administration, telecom and utility companies, historical document conversion
Document Reader

Contact details:
Venceslas Cartier
venceslas.cartier@a2ia.com
A2iA
39, rue de la Bienfaisance
75008 Paris
France
www.a2ia.com

Technical requirements:
Wintel platform

Conditions for access and use:
Upon request

Description:
Classification of all types of paper documents. A2iA DocumentReader classifies digitized documents into user-defined classes or "categories" (letters, contracts, claim forms, accounts receivable, etc.) based on both their geometry and their content. The software analyzes the layout of items on the document. Then, using a general dictionary and trade vocabulary, it carries out a literal transcription of the handwritten and/or typed areas. A2iA DocumentReader can then extract key words or phrases in order to determine the category of the document.

Data extraction. A2iA DocumentReader uses three methods to extract data from all types of paper documents. Extraction from predefined documents: some documents (such as checks, bank documents and envelopes) are preconfigured within A2iA DocumentReader; the software recognizes their structure, the format of the data to be extracted and their location on the document. Extraction from structured documents: A2iA DocumentReader recognizes and extracts data within a fixed location on the document. Extraction from semi-structured documents: the layout of the document varies but the data to be extracted remains unchanged; A2iA DocumentReader locates this data by its format and the proximity of key words, wherever they appear on the document.

Mail processing and workflow automation. A2iA DocumentReader analyzes the entire envelope or folder holistically, just as a human would, to identify its purpose and subject matter (termination of subscription, request for assistance, change of address, etc.). All of the documents together can have a different meaning or purpose than a single document on its own. A2iA DocumentReader then transmits the digital data to the classification application in order to route the mail to the correct person or department. Mail is sent to the appropriate location as soon as it arrives: processing and response times are minimized, workflow is automated, and manual labor is decreased.
A generic tool to recognize the logical structure of documents from an OCR stream

Target users and customers
Everyone who has to deal with electronic document encoding from original source material and needs to preserve the hierarchical structure represented in the digitized document.

Partners: Jouve

Application sectors
• Industry
• Service
• Patrimony
• Administration
Document Structuring System

Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
The system recognizes the logical structure of documents from an OCR stream, in accordance with the descriptions of a model (DTD, XML Schema). The result is a hierarchically structured flow. The model encodes knowledge of both the macro-structure of the documents and the micro-structure of their content.
A recognition engine for degraded printed documents

Target users and customers
Everyone who has to deal with character recognition on grey-level images. Specifically targeted at low-quality documents, the system also outperforms off-the-shelf OCR engines on good-quality images.

Partners: Jouve

Application sectors
• Heritage scanning
• Printing
Grey Level Character Recognition System

Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
Unlike other OCR engines, this system processes grey-level images directly, without using a temporary black-and-white image. By using all the information present in the image, the system is able to recognize degraded characters.
Capture handwritten and machine-printed data from documents

Target users and customers
Everyone who has to deal with forms containing handwritten fields or to process incoming mail

Partners: Jouve

Application sectors
• Banking
• Healthcare
• Government
• Administration
Handwriting Recognition System

Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
The JOUVE ICR (Intelligent Character Recognition) engine is a combination of two complementary systems: HMMs and multidimensional recurrent neural networks. The engine has the advantage of dealing with input data of varying size and of taking context into account. JOUVE ICR further increases the recognition rate of handwritten fields in forms by exploiting links between the fields.
A system that removes annoying halftones in scanned images

Target users and customers
Everyone who has to deal with high-quality reproduction of halftone images.

Partners: Jouve

Application sectors
• Heritage scanning
• Printing
Image Descreening System

Contact details:
Christophe Lebouleux
clebouleux@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
Halftone is a process for reproducing photographs or other images in which the various tones of grey or color are produced by variously sized dots of ink. When a document printed with this process is scanned, a very unpleasant screening effect may appear. The system combines the removal of peaks in the Fourier transform of the image with a local Gaussian blur.
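The peak-removal idea can be sketched in a few lines of numpy: halftone screens show up as strong isolated peaks in the 2D spectrum, which can be notched out before transforming back. The peak-detection rule below (magnitude far above a local median) is a simplistic assumption, and a production descreener would be considerably more careful; this is an illustration of the principle, not Jouve's algorithm.

```python
# Toy FFT descreening: notch out spectral peaks caused by a halftone screen.
import numpy as np
from scipy import ndimage

def descreen(gray: np.ndarray, strength: float = 8.0) -> np.ndarray:
    """gray: 2D float image. Suppress isolated spectral peaks, keep the rest."""
    spec = np.fft.fftshift(np.fft.fft2(gray))
    mag = np.abs(spec)
    local_med = ndimage.median_filter(mag, size=9)
    peaks = mag > strength * (local_med + 1e-9)
    h, w = gray.shape
    peaks[h // 2 - 4:h // 2 + 5, w // 2 - 4:w // 2 + 5] = False  # keep low freqs
    # Shrink peak magnitudes to the local median while preserving phase.
    spec[peaks] = local_med[peaks] * spec[peaks] / (mag[peaks] + 1e-9)
    out = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
    return ndimage.gaussian_filter(out, sigma=0.6)  # light local blur, per the text
```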
A specific tool for recreating matter that was lost when scanning bound books.

Target users and customers
Everyone who has to deal with high-quality reproduction of bound books.

Partners: Jouve

Application sectors
• Heritage scanning
• Printing
Image Resizing for Print on Demand Scanning

Contact details:
Christophe Lebouleux
clebouleux@jouve.fr
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne
France
www.jouve.com

Technical requirements:
• Any POSIX-compliant system
• Grey-level or color images

Conditions for access and use:
Ask Jouve

Description:
In many cases, when documents have been debound before scanning (which destroys part of the original), we are asked to provide an image at the original size, and sometimes to provide images larger than the original for reprint purposes. Using the seam carving technique, we are able to obtain very realistic results.
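Seam carving itself is a published algorithm (Avidan & Shamir, 2007), so a compact sketch is easy to give: compute a gradient-energy map, find the lowest-energy vertical seam by dynamic programming, and duplicate it to widen the image by one column. This is the textbook version for grey-level images, not Jouve's production code.

```python
# Minimal seam carving: duplicate the lowest-energy vertical seam to widen
# a grey-level image by one column (repeat for larger enlargements).
import numpy as np

def energy(gray: np.ndarray) -> np.ndarray:
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def min_vertical_seam(e: np.ndarray) -> np.ndarray:
    """Dynamic programming: column index of the cheapest seam, per row."""
    h, w = e.shape
    cost = e.copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):          # backtrack within +/-1 column
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

def widen_by_one(gray: np.ndarray) -> np.ndarray:
    """Insert an averaged copy of the seam pixel next to each seam position."""
    seam = min_vertical_seam(energy(gray))
    rows = [np.insert(row, x + 1, (row[x] + row[min(x + 1, len(row) - 1)]) / 2)
            for row, x in zip(gray, seam)]
    return np.array(rows)
```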
Recognition of handwritten text transforms handwritten text into machine-readable text on a computer.

Target users and customers
• Researchers
• Developers
• Integrators

Partners: RWTH Aachen University

Application sectors
Recognition of printed or handwritten text is heavily used in the mass processing of paper mail, filled-out forms and letters (e.g. to insurance companies), and has been covered by the media in connection with the mass digitization of books. New usage patterns will evolve from better coverage of handwriting and of difficult writing systems such as Arabic or Chinese, and from the recognition of text in any form of image data, which, due to digital cameras and the Internet, is being produced and distributed in ever-increasing volumes.
Recognition of Handwritten Text

Contact details:
Volker Steinbiss steinbiss@informatik.rwth-aachen.de
RWTH Aachen University, Lehrstuhl Informatik 6, Templergraben 55, 52072 Aachen, Germany
http://www-i6.informatik.rwth-aachen.de

Technical requirements:
The text needs to be available in digitized form, e.g. as part of a digital image or video produced by a scanner. Processing takes place on a normal computer.

Conditions for access and use:
RWTH does not currently provide public access to software in this area. Any usage should be subject to a bilateral agreement.

Description:
Optical character recognition (OCR) works sufficiently well on printed text but is particularly difficult for handwritten material, which exhibits far higher variability than print. Methods that have proven successful in related areas such as speech recognition and machine translation are being exploited to tackle this class of OCR problems.
Object Recognition & Image Clustering

• Image Clusterization System - Jouve
• Image Identification System - Jouve
• LTU Leading Image Recognition Technologies - LTU technologies
A generic tool to perform automatic clustering of scanned images

Target users and customers
Everyone who has to group a large set of images in such a way that images in the same group are more similar to each other than to those in other groups, for instance in incoming-mail processing.

Partners: Jouve

Application sectors
• Banking
• Insurance
• Industry
Image Clusterization System

Contact details:
Jean-Pierre Raysz jpraysz@jouve.fr
Jouve R&D, 1, rue du Dr Sauvé, 53000 Mayenne, France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
Two kinds of methods have been implemented. The first applies optical character recognition to the pages, then computes distances between the images to classify and the images contained in a database of labeled images. The second randomly selects a pool of images inside a directory; for each image, invariant key points are extracted and characteristic features are computed (SIFT or SURF) to build the clusters.
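The second method can be illustrated with a minimal Python sketch, assuming OpenCV with SIFT available and scikit-learn: descriptors are pooled into a bag-of-visual-words histogram per image before k-means grouping. The vocabulary and cluster counts are illustrative, not Jouve's settings:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_images(paths, n_words=64, n_clusters=5):
        sift = cv2.SIFT_create()
        per_image = []
        for p in paths:
            img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
            _, desc = sift.detectAndCompute(img, None)   # invariant key points
            per_image.append(desc)
        # Quantize all descriptors into a small visual vocabulary.
        vocab = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(per_image))
        hists = np.array([
            np.bincount(vocab.predict(d), minlength=n_words) / len(d)
            for d in per_image])
        # Group images by the similarity of their histograms.
        return KMeans(n_clusters=n_clusters, n_init=4).fit_predict(hists)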
A generic tool to automatically identify documents, photos and text zones in scanned images

Target users and customers
Everyone who has to deal with document recognition, such as identity cards, passports, invoices…

Partners: Jouve

Application sectors
• Administration
• Banking
• Insurance
Image Identification System

Contact details:
Jean-Pierre Raysz jpraysz@jouve.fr
Jouve R&D, 1, rue du Dr Sauvé, 53000 Mayenne, France
www.jouve.com

Technical requirements:
Any POSIX-compliant system

Conditions for access and use:
Ask Jouve

Description:
The system searches for the best match between image signatures and model signatures, determining whether the corresponding model is present in the image to be segmented. The segmentation defined on the model is then transferred to the image by applying an affine transformation (translation, rotation and homothety, i.e. uniform scaling).
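The principle can be sketched in Python with OpenCV, using ORB features as a stand-in for the unspecified image signatures; the estimated transform is restricted to translation, rotation and uniform scaling, matching the description:

    import cv2
    import numpy as np

    def transfer_segmentation(model_img, target_img, model_zones):
        # model_zones: list of (N, 2) corner arrays drawn on the model.
        orb = cv2.ORB_create(1000)
        k1, d1 = orb.detectAndCompute(model_img, None)
        k2, d2 = orb.detectAndCompute(target_img, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        src = np.float32([k1[m.queryIdx].pt for m in matches])
        dst = np.float32([k2[m.trainIdx].pt for m in matches])
        # Translation + rotation + uniform scale (homothety), robust to outliers.
        M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
        # Report the model's segmentation into the image to be segmented.
        return [cv2.transform(z.reshape(-1, 1, 2).astype(np.float32), M)
                for z in model_zones]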
Leading Image Recognition Technologies

Target users and customers
• Brands
• Retailers
• Social media monitoring companies
• Research companies
• Government agencies

Partners: LTU technologies

Application sectors
• Visual brand intelligence: e-reputation, brand protection
• Media monitoring
• M-commerce and e-commerce: augmented reality, interactive catalogs, virtual shops, advanced search functionalities, etc.
• Visual asset management: image classification, image de-duplication, image filtering, moderation, etc.
LTU Leading Image Recognition Technologies

Contact details:
Frédéric Jahard fjahard@ltutech.com
LTU technologies
Headquarters: 132 rue de Rivoli, 75001 Paris, France, +33 1 53 43 01 68
US office: 232 Madison Ave, New York, NY 10016, USA, +1 646 434 0273
http://www.ltutech.com

Technical requirements:
Coming soon

Conditions for access and use:
Coming soon

Description:
Founded in 1999 by researchers from MIT, Oxford and Inria, LTU provides cutting-edge image recognition technologies and services to global companies and organizations such as Adidas, Kantar Media and Ipsos. LTU's solutions are available on demand with LTU Cloud or on premise with LTU Enterprise Software. These patented image recognition solutions enable LTU's clients to manage their visual assets effectively, both internally and externally, and to innovate by bringing their end users truly novel visual experiences. In an image-centric world, LTU's expertise runs the image recognition gamut from visual search, visual data management, investigations and media monitoring to e-commerce, brand intelligence and mobile applications.
Music Processing

• AudioPrint - IRCAM
• Ircamaudiosim: Acoustical Similarity Estimation - IRCAM
• Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation - IRCAM
• Ircamchord: Automatic Chord Estimation - IRCAM
• Ircammusicgenre and Ircammusicmood: Genre and Mood Estimation - IRCAM
• Ircamsummary: Music Summary Generation and Music Structure Estimation - IRCAM
• Music Structure - Inria
AudioPrint captures the acoustical properties of a recording by computing a robust representation of the sound

Target users and customers
AudioPrint is dedicated to middleware integrators who wish to develop audio-fingerprint applications (i.e. systems for live recognition of music on air), as well as synchronization frameworks for second-screen applications (where a mobile device brings content directly related to the live TV program). The music recognition application can also be used by digital rights management companies.

Partners: IRCAM

Application sectors
• Second-screen software providers
• Digital rights management
• Music query software developers
AudioPrint

Contact details:
Frédérick Rousseau Frederick.Rousseau@ircam.fr
IRCAM, Sound Analysis/Synthesis, 1 Place Igor-Stravinsky, 75004 Paris, France
http://www.ircam.fr

Technical requirements:
AudioPrint is available as a static library for Linux, Mac OS X and iOS platforms.

Conditions for access and use:
Ircam Licence

Description:
AudioPrint is an efficient technology for live or offline recognition of musical tracks within a database of learnt tracks. It captures the acoustical properties of the audio signal by computing a symbolic representation of the sound profile that is robust to common alterations. Moreover, it provides a very precise estimate of the temporal offset within the detected musical track; this offset estimate can be used to synchronize devices.
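IRCAM's representation is proprietary, but the general family of techniques can be sketched in Python: spectrogram peaks survive common alterations, and hashing pairs of peaks preserves the temporal offset used for synchronization. All parameters here are invented for the illustration:

    import numpy as np
    import librosa
    from scipy import ndimage

    def fingerprint(path, n_fft=2048, hop=512):
        y, sr = librosa.load(path, sr=22050, mono=True)
        S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
        # Keep local spectral maxima: robust anchors in the sound profile.
        peaks = (S == ndimage.maximum_filter(S, size=25)) & (S > np.median(S))
        freqs, times = np.nonzero(peaks)
        order = np.argsort(times)
        freqs, times = freqs[order], times[order]
        hashes = {}
        for i in range(len(times)):
            for j in range(i + 1, min(i + 6, len(times))):  # pair nearby peaks
                dt = int(times[j] - times[i])
                if 0 < dt <= 64:
                    key = (int(freqs[i]), int(freqs[j]), dt)
                    hashes.setdefault(key, []).append(int(times[i]))
        return hashes  # matching keys across tracks also vote for the offset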
Ircamaudiosim estimates the acoustical similarity between two music tracks. It can be used to perform music recommendation based on music content similarity.

Target users and customers
Ircamaudiosim enables music recommendation based on music content similarity. It can therefore be used in any system (online or offline) requiring music recommendation, such as a recommendation engine for an online music service or for browsing an offline music collection.

Partners: IRCAM

Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamaudiosim: Acoustical Similarity Estimation

Contact details:
Frédérick Rousseau Frederick.Rousseau@ircam.fr
IRCAM, Sound Analysis/Synthesis, 1 Place Igor-Stravinsky, 75004 Paris, France
http://www.ircam.fr

Technical requirements:
Ircamaudiosim is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.

Conditions for access and use:
Ircam Licence

Description:
Ircamaudiosim estimates the acoustical similarity between two audio tracks. Each music track of a database is first analyzed in terms of its acoustical content (timbre, rhythm, harmony). An efficient representation of this content allows fast comparison between two tracks, which makes the system scalable to large databases. Given a target music track, the items of the database that are most similar in acoustical content can be found quickly and used to provide recommendations to the listener.
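A minimal Python sketch of the principle, assuming librosa: each track is reduced to a compact timbre descriptor so that comparing two tracks is a cheap vector distance, which is what makes large databases tractable. IRCAM's actual features (timbre, rhythm and harmony) and metric are richer than this illustration:

    import numpy as np
    import librosa

    def descriptor(path):
        # Compact acoustical summary: MFCC means and deviations.
        y, sr = librosa.load(path, sr=22050, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return np.r_[mfcc.mean(axis=1), mfcc.std(axis=1)]

    def most_similar(target_path, database):
        # database: dict mapping track id -> precomputed descriptor.
        d = descriptor(target_path)
        return sorted(database, key=lambda k: np.linalg.norm(database[k] - d))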
Ircambeat estimates the global and time-varying tempo and meter of a music file, as well as the positions of beats and downbeats over time.

Target users and customers
Tempo and meter are among the major perceptual characteristics of a music file. Estimating them automatically makes these values available for large collections, enabling automatic classification of large music collections, search by similarity and automatic play-list generation. The technology can therefore benefit music providers, online music portals and offline media-player developers.
Beats and downbeats define the time grid of a music file. They serve as a front end for the estimation of many other music parameters and for other processing (time-stretching, segmentation, DJ-ing). Their automatic estimation can therefore benefit music software developers (music production and DJ-ing software).

Partners: IRCAM

Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation

Contact details:
Frédérick Rousseau Frederick.Rousseau@ircam.fr
IRCAM, Sound Analysis/Synthesis, 1 Place Igor-Stravinsky, 75004 Paris, France
http://www.ircam.fr

Technical requirements:
Ircambeat is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.

Conditions for access and use:
Ircam Licence

Description:
Ircambeat automatically estimates the global and time-varying tempo and meter of a music file, as well as the positions of its beats and downbeats. Each digital music file is analyzed in terms of its time and frequency content in order to detect salient musical events. The periodicities of these events are then analyzed over time at various scales to obtain the tempo and meter. Beat and downbeat positions are estimated using music templates based on machine learning and music theory, yielding precise time positioning.
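For illustration, the kind of output Ircambeat produces can be obtained with librosa's standard beat tracker (a generic method, far simpler than Ircambeat's template-based, time-varying estimation); the file name is hypothetical:

    import librosa

    y, sr = librosa.load("track.wav", sr=None)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    print(f"global tempo: {float(tempo):.1f} BPM")
    print("first beats (s):", beat_times[:8])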
Ircamchord automatically estimates the temporal succession of chords (C major, C minor, …) that makes up a piece of music.

Target users and customers
One of the most important perceptual aspects of popular music is the succession of chords over time. Two tracks based on the same chord succession are perceived as very similar, and sometimes one is a cover version of the same composition. Automatic estimation of the chord succession can therefore be used for search by similarity and play-list generation, benefiting music providers and online music portals.
Chord notation is also very popular with beginner musicians (a very large number of guitar tabs are accessible and used over the web). Automatically estimating the chord succession of a given track can therefore benefit individual users through the inclusion of the technology in local software.

Partners: IRCAM

Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamchord: Automatic Chord Estimation

Contact details:
Frédérick Rousseau Frederick.Rousseau@ircam.fr
IRCAM, Sound Analysis/Synthesis, 1 Place Igor-Stravinsky, 75004 Paris, France
http://www.ircam.fr

Technical requirements:
Ircamchord is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.

Conditions for access and use:
Ircam Licence

Description:
Ircamchord automatically estimates the chord succession of a music track using a 24-chord dictionary (C major, C minor, …). The harmonic content of the music file is first extracted in a beat-synchronous way. A statistical model (a double-state hidden Markov model) representing music theory (chord transitions), expected downbeat positions and the estimated local key is then used for precise estimation.
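The front end can be sketched in Python by correlating beat-synchronous chroma vectors with 24 major/minor templates; Ircamchord's HMM over chord transitions, downbeats and local key is omitted from this illustration:

    import numpy as np
    import librosa

    NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    MAJ = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], float)  # root, 3rd, 5th
    MIN = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0], float)
    TEMPLATES = {f"{NOTES[r]}:maj": np.roll(MAJ, r) for r in range(12)}
    TEMPLATES.update({f"{NOTES[r]}:min": np.roll(MIN, r) for r in range(12)})

    def chords(path):
        y, sr = librosa.load(path, sr=22050)
        _, beats = librosa.beat.beat_track(y=y, sr=sr)
        chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
        # Beat-synchronous harmonic content, as in the description.
        sync = librosa.util.sync(chroma, beats, aggregate=np.mean)
        return [max(TEMPLATES, key=lambda c: float(np.dot(v, TEMPLATES[c])))
                for v in sync.T]      # one chord label per beat segment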
Ircammusicgenre and Ircammusicmood automatically estimate the membership of a music track in a set of music genre classes (electronica, jazz, pop/rock…) and music mood classes (positive, sad, powerful, calming…)

Target users and customers
Classification of music items is generally based primarily on music genre: electronica, jazz, pop/rock… However, editorial metadata related to genre are usually only available at the artist level: the whole set of tracks produced by one artist is assigned the same genre, whatever the content of the tracks. Ircammusicgenre automatically estimates the genre of each individual track. The list of genres considered by the software can be predetermined by Ircam (electronica, jazz, pop/rock…) or adapted to categories relevant to the partner, provided a sufficient number of sound examples per category. Ircammusicgenre can also perform multi-labeling of a track, i.e. assign a set of genre labels instead of a single genre; in this case a weight is assigned to each estimated label.
Ircammusicmood automatically estimates the mood that a track suggests: positive, sad, powerful, calming… As for genre, the list of moods can be predetermined by Ircam or discussed with the partner, and multi-labeling can also be applied.

Partners: IRCAM

Application sectors
• Online music providers
• Online music portals
Ircammusicgenre and Ircammusicmood: Genre and Mood Estimation

Contact details:
Frédérick Rousseau Frederick.Rousseau@ircam.fr
IRCAM, Sound Analysis/Synthesis, 1 Place Igor-Stravinsky, 75004 Paris, France
http://www.ircam.fr

Technical requirements:
Ircammusicgenre and Ircammusicmood are available as software or as dynamic libraries for Windows, Mac OS X and Linux platforms.

Conditions for access and use:
Ircam Licence

Description:
Ircammusicgenre and Ircammusicmood are based on the Ircamclassifier technology, which learns new concepts related to music content by training on example databases. A large set of audio features is extracted from labeled music items and used to find relationships between the labels and the example audio content. Ircamclassifier uses over 500 different audio features and performs automatic feature selection and statistical-model parameter selection. It applies a full binarization of the labels and a set of SVM classifiers; mono-labeling and multi-labeling are obtained from the set of SVM decisions. The performance and computation time of the resulting trained system are then optimized for the specific task, yielding a ready-to-use system for music genre or music mood.
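The classifier side can be sketched with scikit-learn: per-label binarization with a set of SVMs, whose probability outputs provide the weights used for multi-labeling. The random feature matrix below stands in for Ircamclassifier's 500+ selected audio features, and the genre labels are merely examples:

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.svm import SVC

    X = np.random.rand(200, 40)                    # placeholder audio features
    labels = [["pop/rock"], ["jazz"], ["electronica", "pop/rock"]] * 66 \
             + [["jazz"]] * 2
    Y = MultiLabelBinarizer().fit_transform(labels)  # one binary column per label

    clf = OneVsRestClassifier(SVC(kernel="rbf", probability=True))
    clf.fit(X, Y)
    weights = clf.predict_proba(X[:1])   # per-label weights for multi-labeling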
Ircamsummary automatically creates a music summary of an audio file. It also estimates the temporal structure of a music file in terms of repetitions of similar parts.

Target users and customers
Automatic music summary generation aims at providing an informative audio preview of the content of a music file (rather than the commonly used first 30 seconds). It can therefore benefit any service providing access to music items that requires a quick preview of the files, such as music providers and online music portals. It can also be installed on a personal computer to preview the user's local music collection.
Automatic music structure estimation describes the temporal organization of a music file in terms of the repetition of parts over time. It can be used for visualization and for interacting with the playback of a music file (intelligent forward/backward, jumping directly to the most repeated parts). It can benefit any developer of music players or music-interaction software.

Partners: IRCAM

Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamsummary: Music Summary Generation and Music Structure Estimation

Contact details:
Frédérick Rousseau Frederick.Rousseau@ircam.fr
IRCAM, Sound Analysis/Synthesis, 1 Place Igor-Stravinsky, 75004 Paris, France
http://www.ircam.fr

Technical requirements:
Ircamsummary is available as software or as a dynamic library for Windows, Mac OS X and Linux platforms.

Conditions for access and use:
Ircam Licence

Description:
Ircamsummary automatically generates music audio summaries using various strategies: the most representative extract (in terms of content repetition and content position), or downbeat-synchronous concatenation of the most representative parts. The summary can be parameterized by the user, in particular its duration (from 10 s to 30 s).
Ircamsummary also estimates the structure of a music file in terms of repeated parts (such as verse, chorus, bridge…, though without explicit labeling of the parts). It extracts the timbral, harmonic and rhythmic content of the file over time and analyzes content repetition using two strategies: sequence repetition and state repetition. The generation of the audio summary is parameterizable in type (a continuous summary, or a summary obtained by concatenating the most informative parts) and in duration; the structure estimation is parameterizable in the number of parts and the part type (sequence or state).
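The repetition analysis underlying structure estimation can be sketched in Python with librosa: in a beat-synchronous chroma self-similarity matrix, repeated sections appear as stripes parallel to the main diagonal. Ircamsummary's sequence/state strategies and summary rendering build on this kind of representation; the file name is hypothetical:

    import numpy as np
    import librosa

    y, sr = librosa.load("track.wav", sr=22050)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    _, beats = librosa.beat.beat_track(y=y, sr=sr)
    sync = librosa.util.sync(chroma, beats, aggregate=np.median)
    # Affinity between every pair of beat segments; stripes off the main
    # diagonal indicate repeated sequences (verse, chorus...).
    R = librosa.segment.recurrence_matrix(sync, mode="affinity", sym=True)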
Three music structure estimation systems

Target users and customers
• Music industry actors
• Industrial laboratories interested in automatic music analysis

Partners: Inria

Application sectors
• Music description
• Music indexing
• Music analysis and creation
Music Structure

Contact details:
Frédéric Bimbot frederic.bimbot@irisa.fr
Gabriel Sargent gabriel.sargent@irisa.fr
IRISA/PANAMA Research Team, Campus de Beaulieu, 35042 Rennes Cedex, France
https://team.inria.fr/panama/projects/music-structure/

Technical requirements:
All systems: PC or Mac with Matlab (Signal Processing and Statistics toolboxes).
System 1 (2010) requires the MFCC extractor from the MA Toolbox by Slaney and Logan, and the chroma and beat extractors developed by Ellis (Coversongs project, LabROSA).
System 2 (2011) requires the chord estimator by Ueda (University of Tokyo), the beat and downbeat trackers by Davies (INESC Porto), and the Matlab edit-distance script by Miguel Castro (Matlab Central).
System 3 (2012) requires the Chroma Toolbox by Müller and Ewert (Max-Planck-Institut für Informatik) and the beat and downbeat trackers by Davies (INESC Porto).

Conditions for access and use:
The three systems were developed at Irisa in Rennes and are the property of Université de Rennes 1, CNRS and Inria. They are currently prototypes provided by IRISA/PANAMA under the "Creative Commons Attribution-NonCommercial-ShareAlike 3.0" license (http://creativecommons.org/licenses/by-nc-sa/3.0/legalcode).

Description:
The three systems estimate the semiotic structure of a given music piece, i.e. a description of its macroscopic organization as a set of structural segments labeled according to the similarity of their musical content. They consist of three steps: feature extraction; segmentation, based on feature analysis under a time-regularity constraint; and labeling, based on hierarchical clustering.
System 1 (2010) uses timbre homogeneity, tonal content repetitions and short sound events for segmentation; the resulting segments are clustered according to their timbre. System 2 (2011) segments through chord repetitions; the resulting segments are clustered according to the similarity of their chord sequences. System 3 (2012) uses an internal model of the structural segments for segmentation; the resulting segments are clustered according to the similarity of their tonal content.
Authors: Gabriel Sargent, Frédéric Bimbot, Emmanuel Vincent
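The final labeling step can be illustrated with SciPy: structural segments, each summarized by a feature vector, are grouped by hierarchical clustering so that similar segments share a label. This is a generic sketch with placeholder data, not the exact models of Systems 1-3:

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    segment_features = np.random.rand(10, 12)   # placeholder: 10 segments
    Z = linkage(segment_features, method="average", metric="cosine")
    labels = fcluster(Z, t=0.4, criterion="distance")
    print(labels)   # segments carrying the same label repeat each other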
Indexing, Ranking and Retrieval

• SYRIX: Information Retrieval System in Context - IRIT
Information Retrieval System in Context

Target users and customers
The targeted users and customers are search engine actors and all industrial players interested in document retrieval.

Partners: IRIT

Application sectors
• Document retrieval
• Information recommendation
• Advertising
SYRIX: Information Retrieval System in Context

Contact details:
General issues: Lynda Tamine lechani@irit.fr
Mohand Boughanem bougha@irit.fr
IRIT/SIG team, 118, Route de Narbonne, 31062 Toulouse Cedex 09, France
http://www.irit.fr/

Technical requirements:
• PC with Unix/Linux
• This software requires a front-end search engine and the ODP ontology provided by the DMOZ editor (http://www.dmoz.org/)

Conditions for access and use:
SyRiX was developed at IRIT-SIG Toulouse and is the property of IRIT. SyRiX can be supplied under license on a case-by-case basis. For more information, please contact Lynda Tamine Lechani at Lynda.Lechani@irit.fr

Description:
SyRiX can be used either (1) as a contextual search engine in itself or (2) as a contextual document re-ranking component, intended to be plugged into a search engine in order to personalize the initial ranking using evidence drawn from the user profile. Figure 1 gives a general overview of SyRiX's main functionalities.
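Mode (2) can be sketched in Python: documents returned by the front-end engine are re-scored by mixing the engine score with similarity to the user profile. The vector representation, the mixing weight and the profile are all assumptions for illustration, not SyRiX internals:

    import numpy as np

    def rerank(results, profile, alpha=0.7):
        # results: list of (doc_id, engine_score, doc_vector); profile: vector.
        def personalized(item):
            _, score, vec = item
            sim = np.dot(vec, profile) / (
                np.linalg.norm(vec) * np.linalg.norm(profile) + 1e-9)
            return alpha * score + (1 - alpha) * sim  # contextual re-ranking
        return sorted(results, key=personalized, reverse=True)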
Content Analysis

• IRINTS: Irisa News Topic Segmenter - Inria
• Sentiment Analysis and Opinion Mining - Synapse Développement
• Persons, Places, Dates, Organizations & Events Recognition - Synapse Développement
• SlopPy: Slope One with Privacy - Inria
Topic segmentation of automatic speech transcripts

Target users and customers
The targeted users and customers are multimedia industry actors and any content or service provider with speech data.

Partners: Inria

Application sectors
• Spoken document processing
IRINTS: Irisa News Topic Segmenter

Contact details:
General issues: Patrick Gros patrick.gros@irisa.fr
Technical issues: Sébastien Campion scampion@irisa.fr
IRISA/Texmex team, Campus de Beaulieu, 35042 Rennes Cedex, France
http://www.irisa.fr/

Technical requirements:
• PC with Unix/Linux OS
• IRINTS requires a C compiler, Perl [1], the libxml2 library [2], and the TreeTagger software [3] to be installed on the system
[1] http://www.perl.org/
[2] http://xmlsoft.org/
[3] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Conditions for access and use:
IRINTS was developed at Irisa in Rennes and is the property of CNRS (DI 03033-01) and Inria. Registration at the Agency for Program Protection (APP) in France is currently in process. A license can be supplied on request, on a case-by-case basis.

Description:
IRINTS (Irisa News Topic Segmenter) was designed for topic segmentation of broadcast news transcripts. The distribution includes a front-end script, 'irints', which is a wrapper around the main 'topic-segmenter' program included herein (topic-segmenter, release 1.1 [1]). The topic-segmenter program performs topic segmentation of texts and (automatic) transcripts, mostly based on lexical cohesion, implementing (and extending) the method described in [2]. Several extensions were added, such as the use of alternate knowledge sources; for more details, see [3] (in French).
As shown in Figure 1, the input to IRINTS is an automatic transcript (in Vecsys's VOX format or IRISA's SSD format). The output is an XML file in SSD format specifying the topic segments.
[1] http://gforge.inria.fr/projects/topic-segmenter/
[2] Masao Utiyama and Hitoshi Isahara, "A Statistical Model for Domain-Independent Text Segmentation", ACL, 491-498, 2001.
[3] S. Huet, G. Gravier and P. Sébillot, "Un modèle multi-sources pour la segmentation en sujets de journaux radiophoniques", in Proc. Traitement Automatique des Langues Naturelles, 2008.
IRINTS was developed at Irisa in Rennes by the Texmex and Metiss teams. The IRINTS authors are Guillaume Gravier and Camille Guinaudeau.
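The lexical-cohesion cue can be sketched in plain Python: cohesion between adjacent windows of words is measured with cosine similarity, and cohesion drops are proposed as topic boundaries. IRINTS itself implements the probabilistic model of [2] with additional knowledge sources; this only illustrates the underlying signal, with invented window size and threshold:

    import math
    from collections import Counter

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in a if w in b)
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def boundaries(words, win=50, threshold=0.1):
        # Candidate topic shifts in a transcript given as a list of words.
        cuts = []
        for i in range(win, len(words) - win, win):
            left = Counter(words[i - win:i])
            right = Counter(words[i:i + win])
            if cosine(left, right) < threshold:   # lexical cohesion drops
                cuts.append(i)
        return cuts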
This technology synthesizes opinions at the elementary level of a subsection (e.g. a strongly negative opinion), at higher levels (comment, document) and across topics, in order to provide a new entry point into the content.

Target users and customers
Any organization that wants to qualify, follow and analyze the content it manages or that is created about it on the Internet.

Partners: Synapse Développement, Exalead, Technicolor, Yacast

Application sectors
• Monitoring of influence operations
• Fight against disinformation
• Mapping of opinion networks
• E-reputation
• Summary classification of consumer reviews
• Detection of positions on social networks
• Graphical analysis of reviews to highlight trends and key concepts
• Analysis of consumer insight for better understanding of the consumer
Sentiment Analysis and Opinion Mining

Contact details:
Patrick Séguéla patrick.seguela@synapse-fr.com (+33)(0)5.61.63.03.74
Synapse Développement, 33, rue Maynard, 31000 Toulouse, France
http://www.synapse-developpement.fr/
Other partner: Priberam, Lisbon, Portugal

Technical requirements:
No technical constraint. It can be accessed from Linux or Windows OS. An SDK is available for integration in programs or as Web services.

Conditions for access and use:
www.synapse-fr.com/sitepro/index.html

Description:
The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses that want to market their products, identify new opportunities and manage their reputations. As businesses move to automate the filtering of noise, the understanding of conversations and the identification of relevant content, many are now looking to the field of sentiment analysis. By investing in predictive analytics tools and other search solutions, businesses can gain valuable insights from their data and better serve the needs of their clients.
This technology synthesizes opinions at the elementary level of a subsection (e.g. a strongly negative opinion), at higher levels (comment, document) and across topics, providing a new entry point into the content. Each opinion is tagged with three pieces of information: 1/ its polarity; 2/ its intensity; 3/ a semantic category indicating the degree of involvement of the author.
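The shape of the three-field tag can be illustrated with a toy Python sketch; the lexicon entries and the involvement heuristic are invented for the example, whereas the actual technology rests on deep linguistic analysis:

    LEXICON = {"excellent": ("positive", 3), "good": ("positive", 2),
               "poor": ("negative", 2), "dreadful": ("negative", 3)}
    FIRST_PERSON = {"i", "my", "we", "our"}

    def tag_opinion(sentence):
        words = sentence.lower().split()
        hits = [LEXICON[w] for w in words if w in LEXICON]
        if not hits:
            return None
        polarities = [h[0] for h in hits]
        return {"polarity": max(set(polarities), key=polarities.count),  # 1/
                "intensity": max(h[1] for h in hits),                    # 2/
                "category": "personal" if FIRST_PERSON & set(words)      # 3/
                            else "reported"}

    print(tag_opinion("I think the battery life is dreadful"))
    # {'polarity': 'negative', 'intensity': 3, 'category': 'personal'}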
Competitive intelligence always concerns organizations, people, places, products, etc. This technology tags such information in a text flow.

Target users and customers
• Exploitation of large masses of unstructured data
• Anonymization of sensitive data
• Strategic, business or competitive intelligence

Partners: Synapse Développement, Exalead, Yacast

Application sectors
• Search and indexing in unstructured documents
• Document processing
• Machine reading and rich indexing
• Interconnection between metadata and unstructured content
• Intelligence, press and social network monitoring
• Automatic text understanding
• Automatic annotation of content
• Document classification
Persons, Places, Dates, Organizations & Events Recognition

Contact details:
Patrick Séguéla patrick.seguela@synapse-fr.com (+33)(0)5.61.63.03.74
Synapse Développement, 33, rue Maynard, 31000 Toulouse, France
http://www.synapse-developpement.fr/

Technical requirements:
No technical constraint. It can be accessed from Linux or Windows OS. An SDK is available for integration in programs or as Web services.

Conditions for access and use:
For specific conditions of use and to see our demo, please contact us: www.synapse-fr.com/sitepro/index.html. We can grant access to our demo website.

Description:
Competitive intelligence always concerns organizations, people, places, products, etc. This technology tags such information in a text flow. The information automatically annotated comprises: person names, functions, organizations, dates, events, places, addresses, phone numbers, e-mail addresses and amounts. The technology is accurate for all types of text, whatever the field. Whether applied to legal or military posts, journalistic dispatches on terrorist acts or economic news, it identifies the actors, their functions and relationships, as well as the details of the events encountered. Users can integrate their own dictionaries into the technology.
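The kind of annotation produced can be previewed with the open-source spaCy library as a stand-in (Synapse's own engine additionally covers functions, amounts and user dictionaries); this assumes the small English model has been installed with 'python -m spacy download en_core_web_sm':

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("On 12 May, Jean Dupont, CFO of Acme Corp, met officials in Toulouse.")
    for ent in doc.ents:
        print(ent.text, ent.label_)   # e.g. 'Jean Dupont PERSON', 'Toulouse GPE'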
Slope One with Privacy

Target users and customers
The targeted users and customers are all Internet actors providing personalized services to their users who are interested in integrating recommender systems that are more respectful of user privacy.

Partners: Inria

Application sectors
• Personalization
• Recommender systems
SlopPy: Slope One with Privacy

Contact details:
Sébastien Gambs sgambs@irisa.fr
Inria Rennes, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France
www.inria.fr
SlopPy was developed at Irisa/Inria Rennes by the CIDRE team, by Sébastien Gambs and Julien Lolive.

Technical requirements:
• PC with Java installed
• Access to the Tor anonymous communication network
• Installation of a library implementing homomorphic encryption, such as Bouncy Castle
• Deployment of a server responsible for creating and updating the matrices needed for the recommendation

Conditions for access and use:
SlopPy is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis.

Description:
SlopPy (for Slope One with Privacy) [1] is both a privacy-preserving version of the Slope One recommendation algorithm and a recommendation architecture built around this algorithm in which a user never directly releases his personal information (i.e., his ratings) to a trusted third party. The figure below illustrates the architecture of the SlopPy recommender system.
More precisely, in SlopPy each user first perturbs his data locally (Step 1) by applying a Randomized Response Technique (RRT), before sending this information through an anonymous communication channel (Step 2) to the entity responsible for storing it. This entity is assumed to be semi-trusted, sometimes called honest-but-curious: it is assumed to follow the directives of the protocol (i.e., it will not corrupt the perturbed ratings sent by a user or try to influence the output of the recommendation algorithm) but nonetheless tries to extract as much information as it can from the data it receives. From the perturbed ratings, the semi-trusted entity constructs two matrices (the deviation matrix and the cardinality matrix) following the Weighted Slope One algorithm (Step 3). When a user needs a recommendation for a particular movie, he queries these matrices through a variant of a private information retrieval scheme (Step 4) that hides the content of his query (i.e., the item he is interested in) from the semi-trusted entity. By combining the retrieved data (Step 5) with his true ratings (which, again, are only stored on his machine), the user can then locally compute the output of the recommendation algorithm for this particular item (Step 6).
[1] Sébastien Gambs and Julien Lolive. SlopPy: Slope One with Privacy. In DPM, September 2012.
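The Weighted Slope One core that the semi-trusted entity maintains (Step 3), and that the final local prediction combines with the user's own ratings (Step 6), can be sketched in Python; the privacy machinery (RRT perturbation, anonymous channel, private information retrieval) is deliberately omitted from this illustration:

    from collections import defaultdict

    def build_matrices(all_ratings):
        # all_ratings: one dict per user mapping item -> rating.
        dev = defaultdict(float)   # deviation matrix: sum of (r_i - r_j)
        card = defaultdict(int)    # cardinality matrix: co-rating counts
        for ratings in all_ratings:
            for i in ratings:
                for j in ratings:
                    if i != j:
                        dev[i, j] += ratings[i] - ratings[j]
                        card[i, j] += 1
        return dev, card

    def predict(user_ratings, item, dev, card):
        # Weighted Slope One prediction from the user's own ratings.
        num = sum((dev[item, j] / card[item, j] + r) * card[item, j]
                  for j, r in user_ratings.items() if card[item, j])
        den = sum(card[item, j] for j in user_ratings if card[item, j])
        return num / den if den else None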
Gesture Recognition

• MoveaTV: Motion Processing Engine for Interactive TV - Movea
Motion processing engine for interactive TV

Target users and customers
• Systems integrators
• OEMs
• Service providers
• Application developers wishing to take advantage of Movea's state-of-the-art motion processing technology

Partners: Movea

Application sectors
• Man-machine interaction
• Remote control
• Digital TV
• Video games
• Peripherals
• Smart home
MoveaTV: Motion Processing Engine for Interactive TV

Contact details:
Marc Attia m.attia@movea.com
Movea, 4 Avenue Doyen Louis Weil, 38000 Grenoble, France
www.movea.com

Technical requirements:
Programming (Java, C/C++, etc.)

Conditions for access and use:
To be discussed (usually an IP license, i.e. fees & royalties)

Description:
MoveaTV makes it easy to deliver advanced user interfaces, immersive motion gaming, gesture-based viewer authentication, and intuitive program guide navigation.
Video Analysis & Structuring

• AACI: Automatic Acquisition and Tracking of Mobile Targets in Image Sequences - Inria
• Audience Characterization - Technicolor
• C-Motion: Camera Motion Characterization - Inria
• ContentArmor™ Video Watermarking - Technicolor
• Crowd Sourced Metadata - Technicolor
• Face Detection, Recognition and Analysis - Karlsruhe Institute of Technology (KIT)
• Hybrid Broadcast Broadband Synchronization - Technicolor
• Movie Chaptering - Technicolor
• Multimedia Person Identification - Karlsruhe Institute of Technology (KIT)
• Soccer Event Detection - Technicolor
• VidSeg: Video Segmentation - Inria
• Violent Scenes Detection - Technicolor
Automatic acquisition and tracking of mobile targets in image sequences

Target users and customers
The targeted users and customers are multimedia industry actors and all academic or industrial laboratories interested in object tracking in videos.

Partners: Inria

Application sectors
• Target tracking
• Video analysis
• Multimedia document processing
AACI: Automatic Acquisition and Tracking of Mobile Targets in Image Sequences

Contact details:
General issues: Patrick Gros patrick.gros@irisa.fr
Technical issues: Sébastien Campion scampion@irisa.fr
IRISA/Texmex team, Campus de Beaulieu, 35042 Rennes Cedex, France
http://www.irisa.fr/

Technical requirements:
• PC with Unix/Linux
• This software requires the Motion2D software [1] developed by Inria, and OpenCV [2] developed by Intel, as third-party libraries.
[1] http://www.irisa.fr/vista/Motion2D/index.html
[2] http://opencv.willowgarage.com/wiki/

Conditions for access and use:
AACI was developed at Irisa/Inria-Rennes and is the property of Inria. AACI can be supplied under license on a case-by-case basis.

Description:
AACI is structured in four steps. First, dominant motion estimation is performed using the Motion2D software. Each pixel of the current frame is then labeled either as "conform to the dominant motion" or "non-conform to the dominant motion" by minimum-cut/maximum-flow minimization of a cost function, as described in [1]. Each detection is then added to a trellis and validated if it is persistent in size and position for a short period of time (the trellis depth), or discarded otherwise. Finally, each validated detection is tracked using the mean-shift algorithm, as explained in [2].
[1] J.-M. Odobez and P. Bouthemy, Separation of moving regions from background in an image sequence acquired with a mobile camera.
[2] D. Comaniciu, V. Ramesh, P. Meer, Kernel-Based Object Tracking.
AACI was jointly developed by the Vista team at Irisa/INRIA Rennes, BERTIN Technologies, and DGA (Direction Générale de l'Armement). The AACI authors are Florent Dutrech and Patrick Perez.
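The final tracking step can be sketched with OpenCV's mean-shift on a colour-histogram back-projection; the video file name and the initial detection window are assumptions, and the dominant-motion and trellis stages of AACI are not reproduced here:

    import cv2

    cap = cv2.VideoCapture("sequence.avi")         # hypothetical sequence
    ok, frame = cap.read()
    track_window = (200, 150, 60, 80)              # assumed validated detection
    x, y, w, h = track_window
    roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # Shift the window towards the densest region of the back-projection.
        _, track_window = cv2.meanShift(backproj, track_window, crit)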
Automatically characterize the in-home audience and its level of attention

Target users and customers
All content providers may be interested in automatic characterization of the in-home audience. When personalization of video (either video on demand or broadcast) or of ads is targeted, these same providers will see an interest in having this module to help automatically personalize the provided content. The audience characterization module may also be used by end users to manage their own content at home. Furthermore, content providers and advertisers will be interested in the 'level of attention' information provided by the module.

Partners: Technicolor

Application sectors
Personalization of the provided content, based on knowing who the audience is:
• VoD portals may be personalized and appropriate home pages displayed
• Ads may be personalized
• Matching videos and broadcast programs may be proposed
Audience Characterization

Contact details:
Louis Chevallier louis.chevallier@technicolor.com
Technicolor R&D France, 975, avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Technical requirements:
The current version of the module runs on a quad-core PC connected to two webcams. It is a multi-threaded Windows application programmed in C++.

Conditions for access and use:
The corresponding deliverables are all stated QL, i.e. this module is only available to a subset of PVAA partners, on request. The related IPL is the property of Technicolor.

Description:
The Audience Characterization module is connected to a webcam (or two webcams, to enlarge the field of view) placed near a TV screen. The module detects and tracks faces, evaluates the age class and gender of individuals, detects groups, and provides timed reports about the people detected. The eye-tracking module tracks eyes and evaluates the level of attention of each detected person; timed reports are provided. The module may also match detected faces against a small database of known faces (i.e. family members) to enhance the personalization of the provided content.
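The first stage (face detection on a webcam feed) can be sketched with OpenCV's stock Haar cascade; age/gender estimation, attention tracking and group detection are separate proprietary components not shown in this illustration:

    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(0)                     # first webcam

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:                # one box per detected face
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("audience", frame)
        if cv2.waitKey(1) == 27:                  # Esc quits
            break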
Camera motion characterization

Target users and customers
The targeted users and customers are multimedia industry actors and all academic or industrial laboratories interested in video analysis.

Partners: Inria

Application sectors
• Video indexing
• Multimedia document processing
C-Motion: Camera Motion Characterization

Contact details:
General issues: Patrick Gros patrick.gros@irisa.fr
Technical issues: Sébastien Campion scampion@irisa.fr
IRISA/Texmex team, Campus de Beaulieu, 35042 Rennes Cedex, France
http://www.irisa.fr/

Technical requirements:
• PC with Unix/Linux or Windows OS
• This software requires the Motion2D software [1] developed by Inria as a third-party library.
[1] http://www.irisa.fr/vista/Motion2D/index.html

Conditions for access and use:
C-Motion was developed at Irisa/Inria-Rennes and is the property of Inria. C-Motion can be supplied under license on a case-by-case basis.

Description:
C-Motion is dedicated to camera motion characterization. It relies on the Motion2D library developed by Inria for 2D parametric motion model estimation. For each frame in an image sequence or video, C-Motion outputs the estimated camera motion class. The motion classes correspond to the following situations:
• Static camera
• Pan (right, left, up, down, or a combination: right/up, right/down, left/up, left/down)
• Zoom/traveling (in or out)
• Complex camera motion
C-Motion was jointly developed by the Vista team at Irisa/INRIA Rennes and DGA (Direction Générale de l'Armement). The C-Motion authors are Marc Gelgon, Fabien Spindler and Patrick Bouthemy.
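Mapping an estimated 2-D parametric motion model onto these classes can be sketched in Python, assuming a per-frame model x' = A x + t of the kind Motion2D estimates; the thresholds and the image-coordinate convention (y grows downward) are assumptions for illustration, not C-Motion's decision rules:

    import numpy as np

    def classify_motion(A, t, pan_thresh=1.0, zoom_thresh=0.005):
        divergence = (A[0, 0] + A[1, 1]) / 2.0 - 1.0   # isotropic scaling part
        if np.linalg.norm(A - (1.0 + divergence) * np.eye(2)) > 0.01:
            return "complex camera motion"             # rotation/shear left over
        if abs(divergence) > zoom_thresh:
            return "zoom in" if divergence > 0 else "zoom out"
        if np.linalg.norm(t) > pan_thresh:
            parts = []
            if abs(t[0]) > pan_thresh:
                parts.append("right" if t[0] > 0 else "left")
            if abs(t[1]) > pan_thresh:
                parts.append("down" if t[1] > 0 else "up")
            return "pan " + "/".join(parts) if parts else "pan"
        return "static camera"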
ContentArmor™ Video Watermarking is a technology intended to deter actors along the content value chain from leaking content

Target users and customers
• Content owners
• Studios
• Post-production houses
• Content distributors

Partners: Technicolor

Application sectors
SECURE E-screener provides a traitor-tracing mechanism for all high-risk screeners: internal quality assessment and validation reviews, promotional tool preparation, screeners and promotional viewings for distributors.
• Fully automated process within the digital workflow
• Flexible integration of the embedder in existing content distribution frameworks
• Stronger reputation and controlled liability of the stakeholders
PREMIUM Video-on-Demand provides a serialization mechanism in home gateways, enabling fine-grained traitor tracing for premium content distributed to multiple devices in the home.
• Wider traitor-tracing coverage enabling early-window content release
• Easy technology integration in low-computational-power devices
• Secure implementation on dependable platforms, e.g. Conditional Access Systems
ContentArmor™ Video Watermarking

Contact details:
Gwenaël Doërr gwenael.doerr@technicolor.com
Technicolor R&D France, 975, avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Technical requirements:
ContentArmor™ Video Watermarking is dedicated to video encoded in H.264/MPEG-4 AVC with CABAC entropy coding (Main and High profiles). The system currently supports the following video containers: MPEG-2/TS, MOV, MP4, none. The profiler is Linux-based; the embedder is OS-independent.

Conditions for access and use:
The profiler and the embedder can be licensed as software executables or libraries; investigative services are currently not available for licensing. http://www.technicolor.com/en/solutions-services/technology/technology-licensing/content-armor-secure-digital-content

Description:
ContentArmor™ Video Watermarking is a technology intended to deter actors along the content value chain from leaking content. To do so, an invisible forensic watermark is embedded within the content to uniquely identify the device or recipient to whom it was delivered. Technicolor's two-step video watermarking algorithm (profiler + embedder) operates directly on the bitstream, resulting in blitz-fast embedding. Shifting the computationally expensive operations to a preliminary profiling step enables integration at any point of the distribution chain, including low-computational-power CE devices such as set-top boxes, tablets, etc.
Two-step watermarking system:
• Isolation of computationally intensive operations in an offline profiler
• One unique profiling pass per content item, regardless of the number of recipients
Bitstream watermarking:
• No need for re-encoding
• Blitz-fast embedding
• Seamless integration at any point of digital distribution workflows
High-performance technology:
• High-fidelity profiles tailored to the type of video content (animation movies, feature films, sports programs, documentaries, etc.)
• Proven robustness against crude signal processing attacks, including severe recompression, HDMI stripping, screencasting and camcording
• Flexible error correction to individually protect the ID bits
Crowd Sourced Metadata (pp. 132-133)
Automatically tags media content according to what the crowd says about it over the Web.

Target users and customers
• Professional customers: content owners, content providers, service providers
• Consumers
Partners: Technicolor

Application sectors
• Content targeting
• Content recommending
• Content retrieving
• Content discovering
• Content browsing
• Content replaying

Reviews and comments posted by Web contributors on dedicated sites have proven to carry insight from which valuable descriptive metadata can be extracted. These descriptive metadata, whether or not they are synchronized with the content timeline, support any of the above usages and services. Furthermore, the associated metadata raise the value of the related content over time, which is of interest to content owners as well as to content providers.
Contact details: Philippe Schmouker, philippe.schmouker@technicolor.com
Technicolor R&D France, 975 avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Description:
The "Content tagging according to crowd sourced metadata" module automatically extracts metadata from what the crowd says about media content on the Web. It currently extracts named entities from subtitles, comments and reviews, and extracts from posted comments both quotes of movie dialogue and quotes of other comments. Furthermore, it characterizes forum contributors according to their connections to other contributors and their behaviour on these forums over time. This characterization is expected to help determine which comments should be analysed first, since comments constitute an ever-growing stream of words that is not always of great interest. Dedicated Natural Language Processing and temporal graph analysis modules have been developed for the specific purpose of extracting descriptive metadata and, when possible, synchronizing them with the media timeline. The module aims to enrich the description of media content and to increase its value over time.

Technical requirements:
The module analyses large sets of comments and reviews – i.e. free text – posted by contributors on websites dedicated to cinema and TV. It currently runs as Python modules.

Conditions for access and use:
The corresponding deliverables are all stated QL – i.e. these modules are only available to a subset of PVAA partners, on reasoned request. The related IPL is the property of Technicolor.
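The catalogue does not disclose the module's internals; as a rough illustration of the named-entity step only, here is a sketch using the open-source spaCy library, which is not what Technicolor ships:

```python
# Rough illustration of named-entity extraction from user comments,
# using the open-source spaCy library; Technicolor's proprietary
# Python modules almost certainly differ.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline

comments = [
    "Al Pacino is unreal in this scene, pure Coppola magic.",
    "Shot in New York in 1974, and it still looks amazing.",
]

for text in comments:
    doc = nlp(text)
    tags = [(ent.text, ent.label_) for ent in doc.ents]
    print(tags)  # e.g. [('Al Pacino', 'PERSON'), ('Coppola', 'PERSON')]
```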
Face Detection, Recognition and Analysis (pp. 134-135)
Localize and identify faces and estimate age, gender and emotions.

Target users and customers
Companies interested in integrating face analysis into their products.
Partners: Karlsruhe Institute of Technology (KIT)

Application sectors
• Digital Signage
• User Interfaces / Human-Computer Interaction
• Entertainment
• Safety and Security
• Multimedia Analysis, Search & Retrieval
• Assistive Technologies
Contact details: Prof. Dr. Rainer Stiefelhagen, rainer.stiefelhagen@kit.edu
Karlsruhe Institute of Technology, Institute for Anthropomatics, Vincenz-Priessnitz-Str. 3, 76131 Karlsruhe, Germany
https://cvhci.anthropomatik.kit.edu/

Description:
This technology localizes and recognizes faces in images and videos. It operates in real time and is robust across very different source types. In addition to identification, the age, gender and emotion of each person can be estimated.

Conditions for access and use:
Available for licensing on a case-by-case basis.
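KIT's models are not public; as a hedged illustration of the first stage only (face localization), the sketch below uses the open-source OpenCV library:

```python
# Face localization only; KIT's system additionally recognizes
# identity, age, gender and emotion, which are not shown here.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("frame.jpg")                     # any video frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                          # one box per face
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", image)
```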
Hybrid Broadcast Broadband Synchronization (pp. 136-137)
Personalized audio, multi-view on multi-screen, hybrid stereoscopic 3D TV.

Target users and customers
• Broadcasters (satellite, cable, terrestrial)
• ISPs
Partners: Technicolor

Application sectors
Personalized audio: the technology offers users the possibility of enjoying a broadcast TV program in their favorite language. Additional languages are streamed on demand from a server and can be rendered either on the main TV screen or on a personal device (e.g. a smartphone with headphones).
Multi-view on multi-screen: the user can enrich the broadcast TV program (e.g. a live music concert or a sports event) by selecting additional points of view rendered on a second screen, e.g. a tablet.
Hybrid stereoscopic 3D TV: renders 3D side-by-side content without monopolizing a broadcast channel. One view is transmitted over a broadcast network whilst the other is delivered over the Internet. Each view can be rendered independently as 2D content.
Contact details: Anthony Laurent, anthony.laurent@technicolor.com
Technicolor R&D France, 975 avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Description:
Broadcasters and IPTV service providers would like to propose new value-added services but are confronted with bandwidth limitations. Hybrid broadcast broadband synchronization technology helps them overcome this constraint: it allows broadcast/IPTV service content to be enriched with additional audiovisual components delivered over broadband or stored locally, while ensuring very accurate rendering synchronization.

Technical requirements:
The technology ensures frame-accurate synchronization of audiovisual components delivered over different networks with different transport protocols and transmission delays, each network having its own reference clock. The principle is to insert a timeline component into the broadcast service. This component is linked to the current content and embeds timing information indicating the time elapsed since the beginning of the content. Its format is based on the existing DVB specification ETSI TS 102 823, "Specification for the carriage of synchronized auxiliary data in DVB transport streams".

Conditions for access and use:
The technology is currently available only as a prototype based on GStreamer. The related IPL is the property of Technicolor.
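As a toy illustration of the timeline principle, and nothing more: if both delivery paths carry the same elapsed-content-time value, the receiver can compare them to decide how much to buffer. The function and field names are invented; the real system follows ETSI TS 102 823.

```python
# Toy sketch of hybrid broadcast/broadband sync: each component carries
# a content timeline (milliseconds elapsed since content start), so the
# receiver can derive a buffering offset. Names are hypothetical.

def buffer_broadcast_ms(broadcast_timeline_ms: int,
                        broadband_timeline_ms: int) -> int:
    """Content-time lead of the broadcast component over the broadband
    one. If positive, buffer the broadcast path by this amount so both
    components render the same content instant (frame-accurate sync
    would additionally snap this to frame boundaries)."""
    return broadcast_timeline_ms - broadband_timeline_ms

# Broadcast frame says 90 200 ms elapsed; the latest broadband audio
# packet says 90 080 ms: hold the broadcast path back by 120 ms.
print(buffer_broadcast_ms(90_200, 90_080))  # -> 120
```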
Movie Chaptering (pp. 138-139)
Automatic temporal segmentation of video.

Target users and customers
• Content providers
• ISPs (Internet Service Providers)
• Video editing software companies
Partners: Technicolor

Application sectors
• Video structuring
• Video archiving
Contact details: Hassane Guermoud, Hassane.Guermoud@technicolor.com
Technicolor R&D France, 975 avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Description:
Movie chaptering is an unsupervised process that segments a movie into chapters. The rules used to form chapters respect the unity of time, place and action. The module can be embedded in a set-top box and executed offline to segment a movie recorded by the user, making it easier to browse through chapters and reach the sequence the user is looking for.

Technical requirements:
The current version of the module runs on a quad-core PC. It is a Linux and Windows application programmed in C++.

Conditions for access and use:
This module is only available to a subset of PVAA partners, on request. The related IPL is the property of Technicolor.
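A minimal sketch of the unsupervised idea, under the simplifying assumption that a chapter boundary is just a drop in visual similarity between consecutive shots; the real criteria of unity of time, place and action are much richer, and the features and threshold below are illustrative only.

```python
# Toy sketch of unsupervised chaptering: extend the current chapter
# while neighbouring shots stay visually similar, open a new one when
# similarity drops. Not Technicolor's C++ implementation.
import numpy as np

def chapters(shot_features: np.ndarray, threshold: float = 0.8) -> list[list[int]]:
    """shot_features: one L2-normalized feature vector per shot."""
    groups = [[0]]
    for i in range(1, len(shot_features)):
        similarity = float(shot_features[i] @ shot_features[i - 1])
        if similarity >= threshold:
            groups[-1].append(i)       # same chapter
        else:
            groups.append([i])         # visual break: new chapter
    return groups

rng = np.random.default_rng(0)
feats = rng.random((10, 16))           # stand-in per-shot descriptors
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
print(chapters(feats))                 # e.g. [[0, 1], [2], [3, 4, 5], ...]
```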
Multimedia Person Identification (pp. 140-141)
Identifying actors in movies and TV series.

Target users and customers
• Multimedia content providers
• Movie/TV streaming providers
• Movie/TV industry actors
Partners: Karlsruhe Institute of Technology (KIT)

Application sectors
• Movie/TV streaming & playback
• Second screen
Contact details: Prof. Dr. Rainer Stiefelhagen, rainer.stiefelhagen@kit.edu
Karlsruhe Institute of Technology, Institute for Anthropomatics, Vincenz-Priessnitz-Str. 3, 76131 Karlsruhe, Germany
https://cvhci.anthropomatik.kit.edu/

Description:
This technology identifies actors/characters in multimedia data such as movies and TV series. It first tracks faces/persons and subsequently provides an identity for each track. As such, it can be used to provide additional information about actors/characters while viewing the video.

Conditions for access and use:
Available for licensing on a case-by-case basis.
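The track-then-identify design can be sketched in a few lines: per-frame recognition is noisy, so one identity is emitted per track by aggregating frame-level predictions. The majority-vote rule below is an assumption for illustration, not necessarily KIT's aggregation.

```python
# Hedged sketch of per-track identification: aggregate noisy per-frame
# identity predictions into a single label for the whole face track.
from collections import Counter

def track_identity(frame_predictions: list[str]) -> str:
    """Majority vote over the per-frame predictions of one face track."""
    return Counter(frame_predictions).most_common(1)[0][0]

# One track, five frames; a single mis-recognition is voted away.
track = ["Walter", "Walter", "Jesse", "Walter", "Walter"]
print(track_identity(track))  # -> "Walter"
```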
Soccer Event Detection (pp. 142-143)
Automatically detects events/actions of interest in a soccer match.

Target users and customers
All content providers may be interested in the soccer event detection technology to manage their content databases efficiently. For repurposing applications, these same providers will value a module that automatically detects events of interest. The module may also be used by end users to manage their own content at home.
Partners: Technicolor

Application sectors
• Content structuring & browsing: knowing where the actions of interest are in a soccer match allows the match to be structured. It also gives direct access to events, enabling non-linear browsing of the document.
• Content retrieval: by adding metadata to content (i.e. event timecodes), the module finds a direct application in content retrieval from video databases.
• Content repurposing: knowing where the events are in a soccer match allows the content to be repurposed for other broadcasting channels such as the Internet or portable devices. In the latter case, building a summary from the detected events is useful for sending the repurposed content to mobile phones.
Contact details: Claire-Helene Demarty, claire-helene.demarty@technicolor.com
Technicolor R&D France, 975 avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Description:
The Soccer Event Detection module automatically detects events/actions of interest in soccer matches. Provided with a set of pre-computed features from a video stream, the system outputs a file with the corresponding event timecodes. The chosen pre-computed features have proven discriminative for soccer event detection. They feed a classification sub-system based on Bayesian networks, which was trained and whose parameters were learned offline on a database of soccer matches. Contrary to what exists in the literature, the structure of the Bayesian model was also learned automatically, without using any expert knowledge. This structure-learning property makes it possible to train another model on another type of video to detect other types of events, without building the model structure manually.

Technical requirements:
• The module takes a set of pre-computed features from a video stream as input.
• Currently runs under Matlab.

Conditions for access and use:
• The module is currently available as a Matlab prototype only.
• The corresponding deliverables are all stated QI – i.e. this module is only available to Quaero partners, on request. The related IPL is the property of Technicolor.
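For a feel of the classification step only, here is a stand-in sketch with scikit-learn's GaussianNB, a Bayesian classifier with a fixed naive structure. It is explicitly not the structure-learned Bayesian network described above, and the features and labels are synthetic.

```python
# Stand-in sketch: classify pre-computed per-shot features into
# "event of interest" vs "other" with a simple Bayesian classifier.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_train = rng.random((200, 5))      # pre-computed A/V features per shot
y_train = X_train[:, 0] > 0.7       # toy labelling rule, for demo only

clf = GaussianNB().fit(X_train, y_train)

X_match = rng.random((50, 5))       # features from a new match
events = np.flatnonzero(clf.predict(X_match))
print("shots flagged as events:", events)
```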
VidSeg: Video Segmentation (pp. 144-145)
Video segmentation: detection of cuts, dissolves, monochrome frames, silences and aspect-ratio changes (4/3 & 16/9).

Target users and customers
Multimedia industry actors, and all academic or industrial laboratories interested in video analysis.
Partners: Inria

Application sectors
• Multimedia document processing
• Video indexing
Contact details:
General issues: Patrick Gros, patrick.gros@irisa.fr
Technical issues: Sébastien Campion, scampion@irisa.fr
IRISA/Texmex team, Campus de Beaulieu, 35042 Rennes Cedex, France
http://www.irisa.fr/

Description:
VidSeg is a software tool dedicated to video segmentation. It detects cut and dissolve transitions in a video, along with additional information: monochrome frames, silences and aspect-ratio changes (from 4/3 to 16/9 or the reverse). VidSeg relies on the FFMPEG libraries for video decoding, and results are output as an XML file. VidSeg was developed at Irisa/Inria Rennes by the Texmex team; its authors are Manolis Delakis and Sébastien Campion.

Technical requirements:
• PC with a Unix/Linux OS
• The FFMPEG [1] software must be installed on the system as a third-party library.
[1] http://ffmpeg.org/

Conditions for access and use:
VidSeg was developed at Irisa/Inria-Rennes and is the property of Inria. It is registered at the Agency for Program Protection (APP) in France under the reference n°IDDN.FR.001.250009.000.S.P.2009.000.40000. VidSeg can be supplied under license on a case-by-case basis.
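This is not VidSeg's code, but the core cut-detection idea can be sketched with the open-source OpenCV library: declare a cut when consecutive frame histograms stop correlating, and write the result as XML. The XML element names below are invented, not VidSeg's actual schema.

```python
# Hedged sketch of shot-cut detection by frame-histogram correlation,
# with XML output in the spirit of VidSeg. Uses OpenCV, not VidSeg.
import cv2
import xml.etree.ElementTree as ET

def detect_cuts(path: str, threshold: float = 0.5) -> list[int]:
    cap = cv2.VideoCapture(path)
    cuts, prev_hist, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256] * 3)
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms => cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                cuts.append(index)
        prev_hist, index = hist, index + 1
    cap.release()
    return cuts

root = ET.Element("segmentation")              # hypothetical schema
for f in detect_cuts("movie.mp4"):
    ET.SubElement(root, "cut", frame=str(f))
ET.ElementTree(root).write("segmentation.xml")
```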
Violent Scenes Detection (pp. 146-147)
Automatically detects violent scenes in movies.

Target users and customers
All content providers may be interested in the violent scenes detection technology to manage their content databases efficiently. In particular, it is an interesting feature for VOD services, as it may help users select a movie suitable for the entire family.
Partners: Technicolor, Inria

Application sectors
• Content structuring & browsing: knowing where the most violent scenes are in a movie gives direct access to these events, enabling non-linear browsing of the document.
• Content retrieval: by adding metadata to content (i.e. violent-scene timecodes), the module finds a direct application in content retrieval from video databases.
Contact details: Claire-Helene Demarty, claire-helene.demarty@technicolor.com
Technicolor R&D France, 975 avenue des Champs Blancs, ZAC des Champs Blancs, CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Description:
The Violent Scenes Detection module automatically detects the most violent scenes in a movie. Provided with a set of pre-computed audio and video features from a video stream, the system outputs a file with the corresponding violent-scene timecodes. The chosen pre-computed features have proven discriminative for violent scenes detection. They feed a classification sub-system based on Bayesian networks, which was trained and whose parameters were learned offline on a movie database. The structure of the Bayesian model was also learned automatically, without using any expert knowledge. As with the Soccer Event Detection module above, this structure-learning property makes it possible to train another model on another type of video to detect other types of events, without building the model structure manually.

Technical requirements:
• The module takes a set of pre-computed features from a video stream as input.
• Currently runs under Matlab.

Conditions for access and use:
• The module is currently available as a Matlab prototype only.
• The corresponding deliverables are all stated QI – i.e. this module is only available to Quaero partners, on request. The related IPL is the property of Technicolor.
Application Demonstrators
Chromatik - p. 148 / MECA: Multimedia Enterprise CApture - p. 150 / MediaCentric® - p. 152 / MediaSpeech® product line - p. 154 / MobileSpeech - p. 156 / MuMa: The Music Mashup - p. 158 / OMTP: Online Multimedia Translation Platform - p. 160 / Personalized and social TV - p. 162 / PlateusNet - p. 164 / SYSTRANLinks - p. 166 / Voxalead Débat Public - p. 168 / Voxalead multimedia search engine - p. 170 / VoxSigma SaaS - p. 172

Chromatik (pp. 148-149)
Image search based on color content.

Application sectors
Catalogues, e-commerce, web design, stock photography management, etc.

Target users and customers
Any organization possessing masses of images may be interested in indexing them by color content. This makes it possible to search among images that have not been properly described with textual metadata.
Partners: Exalead
Contact details: Rémi Landais, remi.landais@3ds.com, +33 (0)1 55 35 26 26
Exalead SA, 10 place de la Madeleine, 75008 Paris, France
http://www.exalead.com/search/image/
http://chromatik.labs.exalead.com/

Description:
Within Chromatik, images may be searched by text or by color: the user clicks on one or several color squares, adjusts the proportion of each selected color, and/or selects a color category (bright vs. dark, colorful vs. greyscale). As the indexed pictures are obtained from the Flickr API, Chromatik benefits from the available metadata (location, owner id, tags, license) to improve browsing (for instance, searching images by geographical proximity). The user may also search by similarity, either by selecting an image from the index or by providing the URL of an image to upload.

Technical requirements:
The images must already be scanned and available in electronic form, in any format. Their indexing relies on the Exalead CloudView™ product and on the Exalead Chromatik indexing service.

Conditions for access and use:
Commercially available through Exalead: http://www.3ds.com/products/exalead/. Contact: contact@exalead.com
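Color-content indexing of the kind described can be pictured with plain NumPy: reduce each image to a normalized color histogram, then rank images by how closely they contain a query color in a requested proportion. This is a toy stand-in, not Exalead's CloudView indexing.

```python
# Toy sketch of color-based image search over color histograms.
import numpy as np

def color_histogram(pixels: np.ndarray, bins: int = 4) -> np.ndarray:
    """pixels: (N, 3) RGB array in [0, 255]. Returns a normalized,
    flattened 3D color histogram (the per-image index entry)."""
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def score(hist: np.ndarray, query_rgb, proportion: float,
          bins: int = 4) -> float:
    """1.0 when the image contains the query color in exactly the
    requested proportion; lower as the shares diverge."""
    idx = [min(int(c) * bins // 256, bins - 1) for c in query_rgb]
    flat = idx[0] * bins * bins + idx[1] * bins + idx[2]
    return 1.0 - abs(hist[flat] - proportion)

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (1000, 3)) for _ in range(5)]
index = [color_histogram(img) for img in images]
# Query: "about 30% of a dark red" over the whole collection.
ranked = sorted(range(5), key=lambda i: -score(index[i], (200, 30, 30), 0.3))
print("best match:", ranked[0])
```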
MECA: Multimedia Enterprise CApture (pp. 150-151)
Unified capture solution for incoming multimedia information.

Application sectors
• Banks and insurance companies
• Public companies
• Large companies with heterogeneous incoming flows (postal mail, email, voice server, etc.)

Target users and customers
Large companies and organizations that face an ever-increasing amount of incoming data and a growing diversity of information. The ongoing evolution and convergence of information technology, telecommunications and media are generating additional flows of unstructured multimedia information such as voice messages, pictures and video sequences.
Partners: Itesoft / A2iA / Vecsys / Vocapia / LTU technologies
Contact details: Vincent Ehrström, veh@itesoft.com
Phone: +33 (0)4 66 35 77 00 / Fax: +33 (0)4 66 35 77 01
ITESOFT, Parc d'Andron / Le Séquoia, 30470 Aimargues, France
www.itesoft.com

Description:
The Multimedia Enterprise Capture platform processes high volumes of incoming multimedia information. It captures the information automatically, then analyzes, classifies and indexes it independently of the media, offering a unified capture solution to large companies and organizations.
• Automatic capture system processing multimedia content
• Textual document extraction (machine-printed, handwritten)
• Multimedia content extraction (voice, picture)
• Automatic document classification
• Automatic document indexing
• Manual validation interface
• Export to CMS

Conditions for access and use:
Demo available: contact ITESOFT, Parc d'Andron, Le Séquoia, 30470 Aimargues, +33 (0)4 66 35 77 00.
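The media-independent pipeline above can be pictured as route-by-media-type followed by shared classification and indexing. The sketch below uses trivial stubs where MECA plugs in full engines (A2iA handwriting recognition, Vecsys/Vocapia speech transcription, LTU image recognition); everything here is illustrative, not ITESOFT's code.

```python
# Toy sketch of media-independent capture: extract text per media type,
# then classify and index everything with the same downstream code.

def extract_text(item: dict) -> str:
    extractors = {
        "mail_scan": lambda p: f"<OCR of {p}>",         # stub
        "email":     lambda p: f"<body of {p}>",        # stub
        "voice":     lambda p: f"<transcript of {p}>",  # stub
    }
    return extractors[item["media"]](item["payload"])

def classify(text: str) -> str:
    return "claim" if "claim" in text else "other"      # stub rule

inbox = [
    {"media": "mail_scan", "payload": "letter_0412.tiff"},
    {"media": "voice", "payload": "msg_0413.wav"},
]
index = [{"source": item["payload"],
          "text": extract_text(item),
          "class": classify(extract_text(item))} for item in inbox]
print(index)
```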
MediaCentric® (pp. 152-153)
Advanced solution for multimedia and multilingual content processing.

Application sectors
• Market issues: press clippings, monitoring reports, media analysis on specific interests (strategic intelligence, e-reputation, image studies, etc.)
• Governmental issues: open-source intelligence on defence or homeland security topics

Target users and customers
• Media intelligence companies
• Governmental intelligence agencies
Partners: Bertin Technologies / SYSTRAN / Vecsys
Contact details: Nabil Bouzerna, Nicolas Masson, mediacentric@bertin.fr
Bertin Technologies, Parc d'Activité du Pas du Lac, 10 bis avenue André Marie Ampère, 78180 Montigny-le-Bretonneux, France
www.bertin.fr

Description:
MediaCentric® is an advanced multimedia (video, audio, image and text) and multilingual content processing solution capable of handling, under tight time constraints, massive amounts of data from multiple sources: satellite and terrestrial broadcast TV/radio, Web TV, podcasts, UGC (User-Generated Content) and social media (Twitter, Facebook, IRC, etc.). The platform powers the whole process, from acquisition, monitoring, exploration and analysis to the dissemination of the critical pieces of information. By combining video, speech and image analysis technologies (face recognition, text extraction by OCR, etc.) with text mining and translation, MediaCentric® makes the most of the richness conveyed by today's media. Moreover, it offers a user-friendly interface, designed for promptness, to increase operator efficiency.

Technical requirements:
Acquisition devices and hardware depend on the targets and the amount of data to be processed (from a PC to a high-performance cluster).

Conditions for access and use:
All information available from Bertin Technologies.
MediaSpeech® product line (pp. 154-155)
Solutions for speech processing.

Application sectors
Media monitoring; media asset management; audio archive indexing; speech analytics in call centers; security/intelligence; etc.

Target users and customers
• Audio content managers
• Producers
• Editors
• Transcribers
• Researchers
• Monitors
• Analysts
Partners: Vecsys / Bertin Technologies / Exalead / Itesoft / Orange / Yacast
Contact details: Ariane Nabeth-Halber, anabeth@vecsys.fr
Vecsys, Parc d'Activité du Pas du Lac, 10 bis avenue André Marie Ampère, 78180 Montigny-le-Bretonneux, France
http://www.vecsys.fr

Description:
Vecsys has developed a family of highly efficient platforms for speech processing, typically offering services such as:
• spoken data extraction and partitioning
• speaker identification / speaker tracking
• automatic speech transcription
• speech and text synchronisation
The available MediaSpeech® solutions are the following:
• MediaSpeech® Factory: a high-availability distributed speech processing system (24/7) with a redundant cluster, designed and optimized to process huge volumes of audio data, with a high-efficiency process scheduler handling the processing queue and load balancing.
• MediaSpeech® Lite: a cost-effective solution, without redundancy, for deployment on a PC or standard server.
• MediaSpeech® VM: a virtual-machine solution.
• MediaSpeech® SaaS: a set of hosted WebServices using the full capabilities of MediaSpeech® Factory.
All MediaSpeech® solutions are accessible through the same WebServices communication interfaces (SOAP, REST, web pages, web content, etc.) and share the same API.

Technical requirements:
Standard Web access.

Conditions for access and use:
On-premises or in-the-cloud (SaaS). Quotation on request.
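Since the catalogue only states that SOAP and REST interfaces exist, the endpoint path, parameters and response shape in this transcription-request sketch are invented for illustration; only the protocol style is taken from the description.

```python
# Hypothetical REST call against a MediaSpeech®-style transcription
# service. URL, parameters and JSON fields are assumptions, not
# Vecsys' published API. Uses the open-source `requests` library.
import requests

with open("interview.wav", "rb") as audio:
    response = requests.post(
        "https://mediaspeech.example.com/api/transcribe",  # hypothetical
        params={"lang": "fr-FR"},                          # hypothetical
        files={"audio": audio},
        timeout=600,
    )
response.raise_for_status()
for segment in response.json().get("segments", []):        # hypothetical
    print(segment["start"], segment["speaker"], segment["text"])
```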
MobileSpeech (pp. 156-157)
A voice-activated command technology.

Application sectors
• Connected TV, connected home
• E-business, e-health, logistics, avionics, automotive

Target users and customers
• Smartphone and tablet owners; mobile people; hands-busy people
• Disabled or aged persons
• Enterprises with mobile applications
• Mobile application developers
Partners: Vecsys, Technicolor
Contact details: Ariane Nabeth-Halber, anabeth@vecsys.fr
Vecsys, Parc d'Activité du Pas du Lac, 10 bis avenue André Marie Ampère, 78180 Montigny-le-Bretonneux, France
http://www.vecsys.fr

Description:
MediaSpeech® Mobile is a standalone Automatic Speech Recognition solution for smartphones and tablets. The speech recognition engine is optimized to reduce memory usage and processing requirements. It works in file mode or streaming mode, with a vocabulary of up to several thousand words. It processes both constrained language (with standard W3C Java Speech format grammars) and natural language (with statistical models). Vecsys offers tools, an API and professional services to assist partners and customers in delivering successful applications. Such assistance includes:
• pronunciation dictionary adaptation; phonetic or acoustic adaptation
• grammar development; statistical language model development and adaptation to meet target performance
• accuracy validation and advice for the specification and integration phases

Technical requirements:
Mobile OS (Android, iOS) or embedded OS.

Conditions for access and use:
Android/iPhone/embedded-OS package. Quotation on request.
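To make the constrained-language mode concrete, here is a small grammar in the Java Speech Grammar Format the description refers to, wrapped in a Python string; the engine call in the trailing comment is hypothetical, as Vecsys' API is not public.

```python
# Example of the kind of constrained-language grammar MobileSpeech can
# process (standard Java Speech Grammar Format, JSGF). The grammar text
# is plain JSGF; the surrounding engine API is a made-up placeholder.
COMMAND_GRAMMAR = """
#JSGF V1.0;
grammar tv_remote;
public <command> = <action> <target>;
<action> = (switch | turn) (on | off) | mute | pause;
<target> = the (tv | sound | recording);
"""

# Hypothetical usage, for illustration only:
# engine = mediaspeech_mobile.Engine(grammar=COMMAND_GRAMMAR)
# print(engine.recognize("command.wav"))  # -> "turn off the tv"
```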
MuMa: The Music Mashup (pp. 158-159)
An innovative search engine dedicated to music.

Application sectors
Music editors, distributors or any organization needing a unified view of its business items (artists, in the Cloudmine context) built from different sources.

Target users and customers
Cloudmine may interest anybody interested in music. The general concept of gathering and integrating information from multiple web sources may also be extended to any other domain, and would then concern anybody using a search engine!
Partners: Exalead, IRCAM, Télécom ParisTech
Contact details: Jean-Marc Finsterwald, jean-marc.finsterwald@3ds.com, +33 (0)1 55 35 26 26
Exalead SA, 10 place de la Madeleine, 75008 Paris, France
http://music.labs.exalead.com/

Description:
MuMa, the Music Mashup, is an innovative search engine dedicated to music. It collects songs and information about them (artists, titles, albums, lyrics, concerts, tweets, pictures, biographies, prices, etc.) from reference sources on the Web and displays everything in a unique mashup view. Thanks to the collaboration with Ircam and Télécom ParisTech, MuMa also analyzes the content of the songs themselves, allowing the user to search through music by chords, moods, genres, or type of drums/guitar. The user may also browse MuMa content using search by similarity. Last but not least, the user may play the query on a real keyboard.

Technical requirements:
For each domain, reference sources must be clearly identified. Indexing relies on the Exalead CloudView™ product.

Conditions for access and use:
Commercially available through Exalead: http://www.3ds.com/products/exalead/. Contact: contact@exalead.com
OMTP: Online Multimedia Translation Platform (pp. 160-161)
A new generation of machine translation services for translating multimedia documents.

Application sectors
OMTP can be used for media monitoring, business intelligence or public security applications, or in any other context where multimodal, multilingual data have to be processed (e.g. language service providers).

Target users and customers
Public and commercial actors who need to process multilingual and multimodal documents (web pages, audio/video podcasts, OCRed documents).
Partners: SYSTRAN / A2iA / Bertin Technologies / Exalead / Inria / LIMSI-CNRS / RWTH Aachen University / Vocapia
Contact details: Jean Senellart, Directeur Recherche et Développement, senellart@systran.fr, +33 (0)1 44 82 49 49
SYSTRAN, 5 rue Feydeau, 75002 Paris, France
www.systransoft.com

Description:
The online platform provides access to high-quality, real-time multimedia translation services covering web content, audio/video podcasts and graphical documents. It also makes it possible to create new, customized ("hyper-specialized") translation models by training on resources created through automated data-acquisition procedures.

Technical requirements:
• Online service with no specific technical requirement
• An API enables third-party applications to integrate the translation services

Conditions for access and use:
Demo available upon request. Please contact Jean Senellart (details above) for any further inquiries.
Personalized and social TV (pp. 162-163)
Easy search and innovative discovery of content through a unique user experience combining TV, a gyroscopic remote control, a tablet and a smartphone.

Application sectors
Media and entertainment.

Target users and customers
• Network service operators
• Internet service providers
Partners: Technicolor / Bertin Technologies / Exalead / Institut National de l'Audiovisuel / Inria / Karlsruhe Institute of Technology (KIT) / LTU technologies / Movea / Télécom ParisTech / Vecsys / Yacast
Contact details: Nathalie Cabel, nathalie.cabel@technicolor.com
Technicolor R&D France, 975 avenue des Champs Blancs, ZAC des Champs Blancs / CS 17616, 35576 Cesson-Sévigné, France
http://www.technicolor.com

Description:
The solution offers end users a personalized experience for accessing content on TV in combination with a tablet or smartphone. End users can easily and enjoyably explore large volumes of content and display additional information about the content being watched on a second screen. The system integrates technologies from several domains: content metadata enrichment, search engine, recommendation engine, semantic analysis, privacy protection, multi-device synchronization techniques, and gesture, voice, facial and picture recognition. Thanks to efficient search and recommendation engines and a connection to social networks, the end user benefits from personalized services and can share his or her media experience with a social community.

Technical requirements:
The TV platform is hosted on the network and manages the Video-on-Demand and TV-guide catalogs, the end-user accounts and their consumption. Thanks to a web-based architecture, end users access portals on TV, tablet or smartphone. The services leverage the latest innovations in user interaction, such as facial recognition, gesture, voice and second-screen devices.
PlateusNet (pp. 164-165)
A remote testing platform for HMI usability.

Application sectors
• Information system design
• Complex system design
• Intervention in user-centred processes for the assessment of new products and services, such as: professional and general-public software, products, interactive TV, cockpits, embedded and mobile products

Target users and customers
Web agencies, banks and insurance companies, telecom companies, complex systems (defense & civil), product research & development.
Partners: Bertin Technologies
Contact details: Marie Vian, vian@bertin.fr
Bertin Technologies, Parc d'Activité du Pas du Lac, 10 bis avenue André Marie Ampère, 78180 Montigny-le-Bretonneux, France
www.bertin.fr

Description:
PlateusNet allows product or service HMIs to be captured and questionnaires to be created for remote usability tests. A simulated version of the HMI is presented to many end users (worldwide if required), and all of their interactions with the HMI are recorded. Any product or service can be assessed with PlateusNet at every step of the design process (mock-up, prototype or finished product). The results of all interactions are automatically collected on a database server. PlateusNet interprets them through statistical analysis, under the supervision of a usability expert. The aim of these analyses is to propose design recommendations that enhance the usability and efficiency of the product or service HMI, from the earliest stage of the design process onwards.

Technical requirements:
No specific technical requirement. The end-user-side application needs a Windows OS, the Java platform and an Internet connection.

Conditions for access and use:
All information available from Bertin Technologies.
SYSTRANLinks (pp. 166-167)
Cloud-based collaborative service for website localization.

Application sectors
• Corporate websites (whatever the sector)
• eCommerce websites
• Blogs and individual websites

Target users and customers
Users: webmasters, professional translators, marketing/PR professionals, small business owners.
Customers: international business leaders, SMEs with international ambitions, digital agencies, language service providers, eCommerce enterprises, start-ups, bloggers, individuals or associations, tourism industry actors (restaurants, hotels, etc.) – in general, any business or organization that needs to increase its international web exposure.
Partners: SYSTRAN
Contact details: Jean Senellart, Directeur Recherche et Développement, senellart@systran.fr, +33 (0)1 44 82 49 49
SYSTRAN, 5 rue Feydeau, 75002 Paris, France
www.systranlinks.com

Description:
An online service making website translation faster, easier and more cost-effective than classical solutions. It consists of an innovative, collaborative and reliable online CMS platform for launching and managing localization projects.

Technical requirements:
Accessing the online service requires only Internet access and a browser; the service can be used to translate any website, whatever the technology behind it. No technical skill is required.

Conditions for access and use:
• A SYSTRAN account has to be created during subscription
• A free version is offered, fully featured and suited to websites with low traffic and little content to review
• Three payment schemes exist for corporate or professional websites with higher traffic or larger content to be reviewed and edited
Voxalead Débat Public (pp. 168-169)
A unique application to explore public debates.

Application sectors
• Public institutions
• Parliaments (in any country)
• Political journalists
• Sociologists

Target users and customers
This application can be used by political journalists, sociologists, students and, more broadly, any citizen.
Partners: Exalead, Vecsys
Contact details: Julien Law-To, julien.lawto@3ds.com, +33 (0)1 55 35 26 26
Exalead SA, 10 place de la Madeleine, 75008 Paris, France
http://politics.labs.exalead.com/

Description:
Voting is the privilege of democratic citizens, but exercising it well may be difficult: making good choices requires analyzing candidates' positions across all important domains. While politicians have become more and more accessible by participating in media forums like talk shows, understanding their work still requires some effort. We want to make all the open data of our public institutions available and easy to browse by any end user, with simple tools, through an innovative interface. What is said during public debates is available as open data. Using CloudView, we analyze, enrich and index these debates. Users can search across all the debates by keyword or on specific, automatically extracted topics. It is also possible to focus on one political figure. When browsing a debate for which video is available, the video can be watched synchronized with the text, thanks to Vecsys.

Technical requirements:
A manual transcription must be available. When audio or video recordings are available, we can align the recordings with the transcription.

Conditions for access and use:
Commercially available through Exalead: http://www.exalead.com/software/company/contact/
Exalead also works closely with leading information-management specialists such as Capgemini, EADS, Logica, TERMINALFOUR, Digirati, and Knowledge Concepts to provide enterprise search solutions tailored to your unique needs. See http://www.exalead.com/software/partners/channel/
Voxalead multimedia search engine (pp. 170-171)
Audio content-based video retrieval.

Application sectors
• Video search engines
• Information retrieval, including videos
• E-learning
• Education
• Defense and homeland security

Target users and customers
Any organization possessing masses of video and audio content can give its users access to this content through this technology. Making the content of the media automatically searchable and browsable, with the performance of a web search engine (robustness and scalability), provides a new experience to customers.
Partners: Exalead, Vocapia
Contact details: Julien Law-To, julien.lawto@3ds.com, +33 (0)1 55 35 26 26
Exalead SA, 10 place de la Madeleine, 75008 Paris, France
http://politics.labs.exalead.com/

Description:
The audio part of the media is transcribed by a Vocapia component. The transcription is then analyzed, enriched and indexed by CloudView. The Voxalead demonstrator is composed of a results page and a play page that plays the content interactively. Searches can be performed in different languages (English, French, German, Dutch, Spanish, Italian, Arabic and Chinese).

Technical requirements:
The videos must be in electronic form. The better the quality of the audio, the better the transcription.

Conditions for access and use:
Commercially available through Exalead: http://www.exalead.com/software/company/contact/
Exalead also works closely with leading information-management specialists such as Capgemini, EADS, Logica, TERMINALFOUR, Digirati, and Knowledge Concepts to provide enterprise search solutions tailored to your unique needs. See http://www.exalead.com/software/partners/channel/
VoxSigma SaaS (pp. 172-173)
Multilingual audio indexing, teleconference transcription, telephone speech analytics, transcription of speeches, subtitling.

Application sectors
• Multilingual audio indexing: the VoxSigma software suite offers advanced language technologies to transform raw audio data into structured, searchable XML documents. It includes adaptive features allowing the transcription of noisy speech.
• Teleconference transcription: Vocapia's speech-to-text technology significantly reduces the cost of transcribing business conference calls (such as quarterly reports).
• Telephone speech analytics: Vocapia's software processes telephone data, making recorded calls searchable and analyzable via text-based methods for call-management companies.
• Transcription of speeches: VoxSigma is used by several governmental organizations to provide easy access to audio and/or video content via time-coded, searchable XML documents.
• Subtitling: while fully automatic processing generally does not deliver subtitles of sufficient quality, Vocapia's technologies reduce the effort required when closely integrated into the subtitle-creation process.

Target users and customers
The targeted users and customers of the VoxSigma SaaS are actors in the multimedia and call-center sectors, including academic and industrial organizations, interested in the processing of audio documents.
Partners: Vocapia, LIMSI-CNRS
Contact details: Bernard Prouts, prouts@vocapia.com, contact@vocapia.com, +33 (0)1 84 17 01 14
Vocapia Research, 28 rue Jean Rostand, Parc Orsay Université, 91400 Orsay, France
www.vocapia.com

Description:
Vocapia has developed a SaaS offer for the VoxSigma software suite, complementary to classical licensing, allowing customers to quickly reap the benefits of regular improvements to the technology. Customers can also take advantage of additional features offered by the online environment, such as high computing power for handling irregular processing needs. The VoxSigma SaaS offers three main processing functions: identification of the language spoken in an audio document, conversion of recorded speech to text (speech-to-text transcription), and synchronization of a transcription with the speech signal (also called speech-text alignment). LID systems are available for broadcast data (15 languages currently available) and conversational data (50 languages); new languages can easily be added. The system reports the language of the audio document along with a confidence score. In the current version, each channel of an audio document is assumed to be in a single language; future versions are planned to allow multiple languages in a single document. The speech-to-text transcription systems are currently available in 17 languages for broadcast data and 7 languages for conversational speech. Each word is associated with start and end times and a confidence measure.

Technical requirements:
Protocol: REST API over HTTPS. The POST, GET and PUT HTTP methods are accepted. Both URI-encoded requests and MIME multipart requests are supported.

Conditions for access and use:
The service is available 24/7/365, with failover servers and geographic redundancy. It can be accessed via a pay-as-you-go service or a subscription offer. It handles content in many European languages as well as Mandarin and Arabic.
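The catalogue documents only the protocol (REST over HTTPS, POST/GET/PUT, URI-encoded or MIME multipart requests). The URL, parameter names and response fields below are therefore assumptions for illustration, not Vocapia's published API.

```python
# Hypothetical MIME-multipart POST to a VoxSigma-style REST endpoint.
# Only the protocol style comes from the catalogue; names are invented.
import requests

with open("broadcast.mp3", "rb") as audio:
    r = requests.post(
        "https://voxsigma.example.com/rest/transcribe",   # hypothetical
        auth=("user", "password"),
        params={"model": "fre-broadcast"},                # hypothetical
        files={"audio": audio},
        timeout=900,
    )
r.raise_for_status()
# Per the description, each word carries start/end times and a
# confidence measure; an XML payload along these lines is plausible:
print(r.text)  # e.g. <Word stime="12.31" etime="12.64" conf="0.98">...
```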
Table of Contents - Index

A2iA
A2iA Document Reader - Document Processing (p. 66)

Bertin Technologies
MediaCentric® (p. 152)
PlateusNet (p. 164)

Exalead
Chromatik (p. 148)
MuMa: The Music Mashup (p. 158)
Voxalead Débat Public (p. 168)
Voxalead multimedia search engine (p. 170)

INRA
AlvisAE: Alvis Annotation Editor - Semantic Acquisition & Annotation (p. 8)
AlvisIR - Semantic Acquisition & Annotation (p. 10)
AlvisNLP: Alvis Natural Language Processing - Semantic Acquisition & Annotation (p. 12)
TyDI: Terminology Design Interface - Semantic Acquisition & Annotation (p. 16)

INRIA
KIWI: Keyword extractor - Semantic Acquisition & Annotation (p. 14)
SAMuSA: Speech And Music Segmenter and Annotator - Audio Processing (p. 54)
Music Structure - Music Processing (p. 102)
AACI: Automatic acquisition and tracking of mobile target in image sequences - Video Analysis & Structuring (p. 124)
C-Motion: Camera motion characterization - Video Analysis & Structuring (p. 128)
VidSeg: Video Segmentation - Video Analysis & Structuring (p. 144)
IRINTS: Irisa News Topic Segmenter - Content Analysis (p. 110)
SloPy: Slope One with Privacy - Content Analysis (p. 116)

IRCAM
AudioPrint - Music Processing (p. 90)
Ircamaudiosim: Acoustical Similarity Estimation - Music Processing (p. 92)
Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation - Music Processing (p. 94)
Ircamchord: Automatic Chord Estimation - Music Processing (p. 96)
Ircammusicgenre and Ircammusicmood: Genre and Mood Estimation - Music Processing (p. 98)
Ircamsummary: Music Summary Generation and Music Structure Estimation - Music Processing (p. 100)

IRIT
SYRIX: Information retrieval system in context - Indexing, Ranking and Retrieval (p. 106)

iTESOFT
MECA: Multimedia Enterprise Capture (p. 150)

Jouve
Colorimetric Correction System - Document Processing (p. 60)
Document Classification System - Document Processing (p. 62)
Document Layout Analysis System - Document Processing (p. 64)
Document Structuring System - Document Processing (p. 68)
Grey Level Character Recognition System - Document Processing (p. 70)
Handwriting Recognition System - Document Processing (p. 72)
Image Descreening System - Document Processing (p. 74)
Image Resizing for Print on Demand Scanning - Document Processing (p. 76)
Image Clusterization System - Object Recognition & Image Clustering (p. 82)
Image Identification System - Object Recognition & Image Clustering (p. 84)

Karlsruhe Institute of Technology (KIT)
Speech-to-Text - Speech Processing (p. 42)
Face Detection, Recognition and Analysis - Video Analysis & Structuring (p. 134)
Multimedia Person Identification - Video Analysis & Structuring (p. 140)

LIMSI-CNRS
FIDJI: Finding In Documents Justifications and Inferences - Q&A (p. 20)
QAVAL: Question Answering by VALidation - Q&A (p. 24)
RITEL: Spoken and Interactive Question-Answering System - Q&A (p. 26)
Acoustic Speaker Diarization - Speech Processing (p. 30)

LTU Technologies
LTU Leading Image Recognition Technologies - Object Recognition & Image Clustering (p. 86)

Movea
MoveaTV: Motion Processing Engine for interactive TV - Gesture Recognition (p. 120)

RWTH Aachen University
Machine Translation - Translation of Text and Speech (p. 46)
Speech Translation - Translation of Text and Speech (p. 48)
Automatic Speech Recognition - Speech Processing (p. 34)
Recognition of Handwritten Text - Document Processing (p. 78)
Synapse Développement
Question-Answering System - Q&A (p. 22)
Sentiment analysis and Opinion mining - Content Analysis (p. 112)
Persons, Places, Date, Organizations & Events Recognition - Content Analysis (p. 114)

Systran
OMTP: Online Multimedia Translation Platform (p. 160)
SYSTRANLinks (p. 166)

Technicolor
Sync Audio Watermarking - Audio Processing (p. 52)
Audience Characterization - Video Analysis & Structuring (p. 126)
ContentArmor™ Video Watermarking - Video Analysis & Structuring (p. 130)
Crowd Sourced Metadata - Video Analysis & Structuring (p. 132)
Hybrid Broadcast Broadband Synchronization - Video Analysis & Structuring (p. 136)
Movie Chaptering - Video Analysis & Structuring (p. 138)
Soccer Event Detection - Video Analysis & Structuring (p. 142)
Violent Scenes Detection - Video Analysis & Structuring (p. 146)
Personalized and social TV (p. 162)

Télécom ParisTech
Yaafe: Audio feature extractor - Audio Processing (p. 56)

Vecsys
MediaSpeech® Alignment - Speech Processing (p. 32)
Corinat: Language Resources production infrastructure - Speech Processing (p. 38)
MediaSpeech® product line (p. 154)
MobileSpeech (p. 156)

Vocapia Research
Automatic Speech Transcription - Speech Processing (p. 36)
Language Identification - Speech Processing (p. 40)
VoxSigma SaaS (p. 172)
Quaero en chiffres

L'esprit collaboratif :
• Des workshops semestriels réunissant plus de 130 chercheurs et ingénieurs
• 50 thèses en cours, dont 30 seront soutenues avant la fin du programme
• 35 nationalités représentées
• Une mobilité des jeunes acteurs académiques, recrutés par des partenaires industriels (SYSTRAN, Vocapia, LNE) ou à l'international (États-Unis, Canada, Chine, Suisse, Allemagne, etc.)

La dynamique industrielle :
• 34 brevets déposés
• 35 prototypes applicatifs développés
• 9 distinctions (meilleure démonstration ACM Multimedia Grand Challenge, Golden Mobile Award, TV Innovations Awards au CES 2012, etc.)
• Une position au niveau mondial pour plusieurs partenaires industriels (LTU Technologies pour la reconnaissance d'images, Jouve pour la numérisation des livres et la conversion (e-books), Technicolor pour la TV sociale, interactive et personnalisée)

L'excellence scientifique :
• Plus de 800 publications, dont 70 journaux, revues et livres
• 70 participations à des campagnes d'évaluation nationales et internationales (classement le plus souvent dans les 3 premiers)
• 23 campagnes d'évaluation internes conduites par le LNE
• 16 distinctions (meilleure publication, prix jeune chercheur, prix de thèse, médaille de Cristal du CNRS)
• 75 modules technologiques élémentaires développés et transférés dans les prototypes applicatifs (dont plusieurs en open source, accessibles sur sourceforge.net)

Des résultats qui démontrent l'esprit collaboratif, l'excellence scientifique et la dynamique industrielle du programme
Quaero in numbers

The collaborative spirit:
• Bi-annual workshops gathering more than 130 researchers and engineers
• 50 ongoing PhD theses, among which 30 will be defended before the end of the program
• 35 nationalities represented
• Mobility of young academic actors, recruited by industrial partners (SYSTRAN, Vocapia, LNE) or abroad (USA, Canada, China, Switzerland, Germany, etc.)

The industrial dynamics:
• 34 patents
• 35 application demonstrators
• 9 awards (Best Demo at ACM Multimedia Grand Challenge, Golden Mobile Award, TV Innovations Awards at CES 2012, etc.)
• A worldwide leading position for several industrial partners (LTU Technologies for image recognition, Jouve for book digitization and e-book conversion, Technicolor for social, interactive and personalized TV)

The scientific excellence:
• More than 800 publications, among which 70 in scientific journals and books
• 70 participations in national and international evaluation campaigns (regularly ranked in the top 3)
• More than 30 internal evaluation campaigns every year
• 16 awards (best publication, young researcher award, thesis prize, CNRS Crystal Medal, etc.)
• 75 core technology modules developed and integrated into application demonstrators (among which several in open source, available on sourceforge.net)

Results which attest to the collaborative spirit, scientific excellence and industrial dynamics of the program
Avec le soutien de / With the support of

Quaero © 2013 – www.quaero.org