YEAR 2008
Some words on Numediart (2007-2012)
Numediart is a long-term research programme centered on Digital Media Arts, funded by Région
Wallonne, Belgium (grant N°716631). Its main goal is to foster the development of new media
technologies through digital performances and installations, in connection with local companies and
artists. It is organized around three major R&D themes (HyFORGE – hypermedia navigation,
COMEDIA – body and media, COPI – digital luthery) and is carried out as a series of short (3-month)
projects, typically 3 or 4 of them in parallel, each concluded by a 1-week "hands on" workshop.
Numediart is the result of collaboration between Polytech'Mons (Information Technology R&D pole)
and UCL (TELE Lab), with a center of gravity in Mons, the cultural capital of Wallonia. It also benefits
from the expertise of the MULTITEL research center on multimedia and telecommunications. As
such, it is the R&D component of MONS 2015, a broader effort towards making Mons the cultural
capital of Europe in 2015.
Participation in the Numediart Project: Audio Skimming (January-March)
This project aims to study techniques and develop software components for skimming audio
content. Just as a scroll bar lets a user browse text interactively, the audio skimmer
widget will render time-scaled audio interactively with minimal sound distortion
and give rapid access to a segment of interest.
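As a rough illustration of what time-scaling means here, the sketch below implements plain overlap-add (OLA): windowed frames are read from the input at one hop size and written to the output at another. This is not the project's algorithm. Plain OLA is the simplest possible scheme and produces audible artifacts on pitched sounds, which is precisely why more careful methods are needed to keep distortion low. The function names, frame length and hop size are assumptions chosen for the example.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_scale(samples, speed, frame_len=256, synth_hop=64):
    """Naive overlap-add time-scaling.

    speed > 1 shortens the signal (faster playback), speed < 1 lengthens it.
    Frames are read every `ana_hop` samples and written every `synth_hop`.
    """
    ana_hop = max(1, round(synth_hop * speed))
    window = hann(frame_len)
    if len(samples) < frame_len:
        return []
    n_frames = (len(samples) - frame_len) // ana_hop + 1
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # sum of window weights, used to undo the overlap gain
    for f in range(n_frames):
        read_pos = f * ana_hop
        write_pos = f * synth_hop
        for k in range(frame_len):
            out[write_pos + k] += window[k] * samples[read_pos + k]
            norm[write_pos + k] += window[k]
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


# Toy input: 4000 samples of a sine wave (values are arbitrary).
tone = [math.sin(2.0 * math.pi * 0.01 * i) for i in range(4000)]
fast = ola_time_scale(tone, speed=2.0)   # roughly half as long
slow = ola_time_scale(tone, speed=0.5)   # roughly twice as long
```

A skimming widget would call such a routine with a speed factor driven by the user's cursor movement.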
Participation in the Numediart Project: TransVoice Table (January-March)
The main idea of the TransVoice Table project is to develop a flexible software structure
for implementing mapping strategies between three different modalities: voice-related
audio inputs (through microphones or database contents), interactions (through sensors
embedded in movable objects) and expressive voice transformation algorithms. The main context of
this work is the digital performing arts, and more precisely contemporary theatre with technological
contributions.
Participation in the Numediart Project: Audio Thumbnailing (April-June)
This project aims to study techniques and develop software prototypes for analyzing the
structure of music content and extracting summary excerpts. Several acoustic features are proposed
to describe music signals, namely timbral, harmonic and rhythmic features. Based on these features,
a method is proposed to derive the similarity structure of the music signals and extract the most
similar audio sections. The resulting structure is encoded in an XML format to be used within a
graphical user interface developed in the Processing language, which provides the user with an
enhanced listening experience of music content.
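The similarity-structure idea can be sketched as follows. This is not the project's actual method: it simply computes a cosine self-similarity matrix over per-frame feature vectors and searches for the two non-overlapping segments that are most alike. The toy feature values, the function names and the fixed segment length are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def self_similarity(frames):
    """Square matrix whose entry (i, j) compares frame i with frame j."""
    n = len(frames)
    return [[cosine(frames[i], frames[j]) for j in range(n)] for i in range(n)]


def most_similar_segments(frames, seg_len):
    """Return (score, start_a, start_b) for the best pair of non-overlapping
    segments of `seg_len` frames, scored by their mean frame-to-frame similarity."""
    sim = self_similarity(frames)
    n = len(frames)
    best = None
    for a in range(n - seg_len + 1):
        for b in range(a + seg_len, n - seg_len + 1):
            score = sum(sim[a + k][b + k] for k in range(seg_len)) / seg_len
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


# Hypothetical 2-D features for six frames: the first two frames recur at the end.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
score, start_a, start_b = most_similar_segments(frames, seg_len=2)
```

On real music the frames would carry timbral, harmonic or rhythmic descriptors, and the repeated segment found this way is a natural candidate for a summary excerpt.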
Lecture at Computational Intelligence and Learning doctoral school (Louvain-
la-Neuve, April 28th)
Michael Biehl, professor at the University of Groningen, presented the following lecture:
The theory of on-line learning: In this set of lectures, the basic concepts of the theoretical
description and analysis of on-line learning in neural networks and other adaptive systems were
introduced. The approach aimed at a mathematically exact description of the training dynamics in
simplifying model situations. A key ingredient was the consideration of very large systems with many
degrees of freedom, which correspond to high-dimensional data. This makes it possible to average
over:
(a) the stochastic nature of the training process and
(b) the randomness contained in the training data.
The formalism facilitates
- the computation of typical learning curves in the model settings
- the systematic evaluation and comparison of training algorithms
- the optimization of training by means of variational methods.
The basic concepts were first illustrated in the context of perceptron training. Already in this simple
setting, problems like learning from noisy or non-stationary data can be addressed. In a second part,
non-trivial extensions to, e.g., multi-layered neural networks were presented. Next, unsupervised
learning and prototype-based systems (Learning Vector Quantization) were discussed in the
framework of the theory. Finally, a summary and an outlook on interesting open problems were given.
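To make the perceptron setting concrete, here is a minimal simulation sketch of on-line learning in a teacher-student scenario. A student perceptron sees one random example at a time, labelled by a fixed teacher vector, and updates only when it makes a mistake (the classical Rosenblatt rule, which is not necessarily one of the algorithms analyzed in the lecture). The dimension, learning rate, number of examples and random seed are arbitrary assumptions. For isotropic inputs, the generalization error follows from the normalized student-teacher overlap R as arccos(R)/pi.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(x):
    return 1 if x >= 0 else -1


def overlap(student, teacher):
    """Normalized overlap R (cosine of the angle) between student and teacher."""
    ns = math.sqrt(dot(student, student))
    nt = math.sqrt(dot(teacher, teacher))
    return dot(student, teacher) / (ns * nt) if ns > 0 and nt > 0 else 0.0


def train_online(dim=50, n_examples=5000, rate=1.0, record_every=500, seed=0):
    """Present examples one by one and record (t, R, generalization error)."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    student = [0.0] * dim
    curve = []
    for t in range(1, n_examples + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        label = sign(dot(teacher, x))
        if sign(dot(student, x)) != label:
            # Mistake-driven update: move the student towards the labelled example.
            student = [w + rate * label * xi for w, xi in zip(student, x)]
        if t % record_every == 0:
            r = max(-1.0, min(1.0, overlap(student, teacher)))
            curve.append((t, r, math.acos(r) / math.pi))
    return curve


curve = train_online()
for t, r, err in curve:
    print(f"examples={t:5d}  overlap={r:.3f}  gen_error={err:.3f}")
```

The printed values form an empirical learning curve for one random run. The theory presented in the lecture describes the typical curve obtained in the limit of very large dimension.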
Oral session at the 17th Annual Belgian-Dutch Conference on Machine
Learning (Spa, Belgium, May 19th-20th)
My paper entitled “On the use of Machine Learning in Statistical Parametric Speech Synthesis” was
accepted for oral presentation. Here is the abstract:
Statistical parametric speech synthesis has recently shown its ability to produce natural sounding
speech while keeping a certain flexibility for voice transformation without requiring a huge amount
of data. This abstract presents how machine learning techniques such as Hidden Markov Models in
generation mode or context oriented clustering with decision trees are applied in speech synthesis.
Fields that are investigated in our laboratory to improve this method are also discussed.
Participation in the French Spring School on Theoretical Computer Science
(EPIT08, Porquerolles, France, May 25th-29th)
The 2008 French Spring School on Theoretical Computer Science was devoted to machine learning
and its statistical approach, whose foundations were laid by Vapnik and Chervonenkis at the end of
the 1960s. The lectures mainly focused on the following four fields:
- Kernel methods: Support vector machines (SVMs) are certainly the best-known kernel-based
classifiers. Kernels are used to project the data into a new representation space
where it may become linearly separable. The margin between classes is then maximized. Other
kernel-based techniques were also discussed.
- Reinforcement learning: These methods are a statistical (and non-linear) generalization of
classical control techniques. How can a machine learn from its surrounding
environment, given its previous states and actions? The aim is to optimize its future
actions while handling the exploration-exploitation trade-off.
- Boosting: How to combine a set of weak learners while avoiding
overfitting, so as to keep excellent generalization to unseen data.
- Sparsity, wavelets and learning: Sparsity consists in retaining, among
a large set of data, only the most relevant samples. Indeed, in a classification problem, only
samples close to the boundaries between classes should have an impact on the final
decision. Methods using wavelets and *-lets were also presented in compression and
learning contexts.
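As a small illustration of the kernel idea from the first item above, the sketch below trains a kernel perceptron on the XOR problem. A kernel perceptron is not an SVM, since it does not maximize the margin. It is used here only because it is the shortest way to show that a kernel can make non-separable data separable. The function names, the degree-2 polynomial kernel and the epoch limit are assumptions chosen for the example.

```python
def linear_kernel(u, v):
    """Plain dot product: corresponds to working in the original input space."""
    return float(sum(a * b for a, b in zip(u, v)))


def poly_kernel(u, v, degree=2):
    """Polynomial kernel: an implicit projection into a richer feature space."""
    return (1.0 + sum(a * b for a, b in zip(u, v))) ** degree


def kernel_perceptron(points, labels, kernel, max_epochs=20):
    """Mistake-driven training in dual form.

    alpha[i] counts how often example i was misclassified.
    Returns (alpha, converged), where converged means an epoch without mistakes.
    """
    alpha = [0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            score = sum(a * yj * kernel(xj, x)
                        for a, xj, yj in zip(alpha, points, labels))
            if y * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            return alpha, True
    return alpha, False


# XOR with +/-1 coding: the label is +1 when the two coordinates differ.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]

alpha_poly, converged_poly = kernel_perceptron(points, labels, poly_kernel)
alpha_lin, converged_lin = kernel_perceptron(points, labels, linear_kernel)
print("polynomial kernel converged:", converged_poly)
print("linear kernel converged:", converged_lin)
```

With the linear kernel the training loop never completes an epoch without mistakes, because XOR is not linearly separable in the input space. With the polynomial kernel it converges after a single pass of updates.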
Electroacoustics (FPMs, June)
I passed the course entitled “Electroacoustics”. Acoustics is the study of sound. Until the 19th
century, acoustics primarily consisted of the physics of sound propagation related to human hearing.
During the early 1800s, electromagnetism was discovered and one of the first non-musical
sound generators, the telegraph, was developed. The invention of the telephone in 1876
resulted in the creation of microphones and loudspeakers, followed by the phonograph at the end of
the 19th century. Radio was developed during the early 1900s.
During the early part of the 20th century, a small group of researchers began applying engineering
principles, such as equivalent circuits, to the science of acoustics in order to improve the design and
construction of microphones and loudspeakers. This was the birth of the applied science of
electroacoustics.
Project Management (FPMs, June)
I passed the course entitled “Project Management”. Project Management is the discipline of
planning, organizing, and managing resources to bring about the successful completion of specific
project goals and objectives. A project is a finite endeavor—having specific start and completion
dates—undertaken to create a unique product or service which brings about beneficial change or
added value. This finite characteristic of projects stands in sharp contrast to processes, or operations,
which are permanent or semi-permanent functional work to repetitively produce the same product
or service. In practice, the management of these two systems is often found to be quite different,
and as such requires the development of distinct technical skills and the adoption of a separate
management philosophy, which is the subject of this course.
The primary challenge of project management is to achieve all of the project goals and objectives
while adhering to classic project constraints—usually scope, quality, time and budget. The secondary
—and more ambitious—challenge is to optimize the allocation and integration of inputs necessary to
meet pre-defined objectives. A project is a carefully defined set of activities that use resources
(money, people, materials, energy, space, provisions, communication, motivation, etc.) to achieve
the project goals and objectives.
Presentation at the IEEE International Joint Conference on e-business and
Telecommunications, ICETE 2008 (Porto, Portugal, July 26th-29th)
The major goal of ICETE is to bring together researchers, engineers and practitioners interested in
information and communication technologies, including e-business, wireless networks and
information systems, security and cryptography, signal processing and multimedia applications.
These are the main knowledge areas that define the four component conferences, namely: ICE-B,
SECRYPT, SIGMAP and WINSYS, which together form the ICETE joint conference. There I presented
my paper entitled “Glottal Source Estimation Robustness - A comparison of sensitivity of voice source
estimation techniques”. Here is the abstract:
This paper addresses the problem of estimating the voice source directly from speech waveforms. A
novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal
open phase. This technique is compared to two other state-of-the-art well-known methods, namely
the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms.
Decomposition quality is assessed on synthetic signals through two objective measures: the spectral
distortion and a glottal formant determination rate. Technique robustness is tested by analyzing the
influence of noise and Glottal Closure Instant (GCI) location errors. Besides impacts of the
fundamental frequency and the first formant on the performance are evaluated. Our proposed
approach shows significant improvement in robustness, which could be of a great interest when
decomposing real speech.
Presentation at the European Signal Processing Conference, EUSIPCO 2008
(Lausanne, Switzerland, August 25th-29th)
The 2008 European Signal Processing Conference (EUSIPCO-2008) is the sixteenth in a series of
conferences promoted by EURASIP, the European Association for Signal Processing
(www.eurasip.org). Formerly biennial, this conference is now a yearly event. This edition took place
in Lausanne, Switzerland, organized by the Swiss Federal Institute of Technology, Lausanne (EPFL).
There I presented my paper entitled “Voice source parameters estimation by fitting the glottal formant
and the inverse filtering open phase”. Here is the abstract:
This paper presents two approaches to the problem of extracting the parameters of the LF source
model directly from the speech waveform. The first approach relies on the glottal formant estimated
from the anticausal contribution of speech. Indeed the ZZT technique has recently shown its ability to
deconvolve speech into its causal and anticausal components. The second method is based on the
glottal open phase obtained by inverse filtering. The notion of unanalyzable frames and the way to
detect and correct them are also presented. Once source parameters are extracted, the coefficients
of the ARX speech production model are estimated by spectral division. Decomposition on both
synthetic and natural speech, as well as an analysis-synthesis test confirm the accuracy of methods
exposed.
Presentation at the Information Technologies Seminars (FPMs, Mons,
October 16th)
My presentation dealt with glottal source modeling in Statistical Parametric Speech Synthesis.
Here is the abstract:
Statistical parametric speech synthesizers have recently shown their ability to produce natural
sounding voices. They also gained considerable attention for their flexibility, smoothness and small
footprint. Nevertheless their main disadvantage is the typical buzziness of the produced speech. This
presentation addresses methods proposed to incorporate a more suited modeling of the source
signal so as to enhance the delivered quality.
Presentation at the IEEE International Conference on Multimodal Interfaces,
ICMI 2008 (Chania, Greece, October 20th-22nd)
The Tenth International Conference on Multimodal Interfaces (ICMI 2008) took place in Chania,
Greece, on October 20-22, 2008. The main aim of ICMI 2008 was to further scientific research within
the broad field of multimodal interaction and systems. The conference focused on major trends and
challenges in this area, and aimed to help identify a roadmap for future research and commercial
success. One of my Swiss colleagues had the opportunity to present our paper entitled “Dynamic
modality weighting for multi-stream HMMs in Audio-Visual Speech Recognition”. Here is the abstract:
Merging decisions from different modalities is a crucial problem in Audio-Visual Speech Recognition.
To solve this, state synchronous multi-stream HMMs have been proposed for their important
advantage of incorporating stream reliability in their fusion scheme. This paper focuses on stream
weight adaptation based on modality confidence estimators. We assume different and time-varying
environment noise, as can be encountered in realistic applications, and, for this, adaptive methods
are best-suited. Stream reliability is assessed directly through classifier outputs since they are not
specific to either noise type or level. The influence of constraining the weights to sum to one is also
discussed.
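The fusion scheme mentioned in the abstract can be sketched as follows. In a state-synchronous multi-stream HMM, the per-state log-likelihoods of the audio and video streams are combined as a weighted sum, here with weights constrained to sum to one. The confidence estimator below (inverse entropy of each stream's class posteriors) is only one plausible choice made for illustration and is not claimed to be the estimator used in the paper. The posterior values and function names are likewise hypothetical, and posteriors stand in for state likelihoods to keep the example short.

```python
import math


def entropy(posteriors):
    """Shannon entropy of a posterior distribution (low entropy = confident stream)."""
    return -sum(p * math.log(p) for p in posteriors if p > 0)


def stream_weights(post_audio, post_video, eps=1e-9):
    """Inverse-entropy weights, normalized so that they sum to one."""
    inv_a = 1.0 / (entropy(post_audio) + eps)
    inv_v = 1.0 / (entropy(post_video) + eps)
    w_audio = inv_a / (inv_a + inv_v)
    return w_audio, 1.0 - w_audio


def fuse(post_audio, post_video, w_audio, w_video):
    """Weighted sum of per-class log scores, i.e. a weighted product of the streams."""
    return [w_audio * math.log(pa) + w_video * math.log(pv)
            for pa, pv in zip(post_audio, post_video)]


# Hypothetical frame: the audio is noisy and hesitant, the video is confident.
post_audio = [0.40, 0.35, 0.25]
post_video = [0.05, 0.90, 0.05]

w_audio, w_video = stream_weights(post_audio, post_video)
fused = fuse(post_audio, post_video, w_audio, w_video)
best_class = max(range(len(fused)), key=lambda k: fused[k])
print(f"w_audio={w_audio:.2f}  w_video={w_video:.2f}  decision=class {best_class}")
```

Because the weights are recomputed for every frame, the less reliable stream is automatically down-weighted when the noise conditions change over time.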
Discussion on how Process Engineering can be applied to Sustainable
Development (FPMs, Mons, November 26th)
Sustainable development is a pattern of resource use that aims to meet human needs while
preserving the environment so that these needs can be met not only in the present, but in the
indefinite future. The term was used by the Brundtland Commission, which coined what has become
the most often-quoted definition of sustainable development as development that "meets the needs
of the present without compromising the ability of future generations to meet their own needs."
Sustainable development ties together concern for the carrying capacity of natural systems with the
social challenges facing humanity. As early as the 1970s "sustainability" was employed to describe an
economy "in equilibrium with basic ecological support systems". Ecologists have pointed to the
“limits to growth” and presented the alternative of a “steady state economy” in order to address
environmental concerns.
The field of sustainable development can be conceptually broken into three constituent parts:
environmental sustainability, economic sustainability and sociopolitical sustainability.