YEAR 2008

Some words on Numediart (2007-2012)

Numediart is a long-term research programme centered on Digital Media Arts, funded by Région Wallonne, Belgium (grant N°716631). Its main goal is to foster the development of new media technologies through digital performances and installations, in connection with local companies and artists. It is organized around three major R&D themes (HyFORGE – hypermedia navigation, COMEDIA – body and media, COPI – digital luthery) and is carried out as a series of short (3-month) projects, typically 3 or 4 in parallel, each concluded by a 1-week "hands-on" workshop. Numediart is the result of a collaboration between Polytech'Mons (Information Technology R&D pole) and UCL (TELE Lab), with its center of gravity in Mons, the cultural capital of Wallonia. It also benefits from the expertise of the MULTITEL research center on multimedia and telecommunications. As such, it is the R&D component of MONS 2015, a broader effort towards making Mons the cultural capital of Europe in 2015.

Participation in the Numediart Project: Audio Skimming (January-March)

This project aims at studying techniques and developing software components for skimming audio content. Just as a scroll bar allows text to be browsed interactively, the audio skimmer widget will allow time-scaled audio material to be rendered interactively with minimal sound distortion, and a segment of interest to be accessed rapidly.

Participation in the Numediart Project: TransVoice Table (January-March)

The main idea of the TransVoice Table project is the development of a flexible software structure allowing the implementation of mapping strategies between three different modalities: voice-related audio inputs (through microphones or database contents), interactions (through sensors embedded in movable objects) and expressive voice transformation algorithms.
The main context of this work is digital scenic arts, more precisely contemporary theatre with technological contributions.

Participation in the Numediart Project: Audio Thumbnailing (April-June)

This project aims at studying techniques and developing software prototypes for analyzing the structure of music content and extracting summary excerpts. Several acoustic features are proposed to describe music signals, namely timbral, harmonic and rhythmic features. Based on these features, a method is proposed to derive the similarity structure of the music signals and extract the most similar audio sections. The resulting structure is encoded in an XML format to be used within a graphical user interface developed in the Processing language, which provides the user with an enhanced listening experience of music content.

Lecture at the Computational Intelligence and Learning doctoral school (Louvain-la-Neuve, April 28th)
Michael Biehl, professor at the University of Groningen, presented the following lecture:

The theory of on-line learning

In this set of lectures, the basic concepts of the theoretical description and analysis of on-line learning in neural networks and other adaptive systems were introduced. The approach aims at a mathematically exact description of the training dynamics in simplified model situations. A key ingredient is the consideration of very large systems with many degrees of freedom, which corresponds to high-dimensional data. This makes it possible to average over (a) the stochastic nature of the training process and (b) the randomness contained in the training data. The formalism facilitates:
- the computation of typical learning curves in the model settings;
- the systematic evaluation and comparison of training algorithms;
- the optimization of training by means of variational methods.
The basic concepts were first illustrated in the context of perceptron training. Already in this simple setting, problems such as learning from noisy or non-stationary data can be addressed. In a second part, non-trivial extensions to, e.g., multi-layered neural networks were presented. Next, unsupervised learning and prototype-based systems (Learning Vector Quantization) were discussed within the framework of the theory. Finally, a summary and an outlook on interesting open problems were given.

Oral session at the 17th Annual Belgian-Dutch Conference on Machine Learning (Spa, Belgium, May 19th-20th)

My paper entitled "On the use of Machine Learning in Statistical Parametric Speech Synthesis" was accepted for an oral presentation. Here is the abstract:

Statistical parametric speech synthesis has recently shown its ability to produce natural-sounding speech while keeping a certain flexibility for voice transformation, without requiring a huge amount of data.
This abstract presents how machine learning techniques such as Hidden Markov Models in generation mode or context-oriented clustering with decision trees are applied in speech synthesis. Fields investigated in our laboratory to improve this method are also discussed.

Participation in the French Spring School on Theoretical Informatics (EPIT08, Porquerolles, France, May 25th-29th)

The French Spring School on Theoretical Informatics 2008 was devoted to machine learning and its statistical approach, whose foundations were laid by Vapnik and Chervonenkis at the end of the 1960s. Lessons mainly focused on the following four fields:
- Kernel methods: Support Vector Machines (SVMs) are certainly the most famous classifiers relying on kernels. Kernels are used to project the data into a new representation space where they may become linearly separable; the margin between classes is then maximized. Other techniques employing kernels were also discussed.
- Reinforcement learning: these methods are a statistical (and non-linear) generalization of classical automatic control techniques. How can a machine learn from its surrounding environment, given its previous states and actions? The machine tries to optimize its future actions while handling the exploitation-exploration trade-off.
- Boosting: or how, starting from a set of weak learners, to merge their information while avoiding overfitting, so as to retain excellent generalization to unseen data.
- Parsimony, wavelets and learning: parsimony consists in considering, among a large set of data, only the most relevant samples. Indeed, for a classification problem, only samples close to the boundaries between classes should have an impact on the final decision. Methods using wavelets and *-lets were also presented in compression and learning contexts.

Electroacoustics (FPMs, June)

I passed the course entitled "Electroacoustics". Acoustics is the study of sound. Until the 19th century, acoustics primarily consisted of the physics of sound propagation related to human hearing. During the early 1800s, electromagnetism was discovered, and one of the first non-musical sound generators, the telegraph, was developed. The invention of the telephone in 1876 led to the creation of microphones and loudspeakers, followed by the phonograph at the end of the 19th century. Radio was developed during the early 1900s. During the early part of the 20th century, a small group of researchers began applying engineering principles, such as equivalent circuits, to the science of acoustics in order to improve the design and construction of microphones and loudspeakers. This was the birth of the applied science of electroacoustics.

Project Management (FPMs, June)

I passed the course entitled "Project Management". Project Management is the discipline of planning, organizing, and managing resources to bring about the successful completion of specific project goals and objectives.
A project is a finite endeavor, having specific start and completion dates, undertaken to create a unique product or service that brings about beneficial change or added value. This finite characteristic of projects stands in sharp contrast to processes, or operations, which are permanent or semi-permanent functional work that repetitively produces the same product or service. In practice, the management of these two systems is often quite different and requires distinct technical skills and a separate management philosophy, which is the subject of this course. The primary challenge of project management is to achieve all of the project goals and objectives while adhering to the classic project constraints, usually scope, quality, time and budget. The secondary, and more ambitious, challenge is to optimize the allocation and integration of the inputs necessary to meet pre-defined objectives. A project is a carefully defined set of activities that use resources (money, people, materials, energy, space, provisions, communication, motivation, etc.) to achieve these goals and objectives.
Presentation at the IEEE International Joint Conference on e-Business and Telecommunications, ICETE 2008 (Porto, Portugal, July 26th-29th)

The major goal of ICETE is to bring together researchers, engineers and practitioners interested in information and communication technologies, including e-business, wireless networks and information systems, security and cryptography, and signal processing and multimedia applications. These are the main knowledge areas that define the four component conferences, namely ICE-B, SECRYPT, SIGMAP and WINSYS, which together form the ICETE joint conference. I presented there my paper entitled "Glottal Source Estimation Robustness - A comparison of sensitivity of voice source estimation techniques". Here is the abstract:

This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two well-known state-of-the-art methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decomposition quality is assessed on synthetic signals through two objective measures: the spectral distortion and a glottal formant determination rate. Robustness is tested by analyzing the influence of noise and of Glottal Closure Instant (GCI) location errors. Besides, the impacts of the fundamental frequency and the first formant on performance are evaluated. Our proposed approach shows a significant improvement in robustness, which could be of great interest when decomposing real speech.
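To make the spectral distortion measure mentioned in the abstract concrete, here is a minimal sketch in pure Python. It is only an illustration under assumed conventions (naive DFT, RMS log-spectral difference in dB over the positive-frequency bins); the function names are hypothetical and this is not the paper's actual implementation.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum of a real frame via a naive DFT (illustrative only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_distortion(frame_a, frame_b, eps=1e-12):
    """RMS log-spectral difference (in dB) between two same-length frames."""
    mags_a = dft_magnitudes(frame_a)
    mags_b = dft_magnitudes(frame_b)
    diffs = [20.0 * math.log10((a + eps) / (b + eps))
             for a, b in zip(mags_a, mags_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# A broadband test frame: identical frames give zero distortion,
# and a uniform gain of 2 gives a constant 20*log10(2) ~ 6.02 dB offset.
frame = [math.sin(0.7 * t) + 0.5 * math.cos(2.1 * t) for t in range(64)]
print(spectral_distortion(frame, frame))                   # 0.0
print(spectral_distortion(frame, [2 * x for x in frame]))  # ~6.02 dB
```

A real evaluation would of course use an FFT and restrict the measure to a frequency band of interest; this naive version only shows how the measure behaves.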
Presentation at the European Signal Processing Conference, EUSIPCO 2008 (Lausanne, Switzerland, August 25th-29th)

The 2008 European Signal Processing Conference (EUSIPCO-2008) is the sixteenth in a series of conferences promoted by EURASIP, the European Association for Signal Processing. Formerly biennial, this conference is now a yearly event. This edition took place in Lausanne, Switzerland, organized by the Swiss Federal Institute of Technology, Lausanne (EPFL). I presented there my paper entitled "Voice source parameters estimation by fitting the glottal formant and the inverse filtering open phase". Here is the abstract:

This paper presents two approaches to the problem of extracting the parameters of the LF source model directly from the speech waveform. The first approach relies on the glottal formant estimated from the anticausal contribution of speech; indeed, the ZZT technique has recently shown its ability to deconvolve speech into its causal and anticausal components. The second method is based on the glottal open phase obtained by inverse filtering. The notion of unanalyzable frames, and the way to detect and correct them, is also presented. Once the source parameters are extracted, the coefficients of the ARX speech production model are estimated by spectral division. Decomposition of both synthetic and natural speech, as well as an analysis-synthesis test, confirms the accuracy of the proposed methods.

Presentation at the Information Technologies Seminars (FPMs, Mons, October 16th)

My presentation dealt with glottal source modeling in Statistical Parametric Speech Synthesis. Here is the abstract:
Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding voices. They have also gained considerable attention for their flexibility, smoothness and small footprint. Nevertheless, their main disadvantage is the typical buzziness of the produced speech. This presentation addresses methods proposed to incorporate a better-suited modeling of the source signal so as to enhance the delivered quality.

Presentation at the IEEE International Conference on Multimodal Interfaces, ICMI 2008 (Chania, Greece, October 20th-22nd)

The Tenth International Conference on Multimodal Interfaces (ICMI 2008) took place in Chania, Greece, on October 20th-22nd, 2008. The main aim of ICMI 2008 was to further scientific research within the broad field of multimodal interaction and systems. The conference focused on major trends and challenges in this area, including helping to identify a roadmap for future research and commercial success. One of my Swiss colleagues had the opportunity to present our paper entitled "Dynamic modality weighting for multi-stream HMMs in Audio-Visual Speech Recognition". Here is the abstract:

Merging decisions from different modalities is a crucial problem in Audio-Visual Speech Recognition. To solve it, state-synchronous multi-stream HMMs have been proposed for their important advantage of incorporating stream reliability in their fusion scheme. This paper focuses on stream weight adaptation based on modality confidence estimators. We assume different and time-varying environmental noise, as can be encountered in realistic applications, and for this adaptive methods are best suited. Stream reliability is assessed directly through classifier outputs, since these are specific to neither the noise type nor its level. The influence of constraining the weights to sum to one is also discussed.
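The fusion principle behind the abstract above can be sketched in a few lines. This is a hypothetical two-stream toy example (invented scores and function names, not our actual system): each stream contributes a log-likelihood, and the stream weights are constrained to sum to one, so the fused score is a convex combination of the two.

```python
def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Weighted multi-stream score: weights sum to one, so the fused
    score is a convex combination of the two stream log-likelihoods."""
    assert 0.0 <= audio_weight <= 1.0
    return audio_weight * audio_ll + (1.0 - audio_weight) * video_ll

def classify(scores_per_word, audio_weight):
    """Pick the word whose fused score is highest.
    scores_per_word maps each word to its (audio_ll, video_ll) pair."""
    return max(scores_per_word,
               key=lambda w: fuse_log_likelihoods(*scores_per_word[w],
                                                  audio_weight))

# Toy scores: the audio stream prefers "yes", the visual stream prefers "no".
scores = {"yes": (-10.0, -25.0), "no": (-18.0, -12.0)}
print(classify(scores, audio_weight=0.9))  # "yes": clean audio dominates
print(classify(scores, audio_weight=0.2))  # "no": noisy audio is down-weighted
```

Making `audio_weight` time-varying, driven by a confidence estimator on the classifier outputs, is precisely the adaptation the paper studies.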
Discussion on how Process Engineering can be applied to Sustainable Development (FPMs, Mons, November 26th)

Sustainable development is a pattern of resource use that aims to meet human needs while preserving the environment, so that these needs can be met not only in the present but also in the indefinite future. The term was used by the Brundtland Commission, which coined what has become the most often-quoted definition of sustainable development: development that "meets the needs of the present without compromising the ability of future generations to meet their own needs". Sustainable development ties together concern for the carrying capacity of natural systems with the social challenges facing humanity. As early as the 1970s, "sustainability" was employed to describe an economy "in equilibrium with basic ecological support systems". Ecologists have pointed to the "limits of growth" and presented the alternative of a "steady-state economy" in order to address environmental concerns. The field of sustainable development can be conceptually broken into three constituent parts: environmental sustainability, economic sustainability and sociopolitical sustainability.