Multilevel Audio Descriptors @WWW09 develtrack


Presentation given at the WWW 2009 conference in Madrid, Spain, introducing our work on using Linked Open Data to add semantic descriptors to those obtained from low-level signal analysis.

  1. Combining multi-level audio descriptors via web identification and aggregation (a.k.a. The CLAM Aggregator), WWW '09 Developers Track
     • Jun Wang (Chinese Academy of Sciences)
     • Xavier Amatriain (Telefonica Research)
     • David Garcia (Barcelona Media)
     • Jinlin Wang (Chinese Academy of Sciences)
  2. Index
     • The CLAM Framework
         • Infrastructure
         • Repositories
         • Tools
         • Applications
         • Prototyping
     • The CLAM Annotator
     • The CLAM Aggregator
     • Demo!
  3. The CLAM Framework
  4. CLAM::Highlights
     • Started in October 2000
     • Cross-platform ANSI C++, regularly compiled and tested on GNU/Linux, Mac OS X and Windows
     • Currently specializes in audio and music, but provides a metamodel and tools for general multimedia
     • Two working modes: application framework and rapid prototyping
     • Truly object-oriented, documented through a Pattern Language and a DSL
     • Efficient enough for real-time applications
  5. CLAM::Components
  6. CLAM::Infrastructure. 4MPS metamodel
     • The CLAM network is a graphical model of computation based on Dataflow Process Networks
     • Scheduling can be performed statically or dynamically, depending on the application
  7. CLAM::Infrastructure. Processing
  8. CLAM::Repositories (black-box)
     • Ready-to-use processing classes (around 250):
         • Analysis (FFT, spectral analysis, SMS analysis, tonal analysis, descriptor extraction...), Arithmetic Operators, Input/Output Processing Objects (Audio, AudioFile, MIDI, SDIF), Generators, Transformation, Synthesis
     • Ready-to-use data classes:
         • Audio, Spectrum, SpectralPeakArray, Fundamental, Frame, Segment, Descriptors...
  9. CLAM::Tools
     • Platform abstraction
         • Audio I/O
         • MIDI I/O
         • Audio file I/O: WAV, AIFF, MP3, Ogg, ID3 tags
         • SDIF file support
         • Support for OSC, JACK, LADSPA, SDIF, VST, ASIO...
     • XML: any processing data or configuration has automatic XML persistence
     • GUI: visualization module based on the Qt toolkit, plus many ready-to-use graphical components (widgets)
  10. CLAM::3rd-party open-source libraries
      • FFTW (FFT)
      • Xerces-C++ and libxml (XML using the DOM API)
      • Qt GUI toolkit
      • RtAudio, PortAudio or DirectX (for Windows audio)
      • libsndfile, Ogg Vorbis, libmad (MP3), id3lib, for handling audio files
      • oscpack
      • libjack
      • CppUnit (testing framework, used for development)
      • pthreads (multithreading on Windows)
  11. CLAM::Applications
  12. CLAM::Prototyping
  13. NetworkEditor demo on YouTube
  14. Audio/Music Description and Annotation in CLAM
  15. Semantic Media 2.0 Glossary
      • Annotation
          • Auxiliary information about a piece of existing data
      • Metadata
          • Annotation that has been converted into a particular data format (e.g. XML)
          • Metadata = Annotation + Format
      • Descriptor
          • Atomic annotation, usually consisting of a (label, value) pair
      • Extractor
          • Tool that generates annotations automatically
  16. Aggregator + SemWeb Extractor Demo
  17. The CLAM Annotator
      • Flexible GUI for annotating and editing multi-level audio and music description
  18. Schemas and Descriptor Pools
      • Annotations are based on several XML files
      • Annotation Schema
          • Describes the number, type, and range of descriptors
          • Descriptors are defined at different levels and with different scopes (frame, segment, song...)
      • Descriptor Pool
          • File containing the actual values of the descriptors
          • May be generated by 3rd-party extractors as long as they observe the schema
  19. Aggregation
      • CLAM Aggregator: dynamic GUI for combining descriptors
      • Extracts subsets of descriptors from pools generated by different extractors, according to a dynamic configuration
      • The configuration is based on sources and maps
          • Sources: instantiate the schema, and define the extractors and the descriptor pool file suffix
          • Map: defines the selected attributes and maps their scope::attribute from the source to the target aggregated pool file
  20. Web Identification
      • The CLAM Aggregator includes a Web extractor based on GNAT [Raimond et al. 08]
          • GNAT uses audio fingerprinting and metadata to identify songs by their MusicBrainz identifiers (MBIDs)
          • It then outputs RDF statements linking local files with remote web identifiers
      • Using the MBID and the RDF statements, the extractor crawls through several Linked Open Data datasets
      • It extracts high-level descriptors such as editorial metadata, user comments, reviews, tags...
  21. Linked Open Data Web
      • The goal of the W3C SWEO Linking Open Data community project is to extend the Web by publishing various open data sets as RDF on the Web and by setting RDF links between items from different sources.
      • RDF links enable you to navigate from a data item to related data items within other sources using a Semantic Web browser. RDF links can also be followed by the crawlers of Semantic Web search engines.
      • 4.5 billion RDF triples, interlinked by around 180 million RDF links (March 2009).
  22. Conclusions
      • Problem: bridging the gap between low-level signal descriptions and the high-level semantic descriptions available on the web
      • Solution: identification + aggregation + Linked Open Data
      • This opens up a promising avenue for Semantic Web media services
  23. Visit us at... The CLAM Project
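Slide 6 says the CLAM network follows Dataflow Process Networks and can be scheduled statically or dynamically. The following is a minimal Python sketch of what a static schedule means, assuming the simplest case where every node fires once per iteration and passes a single value downstream: the firing order is computed once, by a topological sort, before anything runs. The names `run_static_schedule`, `nodes` and `inputs_of` are hypothetical and do not reflect CLAM's C++ API, its ports, or its buffering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

# A processing node consumes one value from each upstream node and produces
# one value per firing. Hypothetical illustration, not the CLAM API.
Node = Callable[[List[float]], float]


def run_static_schedule(nodes: Dict[str, Node],
                        inputs_of: Dict[str, List[str]]) -> Dict[str, float]:
    """Fire every node once, in an order computed before execution starts.

    `inputs_of` maps each node name to the names of the nodes feeding it,
    listed in the order the node expects its inputs.
    """
    # Static scheduling: the firing order is fixed up front by a topological sort.
    schedule = list(TopologicalSorter(inputs_of).static_order())
    outputs: Dict[str, float] = {}
    for name in schedule:
        upstream = [outputs[src] for src in inputs_of.get(name, [])]
        outputs[name] = nodes[name](upstream)
    return outputs


# Example network: a source feeds two branches that are mixed back together.
nodes: Dict[str, Node] = {
    "source": lambda _: 3.0,
    "gain": lambda xs: 2.0 * xs[0],
    "offset": lambda xs: xs[0] + 1.0,
    "mix": lambda xs: sum(xs),
}
inputs_of = {
    "source": [],
    "gain": ["source"],
    "offset": ["source"],
    "mix": ["gain", "offset"],
}
result = run_static_schedule(nodes, inputs_of)
```

A dynamic scheduler would instead pick the next node at run time, for instance whenever enough input data has arrived on its ports.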
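Slide 15 defines a descriptor as a (label, value) pair and metadata as annotation plus format. As a rough illustration of those two definitions, this Python sketch turns a list of descriptors (an annotation) into an XML string (metadata). The element and attribute names (`Annotation`, `Descriptor`, `label`, `item`) and the function `to_xml_metadata` are assumptions, not the CLAM Annotator's actual file format.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# A descriptor: an atomic annotation, i.e. a (label, value) pair.
Descriptor = Tuple[str, Union[str, float]]


def to_xml_metadata(item_id: str, annotation: List[Descriptor]) -> str:
    """Give an annotation a format (here XML), which turns it into metadata.

    Element and attribute names are hypothetical, not the CLAM schema.
    """
    root = ET.Element("Annotation", {"item": item_id})
    for label, value in annotation:
        node = ET.SubElement(root, "Descriptor", {"label": label})
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")


annotation: List[Descriptor] = [("Tempo", 120.0), ("Key", "A minor")]
xml_text = to_xml_metadata("song01.wav", annotation)
```

The same annotation could be given a different format (JSON, RDF) without changing its descriptors, which is the distinction the glossary draws.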
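Slide 18 separates the schema (which descriptors exist, with their type, range and scope) from the pool (their actual values). A minimal sketch of that split, assuming a pool is a nested dictionary `pool[scope][name]` holding one value per scope instance (one per frame, per segment, or a single value for the song). `DescriptorSpec` and `validate_pool` are hypothetical names; the real CLAM schemas and pools are XML files carrying more detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DescriptorSpec:
    scope: str   # e.g. "Song", "Segment", "Frame"
    name: str
    kind: type   # expected Python type of each value
    value_range: Optional[Tuple[float, float]] = None  # inclusive numeric bounds


def validate_pool(schema: List[DescriptorSpec],
                  pool: Dict[str, Dict[str, list]]) -> List[str]:
    """Return the problems found when checking `pool` against `schema`."""
    errors: List[str] = []
    for spec in schema:
        values = pool.get(spec.scope, {}).get(spec.name)
        if values is None:
            errors.append(f"missing {spec.scope}::{spec.name}")
            continue
        for i, v in enumerate(values):
            if not isinstance(v, spec.kind):
                errors.append(f"{spec.scope}::{spec.name}[{i}] has type {type(v).__name__}")
            elif spec.value_range is not None:
                lo, hi = spec.value_range
                if not lo <= v <= hi:
                    errors.append(f"{spec.scope}::{spec.name}[{i}]={v} out of range")
    return errors


schema = [
    DescriptorSpec("Song", "Genre", str),
    DescriptorSpec("Frame", "Energy", float, (0.0, 1.0)),
]
good_pool = {"Song": {"Genre": ["Jazz"]}, "Frame": {"Energy": [0.1, 0.5, 0.9]}}
bad_pool = {"Frame": {"Energy": [0.2, 1.7]}}
```

Because the check only depends on the schema, any 3rd-party extractor that writes a pool in this shape can be validated the same way.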
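Slide 19 describes aggregation driven by sources and maps. The following sketch assumes a source can be reduced to an extractor name plus a pool file suffix, and that each map entry copies one `scope::attribute` from a source pool into the aggregated pool, possibly under a new name. `Source`, `aggregate`, the extractor names and the in-memory "pool files" are hypothetical stand-ins for the Aggregator's real configuration and XML pools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pool = Dict[str, Dict[str, list]]  # pool[scope][attribute] -> values


@dataclass(frozen=True)
class Source:
    extractor: str  # command that generates the pool (hypothetical name)
    suffix: str     # descriptor pool file suffix appended to the audio file name


# Map entry: (source id, "scope::attribute" in the source, "scope::attribute" in the target)
MapEntry = Tuple[str, str, str]


def aggregate(audio_file: str,
              sources: Dict[str, Source],
              mapping: List[MapEntry],
              load_pool: Callable[[str], Pool]) -> Pool:
    """Build the aggregated pool by copying only the mapped attributes."""
    loaded: Dict[str, Pool] = {}  # load each source pool at most once
    target: Pool = {}
    for source_id, src_key, dst_key in mapping:
        if source_id not in loaded:
            loaded[source_id] = load_pool(audio_file + sources[source_id].suffix)
        src_scope, src_attr = src_key.split("::")
        dst_scope, dst_attr = dst_key.split("::")
        values = loaded[source_id][src_scope][src_attr]
        target.setdefault(dst_scope, {})[dst_attr] = values
    return target


# In-memory stand-ins for pool files produced by two hypothetical extractors.
pool_files: Dict[str, Pool] = {
    "song.wav.chords.pool": {"Song": {"Key": ["A"], "Mode": ["minor"]},
                             "Frame": {"Chord": ["Am", "Am", "F"]}},
    "song.wav.web.pool": {"Song": {"Tags": [["jazz", "live"]]}},
}
sources = {
    "chords": Source(extractor="ChordExtractor", suffix=".chords.pool"),
    "web": Source(extractor="SemWebExtractor", suffix=".web.pool"),
}
mapping = [
    ("chords", "Song::Key", "Song::Key"),
    ("chords", "Frame::Chord", "Frame::Chord"),
    ("web", "Song::Tags", "Song::WebTags"),
]
aggregated = aggregate("song.wav", sources, mapping, pool_files.__getitem__)
```

Only mapped attributes reach the target pool (here `Song::Mode` is left out), which is how a subset of each extractor's output ends up in one aggregated pool.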
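Slide 20 says GNAT outputs RDF statements linking local files to remote web identifiers. This sketch shows what one such statement could look like as an N-Triples line, assuming the identifier is a MusicBrainz ID (a UUID). The predicate and base URI are placeholders on `example.org`, not the vocabulary GNAT actually emits; `link_statement` is a hypothetical helper and the MBID in the example is made up.

```python
import uuid
from pathlib import Path

# Placeholder vocabulary: the predicate and URIs GNAT really uses may differ.
LINK_PREDICATE = "http://example.org/vocab#identifiedAs"
TRACK_BASE = "http://example.org/musicbrainz/track/"


def link_statement(local_file: str, mbid: str) -> str:
    """Return one N-Triples line linking a local audio file to a web identifier."""
    canonical = str(uuid.UUID(mbid))  # raises ValueError if mbid is not a UUID
    file_uri = Path(local_file).absolute().as_uri()
    return f"<{file_uri}> <{LINK_PREDICATE}> <{TRACK_BASE}{canonical}> ."


statement = link_statement("/music/song01.wav", "6F9619FF-8B86-D011-B42D-00C04FC964FF")
```

Normalizing the MBID through `uuid.UUID` keeps identifiers in one canonical lowercase form, so statements produced for the same track compare equal.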
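Slide 21 notes that RDF links can be followed from one dataset to another, which is what the Web extractor does starting from the MBID. The sketch below imitates that crawl with a bounded breadth-first traversal over an in-memory list of triples, following only chosen link predicates and collecting the values of descriptor predicates along the way. A real crawler would dereference each URI over HTTP; the abbreviated predicates (`owl:sameAs`, `ex:tag`, ...), the `example.org` URIs, the data, and the `crawl` function are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def crawl(start: str, triples: List[Triple], follow: Set[str],
          collect: Set[str], max_depth: int = 2) -> Dict[str, List[str]]:
    """Breadth-first walk from `start`, following `follow` links up to `max_depth` hops."""
    by_subject: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    found: Dict[str, List[str]] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for p, o in by_subject.get(node, []):
            if p in collect:                      # a high-level descriptor value
                found.setdefault(p, []).append(o)
            elif p in follow and o not in seen and depth < max_depth:
                seen.add(o)                       # an RDF link into another dataset
                queue.append((o, depth + 1))
    return found


track = "http://example.org/musicbrainz/track/6f9619ff-8b86-d011-b42d-00c04fc964ff"
reviews = "http://example.org/reviews/item/42"
tags = "http://example.org/tags/item/7"
triples: List[Triple] = [
    (track, "owl:sameAs", reviews),
    (track, "ex:title", "Some Title"),
    (reviews, "ex:review", "A great live take."),
    (reviews, "owl:sameAs", tags),
    (tags, "ex:tag", "jazz"),
    (tags, "ex:tag", "live"),
]
wanted = {"ex:title", "ex:review", "ex:tag"}
found = crawl(track, triples, follow={"owl:sameAs"}, collect=wanted)
shallow = crawl(track, triples, follow={"owl:sameAs"}, collect=wanted, max_depth=1)
```

The depth bound matters in practice: the Linked Open Data graph is large and densely interlinked, so an unbounded crawl from one track could wander far from the song being described.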