Ontology-Based Audio Information Retrieval System

Presentation of the Master's Thesis
by Lucas Mußmächer

12 December 2017
Contents
● Introduction and Motivation
● Objectives of this Thesis
● Graphical Prototype
● Characteristics of the Dataset
● Data Preprocessing Pipeline
● Data Indexing and Visualisation
● Use Case Testing
● Summary

Student Radio Group
CampusCrew Passau
● Founded in 1995 by three students
● Group consists of 30 to 50 students (2017)
● Produces radio shows on various topics (e.g. politics)
● Episodes contain speech data as well as music data

Objectives of this Thesis
Implementation of a Retrieval System
● Requirements are gathered through a small user survey
● Design of a graphical prototype
Reusability of Data
● Accessibility to audio files in the file archive
● Reuse of old radio shows (audio files)

Motivation
Concept-based Search Functionality
● Keywords and concepts are indexed
● Search for concepts without using keywords
● Navigation between concepts
● Exploratory search with concepts

Modern Audio Retrieval Systems
Metadata Analysis vs Content Analysis
Metadata Analysis
● Does not analyse the speech content
● Content is ignored
Content Analysis
● Analysis of the content
● Indexing of the content
● Speech Recognition

Analysis of Multimedia Files
Content Analysis of Multimedia Data
● Descriptive text is very incomplete
● Lack of metadata
● Content analysis is very time-consuming
Goal: Analysis of the speech data

Ontology
"Ontology is a formal naming and definition of the types,
properties, and interrelationships of the entities that
fundamentally exist within a particular domain."
● Entities are instances of concepts
● Attributes are used to describe concepts precisely
My Approach
● Focus on concepts, not on attributes
● Entities are defined as keywords in a text
● A concept is defined as a set of keywords

User Requirements
● Five members of the student group were selected
● Users were able to vote on features
Collecting Features for the Retrieval System
● Features were discussed in the group
Ranking of the collected Features
● Features are ranked by user votes

List of User Requirements
The prototype was developed based on the user requirements

Graphical Prototype (HTML Interface)
Search Interface
Search Result

Characteristics of the Dataset
● File archive consists of 2 terabytes of data
● Total number of files: 265,786
● Classified audio files: 155,085
● Speech files: 14,281 (645 hours of speech)
● Data set was reduced to 30 megabytes of text data

System Architecture (Python 3.5)
● View - HTML Interface (HTML5, CSS3, YAML)
● Model - Data Processing (Sphinx, spaCy, NLTK, pydub, TinyTag)
● Controller - Web Server (Flask, Whoosh)

Data Processing (Model)
● Audio files are split into multiple audio segments
● Recognized speech is stored as text blocks

Data Filtering
● Metadata (ID3 tags) of the audio file is analysed
● Speech data is separated from music data
● Redundant audio files are removed
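The slides do not say how redundant files were detected or which ID3 fields separated speech from music. The following is a minimal sketch under two assumptions: redundancy means byte-identical content (detected with a SHA-256 digest), and a genre tag value decides whether a file counts as music. The function names and the genre list are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest used to detect byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def filter_files(files, music_genres=frozenset({"music", "pop", "rock"})):
    """Keep one copy of each file and drop files tagged with a music genre.

    `files` is an iterable of (name, genre, data) tuples. `genre` stands in
    for an ID3 tag value (assumption) and may be None when the tag is missing.
    """
    seen = set()
    kept = []
    for name, genre, data in files:
        digest = content_digest(data)
        if digest in seen:
            continue  # redundant copy of a file that was already processed
        seen.add(digest)
        if genre is not None and genre.lower() in music_genres:
            continue  # treated as music, not speech
        kept.append(name)
    return kept
```

In the real pipeline the tag values would come from a metadata reader such as TinyTag, and the bytes from the files in the archive.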

Audio Processing
● Information reduction (Audio processing)
● Loudness adaptation (Normalization of the sound volume)
● Audio segmentation into multiple blocks (t = 60 seconds)
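The segmentation step can be illustrated by computing the block boundaries for a recording of known length. This is a sketch only: it works on durations in seconds rather than on audio data, and the function name is hypothetical. How the thesis handles the final, shorter block is not stated; here it is simply kept.

```python
def segment_bounds(duration_s, block_s=60.0):
    """Return (start, end) pairs in seconds that cover the whole recording.

    The last block may be shorter than `block_s` (assumption: it is kept).
    """
    if duration_s <= 0 or block_s <= 0:
        return []
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + block_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds
```

With an audio library such as pydub, each pair could then be used to slice the loaded recording (pydub slices are expressed in milliseconds).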

Parallel Speech Recognition
● Speech recognition framework: Sphinx 4
● German language model: VoxForge (27,000 words)
● Parallel speech recognition: N Sphinx processes

Speech Recognition Hypothesis
"The hypothesis of each audio segment is composed of a
sequence of words, with different probabilities"
● The framework returns a list of hypotheses
Only the best hypothesis is used for indexing!
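Selecting the best hypothesis can be sketched as follows. The slides do not state how Sphinx scores its hypotheses, so this example assumes each hypothesis is a list of (word, probability) pairs and ranks hypotheses by the sum of the log word probabilities. Both the data layout and the scoring rule are assumptions for illustration.

```python
import math


def hypothesis_score(hypothesis):
    """Sum of log word probabilities; all probabilities must be > 0."""
    return sum(math.log(prob) for _, prob in hypothesis)


def best_hypothesis(hypotheses):
    """Return the highest-scoring hypothesis, or None for an empty list."""
    if not hypotheses:
        return None
    return max(hypotheses, key=hypothesis_score)


def hypothesis_text(hypothesis):
    """Join the words of a hypothesis into the text block that is indexed."""
    return " ".join(word for word, _ in hypothesis)
```

Note that a plain log-probability sum favours shorter hypotheses; a recognizer's own score would normally be used instead.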

Results of the Speech Recognition
● Average number of words in each audio segment: 86
● 2.8% of the words had low probabilities (Probability Filtering)
● Average Word Error Rate of the speech recognition: 42%
Reasons
● Multiple speakers and microphones
● Fast voice recording
● 21,328 individual words were recognized
● Word coverage rate of the language model was 78%
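For reference, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below implements that standard definition; whether the thesis applied any additional normalization before measuring is not stated.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A rate of 42% therefore means roughly four word-level edits for every ten reference words.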

Information Extraction
● Relevant keywords are extracted from each text block
● Keywords are ranked by a formula
● Keywords are mapped to a set of concepts

Part of Speech Tagging
● Words with the part-of-speech tags ADJ, NOUN, PROPN
are classified as relevant keywords
● Words with different part-of-speech tags are ignored
● 9.9% of all words are relevant keywords
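The tag filter can be sketched on tokens that are already tagged. The input here is a list of (word, tag) pairs so the example runs without a tagger; with spaCy, the tag of a token is available as `token.pos_`. The function name and the sample sentence are hypothetical.

```python
RELEVANT_TAGS = {"ADJ", "NOUN", "PROPN"}


def relevant_keywords(tagged_tokens):
    """Keep only words whose part-of-speech tag marks them as keywords."""
    return [word for word, tag in tagged_tokens if tag in RELEVANT_TAGS]


# Hypothetical tagged text block
tagged = [
    ("die", "DET"),
    ("neue", "ADJ"),
    ("Sendung", "NOUN"),
    ("läuft", "VERB"),
    ("in", "ADP"),
    ("Passau", "PROPN"),
]
keywords = relevant_keywords(tagged)
```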

Keyword Ranking
● Relevance of each keyword is estimated by a
length-normalized TF-IDF function
● Highest ranked keywords are selected as input
for the similarity computation
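The exact ranking formula is not reproduced in the slides. As a minimal sketch, the example below assumes one common form of length-normalized TF-IDF: the keyword count divided by the block length, multiplied by the logarithm of the number of blocks over the number of blocks containing the keyword. Function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(blocks):
    """Score keywords per block; `blocks` is a list of keyword lists."""
    n_blocks = len(blocks)
    doc_freq = Counter()
    for block in blocks:
        doc_freq.update(set(block))  # count each keyword once per block
    scores = []
    for block in blocks:
        counts = Counter(block)
        length = len(block)
        scores.append({
            kw: (count / length) * math.log(n_blocks / doc_freq[kw])
            for kw, count in counts.items()
        })
    return scores


def top_keywords(block_scores, k=3):
    """Return the k highest-scoring keywords (ties broken alphabetically)."""
    ranked = sorted(block_scores.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:k]]
```

Under this form, a keyword that appears in every block scores zero, so only keywords that distinguish a block reach the similarity computation.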

Results of the Keyword Ranking

Concept Extraction
● Extracted keywords were mapped to 95 concepts
● Set of concepts was manually modeled
● Manual creation was a very time-consuming task
● Encoded in the ontology language RDF
Figures: Concept Hierarchy, Concept Definition
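Since a concept is treated as a set of keywords, the mapping step can be sketched as a set intersection. The dictionary below is a stand-in for the RDF model, and its two concepts and their keywords are invented for illustration only.

```python
# Hypothetical concept definitions (the thesis models 95 concepts in RDF)
CONCEPTS = {
    "politics": {"wahl", "partei", "bundestag"},
    "music": {"band", "konzert", "album"},
}


def extract_concepts(keywords, concepts=CONCEPTS):
    """Return the names of all concepts that share a keyword with the block."""
    found = set(keywords)
    return sorted(name for name, members in concepts.items() if found & members)
```

A text block with no matching keyword yields an empty list, which corresponds to the blocks for which no concept was extracted.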

Results of the Concept Extraction
● Text blocks with a minimum of one concept: 59%
● On average 2.34 concepts were extracted from each block

Data Indexing
● Results of the data processing steps were merged into one
common data structure
● Keywords and concepts were indexed by the retrieval system
● Ranking and weighting scheme: Okapi BM25
Example for one entry (audio segment)
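The Okapi BM25 weighting named above can be sketched in plain Python. The system itself relies on Whoosh for this; the version below only illustrates the formula, assuming the common parameter values k1 = 1.2 and b = 0.75 and a non-negative variant of the inverse document frequency.

```python
import math


def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Score one document (a list of terms) against a list of query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    if avg_len == 0:
        return 0.0  # empty collection: nothing can match
    score = 0.0
    for term in query_terms:
        doc_freq = sum(1 for d in docs if term in d)
        if doc_freq == 0:
            continue  # term does not occur anywhere in the collection
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        tf = doc.count(term)
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score
```

Term frequency raises the score with diminishing returns, and documents longer than average are penalized through the length normalization.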

Evaluation
Search Tasks (5 minutes per task)
● Students had to complete six search tasks
● Search tasks were grouped into three levels
of difficulty: easy, medium and hard

Results of the Evaluation
● Average success rate in all search tasks: 85%
● In almost every case, the students used the normal
keyword-based search interface
● Common search strategy (60%): Combining several
specific keywords with the search operator (AND)

User Interviews
Four students were asked to answer questions about:
● Visual design of the interface
● Integration of concepts in the search process
● Overall usability of the system

Result of the User Interviews
● Visual design: very clear and structured
● Set of concepts could be displayed in a tree structure
Concept Search and Navigation
● Usefulness of the concept search: very low
● Concept definition was not clear
● Students preferred the classical keyword search
interface

Summary of the Thesis
● Many data processing steps were needed
● Speech recognition was the bottleneck in the processing
● Creation of the concepts was very time-consuming
● Word error rate of the speech recognition: 42%
● Lack of information on the German language model
● Concept-based search functionality was not used for
solving the search tasks
● Students considered the system very useful ;)

Final System
http://nlp-multimedia.fim.uni-passau.de
OR https://goo.gl/oHLLWv
● The retrieval system was introduced in October 2017
● Only available in the local university network (OpenVPN)

Future Work
● Disambiguation problem can be solved by integrating other
knowledge databases (Linked Open Data Cloud)
● New automatic concept detection algorithms
● Identification of the speaker in the audio signal
● Deep Neural Networks (DNN) for the speech recognition
to improve recognition accuracy
● Analysis of past search queries in order to
understand the knowledge discovery process in
exploratory search
● Design of new search interfaces for a more efficient
way of formulating queries with concepts

Data Visualization
● Keywords are displayed as a text cloud
● Concepts are visualized as orange buttons supporting navigation
● Keywords that occur frequently are highlighted

Navigation between Concepts
● The interface enables navigation between concepts
● Related concepts of the search query are shown to the user

Similarity Computation
● A and B denote the sets of documents that contain the
keywords a and b
● Counting the frequency of the keywords in the
same text blocks (documents)
● Implemented by a modified version of the
Jaccard Coefficient
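The slides do not describe how the Jaccard coefficient was modified, so the sketch below shows only the standard form: the number of text blocks containing both keywords divided by the number of blocks containing at least one of them. Function names and the sample blocks are hypothetical.

```python
def docs_containing(keyword, blocks):
    """Return the indices of all text blocks that contain the keyword."""
    return {i for i, block in enumerate(blocks) if keyword in block}


def jaccard(a_docs, b_docs):
    """Standard Jaccard coefficient |A ∩ B| / |A ∪ B| of two document sets."""
    a, b = set(a_docs), set(b_docs)
    union = a | b
    if not union:
        return 0.0  # neither keyword occurs in any block
    return len(a & b) / len(union)


# Hypothetical text blocks, each reduced to its keywords
blocks = [["wahl", "partei"], ["wahl", "band"], ["partei"], ["wahl", "partei"]]
similarity = jaccard(docs_containing("wahl", blocks),
                     docs_containing("partei", blocks))
```

Here the two keywords share two of four blocks, so the similarity is 0.5.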

Results of the Similarity Computation
