12 December 2017
Contents
● Introduction and Motivation
● Objectives of this Thesis
● Graphical Prototype
● Characteristics of the Dataset
● Data Preprocessing Pipeline
● Data Indexing and Visualisation
● Use Case Testing
● Summary
Student Radio Group
CampusCrew Passau
● Founded in 1995 by three students
● Group consists of 30 to 50 students (2017)
● Produce radio shows on various topics (e.g. politics)
● Episodes contain speech data as well as music data
Objectives of this Thesis
Implementation of a Retrieval System
● Requirements were gathered through a small user survey
● Design of a graphical prototype
Reusability of Data
● Access to the audio files in the file archive
● Reuse of old radio shows (audio files)!
Motivation
Concept-based Search Functionality
● Keywords and concepts are indexed
● Search for concepts without using keywords
● Navigation between concepts
● Exploratory search with concepts
Modern Audio Retrieval Systems
Metadata Analysis vs. Content Analysis
Metadata Analysis
● Does not analyse the speech content
● Content is ignored
Content Analysis
● Analysis of the content
● Indexing of the content
● Speech recognition
Analysis of Multimedia Files
Content Analysis of Multimedia Data
● Descriptive text is very incomplete
● Lack of metadata
● Content analysis is very time-consuming
Goal: analysis of the speech data
Ontology
"Ontology is a formal naming and definition of the types,
properties, and interrelationships of the entities that
fundamentally exist within a particular domain."
● Entities are instances of concepts
● Attributes are used to describe concepts precisely
My Approach
● Focusing on concepts not on attributes
● Entities are defined as keywords in a text
● Concept is defined as a set of keywords
User Requirements
● 5 members of the student group were selected
● Users were able to vote on features
Collecting Features for the Retrieval System
● Features were discussed in the group
Ranking of the collected Features
● Features were ranked by user votes
List of User Requirements
Prototype was developed based on user requirements
Graphical Prototype (HTML Interface)
Search Interface
Search Result
Characteristics of the Dataset
● File archive consists of 2 terabytes of data
● Total number of files: 265,786
● Classified audio files: 155,085
● Speech files: 14,281 (645 hours of speech)
● Data set was reduced to 30 megabytes of text data
System Architecture (Python 3.5)
● View - HTML Interface (HTML5, CSS3, Yaml)
● Model - Data Processing (Sphinx, spaCy, NLTK, PyDub, TinyTag)
● Controller - Web Server (Flask, Whoosh)
Data Processing (Model)
● Audio files are split into multiple audio segments
● Recognized speech is stored as text blocks
Data Filtering
● Metadata (ID3 tags) of the audio file is analysed
● Speech data is separated from music data
● Redundant audio files are removed
Audio Processing
● Information reduction (Audio processing)
● Loudness adaptation (normalization of the sound volume)
● Audio segmentation into multiple blocks (t = 60 seconds)
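The segmentation step can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `segment_bounds` is hypothetical, and it only computes the block boundaries for a recording of known duration. With PyDub, each pair could then be used to slice an `AudioSegment` (which is indexed in milliseconds).

```python
def segment_bounds(duration_s, block_s=60.0):
    """Return (start, end) pairs in seconds that cover the whole recording.

    Hypothetical helper: every block is block_s long except the last one,
    which holds whatever remains.
    """
    if duration_s < 0 or block_s <= 0:
        raise ValueError("duration must be >= 0 and block length > 0")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + block_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 150-second recording yields two full blocks and one 30-second rest.
    for start, end in segment_bounds(150):
        print(start, end)
```

Fixed-length blocks keep the later per-segment recognition jobs roughly equal in size, at the cost of occasionally cutting through a word at a block boundary.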
Parallel Speech Recognition
● Speech recognition framework: Sphinx 4
● German language model: Voxforge (27,000 words)
● Parallel speech recognition: N Sphinx processes
Speech Recognition Hypothesis
"The hypothesis of each audio segment is composed of a
sequence of words, with different probabilities"
● Framework returns a list of hypotheses
Only the best hypothesis is used for indexing!
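Selecting the best hypothesis can be illustrated with a short sketch. The data layout is an assumption: each hypothesis is taken to be a `(text, score)` pair where a higher score means a more likely transcription. The actual Sphinx result objects are not shown in the slides, and the example strings and scores are invented.

```python
def best_hypothesis(hypotheses):
    """Return the text of the highest-scoring hypothesis, or None if empty.

    hypotheses: list of (text, score) pairs (assumed layout, higher is better).
    """
    if not hypotheses:
        return None
    text, _score = max(hypotheses, key=lambda pair: pair[1])
    return text


if __name__ == "__main__":
    # Invented n-best list for one audio segment.
    n_best = [("guten morgen passau", -12.5), ("gute morgen passau", -14.0)]
    print(best_hypothesis(n_best))
```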
Results of the Speech Recognition
● Average number of words in each audio segment: 86
● 2.8% of the words had low probabilities (probability filtering)
● Average word error rate of the speech recognition: 42%
Reasons
● Multiple speakers and microphones
● Fast voice recording
● 21,328 individual words were recognized
● Word coverage rate of the language model was 78%
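For reference, the word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance; the function name is illustrative and this is not necessarily the tooling used for the 42% figure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing in the hypothesis.
    print(word_error_rate("das ist ein test", "das ist test"))
```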
Information Extraction
● Relevant keywords are extracted from each text block
● Keywords are ranked by a formula
● Keywords are mapped to a set of concepts
Part of Speech Tagging
● Words with the part-of-speech tags ADJ, NOUN, PROPN
are classified as relevant keywords
● Words with other part-of-speech tags are ignored
● 9.9% of all words are relevant keywords
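The tag filter can be sketched as below. To keep the example self-contained it works on pre-tagged `(word, tag)` pairs; in a spaCy pipeline the tag would come from `token.pos_`. The sample sentence and its tags are invented for illustration.

```python
RELEVANT_TAGS = {"ADJ", "NOUN", "PROPN"}


def extract_keywords(tagged_tokens):
    """Keep only the words whose part-of-speech tag marks them as relevant."""
    return [word for word, tag in tagged_tokens if tag in RELEVANT_TAGS]


if __name__ == "__main__":
    # Hand-tagged sample text block (assumed tags, not tagger output).
    block = [("Die", "DET"), ("neue", "ADJ"), ("Sendung", "NOUN"),
             ("läuft", "VERB"), ("in", "ADP"), ("Passau", "PROPN")]
    print(extract_keywords(block))
```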
Keyword Ranking
● Relevance of each keyword is estimated by a
length-normalized TF-IDF function
● Highest ranked keywords are selected as input
for the similarity computation
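The slides do not give the exact ranking formula, so the following is only one common length-normalized variant, stated as an assumption: the term frequency is divided by the block length and multiplied by log(N / df), where N is the number of text blocks and df the number of blocks containing the term. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: score} dict per document (list of keyword lists)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score_map, k):
    """Return the k best-scoring terms; ties are broken alphabetically."""
    ranked = sorted(score_map.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    blocks = [["radio", "passau", "radio"], ["radio", "musik"]]
    print(top_keywords(tfidf_scores(blocks)[0], 1))
```

A term that occurs in every block gets a score of zero under this variant, which is why frequent but uninformative words drop out of the top ranks.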
Concept Extraction
● Extracted keywords were mapped to 95 concepts
● Set of concepts was manually modeled
● Manual creation was a very time-consuming task
● Encoded in the ontology language RDF
Figures: Concept Hierarchy and Concept Definition
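Since a concept is defined as a set of keywords, the mapping step can be sketched as a set intersection. The two concepts and their keywords below are invented for illustration; the 95 manually modeled concepts were stored in RDF and are not reproduced here.

```python
def match_concepts(keywords, concepts):
    """Return the names of all concepts sharing at least one keyword.

    concepts: dict mapping a concept name to its set of keywords.
    """
    found = set(keywords)
    return sorted(name for name, terms in concepts.items() if found & terms)


if __name__ == "__main__":
    # Hypothetical concept definitions, not taken from the thesis ontology.
    concepts = {
        "Politik": {"wahl", "partei", "bundestag"},
        "Sport": {"fussball", "verein"},
    }
    print(match_concepts(["wahl", "passau"], concepts))
```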
Results of the Concept Extraction
● Text blocks with a minimum of one concept: 59%
● On average 2.34 concepts were extracted from each block
Data Indexing
● Results of the data processing steps were merged into one
common data structure
● Keywords and concepts were indexed by the retrieval system
● Ranking and weighting scheme: Okapi BM25
Figure: example of one index entry (audio segment)
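To make the ranking scheme concrete, here is a plain-Python sketch of the textbook Okapi BM25 score. It is not the Whoosh implementation used by the system. The parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here, and the IDF uses the non-negative "+1" variant.

```python
import math


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score every document (a list of terms) against the query terms."""
    if not documents:
        return []
    n_docs = len(documents)
    # Fall back to 1.0 so that a collection of empty documents does not divide by zero.
    avg_len = sum(len(doc) for doc in documents) / n_docs or 1.0
    scores = []
    for doc in documents:
        score = 0.0
        for term in query_terms:
            doc_freq = sum(1 for other in documents if term in other)
            if doc_freq == 0:
                continue  # term occurs nowhere, contributes nothing
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
            tf = doc.count(term)
            # Longer-than-average documents are penalised through b.
            norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    segments = [["radio", "passau", "politik"], ["musik", "radio"], ["musik"]]
    print(bm25_scores(["musik"], segments))
```

With equal term frequency, the shorter segment scores higher, which is the length normalization that distinguishes BM25 from plain TF-IDF.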
Evaluation
Search Tasks (5 minutes per task)
● Students had to complete six search tasks
● Search tasks were grouped into three levels
of difficulty: easy, medium and hard
Results of the Evaluation
● Average success rate in all search tasks: 85%
● In almost every case the students used the normal
keyword-based search interface!
● Common search strategy (60%): Combining several
specific keywords with the search operator (AND)
User Interviews
Four students were asked to answer questions about:
● Visual design of the interface
● Integration of concepts in the search process
● Overall usability of the system
Result of the User Interviews
● Visual design: very clear and structured
● Set of concepts could be displayed in a tree structure
Concept Search and Navigation
● Usefulness of the concept search: very low
● Concept definition was not clear
● Students preferred the classical keyword search interface
Summary of the Thesis
● Many data processing steps were needed
● Speech recognition was the bottleneck in the processing
● Creation of the concepts was very time-consuming
● Word error rate of the speech recognition: 42%
● Lack of information on the German language model
● Concept-based search functionality was not used for
solving the search tasks
● Students considered the system very useful ;)
Final System
http://nlp-multimedia.fim.uni-passau.de
OR https://goo.gl/oHLLWv
● The retrieval system was introduced in October 2017
● Only available within the local university network (OpenVPN)
Future Work
● Disambiguation problem can be solved by integrating other
knowledge databases (Linked Open Data Cloud)
● New automatic concept detection algorithms
● Identification of the speaker in the audio signal
● Deep Neural Networks (DNN) for the speech recognition
to improve recognition accuracy
● Analysis of past search queries in order to
understand the knowledge discovery process in
exploratory search
● Design of new search interfaces for a more efficient
way of formulating queries with concepts
Data Visualization
● Keywords are displayed as a text cloud
● Concepts are visualized as orange buttons that support navigation
● Keywords that occur frequently are highlighted
Navigation between Concepts
● The interface lets users navigate between concepts
● Related concepts of the search query are shown to the user
Similarity Computation
● A and B denote the sets of documents (text blocks) that
contain the keywords a and b
● Co-occurrences of the keywords in the same text blocks
(documents) are counted
● Implemented by a modified version of the
Jaccard Coefficient
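The slides do not describe the modification, so the sketch below shows only the standard Jaccard coefficient |A ∩ B| / |A ∪ B| over the two document sets. The document IDs in the example are invented.

```python
def jaccard(docs_a, docs_b):
    """Standard Jaccard coefficient of two collections of document IDs."""
    a, b = set(docs_a), set(docs_b)
    union = a | b
    if not union:
        return 0.0  # neither keyword occurs in any document
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Keyword a occurs in blocks 1-3, keyword b in blocks 2-4 (made-up IDs).
    print(jaccard({1, 2, 3}, {2, 3, 4}))
```

Two keywords that always appear in the same text blocks reach a similarity of 1.0, and keywords that never co-occur get 0.0.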