Indexing and Retrieval of 
Audio 
Rachmat Wahid Saleh Insani, S.Kom 
Multimedia Database Management System - Chapter 5
Introduction 
• Audio is classified into three types: speech, music, 
and noise. 
• Different audio types are processed and indexed in 
different ways. 
• Query audio pieces are similarly classified, processed, 
and indexed. 
• Audio pieces are retrieved based on similarity between 
the query index and the audio index in the database. 
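As a minimal sketch of the last point, retrieval can be pictured as ranking stored pieces by the distance between their index and the query index. The feature triples, piece names, and the choice of Euclidean distance below are assumptions for illustration, not something the slides prescribe.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_index, database, top_k=2):
    """Return the names of the top_k pieces whose index is closest to the query."""
    ranked = sorted(database, key=lambda name: euclidean(query_index, database[name]))
    return ranked[:top_k]

# Hypothetical index: piece name -> (average energy, zero crossing rate, silence ratio)
database = {
    "news_clip": (0.10, 0.30, 0.45),
    "piano_piece": (0.40, 0.08, 0.05),
    "interview": (0.12, 0.28, 0.50),
}

result = retrieve((0.11, 0.29, 0.48), database)
print(result)
```

A smaller distance means a more similar piece, so the two speech-like entries rank ahead of the music entry for this speech-like query.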
Objectives 
• Main audio properties and features. 
• Audio classification. 
• Main speech recognition techniques. 
• General approach in indexing and retrieval. 
• Temporal and content relationship between media 
types. 
Main Audio Properties and 
Features 
• Time domain 
• Frequency domain 
Features Derived in the Time Domain 
A signal is represented as amplitude varying with time. 
• Average energy: E = \frac{1}{N} \sum_{n=0}^{N-1} x(n)^2 
• Zero crossing rate: ZC = \frac{1}{2N} \sum_{n=1}^{N} \left| \operatorname{sgn} x(n) - \operatorname{sgn} x(n-1) \right| 
• Silence ratio 
Here x(n) is the value of sample n and N is the number of samples.
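The three time-domain features can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the samples are indexed 0 to N-1, so the zero crossing sum here runs over n = 1 to N-1, and the silence ratio is taken as the fraction of samples lying in runs of low-amplitude samples, with a hypothetical `threshold` and `min_run` that the slides do not specify.

```python
def sign(v):
    """sgn(v): 1 for positive, -1 for negative, 0 for zero."""
    return (v > 0) - (v < 0)

def average_energy(x):
    """Mean of the squared sample values."""
    return sum(s * s for s in x) / len(x)

def zero_crossing_rate(x):
    """Sum of |sgn x(n) - sgn x(n-1)| over consecutive samples, divided by 2N."""
    n = len(x)
    return sum(abs(sign(x[i]) - sign(x[i - 1])) for i in range(1, n)) / (2 * n)

def silence_ratio(x, threshold=0.05, min_run=3):
    """Fraction of samples in runs of at least min_run samples below threshold.

    threshold and min_run are hypothetical values chosen for the example.
    """
    silent = 0
    run = 0
    for s in x:
        if abs(s) < threshold:
            run += 1
        else:
            if run >= min_run:
                silent += run
            run = 0
    if run >= min_run:  # count a silent run that reaches the end of the signal
        silent += run
    return silent / len(x)

signal = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
print(average_energy(signal), zero_crossing_rate(signal), silence_ratio(signal))
```

In practice these features are usually computed per short frame rather than over a whole recording, but the arithmetic is the same.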
Features Derived from the Frequency Domain 
• Sound spectrum 
• Bandwidth 
• Energy distribution 
• Harmonicity 
• Pitch
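To make two of these features concrete, the sketch below computes a magnitude spectrum with a direct (slow) discrete Fourier transform, then derives a bandwidth and a spectral centroid from it. The definitions are assumptions for illustration: bandwidth is taken as the span between the lowest and highest frequency whose magnitude exceeds a hypothetical fraction of the peak, and the centroid is used as one possible summary of energy distribution.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2, computed directly from the definition."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def bin_frequencies(n, sample_rate):
    """Frequency in Hz of each bin returned by magnitude_spectrum."""
    return [k * sample_rate / n for k in range(n // 2 + 1)]

def bandwidth(freqs, mags, floor_ratio=0.1):
    """Span between the lowest and highest frequency above floor_ratio * peak."""
    peak = max(mags)
    active = [f for f, m in zip(freqs, mags) if m >= floor_ratio * peak]
    return max(active) - min(active)

def spectral_centroid(freqs, mags):
    """Energy-weighted mean frequency."""
    energy = [m * m for m in mags]
    return sum(f * e for f, e in zip(freqs, energy)) / sum(energy)

# Test signal: two equal-amplitude tones at 4 Hz and 12 Hz, sampled at 64 Hz
sample_rate = 64
n = 64
signal = [
    math.sin(2 * math.pi * 4 * t / sample_rate)
    + math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(n)
]
freqs = bin_frequencies(n, sample_rate)
mags = magnitude_spectrum(signal)
print(bandwidth(freqs, mags), spectral_centroid(freqs, mags))
```

A real system would use a fast Fourier transform on short windowed frames; the direct sum is used here only to keep the example dependency-free.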
Timbre 
• Quality of a sound. 
Audio Classification 
Why is audio classification important? 
- Different audio types require different processing, indexing, and retrieval techniques. 
- Different audio types have different significance to different applications. 
- Speech is an important audio type for which successful speech recognition techniques are available. 
- The audio type itself is very useful to some applications. 
- After classification, the search space during retrieval is reduced to a particular audio class. 
• The classification considered here separates two types of sound: speech and music. 
Main Characteristics 
Music 
• Music has a frequency range from 16 to 20,000 Hz. 
• Music has a low silence ratio. 
• Music has regular beats. 
Speech 
• Speech has a frequency range from 100 to 7,000 Hz. 
• Speech has a high silence ratio. 
• Speech has no regular beats.
Audio Classification 
Frameworks 
• Step by Step Classification 
• Feature Vector Based Audio Classification 
Step by Step 
Classification 
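One way to read this framework is that features are tested one at a time, each either deciding the class or passing the piece on to the next test. The sketch below follows that reading using the music and speech characteristics listed earlier. The 7,000 Hz limit comes from the speech frequency range on the slides; the order of the tests and the silence ratio threshold are hypothetical choices for illustration.

```python
def classify_step_by_step(silence_ratio, highest_frequency_hz,
                          silence_threshold=0.2, speech_max_hz=7000):
    """Classify an audio piece as 'music' or 'speech' one feature at a time.

    silence_threshold is a hypothetical value; speech_max_hz reflects the
    upper end of the speech frequency range.
    """
    # Step 1: content well above the speech frequency range points to music.
    if highest_frequency_hz > speech_max_hz:
        return "music"
    # Step 2: speech has pauses between words, so a high silence ratio
    # points to speech.
    if silence_ratio >= silence_threshold:
        return "speech"
    # Step 3: band-limited audio with little silence is treated as music.
    return "music"

print(classify_step_by_step(0.40, 4000))
print(classify_step_by_step(0.05, 15000))
```

Each step only looks at one feature, which keeps the rules easy to inspect but makes the result sensitive to the chosen thresholds.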
Feature Vector Based 
Audio Classification 
Audio pieces of the same class are located close to 
each other in the feature space and audio pieces of 
different classes are located far apart in the feature 
space. 
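A nearest-centroid classifier is one simple way to act on this observation: each class is summarised by the mean of its training vectors, and a new piece gets the label of the closest mean. The two-feature vectors and class data below are made up for illustration, and other classifiers over the same feature space would fit the description equally well.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dims))

def train(labelled):
    """Map each class name to the centroid of its training vectors."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}

def classify(vector, centroids):
    """Return the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Hypothetical training data: (silence ratio, zero crossing rate) per piece
training = {
    "speech": [(0.45, 0.30), (0.50, 0.28)],
    "music": [(0.05, 0.08), (0.10, 0.10)],
}
centroids = train(training)
print(classify((0.40, 0.25), centroids))
print(classify((0.08, 0.12), centroids))
```

`math.dist` requires Python 3.8 or later.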
Speech Recognition 
and Retrieval 
Automatic 
Speech Recognition 
An ASR system collects models or feature vectors for all possible speech units. Speech units can be phonemes, words, or phrases. 
Automatic Speech 
Recognition Factors 
• A phoneme spoken by different speakers, or by the same speaker at different times, produces different features in terms of duration, amplitude, and frequency components. 
• The above differences are exacerbated by the 
background or environmental noise. 
• Normal speech is continuous and difficult to separate 
into individual phonemes. 
• Phonemes vary with their location in a word. 
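The duration factor can be illustrated with dynamic time warping (DTW), a classical technique for comparing sequences spoken at different speeds. The slides do not name DTW, so treat this as an assumed example of how duration differences might be tolerated. The sequences are made-up one-dimensional feature tracks.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j]: best alignment cost of the first i items of a
    # with the first j items of b
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[rows][cols]

template = [1, 3, 5, 3, 1]               # stored reference for a speech unit
slow = [1, 1, 3, 3, 5, 5, 3, 3, 1, 1]    # same shape, spoken at half speed
other = [5, 3, 1, 3, 5]                  # a different shape

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slowed-down sequence aligns with the template at zero cost, while the differently shaped sequence does not, which is the behaviour a plain sample-by-sample comparison cannot provide.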
General ASR System 
Speech Recognition 
Performance 
Speech recognition performance is normally measured by 
recognition error rate. The lower the error rate, the higher the 
performance. 
Performance is affected by the following factors: 
- Subject matter: this may vary from a set of digits to a newspaper article or general news. 
- Types of speech: read or spontaneous conversation. 
- Size of the vocabulary: it ranges from dozens to a few 
thousand words. 
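The slides do not say how the recognition error rate is computed. One common choice is the word error rate: the minimum number of word substitutions, insertions, and deletions needed to turn the recognised text into the reference, divided by the number of reference words. The sketch below assumes that definition and a non-empty reference transcript.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(word_error_rate("one two three", "one too three"))
```

A lower value means better recognition, matching the statement that a lower error rate means higher performance.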
Music Indexing and 
Retrieval 
Indexing and Retrieval of Structured 
Music and Sound Effects 
• Structured music is represented by a set of commands. 
• The most common structured music format is MIDI. 
• A newer standard for structured audio is MPEG-4 Structured Audio. 
• These formats contain structure and note descriptions. 
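Because structured music already stores notes explicitly, indexing can work on the note data directly instead of on extracted signal features. The sketch below uses a simplified, hypothetical `NoteEvent` record rather than the actual MIDI byte format, and searches a piece for an exact run of note numbers. The note-number-to-frequency conversion assumes standard equal temperament with A4 (note 69) at 440 Hz.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NoteEvent:
    """Simplified note description (hypothetical, not the MIDI wire format)."""
    start_beat: float
    note_number: int      # MIDI-style note number, 60 = middle C
    duration_beats: float

def note_frequency_hz(note_number):
    """Equal-tempered frequency, with note 69 (A4) tuned to 440 Hz."""
    return 440.0 * 2 ** ((note_number - 69) / 12)

def contains_melody(events, query_notes):
    """True if query_notes occurs as a consecutive run in the piece."""
    notes = [e.note_number for e in sorted(events, key=lambda e: e.start_beat)]
    m = len(query_notes)
    return any(notes[i:i + m] == query_notes for i in range(len(notes) - m + 1))

piece = [
    NoteEvent(0, 60, 1),
    NoteEvent(1, 62, 1),
    NoteEvent(2, 64, 1),
    NoteEvent(3, 65, 1),
]
print(contains_melody(piece, [62, 64]))
print(note_frequency_hz(69))
```

Exact matching like this is only practical because the representation is symbolic; sample-based music needs the feature-based or pitch-based approaches.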
Indexing and Retrieval of 
Sample Based Music 
• Based on extracted sound features. 
• Based on pitches of music notes. 
Music Retrieval Based on a 
set of Features 
Music Retrieval Based on 
Pitch 
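A common way to retrieve by pitch is to reduce each melody to its contour, the sequence of up, down, and same steps between successive notes, and then search by string matching. This makes a query match even when it is sung in a different key. The slide itself gives no detail, so the encoding, the function names, and the sample tunes below are assumptions for illustration.

```python
def pitch_contour(pitches, tolerance=0.0):
    """Encode successive pitch changes as U (up), D (down), or S (same)."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current - previous > tolerance:
            symbols.append("U")
        elif previous - current > tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

def search_by_contour(query_pitches, database):
    """Return names of pieces whose contour contains the query contour."""
    query = pitch_contour(query_pitches)
    return [name for name, pitches in database.items()
            if query in pitch_contour(pitches)]

# Hypothetical database: piece name -> sequence of note pitches
database = {
    "tune_a": [60, 62, 64, 62, 60, 60],
    "tune_b": [67, 65, 64, 64, 66],
}

# The query is an octave higher than tune_a but has a matching up-down shape.
matches = search_by_contour([72, 74, 73], database)
print(matches)
```

Contours discard interval sizes, so short queries can match many pieces; longer queries or finer-grained symbols reduce false matches.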
Multimedia Information Retrieval Using 
Relationships between Audio and Other 
Media 