This document provides an overview of audio fingerprinting. It discusses the goals of representing an audio recording with a compact fingerprint and matching fingerprints to identify recordings. It describes fingerprinting applications in broadcast monitoring and value-added services. The document outlines fingerprint requirements and compares fingerprinting to watermarking. It then details the processing steps in Philips' audio fingerprinting system, including fingerprint extraction, database storage, and identification.
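The Philips scheme described above derives each 32-bit sub-fingerprint from the signs of band-energy differences across adjacent frequency bands and frames. A minimal sketch of that extraction step follows; the frame length, hop size, and crude linear bands are arbitrary stand-ins (the real system uses much longer, heavily overlapping frames and logarithmically spaced bands):

```python
import numpy as np

def sub_fingerprints(signal, frame_len=2048, hop=64, n_bands=33):
    """Toy Philips-style fingerprint: 32 bits per frame, each bit the sign
    of an energy difference across adjacent bands and adjacent frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    energies = []
    for frame in frames:
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len))) ** 2
        bands = np.array_split(spectrum, n_bands)   # crude linear bands
        energies.append([b.sum() for b in bands])
    E = np.array(energies)
    # bit(n, m) = 1 iff (E[n,m]-E[n,m+1]) - (E[n-1,m]-E[n-1,m+1]) > 0
    diff = (E[1:, :-1] - E[1:, 1:]) - (E[:-1, :-1] - E[:-1, 1:])
    return (diff > 0).astype(np.uint8)              # shape: (n_frames-1, 32)

fp = sub_fingerprints(np.random.randn(48000))       # one second at 48 kHz
```

Identification then reduces to looking up these bit vectors in a database and counting bit errors against stored fingerprints.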
This presentation has been well received by the SANS community and many information security teams I engage with.
It describes how integrating a full content repository into your existing security architecture can decrease incident response time and lead to fast identification of root cause.
I also describe a new way of implementing NetFlow without sampling to provide greater visibility of your network.
Enjoy!
Boni Bruno, CISSP, CISM, CGEIT
www.bonibruno.com
Listening at the Cocktail Party with Deep Neural Networks and TensorFlow – Databricks
Many people are remarkably good at focusing their attention on one person or one voice in a multi-speaker scenario and ‘muting’ other people and background noise. This is known as the cocktail party effect. For other people, separating audio sources is a challenge.
In this presentation I will focus on solving this problem with deep neural networks and TensorFlow. I will share technical and implementation details with the audience, and talk about the gains, pain points, and merits of the solutions as they relate to:
* Preparing, transforming and augmenting relevant data for speech separation and noise removal.
* Creating, training and optimizing various neural network architectures.
* Hardware options for running networks on tiny devices.
* And the end goal: Real-time speech separation on a small embedded platform.
I will present a vision of future smart earbuds, smart headsets and smart hearing aids that will be running deep neural networks.
Participants will gain insight into some of the latest advances and limitations in speech separation with deep neural networks on embedded devices with regard to:
* Data transformation and augmentation.
* Deep neural network models for speech separation and for removing noise.
* Training smaller and faster neural networks.
* Creating a real-time speech separation pipeline.
This document provides an introduction to the fundamentals of digitizing audio content, including discretization of signals through sampling in time and quantization in amplitude. It describes key concepts such as the sampling theorem, which states that a sampled signal can be perfectly reconstructed if the sampling frequency is greater than twice the maximum frequency of the original signal. It also covers properties of the quantization error introduced by representing continuous amplitude values with a finite number of levels, and how factors like word length affect the signal-to-noise ratio of the quantization process.
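The word-length effect on quantization SNR described above can be checked numerically; the usual rule of thumb is roughly 6.02 dB per bit, plus about 1.76 dB for a full-scale sine. A minimal sketch, assuming a uniform quantizer and a near-full-scale test tone (the sample rate, frequency, and amplitude are arbitrary choices):

```python
import numpy as np

def quantize(x, n_bits):
    """Uniform quantizer for signals in [-1, 1)."""
    levels = 2 ** (n_bits - 1)
    return np.round(x * levels) / levels

n_bits = 16
t = np.arange(48000) / 48000.0
x = 0.999 * np.sin(2 * np.pi * 997 * t)          # near-full-scale sine
err = quantize(x, n_bits) - x                    # quantization error
snr_db = 10 * np.log10(np.mean(x**2) / np.mean(err**2))
print(round(snr_db, 1), "dB; rule of thumb:",
      round(6.02 * n_bits + 1.76, 1), "dB")
```

The measured value lands close to the predicted ~98 dB for 16-bit words, confirming that each additional bit buys about 6 dB of SNR.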
The document discusses an automatic subtitle generation system that takes a video file as input and generates a subtitle file as output. It describes the three main modules of the system: audio extraction, speech recognition, and subtitle generation. Audio extraction extracts the audio from the video file. Speech recognition recognizes the speech in the extracted audio. Subtitle generation then creates a subtitle file with text chunks and their respective start and end times synchronized to the video.
The document summarizes the features of the DS-7300/8100 Industrial Standalone DVR. It can support up to 16 analog camera inputs and uses H.264 compression. It has a Linux-based operating system, remote access capabilities, motion detection, and supports up to 16TB of total storage through SATA interfaces. It can be managed through the iVMS-2000 centralized management software.
Application Visibility and Experience through Flexible NetFlow – Cisco DevNet
The world of applications in the enterprise is changing rapidly, from the way applications are increasingly hosted in the cloud and the diverse nature of apps to the way they are consumed by many devices. The need for organizations and network administrators to focus on "Fast IT" and innovation in the enterprise is growing, which means spending less time on daily operations, maintenance and troubleshooting and more time on delivering business value with newer services. Cisco AVC with its NBAR2 technology is designed to detect applications and measure application performance by measuring round-trip time, retransmission rates, jitter, delay, packet loss, MOS, URL statistics, etc. Those details are transmitted using Flexible NetFlow/IPFIX, so partners can leverage the data for application usage reporting, performance reporting and troubleshooting application issues to deliver the best possible application experience.
Watch the DevNet 2047 replay from the Cisco Live On-Demand Library at: https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=92664&backBtn=true
Check out more and register for Cisco DevNet: http://ow.ly/jCNV3030OfS
The document discusses the SurfUP media processing platform for building voice and video infrastructure applications. It highlights sample applications, system architectures, support for voice and video, integration levels and features, and value propositions of the SurfUP platform. Specifically, it allows processing of voice, video and fax on the same DSP, has direct DSP to network interfaces, is an open platform, supports various applications with the same hardware/software, and enables streaming diagnostics.
Android Audio HAL – Audio Architecture – Audio HAL interface – Audio Policy – Audio HAL compilation & verification – Overview of Tinyalsa
Android Video HAL – Camera Architecture – Overview of camera HAL interface – Overview of V4L2 – Enabling V4l2 in kernel – Camera HAL compilation and verification
Fuzzing malware for fun & profit. Applying Coverage-Guided Fuzzing to Find Bu... – Maksim Shudrak
The document discusses using coverage-guided fuzzing to find bugs in modern malware. It begins with an introduction and motivation for using fuzzing techniques on malware. It then provides an overview of coverage-guided fuzzing and how it works. Several case studies are presented where coverage-guided fuzzing was used to find vulnerabilities in popular malware samples like Mirai.
This document summarizes the key components of a digital media production infrastructure, including the network, hard disks, and file systems.
The network must support high throughput without packet loss or delays to support real-time media workflows. Point-to-point switched Ethernet and layer 3 routing address limitations of shared Ethernet.
Hard disks are most efficient with large sequential access rather than small random access. Disk striping and intelligent file systems can reduce random access.
File systems must support non-blocking access across many clients and disks without oversubscription to ensure resources are available for media production needs.
The role of robotic innovation in ore characterization, mineralogy and geomet... – Mining On Top
The role of robotic innovation in ore characterization, mineralogy and geometallurgy
Mette Dobel, Global Product Manager, Laboratory Solutions, FLSmidth, Denmark
Mining On Top: Helsinki
16-17 September 2013 | Helsinki
This document provides an overview and agenda for a session on advanced topics in IP multicast deployment. It discusses tools and techniques for deploying IP multicast, including examples of PIM mode configurations, rendezvous point deployment models, interconnecting PIM domains, label switched multicast, high availability techniques, and multicast in wireless environments. The target audience is network engineers in enterprise and service provider networks.
Track 3 session 8 - st dev con 2016 - music and voice over ble – ST_World
This document provides an overview of Roberto Sannino's presentation on BlueVoice, STMicroelectronics' solution for playing high-quality audio over Bluetooth Low Energy. BlueVoice uses advanced audio processing and compression techniques like ADPCM to stream compressed audio data over BLE at rates up to 64 kbps, enabling applications like voice-controlled devices, wireless audio, and the internet of audio things. The document describes BlueVoice's architecture, software libraries, development kits, and example applications to demonstrate voice streaming to mobile and cloud services using STM32 and BlueNRG hardware.
This document discusses event tracing using VampirTrace and Vampir. It provides an overview of event tracing, including instrumentation, run-time measurement, and visualization. Event tracing involves instrumenting code to record events, running the instrumented code to generate trace files, and using tools like Vampir to analyze and visualize the trace files.
As presented at ITExpo 2017 and the April Peerlyst Tel-Aviv security Meetup.
Can your company afford to ignore VoIP security? With the number of attacks on telephone services and mobile devices, your chance of being attacked and your financial liability are at an all-time high. This session offers an introductory primer on securing your VoIP PBX. The talk includes explanations of common attacks, how they can find you, and common techniques you can use to defend your company.
A Technique for Enabling and Supporting Debugging of Field Failures (ICSE 2007) – James Clause
The document presents a technique for enabling and supporting debugging of field failures in deployed software. The technique involves recording high-level environment interactions during execution, replaying the execution, and minimizing the recording to focus debugging efforts. An evaluation on the Pine email client found that the technique can produce minimized executions using a small fraction of the original data size, while imposing negligible runtime overhead. The technique aims to help developers debug anomalous behavior that occurs in deployed software.
NetFlow Monitoring for Cyber Threat Defense – Cisco Canada
Recent trends have led to the erosion of the security perimeter, and attackers are increasingly gaining operational footholds on the network interior. For more information, please visit our website: http://www.cisco.com/web/CA/index.html
The Embedded Android system development workshop is focused on integrating a new device with the Android framework. Our hands-on approach makes Emertxe the best institute for learning Android system development. This workshop dives deep into Android porting, the Android Hardware Abstraction Layer (HAL), Android services and the Linux device driver ecosystem, and will enable you to efficiently integrate new hardware with the Android HAL/framework.
practicing what you never preached: sorting and discarding from a practical ... – FIAT/IFTA
This document outlines a process for selecting and discarding physical media as part of an archival digitization and preservation project. It involves 3 main steps: 1) documentation work to analyze the collections and properties, 2) physical inspection of media to verify condition and contents, and 3) technical examination of any remaining doubtful cases. Criteria are provided for determining what materials should be kept based on factors like uniqueness, quality, and role in digitization. Guidelines are given for different media types, especially film and video, regarding hierarchies of quality and combinations of formats. The goal is to retain the highest quality and most important materials while discarding redundant, obsolete, or too degraded copies.
Developing Applications Using Host Processing Instead of DSPs – Videoguy
Host media processing (HMP) involves using general-purpose computing platforms instead of specialized hardware to create telephony applications. This offers lower costs through reduced hardware requirements, more flexibility in application deployment, and increased reliability by distributing processing across multiple servers. HMP provides voice processing and media services through software running on standard servers.
This document provides an overview of Palo Alto Networks next-generation firewall technology. It discusses how traditional firewalls do not provide visibility and control over applications. Next-gen firewalls can identify applications, users, and threats within encrypted traffic using techniques like App-ID, User-ID, and Content-ID. The document also describes Palo Alto Networks hardware models and their performance capabilities for handling firewall and threat prevention workloads. It highlights key next-gen firewall features like real-time threat analysis, application control, and safe enablement of network applications.
IMA/Thales EchoVoice (VOIP) for OpenSimulator Presentation at OSCC19 – Lisa Laxton
IMA/Thales EchoVoice (VOIP) for OpenSimulator
Frank Rulof, Seth Nygard, Lisa Laxton, Natacha Bru
Presentation Abstract: This presentation from Infinite Metaverse Alliance® (IMA) and Thales Group discusses progress towards improving the open source code used in a self-hosted secure voice solution called EchoVoice. This is an alternative to Vivox, which is commonly used for voice communication between avatars in OpenSimulator regions. The discussion is focused on the development work done to provide a modernized solution for HyperGrid-enabled regions, as well as planned enhancements not currently available. The work of IMA and Thales is generally directed toward broadening use of the Metaverse for the Public, Education, Industry and Government sectors, but the community as a whole benefits from the open source.
Target Audience & Outcomes: Participants from the OpenSimulator Community at large will learn about enhancements, features, and improvements IMA and Thales are working on together to deliver an open source solution that meets the needs of a broader OpenSimulator community.
@IMATalks
The document discusses packet-to-packet media processing and transcoding applications using the SurfUP media processing platform. It highlights the needs for these applications, optimal system architectures, and how SurfUP supports them through solutions at the chip-level and board-level. It provides value propositions of using SurfUP for transcoding, including supporting multiple media types on the same DSP, direct DSP to network interfaces, an open platform, flexibility across applications, and streaming diagnostics.
The document discusses packet-to-packet media processing and transcoding applications using the SurfUP media processing platform. It highlights the needs for these applications, optimal system architectures, and how SurfUP supports them through DSP chips, boards, and software that provide high density voice, video, and fax transcoding and processing with low latency and an open platform.
1. The document is a glossary of terms related to sound design and production for computer games. It contains definitions for terms like Foley artistry, sound libraries, file formats like .wav and .mp3, limitations of audio hardware, recording systems, sampling constraints, and more.
2. For each term, it provides a short definition from an online source along with a description of how the term relates to the student's own production practice.
3. The glossary is intended to help the student research and understand foundational concepts in sound design as part of their study of a BTEC extended diploma in games design.
The document discusses the effectiveness and efficiency of different types of network test solutions for identifying issues in video delivery. Stateful traffic testing that emulates real applications across layers 2–7 is more effective than stateless testing, as it provides more accurate performance data and better prediction of network issues. Using real sample traffic from multiple devices and formats allows more accurate modeling and understanding of network behavior under different conditions. Testing needs to evolve to address new video applications and protocols such as IPv6.
This document provides an introduction to audio content analysis and convolution. It discusses linear time-invariant (LTI) systems and how convolution describes the output of an LTI system based on the input and impulse response. Examples of simple low-pass filters like moving average and single-pole filters are provided to illustrate convolution. The objectives are to understand LTI systems, convolution operations, and implement basic filters.
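The moving-average low-pass filter mentioned above illustrates convolution directly: the LTI system's output is the input convolved with a length-M impulse response of constant value 1/M. A minimal sketch, with an arbitrary filter length and test signal:

```python
import numpy as np

# Impulse response of a length-M moving-average filter: h[n] = 1/M.
M = 5
h = np.ones(M) / M

# Noisy low-frequency sine as the input x[n].
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.02 * np.arange(200)) + 0.3 * rng.standard_normal(200)

# The filter output is simply the convolution y = x * h.
y = np.convolve(x, h, mode="same")
```

Because the system is LTI, filtering any input reduces to this one operation with the impulse response; a single-pole filter differs only in having an (infinite, decaying) impulse response realized recursively.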
This document provides an overview of musical genre classification. It discusses how genre classification is one of the early research topics in music information retrieval and involves extracting audio features from songs and using those features to classify the songs into genres. The document describes common features used like spectral, pitch, rhythm, and intensity features. It also outlines the typical processing steps of feature extraction, normalization, selection and classification. Examples of challenges with genre classification like ill-defined genres and overfitting are also mentioned.
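The feature-extraction-then-classification pipeline described above can be sketched with a single toy spectral feature (the spectral centroid) and a nearest-centroid decision rule. The "training" centroids below are made-up stand-ins for illustration, not real genre statistics:

```python
import numpy as np

def spectral_centroid(signal, sr=22050):
    """One toy spectral feature: the 'center of mass' of the magnitude spectrum."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))

# Hypothetical per-genre mean centroids (Hz) standing in for a trained model.
train = {"classical": 1500.0, "metal": 4500.0}

def classify(feature):
    """Assign the genre whose stored centroid is nearest to the feature."""
    return min(train, key=lambda g: abs(train[g] - feature))

print(classify(1700.0))   # nearest stored centroid is 'classical'
```

A real system would extract many such features (spectral, pitch, rhythm, intensity), normalize and select among them, and use a stronger classifier, but the pipeline shape is the same.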
This document summarizes an introduction to audio content analysis module on audio-to-audio and audio-to-score alignment. It discusses using audio feature extraction and dynamic time warping to align two audio sequences or an audio sequence to a musical score. Key steps include extracting features from the audio, computing a distance matrix using measures like Euclidean distance, and finding the optimal alignment path. Evaluation of alignment accuracy can be challenging due to differences in the domains of audio and scores.
This document discusses onset detection in audio signals. It describes onset detection as a two-step process: 1) generating a novelty function to measure signal changes over time, and 2) peak picking to identify onset locations in the novelty function. Common novelty functions are based on time-domain envelopes, pitch contours, or STFT differences. Peak picking typically uses thresholding to find local maxima above a likelihood threshold. The goal of evaluation is to compare predicted onset times to ground truth labels.
This document provides an overview of mood recognition in audio content analysis. It discusses how mood recognition aims to identify the mood or emotion of a song by extracting features and classifying using methods like regression. Key challenges include defining and quantifying moods/emotions, developing ground truth data, and determining appropriate mood models. Common approaches include classifying into mood clusters or mapping to the arousal-valence circumplex model of affect using regression techniques. Results can vary significantly based on the data but classification rates around 50% and regression errors of 0.1-0.4 are typical.
This document discusses music structure detection. It introduces self-similarity matrices which show pairwise similarities between segments of a song and can be used to detect structural changes based on novelty, homogeneity, or repetitions. Ground truth annotations of music structure are challenging due to ambiguity and different hierarchical levels. Typical accuracy of automatic structure detection ranges from 50-70% compared to human annotations.
This document introduces key concepts for analyzing musical rhythm, including onset, tempo, meter, bar, and rhythm. It discusses how humans perceive and group temporal musical events, and describes typical onset durations for different instruments. Accuracy for detecting onsets ranges from 2-300ms, depending on factors like tempo and instrumentation. Musical works represent these temporal concepts through notation of tempo, time signature, bars, and note values.
This document discusses chord detection in music audio. It introduces musical chords and their properties, describes the process of extracting pitch chroma and comparing it to chord templates, and explains how Hidden Markov Models (HMMs) can be used to model chord progressions over time using transition probabilities between states representing different chords. The Viterbi algorithm is used to find the most likely chord sequence given the observations.
This document provides an overview of music performance analysis, including defining music performance, common parameters analyzed like tempo and dynamics, goals of performance analysis like understanding the performer and listener experience, challenges like disentangling variables and limited reliable datasets, and opportunities to address challenges through cross-disciplinary work.
This document discusses music similarity analysis and clustering. It describes how music similarity is multidimensional and subjective. Common approaches for measuring music similarity involve extracting audio features and calculating distances between songs in the feature space. Visualization techniques like self-organizing maps can be used to project the high-dimensional feature spaces into 2D for visualization. Evaluation is challenging as there is no clear ground truth, but can involve using genre labels or listening surveys.
The document discusses the beat histogram, which is a representation of the periodicities in an audio signal. It can be computed from a novelty function or series of onset times. Features are then extracted from the beat histogram like the mean, max peak value, and standard deviation to characterize the rhythmic properties. However, the beat histogram lacks phase information so it provides a limited description of rhythm.
This document summarizes different approaches to tempo detection and beat tracking from audio signals, including oscillator, filterbank, and template-based approaches. The oscillator approach models beat tracking as an adaptive oscillator that predicts beat locations. The filterbank approach uses a bank of filters spaced at different intervals to detect the tempo. The template-based approach correlates templates of pulses at different tempos with the audio signal. Challenges discussed include detecting multiple hierarchical tempo levels and handling sudden tempo changes.
This document provides an overview of an audio content analysis module about intensity and loudness. It discusses human perception of loudness and how it relates to physical measurements of intensity and magnitude. It then describes several common low-level features used to represent intensity in audio signals, including root mean square (RMS), peak envelope, weighted RMS, and models of loudness like the Zwicker model. The goal is to discuss intensity and loudness from both a perceptual and technical feature extraction perspective.
This document provides an overview of instantaneous audio features that can be extracted from audio signals. It discusses that features are low-level descriptors of audio that are not necessarily perceptually meaningful on their own. Common instantaneous features described include spectral centroid, spectral spread, spectral rolloff, and spectral flux, which characterize the spectral shape and are useful for timbre perception. Mel-frequency cepstral coefficients (MFCCs) are also covered as they represent the short-term power spectrum of a sound. The document aims to describe the computation and meaning of various instantaneous audio features.
This document discusses approaches for detecting the fundamental frequency (f0) of monophonic audio signals. It describes time-domain methods like zero-crossing rate analysis and autocorrelation, as well as frequency-domain techniques like harmonic product spectrum, harmonic sum spectrum, and spectral maximum likelihood. It also covers auditory-motivated approaches and notes that while older, tweaked versions of these methods remain state-of-the-art for monophonic f0 detection.
This document discusses data requirements, data splits, cross validation, and data augmentation for machine learning. It covers dividing data into training, validation, and test sets; using cross validation to evaluate models; and techniques for augmenting data like degrading quality, adding effects, and segmentation when data is insufficient. The overall goal is to have representative, clean data that is sufficient for the task and to maximize its use through validation and augmentation.
This document discusses evaluating pitch tracking systems by comparing predicted pitch to ground truth pitch. It describes challenges in annotating pitch and time, and defines different pitch tracking tasks from individual notes to polyphonic mixtures. Metrics for score-based and F0-based ground truths are provided to measure accuracy and deviation between predicted and ground truth pitch. Results should be aggregated at the frame/note and file levels.
This document provides an introduction and overview of time-frequency representations using the Fourier transform. It defines the continuous Fourier transform and discusses its key properties including invertibility, superposition, convolution/multiplication relationships, Parseval's theorem, and effects of time/frequency shifts and scaling. It also covers the Fourier transform of sampled signals, the short-time Fourier transform (STFT), and the discrete Fourier transform (DFT).
This document discusses feature learning techniques for audio content analysis. It introduces feature learning as an alternative to traditional hand-crafted features that can automatically learn features from data. Two main approaches are dictionary learning, which learns a dictionary of templates to represent audio signals, and deep learning using neural networks to learn high-level representations through multiple layers. Learned features may contain more useful information than hand-crafted ones but require large amounts of data and computation.
This document provides an overview of techniques for fundamental frequency detection in polyphonic signals, including iterative subtraction methods and non-negative matrix factorization (NMF). Iterative subtraction methods work by repeatedly detecting the most salient frequency, removing its contribution, and iterating until all frequencies are identified. NMF decomposes a signal into additive combinations of spectral templates to separate voices. The document discusses algorithms like Cheveigné, Meddis, and Klapuri that apply these techniques, and compares their approaches, advantages, and limitations.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
1. Introduction to Audio Content Analysis
module 11.0: audio fingerprinting
alexander lerch
2. overview intro requirements approaches philips shazam summary
introduction
overview
corresponding textbook section
chapter 11
lecture content
• introduction to audio fingerprinting
• in-depth example for fingerprint extraction and retrieval
learning objectives
• discuss goals and limitations of audio fingerprinting systems as compared to
watermarking or cover song detection systems
• describe the processing steps of the Philips fingerprinting system
module 11.0: audio fingerprinting 1 / 14
4. overview intro requirements approaches philips shazam summary
audio fingerprinting
introduction
objective:
• represent a recording with a compact and unique digest
(→ fingerprint, perceptual hash)
• allow quick matching between previously stored fingerprints and an extracted
fingerprint
applications:
• broadcast monitoring:
automate verification for royalties/infringement claims
• value-added services:
offer information and metadata
module 11.0: audio fingerprinting 2 / 14
7. overview intro requirements approaches philips shazam summary
audio fingerprinting
fingerprinting vs. watermarking
fingerprinting:
• identifies recording (but not musical content)
watermarking:
• embeds perceptually “unnoticeable” data block in the audio
• identifies instance of recording
Property                        Fingerprinting   Watermarking
Allows Legacy Content Indexing  +                –
Allows Embedded (Meta) Data     –                +
Leaves Signal Unchanged         +                –
Identification of               Recording        User or Interaction
module 11.0: audio fingerprinting 3 / 14
9. overview intro requirements approaches philips shazam summary
audio fingerprinting
fingerprint requirements
accuracy & reliability:
minimize false negatives/positives
robustness & security:
robust against distortions and attacks
granularity:
quick identification in a real-time context
versatility:
independent of file format, etc.
scalability:
good database performance
complexity:
implementation possible on embedded devices
module 11.0: audio fingerprinting 4 / 14
15. overview intro requirements approaches philips shazam summary
audio fingerprinting
general fingerprinting system
[diagram: database creation — collection of recordings → fingerprint extraction → recording IDs database; identification — unlabeled recording → fingerprint extraction → find match against database → ID]
module 11.0: audio fingerprinting 5 / 14
16. overview intro requirements approaches philips shazam summary
audio fingerprinting
brainstorm
How does it work? MD5?
module 11.0: audio fingerprinting 6 / 14
18. overview intro requirements approaches philips shazam summary
audio fingerprinting
system example: philips extraction 1/3
Audio
Signal STFT
Power Spectrum
Filterbank
Frequency Filtering
Temporal Filtering
Quantization
1 pre-processing:
downmixing & downsampling (5 kHz)
2 STFT: K = 2048, overlap 31/32
3 log frequency bands:
33 bands from 300–2000Hz
4 freq derivative: 33 bands
5 time derivative: 32 bands
6 quantization:
vFP(k, n) = 1 if ∆E(k, n) − ∆E(k, n − 1) > 0, 0 otherwise
⇒ 32 bit subfingerprint
module 11.0: audio fingerprinting 7 / 14
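The six steps can be strung together in a compact sketch; the window choice, exact band edges, and test signal below are assumptions, not the published Philips parameters:

```python
import numpy as np

def subfingerprints(x, fs=5000, K=2048, hop=64):
    """32-bit subfingerprints following the slide's recipe (sketch)."""
    # 1/2: windowed STFT power spectrum; overlap 31/32 -> hop = K/32 = 64
    win = np.hanning(K)
    n_frames = (len(x) - K) // hop + 1
    frames = np.stack([x[i * hop:i * hop + K] * win for i in range(n_frames)])
    P = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, K/2 + 1)
    f = np.fft.rfftfreq(K, 1 / fs)
    # 3: 33 logarithmically spaced bands from 300 Hz to 2000 Hz
    edges = np.searchsorted(f, np.geomspace(300, 2000, 34))
    E = np.stack([P[:, edges[b]:edges[b + 1]].sum(axis=1) for b in range(33)])
    # 4/5: difference across bands, then across time
    dE = np.diff(E, axis=0)     # 32 band differences per frame
    ddE = np.diff(dE, axis=1)   # change relative to the previous frame
    # 6: quantize the sign -> one bit per band, 32 bits per frame
    bits = (ddE > 0).astype(np.uint8)
    return np.packbits(bits, axis=0)  # shape (4, n_frames - 1), 4 bytes/frame

fp = subfingerprints(np.random.randn(5 * 5000))   # 5 s of noise at 5 kHz
```

Each column of the result is one 32-bit subfingerprint; a fingerprint block is a run of 256 consecutive columns, i.e. roughly 3 s of audio.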
29. overview intro requirements approaches philips shazam summary
audio fingerprinting
system example: philips identification 1/3
database
• contains all subfingerprints for all songs
• previous example database: 25 billion subfingerprints
problem
• how to identify fingerprint efficiently?
module 11.0: audio fingerprinting 10 / 14
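A hedged back-of-envelope with the slide's extraction parameters (the 4-minute song length is an assumption) shows what 25 billion subfingerprints buy:

```python
# at 5 kHz with hop K/32 = 64 samples, a new 32-bit
# subfingerprint appears every 64/5000 s = 12.8 ms
fs, hop = 5000, 64
sub_per_s = fs / hop                  # ~78 subfingerprints per second
song_s = 4 * 60                       # assumed 4-minute song
sub_per_song = sub_per_s * song_s     # ~19k subfingerprints per song
songs = 25e9 / sub_per_song           # songs covered by 25e9 subfingerprints
print(f"{songs:,.0f}")                # on the order of a million songs
```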
32. overview intro requirements approaches philips shazam summary
audio fingerprinting
system example: philips identification 2/3
simple system:
1 create lookup table with all possible subfingerprints (2^32) pointing to occurrences
2 assume at least one of the extracted 256 subfingerprints is error-free
⇒ only entries listed at 256 positions of the table have to be checked
3 compute Hamming distance between extracted fingerprint and candidates
module 11.0: audio fingerprinting 11 / 14
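The three steps map onto a dict-based lookup table plus a brute-force Hamming check over the candidates; the toy database and error pattern below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

def hamming(a, b):
    """Bit errors between two equal-length uint32 subfingerprint blocks."""
    return int(np.unpackbits((a ^ b).view(np.uint8)).sum())

# toy database: 1000 "songs", each a block of 256 uint32 subfingerprints
db = {i: rng.integers(0, 2**32, 256, dtype=np.uint32) for i in range(1000)}

# 1: lookup table, subfingerprint value -> (song, position) occurrences
table = {}
for song, block in db.items():
    for pos, sub in enumerate(block):
        table.setdefault(int(sub), []).append((song, pos))

# query: song 42, with one bit flipped in all but the last subfingerprint
query = db[42].copy()
query[:255] ^= np.uint32(1)

# 2: at least one subfingerprint is error-free -> probe the table 256 times
candidates = {song for sub in query for (song, pos) in table.get(int(sub), [])}

# 3: Hamming distance over the whole block decides among the candidates
best = min(candidates, key=lambda s: hamming(db[s], query))
print(best)   # 42
```

In the real system the table would map to positions inside songs so only the aligned 256-subfingerprint block is compared, but the probe-then-verify structure is the same.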
system example: philips identification 3/3
variant 1:
• allow one bit error per subfingerprint
⇒ each 32-bit subfingerprint has 32 one-bit neighbors, so the lookup workload increases by a factor of ≈ 33
variant 2:
• introduce the concept of bit error probability into fingerprint extraction
▶ small energy difference → high error probability
▶ large energy difference → low error probability
• rank the bits of each subfingerprint by error probability and check for bit errors only at the most likely positions
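Variant 2 can be sketched as follows: instead of trying all 33 one-bit neighbors, flip only the few bits whose energy differences were smallest. The function name `candidate_fps`, the `reliabilities` input (e.g. |∆E(k, n) − ∆E(k, n − 1)| per bit), the choice of `n_weak = 3`, and the MSB-first bit numbering are all assumptions for this sketch.

```python
from itertools import combinations

def candidate_fps(fp, reliabilities, n_weak=3):
    """Expand one 32-bit subfingerprint into lookup candidates by flipping
    its least reliable bits (variant 2).

    reliabilities[k]: per-bit confidence, e.g. the magnitude of the energy
    difference behind bit k; small values mean a likely bit error."""
    # bit positions sorted by ascending reliability (weakest first)
    weak = sorted(range(32), key=lambda k: reliabilities[k])[:n_weak]
    cands = []
    # all subsets of the weak bits, including the empty set (fp itself)
    for r in range(n_weak + 1):
        for combo in combinations(weak, r):
            c = fp
            for k in combo:
                c ^= 1 << (31 - k)   # flip bit k (MSB-first convention)
            cands.append(c)
    return cands
```

This yields 2^n_weak table lookups per subfingerprint instead of 2^32, concentrating the search on the bit positions that are actually likely to be wrong.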
other systems: shazam
plot from [1]
[1] A. Wang, “An Industrial Strength Audio Search Algorithm,” in Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR), Washington, 2003.
summary
lecture content
audio fingerprinting
• represent recording with compact, robust, and unique fingerprint
• focus on (perceptual) audio representation rather than “musical” content
• allow efficient matching of this fingerprint with database
often confused with other tasks
1 audio watermarking
2 cover song detection