Department of Electronics & Computers, IIT Roorkee
A Bachelor Thesis Project Presentation (First Evaluation) on
Audio Fingerprinting for Song Identification
under the guidance of Dr. Padam Kumar
Team: Rishabh Sood (B.Tech. CSE IV Yr., 070820), Santosh Kumar (B.Tech. CSE IV Yr., 070824), Vikesh Khanna (B.Tech. CSE IV Yr., 070829)
Contents
1. Objective
  1.1 Problem statement
  1.2 Motivation
2. Theory
  2.1 Audio Fingerprint definition
  2.2 System Parameters
3. Design
  3.1 Architecture
  3.2 Flow Diagram
  3.3 Codec Layer
  3.4 Fingerprint Layer
  3.5 Protocol Layer
  3.6 Search Algorithm
  3.7 Database Architecture
4. Demonstration
5. Progress timeline
6. References
Problem Statement
To build a robust audio fingerprinting system that can efficiently identify songs from a large database using limited computing resources and a limited amount of input audio.
Motivation
There is immense scope for robust audio fingerprinting applications in industry:
P2P filtering: filtering copyrighted material from P2P networks, even if filenames and metadata have been tampered with.
Language translation: identifying audio content in foreign languages, which is not possible by textual search.
Broadcast monitoring: automating royalty collection by monitoring broadcast channels.
Media plugins: plugins for playlist generation and for identifying similar tracks.
Audio Fingerprint Definition
An audio fingerprint is a compact bit string derived from an audio object. The fingerprinting function F acts essentially as a hash function that maps an audio object of a large number of bits (e.g. 5 MB) to a 'fingerprint' of only a limited number of bits (e.g. 100 KB). The audio object can be uniquely identified from this bit string.
Audio Fingerprint v/s Cryptographic Hash Functions
Mathematical equivalence v/s perceptual similarity: assume X and Y are two objects that are mapped to H(X) and H(Y) by a cryptographic hash function H. Strict mathematical equality of H(X) and H(Y) implies equality of X and Y with a very low probability of error. In the case of audio, we are not interested in strict mathematical equivalence but in perceptual similarity.
Transitivity property: if sound tracks X and Y are perceptually similar, and Y and Z are perceptually similar, it does NOT follow that X and Z are perceptually similar. Transitivity, by contrast, holds for equality under any mathematical hash function.
Therefore, instead of mathematical equivalence, we use threshold comparisons:
$|F(X) - F(Y)| \le T$ implies X and Y are similar;
$|F(X) - F(Y)| > T$ implies X and Y are not similar.
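The threshold test can be illustrated with a minimal Python sketch. It assumes (this is not stated on the slides) that the distance |F(X) - F(Y)| is the bit error rate, i.e. the fraction of differing bits between two equal-length lists of 32-bit sub-fingerprints, and that T = 0.35 is merely an illustrative default.

```python
# Sketch: perceptual matching by a thresholded bit error rate.
# Assumptions: fingerprints are equal-length lists of 32-bit integers,
# and the threshold T (default 0.35) is an illustrative value only.

def hamming32(a, b):
    """Number of differing bits between two 32-bit sub-fingerprints."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def bit_error_rate(fx, fy):
    """|F(X) - F(Y)| interpreted as the fraction of differing bits."""
    if len(fx) != len(fy) or not fx:
        raise ValueError("fingerprints must be non-empty and of equal length")
    diff = sum(hamming32(a, b) for a, b in zip(fx, fy))
    return diff / (32 * len(fx))


def is_similar(fx, fy, threshold=0.35):
    """X and Y are judged perceptually similar iff the distance is <= T."""
    return bit_error_rate(fx, fy) <= threshold


if __name__ == "__main__":
    x = [0xFEA690B1, 0xB11DCE98]
    y = [0xFEA690B0, 0xB11DCE98]  # differs from x in a single bit
    print(bit_error_rate(x, y))   # 1 differing bit out of 64
    print(is_similar(x, y))
```

Note that this distance is not transitive in the sense above: X close to Y and Y close to Z does not bound X and Z below T.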
System Parameters
Robustness: low false negative rate.
Reliability: low false positive rate.
Fingerprint size: how many bits per song?
Granularity: what is the minimum input size?
Search speed: how fast is the search for a given database size?
Architecture: A layered approach
[Figure: Flow diagram. Client: Audio input → Codec Layer (samples in unsigned char format) → Fingerprint Layer (fingerprint) → Protocol Layer (HTTP POST request). Server: Database (search algorithm) → metadata → XML generator → XML data. Client: XML Parser → Album, Artist, Lyrics.]
Codec Layer
An audio codec is a computer program that encodes (compresses) and decodes (decompresses) audio in a given file format, for storage and for playback. Examples: AAC, MP3, WMA.
Codec Layer: AudioData
The codec layer decodes the input into uncompressed (WAV) samples and passes them to the fingerprint layer in an AudioData structure with the following fields:
i) Samples (unsigned char* samples): a buffer of the actual data samples (2 bytes, i.e. 16 bits, per sample).
ii) Byte order (int byteOrder): the byte order of the samples, either CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN.
iii) Number of samples (long size): the number of samples read.
iv) Sample rate (int sRate): the number of samples per second of audio.
v) Stereo (bool stereo): whether the audio is stereo.
vi) Duration: the duration of the original audio, regardless of the number of samples read.
vii) Format: the format of the original audio, expressed as a file extension (.mp3, .wav, etc.).
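A Python mirror of the AudioData record may help show how the raw byte buffer relates to the byte-order field. This is a sketch under assumptions: the field names follow the slide, but the numeric values of the CONST_* constants, the signed 16-bit PCM interpretation, the duration unit (seconds) and the decode_samples() helper are hypothetical.

```python
# Sketch of the codec layer's AudioData record (field names from the slide).
# Hypothetical: the CONST_* values, signed 16-bit PCM samples, seconds as the
# duration unit, and the decode_samples() helper.
import struct
from dataclasses import dataclass

CONST_LITTLE_ENDIAN = 0  # hypothetical value; the slides name but do not define it
CONST_BIG_ENDIAN = 1     # hypothetical value


@dataclass
class AudioData:
    samples: bytes   # raw sample buffer, 2 bytes per sample
    byteOrder: int   # CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN
    size: int        # number of samples read
    sRate: int       # samples per second
    stereo: bool     # True if the audio is stereo
    duration: float  # duration of the original audio (assumed: seconds)
    format: str      # original format as a file extension, e.g. ".wav"

    def decode_samples(self):
        """Interpret the buffer as signed 16-bit PCM in the stated byte order.
        Assumes `size` counts individual 16-bit samples across all channels."""
        prefix = "<" if self.byteOrder == CONST_LITTLE_ENDIAN else ">"
        count = min(self.size, len(self.samples) // 2)
        return list(struct.unpack(f"{prefix}{count}h", self.samples[: 2 * count]))
```

For example, AudioData(b"\x01\x00\xff\xff", CONST_LITTLE_ENDIAN, 2, 44100, False, 0.0, ".wav").decode_samples() interprets the four bytes as the two samples 1 and -1.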
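Codec Layer: WAV file format [4]
Uncompressed PCM (WAV format). The "RIFF" chunk descriptor declares the format "WAVE", which requires two subchunks, "fmt" and "data". The "fmt" subchunk describes the format of the data in the "data" subchunk; the "data" subchunk indicates the size of the sound information and contains the raw sound data.

Part | Field offset (bytes) | Field size (bytes) | Field name | Endian
RIFF chunk descriptor | 0 | 4 | ChunkID | big
RIFF chunk descriptor | 4 | 4 | ChunkSize | little
RIFF chunk descriptor | 8 | 4 | Format | big
"fmt" subchunk | 12 | 4 | Subchunk1ID | big
"fmt" subchunk | 16 | 4 | Subchunk1Size | little
"fmt" subchunk | 20 | 2 | AudioFormat | little
"fmt" subchunk | 22 | 2 | NumChannels | little
"fmt" subchunk | 24 | 4 | SampleRate | little
"fmt" subchunk | 28 | 4 | ByteRate | little
"fmt" subchunk | 32 | 2 | BlockAlign | little
"fmt" subchunk | 34 | 2 | BitsPerSample | little
"data" subchunk | 36 | 4 | Subchunk2ID | big
"data" subchunk | 40 | 4 | Subchunk2Size | little
"data" subchunk | 44 | Subchunk2Size | Data | little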
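The header layout on this slide maps directly onto Python's struct module. The sketch below parses the canonical 44-byte header using the field offsets from the WAVE format slide. It assumes the simple canonical layout only (no extra chunks such as "LIST" between "fmt " and "data"); real-world files that deviate from it are rejected rather than handled.

```python
# Sketch: parse the canonical 44-byte PCM WAV header.
# Assumes no extra chunks between "fmt " and "data" (canonical layout only).
import struct


def parse_wav_header(header):
    """Parse the canonical RIFF/WAVE header of an uncompressed PCM file."""
    if len(header) < 44:
        raise ValueError("a canonical WAV header is 44 bytes long")
    # RIFF chunk descriptor: ChunkID and Format are ASCII (big-endian byte order),
    # ChunkSize is a little-endian unsigned 32-bit integer.
    chunk_id, chunk_size, wave_format = struct.unpack("<4sI4s", header[0:12])
    if chunk_id != b"RIFF" or wave_format != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # "fmt " subchunk (the 4-byte ID ends in a space); numeric fields are little-endian.
    (sub1_id, sub1_size, audio_format, num_channels, sample_rate,
     byte_rate, block_align, bits_per_sample) = struct.unpack("<4sIHHIIHH", header[12:36])
    if sub1_id != b"fmt ":
        raise ValueError("expected a 'fmt ' subchunk at offset 12")
    # "data" subchunk header; the raw samples start at offset 44.
    sub2_id, sub2_size = struct.unpack("<4sI", header[36:44])
    if sub2_id != b"data":
        raise ValueError("expected a 'data' subchunk at offset 36 (non-canonical file?)")
    return {
        "chunk_size": chunk_size,
        "subchunk1_size": sub1_size,
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "num_channels": num_channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "data_size": sub2_size,
    }
```

Typical use would be `with open(path, "rb") as f: info = parse_wav_header(f.read(44))`, where `path` is a placeholder for the input file.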
Fingerprint Layer
The fingerprint layer carries out the core mathematical analysis of the audio, converting a 5 MB audio file (WAV) into a 100 KB fingerprint (bit string), e.g. fea690b1-b11dce98-a…
Fingerprint Layer: Fingerprint extraction scheme [1]
i) Framing: divide the audio file into equally sized frames.
ii) Sub-fingerprinting: for each frame, compute features that are invariant to degradation. Well-known audio features include Fourier coefficients, Mel Frequency Cepstral Coefficients (MFCC), spectral flatness, sharpness and Linear Predictive Coding (LPC) coefficients. These features are mapped into a more compact representation using classification algorithms such as Hidden Markov Models (HMM) or quantization.
iii) Fingerprint block generation: one sub-fingerprint is not sufficient to identify an audio clip. The basic unit that is sufficient to identify an audio clip is called a fingerprint block.
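Framing and block formation are simple slicing operations. The sketch below assumes illustrative parameters not given on this slide: 2048-sample frames advanced by 64 samples (an overlap of 31/32) and blocks of 256 consecutive sub-fingerprints. At an assumed sample rate of about 5.5 kHz, a 64-sample hop corresponds to roughly 11.6 ms.

```python
# Sketch of framing and fingerprint-block formation.
# Assumed parameters: 2048-sample frames, 64-sample hop, 256-entry blocks.

def frame_signal(samples, frame_len=2048, hop=64):
    """Split a mono sample sequence into equally sized, overlapping frames.
    Trailing samples that do not fill a whole frame are dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def fingerprint_blocks(sub_fingerprints, block_len=256, step=256):
    """Group consecutive sub-fingerprints into blocks of block_len entries.
    With step == block_len (the default) the blocks do not overlap."""
    if block_len <= 0 or step <= 0:
        raise ValueError("block_len and step must be positive")
    return [sub_fingerprints[i:i + block_len]
            for i in range(0, len(sub_fingerprints) - block_len + 1, step)]
```

Each frame would then be reduced to one sub-fingerprint, and a query is matched one block at a time.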
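Fingerprint Layer: Bit derivation [1]
$$F(n,m) = \begin{cases} 1 & \text{if } E(n,m) - E(n,m+1) - \big(E(n-1,m) - E(n-1,m+1)\big) > 0 \\ 0 & \text{if } E(n,m) - E(n,m+1) - \big(E(n-1,m) - E(n-1,m+1)\big) \le 0 \end{cases}$$
where E(n,m) is the energy of band m of frame n, and F(n,m) is the m-th bit of the sub-fingerprint of frame n.
[Figure: Extraction pipeline. Framing → Fourier transform (F) → ABS → Band division → Energy computation (∑x²) → Bit derivation (difference of adjacent band energies, minus the same difference in the previous frame via a one-frame delay T, compared with > 0) → bits F(n,0), F(n,1) through F(n,30), F(n,31).]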
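The bit-derivation rule translates almost line for line into code. This minimal sketch assumes the band energies of frames n-1 and n are already computed; 33 bands yield the 32 bits F(n,0) to F(n,31), since bit m uses bands m and m+1. Placing F(n,0) in the most significant position is an arbitrary choice made here.

```python
# Sketch of the bit-derivation rule F(n, m) for one frame.
# Assumption: band energies E(n-1, .) and E(n, .) are given (up to 33 bands).

def sub_fingerprint(e_prev, e_cur):
    """Return the sub-fingerprint of frame n from E(n-1, .) and E(n, .).
    Bit F(n,0) ends up in the most significant position."""
    if len(e_prev) != len(e_cur) or not 2 <= len(e_cur) <= 33:
        raise ValueError("expected two equal-length lists of 2 to 33 band energies")
    value = 0
    for m in range(len(e_cur) - 1):
        # E(n,m) - E(n,m+1) - (E(n-1,m) - E(n-1,m+1))
        diff = (e_cur[m] - e_cur[m + 1]) - (e_prev[m] - e_prev[m + 1])
        value = (value << 1) | (1 if diff > 0 else 0)
    return value
```

Because each bit compares energy differences across both frequency and time, a uniform gain change in the audio cancels out, which is one reason this feature is robust.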
Protocol Layer
The protocol layer accepts the fingerprint from the fingerprint layer and makes an HTTP POST request to the server for the relevant metadata. It has two major modules:
i) HTTP module: implements the POST request to the server, with the fingerprint in the request message.
ii) XML parser: the returned metadata is in XML format, so the parser module extracts the required information, such as the artist, album and lyrics.
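The HTTP module's job can be sketched with Python's standard urllib. The function below builds (but does not send) a form-encoded POST similar to the example exchange on the protocol-layer slides. The function name, the placeholder URL and the choice to hex-encode each 32-bit sub-fingerprint as 8 characters are assumptions.

```python
# Sketch of the HTTP module: build a form-encoded POST carrying the fingerprint.
# Hypothetical: the function name, URL, and hex encoding of the fingerprint.
from urllib.parse import urlencode
from urllib.request import Request


def build_lookup_request(url, client_id, sub_fingerprints):
    """Build (but do not send) the POST request for a metadata lookup."""
    # Hex-encode each 32-bit sub-fingerprint as 8 characters (assumed encoding).
    fingerprint_hex = "".join(f"{sf:08x}" for sf in sub_fingerprints)
    body = urlencode({"client_id": client_id, "fingerprint": fingerprint_hex})
    return Request(
        url,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; the server's XML reply is then handed to the parser module.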
Protocol Layer: Example exchange
HTTP POST request sent by the client:
POST /path/script.cgi HTTP/1.0
From: vikesh@zeppelin.com
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

client_id=42&fingerprint=fea690b1b11dce98a…

Matching database record: Album: The Wall; Song: Comfortably Numb; Artist: Pink Floyd.
XML response generated by the server and passed to the XML parser:
<?xml version="1.0" encoding="UTF-8"?>
<metadata fp="fea690b1b11dce98a…" id="42">
  <album>The Wall</album>
  <song>Comfortably Numb</song>
  <artist>Pink Floyd</artist>
</metadata>
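On the client, the XML parser module can be sketched with Python's standard xml.etree.ElementTree. The element names (metadata, album, song, artist) and attributes (fp, id) follow the example response; the lyrics element is an assumption based on the fields listed for the parser, and any absent element is returned as None.

```python
# Sketch of the client-side XML parser using xml.etree.ElementTree.
# The <lyrics> element is assumed; absent elements yield None.
import xml.etree.ElementTree as ET


def parse_metadata(xml_text):
    """Extract song metadata from the server's XML response."""
    root = ET.fromstring(xml_text)
    if root.tag != "metadata":
        raise ValueError(f"unexpected root element <{root.tag}>")
    return {
        "id": root.get("id"),
        "fingerprint": root.get("fp"),
        # findtext() returns None when the child element is missing
        "album": root.findtext("album"),
        "song": root.findtext("song"),
        "artist": root.findtext("artist"),
        "lyrics": root.findtext("lyrics"),
    }
```

The Python documentation warns that the standard XML modules are not hardened against maliciously constructed data, which matters if the server is not trusted.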
Server Side: Search Algorithm and Scalability
Database Architecture
To understand the search algorithm, it is essential to first understand the database architecture.
Database Implementation
Tables used:
a) look_up: subfingerprint INT, link_list BLOB
b) songs: song_id INT, song_fingerprint MEDIUMBLOB
c) Metadata: stores the song name, album, artist, genre, lyrics, year, etc.
Note: link_list is stored as a binary large object via object serialization. Each entry in the list contains the fields: i) songId ii) offset
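A self-contained sketch of the look_up and songs tables is possible with Python's built-in sqlite3 module. This substitutes SQLite for the database implied by the MEDIUMBLOB type (likely MySQL), and instead of the slides' object serialization it packs each (songId, offset) pair as two little-endian unsigned 32-bit integers. Both choices are assumptions made only so the example runs on its own; the Metadata table is omitted.

```python
# Sketch of the look_up/songs tables with sqlite3 (substituted for the original DB).
# Assumed link_list layout: packed (songId, offset) pairs, two uint32 each.
import sqlite3
import struct

PAIR = struct.Struct("<II")  # (songId, offset)


def pack_link_list(pairs):
    return b"".join(PAIR.pack(song_id, offset) for song_id, offset in pairs)


def unpack_link_list(blob):
    return [PAIR.unpack_from(blob, i) for i in range(0, len(blob), PAIR.size)]


def create_schema(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS look_up "
                 "(subfingerprint INTEGER PRIMARY KEY, link_list BLOB)")
    conn.execute("CREATE TABLE IF NOT EXISTS songs "
                 "(song_id INTEGER PRIMARY KEY, song_fingerprint BLOB)")


def lookup(conn, subfp):
    """Return the (songId, offset) positions where subfp occurs."""
    row = conn.execute("SELECT link_list FROM look_up WHERE subfingerprint = ?",
                       (subfp,)).fetchone()
    return unpack_link_list(row[0]) if row else []


def add_occurrence(conn, subfp, song_id, offset):
    """Append one (songId, offset) position to the link_list of subfp."""
    pairs = lookup(conn, subfp)
    pairs.append((song_id, offset))
    conn.execute("INSERT OR REPLACE INTO look_up (subfingerprint, link_list) VALUES (?, ?)",
                 (subfp, pack_link_list(pairs)))
```

Usage: `conn = sqlite3.connect(":memory:")`, then `create_schema(conn)`, then one `add_occurrence` call per sub-fingerprint of each song. SQLite's INTEGER type is 64-bit, so unsigned 32-bit sub-fingerprints fit without overflow.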
Search Algorithm
A brute-force matching approach takes O(n) time per query, which is unacceptable for any commercial deployment with a large database. For example, consider a moderate fingerprint database of 10,000 songs with an average length of 5 minutes. In the extraction scheme of [1], a sub-fingerprint is generated for every 11.6 ms of audio, so
$N = \frac{10{,}000 \times 5 \times 60\ \text{s}}{11.6 \times 10^{-3}\ \text{s}} \approx 258 \text{ million sub-fingerprints}.$
Assuming a rate of $2 \times 10^5$ fingerprint comparisons per second on a modern PC [1], an O(n) algorithm takes $2.58 \times 10^8 / (2 \times 10^5) \approx 1{,}290$ s, i.e. roughly 20 minutes, on this database.
Optimized Algorithm
Assumption: at least one sub-fingerprint of the query has an exact match in the correct song.
The positions in the database where a specific 32-bit sub-fingerprint occurs are retrieved using the database architecture described above. The fingerprint database contains a lookup table (LUT) with every possible 32-bit sub-fingerprint as an entry. Every entry points to a list of pointers to the positions in the actual fingerprint lists where that 32-bit sub-fingerprint is located.
Assume the same database of 10,000 songs of about 5 minutes each, i.e. about 250 million sub-fingerprints. If all positions are equally likely, the average number of positions per list is
$\text{Average list size} = \frac{250{,}000{,}000}{2^{32}} \approx 0.058$
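The estimates on this slide follow directly from its stated assumptions (10,000 songs of 5 minutes, one sub-fingerprint per 11.6 ms, $2 \times 10^5$ comparisons per second, 32-bit sub-fingerprints). A quick Python check gives a brute-force time of roughly 21.5 minutes, which the slide rounds to about 20.

```python
# Reproduces this slide's estimates from its stated assumptions.
num_songs = 10_000
song_seconds = 5 * 60
hop_seconds = 11.6e-3          # one sub-fingerprint per 11.6 ms of audio
comparisons_per_second = 2e5   # brute-force comparison rate quoted on the slide

n_sub = num_songs * song_seconds / hop_seconds               # ~2.586e8 sub-fingerprints
brute_force_minutes = n_sub / comparisons_per_second / 60    # ~21.6 minutes
avg_list_size = 250e6 / 2**32                                # ~0.058 positions per LUT entry

print(f"sub-fingerprints: {n_sub:.3e}")
print(f"brute force: {brute_force_minutes:.1f} min")
print(f"average list size: {avg_list_size:.3f}")
```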
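Search Algorithm (contd.)
A query is a fingerprint block of 256 sub-fingerprints, each of which is looked up in the LUT, so
Average number of comparisons per identification $= 0.058 \times 256 \approx 15$.
At the assumed rate of $2 \times 10^5$ comparisons per second, and counting only fingerprint-block comparisons, the average search time is
$15 / (2 \times 10^5)\ \text{s} = 7.5 \times 10^{-5}\ \text{s} = 75\ \mu\text{s}$,
so the improvement over brute force is
$\frac{20 \times 60\ \text{s}}{7.5 \times 10^{-5}\ \text{s}} = 1.6 \times 10^{7}$.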
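The lookup-table search described on these slides can be sketched in Python. Assumptions, not from the slides: an in-memory dict stands in for the look_up table, a dict of sub-fingerprint lists stands in for the songs table, the query is one fingerprint block, and a candidate is accepted when its bit error rate is at most 0.35, an illustrative threshold.

```python
# Sketch of the LUT-based search (in-memory stand-ins for the database tables).
# Assumed: BER acceptance threshold of 0.35 (illustrative only).
from collections import defaultdict


def build_lut(songs):
    """Map each 32-bit sub-fingerprint to the (song_id, offset) positions where it occurs."""
    lut = defaultdict(list)
    for song_id, sub_fps in songs.items():
        for offset, sub_fp in enumerate(sub_fps):
            lut[sub_fp].append((song_id, offset))
    return lut


def bit_errors(a, b):
    """Total number of differing bits between two equal-length sub-fingerprint lists."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def search(query, songs, lut, threshold=0.35):
    """Return (song_id, start, ber) of the best-matching position, or None.
    Only positions sharing at least one exact sub-fingerprint with the query
    are compared, instead of every position in the database."""
    best = None
    tried = set()
    for i, sub_fp in enumerate(query):
        for song_id, offset in lut.get(sub_fp, ()):
            start = offset - i  # align query position i with database position offset
            if (song_id, start) in tried:
                continue
            tried.add((song_id, start))
            candidate = songs[song_id][start:start + len(query)] if start >= 0 else []
            if len(candidate) != len(query):
                continue  # the query would run past the start or end of the song
            ber = bit_errors(query, candidate) / (32 * len(query))
            if ber <= threshold and (best is None or ber < best[2]):
                best = (song_id, start, ber)
    return best
```

If no sub-fingerprint of the query matches exactly, the search returns None. This is exactly the failure case excluded by the assumption stated on the previous slide.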
Demonstration: Codec Layer in action
Progress Timeline
Progress Timeline (contd.)
References
[1] J. Haitsma and T. Kalker, "A Highly Robust Audio Fingerprinting System", Philips Research, Eindhoven, The Netherlands, October 2001.
[2] MusicIP Corporation. Available HTTP: musicip.com
[3] H. Neuschmied, H. Mayer and E. Batlle, "Identification of Audio Titles on the Internet", Proceedings of the International Conference on Web Delivering of Music 2001, Florence, Italy, November 2001.
[4] Microsoft/IBM WAVE file format. Available HTTP: ccrma.stanford.edu/courses/422/projects/WaveFormat/
[5] J. Haitsma, T. Kalker and J. Oostveen, "Robust Audio Hashing for Content Identification", Content-Based Multimedia Indexing 2001, Brescia, Italy, September 2001.

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Audio Fingerprinting Introduction

  • 1. Department of Electronics & Computers, IIT Roorkee. A Bachelor Thesis Project Presentation (First Evaluation) on Audio Fingerprinting for Song Identification, under the guidance of Dr. Padam Kumar. Team: Rishabh Sood, B.Tech. CSE IV Yr. (070820); Santosh Kumar, B.Tech. CSE IV Yr. (070824); Vikesh Khanna, B.Tech. CSE IV Yr. (070829).
  • 2. Contents: 1. Objective (1.1 Problem statement; 1.2 Motivation). 2. Theory (2.1 Audio fingerprint definition; 2.2 System parameters). 3. Design (3.1 Architecture; 3.2 Flow diagram; 3.3 Codec layer; 3.4 Fingerprint layer; 3.5 Protocol layer; 3.6 Search algorithm; 3.7 Database architecture). Demonstration. Progress timeline. References.
  • 3. Problem Statement: To build a robust audio fingerprinting system that can identify songs efficiently from a large database, using limited computing resources and a limited amount of input audio.
  • 4. Motivation: There is immense scope for robust audio fingerprinting applications in industry:
    P2P filtering: filtering copyrighted material from P2P networks, even if filenames and metadata have been tampered with.
    Language translation: identifying audio content in foreign languages, which is not possible by textual search.
    Broadcast monitoring: automating royalty collection by monitoring broadcast channels.
    Media plugins: plugins for playlist generation and for identifying similar tracks.
  • 5. Audio Fingerprint Definition: Audio fingerprinting is essentially a hash function F that maps an audio object consisting of a large number of bits to a 'fingerprint' of only a limited number of bits. The audio object can be identified from this bit string. [Figure: F maps a 5 MB audio file to a 100 KB fingerprint.]
  • 6. Audio Fingerprints vs. Cryptographic Hash Functions
    Mathematical equivalence vs. perceptual similarity: Assume X and Y are two objects that are mapped to H(X) and H(Y) by a cryptographic hash function H. Strict mathematical equality of H(X) and H(Y) implies equality of X and Y, with a very low probability of error. For audio, we are not interested in strict mathematical equivalence but in perceptual similarity.
    Transitivity property: if sound tracks X and Y are perceptually similar, and Y and Z are perceptually similar, it does NOT follow that X and Z are perceptually similar. Transitivity does, however, hold for the equality test used with mathematical hash functions. Therefore, instead of mathematical equivalence, we use threshold comparisons: $|F(X) - F(Y)| \le T$ implies X and Y are similar; $|F(X) - F(Y)| > T$ implies X and Y are not similar.
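A minimal sketch of the threshold comparison, in Python. It assumes (the slide does not say so) that a fingerprint is a list of 32-bit sub-fingerprints and that the distance |F(X) − F(Y)| is measured as the bit error rate, i.e. the Hamming distance divided by the total number of bits; the threshold 0.35 is only an illustrative value. The last lines reproduce the non-transitivity described above with three one-word fingerprints.

```python
from typing import Sequence

BITS_PER_SUB_FP = 32  # assumption: each sub-fingerprint is a 32-bit integer


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits between two equal-length fingerprint blocks."""
    if len(a) != len(b):
        raise ValueError("fingerprint blocks must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def bit_error_rate(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits, between 0.0 and 1.0."""
    if not a:
        raise ValueError("fingerprint blocks must not be empty")
    return hamming_distance(a, b) / (BITS_PER_SUB_FP * len(a))


def is_similar(a: Sequence[int], b: Sequence[int], threshold: float = 0.35) -> bool:
    """Threshold comparison: similar iff the distance does not exceed T."""
    return bit_error_rate(a, b) <= threshold


# Non-transitivity: X ~ Y (10 bits differ) and Y ~ Z (10 bits differ),
# yet X and Z differ in 20 of 32 bits and are not similar.
X, Y, Z = [0x00000000], [0x000003FF], [0x000FFFFF]
print(is_similar(X, Y), is_similar(Y, Z), is_similar(X, Z))  # True True False
```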
  • 7. System Parameters
    Robustness: a low false negative rate.
    Reliability: a low false positive rate.
    Fingerprint size: how many bits per song?
    Granularity: what is the minimum input length?
    Search speed: how fast is the search for a given database size?
  • 9. [Figure: Flow diagram. Client: audio input → Codec Layer (samples in unsigned char format) → Fingerprint Layer (fingerprint) → Protocol Layer (HTTP POST request). Server: Database (search algorithm) → metadata → XML generator → XML data returned to the client's XML parser, which extracts album, artist and lyrics.]
  • 10. Codec Layer: An audio codec is a computer program that compresses (encodes) and decompresses (decodes) digital audio in a given file format, for storage or for playback. Examples of codec formats: AAC, MP3, WMA.
  • 11. Codec Layer: the decoded audio (WAV) is represented by an AudioData structure with the following fields:
    i) Samples (unsigned char* samples): a buffer of the actual data samples (2 bytes, i.e. 16 bits, per sample).
    ii) Byte order (int byteOrder): the byte order of the samples; either CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN.
    iii) Number of samples (long size): the number of samples read.
    iv) Sample rate (int sRate): the number of samples per second of audio.
    v) Stereo (bool stereo): whether the audio is stereo.
    vi) Duration: the duration of the original audio, regardless of the number of samples read.
    vii) Format: the format of the original audio, expressed as a file extension (.mp3, .wav, etc.).
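A hypothetical Python counterpart of the AudioData structure, for illustration only: field names are adapted to Python style, the numeric values of CONST_LITTLE_ENDIAN and CONST_BIG_ENDIAN are assumed, and duration is assumed to be in seconds. The pcm16 helper shows how byteOrder determines the decoding of each 2-byte sample in the buffer, assuming signed 16-bit PCM.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Hypothetical values for the byte-order constants named on the slide.
CONST_LITTLE_ENDIAN = 0
CONST_BIG_ENDIAN = 1


@dataclass
class AudioData:
    samples: bytes      # raw buffer, 2 bytes (16 bits) per sample
    byte_order: int     # CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN
    size: int           # number of samples read
    s_rate: int         # samples per second
    stereo: bool        # True if the audio is stereo
    duration: float     # duration of the original audio (assumed: seconds)
    file_format: str    # original format as a file extension, e.g. ".mp3"

    def pcm16(self) -> list[int]:
        """Decode the raw buffer into signed 16-bit sample values."""
        if self.byte_order == CONST_LITTLE_ENDIAN:
            prefix = "<"
        elif self.byte_order == CONST_BIG_ENDIAN:
            prefix = ">"
        else:
            raise ValueError(f"unknown byte order: {self.byte_order}")
        needed = 2 * self.size
        if len(self.samples) < needed:
            raise ValueError("buffer is shorter than 2 * size bytes")
        return list(struct.unpack(f"{prefix}{self.size}h", self.samples[:needed]))


clip = AudioData(b"\x01\x00\xff\xff", CONST_LITTLE_ENDIAN, 2, 44100, False, 0.0, ".wav")
print(clip.pcm16())  # [1, -1]
```

The same two bytes `\x01\x00` would decode to 256 under CONST_BIG_ENDIAN, which is why the byte order has to travel with the buffer.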
  • 12. Codec Layer: the WAV file format (uncompressed PCM) [4]
    Offset (bytes) | Size (bytes) | Field name | Endian
    0 | 4 | ChunkID ("RIFF") | big
    4 | 4 | ChunkSize | little
    8 | 4 | Format ("WAVE") | big
    12 | 4 | Subchunk1 ID ("fmt ") | big
    16 | 4 | Subchunk1 Size | little
    20 | 2 | Audio Format | little
    22 | 2 | Num Channels | little
    24 | 4 | Sample Rate | little
    28 | 4 | Byte Rate | little
    32 | 2 | Block Align | little
    34 | 2 | BitsPerSample | little
    36 | 4 | Subchunk2 ID ("data") | big
    40 | 4 | Subchunk2 Size | little
    44 | Subchunk2 Size | Data | little
    Offsets 0 to 11 form the "RIFF" chunk descriptor; the format "WAVE" requires two subchunks, "fmt " and "data". The "fmt " subchunk (offsets 12 to 35) describes the format of the data in the "data" subchunk. The "data" subchunk (offset 36 onward) indicates the size of the sound information and contains the raw sound data.
  • 13. Fingerprint Layer: The fingerprint layer carries out the core mathematical analysis of the audio, converting a 5 MB audio file into a 100 KB fingerprint (bit string). [Figure: WAV (5 MB) → fea690b1-b11dce98-a… (100 KB)]
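As a rough check of the 100 KB figure, assume a 5-minute song and one 32-bit sub-fingerprint for every 11.6 ms of audio (the rate used in the search-algorithm slides):

$$
N = \frac{5 \times 60\ \text{s}}{11.6 \times 10^{-3}\ \text{s}} \approx 25{,}860\ \text{sub-fingerprints}
$$

$$
N \times 32\ \text{bits} \approx 8.28 \times 10^{5}\ \text{bits} \approx 1.03 \times 10^{5}\ \text{bytes} \approx 103\ \text{KB}
$$

This is consistent with the roughly 100 KB fingerprint size quoted for a whole song.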
  • 14. Fingerprint Layer: fingerprint extraction scheme [1]
    1. Framing: divide the audio file into equally sized frames.
    2. Sub-fingerprinting: for each frame, degradation-invariant features are calculated. Well-known audio features include Fourier coefficients, Mel Frequency Cepstral Coefficients (MFCC), spectral flatness, sharpness and Linear Predictive Coding (LPC) coefficients. These features are mapped into a more compact representation using techniques such as Hidden Markov Model (HMM) classification or quantization.
    3. Fingerprint block generation: one sub-fingerprint is not sufficient to identify an audio clip. The basic unit that is sufficient to identify an audio clip is called a fingerprint block.
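The framing and block-generation steps can be sketched in Python as follows. This is illustrative only: frame length and hop are left as parameters (a hop of 11.6 ms of audio would match the sub-fingerprint rate quoted in the search-algorithm slides), and the default block size of 256 sub-fingerprints is the value used in the search-time calculation. The per-frame feature step is the subject of the next slide.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split samples into equally sized frames of frame_len samples, hop samples apart.

    Frames overlap when hop < frame_len; a trailing partial frame is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def fingerprint_blocks(sub_fps: Sequence[int], block_size: int = 256,
                       step: int = 1) -> List[List[int]]:
    """Group consecutive sub-fingerprints into fingerprint blocks of block_size.

    step=1 yields a block at every starting position (useful on the database
    side); step=block_size yields non-overlapping blocks.
    """
    if block_size <= 0 or step <= 0:
        raise ValueError("block_size and step must be positive")
    return [list(sub_fps[start:start + block_size])
            for start in range(0, len(sub_fps) - block_size + 1, step)]
```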
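  • 15. Fingerprint Layer: sub-fingerprint bit derivation. Let E(n,m) be the energy of band m of frame n, and F(n,m) the m-th bit of the sub-fingerprint of frame n. Then
    $$F(n,m) = \begin{cases} 1 & \text{if } E(n,m) - E(n,m+1) - \big(E(n-1,m) - E(n-1,m+1)\big) > 0 \\ 0 & \text{if } E(n,m) - E(n,m+1) - \big(E(n-1,m) - E(n-1,m+1)\big) \le 0 \end{cases}$$
    [Figure: extraction pipeline. Framing → Fourier transform (F) → absolute value (ABS) → band division → energy computation (∑x²) per band → bit derivation: the energy difference between adjacent bands, minus the same difference delayed by one frame (T), thresholded at > 0, gives bits F(n,0), F(n,1), …, F(n,31).]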
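A minimal Python sketch of the bit-derivation rule, assuming the band energies E(n,m) have already been computed (band_energy shows the ∑x² step for one band's spectral coefficients). Placing F(n,0) in the most significant bit is an arbitrary choice made here; with 33 bands per frame, each sub-fingerprint has 32 bits.

```python
from typing import List, Sequence


def band_energy(coefficients: Sequence[complex]) -> float:
    """Energy of one band: sum of squared magnitudes of its spectral coefficients."""
    return sum(abs(c) ** 2 for c in coefficients)


def sub_fingerprints(energies: Sequence[Sequence[float]]) -> List[int]:
    """Derive one sub-fingerprint per frame from band energies.

    energies[n][m] is E(n, m). Frame 0 has no predecessor, so output starts
    at frame 1. Bit F(n, m) = 1 iff
    E(n,m) - E(n,m+1) - (E(n-1,m) - E(n-1,m+1)) > 0.
    """
    result = []
    for n in range(1, len(energies)):
        cur, prev = energies[n], energies[n - 1]
        value = 0
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            value = (value << 1) | (1 if diff > 0 else 0)  # F(n,0) ends up as MSB
        result.append(value)
    return result


# Two frames, three bands: bits are F(1,0)=1 and F(1,1)=0, i.e. 0b10.
print(sub_fingerprints([[1, 1, 1], [3, 1, 2]]))  # [2]
```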
  • 16. Protocol Layer: The protocol layer accepts the fingerprint from the fingerprint layer and makes an HTTP POST request to the server for the relevant metadata. It has two major modules:
    HTTP module: implements the POST request to the server, carrying the fingerprint in the request message.
    XML parser: the returned metadata is in XML format, and this module retrieves the required information such as the artist, album and lyrics.
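The HTTP module's request can be sketched with Python's standard urllib. The URL and the short fingerprint string below are hypothetical placeholders; the form fields client_id and fingerprint follow the example request in the next slide. The sketch only builds the request: urllib.request.urlopen(req) would send it, and urllib fills in Content-Length from the body.

```python
import urllib.parse
import urllib.request


def build_fingerprint_request(url: str, client_id: int, fingerprint: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the fingerprint as form data."""
    body = urllib.parse.urlencode(
        {"client_id": client_id, "fingerprint": fingerprint}
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoint and truncated placeholder fingerprint.
req = build_fingerprint_request("http://example.com/path/script.cgi", 42, "fea690b1b11dce98")
print(req.get_method(), req.data)  # POST b'client_id=42&fingerprint=fea690b1b11dce98'
```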
  • 17. Protocol Layer: example exchange
    HTTP POST request from the client:
      POST /path/script.cgi HTTP/1.0
      From: vikesh@zeppelin.com
      User-Agent: HTTPTool/1.0
      Content-Type: application/x-www-form-urlencoded
      Content-Length: 32

      client_id=42&fingerprint=fea690b1b11dce98a…
    Database match: Album: The Wall; Song: Comfortably Numb; Artist: Pink Floyd
    XML response, read by the client's XML parser:
      <?xml version="1.0" encoding="UTF-8"?>
      <metadata fp="fea690b1b11dce98a…" id="42">
        <album>The Wall</album>
        <song>Comfortably Numb</song>
        <artist>Pink Floyd</artist>
      </metadata>
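On the client, the XML parser can be sketched with Python's xml.etree.ElementTree. Element and attribute names follow the example response; treating lyrics as an optional element is an assumption. The function takes the raw response bytes so the parser can honour the encoding declaration. For responses from untrusted servers, a hardened parser such as defusedxml would be the safer choice.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def parse_metadata(xml_bytes: bytes) -> Dict[str, Optional[str]]:
    """Extract song metadata from a <metadata> response document."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "metadata":
        raise ValueError(f"unexpected root element <{root.tag}>")
    info: Dict[str, Optional[str]] = {"id": root.get("id"), "fp": root.get("fp")}
    for field in ("album", "song", "artist", "lyrics"):
        info[field] = root.findtext(field)  # None if the element is absent
    return info


# Hypothetical response with a shortened placeholder fingerprint.
response = (b'<?xml version="1.0" encoding="UTF-8"?>'
            b'<metadata fp="fea690b1b11dce98" id="42">'
            b'<album>The Wall</album><song>Comfortably Numb</song>'
            b'<artist>Pink Floyd</artist></metadata>')
print(parse_metadata(response)["artist"])  # Pink Floyd
```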
  • 18. Server Side: Search Algorithm and Scalability
  • 19. Database Architecture: To understand the search algorithm, it is first necessary to understand the database architecture.
  • 20.
  • 21. Each position list is stored as a binary large object (BLOB) via object serialization. Each entry in the list contains the following fields: i) songId; ii) offset.
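A sketch of this layout in Python, under stated assumptions: the lookup table is a dict keyed by 32-bit sub-fingerprint (only values that actually occur are stored, rather than all 2^32 entries), and pickle stands in for the object serialization that produces each BLOB. The sample values reuse the fingerprint prefix fea690b1-b11dce98 from the fingerprint-layer slide.

```python
import pickle
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (songId, offset) of one sub-fingerprint occurrence


def build_lut(songs: Dict[int, List[int]]) -> Dict[int, List[Position]]:
    """Map each 32-bit sub-fingerprint to every (songId, offset) where it occurs."""
    lut: Dict[int, List[Position]] = defaultdict(list)
    for song_id, sub_fps in songs.items():
        for offset, sub_fp in enumerate(sub_fps):
            lut[sub_fp].append((song_id, offset))
    return dict(lut)


def to_blob(positions: List[Position]) -> bytes:
    """Serialize one position list so it can be stored in a BLOB column."""
    return pickle.dumps(positions)


def from_blob(blob: bytes) -> List[Position]:
    """Deserialize a position list; only unpickle data you wrote yourself."""
    return pickle.loads(blob)


songs = {1: [0xFEA690B1, 0xB11DCE98], 2: [0xB11DCE98]}
lut = build_lut(songs)
print(from_blob(to_blob(lut[0xB11DCE98])))  # [(1, 1), (2, 0)]
```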
  • 22. Search Algorithm
    A brute-force matching approach takes O(n) time in the size of the database, which is unacceptable for any commercial deployment with a large database. For example, consider a moderate fingerprint database of 10,000 songs with an average length of 5 minutes. Recall that every 11.6 ms of audio generates a sub-fingerprint, so the number of sub-fingerprints is $\frac{5 \times 60 \times 10{,}000}{11.6 \times 10^{-3}} \approx 258$ million. Assuming a rate of $2 \times 10^{5}$ fingerprint comparisons per second on a modern PC [1], an O(n) algorithm takes about $258 \times 10^{6} / (2 \times 10^{5}) \approx 1{,}300$ s, roughly 20 minutes, on this database.
    Optimized algorithm. Assumption: at least one sub-fingerprint of the query has an exact match in the correct song. The fingerprint database contains a lookup table (LUT) with an entry for every possible 32-bit sub-fingerprint. Every entry points to a list of pointers to the positions in the actual fingerprint lists where that 32-bit sub-fingerprint occurs, so the positions of a given sub-fingerprint are retrieved directly through the database architecture described above. For the same 10,000-song database (about 250 million sub-fingerprints), and assuming all sub-fingerprint values are equally likely, the average list length is $250{,}000{,}000 / 2^{32} \approx 0.058$.
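For comparison, the brute-force O(n) approach can be sketched in Python as below. It assumes the bit error rate as the distance and an illustrative threshold of 0.35; every position of every song is compared against the query block, which is exactly what makes it too slow for large databases.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def ber(a: Sequence[int], b: Sequence[int]) -> float:
    """Bit error rate between two equal-length blocks of 32-bit sub-fingerprints."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (32 * len(a))


def brute_force_search(query: Sequence[int], songs: Dict[int, List[int]],
                       threshold: float = 0.35) -> Optional[Tuple[int, int, float]]:
    """Slide the query block over every position of every song: O(n) comparisons.

    Returns (song_id, start_offset, ber) of the best match within the threshold,
    or None if no position is close enough.
    """
    k = len(query)
    best: Optional[Tuple[int, int, float]] = None
    for song_id, sub_fps in songs.items():
        for start in range(len(sub_fps) - k + 1):
            rate = ber(query, sub_fps[start:start + k])
            if rate <= threshold and (best is None or rate < best[2]):
                best = (song_id, start, rate)
    return best
```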
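  • 23. Search Algorithm (continued)
    Each of the 256 sub-fingerprints in a query fingerprint block is looked up in the LUT, so the average number of candidate comparisons per identification is $0.058 \times 256 \approx 15$. At the same rate of $2 \times 10^{5}$ comparisons per second, these take on average $15 / (2 \times 10^{5})\ \text{s} = 75\ \mu\text{s}$, excluding the cost of the 256 LUT lookups themselves. The improvement over brute force is therefore about $(20 \times 60\ \text{s}) / (7.5 \times 10^{-5}\ \text{s}) = 1.6 \times 10^{7}$.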
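A minimal sketch of the LUT-based search in Python, assuming the same bit-error-rate distance and illustrative 0.35 threshold. Each query sub-fingerprint is looked up in the table; every hit (songId, offset) fixes a candidate alignment of the whole query block, and only those candidates are compared in full. A query with no exactly matching sub-fingerprint returns None, reflecting the exact-match assumption of the optimized algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def ber(a: Sequence[int], b: Sequence[int]) -> float:
    """Bit error rate between two equal-length blocks of 32-bit sub-fingerprints."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (32 * len(a))


def build_lut(songs: Dict[int, List[int]]) -> Dict[int, List[Tuple[int, int]]]:
    """Sub-fingerprint -> list of (song_id, offset) positions where it occurs."""
    lut: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for song_id, sub_fps in songs.items():
        for offset, sub_fp in enumerate(sub_fps):
            lut[sub_fp].append((song_id, offset))
    return dict(lut)


def lut_search(query: Sequence[int], songs: Dict[int, List[int]],
               lut: Dict[int, List[Tuple[int, int]]],
               threshold: float = 0.35) -> Optional[Tuple[int, int, float]]:
    """Compare only at positions where some query sub-fingerprint matches exactly."""
    k = len(query)
    best: Optional[Tuple[int, int, float]] = None
    seen = set()
    for i, sub_fp in enumerate(query):
        for song_id, offset in lut.get(sub_fp, []):
            start = offset - i  # align query[i] with the matching database position
            if start < 0 or (song_id, start) in seen:
                continue
            seen.add((song_id, start))
            candidate = songs[song_id][start:start + k]
            if len(candidate) < k:
                continue  # the block would run past the end of the song
            rate = ber(query, candidate)
            if rate <= threshold and (best is None or rate < best[2]):
                best = (song_id, start, rate)
    return best


songs = {1: [1, 2, 3, 4, 5], 2: [10, 20, 30, 40]}
lut = build_lut(songs)
print(lut_search([20, 31], songs, lut))  # (2, 1, 0.015625)
```

Here only the single alignment implied by the exact hit on 20 is examined, and the one flipped bit in 31 versus 30 is tolerated by the threshold.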
  • 27. References
    [1] J. Haitsma and T. Kalker, "A Highly Robust Audio Fingerprinting System", Philips Research, Eindhoven, The Netherlands, October 2001.
    [2] MusicIP Corporation. Available HTTP: musicip.com
    [3] H. Neuschmied, H. Mayer and E. Batlle, "Identification of Audio Titles on the Internet", Proceedings of the International Conference on Web Delivering of Music 2001, Florence, Italy, November 2001.
    [4] Microsoft/IBM WAVE file format. Available HTTP: ccrma.stanford.edu/courses/422/projects/WaveFormat/
    [5] J. Haitsma, T. Kalker and J. Oostveen, "Robust Audio Hashing for Content Identification", Content-Based Multimedia Indexing 2001, Brescia, Italy, September 2001.
  • 28. Thank you. Any questions?