
Audio Processing and Music Recognition


Basics of Sound and Hearing (Physics and Biology), Digital Audio Processing (Electronics) and Music Recognition (Shazam, Computer Science)


(Autonomous Institution, approved by UGC and Accredited by NAAC with ‘A’ Grade)
TECHNICAL SEMINAR
Presented by: Mrinmoy Dalal, CSE A (13311A0506)
16 February 2016

SOUND

WHAT IS SOUND
Definition
• Physical: sound as a disturbance in the air
• Psychophysical: sound as perceived by the ear
• Sound as a stimulus (physical event) and sound as a sensation
• Pressure changes (audible band from 20 Hz to 20 kHz)
• Acoustics is the study of sound.
Physical terms
• Amplitude
• Frequency
• Spectrum

HOW DO WE HEAR
• The ear is connected to the brain (left brain: speech; right brain: music)
• The ear's sensitivity to frequency is logarithmic
• Its frequency response varies across the audible band
• Dynamic range is about 120 dB (at 3-4 kHz)
• Frequency discrimination is about 2 Hz (at 1 kHz)
• An intensity change of 1 dB can be detected

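The decibel figures above are logarithmic ratios. A minimal sketch, assuming the standard definitions (20·log10 for sound pressure, 10·log10 for intensity), that converts them into plain ratios; the function names are illustrative:

```python
import math

def db_to_pressure_ratio(db: float) -> float:
    """Sound-pressure ratio for a level difference in dB (L = 20 * log10(p / p_ref))."""
    return 10 ** (db / 20)

def db_to_intensity_ratio(db: float) -> float:
    """Intensity (power) ratio for a level difference in dB (L = 10 * log10(I / I_ref))."""
    return 10 ** (db / 10)

# A 120 dB dynamic range is a million-fold range in pressure, 10^12 in intensity.
print(db_to_pressure_ratio(120))           # 1000000.0
print(db_to_intensity_ratio(120))          # 1000000000000.0
# A just-detectable 1 dB step is roughly a 26% change in intensity.
print(round(db_to_intensity_ratio(1), 3))  # 1.259
# "Logarithmic" in frequency: 20 Hz to 20 kHz is about 10 octaves (doublings).
print(round(math.log2(20_000 / 20), 2))    # 9.97
```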
DIGITAL AUDIO

FUNDAMENTALS
• Digital audio is sound recorded, stored and reproduced as digital signals, most commonly encoded with pulse-code modulation (PCM).
• Digital audio systems include analog-to-digital conversion (ADC), digital-to-analog conversion (DAC), digital storage, processing and transmission components.
• A primary benefit of digital audio is the convenience of storage, transmission and retrieval.
• Digital audio is useful in the recording, manipulation, mass production and distribution of sound.
• Modern distribution of music over the Internet via online stores depends on digital recording and digital compression algorithms.

SOUND: PHYSICAL TO DIGITAL
[Figure]

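PULSE CODE MODULATION
PCM digitizes an analog signal in three steps:
• Sampling: measuring the signal at regular time intervals
• Quantization: rounding each measurement to one of a finite set of levels
• Binary encoding: writing each level as a fixed-width binary code
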
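The three PCM steps can be sketched in a few lines. This is a toy illustration, assuming a sine tone stands in for the analog signal and 16-bit signed little-endian output; `pcm_encode` is a hypothetical name:

```python
import math
import struct

def pcm_encode(freq_hz: float, sample_rate: int, n_samples: int):
    """Toy 16-bit PCM of a sine tone: sample, quantize, then binary-encode."""
    max_level = 2 ** 15 - 1  # 32767, the largest signed 16-bit value
    # 1. Sampling: evaluate the "analog" signal at t = n / sample_rate.
    samples = [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]
    # 2. Quantization: round each value in [-1, 1] to the nearest integer level.
    levels = [round(s * max_level) for s in samples]
    # 3. Binary encoding: pack as signed 16-bit little-endian integers ("<h").
    encoded = struct.pack(f"<{n_samples}h", *levels)
    return levels, encoded

levels, encoded = pcm_encode(1000, 8000, 8)
print(levels)        # [0, 23170, 32767, 23170, 0, -23170, -32767, -23170]
print(len(encoded))  # 16 bytes: 8 samples x 2 bytes
```

A 1 kHz tone sampled at 8 kHz gives exactly eight samples per cycle, which keeps the quantized values easy to check by hand.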
SAMPLING
[Figure]

1 song = 27.2 MB
A 1 GB hard drive ($899 in 1995) would hold only 35 songs.

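The slide does not say how the 27.2 MB figure was derived. Assuming uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo) and decimal megabytes/gigabytes, the arithmetic works out as follows; all constants here are assumptions, not values from the slide:

```python
# Assumptions (not stated on the slide): uncompressed CD-quality PCM and decimal units.
SAMPLE_RATE = 44_100     # samples per second, per channel
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 2             # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS   # 176,400 B/s (1,411.2 kbit/s)
song_bytes = 27.2e6                                            # 27.2 MB
song_minutes = song_bytes / bytes_per_second / 60
songs_per_gb = 1e9 / song_bytes

print(bytes_per_second)          # 176400
print(round(song_minutes, 2))    # 2.57
print(int(songs_per_gb))         # 36
```

Under these assumptions a 27.2 MB file holds roughly 2.6 minutes of audio, and 1 GB fits about 36 such files, close to the slide's figure of 35.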
AUDIO COMPRESSION
• Audio data compression (as distinguished from dynamic range compression) can reduce the transmission bandwidth and storage requirements of audio data.
• Audio compression algorithms are implemented in software as audio codecs.
• Lossy audio compression algorithms achieve higher compression at the cost of fidelity and are used in numerous audio applications.
• Lossless audio compression produces a representation of digital data that decompresses to an exact digital duplicate of the original audio stream.

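The lossless/lossy distinction can be demonstrated with a toy example. Here zlib (DEFLATE) stands in for a lossless codec and simply dropping the low 8 bits of each sample stands in for lossy coding; neither is a real audio codec, and the test signal is made up:

```python
import math
import struct
import zlib

# Hypothetical test signal: one second of a 440 Hz tone, 8 kHz, 16-bit PCM.
RATE = 8000
samples = [round(10_000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: DEFLATE via zlib (a general-purpose compressor, not an audio codec).
packed = zlib.compress(pcm, 9)
assert zlib.decompress(packed) == pcm      # bit-for-bit identical after decompression

# Lossy (toy): keep only the top 8 bits of each 16-bit sample.
coarse = [s >> 8 for s in samples]          # each value now fits in one signed byte
lossy = struct.pack(f"<{len(coarse)}b", *coarse)  # half the size of the 16-bit data
restored = [c << 8 for c in coarse]         # back on the 16-bit scale; low bits are gone
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print(len(pcm), len(packed), len(lossy))   # original, lossless and lossy sizes in bytes
print(max_error)                           # at most 255: the discarded low-order detail
```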
AUDIO FILE FORMATS
• RIFF (Resource Interchange File Format), the container used by Microsoft WAV [.wav] and AVI [.avi]
• MPEG Audio Layer (MPEG) [.mpa, .mp3]
• AIFF/AIFC (Apple, SGI) [.aiff, .aif, .aifc]
• HCOM (Mac) [.hcom]
• SND (Sun, NeXT) [.snd]
• VOC (Sound Blaster card proprietary standard) [.voc]
• And many others!

WHAT’S IN A SOUND FILE
• Header information
  • Magic cookie (a fixed byte sequence that identifies the format)
  • Sampling rate
  • Bits per sample
  • Channels
  • Byte order (endianness)
  • Compression type
• Data

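For a concrete case, the canonical PCM WAV (RIFF) header stores most of these fields in 44 bytes. A sketch, assuming the simplest layout (a 16-byte `fmt ` chunk followed directly by `data`); it writes a file with the standard-library `wave` module and then reads the header back with `struct`. The helper names are hypothetical:

```python
import io
import struct
import wave

def make_wav(frames: bytes, rate: int = 8000, channels: int = 1, sampwidth: int = 2) -> bytes:
    """Write raw PCM frames into an in-memory WAV file using the standard-library wave module."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)   # bytes per sample
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()

def parse_wav_header(data: bytes) -> dict:
    """Decode a canonical 44-byte PCM WAV header ('fmt ' chunk followed directly by 'data')."""
    # RIFF is little-endian, hence the '<' in every format string.
    riff, _riff_size, wave_id = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or wave_id != b"WAVE":          # the "magic cookie"
        raise ValueError("not a RIFF/WAVE file")
    (fmt_id, _fmt_size, audio_format, channels, rate,
     _byte_rate, _block_align, bits) = struct.unpack_from("<4sIHHIIHH", data, 12)
    data_id, data_size = struct.unpack_from("<4sI", data, 36)
    if fmt_id != b"fmt " or data_id != b"data":
        raise ValueError("not a canonical 44-byte header")
    return {"audio_format": audio_format,   # 1 = uncompressed PCM (the "compression type")
            "channels": channels, "sample_rate": rate,
            "bits_per_sample": bits, "data_bytes": data_size}

wav = make_wav(struct.pack("<4h", 0, 1000, -1000, 0))
print(len(wav))              # 52: 44-byte header + 8 bytes of samples
print(parse_wav_header(wav))
```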
AUDIO PROCESSING
• Audio signal processing, sometimes referred to simply as audio processing, is the intentional alteration of auditory signals, or sound, often through an audio effect or effects unit.
• Because audio signals may be represented electronically in either digital or analog format, signal processing may occur in either domain.
• Analog processors operate directly on the electrical signal, while digital processors operate mathematically on the digital representation of that signal.
• Processing methods and application areas include storage, level compression, data compression and transmission, among others.

AUDIO PROCESSING TECHNIQUES
• Equalization
• Modulation
• Delay
• Chorus
• Flanger
• Phaser
• Pitch shifting
• Time stretching
• Active noise control

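Delay is the simplest of these effects to write down as arithmetic on samples. A minimal sketch of a feed-forward echo, y[n] = x[n] + g·x[n - D]; the function name and parameters are illustrative, not taken from any particular effects library:

```python
def feedforward_delay(x, delay_samples, gain):
    """Echo: y[n] = x[n] + gain * x[n - delay_samples], treating x as silent before it starts."""
    y = []
    for n, sample in enumerate(x):
        delayed = x[n - delay_samples] if n >= delay_samples else 0.0
        y.append(sample + gain * delayed)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(feedforward_delay(impulse, 2, 0.5))  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

Feeding in an impulse shows the echo directly: the original click, then a half-amplitude copy two samples later. Chorus and flanger effects build on this structure with a short delay that is varied over time, so treat this as the static building block only.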
MUSIC RECOGNITION

AUDIO FINGERPRINTING

AUDIO FINGERPRINT DEFINITION
An audio fingerprint is the output of a hash-like function that maps an audio object consisting of a large number of bits to a ‘fingerprint’ of only a limited number of bits. The audio object can then be identified from this bit string.
[Figure: a function F maps a 5 MB audio file to a 100 KB fingerprint]

AUDIO FINGERPRINTING ARCHITECTURE
[Figure]

CODEC LAYER
i) Samples (unsigned char* samples): a buffer of the actual data samples (2 bytes, or 16 bits, per sample).
ii) Byte order (int byteOrder): the byte order of the samples; either CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN.
iii) Number of samples (long size): the number of samples read.
iv) Sample rate (int sRate): the number of samples per second of audio.
v) Stereo (bool stereo): a Boolean value indicating whether the audio is stereo.
vi) Duration: the duration of the original audio, regardless of the number of samples read.
vii) Format: the format of the original audio, expressed as a file extension such as .mp3 or .wav.

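A minimal Python mirror of the codec-layer fields, showing how the byte-order field decides how the raw buffer is read. The class name `CodecOutput`, the `decode` helper, the field names `duration` and `format`, and the numeric values of `CONST_LITTLE_ENDIAN`/`CONST_BIG_ENDIAN` are hypothetical; the slide gives only the field descriptions. It also assumes `size` counts individual 16-bit values (for stereo, both channels interleaved):

```python
import struct
from dataclasses import dataclass

# Hypothetical values: the slide names these constants but does not define them.
CONST_LITTLE_ENDIAN = 0
CONST_BIG_ENDIAN = 1

@dataclass
class CodecOutput:
    samples: bytes    # raw 16-bit sample data (unsigned char* samples)
    byteOrder: int    # CONST_LITTLE_ENDIAN or CONST_BIG_ENDIAN
    size: int         # number of 16-bit samples in the buffer (assumed meaning)
    sRate: int        # samples per second
    stereo: bool      # True if two interleaved channels
    duration: float   # seconds, of the original audio (hypothetical name)
    format: str       # original file extension, e.g. ".mp3" (hypothetical name)

    def decode(self) -> list[int]:
        """Interpret the buffer as signed 16-bit integers in the declared byte order."""
        prefix = "<" if self.byteOrder == CONST_LITTLE_ENDIAN else ">"
        return list(struct.unpack(f"{prefix}{self.size}h", self.samples[: 2 * self.size]))

raw = bytes([0x01, 0x00, 0xFF, 0xFF])
print(CodecOutput(raw, CONST_LITTLE_ENDIAN, 2, 44100, False, 0.0, ".wav").decode())  # [1, -1]
print(CodecOutput(raw, CONST_BIG_ENDIAN, 2, 44100, False, 0.0, ".wav").decode())     # [256, -1]
```

The same four bytes decode to different values depending on byte order, which is why the codec layer has to carry that field alongside the samples.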
FINGERPRINTING LAYER
[Figure: WAV file (5 MB) → fingerprint fea690b1-b11dce98-a… (100 KB)]
The fingerprinting layer carries out the core mathematical analysis of the audio, converting a 5 MB audio file into a 100 KB fingerprint (bit string).

PROTOCOL LAYER
HTTP POST request from the client to the database:

    POST /path/script.cgi HTTP/1.0
    From: XYZ@abc.com
    User-Agent: HTTPTool/1.0
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 32

    client_id=42&fingerprint=fea690b1b11dce98a…

XML response from the database:

    <?xml version="1.0" encoding="UTF-8"?>
    <metadata fp="fea690b1b11dce98a…" id="42">
      <album>The Wall</album>
      <song>Comfortably Numb</song>
      <artist>Pink Floyd</artist>
    </metadata>

Output of the XML parser:

    Album: The Wall
    Song: Comfortably Numb
    Artist: Pink Floyd

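The request body and the XML reply can be produced and consumed with Python's standard library. This sketch only builds the form-encoded body and parses a reply of the shape shown; it does not send anything, and the fingerprint value `deadbeef` is a made-up placeholder, as are the helper names:

```python
import urllib.parse
import xml.etree.ElementTree as ET

def build_post_body(client_id: int, fingerprint: str) -> str:
    """Form-encode the request body (application/x-www-form-urlencoded)."""
    return urllib.parse.urlencode({"client_id": client_id, "fingerprint": fingerprint})

def parse_metadata(xml_text: str) -> dict:
    """Extract album, song and artist from a <metadata> reply."""
    root = ET.fromstring(xml_text.encode("utf-8"))  # bytes, matching the UTF-8 declaration
    return {tag: root.findtext(tag) for tag in ("album", "song", "artist")}

body = build_post_body(42, "deadbeef")       # hypothetical fingerprint value
print(body)                                  # client_id=42&fingerprint=deadbeef
print(len(body.encode("ascii")))             # value for the Content-Length header

reply = """<?xml version="1.0" encoding="UTF-8"?>
<metadata fp="deadbeef" id="42">
  <album>The Wall</album>
  <song>Comfortably Numb</song>
  <artist>Pink Floyd</artist>
</metadata>"""
print(parse_metadata(reply))
```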
HOW SHAZAM WORKS
1. Beforehand, Shazam fingerprints a comprehensive catalog of music and stores the fingerprints in a database.
2. A user “tags” a song they hear, which fingerprints a 10-second sample of audio.
3. The Shazam app uploads the fingerprint to Shazam’s service, which searches its database for a matching fingerprint.
4. If a match is found, the song information is returned to the user; otherwise, an error is returned.

SPECTROGRAM FINGERPRINTING
• Any piece of music can be thought of as a time-frequency graph called a spectrogram.
• One axis is time, another is frequency, and the third is intensity.
• Each point on the graph represents the intensity of a given frequency at a specific point in time.
• Assuming time is on the x-axis and frequency is on the y-axis, a horizontal line represents a continuous pure tone, and a vertical line represents an instantaneous burst of white noise.

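To make the time-frequency picture concrete, here is a minimal magnitude spectrogram built from a naive DFT over overlapping frames (rectangular window, no FFT, so it is only suitable for tiny inputs). The function name and frame parameters are illustrative assumptions:

```python
import cmath
import math

def spectrogram(x, frame_len, hop):
    """Magnitude spectrogram: spec[t][k] = |DFT of frame t at bin k|, for k < frame_len // 2.

    Bin k corresponds to k * sample_rate / frame_len Hz; frame t starts at sample t * hop.
    """
    spec = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]          # rectangular window (no tapering)
        row = []
        for k in range(frame_len // 2):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(coeff))
        spec.append(row)
    return spec

RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(512)]  # pure 1 kHz tone
spec = spectrogram(tone, frame_len=64, hop=32)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), set(peak_bins))   # 15 {8}
```

Every frame peaks in bin 8 (8 × 8000 / 64 = 1000 Hz), which is the horizontal line that a pure tone draws on a spectrogram.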
SPECTROGRAM FINGERPRINTING
• The Shazam algorithm fingerprints a song by generating this 3D graph and identifying frequencies of “peak intensity.”
• For each of these peak points, it keeps track of the frequency and the amount of time from the beginning of the track.

    Frequency (Hz)   Time (s)
    823.44           1.054
    1892.31          1.321
    712.84           1.703
    ...              ...
    819.71           9.943

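Given a spectrogram, a simplified peak picker keeps the strongest bin of each frame and converts bin and frame indices into the (frequency, time) pairs shown in the table. This is a sketch under an assumption: practical systems usually keep several local maxima per region rather than exactly one per frame, and the function name and threshold are hypothetical:

```python
def peak_constellation(spec, sample_rate, frame_len, hop, min_magnitude=0.0):
    """For each spectrogram frame, keep the loudest bin as a (frequency_hz, time_s) pair."""
    peaks = []
    for t, mags in enumerate(spec):
        k = max(range(len(mags)), key=mags.__getitem__)    # index of the loudest bin
        if mags[k] > min_magnitude:                         # skip near-silent frames
            peaks.append((k * sample_rate / frame_len, t * hop / sample_rate))
    return peaks

# Toy 3-frame spectrogram with 4 bins per frame (frame_len = 8), made up for illustration.
spec = [[0.1, 5.0, 0.2, 0.0],
        [0.0, 0.3, 0.1, 7.0],
        [0.2, 0.1, 0.0, 0.1]]
peaks = peak_constellation(spec, sample_rate=8000, frame_len=8, hop=4, min_magnitude=1.0)
print(peaks)  # [(1000.0, 0.0), (3000.0, 0.0005)]
```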
SPECTROGRAM FINGERPRINTING
• Shazam builds its fingerprint catalog as a hash table in which the key is the frequency.
• When Shazam receives a fingerprint like the one above, it takes the first key (in this case 823.44) and searches for all matching songs.

    Frequency (Hz)   Time (s), song information
    823.43           53.352, “Song A” by Artist 1
    823.44           34.678, “Song B” by Artist 2
    823.45           108.65, “Song C” by Artist 3
    ...              ...
    1892.31          34.945, “Song B” by Artist 2

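The lookup step, following the simplified description on this slide (keys are frequencies alone), can be sketched with a dictionary. Frequencies are rounded into buckets of `resolution_hz` so that nearby values such as 823.43, 823.44 and 823.45 share a key; the bucketing scheme and function names are assumptions, and the catalog reuses the example rows from the table:

```python
from collections import defaultdict

def build_index(catalog, resolution_hz=1.0):
    """Invert {song: [(freq_hz, time_s), ...]} into {freq_bucket: [(time_s, song), ...]}."""
    index = defaultdict(list)
    for song, peaks in catalog.items():
        for freq_hz, time_s in peaks:
            index[round(freq_hz / resolution_hz)].append((time_s, song))
    return index

def lookup(index, freq_hz, resolution_hz=1.0):
    """Return every (time_s, song) entry stored under the bucket that freq_hz falls into."""
    return index.get(round(freq_hz / resolution_hz), [])

catalog = {
    "Song A": [(823.43, 53.352)],
    "Song B": [(823.44, 34.678), (1892.31, 34.945)],
    "Song C": [(823.45, 108.65)],
}
index = build_index(catalog)
print(lookup(index, 823.44))  # [(53.352, 'Song A'), (34.678, 'Song B'), (108.65, 'Song C')]
```

Values falling just either side of a bucket boundary would land under different keys; a real system would need to handle that, for example by also probing neighbouring buckets.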
SPECTROGRAM FINGERPRINTING
• If a specific song is hit multiple times, Shazam then checks whether these frequencies also correspond in time.
• It creates a 2D plot of the frequency hits: one axis is the time at which the frequencies appear in the song (measured from the beginning of the track), and the other is the time at which they appear in the sample.
• If there is a temporal relation between the two sets of points, the points will align along a diagonal.
• Another signal-processing method is used to find this line; if it exists with sufficient certainty, the song is labeled a match.

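The slide does not name the method used to find the diagonal. One simple way, sketched here as an assumption rather than as Shazam's documented method, relies on the fact that points on a slope-1 diagonal share the same offset (song time minus sample time), so counting offsets in a histogram reveals the line. The function name and bin width are hypothetical:

```python
from collections import Counter

def best_offset(hits, bin_s=0.1):
    """hits: (sample_time_s, song_time_s) pairs for one candidate song.

    Returns (offset_s, votes) for the most common song_time - sample_time offset,
    counted in bins of bin_s seconds, or (None, 0) when there are no hits.
    """
    if not hits:
        return None, 0
    offsets = Counter(round((song_t - sample_t) / bin_s) for sample_t, song_t in hits)
    offset_bin, votes = offsets.most_common(1)[0]
    return offset_bin * bin_s, votes

# Hypothetical hits: three lie on the diagonal "song time = sample time + 30 s",
# one is a chance collision elsewhere in the song.
hits = [(1.0, 31.0), (1.5, 31.5), (2.2, 32.2), (3.0, 12.0)]
offset, votes = best_offset(hits)
print(round(offset, 3), votes)   # 30.0 3
```

The number of votes for the winning offset can then be compared with a threshold (not specified on the slide) to decide whether the song is a match.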