  1. Digital Music & Music Processing. George Tzanetakis, Postdoctoral Fellow, Computer Science Department, Carnegie Mellon University. [email_address] http://www.cs.cmu.edu/~gtzan
  2. Overview
       • Music Information Retrieval (MIR) and Computer Audition
           • Motivation
           • Techniques
           • Applications
       • Computer Music and Sound Synthesis
           • Examples, demos
  3. MIR Music History (timeline: 9000 B.C., 1000, 1700, 1877, 1960, 2002)
  4. Music
       • 4 million recorded CD tracks
       • 4000 CDs / month
       • Mp3 bandwidth %
       • Global
       • Pervasive
       • Persistent
       • Why?
  5. The future of MIR
       • Library of all recorded music
       • Tasks: organize, search, retrieve, classify, recommend, browse, listen, annotate
       • Examples:
  6. Audio MIR Pipeline (diagram labels: Signal Processing, Machine Learning, Human Computer Interaction; Hearing, Representation; Understanding, Analysis; Reacting, Interaction)
  7. Traditional Music Representations
  8. Time domain waveform (axes: time, pressure). Decompose into building blocks (axes: time, frequency).
  9. MIDI
       • Musical Instrument Digital Interface
           • Hardware interface
           • File format
       • Note events
           • Duration, discrete pitch, “instrument”
       • Extensions
           • General MIDI
           • Notation, OMR, continuous pitch
  10. Symbolic vs Audio MIR (diagram labels: Audio, Polyphonic Transcription, Symbolic Representation (MIDI), MIR; Audio, Computer Audition, MIR; Machine Learning Models)
  11. Feature extraction
  12. Timbral Texture
       • Timbre = what differentiates sounds of the same loudness and pitch
       • Timbral texture = what differentiates mixtures of sounds
       • Global, statistical and fuzzy properties
  13. Spectrum (figure labels: t, t+1, M)
  14. Fourier Transform: P = 1/f (period P, frequency f)
  15. Short Time Fourier Transform (STFT): filterbank interpretation (figure labels: Filters, Oscillators, Amplitude, Frequency, output)
  16. Short Time Fourier Transform II (figure labels: t, t+1, M)
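The short time Fourier transform named on slides 15 and 16 can be sketched as follows: the signal is cut into overlapping frames, each frame is windowed, and a Fourier transform is taken per frame, giving one magnitude spectrum per time step t. This is only a minimal illustration, not the code behind the slides. The Hann window, the frame size of 64, the hop of 32, and all function names are assumptions made for the example, and the naive DFT is used only to keep it dependency-free.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..n/2, computed with a naive O(n^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        mags.append(abs(acc))
    return mags


def stft_magnitudes(signal, frame_size=64, hop=32):
    """One magnitude spectrum per windowed frame; frames overlap by frame_size - hop."""
    window = hann(frame_size)
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        spectra.append(dft_magnitudes(frame))
    return spectra


# Hypothetical test signal: a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
spectra = stft_magnitudes(tone)

# The strongest bin of the first frame, converted back to Hz.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spectra` corresponds to one time step, so consecutive rows play the role of the spectra at t and t+1 in the figures.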
  17. Formants (figure from “Real Time Synthesis for Interactive Applications”, P. Cook, A.K. Peters Press, used by permission)
  18. Linear Prediction Coefficients (figure labels: Impulses @ f0, White Noise, Source, Filter, Speech, Lossless tubes)
  19. MPEG Audio Coding (mp3): Perceptual Audio Coding
       • Analysis Filterbank: 32 linearly spaced bands
       • Psychoacoustic Model
       • Available bits
       • Encoder: slower, complicated
       • Decoder: faster, simpler
  20. Spectral Shape (figure labels: t, M): Centroid, Rolloff, Flux, RMS, Moments, …
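The spectral shape features listed on slide 20 can be illustrated with a small sketch computed from magnitude spectra M. The slide gives no formulas, so the definitions below are common textbook ones and are assumptions here: the centroid and rolloff are expressed in bin indices rather than Hz, the rolloff fraction of 0.85 is an assumed default, and flux is taken as the squared difference between successive normalized spectra.

```python
import math


def centroid(mags):
    """Magnitude-weighted mean bin index (the 'center of gravity' of the spectrum)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * m for k, m in enumerate(mags)) / total


def rolloff(mags, fraction=0.85):
    """Smallest bin index at which the running sum reaches `fraction` of the total."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k
    return len(mags) - 1


def flux(mags, prev_mags):
    """Squared difference between two successive spectra, each normalized to sum 1."""
    def normalize(v):
        s = sum(v)
        return [x / s for x in v] if s else list(v)

    a, b = normalize(mags), normalize(prev_mags)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rms(frame):
    """Root mean square of a time-domain frame (a rough loudness measure)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Two hypothetical magnitude spectra at times t and t+1.
m_prev = [0.0, 1.0, 2.0, 1.0, 0.0]
m_cur = [0.0, 0.0, 1.0, 2.0, 1.0]

# One feature vector per frame, as in "Spectral Shape to feature vector".
feature_vector = [centroid(m_cur), rolloff(m_cur), flux(m_cur, m_prev), rms([1.0, -1.0, 1.0, -1.0])]
```

Here the energy moves one bin upward between the two frames, so the centroid and rolloff both increase and the flux is non-zero.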
  21. Summary of Timbral Texture Features
       • Time-Frequency analysis
       • Signal Processing (STFT, DWT)
       • Source-filter (LPC)
       • Perceptual (MP3)
       • Spectral Shape to feature vector
  22. Pitch Content
       • Harmony-melody = pitch concepts
       • Music theory score = music
       • Bridge to symbolic MIR
       • Automatic music transcription
       • Non-transcriptive arguments
  23. Automatic Pitch Detection: P = 1/f
       • Approaches: time-domain, frequency-domain, perceptual
       • Zero crossings
       • Autocorrelation analysis = peaks of the function correspond to dominant pitches
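The autocorrelation idea on slide 23 can be sketched in a few lines: the lag at which the autocorrelation function peaks is taken as the period P in samples, and the pitch follows from f = 1/P. This is a minimal monophonic sketch, not the detector used in the talk. The search range of 50 to 1000 Hz and the function names are assumptions for the example.

```python
import math


def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_pitch(signal, sample_rate, min_hz=50.0, max_hz=1000.0):
    """Pitch in Hz from the strongest autocorrelation peak in the allowed lag range."""
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    ac = autocorrelation(signal, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: ac[lag])
    return sample_rate / best_lag  # f = 1 / P, with P = best_lag / sample_rate


# Hypothetical input: a 200 Hz sine sampled at 8000 Hz (period of 40 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
pitch_hz = estimate_pitch(tone, sample_rate)
```

For a mixture of notes, several peaks of the same function would be kept rather than only the largest one, which is what the slide means by peaks corresponding to dominant pitches.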
  24. Pitch Histograms (examples: Jazz, Irish)
       • Chroma = folded
       • Height = unfolded
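The folded and unfolded histograms on slide 24 can be illustrated as follows, assuming detected pitches are first mapped to MIDI note numbers. The unfolded histogram counts each note number (pitch height), while the folded one collapses all octaves onto 12 pitch classes (chroma). The conversion uses the standard A4 = 440 Hz = note 69 convention. The input frequencies and function names are made up for the example, and the slides may fold pitches differently.

```python
import math
from collections import Counter


def hz_to_midi(freq_hz):
    """Nearest MIDI note number, using A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def unfolded_histogram(midi_notes):
    """Counts per MIDI note number: keeps pitch height (octave) information."""
    return Counter(midi_notes)


def folded_histogram(midi_notes):
    """Counts per pitch class 0..11 (0 = C): octaves are folded together."""
    return Counter(note % 12 for note in midi_notes)


# Hypothetical detected pitches in Hz: C4, C5, A4, A5, E4.
pitches_hz = [261.63, 523.25, 440.0, 880.0, 329.63]
notes = [hz_to_midi(f) for f in pitches_hz]

unfolded = unfolded_histogram(notes)
folded = folded_histogram(notes)
```

In this example the two C notes and the two A notes land in separate bins of the unfolded histogram but share a bin in the folded one.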
  25. Automatic Music Transcription (audio examples: Original, Transcribed)
       • Mixture signal
       • Noise suppression
       • Predominant pitch estimation
       • Remove detected sound
       • Estimate # of voices
  26. Rhythm
       • Movement in time
       • Origins in poetry (iambic, trochaic)
       • Foot tapping definition
       • Hierarchical semi-periodic structure at multiple levels of detail
       • Links to motion, dance
       • Running vs global
  27. Self similarity (diagram labels: DWT, Autocorrelation, Peak Picking, Beat Histograms, Envelope Extraction)
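A simplified sketch of the self-similarity idea on slide 27: autocorrelate an amplitude envelope, pick the strongest peaks, and accumulate them into a histogram indexed by tempo in beats per minute. The DWT and envelope extraction stages are skipped here, and the example starts from an already extracted envelope, which is a substantial simplification. The ordering of the stages, the tempo range of 40 to 200 BPM, the number of peaks kept, and all names are assumptions.

```python
from collections import Counter


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def pick_peaks(values, start, count):
    """Indices (>= start) of the `count` largest local maxima."""
    peaks = [
        i
        for i in range(max(start, 1), len(values) - 1)
        if values[i] > values[i - 1] and values[i] >= values[i + 1]
    ]
    return sorted(peaks, key=lambda i: values[i], reverse=True)[:count]


def beat_histogram(envelope, envelope_rate, min_bpm=40, max_bpm=200, peaks_kept=3):
    """Histogram of tempo (BPM), weighted by autocorrelation peak strength."""
    min_lag = int(60 * envelope_rate / max_bpm)
    max_lag = int(60 * envelope_rate / min_bpm)
    ac = autocorrelation(envelope, max_lag)
    histogram = Counter()
    for lag in pick_peaks(ac, min_lag, peaks_kept):
        bpm = round(60 * envelope_rate / lag)  # lag in samples -> beats per minute
        histogram[bpm] += ac[lag]
    return histogram


# Hypothetical envelope sampled at 100 Hz with a pulse every 0.5 s (120 BPM).
envelope_rate = 100
envelope = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
histogram = beat_histogram(envelope, envelope_rate)
strongest_bpm = histogram.most_common(1)[0][0]
```

The histogram also shows a weaker peak at half the tempo, which reflects the hierarchical, semi-periodic structure of rhythm mentioned on slide 26.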
  28. Beat Histograms
  29. Analysis
       • Classification
       • Segmentation
       • Similarity Retrieval
       • Clustering
       • Thumbnailing
       • Fingerprinting
  30. Analysis Overview (figure labels: Musical Piece, Trajectory, Point)
  31. Query-by-example, content-based retrieval: ranked list of k nearest neighbors
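Query-by-example retrieval as described on slide 31 can be sketched as a nearest-neighbor search: every piece is represented by a feature vector (a point), and the answer to a query is the list of the k closest points, ranked by distance. The Euclidean distance, the two-dimensional feature vectors, and the track names below are assumptions made only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(query, collection, k=3):
    """Ranked list of (name, distance) for the k nearest neighbors of `query`."""
    scored = [(name, euclidean(query, vector)) for name, vector in collection.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical collection: track name -> feature vector.
collection = {
    "track_a": [0.1, 0.9],
    "track_b": [0.8, 0.2],
    "track_c": [0.2, 0.8],
    "track_d": [0.9, 0.1],
}

# The query is itself a feature vector extracted from an example recording.
matches = query_by_example([0.1, 0.85], collection, k=2)
ranked_names = [name for name, _ in matches]
```

A linear scan like this is enough for small collections. Larger ones would typically need an index structure, which is outside the scope of this sketch.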
  32. QBE examples (audio examples: Query, Match)
       • Rock: Beatles
       • Jazz: Bobby Hutcherson
       • Funk: Mano Negra
       • World: Tibetan singer
       • Computer Music: Paul Lansky
  33. Automatic Musical Genre Classification
       • Categorical music descriptions created by humans
           • Fuzzy boundaries
       • Statistical properties
           • Timbral texture, rhythmic structure, harmonic content
       • Automatic musical genre classification
           • Evaluate musical content features
           • Structure audio collections
  34. Genregram demo: dynamic real-time visualization for classification of radio signals
  35. Audio segmentation: detect changes of audio texture
  36. Multifeature automatic segmentation methodology
       • Time series of feature vectors v(t)
       • Detect abrupt changes in trajectory
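The two steps on slide 36 can be sketched as follows: treat the feature vectors v(t) as a trajectory, measure how far the trajectory moves between consecutive frames, and report a boundary wherever that step is unusually large. The slide does not say which distance or decision rule is used, so the Euclidean distance, the fixed threshold, and the example data are assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_boundaries(features, threshold):
    """Frame indices t where ||v(t) - v(t-1)|| exceeds the threshold."""
    return [
        t
        for t in range(1, len(features))
        if distance(features[t], features[t - 1]) > threshold
    ]


# Hypothetical trajectory: four frames of one texture, then three of another.
features = [
    [0.10, 0.20],
    [0.12, 0.21],
    [0.11, 0.19],
    [0.10, 0.20],
    [0.90, 0.80],
    [0.88, 0.81],
    [0.90, 0.79],
]

boundaries = segment_boundaries(features, threshold=0.5)
```

Small frame-to-frame jitter stays below the threshold, so only the change of texture at frame 4 is reported.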
  37. Context & Content Aware User Interfaces
       • Automatic results not perfect
       • Music listening is personal and subjective
       • Browsing vs retrieval
       • “Overview, zoom and filter, details on demand” (Shneiderman's mantra)
       • Adapt UI to music content and context
           • Computer Audition
           • Visualization
  38. Content and Context
       • Content ~ file
           • Genre, male voice, saxophone
       • Context ~ file, collection
           • Similarity
           • Slow-fast
       • Multiple visualizations
           • Same content
           • Different context
  39. Timbregrams
       • Content & Context
       • Similarity + Time Structure
       • Principal Component Analysis
       • Map feature vectors to color
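The mapping from feature vectors to color on slide 39 can be sketched under strong simplifications: project each feature vector onto the first principal component and scale the result to a gray level between 0 and 255, so that similar frames get similar shades and the sequence of shades shows time structure. The slide does not describe the actual color mapping, so the single-component grayscale output, the power-iteration PCA, and all names are assumptions.

```python
import math


def first_component_projection(vectors, iterations=50):
    """Project mean-centered vectors onto the first principal component.

    The component is found by power iteration on the covariance matrix,
    starting from an all-ones vector (this start fails if the data varies
    only in a direction orthogonal to it).
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [
        [sum(c[a] * c[b] for c in centered) / n for b in range(d)]
        for a in range(d)
    ]
    w = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        w = [x / norm for x in nxt]
    return [sum(c[j] * w[j] for j in range(d)) for c in centered]


def to_gray_levels(projections):
    """Min-max scale projections to integer gray levels 0..255."""
    lo, hi = min(projections), max(projections)
    if hi == lo:
        return [0 for _ in projections]
    return [round(255 * (p - lo) / (hi - lo)) for p in projections]


# Hypothetical feature vectors for four consecutive frames.
frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
gray = to_gray_levels(first_component_projection(frames))
```

Using three components instead of one would give three channels per frame, which is one plausible route from this sketch to a color strip.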
  40. Timbrespaces
  41. Islands of Music
  42. Auditory Scene Analysis