Presentation Transcript

  • Digital Music & Music Processing
    • George Tzanetakis
    • Postdoctoral Fellow, Computer Science Department, Carnegie Mellon University
    • [email_address]
  • Overview
    • Music Information Retrieval (MIR) and Computer Audition
      • Motivation
      • Techniques
      • Applications
    • Computer Music and Sound Synthesis
      • Examples, demos
  • MIR Music History (timeline: 9000 B.C., 1000, 1700, 1877, 1960, 2002)
  • Music
    • 4 million recorded CD tracks
    • 4000 CDs / month
    • Mp3 bandwidth %
    • Global
    • Pervasive
    • Persistent
    • Why ?
  • The future of MIR
    • Library of all recorded music
    • Tasks: organize, search, retrieve, classify, recommend, browse, listen, annotate
    • Examples:
  • Audio MIR Pipeline
    • Hearing: Representation (Signal Processing)
    • Understanding: Analysis (Machine Learning)
    • Reacting: Interaction (Human Computer Interaction)
  • Traditional Music Representations
  • Time domain waveform
    • Axes: time, pressure
    • Decompose into building blocks (axes: time, frequency)
  • MIDI
    • Musical Instrument Digital Interface
      • Hardware interface
      • File format
    • Note events
      • Duration, discrete pitch, “instrument”
    • Extensions
      • General MIDI
      • Notation, OMR, continuous pitch
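The note-event view of MIDI described above can be sketched as a small record type. This is an illustration only, not part of the MIDI specification: the `NoteEvent` class and its field names are hypothetical, and the pitch-to-frequency mapping assumes twelve-tone equal temperament with note number 69 (A4) tuned to 440 Hz.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """Hypothetical record for one note event (names are illustrative)."""
    onset: float      # start time in seconds
    duration: float   # length in seconds
    pitch: int        # discrete pitch as a MIDI note number (0-127)
    program: int      # General MIDI program number, i.e. the "instrument"


def midi_to_hz(pitch: int, a4: float = 440.0) -> float:
    """Convert a MIDI note number to frequency, assuming equal temperament."""
    return a4 * 2.0 ** ((pitch - 69) / 12.0)


middle_c = NoteEvent(onset=0.0, duration=0.5, pitch=60, program=0)
middle_c_hz = midi_to_hz(middle_c.pitch)  # roughly 261.63 Hz
```

Because the pitch is discrete, continuous pitch (glides, vibrato) is not captured by this record, which is why the slide lists it among the extensions.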
  • Symbolic vs Audio MIR
    • Symbolic: Audio, Polyphonic Transcription, Symbolic Representation (MIDI), MIR
    • Audio: Audio, Computer Audition, Machine Learning Models, MIR
  • Feature extraction
  • Timbral Texture
    • Timbre = differentiate sounds of same loudness, pitch
    • Timbral Texture = differentiate mixtures of sounds
    • Global, statistical and fuzzy properties
  • Spectrum (figure: spectra M at frames t and t+1)
  • Fourier Transform
    • P = 1/f (period P, frequency f)
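As a rough illustration of the relation P = 1/f, the sketch below finds the dominant frequency of a sampled sine with a naive discrete Fourier transform and inverts it to get the period. The function names, the 800 Hz sample rate and the 100 Hz test tone are assumptions chosen for the example; a real system would use an FFT library rather than this O(N^2) loop.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 of a real signal (naive O(N^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(x)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(x)


sample_rate = 800.0  # assumed for the example
signal = [math.sin(2 * math.pi * 100.0 * t / sample_rate) for t in range(64)]

f = dominant_frequency(signal, sample_rate)
period = 1.0 / f  # P = 1/f, in seconds
```

The tone was chosen to fall exactly on a DFT bin; for other frequencies the estimate is only as fine as the bin spacing, sample_rate / N.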
  • Short Time Fourier Transform (STFT)
    • Filterbank interpretation
    • Filters, Oscillators
    • Output: Amplitude, Frequency
  • Short Time Fourier Transform II (figure: spectra M at frames t and t+1)
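A minimal sketch of the short-time idea: slide a window along the signal and take one magnitude spectrum per frame, so that the spectrum at frame t can differ from the one at t+1. The Hann window, the frame size of 64 and the hop of 32 are assumptions made for the example, and the naive DFT stands in for an FFT.

```python
import cmath
import math


def stft_magnitudes(x, frame_size, hop):
    """One magnitude spectrum (bins 0..frame_size/2) per windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]  # periodic Hann window
    frames = []
    for start in range(0, len(x) - frame_size + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


sample_rate = 800.0  # assumed for the example
# A 100 Hz tone followed by a 200 Hz tone: the spectrum changes over time.
signal = ([math.sin(2 * math.pi * 100.0 * t / sample_rate) for t in range(256)]
          + [math.sin(2 * math.pi * 200.0 * t / sample_rate) for t in range(256)])

spectrogram = stft_magnitudes(signal, frame_size=64, hop=32)
peak_bins = [max(range(len(m)), key=lambda k: m[k]) for m in spectrogram]
peak_hz = [b * sample_rate / 64 for b in peak_bins]
```

Early frames peak near 100 Hz and late frames near 200 Hz, which a single Fourier transform of the whole signal could not localize in time.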
  • Formants
    • From “Real Sound Synthesis for Interactive Applications”, P. Cook, A K Peters, used by permission
  • Linear Prediction Coefficients
    • Source: Impulses @ f0, White Noise
    • Filter: Lossless tubes
    • Output: Speech
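One common way to estimate linear prediction coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The slide does not say which method was used, so the sketch below is an assumption; the function names are illustrative, and the test signal is a decaying exponential whose ideal first-order predictor coefficient is 0.9.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a finite signal."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion.

    Returns (coeffs, error) where x[n] is predicted as
    coeffs[0]*x[n-1] + coeffs[1]*x[n-2] + ... and error is the
    remaining prediction-error energy.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[j] multiplies x[n - j]; a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Impulse response of the one-pole filter y[n] = 0.9 * y[n-1] + impulse.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(signal, order=2)
```

In source-filter terms, the recovered coefficients describe the filter, and the residual is what the source (here a single impulse) must supply.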
  • MPEG Audio Coding (mp3)
    • Perceptual Audio Coding
    • Analysis Filterbank: 32 linearly spaced bands
    • Psychoacoustic Model
    • Available bits
    • Encoder: Slower, Complicated
    • Decoder: Faster, Simpler
  • Spectral Shape
    • Centroid, Rolloff, Flux, RMS, Moments, …
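The spectral shape descriptors named above can be sketched as follows. The slide gives only their names, so the exact definitions here are assumptions based on common usage: centroid as the magnitude-weighted mean bin, rolloff as the bin below which 85% of the magnitude lies, flux as the squared difference between successive normalized spectra, and RMS computed on the time-domain frame.

```python
import math


def centroid(mags):
    """Magnitude-weighted mean bin index (the 'center of gravity')."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0


def rolloff(mags, fraction=0.85):
    """Lowest bin at which the cumulative magnitude reaches `fraction`."""
    threshold = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k
    return len(mags) - 1


def flux(mags, previous):
    """Squared difference between two sum-normalized magnitude spectra."""
    def normalize(v):
        s = sum(v)
        return [m / s for m in v] if s else list(v)
    return sum((p - q) ** 2
               for p, q in zip(normalize(mags), normalize(previous)))


def rms(frame):
    """Root-mean-square level of a time-domain frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


flat = [1.0, 1.0, 1.0, 1.0]
feature_vector = [centroid(flat), rolloff(flat), flux(flat, flat),
                  rms([1.0, -1.0, 1.0, -1.0])]
```

Collecting these numbers per frame gives the kind of feature vector the later analysis slides operate on.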
  • Summary of Timbral Texture Features
    • Time-Frequency analysis
    • Signal Processing (STFT, DWT)
    • Source-filter (LPC)
    • Perceptual (MP3)
    • Spectral Shape to feature vector
  • Pitch Content
    • Harmony-melody = pitch concepts
    • Music theory score = music
    • Bridge to symbolic MIR
    • Automatic music transcription
    • Non-transcriptive arguments
  • Automatic Pitch Detection
    • P=1/f
    • Approaches: Time-domain, Frequency-domain, Perceptual
    • Zerocrossings
    • Autocorrelation analysis = peaks of function correspond to dominant pitches
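The autocorrelation idea can be sketched for a single monophonic frame: the lag with the strongest autocorrelation within a plausible range is taken as the period P, and the pitch is the sample rate divided by that lag. The function name, the 50 to 500 Hz search range and the test tone are assumptions for the example, and real detectors add normalization and peak interpolation that are omitted here.

```python
import math


def autocorrelation_pitch(x, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch in Hz from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def r(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return sample_rate / best_lag  # f = 1/P with P = best_lag / sample_rate


sample_rate = 8000.0  # assumed for the example
tone = [math.sin(2 * math.pi * 100.0 * n / sample_rate) for n in range(800)]
pitch = autocorrelation_pitch(tone, sample_rate)
```

For mixtures of sounds the function has several strong peaks, which is the sense in which peaks correspond to dominant pitches.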
  • Pitch Histograms
    • Examples: Jazz, Irish
    • Chroma - folded
    • Height - unfolded
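The folded and unfolded histograms can be illustrated with a short sketch. It assumes detected pitches arrive as frequencies in Hz, maps them to MIDI note numbers under equal temperament with A4 = 440 Hz, and folds by taking the note number modulo 12. The function names and the input frequencies are made up for the example.

```python
import math
from collections import Counter


def hz_to_midi(freq, a4=440.0):
    """Nearest MIDI note number, assuming equal temperament."""
    return int(round(69 + 12 * math.log2(freq / a4)))


def pitch_histograms(freqs_hz):
    """Return (unfolded, folded) histograms of detected pitches."""
    notes = [hz_to_midi(f) for f in freqs_hz]
    unfolded = Counter(notes)                # height: octave kept
    folded = Counter(n % 12 for n in notes)  # chroma: octave discarded
    return unfolded, folded


# Hypothetical detections: C4, C5, G4, A4.
unfolded, folded = pitch_histograms([261.63, 523.25, 392.0, 440.0])
```

Here C4 and C5 occupy separate bins in the unfolded histogram but share chroma bin 0 in the folded one.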
  • Automatic Music Transcription
    • Audio examples: Original, Transcribed
    • Mixture signal
    • Noise suppression
    • Predominant Pitch Estimation
    • Remove detected sound
    • Estimate # of voices
  • Rhythm
    • Movement in time
    • Origins in Poetry (iambic, trochaic)
    • Foot tapping definition
    • Hierarchical semi-periodic structure at multiple levels of detail
    • Links to motion, dance
    • Running vs global
  • Self similarity
    • DWT
    • Autocorrelation
    • Peak Picking
    • Beat Histograms
    • Envelope Extraction
  • Beat Histograms
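A simplified sketch of how a beat histogram can be built from an amplitude envelope: autocorrelate the envelope, pick the strongest local peaks, convert each lag to beats per minute, and add the peak strength to that bin. It skips the DWT and envelope-extraction stages and starts from an assumed, synthetic envelope; the function name, parameters and the 40 to 200 BPM range are illustrative, and only one analysis window is processed.

```python
from collections import defaultdict


def beat_histogram(envelope, envelope_rate, min_bpm=40, max_bpm=200,
                   top_peaks=3):
    """Map the strongest autocorrelation peaks of an envelope to BPM bins."""
    min_lag = int(60.0 * envelope_rate / max_bpm)
    max_lag = int(60.0 * envelope_rate / min_bpm)
    ac = {lag: sum(envelope[n] * envelope[n + lag]
                   for n in range(len(envelope) - lag))
          for lag in range(min_lag, max_lag + 1)}
    # Local maxima strictly inside the search range.
    peaks = [lag for lag in range(min_lag + 1, max_lag)
             if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]]
    peaks.sort(key=lambda lag: ac[lag], reverse=True)
    histogram = defaultdict(float)
    for lag in peaks[:top_peaks]:
        histogram[round(60.0 * envelope_rate / lag)] += ac[lag]
    return dict(histogram)


envelope_rate = 100.0  # envelope samples per second (assumed)
# Synthetic envelope: one pulse every 0.5 s for 10 s, i.e. 120 BPM.
envelope = [1.0 if n % 50 == 0 else 0.0 for n in range(1000)]

histogram = beat_histogram(envelope, envelope_rate)
main_tempo = max(histogram, key=histogram.get)
```

The secondary bin at half the main tempo reflects the hierarchical, multi-level periodicity mentioned on the Rhythm slide.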
  • Analysis
    • Classification
    • Segmentation
    • Similarity Retrieval
    • Clustering
    • Thumbnailing
    • Fingerprinting
  • Analysis Overview
    • Musical Piece
    • Trajectory
    • Point
  • Query-by-example Content-based Retrieval
    • Ranked list of k nearest neighbors
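Query-by-example retrieval can be sketched as a nearest-neighbor search over feature vectors. The Euclidean distance, the function name and the tiny hypothetical collection below are assumptions for illustration; the slide does not specify the distance measure or the features.

```python
import math


def k_nearest(query, collection, k=3):
    """Names of the k items whose feature vectors are closest to `query`."""
    ranked = sorted(collection.items(),
                    key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical collection: track name -> feature vector.
collection = {
    "track_a": [0.0, 0.0],
    "track_b": [1.0, 1.0],
    "track_c": [5.0, 5.0],
    "track_d": [0.2, 0.1],
}

matches = k_nearest([0.1, 0.1], collection, k=2)
```

In practice the features are usually normalized first so that no single dimension dominates the distance.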
  • QBE examples (Query, Match)
    • Rock: Beatles
    • Jazz: Bobby Hutcherson
    • Funk: Mano Negra
    • World: Tibetan singer
    • Computer Music: Paul Lansky
  • Automatic Musical Genre Classification
    • Categorical music descriptions created by humans
      • Fuzzy boundaries
    • Statistical properties
      • Timbral texture, rhythmic structure, harmonic content
    • Automatic musical genre classification
      • Evaluate musical content features
      • Structure audio collections
  • Genregram demo
    • Dynamic real-time visualization for classification of radio signals
  • Audio segmentation
    • Detect changes of audio texture
  • Multifeature automatic segmentation methodology
    • Time series of feature vector v(t)
    • Detect abrupt changes in trajectory
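The two steps above can be sketched as: measure how far the feature vector v(t) moves between successive frames, and report the frames where that jump exceeds a threshold. The Euclidean distance, the fixed threshold and the function name are simplifying assumptions; the slide does not give the distance measure or the peak-picking rule.

```python
import math


def segment_boundaries(features, threshold):
    """Frame indices t where v(t) is far from v(t-1)."""
    jumps = [math.dist(features[t], features[t - 1])
             for t in range(1, len(features))]
    return [t for t, d in enumerate(jumps, start=1) if d > threshold]


# Synthetic trajectory: five frames of one texture, then five of another.
features = [[0.0, 0.0]] * 5 + [[5.0, 5.0]] * 5
boundaries = segment_boundaries(features, threshold=1.0)
```

Using several features at once means a boundary is found whenever the overall texture shifts, even if no single feature changes much.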
  • Context & Content Aware User Interfaces
    • Automatic results not perfect
    • Music listening is personal and subjective
    • Browsing vs retrieval
    • “Overview, zoom and filter, details on demand” (Shneiderman's mantra)
    • Adapt UI to music content and context
      • Computer Audition
      • Visualization
  • Content and Context
    • Content ~ file
      • Genre, male voice, saxophone
    • Context ~ file, collection
      • Similarity
      • Slow-fast
    • Multiple visualizations
      • Same content
      • Different context
  • Timbregrams
    • Content & Context
    • Similarity + Time Structure
    • Principal Component Analysis
    • Map feature vectors to color
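A heavily simplified sketch of the mapping from feature vectors to color: project each frame's feature vector onto the first principal component and scale the result to a 0 to 255 shade. Using a single component, a grayscale value, and power iteration to find the component are all assumptions made to keep the example short; the function names are illustrative.

```python
def first_principal_component(vectors, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration.

    Note: the all-ones starting vector fails if it happens to be
    orthogonal to the principal axis; a real implementation would use
    a linear-algebra library instead.
    """
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    axis = [1.0] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim))
               for i in range(dim)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    return axis, centered


def timbregram_shades(vectors):
    """One 0-255 shade per frame; similar frames get similar shades."""
    axis, centered = first_principal_component(vectors)
    scores = [sum(a * c for a, c in zip(axis, row)) for row in centered]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0
    return [round(255 * (s - low) / span) for s in scores]


frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
shades = timbregram_shades(frames)
```

Drawing the shades left to right as stripes shows time structure within a file, and computing the projection over a whole collection makes stripes comparable across files.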
  • Timbrespaces
  • Islands of Music
  • Auditory Scene Analysis