• Like
  • Save
Upcoming SlideShare
Loading in...5







Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    Download Download Presentation Transcript

    • Digital Music & Music Processing George Tzanetakis PostDoctoral Fellow Computer Science Department Carnegie Mellon University [email_address] http://www.cs.cmu.edu/~gtzan
    • Overview
      • Music Information Retrieval (MIR) and Computer Audition
        • Motivation
        • Techniques
        • Applications
      • Computer Music and Sound Synthesis
        • Examples, demos
    • MIR Music History 9000 B.C 1000 1700 1877 1960 2002
    • Music
      • 4 million recorded CD tracks
      • 4000 CDs / month
      • Mp3 bandwidth %
      • Global
      • Pervasive
      • Persistent
      • Why ?
    • The future of MIR
      • Library of all recorded music
      • Tasks: organize, search, retrieve, classify, recommend, browse, listen, annotate
      • Examples:
    • Audio MIR Pipeline Signal Processing Machine Learning Human Computer Interaction Hearing Representation Understanding Analysis Reacting Interaction
    • Traditional Music Representations
    • Time domain waveform time pressure Decompose into building blocks time frequency
    • MIDI
      • Musical Instrument Digital Interface
        • Hardware interface
        • File format
      • Note events
        • Duration, discrete pitch, “instrument”
      • Extensions
        • General MIDI
        • Notation, OMR, continuous pitch
    • Symbolic vs Audio MIR Audio Polyphonic Transcription Symbolic Representation (MIDI) MIR Audio Computer Audition MIR Machine Learning Models
    • Feature extraction
    • Timbral Texture Timbre = differentiate sounds of same loudness, pitch Timbral Texture = differentiate mixtures of sounds Global, statistical and fuzzy properties
    • Spectrum t t+1 M M
    • Fourier Transform P=1/f
    • Short Time Fourier Transform STFT Filterbank interpretation Filters Oscillators Amplitude Frequency output
    • Short Time Fourier Transform II t t+1 M M
    • Formants From “Real Time Synthesis for Interactive Applications” P.Cook, A.K Peters Press, used by permission
    • Linear Prediction Coefficients Impulses @ f0 White Noise Source Filter Speech Lossless tubes
    • MPEG Audio Coding (mp3) Analysis Filterbank Psychoacoustic Model Available bits 32 linearly spaced bands Encoder: Slower, Complicated Decoder: Faster, Simpler Perceptual Audio Coding
    • Spectral Shape t M Centroid Rolloff Flux RMS Moments …
    • Summary of Timbral Texture Features
      • Time-Frequency analysis
      • Signal Processing (STFT, DWT)
      • Source-filter (LPC)
      • Perceptual (MP3)
      • Spectral Shape to feature vector
    • Pitch Content
      • Harmony-melody = pitch concepts
      • Music theory score = music
      • Bridge to symbolic MIR
      • Automatic music transcription
      • Non-transcriptive arguments
    • Automatic Pitch Detection P=1/f Time-domain Frequency-domain Perceptual Zerocrossings Autocorrelation analysis = peaks of function correspond to dominant pitches
    • Pitch Histograms Jazz Irish Chroma - folded Height - unfolded
    • Automatic Music Transcription Original Transcribed Mixture signal Noise suppresion Predominant Pitch Estimation Remove detected sound Estimate # of voices
    • Rhythm
      • Movement in time
      • Origins in Poetry (iambic, trochaic)
      • Foot tapping definition
      • Hierarchical semi-periodic structure at multiple levels of detail
      • Links to motion, dance
      • Running vs global
    • Self similarity DWT Autocorrelation Peak Picking Beat Histograms Envelope Extraction
    • Beat Histograms
    • Analysis
      • Classification
      • Segmentation
      • Similarity Retrieval
      • Clustering
      • Thumbnailing
      • Fingerprinting
    • Analysis Overview Musical Piece Trajectory Point
    • Query-by-example Content-based Retrieval Ranked list of k nearest neighbors
    • QBE examples Rock: Beatles Jazz: Bobby Hutserson Funk: Mano negra World: Tibetan singer Computer Music: Paul Lansky Query Match
    • Automatic Musical Genre Classification
      • Categorical music descriptions created by humans
        • Fuzzy boundaries
      • Statistical properties
        • Timbral texture, rhythmic structure, harmonic content
      • Automatic musical genre classification
        • Evaluate musical content features
        • Structure audio collections
    • Genregram demo Dynamic real-time visualization for classification of radio signals
    • Audio segmentation Detect changes of audio texture
    • Multifeature automatic segmenation methodology
      • Time series of feature vector v(t)
      • Detect abrupt changes in trajectory
    • Context & Content Aware User Interfaces
      • Automatic results not perfect
      • Music listening is personal and subjective
      • Browsing vs retrieval
      • “ Overview, zoom and filter, details on demand”, Shneiderman mantra
      • Adapt UI to music content and context
        • Computer Audition
        • Visualization
    • Content and Context
      • Content ~ file
        • Genre, male voice, saxophone
      • Content ~ file, collection
        • Similarity
        • Slow-fast
      • Multiple visualizations
        • Same content
        • Different context
    • Timbregrams Content & Context Similarity + Time Structure Principal Component Analysis Map feature vectors to color
    • Timbrespaces
    • Islands of Music
    • Auditory Scene Analysis