Digital signal processing through speech, hearing, and Python

Slides from PyCon 2013 tutorial reformatted for self-study. Code at https://github.com/mchua/pycon-sigproc, original description follows: Why do pianos sound different from guitars? How can we visualize how deafness affects a child's speech? These are signal processing questions, traditionally tackled only by upper-level engineering students with MATLAB and differential equations; we're going to do it with algebra and basic Python skills. Based on a signal processing class for audiology graduate students, taught by a deaf musician.

  1. Digital Signal Processing through Speech, Hearing, and Python
Mel Chua, PyCon 2013
This tutorial was designed to be run on a free pythonanywhere.com Python 2.7 terminal. If you want to run the code directly on your machine, you'll need Python 2.7.x, numpy, scipy, and matplotlib. Either way, you'll need a .wav file to play with (preferably 1-2 seconds long).
  2. Agenda
● Introduction
● Fourier transforms, spectrums, and spectrograms
● Playtime!
● SANITY BREAK
● Nyquist, sampling and aliasing
● Noise and filtering it
● (if time permits) formants, vocoding, shifting, etc.
● Recap: so, what did we do?
  3. What's signal processing?
● Usually an upper-level undergraduate engineering class
● Prerequisites: circuit theory, differential equations, MATLAB programming, etc, etc...
● About 144 hours worth of work (3 hours per credit per week, 3 credits, 16 weeks)
● We're going to do this in 3 hours (1/48th the time)
● I assume you know basic Python and therefore algebra
  4. Therefore
We'll skip a lot of stuff.
  5. We will not...
● Do circuit theory, differential equations, MATLAB programming, etc, etc...
● Work with images
● Write tons of code from scratch
● See rigorous proofs, math, and/or definitions
  6. We will...
● Play with audio
● Visualize audio
● Generate and record audio
● In general, play with audio
● Do a lot of “group challenge time!”
  7. Side notes
● This is based on a graduate class teaching signal processing to audiology majors
● We've had half a semester to do everything (about 70 hours)
● I'm not sure how far we will get today
  8. Introduction: Trig In One Slide
  9. Sampling
  10. Let's write some code.
Open up the terminal and follow along. We assume you have a file called flute.wav in the directory you are running the terminal from.
  11. Import the libraries we need...
from numpy import *
  12. ...and create some data. Here we're making a signal consisting of 2 sine waves (1250Hz and 625Hz) sampled at a 10kHz rate.
x = arange(256.0)
sin1 = sin(2*pi*(1250.0/10000.0)*x)
sin2 = sin(2*pi*(625.0/10000.0)*x)
sig = sin1 + sin2
  13. What does this look like? Let's plot it and find out.
import matplotlib.pyplot as pyplot
pyplot.plot(sig)
pyplot.savefig('sig.png')
pyplot.clf() # clear plot
  14. sig.png
  15. While we're at it, let's define a graphing function so we don't need to do this all again.
def makegraph(data, filename):
    pyplot.clf()
    pyplot.plot(data)
    pyplot.savefig(filename)
  16. Our first plot showed the signal in the time domain. We want to see it in the frequency domain. A numpy function that implements an algorithm called the Fast Fourier Transform (FFT) can take us there.
data = fft.rfft(sig)
# note that we use rfft because
# the values of sig are real
makegraph(data, 'fft0.png')
  17. fft0.png
  18. That's a start. We had 2 frequencies in the signal, and we're seeing 2 spikes here, so that seems reasonable. But we did get this warning:
>>> makegraph(data, 'fft0.png')
/usr/local/lib/python2.7/site-packages/numpy/core/numeric.py:320: ComplexWarning: Casting complex values to real discards the imaginary part
  return array(a, dtype, copy=False, order=order)
  19. That's because the Fourier transform gave us a complex output – so we need to take the magnitude of the complex output...
data = abs(data)
makegraph(data, 'fft1.png')
# more detail: sigproc-outline.py
# lines 42-71
  20. fft1.png
  21. But this is displaying raw power output, and we usually think of audio volume in terms of decibels. Wikipedia tells us decibels (dB) are the original signal plotted on a 10*log10 y-axis, so...
data = 10*log10(data)
makegraph(data, 'fft2.png')
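As a quick numeric sanity check of the 10*log10 scale used above (a sketch, not from the slides):

```python
from numpy import log10

# decibels compare power levels on a 10*log10 scale:
# doubling the power adds about 3 dB, a tenfold increase adds exactly 10 dB
print(10 * log10(2.0))    # about 3.01
print(10 * log10(10.0))   # exactly 10.0
```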
  22. fft2.png
  23. We see our 2 pure tones showing up as 2 peaks – this is great. The jaggedness of the rest of the signal is quantization noise, a.k.a. numerical error, because we're doing this with approximations.
Question: what's the relationship between the x-axis of the graph and the frequency of the signal?
  24. Answer: the numpy rfft output spans 0Hz up to the Nyquist frequency – half our 10kHz sample rate, i.e. 5000Hz. This means the x-axis markers correspond to values of 0-5000Hz divided into 128 slices.
5000/128 = 39.0625 Hz per marker
  25. The two peaks are at 16 and 32.
(5000/128)*16 = 625Hz
(5000/128)*32 = 1250Hz
...which are our 2 original tones.
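The bin-to-frequency arithmetic above can be checked with numpy's fft.rfftfreq, which returns the frequency of each rfft bin directly (a sketch using the 256-sample, 10kHz setup from earlier):

```python
from numpy import fft

# frequency of each rfft bin for 256 samples at a 10kHz sample rate
freqs = fft.rfftfreq(256, d=1.0/10000.0)

print(freqs[1] - freqs[0])  # 39.0625 Hz per bin, as computed by hand
print(freqs[16])            # 625.0
print(freqs[32])            # 1250.0
```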
  26. Another visualization: spectrogram
  27. Generate and plot a spectrogram...
from pylab import specgram
pyplot.clf()
sgram = specgram(sig)
pyplot.savefig('sgram.png')
  28. sgram.png
  29. Do you see how the spectrogram is sort of like our last plot, extruded forward out of the screen, and looked down upon from above? That's a spectrogram. Time is on the x-axis, frequency on the y-axis, and amplitude is marked by color.
  30. Now let's do this with a more complex sound. We'll need to use a library to read/write .wav files.
import scipy
from scipy.io.wavfile import read
  31. Let's define a function to get the data from the .wav file, and use it.
def getwavdata(file):
    return scipy.io.wavfile.read(file)[1]
audio = getwavdata('flute.wav')
# more detail on scipy.io.wavfile.read
# in sigproc-outline.py, lines 117-123
  32. Hang on! How do we make sure we've got the right data? We could write it back to a .wav file and make sure they sound the same.
from scipy.io.wavfile import write
def makewav(data, outfile, samplerate):
    scipy.io.wavfile.write(outfile, samplerate, data)
makewav(audio, 'reflute.wav', 44100)
# 44100Hz is the default CD sampling rate, and
# what most .wav files will use.
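Listening works, but the round trip can also be verified programmatically. A sketch (the helper name and filenames are ours, not from the slides):

```python
import numpy as np
from scipy.io import wavfile

def roundtrip_matches(infile, outfile):
    """Read a .wav, write it back out, and check the copy is identical."""
    rate, audio = wavfile.read(infile)
    wavfile.write(outfile, rate, audio)
    rate2, copy = wavfile.read(outfile)
    return rate == rate2 and np.array_equal(audio, copy)
```

For an int16 .wav, roundtrip_matches('flute.wav', 'reflute.wav') should return True.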
  33. Now let's see what this looks like in the time domain. We've got a lot of data points, so we'll only plot the beginning of the signal here.
makegraph(audio[0:1024], 'flute.png')
  34. flute.png
  35. What does this look like in the frequency domain?
audiofft = fft.rfft(audio)
audiofft = abs(audiofft)
audiofft = 10*log10(audiofft)
makegraph(audiofft, 'flutefft.png')
  36. flutefft.png
  37. This is much more complex. We can see harmonics on the left side. Perhaps this will be clearer if we plot it as a spectrogram.
pyplot.clf()
sgram = specgram(audio)
pyplot.savefig('flutespectro.png')
  38. flutespectro.png
  39. You can see the base note of the flute (a 494Hz B) in dark red at the bottom, and lighter red harmonics above it.
http://www.bgfl.org/custom/resources_ftp/client_ftp/ks2/music/piano/flute.htm
http://en.wikipedia.org/wiki/Piano_key_frequencies
  40. Your Turn: Challenge
● That first signal we made? Make a wav of it.
● Hint: you may need to generate more samples.
● Bonus: the flute played a B (494Hz) – generate a single sinusoid of that.
● Megabonus: add the flute and sinusoid signals and play them together
  41. Your turn: Challenge 2
● Record some sounds on your computer
● Do an FFT on them
● Plot the spectrum
● Plot the spectrogram
● Bonus: add the flute and your sinusoid and plot their spectrum and spectrogram together – what's the x scale?
● Bonus: what's the difference between fft/rfft?
● Bonus: numpy vs scipy fft libraries?
● Bonus: try the same sound at different frequencies (example: vowels)
  42. Sanity break?
Come back in 20 minutes, OR: stay for a demo of the wave library (aka “why we're using scipy”).
Note: wavlibraryexample.py contains the wave library demo (which we didn't get to in the actual workshop).
  43. Things people found during break
Problem #1: When trying to generate a pure-tone (sine wave) .wav file, the sound is not audible.
Underlying reason: The amplitude of a sine wave is 1, which is really, really tiny. Compare that to the amplitude of the data you get when you read in the flute.wav file – over 20,000.
Solution: Amplify your sine wave by multiplying it by a large number (20,000 is good) before writing it to the .wav file.
  44. More things people found
Problem #2: The sine wave is audible in the .wav file, but sounds like white noise rather than a pure tone.
Underlying reason: scipy.io.wavfile.write() expects an int16 datatype, and you may be giving it a float instead.
Solution: Coerce your data to int16 (see next slide).
  45. Coercing to int16
# option 1: rewrite the makewav function
# so it includes type coercion
def savewav(data, outfile, samplerate):
    out_data = array(data, dtype=int16)
    scipy.io.wavfile.write(outfile, samplerate, out_data)

# option 2: generate the sine wave as int16
# which allows you to use the original makewav function
def makesinwav(freq, amplitude, sampling_freq, num_samples):
    return array(sin(2*pi*freq/float(sampling_freq)
                 *arange(float(num_samples)))*amplitude, dtype=int16)
  46. Post-break: Agenda
● Introduction
● Fourier transforms, spectrums, and spectrograms
● Playtime!
● SANITY BREAK
● Nyquist, sampling and aliasing
● Noise and filtering it
● (if time permits) formants, vocoding, shifting, etc.
● Recap: so, what did we do?
  47. Nyquist: sampling and aliasing
● The sample rate matters.
● Higher is better.
● There is a tradeoff.
  48. Sampling
  49. Aliasing
  50. Nyquist-Shannon sampling theorem (Shannon's version)
If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.
  51. Nyquist-Shannon sampling theorem (haiku version)
lowest sample rate
for sound with highest freq F
equals 2 times F
  52. Let's explore the effects of sample rate. When you listen to these .wav files, note that doubling/halving the sample rate moves the sound up/down an octave, respectively.
audio = getwavdata('flute.wav')
makewav(audio, 'fluteagain44100.wav', 44100)
makewav(audio, 'fluteagain22000.wav', 22000)
makewav(audio, 'fluteagain88200.wav', 88200)
  53. Your turn
● Take some of your signals from earlier
● Try out different sample rates and see what happens
  ● Hint: this is easier with simple sinusoids at first
  ● Hint: determine the highest frequency in your signal, double it (that's the minimum sample rate needed, also called the Nyquist rate), and try sampling above, below, and at that rate
● What do you find?
  54. What do aliases alias at?
● They reflect around the Nyquist frequency (half the sampling frequency)
● Example: 40kHz sampling frequency
● Implies 20kHz Nyquist frequency
● So if we try to play a 23kHz frequency...
● ...it'll sound like 17kHz.
Your turn: make this happen with pure sinusoids
Bonus: with non-pure sinusoids
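The reflection rule can be demonstrated numerically (a sketch; the 23kHz/40kHz numbers come from the slide):

```python
import numpy as np

fs = 40000                           # sampling frequency: Nyquist limit is 20kHz
n = np.arange(4000)                  # 0.1 seconds of samples
sig = np.sin(2*np.pi*23000.0/fs*n)   # a 23kHz tone, above the Nyquist limit

# the sampled 23kHz tone is indistinguishable from a (sign-flipped) 17kHz tone
print(np.allclose(sig, -np.sin(2*np.pi*17000.0/fs*n)))  # True

# and its spectrum peaks at the 17kHz alias, not at 23kHz
freqs = np.fft.rfftfreq(len(sig), d=1.0/fs)
print(freqs[np.argmax(abs(np.fft.rfft(sig)))])  # 17000.0
```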
  55. Agenda
● Introduction
● Fourier transforms, spectrums, and spectrograms
● Playtime!
● SANITY BREAK
● Nyquist, sampling and aliasing
● Noise and filtering it
● (if time permits) formants, vocoding, shifting, etc.
● Recap: so, what did we do?
  56. Remember this?
  57. Well, these are filters.
  58. Noise and filtering it
● High pass
● Low pass
● Band pass
● Band stop
● Notch
● (there are many more, but these are the basics)
  59. Notice that all these filters work in the frequency domain. We went from the time to the frequency domain using an FFT.
# get audio (again) in the time domain
audio = getwavdata('flute.wav')
# convert to frequency domain
flutefft = fft.rfft(audio)
  60. We can go back from the frequency to the time domain using an inverse FFT (IFFT). The regenerated file should sound identical to flute.wav.
reflute = fft.irfft(flutefft, len(audio))
reflute_coerced = array(reflute, dtype=int16) # coerce to int16
makewav(reflute_coerced, 'fluteregenerated.wav', 44100)
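The rfft/irfft round trip can also be checked without listening. A sketch with a stand-in signal (any real-valued array works in place of the flute audio):

```python
import numpy as np

audio = 20000 * np.sin(0.01 * np.arange(1024))  # stand-in for the flute data

flutefft = np.fft.rfft(audio)
reflute = np.fft.irfft(flutefft, len(audio))

print(np.allclose(audio, reflute))  # True: the transform is invertible
```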
  61. Let's look at flute.wav in the frequency domain again...
# plot on decibel (dB) scale
makegraph(10*log10(abs(flutefft)), 'flutefftdb.png')
  62. What if we wanted to cut off all the frequencies higher than the 5000th index? (low-pass filter)
  63. Implement and plot the low-pass filter in the frequency domain...
# zero out all frequencies above
# the 5000th index
# (BONUS: what frequency does this
# correspond to?)
flutefft[5000:] = 0
# plot on decibel (dB) scale
makegraph(10*log10(abs(flutefft)), 'flutefft_lowpassed.png')
  64. flutefft_lowpassed.png
  65. Going from frequency back to time domain so we can listen
reflute = fft.irfft(flutefft, len(audio))
reflute_coerced = array(reflute, dtype=int16) # coerce it
makewav(reflute_coerced, 'flute_lowpassed.wav', 44100)
  66. What does the spectrogram of the low-passed flute sound look like?
pyplot.clf()
sgram = specgram(reflute) # plot the low-passed signal, not the original audio
pyplot.savefig('reflutespectro.png')
  67. reflutespectro.png
  68. Compare to flutespectro.png
  69. Your turn
● Take some of your .wav files from earlier, and try making...
  ● Low-pass or high-pass filters
  ● Band-pass, band-stop, or notch filters
  ● Filters with varying amounts of rolloff
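As a starting point for the band-stop case, here is a sketch in the same zero-out-the-bins style as the low-pass filter above (the function name and bin indices are our own placeholders):

```python
import numpy as np

def bandstop(signal, low_index, high_index):
    """Zero out rfft bins [low_index, high_index) and resynthesize the signal."""
    spectrum = np.fft.rfft(signal)
    spectrum[low_index:high_index] = 0
    return np.fft.irfft(spectrum, len(signal))
```

A high-pass filter is the same idea with `spectrum[:cutoff] = 0`; as with the low-pass example, coerce the result to int16 before writing a .wav.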
  70. Agenda
● Introduction
● Fourier transforms, spectrums, and spectrograms
● Playtime!
● SANITY BREAK
● Nyquist, sampling and aliasing
● Noise and filtering it
● (if time permits) formants, vocoding, shifting, etc.
● Recap: so, what did we do?
  71. Formants
  72. Formants
vowel   f1        f2
a       1000 Hz   1400 Hz
i       320 Hz    2500 Hz
u       320 Hz    800 Hz
e       500 Hz    2300 Hz
o       500 Hz    1000 Hz
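Using the table, a crude vowel-like tone can be sketched by summing sinusoids at a vowel's formant frequencies (a rough approximation only: real vowels are a glottal source shaped by vocal-tract resonances, and the relative amplitudes here are made up):

```python
import numpy as np

samplerate = 44100
t = np.arange(samplerate) / float(samplerate)   # one second of time points

# /a/-like sound: f1 = 1000 Hz, f2 = 1400 Hz from the table
vowel = np.sin(2*np.pi*1000.0*t) + 0.5*np.sin(2*np.pi*1400.0*t)

# scale and coerce to int16 so it can be written out with makewav/savewav
vowel16 = np.array(vowel / abs(vowel).max() * 20000, dtype=np.int16)
```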
  73. Vocoding
http://en.wikipedia.org/wiki/Vocoder
  74. Bode plot (high pass)
  75. Another Bode plot...
  76. Credits and Resources
● http://onlamp.com/pub/a/python/2001/01/31/numerically.html
● http://jeremykun.com/2012/07/18/the-fast-fourier-transform/
● http://lac.linuxaudio.org/2011/papers/40.pdf
● Farrah Fayyaz, Purdue University
● signalprocessingforaudiologists.wordpress.com
● Wikipedia (for images)
● Tons of Python library documentation
