HOW TO PLAY AUDIO FROM A MICROCONTROLLER
Mahadev G
COLLEGE OF ENGINEERING TRIVANDRUM
+919496370662
+919632920773
mahadevg1492@gmail.com
INTRODUCTION TO SOUND 
• In physics, sound is a vibration that propagates as a typically audible mechanical wave of 
pressure and displacement, through a medium such as air or water.[1] 
• So sound is a series of compressions and rarefactions of pressure, at a single frequency or a mix of frequencies.
ROBOCET, COLLEGE OF ENGINEERING 
TRIVANDRUM
ELECTRONIC SOUND - SPEAKER 
• A loudspeaker (or loud-speaker or speaker) is an electroacoustic transducer; a device 
which converts an electrical audio signal into a corresponding sound.[2] 
• As an electrical signal, sound is a sum of waves of different frequencies. Humans can hear
frequencies of roughly 20 Hz to 20 kHz, and the ear is most sensitive between about 2 kHz and 5 kHz.
• So sound is essentially an analog signal, which is an infinite data set. For a microcontroller
to work with it, the signal has to be digital and finite. To make it finite we sample the sound
at a regular interval, chosen so that the information in the analog signal is not lost.
• The Nyquist theorem tells us how: for a signal not to be aliased (corrupted), the sampling
frequency must be more than 2*Fbw, where Fbw is the maximum frequency component present
in the signal.
• So let's analyse a song.
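A small sketch of what aliasing means in numbers. The function names are made up for this illustration; the folding rule itself is the standard result for a pure tone sampled at a fixed rate.

```python
def min_sampling_rate(f_max_hz):
    """Nyquist bound: the sampling rate must exceed twice the highest frequency."""
    return 2 * f_max_hz


def alias_frequency(f_signal_hz, f_sample_hz):
    """Frequency at which a pure tone appears after sampling at f_sample_hz."""
    # Fold the tone into the range [0, f_sample_hz / 2].
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)


print(min_sampling_rate(8000))       # 16000
print(alias_frequency(3000, 16000))  # 3000, below half the sample rate, unchanged
print(alias_frequency(9000, 16000))  # 7000, a 9 kHz tone shows up as 7 kHz
```

A 9 kHz tone sampled at 16 kHz cannot be told apart from a 7 kHz tone, which is why content above half the sampling rate has to be absent or filtered out before sampling.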
SPECTRUM OF “IN THE END”– LINKIN PARK 
• Spectrum courtesy: Audacity
• Looking at the spectrum of the song, the level is about -30 dB at 4 kHz and about -37 dB
at 8 kHz, and it keeps falling after that. The lower the level, the less we hear.
From this we can infer that a song contains little high-frequency content that we actually
hear.
• Most of the sound we hear is below 8 kHz.
• So, using our good old Nyquist sampling theorem, a sampling frequency of 8 kHz*2 = 16 kHz would be
good enough to reproduce the song [or most everyday sounds, for that matter].
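To put the dB figures above into proportion, here is a small worked conversion. It assumes the spectrum levels are amplitude-based decibels (20*log10 of an amplitude ratio); the function name is made up for this sketch.

```python
def db_to_amplitude_ratio(level_db):
    """Convert an amplitude level in dB to a linear amplitude ratio."""
    return 10 ** (level_db / 20)


print("%.4f" % db_to_amplitude_ratio(-30))  # 0.0316
print("%.4f" % db_to_amplitude_ratio(-37))  # 0.0141
```

Under that assumption, the content at 8 kHz has only about 1.4% of the reference amplitude, which is why dropping everything above 8 kHz costs little.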
SAMPLED AT 16KHZ 
• Looking at the spectrum of the song resampled at 16 kHz, you find that
the core sound of the song is retained, with little audible loss.
HOW SOUND IS SAVED.. 
• Now that the sound is a finite set of samples, we
need to save it. Each sample has an analog
value (amplitude) associated with it. An ADC
captures this value and it is saved as
8-, 16- or 32-bit data, depending on the
quality required.
• Putting it all together, each second of the
song becomes 16000 samples of 8, 16 or 32 bits
each. This is how the audio data in an
uncompressed .wav file is represented.
This encoding is called pulse code
modulation (PCM).
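A minimal sketch of the 8-bit case. It assumes samples are given as floats in the range -1.0 to 1.0 and are stored as unsigned bytes with 128 as the zero level (the convention used by 8-bit .wav files); the function names are made up for this illustration.

```python
def quantize_u8(x):
    """Map a sample in [-1.0, 1.0] to an unsigned 8-bit value in 0..255."""
    x = max(-1.0, min(1.0, x))          # clip anything out of range
    return int((x + 1.0) * 127.5 + 0.5)  # scale to 0..255 and round


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample):
    """Storage needed for one second of single-channel PCM audio."""
    return sample_rate_hz * bits_per_sample // 8


print(quantize_u8(-1.0), quantize_u8(0.0), quantize_u8(1.0))  # 0 128 255
print(pcm_bytes_per_second(16000, 8))                         # 16000
print(pcm_bytes_per_second(16000, 16))                        # 32000
```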
HOW TO RECONSTRUCT THE ANALOG 
SIGNAL FROM THE DIGITAL AUDIO SAMPLES 
• Since each stored sample represents the amplitude of the signal at that particular instant,
we will use pulse width modulation (PWM) to recreate the signal amplitude.
• Why pulse width modulation?
• It's the easiest way to produce an analog voltage from a digital pin: the average voltage of
the PWM signal is proportional to its duty cycle.
• How?
• Explained better in the link http://arduino.cc/en/Tutorial/PWM
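The relation between a stored sample and the average PWM voltage can be sketched as follows. It assumes 8-bit samples (0..255) and an ideal output pin that switches between 0 V and the supply voltage; the names and the 3.3 V default are assumptions for this illustration.

```python
def sample_to_duty(sample_u8):
    """Duty cycle (0.0 to 1.0) for an 8-bit sample value."""
    return sample_u8 / 255


def average_voltage(sample_u8, v_supply=3.3):
    """Average pin voltage for that duty cycle, for an ideal 0 V / v_supply output."""
    return sample_to_duty(sample_u8) * v_supply


print(average_voltage(0))              # 0.0
print(average_voltage(255))            # 3.3
print(round(average_voltage(128), 3))  # 1.656, roughly mid-scale
```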
INTERPOLATING.. 
• Now you have amplitude x1 at t=1/16000 and amplitude x2 at t=2/16000. To make the signal
continuous, the gap between these two instants has to be filled with something.
• So you fill it with a PWM signal whose average is x1 from t=1/16000 to t=2/16000, and
similarly with a PWM signal whose average is x2 from t=2/16000 to t=3/16000.
• For that, your PWM frequency must be higher than 16 kHz. Choose a frequency about 10 times
greater so that at least 10 PWM periods fit between two adjacent samples.
• A higher PWM frequency can also greatly improve the noise characteristics of your audio:
the higher the PWM frequency, the further the switching noise is pushed out of the
frequency band we are using. [How? See the last slide.]
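The idea of holding each sample with several PWM periods can be simulated on a PC. This is only a sketch: it assumes 8-bit samples and a PWM period of 255 timer counts, and all names are made up for the illustration.

```python
def pwm_waveform(sample_u8, periods, top=255):
    """Pin states (1 or 0) for `periods` PWM periods of `top` timer counts each."""
    one_period = [1] * sample_u8 + [0] * (top - sample_u8)
    return one_period * periods


def pwm_frequency(sample_rate_hz, periods_per_sample):
    """PWM frequency that fits the given number of periods into one sample interval."""
    return sample_rate_hz * periods_per_sample


def timer_clock_needed(pwm_frequency_hz, top=255):
    """Timer clock required to count `top` steps in each PWM period."""
    return pwm_frequency_hz * top


wave = pwm_waveform(64, periods=10)
print(sum(wave) / len(wave))          # same as 64 / 255, the duty cycle
print(pwm_frequency(16000, 10))       # 160000
print(timer_clock_needed(160000))     # 40800000
```

The last number shows the trade-off under these assumptions: ten full 8-bit PWM periods per sample at 16 kHz would need a timer clock of about 40.8 MHz, so on slower parts you may have to settle for fewer periods per sample or a coarser PWM resolution.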
THE NUTSHELL.. 
DO WE NEED A LOW PASS FILTER AT THE END? 
• A separate low-pass filter is usually not required: if
you look at the equivalent circuit of a standard
loudspeaker, you see that R1 and C1 already form
an LPF.
• But an amplifier might be required if you are
trying to drive powerful loudspeakers directly
from a microcontroller output pin.
• For standard headphones, the microcontroller
pin drive strength is good enough to reproduce
sound at a good volume.
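For a feel of where such a filter cuts off, the first-order RC low-pass formula f_c = 1/(2*pi*R*C) can be evaluated directly. The component values below are purely hypothetical; the slides do not give values for R1 and C1.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


# Hypothetical values, chosen only to illustrate the arithmetic.
print(round(rc_cutoff_hz(1000, 10e-9)))  # 15915
```

A cutoff well below the PWM frequency but above the audio band is what lets the filter smooth out the pulses while keeping the sound.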
HOW TO GET THE AUDIO DATA FROM A SONG.. 
1. Install the open source app "Audacity"
http://audacity.sourceforge.net/
2. Open any song in it and export it as a
.wav file (screen shot).
3. Reopen the .wav file.
4. Go to the Analyse menu -> Sample Data
Export.
There are many export options there: you can
get the signal amplitude as linear values
or on a dB scale, and export to CSV or TXT format.
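Once the samples are exported, they still have to be turned into something the microcontroller firmware can include. The sketch below assumes the export contains one linear amplitude value (-1.0 to 1.0) per line, possibly with header lines that are skipped; the function names and the array name audio_data are made up for this illustration.

```python
def to_u8(x):
    """Map a linear amplitude in [-1.0, 1.0] to an unsigned 8-bit value."""
    x = max(-1.0, min(1.0, x))
    return int((x + 1.0) * 127.5 + 0.5)


def parse_samples(lines):
    """Read one float per line, skipping blank lines and non-numeric header lines."""
    samples = []
    for line in lines:
        try:
            samples.append(float(line.strip()))
        except ValueError:
            continue  # not a plain number: treat as header or comment
    return samples


def to_c_array(samples, name="audio_data"):
    """Format the samples as a C array definition for the firmware source."""
    values = ", ".join(str(to_u8(s)) for s in samples)
    return "const unsigned char %s[%d] = { %s };" % (name, len(samples), values)


exported = ["header text", "0.0", "-1.0", "1.0", ""]
print(to_c_array(parse_samples(exported)))
# const unsigned char audio_data[3] = { 128, 0, 255 };
```

With a real export you would pass the lines of the text file (for example from open(path)) to parse_samples instead of the short list used here.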
THE ALGORITHM IN THE 
MICROCONTROLLER.. 
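Based on the previous slides, the playback loop can be sketched as a PC-side simulation: once per sample period (1/16000 s) the next stored sample is written to the PWM duty register. This is a sketch under that assumption, not the exact algorithm from the original slide; FakePwm and play are hypothetical names standing in for the real timer and PWM hardware.

```python
SAMPLE_RATE_HZ = 16000


class FakePwm:
    """Stand-in for a hardware PWM peripheral; records every duty value written."""

    def __init__(self):
        self.duty = 128
        self.history = []

    def set_duty(self, value):
        self.duty = value
        self.history.append(value)


def play(samples, pwm):
    """Write one sample per tick to the PWM and return the playing time in seconds."""
    for sample in samples:
        # On real hardware this body would run in a timer interrupt
        # that fires every 1 / SAMPLE_RATE_HZ seconds.
        pwm.set_duty(sample)
    pwm.set_duty(128)  # go back to mid-scale (silence) when done
    return len(samples) / SAMPLE_RATE_HZ


pwm = FakePwm()
print(play([0, 64, 255], pwm))  # 0.0001875
print(pwm.history)              # [0, 64, 255, 128]
```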
LIMITING FACTORS 
• The major limiting factor is definitely the code memory of the microcontroller. Each
sample takes a minimum of 8 bits of space, and since we have 16k samples per second, one
second of audio needs 16k * 8 bits, i.e. about 16 kB. That is roughly the upper limit for most
microcontrollers available under $3.
• Using a sampling frequency of 8 kHz would double the amount of sound that you could play
without compromising much on quality. If your sounds are very low in frequency even 6 kHz
would work, but sampling frequencies below 8 kHz are not recommended for this application.
• So if you want to play a whole song, interface an external SRAM or SPI flash.
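The memory arithmetic above can be written out as a small helper. The 32 KB figure is only an example size, and the function name is made up for this sketch.

```python
def seconds_of_audio(storage_bytes, sample_rate_hz=16000, bits_per_sample=8):
    """How many seconds of single-channel PCM audio fit in the given storage."""
    bytes_per_second = sample_rate_hz * bits_per_sample // 8
    return storage_bytes / bytes_per_second


print(seconds_of_audio(32 * 1024))                       # 2.048
print(seconds_of_audio(32 * 1024, sample_rate_hz=8000))  # 4.096
```

Halving the sampling frequency doubles the playing time, as stated above, and the firmware itself also needs part of that memory.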
THANKS AND COURTESY 
• A huge help in understanding sound, modulation, reconstruction and noise was the
application note by Audio Precision:
http://users.ece.utexas.edu/~bevans/courses/rtdsp/lectures/10_Data_Conversion/AP_Understanding_PDM_Digital_Audio.pdf
• Please go through it, as it will be an eye-opener.
• Audacity
• Wikipedia for all definitions [1] [2] and Google image search for images ;)
