Audio Signal Processing Basics, mirtoolbox contains many useful audio processing library functions

Unit 3
Audio Signal Processing Basics, mirtoolbox contains many
useful audio processing library functions,
VOICEBOX: Speech Processing Toolbox for MATLAB,
Audio processing in Matlab.

Audio Signal Processing Basics
What is audio data?
• Audio data represents analog sounds in a digital form, preserving the main
properties of the original. As we know from school lessons in physics, a
sound is a wave of vibrations traveling through a medium like air or water
and finally reaching our ears.
• It has three key characteristics to be considered when analyzing audio data:
time period, amplitude, and frequency.
• Time period is how many seconds it takes to complete one cycle of
vibrations. It is the reciprocal of frequency.

• Amplitude is the magnitude of the sound wave's pressure variations. Its level
is commonly expressed in decibels (dB), and we perceive it as loudness.
• Frequency measured in Hertz (Hz) indicates how many sound vibrations
happen per second. People interpret frequency as low or high pitch.
• While frequency is an objective parameter, the pitch is subjective.
• The human hearing range lies between 20 and 20,000 Hz. Scientists claim
that most people perceive as low pitch all sounds below 500 Hz — like the
plane engine roar. In turn, high pitch for us is everything beyond 2,000 Hz
(for example, a whistle.)
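The relationship between time period, amplitude, and frequency can be sketched in a few lines of Python. This is only an illustration: the function names, the 440 Hz tone, and the 8000 Hz sample rate are assumptions chosen for the example, not values from the slides.

```python
import math

def sine_wave(frequency_hz, amplitude, sample_rate, duration_s):
    """Return the samples of a pure tone as a list of floats."""
    n_samples = round(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

def period_seconds(frequency_hz):
    """Time period of one cycle, the reciprocal of frequency."""
    return 1.0 / frequency_hz

def level_db(amplitude, reference=1.0):
    """Amplitude expressed in decibels relative to a reference amplitude."""
    return 20.0 * math.log10(amplitude / reference)

# Illustrative values: a 440 Hz tone at half of full scale.
tone = sine_wave(frequency_hz=440.0, amplitude=0.5, sample_rate=8000, duration_s=0.01)
print(len(tone))                        # 80 samples for 10 ms at 8000 Hz
print(round(period_seconds(440.0), 5))  # 0.00227 seconds per cycle
print(round(level_db(0.5), 2))          # -6.02 dB relative to the reference
```

Halving the amplitude lowers the level by about 6 dB, while the period depends only on the frequency.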

Audio data file formats
• Similar to texts and images, audio is unstructured data meaning that it’s not
arranged in tables with connected rows and columns. Instead, you can store audio
in various file formats like
• WAV or WAVE (Waveform Audio File Format) developed by Microsoft and IBM.
It typically stores raw, uncompressed audio, so nothing from the digital recording
is discarded;
• AIFF (Audio Interchange File Format) developed by Apple. Like WAV, it works with
uncompressed audio;
• FLAC (Free Lossless Audio Codec) developed by Xiph.Org Foundation that offers
free multimedia formats and software tools. FLAC files are compressed without
losing sound quality.

• MP3 (MPEG-1 Audio Layer III, where MPEG stands for Moving Picture Experts
Group) developed by the Fraunhofer Society in Germany and supported
globally. It’s the most common file format since it makes music easy to store
on portable devices and send back and forth via the Internet. MP3
compression is lossy: it discards part of the audio information, yet still
offers acceptable sound quality.
• We recommend using AIFF and WAV files for analysis because they keep
every sample of the digital recording. At the same time, keep in mind
that neither these nor other audio files can be fed directly to machine
learning models. To make audio understandable for computers, data must
undergo a transformation.
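A first step of that transformation is usually to decode the file into a list of numbers. The sketch below uses Python's standard `wave` and `struct` modules and assumes 16-bit mono PCM. The helper names `write_wav` and `read_wav` are hypothetical, and the WAV data is built in memory so the example does not depend on any file on disk.

```python
import io
import math
import struct
import wave

def write_wav(samples, sample_rate):
    """Encode floats in [-1, 1] as 16-bit mono PCM and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        pcm = [int(round(s * 32767)) for s in samples]
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return buffer.getvalue()

def read_wav(wav_bytes):
    """Decode 16-bit mono PCM WAV bytes into floats in [-1, 1]."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
        raw = wav_file.readframes(n_frames)
    pcm = struct.unpack("<%dh" % n_frames, raw)
    return [value / 32767 for value in pcm], sample_rate

# Round trip: synthesize a short tone, store it as WAV, and decode it again.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(100)]
wav_bytes = write_wav(tone, sample_rate=8000)
decoded, sample_rate = read_wav(wav_bytes)
print(sample_rate, len(decoded))  # 8000 100
```

The decoded list of floats is the kind of numeric representation from which features can then be computed.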

What are audio signals?
• Audio signals are signals that vibrate in the audible frequency range. When
someone talks, they generate air pressure signals; the ear takes in these air
pressure differences and communicates with the brain.
• That's how the brain helps a person recognize that the signal is speech and
understand what someone is saying.
• There are a lot of MATLAB tools to perform audio processing, but not as
many exist in Python.
• Before we get into some of the tools that can be used to process audio signals
in Python, let's examine some of the features of audio that apply to audio
processing and machine learning.

• Some data features and transformations that are important in speech and
audio processing are Mel-frequency cepstral coefficients (MFCCs),
Gammatone-frequency cepstral coefficients (GFCCs), Linear-frequency
cepstral coefficients (LFCCs), Bark-frequency cepstral coefficients (BFCCs),
Power-normalized cepstral coefficients (PNCCs), spectrum, cepstrum,
spectrogram, and more.
• We can use some of these features directly and extract features from some
others, like spectrum, to train a machine learning model.

What are spectrum and cepstrum?
• Spectrum and cepstrum are two particularly important features in audio
processing.

• Mathematically, a spectrum is the Fourier transform of a signal. A Fourier
transform converts a time-domain signal to the frequency domain.
• In other words, a spectrum is the frequency domain representation of the
input audio's time-domain signal.
• A cepstrum is formed by taking the log magnitude of the spectrum followed
by an inverse Fourier transform.
• This results in a signal that's neither in the frequency domain (because we
took an inverse Fourier transform) nor in the time domain (because we
took the log magnitude prior to the inverse Fourier transform).
• The domain of the resulting signal is called the quefrency.
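The two definitions above can be sketched with a naive discrete Fourier transform written in plain Python. This is a minimal illustration, not an efficient implementation: real code would normally use an FFT routine, and the small `floor` value that avoids taking the logarithm of zero is an assumption of this example.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: time domain to frequency domain."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_cepstrum(signal, floor=1e-12):
    """Log magnitude of the spectrum followed by an inverse Fourier transform."""
    log_magnitude = [math.log(max(abs(value), floor)) for value in dft(signal)]
    return [value.real for value in inverse_dft(log_magnitude)]

# An 8 Hz sine sampled at 64 Hz for one second (illustrative values).
sample_rate = 64
signal = [math.sin(2 * math.pi * 8 * t / sample_rate) for t in range(sample_rate)]

magnitudes = [abs(value) for value in dft(signal)]
peak_bin = max(range(sample_rate // 2), key=lambda k: magnitudes[k])
print(peak_bin)  # 8, the spectrum peaks at the frequency of the sine

cepstrum = real_cepstrum(signal)
print(len(cepstrum))  # 64 values indexed by quefrency
```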

• As an example, the Python library pyAudioProcessing can be set up to classify
audio into three categories: speech, music, or birds.

• Using a small dataset (50 training samples per class) and without any
fine-tuning, we can gauge the potential of this classification model to
identify audio categories.

What is audio analysis?
• Audio analysis is a process of transforming, exploring, and interpreting audio
signals recorded by digital devices.
• Aiming at understanding sound data, it applies a range of technologies,
including state-of-the-art deep learning algorithms.
• Audio analysis has already gained broad adoption in various industries, from
entertainment to healthcare to manufacturing.

Speech recognition
• Speech recognition is about the ability of computers to distinguish spoken
words with natural language processing techniques.
• It allows us to control PCs, smartphones, and other devices via voice
commands and dictate text to machines instead of typing it manually.
• Siri by Apple, Alexa by Amazon, Google Assistant, and Cortana by
Microsoft are popular examples of how deeply the technology has
penetrated into our daily lives.

Voice recognition
• Voice recognition is meant to identify people by the unique characteristics
of their voices rather than to isolate separate words.
• The approach finds applications in security systems for user authentication.
• For instance, Nuance Gatekeeper biometric engine verifies employees and
customers by their voices in the banking sector.

Music recognition
• Music recognition is a popular feature of such apps as Shazam that helps
you identify unknown songs from a short sample.
• Another application of musical audio analysis is genre classification. For
example, Spotify runs its proprietary algorithm to group tracks into categories
(their database holds more than 5,000 genres).

Environmental sound recognition
• Environmental sound recognition focuses on the identification of noises
around us, promising a bunch of advantages to automotive and
manufacturing industries. It’s vital for understanding surroundings in IoT
applications.
• Systems like Audio Analytic ‘listen’ to the events inside and outside your
car, enabling the vehicle to make adjustments in order to increase a driver’s
safety. Another example is SoundSee technology by Bosch that can analyze
machine noises and facilitate predictive maintenance to monitor equipment
health and prevent costly failures.

• Healthcare is another field where environmental sound recognition comes
in handy.
• It offers a non-invasive type of remote patient monitoring to detect events
like falling.
• Besides that, analysis of coughing, sneezing, snoring, and other sounds can
facilitate pre-screening, identifying a patient’s status, assessing the
infection level in public spaces, and so on.

• A real-life use case of such analysis is Sleep.ai which detects teeth grinding
and snoring sounds during sleep.
• The solution created by AltexSoft for a Dutch healthcare startup helps
dentists identify and monitor bruxism to eventually understand the causes
of this abnormality and treat it.
• No matter what type of sounds you analyze, it all starts with an
understanding of audio data and its specific characteristics.

Audio data analysis steps
• Obtain project-specific audio data stored in standard file formats.
• Prepare data for your machine learning project, using software tools.
• Extract audio features from visual representations of sound data.
• Select the machine learning model and train it on audio features.
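The four steps can be sketched end to end in plain Python. Everything in this example is an assumption made for illustration: the recordings are synthesized tones rather than project data, the features are two simple time-domain measures (zero-crossing rate and RMS energy) swapped in for spectrogram-based features, and the model is a nearest-centroid classifier.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative)

def make_tone(frequency_hz, n_samples=800):
    """Step 1 stand-in: synthesize a recording instead of loading a file."""
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

def normalize(samples):
    """Step 2: scale the recording so its peak amplitude is 1."""
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]

def extract_features(samples):
    """Step 3: describe a recording by zero-crossing rate and RMS energy."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zero_crossing_rate = crossings / (len(samples) - 1)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return [zero_crossing_rate, rms]

def train_centroids(labelled_features):
    """Step 4: average the feature vectors of each class."""
    grouped = {}
    for label, features in labelled_features:
        grouped.setdefault(label, []).append(features)
    return {label: [sum(column) / len(column) for column in zip(*rows)]
            for label, rows in grouped.items()}

def classify(features, centroids):
    """Pick the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

training_set = ([("low", f) for f in (150, 200, 250)]
                + [("high", f) for f in (1500, 2000, 2500)])
labelled = [(label, extract_features(normalize(make_tone(f))))
            for label, f in training_set]
centroids = train_centroids(labelled)

print(classify(extract_features(normalize(make_tone(300))), centroids))   # low
print(classify(extract_features(normalize(make_tone(1800))), centroids))  # high
```

In a real project the synthesized tones would be replaced by labelled recordings, and the hand-written features and classifier by a feature extraction library and a trained model.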

Audio analysis software
• Audacity is a free and open-source audio editor to split recordings, remove
noise, transform waveforms to spectrograms, and label them. Audacity doesn’t
require coding skills.
• The Tensorflow-io package for preparation and augmentation of audio data lets
you perform a wide range of operations: noise removal, converting waveforms to
spectrograms, frequency and time masking for augmentation, and more.
• Librosa is an open-source Python library that has almost everything you need
for audio and music analysis.

• Audio Toolbox by MathWorks offers numerous instruments for audio data
processing and analysis, from labeling to estimating signal metrics to
extracting certain features.

MIRtoolbox
What is MIRtoolbox?
• MIRtoolbox offers an integrated set of functions written in MATLAB,
dedicated to the extraction of musical features such as tonality, rhythm,
and structure from audio files.
• The objective is to offer an overview of computational approaches in the area
of Music Information Retrieval.

What features does MIRtoolbox have?
• In short, MIRtoolbox allows us to extract data about musical features
dealing with waveform and spectral analysis, tonality, pitch, dynamics,
rhythm, tempo, timbre, and other high-level audio features.

What is MATLAB?
• MATLAB® is a programming platform designed specifically for engineers
and scientists to analyze and design systems and products that transform
our world. The heart of MATLAB is the MATLAB language, a matrix-based
language allowing the most natural expression of computational
mathematics.

Which toolboxes are available in MATLAB?
• MATLAB functionality is extended through add-on toolboxes, for example:
• Statistics and Machine Learning Toolbox™
• Curve Fitting Toolbox™
• Control System Toolbox™
• Signal Processing Toolbox™
• Mapping Toolbox™

MIRtoolbox is available free of charge under the GNU General Public License.
• This distribution actually includes, besides MIRtoolbox itself, three other
toolboxes:
• the Auditory toolbox, version 2, by Malcolm Slaney,
• the Netlab toolbox, version 3.3, by Ian Nabney,
• the SOM toolbox, version 2.0, by Esa Alhoniemi, Johan Himberg, Jukka
Parviainen and Juha Vesanto.
• MIRtoolbox requires MATLAB version 7 and MathWorks' Signal Processing
Toolbox.

Why use MATLAB for Audio Processing?
• MATLAB offers toolboxes for different domains such as deep learning,
machine learning, and image processing. One example is the Audio Toolbox.
• The Audio Toolbox supports many audio tasks, such as speech analysis and
acoustic measurement. It has a set of predefined algorithms for audio
processing, such as equalization and pitch extraction.

• The Audio Toolbox can be used to import, label, analyze, and experiment on
datasets, which can then be used to train machine learning and deep learning
models.
• Overall, the Audio Toolbox in MATLAB provides a range of features that few
other software packages offer.

What are library functions?
A library function is accessed by simply writing the function name, followed
by a list of arguments, which represent the information being passed to the
function. The arguments must be enclosed in parentheses, and separated by
commas: they can be constants, variables, or more complex expressions.
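The calling convention described above looks the same in most languages. As a small illustration, the sketch below uses Python's standard `math` module as a stand-in for a library. The variable names are hypothetical and chosen only for the example.

```python
import math

radius = 2.0

# A variable and a constant passed as arguments, separated by a comma.
area = math.pi * math.pow(radius, 2)

# Two constants as arguments.
hypotenuse = math.hypot(3, 4)

# A more complex expression (hypotenuse + 1) used as an argument.
largest = max(radius, area, hypotenuse + 1)

print(hypotenuse)         # 5.0
print(round(area, 2))     # 12.57
print(round(largest, 2))  # 12.57
```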

What is a library function in MATLAB?
• A shared library is a collection of functions dynamically loaded by an
application at run time. The MATLAB interface supports libraries containing
functions defined in C header files. To call functions in C++ libraries, use the
interface described in "Call C++ from MATLAB" in the MATLAB documentation.

Simulink library blocks, functions, tools, and objects
Blocks
• Subsystem - Group blocks to create model hierarchy
Functions
• libinfo - Get information about library blocks referenced by model
• gcb - Get path name of current block
• gcbh - Get handle of current block
Tools
• Library Browser - Find and add blocks to model
Objects
• LibraryBrowser.LBStandalone - Display, hide, size, and position Simulink
Library Browser

Create Custom Library
1. From the Simulink start page, select Blank Library and click Create Library.
2. (Optional) Define data types to be used on block interfaces in a Simulink data
dictionary.
3. Add blocks to the new library.
4. Add annotations or images.
5. If you plan to add the library to the Library Browser, you can order the blocks
and annotations in your library.
6. If you want the library to appear in the Library Browser, enable the
EnableLBRepository library property before you save the library.
7. Save the library.

Create a Sublibrary
If your library contains many blocks, you can group the blocks into subsystems or
separate sublibraries. To create a sublibrary, you create a library of the sublibrary
blocks and reference the library from a Subsystem block in the parent library.
1. In the library you want to add a sublibrary to, add a Subsystem block.
2. Inside the Subsystem block, delete the default input and output ports.
3. If you want, create a mask for the subsystem that displays text or an image that
conveys the sublibrary purpose.
4. In the subsystem block properties, set the OpenFcn callback to the name of the
library you want to reference.

VOICEBOX: Speech Processing Toolbox for MATLAB, Audio processing in
Matlab.
What is voice recognition in MATLAB?
• A voice recognition system analyzes a person's input voice by extracting
its features.
• It then compares them with the features of prerecorded signals saved in a
database.
• It outputs whether any other audio of the same person is present in the
database.

What is a voice processing system?
• A voice processing system is the computerized handling of voice, which
includes voice store-and-forward, voice response, voice recognition, and
text-to-speech technologies.

How does voice processing work?
• Voice recognition software on computers requires analog audio to be
converted into digital signals, known as analog-to-digital (A/D) conversion.
• For a computer to decipher a signal, it must have a digital database of words
or syllables as well as a quick process for comparing this data to signals.
• The speech patterns are stored on the hard drive and loaded into memory
when the program is running.
• A comparator checks these stored patterns against the output of the A/D
converter -- an action called pattern recognition.
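The comparison step can be sketched as a nearest-pattern search. This is a simplified illustration under stated assumptions: each stored pattern and each input is represented by a short, made-up feature vector, similarity is measured by Euclidean distance, and the `max_distance` threshold is a hypothetical parameter. Real systems use far richer features and models.

```python
import math

def euclidean_distance(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(features, stored_patterns, max_distance):
    """Return the label of the closest stored pattern, or None if none is close enough."""
    label, distance = min(
        ((name, euclidean_distance(features, pattern))
         for name, pattern in stored_patterns.items()),
        key=lambda item: item[1],
    )
    return label if distance <= max_distance else None

# Hypothetical stored patterns for two words.
stored_patterns = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
}

print(best_match([0.85, 0.15, 0.35], stored_patterns, max_distance=0.5))  # yes
print(best_match([5.0, 5.0, 5.0], stored_patterns, max_distance=0.5))     # None
```

The threshold lets the comparator report that nothing in the database matches, rather than always returning the nearest entry.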

• Audio also must be processed for clarity, so some devices may filter out
background noise. In some voice recognition systems, certain frequencies in
the audio are emphasized so the device can recognize a voice better.
• Voice recognition systems analyze speech through one of two models: the
hidden Markov model and neural networks.
• The hidden Markov model breaks down spoken words into their phonemes
(the basic sound units of speech), while recurrent neural networks use the
output from previous steps to influence the input to the current step.
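One common way to emphasize certain frequencies before recognition is a first-order pre-emphasis filter, which boosts high frequencies relative to low ones. The slides do not name a specific filter, so this choice and the coefficient of 0.97 are assumptions made for illustration.

```python
def pre_emphasis(samples, coefficient=0.97):
    """Apply y[n] = x[n] - coefficient * x[n - 1], keeping the first sample unchanged."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - coefficient * samples[n - 1]
                           for n in range(1, len(samples))]

# A constant (zero-frequency) signal is strongly attenuated after the first sample.
constant = pre_emphasis([1.0, 1.0, 1.0, 1.0])
print([round(value, 2) for value in constant])     # [1.0, 0.03, 0.03, 0.03]

# A rapidly alternating (high-frequency) signal is amplified.
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
print([round(value, 2) for value in alternating])  # [1.0, -1.97, 1.97, -1.97]
```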

• As uses for voice recognition technology grow and more users
interact with it, the organizations implementing voice recognition
software will have more data and information to feed into
neural networks for voice recognition systems.
• This improves the capabilities and accuracy of voice recognition
products.

Voice recognition uses
• The uses for voice recognition have grown quickly as AI, machine learning
and consumer acceptance have matured.
Examples of how voice recognition is used include the following:
• Virtual assistants. Siri, Alexa and Google virtual assistants all implement
voice recognition software to interact with users. The way consumers use
voice recognition technology varies depending on the product. But they can
use it to transcribe voice to text, set up reminders, search the internet and
respond to simple questions and requests, such as play music or share
weather or traffic information.

• Smart devices. Users can control their smart homes – including smart
thermostats and smart speakers -- using voice recognition software.
• Automated phone systems. Organizations use voice recognition with their
phone systems so that callers can reach the right department by saying a
specific number.
• Conferencing. Voice recognition is used in live captioning a speaker so others
can follow what is said in real time as text.

• Bluetooth. Bluetooth systems in modern cars support voice recognition to
help drivers keep their eyes on the road. Drivers can use voice recognition to
perform commands such as "call my office."
• Dictation and voice recognition software. These tools can help users dictate
and transcribe documents without having to enter text using a physical
keyboard or mouse.
• Government. The National Security Agency has used voice recognition
systems dating back to 2006 to identify terrorists and spies or to verify the
audio of anyone speaking.

Voice recognition advantages and disadvantages
Voice recognition offers numerous benefits:
• Consumers can multitask by speaking directly to their voice assistant or other voice
recognition technology.
• Users who have trouble with sight can still interact with their devices.
• Machine learning and sophisticated algorithms help voice recognition technology
quickly turn spoken words into written text.
• This technology can capture speech faster than some users can type. This makes
tasks like taking notes or setting reminders faster and more convenient.

Disadvantages of the technology include the following:
• Background noise can produce false input.
• While accuracy rates are improving, all voice recognition systems and
programs make errors.
• There's a problem with words that sound alike but are spelled differently and
have different meanings -- for example, hear and here. This issue might be
largely overcome using stored contextual information. However, this requires
more RAM and faster processors.

What are 3 uses for voice recognition software?
• Voice recognition can be used to control a smart home, instruct a smart
speaker, and command phones and tablets.
• In addition, we can set reminders and interact hands-free with personal
technologies.
• The most significant use is for the entry of text without using an on-screen or
physical keyboard.

• VOICEBOX is a speech processing toolbox consisting of MATLAB routines
that are maintained by and mostly written by Mike Brookes, Speech and
Audio Processing Lab, CSP Group, EEE Dept, Imperial College London.
• The routines are available as a GitHub repository or a zip archive and are
made available under the terms of the GNU General Public License.
• To avoid conflicts, all routine names begin with a "v_" prefix. For
compatibility with legacy code, aliased versions without the prefix are
included, but these are likely to be removed in the future (the routine
v_voicebox_update.m is included to update legacy code to the new names).

• Audio Toolbox™ enables real-time audio signal processing and analysis in
MATLAB® and Simulink® (a graphical programming environment). It provides
low-latency (low-delay) connectivity for streaming audio from and to sound
cards via the following driver standards:
• Windows: DirectSound, WASAPI, ASIO™
• Apple Mac OS X: Core Audio
• Linux®: ALSA

• All audio device interfaces in both MATLAB and Simulink support C code
generation for acceleration and desktop prototyping. For example, you can
generate libraries or standalone applications that process audio in real-time on
the desktop.
• Audio Toolbox also enables you to tune algorithm parameters interactively
during simulations using external MIDI controls.
