Final year undergraduate project detecting sleep apnoea and hypopnea from audio breathing signals for the RCSI.
Developed a solution using signal processing in the time domain in conjunction with parallel processing on a GPU.
Acoustic analysis of Sleep Apnoea and Hypopnea events in night-time respiratory signals Final Year Project Report Muhammad Bilal Alli
Final Year Project Report
Acoustic analysis of Sleep Apnoea and Hypopnea events in night-time respiratory signals
Name: Muhammad Bilal Alli
Student No.: 10322935
Class: PBM4
Date: 28th April 2014
Supervisor: Prof. Kevin McGuigan
Declaration
Name: Muhammad Alli
Student ID Number: 10322935
Programme: Physics with Biomedical Sciences
Module Code: PS451
Assignment Title: Final Year project Report
Submission Date: 28th April 2014
I declare that this material, which I now submit for assessment, is entirely my own work and
has not been taken from the work of others, save and to the extent that such work has been
cited and acknowledged within the text of my work. I understand that plagiarism, collusion,
and copying are grave and serious offences in the university and accept the penalties that
would be imposed should I engage in plagiarism, collusion, or copying.
I have read and understood the DCU Academic Integrity and Plagiarism Policy (available at:
http://www4.dcu.ie/registry/examinations/plagiarism.pdf and
http://www.dcu.ie/info/regulations/plagiarism.shtml). I accept the penalties that may be
imposed should I engage in practice or practices that breach this policy.
I have identified and included the source of all facts, ideas, opinions, viewpoints of others in
the assignment references. Direct quotations from books, journal articles, internet sources,
module text, or any other source whatsoever are acknowledged and the sources cited are
identified in the assignment references.
I have read and understood the DCU library referencing guidelines (available at:
http://www.dcu.ie/sites/default/files/library/LibraryGuides/Citing&ReferencingGuide/player.html)
and/or recommended in the assignment guidelines and/or programme documentation.
By signing this form or by submitting this material online I also confirm that this
assignment, or any part of it, has not been previously submitted by me or any other person
for assessment on this or any other course of study.
Name:_________________________
Date:__________________________
Abstract
The research carried out over the course of this final year project evaluated methods to
identify sleep apnoea and hypopnea in audio samples, ideally recorded in a patient's home.
The audio data provided from recorded sleep studies were of limited and obscured quality due
to environmental noise, in particular fan noise from medical equipment. Details such as
microphone placement and the model of the recording device were not provided with the sample
recordings. The use of a rigid threshold to carry out this identification was explored and
found to be unsuitable. An original adaptive system of filtering was designed; the system is
data driven and uses the audio samples themselves to derive the parameters used to detect
apnoea. The adaptive system of filters was used to screen for events which had a high
probability of being apnoeic, by design taking advantage of the duration and energy of the
respiratory signal. The adaptive system showcased in this report can separate respiratory
signals from white noise of varying intensity profiles over the course of a night-time
signal, as well as from shot noise; it then collects the information related to the
respiratory effort of the patient. The approach used in this project was found to be
susceptible to fan noise. The system filters out 82% of the apnoea in a sample similar to the
acoustic home environment. It records a number of false positives, as it requires further
classifiers to be converted from a system of filters into a detector that can be used for
at-home screening. Detecting an increased or sustained respiratory effort after a silence can
be employed to reduce the number of false positives in the approach and raise its accuracy,
finally converting the software into a fully functional apnoea screening tool on its own. A
classifier which detects a decrease in respiratory effort prior to a silent sequence would
allow the processes in this study to be used to detect hypopnea[1]. While these two
conditions have to be implemented to take this work to the next level, methods are already
developed to record the positions of apnoeas in audio samples. The number of apnoeic events
can deliver a measure of the severity of apnoea a person is suffering from by indicating the
apnoea-hypopnea index (AHI).
Acknowledgements
I would like to thank my supervisor Prof. Kevin McGuigan for the chance to take on
something as emotionally fulfilling and rewarding as contributing to the detection of
respiratory disorders. I would also like to thank Dr. Peter Knief for his guidance and help
with understanding sleep apnoea as well as using the MATLAB interface.
List of figures
Figure 2.1.1: An audio sample of sleep apnoea
Figure 3.1.1: Visualization of the fixed threshold method
Figure 3.1.2: Flow diagram Fixed threshold method
Figure 3.1.3: Verification tool Matlab interface
Figure 3.2.1: Breathing
Figure 3.2.2: Moving average applied to breathing
Figure 3.2.3: Snoring
Figure 3.2.4: Moving average applied to snoring
Figure 3.2.5: Short Artifact
Figure 3.2.6: Moving average applied to Short Artifact
Figure 3.2.7: Flow diagram Addition of the moving average
Figure 3.3.1: Apnoea audio sample
Figure 3.3.2: Threshold applied to an apnoea
Figure 3.3.3: Moving average applied to the thresholded data
Figure 3.3.4: Cumulative sum applied after the energy threshold of the moving average
Figure 3.3.5: Example of a Resonant artifact
Figure 3.3.6: Cumulative sum of a lower energy threshold of the moving average
Figure 3.3.7: Flow Diagram Adaptive threshold method
Figure 4.2.1: Noise profile and breathing 1
Figure 4.2.2: Noise profile and breathing 2
Figure 4.3.1: Example of respiratory effort increase
Figure 4.3.2: Example of respiratory effort decrease
Table of contents:
1. Introduction
1.1 Preamble
1.2 Sleep Apnoea and hypopnea
1.3 Why audio analysis
2. Theory
2.1 Mathematics and Physics of Apnoea
3. Experimentation
3.1 Fixed Threshold Method
3.2 Application of a Moving Average
3.3 Adaptive threshold
4. Results and Discussion
4.1 Fixed threshold method
4.2 Application of a Moving average
4.3 Application of an Adaptive Threshold
5. Conclusions
6. References
A. Appendices
A.1 Risk Assessment
A.2 Code for Fixed threshold
A.3 Code for audio playback and result verification
A.4 Code for Adaptive threshold
1. Introduction
1.1 Preamble
The objective of this project was to analyse night-time respiratory signals and report
mathematical behaviors which could in future be used to detect sleep apnoea. Anonymised
patient audio data were provided by the RCSI and analysed using MATLAB.
This report serves to evaluate common methods suggested in the literature thus far, in terms
of suitability and applicability for detecting sleep apnoea, with the purpose of designing an
application to screen for sleep apnoea in an at-home environment.
1.2 Sleep Apnoea and hypopnea
Apnoea is defined as the cessation of breathing. Within the context of this research, an
apnoeic event in sleep apnoea is defined as the cessation of breathing for a period of greater
than ten seconds and less than ninety seconds. Hypopnea, on the other hand, is a period
where breathing is abnormally shallow; clinically it is differentially diagnosed by the
accompanying drop in blood oxygenation levels.
There are two distinct types of sleep apnoea, obstructive sleep apnoea (OSA), and central
sleep apnoea (CSA).
Obstructive sleep apnoea occurs when the muscle tone in the human airway is insufficient to
stop it from collapsing. The airway itself is flexible and prone to collapsing due to the
biomechanical properties of the connective tissues it is composed of. Muscles provide the
support structure in the airway and normally prevent it from collapsing, similar to how bones
provide us with rigid structure; when muscle tone drops, so does its capacity to keep the
airway open.
Central sleep apnoea is a neurological problem at its core; it occurs when the brain fails to
regulate respiration adequately during sleep, by proxy failing to manage blood oxygen levels.
These two types of apnoea are frequently found together and can even manifest in mixed
forms. OSA can lead to CSA due to periods where blood oxygen levels in the brain are low.
The reverse is also found: CSA (as well as OSA) causes depression; depression itself and
classic first and second generation antidepressants can cause obesity; obesity results in a
lowered muscle tone in the airway with a knock-on effect on apnoea, leading to a vicious
cycle.
1.3 Why audio analysis
A person who suspects that they suffer from sleep apnoea would be referred to a sleep clinic
and undergo a sleep study. They would spend a night sleeping while polysomnography is
carried out. Polysomnography (PSG) is a diagnostic test in which several kinds of physical
parameters are recorded simultaneously, each pertaining to different physiological factors of
the patient.
Electrooculography, electroencephalography, electrocardiography, pulse oximetry and audio
recordings of patient breathing are carried out simultaneously.
Electrooculography measures the muscle action potentials related to eye movement; it is
predominantly used to gauge whether REM (rapid eye movement) sleep phases have been
achieved. Apnoea denies patients sufficient access to this part of the sleep cycle, as
apnoeic events are intermittent over the course of a night and a prolonged period of sleep is
required to achieve REM sleep. Denial of REM sleep reduces the individual's capability to
repair itself and also results in depression.
Electroencephalography analyses electric potentials related to brain waves and is used
predominantly to highlight neurological abnormalities such as epilepsy. Electrocardiography
analyses electric potentials in signal transduction of the heart and can be used to identify
irregular behavior such as an arrhythmia.
Pulse oximetry measures a patient's blood oxygen saturation, commonly via simplified
absorption spectroscopy.
Audio recordings deliver only information about the patient's respiratory behavior and the
noises caused by the respiratory tract.
While monitoring with this equipment seems to be adequate to diagnose sleep apnoea, there is
an interesting social phenomenon associated with this practice which results in a bottleneck
effect.
PSG and its equipment require technicians and physicians to conduct and monitor the
facilities available in a sleep clinic. This results in the bottleneck effect, where waiting
lists and times get constantly longer.
A proposed solution to this is giving people a tool of reasonable accuracy to screen
themselves. If their suspicion is confirmed, they can then contact their GP and proceed with
the classic regime.
The core idea is to screen the patient using just the audio data. Due to the rise of
smartphone technology and multiplatform online stores, designing software which can be used
on a device that has a microphone is a promising approach. It may give people a chance to
assess whether their suspicions are well founded.
The concept would be to have an application that records audio whilst the likely patient
sleeps at home, thereby collecting information related to respiration. The data would be
analysed and a result displayed to the individual.
2. Theory
2.1 Mathematics and Physics of Apnoea
Sound waves are mechanical disturbances of a medium; they propagate through solids, gases
and liquids. Audio recordings capture sound as a measure of the intensity of the captured
samples. Intensity is a measure of how loud a sample is and is commonly not displayed in an
SI representation but rather in arbitrary units (AU) of intensity. Noises are sounds we can
hear, and respiratory noises are sounds related to breathing; respiratory noises are the
features of interest this study focuses on identifying. From a signal analysis perspective,
however, the term noise represents all non-respiratory information, which is not of interest.
Figure 2.1.1: An audio sample of sleep apnoea
The structure of the airway of each patient has unique physiological and acoustic parameters
that distort sound waves in different ways; because the provided data were anonymised, this
was an unknown factor.
These unknown conditions and the envisaged generalization allowed for educated
approximations within reason. The energy of a packet of sound in an audio sample is
approximated by the sum of the squares of the intensities that comprise it; the energy of a
packet is a measure of how loud that packet is, as shown below in equation 2.1.1.
$$E = \sum_{n=1}^{N} I_n^{2}$$ (Eq.2.1.1)
Where E is the energy of the packet in arbitrary units and I_n is the intensity, in arbitrary
units, of the n-th of the N samples within that packet.[2]
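The sum of squared intensities can be illustrated with a minimal Python sketch. The function name `packet_energy` and the two example packets are hypothetical and chosen only for illustration; they are not taken from the project's MATLAB code or data.

```python
def packet_energy(intensities):
    """Sum of squared sample intensities of one packet, in arbitrary units."""
    return sum(i * i for i in intensities)


# Two made-up packets: the second is the first scaled by ten.
quiet_packet = [1, -2, 1, 0]
loud_packet = [10, -20, 10, 0]

print(packet_energy(quiet_packet))  # 6
print(packet_energy(loud_packet))   # 600
```

Because the units are arbitrary, only the ratio between the two values carries meaning: scaling the intensities by ten scales the energy by one hundred.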
Because the intensity of samples in audio recordings is arbitrary, depending on the selection
of format and bit representation, we can further approximate the energy of a packet to the
form in equation 2.1.2 below.
∑ (Eq.2.1.2)
This arbitrary representation of energy is only meaningful when comparing packets within
the same sample; it is thus a measure of the relative energy of sounds per sample.
This relative measure of energy has further meaning associated with it when used in
respiratory signal analysis.
$$E = K F^{\alpha}$$ (Eq.2.1.3)
E is the energy term as defined in equations 2.1.1 & 2.1.2. The energy itself is a measure of
the respiratory flow F; K and α are values related to the parameters responsible for sound
generation.[3]
A large portion of research has been dedicated to the frequency analysis of OSA. Research
to this end has yielded a qualitative factor of interest: patients with OSA have displayed a
different frequency behavior in their snores to those without OSA.[4,5]
There is a presence of peaks above 500 Hz in the power spectrum for patients with OSA;
however, using this as a tool to screen for apnoea within the context of this project seems
problematic and inherently flawed.
This behavior does not manifest in all patients with OSA[6]. Patients with CSA may not snore
at all at the end of an apnoeic event; they may just inhale or gasp.
3. Experimentation
3.1 Fixed Threshold Method
Using the definition of apnoea as a period where breathing has ceased, we can simply
redefine apnoea as a silence of greater than 10 seconds but less than 90 seconds which is
followed by a respiratory noise. Below a certain intensity value, the audio signal recorded is
not loud enough to be detected by the human ear as breathing; above that value, respiratory
noises can be heard. We refer to this value as the threshold. Samples below the threshold are
considered silences and samples above it are considered noises.
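To this end, we can use equation 3.1.1 below.[1]
$$D = \frac{1}{f_s} \sum_{n} s(n)$$ (Eq.3.1.1)
Here D is the duration of the silence, f_s is the sampling frequency, and s(n) equals 1 for
each counted sample n below the threshold.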
The duration of a silence is calculated as the number of samples with values below the
threshold divided by the sampling frequency, which gives the duration of the silence in
seconds. The sampling frequency is the number of samples per second. The sampling
frequency in this study was 44.1 kHz, meaning that 44100 samples were recorded per
second.
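As a worked instance of this calculation at the stated sampling frequency, the 10-second and 90-second limits used in this study correspond to the following sample counts. The symbols D (duration), N (number of consecutive sub-threshold samples) and f_s (sampling frequency) are notation assumed here for illustration.

```latex
D = \frac{N}{f_s}, \qquad
\frac{441\,000}{44\,100\ \mathrm{Hz}} = 10\ \mathrm{s}, \qquad
\frac{3\,969\,000}{44\,100\ \mathrm{Hz}} = 90\ \mathrm{s}
```

A silence therefore qualifies as a candidate apnoea when it spans more than 441,000 and fewer than 3,969,000 consecutive samples.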
Figure 3.1.1: Visualization of the fixed threshold method
The figure above illustrates this approach. The black line represents the threshold; reading
from left to right, we can identify a period of approximately 4 seconds of silence.
Three fixed thresholds were selected based on reasonable observations of the audio data.
Code was written in MATLAB to carry out the sequence of processes that records the position
of an apnoea. The sequence of events, as carried out computationally, is illustrated in figure
3.1.2. (MATLAB code included in appendices under A.2)
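The fixed threshold sequence described above can be sketched in a few lines. This is a Python illustration and not the MATLAB code of appendix A.2; the function name `find_silences`, its parameters, and the use of the absolute sample value as the intensity are assumptions made for this sketch.

```python
def find_silences(samples, threshold, fs, min_s=10.0, max_s=90.0):
    """Return (start_index, duration_seconds) for each candidate apnoea.

    A silence is a run of samples whose absolute intensity is below
    `threshold`; it ends at the first sample at or above the threshold.
    """
    events = []
    run_start = None  # index where the current silence began, if any
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if run_start is None:
                run_start = i  # drop below the threshold: start counting
        elif run_start is not None:
            # threshold exceeded: counted samples / sampling frequency
            duration = (i - run_start) / fs
            if min_s < duration < max_s:
                events.append((run_start, duration))
            run_start = None
    return events


# Toy signal at an assumed 10 Hz so the example stays small:
# noise, 12 s of silence, noise, 5 s of silence, noise.
fs = 10
signal = [5000] * 5 + [0] * 120 + [5000] * 3 + [0] * 50 + [5000] * 2
print(find_silences(signal, threshold=3000, fs=fs))  # [(5, 12.0)]
```

A silence that runs to the end of the recording is not reported in this sketch, since the definition used in the text requires the silence to be followed by a respiratory noise.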
Figure 3.1.2: Flow diagram Fixed threshold method
Start
1. Detect a drop below the threshold.
2. Start counting samples.
3. When the threshold is exceeded, the program stops counting samples and divides the number
of counted samples by the sampling frequency; this gives the duration of the silence.
4. If the duration of the silence is greater than 10 but less than 90 seconds, record the
event as an apnoea.
5. If the duration of the silence is less than 10 or greater than 90 seconds, do not record
the event as an apnoea.
A system for analysing and verifying the results also needed to be designed. Using MATLAB,
a system was designed to graph and play captured apnoea samples to determine whether they
were true or false positives. This system prompts the user to select the sample they wish to
analyse, displays a graph of that sample, then asks the user if they would like the audio to
be played. The user is then prompted for an answer on whether or not the sample was an
apnoea. These answers are stored and displayed when the analysis is finished. As analysing
the data can be a time-consuming process, a feature was also built in to end the analysis,
store the results and allow the user to resume analysis another time.
(Included in appendices under A.3)
Figure 3.1.3: Verification tool Matlab interface
As shown above in figure 3.1.3, the interface for the verification tool was designed to be
simple and ergonomic; it displays the audio file of interest as a graph and plays the audio.
Analysing the audio data provided a challenge in terms of RAM resources: as there were
approximately 1.3 billion 64-bit samples per patient, this resulted in a 9.2 GB RAM
requirement. To allow the processes to take place, a paging file of sufficient size had to be
established on the hard drive of the analysing computer.
A paging file allows a computer to use a portion of the hard drive as RAM to carry out a
process. This was, however, a double-edged sword, as the hard drive itself provides only very
slow virtual memory.
By reducing the bit representation of the numbers to 16 bit, the RAM requirement was
brought down to a manageable size of 2 GB. This sped up the processes, as RAM is much faster
when used without the slowing contribution of virtual memory from a hard drive.
3.2 Application of a Moving Average
To compensate for the presence of shot noise, which manifests as a single sample of high
intensity, sometimes enough to exceed the threshold and prematurely stop the program from
counting the duration of the silence, a means of removing shot noise had to be applied.
As shot noise is a single high intensity sample, it is diminished if averaged over a window
of sufficient size.
To carry this out a moving average was applied. Two window sizes were used: 0.4 seconds, as
per the literature[1] to mimic speech analysis, and 0.5 seconds. After substantial testing
0.5 seconds was used as a standard, as it proved to diminish non-respiratory noises as well
as amplify respiratory noises to a greater extent due to its larger window size. Respiratory
noises are found to be between 0.3 and 0.5 seconds in duration, though they can exceed this
with snoring; this is important to note, as most artifacts will be of a much shorter duration.
$$Y(n) = \frac{1}{W} \sum_{k=0}^{W-1} X(n-k)$$ (Eq.3.2.1)
Y(n) as in equation 3.2.1 above is the output sample for a moving average using sample set
X(n), with a window of size W.
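The effect of the moving average on shot noise can be illustrated with a short Python sketch. It assumes a trailing window over non-negative (rectified) intensities, with samples before the start of the signal treated as zero; the function name `moving_average` and the toy signals are hypothetical, and the window is given in samples rather than seconds.

```python
def moving_average(x, window):
    """Trailing moving average: y[n] is the mean of the last `window` samples.

    Samples before the start of the signal are treated as zero.
    """
    y = []
    running = 0.0
    for n, value in enumerate(x):
        running += value
        if n >= window:
            running -= x[n - window]  # drop the sample leaving the window
        y.append(running / window)
    return y


window = 5
shot_noise = [0] * 10 + [100] + [0] * 10   # a single high-intensity sample
packet = [0] * 10 + [100] * 5 + [0] * 10   # a sustained packet

print(max(moving_average(shot_noise, window)))  # 20.0
print(max(moving_average(packet, window)))      # 100.0
```

The single-sample spike is reduced to a fifth of its height, while the packet that fills the window keeps its full level, which is the contrast the energy classifier in this section relies on.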
Examples of the samples and their associated 0.5-second window moving averages are
shown below.
Note the black regions highlighted in the breathing figures 3.2.1 & 3.2.2; in the moving
average the region is smoothed. Single high intensity peaks are diminished while packets of
high intensity information are amplified.
For breathing and snoring the resulting intensities of the packets in the moving average
graphs are much higher than for the artifact; this is because a 0.5-second window emphasises
respiratory noises while simultaneously diminishing noises that are too short. Also notable
is that the packets are broadened as a result of smoothing with a moving average.
Figure 3.2.1: Breathing
Figure 3.2.6: Moving average applied to Short Artifact
The intensity values of the peaks of the packets in the moving average graphs represent the
energy of each packet. As respiratory noises should retain a higher energy value after
averaging, that feature was selected to act as a classifier. This characteristic can be
observed by comparing figures 3.2.2 and 3.2.4 with 3.2.6.
If a packet was detected with a suitable energy, it would be classed as respiratory.
This condition was added to the fixed threshold method.
Figure 3.2.7 explains the sequence of events in which the code written in MATLAB acts to
record the position of apnoeic events in samples.
Figure 3.2.7: Flow diagram Addition of the moving average
Start
1. Detect a drop below the threshold.
2. Start counting samples.
3. When the threshold is exceeded, load 5 seconds of the sample starting at the threshold
breach and analyse it to detect if it has sufficient energy.
4. If the energy is not high enough, do not stop counting samples.
5. If the energy is high enough, stop counting samples and calculate the duration of the
silence.
6. If the duration of the silence is greater than 10 but less than 90 seconds, record the
event as an apnoea.
7. If the duration of the silence is less than 10 or greater than 90 seconds, do not record
the event as an apnoea.
3.3 Adaptive threshold
To devise a method of applying an adaptive threshold, one has to consider the factor that
results in the changing intensity profiles of similar respiratory noises, which is white noise.
White noise is random in the time domain and constant in the frequency domain; it is
termed white noise because it is represented equally over the entire spectrum of frequencies.
The additive property of white noise essentially makes it an amplifier which emphasizes
certain features of the signal in a varying fashion throughout the duration of a signal.
By using the white noise as the parameter to define the adaptive threshold, we have a means
of analysing data that is amplified by varying amounts over the course of the signal. We take
advantage of the very phenomenon that causes the problem in order to solve it.
A sample selected for this purpose should be of sufficient length and contain a large amount
of time where respiration did not occur. The median is the middle number of a dataset,
separating the higher half of the data set from the lower half. By calculating the median of
a sample we derive a measure of the intensity profile of the white noise. A reasonable length
over which to calculate a median, as shown by the results in table 4.3.1, is 10 seconds. If
the sample is too short, the median of a respiratory event will be calculated and the
corresponding threshold will be set too high.
The following systematic application of filters was designed independently. It serves to
identify apnoea and hypopnea as well as information about the patient's respiratory effort
over time, and it represents a major outcome of this research.
This system is illustrated below in a stepwise fashion in figures 3.3.1-3.3.4.
Figure 3.3.1: Apnoea audio sample
As shown in figure 3.3.1 above the system is loaded with an audio sample. The median of
the sample was then calculated.
Figure 3.3.2: Threshold applied to an apnoea
Figure 3.3.2 shows the outcome when the data of the first step are thresholded by applying
a logical operator. If a sample was below the threshold it was set to 0; if it was at or
above the threshold it was set to 1. As there is no further need for high precision, the
samples could then be stored in an 8-bit representation, reducing the RAM requirement after
this stage to 800 MB when running on full 8-hour night-time samples.
Figure 3.3.3: Moving average applied to the thresholded data
A moving average with a 0.5-second window was then applied. The peaks of the packets now
represent the energy of the packets. Note how, compared to figure 3.3.1, the white noise is
removed and only features of interest remain. This prevents shot noise and the majority of
non-resonant artifacts from triggering the system henceforth. A further interesting aspect is
that the physiological packets are reformed into near identical versions of what they were
before the threshold was applied.
The signal was destroyed, and the features of interest, the very physiological packets that
would identify apnoea or hypopnea, are reformed by applying a moving average with the
correct window size of 0.5 seconds.
Figure 3.3.4: Cumulative sum applied after the energy threshold of the moving average
By applying another threshold to remove packets of low energy, unwanted packets can be
removed. Those packets would represent ambient noises. Packets with an intensity below 50
were removed, and by then applying a cumulative sum the duration of the remaining packets
was determined. If the duration of a packet was above 0.3 seconds it was considered respiratory.
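One way to read the cumulative-sum step is as a running count of consecutive samples that survive the energy threshold, reset whenever a sample falls below it. The Python sketch below follows that reading; it is an assumption about the mechanism rather than the MATLAB implementation, and the names `packet_durations` and `is_respiratory` are hypothetical.

```python
def packet_durations(averaged, energy_threshold, fs):
    """Durations in seconds of runs where `averaged` exceeds the threshold."""
    durations = []
    count = 0  # cumulative count of consecutive above-threshold samples
    for value in averaged:
        if value > energy_threshold:
            count += 1
        elif count:
            durations.append(count / fs)
            count = 0
    if count:  # a packet still open at the end of the segment
        durations.append(count / fs)
    return durations


def is_respiratory(averaged, energy_threshold, fs, min_duration=0.3):
    """True if any packet lasts longer than `min_duration` seconds."""
    return any(d > min_duration
               for d in packet_durations(averaged, energy_threshold, fs))


# Toy moving-average output at an assumed 10 Hz.
fs = 10
averaged = [0, 0, 60, 70, 80, 60, 0, 0, 55, 0]
print(packet_durations(averaged, energy_threshold=50, fs=fs))  # [0.4, 0.1]
print(is_respiratory(averaged, energy_threshold=50, fs=fs))    # True
```

The first packet lasts 0.4 seconds and passes the 0.3-second rule; the second lasts 0.1 seconds and on its own would be treated as an artifact.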
20 samples were taken from each of 7 patients, 140 in total, each of 20 seconds duration.
This was done to ascertain the best conditions for applying an adaptive threshold to the
intensity. The intensity profiles of the samples yielded similar thresholds for samples of
similar noise profiles, but setting these rules in place to threshold in a fixed manner
within an adaptive system would prove to be a step backwards. The rules are given in table
3.3.1.
During the analysis stage of applying the adaptive threshold and the filtering processes
above that threshold, a large overhead of 32 seconds per application of the moving average
on a 20-second sample was discovered.
To address this, parallel processing was implemented. Usually in an iterative process one
iteration takes place at a time, carrying out a single task; the index is changed and the next
iteration takes place. Parallel processing allows multiple iterations to take place
simultaneously. By setting MATLAB to use 12 workers (possible parallel processes), 12
iterations are processed at once, dramatically reducing calculation time and reducing the
32-second delay to less than 16 seconds. This would not have been possible without the
significant reduction in the RAM requirement from 2 GB to 800 MB, as this process is RAM
intensive, with each worker holding a load of 100-350 MB.
The prospect of processing on the graphics card (GPU) to accelerate the program was
investigated. On small samples of 20 seconds duration there was a slight delay; on larger
full night-time samples a process that would take 8 hours was slowed to 29 hours. The delay
in communication between the graphics card and the central processing unit (CPU) became a
huge factor. GPU acceleration pays off when simple processes are carried out on a large
dataset. As only 3-second samples, or 0.7 MB, were being processed at a time, this was not a
situation where the GPU could have added any performance increase.
As GPU programming was not suitable for the acceleration of this program it was not used
further.
Table 3.3.1
Patient No. | Apply threshold when the median is: | Set the threshold to this corresponding value:
14 | >500 | 4000
18 | <100 | 600
20 | <240 | 1500
20 | >240 and <350 | 3000
10 | >450 | 3750
Using leave-one-out cross validation with the rules in table 3.3.1 above, sample 14 was
tested using the rules of the other samples, leaving out its own rule.
This yielded a 0% capture for apnoea. Referring to table 4.1.1, this threshold should have
shown some return, as the condition from patient 10 matched up very closely to that of
patient 14. The system was offset by the set thresholds in lower intensity regions and
triggered in an unwanted fashion. To correct for this the system was made entirely adaptive:
the intensity threshold was set, for each consecutive ten-second segment, to five times the
median of that segment.
Further testing on the samples revealed that energy was a poor way of defining whether
packets were physiological or not; resonant artifacts can have very high energies and thereby
offset the apnoea detection. Figure 3.3.5 illustrates this.
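The fully adaptive rule, a threshold of five times the median of each ten-second segment, can be sketched as follows. This is a Python illustration under stated assumptions: the median is taken over absolute sample values, the function name `adaptive_binarise` and its parameters are hypothetical, and the toy example uses one-second segments at 10 Hz to stay small.

```python
from statistics import median


def adaptive_binarise(samples, fs, segment_s=10.0, factor=5.0):
    """Binarise each segment against `factor` times its own median magnitude.

    Samples at or above the segment threshold become 1, the rest become 0.
    Note: a segment whose median is 0 yields a threshold of 0, so every
    sample in it maps to 1; real recordings with a noise floor avoid this.
    """
    seg_len = int(segment_s * fs)
    out = []
    for start in range(0, len(samples), seg_len):
        segment = [abs(x) for x in samples[start:start + seg_len]]
        threshold = factor * median(segment)
        out.extend(1 if x >= threshold else 0 for x in segment)
    return out


# Two toy segments with the same event shape but a tenfold different noise level.
quiet = [2, -2, 2, -2, 30, -30, 2, -2, 2, -2]
noisy = [20, -20, 20, -20, 300, -300, 20, -20, 20, -20]
print(adaptive_binarise(quiet + noisy, fs=10, segment_s=1.0))
```

Both segments produce the same mask, with ones only at the two event samples, even though no single fixed threshold would separate event from noise in both.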
Figure 3.3.5: Example of a Resonant artifact
The use of energy as a classifier requires a great deal of care to implement: low intensity
packets such as breathing or pronounced hypopnea would be obliterated if the energy
threshold applied in the system was set too high, as was the case with the energy threshold
of 50. By lowering the energy threshold to 5 it was possible to correct for this. A further
use of this lower threshold was the correction of the broadening of the packets caused by the
convolution with a moving average filter, reducing them back to durations similar to their
original ones.
Figure 3.3.6: Cumulative sum of a lower energy threshold of the moving average
As we can see above, hypopnea at the 21-second and 37-second lines now also appears. This
displays the usefulness of this approach in apnoea and hypopnea detection: it will not catch
hypopnea if it is sufficiently buried under the white noise, but it will present hypopnea if
it can be distinguished in terms of its energy at the moving average phase.
Figure 3.3.7: Flow Diagram Adaptive threshold method
Start
1. The sample is loaded.
2. Each ten-second segment then has a threshold applied to it: if samples are below 5 times
the median they become zeros, if the samples are above the threshold they become ones.
3. The pieces are then placed back into the big sample they came from.
4. The samples are then passed through a program which considers silences as samples which
are below 1 and counts them; if a 1 is detected the system stops counting samples and is
triggered to carry out the following events.
5. Starting at the sample that triggered this process, load a 3-second sample segment.
6. Apply the 0.5-second window moving average.
7. Apply an energy threshold of 5 to correct for the broadening of packets by the moving
average and remove low energy ambient noises.
8. Apply a cumulative sum which processes the packets and gives us the duration of each
packet.
9. If a packet of greater than 0.3 seconds duration is detected in the sample segment, then a
respiratory event is considered to have occurred.
10. The duration of the silence preceding the trigger is calculated.
11. If the duration of the silence preceding the trigger was greater than ten and less than
ninety seconds, then record the event as an apnoea.
12. If the duration was less than ten seconds or greater than ninety seconds, do not record
the event as an apnoea.
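The trigger logic of the adaptive threshold flow diagram (figure 3.3.7) can be sketched in Python as below. The respiratory check is passed in as a callback so that the sketch stays independent of how the moving average and energy threshold are implemented. The function name `detect_apnoeas`, the callback, and two behaviours are assumptions of this sketch: a rejected trigger is counted as part of the silence, and counting restarts after the last 1 in the analysed segment.

```python
def detect_apnoeas(binary, fs, is_respiratory, lookahead_s=3.0,
                   min_s=10.0, max_s=90.0):
    """Return (silence_start_index, duration_s) for each candidate apnoea.

    `binary` is the thresholded 0/1 signal. `is_respiratory(segment)` decides
    whether the lookahead segment starting at a trigger holds a respiratory
    event.
    """
    lookahead = int(lookahead_s * fs)
    events = []
    silence_start = 0
    i = 0
    while i < len(binary):
        segment = binary[i:i + lookahead]
        if binary[i] == 1 and is_respiratory(segment):
            duration = (i - silence_start) / fs  # silence preceding the trigger
            if min_s < duration < max_s:
                events.append((silence_start, duration))
            # Assumption: restart counting after the last 1 in the segment.
            last_one = max(k for k, b in enumerate(segment) if b == 1)
            i += last_one + 1
            silence_start = i
        else:
            i += 1  # silence, or a trigger rejected as an artifact
    return events


# Toy 0/1 signal at an assumed 10 Hz: breath, 8 s of silence, a one-sample
# artifact, 8 s of silence, breath. Stand-in classifier: more than 0.3 s of ones.
fs = 10
binary = [1] * 5 + [0] * 80 + [1] + [0] * 80 + [1] * 5 + [0] * 20
print(detect_apnoeas(binary, fs, lambda seg: sum(seg) > 0.3 * fs))  # [(5, 16.1)]
print(detect_apnoeas(binary, fs, lambda seg: True))                 # []
```

With the duration-based check the artifact is ignored and one 16.1-second silence is recorded; when every trigger is accepted, the artifact splits the silence into two 8-second halves and nothing is recorded, which mirrors the premature tripping seen with the fixed threshold method.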
4. Results and Discussion
4.1 Fixed threshold method
The results for samples 14 and 18 below show detected apnoeic events, which were events
captured by the code designed for the fixed threshold method.
Confirmed events are detected events which were listened to in order to confirm whether they
were indeed apnoea. The number in the patient report column is the number of apnoea
detected in the patient reports provided by the RCSI; these are taken as an accurate
representation of 100% of the apnoea in the recordings. However, as the patient reports
were verified by human means, they are prone to discrepancies due to human error and the
use of additional sensory input from PSG.
Sample 14 Table 4.1.1
Intensity Threshold | Detected events | Confirmed events | Patient report
3000 | 52 | 22 | 368
3600 | 330 | 58 | 368
4200 | 476 | 141 | 368
Sample 18 Table 4.1.2
Intensity Threshold | Detected events | Confirmed events | Patient report
3000 | 133 | 4 | 128
3600 | 100 | 5 | 128
4200 | 65 | 2 | 128
From the results above, the number of confirmed events is clearly quite low, while the
number of detected events is, for the most part, quite close to the number of events in
the patient report. From this, it was assumed that shot noise led to the system tripping too
early and thereby failing to capture apnoeas. As no means had been put in place to suppress
the shot noise, the model was prone to being offset and to recording false positives.
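The assumed failure mode can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not a reproduction of the recordings: the sample rate, the amplitudes and the position of the click are hypothetical. It runs the same silence-counting rule on a pause of just under 12 seconds, once clean and once with a single-sample click in the middle of the pause.

```matlab
Fs = 100;                          % assumed sample rate in Hz, kept small for the example
threshold = 3000;                  % lowest intensity threshold from the tables above
clean = zeros(13*Fs, 1);           % 13 second rectified signal, silent by default
clean(1) = 5000;                   % hypothetical last breath before the pause
clean(12*Fs + 1) = 5000;           % hypothetical first breath after the pause
noisy = clean;
noisy(6*Fs) = 4000;                % hypothetical one-sample click (shot noise) mid-pause

signals = {clean, noisy};
events = zeros(1, 2);              % apnoeas counted in each signal
for s = 1:2
    x = signals{s};
    silent = 0;                    % length of the current silence in samples
    for n = 1:numel(x)
        if x(n) < threshold
            silent = silent + 1;
        else
            d = silent / Fs;       % duration of the silence that just ended
            if d >= 10 && d < 90
                events(s) = events(s) + 1;
            end
            silent = 0;
        end
    end
end
disp(events)
```

In the clean signal the pause is counted as one apnoea. In the noisy signal the click splits the pause into two silences of about 6 seconds each, neither of which reaches the ten second minimum, so no event is recorded.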
4.2 Application of a Moving average
Two energy thresholds were tested. The intended advantage of the moving average is that the
system should trip prematurely less often. The same intensity thresholds as in the fixed
threshold method were applied.
Table 4.2.1: Results for sample 18
Intensity threshold   Energy threshold   Events found   Events confirmed
3000                  350                9              1
3000                  500                0              0
3600                  350                2              0
3600                  500                0              0
4200                  350                Test aborted   N/A
4200                  500                Test aborted   N/A
The results show a surprisingly low yield: the system performed even worse than before
the moving average was applied. The last two runs were aborted when the issue described below
became apparent after substantial small-sample testing.
White noise was identified as the cause. White noise is what the silence in the audio sample
consists of; it has a flat frequency profile, with equal power at all frequencies.
However, the intensity of white noise in an audio sample is not constant from
recording to recording, or even within the same recording after enough time has elapsed.
This is highlighted in figures 4.2.1 and 4.2.2. Note that in the figure with the larger white
noise intensity, the breathing in the sample is also much larger. This is because noise is
additive: when the intensity of the white noise is high, features of interest such as
breathing appear amplified, provided they were of sufficient intensity when recorded. Hypopnea,
or shallow breathing, is prone to being buried in the noise.
Figure 4.2.1: Noise profile and breathing
Figure 4.2.2: Noise profile and breathing 2
By comparing figures 4.2.1 and 4.2.2 we could observe that the intensity thresholds used
were far too extreme. Regions in the audio recordings exist that carry respiratory signals
which are below even the lowest threshold of 3000 as shown in figure 4.2.1. This means that
entire regions were not processed or taken into account and they were simply treated as
silence by the program. From this, it was concluded that a single fixed threshold was not a
viable approach to use to screen for sleep apnoea.
4.3 Application of an Adaptive Threshold
Test for a suitable sample length over which to calculate the median
The results in table 4.3.1 characterize noises as physiological if they are longer than 0.3
seconds after a correction factor is applied to account for the broadening caused by the
moving average. A 0.5 second window was used for the moving average, so subtracting 0.5
seconds from the longest noise corrects for the broadening of the packets.
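The broadening correction can be illustrated with a synthetic packet. This is a minimal sketch under assumed values: the sample rate, the 0.4 second packet and the low threshold of 0.001 are hypothetical and are not on the same scale as the energy threshold used in the project.

```matlab
Fs = 1000;                          % assumed sample rate in Hz
W = round(0.5 * Fs);                % 0.5 second moving average window, in samples
x = zeros(3*Fs, 1);                 % 3 second thresholded signal (0 or 1)
x(1001:1400) = 1;                   % hypothetical packet lasting 0.4 seconds

avg = conv(x, ones(W, 1) / W);      % moving average as a full convolution
measured  = sum(avg > 0.001) / Fs;  % duration above a low threshold, in seconds
corrected = measured - 0.5;         % subtract the window length to undo the broadening
is_physiological = corrected > 0.3; % physiological test from the text
```

With a threshold close to zero, the averaged packet is wider than the original by almost exactly one window length, which is why subtracting 0.5 seconds recovers a duration close to the true 0.4 seconds. A higher threshold would cut into the sloping edges and reduce the broadening.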
Table 4.3.1
Type of noise                            Artifact   Breathing   Breathing   Breathing   Apnoea
Longest noise (seconds)                  0.7146     1.2307      1.0388      0.0007      1.0044
Corrected longest noise (seconds)        0.2146     0.7307      0.5388      -0.4993     0.5044
Total duration of the sample (seconds)   11         5           13          3.5         9
Result                                   PASS       PASS        PASS        FAIL        PASS
From this we can conclude that 3.5 seconds is too short an interval over which to calculate
the median and expect it to yield an appropriate adaptive threshold, as a sample that short
may be dominated by a respiratory noise. A sample length of 10 seconds was therefore used
for the median calculation.
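A median-based threshold over ten second segments can be sketched as follows. The rule used here (a floor of 600 when the median is below 100, otherwise five times the median rounded to the nearest 10) is taken from the code in appendix A.4; the sample rate, noise floors and breath amplitudes are hypothetical values chosen for illustration.

```matlab
Fs = 100;                                   % assumed sample rate in Hz, kept small
seg_len = 10 * Fs;                          % ten second segments, as in the text
quiet = repmat([50; 70], seg_len/2, 1);     % hypothetical low noise floor
quiet(301:400) = 1500;                      % hypothetical shallow breath
loud = repmat([700; 740], seg_len/2, 1);    % hypothetical high noise floor
loud(601:700) = 6000;                       % hypothetical loud breath
wave = [quiet; loud];                       % rectified (absolute value) signal

flags = false(size(wave));                  % samples above the adaptive threshold
thresholds = zeros(2, 1);
for s = 1:2
    idx = (s-1)*seg_len + (1:seg_len);      % samples belonging to segment s
    med = median(wave(idx));
    if med < 100
        thresholds(s) = 600;                % floor used in appendix A.4
    else
        thresholds(s) = round(med*5/10)*10; % five times the median, to the nearest 10
    end
    flags(idx) = wave(idx) >= thresholds(s);
end
fixed_flags = wave >= 3000;                 % lowest fixed threshold, for comparison
```

In this example the adaptive rule flags the breath in both segments, while the fixed threshold of 3000 misses the breath in the quiet segment entirely.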
Results of applying an adaptive threshold with a low energy threshold
Table 4.3.2
Patient No.   Detected events   Confirmed events   Patient report
14            650               60                 368
18            651               105                128
From the results above we see that the approach did not catch as many true positives in
sample 14 as it did in sample 18.
Both samples over-reported considerably in the detected events column.
Comparing the detected events with the associated confirmed events, it becomes evident that
the filter records a large number of false positives relative to true positives, even for
sample 18.
Dividing the confirmed events by the corresponding patient report value and multiplying by
one hundred gives the percentage of the apnoeas captured by the program.
There are several reasons for the capture of 16% in sample 14 compared with the much
larger 82% capture of apnoeas in sample 18.
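The percentages quoted above follow directly from table 4.3.2. The short calculation below reproduces them; the variable names are illustrative, and the second ratio (confirmed events as a share of detected events) is an additional figure derived from the same table rather than one reported in the text.

```matlab
patients  = [14; 18];
detected  = [650; 651];     % detected events from table 4.3.2
confirmed = [60; 105];      % confirmed events from table 4.3.2
reported  = [368; 128];     % apnoeas in the patient reports

capture_pct = 100 * confirmed ./ reported;   % share of reported apnoeas captured
confirm_pct = 100 * confirmed ./ detected;   % share of detections that were confirmed

fprintf('Patient %d: capture %.1f%%, confirmed detections %.1f%%\n', ...
        [patients capture_pct confirm_pct]');
```

The capture values round to 16% and 82%. The confirmed share of detections is roughly 9% for sample 14 and 16% for sample 18, which is consistent with the large number of false positives noted above.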
Samples 14 and 18 are both audio data taken during sleep studies; the audio quality, however,
is quite different in the two.
During the sleep studies, the medical equipment listed in section 3.1 generates very
loud fan noises. These noises can be of sufficient packet energy and duration to mislead the
program.
In sample 14 the fan noises are quite dominant, whereas sample 18 represents an environment
closer to what a person would have at home, although it still has a few regions dominated
by fan noise.
As such loud fan noises are not typical of a home environment, the usefulness of the
adaptive threshold method is best exemplified by the results for sample 18.
A suggested method to bring this system to a fully operational standard as a home-use
apnoea detector for screening purposes would be to detect an attempt by the body to
return to normal breathing after an apnoea. This would be characterized by an increase in
packet duration for a short period of time after the apnoea. The idea is illustrated in
figure 4.3.1.
Figure 4.3.1: Example of respiratory effort increase
Figure 4.3.1 shows an increase in the slope of the duration of respiratory events with time
after an apnoea. As the packet duration represents events with energy suitable to be
considered respiratory, an increase in its slope would indicate an increase in respiratory
flow and an effort to return to normal breathing.
A drawback to this approach would arise if the patient or subject took in a breath, or a
hypopnea occurred, and no further packets were detected, as shown at the 21 second mark of
figure 4.3.1.
A method of detecting hypopnea has been suggested and validated in the literature
referenced below. It hinges on a similar principle: just as apnoea is typically followed by
an attempt to return to normal breathing, or at least a sustained increase in respiratory
effort as shown in figure 4.3.1, hypopnea is preceded by a decrease in respiratory effort [1].
Figure 4.3.2: Example of respiratory effort decrease
Figure 4.3.2 shows the decrease in respiratory effort over time preceding hypopnea; this is
represented by the decline, or falling slope.
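One way the slope test suggested above might be sketched is with a straight-line fit to packet durations against time. This is not part of the project code and is not the method validated in [1]; the times and durations below are hypothetical values chosen only to show the sign of the fitted slope.

```matlab
% Hypothetical packet durations (seconds) at the times (seconds) they were detected
t_after    = [1 4 8 12 16];
dur_after  = [0.35 0.45 0.60 0.70 0.85];   % lengthening packets after an apnoea
t_before   = [1 4 8 12 16];
dur_before = [0.90 0.75 0.60 0.50 0.35];   % shortening packets before a hypopnea

% First-order polynomial fit: the first coefficient is the slope
p_after  = polyfit(t_after,  dur_after,  1);
p_before = polyfit(t_before, dur_before, 1);

effort_rising  = p_after(1)  > 0;          % rising slope: return to normal breathing
effort_falling = p_before(1) < 0;          % falling slope: decreasing respiratory effort
```

A practical detector would also need a minimum number of packets and a minimum slope magnitude before drawing any conclusion; both are left open here.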
5. Conclusions
As the experimentation and its results show, it is not feasible to apply a fixed
threshold to screen for apnoea when the intensity profile of the sample is not known before
the test is carried out. Various regions of a single night-time audio sample can have very
different white noise intensity profiles, which results in features of interest being buried
beneath the white noise or amplified to varying degrees. An adaptive system whose threshold
parameter is set by the white noise intensity profile itself can overcome this.
A method employing an adaptive threshold was designed and shown to capture 82% of the
apnoeas in a sample whose audio quality was close to that of a home environment.
The adaptive system brings out the information related to respiratory effort. From there,
methods suggested and supported by the literature were reported that could bring this system
of filtering to the stage of detection, thereby significantly reducing the number of false
positives, hence reducing the offset to the detection and potentially raising the accuracy
of the system.
This adaptive approach is, however, prone to fan noises and resonant artifacts; sample 14
showed just how offsetting these factors can be to the detection software.
For at-home use, however, if an application were designed on the basis of this software,
a message could be displayed telling the user to turn off all devices that emit fan noise,
thereby avoiding the problem.
The methods used in this project comprised only lightweight functions that were not
computationally intensive. This was done because typical mobile devices have fewer
computational resources than a desktop computer. While only the time domain was
manipulated in this study, yielding reasonably good results, further investigation in
the frequency domain and/or the time domain would present reasonable prospects in the
pursuit of apnoea detection from audio signals.
6. References
[1] Alshaer H, Fernie GR, Maki E, Bradley TD. Validation of an automated algorithm
for detecting apneas and hypopneas by acoustic analysis of breath sounds. Sleep
Med [Internet]. Elsevier B.V.; 2013 Jun [cited 2014 Apr 24];14(6):562–71. Available
from: http://www.ncbi.nlm.nih.gov/pubmed/23453251
[2] Ankishan, H. & Ari, F. Snore-related sound classification based on time-domain
features by using ANFIS model. Innov. Intell. Syst. … 441–444 (2011). at
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5946113
[3] Yadollahi A, Montazeri A, Azarbarzin A, Moussavi Z. Respiratory flow-sound
relationship during both wakefulness and sleep and its variation in relation to sleep
apnea. Ann Biomed Eng [Internet]. 2013 Mar [cited 2014 Apr 24];41(3):537–46.
Available from: http://www.ncbi.nlm.nih.gov/pubmed/23149903
[4] Ng AK, Koh TS, Abeyratne UR, Puvanendran K. Investigation of obstructive sleep
apnea using nonlinear mode interactions in nonstationary snore signals. Ann Biomed
Eng [Internet]. 2009 Sep [cited 2014 Apr 24];37(9):1796–806. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/19551511
[5] Pevernagie D, Aarts RM, De Meyer M. The acoustics of snoring. Sleep Med Rev
[Internet]. Elsevier Ltd; 2010 Apr [cited 2014 Apr 3];14(2):131–44. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/19665907
[6] Fiz JA, Abad J, Jané R, Riera M, Mañanas MA, Caminal P, et al. Acoustic analysis of
snoring sound in patients with simple snoring and obstructive sleep apnoea. Eur
Respir J [Internet]. 1996 Nov 1 [cited 2014 Apr 24];9(11):2365–70. Available from:
http://erj.ersjournals.com/content/9/11/2365
A. Appendices
A.1 Risk assessment
Physics 4th year project RISK ASSESSMENT form
Student Name: Muhammad Alli
Student Number: 10322935
Project Title: Acoustic analysis of Sleep Apnoea and Hypopnea events in night-time respiratory signals
Main Project Working Location: N229 4th year physics laboratories
Supervisor Name: Prof. Kevin McGuigan
Brief Listing of All Risks Associated with Project (bullet point format)*:
Theoretical/computational: “none”.
*3 categories of project are envisaged:
1 – Theoretical/computational projects – in these cases you can write “None” above and proceed to the
signature page (but be sure to confirm this with your supervisor and remember that there can be manual
handling risks even with lifting computers, printers, monitors in addition to ergonomic issues such as
seating etc.).
2 – Experimental work where no very significant or meaningful risks are identifiable, e.g. if one is
working with very standard “off the shelf” equipment modules, with no accessible high voltages, radiation
exposure dangers, manual handling, chemical, physical or other potential hazards – in these cases you
should list above the potential risks and the safety feature which renders them not significant (e.g. all high
voltages such as mains voltage fully enclosed in insulating or earthed container, all laser radiation fully
enclosed in opaque container etc.) above and proceed to the signature page – but be sure to confirm this
with your supervisor.
3 – Experimental work where some significant or meaningful risks are identifiable – in these cases you
should list above the potential risks above. Each risk listed above should also have a risk assessment
page of the type on the next page of this document completed. This page should also include controls and
precautions and other measures to reduce the risk level and these sections must be completed and then
proceed to the signature page – but again be sure to confirm this with your supervisor.
A.2 Code for Fixed threshold
%% Here we load in the sample with audioread
[wave,Fs]=audioread('E:\final year project\raws\sample14_1hr15min_8hr30min.wav','native');
%%
% we take the absolute values of the audio data so we do not have to apply a
% duplicate of the threshold with the negative sign
wave=abs(wave);
Number_of_samples=length(wave);
%% find possible events
last_secound=length(wave)-Fs;
% arrays are pre-allocated; this lets MATLAB run faster as it does not have to
% change the dimensions of the array as the dataset grows
event_end_times=zeros(1000,1);
event_durations=zeros(1000,1);
event_start_times=zeros(1000,1);
Sample_counter=0;
x=0;
%% threshold applied
for i=1:Number_of_samples
    if wave(i)<3000
        % below the intensity threshold: extend the current silence
        Sample_counter=Sample_counter+1;
        Sample_counter2=Sample_counter;
    else
        % threshold broken: keep the silence length and reset the counter
        Sample_counter2=Sample_counter;
        Sample_counter=0;
        % if the silence preceding the break lasted >=10 and <90 seconds then it is
        % recorded as an apnoea
        if (Sample_counter==0)&&((Sample_counter2/Fs)>=10)&&((Sample_counter2/Fs)<90)
            x=x+1;
            event_end_times(x,1)=i/Fs;
            event_durations(x,1)=Sample_counter2/Fs;
            event_start_times(x,1)=(i/Fs)-(Sample_counter2/Fs);
        end
    end
end
%%
A.3 Code for audio playback and result verification
%% here we load the recorded events from Excel files into the verification tool to play
%% them back and display the graphs; the number of events is also loaded in
Number_of_events=xlsread('meta18N.xls');
event_end_times2=xlsread('meta18R.xls');
% the sample rate is read from the file header so that sample indices can be
% computed before any audio is loaded
audio_file='E:\final year project\raws\sample14_1hr15min_8hr30min.wav';
info=audioinfo(audio_file);
Fs=info.SampleRate;
replay=0;
log_true_results=zeros(1000,1);
% here we load up and play samples for the user to verify
for i=1:1000000
    replay=0;
    % the number of events is displayed
    Number_of_events
    % the user is prompted to enter the event they would like to hear and display
    N='type input sample number you want to play? ';
    N=input(N)
    lets_end_operation=0;
    % a sample is loaded such that the 10 second slice before the end of the apnoea
    % can be heard, and the 5 seconds after it; the range is clamped to the file
    strt=max(1,round((event_end_times2(N,1)-10)*Fs));
    edn=min(info.TotalSamples,strt+(15*Fs));
    % the sample is loaded in
    [wave2,Fs]=audioread(audio_file,[strt edn]);
    % the time axis is set up so the graph can be displayed
    t=0:1/Fs:(length(wave2)-1)/Fs;
    plot(t,wave2)
    xlabel('Time')
    ylabel('Intensity')
    sound(wave2,Fs);
    pause(2);
    % built-in replay function asks the user if they would like to replay the sample
    if replay==0
        replay='to replay press 1 , else hit 0 : ';
        replay=input(replay)
        if replay==1
            sound(wave2,Fs);
        end
    end
    % the program asks the user if the event was an apnoea
    true_or_false='press 1 if apnoea, else press zero:';
    log_true_results(N,1)=input(true_or_false);
    number_of_true_events=sum(log_true_results);
    if lets_end_operation==0
        % the program asks the user if they would like to exit
        lets_end_operation='press 1 to end :? ';
        lets_end_operation=input(lets_end_operation)
    end
    if lets_end_operation==1
        % if the user would like to exit, the results are stored so they can pick up later
        xlswrite('saved results',log_true_results)
        break
    end
end
A.4 Code for Adaptive threshold
%the sample is loaded in
[wave,Fs]=audioread('F:\final year project\raws\sample_20_09_2013.wav','native');
% the absolute value of the samples replaces the original values
% a means of processing the sample in ten second segments is put into place
% various arrays are pre-allocated to speed up processing
wave=abs(wave);
physiological_funtion_master=0;
suppression_master=0;
x=0;
Sample_counter=0;
event_end_times=zeros(1000,1);
size=length(wave);
size=size/Fs;
size=size/10;
size=round(size)-1;
%% keep in mind the cutoff points; the constants below assume Fs = 44100,
%% so that 441000 samples make up one ten second segment
enpoint_hyper=size*10*44100;
size_Arrayend=1:size;
size_Arrayend=size_Arrayend*441000;
size_Arraystart=(size_Arrayend-440999);
display('initialisations complete')
threshold=0;
%the adaptive threshold is applied here to each ten second segment
for i=1:size
start_load= size_Arraystart(i);
end_load=size_Arrayend(i);
sampleholder= wave(start_load:end_load);
sampleholder=single(sampleholder);
med=median(sampleholder);
if med <100
threshold=600;
end
if med>=100
threshold=roundn(med*5,1);
end
sampleholder= sampleholder >= threshold ;
sampleholder=int8(sampleholder);
wave(start_load:end_load)=sampleholder;
end
clear size_Arrayend;
clear size_Arraystart;
clear sampleholder;
clear start_load;
clear end_load;
wave= wave>= 1;
% the bit representation is cut down to 8bits as only 1's and 0's remain
wave=int8(wave);
clear i;
clear med;
clear threshold;
%various arrays are pre-allocated to speed up processes
wavhold=zeros(143326,1);
wavholdx=zeros(143326,1);
wavhold=int8(wavhold);
wavholdx=int8(wavholdx);
number_of_samples=length(wavhold);
endpoint_of_avg= number_of_samples-11025;
endsum=number_of_samples-22050;
last_loop_iteration=endpoint_of_avg-1;
display('preprocessing done')
%% pre-processing done, adaptive threshold has been applied, now we use a moving average
%% to reconstruct features of interest as we need them
matlabpool open % in MATLAB R2013b and later, parpool replaces matlabpool
for i= 1: enpoint_hyper
%%here we see if a number breaks the threshold of 1
if i> suppression_master
if wave(i)==0
Sample_counter=Sample_counter+1;
Sample_counter2=Sample_counter;
else
Sample_counter2=Sample_counter;
Sample_counter=0;
end
end
if (i< suppression_master)
Sample_counter=Sample_counter+1;
Sample_counter2=Sample_counter;
end
%%first point
if (Sample_counter==0)&&(i>11025)
startpatx=i-11025;
endpatx=i+132300;
wavholdx=wave(startpatx:endpatx);
wavhold(11025,1)=(sum(wavholdx(1:22050)));
wavhold(11025,1)=wavhold(11025,1)/22050;
%%endpoint
wavhold(endpoint_of_avg,1)=(mean(wavholdx(endsum:number_of_samples)));
wavhold=int16(wavhold);
% here the moving average is applied and parallel processed
parfor h1=11026:last_loop_iteration
j=h1-11025;
k=h1+11025;
wavhold(h1,1)=int16((sum(wavholdx(j:k))));
end
wavhold=wavhold/22.05 ;
%% second logical operator applied, this is the energy threshold that corrects for
%% broadening
wavhold=wavhold>5;
wavhold=int16(wavhold);
%% cumulative sum applied this gives us the information related to packet duration
hold=0;
for h2=2:number_of_samples
wavhold(h2)=(hold+wavhold(h2))*wavhold(h2);
hold=wavhold(h2);
end
%% physiological test, if the packet is >0.3 seconds in length it is considered physiological
testP=(max(wavhold)/Fs);
if testP>0.3
physiological_funtion_master=1;
else
physiological_funtion_master=0;
end
if physiological_funtion_master==1
suppression_master=i+132300;
Sample_counter2=Sample_counter2+132300;
end
if physiological_funtion_master==0
Sample_counter=Sample_counter2+1;
suppression_master=i+132300;
end
%%here we log the events if the silence lasted >=10 and <=90 seconds
if (Sample_counter == 0)&&((Sample_counter2/Fs)>=10)&&(((Sample_counter2/Fs)<=90))
x=x+1;
event_end_times(x,1)=i/Fs;
end
end
end
%% free up memory
matlabpool close % in MATLAB R2013b and later, delete(gcp('nocreate')) closes the pool
dispNumber_of_events=x;
clear wave;
clear Sample_counter;
clear Sample_counter2;
clear endpatx;
clear endpoint_of_avg;
clear endsum;
clear enpoint_hyper;
clear h2;
clear hold;
clear i;
clear last_loop_iteration;
clear number_of_samples;
clear physiological_funtion_master;
clear size;
clear startpatx;
clear suppression_master;
clear testP;
% here we save the results into an Excel sheet to call them into the
% verification tool when needed
xlswrite('meta20N',x)
xlswrite('meta20R',event_end_times)
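The windowed sum computed in the parfor loop of A.4 can also be obtained without a loop, as the difference of two cumulative sums. The sketch below is an alternative offered for comparison, not the code used in the project: the test signal and the half-width of 10 samples are hypothetical stand-ins (the half-width in A.4 is 11025 samples), and the loop shown is a plain for loop written in the style of the original.

```matlab
x = double(mod(1:200, 7)' < 3);     % hypothetical binary signal of 0s and 1s
H = 10;                             % half-width of the window in samples (assumed)
N = numel(x);
centres = (H+1):(N-H);              % positions where the full window fits

% Loop version, in the style of the parfor loop in A.4
loop_sum = zeros(N, 1);
for h = centres
    loop_sum(h) = sum(x(h-H:h+H));
end

% Vectorised version: sum(x(a:b)) equals c(b+1) - c(a) when c = [0; cumsum(x)]
c = [0; cumsum(x)];
fast_sum = zeros(N, 1);
fast_sum(centres) = c(centres + H + 1) - c(centres - H);

same = isequal(loop_sum, fast_sum); % true when both versions agree exactly
```

Because the inputs are 0s and 1s held as doubles, both versions give identical integer results. The cumulative-sum form does a fixed amount of work per sample regardless of window length, which may matter on the lower-powered devices discussed in the conclusions.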