The project report is prepared for
Faculty of Engineering
Multimedia University
in partial fulfilment of the requirements for
Bachelor of Engineering
FACULTY OF ENGINEERING
MULTIMEDIA UNIVERSITY
January 2012
ECG BIOMETRIC RECOGNITION WITHOUT
FIDUCIAL DETECTION
by
Kazi Tasneem Farhan
10711118244
Session 2011/2012
The copyright of this report belongs to the author under the
terms of the Copyright Act 1987 as qualified by Regulation 4(1)
of the Multimedia University Intellectual Property Regulations.
Due acknowledgement shall always be made of the use of any
material contained in, or derived from, this report.
Declaration
I hereby declare that this work has been done by myself and no portion of the work
contained in this report has been submitted in support of any application for any
other degree or qualification of this or any other university or institute of learning.
I also declare that pursuant to the provisions of the Copyright Act 1987, I have not
engaged in any unauthorised act of copying or reproducing or attempt to copy /
reproduce or cause to copy / reproduce or permit the copying / reproducing or the
sharing and / or downloading of any copyrighted material or an attempt to do so
whether by use of the University's facilities or outside networks / facilities whether
in hard copy or soft copy format, of any material protected under the provisions of
sections 3 and 7 of the Act whether for payment or otherwise save as specifically
provided for therein. This shall include but not be limited to any lecture notes,
course packs, thesis, text books, exam questions, any works of authorship fixed in
any tangible medium of expression whether provided by the University or otherwise.
I hereby further declare that in the event of any infringement of the provisions of the
Act whether knowingly or unknowingly the University shall not be liable for the
same in any manner whatsoever and undertakes to indemnify and keep indemnified
the University against all such claims and actions.
Signature: ________________________
Name: Kazi Tasneem Farhan
Student ID: 1071118244
Date: 9th January 2012
Acknowledgement
Firstly, I would like to thank Allah; I believe that without His blessing and mercy I
would not have been able to come this far. I would then like to thank my parents for
their tremendous prayers and moral support, and for believing in me.
I would like to express my sincerest thanks to my supervisor, Dr. Khazaimatol S.
Subari. Her tremendous support and guidance, together with her friendliness,
co-operation and kindness, have helped me greatly throughout my work. Once again
I would like to thank Miss Shima for being so patient with me.
I would also like to mention research officer Syed Syahril and express my gratitude
towards him, because he has been very helpful and kind. I would like to thank a
senior and Master's researcher, Mr. Rameshwor Prasad Shah, who has been
assisting and guiding me in my research work.
Not to forget my friends, who have also been a tremendous support throughout the
duration of this research.
Abstract
The structure and functions of the heart are studied; this gives an idea of how the
heart works in the human body. The ECG signal is studied by examining its features
and how they are related to the heart. The method of extracting the ECG signal from
the heart, and the various forms of obtaining it, are also studied because they help in
analysing the signal.
The various types of noise in the ECG and the different methods of filtering the
signal are studied. A database of signals is collected and pre-processed using a
chosen filter and window to obtain a clean, sharp signal. An autocorrelation-based
feature extraction method is used to extract features from the signal, and principal
component analysis is used to reduce the dimension of the features obtained from
the autocorrelation.
The features obtained are put through a classification stage using a neural network.
The network uses a pattern recognition tool with a feed-forward back-propagation
network and a training function called scaled conjugate gradient to classify the
features obtained from each signal.
The scope of the project covers 35 subjects. A set of 10 trials of signals is collected
from every individual; 9 of the 10 trials are used to train the network and the
remaining trial from each individual is used to test the network.
The results give a recognition rate of 95%, which indicates a good recognition
system. It is found that the ECG is unique to every individual, so the ECG is a good
feature to use in a biometric system.
Table of Contents
Declaration...................................................................................................................ii
Acknowledgement..................................................................................................... iii
Abstract.......................................................................................................................iv
Table of Contents ........................................................................................................v
List of Figures............................................................................................................vii
List of Tables........................................................................................................... viii
List of Abbreviations..................................................................................................ix
CHAPTER 1: INTRODUCTION TO ELECTROCARDIOGRAM SIGNAL ...........1
1.1 Heart Structure...................................................................................................1
1.1.1 Surfaces and layers within the heart...........................................................2
1.1.2 Major heart structures.................................................................................2
1.1.3 Systemic and pulmonary circulation .......................................3
1.2 ECG ...................................................................................................................4
1.2.1 Limb leads and Augmented limb leads ......................................................7
1.2.2 Precordial Leads .......................................................................................10
1.2.3 ECG signal labels .....................................................................................11
1.2.4 ECG sampling frequency .........................................................................12
1.2.5 ECG noise and artifacts............................................................................13
1.2.6 ECG applications......................................................................................14
CHAPTER 2: PREVIOUS WORK AND PROPOSED METHODOLOGY............16
2.1 Previous Work.................................................................................................16
2.2 Proposed methodology ....................................................................................18
2.2.1 Pre-processing ..........................................................................................18
2.2.2 Feature Extraction.....................................................................................19
2.2.3 Principal Component Analysis .................................................20
2.2.4 Template matching ...................................................................................21
CHAPTER 3: PRE-PROCESSING AND FEATURE EXTRACTION
METHODOLOGY....................................................................................................23
3.1 Experimental Setup .........................................................................................24
3.2 Methodology used ...........................................................................................25
3.2.1 Pre-processing ..........................................................................................25
3.2.2 Autocorrelation based feature extraction..................................................30
CHAPTER 4: CLASSIFICATION ...........................................................................32
4.1 Neural Network ...............................................................................................32
4.1.1 Pattern Recognition Network ...................................................................33
4.1.2 Feed-forward Back propagation network.................................................34
4.2 Process of Classification..................................................................................36
4.2.1 Feature Set................................................................................................36
4.2.2 Training the network ................................................................................37
4.2.3 Testing the network ..................................................................................40
CHAPTER 5: DISCUSSION ....................................................................................43
CHAPTER 6: CONCLUSION..................................................................................46
6.1 Future work and recommendation...................................................................46
References .................................................................................................................47
Appendix A ...............................................................................................................49
Matlab Code ..........................................................................................................49
List of Figures
Figure 1-1: (a) Anterior view showing surface features, (b) Posterior view.....................................................3
Figure 1-2: Willem Einthoven..............................................................................................................................4
Figure 1-3: An ECG signal obtained from one heart beat................................................................................4
Figure 1-4: Types of ECG electrodes and its placements on the body.............................................................5
Figure 1-5: Circuit used to collect ECG signal ...................................................................................................7
Figure 1-6: The top three are Limb leads and the bottom three are Augmented limb leads [1]...................7
Figure 1-7: ECG signal from the Limb leads [2]................................................................................................8
Figure 1-8 ECG signal from Augmented Limb leads [2] ...................................................................................9
Figure 1-9: Einthoven’s Triangle representation of Limb leads and Augmented limb leads [2] ...................9
Figure 1-10 Position of placement of Precordial leads.....................................................................................10
Figure 1-11 ECG signal from the six Precordial leads [2] ...............................................................................10
Figure 1-12: Noisy Signal (Top) and Filtered Signal (bottom) ........................................................................14
Figure 2-1: Noisy signal (Top) and Filtered signal (bottom) ...........................................................................18
Figure 2-2: Different fiducial points of an ECG signal ....................................................................................19
Figure 3-1: Process flow diagram of the steps involved in Biometric system.................................................23
Figure 3-2: Position of electrode placement to collect ECG data....................................................................24
Figure 3-3: Noisy ECG signal ............................................................................................................................26
Figure 3-4: Normalised noisy ECG signal.........................................................................................................26
Figure 3-5: Noisy ECG signal in frequency domain.........................................................................................27
Figure 3-6: Filtered normalised ECG signal.....................................................................................................27
Figure 3-7: Filtered ECG signal in frequency domain.....................................................................................28
Figure 3-8: ECG from 3 subjects.......................................................................................................................28
Figure 3-9: Hamming Window ..........................................................................................................................29
Figure 3-10: Filtered ECG signal after windowing..........................................................................................29
Figure 3-11: Autocorrelated data ......................................................................................................................30
Figure 3-12: 400 AC points from the maximum to the right...........................................................................31
Figure 3-13: PCA Hotelling’s T² statistic data from 3 subjects.......................................................................31
Figure 4-1: Process flow diagram of a Neural Network [10] ...........................................................................32
Figure 4-2: A portion of the target matrix for training ...................................................................................34
Figure 4-3: Feed Forward Network...................................................................................................................35
Figure 4-4: Neural Network Training tool........................................................................................................38
Figure 4-5: Performance graph of the network................................................................................................39
Figure 4-6: Training state graph of gradient and validation checks...............................................................40
Figure 4-7: Confusion matrix with test results of 20 subjects .........................................................................41
List of Tables
Table 1-1: Chart on position of placement of electrodes [2]..............................................................................6
Table 2-1: Results from using PCA and LDA [9].............................................................................................17
Table 4-1: Subjects assigned to classes..............................................................................................................37
List of Abbreviations
ECG Electrocardiogram
PCA Principal Component Analysis
DCT Discrete Cosine Transform
AC Autocorrelation
BP Back-propagation
NN Neural Network
SCG Scaled Conjugate Gradient
SAN Sinoatrial Node
AVN Atrioventricular Node
LA Left atrium
LV Left ventricle
RA Right atrium
RV Right Ventricle
aVL Augmented vector left
aVR Augmented vector right
aVF Augmented vector foot
RL Right leg
LL Left leg
RR R to R peak
AP Action potential
RBFNN Radial basis function neural network
LDA Linear discriminant analysis
MSE Mean squared error
MLP Multi-layer perceptron
CHAPTER 1: INTRODUCTION TO
ELECTROCARDIOGRAM SIGNAL
Security devices and technology must advance rapidly, since new methods of
falsification appear all the time; innovation in this field is of utmost importance to
stay ahead of the game. Biometrics is one way of providing such protection.
Biometrics refers to certain physical, biological and behavioural characteristics of a
human being. Many features are used as biometric modalities: physical traits include
the face and iris, while behavioural traits include gait and keystroke dynamics. The
process of using these features, unique to every human being, is carried out through
computing and signal processing.
The problem that remains with the features mentioned above is that they can be
falsified. In the last few years many researchers have suggested the use of the
electrocardiogram (ECG) for biometric recognition. The ECG is a signal that
represents the electrical activity of the heart, and this measure has been found to be
unique for every individual. It is therefore thought to be a promising modality in the
biometric field.
A great deal of research has now been done in this field. Most studies examine the
ECG signal from various aspects in order to find the best procedure for using the
ECG in a biometric system. In this particular project, several combinations of
aspects of the ECG signal are taken into consideration, and the identification process
is carried out using these aspects, or features.
1.1 Heart Structure
In order to study the signal it is important to study the biology of the heart; only by
understanding it can we gain a good knowledge of the ECG signal. This first chapter
therefore gives a thorough description of the heart's structure and elaborates the
method used to acquire the ECG signal.
1.1.1 Surfaces and layers within the heart
The human heart rests in the thoracic cavity of the human body. The wide superior
portion of the heart, from which the great vessels emerge, is the base of the heart;
the inferior end, pointing to the left, is the apex. The heart is tilted at an angle so
that the inferior surface rests against the diaphragm, with two thirds of it to the left
of the sternum.
1.1.2 Major heart structures
The heart is divided into two sides: the left and the right. Each side has an upper and
lower chamber. So there are a total of four chambers in a human heart. The upper
chamber is referred to as an atrium and the lower as a ventricle. The apex is formed
by the tip of the left ventricle and the two atria form the base of the heart. The four
chambers are marked on their boundaries by shallow grooves called sulci.
The wall of the heart is composed of cardiac muscle tissue, which has its own blood
supply and circulation, the coronary circulation. The heart is surrounded by
coronary blood vessels. The right and left coronary arteries, found on the anterior
surface of the heart, supply blood when the ventricles are resting; the cusps of the
aortic valve cover the coronary artery openings when the ventricles contract.
The great veins of the heart, such as the superior vena cava, inferior vena cava and
coronary sinus, return oxygen-poor blood to the right atrium.
The great arteries carry blood away from the ventricles. The pulmonary trunk
divides into the right and left pulmonary arteries, which carry blood to the lungs
where it is oxygenated. This oxygen-rich blood is returned to the left atrium through
the right and left pulmonary veins, and then passes to the left ventricle, from which
it is pumped into the large aorta. The aorta then distributes the blood to the systemic
circulation.
Figure 1-1: (a) Anterior view showing surface features, (b) Posterior view
1.1.3 Systemic and pulmonary circulation
There are two types of circulation: pulmonary and systemic. The function of the
pulmonary circulation is to carry blood from the right ventricle to the lungs and
back to the left atrium. The systemic circulation works in the same way, except that
it serves the body tissues instead of the lungs. Both circulations involve arteries,
capillaries and veins, and both start from the heart and return to it in the end.
In short, the blood returning to the heart through the great veins is received by the
thin-walled atria. The thick ventricular walls contract simultaneously to send blood
from the right ventricle into the pulmonary circulation and from the left ventricle
into the systemic circulation.
1.2 ECG
ECG is short for electrocardiogram. The term was introduced by Willem Einthoven,
who won the Nobel Prize in 1924 for developing the mechanism of the
electrocardiogram [1].
Figure 1-2: Willem Einthoven
Mainly, this process digitally records the electrical activity of the heart muscle
tissue at the cell level. It causes no pain or physical discomfort to the patient on
whom it is performed. Several electrodes can be placed on the skin, making several
simultaneous aspects of the spatial phenomenon accessible; these are known as
electrocardiographic leads. This electrical record helps to detect and diagnose heart
abnormalities, and thickening of or damage to the heart muscle, before or during
heart attacks. Electrodes are placed on the human body to receive the ECG signals.
Lead systems range from orthogonal 3-lead configurations to highly redundant
body-surface systems of as many as 80 or 120 leads [1].
An ECG signal representing a single heart beat is shown below in figure 1-3.
Figure 1-3: An ECG signal obtained from one heart beat
Figure 1-4: Types of ECG electrodes and its placements on the body
The standard method of obtaining the electrical activity of the heart uses the
12-lead electrode system. The position of a lead with reference to the position of the
heart determines the wave's morphology and amplitude: a wave can rise, fall or be
invisible depending on this spatial orientation. The leads in this system cover two
orthogonal planes: the frontal and the transversal. The frontal plane is the vertical
plane made up of the three bipolar limb leads (I, II and III) and the three augmented
unipolar leads (aVL, aVR and aVF). The transversal plane is the horizontal plane
crossing the thorax orthogonally to the frontal plane, and consists of the six
unipolar precordial leads (V1, V2, V3, V4, V5 and V6). Each lead has its own
vector representation, giving the direction along which electric potentials are
measured.
When the wave goes above the baseline it shows a positive deflection, and when it
goes below the baseline it shows a negative deflection. A positive deflection means
that the recorded wavefront has travelled towards the electrode, and a negative
deflection means that it has travelled away from it. A summarised chart of the
electrode placement positions is given below in table 1-1.
Table 1-1: Chart on position of placement of electrodes [2]
The electrodes are placed on the chest as described above to collect the ECG data.
They are placed in this way because of the location of the heart beneath that region
of the chest, where the electrical activity involving the SAN and AVN takes place
during the cardiac cycle.
The circuit used to collect the signal is shown in figure 1-5. Zbody is the resistance
between the electrodes, while Z1 and Z2 are the lumped thoracic medium
resistances. Vecg is the bipolar scalar ECG lead voltage, measured with respect to a
separate reference potential. The foot has to be connected to ground in this process;
otherwise no signal will be obtained.
Figure 1-5: Circuit used to collect ECG signal
In the standard 12-lead electrode system there are two groups of electrode
placements: the limb and augmented limb leads, and the precordial leads. Bipolar
and unipolar are the two different types of lead used in obtaining the signals; all
leads apart from the limb leads are unipolar. The two groups of electrode placement
are described below.
1.2.1 Limb leads and Augmented limb leads
Figure 1-6: The top three are Limb leads and the bottom three are Augmented limb
leads [1]
1.2.1.1 Limb leads
In figure 1-6 the top three drawings, showing leads I, II and III, are called the limb
leads. If the three lines of the figures are put together, they form what is called
Einthoven's triangle. The leads use the electrodes in different combinations.
Lead I captures the potential difference between the negative RA electrode, which is
on the right arm, and the positive LA electrode, which is on the left arm.
Lead II captures the potential difference between the negative RA electrode, which
is on the right arm, and the positive LL electrode, which is on the left leg.
Lead III captures the potential difference between the negative LA electrode, which
is on the left arm, and the positive LL electrode, which is on the left leg.
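The three limb-lead definitions above can be sketched numerically. The electrode potentials below are hypothetical, illustrative values (not measured data); the sketch also checks Einthoven's law, which states that lead II equals lead I plus lead III at every instant:

```python
# Hypothetical instantaneous electrode potentials in millivolts at one
# moment of the cardiac cycle -- illustrative values, not measured data.
ra, la, ll = -0.2, 0.3, 0.9   # right arm, left arm, left leg

lead_I = la - ra    # positive LA minus negative RA
lead_II = ll - ra   # positive LL minus negative RA
lead_III = ll - la  # positive LL minus negative LA

# Einthoven's law: at any instant, lead II = lead I + lead III
print(abs(lead_II - (lead_I + lead_III)) < 1e-9)  # True
```

Because the law is an algebraic identity of the three pairwise differences, it holds for any choice of electrode potentials.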
Figure 1-7: ECG signal from the Limb leads [2]
1.2.1.2 Augmented Limb leads:
In figure 1-6 the three drawings on the bottom show the leads aVR, aVL and aVF,
otherwise called the augmented limb leads. They are derived from leads I, II and III.
Each augmented limb lead is created by combining two of the three limb electrodes
into a mutual negative pole, with the remaining electrode kept as the positive pole.
This combined terminal is a modification of Wilson's central terminal, known as the
Goldberger terminal.
Lead aVR which stands for “augmented vector right” places a positive electrode on
the right arm. The mutual negative pole here is a combination of the left arm
electrode and the left leg electrode and thus the signal strength of the positive
electrode on the right arm is augmented.
Figure 1-8 ECG signal from Augmented Limb leads [2]
Figure 1-9: Einthoven's Triangle representation of Limb leads and Augmented limb
leads [2]
Lead aVL which stands for “augmented vector left” places a positive electrode on
the left arm. The mutual negative pole here is a combination of the right arm
electrode and the left leg electrode and thus the signal strength of the positive
electrode on the left arm is augmented.
Lead aVF which stands for “augmented vector foot” places a positive electrode on
the left leg. The mutual negative pole here is a combination of the right arm
electrode and the left arm electrode and thus the signal strength of the positive
electrode on the left leg is augmented.
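The three augmented-lead definitions can be sketched with the same kind of hypothetical electrode potentials (illustrative values only). A useful sanity check is the known identity that the three augmented leads always sum to zero:

```python
# Hypothetical electrode potentials in millivolts -- illustrative only
ra, la, ll = -0.2, 0.3, 0.9   # right arm, left arm, left leg

# Each augmented lead: positive electrode minus the mean of the other two
aVR = ra - (la + ll) / 2   # augmented vector right
aVL = la - (ra + ll) / 2   # augmented vector left
aVF = ll - (ra + la) / 2   # augmented vector foot

# aVR + aVL + aVF is identically zero for any electrode potentials
print(abs(aVR + aVL + aVF) < 1e-9)  # True
```

The identity follows directly from the algebra: summing the three expressions cancels every electrode potential exactly.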
1.2.2 Precordial Leads
There are six leads in this category, also known as the chest leads. They are placed
on different parts of the chest as shown in figure 1-10. The leads used here are
unipolar, as mentioned before, because Wilson's central terminal is used as the
negative electrode.
Figure 1-10 Position of placement of Precordial leads
The limb and augmented limb leads together form the hexaxial reference system,
which captures the electrical activity of the heart in the frontal plane. The precordial
leads work in pairs to cover certain areas of the chest: V1 and V2, V3 and V4, and
V5 and V6 cover the anteroseptal, anteroapical and anterolateral ventricular regions
respectively. These leads are used to record signals in the horizontal plane, known
as the z-axis.
Figure 1-11 ECG signal from the six Precordial leads [2]
1.2.3 ECG signal labels
The living cells are negatively charged 90mV) when they are at rest but they are
quickly depolarized by the electrical stimulus. This change in voltage leads to the
mechanical contraction of the heart by compressing and electrically exciting the
proteins in the cells. This contraction leads to the changes in chamber volume. The
electrically activated procedure of contraction and distraction happens repeatedly to
pump blood effectively.
The group of cells located in the upper part of the right atrium, known as the
sinoatrial node (SAN), electrically initiates each heartbeat through its ability to
discharge spontaneously.
The resulting wavefront spreads along all the atrial paths and causes both atria to
contract, forcing blood into the ventricles. Electrical conduction proceeds at a speed
of about 4 m/s through the atrioventricular pathways. Because the electrical
conductivity of the atrioventricular node is low, the wavefront is delayed there by
about 120 ms; during this delay the contents of the atria are transferred through the
atrioventricular valves into the ventricles. Special conducting pathways (Purkinje
fibres) compensate for the large size of the ventricles and accelerate the propagation
of the wavefront, thus allowing efficient pumping of blood throughout the body.
Until the membrane potential recovers to about −65 mV after depolarisation, the
cells can neither receive nor transmit electrical stimuli; thus the wavefront dies out
after reaching the heart muscle tissue. The refractory period between two heartbeats
is around 200 ms.
The ECG signal represents the electrical activity of the heart over time, taken from
the surface of the human body for the different heart regions. The ECG signals here
refer to the action potential (AP) curves, with the corresponding characteristics of
the propagating wavefront at different heart regions during different stages of the
cardiac cycle. Each heartbeat is represented by a series of five principal waves
known as P, Q, R, S and T (refer to figure 1-3).
1. The P-wave shows the beginning of a new beat. It is produced by atrial
activation as a small-amplitude wave, and can be positive, negative or
biphasic (with two peaks of opposite polarity). Its duration ranges from 0.08
to 0.1 s. The section between the P wave and the QRS complex is known as
the PR interval; it ranges from 0.12 to 0.20 s and represents the time from
the start of atrial depolarisation to the start of ventricular depolarisation.
2. The QRS complex is a combination of three sharp waves formed by
ventricular activation. The initial small wave is called the Q wave
(depolarisation of the wall between the ventricles), followed by a
significantly larger wave of opposite polarity called the R wave
(depolarisation of the external wall of the left ventricle); ventricular
activation is finished by a small wave called the S wave (depolarisation of
the upper part of the ventricles). The Q-wave is negative, the R-wave is
positive, the S-wave is negative, and any further positive deflection is
labelled R′. The duration of the complex ranges from 0.06 to 0.1 s. The
idealised QRS complex is shown in figure 1-3, but in practice its shape
depends on the kind of lead being used.
3. Ventricular repolarisation is shown by a small T-wave of variable
morphology. It is occasionally followed by a small U-wave of unknown
origin, signifying the last remnants of the repolarisation process. The T-wave
can be positive, negative, bimodal or biphasic, or consist of an upward or
downward deflection only. The QT interval covers the depolarisation and
repolarisation of the ventricles and ranges from 0.2 to 0.4 seconds.
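As a minimal sketch, the normal interval ranges listed above can be encoded and used to flag an out-of-range measurement. The range values below are standard textbook figures in seconds, not measurements from the report, and the two example durations are hypothetical:

```python
# Commonly cited normal adult ECG interval ranges, in seconds
NORMAL_RANGES = {
    "P":   (0.08, 0.10),  # atrial depolarisation
    "PR":  (0.12, 0.20),  # atrial onset to ventricular onset
    "QRS": (0.06, 0.10),  # ventricular depolarisation
    "QT":  (0.20, 0.40),  # ventricular depolarisation + repolarisation
}

def is_normal(interval, duration_s):
    """Return True if the measured duration falls inside the normal range."""
    lo, hi = NORMAL_RANGES[interval]
    return lo <= duration_s <= hi

print(is_normal("PR", 0.16))   # True: a typical PR interval
print(is_normal("QRS", 0.14))  # False: a widened QRS complex
```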
1.2.4 ECG sampling frequency
The RR interval needs to be measured with a resolution of about 1 ms; a critical
error in heart rate variability analysis is likely to occur if the timing resolution is
coarser than this threshold. This is why Timo Bragge et al. suggested that the ECG
sampling frequency should be at least 500 Hz.
The normal heart rate ranges between 60 and 100 bpm, and the fundamental
heart-rate frequency ranges between 0.50 and 3.0 Hz for heart rates of 30 to 180
bpm. The typical value for the highest signal frequency is about 125 Hz, depending
on the age, sex and health of the person, although in paediatric ECG processing it
sometimes rises to 150 Hz. The frequency ranges of some ECG applications are
given below.
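The mapping between heart rate in bpm and the fundamental frequency in Hz quoted above is simply a division by 60, as a one-line sketch shows:

```python
def heart_rate_hz(bpm):
    # beats per minute -> beats per second (fundamental frequency)
    return bpm / 60.0

print(heart_rate_hz(30))   # 0.5 (Hz), lower end of the quoted range
print(heart_rate_hz(180))  # 3.0 (Hz), upper end of the quoted range
```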
Modern microprocessor-based interpretive machines contain eight amplifiers; they
sample and store eight leads simultaneously (I, II and V1-V6), and the remaining
four leads (III, aVR, aVL and aVF) are then derived and stored. These machines
have enough memory to store all the leads for a 10-second interval at a sampling
rate of 500 samples per second. The clinical bandwidth ranges from 0.05 to 100 Hz.
For arrhythmia detection the upper limit is halved, so the range for Intensive Care
Unit patients is 0.50 to 50 Hz. A heart rate meter eliminates any non-QRS waves,
such as the P and T waves, from the ECG signal; its bandpass filter is therefore
centred at 17 Hz. Late-potential measurements capture the small events that occur
after the QRS complex, so the frequency range here extends up to 500 Hz.
1.2.5 ECG noise and artifacts
An ECG signal is often corrupted during collection by external sources of
noise. These external artifacts may be artificial or biological in nature. The
artificial sources of noise include power line interference, instrumentation
noise, electrosurgical noise, impulse noise and electrostatic potentials. The
biological artifacts come from motion and muscle activity and from baseline
drift due to respiration; the most significant of these are the motion and
muscle artifacts. (Tompkins, W.J., 1993)
The noise in the signal includes both high frequency and low frequency
components. The high frequency components include noise such as power line
interference, while the low frequency components come from baseline wander. In
order to retrieve the original signal, the artifact components have to be
filtered out. The low frequency components usually fall below 0.5 Hz and the
high frequency components are above 40 to 100 Hz. There are numerous filters
which can remove these noises. One example is an FIR band-pass filter with
cut-off frequencies at 0.5 Hz and 40 Hz, which only allows frequencies between
the cut-offs to pass through. When selecting the cut-off frequencies it must be
ensured that the signal does not get noticeably distorted. Muscle noise remains
a problem because it overlaps with the actual PQRST of the ECG data. The figure
below shows a filtered signal at the bottom; it can be seen that the bottom
signal is much cleaner after filtering.
Figure 1-12: Noisy Signal (Top) and Filtered Signal (bottom)
Before collecting a signal, three things need to be ensured: electrode
placement, electrode selection and skin preparation. To keep interference at a
minimum, the electrodes must be placed on smooth surfaces with minimal hair,
skin creases, bony protuberances and little muscle. The skin must be cleaned,
dried, clipped of excess hair and scrubbed to remove any dead skin cells.
1.2.6 ECG applications
There are various uses made out of the ECG signals. It can be applied in numerous
fields of study as well as applications [2]. This is because of the huge benefits that
can be obtained from the usage of ECG signals.
One of the main areas of its usage is in the medical sector. ECG is widely used in
the medical sector for determining the various conditions of the heart. It also shows
other body conditions alongside like electrolyte abnormalities. Temporal durations
of the heart's electrical phenomena and their variations over time are
parameters of primary clinical relevance. Slow variations of cycle-based
parameters are also considered clinically important. Biometrics is another
large field of study in which the ECG signal is intensively studied and used.
Increasing crime and falsification rates have led to the search for unique
traits that can be used to create a good security system. The ECG is considered
an exceptional trait which would be hard to falsify. It is good for biometric
recognition applications because its unique features, such as the relative
onsets of the various peaks, beat geometry, and responses to stress and
activity, give personalised characteristics of an individual. The ECG features
also vary according to age, sex and lifestyle (caffeine intake, exercise, and
body weight). All these features can prove to be very useful in the
biometric field. Although this intra-individual variability might cause
problems in the identification process in the long term, owing to the changes
in ECG features that come with time, this can be addressed by updating the
database of the biometric system regularly. Periodic updating also reduces the
probability of the system being falsified.
Besides the qualities that make it hard to falsify, the ECG is also robust to
environmental factors, because the heart lies highly protected deep inside the
thoracic cavity.
This is a promising field of study but it requires thorough understanding and
knowledge of the signal and its intra-individual variability features.
CHAPTER 2: PREVIOUS WORK AND
PROPOSED METHODOLOGY
The previous chapter explained the basics of an ECG signal. This chapter will
look in more detail at the kind of methodology used for ECG signal processing.
A brief description will be given of some previous studies related to this
topic, after which the proposed methodology for this project will be discussed.
A biometric recognition system has four vital segments [21]:
(i) Experimental Setup: the data acquisition method is described.
(ii) Pre-processing: the recorded signals are filtered to remove noise.
(iii) Feature Extraction: the unique features of the signals are extracted.
(iv) Classification: an individual is identified.
2.1 Previous Work
In one of the previous studies the recorded signals are filtered to remove
noise. Principal Component Analysis (PCA) is used together with Linear
Discriminant Analysis (LDA) for feature extraction and dimensionality
reduction, and a Radial Basis Function Neural Network (RBFNN)-based classifier
is used for classification. The ECG is collected using industry standard ECG
hardware: a one-channel electrocardiograph equipped with a PC interface. The
ECG is transferred via this interface to the computer and stored as a binary
file.
The project uses real-world ECG data of nine persons, with twenty complete
cardiac cycles each, collected at different times. Lead 1 from Einthoven's
triangle is used. The sampling rate of the signal is 128 Hz. The baseline drift
is removed from the recorded signals using a zero-phase digital filter with a
cutoff frequency fc of 0.5 Hz. As mentioned earlier in chapter 1, the dominant
noise results from muscle artifacts. In this paper, wavelet denoising with soft
thresholding of the wavelet coefficients is used, because the muscle artifacts
are additive and are modelled as Gaussian white noise.
After filtration the ECG signal is segmented into full cardiac cycles. This
causes the baseline between the P and T waves to vanish, because this segment
is influenced by heart rate variations. The ECG signal is repetitive and is a
random process, which is why a sequential probabilistic model known as the
Markov model is used [9]. This segmentation assigns each signal sample to a
class.
A matrix is formed where each segmented cardiac cycle is a row vector with a
finite number of elements. The columns of the matrix are the training cycles
per person, which are strongly correlated. The input data is then sorted
according to its correlation matrix, and the first 20 vectors are used in PCA.
PCA is a popular technique used to reduce the dimension of the discriminative
information to a small number of coefficients [9].
LDA is applied on the features extracted from PCA, which reduces the number of
features from 20 down to 9. The application of LDA also increases the class
discriminability.
The classification stage uses a Radial Basis Function Neural Network (RBFNN).
An RBFNN consists of a set of input nodes and a hidden neuron layer, where each
neuron has a radial basis output function centred at the mean vector of a
cluster in feature space. In the output layer the outputs of the hidden neurons
are summed using a linear output function. There are two classifiers: one which
classifies the PCA projections and another which classifies the LDA
projections. Both classifiers are realised by the RBFNN [9].
The results obtained for each person from the two feature extraction methods
are given below in table 2-1.
Table 2-1: Results from using PCA and LDA [9]
It can be seen from table 2-1 that the recognition rate with PCA-based feature
extraction is higher than with LDA; in fact some of the values under LDA are
lower than under PCA.
2.2 Proposed methodology
The recorded signals will be filtered using a Butterworth bandpass filter. The
pre-processed signal will then be used to extract fiducial or non-fiducial
features. The fiducial approach extracts temporal, angular or amplitude
features of the signal, while the non-fiducial approach extracts discriminative
information from the signal without localizing any fiducial markers. The
extracted features are then classified using a template matching method.
2.2.1 Pre-processing
A recorded ECG signal usually contains a lot of noise, so in this stage a way
has to be found to remove it. There are two types of noise: high frequency
components such as power line interference, and low frequency components such
as baseline wander. A Butterworth band-pass filter of order 6 with cutoff
frequencies at 0.05 and 40 Hz is used.
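A sketch of this pre-processing step in Python, using SciPy as a stand-in for the Matlab implementation (second-order sections are used for numerical stability at the very low 0.05 Hz edge; the test signal is synthetic, not real ECG data):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_ecg(sig, fs=256.0, low=0.05, high=40.0, order=6):
    """Zero-phase Butterworth band-pass as described above.

    sosfiltfilt runs the filter forwards and backwards, so the
    result has no phase distortion."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, sig)

# Synthetic 10 s trace: a 1.2 Hz component standing in for the ECG,
# plus 50 Hz mains interference.
fs = 256.0
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 50 * t)
filtered = bandpass_ecg(noisy, fs)
```

The 50 Hz interference lies above the 40 Hz cut-off and is strongly attenuated, while the in-band component passes through almost unchanged.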
Figure 2-1: Noisy signal (Top) and Filtered signal (bottom)
2.2.2 Feature Extraction
In order to study a signal, different features have to be extracted from it.
These extracted features can be fiducial or non-fiducial. Fiducial points refer
to the angular, temporal and amplitude features of a signal. Non-fiducial
features extract discriminative information from the ECG trace without having
to localize fiducial markers. This means non-fiducial methods extract patterns
globally, whereas fiducial methods extract information local to a single
heartbeat.
2.2.2.1 Fiducial features
The figure below shows the different kinds of fiducial features that can be
extracted from the signal. The features chosen to be extracted were the P, Q,
R, S, T, RQ, RS, RP and PT amplitudes and the RR, QRS, PT, ST, QR and RS
intervals. A thresholding method based on the Pan and Tompkins algorithm was
used to determine the P, Q, R, S and T points. From the determined points the
rest of the temporal and amplitude features were then extracted.
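The Pan and Tompkins algorithm itself involves band-pass filtering, differentiation, squaring and moving-window integration; the sketch below is only a simplified stand-in that illustrates the thresholding idea, using a generic peak finder on a synthetic trace:

```python
import numpy as np
from scipy.signal import find_peaks

def detect_r_peaks(ecg, fs=256.0):
    """Simplified R-peak detector (a stand-in for Pan-Tompkins):
    threshold at a fraction of the maximum amplitude and enforce a
    refractory distance of 0.3 s between successive peaks."""
    height = 0.6 * np.max(ecg)
    peaks, _ = find_peaks(ecg, height=height, distance=int(0.3 * fs))
    return peaks

# Synthetic train of sharp "R waves" at 1 Hz for a quick check.
fs = 256.0
t = np.arange(0, 5, 1 / fs)
ecg = np.exp(-((t % 1.0 - 0.5) ** 2) / (2 * 0.01 ** 2))
r_peaks = detect_r_peaks(ecg, fs)
rr_intervals = np.diff(r_peaks) / fs   # RR intervals in seconds
```

Once the R peaks are located, the remaining fiducial points (P, Q, S, T) can be searched for in windows around each R peak, and the amplitude and interval features listed above follow directly.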
Figure 2-2: Different fiducial points of an ECG signal
2.2.2.2 Non-fiducial features:
The pre-processed signal has to be segmented into non-overlapping windows, and
the normalised autocorrelation (AC) of every window is obtained. The AC
coefficients are further reduced using the Principal Component Analysis (PCA)
method, so the method can be called AC/PCA. When the denoised signal is
autocorrelated, the actual fiducial locations no longer need to be found. The
AC blends the samples in a window into a sequence of sums of products [4]. The
AC captures the representative and highly distinctive features of the recorded
signals. Even though the signal is non-periodic, this will still work because
the signal is highly repetitive.
Windowing is done to cut out a certain portion of the signal, and the AC is
performed on the windowed ECG. The data window of length N has to be longer
than one heartbeat. The relative distances and cycle lengths of the ECG vary
over time according to a subject's physical, mental or other state. The AC is
shift invariant, so it will gather similar features over multiple heartbeat
cycles. The normalized autocorrelation coefficients Ȓxx[m] can be computed as:

Ȓxx[m] = ( Σ_{i=0}^{N−|m|−1} x[i] x[i+m] ) / Ȓxx[0]     (1)

where x[i] is the windowed ECG for i = 0, 1, …, N − |m| − 1, x[i+m] is the
time-shifted form of the windowed ECG with a time lag of m = 0, 1, …, M − 1
(M << N), and N is the length of the windowed signal.
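Equation (1) can be sketched directly in code; the window here is a synthetic repetitive signal standing in for a windowed ECG:

```python
import numpy as np

def normalized_autocorrelation(x, M):
    """AC coefficients for lags m = 0..M-1 of a windowed signal x,
    normalised by the zero-lag value so the first coefficient is 1,
    as in equation (1)."""
    N = len(x)
    r = np.array([np.dot(x[: N - m], x[m:]) for m in range(M)])
    return r / r[0]

# Example on a repetitive test signal (stand-in for a windowed ECG).
fs = 256
t = np.arange(0, 5, 1 / fs)
x = np.sin(2 * np.pi * 1.2 * t)
ac = normalized_autocorrelation(x, M=400)
```

Because the signal repeats, the AC shows a strong peak near the lag corresponding to one cycle, which is the repetitive structure the method exploits.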
2.2.3 Principal Component Analysis
PCA uses an orthogonal transformation to convert a set of correlated variables
into a set of uncorrelated variables called the principal components. The
features extracted from the AC are run through PCA to compress the
discriminative information into a low number of coefficients. PCA is a linear
transform represented by

Y = Pᵀ X     (2)

where Y is the transformed data, P is the linear transformation matrix and X
is the data set.
The variables in the data set X are correlated to each other in varying
degrees. The transformed output Y gives new variables that are linear
combinations of the variables in X. The target is to get the covariance matrix
of the transformed data in diagonal form:

∑y = diag{λ1, λ2, …, λM}     (3)

The equation for ∑y may be written as

∑y = Y Yᵀ = Pᵀ X Xᵀ P = Pᵀ ∑x P     (4)
In the equation above the columns of P consist of the eigenvectors of ∑x. A
diagonal matrix Λ is formed from the eigenvalues of ∑x. Hence the eigenvector
and eigenvalue decomposition is done according to the equation below:

∑x P = P Λ     (5)

Since ∑x is symmetric, P is an orthogonal matrix and the eigenvalues of ∑x are
real numbers. Thus for the orthogonal matrix P:

Pᵀ ∑x P = Pᵀ P Λ     (6)
Pᵀ ∑x P = Λ     (7)
This makes ∑y = Λ. The eigenvalues found are the roots of a polynomial of
degree M. This leaves the final stage, which is to find the eigenvectors
corresponding to the eigenvalues and arrange them in descending-energy order.
The original basis has a dimension of M×M, but in the result the basis is
reduced in size to M×L (where L << M). The factors with negligible energy are
eliminated, which reduces the size of the result obtained.
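The whole derivation, eigen-decomposition of ∑x followed by sorting in descending-energy order and truncation to L components, can be sketched as follows (the data and the function name are illustrative, not from the project):

```python
import numpy as np

def pca_transform(X, L):
    """PCA by eigen-decomposition of the covariance matrix of X.

    X is M x K (M variables, K observations, assumed zero-mean);
    the returned basis P is M x L, keeping the L highest-energy
    eigenvectors, and Y = P^T X is the reduced representation."""
    cov = X @ X.T / X.shape[1]                 # Sigma_x
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending order
    order = np.argsort(eigvals)[::-1]          # descending energy
    P = eigvecs[:, order[:L]]                  # keep L components
    return P, P.T @ X

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200))
X -= X.mean(axis=1, keepdims=True)             # centre the data
P, Y = pca_transform(X, L=3)
```

Because P is built from eigenvectors of a symmetric matrix, it is orthogonal, and the covariance of Y comes out diagonal exactly as equations (5)-(7) predict.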
2.2.4 Template matching
At this stage it is necessary to reduce a large class-number problem to a small
class-number problem; this is the main idea behind the classification stage.
This is what the template matching method does: it reduces the number of
classes and makes classification easier by making the scope smaller. The
performance of the system is enhanced because the search space is smaller.
This can be performed using different distance metrics. One such metric is the
Euclidean distance [4]. The computation is done based on the equation below.
D(x1, x2) = √( (1/C) Σ_{i=1}^{C} (x1[i] − x2[i])² )     (8)

This equation calculates the normalized Euclidean distance D between two
feature vectors x1 and x2. In this case the feature vectors are the features
(covariance matrix) obtained from PCA. C in this equation refers to the
dimension of the feature vectors; dividing by C makes the computation fair for
the different dimensions that x may have.
The PCA results obtained from the training samples will be compared with the
PCA result obtained from a test input. The distance is computed between the
test input and all the stored feature vectors of the training samples. The
minimum distance found leads us to the correct person, because the stored
feature vector for the correct subject should be closest to the test input's
feature vector. Thus the stored feature vector giving the minimum distance
identifies the person, and the identification is done.
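A minimal sketch of this template matching step, with a toy gallery of three feature vectors (the values are illustrative, not real PCA outputs):

```python
import numpy as np

def normalized_euclidean(x1, x2):
    """Normalised Euclidean distance of equation (8); dividing by
    the feature dimension C makes distances comparable."""
    C = len(x1)
    return np.sqrt(np.sum((x1 - x2) ** 2) / C)

def identify(test_vec, templates):
    """Return the index of the stored template closest to the test input."""
    dists = [normalized_euclidean(test_vec, t) for t in templates]
    return int(np.argmin(dists))

# Toy gallery of three subjects' stored feature vectors.
templates = [np.array([1.0, 0.0, 0.0]),
             np.array([0.0, 1.0, 0.0]),
             np.array([0.0, 0.0, 1.0])]
probe = np.array([0.1, 0.9, 0.05])   # a noisy version of subject 1
print(identify(probe, templates))    # index of the closest template
```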
CHAPTER 3: PRE-PROCESSING AND
FEATURE EXTRACTION
METHODOLOGY
This chapter will focus on how the data was collected and processed, based on
the ECG collection method explained in chapter 1 and the pre-processing and
feature extraction methods explained in chapter 2. The ECG records database
used in this project was recorded by one of the senior students who had been
working on a project related to biometrics. Still, the collection process is
relevant to this report and is explained in detail in the later sections of
this chapter.
This chapter will also discuss in detail the method chosen to process the
signal. The different stages of operation performed on the dataset and the
results obtained will be presented for better understanding. This will give a
clear picture as to why the discussed algorithm was chosen. The process flow
diagram of the whole process is shown in figure 3-1.
Figure 3-1: Process flow diagram of the steps involved in Biometric system
(ECG collection – Experimental Setup; Filter and Window – Pre-processing;
Autocorrelation – Feature Extraction; PCA – Dimension Reduction; Neural
Network – Classification)
3.1 Experimental Setup
The ECG database used in this project was collected using Lead 1. The software
used to collect and measure the data was Matlab R2009b, in combination with the
gMobilab+ hardware by gTech (Guger Technologies), electrodes and personal
computers. The figure below shows the position of the electrodes placed on the
human body.
Figure 3-2: Position of electrode placement to collect ECG data
The electrodes placed on the lower left and right ribs form Einthoven's
triangle, as mentioned in chapter 1.
The data was collected from 35 subjects aged 21 to 25 years, weighing 50 to
100 kg and measuring 160 to 200 cm in height. When the signals were collected,
the subjects were made to sit in a comfortable position on a sofa and were
given some time to come to a state of relaxation. The palms rested on the
handles of the sofa on either side. The signals were obtained from the subjects
at a sampling frequency of 250 Hz. Ten sets of signals were collected per
person, and each signal was recorded for 60 seconds [2].
3.2 Methodology used
This part will go through the method used for processing the signals and
extracting the features from each signal. There are several stages of work in
each of these two phases. In general the steps are:
• Pre-processing
  - The first 30 seconds of the signal are taken.
  - The signal is normalised.
  - ECG records are filtered to remove noise.
  - The filtered signal is windowed.
• Feature Extraction
  - The normalised autocorrelation coefficients are extracted from the
    filtered signal.
  - The PCA covariance matrix is generated for classification.
The same steps are repeated for all 35 individuals in creating the biometric
recognition system.
3.2.1 Pre-processing
The ECG signals in the database are filtered to remove noise. The signals not
only have high and low frequency noise components; they also have muscle noise
which overlaps with the original signal. Traditional methods like a bandpass
filter can often degrade a signal, whereas wavelets can de-noise a signal
without appreciable degradation [3]. This is why a Daubechies wavelet 'db6'
Discrete Wavelet Transform (DWT) at scale 12 is used. This removes the noise
from power line interference and baseline wander. All the signals are
standardized at a sampling frequency of 256 Hz.
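A sketch of this denoising step using the PyWavelets package (assumed available here; the threshold value is illustrative, and the decomposition level defaults to the maximum the record length allows, since level 12 requires a long signal):

```python
import numpy as np
import pywt  # PyWavelets, assumed available for this sketch

def wavelet_denoise(sig, wavelet="db6", level=None, threshold=0.2):
    """Soft-threshold wavelet denoising with the db6 DWT.

    The approximation coefficients are kept as-is; the detail
    coefficients are soft-thresholded to suppress noise."""
    if level is None:
        level = pywt.dwt_max_level(len(sig), pywt.Wavelet(wavelet).dec_len)
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    denoised = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft")
                              for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(sig)]

# Synthetic check: a slow component plus white noise.
fs = 256
t = np.arange(0, 4, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = clean + 0.1 * np.random.default_rng(1).standard_normal(len(t))
den = wavelet_denoise(noisy)
```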
Figure 3-3: Noisy ECG signal
Figure 3-4: Normalised noisy ECG signal
The signal in figure 3-3 is not sharp and clear. Figure 3-4 shows the signal
normalised to within -1 to 1.
Figure 3-5: Noisy ECG signal in frequency domain
The unfiltered signal in the frequency domain in figure 3-5 shows a spike at
50 Hz. This noise is from power line interference.
Figure 3-6: Filtered normalised ECG signal
Figure 3-6 shows the de-noised signal. After filtering, the signal is much
cleaner and sharper.
Figure 3-7: Filtered ECG signal in frequency domain
The filtered signal in the frequency domain in figure 3-7 shows no spike at
50 Hz. This is evidence that the signal is filtered properly.
A workspace folder is created where all the ECG records are stored. When the
program runs, each signal passes through the pre-processing stage and then on
to the feature extraction stage, one after another. In this way the signals
from all 35 individuals are filtered before they are taken in for feature
extraction. Figure 3-8 is evidence of the fact that each person has their own
unique ECG signal: filtered and normalized signals of 3 subjects are shown.
Figure 3-8: ECG from 3 subjects
Figure 3-9: Hamming Window
Figure 3-10: Filtered ECG signal after windowing
The ECG is a repetitive signal. When it is replicated after being sampled,
spectral leakage occurs [15]. In order to avoid discontinuities of the
repetitive structure when it is tiled, and to reduce spectral leakage, a window
function is used. A Hamming window is applied to the filtered signal to avoid
spectral leakage and give a much smoother transition.
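Applying the window is a single element-wise multiplication; a minimal sketch on a stand-in segment:

```python
import numpy as np

# A Hamming window tapers the segment edges towards (near) zero,
# reducing the spectral leakage described above.
fs = 256
segment = np.sin(2 * np.pi * 1.2 * np.arange(0, 5, 1 / fs))  # stand-in ECG
window = np.hamming(len(segment))
windowed = segment * window
```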
3.2.2 Autocorrelation based feature extraction
The reason for choosing this non-fiducial approach over fiducial detection is
that it is fully automatic. The main function of autocorrelation is to extract
a set of significant data from a given data set and represent it in a lower
dimension. Since the ECG is highly repetitive, it becomes easier to extract the
autocorrelation features from it. As mentioned under the proposed methodology
in chapter 2, the AC is shift invariant, so it gathers similar features over
multiple heartbeat cycles [4]. It blends the samples in a window into a
sequence of sums of products. The AC captures the representative and highly
distinctive features of the recorded signals.
Figure 3-11 shows the autocorrelated data obtained from the windowed signal.
Figure 3-12 shows the segmented autocorrelated data, taking 400 data points
from the maximum. The QRS complex provides the least variability under
different conditions [4], so 400 points from the maximum of the autocorrelated
data, equivalent to the length of the QRS complex, are taken. If the window
length is 5 seconds then the autocorrelated data will span 10 seconds.
Figure 3-11: Autocorrelated data
Figure 3-12: 400 AC points from the maximum to the right
Then Principal Component Analysis is applied on the autocorrelated data to
further reduce the dimension [9]. It discards the insignificant factors, and
thus reduces the dimension, using a linear transformation. The PCA generates
Hotelling's T² statistic, which gives the multivariate distance of each
observation from the centre of the data set. Figure 3-13 shows the plot of the
T² statistic of 3 individuals. It can be seen clearly that each is different
from another.
Figure 3-13: PCA Hotelling's T² statistic data from 3 subjects
These T² statistics shall be used as input features for the classification
stage in the next chapter. The process flow diagram of the whole system is
shown in figure 3-1.
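Hotelling's T² can be computed from the PCA scores and eigenvalues; a sketch on illustrative data (in Matlab this statistic is returned directly by the princomp function):

```python
import numpy as np

def hotelling_t2(X, L):
    """Hotelling's T^2 of each observation after PCA.

    X: M x K data (M variables, K observations). T^2 sums the
    squared component scores, each scaled by its eigenvalue, over
    the L retained components."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / (X.shape[1] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:L]       # top-L components
    scores = eigvecs[:, order].T @ Xc           # L x K component scores
    return np.sum(scores ** 2 / eigvals[order][:, None], axis=0)

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 100))
t2 = hotelling_t2(X, L=3)
```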
CHAPTER 4: CLASSIFICATION
This chapter covers the last stage of the project: the classification of the
features of the different subjects according to the class assigned to them. A
set of features is fed into the classification stage so that it can
differentiate among the features and assign them to particular individuals. A
class is created for each individual, and under each class a large set of data
is given for the system to train itself on the variations in data an individual
might have. Based on this logic the recognition stage is completed.
4.1 Neural Network
The method used for classification in this project is the neural network (NN).
The network is named after the biological network in which biological neurons
are functionally connected in a nervous system [16]. Similarly, the NN used in
the classification stage has artificial neurons in interconnected groups, and
these connections are used for computational purposes. It takes in inputs and
adjusts its structure (weights) to generate a pattern that connects the input
to the output. The process is shown in the flow diagram below.
Figure 4-1: Process flow diagram of a Neural Network [10]
In the flow diagram in figure 4-1 the network is given an input. Its function
is to try to match the output to the target by adjusting its weights until the
output is a close match to the target.
This process can be compared with the way the neurons in a human brain work:
the network works in its own way and finds a solution on its own, without
relying on anything else. There are things a human might not notice that will
not be missed by the neural network. It processes information using the high
degree of interconnection between its many simple processing units, which work
together to achieve massively parallel distributed processing. The neural
network is remarkable especially for its:
• Adaptive learning: the ability to learn a task based on the input given to it.
• Self-organisation: while learning, it organises the data given to it.
• Real-time operation: computations are carried out in parallel.
• Fault tolerance: performance degrades only gradually with partial degradation
  of the network.
4.1.1 Pattern Recognition Network
There are several functions that can be implemented by the neural network. This
classification task requires a pattern recognition tool, so the pattern
recognition function was chosen. An input matrix is formed where each column
contains the feature vector obtained from an individual. A target matrix is fed
in along with it, with the same number of columns as the input matrix. The
purpose of the target matrix is to show, in each of its columns, the class to
which the corresponding column of the input belongs.
There are a total of 35 subjects from whom signals are collected, and for each
subject feature vectors are generated from 10 trials. One trial out of the 10
for each subject is kept to be used as test input. The other 9 trials from each
of the 35 people are used to form the training set (input) matrix, which has
315 columns of feature vectors. A target matrix of size 35 by 315 is formed.
Since the training set matrix and the target matrix have the same number of
columns, each column in the target matrix corresponds to a column in the
training set matrix. Each column in the target matrix is assigned to an
individual by setting one row in that column to 1. If the first 9 feature
vector columns in the training set input belong to person 1, then each of the
first 9 columns of the target matrix will have a 1 in its first row. A portion
of the target matrix is shown in the table in figure 4-2 for understanding.
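Constructing such a one-hot target matrix is straightforward; a sketch for the 35-subject, 9-trials-per-subject layout described above:

```python
import numpy as np

# Build the 35 x 315 one-hot target matrix: with 9 training trials
# per subject, column c belongs to subject c // 9, so that row gets
# a 1 and every other row stays 0.
n_subjects, trials_per_subject = 35, 9
n_columns = n_subjects * trials_per_subject     # 315 training columns
targets = np.zeros((n_subjects, n_columns))
for col in range(n_columns):
    targets[col // trials_per_subject, col] = 1.0
```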
Figure 4-2: A portion of the target matrix for training
The one set of data kept separately from each subject adds up to a set of 35
records. Thus a matrix of 35 columns of feature vectors will be formed, to be
used as test inputs to the network after it has been trained with the training
set matrix and the target matrix.
The input space is shared by all the neurons, and one analog neuron is assigned
for each pattern to be recognised [2]. Several neural nets work in parallel for
pattern recognition, trying to find features that are common along the columns
of the matrix. These features are passed on to find the closest match. This is
how the neural network works in the process of recognition. All that the
network needs is sufficient data for training itself for pattern recognition
and an output with a proper structure.
4.1.2 Feed-forward Back propagation network
A feed-forward back-propagation network is used. It is the simplest kind of
neural network: the signal travels from the input to the output in one
direction only, with no feedback from any part of the network to another. It
has three layers: an input layer, a hidden layer and an output layer.
Figure 4-3: Feed Forward Network
A multi-layer perceptron (MLP) is used in the feed-forward network, which makes
it so useful. Multiple layers are interconnected, where each layer is a
computational unit and each neuron in a layer is connected to all the neurons
in the subsequent layer.
The function of the network is to try to get a close match of the output to the
target. It keeps training itself with different weights and parameters; the
real intelligence of the network lies in the adjusting of the weights. An MLP
can use several kinds of learning method. Here the method of adjusting the
weights uses a learning algorithm known as Back Propagation (BP). An error
function is pre-defined in the network. The job of the network is to compare
the output to the target matrix, which indicates the correct answer, and
compute the error. The error is then sent back through the network, and the
algorithm uses it to adjust the weight of each connection. This process is
repeated until the error value is small; when the error becomes small, the
network has determined the target function.
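The back-propagation cycle described above can be sketched with a tiny one-hidden-layer network trained by plain gradient descent on a toy XOR task (the project itself uses Matlab's scaled conjugate gradient training; this sketch only illustrates the basic delta-rule update):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy task: XOR, a classic test that requires a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # targets

W1, b1 = rng.standard_normal((2, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)), np.zeros(1)
lr = 0.5
losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)              # forward pass: hidden layer
    Y = sigmoid(H @ W2 + b2)              # forward pass: output layer
    losses.append(np.mean((Y - T) ** 2))  # pre-defined error function
    d_out = (Y - T) * Y * (1 - Y)           # delta rule at the output
    d_hid = (d_out @ W2.T) * H * (1 - H)    # error propagated backwards
    W2 -= lr * H.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0)
```

Each iteration compares output to target, computes the error, sends it backwards, and adjusts the weights, exactly the cycle described in the text.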
The learning method used by the back-propagation network is called a supervised
learning method [19]. It back-propagates the errors through the network and has
the ability to calculate the desired output from the given input. The method
used in back-propagation is derived from the delta rule, which uses a gradient
descent algorithm from the family of conjugate gradient algorithms. The
training function used in this project is the scaled conjugate gradient (SCG).
The SCG is a second-order conjugate gradient algorithm. It assists in achieving
the target functions of several variables. There are two main reasons for
choosing it:
1. It uses a step-size scaling mechanism which avoids performing a line search
per learning iteration, which in turn reduces its time consumption [7].
2. It combines the model trust region from the Levenberg-Marquardt algorithm
with the conjugate gradient approach. This makes SCG faster than other
second-order algorithms [20].
The parameters used in the training process are epochs, show, goal, time,
min_grad, max_fail, sigma and lambda.
The weight vector is a point in the weight space. Minimisation is an iterative
process that minimizes a quadratic approximation of the error function [2]. The
quadratic approximation Eqw(y) of the error function E around the point w in
the SCG is:

Eqw(y) = E(w) + E′(w)ᵀ y + (1/2) yᵀ E″(w) y     (1)

The critical point of Eqw(y) must be found to obtain the minimum of Eqw(y). The
equation for finding the critical point is:

E′qw(y) = E″(w) y + E′(w) = 0     (2)
The neural network has a limitation in that it works on a trial and error
basis. The network has to be trained again and again, changing its parameters
and adjusting its weights. Although this process is somewhat time consuming, it
is ultimately able to give an optimum output.
4.2 Process of Classification
In this stage, every step from training the network through testing it, along
with the results obtained, will be explained in detail.
4.2.1 Feature Set
ECG signals were collected from 35 subjects, with 10 trials per subject at a
sampling frequency of 256 Hz, giving a total of 350 separate signals. As
explained in the previous sections, 9 of the 10 trials per person were taken
for training the system, so 315 signals were used for training in total. The 1
signal kept aside from each person adds up to 35 signals to be used for
testing.
Each of the 315 signals gives a feature vector. A training matrix is formed
where each column represents the features extracted from one signal, so the
training matrix has 9 columns of features per person. This helps to form the
target matrix, in which each column corresponds to a certain subject and the
row with a value of one indicates the person.
4.2.2 Training the network
The first step when the neural network tool starts is to choose the pattern
recognition tool, since we are trying to recognize individuals using features
collected from their signals. Then the training matrix and its corresponding
target matrix are loaded. The training matrix is used as the set of inputs, and
the target matrix shows which feature vector belongs to which individual. The
system is simulated based on these two matrices. The network has to be trained
several times until the percentage error is small. While training the network
repeatedly, some parameters such as the number of neurons may be altered to
achieve good performance and a lower error percentage. As can be seen from the
table below, each class is assigned to one individual.
Table 4-1: Subjects assigned to classes
The network is trained class by class so that it sees the variations in
features that a single subject may have, which helps it to work out the
function for the network. The behaviour of the network during training and
testing, in terms of time, performance, success rate, failure rate, iterations
and other criteria, will be discussed in detail.
The neural network training interface is shown in figure 4-4. At the top, the
architecture of the network and its layers can be seen. The number 50 written
below the hidden layer indicates 50 hidden neurons; the default is 10, but by
trial and error it was found that 50 neurons in the hidden layer gave the best
result. The goal for the mean squared error is to reach 0. The validation
check shows 6 maximum failures, meaning the system stops once 6 validation
failures have occurred. The gradient value also has to be accounted for,
because the system halts as soon as its minimum value is met. From the
training tool it can be seen that the target was met after 97 iterations, at a
gradient of 0.000746.
Figure 4-4: Neural Network Training tool
The plots section in the training tool offers different ways of analysing the
network. The first is the performance graph, derived from the training,
validation and test sets. When the training matrix and target matrix are
supplied for training, the system automatically takes a certain percentage of
the training data for validation and testing; in this case the percentage
stands at 15% each. This is done because validation tells the system to stop
when the number of failures reaches the validation limit, meaning the
network's generalisation is no longer improving. The test samples are mainly
used to check the performance of the network, that is, how effective it is.
The graph in figure 4-5 shows a representation of this.
Figure 4-5: Performance graph of the network
It can be seen from the graph in figure 4-5 that the training curve crosses
the mean squared error threshold long before the validation and test curves.
The test curve remains on top, showing that the test performance is good and
the network is working correctly; the validation curve lies just below it, and
both pass the best threshold at about the same time. The circle on the graph
marks the point where the best iteration is achieved, at iteration 91, which
is the returning point of the network.
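The automatic partitioning behind the performance graph can be sketched as follows (Python rather than MATLAB; the 15%/15% validation and test shares follow the text, and the random seed is arbitrary):

```python
import numpy as np

# Split the 315 training columns into training, validation and test
# subsets, as the pattern-recognition tool does automatically.
n_samples = 315
rng = np.random.default_rng(0)
idx = rng.permutation(n_samples)
n_val = n_test = int(0.15 * n_samples)       # 15% each
val_idx = idx[:n_val]
test_idx = idx[n_val:n_val + n_test]
train_idx = idx[n_val + n_test:]             # remaining ~70%
```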
Another graph to be observed is the training state graph, shown in figure 4-6,
which is plotted from the state of the network during training. It shows how
the goal of an MSE of 0 was pursued, plotting the gradient and the validation
checks in two separate graphs.
Figure 4-6: Training state graph of gradient and validation checks
4.2.3 Testing the network
The test samples taken by the network itself during training are few, so it is
better to check with a larger set. The matrix created from the feature vectors
of the one signal kept back from each of the 35 subjects is used to test the
system, together with a target matrix created for it.
After the target function is achieved with the lowest error percentage, the
network is tested with a new set of inputs. This is done to check how correct
the formed network is: it is necessary to verify that the network works
properly with a different set of inputs. Thus an input matrix with one feature
set obtained from each subject is used to test the network, and a target
matrix corresponding to the input matrix is also formed.
The test matrix has 35 columns, where each column represents the feature
vector extracted from the signal kept back from one person. It must be ensured
that the trial number used is the same for all 35 subjects: if trial 10 is
used for testing, then the inputs taken from every subject must come from
trial 10. Several aspects will be studied in the process. The first is the
recognition percentage shown on a confusion matrix. The confusion matrix for
all 35 subjects is too large to read in detail, so a confusion matrix for 20
tested inputs is given below in figure 4-7.
Figure 4-7: Confusion matrix with test results of 20 subjects
A confusion matrix represents the target classes against the outputs for the
given inputs, showing the full matrix of results. The green diagonal shows the
correct results: a 1 in a box along this diagonal means the individual was
recognised, while a 0 means the individual could not be recognised. The blue
box in the last column of the last row shows the recognition percentage,
written in green; here it reads 100%, meaning 20 out of 20 people were
correctly identified, which is a very good result.
In the same way, the recognition rate for all 35 subjects was found to be 95%,
also a very good result. Thus it is an efficient biometric system.
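The recognition percentage reported by the confusion matrix amounts to counting the diagonal hits. A minimal sketch (Python, with illustrative data rather than the report's results):

```python
import numpy as np

def recognition_rate(true_labels, predicted_labels, n_classes):
    """Confusion matrix plus the fraction of correctly identified inputs."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1            # row = true class, column = predicted class
    return cm, np.trace(cm) / len(true_labels)

# Four test inputs, one misclassified: subject 3 predicted as subject 2.
cm, rate = recognition_rate([0, 1, 2, 3], [0, 1, 2, 2], 4)
```

With 20 correct out of 20 inputs the same computation gives a rate of 1.0, i.e. the 100% shown in figure 4-7.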
CHAPTER 5: DISCUSSION
This chapter provides an overall discussion of everything that has been done,
from the collection of ECG signals up to the recognition of an individual
using the characteristics of a person's ECG signal. The drawbacks, benefits
and observations made while working on this project are discussed here in
detail.
The first chapter discusses the structure of the heart and explains the
procedure for obtaining an ECG from it, showing the different ways an ECG can
be collected from a human. In this experiment the ECG is obtained using Lead
I, forming an Einthoven's triangle; this differs from many other projects,
where the ECG is taken in other configurations.
The second chapter discusses a previous study in which a biometric recognition
system was built. Its process is described to give an overview of the steps
involved in creating such a system. The other reason for discussing that
specific paper was to explain how the PCA method had been used previously,
since PCA was going to be used in this project in a different way. That
chapter also described the initially proposed methodology, which had a
pre-processing stage but offered two possible ways to handle feature
extraction: fiducial feature detection and non-fiducial feature extraction.
Fiducial feature extraction involved extracting temporal, amplitude and
angular features, while the non-fiducial approach was based on autocorrelating
the data and reducing the dimension of the autocorrelated data using PCA.
The third chapter presented the methodology that was followed. In the
pre-processing stage a wavelet filter (Daubechies) was used instead of a
bandpass filter, because the wavelet filter was found to damage the signal
less when filtering it. The signal was then windowed to avoid the possibility
of spectral leakage; it was also found that the recognition rate with
windowing was better.
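The windowing step can be sketched as follows (Python/NumPy rather than the MATLAB used in the project; note `np.hamming` is the symmetric window, whereas the project used `hamming(N,'periodic')`):

```python
import numpy as np

# Taper the filtered 20 s, 256 Hz signal with a Hamming window so its
# ends go toward zero, reducing spectral leakage in later analysis.
fs = 256
t = np.arange(20 * fs) / fs
sig = np.sin(2 * np.pi * 1.2 * t)     # toy stand-in for the filtered ECG
w = np.hamming(len(sig))              # tapers from 0.08 at the edges to 1
windowed = w * sig
```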
In the feature extraction part, both fiducial and non-fiducial features were
tried, but the non-fiducial features gave better results. Fiducial points were
not chosen because that approach selects specific data points from the signal
and works only with them, limiting or ignoring other details the signal might
carry; the programming is also longer and more time consuming. The
non-fiducial features extracted by autocorrelation are more global: they do
not focus on specific attributes, but capture the significant structure of the
signal while discarding the insignificant parts. A further disadvantage of
fiducial points is that we do not know in advance which features will be
useful; ECG features vary between recordings even for the same person, and
some features simply do not yield good results. Ultimately, the chosen method
for feature extraction is therefore autocorrelation.
Autocorrelation-based feature extraction produces a large number of features,
so we focus on 400 data points taken from the maximum of the autocorrelation
towards the right. This is the region of the QRS complex, which was found not
to change much for the same person over time. The autocorrelated data is still
large, so PCA is used to further reduce the dimension. The Hotelling T²
statistic is the focus here: the T² values extracted from the autocorrelated
data of each signal are taken as the final features to be used for the
classification stage.
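The non-fiducial extraction described above can be sketched as follows (a Python approximation of the MATLAB `xcorr(...,'coeff')` step; the function name is illustrative, and the PCA/Hotelling T² reduction that follows is omitted):

```python
import numpy as np

def ac_segment(sig, n_lags=400):
    """Normalised autocorrelation, keeping 400 lags right of the peak."""
    r = np.correlate(sig, sig, mode='full')  # full autocorrelation
    r = r / np.abs(r).max()                  # 'coeff'-style normalisation
    peak = int(np.argmax(r))                 # zero-lag maximum
    return r[peak:peak + n_lags + 1]         # QRS-dominated region

rng = np.random.default_rng(0)
seg = ac_segment(rng.normal(size=2000))      # toy signal stand-in
```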
In the fourth chapter, the features extracted for each person are used to
classify and identify the correct individual. A neural network is chosen for
this because it is a very powerful tool that can detect patterns a human might
miss. It works like the biological neural network of a human brain: each
neuron is a computational unit, interconnected with neurons in the subsequent
layers, so the result is a mass of simple processing units working together to
achieve massively parallel distributed processing. The requirement is that it
should be able to recognise a pattern in the data and assign it to a certain
class, which is why the pattern recognition tool of the neural network toolbox
is chosen. A feed-forward network is chosen because it is the simplest of the
available networks. The network uses a multi-layer perceptron (MLP), in which
each layer is a computational unit, so multiple computational units work
together to achieve the best results. The MLP uses a learning algorithm known
as back-propagation, which contains an error function that calculates the
error every time the network is trained with data; the target is to minimise
this error. Back-propagation uses supervised learning to send the error back
through the network until the target is met. The error minimisation is done
using the scaled conjugate gradient function, chosen for benefits such as not
requiring a line search per learning iteration, which reduces its time
consumption and makes it faster than other second-order algorithms.
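The error-minimisation loop can be illustrated with plain gradient descent on a mean squared error (a simplified Python sketch, not the scaled conjugate gradient or the actual network trained in MATLAB):

```python
import numpy as np

# Fit weights w so that X @ w matches y, repeatedly stepping down the
# gradient of the MSE; back-propagation computes this same kind of
# gradient layer by layer in the real network.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = np.zeros(3)
for _ in range(500):
    err = X @ w - y                  # forward-pass error
    grad = 2 * X.T @ err / len(y)    # gradient of the MSE
    w -= 0.1 * grad                  # weight update
mse = float(np.mean((X @ w - y) ** 2))
```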
Chapter 4 also shows the performance of the network from different
perspectives, with good results throughout; all in all, the neural network is
a good choice for the classification stage.
The last step in chapter 4 tests the created network with a set of inputs, and
it shows a high recognition rate: 100% for 20 individuals and 95% for 35.
Although not all 35 individuals were identified correctly, the percentage is
still comparatively high, and it can be concluded that this is a good
biometric recognition system.
CHAPTER 6: CONCLUSION
Using fiducial detection for recognition is not a reliable option; the
non-fiducial approach shows good results and reliable, durable performance. It
also has computational advantages over fiducial detection, and its algorithm
is faster.
In a previous research paper, a biometric recognition system was designed
using autocorrelation (AC) based feature extraction and dimensionality
reduction using the Discrete Cosine Transform (DCT). That AC/DCT study was
based on 11 subjects and achieved a recognition rate of 90%. The methods used
in this project are:
1. Filtering the signal with a wavelet filter, which damages the signal less.
2. Windowing the whole filtered signal to avoid spectral leakage.
3. Using a different dimensionality reduction algorithm, Principal Component
Analysis (PCA), from which the Hotelling T² statistic data is used for
classification.
The results obtained for a scope of 35 individuals gave a recognition rate of
95%. Thus it can be concluded that this is a better system than the AC/DCT
one.
6.1 Future work and recommendation
The system has shown good performance and results, so it would be worthwhile
to carry out further research on this in the future. The results for 35
individuals are not 100%; when the number of individuals was 20, the
recognition was 100%, and the percentage dropped as the number of individuals
grew. This is something that can be improved in the future.
A possible way to improve it would be to gather a larger database. ECG signals
tend to change over time, so more signals should be collected from each
individual over a long period, capturing as many variations as possible in a
person's signal. The network can then be trained so that it has a better idea
of the variability of a person's features and can base its decisions on a
larger range of data.
Appendix A
Matlab Code
1. Training Program
S = load('trainingsamples.mat');
C = struct2cell(S);
for m= 1:315
A = cell2mat(C(m));
N = 20*256;
my_signal=A(1:N,5);
x= my_signal;
% Normalised Signal [-1 1]
x = x - mean (x ); % cancel DC components
x = x/ max( abs(x));
N = length (x); % Signal length
t = [0:N-1]/256; figure (1)
plot(t,x);
% Filtered Signal
[thr,sorh,keepapp] = ddencmp('den','wv',x);
xd = wdencmp('gbl',x,'db6',3,thr,sorh,keepapp);
w = hamming(N,'periodic');
sig=w.*xd;
% Autocorrelation
r=xcorr(sig,'coeff');
k=length(r);
txcorr = [0:k-1]/256;
plot(txcorr,r)
[x y]=max(r);
a=txcorr(y);
ny=y+400;
b=txcorr(ny);
xnew= r(y:ny);
k=length(xnew);
txcorr = [0:k-1]/256;
plot(txcorr,xnew);
% Principal Component Analysis
[coeff(:,m),score(:,m),latent(:,m),tsquare(:,m)] = ...
princomp(xnew);
plot(txcorr, tsquare(:,m));
end
2. Testing program
S = load('trainingsamples.mat');
C = struct2cell(S);
for m= 1:35
A = cell2mat(C(m));
N = 20*256;
my_signal=A(1:N,5);
x= my_signal;
% Normalised Signal [-1 1]
x = x - mean (x ); % cancel DC components
x = x/ max( abs(x));
N = length (x); % Signal length
t = [0:N-1]/256; figure (1)
plot(t,x);
% Filtered Signal
[thr,sorh,keepapp] = ddencmp('den','wv',x);
xd = wdencmp('gbl',x,'db6',3,thr,sorh,keepapp);
w = hamming(N,'periodic');
sig=w.*xd;
% Autocorrelation
r=xcorr(sig,'coeff');
k=length(r);
txcorr = [0:k-1]/256;
plot(txcorr,r)
[x y]=max(r);
a=txcorr(y);
ny=y+400;
b=txcorr(ny);
xnew= r(y:ny);
k=length(xnew);
txcorr = [0:k-1]/256;
plot(txcorr,xnew);
% Principal Component Analysis
[coefftest(:,m),scoretest(:,m),latenttest(:,m),...
    tsquaretest(:,m)] = princomp(xnew);
plot(txcorr,tsquaretest(:,m));
end
3. Training and Input target matrix
target=zeros(35,315);
training_target=[target(1,1:9)+1 target(1,10:315)
target(2,1:9) target(2,10:18)+1 target(2,19:315)
target(2,1:18) target(2,19:27)+1 target(2,28:315)
target(2,1:27) target(2,28:36)+1 target(2,37:315)
target(2,1:36) target(2,37:45)+1 target(2,46:315)
target(2,1:45) target(2,46:54)+1 target(2,55:315)
target(2,1:54) target(2,55:63)+1 target(2,64:315)
target(2,1:63) target(2,64:72)+1 target(2,73:315)
target(2,1:72) target(2,73:81)+1 target(2,82:315)
target(2,1:81) target(2,82:90)+1 target(2,91:315)
target(2,1:90) target(2,91:99)+1 target(2,100:315)
target(2,1:99) target(2,100:108)+1 target(2,109:315)
target(2,1:108) target(2,109:117)+1 target(2,118:315)
target(2,1:117) target(2,118:126)+1 target(2,127:315)
target(2,1:126) target(2,127:135)+1 target(2,136:315)
target(2,1:135) target(2,136:144)+1 target(2,145:315)
target(2,1:144) target(2,145:153)+1 target(2,154:315)
target(2,1:153) target(2,154:162)+1 target(2,163:315)
target(2,1:162) target(2,163:171)+1 target(2,172:315)
target(2,1:171) target(2,172:180)+1 target(2,181:315)
target(2,1:180) target(2,181:189)+1 target(2,190:315)
target(2,1:189) target(2,190:198)+1 target(2,199:315)
target(2,1:198) target(2,199:207)+1 target(2,208:315)
target(2,1:207) target(2,208:216)+1 target(2,217:315)
target(2,1:216) target(2,217:225)+1 target(2,226:315)
target(2,1:225) target(2,226:234)+1 target(2,235:315)
target(2,1:234) target(2,235:243)+1 target(2,244:315)
target(2,1:243) target(2,244:252)+1 target(2,253:315)
target(2,1:252) target(2,253:261)+1 target(2,262:315)
target(2,1:261) target(2,262:270)+1 target(2,271:315)
target(2,1:270) target(2,271:279)+1 target(2,280:315)
target(2,1:279) target(2,280:288)+1 target(2,289:315)
target(2,1:288) target(2,289:297)+1 target(2,298:315)
target(2,1:297) target(2,298:306)+1 target(2,307:315)
target(2,1:306) target(2,307:315)+1 ];
target_test=zeros(35,35);
input_target=[target_test(1,1)+1 target_test(1,2:35)
target_test(2,1) target_test(2,2)+1 target_test(2,3:35)
target_test(2,1:2) target_test(2,3)+1 target_test(2,4:35)
target_test(2,1:3) target_test(2,4)+1 target_test(2,5:35)
target_test(2,1:4) target_test(2,5)+1 target_test(2,6:35)
target_test(2,1:5) target_test(2,6)+1 target_test(2,7:35)
target_test(2,1:6) target_test(2,7)+1 target_test(2,8:35)
target_test(2,1:7) target_test(2,8)+1 target_test(2,9:35)
target_test(2,1:8) target_test(2,9)+1 target_test(2,10:35)
target_test(2,1:9) target_test(2,10)+1 target_test(2,11:35)
target_test(2,1:10) target_test(2,11)+1 target_test(2,12:35)
target_test(2,1:11) target_test(2,12)+1 target_test(2,13:35)
target_test(2,1:12) target_test(2,13)+1 target_test(2,14:35)
target_test(2,1:13) target_test(2,14)+1 target_test(2,15:35)
target_test(2,1:14) target_test(2,15)+1 target_test(2,16:35)
target_test(2,1:15) target_test(2,16)+1 target_test(2,17:35)
target_test(2,1:16) target_test(2,17)+1 target_test(2,18:35)
target_test(2,1:17) target_test(2,18)+1 target_test(2,19:35)
target_test(2,1:18) target_test(2,19)+1 target_test(2,20:35)
target_test(2,1:19) target_test(2,20)+1 target_test(2,21:35)
target_test(2,1:20) target_test(2,21)+1 target_test(2,22:35)
target_test(2,1:21) target_test(2,22)+1 target_test(2,23:35)
target_test(2,1:22) target_test(2,23)+1 target_test(2,24:35)
target_test(2,1:23) target_test(2,24)+1 target_test(2,25:35)
target_test(2,1:24) target_test(2,25)+1 target_test(2,26:35)
target_test(2,1:25) target_test(2,26)+1 target_test(2,27:35)
target_test(2,1:26) target_test(2,27)+1 target_test(2,28:35)
target_test(2,1:27) target_test(2,28)+1 target_test(2,29:35)
target_test(2,1:28) target_test(2,29)+1 target_test(2,30:35)
target_test(2,1:29) target_test(2,30)+1 target_test(2,31:35)
target_test(2,1:30) target_test(2,31)+1 target_test(2,32:35)
target_test(2,1:31) target_test(2,32)+1 target_test(2,33:35)
target_test(2,1:32) target_test(2,33)+1 target_test(2,34:35)
target_test(2,1:33) target_test(2,34)+1 target_test(2,35)
target_test(2,1:34) target_test(2,35)+1 ];
4. Classification testing
outputs = sim(net,tsquaretest);
outputs = round(outputs);
figure(1);
plotconfusion(input_target,outputs)

More Related Content

Similar to FYP Part 2 report

Guide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMU
Guide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMUGuide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMU
Guide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMUAhmad Ozair
 
Wajid Shah-MCS143027.pdf
Wajid Shah-MCS143027.pdfWajid Shah-MCS143027.pdf
Wajid Shah-MCS143027.pdfMehwishKanwal14
 
Essay Help Forum.pdf
Essay Help Forum.pdfEssay Help Forum.pdf
Essay Help Forum.pdfJulie Johnson
 
SMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdf
SMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdfSMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdf
SMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdfJamelBaili2
 
Eye Tracking & Visual Marketing - A Study of the Vietnamese Beer Market
Eye Tracking & Visual Marketing - A Study of the Vietnamese Beer MarketEye Tracking & Visual Marketing - A Study of the Vietnamese Beer Market
Eye Tracking & Visual Marketing - A Study of the Vietnamese Beer MarketAnas El Khaloui
 
barış_geçer_tez
barış_geçer_tezbarış_geçer_tez
barış_geçer_tezBaris Geçer
 
Dhruv Rai - Master's Thesis
Dhruv Rai - Master's ThesisDhruv Rai - Master's Thesis
Dhruv Rai - Master's ThesisDhruv Rai
 
An Android Communication Platform between Hearing Impaired and General People
An Android Communication Platform between Hearing Impaired and General PeopleAn Android Communication Platform between Hearing Impaired and General People
An Android Communication Platform between Hearing Impaired and General PeopleAfif Bin Kamrul
 
JULIUS KIPCHUMBA KEMBOI
JULIUS KIPCHUMBA KEMBOIJULIUS KIPCHUMBA KEMBOI
JULIUS KIPCHUMBA KEMBOIjulius kemboi
 
Software Engineering Final Year Project Report
Software Engineering Final Year Project ReportSoftware Engineering Final Year Project Report
Software Engineering Final Year Project Reportjudebwayo
 
FIRST AID RESEARCH PROJECT
FIRST AID RESEARCH PROJECTFIRST AID RESEARCH PROJECT
FIRST AID RESEARCH PROJECTAmb Steve Mbugua
 
Regional report Asia Pacific
Regional report Asia PacificRegional report Asia Pacific
Regional report Asia Pacificclac.cab
 

Similar to FYP Part 2 report (20)

Decision Support System
Decision Support SystemDecision Support System
Decision Support System
 
Guide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMU
Guide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMUGuide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMU
Guide to Obtaining IEC/IRB (ethics committee/review board) approval in KGMU
 
thesis
thesisthesis
thesis
 
thesis
thesisthesis
thesis
 
slt siwes report
slt siwes reportslt siwes report
slt siwes report
 
Wajid Shah-MCS143027.pdf
Wajid Shah-MCS143027.pdfWajid Shah-MCS143027.pdf
Wajid Shah-MCS143027.pdf
 
Essay Help Forum.pdf
Essay Help Forum.pdfEssay Help Forum.pdf
Essay Help Forum.pdf
 
SMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdf
SMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdfSMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdf
SMART_HOME_ENERGY_MANAGEMENT_DESIGN_AND _IMPLEMENTATION.pdf
 
Eye Tracking & Visual Marketing - A Study of the Vietnamese Beer Market
Eye Tracking & Visual Marketing - A Study of the Vietnamese Beer MarketEye Tracking & Visual Marketing - A Study of the Vietnamese Beer Market
Eye Tracking & Visual Marketing - A Study of the Vietnamese Beer Market
 
barış_geçer_tez
barış_geçer_tezbarış_geçer_tez
barış_geçer_tez
 
Dhruv Rai - Master's Thesis
Dhruv Rai - Master's ThesisDhruv Rai - Master's Thesis
Dhruv Rai - Master's Thesis
 
Integrated protection and control strategies for microgrid
Integrated protection and control strategies for microgrid Integrated protection and control strategies for microgrid
Integrated protection and control strategies for microgrid
 
Online Voting System
Online Voting SystemOnline Voting System
Online Voting System
 
Jun dai blockchain
Jun dai blockchainJun dai blockchain
Jun dai blockchain
 
An Android Communication Platform between Hearing Impaired and General People
An Android Communication Platform between Hearing Impaired and General PeopleAn Android Communication Platform between Hearing Impaired and General People
An Android Communication Platform between Hearing Impaired and General People
 
RHouraniDSFinalPaper
RHouraniDSFinalPaperRHouraniDSFinalPaper
RHouraniDSFinalPaper
 
JULIUS KIPCHUMBA KEMBOI
JULIUS KIPCHUMBA KEMBOIJULIUS KIPCHUMBA KEMBOI
JULIUS KIPCHUMBA KEMBOI
 
Software Engineering Final Year Project Report
Software Engineering Final Year Project ReportSoftware Engineering Final Year Project Report
Software Engineering Final Year Project Report
 
FIRST AID RESEARCH PROJECT
FIRST AID RESEARCH PROJECTFIRST AID RESEARCH PROJECT
FIRST AID RESEARCH PROJECT
 
Regional report Asia Pacific
Regional report Asia PacificRegional report Asia Pacific
Regional report Asia Pacific
 

FYP Part 2 report

  • 1. The project report is prepared for Faculty of Engineering Multimedia University in partial fulfilment for Bachelor of Engineering FACULTY OF ENGINEERING MULTIMEDIA UNIVERSITY January 2012 ECG BIOMETRIC RECOGNITION WITHOUT FIDUCIAL DETECTION by Kazi Tasneem Farhan 10711118244 Session 2011/2012 Session 2011/2012
  • 2. i The copyright of this report belongs to the author under the terms of the Copyright Act 1987 as qualified by Regulation 4(1) of the Multimedia University Intellectual Property Regulations. Due acknowledgement shall always be made of the use of any material contained in, or derived from, this report.
  • 3. ii Declaration I hereby declare that this work has been done by myself and no portion of the work contained in this report has been submitted in support of any application for any other degree or qualification of this or any other university or institute of learning. I also declare that pursuant to the provisions of the Copyright Act 1987, I have not engaged in any unauthorised act of copying or reproducing or attempt to copy / reproduce or cause to copy / reproduce or permit the copying / reproducing or the sharing and / or downloading of any copyrighted material or an attempt to do so whether by use of the University‟s facilities or outside networks / facilities whether in hard copy or soft copy format, of any material protected under the provisions of sections 3 and 7 of the Act whether for payment or otherwise save as specifically provided for therein. This shall include but not be limited to any lecture notes, course packs, thesis, text books, exam questions, any works of authorship fixed in any tangible medium of expression whether provided by the University or otherwise. I hereby further declare that in the event of any infringement of the provisions of the Act whether knowingly or unknowingly the University shall not be liable for the same in any manner whatsoever and undertakes to indemnify and keep indemnified the University against all such claims and actions. Signature: ________________________ Name: Kazi Tasneem Farhan Student ID: 1071118244 Date: 9th January 2012
  • 4. iii Acknowledgement Firstly I would like to thank Allah. I believe without His blessing and mercy I would not have been able to come so far. Then I would like thank my parents for their tremendous prayers and moral support. I want to thank them for believing in me. I would like to express my sincerest thankfulness to my supervisor Dr. Khazaimatol S. Subari. Her tremendous support and guidance is truly mesmerising. Her friendliness, co-operation and kindness have helped me a lot throughout my work. Once again I would like to thank Miss Shima for being so patient with me. I would also like to mention a research officer Syed Syahril. I would like to express my gratitude towards him because he has been very helpful and kind. I would like to thank a senior and Master‟s researcher Mr. Rameshwor Prasad Shah who has been assisting and guiding me with my research work. Not to forget my friends who have been a tremendous support as well during my duration of this research.
  • 5. iv Abstract The structure and functions of the heart is studied. This gives idea of how the heart works in the human body. The ECG signal is studied by looking into its features and how they are related to the heart. The method of extraction of ECG signal from the heart and various forms of obtaining is also studied because it helps in analysing the signal. The various types of noise in the ECG and the different methods to filter the signal are studied. A database of signal is collected and pre-processed using a chosen filter and window to get a clean and sharp signal. The autocorrelation based feature extraction method is used to extract features from the signal. Principal component analysis is used to reduce the dimension of the features obtained from autocorrelation. The features obtained are put through a classification stage using a neural network. The network uses a pattern recognition tool with a feed forward back propagation network and a training function called scaled conjugate gradient to classify the the features obtained from each signal. The scope of the project is based on 35 subjects. A set of 10 trials of signals is collected from every individual. 9 out 10 trials are used to train the network and the 1 trial from each individual is used to test the network. The results obtained give a recognition rate of 95%. This means that it is a good recognition system. It is found that ECG is unique for every individual. So ECG is a good feature to use in a biometric system.
  • 6. v Table of Contents Declaration...................................................................................................................ii Acknowledgement..................................................................................................... iii Abstract.......................................................................................................................iv Table of Contents ........................................................................................................v List of Figures............................................................................................................vii List of Tables........................................................................................................... viii List of Abbreviations..................................................................................................ix CHAPTER 1: INTRODUCTION TO ELECTROCARDIOGRAM SIGNAL ...........1 1.1 Heart Structure...................................................................................................1 1.1.1 Surfaces and layers within the heart...........................................................2 1.1.2 Major heart structures.................................................................................2 1.1.3 Systemic and pulmonary circulation .........................................................3 1.2 ECG ...................................................................................................................4 1.2.1 Limb leads and Augmented limb leads ......................................................7 1.2.2 Precordial Leads .......................................................................................10 1.2.3 ECG signal labels .....................................................................................11 1.2.4 ECG sampling frequency .........................................................................12 1.2.5 ECG noise and
artifacts............................................................................13 1.2.6 ECG applications......................................................................................14 CHAPTER 2: PREVIOUS WORK AND PROPOSED METHODOLOGY............16 2.1 Previous Work.................................................................................................16 2.2 Proposed methodology ....................................................................................18 2.2.1 Pre-processing ..........................................................................................18 2.2.2 Feature Extraction.....................................................................................19
  • 7. vi 2.2.3 Principal Component Analysis .................................................................20 2.2.4 Template matching ...................................................................................21 CHAPTER 3: PRE-PROCESSING AND FEATURE EXTRACTION METHODOLOGY....................................................................................................23 3.1 Experimental Setup .........................................................................................24 3.2 Methodology used ...........................................................................................25 3.2.1 Pre-processing ..........................................................................................25 3.2.2 Autocorrelation based feature extraction..................................................30 CHAPTER 4: CLASSIFICATION ...........................................................................32 4.1 Neural Network ...............................................................................................32 4.1.1 Pattern Recognition Network ...................................................................33 4.1.2 Feed-forward Back propagation network.................................................34 4.2 Process of Classification..................................................................................36 4.2.1 Feature Set................................................................................................36 4.2.2 Training the network ................................................................................37 4.2.3 Testing the network ..................................................................................40 CHAPTER 5: DISCUSSION ....................................................................................43 CHAPTER 6: CONCLUSION..................................................................................46 6.1 Future work and
recommendation...................................................................46 References .................................................................................................................47 Appendix A ...............................................................................................................49 Matlab Code ..........................................................................................................49
  • 8. vii List of Figures Figure 1-1: (a) Anterior view showing surface features, (b) Posterior view.....................................................3 Figure 1-2: Willem Einthoven..............................................................................................................................4 Figure 1-3: An ECG signal obtained from one heart beat................................................................................4 Figure 1-4: Types of ECG electrodes and its placements on the body.............................................................5 Figure 1-5: Circuit used to collect ECG signal ...................................................................................................7 Figure 1-6: The top three are Limb leads and the bottoms three are Augmented limb leads [1]...................7 Figure 1-7: ECG signal from the Limb leads [2]................................................................................................8 Figure 1-8 ECG signal from Augmented Limb leads [2] ...................................................................................9 Figure 1-9: Einthoven’s Triangle representation of Limb leads and Augmented limb leads [2] ...................9 Figure 1-10 Position of placement of Precordial leads.....................................................................................10 Figure 1-11 ECG signal from the six Precordial leads [2] ...............................................................................10 Figure 1-12: Noisy Signal (Top) and Filtered Signal (bottom) ........................................................................14 Figure 2-1: Noisy signal (Top) and Filtered signal (bottom) ...........................................................................18 Figure 2-2: Different fiducial points of an ECG signal ....................................................................................19 Figure 3-1: Process flow diagram of the steps involved in Biometric 
system.................................................23 Figure 3-2: Position of electrode placement to collect ECG data....................................................................24 Figure 3-3: Noisy ECG signal ............................................................................................................................26 Figure 3-4: Normalised noisy ECG signal.........................................................................................................26 Figure 3-5: Noisy ECG signal in frequency domain.........................................................................................27 Figure 3-6: Filtered normalised ECG signal.....................................................................................................27 Figure 3-7: Filtered ECG signal in frequency domain.....................................................................................28 Figure 3-8: ECG from 3 subjects.......................................................................................................................28 Figure 3-9: Hamming Window ..........................................................................................................................29 Figure 3-10: Filtered ECG signal after windowing..........................................................................................29 Figure 3-11: Autocorrelated data ......................................................................................................................30 Figure 3-12: 400 AC points from the maximum to the right...........................................................................31 Figure 3-13: PCA Hotelling’s T2 statistic data from 3 subjects.......................................................................31 Figure 4-1: Process flow diagram of a Neural Network [10] ...........................................................................32 Figure 4-2: A portion of the target matrix for training 
...................................................................................34 Figure 4-3: Feed Forward Network...................................................................................................................35 Figure 4-4: Neural Network Training tool........................................................................................................38 Figure 4-5: Performance graph of the network................................................................................................39 Figure 4-6: Training state graph of gradient and validation checks...............................................................40 Figure 4-7: Confusion matrix with test results of 20 subjects .........................................................................41
  • 9. viii List of Tables Table 1-1: Chart on position of placement of electrodes [2]..............................................................................6 Table 2-1: Results from using PCA and LDA [9].............................................................................................17 Table 4-1: Subjects assigned to classes..............................................................................................................37
  • 10. ix List of Abbreviations ECG Electrocardiogram PCA Principal Component Analysis DCT Discrete Cosine Transform AC Autocorrelation BP Back-propagation NN Neural Network SCG Scaled Conjugate Gradient SAN Sinoatrial Node AVN Atrioventricular Node LA Left atrium LV Left ventricle RA Right atrium RV Right ventricle aVL Augmented vector left aVR Augmented vector right aVF Augmented vector foot RL Right leg LL Left leg RR R to R peak AP Action potential RBFNN Radial basis function neural network LDA Linear discriminant analysis MSE Mean squared error MLP Multi-layer perceptron
  • 11. 1 CHAPTER 1: INTRODUCTION TO ELECTROCARDIOGRAM SIGNAL Security devices and technology have to change and advance rapidly, since new methods of falsification appear all the time; innovation in this field is of utmost importance to stay ahead of the game. Biometrics is one possible way of providing such protection. Biometrics refers to the physical, biological and behavioural characteristics of a human being. Many features are used as biometric modalities: physical traits include the face and iris, while behavioural traits include gait and keystroke dynamics. Exploiting these features, which are unique to every human being, is done through computing and signal processing. The problem that remains with the previously mentioned features is that they can be falsified. In the last few years many researchers have suggested the use of the electrocardiogram (ECG) for biometric recognition. The ECG is a signal that represents the electrical activity of the heart, and it has been found to be unique to every individual, which makes it a promising modality in the biometric field. A great deal of research has now been done in this area, and most studies examine the various aspects of the ECG signal in order to find the best procedure for using the ECG in a biometric system. In this particular project, several combinations of aspects of the ECG signal will be taken into consideration, and these aspects, or features, will be used to perform identification. 1.1 Heart Structure In order to study the signal it is important to study the biology of the heart; only by understanding it can we gain a good knowledge of the ECG signal. This first chapter therefore gives a thorough description of the heart structure, and the method used for the acquisition of the ECG signal is elaborated.
  • 12. 2 1.1.1 Surfaces and layers within the heart The human heart rests in the thoracic cavity of the human body. The wide superior portion of the heart, from which the great vessels emerge, is the base of the heart; the inferior end pointing to the left is the apex. The heart is tilted at an angle so that the inferior surface rests against the diaphragm, with two thirds of it to the left of the sternum. 1.1.2 Major heart structures The heart is divided into two sides, the left and the right, and each side has an upper and a lower chamber, so there are a total of four chambers in a human heart. The upper chamber is referred to as an atrium and the lower as a ventricle. The apex is formed by the tip of the left ventricle, and the two atria form the base of the heart. The boundaries of the four chambers are marked by shallow grooves called sulci. The wall of the heart is composed of cardiac muscle tissue, which has its own blood supply and circulation, the coronary circulation; the heart is surrounded by coronary blood vessels. The right and left coronary arteries, found on the anterior surface of the heart, supply blood when the ventricles are resting, and the cusps of the aortic valve cover the coronary artery openings when the ventricles contract. The great veins of the heart, such as the superior vena cava, inferior vena cava and coronary sinus, return oxygen-poor blood to the right atrium, while the great arteries carry blood away from the ventricles. The pulmonary trunk divides into the right and left pulmonary arteries, which carry blood to the lungs where it is oxygenated. This oxygen-rich blood is returned to the left atrium through the right and left pulmonary veins and then passes to the left ventricle, from where it is pumped into the large aorta. The aorta then distributes the blood to the systemic circulation.
  • 13. 3 (a) (b) Figure 1-1: (a) Anterior view showing surface features, (b) Posterior view 1.1.3 Systemic and pulmonary circulation There are two types of circulation: pulmonary and systemic. The function of the pulmonary circulation is to carry blood from the right ventricle to the lungs and back to the left atrium. The systemic circulation functions in the same way, except that it serves the body tissues instead of the lungs. Both circulations involve arteries, capillaries and veins, and both start from the heart and return to it in the end. In short, the blood returning to the heart through the great veins is received by the thin-walled atria, and the thick ventricular walls contract simultaneously to send the blood from the right ventricle into the pulmonary circulation and from the left ventricle into the systemic circulation.
  • 14. 4 1.2 ECG ECG is short for electrocardiogram. The mechanism of the electrocardiogram was introduced by Willem Einthoven, who was awarded the Nobel Prize for it in 1924 [1]. Figure 1-2: Willem Einthoven This process digitally records the electrical activity of the heart muscle tissue at the cell level, and it poses no pain or physical discomfort to the patient on whom it is performed. Several electrodes can be placed on the skin, making several simultaneous aspects of the spatial phenomenon accessible; these are known as electrocardiographic leads. The recorded electrical behaviour helps to detect and diagnose heart abnormalities, and thickness or damage in the heart muscle, before or during heart attacks. Electrodes are placed on the human body to receive the ECG signals. Electrode configurations range from a number of orthogonal 3-lead systems to as many as 80 or 120 leads in some highly redundant body-surface mapping systems [1]. An ECG signal representing a single heart beat is shown below in figure 1-3. Figure 1-3: An ECG signal obtained from one heart beat
  • 15. 5 Figure 1-4: Types of ECG electrodes and its placements on the body The standard method used to obtain the electrical activity of the heart uses a 12-lead electrode system. The position of the lead with reference to the heart determines the wave's morphology and amplitude: a wave can rise, fall or be invisible depending on this spatial orientation. The leads cover two orthogonal planes in this system, the frontal and the transversal plane. The frontal plane refers to the vertical plane and is made up of the three bipolar limb leads (called I, II and III) and the three augmented unipolar leads (called aVL, aVR and aVF). The transversal plane refers to the horizontal plane, which crosses the thorax orthogonally to the frontal plane and consists of the six unipolar precordial leads (called V1, V2, V3, V4, V5 and V6). A different vector representation is used for each lead to obtain the direction along which electric potentials are measured. When the wave goes above the baseline it shows a positive deflection, and when it goes below the baseline it shows a negative deflection; a positive deflection means that the recorded wave front has travelled towards the electrode, and a negative deflection means that it has travelled away from it. A summarised chart on the positions of placement of the electrodes is given below in table 1-1.
  • 16. 6 Table 1-1: Chart on position of placement of electrodes [2] The electrodes are placed on the chest by the process given above to collect the ECG data. The electrodes are placed in this way because of the location of the heart at that very place in the chest, where the electrical activity involving the SAN and AVN takes place during the heart cycle. The positions for placing the electrodes are shown in the diagram in figure 1-5. Zbody is the resistance between the electrodes, while Z1 and Z2 are the lumped thoracic medium resistances. Vecg is the bipolar scalar ECG lead voltage, and it is measured with respect to a common reference potential. The foot has to be connected to the ground in this process, otherwise no signal will be obtained.
  • 17. 7 Figure 1-5: Circuit used to collect ECG signal In the standard 12-lead electrode system there are two ways of electrode placement: the limb and augmented limb leads, and the precordial leads. Bipolar and unipolar are the two different types of lead used in obtaining the signals; all leads apart from the limb leads are unipolar. The two ways of electrode placement are further described below. 1.2.1 Limb leads and Augmented limb leads Figure 1-6: The top three are Limb leads and the bottom three are Augmented limb leads [1]
  • 18. 8 1.2.1.1 Limb leads In figure 1-6 the top three drawings showing leads I, II and III are called the limb leads. If the three lines in the figures are put together they form what is called the Einthoven's triangle. The leads use the electrodes in different combinations. Lead I captures the potential difference between the negative RA electrode on the right arm and the positive LA electrode on the left arm. Lead II captures the potential difference between the negative RA electrode on the right arm and the positive LL electrode on the left leg. Lead III captures the potential difference between the negative LA electrode on the left arm and the positive LL electrode on the left leg. Figure 1-7: ECG signal from the Limb leads [2] 1.2.1.2 Augmented Limb leads: In figure 1-6 the three drawings on the bottom show the leads aVR, aVL, and aVF, otherwise called the augmented limb leads. They are basically derived from leads I, II and III. The augmented limb leads are created by combining two of the three electrodes into a mutual negative pole while the remaining electrode is kept as the positive pole; this modification is known as the Goldberger modification of Wilson's central terminal. Lead aVR, which stands for "augmented vector right", places the positive electrode on the right arm. The mutual negative pole here is a combination of the left arm electrode and the left leg electrode, and thus the signal strength of the positive electrode on the right arm is augmented.
  • 19. 9 Figure 1-8 ECG signal from Augmented Limb leads [2] Figure 1-9: Einthoven's Triangle representation of Limb leads and Augmented limb leads [2] Lead aVL, which stands for "augmented vector left", places the positive electrode on the left arm. The mutual negative pole here is a combination of the right arm electrode and the left leg electrode, and thus the signal strength of the positive electrode on the left arm is augmented. Lead aVF, which stands for "augmented vector foot", places the positive electrode on the left leg. The mutual negative pole here is a combination of the right arm electrode and the left arm electrode, and thus the signal strength of the positive electrode on the left leg is augmented.
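The lead definitions above can be summarised arithmetically. The sketch below is illustrative only: the electrode potentials are made-up values and the function name is my own, but the formulas follow the bipolar limb-lead and augmented-lead definitions just described.

```python
# Sketch of the six frontal-plane leads derived from the three limb
# electrode potentials (RA, LA, LL). Input voltages are hypothetical.

def frontal_leads(ra, la, ll):
    """Return the bipolar limb leads and the augmented limb leads."""
    return {
        "I":   la - ra,               # positive LA minus negative RA
        "II":  ll - ra,               # positive LL minus negative RA
        "III": ll - la,               # positive LL minus negative LA
        # Augmented leads: positive electrode against the mean of the
        # other two electrodes (the mutual negative pole)
        "aVR": ra - (la + ll) / 2,
        "aVL": la - (ra + ll) / 2,
        "aVF": ll - (ra + la) / 2,
    }

leads = frontal_leads(ra=-0.2, la=0.3, ll=0.6)
# Einthoven's law: Lead I + Lead III = Lead II
assert abs(leads["I"] + leads["III"] - leads["II"]) < 1e-12
# The three augmented leads always sum to zero
assert abs(leads["aVR"] + leads["aVL"] + leads["aVF"]) < 1e-12
```

The two assertions hold for any electrode potentials, which is a quick way to see that the six frontal-plane leads carry only two independent voltages.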
  • 20. 10 1.2.2 Precordial Leads There are six leads under this category, also known as the chest leads. They are placed on different parts of the chest as shown in figure 1-10. The leads used here are unipolar, as was mentioned before, because Wilson's central terminal is used as the negative electrode. Figure 1-10 Position of placement of Precordial leads The limb and augmented leads together form the hexaxial reference system, which helps to obtain the electrical activity of the heart in the frontal plane. The precordial leads work in pairs to cover certain areas of the chest: the pairs V1 and V2, V3 and V4, and V5 and V6 cover the anteroseptal, anteroapical and anterolateral ventricular regions respectively. These leads are used to record the signals from the horizontal plane, known as the z-axis. Figure 1-11 ECG signal from the six Precordial leads [2]
  • 21. 11 1.2.3 ECG signal labels Living cells are negatively charged (about −90 mV) when they are at rest, but they are quickly depolarized by an electrical stimulus. This change in voltage leads to the mechanical contraction of the heart by compressing and electrically exciting the proteins in the cells, and the contraction leads to changes in chamber volume. The electrically activated procedure of contraction and relaxation happens repeatedly to pump blood effectively. The group of cells located in the upper part of the right atrium, known as the sinoatrial node (SAN), electrically initiates a heartbeat through its ability of spontaneous discharge. The resulting wave front spreads along all the atrial paths and causes both atria to contract, which forces blood into the ventricles. Electrical conduction is performed at a speed of ca. 4 m/s through the atrioventricular pathways. The electrical conductivity is low in the atrioventricular node, so the wave front is delayed by about 120 ms; during this delay the content of the atria is transferred through the atrioventricular valves to the ventricles. Special conducting pathways (Purkinje fibres) make up for the large ventricles and help in accelerating the propagation of the wave front, thus allowing efficient pumping of the blood throughout the body. Until the membrane potential recovers to about −65 mV after depolarization, the cells do not receive or transmit electrical stimuli; thus the wave front dies out after activating the heart muscle tissue. The refractory period between two heartbeats is around 200 ms. The ECG signal represents the electrical activity of the heart over time, taken from the human body surface for the different heart regions. The ECG signals here refer to the action potential (AP) curves with corresponding characteristics of the propagating wave front at different heart regions, simulated at different stages of the cardiac cycle.
Each heartbeat is represented by a series of five principal waves known as P, Q, R, S and T (refer to figure 1-3). 1. The P-wave shows the beginning of a new beat. It is produced by the atrial activation as a small-amplitude wave. This wave can be positive, negative, or biphasic (with 2 peaks of opposite polarity). It lasts from 0.08 to 0.10 s. The section in between the P wave and the QRS complex is known as the PR
  • 22. 12 interval. It ranges from 0.12 to 0.20 s and represents the time from the start of atrial depolarization until the start of ventricular depolarization. 2. The QRS complex is a combination of three sharp waves formed by the ventricular activation. The initial small wave is called the Q wave (depolarization of the wall between the ventricles), followed by a significantly larger wave of opposite polarity called the R wave (depolarization of the left ventricle external wall), and finally the ventricular activation is finished by a small wave called the S wave (depolarization of the upper part of the ventricles). The Q-wave is negative, the R-wave is positive, the S-wave is negative, and any second positive deflection is labelled R'. The duration of the complex ranges from 0.06 to 0.10 s. The ideal shape of the QRS complex is given below, but in practice it depends on the kind of lead that is being used. 3. The ventricular repolarization is shown by a small T-wave of variable morphology. Occasionally a small U-wave of unknown origin follows it, signifying the last stages of the repolarization process. The T-wave can be positive, negative, bimodal or biphasic, or be an upwards or downwards deflection only. The QT interval covers the depolarization and repolarization of the ventricles and ranges from 0.2 to 0.4 seconds. 1.2.4 ECG sampling frequency A study showed that the RR interval must be measured with an accuracy of about 1 ms; a critical error in heart rate variability analysis is likely to occur if the sampling frequency falls below the required threshold. This is why Timo Bragge et al. suggested that the ECG sampling frequency should be at least 500 Hz. The normal heart rate ranges between 60 and 100 bpm, and the heart rate frequency ranges between 0.50 and 3.0 Hz for heart rates of 30 to 180 bpm. The typical value for the highest frequency in the signal is about 125 Hz, depending on the age, sex
  • 23. 13 and health of a person, but in the case of paediatric ECG processing it sometimes rises up to 150 Hz. The frequency ranges used in some ECG applications are discussed below. There are eight amplifiers in modern microprocessor-based interpretive machines; they can sample and store eight leads simultaneously (I, II, and V1-V6), and the remaining four leads (III, aVR, aVL, and aVF) are then obtained and stored. These machines have enough memory to store all the leads for a 10 second interval at a sampling rate of 500 samples per second. The clinical bandwidth ranges from 0.05 to 100 Hz. The upper limit is halved to detect arrhythmias, so the range for Intensive Care Unit patients is 0.50 to 50 Hz. A heart rate meter eliminates the non-QRS waves, such as the P and T waves, from the ECG signal; the frequency response is therefore centred at 17 Hz using a bandpass filter. Late potential measurements concern the small, high-frequency events that occur following the QRS complex, so the frequency range here extends up to 500 Hz. 1.2.5 ECG noise and artifacts An ECG signal, while being collected, is often corrupted by external sources of noise. These artifacts may be artificial or biological in nature. The artificial sources of noise include power line interference, instrumentation noise, electrosurgical noise, impulse noise and electrostatic potentials. The biological artifacts come from motion and muscle activity and from baseline drift due to respiration, with motion and muscle artifacts being the most important (Tompkins, W.J., 1993). The noise in the signal includes both high frequency components, such as power line interference, and low frequency components, such as baseline wander. In order to retrieve the original signal, the artifact components have to be filtered out.
The low frequency components usually fall below 0.5 Hz, and the high frequency components lie above 40 Hz, up to about 100 Hz. There are numerous filters which can remove these noises. One example is an FIR band-pass filter with cut-off frequencies set at 0.5 Hz and 40 Hz, which only allows frequencies lying between the cut-off frequencies to pass through. When selecting the cut-off frequencies it must be ensured that the signal does not get much
  • 24. 14 distorted. The muscle noise still remains a problem because it overlaps with the actual PQRST of the ECG data. The figure below shows a filtered signal at the bottom; it can be seen that the signal is much cleaner after filtering. Figure 1-12: Noisy Signal (Top) and Filtered Signal (bottom) Before collecting a signal there are three things that need to be ensured: electrode placement, electrode selection and skin preparation. In order to keep the interference at a minimum, the electrodes must be placed on smooth surfaces with minimal hair, skin creases, bony protuberances and less muscle. The skin must be cleaned, dried, clipped of excess hair and scrubbed to remove any dead skin cells. 1.2.6 ECG applications The ECG signal has various uses and can be applied in numerous fields of study and applications [2], because of the great benefits that can be obtained from it. One of its main areas of usage is the medical sector, where the ECG is widely used to determine the various conditions of the heart; it also reveals other body conditions alongside, such as electrolyte abnormalities. Temporal durations
  • 25. 15 of the heart's electrical phenomena and their variations over time are parameters of primary clinical relevance, and slow variations of cycle-based parameters are also considered clinically important. Biometrics is another large field of study in which ECG signals are heavily studied and used. The increasing crime and falsification rates have led to the search for unique traits that can be used to create a good security system, and the ECG is considered an exceptional trait that would be hard to falsify. It is well suited to biometric recognition applications because unique features such as the relative onsets of the various peaks, the beat geometry, and the responses to stress and activity give a personalised characterisation of an individual. The ECG features also vary according to age, sex and lifestyle (caffeine intake, exercise, and body weight), and all of these can prove very useful in the biometric field. Although this intra-individual variability might cause problems in the identification process in the long term, owing to the changes in ECG features that occur over time, this can be addressed by updating the database of the biometric system regularly, which also reduces the probability of the system being falsified. Besides being hard to falsify, the ECG is also robust to environmental factors, because the heart lies highly protected deep inside the thoracic cavity. This is a promising field of study, but it requires a thorough understanding and knowledge of the signal and its intra-individual variability features.
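The FIR band-pass filtering described in section 1.2.5 (pass band roughly 0.5 to 40 Hz) can be sketched as a windowed-sinc design. This NumPy sketch is an illustration only, not the report's implementation (the report's code is in MATLAB); the 500 Hz sampling rate, the filter length and the synthetic test signal are all assumptions.

```python
# Windowed-sinc FIR band-pass filter: low-pass(40 Hz) minus low-pass(0.5 Hz)
import numpy as np

fs = 500                 # assumed sampling rate (Hz)
numtaps = 501            # assumed filter length (odd, so the delay is integer)
n = np.arange(numtaps) - (numtaps - 1) / 2

def windowed_sinc_lowpass(fc):
    """Hamming-windowed sinc low-pass taps with cutoff fc in Hz."""
    h = (2 * fc / fs) * np.sinc(2 * fc / fs * n)
    return h * np.hamming(numtaps)

taps = windowed_sinc_lowpass(40.0) - windowed_sinc_lowpass(0.5)

# Synthetic signal: a 5 Hz "ECG-like" component plus 50 Hz power-line noise
t = np.arange(0, 10, 1 / fs)
noisy = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# Symmetric taps and centred convolution give zero-phase filtering
filtered = np.convolve(noisy, taps, mode="same")
```

After filtering, the 50 Hz interference is strongly attenuated while the 5 Hz component passes almost unchanged, which is the behaviour the text describes for the 0.5 to 40 Hz pass band.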
  • 26. 16 CHAPTER 2: PREVIOUS WORK AND PROPOSED METHODOLOGY The previous chapter explained the basics of the ECG signal. This chapter looks in more detail at the kinds of methodology used in ECG signal processing. A brief description of some previous studies related to this topic is given, followed by a discussion of the proposed methodology for this project. A biometric recognition system has four vital segments [21]: (i) Experimental Setup: the data acquisition method is described. (ii) Pre-processing: the recorded signals are filtered to remove noise. (iii) Feature Extraction: the unique features of the signals are extracted. (iv) Classification: an individual is identified. 2.1 Previous Work In one previous study, the recorded signals are filtered to remove noise, Principal Component Analysis (PCA) is used together with Linear Discriminant Analysis (LDA) for feature extraction and dimensionality reduction, and a Radial Basis Function Neural Network (RBFNN) based classifier is used for classification. The ECG is collected using industry-standard ECG hardware, a one-channel electrocardiograph equipped with a PC interface; the ECG is transferred via this interface into the computer and stored as a binary file. The project uses real-world ECG data of nine persons, with twenty complete cardiac cycles for each, collected at different times. Lead I of Einthoven's triangle is used, and the sampling rate of the signal is 128 Hz. The low frequency baseline drift is removed from the recorded signals using a zero-phase digital filter with a cutoff frequency fc of 0.5 Hz. As mentioned in chapter 1, the dominant noise results from muscle artifacts; in this paper a wavelet denoising with soft thresholding of the wavelet coefficients is used.
This is because the muscle artifacts are additive and is modelled with Gaussian white noise.
  • 27. 17 The ECG signal after filtration is segmented into full cardiac cycles. This causes the baseline P and T wave to vanish because this segment is influenced by heart rate variations. The ECG signal is repetitive and is a random process which is why a sequential probabilistic model known as the Markov model is used [9]. This segmentation will assign each signal sample to a class. A matrix is formed where each segmented cardiac cycle is a row vector with finite number of elements. The columns of the matrix are the training cycles per person that are strongly correlated. The input data in then sorted according to its correlation matrix where the first 20 vectors are used in PCA. PCA is a popular technique which is used to reduce the dimension of discriminative information into a small number of coefficients [9]. LDA is applied on the features extracted from PCA which reduces the features from 20 down to 9. The application of LDA also increases the class discriminativity. The classification stage uses a Radial Basis Function Neural Network (RBFNN). RBFNN consists of asset of input nodes and a hidden neuron layer where each neuron has special type of output radial basis function which is centred at the mean vector of a cluster in feature space. In the output layer the outputs of the hidden neurons are summed up using a linear output function. There are two classifiers: one which classifies the PCA projections and another which classifies the LDA projections. Both the classifiers are realised by the RBFNN [9]. The results obtained for each person from the two types of feature extraction method is given below in the table 2-1. Table 2-1: Results from using PCA and LDA [9]
Table 2-1 shows that the recognition rate with PCA-based feature extraction is higher than with LDA; indeed, some of the values under LDA are lower than under PCA.

2.2 Proposed methodology

The recorded signals will be filtered using a Butterworth bandpass filter. The pre-processed signal will then be used to extract fiducial or non-fiducial features. Fiducial feature extraction captures the temporal, angular and amplitude features of the signal, while non-fiducial feature extraction captures discriminative information from the signal without localising any fiducial markers. The extracted features are then classified using a template matching method.

2.2.1 Pre-processing

A recorded ECG signal usually contains a lot of noise, so at this stage a way must be found to remove it. The noise has high-frequency components, such as power line interference, and low-frequency components, such as the baseline wander in the signal. A Butterworth band-pass filter of order 6 with cutoff frequencies at 0.05 Hz and 40 Hz is used.

Figure 2-1: Noisy signal (top) and filtered signal (bottom)
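The filtering step above can be sketched as follows. The project itself used MATLAB; this is an illustrative Python/SciPy equivalent with a synthetic stand-in trace, since the report's own code is not reproduced here. The zero-phase application via `sosfiltfilt` is an assumption (the report does not name the application function).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250.0                                   # collection sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.0 * t)            # stand-in for a real ECG trace
noisy = ecg + 0.3 * np.sin(2 * np.pi * 50.0 * t)  # add 50 Hz mains hum

# Order-6 band-pass overall: SciPy doubles the order for band-pass
# designs, hence N = 3.  Pass band 0.05-40 Hz as in the report.
sos = butter(3, [0.05, 40.0], btype="band", fs=fs, output="sos")
clean = sosfiltfilt(sos, noisy)              # zero-phase filtering
```

The second-order-sections form is used because a band edge as low as 0.05 Hz makes the transfer-function (`b, a`) form numerically fragile.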
2.2.2 Feature Extraction

To study a signal, different features have to be extracted from it. These features can be fiducial or non-fiducial. Fiducial points refer to the angular, temporal and amplitude features of a signal. Non-fiducial features capture discriminative information from the ECG trace without having to localise fiducial markers. In other words, non-fiducial methods extract patterns globally, whereas fiducial methods extract information local to a single heartbeat.

2.2.2.1 Fiducial features

The figure below shows the different fiducial features that can be extracted from the signal. The features chosen were the P, Q, R, S, T, RQ, RS, RP and PT amplitudes and the RR, QRS, PT, ST, QR and RS intervals. A thresholding method based on the Pan and Tompkins algorithm was used to determine the P, Q, R, S and T points, from which the remaining temporal and amplitude features were extracted.

Figure 2-2: Different fiducial points of an ECG signal
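As a rough illustration of the thresholding idea, the sketch below detects R-type spikes in a synthetic trace. It is a much-simplified stand-in for the full Pan-Tompkins chain (band-pass, derivative, squaring, moving-window integration), not the report's implementation; the spike train and all parameter values are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 250
t = np.arange(0, 10, 1 / fs)
# Synthetic stand-in: one narrow "R spike" per second.
ecg = np.zeros_like(t)
ecg[fs // 2 :: fs] = 1.0

# Threshold at a fraction of the maximum and enforce a refractory
# distance between detections, mimicking the decision stage.
r_peaks, _ = find_peaks(ecg, height=0.5 * ecg.max(), distance=int(0.3 * fs))

rr_intervals = np.diff(r_peaks) / fs     # RR intervals in seconds
```

From the detected points, interval features such as RR (and, with the other wave locations, QRS, PT, ST, QR and RS) follow as simple index differences divided by the sampling rate.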
2.2.2.2 Non-fiducial features

The pre-processed signal is segmented into non-overlapping windows, and the normalised autocorrelation (AC) of every window is obtained. The AC coefficients are then further reduced using Principal Component Analysis (PCA), so the method can be called AC/PCA. Once the denoised signal is autocorrelated, the actual fiducial locations no longer need to be found. The AC blends all the samples in a window into a sequence of sums of products [4], and it captures the representative and highly distinctive features of the recorded signals. Even though the signal is non-periodic, this still works because the signal is highly repetitive.

Windowing cuts out a portion of the signal, and the AC is performed on the windowed ECG. The data window of length N has to be longer than one heart beat. The relative distances and cycle lengths of the ECG vary over time according to a subject's physical, mental and other states, but the AC is shift invariant, so it gathers similar features over multiple heart beat cycles. The normalised autocorrelation coefficients R̂xx[m] can be computed as

R̂xx[m] = ( Σ_{i=0}^{N−|m|−1} x[i] x[i+m] ) / R̂xx[0]    (1)

where x[i] is the windowed ECG for i = 0, 1, ..., N−|m|−1, and x[i+m] is the time-shifted windowed ECG with a time lag m = 0, 1, ..., M−1, with M << N, where N is the length of the windowed signal.

2.2.3 Principal Component Analysis

PCA uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables called the principal components. The features extracted from the AC are run through PCA to compress the discriminative information into a low number of coefficients. PCA is a linear transform represented by

Y = Pᵀ X    (2)

where Y is the transformed data and P is the linear transformation matrix.
X is the data set. The variables in the data set X are correlated to each other in varying degrees. The transformed output gives the variables for the various linear combinations of the variables in X. The target is to get the covariance matrix of the transformed data in diagonal form:

∑y = diag{λ1, λ2, ..., λM}    (3)

The equation for ∑y may be written as

∑y = YYᵀ = Pᵀ XXᵀ P = Pᵀ ∑x P    (4)

In the equation above, the columns of P consist of the eigenvectors of ∑x. A diagonal matrix Λ is formed from the eigenvalues of ∑x, and the eigenvector and eigenvalue decomposition is done according to

∑x P = PΛ    (5)

Since ∑x is symmetric, P is an orthogonal matrix and the eigenvalues of ∑x are real numbers. Thus for the orthogonal matrix P:

Pᵀ ∑x P = Pᵀ PΛ    (6)
Pᵀ ∑x P = Λ    (7)

This makes ∑y = Λ. The eigenvalues are the roots of a characteristic polynomial of degree M. The final stage is to find the eigenvectors corresponding to the eigenvalues and arrange them in descending-energy order. The original basis has dimension M×M, but in the result the basis is reduced to M×L (where L << M): the factors with negligible energy are eliminated, which reduces the size of the result.

2.2.4 Template matching

At this stage it is necessary to reduce a large-class-number problem to a small-class-number problem. This is the main idea behind the classification stage.
This is what the template matching method does: it reduces the number of classes and makes classification easier by shrinking the scope. The performance of the system is enhanced because the search space is smaller. Matching can be performed using different distance metrics; one such metric is the Euclidean distance [4]. The computation is based on the equation below:

D(x1, x2) = √( (1/C) Σ_{c=1}^{C} (x1[c] − x2[c])² )    (8)

This equation calculates the normalised Euclidean distance D between two feature vectors x1 and x2; in this case the feature vectors are the features obtained from PCA. C in this equation refers to the dimension of the feature vectors, which makes the computation fair across the different dimensions that x may have. The PCA results obtained from the training samples are compared with the PCA result obtained from a test input: the distance is computed between the test input and every stored training feature vector. The minimum distance found leads to the correct person, because the stored feature vector of the right subject should be the closest to the test input's feature vector. The stored feature vector giving the minimum distance therefore identifies the person, and the identification is done.
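The nearest-template decision described above can be sketched as follows. The subject names and toy feature vectors are made up for illustration; the distance follows Eq. (8) as reconstructed here, with the division by the dimension C inside the square root.

```python
import numpy as np

def normalized_euclidean(x1, x2):
    """Eq. (8): Euclidean distance divided by the feature dimension C,
    so distances are comparable across different dimensions."""
    C = len(x1)
    return np.sqrt(np.sum((x1 - x2) ** 2) / C)

def identify(test_vec, templates):
    """Return the subject whose stored template is nearest to the test input."""
    dists = {name: normalized_euclidean(test_vec, t) for name, t in templates.items()}
    return min(dists, key=dists.get)

templates = {"subject1": np.array([0.0, 1.0, 2.0]),
             "subject2": np.array([5.0, 5.0, 5.0])}
print(identify(np.array([0.1, 1.1, 1.9]), templates))  # → subject1
```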
CHAPTER 3: PRE-PROCESSING AND FEATURE EXTRACTION METHODOLOGY

This chapter focuses on how the data was collected and processed, based on the ECG collection method explained in chapter 1 and the pre-processing and feature extraction methods explained in chapter 2. The ECG database used in this project was recorded by a senior student who had been working on a project related to biometrics. The collection process is still relevant to this report and is explained in detail in the later sections of this chapter. The chapter also discusses in detail the method chosen to process the signal. The different stages of operation performed on the dataset, and the results obtained, are presented for better understanding, giving a clear picture of why the discussed algorithm was chosen. The process flow of the whole system is shown in figure 3-1:

- ECG collection (experimental setup)
- Filter and window (pre-processing)
- Autocorrelation (feature extraction)
- PCA (dimension reduction)
- Neural network (classification)

Figure 3-1: Process flow diagram of the steps involved in the biometric system
3.1 Experimental Setup

The ECG database used in this project was collected using Lead I. The software used to collect and measure the data was Matlab R2009b, in combination with the gMobilab+ hardware by g.tec (Guger Technologies), electrodes and personal computers. The figure below shows the positions of the electrodes placed on the human body.

Figure 3-2: Position of electrode placement to collect ECG data

The electrodes placed on the lower left and right ribs form the Einthoven's triangle mentioned in chapter 1. The data was collected from 35 subjects, with ages between 21 and 25 years, weights between 50 and 100 kg, and heights between 160 and 200 cm. During collection the subjects sat in a comfortable position on a sofa and were given some time to come to a state of relaxation, with their palms resting on the handles of the sofa on either side. The signals were obtained from
the subjects at a sampling frequency of 250 Hz. Ten sets of signals were collected per person, and each signal was recorded for a length of 60 seconds [2].

3.2 Methodology used

This part goes through the method used for signal processing and for collecting the features from every signal. There are several stages of work in each of these two steps. In general the steps are:

- Pre-processing
  - The first 30 seconds of the signal are taken.
  - The signal is normalised.
  - The ECG records are filtered to remove noise.
  - The filtered signal is windowed.
- Feature extraction
  - The normalised autocorrelation coefficients are extracted from the filtered signal.
  - The PCA covariance matrix is generated for classification.

The same steps are repeated for all 35 individuals in creating the biometric recognition system.

3.2.1 Pre-processing

The ECG signals in the database are filtered to remove the noise. The signals do not only have high- and low-frequency noise components; they also have muscle noise, which overlaps with the original signal. Traditional methods like a bandpass filter can often degrade a signal, whereas a wavelet can de-noise it without appreciable degradation [3]. This is why a Daubechies wavelet 'db6' Discrete Wavelet Transform (DWT) at scale 12 is used. This removes the noise from power line interference and baseline wander. All the signals are standardised at a sampling frequency of 256 Hz.
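The wavelet denoising step can be sketched with PyWavelets as below. This is a minimal sketch, not the project's MATLAB code: the decomposition level, the universal soft threshold estimated from the finest scale, and the synthetic test trace are all assumptions (the report itself quotes 'db6' at "scale 12").

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
fs = 256
t = np.arange(0, 4, 1 / fs)                       # 1024 samples
clean = np.sin(2 * np.pi * 1.2 * t)               # stand-in for an ECG trace
noisy = clean + 0.2 * rng.standard_normal(t.size)

# Decompose with the 'db6' Daubechies wavelet, soft-threshold the
# detail coefficients, and reconstruct.
coeffs = pywt.wavedec(noisy, "db6", level=6)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # noise scale estimate
thr = sigma * np.sqrt(2 * np.log(noisy.size))     # universal threshold
coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db6")[: noisy.size]
```

Soft thresholding shrinks every detail coefficient toward zero, which is what preserves the waveform shape better than a hard frequency cutoff.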
Figure 3-3: Noisy ECG signal
Figure 3-4: Normalised noisy ECG signal

The signal in figure 3-3 is not very sharp and clear. Figure 3-4 shows the signal normalised to within −1 to 1.

Figure 3-5: Noisy ECG signal in the frequency domain

The unfiltered signal in the frequency domain in figure 3-5 shows a spike at 50 Hz. This noise comes from power line interference.

Figure 3-6: Filtered normalised ECG signal

Figure 3-6 shows the de-noised signal; after filtering, the signal is much cleaner and sharper.

Figure 3-7: Filtered ECG signal in the frequency domain

The filtered signal in the frequency domain in figure 3-7 shows no spike at 50 Hz, which is evidence that the signal is filtered properly. A workspace folder is created where all the ECG records are stored. When the program runs, each signal passes through the pre-processing stage and then on to the feature extraction stage, one after another. This way the signals from all 35 individuals are filtered before being taken in for feature extraction. Figure 3-8, which shows the filtered and normalised signals of 3 subjects, is evidence that each person has their own unique ECG signal.

Figure 3-8: ECG from 3 subjects
Figure 3-9: Hamming window
Figure 3-10: Filtered ECG signal after windowing

The ECG is a repetitive signal, and when it is replicated after being sampled, spectral leakage occurs [15]. To avoid discontinuities of the repetitive structure when it is tiled, and thereby reduce spectral leakage, a window function is used. A Hamming window is applied to the filtered signal to avoid spectral leakage; this gives a much smoother transition.
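Applying the window is a pointwise multiplication, sketched below with a synthetic stand-in segment (the 5-second window length is taken from the autocorrelation discussion later in this chapter and is an assumption here):

```python
import numpy as np

fs = 256
win_len = 5 * fs                                   # a 5-second analysis window
x = np.sin(2 * np.pi * 1.2 * np.arange(win_len) / fs)  # stand-in ECG segment

w = np.hamming(win_len)        # tapers both ends toward ~0.08, 1.0 at centre
windowed = x * w               # smooth edge transition, reduced leakage
```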
3.2.2 Autocorrelation based feature extraction

This non-fiducial approach was chosen over fiducial detection because it is fully automatic. The main function of autocorrelation is to extract a set of significant data from a given data set, representing the data given to it in a lower dimension. Since the ECG is highly repetitive, extracting autocorrelation features from it is straightforward. As mentioned under the proposed methodology in chapter 2, the AC is shift invariant, so it gathers similar features over multiple heart beat cycles [4]. It blends all the samples in a window into a sequence of sums of products, and it captures the representative and highly distinctive features of the recorded signals.

Figure 3-11 shows the autocorrelated data obtained from the windowed signal, and figure 3-12 shows the segmented autocorrelated data obtained by taking 400 data points from the maximum. The QRS complex shows the least variability under different conditions [4], so 400 points from the maximum of the autocorrelated data, equivalent to the length of the QRS complex, are taken. If the window length is 5 seconds, then the autocorrelated data spans 10 seconds.

Figure 3-11: Autocorrelated data
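The extraction of the 400-point AC segment can be sketched as below, assuming the chapter-2 normalisation (divide by the zero-lag value) and a random stand-in window; the function name is hypothetical.

```python
import numpy as np

def ac_features(x, n_points=400):
    """Normalised autocorrelation of a windowed ECG, keeping n_points
    lags starting at the maximum (the zero-lag peak), per Eq. (1)."""
    ac = np.correlate(x, x, mode="full")   # lags -(N-1) .. (N-1)
    peak = int(np.argmax(ac))              # zero lag sits at the centre
    seg = ac[peak : peak + n_points]       # "from the maximum to the right"
    return seg / seg[0]                    # normalise so the peak equals 1

rng = np.random.default_rng(0)
window = rng.standard_normal(5 * 256)      # stand-in for a 5 s windowed ECG
feat = ac_features(window)
```

With full-mode correlation the output covers both negative and positive lags, which is why the data for a 5-second window spans 10 seconds, as noted above.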
Figure 3-12: 400 AC points from the maximum to the right

Principal Component Analysis is then applied to the autocorrelated data to further reduce its dimension [9]. It discards the insignificant factors, using a linear transformation, and thus reduces the dimension. The PCA yields Hotelling's T² statistic, which gives the multivariate distance of each observation from the centre of the data set. Figure 3-13 shows the plot of the T² statistic of 3 individuals; it can be seen clearly that each is different from the others.

Figure 3-13: PCA Hotelling's T² statistic data from 3 subjects

These T² statistics shall be used as input features for the classification stage in the next chapter. The process flow diagram of the whole system was shown in figure 3-1.
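The PCA eigendecomposition of chapter 2 and the T² statistic used here can be sketched together as below. This is an assumed NumPy reconstruction (the project used MATLAB's `princomp`-style output), with random stand-in data; T² for each observation is the sum of its squared PCA scores divided by the corresponding eigenvalues.

```python
import numpy as np

def hotelling_t2(X, n_components):
    """Hotelling's T^2 per observation (rows of X), using the leading
    principal components: T2_i = sum_j scores[i, j]^2 / lambda_j."""
    Xc = X - X.mean(axis=0)                       # centre the data
    cov = np.cov(Xc, rowvar=False)                # covariance matrix Sigma_x
    lam, P = np.linalg.eigh(cov)                  # eigendecomposition, Eq. (5)
    order = np.argsort(lam)[::-1][:n_components]  # descending-energy order
    lam, P = lam[order], P[:, order]
    scores = Xc @ P                               # Y = P^T X, per observation
    return np.sum(scores ** 2 / lam, axis=1)

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 10))                 # stand-in AC feature rows
t2 = hotelling_t2(X, 3)
```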
CHAPTER 4: CLASSIFICATION

This chapter is about the last stage of this project: the classification of the features of the different subjects according to the class assigned to them. A set of features is given to the classification stage so that it can differentiate among the features and assign them to particular individuals. A class is created for each individual, and under that class a large set of data is given so the system can train itself on the variations an individual's data might have. Based on this logic, the recognition stage is completed.

4.1 Neural Network

The method used for classification in this project is the neural network (NN). The name comes from the biological network in which biological neurons are functionally connected in a nervous system [16]. Similarly, the NN used in the classification stage has artificial neurons in interconnected groups, which use the connections for computational purposes. It takes in inputs and adjusts its structure (weights) to generate a pattern that connects the input to the output. The process is shown in the flow diagram below.

Figure 4-1: Process flow diagram of a Neural Network [10]

In the flow diagram in figure 4-1 the network is given an input. Its function is to try to match the output to the target by adjusting its weights until the output is a close match to the target. This process can be compared with the way the neurons in a human brain work: the network works in its own way and finds a solution on its own. It does not rely on
anything else to find the solution. There are things a human might not notice that will not be missed by the neural network. It processes information using the high degree of interconnection between its many simple processing units, which work together to achieve massively parallel distributed processing. The neural network is remarkable especially for its:

- Adaptive learning: the ability to learn a task based on the input given to it
- Self-organisation: while learning, it organises the data given to it
- Real-time operation: computations are carried out in parallel
- Fault tolerance: performance degrades only gradually with partial degradation of the network

4.1.1 Pattern Recognition Network

Several functions can be implemented with a neural network. This classification task requires a pattern recognition tool, so the pattern recognition function was chosen. An input matrix is formed in which each column contains the feature vector obtained from an individual. A target matrix is fed in along with it, with the same number of columns as the input matrix; the purpose of the target matrix is to show, in each of its columns, the class to which the corresponding column of the input belongs.

There are a total of 35 subjects from whom signals were collected, and for each subject the feature vectors are generated. A total of 10 trials were collected per subject. One of the 10 trials for each subject is kept to be used as a test input; the other 9 trials from the 35 people form the training set (input) matrix, which has 315 columns of feature vectors. A target matrix of size 35 by 315 is formed. Since the training set matrix and the target matrix have the same number of columns, each column in the target matrix corresponds to a column in the training set matrix, and each column can be assigned to an individual by setting one row of that column to 1. For example, if the first 9 feature vector columns in the training set belong to person 1, then each of the first 9 columns of the target matrix has a 1 in its first row. A portion of the target matrix is shown in the table in figure 4-2 for understanding.
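The construction of the 35-by-315 one-hot target matrix described above can be sketched as follows, assuming the training columns are grouped subject by subject (9 consecutive columns each, as in the person-1 example):

```python
import numpy as np

n_subjects, trials_per_subject = 35, 9
n_train = n_subjects * trials_per_subject      # 315 training columns

# Column k of the target matrix holds a single 1 in the row of the
# subject that column k of the training matrix belongs to.
target = np.zeros((n_subjects, n_train))
for subj in range(n_subjects):
    target[subj, subj * trials_per_subject : (subj + 1) * trials_per_subject] = 1
```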
Figure 4-2: A portion of the target matrix for training

The one set of data kept separate from each subject adds up to a set of 35 records. Thus a matrix of 35 columns of feature vectors is formed, to be used as test inputs to the network after it has been trained with the training set matrix and the target matrix.

The input space is shared by all the neurons, and one analog neuron is assigned for each pattern to be recognised [2]. Several neural nets work in parallel in pattern recognition, trying to find features that are common along the columns of the matrix; these features are passed from one to another to find the closest match. This is how the neural network performs recognition. All the network needs is sufficient data for training itself for pattern recognition, and an output with a properly structured algorithm.

4.1.2 Feed-forward Back-propagation network

A feed-forward back-propagation network is used. The feed-forward network is the simplest kind of neural network: the signal travels from the input to the output in one direction only, with no feedback from any part of the network to another. It has three layers: an input layer, a hidden layer and an output layer.

Figure 4-3: Feed Forward Network

A multi-layer perceptron (MLP) is used in the feed-forward network, which is what makes it so useful. Multiple layers are interconnected, where each layer is a computational unit, and each neuron in a layer is connected to every neuron in the subsequent layer. The function of the network is to get a close match of the output to the target; it keeps training itself with different weights and parameters, and the real intelligence of the network lies in the adjustment of the weights.

An MLP can use several kinds of learning method. Here, the weights are adjusted with a learning algorithm known as Back Propagation (BP). An error function is pre-defined in the network, whose job is to compare the output with the target matrix, which indicates the correct answer, and compute the error. The error is then sent back through the network, and the algorithm uses it to adjust the weights of each layer. This process is repeated until the error value is small; when the error becomes small, the network has determined the target function. The learning method used by the back-propagation network is called supervised learning [19]: it back-propagates the errors through the network and is able to calculate the desired output from the given input.

The method used in back-propagation is derived from the delta rule, a gradient descent algorithm. The training function used in this project is the scaled conjugate gradient (SCG), a second-order algorithm from the family of conjugate gradient algorithms, which assists in achieving the target function of several variables. There are two main reasons for choosing it:
1. It uses a step-size scaling mechanism, which avoids a line search per learning iteration and in turn reduces the time consumed [7].
2. It combines the model trust region from the Levenberg-Marquardt algorithm with the conjugate gradient approach, which makes SCG faster than other second-order algorithms [20].

The parameters used in the training process are epochs, show, goal, time, min_grad, max_fail, sigma and lambda. The weight vector is a point in the weight space, and minimisation is an iterative process to minimise the approximation of the function [2]. The quadratic approximation of the error function in SCG around the weight vector w is

Eqw(y) = E(w) + E′(w)ᵀ y + ½ yᵀ E″(w) y    (1)

The critical point of Eqw(y) must be found to obtain its minimum. The equation for finding the critical point is

E′qw(y) = E″(w) y + E′(w) = 0    (2)

The neural network has a limitation in that it works on a trial and error basis: the network has to be trained again and again, changing its parameters and adjusting its weights. Although this process is somewhat time consuming, ultimately it is able to give an optimum output.

4.2 Process of Classification

This section explains in detail every step, from training the network to testing it, along with the results obtained.

4.2.1 Feature Set

ECG signals were collected from 35 subjects, with 10 trials per subject at a sampling frequency of 256 Hz, giving a total of 350 separate signals. As explained in the previous sections, 9 of the 10 trials per person were taken for training the system, so 315 signals were used to train the system in total. The 1 signal kept aside from each person adds up to 35 signals to be used for testing. The 315 training signals each give a set of feature vectors. A matrix is formed in which each column represents the features extracted from one signal; the training matrix thus has 9 columns of extracted features per person. This helps to form the target matrix, in which each column corresponds to a certain subject and the row with a value of one indicates the person.

4.2.2 Training the network

The first step when the neural network tool starts is to choose the pattern recognition function, since we are trying to recognise individuals using features collected from their signals. Then the training matrix and its corresponding target matrix are loaded. The training matrix is used as the set of inputs, and the target matrix shows which feature vector belongs to which individual. The system is simulated based on these two matrices. The network has to be trained several times until the percentage error is small; while training repeatedly, some parameters, like the number of neurons, may be altered to achieve a good performance and a lower error percentage from the system. As can be seen from the table below, each class is assigned to one individual.

Table 4-1: Subjects assigned to classes
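The report trains MATLAB's pattern-recognition network with SCG. As a rough illustration of the same idea (one-hot targets, one hidden layer, errors propagated backwards to adjust the weights), the NumPy sketch below uses plain gradient descent instead of SCG, on made-up data with 3 "subjects"; sizes, learning rate and iteration count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 3 "subjects", 9 training vectors each, 8 features.
X = np.vstack([rng.normal(c, 0.3, size=(9, 8)) for c in range(3)])
T = np.repeat(np.eye(3), 9, axis=0)           # one-hot target matrix

W1 = rng.normal(0, 0.5, (8, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 3)); b2 = np.zeros(3)
lr = 0.3

for _ in range(500):
    H = np.tanh(X @ W1 + b1)                  # hidden layer
    Z = H @ W2 + b2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)         # softmax outputs
    G = (P - T) / len(X)                      # output-layer error
    GH = (G @ W2.T) * (1 - H ** 2)            # error sent back through the net
    W2 -= lr * H.T @ G;  b2 -= lr * G.sum(0)  # adjust output weights
    W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0) # adjust hidden weights

pred = P.argmax(axis=1)                       # class decision per input
```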
The network is trained class by class to see the variations in features that one subject might have; this helps it figure out the function for the network. The behaviour of the network during training and testing, in terms of time, performance, success rate, failure rate, iterations and other criteria, is discussed in detail below. The neural network training interface is shown in figure 4-4. At the top the architecture of the network, with its layers, can be seen. The 50 written below the hidden layer shows that 50 hidden neurons are used; the default is 10, but by trial and error it was found that 50 neurons in the hidden layer gave the best result. The goal of the mean squared error is to reach 0 according to the network. The validation check shows 6 maximum failures, meaning that if the number of failures reaches 6 the system stops its process. The gradient value also has to be accounted for, because as soon as its minimum value is met the system halts too. From the training tool it can be seen that the target was met after 97 iterations, at a gradient of 0.000746.

Figure 4-4: Neural Network Training tool
The plots section in the training tool offers different methods of analysing the network. The first is the performance graph, derived from the training, validation and testing sets. When the training matrix and target matrix are given for training, the system automatically takes a certain percentage of the training data for validation and testing; in this case the percentage stands at 15%. This is done because validation tells the system to stop if the number of failures reaches the maximum, meaning the network generalisation is no longer showing any improvement. The test samples are mainly used to check the performance of the network, that is, to see how effective it is. The graph in figure 4-5 shows a representation of this.

Figure 4-5: Performance graph of the network

It can be seen from the graph in figure 4-5 that the training data crossed the mean squared error threshold long before the validation and test data. The test curve remains on top, showing that the test performance is good and the network is working correctly. The validation curve is right below the test curve, and both pass the best threshold at the same time. The
circle on the graph shows the point where the best iteration is achieved, at 91 iterations. This is the returning point of the network. Another graph to observe is the training state graph in figure 4-6, plotted from the training state during training of the network. It shows how the goal of an MSE of 0 was pursued, displaying the gradient flow and the validation checks in two separate graphs.

Figure 4-6: Training state graph of gradient and validation checks

4.2.3 Testing the network

The test samples taken by the network itself during training are few, so it is better to check with a larger database. The matrix created from the feature vector set of 1 signal from each of the 35 persons is used to test the system, along with a target matrix created for it.

After the target function is achieved with the lowest error percentage, the network is tested with a set of inputs. This is done to check how correct the formed network is: it is necessary to check whether the network works properly with a different set of inputs. Thus an input matrix with one feature set obtained from each subject is used to test the network, and a target matrix corresponding to the input matrix is also formed. The test matrix has 35 columns, where each column represents the feature vector extracted from the signal kept aside from one person. It must be made sure that the trial number used is the same for all 35 subjects: if trial 10 is used for testing, then the input taken from every subject must be trial 10. Several aspects are studied in the process. The very first shows the percentage of recognition on a confusion matrix. The confusion matrix for 35 subjects is too large to show in detail, so a confusion matrix with 20 tested inputs is given below in figure 4-7.

Figure 4-7: Confusion matrix with test results of 20 subjects
A confusion matrix represents the target classes with respect to the number of inputs, showing the whole matrix of results obtained. The green diagonal shows the correct results; this diagonal is the fusion of the target input matrix with the classes. A 1 in a box along the diagonal shows that the individual has been recognised, and a 0 means the individual could not be recognised. The blue box in the last column of the last row shows the percentage of recognition, written in green, and the value is 100%: 20 out of 20 people were correctly identified, which is a very good result. In the same way the recognition rate for all 35 subjects was found to be 95%, also a very good result. Thus it is an efficient biometric system.
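Reading the recognition rate off the confusion matrix amounts to summing its diagonal and dividing by the number of test inputs. A small sketch, with made-up labels standing in for the network's decisions:

```python
import numpy as np

def confusion_and_rate(true_classes, predicted_classes, n_classes):
    """Confusion matrix (rows: true class, columns: predicted class)
    and the overall recognition rate read off the matrix diagonal."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_classes, predicted_classes):
        cm[t, p] += 1
    rate = np.trace(cm) / len(true_classes)
    return cm, rate

true = list(range(20))
pred = list(range(20))            # every test subject recognised correctly
cm, rate = confusion_and_rate(true, pred, 20)  # rate of 1.0 matches the 100%
```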
  • 53. 43 CHAPTER 5: DISCUSSION This chapter will provide and overall discussion on all that have been done starting from the collection of ECG signals until recognition of an individual using a person‟s ECG signal characteristics. There were some drawbacks, benefits and observations made while working with this project. They will be discussed here elaborately. The first chapter discusses the Heart structure and also explains the procedure of obtaining ECG from it. It shows the different ways that ECG can be collected from a human. In this experiment the ECG is obtained using Lead 1 forming an Einthoven‟s triangle. This is one thing that is different from many other projects where the ECG is taken in other different forms. The second chapter discusses a previous study where a biometric recognition system is made. Its process is described so give an overview of the processes and steps involved in creating a biometric recognition system. The other reason for writing about that specific paper was to explain how the PCA method was used previously. PCA was going to be used in this project in a different way. That chapter also described the initial proposed methodology. The initial methodology had a pre- processing stage but the feature extraction part provided two possible ways to handle it. One was fiducial feature detection and the other was the non-fiducial feature extraction. The fiducial feature extraction was related to extracting temporal, amplitude and angular features. The non-fiducial feature was based on extracting the auto correlated data and reducing the dimension of the autocorrelated data using PCA. The third chapter showed the methodology that was followed. In the pre-processing stage a wavelet filter (daubechies) was used instead of the bandpass filter. This was done because the wavelet filter was found to have caused lower damage to the signal when filtering it. Then the signal was windowed to avoid the possibility of spectral leakage. 
The other reason was that the recognition rate with windowing was found to be better. In the feature extraction stage both fiducial and non-fiducial features were tried, and the non-fiducial features gave the better result. Fiducial points were not chosen
  • 54. 44 because that approach picks specific data points out of the signal and works only with them, which limits or ignores other details the signal may carry; the programming is also longer and more time consuming. The non-fiducial features extracted by autocorrelation are more global: rather than focusing on specific attributes, the autocorrelation draws out the significant structure of the signal by itself and discards the insignificant parts. A further disadvantage of fiducial points is that we do not know in advance which features will work well. The features of an ECG vary between recordings even for the same person, and some features simply do not yield good results, which is one more reason why fiducial points are not a good choice. The chosen feature extraction method is therefore autocorrelation. Autocorrelation produces a large number of features, so the focus is placed on 400 data points taken from the maximum of the autocorrelation towards the right. This is the region of the QRS complex, which was found not to change much for the same person over time. Even so, the autocorrelated data is still large, so PCA is used to reduce the dimension further. The Hotelling T2 statistic is the quantity of interest here: the T2 values extracted from the autocorrelated data of each signal are taken as the final features for the classification stage. In the fourth chapter the features extracted for each person are used to classify and identify the correct individual. A neural network was chosen for this because it is a very powerful tool that can point out things a human can miss. It works like the biological neural network of the human brain: each neuron is a computational unit interconnected with the neurons of the subsequent layers, so a large number of simple processing units work together to achieve massively parallel distributed processing.
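The non-fiducial feature extraction described above (normalised autocorrelation, 400 lags to the right of the peak, then PCA with the Hotelling T2 statistic) can be sketched in Python. This is an illustrative re-sketch under stated assumptions, not the project's MATLAB code: `ac_features` and `hotelling_t2` are hypothetical names, the input is synthetic noise rather than real ECG, and T2 is computed here across a matrix of segments via an SVD-based PCA.

```python
import numpy as np

def ac_features(signal, n_lags=400):
    """Normalised autocorrelation, keeping n_lags points to the right of
    the zero-lag maximum (the region dominated by the QRS complex)."""
    x = signal - signal.mean()
    r = np.correlate(x, x, mode="full")
    r = r / r.max()                  # zero-lag peak normalised to 1
    peak = np.argmax(r)
    return r[peak:peak + n_lags + 1]

def hotelling_t2(X):
    """Hotelling T^2 statistic per observation from PCA scores
    (rows of X are observations, columns are features)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                   # principal component scores
    n = X.shape[0]
    var = (s ** 2) / (n - 1)         # variance along each component
    keep = var > 1e-12               # drop numerically degenerate components
    return np.sum(scores[:, keep] ** 2 / var[keep], axis=1)

rng = np.random.default_rng(0)
# Ten synthetic "recordings" standing in for windowed ECG segments.
segments = np.vstack([ac_features(rng.standard_normal(2048)) for _ in range(10)])
t2 = hotelling_t2(segments)
print(t2.shape)  # (10,) — one T^2 value per segment
```

MATLAB's `princomp`, used in the appendix listings, returns this T2 statistic directly as its fourth output.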
The requirement is that the network should be able to recognise a pattern of data and assign it to a certain class, which is why a pattern recognition tool within the neural network toolbox is chosen. A feed-forward network is used because it is the simplest of the available network types. The network is a multi-layer perceptron (MLP), in which each layer is a computational unit, so multiple computational units work together to achieve the best result. The MLP is trained with the back-propagation learning algorithm, which is built around an error function: the error is calculated every time the network is trained with data, and the goal is to minimise it. The
  • 55. 45 back-propagation algorithm uses supervised learning to send the error back through the network until the target is met. The error minimisation is carried out with the scaled conjugate gradient function, which has the benefit of not performing a line search at each learning iteration; this reduces the time it consumes, and it is faster than other second-order algorithms. Chapter 4 also examined the performance of the network from several perspectives, and the results were good in each case, so choosing a neural network for the classification stage was a good decision. In the last step of chapter 4, after the network was created, it was tested with new inputs and achieved a high recognition rate: 100% for 20 individuals and 95% for 35. Although not all 35 individuals were identified correctly, the percentage is still comparatively high and good, and it can be concluded that this is a good biometric recognition system.
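The supervised training loop described above can be sketched on a toy problem. This Python sketch is not the project's MATLAB pattern-recognition network: it uses plain gradient descent rather than scaled conjugate gradient, invented toy patterns, and an arbitrary hidden-layer size, purely to show the forward pass, the output-layer error, and its propagation back through a hidden layer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy pattern-recognition task: four noisy input patterns, four classes.
X = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # inputs, one per class
T = np.eye(4)                                        # one-hot target classes

# One hidden layer of 8 units (a small multi-layer perceptron).
W1 = 0.5 * rng.standard_normal((4, 8))
W2 = 0.5 * rng.standard_normal((8, 4))

def forward(X):
    H = np.tanh(X @ W1)                              # hidden activations
    Z = H @ W2
    E = np.exp(Z - Z.max(axis=1, keepdims=True))     # softmax outputs
    return H, E / E.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(2000):                # plain gradient descent on the error
    H, Y = forward(X)
    dZ = (Y - T) / len(X)            # error at the output layer
    dH = (dZ @ W2.T) * (1 - H**2)    # error propagated back through tanh
    W2 -= lr * (H.T @ dZ)            # weight updates from the ...
    W1 -= lr * (X.T @ dH)            # ... backpropagated errors

_, Y = forward(X)
acc = (Y.argmax(axis=1) == T.argmax(axis=1)).mean()
print(acc)  # 1.0: all four toy patterns classified correctly
```

Scaled conjugate gradient replaces the fixed learning-rate step here with a conjugate search direction and a model-trust-region step size, which is what makes it faster than plain descent on larger problems.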
  • 56. 46 CHAPTER 6: CONCLUSION Fiducial detection is not a reliable option for recognition. The non-fiducial approach, in contrast, shows good results and reliable, durable performance; it also has computational advantages and is faster than the fiducial detection algorithm. In a previous research paper a biometric recognition system was designed using autocorrelation (AC) based feature extraction and dimensionality reduction with the Discrete Cosine Transform (DCT). That AC/DCT research was based on 11 subjects and achieved 90% recognition. The methods used in this project are: 1. filtering the signal with a wavelet filter, which damages the signal less; 2. windowing the whole filtered signal to avoid spectral leakage; 3. reducing the dimensionality with a different algorithm, Principal Component Analysis (PCA), from which the Hotelling T2 statistic is used for classification. The result obtained over a scope of 35 individuals was 95%, so it can be concluded that this is a better system than the AC/DCT one. 6.1 Future work and recommendation The system has shown good performance and results, so it would be a good idea to carry out further research on it in the future. The result for 35 individuals is not 100%: when the number of individuals was 20 the recognition was 100%, and with more individuals the percentage has gone down. This is something that can be improved in the future. A possible way to improve it would be to gather a larger database. ECG signals tend to change over time, so more signals should be collected from each individual over a long period so that as many variations as possible in a person's signal are captured. The network can then be trained so that it has a better idea of the variability of each person's features and can base its decision on a larger range of data.
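The spectral-leakage motivation behind step 2 can be illustrated with a short sketch. This is a hypothetical Python demonstration, not the project's MATLAB code: a synthetic tone whose frequency falls between FFT bins is analysed with and without a Hamming window, and the `leakage` helper (an invented name) measures how much spectral energy escapes the bins around the peak.

```python
import numpy as np

fs = 256                        # sampling rate used for the ECG records (Hz)
t = np.arange(fs * 2) / fs      # two seconds of signal
# A tone whose frequency does not fall exactly on an FFT bin, so an
# unwindowed (rectangular) analysis smears energy into neighbouring bins.
x = np.sin(2 * np.pi * 10.3 * t)

def leakage(sig):
    """Fraction of spectral energy outside the 5 bins nearest the peak."""
    S = np.abs(np.fft.rfft(sig)) ** 2
    k = np.argmax(S)
    inside = S[max(k - 2, 0):k + 3].sum()
    return 1 - inside / S.sum()

rect = leakage(x)                       # no window (rectangular)
hamm = leakage(x * np.hamming(len(x)))  # Hamming-windowed
print(rect > hamm)  # True: the window concentrates energy near the peak
```

The same effect applies to the windowed ECG segments: tapering the segment ends keeps each signal's energy localised in frequency before the autocorrelation features are taken.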
  • 57. 47 References
[1] Augustyniak, P., & Tadeusiewicz, R. (2009). Ubiquitous Cardiology: Emerging Wireless Telemedical Applications. United States of America: Information Science Reference, pp. 11-54.
[2] Madzri, F. R. B. (2010). Study and Classification of ECG Signals (Unpublished Master's thesis). Multimedia University, Cyberjaya, Malaysia.
[3] Jianchu Yao and Yongbo Wan, A Wavelet Method for Biometric Identification Using Wearable ECG Sensors, Proceedings of the 5th International Workshop on Wearable and Implantable Body Sensor Networks, The Chinese University of Hong Kong, HKSAR, China, June 1-3, 2008.
[4] Konstantinos N. Plataniotis, Dimitrios Hatzinakos and Jimmy K. M. Lee, ECG Biometric Recognition Without Fiducial Detection, The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 2006.
[5] Yongjin Wang, Foteini Agrafioti, Dimitrios Hatzinakos and Konstantinos N. Plataniotis, Analysis of Human Electrocardiogram for Biometric Recognition, August 2007.
[6] Yogendra Narain Singh and Phalguni Gupta, ECG to Individual Identification.
[7] Zulhadi Zakaria, Nor Ashidi Mat Isa and Shahrel A. Suandi, A Study on the Neural Network Training Algorithm for Multiface Detection in Static Images, Universiti Sains Malaysia, February 24-26, 2010.
[8] Yongbo Wan and Jianchu Yao, A Neural Network to Identify Human Subjects with Electrocardiogram Signals, 2008.
[9] Ognian Boumbarov, Yuliyan Velchev and Strahil Sokolov, ECG Personal Identification in Subspaces Using Radial Basis Neural Networks, IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Technical University of Sofia, Bulgaria, September 21-23, 2009.
[10] Robyn Ball and Philippe Tissot, Demonstration of Artificial Neural Network in Matlab, Division of Nearshore Research, Texas A&M University.
[11] S. Zahra Fatemian and Dimitrios Hatzinakos, A New ECG Feature Extractor for Biometric Recognition, The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto.
[12] Y. Gahi, M. Lamrani, A. Zoglat, M. Guennoun, B. Kapralos and K. El-Khatib, Biometric Identification System Based on Electrocardiogram Data, Faculté des Sciences de Rabat, Morocco, and University of Ontario Institute of Technology, Oshawa, Canada.
  • 58. 48 [13] Loh Sik Hou, Khazaimatol S. Subari and Syed Syahril, QRS Complex of ECG-based Biometrics in Two Level Classifier, Faculty of Engineering, Multimedia University, Cyberjaya, Malaysia.
[14] Supervised Learning in Neural Networks, 2012. Retrieved from http://www.neuralnetworksolutions.com/nn/supervised3.php
[15] MATLAB 'spectrogram' params. Retrieved from http://stackoverflow.com/questions/5887366/matlab-spectrogram-params
[16] Neural network, Wikipedia. Retrieved from http://en.wikipedia.org/wiki/Neural_network
[17] Artificial neural network, Wikipedia. Retrieved from http://en.wikipedia.org/wiki/Artificial_neural_network
[18] Feedforward neural network, Wikipedia. Retrieved from http://en.wikipedia.org/wiki/Feedforward_neural_network
[19] Backpropagation, Wikipedia. Retrieved from http://en.wikipedia.org/wiki/Backpropagation
[20] Scaled Conjugate Gradient (SCG), 1995. Retrieved from http://www.csc.kth.se/~orre/snns-manual/UserManual/node243.html#SECTION0010172000000000000000
[21] Justin Leo Cheang Loong, Khazaimatol S. Subari, Rosli Besar and Muhammad Kamil Abdullah, A New Approach to ECG Biometric Systems: A Comparative Study between LPC and WPD Systems, 2010.
  • 59. 49 Appendix A Matlab Code
1. Training Program
S = load('trainingsamples.mat');
C = struct2cell(S);
for m = 1:315
    A = cell2mat(C(m));
    N = 20*256;
    my_signal = A(1:N,5);
    x = my_signal;
    % Normalised signal [-1 1]
    x = x - mean(x);            % cancel DC component
    x = x / max(abs(x));
    N = length(x);              % signal length
    t = [0:N-1]/256;
    figure(1)
    plot(t,x);
    % Filtered signal (wavelet denoising, db6)
    [thr,sorh,keepapp] = ddencmp('den','wv',x);
    xd = wdencmp('gbl',x,'db6',3,thr,sorh,keepapp);
    w = hamming(N,'periodic');
    sig = w.*xd;
    % Autocorrelation
    r = xcorr(sig,'coeff');
    k = length(r);
    txcorr = [0:k-1]/256;
    plot(txcorr,r)
    [rmax,y] = max(r);          % peak of the autocorrelation
    a = txcorr(y);
    ny = y + 400;
    b = txcorr(ny);
    xnew = r(y:ny);             % 400 lags to the right of the peak
    k = length(xnew);
    txcorr = [0:k-1]/256;
    plot(txcorr,xnew);
    % Principal Component Analysis
    [coeff(:,m),score(:,m),latent(:,m),tsquare(:,m)] = princomp(xnew);
    plot(txcorr,tsquare(:,m));
end
  • 60. 50 2. Testing Program
S = load('trainingsamples.mat');
C = struct2cell(S);
for m = 1:35
    A = cell2mat(C(m));
    N = 20*256;
    my_signal = A(1:N,5);
    x = my_signal;
    % Normalised signal [-1 1]
    x = x - mean(x);            % cancel DC component
    x = x / max(abs(x));
    N = length(x);              % signal length
    t = [0:N-1]/256;
    figure(1)
    plot(t,x);
    % Filtered signal (wavelet denoising, db6)
    [thr,sorh,keepapp] = ddencmp('den','wv',x);
    xd = wdencmp('gbl',x,'db6',3,thr,sorh,keepapp);
    w = hamming(N,'periodic');
    sig = w.*xd;
    % Autocorrelation
    r = xcorr(sig,'coeff');
    k = length(r);
    txcorr = [0:k-1]/256;
    plot(txcorr,r)
    [rmax,y] = max(r);          % peak of the autocorrelation
    a = txcorr(y);
    ny = y + 400;
    b = txcorr(ny);
    xnew = r(y:ny);             % 400 lags to the right of the peak
    k = length(xnew);
    txcorr = [0:k-1]/256;
    plot(txcorr,xnew);
    % Principal Component Analysis
    [coefftest(:,m),scoretest(:,m),latenttest(:,m),tsquaretest(:,m)] = ...
        princomp(xnew);
    plot(txcorr,tsquaretest(:,m));
end
  • 61. 51 3. Training and Input Target Matrices
% 35 subjects with 9 training recordings each (35 x 9 = 315 columns).
% The long row-by-row concatenation in the original listing builds this
% block pattern (note that it needed ';' between rows to form a matrix
% rather than one long row vector); the compact equivalent is:
training_target = kron(eye(35), ones(1,9));   % 35 x 315 matrix of 0s and 1s

% One test recording per subject, so the test target is the identity matrix.
input_target = eye(35);                       % 35 x 35

  • 63. 53 4. Classification Testing
outputs = sim(net,tsquaretest);
outputs = round(outputs);
figure(1);
plotconfusion(input_target,outputs)