Interdisciplinary Project
Master's Degree Program in Computer Science

Evaluation Of Learning Algorithms For Tracking Neuronal Signals With EDC Probes

Author: Ramin Zohouri

Autonomous Intelligent Systems Laboratory, Department Of Computer Science
Microsystem Material Laboratory, Department Of Microsystem Engineering
University Of Freiburg

Examiners: Prof. Dr. Wolfram Burgard and Prof. Dr. Oliver Paul
Supervisor: Dr. Barbara Frank
ABSTRACT
CMOS-integrated electronic depth control (EDC) probes are high-density microelectrode arrays used to monitor the activity of a neuron of interest within an ensemble of neurons. EDC probes are capable of selecting channel configurations in order to measure signals in different regions along the probe shank. However, the probe and the neural activity may shift in the brain, and it is necessary to keep track of activities in the brain by choosing the next channel configuration.
In this project, we try to identify recording channels and track down their activities using features extracted from their measured signals. Given the feature vectors of each channel, we first apply a pre-processing step for normalization and dimension reduction. Then we employ different supervised machine learning algorithms to identify channels and find out which algorithm is most appropriate for this task. Finally, we test the trained models on another recording session with the same channel configuration.
The prediction results show that it is possible to track down neural activity between different recording sessions. Furthermore, off-diagonal values in the confusion matrix of the test phase indicate that the probe or the activity may have shifted between consecutive recording sessions.
04.2013 - 04.2014
Ramin Zohouri
ACKNOWLEDGMENT
I have put great effort into this project. However, it would not have been possible without the kind support
and help of many individuals and organizations. I would like to extend my sincere thanks to all of them.
I am highly indebted to Prof. Dr. Wolfram Burgard for his guidance and supervision as well as for
providing necessary information regarding the project. I would like to express my gratitude towards
Prof. Dr. Oliver Paul for his kind co-operation and encouragement, which helped me in the completion of
this project.
Furthermore, I would like to thank Dr. Barbara Frank for her useful comments, remarks, and engagement
throughout the learning process of this interdisciplinary project. My thanks and appreciation also
go to Dr. Patrick Ruther for his role in developing the project, and to the EDC++ project members who have
willingly helped me out with their abilities.
CONTENTS

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Outline
2 Related Works
  2.1 Spike Detection
  2.2 Detecting Active Brain Regions
  2.3 Recording From the Same Neuron
3 Channel Identification
  3.1 Approach
  3.2 Spike Detection and Feature Computation
  3.3 Data Pre-Processing
    3.3.1 Normalization Techniques
    3.3.2 Attribute Selection
  3.4 Machine Learning for Channel Identification
    3.4.1 K-Nearest Neighbour (K-NN)
    3.4.2 Iterative Dichotomiser 3 (ID3)
    3.4.3 Random Forest
    3.4.4 Support Vector Machine (SVM)
    3.4.5 Cross-Validation
    3.4.6 Hyper-Parameter Optimization
  3.5 Tracking Down Neural Activities
4 Experiments and Results
  4.1 Dataset
  4.2 Spikes and Features
  4.3 Data Normalization and Attribute Selection
  4.4 Training and Validation of Classifiers
  4.5 Tracking Down Neural Activity
    4.5.1 Testing Trained Models
    4.5.2 Observation
5 Summary
  5.1 Conclusion
  5.2 Future Work
A Feature List
  A.1 Feature List
Bibliography
CHAPTER 1
INTRODUCTION
1.1 Motivation
Understanding brain function and the complex interactions of large neural networks with huge
numbers of neurons is one of the most challenging research fields in neuroscience. The development of
appropriate tools opens new perspectives in research and application, e.g. in neural prostheses, as well
as in the diagnosis and therapy of neurodegenerative diseases including Alzheimer's disease, Parkinson's disease, and epilepsy.
Recordings of single-neuron activity within an ensemble of neurons are required for a basic understanding
of neural processes [1]. With this aim, within the European project NeuroProbes, a new high-density
electrode array for recording with high spatial resolution was introduced and successfully tested in
first in-vivo experiments [2, 3]. These probes contain 188 electrodes configured in 2 rows. CMOS
multiplexing units integrated directly on the probe shafts enable a drastic increase in the number
and density of electrodes in NeuroProbes compared to existing devices [4]. The density of such arrays
makes it possible to switch between the electrodes and achieve close proximity between the neuron of
interest and the recording electrode. In this context, the concept of switching between individual microelectrodes
of the same shaft, without the need to reposition either the shaft or the entire probe, is called
electronic depth control (EDC).
EDC allows us to switch between electrodes, scan their signals along the probe shank, and select
those with higher signal quality. However, during long-term in-vivo recordings there are moments in which
the current configuration of the electrodes is not able to record high-quality signals. One reason for losing
signal quality might be a drift in the probe position. Such a drift may occur for several
reasons, e.g. inflammation of the brain tissue, human interaction, or unexpected animal movement, and
causes us to lose track of an activity of interest recorded earlier or in previous sessions.
Furthermore, for discriminating a single neuron and studying its behaviour in the long term, it is necessary
to make sure that the probe remains in the starting configuration and that a particular channel keeps recording
from the neuron of interest. In addition, having prior information about the quality and properties
of the signals recorded by each channel makes it possible to select the next configuration more efficiently and
accurately, providing us with high-quality and less noisy signals from neural activities. Therefore,
we need to be able to identify each recording channel (each channel is assigned to one electrode during
the recording).
1.2 Problem Statement
In this work, we try to identify the characteristics of each recording channel. For this purpose, we first
compute sets of features from the measured signals of each channel. Then we apply supervised machine
learning techniques to identify the recording channels based on the computed features. In this context,
the class labels are the channel IDs connected to particular electrodes on the probe, and their activity
is represented by sets of features extracted from their measured signals. There are three main challenges
here. First, computing features from the measured signals and choosing relevant methods for such a
computation. Second, selecting an appropriate supervised machine learning algorithm and a suitable
number of features in order to obtain maximum classification accuracy for a given learning algorithm. Third,
providing a series of analytical approaches to interpret the classification results and draw conclusions
about channel identification and drift occurrence.
Such an identification enables us to track down a particular activity
during and between long-term in-vivo recording sessions and to deal with a drift of the probe from its original
position. In other words, if we lose signal quality in a recording session, we can use this prior
information and choose electrodes that have shown higher signal quality for the next configuration.
Furthermore, unintended movement of the probe between different recording sessions with the same
recording configuration becomes detectable. If there was a drift in the probe position between recording
sessions, we would observe that a particular activity is now identified in a new channel below or above
its original channel, depending on the drift direction.
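The drift argument can be made concrete with a confusion matrix over channel labels: correct identifications fall on the diagonal, while a probe shift pushes mass onto the neighbouring off-diagonals. The following Python sketch is illustrative only; the `drift_score` heuristic is our own shorthand for the fraction of predictions landing on a direct neighbour channel and is not part of the project's tooling.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true channel, columns: predicted channel."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def drift_score(cm):
    """Fraction of predictions falling on the first off-diagonals,
    i.e. a channel confused with its direct neighbour (our heuristic)."""
    n = len(cm)
    off = sum(cm[i, i + 1] + cm[i + 1, i] for i in range(n - 1))
    return off / cm.sum()
```

A large `drift_score` on the test session would be consistent with the probe having shifted by roughly one electrode between sessions.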
1.3 Outline
The rest of this work is structured as follows. In the next chapter, we discuss related work on
electronic depth control (EDC) in intra-cortical recordings, spike detection, the detection of active
brain regions, and recording from the same neuron in motor cortex. In Chapter 3, we describe in
detail our approach for spike detection and feature extraction from in-vivo recording datasets, as well as
classification and channel identification based on these features, in order to track down neural activities
between different recording sessions and during long-term in-vivo recordings. In Chapter 4, we present
the results of our chosen approach. Finally, in Chapter 5, we summarize what we have achieved in this
work and discuss how it can be extended.
CHAPTER 2
RELATED WORKS
2.1 Spike Detection Algorithms
In extracellular recordings, a spike or action potential is a short-lasting, high-amplitude signal fired
by a neuron. Spikes are produced by the rising and falling potential of the neuron's cell membrane. During
neural activity, a neuron fires spikes with a particular amplitude, shape, and rate. Each neuron
has spikes of a characteristic shape and firing rate, which are mainly determined by the morphology of
its dendritic tree and by its distance and orientation relative to the recording electrodes [5]. In order
to extract features from the recordings, we first need to extract the recorded spikes of each channel. There are
two common types of spike detection algorithms. First, supervised algorithms, which need user
intervention, such as window discrimination [1], principal component analysis [6], and matched filtering
[7]. However, using supervised algorithms would be very tedious for a comb of multi-electrode
arrays, since the settings have to be adjusted for each channel separately. The second common type of
spike detection algorithms is the unsupervised category. These algorithms require no user intervention, e.g.
algorithms based on amplitude detection [8], non-linear energy detection [9, 10, 11], and wavelet-based
detection [12, 13, 14]. In a study by Obeid and Wolf [15], spike detection algorithms were compared
with respect to their accuracy and computational cost. It was found that taking the absolute
value of the neural signal before applying a threshold, in combination with a refractory period, is just
as effective for spike detection as more elaborate energy-based schemes. Therefore, in this work we use the
absolute value of the signal and an adaptive-threshold spike detection algorithm.
2.2 Detection of Active Brain Regions Using a Machine Learning Approach
Given all detected spikes of each channel, we need to compute sets of essential features from
them and use these features to identify the properties of a recording channel. Ramirez
et al. [16] applied machine learning algorithms in order to classify the activity of each channel and
find out which channels record single-unit activity and which record multi-unit activity. In their work,
they first trained a learning algorithm using features extracted from detected spikes of labelled data
and then performed predictions on unlabelled data. They show which kinds of
features can be extracted from detected spikes and which combinations of those features
lead to more accurate classification results. Their short-term goal was to develop algorithms that
assist neuroscientists in detecting active brain regions; their long-term perspective was a
smart neural recording array which allows finding and maintaining high-quality neural signals
through the fully automatic selection of electrodes in active brain regions. The features
they used fall into two main categories. The first category consists of features computed directly from the
measured signal itself, i.e. minimum (Min), maximum (Max), mean, median, standard deviation (STD), and
root mean square (RMS). The second category consists of features computed from the detected spikes, i.e.
signal-to-noise ratio (SNR), average firing rate, and maximum firing rate. The second category comes in
different variations regarding the refractory time for spikes, i.e. 1 ms or 2 ms, and the time window for
the firing rates and average firing rates, i.e. 20 ms, 100 ms, 500 ms, and 10 s. In our work we use all possible
combinations of these features, i.e. 43 features altogether, to get the highest accuracy in the classification
results. The major difference between the approach of Ramirez et al. [16] and our goal is that they tried
to classify the activity types, i.e. single-unit activity (SUA), multi-unit activity (MUA), and noise
activity (NA), whereas we try to classify the recording channels.
2.3 Recording From the Same Neurons Chronically in Motor Cortex
During chronic extracellular recordings, neurobiologists have frequently observed similar activity recorded
on the same electrode from day to day. Occasionally a single neuron has some unusual characteristic,
such as a distinctive waveform or an unusual and obvious firing property, that makes it clear that the
same neuron was present in multiple sessions. The possibility that some neurons may be represented
multiple times in a series of recording sessions creates both a problem and an opportunity. Separately recorded
neurons may not actually represent independent sources of data, so statistical tests that assume each
unit is an independent sample may not be valid. However, if the same neuron can be identified as
such through multiple sessions, it becomes possible to combine data and thereby estimate the firing
properties of that neuron with greater confidence. Fraser and Schwartz [17] developed a new metric
of unit identity using pairwise cross-correlograms between neurons in a simultaneously recorded
population. It provides unit identification information comparable to that based on wave shape. By
combining this metric with wave shape, autocorrelation shape, and mean firing rate, they were able to
clearly identify whether two separately recorded units represent the same or different underlying neurons.
There are similarities between the goal of our project and the work of Fraser and Schwartz [17]. They
used a feature vector consisting of the firing rate and the waveform of spikes to represent the activity of each channel
and then used these features to classify the activity of each neuron. They made the strong assumption
that, when using the Utah micro-array [18], which has an electrode pitch of 400 µm, each electrode records
from different neurons. In other words, they assume it is unlikely that two adjacent electrodes
record from the same neuron. This assumption makes it possible for them to use the waveform of the spikes
as part of their features to track the activity of a particular neuron in long-term recordings
and between different recording sessions. However, in EDC probes with a high electrode density, it is
more likely that some adjacent electrodes record from the same neuron, due to the small pitch size of 40 µm.
Therefore, in our work we use a different feature vector to represent the activity of each channel, and
we apply supervised learning algorithms to identify channels and track down the activity of each
channel.
CHAPTER 3
CHANNEL IDENTIFICATION
3.1 Approach
The main purpose of this project is channel identification for tracking a neuron of interest between
different recording sessions. To this end, we design a pipeline with four major steps which leads to the
desired conclusion. The diagram in figure 3.1 shows these four major steps.
1. Spike detection and feature extraction
2. Data pre-processing: normalization and attribute selection
3. Supervised machine learning for channel identification
4. Tracking down neural activities
Figure 3.1: The diagram shows project pipeline. First, detecting spikes and computing features from measured signals
of each channel. Second, applying different normalization techniques and attribute selection methods on the provided
dataset. The dataset here is computed feature vectors for all channels. Third, training and evaluating the performance of
the different classifiers in conjunction with attribute selection methods and normalization techniques. This would enable
us to identify a particular channel based on its computed features. Fourth, training and testing classifiers with the dataset
of consecutive recording sessions. This will allow us to track down neural activities between different recording sessions and
detect unintentional drift in the probe position.
Based on the diagram in figure 3.1, we first need to characterize each channel given its measured
signals. Each channel can be represented by sets of features extracted from its measured signals. As
mentioned in the previous chapter, there are different methods for detecting action potentials
or spikes and for feature extraction [16] in order to classify the activity type of each recording channel using
supervised machine learning algorithms. We employ the same methods to extract features.
After computing features for each recording channel, we apply some data pre-processing steps, i.e. normalization
techniques and attribute selection methods, in order to deal with noisy data and increase
the prediction accuracy of the classification. Then we apply machine learning algorithms to find out how
well we can identify each channel given its computed features.
The next step is to use a supervised machine learning algorithm in combination with the pre-processing steps to
identify a channel given its computed features. Therefore, we need to train and evaluate the performance
of the different classifiers. This will give us a notion of the feasibility of the channel identification problem.
Finally, using a supervised machine learning algorithm, we will be able to track down the activity
of different channels. Given two recording sessions with the same channel configuration, the
idea is to train a learning algorithm with the dataset from the first recording session and then test the
trained models with the dataset from the second session. The prediction results show how well the
neural activities are traceable and whether or not there was an unintended movement of the probe position.
An efficient implementation of supervised learning algorithms is available in the Weka machine learning
tool [19]. In this work, we implemented a light framework for detecting spikes and computing features
from them using the C++ programming language. In the results section, we examine the performance of each
of the introduced algorithms to find out which one gives more accurate results and better identification.
3.2 Spike Detection Algorithm and Feature Computation
To compute features for each channel, we need to extract spikes from the measured signals of that channel.
Each channel is connected to a specific electrode on the probe shank, and these connections are adjustable for
each particular recording session. Figure 3.2 shows 10 seconds of raw measured signal for eight channels
in EDF format. All recordings are filtered using a bandpass filter between 500 Hz and 5000 Hz. The filtered
signals can then be processed to calculate the attributes that characterize the recorded signals.
Figure 3.2: This graph shows 10 seconds of raw recordings of neural activity for eight different channels before filtering.
Signal units are in mV (plotted using EDFBrowser [20]).
Figure 3.3 shows the same recorded signals depicted in figure 3.2 after they have been filtered.
Figure 3.3: This graph shows 10 seconds of filtered signals for eight different channels, filtered by a band-pass filter between
500 Hz and 5000 Hz. Signal units are in mV (plotted using EDFBrowser [20]).
In this work, we apply an adaptive-threshold spike detection algorithm [15]. Spike detection and
feature computation follow the work introduced in the previous chapter [16].
The idea is first to estimate the background noise for a time window of 50 ms and then detect all signal
samples whose absolute value exceeds this noise level by a factor of 3.5 or 5. After detecting spikes,
we compute the signal-to-noise ratio (SNR) of each channel in a time window of 10 s. First, the
RMS of each spike is calculated using the signal from 0.5 ms before the peak of the spike to 1 ms after the
spike. Then the RMS values of all spikes are averaged, and the RMS of the noise is calculated, where the noise is the
portion of the signal excluding the detected spikes.
Finally, the SNR is calculated as follows:

    SNR = 20 · log10( RMS_spikes / RMS_noise ),    (3.1)

where RMS_spikes denotes the RMS of the spikes averaged over all detected spikes and RMS_noise is the RMS of the noise.
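The detection and SNR steps above can be sketched as follows. This is an illustrative Python implementation, not the project's C++ framework; in particular, the robust noise estimate `median(|x|)/0.6745` is a common choice that the text does not specify, so treat it as an assumption.

```python
import numpy as np

def detect_spikes(signal, fs, noise_win_s=0.050, factor=3.5, refractory_s=0.001):
    """Adaptive-threshold detection: for every 50 ms window, estimate the
    noise level and mark samples whose absolute value exceeds factor * noise.
    A refractory period suppresses duplicate detections of the same spike."""
    win = max(1, int(noise_win_s * fs))
    refractory = max(1, int(refractory_s * fs))
    abs_sig = np.abs(signal)
    spike_idx = []
    last = -refractory
    for start in range(0, len(signal), win):
        seg = abs_sig[start:start + win]
        noise = np.median(seg) / 0.6745      # robust noise estimate (assumption)
        thr = factor * noise
        for i in np.flatnonzero(seg > thr):
            t = start + i
            if t - last >= refractory:
                spike_idx.append(t)
                last = t
    return np.asarray(spike_idx)

def snr_db(signal, fs, spike_idx, pre_s=0.0005, post_s=0.001):
    """SNR of eq. 3.1: 20*log10(mean spike RMS / noise RMS), with each spike
    RMS taken from 0.5 ms before to 1 ms after the detected sample."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    mask = np.ones(len(signal), dtype=bool)
    rms_spikes = []
    for t in spike_idx:
        lo, hi = max(0, t - pre), min(len(signal), t + post)
        rms_spikes.append(np.sqrt(np.mean(signal[lo:hi] ** 2)))
        mask[lo:hi] = False                  # exclude spike windows from the noise
    rms_noise = np.sqrt(np.mean(signal[mask] ** 2))
    return 20 * np.log10(np.mean(rms_spikes) / rms_noise)
```

On a synthetic trace with injected large-amplitude events, `detect_spikes` recovers the injected events (plus occasional noise crossings, as with any threshold detector) and `snr_db` yields a positive value in dB.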
In order to compute appropriate features, we use different combinations of the threshold factor, i.e. 3.5 and
5, and the refractory time for spike detection, i.e. 1 ms and 2 ms. This makes it possible to detect
spikes with 4 different parameter combinations. For each of these combinations, the maximum firing rate in
intervals of 20 ms, 100 ms, 500 ms, and 10 s and its average value were calculated and defined as
attributes, which, taking into account the 4 different parameter combinations for the SNR calculation,
produces 36 different attributes. For example, using a threshold multiplier of 3.5 with a 1 ms spike
refractory window defines nine attributes for the different maximum and average firing rate intervals;
using a 2 ms window instead of a 1 ms window yields nine new attributes, and so on. There are further
features, i.e. minimum (Min), maximum (Max), mean, median, standard deviation (STD), root mean
square (RMS) of the signal, and average noise level (ANL), which we use for the classification algorithms
and channel identification. In comparison to [16], the average noise level (ANL) is a new feature, computed
as the average noise level in one segment of the measured signals, i.e. 10 s of a
particular recording session. The ANL value can represent the quality of the measured signals, and
in the experiments section we show that ANL is a good feature for classification. In total, we
compute all 43 attributes of our feature vector, and each feature vector is computed over a time window of
10 s, which corresponds to one segment of the measured signals.
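The firing-rate portion of the feature vector can be sketched as below. This is a Python illustration for one threshold/refractory combination over the stated windows; the attribute names are our own and not taken from the project's feature list.

```python
import numpy as np

def firing_rate_features(spike_times_s, duration_s=10.0,
                         windows_s=(0.020, 0.100, 0.500, 10.0)):
    """Maximum and average firing rate (Hz) per window length.

    The 10 s segment is divided into bins of each window length; the spike
    count per bin divided by the bin length gives a rate, from which the
    maximum and the mean are kept as attributes (8 values here; together
    with the SNR this gives 9 attributes per parameter combination, as in
    the text)."""
    feats = {}
    spikes = np.asarray(spike_times_s, dtype=float)
    for w in windows_s:
        edges = np.arange(0.0, duration_s + 1e-12, w)   # bin edges over the segment
        counts, _ = np.histogram(spikes, bins=edges)
        rates = counts / w
        feats[f"max_rate_{int(w * 1000)}ms"] = rates.max()
        feats[f"avg_rate_{int(w * 1000)}ms"] = rates.mean()
    return feats
```

Repeating this for the 4 threshold/refractory combinations and appending the signal statistics (Min, Max, mean, median, STD, RMS, ANL) would assemble the full 43-dimensional feature vector described above.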
3.3 Data Pre-Processing
Each element of the feature vectors computed from the extracellular recordings and their detected spikes has
its own range. Since some supervised machine learning algorithms use similarity measures
between feature vectors for classification, the features should have the same scale. Data normalization
techniques are commonly used in machine learning to deal with this problem. Global normalization
techniques are an essential preprocessing step for many machine learning algorithms and boost
their performance.
Another problem is the number of features computed with our approach, i.e. 43 features per sample.
This is a high-dimensional feature vector and makes the classification
task difficult, especially when some features are noisy or irrelevant because they were computed from noisy
measured signals. In fact, due to noisy measurements, high background noise activity, and the existence of
artifacts in the measured signals, some of the computed features are irrelevant or redundant. This
phenomenon can dramatically reduce the prediction accuracy of the classifiers. However, there exist
common attribute selection and dimension reduction methods in the field of machine learning for
overcoming these problems. In the following, we briefly explain our candidate methods and techniques
for data normalization and attribute selection.
3.3.1 Normalization Techniques
In order to increase the performance of the supervised learning algorithms, we apply normalization techniques
to our datasets. The computed features have different scales and ranges, and this preprocessing
step often significantly improves the performance of the learning algorithm
by reducing correlations between data samples and scaling them into a similar range. In this work, we
apply two common types of global normalization techniques which are frequently used [19] in machine
learning algorithms, more specifically in Support Vector Machines (SVM):
• Min-Max Normalization:

    D'(i) = (D(i) − Min(D)) / (Max(D) − Min(D)) · (U − L) + L.    (3.2)

Here D' is the normalized vector, D is the original vector, Min(D) and Max(D) are the minimum and maximum
original values, and U and L are the upper and lower bounds of the target range, usually [0, 1] or [−1, 1].
• Zero-Mean Normalization:

    D'(i) = (D(i) − μ) / σ.    (3.3)

Here D' is the normalized vector, D is the original vector, μ is the mean of the original data, and σ is the standard
deviation of the original data.
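A minimal sketch of the two normalization techniques (equations 3.2 and 3.3), assuming one attribute vector as input; the project applies these via Weka [19], so the function names here are illustrative:

```python
import numpy as np

def min_max_normalize(d, lower=-1.0, upper=1.0):
    """Eq. 3.2: scale an attribute vector into [lower, upper]."""
    d = np.asarray(d, dtype=float)
    return (d - d.min()) / (d.max() - d.min()) * (upper - lower) + lower

def zero_mean_normalize(d):
    """Eq. 3.3: subtract the mean and divide by the standard deviation."""
    d = np.asarray(d, dtype=float)
    return (d - d.mean()) / d.std()
```

Applied per attribute across all samples, min-max scaling fixes the range, while zero-mean normalization yields zero mean and unit standard deviation.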
3.3.2 Attribute Selection
High-dimensional feature vectors do not always increase the prediction accuracy of supervised learning
algorithms. In machine learning, feature selection, also known as variable selection, attribute selection,
or variable subset selection, is a technique for reducing the dimensionality of the feature vectors
[21, 22, 23]. Feature selection methods can lead to i) improved prediction performance
of the predictor, ii) faster and more cost-effective predictors, and iii) a better understanding of
the process that generates the data. In classification problems, especially when there are few samples
with high-dimensional feature vectors, it is likely that some features are irrelevant or redundant.
Redundant features are those that provide no more information than the currently selected features.
Irrelevant features are those that provide no useful information in any context. When dealing with extracellular
neural activity, feature selection is necessary due to the presence of background
noise activity. A high background noise level has a negative impact on the performance of the spike
detection algorithms and on the quality and quantity of the computed features. Therefore, redundant and
irrelevant features are likely, and attribute selection can deal with this problem. There are
two attribute selection methods widely used in the machine learning field for reducing
the dimensionality of feature vectors: Principal Component Analysis [24, 23] and
Correlation Feature Selection (CFS) [25, 26].
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a statistical procedure which orthogonally transforms input
data of dimension n into a set of linearly uncorrelated variables of the same or lower dimension m, called
principal components. Mathematically speaking, each principal component represents a direction of the input
data in a new coordinate system. The highest rank among the principal components goes to the direction with
the highest variance, which lies on the first coordinate of the new coordinate system; the second rank lies on
the second coordinate, and so on. Strictly speaking, PCA is not a feature selection but a feature extraction
method: the new attributes are obtained as linear combinations of the original attributes. Dimensionality
reduction is achieved by keeping the m components with the highest variance out of the n original
components. The common version of this method [19] has the following steps:
• Compute the covariance matrix of the original training samples, then solve for all eigenvectors
and eigenvalues.
• Rank attributes by their individual evaluations, in conjunction with attribute evaluators
(ReliefF, GainRatio, Entropy, etc.).
• Select the m highest-ranked features.
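The eigen-decomposition steps above can be sketched as follows; this is a plain covariance-based PCA in Python, not Weka's implementation, and it skips the optional attribute-evaluator ranking:

```python
import numpy as np

def pca_reduce(X, m):
    """Project n-dimensional samples onto the m principal components
    with the highest variance."""
    Xc = X - X.mean(axis=0)                    # centre the data
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of attributes
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]          # rank components by variance
    W = eigvecs[:, order[:m]]                  # keep the top-m components
    return Xc @ W
```

The first returned coordinate carries the largest variance, the second the next largest, and so on, matching the ranking described above.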
Correlation Feature Selection (CFS)
The other feature selection method we use in this work is Correlation Feature Selection (CFS) [25, 26].
CFS is a measure that selects a subset of features from the original feature vector such that the features
in the subset are highly correlated with the class labels and uncorrelated with each other. CFS can ignore
irrelevant features because they have a low correlation with the class labels. CFS also screens out redundant
features due to their high correlation with other features. A feature is accepted if it predicts classes
in areas of the instance space not already predicted by other features. Given a subset S of the feature
space containing k features, CFS evaluates the subset based on the following "merit":

    M_S = k · r_cf / sqrt( k + k(k − 1) · r_ff ),    (3.4)

where M_S is the heuristic merit of the feature subset S containing k features, r_cf is the mean feature-class
correlation (f ∈ S), and r_ff is the average feature-feature correlation.
The numerator of equation 3.4 indicates how predictive of the class a set of features is, and the
denominator indicates the amount of redundancy among the features. Evaluating all possible subsets
of features is exhaustive and often infeasible due to the large number of attributes; in [25, 19]
there are experimental approaches using heuristic search strategies:
• Forward selection begins with no features and greedily adds one feature at a time until no
single-feature addition improves the evaluation.
• Backward elimination begins with all features and greedily removes one feature at a time as long as
the evaluation does not degrade.
• Best-first search starts either with no features or with all features; it progresses forward by adding features,
or backward by removing features, to or from the subset, and has a stopping criterion.
Furthermore, there are three variations of CFS [25, 19], each employing one of the following attribute
quality measures to estimate the correlations in equation 3.4:
• CFS-UC uses symmetrical uncertainty to measure correlation.
• CFS-MDL uses a normalized symmetrical minimum description length (MDL) principle to measure
the correlation.
• CFS-Relief uses symmetrical relief to measure correlation.
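To illustrate how the merit in equation 3.4 rewards class correlation and penalizes redundancy, the following sketch evaluates a subset using plain Pearson correlation in place of the symmetrical measures above; this is a simplifying assumption for illustration, not one of the three CFS variants:

```python
import numpy as np

def cfs_merit(X, y):
    """Merit of a feature subset X (n_samples x k) for class labels y,
    following equation 3.4: k * r_cf / sqrt(k + k(k-1) * r_ff)."""
    k = X.shape[1]
    # mean absolute feature-class correlation r_cf
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)])
    # mean absolute feature-feature correlation r_ff
    if k > 1:
        r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for i in range(k) for j in range(i + 1, k)])
    else:
        r_ff = 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=50).astype(float)
informative = y + 0.1 * rng.normal(size=50)   # correlated with the class
irrelevant = rng.normal(size=50)              # uncorrelated with the class
print(cfs_merit(np.column_stack([informative]), y))
print(cfs_merit(np.column_stack([informative, irrelevant]), y))
```

Adding the irrelevant feature lowers the merit, which is why a subset search guided by this measure discards it.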
CHAPTER 3. CHANNEL IDENTIFICATION 3.4. MACHINE LEARNING FOR CHANNEL IDENTIFICATION
3.4 Supervised Machine Learning Algorithms For Channel Identification
We need to evaluate the effect of the normalization and attribute selection methods on the different
supervised learning algorithms. This gives us a notion of the feasibility of the classification and channel
identification problem. By looking at the validation results of the classifiers, we can judge how well each
classifier identifies each channel based on its computed features. Due to the density and geometry of the
electrodes on the probe shank, we expect some similar activities on adjacent electrodes. In the following,
we describe four different classifiers, i.e. K-Nearest Neighbour (K-NN), Iterative Dichotomiser 3 (ID3),
Random Forest, and Support Vector Machine (SVM), and their different parameter settings. By
comparing their results on our dataset we can select the most appropriate classifier for our goal. In order
to find out how well each classifier generalizes, we use the cross-validation technique. Furthermore, some
of the supervised learning algorithms need precise parameter selection; therefore, we use hyper-parameter
optimization methods to increase their prediction accuracy.
3.4.1 K-Nearest Neighbour (K-NN)
One of the learning algorithms we have selected is K-Nearest Neighbour (K-NN) [27]. The idea is to
classify an object by a majority vote of its neighbors, with the object being assigned to the class most
common among its K nearest neighbors. Each object is represented by its feature vector, and the
algorithm uses a distance measure, e.g. Manhattan or Euclidean distance, to find the nearest neighbors.
There are parameters and settings which improve the accuracy of classification, e.g. weighting neighbors
by their relative distance and the number of neighbors K. The K-NN algorithm is recommended because
it is easy to understand, simple to train, and it gives an insight into the feasibility of our classification
task. However, the algorithm is readily fooled by noise and irrelevant data, is biased by the value of K,
and is computationally intensive for large datasets. By using appropriate nearest neighbor search
structures, e.g. a KD-Tree, the K-NN algorithm becomes computationally tractable.
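The voting rule with inverse-distance weighting can be sketched as below. The two-dimensional toy data are hypothetical and stand in for the channel feature vectors; this is not the implementation used in this work:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Assign x to the class with the largest inverse-distance-weighted
    vote among its k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        weight = 1.0 / (dists[i] + 1e-12)   # closer neighbours count more
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + weight
    return max(votes, key=votes.get)

# two well-separated toy classes
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.2])))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 1
```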
3.4.2 Iterative Dichotomiser 3 (ID3)
The second algorithm we used for classification is Iterative Dichotomiser 3 (ID3) [28]. Here the idea is
to split the dataset into subsets based on a selected attribute, add a non-terminal node to the decision
tree, and continue this process recursively on each subset. Terminal nodes represent the class label of
their branch. When selecting attributes, we choose the one with the largest information gain, or
equivalently the smallest entropy, among the non-selected attributes. The four main steps in the
Iterative Dichotomiser 3 (ID3) algorithm are:
• Calculate the entropy of every attribute using the data set S.
• Split the set S into subsets using the attribute for which entropy is minimum (or, equivalently,
information gain is maximum).
• Make a decision tree node containing that attribute.
• Repeat the previous three steps recursively on each subset using the remaining attributes.
We employ the Iterative Dichotomiser 3 (ID3) algorithm because it treats each feature separately based
on a probabilistic approach. It builds the decision tree quickly and uses the whole dataset to create the
tree. Furthermore, its results are invariant to natural or normalized data. However, the ID3 algorithm
may face the over-fitting problem and be biased in favor of attributes with high information gain.
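The attribute-selection step can be illustrated with a small entropy and information-gain computation. The toy attributes "a" and "b" are hypothetical, not our signal features:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a set of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Entropy reduction obtained by splitting the set on one attribute."""
    n = len(labels)
    subsets = {}
    for sample, label in zip(samples, labels):
        subsets.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    return entropy(labels) - remainder

samples = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 1, 1]
print(information_gain(samples, labels, "a"))  # 1.0: "a" separates the classes
print(information_gain(samples, labels, "b"))  # 0.0: "b" carries no information
```

ID3 would split on "a" first, since it yields the maximum information gain.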
3.4.3 Random Forest Algorithm
The third supervised learning algorithm we used is Random Forest [29]. The algorithm creates a forest
of decision trees at training time and outputs the class that is the mode of the classes output by the
individual trees. Given a sample set, the algorithm grows each tree as follows:
• If the number of cases in the training set is N, sample N cases at random, but with replacement,
from the original data. This sample will be the training set for growing the tree.
• If there are M input variables, a number m << M is specified such that at each node, m variables
are selected at random out of the M and the best split on these m is used to split the node. The
value of m is held constant during the forest growing.
• Each tree is grown to the largest extent possible. There is no pruning.
The Random Forest algorithm follows almost the same core principle as the ID3 algorithm but usually
shows better performance, and its results are virtually invariant to whether the dataset is normalized
or natural.
3.4.4 Support Vector Machine (SVM) Algorithm
Our fourth candidate algorithm is the Support Vector Machine [30], which is one of the more
sophisticated supervised learning algorithms. The idea of the Support Vector Machine is to separate
sample data in d-dimensional space using (d-1)-dimensional hyperplanes. There is an inverse relation
between the distance of the hyperplane to the sample points, i.e. the margin, and the generalization
error: the larger the margin, the smaller the generalization error. Based on that, we are dealing with
an optimization problem. Since classification and regression problems mostly involve non-linearly
separable data distributions, the Support Vector Machine (SVM) uses a kernel function to transform
the data samples into a same- or higher-dimensional feature space in which they are linearly separable.
There are three common non-linear kernels used for mapping the samples to higher dimensions:
• Polynomial (homogeneous):
k(xi, xj) = (xi · xj)^d . (3.5)
Here xi and xj are samples represented by their feature vectors and d is the polynomial degree.
• Gaussian Radial Basis Function:
k(xi, xj) = exp(−γ ||xi − xj||²), for γ > 0. (3.6)
Here xi and xj are samples represented by their feature vectors and γ is the kernel coefficient.
• Hyperbolic Tangent:
k(xi, xj) = tanh(κ xi · xj + c), for some (not every) κ > 0 and c < 0. (3.7)
Here xi and xj are samples represented by their feature vectors and κ is the kernel coefficient.
Although the Support Vector Machine is a very sophisticated supervised learning algorithm, it needs
careful model selection, i.e. the choice of kernel type and parameter specification, in order to obtain
highly accurate results. Empirical model selection is a tedious and interminable task. Therefore, in the
following, we explain some common methods to deal with this problem.
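The three kernels in equations 3.5 to 3.7 can be written down directly. This is an illustrative sketch of the kernel functions only, not a full SVM:

```python
import numpy as np

def poly_kernel(xi, xj, d=2):
    """Homogeneous polynomial kernel, equation 3.5."""
    return np.dot(xi, xj) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    """Gaussian radial basis function kernel, equation 3.6 (gamma > 0)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def tanh_kernel(xi, xj, kappa=1.0, c=-1.0):
    """Hyperbolic tangent kernel, equation 3.7 (kappa > 0, c < 0)."""
    return np.tanh(kappa * np.dot(xi, xj) + c)

x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(poly_kernel(x1, x2))   # (1*2 + 2*1)^2 = 16.0
print(rbf_kernel(x1, x1))    # identical samples map to similarity 1.0
```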
3.4.5 Cross-Validation
Cross-validation [31] is a technique to measure how well a predictive model will generalize to data
independent of the data that were used to train it. In machine learning, cross-validation measures how
well the trained model will perform in practice. Each model has one or more unknown parameters, and
when the number of samples is small or the number of parameters is large, the model will face the
over-fitting problem. Cross-validation deals with this problem by dividing the sample data into K equal
subsets (K-fold cross-validation), then using K − 1 subsets to train the model and 1 subset for
validation. The procedure is repeated K times until each individual subset has been used as the
validation set. At the end, the K results from the folds can be averaged (or otherwise combined) to
produce a single estimate. Common values for K are 3, 5, and 10, depending on the size of the training
data. In our task, we used K = 5 for all four algorithms.
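The K-fold splitting procedure with K = 5 can be sketched in plain Python; the figure of 120 samples matches our smaller dataset, but the code is otherwise an illustration:

```python
def kfold_splits(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds;
    every subset serves exactly once as the validation set."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        yield train, folds[i]

splits = list(kfold_splits(120, k=5))
print(len(splits))                           # 5 folds
print(len(splits[0][0]), len(splits[0][1]))  # 96 training, 24 validation samples
```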
3.4.6 Hyper-Parameter Optimization
Hyper-parameter optimization is the problem of obtaining good generalization for a learning algorithm
by choosing a set of parameters. The idea is to adjust the different model parameters in order to
minimize the loss function on the training data. There are different approaches to hyper-parameter
optimization, e.g. global parameter optimization using Gaussian Processes [32] or simple grid search
[33]. In this work, we used grid search to tune the parameters of a particular model in order to increase
its accuracy. The idea is to set a range and a step size for each parameter, then go through all possible
combinations of parameters, create and train the model with each of them, and find out which
combination minimizes the loss function value, or in other words gives the highest accuracy.
Among the previously introduced supervised learning algorithms, the support vector machine is the
one which demands hyper-parameter optimization, because of the complexity of selecting the
parameters and the wide range of choices, which need a mechanism to tune them. For the support
vector machine, mainly three parameters need to be tuned, i.e. the kernel function, the C constant
(regularization parameter), and the γ factor (kernel coefficient).
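The grid search loop can be sketched as below. The scoring callback stands in for "train the SVM with these parameters and cross-validate"; the parameter ranges and the toy objective are purely illustrative assumptions:

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Try every combination of parameter values and keep the one with
    the highest score (e.g. cross-validated accuracy)."""
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# hypothetical search ranges for an SVM's C and gamma; the lambda is a toy
# objective whose maximum sits at C = 10, gamma = 1.0
grid = {"C": [1, 10, 100], "gamma": [0.1, 1.0, 10.0]}
best, score = grid_search(lambda C, gamma: -abs(C - 10) - abs(gamma - 1.0), grid)
print(best)  # {'C': 10, 'gamma': 1.0}
```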
3.5 Tracking Down Neural Activities
After finding the best combination of normalization techniques, attribute selection methods, and
classification algorithms, we try to track neural activities between different recording sessions. We now
have a notion of how well we can identify each channel given the feature vector computed from its
measured signals. Therefore, we want to know how likely it is to identify the same activity between
different recording sessions on the same channel or on its adjacent channels. The idea is to identify
each specific channel in a particular recording session using the data pre-processing and machine
learning approach described earlier in this chapter. Then, we compute the sets of features for another
recording session with the same channel configuration as the one we built the model with, and use
them to test the trained model. Here we try to predict the class labels for the newly measured signals.
By looking at the differences between the predicted class labels and the actual class labels in the test
result, we can judge how well the current recording of a particular channel is predictable based on
earlier measurements of the same channel.
To make this argument, we provide a precision-recall analysis and observe the classification error
results and confusion matrices of the test phase. The confusion matrix can show whether the activity of
a particular channel now appears most likely on the same channel or elsewhere. Due to the density of
the electrodes and their geometrical position on the probe shank, if there was a subtle unintended
movement of the NeuroProbe position, the activity of a particular channel in the test phase would be
classified to an adjacent channel relative to the drift direction (during recording, each channel is
connected to a specific electrode on the probe shank).
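A confusion matrix for the cross-session test can be accumulated as below. The example labels are hypothetical, with one sample drifting from a channel to its neighbour; this is only a sketch of the bookkeeping, not our evaluation pipeline:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are the actual channels, columns the predicted channels;
    off-diagonal entries reveal activity appearing on another channel."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# 8 channels; one sample of channel 2 is predicted as its neighbour, channel 3
y_true = [0, 1, 2, 2, 3, 4, 5, 6, 7]
y_pred = [0, 1, 2, 3, 3, 4, 5, 6, 7]
cm = confusion_matrix(y_true, y_pred, n_classes=8)
print(np.trace(cm))  # 8 correctly identified samples on the diagonal
print(cm[2, 3])      # 1 sample drifted from channel 2 to channel 3
```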
CHAPTER 4
EXPERIMENTS AND RESULTS
4.1 Dataset Of Extracellular Recording
For our experiment, we used a dataset from in-vivo recordings, performed in April 2009 and February
2010 at the Institute for Psychology of the Hungarian Academy of Sciences in Budapest (Hungary). Data
acquisition was done using the NeuroSelect software [34]. The operation of the three micro-probes was
verified by acute implantation in the neocortices of Wistar rats. One probe was implanted in the
primary motor cortex (2 mm in the lateral direction, aiming at M1/M2) and two probes were implanted
in the S1 trunk
region (see Figure 4.1). The data was pre-amplified (g=10 gain, bandpass filtered between DC and 100
kHz) and amplified (g=100 gain, bandpass filtered between 0.5 kHz and 5 kHz) with a total gain of 1000.
Signals were digitized at 16-bit resolution and 20 kHz sampling rate per channel.
Figure 4.1: Cross section of the area of one implantation (based on [16]). The probe was inserted 2 mm in the lateral
direction, aiming for the M1/M2 region indicated by the black line.
Before trying to track the neural activities between different recording sessions, we need to evaluate
the effect of the normalization and attribute selection methods on the overall performance of the
classification algorithms. We also need to know which combination of the introduced pre-processing
methods and classification algorithms gives us the highest prediction accuracy. Hence, we selected a
relatively large dataset containing all recording sessions of day 15.02.2010. This dataset has the
property that there was no intentional movement of the probe position, and it covers most of the
electrodes available on the probe shank for signal measurement. This dataset contains 152 electrodes
which are also considered as class labels, and approximately 15 samples per class, 2199 samples
altogether. It should be mentioned that in some of these recording sessions there are not enough
qualified measurements. This means some of the class labels have fewer samples, around 12 samples per
class, because of poor signals, channel disconnection, and outliers. All these outliers are ignored in the
spike detection and feature extraction steps.
For tracking the neural activity between different recording sessions we need consecutive recordings
with the same electrode configuration. Then we are able to train the candidate classification algorithms
with a sufficient number of samples and choose the one with the highest accuracy. Among the available
recordings we chose a dataset, i.e. a pair of consecutive recording sessions with the same electrode
configuration, from day 12.02.2010 for channel identification and for tracking the activity of each
channel between recording sessions. These particular datasets were chosen because they have the same
electrode configuration and there was no deliberate movement of the probe position during or between
the recording sessions. Furthermore, the dataset contains high-quality measured signals which support
the presence of neural activity. Here, we used one session of our data for training the algorithms and
the other session for testing them. Each session contains measured signals for eight channels connected
to electrodes 43, 44, 45, 46, 141, 142, 143, 144. For each channel, we have 15 samples, and each sample
is computed from 10 s of recorded signals, represented by its feature vector, altogether 120 samples
per session.
All recordings are available in EDF (European Data Format), and there is a library called EDFLib
[35] available for manipulating them. In each of the chosen recording sessions in both datasets, 8
electrodes, a pair of tetrodes, were selected and assigned to channels. Our lightweight framework
detects spikes and computes the features we discussed in the previous chapter in order to use them in
the classification algorithms.
4.2 Detected Spikes and Extracted Features
The first step in our approach was to detect spikes and compute features from the measured signals.
Figure 4.2 shows the detected spikes for 10 s of recording using a threshold factor of 3.5 and a 1 ms
spike refractory time. Compared to figure 4.2, figure 4.3 shows the detected spikes of the same
recording segment with a threshold factor of 5 and a 2 ms spike refractory time window. As depicted,
fewer spikes are detected with the higher threshold factor, and this leads to different values for the
average firing rates.
Figure 4.2: Detected spikes from 10 s of one channel's activity using a 1 ms spike refractory time window and a 3.5
threshold factor. Here the raw signal refers to the filtered signal which was used as input for the spike detection algorithm.
In order to have more information about the activity of each channel, we compute all possible features
using different spike refractory times and various threshold factors.
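A minimal threshold-crossing detector with a refractory window can be sketched as below. Estimating the noise level from the signal's standard deviation and the synthetic 20 kHz trace are assumptions for illustration; the actual detection pipeline is described in the previous chapter:

```python
import numpy as np

def detect_spikes(signal, fs, threshold_factor=3.5, refractory_ms=1.0):
    """Return sample indices whose absolute amplitude exceeds
    threshold_factor times the signal's standard deviation, skipping a
    refractory window after each detected spike."""
    threshold = threshold_factor * np.std(signal)
    refractory = max(1, int(fs * refractory_ms / 1000.0))
    spikes, i = [], 0
    while i < len(signal):
        if abs(signal[i]) > threshold:
            spikes.append(i)
            i += refractory          # no second detection inside the window
        else:
            i += 1
    return spikes

fs = 20000                          # 20 kHz sampling rate, as in the dataset
signal = np.zeros(1000)
signal[100], signal[300], signal[500] = 10.0, 2.0, 10.0  # two large, one small spike
print(len(detect_spikes(signal, fs, threshold_factor=3.5)))  # 3
print(len(detect_spikes(signal, fs, threshold_factor=5.0)))  # 2
```

The higher threshold factor misses the small event, which is the behaviour compared in figures 4.2 and 4.3.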
Figure 4.3: Detected spikes from 10 s of one channel's activity using a 2 ms spike refractory time window and a 5 threshold
factor. Here the raw signal refers to the filtered signal which was used as input for the spike detection algorithm.
Figures 4.4, 4.5, and 4.6 show the histogram distributions of the signal to noise ratio (SNR), maximum
(Max), and standard deviation (STD) values for four different channels in the same recording session.
Here we can see how these extracted features overlap in their distributions, which makes classification
and channel identification a difficult task using these features. For instance, the SNR values of all four
channels, i.e. channels 3, 4, 6 and 8, mostly lie in the range of 8 to 9. Furthermore, it is clear that the
value ranges of these features differ, which also has a negative effect on the result of the classification
task. Therefore, we need to normalize the feature values.
Figure 4.4: Histogram distribution of the signal to noise ratio (SNR) values for detected spikes with threshold factor 5 and
1 ms refractory time for channels 8, 6, 3 and 4 connected to electrodes 141, 142, 143 and 144, including 15 samples each.
The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured
signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.5: Histogram distribution of the standard deviation (STD) values for the measured signals of channels 8, 6, 3
and 4 connected to electrodes 141, 142, 143 and 144, including 15 samples each. The Y-axis is the number of samples,
15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of
extracellular recordings on day 12.02.2010.
Figure 4.6: Histogram distribution of the maximum (Max) values for the measured signals of channels 8, 6, 3 and 4
connected to electrodes 141, 142, 143 and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples
altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular
recordings on day 12.02.2010.
The histogram distribution of the maximum value in figure 4.6 shows that this feature is irrelevant and
does not contribute to the classification task. Here we employed attribute selection methods to remove
such redundant and irrelevant features and to perform the classification with a smaller subset of the
features.
4.3 Data Normalization and Attribute Selection
In this section, we applied two attribute selection methods, i.e. Correlation Feature Selection (CFS)
and Principal Component Analysis (PCA), and the Min-Max and Zero-Mean normalization methods as
pre-processing steps on the large dataset from day 15.02.2010 with 152 class labels. To perform the
global Min-Max normalization and scaling, we first computed the global minimum and maximum of
each particular feature among all samples of all classes, then subtracted the global minimum from each
feature value, divided it by the difference of the global maximum and minimum values, and then scaled
each feature to [-1, 1]. To perform the global Zero-Mean normalization, we first computed the global
mean and standard deviation of each particular feature among all samples of all classes, then subtracted
the global mean from each feature value and divided it by the standard deviation.
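The two normalization steps described above can be sketched as column-wise operations on the sample matrix. This is a minimal NumPy illustration on toy data, not the framework's code:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature (column) to [-1, 1] using its global min and max."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - mn) / (mx - mn) - 1.0

def zero_mean_normalize(X):
    """Subtract each feature's global mean, divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# two toy features on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(min_max_normalize(X))                 # every column spans exactly [-1, 1]
print(zero_mean_normalize(X).mean(axis=0))  # columns now have zero mean
```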
Figures 4.7 and 4.8 show the histogram distributions of the signal to noise ratio (SNR) and maximum
firing rate (MFR) values for two different channels in the same recording session. By comparing the
feature values of the natural data and the normalized data, we can see how normalization produces
completely different scales and new values for each feature.
Figure 4.7: Histogram distribution of the SNR for natural, Min-Max normalized, and Zero-Mean normalized values of
detected spikes with threshold factor 5 and 2 ms refractory time for channels 3 and 4 connected to electrodes 143 and 144,
including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each
sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.8: Histogram distribution of the Maximum Firing Rate (MFR) with a time window of 10 s for natural, Min-Max
normalized, and Zero-Mean normalized values of detected spikes with threshold factor 5 and 2 ms refractory time for
channels 3 and 4 connected to electrodes 143 and 144, including 15 samples each. The Y-axis is the number of samples,
15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of
extracellular recordings on day 12.02.2010.
Then, to find the best subset of features from the 43-dimensional feature vectors, we applied the
Correlation Feature Selection (CFS) and Principal Component Analysis (PCA) dimension reduction
methods. It should be mentioned that attribute selection is done before the classification step and is
independent of the supervised learning algorithms used for classification. Furthermore, as we can see in
table 4.1, the feature subsets selected by the CFS and PCA methods depend on the type and
distribution of the provided input data. Here, most of the features selected by both algorithms came
from the measured signal itself rather than from the detected spikes, i.e. minimum (Min), median,
standard deviation (STD), root mean square of the recorded signal (RMS Signal), and average noise
level of the measured signal (ANL).
Table 4.1: List of attributes selected by the Correlation Feature Selection (CFS) and PCA methods. The dimension
reduction methods were applied on the Min-Max and Zero-Mean normalized datasets of day 15.02.2010 of the in-vivo
recordings. Complete descriptions of the attributes are given in appendix A.
CFS on Min-Max and Zero-Mean Normalized Data | PCA on Min-Max and Zero-Mean Normalized Data
Min | Min
Median | Mean
STD | STD
RMS Signal | RMS Signal
ANL | SNR tf 3.5 sr 2ms
SNR tf 3.5 sr 1ms | SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms | MFR 500ms tf 3.5 sr 1ms
We applied the Correlation Feature Selection (CFS) method using the best-first search heuristic in
forward direction. Table 4.1 shows that the feature subsets selected for the Min-Max and Zero-Mean
normalized datasets are the same. The results of the Principal Component Analysis (PCA) method on
both normalized datasets are the same as well. We used the PCA method with a ranked search strategy
and a threshold factor equal to -1.797. In figures 4.9 and 4.10, we see how these selected attributes are
distributed. Each method gives us seven selected attributes, which means a dimension reduction from
forty-three to seven. The first four attributes selected by both methods are similar, i.e. the minimum
value, the mean value (the median in the case of CFS), the standard deviation (STD), and the root
mean square (RMS) of the 10 s measured signal. But the remaining three attributes are different. The
CFS method selected the average noise level (ANL) of 10 s of measured signal, the signal to noise ratio
(SNR) with threshold factor 3.5 and 1 ms spike refractory time, and the average firing rate (AFR) for a
time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time. The PCA method
selected the signal to noise ratio (SNR) with threshold factor 3.5 and 2 ms spike refractory time, the
signal to noise ratio (SNR) with threshold factor 5 and 2 ms spike refractory time, and the maximum
firing rate (MFR) for a time window of 500 ms with threshold factor 3.5 and 1 ms spike refractory time.
Figure 4.9: Scatter diagram of the attributes selected by the CFS method. The X-axis is the sample index, 2199 samples
per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes
are available in appendix A.
Figure 4.10: Scatter diagram of the attributes selected by the PCA method. The X-axis is the sample index, 2199 samples
per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes
are available in appendix A.
From figures 4.9 and 4.10, it is visible that the attributes selected by the Correlation Feature Selection
(CFS) method are more suitable for the classification problem than those selected with the Principal
Component Analysis (PCA) method. For instance, the attribute AFR 20ms tf 3.5 sr 1ms (average firing
rate for a time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by
the CFS method, gives a more separable distribution relative to the values of the other selected
attributes, whereas the attribute MFR 500ms tf 3.5 sr 1ms (maximum firing rate for a time window of
500 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by the PCA method, does not.
The dimension reduction methods aim to remove redundant and unrelated attributes from the given
feature vectors. In figure 4.11, we can see the scatter diagram for some of the attributes which are
ignored by both the Principal Component Analysis (PCA) and Correlation Feature Selection (CFS)
attribute selection methods. The graph in figure 4.11 indicates that these features have almost the same
value distributions; therefore, they do not contribute much to the classification problem. This
phenomenon may occur due to high background noise and artifacts in the measured signals, the absence
of neural activity near a particular electrode connected to a specific channel, etc.
Figure 4.11: Scatter diagram of the attributes not selected by either the PCA or the CFS method. The X-axis is the
sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete
descriptions of the attributes are available in appendix A.
4.4 Training And Validation Of The Classifiers In Conjunction
With Pre-Processing Methods
Data normalization is a highly recommended pre-processing step for algorithms like the Support Vector
Machine (SVM) and K-Nearest Neighbour (K-NN). Therefore, in this section, we demonstrate the effect
of the dimension reduction methods on the performance of the classification algorithms. We applied our
candidate learning algorithms on both the Zero-Mean and Min-Max normalized datasets. The
validation results show how well we can identify each channel given its set of features. As we mentioned
earlier, some classification algorithms, i.e. the Support Vector Machine (SVM), need parameter tuning
in order to obtain higher accuracy. We used grid search as the hyper-parameter tuning method to deal
with this issue. It is apparent from tables 4.2 and 4.3 that the results obtained with the Correlation
Feature Selection (CFS) attribute selection generalize better, although there were noisy measurements
among some of the recording sessions. In training these algorithms we used 10-fold cross-validation in
order to avoid over-fitting and to get a notion of which of these algorithms serves the goal of our
project best.
Table 4.2: Training results for the classification algorithms on Min-Max normalized data with the Correlation Feature
Selection (CFS) and Principal Component Analysis (PCA) attribute selection methods, and with all features. The dataset
is provided from all recording sessions of day 15.02.2010. The γ parameter for the Support Vector Machine (SVM) has two
different values: 100 for the low-dimensional datasets and 10 for the dataset with all features. It contains 152 class labels,
here electrode numbers on the probe, and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.
Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.93% | 39.10% | 63.48% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 65.71% | 44.29% | 34.10% | C = 250007 and γ = 100 and 10
Random Forest | 68.25% | 44.29% | 63.98% | number of trees = 10
Table 4.3: Training results for the classification algorithms on Zero-Mean normalized data with the Correlation Feature
Selection (CFS) and Principal Component Analysis (PCA) attribute selection methods, and with all features. The dataset
is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe,
and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.
Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.75% | 39.10% | 63.61% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 67.75% | 47.88% | 37.56% | C = 250007 and γ = 1.0
Random Forest | 68.44% | 43.97% | 63.57% | number of trees = 10
The two graphs in figures 4.12 and 4.13 show the confusion matrices for the best and worst results from
tables 4.2 and 4.3. Comparing these two figures gives us a notion of how well each algorithm predicts
the class labels, and also on which class labels we had the most false predictions. Figure 4.12 illustrates
the confusion matrix for the Random Forest algorithm applied on the Zero-Mean normalized dataset in
conjunction with the Correlation Feature Selection (CFS) method. The Random Forest algorithm has
the highest accuracy, i.e. 68.44%, relative to the other algorithms in table 4.3. On the other hand, the
Support Vector Machine (SVM) algorithm applied on the Zero-Mean normalized dataset using all
features has the worst accuracy, i.e. 37.56%, relative to the others. Therefore, the confusion matrix in
figure 4.12 has a more visible diagonal with high values, which means class labels are predicted as their
original class. In figure 4.13, on the other hand, we can see the classes which were classified wrongly;
hence there are more bright regions on both sides of the matrix diagonal.
Figure 4.12: Confusion matrix for the Random Forest algorithm applied on the Zero-Mean normalized data in conjunction
with the Correlation Feature Selection (CFS) method. The dataset is provided from all recording sessions of day 15.02.2010.
It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained
using 10-fold cross-validation.
Figure 4.13: Confusion matrix for the Support Vector Machine (SVM) algorithm applied on the Zero-Mean normalized
data using all features. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels,
here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained using 10-fold cross-validation.
From the results presented in tables 4.2 and 4.3, we can conclude that all four supervised learning algorithms in conjunction with the Correlation Feature Selection (CFS) method perform at virtually the same level on both the Zero-Mean and the Min-Max normalized data. Therefore, in the following section, the combination of the CFS method and the four classification algorithms is tried on both Min-Max and Zero-Mean normalized data in order to find out how well neural activities can be tracked between different recording sessions.
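The comparison above was run in WEKA; as an illustration only, a minimal scikit-learn sketch of the same protocol (four classifiers, 10-fold cross-validation) is given below. The feature matrix `X` and labels `y` are synthetic placeholders, and scikit-learn's entropy-based `DecisionTreeClassifier` is only an approximation of ID3.

```python
# Hedged sketch: four classifiers from this chapter, 10-fold cross-validation.
# X and y are random placeholders, not the thesis dataset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))      # placeholder for the computed features
y = np.repeat(np.arange(8), 15)     # 8 channels, 15 samples per class

classifiers = {
    "ID3-like tree": DecisionTreeClassifier(criterion="entropy"),
    "3-NN": KNeighborsClassifier(n_neighbors=3, weights="distance"),
    "SVM": SVC(kernel="rbf", C=250007, gamma=0.01),
    "Random Forest": RandomForestClassifier(n_estimators=10),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)   # stratified 10-fold accuracy
    print(f"{name}: {scores.mean():.2%}")
```

With random features the accuracies hover near chance; on real, normalized features the ranking reported in the tables above would emerge.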
4.5 Tracking Down Neural Activities Using Supervised Learning Algorithms
In this section, we trained and tested the previously selected classification algorithms using the specific dataset introduced earlier in this chapter, i.e. two consecutive recording sessions from day 12.02.2010 of the in-vivo experiment. Each recording contains eight recording channels connected to eight electrodes. Hence, we have eight class labels and fifteen samples per class. The aim of the current step was to first train the learning algorithms in conjunction with the Correlation Feature Selection (CFS) method using the dataset from the first recording session, and then to test the trained algorithms on the dataset from the second recording session. The prediction accuracy in the test phase gives a notion of how well neural activities are traceable between two recording sessions. The difference from the earlier experiment in this chapter is that previously all computed features belonged to the same recording session, whereas in the current experiment the datasets belong to two different consecutive recording sessions.
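The cross-session protocol just described can be sketched as follows. This is a minimal illustration, not the thesis pipeline: the two sessions are simulated as draws from the same per-channel distributions, and the SVM parameters mirror those reported later in table 4.5.

```python
# Hedged sketch of the evaluation protocol: fit on the first recording
# session, predict on the second, report accuracy. All data is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
y = np.repeat(np.arange(8), 15)                     # 8 channels x 15 samples
centers = rng.normal(scale=3.0, size=(8, 10))       # per-channel feature means
X_sess1 = centers[y] + rng.normal(size=(120, 10))   # first session (training)
X_sess2 = centers[y] + rng.normal(size=(120, 10))   # second session (test)

clf = SVC(kernel="rbf", C=250007, gamma=0.01).fit(X_sess1, y)
acc = accuracy_score(y, clf.predict(X_sess2))
print(f"cross-session accuracy: {acc:.2%}")
```

If the channels' feature distributions are stable across sessions, as the thesis argues, the cross-session accuracy stays high; a probe shift would instead show up as systematic misclassification to adjacent channels.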
In the pre-processing step, we first applied Min-Max and Zero-Mean normalization based on the same principle mentioned earlier in this chapter, and then applied the Correlation Feature Selection (CFS) method to reduce the dimension of the feature vectors. Table 4.4 presents the attributes selected by the CFS method. Compared to the attribute subset in table 4.1, there are four more attributes in the new subset, and the types of the selected attributes differ. In the subset for this smaller dataset, more attributes relate to detected spikes and their firing rates. This supports the observation that the measured signals are less noisy and that more neural activity is present in these two recording sessions.
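CFS scores a feature subset by its merit, which rewards high feature-class correlation and penalizes feature-feature redundancy. The sketch below implements the merit-based greedy forward search; note that Hall's CFS uses symmetrical uncertainty on discretized data, whereas this simplification substitutes absolute Pearson correlation, which is an assumption.

```python
# Simplified correlation-based feature selection in the spirit of CFS.
# Merit(S) = k * avg(|r_cf|) / sqrt(k + k*(k-1)*avg(|r_ff|)) for a subset S
# of k features; features are added greedily while the merit improves.
import numpy as np

def cfs_like_forward_selection(X, y, max_features=10):
    n_features = X.shape[1]
    r_cf = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    r_ff = np.abs(np.corrcoef(X, rowvar=False))          # feature-feature |corr|
    selected, remaining = [], list(range(n_features))
    best_merit = -np.inf
    while remaining and len(selected) < max_features:
        merits = []
        for j in remaining:
            subset = selected + [j]
            k = len(subset)
            avg_cf = r_cf[subset].mean()
            avg_ff = ((r_ff[np.ix_(subset, subset)].sum() - k) / (k * (k - 1))
                      if k > 1 else 0.0)
            merits.append(k * avg_cf / np.sqrt(k + k * (k - 1) * avg_ff))
        best = int(np.argmax(merits))
        if merits[best] <= best_merit:
            break                                        # merit stopped improving
        best_merit = merits[best]
        selected.append(remaining.pop(best))
    return selected
```

On the thesis data, such a search yields subsets like the one in table 4.4: features strongly correlated with the channel label but not redundant with each other.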
Table 4.4: List of attributes selected by the Correlation Feature Selection (CFS) method. The dimension reduction was applied to the Min-Max and Zero-Mean normalized datasets of day 12.02.2010 of the in-vivo recording. Complete descriptions of the attributes are given in appendix A.
CFS on Min-Max and Zero-Mean Normalized Data
Median
STD
RMS Signal
SNR tf 3.5 sr 1ms
SNR tf 3.5 sr 2ms
SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms
AFR 20ms tf 3.5 sr 2ms
AFR 20ms tf 5 sr 2ms
MFR 500ms tf 3.5 sr 1ms
Figures 4.14 and 4.15 depict the data distribution of the attributes selected by the Correlation Feature Selection (CFS) method from both recording sessions for the Zero-Mean normalized data. The attributes were selected based on the result of applying the CFS method to the first recording session. Comparing figure 4.14 to figure 4.15, we see that they have roughly the same distribution. This similar feature distribution indicates that the same sort of activity was present during both recording sessions.
Figure 4.14: Scatter diagram of the attributes selected by the CFS method from the first session of day 12.02.2010 of the in-vivo recording. The X-axis is the sample index (120 samples altogether). The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.
Figure 4.15: Scatter diagram of the attributes selected by the CFS method from the second session of day 12.02.2010 of the in-vivo recording. The X-axis is the sample index (120 samples altogether). The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.
4.5.1 Prediction Results For Trained Models
The prediction results for the four supervised learning algorithms are presented in table 4.5. The obtained results show that the Support Vector Machine and K-Nearest Neighbour algorithms perform better than the two other algorithms on both normalized datasets. We can also see that, after normalization and attribute selection, the simple K-Nearest Neighbour algorithm reaches a prediction accuracy as good as that of a sophisticated algorithm like the Support Vector Machine.
Table 4.5: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the Correlation Feature Selection (CFS) method. The dataset comes from the second recording session of the day 12.02.2010. It contains 8 class labels (electrode numbers on the probe) and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the subset of attributes presented in table 4.4.
Algorithm        Accuracy on Zero-Mean Dataset   Accuracy on Min-Max Dataset   Parameter Specification
ID3              73.33%                          73.33%
K-NN             89.16%                          90%                           K = 3 and inverse distance weighting
SVM              90%                             90%                           C = 250007 and γ = 0.01
Random Forest    83.33%                          81.66%                        number of trees = 10
We performed the same experiment as above on the same dataset, but with the complete set of features. Table 4.6 presents the prediction results for our selected classifiers on the Min-Max and Zero-Mean normalized data. Comparing the results in tables 4.5 and 4.6 is revealing in several ways. First, it shows that the tree-based algorithms are almost invariant to data normalization. Second, the performance of all classifiers except Iterative Dichotomiser 3 (ID3) was boosted by the Correlation Feature Selection (CFS) method. Since ID3 uses a pruning mechanism to remove unrelated attributes at tree-construction time, its results also remain invariant to the CFS method. However, CFS feature selection helps the other classifiers deal with redundant and unrelated attributes far better than a decision tree does. As is evident, a simple algorithm like K-Nearest Neighbour outperforms ID3 and the Random Forest algorithm in this case and reaches the same performance as the Support Vector Machine.
Table 4.6: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the complete set of features. The dataset comes from the second recording session of the day 12.02.2010. It contains 8 class labels (electrode numbers on the probe) and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the complete set of attributes presented in appendix A.
Algorithm        Accuracy on Zero-Mean Dataset   Accuracy on Min-Max Dataset   Parameter Specification
ID3              73.33%                          73.33%
K-NN             81.66%                          85%                           K = 3 and inverse distance weighting
SVM              86.66%                          83%                           C = 250007 and γ = 0.001
Random Forest    77.33%                          74.16%                        number of trees = 10
The following two graphs, figures 4.16 and 4.17, show the precision-recall analysis for these four learning algorithms on both the Min-Max and Zero-Mean normalized datasets. Both illustrations show that the Support Vector Machine and K-Nearest Neighbour outperform the two other algorithms and generally have higher precision values. Precision is the fraction of the predicted instances that belong to the original class, and recall is the fraction of the relevant instances that are correctly predicted. Higher precision and recall together indicate a better, more accurate prediction.
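The per-class precision and recall values plotted in these figures can be read directly off a confusion matrix, as the following sketch shows. The matrix values are illustrative, not the thesis results; here the convention is rows = actual class, columns = predicted class (the figures in this chapter use the transposed orientation).

```python
# Hedged sketch: per-class precision and recall from a confusion matrix,
# matching the definitions in the text above. Example values are made up.
import numpy as np

def per_class_precision_recall(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                                 # correctly predicted counts
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums: predicted totals
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums: actual totals
    return precision, recall

cm = np.array([[14, 1, 0],
               [2, 12, 1],
               [0, 3, 12]])
p, r = per_class_precision_recall(cm)
print(np.round(p, 2), np.round(r, 2))
```

Each (precision, recall) pair corresponds to one point per class label in the scatter plots of figures 4.16 and 4.17.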
Figure 4.16: The precision-recall analysis computed from the Min-Max normalized data. Each sample shows the precision-recall pair for an individual class label. For each algorithm, we expect to see eight samples, one per class; since some of the values coincide, they overlay each other and are not fully visible in the graph. The Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection.
Figure 4.17: The precision-recall analysis computed from the Zero-Mean normalized data. Each sample shows the precision-recall pair for an individual class label. For each algorithm, we expect to see eight samples, one per class; since some of the values coincide, they overlay each other and are not fully visible in the graph. The Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection.
4.5.2 Observation on Confusion Matrices
Figures 4.18 and 4.19 depict the confusion matrices for the results of the classification algorithms. The main diagonal of each matrix indicates the accuracy of the classifier and hence the quality of the channel identification: the higher the values on the diagonal, the more accurate the classifier's prediction. Our experiment was designed so that we first built the learning models using features computed from the first recording session, and then tested them on features computed from the second recording session. The channel configuration, i.e. the electrodes connected to the channels, remained the same in both recording sessions. Since the probe position was not deliberately moved, we expected the activity of each channel to be predicted as its original channel. In other words, we tried to define a ground truth for each channel based on its measured signal, and later use this information to identify the activities measured in other recording sessions.
(a) Random Forest (b) Support Vector Machine
(c) ID3 (d) 3-NN
Figure 4.18: The confusion matrices for (a) Random Forest, (b) Support Vector Machine, (c) ID3, and (d) K-Nearest Neighbour on the Min-Max normalized dataset. The X-axis shows the original class labels, i.e. the electrode indexes on the probe shank connected to their specific channels; the Y-axis has the same values. The value of each cell indicates the number of instances of the class on the X-axis predicted as the class on the Y-axis. Therefore, high values on the diagonal show that each class was predicted as its original class label.
(a) Random Forest (b) Support Vector Machine
(c) ID3 (d) 3-NN
Figure 4.19: The confusion matrices for (a) Random Forest, (b) Support Vector Machine, (c) ID3, and (d) K-Nearest Neighbour on the Zero-Mean normalized dataset. The X-axis shows the original class labels, i.e. the electrode indexes on the probe shank connected to their specific channels; the Y-axis has the same values. The value of each cell indicates the number of instances of the class on the X-axis predicted as the class on the Y-axis. Therefore, high values on the diagonal show that each class was predicted as its original class label.
Looking at the confusion matrices in figures 4.18 and 4.19, there are channels that can be identified quite accurately by all classifiers, e.g. channels 5 and 1 connected to electrodes 45 and 46. On the other hand, there are electrodes that most of the algorithms classify to their adjacent channels, e.g. channels 3 and 4 connected to electrodes 144 and 143. Figure 4.20 shows the distribution of the SNR value for these electrodes in the test phase, with a 2 ms refractory time and a threshold factor of 5. As depicted, channels 1 and 5 have relatively higher SNR values than channels 3 and 4. This indicates that the signals measured by the first two channels are of higher quality and contain a lower noise level. Therefore, their computed features are more separable and contribute better to the channel identification task.
Figure 4.20: The SNR value with 2 ms refractory time and threshold factor of 5 for channels 5, 1, 3, and 4, connected to electrodes 45, 46, 143, and 144. The data was normalized using the Zero-Mean method. The Y-axis is the number of samples (15 samples altogether) and the X-axis is the value of each sample.
In confusion matrices, the higher the values on the diagonal, the more accurate the classification result. Therefore, in the test session, high values on the diagonal mean that the activities measured by the channels in the test dataset have the same feature distribution as their measured signals in the training dataset. Although some channels, due to the lower quality of their measurements, show similarities mostly to their adjacent channels, there are channels in the test session, i.e. those connected to electrodes 43, 44, 45, 46, and 141, which show high prediction accuracy with respect to their original class labels from the training session. This observation supports two ideas mentioned earlier in the problem statement section. First, by identifying each channel using the signal measured in a former recording session, we can define a ground truth for each channel and choose the next electrode configuration. This means that if we lose signal quality in a particular recording session, we can select new channel configurations from those that have shown better measurements by consulting this ground truth. Second, we can support the argument that there was no drift in the probe position between the two consecutive recording sessions. If there had been drift between the two recording sessions, we would expect channels in the test session to be classified to channels adjacent to their training-session counterparts, in the direction of the shift: with upward drift they would be classified to the channels located below them on the probe shank, and with downward drift the other way around.
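The drift argument above can be turned into a simple heuristic on the confusion matrix: a one-electrode shift would concentrate mass on a super- or sub-diagonal instead of the main diagonal. The sketch below compares these three bands; the band-to-direction mapping and the example matrices are assumptions for illustration.

```python
# Hedged sketch: diagnose a one-channel probe shift from the confusion
# matrix by comparing the main diagonal with its two neighbouring bands.
import numpy as np

def diagnose_drift(cm):
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    main = np.trace(cm) / total              # correctly identified fraction
    up = np.trace(cm, offset=1) / total      # predicted one channel "up"
    down = np.trace(cm, offset=-1) / total   # predicted one channel "down"
    if main >= max(up, down):
        return "no drift"
    return "upward drift" if up > down else "downward drift"

# Stable sessions: mass on the main diagonal.
print(diagnose_drift(np.eye(8) * 15))                       # -> no drift
# Shifted sessions: mass on an off-diagonal band.
print(diagnose_drift(np.roll(np.eye(8) * 15, 1, axis=1)))   # -> upward drift
```

This matches the qualitative reading of figures 4.18 and 4.19: dominant diagonals across all four classifiers argue against a probe shift between the two sessions.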
CHAPTER 5
SUMMARY
5.1 Conclusion
In this work, we dealt with the problem of channel identification in in-vivo recordings using the "NeuroProbe". Solving this task helps us to efficiently select electrodes from a high-density microelectrode array and contributes to the Electronic Depth Control (EDC) problem. It provides a ground truth for each electrode on the probe shank. Using this identification, we can choose channel configurations that have shown high-quality signals and more detectable activities. Furthermore, it becomes possible to detect an unintended drift in the position of the EDC probe during long-term in-vivo recording and between different recording sessions.
This work comprised four steps. In the first step, given a dataset recorded by the EDC probe, we applied an adaptive-threshold spike detection algorithm and computed features for each recording channel. We computed and used the average noise level (ANL) as an additional feature relative to former approaches in this field. This feature provides extra information when the quality of the measured signals is overwhelmed by high background noise activity.
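Adaptive-threshold spike detection of the kind used in the first step can be sketched as follows. The robust noise estimate sigma = median(|x|)/0.6745 is a common choice in the spike-detection literature; whether the thesis uses exactly this estimator is an assumption, as are the parameter names `tf` (threshold factor) and `sr_ms` (spike refractory window).

```python
# Hedged sketch of adaptive-threshold spike detection: threshold = tf * sigma,
# where sigma is a robust noise estimate, with a refractory window to avoid
# counting one spike twice.
import numpy as np

def detect_spikes(signal, fs, tf=3.5, sr_ms=2.0):
    sigma = np.median(np.abs(signal)) / 0.6745   # robust noise-level estimate
    threshold = tf * sigma
    refractory = int(sr_ms * 1e-3 * fs)          # refractory window in samples
    spikes, last = [], -refractory
    for i, v in enumerate(np.abs(signal)):
        if v > threshold and i - last >= refractory:
            spikes.append(i)                     # threshold crossing accepted
            last = i
    return np.array(spikes)
```

The detected spike indices are then the input to the SNR and firing-rate features listed in appendix A, computed for each combination of `tf` and `sr_ms`.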
In the second step, for data pre-processing, we applied Min-Max and Zero-Mean global normalization in order to give all computed features the same scale and a better distribution. In addition, we applied correlation feature selection (CFS) and principal component analysis (PCA) to remove irrelevant and redundant features and to reduce the dimension of the feature vectors. The dimension reduction and normalization boosted the performance of the classifiers. The result of the attribute selection also gave a notion of the quality of the measured signals: if the measurements were dominated by noise, the selected features were those computed from the signals themselves rather than from detected spikes, whereas in the presence of neural activity, attributes related to detected spikes were selected.
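The two normalizations named above can be sketched as column-wise transforms over the feature matrix. This is a minimal sketch: reading "global normalization" as per-feature scaling, and the guard against zero-variance columns, are assumptions.

```python
# Hedged sketch of the two normalizations: Min-Max to [0, 1] and
# Zero-Mean (z-score) scaling, applied per feature column.
import numpy as np

def min_max(X):
    span = X.max(axis=0) - X.min(axis=0)
    return (X - X.min(axis=0)) / np.where(span == 0, 1, span)

def zero_mean(X):
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1, std)
```

Either transform puts features with very different units (e.g. RMS amplitude versus firing rate) on a comparable scale before feature selection and classification.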
In the third step, we trained and validated various supervised machine learning algorithms, i.e. K-Nearest Neighbour, Iterative Dichotomiser 3 (ID3), Support Vector Machine, and Random Forest, in conjunction with the pre-processing methods, to identify each channel. Furthermore, we applied grid search as a simple case of hyper-parameter optimization to increase the accuracy of the Support Vector Machine (SVM). We trained the candidate algorithms with features computed from all recording sessions of the day 15.02.2010, containing measured signals of 152 different electrodes. The classification results showed that each channel can be identified with up to 68% accuracy using the Random Forest algorithm combined with the Correlation Feature Selection (CFS) method. They also showed that, after normalization, other classifiers such as the Support Vector Machine (SVM) and K-Nearest Neighbour (K-NN) reach accuracies above 62%. This suggests that channel identification is possible with a combination of normalization, the CFS method, and a simple classifier like K-NN.
In the fourth step, to track the neural activities between two consecutive recording sessions with the same channel configuration, we trained and tested the combination of the Correlation Feature Selection (CFS) method and both normalization techniques with the four supervised learning algorithms. We were able to reach almost 90% accuracy using the Support Vector Machine (SVM) algorithm. Interestingly, the simple K-Nearest Neighbour (K-NN) algorithm performed at the same level as the SVM. It should be mentioned that, although ID3 and Random Forest had high accuracy in the training phase, they fell behind the two other algorithms in the test phase. The observations on the confusion matrices, the precision-recall analysis, and the feature distributions showed that there was no drift in the probe position between the two sessions. In addition, they showed that the neural activity between different recording sessions is traceable using our approach.
5.2 Future Works
• To provide stronger support for our approach to channel identification and to detecting drift in the probe position during in-vivo recording, we need better datasets and recordings; then we could study the neural activity more thoroughly. A dataset containing long-term recordings with the same channel configuration would give us the chance to train our learning algorithms better and to observe the test results in order to assess the possibility of detecting drift in the probe position.
• Regarding the identification of channels in a particular recording session, we could try to identify not only single channels but also groups of channels. In this case, we could consider a tetrode (four channels) or two adjacent channels, train our learning algorithms on their extracted features, and try to identify their activity in the test session. To do this, we would need to find out which pairs of channels record from the same neuron simultaneously, using signal-similarity measurement algorithms.
APPENDIX A
LIST OF FEATURES AND THEIR DESCRIPTIONS
A.1 All Features List
Feature Name Description
Min The minimum peak value of the measured signal.
Max The maximum peak value of the measured signal.
Mean The mean value of the measured signal.
Median The Median value of the measured signal.
STD The standard deviation (STD) of the measured signal.
RMS The root mean square (RMS) of the measured signal.
ANL The average noise level (ANL) of the measured signal. For computing noise level the
time window of 50 ms is used.
SNR tf 3.5 sr 2 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
3.5 and the spike refractory (sr) time window is 2 ms.
MFR 20 ms tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 20 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 2 ms.
MFR 100 ms tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 100 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
MFR 500 ms tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 500 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
MFR 10 s tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 10 s tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 2 ms.
SNR tf 3.5 sr 1 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
3.5 and the spike refractory (sr) time window is 1 ms.
MFR 20 ms tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 20 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 1 ms.
MFR 100 ms tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 100 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
MFR 500 ms tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 500 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
MFR 10 s tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 10 s tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 1 ms.
SNR tf 5 sr 2 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
5 and the spike refractory (sr) time window is 2 ms.
MFR 20 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 20 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 2 ms.
MFR 100 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 100 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
MFR 500 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 500 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
MFR 10 s tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 10 s tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 2 ms.
SNR tf 5 sr 1 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
5 and the spike refractory (sr) time window is 1 ms.
MFR 20 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 20 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 1 ms.
MFR 100 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 100 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
MFR 500 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 500 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
MFR 10 s tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 10 s tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 1 ms.
Table A.1: List of all computed features and their descriptions. Note that all the attributes are computed
from one segment of each recording session which contains 10 s of measured signals.
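As an illustration of the firing-rate features (MFR/AFR) in Table A.1, a minimal sketch is given below. It assumes non-overlapping windows and spike indices already produced by a detector; the thesis may use sliding windows, so the windowing detail is an assumption.

```python
# Hedged sketch of the MFR/AFR features: count spikes in fixed windows,
# convert counts to rates in Hz, and take the maximum (MFR) and mean (AFR).
import numpy as np

def firing_rates(spike_idx, n_samples, fs, window_ms):
    win = int(window_ms * 1e-3 * fs)                   # window length in samples
    counts = []
    for start in range(0, n_samples - win + 1, win):   # non-overlapping windows
        counts.append(np.sum((spike_idx >= start) & (spike_idx < start + win)))
    rates = np.array(counts) / (window_ms * 1e-3)      # spikes per second
    return rates.max(), rates.mean()                   # (MFR, AFR)
```

For example, with fs = 1000 Hz, a 100 ms window, and spikes at samples 10, 20, and 450 within one second of signal, the busiest window holds two spikes (20 Hz) while the average over all windows is 3 Hz.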
BIBLIOGRAPHY
[1] Miguel AL Nicolelis. Methods for Neural Ensemble Recordings. Boca Raton (FL): CRC Press., Upper Saddle River,
NJ, USA, 2008.
[2] Herc p.Neves, Tom Torfs, Refet F.Yazicioglu, Junaid Aslam, Arno A.Aarts, Patrick Merken, Patrick Ruther, and
Chris Van Hoof. The neuroprobes project:a concept for electronic depth control. Annual International Conference of
the IEEE Engineering in Medicine and Biology Society EMBS 2008, 261:1857–1857, 2008.
[3] K.Seidl, H.Herwik, Y.Nurcahyo, T.Torfs, M.Keller, M.Schuettler, H.Neves, T.Stieglitz, O.Paul, and P Ruther. Cmos-
based high-density silicon micro-probe array for electronic depth control in neural recording. 22nd Int. MEMS Conf,
261:232–5, 2009.
[4] J. Ji and K.D. Wise. An implantable cmos circuit interface for multiplexed microelectrode recording arrays. Solid-State
Circuits, IEEE Journal of, 27(3):433–443, Mar 1992.
[5] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. The weka data
mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18, 2009.
[6] M.Abeles and M.Goldstein. Multispike train analysis. IEEE 1977, 65:762–773, 1977.
[7] I.Bankman, K.Johnson, and W.Schneider. Optimal detection classification and superposition resolution in neural
waveform recordings. IEEE Trans Biomed Eng, 40:836–841, 1993.
[8] Sahani M.Latent. variable models for neural data analysis. PhD Dissertation Pasadena, 1999.
[9] KH.Kim and SJ.Kim. Neural spike sorting under nearly 0-db signal-to-noise ratio using nonlinear energy operator
and artificial neural network classifier. IEEE Trans Biomed Eng, 47:1406–1411, 2000.
[10] S.Mukhopahdyay and GC.Ray. A new interpretation of nonlinear energy operator and its efficacy in spike detection.
Trans Biomed Eng, 45:180–187, 1998.
[11] L.Traver, C.Tarin, P.Marti, and N.Cardona. Adaptive threshold neural spike detection by noise-envelope tracking.
Electron, 43:1333–1335, 2007.
[12] RJ.Brychta, S.Tuntrakool, and M.Appalsamy et al. Wavelet methods for spike detection in mouse renal sympathetic
nerve activity. IEEE Trans Biomed Eng, 54:82–93, 2007.
[13] S.Kim and K.Kim. A wavelet-based method for action potential detection from extracellular neural signal recording
with low signal-to-noise ratio. IEEE Trans Biomed Eng., 50:999–1011, 2003.
[14] Z.Nenadic and JW.Burdick. Spike detection using the continuous wavelet transform. IEEE Trans Biomed Eng.,
20:74–87, 2005.
[15] I.Obeid and PD.Wolf. Evaluation of spike detection algorithms for a brain-machine interface application. IEEE Trans
Biomed., 51:905–911, 2004.
[16] Detection of Active Brain Regions for Automatic Electrode Selection Using a Machine Learning Approach. Bachelor
thesis. Master’s thesis, 2010.
[17] George W.Fraser and Andrew B.Schwartz. Recording from the same neurons chronically in motor cortex. J Neuro-
physiol., 107:1970–1978, 2012.
[18] Edwin M. Maynard, Craig T. Nordhausen, and Richard A. Normann. The utah intracortical electrode array: A
recording structure for potential brain-computer interfaces. Electroencephalography and Clinical Neurophysiology,
102(3):228 – 239, 1997.
[19] Ali Shawkat and Kate A.Smith-Miles. Improved support vector machine generalization using normalized input space.
Advances in Artificial Intelligence., 4304:362–371., 2006.
[20] Teunis van Beelen. GCC: GNU EDFbrowser a free, opensource, multiplatform, universal viewer and toolbox in-
tended for, but not limited to, timeseries storage files like eeg, emg, ecg, bioimpedance, etc. http://www.teuniz.net/
edfbrowser/, 2010–2013.
[21] Isabelle Guyon and Andr´e Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157–
1182, March 2003.
[22] Mark A. Hall and Geoffrey Holmes. Benchmarking attribute selection techniques for discrete class data mining. IEEE
Trans. on Knowl. and Data Eng., 15(6):1437–1447, November 2003.
36
BIBLIOGRAPHY BIBLIOGRAPHY
[23] M. Dash and H. Liu. Feature selection for classification. Intelligent Data Analysis, 1:131–156, 1997.
[24] Fengxi Song, Zhongwei Guo, and Dayong Mei. Feature selection using principal component analysis. In System Science,
Engineering Design and Manufacturing Informatization (ICSEM), 2010 International Conference on, volume 1, pages
27–30, Nov 2010.
[25] Mark A Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.
[26] Mark A. Hall and Lloyd A. Smith. Feature subset selection: a correlation based filter approach. In 1997 International
Conference on Neural Information Processing and Intelligent Information Systems, pages 855–858. Springer, 1997.
[27] Songbo Tan. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Systems with Applications,
28(4):667 – 671, 2005.
[28] J.R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[29] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[30] Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273–297, 1995.
[31] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
[32] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. Technical report, 2012.
[33] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. 2010.
[34] Karsten Seidl, Tom Torfs, Patrick A. De Mazière, Gert Van Dijck, Richard Csercsa, Balazs Dombovari, Yohanes Nurcahyo, Hernando Ramirez, Marc M. Van Hulle, Guy A. Orban, et al. Control and data acquisition software for high-density CMOS-based microprobe arrays implementing electronic depth control. Biomedizinische Technik/Biomedical Engineering, 55(3):183–191, 2010.
[35] Teunis van Beelen. EDFlib: a programming library for C/C++ to read/write EDF+/BDF+ files. http://www.teuniz.net/edflib/index.html, 2010–2013.

ACKNOWLEDGMENT

I have taken efforts in this project. However, it would not have been possible without the kind support and help of many individuals and organizations. I would like to extend my sincere thanks to all of them. I am highly indebted to Prof. Dr. Wolfram Burgard for his guidance and supervision, as well as for providing the necessary information regarding the project. I would like to express my gratitude towards Prof. Dr. Oliver Paul for his kind co-operation and encouragement, which helped me in the completion of this project. Furthermore, I would like to thank Dr. Barbara Frank for the useful comments, remarks, and engagement throughout the learning process of this interdisciplinary project. My thanks and appreciation also go to Dr. Patrick Ruther for his help in developing the project, and to the EDC++ project members who have willingly helped me out with their abilities.
CONTENTS

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Outline
2 Related Works
  2.1 Spike Detection
  2.2 Detecting active brain region
  2.3 Recording from same neuron
3 Channel Identification
  3.1 Approach
  3.2 Spike detection and feature computation
  3.3 Data pre-processing
    3.3.1 Normalization techniques
    3.3.2 Attribute selection
  3.4 Machine learning for channel identification
    3.4.1 K-Nearest Neighbour (K-NN)
    3.4.2 Iterative Dichotomiser 3 (ID3)
    3.4.3 Random forest
    3.4.4 Support Vector Machine (SVM) Algorithm
    3.4.5 Cross-validation
    3.4.6 Hyper-parameter optimization
  3.5 Tracking down neural activities
4 Experiments and Results
  4.1 Dataset
  4.2 Spikes and features
  4.3 Data normalization and attribute selection
  4.4 Training and validation of classifiers
  4.5 Tracking down neural activity
    4.5.1 Testing trained models
    4.5.2 Observation
5 Summary
  5.1 Conclusion
  5.2 Future work
A Feature List
  A.1 Feature list
Bibliography
CHAPTER 1  INTRODUCTION

1.1 Motivation

Understanding brain functionality and the complex interactions of large neural networks with huge numbers of neurons is one of the most challenging research fields in neuroscience. The development of appropriate tools opens new perspectives in research and application, e.g. in neural prostheses, as well as in the diagnosis and therapy of neurodegenerative diseases including Alzheimer's, Parkinson's, and epilepsy. Recordings of single-neuron activity within an ensemble of neurons are required for a basic understanding of neural processes [1]. With this aim, a new high-density electrode array for recording with high spatial resolution was introduced within the European project NeuroProbes and successfully tested for the first time in in-vivo experiments [2, 3]. These probes contain 188 electrodes configured in 2 rows. CMOS multiplexing units integrated directly on the probe shafts enable a drastic increase in the number and density of electrodes in NeuroProbes compared to existing devices [4]. The density of such arrays makes it possible to switch between the electrodes and achieve close proximity between the neuron of interest and the recording electrode. In this context, the concept of switching between individual microelectrodes of the same shaft, without the need to reposition either the shaft or the entire probe, is called electronic depth control (EDC). EDC allows us to switch between electrodes, scan their signals along the probe shank, and select those with higher signal quality. However, during long-term in-vivo recording there are moments in which the current configuration of the electrodes is not able to record qualified signals. One reason for losing qualified signals might be a drift in the probe position. This drift may occur for several reasons, e.g. inflammation of the brain tissue, human interaction, or unexpected animal movement.
Such a drift causes us to lose track of an activity of interest that was recorded earlier or in previous sessions. Furthermore, to discriminate a single neuron and study its behaviour in the long term, it is necessary to make sure that the probe remains in the starting configuration and that a particular channel keeps recording from the neuron of interest. In addition, having prior information about the quality and properties of the signals recorded by each channel makes it possible to select the next configuration more efficiently and accurately, providing us with high-quality, less noisy signals from neural activities. Therefore, we need to be able to identify each recording channel (each channel is assigned to one electrode during the recording).

1.2 Problem Statement

In this work, we try to identify the characteristics of each recording channel. For this purpose, we first compute sets of features from the measured signals of each channel. Then we apply supervised machine learning techniques to identify the recording channels based on the computed features. In this context, the class labels are channel IDs connected to particular electrodes on the probe, and their activity is represented by sets of features extracted from their measured signals. There are three main challenges here. First, computing features from the measured signals and choosing relevant methods for such a computation. Second, selecting an appropriate supervised machine learning algorithm and a suitable number of features in order to obtain maximum classification accuracy for a given learning algorithm. Third, providing a series of analytical approaches to interpret the classification results and draw conclusions about channel identification and drift occurrence. Such an identification enables us to track a particular activity during and between long-term in-vivo recording sessions and to deal with drift of the probe from its original position. In other words, if we lose signal quality in a recording session, we can use this prior information and choose the electrodes that have shown higher signal quality for the next configuration. Furthermore, unintended movement of the probe between different recording sessions with the same recording configuration becomes detectable: if there was a drift in the probe position between recording sessions, we would observe that a particular activity is now identified in a new channel below or above its original channel, depending on the drift direction.

1.3 Outline

The rest of this work is structured as follows. In the next chapter, we discuss related work on electronic depth control (EDC) in intra-cortical recordings, spike detection, detecting active brain regions, and, finally, recording from the same neuron in motor cortex. In Chapter 3, we discuss in detail our approach for spike detection and feature extraction from in-vivo recording datasets, as well as classification and channel identification using these features in order to track neural activities between different recording sessions and in long-term in-vivo recordings. In Chapter 4, we present the results of our chosen approach. Finally, in Chapter 5, we summarize what we have achieved in this work and discuss how it can be extended.
CHAPTER 2  RELATED WORKS

2.1 Spike Detection Algorithms

In extracellular recording, a spike or action potential is a short-lasting, high-amplitude signal fired by a neuron. Spikes are produced by the rising and falling potential of the neuron's cell membrane. During neural activity, a neuron fires spikes with a particular amplitude, shape, and varying rates. Each neuron has spikes of a characteristic shape and firing rate, mainly determined by the morphology of its dendrite tree and its distance and orientation relative to the recording electrodes [5]. In order to extract features from recordings, we first need to extract the recorded spikes of each channel. Two common types of spike detection algorithms are available. The first are supervised algorithms, which need user intervention, such as window discrimination [1], principal component analysis [6], and matched filtering [7]. However, using supervised algorithms would be very tedious with a comb of multi-array electrodes, since the settings would have to be adjusted for each channel separately. The second common type of spike detection algorithms is the unsupervised category. These algorithms require no user intervention, e.g. algorithms based on amplitude detection [8], non-linear energy detection [9, 10, 11], and wavelet-based detection [12, 13, 14]. In a study by Obeid and Wolf [15], spike detection algorithms were compared taking into account their accuracy and their computational cost. It was found that taking the absolute value of the neural signal before applying a threshold, in combination with a refractory period, is just as effective for spike detection as more elaborate energy-based schemes. Therefore, in this work we used the absolute value of the signal and an adaptive-threshold spike detection algorithm.
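To make this choice concrete, an absolute-value threshold detector with a refractory period can be sketched as follows. This is a minimal Python illustration, not the project's actual C++ implementation; the 50 ms noise window, the 3.5x threshold factor, and the 1 ms refractory period match parameter values used later in this report, while the median-based noise estimator is an assumption (the report does not specify how the background noise is estimated).

```python
import numpy as np

def detect_spikes(signal, fs, noise_window_s=0.05, factor=3.5, refractory_s=0.001):
    """Flag samples whose absolute value exceeds a multiple of a per-window
    noise estimate, keeping at most one detection per refractory period."""
    win = int(noise_window_s * fs)
    refractory = int(refractory_s * fs)
    detections = []
    last = -refractory
    for start in range(0, len(signal), win):
        chunk = signal[start:start + win]
        # Robust noise estimate (median absolute deviation scaled to std);
        # one common choice, assumed here for illustration.
        noise = np.median(np.abs(chunk)) / 0.6745
        thresh = factor * noise
        for i in np.flatnonzero(np.abs(chunk) > thresh):
            idx = start + int(i)
            if idx - last >= refractory:
                detections.append(idx)
                last = idx
    return np.array(detections, dtype=int)
```

With a threshold factor of 5 instead of 3.5, or a 2 ms refractory period instead of 1 ms, the same routine produces the other parameter combinations used later for feature computation.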
2.2 Detection of Active Brain Regions Using a Machine Learning Approach

Given all detected spikes for each channel, we need to compute sets of essential features from them and use these to identify the properties of a recording channel. Ramirez et al. [16] applied machine learning algorithms in order to classify the activity of each channel and find out which channels record single-unit activity and which record multi-unit activity. In their work, they first trained a learning algorithm using features extracted from the detected spikes of labelled data and then performed prediction on unlabelled data. They show which kinds of features can be extracted from detected spikes and which combinations of those features lead to more accurate classification results. Their short-term goal was to develop algorithms that assist neuroscientists in detecting active brain regions; their long-term perspective was a smart neural recording array which allows finding and maintaining high-quality neural signals through the fully automatic selection of many electrodes in active brain regions. The features they used fall into two main categories. The first are features computed directly from the measured signal itself, i.e. min, max, mean, median, standard deviation (STD), and root mean square (RMS). The second are features computed from the detected spikes, i.e. signal-to-noise ratio (SNR), average firing rate, and maximum firing rate. This second category of features has different variations regarding the refractory time for spike detection, i.e. 1 ms or 2 ms, and different time windows for the firing rates and average firing rates, i.e. 20 ms, 100 ms, 500 ms, and 10 s. In our work, we use all possible combinations of these features, i.e. 43 features altogether, to get the highest accuracy in the classification result. The major difference between the approach of Ramirez et al. [16] and our goal is that they tried to classify activity types, i.e. single-unit activity (SUA), multi-unit activity (MUA), and noise activity (NA), whereas we try to classify the recording channels.

2.3 Recording From the Same Neurons Chronically in Motor Cortex

During chronic extracellular recordings, neurobiologists have frequently observed similar activity recorded on the same electrode from day to day. Occasionally a single neuron has some unusual characteristic, such as a distinctive waveform or some unusual and obvious firing property, that makes it clear that the same neuron was present in multiple sessions. The possibility that some neurons may be represented multiple times in a series of recording sessions creates both a problem and an opportunity. Separately recorded neurons may not actually represent independent sources of data, so statistical tests that assume each unit is an independent sample may not be valid. However, if the same neuron can be identified as such through multiple sessions, it becomes possible to combine data and thereby estimate the firing properties of that neuron with greater confidence. Fraser and Schwartz [17] developed a new metric of unit identity using pairwise cross-correlograms between neurons in a simultaneously recorded population. It provides unit identification information comparable to that based on wave shape. By combining this metric with wave shape, autocorrelation shape, and mean firing rate, they were able to clearly identify whether two separately recorded units represent the same or different underlying neurons. There are similarities between the goal of our project and the work of Fraser and Schwartz [17]. They used feature vectors consisting of the firing rate and the waveform of spikes to represent the activity of each channel, and then used these features to classify the activity of each neuron. They made the strong assumption that, by using the Utah microelectrode array [18], which has an electrode pitch of 400 µm, each electrode records from different neurons. In other words, they assume it is unlikely that two adjacent electrodes record from the same neuron. This assumption makes it possible for them to use the waveform of the spikes as part of their features to track the activity of a particular neuron in long-term recordings and between different recording sessions. However, with the high electrode density of EDC probes, it is more likely that some adjacent electrodes record from the same neuron, due to the small pitch size, i.e. 40 µm. Therefore, in our work we use a different feature vector to represent the activity of each channel, and then apply supervised learning algorithms to identify the channels and track their activity.
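The cross-session identification idea pursued in the following chapters, namely training a classifier on per-channel feature vectors from one session, predicting channel IDs on a later session, and reading drift off the confusion matrix, can be sketched with synthetic data. The project itself uses the Weka tool; this Python/scikit-learn sketch, including the channel count, feature count, noise levels, and random-forest choice, is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-channel feature vectors (SNR, firing-rate
# statistics, ...): 8 channels, 40 vectors per channel, 43 features each.
n_channels, n_vectors, n_features = 8, 40, 43
centers = rng.normal(size=(n_channels, n_features))

def make_session(shift=0.0):
    """One recording session: feature vectors scattered around each
    channel's 'signature', labelled with the channel ID."""
    X = np.vstack([centers[c] + 0.3 * rng.normal(size=(n_vectors, n_features))
                   for c in range(n_channels)])
    y = np.repeat(np.arange(n_channels), n_vectors)
    return X + shift, y

X_train, y_train = make_session()           # session 1: training data
X_test, y_test = make_session(shift=0.05)   # session 2: slightly perturbed

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Mass off the diagonal of the confusion matrix would hint at probe or
# activity drift between the two sessions.
cm = confusion_matrix(y_test, pred)
print("test accuracy:", (pred == y_test).mean())
```

In the real pipeline the feature vectors come from the spike-based attributes described above, and a systematic shift of predictions to a neighbouring channel, rather than random confusion, is what indicates probe drift.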
CHAPTER 3  CHANNEL IDENTIFICATION

3.1 Approach

The main purpose of this project is channel identification for tracking a neuron of interest between different recording sessions. To do this, we designed a pipeline with four major steps which lead to the desired conclusion; the diagram in Figure 3.1 shows these four steps.

Figure 3.1: The project pipeline: spike detection and feature extraction; data pre-processing (normalization and attribute selection); supervised machine learning for channel identification; tracking down neural activities.

First, we detect spikes and compute features from the measured signals of each channel. Second, we apply different normalization techniques and attribute selection methods to the resulting dataset; the dataset here consists of the computed feature vectors for all channels. Third, we train and evaluate the performance of different classifiers in conjunction with the attribute selection methods and normalization techniques. This enables us to identify a particular channel based on its computed features. Fourth, we train and test the classifiers with the datasets of consecutive recording sessions. This allows us to track neural activities between different recording sessions and to detect unintentional drift in the probe position.

Based on the diagram in Figure 3.1, we first need to characterize each channel given its measured signals. Each channel can be represented by sets of features extracted from its measured signals. As mentioned in the previous chapter, there are different methods for detecting action potentials or spikes and for feature extraction [16] in order to classify the activity type of each recording channel using supervised machine learning algorithms. We employ the same methods to extract features. After computing the features for each recording channel, we apply data pre-processing steps, i.e. normalization techniques and attribute selection methods, in order to deal with noisy data and increase the prediction accuracy of the classification. Then we apply machine learning algorithms to find out how well we can identify each channel given its computed features.

The next step is to use a supervised machine learning algorithm in combination with the pre-processing steps to identify a channel given its computed features. We therefore need to train and evaluate the performance of different classifiers. This gives us a notion of the feasibility of the channel identification problem. Finally, using a supervised machine learning algorithm, we are able to track the activity of different channels. Given two recording sessions with the same channel configuration, the idea is to train a learning algorithm on the dataset provided by the first recording session, and then to test the trained models on the dataset provided by the second session. The prediction result shows how well the neural activities are traceable and whether or not there was an unintended movement of the probe. An efficient implementation of the supervised learning algorithms is available in the Weka machine learning tool [19]. In this work, we implemented a light framework for detecting spikes and computing features from them using the C++ programming language. In the results section, we examine the performance of each of the introduced algorithms to find out which one gives more accurate results and better identification.

3.2 Spike Detection Algorithm and Feature Computation

To compute features for each channel, we need to extract spikes from the measured signals of that channel. Each channel is connected to a specific electrode on the probe shank, and these connections are adjustable for each particular recording session. Figure 3.2 shows 10 seconds of raw measured signal for eight channels in EDF format. All recordings are filtered using a band-pass filter between frequencies of 500 Hz and 5000 Hz; they can then be processed in order to calculate the attributes that characterize the recorded signals.
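The band-pass step can be reproduced with a standard filter design. The report does not state the filter family or order, so the Butterworth filter and order 4 below are assumptions; only the 500-5000 Hz pass band and the per-channel application come from the text (Python/SciPy sketch).

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_500_5000(signal, fs, order=4):
    """Zero-phase band-pass between 500 Hz and 5000 Hz, matching the
    pre-processing applied to the raw recordings before spike detection."""
    nyq = 0.5 * fs
    # Normalized corner frequencies for a digital Butterworth design.
    b, a = butter(order, [500.0 / nyq, 5000.0 / nyq], btype="band")
    # filtfilt runs the filter forward and backward: zero phase distortion,
    # so spike peak positions are preserved.
    return filtfilt(b, a, signal)
```

Applied to each channel, this removes the low-frequency local field potential content and high-frequency noise, leaving the spike band that the subsequent attribute computation operates on.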
Figure 3.2: This graph shows 10 seconds of raw recordings of neural activity for eight different channels before filtering. Signal units are in mV (plotted using EDFBrowser [20]). Figure 3.3 shows the same recorded signals depicted in Figure 3.2 after filtering.
Figure 3.3: This graph shows 10 seconds of the filtered signals for eight different channels, filtered by a band-pass filter between 500 Hz and 5000 Hz. Signal units are in mV (plotted using EDFBrowser [20]).

In this work, we apply an adaptive-threshold spike detection algorithm [15]. Spike detection and feature computation follow the approach introduced in the previous chapter [16]. The idea is first to estimate the background noise for a time window of 50 ms and then to detect all signal samples whose absolute value exceeds this noise level by a factor of 3.5 or 5. After detecting the spikes, we can compute the signal-to-noise ratio (SNR) of each channel in a time window of 10 s. First, the RMS of each spike is calculated using the signal from 0.5 ms before the peak of the spike to 1 ms after the peak. Then the RMS values of all spikes are averaged, and the RMS of the noise is calculated, where the noise is the portion of the signal excluding the detected spikes. Finally, the SNR is calculated as follows:

SNR = 20 · log10( RMS_spikes / RMS_noise ), (3.1)

where RMS_spikes denotes the averaged RMS of the detected spikes. In order to compute appropriate features, we use different combinations of the threshold factor, i.e., 3.5 and 5, and the refractory time for spike detection, i.e., 1 ms and 2 ms. This makes it possible to detect spikes with four different parameter combinations. For each combination, the maximum firing rates in intervals of 20 ms, 100 ms, 500 ms, and 10 s and their average values were calculated and defined as attributes, which, taking into account the four parameter combinations for the SNR calculation as well, produces 36 different attributes. For example, a 3.5 threshold multiplier with a 1 ms spike refractory window defines nine attributes for the different maximum and average firing-rate intervals and the SNR; using a 2 ms window instead of a 1 ms window yields nine further attributes, and so on. There are seven additional features computed from the measured signal itself: minimum (Min), maximum (Max), mean, median, standard deviation (STD), root mean square of the signal (RMS Signal), and average noise level (ANL), which we also use for the classification algorithms and channel identification. In comparison to [16], the average noise level (ANL) is a new feature; it is computed as the average noise level in one segment of the measured signals, i.e., 10 s of a particular recording session. The ANL value represents the quality of the measured signal, and in the experiment section we will show that the ANL is a good feature for classification. In total, we compute 43 attributes per feature vector, and each feature vector is computed from a time window of 10 s, which is one segment of the measured signals.
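The spike detection and SNR computation described above were implemented in our C++ framework; the following Python sketch illustrates the same procedure on a single channel. The per-window noise estimator is an assumption (the standard deviation of the window is used here), since the text does not specify the exact estimator of [15].

```python
import numpy as np

def detect_spikes(signal, fs, factor=3.5, refractory_ms=1.0, noise_win_ms=50.0):
    """Adaptive-threshold detection: estimate the noise level per 50 ms
    window (assumed: window standard deviation) and take samples whose
    absolute value exceeds factor * noise as spike peaks, honouring the
    refractory time between consecutive detections."""
    win = int(noise_win_ms * fs / 1000)
    refractory = int(refractory_ms * fs / 1000)
    peaks, last = [], -refractory
    for start in range(0, len(signal) - win + 1, win):
        seg = signal[start:start + win]
        noise = seg.std()  # assumed noise estimator
        for i, v in enumerate(seg):
            t = start + i
            if abs(v) > factor * noise and t - last >= refractory:
                peaks.append(t)
                last = t
    return peaks

def snr_db(signal, fs, peaks):
    """Eq. 3.1: 20*log10(mean spike RMS / noise RMS); spike RMS is taken
    from 0.5 ms before to 1 ms after each peak, the noise is the signal
    with the detected spike windows excluded."""
    pre, post = int(0.0005 * fs), int(0.001 * fs)
    mask = np.ones(len(signal), dtype=bool)
    rms = []
    for p in peaks:
        lo, hi = max(0, p - pre), min(len(signal), p + post)
        rms.append(np.sqrt(np.mean(signal[lo:hi] ** 2)))
        mask[lo:hi] = False
    noise_rms = np.sqrt(np.mean(signal[mask] ** 2))
    return 20 * np.log10(np.mean(rms) / noise_rms)
```

Running the detector with the four parameter combinations (threshold 3.5 or 5, refractory time 1 ms or 2 ms) yields the firing-rate and SNR attributes described above.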
3.3 Data Pre-Processing

Each element of the feature vectors computed from the extracellular recordings and their detected spikes has its own value range. Since some supervised machine learning algorithms use similarity measures between feature vectors for classification, the features should be on the same scale. Data normalization techniques are common in machine learning for dealing with this problem; global normalization is an essential preprocessing step for many machine learning algorithms and boosts their performance.

A further problem is the number of features computed with our approach, i.e., 43 features per sample. This is a high-dimensional feature vector and makes the classification task difficult, especially when some features are noisy or irrelevant because they were computed from noisy measured signals. In fact, due to noisy measurements, high background noise activity, and the existence of artifacts in the measured signals, some of the computed features are irrelevant or redundant. This can dramatically reduce the prediction accuracy of the classifiers. However, there exist common attribute selection and dimension reduction methods in the field of machine learning for overcoming these problems. In the following, we briefly explain our candidate methods and techniques for data normalization and attribute selection.

3.3.1 Normalization Techniques

In order to increase the performance of the supervised learning algorithms, we apply normalization techniques to our datasets, since the computed features have different scales and ranges. This preprocessing step usually improves the performance of a learning algorithm significantly by scaling the samples into a similar range.
In this work, we applied two common global normalization techniques, which are frequently used in machine learning [19], in particular with the Support Vector Machine (SVM):

• Min-Max normalization:

D'(i) = (D(i) − Min(D)) / (Max(D) − Min(D)) · (U − L) + L. (3.2)

Here D' is the normalized vector, D is the natural (raw) vector, Min(D) and Max(D) are the minimum and maximum natural values, and U and L are the upper and lower bounds of the target range, usually [0, 1] or [−1, 1].

• Zero-Mean normalization:

D'(i) = (D(i) − µ) / σ. (3.3)

Here D' is the normalized vector, D is the natural vector, µ is the mean of the natural data, and σ is the standard deviation of the natural data.

3.3.2 Attribute Selection

High-dimensional feature vectors do not always increase the prediction accuracy of supervised learning algorithms. In machine learning, feature selection, also known as variable selection, attribute selection, or variable subset selection, is a technique for reducing the dimensionality of the feature vectors [21, 22, 23]. Feature selection methods can lead to (i) an improvement in the prediction performance of the predictor, (ii) faster and more cost-effective predictors, and (iii) a better understanding of the process that generates the data. In classification problems, especially when there are few samples with high-dimensional feature vectors, irrelevant and redundant features are likely. Redundant features are those that provide no more information than the currently selected features; irrelevant features are those that provide no useful information in any context. When dealing with extracellular neural activity, feature selection is necessary due to the presence of background noise. A high background noise level has a negative impact on the performance of the spike detection algorithms and on the quality and quantity of the computed features.
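As an illustration, equations 3.2 and 3.3 can be sketched as follows (a minimal Python/NumPy sketch; the actual pipeline uses the Weka implementations [19]):

```python
import numpy as np

def min_max_normalize(d, lower=-1.0, upper=1.0):
    """Eq. 3.2: scale the feature vector d into [lower, upper]."""
    return (d - d.min()) / (d.max() - d.min()) * (upper - lower) + lower

def zero_mean_normalize(d):
    """Eq. 3.3: subtract the mean and divide by the standard deviation."""
    return (d - d.mean()) / d.std()
```

In the global variant applied in Chapter 4, the minimum, maximum, mean, and standard deviation are computed per feature over all samples of all classes before these formulas are applied.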
Therefore, redundant and irrelevant features are likely, and attribute selection can deal with this problem. Two attribute selection methods are widely used in the machine learning field for reducing the dimensionality of feature vectors: Principal Component Analysis (PCA) [24, 23] and Correlation Feature Selection (CFS) [25, 26].

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a statistical procedure that orthogonally transforms input data of dimension n into a set of linearly uncorrelated variables, called principal components, of the same or lower dimension m. Each principal component represents a direction of the input data in a new coordinate system. The highest rank among the principal components goes to the direction with the highest variance, which lies on the first coordinate of the new coordinate system, the second rank on the second coordinate, and so on. Strictly speaking, PCA is not a feature selection but a feature extraction method: the new attributes are obtained as linear combinations of the original attributes. Dimensionality reduction is achieved by keeping the m components with the highest variance out of the n original components. The common version of this method [19] has the following steps:

• Compute the covariance matrix of the original training samples, then solve for all eigenvectors and eigenvalues.
• Rank the components by the amount of variance they explain.
• Select the m highest-ranked components.

Correlation Feature Selection (CFS)

The other feature selection method we use in this work is Correlation Feature Selection (CFS) [25, 26]. CFS selects a subset of features from the original feature vectors such that the features in the subset are highly correlated with the class labels and uncorrelated with each other. CFS can ignore irrelevant features because they have a low correlation with the class labels, and it also screens out redundant features due to their high correlation with other features.
A feature is accepted if it predicts classes in areas of the instance space that are not already predicted by other features. Given a subset S of the feature space containing k features, CFS evaluates the subset based on the following "merit":

M_S = k · r_cf / sqrt( k + k(k − 1) · r_ff ), (3.4)

where M_S is the heuristic merit of the feature subset S containing k features, r_cf is the mean feature-class correlation (f ∈ S), and r_ff is the average feature-feature correlation. The numerator of equation 3.4 indicates how predictive of the class a set of features is, and the denominator reflects the amount of redundancy among the features. Evaluating all possible subsets of features is exhaustive and often infeasible due to the large number of attributes; in [25, 19], experimentally motivated heuristic search strategies are suggested:

• Forward selection begins with no features and greedily adds one feature at a time until no single-feature addition improves the evaluation.
• Backward elimination begins with all features and greedily removes one feature at a time as long as the evaluation does not degrade.
• Best-first search starts either with no features or with all features; it progresses forward by adding features or backward by removing features from the subset and has a stopping criterion.

Furthermore, there are three variations of CFS [25, 19], each employing one of the following attribute quality measures to estimate the correlations in equation 3.4:

• CFS-UC uses symmetrical uncertainty to measure correlation.
• CFS-MDL uses the normalized symmetrical minimum description length (MDL) principle to measure correlation.
• CFS-Relief uses symmetrical relief to measure correlation.
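The merit of equation 3.4 combined with the forward-selection strategy can be sketched as follows. Note the simplification: Pearson correlation is used here as a stand-in for the symmetrical-uncertainty, MDL, and relief measures of the CFS variants listed above.

```python
import numpy as np

def cfs_merit(features, labels):
    """Eq. 3.4: M_S = k * r_cf / sqrt(k + k*(k-1) * r_ff), with r_cf the
    mean absolute feature-class correlation and r_ff the mean absolute
    feature-feature correlation (Pearson correlation as a stand-in)."""
    k = features.shape[1]
    r_cf = np.mean([abs(np.corrcoef(features[:, j], labels)[0, 1])
                    for j in range(k)])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(features[:, i], features[:, j])[0, 1])
                    for i in range(k) for j in range(i + 1, k)])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_selection(features, labels):
    """Greedy forward search: keep adding the feature that most improves
    the merit; stop when no addition helps."""
    selected, best = [], -np.inf
    while True:
        gains = [(cfs_merit(features[:, selected + [j]], labels), j)
                 for j in range(features.shape[1]) if j not in selected]
        if not gains:
            return selected
        m, j = max(gains)
        if m <= best:
            return selected
        best = m
        selected.append(j)
```

A feature that duplicates the class labels is selected immediately, while an uncorrelated noise feature is rejected because it lowers the merit.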
3.4 Supervised Machine Learning Algorithms For Channel Identification

We need to evaluate the effect of the normalization and attribute selection methods on different supervised learning algorithms. This gives us a notion of the feasibility of the classification and channel identification problem. By looking at the validation results of the classifiers, we can argue how well each classifier identifies each channel based on its computed features. Due to the density and geometry of the electrodes on the probe shank, we expect some similar activities on adjacent electrodes. In the following, we describe four different classifiers, i.e., K-Nearest Neighbour (K-NN), Iterative Dichotomiser 3 (ID3), Random Forest, and Support Vector Machine (SVM), and their parameter settings. By comparing their results on our dataset, we can select the most appropriate classifier for our goal. In order to find out how well each classifier generalizes, we use the cross-validation technique. Furthermore, some supervised learning algorithms require precise parameter selection; therefore, we use hyper-parameter optimization methods to increase their prediction accuracy.

3.4.1 K-Nearest Neighbour (K-NN)

One of the learning algorithms we selected is K-Nearest Neighbour (K-NN) [27]. The idea is to classify an object based on the majority vote of its K nearest neighbors, with the object being assigned to the class most common among them. Each object is represented by its feature vector, and the algorithm uses a similarity measure, e.g., the Manhattan or Euclidean distance, to find the nearest neighbors. There are settings that improve the accuracy of the classification, e.g., weighting neighbors by their relative distance and the choice of K.
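A minimal sketch of the K-NN decision rule with the Euclidean distance (brute-force search; the distance-weighted variant and tree-based acceleration are omitted):

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    dist = np.linalg.norm(train_x - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dist)[:k]                  # indices of k nearest
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

In our setting, `train_x` would hold the normalized feature vectors of the recorded segments and `train_y` the channel labels.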
The K-NN algorithm is recommended because it is easy to understand, simple to train, and gives an insight into the feasibility of our classification task. However, the algorithm is readily fooled by noisy and irrelevant data, is biased by the choice of K, and is computationally intensive for large datasets. Using appropriate nearest-neighbor search structures, e.g., a KD-tree, the K-NN algorithm becomes computationally tractable.

3.4.2 Iterative Dichotomiser 3 (ID3)

The second algorithm we used for classification is Iterative Dichotomiser 3 (ID3) [28]. The idea is to split the dataset into subsets based on a selected attribute, add a non-terminal node to the decision tree, and continue this process recursively on each subset. Terminal nodes represent the class label of their branch. For selecting attributes, we choose the one with the largest information gain, or equivalently the smallest entropy, among the not-yet-selected attributes. The four main steps of the ID3 algorithm are:

• Calculate the entropy of every attribute using the dataset S.
• Split the set S into subsets using the attribute for which the entropy is minimal (or, equivalently, the information gain is maximal).
• Make a decision tree node containing that attribute.
• Repeat the previous steps recursively on each subset using the remaining attributes.

We employ the ID3 algorithm because it treats each feature separately based on a probabilistic approach, builds the decision tree quickly, and uses the whole dataset to create the tree. Furthermore, its results are invariant to whether the data is natural or normalized. However, the ID3 algorithm may face the over-fitting problem and can be biased in favor of attributes with high information gain.

3.4.3 Random Forest Algorithm

The third supervised learning algorithm we used is Random Forest [29].
The algorithm creates a forest of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees. Given a sample set, the algorithm grows each tree as follows:
• If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
• If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest grows.
• Each tree is grown to the largest extent possible; there is no pruning.

The Random Forest algorithm follows almost the same core principle as the ID3 algorithm but usually shows better performance, and its results are virtually invariant to whether the dataset is normalized or natural.

3.4.4 Support Vector Machine (SVM) Algorithm

Our fourth candidate algorithm is the Support Vector Machine [30], one of the more sophisticated supervised learning algorithms. The idea of the Support Vector Machine is to separate the sample data in d-dimensional space using (d−1)-dimensional hyperplanes. There is an inverse relation between the margin, i.e., the distance of the hyperplane to the closest sample points, and the generalization error: the larger the margin, the smaller the generalization error. Based on that, we are dealing with an optimization problem. Since classification and regression problems mostly involve non-linearly separable data, the SVM uses a kernel function to transform the data samples into a feature space of the same or higher dimension in which they are linearly separable. Three common non-linear kernels are used for mapping the samples to higher dimensions:

• Polynomial (homogeneous):

k(x_i, x_j) = (x_i · x_j)^d. (3.5)

Here x_i, x_j are samples represented by their feature vectors and d is the polynomial degree.

• Gaussian radial basis function:

k(x_i, x_j) = exp(−γ ||x_i − x_j||²), for γ > 0. (3.6)

Here x_i, x_j are samples represented by their feature vectors and γ is the kernel coefficient.

• Hyperbolic tangent:

k(x_i, x_j) = tanh(κ x_i · x_j + c), for some (not every) κ > 0 and c < 0. (3.7)

Here x_i, x_j are samples represented by their feature vectors and κ is the kernel coefficient.

Although the Support Vector Machine is a sophisticated supervised learning algorithm, it needs a careful selection of the model, i.e., kernel type and parameter specification, in order to obtain highly accurate results. Empirical model selection is a tedious and interminable task; therefore, in the following, we explain some common methods to deal with this problem.

3.4.5 Cross-Validation

Cross-validation [31] is a technique to measure how well a predictive model generalizes, independent of the data that were used to train it. In machine learning, cross-validation measures how well the trained model will perform in practice. Each model has one or more unknown parameters, and when the number of samples is small or the number of parameters is large, the model faces the over-fitting problem. Cross-validation deals with this problem by dividing the sample data into K equal subsets (K-fold cross-validation), using K − 1 subsets to train the model and one subset for validation. The procedure is repeated K times until each individual subset has been used as the validation set. At the end, the K results from the folds are averaged (or otherwise combined) to produce a single estimate. Common values for K are 3, 5, and 10, depending on the size of the training data. In our task, we used K = 5 for all four algorithms.
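The K-fold procedure described above can be sketched as follows. The `fit_predict` callable is a placeholder for any of the four classifiers (in our experiments, the Weka implementations play this role):

```python
import numpy as np

def k_fold_split(n_samples, k=5, seed=0):
    """Yield (train, validation) index arrays; each of the k folds is used
    exactly once as the validation set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validate(x, y, fit_predict, k=5):
    """Average validation accuracy over the k folds."""
    accs = []
    for train, val in k_fold_split(len(x), k):
        pred = fit_predict(x[train], y[train], x[val])
        accs.append(np.mean(pred == y[val]))
    return float(np.mean(accs))
```

With k = 5, each sample is validated exactly once and trained on four times, which matches the setting used for all four algorithms.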
3.4.6 Hyper-Parameter Optimization

Hyper-parameter optimization is the problem of choosing a set of parameters for a learning algorithm such that it generalizes well. The idea is to adjust the different model parameters in order to minimize the loss function on the training data. There are different approaches to hyper-parameter optimization, e.g., global parameter optimization using Gaussian Processes [32] or a simple grid search [33]. In this work, we used grid search to tune the parameters of a particular model in order to increase its accuracy. The idea is to set a range and a step size for each parameter, then to iterate over all possible parameter combinations, train a model with each of them, and find the combination that minimizes the loss function, or in other words, gives the highest accuracy. Among the previously introduced supervised learning algorithms, the Support Vector Machine is the one that demands hyper-parameter optimization, because of the complexity of selecting its parameters and the wide range of choices, which need a mechanism to tune them. For the Support Vector Machine, mainly three parameters need to be tuned: the kernel function, the constant C (regularization parameter), and the γ factor (kernel multiplier).

3.5 Tracking Down Neural Activities

After finding the best combination of normalization techniques, attribute selection methods, and classification algorithms, we try to track neural activities between different recording sessions. We now have a notion of how well we can identify each channel given its feature vector computed from its measured signals. Therefore, we want to know how likely it is to identify the same activity between different recording sessions on the same channel or on its adjacent channels.
The idea is to identify each specific channel in a particular recording session using the data pre-processing and machine learning approach described earlier in this chapter. Then we compute sets of features for another recording session with the same channel configuration as the one the model was built with, and test the trained model on them, i.e., we predict the class labels for the newly measured signals. By looking at the differences between the predicted and actual class labels in the test results, we can argue how well the current recording of a particular channel is predictable based on earlier measurements of the same channel. To make this argument, we provide a precision-recall analysis and observe the classification errors and confusion matrices of the test phase. The confusion matrix shows whether the activity of a particular channel now most likely appears on the same channel or elsewhere. Due to the density of the electrodes and their geometrical position on the probe shank, a subtle unintended movement of the NeuroProbe position would cause the activity of a particular channel in the test phase to be classified to an adjacent channel relative to the drift direction (during recording, each channel is connected to a specific electrode on the probe shank).
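The tracking procedure of this section can be summarized in code: train on the first session, predict the channel labels of the second session, and inspect the confusion matrix. This is a sketch with a hypothetical `fit_predict` classifier; the real experiments use the trained Weka models on the extracted feature vectors.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """cm[a, p] counts second-session samples of channel a predicted as
    channel p. Off-diagonal mass on adjacent channels hints at a probe or
    activity shift between the sessions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

def track_channels(train_x, train_y, test_x, test_y, fit_predict, n_classes):
    """Train on the first session, test on the second; return the confusion
    matrix and the overall prediction accuracy."""
    pred = fit_predict(train_x, train_y, test_x)
    cm = confusion_matrix(test_y, pred, n_classes)
    return cm, np.trace(cm) / cm.sum()
```

A diagonal-dominant confusion matrix indicates stable recordings; systematic off-diagonal entries on neighboring channels indicate drift.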
CHAPTER 4 EXPERIMENTS AND RESULTS

4.1 Dataset Of Extracellular Recording

For our experiments, we used a dataset from in-vivo recordings performed in April 2009 and February 2010 at the Institute for Psychology of the Hungarian Academy of Sciences in Budapest (Hungary). Data acquisition was done using the NeuroSelect software [34]. The operation of three micro-probes was verified by acute implantation in the neocortices of Wistar rats. One probe was implanted in the primary motor cortex (2 mm in the lateral direction, aiming at M1/M2), and two probes were implanted in the S1 trunk region (see Figure 4.1). The data was pre-amplified (gain g = 10, bandpass filtered between DC and 100 kHz) and amplified (gain g = 100, bandpass filtered between 0.5 kHz and 5 kHz), with a total gain of 1000. Signals were digitized at 16-bit resolution and a 20 kHz sampling rate per channel.

Figure 4.1: Cross section of the area of one implantation (based on [16]). The probe was inserted 2 mm in the lateral direction, aiming for the M1/M2 region indicated by the black line.

Before trying to track the neural activities between different recording sessions, we need to evaluate the effect of the normalization and attribute selection methods on the overall performance of the classification algorithms. We also need to know which combination of the introduced pre-processing methods and classification algorithms gives the highest prediction accuracy. Hence, we selected a relatively large dataset containing all recording sessions of day 15.02.2010. This dataset has the property that there was no intentional movement of the probe position, and it covers most of the electrodes available on the probe shank. The dataset contains 152 electrodes,
which are also considered as class labels, with approximately 15 samples per class, 2199 samples altogether. It should be mentioned that some of these recording sessions do not contain enough qualified measurements; some class labels have fewer samples, around 12 per class, because of poor signals, channel disconnections, and outliers. All these outliers are ignored in the spike detection and feature extraction steps. For tracking the neural activity between different recording sessions, we need consecutive recordings with the same electrode configuration, so that we can train the candidate classification algorithms with a sufficient number of samples and choose the one with the highest accuracy. Among the available recordings, we chose a dataset, i.e., a pair of consecutive recording sessions with the same electrode configuration, from day 12.02.2010 for channel identification and for tracking the activity of each channel between recording sessions. These particular sessions were chosen because they have the same electrode configuration and there was no deliberate movement of the probe position during or between the recording sessions. Furthermore, the dataset contains high-quality measured signals, which indicates the presence of neural activity. Here, we used one session of our data for training the algorithms and the other session for testing them. Each session contains measured signals for eight channels connected to electrodes 43, 44, 45, 46, 141, 142, 143, and 144. For each channel, we have 15 samples, and each sample is computed from 10 s of recorded signals and represented by its feature vector, 120 samples per session altogether. All recordings are available in EDF (European Data Format), and the library EDFLib [35] is available for manipulating them.
In each of the chosen recording sessions of both datasets, eight electrodes, a pair of tetrodes, were selected and assigned to channels. Our lightweight framework detects spikes and computes the features discussed in the previous chapter in order to use them in the classification algorithms.

4.2 Detected Spikes and Extracted Features

The first step in our approach was to detect spikes and compute features from the measured signals. Figure 4.2 shows the detected spikes for 10 s of a recording using a threshold factor of 3.5 and a spike refractory time of 1 ms. Compared to Figure 4.2, Figure 4.3 shows the detected spikes of the same recording segment with a threshold factor of 5 and a 2 ms spike refractory time window. As depicted, fewer spikes are detected with the higher threshold factor, which leads to different values for the average firing rates.

Figure 4.2: Detected spikes from 10 s of one channel's activity using a 1 ms spike refractory time window and a threshold factor of 3.5. Here, the raw signal refers to the filtered signal that was used as input for the spike detection algorithm.

In order to have more information about the activity of each channel, we compute all possible features using the different spike refractory times and threshold factors.
Figure 4.3: Detected spikes from 10 s of one channel's activity using a 2 ms spike refractory time window and a threshold factor of 5. Here, the raw signal refers to the filtered signal that was used as input for the spike detection algorithm.

Figures 4.4, 4.5, and 4.6 show the histogram distributions of the signal-to-noise ratio (SNR), maximum (Max), and standard deviation (STD) values for four different channels in the same recording session. Here we can see how these extracted features overlap in their distributions, which makes classification and channel identification a difficult task using these features. For instance, the SNR values of all four channels, i.e., channels 3, 4, 6, and 8, lie mostly in the range of 8 to 9. Furthermore, it is clear that the value ranges of these features differ, which also has a negative effect on the result of the classification task; therefore, we need to normalize the feature values.

Figure 4.4: Histogram distribution of the signal-to-noise ratio (SNR) value for detected spikes with threshold factor 5 and 1 ms refractory time for channels 8, 6, 3, and 4, connected to electrodes 141, 142, 143, and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.5: Histogram distribution of the standard deviation (STD) value for the measured signals of channels 8, 6, 3, and 4, connected to electrodes 141, 142, 143, and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.

Figure 4.6: Histogram distribution of the maximum (Max) value for the measured signals of channels 8, 6, 3, and 4, connected to electrodes 141, 142, 143, and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.

The histogram distribution of the maximum value in Figure 4.6 shows that this feature is irrelevant and would not contribute to the classification task. We therefore employ attribute selection methods to remove such redundant and irrelevant features and perform the classification with a smaller subset of features.
4.3 Data Normalization and Attribute Selection

In this section, we applied two attribute selection methods, i.e., Correlation Feature Selection (CFS) and Principal Component Analysis (PCA), and the Min-Max and Zero-Mean normalization methods as preprocessing steps on the large dataset from day 15.02.2010 with 152 class labels. To perform the global Min-Max normalization and scaling, we first computed the global minimum and maximum of each particular feature among all samples of all classes, then subtracted the global minimum from each feature value, divided it by the difference of the global maximum and minimum, and scaled each feature to [−1, 1]. To perform the global Zero-Mean normalization, we first computed the global mean and standard deviation of each particular feature among all samples of all classes, then subtracted the global mean from each feature value and divided it by the standard deviation. Figures 4.7 and 4.8 show the histogram distributions of the signal-to-noise ratio (SNR) and maximum firing rate (MFR) values for two different channels in the same recording session. By comparing the feature values of the natural and the normalized data, we can see how normalization produces completely different scales and new values for each feature.

Figure 4.7: Histogram distribution of the SNR for the natural, Min-Max normalized, and Zero-Mean normalized values of detected spikes with threshold factor 5 and 2 ms refractory time for channels 3 and 4, connected to electrodes 143 and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.8: Histogram distribution of the maximum firing rate (MFR) with a time window of 10 s for the natural, Min-Max normalized, and Zero-Mean normalized values of detected spikes with threshold factor 5 and 2 ms refractory time for channels 3 and 4, connected to electrodes 143 and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.

To find the best subset of features from the 43-dimensional feature vectors, we then applied the Correlation Feature Selection (CFS) and Principal Component Analysis (PCA) dimension reduction methods. It should be mentioned that attribute selection is done before the classification step and is independent of the supervised learning algorithms used for classification. Furthermore, as we can see in Table 4.1, the feature subsets selected by the CFS and PCA methods depend on the type and distribution of the provided input data. Most of the features selected by both algorithms come from the measured signal itself rather than from the detected spikes, i.e., minimum (Min), median, standard deviation (STD), root mean square of the recorded signal (RMS Signal), and average noise level (ANL).
Table 4.1: List of attributes selected by the Correlation Feature Selection (CFS) and PCA methods. The dimension reduction methods were applied to the Min-Max and Zero-Mean normalized dataset of day 15 of the in-vivo recording. Complete descriptions of the attributes are in appendix A.

CFS on Min-Max and Zero-Mean normalized data | PCA on Min-Max and Zero-Mean normalized data
Min | Min
Median | Mean
STD | STD
RMS Signal | RMS Signal
ANL | SNR tf 3.5 sr 2ms
SNR tf 3.5 sr 1ms | SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms | MFR 500ms tf 3.5 sr 1ms

We applied the Correlation Feature Selection (CFS) method using a best-first search heuristic in the forward direction. Table 4.1 shows that the feature subsets selected for the Min-Max and Zero-Mean normalized datasets are the same. Results for the Principal Component Analysis (PCA) method on both normalized datasets are the same as well. We used the PCA method with a ranked search strategy and a threshold factor equal to -1.797. In figures 4.9 and 4.10, we see how these selected attributes are distributed. Each method gives us seven selected attributes, which means a dimension reduction from forty-three to seven. The first four attributes selected by both methods are similar, i.e. the minimum value, the median (CFS) or mean (PCA) value, the standard deviation (STD), and the root mean square (RMS) of the 10 s measured signal, but the remaining three attributes are different. The CFS method selected the average noise level (ANL) of the 10 s measured signal, the signal-to-noise ratio (SNR) with threshold factor 3.5 and 1 ms spike refractory time, and the average firing rate (AFR) for a time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time.
The Principal Component Analysis (PCA) method selected the signal-to-noise ratio (SNR) with threshold factor 3.5 and 2 ms spike refractory time, the SNR with threshold factor 5 and 2 ms spike refractory time, and the maximum firing rate (MFR) for a time window of 500 ms with threshold factor 3.5 and 1 ms spike refractory time.

Figure 4.9: The scatter diagram for the attributes selected by the CFS method. The X-axis is the sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes are available in appendix A.
Figure 4.10: The scatter diagram for the attributes selected by the PCA method. The X-axis is the sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes are available in appendix A.

From figures 4.9 and 4.10, it is visible that the attributes selected by the Correlation Feature Selection (CFS) method are more suitable for the classification problem than those selected by the Principal Component Analysis (PCA) method. For instance, the value of the attribute AFR 20ms tf 3.5 sr 1ms (the average firing rate for a time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by the CFS method, has a more separable distribution relative to the values of the other selected attributes, whereas the attribute MFR 500ms tf 3.5 sr 1ms (the maximum firing rate for a time window of 500 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by PCA, shows much less separation. The dimension reduction methods aim to remove redundant and unrelated attributes from the given feature vectors. In figure 4.11, we can see the scatter diagram for some of the attributes that were ignored by both the PCA and CFS attribute selection methods. The graph in figure 4.11 indicates that these features have almost the same value distributions; therefore, they do not contribute much to the classification problem. This phenomenon may occur due to high background noise and artifacts in the measured signals, or the absence of neural activity near a particular electrode connected to a specific channel.
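The thesis applied WEKA-style CFS (best-first, forward search) and PCA (ranked search). Purely as an illustration, here is a NumPy sketch of a PCA projection and of Hall's CFS merit score that the best-first search maximizes; the function names are ours, and the random matrix stands in for the real 43-dimensional feature vectors:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project mean-centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cfs_merit(mean_feat_class_corr, mean_feat_feat_corr, k):
    """CFS merit of a k-feature subset (Hall's formula): subsets whose features
    correlate strongly with the class but weakly with each other score highest."""
    return (k * mean_feat_class_corr) / np.sqrt(k + k * (k - 1) * mean_feat_feat_corr)

rng = np.random.default_rng(0)
X = rng.normal(size=(2199, 43))  # stand-in for the 43-dimensional feature vectors
X7 = pca_reduce(X, 7)            # dimension reduction from 43 to 7, as in the text
```

The best-first CFS search greedily grows the subset, at each step adding the feature that most increases this merit, which is why redundant attributes with near-identical distributions (figure 4.11) are left out.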
Figure 4.11: The scatter diagram for the attributes not selected by either the PCA or the CFS method. The X-axis is the sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes are available in appendix A.

4.4 Training And Validation Of The Classifiers In Conjunction With Pre-Processing Methods

Data normalization is a highly recommended pre-processing step for algorithms like the Support Vector Machine (SVM) and K-Nearest Neighbour (K-NN). Therefore, in this section, we demonstrate the effect of the dimension reduction methods on the performance of the classification algorithms. We applied our candidate learning algorithms on both the Zero-Mean and Min-Max normalized datasets. The validation results show how well we can identify each channel given its set of features. As we mentioned earlier, some classification algorithms, i.e. the Support Vector Machine (SVM), need parameter tuning in order to obtain higher accuracy; we used grid search as the hyper-parameter tuning method to deal with this issue. It is apparent from tables 4.2 and 4.3 that the results obtained with Correlation Feature Selection (CFS) attribute selection generalize better, although there were noisy measurements among some of the recording sessions. In training these algorithms we used 10-fold cross-validation in order to avoid over-fitting and to get a notion of which of these algorithms could best serve the goal of our project.

Table 4.2: Training results for the classification algorithms on Min-Max normalized data with Correlation Feature Selection (CFS), Principal Component Analysis (PCA), and all features. The dataset is provided from all recording sessions of day 15.02.2010.
The γ parameter for the Support Vector Machine (SVM) has two different values: 100 for the low-dimensional datasets and 10 for the dataset with all features. The dataset contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.

Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.93% | 39.10% | 63.48% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 65.71% | 44.29% | 34.10% | C = 250007 and γ = 100 and 10
Random Forest | 68.25% | 44.29% | 63.98% | number of trees = 10
Table 4.3: Training results for the classification algorithms on Zero-Mean normalized data with Correlation Feature Selection (CFS), Principal Component Analysis (PCA), and all features. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.

Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.75% | 39.10% | 63.61% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 67.75% | 47.88% | 37.56% | C = 250007 and γ = 1.0
Random Forest | 68.44% | 43.97% | 63.57% | number of trees = 10

The two graphs in figures 4.12 and 4.13 show the confusion matrices for the best and worst results from table 4.3. Comparing these two figures gives us a notion of how well each algorithm predicts the class labels and also of the class labels on which we had the most false predictions. Figure 4.12 illustrates the confusion matrix for the Random Forest algorithm applied to the Zero-Mean normalized dataset in conjunction with the Correlation Feature Selection (CFS) method; with an accuracy of 68.44%, it is the best result in table 4.3. On the other hand, the Support Vector Machine (SVM) algorithm applied to the Zero-Mean normalized dataset using all features has the worst accuracy, i.e. 37.56%. Therefore, the confusion matrix in figure 4.12 has a more visible diagonal with high values, which means class labels are predicted as their original classes, whereas in figure 4.13 we can see the classes that were classified wrongly, hence the brighter regions on both sides of the matrix diagonal.
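The grid search over SVM hyper-parameters with 10-fold cross-validation can be sketched with scikit-learn; this is an illustrative stand-in (synthetic data, a small grid including the C and γ values reported in the tables), not the exact setup actually used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# synthetic stand-in for the normalized, CFS-reduced feature vectors
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=4, random_state=0)

# grid over the regularization constant C and RBF kernel width gamma
grid = {"C": [1, 100, 250007], "gamma": [0.01, 1.0, 10.0, 100.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid,
                      cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
search.fit(X, y)

best_params = search.best_params_   # parameters of the best-scoring combination
best_score = search.best_score_     # mean 10-fold cross-validation accuracy
```

Each (C, γ) pair is scored by its mean accuracy over the 10 stratified folds, which mirrors the over-fitting safeguard described above.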
Figure 4.12: Confusion matrix for the Random Forest algorithm applied to the Zero-Mean normalized data in conjunction with the Correlation Feature Selection (CFS) method. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained using 10-fold cross-validation.
Figure 4.13: Confusion matrix for the Support Vector Machine (SVM) algorithm applied to the Zero-Mean normalized data using all features. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained using 10-fold cross-validation.

From the results presented in tables 4.2 and 4.3, we can conclude that all four supervised learning algorithms in conjunction with the Correlation Feature Selection (CFS) method perform at virtually the same level on both the Zero-Mean and Min-Max normalized data. Therefore, in the following section, the combination of the CFS method and the four classification algorithms is tried on both Min-Max and Zero-Mean normalized data in order to find out how well we can track neural activities between different recording sessions.

4.5 Tracking Down Neural Activities Using Supervised Learning Algorithms

In this section, we trained and tested the previously selected classification algorithms using the specific dataset introduced earlier in this chapter, i.e. two consecutive recording sessions from day 12.02.2010 of the in-vivo experiment. Each recording contains eight recording channels connected to eight electrodes; hence, we have eight class labels and fifteen samples per class. The aim of the current step was first to train the learning algorithms in conjunction with the Correlation Feature Selection (CFS) method using the dataset from the first recording session, and then to test the trained algorithms with the dataset from the second recording session. The prediction accuracy in the test phase gives us a notion of how well neural activities are traceable between two recording sessions.
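The train-on-session-one, test-on-session-two protocol can be illustrated with the K-NN variant used in this work (k = 3 with inverse-distance weighting). The data here is synthetic, with eight well-separated "channels" and fifteen samples per class per session, mimicking only the shape of the real dataset:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """k-NN with inverse-distance weighting: each of the k nearest training
    samples votes for its label with weight 1/distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + 1e-12)            # inverse-distance weights
        votes = {}
        for lbl, wt in zip(y_train[idx], w):
            votes[lbl] = votes.get(lbl, 0.0) + wt
        preds.append(max(votes, key=votes.get))
    return np.array(preds)

# toy stand-in: 8 channels (classes), 15 samples each, in two "sessions"
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(8, 7))   # one cluster center per channel
session1 = np.vstack([c + rng.normal(size=(15, 7)) for c in centers])
session2 = np.vstack([c + rng.normal(size=(15, 7)) for c in centers])
labels = np.repeat(np.arange(8), 15)

pred = knn_predict(session1, labels, session2)  # train on session 1, test on 2
accuracy = (pred == labels).mean()
```

High test accuracy here means a channel's feature distribution in the second session still resembles its own first-session distribution more than that of any other channel, which is exactly the traceability criterion used below.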
The difference between this experiment and the earlier one in this chapter is that in the previous experiment all the computed features belonged to the same recording session, whereas in the current experiment the datasets belong to two different consecutive recording sessions. In the pre-processing step, we first applied Min-Max and Zero-Mean normalization based on the same principle mentioned earlier in this chapter, and then applied the Correlation Feature Selection (CFS) method to reduce the dimension of the feature vectors. Table 4.4 presents the attributes selected by the CFS method. Compared to the attribute subset in table 4.1, there are four more attributes in the new subset. Furthermore, the types of the selected attributes are different: in the subset for our smaller dataset, we have more attributes related to detected spikes and their firing
rates. These phenomena support the fact that the measured signals are less noisy and that more neural activity is present in these two recording sessions.

Table 4.4: List of attributes selected by the Correlation Feature Selection (CFS) method. The dimension reduction method was applied to the Min-Max and Zero-Mean normalized datasets of day 12 of the in-vivo recording. Complete descriptions of the attributes are in appendix A.

CFS on Min-Max and Zero-Mean normalized data
Median
STD
RMS Signal
SNR tf 3.5 sr 1ms
SNR tf 3.5 sr 2ms
SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms
AFR 20ms tf 3.5 sr 2ms
AFR 20ms tf 5 sr 2ms
MFR 500ms tf 3.5 sr 1ms

Figures 4.14 and 4.15 depict the data distribution of the attributes selected by the Correlation Feature Selection (CFS) method from both recording sessions for the Zero-Mean normalized data. The attributes were selected based on the result of the CFS method applied to the first recording session. Comparing figure 4.14 to 4.15, we see that they have more or less the same distribution. This similar feature distribution indicates that the same sort of activity was present during both recording sessions.

Figure 4.14: The scatter diagram for the attributes selected by the CFS method from the first session of day 12 of the in-vivo recording. The X-axis is the sample index, 120 samples altogether. The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.
Figure 4.15: The scatter diagram for the attributes selected by the CFS method from the second session of day 12 of the in-vivo recording. The X-axis is the sample index, 120 samples altogether. The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.

4.5.1 Prediction Results For Trained Models

The prediction results for the four supervised learning algorithms are presented in table 4.5. The obtained results show that the Support Vector Machine and K-Nearest Neighbour algorithms perform better than the two other algorithms on both normalized datasets. Also, we can see that after normalization and attribute selection, the simple K-Nearest Neighbour algorithm reaches a prediction accuracy as good as sophisticated algorithms like the Support Vector Machine.

Table 4.5: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the Correlation Feature Selection (CFS) method. The dataset is provided from the second recording session of day 12.02.2010. It contains 8 class labels, here electrode numbers on the probe, and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the subset of attributes presented in table 4.4.

Algorithm | Accuracy on Zero-Mean Dataset | Accuracy on Min-Max Dataset | Parameter Specification
ID3 | 73.33% | 73.33% |
K-NN | 89.16% | 90% | K = 3 and inverse distance weighting
SVM | 90% | 90% | C = 250007 and γ = 0.01
Random Forest | 83.33% | 81.66% | number of trees = 10

We performed the same experiment as above on the same dataset but with the complete set of features. Table 4.6 presents the prediction results for our selected classifiers on the Min-Max and Zero-Mean normalized data. Comparing the results in tables 4.5 and 4.6 is quite revealing in several ways. First, it shows that the tree-based algorithms are almost invariant to data normalization.
Second, the performance of all classifiers except Iterative Dichotomiser 3 (ID3) was boosted by the Correlation Feature Selection (CFS) method. Since ID3 uses a pruning mechanism to remove unrelated attributes at tree-construction time, its results remain invariant to the CFS method. However, CFS feature selection helps the other classifiers deal
with redundant and unrelated attributes better than the decision tree does. As is apparent, a simple algorithm like K-Nearest Neighbour outperforms Iterative Dichotomiser 3 (ID3) and the Random Forest algorithm in this case and reaches the same performance as the Support Vector Machine.

Table 4.6: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the complete set of features. The dataset is provided from the second recording session of day 12.02.2010. It contains 8 class labels, here electrode numbers on the probe, and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the complete set of attributes presented in appendix A.

Algorithm | Accuracy on Zero-Mean Dataset | Accuracy on Min-Max Dataset | Parameter Specification
ID3 | 73.33% | 73.33% |
K-NN | 81.66% | 85% | K = 3 and inverse distance weighting
SVM | 86.66% | 83% | C = 250007 and γ = 0.001
Random Forest | 77.33% | 74.16% | number of trees = 10

The two graphs in figures 4.16 and 4.17 show the precision-recall analysis for these four learning algorithms on both the Min-Max and Zero-Mean normalized datasets. Both illustrations show that the Support Vector Machine and K-Nearest Neighbour outperform the two other algorithms and have basically higher precision values. Precision is the fraction of the predicted instances that actually belong to the original class, and recall is the fraction of the relevant instances that are predicted. Higher precision and recall together mean a better and more accurate prediction has been achieved.

Figure 4.16: The precision-recall analysis computed from the Min-Max normalized data. Each sample shows the precision-recall ratio for an individual class label. In the graph for each algorithm, we expect to see eight samples, one sample per class. Since some of the values are the same, they overlay each other and are not fully visible in the graph.
Here we can see that the Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection on the dataset.
Figure 4.17: The precision-recall analysis computed from the Zero-Mean normalized data. Each sample shows the precision-recall ratio for an individual class label. In the graph for each algorithm, we expect to see eight samples, one sample per class. Since some of the values are the same, they overlay each other and are not fully visible in the graph. Here we can see that the Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection on the dataset.

4.5.2 Observation On Confusion Matrices

Figures 4.18 and 4.19 depict the confusion matrices for the results of the classification algorithms. The main diagonal of the matrices indicates the accuracy of the classifiers and also the quality of the channel identification: the higher the values on the matrix diagonal, the more accurate the prediction of the classifier. Our experiment was designed so that we first built the learning models using features computed from the first recording session and then tested them on features computed from the second recording session. The channel configuration, i.e. the electrodes connected to the channels, remained the same in both recording sessions. Since there was no deliberate movement of the probe position, we expected the activity of each channel to be predicted as its original channel. In other words, here we tried to define a ground truth for each channel based on its measured signal and later use this information for identifying the activities measured in other recording sessions.
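The confusion matrix and the per-class precision and recall used in this analysis can be computed as follows; a minimal NumPy sketch with our own function names and a toy three-class example:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall(cm):
    """Per-class precision (diagonal over column sums) and recall (diagonal
    over row sums); np.maximum guards against division by zero."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    return precision, recall

# toy example: 3 classes, 2 samples each
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)
p, r = precision_recall(cm)
```

In this toy case the diagonal is [1, 2, 1]: class 1 is fully recovered while classes 0 and 2 each lose one sample to an off-diagonal cell, exactly the kind of pattern read off from figures 4.18 and 4.19.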
(a) Random Forest (b) Support Vector Machine (c) ID3 (d) 3-NN

Figure 4.18: The confusion matrices for a) Random Forest, b) Support Vector Machine, c) ID3, and d) K-Nearest Neighbour algorithms on the Min-Max normalized dataset. The X-axis shows the original class labels, here the electrode indexes on the probe shank connected to their specific channels. The Y-axis has the same values as the X-axis. The value of each cell indicates the number of instances of the class label on the X-axis predicted as the label on the Y-axis. Therefore, high values on the diagonal show that each class is predicted as its original class label.

(a) Random Forest (b) Support Vector Machine (c) ID3 (d) 3-NN

Figure 4.19: The confusion matrices for a) Random Forest, b) Support Vector Machine, c) ID3, and d) K-Nearest Neighbour algorithms on the Zero-Mean normalized dataset. The X-axis shows the original class labels, here the electrode indexes on the probe shank connected to their specific channels. The Y-axis has the same values as the X-axis. The value of each cell indicates the number of instances of the class label on the X-axis predicted as the label on the Y-axis. Therefore, high values on the diagonal show that each class is predicted as its original class label.

Looking at the confusion matrices in figures 4.18 and 4.19, there are channels that could
be identified quite accurately by all classifiers, i.e. channels 5 and 1, connected to electrodes 45 and 46. On the other hand, there are electrodes that most of the algorithms classify as their adjacent channels, i.e. channels 3 and 4, connected to electrodes 144 and 143. Figure 4.20 shows the distribution of the SNR values for the mentioned electrodes from the test phase, with 2 ms refractory time and a threshold factor of 5. As depicted, channels 1 and 5 have relatively higher SNR values than channels 3 and 4. This indicates that the quality of the signals measured by the first two channels is higher and that they contain a lower noise level. Therefore, their computed features are more separable and can contribute better to the channel identification task.

Figure 4.20: The SNR values for 2 ms refractory time with a threshold factor of 5 for channels 5, 1, 3, and 4, connected to electrodes 45, 46, 143, and 144. The data was normalized using the Zero-Mean normalization method. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample.

It is obvious that in confusion matrices, the higher the values on the diagonal, the more accurate the classification result. Therefore, in the test session, high values on the diagonal mean that the activities measured by the channels in the test dataset have the same feature distribution as the signals measured in the training dataset. Although some of the channels, due to the lower quality of their measurements, show similarities mostly to their adjacent channels, there are channels in the test session, i.e. those connected to electrodes 43, 44, 45, 46, and 141, that show high prediction accuracy with respect to their original class labels from the training session. This observation supports two ideas we mentioned earlier in the problem statement section.
First, by identifying each channel using its measured signal from a former recording session, we can define a ground truth for each channel and choose the next electrode configuration. It means that if we lose signal quality in a particular recording session, we can select new channel configurations from those that have shown better measurements by consulting this ground truth. Second, we can support the argument that there was no drift in the probe position between the two consecutive recording sessions, because if there had been a drift in probe position between the two recording sessions, we would expect the channels in the test session to be classified as their adjacent channels from the training session relative to the shift direction. That is, if there had been an upward drift, they would be classified as the channels located below them on the probe shank, and if there had been a downward drift, it would be the other way around.
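The drift argument above can be turned into a simple numerical check: if the off-diagonal mass of the confusion matrix is systematically shifted to one side of the diagonal, the mean (predicted minus true) channel offset is non-zero. This heuristic and its function name are our own illustration, assuming class indices are ordered along the probe shank:

```python
import numpy as np

def estimated_shift(cm):
    """Average (predicted - true) channel offset, weighted by confusion counts.
    Assumes class indices are ordered along the probe shank: a value near 0
    suggests no systematic drift; a positive or negative value, a directed shift."""
    n = cm.shape[0]
    true_idx, pred_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return float((cm * (pred_idx - true_idx)).sum() / cm.sum())

# two synthetic 8-channel cases with 15 samples per class
cm_no_drift = np.eye(8, dtype=int) * 15        # perfect diagonal
cm_drift_up = np.eye(8, k=1, dtype=int) * 15   # every channel predicted one index over

shift_none = estimated_shift(cm_no_drift)  # 0.0: no systematic offset
shift_up = estimated_shift(cm_drift_up)    # 1.0: every class shifted by one channel
```

On the real confusion matrices of figures 4.18 and 4.19, a value near zero would corroborate the no-drift conclusion; this is only a rough summary statistic, since noisy channels also spill into adjacent cells without any physical drift.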
CHAPTER 5 SUMMARY

5.1 Conclusion

In this work, we dealt with the problem of channel identification in in-vivo recordings using the Neuro-Probe. Solving this task could help us to efficiently select electrodes from the high-density microarray and contribute to the Electronic Depth Control (EDC) problem. It could provide us with a ground truth for each electrode on the probe shank; using this identification, we can choose a channel configuration that has shown high-quality signals and more detectable activities. Furthermore, it would be possible to detect an unintended drift in the position of the EDC probe during long-term in-vivo recording and between different recording sessions. There were four different steps in this work. In the first step, given a dataset recorded by the EDC probe, we applied an adaptive-threshold spike detection algorithm and computed the features of each recording channel. We computed and used the average noise level (ANL) as an additional new feature relative to former approaches in this field; this feature provides more information when the quality of the measured signals is surpassed by high background noise activity. In the second step, for data pre-processing, we applied Min-Max and Zero-Mean global normalization in order to have the same scale and a better distribution for all the computed features. In addition, we applied Correlation Feature Selection (CFS) and Principal Component Analysis (PCA) to remove irrelevant and redundant features and to reduce the dimension of the feature vectors. The dimension reduction and normalization boosted the performance of the classifiers. The result of the attribute selection gave us a notion of the quality of the measured signals: if the measurements were dominated by noise, the selected features were those computed from the signals themselves and not from detected spikes; in contrast, in the presence of neural activity, attributes related to detected spikes were selected.
In the third step, we trained and validated various supervised machine learning algorithms, i.e. K-Nearest Neighbour, Iterative Dichotomiser 3, Support Vector Machine, and Random Forest, in conjunction with the pre-processing methods and techniques for identifying each channel. Furthermore, we applied grid search as a simple case of hyper-parameter optimization to increase the accuracy of the Support Vector Machine (SVM). We trained the candidate algorithms with features computed from all recording sessions of day 15.02.2010, containing the measured signals of 152 different electrodes. The classification results have shown the possibility of identifying each channel with up to 68% accuracy using the Random Forest algorithm combined with the Correlation Feature Selection (CFS) method. They have also shown that after normalization, other classifiers like the Support Vector Machine (SVM) and K-Nearest Neighbour (K-NN) can reach accuracies above 62%. This suggests that for channel identification it is possible to use a combination of normalization, the CFS method, and a simple classifier like K-NN. In the fourth step, for tracking down the neural activities between two consecutive recording sessions with the same channel configuration, we trained and tested the combination of the CFS method and both normalization techniques with the four supervised learning algorithms. We
were able to reach almost 90% accuracy using the Support Vector Machine (SVM) algorithm. Interestingly, the simple K-Nearest Neighbour (K-NN) algorithm performed at the same level as the SVM did. It needs to be mentioned that although the ID3 and Random Forest algorithms had high accuracy in the training phase, in the test phase they fell behind the two other algorithms. The observations on the confusion matrices, precision-recall analysis, and feature distributions showed that there was no drift in the probe position between the two sessions. In addition, they have shown that neural activity is traceable between various recording sessions using our approach.

5.2 Future Works

• To provide strong support for our approach regarding channel identification and detecting drift in the probe position during in-vivo recording, we need better datasets and recordings; then we could study the neural activity better. Having a dataset that contains long-term recordings with the same channel configuration would give us a chance to train our learning algorithms better and to observe the results of the test sessions to see whether detecting drift in the probe position is possible.

• Regarding identifying each channel in a particular recording session, we could try to identify not only one channel but also a group of channels. In this case, we could assume a tetrode (four channels) or two adjacent channels, train our learning algorithms with their extracted features, and try to identify their activity in the test session. To do this, we would need to find out which pairs of channels are recording from the same neuron simultaneously using signal similarity measurement algorithms.
APPENDIX A LIST OF FEATURES AND THEIR DESCRIPTIONS

A.1 All Features List

Feature Name | Description
Min | The minimum peak value of the measured signal.
Max | The maximum peak value of the measured signal.
Mean | The mean value of the measured signal.
Median | The median value of the measured signal.
STD | The standard deviation (STD) of the measured signal.
RMS | The root mean square (RMS) of the measured signal.
ANL | The average noise level (ANL) of the measured signal. For computing the noise level, a time window of 50 ms is used.
SNR tf 3.5 sr 2 ms | The signal-to-noise ratio (SNR) of the measured signal. The threshold factor (tf) is 3.5 and the spike refractory (sr) time window is 2 ms.
MFR 20 ms tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 20 ms tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
MFR 100 ms tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 100 ms tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
MFR 500 ms tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 500 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 500 ms tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 500 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
MFR 10 s tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 10 s, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 10 s tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 10 s, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
SNR tf 3.5 sr 1 ms | The signal-to-noise ratio (SNR) of the measured signal. The threshold factor (tf) is 3.5 and the spike refractory (sr) time window is 1 ms.
MFR 20 ms tf 3.5 sr 1 | The maximum firing rate (MFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
AFR 20 ms tf 3.5 sr 1 | The average firing rate (AFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
MFR 100 ms tf 3.5 sr 1 | The maximum firing rate (MFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
AFR 100 ms tf 3.5 sr 1 | The average firing rate (AFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
MFR 500 ms tf 3.5 sr 1 | The maximum firing rate (MFR) of the measured signal with a time window of 500 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
AFR 500 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time window is 2 ms. MFR 10 s tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time window is 1 ms. AFR 10 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time window is 1 ms. SNR tf 5 sr 2 ms The signal to noise ration (SNR) of the measured signal. The threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 20 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. AFR 20 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 100 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 100 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. 34
  • 40. APPENDIX A. FEATURE LIST A.1. FEATURE LIST AFR 100 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 100 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 500 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. AFR 500 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 10 s tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. AFR 10 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. SNR tf 5 sr 1 ms The signal to noise ration (SNR) of the measured signal. The threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. MFR 20 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 20 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. MFR 100 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 100 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 100 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 100 ms. 
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. MFR 500 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 500 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 10 s tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 10 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. Table A.1: List of all computed features and their descriptions. Note that all the attributes are computed from one segment of each recording session which contains 10 s of measured signals. 35
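Table A.1 defines the features only verbally. As an illustration, the helpers below sketch how such a feature vector could be computed from a raw trace. This is a hypothetical reconstruction, not the project's actual code: the exact detection rule, the noise estimator behind ANL, and the use of non-overlapping windows are all assumptions, and the array `signal` and sampling rate `fs` are placeholder names.

```python
import numpy as np

def basic_features(signal):
    """Amplitude statistics from Table A.1 (Min, Max, Mean, Median, STD, RMS)."""
    return {
        "Min": float(signal.min()),
        "Max": float(signal.max()),
        "Mean": float(signal.mean()),
        "Median": float(np.median(signal)),
        "STD": float(signal.std()),
        "RMS": float(np.sqrt(np.mean(signal ** 2))),
    }

def average_noise_level(signal, fs, window_ms=50.0):
    """ANL over 50 ms windows.

    Assumption: the per-window noise level is taken as the window's
    standard deviation; the report does not define ANL precisely.
    """
    win = int(window_ms * 1e-3 * fs)
    levels = [signal[s:s + win].std()
              for s in range(0, len(signal) - win + 1, win)]
    return float(np.mean(levels))

def detect_spikes(signal, fs, tf=3.5, refractory_ms=2.0):
    """Threshold-based spike detection with a refractory period.

    Assumption: a spike is a sample whose absolute amplitude exceeds
    tf times an RMS noise estimate; after each detection, all samples
    inside the spike refractory (sr) window are skipped.
    """
    threshold = tf * np.sqrt(np.mean(signal ** 2))
    refractory = max(1, int(refractory_ms * 1e-3 * fs))
    spikes, i = [], 0
    while i < len(signal):
        if abs(signal[i]) > threshold:
            spikes.append(i)
            i += refractory          # enforce the refractory period
        else:
            i += 1
    return np.asarray(spikes)

def firing_rates(spike_idx, n_samples, fs, window_ms):
    """MFR and AFR: maximum and mean spike rate (spikes/s) over
    non-overlapping windows of the given length."""
    win = int(window_ms * 1e-3 * fs)
    rates = []
    for start in range(0, n_samples - win + 1, win):
        count = np.sum((spike_idx >= start) & (spike_idx < start + win))
        rates.append(count / (window_ms * 1e-3))
    rates = np.asarray(rates)
    return float(rates.max()), float(rates.mean())
```

For a 10 s segment, calling these helpers for every (window, tf, sr) combination in Table A.1 would yield the full feature vector of one channel.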
BIBLIOGRAPHY

[1] Miguel A. L. Nicolelis. Methods for Neural Ensemble Recordings. CRC Press, Boca Raton, FL, 2008.
[2] Herc P. Neves, Tom Torfs, Refet F. Yazicioglu, Junaid Aslam, Arno A. Aarts, Patrick Merken, Patrick Ruther, and Chris Van Hoof. The NeuroProbes project: a concept for electronic depth control. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), page 1857, 2008.
[3] K. Seidl, H. Herwik, Y. Nurcahyo, T. Torfs, M. Keller, M. Schuettler, H. Neves, T. Stieglitz, O. Paul, and P. Ruther. CMOS-based high-density silicon micro-probe array for electronic depth control in neural recording. In Proc. 22nd Int. MEMS Conf., pages 232–235, 2009.
[4] J. Ji and K. D. Wise. An implantable CMOS circuit interface for multiplexed microelectrode recording arrays. IEEE Journal of Solid-State Circuits, 27(3):433–443, March 1992.
[5] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009.
[6] M. Abeles and M. Goldstein. Multispike train analysis. Proceedings of the IEEE, 65:762–773, 1977.
[7] I. Bankman, K. Johnson, and W. Schneider. Optimal detection, classification and superposition resolution in neural waveform recordings. IEEE Transactions on Biomedical Engineering, 40:836–841, 1993.
[8] M. Sahani. Latent variable models for neural data analysis. PhD thesis, California Institute of Technology, Pasadena, CA, 1999.
[9] K. H. Kim and S. J. Kim. Neural spike sorting under nearly 0-dB signal-to-noise ratio using nonlinear energy operator and artificial neural network classifier. IEEE Transactions on Biomedical Engineering, 47:1406–1411, 2000.
[10] S. Mukhopadhyay and G. C. Ray. A new interpretation of nonlinear energy operator and its efficacy in spike detection. IEEE Transactions on Biomedical Engineering, 45:180–187, 1998.
[11] L. Traver, C. Tarin, P. Marti, and N. Cardona. Adaptive-threshold neural spike detection by noise-envelope tracking. Electronics Letters, 43:1333–1335, 2007.
[12] R. J. Brychta, S. Tuntrakool, M. Appalsamy, et al. Wavelet methods for spike detection in mouse renal sympathetic nerve activity. IEEE Transactions on Biomedical Engineering, 54:82–93, 2007.
[13] S. Kim and K. Kim. A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Transactions on Biomedical Engineering, 50:999–1011, 2003.
[14] Z. Nenadic and J. W. Burdick. Spike detection using the continuous wavelet transform. IEEE Transactions on Biomedical Engineering, 52:74–87, 2005.
[15] I. Obeid and P. D. Wolf. Evaluation of spike detection algorithms for a brain-machine interface application. IEEE Transactions on Biomedical Engineering, 51:905–911, 2004.
[16] Detection of Active Brain Regions for Automatic Electrode Selection Using a Machine Learning Approach. Bachelor's thesis, 2010.
[17] George W. Fraser and Andrew B. Schwartz. Recording from the same neurons chronically in motor cortex. Journal of Neurophysiology, 107:1970–1978, 2012.
[18] Edwin M. Maynard, Craig T. Nordhausen, and Richard A. Normann. The Utah Intracortical Electrode Array: a recording structure for potential brain-computer interfaces. Electroencephalography and Clinical Neurophysiology, 102(3):228–239, 1997.
[19] Shawkat Ali and Kate A. Smith-Miles. Improved support vector machine generalization using normalized input space. Advances in Artificial Intelligence, 4304:362–371, 2006.
[20] Teunis van Beelen. EDFbrowser: a free, open-source, multiplatform, universal viewer and toolbox intended for, but not limited to, time-series storage files like EEG, EMG, ECG, bioimpedance, etc. http://www.teuniz.net/edfbrowser/, 2010–2013.
[21] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, March 2003.
[22] Mark A. Hall and Geoffrey Holmes. Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering, 15(6):1437–1447, November 2003.
[23] M. Dash and H. Liu. Feature selection for classification. Intelligent Data Analysis, 1:131–156, 1997.
[24] Fengxi Song, Zhongwei Guo, and Dayong Mei. Feature selection using principal component analysis. In 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization (ICSEM), volume 1, pages 27–30, November 2010.
[25] Mark A. Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.
[26] Mark A. Hall and Lloyd A. Smith. Feature subset selection: a correlation-based filter approach. In 1997 International Conference on Neural Information Processing and Intelligent Information Systems, pages 855–858. Springer, 1997.
[27] Songbo Tan. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Systems with Applications, 28(4):667–671, 2005.
[28] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[29] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[30] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[31] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
[32] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. Technical report, 2012.
[33] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. Technical report, National Taiwan University, 2010.
[34] Karsten Seidl, Tom Torfs, Patrick A. De Mazière, Gert Van Dijck, Richard Csercsa, Balazs Dombovari, Yohanes Nurcahyo, Hernando Ramirez, Marc M. Van Hulle, Guy A. Orban, et al. Control and data acquisition software for high-density CMOS-based microprobe arrays implementing electronic depth control. Biomedizinische Technik/Biomedical Engineering, 55(3):183–191, 2010.
[35] Teunis van Beelen.
EDFlib: a programming library for C/C++ to read/write EDF+/BDF+ files. http://www.teuniz.net/edflib/index.html, 2010–2013.