Interdisciplinary Project
Master's Degree Program in Computer Science

Evaluation Of Learning Algorithms For Tracking Neuronal Signals With EDC Probes

Author: Ramin Zohouri

Autonomous Intelligent Systems Laboratory, Department Of Computer Science
Microsystem Material Laboratory, Department Of Microsystem Engineering
University Of Freiburg

Examiners: Prof. Dr. Wolfram Burgard and Prof. Dr. Oliver Paul
Supervisor: Dr. Barbara Frank
ABSTRACT
CMOS-integrated electronic depth control (EDC) probes are high-density microelectrode arrays used to monitor the activity of a neuron of interest within an ensemble of neurons. EDC probes are capable of selecting channel configurations in order to measure signals in different regions along the probe shank. However, the probe and the neural activity may shift in the brain, and it is necessary to keep track of activities in the brain by choosing the next channel configuration.
In this project, we try to identify recording channels and track down their activities using features extracted from their measured signals. Given the feature vectors of each channel, we first apply a pre-processing step for normalization and dimension reduction. Then we employ different supervised machine learning algorithms to identify channels and find out which algorithm is most appropriate for this task. Finally, we test the trained models on another recording session with the same channel configuration.
The prediction results show that it is possible to track down neural activity between different recording sessions. Furthermore, off-diagonal values in the confusion matrix of the test phase indicate that the probe or the activity may have shifted between consecutive recording sessions.
04.2013 - 04.2014
Ramin Zohouri
ACKNOWLEDGMENT
I have put great effort into this project. However, it would not have been possible without the kind support
and help of many individuals and organizations. I would like to extend my sincere thanks to all of them.
I am highly indebted to Prof. Dr. Wolfram Burgard for his guidance and supervision as well as for
providing necessary information regarding the project. I would like to express my gratitude towards
Prof. Dr. Oliver Paul for his kind co-operation and encouragement, which helped me in the completion of
this project.
Furthermore, I would like to thank Dr. Barbara Frank for her useful comments, remarks, and engagement
throughout the learning process of this interdisciplinary project. My thanks and appreciation also
go to Dr. Patrick Ruther for his role in developing the project, and to the EDC++ project members who have
willingly helped me out with their abilities.
CONTENTS

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Outline
2 Related Works
  2.1 Spike Detection
  2.2 Detecting Active Brain Regions
  2.3 Recording From the Same Neuron
3 Channel Identification
  3.1 Approach
  3.2 Spike Detection and Feature Computation
  3.3 Data Pre-Processing
    3.3.1 Normalization Techniques
    3.3.2 Attribute Selection
  3.4 Machine Learning for Channel Identification
    3.4.1 K-Nearest Neighbour (K-NN)
    3.4.2 Iterative Dichotomiser 3 (ID3)
    3.4.3 Random Forest
    3.4.4 Support Vector Machine (SVM)
    3.4.5 Cross-Validation
    3.4.6 Hyper-Parameter Optimization
  3.5 Tracking Down Neural Activities
4 Experiments and Results
  4.1 Dataset
  4.2 Spikes and Features
  4.3 Data Normalization and Attribute Selection
  4.4 Training and Validation of Classifiers
  4.5 Tracking Down Neural Activity
    4.5.1 Testing Trained Models
    4.5.2 Observation
5 Summary
  5.1 Conclusion
  5.2 Future Work
A Feature List
  A.1 Feature List
Bibliography
CHAPTER 1
INTRODUCTION
1.1 Motivation
Understanding brain function and the complex interactions of large neural networks with huge
numbers of neurons is one of the most challenging research fields in neuroscience. The development of
appropriate tools opens new perspectives in research and application, e.g. in neural prostheses, as well
as in the diagnosis and therapy of neurodegenerative diseases including Alzheimer's disease, Parkinson's disease, and epilepsy.
Recordings of single-neuron activity within an ensemble of neurons are required for a basic understanding
of neural processes [1]. With this aim, within the European project NeuroProbes, a new high-density
electrode array for recording with high spatial resolution was introduced and successfully tested in
first in-vivo experiments [2, 3]. These probes contain 188 electrodes configured in 2 rows. CMOS
multiplexing units integrated directly on the probe shafts enable a drastic increase in the number
and density of electrodes in NeuroProbes compared to existing devices [4]. The density of such arrays
makes it possible to switch between the electrodes and achieve close proximity between the neuron of
interest and the recording electrode. In this context, the concept of switching between individual microelectrodes
of the same shaft, without the need to reposition either the shaft or the entire probe, is called
electronic depth control (EDC).
EDC allows us to switch between electrodes, scan their signals along the probe shank, and select
those with higher signal quality. However, during long-term in-vivo recordings there are moments in which
the current configuration of the electrodes is not able to record high-quality signals. One reason for losing
signal quality might be a drift in the probe position. Such a drift may occur for several
reasons, e.g. inflammation of the brain tissue, human interaction, or unexpected animal movement, and
causes us to lose track of an activity of interest recorded earlier or in previous sessions.
Furthermore, for discriminating a single neuron and studying its behaviour in the long term, it is necessary
to make sure that the probe remains in the starting configuration and that a particular channel keeps recording
from the neuron of interest. In addition, having prior information about the quality and properties
of the signals recorded by each channel makes it possible to select the next configuration more efficiently and
accurately, providing us with high-quality and less noisy signals from neural activities. Therefore,
we need to be able to identify each recording channel (each channel is assigned to one electrode during
the recording).
1.2 Problem Statement
In this work, we try to identify the characteristics of each recording channel. For this purpose, we first
compute sets of features from the measured signals of each channel. Then we apply supervised machine
learning techniques to identify the recording channels based on the computed features. In this context,
the class labels are the channel IDs connected to particular electrodes on the probe, and their activity
is represented by sets of features extracted from their measured signals. There are three main challenges
here. First, computing features from the measured signals and choosing relevant methods for such a
computation. Second, selecting an appropriate supervised machine learning algorithm and a suitable
number of features in order to obtain maximum classification accuracy for a given learning algorithm. Third,
providing a series of analytical approaches to interpret the classification results and draw conclusions
about channel identification and drift occurrence.
Such an identification enables us to track down a particular activity
during and between long-term in-vivo recording sessions and to deal with a drift of the probe from its original
position. In other words, if we lose signal quality in a recording session, we can use this prior
information and choose electrodes that have shown higher signal quality for the next configuration.
Furthermore, unintended movement of the probe between different recording sessions with the same
recording configuration becomes detectable. If there was a drift in the probe position between recording
sessions, we would observe that a particular activity is now identified in a new channel below or above
its original channel, depending on the drift direction.
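The drift argument can be made concrete with a confusion matrix over channel labels: correct identifications fall on the diagonal, while a probe shift pushes mass onto the neighbouring off-diagonals. The following Python sketch is illustrative only; the `drift_score` heuristic is our own shorthand for the fraction of predictions landing on a direct neighbour channel and is not part of the project's tooling.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true channel, columns: predicted channel."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def drift_score(cm):
    """Fraction of predictions falling on the first off-diagonals,
    i.e. a channel confused with its direct neighbour (our heuristic)."""
    n = len(cm)
    off = sum(cm[i, i + 1] + cm[i + 1, i] for i in range(n - 1))
    return off / cm.sum()
```

A large `drift_score` on the test session would be consistent with the probe having shifted by roughly one electrode between sessions.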
1.3 Outline
The rest of this work is structured as follows. In the next chapter, we discuss related work on
electronic depth control (EDC) in intra-cortical recordings, spike detection, the detection of active
brain regions, and recording from the same neuron in motor cortex. In Chapter 3, we describe in
detail our approach for spike detection and feature extraction from in-vivo recording datasets, as well as
classification and channel identification based on these features, in order to track down neural activities
between different recording sessions and during long-term in-vivo recordings. In Chapter 4, we present
the results of our chosen approach. Finally, in Chapter 5, we summarize what we have achieved in this
work and discuss how it can be extended.
CHAPTER 2
RELATED WORKS
2.1 Spike Detection Algorithms
In extracellular recordings, a spike or action potential is a short-lasting, high-amplitude signal fired
by a neuron. Spikes are produced by the rising and falling potential of the neuron's cell membrane. During
neural activity, a neuron fires spikes with a particular amplitude, shape, and rate. Each neuron
has spikes of a characteristic shape and firing rate, which are mainly determined by the morphology of
its dendritic tree and by its distance and orientation relative to the recording electrodes [5]. In order
to extract features from the recordings, we first need to extract the recorded spikes of each channel. There are
two common types of spike detection algorithms. First, supervised algorithms, which need user
intervention, such as window discrimination [1], principal component analysis [6], and matched filtering
[7]. However, using supervised algorithms would be very tedious for a comb of multi-electrode
arrays, since the settings have to be adjusted for each channel separately. The second common type of
spike detection algorithms is the unsupervised category. These algorithms require no user intervention, e.g.
algorithms based on amplitude detection [8], non-linear energy detection [9, 10, 11], and wavelet-based
detection [12, 13, 14]. In a study by Obeid and Wolf [15], spike detection algorithms were compared
with respect to their accuracy and computational cost. It was found that taking the absolute
value of the neural signal before applying a threshold, in combination with a refractory period, is just
as effective for spike detection as more elaborate energy-based schemes. Therefore, in this work we use the
absolute value of the signal and an adaptive-threshold spike detection algorithm.
2.2 Detection of Active Brain Regions Using a Machine Learning Approach
Given all detected spikes of each channel, we need to compute sets of essential features from
them and use these features to identify the properties of a recording channel. Ramirez
et al. [16] applied machine learning algorithms in order to classify the activity of each channel and
find out which channels record single-unit activity and which record multi-unit activity. In their work,
they first trained a learning algorithm using features extracted from detected spikes of labelled data
and then performed predictions on unlabelled data. They show which kinds of
features can be extracted from detected spikes and which combinations of those features
lead to more accurate classification results. Their short-term goal was to develop algorithms that
assist neuroscientists in detecting active brain regions; their long-term perspective was a
smart neural recording array which allows finding and maintaining high-quality neural signals
through the fully automatic selection of electrodes in active brain regions. The features
they used fall into two main categories. The first category consists of features computed directly from the
measured signal itself, i.e. minimum (Min), maximum (Max), mean, median, standard deviation (STD), and
root mean square (RMS). The second category consists of features computed from the detected spikes, i.e.
signal-to-noise ratio (SNR), average firing rate, and maximum firing rate. The second category comes in
different variations regarding the refractory time for spikes, i.e. 1 ms or 2 ms, and the time window for
the firing rates and average firing rates, i.e. 20 ms, 100 ms, 500 ms, and 10 s. In our work we use all possible
combinations of these features, i.e. 43 features altogether, to get the highest accuracy in the classification
results. The major difference between the approach of Ramirez et al. [16] and our goal is that they tried
to classify the activity types, i.e. single-unit activity (SUA), multi-unit activity (MUA), and noise
activity (NA), whereas we try to classify the recording channels.
2.3 Recording From the Same Neurons Chronically in Motor Cortex
During chronic extracellular recordings, neurobiologists have frequently observed similar activity recorded
on the same electrode from day to day. Occasionally a single neuron has some unusual characteristic,
such as a distinctive waveform or an unusual and obvious firing property, that makes it clear that the
same neuron was present in multiple sessions. The possibility that some neurons may be represented
multiple times in a series of recording sessions creates both a problem and an opportunity. Separately recorded
neurons may not actually represent independent sources of data, so statistical tests that assume each
unit is an independent sample may not be valid. However, if the same neuron can be identified as
such through multiple sessions, it becomes possible to combine data and thereby estimate the firing
properties of that neuron with greater confidence. Fraser and Schwartz [17] developed a new metric
of unit identity using pairwise cross-correlograms between neurons in a simultaneously recorded
population. It provides unit identification information comparable to that based on wave shape. By
combining this metric with wave shape, autocorrelation shape, and mean firing rate, they were able to
clearly identify whether two separately recorded units represent the same or different underlying neurons.
There are similarities between the goal of our project and the work of Fraser and Schwartz [17]. They
used a feature vector consisting of the firing rate and the waveform of spikes to represent the activity of each channel
and then used these features to classify the activity of each neuron. They made the strong assumption
that, when using the Utah micro-array [18], which has an electrode pitch of 400 µm, each electrode records
from different neurons. In other words, they assume it is unlikely that two adjacent electrodes
record from the same neuron. This assumption makes it possible for them to use the waveform of the spikes
as part of their features to track the activity of a particular neuron in long-term recordings
and between different recording sessions. However, in EDC probes with a high electrode density, it is
more likely that some adjacent electrodes record from the same neuron, due to the small pitch size of 40 µm.
Therefore, in our work we use a different feature vector to represent the activity of each channel, and
we apply supervised learning algorithms to identify channels and track down the activity of each
channel.
CHAPTER 3
CHANNEL IDENTIFICATION
3.1 Approach
The main purpose of this project is channel identification for tracking a neuron of interest between
different recording sessions. To this end, we design a pipeline with four major steps which leads to the
desired conclusion. The diagram in figure 3.1 shows these four major steps.
1. Spike detection and feature extraction
2. Data pre-processing: normalization and attribute selection
3. Supervised machine learning for channel identification
4. Tracking down neural activities
Figure 3.1: The diagram shows project pipeline. First, detecting spikes and computing features from measured signals
of each channel. Second, applying different normalization techniques and attribute selection methods on the provided
dataset. The dataset here is computed feature vectors for all channels. Third, training and evaluating the performance of
the different classifiers in conjunction with attribute selection methods and normalization techniques. This would enable
us to identify a particular channel based on its computed features. Fourth, training and testing classifiers with the dataset
of consecutive recording sessions. This will allow us to track down neural activities between different recording sessions and
detect unintentional drift in the probe position.
Based on the diagram in figure 3.1, we first need to characterize each channel given its measured
signals. Each channel can be represented by sets of features extracted from its measured signals. As
mentioned in the previous chapter, there are different methods for detecting action potentials
or spikes and for feature extraction [16] in order to classify the activity type of each recording channel using
supervised machine learning algorithms. We employ the same methods to extract features.
After computing features for each recording channel, we apply some data pre-processing steps, i.e. normalization
techniques and attribute selection methods, in order to deal with noisy data and increase
the prediction accuracy of the classification. Then we apply machine learning algorithms to find out how
well we can identify each channel given its computed features.
The next step is to use a supervised machine learning algorithm in combination with the pre-processing steps to
identify a channel given its computed features. Therefore, we need to train and evaluate the performance
of the different classifiers. This will give us a notion of the feasibility of the channel identification problem.
Finally, using a supervised machine learning algorithm, we will be able to track down the activity
of different channels. Given two recording sessions with the same channel configuration, the
idea is to train a learning algorithm with the dataset from the first recording session and then test the
trained models with the dataset from the second session. The prediction results show how well the
neural activities are traceable and whether or not there was an unintended movement of the probe position.
An efficient implementation of supervised learning algorithms is available in the Weka machine learning
tool [19]. In this work, we implemented a light framework for detecting spikes and computing features
from them using the C++ programming language. In the results section, we examine the performance of each
of the introduced algorithms to find out which one gives more accurate results and better identification.
3.2 Spike Detection Algorithm and Feature Computation
To compute features for each channel, we need to extract spikes from the measured signals of that channel.
Each channel is connected to a specific electrode on the probe shank, and these connections are adjustable for
each particular recording session. Figure 3.2 shows 10 seconds of raw measured signal for eight channels
in EDF format. All recordings are filtered using a bandpass filter between 500 Hz and 5000 Hz. The filtered
signals can then be processed to calculate the attributes that characterize the recorded signals.
Figure 3.2: This graph shows 10 seconds of raw recordings of neural activity for eight different channels before filtering.
Signal units are in mV (plotted using EDFBrowser [20]).
Figure 3.3 shows the same recorded signals depicted in figure 3.2 after they have been filtered.
Figure 3.3: This graph shows 10 seconds of filtered signals for eight different channels, filtered by a band-pass filter between
500 Hz and 5000 Hz. Signal units are in mV (plotted using EDFBrowser [20]).
In this work, we apply an adaptive-threshold spike detection algorithm [15]. Spike detection and
feature computation follow the work introduced in the previous chapter [16].
The idea is first to estimate the background noise for a time window of 50 ms and then detect all signal
samples whose absolute value exceeds this noise level by a factor of 3.5 or 5. After detecting spikes,
we compute the signal-to-noise ratio (SNR) of each channel in a time window of 10 s. First, the
RMS of each spike is calculated using the signal from 0.5 ms before the peak of the spike to 1 ms after the
spike. Then the RMS values of all spikes are averaged, and the RMS of the noise is calculated, where the noise is the
portion of the signal excluding the detected spikes.
Finally, the SNR is calculated as follows:

    SNR = 20 · log10( RMS_spikes / RMS_noise ),    (3.1)

where RMS_spikes denotes the RMS of the spikes averaged over all detected spikes and RMS_noise is the RMS of the noise.
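The detection and SNR steps above can be sketched as follows. This is an illustrative Python implementation, not the project's C++ framework; in particular, the robust noise estimate `median(|x|)/0.6745` is a common choice that the text does not specify, so treat it as an assumption.

```python
import numpy as np

def detect_spikes(signal, fs, noise_win_s=0.050, factor=3.5, refractory_s=0.001):
    """Adaptive-threshold detection: for every 50 ms window, estimate the
    noise level and mark samples whose absolute value exceeds factor * noise.
    A refractory period suppresses duplicate detections of the same spike."""
    win = max(1, int(noise_win_s * fs))
    refractory = max(1, int(refractory_s * fs))
    abs_sig = np.abs(signal)
    spike_idx = []
    last = -refractory
    for start in range(0, len(signal), win):
        seg = abs_sig[start:start + win]
        noise = np.median(seg) / 0.6745      # robust noise estimate (assumption)
        thr = factor * noise
        for i in np.flatnonzero(seg > thr):
            t = start + i
            if t - last >= refractory:
                spike_idx.append(t)
                last = t
    return np.asarray(spike_idx)

def snr_db(signal, fs, spike_idx, pre_s=0.0005, post_s=0.001):
    """SNR of eq. 3.1: 20*log10(mean spike RMS / noise RMS), with each spike
    RMS taken from 0.5 ms before to 1 ms after the detected sample."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    mask = np.ones(len(signal), dtype=bool)
    rms_spikes = []
    for t in spike_idx:
        lo, hi = max(0, t - pre), min(len(signal), t + post)
        rms_spikes.append(np.sqrt(np.mean(signal[lo:hi] ** 2)))
        mask[lo:hi] = False                  # exclude spike windows from the noise
    rms_noise = np.sqrt(np.mean(signal[mask] ** 2))
    return 20 * np.log10(np.mean(rms_spikes) / rms_noise)
```

On a synthetic trace with injected large-amplitude events, `detect_spikes` recovers the injected events (plus occasional noise crossings, as with any threshold detector) and `snr_db` yields a positive value in dB.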
In order to compute appropriate features, we use different combinations of the threshold factor, i.e. 3.5 and
5, and the refractory time for spike detection, i.e. 1 ms and 2 ms. This makes it possible to detect
spikes with 4 different parameter combinations. For each of these combinations, the maximum firing rate in
intervals of 20 ms, 100 ms, 500 ms, and 10 s and its average value were calculated and defined as
attributes, which, taking into account the 4 different parameter combinations for the SNR calculation,
produces 36 different attributes. For example, using a threshold multiplier of 3.5 with a 1 ms spike
refractory window defines nine attributes for the different maximum and average firing rate intervals;
using a 2 ms window instead of a 1 ms window yields nine new attributes, and so on. There are further
features, i.e. minimum (Min), maximum (Max), mean, median, standard deviation (STD), root mean
square (RMS) of the signal, and average noise level (ANL), which we use for the classification algorithms
and channel identification. In comparison to [16], the average noise level (ANL) is a new feature, computed
as the average noise level in one segment of the measured signals, i.e. 10 s of a
particular recording session. The ANL value can represent the quality of the measured signals, and
in the experiments section we show that ANL is a good feature for classification. In total, we
compute all 43 attributes of our feature vector, and each feature vector is computed over a time window of
10 s, which corresponds to one segment of the measured signals.
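The firing-rate portion of the feature vector can be sketched as below. This is a Python illustration for one threshold/refractory combination over the stated windows; the attribute names are our own and not taken from the project's feature list.

```python
import numpy as np

def firing_rate_features(spike_times_s, duration_s=10.0,
                         windows_s=(0.020, 0.100, 0.500, 10.0)):
    """Maximum and average firing rate (Hz) per window length.

    The 10 s segment is divided into bins of each window length; the spike
    count per bin divided by the bin length gives a rate, from which the
    maximum and the mean are kept as attributes (8 values here; together
    with the SNR this gives 9 attributes per parameter combination, as in
    the text)."""
    feats = {}
    spikes = np.asarray(spike_times_s, dtype=float)
    for w in windows_s:
        edges = np.arange(0.0, duration_s + 1e-12, w)   # bin edges over the segment
        counts, _ = np.histogram(spikes, bins=edges)
        rates = counts / w
        feats[f"max_rate_{int(w * 1000)}ms"] = rates.max()
        feats[f"avg_rate_{int(w * 1000)}ms"] = rates.mean()
    return feats
```

Repeating this for the 4 threshold/refractory combinations and appending the signal statistics (Min, Max, mean, median, STD, RMS, ANL) would assemble the full 43-dimensional feature vector described above.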
3.3 Data Pre-Processing
Each element of the feature vectors computed from the extracellular recordings and their detected spikes has
its own range. Since some supervised machine learning algorithms use similarity measures
between feature vectors for classification, the features should have the same scale. Data normalization
techniques are commonly used in machine learning to deal with this problem. Global normalization
techniques are an essential preprocessing step for many machine learning algorithms and boost
their performance.
Another problem is the number of features computed with our approach, i.e. 43 features per sample.
This is a high-dimensional feature vector and makes the classification
task difficult, especially when some features are noisy or irrelevant because they were computed from noisy
measured signals. In fact, due to noisy measurements, high background noise activity, and the existence of
artifacts in the measured signals, some of the computed features are irrelevant or redundant. This
phenomenon can dramatically reduce the prediction accuracy of the classifiers. However, there exist
common attribute selection and dimension reduction methods in the field of machine learning for
overcoming these problems. In the following, we briefly explain our candidate methods and techniques
for data normalization and attribute selection.
3.3.1 Normalization Techniques
In order to increase the performance of the supervised learning algorithms, we apply normalization techniques
to our datasets. The computed features have different scales and ranges, and this preprocessing
step often significantly improves the performance of the learning algorithm
by reducing correlations between data samples and scaling them into a similar range. In this work, we
apply two common types of global normalization techniques which are frequently used [19] in machine
learning algorithms, more specifically in Support Vector Machines (SVM):
• Min-Max Normalization:

    D'(i) = (D(i) − Min(D)) / (Max(D) − Min(D)) · (U − L) + L.    (3.2)

Here D' is the normalized vector, D is the original vector, Min(D) and Max(D) are the minimum and maximum
original values, and U and L are the upper and lower bounds of the target range, usually [0, 1] or [−1, 1].
• Zero-Mean Normalization:

    D'(i) = (D(i) − μ) / σ.    (3.3)

Here D' is the normalized vector, D is the original vector, μ is the mean of the original data, and σ is the standard
deviation of the original data.
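A minimal sketch of the two normalization techniques (equations 3.2 and 3.3), assuming one attribute vector as input; the project applies these via Weka [19], so the function names here are illustrative:

```python
import numpy as np

def min_max_normalize(d, lower=-1.0, upper=1.0):
    """Eq. 3.2: scale an attribute vector into [lower, upper]."""
    d = np.asarray(d, dtype=float)
    return (d - d.min()) / (d.max() - d.min()) * (upper - lower) + lower

def zero_mean_normalize(d):
    """Eq. 3.3: subtract the mean and divide by the standard deviation."""
    d = np.asarray(d, dtype=float)
    return (d - d.mean()) / d.std()
```

Applied per attribute across all samples, min-max scaling fixes the range, while zero-mean normalization yields zero mean and unit standard deviation.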
3.3.2 Attribute Selection
High-dimensional feature vectors do not always increase the prediction accuracy of supervised learning
algorithms. In machine learning, feature selection, also known as variable selection, attribute selection,
or variable subset selection, is a technique for reducing the dimensionality of the feature vectors
[21, 22, 23]. Feature selection methods can lead to i) improved prediction performance
of the predictor, ii) faster and more cost-effective predictors, and iii) a better understanding of
the process that generates the data. In classification problems, especially when there are few samples
with high-dimensional feature vectors, it is likely that some features are irrelevant or redundant.
Redundant features are those that provide no more information than the currently selected features.
Irrelevant features are those that provide no useful information in any context. When dealing with extracellular
neural activity, feature selection is necessary due to the presence of background
noise activity. A high background noise level has a negative impact on the performance of the spike
detection algorithms and on the quality and quantity of the computed features. Therefore, redundant and
irrelevant features are likely, and attribute selection can deal with this problem. There are
two attribute selection methods widely used in the machine learning field for reducing
the dimensionality of feature vectors: Principal Component Analysis [24, 23] and
Correlation Feature Selection (CFS) [25, 26].
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a statistical procedure which orthogonally transforms input
data of dimension n into a set of linearly uncorrelated variables of the same or lower dimension m, called
principal components. Mathematically speaking, each principal component represents a direction of the input
data in a new coordinate system. The highest rank among the principal components goes to the direction with
the highest variance, which lies on the first coordinate of the new coordinate system; the second rank lies on
the second coordinate, and so on. Strictly speaking, PCA is not a feature selection but a feature extraction
method: the new attributes are obtained as linear combinations of the original attributes. Dimensionality
reduction is achieved by keeping the m components with the highest variance out of the n original
components. The common version of this method [19] has the following steps:
• Compute the covariance matrix of the original training samples, then solve for all eigenvectors
and eigenvalues.
• Rank attributes by their individual evaluations, in conjunction with attribute evaluators
(ReliefF, GainRatio, Entropy, etc.).
• Select the m highest-ranked features.
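The eigen-decomposition steps above can be sketched as follows; this is a plain covariance-based PCA in Python, not Weka's implementation, and it skips the optional attribute-evaluator ranking:

```python
import numpy as np

def pca_reduce(X, m):
    """Project n-dimensional samples onto the m principal components
    with the highest variance."""
    Xc = X - X.mean(axis=0)                    # centre the data
    cov = np.cov(Xc, rowvar=False)             # covariance matrix of attributes
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]          # rank components by variance
    W = eigvecs[:, order[:m]]                  # keep the top-m components
    return Xc @ W
```

The first returned coordinate carries the largest variance, the second the next largest, and so on, matching the ranking described above.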
Correlation Feature Selection (CFS)
The other feature selection method we use in this work is Correlation Feature Selection (CFS) [25, 26].
CFS is a measure that selects a subset of features from the original feature vector such that the features
in the subset are highly correlated with the class labels and uncorrelated with each other. CFS can ignore
irrelevant features because they have a low correlation with the class labels. CFS also screens out redundant
features due to their high correlation with other features. A feature is accepted if it predicts classes
in areas of the instance space not already predicted by other features. Given a subset S of the feature
space containing k features, CFS evaluates the subset based on the following "merit":

    M_S = k · r_cf / sqrt( k + k(k − 1) · r_ff ),    (3.4)

where M_S is the heuristic merit of the feature subset S containing k features, r_cf is the mean feature-class
correlation (f ∈ S), and r_ff is the average feature-feature correlation.
The numerator of equation 3.4 indicates how predictive of the class a set of features is, and the
denominator indicates the amount of redundancy among the features. Evaluating all possible subsets
of features is exhaustive and often infeasible due to the large number of attributes; in [25, 19]
there are experimental approaches using heuristic search strategies:
• Forward selection begins with no features and greedily adds one feature at a time until no
single-feature addition improves the evaluation.
• Backward elimination begins with all features and greedily removes one feature at a time as long as
the evaluation does not degrade.
• Best-first search starts either with no features or with all features; it progresses forward by adding features,
or backward by removing features, to or from the subset, and has a stopping criterion.
Furthermore, there are three variations of CFS [25, 19], each employing one of the following attribute
quality measures to estimate the correlations in equation 3.4:
• CFS-UC uses symmetrical uncertainty to measure correlation.
• CFS-MDL uses a normalized symmetrical minimum description length (MDL) principle to measure
the correlation.
• CFS-Relief uses symmetrical relief to measure correlation.
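To illustrate how the merit in equation 3.4 rewards class correlation and penalizes redundancy, the following sketch evaluates a subset using plain Pearson correlation in place of the symmetrical measures above; this is a simplifying assumption for illustration, not one of the three CFS variants:

```python
import numpy as np

def cfs_merit(X, y):
    """Merit of a feature subset X (n_samples x k) for class labels y,
    following equation 3.4: k * r_cf / sqrt(k + k(k-1) * r_ff)."""
    k = X.shape[1]
    # mean absolute feature-class correlation r_cf
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)])
    # mean absolute feature-feature correlation r_ff
    if k > 1:
        r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for i in range(k) for j in range(i + 1, k)])
    else:
        r_ff = 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=50).astype(float)
informative = y + 0.1 * rng.normal(size=50)   # correlated with the class
irrelevant = rng.normal(size=50)              # uncorrelated with the class
print(cfs_merit(np.column_stack([informative]), y))
print(cfs_merit(np.column_stack([informative, irrelevant]), y))
```

Adding the irrelevant feature lowers the merit, which is why a subset search guided by this measure discards it.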
CHAPTER 3. CHANNEL IDENTIFICATION 3.4. MACHINE LEARNING FOR CHANNEL IDENTIFICATION
3.4 Supervised Machine Learning Algorithms For Channel Identification
We need to evaluate the effect of the normalization and attribute selection methods on the different
supervised learning algorithms. This gives us a notion of the feasibility of the classification and channel
identification problem. By looking at the validation results of the classifiers, we can judge how well each
classifier identifies each channel based on its computed features. Due to the density and geometry of the
electrodes on the probe shank, we expect some similar activities on adjacent electrodes. In the following,
we describe four different classifiers, i.e. K-Nearest Neighbour (K-NN), Iterative Dichotomiser 3 (ID3),
Random Forest, and Support Vector Machine (SVM), and their different parameter settings. By
comparing their results on our dataset we can select the most appropriate classifier for our goal. In order
to find out how well each classifier generalizes, we use the cross-validation technique. Furthermore, some
of the supervised learning algorithms need precise parameter selection; therefore, we use hyper-parameter
optimization methods to increase their prediction accuracy.
3.4.1 K-Nearest Neighbour (K-NN)
One of the learning algorithms we have selected is K-Nearest Neighbour (K-NN) [27]. The idea is to
classify an object by a majority vote of its neighbors, with the object being assigned to the class most
common among its K nearest neighbors. Each object is represented by its feature vector, and the
algorithm uses a distance measure, e.g. Manhattan or Euclidean distance, to find the nearest neighbors.
There are parameters and settings which improve the accuracy of classification, e.g. weighting neighbors
by their relative distance and the number of neighbors K. The K-NN algorithm is recommended because
it is easy to understand, simple to train, and it gives an insight into the feasibility of our classification
task. However, the algorithm is readily fooled by noise and irrelevant data, is biased by the value of K,
and is computationally intensive for large datasets. By using appropriate nearest neighbor search
structures, e.g. a KD-Tree, the K-NN algorithm becomes computationally tractable.
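The voting rule with inverse-distance weighting can be sketched as below. The two-dimensional toy data are hypothetical and stand in for the channel feature vectors; this is not the implementation used in this work:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Assign x to the class with the largest inverse-distance-weighted
    vote among its k nearest training samples (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        weight = 1.0 / (dists[i] + 1e-12)   # closer neighbours count more
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + weight
    return max(votes, key=votes.get)

# two well-separated toy classes
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.2])))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 1
```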
3.4.2 Iterative Dichotomiser 3 (ID3)
The second algorithm we used for classification is Iterative Dichotomiser 3 (ID3) [28]. Here the idea is
to split the dataset into subsets based on a selected attribute, add a non-terminal node to the decision
tree, and continue this process recursively on each subset. Terminal nodes represent the class label of
their branch. When selecting attributes, we choose the one with the largest information gain, or
equivalently the smallest entropy, among the non-selected attributes. The four main steps in the
Iterative Dichotomiser 3 (ID3) algorithm are:
• Calculate the entropy of every attribute using the data set S.
• Split the set S into subsets using the attribute for which entropy is minimum (or, equivalently,
information gain is maximum).
• Make a decision tree node containing that attribute.
• Repeat the previous three steps recursively on each subset using the remaining attributes.
We employ the Iterative Dichotomiser 3 (ID3) algorithm because it treats each feature separately based
on a probabilistic approach. It builds the decision tree quickly and uses the whole dataset to create the
tree. Furthermore, its results are invariant to natural or normalized data. However, the ID3 algorithm
may face the over-fitting problem and be biased in favor of attributes with high information gain.
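The attribute-selection step can be illustrated with a small entropy and information-gain computation. The toy attributes "a" and "b" are hypothetical, not our signal features:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a set of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Entropy reduction obtained by splitting the set on one attribute."""
    n = len(labels)
    subsets = {}
    for sample, label in zip(samples, labels):
        subsets.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    return entropy(labels) - remainder

samples = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = [0, 0, 1, 1]
print(information_gain(samples, labels, "a"))  # 1.0: "a" separates the classes
print(information_gain(samples, labels, "b"))  # 0.0: "b" carries no information
```

ID3 would split on "a" first, since it yields the maximum information gain.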
3.4.3 Random Forest Algorithm
The third supervised learning algorithm we used is Random Forest [29]. The algorithm creates a forest
of decision trees at training time and outputs the class that is the mode of the classes output by the
individual trees. Given a sample set, the algorithm grows each tree as follows:
• If the number of cases in the training set is N, sample N cases at random, but with replacement,
from the original data. This sample will be the training set for growing the tree.
• If there are M input variables, a number m << M is specified such that at each node, m variables
are selected at random out of the M and the best split on these m is used to split the node. The
value of m is held constant during the forest growing.
• Each tree is grown to the largest extent possible. There is no pruning.
The Random Forest algorithm follows almost the same core principle as the ID3 algorithm but usually
shows better performance, and its results are virtually invariant to whether the dataset is normalized
or natural.
3.4.4 Support Vector Machine (SVM) Algorithm
Our fourth candidate algorithm is the Support Vector Machine [30], which is one of the more
sophisticated supervised learning algorithms. The idea of the Support Vector Machine is to separate
sample data in d-dimensional space using (d-1)-dimensional hyperplanes. There is an inverse relation
between the distance of the hyperplane to the sample points, i.e. the margin, and the generalization
error: the larger the margin, the smaller the generalization error. Based on that, we are dealing with
an optimization problem. Since classification and regression problems mostly involve non-linearly
separable data distributions, the Support Vector Machine (SVM) uses a kernel function to transform
the data samples into a same- or higher-dimensional feature space in which they are linearly separable.
There are three common non-linear kernels used for mapping the samples to higher dimensions:
• Polynomial (homogeneous):
k(xi, xj) = (xi · xj)^d . (3.5)
Here xi and xj are samples represented by their feature vectors and d is the polynomial degree.
• Gaussian Radial Basis Function:
k(xi, xj) = exp(−γ ||xi − xj||²), for γ > 0. (3.6)
Here xi and xj are samples represented by their feature vectors and γ is the kernel coefficient.
• Hyperbolic Tangent:
k(xi, xj) = tanh(κ xi · xj + c), for some (not every) κ > 0 and c < 0. (3.7)
Here xi and xj are samples represented by their feature vectors and κ is the kernel coefficient.
Although the Support Vector Machine is a very sophisticated supervised learning algorithm, it needs
careful model selection, i.e. the choice of kernel type and parameter specification, in order to obtain
highly accurate results. Empirical model selection is a tedious and interminable task. Therefore, in the
following, we explain some common methods to deal with this problem.
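The three kernels in equations 3.5 to 3.7 can be written down directly. This is an illustrative sketch of the kernel functions only, not a full SVM:

```python
import numpy as np

def poly_kernel(xi, xj, d=2):
    """Homogeneous polynomial kernel, equation 3.5."""
    return np.dot(xi, xj) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    """Gaussian radial basis function kernel, equation 3.6 (gamma > 0)."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def tanh_kernel(xi, xj, kappa=1.0, c=-1.0):
    """Hyperbolic tangent kernel, equation 3.7 (kappa > 0, c < 0)."""
    return np.tanh(kappa * np.dot(xi, xj) + c)

x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(poly_kernel(x1, x2))   # (1*2 + 2*1)^2 = 16.0
print(rbf_kernel(x1, x1))    # identical samples map to similarity 1.0
```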
3.4.5 Cross-Validation
Cross-validation [31] is a technique to measure how well a predictive model will generalize to data
independent of the data that were used to train it. In machine learning, cross-validation measures how
well the trained model will perform in practice. Each model has one or more unknown parameters, and
when the number of samples is small or the number of parameters is large, the model will face the
over-fitting problem. Cross-validation deals with this problem by dividing the sample data into K equal
subsets (K-fold cross-validation), then using K − 1 subsets to train the model and 1 subset for
validation. The procedure is repeated K times until each individual subset has been used as the
validation set. At the end, the K results from the folds can be averaged (or otherwise combined) to
produce a single estimate. Common values for K are 3, 5, and 10, depending on the size of the training
data. In our task, we used K = 5 for all four algorithms.
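The K-fold splitting procedure with K = 5 can be sketched in plain Python; the figure of 120 samples matches our smaller dataset, but the code is otherwise an illustration:

```python
def kfold_splits(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds;
    every subset serves exactly once as the validation set."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        yield train, folds[i]

splits = list(kfold_splits(120, k=5))
print(len(splits))                           # 5 folds
print(len(splits[0][0]), len(splits[0][1]))  # 96 training, 24 validation samples
```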
3.4.6 Hyper-Parameter Optimization
Hyper-parameter optimization is the problem of obtaining good generalization for a learning algorithm
by choosing a set of parameters. The idea is to adjust the different model parameters in order to
minimize the loss function on the training data. There are different approaches to hyper-parameter
optimization, e.g. global parameter optimization using Gaussian Processes [32] or simple grid search
[33]. In this work, we used grid search to tune the parameters of a particular model in order to increase
its accuracy. The idea is to set a range and a step size for each parameter, then go through all possible
combinations of parameters, create and train the model with each of them, and find out which
combination minimizes the loss function value, or in other words gives the highest accuracy.
Among the previously introduced supervised learning algorithms, the support vector machine is the
one which demands hyper-parameter optimization, because of the complexity of selecting the
parameters and the wide range of choices, which need a mechanism to tune them. For the support
vector machine, mainly three parameters need to be tuned, i.e. the kernel function, the C constant
(regularization parameter), and the γ factor (kernel coefficient).
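The grid search loop can be sketched as below. The scoring callback stands in for "train the SVM with these parameters and cross-validate"; the parameter ranges and the toy objective are purely illustrative assumptions:

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Try every combination of parameter values and keep the one with
    the highest score (e.g. cross-validated accuracy)."""
    best_params, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# hypothetical search ranges for an SVM's C and gamma; the lambda is a toy
# objective whose maximum sits at C = 10, gamma = 1.0
grid = {"C": [1, 10, 100], "gamma": [0.1, 1.0, 10.0]}
best, score = grid_search(lambda C, gamma: -abs(C - 10) - abs(gamma - 1.0), grid)
print(best)  # {'C': 10, 'gamma': 1.0}
```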
3.5 Tracking Down Neural Activities
After finding the best combination of normalization techniques, attribute selection methods, and
classification algorithms, we try to track neural activities between different recording sessions. We now
have a notion of how well we can identify each channel given the feature vector computed from its
measured signals. Therefore, we want to know how likely it is to identify the same activity between
different recording sessions on the same channel or on its adjacent channels. The idea is to identify
each specific channel in a particular recording session using the data pre-processing and machine
learning approach described earlier in this chapter. Then, we compute the sets of features for another
recording session with the same channel configuration as the one we built the model with, and use
them to test the trained model. Here we try to predict the class labels for the newly measured signals.
By looking at the differences between the predicted class labels and the actual class labels in the test
result, we can judge how well the current recording of a particular channel is predictable based on
earlier measurements of the same channel.
To make this argument, we provide a precision-recall analysis and observe the classification error
results and confusion matrices of the test phase. The confusion matrix can show whether the activity of
a particular channel now appears most likely on the same channel or elsewhere. Due to the density of
the electrodes and their geometrical position on the probe shank, if there was a subtle unintended
movement of the NeuroProbe position, the activity of a particular channel in the test phase would be
classified to an adjacent channel relative to the drift direction (during recording, each channel is
connected to a specific electrode on the probe shank).
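A confusion matrix for the cross-session test can be accumulated as below. The example labels are hypothetical, with one sample drifting from a channel to its neighbour; this is only a sketch of the bookkeeping, not our evaluation pipeline:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are the actual channels, columns the predicted channels;
    off-diagonal entries reveal activity appearing on another channel."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# 8 channels; one sample of channel 2 is predicted as its neighbour, channel 3
y_true = [0, 1, 2, 2, 3, 4, 5, 6, 7]
y_pred = [0, 1, 2, 3, 3, 4, 5, 6, 7]
cm = confusion_matrix(y_true, y_pred, n_classes=8)
print(np.trace(cm))  # 8 correctly identified samples on the diagonal
print(cm[2, 3])      # 1 sample drifted from channel 2 to channel 3
```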
CHAPTER 4
EXPERIMENTS AND RESULTS
4.1 Dataset Of Extracellular Recording
For our experiment, we used a dataset from in-vivo recordings, performed in April 2009 and February
2010 at the Institute for Psychology of the Hungarian Academy of Sciences in Budapest (Hungary). Data
acquisition was done using the NeuroSelect software [34]. The operation of the three micro-probes was
verified by acute implantation in the neocortices of Wistar rats. One probe was implanted in the
primary motor cortex (2 mm in the lateral direction, aiming at M1/M2) and two probes were implanted
in the S1 trunk
region (see Figure 4.1). The data was pre-amplified (g=10 gain, bandpass filtered between DC and 100
kHz) and amplified (g=100 gain, bandpass filtered between 0.5 kHz and 5 kHz) with a total gain of 1000.
Signals were digitized at 16-bit resolution and 20 kHz sampling rate per channel.
Figure 4.1: Cross section of the area of one implantation (based on [16]). The probe was inserted 2 mm in the lateral
direction, aiming for the M1/M2 region indicated by the black line.
Before trying to track the neural activities between different recording sessions, we need to evaluate
the effect of the normalization and attribute selection methods on the overall performance of the
classification algorithms. We also need to know which combination of the introduced pre-processing
methods and classification algorithms gives us the highest prediction accuracy. Hence, we selected a
relatively large dataset containing all recording sessions of day 15.02.2010. This dataset has the
property that there was no intentional movement of the probe position, and it covers most of the
electrodes available on the probe shank for signal measurement. This dataset contains 152 electrodes
which are also considered as class labels, and approximately 15 samples per class, 2199 samples
altogether. It should be mentioned that in some of these recording sessions there are not enough
qualified measurements. This means some of the class labels have fewer samples, around 12 samples per
class, because of poor signals, channel disconnection, and outliers. All these outliers are ignored in the
spike detection and feature extraction steps.
For tracking the neural activity between different recording sessions we need consecutive recordings
with the same electrode configuration. Then we are able to train the candidate classification algorithms
with a sufficient number of samples and choose the one with the highest accuracy. Among the available
recordings we chose a dataset, i.e. a pair of consecutive recording sessions with the same electrode
configuration, from day 12.02.2010 for channel identification and for tracking the activity of each
channel between recording sessions. These particular datasets were chosen because they have the same
electrode configuration and there was no deliberate movement of the probe position during or between
the recording sessions. Furthermore, the dataset contains high-quality measured signals which support
the presence of neural activity. Here, we used one session of our data for training the algorithms and
the other session for testing them. Each session contains measured signals for eight channels connected
to electrodes 43, 44, 45, 46, 141, 142, 143, 144. For each channel, we have 15 samples, and each sample
is computed from 10 s of recorded signals, represented by its feature vector, altogether 120 samples
per session.
All recordings are available in EDF (European Data Format), and there is a library called EDFLib
[35] available for manipulating them. In each of the chosen recording sessions in both datasets, 8
electrodes, a pair of tetrodes, were selected and assigned to channels. Our lightweight framework
detects spikes and computes the features we discussed in the previous chapter in order to use them in
the classification algorithms.
4.2 Detected Spikes and Extracted Features
The first step in our approach was to detect spikes and compute features from the measured signals.
Figure 4.2 shows the detected spikes for 10 s of recording using a threshold factor of 3.5 and a 1 ms
spike refractory time. Compared to figure 4.2, figure 4.3 shows the detected spikes of the same
recording segment with a threshold factor of 5 and a 2 ms spike refractory time window. As depicted,
fewer spikes are detected with the higher threshold factor, and this leads to different values for the
average firing rates.
Figure 4.2: Detected spikes from 10 s of one channel's activity using a 1 ms spike refractory time window and a 3.5
threshold factor. Here the raw signal refers to the filtered signal which was used as input for the spike detection algorithm.
In order to have more information about the activity of each channel, we compute all possible features
using different spike refractory times and various threshold factors.
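A minimal threshold-crossing detector with a refractory window can be sketched as below. Estimating the noise level from the signal's standard deviation and the synthetic 20 kHz trace are assumptions for illustration; the actual detection pipeline is described in the previous chapter:

```python
import numpy as np

def detect_spikes(signal, fs, threshold_factor=3.5, refractory_ms=1.0):
    """Return sample indices whose absolute amplitude exceeds
    threshold_factor times the signal's standard deviation, skipping a
    refractory window after each detected spike."""
    threshold = threshold_factor * np.std(signal)
    refractory = max(1, int(fs * refractory_ms / 1000.0))
    spikes, i = [], 0
    while i < len(signal):
        if abs(signal[i]) > threshold:
            spikes.append(i)
            i += refractory          # no second detection inside the window
        else:
            i += 1
    return spikes

fs = 20000                          # 20 kHz sampling rate, as in the dataset
signal = np.zeros(1000)
signal[100], signal[300], signal[500] = 10.0, 2.0, 10.0  # two large, one small spike
print(len(detect_spikes(signal, fs, threshold_factor=3.5)))  # 3
print(len(detect_spikes(signal, fs, threshold_factor=5.0)))  # 2
```

The higher threshold factor misses the small event, which is the behaviour compared in figures 4.2 and 4.3.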
Figure 4.3: Detected spikes from 10 s of one channel's activity using a 2 ms spike refractory time window and a 5 threshold
factor. Here the raw signal refers to the filtered signal which was used as input for the spike detection algorithm.
Figures 4.4, 4.5, and 4.6 show the histogram distributions of the signal to noise ratio (SNR), maximum
(Max), and standard deviation (STD) values for four different channels in the same recording session.
Here we can see how these extracted features overlap in their distributions, which makes classification
and channel identification a difficult task using these features. For instance, the SNR values of all four
channels, i.e. channels 3, 4, 6 and 8, mostly lie in the range of 8 to 9. Furthermore, it is clear that the
value ranges of these features differ, which also has a negative effect on the result of the classification
task. Therefore, we need to normalize the feature values.
Figure 4.4: Histogram distribution of the signal to noise ratio (SNR) values for detected spikes with threshold factor 5 and
1 ms refractory time for channels 8, 6, 3 and 4 connected to electrodes 141, 142, 143 and 144, including 15 samples each.
The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured
signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.5: Histogram distribution of the standard deviation (STD) values for the measured signals of channels 8, 6, 3
and 4 connected to electrodes 141, 142, 143 and 144, including 15 samples each. The Y-axis is the number of samples,
15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of
extracellular recordings on day 12.02.2010.
Figure 4.6: Histogram distribution of the maximum (Max) values for the measured signals of channels 8, 6, 3 and 4
connected to electrodes 141, 142, 143 and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples
altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular
recordings on day 12.02.2010.
The histogram distribution of the maximum value in figure 4.6 shows that this feature is irrelevant and
does not contribute to the classification task. Here we employed attribute selection methods to remove
such redundant and irrelevant features and to perform the classification with a smaller subset of the
features.
4.3 Data Normalization and Attribute Selection
In this section, we applied two attribute selection methods, i.e. Correlation Feature Selection (CFS)
and Principal Component Analysis (PCA), and the Min-Max and Zero-Mean normalization methods as
pre-processing steps on the large dataset from day 15.02.2010 with 152 class labels. To perform the
global Min-Max normalization and scaling, we first computed the global minimum and maximum of
each particular feature among all samples of all classes, then subtracted the global minimum from each
feature value, divided it by the difference of the global maximum and minimum values, and then scaled
each feature to [-1, 1]. To perform the global Zero-Mean normalization, we first computed the global
mean and standard deviation of each particular feature among all samples of all classes, then subtracted
the global mean from each feature value and divided it by the standard deviation.
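The two normalization steps described above can be sketched as column-wise operations on the sample matrix. This is a minimal NumPy illustration on toy data, not the framework's code:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature (column) to [-1, 1] using its global min and max."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - mn) / (mx - mn) - 1.0

def zero_mean_normalize(X):
    """Subtract each feature's global mean, divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# two toy features on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(min_max_normalize(X))                 # every column spans exactly [-1, 1]
print(zero_mean_normalize(X).mean(axis=0))  # columns now have zero mean
```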
Figures 4.7 and 4.8 show the histogram distributions of the signal to noise ratio (SNR) and maximum
firing rate (MFR) values for two different channels in the same recording session. By comparing the
feature values of the natural data and the normalized data, we can see how normalization produces
completely different scales and new values for each feature.
Figure 4.7: Histogram distribution of the SNR for natural, Min-Max normalized, and Zero-Mean normalized values of
detected spikes with threshold factor 5 and 2 ms refractory time for channels 3 and 4 connected to electrodes 143 and 144,
including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each
sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.8: Histogram distribution of the Maximum Firing Rate (MFR) with a time window of 10 s for natural, Min-Max
normalized, and Zero-Mean normalized values of detected spikes with threshold factor 5 and 2 ms refractory time for
channels 3 and 4 connected to electrodes 143 and 144, including 15 samples each. The Y-axis is the number of samples,
15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of
extracellular recordings on day 12.02.2010.
Then, to find the best subset of features from the 43-dimensional feature vectors, we applied the
Correlation Feature Selection (CFS) and Principal Component Analysis (PCA) dimension reduction
methods. It should be mentioned that attribute selection is done before the classification step and is
independent of the supervised learning algorithms used for classification. Furthermore, as we can see in
table 4.1, the feature subsets selected by the CFS and PCA methods depend on the type and
distribution of the provided input data. Here, most of the features selected by both algorithms came
from the measured signal itself rather than from the detected spikes, i.e. minimum (Min), median,
standard deviation (STD), root mean square of the recorded signal (RMS Signal), and average noise
level of the measured signal (ANL).
Table 4.1: List of attributes selected by the Correlation Feature Selection (CFS) and PCA methods. The dimension
reduction methods were applied on the Min-Max and Zero-Mean normalized datasets of day 15.02.2010 of the in-vivo
recordings. Complete descriptions of the attributes are given in appendix A.
CFS on Min-Max and Zero-Mean Normalized Data | PCA on Min-Max and Zero-Mean Normalized Data
Min | Min
Median | Mean
STD | STD
RMS Signal | RMS Signal
ANL | SNR tf 3.5 sr 2ms
SNR tf 3.5 sr 1ms | SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms | MFR 500ms tf 3.5 sr 1ms
We applied the Correlation Feature Selection (CFS) method using the best-first search heuristic in
forward direction. Table 4.1 shows that the feature subsets selected for the Min-Max and Zero-Mean
normalized datasets are the same. The results of the Principal Component Analysis (PCA) method on
both normalized datasets are the same as well. We used the PCA method with a ranked search strategy
and a threshold factor equal to -1.797. In figures 4.9 and 4.10, we see how these selected attributes are
distributed. Each method gives us seven selected attributes, which means a dimension reduction from
forty-three to seven. The first four attributes selected by both methods are similar, i.e. the minimum
value, the mean value (the median in the case of CFS), the standard deviation (STD), and the root
mean square (RMS) of the 10 s measured signal. But the remaining three attributes are different. The
CFS method selected the average noise level (ANL) of 10 s of measured signal, the signal to noise ratio
(SNR) with threshold factor 3.5 and 1 ms spike refractory time, and the average firing rate (AFR) for a
time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time. The PCA method
selected the signal to noise ratio (SNR) with threshold factor 3.5 and 2 ms spike refractory time, the
signal to noise ratio (SNR) with threshold factor 5 and 2 ms spike refractory time, and the maximum
firing rate (MFR) for a time window of 500 ms with threshold factor 3.5 and 1 ms spike refractory time.
Figure 4.9: Scatter diagram of the attributes selected by the CFS method. The X-axis is the sample index, 2199 samples
per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes
are available in appendix A.
Figure 4.10: Scatter diagram of the attributes selected by the PCA method. The X-axis is the sample index, 2199 samples
per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes
are available in appendix A.
From figures 4.9 and 4.10, it is visible that the attributes selected by the Correlation Feature Selection
(CFS) method are more suitable for the classification problem than those selected with the Principal
Component Analysis (PCA) method. For instance, the attribute AFR 20ms tf 3.5 sr 1ms (average firing
rate for a time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by
the CFS method, gives a more separable distribution relative to the values of the other selected
attributes, whereas the attribute MFR 500ms tf 3.5 sr 1ms (maximum firing rate for a time window of
500 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by the PCA method, does not.
The dimension reduction methods aim to remove redundant and unrelated attributes from the given
feature vectors. In figure 4.11, we can see the scatter diagram for some of the attributes which are
ignored by both the Principal Component Analysis (PCA) and Correlation Feature Selection (CFS)
attribute selection methods. The graph in figure 4.11 indicates that these features have almost the same
value distributions; therefore, they do not contribute much to the classification problem. This
phenomenon may occur due to high background noise and artifacts in the measured signals, the absence
of neural activity near a particular electrode connected to a specific channel, etc.
Figure 4.11: Scatter diagram of the attributes not selected by either the PCA or the CFS method. The X-axis is the
sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete
descriptions of the attributes are available in appendix A.
4.4 Training And Validation Of The Classifiers In Conjunction
With Pre-Processing Methods
Data normalization is a highly recommended pre-processing step for algorithms like the Support Vector
Machine (SVM) and K-Nearest Neighbour (K-NN). Therefore, in this section, we demonstrate the effect
of the dimension reduction methods on the performance of the classification algorithms. We applied our
candidate learning algorithms on both the Zero-Mean and Min-Max normalized datasets. The
validation results show how well we can identify each channel given its set of features. As we mentioned
earlier, some classification algorithms, i.e. the Support Vector Machine (SVM), need parameter tuning
in order to obtain higher accuracy. We used grid search as the hyper-parameter tuning method to deal
with this issue. It is apparent from tables 4.2 and 4.3 that the results obtained with the Correlation
Feature Selection (CFS) attribute selection generalize better, although there were noisy measurements
among some of the recording sessions. In training these algorithms we used 10-fold cross-validation in
order to avoid over-fitting and to get a notion of which of these algorithms serves the goal of our
project best.
Table 4.2: Training results for the classification algorithms on Min-Max normalized data with the Correlation Feature
Selection (CFS) and Principal Component Analysis (PCA) attribute selection methods, and with all features. The dataset
is provided from all recording sessions of day 15.02.2010. The γ parameter for the Support Vector Machine (SVM) has two
different values: 100 for the low-dimensional datasets and 10 for the dataset with all features. It contains 152 class labels,
here electrode numbers on the probe, and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.
Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.93% | 39.10% | 63.48% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 65.71% | 44.29% | 34.10% | C = 250007 and γ = 100 and 10
Random Forest | 68.25% | 44.29% | 63.98% | number of trees = 10
Table 4.3: Training results for the classification algorithms on Zero-Mean normalized data with the Correlation Feature
Selection (CFS) and Principal Component Analysis (PCA) attribute selection methods, and with all features. The dataset
is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe,
and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.
Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.75% | 39.10% | 63.61% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 67.75% | 47.88% | 37.56% | C = 250007 and γ = 1.0
Random Forest | 68.44% | 43.97% | 63.57% | number of trees = 10
The two graphs in figures 4.12 and 4.13 show the confusion matrices for the best and worst results from
tables 4.2 and 4.3. Comparing these two figures gives us a notion of how well each algorithm predicts
the class labels, and also on which class labels we had the most false predictions. Figure 4.12 illustrates
the confusion matrix for the Random Forest algorithm applied on the Zero-Mean normalized dataset in
conjunction with the Correlation Feature Selection (CFS) method. The Random Forest algorithm has
the highest accuracy, i.e. 68.44%, relative to the other algorithms in table 4.3. On the other hand, the
Support Vector Machine (SVM) algorithm applied on the Zero-Mean normalized dataset using all
features has the worst accuracy, i.e. 37.56%, relative to the others. Therefore, the confusion matrix in
figure 4.12 has a more visible diagonal with high values, which means class labels are predicted as their
original class. In figure 4.13, on the other hand, we can see the classes which were classified wrongly;
hence there are more bright regions on both sides of the matrix diagonal.
Figure 4.12: Confusion matrix for the Random Forest algorithm applied on the Zero-Mean normalized data in conjunction
with the Correlation Feature Selection (CFS) method. The dataset is provided from all recording sessions of day 15.02.2010.
It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained
using 10-fold cross-validation.
Figure 4.13: Confusion matrix for the Support Vector Machine (SVM) algorithm applied on the Zero-Mean normalized
data using all features. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels,
here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained using 10-fold cross-validation.
From the results presented in tables 4.2 and 4.3, we can conclude that all four supervised learning algorithms in conjunction with the Correlation Feature Selection (CFS) method perform at virtually the same level on both the Zero-Mean and the Min-Max normalized data. Therefore, in the following section, the combination of the CFS method and the four classification algorithms is tried on both Min-Max and Zero-Mean normalized data in order to find out how well neural activities can be tracked between different recording sessions.
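The comparison above was run in WEKA; as an illustration only, a minimal scikit-learn sketch of the same protocol (four classifiers, 10-fold cross-validation) is given below. The feature matrix `X` and labels `y` are synthetic placeholders, and scikit-learn's entropy-based `DecisionTreeClassifier` is only an approximation of ID3.

```python
# Hedged sketch: four classifiers from this chapter, 10-fold cross-validation.
# X and y are random placeholders, not the thesis dataset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))      # placeholder for the computed features
y = np.repeat(np.arange(8), 15)     # 8 channels, 15 samples per class

classifiers = {
    "ID3-like tree": DecisionTreeClassifier(criterion="entropy"),
    "3-NN": KNeighborsClassifier(n_neighbors=3, weights="distance"),
    "SVM": SVC(kernel="rbf", C=250007, gamma=0.01),
    "Random Forest": RandomForestClassifier(n_estimators=10),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)   # stratified 10-fold accuracy
    print(f"{name}: {scores.mean():.2%}")
```

With random features the accuracies hover near chance; on real, normalized features the ranking reported in the tables above would emerge.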
4.5 Tracking Down Neural Activities Using Supervised Learning Algorithms
In this section, we trained and tested the previously selected classification algorithms using the specific dataset introduced earlier in this chapter, i.e. two consecutive recording sessions from day 12.02.2010 of the in-vivo experiment. Each recording contains eight recording channels connected to eight electrodes. Hence, we have eight class labels and fifteen samples per class. The aim of the current step was to first train the learning algorithms in conjunction with the Correlation Feature Selection (CFS) method using the dataset from the first recording session, and then to test the trained algorithms on the dataset from the second recording session. The prediction accuracy in the test phase gives a notion of how well neural activities are traceable between two recording sessions. The difference from the earlier experiment in this chapter is that previously all computed features belonged to the same recording session, whereas in the current experiment the datasets belong to two different consecutive recording sessions.
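The cross-session protocol just described can be sketched as follows. This is a minimal illustration, not the thesis pipeline: the two sessions are simulated as draws from the same per-channel distributions, and the SVM parameters mirror those reported later in table 4.5.

```python
# Hedged sketch of the evaluation protocol: fit on the first recording
# session, predict on the second, report accuracy. All data is synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
y = np.repeat(np.arange(8), 15)                     # 8 channels x 15 samples
centers = rng.normal(scale=3.0, size=(8, 10))       # per-channel feature means
X_sess1 = centers[y] + rng.normal(size=(120, 10))   # first session (training)
X_sess2 = centers[y] + rng.normal(size=(120, 10))   # second session (test)

clf = SVC(kernel="rbf", C=250007, gamma=0.01).fit(X_sess1, y)
acc = accuracy_score(y, clf.predict(X_sess2))
print(f"cross-session accuracy: {acc:.2%}")
```

If the channels' feature distributions are stable across sessions, as the thesis argues, the cross-session accuracy stays high; a probe shift would instead show up as systematic misclassification to adjacent channels.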
In the pre-processing step, we first applied Min-Max and Zero-Mean normalization based on the same principle mentioned earlier in this chapter, and then applied the Correlation Feature Selection (CFS) method to reduce the dimension of the feature vectors. Table 4.4 presents the attributes selected by the CFS method. Compared to the attribute subset in table 4.1, there are four more attributes in the new subset, and the types of the selected attributes differ. In the subset for this smaller dataset, more attributes relate to detected spikes and their firing rates. This supports the observation that the measured signals are less noisy and that more neural activity is present in these two recording sessions.
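CFS scores a feature subset by its merit, which rewards high feature-class correlation and penalizes feature-feature redundancy. The sketch below implements the merit-based greedy forward search; note that Hall's CFS uses symmetrical uncertainty on discretized data, whereas this simplification substitutes absolute Pearson correlation, which is an assumption.

```python
# Simplified correlation-based feature selection in the spirit of CFS.
# Merit(S) = k * avg(|r_cf|) / sqrt(k + k*(k-1)*avg(|r_ff|)) for a subset S
# of k features; features are added greedily while the merit improves.
import numpy as np

def cfs_like_forward_selection(X, y, max_features=10):
    n_features = X.shape[1]
    r_cf = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    r_ff = np.abs(np.corrcoef(X, rowvar=False))          # feature-feature |corr|
    selected, remaining = [], list(range(n_features))
    best_merit = -np.inf
    while remaining and len(selected) < max_features:
        merits = []
        for j in remaining:
            subset = selected + [j]
            k = len(subset)
            avg_cf = r_cf[subset].mean()
            avg_ff = ((r_ff[np.ix_(subset, subset)].sum() - k) / (k * (k - 1))
                      if k > 1 else 0.0)
            merits.append(k * avg_cf / np.sqrt(k + k * (k - 1) * avg_ff))
        best = int(np.argmax(merits))
        if merits[best] <= best_merit:
            break                                        # merit stopped improving
        best_merit = merits[best]
        selected.append(remaining.pop(best))
    return selected
```

On the thesis data, such a search yields subsets like the one in table 4.4: features strongly correlated with the channel label but not redundant with each other.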
Table 4.4: List of attributes selected by the Correlation Feature Selection (CFS) method. The dimension reduction was applied to the Min-Max and Zero-Mean normalized datasets of day 12.02.2010 of the in-vivo recording. Complete descriptions of the attributes are given in appendix A.
CFS on Min-Max and Zero-Mean Normalized Data
Median
STD
RMS Signal
SNR tf 3.5 sr 1ms
SNR tf 3.5 sr 2ms
SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms
AFR 20ms tf 3.5 sr 2ms
AFR 20ms tf 5 sr 2ms
MFR 500ms tf 3.5 sr 1ms
Figures 4.14 and 4.15 depict the data distribution of the attributes selected by the Correlation Feature Selection (CFS) method from both recording sessions for the Zero-Mean normalized data. The attributes were selected based on the result of applying the CFS method to the first recording session. Comparing figure 4.14 to figure 4.15, we see that they have roughly the same distribution. This similar feature distribution indicates that the same sort of activity was present during both recording sessions.
Figure 4.14: Scatter diagram of the attributes selected by the CFS method from the first session of day 12.02.2010 of the in-vivo recording. The X-axis is the sample index (120 samples altogether). The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.
Figure 4.15: Scatter diagram of the attributes selected by the CFS method from the second session of day 12.02.2010 of the in-vivo recording. The X-axis is the sample index (120 samples altogether). The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.
4.5.1 Prediction Results For Trained Models
The prediction results for the four supervised learning algorithms are presented in table 4.5. The obtained results show that the Support Vector Machine and K-Nearest Neighbour algorithms perform better than the two other algorithms on both normalized datasets. We can also see that, after normalization and attribute selection, the simple K-Nearest Neighbour algorithm reaches a prediction accuracy as good as that of a sophisticated algorithm like the Support Vector Machine.
Table 4.5: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the Correlation Feature Selection (CFS) method. The dataset comes from the second recording session of the day 12.02.2010. It contains 8 class labels (electrode numbers on the probe) and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the subset of attributes presented in table 4.4.
Algorithm        Accuracy on Zero-Mean Dataset   Accuracy on Min-Max Dataset   Parameter Specification
ID3              73.33%                          73.33%
K-NN             89.16%                          90%                           K = 3 and inverse distance weighting
SVM              90%                             90%                           C = 250007 and γ = 0.01
Random Forest    83.33%                          81.66%                        number of trees = 10
We performed the same experiment as above on the same dataset, but with the complete set of features. Table 4.6 presents the prediction results for our selected classifiers on the Min-Max and Zero-Mean normalized data. Comparing the results in tables 4.5 and 4.6 is revealing in several ways. First, it shows that the tree-based algorithms are almost invariant to data normalization. Second, the performance of all classifiers except Iterative Dichotomiser 3 (ID3) was boosted by the Correlation Feature Selection (CFS) method. Since ID3 uses a pruning mechanism to remove unrelated attributes at tree-construction time, its results also remain invariant to the CFS method. However, CFS feature selection helps the other classifiers deal with redundant and unrelated attributes far better than a decision tree does. As is evident, a simple algorithm like K-Nearest Neighbour outperforms ID3 and the Random Forest algorithm in this case and reaches the same performance as the Support Vector Machine.
Table 4.6: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the complete set of features. The dataset comes from the second recording session of the day 12.02.2010. It contains 8 class labels (electrode numbers on the probe) and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the complete set of attributes presented in appendix A.
Algorithm        Accuracy on Zero-Mean Dataset   Accuracy on Min-Max Dataset   Parameter Specification
ID3              73.33%                          73.33%
K-NN             81.66%                          85%                           K = 3 and inverse distance weighting
SVM              86.66%                          83%                           C = 250007 and γ = 0.001
Random Forest    77.33%                          74.16%                        number of trees = 10
The following two graphs, figures 4.16 and 4.17, show the precision-recall analysis for these four learning algorithms on both the Min-Max and Zero-Mean normalized datasets. Both illustrations show that the Support Vector Machine and K-Nearest Neighbour outperform the two other algorithms and generally have higher precision values. Precision is the fraction of the predicted instances that belong to the original class, and recall is the fraction of the relevant instances that are correctly predicted. Higher precision and recall together indicate a better, more accurate prediction.
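The per-class precision and recall values plotted in these figures can be read directly off a confusion matrix, as the following sketch shows. The matrix values are illustrative, not the thesis results; here the convention is rows = actual class, columns = predicted class (the figures in this chapter use the transposed orientation).

```python
# Hedged sketch: per-class precision and recall from a confusion matrix,
# matching the definitions in the text above. Example values are made up.
import numpy as np

def per_class_precision_recall(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                                 # correctly predicted counts
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums: predicted totals
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums: actual totals
    return precision, recall

cm = np.array([[14, 1, 0],
               [2, 12, 1],
               [0, 3, 12]])
p, r = per_class_precision_recall(cm)
print(np.round(p, 2), np.round(r, 2))
```

Each (precision, recall) pair corresponds to one point per class label in the scatter plots of figures 4.16 and 4.17.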
Figure 4.16: The precision-recall analysis computed from the Min-Max normalized data. Each sample shows the precision-recall pair for an individual class label. For each algorithm, we expect to see eight samples, one per class; since some of the values coincide, they overlay each other and are not fully visible in the graph. The Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection.
Figure 4.17: The precision-recall analysis computed from the Zero-Mean normalized data. Each sample shows the precision-recall pair for an individual class label. For each algorithm, we expect to see eight samples, one per class; since some of the values coincide, they overlay each other and are not fully visible in the graph. The Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection.
4.5.2 Observation on Confusion Matrices
Figures 4.18 and 4.19 depict the confusion matrices for the results of the classification algorithms. The main diagonal of each matrix indicates the accuracy of the classifier and hence the quality of the channel identification: the higher the values on the diagonal, the more accurate the classifier's prediction. Our experiment was designed so that we first built the learning models using features computed from the first recording session, and then tested them on features computed from the second recording session. The channel configuration, i.e. the electrodes connected to the channels, remained the same in both recording sessions. Since the probe position was not deliberately moved, we expected the activity of each channel to be predicted as its original channel. In other words, we tried to define a ground truth for each channel based on its measured signal, and later use this information to identify the activities measured in other recording sessions.
(a) Random Forest (b) Support Vector Machine
(c) ID3 (d) 3-NN
Figure 4.18: The confusion matrices for (a) Random Forest, (b) Support Vector Machine, (c) ID3, and (d) K-Nearest Neighbour on the Min-Max normalized dataset. The X-axis shows the original class labels, i.e. the electrode indexes on the probe shank connected to their specific channels; the Y-axis has the same values. The value of each cell indicates the number of instances of the class on the X-axis predicted as the class on the Y-axis. Therefore, high values on the diagonal show that each class was predicted as its original class label.
(a) Random Forest (b) Support Vector Machine
(c) ID3 (d) 3-NN
Figure 4.19: The confusion matrices for (a) Random Forest, (b) Support Vector Machine, (c) ID3, and (d) K-Nearest Neighbour on the Zero-Mean normalized dataset. The X-axis shows the original class labels, i.e. the electrode indexes on the probe shank connected to their specific channels; the Y-axis has the same values. The value of each cell indicates the number of instances of the class on the X-axis predicted as the class on the Y-axis. Therefore, high values on the diagonal show that each class was predicted as its original class label.
Looking at the confusion matrices in figures 4.18 and 4.19, there are channels that can be identified quite accurately by all classifiers, e.g. channels 5 and 1 connected to electrodes 45 and 46. On the other hand, there are electrodes that most of the algorithms classify to their adjacent channels, e.g. channels 3 and 4 connected to electrodes 144 and 143. Figure 4.20 shows the distribution of the SNR value for these electrodes in the test phase, with a 2 ms refractory time and a threshold factor of 5. As depicted, channels 1 and 5 have relatively higher SNR values than channels 3 and 4. This indicates that the signals measured by the first two channels are of higher quality and contain a lower noise level. Therefore, their computed features are more separable and contribute better to the channel identification task.
Figure 4.20: The SNR value with 2 ms refractory time and threshold factor of 5 for channels 5, 1, 3, and 4, connected to electrodes 45, 46, 143, and 144. The data was normalized using the Zero-Mean method. The Y-axis is the number of samples (15 samples altogether) and the X-axis is the value of each sample.
In confusion matrices, the higher the values on the diagonal, the more accurate the classification result. Therefore, in the test session, high values on the diagonal mean that the activities measured by the channels in the test dataset have the same feature distribution as their measured signals in the training dataset. Although some channels, due to the lower quality of their measurements, show similarities mostly to their adjacent channels, there are channels in the test session, i.e. those connected to electrodes 43, 44, 45, 46, and 141, which show high prediction accuracy with respect to their original class labels from the training session. This observation supports two ideas mentioned earlier in the problem statement section. First, by identifying each channel using the signal measured in a former recording session, we can define a ground truth for each channel and choose the next electrode configuration. This means that if we lose signal quality in a particular recording session, we can select new channel configurations from those that have shown better measurements by consulting this ground truth. Second, we can support the argument that there was no drift in the probe position between the two consecutive recording sessions. If there had been drift between the two recording sessions, we would expect channels in the test session to be classified to channels adjacent to their training-session counterparts, in the direction of the shift: with upward drift they would be classified to the channels located below them on the probe shank, and with downward drift the other way around.
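The drift argument above can be turned into a simple heuristic on the confusion matrix: a one-electrode shift would concentrate mass on a super- or sub-diagonal instead of the main diagonal. The sketch below compares these three bands; the band-to-direction mapping and the example matrices are assumptions for illustration.

```python
# Hedged sketch: diagnose a one-channel probe shift from the confusion
# matrix by comparing the main diagonal with its two neighbouring bands.
import numpy as np

def diagnose_drift(cm):
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    main = np.trace(cm) / total              # correctly identified fraction
    up = np.trace(cm, offset=1) / total      # predicted one channel "up"
    down = np.trace(cm, offset=-1) / total   # predicted one channel "down"
    if main >= max(up, down):
        return "no drift"
    return "upward drift" if up > down else "downward drift"

# Stable sessions: mass on the main diagonal.
print(diagnose_drift(np.eye(8) * 15))                       # -> no drift
# Shifted sessions: mass on an off-diagonal band.
print(diagnose_drift(np.roll(np.eye(8) * 15, 1, axis=1)))   # -> upward drift
```

This matches the qualitative reading of figures 4.18 and 4.19: dominant diagonals across all four classifiers argue against a probe shift between the two sessions.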
CHAPTER 5
SUMMARY
5.1 Conclusion
In this work, we dealt with the problem of channel identification in in-vivo recordings using the "NeuroProbe". Solving this task helps us to efficiently select electrodes from a high-density microelectrode array and contributes to the Electronic Depth Control (EDC) problem. It provides a ground truth for each electrode on the probe shank. Using this identification, we can choose channel configurations that have shown high-quality signals and more detectable activities. Furthermore, it becomes possible to detect an unintended drift in the position of the EDC probe during long-term in-vivo recording and between different recording sessions.
This work comprised four steps. In the first step, given a dataset recorded by the EDC probe, we applied an adaptive-threshold spike detection algorithm and computed features for each recording channel. We computed and used the average noise level (ANL) as an additional feature relative to former approaches in this field. This feature provides extra information when the quality of the measured signals is overwhelmed by high background noise activity.
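Adaptive-threshold spike detection of the kind used in the first step can be sketched as follows. The robust noise estimate sigma = median(|x|)/0.6745 is a common choice in the spike-detection literature; whether the thesis uses exactly this estimator is an assumption, as are the parameter names `tf` (threshold factor) and `sr_ms` (spike refractory window).

```python
# Hedged sketch of adaptive-threshold spike detection: threshold = tf * sigma,
# where sigma is a robust noise estimate, with a refractory window to avoid
# counting one spike twice.
import numpy as np

def detect_spikes(signal, fs, tf=3.5, sr_ms=2.0):
    sigma = np.median(np.abs(signal)) / 0.6745   # robust noise-level estimate
    threshold = tf * sigma
    refractory = int(sr_ms * 1e-3 * fs)          # refractory window in samples
    spikes, last = [], -refractory
    for i, v in enumerate(np.abs(signal)):
        if v > threshold and i - last >= refractory:
            spikes.append(i)                     # threshold crossing accepted
            last = i
    return np.array(spikes)
```

The detected spike indices are then the input to the SNR and firing-rate features listed in appendix A, computed for each combination of `tf` and `sr_ms`.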
In the second step, for data pre-processing, we applied Min-Max and Zero-Mean global normalization in order to give all computed features the same scale and a better distribution. In addition, we applied correlation feature selection (CFS) and principal component analysis (PCA) to remove irrelevant and redundant features and to reduce the dimension of the feature vectors. The dimension reduction and normalization boosted the performance of the classifiers. The result of the attribute selection also gave a notion of the quality of the measured signals: if the measurements were dominated by noise, the selected features were those computed from the signals themselves rather than from detected spikes, whereas in the presence of neural activity, attributes related to detected spikes were selected.
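The two normalizations named above can be sketched as column-wise transforms over the feature matrix. This is a minimal sketch: reading "global normalization" as per-feature scaling, and the guard against zero-variance columns, are assumptions.

```python
# Hedged sketch of the two normalizations: Min-Max to [0, 1] and
# Zero-Mean (z-score) scaling, applied per feature column.
import numpy as np

def min_max(X):
    span = X.max(axis=0) - X.min(axis=0)
    return (X - X.min(axis=0)) / np.where(span == 0, 1, span)

def zero_mean(X):
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1, std)
```

Either transform puts features with very different units (e.g. RMS amplitude versus firing rate) on a comparable scale before feature selection and classification.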
In the third step, we trained and validated various supervised machine learning algorithms, i.e. K-Nearest Neighbour, Iterative Dichotomiser 3 (ID3), Support Vector Machine, and Random Forest, in conjunction with the pre-processing methods, to identify each channel. Furthermore, we applied grid search as a simple case of hyper-parameter optimization to increase the accuracy of the Support Vector Machine (SVM). We trained the candidate algorithms with features computed from all recording sessions of the day 15.02.2010, containing measured signals of 152 different electrodes. The classification results showed that each channel can be identified with up to 68% accuracy using the Random Forest algorithm combined with the Correlation Feature Selection (CFS) method. They also showed that, after normalization, other classifiers such as the Support Vector Machine (SVM) and K-Nearest Neighbour (K-NN) reach accuracies above 62%. This suggests that channel identification is possible with a combination of normalization, the CFS method, and a simple classifier like K-NN.
In the fourth step, to track the neural activities between two consecutive recording sessions with the same channel configuration, we trained and tested the combination of the Correlation Feature Selection (CFS) method and both normalization techniques with the four supervised learning algorithms. We were able to reach almost 90% accuracy using the Support Vector Machine (SVM) algorithm. Interestingly, the simple K-Nearest Neighbour (K-NN) algorithm performed at the same level as the SVM. It should be mentioned that, although ID3 and Random Forest had high accuracy in the training phase, they fell behind the two other algorithms in the test phase. The observations on the confusion matrices, the precision-recall analysis, and the feature distributions showed that there was no drift in the probe position between the two sessions. In addition, they showed that the neural activity between different recording sessions is traceable using our approach.
5.2 Future Works
• To provide stronger support for our approach to channel identification and to detecting drift in the probe position during in-vivo recording, we need better datasets and recordings; then we could study the neural activity more thoroughly. A dataset containing long-term recordings with the same channel configuration would give us the chance to train our learning algorithms better and to observe the test results in order to assess the possibility of detecting drift in the probe position.
• Regarding the identification of channels in a particular recording session, we could try to identify not only single channels but also groups of channels. In this case, we could consider a tetrode (four channels) or two adjacent channels, train our learning algorithms on their extracted features, and try to identify their activity in the test session. To do this, we would need to find out which pairs of channels record from the same neuron simultaneously, using signal-similarity measurement algorithms.
APPENDIX A
LIST OF FEATURES AND THEIR DESCRIPTIONS
A.1 All Features List
Feature Name Description
Min The minimum peak value of the measured signal.
Max The maximum peak value of the measured signal.
Mean The mean value of the measured signal.
Median The Median value of the measured signal.
STD The standard deviation (STD) of the measured signal.
RMS The root mean square (RMS) of the measured signal.
ANL The average noise level (ANL) of the measured signal. For computing noise level the
time window of 50 ms is used.
SNR tf 3.5 sr 2 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
3.5 and the spike refractory (sr) time window is 2 ms.
MFR 20 ms tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 20 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 2 ms.
MFR 100 ms tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 100 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
MFR 500 ms tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 500 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
MFR 10 s tf 3.5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 2 ms.
AFR 10 s tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 2 ms.
SNR tf 3.5 sr 1 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
3.5 and the spike refractory (sr) time window is 1 ms.
MFR 20 ms tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 20 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 1 ms.
MFR 100 ms tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 100 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
MFR 500 ms tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 500 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
MFR 10 s tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr)
time window is 1 ms.
AFR 10 s tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time
window is 1 ms.
SNR tf 5 sr 2 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
5 and the spike refractory (sr) time window is 2 ms.
MFR 20 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 20 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 2 ms.
MFR 100 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 100 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
MFR 500 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 500 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
MFR 10 s tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 2 ms.
AFR 10 s tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 2 ms.
SNR tf 5 sr 1 ms The signal to noise ratio (SNR) of the measured signal. The threshold factor (tf) is
5 and the spike refractory (sr) time window is 1 ms.
MFR 20 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 20 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 20 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 20 ms. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 1 ms.
MFR 100 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 100 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 100 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
MFR 500 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 500 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 500 ms.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
MFR 10 s tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s.
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr)
time window is 1 ms.
AFR 10 s tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For
the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time
window is 1 ms.
Table A.1: List of all computed features and their descriptions. Note that all the attributes are computed
from one segment of each recording session which contains 10 s of measured signals.
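As an illustration of the firing-rate features (MFR/AFR) in Table A.1, a minimal sketch is given below. It assumes non-overlapping windows and spike indices already produced by a detector; the thesis may use sliding windows, so the windowing detail is an assumption.

```python
# Hedged sketch of the MFR/AFR features: count spikes in fixed windows,
# convert counts to rates in Hz, and take the maximum (MFR) and mean (AFR).
import numpy as np

def firing_rates(spike_idx, n_samples, fs, window_ms):
    win = int(window_ms * 1e-3 * fs)                   # window length in samples
    counts = []
    for start in range(0, n_samples - win + 1, win):   # non-overlapping windows
        counts.append(np.sum((spike_idx >= start) & (spike_idx < start + win)))
    rates = np.array(counts) / (window_ms * 1e-3)      # spikes per second
    return rates.max(), rates.mean()                   # (MFR, AFR)
```

For example, with fs = 1000 Hz, a 100 ms window, and spikes at samples 10, 20, and 450 within one second of signal, the busiest window holds two spikes (20 Hz) while the average over all windows is 3 Hz.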
BIBLIOGRAPHY
[1] Miguel AL Nicolelis. Methods for Neural Ensemble Recordings. Boca Raton (FL): CRC Press., Upper Saddle River,
NJ, USA, 2008.
[2] Herc p.Neves, Tom Torfs, Refet F.Yazicioglu, Junaid Aslam, Arno A.Aarts, Patrick Merken, Patrick Ruther, and
Chris Van Hoof. The neuroprobes project:a concept for electronic depth control. Annual International Conference of
the IEEE Engineering in Medicine and Biology Society EMBS 2008, 261:1857–1857, 2008.
[3] K.Seidl, H.Herwik, Y.Nurcahyo, T.Torfs, M.Keller, M.Schuettler, H.Neves, T.Stieglitz, O.Paul, and P Ruther. Cmos-
based high-density silicon micro-probe array for electronic depth control in neural recording. 22nd Int. MEMS Conf,
261:232–5, 2009.
[4] J. Ji and K.D. Wise. An implantable cmos circuit interface for multiplexed microelectrode recording arrays. Solid-State
Circuits, IEEE Journal of, 27(3):433–443, Mar 1992.
[5] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. The weka data
mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18, 2009.
[6] M.Abeles and M.Goldstein. Multispike train analysis. IEEE 1977, 65:762–773, 1977.
[7] I.Bankman, K.Johnson, and W.Schneider. Optimal detection classification and superposition resolution in neural
waveform recordings. IEEE Trans Biomed Eng, 40:836–841, 1993.
[8] Sahani M.Latent. variable models for neural data analysis. PhD Dissertation Pasadena, 1999.
[9] KH.Kim and SJ.Kim. Neural spike sorting under nearly 0-db signal-to-noise ratio using nonlinear energy operator
and artificial neural network classifier. IEEE Trans Biomed Eng, 47:1406–1411, 2000.
[10] S.Mukhopahdyay and GC.Ray. A new interpretation of nonlinear energy operator and its efficacy in spike detection.
Trans Biomed Eng, 45:180–187, 1998.
[11] L.Traver, C.Tarin, P.Marti, and N.Cardona. Adaptive threshold neural spike detection by noise-envelope tracking.
Electron, 43:1333–1335, 2007.
[12] RJ.Brychta, S.Tuntrakool, and M.Appalsamy et al. Wavelet methods for spike detection in mouse renal sympathetic
nerve activity. IEEE Trans Biomed Eng, 54:82–93, 2007.
[13] S.Kim and K.Kim. A wavelet-based method for action potential detection from extracellular neural signal recording
with low signal-to-noise ratio. IEEE Trans Biomed Eng., 50:999–1011, 2003.
[14] Z.Nenadic and JW.Burdick. Spike detection using the continuous wavelet transform. IEEE Trans Biomed Eng.,
20:74–87, 2005.
[15] I.Obeid and PD.Wolf. Evaluation of spike detection algorithms for a brain-machine interface application. IEEE Trans
Biomed., 51:905–911, 2004.
[16] Detection of Active Brain Regions for Automatic Electrode Selection Using a Machine Learning Approach. Bachelor
thesis. Master’s thesis, 2010.
[17] George W.Fraser and Andrew B.Schwartz. Recording from the same neurons chronically in motor cortex. J Neuro-
physiol., 107:1970–1978, 2012.
[18] Edwin M. Maynard, Craig T. Nordhausen, and Richard A. Normann. The utah intracortical electrode array: A
recording structure for potential brain-computer interfaces. Electroencephalography and Clinical Neurophysiology,
102(3):228 – 239, 1997.
[19] Ali Shawkat and Kate A.Smith-Miles. Improved support vector machine generalization using normalized input space.
Advances in Artificial Intelligence., 4304:362–371., 2006.
[20] Teunis van Beelen. GCC: GNU EDFbrowser a free, opensource, multiplatform, universal viewer and toolbox in-
tended for, but not limited to, timeseries storage files like eeg, emg, ecg, bioimpedance, etc. http://www.teuniz.net/
edfbrowser/, 2010–2013.
[21] Isabelle Guyon and Andr´e Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157–
1182, March 2003.
[22] Mark A. Hall and Geoffrey Holmes. Benchmarking attribute selection techniques for discrete class data mining. IEEE
Trans. on Knowl. and Data Eng., 15(6):1437–1447, November 2003.
36
BIBLIOGRAPHY BIBLIOGRAPHY
[23] M. Dash and H. Liu. Feature selection for classification. Intelligent Data Analysis, 1:131–156, 1997.
[24] Fengxi Song, Zhongwei Guo, and Dayong Mei. Feature selection using principal component analysis. In System Science,
Engineering Design and Manufacturing Informatization (ICSEM), 2010 International Conference on, volume 1, pages
27–30, Nov 2010.
[25] Mark A Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.
[26] Mark A. Hall and Lloyd A. Smith. Feature subset selection: a correlation based filter approach. In 1997 International
Conference on Neural Information Processing and Intelligent Information Systems, pages 855–858. Springer, 1997.
[27] Songbo Tan. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Systems with Applications,
28(4):667 – 671, 2005.
[28] J.R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[29] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[30] Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273–297, 1995.
[31] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
[32] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. Technical report, 2012.
[33] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. 2010.
[34] Karsten Seidl, Tom Torfs, Patrick A. De Mazière, Gert Van Dijck, Richard Csercsa, Balazs Dombovari, Yohanes Nurcahyo, Hernando Ramirez, Marc M. Van Hulle, Guy A. Orban, et al. Control and data acquisition software for high-density CMOS-based microprobe arrays implementing electronic depth control. Biomedizinische Technik/Biomedical Engineering, 55(3):183–191, 2010.
[35] Teunis van Beelen. EDFlib: a programming library for C/C++ to read/write EDF+/BDF+ files. http://www.teuniz.net/edflib/index.html, 2010–2013.

ACKNOWLEDGMENT

I have taken efforts in this project. However, it would not have been possible without the kind support and help of many individuals and organizations. I would like to extend my sincere thanks to all of them. I am highly indebted to Prof. Dr. Wolfram Burgard for his guidance and supervision, as well as for providing the necessary information regarding the project. I would like to express my gratitude towards Prof. Dr. Oliver Paul for his kind co-operation and encouragement, which helped me in the completion of this project. Furthermore, I would like to thank Dr. Barbara Frank for the useful comments, remarks, and engagement throughout the learning process of this interdisciplinary project. My thanks and appreciation also go to Dr. Patrick Ruther for his help in developing the project, and to the EDC++ project members who have willingly helped me out with their abilities.
CONTENTS

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Outline
2 Related Works
  2.1 Spike Detection
  2.2 Detecting active brain region
  2.3 Recording from same neuron
3 Channel Identification
  3.1 Approach
  3.2 Spike detection and feature computation
  3.3 Data pre-processing
    3.3.1 Normalization techniques
    3.3.2 Attribute selection
  3.4 Machine learning for channel identification
    3.4.1 K-Nearest Neighbour (K-NN)
    3.4.2 Iterative Dichotomiser 3 (ID3)
    3.4.3 Random forest
    3.4.4 Support Vector Machine (SVM) Algorithm
    3.4.5 Cross-validation
    3.4.6 Hyper-parameter optimization
  3.5 Tracking down neural activities
4 Experiments and Results
  4.1 Dataset
  4.2 Spikes and features
  4.3 Data normalization and attribute selection
  4.4 Training and validation of classifiers
  4.5 Tracking down neural activity
    4.5.1 Testing trained models
    4.5.2 Observation
5 Summary
  5.1 Conclusion
  5.2 Future work
A Feature List
  A.1 Feature list
Bibliography
CHAPTER 1  INTRODUCTION

1.1 Motivation

Understanding brain functionality and the complex interactions of large neural networks with huge numbers of neurons is one of the most challenging research fields in neuroscience. The development of appropriate tools opens new perspectives in research and application, e.g. in neural prostheses, as well as in the diagnosis and therapy of neurodegenerative diseases including Alzheimer's, Parkinson's, and epilepsy. Recordings of single-neuron activity within an ensemble of neurons are required for a basic understanding of neural processes [1]. With this aim, a new high-density electrode array for recording with high spatial resolution was introduced within the European project NeuroProbes and successfully tested for the first time in in-vivo experiments [2, 3]. These probes contain 188 electrodes configured in 2 rows. CMOS multiplexing units integrated directly on the probe shafts enable a drastic increase in the number and density of electrodes in NeuroProbes compared to existing devices [4]. The density of such arrays makes it possible to switch between the electrodes and achieve close proximity between the neuron of interest and the recording electrode. In this context, the concept of switching between individual microelectrodes of the same shaft, without the need to reposition either the shaft or the entire probe, is called electronic depth control (EDC). EDC allows us to switch between electrodes, scan their signals along the probe shank, and select those with higher signal quality. However, during long-term in-vivo recording there are moments in which the current configuration of the electrodes is not able to record qualified signals. One reason for losing qualified signals might be a drift in the probe position. This drift may occur for several reasons, e.g. inflammation of the brain tissue, human interaction, or unexpected animal movement.
Such a drift causes us to lose track of an activity of interest that was recorded earlier or in previous sessions. Furthermore, to discriminate a single neuron and study its behaviour in the long term, it is necessary to make sure that the probe remains in the starting configuration and that a particular channel keeps recording from the neuron of interest. In addition, having prior information about the quality and properties of the signals recorded by each channel makes it possible to select the next configuration more efficiently and accurately, providing us with high-quality, less noisy signals from neural activities. Therefore, we need to be able to identify each recording channel (each channel is assigned to one electrode during the recording).

1.2 Problem Statement

In this work, we try to identify the characteristics of each recording channel. For this purpose, we first compute sets of features from the measured signals of each channel. Then we apply supervised machine learning techniques to identify the recording channels based on the computed features. In this context, the class labels are channel IDs connected to particular electrodes on the probe, and their activity is represented by sets of features extracted from their measured signals. There are three main challenges here. First, computing features from the measured signals and choosing relevant methods for such a computation. Second, selecting an appropriate supervised machine learning algorithm and a suitable number of features in order to obtain maximum classification accuracy for a given learning algorithm. Third, providing a series of analytical approaches to interpret the classification results and draw conclusions about channel identification and drift occurrence. Such an identification enables us to track a particular activity during and between long-term in-vivo recording sessions and to deal with drift of the probe from its original position. In other words, if we lose signal quality in a recording session, we can use this prior information and choose the electrodes that have shown higher signal quality for the next configuration. Furthermore, unintended movement of the probe between different recording sessions with the same recording configuration becomes detectable: if there was a drift in the probe position between recording sessions, we would observe that a particular activity is now identified in a new channel below or above its original channel, depending on the drift direction.

1.3 Outline

The rest of this work is structured as follows. In the next chapter, we discuss related work on electronic depth control (EDC) in intra-cortical recordings, spike detection, detecting active brain regions, and, finally, recording from the same neuron in motor cortex. In Chapter 3, we discuss in detail our approach for spike detection and feature extraction from in-vivo recording datasets, as well as classification and channel identification using these features in order to track neural activities between different recording sessions and in long-term in-vivo recordings. In Chapter 4, we present the results of our chosen approach. Finally, in Chapter 5, we summarize what we have achieved in this work and discuss how it can be extended.
CHAPTER 2  RELATED WORKS

2.1 Spike Detection Algorithms

In extracellular recording, a spike or action potential is a short-lasting, high-amplitude signal fired by a neuron. Spikes are produced by the rising and falling potential of the neuron's cell membrane. During neural activity, a neuron fires spikes with a particular amplitude, shape, and varying rates. Each neuron has spikes of a characteristic shape and firing rate, mainly determined by the morphology of its dendrite tree and its distance and orientation relative to the recording electrodes [5]. In order to extract features from recordings, we first need to extract the recorded spikes of each channel. Two common types of spike detection algorithms are available. The first are supervised algorithms, which need user intervention, such as window discrimination [1], principal component analysis [6], and matched filtering [7]. However, using supervised algorithms would be very tedious with a comb of multi-array electrodes, since the settings would have to be adjusted for each channel separately. The second common type of spike detection algorithms is the unsupervised category. These algorithms require no user intervention, e.g. algorithms based on amplitude detection [8], non-linear energy detection [9, 10, 11], and wavelet-based detection [12, 13, 14]. In a study by Obeid and Wolf [15], spike detection algorithms were compared taking into account their accuracy and their computational cost. It was found that taking the absolute value of the neural signal before applying a threshold, in combination with a refractory period, is just as effective for spike detection as more elaborate energy-based schemes. Therefore, in this work we used the absolute value of the signal and an adaptive-threshold spike detection algorithm.
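To make this choice concrete, an absolute-value threshold detector with a refractory period can be sketched as follows. This is a minimal Python illustration, not the project's actual C++ implementation; the 50 ms noise window, the 3.5x threshold factor, and the 1 ms refractory period match parameter values used later in this report, while the median-based noise estimator is an assumption (the report does not specify how the background noise is estimated).

```python
import numpy as np

def detect_spikes(signal, fs, noise_window_s=0.05, factor=3.5, refractory_s=0.001):
    """Flag samples whose absolute value exceeds a multiple of a per-window
    noise estimate, keeping at most one detection per refractory period."""
    win = int(noise_window_s * fs)
    refractory = int(refractory_s * fs)
    detections = []
    last = -refractory
    for start in range(0, len(signal), win):
        chunk = signal[start:start + win]
        # Robust noise estimate (median absolute deviation scaled to std);
        # one common choice, assumed here for illustration.
        noise = np.median(np.abs(chunk)) / 0.6745
        thresh = factor * noise
        for i in np.flatnonzero(np.abs(chunk) > thresh):
            idx = start + int(i)
            if idx - last >= refractory:
                detections.append(idx)
                last = idx
    return np.array(detections, dtype=int)
```

With a threshold factor of 5 instead of 3.5, or a 2 ms refractory period instead of 1 ms, the same routine produces the other parameter combinations used later for feature computation.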
2.2 Detection of Active Brain Regions Using a Machine Learning Approach

Given all detected spikes for each channel, we need to compute sets of essential features from them and use these to identify the properties of a recording channel. Ramirez et al. [16] applied machine learning algorithms in order to classify the activity of each channel and find out which channels record single-unit activity and which record multi-unit activity. In their work, they first trained a learning algorithm using features extracted from the detected spikes of labelled data and then performed prediction on unlabelled data. They show which kinds of features can be extracted from detected spikes and which combinations of those features lead to more accurate classification results. Their short-term goal was to develop algorithms that assist neuroscientists in detecting active brain regions; their long-term perspective was a smart neural recording array which allows finding and maintaining high-quality neural signals through the fully automatic selection of many electrodes in active brain regions. The features they used fall into two main categories. The first are features computed directly from the measured signal itself, i.e. min, max, mean, median, standard deviation (STD), and root mean square (RMS). The second are features computed from the detected spikes, i.e. signal-to-noise ratio (SNR), average firing rate, and maximum firing rate. This second category of features has different variations regarding the refractory time for spike detection, i.e. 1 ms or 2 ms, and different time windows for the firing rates and average firing rates, i.e. 20 ms, 100 ms, 500 ms, and 10 s. In our work, we use all possible combinations of these features, i.e. 43 features altogether, to get the highest accuracy in the classification result. The major difference between the approach of Ramirez et al. [16] and our goal is that they tried to classify activity types, i.e. single-unit activity (SUA), multi-unit activity (MUA), and noise activity (NA), whereas we try to classify the recording channels.

2.3 Recording From the Same Neurons Chronically in Motor Cortex

During chronic extracellular recordings, neurobiologists have frequently observed similar activity recorded on the same electrode from day to day. Occasionally a single neuron has some unusual characteristic, such as a distinctive waveform or some unusual and obvious firing property, that makes it clear that the same neuron was present in multiple sessions. The possibility that some neurons may be represented multiple times in a series of recording sessions creates both a problem and an opportunity. Separately recorded neurons may not actually represent independent sources of data, so statistical tests that assume each unit is an independent sample may not be valid. However, if the same neuron can be identified as such through multiple sessions, it becomes possible to combine data and thereby estimate the firing properties of that neuron with greater confidence. Fraser and Schwartz [17] developed a new metric of unit identity using pairwise cross-correlograms between neurons in a simultaneously recorded population. It provides unit identification information comparable to that based on wave shape. By combining this metric with wave shape, autocorrelation shape, and mean firing rate, they were able to clearly identify whether two separately recorded units represent the same or different underlying neurons. There are similarities between the goal of our project and the work of Fraser and Schwartz [17]. They used feature vectors consisting of the firing rate and the waveform of spikes to represent the activity of each channel, and then used these features to classify the activity of each neuron. They made the strong assumption that, by using the Utah microelectrode array [18], which has an electrode pitch of 400 µm, each electrode records from different neurons. In other words, they assume it is unlikely that two adjacent electrodes record from the same neuron. This assumption makes it possible for them to use the waveform of the spikes as part of their features to track the activity of a particular neuron in long-term recordings and between different recording sessions. However, with the high electrode density of EDC probes, it is more likely that some adjacent electrodes record from the same neuron, due to the small pitch size, i.e. 40 µm. Therefore, in our work we use a different feature vector to represent the activity of each channel, and then apply supervised learning algorithms to identify the channels and track their activity.
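The cross-session identification idea pursued in the following chapters, namely training a classifier on per-channel feature vectors from one session, predicting channel IDs on a later session, and reading drift off the confusion matrix, can be sketched with synthetic data. The project itself uses the Weka tool; this Python/scikit-learn sketch, including the channel count, feature count, noise levels, and random-forest choice, is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-channel feature vectors (SNR, firing-rate
# statistics, ...): 8 channels, 40 vectors per channel, 43 features each.
n_channels, n_vectors, n_features = 8, 40, 43
centers = rng.normal(size=(n_channels, n_features))

def make_session(shift=0.0):
    """One recording session: feature vectors scattered around each
    channel's 'signature', labelled with the channel ID."""
    X = np.vstack([centers[c] + 0.3 * rng.normal(size=(n_vectors, n_features))
                   for c in range(n_channels)])
    y = np.repeat(np.arange(n_channels), n_vectors)
    return X + shift, y

X_train, y_train = make_session()           # session 1: training data
X_test, y_test = make_session(shift=0.05)   # session 2: slightly perturbed

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Mass off the diagonal of the confusion matrix would hint at probe or
# activity drift between the two sessions.
cm = confusion_matrix(y_test, pred)
print("test accuracy:", (pred == y_test).mean())
```

In the real pipeline the feature vectors come from the spike-based attributes described above, and a systematic shift of predictions to a neighbouring channel, rather than random confusion, is what indicates probe drift.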
CHAPTER 3  CHANNEL IDENTIFICATION

3.1 Approach

The main purpose of this project is channel identification for tracking a neuron of interest between different recording sessions. To do this, we designed a pipeline with four major steps which lead to the desired conclusion; the diagram in Figure 3.1 shows these four steps.

Figure 3.1: The project pipeline: spike detection and feature extraction; data pre-processing (normalization and attribute selection); supervised machine learning for channel identification; tracking down neural activities.

First, we detect spikes and compute features from the measured signals of each channel. Second, we apply different normalization techniques and attribute selection methods to the resulting dataset; the dataset here consists of the computed feature vectors for all channels. Third, we train and evaluate the performance of different classifiers in conjunction with the attribute selection methods and normalization techniques. This enables us to identify a particular channel based on its computed features. Fourth, we train and test the classifiers with the datasets of consecutive recording sessions. This allows us to track neural activities between different recording sessions and to detect unintentional drift in the probe position.

Based on the diagram in Figure 3.1, we first need to characterize each channel given its measured signals. Each channel can be represented by sets of features extracted from its measured signals. As mentioned in the previous chapter, there are different methods for detecting action potentials or spikes and for feature extraction [16] in order to classify the activity type of each recording channel using supervised machine learning algorithms. We employ the same methods to extract features. After computing the features for each recording channel, we apply data pre-processing steps, i.e. normalization techniques and attribute selection methods, in order to deal with noisy data and increase the prediction accuracy of the classification. Then we apply machine learning algorithms to find out how well we can identify each channel given its computed features.

The next step is to use a supervised machine learning algorithm in combination with the pre-processing steps to identify a channel given its computed features. We therefore need to train and evaluate the performance of different classifiers. This gives us a notion of the feasibility of the channel identification problem. Finally, using a supervised machine learning algorithm, we are able to track the activity of different channels. Given two recording sessions with the same channel configuration, the idea is to train a learning algorithm on the dataset provided by the first recording session, and then to test the trained models on the dataset provided by the second session. The prediction result shows how well the neural activities are traceable and whether or not there was an unintended movement of the probe. An efficient implementation of the supervised learning algorithms is available in the Weka machine learning tool [19]. In this work, we implemented a light framework for detecting spikes and computing features from them using the C++ programming language. In the results section, we examine the performance of each of the introduced algorithms to find out which one gives more accurate results and better identification.

3.2 Spike Detection Algorithm and Feature Computation

To compute features for each channel, we need to extract spikes from the measured signals of that channel. Each channel is connected to a specific electrode on the probe shank, and these connections are adjustable for each particular recording session. Figure 3.2 shows 10 seconds of raw measured signal for eight channels in EDF format. All recordings are filtered using a band-pass filter between frequencies of 500 Hz and 5000 Hz; they can then be processed in order to calculate the attributes that characterize the recorded signals.
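The band-pass step can be reproduced with a standard filter design. The report does not state the filter family or order, so the Butterworth filter and order 4 below are assumptions; only the 500-5000 Hz pass band and the per-channel application come from the text (Python/SciPy sketch).

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_500_5000(signal, fs, order=4):
    """Zero-phase band-pass between 500 Hz and 5000 Hz, matching the
    pre-processing applied to the raw recordings before spike detection."""
    nyq = 0.5 * fs
    # Normalized corner frequencies for a digital Butterworth design.
    b, a = butter(order, [500.0 / nyq, 5000.0 / nyq], btype="band")
    # filtfilt runs the filter forward and backward: zero phase distortion,
    # so spike peak positions are preserved.
    return filtfilt(b, a, signal)
```

Applied to each channel, this removes the low-frequency local field potential content and high-frequency noise, leaving the spike band that the subsequent attribute computation operates on.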
Figure 3.2: This graph shows 10 seconds of raw recordings of neural activity for eight different channels before filtering. Signal units are in mV (plotted using EDFBrowser [20]). Figure 3.3 shows the same recorded signals depicted in Figure 3.2 after filtering.
Figure 3.3: This graph shows 10 seconds of the filtered signals for eight different channels, filtered by a band-pass filter between 500 Hz and 5000 Hz. Signal units are in mV (plotted using EDFBrowser [20]).

In this work, we apply an adaptive-threshold spike detection algorithm [15]. Spike detection and feature computation follow the approach introduced in the previous chapter [16]. The idea is first to estimate the background noise for a time window of 50 ms and then to detect all signal samples whose absolute value exceeds this noise level by a factor of 3.5 or 5. After detecting the spikes, we can compute the signal-to-noise ratio (SNR) of each channel in a time window of 10 s. First, the RMS of each spike is calculated using the signal from 0.5 ms before the peak of the spike to 1 ms after the peak. Then the RMS values of all spikes are averaged, and the RMS of the noise is calculated, where the noise is the portion of the signal excluding the detected spikes. Finally, the SNR is calculated as follows:

SNR = 20 · log10( RMS_spikes / RMS_noise ), (3.1)

where RMS_spikes denotes the averaged RMS of the detected spikes. In order to compute appropriate features, we use different combinations of the threshold factor, i.e., 3.5 and 5, and the refractory time for spike detection, i.e., 1 ms and 2 ms. This makes it possible to detect spikes with four different parameter combinations. For each combination, the maximum firing rates in intervals of 20 ms, 100 ms, 500 ms, and 10 s and their average values were calculated and defined as attributes, which, taking into account the four parameter combinations for the SNR calculation as well, produces 36 different attributes. For example, a 3.5 threshold multiplier with a 1 ms spike refractory window defines nine attributes for the different maximum and average firing-rate intervals and the SNR; using a 2 ms window instead of a 1 ms window yields nine further attributes, and so on. There are seven additional features computed from the measured signal itself: minimum (Min), maximum (Max), mean, median, standard deviation (STD), root mean square of the signal (RMS Signal), and average noise level (ANL), which we also use for the classification algorithms and channel identification. In comparison to [16], the average noise level (ANL) is a new feature; it is computed as the average noise level in one segment of the measured signals, i.e., 10 s of a particular recording session. The ANL value represents the quality of the measured signal, and in the experiment section we will show that the ANL is a good feature for classification. In total, we compute 43 attributes per feature vector, and each feature vector is computed from a time window of 10 s, which is one segment of the measured signals.
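The spike detection and SNR computation described above were implemented in our C++ framework; the following Python sketch illustrates the same procedure on a single channel. The per-window noise estimator is an assumption (the standard deviation of the window is used here), since the text does not specify the exact estimator of [15].

```python
import numpy as np

def detect_spikes(signal, fs, factor=3.5, refractory_ms=1.0, noise_win_ms=50.0):
    """Adaptive-threshold detection: estimate the noise level per 50 ms
    window (assumed: window standard deviation) and take samples whose
    absolute value exceeds factor * noise as spike peaks, honouring the
    refractory time between consecutive detections."""
    win = int(noise_win_ms * fs / 1000)
    refractory = int(refractory_ms * fs / 1000)
    peaks, last = [], -refractory
    for start in range(0, len(signal) - win + 1, win):
        seg = signal[start:start + win]
        noise = seg.std()  # assumed noise estimator
        for i, v in enumerate(seg):
            t = start + i
            if abs(v) > factor * noise and t - last >= refractory:
                peaks.append(t)
                last = t
    return peaks

def snr_db(signal, fs, peaks):
    """Eq. 3.1: 20*log10(mean spike RMS / noise RMS); spike RMS is taken
    from 0.5 ms before to 1 ms after each peak, the noise is the signal
    with the detected spike windows excluded."""
    pre, post = int(0.0005 * fs), int(0.001 * fs)
    mask = np.ones(len(signal), dtype=bool)
    rms = []
    for p in peaks:
        lo, hi = max(0, p - pre), min(len(signal), p + post)
        rms.append(np.sqrt(np.mean(signal[lo:hi] ** 2)))
        mask[lo:hi] = False
    noise_rms = np.sqrt(np.mean(signal[mask] ** 2))
    return 20 * np.log10(np.mean(rms) / noise_rms)
```

Running the detector with the four parameter combinations (threshold 3.5 or 5, refractory time 1 ms or 2 ms) yields the firing-rate and SNR attributes described above.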
3.3 Data Pre-Processing

Each element of the feature vectors computed from the extracellular recordings and their detected spikes has its own value range. Since some supervised machine learning algorithms use similarity measures between feature vectors for classification, the features should be on the same scale. Data normalization techniques are common in machine learning for dealing with this problem; global normalization is an essential preprocessing step for many machine learning algorithms and boosts their performance.

A further problem is the number of features computed with our approach, i.e., 43 features per sample. This is a high-dimensional feature vector and makes the classification task difficult, especially when some features are noisy or irrelevant because they were computed from noisy measured signals. In fact, due to noisy measurements, high background noise activity, and the existence of artifacts in the measured signals, some of the computed features are irrelevant or redundant. This can dramatically reduce the prediction accuracy of the classifiers. However, there exist common attribute selection and dimension reduction methods in the field of machine learning for overcoming these problems. In the following, we briefly explain our candidate methods and techniques for data normalization and attribute selection.

3.3.1 Normalization Techniques

In order to increase the performance of the supervised learning algorithms, we apply normalization techniques to our datasets, since the computed features have different scales and ranges. This preprocessing step usually improves the performance of a learning algorithm significantly by scaling the samples into a similar range.
In this work, we applied two common global normalization techniques, which are frequently used in machine learning [19], in particular with the Support Vector Machine (SVM):

• Min-Max normalization:

D'(i) = (D(i) − Min(D)) / (Max(D) − Min(D)) · (U − L) + L. (3.2)

Here D' is the normalized vector, D is the natural (raw) vector, Min(D) and Max(D) are the minimum and maximum natural values, and U and L are the upper and lower bounds of the target range, usually [0, 1] or [−1, 1].

• Zero-Mean normalization:

D'(i) = (D(i) − µ) / σ. (3.3)

Here D' is the normalized vector, D is the natural vector, µ is the mean of the natural data, and σ is the standard deviation of the natural data.

3.3.2 Attribute Selection

High-dimensional feature vectors do not always increase the prediction accuracy of supervised learning algorithms. In machine learning, feature selection, also known as variable selection, attribute selection, or variable subset selection, is a technique for reducing the dimensionality of the feature vectors [21, 22, 23]. Feature selection methods can lead to (i) an improvement in the prediction performance of the predictor, (ii) faster and more cost-effective predictors, and (iii) a better understanding of the process that generates the data. In classification problems, especially when there are few samples with high-dimensional feature vectors, irrelevant and redundant features are likely. Redundant features are those that provide no more information than the currently selected features; irrelevant features are those that provide no useful information in any context. When dealing with extracellular neural activity, feature selection is necessary due to the presence of background noise. A high background noise level has a negative impact on the performance of the spike detection algorithms and on the quality and quantity of the computed features.
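As an illustration, equations 3.2 and 3.3 can be sketched as follows (a minimal Python/NumPy sketch; the actual pipeline uses the Weka implementations [19]):

```python
import numpy as np

def min_max_normalize(d, lower=-1.0, upper=1.0):
    """Eq. 3.2: scale the feature vector d into [lower, upper]."""
    return (d - d.min()) / (d.max() - d.min()) * (upper - lower) + lower

def zero_mean_normalize(d):
    """Eq. 3.3: subtract the mean and divide by the standard deviation."""
    return (d - d.mean()) / d.std()
```

In the global variant applied in Chapter 4, the minimum, maximum, mean, and standard deviation are computed per feature over all samples of all classes before these formulas are applied.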
Therefore, redundant and irrelevant features are likely, and attribute selection can deal with this problem. Two attribute selection methods are widely used in the machine learning field for reducing the dimensionality of feature vectors: Principal Component Analysis (PCA) [24, 23] and Correlation Feature Selection (CFS) [25, 26].

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a statistical procedure that orthogonally transforms input data of dimension n into a set of linearly uncorrelated variables, called principal components, of the same or lower dimension m. Each principal component represents a direction of the input data in a new coordinate system. The highest rank among the principal components goes to the direction with the highest variance, which lies on the first coordinate of the new coordinate system, the second rank on the second coordinate, and so on. Strictly speaking, PCA is not a feature selection but a feature extraction method: the new attributes are obtained as linear combinations of the original attributes. Dimensionality reduction is achieved by keeping the m components with the highest variance out of the n original components. The common version of this method [19] has the following steps:

• Compute the covariance matrix of the original training samples, then solve for all eigenvectors and eigenvalues.
• Rank the components by the amount of variance they explain.
• Select the m highest-ranked components.

Correlation Feature Selection (CFS)

The other feature selection method we use in this work is Correlation Feature Selection (CFS) [25, 26]. CFS selects a subset of features from the original feature vectors such that the features in the subset are highly correlated with the class labels and uncorrelated with each other. CFS can ignore irrelevant features because they have a low correlation with the class labels, and it also screens out redundant features due to their high correlation with other features.
A feature is accepted if it predicts classes in areas of the instance space that are not already predicted by other features. Given a subset S of the feature space containing k features, CFS evaluates the subset based on the following "merit":

M_S = k · r_cf / sqrt( k + k(k − 1) · r_ff ), (3.4)

where M_S is the heuristic merit of the feature subset S containing k features, r_cf is the mean feature-class correlation (f ∈ S), and r_ff is the average feature-feature correlation. The numerator of equation 3.4 indicates how predictive of the class a set of features is, and the denominator reflects the amount of redundancy among the features. Evaluating all possible subsets of features is exhaustive and often infeasible due to the large number of attributes; in [25, 19], experimentally motivated heuristic search strategies are suggested:

• Forward selection begins with no features and greedily adds one feature at a time until no single-feature addition improves the evaluation.
• Backward elimination begins with all features and greedily removes one feature at a time as long as the evaluation does not degrade.
• Best-first search starts either with no features or with all features; it progresses forward by adding features or backward by removing features from the subset and has a stopping criterion.

Furthermore, there are three variations of CFS [25, 19], each employing one of the following attribute quality measures to estimate the correlations in equation 3.4:

• CFS-UC uses symmetrical uncertainty to measure correlation.
• CFS-MDL uses the normalized symmetrical minimum description length (MDL) principle to measure correlation.
• CFS-Relief uses symmetrical relief to measure correlation.
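The merit of equation 3.4 combined with the forward-selection strategy can be sketched as follows. Note the simplification: Pearson correlation is used here as a stand-in for the symmetrical-uncertainty, MDL, and relief measures of the CFS variants listed above.

```python
import numpy as np

def cfs_merit(features, labels):
    """Eq. 3.4: M_S = k * r_cf / sqrt(k + k*(k-1) * r_ff), with r_cf the
    mean absolute feature-class correlation and r_ff the mean absolute
    feature-feature correlation (Pearson correlation as a stand-in)."""
    k = features.shape[1]
    r_cf = np.mean([abs(np.corrcoef(features[:, j], labels)[0, 1])
                    for j in range(k)])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(features[:, i], features[:, j])[0, 1])
                    for i in range(k) for j in range(i + 1, k)])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_selection(features, labels):
    """Greedy forward search: keep adding the feature that most improves
    the merit; stop when no addition helps."""
    selected, best = [], -np.inf
    while True:
        gains = [(cfs_merit(features[:, selected + [j]], labels), j)
                 for j in range(features.shape[1]) if j not in selected]
        if not gains:
            return selected
        m, j = max(gains)
        if m <= best:
            return selected
        best = m
        selected.append(j)
```

A feature that duplicates the class labels is selected immediately, while an uncorrelated noise feature is rejected because it lowers the merit.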
3.4 Supervised Machine Learning Algorithms For Channel Identification

We need to evaluate the effect of the normalization and attribute selection methods on different supervised learning algorithms. This gives us a notion of the feasibility of the classification and channel identification problem. By looking at the validation results of the classifiers, we can argue how well each classifier identifies each channel based on its computed features. Due to the density and geometry of the electrodes on the probe shank, we expect some similar activities on adjacent electrodes. In the following, we describe four different classifiers, i.e., K-Nearest Neighbour (K-NN), Iterative Dichotomiser 3 (ID3), Random Forest, and Support Vector Machine (SVM), and their parameter settings. By comparing their results on our dataset, we can select the most appropriate classifier for our goal. In order to find out how well each classifier generalizes, we use the cross-validation technique. Furthermore, some supervised learning algorithms require precise parameter selection; therefore, we use hyper-parameter optimization methods to increase their prediction accuracy.

3.4.1 K-Nearest Neighbour (K-NN)

One of the learning algorithms we selected is K-Nearest Neighbour (K-NN) [27]. The idea is to classify an object based on the majority vote of its K nearest neighbors, with the object being assigned to the class most common among them. Each object is represented by its feature vector, and the algorithm uses a similarity measure, e.g., the Manhattan or Euclidean distance, to find the nearest neighbors. There are settings that improve the accuracy of the classification, e.g., weighting neighbors by their relative distance and the choice of K.
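A minimal sketch of the K-NN decision rule with the Euclidean distance (brute-force search; the distance-weighted variant and tree-based acceleration are omitted):

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    dist = np.linalg.norm(train_x - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dist)[:k]                  # indices of k nearest
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

In our setting, `train_x` would hold the normalized feature vectors of the recorded segments and `train_y` the channel labels.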
The K-NN algorithm is recommended because it is easy to understand, simple to train, and gives an insight into the feasibility of our classification task. However, the algorithm is readily fooled by noisy and irrelevant data, is biased by the choice of K, and is computationally intensive for large datasets. Using appropriate nearest-neighbor search structures, e.g., a KD-tree, the K-NN algorithm becomes computationally tractable.

3.4.2 Iterative Dichotomiser 3 (ID3)

The second algorithm we used for classification is Iterative Dichotomiser 3 (ID3) [28]. The idea is to split the dataset into subsets based on a selected attribute, add a non-terminal node to the decision tree, and continue this process recursively on each subset. Terminal nodes represent the class label of their branch. For selecting attributes, we choose the one with the largest information gain, or equivalently the smallest entropy, among the not-yet-selected attributes. The four main steps of the ID3 algorithm are:

• Calculate the entropy of every attribute using the dataset S.
• Split the set S into subsets using the attribute for which the entropy is minimal (or, equivalently, the information gain is maximal).
• Make a decision tree node containing that attribute.
• Repeat the previous steps recursively on each subset using the remaining attributes.

We employ the ID3 algorithm because it treats each feature separately based on a probabilistic approach, builds the decision tree quickly, and uses the whole dataset to create the tree. Furthermore, its results are invariant to whether the data is natural or normalized. However, the ID3 algorithm may face the over-fitting problem and can be biased in favor of attributes with high information gain.

3.4.3 Random Forest Algorithm

The third supervised learning algorithm we used is Random Forest [29].
The algorithm creates a forest of decision trees at training time and outputs the class that is the mode of the classes output by the individual trees. Given a sample set, the algorithm grows each tree as follows:
• If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
• If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest grows.
• Each tree is grown to the largest extent possible; there is no pruning.

The Random Forest algorithm follows almost the same core principle as the ID3 algorithm but usually shows better performance, and its results are virtually invariant to whether the dataset is normalized or natural.

3.4.4 Support Vector Machine (SVM) Algorithm

Our fourth candidate algorithm is the Support Vector Machine [30], one of the more sophisticated supervised learning algorithms. The idea of the Support Vector Machine is to separate the sample data in d-dimensional space using (d−1)-dimensional hyperplanes. There is an inverse relation between the margin, i.e., the distance of the hyperplane to the closest sample points, and the generalization error: the larger the margin, the smaller the generalization error. Based on that, we are dealing with an optimization problem. Since classification and regression problems mostly involve non-linearly separable data, the SVM uses a kernel function to transform the data samples into a feature space of the same or higher dimension in which they are linearly separable. Three common non-linear kernels are used for mapping the samples to higher dimensions:

• Polynomial (homogeneous):

k(x_i, x_j) = (x_i · x_j)^d. (3.5)

Here x_i, x_j are samples represented by their feature vectors and d is the polynomial degree.

• Gaussian radial basis function:

k(x_i, x_j) = exp(−γ ||x_i − x_j||²), for γ > 0. (3.6)

Here x_i, x_j are samples represented by their feature vectors and γ is the kernel coefficient.

• Hyperbolic tangent:

k(x_i, x_j) = tanh(κ x_i · x_j + c), for some (not every) κ > 0 and c < 0. (3.7)

Here x_i, x_j are samples represented by their feature vectors and κ is the kernel coefficient.

Although the Support Vector Machine is a sophisticated supervised learning algorithm, it needs a careful selection of the model, i.e., kernel type and parameter specification, in order to obtain highly accurate results. Empirical model selection is a tedious and interminable task; therefore, in the following, we explain some common methods to deal with this problem.

3.4.5 Cross-Validation

Cross-validation [31] is a technique to measure how well a predictive model generalizes, independent of the data that were used to train it. In machine learning, cross-validation measures how well the trained model will perform in practice. Each model has one or more unknown parameters, and when the number of samples is small or the number of parameters is large, the model faces the over-fitting problem. Cross-validation deals with this problem by dividing the sample data into K equal subsets (K-fold cross-validation), using K − 1 subsets to train the model and one subset for validation. The procedure is repeated K times until each individual subset has been used as the validation set. At the end, the K results from the folds are averaged (or otherwise combined) to produce a single estimate. Common values for K are 3, 5, and 10, depending on the size of the training data. In our task, we used K = 5 for all four algorithms.
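The K-fold procedure described above can be sketched as follows. The `fit_predict` callable is a placeholder for any of the four classifiers (in our experiments, the Weka implementations play this role):

```python
import numpy as np

def k_fold_split(n_samples, k=5, seed=0):
    """Yield (train, validation) index arrays; each of the k folds is used
    exactly once as the validation set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validate(x, y, fit_predict, k=5):
    """Average validation accuracy over the k folds."""
    accs = []
    for train, val in k_fold_split(len(x), k):
        pred = fit_predict(x[train], y[train], x[val])
        accs.append(np.mean(pred == y[val]))
    return float(np.mean(accs))
```

With k = 5, each sample is validated exactly once and trained on four times, which matches the setting used for all four algorithms.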
3.4.6 Hyper-Parameter Optimization

Hyper-parameter optimization is the problem of choosing a set of parameters for a learning algorithm such that it generalizes well. The idea is to adjust the different model parameters in order to minimize the loss function on the training data. There are different approaches to hyper-parameter optimization, e.g., global parameter optimization using Gaussian Processes [32] or a simple grid search [33]. In this work, we used grid search to tune the parameters of a particular model in order to increase its accuracy. The idea is to set a range and a step size for each parameter, then to iterate over all possible parameter combinations, train a model with each of them, and find the combination that minimizes the loss function, or in other words, gives the highest accuracy. Among the previously introduced supervised learning algorithms, the Support Vector Machine is the one that demands hyper-parameter optimization, because of the complexity of selecting its parameters and the wide range of choices, which need a mechanism to tune them. For the Support Vector Machine, mainly three parameters need to be tuned: the kernel function, the constant C (regularization parameter), and the γ factor (kernel multiplier).

3.5 Tracking Down Neural Activities

After finding the best combination of normalization techniques, attribute selection methods, and classification algorithms, we try to track neural activities between different recording sessions. We now have a notion of how well we can identify each channel given its feature vector computed from its measured signals. Therefore, we want to know how likely it is to identify the same activity between different recording sessions on the same channel or on its adjacent channels.
The idea is to identify each specific channel in a particular recording session using the data pre-processing and machine learning approach described earlier in this chapter. Then we compute sets of features for another recording session with the same channel configuration as the one the model was built with, and test the trained model on them, i.e., we predict the class labels for the newly measured signals. By looking at the differences between the predicted and actual class labels in the test results, we can argue how well the current recording of a particular channel is predictable based on earlier measurements of the same channel. To make this argument, we provide a precision-recall analysis and observe the classification errors and confusion matrices of the test phase. The confusion matrix shows whether the activity of a particular channel now most likely appears on the same channel or elsewhere. Due to the density of the electrodes and their geometrical position on the probe shank, a subtle unintended movement of the NeuroProbe position would cause the activity of a particular channel in the test phase to be classified to an adjacent channel relative to the drift direction (during recording, each channel is connected to a specific electrode on the probe shank).
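The tracking procedure of this section can be summarized in code: train on the first session, predict the channel labels of the second session, and inspect the confusion matrix. This is a sketch with a hypothetical `fit_predict` classifier; the real experiments use the trained Weka models on the extracted feature vectors.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """cm[a, p] counts second-session samples of channel a predicted as
    channel p. Off-diagonal mass on adjacent channels hints at a probe or
    activity shift between the sessions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

def track_channels(train_x, train_y, test_x, test_y, fit_predict, n_classes):
    """Train on the first session, test on the second; return the confusion
    matrix and the overall prediction accuracy."""
    pred = fit_predict(train_x, train_y, test_x)
    cm = confusion_matrix(test_y, pred, n_classes)
    return cm, np.trace(cm) / cm.sum()
```

A diagonal-dominant confusion matrix indicates stable recordings; systematic off-diagonal entries on neighboring channels indicate drift.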
CHAPTER 4 EXPERIMENTS AND RESULTS

4.1 Dataset Of Extracellular Recording

For our experiments, we used a dataset from in-vivo recordings performed in April 2009 and February 2010 at the Institute for Psychology of the Hungarian Academy of Sciences in Budapest (Hungary). Data acquisition was done using the NeuroSelect software [34]. The operation of three micro-probes was verified by acute implantation in the neocortices of Wistar rats. One probe was implanted in the primary motor cortex (2 mm in the lateral direction, aiming at M1/M2), and two probes were implanted in the S1 trunk region (see Figure 4.1). The data was pre-amplified (gain g = 10, bandpass filtered between DC and 100 kHz) and amplified (gain g = 100, bandpass filtered between 0.5 kHz and 5 kHz), with a total gain of 1000. Signals were digitized at 16-bit resolution and a 20 kHz sampling rate per channel.

Figure 4.1: Cross section of the area of one implantation (based on [16]). The probe was inserted 2 mm in the lateral direction, aiming for the M1/M2 region indicated by the black line.

Before trying to track the neural activities between different recording sessions, we need to evaluate the effect of the normalization and attribute selection methods on the overall performance of the classification algorithms. We also need to know which combination of the introduced pre-processing methods and classification algorithms gives the highest prediction accuracy. Hence, we selected a relatively large dataset containing all recording sessions of day 15.02.2010. This dataset has the property that there was no intentional movement of the probe position, and it covers most of the electrodes available on the probe shank. The dataset contains 152 electrodes,
which are also considered as class labels, with approximately 15 samples per class, 2199 samples altogether. It should be mentioned that some of these recording sessions do not contain enough qualified measurements; some class labels have fewer samples, around 12 per class, because of poor signals, channel disconnections, and outliers. All these outliers are ignored in the spike detection and feature extraction steps. For tracking the neural activity between different recording sessions, we need consecutive recordings with the same electrode configuration, so that we can train the candidate classification algorithms with a sufficient number of samples and choose the one with the highest accuracy. Among the available recordings, we chose a dataset, i.e., a pair of consecutive recording sessions with the same electrode configuration, from day 12.02.2010 for channel identification and for tracking the activity of each channel between recording sessions. These particular sessions were chosen because they have the same electrode configuration and there was no deliberate movement of the probe position during or between the recording sessions. Furthermore, the dataset contains high-quality measured signals, which indicates the presence of neural activity. Here, we used one session of our data for training the algorithms and the other session for testing them. Each session contains measured signals for eight channels connected to electrodes 43, 44, 45, 46, 141, 142, 143, and 144. For each channel, we have 15 samples, and each sample is computed from 10 s of recorded signals and represented by its feature vector, 120 samples per session altogether. All recordings are available in EDF (European Data Format), and the library EDFLib [35] is available for manipulating them.
In each of the chosen recording sessions of both datasets, eight electrodes, a pair of tetrodes, were selected and assigned to channels. Our lightweight framework detects spikes and computes the features discussed in the previous chapter in order to use them in the classification algorithms.

4.2 Detected Spikes and Extracted Features

The first step in our approach was to detect spikes and compute features from the measured signals. Figure 4.2 shows the detected spikes for 10 s of a recording using a threshold factor of 3.5 and a spike refractory time of 1 ms. Compared to Figure 4.2, Figure 4.3 shows the detected spikes of the same recording segment with a threshold factor of 5 and a 2 ms spike refractory time window. As depicted, fewer spikes are detected with the higher threshold factor, which leads to different values for the average firing rates.

Figure 4.2: Detected spikes from 10 s of one channel's activity using a 1 ms spike refractory time window and a threshold factor of 3.5. Here, the raw signal refers to the filtered signal that was used as input for the spike detection algorithm.

In order to have more information about the activity of each channel, we compute all possible features using the different spike refractory times and threshold factors.
Figure 4.3: Detected spikes from 10 s of one channel's activity using a 2 ms spike refractory time window and a threshold factor of 5. Here, the raw signal refers to the filtered signal that was used as input for the spike detection algorithm.

Figures 4.4, 4.5, and 4.6 show the histogram distributions of the signal-to-noise ratio (SNR), maximum (Max), and standard deviation (STD) values for four different channels in the same recording session. Here we can see how these extracted features overlap in their distributions, which makes classification and channel identification a difficult task using these features. For instance, the SNR values of all four channels, i.e., channels 3, 4, 6, and 8, lie mostly in the range of 8 to 9. Furthermore, it is clear that the value ranges of these features differ, which also has a negative effect on the result of the classification task; therefore, we need to normalize the feature values.

Figure 4.4: Histogram distribution of the signal-to-noise ratio (SNR) value for detected spikes with threshold factor 5 and 1 ms refractory time for channels 8, 6, 3, and 4, connected to electrodes 141, 142, 143, and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.5: Histogram distribution of the standard deviation (STD) value for the measured signals of channels 8, 6, 3, and 4, connected to electrodes 141, 142, 143, and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.

Figure 4.6: Histogram distribution of the maximum (Max) value for the measured signals of channels 8, 6, 3, and 4, connected to electrodes 141, 142, 143, and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.

The histogram distribution of the maximum value in Figure 4.6 shows that this feature is irrelevant and would not contribute to the classification task. We therefore employ attribute selection methods to remove such redundant and irrelevant features and perform the classification with a smaller subset of features.
4.3 Data Normalization and Attribute Selection

In this section, we applied two attribute selection methods, i.e., Correlation Feature Selection (CFS) and Principal Component Analysis (PCA), and the Min-Max and Zero-Mean normalization methods as preprocessing steps on the large dataset from day 15.02.2010 with 152 class labels. To perform the global Min-Max normalization and scaling, we first computed the global minimum and maximum of each particular feature among all samples of all classes, then subtracted the global minimum from each feature value, divided it by the difference of the global maximum and minimum, and scaled each feature to [−1, 1]. To perform the global Zero-Mean normalization, we first computed the global mean and standard deviation of each particular feature among all samples of all classes, then subtracted the global mean from each feature value and divided it by the standard deviation. Figures 4.7 and 4.8 show the histogram distributions of the signal-to-noise ratio (SNR) and maximum firing rate (MFR) values for two different channels in the same recording session. By comparing the feature values of the natural and the normalized data, we can see how normalization produces completely different scales and new values for each feature.

Figure 4.7: Histogram distribution of the SNR for the natural, Min-Max normalized, and Zero-Mean normalized values of detected spikes with threshold factor 5 and 2 ms refractory time for channels 3 and 4, connected to electrodes 143 and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.
Figure 4.8: Histogram distribution of the maximum firing rate (MFR) with a time window of 10 s for the natural, Min-Max normalized, and Zero-Mean normalized values of detected spikes with threshold factor 5 and 2 ms refractory time for channels 3 and 4, connected to electrodes 143 and 144, including 15 samples each. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample. The measured signals belong to one session of extracellular recordings on day 12.02.2010.

To find the best subset of features from the 43-dimensional feature vectors, we then applied the Correlation Feature Selection (CFS) and Principal Component Analysis (PCA) dimension reduction methods. It should be mentioned that attribute selection is done before the classification step and is independent of the supervised learning algorithms used for classification. Furthermore, as we can see in Table 4.1, the feature subsets selected by the CFS and PCA methods depend on the type and distribution of the provided input data. Most of the features selected by both algorithms come from the measured signal itself rather than from the detected spikes, i.e., minimum (Min), median, standard deviation (STD), root mean square of the recorded signal (RMS Signal), and average noise level (ANL).
Table 4.1: List of attributes selected by the Correlation Feature Selection (CFS) and PCA methods. The dimension reduction methods were applied to the Min-Max and Zero-Mean normalized dataset of day 15 of the in-vivo recording. Complete descriptions of the attributes are in appendix A.

CFS on Min-Max and Zero-Mean normalized data | PCA on Min-Max and Zero-Mean normalized data
Min | Min
Median | Mean
STD | STD
RMS Signal | RMS Signal
ANL | SNR tf 3.5 sr 2ms
SNR tf 3.5 sr 1ms | SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms | MFR 500ms tf 3.5 sr 1ms

We applied the Correlation Feature Selection (CFS) method using a best-first search heuristic in the forward direction. Table 4.1 shows that the feature subsets selected for the Min-Max and Zero-Mean normalized datasets are the same. Results for the Principal Component Analysis (PCA) method on both normalized datasets are the same as well. We used the PCA method with a ranked search strategy and a threshold factor equal to -1.797. In figures 4.9 and 4.10, we see how these selected attributes are distributed. Each method gives us seven selected attributes, which means a dimension reduction from forty-three to seven. The first four attributes selected by both methods are similar, i.e. the minimum value, the median (CFS) or mean (PCA) value, the standard deviation (STD), and the root mean square (RMS) of the 10 s measured signal, but the remaining three attributes are different. The CFS method selected the average noise level (ANL) of the 10 s measured signal, the signal-to-noise ratio (SNR) with threshold factor 3.5 and 1 ms spike refractory time, and the average firing rate (AFR) for a time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time.
The Principal Component Analysis (PCA) method selected the signal-to-noise ratio (SNR) with threshold factor 3.5 and 2 ms spike refractory time, the SNR with threshold factor 5 and 2 ms spike refractory time, and the maximum firing rate (MFR) for a time window of 500 ms with threshold factor 3.5 and 1 ms spike refractory time.

Figure 4.9: The scatter diagram for the attributes selected by the CFS method. The X-axis is the sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes are available in appendix A.
Figure 4.10: The scatter diagram for the attributes selected by the PCA method. The X-axis is the sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes are available in appendix A.

From figures 4.9 and 4.10, it is visible that the attributes selected by the Correlation Feature Selection (CFS) method are more suitable for the classification problem than those selected by the Principal Component Analysis (PCA) method. For instance, the value of the attribute AFR 20ms tf 3.5 sr 1ms (the average firing rate for a time window of 20 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by the CFS method, has a more separable distribution relative to the values of the other selected attributes, whereas the attribute MFR 500ms tf 3.5 sr 1ms (the maximum firing rate for a time window of 500 ms with threshold factor 3.5 and 1 ms spike refractory time), selected by PCA, shows much less separation. The dimension reduction methods aim to remove redundant and unrelated attributes from the given feature vectors. In figure 4.11, we can see the scatter diagram for some of the attributes that were ignored by both the PCA and CFS attribute selection methods. The graph in figure 4.11 indicates that these features have almost the same value distributions; therefore, they do not contribute much to the classification problem. This phenomenon may occur due to high background noise and artifacts in the measured signals, or the absence of neural activity near a particular electrode connected to a specific channel.
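The thesis applied WEKA-style CFS (best-first, forward search) and PCA (ranked search). Purely as an illustration, here is a NumPy sketch of a PCA projection and of Hall's CFS merit score that the best-first search maximizes; the function names are ours, and the random matrix stands in for the real 43-dimensional feature vectors:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project mean-centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cfs_merit(mean_feat_class_corr, mean_feat_feat_corr, k):
    """CFS merit of a k-feature subset (Hall's formula): subsets whose features
    correlate strongly with the class but weakly with each other score highest."""
    return (k * mean_feat_class_corr) / np.sqrt(k + k * (k - 1) * mean_feat_feat_corr)

rng = np.random.default_rng(0)
X = rng.normal(size=(2199, 43))  # stand-in for the 43-dimensional feature vectors
X7 = pca_reduce(X, 7)            # dimension reduction from 43 to 7, as in the text
```

The best-first CFS search greedily grows the subset, at each step adding the feature that most increases this merit, which is why redundant attributes with near-identical distributions (figure 4.11) are left out.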
Figure 4.11: The scatter diagram for the attributes not selected by either the PCA or the CFS method. The X-axis is the sample index, 2199 samples per attribute. The Y-axis denotes the attribute value after Min-Max normalization. Complete descriptions of the attributes are available in appendix A.

4.4 Training And Validation Of The Classifiers In Conjunction With Pre-Processing Methods

Data normalization is a highly recommended pre-processing step for algorithms like the Support Vector Machine (SVM) and K-Nearest Neighbour (K-NN). Therefore, in this section, we demonstrate the effect of the dimension reduction methods on the performance of the classification algorithms. We applied our candidate learning algorithms on both the Zero-Mean and Min-Max normalized datasets. The validation results show how well we can identify each channel given its set of features. As we mentioned earlier, some classification algorithms, i.e. the Support Vector Machine (SVM), need parameter tuning in order to obtain higher accuracy; we used grid search as the hyper-parameter tuning method to deal with this issue. It is apparent from tables 4.2 and 4.3 that the results obtained with Correlation Feature Selection (CFS) attribute selection generalize better, although there were noisy measurements among some of the recording sessions. In training these algorithms we used 10-fold cross-validation in order to avoid over-fitting and to get a notion of which of these algorithms could best serve the goal of our project.

Table 4.2: Training results for the classification algorithms on Min-Max normalized data with Correlation Feature Selection (CFS), Principal Component Analysis (PCA), and all features. The dataset is provided from all recording sessions of day 15.02.2010.
The γ parameter for the Support Vector Machine (SVM) has two different values: 100 for the low-dimensional datasets and 10 for the dataset with all features. The dataset contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.

Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.93% | 39.10% | 63.48% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 65.71% | 44.29% | 34.10% | C = 250007 and γ = 100 and 10
Random Forest | 68.25% | 44.29% | 63.98% | number of trees = 10
Table 4.3: Training results for the classification algorithms on Zero-Mean normalized data with Correlation Feature Selection (CFS), Principal Component Analysis (PCA), and all features. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. All algorithms were trained using 10-fold cross-validation.

Algorithm | Accuracy CFS | Accuracy PCA | Accuracy All | Parameter Specification
ID3 | 64.75% | 39.10% | 63.61% |
K-NN | 62.98% | 43.65% | 40.60% | K = 3 and inverse distance weighting
SVM | 67.75% | 47.88% | 37.56% | C = 250007 and γ = 1.0
Random Forest | 68.44% | 43.97% | 63.57% | number of trees = 10

The two graphs in figures 4.12 and 4.13 show the confusion matrices for the best and worst results from table 4.3. Comparing these two figures gives us a notion of how well each algorithm predicts the class labels and also of the class labels on which we had the most false predictions. Figure 4.12 illustrates the confusion matrix for the Random Forest algorithm applied to the Zero-Mean normalized dataset in conjunction with the Correlation Feature Selection (CFS) method; with an accuracy of 68.44%, it is the best result in table 4.3. On the other hand, the Support Vector Machine (SVM) algorithm applied to the Zero-Mean normalized dataset using all features has the worst accuracy, i.e. 37.56%. Therefore, the confusion matrix in figure 4.12 has a more visible diagonal with high values, which means class labels are predicted as their original classes, whereas in figure 4.13 we can see the classes that were classified wrongly, hence the brighter regions on both sides of the matrix diagonal.
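The grid search over SVM hyper-parameters with 10-fold cross-validation can be sketched with scikit-learn; this is an illustrative stand-in (synthetic data, a small grid including the C and γ values reported in the tables), not the exact setup actually used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# synthetic stand-in for the normalized, CFS-reduced feature vectors
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=4, random_state=0)

# grid over the regularization constant C and RBF kernel width gamma
grid = {"C": [1, 100, 250007], "gamma": [0.01, 1.0, 10.0, 100.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid,
                      cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0))
search.fit(X, y)

best_params = search.best_params_   # parameters of the best-scoring combination
best_score = search.best_score_     # mean 10-fold cross-validation accuracy
```

Each (C, γ) pair is scored by its mean accuracy over the 10 stratified folds, which mirrors the over-fitting safeguard described above.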
Figure 4.12: Confusion matrix for the Random Forest algorithm applied to the Zero-Mean normalized data in conjunction with the Correlation Feature Selection (CFS) method. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained using 10-fold cross-validation.
Figure 4.13: Confusion matrix for the Support Vector Machine (SVM) algorithm applied to the Zero-Mean normalized data using all features. The dataset is provided from all recording sessions of day 15.02.2010. It contains 152 class labels, here electrode numbers on the probe, and 2199 samples altogether. The algorithm was trained using 10-fold cross-validation.

From the results presented in tables 4.2 and 4.3, we can conclude that all four supervised learning algorithms in conjunction with the Correlation Feature Selection (CFS) method perform at virtually the same level on both the Zero-Mean and Min-Max normalized data. Therefore, in the following section, the combination of the CFS method and the four classification algorithms is tried on both Min-Max and Zero-Mean normalized data in order to find out how well we can track neural activities between different recording sessions.

4.5 Tracking Down Neural Activities Using Supervised Learning Algorithms

In this section, we trained and tested the previously selected classification algorithms using the specific dataset introduced earlier in this chapter, i.e. two consecutive recording sessions from day 12.02.2010 of the in-vivo experiment. Each recording contains eight recording channels connected to eight electrodes; hence, we have eight class labels and fifteen samples per class. The aim of the current step was first to train the learning algorithms in conjunction with the Correlation Feature Selection (CFS) method using the dataset from the first recording session, and then to test the trained algorithms with the dataset from the second recording session. The prediction accuracy in the test phase gives us a notion of how well neural activities are traceable between two recording sessions.
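The train-on-session-one, test-on-session-two protocol can be illustrated with the K-NN variant used in this work (k = 3 with inverse-distance weighting). The data here is synthetic, with eight well-separated "channels" and fifteen samples per class per session, mimicking only the shape of the real dataset:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """k-NN with inverse-distance weighting: each of the k nearest training
    samples votes for its label with weight 1/distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]
        w = 1.0 / (d[idx] + 1e-12)            # inverse-distance weights
        votes = {}
        for lbl, wt in zip(y_train[idx], w):
            votes[lbl] = votes.get(lbl, 0.0) + wt
        preds.append(max(votes, key=votes.get))
    return np.array(preds)

# toy stand-in: 8 channels (classes), 15 samples each, in two "sessions"
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(8, 7))   # one cluster center per channel
session1 = np.vstack([c + rng.normal(size=(15, 7)) for c in centers])
session2 = np.vstack([c + rng.normal(size=(15, 7)) for c in centers])
labels = np.repeat(np.arange(8), 15)

pred = knn_predict(session1, labels, session2)  # train on session 1, test on 2
accuracy = (pred == labels).mean()
```

High test accuracy here means a channel's feature distribution in the second session still resembles its own first-session distribution more than that of any other channel, which is exactly the traceability criterion used below.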
The difference between this experiment and the earlier one in this chapter is that in the previous experiment all the computed features belonged to the same recording session, whereas in the current experiment the datasets belong to two different consecutive recording sessions. In the pre-processing step, we first applied Min-Max and Zero-Mean normalization based on the same principle mentioned earlier in this chapter, and then applied the Correlation Feature Selection (CFS) method to reduce the dimension of the feature vectors. Table 4.4 presents the attributes selected by the CFS method. Compared to the attribute subset in table 4.1, there are four more attributes in the new subset. Furthermore, the types of the selected attributes are different: in the subset for our smaller dataset, we have more attributes related to detected spikes and their firing
rates. These phenomena support the fact that the measured signals are less noisy and that more neural activity is present in these two recording sessions.

Table 4.4: List of attributes selected by the Correlation Feature Selection (CFS) method. The dimension reduction method was applied to the Min-Max and Zero-Mean normalized datasets of day 12 of the in-vivo recording. Complete descriptions of the attributes are in appendix A.

CFS on Min-Max and Zero-Mean normalized data
Median
STD
RMS Signal
SNR tf 3.5 sr 1ms
SNR tf 3.5 sr 2ms
SNR tf 5 sr 2ms
AFR 20ms tf 3.5 sr 1ms
AFR 20ms tf 3.5 sr 2ms
AFR 20ms tf 5 sr 2ms
MFR 500ms tf 3.5 sr 1ms

Figures 4.14 and 4.15 depict the data distribution of the attributes selected by the Correlation Feature Selection (CFS) method from both recording sessions for the Zero-Mean normalized data. The attributes were selected based on the result of the CFS method applied to the first recording session. Comparing figure 4.14 to 4.15, we see that they have more or less the same distribution. This similar feature distribution indicates that the same sort of activity was present during both recording sessions.

Figure 4.14: The scatter diagram for the attributes selected by the CFS method from the first session of day 12 of the in-vivo recording. The X-axis is the sample index, 120 samples altogether. The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.
Figure 4.15: The scatter diagram for the attributes selected by the CFS method from the second session of day 12 of the in-vivo recording. The X-axis is the sample index, 120 samples altogether. The Y-axis denotes the attribute value after Zero-Mean normalization. Complete descriptions of the attributes are available in appendix A.

4.5.1 Prediction Results For Trained Models

The prediction results for the four supervised learning algorithms are presented in table 4.5. The obtained results show that the Support Vector Machine and K-Nearest Neighbour algorithms perform better than the two other algorithms on both normalized datasets. Also, we can see that after normalization and attribute selection, the simple K-Nearest Neighbour algorithm reaches a prediction accuracy as good as sophisticated algorithms like the Support Vector Machine.

Table 4.5: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the Correlation Feature Selection (CFS) method. The dataset is provided from the second recording session of day 12.02.2010. It contains 8 class labels, here electrode numbers on the probe, and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the subset of attributes presented in table 4.4.

Algorithm | Accuracy on Zero-Mean Dataset | Accuracy on Min-Max Dataset | Parameter Specification
ID3 | 73.33% | 73.33% |
K-NN | 89.16% | 90% | K = 3 and inverse distance weighting
SVM | 90% | 90% | C = 250007 and γ = 0.01
Random Forest | 83.33% | 81.66% | number of trees = 10

We performed the same experiment as above on the same dataset but with the complete set of features. Table 4.6 presents the prediction results for our selected classifiers on the Min-Max and Zero-Mean normalized data. Comparing the results in tables 4.5 and 4.6 is quite revealing in several ways. First, it shows that the tree-based algorithms are almost invariant to data normalization.
Second, the performance of all classifiers except Iterative Dichotomiser 3 (ID3) was boosted by the Correlation Feature Selection (CFS) method. Since ID3 uses a pruning mechanism to remove unrelated attributes at tree-construction time, its results remain invariant to the CFS method. However, CFS feature selection helps the other classifiers deal
with redundant and unrelated attributes better than the decision tree does. As is apparent, a simple algorithm like K-Nearest Neighbour outperforms Iterative Dichotomiser 3 (ID3) and the Random Forest algorithm in this case and reaches the same performance as the Support Vector Machine.

Table 4.6: Test results for the classification algorithms on Min-Max and Zero-Mean normalized data using the complete set of features. The dataset is provided from the second recording session of day 12.02.2010. It contains 8 class labels, here electrode numbers on the probe, and 120 samples altogether. All algorithms were trained using 10-fold cross-validation. These results were achieved using the complete set of attributes presented in appendix A.

Algorithm | Accuracy on Zero-Mean Dataset | Accuracy on Min-Max Dataset | Parameter Specification
ID3 | 73.33% | 73.33% |
K-NN | 81.66% | 85% | K = 3 and inverse distance weighting
SVM | 86.66% | 83% | C = 250007 and γ = 0.001
Random Forest | 77.33% | 74.16% | number of trees = 10

The two graphs in figures 4.16 and 4.17 show the precision-recall analysis for these four learning algorithms on both the Min-Max and Zero-Mean normalized datasets. Both illustrations show that the Support Vector Machine and K-Nearest Neighbour outperform the two other algorithms and have basically higher precision values. Precision is the fraction of the predicted instances that actually belong to the original class, and recall is the fraction of the relevant instances that are predicted. Higher precision and recall together mean a better and more accurate prediction has been achieved.

Figure 4.16: The precision-recall analysis computed from the Min-Max normalized data. Each sample shows the precision-recall ratio for an individual class label. In the graph for each algorithm, we expect to see eight samples, one sample per class. Since some of the values are the same, they overlay each other and are not fully visible in the graph.
Here we can see that the Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection on the dataset.
Figure 4.17: The precision-recall analysis computed from the Zero-Mean normalized data. Each sample shows the precision-recall ratio for an individual class label. In the graph for each algorithm, we expect to see eight samples, one sample per class. Since some of the values are the same, they overlay each other and are not fully visible in the graph. Here we can see that the Support Vector Machine and K-Nearest Neighbour algorithms have higher precision than the Random Forest and Iterative Dichotomiser 3 (ID3). The reason lies in the boosting effect of the normalization and the CFS attribute selection on the dataset.

4.5.2 Observation On Confusion Matrices

Figures 4.18 and 4.19 depict the confusion matrices for the results of the classification algorithms. The main diagonal of the matrices indicates the accuracy of the classifiers and also the quality of the channel identification: the higher the values on the matrix diagonal, the more accurate the prediction of the classifier. Our experiment was designed so that we first built the learning models using features computed from the first recording session and then tested them on features computed from the second recording session. The channel configuration, i.e. the electrodes connected to the channels, remained the same in both recording sessions. Since there was no deliberate movement of the probe position, we expected the activity of each channel to be predicted as its original channel. In other words, here we tried to define a ground truth for each channel based on its measured signal and later use this information for identifying the activities measured in other recording sessions.
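The confusion matrix and the per-class precision and recall used in this analysis can be computed as follows; a minimal NumPy sketch with our own function names and a toy three-class example:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall(cm):
    """Per-class precision (diagonal over column sums) and recall (diagonal
    over row sums); np.maximum guards against division by zero."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    return precision, recall

# toy example: 3 classes, 2 samples each
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, 3)
p, r = precision_recall(cm)
```

In this toy case the diagonal is [1, 2, 1]: class 1 is fully recovered while classes 0 and 2 each lose one sample to an off-diagonal cell, exactly the kind of pattern read off from figures 4.18 and 4.19.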
(a) Random Forest (b) Support Vector Machine (c) ID3 (d) 3-NN

Figure 4.18: The confusion matrices for a) Random Forest, b) Support Vector Machine, c) ID3, and d) K-Nearest Neighbour algorithms on the Min-Max normalized dataset. The X-axis shows the original class labels, here the electrode indexes on the probe shank connected to their specific channels. The Y-axis has the same values as the X-axis. The value of each cell indicates the number of instances of the class label on the X-axis predicted as the label on the Y-axis. Therefore, high values on the diagonal show that each class is predicted as its original class label.

(a) Random Forest (b) Support Vector Machine (c) ID3 (d) 3-NN

Figure 4.19: The confusion matrices for a) Random Forest, b) Support Vector Machine, c) ID3, and d) K-Nearest Neighbour algorithms on the Zero-Mean normalized dataset. The X-axis shows the original class labels, here the electrode indexes on the probe shank connected to their specific channels. The Y-axis has the same values as the X-axis. The value of each cell indicates the number of instances of the class label on the X-axis predicted as the label on the Y-axis. Therefore, high values on the diagonal show that each class is predicted as its original class label.

Looking at the confusion matrices in figures 4.18 and 4.19, there are channels that could
be identified quite accurately by all classifiers, i.e. channels 5 and 1, connected to electrodes 45 and 46. On the other hand, there are electrodes that most of the algorithms classify as their adjacent channels, i.e. channels 3 and 4, connected to electrodes 144 and 143. Figure 4.20 shows the distribution of the SNR values for the mentioned electrodes from the test phase, with 2 ms refractory time and a threshold factor of 5. As depicted, channels 1 and 5 have relatively higher SNR values than channels 3 and 4. This indicates that the quality of the signals measured by the first two channels is higher and that they contain a lower noise level. Therefore, their computed features are more separable and can contribute better to the channel identification task.

Figure 4.20: The SNR values for 2 ms refractory time with a threshold factor of 5 for channels 5, 1, 3, and 4, connected to electrodes 45, 46, 143, and 144. The data was normalized using the Zero-Mean normalization method. The Y-axis is the number of samples, 15 samples altogether, and the X-axis is the value of each sample.

It is obvious that in confusion matrices, the higher the values on the diagonal, the more accurate the classification result. Therefore, in the test session, high values on the diagonal mean that the activities measured by the channels in the test dataset have the same feature distribution as the signals measured in the training dataset. Although some of the channels, due to the lower quality of their measurements, show similarities mostly to their adjacent channels, there are channels in the test session, i.e. those connected to electrodes 43, 44, 45, 46, and 141, that show high prediction accuracy with respect to their original class labels from the training session. This observation supports two ideas we mentioned earlier in the problem statement section.
First, by identifying each channel using its measured signal from a former recording session, we can define a ground truth for each channel and choose the next electrode configuration. It means that if we lose signal quality in a particular recording session, we can select new channel configurations from those that have shown better measurements by consulting this ground truth. Second, we can support the argument that there was no drift in the probe position between the two consecutive recording sessions, because if there had been a drift in probe position between the two recording sessions, we would expect the channels in the test session to be classified as their adjacent channels from the training session relative to the shift direction. That is, if there had been an upward drift, they would be classified as the channels located below them on the probe shank, and if there had been a downward drift, it would be the other way around.
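The drift argument above can be turned into a simple numerical check: if the off-diagonal mass of the confusion matrix is systematically shifted to one side of the diagonal, the mean (predicted minus true) channel offset is non-zero. This heuristic and its function name are our own illustration, assuming class indices are ordered along the probe shank:

```python
import numpy as np

def estimated_shift(cm):
    """Average (predicted - true) channel offset, weighted by confusion counts.
    Assumes class indices are ordered along the probe shank: a value near 0
    suggests no systematic drift; a positive or negative value, a directed shift."""
    n = cm.shape[0]
    true_idx, pred_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return float((cm * (pred_idx - true_idx)).sum() / cm.sum())

# two synthetic 8-channel cases with 15 samples per class
cm_no_drift = np.eye(8, dtype=int) * 15        # perfect diagonal
cm_drift_up = np.eye(8, k=1, dtype=int) * 15   # every channel predicted one index over

shift_none = estimated_shift(cm_no_drift)  # 0.0: no systematic offset
shift_up = estimated_shift(cm_drift_up)    # 1.0: every class shifted by one channel
```

On the real confusion matrices of figures 4.18 and 4.19, a value near zero would corroborate the no-drift conclusion; this is only a rough summary statistic, since noisy channels also spill into adjacent cells without any physical drift.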
CHAPTER 5 SUMMARY

5.1 Conclusion

In this work, we dealt with the problem of channel identification in in-vivo recordings using the Neuro-Probe. Solving this task could help us to efficiently select electrodes from the high-density microarray and contribute to the Electronic Depth Control (EDC) problem. It could provide us with a ground truth for each electrode on the probe shank; using this identification, we can choose a channel configuration that has shown high-quality signals and more detectable activities. Furthermore, it would be possible to detect an unintended drift in the position of the EDC probe during long-term in-vivo recording and between different recording sessions. There were four different steps in this work. In the first step, given a dataset recorded by the EDC probe, we applied an adaptive-threshold spike detection algorithm and computed the features of each recording channel. We computed and used the average noise level (ANL) as an additional new feature relative to former approaches in this field; this feature provides more information when the quality of the measured signals is surpassed by high background noise activity. In the second step, for data pre-processing, we applied Min-Max and Zero-Mean global normalization in order to have the same scale and a better distribution for all the computed features. In addition, we applied Correlation Feature Selection (CFS) and Principal Component Analysis (PCA) to remove irrelevant and redundant features and to reduce the dimension of the feature vectors. The dimension reduction and normalization boosted the performance of the classifiers. The result of the attribute selection gave us a notion of the quality of the measured signals: if the measurements were dominated by noise, the selected features were those computed from the signals themselves and not from detected spikes; in contrast, in the presence of neural activity, attributes related to detected spikes were selected.
In the third step, we trained and validated various supervised machine learning algorithms, i.e. K-Nearest Neighbour, Iterative Dichotomiser 3, Support Vector Machine, and Random Forest, in conjunction with the pre-processing methods and techniques for identifying each channel. Furthermore, we applied grid search as a simple case of hyper-parameter optimization to increase the accuracy of the Support Vector Machine (SVM). We trained the candidate algorithms with features computed from all recording sessions of day 15.02.2010, containing the measured signals of 152 different electrodes. The classification results have shown the possibility of identifying each channel with up to 68% accuracy using the Random Forest algorithm combined with the Correlation Feature Selection (CFS) method. They have also shown that after normalization, other classifiers like the Support Vector Machine (SVM) and K-Nearest Neighbour (K-NN) can reach accuracies above 62%. This suggests that for channel identification it is possible to use a combination of normalization, the CFS method, and a simple classifier like K-NN. In the fourth step, for tracking down the neural activities between two consecutive recording sessions with the same channel configuration, we trained and tested the combination of the CFS method and both normalization techniques with the four supervised learning algorithms. We
were able to reach almost 90% accuracy using the Support Vector Machine (SVM) algorithm. Interestingly, the simple K-Nearest Neighbour (K-NN) algorithm performed at the same level as the SVM did. It needs to be mentioned that although the ID3 and Random Forest algorithms had high accuracy in the training phase, in the test phase they fell behind the two other algorithms. The observations on the confusion matrices, precision-recall analysis, and feature distributions showed that there was no drift in the probe position between the two sessions. In addition, they have shown that neural activity is traceable between various recording sessions using our approach.

5.2 Future Works

• To provide strong support for our approach regarding channel identification and detecting drift in the probe position during in-vivo recording, we need better datasets and recordings; then we could study the neural activity better. Having a dataset that contains long-term recordings with the same channel configuration would give us a chance to train our learning algorithms better and to observe the results of the test sessions to see whether detecting drift in the probe position is possible.

• Regarding identifying each channel in a particular recording session, we could try to identify not only one channel but also a group of channels. In this case, we could assume a tetrode (four channels) or two adjacent channels, train our learning algorithms with their extracted features, and try to identify their activity in the test session. To do this, we would need to find out which pairs of channels are recording from the same neuron simultaneously using signal similarity measurement algorithms.
APPENDIX A LIST OF FEATURES AND THEIR DESCRIPTIONS

A.1 All Features List

Feature Name | Description
Min | The minimum peak value of the measured signal.
Max | The maximum peak value of the measured signal.
Mean | The mean value of the measured signal.
Median | The median value of the measured signal.
STD | The standard deviation (STD) of the measured signal.
RMS | The root mean square (RMS) of the measured signal.
ANL | The average noise level (ANL) of the measured signal. For computing the noise level, a time window of 50 ms is used.
SNR tf 3.5 sr 2 ms | The signal-to-noise ratio (SNR) of the measured signal. The threshold factor (tf) is 3.5 and the spike refractory (sr) time window is 2 ms.
MFR 20 ms tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 20 ms tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
MFR 100 ms tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 100 ms tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
MFR 500 ms tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 500 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 500 ms tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 500 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
MFR 10 s tf 3.5 sr 2 | The maximum firing rate (MFR) of the measured signal with a time window of 10 s, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
AFR 10 s tf 3.5 sr 2 | The average firing rate (AFR) of the measured signal with a time window of 10 s, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 2 ms.
SNR tf 3.5 sr 1 ms | The signal-to-noise ratio (SNR) of the measured signal. The threshold factor (tf) is 3.5 and the spike refractory (sr) time window is 1 ms.
MFR 20 ms tf 3.5 sr 1 | The maximum firing rate (MFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
AFR 20 ms tf 3.5 sr 1 | The average firing rate (AFR) of the measured signal with a time window of 20 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
MFR 100 ms tf 3.5 sr 1 | The maximum firing rate (MFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
AFR 100 ms tf 3.5 sr 1 | The average firing rate (AFR) of the measured signal with a time window of 100 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
MFR 500 ms tf 3.5 sr 1 | The maximum firing rate (MFR) of the measured signal with a time window of 500 ms, for spikes detected with threshold factor (tf) 3.5 and spike refractory (sr) time window 1 ms.
AFR 500 ms tf 3.5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time window is 2 ms. MFR 10 s tf 3.5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time window is 1 ms. AFR 10 ms tf 3.5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 3.5 and spike refractory (sr) time window is 1 ms. SNR tf 5 sr 2 ms The signal to noise ration (SNR) of the measured signal. The threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 20 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. AFR 20 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 100 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 100 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. 34
  • 40. APPENDIX A. FEATURE LIST A.1. FEATURE LIST AFR 100 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 100 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 500 ms tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. AFR 500 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 10 s tf 5 sr 2 The maximum firing rate (MFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. AFR 10 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. SNR tf 5 sr 1 ms The signal to noise ration (SNR) of the measured signal. The threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. MFR 20 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 20 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 20 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. MFR 100 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 100 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 100 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 100 ms. 
For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. MFR 500 ms tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 500 ms tf 5 sr 2 The average firing rate (AFR) of the measured signal with time window of 500 ms. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 2 ms. MFR 10 s tf 5 sr 1 The maximum firing rate (MFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. AFR 10 ms tf 5 sr 1 The average firing rate (AFR) of the measured signal with time window of 10 s. For the spikes detected with the threshold factor (tf) is 5 and spike refractory (sr) time window is 1 ms. Table A.1: List of all computed features and their descriptions. Note that all the attributes are computed from one segment of each recording session which contains 10 s of measured signals. 35
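Table A.1 defines the features only verbally. As an illustration, the helpers below sketch how such a feature vector could be computed from a raw trace. This is a hypothetical reconstruction, not the project's actual code: the exact detection rule, the noise estimator behind ANL, and the use of non-overlapping windows are all assumptions, and the array `signal` and sampling rate `fs` are placeholder names.

```python
import numpy as np

def basic_features(signal):
    """Amplitude statistics from Table A.1 (Min, Max, Mean, Median, STD, RMS)."""
    return {
        "Min": float(signal.min()),
        "Max": float(signal.max()),
        "Mean": float(signal.mean()),
        "Median": float(np.median(signal)),
        "STD": float(signal.std()),
        "RMS": float(np.sqrt(np.mean(signal ** 2))),
    }

def average_noise_level(signal, fs, window_ms=50.0):
    """ANL over 50 ms windows.

    Assumption: the per-window noise level is taken as the window's
    standard deviation; the report does not define ANL precisely.
    """
    win = int(window_ms * 1e-3 * fs)
    levels = [signal[s:s + win].std()
              for s in range(0, len(signal) - win + 1, win)]
    return float(np.mean(levels))

def detect_spikes(signal, fs, tf=3.5, refractory_ms=2.0):
    """Threshold-based spike detection with a refractory period.

    Assumption: a spike is a sample whose absolute amplitude exceeds
    tf times an RMS noise estimate; after each detection, all samples
    inside the spike refractory (sr) window are skipped.
    """
    threshold = tf * np.sqrt(np.mean(signal ** 2))
    refractory = max(1, int(refractory_ms * 1e-3 * fs))
    spikes, i = [], 0
    while i < len(signal):
        if abs(signal[i]) > threshold:
            spikes.append(i)
            i += refractory          # enforce the refractory period
        else:
            i += 1
    return np.asarray(spikes)

def firing_rates(spike_idx, n_samples, fs, window_ms):
    """MFR and AFR: maximum and mean spike rate (spikes/s) over
    non-overlapping windows of the given length."""
    win = int(window_ms * 1e-3 * fs)
    rates = []
    for start in range(0, n_samples - win + 1, win):
        count = np.sum((spike_idx >= start) & (spike_idx < start + win))
        rates.append(count / (window_ms * 1e-3))
    rates = np.asarray(rates)
    return float(rates.max()), float(rates.mean())
```

For a 10 s segment, calling these helpers for every (window, tf, sr) combination in Table A.1 would yield the full feature vector of one channel.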
BIBLIOGRAPHY

[1] Miguel A. L. Nicolelis. Methods for Neural Ensemble Recordings. CRC Press, Boca Raton, FL, 2008.
[2] Herc P. Neves, Tom Torfs, Refet F. Yazicioglu, Junaid Aslam, Arno A. Aarts, Patrick Merken, Patrick Ruther, and Chris Van Hoof. The NeuroProbes project: a concept for electronic depth control. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), page 1857, 2008.
[3] K. Seidl, H. Herwik, Y. Nurcahyo, T. Torfs, M. Keller, M. Schuettler, H. Neves, T. Stieglitz, O. Paul, and P. Ruther. CMOS-based high-density silicon micro-probe array for electronic depth control in neural recording. In Proc. 22nd Int. MEMS Conf., pages 232–235, 2009.
[4] J. Ji and K. D. Wise. An implantable CMOS circuit interface for multiplexed microelectrode recording arrays. IEEE Journal of Solid-State Circuits, 27(3):433–443, March 1992.
[5] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009.
[6] M. Abeles and M. Goldstein. Multispike train analysis. Proceedings of the IEEE, 65:762–773, 1977.
[7] I. Bankman, K. Johnson, and W. Schneider. Optimal detection, classification and superposition resolution in neural waveform recordings. IEEE Transactions on Biomedical Engineering, 40:836–841, 1993.
[8] M. Sahani. Latent variable models for neural data analysis. PhD thesis, California Institute of Technology, Pasadena, CA, 1999.
[9] K. H. Kim and S. J. Kim. Neural spike sorting under nearly 0-dB signal-to-noise ratio using nonlinear energy operator and artificial neural network classifier. IEEE Transactions on Biomedical Engineering, 47:1406–1411, 2000.
[10] S. Mukhopadhyay and G. C. Ray. A new interpretation of nonlinear energy operator and its efficacy in spike detection. IEEE Transactions on Biomedical Engineering, 45:180–187, 1998.
[11] L. Traver, C. Tarin, P. Marti, and N. Cardona. Adaptive-threshold neural spike detection by noise-envelope tracking. Electronics Letters, 43:1333–1335, 2007.
[12] R. J. Brychta, S. Tuntrakool, M. Appalsamy, et al. Wavelet methods for spike detection in mouse renal sympathetic nerve activity. IEEE Transactions on Biomedical Engineering, 54:82–93, 2007.
[13] S. Kim and K. Kim. A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio. IEEE Transactions on Biomedical Engineering, 50:999–1011, 2003.
[14] Z. Nenadic and J. W. Burdick. Spike detection using the continuous wavelet transform. IEEE Transactions on Biomedical Engineering, 52:74–87, 2005.
[15] I. Obeid and P. D. Wolf. Evaluation of spike detection algorithms for a brain-machine interface application. IEEE Transactions on Biomedical Engineering, 51:905–911, 2004.
[16] Detection of Active Brain Regions for Automatic Electrode Selection Using a Machine Learning Approach. Bachelor's thesis, 2010.
[17] George W. Fraser and Andrew B. Schwartz. Recording from the same neurons chronically in motor cortex. Journal of Neurophysiology, 107:1970–1978, 2012.
[18] Edwin M. Maynard, Craig T. Nordhausen, and Richard A. Normann. The Utah Intracortical Electrode Array: a recording structure for potential brain-computer interfaces. Electroencephalography and Clinical Neurophysiology, 102(3):228–239, 1997.
[19] Shawkat Ali and Kate A. Smith-Miles. Improved support vector machine generalization using normalized input space. Advances in Artificial Intelligence, 4304:362–371, 2006.
[20] Teunis van Beelen. EDFbrowser: a free, open-source, multiplatform, universal viewer and toolbox intended for, but not limited to, time-series storage files like EEG, EMG, ECG, bioimpedance, etc. http://www.teuniz.net/edfbrowser/, 2010–2013.
[21] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, March 2003.
[22] Mark A. Hall and Geoffrey Holmes. Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering, 15(6):1437–1447, November 2003.
[23] M. Dash and H. Liu. Feature selection for classification. Intelligent Data Analysis, 1:131–156, 1997.
[24] Fengxi Song, Zhongwei Guo, and Dayong Mei. Feature selection using principal component analysis. In 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization (ICSEM), volume 1, pages 27–30, November 2010.
[25] Mark A. Hall. Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato, 1999.
[26] Mark A. Hall and Lloyd A. Smith. Feature subset selection: a correlation-based filter approach. In 1997 International Conference on Neural Information Processing and Intelligent Information Systems, pages 855–858. Springer, 1997.
[27] Songbo Tan. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Systems with Applications, 28(4):667–671, 2005.
[28] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[29] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[30] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[31] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. International Joint Conference on Artificial Intelligence, pages 1137–1143, 1995.
[32] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. Technical report, 2012.
[33] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. Technical report, National Taiwan University, 2010.
[34] Karsten Seidl, Tom Torfs, Patrick A. De Mazière, Gert Van Dijck, Richard Csercsa, Balazs Dombovari, Yohanes Nurcahyo, Hernando Ramirez, Marc M. Van Hulle, Guy A. Orban, et al. Control and data acquisition software for high-density CMOS-based microprobe arrays implementing electronic depth control. Biomedizinische Technik/Biomedical Engineering, 55(3):183–191, 2010.
[35] Teunis van Beelen.
EDFlib: a programming library for C/C++ to read/write EDF+/BDF+ files. http://www.teuniz.net/edflib/index.html, 2010–2013.