
Spike sorting based upon machine learning algorithms (SOMA)

  • 1. Journal of Neuroscience Methods 160 (2007) 52–68

    Spike sorting based upon machine learning algorithms (SOMA)

    P.M. Horton a,∗, A.U. Nicol b, K.M. Kendrick b, J.F. Feng c,∗∗
    a Department of Informatics, Sussex University, Brighton BN1 9QH, UK
    b Laboratory of Cognitive and Behavioural Neuroscience, The Babraham Institute, Cambridge CB2 4AT, UK
    c Department of Computer Science, Warwick University, Coventry CV2 4AT, UK
    ∗ Corresponding author. Tel.: +44 7963246444.
    ∗∗ Corresponding author.
    E-mail address: pmh20@sussex.ac.uk (P.M. Horton).

    Received 21 December 2005; received in revised form 18 August 2006; accepted 23 August 2006
    0165-0270/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.jneumeth.2006.08.013

    Abstract

    We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb, and from the sheep infero-temporal cortex, and using simulated data. The SOMA software is available at http://www.sussex.ac.uk/Users/pmh20/spikes.
    © 2006 Elsevier B.V. All rights reserved.

    Keywords: Spike sorting; Neural networks; Olfactory bulb; Odour; Sheep temporal cortex; Pre-processing; Self-organising maps

    1. Introduction

    After over a century of neurophysiological research we still do not understand the principle by which a stimulus such as an odour, an image or a sound is represented by distributed neural ensembles within the brain. While large numbers of studies have made detailed analyses of response profiles of single cells in isolation, such techniques cannot address holistic issues of how large ensembles of neurones can integrate information both spatially and temporally. There is little doubt that much of the information processing power of the brain resides in the activities of co-operating and competing networks of neurons. If we can unlock the principles whereby information is encoded within these networks as a whole, rather than focusing on the activities of single neurones in isolation, we may come closer to understanding how the brain works (Petersen and Panzeri, 2004). While some progress towards understanding how this is achieved at a gross structural level is being achieved with brain imaging techniques, the only way to provide an understanding at the level of multiple cell–cell interactions in the brain is to record simultaneously from large numbers of cells within a defined system. The two main difficulties in achieving this step have been, firstly, the lack of appropriate hardware to record simultaneously the electrical activity of ensembles of neurones at the single cell level and, secondly, restrictions of current methods in analysing the resulting huge volume of multivariate, high frequency data. In the current paper, we aim to resolve a crucial issue in the second category: spike sorting.

    There have been many automatic (see Harris et al., 2000; Harris KD. Klustakwik Software (http://klustakwik.sourceforge.net/), 2002; Lipa P. BubbleClust Software (http://www.cbc.umn.edu/~redish/mclust/MClust-3.3.2.zip), 2005), manual (see Redish A. MClust Software (http://ccgb.umn.edu/~redish/mclust/), 2005; Hazab L. Kluster Software (G. Buzsaki's Lab) (http://klusters.sourceforge.net/UserManual/index.html), 2004; Spike2 Software (Cambridge Electronic Design Ltd., UK)), and semi-automatic methods, i.e. a combination of both, to sort spikes. None of these methods supersedes the rest as each has associated difficulties (Lewicki, 1998). Normally, the first step in these methods is to find the number of clusters, i.e. template waveforms, formed from the chosen set of waveform features. Manual methods require the user to select these features and to identify the template waveforms, which can be labour intensive
  • 2. and inefficient. Fully automated methods rely on a chosen pre-processing method, normally principal component analysis (PCA), to automatically extract differentiable features. A clustering technique is then used to find a number of clusters. This approach is less time-consuming. However, there are still problems associated with this approach; mainly that most clustering algorithms require a pre-determined number of groups to identify, and PCA does not always produce several distinguishable clusters. The semi-automatic methods have a combination of the above problems.

    The next step in these methods is to identify the sets of waveforms that best represent each of the clusters, usually referred to as cluster cutting. A user can manually split the feature space into clusters by defining boundaries, but this technique is normally inefficient and time-consuming. An alternative is to define the clusters automatically, by identifying the closest set of spike waveforms to each template waveform (mean) using a distance measure. This forms a set of implicit boundaries that separate the clusters. Clustering using an automated method normally ignores the clusters' distributions, and thus classifies a high percentage of the dataset correctly only if the clusters are not close and have a spherical distribution. However, the likelihood of finding such clusters in real electrophysiological data is minimal. This is because the differential noise levels attributed to the waveforms create differentially distributed clusters.

    The technique proposed in this paper has the following three objectives, resolving some of the problems stated above (see also Lewicki, 1998):

    1. To form distinguishable clusters (where each represents a biologically plausible firing rate) by appending extra feature components, which describe the geometrical structure of the waveform, to the PCA components.
    2. To identify, automatically, the number of clusters within the feature space.
    3. To reduce the number of waveforms classified to the incorrect groups.

    We deal with these objectives by extending a version of an unsupervised neural network (Kohonen), resulting in a fully automated process, which is a vital consideration when dealing with multiple electrode recordings.

    The first part of our spike sorting methodology pre-processes the waveform data using a new combined approach. First, we acquire the PCA components and, if necessary, append to them features that describe the waveform's geometrical shape. The use of additional features, and their number, is dependent on the number of clusters formed and whether they represent neurons with a plausible firing rate.

    The next part of the process identifies the number of clusters within the feature space, using a proposed extension to the Kohonen method. The standard Kohonen network requires several outputs, which correspond to the number of clusters (neurons) within the feature space. As we would have no prior knowledge of this, we acquire a sufficient number of outputs by calculating the maximum number of neurons that could feasibly be in the dataset and multiplying it by an arbitrary number.¹ Thus, after training, a subset of the outputs will represent every cluster and one will be nearer the centre (denser area). We subsequently identify these outputs by comparing the distributions that each represents. Consequently, a set of non-central nodes arises which we use to reduce misclassification (next stage).

    The final part of our proposed process (second extension) is to define the waveforms belonging to each cluster using all the nodes, i.e. central and non-central nodes. We first move, i.e. space out, the non-central nodes around and towards their cluster's edge. We subsequently use them with the central nodes in the classification process, where implicit boundaries are formed to contain every cluster's size and shape. This resolves the problem of misclassification; if several clusters are in close proximity and one has a wider distribution, an implicit boundary would encompass this area, reducing the likelihood of misclassification.

    We test the new approach using datasets acquired from the rat olfactory bulb and the sheep temporal cortex. We demonstrate that our method out-performs others, and show that these processes can extract artifactual waveforms from the data, limiting inaccuracies in the analysis of the results.

    2. Extracting experimental spike data

    The spikes used in our spike sorting process were extracted from the continuous electrophysiological trace sampled at an electrode when the signal exceeded a given threshold (2 × background noise). Each spike was sampled for a period spanning 0.4 ms preceding, to 1.25 ms after the threshold was crossed. For each channel of data, the width of the spikes collected was identical throughout the period of recordings.

    Each spike collected comprised 48 sample points (t = 1, ..., 48), the signal crossing the spike-triggering threshold at t = 11. We show an example of a spike waveform in Fig. 6 (bottom panel).

    3. Spike sorting pre-processing method

    3.1. Feature extractions

    The first stage of our methodology is to extract the features that differentiate spikes, thus revealing clusters within the feature space. Principal component analysis (PCA) will automatically extract these features (Bishop, 1995; Csicsvari et al., 1998; Jolliffe, 2002). However, it appears that this is not always reliable and can produce results which are not biologically plausible. For example (bottom panel, Fig. 5), PCA formed one cluster suggesting that one neuron was recorded with a firing rate >200 Hz. However, the high firing rate suggests that multiple neurons were recorded.

    ¹ It is not compulsory to calculate the number of outputs using the maximum number of neurons, as using an arbitrary large number would be sufficient. However, using our method identifies a minimal number of outputs to use, thus improving efficiency.
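The extraction step of Section 2 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `extract_spikes`, the use of upward crossings only, and the rule for skipping overlapping windows are all assumptions. The text does not say how background noise is estimated, so the threshold is passed in by the caller.

```python
import numpy as np

def extract_spikes(trace, threshold, pre=10, n_samples=48):
    """Cut fixed-width windows around upward threshold crossings.

    With pre=10 samples kept before the crossing, the crossing falls at
    t = 11 (1-indexed) of a 48-point window, as described in the text.
    """
    trace = np.asarray(trace, dtype=float)
    above = trace > threshold
    # Index of each sample that is above threshold while its predecessor is not.
    crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1
    spikes = []
    last_end = -1
    for c in crossings:
        start, end = c - pre, c - pre + n_samples
        # Assumption: drop windows that run off the trace or overlap the previous one.
        if start < 0 or end > trace.size or start <= last_end:
            continue
        spikes.append(trace[start:end])
        last_end = end - 1
    return np.array(spikes).reshape(-1, n_samples)

# Hypothetical usage: a flat trace with one brief deflection, noise level taken as 1.0.
trace = np.zeros(200)
trace[100:103] = 5.0
spikes = extract_spikes(trace, threshold=2.0 * 1.0)
print(spikes.shape)
```

Each row of the result is one 48-point waveform, which is the input assumed by the pre-processing stage.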
  • 3. To resolve this, we append extra feature components, which describe a waveform's geometrical shape as a set of convex and concave curves, to the corresponding PCA set. To achieve this, we divide every waveform into C sections along the time axis t and calculate C components (curvature values), i.e. a component for each section, where each describes the curve shape. Thus, the process forms distinguishable clusters, as different combinations of curve shapes constitute different waveforms.

    The number of extra components C to use relates to the current clusters and their associated firing rates. Therefore, we start with C = 1 and continually increase it by 1, until each cluster within the feature space represents a firing rate below a threshold level. In other words, we increase the number of curve shapes to describe the waveforms, thus, dividing the current clusters into smaller ones.² Consequently, the pre-processing method identifies a minimal number of extra components to use, thus improving the efficiency of the spike sorting process, as we require less memory.

    We achieve our proposed pre-processing method by implementing the following:

    1. Smooth each waveform using the moving average method: We used an arbitrary filter length of 5.
    2. Initiate the extra number of components to 0: i.e. C = 0.
    3. Transform the smoothed data using PCA: Exclude components that contribute to <0.5% variance.
    4. Calculate a set of curvature scores curv for every smoothed spike waveform (explained in more detail below).
    5. Identify the number of clusters formed using the current sets of components: We identify and define the clusters using the clustering stage described in the next section.
    6. Quit process: if every cluster corresponds to a firing rate < threshold level.
    7. Increase C by 1.
    8. Calculate C new extra components for every waveform: We divide a curv set into C consecutive subsets, which partitions the corresponding waveform into C sections, and calculate the C components by averaging the subsets. We discuss this in detail below.
    9. For every waveform, replace the previous extra components by appending the C new ones to the corresponding PCA set.
    10. Repeat steps 5–9.

    3.1.1. Calculating a set of curvature scores curv for a spike waveform

    A curv set contains a curvature score for every point t along a spike waveform, where each score describes the curve shape and its curvature, i.e. how curved it is.

    To calculate a curvature score at point t, we first acquire the values of the voltage (V) at t − 1, t and t + 1. We then approximate the first and second derivatives at t, using the three values with the central difference approximation, and use them with the equation below:

    $$\mathrm{curv}(t) = \frac{V''}{\left(1 + V'^{2}\right)^{3/2}} \qquad (3.1)$$

    where V' and V'' are the first and second derivatives at point t. For further information see (Rutter, 2000).

    We do not calculate the curvature score for the first and last points of the waveform, as they do not have two adjacent points. For example, if there were 48 sampling points we would obtain 46 curvature scores.

    3.1.2. Calculating C extra components for a spike waveform

    We acquire C extra components for a waveform by averaging C consecutive subsets of the corresponding curv set. We define the size of a subset using S = A/C, where A is the number of scores in curv.

    To acquire the extra components we:

    1. Calculate the ith component using one of the following:
       If (i == 1): avg(curv(1 ... S))
       Else: avg(curv((i − 1) ∗ S ... S ∗ i))
       For example, if C = 4 and a, b, c, d represent the four components, respectively, then the components would equal:
       a = avg(curv(1 ... S))
       b = avg(curv(S ∗ 1 ... S ∗ 2))
       c = avg(curv(S ∗ 2 ... S ∗ 3))
       d = avg(curv(S ∗ 3 ... S ∗ 4))
    2. Round each new component to a −1 or 1, if the value is < 0 or > 0, respectively: We implement this to extract the curve shape information, i.e. < 0 and > 0 refer to a concave and convex shape, respectively.

    4. Spike sorting clustering method

    4.1. Simulated data

    To be able to explain this methodology we simulated sets of feature components, where a set of three components (C1 − C3) represents a waveform.³,⁴

    Throughout the paper we assume that noise attributed to the neuronal recordings, used with our spike sorting process, is Gaussian. Therefore, the clusters within the simulated datasets represent Gaussian distributions.

    We acquired a simulated set containing two clusters (shown in Fig. 1 (top left panel)), by selecting two sets of three values, where each set represents the centre and position of

    ² This approach is similar to support vector machine algorithms, as this also involves incrementing the number of dimensions until several clusters are formed.
    ³ We use three components, i.e. dimensions, throughout this paper for explanation, and presentation, purposes. The entire method is applicable to many more components, as in practice three dimensions would not reveal separable clusters within the feature space. At the end of the paper, we show the results of applying the method to >three dimensions.
    ⁴ We refer to a feature component set as a datapoint throughout the paper.
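The curvature features of Sections 3.1.1 and 3.1.2 can be sketched as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, a unit time step between samples is assumed, and the C sections are taken as non-overlapping, near-equal slices (the index ranges in the text share their boundary points). Smoothing and PCA are omitted.

```python
import numpy as np

def curvature_scores(waveform):
    """Eq. (3.1) at every interior point, using central differences."""
    v = np.asarray(waveform, dtype=float)
    d1 = (v[2:] - v[:-2]) / 2.0            # first derivative V'
    d2 = v[2:] - 2.0 * v[1:-1] + v[:-2]    # second derivative V''
    return d2 / (1.0 + d1 ** 2) ** 1.5     # one score per interior point

def curve_shape_components(curv, c):
    """Average C consecutive sections of curv and keep only the sign."""
    sections = np.array_split(curv, c)     # assumption: non-overlapping sections
    means = np.array([section.mean() for section in sections])
    # The text leaves a mean of exactly 0 unspecified; np.sign maps it to 0.
    return np.sign(means)

# Hypothetical usage: one sine period has a negative-curvature first half
# and a positive-curvature second half.
t = np.linspace(0.0, 2.0 * np.pi, 48)
curv = curvature_scores(np.sin(t))
print(len(curv), curve_shape_components(curv, 2))
```

A 48-point waveform yields 46 scores, matching the count given in Section 3.1.1, and the resulting ±1 components would be appended to that waveform's PCA components.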
  • 4. a cluster. We then created 500 feature descriptions, i.e. 500 C1 − C3 values, by adding random Gaussian noise to the starting values.

    Fig. 1. Top left: this shows the simulated data set and the initial state of the Kohonen network. The black circular markers (nodes) represent the outputs. Top right: this shows the outcome of the Kohonen network after using 250 epochs. The nodes' positions (weight vectors) are shown, where eight of them represent a cluster and two represent the denser areas, i.e. the cluster centres. The lines that connect them represent the nodes' neighbours. Bottom left: this shows that nodes n2 and n4, i.e. yellow and blue markers, respectively, represent the same cluster. The black nodes are n4's neighbours, i.e. n3 and n5. Bottom right: this shows the outcome of the Kohonen process, i.e. extensions I and II, when used with the second set of simulated data. The red nodes represent the central nodes, i.e. set R, and the nodes connected to them, with the black striped lines, represent the O sets. The black circular sphere surrounding node b, and the line through the middle, represent the full and half boundaries, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)

    4.2. Kohonen network

    The first stage of the clustering process is to locate, using a Kohonen network, the denser areas of the feature space which represent cluster centres. A detailed discussion on the Kohonen network is not in this paper, but can be found in many books, e.g. (Kohonen, 2001).

    To begin the method, we have to select several network features:

    1. Select a number of network outputs: The number chosen normally corresponds to the number of clusters within the feature space. As we would have no prior knowledge of this, we acquire a number of outputs by calculating a maximum number of neurons, which the dataset could feasibly contain, and multiply it by an arbitrary number of 3. Thus, after training, several outputs represent every cluster, where one is nearer the centre (identified in extension I), and the remainder reduces misclassification (discussed in extension II).
    2. Select the initial connection values (weights) to the outputs, i.e. select the initial node positions: We select the initial position of a node n, i.e. the weight vector W_n, at random from the range of feature component values.
    3. Select the number of training epochs: An epoch refers to every datapoint being used once to train the network. We used an arbitrary number of 250.
    4. Select an initial learning rate: This value is low and decreases at the start of every epoch. We used an arbitrary value of 0.003 and decreased it using 0.0001. These values prevent the nodes from moving around the entire feature space and restrict them to one area, i.e. the cluster they currently represent.
  • 5. 5. Select distance measure: We used the Euclidean distance measure.
    6. Extra feature: Use fairer competition. This updates the winning node's two adjacent neighbours, but only by half the amount. For example, we update nodes 1 and 3 if the winning node was 2. We implement this so all the outputs have a chance of representing a subset of the data.

    Fig. 1 (top left, and right, panels) shows the state of a Kohonen network before and after training, respectively. The network was used with nine outputs, the above features, and the simulated dataset discussed earlier.

    4.3. Extension to Kohonen network I (identifying the number of clusters)

    This section describes how we extend the Kohonen network approach to identify the number of clusters present within the feature space. We achieve this by comparing the distributions that a node and its two adjacent neighbours represent.

    Subsets of Kohonen network outputs (nodes) will represent different clusters (for an example, see the top right panel in Fig. 1) after training the network. As the density of the datapoints within a cluster decreases from the centre to the outer area, we can identify the nodes closest to a cluster centre as the ones representing a denser area compared to the others.⁵ Consequently, the set compiled can contain nodes representing the same cluster that need to be removed. Therefore, we implement two stages to identify the central nodes, which we describe briefly below:

    • Stage 1: Identify a set of central nodes D:
      1. Calculate a density score for every node: This describes the density of the datapoints that a node represents.
      2. Identify a set D of possible central nodes: We achieve this by comparing the density scores of every node and its two adjacent neighbours.

    • Stage 2: Verify that each node in D represents a different cluster:
      1. Analyse the changes in the datapoint distribution from a node, selected from D, to another.

    These stages are described in more detail in the subsequent two sections.

    4.3.1. Stage 1: identifying a set of nodes that represent potential cluster centres

    4.3.1.1. Calculating a density score for a node. The score dens describes the density that a node represents, where a low score represents a high density, i.e. a potential cluster centre, and a high score represents a low density, i.e. the outer area of a cluster. To acquire the score, we calculate the average Euclidean distance between a node's weight vector and several of its closest datapoints. The equation is shown below:

    $$\mathrm{dens}_n = \frac{\sum_{p=1}^{P} M_p}{P} \qquad (4.1)$$

    where M is the set of distances between the node's weight vector and the closest P datapoints, and n represents a node. P is an arbitrary number but < the number of datapoints within the training set. We used P = 100.

    Identify a set of central nodes. We achieve this by identifying the nodes which have a lower dens score than their two adjacent neighbours, so if

    (dens_n < dens_(n+1)) & (dens_n < dens_(n−1))

    node n represents a potential cluster centre.

    Thus, two sets D and ND are formed, which represent the clusters' centres and outer areas, respectively.

    We used this method on our simulated data, shown in Fig. 1 (top right panel), which identified a set D comprising two nodes, n3 and n7. We show all the dens scores in Table 1.

    Table 1
    The table shows that nodes n3 and n7 represent potential cluster centres

    n    dens score
    1    0.7200
    2    0.6439
    3    0.5988
    4    0.6290
    5    4.6731
    6    0.6538
    7    0.5923
    8    0.6230
    9    0.6478

    The nodes surrounding n3 and n7 have higher scores, thus represent the clusters' outer areas.

    Summary of the main variables in this section.

    1. D = set of nodes that represent potential cluster centres.
    2. ND = set of nodes that represent the clusters' outer areas.

    4.3.2. Stage 2: verifying the nodes in set D represent different clusters

    Set D can contain nodes that represent the same cluster. For example, from the dens scores in Table 2 we would conclude that nodes n2 and n4 constitute the set D, i.e. these two nodes represent different clusters. This is incorrect (see Fig. 1 (bottom left panel)). The process includes n4 in D because its two neighbours are positioned further out, causing dens_n4 < dens_n3 and dens_n4 < dens_n5. Therefore, n4 would need removing from set D, as it has a higher dens score than n2 (Table 2).

    ⁵ The Kohonen features chosen at the beginning will always result in a subset of nodes representing every cluster, where one will be nearer the centre. Additionally, the outputs will maintain their number order, therefore, comparing each node with its two adjacent neighbours is always possible.
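Stage 1 (Eq. 4.1 followed by the neighbour comparison) can be sketched as follows. The function names are hypothetical, and two details are assumptions because the text does not cover them: P is capped when fewer datapoints are available, and the first and last nodes are skipped since they have only one neighbour. The scores are those of Table 1.

```python
import numpy as np

def density_score(weight, datapoints, p=100):
    """Eq. (4.1): mean Euclidean distance from a node to its P closest datapoints."""
    dists = np.linalg.norm(np.asarray(datapoints, dtype=float) - np.asarray(weight, dtype=float), axis=1)
    p = min(p, dists.size)                 # assumption: cap P for small datasets
    return float(np.sort(dists)[:p].mean())

def central_nodes(dens):
    """Return 1-indexed nodes whose dens score is lower than both neighbours' (set D)."""
    # Assumption: end nodes have a single neighbour and are never selected.
    return [n + 1 for n in range(1, len(dens) - 1)
            if dens[n] < dens[n - 1] and dens[n] < dens[n + 1]]

# dens scores for n1..n9 as listed in Table 1.
table1 = [0.7200, 0.6439, 0.5988, 0.6290, 4.6731, 0.6538, 0.5923, 0.6230, 0.6478]
print(central_nodes(table1))  # [3, 7]
```

Every node not returned by `central_nodes` would belong to ND. The same comparison applied to the Table 2 scores selects n2 and n4, which is the situation that Stage 2 is designed to resolve.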
  • 6. Table 2
    The table shows there are two nodes, n2 and n4, that represent potential cluster centres

    n    dens score
    1    0.7898
    2    0.5678
    3    0.6898
    4    0.6459
    5    0.7061

    • Identifying whether two nodes represent the same cluster:

    To achieve this, we analyse the changes in the datapoint distribution from one node to the other. For example, the density of the datapoints would decrease if we moved from n2 (centre) to n4 (cluster outer area). Therefore, if we calculate the dens score at several positions, equally apart, from n2 to n4, the scores would consecutively increase and remain < dens_n4. We show an example set of scores for four positions in Table 3 (dens scores 1).

    • Identifying whether two nodes represent different clusters:

    If we assume n2 and n4 represent different clusters, the changes in the dens scores from n2 to n4 would correspond to one of the following:

    1. The scores consecutively increase, i.e. the first few positions are within n2's cluster, and then decrease, i.e. the remainder of the positions are outside, and within, n4's cluster.
    2. One of the scores is > dens_n4, i.e. the related position is outside n4's cluster.

    We show an example set of dens scores in Table 3 (dens scores 2).

    • Identifying whether two nodes represent the same cluster using another method:

    This method analyses the number of datapoints numd contained within a boundary surrounding each position. Again, we analyse the score changes from n2 to n4, so if:

    1. The scores consecutively decrease and none is < numd_n4, we conclude that both nodes represent the same cluster. An example is shown in Table 3 (numd scores 1).
    2. The first few scores consecutively decrease and then the remainder increase, or a score is < numd_n4, we conclude that both represent different clusters. An example set of scores is shown in Table 3 (numd scores 2).

    Please note: these procedures are successful on any number of datapoints that constitute a cluster, as an area within it will be denser than the rest.

    The algorithm to identify whether two nodes represent the same cluster. We refer to this process as movingm and achieve it by implementing the following.

    1. Add an extra output (node) m to the Kohonen network.
    2. Update node m to move from a start node sn to a destination node dn in T steps (positions), i.e. W_m = W_sn to W_m = W_dn.
       (a) Calculate the dens_dn (using Eq. 4.1) and numd_dn scores (we explain how to calculate numd below).
       (b) Calculate the dens_m and numd_m scores for every new position.
       (c) For every G consecutive positions m moves, we:
           (i) Average the G dens_m and numd_m scores corresponding to the G positions and append them to the densavg_m and numdavg_m sets, respectively. We analyse the average of these scores to improve reliability, i.e. the process ignores any noisy datapoint fluctuations within the clusters.
           (ii) Analyse the newly calculated averages using the densmjourney and numdmjourney procedures. These identify whether sn and dn represent the same cluster (explained below).
       (d) Quit process: when densmjourney or numdmjourney satisfy a condition before W_m = W_dn. Otherwise repeat steps (b) and (c).

    T and G represent arbitrary values, but the greater the values the more accurate the result. For this paper G = 5 and T = 50.

    In Fig. 2, we show a graphical example of movingm, using the data from Fig. 1 (top right panel).

    Calculating the numd score

    The numd score for a node n, i.e. numd_n, is the number of datapoints that satisfy dist{W_n, x} < rad, where x is a datapoint vector and W_n is the node's weight vector. In other words, numd_n is the number of datapoints within the boundary rad surrounding n.

    The boundary rad is the same for both m and dn. To acquire this value, we do the following:

    1. Identify two nodes in set ND, n1 which is the closest to node sn, and, n2 which is the closest to node dn.

    Table 3
    This table shows the distribution of the datapoints for four consecutive positions between n2 and n4, using the dens and numd scores

    Positions        0 (n2)   1        2        3        end (n4)
    dens scores 1    0.5678   0.5873   0.6068   0.6264   0.6459
    dens scores 2    0.5678   0.7874   0.8068   0.7564   0.6459
    numd scores 1    50       38       34       28       21
    numd scores 2    50       12       0        16       21

    From rows 1, 3 and 2, 4 we conclude that both nodes represent the same or different clusters, respectively.
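The decision rules illustrated by Table 3 can be sketched as follows. This covers only the simple monotone case described in the bulleted rules above, with hypothetical function names. It ignores the averaging over G positions and any case where m first crosses a cluster centre, so it is a simplification rather than the full movingm procedure.

```python
def same_cluster_by_dens(scores, dens_dn):
    """True if dens scores rise at every step and all stay below the destination's score."""
    rising = all(a < b for a, b in zip(scores, scores[1:]))
    return rising and all(s < dens_dn for s in scores)

def same_cluster_by_numd(scores, numd_dn):
    """True if numd scores fall at every step and all stay above the destination's score."""
    falling = all(a > b for a, b in zip(scores, scores[1:]))
    return falling and all(s > numd_dn for s in scores)

# Rows of Table 3: positions 0..3, with the destination (n4) score passed separately.
print(same_cluster_by_dens([0.5678, 0.5873, 0.6068, 0.6264], 0.6459))  # True
print(same_cluster_by_dens([0.5678, 0.7874, 0.8068, 0.7564], 0.6459))  # False
print(same_cluster_by_numd([50, 38, 34, 28], 21))                      # True
print(same_cluster_by_numd([50, 12, 0, 16], 21))                       # False
```

Rows 1 and 3 of the table give "same cluster" and rows 2 and 4 give "different clusters", in line with the note under Table 3.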
  • 7. 2. Calculate the following:
       (a) dist{W_sn, W_n1}/2
       (b) dist{W_dn, W_n2}/2.
       rad is the lowest of these values.

    This provides a rad value large enough to only analyse datapoints within the sn and dn clusters at the start and end of node m's journey, respectively. Otherwise, including the surrounding clusters' datapoints could influence the numd_m scores and produce an incorrect conclusion.

    Fig. 2. This shows the 10th update of node m (black marker), which has moved from sn (right blue marker) towards the dn node (left blue marker). The red striped line is the path m (black marker) will make across the feature space. Therefore, the line represents the total set of positions m can move to, where every position corresponds to a different dens_m and numd_m score. Using m to represent every position is computationally inefficient, therefore, we move m T positions, equally spaced, to make the journey. We used an arbitrary number of 50. We removed all the datapoints from each cluster, so we can clearly see node m. Left: this shows the closest 100 datapoints (connected with a set of lines) used to calculate dens_m. Right: this shows the number of datapoints used to derive numd_m, i.e. the datapoints within the spherical boundary (rad) surrounding m. The sphere surrounding the light blue marker represents the boundary around dn. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)

    4. The densmjourney and numdmjourney procedures

    The procedures densmjourney and numdmjourney analyse the densavg_m and numdavg_m scores, respectively, using three conditions (described below⁶), which determine whether sn and dn represent the same cluster.

    Both sets of scores will satisfy the same condition. However, we analyse both to improve efficiency, as one will satisfy the condition earlier than the other set and consequently stop the process. It is impossible to manually choose this set, as the clusters' positions and distributions determine it.

    Set of conditions:

    • Condition 1: if fulfilled, this states that both nodes represent the same cluster.

      The condition is satisfied if:

      densmjourney:
        (densavg_m(a − 1) < densavg_m(a)) & (densavg_m(a) < dens_dn)
      or
      numdmjourney:
        (numdavg_m(a − 1) > numdavg_m(a)) & (numdavg_m(a) > numd_dn)

      i.e. m moved from the cluster centre towards the outer area where dn is positioned.

      For example, both nodes would be in the same cluster if m produced densavg_m scores of 0.6, 0.7, 0.8, 0.9, 1 where dens_dn = 1.1, or, numdavg_m scores of 40, 38, 37, 34, 30 where numd_dn = 25.

    • Condition 2: if fulfilled, this also states that both nodes represent the same cluster.

      The condition is satisfied if:

      densmjourney:
        (1) First part of journey: (densavg_m(a − 1) > densavg_m(a)) & (densavg_m(a) < dens_dn)
        (2) Second part of journey: (densavg_m(a − 1) < densavg_m(a)) & (densavg_m(a) < dens_dn)
      or
      numdmjourney:
        (1) First part of journey: (numdavg_m(a − 1) < numdavg_m(a)) & (numdavg_m(a) > numd_dn)
        (2) Second part of journey: (numdavg_m(a − 1) > numdavg_m(a)) & (numdavg_m(a) > numd_dn)

    ⁶ We use values derived from simulations to help describe these conditions.
  i.e. node m moved across the cluster centre (1) towards the outer area where dn is positioned (2).
  For example, both nodes would be in the same cluster if m produced $densavg_m$ scores of 0.5, 0.4, 0.6, 0.7, 0.8 where $dens_{dn} = 0.9$, or $numdavg_m$ scores of 28, 30, 32, 27, 21 where $numd_{dn} = 10$.

• Condition 3: if fulfilled, this states that both nodes represent different clusters.
  The condition is satisfied if:
  densmjourney:
    (1) First part of journey: $densavg_m(a-1) < densavg_m(a)$
    (2) Second part of journey: $densavg_m(a-1) > densavg_m(a)$
  or
  numdmjourney:
    (1) First part of journey: $numdavg_m(a-1) > numdavg_m(a)$
    (2) Second part of journey: $numdavg_m(a-1) < numdavg_m(a)$
  i.e. node m moved towards the outer area of sn's cluster (1) and then towards another cluster where dn is positioned (2).
  For example, both nodes would be in different clusters if m produced $densavg_m$ scores of 0.5, 0.6, 0.7, 0.65, 0.55 where $dens_{dn} = 0.5$, or $numdavg_m$ scores of 30, 25, 20, 22, 28 where $numd_{dn} = 36$. Condition 3 is also satisfied if $densavg_m(a) > dens_{dn}$ or $numdavg_m(a) < numd_{dn}$.

4.3.2.2. Identifying the central nodes from D. We form a new set R from set D, which contains the nodes that represent different clusters. To achieve this, we use the above procedures and implement the following:

1. Acquire a set SD by sorting the nodes in D into ascending dens order.
2. Identify the nodes from SD that represent the same clusters.
   (i) Select the sn node from SD that has the lowest dens score.
   (ii) Append sn to R.
   (iii) Remove sn from SD.
   (iv) Use the procedure movingm with sn and every node in SD, where SD corresponds to a set of dn nodes. If the outcome of densmjourney or numdmjourney for a dn node is condition 1 or 2:
        (a) Remove dn from SD.
        (b) Append dn to ND.
   (v) Repeat steps (i)–(iv) until SD is empty.

As we may use the nodes in the SD set more than once, several rad values (used with the movingm procedure) may be associated with each. Therefore, we record the lowest rad score for every node, and use these values for subsequent analyses throughout the paper.

4.3.3. Summary of the main variables in this section

1. R = set of nodes that represent the cluster centres.
2. ND = set of nodes that represent the clusters' outer areas.

4.4. Extension to the Kohonen network II (reducing misclassification)

This section describes a further extension to the Kohonen network, which uses the nodes in the R set and the nodes in the ND set to reduce misclassification. Using only the central nodes in the classification process can result in implicit boundaries that do not separate the clusters sufficiently, thereby causing misclassification. We show an example of this in Fig. 3 (bottom right panel), where a subset of the larger cluster would be misclassified as it passes the boundary. Using the extra nodes, and moving (updating) them towards the clusters' edges, repositions the implicit boundaries between the central nodes to enhance cluster separation (Fig. 3, bottom left panel).

To achieve a reduction in misclassification we implement the following:

1. The identification process: Associate a set of nodes O, acquired from the ND set, with every node in the R set, where a node from R and its corresponding O set represent the same cluster.
   Using extra nodes in the classification process may not reduce misclassification greatly, as the first stage (Kohonen training) may not distribute the nodes to represent every cluster's area sufficiently. Therefore, we move the non-central nodes (sets O) further into and around their cluster's outer area.
   To achieve this, we use a second set of epochs (for this paper we used an arbitrary number of 250) and for every epoch we implement the following:
2. The pushing process: At the start of every epoch, we push a set of nodes, which contains a node from every O set, further towards their cluster's edge.
3. The dispersing process: For the rest of the epoch, we use the Kohonen training process to disperse the outer nodes (sets O) around their cluster.
   We designed these two techniques to slowly push and space out the nodes in an O set, preventing the confinement of node subsets to one area. Therefore, an entire cluster area, regardless of shape and size, is sufficiently represented. To visualise this extension, we show an example of the pushing and dispersing stages in Fig. 3 (top and middle panels).
   Once these stages are complete, we classify each of the datapoints using the new classification process described below.
4. The classification process: Classify each datapoint to a node in set R. We identify the closest node to a datapoint
   using the Euclidean distance measure. The identified node is either from an O set, in which case we classify the datapoint to the corresponding node in R, or from set R, in which case we classify the datapoint to the identified node.

We explain the first three processes in more detail below.

Fig. 3. Top two panels, from top left to bottom right: the results of the pushing and dispersing stages, showing that the process has been continuously pushing and spreading out the clusters' outer nodes, i.e. sets O, over several epochs. Bottom panel, left: the result of the SOMA process when used with all 10 nodes, where the implicit boundary (black line) formed from the outer nodes (sets O) improves the cluster separation. Bottom panel, right: the result of the SOMA process when only using the central nodes, i.e. K-means. The boundary (black line) formed between them does not sufficiently separate the clusters.

4.4.1. The identification process

To identify a set of nodes O for every node in R, we have to:

1. Identify the node in R to which each node in ND is closest. Therefore, we form a set of nodes Q for every node in R. There is a high chance that the nodes in set R and the nodes in their corresponding Q set represent the same cluster.
2. Verify that every node in R and its corresponding Q set represent the same cluster: we achieve this using the movingm procedure (extension I, stage 2) to verify that the nodes sn, which is a node from R, and dn, which is a node from the corresponding Q set, represent the same cluster, so:
   (i) We append dn to the corresponding O set when the outcome of either the densmjourney or numdmjourney procedure is condition 1 or 2.
   (ii) If dn satisfies condition 3 we test it against the other nodes in R. We ignore it if every outcome produced is condition 3.

To derive the dens and numd scores for dn and m, we only use the datapoints positioned between sn and dn, i.e. sn's cluster. This is to prohibit the use of datapoints from other clusters, i.e. behind dn, which may affect the scores and produce incorrect conclusions. We describe the modifications to the movingm procedure below.

Modifications to the movingm procedure.

Calculating the $numd_{dn}$ score
To derive the $numd_{dn}$ score, we implicitly divide the spherical boundary, i.e. rad, surrounding dn (outer node) through the middle into two halves, where the division line is perpendicular to sn (central node). Thus, the number of datapoints within the half sphere facing sn constitutes the score. We show an example of this in Fig. 1 (bottom right panel). We ignore the datapoints in the other half, i.e. behind dn, as these could be within another cluster and consequently produce an incorrect conclusion. For the example in Fig. 1 (bottom right panel), if we used all the datapoints within the spherical boundary for dn's score, the process could conclude that sn and dn(b) represent different clusters, i.e. a subset of $numd_m$ scores could satisfy $numdavg_m(a) < numd_{dn}$.
As dn moves around the cluster (part of a later process), the division used to form the half sphere changes with respect to sn. Therefore, the datapoints used are always within the correct cluster.
To identify whether a datapoint's position is within the correct half of the boundary, we implement the following:

1. Calculate an angle M that identifies whether datapoint x is positioned between sn and dn: we calculate angle M using $W_{sn}$, $W_{dn}$ and x (datapoint vector), so if $M < 90^\circ$, x is between them (explained below).
2. Identify whether the datapoint x is also within the specified boundary rad, i.e. $dist\{W_{dn}, x\} < rad$.

Calculating the angle M
To do this, we need to rearrange the cosine law (shown below):

$m^2 = k^2 + l^2 - 2kl\cos M$    (4.2)

where $k = dist\{W_{dn}, x\}$, $l = dist\{W_{sn}, W_{dn}\}$ and $m = dist\{W_{sn}, x\}$, i.e. $\cos M = (k^2 + l^2 - m^2)/(2kl)$.

Calculating the $numd_{dn}$ score
The $numd_{dn}$ score is now the number of datapoints that satisfy:

$dist\{W_{dn}, x\} < rad$ & $M < 90^\circ$

where rad is the lowest recorded value associated with sn in stage 2 of extension I.

Calculating the $numd_m$ score
To derive the $numd_m$ score, we use the datapoints within m's half boundary facing sn. Therefore, we use the same calculations as above but replace $W_{dn}$ with $W_m$. So, the datapoints that satisfy

$dist\{W_m, x\} < rad$ & $M < 90^\circ$

constitute $numd_m$, where rad is the same value as that used for $W_{dn}$.

Calculating the $dens_{dn}$ and $dens_m$ scores
We calculate the $dens_{dn}$ and $dens_m$ scores using Eq. (4.1), where the closest datapoints used to derive the scores have to be between sn and dn, i.e. $M < 90^\circ$. The process to calculate M is in the previous section.

4.4.2. The pushing process

At the start of every epoch, we choose a node from every set O, and push them further towards their cluster's edge.
We achieve this pushing process by associating a new boundary, unrelated to the rad values, with the central nodes (R) and the outer nodes (sets O). An example of this is shown in Fig. 4 (bottom panel). We retain the central nodes within the cluster centres, but increase their boundaries. Therefore, the process pushes an outer node if its boundary and the corresponding central node's boundary overlap. This process continues until all the outer nodes (O sets) represent an area near their cluster's edge.
As clusters can be of various shapes and sizes, we have to push the nodes either a few or many times. For example, if a cluster had an elongated spherical shape we would need to push some nodes further out, to represent the edges of the elongated sections, than the nodes that represent the reduced sections. To achieve this, we associate a set of boundaries with every node r in R, which correspond to the nodes in r's O set. We implement the pushing process in this way because we exclude the datapoints positioned within the central nodes' boundaries from updating the outer nodes in the dispersing stage. Thus, a node's movement is confined to its cluster's outer area.
To achieve this pushing process we:

1. Associate a set of boundary values bound with every central node r in R, where the number of boundaries in $bound_r$ is equal to the number of nodes in $O_r$;
   (i) All the values in a $bound_r$ set are initially the same, and are calculated using $dist\{W_r, W_g\}/2$ at the beginning of extension II. $W_g$ is the weight vector (node) from $O_r$ closest to node r.
2. Associate a boundary value bnd with every node in the O sets: We gave an arbitrary value of 1 to all nodes.
3. Identify and push out a node from every set of O: We implement this at the start of every epoch. This is described in more detail below.

Identifying and pushing a node from an O set. To achieve this, we:

1. Identify a node g from set $O_r$, which is closest to r.
2. Quit the process if g satisfies condition 1, 2 or 3, explained in the next section.
3. Increase the boundary for r associated with g using:

   $bound_{rg} = dist\{W_r, W_g\} - bnd_g + \kappa$    (4.3)

   where $\kappa$ adds a small amount to the boundary. We use a small amount to produce a slight overlap between r's and g's boundaries. We used an arbitrary value of 0.01.
4. Update node g's weight vector, i.e. push out node g, repeatedly, until g no longer satisfies:

   $dist\{W_r, W_g\} < bound_{rg} + bnd_g$

   using:

   $W_g = W_g - \lambda_2 (W_r - W_g)$    (4.4)

   where $\lambda_2$ is a small arbitrary amount to push the node. We used 0.1.

We attribute small values to both $\kappa$ and $\lambda_2$ to move the nodes out a small distance. Therefore, the nodes have a higher chance of spacing out around the clusters in the dispersing stage. Otherwise, the process may push a subset of the nodes too far into an area that confines them.

Fig. 4. Top left: the result of the process when we used 10 nodes. All of the nodes were used to represent either the centre or the outside area of the clusters. Top right: the datapoints towards which node b (from an O set) can move, i.e. those between the boundaries: line (x), which is perpendicular to the central node r, and ($bound_{rb}$), the circle surrounding the central node. These boundaries change over time, i.e. ($bound_{rb}$) increases in size, and line (x) changes with respect to r, as b moves around the cluster's outer area. Any changes in the boundaries define new datapoints, which the dispersing process temporarily excludes. If node b moved to the cluster's right side its line boundary would change, e.g. line (y). Bottom: the boundaries used in the pushing process for node b and the central node. When both boundaries overlap, the process pushes b out. The boundary surrounding the central node increases over time, pushing b further out until a stopping condition has been satisfied, i.e. b is near the cluster's edge.

Verifying a node's new position. We have to verify that the process will not push the chosen node g out of its own cluster or into another cluster. To achieve this, we do not increase $bound_{rg}$ if g satisfies one of the three conditions described below.
The first two conditions analyse the $numd_g$ score over subsets of consecutive pushes. The final condition examines node g's new position, i.e. is it within another cluster?
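The boundary update and push of Eqs. (4.3) and (4.4) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' Matlab implementation: the names `dist` and `push_node`, the list-based weight vectors and the `max_pushes` safety cap are hypothetical, and the three stopping conditions are deliberately left out. The defaults for κ, λ2 and bnd_g are the arbitrary values quoted in the text.

```python
import math


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def push_node(w_r, w_g, bnd_g=1.0, kappa=0.01, lam2=0.1, max_pushes=1000):
    """Push outer node g away from central node r (hypothetical helper).

    Returns the new weight vector of g, the enlarged boundary bound_rg
    and the number of pushes applied.
    """
    # Eq. (4.3): enlarge r's boundary so that it slightly overlaps g's.
    bound_rg = dist(w_r, w_g) - bnd_g + kappa
    pushes = 0
    # Keep pushing while the two boundaries still overlap.
    # max_pushes is an added guard (assumption) for the degenerate case
    # w_g == w_r, where Eq. (4.4) cannot move the node.
    while dist(w_r, w_g) < bound_rg + bnd_g and pushes < max_pushes:
        # Eq. (4.4): move g a small step directly away from r.
        w_g = [g - lam2 * (r - g) for r, g in zip(w_r, w_g)]
        pushes += 1
    return w_g, bound_rg, pushes


if __name__ == "__main__":
    new_w_g, bound_rg, pushes = push_node([0.0, 0.0], [2.0, 0.0])
    print(new_w_g, bound_rg, pushes)
```

Because each push scales the distance between r and g by (1 + λ2), a single push is usually enough to clear the small overlap κ, which is consistent with the stated aim of moving nodes out only a small distance per epoch.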
Calculating the numd score for every push

The $numd_g$ score is the number of datapoints that satisfy:

$dist\{W_g, x\} < rad_r$ & $M < 90^\circ$

where x is the vector of a datapoint. We associate the central nodes' rad values, i.e. the values recorded in extension I stage 2, with the nodes in their corresponding O set, i.e. g's rad value is the same as that of its corresponding central node. We calculate M by rearranging Eq. (4.2), where $k = dist\{W_g, x\}$, $l = dist\{W_r, W_g\}$ and $m = dist\{W_r, x\}$.

Set of conditions to analyse node g

• Condition 1: this condition determines whether node g currently represents an area near the edge of its cluster.
  We can determine this in two ways:
  1. If g satisfies $numd_g < \alpha_g$, the node represents an acceptable cluster area.
     (i) We calculate $\alpha_g$ by dividing the $numd_g$ value of the first push by an arbitrary number of 3.
  2. If g satisfies $numd_g = 0$, the node represents an empty region of the feature space, i.e. it is outside the cluster.
  If g's cluster overlaps another, condition 1 may fail to detect whether g is near the edge. Therefore, the process would continually push g into the wrong cluster. To prevent this, we also analyse g's position using two other conditions.

• Condition 2: this condition determines whether the process is pushing node g into another cluster.
  To test for this we:
  1. Average every two (an arbitrary choice) $numd_g$ scores and append the averages to navg.
  2. Use the averages to identify whether the process has pushed g into another cluster, i.e. g has been pushed into another cluster if:
     $navg_g(p-1) < navg_g(p)$
     where p is the current average. In that case, we move g to a position within the correct cluster and cease pushing it.

• Condition 3: this condition determines whether g's new position np is within another cluster, i.e. between a central node f and a node from its corresponding O set.
  To test for this we:
  1. Identify the node from each set of O, except g's O set, which is closest to the new position np. Thus, a set of nodes Y is formed.
  2. Use this set to determine whether np is within another cluster, i.e. g's new position is in another cluster if:
     $dist\{W_f, np\} < dist\{W_f, W_h\}$
     where $W_h$ is a weight vector (node) from Y, and $W_f$ is the weight vector of the corresponding central node from R.

Both conditions 2 and 3 have disadvantages; however, using the conditions simultaneously negates them. Condition 3 can determine whether node g is moving towards another cluster earlier than condition 2. However, it will only determine this successfully when the outer nodes (O sets) of that cluster reach their cluster's edge before node g.

4.4.3. The dispersing process

For the rest of the epoch, we disperse all non-central nodes (sets O) around their cluster's outer area.
To achieve this, we use and modify the Kohonen training process, so:

1. We no longer use fairer competition.
2. A winning node is from one of the O sets.
3. If the current datapoint is not within the centre or outside the winning node's cluster (explained below) we move (update) the winning node.

Identifying and moving the winning node. To achieve this, we:

1. Identify a winning (closest) node d for every datapoint x.
2. Do not update d using x if x is:
   (a) Within the boundary of the corresponding central node r, i.e. $bound_{rd}$.
   (b) Behind the winning node d with respect to r, i.e. if $M > 90^\circ$. We rearrange Eq. (4.2) to calculate M, where $k = dist\{W_d, x\}$, $l = dist\{W_r, W_d\}$ and $m = dist\{W_r, x\}$.
   These constraints confine d's movement to its cluster's outer area. Therefore, we update $W_d$, i.e. move d, using x if:
   $dist\{W_r, x\} \geq bound_{rd} + bnd_d$ & $M \leq 90^\circ$
   Additionally, we add $bnd_d$ to prevent r's and d's boundaries overlapping.

Fig. 4 (top right panel) shows an example of the datapoints that the process excludes from updating a node b (left cluster).

5. Noise extraction

Electrical noise is almost invariably incorporated into electrophysiological data. This often originates from electrical equipment, e.g. from electrical actuators for delivering stimuli. Such signals may exceed the spike-triggering threshold and may resemble spikes in their form and amplitude. They are often difficult to exclude from electrophysiological datasets. Here we introduce a process for extracting waveforms that have been caused by noise, i.e. artifacts, from the spike waveform datasets, so they do not compromise the results of subsequent analyses.
To achieve this extraction we need to construct a manually selected set of noise waveforms for each channel of data, and then implement the following:

1. Preprocess the spike waveform dataset.
2. Use the clustering stage to identify several clusters.
3. Use the classification stage to define the clusters.
4. Acquire a set of weights: We obtain an average spike waveform from every cluster, using the raw data. The noise waveforms and the average spike waveforms constitute our set of weights.
5. Transform the set of weights and the spike waveform dataset again, using our pre-processing method: This time we append the full set of unmodified curv scores, i.e. not averaged or rounded, to each of the PCA component sets. This is because the artifacts are normally very different, and will form distinguishable clusters without any modifications.
6. Use the classification process with the data from step 5 and the transformed weights: We remove the clusters associated with the artifacts.

We show an example set of noise waveforms in Fig. 5 (top left panel).

6. Results generated from using the spike sorting process on simulated data

To test the efficiency of our process, we created two simulated datasets. The first contained three clusters, which were positioned apart and had similar distributions. The second contained two closely positioned clusters, where one had a wider distribution. We used SOMA on these datasets with an arbitrary number of nodes. We used 10 as this exceeded the number of clusters. The analysis on both datasets using all 10 nodes with SOMA appeared more effective and reliable than the method based on K-means (using SOMA with only the central nodes).
SOMA classified 100% of the first dataset correctly and 85% of the second when used with only the central nodes. The implicit boundary formed for the second dataset could not separate the clusters sufficiently (shown in Fig. 3, bottom panel, right). However, when all 10 were used (shown in Fig. 3, bottom panel, left), 98% were correctly classified.
Second, we examined how the classification results of SOMA are affected when we vary the number of nodes used. We assumed that increasing the number of nodes improves the classification result, especially for large or closely positioned clusters. To test this assumption we first used SOMA, with a constant number of nodes (in this case 10), on several datasets. Each dataset contained three clusters, which varied in size and position. We found that if the clusters, independent of size, were positioned apart the process could still classify 98% of these datasets correctly. This is because an early stage in the process distributes the nodes appropriately. For example, the process uses more nodes to represent larger clusters (shown in Fig. 3, top left panel), or shares them out equally to represent clusters with similar distributions (shown in Fig. 4, top left panel). The result decreased to 92% when we used the process on close and slightly overlapping clusters. The result improved by 6%, i.e. cluster separation was enhanced, when we increased the number of nodes; however, using too many can be computationally expensive.

7. Applications to experimental data

7.1. Datasets

Electrophysiological data was acquired from two animal systems, the olfactory bulb (OB) of anaesthetized rats, and the inferotemporal cortex of awake behaving sheep. All experimental procedures involving animals were conducted in strict accordance with the Animals (Scientific Procedures) Act, 1986. Rats were anaesthetized (25% urethane, 1500 mg/kg) and fixed in a stereotaxic frame. A craniotomy was performed and the left eye enucleated to expose the left OB and allow a 30 channel MEA (6 × 5 electrodes) to be positioned laterally in the mitral cell layer of the region. Throughout the recordings, humidified air was supplied to the rat via a mask over the nose (these procedures are described in greater detail elsewhere, Christen et al., in press; Fischer et al., submitted; Horton et al., 2005). Sheep, under halothane (fluothane) anaesthesia, were implanted with two chronic 64-channel MEAs in the right and left inferotemporal cortices, respectively. Recordings were made from the unanaesthetised sheep while they performed an operant discrimination task in which different pairs of sheep faces were presented and a correct panel-press response elicited a food reward. For both preparations, individual electrodes were fabricated from tungsten wires (125 μm diameter) sharpened to a < 1 μm tip and insulated with epoxylite. Electrode impedances were ∼200 kΩ. Neuronal activity (spikes) was sampled extracellularly from each electrode.

7.2. Application to recordings from the rat olfactory bulb

The spike data used in this section was recorded from the mitral cell layer of the OB when no odour stimulus was presented. The approximate firing rate of the mitral cells is known to be 10–50 spikes/s, and we incorporated this range into our pre-processing stage. We used SOMA with 30 outputs, unless otherwise stated, which we calculated using the method explained in the Kohonen network section.

7.2.1. Using SOMA with a dataset containing a known number of cells

In this section we used spikes recorded in two sampling sessions, using a single electrode positioned at different locations in the mitral cell layer of the olfactory bulb in a single animal. Thus, it was certain that the data was from two distinct neurons. The data was merged into a single dataset. Our objective here was to determine whether the results acquired by SOMA, using the merged dataset, corresponded to the information known.
We used SOMA with nine nodes, and from our pre-processing method we found that PCA alone, using 11 components, formed two clusters (shown in Fig. 5, top right panel) with firing rates in the correct range. SOMA identified the correct number of clusters and sufficiently distributed the nodes in each (shown in Fig. 5, top right panel). The process then implemented the classification procedure.
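The classification procedure referred to here is the one defined in Section 4.4 (step 4): each datapoint is assigned to the cluster of its nearest node, whether that node is a central node (set R) or one of the outer nodes (an O set). The following Python sketch illustrates that rule only; the function name `classify`, the dictionary layout and the toy coordinates are assumptions made for illustration and are not taken from the SOMA software.

```python
import math


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(x, central, outer):
    """Return the cluster label of the node closest to datapoint x.

    central: dict mapping a cluster label to the weight vector of its
             central node (set R).
    outer:   dict mapping a cluster label to a list of weight vectors
             of its outer nodes (the corresponding O set).
    Both containers are hypothetical stand-ins for the trained network.
    """
    best_label, best_d = None, math.inf
    # Central nodes: a win here classifies x to that node directly.
    for label, w in central.items():
        d = dist(w, x)
        if d < best_d:
            best_label, best_d = label, d
    # Outer nodes: a win here classifies x to the corresponding node in R.
    for label, nodes in outer.items():
        for w in nodes:
            d = dist(w, x)
            if d < best_d:
                best_label, best_d = label, d
    return best_label


if __name__ == "__main__":
    # Toy layout (assumed): a wide cluster A and a compact cluster B.
    central = {"A": [0.0, 0.0], "B": [10.0, 0.0]}
    outer = {"A": [[4.0, 0.0]], "B": [[11.0, 0.0]]}
    print(classify([6.0, 0.0], central, outer))  # all nodes
    print(classify([6.0, 0.0], central, {}))     # central nodes only
```

In the toy layout, the datapoint at (6, 0) lies nearer to B's central node but nearer still to an outer node of the wider cluster A, so adding the outer nodes changes its label. This mirrors the shift of the implicit boundary described for Fig. 3.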
Fig. 5. Top left: An example set of artifacts extracted from the rat olfactory bulb dataset. Top right: This shows the number of clusters formed using PCA, where components 3–5 are shown. We used nine nodes in the clustering process, where the middle nodes (green) represent the set R, and the nodes connected to them represent the O sets (black). Bottom: This shows the number of clusters formed using PCA, where components 1–3 are shown. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of the article.)

We compared the classification results SOMA produced with the two single cell datasets, and found that SOMA had classified 100% of the spikes correctly, i.e. the process associated all the spikes with the correct neurons.
We show in Fig. 6 (top panels left and right) the set of waveforms attributed to each group, and the lower panel shows the average waveform shape from each. In conclusion, SOMA identified the correct number of neurons and associated the correct number of spikes with each.

7.2.2. Using SOMA with a dataset containing an unknown number of cells

In this section, we used data recorded from the multi-electrode array (described earlier). We used SOMA with all the electrodes on the array; however, we only present the results acquired from the dataset of one electrode, which contained the spiking activity of several neurons.
The objective was to determine whether SOMA could form and identify several clusters (neurons) from the dataset. To achieve this, we first produced PCA components from the spike waveform data. These components only formed one cluster (Fig. 5, bottom panel) with an extrapolated firing rate exceeding 200 spikes/s, which is an unfeasibly high rate for a mitral cell to sustain. To resolve this, we used our pre-processing method further, which appended four curvature components to the PCA sets, creating five more clusters, each with a firing rate in the correct range. We show the groups found in Fig. 7, panels a–f. The groups can be associated with different neurons, as each group corresponds to a distinctive shape (Fig. 7, panel g). For example, the green spike rises much earlier than the red one.
On average, using Matlab and a 3.1 GHz processor, it took ∼1.5 min for the SOMA process to identify several groups and classify ∼350,000 waveforms recorded from one electrode on the array. This result varied depending on the subset size of spike
waveform data used in the training process. We used an arbitrary size of 100,000.

Fig. 6. Top left and right: These show the spike waveforms attributed to the cluster groups 1 and 2, respectively. Bottom: This shows the average waveform from both groups.

7.3. Application to recordings from sheep temporal cortex

The spike data for this section was recorded from the sheep temporal cortex using a multi-electrode array (procedure described earlier). Below, we show the SOMA results acquired when used with the data from one electrode.
We found that SOMA was similarly successful when applied to this dataset. We show the results in Fig. 8 (panels a–f) where SOMA identified the activity of six neurons using eight extra curvature components (double the number used with the rat recordings). SOMA used more components to describe the waveforms as they were more complex and thus more difficult to distinguish than the rat recordings. In Fig. 8 (panel g), we show the average spike waveform from each group.

Fig. 7. Panels a–f: in each panel, spikes generated by an individual neuron are overlaid. Panel g shows the average spike waveform for each neuron.
Fig. 8. Panels a–f: in each panel, spikes generated by an individual neuron are overlaid. Panel g shows the average spike waveform for each neuron.

We have shown, from the rat and sheep result sections, that SOMA can identify several neurons within a dataset regardless of waveform shape complexity by adapting its process according to the data presented, i.e. increasing the number of components to describe more complex waveform shapes. The process, therefore, is not restricted to work with a particular dataset and can be used with recordings from any animal or brain area, not just the ones presented here.
We have recently used SOMA on spike data recorded from the sheep temporal cortex using two 64-channel arrays (results not shown), which identified 496 neurons. SOMA, therefore, has proven to be a very useful tool in identifying large numbers of neurons, which is a critical stage if one wants to understand information processing fully within the brain. Recent studies have shown that neuron ensembles interacting together can convey more information about a stimulus (Franco et al., 2004). We discuss the analysis of our data in a forthcoming paper.

8. Discussion

A comprehensive understanding of the principles employed by the brain in processing information requires sampling the activity of many individual neurons in neural systems. We have presented here a novel approach combining several machine learning techniques to identify the number of cells, and their activity, from multielectrode recordings.
SOMA is a fully automated process that requires minimal user involvement and is computationally inexpensive; these are vital considerations when sorting through simultaneous recordings across multielectrode arrays. Thus, the process sorts noisy spike data fast and efficiently, thereby outperforming traditional spike sorting/clustering methods. SOMA achieves this by including techniques, which are normally absent, such as

1. Mapping the input space into a higher dimensional feature space with additional information on curvatures. In most, if not all, algorithms to sort spikes, only the raw data on potential traces are used. It is well known in machine learning approaches that feature extraction or pre-processing of data is as important as the algorithm itself. The curvature of a spike waveform is obviously an important feature of a spike, as clearly revealed in Figs. 7 and 8. Furthermore, a discrimination task such as spike sorting becomes easier when we correctly project our data to a higher dimension, as amply demonstrated in the support vector machine approach (Williams et al., 2005).
2. Automatically identifying the number of clusters (neurons) without giving the process any indication. Clustering algorithms such as the Kohonen algorithm require some a priori information such as the number of neurons recorded in a single electrode. The information is certainly not available and this is the bottleneck for the application of such algorithms. Various techniques such as ATR, etc. have been reported in the literature. However, SOMA developed here is completely different and the successful applications of our algorithm to real data validate our approach.
3. Analysing the clusters' distributions to enhance cluster separation, thus reducing misclassification. Many classification techniques do not analyse cluster distributions and will only classify a high percentage of spike waveforms correctly if the clusters have a spherical distribution and are clearly separated (as discussed earlier). This cluster formation is not always present in spike datasets, therefore these techniques may not produce the best results.
4. Optimizing the speed of the process by using a minimal amount of memory. SOMA achieves this by identifying and using a minimal set of feature components, and by using a subset of the data to train the neural network.

The number of outputs (nodes) used with SOMA can affect its speed or result, i.e. too many will decrease its speed, and too few will decrease the classification result. Therefore, the number
chosen needs to achieve what is most important, i.e. acquiring results quickly, acquiring the best results, or a combination of both.
In the current paper, we have successfully developed the critical technique of effectively sorting the activities of individual neurons from multiple neuron spike trains, though this technique can be applied to extracting recurring motifs from any noisy data, which we showed an example of earlier. The techniques presented here have allowed us to analyse and compare the data from two systems (rat olfactory bulb and sheep temporal cortex) across experimental variables (not presented here). This enabled us to elaborate the information flow in the recorded area, as reported elsewhere (Christen et al., in press; Nicol et al., 2005; Tate et al., 2005). For example, which cell fires first in response to stimulus presentation (an odour, a face or an object); what stimulus properties elicit the response; is a response modified by behavioural conditions; does the right hemisphere or the left hemisphere respond to the stimulus first; what are the response latencies? Our data allows us to look at the combined activity of many individual neurons, revealing much more of the information processing functions of neuronal networks than can be achieved by studying the activities of individual neurons.

Acknowledgements

J.F. was partially supported by grants from UK EPSRC (GR/R54569), (GR/S20574), and (GR/S30443). KMK was supported by a BBSRC project grant.

References

Bishop C. Neural networks for pattern recognition. 1st ed. Oxford: Clarendon Press; 1995. p. 310–318.
Christen M, Nicol AU, Kendrick KM, Ott T, Stoop R. Stabilization not synchronization: stable neuron clusters signal odour presentation. Neuroreport; in press.
Csicsvari J, Hirase H, Czurko A, Buzsaki G. Reliability and state dependence of pyramidal cell–interneuron synapses in the hippocampus: an ensemble approach in the behaving rat. Neuron 1998;21:179–189.
Fischer H, Leigh AE, Tate AJ, Kendrick KM. Neural activity patterns in sheep temporal cortex during face emotion recognition tasks. Behav Brain Funct, submitted for publication.
Franco L, Rolls ET, Aggelopoulos NC, Treves A. The use of decoding to analyze the contribution to the information of the correlations between the firing of simultaneously recorded neurons. Exp Brain Res 2004;155:370–384.
Harris KD, Henze DA, Csicsvari J, Hirase H, Buzsáki G. Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. J Neurophysiol 2000;84:401–414.
Horton PM, Bonny L, Nicol AU, Kendrick KM, Feng JF. Applications of multivariate analysis of variance (MANOVA) to multi-electrode array electrophysiology data. J Neurosci Methods 2005;146:22–41.
Jolliffe IT. Principal component analysis. 2nd ed. New York: Springer-Verlag; 2002. p. 1–10.
Kohonen T. Self-organizing maps. 2nd ed. Berlin: Springer-Verlag; 2001. p. 1–40.
Lewicki M. A review of methods for spike sorting: the detection and classification of neural action potentials. Network-Comp Neural 1998;R53–78.
Nicol AU, Magnusson MS, Segonds-Pichon A, Tate A, Feng JF, Kendrick KM. Local and global encoding of odor stimuli by olfactory bulb neural networks. Annual meeting of the society for neuroscience 2005. Program No. 476.6. 2005 Abstract Viewer/Itinerary Planner; 2005 [online].
Petersen R, Panzeri S. A case study of population coding: stimulus localisation in the barrel cortex. In: Feng JF, editor. Computational neuroscience: a comprehensive approach. Florida: Chapman and Hall/CRC; 2004. p. 375–396.
Rutter J. Geometry of curves. 3rd ed. Florida: Chapman and Hall/CRC; 2000. p. 133–147.
Tate AJ, Nicol AU, Fischer H, Segonds-Pichon A, Feng JF, Magnusson MS, et al. Lateralised local and global encoding of face stimuli by neural networks in the temporal cortex. Annual meeting of the society for neuroscience 2005. Program No. 362.1. 2005 Abstract Viewer/Itinerary Planner; 2005 [online].
Williams P, Li S, Feng JF, Wu S. Scaling the kernel function to improve performance of the support vector machine. Lect Notes Comput Sci 2005;3496:831–836.