Brainwave Feature Extraction, Classification & Prediction
This document will examine issues pertaining to feature extraction, classification and prediction. It will consider the application of these techniques to unlabelled Electroencephalogram (E.E.G.) data in an attempt to discriminate between left and right hand imagery movements.

Statistics

Views

Total Views
2,165
Views on SlideShare
2,164
Embed Views
1

Actions

Likes
0
Downloads
86
Comments
0

1 Embed 1

http://www.docshut.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment


Cognitive Computing
Feature Extraction, Classification & Prediction
www.oliviamoran.me
About The Author

Olivia Moran is a leading training specialist who specialises in E-Learning instructional design and is a certified Moodle expert. She has been working as a trainer and course developer for 3 years, developing and delivering training courses for traditional classroom, blended learning and E-Learning.

Courses Olivia Moran Has Delivered:
● MOS
● ECDL
● Internet Marketing
● Social Media
● Google [Getting Irish Businesses Online]
● Web Design [FETAC Level 5]
● Adobe Dreamweaver
● Adobe Flash
● Moodle

Specialties:
★ Moodle [MCCC Moodle Certified Expert]
★ E-Learning Tools/Technologies [Commercial & Open Source]
★ Microsoft Office Specialist
★ Web Design & Online Content Writer
★ Adobe Dreamweaver, Flash & Photoshop
1. ABSTRACT

This document will examine issues pertaining to feature extraction, classification and prediction. It will consider the application of these techniques to unlabelled Electroencephalogram (E.E.G.) data in an attempt to discriminate between left and right hand imagery movements. It will briefly reflect on the need for brainwave signal preprocessing. The feature extraction and classification process will be examined in depth and the results obtained using various classifiers will be illustrated. Classification algorithms will be given some thought, namely Linear Discriminant Analysis (L.D.A.), K-Nearest Neighbour (K.N.N.) and Neural Network (N.N.) analysis. This document will explore prediction and highlight its effect on accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, these are briefly addressed. The way in which biology and nature inspire the design of feature extraction, classification and prediction systems will be explored. Finally, future work will be touched on.

2. INTRODUCTION

The study of E.E.G. data is a very important field of study that, according to Ebrahimi et al (2003), has been "Motivated by the hope of creating new communication channels for persons with severe motor disabilities". Advances in this area of research cater for the construction of more advanced Brain Computer Interfaces (B.C.I.s). Wolpaw et al (2002) describe such an interface as a "Non-muscular channel for sending messages and commands to the external world". The impact that such technologies could have on the quality of everyday life, particularly for those who have some form of physical disability, is enormous. "Brain-Computer Interfacing is an interesting emerging technology that translates intentional variations in the Electroencephalogram into a set of particular commands in order to control a real world machine" Atry et al (2005). Improvements to these systems are often made through an increased understanding of the human body and the way in which it operates. Feature extraction, classification and prediction are all processes that our bodies carry out on a daily basis, with or without our knowledge. Studying such activities will undoubtedly lead researchers to the creation of more biologically plausible B.C.I. solutions.

It is not only individuals who will benefit from further studies and understanding of these processes, as feature extraction, classification and prediction have many other applications. Take, for example, the world of business. Companies everywhere have to deal with a constant bombardment of information from both their internal and external environments. There seems to be an endless amount of both useful and useless information, and it is often very difficult to find exactly what you are looking for. When people eventually locate what they have been seeking, it may be in a format that does not suit them. This is where feature extraction, classification and prediction play their part. These processes are often the only way in which a business can locate information gems in a sea of data.
This document explores the various issues pertaining to feature extraction, classification and prediction. The application of these techniques to unlabelled E.E.G. data is examined in an attempt to discriminate between left and right hand imagery movements. It briefly looks at brainwave signal preprocessing. An in-depth study of the feature extraction and classification process is carried out, focusing on numerous classifiers. L.D.A., K.N.N. and N.N. classification algorithms are examined. This document gives thought to prediction and how it could be used to improve accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, these methods are mentioned in this document. Biology and nature often inspire the computing industry to produce feature extraction, classification and prediction systems that operate in the same or a similar way as the human body does. This issue of inspiration is briefly addressed and examples from nature are given. Finally, areas for future work are considered.

3. BRAINWAVE SIGNAL PREPROCESSING

E.E.G. data is commonly used for tasks such as discrimination between left and right hand imagery movements. "An E.E.G. is a recording of the very weak electrical potentials generated by the brain on the scalp" Ebrahimi et al (2003). The collection of such signals is non-invasive and they can be "Easily recorded and processed with inexpensive equipment" Ebrahimi et al (2003). It also offers many advantages over other methods as "It is based on a much simpler technology and is characterized by much smaller time constants when compared to other noninvasive approaches such as M.E.G., P.E.T. and F.M.R.I." Ebrahimi et al (2003).

The E.E.G. data used as input for the analysis carried out during the course of this assignment had been preprocessed. Ebrahimi et al (2003) point out that "Some preprocessing is generally performed due to the high levels of noise and interference usually present". Artifacts, such as motor movements, eye blinking and electrode movement, are removed during preprocessing, as they are not required; all the essential data needed to carry out classification is left behind.
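The data analysed here arrived already preprocessed, so no filtering code from the original work survives; still, the kind of cleaning described above is easy to illustrate. The following is a minimal sketch, assuming Python with NumPy/SciPy, a placeholder sampling rate of 128 Hz and a typical 8-30 Hz motor-imagery band; none of these values come from the document.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, low=8.0, high=30.0, order=4):
    """Zero-phase band-pass filter: keeps the mu/beta band often used
    for motor imagery and suppresses slow drift and high-frequency noise."""
    nyquist = fs / 2.0
    b, a = butter(order, [low / nyquist, high / nyquist], btype="band")
    return filtfilt(b, a, signal)  # filtfilt avoids phase distortion

# Hypothetical usage: one trial, two channels (C3, C4), sampled at 128 Hz.
fs = 128.0
raw_trial = np.random.randn(2, int(9 * fs))  # stand-in for a 9 s recording
clean_trial = np.vstack([bandpass(ch, fs) for ch in raw_trial])
```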
The E.E.G. data was recorded on two different channels, C3 and C4. These correspond to the left and right hemispheres of the motor cortex and would have been recorded by placing electrodes over the right and left sides of the motor cortex, as shown in figure 1 below.

Figure 1. – Showing the placing of the electrodes at channels C3 and C4 of the motor cortex.

It is important to record signals at these two channels due to the fact that "When people execute or imagine the movement of left and right hand, E.E.G. features differ in two brain hemispheres corresponding to sensorimotor hand representation area" Pei & Zheng (2004). Subsequently, when an imagined left hand movement is made there are essentially two signals recorded, C3 and C4, with both being left signals, and vice versa for the right hand imagery movements.

4. FEATURE EXTRACTION

A feature is described by Sriraja (2002) as "Any structural characteristic, transform, structural description or graph, extracted from a signal or a part of it, for use in pattern recognition or interpretation. It is a representation of the signal or pattern, containing only the salient information". Ripley (1996) goes on to argue that a "Feature is a measurement on an example, so the training set of examples has measured features and a class for each".

Feature extraction is concerned with the identification of features that are unique or specific to a particular type of E.E.G. data, such as all imagined left hand movements. The aim of this process is the formation of useful new features by combining existing ones. Using such features facilitates the process of data classification. Many such features exist; some provide useful information while others provide none. The next logical step is the elimination of features that produce the lowest accuracy.

For each test run, the accuracy of the classifier used was calculated. This was important as it allowed the author to determine which classifiers gave the best results for the data being examined. Wolpert (1992) points out that "Estimating the accuracy of a classifier is important not only to predict its future prediction accuracy, but also for choosing a classifier from a given set (model selection), or combining classifiers".

5. THE CLASSIFICATION PROCESS

5. 1. Descriptive Classifiers

In an effort to find the most appropriate type of classifier for the analysis of the E.E.G. data used in this assignment, the author turned to descriptive methods. These included basic features like the mean, standard deviation and kurtosis. Using this descriptive approach allows for the summarisation of the test and training data. This is useful where the sample contains a large number of variables. A minimal sketch of how these three features might be computed is given below.
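Before examining each descriptive feature in turn, the sketch below shows how such a feature vector might be computed. It is an illustration only, assuming each trial is a one-dimensional NumPy array of samples; the document does not include the author's actual code.

```python
import numpy as np
from scipy.stats import kurtosis

def descriptive_features(trial):
    """Summarise one E.E.G. trial by the three descriptive features
    discussed in section 5.1: mean, standard deviation and kurtosis."""
    return np.array([
        np.mean(trial),
        np.std(trial),
        kurtosis(trial),  # 0 for a perfectly normal distribution
    ])

# Hypothetical usage: stack feature vectors for a set of trials so a
# classifier can be trained on them (one row per trial).
trials = [np.random.randn(768) for _ in range(10)]  # stand-in data
feature_matrix = np.vstack([descriptive_features(t) for t in trials])
```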
5. 1. 1. Mean

The mean is "Short for arithmetic mean: in descriptive statistics, the average value, calculated for a finite set of scores by adding the scores together and then dividing the total by the number of scores" Coleman (2003). During 'Descriptive Features – Test 1' an accuracy of 64% was obtained using the mean feature. It performed slightly higher than the standard deviation, which reached 61% accuracy.

5. 1. 2. Standard Deviation

Standard deviation is defined by Coleman (2003) as "A measure of the degree of dispersion, variability or scatter in a set of scores, expressed in the same units as the scores themselves, defined as the square root of the variance". 'Descriptive Features – Test 2' attempted to classify the E.E.G. data by utilising the standard deviation feature. An accuracy of 61% was achieved.

5. 1. 3. Kurtosis

Kurtosis is useful in that it "Provides information about the 'peakedness' of the distribution. If the distribution is perfectly normal you would obtain a skewness and kurtosis value of 0" Pallant (2001). The results obtained during 'Descriptive Features – Test 3' using the kurtosis feature were disappointing, with an accuracy of 49%. Kurtosis in this instance was not able to offer a higher level of separability than either the mean or the standard deviation. Kurtosis is usually more appropriate for larger samples, with which more satisfactory results could be accomplished. As noted by Tabachnick & Fidell (1996), "Kurtosis can result in an underestimate of the variance, however, this risk is also reduced with a large sample".

5. 1. 4. Combination Of Mean, Standard Deviation And Kurtosis Features

In some instances the combination of features can allow for greater accuracy; however, this was not the case for the E.E.G. data that was examined using the mean, standard deviation and kurtosis. Test results from 'Descriptive Features – Test 4' showed accuracy to be in the region of 49%, a much lower performance than that of the mean and standard deviation features used individually.

5. 1. 5. Conclusion Drawn From Mean, Standard Deviation And Kurtosis Feature Tests

The accuracy of the mean as a classifier was substantially higher than that of the standard deviation and kurtosis, as well as a combination of all three. On the other hand, it still did not offer a satisfactory level of separation between the imagery left and right signals. These three features, it seems, are not appropriate for E.E.G. data and are better suited to simpler forms of data. With this in mind the author turned to the Hjorth features.

5. 2. Hjorth Features

A number of Hjorth parameters were drawn upon during the course of this assignment. "In 1970, Bo Hjorth derived certain features that described the E.E.G. signal by means of simple time domain analysis. These parameters, namely Activity, Mobility and Complexity, together characterize the E.E.G. pattern in terms of amplitude, time scale and complexity" Sriraja (2002). These were used in an attempt to achieve a separation between imagery left and right hand signals.

The Hjorth approach involves the measurement of the E.E.G. signal "For successive epochs (or windows) of one to several seconds. Two of the attributes are obtained from the first and second time derivatives of the amplitude fluctuations in the signal. The first derivative is the rate of change of the signal's amplitude. At peaks and troughs the first derivative is zero. At other points it will be positive or negative depending on whether the amplitude is increasing or decreasing with time. The steeper the slope of the wave, the greater will be the amplitude of the first derivative. The second derivative is determined by taking the first derivative of the first derivative of the signal.
Peaks and troughs in the first derivative, which correspond to points of greatest slope in the original signal, result in zero amplitude in the second derivative, and so forth" Miranda & Brouse (2005).

According to Sriraja (2002), if $x_1, x_2, \ldots, x_n$ are the $n$ E.E.G. data values and the consecutive differences $x_i - x_{i-1}$ are denoted $d_i$, mobility and complexity are given by

$$\text{Mobility} = \sqrt{\frac{\operatorname{var}(d)}{\operatorname{var}(x)}}, \qquad \text{Complexity} = \frac{\text{Mobility}(d)}{\text{Mobility}(x)}$$

where $\text{Mobility}(d)$ is computed from the second-order differences $d_i - d_{i-1}$ in the same way.

5. 2. 1. Activity Feature

Activity is defined by Miranda & Brouse (2005) as "The variance of the amplitude fluctuations in the epoch". During 'Hjorth Features – Test 1' this feature was able to achieve an accuracy of only 44% and therefore offered very poor separability. 'Hjorth Features – Test 2' used the same classifier; however, the time interval for sampling was changed from the 6th second to the 7th. This change resulted in an accuracy of 55%, an increase of 11% on the previous test. 'Hjorth Features – Test 3' was also carried out using the activity feature. This test aimed to determine whether or not changing the number of neurons used in the N.N. would have a notable effect on the accuracy of the classification. A change in the number of neurons did not have a significant impact on performance in this instance.

5. 2. 2. Mobility Feature

"Mobility is calculated by taking the square root of the variance of the first derivative divided by the variance of the primary signal" Miranda & Brouse (2005). 'Hjorth Features – Test 4' utilised the mobility feature for classification purposes. Results from this test showed that accuracy using this feature stands at 52%.

5. 2. 3. Complexity Feature

Complexity is described as "The ratio of the mobility of the first derivative of the signal to the mobility of the signal itself" Miranda & Brouse (2005). 'Hjorth Features – Test 5' examined the complexity feature and its effect on accuracy. Results for this test showed the level of accuracy using this feature to be 64%.

5. 2. 4. Combination Of Activity, Mobility And Complexity Features

'Hjorth Features – Test 6' combined the activity, mobility and complexity features in the hope of increasing accuracy further. This test showed very mediocre results, with accuracy at 56%. However, when the data windows were specified, as in 'Hjorth Features – Test 7', more promising results were recorded. Accuracy of 74% was achieved, with a greater level of separability of the imagery left and right hand signals than all previous results.
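Written directly from the definitions and formulas above, a minimal sketch of the three Hjorth parameters might look as follows; the windowed stand-in signal is hypothetical, not the trl3/trr3 data used in the tests.

```python
import numpy as np

def hjorth_parameters(x):
    """Compute Hjorth Activity, Mobility and Complexity for one epoch.

    Activity   = var(x)
    Mobility   = sqrt(var(x') / var(x))
    Complexity = Mobility(x') / Mobility(x)
    where x' is the first difference of the signal.
    """
    d1 = np.diff(x)   # first derivative (discrete difference)
    d2 = np.diff(d1)  # second derivative
    var_x, var_d1, var_d2 = np.var(x), np.var(d1), np.var(d2)
    activity = var_x
    mobility = np.sqrt(var_d1 / var_x)
    complexity = np.sqrt(var_d2 / var_d1) / mobility
    return activity, mobility, complexity

# Hypothetical usage on a windowed epoch, mirroring the s=680, e=700
# window extents mentioned in the conclusion.
epoch = np.random.randn(2000)[680:700]  # stand-in for real E.E.G. data
print(hjorth_parameters(epoch))
```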
Combining multiple features is useful as it can often lead to improved accuracy. Lotte et al (2007) highlight this point, arguing that "A combination of similar classifiers is very likely to outperform one of the classifiers on its own. Actually, combining classifiers is known to reduce the variance and thus the classification error".

6. CLASSIFICATION ALGORITHMS

Kohavi (1995) defines a classifier as "A function that maps an unlabelled instance to a label using internal data structures". Three different types of algorithms were used for classification: the L.D.A., K.N.N. and N.N. classification algorithms.

6.1. L.D.A. Classification

L.D.A., also known as Fisher's L.D.A., is "Often used to investigate the difference between various groups when their relationship is not clear. The goal of a discriminant analysis is to find a set of features or discriminants whose values are such that the different groups are separated as much as possible" Sriraja (2002). Lotte et al (2007) describe the aim of L.D.A. as being to "Use hyperplanes to separate the data representing the different classes. For a two-class problem, the class of a feature vector depends on which side of the hyperplane the vector is". L.D.A. is concerned with finding the features that maximise the distance between the two classes while reducing the distance within each class. This concept is illustrated in figure 2 below.

Figure 2. – Shows a hyperplane that is used to illustrate graphically the separation of the classes, i.e. the separability of the imagery left hand data from the imagery right hand data.

The equation for L.D.A. can be denoted in mathematical terms. Sriraja (2002) discusses the equation of L.D.A. and the principles on which it works:
"First, a linear combination of the features x are projected into a new feature, y. The idea is to have a projection such that the y's from the two classes would be as much separated as possible. The measure of separation between the two sets of y's is evaluated in terms of the respective means and the variances of the projected classes . . . The objective is therefore to have a linear combination such that the following ratio is maximised"

$$J = \frac{(\bar{y}_1 - \bar{y}_2)^2}{s_1^2 + s_2^2}$$

where $\bar{y}_1$ and $\bar{y}_2$ are the means of the two sets of $y$'s, $y_1$ and $y_2$ respectively, $s_1^2$ and $s_2^2$ are the corresponding within-class scatters, and $n_1$ and $n_2$ are the sample sizes for the two sets.

During testing the author utilised scatter graphs, such as figure 3 below, to display the test results graphically. Figure 3 shows the scatter graph that was constructed as part of the test which attempted classification of the E.E.G. data using the mean feature. The accuracy achieved using this feature was 64%.

Figure 3. – Mean Scatter Graph
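To make Sriraja's description concrete, here is a minimal sketch of the Fisher projection for the two-class case: it finds the direction w that maximises the ratio above and a midpoint threshold on the projected values. The feature matrices are hypothetical stand-ins, not the features used in the reported tests.

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Find the projection w maximising the Fisher ratio
    J(w) = (mean separation)^2 / (sum of within-class scatters)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: summed scatter matrix of each class.
    S_w = np.cov(X1, rowvar=False) * (len(X1) - 1) \
        + np.cov(X2, rowvar=False) * (len(X2) - 1)
    w = np.linalg.solve(S_w, m1 - m2)  # optimal direction S_w^{-1}(m1 - m2)
    return w / np.linalg.norm(w)

# Hypothetical usage: rows are trials, columns are extracted features.
rng = np.random.default_rng(0)
left = rng.normal(0.0, 1.0, size=(45, 3))   # imagery left hand features
right = rng.normal(1.0, 1.0, size=(45, 3))  # imagery right hand features
w = fisher_lda_direction(left, right)
# Midpoint of the two projected class means serves as the hyperplane.
threshold = (left @ w).mean() / 2 + (right @ w).mean() / 2
print("classify a new trial as left if x @ w > threshold:", threshold)
```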
The next graph, figure 4, illustrates the results of a test examining standard deviation, with the accuracy of this feature standing at 61%.

Figure 4. – Standard Deviation Scatter Graph

Scatter graphs are described by Fisher & Holtom (1999) as useful for presenting "The relationship between two different types of information plotted on horizontal, x, and vertical, y, axis. You simply plot the point at which the values meet, to get an idea of the overall distribution of your data". Pallant (2001) is keen to point out that "The scatter graph also provides a general indication of the strength of the relationship between your two variables. If the relationship is weak, the points will be all over the place, in a blob type arrangement. For a strong relationship the points will form a vague cigar shape with a definite clumping of scores around an imaginary straight line".

6.2. K.N.N. Classification

The K.N.N. function is concerned with the computation of the minimum distance between the test data and the data used for training. Ripley (1996) defines test data as a "Set of examples used only to assess the performance of a fully specified classifier", while training data is a "Set of examples used for learning, that is to fit the parameters of the classifier". The K.N.N. belongs to the family of discriminative nonlinear classifiers. According to Lotte et al (2007) the main objective of this method is "To assign to an unseen point the dominant class among its k nearest neighbours within the training set". A metric distance may be used to find the nearest neighbour. "With a sufficiently high value of k and enough training samples, K.N.N. can approximate any function which enables it to produce nonlinear decision boundaries" Lotte et al (2007). A minimal sketch of this rule follows.
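The sketch below is one literal reading of the K.N.N. rule quoted above: an unseen point receives the dominant class among its k nearest training points under the Euclidean metric. The training data shown are placeholders, not the assignment's E.E.G. features.

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=5):
    """Label x with the dominant class among its k nearest neighbours,
    using Euclidean distance as the metric."""
    distances = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Hypothetical usage with two classes of feature vectors.
rng = np.random.default_rng(1)
train_X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(2, 1, (40, 3))])
train_y = np.array(["left"] * 40 + ["right"] * 40)
print(knn_classify(rng.normal(2, 1, 3), train_X, train_y, k=7))
```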
6.3. N.N. Classification

N.N.s are widely used for classification "Due to their non-linear model and parallel computation capabilities" Sriraja (2002). N.N.s are described by Lotte et al (2007) as "An assembly of several artificial neurons which enables us to produce nonlinear decision boundaries". The N.N. used for the classification tests was the Multilayer Perceptron (M.L.P.), which is one of the more popular N.N.s. It used 10 linear neurons for the first input layer and 12 for the hidden layer. In this M.L.P. N.N. "Each neuron's input is connected with the output of the previous layer's neurons whereas the neurons of the output layer determine the class of the input feature vector" Lotte et al (2007).

M.L.P.s are useful for classification: provided they have a satisfactory number of neurons and layers, "They can approximate any continuous function" Lotte et al (2007). They are commonly used as they can quickly adapt to different problems and situations. However, it must be noted that "The fact that M.L.P. are universal approximators makes these classifiers sensitive to overtraining, especially with such noisy and non-stationary data as E.E.G., therefore, careful architecture selection and regularization is required" Lotte et al (2007).

The greater the number of neurons available or used, the greater the ability of the N.N. to learn; however, networks are susceptible to over-learning, and therefore a smaller number of neurons sometimes gives greater accuracy. Cross validation is useful as it is concerned with preventing the N.N. from learning too much and consequently ignoring new data when it is inputted.

Usually training sets are small in size, as collecting "Known cases for training and testing" is very time consuming and costly Masters (1995). These small sets are often broken down further into relatively small sets for both training and testing; however, this is not a desirable approach. Instead of taking this action one can avail of cross validation. This is a process which "Combines training and validation into one operation" Masters (1995).

When constructing a prediction rule, reducing the error rate where possible is an important task. Efron (1983) describes an error rate as the "Probability of incorrectly classifying a randomly selected future case", in other words the exception to the rule. Cross validation is often used to reduce this error rate and "Provides a nearly unbiased estimate, using only the original data" Efron (1983).

6. 3. 1. Euclidean Distance

Part of the N.N. algorithm examines the Euclidean distance, i.e. the straight-line distance between the coordinates of a pair of points. The Euclidean distance between two points $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ can be denoted as

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

7. PREDICTION

Frank et al (2001) define a time series as "A sequence of vectors, x(t), t = 0,1,…, where t represents elapsed time. Theoretically, x may be a value which varies continuously with t, such as a temperature". This time series method can be used in prediction in what is known as time series prediction. It involves the examination of past performance to predict future performance.

This, according to Coyle et al (2004), can be used to improve classification accuracy. Their work uses a "Novel feature extraction procedure which carries out self-organising fuzzy neural network based time series prediction, performing feature extraction in the time domain only". Using such a method in their studies allowed for classification accuracies in the region of 94%.
They argue that the main advantage of this approach is that "The problem of specifying the neural network architecture does not have to be considered". Instead of adapting the parameters for individual users, the system can "Self-organise the network architecture, adding and pruning neurons as required", just like the human body does.

Using 6-step-ahead prediction, the author carried out a number of tests. The parameters for these tests were set as follows, unless otherwise stated; the embedding they imply is sketched after figure 5 below.

● Data was trained and tested with x (trl3)
● Embedding Dimension = 6
● Time Lag = 1
● Cross Validation was not used
● Number of neurons available to the neural network = one layer of 6

All results were displayed graphically on a chart like that seen in figure 5 below.

Figure 5. – 'Training Vectors': target and output values plotted against time step t, with the training data in blue and the test data in red. The difference between these two lines is referred to as the root mean square error or simply the error rate.
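The parameters above (embedding dimension 6, time lag 1, 6-step-ahead prediction) describe a standard time-delay embedding of the signal. The sketch below builds such input/target pairs; it assumes a synthetic stand-in series rather than the trl3 data and says nothing about the particular network architecture the author trained.

```python
import numpy as np

def embed_series(x, dim=6, lag=1, horizon=6):
    """Turn a scalar series into (input, target) pairs for prediction:
    each input holds `dim` past values spaced `lag` apart, and the
    target is the value `horizon` steps after the last input sample."""
    inputs, targets = [], []
    last = len(x) - horizon - (dim - 1) * lag
    for t in range(last):
        inputs.append(x[t : t + dim * lag : lag])
        targets.append(x[t + (dim - 1) * lag + horizon])
    return np.array(inputs), np.array(targets)

# Hypothetical usage on a stand-in signal; the real tests used trl3/trl4.
signal = np.sin(np.linspace(0, 60, 3000)) + 0.05 * np.random.randn(3000)
X, y = embed_series(signal, dim=6, lag=1, horizon=6)
print(X.shape, y.shape)  # (2989, 6) (2989,)
```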
7. 1. One Layer Neural Network

The first test examined accuracy using a neural network with one layer of 6 neurons. This test was run 10 times and the average training and testing root mean square errors were calculated. The training root mean square error was recorded at 0.0324 and the testing root mean square error at 0.0313.

7. 2. Multi Layer Neural Network

The next test was conducted using exactly the same parameters, except that the neural network was changed from a single layer network with 6 neurons to one that also has a hidden layer of 8 neurons. The results from this test were slightly worse than the previous ones, with training and testing root mean square errors of 0.0326 and 0.0314. The difference between the figures from test 1 and test 2 was extremely minute.

7. 3. Cross Validation

The next test was exactly the same as test 1, except that cross validation was used, to determine whether it has a negative or positive effect. The training data scored slightly better with cross validation, at 0.0293 compared to the 0.0324 obtained in test 1. On the other hand, the testing data performed better in test 1, with 0.0313 rather than the 0.0317 found with cross validation.

7. 4. Left E.E.G. Data

A test was carried out which used trl3 to train the network and trl4 to test it. The training root mean square error was much the same as in previous experiments using the same parameters for the training data. The testing root mean square error, however, was much improved, with a result of 0.0240 compared to the 0.0313 obtained when trl3 was used for both training and testing.

7. 5. Right E.E.G. Data

Tests were conducted using the right hand data. The N.N. was trained and tested with trr3. The error was a lot less than that found in the tests on the left data using the same parameters: 0.0292 was recorded for the training root mean square error and 0.0281 for the testing root mean square error. The right data was also tested to see what effect testing the N.N. with trr4 instead of trr3 would have on performance. The training root mean square error stayed more or less the same and the testing root mean square error increased slightly, to 0.0293.

8. OTHER METHODS THAT COULD BE USED FOR FEATURE EXTRACTION

There are many other methods that could be used and that offer satisfactory performance when it comes to feature extraction for B.C.I.s.

8. 1. Amplitude And Phase Coupling Measure

One such approach was created by Wei et al (2007); it is known as the 'Amplitude and Phase Coupling Measure'. This method is concerned with "Using amplitude and phase coupling measures, quantified by a nonlinear regressive coefficient and phase locking value respectively". Wei and his colleagues carried out studies utilising this approach, and the results obtained from the application of this feature extraction method were promising: the "Averaged classification accuracies of the five subjects ranged from 87.4% to 92.9%" and the "Best classification accuracies ranged between 84.4% and 99.6%". The conclusion reached from these studies is that "The combination of coupling and autoregressive features can effectively improve the classification accuracy due to their complementarities" Wei et al (2007).
8. 2. Combination Of Classifiers

Some researchers, in an effort to improve performance and accuracy, have begun using multiple classifiers to achieve the desired results. The author attempted this approach with the combination of mean, standard deviation and kurtosis, as well as activity, mobility and complexity; however, there are various other strategies that can be followed. These include boosting, voting and stacking, to name but a few. Boosting basically operates on the principle of cooperation, with "Each classifier focusing on the errors committed by the previous ones" Lotte et al (2007).

Voting, on the other hand, works like a voting system. The different modules of the N.N. are "Modeled as multiple voters electing one candidate in a single ballot election assuming the availability of votes preferences and intensities. All modules are considered as candidates as well as voters. Voting bids are the output-activations of the modules forming the cooperative modular structure" Auda et al (1995). The candidate with the majority vote wins. According to Lotte et al (2007), "Voting is the most popular way of combining classifiers in B.C.I. research, probably because it is simple and efficient".

Another strategy used for combining classifiers is what is known as 'Stacking'. This method, according to Ghorbani & Owrangh (2001), "Improves classification performance and generalization accuracy over single level cross-validation model".

8. 3. Multivariate Autoregressive Analysis (M.V.A.R.)

Studies have been conducted in the past based on the M.V.A.R. model. Pei et al (2004) carried out such a study and boast a classification accuracy of 88.57%. They describe the M.V.A.R. model as "The extension form of univariate A.R. model" and argue that "Using the coefficients of M.V.A.R. model as EEG features is feasible".

9. INSPIRATION FROM BIOLOGY

There is no doubt that inspiration for some of the classification and prediction techniques that we use today came from the world of biology. Shadbolt (2004) points out that "We see complexity all around us in the natural world – from the cytology and fine structures of cells to the organization of the nervous system . . . Biological systems cope with and glory in complexity – they seem to scale, to be robust and inherently adaptable at the system level . . . Nature might provide the most direct inspiration". The author shares the view of Bamford et al (2006) that "An attempt to imitate a biological phenomenon is spawning innovative system designs in an emerging alternative computational paradigm with both specific and yet unexplored potential".

9. 1. Classification And Object Recognition

Our brains are constantly classifying things in our everyday environment, whether we are aware of it or not. Classification is the process that is responsible for letting us determine what the objects around us are, i.e. a chair, a car, a person. It even allows us to recognise the different faces of the people with whom we come in contact. The brain is able to distinguish each specific object by examining its numerous features, and does so with great speed and accuracy. Many systems seek to reproduce a similar means of classifying data, and these can be useful in nearly every kind of industry.
Take, for example, the medical industry, in which classification plays a crucial role. Classification is used extensively for the identification of almost every kind of disease and illness. The process of diagnosis would be much more complex and time consuming if classification techniques were not applied to it.

9. 2. Self-Organisation

Computer systems, i.e. neural networks, can be constructed on the same principles and concepts of self-organisation found in humans. The term self-organisation is used to describe the process by which "Internal structures can evolve without the intervention of an external designer or the presence of some centralised form of internal control. If the capacities of the system satisfy a number of constraints, it can develop a distributed form of internal structure through a process of self-organisation" Cilliers (1998). Self-organising maps are widely used as a method for feature extraction and data mapping as well as prediction. Self-organising neural networks can encompass a time series prediction element, often with huge success. These can be extremely useful for predicting trends in different areas such as weather forecasting and marketing; the list is endless.

The various prediction algorithms available work in the same way as the human nervous system. These programs aim to replicate the 'anticipatory neural activity' that occurs in the body and reproduce it in a system. Take, for example, a recently developed financial decision system. This system looked at how using the 'anticipatory neural activity' element and taking it into consideration could help people using the system to make decisions that are more likely to be successful and thus less risky. When people are making financial decisions, they can often opt for an option that seems like the irrational one. The reasons for this irrational thought had not previously been known.

Kuhnen & Knutson (2005) examined "Whether anticipatory neural activity would predict optimal and suboptimal choices in a financial decision-making task". They observed that the nucleus accumbens was more active when risky choices were being made, and the anterior insula when riskless options were being followed. From their findings they concluded that particular neural circuits linked to anticipatory affect would either hinder or encourage an individual to go for either a risky or riskless choice. They also uncovered the fact that over-activation in these circuits is more likely to cause investing mistakes and "Thus, consideration of anticipatory neural mechanisms may add predictive power to the rational actor model of economic decision making". The system was able to replicate relatively successfully the way in which humans make investment decisions.

10. FUTURE WORK

The combination of classifiers is gaining popularity and becoming more widely used as a means of improving accuracy and performance. From researching this topic one can see that most publications deal with one particular classifier, with little effort being taken to compare one classifier to the next. Studies could be undertaken in an attempt to compare classifiers against particular criteria.

There is a lot more room for improvement considering the algorithms that are available at the moment. A deeper understanding of the human brain and how it classifies and predicts should lead to the creation of more biologically plausible solutions.
11. CONCLUSION

This document addressed the various issues pertaining to feature extraction, classification and prediction. It focused on the application of these techniques to unlabelled E.E.G. data, in an effort to discriminate between left and right hand imagery movements. It briefly reflected on the need for brainwave signal preprocessing. An in-depth analysis of the feature extraction and classification process was carried out and the results highlighted. Classification algorithms were examined, namely L.D.A., K.N.N. and N.N. This document looked at prediction and its effect on accuracy. Due to time and knowledge constraints the data could not be tested using all the desired approaches; however, a number of these other methods were dealt with. This document also highlighted the fact that inspiration for the design of feature extraction, classification and prediction systems often comes from nature. Finally, thought was given to future work.

From studying the E.E.G. data and carrying out various tests using numerous parameters and classifiers, it has been concluded that a combination of the three Hjorth features, activity, mobility and complexity, gives the highest level of accuracy. The author discovered that the descriptive classifiers drawn upon are not suitable for E.E.G. data, as they do not provide a satisfactory level of separation; they work better with simple data. It was found that feature extraction and classification enjoyed more success using cross validation and a multiple layer N.N., in contrast to prediction, which was best suited to a single layer N.N. without cross validation.

The greatest level of accuracy recorded using the combined Hjorth features was 74%. Separability of the left hand imagery motor signal from the right was greater at 7 seconds than at 6. Accuracy was improved by specifying the data window extents of s=680 and e=700. Prediction tests indicated that left hand data is more easily separated and classified than right hand data. The author also found that the N.N. performed better when different data was used for training and testing.

New methods of feature extraction, classification and prediction will undoubtedly be discovered as the understanding of the human body evolves. Research on this particular topic can extend over multiple disciplines, and therefore it is likely that "Insights from one subject will inform the thinking in another" Shadbolt (2004). Advances made in the field of science often result in complementary gains in the area of computing, and vice versa.

All the processes discussed in this document can have a huge impact on the lives of individuals, businesses and society at large. Many people suffering from motor impairments rely heavily on B.C.I. technologies that incorporate classification and prediction techniques for everyday living. These technologies will undoubtedly progress society towards becoming safer and more inclusive. Classification and prediction can also be an integral part of business decisions: a manager may consult his/her computer system when making risky decisions such as "Should I invest in this new product?" or "How much stock should I buy?". Society also benefits from feature extraction, classification and prediction, as these processes are widely used for disease and illness diagnosis and for tasks such as weather forecasting and storm prediction, to name but a few.
Consequently, it is safe to assume that this field of study will remain a popular one in the years to come and make many more advances.
BIBLIOGRAPHY

ATRY, F. & OMIDVARNIA, A. H. & SETAREHDAN, S. K. (2005) "Model Based E.E.G. Signal Purification to Improve the Accuracy of the B.C.I. Systems" Proceedings from the 13th European Signal Processing Conference.

AUDA, G. & KAMEL, M. & RAAFAT, H. (1995) "Voting Schemes for Cooperative Neural Network Classifiers" Neural Networks 3(3), pp. 1240-1243, Proceedings of the IEEE International Conference on Neural Networks.

BAMFORD, S. & MURRAY, A. & WILLSHAW, D. J. (2006) "Synaptic Rewiring in Neuromorphic VLSI for Topographic Map Formation" [Internet], Date Accessed: 15 April 2007, Available From: http://www.see.ed.ac.uk/~s0454958/interimreport.pdf.

CILLIERS, P. (1998) "Complexity and Postmodernism: Understanding Complex Systems" London: Routledge.

COLEMAN, A. M. (2003) "Oxford Dictionary of Psychology" Oxford: Oxford University Press.

COYLE, D. & PRASAD, G. & MCGINNITY, T. M. (2004) "Extracting Features for a Brain-Computer Interface by Self-Organising Fuzzy Neural Network-Based Time Series Prediction" Proceedings from the 26th Annual International Conference of the IEEE EMBS.

EBRAHIMI, T. & VESIN, J. M. & GARCIA, G. (2003) "Brain-Computer Interface in Multimedia Communication" IEEE Signal Processing Magazine 20(1), pp. 14-24.

EFRON, B. (1983) "Estimating the Error Rate of Prediction Rules: Improvement on Cross Validation" Journal of the American Statistical Association 78(382), pp. 316-331.

FISHER, E. & HOLTOM, D. (1999) "Enjoy Writing Your Science Thesis or Dissertation – A Step by Step Guide to Planning and Writing" London: Imperial College Press.

FRANK, R. J. & DAVEY, N. & HUNT, S. P. (2001) "Time Series Prediction and Neural Networks" Journal of Intelligent and Robotic Systems 31(1-3), pp. 91-103.

GHORBANI, A. A. & OWRANGH, K. (2001) "Stacked Generalization in Neural Networks: Generalization on Statistically Neutral Problems" Neural Networks 3, pp. 1715-1720, Proceedings from the IJCNN International Joint Conference on Neural Networks.

KOHAVI, R. (1995) "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection" Proceedings from the International Joint Conference on Artificial Intelligence (IJCAI).
KUHNEN, C. M. & KNUTSON, B. (2005) "The Neural Basis of Financial Risk Taking" Neuron 47(5), pp. 763-770.

LOTTE, F. & CONGEDO, M. & LECUYER, A. & LAMARCHE, F. & ARNALDI, B. (2007) "A Review of Classification Algorithms for EEG-Based Brain-Computer Interfaces" Journal of Neural Engineering 4, pp. R1-R13.

MASTERS, T. (1995) "Neural, Novel & Hybrid Algorithms for Time Series Prediction" New York: John Wiley & Sons Inc.

MIRANDA, E. & BROUSE, A. (2005) "Toward Direct Brain-Computer Musical Interfaces" Proceedings from the 2005 Conference on New Interfaces for Musical Expression, pp. 216-219.

PALLANT, J. (2001) "S.P.S.S. Survival Manual – A Step By Step Guide To Data Analysis Using S.P.S.S." Berkshire: Open University Press.

PEI, X. M. & ZHENG, C. X. (2004) "Feature Extraction and Classification of Brain Motor Imagery Task Based on MVAR Model" Machine Learning and Cybernetics 6, pp. 3726-3730, Proceedings from the 3rd International Conference on Machine Learning and Cybernetics.

RIPLEY, B. D. (1996) "Pattern Recognition and Neural Networks" Cambridge: Cambridge University Press.

SHADBOLT, N. (2004) "From the Editor in Chief: Nature-Inspired Computing" IEEE Intelligent Systems 19(1), pp. 2-3.

SRIRAJA, Y. (2002) "E.E.G. Signal Analysis for Detection of Alzheimer's Disease" Thesis, Texas Tech University, Date Accessed: 11 April 2007, Available From: http://webpages.acs.ttu.edu/ysriraja/MSthesis/Thesis.pdf.

TABACHNICK, B. G. & FIDELL, L. S. (1996) "Using Multivariate Statistics" 3rd ed. New York: Harper Collins.

WEI, Q. & WANG, Y. & GAO, X. & GAO, S. (2007) "Amplitude and Phase Coupling Measures for Feature Extraction in an E.E.G.-Based Brain-Computer Interface" Journal of Neural Engineering 4, pp. 120-129.

WOLPAW, J. R. & BIRBAUMER, N. & MCFARLAND, D. J. & PFURTSCHELLER, G. & VAUGHAN, T. M. (2002) "Brain-Computer Interfaces for Communication and Control" Clinical Neurophysiology 113(6), pp. 767-791.

WOLPERT, D. H. (1992) "Stacked Generalization" Neural Networks 5(2), pp. 241-259, Pergamon Press.