INMAS Final Report



  1. 1. Brain Informatics Using Deep Learning
Final Research Report
Cognitive Science and Deep Learning Research Intern
Institute of Nuclear Medicine and Allied Sciences (INMAS)
Defence Research and Development Organisation (DRDO)
Ministry of Defence, Govt. of India

Student: Vikramank Singh, Computer Engineering, VES Institute of Technology, University of Mumbai
Guide: Sushil Chandra, Scientist ‘F’, Head, Bio-Medical Engineering Department, INMAS, DRDO
  2. 2. CERTIFICATE

This is to certify that the project entitled ‘Brain Informatics Using Deep Learning’ is the bona fide work of Vikramank Singh, conducted in the Biomedical Engineering Department of the Institute of Nuclear Medicine and Allied Sciences, DRDO, Delhi, under the supervision and guidance of Mr. Sushil Chandra, Scientist ‘F’.

Sh. Sushil Chandra
Scientist ‘F’
Biomedical Engg. Department
INMAS (DRDO)
  3. 3. ACKNOWLEDGEMENT

I take this opportunity to express my sincere gratitude to all the people who contributed their knowledge and experience to this project. Without them, it would have been quite a difficult task for me to complete this work.

I am thankful to Mr. Sushil Chandra, Scientist ‘F’ & Head, B.M.E. Dept., INMAS (DRDO), for coordinating this training, for giving me an invaluable opportunity to work in a competitive yet amicable atmosphere, and for providing all the facilities and paraphernalia required to carry out this project. His profound knowledge and understanding gave me an entirely new perspective on my project. Working with him was always a new and unique experience.

I would like to express my gratitude to Mrs. Greeshma Sharma, my project monitor, for her worthwhile suggestions and fruitful help, and for all the knowledge she imparted to me over the course of the project.

Finally, I would like to express my deep appreciation to my family and friends, who have been a constant source of inspiration. I am eternally grateful to them for always encouraging me and being with me wherever and whenever I needed them.
  4. 4. About the Organization

Defence Research and Development Organisation (DRDO)

DRDO was formed in 1958 from the amalgamation of the then already functioning Technical Development Establishments (TDEs) of the Indian Army and the Directorate of Technical Development & Production (DTDP) with the Defence Science Organisation (DSO). DRDO was then a small organization with 10 establishments or laboratories. Over the years, it has grown in many directions in terms of the variety of subject disciplines, the number of laboratories and its achievements.

Today, DRDO is a network of more than 50 laboratories which are deeply engaged in developing defence technologies covering various disciplines, such as aeronautics, armaments, electronics, combat vehicles, engineering systems, instrumentation, missiles, advanced computing and simulation, special materials, naval systems, life sciences, training, information systems and agriculture. Presently, the Organization is backed by over 5000 scientists and about 25,000 other scientific, technical and supporting personnel. Several major projects for the development of missiles, armaments, light combat aircraft, radars, electronic warfare systems etc. are on hand, and significant achievements have already been made in several such technologies.
Institute of Nuclear Medicine and Allied Sciences (INMAS)

At the instance of Pandit Jawaharlal Nehru, the first Prime Minister of India, a Radiation Cell was established in 1956 at the Defence Science Laboratory, Delhi. The initial assignment was to undertake a study on the consequences of the use of nuclear and other weapons of mass destruction. But it was soon realized that nuclear energy can also be harnessed for the good of mankind: radioisotopes could find peaceful medical applications. The scope of work was, therefore, enlarged and the cell was upgraded to the Radiation Medicine Division in 1959. As awareness increased, so did the work, and a full-fledged establishment was created in 1961 and named the Institute of Nuclear Medicine and Allied Sciences.

Since then it has come a long way, carrying out R&D and providing service as a model of excellence in various aspects of Nuclear Medicine and Allied Sciences. The activities of the Institute have proliferated enormously over the years. Its areas of activity have been diversified to cover many fields of radiation and bio-medical sciences.
  5. 5. Vision

The Vision of INMAS is to be a centre of excellence in biomedical and clinical research with special reference to ionizing radiation.

Mission

The Mission of INMAS is clinical research in nuclear medicine and non-invasive imaging methods, with a focus on biological radio-protectors and thyroid disorders.

Basic Background and Theory

Project Background

The Institute of Nuclear Medicine and Allied Sciences (INMAS), a wing of the Defence Research and Development Organisation (DRDO), is currently in the third year of its four-year project “Cognition Enhancement using Non-Invasive Interventions”. This project would benefit the training regimen for defence personnel, as it would enhance their reasoning, attention, planning, decision making, memory and sensory input processing abilities. It would also contribute to the treatment of cognitive disorders like ADD and ADHD, executive dysfunction in stroke patients, autism, and cognitive skill degradation due to natural ageing.

Fig 1: Research at BME, INMAS
  6. 6. Brain Informatics Using Deep Learning

Final Research Report
4th February 2016

1. Abstract

Electroencephalography (EEG) technology has gained growing popularity in various applications. In this report we propose a deep learning based automated system which can classify workload into 3 categories (High, Medium and Low) using electroencephalographic (EEG) signals acquired by an inexpensive EEG device (Emotiv EEG). Workload is a critical factor influencing the performance of an individual in any field, from research and corporate jobs to army personnel. In this study, a 14 channel EEG was used to acquire brain signals while the subjects performed tasks that were grouped by the workload they impose on an individual. The acquired signals were then passed to various deep learning algorithms as training sets. The trained models were then used to classify the workload on an individual simply by acquiring that individual's EEG signals and passing them through those models.

Keywords: Deep Learning, Artificial Neural Networks, Radial Basis Function, Support Vector Machines (SVM), Stacked Autoencoders, Linear Discriminant Analysis (LDA), EEG, EEG Feature Extraction

2. Introduction

In this research work we trained five deep learning algorithms and compared their results to find out which algorithm best suited our data. The Emotiv EEG machine was used to gather the 14 channel data.
Since electroencephalographic data contains a lot of noise and other disturbing elements, feeding it directly into the algorithms as training data can produce aberrant results. Hence, the acquired EEG data was first treated with various digital signal processing techniques to filter out the noise and make the signal as clean as possible.

Various noise reduction filters were applied to eliminate as much noise from the data as possible. The filtered data was then passed through a Butterworth filter in order to perform feature extraction of the EEG signals. The Alpha, Beta, Gamma,
  7. 7. Delta and Theta features were extracted from the EEG signals based on their frequencies. These EEG features were then used as the input training sets for the various deep learning algorithms. The five algorithms used were Artificial Neural Networks (ANN), Support Vector Machines (SVM), Radial Basis Function (RBF) networks, Linear Discriminant Analysis (LDA) and Stacked Autoencoders. We go through each algorithm in detail below.

Once the models were trained and the classification was performed, the next step in the study was to discern any correlation between the various features of the EEG signals for all three load cases. We also tested for significant differences between the features under each workload condition using various statistical methods.

2.1 Artificial Neural Networks

The first deep learning model that we used was the Artificial Neural Network. We developed a neural network consisting of 1 hidden layer with 8 hidden neurons. The inputs to the network were the 14 channel EEG signals, and thus the input layer consisted of 14 neurons. The output that we wanted was a classifier which could classify the workload into 3 categories on the basis of the EEG signals, and hence the output layer consisted of 3 neurons.

The figure below shows the structure of the artificial neural network.

Fig 1 - Deep Neural Network (14, 8, 3)
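The (14, 8, 3) layout described above can be sketched as a single forward pass. This is a minimal illustration in Python rather than the R code used in the study; the sigmoid activation, the random untrained weights and the made-up input sample are all assumptions, since the report does not state them.

```python
import math
import random

CHANNELS = 14  # input neurons, one per EEG channel
HIDDEN = 8     # neurons in the single hidden layer
CLASSES = 3    # output neurons, one per workload class


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def layer_forward(x, weights, biases):
    """Apply one fully connected layer followed by a sigmoid (assumed activation)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, params):
    """Propagate a 14-value sample through the hidden and output layers."""
    hidden = layer_forward(x, *params[0])
    return layer_forward(hidden, *params[1])


rng = random.Random(0)
params = [init_layer(CHANNELS, HIDDEN, rng), init_layer(HIDDEN, CLASSES, rng)]
sample = [rng.random() for _ in range(CHANNELS)]  # hypothetical normalized sample
output = forward(sample, params)
predicted_class = output.index(max(output))  # index of the most active output neuron
```

With trained weights, the index of the largest of the three outputs would be read as the predicted workload class.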
  8. 8. The 14 input neurons represent the 14 EEG channels: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4. The 3 output neurons represent BL (Base Line, i.e., no workload), LWL (Low Workload) and HWL (High Workload). We used R for the entire research work, and the neural network shown above was also coded in R. We used the Resilient Backpropagation technique (Rprop+) to train the deep neural net.

To train the deep neural network, we first needed to normalize the entire input data set. We used the normalize function available in the RSNNS package on CRAN. The testing data was also normalized before being fed into the network. The classification output was therefore in normalized form, and we had to denormalize it using the denormalization function available in the same package. The denormalized values were the actual values indicating whether the workload is Base, Low or High.

The data came from 10 students, which we split in an 8:2 ratio for training and testing: we trained the neural net on the EEG data of 8 students and tested it on the data of the remaining 2 students.

The input (training) data fed into the neural net was as shown below.
  9. 9. The output set for the 14 channel EEG signals was transformed into a binary matrix in which the 3 columns take a form such as (1,0,0), signifying that each sample belongs to exactly one of the 3 cases. Hence, when the output of the neural net was denormalized using the denormalization function, the outputs of the 3 neurons were in the same format, e.g. (0,1,0), matching the format of the training targets.

2.2 Support Vector Machines

Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. Classifiers that distinguish between objects of different class memberships by drawing separating lines are known as hyperplane classifiers. Support Vector Machines are particularly suited to such tasks.

The illustration below shows the basic idea behind Support Vector Machines. Here we see the original objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical functions known as kernels. The process of rearranging the objects is known as mapping (transformation). Note that in this new setting, the mapped objects (right side of the schematic) are linearly separable and, thus, instead of constructing the complex curve (left schematic), all we have to do is find an optimal line that can separate the GREEN and the RED objects.
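The mapping idea described above can be illustrated with a toy example. The sketch below is in Python, uses made-up one-dimensional points and a hypothetical map phi(x) = (x, x^2); it is not the data or the kernel used in this study. It shows that points which no single cut on the original axis can separate become separable by a straight line after the mapping.

```python
def feature_map(x):
    """Hypothetical mapping of a 1-D point into 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(a, b):
    """True if one cut point on a single axis separates the two groups."""
    return max(a) < min(b) or max(b) < min(a)


# Made-up objects: GREEN points sit between the RED points on the line.
green = [-1.0, -0.5, 0.0, 0.5, 1.0]
red = [-3.0, -2.0, 2.0, 3.0]

# On the original axis no single threshold separates the two groups.
original_ok = separable_by_threshold(green, red)

# After mapping, the second coordinate (x^2) separates them with a
# horizontal line, for example x^2 = 2.
mapped_green = [feature_map(x)[1] for x in green]
mapped_red = [feature_map(x)[1] for x in red]
mapped_ok = separable_by_threshold(mapped_green, mapped_red)
```

Here `original_ok` is false and `mapped_ok` is true, which is the situation the schematic depicts: a complex boundary in the original space becomes a straight line in the mapped space.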
In our case too, we used the SVM as one of the classification models to classify the workloads. The SVM was used in R with a kernel function of type “Radial”. The output of the SVM was about as accurate as that of the ANN.
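For reference, a radial kernel is commonly defined as K(u, v) = exp(-gamma * ||u - v||^2). The sketch below assumes this common definition and an arbitrary gamma of 0.5; the report does not state the kernel parameters that were used, and the sample points are made up.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def gram_matrix(samples, gamma):
    """Kernel values for every pair of samples."""
    return [[rbf_kernel(u, v, gamma) for v in samples] for u in samples]


# Hypothetical feature vectors: two close together, one far away.
samples = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
K = gram_matrix(samples, gamma=0.5)
```

Nearby samples give kernel values close to 1 and distant samples give values close to 0, so gamma controls how quickly similarity decays with distance.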
  10. 10. The same dataset that was used to train the Artificial Neural Network was also used to train the SVM.

2.3 Stacked Autoencoders (SDAs)

A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer are wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder section, let $W^{(k,1)}, W^{(k,2)}, b^{(k,1)}, b^{(k,2)}$ denote the parameters $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ for the kth autoencoder. Then the encoding step for the stacked autoencoder is given by running the encoding step of each layer in forward order:

$$a^{(l)} = f(z^{(l)}), \qquad z^{(l+1)} = W^{(l,1)} a^{(l)} + b^{(l,1)}$$

The decoding step is given by running the decoding stack of each autoencoder in reverse order:

$$a^{(n+l)} = f(z^{(n+l)}), \qquad z^{(n+l+1)} = W^{(n-l,2)} a^{(n+l)} + b^{(n-l,2)}$$

The information of interest is contained within $a^{(n)}$, which is the activation of the deepest layer of hidden units. This vector gives us a representation of the input in terms of higher-order features.

A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise training. To do this, first train the first layer on raw input to obtain parameters $W^{(1,1)}, W^{(1,2)}, b^{(1,1)}, b^{(1,2)}$. Use the first layer to transform the raw input into a vector consisting of the activations of the hidden units, A. Train the second layer on this vector to obtain parameters $W^{(2,1)}, W^{(2,2)}, b^{(2,1)}, b^{(2,2)}$. Repeat for subsequent layers, using the output of each layer as input for the subsequent layer.
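The encoding and decoding passes described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the layer sizes (14, 8, 4), the sigmoid activation and the random, untrained parameters are hypothetical, and no training is performed. It only shows the order in which the per-autoencoder parameters are applied.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, a):
    """Compute z = W a + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]


def make_autoencoder(n_in, n_hidden, rng):
    """Random, untrained parameters: W1, b1 encode and W2, b2 decode."""
    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": rand(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": rand(n_in, n_hidden), "b2": [0.0] * n_in}


def encode(stack, x):
    """Run the encoding step of each autoencoder in forward order."""
    a = x
    for ae in stack:
        a = [sigmoid(z) for z in affine(ae["W1"], ae["b1"], a)]
    return a


def decode(stack, code):
    """Run the decoding step of each autoencoder in reverse order."""
    a = code
    for ae in reversed(stack):
        a = [sigmoid(z) for z in affine(ae["W2"], ae["b2"], a)]
    return a


rng = random.Random(0)
stack = [make_autoencoder(14, 8, rng), make_autoencoder(8, 4, rng)]
x = [rng.random() for _ in range(14)]   # hypothetical normalized input
code = encode(stack, x)                 # activation of the deepest hidden layer
reconstruction = decode(stack, code)    # back in the 14-dimensional input space
```

In greedy layer-wise training, each entry of `stack` would be fitted in turn on the output of the entries before it, instead of being left at random values as here.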
This method trains the parameters of each layer individually while freezing the parameters of the remainder of the model. To produce better results, after this phase of training is complete, fine-tuning using backpropagation can be used to improve the results by tuning the parameters of all layers at the same time.

A stacked autoencoder enjoys all the benefits of any deep network of greater expressive power.

Further, it often captures a useful "hierarchical grouping" or "part-whole decomposition" of the input. To see this, recall that an autoencoder tends to learn features that form a good representation of its input. The first layer of a stacked autoencoder tends to learn first-order features in the raw input (such as edges in
  11. 11. an image). The second layer of a stacked autoencoder tends to learn second-order features corresponding to patterns in the appearance of first-order features (e.g., in terms of what edges tend to occur together, for example to form contour or corner detectors). Higher layers of the stacked autoencoder tend to learn even higher-order features.

The training and testing process of the Stacked Autoencoder was much the same as that of the ANN. Initially, the training dataset was normalized and then fed into the neural net. The output thus obtained was in normalized form, and it was necessary to denormalize it to get it into a usable form. The output of the SDA, however, was less accurate than that of the ANN and the SVM.

2.4 Radial Basis Function (RBF)

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters.

Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The input can be modeled as a vector of real numbers $\mathbf{x} \in \mathbb{R}^n$. The output of the network is then a scalar function of the input vector, $\varphi : \mathbb{R}^n \to \mathbb{R}$, and is given by

$$\varphi(\mathbf{x}) = \sum_{i=1}^{N} a_i \, \rho(\lVert \mathbf{x} - \mathbf{c}_i \rVert)$$

where $N$ is the number of neurons in the hidden layer, $\mathbf{c}_i$ is the center vector of neuron $i$ and $a_i$ is the weight of neuron $i$ in the linear output neuron.

RBF networks are typically trained by a two-step algorithm. In the first step, the center vectors of the RBF functions in the hidden layer are chosen.
This step can be performed in several ways; centers can be randomly sampled from some set of examples, or they can be determined using k-means clustering. Note that this step is unsupervised.

The second step simply fits a linear model with coefficients $w_i$ to the hidden layer's outputs with respect to some objective function. A third, optional backpropagation step can be performed to fine-tune all of the RBF net's parameters.[3]

A common objective function, at least for regression/function estimation, is the least squares function:

$$K(\mathbf{w}) = \sum_{t=1}^{\infty} K_t(\mathbf{w})$$
  12. 12. where

$$K_t(\mathbf{w}) = \big[\, y(t) - \varphi(\mathbf{x}(t), \mathbf{w}) \,\big]^2 .$$

We have explicitly included the dependence on the weights $\mathbf{w}$. Minimization of the least squares objective function by optimal choice of weights optimizes accuracy of fit.

There are occasions in which multiple objectives, such as smoothness as well as accuracy, must be optimized. In that case it is useful to optimize a regularized objective function such as

$$H(\mathbf{w}) = K(\mathbf{w}) + \lambda S(\mathbf{w}) = \sum_{t=1}^{\infty} H_t(\mathbf{w})$$

where

$$S(\mathbf{w}) = \sum_{t=1}^{\infty} S_t(\mathbf{w})$$

and

$$H_t(\mathbf{w}) = K_t(\mathbf{w}) + \lambda S_t(\mathbf{w})$$

where optimization of $S$ maximizes smoothness and $\lambda$ is known as a regularization parameter.

In our case, the plot of weighted SSE against iterations shows a gradual reduction, which is a positive sign. However, some disturbances along the way show that the model is still not an ideal one.
  13. 13. The above diagram shows the SSE v/s iteration plot, with the result shown at the top.

2.5 Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.

LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e. the class label).[3] Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables. These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.

LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.
LDA explicitly attempts to model the difference between the classes of data. PCA, on the other hand, does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made.

In the case where there are more than two classes, the analysis used in the derivation of the Fisher discriminant can be extended to find a subspace which appears to contain all of the class variability. This generalization is due to C. R. Rao. Suppose that each of $C$ classes has a mean $\mu_i$ and the same covariance $\Sigma$. Then the scatter between class variability may be defined by the sample covariance of the class means

$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^{T}$$
  14. 14. where $\mu$ is the mean of the class means. The class separation in a direction $\mathbf{w}$ in this case will be given by

$$S = \frac{\mathbf{w}^{T} \Sigma_b \mathbf{w}}{\mathbf{w}^{T} \Sigma \mathbf{w}}$$

This means that when $\mathbf{w}$ is an eigenvector of $\Sigma^{-1} \Sigma_b$ the separation will be equal to the corresponding eigenvalue.

If $\Sigma^{-1} \Sigma_b$ is diagonalizable, the variability between features will be contained in the subspace spanned by the eigenvectors corresponding to the $C - 1$ largest eigenvalues (since $\Sigma_b$ is of rank $C - 1$ at most). These eigenvectors are primarily used in feature reduction, as in PCA. The eigenvectors corresponding to the smaller eigenvalues will tend to be very sensitive to the exact choice of training data, and it is often necessary to use regularisation.

If classification is required, instead of dimension reduction, there are a number of alternative techniques available. For instance, the classes may be partitioned, and a standard Fisher discriminant or LDA used to classify each partition. A common example of this is "one against the rest", where the points from one class are put in one group and everything else in the other, and then LDA is applied. This will result in $C$ classifiers, whose results are combined. Another common method is pairwise classification, where a new classifier is created for each pair of classes (giving $C(C - 1)/2$ classifiers in total), with the individual classifiers combined to produce a final classification.

The LDA plot for the given training dataset came out to be as below.
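Separately from the plot, the between-class scatter (the sample covariance of the class means) and the class separation in a given direction, as defined earlier in this section, can be worked through numerically. The Python sketch below uses made-up class means and assumes an identity within-class covariance; the names `sigma_b`, `sigma` and the example values are illustrative only and do not come from the study's data.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def between_class_scatter(class_means):
    """Sample covariance of the class means: (1/C) * sum (mu_i - mu)(mu_i - mu)^T."""
    mu = mean_vector(class_means)
    dim = len(mu)
    scatter = [[0.0] * dim for _ in range(dim)]
    for m in class_means:
        diff = [m[d] - mu[d] for d in range(dim)]
        for i in range(dim):
            for j in range(dim):
                scatter[i][j] += diff[i] * diff[j] / len(class_means)
    return scatter


def quadratic_form(w, M):
    """Compute w^T M w."""
    return sum(w[i] * M[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def separation(w, sigma_b, sigma):
    """Class separation in direction w: (w^T Sigma_b w) / (w^T Sigma w)."""
    return quadratic_form(w, sigma_b) / quadratic_form(w, sigma)


# Hypothetical means of C = 3 classes in a 2-D feature space.
class_means = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
sigma = [[1.0, 0.0], [0.0, 1.0]]  # assumed shared within-class covariance
sigma_b = between_class_scatter(class_means)

sep_x = separation([1.0, 0.0], sigma_b, sigma)  # along the axis the means vary on
sep_y = separation([0.0, 1.0], sigma_b, sigma)  # along the axis they do not vary on
```

Because the made-up class means differ only along the first axis, the separation is 8/3 in that direction and zero in the orthogonal one, which is why projecting onto the leading eigenvectors keeps the class variability.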
  15. 15. 3. Correlation and Significance Analysis

In this section we perform a statistical analysis of the features of the EEG signals to check whether there is any significant relationship or correlation between the alpha, beta, gamma and delta components. We performed this analysis for each workload class (Base Line, Low and High) and for every possible pair of components.

Firstly, we performed a one-way ANOVA (Analysis of Variance) test to check for any significant difference between the values for each class.

The first table shows the test of the difference between the alpha values and the beta values for the Base Line class. The P-value for this is greater than 0.05 and hence, at the conventional 0.05 level, we cannot conclude that there is a significant difference between the values of
  16. 16. alpha and beta for the Base Line. Similarly, the same test can be carried out for every other class.

The next step is the correlation analysis. We used Pearson's correlation and compared the Pearson coefficients to check for positive or negative correlation between these components for all 3 classes.

The first table shows the correlation between all possible combinations of EEG components for the Base Line class. The tables above form the basis for our conclusions.

4. Conclusion

We used 14 channels of EEG to classify the workload on an individual using 5 deep learning techniques, and then used various statistical methods to draw inferences from the results. Shown below is a visualization of the channel locations on the head surface, where we can locate the 14 channels that we used to train our models.

The models showed some variance in their results, so not all of them can be considered well suited to workload classification. The Artificial Neural Network and the Support Vector Machine were among the best performing algorithms for the classification and can be trusted more than the others.
  17. 17. Fig: EEG channel visualization

In our case we used a 14 channel EEG device named Emotiv. The system developed, however, can be used with any of the channel configurations: 14, 128 or 256. The UI developed in R Shiny is designed so that a drop-down menu can be used to select which kind of data the user is training the machine with.

Following is a screenshot of the complete application developed in R Shiny:
  18. 18. 5. References

1. Abdulhamit Subasi, M. Kemal Kiymik, Ahmet Alkan (Department of Electrical and Electronics Engineering, Kahramanmaraş Sütçü İmam University, 46100 Kahramanmaraş, Turkey) and Etem Koklukaya (Department of Electrical and Electronics Engineering, Sakarya University, 54187 Sakarya, Turkey). Neural Network Classification of EEG Signals by Using AR with MLE Preprocessing for Epileptic Seizure Detection.

2. Arjon Turnip and Keum-Shik Hong. Classifying Mental Activities from EEG-P300 Signals Using Adaptive Neural Networks.

3. L. M. Patnaik and Ohil K. Manyam. Epileptic EEG Detection Using Neural Networks and Post-Classification.

4. A. S. Muthanantha Murugavel and S. Ramakrishnan. Multi-class SVM for EEG Signal Classification Using Wavelet Based Approximate Entropy.

5. P. Bhuvaneswari (Research Scholar, Bharathiar University, Coimbatore) and J. Satheesh Kumar (Assistant Professor, Bharathiar University, Coimbatore). Support Vector Machine Technique for EEG Signals.