Classification of Resting-State fMRI Data of Nicotine
Dependent Patients
Ian McCann
Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA
December 7, 2016
Abstract
Resting-state functional magnetic resonance imaging (fMRI) allows us to see the level
of activity in a patient's brain. We were given fMRI scans of patients before and after they underwent
a smoking cessation treatment: 19 patients took the drug N-acetylcysteine (NAC) and 20
took a placebo. Our goal was to classify the patients as treatment or non-treatment based
on their fMRI scans. To do this the data had to be reduced, and the first step was to create
a mask to apply to all images. The mask was created by averaging the before images for all
patients and selecting the top 40% of voxels from that average. This mask was then applied
to all fMRI images for all patients. The average difference between the before and after fMRI
images of each patient was computed and flattened to one dimension. A matrix was then made
by stacking these 1D arrays, and a data reduction algorithm was applied to it. Lastly, this
matrix was fed into a machine learning algorithm and leave-one-out cross-validation was used
to test the accuracy. Of all the data reduction and machine learning algorithms used, the best
accuracy was obtained using principal component analysis (PCA) along with a linear support
vector machine (SVM). This gave an accuracy of 76.75%, which we consider significant enough
to suggest that there is a difference in the resting-state fMRI images of smokers who undergo
this smoking cessation treatment compared to smokers who receive a placebo.
1 Introduction
Smoking cigarettes leads to illnesses such as heart disease, stroke and cancer. Smoking is the leading
cause of preventable mortality in the United States, with around 50% of lifelong smokers dying
from one of the illnesses mentioned above. What drives people to continue smoking cigarettes
is the nicotine dependency that smoking creates. This dependency compels them to smoke
in order to stave off the withdrawal effects associated with smoking cessation. Developing a
cessation treatment that reduces a patient's dependency on nicotine as well as the effects of
withdrawal could help millions of people quit a dangerous habit.
In this paper, we analyze data from a smoking cessation treatment, where subjects take a drug
to reduce their nicotine dependence while still being allowed to smoke in order to hold off the
effects of withdrawal. This approach is preferable because more people are likely to try it if they do
not have to quit smoking immediately. The goal is to reduce the nicotine dependency to the point
that it is easier for the subject to quit.
The purpose of this paper is to show that there is a difference in the resting-state functional
magnetic resonance imaging (fMRI) images of a smoker that undergoes this smoking cessation
treatment compared to a smoker that receives a placebo. Areas of high activity are defined to be
those where more oxygen-rich blood is flowing and the fMRI is able to map these areas. Smokers
should have similar networks in areas of the brain where addiction occurs. Therefore, it should be
noticeable if there is a change in these areas in the brains of subjects who underwent the treatment.
In this paper we use machine learning algorithms to attempt to classify the subjects by
whether or not they underwent the treatment. The accuracy of classification relies heavily on
how the data is reduced, and techniques for doing so are discussed.
2 Data Gathering and Pre-Processing
This section will discuss how the data was obtained and the steps that were taken to clean the
data afterwards.
2.1 Subjects and fMRI data
The data used in this research was obtained from a study done by the Amsterdam Center for
Addiction and Research. The study consisted of 39 subjects who were regular smokers and had
a nicotine dependence. The study sought to determine if administering the drug N-acetylcysteine
(NAC) to a subject would decrease their dependence on nicotine. In total, 39 subjects partook in the
experiment, with 19 receiving NAC and 20 receiving a placebo. All the subjects underwent
a session of anatomical and functional scans of their brains before and after the experiment.
This MRI data was obtained using a 3.0 T Intera MRI scanner (Philips Healthcare, Best, The
Netherlands) equipped with a SENSE eight-channel receiver head coil. A gradient-echo planar
image sequence sensitive to blood oxygen level-dependent contrast acquired 200 3-dimensional temporal
images of the subjects' brains. The subjects were asked to relax, keep their eyes closed and stay
awake during the scan.
fMRI works by combining a strong magnetic field with radio waves to perturb the magnetic
alignment of hydrogen nuclei; because oxygen-rich and oxygen-poor blood respond differently, this allows
us to create detailed maps of the flow of oxygen-rich blood in the brain, which we relate to areas of high activity. The measurement
of blood flow, blood volume and oxygen use is called the blood-oxygen-level-dependent (BOLD)
signal and this signal is what we study in this paper.
The 3-dimensional anatomical data obtained from this was of size 240×240×220 with a voxel
size of 1 mm. The 3-dimensional functional data was of size 80×80×37 with a voxel size of 3 mm.
Below are slices of the brain from one subject along all three axes for both the anatomical and
the functional data.
Figure 1: Image Slices of Brain of Subject. (a) Anatomical, (b) Functional
2.2 Preprocessing Data
In this study, we received raw and preprocessed data from the Amsterdam Center for Addiction
and Research. The results in this paper were found using only the preprocessed data obtained
from them. This section will discuss how that data was preprocessed.
The fMRI data was analyzed using Statistical Parametric Mapping (SPM). The images underwent
segmentation, temporal slice-timing correction, realignment, coregistration, spatial normalization,
and spatial smoothing. To increase the spatial signal-to-noise ratio, the images were spatially
smoothed with an 8 mm FWHM Gaussian kernel. To remove linear trends in each session of 200
images, the functional data were band-pass filtered and detrended.
3 Methods and Techniques
This section explains how we manipulated the data in order to reduce its dimensions
to a workable size, and how the machine learning algorithms we applied to the data work.
3.1 Method
I start by taking an average of all the patients' fMRI data over all 200 time steps from before they
underwent the treatment. I use this to create masks based on what the average patient's brain
looked like before the treatment, regardless of whether they took a placebo or not.
I created three different masks to apply to the patients' fMRI data; I discuss how these were
created in the next section. Once I had a mask, I applied it to the average of each patient's brain
over the 200 time steps for their before and after examinations.
Next, I found the difference between the masked before and after brain images for each patient and
flattened the resulting 3-dimensional array into a 1-dimensional array.
Then, in order to reduce the features, I applied a data reduction algorithm to the matrix of all
the flattened arrays.
Lastly, I applied my machine learning algorithm to the reduced matrix and used leave-one-out
cross-validation to test my accuracy.
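As a concrete illustration of the steps above, the following is a minimal sketch of this pipeline, assuming NumPy and scikit-learn and assuming the preprocessed scans are available as 4-dimensional arrays; the names before_scans, after_scans, labels and mask are hypothetical and not part of the original code.

```python
# Minimal sketch of the pipeline in section 3.1 (NumPy/scikit-learn assumed).
# `before_scans` and `after_scans` are hypothetical lists of 4-D arrays of shape
# (x, y, z, 200), one per subject; `labels` is a length-39 array of 0/1 group labels.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def build_feature_matrix(before_scans, after_scans, mask):
    rows = []
    for before, after in zip(before_scans, after_scans):
        before_mean = before.mean(axis=3)           # average over the 200 time steps
        after_mean = after.mean(axis=3)
        diff = (after_mean - before_mean) * mask    # apply the binary mask
        rows.append(diff.ravel())                   # flatten the 3-D volume to 1-D
    return np.vstack(rows)                          # subjects x voxels matrix

def loo_accuracy(X, labels, n_components=20):
    # Reduce first, then classify, mirroring the order of steps described above.
    X_reduced = PCA(n_components=n_components).fit_transform(X)
    clf = SVC(kernel="linear")
    scores = cross_val_score(clf, X_reduced, labels, cv=LeaveOneOut())
    return scores.mean()                            # leave-one-out accuracy
```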
3.2 Masks
For all purposes in this paper, a mask is a 3-dimensional array of 0's and 1's, where a 1 means
keep the voxel in that position and a 0 means ignore it.
For my first mask, I researched the parts of the brain involved in addiction (later in the paper
I discuss how this can be narrowed further to areas of the brain specific to nicotine addiction).
I chose to look at the limbic system, as the papers I read indicated this is where addiction occurs.
Observing the average 3-dimensional image obtained earlier, I constructed a 3-dimensional
rectangular box around where I believed the average patient's limbic system to be. This
box determined my "limbic" mask.
Next, I wanted to look only at voxels with "high activity", which I took to be the voxels with
the highest values. I decided to reduce the matrix to 40% of its original size, so I kept the
highest-valued voxels that account for 40% of the size of the matrix. These voxel positions
determined my "high activity" mask.
Lastly, I wanted to see if selecting the "high activity" voxels within my "limbic" mask would increase
accuracy. I achieved this by first taking the box from the first mask and then keeping the highest
activity voxels within it, reducing it to 40% of the size of the box. This determined my "high
activity limbic" mask.
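To make the three masks concrete, here is a hedged NumPy sketch of how they might be constructed from the average before-treatment volume; the bounding-box coordinates for the limbic region are placeholders, not the values actually used.

```python
# Sketch of the three masks (NumPy assumed); the box coordinates are placeholders.
import numpy as np

def high_activity_mask(avg_volume, keep_fraction=0.40):
    # Keep the top `keep_fraction` of voxels of the average before-treatment volume.
    threshold = np.quantile(avg_volume, 1.0 - keep_fraction)
    return (avg_volume >= threshold).astype(np.uint8)

def limbic_mask(shape, box):
    # Binary mask that is 1 inside a hand-picked rectangular box (x0, x1, y0, y1, z0, z1).
    mask = np.zeros(shape, dtype=np.uint8)
    x0, x1, y0, y1, z0, z1 = box
    mask[x0:x1, y0:y1, z0:z1] = 1
    return mask

def high_activity_limbic_mask(avg_volume, box, keep_fraction=0.40):
    # Top 40% of voxels restricted to the limbic bounding box.
    box_mask = limbic_mask(avg_volume.shape, box)
    threshold = np.quantile(avg_volume[box_mask == 1], 1.0 - keep_fraction)
    return ((avg_volume >= threshold) & (box_mask == 1)).astype(np.uint8)
```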
3.3 Data Reduction Algorithms
3.3.1 Singular Value Decomposition (SVD)
SVD is the factorization of an m×n matrix A into the form A = UΣV*, where U is an m×m unitary matrix,
Σ is an m×n diagonal matrix, and V is an n×n unitary matrix. The diagonal entries of Σ are the
singular values of the original matrix, and the columns of U and V are its left and right singular
vectors, respectively. When SVD is used in this paper, we keep only the diagonal of Σ, because the
singular values summarize properties of the matrix that we can compare across matrices while
reducing its dimension. Below is a graph of the correlation matrix of the Σ matrix with itself for
different numbers of singular values. We see that the number of red (highly correlated) values goes
down as the number of singular values goes up, as expected.
Figure 2: SVD Correlation Matrix
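One way to realize this reduction in code, under the assumption that the subjects-by-voxels matrix is projected onto its leading singular directions, is scikit-learn's TruncatedSVD; this is an interpretation of the description above rather than the exact computation used.

```python
# Hedged sketch: truncated SVD of the subjects-by-voxels matrix (scikit-learn assumed).
import numpy as np
from sklearn.decomposition import TruncatedSVD

def svd_features(X, k):
    # Project X (subjects x voxels) onto its first k singular directions.
    svd = TruncatedSVD(n_components=k, random_state=0)
    X_k = svd.fit_transform(X)        # subjects x k reduced features
    return X_k, svd.singular_values_  # features and the k singular values

# A Figure 2-style correlation matrix can then be computed as
# np.corrcoef(svd_features(X, k)[0]) for different values of k.
```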
3.3.2 Principal Component Analysis (PCA)
PCA orthogonally transforms data consisting of possibly correlated variables into linearly
uncorrelated variables called principal components. The principal components are ordered
so that the first principal component has the largest variance within the data set and
the last has the least. All principal components are also orthogonal to one another, giving us an
orthogonal basis. The number of possible principal components is at most the number of subjects.
Below is a graph of the correlation matrix produced by computing the correlation of the principal
components with each other. Again we see that the number of red (highly correlated) values goes
down as the number of principal components goes up.
Figure 3: PCA Correlation Matrix
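A hedged sketch of the PCA reduction using scikit-learn follows; the component count k is capped by the number of subjects, as noted above.

```python
# Hedged sketch: PCA reduction of the subjects-by-voxels matrix (scikit-learn assumed).
import numpy as np
from sklearn.decomposition import PCA

def pca_features(X, k):
    # Return the first k principal-component scores for each subject.
    pca = PCA(n_components=k)                     # k is at most the number of subjects
    scores = pca.fit_transform(X)                 # subjects x k
    return scores, pca.explained_variance_ratio_  # scores and variance explained per component

# A Figure 3-style correlation matrix can then be computed as np.corrcoef(pca_features(X, k)[0]).
```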
3.3.3 Independent Component Analysis (ICA)
ICA is a technique for separating a multivariate signal into multiple independent non-Gaussian
signals. ICA assumes that these underlying signals are independent of each other. For
this research, we hope that ICA will tell us which regions of the brain share similar activity.
ICA is also limited by the number of subjects. Below is a graph of the
correlation matrix produced by computing the correlation of the independent components with each
other. Again we see that the number of red (highly correlated) values goes down as the number of
independent components goes up.
Figure 4: ICA Correlation Matrix
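A hedged sketch of the ICA decomposition using scikit-learn's FastICA follows; treating subjects as samples is an assumption about how the method above was applied.

```python
# Hedged sketch: ICA on the subjects-by-voxels matrix using FastICA (scikit-learn assumed).
from sklearn.decomposition import FastICA

def ica_features(X, k):
    # Unmix X (subjects x voxels) into k independent components.
    ica = FastICA(n_components=k, random_state=0, max_iter=1000)
    sources = ica.fit_transform(X)    # subjects x k per-subject features
    spatial_maps = ica.components_    # k x voxels independent spatial maps
    return sources, spatial_maps
```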
3.4 Machine Learning Techniques
In this section I briefly discuss the various machine learning algorithms that were implemented.
Their classification performance is discussed in section 4.
3.4.1 Logistic Regression
Logistic regression is useful in problems that involve only two classes, such as the one in this paper. The
method works by finding the relationship between the classes and the features; it does this by
estimating class probabilities with a logistic function, and these probabilities drive the predictive model.
3.4.2 Naive Bayes
The Naive Bayes classifier assumes all the features to be independent and applies Bayes' theorem
to them. Bayes' theorem states that the conditional probability of a class given the features (the
posterior) is proportional to the prior times the likelihood. The likelihood is calculated by making
an assumption about the features' distribution, called an event model. In this paper we test two
distributions for our event model: the Bernoulli and Gaussian distributions.
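For illustration, the two event models can be compared with scikit-learn's BernoulliNB and GaussianNB; this is a sketch, assuming the reduced feature matrix X_reduced and the label vector introduced earlier.

```python
# Hedged sketch: Naive Bayes with Bernoulli and Gaussian event models (scikit-learn assumed).
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

def naive_bayes_accuracies(X_reduced, labels):
    loo = LeaveOneOut()
    bernoulli_acc = cross_val_score(BernoulliNB(), X_reduced, labels, cv=loo).mean()
    gaussian_acc = cross_val_score(GaussianNB(), X_reduced, labels, cv=loo).mean()
    return bernoulli_acc, gaussian_acc
```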
3.4.3 K-Nearest Neighbors (KNN)
In KNN, K stands for how many nearest neighbors we consider in the algorithm. When
classifying a test subject, we look at its K nearest neighbors in the training set and use their
classes to determine which class it belongs to.
3.4.4 SVM
An SVM is a supervised learning model used to classify data. The SVM is given training data
along with its corresponding classifications, and the SVM algorithm builds a model that assigns
a classification to new input data.
The SVM classifies the data by separating it with a hyper-plane. This hyper-plane can
be linear or non-linear and is determined by the kernel, which I am free to choose. In this paper,
I use linear, radial basis function, sigmoid, and degree-3 polynomial kernels in my SVMs. For the
linear kernel, a hyper-plane is created between the two classes such that the distance between the
hyper-plane and the nearest point from each class is maximized.
The radial basis function kernel is defined below, where x_i and x_j are the feature vectors of two subjects:
k(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)
The polynomial kernel is similar to the linear kernel, but instead of a flat separating hyper-plane it
produces a curved decision boundary, with the degree of the curve depending on the degree of the polynomial.
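The four kernels compared in this paper correspond to the following scikit-learn classifiers; the gamma setting is an assumption, since the original hyperparameters are not stated.

```python
# Hedged sketch: the four SVM kernels compared in this paper (scikit-learn assumed).
from sklearn.svm import SVC

svm_models = {
    "linear":  SVC(kernel="linear"),
    "rbf":     SVC(kernel="rbf", gamma="scale"),
    "sigmoid": SVC(kernel="sigmoid", gamma="scale"),
    "poly3":   SVC(kernel="poly", degree=3, gamma="scale"),
}
# Each model can be evaluated with the leave-one-out procedure of section 3.5.
```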
3.5 Leave-One-Out Cross-validation
To test for accuracy in this paper, I used leave-one-out cross-validation. This works by leaving
one subject out of the training process and using that subject for prediction. This is done
39 times so that each subject is left out once, and the accuracy is the number of correct
predictions divided by the number of subjects, 39.
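Spelled out explicitly, the procedure trains on 38 subjects and predicts the held-out one, 39 times; the sketch below assumes NumPy arrays and any scikit-learn classifier.

```python
# Hedged sketch: explicit leave-one-out loop over the 39 subjects (scikit-learn assumed).
import numpy as np
from sklearn.base import clone

def leave_one_out_accuracy(model, X, labels):
    # X: subjects x features array; labels: NumPy array of 0/1 group labels.
    n = len(labels)                         # 39 subjects
    correct = 0
    for i in range(n):
        train = np.arange(n) != i           # every subject except the i-th
        clf = clone(model).fit(X[train], labels[train])
        correct += int(clf.predict(X[i:i + 1])[0] == labels[i])
    return correct / n                      # fraction of subjects predicted correctly
```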
4 Results
For this experiment, we had 19 subjects who underwent the treatment and 20 who did not. Our
null hypothesis is that there is no difference in the before and after fMRI images of the people who
underwent the treatment compared to those who did not. Under the null hypothesis we would expect
classification accuracy near chance level, roughly 50% given the 19/20 split.
Below are the graphs of all the machine learning algorithms applied to the data reduced using
SVD, for different numbers of singular values, for all three masks.
Figure 5: Machine Learning Accuracy Using Different Masks for SVD. (a) High-Activity Mask, (b) Limbic Mask, (c) High-Activity Limbic Mask
For the High-Activity mask in figure 5a, all the algorithms appear to trend to around 45%
once they reach 35 singular values. This is at or below the chance baseline, so it is not significant. For
the Limbic mask in figure 5b, we do slightly better for several algorithms, but not enough to be
significant. For the High-Activity Limbic mask in figure 5c, we do better for all algorithms. Using
Naive Bayes with a Bernoulli event model, we reach around 70% accuracy with 35 singular values.
This is a significant result.
Below are the graphs of all the machine learning algorithms applied to the data reduced using
PCA, for different numbers of principal components, for all three masks.
Figure 6: Machine Learning Accuracy Using Different Masks for PCA. (a) High-Activity Mask, (b) Limbic Mask, (c) High-Activity Limbic Mask
For the High-Activity mask in figure 6a, we get interesting results for the linear SVM and logistic
regression. For the maximum number of principal components, the linear SVM reaches 76.75%
accuracy, which is the best result we obtain in this research. Logistic regression reaches around 68%,
which is also a notable result. For the Limbic mask in figure 6b, the linear SVM outperforms
the rest again, but only reaches around 70% this time. This is still a notable result, but since no
algorithm did better, we would want to search for a better mask. For the High-Activity Limbic
mask in figure 6c, we do get better results for most of the algorithms, but none that perform better
than the linear SVM in figure 6a.
Below are the graphs of all the machine learning algorithms applied to the data reduced using
ICA, for different numbers of independent components, for all three masks.
Figure 7: Machine Learning Accuracy Using Different Masks for ICA. (a) High-Activity Mask, (b) Limbic Mask, (c) High-Activity Limbic Mask
Figure 7 gives us very interesting results. The SVMs using the RBF, sigmoid and degree-3 polynomial
kernels do extraordinarily well (the SVM with the linear kernel is the only SVM that does not).
All other algorithms perform around where we would expect under the null hypothesis for all the
masks. The SVM with the degree-3 polynomial kernel reaches 100% accuracy for all three masks at either
15 or 20 independent components. The SVMs using the RBF and sigmoid kernels also perform very
well.
5 Conclusions
Overall, no one mask proved to be better than the others in all 3 data reduction techniques or
in all machine learning algorithms. However, the fact that the masks performed differently shows
that changing where we look in the brain affects our classification accuracy. I believe this means
that with the right mask, we may be able to find the areas of the brain that matter to nicotine
dependency, which could greatly increase our classification accuracy.
In figure 7 we reach 100% accuracy using ICA and an SVM with a degree-3 polynomial kernel. I believe
this is due to the high degree of the kernel combined with the low number of independent components.
For all the masks, once the number of independent components reaches 20,
the accuracy drops. I believe this technique would not perform as well on data with more
patients, but I do believe the result is significant, and experimenting more with the
independent components in the future would be worthwhile. The fact that all the SVM algorithms
except the linear one performed well using ICA suggests that the assumption ICA makes (that
the underlying signals are independent) may be true, which could be beneficial to know in the
future.
In conclusion, I believe that we have found that there is a difference between the before and
after images of the subjects that underwent the treatment compared to those who did not. The
most notable result from this research is reaching 76.75% accuracy using a linear SVM with PCA.
This classification accuracy is much higher than would be expected under the null hypothesis, and I
trust that all aspects of the techniques used to reach this conclusion were carried out correctly. This includes
reading the data from the file correctly, obtaining and applying the mask, rearranging the data,
applying PCA to the data and finally classifying it using the linear SVM. For ICA, as I stated
earlier, I do not trust all aspects of the technique used to reach that result, considering the degree
of the kernel and number of patients in the study. Thus, I would like to look more into the strategy
before saying we obtained a 100% classification rate. However, I believe that the rest of our results
are more than enough to say that the null hypothesis is false and an alternative hypothesis should
be made. My alternative hypothesis would be that there are certain regions in the brain that are
affected by the treatment and therefore creating a mask that only looks at these regions would
allow us to classify the patients much more easily.
6 State of the Art
For fMRI image classification, ICA and the linear SVM appear to be standard data reduction and
classification algorithms. I believe ICA is used because the underlying signals we are
trying to separate really are independent, as ICA assumes. The linear SVM works well because
we have a two-class problem, and a lower-degree kernel is better suited for that.
In [6] the authors explain their method of reducing the data and applying their machine learning
algorithm. First they hand-pick ROIs based on areas of the brain they are interested
in. Then they compute the Pearson correlation of all the ROIs with one another to obtain a symmetric
square matrix of correlation coefficients. They then take the ROI-pair coefficients from the lower
triangular part of this matrix and use them as the features in their SVM. Using this method they
were able to obtain 84% accuracy. I believe this is a very good technique as long as the ROIs are
picked carefully.
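A hedged sketch of this feature construction follows; roi_timeseries is a hypothetical array of per-ROI time courses for one subject, not data from [6].

```python
# Hedged sketch of the ROI-correlation features described in [6] (NumPy assumed).
import numpy as np

def roi_connectivity_features(roi_timeseries):
    # roi_timeseries: hypothetical array of shape (n_rois, n_timepoints) for one subject.
    corr = np.corrcoef(roi_timeseries)        # n_rois x n_rois symmetric correlation matrix
    lower = np.tril_indices_from(corr, k=-1)  # strictly lower-triangular entries
    return corr[lower]                        # one feature per ROI pair
```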
In order to pick ROIs well for our research, we want to focus on regions of the brain that addiction
is known to influence. According to [1], the areas of the brain affected by nicotine
are those where the dopamine released in response to the drug floods the reward circuitry
and causes the nerve cells in these areas to change. This change leads us to seek out these activities
as a source of pleasure and eventually to need to engage in them, causing addiction. The
paper suggests that the prefrontal cortex and the limbic system are good areas to examine
for addiction. In the future, identifying very specific areas will be important, and the paper discusses
some of them.
7 Future Considerations
This research will be carried on by Amirhessam Tahmassebi, and we have discussed various ways
it can be improved and expanded. This section discusses those ideas.
One idea for obtaining more accurate and reliable results would be to use ROI-based functional
connectivity to build our feature vectors. I believe we could do something similar to [6], as discussed
in section 6, but the most important part is choosing our ROIs correctly. In that paper the authors
used resting-state fMRI data to classify subjects as young or old, so they picked areas of the brain
with well-known differences between young and old people. For our research we would want
to look at areas that are known or suspected to relate to addiction, or more specifically to nicotine
addiction. In this paper, the limbic system was singled out and there was no increase in accuracy,
but I believe that honing in on specific regions known for addiction and using their correlation
coefficients could improve accuracy drastically.
References
[1] How addiction hijacks the brain. Harvard Mental Health Letter (2011).
[2] D. R. Hardoon and L. M. Manevitz, fMRI analysis via one-class machine learning techniques.
[3] Classifying cognitive tasks to fMRI data using machine learning techniques (2011).
[4] J. Mourão-Miranda, D. R. Hardoon, T. Hahn, A. F. Marquand, S. C. R. Williams, J. Shawe-Taylor,
and M. Brammer, Patient classification as an outlier detection problem: an application of the
one-class support vector machine, Elsevier (2011).
[5] N. F. Mohd Suhaimi, Z. Z. Htike, and N. K. Alang Md Rashid, Studies on classification of fMRI
data using deep learning approach, ARPN Journal of Engineering and Applied Sciences (2015).
[6] S. Vergun, A. S. Deshpande, T. B. Meier, et al., Characterizing functional connectivity differences
in aging adults using machine learning on resting state fMRI data, Frontiers in Computational
Neuroscience (2013).