Using SPLS Regression to identify Genomic Loci affecting Essential
Polyunsaturated Fatty Acid Desaturation and Elongation
Daniel Crawford (0590539)
BINF 6999 - University of Guelph
Advisors: Mutch, D., Dang, S.
Submitted: 8 Aug 2013
Final Weighting: 80% Analysis
20% Laboratory
Table of Contents
Introduction
Methods
Results
Model 1
Model 2 (a)
Model 2 (b)
Model 3 (a)
Model 3 (b)
Discussion
Conclusions
Appendix i - Confidence intervals for desaturation / elongation activity
Appendix ii - Confidence intervals for effect of SNPs on PUFA levels
Appendix iii - Genomic location of SNPs in mapping the FADS gene cluster
References
INTRODUCTION
Investigations into the human plasma lipidome will provide useful information
regarding many areas of human health. The poly-unsaturated fatty acids (PUFAs)
Alpha-linolenic acid (ALA; 18:3 n-3), and Linoleic acid (LA; 18:2 n-6) are essential fatty
acids for humans (Goodhart, R.,1980). These FAs are precursors to longer chain
PUFAs such as Arachidonic Acid (AA; 20:4 n-6), Eicosapentaenoic Acid (EPA; 20:5 n-3),
and Docosapentaenoic Acid (DPA; 22:5n-3). Levels of these fatty acids (FAs) in cellular
membranes and plasma are related to dietary intake of either the preformed products or
precursor FAs (Marangoni, R., 2002).
Relative amounts of n-6 and n-3 FAs in the diet have an important impact on an
individual's overall health. Typically, a diet rich in n-6 PUFAs will shift the physiological
state towards being prothrombotic and proaggregatory (Simopoulos, A.,1999). High
levels of long chain n-3 PUFAs decrease the production of inflammatory eicosanoids
and cytokines (Calder, P., 2006). An increased proportion of EPA and docosahexaenoic
acid (DHA; 22:6 n-3) in inflammatory cell phospholipids directly decreases the availability
of AA to be used as a substrate for synthesis of pro-inflammatory eicosanoids (Caughey,
GE., 1996). EPA can
also be used as a substrate for eicosanoid production, resulting in less pro-inflammatory
molecules (Goldman, DW., 1983) or even anti-inflammatory molecules (Serhan, CN.,
2000). Long chain n-3 PUFAs can also affect gene transcription, including transcription
of lipogenic enzymes (Clarke, SD., 1994). While the long-chain n-3s exhibit potent anti-
inflammatory effects, the precursor ALA does not, in itself, contribute anti-inflammatory
effects to the same degree (Simopoulos, A., 1999; Calder, P., 2006); therefore
investigations into desaturation and elongation of the essential PUFAs are warranted.
The fatty acid desaturation pathway
The fatty acid desaturation pathway (Figure 1) consists of a delta-6 desaturation
(D6D), an elongation step, and a delta-5 desaturation (D5D) step. The enzymes
responsible for these steps are Δ6 desaturase, fatty acid elongase, and Δ5 desaturase.
n-3 and n-6 PUFAs share these enzymes and are competitive substrates. Δ6 and Δ5
desaturases are rate-limiting enzymes which catalyze double bond formation at the Δ6
and Δ5 positions of essential PUFAs, and are encoded by the genes FADS2 and FADS1
respectively. The two FADS genes are found in a head-to-head configuration on
chromosome 11 (61.57 – 61.63 Mb). Recent gene association studies have identified
links between genetic variants in the FADS cluster of genes, plasma fatty acids, and
development of diseases such as metabolic syndrome (Truong, H. et al, 2009),
coronary artery disease (Martinelli, N. et al, 2008), myocardial infarction (Baylin, A. et al,
2007), and dyslipidemia (Lu, Y, et al, 2010). Genetic variation in the form of single
nucleotide polymorphisms (SNPs) within the FADS cluster resulting in altered
desaturation activity can be identified using regression.
The goal of this independent research project was to identify the causal genetic
variants altering the efficiency of the fatty acid desaturase pathway, and construct a
haplotype to differentiate high converters and low converters of n-3 and n-6 PUFAs. A
high converter is an individual with high aggregate desaturase activity (ADA), meaning
the overall activity of the desaturation pathway is increased.
Figure 1 - Fatty acid desaturase pathway. The FADS2 (fatty acid desaturase 2) gene
encodes the protein Δ-6 desaturase, catalyzing a double bond formation at the Δ-6
carbon. Fatty acid elongase 2, encoded by ELOVL2, adds 2 carbons. The FADS1 (fatty
acid desaturase 1) gene encodes Δ-5 desaturase, the enzyme which catalyzes a double
bond formation at the Δ-5 carbon.
METHODS
Toronto Nutrigenomics and Health (TNH) Study
The Toronto Nutrigenomics and Health (TNH) Study examined over 2000 young
adults from the University of Toronto. Participants of the study completed a health and
lifestyle questionnaire, which included information about smoking habits, physical
activity, medical history, caffeine habits, as well as ethnic background. The two largest
ethnocultural groups were Caucasian and East Asian. Anthropometric measurements
such as sex, waist circumference and Body Mass Index (BMI) were collected. The study
collected genomic and lipidomic data from a subset of the population.
Sparse Partial Least Squares (SPLS) Regression
As modern clinical studies such as the TNH Study are able to produce copious
amounts of genomic and metabolomic data, investigations into statistical methodologies
and bioinformatics are critical for extracting biologically relevant information
efficiently.
Partial Least Squares (PLS) was introduced by Wold in 1966. PLS performs a
basic latent decomposition of the predictor matrix and the response matrix, to find a
small number of direction vectors which represent linear combinations of the predictor
variables. In 2010, Chun, H., and Keles, S., introduced Sparse PLS which incorporates
sparsity directly into the dimension reduction step of PLS. This results in sparse linear
combinations of the original predictors, and in the case of this analysis, the selection of
the SNPs most likely to be biologically relevant. Sparsity is imposed by implementing an
L1 penalty; SPLS uses two tuning parameters, eta and K.
Eta (η) is the thresholding parameter and must be between 0 and 1. SPLS uses
a form of soft thresholding in which components are retained if they are greater than
some fraction, determined by eta, of the maximum component. Eta directly determines
the sparsity of the final solution, as a high eta will result in fewer selected variables. K is
the number of latent components, and must be less than the rank of the predictor
matrix. As K decreases, the solution becomes more sparse. Selection of the optimal
parameters to use in an SPLS calculation is done using 10-fold cross validation. This
algorithm randomly partitions the observations into 10 subgroups, and uses each subgroup
once as the validation group while the other 9 are used as the training set. Optimal
parameters can then be estimated as a combination of the results from each analysis.
An advantage of 10-fold cross validation is that each observation is used once and only
once as part of the validation set. An inherent consequence of this method is that the 10
subgroups are randomly assigned, and therefore subsequent cross-validations may
result in different estimates of the optimal parameters. Optimal parameters are those
determined to have the lowest mean square error (MSE). Performing the SPLS analysis
using alternate parameters may affect the final number of SNPs selected as having
a significant effect; however, SNPs that are more significant are less likely to be affected by
small changes in eta and K. The SPLS algorithm automatically sets the regression
coefficients of unselected SNPs to 0.
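The thresholding rule controlled by eta can be illustrated with a minimal sketch. It assumes the cutoff is eta times the largest absolute entry of a direction vector, as described above; the function names and the example weights are hypothetical and are not taken from the analysis.

```python
def soft_threshold(weights, eta):
    """Shrink each weight toward zero by eta * max|w|.

    Entries whose magnitude does not exceed that cutoff become exactly 0.
    """
    if not 0 <= eta < 1:
        raise ValueError("eta must satisfy 0 <= eta < 1")
    cutoff = eta * max(abs(w) for w in weights)
    result = []
    for w in weights:
        shrunk = max(abs(w) - cutoff, 0.0)
        if shrunk == 0.0:
            result.append(0.0)
        else:
            # Keep the original sign of the weight.
            result.append(shrunk if w > 0 else -shrunk)
    return result


def count_selected(weights):
    """Number of variables that keep a non-zero weight."""
    return sum(1 for w in weights if w != 0.0)


# Hypothetical direction-vector entries for four SNPs (assumption).
weights = [0.9, -0.5, 0.2, -0.05]
low_eta = soft_threshold(weights, 0.2)   # mild thresholding
high_eta = soft_threshold(weights, 0.8)  # strong thresholding
print(count_selected(low_eta), count_selected(high_eta))
```

With these illustrative weights, three of the four variables survive at eta = 0.2 and only one survives at eta = 0.8, which is the sense in which a higher eta gives a sparser solution.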
While previous studies found associations with many SNPs within the FADS
gene cluster, using SPLS regression will select a small number of important variables.
This method will identify the SNP(s) within FADS1/2 and ELOVL2 that are most likely
responsible for the observed variation in FA levels in plasma.
Data were collected as part of the Toronto Nutrigenomics and Health Study.
Phenotype data for this analysis were gas chromatography (GC) measurements for the
following fatty acids:
Fatty acid                 Abbreviation   Notation
Linoleic acid              LA             18:2 n-6
γ-Linolenic acid           GLA            18:3 n-6
Dihomo-γ-linolenic acid    DGLA           20:3 n-6
Arachidonic acid           AA             20:4 n-6
Alpha-linolenic acid       ALA            18:3 n-3
Eicosapentaenoic acid      EPA            20:5 n-3
Enzyme activity was approximated by the following FA ratios (product / precursor):
n-6 Δ6 Desaturase (n-6 D6D)                   GLA / LA
n-6 Elongase (n-6 Elong)                      DGLA / GLA
n-6 Δ5 Desaturase (n-6 D5D)                   AA / DGLA
n-6 Aggregate Desaturase Activity (n-6 ADA)   AA / LA
n-3 Aggregate Desaturase Activity (n-3 ADA)   EPA / ALA
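The ratio definitions can be written out as a small sketch. It assumes every activity is taken as product over precursor, so the n-3 aggregate ratio is computed as EPA / ALA to match the orientation of the n-6 ratios; that orientation, the dictionary keys, and the example fatty acid levels are assumptions made for illustration only. The log transformation mirrors the one applied to the ratios in model 2.

```python
import math


def activity_ratios(fa):
    """Approximate enzyme activities as product / precursor ratios.

    `fa` maps fatty acid abbreviations to measured levels.
    """
    return {
        "n6_D6D": fa["GLA"] / fa["LA"],
        "n6_Elong": fa["DGLA"] / fa["GLA"],
        "n6_D5D": fa["AA"] / fa["DGLA"],
        "n6_ADA": fa["AA"] / fa["LA"],
        "n3_ADA": fa["EPA"] / fa["ALA"],  # assumed orientation: product / precursor
    }


# Hypothetical plasma levels for one individual (arbitrary units).
example = {"LA": 30.0, "GLA": 0.3, "DGLA": 1.5, "AA": 6.0, "ALA": 0.5, "EPA": 0.8}
ratios = activity_ratios(example)

# Log-transform the ratios before regression.
log_ratios = {name: math.log(value) for name, value in ratios.items()}
```

Written this way, the n-6 aggregate ratio is the product of the three step ratios (AA/LA = GLA/LA x DGLA/GLA x AA/DGLA), which is why it summarizes the whole n-6 pathway.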
Genotype data for 26 SNPs mapped out 3 genes: FADS1, FADS2, and ELOVL2
(see Appendix iii). The genotype data were formatted in a matrix such that individuals
were given a ‘0’ for SNPs which were homozygous for the major allele, and ‘1’ for SNPs
that were either heterozygous, or homozygous for the minor allele. Thirty-seven
individuals were missing data for one or more SNPs, and were removed from the analysis.
Each SNP was tested for Hardy-Weinberg equilibrium (HWE) in the whole population, the
Caucasian population, and the Asian population, to identify ethnicity-dependent
polymorphisms.
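The genotype coding and the HWE check can be sketched as follows. The report does not state which HWE test was used, so the one-degree-of-freedom chi-square goodness-of-fit test below is an assumption, as are the function names and the example genotype counts.

```python
CHI2_CRITICAL_1DF = 3.841  # chi-square critical value for 1 df at alpha = 0.05


def dominant_code(genotype, minor_allele):
    """Return 0 if homozygous for the major allele, 1 otherwise."""
    return 1 if minor_allele in genotype else 0


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square statistic comparing observed genotype counts to HWE expectations."""
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0  # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def in_hwe(n_major_hom, n_het, n_minor_hom):
    """True if the HWE null hypothesis is not rejected at alpha = 0.05."""
    return hwe_chi_square(n_major_hom, n_het, n_minor_hom) < CHI2_CRITICAL_1DF


# Hypothetical genotype counts (major homozygote, heterozygote, minor homozygote).
balanced = (360, 480, 160)  # matches p = 0.6 exactly
skewed = (500, 200, 300)    # same allele frequency, too few heterozygotes
print(in_hwe(*balanced), in_hwe(*skewed))
```

Running the test separately within each ethnocultural group, as done here, avoids flagging SNPs merely because allele frequencies differ between groups.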
Data analysis was performed in R (R Core Team, 2013). The package
“spls” (Chun, H. and Keles, S., 2010) was used for regression analysis. Optimal
parameters for regression were first determined using cross-validation (CV). The spls
function was then used for variable selection, where an SPLS object was created from
the predictor and response matrix data. Confidence intervals were calculated for the
effect of each SNP on multivariate responses.
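The partitioning step of 10-fold cross-validation can be sketched independently of any regression package. This is an illustration only, not the routine used in the analysis; the function names and the seed argument are assumptions.

```python
import random


def k_fold_indices(n_obs, k=10, seed=None):
    """Randomly partition observation indices 0..n_obs-1 into k folds."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices out so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_validation_splits(n_obs, k=10, seed=None):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n_obs, k, seed)
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation


# Example: the 335 individuals of a training population, split into 10 folds.
splits = list(train_validation_splits(335, k=10, seed=1))
```

Each observation appears in exactly one validation fold. Because the shuffle is random, rerunning without a fixed seed gives a different partition, which is why repeated cross-validations can return slightly different optimal parameters.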
For model 1, predictor matrices were created for both the Caucasian and Asian
populations, containing genotype data for only the SNPs used in the Merino et al.
analysis. 15 SNPs were tested in Caucasians: 3 SNPs in FADS1 (rs174547, rs412334,
and rs695867) and 12 SNPs in FADS2 (rs174576, rs174579, rs174593, rs174602,
rs174611, rs174627, rs17831757, rs2072114, rs2845573, rs482548, rs498793, and
rs968567). In Asians, 15 SNPs were tested: 3 in FADS1 (rs174547, rs412334, and
rs695867) and 12 SNPs in FADS2 (rs174570, rs174576, rs174579, rs174593,
rs174602, rs174611, rs174627, rs17831757, rs2072114, rs2845573, rs498793, and
rs526126).
For models 2 and 3, a training data set was created for the Caucasian
population, composed of randomly selected individuals of Caucasian ethnicity. The
predictor matrix for these models included the SNPs in HWE within the Caucasian
population. 19 SNPs from the FADS cluster were tested for HWE (RS174547,
RS174570, RS174576, RS174579, RS174593, RS174602, RS174626, RS174627,
RS17831757, RS2072114, RS412334, RS482548, RS498793, RS526126, RS695867,
RS968567, RS174611, RS2845573, RS2851682). 7 SNPs from ELOVL2 were tested
for HWE (RS12195587, RS13204015, RS3798719, RS8523, RS976081, RS3798720,
RS911196). Age, BMI, and sex were included as confounding variables to account for
the effect of these factors on plasma lipid levels.
The SPLS object, containing the set of regression coefficients, was then used to
create a predicted fatty acid profile. The squared Pearson product-moment correlation
coefficient between the predicted response and the actual response is presented as an
r2 value.
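The prediction and scoring step can be sketched as a matrix product followed by a squared correlation. The layout of the coefficient matrix (predictors in rows, responses in columns), the intercept handling, and all example values are assumptions for illustration.

```python
import math


def predict(X, B, intercept):
    """Predicted responses for each row of X.

    X: n x p predictor rows; B: p x q coefficients; intercept: length q.
    """
    n_resp = len(intercept)
    return [
        [intercept[j] + sum(x_i * B[i][j] for i, x_i in enumerate(row))
         for j in range(n_resp)]
        for row in X
    ]


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    return pearson_r(predicted, actual) ** 2


# Hypothetical example: two SNPs coded 0/1 and a single response.
X = [[0, 1], [1, 0]]
B = [[-0.5], [0.25]]
y_hat = predict(X, B, intercept=[1.0])
```

Because r2 is a squared correlation, it measures how closely predicted and actual values track each other linearly, not whether they agree in absolute level.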
RESULTS
Data from 1059 individuals (309 males and 750 females) aged 20-29 years were
used for this study. Participants were of normal BMI (22.3 +/- 3.5) and were non-smokers.
The population was ethnoculturally diverse; the ethnic group with the largest
representation was Caucasian (n=450), and the second largest was East Asian (n=384).
Gas chromatography results showed that the PUFA with the highest average level in
plasma was Linoleic Acid, and the second highest was Arachidonic Acid (Figure 2). The
highest ratio was AA/DGLA in Caucasians (Table 1), as well as in Asians (Table 2),
representing n-6 D5D; the second highest ratio was DGLA/GLA, representing n-6 elongase.
Figure 2 - Distribution of levels of each PUFA as measured by GC.
Table 1 - Desaturation pathway activity - Average of Caucasian population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.011     4.662       4.832     0.203     1.596
Table 2 - Desaturation pathway activity - Average of East Asian population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.011     4.779       5.291     0.204     1.597
The 26 SNPs mapping the FADS1, FADS2, and ELOVL2 genes were tested for HWE.
In the Caucasian population (n=451) 25 SNPs were in HWE:
"RS12195587" "RS174547" "RS174570" "RS174576" "RS174579"
"RS174593" "RS174602" "RS174626" "RS174627" "RS17831757"
"RS2072114" "RS3798719" "RS412334" "RS482548" "RS498793"
"RS526126" "RS695867" "RS8523" "RS968567" "RS976081"
"RS174611" "RS2845573" "RS2851682" "RS3798720" "RS911196"
RS13204015 was not in HWE.
In the East Asian population (n=326), the following 19 SNPs were in HWE:
"RS12195587" "RS13204015" "RS174576" "RS174579" "RS174593"
"RS174602" "RS174626" "RS174627" "RS17831757" "RS2072114"
"RS3798719" "RS412334" "RS482548" "RS526126" "RS695867"
"RS968567" "RS976081" "RS174611" "RS911196"
In the general population (n=1059), the following 9 SNPs were in HWE:
"RS174579" "RS174593" "RS174626" "RS17831757" "RS3798719" "RS482548"
"RS695867" "RS968567" "RS976081"
MODEL 1 - Replication of Merino et al analysis
The goal of this model was to compare results of SPLS regression to linear
regression. It would be expected that SPLS regression will select the most important
variables in terms of having an actual effect on desaturation activity. Merino et al used
linear regression to test 15 SNPs within a Caucasian group (n=78) for significant
associations with D5D, D6D, n-6 ADA and n-3 ADA. Significant associations were found
for 9 SNPs. Testing the same 15 SNPs using SPLS (eta=0.8 and K=2) in a larger
Caucasian population (n=450) identified 3 SNPs ("RS174547" "RS174576"
"RS968567") with significant associations to desaturase activity.
SPLS regression (Eta = 0.88, K=3) results show that in the Caucasian population
RS174547 is significantly associated with n-6 D6D and n-6 ADA, RS174576 is
significantly associated with n-3 ADA, and RS968567 is significantly associated with
n-6 D6D activity (Table 3).
Table 3 - Regression coefficients for presence of at least 1 minor allele in SNPs
with significant associations to desaturation activity (Caucasian population).
Negative coefficients indicate decreased desaturation activity.
SNP          n-6 D6D   n-6 D5D   n-6 ADA   n-3 ADA
"RS174547"   -0.299    0         -0.003    0
"RS174576"   0         0         0         -0.130
"RS968567"   -0.227    0         0         0
Linear regression by Merino et al resulted in 9 SNPs having significant
associations. SPLS (Eta = 0.88, K=2) results show that in the Asian population
RS174547 is significantly associated with D5D, D6D, n-6 ADA, and n-3 ADA. RS174576
is significantly associated with D5D, D6D, n-6 ADA, and n-3 ADA. RS174611 is
significantly associated with n-6 D6D activity (Table 4).
Table 4 - Regression coefficients for presence of at least 1 ‘C’ allele in SNPs with
significant associations to desaturation activity (Asian population). In the Asian
population, the ‘C’ allele is the major allele in the SNP RS174547. Negative coefficients
indicate decreased desaturation activity.
SNP        n-6 D6D   n-6 D5D   n-6 ADA   n-3 ADA
RS174547   -0.250    -0.001    -0.013    -0.041
RS174576   -0.199    -0.002    -0.014    -0.045
RS174611   -0.314    0         0         0
MODEL 2(a) - Multivariate Analysis of Desaturase and Elongase activity (Caucasian
population)
The goal of this model was to select SNPs which affect all or a subset of the
desaturation/elongation activities, including ADA. It is hypothesized that cross validation
will select optimal parameters that result in a solution with the appropriate sparsity, and
that SPLS will select the SNPs best associated with the desaturase activity responses
while excluding SNPs that are highly collinear but do not have any direct effect on any
desaturation activity.
A sample of Caucasians (n=335) was randomly selected to be used as the
training population. Genotype data for 25 SNPs mapping ELOVL2 and FADS1/2, as well
as age, sex, and BMI comprised the predictor matrix. Fatty acid ratios were used to
estimate desaturation/elongation activity, and aggregate n-6 and n-3 desaturation
activity. Log-transformed FA ratios were used as they were more normally
distributed. 10-fold cross validation determined that the optimal parameters were
eta = 0.64 and K = 3, with a mean square prediction error of 0.157. The cross-validation
results indicate that the most likely solution has low sparsity.
Figure 3 - CV Heatmap of Mean Square Prediction Errors for range of possible
SPLS parameters.
SPLS regression selected 10 SNPs that were significantly associated with at
least one step in the fatty acid desaturation pathway. 6 SNPs were associated with D6D
activity, 4 SNPs with elongation, 7 SNPs with n-6 D5D, 9 SNPs with n-6 ADA, and 6
SNPs with n-3 ADA. Table 5 shows the regression coefficients for the SNPs with
significant associations; values with a confidence interval (CI) containing 0 are set to 0.
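The rule for reporting only significant coefficients can be sketched as a filter on confidence intervals. How the intervals themselves were obtained is not restated here, so the sketch takes the bounds as given; the function names, variable names, and numbers are hypothetical.

```python
def zero_if_ci_contains_zero(coef, lower, upper):
    """Keep a coefficient only if its confidence interval excludes 0."""
    return 0.0 if lower <= 0.0 <= upper else coef


def filter_coefficients(estimates):
    """Apply the CI rule to a mapping of name -> (coef, lower, upper)."""
    return {
        name: zero_if_ci_contains_zero(coef, lower, upper)
        for name, (coef, lower, upper) in estimates.items()
    }


# Hypothetical estimates: (coefficient, CI lower bound, CI upper bound).
estimates = {
    "snp_a": (-0.03, -0.05, -0.01),  # interval excludes 0: kept
    "snp_b": (0.01, -0.02, 0.04),    # interval spans 0: set to 0
}
reported = filter_coefficients(estimates)
```

Under this rule, a zero in the coefficient tables means the effect could not be distinguished from no effect, not that the estimate was exactly zero.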
Table 5 - Regression coefficients for selected variables and desaturation/
elongation activity (eta = .64, K=3). Negative coefficients indicate presence of at least
one minor allele results in decreased desaturation activity.
Variable     n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
RS174547     -0.017    0           -0.022    -0.023    -0.033
RS174570     -0.034    0.027       0         -0.018    0
RS174576     -0.016    0           -0.022    -0.023    -0.034
RS174579     0         0           -0.024    -0.013    -0.033
RS174593     0         0           -0.023    -0.013    0
RS2072114    -0.045    0.048       -0.023    -0.020    -0.023
RS968567     0         0           -0.029    -0.012    -0.039
RS976081     0         0           0.036     0         0.034
RS2845573    -0.036    0.028       0         -0.018    0
RS2851682    -0.032    0.025       0         -0.018    0
sex          -0.079    0.125       -0.058    0         0
The SPLS derived regression coefficients were used to predict the desaturation
activity response in an independent population of Caucasian individuals. The Caucasian
individuals not selected as part of the training population were used as the testing
population (n=115). As SPLS creates linear combinations of the original variables, the
predicted response has a linear relationship to the actual response; therefore Pearson's
correlation coefficient was used to evaluate how well predicted data matched actual
data. This regression model, with direction vectors containing 10 SNPs, was best able to
predict n-6 ADA (r2 = 0.24) in the test population.
Table 6 - r2 value for predicted response in Caucasian testing population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.085     0.184       0.188     0.239     0.080
The model was then tested on the general population (n=1059), including the training
population (Table 7), and on the second largest ethnicity, the East Asian population
(Table 8):
Table 7 - r2 value for whole population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.252     0.226       0.089     0.249     0.064
Table 8 - r2 value for East Asian population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.221     0.229       0.053     0.223     0.050
Figure 4 - Predicted desaturation and elongation activity in Caucasian Testing
population compared to actual values. SPLS regression coefficients from training
Caucasian population were applied to a test set of Caucasians.
MODEL 2(b)
This model uses the same predictor and response data as model 2(a). The
cross validation heatmap for model 2(a) (Figure 3) also indicated that a high sparsity
solution could be applied with a small increase (0.001) in MSE. As the parameters for this
model (eta = .96, K=2) impose higher sparsity, it is expected that the number of selected
variables will be reduced to a smaller number of SNPs.
SPLS regression using these parameters reduced the number of significant variables to
3. Two SNPs, RS174547 and RS174576, were selected. Sex was also found to
contribute to desaturase activity.
Table 9 - Regression coefficients for selected variables and desaturation/
elongation activity (eta = .96, K=2). Negative coefficients indicate presence of at least
one minor allele results in decreased desaturation activity.
Variable   n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
RS174547   -0.059    0.063       -0.057    -0.053    -0.081
RS174576   -0.058    0.061       -0.055    -0.053    -0.080
sex        -0.078    0.149       -0.084    0         -0.064
Table 10 - r2 value for Caucasian test population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.051     0.103       0.202     0.200     0.032
Table 11 - r2 value for East Asian population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.224     0.242       0.058     0.210     0.047
Table 12 - r2 value for whole population
n-6 D6D   n-6 Elong   n-6 D5D   n-6 ADA   n-3 ADA
0.229     0.238       0.073     0.231     0.058
Figure 5 - Predicted desaturation and elongation activity in Caucasian Testing
population, from SPLS regression using training Caucasian population.
MODEL 3 (a)
An SPLS regression analysis was done to identify associations between SNPs in
the FADS gene cluster and the ELOVL2 gene, and levels of the plasma FAs: 18.2n6,
18.3n6, 20.3n6, 20.4n6, 18.3n3, and 20.5n3. Fatty acid levels were generally normally
distributed. The training population consisted of 338 randomly selected Caucasian
individuals. Genotype data included 25 SNPs which were in HWE in the Caucasian
population, mapping the genes FADS1, FADS2, and ELOVL2. The confounding variables
age, BMI, and sex were included. The response matrix was composed of the 6
individual plasma fatty acid levels as measured by gas chromatography (GC). Cross
validation determined that the optimal parameters were eta = 0.92 and K = 2, with
MSE = 2.6128 (Figure 6).
Figure 6 - CV Heatmap of Mean Square Prediction Errors for range of possible
SPLS parameters
SPLS regression was performed and 2 SNPs were selected as having significant
effects on the 6 FAs. Only one SNP (RS174547) had a significant effect on
18.2n6 (LA). None of the SNPs significantly explained variation in 20.3n6 (DGLA), and
the remainder of the FAs were affected by 2 SNPs (RS174547 and RS174576) by
approximately the same amount. Age and BMI had significant effects on the amount of
18.2n6 (LA).
Table 13 - Regression coefficients for selected variables and PUFA levels (eta = .92,
K=2). Negative coefficients indicate that presence of at least one minor allele results in
a decreased level of that PUFA.
Variable   18.2n6 (LA)   18.3n6 (GLA)   20.3n6 (DGLA)   20.4n6 (AA)   18.3n3 (ALA)   20.5n3 (EPA)
RS174547   0.219         -0.017         0               -0.270        0.018          -0.040
RS174576   0             -0.017         0               -0.268        0.018          -0.042
age        0.478         0              0               0             0              0
bmi        -0.774        0              0               0             0              0
Figure 7 - Predicted fatty acid levels in Caucasian Testing
population compared to actual fatty acid levels, from SPLS regression using
training Caucasian population.
When the model was tested on the test set of Caucasians (n=113), the best correlated
prediction was for fa20.4n6 (AA).
Table 14 - r2 value for predicted response in Caucasian testing population
18.2n6 (LA)   18.3n6 (GLA)   20.3n6 (DGLA)   20.4n6 (AA)   18.3n3 (ALA)   20.5n3 (EPA)
0.08679       0.05626        0.01965         0.21063       0.019205       0.00013
When this model was tested on all individuals (n=1059), the predicted fa20.4n6 (AA)
values once again were the best correlated to the actual values.
Table 15 - r2 value for whole population
18.2n6 (LA)   18.3n6 (GLA)   20.3n6 (DGLA)   20.4n6 (AA)   18.3n3 (ALA)   20.5n3 (EPA)
0.04656       0.15318        0.01167         0.24159       0.01100        0.01226
Additionally, this model was tested on the East Asian population (n=384). The prediction
of GLA levels for the Asian population was better than in the general population. The best
predicted FA in the East Asian population was AA.
Table 16 - r2 value for East Asian population
18.2n6 (LA)   18.3n6 (GLA)   20.3n6 (DGLA)   20.4n6 (AA)   18.3n3 (ALA)   20.5n3 (EPA)
0.03744       0.21149        0.01749         0.21721       0.00822        0.00312
Model 3 (b)
To explore the effects of imposing sparsity, the regression algorithm was run
again with the eta parameter set to 0. This model therefore will not impose sparsity, thus
resulting in each variable being assigned a coefficient. While each SNP is assigned a
coefficient, not all are significant. The following table shows the association of the 14
SNPs with each FA.
Table 17 - Regression coefficients for variables and PUFA levels (eta = 0, K=2).
Negative coefficients indicate that presence of at least one minor allele results in a
decreased level of that PUFA.
Variable     18.2n6 (LA)   18.3n6 (GLA)   20.3n6 (DGLA)   20.4n6 (AA)   18.3n3 (ALA)   20.5n3 (EPA)
RS12195587   -0.3885458    0              0.036           0             0              0
RS174547     0             -0.005         0               -0.086        0.006          -0.013
RS174570     0             -0.004         0               -0.061        0              0
RS174576     0             -0.005         0               -0.083        0.006          -0.014
RS174579     0             0              0               -0.051        0              -0.010
RS174593     0             -0.004         0               -0.063        0.005          0
RS174602     0             0              0               -0.031        0              0
RS174626     0             0              0               -0.050        0.004          0
RS174627     0             0              0               -0.049        0              0
RS2072114    0             -0.004         0               -0.057        0              0
RS968567     0             0              0               -0.038        0              -0.011
RS174611     0             -0.004         0               -0.066        0.005          0
RS2845573    0             -0.004         0               -0.053        0              0
RS2851682    0             -0.003         0               -0.046        0              0
BMI          -0.7001214    0              0.062           0             0              0
These coefficients predicted the fatty acid levels approximately as well as when
sparsity is imposed.
Table 18 - Correlation with actual values of Caucasian test population
Model                   18.2n6 (LA)   18.3n6 (GLA)   20.3n6 (DGLA)   20.4n6 (AA)   18.3n3 (ALA)   20.5n3 (EPA)
Sparsity (eta = .92)    0.08679       0.05626        0.01965         0.21063       0.019205       0.00013
No Sparsity (eta = 0)   0.02742       0.06301        0.03332         0.23542       0.01023        0.00023
DISCUSSION
Identification of genomic loci affecting PUFA metabolism lends itself to functional
analysis of SNPs with a potential importance in the pathology of metabolic diseases and
understanding how those diseases could be prevented. Previous studies tested for
associations with genetic variants using linear regression. When considering genomic
data, a number of issues arise that are not properly dealt with in linear regression
models. SNPs within the same gene can be found in high linkage disequilibrium (LD), thus
resulting in the genomic data being highly collinear. Using linear regression to model
highly collinear relationships can result in unstable estimates of the individual effects
of each SNP (Wold, 1984), and predicted responses are therefore often poor. For the type
of data presented in this study, it is assumed that only a few SNPs are causing the
observed effects; this is otherwise known as the sparsity principle. This type of SNP
selection problem is effectively handled by Sparse Partial Least Squares (SPLS)
regression, created as an adaptation of PLS. SPLS performs simultaneous dimension
reduction and variable selection to identify the most relevant variables, and calculates
coefficients to predict the response.
In high-throughput biological research, accurate variable selection techniques are
critical in determining relevant information. In studies focusing on single nucleotide
polymorphisms, statistical complexities arise including high collinearity between the
variables, and sparsity of relevant SNPs. The present study examined 26 SNPs
mapping out 3 genes. All SNPs were tested for HWE in the whole population, as well as
in the Caucasian population, and the Asian population. The population with the most
SNPs in HWE was the Caucasian population. This was expected, as the SNPs for
genotyping were selected based on their presence within a Caucasian population using
HapMap.
Predictive ability of SPLS was tested by comparison to linear regression. Merino
et al performed linear regression on SNPs within FADS1 and FADS2. Ratios of fatty acids
approximating D6-Desaturase, D5-Desaturase, n-6 ADA, and n-3 ADA were the
response variables. 15 SNPs were tested in the Caucasian population (n=78), and 9
were found to be significant, with the strongest association between RS174547
(FADS1) and n-6 ADA (p=3.99×10^-8). All other SNPs were no longer significantly
associated when rs174547 was considered as a covariate. The same analysis was
repeated using SPLS. In the Caucasian population, 3 SNPs were selected as relevant:
RS174547 (FADS1), RS174576 (FADS2), and RS968567 (FADS2). The advantage of SPLS in
this context is that, through the selection of optimal parameters, cross validation will
provide a solution with the appropriate sparsity. Linear regression selected more variables
than SPLS; this suggests that many of these variables were included because of their high
collinearity with the causal loci.
In the Asian population (n=69), of the 15 SNPs studied, 8 were found to be
significantly associated with altered fatty acid levels. The strong correlation between
rs174547 (FADS1) and n-6 ADA was also identified in Asians, as well as a strong
association between rs498793 (FADS2) and n-3 ADA. All other associations became
insignificant when rs174547 was added as a covariate. The same analysis was
repeated using SPLS. In the East Asian population, 3 SNPs were selected as relevant,
RS174547 (FADS1), RS174576 (FADS2), and RS174611 (FADS2).
RS174547 and RS174576 were selected as significant in all models, and moreover
at different parameters within these models. This was the case in not only the
Caucasian population, but also the East Asian population. This is evidence that genetic
variation at one or both of these loci, or at a closely linked non-genotyped locus, may be
directly responsible for variation in plasma PUFAs by affecting desaturation enzyme
activity. RS174547 and RS174576 have been identified as significantly associated with
altered fatty acids in several other studies (Rzehak, J., 2009; Schaeffer, L., 2006;
Martinelli, N., 2008; Boker, S., 2010); however, these studies identified multiple SNPs all
in high LD with RS174547 and RS174576.
RS174547 (FADS1)
Presence of at least one minor allele (T) at this locus was associated with
decreases in D6D, D5D, n-6 ADA, and n-3 ADA. The T allele was associated with
increased precursor levels and decreased products. The T allele was also associated
with increased elongation activity. In model 1, this SNP was associated with D6D activity
and not D5D activity. As this SNP lies within the FADS gene cluster, alongside the gene
for Δ6 desaturase (FADS2), this may be close to its actual effect. Associations with D5D
activity in other models are likely due to the very high collinearity between this SNP and
RS174576. RS174547 is in high LD with RS174576 (r2 = 0.97).
RS174576 (FADS2)
Presence of at least one minor allele (A) is associated with decreased D5D
activity, D6D activity, n-3 ADA, and n-6 ADA. Minor allele presence was associated with
decreased GLA, AA, and EPA, increased ALA, and increased n-6 elongation activity. It
is unclear if this SNP has actual causal effects or if its associations are due to high
collinearity with the causal SNPs.
RS968567 (FADS2)
Significant associations existed between the presence of at least one minor (A) allele at
this locus and decreased D6D activity, D5D activity, n-3 and n-6 ADA, and
decreased levels of arachidonic acid (fa20.4n6). This SNP is found within the
promoter region of FADS2, in a predicted binding site for transcription factors such as
SREBP1 and PPARa (Lattka, E. 2010). Functional analysis by Lattka, E. et al showed,
using luciferase reporter gene assays, that two transcription factors including ELK1 bind
to the promoter region in a manner specific to the RS968567 allele. Promoter activity
increased with the minor T allele. It would be expected that the presence of a minor
allele here would be associated with increased D5D activity; however, this was not
observed (Lattka, E. 2010).
ELONGATION ACTIVITY
Both RS174576 and RS174547 showed positive associations with elongation activity;
however, this is most likely an artifact of highly correlated enzyme activity creating the
precursor and depleting the product of the elongation step. No ELOVL2 SNPs were
associated with elongation activity in any SPLS model; the only ELOVL2 SNP selected,
RS976081 in model 2(a), was associated with n-6 D5D and n-3 ADA rather than
elongation (Table 5). This suggests that genetic variation within this gene is
not altering elongation activity within this pathway. As the elongation step
is not the rate limiting step within this pathway, changes in elongase activity are unlikely
to significantly impact the lipid measurements.
CROSS VALIDATION
For each model, optimal parameters for SPLS regression were selected using
10-fold cross validation (CV). Each CV run may give different results because the
subgroups are randomly selected. A CV heat map can be plotted to visualize the mean
squared prediction error, and is a useful diagnostic tool for showing which
parameters provide the ideal sparsity and whether multiple near-optimal parameters exist. The
optimal parameters selected correspond to the global minimum of prediction error;
however, local minima may exist. Figure 3 in model 2 demonstrates this idea: the
global minimum for mean squared error is 0.157 and corresponds to eta = 0.54 and K = 3,
but a local minimum of 0.158 exists at eta = 0.96 and K = 3. This shows that a higher
sparsity can be imposed with only a very small increase in mean squared prediction
error. Using these new parameters reduced the number of selected SNPs from 10
to 2.
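The parameter choice described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the analysis code used in this project (which was run in R): the function name `sparsest_within_tolerance`, the tolerance value, and the third grid entry are hypothetical; only the two (eta, K, MSE) triples quoted from Figure 3 come from the text.

```python
def sparsest_within_tolerance(cv_mse, tolerance):
    """Return the (eta, K) pair with the highest eta (then the lowest K)
    whose CV error lies within `tolerance` of the global minimum.

    cv_mse maps (eta, K) -> mean squared prediction error.
    """
    best = min(cv_mse.values())
    candidates = [params for params, mse in cv_mse.items()
                  if mse - best <= tolerance]
    # A higher eta and fewer latent components both give a sparser solution.
    return max(candidates, key=lambda p: (p[0], -p[1]))


# The first two entries are the values reported for Model 2 (Figure 3);
# the third is a made-up grid point included only for illustration.
cv_grid = {
    (0.54, 3): 0.157,   # global minimum
    (0.96, 3): 0.158,   # local minimum, much sparser
    (0.98, 1): 0.190,   # hypothetical: too sparse, error rises
}

chosen = sparsest_within_tolerance(cv_grid, tolerance=0.002)
print(chosen)
```

Setting the tolerance to zero recovers the usual rule of taking the global minimum, so the tolerance makes the trade-off between sparsity and prediction error explicit.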
Outcomes of SPLS can be modified by manually selecting the parameters used.
Since eta, the thresholding parameter, determines the sparsity of the final solution, the
variable selection feature of the SPLS function can even be switched off by setting eta=0.
This was done with the fatty acid response matrix to elucidate the multicollinearity of
many SNPs. When coefficients are assigned to every variable without variable
selection, the predicted fatty acid levels are largely the same as when the sparsity
principle is applied. This demonstrates that many of the SNPs selected under low
sparsity conditions are not contributing significantly to the fatty acid responses.
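How eta controls sparsity can be illustrated with the soft-thresholding rule described in the Methods, in which an entry of a direction vector is retained only if its magnitude exceeds eta times the largest magnitude. The sketch below is a toy under stated assumptions: the function name and the weights are hypothetical, and it is not the code of the R package. It does show why eta = 0 leaves every predictor in the model.

```python
def soft_threshold(weights, eta):
    """Shrink each weight toward zero by eta * max|weight|;
    weights whose magnitude falls below that cut-off become 0."""
    cutoff = eta * max(abs(w) for w in weights)
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - cutoff, 0.0)
        shrunk.append(magnitude if w >= 0 else -magnitude)
    return shrunk


# Hypothetical direction-vector weights for five SNPs (not study data).
weights = [0.8, -0.75, 0.1, -0.05, 0.02]

for eta in (0.0, 0.54, 0.96):
    kept = sum(1 for w in soft_threshold(weights, eta) if w != 0.0)
    print(f"eta={eta}: {kept} of {len(weights)} weights retained")
```

With eta = 0 the cut-off is zero and no weight is removed, which corresponds to ordinary PLS with coefficients assigned to every variable.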
An important caveat of this approach is that the tag SNPs used to
map the genes may not include the causal variant. The tag SNPs which are selected
are, of those genotyped, the most likely to be causal, yet may only
be indicative of the region where the true causal SNP exists. Often more than one SNP
is selected as significant, and this method alone does not distinguish whether both are
causal, whether one is causal and the other is in very high LD with it, or whether the two
SNPs are in equal LD with the causal variant. Molecular biology approaches should be used to further
investigate functional changes associated with each selected polymorphism.
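One simple way to see why two selected tag SNPs cannot be separated statistically is to compute the squared correlation between their genotype columns. The following sketch uses the 0/1 carrier coding described in the Methods with invented genotypes; the function name and the data are hypothetical. Note also that this is the correlation of carrier indicators, which is only a rough stand-in for a haplotype-based LD measure.

```python
from math import sqrt


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return (cov / sqrt(var_x * var_y)) ** 2


# Invented carrier indicators (1 = at least one minor allele) for ten
# individuals; these are NOT genotypes from the TNH study.
snp_a = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
snp_b = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]   # differs from snp_a in one person
snp_c = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0]   # largely unrelated to snp_a

print(round(squared_correlation(snp_a, snp_b), 3))
print(round(squared_correlation(snp_a, snp_c), 3))
```

A value close to 1 means the two columns carry nearly the same information, so a regression model cannot tell which of the two SNPs drives the response.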
CONCLUSION
SPLS regression is an effective statistical approach for determining the most
relevant SNPs affecting a multivariate response. SPLS handles multicollinearity through
implementing an L1 penalty, and can impose a reasonable amount of sparsity on the
solution through selecting optimal parameters by cross validation. Selection of optimal
parameters should not be left solely to the included cross validation function, as many
parameter combinations give useful results with negligible increases in MSE. The SNPs
RS174547 and RS174576 were consistently selected as relevant variables;
therefore one or both of them are likely either causal SNPs or very closely linked
to the causal loci.
Appendix i - Confidence Intervals for desaturation / elongation activity
Training Caucasian population (eta=0.96, K=2)
Confidence intervals of effect of selected SNPs on D6D activity
Variable    2.5%     97.5%
RS174547    -0.082   -0.036
RS174576    -0.082   -0.034
sex         -0.125   -0.032
Confidence intervals of effect of selected SNPs on elongation
Variable    2.5%     97.5%
RS174547     0.042    0.085
RS174576     0.039    0.081
sex          0.114    0.186
Confidence intervals of effect of selected SNPs on D5D
Variable    2.5%     97.5%
RS174547    -0.074   -0.040
RS174576    -0.073   -0.038
sex         -0.117   -0.052
Confidence intervals of effect of selected SNPs on n-6 ADA
Variable    2.5%     97.5%
RS174547    -0.067   -0.040
RS174576    -0.066   -0.040
sex         -0.040    0.014
Confidence intervals of effect of selected SNPs on n-3 ADA
Variable    2.5%     97.5%
RS174547    -0.111   -0.053
RS174576    -0.110   -0.053
sex         -0.116   -0.007
Appendix ii - Confidence intervals for effect of SNPs on PUFA levels.
Confidence intervals of effect of selected SNPs on Linoleic acid (18:2 n-6)
Variable    2.5%     97.5%
RS174547     0.018    0.427
RS174576    -0.029    0.383
sex          0.062    0.915
bmi         -1.216   -0.302
Confidence intervals of effect of selected SNPs on γ-Linolenic acid (18:3 n-6)
Variable    2.5%     97.5%
RS174547    -0.025   -0.009
RS174576    -0.025   -0.009
sex         -0.010    0.009
bmi         -0.006    0.018
Confidence intervals of effect of selected SNPs on Dihomo γ-Linolenic acid (20:3 n-6)
Variable    2.5%     97.5%
RS174547    -0.007    0.033
RS174576    -0.004    0.037
sex         -0.051    0.004
bmi         -0.005    0.081
Confidence intervals of effect of selected SNPs on Arachidonic acid (20:4 n-6)
Variable    2.5%     97.5%
RS174547    -0.332   -0.214
RS174576    -0.328   -0.210
sex         -0.103    0.102
bmi         -0.004    0.236
Confidence intervals of effect of selected SNPs on Alpha-linolenic acid (18:3 n-3)
Variable    2.5%     97.5%
RS174547     0.006    0.035
RS174576     0.005    0.035
sex         -0.012    0.024
bmi         -0.037    0.012
Confidence intervals of effect of selected SNPs on Eicosapentaenoic acid (20:5 n-3)
Variable    2.5%     97.5%
RS174547    -0.064   -0.019
RS174576    -0.066   -0.019
sex         -0.007    0.065
bmi         -0.043    0.023
Appendix iii - Genomic location of SNPs in mapping the FADS gene cluster (from NCBI)
Literature Cited
Goodhart RS, Shils ME (1980). Modern Nutrition in Health and
Disease (6th ed.). Philadelphia: Lea and Febiger. pp. 134–138. ISBN 0-8121-0645-8.
Risé P, Marangoni F, Galli C. Regulation of PUFA metabolism: pharmacological and
toxicological aspects. Prostaglandins Leukot Essent Fatty Acids 2002 Aug-Sep;
67(2-3):85–9.
Simopoulos AP. Essential fatty acids in health and chronic disease. Am J Clin
Nutr 1999;70(suppl):560S–9S.
Calder PC. n−3 Polyunsaturated fatty acids, inflammation, and inflammatory diseases.
2006, American Society for Clinical Nutrition.
Caughey GE, Mantzioris E, Gibson RA, Cleland LG, James MJ. The effect on human
tumor necrosis factor α and interleukin 1β production of diets enriched in n−3 fatty acids
from vegetable oil or fish oil. Am J Clin Nutr 1996;63:116–22.
Goldman DW, Pickett WC, Goetzl EJ. Human neutrophil chemotactic and degranulating
activities of leukotriene B5 (LTB5) derived from eicosapentaenoic acid. Biochem Biophys
Res Commun 1983;117:282–8.
Serhan CN, Clish CB, Brannon J, Colgan SP, Gronert K, Chiang N. Anti-inflammatory
lipid signals generated from dietary n−3 fatty acids via cyclooxygenase-2 and
transcellular processing: a novel mechanism for NSAID and n−3 PUFA therapeutic
actions. J Physiol Pharmacol 2000;4:643–54.
Clarke SD, Jump DB. Dietary polyunsaturated fatty acid regulation of gene transcription.
Annu Rev Nutr 1994;14:83–98.
Raatz, K., Bibus, D., Thomas, W., Kris-Etherton, P. Total Fat Intake Modifies Plasma
Fatty Acid Composition in Humans
N. Martinelli, D. Girelli, G. Malerba, P. Guarini, T. Illig, E. Trabetti, M. Sandri, S. Friso, F. Pizzolo,
L. Schaeffer, J. Heinrich, P.F. Pignatti, R. Corrocher, O. Olivieri, FADS genotypes and
desaturase activity estimated by the ratio of arachidonic acid to linoleic acid are
associated with inflammation and coronary artery disease, Am. J. Clin. Nutr. 88 (2008)
941–949.
H. Truong, J.R. DiBello, E. Ruiz-Narvaez, P. Kraft, H. Campos, A. Baylin, Does genetic
variation in the Δ6-desaturase promoter modify the association between α-linolenic
acid and the prevalence of metabolic syndrome? Am. J. Clin. Nutr. 89 (2009)
920–925.
A. Baylin, E. Ruiz-Narvaez, P. Kraft, H. Campos, α-Linolenic acid, Δ6-desaturase
gene polymorphism, and the risk of nonfatal myocardial infarction, Am. J.
Clin. Nutr. 85 (2007) 554–560.
Y. Lu, E.J. Feskens, M.E. Dolle, S. Imholz, W.M. Verschuren, M. Muller, J.M. Boer,
Dietary n−3 and n−6 polyunsaturated fatty acid intake interacts with FADS1 genetic
variation to affect total and HDL-cholesterol concentrations in the Doetinchem Cohort
Study, Am. J. Clin. Nutr. 92 (2010) 258–265.
R Core Team (2013). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
URL http://www.R-project.org/
Lattka, E. et al., A common FADS2 promoter polymorphism increases promoter activity and
facilitates binding of transcription factor ELK1, J. Lipid Res. 51 (2010) 182–191.
Wold, S., The collinearity problem in linear regression. The PLS approach to
generalized inverses, SIAM J. Sci. Stat. Comput., Vol. 5, No. 3, September 1984.
P. Rzehak, J. Heinrich, N. Klopp, L. Schaeffer, S. Hoff, G. Wolfram, T. Illig, J. Linseisen, Evidence
for an association between genetic variants of the fatty acid desaturase 1 fatty acid
desaturase 2 (FADS1 FADS2) gene cluster and the fatty acid composition of erythrocyte
membranes, Br. J. Nutr. 101 (2009) 20–26.
L. Schaeffer, H. Gohlke, M. Muller, I. Heid, L. Palmer, I. Kompauer, H. Demmelmair, T. Illig, B.
Koletzko, J. Heinrich, Common genetic variants of the FADS1 FADS2 gene cluster and
their reconstructed haplotypes are associated with the fatty acid composition in
phospholipids, Hum. Mol. Genet. 15 (2006) 1745–1756.
S. Bokor, J. Dumont, A. Spinneker, M. Gonzalez-Gross, E. Nova, K. Widhalm, G.
Moschonis, P. Stehle, P. Amouyel, S. De Henauw, D. Molnar, L.A. Moreno, A.
Meirhaeghe, J. Dallongeville, Single nucleotide polymorphisms in the FADS gene
cluster are associated with delta-5 and delta-6 desaturase activities estimated by serum
fatty acid ratios, J. Lipid Res. 51 (2010) 2325–2333.
  • 1. Using SPLS Regression to identify Genomic Loci affecting Essential Polyunsaturated Fatty Acid Desaturation and Elongation Daniel Crawford (0590539) BINF 6999 - University of Guelph Advisors: Mutch, D., Dang, S. Submitted: 8 Aug 2013 Final Weighting: 80% Analysis 20% Laboratory
  • 2. Table of Contents Introduction 3 Methods 5 Results 8 Model 1 10 Model 2 (a) 11 (b) 15 Model 3 (a) 16 (b) 19 Discussion 20 Conclusions 25 Appendix i - Confidence Intervals for desaturation / elongation activity 26 Appendix ii - Confidence intervals for effect of SNPs on PUFA levels. 27 Appendix iii - Genomic location of SNPs in mapping the FADS gene cluster 28 References 29 2
  • 3. INTRODUCTION Investigations into the human plasma lipidome will provide useful information regarding many areas of human health. The poly-unsaturated fatty acids (PUFAs) Alpha-linolenic acid (ALA; 18:3 n-3), and Linoleic acid (LA; 18:2 n-6) are essential fatty acids for humans (Goodhart, R.,1980). These FAs are precursors to longer chain PUFAs such as Arachidonic Acid (AA; 20:4 n-6), Eicosapentaenoic Acid (EPA; 20:5 n-3), and Docosapentaenoic Acid (DPA; 22:5n-3). Levels of these fatty acids (FAs) in cellular membranes and plasma are related to dietary intake of either the preformed products or precursor FAs (Marangoni, R., 2002). Relative amounts of n-6 and n-3 FAs in the diet have an important impact on an individual's overall health. Typically, a diet rich in n-6 PUFAs will shift the physiological state towards being prothrombotic and proaggregatory (Simopoulos, A.,1999). High levels of long chain n-3 PUFAs decrease the production of inflammatory eicosanoids and cytokines (Caulder, P., 2006). An increased proportion of EPA and DHA in inflammatory cell phospholipids directly decreases the availability of AA to be used as a substrate for synthesis of pro-inflammatory eicosanoids (Caughey, GE., 1996). EPA can also be used as a substrate for eicosanoid production, resulting in less pro-inflammatory molecules (Goldman, DW., 1983) or even anti-inflammatory molecules (Serhan, CN., 2000). Long chain n-3 PUFAs can also affect gene transcription, including transcription of lipogenic enzymes (Clarke, SD., 1994). While the long-chain n-3s exhibit potent anti- inflammatory effects, the precursor ALA does not, in itself, contribute anti-inflammatory effects to the same degree (Simopoulos, A.,1999) (Calder, P., 2006); therefore investigations into desaturation and elongation of the essential PUFAs is warranted. 
The fatty acid desaturation pathway The fatty acid desaturation pathway (Figure 1) consists of a delta-6 desaturation (D6D), an elongation step, and a delta-5 desaturation (D5D) step. The enzymes responsible for these steps are Δ6 desaturase, fatty acid elongase, and Δ5 desaturase. n-3 and n-6 PUFAs share these enzymes and are competitive substrates. Δ6 and Δ5 desaturases are rate-limiting enzymes which catalyze double bond formation at the Δ6 3
  • 4. and Δ5 position of essential PUFAs, and are coded by the genes FADS1 and FADS2 respectively. The two FADS genes are found in a head to head configuration on chromosome 11 (61.57 – 61.63 Mb). Recent gene association studies have identified links between genetic variants in the FADS cluster of genes, plasma fatty acids, and development of diseases such as metabolic syndrome (Truong, H. et al, 2009), coronary artery disease (Martinelli, N. et al, 2008), myocardial infarction (Baylin, A. et al, 2007), and dyslipidemia (Lu, Y, et al, 2010). Genetic variation in the form of single nucleotide polymorphisms (SNPs) within the FADS cluster resulting in altered desaturation activity can be identified using regression. The goal of this independent research project was to identify the causal genetic variants altering the efficiency of the fatty acid desaturase pathway, and construct a haplotype to differentiate high converters and low converters of n-3 and n-6 PUFAs. A high converter is an individual with high aggregate desaturase activity (ADA), meaning the overall activity of the desaturation pathway is increased. Figure 1- Fatty acid desaturase pathway. The FADS2 (fatty acid desaturase 2) gene encodes the protein Δ-6 desaturase, catalyzing a double bond formation at the Δ-6 carbon. Fatty acid elongase 2, encoded by ELOVL2, adds 2 carbons. (FADS1) Fatty acid desaturase 1 encodes Δ-6 desaturase, the enzyme which catalyzes a double bond formation at the Δ-5 carbon. 4
  • 5. METHODS Toronto Nutrigenomics and Health (TNH) Study The Toronto Nutrigenomics and Health (TNH) Study examined over 2000 young adults from the University of Toronto. Participants of the study completed a health and lifestyle questionnaire, which included information about smoking habits, physical activity, medical history, caffeine habits, as well as ethnic background. The two largest ethnocultural groups were Caucasian and East Asian. Anthropometric measurements such as sex, waist circumference and Body Mass Index (BMI) were collected. The study collected genomic and lipidomic data from a subset of the population. Sparse Partial Least Squares (SPLS) Regression As modern clinical studies such as the TNHS are able to produce copious amounts of genomic and metabolomic data, investigations into statistical methodologies and bioinformatics are critical in order to make the most efficient extraction of biologically relevant information. Partial Least Square (PLS) was introduced by Wold in 1966. PLS performs a basic latent decomposition of the predictor matrix and the response matrix, to find a small number direction vectors which represent linear combinations of the predictor variables. In 2010, Chun, H., and Keles, S., introduced Sparse PLS which incorporates sparsity directly into the dimension reduction step of PLS. This results in sparse linear combinations of the original predictors, and in the case of this analysis, the selection of the SNPs most likely to be biologically relevant. Sparsity is imposed by implementing an L1 penalty; SPLS uses two tuning parameters, eta and K. Eta (η) is the thresholding parameter and must be between 0 and 1. SPLS uses a form of soft thresholding in which components are retained if they are greater then some fraction, determined by eta, of the maximum component. Eta directly determines the sparsity of the final solution, as a high eta will result in fewer selected variables. 
K is the number of latent components, and must be less then the rank of the predictor matrix. As K decreases, the solution becomes more sparse. Selection of the optimal parameters to use in a SPLS calculation is done using a 10-fold cross validation. This 5
  • 6. algorithm randomly partitions the variables into 10 subgroups, and uses each subgroup once as the validation group while the other 9 are used as the training set. Optimal parameters can then be estimated as a combination of the results from each analysis. An advantage of 10-fold cross validation is that each observation is used one and only once as part of the training set. An inherent consequence of this method is that the 10 subgroups are randomly assigned, and therefore subsequent cross-validations may result in different estimations of optimal parameters. Optimal parameters are those determined to have the lowest mean square error (MSE). Performing the SPLS analysis using alternate parameters may affect the final number of SNPs selected to be causing a significant effect, however SNPs that are more significant are less likely be affected by small changes in eta and K. The SPLS algorithm will automatically set the regression coefficient of non-causal SNPs to 0. While previous studies found associations with many SNPs within the FADS gene cluster, using SPLS regression will select a small number of important variables. This method will identify the SNP(s) within FADS1/2 an ELOVL2 that are most likely responsible for the observed variation in FA levels in plasma. Data was collected as part of the Toronto Nutrigenomics and Health Survey. Phenotype data for this analysis were GC measurements for the following fatty acids: Linoleic acid LA 18:2 n-6 γ-Linoleic acid GLA 18:3 n-6 Dihomo γ-Linoleic acid DGLA 20:3 n-6 Arachidonic Acid AA 20:4 n-6 Alpha-linolenic acid ALA 18:3 n-3 Eicosapentaenoic Acid EPA 20:5 n-3 6
  • 7. Enzyme activity was approximated by the following FA ratios: n-6 Δ6 Desaturase GLA / LA n-6 Elongase DGLA / GLA n-6 Δ5 Desaturase AA / DGLA n-6 Aggregate Desaturase Activity (n-3 ADA) AA / LA n-3 Aggregate Desaturase Activity (n-6 ADA) ALA / EPA Genotype data for 26 SNPs mapped out 3 genes: FADS1, FADS2, and ELOVL2 (see Appendix iii). The genotype data was formatted in a matrix such that individuals were given a ‘0’ for SNPs which were homozygous for the major allele, and ‘1’ for SNPs that were either heterozygous, or homozygous for the minor allele. 37 individuals were missing data for one or more SNP, and were removed from the analysis. Each SNP was tested for (HWE) in the whole population, the caucasian population, and the asian population, to identify ethnic dependent polymorphisms. Data analysis was performed in R (R core team, 2013). The package “SPLS” (Chum, H and Keles, S, 2010) was used for regression analysis. Optimal parameters for regression were first determined using cross-validation (CV). The SPLS function was then used for variable selection, where an SPLS object was created from the predictor and response matrix data. Confidence intervals were calculated for the effect of each SNP on multivariate responses. For model 1, predictor matrices were created for both the Caucasian and Asian populations, containing genotype data for only the SNPs used in the Merino et al. analysis. 15 SNPs were tested in Caucasians: 3 SNPs in FADS1 (rs174547, rs412334, and rs695867) and 12 SNPs in FADS2 (rs174576, rs174579, rs174593, rs174602, rs174611, rs174627, rs17831757, rs2072114, rs2845573, rs482548, rs498793, and rs968567). In Asians, 15 SNPs were tested: 3 in FADS1 (rs174547, rs412334, and rs695867) and 12 SNPs in FADS2 (rs174570, rs174576, rs174579, rs174593, rs174602, rs174611, rs174627, rs17831757, rs2072114, rs2845573, rs498793, and rs526126). 7
  • 8. For models 2 and 3, a training data set was created for the Caucasian population, composed of randomly selected individuals of Caucasian ethnicity. The predictor matrix for these models included the SNPs in HWE within the Caucasian population. 19 SNPs from the FADS cluster were tested for HWE (RS174547, RS174570, RS174576, RS174579, RS174593, RS174602, RS174626, RS174627, RS17831757, RS2072114, RS412334, RS482548, RS498793, RS526126, RS695867, RS968567, RS174611, RS2845573, RS2851682). 7 SNPs from ELOVL2 were tested for HWE (RS12195587, RS13204015, RS3798719, RS8523, RS976081, RS3798720, RS911196). Age, BMI, and sex were included as confounding variables to account for the effect of these factors on plasma lipid levels. The SPLS object, containing the set of regression coefficients, was then used to create a predicted fatty acid profile. The Pearson product-moment correlation between the predicted response and the actual response is presented as r2 values.
RESULTS
Data from 1059 individuals (309 males and 750 females) aged 20-29 years were used for this study. Participants were of normal BMI (22.3 +/- 3.5) and were non-smokers. The population was ethnoculturally diverse; the ethnic group with the largest representation was Caucasian (n=450), and the second largest was East Asian (n=384).
Gas chromatography results showed that the PUFA with the highest average level in plasma was Linoleic Acid, and the second highest was Arachidonic Acid (Figure 2). The highest ratio was AA/DGLA, representing n-6 D5D, in Caucasians (Table 1) as well as in Asians (Table 2); the second highest ratio was DGLA/GLA, representing n-6 elongase.
  • 9. Figure 2 - Distribution of levels of each PUFA as measured by GC.

Table 1 - Desaturation pathway activity - Average of Caucasian population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.011      4.662        4.832      0.203      1.596

Table 2 - Desaturation pathway activity - Average of East Asian population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.011      4.779        5.291      0.204      1.597

The 26 SNPs mapping the FADS1, FADS2, and ELOVL2 genes were tested for HWE.
In the Caucasian population (n=451), 25 SNPs were in HWE: RS12195587, RS174547, RS174570, RS174576, RS174579, RS174593, RS174602, RS174626, RS174627, RS17831757, RS2072114, RS3798719, RS412334, RS482548, RS498793, RS526126, RS695867, RS8523, RS968567, RS976081, RS174611, RS2845573, RS2851682, RS3798720, RS911196. RS13204015 was not in HWE.
In the East Asian population (n=326), the following 19 SNPs were in HWE: RS12195587, RS13204015, RS174576, RS174579, RS174593, RS174602, RS174626, RS174627, RS17831757, RS2072114, RS3798719, RS412334, RS482548, RS526126, RS695867, RS968567, RS976081, RS174611, RS911196.
  • 10. In the general population (n=1059), the following 9 SNPs were in HWE: RS174579, RS174593, RS174626, RS17831757, RS3798719, RS482548, RS695867, RS968567, RS976081.
MODEL 1 - Replication of Merino et al analysis
The goal of this model was to compare the results of SPLS regression to those of linear regression. SPLS regression is expected to select the variables most likely to have an actual effect on desaturation activity.
Merino et al used linear regression to test 15 SNPs within a Caucasian group (n=78) for significant associations with D5D, D6D, n-6 ADA and n-3 ADA. Significant associations were found for 9 SNPs. Testing the same 15 SNPs using SPLS (eta=0.8 and K=2) in a larger Caucasian population (n=450) identified 3 SNPs (RS174547, RS174576, RS968567) with significant associations to desaturase activity.
SPLS regression (Eta = 0.88, K=3) results show that in the Caucasian population RS174547 is significantly associated with D6D and n-6 ADA, RS174576 is significantly associated with n-3 ADA, and RS968567 is significantly associated with n-6 D6D activity (Table 3).

Table 3 - Regression coefficients for the presence of at least 1 minor allele in SNPs with significant associations to desaturation activity (Caucasian population). Negative coefficients indicate decreased desaturation activity.
SNP         n-6 D6D    n-6 D5D    n-6 ADA    n-3 ADA
RS174547    -0.299     0          -0.003     0
RS174576    0          0          0          -0.130
RS968567    -0.227     0          0          0
  • 11. Linear regression by Merino et al resulted in 9 SNPs having significant associations. SPLS (Eta = 0.88, K=2) results show that in the Asian population RS174547 is significantly associated with D5D, D6D, n-6 ADA, and n-3 ADA. RS174576 is significantly associated with D5D, D6D, n-6 ADA, and n-3 ADA. RS174611 is significantly associated with n-6 D5D activity.

Table 4 - Regression coefficients for the presence of at least 1 ‘C’ allele in SNPs with significant associations to desaturation activity (Asian population). In the Asian population, the ‘C’ allele is the major allele in the SNP RS174547. Negative coefficients indicate decreased desaturation activity.
SNP         n-6 D6D    n-6 D5D    n-6 ADA    n-3 ADA
RS174547    -0.250     -0.001     -0.013     -0.041
RS174576    -0.199     -0.002     -0.014     -0.045
RS174611    -0.314     0          0          0

MODEL 2(a) - Multivariate Analysis of Desaturase and Elongase activity (Caucasian population)
The goal of this model was to select SNPs which affect all or a subset of the desaturation/elongation activities, including ADA. It is hypothesized that cross validation will select optimal parameters that result in a solution with the appropriate sparsity, and that SPLS will select the SNPs best associated with the desaturase activity responses while excluding SNPs that are highly collinear but do not have any direct effect on desaturation activity.
A sample of Caucasians (n=335) was randomly selected to be used as the training population. Genotype data for 25 SNPs mapping ELOVL2 and FADS1/2, as well as age, gender, and BMI, comprised the predictor matrix. Fatty acid ratios were used to estimate desaturation/elongation activity and aggregate n-6 and n-3 desaturation activity. The FA ratios were log-transformed because the transformed values were more normally distributed. 10-fold cross validation determined that optimal parameters were eta = 0.64
  • 12. and K = 3, with a mean square prediction error of 0.157. The cross-validation results indicate that the most likely solution has low sparsity.

Figure 3 - CV Heatmap of Mean Square Prediction Errors for range of possible SPLS parameters.

SPLS regression selected 10 SNPs that were significantly associated with at least one step in the fatty acid desaturation pathway. 6 SNPs were associated with D6D activity, 4 SNPs with elongation, 7 SNPs with n-6 D5D, 9 SNPs with n-6 ADA, and 6 SNPs with n-3 ADA. Table 5 shows the regression coefficients for the SNPs with significant associations; values whose CI contains 0 are automatically set to 0.
  • 13. Table 5 - Regression coefficients for selected variables and desaturation/elongation activity (eta = 0.64, K=3). Negative coefficients indicate that the presence of at least one minor allele results in decreased desaturation activity.
             n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
RS174547     -0.017     0            -0.022     -0.023     -0.033
RS174570     -0.034     0.027        0          -0.018     0
RS174576     -0.016     0            -0.022     -0.023     -0.034
RS174579     0          0            -0.024     -0.013     -0.033
RS174593     0          0            -0.023     -0.013     0
RS2072114    -0.045     0.048        -0.023     -0.020     -0.023
RS968567     0          0            -0.029     -0.012     -0.039
RS976081     0          0            0.036      0          0.034
RS2845573    -0.036     0.028        0          -0.018     0
RS2851682    -0.032     0.025        0          -0.018     0
sex          -0.079     0.125        -0.058     0          0

The SPLS-derived regression coefficients were used to predict the desaturation activity response in an independent population of Caucasian individuals. The Caucasian individuals not selected as part of the training population were used as the testing population (n=115). As SPLS creates linear combinations of the original variables, the predicted response has a linear relationship to the actual response; therefore Pearson's correlation coefficient was used to evaluate how well the predicted data matched the actual data. This regression model, with direction vectors containing 10 SNPs, was best able to predict n-6 ADA (r2 = 0.24) in the test population.

Table 6 - r2 value for predicted response in Caucasian testing population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.085      0.184        0.188      0.239      0.080
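The evaluation step above, correlating predicted with actual responses in the test population, can be sketched in Python. The report does not give the exact computation behind the r2 values, so squaring the Pearson coefficient is an assumption; the function name and the five data points are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical actual and predicted responses for five test individuals.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(predicted, actual)
r_squared = r ** 2  # the quantity reported as r2, under the assumption above
```

Because the model is fitted on the training population only, an r2 computed on the held-out individuals is not inflated by overfitting, whereas an r2 computed on a population that includes the training individuals can be.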
  • 14. The model was then tested on the general population (n=1059), including the training population (Table 7), and on the second largest ethnicity, the East Asian population (Table 8):

Table 7 - r2 value for whole population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.252      0.226        0.089      0.249      0.064

Table 8 - r2 value for East Asian population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.221      0.229        0.053      0.223      0.050

Figure 4 - Predicted desaturation and elongation activity in Caucasian Testing population compared to actual values. SPLS regression coefficients from training Caucasian population were applied to a test set of Caucasians.
  • 15. MODEL 2(b)
This model uses the same predictor and response data as model 2(a). The cross validation heatmap for model 2(a) (Figure 3) also indicated that a high sparsity solution could be applied with a small increase (0.001) in MSE. As the parameters for this model (eta = 0.96, K=2) impose higher sparsity, it is expected that the number of selected variables will be reduced to a smaller number of SNPs.
SPLS regression using these parameters reduced the number of significant variables to 3. Two SNPs, RS174547 and RS174576, were selected. Sex was also found to contribute to desaturase activity.

Table 9 - Regression coefficients for selected variables and desaturation/elongation activity (eta = 0.96, K=2). Negative coefficients indicate that the presence of at least one minor allele results in decreased desaturation activity.
            n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
RS174547    -0.059     0.063        -0.057     -0.053     -0.081
RS174576    -0.058     0.061        -0.055     -0.053     -0.080
sex         -0.078     0.149        -0.084     0          -0.064

Table 10 - r2 value for Caucasian test population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.051      0.103        0.202      0.200      0.032

Table 11 - r2 value for East Asian population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.224      0.242        0.058      0.210      0.047

Table 12 - r2 value for whole population
n-6 D6D    n-6 Elong    n-6 D5D    n-6 ADA    n-3 ADA
0.229      0.238        0.073      0.231      0.058
  • 16. Figure 5 - Predicted desaturation and elongation activity in Caucasian Testing population, from SPLS regression using training Caucasian population.

MODEL 3 (a)
An SPLS regression analysis was done to identify associations between SNPs in the FADS gene cluster and the ELOVL2 gene, and levels of the plasma FAs 18.2n6, 18.3n6, 20.3n6, 20.4n6, 18.3n3, and 20.5n3. Fatty acid levels were generally normally distributed.
The training population consisted of 338 randomly selected Caucasian individuals. Genotype data included the 25 SNPs mapping the genes FADS1, FADS2, and ELOVL2 that were in HWE in the Caucasian population. The confounding variables age, BMI, and sex were included. The response matrix was composed of the 6 individual plasma fatty acid levels as measured by gas chromatography (GC).
Cross validation determined that the optimal parameters were eta = 0.92 and K = 2, with MSE = 2.6128 (Figure 6).
  • 17. Figure 6 - CV Heatmap of Mean Square Prediction Errors for range of possible SPLS parameters

SPLS regression was performed and 2 SNPs were selected as having significant effects on the 6 FAs. Only one SNP (RS174547) had a significant effect on 18.2n6 (LA). None of the SNPs significantly explained variation in 20.3n6 (DGLA), and the remainder of the FAs were affected by 2 SNPs (RS174547 and RS174576) by approximately the same amount. Age and BMI had significant effects on the amount of 18.2n6 (LA).

Table 13 - Regression coefficients for selected variables and PUFA levels (eta = 0.92, K=2). Negative coefficients indicate that the presence of at least one minor allele results in a decreased level of that fatty acid.
            18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
RS174547    0.219          -0.017          0                -0.270         0.018           -0.040
RS174576    0              -0.017          0                -0.268         0.018           -0.042
age         0.478          0               0                0              0               0
bmi         -0.774         0               0                0              0               0
  • 18. Figure 7 - Predicted desaturation and elongation activity in Caucasian Testing population compared to actual fatty acid levels, from SPLS regression using training Caucasian population.

When the model was tested on the test set of Caucasians (n=113), the best correlated prediction was for fa20.4n6 (AA).

Table 14 - r2 value for predicted response in Caucasian testing population
18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
0.08679        0.05626         0.01965          0.21063        0.019205        0.00013

When this model was tested on all individuals (n=1059), the predicted fa20.4n6 (AA) values once again were the best correlated to the actual values.

Table 15 - r2 value for whole population
18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
0.04656        0.15318         0.01167          0.24159        0.01100         0.01226
  • 19. Additionally, this model was tested on the East Asian population (n=384). The prediction of GLA levels for the Asian population was better than in the general population. The best predicted FA in the East Asian population was AA.

Table 16 - r2 value for East Asian population
18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
0.03744        0.21149         0.01749          0.21721        0.00822         0.00312

Model 3 (b)
To explore the effects of variable selection, the regression algorithm was run again with the eta parameter set to 0. This model therefore does not impose sparsity, so each variable is assigned a coefficient. While each SNP is assigned a coefficient, not all are significant. The following table shows the association of the 14 SNPs with each FA.

Table 17 - Regression coefficients for variables and PUFA levels (eta = 0, K=2). Negative coefficients indicate that the presence of at least one minor allele results in a decreased level of that fatty acid.
              18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
RS12195587    -0.3885458     0               0.036            0              0               0
RS174547      0              -0.005          0                -0.086         0.006           -0.013
RS174570      0              -0.004          0                -0.061         0               0
RS174576      0              -0.005          0                -0.083         0.006           -0.014
RS174579      0              0               0                -0.051         0               -0.010
RS174593      0              -0.004          0                -0.063         0.005           0
RS174602      0              0               0                -0.031         0               0
RS174626      0              0               0                -0.050         0.004           0
RS174627      0              0               0                -0.049         0               0
RS2072114     0              -0.004          0                -0.057         0               0
  • 20.
             18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
RS968567     0              0               0                -0.038         0               -0.011
RS174611     0              -0.004          0                -0.066         0.005           0
RS2845573    0              -0.004          0                -0.053         0               0
RS2851682    0              -0.003          0                -0.046         0               0
BMI          -0.7001214     0               0.062            0              0               0

These coefficients predicted the fatty acid levels approximately as well as when the variable selection step is included.

Table 18 - Correlation with actual values of Caucasian test population
               18.2n6 (LA)    18.3n6 (GLA)    20.3n6 (DGLA)    20.4n6 (AA)    18.3n3 (ALA)    20.5n3 (EPA)
Sparsity       0.02742        0.06301         0.03332          0.23542        0.01023         0.00023
No Sparsity    0.08679        0.05626         0.01965          0.21063        0.019205        0.00013

DISCUSSION
Identification of genomic loci affecting PUFA metabolism lends itself to functional analysis of SNPs with potential importance in the pathology of metabolic diseases, and to understanding how those diseases could be prevented. Previous studies tested for associations with genetic variants using linear regression. When considering genomic data, a number of issues arise that are not properly dealt with in linear regression models. SNPs within the same gene can be found in high linkage disequilibrium (LD), which makes the genomic data highly collinear. Using linear regression to model highly collinear relationships can result in unstable coefficients (Wold, 1984) for the individual effect of each SNP, and predicted responses are therefore often poor. For the type of data presented in this study, it is assumed that only a few SNPs are causing the
  • 21. observed effects; this is otherwise known as the sparsity principle. This type of SNP selection problem is effectively handled by Sparse Partial Least Squares (SPLS) regression, created as an adaptation of PLS. SPLS performs simultaneous dimension reduction and variable selection to identify the most relevant variables, and calculates coefficients to predict the response.
In high-throughput biological research, accurate variable selection techniques are critical in determining relevant information. In studies focusing on single nucleotide polymorphisms, statistical complexities arise, including high collinearity between the variables and sparsity of relevant SNPs.
The present study examined 26 SNPs mapping out 3 genes. All SNPs were tested for HWE in the whole population, as well as in the Caucasian population and the Asian population. The population with the most SNPs in HWE was the Caucasian population. This was expected, as the SNPs for genotyping were selected based on their presence within a Caucasian population using HapMap.
The predictive ability of SPLS was tested by comparison to linear regression. Merino et al performed linear regression on SNPs within FADS1 and FADS2. Ratios of fatty acids approximating D6-Desaturase, D5-Desaturase, n-6 ADA, and n-3 ADA were the response variables. 15 SNPs were tested in the Caucasian population (n=78), and 9 were found to be significant, with the strongest association between RS174547 (FADS1) and n-6 ADA (p=3.99×10^-8). All other SNPs were no longer significantly associated when rs174547 was considered as a covariate. The same analysis was repeated using SPLS. In the Caucasian population, 3 SNPs were selected as relevant: RS174547 (FADS1), RS174576 (FADS2), and RS968567 (FADS2). The advantage of SPLS in this context is that, through the selection of optimal parameters, cross validation will provide a solution with the appropriate sparsity.
Linear regression selected more variables than SPLS, which suggests that many of these variables were included because of their high collinearity with the causal loci.
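The way the eta parameter removes weakly associated, collinear SNPs can be illustrated with a simplified soft-thresholding rule of the kind used for a single response in sparse PLS. This Python sketch is an assumption-laden illustration, not the spls package implementation: the function name is hypothetical, and the vector z stands for hypothetical scaled covariances between four SNPs and one response. Each entry is shrunk toward zero by eta times the largest absolute entry, and entries that would cross zero are set to 0.

```python
def soft_threshold_direction(z, eta):
    """Shrink each entry of z by eta * max|z|; entries below the cutoff become 0."""
    if not 0 <= eta < 1:
        raise ValueError("eta must satisfy 0 <= eta < 1")
    cutoff = eta * max(abs(v) for v in z)
    out = []
    for v in z:
        shrunk = max(abs(v) - cutoff, 0.0)
        sign = 1.0 if v > 0 else -1.0
        # Keep the original sign for surviving entries, exact 0.0 otherwise.
        out.append(sign * shrunk if shrunk > 0 else 0.0)
    return out

# Hypothetical SNP-response covariances: two strong, two weak.
z = [0.9, 0.85, -0.1, 0.05]

no_sparsity = soft_threshold_direction(z, eta=0.0)     # every SNP keeps a weight
moderate = soft_threshold_direction(z, eta=0.5)        # weak SNPs drop out
high_sparsity = soft_threshold_direction(z, eta=0.96)  # only the strongest survives
```

With eta = 0 nothing is removed, which corresponds to ordinary PLS; as eta approaches 1 only the variables most strongly associated with the response keep a non-zero weight.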
  • 22. In the Asian population (n=69), of the 15 SNPs studied, Merino et al found 8 to be significantly associated with altered fatty acid levels. The strong correlation between rs174547 (FADS1) and n-6 ADA was also identified in Asians, as well as a strong association between rs498793 (FADS2) and n-3 ADA. All other associations became insignificant when rs174547 was added as a covariate. The same analysis was repeated using SPLS. In the East Asian population, 3 SNPs were selected as relevant: RS174547 (FADS1), RS174576 (FADS2), and RS174611 (FADS2).
RS174547 and RS174576 were selected as significant in all models, and moreover at different parameters within these models. This was the case not only in the Caucasian population, but also in the East Asian population. This is evidence that genetic variation at one or both of these loci, or at a closely linked non-genotyped locus, may be directly responsible for variation in plasma PUFAs by affecting desaturation enzyme activity. RS174547 and RS174576 have been identified as significantly associated with altered fatty acids in several other studies (Rzehak, P., 2009; Schaeffer, L., 2006; Martinelli, N., 2008; Bokor, S., 2010); however, these studies identified multiple SNPs, all in high LD with RS174547 and RS174576.
RS174547 (FADS1)
Presence of at least one minor allele (T) at this locus was associated with decreases in D6D, D5D, n-6 ADA, and n-3 ADA. The T allele was associated with increased precursor levels and decreased products. The T allele was also associated with increased elongation activity. In model 1, this SNP was associated with D6D activity and not D5D activity. As this SNP is within the gene for D6 desaturase, this is likely close to its actual effect. Associations with D5D activity in other models are likely due to the very high collinearity between this SNP and RS174576. RS174547 is in high LD with RS174576 (r2 = 0.97).
  • 23. RS174576 (FADS2)
Presence of at least one minor allele (A) is associated with decreased D5D activity, D6D activity, n-3 ADA, and n-6 ADA. Minor allele presence was associated with decreased GLA, AA, and EPA, increased ALA, and increased n-6 elongation activity. It is unclear whether this SNP has actual causal effects or whether its associations are due to high collinearity with the causal SNPs.
RS968567 (FADS2)
The presence of at least one minor (A) allele at this locus was significantly associated with decreased D6D activity, D5D activity, n-3 and n-6 ADA, and with decreased levels of arachidonic acid (fa20.4n6). This SNP is found within the promoter region of FADS2, in a predicted binding site for transcription factors such as SREBP1 and PPARa (Lattka, E. 2010). Functional analysis by Lattka et al, using luciferase reporter gene assays, showed that two transcription factors, including ELK1, bind to the promoter region in a manner specific to the RS968567 allele. Promoter activity increased with the minor T allele (Lattka, E. 2010). It would be expected that the presence of a minor allele here would be associated with increased D5D activity; however, this was not observed.
ELONGATION ACTIVITY
Both RS174576 and RS174547 showed positive associations with elongation activity; however, this is most likely an artifact of highly correlated enzyme activity creating the precursor and depleting the product of the elongation step. No ELOVL2 SNPs were selected in any SPLS model. This suggests that genetic variation within this gene is not resulting in any altered metabolite levels within this pathway. As the elongation step is not the rate-limiting step within this pathway, changes in enzyme activity are unlikely to significantly impact the lipid measurements.
  • 24. CROSS VALIDATION
For each model, optimal parameters for SPLS regression were selected using 10-fold cross validation. Each time a CV is performed it may give different results, as the subgroups are randomly selected. A CV heat map can be plotted to visualize the mean squared prediction error, and can be a useful statistical diagnostic tool, displaying which parameters will provide the ideal sparsity and whether multiple near-optimal parameter sets exist. The optimal parameters selected correspond to the global minimum of prediction error; however, local minima may exist. Figure 3 in model 2 demonstrates this idea: the global minimum for mean square error is 0.157 and corresponds to eta = 0.64 and K = 3, but a local minimum of 0.158 exists when eta = 0.96 and K = 2. This shows that a higher sparsity can be imposed with only a very small increase in mean squared prediction error. Using these new parameters reduced the number of selected SNPs from 10 down to 2.
Outcomes of SPLS can be modified by manually selecting the parameters used. Since eta, the thresholding parameter, determines the sparsity of the final solution, the variable selection feature of the SPLS function can even be omitted by setting eta = 0. This was done with the fatty acid response matrix to elucidate the multicollinearity of many SNPs. When coefficients are assigned to each variable without variable selection, the predicted fatty acid levels are largely the same as when the sparsity principle is applied. This demonstrates that many of the SNPs selected under low sparsity conditions are not contributing significantly to the fatty acid responses.
An important corollary of this approach is that the tag SNPs which are used to map the genes may not include the causal variant. The tag SNPs which are selected have the highest likelihood, among those which were genotyped, of being causal, yet may only be indicative of the region where the true causal SNP exists.
Often more than one SNP is selected as significant, and this method alone does not distinguish whether they are both causal, whether one is causal and the other is in very high LD with it, or whether the two SNPs are in equal LD with the causal variant. Molecular biology approaches should be used to further investigate functional changes associated with each selected polymorphism.
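The idea of trading a negligible increase in prediction error for a sparser solution can be sketched in Python. In the grid below, only the two entries taken from Model 2 (eta = 0.64, K = 3 with MSE 0.157, and eta = 0.96, K = 2 with MSE 0.158) come from this report; the other entries, the tolerance, the function names, and the tie-breaking rule (largest eta, then smallest K) are hypothetical choices made for illustration.

```python
def min_mse_choice(cv_mse):
    """Return the (eta, K) pair with the lowest cross-validated MSE."""
    return min(cv_mse, key=cv_mse.get)

def sparsest_within_tolerance(cv_mse, tol):
    """Among pairs whose MSE is within tol of the minimum, prefer the
    largest eta (most sparsity) and then the smallest K."""
    best = min(cv_mse.values())
    candidates = [p for p, mse in cv_mse.items() if mse <= best + tol]
    return max(candidates, key=lambda p: (p[0], -p[1]))

# Keys are (eta, K); values are cross-validated mean square errors.
cv_mse = {
    (0.64, 3): 0.157,  # reported minimum for Model 2(a)
    (0.96, 2): 0.158,  # reported high-sparsity alternative, Model 2(b)
    (0.30, 3): 0.161,  # hypothetical
    (0.80, 1): 0.170,  # hypothetical
}

default_choice = min_mse_choice(cv_mse)
sparse_choice = sparsest_within_tolerance(cv_mse, tol=0.0015)
```

A rule of this kind makes the manual choice of parameters explicit and repeatable, although the tolerance itself remains a judgment call.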
  • 25. CONCLUSION
SPLS regression is an effective statistical approach for determining the most relevant SNPs affecting a multivariate response. SPLS handles multicollinearity by implementing an L1 penalty, and can impose a reasonable amount of sparsity on the solution through the selection of optimal parameters by cross validation. Selection of optimal parameters should not be left solely to the included cross validation function, as many parameter combinations will give useful results with negligible increases in MSE. The SNPs RS174547 and RS174576 were consistently selected as relevant variables, and therefore one or both of them are likely either the causal SNPs or very closely linked to the causal loci.
  • 26. Appendix i - Confidence Intervals for desaturation / elongation activity
Training Caucasian population (eta=0.96, K=2)

Confidence intervals of effect of selected SNPs on D6D activity
            2.5%      97.5%
RS174547    -0.082    -0.036
RS174576    -0.082    -0.034
sex         -0.125    -0.032

Confidence intervals of effect of selected SNPs on elongation
            2.5%      97.5%
RS174547    0.042     0.085
RS174576    0.039     0.081
sex         0.114     0.186

Confidence intervals of effect of selected SNPs on D5D
            2.5%      97.5%
RS174547    -0.074    -0.040
RS174576    -0.073    -0.038
sex         -0.117    -0.052

Confidence intervals of effect of selected SNPs on n-6 ADA
            2.5%      97.5%
RS174547    -0.067    -0.040
RS174576    -0.066    -0.040
sex         -0.040    0.014

Confidence intervals of effect of selected SNPs on n-3 ADA
            2.5%      97.5%
RS174547    -0.111    -0.053
  • 27.
            2.5%      97.5%
RS174576    -0.110    -0.053
sex         -0.116    -0.007

Appendix ii - Confidence intervals for effect of SNPs on PUFA levels.

Confidence intervals of effect of selected SNPs on Linoleic acid (18:2 n-6)
            2.5%      97.5%
RS174547    0.018     0.427
RS174576    -0.029    0.383
sex         0.062     0.915
bmi         -1.216    -0.302

Confidence intervals of effect of selected SNPs on γ-Linolenic acid (18:3 n-6)
            2.5%      97.5%
RS174547    -0.025    -0.009
RS174576    -0.025    -0.009
sex         -0.010    0.009
bmi         -0.006    0.018

Confidence intervals of effect of selected SNPs on Dihomo-γ-linolenic acid (20:3 n-6)
            2.5%      97.5%
RS174547    -0.007    0.033
RS174576    -0.004    0.037
sex         -0.051    0.004
bmi         -0.005    0.081

Confidence intervals of effect of selected SNPs on Arachidonic acid (20:4 n-6)
            2.5%      97.5%
RS174547    -0.332    -0.214
RS174576    -0.328    -0.210
  • 28.
            2.5%      97.5%
sex         -0.103    0.102
bmi         -0.004    0.236

Confidence intervals of effect of selected SNPs on Alpha-linolenic acid (18:3 n-3)
            2.5%      97.5%
RS174547    0.006     0.035
RS174576    0.005     0.035
sex         -0.012    0.024
bmi         -0.037    0.012

Confidence intervals of effect of selected SNPs on Eicosapentaenoic acid (20:5 n-3)
            2.5%      97.5%
RS174547    -0.064    -0.019
RS174576    -0.066    -0.019
sex         -0.007    0.065
bmi         -0.043    0.023

Appendix iii - Genomic location of SNPs in mapping the FADS gene cluster (from NCBI)
  • 29. Literature Cited
Robert S. Goodhart and Maurice E. Shils (1980). Modern Nutrition in Health and Disease (6th ed.). Philadelphia: Lea and Febiger. pp. 134–138. ISBN 0-8121-0645-8.
Risé P, Marangoni F, Galli C. Regulation of PUFA metabolism: pharmacological and toxicological aspects. Prostaglandins Leukot Essent Fatty Acids. 2002 Aug-Sep;67(2-3):85-9.
Artemis P Simopoulos. Essential fatty acids in health and chronic disease. Am J Clin Nutr 1999;70(suppl):560S–9S.
Calder, P.C., n−3 Polyunsaturated fatty acids, inflammation, and inflammatory diseases. 2006, American Society for Clinical Nutrition.
Caughey GE, Mantzioris E, Gibson RA, Cleland LG, James MJ. The effect on human tumor necrosis factor α and interleukin 1β production of diets enriched in n−3 fatty acids from vegetable oil or fish oil. Am J Clin Nutr 1996;63:116–22.
Goldman DW, Pickett WC, Goetzl EJ. Human neutrophil chemotactic and degranulating activities of leukotriene B5 (LTB5) derived from eicosapentaenoic acid. Biochem Biophys Res Commun 1983;117:282–8.
Serhan CN, Clish CB, Brannon J, Colgan SP, Gronert K, Chiang N. Anti-inflammatory lipid signals generated from dietary n−3 fatty acids via cyclooxygenase-2 and transcellular processing: a novel mechanism for NSAID and n−3 PUFA therapeutic actions. J Physiol Pharmacol 2000;4:643–54.
Clarke SD, Jump DB. Dietary polyunsaturated fatty acid regulation of gene transcription. Annu Rev Nutr 1994;14:83–98.
Raatz, K., Bibus, D., Thomas, W., Kris-Etherton, P. Total Fat Intake Modifies Plasma Fatty Acid Composition in Humans
N. Martinelli, D. Girelli, G. Malerba, P. Guarini, T. Illig, E. Trabetti, M. Sandri, S. Friso, F. Pizzolo, L. Schaeffer, J. Heinrich, P.F. Pignatti, R. Corrocher, O. Olivieri, FADS genotypes and desaturase activity estimated by the ratio of arachidonic acid to linoleic acid are associated with inflammation and coronary artery disease, Am. J. Clin. Nutr. 88 (2008) 941–949.
  • 30. H. Truong, J.R. DiBello, E. Ruiz-Narvaez, P. Kraft, H. Campos, A. Baylin, Does genetic variation in the {Delta}6-desaturase promoter modify the association between {alpha}-linolenic acid and the prevalence of metabolic syndrome? Am. J. Clin. Nutr. 89 (2009) 920–925.
A. Baylin, E. Ruiz-Narvaez, P. Kraft, H. Campos, {alpha}-Linolenic acid, {Delta}6-desaturase gene polymorphism, and the risk of nonfatal myocardial infarction, Am. J. Clin. Nutr. 85 (2007) 554–560.
Y. Lu, E.J. Feskens, M.E. Dolle, S. Imholz, W.M. Verschuren, M. Muller, J.M. Boer, Dietary n-3 and n-6 polyunsaturated fatty acid intake interacts with FADS1 genetic variation to affect total and HDL-cholesterol concentrations in the Doetinchem Cohort Study, Am. J. Clin. Nutr. 92 (2010) 258–265.
R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/
Lattka, E., A common FADS2 promoter polymorphism increases promoter activity and facilitates binding of transcription factor ELK1, The Journal of Lipid Research 51 (2010) 182-191.
Wold, S., The collinearity problem in linear regression. The PLS approach to generalized inverses, SIAM J. Sci. Stat. Comput. Vol. 5, No. 3, September 1984.
P. Rzehak, J. Heinrich, N. Klopp, L. Schaeffer, S. Hoff, G. Wolfram, T. Illig, J. Linseisen, Evidence for an association between genetic variants of the fatty acid desaturase 1 fatty acid desaturase 2 (FADS1 FADS2) gene cluster and the fatty acid composition of erythrocyte membranes, Br. J. Nutr. 101 (2009) 20–26.
L. Schaeffer, H. Gohlke, M. Muller, I. Heid, L. Palmer, I. Kompauer, H. Demmelmair, T. Illig, B. Koletzko, J. Heinrich, Common genetic variants of the FADS1 FADS2 gene cluster and their reconstructed haplotypes are associated with the fatty acid composition in phospholipids, Hum. Mol. Genet. 15 (2006) 1745–1756.
S. Bokor, J. Dumont, A. Spinneker, M. Gonzalez-Gross, E. Nova, K. Widhalm, G. Moschonis, P. Stehle, P. Amouyel, S. De Henauw, D. Molnar, L.A. Moreno, A. Meirhaeghe, J. Dallongeville, Single nucleotide polymorphisms in the FADS gene cluster are associated with delta-5 and delta-6 desaturase activities estimated by serum fatty acid ratios, J. Lipid Res. 51 (2010) 2325–2333.