The document summarizes a study analyzing factors that affect ratings of French fries, including oil type and age used in cooking. Data were collected from volunteers rating fries made from different oil conditions. Models were developed to determine the effects of explanatory variables on an ordinal overall rating variable. The optimal logit mixed model indicated oil age had a significant effect on ratings, while oil type and other variables did not. This suggests oil age is the primary determinant of optimal fries, with older oil associated with higher ratings.
1. An Equal Opportunity University 1 F:/ZhanxiongXu_XiaojiaoZang/MockProject1
Date: March 19, 2015
From: Zhanxiong Xu, Xiaojiao Zang
To: Kirsten Eilertson
Re: French Fries
RESEARCH FIELD: Nutrition Science
PROJECT TITLE: Cumulative Logit Mixed Models for the French Fries Dataset
1. PROJECT DESCRIPTION
The goal of this study is to find the optimal combination of oil attributes for making
French fries. The data were collected by our client in an on-campus food science lab at the
Pennsylvania State University. According to our client, around 1,800 observations were
collected from around 900 volunteers over 10 days. Two sets of fries made under
different oil conditions were presented to every volunteer for tasting and rating. The type and the
age of the oil used for making the fries were recorded as explanatory variables. The response
variable is multivariate, consisting of ratings of five different attributes (temperature, appearance,
color, taste, and texture) of the fries and an overall rating of the fries. Some demographic
variables, such as gender and food allergies, were also collected. The explanatory and response
variables are explained further in section 1.3. There are no missing values in the dataset.
This is an observational study, and it is currently at the analysis stage.
1.1 – RESEARCH QUESTIONS
Question 1: Do different types of oil and different ages of oil affect people's overall liking of
fries?
Question 2: If the answer to Question 1 is yes, how do we determine the optimal fries-making
condition?
Question 3: Are the five attributes of fries reasonable summaries of the overall liking?
1.2 – STATISTICAL QUESTIONS
We addressed all three questions raised in section 1.1 under one modeling framework. Two
Statistical Consulting Center
Department of Statistics
Eberly College of Science
Tel: (814) 863-0281
Fax: (814) 863-7114
The Pennsylvania State University
323 Thomas Building
University Park, Pa 16802
March 31. 2015
separate classes of models, M1 and M2, are established: M1 to answer Questions 1 and 2, and M2 to
answer Question 3. Supposing M1 and M2 have been built appropriately for the data, the three
research questions in section 1.1 convert directly to the following statistical questions:
Question 1: In the full model from M1, do the two variables of interest have significant effects?
Question 2: In the final model from M1, how do we determine the level combination of
the two variables of interest that results in the highest score on some criterion?
Question 3: In the full model from M2, do the five variables that represent the five attributes of
fries have significant effects?
1.3 – VARIABLES OF INTEREST
For computing reasons, we renamed some variables from the original dataset. We will
use these new names to introduce the variables. The variables of main interest are
summarized as follows (see Table 1 for the rating scales).
OilType: Type of oil used to make the fries, with four categories: Clear Valley, Mel Fry
Free, Advantage, and Mel Fry.
OilAge: Age of the oil used to make the fries, measured in days, with five categories: 1, 2, 3, 4 and 5.
Temp: Participants' ratings of fries temperature, on a scale from 1 to 9.
App: Participants' ratings of fries appearance.
Color: Participants' ratings of fries color.
Taste: Participants' ratings of fries taste.
Texture: Participants' ratings of fries texture.
Rating: Participants' ratings of their overall liking of the fries they tasted.
There are two other variables, Day and Volunteer. Although they are not of primary research
interest, they cannot be ignored because they reflect the structure of the data collection. See
section 2.2 for more details.
Table 1. Rating scale for different attributes of fries
Score  Temperature          Color                 Appearance, Taste, Texture, Overall Liking
1      Extremely too cool   Extremely too light   Dislike extremely
2      Very much too cool   Very much too light   Dislike very much
3      Moderately too cool  Moderately too light  Dislike moderately
4      Slightly too cool    Slightly too light    Dislike slightly
5      Just about right     Just about right      Neither like nor dislike
6      Slightly too hot     Slightly too dark     Like slightly
7      Moderately too hot   Moderately too dark   Like moderately
8      Very much too hot    Very much too dark    Like very much
9      Extremely too hot    Extremely too dark    Like extremely
2. EXPLORATORY DATA ANALYSIS (EDA)
2.1- Design or Regression?
At first glance, the data look as if they were collected from a blocked factorial design. However, as
the table below shows, the two factors of primary interest are not orthogonal to each
other, because the cell counts in the columns are not all proportional to one another (a rigorous
definition of orthogonality; see Montgomery, 2008). In addition, since each volunteer produces two
observations, the independence assumption of a conventional ANOVA model is clearly
violated. Essentially, this study is not a designed experiment, so the model we built is a
regression model. The conclusions should be interpreted accordingly.
OilType \ OilAge (days)    1     2     3     4     5
Clear Valley             128   105    36    76    75
Mel Fry Free             128   105    36    76    75
Advantage                215   150    35    72    53
Mel Fry                  215   150    35    72    53
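The non-orthogonality claim can be checked directly from the cell counts. The following is a minimal sketch, not part of the original analysis: it compares each observed count with the count the cell would have if all columns were proportional (row total times column total divided by the grand total). The helper name `expected_count` is ours.

```python
# Cell counts from the OilType x OilAge table (rows: oil types, columns: oil ages 1-5).
counts = {
    "Clear Valley": [128, 105, 36, 76, 75],
    "Mel Fry Free": [128, 105, 36, 76, 75],
    "Advantage":    [215, 150, 35, 72, 53],
    "Mel Fry":      [215, 150, 35, 72, 53],
}

row_totals = {oil: sum(row) for oil, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())


def expected_count(oil, age_index):
    """Count the cell would have if the cell frequencies were proportional."""
    return row_totals[oil] * col_totals[age_index] / grand_total


# Largest absolute gap between observed and proportional counts.
max_gap = max(
    abs(counts[oil][a] - expected_count(oil, a))
    for oil in counts
    for a in range(5)
)

print("grand total:", grand_total)  # 1890 observations, i.e. 945 volunteers x 2 tastings
print("largest gap from proportional counts:", round(max_gap, 1))
```

Under exact proportionality every gap would be zero; a gap of many observations in some cell is what makes the two factors non-orthogonal here.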
2.2 - Nested Structure and Repeated Measurements
The data were collected in an obviously nested way. The researcher first selected (seemingly
randomly) ten different days as ten big blocks in which to conduct the experiment. On each day, many
volunteers were invited to taste the fries. Each volunteer was presented with two varieties of fries,
which were made with different combinations of OilType and OilAge. Although we are not sure
whether the same volunteer came on different days, it seems reasonable to assume that each
volunteer showed up only once during the whole period of the experiment. Under this assumption,
the factor Volunteer is nested in the factor Day, in the sense that each volunteer appeared
on exactly one day. The model we build has to reflect this structure (at least at the beginning).
Besides nesting, another important feature of this dataset is that each volunteer tasted two sets of
fries. This is clearly different from two different volunteers each tasting one set of fries. In
statistics, this is called “repeated measurements”, and it needs to be taken into account in
modeling. Therefore, the model should account for both features.
2.3 - Fixed Effects vs. Random Effects
On one hand, we should treat the variables (OilType, ..., Texture) as fixed effects because of the
primary goal of this study. On the other hand, to be consistent with the data collection procedure
as well as to avoid voluminous nuisance parameters, the variables Day and Volunteer are best
treated as random effects. In conjunction with the fact described in subsection 2.2, the tentative
model we proposed is a Multilevel Mixed Effects Model, as shown in the following section 3.
2.4 – Scale Measurement of Variables
All the variables considered in this dataset are factors. In other words, they have a finite number
of categories. Some of the variables are nominal, such as OilType, while the remaining variables
(e.g., OilAge) are essentially ordinal in that there is an ordering or ranking among their
categories. We treat the response variable Rating as ordinal so that we
can fit a cumulative logit model, which has a convenient interpretation. The other rating
variables all appear as explanatory variables, so we treated them as nominal
variables for their easier and clearer control-treatment contrast interpretation. Because all
variables are discrete, it is not easy to display the data in an informative way. We produced
some plots of our own but decided not to report them here, since they did not help in deciding
the form of the models to fit.
3. STATISTICAL ANALYSIS
3.1 - The Cumulative Logit Models
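Suppose that an ordinal variable Y has J categories. One way to use the category ordering is to form
logits of the cumulative probabilities
\[
P(Y \le j \mid \mathbf{x}) = \pi_1(\mathbf{x}) + \cdots + \pi_j(\mathbf{x}), \quad j = 1, \ldots, J.
\]
The cumulative logits are defined as
\[
\operatorname{logit}[P(Y \le j \mid \mathbf{x})]
= \log \frac{P(Y \le j \mid \mathbf{x})}{1 - P(Y \le j \mid \mathbf{x})}
= \log \frac{\pi_1(\mathbf{x}) + \cdots + \pi_j(\mathbf{x})}{\pi_{j+1}(\mathbf{x}) + \cdots + \pi_J(\mathbf{x})},
\quad j = 1, \ldots, J - 1.
\]
A model that simultaneously uses all cumulative logits is
\[
\operatorname{logit}[P(Y \le j \mid \mathbf{x})] = \alpha_j + \boldsymbol{\beta}' \mathbf{x}, \quad j = 1, \ldots, J - 1. \tag{1}
\]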
Each cumulative logit has its own intercept. The cumulative logit model (1) satisfies
\[
\operatorname{logit}[P(Y \le j \mid \mathbf{x}_1)] - \operatorname{logit}[P(Y \le j \mid \mathbf{x}_2)] = \boldsymbol{\beta}' (\mathbf{x}_1 - \mathbf{x}_2).
\]
An odds ratio of cumulative probabilities is called a cumulative odds ratio. From the above
equation, the odds of making a response ≤ j at x = x1 are exp[β′ (x1 − x2)] times the
odds at x = x2. The same proportionality constant applies to each logit. Because of this property,
model (1) is called a proportional odds model. Since the response variable Rating is an ordinal
variable with 8 observed categories (category 1 is dropped because no observation has it
as a response), model (1) is a reasonable choice for our data. In addition, we
need to model the random effects described in section 2.
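The proportional odds property can be illustrated numerically. This is a small sketch under assumed values: the intercepts, slope, and covariate values below are hypothetical and chosen only for illustration; they are not estimates from the fries data. It shows that the cumulative odds ratio between two covariate values is the same at every cut point j.

```python
import math


def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(alphas, beta, x):
    """P(Y <= j | x) for j = 1..J-1 under logit[P(Y <= j | x)] = alpha_j + beta * x."""
    return [expit(a + beta * x) for a in alphas]


def odds(p):
    return p / (1.0 - p)


# Hypothetical values, for illustration only (not fitted to any data).
alphas = [-2.0, -0.5, 1.0]  # increasing intercepts, so J = 4 categories
beta = 0.8
x1, x2 = 1.0, 0.0

p1 = cumulative_probs(alphas, beta, x1)
p2 = cumulative_probs(alphas, beta, x2)

# One cumulative odds ratio per cut point j; all should equal exp(beta * (x1 - x2)).
ratios = [odds(a) / odds(b) for a, b in zip(p1, p2)]
print(ratios)
print(math.exp(beta * (x1 - x2)))
```

The intercepts differ by cut point, but the ratio does not, which is exactly what "proportional odds" refers to.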
3.2 - The Cumulative Logit Mixed Effects Model for Fries Data
In this section, we focus on describing the analysis procedure used to answer Questions 1 and 2,
and omit the details of the modeling process used to answer Question 3. The
conclusions and recommendations addressing all questions are summarized in the next section.
As mentioned in section 2, our data have an obvious nested structure, so an initial full model
that includes all possible covariates is:
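\[
\operatorname{logit}(\pi_{i(j)t,k}) = \theta_k - \beta_1 \mathrm{OilType}_t - \beta_2 \mathrm{OilAge}_t - \beta_3 \mathrm{Gender}_i - \mathrm{Day}_j - \mathrm{Volunteer}_{i(j)}, \tag{2}
\]
\[
i = 1, \ldots, 945, \quad j = 1, \ldots, 10, \quad t = 1, 2, \quad k = 2, \ldots, 8,
\]
\[
\{\mathrm{Day}_j\} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad \{\mathrm{Volunteer}_{i(j)}\} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),
\]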
where \(\pi_{i(j)t,k} = P(\mathrm{Rating}_{i(j)t} \le k \mid \mathrm{Volunteer}_{i(j)}, \mathrm{Day}_j)\), that is, given that Volunteer i came on Day j,
the probability that the rating score of his/her t-th tasting is less than or equal to k. All the fixed-effects
factors are coded using treatment contrasts (notations such as \(\beta_2 \mathrm{OilAge}_t\) are just
shorthand for the full set of dummy-coded parameters). The notation i(j) signifies the nested
structure between the variables Volunteer and Day, namely, the levels of Volunteer depend on the
levels of Day.
Apart from incorporating nested random effects, model (2) differs slightly from model (1) in that
a minus sign appears in front of the parameters in model (2) instead of the plus sign in model (1),
because the R package ordinal (cf. [2]) that we used to fit the models adopts the former parametrization.
The estimates of parameters in model (2) are as follows:
The above output suggests none of OilType, Gender and Day is significant. We therefore update
the full model by deleting these three variables to get the following reduced model:
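\[
\operatorname{logit}(\pi_{it,k}) = \theta_k - \beta\, \mathrm{OilAge}_t - \mathrm{Volunteer}_i, \tag{3}
\]
\[
i = 1, \ldots, 945, \quad t = 1, 2, \quad k = 2, \ldots, 8, \quad \{\mathrm{Volunteer}_i\} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2).
\]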
To justify this reduction, we perform a likelihood ratio test between models (2)
and (3):
The large p-value shows that the simpler reduced model (3) is adequate.
When the insignificant random effect Day was removed, the Volunteer effect was kept to
account for the repeated measurements. One may ask whether this variable could also be
removed to obtain an even more parsimonious, purely fixed-effects model. A similar
likelihood ratio test rejects this further reduction.
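The mechanics of the likelihood ratio comparisons described above can be sketched as follows. This is only an illustration: the log-likelihood values and the degrees of freedom in the example call are hypothetical placeholders, not numbers from the fitted models, and the function names `chi2_sf` and `likelihood_ratio_test` are ours. The statistic is twice the difference in maximized log-likelihoods, referred to a chi-square distribution.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Q(1, half) = exp(-half); the recurrence starts at a = 1.
        a, q = 1.0, math.exp(-half)
    else:
        # Q(1/2, half) = erfc(sqrt(half)); the recurrence starts at a = 1/2.
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Regularized upper incomplete gamma recurrence:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return the LR statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# HYPOTHETICAL inputs for illustration; the report's actual output is not reproduced here.
stat, p_value = likelihood_ratio_test(loglik_full=-1000.0, loglik_reduced=-1002.0, df=5)
print("LR statistic:", stat)
print("p-value:", round(p_value, 3))
```

A large p-value favors the reduced model. When the dropped term is a variance component, zero lies on the boundary of the parameter space, so the plain chi-square p-value tends to be conservative.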
The following analysis and our final conclusions for Question 1 and Question 2 are
based on model (3). Since model (3) involves both a fixed effect (OilAge) and a
random effect (Volunteer), it is referred to as a Cumulative Logit Mixed Effects Model or a
Proportional Odds Mixed Effects Model.
The estimates of parameters in model (3) can be summarized as follows:
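\[
\hat{\theta}_2 = -6.85, \quad \hat{\theta}_3 = -4.72, \quad \hat{\theta}_4 = -2.62, \quad \hat{\theta}_5 = -1.32, \quad \hat{\theta}_6 = 0.57, \quad \hat{\theta}_7 = 2.82, \quad \hat{\theta}_8 = 5.39,
\]
\[
\hat{\beta}_{a2} = 0.33, \quad \hat{\beta}_{a3} = 0.16, \quad \hat{\beta}_{a4} = 0.92, \quad \hat{\beta}_{a5} = 1.77, \quad \hat{\sigma}^2 = 4.21,
\]
where subscripts a2, a3, a4, a5 represent OilAge = 2, 3, 4, 5 respectively.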
Since this model does not involve the OilType variable, we conclude that OilType does not
significantly affect the overall liking. Strictly speaking, this conclusion holds in a
conditional sense (given the volunteer), although the same conclusion also tends to be true in a
marginal sense. On the other hand, OilAge has a significant effect on the response. Therefore,
instead of determining the optimal level combination of OilType and OilAge, we only need to look
for the level of OilAge that gives the highest overall rating.
All of the above estimates (except \(\hat{\sigma}^2\)) should also be interpreted in a conditional sense, that is, given a
fixed value of the random effect Volunteer. With this in mind, the \(\theta_k\) are interpreted as baseline
cumulative logits, namely, the logits of the (conditional) probabilities that the overall liking rating is
less than or equal to k when 1-day-old oil was used. For instance, \(\hat{\theta}_5 = -1.32\) means that the
cumulative logit of the event {Rating ≤ 5} at OilAge = 1 is −1.32, or, put another
way, the odds of the event {Rating ≤ 5} given OilAge = 1 are exp(−1.32) = 0.267. Similarly,
\(\beta_{a2}, \ldots, \beta_{a5}\) have an odds ratio interpretation between treatments: for example,
\(\hat{\beta}_{a5} = 1.7675\) (1.77 after rounding)
means that the odds of getting Rating ≤ k when using OilAge = 5 are exp(−1.77) = 0.17 times the
odds when using OilAge = 1, for every k in {2, . . . , 8}. This shows that older
oil has an advantage over fresher oil. Again, these conclusions should be interpreted in a
conditional sense (as opposed to completely ignoring the Volunteer effect).
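The arithmetic behind these interpretations can be reproduced from the reported estimates of model (3). The sketch below is ours, not the authors' code; it sets the Volunteer effect to 0 by default and uses the rounded estimates, with the coefficient for the reference level OilAge = 1 taken as 0 under treatment contrasts.

```python
import math

# Reported estimates of model (3), rounded to two decimals.
theta = {2: -6.85, 3: -4.72, 4: -2.62, 5: -1.32, 6: 0.57, 7: 2.82, 8: 5.39}
beta = {1: 0.0, 2: 0.33, 3: 0.16, 4: 0.92, 5: 1.77}  # OilAge = 1 is the reference level


def cumulative_odds(k, oil_age, volunteer_effect=0.0):
    """Conditional odds of {Rating <= k}, from logit = theta_k - beta_OilAge - Volunteer."""
    return math.exp(theta[k] - beta[oil_age] - volunteer_effect)


baseline_odds = cumulative_odds(5, 1)
odds_ratio_5_vs_1 = cumulative_odds(5, 5) / cumulative_odds(5, 1)

print(round(baseline_odds, 3))      # 0.267
print(round(odds_ratio_5_vs_1, 2))  # 0.17
```

Because the Volunteer effect cancels in the ratio, the odds ratio of 0.17 holds for any fixed volunteer, which is the conditional interpretation stressed above.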
[Figure 1 image not reproduced. Plot title: Volunteer Effect on Rating; horizontal axis: Volunteer; vertical axis: Volunteer effect.]
Figure 1: Volunteer effects given by conditional modes with 95%
confidence intervals based on the conditional variance. The indices along
the horizontal axis show the labelling numbers of volunteers.
The large \(\hat{\sigma}^2\) shows that there is considerable heterogeneity among the subjects who took part in the
experiment. This heterogeneity can be seen clearly in Figure 1, where we selected 19
“equally-spaced” volunteers (plotting every volunteer’s effect would make the figure too dense) and
plotted their estimated individual effects, together with confidence intervals, in increasing order.
The significant volunteer effect indicates that volunteers rated the fries
differently. Two natural interpretations are that either an overall liking of, say, 3 means different
things to different volunteers, or the volunteers actually perceived the goodness of the fries
differently. Because of this heterogeneity, we must control for the volunteer effect
when we determine the optimal OilAge level. One reasonable criterion is to consider an
“average judge” only, that is, to set Volunteer = 0 and then estimate the cumulative probabilities for
each OilAge group. We expect the optimal OilAge to give the lowest cumulative
probability for each k. A group with this property has the highest
probability of receiving higher scores (in statistics, this property is called being “stochastically
larger”). Graphically, such a group traces out the cdf curve that lies farthest to the right. Based on the above
discussion and Figure 2 below, we claim that OilAge = 5 is the optimal OilAge level, since its
cdf curve lies farthest to the right in the figure.
For completeness, we also considered two classes of extreme “volunteers”, which result in very
low and very high cumulative logits. These were investigated by setting the random effect
Volunteer to the 5th and 95th percentiles of the estimated Volunteer distribution N (0, 4.21).
Figure 3 leads to the same conclusion as Figure 2: OilAge = 5 is the optimal
level.
[Figure 2 and Figure 3 images not reproduced. Plot title of Figure 2: Cumulative Probability Comparison (average); horizontal axis: overall liking rating scale (2 to 9); vertical axis: cumulative probability (0 to 1).]
Figure 2: Rating cumulative probabilities for “average” volunteers at different
oil age levels
Figure 3: Rating cumulative probabilities for “extreme” volunteers at different
oil age levels. Left: 5th-percentile volunteers; Right: 95th-percentile volunteers
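The comparison across oil ages can be reproduced approximately from the reported estimates of model (3). The sketch below is an illustration rather than the authors' code: it evaluates the conditional cumulative probabilities for an "average" volunteer (effect 0) and for volunteers at the 5th and 95th percentiles of N(0, 4.21), then looks for the oil age whose cumulative probability is lowest at every threshold. The function names are ours.

```python
import math
from statistics import NormalDist

# Reported estimates of model (3), rounded to two decimals.
theta = {2: -6.85, 3: -4.72, 4: -2.62, 5: -1.32, 6: 0.57, 7: 2.82, 8: 5.39}
beta = {1: 0.0, 2: 0.33, 3: 0.16, 4: 0.92, 5: 1.77}  # OilAge = 1 is the reference level
sigma2 = 4.21


def cumulative_prob(k, oil_age, u=0.0):
    """P(Rating <= k | Volunteer = u) under logit = theta_k - beta_OilAge - u."""
    return 1.0 / (1.0 + math.exp(-(theta[k] - beta[oil_age] - u)))


volunteer_dist = NormalDist(mu=0.0, sigma=math.sqrt(sigma2))
volunteers = {
    "5th percentile": volunteer_dist.inv_cdf(0.05),
    "average": 0.0,
    "95th percentile": volunteer_dist.inv_cdf(0.95),
}


def best_oil_age(u):
    """Return the oil age whose cumulative probability is lowest at every k, else None."""
    for a in beta:
        if all(cumulative_prob(k, a, u) <= cumulative_prob(k, b, u)
               for k in theta for b in beta):
            return a
    return None


for label, u in volunteers.items():
    curve = [round(cumulative_prob(k, 5, u), 3) for k in sorted(theta)]
    print(label, "| best OilAge:", best_oil_age(u), "| cdf at OilAge 5:", curve)
```

Since the volunteer effect shifts every cumulative logit by the same amount, the ordering of the oil ages is the same for all three kinds of volunteer; only the location of the curves changes.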
To answer Question 3, we also started with the full model (2), but replaced OilType and OilAge with
the five attribute factors Temp, App, Color, Taste and Texture. The final model, chosen based on
likelihood ratio tests, is:
\[
\operatorname{logit}(\pi_{it,k}) = \theta_k - \beta_1 \mathrm{Temp}_t - \beta_2 \mathrm{App}_t - \beta_3 \mathrm{Taste}_t - \beta_4 \mathrm{Texture}_t - \mathrm{Volunteer}_i,
\]
\[
i = 1, \ldots, 945, \quad t = 1, 2, \quad k = 2, \ldots, 8, \quad \{\mathrm{Volunteer}_i\} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2).
\]
The parameter estimates are omitted here to save space. Except for Color, the other four
attributes do provide meaningful summaries of the overall liking rating. Among all five
attributes, Taste has the strongest relationship with Rating, and Texture the second strongest.
This conclusion agrees well with common sense. The volunteer effect is still significant,
but it appears to have a weaker effect on the response than in model (3).
4. CONCLUSIONS AND RECOMMENDATIONS
Based on the statistical analysis in the last section, we can now summarize our responses
and recommendations for Questions 1 to 3.
Question 1: Given the volunteer, the oil age factor is significant while the oil type factor is not
significant. The choice of oil type would not much affect customers’ overall liking.
Therefore, clients are free to choose any type of oil based on other considerations such as price and
convenience.
Question 2: We suggest that clients use 5-day-old oil to make fries rather than fresher oil (that
is, oil of a younger age). This is slightly counterintuitive but might be explained if more
properties of the 5-day-old and 1-day-old oil were measured and reported. Until further information
is available, we recommend setting the oil age to 5 days.
Question 3: Given the volunteer, the overall liking is associated with four factors: Taste,
Texture, Appearance and Temperature, but not with the Color attribute.
Additionally, these conclusions should not be over-interpreted, in two respects. First, as our model
is a mixed-effects model, the individual effect plays an important role and cannot be
neglected. This issue has already been stressed in section 3. In fact, we are not sure whether
the individual effects or the fixed treatment effect (OilAge) has the more dominant effect on
the response variable. Second, since our model is a regression model rather than an experimental
design model, significance does not imply causation. For example, an overall
liking rating of 9 for a pack of fries does not imply that it was made using 5-day-old oil, and vice
versa. We provide further advice in section 6.
5. RESOURCES
[1] Agresti, Alan. Categorical data analysis. John Wiley & Sons, 2nd edition, 2002.
[2] Christensen, Rune, 2015. Ordinal — Regression Models for Ordinal Data,
http://www.cran.r-project.org/package=ordinal/.
[3] Montgomery, Douglas C. Design and analysis of experiments. John Wiley & Sons, 2008.
6. CONSIDERATIONS
Our analysis relies on the assumption that each volunteer appeared only once during the period of
study, and the client has not confirmed its validity. The analysis could be inappropriate if
this assumption does not hold.
As mentioned in section 4, the conclusions we draw are somewhat restrictive. Stronger conclusions
could be reached through a more careful design at the very beginning. The main
suggestion is to make the two factors OilAge and OilType orthogonal, which can be done by
arranging a 4 × 5 factorial design for each volunteer. If this is too expensive to implement, we
can randomly assign volunteers to the twenty combinations while maintaining the
balance of the design (i.e., each combination receives an equal number of volunteers). In
this way, we could obtain a mixed-effects design model. An alternative is to enroll a small
number of volunteers randomly, and put each volunteer through a full factorial design, repeating
this procedure over a longer time. In this way, we can use a purely fixed-effects model by treating
volunteers as fixed effects and time as a continuous variable. Then, we may use an analysis of
covariance (ANCOVA) model to analyze the results. Either approach stresses keeping the
experimental variables orthogonal, which, we believe, would support a stronger
cause-effect conclusion to some extent.
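The balanced random assignment suggested above could be carried out along the following lines. This is a minimal sketch under our own assumptions: the function name `balanced_assignment`, the seed, and the number of volunteers (940, a multiple of 20) are hypothetical and only serve to illustrate the idea of giving each of the twenty OilType × OilAge combinations the same number of volunteers.

```python
import random
from collections import Counter
from itertools import product

OIL_TYPES = ["Clear Valley", "Mel Fry Free", "Advantage", "Mel Fry"]
OIL_AGES = [1, 2, 3, 4, 5]


def balanced_assignment(volunteer_ids, seed=None):
    """Randomly assign each volunteer to one (OilType, OilAge) cell with equal cell sizes."""
    combos = list(product(OIL_TYPES, OIL_AGES))  # the 4 x 5 = 20 combinations
    if len(volunteer_ids) % len(combos) != 0:
        raise ValueError("number of volunteers must be a multiple of 20 for exact balance")
    rng = random.Random(seed)
    # One slot per volunteer, with every combination repeated equally often.
    slots = combos * (len(volunteer_ids) // len(combos))
    rng.shuffle(slots)
    return dict(zip(volunteer_ids, slots))


# Hypothetical example: 940 volunteers, i.e. 47 per combination.
assignment = balanced_assignment(list(range(1, 941)), seed=2015)
cell_sizes = Counter(assignment.values())
print(len(cell_sizes), "cells, sizes:", set(cell_sizes.values()))
```

With equal cell sizes the columns of the OilType by OilAge count table are proportional, so the two factors become orthogonal in the sense used in section 2.1.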