HELLER, LEVENE, KEENOY, ALBERT & HOCKEMEYER
integrating these two approaches into a general stochastic model, which allows for
predicting the probability of observable as well as cognitive trails.
LO Trails and Their Stochastic Modelling
A web learning environment consists of learning objects (LOs) that are connected
to each other by means of hyperlinks. An environment like this is created by
instructors, and learners navigate through it in order to try to achieve their learning
goals. The ability to model such environments and navigation through them is
useful to aid the instructors (in this case learning environment designers) in
designing the link structure for the learning material. In general, a web learning
environment can be modelled as a graph, whose nodes are the LOs of the hypertext
and whose edges are the hyperlinks between LOs. Figure 2 below provides an
example of such a graph. Each of the LOs E, O1, O2, O3, O4, and O5 constitutes a
node and the directed edges (i.e. arrows) indicate how the learner can move
between them. Navigation through the environment is then represented by paths
through the graph. These paths – which are the observable LO trails – can be
stochastically modelled as Markov chains (Kemeny, & Snell, 1960) on the graph,
where the probability of moving from one node (i.e. one LO of the environment) to
another is determined by which LO the user is currently visiting. In order to deduce
a Markov model of learners' observable behaviour, web data mining techniques can
be used to deduce navigation sessions from the server log files (Borges, & Levene,
2000), including the statistics of how many times learners choose particular links
when confronted with a choice (i.e. from the frequencies of page transitions as
recorded in the log file). The trails taken by users through a learning environment
can be observed in real-time by watching where the user clicks (either manually or
using a tracking tool), or after the trail has been completed using web log mining
techniques to find the trails from the log files recorded at the server. As the Markov
model gives probabilities for transitions from one LO to another within the
environment, the probability of observing any particular trail through the
environment can be calculated by taking the product of each of the LO transition
probabilities from the Markov model. For example, if the transition O1 → O2
has probability 0.9 and O2 → O4 has probability 0.5, then the probability of
observing the trail O1 → O2 → O4 (once the user has arrived at LO O1) is 0.9 × 0.5 =
0.45.
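The product rule for observable trails can be sketched in a few lines of Python. This is only an illustration: the function name trail_probability and the transition table are assumptions, and apart from the two values 0.9 and 0.5 taken from the example the probabilities are made up.

```python
from math import prod

# Hypothetical first-order transition probabilities P(next LO | current LO).
# Only 0.9 and 0.5 come from the example in the text; the other entries
# are invented so that each row sums to 1.
transitions = {
    "O1": {"O2": 0.9, "O3": 0.1},
    "O2": {"O4": 0.5, "O5": 0.5},
}

def trail_probability(trail, transitions):
    """Probability of following `trail`, given that its first LO has been reached."""
    steps = zip(trail, trail[1:])
    # A transition that is missing from the table is treated as having probability 0.
    return prod(transitions.get(src, {}).get(dst, 0.0) for src, dst in steps)

p = trail_probability(["O1", "O2", "O4"], transitions)
print(round(p, 2))  # 0.45
```

In practice the table would be estimated from the relative frequencies of page transitions in the server log rather than written by hand.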
When environments are modelled in this way the Markov property means that
the only thing of importance in determining the next link to be followed is the
current LO. This may be too simplistic for some situations. In such cases higher-
order Markov models could be used to take the influence of earlier LO visits into
account, or as in the subsequently developed more complex model, the sequence of
cognitive states of the learner can also be considered. In this case semantic meta-
data about the LOs can be used in conjunction with the basic Markov model to give
a deeper analysis of LO trails: Meta-data about the concepts that can be learned
from the LO and the types of problem that can be answered by the learner after the
LO has been mastered can be used to help infer the possible learning paths
(cognitive trails) that might be expected from learners following a particular trail.
COGNITIVE ASPECTS OF TRAILS
Trails in Knowledge Spaces
As learners navigate through a learning environment and interact with the LOs in
it, their knowledge may change: they learn. In this section we briefly outline
Knowledge Space Theory, which provides a framework for representing the learner’s
knowledge state and its changes. Knowledge Space Theory has been developed
constantly over the last twenty years (Doignon, & Falmagne, 1985, 1999; Albert,
& Lukas, 1999). This theory is fundamentally different from the traditional
psychological approach, which is based on a numerical evaluation of some
‘aptitude’. Knowledge Space Theory provides a completely qualitative, i.e. non-
numerical, but nevertheless precise representation of the knowledge of individual
learners in a certain domain. In the last two decades it has accumulated an impressive
body of theoretical results (see Doignon, & Falmagne, 1999).
Knowledge Space Theory is based on the notion of a knowledge domain, which
is identified with a set Q of problems. Usually, these problems are assumed to be
dichotomous, i.e. the answer to each of these problems is judged to be either
correct, or incorrect. A typical example in the field of arithmetic is the problem
Gwendolyn is 3/4 as old as Rebecca. Rebecca is 2/5 as old as Edwin. Edwin
is 20 years old. How old is Gwendolyn?
The knowledge state of a learner is identified with the subset K ⊆ Q of problems
in the knowledge domain Q that the learner is capable of solving. This means that
for a knowledge domain of n problems there exist 2^n potential
knowledge states. The observed solution behaviour, however, will exhibit some
dependencies – from observing that a student is capable of mastering a given
problem, sometimes the mastery of other problems can be surmised. Due to these
mutual dependencies not all of the subsets of Q are plausible knowledge states, but
only a collection of them. This motivates the definition of a knowledge structure as
a collection K of knowledge states K of a given knowledge domain Q. A
knowledge structure is assumed to contain at least the empty set ∅ and the set Q,
as it may always be the case that none or all of the problems are solved.
Set-inclusion induces a natural ordering on a knowledge structure. Figure 1
provides an illustration of such an ordering, in which the upwards-directed line
segments represent set-inclusion. Any sequence of upwards directed line segments
from the naive knowledge state ∅ to the set Q of full mastery may be interpreted as
a cognitive trail representing one of the possible learning paths in the given
knowledge structure. The learning path {}, {a}, {a, b}, {a, b, d}, {a, b, c, d}
provides an example of a cognitive trail that is possible in the knowledge structure
K.
Figure 1. Example of a knowledge structure on the knowledge domain Q = {a, b, c, d}.
Knowledge structures that satisfy certain properties have received particular
attention; among them are knowledge spaces, which are closed under set union. This
means that for any two knowledge states their set-union is also a state of the
knowledge space. Quasi-ordinal knowledge spaces are closed under both union and
intersection (see Doignon, & Falmagne, 1985, 1999). The notion of a knowledge
space is equivalent to the concept of an and/or-graph that is well known in the
context of artificial intelligence.
The consideration of learning processes in Knowledge Space Theory is based on
the interpretation of the upwards directed line segments in the diagram of a
knowledge structure (see Figure 1) as the possible learning paths. We can
formulate Markov models of the learning process that proceeds along these paths
representing cognitive trails. The simplest such model identifies the states of the
Markov chain with the knowledge states of the knowledge structure. As an initial
condition we may assume that the learner is in the naive state ∅ with probability 1.
Moreover, a transition from state K to state K' is only possible if K' is a
neighbouring state that covers K. Consider the knowledge structure of Figure 1,
and assume, for example, that an individual is in knowledge state {a}. Then the
only possible transitions are into the knowledge states {a, b} and {a, c}, as these
are the only knowledge states that are linked to {a} by upwards directed lines. On this
basis, various learning models were suggested, which are reviewed in Doignon, &
Falmagne (1999).
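The covering relation and the resulting learning paths can be computed directly from the list of knowledge states. The following Python sketch uses the eight states of the example structure as listed later in the text; the function names covers and learning_paths are our own and not part of Knowledge Space Theory.

```python
# Knowledge structure of Figure 1 on Q = {a, b, c, d}, as listed in the text.
K = [frozenset(s) for s in ("", "a", "b", "ab", "ac", "abc", "abd", "abcd")]
Q = frozenset("abcd")

def covers(state, structure):
    """States directly above `state`: proper supersets with no state in between."""
    above = [k for k in structure if state < k]
    return [k for k in above if not any(state < m < k for m in structure)]

def learning_paths(state, structure, top):
    """All chains of covering steps from `state` up to `top` (the cognitive trails)."""
    if state == top:
        return [[state]]
    return [[state] + rest
            for nxt in covers(state, structure)
            for rest in learning_paths(nxt, structure, top)]

# Possible transitions out of the knowledge state {a}.
print(["".join(sorted(k)) for k in covers(frozenset("a"), K)])  # ['ab', 'ac']

paths = learning_paths(frozenset(), K, Q)
print(len(paths))  # 5
```

A Markov model of the learning process then only needs transition probabilities for the pairs returned by covers, plus the probability of remaining in the current state.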
Extensions of Knowledge Space Theory have been suggested that enrich the
originally behaviouristic perspective of the approach, which confines consideration
to the problem solving behaviour, by embedding cognitive assumptions. A recent
account of the results obtained within this line of research can be found in Albert,
& Lukas (1999). Moreover, the approach has proved to be highly effective and
beneficial in adaptive computer assisted knowledge assessment and technology
enhanced learning. There are various systems that implement this approach, e.g.
the commercial ALEKS system (http://www.aleks.com), which is a fully
automated mathematics tutor, and the APeLS system (Conlan, Hockemeyer, Wade,
& Albert, 2002), which rests upon a competence-based extension of the theory
(Albert, Hockemeyer, Conlan, & Wade, 2001; Conlan, Hockemeyer, Lefrere,
Wade, & Albert, 2001).
A STOCHASTIC MODEL
In this section we suggest a formal framework capable of capturing the mutual
dependence between observable trails of LOs and cognitive trails on knowledge
domains by means of a stochastic model. Two sets form the basis of the stochastic
modelling in the subsequently developed theory. On the one hand, we consider the
set L of LOs that constitute the learning environment. On the other hand, we refer
to a knowledge structure K on a knowledge domain Q. The knowledge domain Q
consists of problems that test the skills and competencies taught by the LOs in L.
The two sets L and K thus are intimately related. This relationship is actually
mediated by assigning skills and competencies to each element in both sets. For the
present purpose it can be characterised by a prerequisite map g that associates to
each problem in Q a collection of subsets of LOs. Each of these subsets assigned to
a problem is interpreted as a minimal subset of LOs that provide the content sufficient
for solving it. More than one such subset may be assigned to a problem to reflect
the fact that there may be different ways to solve it, which are taught in different
subsets of LOs. To put it more formally, a prerequisite map on the knowledge
domain Q is defined as a map g that associates to each problem in Q a nonempty
collection of nonempty subsets of L. Assuming each collection of LOs to be
nonempty amounts to confining consideration to problems for which the content
taught by the LOs is relevant.
Consider the following simple example. The set of LOs for the learning
environment illustrated in Figure 2 is given by L = {E, O1, O2, O3, O4, O5}.
Figure 2. Example of a learning environment.
The symbol E denotes an entry point to the learning environment. In contrast to E,
the LOs O1 to O5 teach some content, and may be accessed in accordance with the
depicted link structure.
Suppose that the problems in the knowledge domain Q = {a, b, c, d} test the
contents taught by the LOs in L. In particular, let the prerequisite map g on Q be
given by
g(a) = {{O1, O2}},
g(b) = {{O1, O3}},
g(c) = {{O1, O2, O3, O4}, {O1, O2, O5}},
g(d) = {{O1, O2, O3, O5}}.
According to this assignment the problem a can be solved using the content
taught by the two LOs O1, O2. Problem c can be solved in two different ways. One
of the solutions requires the content of the LOs O1, O2, O3, O4, and the other one
the content of O1, O2, O5. Details on how a prerequisite map may be established
from taught and tested competencies associated to LOs and problems (e.g. in the
form of metadata) can be found in Heller et al. (2004).
The prerequisite map g induces a knowledge structure on Q. For deriving this
structure we consider all subsets of L, which are taken as representing the portion
of the content that has been learned after visiting a sequence of LOs. The subset
{O1, O3, O4}, for example, induces the subset {b} of Q, i.e. having learned the
content of O1, O3, O4 only allows problem b to be solved. The subset {b} thus is a
possible knowledge state of the knowledge structure
K = {∅, {a}, {b}, {a, b}, {a, c}, {a, b, c}, {a, b, d}, Q},
which actually is the knowledge structure illustrated in Figure 1. The diagram of
the knowledge structure shows the possible learning paths that a user of the above
displayed learning environment can take.
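How a subset of LOs induces a knowledge state can be illustrated with a short Python sketch. The dictionary g transcribes the prerequisite map of the example; the function name induced_state is an assumption made for this illustration.

```python
# Prerequisite map g of the example: each problem is mapped to the alternative
# sets of LOs whose content is sufficient for solving it.
g = {
    "a": [{"O1", "O2"}],
    "b": [{"O1", "O3"}],
    "c": [{"O1", "O2", "O3", "O4"}, {"O1", "O2", "O5"}],
    "d": [{"O1", "O2", "O3", "O5"}],
}

def induced_state(learned_los, g):
    """Problems that can be solved once the content of `learned_los` has been learned."""
    learned = set(learned_los)
    return {q for q, alternatives in g.items()
            if any(n <= learned for n in alternatives)}

print(induced_state({"O1", "O3", "O4"}, g))  # {'b'}
```

A problem belongs to the induced state as soon as at least one of its alternative prerequisite sets is fully contained in the learned LOs.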
Basics
We now outline a stochastic model capable of predicting the overt navigation
behaviour and the underlying learning process, as well as their mutual
dependencies. As only knowledge of both the current LO L and the current
knowledge state K leads to a proper characterisation of the process at any point in
time, we subsequently consider the set of all pairs (L, K), which is the Cartesian
product L × K. Within a Markov chain model we identify the set of potential states
the learner can be in (the Markov states, or M-states, for short) with exactly this
Cartesian product L × K. The Markov property that we assume implies that the
future of the process is completely determined by the current M-state, i.e. the pair
(L, K) characterising the learner's position in the learning process at this point in
time. In particular, it is not important how the process got to this state, or, in other
words, all information about the past is embodied in the current state.
We consider discrete time, which will be indicated by a subscript t = 0, 1, 2, … .
This index is incremented whenever the learner selects a link to another learning
object from those available at that time. For all points t in time let Lt and Kt denote
random variables that take their values in L and K, respectively. The Markov chain
then consists of a sequence of pairs
$$(L_0, K_0),\ (L_1, K_1),\ (L_2, K_2),\ (L_3, K_3),\ \ldots .$$
To identify this sequence for an individual learner in a specific
learning environment, information from different sources has to be integrated. The
trail of visited LOs can be obtained by log data analysis. Assessing the knowledge
states of K requires testing the learner on the problems in the knowledge domain Q.
We will provide details on the required knowledge assessment later on.
The Markov model is defined by specifying an initial probability distribution on
the set of M-states L × K – i.e. by specifying P(L0, K0) – and by giving the
conditional probabilities P(Lt, Kt | Lt-1, Kt-1) of a transition from state (Lt-1, Kt-1) at
time t-1 to state (Lt, Kt) at time t, for all t = 1, 2, … . Drawing upon the Markov
property allows for computing the probability of any trail (L0, K0), …, (Ln, Kn) by
$$P((L_0, K_0), \ldots, (L_n, K_n)) = P(L_n, K_n \mid L_{n-1}, K_{n-1}) \cdots P(L_1, K_1 \mid L_0, K_0) \cdot P(L_0, K_0).$$
Specifying an initial probability distribution on L × K will pose no problems in
most of the cases. The LO L0 is taken to be nothing else than the entry point to the
learning environment, the common starting point from which all learners depart
(denoted by E in Figure 2). Consequently, LO L1 represents the first LO providing
content that is actually inspected by the learner, which is in line with considering
K0 as the knowledge state before being exposed to the content. This knowledge
state is either the naive state ∅ for all learners (e.g. whenever the material is
completely new to them), or is assumed to be any other state in K, which may
differ over learners. In the first case, we have the initial condition P(L0, K0) = 1
whenever L0 = E, K0 = ∅ and P(L0, K0) = 0 otherwise. In the second case only the
probabilities P(L0 = E, K0) can be non-zero, and their actual values may be estimated from data of an
assessment that precedes access to the learning environment (pre-assessment).
Defining the conditional probabilities P(Lt, Kt | Lt-1, Kt-1) for all t = 1, 2, …
requires taking into account their interpretation in the present context. In fact, the
transition (Lt-1, Kt-1) → (Lt, Kt) may be interpreted as:
A person visiting LO Lt-1 and having knowledge state Kt-1 at time t-1 selects LO Lt
and, as a consequence of this, moves into knowledge state Kt at time t.
Notice that, whenever no learning occurs, the knowledge state Kt at time t may
equal the knowledge state Kt-1 at time t-1. In any case, the above interpretation
suggests that the transition from state (Lt-1, Kt-1) at time t-1 to state (Lt, Kt) at time t
can be decomposed into two sub-processes, or stages:
1. the selection of the next LO,
2. the learning process induced by the selected LO.
These two stages are reflected in the formula
$$P(L_t, K_t \mid L_{t-1}, K_{t-1}) = P(L_t \mid L_{t-1}, K_{t-1}) \cdot P(K_t \mid L_t, K_{t-1}),$$
which additionally incorporates the (straightforward and quite plausible)
assumption that only the current LO will affect the transition between knowledge
states.
This equation supposes that the effect of the history of the visited LOs up to Lt-1
is completely subsumed in the knowledge state Kt-1. The conditional probability
P(Lt | Lt-1, Kt-1) refers to the first of the above listed stages, and describes the impact
of the knowledge state Kt-1 on choosing the link from LO Lt-1 to Lt. The conditional
probability P(Kt | Lt, Kt-1) models the second stage, and captures the impact of the
LO Lt on the transition of knowledge states from Kt-1 to Kt (i.e. the learning
process). The presented formal framework thus describes the influence that
knowledge states impose on the transitions between LOs, as well as the effect that
the observable trails have on the knowledge states. Figure 3 provides a graphical
representation of the model showing the conditional dependencies, from which the
above equation can be inferred. It forms a so-called Bayesian network (Jensen,
2001).
Figure 3. Bayesian network representing the assumed conditional dependencies.
Considering the conditional probability P(Lt | Lt-1, Kt-1) instead of P(Lt | Lt-1)
allows for modelling dependencies between LOs that are mediated by knowledge
states. A learner who already knows the content of an LO, for example, may select
it with lower probability than a learner who lacks this prior
knowledge. In the same way P(Kt | Lt, Kt-1) explicitly refers to the LO that possibly
causes a transition between knowledge states. By this the model generalises the
conceptions where the trails of LOs and the knowledge trails, respectively, were
assumed to form Markov chains.
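The two-stage factorisation of the transition probabilities, together with the product rule for trails of M-states, can be sketched as follows. All numerical values and the names p_init, p_select and p_learn are hypothetical and serve only to show how the two stages combine; they are not estimates from any data.

```python
EMPTY, A = frozenset(), frozenset("a")

# Hypothetical parameters (illustrative values only).
p_init = {("E", EMPTY): 1.0}                        # P(L_0, K_0)
p_select = {                                        # P(L_t | L_{t-1}, K_{t-1})
    ("E", EMPTY): {"O1": 1.0},
    ("O1", EMPTY): {"O2": 0.7, "O3": 0.3},
}
p_learn = {                                         # P(K_t | L_t, K_{t-1})
    ("O1", EMPTY): {EMPTY: 1.0},
    ("O2", EMPTY): {A: 0.6, EMPTY: 0.4},
}

def transition_prob(prev, cur, p_select, p_learn):
    """P(L_t, K_t | L_{t-1}, K_{t-1}) as the product of the two stages."""
    (l_prev, k_prev), (l_cur, k_cur) = prev, cur
    select = p_select.get((l_prev, k_prev), {}).get(l_cur, 0.0)  # stage 1: choose next LO
    learn = p_learn.get((l_cur, k_prev), {}).get(k_cur, 0.0)     # stage 2: learning
    return select * learn

def trail_prob(trail, p_init, p_select, p_learn):
    """Probability of a trail of M-states (L_0, K_0), ..., (L_n, K_n)."""
    prob = p_init.get(trail[0], 0.0)
    for prev, cur in zip(trail, trail[1:]):
        prob *= transition_prob(prev, cur, p_select, p_learn)
    return prob

trail = [("E", EMPTY), ("O1", EMPTY), ("O2", A)]
print(round(trail_prob(trail, p_init, p_select, p_learn), 2))  # 0.42
```

Entries that are absent from the tables are treated as zero, which is one simple way to encode transitions that cannot occur.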
Parameter Constraints
As outlined above, the stochastic model contains a number of parameters (the
initial and conditional probabilities) that need to be specified in each application of
the model. Some of the values, however, are already determined due to
characteristics of the considered situation. Various aspects of the learning
environment put constraints on the conditional probabilities P(Lt | Lt-1, Kt-1), and
P(Kt | Lt, Kt-1), among them the topology induced by the link structure on the set of
LOs, and the relationship between the possible knowledge states. Due to the link
structure on L on the one hand, and the knowledge structure K on the other hand,
some of the conditional probabilities will have zero value (‘structural zeros’).
These parameters need not be estimated in the applications. We illustrate the
resulting reduction of the number of free parameters for the learning environment
depicted in Figure 2 with |L| = 6, and the knowledge structure K of Figure 1 with
|K| = 8. The probability P(Lt | Lt-1, Kt-1) can be non-zero only if there is a direct link
from LO Lt-1 to Lt. There are 13 direct links between the 6 LOs. This means that
instead of the |L|² ∙ |K| = 6² ∙ 8 = 288 potential parameters only 13 ∙ 8 = 104
conditional probabilities have to be estimated from the data. Similarly, for P(Kt | Lt,
Kt-1) to be non-zero we must have Kt-1 ⊆ Kt (see condition C3 below), and the
difference between the knowledge states Kt-1 and Kt has to be related to the content
taught in LO Lt (see condition C2 below). If we take into account set-inclusion
then, instead of |K|² ∙ |L| = 8² ∙ 6 = 384 potential parameters, only 31 ∙ 6 = 186
non-zero conditional probabilities remain (31 is the number of pairs K, K' ∈ K for
which K ⊆ K' holds).
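The parameter counts above can be checked with a few lines of Python. The knowledge structure is the one listed in the example; the number of links (13) is simply taken from the text, since the link structure of Figure 2 is not reproduced here.

```python
from itertools import product

# Knowledge structure of Figure 1 on Q = {a, b, c, d}.
K = [frozenset(s) for s in ("", "a", "b", "ab", "ac", "abc", "abd", "abcd")]
n_los = 6      # |L| for the environment of Figure 2
n_links = 13   # number of direct links, as stated in the text

# Ordered pairs (K, K') of states with K a subset of K' (equality included).
nested_pairs = sum(1 for k1, k2 in product(K, repeat=2) if k1 <= k2)
print(nested_pairs)  # 31

# Link selection: potential parameters vs. those left by the link topology.
print(n_los**2 * len(K), n_links * len(K))       # 288 104
# Learning: potential parameters vs. those left by set-inclusion.
print(len(K)**2 * n_los, nested_pairs * n_los)   # 384 186
```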
Even outside large-scale applications, the general model outlined above may still
contain a large number of parameters, even under the constraints discussed above.
In order to further reduce the number of free parameters, more specific sub-models
may be formulated. The additionally introduced assumptions, however, have to be
checked for theoretical soundness as well as for empirical validity. For instance,
the two conditional probabilities P(Lt | Lt-1, Kt-1) and P(Kt | Lt, Kt-1) may be assumed
to satisfy
$$P(L_t \mid L_{t-1}, K_{t-1}) = P(L_t \mid L_{t-1}) \cdot P(L_t \mid K_{t-1}),$$
and
$$P(K_t \mid L_t, K_{t-1}) = P(K_t \mid L_t) \cdot P(K_t \mid K_{t-1}),$$
respectively. Each of these equations can be plugged into the general stochastic
model, which decreases the number of parameters from 104 to 13 + 6 ∙ 8 = 61 and
from 186 to 31 + 8 ∙ 6 = 79, respectively. These additional assumptions relate the
stochastic model to the above discussed Markov chain models of both the
navigation behaviour – captured by the conditional probabilities P(Lt | Lt-1) – and
the learning process on the knowledge structure K – captured by the conditional
probabilities P(Kt | Kt-1). Apart from the theoretical soundness of the interpretation
of the parameters, the empirical adequacy of the additional assumptions has to be
checked by statistical tests. Standard methods, such as likelihood ratio tests (e.g.
Lindgren, 1993) or methods based on information criteria, like AIC or BIC
(Akaike, 1973; Schwarz, 1978) may be used to test them against the general model
outlined above.
APPLICATION OF THE MODEL
Observability, Parameter Estimation, and Empirical Validation
Whereas the LO Lt visited at time t can be observed directly, this is not true for the
knowledge state Kt. Its determination would require partial (or even full)
assessment of the learner’s knowledge state at each point t in time, which
obviously is not a viable option in practical applications. Besides consuming a lot
of time, this would continuously disrupt the learning process. So, we are left with only
partial observability of the M-states. In order to deal with this situation we consider
a scenario in which we have a pre-assessment (K0) before the LOs are inspected,
and a post-assessment (Kn) after finishing the interaction with the learning
environment (resulting in the sequence L0, …, Ln of visited LOs). Due to the partial
observability of the M-states the stochastic model may be conceived as a special
case of a hidden Markov model as illustrated in Figure 4. The squares represent
entities that are observable within the considered scenario, while non-observable
M-states are represented by circles. As the labelling of the downward arrows in the
diagram indicates, the relation between the M-states (Lt, Kt) and the respective LO
Lt is deterministic, i.e. we have P(Lt | Lt, Kt) = 1.
Figure 4. Bayesian network representation of the stochastic model as a hidden Markov
model.
Standard procedures (e.g. Viterbi, Baum-Welch or EM algorithms) are available
for identifying the most likely sequence K1, …, Kn−1 and for parameter estimation,
given the observation of L0, …, Ln and K0, Kn (cf. Rabiner, & Juang, 1986;
Rabiner, 1989). Notice that the above described parameter constraints induced by
the link topology and the knowledge structure need to be implemented properly in
these procedures (e.g. Niculescu, 2005; Niculescu, Mitchell, & Rao, 2005).
The empirical validation of the Markov chain model can be based on deriving a
prediction of the distribution over the knowledge states Kn in the post-assessment,
given the sequence L0, …, Ln of visited LOs and the pre-assessment K0. This
prediction may be contrasted with the distribution that is estimated directly from
the log data by the observed relative frequencies within a cross-validation design
(i.e. the two estimates of the marginal distribution are based on different sub-
samples). Methods based on the Kullback-Leibler divergence (sometimes also
called relative entropy) may be used to evaluate the resulting discrepancy.
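As an illustration, the Kullback-Leibler divergence between two distributions over knowledge states can be computed as below. The two distributions are invented for the example, and the direction of the comparison (observed relative to predicted) is an assumption, since the text does not fix it.

```python
from math import log

def kl_divergence(p, q):
    """D(p || q) = sum over k of p(k) * log(p(k) / q(k)), measured in nats."""
    total = 0.0
    for state, p_k in p.items():
        if p_k == 0.0:
            continue                 # terms with p(k) = 0 contribute nothing
        q_k = q.get(state, 0.0)
        if q_k == 0.0:
            return float("inf")      # p puts mass on a state that q excludes
        total += p_k * log(p_k / q_k)
    return total

# Hypothetical distributions over post-assessment states K_n (illustrative only).
observed = {"ab": 0.5, "abc": 0.3, "abcd": 0.2}
predicted = {"ab": 0.4, "abc": 0.4, "abcd": 0.2}

print(kl_divergence(observed, predicted) > 0)  # True
```

The divergence is zero exactly when the two distributions agree, and it grows as the model's prediction departs from the observed relative frequencies.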
Identifying Cognitive Trails
In the sequel we consider a situation in which all the content taught by the LOs is
completely new to the learner. In this case knowledge is acquired exclusively by
navigating the learning environment and we have K0 = ∅. Again, the sequence of
visited LOs is observed and Kn is assessed in a post-test. The fundamental question
is whether we can uniquely identify the sequence of M-states (Lt, Kt) with t = 1,
…, n. The major advantage in this situation is that parameter estimation techniques
based on log data analysis may be employed (cf. Borges, & Levene, 2000; Levene,
& Loizou, 1999). Given the partial information it is, however, not possible to
uniquely identify the trail that the learner took. For this we need to introduce
assumptions on the learning process. The following plausible assumptions narrow
down the set of possible sequences of knowledge states K1, …, Kn−1 by making
explicit their compatibility to the observations.
– Condition C1. Solving a problem q cannot be learned before visiting a set of
relevant LOs that are sufficient for its solution: If q ∈ Kt \ K0 then there is a
subset N ∈ g(q) such that N ⊆ {L0, …, Lt};
– Condition C2. Learning to solve a problem q can only occur when visiting a
relevant LO: If q ∈ Kt \ Kt-1 then there is a subset N ∈ g(q) such that Lt ∈ N;
– Condition C3. There is no forgetting, i.e. the trail of knowledge states K0, …, Kn
is non-decreasing (with respect to set-inclusion): K0 ⊆ … ⊆ Kn.
Condition C1 assumes that a correct response to a problem, which was not
solved in the pre-assessment, cannot occur before visiting a subset of LOs that
provide content sufficient for solving it. This reasonable assumption is related to
the scope of the given trail of LOs, i.e. to what in principle can be learned from it.
A problem q ∈ Q lies within this scope (which means that we may have q ∈ Kn \
K0) if and only if there exists N ∈ g(q) such that N ⊆ {L0, …, Ln}. A learner
following this trail of LOs, however, does not necessarily learn to solve all the
problems within its scope.
Condition C2 means that learning to solve a problem can only occur if the
currently visited LO is relevant for its solution. This means that the solution is
learned as soon as the last portion of the relevant and sufficient information is
considered, and excludes effects based on unrelated material mediating learning.
Condition C3 implies K0 ⊆ Kn, which, in principle (i.e. in case of K0 being non-
empty), may be contradicted by data. In empirical applications, however, we may
avoid this problem by simply identifying K0 with K0 ∩ Kn. Proceeding in this way
implicitly interprets the correct responses to the problems in K0 \ Kn as lucky
guesses.
A trail (L0, K0), …, (Ln, Kn) on L × K is called consistent whenever the trail of
knowledge states K0, …, Kn is compatible with the trail L0, …, Ln of LOs, i.e.
whenever the compatibility conditions C1-C3 are satisfied.
Notice that the compatibility conditions C1-C3 are independent, and, in general,
there will be more than one trail of knowledge states satisfying these conditions.
The non-uniqueness is illustrated by the trails specified in Table 1. Both pairs of
consistent trails T1, T2 and T3, T4 are based on identical trails of LOs as well as
coinciding pre- and post-assessment. In the trails T1 and T2 the solution to problem
c is learned at different points in time. Trails T3 and T4 differ with respect to
learning to solve problem a.
T1: (E, ∅), (O1, ∅), (O2, {a}), (O3, {a, b}), (O4, {a, b, c}), (O5, {a, b, c, d})
T2: (E, ∅), (O1, ∅), (O2, {a}), (O3, {a, b}), (O4, {a, b}), (O5, {a, b, c, d})
T3: (E, ∅), (O1, ∅), (O2, {a}), (O3, {a, b}), (O2, {a, b}), (O4, {a, b, c}), (O5, {a, b, c})
T4: (E, ∅), (O1, ∅), (O2, ∅), (O3, {b}), (O2, {a, b}), (O4, {a, b, c}), (O5, {a, b, c})
Table 1. Examples of consistent trails on the learning environment and knowledge structure
as illustrated in Figure 2 and Figure 1, respectively.
The examples demonstrate that, in general, the compatibility conditions C1-C3 will
not suffice to reconstruct a single trail of knowledge states from the given data.
There are, however, sufficiently strict assumptions that warrant uniqueness of the
inferred cognitive trail.
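Checking the compatibility conditions C1-C3 for a given trail is mechanical. The following Python sketch applies them to the four trails of Table 1, with the prerequisite map of the example; the function name is_consistent and the encoding of knowledge states as strings are choices made for this illustration.

```python
g = {
    "a": [{"O1", "O2"}],
    "b": [{"O1", "O3"}],
    "c": [{"O1", "O2", "O3", "O4"}, {"O1", "O2", "O5"}],
    "d": [{"O1", "O2", "O3", "O5"}],
}

def is_consistent(trail, g):
    """Check C1-C3 for a trail [(L_0, K_0), ..., (L_n, K_n)]."""
    los = [lo for lo, _ in trail]
    states = [set(k) for _, k in trail]
    k0 = states[0]
    for t in range(1, len(trail)):
        visited = set(los[: t + 1])
        # C3: no forgetting.
        if not states[t - 1] <= states[t]:
            return False
        # C1: a sufficient set of LOs must have been visited by time t.
        if any(not any(n <= visited for n in g[q]) for q in states[t] - k0):
            return False
        # C2: newly learned problems must be related to the current LO.
        if any(not any(los[t] in n for n in g[q]) for q in states[t] - states[t - 1]):
            return False
    return True

# Trails of Table 1; a knowledge state such as {a, b} is written "ab".
T1 = [("E", ""), ("O1", ""), ("O2", "a"), ("O3", "ab"), ("O4", "abc"), ("O5", "abcd")]
T2 = [("E", ""), ("O1", ""), ("O2", "a"), ("O3", "ab"), ("O4", "ab"), ("O5", "abcd")]
T3 = [("E", ""), ("O1", ""), ("O2", "a"), ("O3", "ab"), ("O2", "ab"), ("O4", "abc"), ("O5", "abc")]
T4 = [("E", ""), ("O1", ""), ("O2", ""), ("O3", "b"), ("O2", "ab"), ("O4", "abc"), ("O5", "abc")]

print([is_consistent(t, g) for t in (T1, T2, T3, T4)])  # [True, True, True, True]
```

A trail such as (E, ∅), (O1, {a}) is rejected, because problem a cannot be learned before both O1 and O2 have been visited.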
Strict Learning Assumption
Here we assume that learning can occur as soon as the relevant content has been
exposed. Suppose we are given the observable trail L0, …, Ln of LOs from L, a
prerequisite map g on Q, and K0, Kn ⊆ Q.
Consider a trail K0, …, Kn of knowledge states in the domain Q, which is
defined in the following way. For all q ∈ Kn \ K0 and t ∈ {1, …, n-1} we have
q ∈ Kt \ K0 if and only if there is a subset N ∈ g(q) such that N ⊆ {L0, …, Lt}.
This condition is called Strict Learning Assumption (SLA).
Notice that the Strict Learning Assumption differs from C1, because it is stated
in form of a logical equivalence, and not as an implication. This seemingly slight
difference has important implications. First of all, the Strict Learning Assumption
defines a uniquely determined trail. If we assume that there are two such trails K0, K1,
…, Kn-1, Kn and K0, K'1, …, K'n-1, Kn then for all t ∈ {1, …, n-1} we have q ∈ Kt \
K0 if and only if the above condition holds, which in turn is equivalent to q ∈ K't \
K0, so that the two trails coincide. Second, it can be shown that any trail K0, …, Kn
defined by SLA satisfies the compatibility conditions C1-C3, and thus constitutes a consistent trail.
From the trails listed in Table 1 only T1 and T3 are in accordance with SLA. In
T2, for example, the learner is not able to solve problem c after visiting O1, O2, O3
and O4, although these LOs provide all the required information according to the
prerequisite map g. The essential assumption is that learning occurs as soon as the
relevant material is available. This may be too optimistic. In particular, the Strict
Learning Assumption lacks plausibility if LOs are visited more than once.
Consider the trail T3, where the LO O2 is revisited, although its contents have
already been learned. In fact, revisiting an LO may be interpreted as an indication
of the fact that the material has not been learned during previous visits. This is
taken into account in the subsequently outlined Weak Learning Assumption.
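The trail of knowledge states defined by the Strict Learning Assumption can be constructed directly from the observed LOs and the two assessments. The sketch below assumes a trail with at least one transition (n ≥ 1); the function name sla_trail is our own.

```python
g = {
    "a": [{"O1", "O2"}],
    "b": [{"O1", "O3"}],
    "c": [{"O1", "O2", "O3", "O4"}, {"O1", "O2", "O5"}],
    "d": [{"O1", "O2", "O3", "O5"}],
}

def sla_trail(los, g, k0, kn):
    """Knowledge states K_0, ..., K_n under SLA, for observed LOs L_0, ..., L_n."""
    k0, kn = set(k0), set(kn)
    trail = [k0]
    for t in range(1, len(los) - 1):              # t = 1, ..., n-1
        visited = set(los[: t + 1])
        # q is learned as soon as some sufficient set of LOs has been visited.
        learned = {q for q in kn - k0 if any(n <= visited for n in g[q])}
        trail.append(k0 | learned)
    trail.append(kn)
    return trail

# LO sequences of the trails T1 and T3 in Table 1.
los_t1 = ["E", "O1", "O2", "O3", "O4", "O5"]
los_t3 = ["E", "O1", "O2", "O3", "O2", "O4", "O5"]

print(sla_trail(los_t1, g, "", "abcd"))
print(sla_trail(los_t3, g, "", "abc"))
```

For these two LO sequences the construction reproduces the knowledge states of T1 and T3, whereas the states of T2 and T4 differ from what SLA prescribes.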
Weak Learning Assumption
Under the same conditions as above consider a trail K0, …, Kn of knowledge states
in the domain Q, which is defined in the following way. For all q ∈ Kn \ K0 and t ∈
{1, …, n-1} we have
q ∈ Kt \ K0 if and only if there is a subset N ∈ g(q) such that (N ⊆ {L0, …, Lt} and
N ∩ {Lt+1, …, Ln} = ∅).
This condition is called Weak Learning Assumption (WLA).
As in the case of SLA, the Weak Learning Assumption defines a unique consistent
trail (i.e. a trail that satisfies the compatibility conditions C1-C3). Learning still
occurs as soon as the relevant content has been exposed, but it is deferred to the last
occurrence of multiply visited content. Notice that for trails in which none of the
LOs are visited more than once, the assumptions SLA and WLA coincide. This is
the case for the trails T1 and T2 in Table 1. Thus, for T1 both SLA and WLA hold,
while T2 satisfies neither SLA, nor WLA. The trails T3 and T4 of Table 1 contain
multiple visits to LO O2. We have already seen that SLA holds for T3. This means
it cannot satisfy WLA, since solving problem a is learned from O2 at the first visit.
In contrast to that, learning how to solve problem a does not occur before the
second visit to O2 in trail T4, which therefore satisfies WLA.
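The Weak Learning Assumption differs from SLA only in the additional requirement that none of the LOs in the sufficient set is revisited later. A minimal Python sketch, again assuming n ≥ 1 and using the hypothetical function name wla_trail:

```python
g = {
    "a": [{"O1", "O2"}],
    "b": [{"O1", "O3"}],
    "c": [{"O1", "O2", "O3", "O4"}, {"O1", "O2", "O5"}],
    "d": [{"O1", "O2", "O3", "O5"}],
}

def wla_trail(los, g, k0, kn):
    """Knowledge states K_0, ..., K_n under WLA, for observed LOs L_0, ..., L_n."""
    k0, kn = set(k0), set(kn)
    trail = [k0]
    for t in range(1, len(los) - 1):              # t = 1, ..., n-1
        visited, upcoming = set(los[: t + 1]), set(los[t + 1:])
        # q is learned once a sufficient set N has been visited
        # and none of the LOs in N will be visited again.
        learned = {q for q in kn - k0
                   if any(n <= visited and not (n & upcoming) for n in g[q])}
        trail.append(k0 | learned)
    trail.append(kn)
    return trail

# LO sequence shared by the trails T3 and T4 in Table 1.
los_t4 = ["E", "O1", "O2", "O3", "O2", "O4", "O5"]
print(wla_trail(los_t4, g, "", "abc"))
```

For this LO sequence the result matches the knowledge states of T4: problem a is not learned before the second visit to O2.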
In principle, we can formulate an even more relaxed assumption where learning
occurs at the latest point in time for which a consistent trail results. Stating this
hypothesis as a general rule, however, would mean adopting an overly pessimistic
point of view.
Effectiveness of Trails
The proposed model allows for judging the effectiveness of certain trails of LOs.
This information may provide guidelines for an optimisation of the learning
environment. It can form the basis for reshaping the link structure in the learning
environment by eliminating links that belong to ineffective trails. The information
could also be used to guide the adaptation of the hypertext using techniques such as
link hiding and adaptive ordering (Brusilovsky, Kobsa, & Vassileva, 1998).
A measure of the effectiveness of a trail of LOs has to evaluate the actual performance of the
learners relative to what can in principle be learned from the particular trail, which
has been called its scope. Recall that a problem q in Q lies in the scope s(L0, …,
Ln) of the trail L0, …, Ln if and only if there exists N in g(q) such that N ⊆ {L0, …,
Ln}. As learners following the trail of LOs do not necessarily learn to solve all the
problems within its scope, we may contrast it with the actual solution behaviour
represented by Kn \ K0.
A whole variety of numerical indices may be devised to capture the resulting
discrepancy. As a first index we propose to consider the probability P(s(L0, …, Ln) |
L0, …, Ln) of solving all problems within the scope given a certain trail, which
means Kn \ K0 = s(L0, …, Ln). A trail of LOs is clearly effective if this probability
is close to 1, and ineffective if it is close to 0. More differentiated information may
be gained from additionally conditioning on the initial knowledge state K0, which
can be employed to guide procedures for adapting the hypertext, like link hiding.
In general, given the trail L0, …, Ln we can consider a probability distribution on
the collection of subsets of Q that contain the problems the solution to which has
actually been learned. This collection is defined by the intersection K ∩ s (L0, …,
Ln) for all K ∈ K. The information in this distribution may be integrated into a
single index by forming the mean with respect to the number of problems the
solution of which has been learned, i.e. the mean of |K ∩ s (L0, …, Ln)| for all K ∈
K. To be comparable for different trails this index has to be normalised with
respect to the number of problems in the scope |s(L0, …, Ln)|. Other aspects of the
observed trail may also be taken into account, like, for example, its length n. Given
the same effectiveness as measured by the above introduced indices, a shorter trail
may be considered superior to a longer one.
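The two indices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary representation of g and of the probability distribution over knowledge states, and all numerical values are hypothetical, and the initial knowledge state is taken to be empty.

```python
def scope(lo_trail, g):
    """Problems q for which some N in g(q) is contained in the visited LOs."""
    visited = set(lo_trail)
    return {q for q, alternatives in g.items()
            if any(set(N) <= visited for N in alternatives)}


def full_scope_probability(lo_trail, g, state_probs):
    """First index: probability that every problem in the scope is learned."""
    s = scope(lo_trail, g)
    return sum(p for K, p in state_probs.items() if set(K) & s == s)


def normalised_mean_learned(lo_trail, g, state_probs):
    """Second index: mean of |K ∩ s| divided by |s|."""
    s = scope(lo_trail, g)
    if not s:
        raise ValueError("the trail has an empty scope")
    mean = sum(p * len(set(K) & s) for K, p in state_probs.items())
    return mean / len(s)


# Hypothetical example: three problems, of which only a and b are in scope.
example_g = {"a": [{"O1"}], "b": [{"O2"}], "c": [{"O3"}]}
example_trail = ["E", "O1", "O2"]
# Assumed distribution over final knowledge states given the trail.
example_probs = {frozenset(): 0.2, frozenset({"a"}): 0.3, frozenset({"a", "b"}): 0.5}

p_full = full_scope_probability(example_trail, example_g, example_probs)
index = normalised_mean_learned(example_trail, example_g, example_probs)
```

With these assumed numbers the first index is 0.5 and the normalised mean is (0.3 · 1 + 0.5 · 2) / 2 = 0.65.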
CASE STUDIES
In order to make use of the theoretical model presented in the previous sections to
predict user behaviour, suggest useful trails or adapt the structure of a learning
environment, the transition probabilities for the model must first be estimated.
Transition probabilities are the probabilities of a user moving from one (L,K) state
to another, and their values for a particular environment must be estimated from
usage data showing the behaviour of many users in the environment.
As the (L, K) states are not directly observable at each transition, they must be
inferred from whatever data are available about the L and K
states of users. In an on-line environment usage data can be obtained from log files
recording the history of access to the on-line resources. Data mining techniques can
be used to ‘clean’ the log file, and trails of L states for individual users (and
individual sessions) can be straightforwardly extracted as the sequences of
resources accessed by a user. The corresponding K states can be inferred from the
responses given to on-line assessments that test the current knowledge state of a
user. In most cases users will take an assessment test only at the beginning and end
of a session, or at the beginning and end of a course, perhaps with some
intermediate testing. The very least that is needed to be able to use the model is a
pre-assessment giving the user's knowledge state before interacting with the
environment and a post-assessment giving the knowledge state after learning has
taken place. In this case either the SLA or WLA can be used to derive a unique
trail of knowledge states.
In addition to the test scores, we must also know the mapping that associates
each problem q in Q with the subset of the LOs in the environment that
teach the skills necessary to solve q. This enables us to infer at which point during
the observed trail of L states a particular change in knowledge state occurred, thus
generating the (L, K) trails for users that give us the usage statistics necessary for
estimating the transition probabilities.
Ideally, in order to make full use of our theoretical model the log file for a
system should allow the identification of the following:
– The resources visited by individual users – ideally users should have to ‘log in’
at the start of each session and the log file records the user for each request.
Where this is not possible, individual users can be identified using cookies or IP
address, but these options are not as reliable as explicit log-in information.
– The date and time of each access, to allow the identification of sessions.
– The knowledge state of a user at the beginning and end of a session – the results
from a pre-assessment and post-assessment for each session are ideal, but in
systems where users must log in each time they use the system a pre-assessment
may only be necessary for the first session, with the results of the post-
assessment at the end of a session serving as the pre-assessment for the
following session (this methodology assumes no forgetting between sessions).
In addition, the following information about the environment must be known:
– The mapping of test questions to LOs – each question in the pre- and post-
assessments should ideally relate to the content taught by a single LO within the
system in order to infer the cognitive trail of K states. This means that if
question q tests content taught by LO L and the post-assessment shows that the
user has learned to answer q during the preceding session, then the content was
learned at a point where they were interacting with L, and not any other LO in
the environment. If the mapping from questions to LOs is one-to-many, we need
to introduce assumptions on the learning process (like SLA or WLA), or we can
infer only probabilities for where the concept tested by q was learned.
We have obtained log data from two different web-based learning environments.
In the following case studies we illustrate how our model could be applied to each
of these environments. The first is an English grammar course offered to students
at Eötvös Loránd University during October 2004. The second is a course
developed by the computer science department at Trinity College Dublin that
teaches some basics about database management systems and the SQL query
language.
English Grammar Course
The Course
The English grammar course uses the Coraler mapping tool to display links to free
on-line educational portals teaching various aspects of English grammar,
categorised according to topic and difficulty level. The map allows learners to
navigate the material and to take both a general test and small topic tests. The
nodes of the graph are coloured differently to show which topic nodes have been
mastered (or partially mastered) by the learner, based on their test results.
An environment such as this is ideal for the application of our model, as there is
regular testing of learners to assess their knowledge state, and the test questions are
related to the content of known LOs. The log files record all of the information
necessary to be able to apply the model – both resources visited and test results –
and every action is associated with the corresponding user's ID, so identification of
sessions is straightforward and reliable.
The course was run at Eötvös Loránd University during October 2004, but
unfortunately it was not used enough to provide the quantities of usage data
necessary to be able to estimate probabilities with any degree of confidence. We
will here use some of the data collected to show the main principles of how the
model can be applied, without giving a full analysis of the environment.
Analysing the Log File
The system had almost fifty registered users, but only seventeen of these accessed
the system. We extracted the trails for individual users, and of the seven of any
reasonable length only one has the structure necessary for analysis using our model
(i.e. beginning with a test, followed by some visits to LOs and then a
post-assessment). The other users' trails were disregarded for one of the following
reasons: one was the system administrator so the trail was meaningless in terms of
learning, one did no pre-assessment, one did no post-assessment and three did a pre-
assessment but then only clicked on one content page.
The trail taken by the remaining user over three sessions is shown in Figure 5.
The log data that this visualisation is generated from can be found in Heller et al.
(2004).
Figure 5. The trail of one user of the English grammar course, over three sessions.
The ‘Home’ page is the entry point to the system (like E in Figure 2), so is the
starting point for each of the session trails. All three sessions together can be
considered as a single trail beginning with the taking of the general test on the 1st
October and ending with the taking of mini-test #20 on the 5th of October. Even
this trail is not ideal for the application of our model because the user did not re-
take the general test after following their learning trail. However, the results of the
mini-tests (which consist of three questions on a single topic) could be used to
assess whether individual concepts within the knowledge space have been learned,
and if we had statistics from many users of the system (and thus could estimate the
transition probabilities) our model would allow us to predict from the user trail
which questions the user might be able to answer if they did take another general
test.
In order to see how the model can be applied we will consider one small part of
the English grammar course. Three figures show the situation: Figure 6 shows a
section of the knowledge space for the grammar course – knowledge of the simple
present and past tenses, Figure 7 shows the mapping from some of the questions in
the general test to the LOs that teach the concepts necessary to be able to answer
the questions, and Figure 8 shows the trail taken by one hypothetical learner
through the environment.
Figure 6. A small part of the knowledge space for the English grammar course
Figure 7. The mapping from questions in the general test to pages of learning content, and
the mapping from content pages to topics.
Figure 8. A learner trail through the English grammar course, and the accompanying
partially known Markov chain of (Lt, Kt) states.
As can be seen in Figure 8, the Markov chain of (Lt, Kt) states is partially known
– the results of the pre- and post-assessments give the knowledge state at the
beginning and end of the trail (∅ and {a} respectively), and the trail of learning
objects can be read directly from the log file. The knowledge state has changed at
some point along the trail from {} to {a}, but at which point the change occurred is
not directly observed. Our model allows us to infer the intermediate knowledge
states of the trail – the Kt. Referring back to the above introduced compatibility
conditions C1-C3:
– C1 tells us that the change in knowledge state from {} to {a} cannot have
occurred before the user visited one of the pages that teach a. It can be seen in
Figure 7 that these are pages P1, P8 and P12. This means that the earliest along
the trail that the change could occur is at the visit to P8, the second page along
the trail, i.e. K1 must be {}.
– C2 tells us that {a} can only enter the knowledge state while the learner is
visiting one of the pages that teach a, i.e. while visiting P1, P8 or P12.
– C3 tells us that once the knowledge state becomes {a} then it will remain like
that for the rest of the trail.
This still leaves us with three possible consistent trails:
T1: (L0,{}), (P4,{}), (P8,{a}), (P12,{a}), (P30,{a}), (P1,{a}), (Lt,{a})
T2: (L0,{}), (P4,{}), (P8,{}), (P12,{a}), (P30,{a}), (P1,{a}), (Lt,{a})
T3: (L0,{}), (P4,{}), (P8,{}), (P12,{}), (P30,{}), (P1,{a}), (Lt,{a}).
Under either the SLA or WLA this set of three possible trails can be reduced to
one. If the Strict Learning Assumption holds then the learner trail must be T1, as
the content a is learned the first time relevant material is viewed, which is when the
learner visits P8. As there are no re-visits to the same LO in the trail, the same is
true under the Weak Learning Assumption – it will be T1 that is the correct trail.
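The enumeration of the three consistent trails, and the selection of one of them, can be sketched as follows. This Python fragment is a simplified illustration for the single problem a only; it paraphrases the consequences of C1-C3 stated above rather than implementing the general conditions, and the function name and data representation are assumptions.

```python
def candidate_trails(lo_trail, teaching_pages, problem="a"):
    """Enumerate (L, K) trails in which `problem` is learned exactly once,
    at a visit to one of the pages that teach it, and is never lost again."""
    candidates = []
    for change_at, page in enumerate(lo_trail):
        if page not in teaching_pages:
            continue  # learning can only happen on a teaching page
        states = [frozenset() if t < change_at else frozenset({problem})
                  for t in range(len(lo_trail))]
        candidates.append(list(zip(lo_trail, states)))
    return candidates


# Observed trail of the hypothetical learner and the pages teaching a (Figure 7).
observed = ["L0", "P4", "P8", "P12", "P30", "P1", "Lt"]
candidates = candidate_trails(observed, {"P1", "P8", "P12"})

# Under the SLA the earliest possible change point is selected.
sla_trail = candidates[0]
```

The three candidates correspond to T1, T2 and T3 above, and the first of them, in which a is learned at P8, is the one selected.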
In order to estimate the parameters for the model we would need log data for a
collection of users. For our example part of the English grammar course, imagine a
small log file that allows us (using the WLA) to infer the trails shown in Table 2.
T1: (L0,{}), (P4,{}), (P8,{}), (P12,{}), (P30,{}), (P1,{a}), (Lt,{a})
T2: (L0,{}), (P4,{}), (P1,{}), (P12,{a}), (P7,{a}), (P4,{a}), (Lt,{a})
T3: (L0,{}), (P12,{}), (P4,{}), (P1,{a}), (P30,{a}), (P4,{a,b}), (Lt,{a,b})
Table 2. An example set of trails inferred from a log file for the English grammar course.
The estimation of the initial probability distribution P(L0, K0) is simple, as all
the trails begin at the entry point, which in this case is the set of links presented
after taking the general test, and all users so far have begun in the naive knowledge
state:
P(L0, K0) = 1 for L0 = E, K0 = ∅ and P(L0, K0) = 0 otherwise.
Further probabilities can be estimated as the relevant frequencies observable in
the logged trails. Estimating the probabilities of choosing a certain page
conditional on the previous page and previous knowledge state yields
P(Lt=P4 | Lt-1=L0, Kt-1={}) = 0.66,
P(Lt=P12 | Lt-1=L0, Kt-1={}) = 0.33,
P(Lt=P1 | Lt-1=P4, Kt-1={}) = 0.66,
P(Lt=P8 | Lt-1=P4, Kt-1={}) = 0.33,
and estimating the probabilities for the current knowledge state conditional on the
current page and the previous knowledge state provides
P(Kt={} | Lt=P4, Kt-1={}) = 1,
P(Kt={} | Lt=P1, Kt-1={}) = 0.33,
P(Kt={a} | Lt=P1, Kt-1={}) = 0.66,
P(Kt={a, b} | Lt=P4, Kt-1={a}) = 0.5,
P(Kt={a} | Lt=P30, Kt-1={a}) = 1.
Obviously, this example of three trails is too small for the estimates to be
reliable, but it shows the principle behind applying the model. The estimated
values can then be validated by predicting the results of post assessments based on
the trail followed.
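The frequency estimates above can be reproduced mechanically from the trails of Table 2. The following Python sketch is one possible way of doing so; the function name and the encoding of trails as lists of (page, knowledge state) pairs are assumptions made for illustration. Exact fractions are used, so 2/3 and 1/3 correspond to the values reported as 0.66 and 0.33.

```python
from collections import Counter
from fractions import Fraction

EMPTY, A, AB = frozenset(), frozenset({"a"}), frozenset({"a", "b"})

# The three (L, K) trails of Table 2.
TRAILS = [
    [("L0", EMPTY), ("P4", EMPTY), ("P8", EMPTY), ("P12", EMPTY),
     ("P30", EMPTY), ("P1", A), ("Lt", A)],
    [("L0", EMPTY), ("P4", EMPTY), ("P1", EMPTY), ("P12", A),
     ("P7", A), ("P4", A), ("Lt", A)],
    [("L0", EMPTY), ("P12", EMPTY), ("P4", EMPTY), ("P1", A),
     ("P30", A), ("P4", AB), ("Lt", AB)],
]


def estimate_transition_probabilities(trails):
    """Relative frequencies for P(Lt | Lt-1, Kt-1) and P(Kt | Lt, Kt-1)."""
    page_counts, page_totals = Counter(), Counter()
    state_counts, state_totals = Counter(), Counter()
    for trail in trails:
        # Walk over consecutive pairs of (L, K) states in the trail.
        for (prev_page, prev_k), (page, k) in zip(trail, trail[1:]):
            page_counts[(prev_page, prev_k, page)] += 1
            page_totals[(prev_page, prev_k)] += 1
            state_counts[(page, prev_k, k)] += 1
            state_totals[(page, prev_k)] += 1
    p_page = {key: Fraction(n, page_totals[key[:2]])
              for key, n in page_counts.items()}
    p_state = {key: Fraction(n, state_totals[key[:2]])
               for key, n in state_counts.items()}
    return p_page, p_state


p_page, p_state = estimate_transition_probabilities(TRAILS)
```

For instance, `p_page[("L0", EMPTY, "P4")]` is 2/3 and `p_state[("P4", A, AB)]` is 1/2, in agreement with the estimates listed above.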
SQL Course
The Course
This on-line SQL course is divided into five main sections – database concepts,
creating a database, populating a database, database retrieval and database
applications. The course is adaptive in that it only displays the sections that teach
material that the learner does not already know: A personalised home page for the
course (analogous to the entry point page E in Figure 2) is generated based on the
user's responses to an initial questionnaire consisting of five questions. Depending
on the answer to each question a link to one of the available sections will be
included in the personalised home page or not. Figure 9 shows the structure of the
course – the five sections that are shown or hidden depending on the questionnaire
responses are the five sub-trees below the home page, and the leaf pages represent
lessons consisting of a number of learning objects (‘pagelets’) that are accessed in
sequence.
Figure 9. Structure of the SQL course.
The adaptation is at a very high level, and the initial five questions are not really
precise enough to count as a pre-assessment and so do not give a fine-grained
picture of the user's knowledge state before beginning the course. No dependencies
between the sections are taken into account when building the course (i.e. the
prerequisite map is not considered in structuring the environment), so the
knowledge space corresponding to the course is effectively modelled as five
independent sections. This is probably too simplistic a model: If a user can embed
SQL in a C application (covered in the fifth section of the course), they would
reasonably be expected to know some basic database concepts (the subject of the
first section), however this constraint is not enforced within the system.
Assessment in the course is through a project that the learners must complete
that involves creating some database tables and running queries on them. This is
external to the on-line environment, and so there is no ‘post-assessment’ data stored
in the log file – without separate assessment results for each user there is no way to
assess the learners' knowledge states after interacting with the environment.
Analysing the Log File
The data is obviously far from ideal for the application of our model – there is no
pre- or post-assessment, which means we cannot analyse the trails of K states for
learners. Despite these drawbacks we can do some analysis of the trails taken by
users through the system, which can begin to show how some parts of the model
can be acquired from basic server log data. The log data processed consists of
12472 lines covering the period 18-30 Nov 2003. From this we have generated 532
user sessions, within which we can identify 339 unique patterns of behaviour
within a session (i.e. 339 different trails). The format of the log file and the scope
of the data recorded in it imposes some further limitations on our analysis:
– Cached pages – the analysis does not look into referral (i.e. which was the
previous page visited) to establish whether any pages have been cached in-
between actual hits to the web server, hence the extracted sessions may not give
the full picture of the trails of LOs actually seen by the users.
– Some sessions may be incomplete as there is the possibility that a session may
extend over the ‘timeout’ period and/or users may run concurrent sessions in
different windows. Some sessions begin at ‘middle’ pages (i.e. do not follow the
login → course index → section sequence), which may be evidence of this.
– It is difficult to identify individual users from the log file – the user is ‘tagged’
only at the login page, and subsequent pages are not tagged. This makes it
difficult to accurately assign log entries to the right users – in our analysis we
have assumed that all entries belonging to the same IP address are grouped
together to represent one distinct user.
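The grouping of log entries into user sessions described in the list above can be sketched as follows. This is not the procedure actually used for the analysis: the function name, the tuple format of the log entries, the example IP addresses and the 30-minute timeout are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def extract_sessions(entries, timeout=timedelta(minutes=30)):
    """Group (ip, timestamp, url) entries into sessions per IP address.

    A new session starts whenever the gap between two consecutive hits
    from the same IP address exceeds `timeout` (an assumed value).
    """
    by_ip = defaultdict(list)
    for ip, timestamp, url in entries:
        by_ip[ip].append((timestamp, url))
    sessions = []
    for ip, hits in by_ip.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions


# Hypothetical log entries (documentation-range IP addresses).
entries = [
    ("192.0.2.1", datetime(2003, 11, 18, 9, 0), "/sql/login.jsp"),
    ("192.0.2.1", datetime(2003, 11, 18, 9, 2), "/sql/test.jsp?learner=user1"),
    ("192.0.2.2", datetime(2003, 11, 18, 9, 5), "/sql/login.jsp"),
    ("192.0.2.1", datetime(2003, 11, 18, 11, 0), "/sql/login.jsp"),
]
sessions = extract_sessions(entries)
# Count how often each distinct trail occurs across sessions.
distinct_trails = Counter(tuple(urls) for _, urls in sessions)
```

Here the four entries yield three sessions and two distinct trails, since two of the sessions consist of a single hit on the login page.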
Number of links selected    Percentage of sessions
0                           4.7%
1                           28.9%
2                           7.7%
3                           1.5%
4                           1.7%
5                           1.9%
6                           1.9%
7                           1.3%
10                          1.3%
11-15                       8.8%
16-20                       4.7%
21-40                       15.7%
41-60                       9.0%
61-80                       4.0%
81-100                      3.7%
>100                        3.2%
Table 3. The number of links selected per session for the SQL course log data
In order to more easily see general patterns of behaviour (rather than exact
trails), pages (and therefore the visits to them) can be categorised as falling into
one of four categories or ‘levels’ within the learning environment – the login
level, the rebuild/personalise level, the section level and the pagelet level. Looking
at Figure 9, the Home Page (the course index) can be considered to be the login
level, the first row of five pages reached directly from the Home Page are the
section level, the pages below the first row in the diagram represent clusters of
pagelets, and the rebuild/personalise level is not shown as it is in some sense
‘outside’ of the course structure (if it had been shown it could be placed ‘parallel’
to the Home Page). Table 4 shows a sample of the URLs that appear in the log file,
and their respective categorisations into levels.
URL in log file                                                                                    Level
/sql/login.jsp                                                                                     Login
/sql/test.jsp?learner=user1                                                                        Rebuild/Personalise
/sql/test.jsp?Q1=a&Q2=a&Q3=a&Q4=a&Q5=a&learner=user1&build=true                                    Rebuild/Personalise
/sql/section.jsp?course=SQL%20Course&section=Database%20Concepts                                   Section
/sql/page.jsp?course=SQL%20Course&section=Database%20Concepts&subsection=Introduction&pagelet=1    Pagelet
/sql/page.jsp?course=SQL%20Course&section=Database%20Concepts&subsection=Introduction&pagelet=2    Pagelet
/sql/page.jsp?course=SQL%20Course&section=Database%20Concepts&subsection=Introduction&pagelet=3    Pagelet
/sql/page.jsp?course=SQL%20Course&section=Database%20Concepts&subsection=Introduction&pagelet=4    Pagelet
Table 4. Categorisation of page hits into the four levels.
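A categorisation of this kind can be automated by looking only at the path component of each logged URL. The Python sketch below derives its mapping from the sample rows of Table 4; the function names and the ‘Unknown’ fallback for paths not listed in the table are assumptions.

```python
from urllib.parse import urlsplit

# Path-to-level mapping read off the sample rows of Table 4.
LEVEL_BY_PATH = {
    "/sql/login.jsp": "Login",
    "/sql/test.jsp": "Rebuild/Personalise",
    "/sql/section.jsp": "Section",
    "/sql/page.jsp": "Pagelet",
}


def page_level(url):
    """Return the level of a logged URL, ignoring its query string."""
    return LEVEL_BY_PATH.get(urlsplit(url).path, "Unknown")


def level_trail(urls):
    """Translate a trail of URLs into the corresponding trail of levels."""
    return [page_level(url) for url in urls]


example_levels = level_trail([
    "/sql/login.jsp",
    "/sql/test.jsp?learner=user1",
    "/sql/section.jsp?course=SQL%20Course&section=Database%20Concepts",
    "/sql/page.jsp?course=SQL%20Course&section=Database%20Concepts"
    "&subsection=Introduction&pagelet=1",
])
```

Trails expressed at the level of categories rather than individual pages are much easier to compare, which is what makes the general patterns discussed next visible.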
The user behaviour varies tremendously. The most frequent behaviour is a
single click on the login page, as shown in Figure 10. Another common sequence
of activity is shown in Figure 11 – the user logs in, rebuilds/personalises the
content, visits the course index, selects a section from the course table of contents,
and then selects a number (n) of pagelets from the subsection. This group of users
only visits one section out of the five possible sections during a session.
Figure 10. The most common trail for the SQL course – a single hit on the login page.
Figure 11. Another common trail for the SQL course – visiting pagelets in only one section.
We have seen how individual users' trails can be extracted from the log file and
how these can be used to identify general patterns of behaviour. However, with 339
different trails and 532 instances there is not enough data here to make reliable
estimates of transition probabilities from the observed frequencies. As there is no
pre- or post-assessment we do not have the data necessary to say anything about
the knowledge states of actual users. We will now hypothetically consider how the
knowledge structure for the ‘Populating a Database’ section of the course could be
(partially) derived from analysis using our model, were this data available. The
first thing to notice is that the knowledge structure of the environment will impose
some restrictions on the trails that can possibly occur – so by observing patterns in
the user log data it is possible to hypothesise about what the knowledge structure
might be. As more log data is collected the number of possible knowledge
structures consistent with the observations will reduce.
For example, Figure 12 shows the ‘Populating a Database’ part of the course
and one possible corresponding knowledge structure. Assuming that users begin
this section of the course in the null knowledge state ∅ we can deduce many
propositions about the trails that could possibly be observed if this is the correct
knowledge structure, such as:
– If a trail does not contain L1 then the final knowledge state should always be ∅,
as learners must learn {a} from L1 before any further progress can be made;
– If the final knowledge state contains {c} then the trail must contain both L2 and
L3, with L3 being visited after L2, as learners must learn {b} from L2 before
being able to learn {c} from L3;
– The final knowledge state for a learner following the trail L1, L4, L3, L2 must
be one of ∅, {a}, {a, b}, {a, d}, {a, b, d};
– and so on …
If, when analysing log data files, any of these propositions was found to be
contradicted then this would be evidence that the knowledge structure is not the
one hypothesised, and a new hypothesis consistent with the observations would
need to be made.
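Propositions of this kind can be generated automatically from a hypothesised knowledge structure. The Python sketch below enumerates the final knowledge states that are possible for a given trail. The mapping of LOs to concepts and the prerequisite sets are assumptions chosen to be consistent with the propositions listed above (a is needed for everything else, b is needed for c); they are not read off Figure 12 directly.

```python
# Assumed encoding of the hypothesised structure.
TEACHES = {"L1": "a", "L2": "b", "L3": "c", "L4": "d"}
PREREQ = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": {"a"}}


def possible_final_states(lo_trail, start=frozenset()):
    """All knowledge states a learner may be in after following the trail.

    At each visit the learner may or may not learn the concept taught,
    and can only learn it if its prerequisites are already known.
    """
    states = {frozenset(start)}
    for lo in lo_trail:
        concept = TEACHES.get(lo)
        if concept is None:
            continue  # this LO teaches nothing in this section
        # Keep every earlier state (nothing learned) and add the new ones.
        states |= {k | {concept} for k in states if PREREQ[concept] <= k}
    return states


final_states = possible_final_states(["L1", "L4", "L3", "L2"])
```

For the trail L1, L4, L3, L2 this yields exactly the five states ∅, {a}, {a, b}, {a, d} and {a, b, d}; an observed final state containing c for this trail would contradict the hypothesised structure.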
Figure 12. (a) The ‘Populating a Database’ section of the SQL course, showing which
parts teach which concept; (b) The possible knowledge structure for this section of the
course, which shows that the ‘Insert Statement’ material must be learned before the ‘Update
Statement’ can be learned.
Once the probable knowledge structure has been deduced it can be used to
inform future re-structuring of the learning material for a static environment, or to
add useful adaptive functionality to an adaptive environment. In the case shown in
Figure 12 the environment could be re-structured so that L3 could only be accessed
after L2, or a more adaptive system could be engineered so that the links to L3 are
hidden either until L2 has been visited, or until an assessment shows that the
learner's knowledge state contains b. If the system was designed to automatically
implement our model by recording user behaviour, these changes could be made
automatically once the system was sure about the knowledge structure.
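The link-hiding rule just described can be sketched in Python as follows. The encoding of which LO teaches which concept and of the prerequisites is an assumption consistent with the example (L3 requires that b is known), and the function name is hypothetical.

```python
# Assumed encoding of the section: concept taught by each LO and its prerequisites.
TEACHES = {"L1": "a", "L2": "b", "L3": "c", "L4": "d"}
PREREQ = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": {"a"}}


def visible_links(knowledge_state, links):
    """Keep only links to LOs whose prerequisites the learner already knows."""
    known = set(knowledge_state)
    return [lo for lo in links if PREREQ[TEACHES[lo]] <= known]


all_links = ["L1", "L2", "L3", "L4"]
links_for_a = visible_links({"a"}, all_links)        # L3 stays hidden
links_for_ab = visible_links({"a", "b"}, all_links)  # L3 becomes visible
```

A learner whose assessed knowledge state is {a} is offered L1, L2 and L4, and the link to L3 appears once b has been learned.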
DISCUSSION
The case studies begin to show how our model can be applied in different
situations. The case study of the English grammar course illustrates the first step in
any application of the model, namely getting an estimate of the parameters (i.e. the (L, K)
transition probabilities) from usage data and the results of learner assessments.
This can be done for any environment where we have log data recording the usage
of the system coupled with pre- and post-assessments of the user's knowledge
states.
We then saw in the SQL course case study how frequent navigational patterns
(common trails) can be discovered from server log files. The common
navigational patterns that emerge, together with the stochastic model of (L,
K) states, can be used for several purposes. Firstly, the model can reveal whether
the most common trails followed (as observed in the log file) are also the most
effective for learning. If this is the case then any re-design of the linkage structure
of the course (either manual or adaptive) should support these common types of
usage. If the most common trails are not the most effective then it may be that the
structure of the environment encourages or even forces suboptimal trails to be
followed, and in this case the course can be re-designed to prevent users following
the less efficient paths. The SQL course showed a huge range of different user
behaviours, and while it may be a good thing to allow learners to explore however
they like, it may be worth adjusting the structure to prevent some of the paths
where learning does not happen to make browsing more efficient for learning.
The SQL course case study also showed how the observed trails can be used to
(partially) derive the knowledge structure for a course. This information can also
be used to inform the future structuring of the course, again so that the material
along allowed paths through the environment corresponds to possible paths of
progression through the knowledge structure.
The model can further be used to predict user behaviour within an environment.
Given the model (with estimated parameters), an estimate of the current knowledge
state of the user and the user's current location the most probable next step in the
user trail can easily be calculated. Such predictions of the item most likely to be
requested next could be used to optimise pre-fetching and caching of resources.
Perhaps more interestingly, the model can also be used to suggest the best next step
– a LO that the learner is ready to learn the content of – to learners as they
navigate. This would provide a personalised environment, as the suggestions would
be based on an assessment of the individual user's current knowledge state, current
location, and possibly their historical trail so far.
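Predicting the most probable next step amounts to a lookup in the estimated transition probabilities. The Python sketch below uses the page-transition estimates from the English grammar example; the function name and the dictionary layout, keyed by (previous page, previous knowledge state, next page), are assumptions.

```python
EMPTY = frozenset()

# Estimates of P(Lt | Lt-1, Kt-1) taken from the English grammar example.
P_PAGE = {
    ("L0", EMPTY, "P4"): 0.66,
    ("L0", EMPTY, "P12"): 0.33,
    ("P4", EMPTY, "P1"): 0.66,
    ("P4", EMPTY, "P8"): 0.33,
}


def most_probable_next(p_page, current_lo, knowledge_state):
    """Return the LO with the highest estimated transition probability,
    or None if no transition from this (L, K) state has been observed."""
    candidates = {
        nxt: p
        for (prev, k, nxt), p in p_page.items()
        if prev == current_lo and k == knowledge_state
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


predicted = most_probable_next(P_PAGE, "P4", EMPTY)
```

For a learner at P4 in the empty knowledge state the predicted next page is P1, which is the resource a pre-fetching mechanism would load first.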
One method of automatically adapting a course is through the use of Adaptive
Hypertext (AH) techniques such as link hiding and adaptive ordering (Brusilovsky,
Kobsa, & Vassileva, 1998). Awareness of the learner's current knowledge state
means that an AH system could use link hiding to hide links to LOs covering
material that the learner is not yet ready to tackle (i.e. when there is no route in the
knowledge structure from the learner's current knowledge state to a knowledge
state containing the concepts taught at the ‘hidden’ LOs). Adaptive ordering can be
used to order material into the most effective trails, according to the model built
from initial (or perhaps ongoing, on-line) log file analysis. An adaptive system
such as this would mean that the course designer could set the course up to start by
giving a very open choice of LOs to learners, and the system would adapt over
time to give the most effective structuring of the material.
The suggested model is not only applicable to learning environments. In the
more general web environment current user models do not attempt to take any
account of users’ knowledge states. Usage mining models could be improved to do
so, possibly giving an insight into modelling, for example, user behaviour in e-
commerce environments.
In this situation the definition of cognitive states is much looser – the relevant
set of cognitive categories may even be as simple as: {Browsing, Buying}. As we
did with the SQL course, it may also be useful to characterise pages into a small set
of page types, perhaps: {Catalogue page, Product page, Search results, Shopping
basket, Checkout}. Armed with these sets of categories the log file can be analysed
by extracting the individual user sessions (using standard web data mining
techniques), and then categorising the cognitive state of the user based on the
outcome and characteristics of the observed trail. If the trail ends with a sale, then
we can say that the user was definitely ‘buying’, and mark the trail as a ‘buying’
trail. For other trails it is not so clear-cut, but there will be useful ‘fuzzy’ rules for
allocating probabilities to the cognitive state, for example if the user only looked at
product pages then it is quite probable that they were only ‘browsing’, but if they
got as far as the checkout then there is a higher probability that they may have been
‘buying’, but something stopped them from doing so.
Once all trails have been classified according to the user’s cognitive state the set
of ‘buying’ trails can be analysed to find any common features, and likewise for
‘browsing’ trails. This will enable an adaptive e-commerce system to recognise as
early as possible if a visitor to the site looks to be following a ‘buying’ or
‘browsing’ trail. In the former case the system can then adapt to try to make sure
the visitor has every opportunity to buy when they are ready. In the latter case the
environment could adapt to try to shift the visitor onto a ‘buying’ path.
CONCLUSIONS
The present chapter suggests a Markov chain model that interlinks observable trails
of learning objects (LOs) with associated latent trails in an underlying cognitive
space. It elaborates on the theory and develops methods for its application, which is
exemplified in two case studies. The main contribution of this work is the
combination of Knowledge Space Theory with web data mining techniques to
produce a new model of the relationship between the cognitive processes of users
of on-line environments and their observable navigation behaviour. As we have
begun to see in the case studies, the model opens up new possibilities for including
analysis of cognitive information in web data mining, with applications for both
learning environments and other more general web-based environments.
We believe the model is a good start, and a definite improvement on the current
state of both data mining (which does not consider cognitive states), and learner
assessment (which does not usually consider learner navigation). However, there is
still potential to further improve the predictive power of the model, e.g. by taking into
account the effect of the time spent on a particular LO (visiting time). Within an
extended framework the dependence of the probability of a transition between
knowledge states on the current LO’s visiting time may be modelled by a learning
curve, the possible forms of which are well understood in psychology.
One issue that arose in conducting the case studies is that it is difficult to obtain
sufficient good-quality log data, as current e-learning practice appears to be
deficient in collecting (and storing) thorough logs. This would seem to be an issue
for the community in general – there is widespread use of on-line learning
environments, and they are often criticised for being quite ineffective at enabling
learning. Log files coupled with assessment records should be a major tool in
assessing and improving such environments, but many of the systems appear not to
be keeping sensible logs of all the relevant information, making useful log file
analysis impossible, or at best patchy.
ACKNOWLEDGEMENTS
We wish to thank Owen Conlan and Vincent Wade at Trinity College Dublin for
kindly providing us with access to the log files from their SQL course. Thanks also
to Marta Turcsányi-Szabó and Péter Kaszás at Eötvös Loránd University for re-
running their English grammar course in October and providing us with the log
data.
REFERENCES
Akaike, H. (1973). Information theory and an extension of the maximum
likelihood principle. In B.N. Petrov, & F. Csaki (Eds.), Second International
Symposium on Information Theory. Budapest: Akadémiai Kiadó.
Albert, D., & Lukas, J. (1999). Knowledge Spaces: Theories, Empirical Research,
and Applications. Mahwah: Lawrence Erlbaum.
Albert, D., Hockemeyer, C., Conlan, O., & Wade, V. (2001). Reusing adaptive
learning resources. In C.-H. Lee et al. (Eds.), Proceedings of the International
Conference on Computers in Education ICCE/SchoolNet2001 (vol. 1, pp. 205–
210).
Borges, J., & Levene, M. (2000). Data mining of user navigation patterns. In B.
Masand, & M. Spiliopoulou (Eds.), Web Usage Analysis and User Profiling,
Lecture Notes in Artificial Intelligence (vol. 1836, pp. 92–111). Berlin: Springer.
Brusilovsky, P., Kobsa, A., & Vassileva, J. (Eds., 1998). Adaptive Hypertext and
Hypermedia. Dordrecht: Kluwer.
Conlan, O., Hockemeyer, C., Lefrere, P., Wade, V., & Albert, D. (2001).
Extending educational metadata schemas to describe adaptive learning resources.
In H. Davies, Y. Douglas, & D.G. Durand (Eds.), Hypertext '01: Proceedings of
the twelfth ACM Conference on Hypertext and Hypermedia (pp. 161–162). New
York: Association for Computing Machinery.
Conlan, O., Hockemeyer, C., Lefrere, P., Wade, V., & Albert, D. (2002). Metadata
driven approaches to facilitate adaptivity in personalized eLearning systems. The
Journal of Information and Systems in Education, 1, 38–44.
Doignon, J., & Falmagne, J. (1985). Spaces for the assessment of knowledge.
International Journal of Man-Machine Studies, 23, 175–196.
Doignon, J., & Falmagne, J. (1999). Knowledge Spaces. Berlin: Springer.
Heller, J., Keenoy, K., Levene, M., Hassan, M.M., Hockemeyer, C., & Albert, D.
(2004). Cognitive and Pedagogical Aspects of Trails: A Case Study. Available:
http://www.dcs.bbk.ac.uk/trails/docs/D22-01-02-F.pdf
Jensen, F.V. (2001). Bayesian Networks and Decision Graphs. Statistics for
Engineering & Information Science. New York: Springer.
Kemeny, J.G., & Snell, J.L. (1960). Finite Markov Chains. Princeton: van
Nostrand.
Levene, M., & Loizou, G. (1999). A probabilistic approach to navigation in
hypertext. Information Sciences, 114, 165–186.
Lindgren, B.W. (1993). Statistical Theory (4th ed.). London: Chapman & Hall.
Niculescu, R.S. (2005). Exploiting parameter domain knowledge for learning in
Bayesian networks. Technical report CMU-TR-05-147, Carnegie Mellon
University.
Niculescu, R.S., Mitchell, T.M., & Rao, R.B. (2005). Parameter related domain
knowledge for learning in graphical models. Proceedings of SIAM Data Mining
conference 2005 (pp. 310–321).
Rabiner, L.R. (1989). A tutorial on hidden Markov models and selected
applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Rabiner, L.R., & Juang, B.H. (1986). An introduction to hidden Markov models.
IEEE ASSP Magazine, 3(1), 4–16.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of
Statistics, 6, 461–464.
AFFILIATIONS
Jürgen Heller
Department of Psychology
University of Graz
Mark Levene
School of Computer Science and Information Systems
Birkbeck, University of London
Kevin Keenoy
School of Computer Science and Information Systems
Birkbeck, University of London
Dietrich Albert
Department of Psychology
University of Graz
Cord Hockemeyer
Department of Psychology
University of Graz