The Pennsylvania State University
The Graduate School
College of Education
THE EFFECTS OF CHANGING THE NUMBER OF TERMS USED TO CREATE
PROXIMITY FILES ON THE PREDICTIVE ABILITY OF SCORING ESSAY-DERIVED
NETWORK GRAPHS VIA THE ALA-READER APPROACH
A Dissertation in
Learning, Design, and Technology
by
Daniel F. Fanella
© 2015 Daniel F. Fanella
Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
August 2015
The dissertation of Daniel F. Fanella was reviewed and approved* by the following:
Roy B. Clariana
Professor of Education (Learning, Design, and Technology)
Dissertation Advisor
Chair of Committee
Susan M. Land
Associate Professor of Education (Learning, Design, and Technology)
Major Field Member
Priya Sharma
Associate Professor of Education (Learning, Design, and Technology)
Major Field Member
Ravinder Koul
Associate Professor of Education (Curriculum and Instruction)
Outside Field Member
*Signatures are on file in the Graduate School
ABSTRACT
Knowledge structure is the interrelationship among the concepts within a given domain that exists in the memory associations of an individual and that can be captured in external artifacts. Assessing knowledge structure involves eliciting knowledge, representing it structurally, and then comparing the structural representations. In the pursuit of an automated method to capture and assess knowledge structure, the Analysis of Lexical Aggregates tool (ALA-Reader) has shown promise. Research has supported the ALA-Reader as a computer-based approach for automatically scoring network graphs, for measuring individual and group knowledge from essays, and as an approach for eliciting knowledge structure. The ALA-Reader translates written text summaries into a proximity file, aggregating data at the sentence level, to score essays in comparison to human raters. To create proximity files, the ALA-Reader needs a list of terms. Past research on the ALA approach for scoring network graphs and essays has focused primarily on how knowledge is elicited as proximity files (for example, the within-sentence approach versus the linear aggregate approach), but the number of terms used to create the proximity files was not a variable. The purpose of this non-experimental exploratory investigation is to answer the question: does an optimal number of terms exist for creating proximity files, or are more terms better? To answer this question, the investigation holds the elicitation approach for creating the proximity files constant and varies the number of terms used to create the proximity files, in order to determine what effect this has on the convergent validity of essay scores relative to human rater scores. The study found that, when the ALA-Reader was used to create proximity arrays, 20-term lists consistently produced the highest correlations of the word-list sizes examined, and this result held across all five referent expert essays. These results suggest that more terms are not necessarily needed to create valid proximity arrays, which may increase the practicality of the ALA-Reader as a valid tool for automatically scoring essays. This study also extends past research by analyzing essays that are considerably longer than those examined previously, and by analyzing argumentative essays. Future research is needed to refine these results by applying the current study's methodology to other types of essays (e.g., persuasive, expository, or narrative) and to shorter restricted-response essays, blogs, or online forums.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapter 1. INTRODUCTION
    The Analysis of Lexical Aggregates
    Statement of the Problem, Purpose, and Research Questions
    Definitions
Chapter 2. REVIEW OF THE LITERATURE
    Essay Assessment
    Methods of Capturing and Assessing Structural Knowledge
        Measuring Knowledge Structure Using Concept Maps
        Knowledge Elicitation Approaches
        Other Approaches for Eliciting Knowledge Structure
    Assessing Knowledge Structure with the Pathfinder Approach
        Comparing Knowledge Structures with Experts
    Analysis of Lexical Aggregates Approach
        Validating the ALA Approach
        Creating a List of Terms
Chapter 3. METHODOLOGY
    Participants
    Course Materials
    Research Purpose and Context
    Criterion Measures
    Essay Scoring Procedures
        Developing the Course PONY Document
        Essay Assessment
        The Sync Meeting and After Action Report
    Essay Requirement
    Procedures
        Referent Essays Defined
        Essay Data Collection and Preparation
        Generating a List of Terms
        Finalizing the Importance of Terms
        Creating the Proximity Files and PFnets
Chapter 4. RESULTS
    Term List
    Proximity File Analysis of All Essays
        Benchmark Referent Essay One: Containment Approach
        Benchmark Referent Essay Two: Deterrence Approach
        Benchmark Referent Essay Three: Engagement Approach
        Benchmark Referent Essay Four: Student Expert
        Benchmark Referent Essay Five: PONY Document
    Proximity File Analysis of Essays with All Scores Excluding 3-Scores
        Benchmark Referent Essay One: Containment Approach
        Benchmark Referent Essay Two: Deterrence Approach
        Benchmark Referent Essay Three: Engagement Approach
        Benchmark Referent Essay Four: Student Expert
        Benchmark Referent Essay Five: PONY Document
    Comparison of the Proximity File Analysis of Essays with All Scores and the Proximity File Analysis of Essays without the 3-Scores
Chapter 5. GENERAL DISCUSSION
    Summary of Results
    Research Implications
    Limitations of the Study
    Future Research
References
Appendix A
Appendix B
Appendix C
Appendix D
LIST OF FIGURES
Figure 1.1. Diagram of Various Elicitation, Representation, and Comparison Approaches
Figure 2.1. Simple Network Graph Produced by the Pathfinder Algorithm
Figure 2.2. Mean Predictive Ability Based on Number of Terms
Figure 4.1. Example 20-Term Link Array
Figure 4.2. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Containment Approach Referent by the Number of Terms Used to Create Proximity Link Arrays
Figure 4.3. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Deterrence Approach Referent by the Number of Terms Used to Create Proximity Link Arrays
Figure 4.4. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Engagement Approach Referent by the Number of Terms Used to Create Proximity Link Arrays
Figure 4.5. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Student Expert Essay by the Number of Terms Used to Create Proximity Link Arrays
Figure 4.6. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the PONY Document by the Number of Terms Used to Create Proximity Link Arrays
Figure 4.7. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Containment Approach Referent by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores
Figure 4.8. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Deterrence Approach Referent by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores
Figure 4.9. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Engagement Approach Referent by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores
Figure 4.10. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with a Student Expert Essay by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores
Figure 4.11. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the PONY Document by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores
Figure 5.1. Comparison of the Five Benchmark Referent Essays
Figure 5.2. Comparison of Past Studies with the Current Study to Find Optimal Range
Figure 5.3. Comparison of the Anticipated Optimal Range with the Optimal Based on the Results
LIST OF TABLES
Table 3.1. DEP Courseware Content
Table 3.2. Features of OASIS
Table 3.3. Grading Scaling and Number of Essays
Table 3.4. Textalyser Word List Based on Frequency of Terms
Table 3.5. Correlation Comparison of Terms by Benchmark Referent
Table 4.1. Summary of the Correlations Between Student Score and Links in Common with the Containment Approach Referent Based on Number of Terms (N = 215)
Table 4.2. Summary of the Correlations Between Student Score and Links in Common with the Deterrence Approach Referent Based on Number of Terms (N = 215)
Table 4.3. Summary of the Correlations Between Student Score and Links in Common with the Engagement Approach Referent Based on Number of Terms (N = 215)
Table 4.4. Summary of the Correlations Between Student Score and Links in Common with the Student Expert Based on Number of Terms (N = 215)
Table 4.5. Summary of the Correlations Between Student Score and Links in Common with the PONY Document Based on Number of Terms (N = 215)
Table 4.6. Summary of the Correlations Between Student Score and Links in Common with the Containment Approach Referent Based on Number of Terms (N = 70) without 3-Score Essays
Table 4.7. Summary of the Correlations Between Student Score and Links in Common with the Deterrence Approach Referent Based on Number of Terms (N = 70) without 3-Score Essays
Table 4.8. Summary of the Correlations Between Student Score and Links in Common with the Engagement Approach Referent Based on Number of Terms (N = 70) without 3-Score Essays
Table 4.9. Summary of the Correlations Between Student Score and Links in Common with a Student Expert Essay Based on Number of Terms (N = 70) without 3-Score Essays
Table 4.10. Summary of the Correlations Between Student Score and Links in Common with the PONY Document Based on Number of Terms (N = 70) without 3-Score Essays
Table 4.11. Summary of the Correlations Between the PFALL Analysis and the PFW3 Analysis
ACKNOWLEDGEMENTS
First, I would like to thank God for providing me with all the opportunities to finish my
degree and making all things work out. I would like to thank my wife Michele and
children Madelyn and Eden for their encouragement and putting up with me when I had
to sequester myself from the rest of the world. I would also like to thank my
grandparents Frank and Peggy Rovito for all their support over the years, without them,
none of this would have been possible. Finally, I would like to thank my Advisor Dr.
Roy Clariana for his guidance and support, and my committee (Dr. Susan Land, Dr. Priya
Sharma, and Dr. Ravinder Koul) for asking the tough questions at the proposal and
defense.
Chapter 1
Introduction
When individuals read text and then write an essay on what they read, they
produce and reproduce chunks of sequential language that represent higher levels of
knowledge (Clariana, Wolfe, & Kim, 2014). These “text structures” are the building
blocks used to construct high dimensional representations of knowledge. The
configuration of high dimensional knowledge is stored in the mind of an individual in the
form of structural knowledge. Structural knowledge is a type of knowledge that contains
the analysis of high dimensions of knowing (Jonassen, Beissner & Yacci, 1993).
Structural knowledge may contain multiple levels of complexity because it consists of the
schematic makeup of the interrelationships between concepts (knowledge structure) in
the individual’s mind. Thus, structural knowledge assessment may be essential in
measuring complex and intricate levels of knowledge. The structure of knowledge may reside within theoretical constructs such as mental models and schema,
which are not directly observable because they consist of cognitive processes internal to
the individual (Ifenthaler, 2008). The question becomes: what are the means to externally visualize, measure, and/or assess structural knowledge?
Essay exams have been shown to be an important measure of complex
knowledge (Clariana, Wallace, & Godshalk, 2009). They are used to evaluate and
promote higher levels of knowledge and understanding (Diekhoff, 1983). In education,
using essays as a method of evaluating students may be the most common way to assess
the knowledge students possess about a given topic, especially when evaluation takes
place within a complex domain. An essay, according to Nitko and Brookhart (2007),
"offers students the opportunity to display their abilities to write about, to organize, to
express, and to explain interrelationships among ideas" (p.191). When writing an essay,
a student begins to restructure the low dimensional linear structures in the course texts in
order to construct higher dimensional relational knowledge structures of the given
domain or topic. According to Clariana et al. (2009), “measuring the progress of learning
in complex domains is an important issue for instructional designers, instructors, and
researchers” (p. 726). Although essays can contain the structure of the student’s
knowledge of the topic at hand, they also contain idiosyncratic or even extraneous
content that can cloud or even hide the learner’s actual knowledge structure.
Knowledge structure is the interrelationship between the concepts within a given
domain and is something that exists in the memory associations of an individual that can
be captured in external artifacts (Clariana, 2010a). Kim (2012) summarizes many
methods that have been developed to capture knowledge structure, such as ALA-Mapper
(Analysis of Lexical Aggregates Mapper), ALA-Reader (Analysis of Lexical Aggregates
Reader), DEEP (Dynamic Evaluation of Enhanced Problem-Solving), SMD (Surface
Matching and Deep Structure), and KNOT (Knowledge Network Orientation Tool), to
name a few. Research conducted by the National Center for Research on Evaluations
Standards and Student Testing (CRESST) has shown promising results in visually
creating mental models in order to compare team or shared mental models with that of an
expert (Herl, O'Neil, Chung, & Schacter, 1999). Automated Knowledge Visualization
and Assessment (AKOVIA) is a methodology that uses various algorithms for visually
representing internal and shared mental models and team performance (Ifenthaler,
2014b). The Texas Christian University Node-Link Mapping system uses various types
of mapping systems, such as information maps, guide maps, and freestyle maps, to
visualize knowledge (Dansereau, 2005). Although the protocols, mapping systems, and
algorithms differ among the aforementioned methods, they all have a similar goal, to spot
and identify the structural make-up of the knowledge within a given domain residing in
the mind of the individual.
Figure 1.1 shows the various types of elicitation approaches, representation
approaches, and comparison approaches for the three-stage process of assessing and
representing structural knowledge. Conversion of essays into network graphs (visualizations that show the relationships between concepts) is one method used to elicit, represent, and then compare knowledge structure because, according to Koul, Clariana, and Salehi (2005), network graphs provide a visual and holistic way of representing knowledge structure. Although essays and network graphs are related, network graphs provide a visual structure of knowledge that essays do not.
Figure 1.1. The various types of knowledge elicitation, knowledge representation, and knowledge comparison approaches. Adapted from “Structural knowledge: Techniques for representing, conveying, and acquiring structural knowledge,” by D. H. Jonassen et al., 1993, p. 22. Copyright 1993 by Lawrence Erlbaum Associates, Hillsdale, NJ.
Following from a series of design studies, this investigation will elicit knowledge
structure as linear/sequential patterns in essays, will represent the patterns as network
graphs, and then will compare these graphs to an expert derived reference.
Past research has shown that Pathfinder network graphs can be extracted from
essays and produce graphical representations of knowledge structure (Clariana, 2010a;
Clariana et al., 2009). Pathfinder Networks can also be used to simplify complex
networks (Schvaneveldt, 1990b). Knowledge structure, according to Clariana (2010a), is
“the precursor of meaningful expression and is the underpinning of thought” (p. 41).
Knowledge structure can be represented in knowledge maps, which consist of nodes and
links (Clariana, 2010b; Dansereau, 2005; Ifenthaler, 2008; Ruiz-Primo, Shavelson, Li, &
Schultz, 2001). The linking of terms within a knowledge map provides a visual structure
of how the terms within a map relate to each other. The relationship of concepts (terms)
can be graphically represented via the terms and links forming a network graph. Once a
graphical representation is obtained, these graphical models can be compared
quantitatively using array proximity data.
Proximity data is an n-by-n matrix, where n is the number of terms (concepts), representing distance or adjacency (Kim, 2012). The proximity data measure of distance looks at all pair-wise distances between terms, calculated from the positions of the terms in space or judged directly by students on an ordinal scale (Taricani & Clariana, 2006). The proximity data measure of adjacency relates paired terms in an n-by-n matrix of 1’s (indicating a connection between terms) and 0’s (indicating no connection between terms) (Clariana et al., 2009). Koul, Clariana, and Salehi (2005) indicated that several dimensions of information exist within network graphs, yet there is no one singular method for generating network graphs. One way to elicit network graphs from essays, and the focus of the current research, is the analysis of lexical aggregates approach.
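To make the adjacency form of proximity data concrete, the minimal sketch below (written in Python for this illustration; the terms and links are invented and are not drawn from the study) builds an n-by-n matrix of 1's and 0's from a term list and a set of links:

```python
# A minimal sketch of adjacency-type proximity data: an n-by-n matrix of
# 1's (a link exists between two terms) and 0's (no link). The terms and
# links here are invented for illustration only.
terms = ["containment", "deterrence", "engagement", "sanctions"]
links = {("containment", "sanctions"), ("deterrence", "engagement")}

n = len(terms)
index = {t: i for i, t in enumerate(terms)}
prx = [[0] * n for _ in range(n)]

for a, b in links:
    i, j = index[a], index[b]
    prx[i][j] = prx[j][i] = 1  # links are undirected, so keep the matrix symmetric

for t, row in zip(terms, prx):
    print(f"{t:12s}", row)
```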
The Analysis of Lexical Aggregates
A lexical aggregate approach was developed by Clariana and Koul (2004) as a
way to score essays as well as graphically represent knowledge structure via network
graphs. This approach is called the Analysis of Lexical Aggregates (ALA). In short,
ALA aggregates pairs of terms from a pre-selected list including synonyms and
metonyms at either the sentence level or linearly across sentences, and saves the
aggregate as a link array proximity file (e.g., dot.prx) for further analysis (Clariana &
Wallace, 2007). Achieving as accurate a list as possible is crucial to the validity of the post-aggregation analysis. Past research has sought to refine how these lists are created and
analyzed (Clariana & Taricani, 2010).
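As a rough illustration of the two aggregation modes, the sketch below pairs terms that co-occur within a sentence (the within-sentence approach) and pairs consecutive term occurrences across the whole text (the linear aggregate approach). It is one reading of the published description, not ALA-Reader code; the sample text and term list are invented, and synonym/metonym handling is omitted.

```python
import re
from itertools import combinations

TERMS = {"containment", "deterrence", "engagement", "sanctions"}  # invented list

text = ("Containment depends on sanctions. Deterrence complements "
        "engagement, and engagement can soften sanctions.")

def term_sequence(chunk):
    # Keep only list terms, in their order of appearance.
    return [w for w in re.findall(r"[a-z]+", chunk.lower()) if w in TERMS]

sentences = re.split(r"(?<=[.!?])\s+", text)

# Within-sentence: link terms that appear together in the same sentence.
within = set()
for s in sentences:
    within |= {tuple(sorted(p)) for p in combinations(set(term_sequence(s)), 2)}

# Linear aggregate: link each term occurrence to the next one, even when
# the next occurrence falls in a different sentence.
seq = term_sequence(text)
linear = {tuple(sorted(p)) for p in zip(seq, seq[1:]) if p[0] != p[1]}

print("within-sentence links:", sorted(within))
print("linear aggregate links:", sorted(linear))
```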
Clariana and Koul (2004) developed a computer program, called the ALA-Reader, that translates written text summaries into a proximity file, aggregating data at the sentence level, to score essays in comparison to human raters (Clariana, 2004). In that investigation, ALA scores ranked 5th (r = 0.69) out of 12 against 11 human raters, indicating that only 4 human raters scored essays more accurately than the ALA-Reader, while 7 scored less accurately. In a follow-up investigation, Koul et al. (2005) used the sentence-level approach to score essays relative to an expert versus Latent Semantic Analysis (LSA), a method for extracting meaning from passages of text based on statistical computations from multiple documents (Evangelopoulos, 2013). Results indicated the ALA-Reader performed more like the human raters (r = 0.38 to r = 0.71) than the LSA approach (r = -0.07 to r = 0.39).
Clariana and Wallace (2007) used the ALA-Reader to score essays relative to an
expert, using the within sentence approach (aggregation of data by focusing on key
concepts within sentences) and a new linear aggregate approach (aggregation of data by
considering key concepts both within and across sentences). The linear aggregate
approach produced larger correlations with the human raters (r = 0.60 and r = 0.45) then
did the sentence aggregate approach (r = 0.47 and r= 0.29). These studies demonstrate
that ALA-Reader could be used to score essay exams, or at least provide a second level
of validation for the human rater.
Recent studies have attempted to further validate the ALA-Reader software as a
tool to score essays and analyze the structure of knowledge by refining the validity of the
terms used to create network graphs from essays. Descriptive research conducted by
Clariana et al. (2009) sought to refine the accuracy of term lists by manually replacing the
pronouns in student essays with their referents. They found that the linear aggregate
approach had better correlation (r = 0.74) with human raters than the sentence aggregate
approach (r = 0.44); also, converting pronouns to their referent had little effect on the
linear aggregate scores.
Visual forms of knowledge elicitation (graphical representations of relationships
among sets of terms) may be the most explicit way to represent an individual’s
knowledge structure (Ifenthaler, 2014a; Koul et al., 2005) without the extraneous
information often found in essays. Clariana and Taricani (2010) investigated how the
number of terms influenced the scoring of open-ended concept maps by comparing term
lists of 16, 26, and 36 terms. The results of their study indicated that increasing the
number of terms decreased the predictive ability of the concept maps scores. These
results are contrary to the outcome of a study conducted by Goldsmith, Johnson, and
Acton (1991), where structure was elicited as pair-wise rankings from subsets of 5–25 terms derived from a 30-term list. Goldsmith et al. (1991) found an almost linear
relationship between increasing the number of terms and score predictability. The
current study plans to further validate the ALA-Reader by investigating whether an optimal number of terms exists to maximize the predictive ability of the network graph scores.
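The design being proposed can be prototyped along these lines: for each candidate term-list size, reduce every essay to a set of links, count the links each essay shares with the expert referent, and correlate those counts with the human rater scores. The sketch below uses invented data and scipy's pearsonr; it mirrors only the general shape of the analysis, not the study's actual pipeline.

```python
from scipy.stats import pearsonr

# Invented data: five essays, each already reduced to a set of term links,
# plus the human rater score each essay received.
human_scores = [2, 3, 3, 4, 5]
referent = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
essay_links = [
    {("a", "b")},
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "e")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")},
]

# Predictor: the number of links each essay shares with the expert referent.
links_in_common = [len(links & referent) for links in essay_links]

r, p = pearsonr(links_in_common, human_scores)
print(f"r = {r:.2f} (p = {p:.3f})")
# Repeating this for term lists of different sizes (say 20, 25, 30 terms) and
# comparing the resulting r values is the comparison the study proposes.
```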
Statement of the Problem, Purpose, and Research Questions
Writing is important, and complex, high-dimensional knowledge can be assessed from what and how we write. Visual tools such as KU-Mapper, DEEP, and AKOVIA have been implemented and tested with promising results (Ifenthaler, 2014b).
The ALA tool has shown promise as an automated process to assess complex learning
(Clariana & Taricani, 2010; Clariana & Wallace, 2007; Clariana et al., 2009; Koul et al.,
2005; Taricani & Clariana, 2006). Research continues to yield favorable results in the
continuing validation of the ALA-Reader as a computer based approach for automatically
scoring open-ended concept maps (Taricani & Clariana, 2006), measuring individual and
group knowledge from essays (Clariana, 2010b; Clariana et al., 2009; Clariana &
Wallace, 2007), and as an approach to elicit structural knowledge (Clariana, 2010a;
Clariana et al. 2014; Clariana & Wallace, 2009).
One variable that has not been investigated to any great degree in the exploratory research on validating the ALA approach is the optimal number of terms, which raises the question: would the previous studies have produced the same results had the number of terms used to create the term list been a variable? As reported earlier, Goldsmith et al. (1991) experimentally demonstrated a near-linear relationship between the number of terms and predictability, whereas Clariana and Taricani (2010) found nearly the
opposite to be the case. Future research should focus on finding an optimal number of
terms (Clariana et al., 2014). Therefore, the current research will explore the effect of
varying the number of terms used to create a terms list when scoring network graphs as a
measure of assessing structural knowledge derived from essays.
The theoretical question this study plans to answer is: “is there an optimal number of terms needed to accurately assess structural knowledge derived from essays?” The goal of this exploratory research is to proffer an actual number or range of numbers that defines
an optimal number of terms. It is hypothesized in the current research that the number of
terms used to create proximity files will affect the predictive ability as well as the
concurrent validity, a measure of how well a particular test correlates with a previously
validated measure (Shuttleworth, 2009), of scoring network graphs derived from essays
as well as the measurement of individual and group knowledge from those essays.
This study will investigate the following research question:
What are the effects of changing the number of terms used to create proximity
files on the predictive ability of scoring essay-derived network graphs via the
ALA-Reader approach for individual knowledge structure?
Definitions
ALA-Reader - a computer program that translates written text summaries into a proximity file (Clariana & Koul, 2004).
Analysis of Lexical Aggregates - a computer-based method developed to analyze
knowledge structure within essays by translating text into concept map
representations of knowledge. (Clariana, 2004)
Automated Knowledge Visualization and Assessment - a methodology that uses
various algorithms for visually representing internal and shared mental models
and team performance (Ifenthaler, 2014b).
Concept maps - graphical representations of relationships among sets of terms (Koul et
al., 2005).
Concurrent Validity - a measure of how well a particular test correlates with a
previously validated measure. (Shuttleworth, 2009)
CRESST - the National Center for Research on Evaluation, Standards, and Student Testing, whose research has explored visually creating mental models in order to compare team or shared mental models with that of an expert (Herl et al., 1999).
Knowledge Structure - the interrelationship between the concepts within a given
domain and is something that exists in the memory association of an individual
that can be captured in external artifacts. (Clariana, 2010a)
Latent Semantic Analysis (LSA) - a method for extracting meaning from passages of
text, based on statistical computations from multiple documents (Evangelopoulos,
2013).
Linear aggregate approach - aggregation of data by considering key concepts both
within and across sentences. (Clariana & Wallace, 2007)
Network Graphs - visualizations that show the relationships between concepts. (Koul et
al., 2005)
Proximity data - an n-by-n matrix, where n is the number of terms (concepts), representing
distance or adjacency (Kim, 2012).
Proximity data measure of distance - looks at all pair-wise distances between terms and
calculates the distance of the terms in space or directly judged by students on an
ordinal scale (Taricani & Clariana, 2006).
The proximity data measure of adjacency - relates paired terms by an n-by-n matrix of 1’s and 0’s (1 indicates a connection between terms, 0 indicates no connection between terms) (Clariana et al., 2009).
Structural knowledge - knowledge “that mediates the translation of declarative into procedural knowledge and facilitates the application of procedural knowledge” (Jonassen et al., 1993).
Within sentence approach - aggregates data by focusing on key concepts within sentences.
(Koul et al., 2005)
Chapter 2
Review of the Literature
Jonassen et al. (1993) defined a type of knowledge that exists between the declarative and
complex levels of knowledge called structural knowledge. It may be conceived that structural
knowledge is a part of or type of declarative knowledge. Jonassen et al. (1993) stated the
following on structural knowledge as a part of declarative knowledge:
Structure refers to how information within a knowledge domain is organized, which we
have defined as structural knowledge. Whether structural knowledge exists, as a separate
type of knowledge or it is part of declarative knowledge, is a semantic distinction that
does not affect its recognition as an entity or as a distinct type of knowledge (p.4).
In other words, structural knowledge may be considered a type of declarative knowledge, but it has enough distinguishing characteristics to behave like a distinct type of knowledge, requiring different elicitation and assessment routines.
The make-up of structural knowledge consists of the interrelationships of mental schema
(based on declarative knowledge concepts) within a given domain. It is structural knowledge that enables us to visualize the schematic makeup of the declarative knowledge components that produce
complex levels of knowledge. If we can elicit the knowledge structure of an individual's
performance, we can then begin to compare these structures with other performers or learners,
and establish a paradigm for assessing such knowledge.
This chapter will present the literature surrounding the assessment of structural
knowledge found within essays, and tools developed to assess and elicit knowledge structure. We
will then focus on how the ALA approach was developed to assess knowledge structure. We will
analyze the literature surrounding the development of the ALA approach as well as research
attempting to increase the validation of this approach as a tool for creating proximity files and
network graphs derived from essay exams and as a tool for automating the assessment of essays.
The chapter will conclude by reviewing research on the importance of creating term lists as an essential component of validating the ALA approach as a tool for assessing network graphs derived from essays, and thus for assessing the structure of knowledge within complex domains.
Essay Assessment
Essay exams are useful in attempting to tap into complex thinking requiring students to
demonstrate their ability to organize, integrate, interpret information, explain a position, construct
an argument, evaluate ideas, or any other complex higher learning task (Piontek, 2008). They are
in most instances used to evaluate and promote higher levels of understanding (Diekhoff, 1983).
There are both advantages and disadvantages to using essays to assess student knowledge. The
advantages include measuring complex ideas and reasoning, motivating better study habits,
providing students the flexibility in demonstrating what they know, and allowing students to
demonstrate communication skills (Piontek, 2008). The disadvantages of using essays include
the amount of time it takes to grade them, as careful attention must be paid to creating accurate rubrics, and that only a single domain of knowledge can be assessed at one time (Piontek, 2008).
There are two types of essay items, restricted response and extended response (Nitko &
Brookhart, 2007). Restricted response essays focus and limit the content of the student’s answer.
According to Nitko and Brookhart (2007), “Restricted response items should require students to
apply their skills to solve new problems or to analyze novel situations” (p. 189). Extended
response essays, according to Piontek (2008), “allow the students to construct a variety of
strategies, processes, interpretations, and explanations for a given question, and to provide any
information they consider relevant” (p. 6). Extended response essays tend to reveal more of the
student’s organizational, integration, and evaluation abilities, but this essay type is less efficient
in extracting exact information about a given topic.
In addition to the two types of essays, restricted and extended response, there are four
genres of essay writing. According to the Online Writing Lab (2015), the four genres are:
• Expository essays require the student to investigate an idea, evaluate evidence,
expound on the idea, and set forth an argument concerning that idea in a clear
and concise manner.
• Descriptive essays ask the student to describe something, such as an object,
person, place, experience, emotion, situation, etc.
• Narrative essays are often anecdotal, experiential, and personal - allowing
students to express themselves in creative and moving ways.
• The argumentative essay is a genre of writing that requires the student to
investigate a topic; collect, generate, and evaluate evidence; and establish a
position on the topic in a concise manner. (Online Writing Lab, 2015)
In the current study, the essays that were secured for analysis were extended response type essays
that could be categorized as argumentative essays, as students were asked to take a specific
diplomatic approach and defend their choice. Regardless of the genre and type of essay, a student
must write, and assessment can be problematic.
One of the aforementioned disadvantages of essay assessments is that they are difficult to
grade with a high degree of reliability. How raters score essays should be consistent and
objective so that students are truly being assessed on their knowledge and not on unrelated
factors, such as rater biases (Schaefer, 2008). Raters who inject bias into the assessment become
a threat to the validity of essay assessment (Messick, 1995). Without the use of a rubric, scoring
reliability is a problem prevalent within extended response essays (Nitko & Brookhart, 2007).
Due to the lack of restrictions within extended response essays, raters may have trouble rating
essays consistently. Rater drift is the tendency for raters to change their scoring over time (Nitko
& Brookhart, 2007). This tends to happen slowly when raters score many essays over a long
period of time.
To reduce rater bias and increase scoring reliability there are a few guidelines that should
be adhered to when scoring essay items (Nitko & Brookhart, 2007; Piontek, 2008).
1. Use a scoring rubric. Rubrics guide the rater to make sure he or she is focusing on the
correct content and weighting it correctly.
2. Outline or demonstrate what an expected answer looks like. A model answer will help
the rater identify what the right answer looks like and makes it consistent for rating every
essay.
3. Score one question at a time. If the assessment requires students to answer multiple essay
questions, the rater should assess the same question from all students before evaluating
the next question.
4. Score subject matter content separately from other factors. The content of the essay
should be evaluated separately from other factors such as spelling, style, format, and language,
unless these non-content factors are listed in the rubric or part of the essay writing
objective.
5. Score essays anonymously. Remove the names from the essay in order not to allow bias
towards a student based on what you know about them. This will reduce the Halo Effect,
which is when one characteristic of an individual affects the rater’s judgment.
6. Give students feedback. Given the complexity of the essays and the level of knowledge
they tend to assess, feedback becomes an important part of the learning, and also
feedback keeps the rater consistent and aware of what they are assessing.
7. Use a systematic process for scoring each essay. The same method should be used by the
rater to score each essay. This can be accomplished by the use of model answers and
rubrics.
Strictly adhering to these guidelines may not eliminate the problems with scoring essays, but it
should reduce rater bias, rater drift, the halo effect, and increase scoring reliability.
Methods of Capturing and Assessing Structural Knowledge
Measuring Knowledge Structure Using Concept Maps
Graphic depictions of how people organize knowledge within a given domain are called
concept maps (Green, Lubin, Slater, & Walden, 2013). Concept maps are a way in which we
externally and visually represent the internal structure of our knowledge within a given domain,
as well as measuring the important aspects of the structure of domain knowledge (Hu, Cheng, &
Heh, 2011; Ifenthaler, 2008; Ruiz-Primo et al., 2001). These maps can be both digitally created
for individuals and also for teams as a way to represent the shared mental models (Engelmann &
Hesse, 2010).
Ruiz-Primo et al. (2001) conducted research attempting to establish a framework for
examining the validity of the knowledge structure interpretation of three concept mapping
techniques. These techniques consisted of using 20 preselected concepts/terms to 1) construct a
concept map from scratch, 2) fill in the nodes, and 3) fill in the line. According to Ruiz-Primo et
al. (2001), “a concept map can be categorized along a continuum from high-directed to low-
directed, based on the information provided to the students” (p.101). Constructing a concept map
from scratch was a low-directed technique, meaning that only the concepts were given without
instruction or aid on how to construct the concept maps. The fill in the nodes and fill in the line
concept mapping techniques were high-directed in that they were much more structured by
providing either the nodes or the links for the concept map. In that study, Ruiz-Primo et al. (2001) concluded that all three mapping techniques measure student knowledge. Low-directed
techniques provide students more opportunities to reveal more of their conceptual understanding
than high-directed techniques. The results of the Ruiz-Primo et al. (2001) research demonstrated
that concept maps could measure different levels of knowledge structure depending on the
amount of direction a student is given when completing a concept map.
Research has shown that concept maps can be used to chronicle changes in knowledge
(Green et al., 2012). By creating pre and post-instruction concept maps, Green et al. (2012)
empirically demonstrated that student knowledge changed after instruction. Each concept map
(pre and post) was scored based on formulas calculating nodes, links, density, depth, complexity,
and chunks resulting in a specific quantifiable score. Learning was measured by significant
changes in the mean scores between pre- and post-concept map scores. The result of the research
provides support for the use of concept map construction as a way to measure increases or decreases in
knowledge representation after instruction. Additionally, when the participants constructed their
concepts maps, the number of nodes or terms used to create the concept maps had a mean range
from 12.82 to 25.13 terms.
Rye and Rubba (2002) investigated a concept map scoring methodology that weighted
concepts and relationships differently based on the presence or absence of the concepts and
relationships when compared to an expert referent. In that study, the relationships between the
concepts were weighted higher than concepts themselves. Links that were made between
concepts were given higher importance than the identification of the concepts or terms. When
comparing the student concept map with that of the expert referent, twice as many points were
awarded for identifying the relationships between the concepts in common with the expert and
fewer points were given to identification of concepts alone. Rye and Rubba (2002) also weighted
the concepts and relationships based on importance defined by the expert. Eight concepts out of
127 were defined as central concepts, resulting in the student receiving 3 points each; two points were awarded for the next 33 concepts identified by the expert, and the remaining 87 concepts were
given 1 point. When scoring relationships between concepts, students received 6 points if their
relationship matched the expert's linking between two concepts. Students received 4 points if one
expert’s concept was linked to a student concept. Students received 2 points if they made a valid
link between two concepts that was not found on the expert’s concept map. The results of this
study indicated that weighted concept map scores had good predictive validity for measuring
student performance.
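A hedged re-creation of that weighting scheme, as we read the description above (the concept tiers and the sample map are invented; the original study's concept lists are not reproduced here):

```python
# Points per concept tier and per relationship, per the description above.
CENTRAL, SECONDARY, OTHER = 3, 2, 1
EXPERT_LINK, HALF_LINK, VALID_EXTRA = 6, 4, 2

# Invented expert data: tiered concepts and the expert's links.
concept_points = {"energy": CENTRAL, "diet": SECONDARY, "label": OTHER}
expert_concepts = set(concept_points)
expert_links = {frozenset(("energy", "diet"))}

def score_map(student_concepts, student_links, judged_valid=()):
    # Concepts shared with the expert earn their tiered point value.
    score = sum(concept_points[c] for c in student_concepts & expert_concepts)
    for a, b in student_links:
        link = frozenset((a, b))
        if link in expert_links:
            score += EXPERT_LINK   # matches the expert's linking of two concepts
        elif (a in expert_concepts) != (b in expert_concepts):
            score += HALF_LINK     # an expert concept linked to a student concept
        elif link in judged_valid:
            score += VALID_EXTRA   # a valid link absent from the expert's map
    return score

print(score_map({"energy", "diet"}, [("energy", "diet"), ("diet", "exercise")]))
```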
Internal mental models are not directly observable, and research into externalizing mental
models requires valid and accurate tools (Ifenthaler, 2008). Concept map research has shown
promising results in its validity as a method to represent the structure of knowledge in the mind of
a student (Chen, Cheng, & Heh, 2005; Green et al., 2012; Hu, Cheng, & Heh, 2011; Ifenthaler,
2008; Ruiz-Primo et al. 2001; Rye & Rubba, 2002; Taricani & Clariana, 2006). Methods of
scoring concept maps can change to reveal different levels of understanding, without changing
the overall attributes of the concept map design (Green et al., 2012; Ruiz-Primo et al., 2001; Rye
& Rubba, 2002). Research has shown that concept maps are not relegated to just measuring
individual mental models, but also shared mental models (Engelmann & Hess, 2010, Hu et al.,
2011; Ifenthaler, 2014; Johnson, Sikorski, Mendenhall, Khalil, & Lee, 2010).
Web-based applications, programming, and technology have opened up another platform
for the development of educational and computer-based assessment tools (Ifenthaler, 2014a).
Ifenthaler (2014a), developed a methodology for visualizing individual and shared mental models
called Automated Knowledge Visualization and Assessment (AKOVIA). According to Ifenthaler
(2014b), AKOVIA:
"…is based on mental model theory and integrates a large number of dynamic interfaces
to different online environments, for instance, learning management systems,
personalized learning environments, game-based environments, or computer-based
assessment environments such as PISA or PIAAC. This open architecture of AKOVIA
enables a large variety of research and practical applications, such as investigation of
learning processes; distinguishing features of subject domains; cross-curricular,
nonroutine, dynamic, and complex skill; or convergence of team-based knowledge"
(p.653).
Similar to other knowledge elicitation and visualization techniques, AKOVIA runs under the
assumption that individual and shared knowledge can be externalized and visually represented
(Ifenthaler, 2014a). AKOVIA is applied to small amounts of text and does not require
referencing a large lexical database, unlike LSA’s text analysis requirement. For a valid AKOVIA analysis, text passages must contain at least 300 words. AKOVIA uses multistage language-oriented algorithms to transform text into list form and proximity matrices; the concept maps are then generated from the text list and proximity matrices (Pirnay-Dummer & Ifenthaler, 2010). The expert or individual does not pick or create the terms; rather, the AKOVIA software picks the specific terms based on its battery of algorithms. The number of terms used to create the concept maps derived from the proximity matrices appears to vary based on the number of words making up the text passage (Ifenthaler, 2014a; Ifenthaler, 2014b).
AKOVIA is carried out in four stages. In Stage 1, the text is put into the system, where it is cleaned up (i.e., metadata is removed). Stage 2 parses the text, stems the text, and calculates word associations; when stemming, AKOVIA associates words with their stem, so, for example, card and cards become the same word, card. Stage 3 employs a battery of measures, such as surface matching, graphical matching, structural matching, gamma matching, concept matching, positional matching, and balanced semantic matching, to produce the graphical analysis. Stage 4 outputs the graph based on the analysis conducted in Stage 3. From this output, comparisons can be made between or among various individual or shared mental models.
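The staged design can be pictured as a plain pipeline. The skeleton below is purely illustrative: a naive suffix-stripping stemmer and sentence co-occurrence stand in for AKOVIA's actual parsing, stemming, and association algorithms, which are not reproduced here.

```python
import re
from itertools import combinations
from collections import Counter

def stage1_clean(text):
    # Stage 1 stand-in: strip markup-style metadata from the input text.
    return re.sub(r"<[^>]+>", " ", text)

def stage2_stem_and_associate(text):
    # Stage 2 stand-in: naive stemming (drop a trailing "s", so "card" and
    # "cards" merge) plus sentence co-occurrence as a crude word association.
    assoc = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        stems = {w.rstrip("s") for w in re.findall(r"[a-z]+", sentence.lower())}
        assoc.update(combinations(sorted(stems), 2))
    return assoc

raw = "<title>demo</title> The card sorted well. Many cards sorted badly."
assoc = stage2_stem_and_associate(stage1_clean(raw))
print(assoc.most_common(3))  # Stages 3-4 (matching measures, graph output) omitted.
```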
Ifenthaler (2014b) conducted research into the feasibility and validity of the AKOVIA
framework, by investigating 1) whether AKOVIA’s semantic and structure matching
measurements provide evidence for differences in team performance between differently
composed teams based on task knowledge, and 2) whether greater levels of task shared mental models and team shared mental models are associated with higher team-based performance when assessed
with the semantic and structure matching measurements. Teams of learners were to perform
tasks within an online environment. Team-based essays were analyzed with AKOVIA by
comparing them with an expert reference solution. The results of the study indicated that
AKOVIA was able to find differences between group performances, and supported AKOVIA’s
validity as a fully automated methodology for assessing team-based performance. AKOVIA
joins the list of successfully tested instruments using graphical representations for computer-based
knowledge assessment, such as DEEP and KU-Mapper (Ifenthaler, 2014b). AKOVIA appears to
be a practical and feasible tool to employ for individual and team-based assessment (Ifenthaler,
2014a; Ifenthaler, 2014b).
Team Assessment and Diagnostic Instrument (TADI) is a tool that can quickly be set up
to assess team-shared cognition as well as integrate with other computer-based diagnostic tools (Johnson et al., 2010). TADI measures the degree of knowledge within a team to determine the level
of shared mental models. Once a team develops a shared mental model, this model can be used to
measure the potential productivity of a team (Johnson et al., 2010). Once team tasks are
completed, individuals fill out a web form questionnaire. The form’s data is collected by the
TADI system from every team member, and then exported into a spreadsheet application. Two
measures calculated from collected TADI data are: 1) the mean, representing the team's standing on a given factor, and 2) the standard deviation, representing the level of variation in the individual ratings (Johnson et al., 2010). TADI, unlike AKOVIA, does not generate visual representations of
shared mental models, but rather uses statistics such as mean and standard deviation to calculate
the shared mental models of teams.
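Those two measures are straightforward to compute once the questionnaire data are collected; a small sketch with invented ratings (the factor names and scale are hypothetical):

```python
from statistics import mean, stdev

# Invented TADI-style data: each list holds one rating per team member for a
# given factor (e.g., on a 1-5 scale).
ratings_by_factor = {
    "shared goals": [4, 5, 4, 4],
    "role clarity": [2, 5, 3, 1],
}

for factor, ratings in ratings_by_factor.items():
    # Mean = the team's standing on the factor; standard deviation = how much
    # the individual members' ratings vary (lower = more shared).
    print(f"{factor}: mean = {mean(ratings):.2f}, sd = {stdev(ratings):.2f}")
```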
Knowledge Elicitation Approaches
There are many methods that have been developed to try to elicit and assess structural
knowledge (Kim, 2012). A few of these examples are DEEP, jMap, ACSMM, KU-Mapper,
LSA, ALA-Reader, and ALA-Mapper. Although the process among the aforementioned
elicitation methods differ, once the individual's knowledge structure is elicited and turned into
comparable representations, it can then be analyzed and assessed. Assessment of knowledge
structure occurs when it is compared with that of an expert. Wouters, van der Spek, and van Oostendorp
(2011) noted that the validity of structure assessment of knowledge is rooted in the agreement of
central concepts among experts within a given domain. Hard domains are domains of knowledge
where a central body of theory is generally agreed upon (i.e., biology), and soft domains are
domains of knowledge where there is a lack of a centralized body of knowledge (i.e., political
science) (Wouters et al., 2011). Hard domains should elicit consistent, similar knowledge structures across experts due to the "hardness" (degree of adherence to a central theory) of a domain, whereas soft domains apply looser definitions of central concepts, thus increasing the probability of variance among experts' knowledge structures (Keppens, 2007). Research has shown
that hard domains such as computer programming and serious gaming dealing with procedural
knowledge are appropriate for structural knowledge assessment (Keppens, 2007; Wouters et al.,
2011). Knowledge elicitation and eventual assessment are more challenging within soft domains.
Dynamic Enhanced Evaluation of Problem Solving (DEEP) is a method created to assess
learning within complex soft domains, such as ill-defined problems within a medical diagnosis
(Koszalka & Epling, 2010). DEEP uses causal influence diagramming to create knowledge
elicitation and representation. In a two-step process, the first step is to identify the recognizable
patterns experts use to solve ill-defined problems. The next step is to develop measures of
similarity between the novice and experts, and observe how the novice's pattern begins to match
the expert's over time. DEEP collects responses to a problem from both novice and experts alike.
Responses to complex problems by an expert provide the baseline from which the novice patterns
are compared. Learning is then assessed by how close the novice(s) patterns come to or match
the pattern of responses by the expert (Koszalka & Epling, 2010). In the case of DEEP, written
exams are not the measure of success, but rather the knowledge structure that is elicited by the
process the student utilizes to solve an ill-defined problem.
The Excel-based application jMap is software designed to elicit mental models,
assess changes in the models, and compare models to that of an expert (Shute, Masduki, Donmez,
Dennen, Kim, Jeong, & Wang, 2010). jMap is a tool designed to assess causal diagrams
(diagrams that show cause and effect relationships). The jMap process is programmed to enable
the elicitation, recording, and coding of mental models (students create models using Excel's auto shape tools). This process quantitatively assesses the models over time (code and translate
models into a transitional frequency matrix), and compares the models to experts (compile raw
scores and compare quantitative measures such as percentages of shared links between the expert
and novice) (Shute et al., 2010).
The Analysis-Constructed Shared Mental Model Methodology (ACSMM) is a method for
comparing the shared mental models of teams. According to Johnson and O'Connor (2008),
ACSMM "translates individual mental models into a team sharedness map without losing the
original perspective of the individual, thereby representing a more accurate representation of the
team sharedness" (p.188). It is a five-phase process and is similar to most methods of capturing
structural knowledge. The first phase consists of the elicitation of knowledge by compiling a list
of terms from an expert. Phase two consists of each individual member of the group constructing
their own individual mental model. During phase three, the concepts and relationship between
concepts are coded in order to make similarity comparisons among the maps. Once the individual
mental models are coded, phase four consists of determining what concepts are shared among the
mental models of the team/group members. The final phase consists of constructing the
ACSMM, which is a multi-step process that constructs a single team's mental model. Once a
model is constructed, it can be compared over time with that of an expert and or other teams.
Other Approaches for Eliciting Knowledge Structure
One of the most widely used methods for text analysis is Latent Semantic Analysis
(LSA), which is a vector space modeling technique for representing word meanings (Olney,
2009; Olney, 2011). A vector space model (VSM) uses vector-based statistical techniques to
represent the similarity of a collection of words as a cosine between vectors. The vector space
model is usually built from term occurrences in paragraphs. According to Evangelopoulos
(2013), LSA has been shown to model cognitive functions, learning and understanding of word
meaning, episodic memory, semantic memory, discourse coherence, and metaphoric
comprehension. One of the practical applications of LSA is automatic essay grading
(Evangelopoulos, 2013).
When automatically grading essays, according to Koul et al. (2005), LSA "compares the
interrelationship of words in a particular essay with the interrelationship of words in the essays
used to train the software" (p. 234). The process is as follows: first, an essay's score is
determined by comparing the new essay's vector to a large set of rater-scored student essays (at
least 100 student essays). Next, the new student essay receives the score that the nearest
previously scored essay received. If the new essay is nearest to an existing scored essay that had
received a "2," then the new essay also receives a "2." LSA requires thousands to millions of
words to derive high-dimensional semantic spaces and hundreds of rater-scored essays that span
the full range of possible scores in order to make its comparisons, which can lead to increased
costs (Koul et al., 2005).
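To make the nearest-neighbor scoring step concrete, the following is a minimal Python sketch. It
assumes essays have already been reduced to term vectors; the function names, toy three-term
vectors, and scores are illustrative inventions, not LSA's actual implementation, which builds its
semantic space via singular value decomposition over a large document collection.

import math

def cosine(u, v):
    # Cosine similarity between two equal-length term vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_score(new_vector, rated_essays):
    # Give the new essay the score of its most similar rated essay.
    best = max(rated_essays, key=lambda e: cosine(new_vector, e["vector"]))
    return best["score"]

# Hypothetical usage: toy vectors standing in for a high-dimensional semantic space.
rated = [{"vector": [1, 0, 2], "score": 2},
         {"vector": [0, 3, 1], "score": 4}]
print(nearest_score([1, 1, 2], rated))  # prints 2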
LSA requires a large database of documents and terms, which makes the method
relatively expensive, though it is a functional document retrieval and analysis tool
(Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Evangelopoulos, 2013). Indexing
for document/essay retrieval using LSA requires thousands of documents and terms, which are
used to create high-dimensional representations (about 100 dimensions) via mathematical
techniques that retrieve documents from query terms (Deerwester et al., 1990).
Although LSA is a popular method for scoring essays, it borders on impracticality due to the
number and maintenance of terms. This impracticality may be providing the catalyst for the
development of other automated essay assessment techniques such as the ALA-Reader, which
requires fewer terms (20 - 30) as opposed to a database of thousands of documents and millions
of terms (Deerwester et al., 1990; Taricani & Clariana, 2006).
Another approach to assessing written text is the use of data mining. Dringus and Ellis
(2005) analyzed the use of data mining as a strategy for assessing online discussion forums. Data
mining does not employ Pathfinder analysis in extracting structural knowledge, but rather uses a
method of creating queries to extract specific data to improve a teacher’s ability to assess
discussion threads. In the case of a discussion thread, the data mining method queries databases
to look for previously unknown interrelationships among concepts. Once an instructor can query
and pull the data from a discussion forum, he or she can more efficiently understand how the
student creates the relationships among concepts. Forums can last from days to weeks and
contain hundreds of threads (Dringus & Ellis, 2005). The authors indicate that the data mining
technique offers promise, but data mining should aid in the process of assessing online forums
and not take the place of in-depth human analysis of a student’s participation in a forum.
Knowledge Unit Mapper or KU-Mapper is supplemental software that streamlines the
collection of proximity data using three different elicitation approaches for use in KNOT
software as well as standard MDS software (Clariana, 2003). The collection of proximity data is
important to the Pathfinder approach for measuring and assessing structural knowledge because it
provides the quantitative measures of relatedness between pairs of concepts. KU-
Mapper collects proximity data in three ways: 1) traditional Pathfinder rated pair-wise
comparisons of all terms, 2) an abbreviated approach using list-wise comparisons, and 3) a
semantic-map card-sorting tool that allows participants to add terms to an on-screen
semantic map (Clariana, 2003).
Clariana and Wallace (2009) compared pair-wise, list-wise, and clustering approaches for
the construction of structural knowledge. The pair-wise approach compares two terms at a time
in which individuals rate the relatedness of each term pairing on an ordinal scale. With the list-
wise approach, individuals make comparisons of one term to another term from a list of terms.
The clustering approach takes all terms that are to be compared and allows the individual to move
terms that are related close together and unrelated terms further apart. Using the KU-Mapper in
conjunction with KNOT software, Clariana and Wallace (2009) were able to ascertain that all
three approaches (pair-wise, list-wise, and clustering) create similar network representations of
structural knowledge at the group level. The significance of this finding is that it was a wholly
computer-based method for interpreting and measuring knowledge structure.
Analysis of Lexical Aggregates (ALA)-Mapper evaluates network graphs by comparing
terms and distances between terms in a student's network graph with those in an expert's graph
(Koul et al., 2005). Proximity data derived from the ALA-Mapper software is based on the
conversion of node-link information within concept maps. It focuses only on the links, or
distances between terms, and not the link labels (Clariana, 2010b). Proximity data is then created
from the distances between nodes. Once proximity data is created, the process of assessing
structural knowledge begins using the Pathfinder approach. The ALA-Mapper approach is another
computer-based method for assessing network graphs. Regardless of the method used to create the
proximity file, assessment of structural knowledge is rooted in the Pathfinder approach.
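As an illustration of converting node positions into proximity data, consider the following
minimal Python sketch. The coordinates and function name are hypothetical, and the computation
(pairwise Euclidean distance between placed concepts) illustrates the idea only; it does not
reproduce ALA-Mapper's exact algorithm.

import math

def map_proximity(node_positions):
    # node_positions maps term -> (x, y) position on the concept map.
    # Return the Euclidean distance between every unordered pair of nodes.
    terms = sorted(node_positions)
    return {(a, b): math.dist(node_positions[a], node_positions[b])
            for i, a in enumerate(terms) for b in terms[i + 1:]}

# Hypothetical usage with three placed concepts.
prox = map_proximity({"heart": (0, 0), "blood": (3, 4), "vein": (6, 8)})
# prox[("blood", "heart")] == 5.0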
The advantage of computer-based methods of interpreting and measuring structural
knowledge is that they can be relatively low cost, easy to use, and easy to interpret (Koul et al.,
2005; Toranj & Ansari, 2012). These computer-based methods can also be used as teaching tools
within classrooms (Toranj & Ansari, 2012). Given these advantages of computer-based
approaches, further tools such as the ALA-Mapper and ALA-Reader are being developed and
validated. The ALA-Reader and the ALA-Mapper are computer-based approaches for assessing
structural knowledge derived from essays. The Analysis of Lexical Aggregates approach uses
student essays to derive knowledge structure representations (Clariana et al., 2009). The ALA-
Reader is software developed by Clariana (2004) that converts essays into Pathfinder network
representations by aggregating key terms at the sentence level, both within sentences and across
sentences (Clariana & Wallace, 2009). The ALA-Reader aggregates the sentences and creates
proximity data files for Pathfinder; KNOT software then converts the proximity data into a
graphical representation of the PFNet. The ALA-Reader approach is a computer-based process
for deriving knowledge structure from essays with the goal of creating a fully automated
structural knowledge assessment tool.
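A minimal sketch of the within-sentence aggregation idea is shown below in Python. The
sentence splitting and term matching are deliberately simplified illustrations and do not
reproduce the ALA-Reader's actual algorithm, which also aggregates across sentences.

import re
from itertools import combinations

def sentence_proximity(text, terms):
    # Count how often each pair of key terms co-occurs within a sentence.
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    prox = [[0] * n for _ in range(n)]
    for sentence in re.split(r"[.!?]+", text.lower()):
        present = sorted({index[t] for t in terms if t in sentence})
        for i, j in combinations(present, 2):
            prox[i][j] += 1
            prox[j][i] += 1
    return prox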
Assessing Knowledge Structure with the Pathfinder Approach
The constructed structural makeup of the interrelationships among concepts within the
mind of the learner is the individual's knowledge structure within a given domain. Knowledge
structure can be represented by core aspects such as domain concepts, nature of the relationships
between concepts, and the strength of these relationships; these representations can be elicited
and measured by creating analogies, concept maps, semantic relationship tests, network graphs,
and semantic nets (Murphy & Suen, 1999). Once knowledge structure is represented, for
example, as a network graph, it can be measured by comparing these graphs to an
instructor's/expert's and/or other students' network graphs (Clariana & Wallace, 2007; Goldsmith et
al., 1991; Jonassen et al., 1993).
Knowledge structure represented as a network graph consists of two components: nodes,
which represent concepts, and lines between nodes, which represent the relationships between
nodes or concepts. It is the relationship between nodes that, according to Goldsmith et al. (1991),
is the critical attribute of structural representations of knowledge. By using a structural
assessment approach, the relationships between concepts can be assessed. According to
Goldsmith et al. (1991), the structural approach consists of three steps: knowledge elicitation,
knowledge representation, and evaluation of individual knowledge representations. In step one,
knowledge elicitation involves the evaluation of the individual's understanding of the relationship
between concepts. The second step involves defining the representation of the knowledge. Some
frequently used procedures for defining the knowledge representation are multidimensional
scaling (MDS), cluster analysis, and additive trees (Goldsmith et al., 1991). The final step in the
structural approach is to evaluate constructed knowledge representations based on a standard.
One method of evaluating constructed knowledge representations is to compare the knowledge
structure of a novice to that of an expert (Clariana & Wallace, 2007). Comparisons are then
assessed based on a degree of similarity, with similarity defined as the number of links in
common between the compared PFNets divided by the total number of links.
Figure 2.1. A simple network graph produced by the Pathfinder algorithm using the terms
displayed on the left. When multiple PFNets are created, they can be compared with one
another, and a degree of similarity can be calculated by dividing the links they have in common
by the total number of links.
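The similarity calculation just described can be illustrated with a short Python sketch; the
example links below are hypothetical, and the function name is our own.

def pfnet_similarity(links_a, links_b):
    # Links in common divided by total unique links. Links are
    # undirected, so each is stored as a frozenset of its endpoints.
    a = {frozenset(link) for link in links_a}
    b = {frozenset(link) for link in links_b}
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical example: two small networks sharing one of three links.
novice = [("heart", "blood"), ("blood", "vein")]
expert = [("heart", "blood"), ("heart", "artery")]
print(pfnet_similarity(novice, expert))  # 1 common / 3 total = 0.33...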
Goldsmith et al. (1991) indicated that research has shown that with instruction, a student's
knowledge representation becomes similar to an expert's. The structural approach elicits
knowledge; from the elicited knowledge, structures are built (i.e., Pathfinder
networks, MDS, etc.). Once external structures are created, they can be compared and evaluated.
The evaluation produces a correlation or other measure of similarity between the
novice and the expert, from which performance can be assessed. The
higher the correlation between novice and expert knowledge structures, the better the student
has performed. Hence a scale can be created that assesses performance based on the correlational
similarity of individual knowledge structures.
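As an illustration of the correlational comparison described above, a standard Pearson
correlation can be computed as in the following Python sketch; the paired values in the usage
example are hypothetical stand-ins for novice and expert relatedness data.

import math

def pearson_r(x, y):
    # Pearson correlation between two paired lists of values.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

# Hypothetical usage: paired relatedness values from a novice and an expert.
print(pearson_r([1, 2, 3, 4], [2, 3, 5, 7]))  # about 0.99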
The research conducted by Goldsmith et al. (1991) attempted to validate assessment
methods for students' cognitive representations (knowledge structures). In that study, Pathfinder
networks were network graphs derived from the Pathfinder algorithm. The algorithm used the
relationships between pairs of items to produce a network, with comparisons between networks
reported as similarity, defined as the ratio of links common to two network
representations divided by the total number of links (Goldsmith & Davenport, 1990). A network
graph obtained from the Pathfinder algorithm is a structural modeling technique: it represents
items or concepts as nodes in a network and relationships as links between nodes.
The links within a Pathfinder network do not identify the nature of the relationship between the
concepts (Cook, Neville, & Rowe, 1996). From the Pathfinder network, a visualization of
knowledge structure can be formed.
Closeness, according to Goldsmith et al. (1991), is a “method of quantifying the
configural similarity between two networks having a common set of nodes" (p.89). Closeness is
used to quantify the similarity between graphs by assigning a similarity value between 0 and 1.
In this study, Pathfinder created the network of concepts and Closeness assessed the similarity.
The results indicated that Pathfinder networks contain unique variance beyond
proximity ratings. In addition, MDS and Closeness had better predictability of classroom test
performance than raw proximity data correlations. It is important to note that the
number of concepts used to create the Pathfinder networks and to assess them via Closeness
varied from 5 to 30 terms. They found that increasing the number of terms within pair-wise
comparisons increased the predictive validity of the Pathfinder network. Results from the
Goldsmith et al. (1991) study, reproduced in Figure 2.2, show that the mean predictive validity
of the Pathfinder network increased as the number of terms increased.
Figure 2.2. Mean predictive validity of exam performance increases as the number of
terms increases. Adapted from "Assessing structural knowledge," by T. E. Goldsmith, P. J.
Johnson, and W. H. Acton, Journal of Educational Psychology, 83, pp. 88-96. Copyright 1991 by
the American Psychological Association.
Curtis and Davis (2003) presented Pathfinder as a technique for measuring and assessing
knowledge structure within accounting education. In that study, managerial accounting students'
knowledge structures (n = 56) were analyzed using pair-wise comparisons via Closeness scores
(as defined in Goldsmith et al., 1991); declarative knowledge was measured using multiple-choice
questions and two problem-solving questions; case performance was assessed via a simulated
case-based consultant report; and self-efficacy for audit tasks was assessed using a three-item
scale measuring students' degree of confidence in auditing tasks. In order to obtain a
measure for Closeness, instructors provided a list of 30 terms within the related domain of
instruction, and a term file was created for inclusion in Pathfinder. In a rating session,
pairs of concepts were displayed to the student, who then rated them for relatedness. Once
all possible pair-wise comparisons were rated, a network structure could be developed.
Course instructors also went through this process to create a referent network structure. Once
ratings were complete, a proximity file was created for each participant. They compared student
raw proximity data with that of the instructor's raw proximity data, and Pathfinder generated
physical representations of the data.
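A minimal Python sketch of such a pair-wise rating session follows. The dummy rating function
stands in for a participant's judgments (e.g., on a 1-9 relatedness scale) and is illustrative
only, not the actual Pathfinder rating software.

from itertools import combinations

def collect_pairwise_ratings(terms, rate):
    # Present every unique pair of terms for a relatedness rating;
    # the resulting dict is the raw proximity file.
    return {(a, b): rate(a, b) for a, b in combinations(terms, 2)}

# Hypothetical usage with a dummy rater: 30 terms yield 435 ratings.
terms = [f"term{i}" for i in range(30)]
ratings = collect_pairwise_ratings(terms, lambda a, b: 5)
print(len(ratings))  # 435 = 30 * 29 / 2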
The results of the Curtis and Davis (2003) study found a significant difference in
Closeness scores (the correlation between two network structures) between pre and post
instruction. Knowledge structure post-instruction Closeness scores were higher than knowledge
structure pre-instruction scores (mean r = 0.23 vs. mean r = 0.37). Closeness scores also
positively correlated with the exam scores and case analysis. In testing discriminant
validity within the case analysis, they found that Closeness scores provided an incremental
prediction of performance similar to conventional case analysis scores. The Closeness scores were
able to discriminate between cases based on proximity data. The Curtis and Davis (2003) study
provided continued support for the use of proximity data as a predictor of student performance.
In the Curtis and Davis (2003) research as well as in the Goldsmith et al. (1991) study, the list of
terms created for eliciting knowledge structure were derived from the course author or an expert.
Comparing Knowledge Structures with Experts
Once knowledge structures are elicited, they take the quantifiable form of proximity files
and then a graphical form as network graphs (Chen & Hen, 2005; Clariana et al., 2009; Clariana
& Wallace, 2007; Murphy & Suen, 1999). The elicited knowledge structure of a single individual
enables us to see how the concepts and their relationships are structured within a given domain.
An individual’s knowledge structure alone does not allow for the assessment of the quality of that
structure. While it can provide us a structural representation of domain related knowledge, it
does not tell us to what degree the structure is correct or incorrect. In order to make that
judgment, elicited knowledge structures need to be compared both to those of other individuals
within the domain and to the expert's in the given domain (Clariana et al., 2009; Clariana &
Wallace, 2007; Murphy & Suen, 1999; Ruiz-Primo & Shavelson, 1996).
In the Murphy and Suen (1999) study, experts were used to generate a list of concepts,
provide formative feedback in developing semantic relationship testing materials, and establish a
benchmark from which Pathfinder networks were compared. When multiple experts are used as a
referent, the averages of the expert Pathfinder nets are used to provide a single referent structure
(Goldsmith et al., 1991; Murphy & Suen, 1999). When assessing knowledge structure from
closed or open-ended essays, the expert is used to create and/or validate a list of
terms (Clariana et al., 2009; Clariana & Wallace, 2007; Taricani & Clariana, 2006). Once the
expert referent structure has been created, individuals or groups can be compared using a
correlational measurement in order to see how closely the individual scores relate
to the expert's (Clariana & Wallace, 2007; Clariana et al., 2009; Diekhoff, 1983). Knowledge
structure can then be assessed: as the relationship between the expert and non-expert
becomes stronger (via correlation), we can gauge the quality of learning based on the strength of
the relationship.
Past studies have indicated that as a result of instruction the student’s knowledge
structure becomes more like the expert’s (Diekhoff, 1983; Goldsmith et al., 1991; Jonassen et al.,
1993; Kim, 2012; Thro, 1978). By comparing regression models, Thro (1978) found that
associative structures (patterns of relationships) of knowledge contribute significantly to the
prediction of achievement. Diekhoff (1983) used Pearson correlations to validate relationship
judgment tests (a method where students use a numerical scale in judging the strength of the
relationship between pairs of concepts), finding that relationship judgments were reliable for
assessing structural knowledge when comparing the results with that of an expert (average r =
0.58).
A computer-based knowledge mapping system was developed by the National Center for
Research on Evaluation, Standards, and Student Testing (CRESST) (Herl et al., 1999). Research
on this computer-based tool for constructing knowledge maps consisted of comparing computer-
generated knowledge maps of both students and student groups. The Herl et al. (1999) research
used four expert maps as the criteria for assessing the pre- and posttest scores. By using expert
referent map comparisons, they were able to gauge the success of instruction from pretest to
posttest scores. The results of this study support the use of multiple expert referent comparisons
as a methodology for gauging student learning using knowledge mapping techniques.
Analysis of Lexical Aggregates Approach
In order to assess structural knowledge within complex domains, essays are a reliable
assessment tool because, whether intentionally or not, essays contain a reflection of
the individual's knowledge structure (Clariana & Wallace, 2007). Some of the underlying
problems with essay assessment are that it is costly to assess and administer on a large scale, it
tends to be subjective, and the actual structure of the student's knowledge is not overtly apparent
(Koul et al., 2005). The actual structure of the individual's knowledge can be obscured by
additional material not directly related to the topic at hand, or organized in such a way that the
structural knowledge representation becomes unclear.
In the quest to elicit knowledge structures stored within the mind of the individual,
Pathfinder network analysis is an established and reliable approach to capturing and assessing
structural knowledge (Curtis & Davis, 2003; Dearholt & Schvaneveldt, 1990; Jonassen et al.
1993). The ALA approach was developed as a computer-based means of aggregating term data
from essays and then applying the Pathfinder approach to measure and assess knowledge
structure. By creating a valid computer-based approach for assessing structural knowledge
housed within essays, it is possible to reduce or eliminate some of the problems that plague essay
assessment, such as cost and human rater bias (Nitko & Brookhart, 2007). There is growing
research on validating essay assessment using the computer-based ALA approach.
Validating the ALA approach
The ALA-Reader software is a utility used to translate written text summaries into a
proximity file for analysis by PCKNOT software (Clariana & Koul, 2004). Using ALA-Reader
software, Clariana and Koul (2004) investigated a computer-based approach to translate text into
network graph representations. In this research, ALA-Reader translated twenty-four written
summaries of the heart and circulatory system into proximity data for processing in PCKNOT
software. The ALA-Reader used 26 terms generated by text-occurrence frequency analysis,
analyses term co-occurrence, converted term co-occurrence into propositions, and finally
aggregated across sentences into a proximity array. The PCKNOT software then transformed the
proximity data into visual PFNets. Sixteen of the text summaries were rated by 11 pairs of
human raters and a PFNet rating. The ALA-Reader PFNet rating was ranked the 5th highest
correlation (r = 0.69) out of 12 scores. The Clariana and Koul (2004) research indicates that a
computer-based method for capturing knowledge structure is valid as it assesses similar
knowledge to that of the human rater.
A study conducted by Koul et al. (2005) sought to continue the investigation on
methodologies for computer-based approaches for scoring essays and concept maps. This study
compared several computer-based approaches for scoring concept maps against essays scored by
human raters. The three computer-based approaches were ALA-Mapper for concept-maps, ALA-
Reader for essays, and Latent Semantic Analysis for essays. The ALA-Reader ranked fifth out of
13 raters, which is consistent with the Clariana and Koul (2004) study, whereas LSA ranked ninth
out of 13 raters. Only the ALA-Mapper was used to score concept-maps compared to human
raters using quantitative rubrics (a rubric focusing on correctness) and a qualitative rubric (a
rubric focusing on mental representation of content). It is interesting to note that ALA-Mapper
ranked higher (1 and 5) in the list of scores when the qualitative rubric was used compared to
score concept maps with the quantitative rubric (8 and 12). Since the qualitative rubric used by
the raters focuses on mental representations, the ranking of 1 and 5 provides validation of the
ALA-Mapper as an approach for measuring and assessing structural knowledge. The results of
this study indicated that both the human raters and computer-based scores capture some of the
same information about process knowledge within a given domain, thus presenting a step toward
validating a low-cost automatic process for scoring essays in this way.
As more and more research begins to indicate that computer-based methods are a cost-
effective way to assess essays (Clariana & Koul, 2004; Koul et al., 2005), the techniques for
automatically assessing essays need to be fine-tuned and validated. Clariana and Wallace (2007)
conducted a proof-of-concept investigation on deriving and measuring knowledge structure from
essay exams from individuals and teams. The study scored essays using ALA-Reader software
employing the within-sentence aggregation approach and a new approach (added to the software)
called the linear aggregation approach.
The linear approach aggregates terms both within and across sentences and always
produces a connected graph, whereas connectedness is not guaranteed in the sentence-
wise approach, which tends to produce a disconnected graph of multiple unconnected clusters
(Clariana & Wallace, 2007). The study compared linear aggregation scores with the within
sentence aggregation scores as well as human scores and multiple choice scores. A frequency list
of terms provided 30 important concept terms, which were then stemmed (removal of suffixes to
obtain root words) and used by the ALA-Reader. The study found that ALA-Reader scores fell
between the two human rater scores for both the sentence and linear approach. The sentence
approach tended to score essays higher than the linear approach, although the linear approach did
fall between the scores of the human raters.
When correlating the scores with the human rater and the multiple-choice test, the ALA-
Reader using the linear approach correlated more highly with the human rater than did the
sentence approach, and both the linear and sentence approaches correlated weakly with the
multiple-choice tests (sentence approach r = 0.17; linear approach r = 0.39). These correlations
should be low, since multiple-choice tests tend not to assess structural knowledge. These results
continue to support the ALA-
Reader approach as a tool to measure structural knowledge, as well as the use of the linear
approach as the algorithmic process for producing proximity files.
The Clariana and Wallace (2007) study also investigated whether the ALA-Reader
could measure team knowledge by grouping the essays into a high-score group (n = 14) and a
low-score group (n = 15), with group membership determined by a median split of the human
rater scores. PFNets were created by averaging the proximity files within a group, resulting in a
single PFNet for each group. Similarity (the intersection of links in common between two
PFNets divided by the union of all unique links in the two) yields a score ranging
from 0 (no similarity) to 1 (perfect similarity). The research found that the high-scoring group
(r = 0.31) was more similar to the expert's essay than the low-scoring group (r = 0.19). The
results of this study provided support for the ALA-Reader as a valid approach for measuring
team knowledge derived from essays. The ability of the ALA-Reader to measure team
or group knowledge could provide a cost-effective, computer-based approach for validating
human rater scores.
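A sketch of the group-averaging step, under the simplifying assumption that proximity files are
square arrays of like dimensions, might look as follows in Python; the example arrays are
hypothetical.

def average_proximity(arrays):
    # Average individual proximity arrays element-wise into a single
    # group-level array, which would then be fed to Pathfinder to
    # produce one group PFNet.
    k = len(arrays)
    n = len(arrays[0])
    return [[sum(a[i][j] for a in arrays) / k for j in range(n)]
            for i in range(n)]

# Hypothetical usage with two 2x2 proximity arrays.
group = average_proximity([[[0, 1], [1, 0]], [[0, 0], [0, 0]]])
# group == [[0.0, 0.5], [0.5, 0.0]]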
The Clariana et al. (2009) study investigated how manually replacing pronouns before
analysis with their noun referents might improve individual scores using both the linear and
sentence aggregation approaches. Specifically, pronouns are a serious problem for text pattern
matching, because it is difficult to accurately associate the pronoun with its antecedent noun and
so derived patterns will contain more error. Overall, the results indicated that the linear
aggregation approach correlated with the human raters (r = 0.74) better than the sentence
aggregation approach (r = 0.44). Replacing the pronouns with their referents had little effect on
the linear aggregation approach (r = 0.71) and had a small positive effect on the sentence
aggregation approach (r = 0.51). The authors of the study concluded that the linear approach is
less influenced by pronouns and superior to the sentence aggregation approach overall when
comparing individual scores, and narrowly superior when conducting group-wise comparisons.
Therefore, adding an additional pronoun-handling subroutine would not substantially improve the
validity of the ALA-Reader when deriving knowledge structures from essays; the ALA approach
is robust.
Creating a List of Terms
When eliciting knowledge structure from essays, the foundation of the process is creating
a list of terms. These terms become the concepts used for pattern matching that make up the
linked nodes forming the knowledge structure represented as a network graph. Past research
derived concepts from a predefined list of terms in order to create network graphs (Koul et al.,
2005; Curtis & Davis, 2003; Goldsmith et al., 1991). For example, close-ended network graphs
have a predefined set of terms, whereas open-ended network graphs can have a high degree of
variance in terms (Taricani & Clariana, 2006). Essays are intrinsically open-ended because it is
nearly impossible and unrealistic to constrain a writer to just one list of terms. Taricani and
Clariana (2006) developed a technique for automatically scoring open-ended network graphs
using frequency counts of words to determine the important words, a technique that can be
extended to essay scoring. This approach can be applied to derive a list of terms for scoring
essays.
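The frequency-count idea can be sketched in a few lines of Python. The stopword list and
function name below are illustrative only; in practice, the resulting list is typically reviewed
or validated by a content expert.

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}

def important_terms(text, n=20):
    # Rank non-stopword words by raw frequency and keep the top n
    # as candidate key terms.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]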
In actual educational settings, deriving knowledge structure from essays produces open-
ended network graphs, which is a more accurate way to capture knowledge structures (Taricani &
Clariana, 2006). Since the inception of the ALA-Mapper and ALA-Reader approach, the focus of
research has been on the algorithmic processes for the aggregation of data, such as the linear
aggregation approach versus the sentence aggregation approach (Clariana & Wallace, 2007;
Clariana et al., 2009). Most research conducted on validating the ALA-Mapper for maps and the
ALA-Reader for essays did not have much variance in the number of terms (ranging from the mid
to high twenties). The foundation for measuring structural knowledge is rooted in the creation of
a term list; one could proffer that without a valid list of terms one cannot have a valid measure
of structural knowledge.
What do we know about the number of terms needed to validly represent knowledge
structure? Research conducted by Goldsmith et al. (1991) shows that when using pair-wise rating
tasks, the predictive validity of Pathfinder networks increases in an almost linear fashion as the
number of terms increases. Clariana and Taricani (2010) investigated how increasing the number
of terms affected the predictive ability of network graph scores, and their results were contrary
to what they expected. They created three lists of 16 terms, 26 terms, and 36 terms; the network
graphs with 26 terms had the best predictive ability, in contrast to the Goldsmith et al. (1991)
study. One major difference between the two studies is that the Goldsmith et al. (1991) network
graphs were not open-ended, whereas the Clariana and Taricani (2010) network graphs were
open-ended. Focusing solely on the term list, another difference is that in the Clariana and
Taricani (2010) study the term list intervals were 10 terms, whereas in the Goldsmith et al.
(1991) study the intervals were 5 terms (the minimum term list had 5 terms, the maximum 30
terms).
Past research on the ALA-Reader (Clariana et al., 2009; Clariana & Koul, 2008; Clariana
& Wallace, 2007; Taricani & Clariana, 2006) would support the Clariana and Taricani (2010)
results, as these studies created term lists with terms in the mid to high twenties for essays about
500 words long. In the Clariana and Taricani (2010) study with concept maps, variance in the
students' selection of important terms could have affected the results. According to Clariana and
Taricani (2010), "apparently, a few of the most important terms contribute to most of the
predictive ability of the concept maps created by the non-experts" (p. 170). This statement
indicates that predictability is affected not only by the number of terms but by a combination of
the number and quality of the terms. Clariana and Taricani (2010) conclude
that further refinement of the tool and approach is warranted, and also that the number of terms
may be a confounding variable when attempting that refinement.
A recent study conducted by Clariana, Wolfe, and Kim (2014) specifically recommended
that the number of terms used in analysis should be the focus of future research. Their research
sought to continue the validation of the ALA-approach using Pathfinder analysis with both a
linear approach and a minimum distance approach applied to narrative and expository lesson
texts. The linear approach was performed by the ALA-Reader, which generates the proximity
files containing only ones and zeros to indicate the sequential occurrence of terms in text
(Clariana et al., 2009). The minimum distance approach used an Excel spreadsheet to establish
minimum distance values between all of the selected important terms (Clariana et al., 2014).
Both the sequential approach (median r = 0.70) and the minimum distance approach (median r =
0.67) provided comparable measurements of text structure as a proxy for knowledge structure.
Additionally, the number of terms used by the software in the study was 17. The study provides
additional evidence that both approaches are valid, although the linear approach performed
slightly better than the minimum distance approach.
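Under the ones-and-zeros description above, the linear (sequential) aggregation can be sketched
in Python as follows; this is a simplified illustration, and the actual ALA-Reader
implementation may differ in detail.

def linear_proximity(word_sequence, terms):
    # Walk the text's word sequence and mark 1 whenever two key terms
    # occur one after the other (non-key words in between are skipped).
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    prox = [[0] * n for _ in range(n)]
    hits = [index[w] for w in word_sequence if w in index]
    for a, b in zip(hits, hits[1:]):
        if a != b:
            prox[a][b] = prox[b][a] = 1
    return prox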
The Clariana et al. (2014) research is another study that varies the text-to-data
process as an independent variable but leaves the number of terms static. Although the authors
state several limitations to the generalizability of the research, the establishment of an optimal
number of terms is of particular interest to the current study. Clariana et al. (2014) noted that
not all participants used all 17 of the important terms in their essays and that missing terms
could negatively impact PFNet structure, since missing important terms typically create multiple
missing links; therefore, future research is needed to determine the optimal number of terms for
pattern matching.
In summary, past research on the ALA approach for scoring network graphs and essays
has focused primarily on how knowledge is elicited as proximity files, for example, comparing
list-wise, pair-wise, and clustering approaches (Clariana & Wallace, 2009), the within-sentence
approach versus the linear aggregate approach (Clariana & Wallace, 2007; Clariana et al., 2009),
the sequential approach (Clariana et al., 2009), and the minimum distance approach (Clariana et
al., 2014). The results of these studies continued to increase the validity of the ALA approach as
a means to elicit structural knowledge, but one area of exploration has been lacking: the number
of terms used to create the proximity files (Clariana & Taricani, 2010; Clariana et al., 2014).
All of these previous studies asked content experts to give a frequency list of terms. Therefore,
each study had a term list of different length, based partly on the content of the participants'
essays but mainly on the intuition of the experts.
The purpose of the current investigation is an attempt to answer the question: “does an
optimal number of terms exist that can be used to create proximity files or are more terms better?"
To answer this question, the current investigation plans to hold the elicitation approach for
creating the proximity files constant and focus on varying the number of terms used to create the
proximity files in order to determine what effect it has on the convergent validity of essay scores
relative to human scores. In the end, this investigation hopes to find a starting point for the
creation of a protocol for optimizing the quality and quantity of terms used to create proximity
files and further the validity of the ALA approach as a mechanism for the elicitation of structural
knowledge.
Chapter 3
METHODOLOGY
The purpose of this study was to further validate the ALA-Reader as an
automated tool for scoring essays by varying the number of terms used to create raw
proximity files and then comparing the derived network graphs to five referent network
graphs derived from essays. Construction of a valid network graph should reveal the
underlying structural knowledge of the student within a given domain. This validation
was conducted by varying the number of terms used to score network graphs, similar to
the Clariana and Taricani (2010) study, and comparing them to other groups and
individuals (Clariana & Wallace, 2007; Clariana et al., 2009; Clariana, 2010b). Further
validation of the ALA-Reader was conducted by comparing the structural makeup of
network graphs representing varying degrees of expertise (i.e., comparing an expert's
essays with lower-scored essays) (Clariana & Wallace, 2007). This chapter describes the
methods of the study.
Participants
Archived exam data was secured from students enrolled in a Distance Education
Strategic Studies program at a Senior Military Learning Institution. These students were
U.S. military officers ranked O-5 (Lieutenant Colonel or branch equivalent) and O-6
(Colonel or branch equivalent), senior international officers ranked O-5 or higher, and
senior-grade (GS-14 or higher) U.S. civilian government employees. Course enrollment
was 361 students. All participants have an undergraduate college degree and are senior
civilian or military leaders.
The standard IRB process for conducting research at the Pennsylvania State
University was adhered to in order to obtain approval to conduct the research. The IRB
approved the research as non-human subjects research since archived student data was
collected for the current study. In addition to the standard IRB process, approval was
obtained from the Office of Institutional Research of the military educational institution
from which the archived data were obtained. As a condition of approval, all references
that directly or indirectly identify a student or students within the archived data set
and/or research paper were removed.
Course Materials
Students who attended this military educational institution were enrolled in one of
two programs: a nine-month resident education program or a two-year distance
education program. Although the curricula of these two programs differ in
organization and time, students in both programs are awarded the same graduate degree -
a Master's in Strategic Leadership.
The focus of the current research is on essays produced by students enrolled in the
Distance Education Program's (DEP) core curriculum. The DEP is a Middle States- and
JPME-1-accredited master's degree program. The DEP consists of eight primary courses,
an elective course, and two resident courses taken over a two-year period. All
courseware is available online with a few books that are mailed to the students. Each
course starts with a directive, which is similar to a course syllabus. Course content is
organized into multiple blocks of instruction. Each block is divided into multiple
sections and within each section are multiple lessons, which are the basic units of
instruction. For most of the courses, students are assessed by writing a comprehensive
essay and by participating in an evaluated online forum. The focus of this research is on
the end-of-course comprehensive essay the student completes as part of the course's core
curriculum requirement.
A course is created by a course author, who is responsible for content creation
and for defining the evaluation criteria for the course. All course authors must have a
graduate degree in strategic leadership or a doctorate with specialization in a strategic
area. Course content and evaluation criteria are validated by other faculty members
within the course author's year group, year group directors, and the department chairman
through a series of meetings that take place after one iteration of the course is completed
and before the next iteration begins (approximately a year later). All
course material is then converted for online delivery and management by an instructional
support group (ISG). The ISG is a group of individuals experienced in developing online
courseware and instructional methodologies. Table 3.1 shows the DEP courseware content.
Table 3.1. DEP Courseware Content.
_______________________________________________________________________
Content                        Description
e-Book                         HTML page that contains course instruction and
                               scaffolds the learning.
Lectures                       Video or audio recordings of lectures by guest
                               speakers who have appeared at the USAWC or other
                               schools, pertaining to the objectives of the course.
Presentations                  PowerPoint presentations that discuss processes,
                               policies, or plans.
Interactive Learning Modules   Multimedia learning modules that require students
                               to interact with the instruction.
Models                         Images or graphics that demonstrate a policy or
                               process.
Readings                       PDF documents that contain information relevant to
                               the completion of the course, or additional material
                               if the student wants to learn more.
Self-Diagnostics               Non-graded multiple-choice quizzes that allow
                               students to assess their own understanding of the
                               course objectives.
External Links                 Websites external to the courseware that provide
                               additional support to the course objectives.
Library Resources              Online library access to download additional
                               readings, books, and presentations.
_______________________________________________________________________
Students access all courseware materials online through a student management
system called the Online Automated Student Information System (OASIS) and the
courseware website. OASIS is a customized student management system (SMS)
designed specifically for the distance and resident programs. OASIS manages the
student’s access to courseware but does not deliver the courseware. Table 3.2 lists the
features of OASIS.
Table 3.2. Features of OASIS.
_______________________________________________________________________
Features                      Purpose
Student Authentication        Authenticates students into the system.
Courseware Access             Provides links to the courseware, which exists on
                              another server.
Forum (used to evaluate       Provides asynchronous discussion among students and
students)                     the course author, and provides functions to upload
                              and download files. It is not used as a social
                              learning tool/vehicle, but rather as an assessment
                              tool near the completion of a course.
News and Events               A list of news and events the student should be
                              aware of regarding the course, class, or curriculum.
Task List                     A list of tasks the student needs to accomplish; it
                              usually holds additional information about the
                              course requirements.
Other administration tools    Other features and student self-service options that
                              pertain to administrative links, student info, and
                              college links unrelated to the courseware itself.
_______________________________________________________________________
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned
DanielFanella_DissertationSigned

More Related Content

Similar to DanielFanella_DissertationSigned

Reference & Research W Finds
Reference & Research W  FindsReference & Research W  Finds
Reference & Research W Finds
out2sea5
 
Reference Research Secondary 01revleslearned 1233335774105732 3
Reference Research Secondary 01revleslearned 1233335774105732 3Reference Research Secondary 01revleslearned 1233335774105732 3
Reference Research Secondary 01revleslearned 1233335774105732 3
out2sea5
 
Exercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docx
Exercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docxExercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docx
Exercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docx
SANSKAR20
 

Similar to DanielFanella_DissertationSigned (20)

Lacue presentation 2013 ccss and tech
Lacue presentation 2013 ccss and techLacue presentation 2013 ccss and tech
Lacue presentation 2013 ccss and tech
 
Mde dennismargueratt thesis
Mde dennismargueratt thesisMde dennismargueratt thesis
Mde dennismargueratt thesis
 
Mde dennismargueratt thesis
Mde dennismargueratt thesisMde dennismargueratt thesis
Mde dennismargueratt thesis
 
Reference & Research W Finds
Reference & Research W  FindsReference & Research W  Finds
Reference & Research W Finds
 
Reference Research Secondary 01revleslearned 1233335774105732 3
Reference Research Secondary 01revleslearned 1233335774105732 3Reference Research Secondary 01revleslearned 1233335774105732 3
Reference Research Secondary 01revleslearned 1233335774105732 3
 
AN EXPLORATORY CASE STUDY FACULTY INVOLVEMENT IN DEVELOPING WRITING-INFUSED ...
AN EXPLORATORY CASE STUDY  FACULTY INVOLVEMENT IN DEVELOPING WRITING-INFUSED ...AN EXPLORATORY CASE STUDY  FACULTY INVOLVEMENT IN DEVELOPING WRITING-INFUSED ...
AN EXPLORATORY CASE STUDY FACULTY INVOLVEMENT IN DEVELOPING WRITING-INFUSED ...
 
Alt I Lab 2005 Ep Services
Alt I Lab 2005 Ep ServicesAlt I Lab 2005 Ep Services
Alt I Lab 2005 Ep Services
 
Academic Phrasebank Navigable PDF
Academic Phrasebank Navigable PDFAcademic Phrasebank Navigable PDF
Academic Phrasebank Navigable PDF
 
Achterman csla 2011reading_online
Achterman csla 2011reading_onlineAchterman csla 2011reading_online
Achterman csla 2011reading_online
 
Csla presentation reading online 2011
Csla presentation reading online 2011Csla presentation reading online 2011
Csla presentation reading online 2011
 
Learning Analytics & the Changing Landscape of Higher Education
Learning Analytics & the Changing Landscape of Higher EducationLearning Analytics & the Changing Landscape of Higher Education
Learning Analytics & the Changing Landscape of Higher Education
 
Libraries, collections, technology: presented at Pennylvania State University...
Libraries, collections, technology: presented at Pennylvania State University...Libraries, collections, technology: presented at Pennylvania State University...
Libraries, collections, technology: presented at Pennylvania State University...
 
Essay Revision Online.pdf
Essay Revision Online.pdfEssay Revision Online.pdf
Essay Revision Online.pdf
 
Writing a tenure statement 2011
Writing a tenure statement 2011Writing a tenure statement 2011
Writing a tenure statement 2011
 
Conceptual framework
Conceptual frameworkConceptual framework
Conceptual framework
 
Exercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docx
Exercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docxExercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docx
Exercise 3 Worksheet Create a Peer Reviewed ReferenceSave this .docx
 
How to Integrate Technology into the Curriculum
How to Integrate Technology into the CurriculumHow to Integrate Technology into the Curriculum
How to Integrate Technology into the Curriculum
 
Authorial Stance In Accounting PhD Theses In A Nigerian University
Authorial Stance In Accounting PhD Theses In A Nigerian UniversityAuthorial Stance In Accounting PhD Theses In A Nigerian University
Authorial Stance In Accounting PhD Theses In A Nigerian University
 
Research 36. How to Write Significance. Code.601.pptx
Research 36. How to Write Significance.  Code.601.pptxResearch 36. How to Write Significance.  Code.601.pptx
Research 36. How to Write Significance. Code.601.pptx
 
ACRL's Framework for Information Literacy for Higher Education: Implications ...
ACRL's Framework for Information Literacy for Higher Education: Implications ...ACRL's Framework for Information Literacy for Higher Education: Implications ...
ACRL's Framework for Information Literacy for Higher Education: Implications ...
 

DanielFanella_DissertationSigned

  • 1. The Pennsylvania State University The Graduate School College of Education THE EFFECTS OF CHANGING THE NUMBER OF TERMS USED TO CREATE PROXIMITY FILES ON THE PREDICTIVE ABILITY OF SCORING ESSAY-DERIVED NETWORK GRAPHS VIA THE ALA-READER APPROACH A Dissertation in Learning, Design, and Technology by Daniel F. Fanella  2015 Daniel F. Fanella Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy August 2015
  • 2. ii The dissertation of Daniel F. Fanella was reviewed and approved* by the following: Roy B.Clariana Professor of Education (Learning, Design, and Technology) Dissertation Advisor Chair of Committee Susan M. Land Associate Professor of Education (Learning, Design, and Technology) Major Field Member Priya Sharma Associate Professor of Education (Learning, Design, and Technology) Major Field Member Ravinder Koul Associate Professor of Education (Curriculum and Instruction) Outside Field Member and Outside Field Member *Signatures are on file in the Graduate School
  • 3. iii ABSTRACT Knowledge structure is the interrelationship between the concepts within a given domain that exists in the memory associations of an individual that can be captured in external artifacts. Assessing knowledge structure consists of eliciting approaches, representation approaches, and then comparing the structural representations. In the pursuit to create an automated method to capture and assess knowledge structure, the Analysis of Lexical Aggregates tool (ALA-Reader) has shown promise. Research has supported the ALA- Reader as a computer based approach for automatically scoring network graphs, measuring individual, and group knowledge from essays, and as an approach to elicit knowledge structure. The ALA-Reader translates written text summaries into a proximity file to aggregate data at the sentence level to score essays in comparison to human raters. In order to create proximity files, the ALA-Reader needs to have a list of terms. Past research on the ALA approach for scoring network graphs and essays has focused primarily on how knowledge is elicited as proximity files, for example, within sentence approach verses the linear aggregate approach, but the number of terms used to create the proximity files was not a variable. The purpose of this current non- experimental exploratory investigation is to attempt to answer the question: “does an optimal number of terms exist that can be used to create proximity files or are more terms better?" To answer this question the current investigation plans to hold the elicitation approach for creating the proximity files constant, and focus on varying the number of terms used to create the proximity files in order to determine what effect this has on the convergent validity of essay scores relative to human rater scores. This study found that when using the ALA-Reader to create proximity arrays, 20 terms consistently had the highest correlation of the word lists. This result was constant when applied to all five
  • 4. iv referent expert essays. These results suggest more terms are not necessarily needed to create valid proximity arrays, which may increase the practicality of using the ALA- Reader as a valid tool for automatically scoring essays. This study also extends past research by considering essays that are considerably longer than those in the past; also the type of essays analyzed in this study were argumentative essays. Future research is needed to refine these results by applying the current study’s methodologies to other types of essays i.e. persuasive, expository, or narrative, and to smaller restricted response essays, blogs, or online forums.
TABLE OF CONTENTS

LIST OF FIGURES  vii
LIST OF TABLES  ix
ACKNOWLEDGEMENTS  xi

Chapter 1. INTRODUCTION  1
    The Analysis of Lexical Aggregates  5
    Statement of the Problem, Purpose, and Research Questions  8
    Definitions  10

Chapter 2. REVIEW OF THE LITERATURE  12
    Essay Assessment  13
    Methods of Capturing and Assessing Structural Knowledge  16
        Measuring Knowledge Structure Using Concept Maps  16
        Knowledge Elicitation Approaches  20
        Other Approaches for Eliciting Knowledge Structure  22
    Assessing Knowledge Structure with the Pathfinder Approach  26
        Comparing Knowledge Structures with Experts  30
    Analysis of Lexical Aggregates Approach  32
        Validating the ALA Approach  33
        Creating a List of Terms  36

Chapter 3. METHODOLOGY  40
    Participants  40
    Course Materials  41
    Research Purpose and Context  45
    Criterion Measures  46
    Essay Scoring Procedures  48
        Developing Course PONY Document  48
        Essay Assessment  48
        The Sync Meeting and After Action Report  49
    Essay Requirement  51
    Procedures  53
        Referent Essays Defined  54
        Essay Data Collection and Preparation  54
        Generating a List of Terms  55
        Finalizing the Importance of Terms  57
        Creating the Proximity Files and PFNets  58

Chapter 4. RESULTS  60
    Term List  60
    Proximity File Analysis of All Essays  61
        Benchmark Referent Essay One: Containment Approach  63
        Benchmark Referent Essay Two: Deterrence Approach  64
        Benchmark Referent Essay Three: Engagement Approach  66
        Benchmark Referent Essay Four: Student Expert  67
        Benchmark Referent Essay Five: PONY Document  68
    Proximity File Analysis of Essays with All Scores Excluding 3-Scores  71
        Benchmark Referent Essay One: Containment Approach  71
        Benchmark Referent Essay Two: Deterrence Approach  74
        Benchmark Referent Essay Three: Engagement Approach  75
        Benchmark Referent Essay Four: Student Expert  76
        Benchmark Referent Essay Five: PONY Document  78
    Comparison of the Proximity File Analysis of Essays with All Scores and the Proximity File Analysis of Essays without the 3-Scores  79

Chapter 5. GENERAL DISCUSSION  81
    Summary of Results  81
    Research Implications  86
    Limitations of the Study  92
    Future Research  94

References  98
Appendix A  107
Appendix B  109
Appendix C  115
Appendix D  117
LIST OF FIGURES

Figure 1.1. Diagram of Various Elicitation, Representation, and Comparison Approaches  4
Figure 2.1. Simple Network Graph Produced by the Pathfinder Algorithm  27
Figure 2.2. Mean Predictive Ability Based on Number of Terms  29
Figure 4.1. Example 20-Term Link Array  62
Figure 4.2. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Containment Approach Referent by the Number of Terms Used to Create Proximity Link Arrays  64
Figure 4.3. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Deterrence Approach Referent by the Number of Terms Used to Create Proximity Link Arrays  66
Figure 4.4. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Engagement Approach Referent by the Number of Terms Used to Create Proximity Link Arrays  67
Figure 4.5. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Student Expert Essay by the Number of Terms Used to Create Proximity Link Arrays  68
Figure 4.6. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the PONY Document by the Number of Terms Used to Create Proximity Link Arrays  70
Figure 4.7. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Containment Approach Referent by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores  73
Figure 4.8. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Deterrence Approach Referent by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores  75
Figure 4.9. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the Engagement Approach Referent by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores  76
Figure 4.10. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with a Student Expert Essay by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores  77
Figure 4.11. Comparison Between the R-value Measuring the Relationship Between Score and Links in Common with the PONY Document by the Number of Terms Used to Create Proximity Link Arrays w/o 3-Scores  79
Figure 5.1. Comparison of the Five Benchmark Referent Essays  83
Figure 5.2. Comparison of Past Studies with the Current Study to Find the Optimal Range  86
Figure 5.3. Comparison of the Anticipated Optimal Range with the Optimal Range Based on the Results  91
LIST OF TABLES

Table 3.1. DEP Courseware Content  42
Table 3.2. Features of OASIS  44
Table 3.3. Grading Scaling and Number of Essays  47
Table 3.4. Textalyser Word List Based on Frequency of Terms  57
Table 3.5. Correlation Comparison of Terms by Benchmark Referent  59
Table 4.1. Summary of the Correlations Between Student Score and Links in Common with the Containment Approach Referent Based on Number of Terms (N = 215)  63
Table 4.2. Summary of the Correlations Between Student Score and Links in Common with the Deterrence Approach Referent Based on Number of Terms (N = 215)  65
Table 4.3. Summary of the Correlations Between Student Score and Links in Common with the Engagement Approach Referent Based on Number of Terms (N = 215)  66
Table 4.4. Summary of the Correlations Between Student Score and Links in Common with the Student Expert Based on Number of Terms (N = 215)  67
Table 4.5. Summary of the Correlations Between Student Score and Links in Common with the PONY Document Based on Number of Terms (N = 215)  69
Table 4.6. Summary of the Correlations Between Student Score and Links in Common with the Containment Approach Referent Based on Number of Terms (N = 70) without 3-Score Essays  72
Table 4.7. Summary of the Correlations Between Student Score and Links in Common with the Deterrence Approach Referent Based on Number of Terms (N = 70) without 3-Score Essays  74
Table 4.8. Summary of the Correlations Between Student Score and Links in Common with the Engagement Approach Referent Based on Number of Terms (N = 70) without 3-Score Essays  75
Table 4.9. Summary of the Correlations Between Student Score and Links in Common with a Student Expert Essay Based on Number of Terms (N = 70) without 3-Score Essays  77
Table 4.10. Summary of the Correlations Between Student Score and Links in Common with the PONY Document Based on Number of Terms (N = 70) without 3-Score Essays  78
Table 4.11. Summary of the Correlations Between the PFALL Analysis and the PFW3 Analysis  80
ACKNOWLEDGEMENTS

First, I would like to thank God for providing me with all the opportunities to finish my degree and making all things work out. I would like to thank my wife, Michele, and my children, Madelyn and Eden, for their encouragement and for putting up with me when I had to sequester myself from the rest of the world. I would also like to thank my grandparents, Frank and Peggy Rovito, for all their support over the years; without them, none of this would have been possible. Finally, I would like to thank my advisor, Dr. Roy Clariana, for his guidance and support, and my committee (Dr. Susan Land, Dr. Priya Sharma, and Dr. Ravinder Koul) for asking the tough questions at the proposal and defense.
Chapter 1

Introduction

When individuals read text and then write an essay on what they read, they produce and reproduce chunks of sequential language that represent higher levels of knowledge (Clariana, Wolfe, & Kim, 2014). These "text structures" are the building blocks used to construct high-dimensional representations of knowledge. The configuration of high-dimensional knowledge is stored in the mind of an individual in the form of structural knowledge, a type of knowledge that captures high dimensions of knowing (Jonassen, Beissner, & Yacci, 1993). Structural knowledge may contain multiple levels of complexity because it consists of the schematic makeup of the interrelationships between concepts (knowledge structure) in the individual's mind. Thus, structural knowledge assessment may be essential in measuring complex and intricate levels of knowledge. The structure of knowledge may reside within theoretical constructs such as mental models and schema, which are not directly observable because they consist of cognitive processes internal to the individual (Ifenthaler, 2008). The question becomes: what are the means to externally visualize, measure, and/or assess structural knowledge?

Essay exams have been shown to be an important measure of complex knowledge (Clariana, Wallace, & Godshalk, 2009). They are used to evaluate and promote higher levels of knowledge and understanding (Diekhoff, 1983). In education, using essays as a method of evaluating students may be the most common way to assess the knowledge students possess about a given topic, especially when evaluation takes place within a complex domain. An essay, according to Nitko and Brookhart (2007),
  • 13. 2 "offers students the opportunity to display their abilities to write about, to organize, to express, and to explain interrelationships among ideas" (p.191). When writing an essay, a student begins to restructure the low dimensional linear structures in the course texts in order to construct higher dimensional relational knowledge structures of the given domain or topic. According to Clariana et al. (2009), “measuring the progress of learning in complex domains is an import issue for instructional designers, instructors, and researchers” (p. 726). Although essays can contain the structure of the student’s knowledge of the topic at hand, they also contain idiosyncratic or even extraneous content that can cloud or even hide the learner’s actual knowledge structure. Knowledge structure is the interrelationship between the concepts within a given domain and is something that exists in the memory associations of an individual that can be captured in external artifacts (Clariana, 2010a). Kim (2012) summarizes many methods that have been developed to capture knowledge structure, such as ALA-Mapper (Analysis of Lexical Aggregates Mapper), ALA-Reader (Analysis of Lexical Aggregates Reader), DEEP (Dynamic Evaluation of Enhanced Problem-Solving), SMD (Surface Matching and Deep Structure), and KNOT (Knowledge Network Orientation Tool), to name a few. Research conducted by the National Center for Research on Evaluations Standards and Student Testing (CRESST) has shown promising results in visually creating mental models in order to compare team or shared mental models with that of an expert (Herl, O'Neil, Chung, & Schacter, 1999). Automated Knowledge Visualization and Assessment (AKOVIA) is a methodology that uses various algorithms for visually representing internal and shared mental models and team performance (Ifenthaler, 2014b). The Texas Christian University Node-Link Mapping system uses various types
of mapping systems, such as information maps, guide maps, and freestyle maps, to visualize knowledge (Dansereau, 2005). Although the protocols, mapping systems, and algorithms differ among the aforementioned methods, they all have a similar goal: to identify the structural make-up of the knowledge within a given domain residing in the mind of the individual. Figure 1.1 shows the various types of elicitation approaches, representation approaches, and comparison approaches for the three-stage process of assessing and representing structural knowledge.

Conversion of essays into network graphs (visualizations that show the relationships between concepts) is one method used to elicit, represent, and then compare knowledge structure because, according to Koul, Clariana, and Salehi (2005), network graphs can provide a visual and holistic way of representing knowledge structure. Although essays and network graphs are related, network graphs provide a visual structure of knowledge that essays do not.
Figure 1.1. Shows the various types of knowledge elicitation, knowledge representation, and knowledge comparison approaches. Adapted from Structural Knowledge: Techniques for Representing, Conveying, and Acquiring Structural Knowledge, by D. H. Jonassen et al., 1993, p. 22. Copyright 1993 by Lawrence Erlbaum Associates, Hillsdale, NJ.

Following from a series of design studies, this investigation will elicit knowledge structure as linear/sequential patterns in essays, represent the patterns as network graphs, and then compare these graphs to an expert-derived referent. Past research has shown that Pathfinder network graphs can be extracted from essays and produce graphical representations of knowledge structure (Clariana, 2010a; Clariana et al., 2009). Pathfinder networks can also be used to simplify complex networks (Schvaneveldt, 1990b). Knowledge structure, according to Clariana (2010a), is "the precursor of meaningful expression and is the underpinning of thought" (p. 41). Knowledge structure can be represented in knowledge maps, which consist of nodes and
links (Clariana, 2010b; Dansereau, 2005; Ifenthaler, 2008; Ruiz-Primo, Shavelson, Li, & Schultz, 2001). The linking of terms within a knowledge map provides a visual structure of how the terms within a map relate to each other. The relationship of concepts (terms) can be graphically represented via the terms and links forming a network graph. Once a graphical representation is obtained, these graphical models can be compared quantitatively using array proximity data. Proximity data is an n-by-n matrix, where n is the number of terms (concepts), representing either distance or adjacency (Kim, 2012). The distance measure considers all pair-wise distances between terms, calculated either from the positions of the terms in space or from relatedness directly judged by students on an ordinal scale (Taricani & Clariana, 2006). The adjacency measure relates paired terms in an n-by-n matrix of 1's (indicating a connection between terms) and 0's (indicating no connection between terms) (Clariana et al., 2009). Koul, Clariana, and Salehi (2005) indicated that several dimensions of information exist within network graphs, yet there is no one singular method for generating network graphs. One way to elicit network graphs derived from essays, and the focus of the current research, is the analysis of lexical aggregates approach.
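To make these two kinds of proximity data concrete, the following is a minimal sketch in Python; the five terms, the links, and the node coordinates are all invented for illustration and are not taken from any of the studies cited here. It builds one adjacency matrix from term links and one distance matrix from node positions:

```python
import numpy as np

terms = ["containment", "deterrence", "engagement", "policy", "treaty"]
n = len(terms)

# Adjacency proximity: 1 where two terms are linked, 0 where they are not.
links = [("containment", "policy"), ("deterrence", "treaty"), ("policy", "treaty")]
adjacency = np.zeros((n, n), dtype=int)
for a, b in links:
    i, j = terms.index(a), terms.index(b)
    adjacency[i, j] = adjacency[j, i] = 1  # undirected, so the matrix is symmetric

# Distance proximity: pair-wise Euclidean distances between (x, y) node
# positions, such as the positions of terms on a drawn concept map.
positions = np.array([[0, 0], [4, 1], [2, 3], [1, 1], [3, 2]], dtype=float)
distance = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

print(adjacency)
print(distance.round(2))
```

Either matrix can then be submitted to Pathfinder-style analysis; the adjacency form corresponds to the link arrays that the ALA approach derives from essays.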
The Analysis of Lexical Aggregates

A lexical aggregate approach was developed by Clariana and Koul (2004) as a way to score essays as well as graphically represent knowledge structure via network graphs. This approach is called the Analysis of Lexical Aggregates (ALA). In short, ALA aggregates pairs of terms from a pre-selected list, including synonyms and metonyms, at either the sentence level or linearly across sentences, and saves the aggregate as a link array proximity file (e.g., dot.prx) for further analysis (Clariana & Wallace, 2007). Achieving as accurate a list as possible is crucial to the validity of the post-aggregate analysis. Past research has sought to refine how these lists are created and analyzed (Clariana & Taricani, 2010).

Clariana and Koul (2004) developed a computer program, called the ALA-Reader, that translates written text summaries into a proximity file by aggregating data at the sentence level in order to score essays in comparison to human raters (Clariana, 2004). In that investigation, ALA scores ranked 5th (r = 0.69) out of 12 against 11 human raters, indicating that only 4 human raters scored essays more accurately than the ALA-Reader, while 7 human raters scored less accurately. In a follow-up investigation, Koul et al. (2005) used the sentence-level approach to score essays relative to an expert versus Latent Semantic Analysis (LSA), a method for extracting meaning from passages of text based on statistical computations from multiple documents (Evangelopoulos, 2013). Results indicated the ALA-Reader performed more like the human raters (r = 0.38 to r = 0.71) than the LSA approach (r = -0.07 to r = 0.39). Clariana and Wallace (2007) used the ALA-Reader to score essays relative to an expert, using the within-sentence approach (aggregation of data by focusing on key concepts within sentences) and a new linear aggregate approach (aggregation of data by considering key concepts both within and across sentences). The linear aggregate approach produced larger correlations with the human raters (r = 0.60 and r = 0.45) than did the sentence aggregate approach (r = 0.47 and r = 0.29).
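To illustrate the difference between the two aggregation rules just described, here is a minimal sketch of both; the term list and essay are invented for the example, and the actual ALA-Reader also handles synonyms and metonyms and writes .prx proximity files, which this toy version omits:

```python
import re
from itertools import combinations

TERMS = {"containment", "deterrence", "engagement", "policy"}  # toy term list

def within_sentence_links(essay):
    # Within-sentence approach: key terms co-occurring in one sentence are linked.
    links = set()
    for sentence in re.split(r"[.!?]+", essay.lower()):
        found = sorted(t for t in TERMS if t in sentence)
        links.update(frozenset(pair) for pair in combinations(found, 2))
    return links

def linear_aggregate_links(essay):
    # Linear aggregate approach: each key-term mention is linked to the next
    # key-term mention, within or across sentence boundaries.
    mentions = [w for w in re.findall(r"[a-z]+", essay.lower()) if w in TERMS]
    return {frozenset((a, b)) for a, b in zip(mentions, mentions[1:]) if a != b}

essay = "Containment shaped policy. That policy also relied on deterrence."
print(within_sentence_links(essay))   # containment-policy, policy-deterrence
print(linear_aggregate_links(essay))  # same pairs here; longer essays diverge
```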
These studies demonstrate that the ALA-Reader could be used to score essay exams, or at least provide a second level of validation for the human rater.

Recent studies have attempted to further validate the ALA-Reader software as a tool to score essays and analyze the structure of knowledge by refining the validity of the terms used to create network graphs from essays. Descriptive research conducted by Clariana et al. (2009) sought to refine the accuracy of term lists by manually replacing the pronouns in student essays with their referents. They found that the linear aggregate approach correlated better with human raters (r = 0.74) than the sentence aggregate approach (r = 0.44); also, converting pronouns to their referents had little effect on the linear aggregate scores.

Visual forms of knowledge elicitation (graphical representations of relationships among sets of terms) may be the most explicit way to represent an individual's knowledge structure (Ifenthaler, 2014a; Koul et al., 2005) without the extraneous information often found in essays. Clariana and Taricani (2010) investigated how the number of terms influenced the scoring of open-ended concept maps by comparing term lists of 16, 26, and 36 terms. The results of their study indicated that increasing the number of terms decreased the predictive ability of the concept map scores. These results are contrary to the outcome of a study conducted by Goldsmith, Johnson, and Acton (1991), where structure was elicited as pair-wise rankings from subsets of 5-25 terms derived from a 30-term list. Goldsmith et al. (1991) found an almost linear relationship between increasing the number of terms and score predictability. The current study plans to further validate the ALA-Reader by investigating whether
an optimal number of terms exists that maximizes the predictive ability of the network graph scores.

Statement of the Problem, Purpose, and Research Questions

Writing is important, and complex, high-dimensional knowledge can be assessed based on what and how we write. Visual tools such as KU-Mapper, DEEP, and AKOVIA have been implemented and tested with promising results (Ifenthaler, 2014b). The ALA tool has shown promise as an automated process to assess complex learning (Clariana & Taricani, 2010; Clariana & Wallace, 2007; Clariana et al., 2009; Koul et al., 2005; Taricani & Clariana, 2006). Research continues to yield favorable results in the continuing validation of the ALA-Reader as a computer-based approach for automatically scoring open-ended concept maps (Taricani & Clariana, 2006), measuring individual and group knowledge from essays (Clariana, 2010b; Clariana et al., 2009; Clariana & Wallace, 2007), and as an approach to elicit structural knowledge (Clariana, 2010a; Clariana et al., 2014; Clariana & Wallace, 2009). One of the variables that has not been investigated to any great degree in the exploratory research on validating the ALA approach is the optimal number of terms, which begs the question: would the previous studies have elicited the same results had the number of terms used to create the term list been a variable? As reported earlier, Goldsmith et al. (1991) experimentally demonstrated a near-linear relationship between number of terms and predictability, whereas Clariana and Taricani (2010) found nearly the opposite to be the case. Future research should focus on finding an optimal number of terms (Clariana et al., 2014). Therefore, the current research will explore the effect of
varying the number of terms used to create a term list when scoring network graphs as a measure of assessing structural knowledge derived from essays. The theoretical question this study plans to answer is: "Is there an optimal number of terms needed to accurately assess structural knowledge derived from essays?" The goal of this exploratory research is to proffer an actual number, or range of numbers, that defines an optimal number of terms. It is hypothesized in the current research that the number of terms used to create proximity files will affect the predictive ability, as well as the concurrent validity (a measure of how well a particular test correlates with a previously validated measure; Shuttleworth, 2009), of scoring network graphs derived from essays, as well as of measuring individual and group knowledge from those essays. This study will investigate the following research question: What are the effects of changing the number of terms used to create proximity files on the predictive ability of scoring essay-derived network graphs via the ALA-Reader approach for individual knowledge structure?
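In computational terms, the planned analysis can be sketched as follows. This is a minimal illustration under assumed data structures, not the study's actual code: each term-list size maps to one set of undirected links per essay, an essay's score is its links in common with a referent, and predictive ability is the Pearson correlation of those scores with the human-rater scores.

```python
from scipy.stats import pearsonr

def links_in_common(essay_links, referent_links):
    # ALA-style raw score: how many of the referent's links the essay shares.
    return len(essay_links & referent_links)

def predictive_ability(essay_links_by_size, referent_links_by_size, human_scores):
    # For each term-list size, correlate ALA scores with human-rater scores.
    results = {}
    for n_terms, per_essay_links in essay_links_by_size.items():
        ala_scores = [links_in_common(links, referent_links_by_size[n_terms])
                      for links in per_essay_links]
        r, _p = pearsonr(ala_scores, human_scores)
        results[n_terms] = r
    return results

# Tiny invented demo: two term-list sizes, three essays, human scores 1-5.
ref = {20: {frozenset(("a", "b")), frozenset(("b", "c"))},
       30: {frozenset(("a", "b")), frozenset(("b", "c")), frozenset(("c", "d"))}}
essays = {20: [{frozenset(("a", "b"))}, set(),
               {frozenset(("a", "b")), frozenset(("b", "c"))}],
          30: [{frozenset(("a", "b"))}, set(), {frozenset(("c", "d"))}]}
print(predictive_ability(essays, ref, human_scores=[3, 1, 5]))
```

The term-list size whose correlation is highest would then be the candidate "optimal" number of terms.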
Definitions

ALA-Reader - a computer program that translates written text summaries into a proximity file (Clariana & Koul, 2004).

Analysis of Lexical Aggregates - a computer-based method developed to analyze knowledge structure within essays by translating text into concept map representations of knowledge (Clariana, 2004).

Automated Knowledge Visualization and Assessment - a methodology that uses various algorithms for visually representing internal and shared mental models and team performance (Ifenthaler, 2014b).

Concept maps - graphical representations of relationships among sets of terms (Koul et al., 2005).

Concurrent validity - a measure of how well a particular test correlates with a previously validated measure (Shuttleworth, 2009).

CRESST - the National Center for Research on Evaluation, Standards, and Student Testing, whose research supports the visual creation of mental models in order to compare team or shared mental models with that of an expert (Herl et al., 1999).

Knowledge structure - the interrelationship between the concepts within a given domain; it exists in the memory associations of an individual and can be captured in external artifacts (Clariana, 2010a).

Latent Semantic Analysis (LSA) - a method for extracting meaning from passages of text, based on statistical computations from multiple documents (Evangelopoulos, 2013).

Linear aggregate approach - aggregation of data by considering key concepts both within and across sentences (Clariana & Wallace, 2007).
Network graphs - visualizations that show the relationships between concepts (Koul et al., 2005).

Proximity data - an n-by-n matrix, where n is the number of terms (concepts), representing distance or adjacency (Kim, 2012).

Proximity data measure of distance - considers all pair-wise distances between terms, calculated from the positions of the terms in space or directly judged by students on an ordinal scale (Taricani & Clariana, 2006).

Proximity data measure of adjacency - relates paired terms in an n-by-n matrix of 1's and 0's (1 indicates a connection between terms; 0 indicates no connection between terms) (Clariana et al., 2009).

Structural knowledge - knowledge "that mediates the translation of declarative into procedural knowledge and facilitates the application of procedural knowledge" (Jonassen et al., 1993).

Within-sentence approach - aggregation of data by focusing on key concepts within sentences (Koul et al., 2005).
Chapter 2

Review of the Literature

Jonassen et al. (1993) defined a type of knowledge that exists between the declarative and complex levels of knowledge called structural knowledge. It may be conceived that structural knowledge is a part of, or a type of, declarative knowledge. Jonassen et al. (1993) stated the following on structural knowledge as a part of declarative knowledge:

Structure refers to how information within a knowledge domain is organized, which we have defined as structural knowledge. Whether structural knowledge exists as a separate type of knowledge or is part of declarative knowledge is a semantic distinction that does not affect its recognition as an entity or as a distinct type of knowledge. (p. 4)

In other words, structural knowledge may be considered a type of declarative knowledge, but it has enough characteristic differences that it behaves like a distinct type of knowledge requiring different elicitation and assessment routines. The make-up of structural knowledge consists of the interrelationships of mental schema (based on declarative knowledge concepts) within a given domain. It is structural knowledge that enables us to visualize the schematic makeup of the declarative knowledge components that produce complex levels of knowledge. If we can elicit the knowledge structure of an individual's performance, we can then begin to compare these structures with those of other performers or learners, and establish a paradigm for assessing such knowledge.

This chapter will present the literature surrounding the assessment of structural knowledge found within essays and the tools developed to assess and elicit knowledge structure. We will then focus on how the ALA approach was developed to assess knowledge structure. We will analyze the literature surrounding the development of the ALA approach, as well as research attempting to increase the validation of this approach as a tool for creating proximity files and network graphs derived from essay exams and as a tool for automating the assessment of essays. The chapter will conclude by reviewing research on the importance of creating term lists as an
essential component of validating the ALA approach as a tool for assessing network graphs derived from essays, and thus increase validation for assessing the structure of knowledge within complex domains using the ALA approach.

Essay Assessment

Essay exams are useful in attempting to tap into complex thinking, requiring students to demonstrate their ability to organize, integrate, and interpret information, explain a position, construct an argument, evaluate ideas, or complete any other complex higher learning task (Piontek, 2008). They are in most instances used to evaluate and promote higher levels of understanding (Diekhoff, 1983). There are both advantages and disadvantages to using essays to assess student knowledge. The advantages include measuring complex ideas and reasoning, motivating better study habits, providing students flexibility in demonstrating what they know, and allowing students to demonstrate communication skills (Piontek, 2008). The disadvantages include the amount of time it takes to grade, the careful attention that must be paid to creating accurate rubrics, and the fact that only a single domain of knowledge can be assessed at one time (Piontek, 2008).

There are two types of essay items: restricted response and extended response (Nitko & Brookhart, 2007). Restricted response essays focus and limit the content of the student's answer. According to Nitko and Brookhart (2007), "Restricted response items should require students to apply their skills to solve new problems or to analyze novel situations" (p. 189). Extended response essays, according to Piontek (2008), "allow the students to construct a variety of strategies, processes, interpretations, and explanations for a given question, and to provide any information they consider relevant" (p. 6). Extended response essays tend to reveal more of the student's organizational, integration, and evaluation abilities, but this essay type is less efficient in extracting exact information about a given topic. In addition to the two types of essays, restricted and extended response, there are four genres of essay writing. According to the Online Writing Lab (2015), the four genres are
• Expository essays require the student to investigate an idea, evaluate evidence, expound on the idea, and set forth an argument concerning that idea in a clear and concise manner.

• Descriptive essays ask the student to describe something, such as an object, person, place, experience, emotion, or situation.

• Narrative essays are often anecdotal, experiential, and personal, allowing students to express themselves in creative and moving ways.

• Argumentative essays require the student to investigate a topic; collect, generate, and evaluate evidence; and establish a position on the topic in a concise manner. (Online Writing Lab, 2015)

In the current study, the essays that were secured for analysis were extended response essays that could be categorized as argumentative, as students were asked to take a specific diplomatic approach and defend their choice. Regardless of the genre and type of essay, a student must write, and assessment can be problematic. One of the aforementioned disadvantages of essay assessments is that they are difficult to grade with a high degree of reliability. How raters score essays should be consistent and objective, so that students are truly being assessed on their knowledge and not on unrelated factors, such as rater biases (Schaefer, 2008). Raters who inject bias into the assessment become a threat to the validity of essay assessment (Messick, 1995). Without the use of a rubric, scoring reliability is a problem prevalent within extended response essays (Nitko & Brookhart, 2007). Due to the lack of restrictions within extended response essays, raters may have trouble rating essays consistently. Rater drift is the tendency for raters to change their scoring over time (Nitko & Brookhart, 2007). This tends to happen slowly when raters score many essays over a long period of time.
To reduce rater bias and increase scoring reliability, there are a few guidelines that should be adhered to when scoring essay items (Nitko & Brookhart, 2007; Piontek, 2008):

1. Use a scoring rubric. Rubrics guide the rater to make sure he or she is focusing on the correct content and weighting it correctly.

2. Outline or demonstrate what an expected answer looks like. A model answer will help the rater identify what the right answer looks like and makes rating consistent across every essay.

3. Score one question at a time. If the assessment requires students to answer multiple essay questions, the rater should assess the same question from all students before evaluating the next question.

4. Score subject matter content separately from other factors. The content of the essay should be evaluated separately from other factors such as spelling, style, format, and language, unless these non-content factors are listed in the rubric or are part of the essay-writing objective.

5. Score essays anonymously. Remove the names from the essays in order not to allow bias toward a student based on what you know about them. This will reduce the halo effect, which occurs when one characteristic of an individual affects the rater's judgment.

6. Give students feedback. Given the complexity of the essays and the level of knowledge they tend to assess, feedback becomes an important part of the learning; feedback also keeps the rater consistent and aware of what they are assessing.

7. Use a systematic process for scoring each essay. The same method should be used by the rater to score each essay. This can be accomplished by the use of model answers and rubrics.

Strictly adhering to these guidelines may not eliminate the problems with scoring essays, but it should reduce rater bias, rater drift, and the halo effect, and increase scoring reliability.
Methods of Capturing and Assessing Structural Knowledge

Measuring Knowledge Structure Using Concept Maps

Graphic depictions of how people organize knowledge within a given domain are called concept maps (Green, Lubin, Slater, & Walden, 2013). Concept maps are a way in which we externally and visually represent the internal structure of our knowledge within a given domain, as well as a way of measuring the important aspects of the structure of domain knowledge (Hu, Cheng, & Heh, 2011; Ifenthaler, 2008; Ruiz-Primo et al., 2001). These maps can be digitally created for individuals and also for teams as a way to represent shared mental models (Engelmann & Hesse, 2010).

Ruiz-Primo et al. (2001) conducted research attempting to establish a framework for examining the validity of the knowledge structure interpretation of three concept mapping techniques. These techniques consisted of using 20 preselected concepts/terms to 1) construct a concept map from scratch, 2) fill in the nodes, and 3) fill in the lines. According to Ruiz-Primo et al. (2001), "a concept map can be categorized along a continuum from high-directed to low-directed, based on the information provided to the students" (p. 101). Constructing a concept map from scratch was a low-directed technique, meaning that only the concepts were given, without instruction or aid on how to construct the concept map. The fill-in-the-nodes and fill-in-the-lines techniques were high-directed in that they were much more structured, providing either the nodes or the links for the concept map. In that study, Ruiz-Primo et al. (2001) concluded that all three mapping techniques measure student knowledge. Low-directed techniques provide students more opportunities to reveal their conceptual understanding than high-directed techniques. The results of the Ruiz-Primo et al. (2001) research demonstrated that concept maps can measure different levels of knowledge structure depending on the amount of direction a student is given when completing a concept map.

Research has shown that concept maps can be used to chronicle changes in knowledge (Green et al., 2012). By creating pre- and post-instruction concept maps, Green et al. (2012)
empirically demonstrated that student knowledge changed after instruction. Each concept map (pre and post) was scored based on formulas calculating nodes, links, density, depth, complexity, and chunks, resulting in a specific quantifiable score. Learning was measured by significant changes in the mean scores between the pre- and post-instruction concept maps. The results of the research provide support for the use of concept map construction as a way to measure increases or decreases in knowledge representation after instruction. Additionally, when the participants constructed their concept maps, the number of nodes or terms used to create the concept maps had a mean range from 12.82 to 25.13 terms.

Rye and Rubba (2002) investigated a concept map scoring methodology that weighted concepts and relationships differently based on the presence or absence of the concepts and relationships when compared to an expert referent. In that study, the relationships between the concepts were weighted higher than the concepts themselves. Links that were made between concepts were given higher importance than the identification of the concepts or terms. When comparing the student concept map with that of the expert referent, twice as many points were awarded for identifying the relationships between the concepts in common with the expert, and fewer points were given for identification of concepts alone. Rye and Rubba (2002) also weighted the concepts and relationships based on importance as defined by the expert. Eight concepts out of 127 were defined as central concepts, resulting in the student receiving 3 points for each; two points were awarded for each of the next 33 concepts identified by the expert, and the remaining concepts were given 1 point. When scoring relationships between concepts, students received 6 points if their relationship matched the expert's linking between two concepts. Students received 4 points if one of the expert's concepts was linked to a student concept. Students received 2 points if they made a valid link between two concepts that was not found on the expert's concept map. The results of this study indicated that weighted concept map scores had good predictive validity for measuring student performance.
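As a concrete reading of this weighting scheme, here is a minimal sketch; the concept names and tier sizes are invented placeholders, and in practice the tier membership comes from the expert referent while link validity is judged by a human rater:

```python
# Hypothetical expert referent, standing in for Rye and Rubba's (2002) tiers.
central = {"ozone"}                      # 3 points per concept
secondary = {"cfc", "ultraviolet"}       # 2 points per concept
peripheral = {"aerosol", "atmosphere"}   # 1 point per concept
expert_concepts = central | secondary | peripheral
expert_links = {frozenset(("ozone", "cfc")), frozenset(("ozone", "ultraviolet"))}

def score_concept_map(student_concepts, student_links):
    score = 0
    for c in student_concepts & expert_concepts:
        score += 3 if c in central else 2 if c in secondary else 1
    for link in student_links:  # links assumed already judged valid by a rater
        if link in expert_links:
            score += 6  # matches the expert's linking of two concepts
        elif len(link & expert_concepts) == 1:
            score += 4  # joins one expert concept to a student-introduced concept
        else:
            score += 2  # valid link the expert's map does not contain
    return score

student = ({"ozone", "cfc", "sunburn"},
           {frozenset(("ozone", "cfc")), frozenset(("cfc", "sunburn"))})
print(score_concept_map(*student))  # 3 + 2 + 6 + 4 = 15
```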
Internal mental models are not directly observable, and research into externalizing mental models requires valid and accurate tools (Ifenthaler, 2008). Concept map research has shown promising results regarding its validity as a method to represent the structure of knowledge in the mind of a student (Chen, Cheng, & Heh, 2005; Green et al., 2012; Hu, Cheng, & Heh, 2011; Ifenthaler, 2008; Ruiz-Primo et al., 2001; Rye & Rubba, 2002; Taricani & Clariana, 2006). Methods of scoring concept maps can change to reveal different levels of understanding, without changing the overall attributes of the concept map design (Green et al., 2012; Ruiz-Primo et al., 2001; Rye & Rubba, 2002). Research has also shown that concept maps are not relegated to measuring only individual mental models, but can measure shared mental models as well (Engelmann & Hesse, 2010; Hu et al., 2011; Ifenthaler, 2014; Johnson, Sikorski, Mendenhall, Khalil, & Lee, 2010).

Web-based applications, programming, and technology have opened up another platform for the development of educational and computer-based assessment tools (Ifenthaler, 2014a). Ifenthaler (2014a) developed a methodology for visualizing individual and shared mental models called Automated Knowledge Visualization and Assessment (AKOVIA). According to Ifenthaler (2014b), AKOVIA:

"…is based on mental model theory and integrates a large number of dynamic interfaces to different online environments, for instance, learning management systems, personalized learning environments, game-based environments, or computer-based assessment environments such as PISA or PIAAC. This open architecture of AKOVIA enables a large variety of research and practical applications, such as investigation of learning processes; distinguishing features of subject domains; cross-curricular, nonroutine, dynamic, and complex skill; or convergence of team-based knowledge" (p. 653).

Similar to other knowledge elicitation and visualization techniques, AKOVIA runs under the assumption that individual and shared knowledge can be externalized and visually represented (Ifenthaler, 2014a). AKOVIA is applied to small amounts of text and does not require
referencing a large lexical database, unlike LSA's text analysis requirement. For a valid AKOVIA analysis, text passages must contain at least 300 words. AKOVIA uses multistage, language-oriented algorithms to transform text into list form and proximity matrices; the concept maps are then generated from the text list and proximity matrices (Pirnay-Dummer & Ifenthaler, 2010). The expert or individual does not pick or create the terms; rather, the AKOVIA software picks the specific terms based on its battery of algorithms. The number of terms used to create the concept maps derived from the proximity matrices appears to vary based on the number of words making up the text passage (Ifenthaler, 2014a; Ifenthaler, 2014b).

AKOVIA is carried out in four stages. In Stage 1, the text is put into the system, where it is cleaned up (i.e., metadata is removed). Stage 2 parses the text, stems the text, and calculates the word associations. When stemming, AKOVIA associates words with their stem; i.e., card and cards become the same word, card. Stage 3 employs a battery of measures, such as surface matching, graphical matching, structural matching, gamma matching, concept matching, positional matching, and balanced semantic matching, to produce graphical analysis. Stage 4 outputs the graph based on the analysis conducted in Stage 3. From this output, comparisons can be made between or among various individual or shared mental models.

Ifenthaler (2014b) conducted research into the feasibility and validity of the AKOVIA framework by investigating 1) whether AKOVIA's semantic and structure matching measurements provide evidence for differences in team performance between differently composed teams based on task knowledge, and 2) whether greater levels of task shared mental models and team shared mental models are associated with higher team-based performance when assessed with the semantic and structure matching measurements. Teams of learners performed tasks within an online environment. Team-based essays were analyzed with AKOVIA by comparing them with an expert reference solution. The results of the study indicated that AKOVIA was able to find differences between group performances, and they support AKOVIA's validity as a fully automated methodology for assessing team-based performance.
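AKOVIA's actual algorithms are not reproduced here, but the flavor of the first two stages described above (cleanup, stemming, and word association) can be suggested with a deliberately crude sketch; the suffix-stripping stemmer and the sentence-level co-occurrence rule below are stand-ins for illustration, not AKOVIA's published procedures:

```python
import re
from collections import Counter
from itertools import combinations

def crude_stem(word):
    # Illustrative suffix stripping only; a real system uses a proper stemmer.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def word_associations(text):
    # Stage-1-style cleanup: keep letters only, lowercase, split into sentences.
    sentences = re.split(r"[.!?]+", text.lower())
    pair_counts = Counter()
    for sentence in sentences:
        stems = {crude_stem(w) for w in re.findall(r"[a-z]+", sentence)}
        # Stage-2-style association: count stem co-occurrences per sentence.
        for pair in combinations(sorted(stems), 2):
            pair_counts[pair] += 1
    return pair_counts

print(word_associations("Cards are cards. A card game uses cards.").most_common(3))
```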
AKOVIA joins the list of successfully tested instruments using graphical representations for computer-based knowledge assessment, such as DEEP and KU-Mapper (Ifenthaler, 2014b). AKOVIA appears to be a practical and feasible tool to employ for individual and team-based assessment (Ifenthaler, 2014a; Ifenthaler, 2014b).

The Team Assessment and Diagnostic Instrument (TADI) is a tool that can quickly be set up to assess team-shared cognition as well as integrate with other computer-based diagnostic tools (Johnson et al., 2010). TADI measures the degree of knowledge within a team to determine the level of shared mental models. Once a team develops a shared mental model, this model can be used to measure the potential productivity of the team (Johnson et al., 2010). Once team tasks are completed, individuals fill out a web-form questionnaire. The form's data is collected by the TADI system from every team member and then exported into a spreadsheet application. Two measures calculated from the collected TADI data are: 1) the mean, representing the degree to which the team shares a given factor, and 2) the standard deviation, representing the level of variation in the individual ratings (Johnson et al., 2010). TADI, unlike AKOVIA, does not generate visual representations of shared mental models, but rather uses statistics such as the mean and standard deviation to calculate the shared mental models of teams.

Knowledge Elicitation Approaches

Many methods have been developed to elicit and assess structural knowledge (Kim, 2012). A few examples are DEEP, jMap, ACSMM, KU-Mapper, LSA, ALA-Reader, and ALA-Mapper. Although the processes among the aforementioned elicitation methods differ, once the individual's knowledge structure is elicited and turned into comparable representations, it can be analyzed and assessed. Assessment of knowledge structure occurs through comparison with an expert. Wouters, van der Spek, and van Oostendorp (2011) noted that the validity of structural assessment of knowledge is rooted in the agreement on central concepts among experts within a given domain. Hard domains are domains of knowledge
where a central body of theory is generally agreed upon (e.g., biology), and soft domains are domains of knowledge where there is a lack of a centralized body of knowledge (e.g., political science) (Wouters et al., 2011). Hard domains should elicit consistent, similar knowledge structures across experts due to the "hardness" (degree of adherence to a central theory) of the domain, whereas soft domains apply looser definitions of central concepts, thus increasing the probability of variance among experts' knowledge structures (Keppens, 2007). Research has shown that hard domains, such as computer programming and serious gaming dealing with procedural knowledge, are appropriate for structural knowledge assessment (Keppens, 2007; Wouters et al., 2011). Knowledge elicitation and eventual assessment are more challenging within soft domains.

Dynamic Enhanced Evaluation of Problem Solving (DEEP) is a method created to assess learning within complex soft domains, such as ill-defined problems within medical diagnosis (Koszalka & Epling, 2010). DEEP uses causal influence diagramming for knowledge elicitation and representation. It is a two-step process: the first step is to identify the recognizable patterns experts use to solve ill-defined problems; the next step is to develop measures of similarity between novices and experts, and observe how the novice's pattern begins to match the expert's over time. DEEP collects responses to a problem from both novices and experts alike. Responses to complex problems by an expert provide the baseline against which the novice patterns are compared. Learning is then assessed by how closely the novices' patterns come to or match the pattern of responses by the expert (Koszalka & Epling, 2010). In the case of DEEP, written exams are not the measure of success, but rather the knowledge structure that is elicited by the process the student utilizes to solve an ill-defined problem.

The Excel-based application called jMap is software designed to elicit mental models, assess changes in the models, and compare models to that of an expert (Shute, Masduki, Donmez, Dennen, Kim, Jeong, & Wang, 2010). jMap is a tool designed to assess causal diagrams (diagrams that show cause and effect relationships). The jMap process is programmed to enable the elicitation, recording, and coding of mental models (students create models using Excel's
AutoShape tools). This process quantitatively assesses the models over time (models are coded and translated into a transitional frequency matrix) and compares the models to experts' (raw scores are compiled and quantitative measures, such as percentages of shared links between the expert and the novice, are compared) (Shute et al., 2010).

The Analysis-Constructed Shared Mental Model Methodology (ACSMM) is a method for comparing the shared mental models of teams. According to Johnson and O'Connor (2008), ACSMM "translates individual mental models into a team sharedness map without losing the original perspective of the individual, thereby representing a more accurate representation of the team sharedness" (p. 188). It is a five-phase process and is similar to most methods of capturing structural knowledge. The first phase consists of the elicitation of knowledge by compiling a list of terms from an expert. Phase two consists of each individual member of the group constructing their own individual mental model. During phase three, the concepts and relationships between concepts are coded in order to make similarity comparisons among the maps. Once the individual mental models are coded, phase four consists of determining what concepts are shared among the mental models of the team/group members. The final phase consists of constructing the ACSMM, a multi-step process that produces a single team mental model. Once a model is constructed, it can be compared over time with that of an expert and/or other teams.

Other Approaches for Eliciting Knowledge Structure

One of the most widely used methods for text analysis is Latent Semantic Analysis (LSA), which is a vector space modeling technique for representing word meanings (Olney, 2009; Olney, 2011). A vector space model (VSM) uses statistical vector techniques to represent the similarity of a collection of words as a cosine between vectors; the vector space is usually built from term occurrences in paragraphs. According to Evangelopoulos (2013), LSA has been shown to model cognitive functions, learning and understanding of word meaning, episodic memory, semantic memory, discourse coherence, and metaphoric
comprehension. One of the practical applications of LSA is automatic essay grading (Evangelopoulos, 2013). When automatically grading essays, according to Koul et al. (2005), LSA "compares the interrelationship of words in a particular essay with the interrelationship of words in the essays used to train the software" (p. 234). The process is as follows: first, essay scores are determined by a comparison of the new essay's vector to a large set of rater-scored student essays (at least 100 student essays). Next, the new student essay receives the score that the nearest previously scored essay received. If the new essay is nearest to an existing scored essay that had received a "2," then the new essay also receives a "2." LSA requires thousands to millions of words to derive high-dimensional semantic spaces, and hundreds of rater-scored essays that span the full range of possible scores in order to make its comparisons, which can lead to increased costs (Koul et al., 2005). LSA requires a large database of documents and terms, which makes this method relatively expensive, but a functional multi-modeling document retrieval and analysis tool (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Evangelopoulos, 2013). The process of indexing for document/essay retrieval using LSA requires thousands of documents and terms, from which high-dimensional representations (about 100 dimensions) are created using mathematical techniques to retrieve documents from query terms (Deerwester et al., 1990). Although LSA is a popular method for scoring essays, it strains practicality due to the number and maintenance of terms. This impracticality may be providing the catalyst for the development of other automated essay assessment techniques, such as the ALA-Reader, which requires far fewer terms (20-30) as opposed to a database full of thousands of documents and millions of terms (Deerwester et al., 1990; Taricani & Clariana, 2006).
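The nearest-neighbor grading step described above can be sketched as follows. This is a minimal illustration using scikit-learn's TruncatedSVD over TF-IDF, a common stand-in for LSA; the corpus, the scores, and the tiny dimensionality are invented for the example (real LSA uses roughly 100 dimensions over thousands of documents):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus of previously human-scored essays and their marks.
scored_texts = ["containment limits the expansion of a rival power",
                "deterrence relies on the credible threat of force",
                "engagement favors dialogue and economic ties"]
scored_marks = [2, 4, 3]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(scored_texts)
svd = TruncatedSVD(n_components=2, random_state=0)  # toy-sized semantic space
X_lsa = svd.fit_transform(X)

def lsa_score(new_essay):
    # Project the new essay into the semantic space, then inherit the mark
    # of the nearest (most cosine-similar) previously scored essay.
    v = svd.transform(vectorizer.transform([new_essay]))
    sims = cosine_similarity(v, X_lsa)[0]
    return scored_marks[sims.argmax()]

print(lsa_score("a policy of engagement built on dialogue"))
```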
Another approach to assessing written text is the use of data mining. Dringus and Ellis (2005) analyzed the use of data mining as a strategy for assessing online discussion forums. Data mining does not employ Pathfinder analysis in extracting structural knowledge, but rather uses a method of creating queries to extract specific data to improve a teacher's ability to assess discussion threads. In the case of a discussion thread, the data mining method queries databases to look for previously unknown interrelationships among concepts. Once an instructor can query and pull the data from a discussion forum, he or she can more efficiently understand how the student creates the relationships among concepts. Forums can last from days to weeks and contain hundreds of threads (Dringus & Ellis, 2005). The authors indicate that the data mining technique offers promise, but data mining should aid in the process of assessing online forums and not take the place of in-depth human analysis of a student's participation in a forum.

Knowledge Unit Mapper, or KU-Mapper, is supplemental software that streamlines the collection of proximity data using three different elicitation approaches, for use in KNOT software as well as standard MDS software (Clariana, 2003). The collection of proximity data is important to the Pathfinder approach for measuring and assessing structural knowledge because it provides the quantitative measures of the relatedness between pairs of concepts. KU-Mapper collects proximity data in three ways: 1) traditional Pathfinder-style ratings of pair-wise comparisons of all terms, 2) an abbreviated approach using list-wise comparisons, and 3) a semantic-map card-sorting tool that allows participants to add terms to an on-screen semantic map (Clariana, 2003).

Clariana and Wallace (2009) compared pair-wise, list-wise, and clustering approaches for the construction of structural knowledge. The pair-wise approach compares two terms at a time, with individuals rating the relatedness of each term pairing on an ordinal scale. With the list-wise approach, individuals make comparisons of one term to other terms from a list of terms. The clustering approach takes all terms that are to be compared and allows the individual to move related terms close together and unrelated terms further apart. Using KU-Mapper in conjunction with KNOT software, Clariana and Wallace (2009) were able to ascertain that all three approaches (pair-wise, list-wise, and clustering) create similar network representations of
structural knowledge at the group level. The significance of this finding is that it was a wholly computer-based method for interpreting and measuring knowledge structure.

The Analysis of Lexical Aggregates (ALA)-Mapper evaluates network graphs by comparing terms, and the distances between terms, in a student's network graph with those in an expert's graph (Koul et al., 2005). Proximity data derived from the ALA-Mapper software is based on the conversion of node-link information within concept maps. It focuses only on the links or distances between terms, and not on the link labels (Clariana, 2010b). Proximity data is created from the distances between nodes. Once proximity data is created, the process of assessing structural knowledge begins using the Pathfinder approach. The ALA-Mapper approach is another computer-based method for assessing essays. Regardless of the method used to create the proximity file, assessment of structural knowledge is rooted in the Pathfinder approach.

The advantage of computer-based methods of interpreting and measuring structural knowledge is that they can be relatively low cost, easy to use, and easy to interpret (Koul et al., 2005; Toranj & Ansari, 2012). These computer-based methods can also be used as teaching tools within classrooms (Toranj & Ansari, 2012). Given these advantages of computer-based approaches, further tools are being developed and validated, such as the ALA-Mapper and ALA-Reader.

The ALA-Reader and the ALA-Mapper are computer-based approaches for assessing structural knowledge derived from essays. The Analysis of Lexical Aggregates approach uses student essays to derive knowledge structure representations (Clariana et al., 2009). The ALA-Reader is software developed by Clariana (2004) that converts essays into Pathfinder network representations by aggregating key terms at the sentence level, both within sentences and across sentences (Clariana & Wallace, 2009). The ALA-Reader aggregates the sentences and creates proximity data files for Pathfinder; then, KNOT software converts the proximity data into a graphical representation, the PFNet. The ALA-Reader approach is a computer-based process for deriving knowledge structure from essays, with the goal of creating a fully automated structural knowledge assessment tool.
Assessing Knowledge Structure with the Pathfinder Approach

The constructed structural makeup of the interrelationships among concepts within the mind of the learner is the individual's knowledge structure within a given domain. Knowledge structure can be represented by core aspects such as domain concepts, the nature of the relationships between concepts, and the strength of these relationships; these representations can be elicited and measured by creating analogies, concept maps, semantic relationship tests, network graphs, and semantic nets (Murphy & Suen, 1999). Once knowledge structure is represented, for example, as a network graph, it can be measured by comparing these graphs to an instructor's/expert's and/or other students' network graphs (Clariana & Wallace, 2007; Goldsmith et al., 1991; Jonassen et al., 1993). Knowledge structure represented as a network graph consists of two components: nodes, which represent concepts, and lines between nodes, which represent the relationships between nodes or concepts. It is the relationship between nodes that, according to Goldsmith et al. (1991), is the critical attribute of structural representations of knowledge. By using a structural assessment approach, the relationships between concepts can be assessed. According to Goldsmith et al. (1991), the structural approach consists of three steps: knowledge elicitation, knowledge representation, and evaluation of individual knowledge representations. In step one, knowledge elicitation involves the evaluation of the individual's understanding of the relationships between concepts. The second step involves defining the representation of the knowledge. Some frequently used procedures for defining the knowledge representation are multidimensional scaling (MDS), cluster analysis, and additive trees (Goldsmith et al., 1991). The final step in the structural approach is to evaluate constructed knowledge representations against a standard. One method of evaluating constructed knowledge representations is to compare the knowledge structure of a novice to that of an expert (Clariana & Wallace, 2007). Comparisons are then assessed based on a degree of similarity, with similarity being defined as the number of links in common between the comparison PFNets divided by the total number of links.
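Stated as a formula (a restatement of the definition just given, reading "total number of links" as the union of the unique links in the two networks, consistent with the intersection-over-union definition used later in this review), the similarity of two PFNets A and B with link sets $L_A$ and $L_B$ is

\[
\mathrm{similarity}(A, B) = \frac{\lvert L_A \cap L_B \rvert}{\lvert L_A \cup L_B \rvert}
\]

For example, if two PFNets share 6 links and have 10 unique links between them, their similarity is 0.6; values range from 0 (no links in common) to 1 (identical link sets).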
Figure 2.1. Shows a simple network graph produced by the Pathfinder algorithm using the terms displayed on the left. When multiple PFNets are created, they can be compared with each other, and a degree of similarity can be calculated by dividing the number of common links by the total number of links. Goldsmith et al. (1991) indicated that research has shown that, with instruction, a student's knowledge representation becomes more similar to an expert's. The structural approach elicits knowledge; then, from the elicited knowledge, structures are built (e.g., Pathfinder networks, MDS). Once external structures are created, they can be compared and evaluated. The results of the evaluation produce a correlation or other measure of similarity between the novice and the expert. We can then begin to assess performance from these correlations. The higher the correlation between novice and expert knowledge structures, the better the student has performed. Hence, a scale can be created that assesses performance based on the similarity of individual knowledge structures. The research conducted by Goldsmith et al. (1991) attempted to validate assessment methods of students' cognitive representations (knowledge structures). In that study, Pathfinder networks were network graphs derived from the Pathfinder algorithm. This algorithm used the
relationship between pairs of items to produce a network based on the relationships among several item pairs, reported as similarity, defined as the number of links common to two network representations divided by the total number of links (Goldsmith & Davenport, 1990). A network graph obtained from the Pathfinder algorithm is a structural modeling technique: it represents items or concepts as nodes in a network, and it represents relationships as links between nodes. The links within a Pathfinder network do not identify the nature of the relationship between the concepts (Cook, Neville, & Rowe, 1996). From the Pathfinder network, a visualization of knowledge structure can be formed. Closeness, according to Goldsmith et al. (1991), is a "method of quantifying the configural similarity between two networks having a common set of nodes" (p. 89). Closeness is used to quantify the similarity between graphs by assigning a similarity value between 0 and 1. In this study, Pathfinder created the network of concepts and Closeness assessed the similarity. The results of the study indicated that Pathfinder networks contain unique variance over raw proximity ratings. In addition, MDS and Closeness had better predictability of classroom test performance compared to raw proximity data correlations. It is important to note that the number of concepts used to create the Pathfinder networks and assess them via Closeness varied from 5 to 30 terms. They found that increasing the number of terms within pair-wise comparisons increased the predictive validity of the Pathfinder network. A copy of the results from the Goldsmith et al. (1991) study in Figure 2.2 shows that the mean predictive validity of the Pathfinder network increased as the number of terms increased.
Figure 2.2. Shows that mean predictive validity of exam performance increases as the number of terms increases. Adapted from "Assessing structural knowledge," by T. E. Goldsmith, P. J. Johnson, and W. H. Acton, Journal of Educational Psychology, 83, pp. 88-96. Copyright 1991 by the American Psychological Association. Curtis and Davis (2003) presented Pathfinder as a technique for measuring and assessing knowledge structure within accounting education. In that study, managerial accounting students' (n = 56) knowledge structures, elicited through pair-wise comparisons, were analyzed via Closeness scores (as defined in Goldsmith et al., 1991); declarative knowledge was measured using multiple-choice questions and two problem-solving questions; case performance was assessed via a simulated case-based consultant report; and self-efficacy for audit tasks was assessed using a three-item self-efficacy scale measuring students' degree of confidence in auditing tasks. In order to obtain a measure for Closeness, instructors provided a list of 30 terms within the related domain of instruction, and a term file was created for inclusion in Pathfinder. A rating session followed in which pairs of concepts were displayed to the student, who then rated them for relatedness. Once all possible pair-wise comparisons were rated, a network structure could be developed. Course instructors also went through this process to create a referent network structure. Once ratings were complete, a proximity file was created for each participant. They compared student
raw proximity data with the instructor's raw proximity data, and Pathfinder generated physical representations of the data. The Curtis and Davis (2003) study found a significant difference in Closeness scores (the correlation between two network structures) between pre- and post-instruction. Post-instruction Closeness scores were higher than pre-instruction scores (mean r = 0.23 pre-instruction vs. mean r = 0.37 post-instruction). Closeness scores also correlated positively with exam scores and case analysis. In testing discriminant validity within the case analysis, they found that Closeness scores provided an incremental prediction of performance similar to conventional scores of the case analysis. The Closeness scores were able to discern between cases based on proximity data. The Curtis and Davis (2003) study provided continued support for the use of proximity data as a predictor of student performance. In the Curtis and Davis (2003) research, as well as in the Goldsmith et al. (1991) study, the list of terms created for eliciting knowledge structure was derived from the course author or an expert.

Comparing Knowledge Structures with Experts

Once knowledge structures are elicited, they take the quantifiable form of proximity files and then the graphical form of network graphs (Chen & Hen, 2005; Clariana et al., 2009; Clariana & Wallace, 2007; Murphy & Suen, 1999). The elicited knowledge structure of a single individual enables us to see how the concepts and their relationships are structured within a given domain. An individual's knowledge structure alone, however, does not allow for the assessment of the quality of that structure. While it can provide a structural representation of domain-related knowledge, it does not tell us to what degree the structure is correct or incorrect. In order to make that judgment, elicited knowledge structures need to be compared both with other individuals' structures within the domain and with an expert's in the given domain (Clariana et al., 2009; Clariana & Wallace, 2007; Murphy & Suen, 1999; Ruiz-Primo & Shavelson, 1996).
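To make the novice-to-expert comparison concrete, the following is a minimal sketch, not the KNOT/PCKNOT implementation, of deriving PFNets from dissimilarity matrices under the common parameter choice (r = infinity, q = n - 1) and then computing link-overlap similarity; this is the common-links-divided-by-unique-links measure used throughout this review, not the full Closeness index of Goldsmith and Davenport (1990). The two 4 x 4 matrices are invented for illustration.

import copy

def pfnet(dist):
    """PFNET(r = infinity, q = n - 1): keep a link only if no alternate
    path has a smaller maximum edge weight (minimax distances computed
    with a Floyd-Warshall pass)."""
    n = len(dist)
    minimax = copy.deepcopy(dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # An edge survives when its direct weight equals the minimax distance.
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]}

def link_similarity(links_a, links_b):
    """Common links divided by all unique links; 0 = none shared, 1 = identical."""
    return len(links_a & links_b) / len(links_a | links_b)

student = [[0, 1, 4, 5], [1, 0, 2, 6], [4, 2, 0, 3], [5, 6, 3, 0]]
expert = [[0, 1, 5, 6], [1, 0, 2, 3], [5, 2, 0, 4], [6, 3, 4, 0]]
print(link_similarity(pfnet(student), pfnet(expert)))  # prints 0.5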
In the Murphy and Suen (1999) study, experts were used to generate a list of concepts, provide formative feedback in developing semantic relationship testing materials, and establish a benchmark against which Pathfinder networks were compared. When multiple experts are used as a referent, the averages of the expert Pathfinder nets are used to provide a single referent structure (Goldsmith et al., 1991; Murphy & Suen, 1999). When assessing knowledge structure from closed- or open-ended essays, the expert is used to create and/or validate a list of terms (Clariana et al., 2009; Clariana & Wallace, 2007; Taricani & Clariana, 2006). Once the expert referent structure has been created, individuals or groups can be compared using a correlational measurement in order to see how closely the individual scores relate to the expert's (Clariana & Wallace, 2007; Clariana et al., 2009; Diekhoff, 1983). Knowledge structure can now be assessed because, as the relationship between the expert and non-expert becomes stronger (via correlation), we can gauge the quality of learning based on the strength of the relationship. Past studies have indicated that, as a result of instruction, the student's knowledge structure becomes more like the expert's (Diekhoff, 1983; Goldsmith et al., 1991; Jonassen et al., 1993; Kim, 2012; Thro, 1978). Thro (1978), by comparing regression models, found that associative structures (patterns of relationships) of knowledge contribute significantly to the prediction of achievement. Diekhoff (1983) used Pearson correlations to validate relationship judgment tests (a method in which students use a numerical scale to judge the strength of the relationship between pairs of concepts), finding that relationship judgments were reliable for assessing structural knowledge when the results were compared with those of an expert (average r = 0.58). A computer-based knowledge mapping system was developed by the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) (Herl et al., 1999). Research on this computer-based tool for constructing knowledge maps consisted of comparing computer-generated knowledge maps of both students and student groups. The Herl et al. (1999) research
used four expert maps as the criteria for assessing the pre- and posttest scores. By using expert referent map comparisons, they were able to gauge the success of instruction from pretest scores to posttest scores. The results of this study support the use of multiple expert referent comparisons as a methodology for gauging student learning using knowledge mapping techniques.

Analysis of Lexical Aggregates Approach

Essays are a reliable tool for assessing structural knowledge within complex domains. The reason for this is that, whether intentionally or not, essays reflect the individual's knowledge structure (Clariana & Wallace, 2007). Some of the underlying problems with essay assessment are that it is costly to assess and administer on a large scale, it tends to be subjective, and the actual structure of the student's knowledge is not overtly apparent (Koul et al., 2005). The actual structure of the individual's knowledge can be obscured by additional material not directly related to the topic at hand, or organized in such a way that the structural representation becomes unclear. In the quest to elicit knowledge structures stored within the mind of the individual, Pathfinder network analysis is an established and reliable approach to capturing and assessing structural knowledge (Curtis & Davis, 2003; Dearholt & Schvaneveldt, 1990; Jonassen et al., 1993). The ALA approach was developed as a computer-based means of aggregating term data from essays and then applying the Pathfinder approach to measure and assess knowledge structure. By creating a valid computer-based approach for assessing structural knowledge housed within essays, it is possible to reduce or eliminate some of the problems that plague essay assessment, such as cost and human rater bias (Nitko & Brookhart, 2007). There is growing research on validating essay assessment using the computer-based ALA approach.
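As a concrete illustration of this kind of aggregation, the following is a minimal sketch, not the ALA-Reader program itself, of the within-sentence rule: every pair of key terms that co-occurs in a sentence adds a link to a symmetric proximity array. The term list and essay text are invented, and a real application would stem the terms rather than match raw substrings.

from itertools import combinations

terms = ["heart", "blood", "artery", "vein", "oxygen"]
essay = ("The heart pumps blood into each artery. "
         "Each vein returns blood to the heart. "
         "Blood carries oxygen.")

index = {t: i for i, t in enumerate(terms)}
proximity = [[0] * len(terms) for _ in terms]

for sentence in essay.lower().split(". "):
    present = [t for t in terms if t in sentence]
    # Within-sentence aggregation: each co-occurring pair of key terms
    # in a sentence contributes one link to the proximity array.
    for a, b in combinations(present, 2):
        proximity[index[a]][index[b]] += 1
        proximity[index[b]][index[a]] += 1

for term, row in zip(terms, proximity):
    print(f"{term:>7}", row)

The linear aggregation variant discussed below would instead link each key-term occurrence to the next key-term occurrence in reading order, both within and across sentences, which is what guarantees a connected graph.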
Validating the ALA approach

The ALA-Reader software is a utility used to translate written text summaries into a proximity file for analysis by PCKNOT software (Clariana & Koul, 2004). Using ALA-Reader software, Clariana and Koul (2004) investigated a computer-based approach to translate text into network graph representations. In this research, the ALA-Reader translated twenty-four written summaries of the heart and circulatory system into proximity data for processing in PCKNOT software. The ALA-Reader used 26 terms generated by text-occurrence frequency analysis, analyzed term co-occurrence, converted term co-occurrences into propositions, and finally aggregated these across sentences into a proximity array. The PCKNOT software then transformed the proximity data into visual PFNets. Sixteen of the text summaries were scored by 11 human raters as well as by the PFNet rating. The ALA-Reader PFNet rating ranked as the 5th highest correlation (r = 0.69) out of the 12 scores. The Clariana and Koul (2004) research indicates that a computer-based method for capturing knowledge structure is valid, as it assesses knowledge similar to that assessed by the human raters. A study conducted by Koul et al. (2005) sought to continue the investigation of methodologies for computer-based approaches for scoring essays and concept maps. This study compared several computer-based approaches for scoring concept maps and essays against essays scored by human raters. The three computer-based approaches were ALA-Mapper for concept maps, ALA-Reader for essays, and Latent Semantic Analysis (LSA) for essays. The ALA-Reader ranked fifth out of 13 raters, which is consistent with the Clariana and Koul (2004) study, whereas LSA ranked ninth out of 13 raters. Only the ALA-Mapper was used to score concept maps, compared to human raters using a quantitative rubric (a rubric focusing on correctness) and a qualitative rubric (a rubric focusing on mental representation of content). It is interesting to note that the ALA-Mapper ranked higher in the list of scores (1 and 5) when the qualitative rubric was used than when concept maps were scored with the quantitative rubric (8 and 12). Since the qualitative rubric used by the raters focuses on mental representations, the rankings of 1 and 5 provide validation of the
ALA-Mapper as an approach for measuring and assessing structural knowledge. The results of this study indicated that both the human raters and the computer-based scores capture some of the same information about process knowledge within a given domain, thus presenting a step toward validating a low-cost automatic process for scoring essays in this way. As more and more research indicates that computer-based methods are a cost-effective way to assess essays (Clariana & Koul, 2004; Koul et al., 2005), the techniques for automatically assessing essays need to be fine-tuned and validated. Clariana and Wallace (2007) conducted a proof-of-concept investigation on deriving and measuring knowledge structure from essay exams from individuals and teams. The study scored essays using ALA-Reader software employing the within-sentence aggregation approach and a new approach (added to the software) called the linear aggregation approach. The linear approach aggregates terms both within and across sentences and always produces a connected graph, whereas connectedness is not guaranteed in the within-sentence approach, which tends to produce a disconnected graph of multiple unconnected clusters (Clariana & Wallace, 2007). The study compared linear aggregation scores with the within-sentence aggregation scores as well as human scores and multiple-choice scores. A frequency list of terms provided 30 important concept terms, which were then stemmed (removal of suffixes to obtain root words) and used by the ALA-Reader. The study found that ALA-Reader scores fell between the two human rater scores for both the sentence and linear approaches. The sentence approach tended to score essays higher than the linear approach, although the linear approach did fall between the scores of the human raters. When correlating the scores to the human rater scores and the multiple-choice test, the ALA-Reader using the linear approach correlated more highly with the human raters than did the sentence approach, and both linear and sentence approaches correlated weakly with the multiple-choice tests (sentence approach r = 0.17 and linear approach r = 0.39). These correlations should be low, since multiple-choice tests tend not to assess structural knowledge. These results continue to support the ALA-
Reader approach as a tool to measure structural knowledge, as well as the linear approach as the algorithmic process for producing proximity files. The Clariana and Wallace (2007) study also investigated whether the ALA-Reader could measure team knowledge by grouping the essays into a high-score group (n = 14) and a lower-score group (n = 15); the determination was based on a median split of the human rater scores. PFNets were created by averaging the proximity files within a group, resulting in a single PFNet for each group. Similarity was measured as the intersection of links in common between two PFNets divided by the union of all unique links in the two, which results in a score ranging from 0 (no similarity) to 1 (perfect similarity). The research found that the high-scoring group (r = 0.31) was more similar to the expert's essay than the low-scoring group (r = 0.19). The results of this study provided support for the ALA-Reader as a valid approach for measuring team knowledge derived from essays. The ability of the ALA-Reader to measure team or group knowledge could provide a computer-based, cost-effective approach for validating human rater scores. The Clariana et al. (2009) study investigated how manually replacing pronouns with their noun referents before analysis might improve individual scores using both the linear and sentence aggregation approaches. Specifically, pronouns are a serious problem for text pattern matching, because it is difficult to accurately associate a pronoun with its antecedent noun, and so the derived patterns will contain more error. Overall, the results indicated that the linear aggregation approach correlated with the human raters (r = 0.74) better than the sentence aggregation approach (r = 0.44). Replacing the pronouns with their referents had little effect on the linear aggregation approach (r = 0.71) and a small positive effect on the sentence aggregation approach (r = 0.51). The authors of the study concluded that the linear approach is less influenced by pronouns and superior to the sentence aggregation approach overall when comparing individual scores, and narrowly superior when conducting group-wise comparisons. Therefore, adding a pronoun-handling subroutine would not substantially improve the
validity of the ALA-Reader when deriving knowledge structures from essays; the ALA approach is robust.

Creating a List of Terms

When eliciting knowledge structure from essays, the foundation of the process is creating a list of terms. These terms become the concepts used for pattern matching that make up the linked nodes forming the knowledge structure represented as a network graph. Past research derived concepts from a predefined list of terms in order to create network graphs (Koul et al., 2005; Curtis & Davis, 2003; Goldsmith et al., 1991). For example, closed-ended network graphs have a predefined set of terms, whereas open-ended network graphs can have a high degree of variance in terms (Taricani & Clariana, 2006). Essays are intrinsically open-ended because it is nearly impossible and unrealistic to constrain a writer to just one list of terms. Taricani and Clariana (2006) developed a technique for automatically scoring open-ended network graphs using the frequency counts of words to determine the important words. This approach can be extended to deriving a list of terms for scoring essays; a sketch of this frequency-based selection follows below. In actual educational settings, deriving knowledge structure from essays produces open-ended network graphs, which is a more accurate way to capture knowledge structures (Taricani & Clariana, 2006). Since the inception of the ALA-Mapper and ALA-Reader approach, the focus of research has been on the algorithmic processes for the aggregation of data, such as the linear aggregation approach versus the sentence aggregation approach (Clariana & Wallace, 2007; Clariana et al., 2009). Most research conducted on validating the ALA-Mapper for maps and the ALA-Reader for essays did not have much variance in the number of terms (ranging from the mid to high twenties). The foundation for measuring structural knowledge is rooted in the creation of a term list. One could proffer that without a valid list of terms you cannot have a valid measure of structural knowledge.
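A minimal sketch of this frequency-based selection, assuming a toy stop-word list and corpus (a real application would stem the words and have a content expert validate the resulting list, as noted above):

import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it", "from"}

def term_list(text, n=20):
    """Rank content words by frequency and return the top n as the term list."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

corpus = ("The heart pumps blood to the arteries. "
          "Blood returns to the heart through the veins.")
print(term_list(corpus, n=5))  # e.g., ['heart', 'blood', 'pumps', ...]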
What do we know about the number of terms needed to validly represent knowledge structure? Research conducted by Goldsmith et al. (1991) shows that when using pair-wise rating tasks, the predictive validity of Pathfinder networks increases in an almost linear fashion as the number of terms increases. When scoring network graphs, Clariana and Taricani (2010) investigated how increasing the number of terms affected the predictive ability of network graph scores. The results were contrary to what they expected. They created three lists of 16, 26, and 36 terms. The result was that the network graphs with 26 terms had the best predictive ability, in contrast to the Goldsmith et al. (1991) study. One major difference between the two studies is that in the Goldsmith et al. (1991) study the network graphs were not open-ended, whereas the Clariana and Taricani (2010) network graphs were open-ended. Focusing solely on the term lists, another difference between the two studies is that the Clariana and Taricani (2010) term list intervals were 10 terms, whereas the Goldsmith et al. (1991) term list intervals were 5 terms (the minimum term list had 5 terms; the maximum term list had 30 terms). Past research on the ALA-Reader (Clariana et al., 2009; Clariana & Koul, 2008; Clariana & Wallace, 2007; Taricani & Clariana, 2006) would support the Clariana and Taricani (2010) results, as these studies created term lists with terms in the mid to high twenties for essays about 500 words long. In the Clariana and Taricani (2010) study with concept maps, variance in the students' selection of important terms could have affected the results. According to Clariana and Taricani (2010), "apparently, a few of the most important terms contribute to most of the predictive ability of the concept maps created by the non-experts" (p. 170). This statement indicates that not only the number of terms but also the combination of the number and the quality of the terms would affect predictability. Clariana and Taricani (2010) conclude that further refinement of the tool and approach is warranted, and also that the number of terms may be a confounding variable when attempting to refine the tool and approach.
A recent study conducted by Clariana, Wolfe, and Kim (2014) specifically recommended that the number of terms used in analysis should be the focus of future research. Their research sought to continue the validation of the ALA approach using Pathfinder analysis, with both a linear approach and a minimum distance approach applied to narrative and expository lesson texts. The linear approach was performed by the ALA-Reader, which generates proximity files containing only ones and zeros to indicate the sequential occurrence of terms in text (Clariana et al., 2009). The minimum distance approach used an Excel spreadsheet to establish minimum distance values between all of the selected important terms (Clariana et al., 2014). Both the sequential approach (median r = 0.70) and the minimum distance approach (median r = 0.67) were comparable measurements of text structure as a proxy of knowledge structure. Additionally, the number of terms used by the software in the study was 17. The study provides additional evidence that both approaches are valid, although the linear approach performed slightly better than the minimum distance approach. The Clariana et al. (2014) research is another study that sets varying the text-to-data process as an independent variable but leaves the number of terms static. Although the authors state several limitations to the generalizability of the research, the establishment of an optimal number of terms is of particular interest to the current study. Clariana et al. (2014) noted that not all participants used all 17 of the important terms in their essays and that missing terms could negatively impact PFNet structure, since missing important terms typically create multiple missing links; therefore, future research is needed to determine the optimal number of terms for pattern matching. In summary, past research on the ALA approach for scoring network graphs and essays has focused primarily on how knowledge is elicited as proximity files, for example, comparing list-wise, pair-wise, and clustering approaches (Clariana & Wallace, 2009), the within-sentence approach versus the linear aggregation approach (Clariana & Wallace, 2007; Clariana et al., 2009), the sequential approach (Clariana et al., 2009), and the minimum distance approach (Clariana et al.,
2014). The results of these studies continued to increase the validity of the ALA approach as a means to elicit structural knowledge, but one area of exploration has been lacking in this domain: the number of terms used to create the proximity files (Clariana & Taricani, 2010; Clariana et al., 2014). All of these previous studies asked content experts to give a frequency list of terms. Therefore, each study had a list of terms of different length, based partly on the content of the participants' essays but mainly on the intuition of the experts. The purpose of the current investigation is to attempt to answer the question: "Does an optimal number of terms exist that can be used to create proximity files, or are more terms better?" To answer this question, the current investigation holds the elicitation approach for creating the proximity files constant and focuses on varying the number of terms used to create the proximity files in order to determine what effect this has on the convergent validity of essay scores relative to human scores. In the end, this investigation hopes to establish a starting point for a protocol for optimizing the quality and quantity of terms used to create proximity files, and to further the validity of the ALA approach as a mechanism for the elicitation of structural knowledge.
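The planned analysis can be sketched as a simple loop composing function versions of the earlier sketches. Here proximity_array (the aggregation sketch wrapped as a function that returns a dissimilarity matrix suitable for pfnet), pfnet, link_similarity, and the candidate list sizes are all illustrative assumptions, not the study's actual code:

from statistics import correlation  # Python 3.10+

def validity_by_term_count(essays, referent_essay, ranked_terms,
                           human_scores, sizes=(5, 10, 15, 20, 25, 30)):
    """For each candidate term-list size k, score every essay against the
    referent PFNet and report r between ALA scores and human scores."""
    results = {}
    for k in sizes:
        terms = ranked_terms[:k]  # top-k terms from the frequency list
        referent = pfnet(proximity_array(referent_essay, terms))
        ala_scores = [link_similarity(pfnet(proximity_array(e, terms)), referent)
                      for e in essays]
        results[k] = correlation(ala_scores, human_scores)
    return results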
Chapter 3

METHODOLOGY

The purpose of this study was to further validate the ALA-Reader as an automated tool for scoring essays by varying the number of terms used to create raw proximity files and then comparing the derived network graphs to five referent network graphs derived from essays. Construction of a valid network graph should reveal the underlying structural knowledge of the student within a given domain. This validation was conducted by varying the number of terms used to score network graphs, similar to the Clariana and Taricani (2010) study, and comparing them to other groups and individuals (Clariana & Wallace, 2007; Clariana et al., 2009; Clariana, 2010b). Further validation of the ALA-Reader was conducted by comparing the structural makeup of network graphs from varying degrees of expertise, i.e., comparing an expert's essays with lower-scored essays (Clariana & Wallace, 2007). This chapter describes the methods of the study.

Participants

Archived exam data was secured from students enrolled in a Distance Education Strategic Studies program at a Senior Military Learning Institution. These students were U.S. military officers ranked O5 (Lieutenant Colonel or branch equivalent) and O6 (Colonel or branch equivalent), senior international officers ranked O5 or higher, and senior-grade (GS-14 or higher) U.S. civilian government employees. Course enrollment was 361 students. All participants have an undergraduate college degree and are senior civilian or military leaders.
The standard IRB process for conducting research at the Pennsylvania State University was adhered to in order to obtain approval to conduct the research. The IRB approved the research as non-human-subjects research, since archived student data was collected for the current study. In addition to the standard IRB process, approval was obtained from the Office of Institutional Research at the military educational institution from which the archived data was obtained. As a condition of approval, all references that directly or indirectly identify a student or students within the archived data set and/or research paper were removed.

Course Materials

Students who attended this military educational institute were enrolled in one of two programs: a nine-month resident education program or a two-year distance education program. Although the curricula of these two programs differ in organization and time, students in both programs are awarded the same graduate degree - a Master's in Strategic Leadership. The focus of the current research is on essays produced by students enrolled in the Distance Education Program's (DEP) core curriculum. The DEP is a Middle States- and JPME-1-accredited master's degree program. The DEP consists of eight primary courses, an elective course, and two resident courses taken over a two-year period. All courseware is available online, with a few books that are mailed to the students. Each course starts with a directive, which is similar to a course syllabus. Course content is organized into multiple blocks of instruction. Each block is divided into multiple sections, and within each section are multiple lessons, which are the basic units of
instruction. For most of the courses, students are assessed by writing a comprehensive essay and participating in an evaluated online forum. The focus of this research is on the end-of-course comprehensive essay the student completes as part of the course's core curriculum requirement. A course is created by a course author who is responsible for content creation and for defining the evaluation criteria for the course. All course authors must have a graduate degree in strategic leadership or a doctorate in a strategic specialty. Course content and evaluation criteria are validated by other faculty members within the course author's year group, year group directors, and the department chairman through a series of meetings that take place after one iteration of the course is completed and before the next iteration of that course begins (approximately a year later). All course material is then converted for online delivery and management by an instructional support group (ISG). The ISG is a group of individuals experienced in developing online courseware and instructional methodologies. Table 3.1 shows the DEP courseware content.

Table 3.1. DEP Courseware Content.
____________________________________________________________________
Content                      Description
e-Book                       HTML page that contains course instruction and
                             scaffolds the learning.
Lectures                     Video or audio recordings of lectures by guest
                             speakers who have appeared at the USAWC or other
                             schools, pertaining to the objectives of the
                             course.
Presentations                PowerPoint presentations that discuss processes,
                             policies, or plans.
Interactive Learning Modules Multimedia learning modules that require students
                             to interact with the instruction.
Models                       Images or graphics that demonstrate a policy or
                             process.
Readings                     PDF documents that contain information relevant
                             to the completion of the course, or additional
                             material if the student wants to learn more.
Self-Diagnostics             Non-graded multiple-choice quizzes that allow
                             students to assess their own understanding of the
                             course objectives.
External Links               Websites external to the courseware, providing
                             additional support to the course objectives.
Library Resources            Online library access to download additional
                             readings, books, and presentations.
_______________________________________________________________________

Students access all courseware materials online through a student management system called the Online Automated Student Information System (OASIS) and the courseware website. OASIS is a customized student management system (SMS) designed specifically for the distance and resident programs. OASIS manages the student's access to courseware but does not deliver the courseware. Table 3.2 lists the features of OASIS.
Table 3.2. Features of OASIS.
____________________________________________________________________
Features                     Purpose
Student Authentication       Authenticates students into the system.
Courseware Access            Provides links to the courseware, which exists
                             on another server.
Forum (used to evaluate      Provides asynchronous discussion between students
students)                    and the course author; also provides functions to
                             upload and download files. It is not used as a
                             social learning tool, but rather as an assessment
                             tool near the completion of a course.
News and Events              A list of news and events the student should be
                             aware of regarding the course, class, or
                             curriculum.
Task List                    A list of tasks the student needs to accomplish;
                             it usually holds additional information about the
                             course requirements.
Other administrative tools   Other features and student self-service options
                             that pertain to administrative links, student
                             info, and college links unrelated to the
                             courseware itself.