Directed versus undirected network analysis of student essays

Roy Clariana (RClariana@psu.edu)
Penn State
IWALS 2018: 6th International Workshop on Advanced Learning Sciences
Perspectives on the Learner: Cognition, Brain, and Education
University of Pittsburgh, USA
June 6-8, 2018

Abstract
Knowledge structure (KS) is an expansive and expanding area of research with a rich set of theoretical and software tools for eliciting, representing, and analyzing KS. KS is especially amenable to network graph methods.
Based on our work with analysis of concept maps, in 2003 I developed a text-to-network analysis of lexical aggregates (ALA) approach that uses Pathfinder network scaling.
ALA pattern-matches preselected key terms (with their synonyms and metonyms) in a sequential forward pass through the text. Pairs of terms discovered are entered as links into a symmetric n x n array, which is then analyzed using Pathfinder analysis. For theoretical and pragmatic reasons at that time, ALA was based on the representation of text as undirected networks.
Research question: For analysis of student essays using ALA-Reader, which is better: undirected networks or directed networks?
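The abstract names Pathfinder network scaling without showing it. The sketch below is one common variant, PFNET(r = infinity, q = n - 1), which keeps a link only if no indirect path has a smaller "largest single step". The parameter choice, the function name `pathfinder_links`, and the toy distance matrix are assumptions for illustration; the slides do not state which settings were used, and ALA-Reader similarity data would first need converting to distances.

```python
import math

def pathfinder_links(dist):
    """Links kept by PFNET(r = infinity, q = n - 1) on a distance matrix.

    dist[i][j] is the distance from node i to node j (smaller = more related).
    Works for symmetric (undirected) and asymmetric (directed) input.
    """
    n = len(dist)
    # minimax[i][j]: smallest achievable "largest step" over any path i -> j
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # A direct link survives only if no indirect path beats it
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and math.isfinite(dist[i][j]) and dist[i][j] <= minimax[i][j]
    ]

# Hypothetical three-node example: the 0-2 link (distance 3) is pruned
# because the path 0 -> 1 -> 2 never takes a step longer than 2.
toy = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
kept = pathfinder_links(toy)
print(kept)
```

Passing an asymmetric matrix to the same function yields a directed network, which is the contrast the research question asks about.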

[Figure: Elicit --> represent --> compare. Source: Jonassen, Beissner, & Yacci (1993), page 22]
Knowledge elicitation: graph building, similarity ratings, semantic proximity, word associations, ordered recall, free recall
Knowledge representation (Trees, Networks, Dimensional): additive trees, hierarchical clustering, ordered trees, minimum spanning trees, link weighted, Pathfinder nets, principal components, MDS – multidimensional scaling, cluster analysis
Knowledge comparison: expert/novice, qualitative graph comparisons, quantitative graph comparisons, relatedness coefficients, scaling solutions, C of PFNets
Also labeled on the figure: concept maps, written text, card sort; raw distance correl; listwise, pairwise
Origin of using undirected

Why has ALA used undirected?
n = 4 terms
n x n = 16 cells in the full array
(n x n) - n = 12 cells w/o diagonal: asymmetric, directed
((n x n) - n)/2 = 6 cells w/o diagonal: symmetric, undirected

terms   asymmetric (n x n)   symmetric (((n x n) - n)/2)
  0            0                   0
  3            9                   3
  6           36                  15
  9           81                  36
 12          144                  66
 15          225                 105
 18          324                 153
 21          441                 210
 24          576                 276
 27          729                 351
 30          900                 435
 33         1089                 528

Asymmetric: salt – pepper and pepper – salt are separate entries (more data).
Pragmatics! How big is a concept map, and do students use arrows? And how many pair-wise comparisons can you make before you go daft?
Origin of using undirected
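The cell counts on this slide follow directly from n. A minimal sketch (the function name `cell_counts` is hypothetical) that reproduces the n = 4 figures and the table rows:

```python
def cell_counts(n):
    """Return (full array, directed w/o diagonal, undirected w/o diagonal)."""
    full = n * n                   # every cell, diagonal included
    directed = n * n - n           # asymmetric array without the diagonal
    undirected = (n * n - n) // 2  # symmetric array: one triangle only
    return full, directed, undirected

for n in range(0, 34, 3):
    full, directed, undirected = cell_counts(n)
    print(n, full, directed, undirected)
```

Note that the slide's "asymmetric" column lists the full n x n count; the count without the diagonal is the middle value returned here.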

ALA-Reader example: Undirected
vs. directed Networks
Text example: “Humanists believed that job satisfaction is related to productivity. They found that if employees were given more freedom and power, then they produced more.”

Terms: (1) humanists, (2) job satisfaction, (3) productivity, (4) employees, (5) empowered

Symmetric array (undirected Pfnet)
                   (1)  (2)  (3)  (4)  (5)
humanists          --    1    0    0    0
job satisfaction    1   --    1    0    0
productivity        0    1   --    1    1
employees           0    0    1   --    1
empowered           0    0    1    1   --

Asymmetric array (directed Pfnet)
                   (1)  (2)  (3)  (4)  (5)
humanists          --    1    0    0    0
job satisfaction    0   --    1    0    0
productivity        0    0   --    1    0
employees           0    0    0   --    1
empowered           0    0    1    0   --

Both arrays are shown w/o sentence break (the productivity to employees link crosses the sentence boundary).
[Figure: the undirected and the directed Pfnet drawn over the same five nodes, produced with Pathfinder software]
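The forward pass behind these two arrays can be sketched as follows. This is a minimal illustration, not the ALA-Reader implementation: the surface patterns in `TERMS` (for example, mapping "freedom and power" to empowered) are assumptions chosen so the slide's example text yields the slide's arrays, and the diagonal is stored as 0 rather than "--".

```python
import re

# Hypothetical key-term list: term -> surface patterns (synonyms/metonyms)
TERMS = {
    "humanists": ["humanists"],
    "job satisfaction": ["job satisfaction"],
    "productivity": ["productivity", "produced more"],
    "employees": ["employees"],
    "empowered": ["freedom and power"],
}

def term_sequence(text):
    """Key terms in the order they occur in the text (forward pass)."""
    lowered = text.lower()
    hits = []
    for term, patterns in TERMS.items():
        for pattern in patterns:
            for match in re.finditer(re.escape(pattern), lowered):
                hits.append((match.start(), term))
    return [term for _, term in sorted(hits)]

def link_arrays(sequence):
    """Build the asymmetric (directed) and symmetric (undirected) arrays."""
    names = list(TERMS)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    directed = [[0] * n for _ in range(n)]
    # Each consecutive pair of terms becomes a link from the first to the second
    for a, b in zip(sequence, sequence[1:]):
        if a != b:
            directed[index[a]][index[b]] = 1
    # The symmetric array ignores direction
    undirected = [
        [max(directed[i][j], directed[j][i]) for j in range(n)]
        for i in range(n)
    ]
    return directed, undirected

TEXT = ("Humanists believed that job satisfaction is related to productivity. "
        "They found that if employees were given more freedom and power, "
        "then they produced more.")

directed, undirected = link_arrays(term_sequence(TEXT))
for row in directed:
    print(row)
```

The whole text is treated as one run here, which corresponds to the "w/o sentence break" case.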

Contrast raw data from ALA-Reader
vs. Document-term matrix (i.e., LSA)
DATA: Expert link.txt
similarities
17 items
1 decimals
0.1 min
1 max
matrix:
1 1 0 1 0 1 1 1 1 0 0 1 0 1 1 0 0
1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0
0 0 1 0 1 1 0 1 1 1 0 1 0 0 0 1 1
1 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0
0 1 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0
1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
Expert Essay --> ALA-Reader software (n x n data points)
Expert Essay --> doc-term matrix (n data points)

terms -->
(1) management, (2) employee, (3) product, (4) situation, (5) work, (6) TQM, (7) customers, (8) organization, (9) quality, (10) scientific_mxnagement, (11) efficiency, (12) humanistic, (13) contingency, (14) feelings, (15) needs, (16) service, (17) plan

Document-term counts (essay text is truncated in the source; columns follow the term order above)
Expert: The most basic is the classical theory, which also includes scientific mxnagement. Managers who follow this theory focus more on business efficiency tha...
  7 8 7 6 4 0 1 2 4 1 1 1 2 1 2 2 0
b01: The Classical style of management includes an old school and scientific approach to organization. Originally relationship between employer and employee...
  3 7 1 2 5 0 2 1 0 0 2 0 1 1 3 0 0
b02: These four management theories can all be found in today’s workforce and each of them has their own specific way of managing employees and accomplis...
  8 8 3 1 4 0 3 2 2 1 0 1 1 0 1 0 0
b03: Classical/ scientific mxnagement focuses on pay incentives and external rewards for job completion. Efficiency is the goal of the organization. Empowerme...
  0 5 2 1 0 0 1 2 0 1 0 0 1 1 0 2 0
b04: Classical Management, with innovators such as Henri Fayol had a lot to do with maximizing productivity for the business’s sake. It has strict division of labo...
  4 4 3 1 3 0 0 0 4 0 0 1 2 1 1 0 0
b05: Various management theories have been explored throughout history, each with its own benefits and disadvantages. The first of these was classical manag...
  9 4 2 1 4 1 0 1 1 1 2 4 1 1 5 0 0
b06: Management is an evolving science that has taken many different perspectives: One of the first people to lead an organized study on management was Tay...
  13 6 1 0 1 2 1 3 3 1 2 3 1 1 1 0 1

Expert Essay [7, 8, 7, 6, 4, 0, 1, 2, 4, 1, 1, 1, 2, 2, 2, 2, 0]
Linear order is preserved by ALA-Reader (not just bag-of-words)
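The point that linear order is preserved can be illustrated with two toy essays that use the same terms in a different order. This is a hedged sketch: the three-term list and both sentences are made up for illustration, and the link rule (consecutive key terms) is a simplification of what ALA-Reader does.

```python
from collections import Counter

TERMS = ["management", "employee", "quality"]  # hypothetical key terms

def count_vector(tokens):
    """Document-term representation: one count per term, order discarded."""
    counts = Counter(token for token in tokens if token in TERMS)
    return [counts[term] for term in TERMS]

def directed_links(tokens):
    """Sequential representation: a link between each consecutive pair of key terms."""
    sequence = [token for token in tokens if token in TERMS]
    return {(a, b) for a, b in zip(sequence, sequence[1:]) if a != b}

essay_a = "management improves quality for each employee".split()
essay_b = "each employee improves quality for management".split()

print(count_vector(essay_a), count_vector(essay_b))
print(directed_links(essay_a))
print(directed_links(essay_b))
```

The two count vectors are identical, while the two link sets differ, which is the contrast between n data points and n x n data points drawn on this slide.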

ALA-Reader (as Excel spreadsheet)

Raw data from ALA-Reader to
Pathfinder network
DATA: Expert link.txt
Expert Essay --> ALA-Reader software (n x n data points) --> Pathfinder (KNOT, R, and also NodeXL)
Linear order is preserved by ALA-Reader (not just bag-of-words).
The resulting network shows central ideas (high-degree nodes) and peripheral ideas (lower-degree nodes).
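Degree can be read straight off a link array. The sketch below uses the symmetric five-term array from the humanists example earlier in the deck (diagonal stored as 0); treating only the highest-degree node as "central" is an assumption made for illustration, since the slides do not give a cutoff.

```python
TERMS = ["humanists", "job satisfaction", "productivity", "employees", "empowered"]

# Symmetric (undirected) array from the five-term example, diagonal as 0
SYMMETRIC = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]

def degrees(matrix, names):
    """Degree of each node = number of links in its row."""
    return {name: sum(row) for name, row in zip(names, matrix)}

def most_central(degree_by_term):
    """Terms with the highest degree (hypothetical definition of 'central')."""
    top = max(degree_by_term.values())
    return [name for name, degree in degree_by_term.items() if degree == top]

degree_by_term = degrees(SYMMETRIC, TERMS)
print(degree_by_term)
print(most_central(degree_by_term))
```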

Expert essay referent network (same raw
data as directed and as undirected networks)
Symmetric array (undirected Pfnet): 45 (90) links
Asymmetric array (directed Pfnet): 57 links
25 links in common
Note: central and peripheral terms are the same in both networks

• Participants were 45 undergraduates enrolled in a business course.
• During the regularly scheduled examination week at the end of the semester, all students completed the customary final examination for the course (worth 25% of their final course grade) and also answered an extended-response essay question for extra credit.
• Writing prompt: “Describe and contrast in an essay of 300 words or less the following four Management theories: Classical/scientific management, Humanistic/Human Resources, Contingency, and Total Quality Management.”
• The essays were scored by three human raters and by the ALA-Reader software (using Pathfinder network analysis).

The essays were scored by three human raters and by ALA-Reader software (then using Pathfinder network analysis links in common).
Research questions: For analysis of student essays using ALA-Reader, which is better:
1. Analysis of undirected networks or of directed networks?
2. Pattern analysis across sentence boundaries (document-wise) or NOT across sentence boundaries (sentence-wise)?
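The second research question comes down to whether the forward pass resets at each sentence end. A minimal sketch under stated assumptions: the term list, the abridged example text, and the regular-expression sentence splitter are all illustrative choices, not ALA-Reader's own rules.

```python
import re

KEY_TERMS = ["humanists", "job satisfaction", "productivity", "employees"]  # hypothetical subset

def find_terms(text):
    """Key terms in order of occurrence."""
    lowered = text.lower()
    hits = [
        (match.start(), term)
        for term in KEY_TERMS
        for match in re.finditer(re.escape(term), lowered)
    ]
    return [term for _, term in sorted(hits)]

def directed_links(text, sentence_wise):
    """Link consecutive key terms; optionally never link across sentences."""
    chunks = re.split(r"(?<=[.!?])\s+", text) if sentence_wise else [text]
    links = set()
    for chunk in chunks:
        sequence = find_terms(chunk)
        links.update((a, b) for a, b in zip(sequence, sequence[1:]) if a != b)
    return links

TEXT = ("Humanists believed that job satisfaction is related to productivity. "
        "They found that employees produce more.")

document_wise = directed_links(TEXT, sentence_wise=False)
sentence_wise = directed_links(TEXT, sentence_wise=True)
print(document_wise - sentence_wise)
```

The only link that differs is the one that spans the sentence boundary, which is why the two analyses can give nearly identical scores when few links cross sentences.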
[Figure: expert vs. student networks and their links in common; values shown: 34, 23, 12]
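A links-in-common score can be sketched as a cell-by-cell comparison of the student array with the expert array. The function name, the `directed` switch, and the two three-term arrays are hypothetical; they only show how the same raw data can give different scores under the two representations.

```python
def links_in_common(expert, student, directed=True):
    """Count links present in both networks.

    directed=True  : a link i -> j must appear in both arrays.
    directed=False : direction is ignored and each pair is counted once.
    """
    n = len(expert)
    common = 0
    for i in range(n):
        for j in range(n):
            if i == j or (not directed and j < i):
                continue
            if directed:
                in_expert, in_student = expert[i][j], student[i][j]
            else:
                in_expert = expert[i][j] or expert[j][i]
                in_student = student[i][j] or student[j][i]
            if in_expert and in_student:
                common += 1
    return common

# Hypothetical arrays: the expert links 0 -> 1 and 1 -> 2,
# the student links 1 -> 0 and 1 -> 2.
expert = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
student = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]

print(links_in_common(expert, student, directed=True))
print(links_in_common(expert, student, directed=False))
```

In this toy case the reversed link counts as a match only in the undirected comparison.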

Correlation Results

Columns: (1) Raters (3, median); (2) Final Exam; (3) # of words;
(4) Document-wise Directed Pfnet CMN (to expert); (5) Document-wise Undirected Pfnet CMN (to expert);
(6) Sentence-wise Directed Pfnet CMN (to expert); (7) Sentence-wise Undirected Pfnet CMN (to expert)

                                          (1)    (2)    (3)    (4)    (5)    (6)    (7)
Raters (3, median)                        1      .517   .600   .732   .655   .733   .615
Final Exam                                .517   1      .372   .342   .298   .332   .283
# of words                                .600   .372   1      .404   .319   .390   .307
Document-wise (no breaks between sentences)
  Directed Pfnet CMN (to expert)          .732   .342   .404   1      .923   .986   .881
  Undirected Pfnet CMN (to expert)        .655   .298   .319   .923   1      .919   .977
Sentence-wise (breaks between sentences)
  Directed Pfnet CMN (to expert)          .733   .332   .390   .986   .919   1      .899
  Undirected Pfnet CMN (to expert)        .615   .283   .307   .881   .977   .899   1

r > .290, p < .05 and r > .400, p < .01

• Directed > undirected (but only a small difference)
• Document-wise analysis = sentence-wise analysis; for these essays, sentence boundaries don’t matter
• ALA-Reader data inter-correlations are all high
• Number of words in the student essays is strongly correlated with raters’ scores (r = .60)
• Stepwise multiple regression analysis was used to test whether the essay features significantly predicted human rater scores. Two predictors explained 64.6% of the variance (F(2,42) = 38.395, p < .0001). The directed network common scores significantly predicted rater scores (β = .585, p < .0001), as did essay word count (β = .364, p = .001).
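As a rough consistency check on the regression result, the F statistic can be recomputed from the reported variance explained, the sample size (45 essays), and the number of predictors (2). This is a sketch using the standard relation between R squared and F; the function name is hypothetical, and the small gap from the reported 38.395 is what one would expect from R squared being rounded to three decimals.

```python
def f_from_r_squared(r_squared, n_obs, n_predictors):
    """F statistic and degrees of freedom for an overall regression test."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r_squared / df_model) / ((1 - r_squared) / df_resid)
    return f_value, df_model, df_resid

f_value, df_model, df_resid = f_from_r_squared(0.646, 45, 2)
print(f"F({df_model},{df_resid}) = {f_value:.2f}")
```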


Next steps
• Working with Ping Li’s neuroimaging lab at Penn State to consider the possible neural influence of the text’s knowledge structure and of the reader’s knowledge structure of those texts
• Further develop the GIKS “universal” feedback writing tool

Central vs. peripheral neural
correlates
“Central ideas are functionally distinct from peripheral ideas,
showing greater activation in the PCC and PCU, while over the
course of passage comprehension, central and peripheral
ideas increasingly recruit different parts of the semantic
control network. The finding that central information elicits
greater response in mental model updating regions than
peripheral ideas supports previous behavioral models on the
cognitive importance of distinguishing textual centrality.” (p.
853)
Swett, K., Miller, A.C., Burns, S., Hoeft, F., Davis, N., Petrill,
S.A., & Cutting, L.E. (2013). Comprehending expository texts:
the dynamic neurobiological correlates of building a coherent
text representation. Frontiers in Human Neuroscience, 7, 853-
867. doi:10.3389/fnhum.2013.00853

GIKS
1. Enter list of key terms with synonyms and metonyms
2. Enter the writing prompt
3. Enter expert referent essay or term-term data
4. Set the expert network layout positions
5. Distribute ID#s for the study
http://giks.herokuapp.com/

Student entry screen

Student feedback screen

Stop, done
• Questions

How many terms in the ALA-Reader pattern matching?
• The optimal number of terms for ALA-Reader is unknown (note that many LSA studies use 300-element vectors). A recent dissertation indicates about 20 (Fanella, 2015).
https://etda.libraries.psu.edu/catalog/26367
• The expert essay and concept map of this content surfaced 17 terms for this analysis.

NodeXL network groups redrawn as a cmap
Could humans live on Mars some day?
Scientists ask this question because Earth and Mars are similar.
Similar to Earth’s day, Mars’s day is about 24 hours long.
Also, both planets are near the Sun in our solar system.
Earth is the 3rd planet and Mars the 4th planet from the Sun.
Mars also has an axial tilt similar to Earth's axial tilt.
An axial tilt gives both planets seasons with temperature changes.
Just like Earth, Mars has cold winters and warmer summers.
Like Earth, Mars has winds, weather, dust storms, and volcanoes.
But in some ways, Earth and Mars are different.
Differences include temperature, length of a year, and gravity.
The average temperature is –81 °F on Mars, but 57 °F on Earth.
A Martian year is almost twice as long as an Earth year.
Earth’s gravity is almost 3 times stronger than Martian gravity.
Given the similarities, can humans go to Mars and live there?
NASA scientists want to answer this question.
NASA oversees U.S. research on space exploration.
NASA scientists send devices called spacecraft to explore Mars.
The spacecraft carry rovers that can rove or move around.
These wheeled rovers can explore characteristics of the planet.
They can take pictures of mountains, plains, and dust storms on Mars.
One of these NASA rovers is named Curiosity.
Curiosity found evidence that soil on Mars contains 2% water.
NASA has planned a new mission called Mars 2020.
This mission will use a new car–sized rover to examine Mars.
The new rover will contain additional instruments to study Mars.
For example, one instrument will take images beneath Mars’s surface.
Another instrument will attempt to make oxygen from carbon dioxide.
Mars 2020 will help scientists answer important questions.
It will explore whether there has been life on Mars.
It will also answer whether humans can live on Mars in the future.
Mars lesson text (or eye track)

ALA-Reader articles
• Zimmerman, W. et al. (2018). Computer-automated approach for scoring short essays in an introductory statistics course.
Journal of Statistics Education, 25, in press.
• Kim, K., & Clariana, R. (2018). Applications of Pathfinder Network scaling for identifying the optimal use of a first
language to support second language text comprehension. Educational Technology Research and Development, in press.
• Kim, K., & Clariana, R. (2017). Text signals influence second language expository text comprehension: Knowledge
structure analysis. Educational Technology Research and Development, 65, 909-930. Online First,
http://link.springer.com/article/10.1007/s11423-016-9494-x.
• Kim, K., & Clariana, R.B. (2015). Knowledge structure measures of reader’s situation models across languages:
Translation engenders richer structure. Technology, Knowledge and Learning, 20, 249-268.
• Clariana, R.B., Wolfe, M. B., & Kim, K. (2014). The influence of narrative and expository text lesson text structures on
knowledge structures: alternate measures of knowledge structure. Educational Technology Research and Development,
62 (4), 601-616. doi:10.1007/s11423-014-9348-3
• Clariana, R.B. (2010). Deriving group knowledge structure from semantic maps and from essays. In D. Ifenthaler, P.
Pirnay-Dummer, & N.M. Seel (Eds.), Computer-Based Diagnostics and Systematic Analysis of Knowledge (Chapter 7, pp.
117-130). New York, NY: Springer.
• Clariana, R.B., Wallace, P.E., & Godshalk, V.M. (2009). Deriving and measuring group knowledge structure from essays:
The effects of anaphoric reference. Educational Technology Research and Development, 57, 725-737.
• Clariana, R.B., & Wallace, P. E. (2007). A computer-based approach for deriving and measuring individual and team
knowledge structure from essay questions. Journal of Educational Computing Research, 37 (3), 209-225.
• Koul, R., Clariana, R.B., & Salehi, R. (2005). Comparing several human and computer-based methods for scoring concept
maps and essays. Journal of Educational Computing Research, 32 (3), 261-273.
