Talk presented at #ICLS2016 in Singapore. I discuss levels of description as sites of epistemic cognition, focusing on writing and the use of textual features to associate rubric scores with epistemic cognition.
My thanks to my collaborators (listed on the paper), particularly Laura Allen, who also generously let me adapt the later slides on NLP studies of writing.
Abstract: Literacy, encompassing the ability to produce written outputs from the reading of multiple sources, is a key learning goal. Selecting information and evaluating and integrating claims from potentially competing documents is a complex literacy task. Prior research exploring differing behaviours and their association with constructs such as epistemic cognition has used ‘multiple document processing’ (MDP) tasks. Using this model, 270 paired participants wrote a review of a document. Reports were assessed using a rubric associated with features of complex literacy behaviours. This paper focuses on the conceptual and empirical associations between those rubric marks and textual features of the reports on a set of natural language processing (NLP) indicators. Findings indicate the potential of NLP indicators for providing feedback regarding the writing of such outputs, demonstrating clear relationships both across rubric facets and between rubric facets and specific NLP indicators.
Writing Analytics for Epistemic Features of Student Writing #icls2016 talk
1. Writing Analytics for Epistemic
Features of Student Writing
Simon Knight
@sjgknight
www.sjgknight.com
Simon Knight, University of Technology Sydney
Laura Allen, Arizona State University
Karen Littleton, Open University
Dirk Tempelaar, Maastricht University
2. • Arizona State University (Laura Allen)
• Open University (Karen Littleton & Bart Rienties)
• Maastricht University (Dirk Tempelaar & team)
• Rutgers University (Chirag Shah & Matthew Mitsui)
Acknowledgements
4. Sites of epistemic
cognition: Situations
a parent is
attempting to
understand
information around
childhood
vaccinations;
Public domain image from
https://en.wikipedia.org/wiki/File:Fluzone_vaccine_extr
5. Sites of epistemic cognition:
Situations
a voter wants to
investigate the
plausibility of a
politician’s climate
change denial;
By Twm CC-By-NC-ND
https://www.flickr.com/photos/twmlabs/29463820/
6. Sites of epistemic
cognition: Situations
someone seeking to
lose weight wishes
to investigate the
merits of diet
versus regular
foodstuffs or
supplements.
Public domain image from
https://commons.wikimedia.org/wiki/File:%22Miracle_Cure!%22_Health_Fraud_Scams_%288528312890%29.jpg
7. Sites of epistemic cognition: Activities
The information seeker requires more
than just the ability to read content; they
must make complex decisions about
where to look for information, which
sources to select (and corroborate), and
how to synthesise (sometimes competing)
claims from across sources.
Rouet [39] – students should be taught:
• Skill of integration
• Skill of sourcing
• Skill of corroboration
8. “reading literacy is
understanding, using,
reflecting on and engaging
with written texts, in order
to achieve one’s goals, to
develop one’s knowledge
and potential, and to
participate in society.”
(OECD, 2013, p. 9).
Sites of epistemic cognition: Activities
9. “epistemological beliefs are a lens for a learner's views
on what is to be learnt” (Bromme, 2009)
The Lens of Epistemic Beliefs
10. “exploring students’ thought processes during online
searching allows examination of personal epistemology
not as a decontextualized set of beliefs, but as an
activated, situated aspect of cognition that influences the
knowledge construction process” (Hofer, 2004, p. 43).
The Lens of Epistemic Beliefs: Activities
11. • Certainty – static to tentative & evolving
• Simplicity – discrete to holistic
• Source – external to constructed by self
• Justification – authority to evaluation
of knowledge (Mason, Boldrin, & Ariasi, 2009)
Epistemic Cognition
12. Epistemic cognition
• Certainty – static to tentative & evolving
• Simplicity – discrete to holistic
• Source – external to constructed by self
• Justification – authority to evaluation
of knowledge (Mason, Boldrin, & Ariasi, 2009)
• source, corroborate, and integrate claims – key
facets of literacy for mature internet use (Rouet,
2006, p. 177)
15. MD-TRACE & epistemic cognition relationships (Bråten et al., 2011)
Facet of cognition | Less adaptive | More adaptive
Simplicity | Accumulation of facts, prefer simple sources | Integrated, downplay simple sources
Certainty | Single document sourcing | Corroboration, represents complex perspectives and views showing the diversity of angles
Source | Emphasizes own opinion, differentiates between sources less | Emphasizes source characteristics, distinguishes between source trustworthiness
Justification | Emphasizes authority, less corroboration | Emphasizes use of argument schema and combination of corroboration and authority
16. Sites of epistemic cognition: Products
• Written outputs (summaries,
reports, tests, etc.)
• Cognitive process (think aloud)
• Problem navigation (pages
viewed, searches made, etc.)
• Help seeking & collaborative
dialogue
• Implicit/explicit assessments of
source-trust
17. Learning Analytics
• Increasing technology use:
– foregrounds some learning needs around
complexity of literacy
– affords opportunity for research & feedback
19. Study Design
• ~1100 Maastricht 1st year business & economics
students
Participants
20. Study Design
• Maastricht study credit
• Coagmento terms
• + wider research consent
Consent & ethics
21. Study Design
• ~1100 Maastricht 1st years
• ~250 – individual (software issues)
Participants
22. • ~1100 Maastricht 1st years
• ~250 – individual (software failure)
• ~250 – collaborative but discarded data
(software issues)
Study Design
Participants
23. • ~1100 Maastricht 1st years
• ~250 – individual (software failure)
• ~250 – collaborative but discarded data
(software failure)
• ~600 – collaborative & data used
Study Design
Participants
25. Tasks
• Two collaborative tasks facilitated by a browser
add-on
• ‘Warm up’ task – fact retrieval
• One group provided with documents; the second
group searches on the web
• “A review of the best supported claims around
the risks” of a substance (herbicide or food
supplement)
26.
27. [Diagram of source documents; topic labels: Urine, Health, Agriculture]
• Friends of the Earth: Press Release (Urine presence)
• FoE Commissioned report (‘scientific’) (-ve)
• Science-Literacy website: Refutation (+ve)
• Farmer’s Weekly: Reprints (+ve)
• Related peer-review publication (Limited risk)
• Peer-review publication: Health danger
• Reuters: Reprints main claims
• Blogger: Critiques journal & author
• Peer-review publication (Limited health risks)
• Peer-review review of literature (Limited risk to health or plants)
• Peer-review of lit (Limited risk; control suggestions)
29. Figure 3.3: Coagmento Screenshots (from top: 3.3.1 A full screen display from a browser window;
3.3.2 The toolbar element; 3.3.3 Sidebar with Chat displayed; 3.3.4 Sidebar with Snippets displayed)
30. Sites of Epistemic Cognition
• Situation
– ‘best supported claims around the risks of x’ as a
government advisor
• Activity
– Multiple document literacy
• “Products”
– Process data, written output, survey items
• Units
– Collaborative pairs, with both snapshot (survey, product
assessment) & dynamic (process, chat analysis) analyses
33. Product Textual Indicators
Analysis of written outputs for implicit/explicit sourcing and
trustworthiness evaluations (e.g. Anmarkrud, Bråten, &
Strømsø, 2014; Bråten, Braasch, Strømsø, & Ferguson, 2014)
[Diagram: a Doc / Rank table (ranks 1 to 3) alongside Doc A, Doc B and Doc C]
34. Product Textual Indicators
• Goldman, Lawless, Pellegrino and Gomez (2012) identified three clusters
of students from their written outputs: satisficers, who selected few
sources; selectors who selected many sources but did not connect them;
and synthesisers who selected sources and integrated them.
[Diagram: Doc A, Doc B and Doc C, with three example outputs]
• Satisficer: lots of text A
• Selector: text C, text A, text B (listed separately)
• Synthesiser: “A ¬ B, C supports B but…”
35. Product Textual Indicators
Hastings, P., Hughes, S., Magliano, J. P., Goldman, S. R., & Lawless, K. (2012).
Assessing the use of multiple sources in student essays. Behavior Research
Methods, 44(3), 622–633. http://doi.org/10.3758/s13428-012-0214-0
[Diagram: Doc A, Doc B and Doc C mapped onto an example essay; Doc C is labelled “No shared lang”]
Example essay text: “A quotation from text A”, followed by some paraphrased text B. Some key language is copied from text A drawing inference between A and B…
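The idea of “shared language” between an essay and its sources can be illustrated with a very simple word n-gram overlap. This is only a sketch and is not the method used by Hastings et al. (2012); the function names, the trigram window, and the example texts are all assumptions made for illustration. Verbatim overlap of this kind picks up quotation and copied key language, but it will miss paraphrase.

```python
import re

def ngrams(text, n=3):
    """Set of lowercase word n-grams in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_language(essay, sources, n=3):
    """Map each source name to the proportion of essay n-grams also found in that source."""
    essay_grams = ngrams(essay, n)
    if not essay_grams:
        return {name: 0.0 for name in sources}
    return {
        name: len(essay_grams & ngrams(text, n)) / len(essay_grams)
        for name, text in sources.items()
    }

# Invented example texts (not taken from the study materials).
sources = {
    "A": "glyphosate was detected in urine samples",
    "B": "the review found limited risk to health",
    "C": "farmers rely on the herbicide",
}
essay = "glyphosate was detected in urine samples but the review found limited risk"

overlap = shared_language(essay, sources)
print(overlap)  # A and B share trigrams with the essay; C shares none
```

A source with a proportion of zero corresponds to the “no shared lang” case in the diagram.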
37. Writing skills are important for success in our school,
workplace, and personal lives
Geiser & Studley, 2001; Light, 2001; Powell, 2009; Sharp, 2007
By ccarlstead CC-By https://www.flickr.com/photos/cristic/359572656/
47. Words serve as proxies
to
actions, skills,
interactions, emotions,
thoughts…
48. NLP tools calculate numerous indices related
to the characteristics of language
Words
Syntax
Reasoning
Affect
49. Identified a number of individual differences
related to proficiency on writing assessments
• Vocabulary (Allen et al., 2014)
• Motivation (Pajares et al., 2001; 2003)
• Strategy Knowledge (Roscoe & McNamara, 2013)
• Working Memory (Kellogg, 2008)
50. Natural Language Processing
Analysis of the language produced by humans
Uses:
• Various statistical techniques
• Various sources of information in language
In order to:
• Understand language
• Respond to the “speaker” appropriately
Bird, Klein, & Loper, 2009; Crossley, Allen, Kyle, & McNamara, 2014
53. There’s a lot of text on the following slides - sorry
54. Product Textual Indicators - Qualitative
Across the rubric facets variations in outcome were
characterized by, for example:
• Synthesis: Lists v integration
• Topic Coverage: Sparse keywords/tight subtopic focus
vs range of themes & keywords
• Source Diversity: ‘One best’ article vs. multiple sources
• Source quality: Uncritical citation of claims, even where
claims disagreed, versus identification, critique &
connection of source quality & disagreement
55. Product Textual Indicators (MDP only)
TAACO:
• basic indices (‘information’ indicator)
– Tokens, word types, type-token ratios
• sentence overlap (local cohesion)
– All, content (e.g. topic), and function (e.g. rhetorical) word
overlap
• paragraph overlap (global cohesion)
– The same overlap measures as at sentence level, computed between paragraphs
• connectives (local cohesion)
– basic connectives, sentence linking connectives, and reason
and purpose connectives
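To make these index families concrete, here is a minimal sketch of how a type-token ratio, an adjacent-sentence overlap, and a connective rate might be computed. It is not TAACO's implementation: the tokeniser, the sentence splitter, the tiny connective list, and the example report are all assumptions for illustration, and TAACO's own definitions and word lists are more elaborate.

```python
import re

# Hypothetical, deliberately tiny connective list; not TAACO's list.
CONNECTIVES = {"and", "but", "because", "so", "therefore", "however"}

def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def adjacent_overlap(units):
    """Mean share of word types in each unit that recur in the following unit."""
    type_sets = [set(tokenize(u)) for u in units]
    pairs = [(a, b) for a, b in zip(type_sets, type_sets[1:]) if a]
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a) for a, b in pairs) / len(pairs)

def connective_rate(tokens):
    """Proportion of tokens that appear in the connective list."""
    return sum(t in CONNECTIVES for t in tokens) / len(tokens) if tokens else 0.0

# Invented example report (not from the study data).
report = ("Glyphosate is a herbicide. The herbicide is widely used. "
          "However, some sources dispute its safety.")
tokens = tokenize(report)
sentences = split_sentences(report)

print(len(tokens), len(set(tokens)))       # 15 tokens, 13 types
print(round(type_token_ratio(tokens), 3))  # 0.867
print(adjacent_overlap(sentences))         # 0.25
print(round(connective_rate(tokens), 3))   # 0.067
```

Passing a list of paragraphs instead of sentences to `adjacent_overlap` gives a rough analogue of the paragraph-level (global cohesion) measure.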
56. Product Textual Indicators – TAACO to Rubric
Exploratory analysis
• Synthesis – global cohesion
• Topic Coverage – basic indices
• Source Diversity – basic indices + local cohesion
• Source Quality – reason & purpose connectives
57. Product Textual Indicators (MDP only)
Low to moderate correlations (.1-.4 range) of indices
to scores on rubric facets
• Synthesis:
– -ve association to basic indicators (i.e. longer texts
synthesised less)
– +ve association to sentence & connective indicators (i.e.
more synthesis related with local but not global
cohesion in these texts)
– No sig association to paragraph level indices (perhaps
due to thematic shifts & copy-pasting)
58. Product Textual Indicators (MDP only)
Low to moderate correlations (.1-.4 range) of indices to
scores on rubric facets
• Topic coverage:
– +ve association to lexical diversity (rather than n of words)
– -ve association to local sentence cohesion & connectives -
indicating that higher topic scores perhaps tended to involve
more ‘listing’ of claims from sources, with less integration of
those claims on a local level (a feature observed in the
scoring exercise)
• Source diversity:
– Similar to topic coverage, with stronger associations to
logical connectives (linking sources for similar claims)
59. Product Textual Indicators (MDP only)
Low to moderate correlations (.1-.4 range) of indices to
scores on rubric facets
• Source quality:
– +ve association to lexical diversity (or information given) in
the type/token ratio (§1)
– +ve association (as in synthesis) to sentence overlap (§2),
indicating that local cohesion was being built (suggesting
local argumentation focused on specific topics)
– But also +ve association to paragraph overlap (§4), indicating
that those who evaluated tended to build a cohesive
argument through their text, making purposeful connections
(§3) between sentences
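The kind of index-to-rubric association reported on these slides can be sketched as a correlation between one textual index and one rubric facet. The slides do not say which coefficient was used; Pearson's r is assumed here, and a rank-based coefficient such as Spearman's may suit ordinal rubric scores better. The values below are made up for illustration and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)

# Made-up values for illustration only; NOT data from the study.
type_token_ratios = [0.52, 0.61, 0.48, 0.70, 0.66, 0.55]  # one value per report
source_quality_scores = [1, 2, 1, 3, 2, 2]                # hypothetical rubric marks

r = pearson_r(type_token_ratios, source_quality_scores)
print(f"r = {r:.2f}")
```

With several indices and four rubric facets, the same function would be applied to each index-facet pair, and the resulting coefficients read alongside their significance tests.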
60. Future Directions
• Analysis of source
documents
• Collaborative
contribution
• Comparison of other measures of
epistemic cognition with
writing
By Andrea_44 CC-By
https://www.flickr.com/photos/andrea_44/2680944871/
61. Thank you (and questions)
Acknowledgements:
• Arizona State University (Laura Allen)
• Open University (Karen Littleton & Bart Rienties)
• Maastricht University (Dirk Tempelaar & team)
• Rutgers University (Chirag Shah & Matthew Mitsui)
@sjgknight http://sjgknight.com/
62. MDP
For this task, you will be researching a chemical used in herbicide (Roundup) called Glyphosate.
Your task is to act as an advisor to an official within the science ministry. You are advising an
official on the issues below. The official is not an expert in the area, but you can assume they are
a generally informed reader. They are interested in the best supported claims in the
documents. Produce a summary of the best supported claims you find and explain why you
think they are. Note you are not being asked to “create your own argument” or “summarise
everything you find” but rather, make a judgement about which claims have the strongest
support.
A colleague has already found a number of documents for you to process with your partner, you
should use these to extract the best supported claims (without using the internet to find further
material).
You should:
Read the questions/topic areas provided, these will require you to find information and
arguments in the documents to present the best supported of these, you should decide with
your partner which are best as you read.
Group information together by using headings in the Editor
You should work with your partner to explain why the claims you’ve found are the best available
You should spend about 45 minutes on this task
A review is coming up for the license of Glyphosate, the official would like to know the best
supported claims around its risks.
A colleague has collected some documents, available from the