The document discusses using data science techniques such as sentiment analysis, topic modeling and noun extraction to analyze descriptions of items in the Europeana 1914-1918 collection. This revealed topics such as soldiers' bravery, nurses caring for injured soldiers, and patriotic symbols. New contextualizing labels were then created for storytelling and reuse. User studies were proposed to evaluate how users interact with the collection and appropriate its technologies. The aim is to reflect critically on biases introduced by makers, platforms and tools, in order to increase transparency and awareness in digital hermeneutics.
"Storytelling and creative reuse with linked (open) data: How data science and user analysis reveal 'hidden stories' in Europeana"
1. Storytelling and creative reuse with linked (open) data
How data science and user analysis reveal 'hidden stories' in Europeana
dr. Berber Hagedoorn
Assistant Professor Media Studies
University of Groningen, the Netherlands
b.hagedoorn@rug.nl
https://berberhagedoorn.wordpress.com
Workshop “Next Generation Research with Europeana: the Humanities and Cultural Heritage in a Digital Perspective”, DH2019, Utrecht
2. What is creative reuse and why is it relevant for researchers?
• “Creative reuse is the process whereby one or multiple works, or parts thereof, are combined into a new work that is original, i.e. a non-obvious extension, interpretation or transformation of the source material” (Cheliotis, 2007)
For protocol, models, datasets see: https://tinyurl.com/y3ya4qb2
4. Need for critical (self-)reflection:
Who or what are your 'filters'? What are your information bubbles?
Image source: 'Filter Bubbles and Echo Chambers'
https://www.youtube.com/watch?v=Zk1o2BpC79g
5. Storytelling: scholars and professionals
Creative perspectives…
See: https://pro.europeana.eu/data/11-11-memories-retold
… as well as scholarly perspectives
See: Hagedoorn & Sauer (2019), “The Researcher as Storyteller: Using Digital Tools for Search and Storytelling with Audio-Visual Materials” in VIEW, www.viewjournal.eu
6. Main question
• Using a combination of data science and qualitative analysis to understand platform engagement and map out requirements for creative reuse and storytelling with the Europeana 1914-1918 thematic collection
Europeana 1914-1918, Femmes peintres ('women painters'): photographs of women responsible for painting canvas planes in WWI
7. Project aim
• The main starting point is that the selection of historical sources in a database adds another – more or less visible – layer of representation and interpretation
• Can data science offer opportunities to bring emotion 'back' into these sources?
• Can user analysis help us better understand the value of such personal narratives in digital(ized) cultural heritage for creative reuse, storytelling and research, and how this value is shaped in practice by the interaction between platform and user?
An unidentified news report about various aspects of the First World War on the Europeana 1914-1918 platform
8. Data scraping using the Python library Selenium
• Selected collections: Films; Women in WWI; Diaries & Letters; Photographs; Official Documents; Aerial warfare
• Dataset with: item number; title of item; description of item; type; provider; institution; creator; date first published in Europeana; subject (= list of different keywords); language; providing country; item link; linked open data (YES or NO); and collection
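The slides do not document the Selenium page selectors, so the browser-automation step is left out here. As a minimal sketch, assuming one record per item, the fields listed above could be held in a Python dataclass and written to CSV with the standard library. The class name `EuropeanaItem`, the field names, the `to_csv` helper and the "; " separator for subject keywords are hypothetical choices, not the project's own code.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class EuropeanaItem:
    """One scraped record; field names are hypothetical, mirroring the slide's list."""
    item_number: str
    title: str
    description: str
    item_type: str
    provider: str
    institution: str
    creator: str
    first_published: str      # date first published in Europeana
    subject: List[str]        # list of different keywords
    language: str
    providing_country: str
    item_link: str
    linked_open_data: bool    # written as YES/NO in the CSV
    collection: str


def to_csv(items: List[EuropeanaItem]) -> str:
    """Serialise records to CSV text, flattening the keyword list and the LOD flag."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(EuropeanaItem)])
    writer.writeheader()
    for item in items:
        row = asdict(item)
        row["subject"] = "; ".join(item.subject)
        row["linked_open_data"] = "YES" if item.linked_open_data else "NO"
        writer.writerow(row)
    return buffer.getvalue()
```

Keeping the keywords as a list in memory and flattening them only when writing makes it easy to count or filter by subject later.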
9. Translation: normalizing into English (automatic + manual)
Example of a Europeana 1914-1918 item and description: 'The contribution of Cypriot women in the First World War'.
17 languages in Europeana (Italian, Polish, Czech, etc.).
10. Sentiment value for every description
• Demand for improved affective computing that extracts people's sentiments from online data has been on the rise (Cambria, 2016). Sentiment analysis (also called opinion mining or emotion AI) uses natural language processing and text analysis to recognize, extract and examine affective and subjective information, classifying the polarity of a text as positive, negative or neutral.
• The Python library TextBlob provides pre-trained models to predict, quite accurately, the sentiment of a sentence (a sequence of tokens) as a polarity score in the range [-1, 1], where -1 is the most negative and 1 the most positive.
• To calculate the sentiment of longer text such as a sentence, TextBlob uses "averaging": it finds the words and phrases it can assign a polarity to (e.g. 'great' or 'disaster') and averages their scores.
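The averaging idea can be illustrated without TextBlob itself. The sketch below is a toy, assuming a tiny hand-made lexicon with invented scores (`TOY_LEXICON` and `polarity` are hypothetical names): it averages the polarity of every token it recognises and returns 0.0 (neutral) when it recognises none. TextBlob's default analyser uses a much larger lexicon and also takes modifiers such as 'very' and 'not' into account, which this toy omits.

```python
import re
from statistics import mean

# Hypothetical polarity scores for illustration only; not TextBlob's lexicon.
TOY_LEXICON = {
    "great": 0.8,
    "love": 0.5,
    "brave": 0.6,
    "sad": -0.5,
    "disaster": -1.0,
}


def polarity(text: str) -> float:
    """Average the lexicon scores of all recognised tokens; 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    return mean(scores) if scores else 0.0
```

For example, "A great disaster" averages 0.8 and -1.0 to roughly -0.1. With TextBlob installed, `TextBlob(text).sentiment.polarity` returns its own score directly.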
11. Distribution of sentiment in the World War I Diaries and Letters collections
A visible cluster of mildly positive sentiment just above 0 (roughly between 0 and 0.5) is to be expected in correspondence between soldiers and their families, or in diaries, where emotions such as hope, affection, love and longing may be present.
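A distribution like this one can be summarised by counting polarity scores per bin. A minimal sketch, assuming the scores are already computed (one per description) and using bins 0.5 wide; the function name `bin_scores` and the bin width are our own choices, not the binning used for the slide's chart.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_scores(scores: Iterable[float], width: float = 0.5) -> Dict[Tuple[float, float], int]:
    """Count polarity scores in [-1, 1] into consecutive bins of the given width."""
    n_bins = round(2 / width)
    counts: Counter = Counter()
    for score in scores:
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"polarity out of range: {score}")
        index = min(int((score + 1.0) / width), n_bins - 1)  # 1.0 falls in the last bin
        lower = -1.0 + index * width
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))
```

A cluster between 0 and 0.5 would show up as the largest count in the `(0.0, 0.5)` bin.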
12. Sentiment analysis example
• Item description: “Drama in which two kidnapped persons, employees of a diamond cutting establishment, chase their kidnappers, a mine owner and his lover”
• Sentiment score: -0.6
• Discussion: challenges of the Europeana dataset & relation to user studies
13. Topic modelling and noun extraction
• Topic modelling is a machine learning and natural language processing method that allows for the discovery of stories in terms of more vague, abstract or 'hidden' topics within a collection. The topics extracted by this process are clusters of comparable words. Analysed through a mathematical framework, the statistics of each word can help deduce not only what each topic might be, but also the overall topic balance in the whole collection (Papadimitriou et al., 1998; Blei, 2012).
• As a first step, nouns were extracted from the descriptions using TextBlob, for example:
'display-case', 'photographs', 'right', 'son', 'brother', 'biplane', 'identity', 'tag', 'end', 'right', 'medal', 'family', 'disability', 'officer', 'whistle', 'handgun', 'pistol', 'protection', 'county', 'region', 'war', 'family', 'grandson', 'display-case', 'display', 'city'
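The slide does not say which TextBlob feature produced the nouns. One plausible route, assumed here, is TextBlob's `tags` property, which returns (word, Penn Treebank tag) pairs; nouns are then the tokens whose tag starts with "NN". The sketch takes already-tagged pairs so it runs without TextBlob's corpora; `extract_nouns` and the example pairs are hypothetical and hand-tagged.

```python
from typing import List, Sequence, Tuple


def extract_nouns(tagged: Sequence[Tuple[str, str]]) -> List[str]:
    """Keep tokens tagged as nouns (NN, NNS, NNP, NNPS), lower-cased."""
    return [word.lower() for word, tag in tagged if tag.startswith("NN")]


# With TextBlob installed (and its corpora downloaded), the pairs could come from:
#   from textblob import TextBlob
#   tagged = TextBlob(description).tags
example = [
    ("His", "PRP$"), ("medal", "NN"), ("and", "CC"), ("identity", "NN"),
    ("tag", "NN"), ("were", "VBD"), ("kept", "VBN"), ("in", "IN"),
    ("a", "DT"), ("display-case", "NN"),
]
```

Here `extract_nouns(example)` keeps 'medal', 'identity', 'tag' and 'display-case', a list of the same shape as the one above.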
14. Example: Women in World War I collection
• Each item is accompanied by a description whose length varies from item to item. The obvious first step would therefore be to analyse these descriptions. However, this revealed a data problem: the spread of description lengths was too large (3-386 words), which could cause problems for standard text-mining techniques such as topic modelling and clustering. Instead, custom labels were produced through a lengthy manual annotation of the collection, in which the annotator extracted the context and the most concise information from each item.
        Description size (words)   Label size
Mean    104.38                     9.95
Min     3.00                       1.00
Max     386.00                     41.00
Statistics of descriptions and labels, regarding size
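Statistics like those in the table can be computed along these lines. A minimal sketch, assuming that size means the number of whitespace-separated words (the slides do not say how words were counted); `size_stats` is a hypothetical name.

```python
from statistics import mean
from typing import Dict, List


def size_stats(texts: List[str]) -> Dict[str, float]:
    """Mean, minimum and maximum length of the texts, in whitespace-separated words."""
    sizes = [len(text.split()) for text in texts]
    return {"mean": round(mean(sizes), 2), "min": min(sizes), "max": max(sizes)}
```

Running it once over the descriptions and once over the labels gives the two columns; a large gap between minimum and maximum is the warning sign described above.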
15. Automated topic modelling in Python
• Topic modelling was used to extract possible contexts and topics of interest. For our research we mainly used the Python machine-learning library Scikit-learn, and also the Gensim library; both provide the LDA (Latent Dirichlet Allocation) algorithm.
• As a text-mining technique, topic modelling allowed for the identification of word associations, which led to the creation of new topics derived from an understanding of how similar the items/images were. To decide on the number of topics, a coherence score was computed as an indication of a good topic size. Experimenting with 2 to 14 topics, 6 topics seemed to give a higher coherence score, but 8 topics made more sense to the annotator.
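The slides do not say which coherence measure was used. As an assumption, the sketch below implements UMass coherence, one of the measures offered by Gensim's `CoherenceModel` (`coherence='u_mass'`), in plain Python, averaging the per-pair scores; Gensim's exact smoothing and aggregation may differ. A topic's top words, most probable first, would come from a fitted model such as scikit-learn's `LatentDirichletAllocation` or Gensim's `LdaModel`. In a sweep over 2 to 14 topics, each model's score would be the average over its topics, with higher (less negative) values indicating more coherent topics. The function name `umass_coherence` is hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def umass_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """UMass coherence of one topic: the mean over word pairs of
    log((D(w_i, w_j) + 1) / D(w_j)), where D counts documents containing
    the given words and w_j is ranked above w_i in the topic."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        return sum(1 for doc in doc_sets if all(word in doc for word in words))

    pair_scores: List[float] = []
    for i in range(1, len(top_words)):
        for j in range(i):
            higher, lower = top_words[j], top_words[i]
            denominator = doc_freq(higher)
            if denominator:  # skip words that never occur in the corpus
                pair_scores.append(math.log((doc_freq(lower, higher) + 1) / denominator))
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0
```

Because words that often appear in the same descriptions score higher, a topic such as ['nurse', 'hospital'] would outscore ['nurse', 'medal'] in a corpus where nurses and hospitals are mentioned together.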
16. Example: Results of the LDA algorithm for 8 topics, Women in World War I
[0] Words: courage, bravery, honour, medal, left_behind, certificate, woman, medals, widow, Irish
    Topic produced: Soldiers fought with bravery and courage and either received medals upon their return, or their wives received their death certificates.
[1] Words: soldiers, active_duty, care, war, recovery, nurse, bravery, uniform, postcards, military_hospitals
    Topic produced: Brave nurses worked at military hospitals and took care of injured soldiers until they recovered. Often, they received letters/postcards of gratitude.
[2] Words: nationalistic, patriotic, sadness, symbols, army, educated, training, women, young, possible_death
    Topic produced: Many postcards contained patriotic and nationalistic symbols, and were often sent by young and educated people in the army or by women.
[3] Words: transfer, horse, family, hospitals, hard_work, censorship, hospital, help, brothers, doctor
    Topic produced: Families worked hard to sustain themselves and send help to soldiers, who were sometimes transferred or got injured.
17.
[4] Words: war, soldier, man, injured, family, woman, children, correspondence, war_life, letters
    Topic produced: Soldiers corresponded with their families, sending letters with news about life at the front. Often, they got injured.
[5] Words: affection, woman, portrait, child, album, love, handicrafts, no_war_discussion, man, married
    Topic produced: Many postcards featured family portraits of the soldiers or handicrafts, and contained words of love and affection. More sensitive soldiers who survived usually never mentioned the war again.
[6] Words: soldier, wife, death, marriage, man, war, letters, survived, worker, War
    Topic produced: Many soldiers were workers before the war; they exchanged letters with their partners or got married upon their return, provided they survived.
[7] Words: sister, gender_stereotypes, postcards, elegant, photos, irish, red_cross, everyday_life, messages, fundraising
    Topic produced: Rich women often helped the war cause by fundraising, whereas others volunteered at the Red Cross, contributing more than society thought possible.
Example: Results of the LDA algorithm for 8 topics, Women in World War I
19. Followed up and evaluated by means of:
• Annotation using manual labelling
NB: we tried not to reuse words already present in the description, but rather to use synonyms, generalisations or possible associations
• Automated labelling: clustering with unsupervised machine learning shows the distribution of topics and sentiments among the items and collections, and their variety
• Thus offering new contextualization
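The slides do not name the clustering algorithm. As an assumption, here is k-means, a common unsupervised choice, written in plain Python and applied to hypothetical per-item feature vectors such as topic proportions from the LDA model (two dimensions here for readability). The function name `kmeans` and the naive initialisation from the first k points are our own simplifications.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels, centroids
```

In practice, scikit-learn's `KMeans(n_clusters=k).fit_predict(X)` would replace this loop and uses a more robust initialisation; counting the resulting labels per collection then shows how topics are distributed.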
'I stand in gloomy midnight!' A field service postcard featured in the Women in WWI collection.
20.
21. New labels as contextualization for storytelling and creative reuse with the collection (1/2)
➢ Defining new topics, including topics that are impossible to find algorithmically, by combining automated analysis with manual approaches such as manual topic modelling (defining new keywords/topics manually and assigning them to items)
• An example is the topic 'domestic life', a key theme in Women in World War I, which is currently not available, for instance, as a search filter
➢ Improving the search algorithm in the collection (new keywords; new filters)
➢ Creating meaningful links between items (new sub-collections)
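A minimal sketch of how manually assigned labels could drive both a new search filter and new sub-collections, assuming each item carries a set of labels. The item IDs, titles, labels and the names `ITEMS`, `filter_by_label` and `sub_collections` are all hypothetical, invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical items with manually assigned labels.
ITEMS = {
    "item-1": {"title": "Letter to a sister", "labels": {"domestic life", "correspondence"}},
    "item-2": {"title": "Embroidered postcard", "labels": {"domestic life", "handicrafts"}},
    "item-3": {"title": "Red Cross appeal", "labels": {"fundraising", "correspondence"}},
    "item-4": {"title": "Charity concert flyer", "labels": {"fundraising"}},
}


def filter_by_label(items: Dict[str, dict], label: str) -> List[str]:
    """A search filter: IDs of all items carrying the given label."""
    return sorted(item_id for item_id, item in items.items() if label in item["labels"])


def sub_collections(items: Dict[str, dict]) -> Dict[str, List[str]]:
    """Link items that share a label; labels used by only one item form no link."""
    groups = defaultdict(set)
    for item_id, item in items.items():
        for label in item["labels"]:
            groups[label].add(item_id)
    return {label: sorted(ids) for label, ids in groups.items() if len(ids) > 1}
```

Here `filter_by_label(ITEMS, "domestic life")` returns the two items with that label, which is the kind of filter the 'domestic life' example calls for.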
22. New labels as contextualization for storytelling and creative reuse with the collection (2/2)
➢ This contextualization goes beyond the information present in metadata such as descriptions
➢ Showing the distribution of topics and sentiments among the items and collections, and their variety (e.g. whether there is a large difference between the lowest and the highest sentiment)
➢ Incorporating human annotators in the process of labelling (!)
During WWI, young women corresponded with one another by postcard. Women in WWI collection.
23. User studies: the sociology of DH
• User studies observe technology use in practice, and can therefore show how users appropriate technologies (among others Haddon 2011; Oudshoorn & Pinch 2003)
• User studies can serve to evaluate technologies in UI/UX testing (i.e. User Interface Design and User Experience testing) and pre-conceived use cases (Warwick 2012), but can also help us understand how technologies are increasingly becoming part of disciplinary practices
See further Hagedoorn & Sauer 2019, in VIEW
24. User tasks
• Did users find what they were looking for?
• Reflection on, among others, (successful) keywords; items useful for reuse in creative content (out of how many, and why); how many new angles, new learning or new research questions were triggered or fine-tuned…
• Think-aloud protocol
  • what are they trying to do or find?
  • why do they take an action or make a choice?
  • how is the platform interpreted?
  • but: does it alter the task? It is hard to talk while concentrating, and for a few participants it is also unnatural or uncomfortable
25. Creative reuse + storytelling on Europeana; e.g. selection; UGC (user-generated content)
https://www.europeana.eu/portal/en/record/08615/0010827a07.html?q=soldiers#dcId=1556192615499&p=1
26. Creative reuse + storytelling on Europeana; e.g. search and sentiments
Patriotic cartoon postcard. Contributed by Tony Cole via Europeana 1914-1918, CC BY-SA
Field service postcard
http://blog.europeana.eu/2018/10/use-of-propaganda-in-wwi-postcards/
27. Creative reuse and storytelling: a critical digital hermeneutics perspective
• I argue for, and in my project seek to find, methods for:
  ➢ updating (digital) hermeneutics, including critical (self-)reflective approaches
  ➢ studying maker–platform–user in interaction
  ➢ reflecting on, and seeking transparency of, (pre-)selection or bias of the maker, the user (among others the researcher) and the media platforms, databases and tools used (interaction, affordances)
• Digital technology can increase options for awareness of the process and the product (among others Bolter et al. 2006; 1999)
• Today we are more self-reflective than ever before, but are we also critical?
Project: Creative reuse and storytelling with Europeana 1914-1918, Hagedoorn 2019
28. Thank you for your attention!
dr. Berber Hagedoorn
Assistant Professor Media Studies
University of Groningen, the Netherlands
b.hagedoorn@rug.nl
https://berberhagedoorn.wordpress.com
For protocol, models, datasets see: https://tinyurl.com/y3ya4qb2